Evolution of Intel SGX into Intel TDX
Trusted Execution Environments (TEEs) are hardly a novelty; they are currently one of the most established methods of performing verifiable computation server-side. As crazy as it sounds, SGX was introduced by Intel nearly a decade ago. The wider blockchain community took its time to realize what a powerful pairing TEEs and blockchains make, and to harness that potential by combining the best of on-chain security with off-chain performance.
There were good reasons to be cautious about Intel SGX, with its poor developer experience, lack of crucial features, and numerous exploits (https://sgx.fail/, anyone?). In the end, with the release of Intel TDX in 2023 and great R&D teams such as Flashbots and Automata paving the way, many L2 solutions decided to adopt TEEs as a crucial extension of their tech.
Before I get into the details, I would like to give a shout-out to our partners at Intel and Google Cloud Platform for providing cloud hardware and expert support. Special thanks to Mike and Kunal from GCP and Benny from Intel!
Is it possible to run heavy software in TDX?
A great thing about Intel TDX is that it lets you deploy and run software within the confines of a secure virtual machine. Thanks to that, TEE devs are no longer forced to work with the cumbersome Intel SGX C++ SDK (or its Rust port) and can instead build their stuff in a programming language of their choice, with limited overhead.
Since TDX was released quite recently, the extent of its applicability is still somewhat uncharted territory. One can be reasonably worried about TDX's performance and resource limitations when running heavier software, such as a full blockchain node. Well, fret not, since good old KSS is here to show you that this can be done, and quite easily so.
Getting a TDX-ready machine
Before dreaming of building TDX-powered software, you first need the right hardware. Self-hosting such a machine is an obvious thought, although I would discourage it wholeheartedly. Not only do you need an Intel processor supporting SGX enclaves (which are needed for TDX attestation)1, but also a compatible BIOS version and settings. Achieving this can be both pricey and time-consuming.
There is an easier way, though! ππ· is lucky enough to have strong partnerships with both Intel and Google Cloud Platform, and both were able to provide us with usable TDX boxes. A big advantage of Intel is that they give you bare-metal access to the machine that runs the TDX VM. On the other hand, GCP offers a somewhat better user experience, with the richness of its graphical user interface helping you manage your TDX VMs, their resources, and communication between them.
It has to be mentioned that TDX support is still in its infancy, which can manifest in issues such as limited GUI support in GCP or a rather funky SSH setup at Intel. Both solutions work just fine, at least for development purposes. We are certain that by the time ππ· releases to mainnet, cloud support for Intel TDX will be quite mature. For now, just being able to SSH into a TDX VM and verify with https://github.com/canonical/tdx that all the hardware heavy lifting was done for us is fantastic.
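Beyond running the full test suite from that repository, a quick sanity check from inside the guest is to look for the tdx_guest CPU flag that recent Linux kernels expose inside a TD. A minimal sketch (the flag name assumes a reasonably new kernel, and this only confirms what the guest believes about itself; real assurance comes from remote attestation):
# Quick TD sanity check: recent Linux kernels report a "tdx_guest"
# flag in /proc/cpuinfo when running inside a Trust Domain.
with open("/proc/cpuinfo") as f:
    flag_lines = [line for line in f if line.startswith("flags")]
if flag_lines and "tdx_guest" in flag_lines[0]:
    print("Running inside a TDX Trust Domain")
else:
    print("Not a TD guest (or the kernel is too old to report it)")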
Extremely slow IO with older firmware versions
Another growing pain we encountered was that with certain older firmware versions, input/output performance in the guest VM was significantly slower (by roughly one order of magnitude) than in a non-TDX guest VM. We spotted this by accident and confirmed the diagnosis by adjusting the benchmark test suite (now merged upstream). Upgrading to SEAM_1.5.02.00 and TDX_1.5.05.46 fixed the issue (we had received dedicated instructions on how to achieve this, but Intel's official guide is live now).
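If you want to check your own machine for this regression without running the whole suite, a crude sequential-write probe is enough to spot a gap of this magnitude; run it in both a TD guest and a regular VM and compare the numbers. A rough sketch (the file path and sizes are arbitrary choices of mine, not part of the official suite):
import os
import time

# Write 1 GiB in 4 MiB chunks, fsync, and report throughput.
path = "/tmp/io_probe.bin"          # arbitrary scratch location
chunk = b"\0" * (4 * 1024 * 1024)   # 4 MiB per write
total = 1024 * 1024 * 1024          # 1 GiB overall

start = time.monotonic()
with open(path, "wb") as f:
    written = 0
    while written < total:
        f.write(chunk)
        written += len(chunk)
    f.flush()
    os.fsync(f.fileno())
elapsed = time.monotonic() - start
os.remove(path)
print(f"sequential write: {total / elapsed / 1e6:.1f} MB/s")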
Example performance benchmark test
For an INTEL(R) XEON(R) PLATINUM 8570 with 224 CPUs, running Ubuntu in Intel's SDP cloud:
<testsuites>
<testsuite name="pytest" errors="0" failures="0" skipped="0" tests="23" time="30914.770" timestamp="2024-06-12T04:35:25.801200" hostname="sdp">
<testcase classname="pytest.test_host_tdx_hardware" name="test_host_tdx_hardware_enabled" time="0.001" />
<testcase classname="pytest.test_host_tdx_software" name="test_host_tdx_software" time="0.011" />
<testcase classname="pytest.test_boot_basic" name="test_guest_boot" time="102.168" />
<testcase classname="pytest.test_boot_coexist" name="test_coexist_boot" time="5.650" />
<testcase classname="pytest.test_boot_multiple_vms" name="test_multiple_vms" time="25.068" />
<testcase classname="pytest.test_boot_td_creation" name="test_create_td_without_ovmf" time="0.432" />
<testcase classname="pytest.test_guest_eventlog" name="test_guest_eventlog" time="106.441" />
<testcase classname="pytest.test_guest_eventlog" name="test_guest_eventlog_initrd" time="101.572" />
<testcase classname="pytest.test_guest_measurement" name="test_guest_measurement_check_rtmr" time="102.928" />
<testcase classname="pytest.test_guest_reboot" name="test_guest_reboot" time="207.524" />
<testcase classname="pytest.test_guest_report" name="test_guest_report" time="107.273" />
<testcase classname="pytest.test_host_tdx_hardware" name="test_host_tdx_hardware_enabled" time="0.001" />
<testcase classname="pytest.test_host_tdx_software" name="test_host_tdx_software" time="0.012" />
<testcase classname="pytest.test_perf_benchmark" name="test_run_perf_0_normal" time="11124.133" />
<testcase classname="pytest.test_perf_benchmark" name="test_run_perf_1_td" time="14106.958" />
<testcase classname="pytest.test_perf_boot_time" name="test_boot_time_0_normal" time="18.120" />
<testcase classname="pytest.test_perf_boot_time" name="test_boot_time_1_td" time="40.882" />
<testcase classname="pytest.test_perf_boot_time" name="test_boot_time_2_normal_16G" time="17.444" />
<testcase classname="pytest.test_perf_boot_time" name="test_boot_time_3_td_16G" time="56.652" />
<testcase classname="pytest.test_perf_boot_time" name="test_boot_time_4_normal_64G" time="17.471" />
<testcase classname="pytest.test_perf_boot_time" name="test_boot_time_5_td_64G" time="57.457" />
<testcase classname="pytest.test_quote_configfs_tsm" name="test_quote_check_configfs_tsm" time="92.785" />
<testcase classname="pytest.test_stress_boot" name="test_boot" time="4623.619" />
</testsuite>
</testsuites>
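The interesting numbers are the paired normal-versus-TD entries. Here is a small sketch that pulls them out of the report above (saved as benchmark.xml; the filename and the pairing of test names are my own choices) and prints the TD slowdown:
import xml.etree.ElementTree as ET

# Collect each test case's wall time, then compare "normal" runs
# with their TD counterparts from the JUnit-style report above.
tree = ET.parse("benchmark.xml")
times = {tc.get("name"): float(tc.get("time")) for tc in tree.iter("testcase")}
pairs = [
    ("test_run_perf_0_normal", "test_run_perf_1_td"),
    ("test_boot_time_0_normal", "test_boot_time_1_td"),
    ("test_boot_time_2_normal_16G", "test_boot_time_3_td_16G"),
    ("test_boot_time_4_normal_64G", "test_boot_time_5_td_64G"),
]
for normal, td in pairs:
    if normal in times and td in times:
        print(f"{td}: {times[td] / times[normal]:.2f}x slower than {normal}")
On the run above, that works out to roughly a 1.27x runtime overhead and a 2.3-3.3x boot-time overhead for TD guests.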
Running Reth with Kurtosis
Awesome, we have our TDX box. Now what? There is nothing stopping you from running a full Reth node using the instructions at https://github.com/paradigmxyz/reth. It should just work. But if you are reading this article, it is more likely you are planning to do some Ethereum L2 development using TEEs, which is closer to what we are trying to do at ππ·.
Developing your L2 solution against Ethereum testnets can be cumbersome, with performance issues, gas costs, and a multitude of unwanted transactions. On top of that, if you are developing a rollup, you might be daunted by the necessity of putting together a private network of execution and consensus nodes.
Thankfully, Kurtosis comes to the rescue! This amazing tool lets you spin up a fully functional Ethereum stack in a matter of minutes. Just follow this guide and you should find that running a private Ethereum network inside a TEE is not as scary as it sounds! You are more than welcome to experiment with the network design, but to run a simple Ethereum config with Reth and Lighthouse, we suggest running Kurtosis with the Ethereum Package and the following network_params.yaml:
# This just adds a single participant execution and consensus node. Add more if you like!
participants:
  - el_type: reth
    el_image: ghcr.io/paradigmxyz/reth
    cl_type: lighthouse
    cl_image: sigp/lighthouse:latest
# These are some of the most useful additional services, but there are more! Refer to the Kurtosis docs
additional_services:
  - blockscout
  - dora
  - prometheus_grafana
# This persists chain data onto the hard drive instead of keeping it in memory
persistent: true
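Once the file is saved as network_params.yaml, spinning the network up should be a single command. The package URL below is an assumption on my part (the Ethereum Package has lived under both the kurtosis-tech and ethpandaops GitHub organizations, so check the Kurtosis docs if it 404s):
kurtosis run github.com/ethpandaops/ethereum-package --args-file network_params.yaml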
Should you have any issues with Kurtosis, do not hesitate to jump on their Discord and ask for help. They are very friendly and helpful!
Summary
In this article, I went over several technologies that we use at ππ· and briefly hinted at how to put the puzzle pieces together. The result is not only Reth running inside an Intel TDX Trust Domain, but an entire private Ethereum network!
A key takeaway from this exercise is that developing software using TEEs in 2024 is easier than ever. We just ran an Ethereum network inside TDX in under an hour! Doing the same thing in pure SGX several years ago would have been very difficult, or even impossible.
It is true that Intel TDX introduces a slightly wider attack surface compared to SGX, but for many use cases that is an amazing tradeoff for the insane developer experience it offers.
Fire up your secure and private VM and go crazy!
1. TDX Whitepaper: "The Intel TDX architecture is designed to utilize an Intel SGX enclave, called the TD-quoting enclave, to generate the remote attestation for a TD. The CPU would provide a new instruction, SEAMREPORT, to be invoked only by the Intel TDX module and create an evidence structure that is cryptographically bound to the platform hardware for consumption by the TD-quoting enclave."
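For completeness: on recent kernels, the TD-quoting flow described above is reachable from inside the guest through the Linux configfs-tsm interface, with no SGX SDK involved. A hedged sketch of requesting a quote (the directory name is arbitrary; the paths follow the upstream configfs-tsm ABI and require a kernel with the TSM report driver):
import os

# Creating a directory under configfs-tsm instantiates a report
# request; writing 64 bytes of caller-chosen report data (bind a
# nonce or session hash here in practice) and reading outblob
# returns the TD quote.
report_dir = "/sys/kernel/config/tsm/report/demo"
os.makedirs(report_dir, exist_ok=True)
with open(os.path.join(report_dir, "inblob"), "wb") as f:
    f.write(b"\x00" * 64)
with open(os.path.join(report_dir, "outblob"), "rb") as f:
    quote = f.read()
print(f"got a {len(quote)}-byte TD quote")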