Enabling Rust on the AMD Zynq UltraScale+ MPSoC

In this post, we’re sharing our journey of building a dedicated crate that provides support for the AMD Zynq UltraScale+ MPSoC. It is publicly available as adacore-zynqmp on crates.io. We’ll walk through how we orchestrated the boot process across different Exception Levels, implemented exception handling, initialized UART, and successfully integrated Newlib to provide limited std support.

Hardware

The AMD (formerly Xilinx) Zynq UltraScale+ is not a simple microcontroller. It is a heterogeneous Multi-Processor System on Chip (MPSoC) that integrates multiple processing domains onto a single die.

The device is divided into two primary subsystems: the Processing System (PS) and the Programmable Logic (PL). While the PL provides FPGA capabilities for custom hardware acceleration, our crate focuses on the PS, specifically the Application Processing Unit (APU).

The APU consists of a quad-core ARM Cortex-A53 cluster implementing the ARMv8-A architecture. This 64-bit architecture (AArch64) introduces significant differences compared to the 32-bit ARMv7 and Cortex-M targets often found in embedded Rust projects:

Exception Levels (EL): A hierarchical privilege model ranging from EL0 (User) to EL3 (Secure Monitor).
Memory Management: A complex Memory Management Unit (MMU) that requires explicit configuration of translation tables and memory attributes (such as cacheability and shareability) before code can be executed reliably.
Peripheral Mapping: All peripherals, including the UART controllers needed for standard I/O, are memory-mapped within the PS address space.

In addition to the APU, the PS also houses a Real-Time Processing Unit (RPU) consisting of dual Cortex-R5F cores. At this stage, our crate is agnostic to the PL and RPU. While the hardware permits tight integration between the processing system and the FPGA fabric, we have deferred that support to future extensions.

Boot Sequence

On a complex architecture such as AArch64, the path from power-on to executing useful Rust code is not a straight line, but a sequence of state checks and privilege transitions.

The ARMv8-A architecture defines a hierarchy of privileges. Upon reset, the Cortex-A53 cores typically wake up in EL3 (Secure Monitor), the highest privilege level. However, depending on the boot method, such as loading via JTAG or being handed off from a previous stage bootloader like U-Boot, the processor might already be in EL2 (Hypervisor) or EL1 (OS/Kernel).

Our crate is designed for flexibility. It reads the CurrentEL system register at startup and adapts the initialization sequence accordingly. While a standard OS separates kernel space (EL1) and user space (EL0), our bare-metal environment runs all user-defined logic in EL1. This provides the application with full hardware access while keeping it separated from the secure monitor context.
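
As a rough illustration of the check described above, the following sketch decodes a raw CurrentEL value. The crate performs this step in assembly; the enum, the function names, and the Rust-side register read are hypothetical and exist only to show that the level is held in bits [3:2] of the register.

```rust
/// Exception levels defined by ARMv8-A.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionLevel {
    El0,
    El1,
    El2,
    El3,
}

/// Decode a raw `CurrentEL` value. The level is stored in bits [3:2];
/// all other bits are reserved.
pub fn decode_current_el(raw: u64) -> ExceptionLevel {
    match (raw >> 2) & 0b11 {
        0 => ExceptionLevel::El0,
        1 => ExceptionLevel::El1,
        2 => ExceptionLevel::El2,
        _ => ExceptionLevel::El3,
    }
}

/// Hypothetical helper: read the register on a bare-metal AArch64 target.
/// It is compiled out on any other target, so the decoder above can be
/// exercised on a host machine.
#[cfg(all(target_arch = "aarch64", target_os = "none"))]
pub fn current_el() -> ExceptionLevel {
    let raw: u64;
    // SAFETY: reading CurrentEL has no side effects and is permitted at EL1 and above.
    unsafe {
        core::arch::asm!(
            "mrs {0}, CurrentEL",
            out(reg) raw,
            options(nomem, nostack, preserves_flags)
        );
    }
    decode_current_el(raw)
}
```

A startup routine would branch on the decoded level to pick the EL3, EL2, or EL1 initialization path.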

To achieve this, we implement a "drop-down" strategy in assembly:

EL-Specific Initialization: Before leaving the current level (EL3 or EL2), we perform the setup that level requires. This includes initializing the stack pointer, setting the Vector Base Address Register (VBAR_ELx) for early traps, and enabling access to the FPU and the system timers.
Context Transition: We configure the execution state for the next lower level (ensuring the processor will run in the 64-bit AArch64 state rather than the 32-bit AArch32 state) and execute the exception return (ERET) instruction to transition safely into EL1.
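
To make the context transition more concrete, here is a minimal sketch of the register values such a drop to EL1 typically involves. The bit positions come from the ARMv8-A architecture; the function names, the decision to mask all interrupts on entry, and the choice of the non-secure state are assumptions for illustration, not a description of the crate's assembly.

```rust
/// SPSR_ELx mode field M[3:0] for "EL1 using SP_EL1" (EL1h).
/// Bit 4 (M[4]) stays 0, which selects the AArch64 execution state.
const SPSR_MODE_EL1H: u64 = 0b0101;
/// D, A, I and F mask bits occupy SPSR bits [9:6].
const SPSR_DAIF_MASKED: u64 = 0b1111 << 6;
/// HCR_EL2.RW (bit 31): EL1 executes in AArch64.
const HCR_EL2_RW: u64 = 1 << 31;
/// SCR_EL3.RW (bit 10): the next lower level executes in AArch64.
const SCR_EL3_RW: u64 = 1 << 10;
/// SCR_EL3.NS (bit 0): lower levels run in the non-secure state.
const SCR_EL3_NS: u64 = 1 << 0;

/// Value for SPSR_EL3 or SPSR_EL2 so that ERET lands in EL1h with all
/// interrupts masked (assumption: interrupts are unmasked later).
pub const fn spsr_for_el1h() -> u64 {
    SPSR_DAIF_MASKED | SPSR_MODE_EL1H
}

/// Value for HCR_EL2 that makes EL1 an AArch64 level.
pub const fn hcr_el2_for_aarch64_el1() -> u64 {
    HCR_EL2_RW
}

/// Value for SCR_EL3 that makes the lower level AArch64 and non-secure
/// (assumption: the application runs in the non-secure state).
pub const fn scr_el3_for_aarch64_lower() -> u64 {
    SCR_EL3_RW | SCR_EL3_NS
}
```

In assembly, these values would be written with MSR, the target address placed in ELR_ELx, and ERET executed to complete the transition.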

One of our primary goals was to implement as much of the crate as possible in Rust to leverage its safety guarantees. We investigated utilizing existing ecosystem solutions (such as the now-deprecated r0 crate) to handle memory initialization from within Rust functions.

However, we determined that this approach carries significant risk. The Rust compiler makes specific assumptions about the state of memory (e.g., that static variables are properly initialized). Executing Rust code before the .bss section is zeroed and the .data section is relocated can violate these assumptions, leading to Undefined Behavior (UB).

To ensure strict compliance and stability, we implemented the memory initialization entirely in assembly. This ensures that by the time the program counter jumps to the Rust entry point, the memory environment perfectly matches the compiler's expectations.

Once the memory is consistent and the processor is in EL1, we branch to the Rust implementation. Here, we finalize the environment by setting up the EL1 Exception Vector Table to enable handling of exceptions and configuring the MMU. Once everything has been set up, we call the user-defined entry point.

On the Cortex-A53, the MMU is not optional if performance is a requirement. With the MMU disabled, the processor treats all data accesses as Device memory (Device-nGnRnE, the ARMv8-A counterpart of the ARMv7 strongly ordered type), which is non-cacheable, resulting in severe performance degradation.

Since we are not running a multi-process operating system requiring virtual memory isolation, we configure the MMU with a flat (identity) mapping. This creates a 1:1 translation between virtual and physical addresses. The primary goal of this configuration is not address translation, but memory attribute management. By enabling the MMU, we can mark code and data regions as cacheable while keeping memory-mapped I/O (MMIO) regions strictly non-cacheable. This unlocks the full speed of the processor caches and also provides memory protection: execute-never (XN) attributes prevent code execution from data regions, and access permissions prevent writes to read-only sections.
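
The flat mapping can be sketched with level 1 block descriptors, each covering 1 GiB with a 4 KiB translation granule. This is an illustrative model, not the crate's translation tables: the attribute indices, the helper names, and the split of "first two GiB are normal memory, the rest is device memory" are assumptions chosen to keep the example small.

```rust
/// MAIR_EL1 layout assumed here: index 0 = Device-nGnRnE, index 1 = Normal write-back.
pub const MAIR_DEVICE_NGNRNE: u64 = 0x00;
pub const MAIR_NORMAL_WB: u64 = 0xFF;
pub const MAIR_EL1_VALUE: u64 = MAIR_DEVICE_NGNRNE | (MAIR_NORMAL_WB << 8);

const ATTR_INDEX_DEVICE: u64 = 0;
const ATTR_INDEX_NORMAL: u64 = 1;

const BLOCK: u64 = 0b01; // bits [1:0]: valid block descriptor
const SH_INNER: u64 = 0b11 << 8; // inner shareable
const ACCESS_FLAG: u64 = 1 << 10; // avoid access flag faults
const PXN: u64 = 1 << 53; // privileged execute-never
const UXN: u64 = 1 << 54; // unprivileged execute-never
const GIB: u64 = 1 << 30;
const BLOCK_ADDR_MASK: u64 = 0x0000_FFFF_C000_0000; // output address bits [47:30]

#[derive(Clone, Copy)]
pub enum MemoryKind {
    Normal,
    Device,
}

/// Build a level 1 block descriptor that maps `phys_addr` to itself.
pub const fn block_descriptor(phys_addr: u64, kind: MemoryKind) -> u64 {
    let base = (phys_addr & BLOCK_ADDR_MASK) | BLOCK | ACCESS_FLAG;
    match kind {
        // Cacheable memory for code and data.
        MemoryKind::Normal => base | (ATTR_INDEX_NORMAL << 2) | SH_INNER,
        // Non-cacheable MMIO that must never be executed.
        MemoryKind::Device => base | (ATTR_INDEX_DEVICE << 2) | PXN | UXN,
    }
}

/// Identity-map the first 4 GiB: the first `normal_gib` blocks as normal
/// memory, the remaining blocks as device memory (assumed split).
pub fn identity_table(normal_gib: usize) -> [u64; 4] {
    let mut table = [0u64; 4];
    for (i, entry) in table.iter_mut().enumerate() {
        let kind = if i < normal_gib {
            MemoryKind::Normal
        } else {
            MemoryKind::Device
        };
        *entry = block_descriptor(i as u64 * GIB, kind);
    }
    table
}
```

A real table needs 512 entries and 4 KiB alignment, and separating read-only code from writable, non-executable data requires finer-grained mappings than these 1 GiB blocks.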

UART Driver

The PS provides two UART controllers (UART0 and UART1). Like most peripherals on this architecture, they are controlled via MMIO.
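
For readers new to MMIO, the sketch below shows the basic mechanism: a register is an address that must be accessed with volatile reads and writes so the compiler neither reorders nor removes them. The Reg32 type is hypothetical, and the base addresses are stated as an assumption taken from the Zynq UltraScale+ memory map; check them against the technical reference manual before relying on them.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Assumed base addresses of the two PS UART controllers.
pub const UART0_BASE: usize = 0xFF00_0000;
pub const UART1_BASE: usize = 0xFF01_0000;

/// Hypothetical handle to a single 32-bit memory-mapped register.
pub struct Reg32 {
    addr: *mut u32,
}

impl Reg32 {
    /// # Safety
    /// `addr` must be valid for volatile reads and writes, be 4-byte aligned,
    /// and must not be accessed through any other path while this handle exists.
    pub const unsafe fn new(addr: *mut u32) -> Self {
        Self { addr }
    }

    /// Read the register exactly once.
    pub fn read(&self) -> u32 {
        // SAFETY: guaranteed by the contract of `new`.
        unsafe { read_volatile(self.addr) }
    }

    /// Write the register exactly once.
    pub fn write(&mut self, value: u32) {
        // SAFETY: guaranteed by the contract of `new`.
        unsafe { write_volatile(self.addr, value) }
    }

    /// Read-modify-write helper for updating individual bits.
    pub fn modify(&mut self, f: impl FnOnce(u32) -> u32) {
        let value = self.read();
        self.write(f(value));
    }
}
```

On the target, a handle would be created from an address such as UART0_BASE plus a register offset; on a host, the same type can be pointed at an ordinary variable for testing.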

Writing a driver in bare-metal Rust involves balancing direct hardware access with type safety. Instead of relying on manual pointer arithmetic or magic numbers, we leveraged two key crates to build a robust abstraction:

bitbybit: We used this crate to map the hardware registers to Rust structs. This allows us to interact with the UART control and status registers as typed fields, preventing common errors associated with bit shifting and masking.
embedded-io: By implementing the Read and Write traits from this crate, our UART driver becomes compatible with the wider embedded Rust ecosystem. It provides an interface equivalent to std::io, enabling standard formatting macros.
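
The following dependency-free sketch shows the idea behind both crates without using either of them. A hand-written newtype stands in for the accessors that bitbybit would generate, and core::fmt::Write stands in for the embedded-io traits. The trait UartRegs, the mock, and the status bit positions (TX FIFO empty at bit 3, TX FIFO full at bit 4) are assumptions for illustration.

```rust
use core::fmt::{self, Write};

/// Typed view of a UART status register (bit positions are assumptions).
#[derive(Clone, Copy)]
pub struct ChannelStatus(pub u32);

impl ChannelStatus {
    pub const TX_EMPTY: u32 = 1 << 3;
    pub const TX_FULL: u32 = 1 << 4;

    pub fn tx_empty(self) -> bool {
        (self.0 & Self::TX_EMPTY) != 0
    }

    pub fn tx_full(self) -> bool {
        (self.0 & Self::TX_FULL) != 0
    }
}

/// Hypothetical abstraction over the register block so the driver logic
/// can run against real hardware or a mock.
pub trait UartRegs {
    fn status(&self) -> ChannelStatus;
    fn write_tx_fifo(&mut self, byte: u8);
}

pub struct Uart<R: UartRegs> {
    pub regs: R,
}

impl<R: UartRegs> Write for Uart<R> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for byte in s.bytes() {
            // Wait until the transmit FIFO has room for another byte.
            while self.regs.status().tx_full() {
                core::hint::spin_loop();
            }
            self.regs.write_tx_fifo(byte);
        }
        Ok(())
    }
}

/// Mock register block that records transmitted bytes and never reports a full FIFO.
#[derive(Default)]
pub struct MockRegs {
    pub sent: Vec<u8>,
}

impl UartRegs for MockRegs {
    fn status(&self) -> ChannelStatus {
        ChannelStatus(ChannelStatus::TX_EMPTY)
    }

    fn write_tx_fifo(&mut self, byte: u8) {
        self.sent.push(byte);
    }
}
```

Because the formatting trait only needs write_str, the write! and writeln! macros work on any Uart value, whether it is backed by hardware registers or by the mock.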

To simplify the usage, we implemented a lazy initialization pattern. The driver configuration is performed automatically the first time the device is requested.
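
A lazy initialization pattern of this kind can be sketched on a host with std::sync::OnceLock. This is only a model of the idea: the crate targets bare metal and its accessors are unsafe functions returning a driver handle, whereas the UartConfig type, the baud rate, and the counter below are invented for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (simulated) hardware configuration ran.
pub static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a configured UART driver.
pub struct UartConfig {
    pub baud_rate: u32,
}

static UART0: OnceLock<UartConfig> = OnceLock::new();

/// Simulated one-time configuration; a real driver would program the
/// baud rate generator and enable the transmitter and receiver here.
fn configure_uart0() -> UartConfig {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    UartConfig { baud_rate: 115_200 } // assumed baud rate
}

/// Return the shared UART0 instance, configuring it on first access only.
pub fn uart0() -> &'static UartConfig {
    UART0.get_or_init(configure_uart0)
}
```

Every call after the first returns the already configured instance, so callers never need an explicit init step.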

We expose the hardware through two accessors, uart::uart0() and uart::uart1(). Because the driver implements the Write trait, printing formatted text to the console is straightforward:

use adacore_zynqmp::uart::Write;

// Obtain a handle to the first UART controller
// Initialization happens automatically on first access
let mut uart = unsafe { adacore_zynqmp::uart::uart0() };

// Use standard formatting macros
writeln!(uart, "Hello world, the answer is {}!", 42).unwrap();

Exception Handling & Interrupts

A robust embedded system must handle the unexpected. Whether it’s a standard hardware interrupt (IRQ) from a timer, a critical fast interrupt (FIQ), or a synchronous exception caused by a bad memory access, the CPU needs a predefined path to handle the event.

In the AArch64 architecture, this is managed via the Exception Vector Table. The base address of this table is stored in VBAR_EL1. When an exception occurs, the processor calculates an offset from this base address depending on the exception type and the current execution state, then jumps to that location.
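
The offset calculation follows a fixed layout defined by the architecture: the table holds 16 entries of 0x80 bytes each, arranged as four groups (by where the exception came from) of four exception types. The sketch below computes the offsets; the enum and function names are hypothetical.

```rust
/// Where the exception was taken from.
#[derive(Clone, Copy)]
pub enum Source {
    CurrentElSp0 = 0,
    CurrentElSpx = 1,
    LowerElAarch64 = 2,
    LowerElAarch32 = 3,
}

/// The four exception types within each group.
#[derive(Clone, Copy)]
pub enum Kind {
    Synchronous = 0,
    Irq = 1,
    Fiq = 2,
    SError = 3,
}

/// Each vector entry has room for 32 instructions (0x80 bytes).
pub const ENTRY_SIZE: u64 = 0x80;
/// The table base in VBAR_ELx must be aligned to 2 KiB.
pub const TABLE_ALIGN: u64 = 0x800;

/// Offset of a vector entry from the table base.
pub const fn vector_offset(source: Source, kind: Kind) -> u64 {
    (source as u64) * 4 * ENTRY_SIZE + (kind as u64) * ENTRY_SIZE
}

/// Address the processor jumps to for a given table base.
pub const fn vector_address(vbar: u64, source: Source, kind: Kind) -> u64 {
    vbar + vector_offset(source, kind)
}
```

For code running at EL1 on its own stack pointer, an IRQ therefore lands at offset 0x280 from VBAR_EL1.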

Writing a vector table in raw assembly can be tedious and error-prone, involving careful context saving (stacking registers x0 through x30) to ensure the interrupted code can resume safely. Our crate abstracts this complexity entirely. We provide a pre-configured vector table that handles the low-level context switching automatically.

While the crate handles the mechanism of the exception, the policy of what to do when an exception occurs is left to the application developer. We designed the crate using a "weak linkage" pattern: the crate ships default handlers as weak symbols, and the application overrides them by defining functions with the expected names and the no_mangle attribute. At link time, the user-defined functions take precedence, so the vector table branches to them when the corresponding exception fires.

// Example: Defining custom exception handlers in the application

#[unsafe(no_mangle)]
extern "C" fn _sync_handler() {
    // Handle synchronous exceptions (e.g., log the faulting address)
}

#[unsafe(no_mangle)]
extern "C" fn _irq_handler() {
    // Handle standard interrupts (e.g., check interrupt controller)
}

#[unsafe(no_mangle)]
extern "C" fn _fiq_handler() {
    // Handle fast interrupts
}

#[unsafe(no_mangle)]
extern "C" fn _serror_handler() {
    // Handle system errors
}

This approach allows the application logic to remain clean and focused. The crate handles the "dirty work" of saving the register state and routing the signal, delivering the developer directly into a clean Rust function context to handle the business logic of the interrupt.

`std` Support via Newlib

Writing strictly no_std Rust is powerful, but it often means giving up convenient features such as dynamic collections (Vec, String) or standard formatted printing. To bridge this gap without a full operating system, we integrated Newlib, a lightweight C library optimized for embedded systems.

Newlib acts as an intermediary. When high-level Rust code attempts to allocate memory or print to the console, the standard library calls down to low-level system hooks. Our crate implements these hooks to translate generic requests into specific hardware actions.

We focused on implementing three critical interfaces to enable a "std-like" experience:

Standard Output (stdout): We mapped the Newlib write symbol directly to our UART driver. This was a significant quality-of-life upgrade. Instead of manually passing a UART handle to every function that needs to log data, developers can now use the standard Rust macros:

// No need to pass a driver instance; standard macros just work
println!("System status: OK");
eprintln!("Error: Initialization failed");

Dynamic Memory Allocation: To support the alloc crate, we implemented sbrk to manage the heap region defined in our linker script. The function simply advances the “program break” pointer whenever Newlib’s memory allocator requests additional space. This is sufficient to unlock Rust’s powerful heap-based types. With this in place, we can use Vec, Box, and String directly on bare metal. This enables flexible data processing pipelines that would otherwise require complex, static buffer management.
Program Exit: On a desktop operating system, exit() returns control to the OS. On bare metal, there is nowhere to return to. We implemented the _exit hook to trigger a soft reset by writing to the reset control registers in the CRL_APB (Clock and Reset Control) module. Calling exit (or panicking) therefore reboots the processor, resetting the system state for the next run.
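
The program break logic behind sbrk is small enough to model on a host. The sketch below is not the crate's hook: the Heap type and its addresses are hypothetical, and the real C interface returns the previous break as a pointer (or (void *)-1 on failure) instead of an Option.

```rust
/// Hypothetical model of the heap region defined by a linker script.
pub struct Heap {
    start: usize,
    end: usize,
    brk: usize,
}

impl Heap {
    /// Create a heap covering `start..end` with the break at `start`.
    pub const fn new(start: usize, end: usize) -> Self {
        Self { start, end, brk: start }
    }

    /// Move the program break by `incr` bytes.
    /// Returns the previous break on success, or `None` if the request
    /// would leave the heap region (where the C hook would report ENOMEM).
    pub fn sbrk(&mut self, incr: isize) -> Option<usize> {
        let old = self.brk;
        let new = old.checked_add_signed(incr)?;
        if new < self.start || new > self.end {
            return None;
        }
        self.brk = new;
        Some(old)
    }
}
```

Newlib's allocator calls the hook with a positive increment when it needs more memory and treats the returned address as the start of the newly available block.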

By combining these elements, our "bare metal" code looks surprisingly high-level:

fn main() {
    // We can use dynamic vectors
    let mut data = Vec::new();
    for i in 0..10 {
        data.push(i * 2);
    }

    // We can print standard output
    println!("Computed Data: {:?}", data);

    // When we return, the system soft-resets
}

Validating the Platform: Automated Testing via QEMU

Building a support crate is high-stakes. If the startup code fails, the application never runs. To ensure stability without requiring physical hardware for every commit, we opted for a software-only testing pipeline using QEMU emulation.

We initially looked to the existing embedded Rust ecosystem for testing solutions, but we encountered architectural mismatches. embedded-test is primarily designed for hardware-in-the-loop testing using debug probes, which didn't fit our purely emulated CI requirements. defmt-test is strictly coupled to the Cortex-M architecture and thus incompatible with our AArch64 Cortex-A53 target.

To bridge this gap, we implemented a custom test runner utilizing the cargo-xtask pattern. This allows us to orchestrate the testing lifecycle directly via Cargo.

The runner compiles the test binaries and launches a QEMU instance targeting the Zynq UltraScale+ machine (xlnx-zcu102). Crucially, we needed a way for the emulated device to report success or failure back to the host CI environment. We achieved this using qemu-exit, which leverages the semihosting interface to trigger a simulator exit with a specific status code.
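
A runner step of this kind might look like the sketch below. It is an assumption-laden outline rather than the crate's xtask code: the function names are invented, and the use of the machine's secure=on property to start in EL3 should be verified against the QEMU documentation for xlnx-zcu102.

```rust
use std::process::Command;

/// Build the QEMU command line for one test binary.
/// `start_in_el3` selects the machine's `secure=on` property
/// (assumption: this makes the emulated CPU start in EL3).
pub fn qemu_args(kernel: &str, start_in_el3: bool) -> Vec<String> {
    let machine = if start_in_el3 {
        "xlnx-zcu102,secure=on"
    } else {
        "xlnx-zcu102"
    };
    [
        "-machine",
        machine,
        "-nographic",
        // Semihosting lets the guest end the emulation with an exit code.
        "-semihosting",
        "-kernel",
        kernel,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Launch QEMU and report whether the guest signalled success.
pub fn run_test(kernel: &str, start_in_el3: bool) -> std::io::Result<bool> {
    let status = Command::new("qemu-system-aarch64")
        .args(qemu_args(kernel, start_in_el3))
        .status()?;
    Ok(status.success())
}
```

The host side only inspects the process exit status, so a failing assertion in the guest turns directly into a failing CI job.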

This setup allows us to run end-to-end tests on every merge request, covering many parts of the crate's functionality:

Boot Sequences: We verify that the startup logic works correctly, whether the image is entered at EL3 (simulating a cold boot) or at EL1 (simulating a handoff from a bootloader).
Peripheral I/O: We validate the UART driver by asserting expected output patterns.
Exception Handling: We trigger synthetic interrupts and exceptions to ensure the vector table correctly routes control to our custom Rust handlers and back again.
Runtime Services: We confirm that Newlib integration is functional by executing heap allocations (e.g., Vec::push) and string formatting operations.

Related Work

Related efforts in the Rust ecosystem include the aarch64-rt crate, developed in parallel and not yet available when we began this work. It provides startup code and exception handling for AArch64 Cortex-A processors, but does not cover SoC-specific functionality such as UART integration or Newlib support.

Another relevant project is the duke-artiq/zynqmp-rs repository, which targets the ZCU111 Evaluation Board. At the time of writing, the repository has not seen updates for over a year. Its implementation relies on the now-deprecated r0 crate and includes only minimal (dummy) exception handling.

Conclusion

The crate enables using Rust on the Zynq UltraScale+ MPSoC. By abstracting the intricacies of the AArch64 boot process, we have created a runtime environment that transforms a complex heterogeneous SoC into an approachable target for safe systems programming. Users of the crate can now bypass the high barrier to entry typically associated with bare-metal development, gaining immediate access to standard library features such as dynamic allocation and formatted output without the overhead of a full operating system.

The current iteration provides a production-ready environment for single-core applications running on the Cortex-A53. Moving forward, we aim to extend its functionality. In particular, we plan to implement libtest support to provide a native test harness, enabling developers to execute unit and integration tests using the standard cargo test workflow. Ultimately, this infrastructure frees developers to shift their focus from low-level silicon configuration to high-value application logic.

Author

Tobias Reiher
