What is IOMMU and how can it be used?

Introduction

The I/O memory management unit (IOMMU) is a type of memory management unit (MMU) that connects a Direct Memory Access (DMA) capable expansion bus to main memory. It extends the system architecture by adding support for virtualization of the memory addresses used by peripheral devices. Additionally, it provides memory isolation and protection by enabling system software to control which areas of physical memory an I/O device may access. It also helps filter and remap interrupts from peripheral devices. Let’s have a look at the advantages and disadvantages of the IOMMU and how it is implemented from the hardware and software perspective.

Advantages of IOMMU usage:

  • A single contiguous virtual memory region can be mapped to multiple non-contiguous physical memory regions, so the IOMMU can make scattered physical memory appear contiguous to a device (scatter/gather). Scatter/gather optimizes streaming DMA performance for the I/O device.
  • Memory isolation and protection: a device can only access memory regions that are mapped for it, so faulty and/or malicious devices cannot corrupt memory.
  • Memory isolation allows safe device assignment to a virtual machine without compromising the host and other guest OSes.
  • The IOMMU enables devices capable only of 32-bit DMA to access memory above 4 GB.
  • Support for hardware interrupt remapping, whose primary uses are interrupt isolation and translation between interrupt domains (e.g. IOAPIC vs. x2APIC on x86).

Disadvantages of IOMMU usage:

  • Latency of dynamic DMA mapping and the overhead of address translation.
  • Host software has to maintain in-memory data structures for use by the IOMMU.

In hardware

The IOMMU architecture is designed to accommodate a variety of system topologies. There can be multiple IOMMUs located at various places in the system fabric: at bridges within a single bus or at bridges between buses of the same or different types. In modern systems, the IOMMU is commonly integrated with the PCIe root complex, connecting to the PCIe bus downstream and to the system bus (e.g. AMD’s Infinity Fabric) upstream. One example topology is shown below.

Example Platform Architecture

In software

The IOMMU is configured and controlled via two sets of registers: one in the PCI configuration space and another mapped into the system address space. Since the IOMMU appears to the OS as a PCI function, it has a capability block in the PCI configuration space. Additionally, up to eight data structures are placed in main system memory. These data structures describe the mapping of interrupts and memory for peripheral devices, as well as the devices’ memory access permissions.
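Since the IOMMU is exposed as a PCI function, its capability block can be inspected with standard tools. A minimal sketch, assuming an AMD platform where the IOMMU function sits at 00:00.2 (as in the AMD-Vi log further down; the address may differ on other platforms):

# Dump the IOMMU PCI function, including its capability block
sudo lspci -v -s 00:00.2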

Enabling

IOMMU is a generic name for technologies such as VT-d by Intel, AMD-Vi by AMD, TCE by IBM, and SMMU by ARM. Make sure that your platform supports one of these before you try to enable the IOMMU.

UEFI/BIOS

First of all, the IOMMU has to be initialized by UEFI/BIOS, and information about it has to be passed to the kernel in ACPI tables. For the end user, that means you have to enter the UEFI/BIOS settings and set the IOMMU option to enabled. For readers interested in developing firmware capable of IOMMU initialization, the next post on this topic will describe the process of enabling the IOMMU for PC Engines apu2 in coreboot.
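One way to confirm that firmware has done its part is to check whether the IOMMU-related ACPI tables are present: DMAR on Intel platforms and IVRS on AMD. A quick check (assuming a Linux system with sysfs mounted as usual):

# List the ACPI tables exposed by firmware and look for DMAR (Intel) or IVRS (AMD)
ls /sys/firmware/acpi/tables/ | grep -i -e DMAR -e IVRS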

Linux kernel

Enable IOMMU support by setting the correct kernel parameter, depending on the type of CPU in use (an example of applying the parameters is shown after the list):

  • For Intel CPUs (VT-d) set intel_iommu=on
  • For AMD CPUs (AMD-Vi) set amd_iommu=on
  • Additionally, if you are interested in PCIe passthrough, set iommu=pt
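These options are passed on the kernel command line. As a minimal sketch, assuming a distribution that boots via GRUB and uses the typical /etc/default/grub layout (the file location and the regeneration command differ between distributions), enabling VT-d together with passthrough mode could look like this:

# In /etc/default/grub, append the IOMMU parameters to the kernel command line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# Then regenerate the GRUB configuration and reboot:
sudo grub-mkconfig -o /boot/grub/grub.cfg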

After rebooting, you can check the dmesg output to confirm that the IOMMU is enabled.

On Intel platforms:

dmesg | grep -i -e DMAR -e IOMMU

The output should look something like this:

[    0.000000] ACPI: DMAR 0x00000000BDCB1CB0 0000B8 (v01 INTEL  BDW      00000001 INTL 00000001)
[    0.000000] Intel-IOMMU: enabled
[    0.028543] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[    0.028831] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[    0.028924] IOAPIC id 8 under DRHD base  0xfed91000 IOMMU 1
[    0.536201] DMAR: No ATSR found
[    0.536211] IOMMU 0 0xfed90000: using Queued invalidation
[    0.536228] IOMMU 1 0xfed91000: using Queued invalidation
[    0.536236] IOMMU: Setting RMRR:
[    0.536246] IOMMU: Setting identity map for device 0000:00:02.0 [0xbf000000 - 0xcf1fffff]
[    0.537511] IOMMU: Setting identity map for device 0000:00:14.0 [0xbdea8000 - 0xbdeb6fff]
[    0.537521] IOMMU: Setting identity map for device 0000:00:1a.0 [0xbdea8000 - 0xbdeb6fff]
[    0.537535] IOMMU: Setting identity map for device 0000:00:1d.0 [0xbdea8000 - 0xbdeb6fff]
[    0.537547] IOMMU: Prepare 0-16MiB unity mapping for LPC
[    0.537559] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    2.039251] [drm] DMAR active, disabling use of stolen memory

On AMD platforms:

dmesg | grep -i -e AMD-Vi

The output should look something like this:

[    0.805831] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.805832] pci 0000:00:00.2: AMD-Vi: Extended features (0x4f77ef22294ada):
[    0.805834] AMD-Vi: Interrupt remapping enabled
[    0.805834] AMD-Vi: Virtual APIC enabled
[    0.806045] AMD-Vi: Lazy IO/TLB flushing enabled
[    4.535573] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>

PCIe Passthrough

One of the most interesting use cases of the IOMMU is PCIe passthrough. With the help of the IOMMU, it is possible to remap all DMA accesses and interrupts of a device into the address space of a guest virtual machine OS; by doing so, the host gives up complete control of the device to the guest OS. This means that the host does not have to virtualize this device nor translate communication between it and the guest, which almost completely removes the performance overhead and latency caused by such translations. For example, by passing through a GPU it is possible to play games on Windows as a guest OS with performance barely distinguishable from bare-metal Windows. Another frequent use case is passing through a network interface card to use its full performance on the guest OS. The obvious drawback is that a passed-through device cannot be used by the host or by other guests. One solution to this problem is a technology called SR-IOV, which allows different virtual machines to share a single PCI Express hardware interface; however, very few devices support SR-IOV, and it is almost exclusively available in server-grade hardware. It is also worth noting that the IOMMU is not always capable of isolating a single device. Depending on the PCIe tree configuration, some devices are inseparable; in that case, the kernel will place them in a single IOMMU group. These groups will be the topic of a future post.
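To see how the kernel has grouped devices, you can walk /sys/kernel/iommu_groups. A small sketch that lists every group together with the PCI devices it contains (assuming the IOMMU is enabled; otherwise the directory will be empty):

# List every IOMMU group and the PCI devices belonging to it
for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=$(basename "$(dirname "$(dirname "$dev")")")
    echo "IOMMU group $group: $(lspci -nns "$(basename "$dev")")"
done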

Further reading

https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf

https://www.kernel.org/doc/html/latest/driver-api/vfio.html

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF

Summary

The IOMMU is a useful piece of hardware with many advantages. Among other things, it protects a system from DMA attacks and allows better isolation of virtual machines. Its main disadvantage is the latency of DMA transfers, caused by the need to check mappings and permissions stored in main memory when a transfer occurs, but this latency can be largely alleviated by caching that information inside the IOMMU. In summary, using the IOMMU seems to be beneficial in almost every case.

If you think we can help improve the security of your firmware, or you are looking for someone who can boost your product by leveraging advanced features of your hardware platform, feel free to book a call with us or drop us an email at contact<at>3mdeb<dot>com. If you are interested in similar content, feel free to sign up for our newsletter.


Marek Kasiewicz
Junior Firmware Developer. Always trying to understand how things work at a deeper level, keen on learning computer architecture.