| 1 | +######################## |
| 2 | +BL1 Immutable bootloader |
| 3 | +######################## |
| 4 | + |
| 5 | +:Author: Raef Coles |
| 6 | +:Organization: Arm Limited |
| 7 | + |
| 8 | + |
| 9 | +************ |
| 10 | +Introduction |
| 11 | +************ |
| 12 | + |
| 13 | +Some devices that use TF-M will require initial boot code that is stored in ROM. |
| 14 | +There are a variety of reasons that this might happen: |
| 15 | + |
| 16 | +- The device cannot access flash memory without a driver, so needs some setup |
| 17 | + to be done before main images on flash can be booted. |
| 18 | +- The device has no on-chip secure flash, and therefore cannot otherwise |
| 19 | + maintain a tamper-resistant root of trust. |
| 20 | +- The device has a security model that requires an immutable root of trust.
| 21 | + |
| 22 | +Henceforth any bootloader stored in ROM will be referred to as BL1, as it would |
| 23 | +necessarily be the first stage in the boot chain. |
| 24 | + |
| 25 | +TF-M provides a reference second-stage flash bootloader BL2, in order to allow |
| 26 | +easier integration. This bootloader implements all secure boot functionality |
| 27 | +needed to provide a secure chain of trust. |
| 28 | + |
| 29 | +A reference ROM bootloader BL1 has now been added with the same motivation -
| 30 | +allowing easier integration of TF-M for platforms that do not have their own |
| 31 | +BL1 and require one. |
| 32 | + |
| 33 | +**************************** |
| 34 | +BL1 Features and Motivations |
| 35 | +**************************** |
| 36 | + |
| 37 | +The reference ROM bootloader provides the following features: |
| 38 | + |
| 39 | +- A split between code being stored in ROM and in other non-volatile memory. |
| 40 | + |
| 41 | + - This can allow significant cost reduction in fixing bugs compared to |
| 42 | + ROM-only bootloaders. |
| 43 | + |
| 44 | +- A secure boot mechanism that allows upgrading the next boot stage (which |
| 45 | + would usually be BL2). |
| 46 | + |
| 47 | + - This allows for the fixing of any bugs in the BL2 image. |
| 48 | + - Alternatively, this could allow the removal of BL2 in some devices that are
| 49 | + constrained in flash space but have ROM. |
| 50 | + |
| 51 | +- A post-quantum asymmetric signature scheme for verifying the next
| 52 | + boot stage image. |
| 53 | + |
| 54 | + - This can allow devices to be securely updated even if attacks |
| 55 | + involving quantum computers become viable. This could extend the lifespans |
| 56 | + of devices that might be deployed in the field for many years. |
| 57 | + |
| 58 | +- A mechanism for passing boot measurements to the TF-M runtime so that they |
| 59 | + can be attested. |
| 60 | +- Tooling to create and sign images. |
| 61 | +- Fault Injection (FI) and Differential Power Analysis (DPA) mitigations. |
| 62 | + |
| 63 | +********************************* |
| 64 | +BL1_1 and BL1_2 split bootloaders |
| 65 | +********************************* |
| 66 | + |
| 67 | +BL1 is split into two distinct boot stages: BL1_1, which is stored in ROM, and
| 68 | +BL1_2, which is stored in other non-volatile storage. This storage would usually
| 69 | +be either trusted or untrusted flash, but on platforms without flash memory can be
| 70 | +OTP. As BL1_2 is verified against a hash stored in OTP, it is immutable after |
| 71 | +provisioning even if stored in mutable storage. |
| 72 | + |
| 73 | +Bugs in ROM bootloaders usually cannot be fixed once a device is provisioned or
| 74 | +in the field. As ROM code is immutable, the only option is to fix the bug in
| 75 | +newly manufactured devices.
| 76 | + |
| 77 | +However, it can be very expensive to change the ROM code of devices once |
| 78 | +manufacturing has begun, as it requires changes to the photolithography masks |
| 79 | +that are used to create the device. This cost varies depending on the complexity |
| 80 | +of the device and of the process node that it is being fabricated on, but can be |
| 81 | +large, both in engineering time and material/process costs. |
| 82 | + |
| 83 | +By placing the majority of the immutable bootloader in other storage, we can |
| 84 | +mitigate the costs associated with changing ROM code, as a new BL1_2 image can |
| 85 | +be used at provisioning time with minimal changeover cost. BL1_1 contains a |
| 86 | +minimal codebase responsible mainly for the verification of the BL1_2 image. |
| 87 | + |
| 88 | +The bootflow is as follows. For simplicity this assumes that the boot stage |
| 89 | +after BL1 is BL2, though this is not necessarily the case: |
| 90 | + |
| 91 | +1) BL1_1 begins executing in place from ROM |
| 92 | +2) BL1_1 copies BL1_2 into RAM |
| 93 | +3) BL1_1 verifies BL1_2 against the hash stored in OTP |
| 94 | +4) BL1_1 jumps to BL1_2, if the hash verification has succeeded |
| 95 | +5) BL1_2 copies the primary BL2 image from flash into RAM |
| 96 | +6) BL1_2 verifies the BL2 image using asymmetric cryptography |
| 97 | +7) If verification fails, BL1_2 repeats 5 and 6 with the secondary BL2 image |
| 98 | +8) BL1_2 jumps to BL2, if either image has successfully verified |
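
The ordering of the steps above can be sketched as a small C function. This is only an illustration of the control flow, not the TF-M implementation: every name in it (``boot_target``, ``verify_results``, ``bl1_boot_flow``) is hypothetical, the verification outcomes are passed in as precomputed flags rather than calculated, and the halt result when nothing verifies is an assumption, since the text does not specify the failure behaviour.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of these are TF-M identifiers. */
enum boot_target {
    BOOT_HALT,          /* nothing verified: do not jump anywhere (assumed) */
    BOOT_BL2_PRIMARY,   /* jump to the BL2 image from the primary slot */
    BOOT_BL2_SECONDARY  /* jump to the BL2 image from the secondary slot */
};

/* Verification outcomes, precomputed so the control flow can be read alone. */
struct verify_results {
    bool bl1_2_hash_matches_otp; /* outcome of step 3 */
    bool bl2_signature_ok[2];    /* outcome of step 6: [0] primary, [1] secondary */
};

enum boot_target bl1_boot_flow(const struct verify_results *r)
{
    /* Steps 1-3: BL1_1 runs from ROM, copies BL1_2 into RAM and checks it
     * against the hash stored in OTP. */
    if (!r->bl1_2_hash_matches_otp) {
        return BOOT_HALT;
    }

    /* Step 4: control passes to BL1_2.
     * Steps 5-7: try the primary slot first, then the secondary slot. */
    for (int slot = 0; slot < 2; slot++) {
        if (r->bl2_signature_ok[slot]) {
            /* Step 8: jump to whichever image verified. */
            return slot == 0 ? BOOT_BL2_PRIMARY : BOOT_BL2_SECONDARY;
        }
    }

    return BOOT_HALT;
}
```

Note that a failed BL1_2 hash check stops the flow before any BL2 slot is examined, so a valid BL2 image cannot compensate for a modified BL1_2.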
| 99 | + |
| 100 | +.. Note:: |
| 101 | + The BL1_2 image is not encrypted, so if it is placed in untrusted flash it |
| 102 | + will be possible to read the data in the image. |
| 103 | + |
| 104 | +Some optimizations have been made specifically for the case where BL1_2 has been |
| 105 | +stored in OTP: |
| 106 | + |
| 107 | +OTP can be very expensive in terms of chip area, though new technologies like |
| 108 | +antifuse OTP decrease this cost. Because of this, the code size of BL1_2 has |
| 109 | +been minimized. Code-sharing has been configured so that BL1_2 can call
| 110 | +functions stored in ROM. Care should be taken that OTP is sized such that it can
| 111 | +hold replacement versions of the functions used via code-sharing, in case the
| 112 | +ROM functions contain bugs. Less space is needed than if all code were
| 113 | +duplicated, as it is assumed that most functions will not contain bugs.
| 114 | + |
| 115 | +As OTP memory frequently has low performance, BL1_2 is copied into RAM before it
| 116 | +is executed. BL1_2 also copies the next stage image into RAM before
| 117 | +authenticating it, which allows the next stage to be stored in untrusted flash. |
| 118 | +This requires that the device have sufficient RAM to contain both the BL1_2 |
| 119 | +image and the next stage image at the same time. Note that this is done even if |
| 120 | +BL1_2 is located in XIP-capable flash, as it both allows the use of untrusted |
| 121 | +flash and simplifies the image upgrade logic. |
| 122 | + |
| 123 | +.. Note:: |
| 124 | + BL1_2 enables TF-M to be used on devices that contain no secure flash, though |
| 125 | + the ITS service will not be available. Other services that depend on ITS will |
| 126 | + not be available without modification. |
| 127 | + |
| 128 | +************************************* |
| 129 | +Secure boot / Image upgrade mechanism |
| 130 | +************************************* |
| 131 | + |
| 132 | +BL1_2 verifies the authenticity of the next stage image via asymmetric |
| 133 | +cryptography, using a public key that is provisioned into OTP. |
| 134 | + |
| 135 | +BL1_2 implements a rollback protection counter in OTP, which is used to prevent |
| 136 | +the next stage image being downgraded to a less secure version. |
| 137 | + |
| 138 | +BL1_2 has two image slots, which allows image upgrades to be performed. The |
| 139 | +primary slot is always booted first, and then if verification of this fails |
| 140 | +(either due to an invalid signature or due to a version lower than the rollback |
| 141 | +protection counter) the secondary slot is then booted (subject to the same |
| 142 | +checks). |
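
The slot selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BL1_2 code: ``slot_status``, ``slot_bootable`` and ``select_boot_slot`` are hypothetical names, the signature check result is supplied as a flag, and the image version is assumed to be a single unsigned integer that is compared directly with the OTP counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one image slot after BL1_2 has examined it. */
struct slot_status {
    bool signature_ok;  /* result of the asymmetric signature check */
    uint32_t version;   /* version carried by the image (assumed integer) */
};

/* An image is rejected if its signature is invalid or if its version is
 * lower than the rollback protection counter. */
static bool slot_bootable(const struct slot_status *s, uint32_t rollback_counter)
{
    return s->signature_ok && s->version >= rollback_counter;
}

/* Returns 0 for the primary slot, 1 for the secondary slot, or -1 if
 * neither image passes the checks. */
int select_boot_slot(const struct slot_status *primary,
                     const struct slot_status *secondary,
                     uint32_t rollback_counter)
{
    if (slot_bootable(primary, rollback_counter)) {
        return 0;
    }
    if (slot_bootable(secondary, rollback_counter)) {
        return 1;
    }
    return -1;
}
```

For example, with a rollback counter of 3, a correctly signed version 2 image in the primary slot is skipped and a correctly signed version 3 image in the secondary slot is selected.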
| 143 | + |
| 144 | +BL1_2 contains no image upgrade logic. For OTA updates of the next stage image
| 145 | +to be implemented, a later stage in the system must handle downloading new
| 146 | +images and placing them in the required slot.
| 147 | + |
| 148 | +******************************************** |
| 149 | +Post-Quantum signature verification in BL1_2 |
| 150 | +******************************************** |
| 151 | + |
| 152 | +BL1_2 uses a post-quantum asymmetric signature scheme to verify the next stage. |
| 153 | +The scheme used is Leighton-Micali Signatures (henceforth LMS). LMS is
| 154 | +standardised in `NIST SP800-208
| 155 | +<https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-208.pdf>`_
| 156 | +and `IETF RFC8554 <https://datatracker.ietf.org/doc/html/rfc8554>`_.
| 157 | + |
| 158 | +LMS is a stateful hash-based signature scheme, meaning that:
| 159 | + |
| 160 | + 1) It is constructed from a cryptographic hash function, in this case SHA256. |
| 161 | + |
| 162 | + - This function can be accelerated by existing hardware accelerators, which |
| 163 | + can make LMS verification relatively fast compared to other post-quantum |
| 164 | + signature schemes that cannot be accelerated in hardware yet. |
| 165 | + |
| 166 | + 2) Each private key can only be used to sign a certain number of images. |
| 167 | + |
| 168 | + - BL1_2 uses the SHA256_H10 parameter set, meaning each key can sign 1024 |
| 169 | + images. |
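
The figure of 1024 follows from the tree height of 10 in the parameter set: an LMS tree of height h has 2^h leaves, and each leaf can sign one image. A small sketch of that arithmetic is given below; the function names are hypothetical and are not part of Mbed TLS or TF-M.

```c
#include <stdint.h>

/* Number of one-time signatures available from a single LMS tree of the
 * given height: one per leaf, so 2^height. */
uint32_t lms_max_signatures(unsigned tree_height)
{
    if (tree_height >= 32) {
        return 0; /* out of range for this sketch */
    }
    return (uint32_t)1 << tree_height;
}

/* Signatures still available after `already_signed` images have been
 * signed with the same private key. */
uint32_t lms_signatures_remaining(unsigned tree_height, uint32_t already_signed)
{
    uint32_t max = lms_max_signatures(tree_height);
    return already_signed >= max ? 0 : max - already_signed;
}
```

A signing service could use a check of this kind to decide when a key is close to exhaustion, bearing in mind that signatures produced for test images count against the same limit.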
| 170 | + |
| 171 | +The main downside, the limited number of possible signatures, can be mitigated
| 172 | +by limiting the number of image upgrades that are done. As BL2 is often
| 173 | +currently not upgradable, it is not anticipated that this limit will be
| 174 | +problematic. If BL1 is being used to directly boot a TF-M/NS combined image, the
| 175 | +limit is more likely to be problematic, and care should be taken to examine the
| 176 | +likely number of updates.
| 177 | + |
| 178 | +LMS public keys are 32 bytes in size, and LMS signatures are 1912 bytes in size. |
| 179 | +The signature size is larger than some asymmetric schemes, though most devices |
| 180 | +should have enough space in flash to accommodate this. |
| 181 | + |
| 182 | +The main upside of LMS, aside from the security against attacks involving |
| 183 | +quantum computers, is that it is relatively simple to implement. The software |
| 184 | +implementation that is used by BL1 is ~3KiB in size, which is considerably
| 185 | +smaller than the corresponding RSA implementation, which is at least 6.5KiB. This
| 186 | +simplicity of implementation is useful to avoid bugs. |
| 187 | + |
| 188 | +BL1 will use Mbed TLS as the source for its implementation of LMS.
| 189 | + |
| 190 | +.. Note:: |
| 191 | + As of the time of writing, the LMS code is still in the process of being
| 192 | + merged into Mbed TLS, so BL1 currently does not support asymmetric
| 193 | + verification of the next boot stage. Currently, the next boot stage is |
| 194 | + hash-locked, so cannot be upgraded. |
| 195 | + |
| 196 | + The GitHub pull request for LMS can be found `here
| 197 | + <https://github.com/ARMmbed/mbedtls/pull/4826>`_ |
| 198 | + |
| 199 | +********************* |
| 200 | +BL1 boot measurements |
| 201 | +********************* |
| 202 | + |
| 203 | +BL1 outputs boot measurements in the same format as BL2, utilising the same |
| 204 | +shared memory area. These measurements can then be included in the attestation |
| 205 | +token, allowing the attestation of the version of the boot stage after BL1. |
| 206 | + |
| 207 | +*********** |
| 208 | +BL1 tooling |
| 209 | +*********** |
| 210 | + |
| 211 | +Image signing scripts are provided for BL1_1 and BL1_2. Although the BL1_2
| 212 | +script is named ``create_bl2_img.py``, it can be used for any next stage image.
| 213 | + |
| 214 | +- ``bl1/bl1_1/scripts/create_bl1_2_img.py`` |
| 215 | +- ``bl1/bl1_2/scripts/create_bl2_img.py`` |
| 216 | + |
| 217 | +These sign (and encrypt in the case of ``create_bl2_img.py``) a given image file |
| 218 | +and append the required headers. |
| 219 | + |
| 220 | +************************** |
| 221 | +BL1 FI and DPA mitigations |
| 222 | +************************** |
| 223 | + |
| 224 | +BL1 reuses the FI countermeasures used in the TF-M runtime, which are found in |
| 225 | +``lib/fih/``. |
| 226 | + |
| 227 | +BL1 implements countermeasures against DPA, which are primarily targeted |
| 228 | +towards being able to handle cryptographic material without leaking its |
| 229 | +contents. The functions with these countermeasures are found in |
| 230 | +``bl1/bl1_1/shared_lib/util.c``.
| 231 | + |
| 232 | +``bl_secure_memeql`` tests if memory regions have the same value |
| 233 | + |
| 234 | +- It does not perform early exits to prevent timing attacks. |
| 235 | +- It compares chunks in random orders to prevent DPA trace correlation analysis |
| 236 | +- It inserts random delays to prevent DPA trace correlation analysis |
| 237 | +- It performs loop integrity checks |
| 238 | +- It uses FIH constructs |
| 239 | + |
| 240 | +``bl_secure_memcpy`` copies memory regions |
| 241 | + |
| 242 | +- It copies chunks in random orders to prevent DPA trace correlation analysis |
| 243 | +- It inserts random delays to prevent DPA trace correlation analysis |
| 244 | +- It performs loop integrity checks |
| 245 | +- It uses FIH constructs |
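
Some of the properties listed for ``bl_secure_memeql`` can be illustrated with a simplified comparison routine. The sketch below is not the code in ``util.c``: ``sketch_memeql`` and its ``start_seed`` parameter are hypothetical, the random order is reduced to a rotated starting index (the seed stands in for a value from the TRNG HAL), and random delays and FIH constructs are omitted entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the regions are equal, 1 if they differ, and -1 if the
 * loop did not visit every byte (a crude loop integrity check). */
int sketch_memeql(const uint8_t *a, const uint8_t *b, size_t len,
                  size_t start_seed)
{
    uint8_t diff = 0;
    /* volatile so the compiler keeps the counter and the final check */
    volatile size_t visited = 0;

    if (len == 0) {
        return 0;
    }

    /* Start at a seed-dependent offset and wrap around, so the order in
     * which bytes are touched varies between calls. */
    size_t start = start_seed % len;

    for (size_t i = 0; i < len; i++) {
        size_t idx = (start + i) % len;
        /* Accumulate differences instead of returning at the first
         * mismatch, so run time does not depend on where it occurs. */
        diff |= (uint8_t)(a[idx] ^ b[idx]);
        visited++;
    }

    /* Loop integrity check: a skipped iteration (for example due to a
     * fault) leaves the counter short of len. */
    if (visited != len) {
        return -1;
    }

    return diff != 0 ? 1 : 0;
}
```

A rotated start is a much weaker form of reordering than a full permutation of chunks; it is used here only to keep the example short.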
| 246 | + |
| 247 | +************************** |
| 248 | +Using BL1 on new platforms |
| 249 | +************************** |
| 250 | + |
| 251 | +New platforms must define the following macros in their ``region_defs.h``: |
| 252 | + |
| 253 | +- ``BL1_1_HEAP_SIZE`` |
| 254 | +- ``BL1_1_STACK_SIZE`` |
| 255 | +- ``BL1_2_HEAP_SIZE`` |
| 256 | +- ``BL1_2_STACK_SIZE`` |
| 257 | +- ``BL1_1_CODE_START`` |
| 258 | +- ``BL1_1_CODE_LIMIT`` |
| 259 | +- ``BL1_1_CODE_SIZE`` |
| 260 | +- ``BL1_2_CODE_START`` |
| 261 | +- ``BL1_2_CODE_LIMIT`` |
| 262 | +- ``BL1_2_CODE_SIZE`` |
| 263 | +- ``PROVISIONING_DATA_START`` |
| 264 | +- ``PROVISIONING_DATA_LIMIT`` |
| 265 | +- ``PROVISIONING_DATA_SIZE`` |
| 266 | + |
| 267 | +The ``PROVISIONING_DATA_*`` defines are used to locate where the data to be |
| 268 | +provisioned into OTP can be found. These are required as the provisioning bundle |
| 269 | +needs to contain the entire BL1_2 image, usually >= 8KiB in size, which is too |
| 270 | +large to be placed in the static data area as is done for all other dummy |
| 271 | +provisioning data. On development platforms with reprogrammable ROM, this is |
| 272 | +often placed in unused ROM. On production platforms, this should be located in |
| 273 | +RAM and then filled with provisioning data. The format of the provisioning data |
| 274 | +that should be located in the ``PROVISIONING_DATA_*`` region can be found in |
| 275 | +``bl1/bl1_1/lib/provisioning.c`` in the struct |
| 276 | +``bl1_assembly_and_test_provisioning_data_t``.
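
As an illustration, a ``region_defs.h`` fragment defining these macros might look like the sketch below. Every address and size here is invented for a fictional memory map and must be replaced with platform values. The fragment also assumes that each ``*_LIMIT`` is the last byte of its region (``START + SIZE - 1``) and that ``BL1_2_CODE_START`` is the RAM address that BL1_2 executes from; both are assumptions that should be checked against an existing platform port.

```c
/* Hypothetical memory map: all values below are invented examples. */

#define BL1_1_HEAP_SIZE          (0x0001000)
#define BL1_1_STACK_SIZE         (0x0001800)
#define BL1_2_HEAP_SIZE          (0x0001000)
#define BL1_2_STACK_SIZE         (0x0001800)

/* BL1_1 executes in place from ROM. */
#define BL1_1_CODE_START         (0x00000000)
#define BL1_1_CODE_SIZE          (0x00008000)  /* 32 KiB */
#define BL1_1_CODE_LIMIT         (BL1_1_CODE_START + BL1_1_CODE_SIZE - 1)

/* BL1_2 is copied into RAM before it runs (assumed RAM base). */
#define BL1_2_CODE_START         (0x20000000)
#define BL1_2_CODE_SIZE          (0x00002000)  /* 8 KiB */
#define BL1_2_CODE_LIMIT         (BL1_2_CODE_START + BL1_2_CODE_SIZE - 1)

/* The provisioning bundle holds the whole BL1_2 image plus other data,
 * so its region is sized larger than BL1_2 itself. */
#define PROVISIONING_DATA_START  (BL1_2_CODE_START + BL1_2_CODE_SIZE)
#define PROVISIONING_DATA_SIZE   (0x00002400)  /* 9 KiB */
#define PROVISIONING_DATA_LIMIT  (PROVISIONING_DATA_START + PROVISIONING_DATA_SIZE - 1)
```

Deriving each limit from its start and size keeps the three values consistent when the layout changes.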
| 277 | + |
| 278 | +If the platform is storing BL1_2 in flash, it must set |
| 279 | +``BL1_2_IMAGE_FLASH_OFFSET`` to the flash offset of the start of BL1_2. |
| 280 | + |
| 281 | +The platform must also implement the HAL functions defined in the following |
| 282 | +headers: |
| 283 | + |
| 284 | +- ``bl1/bl1_1/shared_lib/interface/trng.h`` |
| 285 | +- ``bl1/bl1_1/shared_lib/interface/crypto.h`` |
| 286 | +- ``bl1/bl1_1/shared_lib/interface/otp.h`` |
| 287 | + |
| 288 | +If the platform integrates a CryptoCell-312, then it can reuse the existing |
| 289 | +implementation. |
| 290 | + |
| 291 | +*********** |
| 292 | +BL1 Testing |
| 293 | +*********** |
| 294 | + |
| 295 | +New tests have been written to test both the HAL implementation and the
| 296 | +integration of those functions for verifying images. These tests are stored in |
| 297 | +the ``tf-m-tests`` repository, under the ``test/bl1/`` directory, and further |
| 298 | +subdivided into BL1_1 and BL1_2 tests. |
| 299 | + |
| 300 | +-------------- |
| 301 | + |
| 302 | +*Copyright (c) 2022, Arm Limited. All rights reserved.* |