Skip to content

Latest commit

 

History

History
62 lines (40 loc) · 7.68 KB

20220601-amo.md

File metadata and controls

62 lines (40 loc) · 7.68 KB

【翻译】RISCV 规范-原子内存操作

原文 RISCV 规范 20191213 §8.4。最新版见官网规范发布页

8.4 原子内存操作

8.4 Atomic Memory Operations

原子内存操作(AMO)指令执行读取-修改-写入操作以实现多处理器同步,并使用 R 型指令格式进行编码。这些 AMO 指令从 rs1 中的地址原子地加载数据值,将值放入寄存器 rd,对加载的值和 rs2 中的原始值应用二元运算符,然后将结果存储回 rs1 中的地址。 AMO 可以在内存中的 64 位(仅限 RV64)或 32 位字上运行。对于 RV64,32 位 AMO 总是对放置在 rd 中的值进行符号扩展。

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd.

对于 AMO,A 扩展要求 rs1 中保存的地址与操作数的大小自然对齐(即,对于 64 位字,八字节对齐,对于 32 位字,四字节对齐)。如果地址不是自然对齐的,则会产生地址未对齐异常或访问错误异常。如果不应该模拟未对齐的访问,则可以为除了未对齐之外能够完成的内存访问生成访问错误异常。第 22 章中描述的 “Zam” 扩展放宽了这一要求并指定了未对齐 AMO 的语义。

For AMOs, the A extension requires that the address held in rs1 be naturally aligned to the size of the operand (i.e., eight-byte aligned for 64-bit words and four-byte aligned for 32-bit words). If the address is not naturally aligned, an address-misaligned exception or an access-fault exception will be generated. The access-fault exception can be generated for a memory access that would otherwise be able to complete except for the misalignment, if the misaligned access should not be emulated. The “Zam” extension, described in Chapter 22, relaxes this requirement and specifies the semantics of misaligned AMOs.

支持的操作有交换、整数加法、按位与、按位或、按位异或,以及有符号和无符号整数的取大和取小。在没有排序约束的情况下,这些 AMO 可用于实现并行归约操作,其中通常会通过写入 x0 来丢弃返回值。

The operations supported are swap, integer add, bitwise AND, bitwise OR, bitwise XOR, and signed and unsigned integer maximum and minimum. Without ordering constraints, these AMOs can be used to implement parallel reduction operations, where typically the return value would be discarded by writing to x0.

我们提供了“读取并操作”风格的原子原语,因为它们可以比 LR/SC 或 CAS 更好地扩展到高度并行的系统。一个简单的微架构可以使用 LR/SC 原语实现 AMO,前提是该实现可以保证 AMO 最终完成。更复杂的实现也可能在内存控制器上实现 AMO,并且可以优化在目标为 x0 时获取原始值。

We provided fetch-and-op style atomic primitives as they scale to highly parallel systems better than LR/SC or CAS. A simple microarchitecture can implement AMOs using the LR/SC primitives, provided the implementation can guarantee the AMO eventually completes. More complex implementations might also implement AMOs at memory controllers, and can optimize away fetching the original value when the destination is x0.

选择 AMO 集是为了有效地支持 C11/C++11 原子内存操作,并支持内存中的并行 reductions。AMO 的另一个用途是为 I/O 空间中的内存映射设备寄存器(例如,设置、清除或切换位)提供原子更新。

The set of AMOs was chosen to support the C11/C++11 atomic memory operations efficiently, and also to support parallel reductions in memory. Another use of AMOs is to provide atomic updates to memory-mapped device registers (e.g., setting, clearing, or toggling bits) in the I/O space.

为了帮助实现多处理器同步,AMO 可选择提供发布一致性语义。如果设置了 aq 位,则不会观察到此 RISC-V hart 中的后续内存操作发生在 AMO 之前。相反,如果设置了 rl 位,则其他 RISC-V hart 在此 RISC-V hart 中的 AMO 之前的内存访问之前将不会观察 AMO。在 AMO 上同时设置 aq 和 rl 位会使序列顺序一致,这意味着它不能与来自同一个 hart 的更早或更晚的内存操作重新排序。

To help implement multiprocessor synchronization, the AMOs optionally provide release consistency semantics. If the aq bit is set, then no later memory operations in this RISC-V hart can be observed to take place before the AMO. Conversely, if the rl bit is set, then other RISC-V harts will not observe the AMO before memory accesses preceding the AMO in this RISC-V hart. Setting both the aq and the rl bit on an AMO makes the sequence sequentially consistent, meaning that it cannot be reordered with earlier or later memory operations from the same hart.

AMO 旨在有效地实现 C11 和 C++11 内存模型。尽管 FENCE R、RW 指令足以实现获取操作,而 FENCE RW、W 足以实现释放,但与设置了相应 aq 或 rl 位的 AMO 相比,两者都意味着额外的不必要排序。

The AMOs were designed to implement the C11 and C++11 memory models efficiently. Although the FENCE R, RW instruction suffices to implement the acquire operation and FENCE RW, W suffices to implement release, both imply additional unnecessary ordering as compared to AMOs with the corresponding aq or rl bit set.

图 8.2 显示了一个由 test-and-test-and-set spinlock 保护的临界区的示例代码序列。请注意,第一个 AMO 标记为 aq 以在临界区之前上锁,第二个 AMO 标记为 rl 以在临界区之后解锁。

An example code sequence for a critical section guarded by a test-and-test-and-set spinlock is shown in Figure 8.2. Note the first AMO is marked aq to order the lock acquisition before the critical section, and the second AMO is marked rl to order the critical section before the lock relinquishment.

我们建议使用上面显示的 AMO Swap 习惯用法来获取和释放锁,以简化推测锁省略的实现。

We recommend the use of the AMO Swap idiom shown above for both lock acquire and release to simplify the implementation of speculative lock elision [16].

A 扩展中的指令也可用于提供顺序一致的加载和存储。顺序一致的加载可以实现为同时设置 aq 和 rl 的 LR。顺序一致的存储可以实现为将旧值写入 x0 并设置 aq 和 rl 的 AMOSWAP。

The instructions in the “A” extension can also be used to provide sequentially consistent loads and stores. A sequentially consistent load can be implemented as an LR with both aq and rl set. A sequentially consistent store can be implemented as an AMOSWAP that writes the old value to x0 and has both aq and rl set.

    li           t0, 1        # 初始化交换值
again:
    lw           t1, (a0)     # 检查锁是否锁定
    bnez         t1, again    # 如果已锁定则重试
    amoswap.w.aq t1, t0, (a0) # 尝试获取锁
    bnez         t1, again    # 如果已锁定则重试

    amoswap.w.rl x0, x0, (a0) # 写入 0 以释放锁

图 8.2:互斥的示例代码。a0 包含锁的地址。

Figure 8.2: Sample code for mutual exclusion. a0 contains the address of the lock.