[clkmgr] Glitch in shadow register storage error #24592

andreaskurth · 2024-09-17T20:21:43Z

clkmgr has five shadowed registers:

IO_MEAS_CTRL_SHADOWED
IO_DIV2_MEAS_CTRL_SHADOWED
IO_DIV4_MEAS_CTRL_SHADOWED
MAIN_MEAS_CTRL_SHADOWED
USB_MEAS_CTRL_SHADOWED

Combinational logic inside prim_subreg_shadow compares the value of the shadowed register to that of the committed register and asserts the err_storage output if there's a mismatch:

opentitan/hw/ip/prim/rtl/prim_subreg_shadow.sv

Line 184 in 282d863

assign err_storage = (~shadow_q != committed_q);

In clkmgr (and only there, at least in the current Earlgrey), this output goes through a CDC into the IO_DIV4 (powerup) domain. Take IO_MEAS_CTRL_SHADOWED as an example:

opentitan/hw/top_earlgrey/ip_autogen/clkmgr/rtl/clkmgr_reg_top.sv

Line 1345 in 282d863

.err_storage (async_io_meas_ctrl_shadowed_hi_err_storage)

opentitan/hw/top_earlgrey/ip_autogen/clkmgr/rtl/clkmgr_reg_top.sv

Lines 1294 to 1302 in 282d863

    
  prim_flop_2sync #(
    .Width(1),
    .ResetValue('0)
  ) u_io_meas_ctrl_shadowed_hi_err_storage_sync (
    .clk_i,
    .rst_ni,
    .d_i(async_io_meas_ctrl_shadowed_hi_err_storage),
    .q_o(io_meas_ctrl_shadowed_hi_storage_err)
  );

The problem is that the input of the CDC flop is driven by combinational logic. That is, as a register changes and its new value ripples through the comparator, the comparator output can temporarily indicate a mismatch, and if you're unlucky, the CDC flop samples its input just then. If that happens, a shadow register storage error is incorrectly flagged, which feeds into a fatal alert.

To fix this, the output of the comparator (combinational logic in general) needs to be flopped in the source clock domain (where glitches can be prevented through STA), and the output of that flop then needs to go to the CDC flop.
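A minimal sketch of the structure described above, not the actual patch: the module name, port names, and the `Width` parameter are hypothetical, and the flops are written out by hand so the example is self-contained. In the real design, the synchronizer stage would presumably remain the existing prim_flop_2sync instance.

```systemverilog
// Illustrative only: all names in this module are assumptions made for
// this sketch and do not appear in the OpenTitan sources.
module err_storage_deglitch_example #(
  parameter int Width = 8
) (
  input  logic             clk_src_i,    // clock domain of the shadowed register
  input  logic             rst_src_ni,
  input  logic             clk_dst_i,    // destination clock domain of the CDC
  input  logic             rst_dst_ni,
  input  logic [Width-1:0] shadow_q_i,   // shadow copy (stored inverted)
  input  logic [Width-1:0] committed_q_i,
  output logic             err_storage_sync_o
);

  // Combinational comparator, same expression as in prim_subreg_shadow.
  // This signal may glitch while its inputs are changing.
  logic err_storage_comb;
  assign err_storage_comb = (~shadow_q_i != committed_q_i);

  // Flop the comparator output in the source domain. STA on clk_src_i
  // ensures the comparator has settled before this flop samples it, so
  // err_storage_q changes at most once per source clock cycle.
  logic err_storage_q;
  always_ff @(posedge clk_src_i or negedge rst_src_ni) begin
    if (!rst_src_ni) begin
      err_storage_q <= 1'b0;
    end else begin
      err_storage_q <= err_storage_comb;
    end
  end

  // Two-stage synchronizer in the destination domain, now fed by a
  // registered signal instead of combinational logic.
  logic [1:0] sync_q;
  always_ff @(posedge clk_dst_i or negedge rst_dst_ni) begin
    if (!rst_dst_ni) begin
      sync_q <= '0;
    end else begin
      sync_q <= {sync_q[0], err_storage_q};
    end
  end

  assign err_storage_sync_o = sync_q[1];

endmodule
```

The cost of the extra flop is one source clock cycle of additional latency before a genuine storage error reaches the destination domain.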

Whether this problem occurs in any given silicon implementation is probabilistic, and the chance/risk can potentially be evaluated through statistical analysis of experiments on a batch of chips.

If this problem occurs in a given silicon implementation, it can be prevented from affecting operation of the chip either (A) by not writing the listed shadowed registers (which implies not using clkmgr's counting/measurement feature) or (B) by ignoring clkmgr's fatal alerts in alert_handler. With option (B), clkmgr's counting/measurement feature can still be used (the feature causes recoverable alerts if clocks exceed the configured thresholds), but clkmgr's other internal countermeasures that lead to fatal alerts (integrity protection of the idle counters, TL-UL, the register write selector, and the shadowed registers) are no longer handled by alert_handler. To notice alerts from them, firmware could periodically read out clkmgr's FATAL_ERR_CODE register.


andreaskurth added the IP:clkmgr label Sep 17, 2024

andreaskurth mentioned this issue Sep 17, 2024

[reggen,clkmgr] Add flops to deglitch shadow storage error #24581

Closed

vogelpi mentioned this issue Sep 24, 2024

[clkmgr] Deglitch shadow storage error for MAIN clock domain #24622

Merged
