-
Correct
Correct
The OR CTS engine tries to minimize skew because that is algorithmically simpler, though not necessarily optimal. Commercial engines use a technique called "concurrent clock optimization", which looks at the timing paths and purposefully skews certain registers if doing so improves timing. Concurrent clock optimization has a similar effect to register retiming: the former shifts the clock so that one stage borrows setup time from another, while the latter shifts logic from one stage to another and therefore also shifts setup time.
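As a rough illustration of the clock-skewing idea, here is a minimal Python sketch of a three-register, two-stage pipeline. The delay numbers and the `min_period` helper are made up for illustration; this is not how any CTS engine is implemented, and hold checks are ignored.

```python
def min_period(stage_delays, clock_arrivals, setup=0.0):
    """Smallest clock period that satisfies setup on every stage.

    stage_delays[i] is the logic delay from register i to register i+1.
    clock_arrivals[i] is the clock arrival time at register i.
    Setup constraint per stage:
        period >= delay + setup - (capture_arrival - launch_arrival)
    """
    periods = []
    for i, delay in enumerate(stage_delays):
        skew = clock_arrivals[i + 1] - clock_arrivals[i]
        periods.append(delay + setup - skew)
    return max(periods)


# Hypothetical numbers in ns: an 8 ns stage followed by a 4 ns stage.
delays = [8.0, 4.0]

# Zero skew: the slow stage alone sets the period.
balanced = min_period(delays, [0.0, 0.0, 0.0])

# Clock the middle register 2 ns late: the first stage gains 2 ns,
# the second stage loses 2 ns, and both end up needing 6 ns.
skewed = min_period(delays, [0.0, 2.0, 0.0])

print(balanced, skewed)
```

Retiming would reach a similar result by moving 2 ns worth of logic from the first stage into the second instead of moving the clock edge.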
Yes, async FIFOs or other clock domain crossings (CDCs) are convenient ways to break up and decouple clock trees. Clock trees cannot become too large, because the larger they are, the more power they consume and the more difficult it is to minimize skew/jitter/uncertainty. A clock tree can become so large that the jitter becomes larger than the clock period, in which case timing is impossible to meet. There is a design tradeoff between the number of clock domains and data latency, because each CDC adds one or more cycles when transmitting data across the interface.
I might rephrase the question more simply as "How can I tell if clock skew is bad for a design?" because clock skew always impacts the clock period, as alluded to above. This is one of the areas where it takes a lot of intuition, experimentation, and heuristics to evaluate, because the answer is rarely clear. A soft and perhaps not very useful rule of thumb would be "when the clock skew/jitter/uncertainty becomes a significant fraction of the clock period". There are no hard rules, because sometimes high skew can be tolerated in order to keep the design fully synchronous. I personally start to get suspicious if the skew is eating more than 20-40% of the clock period. But the size of the clock tree also matters, as does how much skew you would expect from a clock tree of that size. There are some red flags, though, that identify purely suboptimal results. One is if a very large number of hold buffers is being inserted. This is usually due to bad timing constraints, but it could also be due to bad skew in the clock tree. Another red flag is if a path is failing both setup time and hold time. This most often happens not because of skew but because of jitter caused by on-chip variation. Jitter can cause clock edges to be both early and late, which means that if a path is failing both, then the jitter is too high.
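To make the "failing both setup and hold" red flag concrete, here is a small Python sketch using a simplified single-clock model in which the uncertainty is applied pessimistically to both checks. The function names and all numbers are hypothetical, and real STA tools model early/late derates in more detail than this.

```python
def slacks(period, data_min, data_max, setup, hold, uncertainty):
    """Return (setup_slack, hold_slack) for one register-to-register path.

    The capture edge may arrive up to `uncertainty` early (hurts setup)
    or up to `uncertainty` late (hurts hold).
    """
    setup_slack = period - uncertainty - setup - data_max
    hold_slack = data_min - hold - uncertainty
    return setup_slack, hold_slack


def uncertainty_fraction(period, uncertainty):
    """Fraction of the clock period consumed by skew/jitter/uncertainty."""
    return uncertainty / period


# Hypothetical path, all times in ns.
period, data_min, data_max, setup, hold = 1.0, 0.2, 0.8, 0.05, 0.05

# Small uncertainty: both checks pass.
ok_setup, ok_hold = slacks(period, data_min, data_max, setup, hold, 0.05)

# Uncertainty at 30% of the period: the same path fails both checks.
bad_setup, bad_hold = slacks(period, data_min, data_max, setup, hold, 0.3)

print(ok_setup > 0 and ok_hold > 0)    # both pass
print(bad_setup < 0 and bad_hold < 0)  # both fail
print(uncertainty_fraction(period, 0.3))
```

In this model no amount of hold buffering or logic restructuring fixes the second case without breaking the other check, which is why a path failing both points at the clock rather than at the data path.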
-
This is the asynchronous connection between TileLink and the rest of the system in the Verilog code. Here is the expected Gray counter and the corresponding Chisel code.
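For readers unfamiliar with why a Gray counter is expected at an asynchronous crossing, here is a generic Python sketch of binary-to-Gray conversion. It is not the Chisel or Verilog from the design; the helper names are invented for illustration. The property that matters is that consecutive values differ in exactly one bit, so a pointer sampled in the other clock domain is off by at most one count.

```python
def to_gray(n):
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_sequence(bits):
    """All Gray codes for a counter of the given width, in count order."""
    return [to_gray(i) for i in range(1 << bits)]


def one_bit_per_step(seq):
    """True if every step, including the wrap-around, flips exactly one bit."""
    wrapped = seq[1:] + seq[:1]
    return all(bin(a ^ b).count("1") == 1 for a, b in zip(seq, wrapped))


seq = gray_sequence(3)
print([format(g, "03b") for g in seq])
# ['000', '001', '011', '010', '110', '111', '101', '100']
print(one_bit_per_step(seq))
# True
```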
-
I chased down the synchronous reset, although it doesn't show up in the most critical path at the ChipTop level. For now, I have created a macro out of the BranchPredictor to rein in build times. In that macro the synchronous reset has a very large fanout, which obviously is a disaster for timing. After some investigation at the top level, I found that MegaBoom, as documented, relies heavily on register retiming and that the synchronous reset is in fact pipelined. However, since the design is hierarchical and not flattened, it won't be able to take advantage of these three pipeline stages. Also, yosys does not support retiming. Retiming in OpenROAD/yosys has been discussed in some detail previously; I wanted to share the results of my investigation into synchronous reset specifically for MegaBoom.
-
For my part, the questions were answered, so I am closing this.
-
How can I tell if clock skew is increasing the minimum clock period for a design?
Here is my current understanding:
If two flip flops are not connected, then the clock skew between the clocks that drive those two flip flops doesn't matter because there is no timing path between these two flip flops.
Skew can be good and it can be bad. If there is a long timing path between two flip flops, then a negative skew for the launching flip flop or a positive skew for the capturing flip flop would make it easier to meet timing.
As a first order approximation though, the CTS will try to minimize clock skew, because in the end a very large clock skew will catch up with you and increase the minimum clock period.
Latest MegaBoom update:
I have modified MegaBoom so that it no longer has a PLL, but instead has one clock for the TileLink (top level memory/peripheral interface) and one for the RISC-V core.
As I understand it, though I don't know the code very well, the RISC-V core is connected to the TileLink via an asynchronous FIFO (or the equivalent thereof).
Therefore there are no ChipTop inputs/outputs that have an insertion point relative to the clock for the RISC-V core. This seems like a clever way of doing things, because then the insertion latency of the RISC-V clock doesn't matter for the clock period (although clock uncertainty, which I would expect to grow with a long clock insertion latency, still does).
Some notes: