Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Expand and document lib.cdc #40

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Wren6991
Copy link
Contributor

@Wren6991 Wren6991 commented Mar 4, 2019

Very much a work in progress, but I wanted to open early to get feedback on the approach/direction. I'm porting/rewriting some of the missing modules from migen.genlib.cdc.

Currently in this patch set:

  • Add docstrings to the existing modules. I tried to copy the "house style" I saw elsewhere.
  • Basic checking to give more meaningful traces when you do something like request a MultiReg of length 0. Again I tried to copy the idioms I saw elsewhere
  • Implement PulseSynchronizer
  • Implement Gearbox

Known issues:

  • Smoke tests only; no formal checks
  • Checks are limited by single-clock simulation -- maybe it was a bad idea to play with the CDC library first :)
  • I'm passing in the domain names rather than using DomainRenamer. Not sure whether it's best to leave this PR open until this is in place, or merge early so people can start hacking on this. Not my decision :)
  • The storage size calculation in this version of Gearbox is different to the one in Migen. IMO the old one was unsafe, but this one may be too conservative. I could also be plain wrong!
  • (edit:) The output mux on the Gearbox is driven from two clock domains. This is a legitimate thing to do here, but I saw mention somewhere that doing this should cause an error.

I definitely still intend to port:

  • BusSynchronizer
  • ElasticBuffer

And we probably ought to do something about GrayCounter at some point, but I think it's less important than the others.

As a general style question, there seem to be two ways of doing module wiring in nmigen.lib at the moment. One is to pass external signals into __init__, which gives you more of a Verilog-style instantiation. The other is to create "port" signals in __init__, which the parent then wires up during elaboration. I've used the second style because it seems a bit cleaner, and doesn't require creating extraneous signals like in Verilog, but not sure if this is the right thing to do?

@codecov
Copy link

codecov bot commented Mar 4, 2019

Codecov Report

Merging #40 into master will increase coverage by 0.64%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #40      +/-   ##
==========================================
+ Coverage   80.70%   81.34%   +0.64%     
==========================================
  Files          32       32              
  Lines        4976     5158     +182     
  Branches     1071     1105      +34     
==========================================
+ Hits         4016     4196     +180     
- Misses        825      829       +4     
+ Partials      135      133       -2     
Impacted Files Coverage Δ
nmigen/lib/fifo.py 98.76% <ø> (ø)
nmigen/lib/cdc.py 100.00% <100.00%> (+13.33%) ⬆️
nmigen/compat/fhdl/structure.py 63.01% <0.00%> (-2.21%) ⬇️
nmigen/hdl/dsl.py 99.28% <0.00%> (-0.01%) ⬇️
nmigen/hdl/xfrm.py 96.21% <0.00%> (+0.01%) ⬆️
nmigen/back/pysim.py 96.64% <0.00%> (+0.02%) ⬆️
nmigen/build/plat.py 23.60% <0.00%> (+0.03%) ⬆️
nmigen/back/rtlil.py 86.26% <0.00%> (+0.05%) ⬆️
nmigen/hdl/ast.py 86.76% <0.00%> (+0.05%) ⬆️
nmigen/hdl/ir.py 94.33% <0.00%> (+0.07%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad1a40c...2e6cf3c. Read the comment docs.

@Wren6991
Copy link
Contributor Author

Wren6991 commented Mar 4, 2019

Last commit is just to fix a small coverage blackspot in the existing code. I was having far too much fun with the graphs on Codecov

@Wren6991
Copy link
Contributor Author

Wren6991 commented Mar 5, 2019

BusSynchronizer is as per Migen except:

  • number of synchronisation stages is optionally parameterised
  • timeout retry is optional by setting timeout to 0 (since it is a bit risky, can cause metastabilities in capturing domain)
  • Timer is not factored out into a module. nMigen does not have a misc library right now

Again it needs formal checks, and tests are devalued by running them in one clock domain.

Edit: updated the commit. Previously there was a clock-enabled odomain register driven by a clock-enabled idomain register. This is ok for CDC purposes, because the CDC-safe handshake controls the clock enables, but I've also put in a (shortened) MultiReg between these registers, because "all cross-domain signals go through a MultiReg" seems like a useful property.

Edit: squashed in another small change: making sure that sync_stages gets passed on to the MultiReg for the width=1 special case.

@Wren6991 Wren6991 force-pushed the cdc_lib branch 3 times, most recently from 502b138 to c592d35 Compare March 5, 2019 11:36
@Wren6991
Copy link
Contributor Author

Wren6991 commented Mar 5, 2019

Add ElasticBuffer. Faithful reproduction of the Migen version, except without the reset synchronisers (since these would normally be per-reset-domain, and there is still some code motion wrt reset/clock domains in nMigen as a whole).

Edit: also this elastic buffer is unusual, usually there is some provision for slipping the pointers to account for long-term unbounded phase change. This would be more light-weight than an AsyncFIFO but only works for a few PPM difference. I'm implementing as-is because something like that would have a different interface to current ElasticBuffer.

Force push due to finding small issues in earlier commits, e.g. gearbox increment didn't work for non-pow-2 chunk counts (which can't be tested in one clock domain, I just spotted it while reading the code). I've tried to keep it so that each commit can be separately checked out, tested and reviewed. If you'd rather I just push fixes, let me know :)

Also, I've copied the _incr used in lib.fifo. Might be worth factoring this out, but that's outside of this PR.

@whitequark
Copy link
Contributor

  • I'm passing in the domain names rather than using DomainRenamer. Not sure whether it's best to leave this PR open until this is in place, or merge early so people can start hacking on this. Not my decision :)

I'll try to fix DomainRenamer to work on Modules as soon as possible!

As a general style question, there seem to be two ways of doing module wiring in nmigen.lib at the moment. One is to pass external signals into __init__, which gives you more of a Verilog-style instantiation. The other is to create "port" signals in __init__, which the parent then wires up during elaboration. I've used the second style because it seems a bit cleaner, and doesn't require creating extraneous signals like in Verilog, but not sure if this is the right thing to do?

I use the second style, with some exceptions like MultiReg. The reason is that the signals aren't always completely independent, for example you might want to pass a width and have a set of signals with that width, double that width, etc created for you. Also, constructors with a large number of arguments are unwieldy and bug-prone, especially if you change the constructor later.

@Wren6991
Copy link
Contributor Author

Wren6991 commented Mar 5, 2019

I'll try to fix DomainRenamer to work on Modules as soon as possible!

Ok, no sweat, I'll rebase + fix as and when :)

@Wren6991
Copy link
Contributor Author

Wren6991 commented Apr 7, 2019

Rebased on latest master and fixed up docstrings:

  • Reflowed to 100 characters
  • Fixed ReST formatting
  • Americanized spellings to be consistent with class names
  • Some edits for clarity

I also stared at the code for a while, only one minor fix which was something that really oughtn't be retimed, but didn't have the no_retiming attribute. Not saying there's nothing to fix, just that it's reached the point where squinting at my own code is nonproductive.

Domains are passed as strings, will fix this after the xfrm.py changes go in.

@whitequark
Copy link
Contributor

@Wren6991 Done!

@Wren6991
Copy link
Contributor Author

Wren6991 commented Apr 10, 2019

Thanks :) I've fixed up PulseSynchronizer and Gearbox so far. Unfortunately I'm away on a trip at the moment and don't have all the necessary tools set up on my work laptop, so it'll take me a while to get the fixes tested and pushed.

For consistency I'm using "read" and "write" for default domain names (matching AsyncFIFO), but there is a not-totally-frivolous argument to use something like "i" and "o" instead:

  • Matches the signal naming convention used in e.g. lib/coding.py. Makes sense for i to be in "i", o in "o" etc
  • Having domains be the same number of characters is super nice. I have bad habits from RTL of lining up blocks of assignments in my code, and it's nice if this happens without inserting whitespace
  • DomainRenamer({"i": "gpu", "o": "display"})(MyCDCModule()) is just a bit pithier

Was just something that occurred to me when editing the files, it's not up to me.

@whitequark
Copy link
Contributor

Thanks :) I've fixed up PulseSynchronizer and Gearbox so far. Unfortunately I'm away on a trip at the moment and don't have all the necessary tools set up on my work laptop, so it'll take me a while to get the fixes tested and pushed.

For consistency I'm using "read" and "write" for default domain names (matching AsyncFIFO), but there is a not-totally-frivolous argument to use something like "i" and "o" instead:

Now that I think more about it, I think it doesn't make sense to use DomainRenamer for synchronization primitives. The reason being, for these primitives, you always want to rename them, so requiring each caller to do that work is clearly superfluous, and it results in unnecessarily hard to read code too.

There's a good argument that PulseSynchronizer should work the same as AsyncFIFO. I agree. I think AsyncFIFO should use explicit domain specification as well! Pretty much every instance of AsyncFIFO is immediately followed by a DomainRenamer call.

WDYT?

@Wren6991
Copy link
Contributor Author

So a flow could be:

  • If you are writing a block of synchronous logic, you use sync, and then apply DomainRenamer to yeet it into the correct domain
  • Knowledge of domains is confined to a specific region of your code, which places synchronous blocks into the correct domains and also glues them together with CDC modules
  • Since CDC + DomainRenamer is actually a single constructor in a trenchcoat, skip the extra step and pass domains in directly

If so, that feels cleaner, yeah!

@whitequark
Copy link
Contributor

Yep!

to yeet it into the correct domain

Incredible.

@Wren6991
Copy link
Contributor Author

Awesome. I guess the next step for this PR is a lot more testing then. I did note #28, which I guess I could start to take a look at at some point, as this bumps against it. It would be good to learn about that part. Formal would be good too.

For CDC it would also be cool to do some soak testing on real FPGAs, to catch genuine CDC bugs rather than just state machine bugs (albeit they're difficult to debug once found). Not sure if this is something you already had ideas on.

@whitequark
Copy link
Contributor

For CDC it would also be cool to do some soak testing on real FPGAs

The CDC primitives in oMigen are extremely well tested. I think we mainly need to make sure the nMigen ones don't stray from them, like it happened with AsyncFIFO...

@Wren6991
Copy link
Contributor Author

Ok that's a good point. Will read through the oMigen library again later today. For the most part it's pretty faithful but mistakes creep in!

There are things in the old library that seemed odd though, e.g. for Gearbox with lcm(iwidth, owidth) == iwidth, the write port will have depth 2. The pointers start 180° apart, but there is no guard period between final o read on an ichunk and the next write to that ichunk, or conversely, first read from an ichunk and the preceding write. You're at the mercy of dice roll on the two reset synchronisers. I'm currently using a different storage size calculation because of this (it can end up larger or smaller than oMigen depending on ratio).

I'll definitely make sure the logic is all ok, and then maybe raise questions about things like the above.

@whitequark
Copy link
Contributor

Sounds good!

@Wren6991
Copy link
Contributor Author

Wren6991 commented Apr 17, 2019

I did a side-by-side of oMigen vs current state of my code

  • PulseSynchronizer
    • Actual logic identical
    • Number of sync stages in MultiReg is parameterised.
    • Currently I don't have the reset_less attribute on the non-CDC flops in this module, but the MultiReg defaults to reset_less=True. It would probably be better to parameterise this, and either have everything or nothing reset_less
  • BusSynchronizer
    • Handshake logic is identical: req/ack with two PulseSynchronizers
    • Number of sync stages parameterised
    • Option to disable timeout, as it has small chance of temporarily breaking the handshake for oclock << iclock (or oclock sometimes gated); multiple reqs can be in flight!
    • I removed one layer of flops from data CDC path. See diagram
    • Again some confusion over what should/should not be reset_less. Currently all synchronous logic is resettable but the MultiRegs are not
    • Found an issue with my code: ibuffer clock enable was driven by only _pong.o (ack, resynced to idomain) in oMigen, but mine was more permissive, and also enabled on timer restarts and start-of-day flag. Think this is ok, but I've fixed it to be exactly like oMigen.
  • ElasticBuffer
    • Logic is faithful reproduction of oMigen
    • oMigen code also has some reset structure inside. It ORs together the reset from each domain, and resynchronises this new reset internally. I think this is problematic because reset assertion is asynchronous, so e.g. an input-domain reset can inject metastabilities into the output domain. This is better handled by proper reset sequencing at system level. WDYT?
  • GearBox
    • Same comment on resets as ElasticBuffer
    • Actual logic is faithful to oMigen
    • Storage size calculation is different in my version: oMigen takes LCM of two widths, and doubles it unconditionally. This is unnecessarily large for many coprime widths (e.g. 5:4 for video) and for lcm(i, o) in (i, o) it seems too small to me. My code starts with lcm(i, o) and doubles it until the depth of each port is at least four, logic being that this gives you one chunk of no-man's-land on each side between the chunk you are reading/writing and the chunk the other side is writing/reading. WDYT?

BusSynchronizer datapath

I drew a rough box diagram of oMigen BusSynchronizer:

image

My problem with this is that there is a 3-flop delay for both the request and the data. (1 flop in idomain, two in odomain). This means there is a race between the obuffer clock enable and the data. I.e. due to bit skew observed at inputs to the req and data MultiRegs, it seems possible to have req = 1 at the output but garbled data.

I've fixed the race by removing one stage from the datapath MultiReg. Note there are still two non-retimable layers of flops in odomain on the datapath, and also, sync_stages can be increased to stretch out the entire BusSynchronizer.

@whitequark
Copy link
Contributor

@Wren6991 Sorry, I completely missed your update, GitHub decided to not send me an email for it for some reason.

nmigen/lib/cdc.py Outdated Show resolved Hide resolved
if modulo == 2 ** len(signal):
return signal + 1
else:
return Mux(signal == modulo - 1, 0, signal + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already one in lib.fifo and this is getting ridiculous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it should be either inlined at my callsites, or preferably factored out. Wanted to keep the scope of this PR well-defined though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you think of a nicer place to put this? It's a common pattern (as is saturating increment/decrement)

@Wren6991
Copy link
Contributor Author

Wren6991 commented May 1, 2019

Do you agree on my diagram of oMigen BusSynchronizer? I might have misunderstood e.g. the PulseSynchronizer logic. Haven't looked at it for a while now.

@whitequark
Copy link
Contributor

I'll need to think more about it as I wasn't the one who wrote it. @sbourdeauducq ?

@sbourdeauducq
Copy link
Member

I've fixed the race by removing one stage from the datapath MultiReg.

But then, how do you preserve the MultiReg constraints between the anti-metastabilty registers?

Sounds easier to add one register of delay into the request path.

@Wren6991
Copy link
Contributor Author

@sbourdeauducq You're 100% right -- it's possible to just falsepath/MCP the connection between write data buffer and the MultiReg, but it's cleaner if it can just rely on constraints on paths internal to the MultiReg.

Added a patch which restores the data path to its original length, and lengthens the request path to compensate.

@Wren6991
Copy link
Contributor Author

Fixed merge conflict and made classes derive from Elaboratable (sorry for force-push)

@whitequark
Copy link
Contributor

Thanks, I'll take another look soon.

@Wren6991
Copy link
Contributor Author

Updated naming conventions according to #97. I've also updated the only use of lib.cdc elsewhere in the standard library (in lib.fifo) to reflect this. Sorry for being unresponsive, day job has been busy

@whitequark
Copy link
Contributor

That's fine, I have a lot of other work here, so you're not really blocking anything.

@whitequark whitequark added this to the 0.1 milestone Jun 28, 2019
@Wren6991
Copy link
Contributor Author

I saw this in #113

CDC improvements like CDC safety (#4) and automatic false path constraints (#87; depends on #88). These are definitely needed, but there are still plenty of designs that will be just fine without them, and I think for most people waiting to pick up nMigen they wouldn't be an issue. Again, these are more of 0.2 issues.

And have also noted #87. Wanted to mention that falsepath is not necessarily safe for all of these primitives: in particular, BusSynchronizer is unsafe if there is too much skew between the request path and the datapath, and false paths can accumulate huge routing delays when FPGA utilisation is high.

Since we want the widest possible toolchain support, one option is a maxdelay constraint of capture period minus setup. This isn't perfectly safe (clock skew), and does make closure harder, but might be worth considering.

@whitequark
Copy link
Contributor

Wanted to mention that falsepath is not necessarily safe for all of these primitives

That seems very troublesome, and certainly something we want to loudly warn about. Would you be able to prepare a detailed evaluation of all of our CDC primitives with this in mind?

BusSynchronizer is unsafe if there is too much skew between the request path and the datapath, and false paths can accumulate huge routing delays when FPGA utilisation is high.

Are there any ways to work around this?

Since we want the widest possible toolchain support, one option is a maxdelay constraint of capture period minus setup. This isn't perfectly safe (clock skew), and does make closure harder, but might be worth considering.

This seems like a dramatic increase in complexity of platform code. It's an option, but maybe we can avoid it completely...

@whitequark
Copy link
Contributor

@Wren6991
Copy link
Contributor Author

Wren6991 commented Jul 12, 2019

Relevant: m-labs/migen@f4979a2#commitcomment-34195474

That is an interesting handshake, but seems to overlap quite a lot with BusSynchronizer. I have had the occasional shower thought that BusSynchronizer should have an optional valid/ready handshake in the launching domain (and perhaps a valid pulse in the capturing domain), which would give you something similar to this BlindTransfer.

I'm also happy to just port BlindTransfer wholesale!

@sbourdeauducq
Copy link
Member

sbourdeauducq commented Jul 13, 2019

I'm OK with merging them, but BlindTransfer is also commonly used without a data payload. (A typical use case for it is error reporting, and for many errors a simple pulse that tells you that the error occurred is enough)

@whitequark
Copy link
Contributor

@Wren6991 I've finally resolved #97; can you please update this PR?

@whitequark
Copy link
Contributor

Moving to 0.2, which has the theme of "standard library improvements" anyway.


MultiReg is reset by the ``odomain`` reset only.
MultiReg is reset by the ``cd_o`` reset only.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiReg is renamed to FFSynchronizer in 8deb13c.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants