Consistency between new write and materialized view #136
Comments
Take the following with a grain of salt, I'm not an author of this project :) Yes, the materialized view is eventually consistent. Every write is stored durably, then propagated through the graph to the inner and materialized views that depend on the new information. In the meantime, reads can return stale data, which is assumed to be fine for normal usage. Note as well that because we are working with partially materialized views (i.e. entries in the cache can be evicted to maintain a reasonable memory footprint), read queries to a view might need to launch upqueries to fetch information from further up the graph, which I believe is done asynchronously. The underlying storage of the cache uses evmap (the double hashmap discussed in the Two Sigma presentation). As mentioned in the README:
From this description, I would tend to believe that Noria refreshes after every write, but I might very well be wrong about this. At least, we know that both usages are possible. For more details, I would recommend reading the paper!
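To make the partial materialization idea above concrete, here is a minimal Python sketch, not Noria's actual code. The `PartialView` class, its LRU eviction policy, and the `upquery` callback are all hypothetical names chosen for illustration, and the upquery runs synchronously here for simplicity, whereas the comment above suggests Noria may do it asynchronously.

```python
from collections import OrderedDict


class PartialView:
    """Hypothetical partially materialized view with a bounded cache.

    `upquery` stands in for "ask the parent operators for this key";
    LRU eviction is an assumption made for this sketch.
    """

    def __init__(self, upquery, capacity):
        self.upquery = upquery
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> materialized value
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        # Miss: the entry was never filled or was evicted, so recompute
        # it from further up the graph (synchronously in this sketch).
        self.misses += 1
        value = self.upquery(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value

    def on_write(self, key, delta):
        # Only entries that are currently materialized are updated.
        # Missing entries are filled on demand by a later upquery.
        if key in self.cache:
            self.cache[key] += delta


# Hypothetical base data: vote counts per article.
votes = {"a": 3, "b": 1, "c": 5}
view = PartialView(lambda k: votes.get(k, 0), capacity=2)

view.read("a")
view.read("b")
view.read("c")           # cache is full, so "a" is evicted

votes["a"] += 1          # a new write to the base data
view.on_write("a", 1)    # "a" is not materialized: nothing to update
latest = view.read("a")  # triggers an upquery that sees the new write
```

The design point is that evicted entries cost nothing to maintain on writes, at the price of a slower read the next time that key is requested.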
Hi! Jonathan's summary is essentially correct, though let me try to give a more direct answer to your question. The materialized views in Noria are not re-calculated the way they are in traditional systems that provide materialized views. Specifically, we do not "refresh" the view at some fixed frequency, or when certain events happen. Instead, every update incrementally updates all dependent views, so the delay between a write and when it is visible should only be the (relatively short) time it takes for the write to go through the query operators and reach the view in question. You can essentially consider this the same as refreshing the view on every write, though in practice it's more efficient than this because a) writes are processed in batches, and b) the view is updated incrementally rather than fully re-executed.
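The difference between incremental updates and a full refresh can be sketched in a few lines of Python. This is an illustration under assumptions, not Noria's implementation: `CountView`, `apply_batch`, and `recompute` are hypothetical names, and the view is a simple count-per-key aggregate.

```python
from collections import defaultdict


class CountView:
    """Hypothetical materialized view: number of votes per article."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply_batch(self, deltas):
        # deltas is a batch of (article_id, +1 or -1) records.
        # Work is proportional to the batch size, not the table size.
        for article_id, sign in deltas:
            self.counts[article_id] += sign
            if self.counts[article_id] == 0:
                del self.counts[article_id]

    def read(self, article_id):
        return self.counts.get(article_id, 0)


def recompute(base_rows):
    """Full refresh: re-scan every base row to rebuild the view."""
    full = defaultdict(int)
    for article_id in base_rows:
        full[article_id] += 1
    return dict(full)


base = []          # base table: one row per vote
view = CountView()

# Two batches of writes; each one touches only the changed keys.
for batch in (["a", "a", "b"], ["b", "c"]):
    base.extend(batch)
    view.apply_batch([(article_id, +1) for article_id in batch])

# The incrementally maintained view matches a full re-execution.
matches_full_refresh = dict(view.counts) == recompute(base)
```

Both paths end in the same state, but `apply_batch` only visits the rows in the batch, while `recompute` scans the whole base table each time.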
I learned about Noria from the Two Sigma talk. I find it extremely interesting, and it could potentially be a great fit for my use case.
If I understand correctly, the materialized view (cache) is eventually consistent with new writes but is not updated atomically, i.e. the re-calculation of the materialized view happens asynchronously to the write operation (on a best-effort basis)? If so, does every write operation (or transaction) trigger a re-calculation of the cache? Or does the re-calculation inside Noria run on its own interval (and if so, how long is it?) to check whether a re-calculation is required, i.e. at its own pace, to avoid the backlog that can build up (e.g. at peak time) when writes arrive faster than the materialized view can be re-calculated?
I understand from another GitHub issue that Noria is still a research prototype at the moment and probably not as mature as MySQL etc. for general production use, but may I ask which subset of the system or its features is mature enough to use with production data? Thank you very much!