fix(p2p): cache responses to serve without roundtrip to db #2352
crates/services/p2p/src/service.rs
Outdated
impl CachedView {
    fn new(metrics: bool) -> Self {
        Self {
            sealed_block_headers: DashMap::new(),
            transactions_on_blocks: DashMap::new(),
            metrics,
        }
    }
We probably also want to support sub-ranges or even partial ranges here, but that can be done in the future :)
Thanks for implementing this. I hate to be annoying here, but to approve this I need to:
1. See the Changelog updated.
2. Understand the reasoning behind the current caching strategy and the benefits/drawbacks over an LRU cache.
3. Be certain that we don't open the door to OOM attacks by allowing our cache to be overloaded.
Let me know your thoughts on 2 and 3. I'm happy to jump on a call to discuss this and figure out a good path forward.
pub struct CachedView {
    sealed_block_headers: DashMap<Range<u32>, Vec<SealedBlockHeader>>,
    transactions_on_blocks: DashMap<Range<u32>, Vec<Transactions>>,
    metrics: bool,
}
I'm a bit hesitant about the current approach of storing everything and clearing on a regular interval. Right now, there is no memory limit on the cache, and we use ranges as keys. So if someone queries the ranges (1..=4, 1..=2, 3..=4), we'd store all blocks in the 1..=4 range twice, and this could theoretically grow quadratically for larger ranges.
I would assume that the most popular queries at a given time are quite similar. Why not use a normal LRU cache with fixed memory size? Alternatively just maintain a cache over the last
Yup, it's still WIP.
Ah right, I see this PR is still a draft :)
we now use block height as the key in 6422210
we will retain the time-based eviction strategy for now
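To make the difference between the two keying schemes concrete, here is a minimal sketch that counts how many block entries each scheme holds for the ranges mentioned above. It assumes plain `std::collections::HashMap` maps and uses the block height itself as a stand-in for the cached header or transactions; the function names are hypothetical and are not taken from the PR.

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;

/// Number of block entries held by a cache keyed by the requested range.
/// Every distinct range stores its own copy of each block it covers.
fn entries_with_range_keys(requests: &[RangeInclusive<u32>]) -> usize {
    let mut cache: HashMap<(u32, u32), Vec<u32>> = HashMap::new();
    for range in requests {
        // The stored "block" is just its height; real entries would be
        // headers or transactions.
        cache
            .entry((*range.start(), *range.end()))
            .or_insert_with(|| range.clone().collect());
    }
    cache.values().map(Vec::len).sum()
}

/// Number of block entries held by a cache keyed by block height.
/// Overlapping ranges share the same entries.
fn entries_with_height_keys(requests: &[RangeInclusive<u32>]) -> usize {
    let mut cache: HashMap<u32, u32> = HashMap::new();
    for range in requests {
        for height in range.clone() {
            cache.entry(height).or_insert(height);
        }
    }
    cache.len()
}
```

For the requests `1..=4`, `1..=2`, `3..=4`, the range-keyed map holds 8 entries while the height-keyed map holds 4, which is the duplication the review comment describes.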
… instead of a per-range basis
Co-authored-by: Mårten Blankfors <[email protected]>
}
}

pub(super) fn clear(&self) {
Maybe we could use some LRU cache instead of wiping the whole cache every 10 seconds? This is just a thought since the current approach should also work.
Yeah, in the future we can use an LRU :) we discussed it somewhere above too
crates/services/p2p/src/service.rs
Outdated
cache_reset_interval: Duration,
next_cache_reset_time: Instant,
This looks like it could be internal logic of the `CachedView`, and on each insert/get we can clean it.
It could be, but I didn't want `get_from_cache_or_db` to require a mutable reference to `Self`, because it's just a getter. No strong opinion here, so if you want it that way, I can move it around.
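For reference, the reset bookkeeping can live inside the cache without the getter taking `&mut self` if the state sits behind interior mutability. The following is a minimal sketch under that assumption; `ExpiringCache`, `Inner` and the explicit `now` parameter are hypothetical names chosen for illustration, and the single `Mutex` is used only for brevity (it serializes all access, so it is not a drop-in replacement for a concurrent map).

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical cache that owns its own reset schedule.
struct ExpiringCache {
    reset_interval: Duration,
    // Entries and the next reset time share one lock so they change together.
    inner: Mutex<Inner>,
}

struct Inner {
    entries: HashMap<u32, String>,
    next_reset: Instant,
}

impl ExpiringCache {
    fn new(reset_interval: Duration, now: Instant) -> Self {
        Self {
            reset_interval,
            inner: Mutex::new(Inner {
                entries: HashMap::new(),
                next_reset: now + reset_interval,
            }),
        }
    }

    /// Wipes all entries once the reset time has passed and schedules the next reset.
    fn clear_if_expired(&self, inner: &mut Inner, now: Instant) {
        if now >= inner.next_reset {
            inner.entries.clear();
            inner.next_reset = now + self.reset_interval;
        }
    }

    /// Inserts through `&self`; the expiry check runs first.
    fn insert(&self, height: u32, value: String, now: Instant) {
        let mut inner = self.inner.lock().unwrap();
        self.clear_if_expired(&mut inner, now);
        inner.entries.insert(height, value);
    }

    /// Reads through `&self`; the expiry check runs first.
    fn get(&self, height: u32, now: Instant) -> Option<String> {
        let mut inner = self.inner.lock().unwrap();
        self.clear_if_expired(&mut inner, now);
        inner.entries.get(&height).cloned()
    }
}
```

Passing `now` explicitly keeps the sketch deterministic to test; a real caller would pass `Instant::now()`.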
for height in range.clone() {
    if let Some(item) = cache.get(&height) {
        items.push(item.clone());
It can be a follow-up PR, but it would be nice if we avoided the heavy clone here and used `Arc` instead.
added comment here - d897cba
associated issue: #2436
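As a rough illustration of the `Arc` suggestion, the sketch below serves each height from the cache when present and falls back to a database lookup otherwise, sharing values through `Arc` so a cache hit only bumps a reference count. It is a simplified sketch, not the PR's code: `Header`, the `fetch_from_db` closure and the `&mut HashMap` cache are assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::Arc;

/// Stand-in for a large cached item such as a sealed block header.
#[derive(Debug, PartialEq)]
struct Header {
    height: u32,
    payload: Vec<u8>,
}

/// Returns one header per height, reading from the cache first and calling
/// `fetch_from_db` only for misses. Returns `None` if the database misses too.
fn get_from_cache_or_db<F>(
    cache: &mut HashMap<u32, Arc<Header>>,
    range: Range<u32>,
    mut fetch_from_db: F,
) -> Option<Vec<Arc<Header>>>
where
    F: FnMut(u32) -> Option<Header>,
{
    let mut items = Vec::new();
    for height in range {
        if let Some(item) = cache.get(&height) {
            // Cloning the Arc bumps a reference count; the header is not copied.
            items.push(Arc::clone(item));
        } else {
            let item = Arc::new(fetch_from_db(height)?);
            cache.insert(height, Arc::clone(&item));
            items.push(item);
        }
    }
    Some(items)
}
```

With this shape, two overlapping requests hand out pointers to the same allocation; a deep copy is only needed at the point where an owned value must be sent as a response.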
let block_height_range = 0..100;
let sealed_headers = default_sealed_headers(block_height_range.clone());
let result = cached_view
    .get_sealed_headers(&db, block_height_range.clone())
I would expect the cache to be linked to the DB at the time it is created, rather than having to specify the DB when invoking the function `get_sealed_headers` or `get_transactions`. Just curious to know what's the reason behind this choice?
you will notice that the view of the current tip of the db (LatestView) is passed into the CachedView while making calls
LGTM. I have a side question of whether the cache should be cleared in case of a DB rollback, to avoid inconsistencies?
that's a good question! i wonder if we have a hook from the db to be notified when it gets rolled back.
> that's a good question! i wonder if we have a hook from the db to be notified when it gets rolled back.
We can only do a rollback before the services are started. We don't need to handle the case of a rollback while the node is running.
I think using a concurrent LRU cache would solve all race conditions between updating the cache and actually using it. Plus, it would remove the need for a separate timer.
Could we try to look into it in this PR?
Most of the logic of the PR remains the same; it will just remove the "cleanup" logic.
crates/services/p2p/src/service.rs
Outdated
@@ -444,7 +450,7 @@ impl<V, T> UninitializedTask<V, SharedState, T> {
     }
 }

-impl<P: TaskP2PService, V, B: Broadcast, T> Task<P, V, B, T> {
+impl<P: TaskP2PService, V: AtomicView, B: Broadcast, T> Task<P, V, B, T> {
[nit]: It would be better to move all constraints to a `where` clause.
addressed in b677c01
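For readers unfamiliar with the nit, this is roughly what moving the bounds into a `where` clause looks like. The traits below are empty, hypothetical stand-ins that only share their names with the ones in the diff, and the struct fields are invented for the sketch.

```rust
// Hypothetical stand-ins for the traits named in the diff.
trait TaskP2PService {}
trait AtomicView {}
trait Broadcast {}

// The unit type implements all three so the sketch can be exercised.
impl TaskP2PService for () {}
impl AtomicView for () {}
impl Broadcast for () {}

struct Task<P, V, B, T> {
    p2p_service: P,
    view_provider: V,
    broadcast: B,
    extra: T,
}

// Bounds listed in a `where` clause instead of inline in the generics list.
impl<P, V, B, T> Task<P, V, B, T>
where
    P: TaskP2PService,
    V: AtomicView,
    B: Broadcast,
{
    fn new(p2p_service: P, view_provider: V, broadcast: B, extra: T) -> Self {
        Self {
            p2p_service,
            view_provider,
            broadcast,
            extra,
        }
    }
}
```

The two forms are equivalent to the compiler; the `where` form keeps the generics list short as the number of bounds grows.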
Added in 496071f. There is still an explicit clone before sending the value as a response, though.
self.cache.insert(key.clone(), Arc::new(value));

// Update the access order.
order.retain(|k| k != &key);
I meant using an already existing LRU implementation from crates.io. The current implementation is too slow, and the operation here is expensive. Plus, using the `Mutex` to manage `order` destroys all the benefits of using `DashMap`.
Without using `DashMap`, we will need a mutable reference to the underlying `LruCache` (from https://crates.io/crates/lru, for example), and for every p2p request that comes through, we will need to pass a mutable reference to the `CachedView`:
fuel-core/crates/services/p2p/src/service.rs
Line 581 in 496071f
let cached_view = self.cached_view.clone();
If you want a lock-free LRU, then we can attach a `nonce` to each element being inserted, and to evict them we can sort, iterate and remove. This would increase the time spent in eviction, though.
we could use something like https://docs.rs/crossbeam-skiplist/latest/crossbeam_skiplist/ to order the insertions of elements
I don't want a lock-free LRU. The `DashMap` has locks inside, and I'm trying to say that using the `Mutex` at the upper level removes the benefits of using `DashMap`, which uses an RW lock inside.
We don't need to reinvent the LRU implementation; we can just reuse an already existing implementation that works with `&self` without requiring us to use a mutex/rwlock explicitly.
replaced dashmap with quick_cache which supports LRU-style eviction (Clock-PRO) - 1bdcf00
Nice stuff!
## Version v0.41.0 ### Added - [2547](#2547): Replace the old Graphql gas price provider adapter with the ArcGasPriceEstimate. - [2445](#2445): Added GQL endpoint for querying asset details. - [2442](#2442): Add uninitialized task for V1 gas price service - [2154](#2154): Added `Unknown` variant to `ConsensusParameters` graphql queries - [2154](#2154): Added `Unknown` variant to `Block` graphql queries - [2154](#2154): Added `TransactionType` type in `fuel-client` - [2321](#2321): New metrics for the TxPool: - The size of transactions in the txpool (`txpool_tx_size`) - The time spent by a transaction in the txpool in seconds (`txpool_tx_time_in_txpool_seconds`) - The number of transactions in the txpool (`txpool_number_of_transactions`) - The number of transactions pending verification before entering the txpool (`txpool_number_of_transactions_pending_verification`) - The number of executable transactions in the txpool (`txpool_number_of_executable_transactions`) - The time it took to select transactions for inclusion in a block in microseconds (`txpool_select_transactions_time_microseconds`) - The time it took to insert a transaction in the txpool in microseconds (`transaction_insertion_time_in_thread_pool_microseconds`) - [2385](#2385): Added new histogram buckets for some of the TxPool metrics, optimize the way they are collected. - [2347](#2364): Add activity concept in order to protect against infinitely increasing DA gas price scenarios - [2362](#2362): Added a new request_response protocol version `/fuel/req_res/0.0.2`. In comparison with `/fuel/req/0.0.1`, which returns an empty response when a request cannot be fulfilled, this version returns more meaningful error codes. Nodes still support the version `0.0.1` of the protocol to guarantee backward compatibility with fuel-core nodes. 
Empty responses received from nodes using the old protocol `/fuel/req/0.0.1` are automatically converted into an error `ProtocolV1EmptyResponse` with error code 0, which is also the only error code implemented. More specific error codes will be added in the future. - [2386](#2386): Add a flag to define the maximum number of file descriptors that RocksDB can use. By default it's half of the OS limit. - [2376](#2376): Add a way to fetch transactions in P2P without specifying a peer. - [2361](#2361): Add caches to the sync service to not reask for data it already fetched from the network. - [2327](#2327): Add more services tests and more checks of the pool. Also add an high level documentation for users of the pool and contributors. - [2416](#2416): Define the `GasPriceServiceV1` task. - [2447](#2447): Use new `expiration` policy in the transaction pool. Add a mechanism to prune the transactions when they expired. - [1922](#1922): Added support for posting blocks to the shared sequencer. - [2033](#2033): Remove `Option<BlockHeight>` in favor of `BlockHeightQuery` where applicable. - [2490](#2490): Added pagination support for the `balances` GraphQL query, available only when 'balances indexation' is enabled. - [2439](#2439): Add gas costs for the two new zk opcodes `ecop` and `eadd` and the benches that allow to calibrate them. - [2472](#2472): Added the `amountU128` field to the `Balance` GraphQL schema, providing the total balance as a `U128`. The existing `amount` field clamps any balance exceeding `U64` to `u64::MAX`. - [2526](#2526): Add possibility to not have any cache set for RocksDB. Add an option to either load the RocksDB columns families on creation of the database or when the column is used. - [2532](#2532): Getters for inner rocksdb database handles. - [2524](#2524): Adds a new lock type which is optimized for certain workloads to the txpool and p2p services. 
- [2535](#2535): Expose `backup` and `restore` APIs on the `CombinedDatabase` struct to create portable backups and restore from them. - [2550](#2550): Add statistics and more limits infos about txpool on the node_info endpoint ### Fixed - [2560](#2560): Fix flaky test by increasing timeout - [2558](#2558): Rename `cost` and `reward` to remove `excess` wording - [2469](#2469): Improved the logic for syncing the gas price database with on_chain database - [2365](#2365): Fixed the error during dry run in the case of race condition. - [2366](#2366): The `importer_gas_price_for_block` metric is properly collected. - [2369](#2369): The `transaction_insertion_time_in_thread_pool_milliseconds` metric is properly collected. - [2413](#2413): block production immediately errors if unable to lock the mutex. - [2389](#2389): Fix construction of reverse iterator in RocksDB. - [2479](#2479): Fix an error on the last iteration of the read and write sequential opcodes on contract storage. - [2478](#2478): Fix proof created by `message_receipts_proof` function by ignoring the receipts from failed transactions to match `message_outbox_root`. - [2485](#2485): Hardcode the timestamp of the genesis block and version of `tai64` to avoid breaking changes for us. - [2511](#2511): Fix backward compatibility of V0Metadata in gas price db. ### Changed - [2469](#2469): Updated adapter for querying costs from DA Block committer API - [2469](#2469): Use the gas price from the latest block to estimate future gas prices - [2501](#2501): Use gas price from block for estimating future gas prices - [2468](#2468): Abstract unrecorded blocks concept for V1 algorithm, create new storage impl. Introduce `TransactionableStorage` trait to allow atomic changes to the storage. - [2295](#2295): `CombinedDb::from_config` now respects `state_rewind_policy` with tmp RocksDB. - [2378](#2378): Use cached hash of the topic instead of calculating it on each publishing gossip message. 
- [2438](#2438): Refactored service to use new implementation of `StorageRead::read` that takes an offset in input. - [2429](#2429): Introduce custom enum for representing result of running service tasks - [2377](#2377): Add more errors that can be returned as responses when using protocol `/fuel/req_res/0.0.2`. The errors supported are `ProtocolV1EmptyResponse` (status code `0`) for converting empty responses sent via protocol `/fuel/req_res/0.0.1`, `RequestedRangeTooLarge`(status code `1`) if the client requests a range of objects such as sealed block headers or transactions too large, `Timeout` (status code `2`) if the remote peer takes too long to fulfill a request, or `SyncProcessorOutOfCapacity` if the remote peer is fulfilling too many requests concurrently. - [2233](#2233): Introduce a new column `modification_history_v2` for storing the modification history in the historical rocksDB. Keys in this column are stored in big endian order. Changed the behaviour of the historical rocksDB to write changes for new block heights to the new column, and to perform lookup of values from the `modification_history_v2` table first, and then from the `modification_history` table, performing a migration upon access if necessary. - [2383](#2383): The `balance` and `balances` GraphQL query handlers now use index to provide the response in a more performant way. As the index is not created retroactively, the client must be initialized with an empty database and synced from the genesis block to utilize it. Otherwise, the legacy way of retrieving data will be used. - [2463](#2463): The `coinsToSpend` GraphQL query handler now uses index to provide the response in a more performant way. As the index is not created retroactively, the client must be initialized with an empty database and synced from the genesis block to utilize it. Otherwise, the legacy way of retrieving data will be used. - [2556](#2556): Ensure that the `last_recorded_height` is set for the DA gas price source. 
#### Breaking - [2469](#2469): Move from `GasPriceServicev0` to `GasPriceServiceV1`. Include new config values. - [2438](#2438): The `fuel-core-client` can only work with new version of the `fuel-core`. The `0.40` and all older versions are not supported. - [2438](#2438): Updated `fuel-vm` to `0.59.1` release. Check [release notes](https://github.com/FuelLabs/fuel-vm/releases/tag/v0.59.0) for more details. - [2389](#2258): Updated the `messageProof` GraphQL schema to return a non-nullable `MessageProof`. - [2154](#2154): Transaction graphql endpoints use `TransactionType` instead of `fuel_tx::Transaction`. - [2446](#2446): Use graphiql instead of graphql-playground due to known vulnerability and stale development. - [2379](#2379): Change `kv_store::Value` to be `Arc<[u8]>` instead of `Arc<Vec<u8>>`. - [2490](#2490): Updated GraphQL complexity calculation for `balances` query to account for pagination (`first`/`last`) and nested field complexity (`child_complexity`). Queries with large pagination values or deeply nested fields may have higher complexity costs. - [2463](#2463): 'CoinsQueryError::MaxCoinsReached` variant has been removed. The `InsufficientCoins` variant has been renamed to `InsufficientCoinsForTheMax` and it now contains the additional `max` field - [2463](#2463): The number of excluded ids in the `coinsToSpend` GraphQL query is now limited to the maximum number of inputs allowed in transaction. - [2463](#2463): The `coinsToSpend` GraphQL query may now return different coins, depending whether the indexation is enabled or not. However, regardless of the differences, the returned coins will accurately reflect the current state of the database within the context of the query. - [2526](#2526): By default the cache of RocksDB is now disabled instead of being `1024 * 1024 * 1024`. 
## What's Changed * Add metrics to TxPool by @acerone85 in #2321 * Fix collection of gas price metric by @rafal-ch in #2366 * Add documentation to run a ignition node in readme by @AurelienFT in #2363 * Fix collection of tx pool insertion time metric by @rafal-ch in #2369 * Add versioning to request response protocols by @acerone85 in #2362 * Return reason of why proof cant be generated by @rafal-ch in #2258 * p2p: use precalculated topic hash by @yaziciahmet in #2378 * Remove ignore RUSTSEC-2024-0336 by @AurelienFT in #2384 * Deal with negative feed back loop in DA gas price by @MitchTurner in #2364 * Add new flag for maximum file descriptors in rocksdb. by @AurelienFT in #2386 * Add codeowners for gas price algorithm crate by @rafal-ch in #2404 * Weekly `cargo update` by @github-actions in #2373 * chore(gas_price_service): initialize v1 metadata by @rymnc in #2288 * chore(gas_price_service_v0): remove unused trait impl by @rymnc in #2410 * Update tai64 to fix the wrong time offset by @AurelienFT in #2409 * fix(block_producer): immediately return error if lock cannot be acquired during production by @rymnc in #2413 * Add a way to fetch transactions in P2P without specifying a peer by @AurelienFT in #2376 * Add a new code owner for tx pool by @AurelienFT in #2417 * Satisfy clippy in `gas-price-analysis` by @rafal-ch in #2418 * Txpool metrics update by @rafal-ch in #2385 * Improve TxPool tests and documentation by @AurelienFT in #2327 * feat(gas_price_service_v1): define RunnableTask for GasPriceServiceV1 by @rymnc in #2416 * Return reason of why proof cant be generated (api change) by @rafal-ch in #2389 * Fuel/Request_Response v0.0.2: More meaningful error messages by @acerone85 in #2377 * Fix reverse iterator in RocksDB by @AurelienFT in #2398 * Add test node herself in reserved nodes. 
by @AurelienFT in #2390 * Weekly `cargo update` by @github-actions in #2424 * Weekly `cargo update` by @github-actions in #2440 * Resolve some falky tests and improve CI times by @AurelienFT in #2401 * feat: handle `Unknown` transactions, blocks and consensus parameters by @hal3e in #2154 * fix(p2p): cache responses to serve without roundtrip to db by @rymnc in #2352 * Replace task `run()` return result with custom enum by @MitchTurner in #2429 * Fix codeowners by @AurelienFT in #2444 * fix(graphql_playground): use graphiql instead by @rymnc in #2446 * Weekly `cargo update` by @github-actions in #2453 * refactor: remove `Option<BlockHeight>` and use new enum where applicable by @matt-user in #2033 * Fixed the error during dry run by @xgreenx in #2365 * Add decompression traits and a test case by @Dentosal in #2295 * Versioned Storage for Modifications History by @acerone85 in #2233 * Allow DA recorded blocks to come out-of-order by @MitchTurner in #2415 * feat: Change `kv_store::Value` to be Arc<[u8]> instead of Arc<Vec<u8>> by @netrome in #2411 * Optimize balance-related queries with a cache by @rafal-ch in #2383 * fix: Add missing features to `fuel-core-tests` by @netrome in #2467 * Keep data in fails cases in sync service by @AurelienFT in #2361 * Weekly `cargo update` by @github-actions in #2470 * Revert balances amount to `U64` and introduce new `amountU128` getter by @rafal-ch in #2472 * Create uninitialized task for v1 gas price service by @MitchTurner in #2442 * Port the 0.40.2 fix of TAI on master by @AurelienFT in #2485 * Ignore RUSTSEC-2024-0421 by @AurelienFT in #2489 * Ignore receipts from failed transactions in `message_receipts_proof` by @AurelienFT in #2478 * Add unrecorded blocks abstraction to gas price algo by @MitchTurner in #2468 * Fix last iteration in sequential opcode by @AurelienFT in #2479 * fix(gas_price_service_v0): bring back removed fields, causing UB when trying to access by @rymnc in #2511 * Refactor fuel-core to use version of 
StorageRead::read with offset (Full update to 0.59.1) by @acerone85 in #2438 * Sync the version of the `fuel-core` with minor hot fixes by @xgreenx in #2516 * fix(docs): typo preventing ci checks from passing by @rymnc in #2525 * Integration test for balances and (non)retryable messages by @rafal-ch in #2505 * Add document for launching Ignition node from source and Local network from source by @AurelienFT in #2502 * Make the rocksdb cache optional in config and add policy for column opening by @AurelienFT in #2526 * Weekly `cargo update` by @github-actions in #2530 * chore(rocksdb): getter for inner database handle by @rymnc in #2532 * Use gas prices from actual blocks to calculate estimate gas prices by @MitchTurner in #2501 * chore(codeowners): gas price service codeowners by @rymnc in #2534 * Add zk opcodes by @AurelienFT in #2439 * Gas price simulation data retriever by @acerone85 in #2533 * Shared sequencer integration by @Dentosal in #1922 * Use expiration policy by @AurelienFT in #2447 * Fixed TPS benchmark to work with latest changes by @xgreenx in #2515 * Use indexation cache to satisfy "coins to spend" queries by @rafal-ch in #2463 * feat(txpool|p2p): use seqlock instead of small copy-able RwLocks by @rymnc in #2524 * Create new index for tracking Asset metadata by @maschad in #2445 * feat(rocksdb): remove getters for internal rocksdb handles, expose `backup` instead by @rymnc in #2535 * Integrate with V1 algo for tests by @MitchTurner in #2469 * Lock-free `latest_l2_height` in gas price service by @rafal-ch in #2546 * chore(gas_price_service_v1): strictly ensure last_recorded_height is set, to avoid initial poll of da source by @rymnc in #2556 * Replace old Graphql Gas Price adapter with new latest gas price struct by @MitchTurner in #2547 * Rename cost and rewards without 'excess' by @MitchTurner in #2558 * Add current pool gas to the node info endpoint by @AurelienFT in #2550 * Pagination queries for `balances` endpoint by @rafal-ch in #2490 * 2559 
Increase timeout for test by @MitchTurner in #2560 * Add test expiration policy in executor by @AurelienFT in #2563 ## New Contributors * @yaziciahmet made their first contribution in #2378 **Full Changelog**: v0.40.0...v0.41.0
Linked Issues/PRs
Description
When we request transactions for a given block range, we shouldn't keep using only the same peer, which puts pressure on it. We should pick a random one with the same height and try to get the transactions from that instead. This PR caches p2p responses (TTL of 10 seconds by default) and serves requests from the cache, falling back to the db for others.
Checklist
Before requesting review
After merging, notify other teams
[Add or remove entries as needed]