What is the usage of -spill_path? #1628
Comments
Hi, @qiranq99. Thanks for the report.
Yes, you are right. The spill mechanism is for vineyard server. In theory, the total amount of objects that vineyard server can store is at least
When using the You can create another client (note: avoid the client cache here) and put objects into vineyard; the previous objects will then be serialized under the spill path. You may also get some inspiration from the spill test: https://github.com/v6d-io/v6d/blob/main/test/spill_test.cc BTW, could you please share your use case here, so that we can make the spill mechanism better?
@dashanji Sorry for not making it clear. The scenario is like:
With the periodical |
Yes, you are right. The spill mechanism moves the additional data to storage and reloads it into memory when necessary. The spill also has two watermarks:
python3 -m vineyard --help
vineyardd: Usage: vineyardd [options]
...
-spill_lower_rate (low watermark of triggering memory spilling)
type: double default: 0.29999999999999999
-spill_path (path to spill temporary files, if not set, spilling will be
disabled) type: string default: ""
-spill_upper_rate (high watermark of triggering memory spilling)
type: double default: 0.80000000000000004
...
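To make the two watermarks concrete, here is a minimal toy model in Python. It is only a sketch of how a high/low watermark pair usually behaves, assuming that crossing `-spill_upper_rate` triggers spilling and that spilling continues until usage drops to `-spill_lower_rate`. The class `WatermarkStore`, its oldest-first eviction order, and the dict standing in for the spill directory are all hypothetical; the thread does not confirm which objects vineyardd actually picks or in what order.

```python
from collections import OrderedDict


class WatermarkStore:
    """Toy store (hypothetical, not vineyard's implementation)."""

    def __init__(self, capacity, upper_rate=0.8, lower_rate=0.3):
        self.upper = upper_rate * capacity  # high watermark, in bytes
        self.lower = lower_rate * capacity  # low watermark, in bytes
        self.in_memory = OrderedDict()      # object id -> size, oldest first
        self.spilled = {}                   # stands in for files under the spill path

    def used(self):
        return sum(self.in_memory.values())

    def put(self, object_id, size):
        self.in_memory[object_id] = size
        if self.used() > self.upper:
            # Assumed policy: spill the oldest objects until usage
            # is at or below the low watermark.
            while self.in_memory and self.used() > self.lower:
                oldest_id, oldest_size = self.in_memory.popitem(last=False)
                self.spilled[oldest_id] = oldest_size


store = WatermarkStore(capacity=100)
for i in range(10):
    store.put(f"obj{i}", 12)

print("in memory:", list(store.in_memory))
print("spilled:  ", sorted(store.spilled))
```

In this model nothing is spilled until usage passes 80% of capacity, and a single spill pass then frees memory down to about 30%, which is why the two rates are configured separately.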
Hi @dashanji Please try the following use case with
I would expect Not sure whether it's a bug or not. |
@qiranq99 It's not a bug. As you set the
To test the spill, you can try the following code.
import vineyard
import numpy as np

data = np.random.rand(1000, 1000, 1000)  # ~7.4G each data object
for epoch in range(10):
    # open a fresh client for each epoch
    client = vineyard.connect("Use the default vineyard socket here")
    # one client can't put more than 50G data
    for batch in range(5):
        client.put(data)
    client.close()
Then you will find the serialized objects in the spill path.
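A quick way to check whether anything was actually spilled is to count the files under the spill directory. The helper below is a minimal sketch using only the standard library; `summarize_dir` is a hypothetical name, and the layout of the files vineyardd writes is not described in this thread, so it simply walks the whole directory. It also reproduces the size estimate from the comment in the example above.

```python
import os


def summarize_dir(path):
    """Return (file_count, total_bytes) for all regular files under path."""
    count, total = 0, 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                count += 1
                total += os.path.getsize(full)
    return count, total


# 1000 * 1000 * 1000 float64 values at 8 bytes each
object_bytes = 1000 ** 3 * 8
print(f"one object: {object_bytes / 2**30:.2f} GiB")
```

Calling `summarize_dir` on the directory passed to `-spill_path` before and after running the loop shows whether the number of files and their total size have grown.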
@dashanji Theoretically, if
Basically, by utilizing memory spilling, we want |
Ideally, the provided example should work. I'm not sure whether a regression bug is preventing the spill from being triggered here. We'll double-check. The memory usage might exceed
There are indeed some lifetime issues that require further investigation.
Hi!
From the information provided by python3 -m vineyard --help, the option -spill_path <path> seems to allow spilling data to disk storage if there is insufficient memory. However, with a server started via python3 -m vineyard -spill_path /.../... the program reports not-enough-memory once the allocated shared memory has reached maximum capacity, instead of trying to spill the data.
Therefore, I'm wondering about the actual usage of -spill_path, or any alternatives for spilling the memory.
Cheers :)