
Vertex size spill problem #7

Open
JavierJia opened this issue Oct 19, 2013 · 1 comment

Comments

@JavierJia
Collaborator

The Hyracks frame size should be smaller than 128K; otherwise we run into problems whose cause is unknown.

Our EdgeSet keeps growing during the path merge phase. If we allocate 100K to store the EdgeSet, it can hold only 12800 read IDs.
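A quick back-of-the-envelope check of that figure, assuming each read ID is stored as an 8-byte Java `long` with no per-entry overhead and that "100K" means 100 × 1024 bytes (both are assumptions; the real EdgeSet serialization may add headers). Under those assumptions the numbers line up exactly with the 12800 above.

```java
/**
 * Capacity estimate for an inline read-ID set.
 * Assumption: one read ID == one 8-byte long, no per-entry overhead.
 */
public class EdgeSetCapacity {
    static final int BYTES_PER_READ_ID = Long.BYTES; // 8

    /** Maximum number of read IDs that fit in the given byte budget. */
    static int maxReadIds(int budgetBytes) {
        return budgetBytes / BYTES_PER_READ_ID;
    }

    public static void main(String[] args) {
        int budget = 100 * 1024;     // "100K" read as 100 KiB
        int frameLimit = 128 * 1024; // frame-size ceiling mentioned in the issue
        System.out.println(maxReadIds(budget));     // 12800
        System.out.println(maxReadIds(frameLimit)); // 16384
    }
}
```

Even a whole 128K frame would cap a node at roughly 16K read IDs under this assumption, which is why a size-independent representation is needed.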

Our first plan is to store the sorted read-ID set in HDFS and keep only its filename inside the node. Keeping the IDs sorted will also benefit the union/intersect operations later on.
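To illustrate why sorted order helps: two sorted, duplicate-free read-ID lists can be unioned or intersected with a single two-pointer merge in O(n + m) time, without hashing, and the same loop could in principle consume two file streams sequentially. The sketch below works on in-memory `long[]` arrays for simplicity; reading the sets back from HDFS is not shown, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Two-pointer union/intersection of sorted, duplicate-free read-ID arrays. */
public class SortedReadIdSets {

    /** Sorted, duplicate-free union of a and b. */
    static long[] union(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {            // same ID in both: emit once, advance both
                out[n++] = a[i++];
                j++;
            }
        }
        while (i < a.length) out[n++] = a[i++]; // drain leftovers
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n);
    }

    /** Sorted intersection of a and b. */
    static long[] intersect(long[] a, long[] b) {
        long[] out = new long[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {            // shared ID
                out[n++] = a[i++];
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        long[] x = {1, 4, 7, 9};
        long[] y = {2, 4, 9, 12};
        System.out.println(Arrays.toString(union(x, y)));     // [1, 2, 4, 7, 9, 12]
        System.out.println(Arrays.toString(intersect(x, y))); // [4, 9]
    }
}
```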

ghost assigned JavierJia Oct 19, 2013
@jakebiesinger
Contributor

With a single filename, we don't need to touch HDFS during path merge, remove tips, remove bridge, or remove low coverage. If you kept a set of filenames instead, bubble merge could also run without touching HDFS (this step takes a union of read IDs).
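One way to picture the "set of filenames" idea: the node holds only file names, and a union of two nodes' read-ID sets becomes a union of their filename sets, so no read-ID file has to be opened. Appending a new file covers the add-only case as well. This is a hypothetical sketch (the class, methods, and path strings are invented for illustration) and does not use the HDFS API.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical node-side handle: names of files holding sorted read IDs. */
public class ReadIdFileSet {
    private final TreeSet<String> fileNames = new TreeSet<>();

    ReadIdFileSet(String... names) {
        Collections.addAll(fileNames, names);
    }

    /** Add-only update: record one more file of read IDs. */
    void addFile(String name) {
        fileNames.add(name);
    }

    /** Union by filename: the logical read-ID set is the contents of all listed files. */
    static ReadIdFileSet union(ReadIdFileSet a, ReadIdFileSet b) {
        ReadIdFileSet out = new ReadIdFileSet();
        out.fileNames.addAll(a.fileNames);
        out.fileNames.addAll(b.fileNames);
        return out;
    }

    Set<String> files() {
        return Collections.unmodifiableSet(fileNames);
    }

    public static void main(String[] args) {
        ReadIdFileSet left = new ReadIdFileSet("readids/part-0001");
        ReadIdFileSet right = new ReadIdFileSet("readids/part-0002", "readids/part-0001");
        System.out.println(union(left, right).files()); // [readids/part-0001, readids/part-0002]
    }
}
```

The trade-off is that the same read ID may appear in several files, so whichever step finally reads the sets back still has to merge and deduplicate them, for which sorted files allow a streaming merge.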

Basic scaffolding only needs to add to these sets, which could be done by adding files, just as bubble merge could.

Split repeat needs to read the sets from HDFS and divide them into pieces.
