Vertex size spill problem #7

JavierJia · 2013-10-19T04:44:58Z

The Hyracks frame size should be smaller than 128K; otherwise it runs into problems whose cause is still unknown.

Our EdgeSet keeps growing during the path merge phase. If we use 100K to store the EdgeSet, it can hold only 12800 readids (which works out to 8 bytes per readid).
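To make the arithmetic explicit, here is a quick sketch. The 8-byte readid width is an assumption inferred from the numbers above (100K / 12800), and the function name is made up for illustration.

```python
FRAME_BUDGET_BYTES = 100 * 1024  # the 100K budget mentioned above
READ_ID_BYTES = 8                # assumption: one readid is an 8-byte long


def max_read_ids(budget_bytes: int, id_bytes: int = READ_ID_BYTES) -> int:
    """Number of fixed-width readids that fit in the given byte budget."""
    return budget_bytes // id_bytes


print(max_read_ids(FRAME_BUDGET_BYTES))  # 12800
```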

The first plan is to store the sorted id set in HDFS and keep only the filename inside the node. The sorted order will also benefit the union/intersect operations afterward.
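A rough sketch of why sorted order helps: with both id sets sorted, union and intersect become single streaming passes that never need a whole set in memory. This is only an illustration, not the project's code. It uses local files as a stand-in for HDFS, assumes 8-byte big-endian ids, and all function names are hypothetical.

```python
import heapq
import struct
from typing import Iterable, Iterator

ID_FORMAT = ">q"  # assumption: each readid is a big-endian signed 8-byte integer
ID_SIZE = struct.calcsize(ID_FORMAT)


def write_sorted_ids(path: str, ids: Iterable[int]) -> None:
    """Write the ids to a file, deduplicated and in ascending order."""
    with open(path, "wb") as f:
        for rid in sorted(set(ids)):
            f.write(struct.pack(ID_FORMAT, rid))


def read_ids(path: str) -> Iterator[int]:
    """Stream ids back from a file one at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(ID_SIZE)
            if len(chunk) < ID_SIZE:
                break
            yield struct.unpack(ID_FORMAT, chunk)[0]


def union_sorted(a: Iterable[int], b: Iterable[int]) -> Iterator[int]:
    """Merge two ascending streams, dropping duplicates."""
    last = None
    for rid in heapq.merge(a, b):
        if rid != last:
            yield rid
            last = rid


def intersect_sorted(a: Iterable[int], b: Iterable[int]) -> Iterator[int]:
    """Yield ids present in both ascending, duplicate-free streams."""
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None:
        if x == y:
            yield x
            x, y = next(it_a, None), next(it_b, None)
        elif x < y:
            x = next(it_a, None)
        else:
            y = next(it_b, None)
```

Both operations hold one id per input in memory at a time, so the size of the sets no longer depends on the frame size.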

jakebiesinger · 2013-10-21T18:22:47Z

With a single filename, we don't need to touch HDFS during path merge, remove tips, remove bridge, or remove low coverage. If you kept a set of filenames, you could also do bubble merge without touching HDFS (this step takes a union of readids).
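Roughly what I have in mind for the set-of-filenames idea, as a sketch only. `EdgeSetRef` and its fields are hypothetical names, not existing classes in the codebase. The union just combines filename lists and never opens a file, so the files' contents may overlap and whoever finally reads them would have to deduplicate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EdgeSetRef:
    """Hypothetical node-side handle: filenames of sorted readid files."""

    files: Tuple[str, ...] = ()

    def union(self, other: "EdgeSetRef") -> "EdgeSetRef":
        # Combine the filename lists, keeping first-seen order and
        # dropping repeated names. No file is read here.
        return EdgeSetRef(tuple(dict.fromkeys(self.files + other.files)))


a = EdgeSetRef(("node12.ids",))
b = EdgeSetRef(("node40.ids", "node12.ids"))
print(a.union(b).files)  # ('node12.ids', 'node40.ids')
```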

Our basic scaffolding only needs to add to them, which could be done by adding files, the same way bubble merge could.

Split Repeat needs to read the sets (on HDFS) and divide them into pieces.

ghost assigned JavierJia Oct 19, 2013
