Question about the partitioner #51

@leo-987

Description

I came across this passage in Learning Spark:

Finally, for binary operations, which partitioner is set on the output depends on the parent RDDs’ partitioners. By default, it is a hash partitioner, with the number of partitions set to the level of parallelism of the operation. However, if one of the parents has a partitioner set, it will be that partitioner; and if both parents have a partitioner set, it will be the partitioner of the first parent.

So the child RDD's partitioner should be determined by the parent RDDs' partitioners. But in chapter 2 of SparkInternals, the parent and child RDDs all have different partitioners. How can that be? If one of the two parent RDDs uses a hash partitioner, shouldn't the child RDD also use a hash partitioner?
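For reference, the selection rule described in the quoted passage can be modeled as follows. This is a minimal sketch in plain Python, not Spark's actual Scala source; each parent is represented directly by its partitioner (or None if it has none), and the names are illustrative:

```python
# Simplified model of the rule quoted above: for a binary operation,
# prefer the first parent's partitioner, then the second's; if neither
# parent has one, fall back to a hash partitioner sized by the
# operation's default parallelism.

class HashPartitioner:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def partition(self, key):
        # Spark hashes the key modulo numPartitions; hash() stands in here.
        return hash(key) % self.num_partitions


def default_partitioner(first_parent, second_parent, default_parallelism):
    """Pick the output partitioner for a binary op such as join/cogroup.

    first_parent / second_parent: the parents' partitioners, or None.
    """
    for parent_partitioner in (first_parent, second_parent):
        if parent_partitioner is not None:
            # The first parent with a partitioner set wins, matching
            # "it will be the partitioner of the first parent".
            return parent_partitioner
    # Neither parent has a partitioner: default to hash partitioning.
    return HashPartitioner(default_parallelism)
```

Under this model, if either parent already has a hash partitioner, the child reuses that exact partitioner object rather than creating a new one; a fresh HashPartitioner is created only when neither parent has one.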
