HBASE-29272 When Spark reads an HBase snapshot, it always read empty … #6947
base: master
Conversation
...-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java
Force-pushed from a96d09e to 5e35c8d
Force-pushed from 5e35c8d to 79c6087
```java
public InputSplit(TableDescriptor htd, RegionInfo regionInfo, List<String> locations, Scan scan,
    Path restoreDir) {
  this(htd, regionInfo, locations, scan, restoreDir, 1);
```
This doesn't seem quite right here, because SnapshotStats.getStoreFilesSize() would return 0 if the table has no data.
What do you think?
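To make the concern concrete, here is a minimal, self-contained sketch (plain Java, no HBase dependencies; `regionSize` is a hypothetical stand-in for however the split length ends up being computed, not the actual HBase code): hardcoding the length to 1 hides the difference between empty and non-empty regions, while deriving it from store file sizes preserves it.

```java
import java.util.List;

public class SplitLengthSketch {
  // Hypothetical stand-in for a split-length calculation: sum the store
  // file sizes of a region instead of hardcoding 1.
  static long regionSize(List<Long> storeFileSizes) {
    long total = 0;
    for (long size : storeFileSizes) {
      total += size;
    }
    return total;
  }

  public static void main(String[] args) {
    // A region with data reports a positive size (1024 + 2048 = 3072)...
    System.out.println(regionSize(List.of(1024L, 2048L)));
    // ...while a region of an empty table reports 0, which is exactly the
    // getStoreFilesSize() == 0 case mentioned above.
    System.out.println(regionSize(List.of()));
  }
}
```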
Thanks for reviewing 😃, it shouldn't always be 1 here; let me try to fix it.
Do we still want to keep this constructor? The parent class is IA.Private, which means we are free to change anything.
Let me try to remove it.
Are there any difficulties removing this constructor?
```java
SnapshotStats(final Configuration conf, final FileSystem fs, final SnapshotManifest mainfest)
    throws CorruptedSnapshotException {
  this.snapshot = SnapshotDescriptionUtils.readSnapshotInfo(fs, mainfest.getSnapshotDir());
  ;
```
Remove this?
done
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotRegionSizeCalculator.java
Force-pushed from e45f5fc to 5f5ee39
Force-pushed from a4bf605 to 9c3d569
Hi @guluo2016 and @Apache9, could you help review this PR again? 🙏 Thanks
...-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestSnapshotRegionSizeCalculator.java
Force-pushed from 7abf47c to 5aed8c9
```java
    TEST_UTIL.deleteTable(tableName);
    admin.deleteSnapshot(snapshotName);
  }
}
```
Hello @guluo2016, I have seen your comments. I don't know why the test case didn't pass a few days ago; I was busy resolving it 😭... Now it is okay, please review the latest changes. The test case covers two scenarios:
- the table has no data, and the region size is 0
- the table has some data, and the region size is greater than 0
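The two scenarios can be sketched without a mini-cluster as plain assertions (this is a hypothetical, framework-free stand-in; the actual test, TestSnapshotRegionSizeCalculator, runs against real snapshots):

```java
import java.util.List;

public class TwoScenarioSketch {
  // Hypothetical stand-in for the region size a calculator would report:
  // the total bytes across a region's store files.
  static long regionSize(List<Long> storeFileSizes) {
    return storeFileSizes.stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    // Scenario 1: table has no data, so the region size must be 0.
    System.out.println(regionSize(List.of()) == 0);
    // Scenario 2: table has some data, so the region size must be > 0.
    System.out.println(regionSize(List.of(4096L)) > 0);
  }
}
```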
LGTM

The code is as follows (referencing the demo in the corresponding Jira):

```xml
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>17</maven.compiler.source>
  <maven.compiler.target>17</maven.compiler.target>
  <hbase.version>4.0.0-alpha-1-SNAPSHOT</hbase.version>
  <spark.version>3.3.2</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-mapreduce</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```

```java
public class App {
  public static void main(String[] args) throws IOException {
    Configuration hconf = HBaseConfiguration.create();
    hconf.set("hbase.rootdir", "file:///opt/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/hbase");
    hconf.set("hbase.zookeeper.quorum", "127.0.0.1");
    hconf.set("zookeeper.znode.parent", "/hbase");
    SparkConf sparkConf = new SparkConf().setAppName("HbaseSnapshot").setMaster("local[*]");
    try (JavaSparkContext sc = new JavaSparkContext(sparkConf)) {
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("info"));
      hconf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
      Job job = Job.getInstance(hconf);
      Path path = new Path("file:///opt/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/snapshot");
      String snapName = "t01_snap";
      TableSnapshotInputFormat.setInput(job, snapName, path);
      JavaPairRDD<ImmutableBytesWritable, Result> newAPIHadoopRDD = sc.newAPIHadoopRDD(job.getConfiguration(),
        TableSnapshotInputFormat.class, ImmutableBytesWritable.class, Result.class);
      newAPIHadoopRDD.foreach(tuple2 -> {
        Result result = tuple2._2();
        List<Cell> cells = result.listCells();
        for (Cell cell : cells) {
          System.out.println("The cell data is " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
      });
      System.out.println("newAPIHadoopRDD row count" + newAPIHadoopRDD.count());
    }
  }
}
```

The execution results are as follows:

```
[root@localhost check_hbase_data]# java --add-exports java.base/sun.nio.ch=ALL-UNNAMED -cp lib/*:target/check_hbase_data-1.0.jar com.test.App
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/code/check_hbase_data/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/code/check_hbase_data/lib/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The cell data is vvvvvv
newAPIHadoopRDD row count1
```
@Apache9 Do you have any questions?
@Apache9 Please help review the latest change of this PR, thanks very much 🙏
```java
  this.delegate = delegate;
}

public TableSnapshotRegionSplit(TableDescriptor htd, RegionInfo regionInfo,
```
The class is marked as IA.Public, so you cannot delete a public method from it directly. You need to mark it deprecated for a whole major release cycle before deleting it.
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotRegionSizeCalculator.java
Force-pushed from 7ba8c67 to d89bc0e
Force-pushed from d89bc0e to f501f83
```java
  this.delegate = delegate;
}

@Deprecated
```
Please add javadoc and a deprecated tag to specify the life cycle for these APIs. You can find some examples in the current code base. And please also add some docs to explain why it is deprecated.
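As a hedged sketch of the convention being asked for, the shape usually looks like the following (the version numbers, method names, and javadoc wording here are illustrative only, not copied from the HBase code base):

```java
public class DeprecationSketch {
  /**
   * Old entry point kept for source compatibility.
   *
   * @deprecated since 4.0.0, will be removed in a later major release. The
   *             split length should be supplied explicitly, so use
   *             {@link #withLength(long)} instead. (Illustrative versions
   *             and names, not from the real code base.)
   */
  @Deprecated
  public static long withDefaultLength() {
    return withLength(1L);
  }

  // Replacement API that takes an explicit length.
  public static long withLength(long length) {
    return length;
  }

  public static void main(String[] args) {
    // The deprecated method keeps working during the deprecation window,
    // delegating to the replacement.
    System.out.println(withDefaultLength());
  }
}
```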
```java
public InputSplit(TableDescriptor htd, RegionInfo regionInfo, List<String> locations, Scan scan,
    Path restoreDir) {
  this(htd, regionInfo, locations, scan, restoreDir, 1);
```
Are there any difficulties removing this constructor?
🎊 +1 overall
This message was automatically generated.
Any updates here?
@Apache9 Sorry, I forgot about it. I will submit a new commit ASAP...
Fix the issue that, after Spark 3.2.0, when Spark reads an HBase snapshot it always reads empty data, even if the HBase snapshot actually has data.