HBASE-29272 When Spark reads an HBase snapshot, it always read empty … #6947
base: master
Conversation
...-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java
Force-pushed from a96d09e to 5e35c8d
Force-pushed from 5e35c8d to 79c6087
```java
public InputSplit(TableDescriptor htd, RegionInfo regionInfo, List<String> locations, Scan scan,
    Path restoreDir) {
  this(htd, regionInfo, locations, scan, restoreDir, 1);
```
This doesn't seem quite right here, because SnapshotStats.getStoreFilesSize() would return 0 if the table has no data.
What do you think?
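To make the concern concrete, here is a minimal, self-contained sketch (plain Java, no HBase dependencies; `regionSize` is a hypothetical stand-in for however the split length ends up being computed, not the actual HBase code): hardcoding the length to 1 hides the difference between empty and non-empty regions, while deriving it from store file sizes preserves it.

```java
import java.util.List;

public class SplitLengthSketch {
  // Hypothetical stand-in for a split-length calculation: sum the store
  // file sizes of a region instead of hardcoding 1.
  static long regionSize(List<Long> storeFileSizes) {
    long total = 0;
    for (long size : storeFileSizes) {
      total += size;
    }
    return total;
  }

  public static void main(String[] args) {
    // A region with data reports a positive size (1024 + 2048 = 3072)...
    System.out.println(regionSize(List.of(1024L, 2048L)));
    // ...while a region of an empty table reports 0, which is exactly the
    // getStoreFilesSize() == 0 case mentioned above.
    System.out.println(regionSize(List.of()));
  }
}
```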
Thanks for reviewing 😃, it shouldn't always be 1 here; let me try to fix it.
Do we still want to keep this constructor? The parent class is IA.Private, which means we are free to change anything.
Let me try to remove it.
Are there any difficulties removing this constructor?
```java
SnapshotStats(final Configuration conf, final FileSystem fs, final SnapshotManifest mainfest)
    throws CorruptedSnapshotException {
  this.snapshot = SnapshotDescriptionUtils.readSnapshotInfo(fs, mainfest.getSnapshotDir());
  ;
```
Remove this?
done
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotRegionSizeCalculator.java
Force-pushed from e45f5fc to 5f5ee39
Force-pushed from a4bf605 to 9c3d569
Hi @guluo2016 and @Apache9, could you help review this PR again? 🙏 Thanks
...-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestSnapshotRegionSizeCalculator.java
Force-pushed from 7abf47c to 5aed8c9
```java
    TEST_UTIL.deleteTable(tableName);
    admin.deleteSnapshot(snapshotName);
  }
}
```
Hello @guluo2016, I have seen your comments. I don't know why the test case didn't pass a few days ago; I was busy resolving it 😭... Now it is okay, please review the latest changes. The test case covers two scenarios:
- the table has no data, and the region size is 0
- the table has some data, and the region size is greater than 0
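The two scenarios can be sketched without a mini-cluster as plain assertions (this is a hypothetical, framework-free stand-in; the actual test, TestSnapshotRegionSizeCalculator, runs against real snapshots):

```java
import java.util.List;

public class TwoScenarioSketch {
  // Hypothetical stand-in for the region size a calculator would report:
  // the total bytes across a region's store files.
  static long regionSize(List<Long> storeFileSizes) {
    return storeFileSizes.stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    // Scenario 1: table has no data, so the region size must be 0.
    System.out.println(regionSize(List.of()) == 0);
    // Scenario 2: table has some data, so the region size must be > 0.
    System.out.println(regionSize(List.of(4096L)) > 0);
  }
}
```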
LGTM

The code is as follows (referencing the demo in the corresponding Jira):

```xml
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>17</maven.compiler.source>
  <maven.compiler.target>17</maven.compiler.target>
  <hbase.version>4.0.0-alpha-1-SNAPSHOT</hbase.version>
  <spark.version>3.3.2</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-mapreduce</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```

```java
public class App {
  public static void main(String[] args) throws IOException {
    Configuration hconf = HBaseConfiguration.create();
    hconf.set("hbase.rootdir", "file:///opt/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/hbase");
    hconf.set("hbase.zookeeper.quorum", "127.0.0.1");
    hconf.set("zookeeper.znode.parent", "/hbase");
    SparkConf sparkConf = new SparkConf().setAppName("HbaseSnapshot").setMaster("local[*]");
    try (JavaSparkContext sc = new JavaSparkContext(sparkConf)) {
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("info"));
      hconf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
      Job job = Job.getInstance(hconf);
      Path path = new Path("file:///opt/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/snapshot");
      String snapName = "t01_snap";
      TableSnapshotInputFormat.setInput(job, snapName, path);
      JavaPairRDD<ImmutableBytesWritable, Result> newAPIHadoopRDD = sc.newAPIHadoopRDD(job.getConfiguration(),
        TableSnapshotInputFormat.class, ImmutableBytesWritable.class, Result.class);
      newAPIHadoopRDD.foreach(tuple2 -> {
        Result result = tuple2._2();
        List<Cell> cells = result.listCells();
        for (Cell cell : cells) {
          System.out.println("The cell data is " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
      });
      System.out.println("newAPIHadoopRDD row count" + newAPIHadoopRDD.count());
    }
  }
}
```

The execution results are as follows:

```
[root@localhost check_hbase_data]# java --add-exports java.base/sun.nio.ch=ALL-UNNAMED -cp lib/*:target/check_hbase_data-1.0.jar com.test.App
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/code/check_hbase_data/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/code/check_hbase_data/lib/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The cell data is vvvvvv
newAPIHadoopRDD row count1
```
@Apache9 Do you have any questions?
@Apache9 Please help review the latest change of this PR, thanks very much 🙏
```java
  this.delegate = delegate;
}

public TableSnapshotRegionSplit(TableDescriptor htd, RegionInfo regionInfo,
```
The class is marked as IA.Public, so you cannot delete a public method from it directly. You need to mark it deprecated for a whole major release cycle before deleting it.
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotRegionSizeCalculator.java
Force-pushed from 7ba8c67 to d89bc0e
Force-pushed from d89bc0e to f501f83
```java
  this.delegate = delegate;
}

@Deprecated
```
Please add javadoc and a deprecated tag to specify the life cycle for these APIs. You can find some examples in the current code base. And please also add some docs to explain why it is deprecated.
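As a hedged sketch of the convention being asked for, the shape usually looks like the following (the version numbers, method names, and javadoc wording here are illustrative only, not copied from the HBase code base):

```java
public class DeprecationSketch {
  /**
   * Old entry point kept for source compatibility.
   *
   * @deprecated since 4.0.0, will be removed in a later major release. The
   *             split length should be supplied explicitly, so use
   *             {@link #withLength(long)} instead. (Illustrative versions
   *             and names, not from the real code base.)
   */
  @Deprecated
  public static long withDefaultLength() {
    return withLength(1L);
  }

  // Replacement API that takes an explicit length.
  public static long withLength(long length) {
    return length;
  }

  public static void main(String[] args) {
    // The deprecated method keeps working during the deprecation window,
    // delegating to the replacement.
    System.out.println(withDefaultLength());
  }
}
```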
```java
public InputSplit(TableDescriptor htd, RegionInfo regionInfo, List<String> locations, Scan scan,
    Path restoreDir) {
  this(htd, regionInfo, locations, scan, restoreDir, 1);
```
Are there any difficulties removing this constructor?
🎊 +1 overall
This message was automatically generated.
Any updates here?
@Apache9 Sorry, I forgot about it. I will submit a new commit ASAP...
Fix the issue that, after Spark 3.2.0, when Spark reads an HBase snapshot it always reads empty data, even if the HBase snapshot actually has data.