@terrytlu
Contributor

Fix the issue where, since Spark 3.2.0, reading an HBase snapshot from Spark always returns an empty result, even when the snapshot actually contains data.
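The description does not say why the snapshot splits come back empty. One plausible explanation, offered here as an assumption and not as this PR's stated diagnosis: Spark 3.2.0 enabled `spark.hadoopRDD.ignoreEmptySplits` by default, so any split whose `getLength()` is 0 is dropped before a task is scheduled, and the snapshot input split reported a length of 0. The sketch below models only that filtering step; the `Split` record and `schedulable` method are hypothetical stand-ins, not Spark or HBase APIs.

```java
import java.util.List;
import java.util.stream.Collectors;

public class EmptySplitFilterDemo {
  // Hypothetical stand-in for a Hadoop InputSplit: only the reported length matters here.
  public record Split(String region, long length) {}

  // Models the assumed behaviour: when empty splits are ignored,
  // every split reporting a length of 0 is discarded and never read.
  public static List<Split> schedulable(List<Split> splits, boolean ignoreEmptySplits) {
    if (!ignoreEmptySplits) {
      return splits;
    }
    return splits.stream().filter(s -> s.length() > 0).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumed "before" state: every snapshot split reports length 0, so nothing is read.
    List<Split> before = List.of(new Split("r1", 0), new Split("r2", 0));
    // Assumed "after" state: splits report the region's data size; only truly empty ones drop out.
    List<Split> after = List.of(new Split("r1", 4096), new Split("r2", 0));
    System.out.println(schedulable(before, true).size()); // 0
    System.out.println(schedulable(after, true).size());  // 1
  }
}
```

If that assumption holds, setting `spark.hadoopRDD.ignoreEmptySplits` to `false` on the `SparkConf` would be a possible workaround on HBase versions without this fix.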

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch 3 times, most recently from a96d09e to 5e35c8d on April 29, 2025 13:17

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from 5e35c8d to 79c6087 on May 7, 2025 08:13

public InputSplit(TableDescriptor htd, RegionInfo regionInfo, List<String> locations, Scan scan,
Path restoreDir) {
this(htd, regionInfo, locations, scan, restoreDir, 1);
Member

This doesn't seem quite right here, because SnapshotStats.getStoreFilesSize() would return 0 if the table has no data at all.
What do you think?

Contributor Author

Thanks for reviewing 😃. It shouldn't always be 1 here; let me try to fix it.

Contributor

Do we still want to keep this constructor? The parent class is IA.Private, which means we are free to change anything.

Contributor Author

Let me try to remove it.

Contributor

Are there any difficulties removing this constructor?

SnapshotStats(final Configuration conf, final FileSystem fs, final SnapshotManifest mainfest)
throws CorruptedSnapshotException {
this.snapshot = SnapshotDescriptionUtils.readSnapshotInfo(fs, mainfest.getSnapshotDir());
;
Contributor

Remove this?

Contributor Author

done

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch 2 times, most recently from e45f5fc to 5f5ee39 on May 27, 2025 06:50

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from a4bf605 to 9c3d569 on May 28, 2025 08:22

@terrytlu
Contributor Author

terrytlu commented Jun 4, 2025

Hi @guluo2016 and @Apache9, could you help review this PR again? 🙏 Thanks!

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from 7abf47c to 5aed8c9 on June 24, 2025 07:15

TEST_UTIL.deleteTable(tableName);
admin.deleteSnapshot(snapshotName);
}
}
Contributor Author

Hello @guluo2016, I have seen your comments. I couldn't figure out why the test case failed a few days ago and was busy resolving it 😭. It passes now, so please review the latest changes. The test case now covers two scenarios:

  1. table has no data, and region size is 0
  2. table has some data, and region size is greater than 0
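The two scenarios can be illustrated with a minimal sketch, assuming the split length is the total size of the region's store files (the quantity the earlier review ties to SnapshotStats.getStoreFilesSize()). The `regionSize` helper and the file sizes are hypothetical; the real change works on the snapshot manifest, not on a plain list of longs.

```java
import java.util.List;

public class RegionSizeSketch {
  // Hypothetical helper: a split's length is the sum, in bytes, of its region's store file sizes.
  // A region with no store files therefore reports 0 rather than a hard-coded placeholder.
  public static long regionSize(List<Long> storeFileSizes) {
    long total = 0L;
    for (long size : storeFileSizes) {
      total += size;
    }
    return total;
  }

  public static void main(String[] args) {
    // Scenario 1: the table has no data, so the region has no store files.
    long emptyRegion = regionSize(List.of());
    // Scenario 2: the table has some data (sizes are made up for illustration).
    long loadedRegion = regionSize(List.of(1024L, 2048L));
    System.out.println(emptyRegion == 0); // true
    System.out.println(loadedRegion > 0); // true
  }
}
```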

@guluo2016
Member

LGTM
After applying these changes, I manually tested them and found they met expectations.

The code is as follows (based on the demo in the corresponding Jira):

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <hbase.version>4.0.0-alpha-1-SNAPSHOT</hbase.version>
    <spark.version>3.3.2</spark.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-common</artifactId>
      <version>${hbase.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>${hbase.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-mapreduce</artifactId>
      <version>${hbase.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>
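package com.test;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) throws IOException {
        // Connection settings for the local HBase instance used in this test.
        Configuration hconf = HBaseConfiguration.create();
        hconf.set("hbase.rootdir", "file:///opt/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/hbase");
        hconf.set("hbase.zookeeper.quorum", "127.0.0.1");
        hconf.set("zookeeper.znode.parent", "/hbase");
        SparkConf sparkConf = new SparkConf().setAppName("HbaseSnapshot").setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(sparkConf)) {
            // Only read the "info" column family; the scan is serialized into the job configuration.
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("info"));
            hconf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
            Job job = Job.getInstance(hconf);
            // Temporary directory that the snapshot is restored into before it is read.
            Path path = new Path("file:///opt/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/snapshot");
            String snapName = "t01_snap";
            TableSnapshotInputFormat.setInput(job, snapName, path);
            JavaPairRDD<ImmutableBytesWritable, Result> newAPIHadoopRDD = sc.newAPIHadoopRDD(job.getConfiguration(),
                    TableSnapshotInputFormat.class, ImmutableBytesWritable.class, Result.class);

            // Print every cell value (runs on the executors; visible on stdout in local mode).
            newAPIHadoopRDD.foreach(tuple2 -> {
                Result result = tuple2._2();
                List<Cell> cells = result.listCells();
                for (Cell cell : cells) {
                    System.out.println("The cell data is " + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            });
            System.out.println("newAPIHadoopRDD row count" + newAPIHadoopRDD.count());
        }
    }
}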

The execution results are as follows.

[root@localhost check_hbase_data]# java --add-exports java.base/sun.nio.ch=ALL-UNNAMED  -cp lib/*:target/check_hbase_data-1.0.jar com.test.App
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/code/check_hbase_data/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/code/check_hbase_data/lib/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The cell data is vvvvvv
newAPIHadoopRDD row count1

@guluo2016
Member

@Apache9 Do you have any questions?

@terrytlu
Contributor Author

terrytlu commented Jul 1, 2025

@Apache9 Please help review the latest changes in this PR. Thanks very much 🙏

this.delegate = delegate;
}

public TableSnapshotRegionSplit(TableDescriptor htd, RegionInfo regionInfo,
Contributor

The class is marked IA.Public, so you cannot delete a public method from it directly. You need to deprecate it for a whole major release cycle before deleting it.

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from 7ba8c67 to d89bc0e on July 4, 2025 10:24

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from d89bc0e to f501f83 on July 4, 2025 11:53

this.delegate = delegate;
}

@Deprecated
Contributor

Please add javadoc with a @deprecated tag to specify the life cycle of these APIs. You can find some examples in the current code base. Please also add some docs explaining why they are deprecated.
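A minimal sketch of the requested pattern: the old public constructor stays, carries `@Deprecated` plus a javadoc `@deprecated` tag stating when it was deprecated, when it will be removed, why, and what to use instead, and delegates to the new constructor. `RegionSplit` is a hypothetical simplified class, and the version numbers 3.0.0/4.0.0 are placeholders, not the releases this PR targets; the JIRA link is derived from the branch name.

```java
public class DeprecationSketch {

  // Hypothetical, simplified split type used only to illustrate the deprecation pattern;
  // the real HBase split also carries a TableDescriptor, RegionInfo, Scan and restore dir.
  public static final class RegionSplit {
    private final String regionName;
    private final long length;

    /**
     * Creates a split whose length is unknown.
     *
     * @param regionName the region this split covers
     * @deprecated since 3.0.0 and will be removed in 4.0.0 (placeholder versions). A split
     *             created this way does not report the size of the region's data, so callers
     *             relying on {@code getLength()} cannot tell empty regions from non-empty ones.
     *             Use {@link #RegionSplit(String, long)} instead.
     * @see <a href="https://issues.apache.org/jira/browse/HBASE-29272">HBASE-29272</a>
     */
    @Deprecated
    public RegionSplit(String regionName) {
      // Keeps the legacy behaviour of reporting 0 for an unknown length.
      this(regionName, 0L);
    }

    /**
     * @param regionName the region this split covers
     * @param length the total size, in bytes, of the region's store files
     */
    public RegionSplit(String regionName, long length) {
      this.regionName = regionName;
      this.length = length;
    }

    public String getRegionName() {
      return regionName;
    }

    public long getLength() {
      return length;
    }
  }

  public static void main(String[] args) {
    @SuppressWarnings("deprecation")
    RegionSplit legacy = new RegionSplit("r1");
    RegionSplit sized = new RegionSplit("r1", 4096L);
    System.out.println(legacy.getLength()); // 0
    System.out.println(sized.getLength());  // 4096
  }
}
```

Delegating from the deprecated constructor keeps a single initialization path, so removing it later is a one-method deletion.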


@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for branch
+1 💚 mvninstall 3m 9s master passed
+1 💚 compile 3m 56s master passed
+1 💚 checkstyle 0m 49s master passed
+1 💚 spotbugs 2m 4s master passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 8s the patch passed
+1 💚 compile 3m 56s the patch passed
-0 ⚠️ javac 0m 35s /results-compile-javac-hbase-mapreduce.txt hbase-mapreduce generated 1 new + 197 unchanged - 1 fixed = 198 total (was 198)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 36s /results-checkstyle-hbase-server.txt hbase-server: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)
+1 💚 spotbugs 2m 19s the patch passed
+1 💚 hadoopcheck 12m 7s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 45s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 17s The patch does not generate ASF License warnings.
42m 52s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/20/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6947
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 7ee139ffb707 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f501f83
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-server hbase-mapreduce U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/20/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 37s Maven dependency ordering for branch
+1 💚 mvninstall 3m 31s master passed
+1 💚 compile 1m 24s master passed
+1 💚 javadoc 0m 43s master passed
+1 💚 shadedjars 6m 11s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 8s the patch passed
+1 💚 compile 1m 19s the patch passed
+1 💚 javac 1m 19s the patch passed
+1 💚 javadoc 0m 41s the patch passed
+1 💚 shadedjars 6m 6s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 215m 30s hbase-server in the patch passed.
+1 💚 unit 21m 38s hbase-mapreduce in the patch passed.
266m 32s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/20/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6947
Optional Tests javac javadoc unit compile shadedjars
uname Linux 97a738631fd2 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f501f83
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/20/testReport/
Max. process+thread count 5049 (vs. ulimit of 30000)
modules C: hbase-server hbase-mapreduce U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/20/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented Dec 4, 2025

Any updates here?

@terrytlu
Contributor Author

terrytlu commented Dec 4, 2025

@Apache9 Sorry, I forgot about it. I will submit a new commit ASAP.


4 participants