TEZ-3331: Add operation specific HDFS counters for Tez UI #379

Merged: 3 commits merged on Dec 13, 2024

@abstractdog abstractdog commented Nov 7, 2024

This is a rebased, reworked version of the last patch on TEZ-3331:
https://issues.apache.org/jira/secure/attachment/12926702/TEZ-3331.8.patch

@tez-yetus

This comment was marked as outdated.

@abstractdog abstractdog requested a review from ayushtkn November 21, 2024 07:19
@abstractdog
Contributor Author

@ayushtkn, can you please review this one? We've been using this downstream for a long time but haven't ported it back upstream yet.

Member

@ayushtkn ayushtkn left a comment

Thanx Laszlo for the PR, minor stuff rest looks good

FILE_BYTES_READ("fileBytesRead"),
FILE_BYTES_WRITTEN("fileBytesWritten"),

// Additional counters from HADOOP-13305
Member

There seem to be some more counters as well, like OP_CREATE_NON_RECURSIVE, OP_EXISTS, OP_IS_FILE etc.
Should we include them as well?

Contributor Author

yeah, I'm refreshing it according to the current state of CommonStatisticNames

Comment on lines 82 to 87
if (!statisticUpdaters.containsKey(stats.getScheme())) {
Map<String, FileSystemStatisticUpdater> updaterSet = new TreeMap<>();
statisticUpdaters.put(stats.getScheme(), updaterSet);
}
FileSystemStatisticUpdater updater = statisticUpdaters.get(stats.getScheme())
.computeIfAbsent(stats.getName(), k -> new FileSystemStatisticUpdater(tezCounters, stats));
Member

This looks like a case for computeIfAbsent on the outer map as well; maybe we can do:

    
    // Fetch or initialize the updater set for the scheme
    Map<String, FileSystemStatisticUpdater> updaterSet = statisticUpdaters
            .computeIfAbsent(stats.getScheme(), k -> new TreeMap<>());
    
    // Fetch or create the updater for the specific statistic
    FileSystemStatisticUpdater updater = updaterSet
            .computeIfAbsent(stats.getName(), k -> new FileSystemStatisticUpdater(tezCounters, stats));

Contributor Author

absolutely, thanks!
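
For readers following the thread, here is a minimal, self-contained sketch of the two-level computeIfAbsent pattern suggested above. It is not the Tez code: `Updater` is a hypothetical stand-in for FileSystemStatisticUpdater, and the scheme and statistics names are made-up sample strings.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedComputeIfAbsentDemo {
  /** Hypothetical stand-in for FileSystemStatisticUpdater; not the Tez class. */
  static final class Updater {
    final String scheme;
    final String name;

    Updater(String scheme, String name) {
      this.scheme = scheme;
      this.name = name;
    }
  }

  // scheme -> (statistics name -> updater)
  private final Map<String, Map<String, Updater>> updaters = new HashMap<>();

  /** Fetches the updater for (scheme, name), creating both map levels on demand. */
  Updater updaterFor(String scheme, String name) {
    return updaters
        .computeIfAbsent(scheme, k -> new HashMap<>())
        .computeIfAbsent(name, k -> new Updater(scheme, name));
  }

  int schemeCount() {
    return updaters.size();
  }

  public static void main(String[] args) {
    NestedComputeIfAbsentDemo demo = new NestedComputeIfAbsentDemo();
    Updater first = demo.updaterFor("hdfs", "stats-a");
    Updater again = demo.updaterFor("hdfs", "stats-a");
    Updater other = demo.updaterFor("hdfs", "stats-b");
    System.out.println(first == again);     // same instance is reused
    System.out.println(first == other);     // different statistics name, different updater
    System.out.println(demo.schemeCount()); // both live under the single "hdfs" scheme
  }
}
```

The point of the pattern is that the explicit containsKey/put pair disappears: each level is created only when its key is missing, and repeated lookups return the existing instance.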


private static MiniDFSCluster dfsCluster;

private static Configuration conf = new Configuration();
Member

can be final

public static void setup() throws IOException {
try {
conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, TEST_ROOT_DIR);
dfsCluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).format(true).racks(null)
Member

.format(true).racks(null) ain't required IMO, they are by default true & null

Comment on lines 77 to 79
FSDataOutputStream out = remoteFs.create(new Path("/tmp/foo/abc.txt"));
out.writeBytes("xyz");
out.close();
Member

can use this

    DFSTestUtil.writeFile(remoteFs, new Path("/tmp/foo/abc.txt"), "xyz");

and below as well

Contributor Author

@abstractdog abstractdog Nov 24, 2024

I can see there is a difference between how these approaches work, which is reflected in the counters
no matter which one we choose, the counters can be asserted accordingly

original approach (mkdir, create)

		HDFS_BYTES_WRITTEN=3
		HDFS_WRITE_OPS=2
		HDFS_OP_CREATE=1
		HDFS_OP_MKDIRS=1

DFSTestUtil.writeFile

		HDFS_BYTES_WRITTEN=3
		HDFS_READ_OPS=1
		HDFS_WRITE_OPS=1
		HDFS_OP_CREATE=1
		HDFS_OP_GET_FILE_STATUS=1
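
To make the difference between the two listings easier to see, the sketch below diffs them. The counter values are copied from the comment above; the helper itself is hypothetical, and treating a counter that is absent from a listing as 0 is an assumption of this sketch, not something the PR states.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CounterDiffDemo {
  /** Counters reported for the original approach (mkdir, create). */
  static Map<String, Long> original() {
    Map<String, Long> m = new TreeMap<>();
    m.put("HDFS_BYTES_WRITTEN", 3L);
    m.put("HDFS_WRITE_OPS", 2L);
    m.put("HDFS_OP_CREATE", 1L);
    m.put("HDFS_OP_MKDIRS", 1L);
    return m;
  }

  /** Counters reported when DFSTestUtil.writeFile is used instead. */
  static Map<String, Long> viaWriteFile() {
    Map<String, Long> m = new TreeMap<>();
    m.put("HDFS_BYTES_WRITTEN", 3L);
    m.put("HDFS_READ_OPS", 1L);
    m.put("HDFS_WRITE_OPS", 1L);
    m.put("HDFS_OP_CREATE", 1L);
    m.put("HDFS_OP_GET_FILE_STATUS", 1L);
    return m;
  }

  /** Returns "before -> after" for every counter whose value differs (missing = 0, by assumption). */
  static Map<String, String> diff(Map<String, Long> before, Map<String, Long> after) {
    TreeSet<String> keys = new TreeSet<>(before.keySet());
    keys.addAll(after.keySet());
    Map<String, String> out = new TreeMap<>();
    for (String key : keys) {
      long x = before.getOrDefault(key, 0L);
      long y = after.getOrDefault(key, 0L);
      if (x != y) {
        out.put(key, x + " -> " + y);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    diff(original(), viaWriteFile())
        .forEach((name, change) -> System.out.println(name + ": " + change));
  }
}
```

Either set of values can be asserted in the test; the diff only shows which assertions would need to change when switching approaches.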


private static final Logger LOG = LoggerFactory.getLogger(
TestTaskCounterUpdater.class);
private static Configuration conf = new Configuration();
Member

can be final

Comment on lines 32 to 33
private static final Logger LOG = LoggerFactory.getLogger(
TestTaskCounterUpdater.class);
Member

nit
line break wasn't required I think

TaskCounterUpdater updater = new TaskCounterUpdater(counters, conf, "pid");

updater.updateCounters();
LOG.info("Counters: " + counters);
Member

use {} placeholders here instead of string concatenation

Comment on lines +53 to +51
Assert.assertTrue("Counter not updated, old=" + oldVal
+ ", new=" + cpuCounter.getValue(), cpuCounter.getValue() > oldVal);
Member

should we put some sleep before updateCounters, just thinking in extreme conditions, this check won't go flaky, right?

Contributor Author

not sure, it seems quite unlikely; I would let future precommit runs decide (it will be obvious once it fails)
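
If the check ever did turn out to be flaky, one possible alternative to a sleep (which consumes wall-clock time but little CPU) would be to do a bounded amount of busy work until measured CPU time has visibly advanced. The sketch below is only an illustration of that idea using the JDK's ThreadMXBean; it is not what the PR does, the class and method names are hypothetical, and it measures thread CPU time, which may differ from how the Tez counter is computed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuProgressDemo {
  /**
   * Spins until the current thread's CPU time has advanced by at least minNanos,
   * or until timeoutMillis of wall-clock time has elapsed.
   * Returns the observed CPU-time delta in nanoseconds, or -1 if this JVM
   * cannot measure CPU time for the current thread.
   */
  static long burnCpu(long minNanos, long timeoutMillis) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
      return -1;
    }
    long startCpu = bean.getCurrentThreadCpuTime();
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    long sink = 0;
    long delta = 0;
    while (delta < minNanos && System.nanoTime() < deadline) {
      // Some arithmetic to actually consume CPU cycles.
      for (int i = 0; i < 100_000; i++) {
        sink += i ^ (sink >>> 3);
      }
      delta = bean.getCurrentThreadCpuTime() - startCpu;
    }
    if (sink == 42) {
      System.out.print(""); // keeps the busy loop's result observable
    }
    return delta;
  }

  public static void main(String[] args) {
    // Ask for at least 1 ms of CPU time, giving up after 5 s of wall-clock time.
    System.out.println("CPU nanos consumed: " + burnCpu(1_000_000L, 5_000L));
  }
}
```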

Comment on lines 57 to 58


Member

avoid

OP_SET_ACL(CommonStatisticNames.OP_SET_ACL),
OP_SET_OWNER(CommonStatisticNames.OP_SET_OWNER),
OP_SET_PERMISSION(CommonStatisticNames.OP_SET_PERMISSION),
OP_GET_FILE_BLOCK_LOCATIONS("op_get_file_block_locations");
Member

what is this, no enum in Hadoop? where is this coming from then?

Contributor Author

it's from DFSOpsCountStatistics, but I'm removing it as I intend to include and expose only stats that are defined in CommonStatisticNames

@abstractdog
Contributor Author

thanks @ayushtkn for the initial comments! I need to rework this area, I think some of the comments are related to the fact that I simply adapted an old patch, I need to think this over again! I'll let you know

@tez-yetus

This comment was marked as outdated.

Member

@ayushtkn ayushtkn left a comment

Thanx @abstractdog for the changes, Minor Comments, rest looks good

Comment on lines 35 to 36
// Additional counters from HADOOP-13305
// Additional counters from HADOOP-13305
Member

Duplicate line

Comment on lines 52 to 56
// /**
// * A Map where Key-> URIScheme and value->FileSystemStatisticUpdater
// */
// private Map<String, FileSystemStatisticUpdater> statisticUpdaters =
// new HashMap<>();
Member

why are you commenting this out? Can't we just delete it

Contributor Author

oh right, leftover, removing it

Comment on lines 54 to 61
- private Map<String, FileSystemStatisticUpdater> statisticUpdaters =
-     new HashMap<String, FileSystemStatisticUpdater>();
+ private Map<String, Map<String, FileSystemStatisticUpdater>> statisticUpdaters =
+     new HashMap<>();
Member

this we can make final

StorageStatistics stats = iter.next();
// Fetch or initialize the updater set for the scheme
Map<String, FileSystemStatisticUpdater> updaterSet = statisticUpdaters
.computeIfAbsent(stats.getScheme(), k -> new TreeMap<>());
Member

Why are we using TreeMap now? If I read it right, it was a HashMap earlier; wouldn't TreeMap add some cost?

Contributor Author

hm, I cannot see the point of ordering while adding different statistic updaters for the same scheme, let me revert back to HashMap
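
As a small illustration of the trade-off discussed here: TreeMap keeps its keys sorted (with O(log n) insert and lookup), while HashMap offers expected constant-time access but no defined iteration order. The sketch below is a generic example; the key strings are sample values chosen for illustration, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapChoiceDemo {
  /** Inserts three sample keys and returns them in the map's iteration order. */
  static List<String> keysInIterationOrder(Map<String, Integer> map) {
    map.put("op_mkdirs", 1);
    map.put("op_create", 1);
    map.put("op_get_file_status", 1);
    return new ArrayList<>(map.keySet());
  }

  public static void main(String[] args) {
    // TreeMap: always sorted by key.
    System.out.println(keysInIterationOrder(new TreeMap<>()));
    // HashMap: same entries, but the order is unspecified and must not be relied on.
    System.out.println(keysInIterationOrder(new HashMap<>()));
  }
}
```

When nothing iterates over the updaters in key order, the sorted structure buys nothing, which matches the decision to go back to HashMap.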


DFSTestUtil.writeFile(remoteFs, new Path("/tmp/foo/abc.txt"), "xyz");

updater.updateCounters();
Member

shouldn't we first do

    FileSystem.clearStatistics();

In case there is any test added in future which does FS operations, I believe this test will screw up. So better to reset everything to 0, before we start testing

Contributor Author

@abstractdog abstractdog Dec 12, 2024

taking care of that in a @Before method
tested it with copying basicTest to basicTest2 and tried until the problems went away :)
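
The underlying problem can be shown without Hadoop. In the sketch below, `GLOBAL` is a hypothetical process-wide registry standing in for static filesystem statistics, and `testBody` plays the role of a test that performs one operation and then reads the counter. Nothing here is Hadoop or Tez API; it only illustrates why a reset before each test keeps assertions stable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class SharedStatsResetDemo {
  /** Hypothetical process-wide registry, shared by every "test" in the JVM. */
  static final Map<String, AtomicLong> GLOBAL = new ConcurrentHashMap<>();

  static void record(String op) {
    GLOBAL.computeIfAbsent(op, k -> new AtomicLong()).incrementAndGet();
  }

  static long get(String op) {
    AtomicLong value = GLOBAL.get(op);
    return value == null ? 0 : value.get();
  }

  /** What a per-test setup hook would call before each test. */
  static void clear() {
    GLOBAL.clear();
  }

  /** Simulated test body: performs one create and returns the counter it observes. */
  static long testBody() {
    record("op_create");
    return get("op_create");
  }

  public static void main(String[] args) {
    clear();
    System.out.println(testBody()); // first test sees its own single operation
    System.out.println(testBody()); // without a reset, the second test also sees the first one's
    clear();
    System.out.println(testBody()); // after a reset, the count starts from zero again
  }
}
```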

@abstractdog
Contributor Author

> Thanx @abstractdog for the changes, Minor Comments, rest looks good

thanks @ayushtkn, addressed your comments

@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 0m 9s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 5m 46s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 7m 35s | master passed |
| +1 💚 | compile | 0m 34s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | compile | 0m 38s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | checkstyle | 1m 0s | master passed |
| +1 💚 | javadoc | 0m 50s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 0m 38s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +0 🆗 | spotbugs | 0m 29s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 38s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 7s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 0m 29s | the patch passed |
| +1 💚 | compile | 0m 27s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javac | 0m 27s | the patch passed |
| +1 💚 | compile | 0m 20s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | javac | 0m 20s | the patch passed |
| +1 💚 | checkstyle | 0m 5s | The patch passed checkstyle in tez-api |
| +1 💚 | checkstyle | 0m 6s | tez-runtime-internals: The patch generated 0 new + 16 unchanged - 1 fixed = 16 total (was 17) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 💚 | javadoc | 0m 22s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 0m 25s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | findbugs | 0m 56s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 1m 51s | tez-api in the patch passed. |
| +1 💚 | unit | 0m 38s | tez-runtime-internals in the patch passed. |
| +1 💚 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 25m 51s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-379/6/artifact/out/Dockerfile |
| GITHUB PR | #379 |
| JIRA Issue | TEZ-3331 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile xml |
| uname | Linux 9b784bc2902b 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / b5bf8dc |
| Default Java | Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-379/6/testReport/ |
| Max. process+thread count | 371 (vs. ulimit of 5500) |
| modules | C: tez-api tez-runtime-internals U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-379/6/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

Member

@ayushtkn ayushtkn left a comment

LGTM

@abstractdog abstractdog merged commit 1f7465f into apache:master Dec 13, 2024
4 checks passed
3 participants