Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Introduce C++ formatter in JNI. #2250

Open
0ctopus13prime opened this issue Nov 6, 2024 · 15 comments
Open

[Enhancement] Introduce C++ formatter in JNI. #2250

0ctopus13prime opened this issue Nov 6, 2024 · 15 comments

Comments

@0ctopus13prime
Copy link
Contributor

0ctopus13prime commented Nov 6, 2024

Description

Currently, we don’t have a formatting tool enabled for C++ code, and as a result, the codebase has a mix of different coding styles. In contrast, spotless is already enabled for Java, preventing builds if code does not follow the established format. I propose adopting a similar approach for our C++ (JNI) code.

Establishing a consistent code style for C++ in our repository would enhance readability and facilitate future collaboration. Allowing multiple styles to coexist can lead to the "Broken Windows Effect," where visible inconsistencies invite further deviations, ultimately contributing to a less organized and harder-to-maintain codebase.

Tools

After researching several options, I propose setting up clang-format, which is compatible across major platforms—Linux, macOS, and Windows. The tool is also easy to integrate with popular IDEs like CLion, VSCode, and others, making clang-format a straightforward choice for our needs.

Note : We can also consider to use spotless - https://github.com/diffplug/spotless

Code Style

The key principle is to adopt a unified and widely accepted code style that maintainers prefer. Naturally, auto-generated JNI method signatures will remain unchanged, even after introducing the formatter, as the method naming pattern is a contract that the JVM relies on to locate the methods it calls.

I want to put it on a vote:

Required Changes

General idea is to make Makefile generated by CMake to call clang-format to format C++ codes.
I prefer this approach since I like a system to take care of choirs by itself, but open to other approaches.

  • Pros : User does not need to consider running formatting as it will be taken care of internally.
  • Cons : Honestly, I can't think of any cons...

Or, alternatively we can make it reject to build if it found there are misformatted files similar to what we are doing in Java.
Once it's rejected, user must run formatting manually before building.

  • Pros : I can't think of pros of this one ... but open to hear what other would think.
  • Cons : User must manually run formatting before building.

Expected Hassle

Since it requires clang-format, we need to overhaul all the pipeline, CI infra had it already to avoid program not found exception during building.

So I'm proposing 2 stages of introduction:

1st phase : Introduce formatting, but run it only if the program exists.

find_program(CLANG_FORMAT_EXECUTABLE clang-format)
if(CLANG_FORMAT_EXECUTABLE)
   ...
endif()

2nd phase : After make sure that pipelines, CI and other major third party won't have issue with formatting, we can enforce the formatting.

Merge conflicts

Since it is expected to bring changes most files, all changes before formatting in dev branch will likely hit merge conflicts.

@jmazanec15
Copy link
Member

I want to put it on a vote:

I typically refer to google's style guide. But I dont have a strong preference. Do you have one you like in particularly?

Since it requires clang-format, we need to overhaul all the pipeline, CI infra had it already to avoid program not found exception during building.

It seems spotless supports it: https://github.com/diffplug/spotless. Would we be able to leverage gradle to do this so we dont have to worry about dependencies?

Since it is expected to bring changes most files, all changes before formatting in dev branch will likely hit merge conflicts.

This is unfortunate but a one time cost. I think its okay.

@0ctopus13prime
Copy link
Contributor Author

0ctopus13prime commented Nov 6, 2024

I typically refer to google's style guide. But I dont have a strong preference. Do you have one you like in particularly?

I really don't care haha. But, the google style is the most comfortable for me.

@0ctopus13prime
Copy link
Contributor Author

0ctopus13prime commented Nov 6, 2024

It seems spotless supports it: https://github.com/diffplug/spotless. Would we be able to leverage gradle to do this so we dont have to worry about dependencies?

Nice!! then we can decouple formatting from CMake entirely ONLY IF it can format as much as clang-format.
I think it would be easier to modify gradle file than doing it in CMake.
Hoping that spotless can support as much as clang-format does. I will attach the comparisons between two.

@0ctopus13prime
Copy link
Contributor Author

Looks like spotless internally calls clang-format? - https://github.com/diffplug/spotless/blob/main/plugin-gradle/README.md#cc

spotless {
  cpp {
    target 'src/native/**' // you have to set the target manually

    clangFormat()  // has its own section below
    eclipseCdt()   // has its own section below

    licenseHeader '/* (C) $YEAR */' // or licenseHeaderFile
  }
}

@0ctopus13prime
Copy link
Contributor Author

0ctopus13prime commented Nov 7, 2024

I think we can also consider eclipse CDT formatting with Google code style : https://github.com/google/styleguide/blob/gh-pages/eclipse-cpp-google-style.xml
We can import the above XML into IDE (Clion or VSCode), and the good thing is that the formatter downloading will be taken care of by gradle, we don't actually need to care much about the library dependency.
Also we don't need to modify CMake which is bit trickier to edit compared to gradle.

In build.gradle file

spotless {
  cpp {
    target('jni/include/**', 'jni/src/**')
    eclipseCdt('11.6').configFile('/tmp/eclipse-cpp-google-style.xml')
  }
}

Result:

-jlong knn_jni::commons::storeVectorData(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jlong memoryAddressJ,
-                                        jobjectArray dataJ, jlong initialCapacityJ, jboolean appendJ) {
-    std::vector<float> *vect;
-    if ((long) memoryAddressJ == 0) {

+jlong knn_jni::commons::storeVectorData(knn_jni::JNIUtilInterface *jniUtil,
+                                        JNIEnv *env, jlong memoryAddressJ,
+                                        jobjectArray dataJ,
+                                        jlong initialCapacityJ,
+                                        jboolean appendJ) {
+  std::vector<float> *vect;
+  if ((long) memoryAddressJ == 0) {

@jmazanec15
Copy link
Member

@0ctopus13prime did you get a poc working? If so Im good with google guide and eclipse

@0ctopus13prime
Copy link
Contributor Author

Sure, let me raise a PR shortly. I'm planning to start working on this after bumping up JNI C++ version to 17.

@0ctopus13prime
Copy link
Contributor Author

@jmazanec15
Hey Jack, everything seems to be working fine, just doing a last check.

FAISS uses snake_case for method names (NMSLIB uses camelCase, but that's going to be deprecated in the future anyway). The Google C++ style guide uses PascalCase for method names (e.g., Analyze(int foo)), but do you think we should align with FAISS and switch to snake_case (e.g., range_search(idx_t n, ...))? We could still follow other parts of the Google C++ style, just changing the method name style to snake_case.

// Google C++ code style

class MyClass {
 public:
  void Analyze(const std::string &text);
  void Analyze(const char *text, size_t textlen);
};
// Faiss
struct IndexHNSW : Index {
    typedef HNSW::storage_idx_t storage_idx_t;
    void range_search(
            idx_t n,
            const float* x,
            float radius,
            RangeSearchResult* result,
            const SearchParameters* params = nullptr) const override;

@jmazanec15
Copy link
Member

In general, I think PascalCase is fine. I dont think we need to align with any libraries. We just need to make sure our conventions are consistent.

@0ctopus13prime
Copy link
Contributor Author

0ctopus13prime commented Nov 8, 2024

@jmazanec15
Sounds good, once C++ 17 update PR is merged will raise a new PR for this shortly

@0ctopus13prime
Copy link
Contributor Author

Enabled C++17 in JNI, will continue working on this.

@0ctopus13prime
Copy link
Contributor Author

Eclipse CDT mostly works but it's not perfect. Seems struggling when parsing template <. It keeps recognizing it as a comparison sign.

std::unique < XXX
  > xxx = ...

Expected
std::unique<XXX> xxx = ...

I'm inclining toward clang-format + clang-tidy options which rely on clang lexer, providing more accurate linting and static analysis.
Will evaluate how easy to automate installing them in either build.gradle or CMAKE. They are very standard in C++ world, it should be straightforward to install them in most popular platforms - Mac, Linux and Windows

@0ctopus13prime
Copy link
Contributor Author

After running clang-tidy with Google code style on our JNI codes, I got around 330 warnings (excluding faiss and nmslib).
I’ll clean these up in a PR soon.
Also, it looks pretty manageable to make a cross-platform install script. I’ll test it on the major platforms first then include it in the PR once it’s ready.

@jmazanec15
Copy link
Member

sounds good! So, would gradle not auto-configure dependency?

@0ctopus13prime
Copy link
Contributor Author

0ctopus13prime commented Nov 12, 2024

@jmazanec15
Not for clang-format. It just calls it to format the code, but I will not download it unfortunately. 😛
But it does for eclipse CDT. Another bad luck is that it can't really do the formatting the template code...
Will share a comprehensive PR having an install script for clang-format + tidy as well as the result code after applied formatting shortly!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants