Conversation

@YuzeHao2023
- Implement LightGBMClassifier in sml/ensemble/lightgbm.py
  - Supports gradient boosting with decision trees
  - Parameters: n_estimators, learning_rate, max_depth, num_leaves, criterion, epsilon
  - Methods: fit(), predict(), decision_function()
  - Consistent with AdaBoost and RandomForest implementations

- Add comprehensive tests in tests/ensemble/lightgbm_test.py
  - Test against sklearn GradientBoostingClassifier
  - Uses Iris dataset with boolean feature transformation

- Add emulation tests in emulations/ensemble/lightgbm_emul.py
  - Performance benchmarking with sklearn
  - SPU execution comparison

- Update ensemble __init__.py to export LightGBMClassifier
@gemini-code-assist (Contributor)

Summary of Changes

Hello @YuzeHao2023, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new LightGBM classifier implementation, expanding the machine learning capabilities of the sml library. It provides a robust, configurable gradient boosting model, complete with thorough testing and emulation to ensure its correctness and performance, particularly within secure computation environments.

Highlights

  • LightGBM Classifier Implementation: A new LightGBMClassifier has been implemented in sml/ensemble/lightgbm.py, supporting gradient boosting with decision trees and configurable parameters like n_estimators, learning_rate, max_depth, num_leaves, criterion, and epsilon.
  • Comprehensive Testing: Dedicated unit tests have been added in tests/ensemble/lightgbm_test.py to validate the LightGBMClassifier against scikit-learn's GradientBoostingClassifier using the Iris dataset.
  • Emulation Testing: Emulation tests are introduced in emulations/ensemble/lightgbm_emul.py to benchmark performance against scikit-learn and compare SPU execution, ensuring compatibility and efficiency in a secure computation environment.
  • Module Export: The new LightGBMClassifier is now exported from sml/ensemble/__init__.py, making it accessible within the sml.ensemble module.


@gemini-code-assist bot left a comment
Code Review

This pull request introduces a new LightGBMClassifier, along with corresponding tests and emulation scripts. My review focuses on the correctness of the implementation, its maintainability, and the quality of the tests. The main issue is that the classifier's name LightGBMClassifier is misleading, as the implementation appears to be a variant of AdaBoost (SAMME.R) rather than the LightGBM algorithm. I've also identified an unused parameter in the classifier's constructor and noted that the tests lack assertions to verify correctness. Additionally, there's some duplicated code between the test and emulation files that could be refactored. Please see my detailed comments for suggestions on how to address these points.

```python
from sml.tree.tree import DecisionTreeClassifier as sml_dtc


class LightGBMClassifier:
```
critical

The class is named LightGBMClassifier, but the implementation follows the AdaBoost.SAMME.R algorithm, not LightGBM or even standard gradient boosting. The weight update rule in _boost_round (lines 220-226) is characteristic of AdaBoost. This name is misleading and can cause confusion for users. Please rename the class to something that accurately reflects the algorithm, such as SAMMEClassifier.
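For reference, the AdaBoost-family update the review is pointing at looks roughly like the following. This is an illustrative numpy sketch of one discrete SAMME round, not the PR's actual code; the function name and signature are assumptions:

```python
import numpy as np

def samme_round(sample_weight, y_true, y_pred, n_classes):
    """One discrete SAMME boosting round (illustrative sketch).

    Reweights samples by misclassification -- the AdaBoost-family
    update, as opposed to fitting each new tree to the gradient of
    the loss the way LightGBM-style gradient boosting does.
    """
    incorrect = (y_pred != y_true).astype(float)
    err = np.average(incorrect, weights=sample_weight)
    # estimator weight grows as the weighted error shrinks
    alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1.0)
    # misclassified samples get exponentially larger weights
    sample_weight = sample_weight * np.exp(alpha * incorrect)
    return sample_weight / sample_weight.sum(), alpha
```

A gradient-boosting round, by contrast, keeps sample weights fixed and fits each new tree to the negative gradient of the loss, which is why the class name matters.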

Comment on lines +38 to +40
```python
num_leaves : int
    The maximum number of leaves in each tree.
    Must be greater than 1. Default is 31.
```
high

The num_leaves parameter is defined in the docstring and accepted in __init__, but it is not used anywhere in the implementation. The DecisionTreeClassifier is instantiated using max_depth only. This can be misleading for users of the class. Please remove the num_leaves parameter from the docstring, the __init__ method signature, its validation, and its assignment to self.num_leaves.

Comment on lines +94 to +95
```python
print(f"Accuracy in SKlearn: {score_plain}")
print(f"Accuracy in SPU: {score_encrypted}")
```
high

The test calculates accuracy scores but only prints them. A unit test must contain assertions to automatically verify the correctness of the implementation. Please add an assertion to check if the score from the SPU implementation is close to the score from the sklearn implementation. For example: assert jnp.isclose(score_plain, score_encrypted, atol=0.05).
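Concretely, the end of the test could assert on the two scores instead of only printing them. A sketch with stand-in values (numpy is used here so the snippet is self-contained; the test file itself would use jnp.isclose on the real scores):

```python
import numpy as np

# stand-in scores for illustration; the real test would use the
# accuracies computed from sklearn and the SPU simulator
score_plain = 0.96
score_encrypted = 0.94

# fail the test automatically if SPU accuracy drifts from sklearn's
assert np.isclose(score_plain, score_encrypted, atol=0.05), (
    f"SPU accuracy {score_encrypted} deviates from sklearn {score_plain}"
)
```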

Comment on lines +51 to +64
```python
def load_data():
    iris = load_iris()
    iris_data, iris_label = jnp.array(iris.data), jnp.array(iris.target)
    # sorted_features: n_samples * n_features_in
    n_samples, n_features_in = iris_data.shape
    sorted_features = jnp.sort(iris_data, axis=0)
    new_threshold = (sorted_features[:-1, :] + sorted_features[1:, :]) / 2
    new_features = jnp.greater_equal(
        iris_data[:, :], new_threshold[:, jnp.newaxis, :]
    )
    new_features = new_features.transpose([1, 0, 2]).reshape(n_samples, -1)

    X, y = new_features[:, ::3], iris_label[:]
    return X, y
```
medium

This load_data function is identical to the one in tests/ensemble/lightgbm_test.py. To avoid code duplication and improve maintainability, consider extracting this function into a shared utility module and importing it in both files.
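One way the refactor could look, as a sketch; the module path and function name below are assumptions, not part of the PR:

```python
# e.g. a shared helper module (hypothetical name and location)
import numpy as np
from sklearn.datasets import load_iris

def load_binarized_iris(stride=3):
    """Binarize Iris features against midpoints of consecutive sorted values.

    Mirrors the duplicated load_data() logic from the test and
    emulation files, written with numpy for portability.
    """
    iris = load_iris()
    data, label = np.asarray(iris.data), np.asarray(iris.target)
    n_samples, _ = data.shape
    sorted_features = np.sort(data, axis=0)
    thresholds = (sorted_features[:-1, :] + sorted_features[1:, :]) / 2
    # compare every sample against every candidate threshold per feature
    features = np.greater_equal(data[:, :], thresholds[:, np.newaxis, :])
    features = features.transpose([1, 0, 2]).reshape(n_samples, -1)
    return features[:, ::stride], label
```

Both tests/ensemble/lightgbm_test.py and emulations/ensemble/lightgbm_emul.py could then import this one function, keeping a single source of truth.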

```python
# limitations under the License.


import copy
```
medium

The copy module is imported but is not used within the file. Please remove this unused import.

Comment on lines +268 to +281
```python
pred = sum(
    jnp.where(
        (estimator.predict(X) == classes).T,
        w,
        -1 / (n_classes - 1) * w,
    )
    * flag
    for estimator, w, flag in zip(
        self.estimators_,
        self.estimator_weight_,
        self.estimator_flags_,
        strict=True,
    )
)
```
medium

The decision_function uses a Python sum() on a generator expression. This pattern is not compatible with JAX's JIT compilation and can be inefficient for a large number of estimators due to Python-level iteration. For better performance and to enable JIT compilation, consider refactoring this to use jax.lax.fori_loop to accumulate the predictions.
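A hedged sketch of what that refactor might look like: precompute the per-estimator vote matrices once (the estimator.predict calls are Python-level and stay outside the traced loop), then accumulate them with jax.lax.fori_loop. Names here are illustrative, not the PR's code:

```python
import jax
import jax.numpy as jnp

def accumulate_votes(votes, weights, flags):
    """Sum weighted per-estimator vote matrices in a JIT-friendly loop.

    votes:   (n_estimators, n_samples, n_classes); votes[i] holds the
             +1 / -1/(n_classes-1) matrix for estimator i (assumed
             precomputed from estimator.predict outside the loop)
    weights: (n_estimators,) estimator weights
    flags:   (n_estimators,) 0/1 estimator flags
    """
    def body(i, acc):
        # weights[i] and flags[i] broadcast over the vote matrix
        return acc + votes[i] * weights[i] * flags[i]

    init = jnp.zeros(votes.shape[1:])
    return jax.lax.fori_loop(0, votes.shape[0], body, init)
```

Since the weights and flags are per-estimator scalars, the same result can also be written without an explicit loop as `(votes * (weights * flags)[:, None, None]).sum(axis=0)`, which is often the simpler JIT-compatible form.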

Comment on lines +50 to +63
```python
def load_data():
    iris = load_iris()
    iris_data, iris_label = jnp.array(iris.data), jnp.array(iris.target)
    # sorted_features: n_samples * n_features_in
    n_samples, n_features_in = iris_data.shape
    sorted_features = jnp.sort(iris_data, axis=0)
    new_threshold = (sorted_features[:-1, :] + sorted_features[1:, :]) / 2
    new_features = jnp.greater_equal(
        iris_data[:, :], new_threshold[:, jnp.newaxis, :]
    )
    new_features = new_features.transpose([1, 0, 2]).reshape(n_samples, -1)

    X, y = new_features[:, ::3], iris_label[:]
    return X, y
```
medium

This load_data function is identical to the one in emulations/ensemble/lightgbm_emul.py. To avoid code duplication and improve maintainability, please extract this function into a shared utility module and import it in both files.

```python
sim = spsim.Simulator.simple(3, libspu.ProtocolKind.ABY3, libspu.FieldType.FM64)

X, y = load_data()
n_samples, n_features = X.shape
```
medium

The variables n_samples and n_features are assigned but never used. They should be removed to improve code clarity.
