feat: Add nomic modern bert #1684

Draft · wants to merge 8 commits into base: main
Conversation

@Samoed (Collaborator) commented Jan 2, 2025

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Closes #1624

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested that the implementation works on a representative set of tasks.
2025-01-02 08:15:52.601067 >>> AmazonCounterfactualClassification
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 583, in run
    results, tick, tock = self._run_eval(
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 304, in _run_eval
    results = task.evaluate(
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTaskClassification.py", line 120, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTaskClassification.py", line 196, in _evaluate_subset
    scores_exp, test_cache = evaluator(model, test_cache=test_cache)
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/evaluators/ClassificationEvaluator.py", line 306, in __call__
    clf.fit(X_train, self.y_train)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py", line 1196, in fit
    X, y = self._validate_data(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 584, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 1106, in check_X_y
    X = check_array(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 921, in check_array
    _assert_all_finite(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input X contains NaN.
LogisticRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values


2025-01-02 08:16:28.850049 >>> ToxicConversationsClassification
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 583, in run
    results, tick, tock = self._run_eval(
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 304, in _run_eval
    results = task.evaluate(
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTaskClassification.py", line 120, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTaskClassification.py", line 196, in _evaluate_subset
    scores_exp, test_cache = evaluator(model, test_cache=test_cache)
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/evaluators/ClassificationEvaluator.py", line 306, in __call__
    clf.fit(X_train, self.y_train)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py", line 1196, in fit
    X, y = self._validate_data(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 584, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 1106, in check_X_y
    X = check_array(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 921, in check_array
    _assert_all_finite(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input X contains NaN.
LogisticRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values


2025-01-02 08:29:39.876958 >>> SummEval
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 583, in run
    results, tick, tock = self._run_eval(
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 304, in _run_eval
    results = task.evaluate(
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTask.py", line 126, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTaskSummarization.py", line 109, in _evaluate_subset
    scores = evaluator(model, encode_kwargs=encode_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/evaluators/SummarizationEvaluator.py", line 315, in __call__
    cosine_pearson_scores.append(pearsonr(human_scores, cosine_pred_scores))
  File "/usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py", line 4794, in pearsonr
    normym = linalg.norm(ym)
  File "/usr/local/lib/python3.10/dist-packages/scipy/linalg/_misc.py", line 146, in norm
    a = np.asarray_chkfinite(a)
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 630, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs

After this, I reran the classification tasks with a smaller batch size (4 instead of 32). AmazonCounterfactualClassification completed successfully, but ToxicConversationsClassification failed with the same error.
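Before the classifier ever runs, the failure can be localized by scanning the embedding matrix for non-finite values. A minimal sketch with NumPy (the synthetic matrix and the helper name are illustrative, not part of the MTEB codebase):

```python
import numpy as np

def report_nonfinite(X: np.ndarray) -> np.ndarray:
    """Return the indices of embedding rows containing NaN or inf."""
    bad = ~np.isfinite(X).all(axis=1)
    idx = np.flatnonzero(bad)
    if idx.size:
        print(f"{idx.size}/{len(X)} embeddings contain NaN/inf, e.g. rows {idx[:5]}")
    return idx

# Synthetic embedding matrix with one deliberately broken row
X = np.random.default_rng(0).normal(size=(8, 4)).astype(np.float32)
X[3, 2] = np.nan
bad_rows = report_nonfinite(X)  # reports row 3
```

Running this on the actual model output (e.g. right after `model.encode(...)`) would show whether the NaNs originate in the encoder or later in the evaluation pipeline.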

@zanussbaum can you help integrate your model implementation into MTEB?

@zanussbaum (Contributor)

Hm, not sure I totally understand what's going on here, but for the classification tasks one thing that might be different is that we don't normalize the embeddings
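If normalization is the difference, the comparison can be reproduced offline by L2-normalizing the raw embeddings before scoring. A hedged sketch (plain NumPy; whether the MTEB wrapper should apply this for this model is exactly the open question here):

```python
import numpy as np

def l2_normalize(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each embedding row to unit L2 norm; eps guards against division by zero."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

X = np.array([[3.0, 4.0], [0.0, 2.0]])
Xn = l2_normalize(X)
print(np.linalg.norm(Xn, axis=1))  # every row now has unit norm
```

Running the failing tasks once with and once without this step on the same cached embeddings would isolate whether normalization explains the score gap.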

@Samoed (Collaborator, Author) commented Jan 2, 2025

Can you provide the script you used for evaluating with MTEB?

@zanussbaum (Contributor)

I evaluated using our contrastors repo: https://github.com/nomic-ai/contrastors/tree/main/src/contrastors/eval/mteb_eval

@Samoed (Collaborator, Author) commented Jan 4, 2025

Updated scores

Task                                      Leaderboard   PR
AmazonCounterfactualClassification (en)   78.13         76.5821
EmotionClassification                     48.26         51.35
ToxicConversationsClassification          67.46         DNF
SprintDuplicateQuestions                  92.04         92.0572
TwitterSemEval2015                        73.63         73.6807
ArxivClusteringS2S                        38.09         DNF
RedditClustering                          56.5          DNF
SciDocsRR                                 81.52         81.542
AskUbuntuDupQuestions                     62.33         62.4368
SCIDOCS                                   18.59         18.071
SciFact                                   69.63         60.046
STSBenchmark                              86.97         86.9903
STS16                                     85.74         85.7417
SummEval                                  31.39         31.2883

Merging this pull request may close issue: Integrate ModernBERT