Merged
6 changes: 3 additions & 3 deletions doc/multioutput.rst
@@ -12,7 +12,7 @@
MIMO (Multi-Input Multi-Output) data. For classification, it can be used for
multilabel data. Actually, for multiclass classification, which has one output with
multiple categories, multioutput feature selection can also be useful. The multiclass
classification can be converted to multilabel classification by one-hot encoding
-target ``y``. The canonical correaltion coefficient between the features ``X`` and the
+target ``y``. The canonical correlation coefficient between the features ``X`` and the
one-hot encoded target ``y`` has equivalent relationship with Fisher's criterion in
LDA (Linear Discriminant Analysis) [1]_. Applying :class:`FastCan` to the converted
multioutput data may result in better accuracy in the following classification task
@@ -23,7 +23,7 @@ Relationship on multiclass data
Assume the feature matrix is :math:`X \in \mathbb{R}^{N\times n}`, the multiclass
target vector is :math:`y \in \mathbb{R}^{N\times 1}`, and the one-hot encoded target
matrix is :math:`Y \in \mathbb{R}^{N\times m}`. Then, the Fisher's criterion for
-:math:`X` and :math:`y` is denoted as :math:`J` and the canonical correaltion
+:math:`X` and :math:`y` is denoted as :math:`J` and the canonical correlation
coefficient between :math:`X` and :math:`Y` is denoted as :math:`R`. The relationship
between :math:`J` and :math:`R` is given by

@@ -36,7 +36,7 @@
or
R^2 = \frac{J}{1+J}

It should be noted that the number of the Fisher's criterion and the canonical
-correaltion coefficient is not only one. The number of the non-zero canonical
+correlation coefficient is not only one. The number of the non-zero canonical
correlation coefficients is no more than :math:`\min (n, m)`, and each canonical correlation
coefficient is in one-to-one correspondence with a Fisher's criterion value.

2 changes: 1 addition & 1 deletion doc/ols_and_omp.rst
@@ -39,7 +39,7 @@
it the following advantages over OLS and OMP:
and/or added some constants, the selection result given by :class:`FastCan` will be
unchanged. See :ref:`sphx_glr_auto_examples_plot_affinity.py`.
* Multioutput: as :class:`FastCan` use canonical correlation for feature ranking, it is
-naturally support feature seleciton on dataset with multioutput.
+naturally support feature selection on dataset with multioutput.


.. rubric:: References
2 changes: 1 addition & 1 deletion doc/pruning.rst
@@ -16,7 +16,7 @@
by sparse linear combinations of the atoms.
We use these atoms as the target :math:`Y` and select samples based on their correlation with :math:`Y`.

One challenge to use :class:`FastCan` for data pruning is that the number to select is much larger than in feature selection.
-Normally, this number is higher than the number of features, which will make the pruned data matrix singular.
+Normally, this number is greater than the number of features, which will make the pruned data matrix singular.
In other words, :class:`FastCan` will easily think the pruned data is redundant and no additional sample
should be selected, as any additional samples can be represented by linear combinations of the selected samples.
Therefore, the number to select has to be kept small.
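The singularity issue can be seen with plain NumPy (a toy illustration, not code from the library): once more samples are selected than there are features, the rows of the pruned matrix are linearly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 samples, 5 features

# Any subset with more samples than features has linearly dependent rows,
# so every extra sample is a linear combination of those already selected.
subset = X[:10]
print(np.linalg.matrix_rank(subset))  # rank is capped at 5, the number of features
```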
6 changes: 3 additions & 3 deletions examples/plot_fisher.py
@@ -5,7 +5,7 @@

.. currentmodule:: fastcan

-In this examples, we will demonstrate the canonical correaltion coefficient
+In this examples, we will demonstrate the canonical correlation coefficient
In this examples, we will demonstrate the canonical correlation coefficient
between the features ``X`` and the one-hot encoded target ``y`` has equivalent
relationship with Fisher's criterion in LDA (Linear Discriminant Analysis).
"""
@@ -17,14 +17,14 @@
# Prepare data
# ------------
# We use ``iris`` dataset and transform this multiclass data to multilabel data by
-# one-hot encoding. Here, drop="first" is necessary, otherwise, the transformed target
+# one-hot encoding. Here, drop="first" is necessary; otherwise, the transformed target
# is not full column rank.

from sklearn import datasets
from sklearn.preprocessing import OneHotEncoder

X, y = datasets.load_iris(return_X_y=True)
-# drop="first" is necessary, otherwise, the transformed target is not full column rank
+# drop="first" is necessary; otherwise, the transformed target is not full column rank
y_enc = OneHotEncoder(
drop="first",
sparse_output=False,
4 changes: 2 additions & 2 deletions examples/plot_forecasting.py
@@ -7,7 +7,7 @@
In this examples, we will demonstrate how to use :func:`make_narx` to build (nonlinear)
AutoRegressive (AR) models for time-series forecasting.
-The time series used isthe monthly average atmospheric CO2 concentrations
+The time series used is the monthly average atmospheric CO2 concentrations
from 1958 to 2001.
The objective is to forecast the CO2 concentration till nowadays with
initial 18 months data.
@@ -94,7 +94,7 @@
# Nonlinear AR model
# ------------------
# We can use :func:`make_narx` to easily build a nonlinear AR model, which does not
-# has a input. Therefore, the input ``X`` is set as ``None``.
+# has an input. Therefore, the input ``X`` is set as ``None``.
# :func:`make_narx` will search 10 polynomial terms, whose maximum degree is 2 and
# maximum delay is 9.
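The idea behind an AR model can be sketched without the library (a toy AR(2) process, not the CO2 example): lagged outputs become the regressors, and the coefficients are found by least squares.

```python
import numpy as np

# Simulate a toy AR(2) process: y(k) = 0.6*y(k-1) - 0.2*y(k-2) + noise
rng = np.random.default_rng(0)
y = np.zeros(2000)
for k in range(2, len(y)):
    y[k] = 0.6 * y[k - 1] - 0.2 * y[k - 2] + rng.normal(scale=0.1)

# Regressors are the time-shifted outputs y(k-1) and y(k-2)
target = y[2:]
lagged = np.column_stack([y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
print(coef)  # close to the true coefficients [0.6, -0.2]
```

:func:`make_narx` extends this linear setup with nonlinear (polynomial) terms and an automatic search for which terms to keep.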

14 changes: 7 additions & 7 deletions examples/plot_narx.py
@@ -47,7 +47,7 @@
X = np.c_[u0[max_delay:], u1[max_delay:]]

# %%
-# Build term libriary
+# Build term library
# -------------------
# To build a reduced polynomial NARX model, there are normally two steps:
#
@@ -56,14 +56,14 @@
#
# #. Learn the coefficients of the terms.
#
-# To search the structure of the model, the candidate term libriary should be
+# To search the structure of the model, the candidate term library should be
# constructed by the following two steps.
#
# #. Time-shifted variables: the raw input-output data, i.e., :math:`u_0(k)`,
# :math:`u_1(k)`, and :math:`y(k)`, are converted into :math:`u_0(k-1)`,
# :math:`u_1(k-2)`, etc.
#
-# #. Nonlinear terms: the time-shifted variables are onverted to nonlinear terms
+# #. Nonlinear terms: the time-shifted variables are converted to nonlinear terms
# via polynomial basis functions, e.g., :math:`u_0(k-1)^2`,
# :math:`u_0(k-1)u_0(k-3)`, etc.
#
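The two construction steps above can be sketched with NumPy and scikit-learn (the data and delay here are illustrative, not the example's actual signals):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
u0 = rng.normal(size=10)
y = rng.normal(size=10)
max_delay = 2

# Step 1: time-shifted variables u0(k-1), u0(k-2), y(k-1), y(k-2)
shifted = np.column_stack([u0[1:-1], u0[:-2], y[1:-1], y[:-2]])

# Step 2: nonlinear terms via polynomial basis functions (degree 2),
# e.g. u0(k-1)^2, u0(k-1)*y(k-2), ...
library = PolynomialFeatures(degree=2, include_bias=False).fit_transform(shifted)
print(library.shape)  # (10 - max_delay) rows, 14 candidate terms
```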
@@ -124,8 +124,8 @@
# %%
# Build NARX model
# ----------------
-# As the reduced polynomial NARX is a linear function of the nonlinear tems,
-# the coefficient of each term can be easily estimated by oridnary least squares.
+# As the reduced polynomial NARX is a linear function of the nonlinear terms,
+# the coefficient of each term can be easily estimated by ordinary least squares.
# In the printed NARX model, it is found that :class:`FastCan` selects the correct
# terms and the coefficients are close to the true values.
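Estimating the coefficients of already-selected terms is a single least-squares solve. A generic sketch with synthetic terms (not the example's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
selected_terms = rng.normal(size=(100, 3))  # hypothetical selected nonlinear terms
true_coef = np.array([0.5, -1.2, 0.3])
output = selected_terms @ true_coef + rng.normal(scale=1e-3, size=100)

# Ordinary least squares recovers the term coefficients
coef, *_ = np.linalg.lstsq(selected_terms, output, rcond=None)
print(coef)  # close to [0.5, -1.2, 0.3]
```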

@@ -143,9 +143,9 @@

print_narx(narx_model)
# %%
-# Automaticated NARX modelling workflow
+# Automated NARX modelling workflow
# -------------------------------------
-# We provide :meth:`narx.make_narx` to automaticate the workflow above.
+# We provide :meth:`narx.make_narx` to automate the workflow above.

from fastcan.narx import make_narx

4 changes: 2 additions & 2 deletions examples/plot_narx_multi.py
@@ -1,6 +1,6 @@
"""
=======================
-Mulit-output NARX model
+Multi-output NARX model
=======================
.. currentmodule:: fastcan
@@ -64,7 +64,7 @@


# %%
-# Identify the mulit-output NARX model
+# Identify the multi-output NARX model
# ------------------------------------
# We provide :meth:`narx.make_narx` to automatically find the model
# structure. `n_terms_to_select` can be a list to indicate the number
6 changes: 3 additions & 3 deletions fastcan/_refine.py
@@ -38,10 +38,10 @@ def refine(selector, drop=1, max_iter=None, verbose=1):
In the refining process, the selected features will be dropped, and
the vacancy positions will be refilled from the candidate features.

-The processing of a vacany position is refilled after searching all
+The processing of a vacant position is refilled after searching all
candidate features is called an `iteration`.

-The processing of a vacany position is refilled by a different features
+The processing of a vacant position is refilled by a different features
from the dropped one, which increase the SSC of the selected features
is called a `valid iteration`.

@@ -51,7 +51,7 @@
FastCan selector.

drop : int or array-like of shape (n_drops,) or "all", default=1
-The number of the selected features dropped for the consequencing
+The number of the selected features dropped for the consequent
The number of the selected features dropped for the consequent
reselection.

max_iter : int, default=None
2 changes: 1 addition & 1 deletion fastcan/narx/_utils.py
@@ -217,7 +217,7 @@ def make_narx(
The verbosity level of refine.

refine_drop : int or "all", default=None
-The number of the selected features dropped for the consequencing
+The number of the selected features dropped for the consequent
The number of the selected features dropped for the consequent
reselection. If `drop` is None, no refining will be performed.

refine_max_iter : int, default=None
4 changes: 2 additions & 2 deletions fastcan/narx/tests/test_narx.py
@@ -263,7 +263,7 @@ def make_data(multi_output, nan, rng):
).fit(X, y)


-def test_mulit_output_warn():
+def test_multi_output_warn():
X = np.random.rand(10, 2)
y = np.random.rand(10, 2)
for i in range(2):
@@ -342,7 +342,7 @@ def test_fit_intercept():
assert_array_equal(narx.intercept_, [0.0, 0.0])


-def test_mulit_output_error():
+def test_multi_output_error():
X = np.random.rand(10, 2)
y = np.random.rand(10, 2)
time_shift_ids = np.array([[0, 1], [1, 1]])