Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: TemplateExpression with parameters #394

Open
wants to merge 42 commits into
base: master
Choose a base branch
from

Conversation

MilesCranmer
Copy link
Owner

@MilesCranmer MilesCranmer commented Dec 26, 2024

@gm89uk @Andrea-gm here's an attempt at MilesCranmer/PySR#787. If you get a chance to try, I'd be very interested to hear what you think.

The API is like this (v2). We request an expression f and parameter vectors p1, p2, and p3, via the type parameter {(:f,), (:p1, :p2, :p3)}. We then write out how a function specifying how these are combined, and how the expression is evaluated. We then specify the length of each parameter vector (which here is the number of unique parameter classes):

structure = TemplateStructure{(:f,), (:p1, :p2, :p3)}(
    function ((; f), (; p1, p2, p3), (x1, x2, class))
        p1[class] * x1^2 + f(x1, x2, p2[class]) - p3[1]
    end;
    num_parameters=(; p1=10, p2=10, p3=1)
)

This is equivalent to (for p1 = $\alpha$, p2 = $\beta$, p3 = $\gamma$):

$$ \hat{y} = \alpha_c x_1^2 + f(x_1, x_2, \beta_c) - \gamma $$

for each datapoint $i$ with features $x_1$ and $x_2$, and class $c$. There are 10 classes which each have parameters $\alpha$ and $\beta$. There is also $\gamma$ which is simply a manually-specified constant.

The Python version could be:

expression_spec = TemplateExpressionSpec(
    ["f"],
    ["p1", "p2", "p3"],
    combine="((; f), (; p1, p2, p3), (x, y, class)) -> p1[class] * x^2 + f(x, y, p2[class]) - p3[1]",
    num_parameters=dict(p1=10, p2=10, p3=1)
)

Note that you would pass class as the third feature of the dataset.

Original API

The API is like this:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x) + p[i] * y; num_parameters=2
)

You need to specify how many total parameters you ask for in advance. Usually this is just length(unique(i)) for some vector i that contains the categorization of each row. But you can have multiple categorizations too – just pass those as additional columns to the dataset. (They will get converted back to integers when indexing p here)

Those parameters can be used in any way you want. You can pass them to one of the subexpressions:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x, p[i]) + y; num_parameters=2
)

which basically let's you do what ParametricExpression does, but in a fixed functional form way.

You can also get parameters manually, like this:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y), p) -> f(x) + p[1] * y; num_parameters=1
)

Where you can see we just ask for a single parameter in the structure.

Full example:

using SymbolicRegression
using Random: MersenneTwister
using MLJBase: machine, fit!, report, matrix

# structure => f(x) + p[1] * y, single param
struct_search = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x) + p[i] * y; num_parameters=2
)

rng = MersenneTwister(0)

# We'll create 2 features, and 1 categorical variable (with 2 categories total)
X = (; x=rand(rng, 32), y=rand(rng, 32), i=rand(rng, 1:2, 32))
true_params = [0.5, -3.0]
true_f(x) = 0.5 * x * x

y = [true_f(X.x[i]) + true_params[X.i[i]] * X.y[i] for i in 1:32]

model = SRRegressor(;
    niterations=20,
    binary_operators=(+, *, -),
    unary_operators=(),
    expression_type=TemplateExpression,
    expression_options=(; structure=struct_search),
    early_stop_condition=(l, c) -> l < 1e-6 && c == 5,
)

mach = machine(model, X, y)
fit!(mach)

r = report(mach)
best_expr_idx = findfirst(
    i -> r.losses[i] < 1e-6 && r.complexities[i] == 5, 1:length(r.equations)
)
best_expr = r.equations[best_expr_idx]

# We can check that we recovered the `true_params`,
# since this problem fully constrains them:
params = get_metadata(best_expr).parameters
@test isapprox(params, true_params; atol=1e-3)

This comment was marked as resolved.

@coveralls
Copy link

coveralls commented Dec 26, 2024

Pull Request Test Coverage Report for Build 12551568401

Details

  • 177 of 183 (96.72%) changed or added relevant lines in 4 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 94.772%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/TemplateExpression.jl 163 169 96.45%
Files with Coverage Reduction New Missed Lines %
src/ComposableExpression.jl 2 96.55%
Totals Coverage Status
Change from base Build 12521558752: 0.04%
Covered Lines: 3281
Relevant Lines: 3462

💛 - Coveralls

@gm89uk
Copy link

gm89uk commented Dec 26, 2024

Apologies in advance for the delay Miles, I'm away over Christmas, and won't be able to test it out till the 28th. Happy holidays

@Andrea-gm
Copy link

Hi Miles. Apologies for the later reply, I am also on and off these days. The API looks great to me, thanks so much for it!
Is it already possible to test this from Python? I am unfortunately not familiar with julia

@gm89uk
Copy link

gm89uk commented Dec 29, 2024

Hi Miles,

Really like the syntax although I misunderstood that parameters = length(unique(i))

For full clarity:
I interpreted (and with ParametricExpression) that parameters were the number of parameters to optimise for each categorial variable. i.e. if you had 1 categorial variable with 4 categories (1,2,3,4), and you set number of parameters to 1, then it would optimise p1 (or whatever) to 4 different values within a function, one for each category; but actually you mean in this instance we should have number of parameters to be 4?

If batching is running, then length(unique(lens)) may be variable with each run.

So lets say i has 4 categories, 1,2,3,4. To find the form p*sin(x) = y, I should set number of parameters to be 4 or 1?
Nevertheless, a little stuck, thank you again for your patience.
I removed and ran ] add SymbolicRegression#master

structure = TemplateStructure{(:f,)}(
  ((; f), (x1, x2, x3, x4, x5, x6, y, cat), c) -> begin
    o = f(x1, x2, x3, x4, x5, x6, c[cat]) 
    if !o.valid
        return ValidVector(Vector{Float32}(undef, length(o.x)), false)
    end
    #Compute gradients for first 3 variables at once
    for varidx in 1:3
    	grad_column = D(f, varidx)(x1, x2, x3, x4, x5, x6, c[cat]).x
    	if !(all(grad_column .>= 0) || all(grad_column .<= 0))
        	return ValidVector(fill(Float32(1e9), length(o.x)), true)
    	end
    end
    return o
  end;
  num_parameters = length(unique(cat)) #I understand that cat is not defined here, but how do I pass a variable number of parameters so that this code works with batching?
)
model = SRRegressor(
    niterations=1000000,
    binary_operators=[+,-,*,/],
    maxsize=50,
    bumper=true,
    turbo=true,
    populations=18,
    expression_type = TemplateExpression,
    expression_options = (; structure),
    population_size=100
    parsimony = 0.01,
    batching=true,
)
mach = machine(model, X, y)
fit!(mach)

@MilesCranmer
Copy link
Owner Author

So lets say i has 4 categories, 1,2,3,4. To find the form p*sin(x) = y, I should set number of parameters to be 4 or 1?

And if i had 4 categories, and you were searching the equation p1 * sin(x) - p2 = y, then you would set the number of parameters to 8.

In other words, num_parameters is literally the length of the vector c in your example. You can use that vector in any way you want – which makes it super flexible.

Note that confusingly, num_parameters here (still just an idea, nothing is fixed) is different than ParametricExpression's max_parameters, which in this case, would be 1 and 2 respectively, rather than 4 and 8. This is because the ParametricExpression automatically infers the length of c based on the number of categories. Whereas here, we are just setting the length explicitly.

The one downside is you do need to do a bit more work to set up a problem. But I feel like it's so explicit and flexible that it's worth it, and maybe even easier to understand?


This also means you could have two category types: i1 and i2. If i1 had 4 categories, and corresponded to the parameter p1, and i2 had 2 categories for parameter p2, then you would set num_parameters to 6.


Do you think num_parameters is the right word? Or is there a better term here that might differentiate it with respect to the ParametricExpression max_parameters?

@MilesCranmer
Copy link
Owner Author

What about num_parameter_values instead of num_parameters? Then it contrasts a bit with ParametricExpression's max_parameters which is more like the number of parameter variables, rather than the number of actual scalar values.

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Dec 30, 2024

@atharvas would be interested in your thoughts on this API as well, I think this will be useful for our stuff! See the first post in this thread for the current API.

@MilesCranmer
Copy link
Owner Author

I've got it printing parameters now too which is nice:

template_parameters_3_trimmed.mp4

@gm89uk
Copy link

gm89uk commented Dec 30, 2024

Great progress! I was actually about to ask, how do you acquire the parameters in a very long run where you set the niterations to a very high number.

Thanks for the comments, it is much appreciated. Just as a user it is super helpful for me to know what is intuitive.

For this syntax, what would happen if you have a cat and a cat2 here, with different number of unique integers in each?

I've been thinking about this with an example that captures the complexities that can arise:
i1 is categorial variable with 4 members, i2 is a categorial variable with 2 members, we aim to have 2 parameters for i1 (so num_parameters 8), 1 parameter for i2 (so num_paramters 10), and a single parameter for constant optimisation (so num_parameters 11).

Current syntax:

((; f), (x1, x2, i1,i2), p) -> let
    p1 = p[i1]  #1st parameter for i1, 1st parameter overall
    p2 = p[i2+4]  #1st parameter for i2, 2nd parameter overall
    p3 = p[i1]  #reusing first parameter
    p4 = p[i1+6] #2nd parameter for i1, 3rd parameter overall
    p5 = p[10] #1st parameter constant optimisation, 4th parameter overall
   #used however you want between functions or within
end;
num_paramters = 11

Honestly, I find in complex workflows this will very prone to errors, especially if you get beyond 2,3 parameters.

Proposed synxtax:

((; f), (x1, x2, i1,i2), p) -> let
    p1 = p[i1,1]  #1st parameter for i1, 1st parameter overall
    p2 = p[i2,2]  #1st parameter for i2, 2nd parameter overall
    p3 = p[i1,1]  #reusing first parameter
    p4 = p[i1,3] #2nd parameter for i1, 3rd parameter overall
    p5 = p[,4] #1st parameter constant optimisation, 4th parameter overall
   #used however you want between functions or within
end

Then behind the hood, have an initiator:

parameters_indices = [1, 2, 1, 3, 4]  # Example data that is picked up from a parser
parameters_indices_variables = [:i1, :i2, :i1, :i1, missing]  # Example variables that is picked up from the parser, maybe going through structure and searching for p[*]
combined_matrix = hcat(parameters_indices, parameters_indices_variables)' 

# Unique parameter indices
unique_parameters_indices = unique(parameters_indices)
no_of_unique_parameters = maximum(unique_parameters_indices) 

num_parameters = 0
num_parameters_indices = zeros(Int, no_of_unique_parameters)  

for i in 1:no_of_unique_parameters
    matching_rows = combined_matrix[:, combined_matrix[1, :] .== i]
    unique_variable_count = length(unique(matching_rows[2, :]))
    num_parameters += unique_variable_count
    num_parameters_indices[i] = num_parameters
end

num_parameters_indices2 = similar(parameters_indices) 

for i in 1:length(parameters_indices)
    parameter_index = parameters_indices[i]
    matching_rows = combined_matrix[:, combined_matrix[1, :] .== parameter_index]
    unique_variable_count = length(unique(matching_rows[2, :]))
    num_parameters_indices2[i] = num_parameters_indices[parameter_index] - unique_variable_count
end

if I have written it right, then the output should be:
num_parameters_indices: [4, 6, 10, 11]
num_parameters_indices2: [0, 4, 0, 6, 10]
That should match with the original syntax:
p[parameters_indices_variables[1] + num_parameters_indices2[1]] = p[i1 + 0]
p[parameters_indices_variables[2] + num_parameters_indices2[2]] = p[i2 + 4]
p[parameters_indices_variables[3] + num_parameters_indices2[3]] = p[i1 + 0]
p[parameters_indices_variables[4] + num_parameters_indices2[4]] = p[i1 + 6]
p[parameters_indices_variables[5] + num_parameters_indices2[5]] = p[missing + 10]

Let me know if that makes any sense!

Edit: I just noticed v2 syntax, it is much better than API v1 and better than my suggestion above👍

@gm89uk
Copy link

gm89uk commented Dec 30, 2024

With regards to API v2:

structure = TemplateStructure{(:f,), (:p1, :p2)}(
  function ((; f), (; p1, p2), (x1, x2, x3, x4, x5, i1, y, i2))
    o = f(x1, x2, x3, x4, x5, p1[i1], p2[i2])
    if !o.valid
        return o
    end
    #Compute gradients for all variables at once
    for varidx in 1:3
    	grad_column = D(f, varidx)(x1, x2, x3, x4, x5, p1[i1], p2[i2]).x
    	# Check if all nonnegative or all nonpositive
	if !(all(g -> g >= 0, grad_column) || all(g -> g <= 0, grad_column))
		return ValidVector(fill(1f09, length(o.x)), true)
    	end
    end
    return o
  end;
  num_parameters = (; p1=2, p2=12)
)

i1 is a binary categorial variable of 0,1s
i2 is has 12 unique members

ERROR: BoundsError: attempt to access 2-element SymbolicRegression.TemplateExpressionModule.ParamVector{Float32} at index [Float32[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]]

Can't quite see my mistake!

@MilesCranmer
Copy link
Owner Author

Quick reply – since Julia is one-indexed, you will need to offset i1 by +1.

(I think I may make a 0-indexed array for the PySR interface so its easier for Python users)

@gm89uk
Copy link

gm89uk commented Dec 30, 2024

Thank you, that worked. Theoretically, I suppose it should be more efficient to pass a categorial binary variable as a parametric variable than as 0,1 as a normal variable?

@MilesCranmer
Copy link
Owner Author

Maybe but note that everything gets converted to Float64 anyways (and then back to Int64 when indexing). But these sorts of memory considerations are <1% compared to all of the allocations that happen within the evaluation and optimization loops

@Andrea-gm
Copy link

Hi Miles. Thank you again for this implementation. Is there any plan of extending this to the python API? If yes, is it something you plan on doing soon?

@MilesCranmer
Copy link
Owner Author

Once it merges it should be straightforward to get the analogous version into Python. But need to confirm the Julia API first

@gm89uk
Copy link

gm89uk commented Jan 4, 2025

Just another consideration, will the above APIs definitely work with batching = true?

If the whole database i1 has 12 different categories but on a specific batch there are only 5 present, would that not misalign the parameters and cause convergence issues?

@MilesCranmer
Copy link
Owner Author

Yeah it should be fine. The input features would just be slices of the whole features, and the categorical vectors are used as indices. So if you write ([1, 4, 8, 16])[[1, 3, 1]], you would get [1, 8, 1] returned; it wouldn’t be an issue that you didn’t refer to all the contents of the vector.

It does mean that sometimes, gradients with respect to one of the parameters will be zero, but you’d expect over time it would average out.

Also the expressions are evaluated on the whole dataset before getting stored in the Pareto front, so even if the batch loss is super noisy, the final loss will cover everything.

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Jan 4, 2025

@avik-pal I'd be interested in hearing your thoughts on this API too! I think this would allow for further integration with Lux.jl. e.g., one of those parameter vectors could be for a Lux model - in which case you could simultaneously optimizing a Lux model AND a symbolic expression, jointly.

@Andrea-gm
Copy link

Hi Miles, I was wondering if you might have an idea of the timeline for implementing this feature. I don’t mean to pressure you in any way; I’m just trying to decide whether to wait or start learning Julia instead. Thank you!

@MilesCranmer
Copy link
Owner Author

I still have some reservations about the specific look of this API (someone on twitter mentioned having trouble understanding it). Maybe it's just that (; p1, p2) looks very unnatural to a Python user?

You would also write this like

structure = TemplateStructure{(:f,), (:p1, :p2, :p3)}(
    ((; f), params, (x1, x2, class)) -> params.p1[class] * x1^2 + f(x1, x2, params.p2[class]) - params.p3[1];
    num_parameters=(; p1=10, p2=10, p3=1)
)

It's a bit more verbose this way. The underlying mechanism is identical though. Perhaps it's a tad less confusing to people?

@Andrea-gm
Copy link

Andrea-gm commented Jan 15, 2025

To me this looks fine, but it's true that it's more understandable when you can explicitly define the function_symbols and parameters, as it's currently done in TemplateExpressionSpec.

Not sure if it's possible, but maybe a more intuitive Python interface would be something like this?

expression_spec = TemplateExpressionSpec(
    function_symbols = ["f"],
    parameters = ["p1", "p2",  "p3"],
    num_categories = dict("p1"=10, "p2"=10), # default num_cat = 1
    features = ["x1", "x2",  "cat"],
    expression = "p1(cat) * x1^2 + f(x1, x2, p2(cat)) - p3"
)

And then you can build the julia string in the backend?
To me defining the parameters as functions of the category makes intuitive sense.

@MilesCranmer
Copy link
Owner Author

Thanks for that, very helpful. Indeed perhaps on the Python side we should make it so that the user need not specify the signature. My only concern is whether hiding the Julia stuff would make it less clear how to customize it. But maybe that’s unrealistic!

Although I’m not sure about making the parameters callable, because [] is used in both Julia and Python for indexing, and this is the operation that is actually occurring when writing p1[cat].

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Jan 15, 2025

On the Julia side we could also this all be macro based. Generally I’m not a fan of user-side macros because it’s harder to customize stuff, but here maybe it’d be nice.

structure = @template(
    p1[class] * x1^2 + f(x1, x2, p2[class]) - p3[1],
    parameters=(p1=10, p2=10, p3=1),
    expressions=(f,),
    features=(x1, x2, class).
)

Thoughts?

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Jan 15, 2025

We could also allow specifying parameter matrices. That way it could allow requesting stuff like Lux.jl parameters. cc @avik-pal

Like

parameters=(p2=(10,), p2=(10, 5) #= matrix =#, p3=() #= scalar =#)

Maybe it’s better to make an abstract interface for parameters in a template expression?

@Andrea-gm
Copy link

Thanks for that, very helpful. Indeed perhaps on the Python side we should make it so that the user need not specify the signature. My only concern is whether hiding the Julia stuff would make it less clear how to customize it. But maybe that’s unrealistic!

Understandable! Maybe you can then leave the option of a julia_expression parameter where the user can pass a julia string that overrides the rest of the expression? Defaulting julia_expression = None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants