RFC: TemplateExpression with parameters #394

MilesCranmer · 2024-12-26T01:55:58Z

@gm89uk @Andrea-gm here's an attempt at MilesCranmer/PySR#787. If you get a chance to try, I'd be very interested to hear what you think.

The API is like this (v2). We request an expression f and parameter vectors p1, p2, and p3, via the type parameter {(:f,), (:p1, :p2, :p3)}. We then write out how a function specifying how these are combined, and how the expression is evaluated. We then specify the length of each parameter vector (which here is the number of unique parameter classes):

structure = TemplateStructure{(:f,), (:p1, :p2, :p3)}(
    function ((; f), (; p1, p2, p3), (x1, x2, class))
        p1[class] * x1^2 + f(x1, x2, p2[class]) - p3[1]
    end;
    num_parameters=(; p1=10, p2=10, p3=1)
)

This is equivalent to (for p1 = $\alpha$, p2 = $\beta$, p3 = $\gamma$):

$$ \hat{y} = \alpha_c x_1^2 + f(x_1, x_2, \beta_c) - \gamma $$

for each datapoint $i$ with features $x_1$ and $x_2$, and class $c$. There are 10 classes which each have parameters $\alpha$ and $\beta$. There is also $\gamma$ which is simply a manually-specified constant.

The Python version could be:

expression_spec = TemplateExpressionSpec(
    ["f"],
    ["p1", "p2", "p3"],
    combine="((; f), (; p1, p2, p3), (x, y, class)) -> p1[class] * x^2 + f(x, y, p2[class]) - p3[1]",
    num_parameters=dict(p1=10, p2=10, p3=1)
)

Note that you would pass class as the third feature of the dataset.

Original API

The API is like this:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x) + p[i] * y; num_parameters=2
)

You need to specify how many total parameters you ask for in advance. Usually this is just length(unique(i)) for some vector i that contains the categorization of each row. But you can have multiple categorizations too – just pass those as additional columns to the dataset. (They will get converted back to integers when indexing p here)

Those parameters can be used in any way you want. You can pass them to one of the subexpressions:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x, p[i]) + y; num_parameters=2
)

which basically let's you do what ParametricExpression does, but in a fixed functional form way.

You can also get parameters manually, like this:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y), p) -> f(x) + p[1] * y; num_parameters=1
)

Where you can see we just ask for a single parameter in the structure.

Full example:

using SymbolicRegression
using Random: MersenneTwister
using MLJBase: machine, fit!, report, matrix

# structure => f(x) + p[1] * y, single param
struct_search = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x) + p[i] * y; num_parameters=2
)

rng = MersenneTwister(0)

# We'll create 2 features, and 1 categorical variable (with 2 categories total)
X = (; x=rand(rng, 32), y=rand(rng, 32), i=rand(rng, 1:2, 32))
true_params = [0.5, -3.0]
true_f(x) = 0.5 * x * x

y = [true_f(X.x[i]) + true_params[X.i[i]] * X.y[i] for i in 1:32]

model = SRRegressor(;
    niterations=20,
    binary_operators=(+, *, -),
    unary_operators=(),
    expression_type=TemplateExpression,
    expression_options=(; structure=struct_search),
    early_stop_condition=(l, c) -> l < 1e-6 && c == 5,
)

mach = machine(model, X, y)
fit!(mach)

r = report(mach)
best_expr_idx = findfirst(
    i -> r.losses[i] < 1e-6 && r.complexities[i] == 5, 1:length(r.equations)
)
best_expr = r.equations[best_expr_idx]

# We can check that we recovered the `true_params`,
# since this problem fully constrains them:
params = get_metadata(best_expr).parameters
@test isapprox(params, true_params; atol=1e-3)

coveralls · 2024-12-26T07:00:36Z

Pull Request Test Coverage Report for Build 12551568401

Details

177 of 183 (96.72%) changed or added relevant lines in 4 files are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage increased (+0.04%) to 94.772%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/TemplateExpression.jl	163	169	96.45%

Files with Coverage Reduction	New Missed Lines	%
src/ComposableExpression.jl	2	96.55%

Totals
Change from base Build 12521558752:	0.04%
Covered Lines:	3281
Relevant Lines:	3462

💛 - Coveralls

gm89uk · 2024-12-26T09:10:42Z

Apologies in advance for the delay Miles, I'm away over Christmas, and won't be able to test it out till the 28th. Happy holidays

Andrea-gm · 2024-12-28T10:13:27Z

Hi Miles. Apologies for the later reply, I am also on and off these days. The API looks great to me, thanks so much for it!
Is it already possible to test this from Python? I am unfortunately not familiar with julia

gm89uk · 2024-12-29T00:01:12Z

Hi Miles,

Really like the syntax although I misunderstood that parameters = length(unique(i))

For full clarity:
I interpreted (and with ParametricExpression) that parameters were the number of parameters to optimise for each categorial variable. i.e. if you had 1 categorial variable with 4 categories (1,2,3,4), and you set number of parameters to 1, then it would optimise p1 (or whatever) to 4 different values within a function, one for each category; but actually you mean in this instance we should have number of parameters to be 4?

If batching is running, then length(unique(lens)) may be variable with each run.

So lets say i has 4 categories, 1,2,3,4. To find the form p*sin(x) = y, I should set number of parameters to be 4 or 1?
Nevertheless, a little stuck, thank you again for your patience.
I removed and ran ] add SymbolicRegression#master

structure = TemplateStructure{(:f,)}(
  ((; f), (x1, x2, x3, x4, x5, x6, y, cat), c) -> begin
    o = f(x1, x2, x3, x4, x5, x6, c[cat]) 
    if !o.valid
        return ValidVector(Vector{Float32}(undef, length(o.x)), false)
    end
    #Compute gradients for first 3 variables at once
    for varidx in 1:3
    	grad_column = D(f, varidx)(x1, x2, x3, x4, x5, x6, c[cat]).x
    	if !(all(grad_column .>= 0) || all(grad_column .<= 0))
        	return ValidVector(fill(Float32(1e9), length(o.x)), true)
    	end
    end
    return o
  end;
  num_parameters = length(unique(cat)) #I understand that cat is not defined here, but how do I pass a variable number of parameters so that this code works with batching?
)
model = SRRegressor(
    niterations=1000000,
    binary_operators=[+,-,*,/],
    maxsize=50,
    bumper=true,
    turbo=true,
    populations=18,
    expression_type = TemplateExpression,
    expression_options = (; structure),
    population_size=100
    parsimony = 0.01,
    batching=true,
)
mach = machine(model, X, y)
fit!(mach)

MilesCranmer · 2024-12-29T01:06:10Z

So lets say i has 4 categories, 1,2,3,4. To find the form p*sin(x) = y, I should set number of parameters to be 4 or 1?

And if i had 4 categories, and you were searching the equation p1 * sin(x) - p2 = y, then you would set the number of parameters to 8.

In other words, num_parameters is literally the length of the vector c in your example. You can use that vector in any way you want – which makes it super flexible.

Note that confusingly, num_parameters here (still just an idea, nothing is fixed) is different than ParametricExpression's max_parameters, which in this case, would be 1 and 2 respectively, rather than 4 and 8. This is because the ParametricExpression automatically infers the length of c based on the number of categories. Whereas here, we are just setting the length explicitly.

The one downside is you do need to do a bit more work to set up a problem. But I feel like it's so explicit and flexible that it's worth it, and maybe even easier to understand?

This also means you could have two category types: i1 and i2. If i1 had 4 categories, and corresponded to the parameter p1, and i2 had 2 categories for parameter p2, then you would set num_parameters to 6.

Do you think num_parameters is the right word? Or is there a better term here that might differentiate it with respect to the ParametricExpression max_parameters?

MilesCranmer · 2024-12-29T01:17:46Z

What about num_parameter_values instead of num_parameters? Then it contrasts a bit with ParametricExpression's max_parameters which is more like the number of parameter variables, rather than the number of actual scalar values.

MilesCranmer · 2024-12-30T12:45:23Z

@atharvas would be interested in your thoughts on this API as well, I think this will be useful for our stuff! See the first post in this thread for the current API.

MilesCranmer · 2024-12-30T12:57:49Z

I've got it printing parameters now too which is nice:

template_parameters_3_trimmed.mp4

gm89uk · 2024-12-30T14:42:52Z

Great progress! I was actually about to ask, how do you acquire the parameters in a very long run where you set the niterations to a very high number.

Thanks for the comments, it is much appreciated. Just as a user it is super helpful for me to know what is intuitive.

For this syntax, what would happen if you have a cat and a cat2 here, with different number of unique integers in each?

I've been thinking about this with an example that captures the complexities that can arise:
i1 is categorial variable with 4 members, i2 is a categorial variable with 2 members, we aim to have 2 parameters for i1 (so num_parameters 8), 1 parameter for i2 (so num_paramters 10), and a single parameter for constant optimisation (so num_parameters 11).

Current syntax:

((; f), (x1, x2, i1,i2), p) -> let
    p1 = p[i1]  #1st parameter for i1, 1st parameter overall
    p2 = p[i2+4]  #1st parameter for i2, 2nd parameter overall
    p3 = p[i1]  #reusing first parameter
    p4 = p[i1+6] #2nd parameter for i1, 3rd parameter overall
    p5 = p[10] #1st parameter constant optimisation, 4th parameter overall
   #used however you want between functions or within
end;
num_paramters = 11

Honestly, I find in complex workflows this will very prone to errors, especially if you get beyond 2,3 parameters.

Proposed synxtax:

((; f), (x1, x2, i1,i2), p) -> let
    p1 = p[i1,1]  #1st parameter for i1, 1st parameter overall
    p2 = p[i2,2]  #1st parameter for i2, 2nd parameter overall
    p3 = p[i1,1]  #reusing first parameter
    p4 = p[i1,3] #2nd parameter for i1, 3rd parameter overall
    p5 = p[,4] #1st parameter constant optimisation, 4th parameter overall
   #used however you want between functions or within
end

Then behind the hood, have an initiator:

parameters_indices = [1, 2, 1, 3, 4]  # Example data that is picked up from a parser
parameters_indices_variables = [:i1, :i2, :i1, :i1, missing]  # Example variables that is picked up from the parser, maybe going through structure and searching for p[*]
combined_matrix = hcat(parameters_indices, parameters_indices_variables)' 

# Unique parameter indices
unique_parameters_indices = unique(parameters_indices)
no_of_unique_parameters = maximum(unique_parameters_indices) 

num_parameters = 0
num_parameters_indices = zeros(Int, no_of_unique_parameters)  

for i in 1:no_of_unique_parameters
    matching_rows = combined_matrix[:, combined_matrix[1, :] .== i]
    unique_variable_count = length(unique(matching_rows[2, :]))
    num_parameters += unique_variable_count
    num_parameters_indices[i] = num_parameters
end

num_parameters_indices2 = similar(parameters_indices) 

for i in 1:length(parameters_indices)
    parameter_index = parameters_indices[i]
    matching_rows = combined_matrix[:, combined_matrix[1, :] .== parameter_index]
    unique_variable_count = length(unique(matching_rows[2, :]))
    num_parameters_indices2[i] = num_parameters_indices[parameter_index] - unique_variable_count
end

if I have written it right, then the output should be:
num_parameters_indices: [4, 6, 10, 11]
num_parameters_indices2: [0, 4, 0, 6, 10]
That should match with the original syntax:
p[parameters_indices_variables[1] + num_parameters_indices2[1]] = p[i1 + 0]
p[parameters_indices_variables[2] + num_parameters_indices2[2]] = p[i2 + 4]
p[parameters_indices_variables[3] + num_parameters_indices2[3]] = p[i1 + 0]
p[parameters_indices_variables[4] + num_parameters_indices2[4]] = p[i1 + 6]
p[parameters_indices_variables[5] + num_parameters_indices2[5]] = p[missing + 10]

Let me know if that makes any sense!

Edit: I just noticed v2 syntax, it is much better than API v1 and better than my suggestion above👍

gm89uk · 2024-12-30T15:31:29Z

With regards to API v2:

structure = TemplateStructure{(:f,), (:p1, :p2)}(
  function ((; f), (; p1, p2), (x1, x2, x3, x4, x5, i1, y, i2))
    o = f(x1, x2, x3, x4, x5, p1[i1], p2[i2])
    if !o.valid
        return o
    end
    #Compute gradients for all variables at once
    for varidx in 1:3
    	grad_column = D(f, varidx)(x1, x2, x3, x4, x5, p1[i1], p2[i2]).x
    	# Check if all nonnegative or all nonpositive
	if !(all(g -> g >= 0, grad_column) || all(g -> g <= 0, grad_column))
		return ValidVector(fill(1f09, length(o.x)), true)
    	end
    end
    return o
  end;
  num_parameters = (; p1=2, p2=12)
)

i1 is a binary categorial variable of 0,1s
i2 is has 12 unique members

ERROR: BoundsError: attempt to access 2-element SymbolicRegression.TemplateExpressionModule.ParamVector{Float32} at index [Float32[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]]

Can't quite see my mistake!

MilesCranmer · 2024-12-30T16:32:09Z

Quick reply – since Julia is one-indexed, you will need to offset i1 by +1.

(I think I may make a 0-indexed array for the PySR interface so its easier for Python users)

gm89uk · 2024-12-30T18:02:50Z

Thank you, that worked. Theoretically, I suppose it should be more efficient to pass a categorial binary variable as a parametric variable than as 0,1 as a normal variable?

MilesCranmer · 2024-12-30T18:46:53Z

Maybe but note that everything gets converted to Float64 anyways (and then back to Int64 when indexing). But these sorts of memory considerations are <1% compared to all of the allocations that happen within the evaluation and optimization loops

Andrea-gm · 2025-01-04T13:03:15Z

Hi Miles. Thank you again for this implementation. Is there any plan of extending this to the python API? If yes, is it something you plan on doing soon?

MilesCranmer · 2025-01-04T13:05:46Z

Once it merges it should be straightforward to get the analogous version into Python. But need to confirm the Julia API first

gm89uk · 2025-01-04T14:42:05Z

Just another consideration, will the above APIs definitely work with batching = true?

If the whole database i1 has 12 different categories but on a specific batch there are only 5 present, would that not misalign the parameters and cause convergence issues?

MilesCranmer · 2025-01-04T15:16:04Z

Yeah it should be fine. The input features would just be slices of the whole features, and the categorical vectors are used as indices. So if you write ([1, 4, 8, 16])[[1, 3, 1]], you would get [1, 8, 1] returned; it wouldn’t be an issue that you didn’t refer to all the contents of the vector.

It does mean that sometimes, gradients with respect to one of the parameters will be zero, but you’d expect over time it would average out.

Also the expressions are evaluated on the whole dataset before getting stored in the Pareto front, so even if the batch loss is super noisy, the final loss will cover everything.

MilesCranmer · 2025-01-04T16:37:14Z

@avik-pal I'd be interested in hearing your thoughts on this API too! I think this would allow for further integration with Lux.jl. e.g., one of those parameter vectors could be for a Lux model - in which case you could simultaneously optimizing a Lux model AND a symbolic expression, jointly.

Andrea-gm · 2025-01-14T17:07:46Z

Hi Miles, I was wondering if you might have an idea of the timeline for implementing this feature. I don’t mean to pressure you in any way; I’m just trying to decide whether to wait or start learning Julia instead. Thank you!

MilesCranmer · 2025-01-14T19:21:05Z

I still have some reservations about the specific look of this API (someone on twitter mentioned having trouble understanding it). Maybe it's just that (; p1, p2) looks very unnatural to a Python user?

You would also write this like

structure = TemplateStructure{(:f,), (:p1, :p2, :p3)}(
    ((; f), params, (x1, x2, class)) -> params.p1[class] * x1^2 + f(x1, x2, params.p2[class]) - params.p3[1];
    num_parameters=(; p1=10, p2=10, p3=1)
)

It's a bit more verbose this way. The underlying mechanism is identical though. Perhaps it's a tad less confusing to people?

Andrea-gm · 2025-01-15T01:31:29Z

To me this looks fine, but it's true that it's more understandable when you can explicitly define the function_symbols and parameters, as it's currently done in TemplateExpressionSpec.

Not sure if it's possible, but maybe a more intuitive Python interface would be something like this?

expression_spec = TemplateExpressionSpec(
    function_symbols = ["f"],
    parameters = ["p1", "p2",  "p3"],
    num_categories = dict("p1"=10, "p2"=10), # default num_cat = 1
    features = ["x1", "x2",  "cat"],
    expression = "p1(cat) * x1^2 + f(x1, x2, p2(cat)) - p3"
)

And then you can build the julia string in the backend?
To me defining the parameters as functions of the category makes intuitive sense.

MilesCranmer · 2025-01-15T13:32:46Z

Thanks for that, very helpful. Indeed perhaps on the Python side we should make it so that the user need not specify the signature. My only concern is whether hiding the Julia stuff would make it less clear how to customize it. But maybe that’s unrealistic!

Although I’m not sure about making the parameters callable, because [] is used in both Julia and Python for indexing, and this is the operation that is actually occurring when writing p1[cat].

MilesCranmer · 2025-01-15T13:38:06Z

On the Julia side we could also this all be macro based. Generally I’m not a fan of user-side macros because it’s harder to customize stuff, but here maybe it’d be nice.

structure = @template(
    p1[class] * x1^2 + f(x1, x2, p2[class]) - p3[1],
    parameters=(p1=10, p2=10, p3=1),
    expressions=(f,),
    features=(x1, x2, class).
)

Thoughts?

MilesCranmer · 2025-01-15T13:52:59Z

We could also allow specifying parameter matrices. That way it could allow requesting stuff like Lux.jl parameters. cc @avik-pal

Like

parameters=(p2=(10,), p2=(10, 5) #= matrix =#, p3=() #= scalar =#)

Maybe it’s better to make an abstract interface for parameters in a template expression?

Andrea-gm · 2025-01-15T17:47:01Z

Thanks for that, very helpful. Indeed perhaps on the Python side we should make it so that the user need not specify the signature. My only concern is whether hiding the Julia stuff would make it less clear how to customize it. But maybe that’s unrealistic!

Understandable! Maybe you can then leave the option of a julia_expression parameter where the user can pass a julia string that overrides the rest of the expression? Defaulting julia_expression = None

This comment was marked as resolved.

Sign in to view

MilesCranmer added 23 commits December 28, 2024 20:21

wip: start implementing parametric component of template expressions

33b6452

wip: rewrite TemplateExpression so all definitions within file

1d64994

wip: add more functions for parametric form of TemplateExpression

e4add98

feat: introduce parameter feature for TemplateExpression

5c5a07e

refactor: num_params -> num_parameters

252ddf7

refactor: clean up with has_params

af67dfe

feat: custom mutate_constant for parametric template expression

a71a545

feat: permit variable_names=nothing in ComposableExpression

e478322

refactor: clean up redundant imports

3242e69

feat: allow initializing parameters from naked array

99b1598

fix: some issues in TemplateExpression

29c5e4c

test: parameters for template expression

e015105

ci: flag isbits functions for no coverage

7d3a2bc

fix: better defensive coding

7faeb4d

test: indexed evaluation of paramters

e6b4c2a

test: include proper test of parametric template expression

2da54a6

fix: condition for constant mutation of template

8f1b7dc

refactor: avoid named tuple types

7ebe583

test: fix JET error with type assertion

29fb7db

test: fix JET error

c6205ab

ci: skip known coverage

61374d7

test: fix parametric template example

2c539ed

fix: param assertion

a42aa91

MilesCranmer added 6 commits December 30, 2024 10:20

refactor: use empty NamedTuple instead of nothing

7927331

test: fix JET error and expand to 1.11

215eb72

test: basic evaluation with multiple parameters

75cc4c5

test: indexed multi-param evaluation

1f5c22c

test: fix error test

44e5ef8

feat: have TemplateExpression store all parameters in raw string

5e62569

MilesCranmer added 2 commits December 30, 2024 15:12

ci: add tests to parts

d792cee

ci: try to rebalance test times

4f20043

ci: try to rebalance test times

cef9b30

test: fix printing test

e8d9a5e

ci: try to rebalance test times

b0742c0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: TemplateExpression with parameters #394

RFC: TemplateExpression with parameters #394

MilesCranmer commented Dec 26, 2024 •

edited

Loading

This comment was marked as resolved.

coveralls commented Dec 26, 2024 •

edited

Loading

gm89uk commented Dec 26, 2024

Andrea-gm commented Dec 28, 2024

gm89uk commented Dec 29, 2024 •

edited

Loading

MilesCranmer commented Dec 29, 2024

MilesCranmer commented Dec 29, 2024

MilesCranmer commented Dec 30, 2024 •

edited

Loading

MilesCranmer commented Dec 30, 2024

gm89uk commented Dec 30, 2024 •

edited

Loading

gm89uk commented Dec 30, 2024

MilesCranmer commented Dec 30, 2024

gm89uk commented Dec 30, 2024 •

edited

Loading

MilesCranmer commented Dec 30, 2024

Andrea-gm commented Jan 4, 2025

MilesCranmer commented Jan 4, 2025

gm89uk commented Jan 4, 2025

MilesCranmer commented Jan 4, 2025

MilesCranmer commented Jan 4, 2025 •

edited

Loading

Andrea-gm commented Jan 14, 2025

MilesCranmer commented Jan 14, 2025

Andrea-gm commented Jan 15, 2025 •

edited

Loading

MilesCranmer commented Jan 15, 2025

MilesCranmer commented Jan 15, 2025 •

edited

Loading

MilesCranmer commented Jan 15, 2025 •

edited

Loading

Andrea-gm commented Jan 15, 2025

RFC: TemplateExpression with parameters #394

Are you sure you want to change the base?

RFC: TemplateExpression with parameters #394

Conversation

MilesCranmer commented Dec 26, 2024 • edited Loading

This comment was marked as resolved.

coveralls commented Dec 26, 2024 • edited Loading

Pull Request Test Coverage Report for Build 12551568401

Details

💛 - Coveralls

gm89uk commented Dec 26, 2024

Andrea-gm commented Dec 28, 2024

gm89uk commented Dec 29, 2024 • edited Loading

MilesCranmer commented Dec 29, 2024

MilesCranmer commented Dec 29, 2024

MilesCranmer commented Dec 30, 2024 • edited Loading

MilesCranmer commented Dec 30, 2024

gm89uk commented Dec 30, 2024 • edited Loading

gm89uk commented Dec 30, 2024

MilesCranmer commented Dec 30, 2024

gm89uk commented Dec 30, 2024 • edited Loading

MilesCranmer commented Dec 30, 2024

Andrea-gm commented Jan 4, 2025

MilesCranmer commented Jan 4, 2025

gm89uk commented Jan 4, 2025

MilesCranmer commented Jan 4, 2025

MilesCranmer commented Jan 4, 2025 • edited Loading

Andrea-gm commented Jan 14, 2025

MilesCranmer commented Jan 14, 2025

Andrea-gm commented Jan 15, 2025 • edited Loading

MilesCranmer commented Jan 15, 2025

MilesCranmer commented Jan 15, 2025 • edited Loading

MilesCranmer commented Jan 15, 2025 • edited Loading

Andrea-gm commented Jan 15, 2025

MilesCranmer commented Dec 26, 2024 •

edited

Loading

coveralls commented Dec 26, 2024 •

edited

Loading

gm89uk commented Dec 29, 2024 •

edited

Loading

MilesCranmer commented Dec 30, 2024 •

edited

Loading

gm89uk commented Dec 30, 2024 •

edited

Loading

gm89uk commented Dec 30, 2024 •

edited

Loading

MilesCranmer commented Jan 4, 2025 •

edited

Loading

Andrea-gm commented Jan 15, 2025 •

edited

Loading

MilesCranmer commented Jan 15, 2025 •

edited

Loading

MilesCranmer commented Jan 15, 2025 •

edited

Loading