RFC: TemplateExpression with parameters #394
base: master
Conversation
Pull Request Test Coverage Report for Build 12551568401 (Coveralls) |
Apologies in advance for the delay Miles, I'm away over Christmas, and won't be able to test it out till the 28th. Happy holidays |
Hi Miles. Apologies for the late reply, I am also on and off these days. The API looks great to me, thanks so much for it! |
Hi Miles, really like the syntax, although I misunderstood that parameters = length(unique(i)). For full clarity: if batching is running, then length(unique(lens)) may be variable with each run. So let's say i has 4 categories: 1, 2, 3, 4. To find the form p*sin(x) = y, should I set the number of parameters to be 4 or 1?

```julia
structure = TemplateStructure{(:f,)}(
    ((; f), (x1, x2, x3, x4, x5, x6, y, cat), c) -> begin
        o = f(x1, x2, x3, x4, x5, x6, c[cat])
        if !o.valid
            return ValidVector(Vector{Float32}(undef, length(o.x)), false)
        end
        # Compute gradients for the first 3 variables at once
        for varidx in 1:3
            grad_column = D(f, varidx)(x1, x2, x3, x4, x5, x6, c[cat]).x
            if !(all(grad_column .>= 0) || all(grad_column .<= 0))
                return ValidVector(fill(Float32(1e9), length(o.x)), true)
            end
        end
        return o
    end;
    num_parameters = length(unique(cat))  # I understand that `cat` is not defined here, but how do I pass a variable number of parameters so that this code works with batching?
)

model = SRRegressor(
    niterations=1000000,
    binary_operators=[+, -, *, /],
    maxsize=50,
    bumper=true,
    turbo=true,
    populations=18,
    expression_type=TemplateExpression,
    expression_options=(; structure),
    population_size=100,
    parsimony=0.01,
    batching=true,
)
mach = machine(model, X, y)
fit!(mach)
```
|
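One way around the chicken-and-egg problem in the snippet above is to compute the category count from the full dataset before building the structure, so that it stays fixed even when a batch happens to contain fewer categories. A minimal sketch (illustrative names; `X.cat` stands for whichever column holds the categories):

```julia
# Hypothetical sketch: fix the parameter count from the *full* dataset up front.
# Batches then reuse the same global parameter vector, indexed by category.
cat_column = X.cat                      # assumed category column of the table
n_params = length(unique(cat_column))   # 4 in the example above
# ...then use `num_parameters = n_params` when constructing the TemplateStructure.
```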
And if …? In other words, … Note that confusingly, … The one downside is you do need to do a bit more work to set up a problem. But I feel like it's so explicit and flexible that it's worth it, and maybe even easier to understand? This also means you could have two category types: … Do you think …? |
What about …? |
@atharvas would be interested in your thoughts on this API as well, I think this will be useful for our stuff! See the first post in this thread for the current API. |
I've got it printing parameters now too, which is nice: template_parameters_3_trimmed.mp4 |
Great progress! I was actually about to ask: how do you acquire the parameters in a very long run where you set niterations to a very high number?
I've been thinking about this with an example that captures the complexities that can arise. Current syntax:

```julia
((; f), (x1, x2, i1, i2), p) -> let
    p1 = p[i1]      # 1st parameter for i1, 1st parameter overall
    p2 = p[i2+4]    # 1st parameter for i2, 2nd parameter overall
    p3 = p[i1]      # reusing first parameter
    p4 = p[i1+6]    # 2nd parameter for i1, 3rd parameter overall
    p5 = p[10]      # 1st parameter constant optimisation, 4th parameter overall
    # used however you want between functions or within
end;
num_parameters = 11
```

Honestly, I find that in complex workflows this will be very prone to errors, especially if you get beyond 2–3 parameters. Proposed syntax:

```julia
((; f), (x1, x2, i1, i2), p) -> let
    p1 = p[i1, 1]   # 1st parameter for i1, 1st parameter overall
    p2 = p[i2, 2]   # 1st parameter for i2, 2nd parameter overall
    p3 = p[i1, 1]   # reusing first parameter
    p4 = p[i1, 3]   # 2nd parameter for i1, 3rd parameter overall
    p5 = p[, 4]     # 1st parameter constant optimisation, 4th parameter overall
    # used however you want between functions or within
end
```

Then under the hood, have an initiator:

```julia
parameters_indices = [1, 2, 1, 3, 4]  # Example data that is picked up from a parser
parameters_indices_variables = [:i1, :i2, :i1, :i1, missing]  # Example variables picked up from the parser, maybe by going through `structure` and searching for `p[*]`
combined_matrix = permutedims(hcat(parameters_indices, parameters_indices_variables))

# Unique parameter indices
unique_parameters_indices = unique(parameters_indices)
no_of_unique_parameters = maximum(unique_parameters_indices)
num_parameters = 0
num_parameters_indices = zeros(Int, no_of_unique_parameters)
for i in 1:no_of_unique_parameters
    matching_rows = combined_matrix[:, combined_matrix[1, :] .== i]
    unique_variable_count = length(unique(matching_rows[2, :]))
    num_parameters += unique_variable_count
    num_parameters_indices[i] = num_parameters
end

num_parameters_indices2 = similar(parameters_indices)
for i in 1:length(parameters_indices)
    parameter_index = parameters_indices[i]
    matching_rows = combined_matrix[:, combined_matrix[1, :] .== parameter_index]
    unique_variable_count = length(unique(matching_rows[2, :]))
    num_parameters_indices2[i] = num_parameters_indices[parameter_index] - unique_variable_count
end
```

If I have written it right, then the output should be: … Let me know if that makes any sense! Edit: I just noticed the v2 syntax; it is much better than API v1 and better than my suggestion above 👍 |
With regards to API v2:

```julia
structure = TemplateStructure{(:f,), (:p1, :p2)}(
    function ((; f), (; p1, p2), (x1, x2, x3, x4, x5, i1, y, i2))
        o = f(x1, x2, x3, x4, x5, p1[i1], p2[i2])
        if !o.valid
            return o
        end
        # Compute gradients for all variables at once
        for varidx in 1:3
            grad_column = D(f, varidx)(x1, x2, x3, x4, x5, p1[i1], p2[i2]).x
            # Check if all nonnegative or all nonpositive
            if !(all(g -> g >= 0, grad_column) || all(g -> g <= 0, grad_column))
                return ValidVector(fill(1f9, length(o.x)), true)
            end
        end
        return o
    end;
    num_parameters = (; p1=2, p2=12)
)
```

`i1` is a binary categorical variable of 0s and 1s:

```
ERROR: BoundsError: attempt to access 2-element SymbolicRegression.TemplateExpressionModule.ParamVector{Float32} at index [Float32[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]]
```

Can't quite see my mistake! |
Quick reply – since Julia is one-indexed, you will need to offset the categories by one (I think I may make a 0-indexed array for the PySR interface so it's easier for Python users) |
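For instance (a hedged sketch with illustrative names), a 0/1 feature can be shifted before fitting so that it indexes a 2-element parameter vector:

```julia
# Hypothetical example: shift a 0-based categorical column to 1-based,
# so indexing a 2-element parameter vector with it avoids a BoundsError.
i1_raw = Float32[0, 1, 1, 0]   # as stored in the original data
i1 = i1_raw .+ 1               # 0 -> class 1, 1 -> class 2
```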
Thank you, that worked. Theoretically, I suppose it should be more efficient to pass a binary categorical variable as a parametric variable than as 0/1 values in a normal variable? |
Maybe but note that everything gets converted to Float64 anyways (and then back to Int64 when indexing). But these sorts of memory considerations are <1% compared to all of the allocations that happen within the evaluation and optimization loops |
Hi Miles. Thank you again for this implementation. Is there any plan of extending this to the python API? If yes, is it something you plan on doing soon? |
Once it merges it should be straightforward to get the analogous version into Python. But need to confirm the Julia API first |
Just another consideration, will the above APIs definitely work with batching = true? If the whole database i1 has 12 different categories but on a specific batch there are only 5 present, would that not misalign the parameters and cause convergence issues? |
Yeah it should be fine. The input features would just be slices of the whole features, and the categorical vectors are used as indices. So if you write …, it does mean that sometimes gradients with respect to one of the parameters will be zero, but you'd expect over time it would average out. Also the expressions are evaluated on the whole dataset before getting stored in the Pareto front, so even if the batch loss is super noisy, the final loss will cover everything. |
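To make the indexing argument concrete, here is a small illustrative sketch (plain Julia, made-up values): a batch carries the original category labels, so the parameters stay aligned no matter which categories the batch happens to contain.

```julia
# Illustrative only: category labels index into the full parameter vector,
# so a batch containing only some categories still touches the right entries.
p1 = Float32[0.5, 1.5, 2.5, 3.5]   # one parameter per category (4 total)
class = [1, 3, 3, 2, 4, 1]         # category label per row, full dataset
batch_rows = [2, 3, 6]             # a random batch
batch_class = class[batch_rows]    # [3, 3, 1]
batch_params = p1[batch_class]     # [2.5f0, 2.5f0, 0.5f0], aligned with the rows
```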
@avik-pal I'd be interested in hearing your thoughts on this API too! I think this would allow for further integration with Lux.jl. e.g., one of those parameter vectors could be for a Lux model, in which case you could simultaneously optimize a Lux model AND a symbolic expression, jointly. |
Hi Miles, I was wondering if you might have an idea of the timeline for implementing this feature. I don’t mean to pressure you in any way; I’m just trying to decide whether to wait or start learning Julia instead. Thank you! |
I still have some reservations about the specific look of this API (someone on Twitter mentioned having trouble understanding it). Maybe it's just that … You would also write this like:

```julia
structure = TemplateStructure{(:f,), (:p1, :p2, :p3)}(
    ((; f), params, (x1, x2, class)) -> params.p1[class] * x1^2 + f(x1, x2, params.p2[class]) - params.p3[1];
    num_parameters=(; p1=10, p2=10, p3=1)
)
```

It's a bit more verbose this way. The underlying mechanism is identical though. Perhaps it's a tad less confusing to people? |
To me this looks fine, but it's true that it's more understandable when you can explicitly define the … Not sure if it's possible, but maybe a more intuitive Python interface would be something like this? And then you could build the Julia string in the backend? |
Thanks for that, very helpful. Indeed perhaps on the Python side we should make it so that the user need not specify the signature. My only concern is whether hiding the Julia stuff would make it less clear how to customize it. But maybe that's unrealistic! Although I'm not sure about making the parameters callable, because … |
On the Julia side we could also make this all macro-based. Generally I'm not a fan of user-side macros because it's harder to customize stuff, but here maybe it'd be nice.

```julia
structure = @template(
    p1[class] * x1^2 + f(x1, x2, p2[class]) - p3[1],
    parameters=(p1=10, p2=10, p3=1),
    expressions=(f,),
    features=(x1, x2, class),
)
```

Thoughts? |
We could also allow specifying parameter matrices. That way it could allow requesting stuff like Lux.jl parameters. cc @avik-pal. Like `parameters=(p1=(10,), p2=(10, 5) #= matrix =#, p3=() #= scalar =#)`. Maybe it's better to make an abstract interface for parameters in a template expression? |
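If parameter matrices were allowed, the shape semantics might look like this (purely hypothetical; not an existing API, just illustrating what matrix-shaped parameters would mean):

```julia
# Hypothetical: a 10×5 parameter matrix, i.e. one 5-vector of parameters per
# class. A structure function could select a row per class and feed it to a
# sub-expression or a Lux model.
p2 = randn(Float32, 10, 5)   # 10 classes, 5 parameters each
class = 3
row = @view p2[class, :]     # the 5 parameters belonging to class 3
```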
Understandable! Maybe you can then leave the option of a … |
@gm89uk @Andrea-gm here's an attempt at MilesCranmer/PySR#787. If you get a chance to try it, I'd be very interested to hear what you think.
The API is like this (v2). We request an expression `f` and parameter vectors `p1`, `p2`, and `p3`, via the type parameter `{(:f,), (:p1, :p2, :p3)}`. We then write out a function specifying how these are combined and how the expression is evaluated. We then specify the length of each parameter vector (which here is the number of unique parameter classes).

This is equivalent to (for $\alpha$, $\beta$, $\gamma$): `p1` = …, `p2` = …, `p3` = …, for each datapoint $i$ with features $x_1$ and $x_2$, and class $c$. There are 10 classes which each have parameters $\alpha$ and $\beta$. There is also $\gamma$, which is simply a manually-specified constant.
The Python version could be: …

Note that you would pass `class` as the third feature of the dataset.

Original API
The API is like this: you need to specify how many total parameters you ask for in advance. Usually this is just `length(unique(i))` for some vector `i` that contains the categorization of each row. But you can have multiple categorizations too – just pass those as additional columns to the dataset. (They will get converted back to integers when indexing `p` here.)

Those parameters can be used in any way you want. You can pass them to one of the subexpressions, which basically lets you do what `ParametricExpression` does, but in a fixed functional form way.

You can also get parameters manually, like this: …
Where you can see we just ask for a single parameter in the structure.
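The manual-parameter snippet is elided in this excerpt, but based on the v2 API described above, a single-parameter structure might look like the following sketch (illustrative, not the PR's exact code):

```julia
# Hypothetical single-parameter template: `c` has length 1 and acts as a
# globally optimized constant alongside the evolved sub-expression `f`.
structure = TemplateStructure{(:f,), (:c,)}(
    ((; f), (; c), (x1, x2)) -> c[1] * f(x1, x2);
    num_parameters=(; c=1),
)
```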
Full example: