Add naming spike #18
I find the concept of representers really interesting because there is so much space between the "human interpretation" and the "machine interpretation" of the same code. Different local names, as in this PR, are one such degree of freedom. But the general problem in computing a representation is where we place the cursor between "exactly the same code" and "exactly the same code execution". The more we accept a normalization as long as it results in the same code execution, the more liberty we have to change things, and thus the more generic our representations become. This may be interesting for changes that have a combinatorial effect on the number of representations. One such thing is order, be it the order of declarations, variables, etc. For example:
Those are also fundamentally the same. I believe each (valid) normalization brings us closer to "same code execution" and further from the original solution. I'm wondering whether this problem could be tackled with some kind of representation levels. Let's say we have three levels in the Ruby representer, for example.
Let's say an exercise has 1000 solutions. Maybe there will be 200 different level 1 representations, 50 different level 2 representations and 10 different level 3 representations. So providing adequate feedback on level 3 representations could reach a lot of people, while providing feedback on level 1 would mean much more personalized feedback. Level 3 (or any higher-level representation) could also be useful for analyzing the global patterns used to solve an exercise, and so for grouping solutions by category when browsing other people's solutions. That's just my 2 cents on the problem of representations ^^.
I'm not sure if I follow. We need the representer for two things, right:
And I'm not sure how the representer is built, but I kind of thought that it returned a data structure consisting of tokens, which is then parsed into whatever the current objective requires. If that's the case, why not store the identifiers in a scope table and return each token as a pair [PLACEHOLDER_ID, NAME], where PLACEHOLDER_ID is the placeholder given for that context and NAME is the name the variable had before? I think this is in line with the idea of having a step where we can do bijective mappings, but what I'm lost on is why we have to make these kinds of decisions before presentation. We can use these pair tokens for anything other than showing them to the mentor, and for that the conversion should be fairly simple. I'm speaking from ignorance here, because I don't know exactly what the representer is or what its features are; this is just how I'd approach this kind of problem in general. As an example, the given code would become this.
Submission 1
Submission 2
Submission 3
We could even make them valid string identifiers: [ID_1, 'a'] could become |
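The pair-token idea in this comment could be sketched roughly as follows. This is only an illustration under assumptions: `ScopeTable`, `tokenize`, `representation` and `presentation` are hypothetical names, a "scope" is modelled as a plain list of identifier names, and none of this reflects how the Ruby representer actually works.

```ruby
# Hypothetical scope table: one placeholder ID per distinct name within a scope.
class ScopeTable
  def initialize
    @ids = {}
  end

  # Returns the pair [PLACEHOLDER_ID, NAME] for an identifier.
  def token_for(name)
    @ids[name] ||= "ID_#{@ids.size + 1}"
    [@ids[name], name]
  end
end

# Each scope (e.g. a method body) is given as the list of identifiers it uses,
# in order of appearance. Every scope gets a fresh table (an assumption).
def tokenize(scopes)
  scopes.map do |identifiers|
    table = ScopeTable.new
    identifiers.map { |name| table.token_for(name) }
  end
end

# What gets compared between submissions: placeholder IDs only.
def representation(tokens)
  tokens.map { |scope| scope.map(&:first) }
end

# What gets shown to a mentor: the student's original names.
def presentation(tokens)
  tokens.map { |scope| scope.map(&:last) }
end

submission_1 = tokenize([%w[a a], %w[a a]])
submission_3 = tokenize([%w[a a], %w[b b]])

representation(submission_1) == representation(submission_3) # => true
presentation(submission_3)                                   # => [["a", "a"], ["b", "b"]]
```

Because each token carries both halves, the decision about which name to display can be deferred until presentation time, which is the point the comment is making.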
See https://github.com/exercism/docs/blob/main/building/tooling/representers/interface.md for a description.
We store the name mapping in a |
This is a messy, silly spike. The code is not important and will never get merged, but the idea is interesting.
Also, this is entirely not urgent or important, but as writing CSS 24/7 is killing me, and it's important to be confident of our specs before launch, I felt like spending an hour or two reasoning about this using code.
This explores the idea of rewriting a solution to standardise variable naming to allow for deeper normalisation.
Problem
It is essential that when a mentor gives feedback on a representer solution, they reference identifiers (e.g. variable names) that can be replaced with other variable names in other solutions.
Right now, the fact that someone can reuse an identifier (e.g. variable name) in different contexts is problematic for us.
Take these three submissions:
If we replace `a` in the first solution with `PLACEHOLDER_1` and `b` in the second with `PLACEHOLDER_1` we can see that they are identical. However in submission 3 because both `a` and `b` are used, this naive replacement means they get different representations.
Possible Solution
Rather than relying on the identifiers given by the student we could instead rename the variables ourselves internally. This would give us something like this:
All three submissions are normalised in the same way so their representations are identical. This branch achieves this (see test/naming_normalizers/method_locals.rb for tests):
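Independently of the branch's actual implementation, the difference between the naive replacement and per-method renaming can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `submission` helper builds made-up two-method solutions, locals are detected with a simple regex rather than a real parser, and the `foo_local_1` naming scheme is hypothetical.

```ruby
# Builds a toy two-method submission using the given local variable names.
def submission(foo_var, bar_var)
  <<~RUBY
    def foo
      #{foo_var} = 1
      #{foo_var} + 1
    end

    def bar
      #{bar_var} = 2
      #{bar_var} * 2
    end
  RUBY
end

# Very rough local detection: any lowercase identifier followed by a single "=".
ASSIGNMENT = /\b([a-z_]\w*)\s*=(?![=~>])/
# Matches a top-level method from "def" to its unindented "end".
METHOD = /^def (\w+).*?^end$/m

def local_names(code)
  code.scan(ASSIGNMENT).flatten.uniq
end

# Replaces whole-word identifiers that appear in the mapping.
def rename(code, mapping)
  code.gsub(/\b[a-z_]\w*\b/) { |word| mapping.fetch(word, word) }
end

# Naive: one global counter, so each distinct name gets its own placeholder.
def naive_normalize(source)
  mapping = local_names(source).each_with_index.map { |name, i| [name, "PLACEHOLDER_#{i + 1}"] }.to_h
  rename(source, mapping)
end

# Scoped: locals are renamed per method, independent of the student's names.
def scoped_normalize(source)
  source.gsub(METHOD) do |method_source|
    method_name = Regexp.last_match(1)
    mapping = local_names(method_source).each_with_index.map { |name, i| [name, "#{method_name}_local_#{i + 1}"] }.to_h
    rename(method_source, mapping)
  end
end

s1 = submission("a", "a")
s2 = submission("b", "b")
s3 = submission("a", "b")

naive_normalize(s1) == naive_normalize(s2)              # => true
naive_normalize(s1) == naive_normalize(s3)              # => false
[s1, s2, s3].map { |s| scoped_normalize(s) }.uniq.size  # => 1
```

A real normaliser would work on the AST rather than on regexes; the sketch only shows why resetting names per scope makes the three toy submissions collapse into a single representation.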
This still gives us a problem though that we show the original code to the mentor to give feedback on. If they gave feedback on submission 3, this would be fine, but on submissions 1 or 2, the `a` or `b` would be ambiguous. The rule to guard against this is that all `mapping.json` files must be reversible (ie we cannot have identical keys or values).
To fix this, we could return our normalised solution as a new `normalized.rb` file with a mapping against this file, and show this normalised code to the mentor (possibly with a tab to see original code too in case the modified identifiers make it hard to read). Regardless of which of the three submissions are used, things would then map correctly.
`send` and `define_method` and either falling back to a more naive implementation or returning an error on those.
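The reversibility rule for `mapping.json` mentioned above can be checked mechanically. The sketch below assumes the mapping is a flat JSON object whose keys are placeholders and whose values are the student's original names; the helper names `reversible?` and `reverse_mapping` and the sample mappings are hypothetical.

```ruby
require "json"

# A parsed Hash cannot hold duplicate keys, so only the values need checking:
# the mapping is reversible when no two placeholders share an original name.
def reversible?(mapping)
  mapping.values.uniq.size == mapping.size
end

# Returns the original-name => placeholder direction, or raises if ambiguous.
def reverse_mapping(mapping)
  raise ArgumentError, "mapping is not reversible" unless reversible?(mapping)

  mapping.invert
end

# Submission 3 style: each placeholder maps back to a distinct name.
ok = JSON.parse('{"foo_local_1": "a", "bar_local_1": "b"}')
# Submission 1 style: both placeholders map back to "a", so feedback is ambiguous.
ambiguous = JSON.parse('{"foo_local_1": "a", "bar_local_1": "a"}')

reversible?(ok)        # => true
reversible?(ambiguous) # => false
```

Mapping against a normalised file instead of the original sidesteps this, because the names shown to the mentor are then the unique ones the normaliser generated.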