Right now terminal tokens have to be separate words. Treebender should be able to support morphological rules:
V[ stem: t ] -> walk
V[ stem: t ] -> talk
// the result is marked stem: f so the rule can't reapply (blocks walkededededed...)
V[ tense: past, stem: f ] -> V[ stem: t ] ++ ed // syntax TBD
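To make the role of the `stem` feature concrete, here is a minimal Python sketch of the intended behavior. It is only an illustration under assumed names (`LEXICON`, `past_tense`, and the dict encoding of features are hypothetical, not Treebender's API): the suffix rule applies only to `stem: t` inputs and marks its output `stem: f`, so it cannot feed itself.

```python
# Hypothetical lexicon mirroring "V[ stem: t ] -> walk" and "V[ stem: t ] -> talk".
LEXICON = {
    "walk": {"cat": "V", "stem": True},
    "talk": {"cat": "V", "stem": True},
}


def past_tense(form, feats):
    """Sketch of V[ tense: past, stem: f ] -> V[ stem: t ] ++ ed.

    Returns (new_form, new_feats), or None when the input is not a V stem.
    """
    if feats.get("cat") != "V" or not feats.get("stem"):
        return None
    # The output is marked stem: False, so the rule cannot apply to it again.
    return form + "ed", {"cat": "V", "tense": "past", "stem": False}


walked = past_tense("walk", LEXICON["walk"])  # applies once
again = past_tense(*walked)                   # blocked: no "walkeded"
```

Here `walked` holds the form "walked" with `stem` set to false, and `again` is `None` because the rule's input requirement no longer holds.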
Questions:
What scope do we want here? Are we only supporting basic concatenative morphology (prefixes and suffixes), or will we try to support allomorphy, sound changes / ablaut, Semitic roots...?
It's tempting to focus on English, support only concatenative morphology, and let the user fall back to listing irregular forms explicitly, gated by a flag:
V[ can_inflect: y ] -> walk
V[ can_inflect: n ] -> buy
V[ tense: past, can_inflect: n ] -> V[ can_inflect: y ] ++ ed
V[ tense: past, can_inflect: n ] -> bought
However, lots of common English words have spelling changes at the morpheme boundary, like bake ~ baked, not *bakeed. There's no real way to support that without a more sophisticated mechanism or tons of duplicate rules.
Todo:
Remind myself of how the LKB does this
One way to approach this would be to let grammar files define a token-splitting pass that runs before parsing.
Something like:
$splitters = [
/(.+)ed/ => [\1, -ed]
/(.+)d/ => [\1, -ed] // for words like "baked"
/(.+)s/ => [\1, -s]
/(.+)es/ => [\1, -s]
]
Each word would then be run through every splitter whose pattern matches it, plus an implicit "no expansion" splitter, so a sentence expands into all combinations of its words' possible morphological derivations:
"The dogs walked to the beach and baked"
"The dogs walk -ed to the beach and baked"
"The dogs walke -ed to the beach and baked"
"The dog -s walked to the beach and baked"
"The dog -s walk -ed to the beach and baked"
"The dog -s walke -ed to the beach and baked"
"The dogs walked to the beach and bak -ed"
"The dogs walk -ed to the beach and bak -ed"
"The dogs walke -ed to the beach and bak -ed"
"The dog -s walked to the beach and bak -ed"
"The dog -s walk -ed to the beach and bak -ed"
"The dog -s walke -ed to the beach and bak -ed"
"The dogs walked to the beach and bake -ed"
"The dogs walk -ed to the beach and bake -ed"
"The dogs walke -ed to the beach and bake -ed"
"The dog -s walked to the beach and bake -ed"
==> "The dog -s walk -ed to the beach and bake -ed"
"The dog -s walke -ed to the beach and bake -ed"
Obviously this has the potential to blow up combinatorially, but we could fail fast whenever a splitter generates a token that doesn't match any terminal in the grammar.
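As a rough illustration of the splitting pass and the fail-fast check, here is a minimal Python sketch. It assumes each pattern must match the whole word, encodes the `$splitters` table as (pattern, suffix) pairs, and uses a made-up `TERMINALS` set in place of the grammar's real terminals. None of these names come from Treebender itself.

```python
import re
from itertools import product

# Assumed encoding of the $splitters table: (whole-word pattern, suffix token).
SPLITTERS = [
    (re.compile(r"(.+)ed"), "-ed"),
    (re.compile(r"(.+)d"), "-ed"),  # for words like "baked"
    (re.compile(r"(.+)s"), "-s"),
    (re.compile(r"(.+)es"), "-s"),
]

# Hypothetical terminal vocabulary; a real one would be read from the grammar.
TERMINALS = {"the", "dog", "walk", "bake", "to", "beach", "and", "-s", "-ed"}


def split_word(word, terminals=None):
    """Return candidate tokenizations of one word, unsplit form first.

    If `terminals` is given, drop candidates containing an unknown token
    (the fail-fast check). Case-insensitive lookup is an assumption.
    """
    options = [[word]]  # the implicit "no expansion" splitter
    for pattern, suffix in SPLITTERS:
        match = pattern.fullmatch(word)
        if match:
            options.append([match.group(1), suffix])
    if terminals is not None:
        options = [o for o in options if all(t.lower() in terminals for t in o)]
    return options


def derivations(sentence, terminals=None):
    """Yield each token sequence formed by picking one option per word."""
    per_word = [split_word(w, terminals) for w in sentence.split()]
    for choice in product(*per_word):
        yield [token for part in choice for token in part]


sentence = "The dogs walked to the beach and baked"
unfiltered = list(derivations(sentence))
filtered = list(derivations(sentence, TERMINALS))
```

Under these assumptions `unfiltered` has 36 entries rather than 18, because "and" also matches /(.+)d/ and splits into "an -ed". With the terminal check, `filtered` keeps only the derivation marked ==> above. A word left with no viable option would make `derivations` yield nothing, which is the fail-fast case.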