Morphology support #1

vgel · 2020-10-19T23:56:05Z

Right now terminal tokens have to be separate words. Treebender should be able to support morphological rules:

    V[ stem: t ] -> walk
    V[ stem: t ] -> talk
    // stem: f to block walkedededededededed...
    V[ tense: past, stem: f ] -> V[ stem: t ] ++ ed  // syntax TBD
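
To make the role of the `stem` feature concrete, here is a minimal Python sketch of the rule above. The dict layout and the `add_past_suffix` name are assumptions for illustration only, not Treebender's actual representation.

```python
# Hypothetical feature-structure entries; the dict layout is an assumption.
def add_past_suffix(entry):
    """Apply V[tense: past, stem: f] -> V[stem: t] ++ ed to one entry."""
    if not entry["stem"]:
        # The rule's right-hand side requires stem: t, so inflected forms are rejected.
        raise ValueError(f"{entry['form']!r} is not a stem")
    # The result is marked stem: f, so the rule cannot apply to it again.
    return {"form": entry["form"] + "ed", "tense": "past", "stem": False}

walk = {"form": "walk", "stem": True}
walked = add_past_suffix(walk)
print(walked["form"])
```

Calling `add_past_suffix(walked)` raises `ValueError`, which is the "walkededed..." case the `stem: f` marking is meant to block.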

Questions:

What scope do we want here? Are we only supporting basic concatenative morphology (prefixes and suffixes), or will we also try to support allomorphy, sound changes / ablaut, Semitic roots...?

It's tempting to say we just focus on English, support only concatenative morphology, and let the user fall back to explicitly listed forms with a flag:

    V[ can_inflect: y ] -> walk
    V[ can_inflect: n ] -> buy
    V[ tense: past, can_inflect: n ] -> V[ can_inflect: y ] ++ ed
    V[ tense: past, can_inflect: n ] -> bought
However, lots of common words in English have spelling changes like bake ~ baked, not *bakeed. There's no real way to support that without some more sophisticated tool or tons of duplicate rules.
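
The limitation is easy to reproduce outside the grammar. The sketch below mimics the `can_inflect` flag with two hypothetical Python tables (`REGULAR` and `IRREGULAR_PAST` are illustrative names, not part of Treebender) and applies plain concatenation:

```python
# Hypothetical toy lexicon mirroring the can_inflect flag above.
REGULAR = {"walk", "talk", "bake"}      # can_inflect: y
IRREGULAR_PAST = {"buy": "bought"}      # can_inflect: n, past form listed explicitly

def past_tense(verb):
    """Apply the `++ ed` rule, or fall back to an explicitly listed form."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb in REGULAR:
        return verb + "ed"  # pure concatenation, no spelling adjustment
    raise KeyError(verb)

for verb in ("walk", "buy", "bake"):
    print(verb, "->", past_tense(verb))
```

The output for "bake" is "bakeed", so under pure concatenation every verb ending in "e" would need its own fallback rule.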

Todo:

Remind myself of how the LKB does this

vgel · 2020-10-20T00:09:49Z

One way to approach this would actually be to just allow grammar files to define a token-splitting process that runs before parsing.

Something like:

    $splitters = [
        /(.+)ed/ => [\1, -ed]
        /(.+)d/  => [\1, -ed] // for words like "baked"
        /(.+)s/  => [\1, -s]
        /(.+)es/ => [\1, -s]
    ]

Then every splitter that matches a word would apply, plus an implicit "no expansion" splitter, and a sentence would be split into all combinations of the possible per-word morphological derivations:

    "The dogs walked to the beach and baked"
    "The dogs walk -ed to the beach and baked"
    "The dogs walke -ed to the beach and baked"
    "The dog -s walked to the beach and baked"
    "The dog -s walk -ed to the beach and baked"
    "The dog -s walke -ed to the beach and baked"
    "The dogs walked to the beach and bak -ed"
    "The dogs walk -ed to the beach and bak -ed"
    "The dogs walke -ed to the beach and bak -ed"
    "The dog -s walked to the beach and bak -ed"
    "The dog -s walk -ed to the beach and bak -ed"
    "The dog -s walke -ed to the beach and bak -ed"
    "The dogs walked to the beach and bake -ed"
    "The dogs walk -ed to the beach and bake -ed"
    "The dogs walke -ed to the beach and bake -ed"
    "The dog -s walked to the beach and bake -ed"
    ==> "The dog -s walk -ed to the beach and bake -ed"
    "The dog -s walke -ed to the beach and bake -ed"
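
A rough Python sketch of this expansion step, assuming each splitter must match the whole word (the proposed syntax is still TBD, so the anchoring via `re.fullmatch` and the names `SPLITTERS`, `split_word`, and `derivations` are all assumptions):

```python
import re
from itertools import product

# (pattern, suffix token) pairs mirroring $splitters above.
SPLITTERS = [
    (re.compile(r"(.+)ed"), "-ed"),
    (re.compile(r"(.+)d"), "-ed"),   # for words like "baked"
    (re.compile(r"(.+)s"), "-s"),
    (re.compile(r"(.+)es"), "-s"),
]

def split_word(word):
    """Return every split of one word, starting with the implicit no-expansion split."""
    options = [(word,)]
    for pattern, suffix in SPLITTERS:
        match = pattern.fullmatch(word)
        if match:
            candidate = (match.group(1), suffix)
            if candidate not in options:
                options.append(candidate)
    return options

def derivations(sentence):
    """Yield one token string per combination of per-word splits."""
    per_word = [split_word(word) for word in sentence.split()]
    for combo in product(*per_word):
        yield " ".join(token for parts in combo for token in parts)

for line in derivations("dogs walked"):
    print(line)
```

For "dogs walked" this yields six strings (two options for "dogs" times three for "walked"). One thing the sketch surfaces: taken literally, `/(.+)d/` also matches "and" and produces "an -ed", so the full example sentence expands to 36 strings here rather than the 18 listed above.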

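Obviously this has the potential to blow up, since the number of derivations is the product of the per-word split counts (2 × 3 × 3 = 18 in the list above), but we could also fail fast if a splitter generates a token that doesn't match any terminal in the grammar.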
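
One way the fail-fast idea could look, sketched in Python. The `TERMINALS` set is a hypothetical stand-in for whatever the grammar's lexical rules accept, and filtering each word's splits before combining them is an assumption about where the check would go:

```python
import re
from itertools import product

# Hypothetical set of terminals known to the grammar (lowercased).
TERMINALS = {"the", "dog", "walk", "bake", "to", "beach", "and", "-ed", "-s"}

SPLITTERS = [(r"(.+)ed", "-ed"), (r"(.+)d", "-ed"), (r"(.+)s", "-s"), (r"(.+)es", "-s")]

def viable_splits(word):
    """Splits of `word` (including no expansion) whose tokens are all known terminals."""
    candidates = [(word,)]
    for pattern, suffix in SPLITTERS:
        match = re.fullmatch(pattern, word)
        if match:
            candidates.append((match.group(1), suffix))
    kept = []
    for candidate in candidates:
        known = all(token.lower() in TERMINALS for token in candidate)
        if known and candidate not in kept:
            kept.append(candidate)
    return kept

def derivations(sentence):
    """Combine only the viable splits; empty if some word has no viable split."""
    per_word = [viable_splits(word) for word in sentence.split()]
    return [" ".join(t for parts in combo for t in parts) for combo in product(*per_word)]

print(derivations("The dogs walked to the beach and baked"))
```

With this toy lexicon, candidates such as "walke", "bak", and "an" are discarded per word, so only the single derivation marked with `==>` above survives and the product never has to be enumerated in full.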

vgel added the enhancement New feature or request label Oct 19, 2020
