Question about upcase solution #34

Open
GoogleCodeExporter opened this issue Mar 3, 2016 · 4 comments

@GoogleCodeExporter

To make the grammar also recognize capitalized words, I added a converter that
uppercases the first letter:

define ToUpcase a -> A || .#. _ ,,
                á -> Á || .#. _ ,,
                b -> B || .#. _ ,,
                c -> C || .#. _ ,,
                d -> D || .#. _ ,,
                e -> E || .#. _ ,,
                é -> É || .#. _ ,,
                f -> F || .#. _ ,,
                g -> G || .#. _ ,,
                h -> H || .#. _ ,,
                i -> I || .#. _ ,,
                í -> Í || .#. _ ,,
                j -> J || .#. _ ,,
                k -> K || .#. _ ,,
                l -> L || .#. _ ,,
                m -> M || .#. _ ,,
                n -> N || .#. _ ,,
                o -> O || .#. _ ,,
                ó -> Ó || .#. _ ,,
                ö -> Ö || .#. _ ,,
                ő -> Ő || .#. _ ,,
                p -> P || .#. _ ,,
                q -> Q || .#. _ ,,
                r -> R || .#. _ ,,
                s -> S || .#. _ ,,
                t -> T || .#. _ ,,
                u -> U || .#. _ ,,
                ú -> Ú || .#. _ ,,
                ü -> Ü || .#. _ ,,
                ű -> Ű || .#. _ ,,
                v -> V || .#. _ ,,
                w -> W || .#. _ ,,
                x -> X || .#. _ ,,
                y -> Y || .#. _ ,,
                z -> Z || .#. _ ;

and I duplicated the grammar, so that there is a normal and an upcase version:
define Grammar Lexicon           .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup;

define Grammarup Lexicon           .o.
                 ToUpcase          .o.
                 ConsonantDoubling .o.
                 EDeletion         .o.
                 EInsertion        .o.
                 YReplacement      .o.
                 KInsertion        .o.
                 Cleanup;

regex Grammar | Grammarup;

Attached the complete project.

The approach has two disadvantages:
1. I have to duplicate every grammar.
2. With apply down, a lowercase entry also produces the capitalized form:
foma[1]: down
apply down> cat+N+Sg
cat
Cat
apply down> Peter+N+Sg
Peter

So cat+N+Sg also yields Cat, which follows from the construction but is not
needed.

Is there a more elegant way to handle upper and lower case, or is my approach
already the best one?

Thanks in advance.

Original issue reported on code.google.com by [email protected] on 29 Jul 2012 at 12:28

@GoogleCodeExporter

Here's a simpler, although ultimately equivalent way of doing it. 

First we define a transducer that optionally uppercases the first letter:

define UpCase [a:A|á:Á|b:B|c:C|d:D|e:E|é:É|f:F|g:G|h:H|i:I|í:Í|j:J|k:K|l:L|
               m:M|n:N|o:O|ó:Ó|ö:Ö|ő:Ő|p:P|q:Q|r:R|s:S|t:T|u:U|ú:Ú|ü:Ü|ű:Ű|
               v:V|w:W|x:X|y:Y|z:Z] ?* | ?*;

Then we compose it in as the last step, after Cleanup:

define Grammar Lexicon           .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup           .o.
               UpCase ;
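
A quick way to see the effect in isolation is a toy script like the one below. This is only a sketch: ToyLexicon and its two entries are hypothetical stand-ins for the real Lexicon and rule cascade, and UpCase is cut down to the single pair c:C to keep it short.

```
# Toy stand-in for the real lexicon (hypothetical): two nouns,
# with the tags deleted on the lower side.
define ToyLexicon [c a t | P e t e r] "+N":0 "+Sg":0 ;

# Optionally uppercase the first letter (only c:C here, for brevity).
define UpCase c:C ?* | ?* ;

# Compose the uppercasing in last, as in the full grammar above.
regex ToyLexicon .o. UpCase ;

apply down cat+N+Sg
apply down Peter+N+Sg
apply up Cat
apply up cat
```

If this behaves as described, apply down on cat+N+Sg should return both cat and Cat, Peter+N+Sg should return only Peter, and apply up should map either Cat or cat back to cat+N+Sg.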

As for the second question: there is no way to build a transducer that would 
map cat+N+Sg only to cat and where its inverse would also map Cat to cat+N+Sg.  
They are the same device, so to speak, and contain the same mappings regardless 
of the direction. The only way to get cat as the only output for cat+N+Sg is to 
define two separate transducers, one for generation (without the uppercasing 
composed in), and another one for parsing (with uppercasing). It is in fact 
fairly normal to maintain two such transducers for various reasons.
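
One possible way to set up the two transducers is sketched below. It assumes that Lexicon, the rewrite rules, and UpCase are already defined as above. The names Generator and Parser and the file names generator.bin and parser.bin are made up for illustration.

```
# Generation: no uppercasing, so cat+N+Sg yields only cat.
define Generator Lexicon           .o.
                 ConsonantDoubling .o.
                 EDeletion         .o.
                 EInsertion        .o.
                 YReplacement      .o.
                 KInsertion        .o.
                 Cleanup ;

# Parsing: the same cascade plus UpCase, so Cat is also accepted.
define Parser Generator .o. UpCase ;

# Save each network to its own file (file names are hypothetical).
regex Generator ;
save stack generator.bin
clear stack

regex Parser ;
save stack parser.bin
```

Because Parser is defined in terms of Generator, the rule cascade is written only once. Use apply down with the generator network and apply up with the parser network.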

Original comment by [email protected] on 29 Jul 2012 at 3:38

@GoogleCodeExporter

Thanks for the answer and the help.
The UpCase transducer makes a big difference for me. My foma code is already
8200 lines long, and whether I have 8200 or 16400 lines matters for
* code maintenance
* adding new code
* compiling

Thanks also for the idea of separating generation and parsing. I am not far
enough along yet to decide whether I will do that.

Original comment by [email protected] on 30 Jul 2012 at 8:21
