Question about upcase solution #34

GoogleCodeExporter · 2016-03-03T03:08:35Z

To make the grammar also recognize capitalized words, I added a rule that 
uppercases the first letter:

define ToUpcase a -> A || .#. _ ,,
                á -> Á || .#. _ ,,
                b -> B || .#. _ ,,
                c -> C || .#. _ ,,
                d -> D || .#. _ ,,
                e -> E || .#. _ ,,
                é -> É || .#. _ ,,
                f -> F || .#. _ ,,
                g -> G || .#. _ ,,
                h -> H || .#. _ ,,
                i -> I || .#. _ ,,
                í -> Í || .#. _ ,,
                j -> J || .#. _ ,,
                k -> K || .#. _ ,,
                l -> L || .#. _ ,,
                m -> M || .#. _ ,,
                n -> N || .#. _ ,,
                o -> O || .#. _ ,,
                ó -> Ó || .#. _ ,,
                ö -> Ö || .#. _ ,,
                ő -> Ő || .#. _ ,,
                p -> P || .#. _ ,,
                q -> Q || .#. _ ,,
                r -> R || .#. _ ,,
                s -> S || .#. _ ,,
                t -> T || .#. _ ,,
                u -> U || .#. _ ,,
                ú -> Ú || .#. _ ,,
                ü -> Ü || .#. _ ,,
                ű -> Ű || .#. _ ,,
                v -> V || .#. _ ,,
                w -> W || .#. _ ,,
                x -> X || .#. _ ,,
                y -> Y || .#. _ ,,
                z -> Z || .#. _ ;

I also duplicated every grammar, so that each has a normal and an uppercase version:
define Grammar Lexicon           .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup;

define Grammarup Lexicon           .o. 
               ToUpcase          .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup;

regex Grammar | Grammarup;

I have attached the complete project.

The approach has two disadvantages:
1. I have to duplicate every grammar.
2. apply down produces extra output:
foma[1]: down
apply down> cat+N+Sg
cat
Cat
apply down> Peter+N+Sg
Peter

For cat+N+Sg I also get Cat, which is to be expected but is unnecessary.

Is there a more elegant way to handle upper and lower case, or is my approach 
already the best one?

Thanks in advance.

Original issue reported on code.google.com by [email protected] on 29 Jul 2012 at 12:28

GoogleCodeExporter · 2016-03-03T03:08:36Z

Here's a simpler, although ultimately equivalent way of doing it. 

First we define a transducer that optionally uppercases the first letter:

define UpCase [a:A|b:B|c:C|d:D|e:E|f:F|g:G|h:H|i:I|j:J|k:K|l:L|m:M|n:N|o:O|p:P|
               q:Q|r:R|s:S|t:T|u:U|v:V|w:W|x:X|y:Y|z:Z] ?* | ?*;

Then we compose it in last, after Cleanup:

define Grammar Lexicon           .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup           .o.
               UpCase ;
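
The UpCase definition above lists only unaccented letters. If the lexicon also contains the accented letters from the original ToUpcase rule, a variant along the following lines should cover them as well. This is only a sketch: the name UpCaseHu is made up for illustration, and the letter pairs are simply copied from ToUpcase.

```foma
# Optionally uppercase the first letter, including accented letters.
# UpCaseHu is a hypothetical name; the letter pairs come from ToUpcase.
define UpCaseHu [ a:A | á:Á | b:B | c:C | d:D | e:E | é:É | f:F | g:G | h:H |
                  i:I | í:Í | j:J | k:K | l:L | m:M | n:N | o:O | ó:Ó | ö:Ö |
                  ő:Ő | p:P | q:Q | r:R | s:S | t:T | u:U | ú:Ú | ü:Ü | ű:Ű |
                  v:V | w:W | x:X | y:Y | z:Z ] ?* | ?* ;
```

It would be composed in last in place of UpCase. The trailing `| ?*` branch is what keeps the uppercasing optional, so the lowercase forms are still accepted.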

As for the second question: there is no way to build a transducer that would 
map cat+N+Sg only to cat and where its inverse would also map Cat to cat+N+Sg.  
They are the same device, so to speak, and contain the same mappings regardless 
of the direction. The only way to get cat as the only output for cat+N+Sg is to 
define two separate transducers, one for generation (without the uppercasing 
composed in), and another one for parsing (with uppercasing). It is in fact 
fairly normal to maintain two such transducers for various reasons.
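
A minimal sketch of that two-transducer setup, assuming the rule names from the grammar above. Generator, Parser and the two file names are made up for illustration.

```foma
# Generation: no uppercasing composed in, so cat+N+Sg yields only cat.
define Generator Lexicon           .o.
                 ConsonantDoubling .o.
                 EDeletion         .o.
                 EInsertion        .o.
                 YReplacement      .o.
                 KInsertion        .o.
                 Cleanup;

# Parsing: the same grammar with optional first-letter uppercasing last.
define Parser Generator .o. UpCase;

# Save each transducer to its own (hypothetical) binary file.
regex Generator;
save stack generator.bin
clear stack
regex Parser;
save stack parser.bin
```

With generator.bin loaded, apply down on cat+N+Sg should give only cat. With parser.bin loaded, apply up on either cat or Cat should give cat+N+Sg. Because Parser is defined in terms of Generator, the grammar itself is still written only once.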

Original comment by [email protected] on 29 Jul 2012 at 3:38

GoogleCodeExporter · 2016-03-03T03:08:36Z

Thanks for the answer and the help.
The UpCase transducer is a big deal for me: I have 8200 lines of foma code 
now, and whether I have 8200 or 16400 lines matters for
* code maintenance
* adding new code
* compiling

As for the second point, thanks also for the idea of separating generation and 
parsing. I am not yet far enough along to decide whether to do that.

Original comment by [email protected] on 30 Jul 2012 at 8:21

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated labels Mar 3, 2016
