-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
56 lines (35 loc) · 1.57 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
## ==================================================================
## Indonesian Morphological Tool (MorphInd)
## Last Modified : May 2013
## ==================================================================
## Copyright (c) 2013 Septina Dian Larasati. All rights reserved.
Version
================
1.2: Initial Version
1.3: Added disambiguation module
1.4: Added compound word handling
Operating System
================
Only run on Unix machine
Prerequisites:
==============
- PERL
- FOMA (http://code.google.com/p/foma)
How to use:
===========
bash$ cat INPUTFILE | perl MorphInd.pl > OUTPUTFILE
bash$ echo "mengirim" | perl MorphInd.pl > OUTPUTFILE
bash$ cat INPUTFILE | perl MorphInd.pl [-cache-file CACHEFILE -bin-file BINARYFILE] > OUTPUTFILE
Example:
bash$ cat sample.txt | perl MorphInd.pl > sample.out
bash$ cat sample.txt | perl MorphInd.pl -cache-file cache-files/default.cache -bin-file bin/morphind.bin > sample.out
Input example : "mengirim"
Output example : "^meN+kirim<v>_VSA$"
Files
=====
= cache-files/default.cache: The cache files, that can give a direct analysis and override the FST tool output. The entries format is "[word][tab][analysis]".
All words partially containing number must be put in this file, e.g. "ab123".
= cache-files/default.compound: The compound files, that can give a direct analysis and override the FST tool output. The entries format is "[compound word][tab][analysis]".
Known Bug
=========
- Lemmata containing "berr" or "terr" are not analyzed properly. These lemmata has to be put in "default.cache" with its respective analysis.