Parser for KAF and NAF files in Python. The documentation for all methods and the API of this parser can be found at:
- HTML: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy
- PDF: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/api.pdf
There is also a presentation about this library on SlideShare.
KafNafParserPy has been available in the Python Package Index since February 10th, so you can easily install it (and its dependencies) by running:
pip install KafNafParserPy
Alternatively, clone the repository from GitHub:
git clone https://github.com/cltl/KafNafParserPy.git
You will need the lxml library for Python (http://lxml.de/). Usually, running pip install --user lxml is enough to get lxml installed. In some cases there can be problems with the libxml2 and libxslt libraries. In that case (assuming you have no root access on the machine), you can try the following:
wget http://xmlsoft.org/sources/libxml2-sources-2.7.7.tar.gz
gzip -dc libxml2-sources-2.7.7.tar.gz | tar xvf -
cd libxml2-2.7.7
./configure --prefix=/home/ruben/lib
make
make install
wget http://xmlsoft.org/sources/libxslt-1.1.26.tar.gz
gzip -dc libxslt-1.1.26.tar.gz | tar xvf -
cd libxslt-1.1.26
./configure --prefix=/home/ruben/lib --with-libxml-prefix=/home/ruben/lib
make
make install
PATH=$PATH:/home/ruben/lib/bin/
pip install --user lxml
Of course, replace /home/ruben/lib with the folder where you want to install the libraries, and check the corresponding websites for newer versions of the libraries.
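After the installation steps, a quick check from Python can confirm that lxml imports and show which libxml2 and libxslt versions it is using. The following is only a minimal sketch; the helper name lxml_versions is hypothetical and not part of KafNafParserPy.

```python
def lxml_versions():
    """Return lxml, libxml2 and libxslt versions, or None if lxml cannot be imported."""
    try:
        from lxml import etree
    except ImportError:
        # Raised when lxml is missing or its shared libraries cannot be loaded.
        return None

    def dotted(version_tuple):
        # lxml exposes versions as tuples of integers, e.g. (2, 7, 7).
        return ".".join(str(part) for part in version_tuple)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        "libxml2": dotted(etree.LIBXML_VERSION),
        "libxslt": dotted(etree.LIBXSLT_VERSION),
    }


versions = lxml_versions()
if versions is None:
    print("lxml is not importable; review the installation steps")
else:
    for name, version in versions.items():
        print(name, version)
```

If the reported libxml2 or libxslt version is not the one you compiled, lxml was probably built against the system libraries instead of the ones under your install prefix.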
This library is a Python module that reads and parses a KAF or NAF file and gives access to all of its layers through different methods and functions. Here is an example of usage:
python
>>> from KafNafParserPy import KafNafParser
>>> my_parser = KafNafParser('myfile.kaf')
>>> for token_obj in my_parser.get_tokens():
...     print('Token id', token_obj.get_id())
...     print('Token text', token_obj.get_text())
...
>>> for term_obj in my_parser.get_terms():
...     print('Lemma', term_obj.get_lemma())
...     print('Ids:', term_obj.get_span().get_span_ids())
...
>>> for prop in my_parser.get_properties():
...     print('Id', prop.get_id())
...     for reference in prop.get_references():
...         for span_obj in reference:  # Iterator over Creference object
...             print('span ids', span_obj.get_span_ids())
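Building on the example above, the token and term layers can be combined to recover the surface text of each term. This is a sketch and not part of the library: the function name term_surface_forms is hypothetical, it only calls the methods shown above, and it assumes that the ids returned by get_span_ids() for a term are token ids.

```python
def term_surface_forms(parser):
    """Return (lemma, surface text) pairs for every term in a parsed file.

    `parser` is expected to behave like a KafNafParser instance; only the
    methods used in the usage example are called.
    """
    # Map each token id to its text so that term spans can be resolved.
    text_by_token_id = {}
    for token_obj in parser.get_tokens():
        text_by_token_id[token_obj.get_id()] = token_obj.get_text()

    pairs = []
    for term_obj in parser.get_terms():
        token_ids = term_obj.get_span().get_span_ids()
        # Assumption: ids that are missing from the token layer are skipped.
        words = [text_by_token_id[i] for i in token_ids if i in text_by_token_id]
        pairs.append((term_obj.get_lemma(), " ".join(words)))
    return pairs
```

With the library installed, it could be called as term_surface_forms(KafNafParser('myfile.kaf')).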
You can find more usage examples for this parser in the examples subfolder.
The documentation can be generated automatically by running:
epydoc --config documentation.cfg
This calls the external program epydoc (http://epydoc.sourceforge.net/) with the provided configuration file and creates the HTML documents for the API in the folder apidocs. As mentioned above, the already generated documentation can be seen at http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy
- Ruben Izquierdo Bevia
- [email protected]
- http://rubenizquierdobevia.com/
- Vrije Universiteit Amsterdam
Software distributed under GPLv3; see the LICENSE file for details.