datagger

A Simple Dialog Act Tagger

This project aims to tag the dialog acts in a given discourse.

Introduction

The corpus we use comes from loria.fr and was built for a French learning dialog system. See corpus/sample to get a feeling for what the data looks like.

Software Dependencies

To use datagger, you need to install:

Howto

You can simply run

python2 CorpusBuilder.py --help
usage: CorpusBuilder.py [-h] [--cv] [--path PATH] corpuspath {crf,sample}

positional arguments:
  corpuspath    The path to the corpus
  {crf,sample}  select what kind of corpus to generate. sample: to generate
                unlabled data. crf: to build training and test data for crf

optional arguments:
  -h, --help    show this help message and exit
  --cv          build cross_validation corpus
  --path PATH   place to put the generated corpus data

to check the usage of the script.

After you generate the data in CRF++ format, simply run

crf_learn template train_data model

and you will get a model file. To tag the test data with this model, run

crf_test -m model test_data
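For orientation, crf_learn and crf_test read column-formatted text: one token per line, a fixed number of whitespace-separated columns with the label in the last column, and an empty line between sequences. The sketch below shows one way to serialize labeled sequences into that layout. The function name `to_crfpp`, the example tokens, and the dialog act labels are hypothetical; the features that CorpusBuilder.py actually emits are not documented here and may differ.

```python
def to_crfpp(sequences, delimiter="\t"):
    """Serialize labeled sequences into CRF++ column format.

    `sequences` is a list of sequences; each sequence is a list of rows,
    and each row is a tuple of feature columns followed by the label.
    (Hypothetical helper, not part of datagger.)
    """
    blocks = []
    for seq in sequences:
        # One row per line, columns joined by the delimiter.
        lines = [delimiter.join(row) for row in seq]
        blocks.append("\n".join(lines))
    # An empty line separates consecutive sequences.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Made-up tokens and labels, for illustration only.
    dialogs = [
        [("bonjour", "greet"), ("merci", "thank")],
        [("oui", "ack")],
    ]
    print(to_crfpp(dialogs))
```

Every row must have the same number of columns, otherwise CRF++ rejects the file.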

Evaluation

To evaluate the result, you can use the script from the CoNLL phrase recognition (chunking) task: http://www.cnts.ua.ac.be/conll2000/chunking/output.html Run it as follows:

perl conlleval.pl -r -d '\t' < result_you_got
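As a quick sanity check alongside conlleval.pl, per-line accuracy can also be computed directly from the crf_test output. The sketch below assumes the default output layout, where crf_test appends the predicted label as a final tab-separated column, so the gold label is the second-to-last column. The function name `token_accuracy` is hypothetical, and this is not a replacement for the per-label precision and recall that conlleval.pl reports.

```python
def token_accuracy(lines, delimiter="\t"):
    """Fraction of rows whose predicted label equals the gold label.

    Assumes each non-empty row ends with: ... <gold> <predicted>.
    (Hypothetical helper, not part of datagger.)
    """
    correct = 0
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # empty lines separate sequences
        cols = line.split(delimiter)
        if len(cols) < 2:
            continue  # skip rows without both gold and predicted labels
        total += 1
        if cols[-2] == cols[-1]:
            correct += 1
    return float(correct) / total if total else 0.0


if __name__ == "__main__":
    import sys
    # Usage (file name taken from the command above):
    #   python this_script.py < result_you_got
    print(token_accuracy(sys.stdin))
```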

For more information about this project, read our report.