Baseline Systems for PatentMT at NTCIR-9
Tools
Tools used by the baseline systems:
For all subtasks
- Moses: revision 3717
- GIZA++: giza-pp-v1.0.3
- SRI LM: version 1.5.12
- Additional Scripts: http://homepages.inf.ed.ac.uk/jschroe1/how-to/scripts.tgz
- Installation: refer to the Moses web page.
http://www.statmt.org/moses_steps.html
For Chinese segmentation
- Stanford Chinese Segmenter: version 2008-05-21
http://www-nlp.stanford.edu/downloads/segmenter.shtml
* Using Chinese Penn Treebank (CTB) model
For Japanese segmentation
- MeCab: version 0.98
http://sourceforge.net/projects/mecab/files/
- Dictionary for MeCab: mecab-ipadic-2.7.0-20070801.tar.gz
http://sourceforge.net/projects/mecab/files/mecab-ipadic/
- nkf: version 2.1.1
http://sourceforge.jp/projects/nkf/downloads/48945/nkf-2.1.1.tar.gz/
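Before preparing data, it can help to confirm that the tools listed above are installed. The following is a minimal sketch, not part of the baseline scripts. It assumes the executables are on PATH under their usual names (mecab, nkf, ngram-count for SRI LM, GIZA++ and mkcls for GIZA++); these names and the helper functions find_tools and segment_japanese are illustrative assumptions, and a Moses setup may point to a tool directory instead. The segmentation helper assumes UTF-8 input and uses MeCab's wakati output format, which may differ from the options used for the baseline systems.

```python
import shutil
import subprocess

# Executable names are assumptions: the page lists tools and versions,
# not the names of the installed binaries.
REQUIRED_TOOLS = ["mecab", "nkf", "ngram-count", "GIZA++", "mkcls"]


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def segment_japanese(line):
    """Split one UTF-8 Japanese sentence into space-separated tokens.

    Runs `mecab -O wakati`; requires MeCab and a dictionary to be installed.
    """
    result = subprocess.run(
        ["mecab", "-O", "wakati"],
        input=line + "\n",
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Report which of the assumed executables can be found.
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Running the script prints one line per tool. Any tool reported as NOT FOUND either is not installed or is installed under a different name or outside PATH.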
Data preparation
System training and running
Tuned configuration files of the baseline systems