NTCIR

NII Testbeds and Community for Information access Research

Baseline Systems for PatentMT at NTCIR-9

Tools

Tools used by the baseline systems:

For all subtasks

  • Moses: revision 3717
  • GIZA++: giza-pp-v1.0.3
  • SRILM: version 1.5.12
  • Additional Scripts: http://homepages.inf.ed.ac.uk/jschroe1/how-to/scripts.tgz
  • Installation: refer to the Moses web page.
    http://www.statmt.org/moses_steps.html

For Chinese segmentation

  • Stanford Chinese Segmenter: version 2008-05-21
    http://www-nlp.stanford.edu/downloads/segmenter.shtml
    * Using the Chinese Penn Treebank (CTB) model

For Japanese segmentation

  • MeCab: version 0.98
    http://sourceforge.net/projects/mecab/files/
  • Dictionary for MeCab: mecab-ipadic-2.7.0-20070801.tar.gz
    http://sourceforge.net/projects/mecab/files/mecab-ipadic/
  • nkf: version 2.1.1
    http://sourceforge.jp/projects/nkf/downloads/48945/nkf-2.1.1.tar.gz/

Data preparation

System training and running

Tuned configuration files of the baseline systems