NTCIR

NII Testbeds and Community for Information access Research

Patent Machine Translation Task at NTCIR-9

(PatentMT)


  • A Chinese to English subtask has been added
  • Human evaluations will be carried out
  • Parallel corpora consisting of 1 million Chinese-English and 3 million Japanese-English sentence pairs will be provided

Call for Participation (PDF)

Baseline systems

Motivation

Patent information is important to society around the world. There is a great need for translation, both to understand patent information written in foreign languages and to apply for patents in foreign countries. Patents are a challenging domain for machine translation because patent sentences can be quite long and structurally complex. We have organized a patent machine translation task (PatentMT) to address this significant practical need and to advance research on this challenging problem.

Goals

PatentMT is not competition-oriented; its ultimate goal is to foster cooperative work and scientific exchange. To this end, the organizers propose a research task and an open experimental infrastructure for the scientific community working on machine translation. The goals of PatentMT are as follows:
  • To develop challenging and significant practical research into patent machine translation.
  • To investigate the performance of state-of-the-art machine translation in terms of patent translations involving Japanese, English, and Chinese.
  • To compare the effects of different methods of patent translation by applying them to the same test data.
  • To create publicly available parallel corpora of patent documents and human evaluations of MT results for patent information processing research.
  • To drive machine translation research, an important technology for cross-lingual information access that lets users understand information written in languages they do not know.

Task

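  • Subtasks:
     Subtask              Parallel corpus
     Chinese to English   1 million Chinese-English patent description sentence pairs
     Japanese to English  3 million Japanese-English patent description sentence pairs
     English to Japanese  3 million Japanese-English patent description sentence pairs (same corpus as Japanese to English)
    Test data: 2,000 patent description sentences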

  • Participants choose the subtasks in which they would like to participate.
  • Resources planned to be provided
    • Chinese to English subtask: A parallel corpus consisting of 1 million Chinese-English patent description sentence pairs, a large-scale monolingual patent corpus in English, and a test set of patent descriptions
    • Japanese to English subtask: A parallel corpus consisting of 3 million Japanese-English patent description sentence pairs, a large-scale monolingual patent corpus in English, and a test set of patent descriptions
    • English to Japanese subtask: A parallel corpus consisting of 3 million Japanese-English patent description sentence pairs, a large-scale monolingual patent corpus in Japanese, and a test set of patent descriptions
    Use of the data is subject to the NTCIR-9 user agreements.
  • Participants are requested to machine translate the test sets.
  • The submitted translation results will be evaluated by both human and automatic evaluation, with human evaluation as the primary evaluation. The human evaluation criteria are adequacy and acceptability; acceptability is defined in Fig. 1. Because of budgetary limitations, acceptability will be evaluated only for selected systems. Systems will be selected mainly according to two criteria: (i) as many types of methods as possible are included, and (ii) among systems using the same type of method, those with higher adequacy are given priority.
  • Task definition (PDF) (updated Section 5 on 2011.4.21)
    The task definition is almost the same as that for the NTCIR-8 Patent Translation Task.
    The submission format and the submission method for the translation results are described in this document.
  • Case recovery and de-tokenization must be done on the translation results. (added 2011.4.21)
  • Participants are requested to submit a paper describing their MT system, the resources used, and their results on the provided test data, and to present the paper at the workshop.
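
As an illustration of the case recovery and de-tokenization requirement above, the following is a minimal Python sketch for English output. It assumes the MT system emits lowercased, space-separated tokens. The function names (detokenize, recase, postprocess) are hypothetical and are not part of the task definition, and the rules are deliberately naive: a real system would more likely use a trained recaser and a fuller de-tokenizer.

```python
import re


def detokenize(tokens):
    """Join tokens into one string and reattach punctuation (naive rules)."""
    text = " ".join(tokens)
    # Remove the space before closing punctuation and closing brackets.
    text = re.sub(r"\s+([.,;:!?%)\]])", r"\1", text)
    # Remove the space after opening brackets.
    text = re.sub(r"([(\[])\s+", r"\1", text)
    # Reattach common English clitics such as "'s" and "n't".
    text = re.sub(r"\s+('s|'re|'ve|'ll|'d|'m|n't)\b", r"\1", text)
    return text


def recase(text):
    """Naive case recovery: uppercase the first character of the sentence.

    This is an assumption made for illustration; it does not restore
    the case of proper nouns or acronyms inside the sentence.
    """
    return text[:1].upper() + text[1:]


def postprocess(line):
    """Apply de-tokenization, then case recovery, to one output line."""
    return recase(detokenize(line.split()))


if __name__ == "__main__":
    raw = "the semiconductor device ( 10 ) includes a substrate , and a gate electrode ."
    print(postprocess(raw))
```

Each line of system output would be passed through postprocess before submission. Output for the English to Japanese subtask would need different handling (for example, removing the spaces between tokens), which this sketch does not cover.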

Fig. 1 Acceptability

Schedule

2011.1.20: Registration due (extended from 2010.12.20)
2011.1.5: Training data release
2011.5.9: Test data release
2011.5.22: Translation results submission due (UTC)
2011.8.19: Evaluation results release
2011.9.20: MT system description due
2011.11.4: Camera-ready due
2011.12.6-9: NTCIR-9 workshop
If participants register by 2010.12.20 and NII/HKIED receives a user agreement by 2011.1.4, NII/HKIED will provide data to participants on 2011.1.5.
If participants register after 2010.12.20, NII/HKIED will provide data to participants after 2011.1.5 and after NII/HKIED receives a user agreement.
(NII will release the NTCIR-8 PATMT training data to the public for research use before 2011.1.5. Participants are allowed to use the data before the 2011.1.5 release date.)

Registration

Registration forms are available on the official NTCIR-9 page.

Organizers

Chinese-English Side:
  • Benjamin K. Tsou (Hong Kong Institute of Education/City University of Hong Kong)
  • Kapo Chow (Hong Kong Institute of Education)
  • Bin Lu (City University of Hong Kong)
Japanese-English Side:
  • Isao Goto (NICT)
  • Eiichiro Sumita (NICT)

Contact

  ntc9adm-patentmt
   If you have any questions or suggestions about the task, please feel free to send an email to the organizers.