NTCIR

NII Testbeds and Community for Information access Research

About NTCIR

Project Overview

The NTCIR Workshop is a series of evaluation workshops designed to enhance research in Information Access (IA) technologies, including information retrieval, question answering, text summarization, and extraction. It was co-sponsored by the Japan Society for the Promotion of Science (JSPS), as part of the JSPS "Research for Future" Program, and the National Center for Science Information Systems (NACSIS) from 1997; by JSPS and the Research Center for Information Resources at the National Institute of Informatics (RCIR/NII) in FY 2000; and by the MEXT Grant-in-Aid for Scientific Research on Priority Areas of "Informatics" (#13224087) and RCIR/NII in and after FY 2001.

The aims are:

  1. to encourage research in Information Access technologies by providing large-scale test collections reusable for experiments and a common evaluation infrastructure allowing cross-system comparisons;
  2. to provide a forum for research groups interested in cross-system comparison and in exchanging research ideas in an informal atmosphere;
  3. to investigate evaluation methods for Information Access techniques and methods for constructing large-scale data sets reusable for experiments.

An evaluation workshop usually provides test collections (data sets usable for experiments) and unified procedures for evaluating experimental results. Each participating group conducts research and experiments with its own approach, using the common data provided by the NTCIR organizer. The importance of reusable large-scale standard test collections in IA research is widely recognized, and the evaluation workshop is now seen as a new style of active research project that facilitates research by providing data and a forum for exchanging research ideas and transferring technology.
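As a rough illustration of what a "unified evaluation procedure" over a shared test collection can look like, the sketch below scores two hypothetical systems against the same relevance judgments using mean average precision. The topic IDs, document IDs, system names, and the choice of measure are all assumptions made for this example; they are not taken from any NTCIR collection or from the official NTCIR evaluation tools.

```python
from typing import Dict, List, Set


def average_precision(ranked_docs: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant document IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]],
                           judgments: Dict[str, Set[str]]) -> float:
    """Mean of per-topic average precision over every judged topic."""
    scores = [average_precision(run.get(topic, []), relevant)
              for topic, relevant in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relevance judgments shared by all participants (topic -> relevant docs).
judgments = {"T1": {"d1", "d3"}, "T2": {"d2"}}

# Hypothetical ranked results submitted by two participating systems.
system_a = {"T1": ["d1", "d2", "d3"], "T2": ["d2", "d1"]}
system_b = {"T1": ["d2", "d1", "d3"], "T2": ["d1", "d2"]}

for name, run in [("system_a", system_a), ("system_b", system_b)]:
    print(name, round(mean_average_precision(run, judgments), 4))
```

Because both runs are scored against the same topics and judgments with the same function, the resulting numbers are directly comparable, which is the point of a common evaluation infrastructure.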

For the First NTCIR Workshop, the process started in November 1998, and the workshop meeting was held from August 30 to September 1, 1999, at KKR Hotel, Tokyo. Twenty-eight groups from six countries carried out the tasks and submitted results. For the Second Workshop, the process started in June 2000 and the meeting was held on March 7-9, 2001, at NII, Tokyo; forty-six groups from eight countries registered for the tasks, and thirty-six groups submitted results for one or more tasks. The Third NTCIR Workshop started in October 2001 and its meeting was held on October 8-10, 2002, at NII, Tokyo; sixty-five groups from nine countries submitted results.

From the beginning of the NTCIR Project, we have looked at both traditional laboratory-type IR system testing and the evaluation of more challenging technologies. For laboratory-type testing, we have placed emphasis on (1) information retrieval (IR) with Japanese or other Asian languages and (2) cross-lingual information retrieval. For the more challenging issues, we have focused on (3) the shift from document retrieval to "information" retrieval and technologies for utilizing the information in documents, and (4) the investigation of realistic evaluation, including evaluation methods for summarization, multigrade relevance judgments and single-number averageable measures for such judgments, and evaluation methods suited to the retrieval and processing of a particular document genre and to the way that genre's user group uses it.

The test collections constructed, tasks, participants, and sponsorship for the previous NTCIR workshops are as follows:

Table 1. Tasks, Collections and Participants of the Previous Workshops

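| Workshop | Period | Task: main category | Task: subtask | Test collection | # of participating groups (task) | # of participating groups (workshop) | # of countries of participants (task) | # of countries of participants (workshop) | Sponsor |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Nov 1998 - Sept 1999 | Ad Hoc IR | J-JE | NTCIR-1 | 18 | 28 | 3 | 6 | JSPS + NACSIS |
| | | Cross Lingual IR | E-J | | 10 | | 3 | | |
| | | Term Recognition | Term Extract; Role Analysis | | 9 | | 3 | | |
| 2 | June 2000 - Mar 2001 | Chinese Text Retrieval | CHIR (C-C); ECIR (E-C) | CIRB010 | 11 | 36 | 5 | 8 | JSPS + RCIR/NII |
| | | Japanese and English IR | Monolingual IR (J-J, E-E); CLIR (J-E, E-J, J-JE, E-JE) | NTCIR-2 | 25 | | 5 | | |
| | | Text Summarization | Intrinsic-Extract; Intrinsic-Free; Extrinsic-IR task | NTCIR-2 SUMM | 9 | | 1 | | |
| 3 | October 2001 - October 2002 | Cross-Lingual Information Retrieval | Single Language (C-C, E-E, J-J, K-K) | NTCIR-3 CLIR | 22 | 65 | 8 | 9 | MEXT + RCIR/NII |
| | | | Bilingual CLIR (x-C, x-J, x-K) | | 14 | | 4 | | |
| | | | Multilingual CLIR (x-CEJ) | | 7 | | 4 | | |
| | | Patent Retrieval | Cross-Genre Retrieval | NTCIR-3 PAT | 8 | | 3 | | |
| | | | Search Question Retrieval, CLIR | | 6 | | 3 | | |
| | | | Optional Task | | 2 | | 1 | | |
| | | Question Answering | 5 possible answers | NTCIR-3 QAC | 17 | | 2 | | |
| | | | Only One Set of All the Answers | | 13 | | 1 | | |
| | | | Series of Questions | | 6 | | 1 | | |
| | | Text Summarization | Single Document Summarization | NTCIR-3 TSC | 8 | | 1 | | |
| | | | Multiple Document Summarization | | 9 | | 1 | | |
| | | WEB | Survey Retrieval: Topic Retrieval | NTCIR-3 WEB | 7 | | 2 | | |
| | | | Survey Retrieval: Search by Document | | 2 | | 1 | | |
| | | | Target Retrieval | | 7 | | 1 | | |
| | | | Optional Task: Output Classification | | 0 | | 1 | | |
| | | | Optional Task: Speech Driven Retrieval | | 1 | | 1 | | |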

Table 2. Test collections constructed (and made available) or to be constructed through the NTCIR Workshops

| Collection | Task | Document type | Document Lang | Topic/Summ Lang | Research purpose use |
|---|---|---|---|---|---|
| NTCIR-1 | IR | Scientific | JA+EN | JA | Yes |
| CIRB010 | IR | Newspaper '98-99 | CH | CH+EN | (Participants only) |
| NTCIR-2 | IR | Scientific | JA+EN | JA+EN | Yes |
| NTCIR-2 SUMM | Summarization | Newspaper '94-95, '98 | JA | JA | Yes <*> |
| NTCIR-2TAO | Summarization | Newspaper | JA | JA | Yes <*> |
| NTCIR-3 CLIR | IR | Newspaper '98-99 | CHtr+JA+EN | CHtr+JA+EN+KO | Yes <*>, <*2> |
| | | Newspaper '94 | KO | CHtr+JA+EN+KO | (Participants only) |
| NTCIR-3 PATENT | IR | Patent '98-99 + Abstract '95-99 | JA (Fulltext); JA+EN (Abst) | JA+EN+CHtr+CHsm+KO | Yes |
| NTCIR-3 QA | QA | Newspaper '98-99 | JA | JA(+EN) | Yes <*> |
| NTCIR-3 SUMM | Summarization | Newspaper '98-99 | JA | JA | Yes <*> |
| NTCIR-3 WEB | IR | html | multiple languages <*3> | JA+(EN) | Yes |
| NTCIR-4 CLIR | IR | Newspaper '98-99 | CHtr+JA+KO+EN | | |
| NTCIR-4 PATENT | IR | Patent 1993-2002 + Abstract 1993-2002 | JA (Fulltext); EN (Abst) | | |
| NTCIR-4 QA | QA | Newspaper '98-99 (2 types) | JA | | |
| NTCIR-4 SUMM | Summarization | Newspaper '98-99 (2 types) | JA | | |
| NTCIR-4 WEB | IR | html | multiple languages <*3> | | |

JA: Japanese, EN: English, CH: Chinese (tr: traditional, sm: simplified), KO: Korean

*: Documents are available for research purposes from Nichigai Associates, Co. (for Japanese users) or MAINICHI International, Inc. (for international users).
*2: The Chinese document collections CIRB010, CIRB011, and CIRB020 are available to participants only. The contents of CIRB010 and CIRB011 are the same, but the format is slightly different.
*3: Mostly Japanese and English, with some other languages.

The details of the NTCIR-4 Test Collections are available HERE