NTCIR

NII Testbeds and Community for Information access Research
 
Data
 
Research Purpose Use of NTCIR Test Collections or Data Archive/ User Agreement

The following test collections were constructed for and used in NTCIR. They may be used for research purposes only.
The document collections included in the test collections were provided to NII for use in NTCIR, either free of charge or for a fee. The providers of the document data understand the importance of test collections for research on information access technologies and have kindly granted permission to use the data for research purposes. Please remember that the document data in the NTCIR test collections is copyrighted and has commercial value. To maintain a reliable and good relationship with the data producers and providers, we researchers must behave as reliable partners: use the data only for research purposes, in accordance with the user agreement, and handle it carefully so as not to violate any of their rights.

Users of the NTCIR Test Collections should submit a "Research Activities Report" and a "Publication Report".

"Research Activities Report"
The Research Activities Report form must be filled out and sent by e-mail to ntc-secretariat.

"Publication Report related to NTCIR": please refer to the page "To Publication Report related to NTCIR" and send the report by e-mail to ntc-bib.

To Obtain the NTCIR Test Collections

The following are the procedures for obtaining the test collections. The test collections and data available from NII are free of charge.

  • The application form for the test collection must be filled out and sent by e-mail to ntc-secretariat (see the instructions).
  • Depending on the type of data set, either a user agreement (memorandum) or a formal application is required. Please refer to the list below for the required documents.
    • User Agreement (memorandum on Permission to Use Test Collection)
      A user agreement form must be filled out for each test collection that you would like to obtain and sent by postal mail or courier to the address below. Please download the form and print two copies double-sided. Both copies must be signed. After they are countersigned by NII, one copy will be returned to you and the other will be kept by NII (see the instructions).
    • Formal Application
      You can apply for multiple data sets with a single application. Download one copy of the formal application, fill it out, and send it by postal mail or courier to the address below.
      After review by NII, permission to use the data will be sent to the applicant.
Terminating the Use of Data

If you terminate your use of the data, please notify the NTCIR Project Office by e-mail at ntc-secretariat. All the data, and any secondary data derived from it, must then be deleted. Download one copy of the proof-of-deletion form ('Cancellation of the Licensing of Data and Deletion of Data'), fill it out, and send it by postal mail or courier to the address below.

Address

NTCIR project (Rm.1309)
    National Institute of Informatics
    2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
    101-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Mailing List

The release of new test collections and any corrections will be announced through the ntcir mailing list.

("NTCIR-8 Workshop" is here.)


All Data Available for Research Purpose Use

  • NTCIR-1
    NTCIR-1 (IR and Term Extraction/Role Analysis Test Collections)
    [ Detailed Table of Test Collections ]
    The IR test collection includes (1) document data (the Academic Conference Paper Database (1988-1997), consisting of author abstracts of papers presented at academic conferences hosted by any of 65 academic societies in Japan: about 330,000 documents, more than half of which are English-Japanese pairs), (2) 83 search topics (Japanese), and (3) relevance judgments. The collection can be used for Japanese text retrieval experiments and for CLIR experiments that search either English documents or Japanese-English documents using Japanese topics. The Term Extraction test collection includes a tagged corpus built from 2,000 Japanese documents selected from the above IR test collection. The whole test collection is available from NII for research purpose use.
    ・Application Form [ txt ]
    ・User agreement form [ PDF ]
    ・Readme for the CD-ROM [ txt ]
  • NTCIR-2
    NTCIR-2 (IR Test Collection)
    [ Detailed Table of Test Collections ]
    The collection includes (1) document data (author abstracts from the Academic Conference Paper Database (1997-1999) and Grant Reports (1988-1997): about 400,000 Japanese and 130,000 English documents), (2) 49 search topics (Japanese and English), and (3) relevance judgments. The whole test collection is available from NII for research purpose use. For experiments, the document data must be used together with that of NTCIR-1, because the relevance judgments were made on the merged NTCIR-1 and NTCIR-2 database. To merge the document collections, the document IDs in NTCIR-1 must be converted using the script included on the NTCIR-2 CD-ROM. For the Second NTCIR Workshop, segmented data were also prepared, in which all the document data were segmented into terms (short units as well as longer units) using standard segmentation software as of 2000. If you are interested in the segmented data, please contact ntc-secretariat.
    ・Application Form [ txt ]
    ・User agreement form
    ・Readme for the CD-ROM [ txt ]
    NOTE: To display and print the English manual (in PDF form) on the NTCIR-2 CD-ROM, you need to download and install the Acrobat Reader 4.0 Asian Font Pack (Japanese) from http://www.adobe.com/products/acrobat/cjkfontpack.html.

    NTCIR-2 SUMM (Text Summarization Test Collection)
    [ Detailed Table of Test Collections ]
    The collection includes (1) document data (Japanese newspaper articles from the Mainichi Newspaper, 1994, 1995, and 1998) and (2) model summaries (for each of 180 documents, 3 analysts prepared 7 types of single-document summaries of different lengths using different strategies). The summaries are available from NII. The document data is available from Mainichi Newspaper Co.

    Topics and Relevance judgments
    ・Application Form [ txt ]
    ・Formal Application [ PDF ]
    ・Readme for the data set [ txt ]

    NTCIR-2 SUMM TAO (Text Summarization)
    [ Detailed Table of Test Collections ]
    Distribution of NTCIR-2 SUMM TAO (Text Summarization) is currently unavailable. We will announce through the ntcir mailing list once it becomes available again.
  • NTCIR-3
    NTCIR-3 CLIR (IR/CLIR Test Collection)
    [ Detailed Table of Test Collections ]
    The collection includes (1) document data (Mainichi Newspaper 1998-1999 (Japanese); CIRB011 + CIRB020 (Chinese news articles published in Taiwan in 1998-1999); Mainichi Daily 1998-1999 (an English newspaper published in Japan); EIRB010 (English news articles published in Taiwan in 1998-1999); and Korean Economic Daily 1994 (a Korean newspaper)), (2) 50 search topics for the 1998-1999 collections and 30 topics for the 1994 collection (Chinese, Korean, Japanese, and English), and (3) relevance judgments. The topics and relevance judgments, Mainichi Daily (English), and CIRB020 (Chinese) are available from NII. The document data is also re-used in NTCIR-4 CLIR. The Japanese document data is available from Mainichi Newspaper Co. The other document data is available to NTCIR Workshop participants only. Please note that the topics and relevance judgments usable for retrieval experiments vary according to the document data set being searched. For details, please consult the README.

    To obtain the topics and relevance judgments only:
    ・Application Form [ txt ]
    ・Formal Application [ PDF ]
    ・Readme for the data set
    ・The terms of use [ PDF ]
    To obtain the test collection (document data and topics/relevance judgments):
    ・Application Form [ txt ]
    ・User agreement form [ txt ]
    ・README for NTCIR-3 CLIR [ dry run ] [ formal run ]

    NTCIR-3 PATENT (IR Test Collection)
    [ Detailed Table of Test Collections ]
    The collection includes (1) document data (Japanese patent application full text, 1998-1999; JAPIO Japanese abstracts, 1995-1999; and PAJ English abstracts, 1995-1999), (2) 30 search topics (Japanese, with translations into Traditional Chinese, Simplified Chinese, Korean, and English), and (3) relevance judgments. The JAPIO abstracts and PAJ abstracts are exact translation pairs. The document data is 18GB for the full text and 4GB for the abstracts. NTCIR-4 PATENT used patent application full text and PAJ abstracts from 1993-2002, but that data includes a small number of inconsistent documents. Each NTCIR-3 PATENT topic includes a related newspaper article, so the collection can be used for cross-genre experiments, in which patents are retrieved using a newspaper clipping, as well as for ordinary ad hoc retrieval of patents by topic. For CLIR experiments, it is strongly recommended that only the JAPIO and PAJ abstracts from 1995-1997 be used to extract translation knowledge. The whole test collection is available from NII for research purpose use.

    Retrieval task test collection
    ・Application Form [ txt ]
    ・User agreement form [ PDF ]
    ・README for NTCIR-3 PATENT [ PDF ]

    QAC : NTCIR-3 QA: Question Answering Test Collection
    [ Detailed Table of Test Collections ]
    The collection includes (1) document data (Mainichi Newspaper 1998-1999 (Japanese)), (2) about 1,200 questions (Japanese, with English translations), and (3) answers. The questions and answers are available from NII. The document data is also re-used in NTCIR-4 QAC. The document data is available from Mainichi Newspaper Co.

    Questions and Answers Data:
    ・Application Form [ txt ]
    ・Formal Application [ PDF ]
    ・The terms of use [ PDF ]
    ・Notices [ PDF ]
    ・README for NTCIR-3 QA [ txt ]

    TSC : NTCIR-3 SUMM: (Text Summarization Test Collection)
    [ Detailed Table of Test Collections ]
    The collection includes (1) document data (Japanese newspaper articles from the Mainichi Newspaper, 1998-1999) and (2) model summaries. The summary data consists of (2i) single-document summaries (for each of 60 documents, 3 analysts prepared 7 types of single-document summaries of different lengths using different strategies) and (2ii) multi-document summaries (for each of 50 document collections, 3 analysts prepared summaries of 2 different lengths; the topics of the document collections were given). The summaries are available from NII. The document data is available from Mainichi Newspaper Co.

    Summaries
    ・Application Form [ txt ]
    ・Formal Application [ PDF ]
    ・The terms of use [ PDF ]
    ・Notices [ PDF ]
    ・README for NTCIR-3 TSC [ dry run ] [ formal run ]

    NTCIR-3 WEB (Web Retrieval Test Collection)
    [ Detailed Table of Test Collections ]
    Distribution of NTCIR-3 WEB (Web Retrieval Test Collection) is currently unavailable. We will announce through the ntcir mailing list once it becomes available again.

    * The collection includes (1) document data (HTML and plain-text files crawled mainly from the ".jp" domain; most are written in Japanese or English, but some are in other languages; the total size is 100GB), (2) 47 search topics (Japanese, with English translations), and (3) relevance judgments (the "One-click distance model" or the "Page-unit document model" is used in the relevance judgments). The whole test collection is available for research purpose use from the National Institute of Informatics (NII). Separate applications are required for the document data ("NW100G-01") and for the topics and relevance judgments.
    * (The former restriction, under which the users were permitted to access and process the Document data only in the "Open Laboratory", has been abolished.)

    Document data ("NW100G-01")
    ・Application Form [ txt ]
    ・User agreement form [ PDF ]
    ・Data contents: Refer to Section 3 of this paper.
    Topics and Relevance judgments
    ・Application Form [ txt ]
    ・Formal Application [ PDF ]
    ・The terms of use [ PDF ]
    ・README for the NTCIR-3 WEB topics and relevance judgments data of the main tasks [ txt ] and of the Speech-Driven Retrieval Sub-task [ txt ]
The current NTCIR Workshop is here:
http://ntcir.nii.ac.jp/ntcir8/
Updated on: 2009.07.29
ntc-admin