NTCIR

NII Testbeds and Community for Information access Research
Data
 
Test Collections - DATA
NTCIR Test collections : IR & QA
Columns: Class; Collection; Task; Documents (Genre, Filename, Lang., Year, # of docs, Size); Task data (Topic/Question lang., Topic/Question #, Relevance judgment)
ACLIA NTCIR-7 ACLIA (CCLQA/IR for QA): Advanced Cross-Lingual Information Access (ACLIA) combines the Complex Cross-Lingual Question Answering (CCLQA) task and the Information Retrieval for QA (IR for QA) task. For further details, please consult the 'CLIR on News' and 'QA' sections.
CLIR on Scientific NTCIR-1 IR sci. abstract ntc1-je(A) JE 1988-1997 339,483 577MB J 83 3 grades
ntc1-j(A) J 332,918 312MB
ntc1-e(A) E 187,080 218MB 60
TE *5 ntc1-tmrc(A) J 2,000 - - -
NTCIR-2 IR sci. abstract ntc2-j(A) J 1986-1999 *2 400,248 600MB J, E 49 4 grades
ntc2-e(A) E 134,978 200MB

CLIR on News
CIRB010 IR News CIRB010(C) Ct 1998-1999 132,220 132MB Ct, E 50 4 grades
NTCIR-3 CLIR IR News KEIB010(C) K 1994 66,146 74MB Ct, K, J, E 30 4 grades
CIRB011(C) Ct 1998-1999 132,173 870MB Ct, K, J 50 4 grades
CIRB020(A) 249,508
Mainichi(B) J 220,078
EIRB010(C) E 10,204
Mainichi Daily(A) 12,723
NTCIR-4 CLIR IR News CIRB011(C) Ct 1998-1999 132,173 ca. 3GB Ct, K, J, E 60 4 grades
CIRB020(A) 249,203
Hankookilbo(A) K 149,921
Chosenilbo(A) 104,517
Mainichi(B) J 220,078
Yomiuri(B) 373,558
EIRB010(C) E 10,204
Mainichi Daily(A) 12,723
Korea Times(A) 19,599
Hong Kong Standard(A) 96,683
Xinhua(B) 208,167
NTCIR-5 CLIR IR News CIRB040r(A) Ct 2000-2001 901,446 581.7MB Ct, K, J, E 50 4 grades
Hankookilbo(A) K 85,250 52.1MB
Chosenilbo(A) 135,124 88.7MB
Mainichi(B) J 199,681 118.8MB
Yomiuri(B) 658,719 343.3MB
Mainichi Daily(A) E 12,155 9.9MB
Korea Times(A) 30,530 25.3MB
Daily Yomiuri(B) 17,741 22.9MB
Xinhua(B) 198,624
NTCIR-6 CLIR IR News CIRB040r(A) Ct 2000-2001 901,446 581.7MB Ct, K, J, E 50 (selected from NTCIR-3, 4) 4 grades
Hankookilbo(A) K 85,250 52.1MB
Chosenilbo(A) 135,124 88.7MB
Mainichi(B) J 199,681 118.8MB
Yomiuri(B) 658,719 343.3MB
NTCIR-7 ACLIA (IR for QA) IR News CIRB020(A) Ct 1998-1999 249,508 320 MB C, J, E EN-JA: 98, JA-JA: 98, EN-CS: 97, CS-CS: 97, EN-CT: 95, CT-CT: 95 3 grades
CIRB040r(A) 2000-2001 901,446 582 MB
Lianhe Zaobao(A) Cs 1998-2001 249,287 411 MB
Xinhua Chinese(B) 295,875 511 MB
Mainichi(B) J 419,759 544 MB
CLQA NTCIR-5 CLQA, NTCIR-6 CLQA: For further details about Cross-Lingual Question Answering, please consult the 'QA' section.
OPINION NTCIR-6 OPINION IE/analysis News CIRB020(A) Ct 1998-1999 249,508 788MB Ct, J, E 32 (selected from NTCIR-3, -4, -5 CLIR) 843 *8 2 types, 3 metrics
CIRB040r(A) 2000-2001 901,446
Mainichi(B) J 1998-2001 419,759 766MB 490 *8
Yomiuri(B) 1998-2001 1,034,699
Daily Yomiuri(B) E 2000-2001 17,741 471.5MB 439 *8
Mainichi Daily(A) 1998-2001 24,878
Korea Times(A) 2000-2001 30,530
Hong Kong Standard(A) 1998-1999 96,856
Xinhua(B) 1998-2001 409,792 299MB
NTCIR-7 MOAT IE/analysis News CIRB020(A) Ct 1998-1999 249,508 320 MB Ct 17 246 *10 2 types, 3 metrics
CIRB040r(A) 2000-2001 901,446 581.7MB
Xinhua Chinese(B) Cs 1998-2001 295,875 511 MB Cs 16 271 *10
Lianhe Zaobao(A) 249,287 230MB
Mainichi(B) J 419,759 544 MB J 22 287 *10
Mainichi Daily(A) E 24,878 22.8MB E 17 167 *10
Korea Times(A) 50,129 45.7MB
Hong Kong Standard(A) 1998-1999 96,683 252MB
Xinhua(B) 1998-2001 406,791 229MB
Straits Times(A) - 250MB
Patent NTCIR-3 PATENT IR patent full kkh(A) *3 J 1998-1999 697,262 18GB Ct, Cs, K, J, E 31 3 grades
abstract jsh(A) *3 1995-1999 1,706,154 1,883MB
paj(A) *3 E 1,701,339 2,711MB
NTCIR-4 PATENT IR patent full Publication of unexamined patent application(A) J 1993-1997 ca. 1,700,000 ca. 45GB E Main: 34, Add: 69 3 grades
abstract Patent Abstracts of Japan (PAJ)(A) E 1993-1997 ca. 1,700,000 ca. 2.2GB
NTCIR-5 PATENT IR/classification patent full Publication of unexamined patent application(A) J 1993-2002 3,496,252 94.5GB J, E 34+1189 in NTCIR-5, added 349+1681 in NTCIR-6 3 grades
abstract Patent Abstracts of Japan (PAJ)(A) E 1993-2002 3,496,252 ca. 5GB
NTCIR-6 PATENT IR/classification patent full Patent grant data published from USPTO(A) E 1993-2002 1,315,470 52.6GB E 3221 3 grades
patent full Publication of unexamined patent application(A) J 1993-2002 3,496,252 94.5GB J Japanese Retrieval: 2,908; Classification: 21,606 4 grades
abstract Patent Abstracts of Japan (PAJ)(A) E 1993-2002 3,496,252 ca. 5GB E 1 grade
Patent Mining NTCIR-7 PATMN Mining patent full Publication of unexamined patent application(A) J 1993-2002 3,496,252 94.5GB J, E Japanese/Cross-lingual (E2J) 976 2
abstract Patent Abstracts of Japan (PAJ)(A) E 1993-2002 3,496,252 ca. 5GB
patent full Patent grant data published from USPTO(A) E 1993-2002 1,315,470 52.6GB
sci. abstract ntc1-je(A) JE 1988-1997 339,483 577MB English/Cross-lingual (J2E) 976 2
ntc1-j(A) J 332,918 312MB
ntc1-e(A) E 187,080 218MB
ntc2-j(A) J 1986-1999 *2 400,248 600MB
ntc2-e(A) E 134,978 200MB
Patent Translation NTCIR-7 PATMT MT patent full Publication of unexamined patent application(A) J 1993-2002 3,496,252 94.5GB J Intrinsic 1381 -
E
Patent grant data published from USPTO(A) E 1993-2002 1,315,470 52.6GB Intrinsic 1381 -
E Extrinsic 124 2 levels

QA
NTCIR-3 QA QA News Mainichi(B) J 1998-1999 220,078 260MB J *1 1200 exact answer
NTCIR-4 QA QA News Mainichi(B) J 1998-1999 220,078 ca. 776MB J *1 197 exact answer
199
Yomiuri(B) 373,558 251
NTCIR-5 CLQA QA News CIRB040r(A) C 2000-2001 901,446 581.7MB C, J, E smpl: 300, test: 200 *6 3 grades *7
Yomiuri(B) J 658,719 343.3MB
Daily Yomiuri(B) E 17,741 22.9MB
NTCIR-5 QA QA News Mainichi(B) J 2000-2001 199,681 260MB J *1 50 series (360Q) graded
NTCIR-6 CLQA QA News CIRB020(A) Ct 1998-1999 249,203 320MB C, J, E J-E/J-J/E-J: 200, C-E/C-C/E-C/E-E: 150 3 grades *7
Mainichi(B) J 220,078 282MB
EIRB010(C) E 10,204 24.5MB
Mainichi Daily(A) 12,723 33.3MB
Korea Times(A) 19,599 55.8MB
Hong Kong Standard(A) 96,683 252MB
NTCIR-6 QA QA News Mainichi(B) J 1998-2001 419,759 535MB J 100Q (any kind of Q) graded (3 types, 4 levels)
NTCIR-7 ACLIA (CCLQA) QA News CIRB020(A) Ct 1998-1999 249,508 320 MB C, J, E EN-JA: 100, JA-JA: 100, EN-CS: 100, CS-CS: 100, EN-CT: 100, CT-CT: 100 Binary decision (system response conceptually containing the nugget or not)
CIRB040r(A) 2000-2001 901,446 582 MB
Lianhe Zaobao(A) Cs 1998-2001 249,287 411 MB
Xinhua Chinese(B) 295,875 511 MB
Mainichi(B) J 419,759 544 MB
WEB NTCIR-3 WEB IR Web (html/text) NW100G-01(A) m *4 crawled in 2001 11,038,720 100GB J *1 47 4 grades + relative
NW10G-01(A) 1,445,466 10GB
NTCIR-4 WEB IR Web (html/text) NW100G-01(A) m *4 crawled in 2001 11,038,720 100GB J *1 - 3 grades
NTCIR-5 WEB IR Web (html/text) NW1000G-04(A) m *4 crawled in 2004 98,870,352 1.36TB J *1 269+847 3 grades
MuST (Trend Information) NTCIR-6 MuST IE/analysis News Mainichi(B) J 1998-1999 220,078 260MB J 27 581 *9 -
NTCIR-7 MuST IE/analysis News Mainichi(B) J 1998-2001 419,759 535MB J 25 (8 topics) 701 *9 -
Others available for future task - QA site on Web Yahoo! Q&A corpus (Chiebukuro)(A) J Apr. 2004 to Oct. 2005 - - - - -
News Singapore Press(A) Cs 1998-2001

J:Japanese, E:English, C:Chinese (Ct:Traditional Chinese, Cs: Simplified Chinese), K:Korean;

*1: English translation is available
*2: gakkai subfiles: 1997-1999, kaken subfiles: 1986-1997
*3: kkh : Publication of unexamined patent application, jsh: Japanese abstract, paj: English translation of jsh
*4: m: multiple; mostly Japanese or English (some in other languages)
*5: Term extraction/role analysis
*6: 300+200 questions for C documents, and 300+200 questions for JE documents
*7: Right, Unsupported, Wrong
*8: # of tagged documents with annotations (# of sentences: Ct: 11,907, J: 15,279, E: 8,356)
*9: # of tagged documents with trend information
*10: # of tagged documents with annotations (# of sentences: Ct: 6,174, Cs: 5,301, J: 7,163, E: 4,711)

NTCIR Test collections : Summarization
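Collection | Task | Genre | Filename | Lang. | Year | # of docs | Summary types | Analysts | Total # of summaries
NTCIR-2 SUMM | single doc | news | Mainichi(B) | J | 1994, 1995, 1998 | 180 docs | 7 | 3 | 3780
NTCIR-2 TAO *10 | single doc | news | Mainichi(B) | J | 1998 | 1000 docs | 2 | 1 | 2000
NTCIR-3 SUMM | single doc | news | Mainichi(B) | J | 1998-1999 | 60 docs | 7 | 3 | 1260
NTCIR-3 SUMM | multi doc | news | Mainichi(B) | J | 1998-1999 | 50 sets | 2 | 3 | 300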

J:Japanese

*10: Distribution of NTCIR-2 SUMM TAO (Text Summarization) is currently unavailable. We will announce it through the NTCIR mailing list once it becomes available again.


(A) Document collections available from NII for research purposes
(B) Document collections available to task participants for free, and available from another party, for a fee, for research use other than NTCIR participation
(C) Document collections available to task participants only
Last Modified:2009.07.29
ntc-admin