| Class |
Collection |
Task |
Documents |
Task data |
| Genre |
Filename |
Lang.
|
Year |
# of doc |
Size |
Topic/ Question |
Relevance
judge |
| lang |
# |
| ACLIA |
NTCIR-7
ACLIA
(CCLQA/
IR for QA) |
Advanced Cross-Lingual Information Access (ACLIA) combines two tasks: the Complex Cross-Lingual Question Answering (CCLQA) task and the Information Retrieval for QA (IR for QA) task. For further details, see the 'CLIR on News' and 'QA' classes of this table.
|
| CLIR on Scientific |
NTCIR-1 |
IR |
sci. abstract |
ntc1-je(A) |
JE |
1988-
1997 |
339,483 |
577MB |
J |
83 |
3
grades |
| ntc1-j(A) |
J |
332,918 |
312MB |
| ntc1-e(A) |
E |
187,080 |
218MB |
60 |
| TE *5 |
ntc1-tmrc(A) |
J |
2,000 |
- |
- |
- |
| NTCIR-2 |
IR |
sci. abstract |
ntc2-j(A) |
J |
1986-
1999
*2 |
400,248 |
600MB |
J
E |
49 |
4
grades |
| ntc2-e(A) |
E |
134,978 |
200MB |
CLIR on News |
CIRB010 |
IR |
News |
CIRB010(C) |
Ct |
1998-
1999 |
132,220 |
132MB |
Ct
E |
50 |
4
grades |
| NTCIR-3 CLIR |
IR |
News |
KEIB010(C) |
K |
1994 |
66,146 |
74MB |
Ct
K
J
E |
30 |
4
grades |
| CIRB011(C) |
Ct |
1998-
1999 |
132,173 |
870MB |
Ct
K
J
|
50 |
4
grades |
| CIRB020(A) |
249,508 |
| Mainichi(B) |
J |
220,078 |
| EIRB010(C) |
E |
10,204 |
| Mainichi Daily(A) |
12,723 |
| NTCIR-4 CLIR |
IR |
News |
CIRB011(C) |
Ct |
1998-
1999 |
132,173 |
ca.3GB |
Ct
K
J
E |
60 |
4
grades |
| CIRB020(A) |
249,203 |
| Hankookilbo(A) |
K |
149,921 |
| Chosenilbo(A) |
104,517 |
| Mainichi(B) |
J |
220,078 |
| Yomiuri(B) |
373,558 |
| EIRB010(C) |
E |
10,204 |
| Mainichi Daily(A) |
12,723 |
| Korea Times(A) |
19,599 |
| Hong Kong Standard(A) |
96,683 |
| Xinhua(B) |
208,167 |
| NTCIR-5 CLIR |
IR |
News |
CIRB040r(A) |
Ct |
2000-
2001 |
901,446 |
581.7MB |
Ct
K
J
E |
50 |
4
grades |
| Hankookilbo(A) |
K |
85,250 |
52.1MB |
| Chosenilbo(A) |
135,124 |
88.7MB |
| Mainichi(B) |
J |
199,681 |
118.8MB |
| Yomiuri(B) |
658,719 |
343.3MB |
| Mainichi Daily(A) |
E |
12,155 |
9.9MB |
| Korea Times(A) |
30,530 |
25.3MB |
| Daily Yomiuri(B) |
17,741 |
22.9MB |
| Xinhua(B) |
198,624 |
|
| NTCIR-6 CLIR |
IR |
News |
CIRB040r(A) |
Ct |
2000-
2001 |
901,446 |
581.7MB |
Ct
K
J
E |
50
(selected
from NTCIR-3,4) |
4
grades |
| Hankookilbo(A) |
K |
85,250 |
52.1MB |
| Chosenilbo(A) |
135,124 |
88.7MB |
| Mainichi(B) |
J |
199,681 |
118.8MB |
| Yomiuri(B) |
658,719 |
343.3MB |
NTCIR-7
ACLIA
(IR for QA) |
IR |
News |
CIRB020(A) |
Ct |
1998-
1999 |
249,508 |
320 MB |
C
J
E |
EN-JA: 98
JA-JA: 98
EN-CS: 97
CS-CS: 97
EN-CT: 95
CT-CT: 95
|
3
grades |
| CIRB040r(A) |
2000-
2001 |
901,446 |
582 MB |
| Lianhe Zaobao (A) |
Cs |
1998-
2001 |
249,287 |
411 MB |
| Xinhua Chinese(B) |
295,875 |
511 MB |
| Mainichi(B) |
J |
419,759 |
544 MB |
| CLQA |
NTCIR-5 CLQA |
For further details about Cross-Lingual Question Answering, see the 'QA' class of this table. |
| NTCIR-6 CLQA |
| OPINION |
NTCIR-6 OPINION |
IE/
analysis |
News |
CIRB020(A) |
Ct |
1998-
1999 |
249,508 |
788MB |
Ct
J
E |
32
(selected
from
NTCIR
-3,-4,-5 CLIR) |
843
*8 |
2
types,
3
metrics |
| CIRB040r(A) |
2000-
2001 |
901,446 |
| Mainichi(B) |
J |
1998-
2001 |
419,759 |
766MB |
490
*8 |
| Yomiuri(B) |
1998-
2001 |
1,034,699 |
| Daily Yomiuri(B) |
E |
2000-
2001 |
17,741 |
471.5MB |
439
*8 |
| Mainichi Daily(A) |
1998-
2001 |
24,878 |
| Korea Times(A) |
2000-
2001 |
30,530 |
| Hong Kong Standard(A) |
1998-
1999 |
96,856 |
| Xinhua(B) |
1998-
2001 |
409,792 |
299MB |
NTCIR-7
MOAT |
IE/
analysis |
News |
CIRB020(A) |
Ct |
1998-
1999
|
249,508 |
320 MB |
Ct |
17 |
246
*10 |
2
types,
3
metrics |
| CIRB040r(A) |
2000-
2001 |
901,446 |
581.7MB |
| Xinhua Chinese(B) |
Cs |
1998-
2001 |
295,875 |
511 MB |
Cs |
16 |
271
*10 |
| Lianhe Zaobao(A) |
249,287 |
230MB |
| Mainichi(B) |
J |
419,759 |
544 MB |
J |
22 |
287
*10 |
| Mainichi Daily(A) |
E |
24,878 |
22.8MB |
E |
17 |
167
*10 |
| Korea Times(A) |
50,129 |
45.7MB |
| Hong Kong Standard(A) |
1998-
1999 |
96,683 |
252MB |
| Xinhua(B) |
1998-
2001 |
406,791 |
229MB |
| Straits Times(A) |
- |
250MB |
| Patent |
NTCIR-3 PATENT |
IR |
patent full |
kkh(A) *3 |
J |
1998-
1999 |
697,262 |
18GB |
Ct
Cs
K
J
E |
31 |
3
grades |
| abstract |
jsh(A) *3 |
1995-
1999 |
1,706,154 |
1,883MB |
| paj(A) *3 |
E |
1,701,339 |
2,711MB |
| NTCIR-4 PATENT |
IR |
patent full |
Publication of unexamined patent application(A) |
J |
1993-
1997 |
ca.
1,700,000 |
ca.45GB |
E |
Main:34,
Add:69 |
3
grades |
| abstract |
Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
1997 |
ca.
1,700,000 |
ca.2.2GB |
| NTCIR-5 PATENT |
IR/
classification |
patent full |
Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J
E |
34+1189
in NTCIR-5,
added
349+1681
in NTCIR-6 |
3
grades |
| abstract |
Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
2002 |
3,496,252 |
ca.5GB |
| NTCIR-6 PATENT |
IR/
classification |
patent full |
Patent grant data published from USPTO(A) |
E |
1993-
2002 |
1,315,470
|
52.6GB |
E |
3221 |
3
grades |
| patent full |
Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J |
Japanese Retrieval
2,908
Classification
21,606 |
4
grades |
| abstract |
Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
2002 |
3,496,252 |
ca.5GB |
E |
1
grade |
| Patent Mining |
NTCIR-7
PATMN |
Mining |
patent full |
Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J
E |
Japanese/
Cross-
lingual
(E2J)
976 |
2 |
| abstract |
Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
2002 |
3,496,252 |
ca.5GB |
| patent full |
Patent grant data published from USPTO(A) |
E |
1993-
2002 |
1,315,470 |
52.6GB |
| sci. abstract |
ntc1-je(A) |
JE |
1988-
1997 |
339,483 |
577MB |
English/
Cross-
lingual
(J2E)
976 |
2 |
| ntc1-j(A) |
J |
332,918 |
312MB |
| ntc1-e(A) |
E |
187,080 |
218MB |
| ntc2-j(A) |
J |
1986-
1999
*2 |
400,248 |
600MB |
| ntc2-e(A) |
E |
134,978 |
200MB |
Patent Translation |
NTCIR-7
PATMT |
MT |
patent full |
Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J |
Intrinsic
1381 |
- |
| E |
| Patent grant data published from USPTO(A) |
E |
1993-
2002 |
1,315,470 |
52.6GB |
Intrinsic
1381 |
- |
| E |
Extrinsic
124 |
2
levels |
QA |
NTCIR-3 QA |
QA |
News |
Mainichi(B) |
J |
1998-
1999 |
220,078 |
260MB |
J *1 |
1200 |
exact answer |
| NTCIR-4 QA |
QA |
News |
Mainichi(B) |
J |
1998-
1999 |
220,078 |
ca.
776MB |
J *1 |
197 |
exact answer |
| 199 |
| Yomiuri(B) |
373,558 |
251 |
| NTCIR-5 CLQA |
QA |
News |
CIRB040r(A) |
C |
2000-
2001 |
901,446 |
581.7MB |
C
J
E |
| sample: 300, test: 200 *6 |
3
grades
*7 |
| Yomiuri(B) |
J |
658,719 |
343.3MB |
| Daily Yomiuri(B) |
E |
17,741 |
22.9MB |
| NTCIR-5 QA |
QA |
News |
Mainichi(B) |
J |
2000-
2001 |
199,681 |
260MB |
J *1 |
50 series
(360Q) |
graded |
NTCIR-6 CLQA |
QA |
News |
CIRB020(A) |
Ct |
1998-
1999 |
249,203 |
320MB |
C
J
E |
J-E/
J-J/
E-J:
200,
C-E/
C-C/
E-C/
E-E:
150 |
3
grades
*7 |
| Mainichi(B) |
J |
220,078 |
282MB |
| EIRB010(C) |
E |
10,204 |
24.5MB |
| Mainichi Daily(A) |
12,723 |
33.3MB |
| Korea Times(A) |
19,599 |
55.8MB |
| Hong Kong Standard(A) |
96,683 |
252MB |
| NTCIR-6 QA |
QA |
News |
Mainichi(B) |
J |
1998-
2001 |
419,759 |
535MB |
J |
100Q
(any kind of Q) |
graded
(3
types,
4
levels) |
NTCIR-7
ACLIA
(CCLQA) |
QA |
News |
CIRB020(A) |
Ct |
1998-
1999 |
249,508 |
320 MB |
C
J
E |
EN-JA: 100
JA-JA: 100
EN-CS: 100
CS-CS: 100
EN-CT: 100
CT-CT: 100 |
Binary decision (system response conceptually containing
the nugget
or not) |
| CIRB040r(A) |
2000-
2001 |
901,446 |
582 MB |
| Lianhe Zaobao (A) |
Cs |
1998-
2001 |
249,287 |
411 MB |
| Xinhua Chinese(B) |
295,875 |
511 MB |
| Mainichi(B) |
J |
419,759 |
544 MB |
| WEB |
NTCIR-3 WEB |
IR |
Web (html/
text) |
NW100G-01(A) |
m*4 |
crawled
in 2001 |
11,038,720 |
100GB |
J *1 |
47 |
4
grades
+
relative |
| NW10G-01(A) |
1,445,466 |
10GB |
| NTCIR-4 WEB |
IR |
Web (html/
text) |
NW100G-01(A) |
m*4 |
crawled
in 2001 |
11,038,720 |
100GB |
J *1 |
- |
3
grades |
| NTCIR-5 WEB |
IR |
Web (html/
text) |
NW1000G-04(A) |
m*4 |
crawled
in 2004 |
98,870,352 |
1.36TB |
J *1 |
269+847 |
3
grades |
MuST
(Trend
Information) |
NTCIR-6
MuST |
IE/
analysis |
News |
Mainichi(B) |
J |
1998-
1999 |
220,078 |
260MB |
J |
27 |
581
*9 |
- |
NTCIR-7
MuST |
IE/
analysis |
News |
Mainichi(B) |
J |
1998-
2001 |
419,759 |
535MB |
J |
25
(8 topics) |
701
*9 |
- |
| Others |
available for future task |
- |
QA site on Web |
Yahoo! Q&A corpus
(Chiebukuro)(A) |
J |
Apr.
2004
to Oct.
2005 |
- |
- |
- |
- |
- |
| News |
Singapore Press(A) |
Cs |
1998-
2001 |
J: Japanese, E: English, C: Chinese (Ct: Traditional Chinese, Cs: Simplified Chinese), K: Korean
*1: English translation is available
*2: gakkai subfiles: 1997-1999, kaken subfiles: 1986-1997
*3: kkh: Publication of unexamined patent application, jsh: Japanese abstract, paj: English translation of jsh
*4: m: multiple languages; mostly Japanese or English (some in other languages)
*5: Term extraction/role analysis
*6: 300+200 questions for C documents, and 300+200 questions for JE documents
*7: Right, Unsupported, Wrong
*8: # of tagged Documents with annotations (# of sentences Ct: 11,907, J: 15,279, E: 8,356)
*9: # of tagged Documents with trend information
*10: # of tagged Documents with annotations (# of sentences Ct: 6,174, Cs: 5,301, J: 7,163, E: 4,711) |