| Category |
Collection |
Task |
Document data |
Task data |
| Genre |
File name |
Language |
Year |
No. of documents |
Size |
Topics/questions |
Relevance judgments |
| Language |
# |
| ACLIA |
NTCIR-7
ACLIA
(CCLQA/
IR for QA) |
ACLIA (Advanced Cross-Lingual Information Access) combines the Complex Cross-Lingual Question Answering (CCLQA) task and the Information Retrieval for QA task.
For details, see the CLIR on News and QA rows. |
| CLIR on Scientific |
NTCIR-1 |
IR |
Scientific abstracts |
ntc1-je(A) |
JE |
1988-
1997 |
339,483 |
577MB |
J |
83 |
3
grades |
| ntc1-j(A) |
J |
332,918 |
312MB |
| ntc1-e(A) |
E |
187,080 |
218MB |
60 |
| TE*5 |
ntc1-tmrc(A) |
J |
2,000 |
- |
- |
- |
| NTCIR-2 |
IR |
Scientific abstracts |
ntc2-j(A) |
J |
1986-
1999
*2 |
400,248 |
600MB |
J
E |
49 |
4
grades |
| ntc2-e(A) |
E |
134,978 |
200MB |
| CLIR on News |
CIRB010 |
IR |
Newspapers |
CIRB010(C) |
Ct |
1998-
1999 |
132,220 |
132MB |
Ct
E |
50 |
4
grades |
| NTCIR-3 CLIR |
IR |
Newspaper articles |
KEIB010(C) |
K |
1994 |
66,146 |
74MB |
Ct
K
J
E |
30 |
4
grades |
| CIRB011(C) |
Ct |
1998-
1999 |
132,173 |
870MB |
Ct
K
J
E |
50 |
4
grades |
| CIRB020(A) |
249,508 |
| Mainichi(B) |
J |
220,078 |
| EIRB010(C) |
E |
10,204 |
| Mainichi Daily(A) |
12,723 |
| NTCIR-4 CLIR |
IR |
Newspaper articles |
CIRB011(C) |
Ct |
1998-
1999 |
132,173 |
ca.3GB |
Ct
K
J
E |
60 |
4
grades |
| CIRB020(A) |
249,203 |
| Hankookilbo(A) |
K |
149,921 |
| Chosenilbo(A) |
104,517 |
| Mainichi(B) |
J |
220,078 |
| Yomiuri(B) |
373,558 |
| EIRB010(C) |
E |
10,204 |
| Mainichi Daily(A) |
12,723 |
| Korea Times(A) |
19,599 |
| Hong Kong Standard(A) |
96,683 |
| Xinhua(B) |
208,167 |
| NTCIR-5 CLIR |
IR |
Newspaper articles |
CIRB040r(A) |
Ct |
2000-
2001 |
901,446 |
581.7MB |
Ct
K
J
E |
50 |
4
grades |
| Hankookilbo(A) |
K |
85,250 |
52.1MB |
| Chosenilbo(A) |
135,124 |
88.7MB |
| Mainichi(B) |
J |
199,681 |
118.8MB |
| Yomiuri(B) |
658,719 |
343.3MB |
| Mainichi Daily(A) |
E |
12,155 |
9.9MB |
| Korea Times(A) |
30,530 |
25.3MB |
| Daily Yomiuri(B) |
17,741 |
22.9MB |
| Xinhua(B) |
198,624 |
|
| NTCIR-6 CLIR |
IR |
Newspaper articles |
CIRB040r(A) |
Ct |
2000-
2001 |
901,446 |
581.7MB |
Ct
K
J
E |
50
(selected
from NTCIR-3,4) |
4
grades |
| Hankookilbo(A) |
K |
85,250 |
52.1MB |
| Chosenilbo(A) |
135,124 |
88.7MB |
| Mainichi(B) |
J |
199,681 |
118.8MB |
| Yomiuri(B) |
658,719 |
343.3MB |
NTCIR-7
ACLIA
(IR
for QA) |
IR |
Newspaper articles |
CIRB020(A) |
Ct |
1998-
1999 |
249,508 |
320 MB |
C
J
E |
EN-JA: 98
JA-JA: 98
EN-CS: 97
CS-CS: 97
EN-CT: 95
CT-CT: 95 |
3
grades |
| CIRB040r(A) |
2000-
2001 |
901,446 |
582 MB |
| Lianhe Zaobao (A) |
Cs |
1998-
2001 |
249,287 |
411 MB |
| Xinhua Chinese(B) |
295,875 |
511 MB |
| Mainichi(B) |
J |
419,759 |
544 MB |
| CLQA |
NTCIR-5 CLQA |
For details on Cross-Lingual QA, see the QA rows. |
| NTCIR-6 CLQA |
| OPINION |
NTCIR-6 OPINION |
IE/
analysis |
Newspaper articles |
CIRB020(A) |
Ct |
1998-
1999 |
249,508 |
788MB |
Ct
J
E |
32
(selected
from NTCIR
-3,-4,-5 CLIR) |
843
*8 |
2
types,
3
metrics |
| CIRB040r(A) |
2000-
2001 |
901,446 |
| Mainichi(B) |
J |
1998-
2001 |
419,759 |
766MB |
490
*8 |
| Yomiuri(B) |
1,034,699 |
| Daily Yomiuri(B) |
E |
2000-
2001 |
17,741 |
471.5MB |
439
*8 |
| Mainichi Daily(A) |
1998-
2001 |
24,878 |
| Korea Times(A) |
2000-
2001 |
30,530 |
| Hong Kong Standard(A) |
1998-
1999 |
96,856 |
| Xinhua(B) |
1998-
2001 |
409,792 |
299MB |
NTCIR-7
MOAT |
IE/
analysis |
Newspaper articles |
CIRB020(A) |
Ct |
1998-
1999
|
249,508 |
320 MB |
Ct |
17 |
246
*10 |
2
types,
3
metrics |
| CIRB040r(A) |
2000-
2001 |
901,446 |
581.7MB |
| Xinhua Chinese(B) |
Cs |
1998-
2001 |
295,875 |
511 MB |
Cs |
16 |
271
*10 |
| Lianhe Zaobao(A) |
249,287 |
230MB |
| Mainichi(B) |
J |
419,759 |
544 MB |
J |
22 |
287
*10 |
| Mainichi Daily(A) |
E |
24,878 |
22.8MB |
E |
17 |
167
*10 |
| Korea Times(A) |
50,129 |
45.7MB |
| Hong Kong Standard(A) |
1998-
1999 |
96,683 |
252MB |
| Xinhua(B) |
1998-
2001 |
406,791 |
229MB |
| Straits Times(A) |
- |
250MB |
| Patent |
NTCIR-3 PATENT |
IR |
Full-text patents |
kkh(A) *3 |
J |
1998-
1999 |
697,262 |
18GB |
Ct
Cs
K
J
E |
31 |
3
grades |
| Patent abstracts |
jsh(A) *3 |
1995-
1999 |
1,706,154 |
1,883MB |
| paj (A)*3 |
E |
1,701,339 |
2,711MB |
| NTCIR-4 PATENT |
IR |
Full-text patents |
Publication of unexamined patent application(A) |
J |
1993-
1997 |
ca.
1,700,000 |
ca.45GB |
E |
Main:34,
Add:69 |
3
grades |
| Patent abstracts |
Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
1997 |
ca.
1,700,000 |
ca.2.2GB |
| NTCIR-5 PATENT |
IR/
classification |
Full-text patents |
Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J
E |
34+1189
in
NTCIR-5,
added
349+1681
in
NTCIR-6 |
3
grades |
| Patent abstracts |
Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
2002 |
3,496,252 |
ca.5GB |
| NTCIR-6 PATENT |
IR/
classification |
Full-text patents |
Patent grant data published from USPTO(A) |
E |
1993-
2002 |
1,315,470 |
52.6GB |
E |
3221 |
3
grades |
| Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J |
Japanese Retrieval: 2,908
Classification: 21,606 |
4
grades |
| Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
2002 |
3,496,252 |
ca.5GB |
E |
1
grade |
| Patent Mining |
NTCIR-7
PATMN |
Mining |
Full-text patents |
Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J
E |
Japanese/
Cross-
lingual
(E2J)
976
|
2 |
| Patent Abstracts of Japan(PAJ)(A) |
E |
1993-
2002 |
3,496,252 |
ca.5GB |
| Patent grant data published from USPTO(A) |
E |
1993-
2002 |
1,315,470 |
52.6GB |
| Scientific abstracts |
ntc1-je(A) |
JE |
1988-
1997 |
339,483 |
577MB |
English/
Cross-
lingual
(J2E)
976 |
2 |
| ntc1-j(A) |
J |
332,918 |
312MB |
| ntc1-e(A) |
E |
187,080 |
218MB |
| Scientific abstracts |
ntc2-j(A) |
J |
1986-
1999
*2 |
400,248 |
600MB |
| ntc2-e(A) |
E |
134,978 |
200MB |
Patent Translation |
NTCIR-7
PATMT |
MT |
Full-text patents |
Publication of unexamined patent application(A) |
J |
1993-
2002 |
3,496,252 |
94.5GB |
J |
Intrinsic
1381
|
- |
| E |
Intrinsic
1381
|
| Patent grant data published from USPTO(A) |
E |
1993-
2002 |
1,315,470 |
52.6GB |
- |
| E |
Extrinsic
124 |
2
levels |
| QA |
NTCIR-3 QA |
QA |
Newspaper articles |
Mainichi(B) |
J |
1998-
1999 |
220,078 |
260MB |
J *1 |
1200 |
exact answer |
| NTCIR-4 QA |
QA |
Newspaper articles |
Mainichi(B) |
J |
1998-
1999 |
220,078 |
ca.
776MB |
J *1 |
197 |
exact answer |
| 199 |
| Yomiuri(B) |
373,558 |
251 |
| NTCIR-5 CLQA |
QA |
Newspaper articles |
CIRB040r(A) |
C |
2000-
2001 |
901,446 |
581.7MB |
C
J
E |
smpl:300,
test:200*6 |
3
grades
*7 |
| Yomiuri(B) |
J |
658,719 |
343.3MB |
| Daily Yomiuri(B) |
E |
17,741 |
22.9MB |
| NTCIR-5 QA |
QA |
Newspaper articles |
Mainichi(B) |
J |
2000-
2001 |
199,681 |
260MB |
J *1 |
50 series
(360Q) |
graded |
NTCIR-6 CLQA |
QA |
Newspaper articles |
CIRB020(A) |
Ct |
1998-
1999 |
249,203 |
320MB |
C
J
E |
J-E/
J-J/
E-J:
200
C-E/
C-C/
E-C/
E-E:
150 |
3
grades
*7 |
| Mainichi(B) |
J |
220,078 |
282MB |
| EIRB010(C) |
E |
10,204 |
24.5MB |
| Mainichi Daily(A) |
12,723 |
33.3MB |
| Korea Times(A) |
19,599 |
55.8MB |
| Hong Kong Standard(A) |
96,683 |
252MB |
| NTCIR-6 QA |
QA |
Newspaper articles |
Mainichi(B) |
J |
1998-
2001 |
419,759 |
535MB |
J |
100Q
(any kind
of Q) |
graded
(3
types,
4
levels) |
NTCIR-7
ACLIA
(CCLQA) |
QA |
Newspaper articles |
CIRB020(A) |
Ct |
1998-
1999 |
249,508 |
320 MB |
C
J
E |
EN-JA: 100
JA-JA: 100
EN-CS: 100
CS-CS: 100
EN-CT: 100
CT-CT: 100 |
Binary decision (nugget inclusion) |
| CIRB040r(A) |
2000-
2001 |
901,446 |
582 MB |
| Lianhe Zaobao (A) |
Cs |
1998-
2001 |
249,287 |
411 MB |
| Xinhua Chinese(B) |
295,875 |
511 MB |
| Mainichi(B) |
J |
419,759 |
544 MB |
| WEB |
NTCIR-3 WEB |
IR |
Web (html/
text) |
NW100G-01(A) |
m*4 |
crawled
in
2001 |
11,038,720 |
100GB |
J *1 |
47 |
4
grades
+
relative |
| NW10G-01(A) |
1,445,466 |
10GB |
| NTCIR-4 WEB |
IR |
Web (html/
text) |
NW100G-01(A) |
m*4 |
crawled
in
2001 |
11,038,720 |
100GB |
J *1 |
- |
3
grades |
| NTCIR-5 WEB |
IR |
Web (html/
text) |
NW1000G-04(A) |
m*4 |
crawled
in
2004 |
98,870,352 |
1.36TB |
J *1 |
269+847 |
3
grades |
MuST
(Trend Information) |
NTCIR-6
MuST |
IE/
analysis |
Newspaper articles |
Mainichi(B) |
J |
1998-
1999 |
220,078 |
260MB |
J |
27 |
581
*9 |
- |
NTCIR-7
MuST |
IE/
analysis |
Newspaper articles |
Mainichi(B) |
J |
1998-
2001 |
419,759 |
535MB |
J |
25
(8 topics) |
701
*9 |
- |
| Others |
available for future tasks |
- |
QA site on Web |
Yahoo! Q&A corpus
(Chiebukuro)(A) |
J |
Apr.
2004
to Oct.
2005 |
- |
- |
- |
- |
- |
| News |
Singapore Press(A) |
Cs |
1998-
2001 |
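J: Japanese, E: English, C: Chinese (Ct: Traditional, Cs: Simplified), K: Korean
*1: English translation available
*2: gakkai subfile: 1997-1999; kaken subfile: 1986-1997
*3: kkh: unexamined patent applications; jsh: Japanese abstracts; paj: English translation of jsh
*4: m: multiple; mostly Japanese or English (some documents in other languages)
*5: Term extraction / role analysis
*6: 300+200 questions for Chinese documents; 300+200 questions for Japanese and English documents
*7: Right, Unsupported, Wrong
*8: Number of documents annotated with opinion information (number of sentences: Chinese 11,907; Japanese 15,279; English 8,356)
*9: Number of documents annotated with trend information
*10: Number of documents annotated with opinion information (number of sentences: Chinese (Traditional) 6,174; Chinese (Simplified) 5,301; Japanese 7,163; English 4,711) |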