There will be two types of assessment: automatic assessment using the Wikipedia ground truth (existing cross-lingual links), and manual assessment by human assessors. For the latter, all submissions will be pooled and a GUI tool will be used for efficient assessment. In manual assessment, either the anchor candidate or the target link can be judged relevant or non-relevant. Once an anchor candidate is assessed as non-relevant, all anchors and associated links inside that anchor also become non-relevant. After the assessment, the performance of each cross-lingual link discovery (CLLD) system will be evaluated using Precision, Recall and Mean Average Precision.
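As a rough illustration of the three metrics named above, the following Python sketch computes them at the level of target links for one or more topics. It uses the standard textbook definitions; the function names, the data layout (a ranked list of link identifiers per topic and a set of relevant links per topic) and the choice to score whole ranked lists are assumptions made for this example, not a description of the official evaluation program.

```python
def precision_recall(ranked, relevant):
    """Set-based precision and recall for one topic.

    ranked   -- list of link identifiers returned by a system (assumed unique)
    relevant -- set of link identifiers judged relevant for the topic
    """
    retrieved = set(ranked)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant link occurs."""
    hits = 0
    total = 0.0
    for rank, link in enumerate(ranked, start=1):
        if link in relevant:
            hits += 1
            total += hits / rank
    # Relevant links that were never retrieved contribute zero.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """Mean of average precision over all topics that have judgments.

    runs  -- dict: topic -> ranked list of links (hypothetical layout)
    qrels -- dict: topic -> set of relevant links (hypothetical layout)
    """
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(topic, []), rel)
              for topic, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy data, invented purely for illustration.
runs = {"t1": ["A", "B", "C", "D"], "t2": ["X", "Y"]}
qrels = {"t1": {"A", "C", "E"}, "t2": {"Y"}}
p, r = precision_recall(runs["t1"], qrels["t1"])
map_score = mean_average_precision(runs, qrels)
```

A topic with no submitted links simply scores an average precision of zero in this sketch, so systems are penalised for missing topics rather than having them skipped.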
The Wikipedia ground truth set of links is derived from the existing Wikipedia pages. For instance, if the English topic page is “Solar Eclipse”, then we define the ground truth set of Chinese links as the set of links out of the Chinese Solar Eclipse page (日食) to other pages in the Chinese collection. Similarly, if any English Wikipedia page linked from the English “Solar Eclipse” page has a counterpart in the Chinese Wikipedia, that link also becomes part of the ground truth. For the purpose of evaluation we assume that a good CLLD system will be able to find the same set of Chinese-language links starting from the orphaned English text. This assumption may not be very precise; for instance, the two pages are not necessarily exact translations of each other. However, it is likely to be sufficient to provide a good set of useful links.
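The derivation just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name, the three dictionaries (cross-language title mapping, English outgoing links, Chinese outgoing links) and the toy entries in them are hypothetical and do not reflect actual Wikipedia link data or the tooling used to build the official ground truth.

```python
def ground_truth_links(topic, langlinks, en_outlinks, zh_outlinks):
    """Return the set of Chinese target pages treated as ground truth for a topic.

    topic       -- title of the English topic page
    langlinks   -- dict: English title -> Chinese counterpart title
    en_outlinks -- dict: English title -> list of English pages it links to
    zh_outlinks -- dict: Chinese title -> list of Chinese pages it links to
    """
    truth = set()

    # Source 1: links out of the Chinese counterpart of the topic page.
    zh_topic = langlinks.get(topic)
    if zh_topic is not None:
        truth.update(zh_outlinks.get(zh_topic, ()))

    # Source 2: Chinese counterparts of pages linked from the English topic page.
    for en_target in en_outlinks.get(topic, ()):
        zh_target = langlinks.get(en_target)
        if zh_target is not None:
            truth.add(zh_target)

    return truth


# Toy data, invented for illustration only.
langlinks = {"Solar Eclipse": "日食", "Moon": "月球", "Sun": "太阳"}
en_outlinks = {"Solar Eclipse": ["Moon", "Sun", "Umbra"]}
zh_outlinks = {"日食": ["月球", "地球"]}

truth = ground_truth_links("Solar Eclipse", langlinks, en_outlinks, zh_outlinks)
```

In this toy case "Umbra" has no Chinese counterpart in the mapping, so it contributes nothing, while 地球 enters the set only through the Chinese page's own links.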
EVALUATION PROGRAM: A GUI program is used for computing performance scores of different CLLD methods or systems.
With this tool, a comparative plot for different methods or systems can also be generated.
To download this program for training evaluation, click here.