A: All SDR retrieval results must be submitted to NIST in the standard
TREC ret format,
as follows:
The ret format is a space-delimited ASCII table with six fields per line: topic number, the literal Q0, StoryID, rank, score, and run tag.
Here is a sample of a ret file:
23 Q0 19980104_1130_1200_CNN_HDL.0034 1 4238 ibm-cr-att-s1
23 Q0 19980105_1800_1830_ABC_WNT.0143 2 4223 ibm-cr-att-s1
23 Q0 19980105_1130_1200_CNN_HDL.1120 3 4207 ibm-cr-att-s1
23 Q0 19980515_1630_1700_CNN_HDL.0749 4 4194 ibm-cr-att-s1
23 Q0 19980303_1600_1700_VOA_WRP.0061 5 4189 ibm-cr-att-s1
...
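As a quick sanity check before submission, each line of a ret file can be split into its six fields. The sketch below is only an illustration: the field names (topic, iteration, story_id, rank, score, run_tag) are labels inferred from the sample above and from common TREC usage, not names defined by NIST, and parse_ret_line is a hypothetical helper.

```python
from typing import NamedTuple


class RetRecord(NamedTuple):
    # Field names are assumptions inferred from the sample ret file.
    topic: str       # topic (query) number, e.g. "23"
    iteration: str   # constant placeholder, "Q0" in the sample
    story_id: str    # e.g. "19980104_1130_1200_CNN_HDL.0034"
    rank: int        # position in the ranked list, starting at 1
    score: float     # retrieval score
    run_tag: str     # run identifier, e.g. "ibm-cr-att-s1"


def parse_ret_line(line: str) -> RetRecord:
    """Split one whitespace-delimited ret line into its six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    topic, iteration, story_id, rank, score, run_tag = fields
    return RetRecord(topic, iteration, story_id, int(rank), float(score), run_tag)


sample = "23 Q0 19980104_1130_1200_CNN_HDL.0034 1 4238 ibm-cr-att-s1"
record = parse_ret_line(sample)
print(record.story_id, record.rank, record.run_tag)
```

A line with a missing or extra field raises ValueError, which makes malformed output easy to spot before it reaches the scoring tools.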
For the Unknown Story Boundary condition,
the ret file must be generated by UIDmatch.pl.
The input format for UIDmatch.pl is the same as ret, except that the StoryID field is replaced by a timeID:
timeID = <episodeID>:<Time-in-seconds.hundredths>
Example: 19980104_1130_1200_CNN_HDL:34.14
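The timeID layout above can be produced and read back with a few lines of code. This is a minimal sketch, assuming the time is always written with exactly two decimal places, as "seconds.hundredths" suggests; make_time_id and parse_time_id are hypothetical helper names, not part of the NIST tools.

```python
from typing import Tuple


def make_time_id(episode_id: str, seconds: float) -> str:
    """Build <episodeID>:<Time-in-seconds.hundredths>.

    Assumption: the time is rounded to two decimal places.
    """
    return f"{episode_id}:{seconds:.2f}"


def parse_time_id(time_id: str) -> Tuple[str, float]:
    """Split a timeID into its episode ID and time offset in seconds."""
    # Split on the last colon so the episode ID is kept whole.
    episode_id, _, offset = time_id.rpartition(":")
    if not episode_id or not offset:
        raise ValueError(f"not a timeID: {time_id!r}")
    return episode_id, float(offset)


time_id = make_time_id("19980104_1130_1200_CNN_HDL", 34.14)
print(time_id)
print(parse_time_id(time_id))
```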
A: The following tools are available:
trec_eval is the NIST IR scoring package used for SDR. It can be found at
ftp://ftp.cs.cornell.edu/pub/smart/.
UIDmatch.pl is a Perl script that matches the timeIDs generated
for the Unknown Story Boundary condition to storyIDs
that can be scored by trec_eval.
UIDmatch.pl is available from our ftp server.
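To make the idea of matching a timeID to a storyID concrete, here is a conceptual sketch. It is not how UIDmatch.pl is implemented, and it does not read the real index files: the episode name, the boundary table, and the story IDs below are all invented for illustration. It only shows the general technique of looking up which story interval contains a given time.

```python
from bisect import bisect_right

# HYPOTHETICAL data: story start times (seconds) and story IDs for a
# made-up episode. Real boundaries come from the index files used by
# UIDmatch.pl, whose format is not described here.
BOUNDARIES = {
    "EPISODE_A": [
        (0.0, "EPISODE_A.0001"),
        (34.0, "EPISODE_A.0002"),
        (120.5, "EPISODE_A.0003"),
    ],
}


def time_id_to_story_id(time_id: str) -> str:
    """Return the ID of the story whose interval contains the timeID."""
    episode_id, _, offset = time_id.rpartition(":")
    stories = BOUNDARIES[episode_id]
    starts = [start for start, _ in stories]
    # Index of the last story that starts at or before the given time.
    index = bisect_right(starts, float(offset)) - 1
    if index < 0:
        raise ValueError(f"time precedes first story: {time_id!r}")
    return stories[index][1]


print(time_id_to_story_id("EPISODE_A:34.14"))
```

For real submissions, use UIDmatch.pl itself so that the mapping agrees with the official index files.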
When the ret file already contains story IDs, it can be scored with trec_eval directly:
Example: trec_eval -q SDR99.qrels output.ret
Note: the SDR99.qrels file will be made available at scoring time.
For the Unknown Story Boundary condition, the scoring procedure is as follows:
Download UIDmatch.pl from our ftp server.
Edit UIDmatch.pl and set the $ndxdir variable to the path of the directory where all the index files are located on your system.
Match timeID to storyID using UIDmatch.pl.
Example: UIDmatch my-ret-file.ret > my-mapped-ret-file.ret
The mapped ret file can then be scored with trec_eval.