query categories
Henning Müller
Oct 19, 2001 02:48 PDT
Dear Benchathletes,
here is a list of possible query categories that we could have for the
Benchathlon. I already sent this list a while ago, but I was told that
Topica had problems with attachments, so I am sending it again as plain
text.
Please comment on which other categories you can imagine for evaluating
retrieval systems. This is important for creating a general benchmarking
harness.
I think for this year's benchmark we should concentrate on presenting
a framework for query by example and for evaluating the relevance
feedback of the participating systems.
None of the performance measures is fixed yet. I think we should start
out with a larger number of performance measures and then compare them
to find out which measures carry differing information about a system.
Any suggestions for other performance measures?
Henning
--
1.) Looking for a specific image
1.1) Looking if the exact same image is in the database
goal: How fast can a system find this out?
measures: Response time for a correct answer
Accuracy of the reply, number of correct answers
1.2) Looking if the query image is part of an image in the database
goal: How quickly and how accurately does a system find part of an
image? Accuracy might be more important than time here.
measures: Response time for a correct answer
Accuracy of the reply, positions of the relevant images
1.3) Looking if a geometrically altered image is part of an image in the
database
goal: How quickly and how accurately does a system find part of an
image? Accuracy might be more important than time here.
measures: Response time for a correct answer
Accuracy of the reply, positions of the relevant images
1.4) Looking if a compressed version of an image is in the database (e.g.
strong JPEG compression)
goal: How quickly and how accurately does a system find a compressed
image?
measures: Response time for a correct answer
Accuracy of the reply, positions of the relevant images
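
The two measures used throughout category 1 (response time and position of
the correct image in the reply) could be collected with a small harness
like the sketch below. It is only an illustration: `timed_query`,
`toy_system` and the image ids are made-up names, and it assumes a system
can be wrapped as a function that returns a ranked list of image ids.

```python
import time


def timed_query(query_fn, query_image, target_id):
    """Run one query; return (elapsed seconds, 1-based rank of target_id).

    query_fn is a hypothetical callable that returns a ranked list of ids.
    """
    start = time.perf_counter()
    ranked_ids = query_fn(query_image)
    elapsed = time.perf_counter() - start
    # The rank is None when the target is missing from the reply.
    rank = ranked_ids.index(target_id) + 1 if target_id in ranked_ids else None
    return elapsed, rank


def toy_system(query_image):
    # Stand-in for a real retrieval system: always returns a fixed ranking.
    return ["img_07", "img_42", "img_13"]


elapsed, rank = timed_query(toy_system, "query.jpg", "img_42")
print(f"rank of target: {rank}, response time: {elapsed:.6f} s")
```

Wall-clock time measured this way includes everything the wrapped function
does, so systems would have to be wrapped in a comparable way.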
2.) Looking for a number of similar images
2.1) Query by example with known groundtruth
goal: Find images that are relevant for a certain query image
measures: normalized average rank (see BIRDS-I) as a leading
indicator
precision/recall graph
precision and recall at certain important cutoff points
rank of the first relevant image other than the query image
average rank
primary recall
2.2) Evaluation of positive and/or negative feedback
goal: How well can the results be improved with feedback, and how many
steps of feedback are needed?
measures: Possibly the measure of secondary recall etc., proposed by
C. Leung
Can the same measures be used as for the first query step,
to allow a comparison between the two?
2.3) How well can a system adapt the output for the same starting image
but with different ground truth sets
goal: How well can the system adapt the output to the needs of
different users?
measures: the same measures as before but with different relevance
sets
Can we get different relevance sets from the ground truth?
Can we use the same measures as stated before and average
them over the different relevance sets?
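
Several of the measures proposed in 2.1, and the averaging over relevance
sets asked about in 2.3, can be sketched as follows. This is an
illustration under stated assumptions, not a fixed definition: ranks are
1-based, the ranking is assumed to cover the whole collection and to
contain every relevant image, and the normalized average rank uses one
common formulation that may differ in detail from the one in BIRDS-I. All
function names and image ids are made up.

```python
def precision_recall_at(ranked_ids, relevant, cutoff):
    """Precision and recall over the first `cutoff` results."""
    hits = sum(1 for i in ranked_ids[:cutoff] if i in relevant)
    return hits / cutoff, hits / len(relevant)


def relevant_ranks(ranked_ids, relevant):
    """1-based ranks at which relevant images appear."""
    return [pos for pos, i in enumerate(ranked_ids, start=1) if i in relevant]


def first_relevant_rank(ranked_ids, relevant, query_id):
    """Rank of the first relevant image other than the query image."""
    for pos, i in enumerate(ranked_ids, start=1):
        if i in relevant and i != query_id:
            return pos
    return None


def average_rank(ranked_ids, relevant):
    ranks = relevant_ranks(ranked_ids, relevant)
    return sum(ranks) / len(ranks)


def normalized_average_rank(ranked_ids, relevant):
    """0 for a perfect ranking, growing as relevant images move down.

    Assumed formula: (sum of ranks - N_R * (N_R + 1) / 2) / (N * N_R),
    with N the collection size and N_R the number of relevant images.
    """
    n, n_rel = len(ranked_ids), len(relevant)
    ranks = relevant_ranks(ranked_ids, relevant)
    return (sum(ranks) - n_rel * (n_rel + 1) / 2) / (n * n_rel)


def mean_over_relevance_sets(measure, ranked_ids, relevance_sets):
    """Average one measure over several relevance sets (category 2.3)."""
    values = [measure(ranked_ids, rel) for rel in relevance_sets]
    return sum(values) / len(values)


ranked = ["q", "a", "b", "c", "d", "e"]   # system output for query "q"
relevant = {"q", "a", "c"}                # one user's ground truth
precision, recall = precision_recall_at(ranked, relevant, cutoff=3)
nar = normalized_average_rank(ranked, relevant)
mean_nar = mean_over_relevance_sets(
    normalized_average_rank, ranked, [relevant, {"q", "a", "b"}]
)
```

Applying the same functions to the ranking returned after each feedback
step would give a direct before/after comparison for 2.2.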
3.) Target search (also called image browsing); the image searched for is
not given as an input
3.1) How quickly can an image be found while browsing
goal: Find an image as quickly as possible
measure: Number of images that have to be viewed before the correct
one is found
4.) Application
4.1) Inserting an image into the database
goal: time it takes to insert an image
measures: time
4.2) Inserting an image into the database and finding a known image
similar to this one
goal: time it takes to insert an image and how accurate the
response is
measures: time and accuracy
5.) Looking for a sketch of an image (incomplete information)
5.1) How well can a sketch of an image be found?
goal: speed and accuracy of the reply
measures: time and accuracy
6.) Tests where two systems are explicitly compared
see the article by A. Dimai at Visual 99
7.) Tests for special application areas such as trademarks or medical
imaging
Are the measures really different, or can the same measures be used as
before, just applied to a different set of ground truth and images?
8.) Measure the scalability of a CBIR system
8.1) Scalability with respect to a large collection size
(10,000;100,000;1,000,000 images)
goal: Measure the time it takes with collections of different sizes,
to be able to extrapolate the response time to even larger
collection sizes
measures: time change with respect to the collection size
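
One possible way to estimate response times for larger collections from
the three measured sizes is to fit a curve and evaluate it beyond the
measured range. The sketch below assumes a power law t = a * n^b, fitted
by least squares on log-log values; that model is an assumption, not
something fixed for the benchmark, and the timings are invented purely
for illustration.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_time(a, b, n):
    """Estimated response time for a collection of n images."""
    return a * n ** b


sizes = [10_000, 100_000, 1_000_000]   # collection sizes from 8.1
times = [0.02, 0.2, 2.0]               # invented response times in seconds
a, b = fit_power_law(sizes, times)
estimate = predict_time(a, b, 10_000_000)
```

The exponent b summarizes the "time change with respect to the collection
size": a value near 1 means roughly linear growth, and a smaller value
means the system scales sublinearly.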
9.) Evaluation of CBIR interfaces
goal: Find the most efficient user interface for a certain task
measures:
--
----------------------------------------------------------
Henning Mueller, Computer Vision Group
Computer Science Department, University of Geneva
24, rue du General Dufour, CH-1211 Geneva 4, SWITZERLAND
Phone : +41(22)705 7633; fax: +41(22)705 7780
Henning.-@cui.unige.ch
----------------------------------------------------------