Monday, April 25, 2005

Classification Performance Measures

The Li and Ogihara paper "Detecting Emotions in Music" mentions several criteria for evaluating their classifier. Here is a short summary of the terms they use.


General:
The experimental evaluation of a classifier usually measures its ability to make the right classification decisions. After a classifier is constructed using a training set, its effectiveness is evaluated using a test set.
The following counts are computed for each category ci:
–TPi: true positives
The number of documents that both the classifier and the previous judgments (as recorded in the test set) classify under ci.
–FPi: false positives
The number of documents that the classifier classifies under ci, although the test set indicates that they do not belong to ci.
–TNi: true negatives
The number of documents that both the classifier and the test set agree do not belong to ci.
–FNi: false negatives
The number of documents that the classifier does not classify under ci, although the test set indicates that they should be classified under ci.
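
As a rough illustration, the four counts for one category could be tallied as below. This is only a sketch: the function name confusion_counts, the choice to represent each document's labels as a Python set, and the example emotion labels are assumptions made here, not something taken from the paper.

```python
def confusion_counts(predicted, actual, category):
    """Tally TP, FP, TN, FN for one category.

    predicted, actual: lists of sets, one set of category labels per document
    (assumed representation; a document may carry several labels).
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for pred_labels, true_labels in zip(predicted, actual):
        classifier_says_yes = category in pred_labels
        test_set_says_yes = category in true_labels
        if classifier_says_yes and test_set_says_yes:
            counts["TP"] += 1
        elif classifier_says_yes and not test_set_says_yes:
            counts["FP"] += 1
        elif not classifier_says_yes and test_set_says_yes:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Made-up example: four documents, hypothetical emotion labels.
predicted = [{"happy", "calm"}, {"sad"}, {"happy"}, set()]
actual = [{"happy"}, {"sad", "calm"}, {"calm"}, {"happy"}]
print(confusion_counts(predicted, actual, "happy"))
```

Each document falls into exactly one of the four counts, so they always sum to the size of the test set.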

Precision: TPi / (TPi+FPi)
Recall: TPi / (TPi + FNi)
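
The two formulas translate directly into code. In this minimal sketch the function names are mine, and returning 0.0 when a denominator is zero is an assumed convention (other conventions exist); the counts in the example are invented.

```python
def precision(tp, fp):
    """TP / (TP + FP): the fraction of documents put under ci that belong there."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0  # 0.0 on empty denominator is an assumption


def recall(tp, fn):
    """TP / (TP + FN): the fraction of documents belonging to ci that were found."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0  # same assumed convention


# Invented counts for one category: 8 true positives, 2 false positives, 4 false negatives.
print(precision(tp=8, fp=2))
print(recall(tp=8, fn=4))
```

Note that the true negatives appear in neither formula.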

See also http://www.hsl.creighton.edu/hsl/Searching/Recall-Precision.html


A classifier should be evaluated by means of a measure which combines recall and precision.

Example: The trivial acceptor (every document is classified under every category) has recall = 1, but its precision is usually very low.

Some combined measures:
–the breakeven point: the value at which precision equals recall as the classifier's decision threshold is varied
–F1 measure: 2 * Prec. * Rec. / (Prec. + Rec.)
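
One possible way to locate a breakeven point is sketched below. It assumes the classifier outputs a score per document for category ci and that the top k documents of the ranked list are classified under ci, with k swept from 1 upward. The helper name breakeven_point and the example scores are hypothetical, and ties between scores are not treated specially.

```python
def breakeven_point(scores, labels):
    """Return (precision, recall) at the cutoff where the two are closest.

    scores: classifier scores for one category, higher means more confident.
    labels: 1 if the test set puts the document under the category, else 0.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("the category has no positive documents in the test set")
    # Rank documents by descending score.
    ranked = [label for _, label in sorted(zip(scores, labels), key=lambda p: -p[0])]
    best = None
    tp = 0
    for k, label in enumerate(ranked, start=1):
        tp += label
        if tp == 0:
            continue  # skip the degenerate point where precision = recall = 0
        prec = tp / k        # the top k documents are classified as positive
        rec = tp / n_pos
        gap = abs(prec - rec)
        if best is None or gap < best[0]:
            best = (gap, prec, rec)
    return best[1], best[2]


# Hypothetical scores for five documents, three of which belong to the category.
print(breakeven_point([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0]))
```

In this ranking setup precision and recall coincide when k equals the number of positive documents, because both then share the same numerator and denominator.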

Example:
The breakeven point of a classifier is always less than or equal to its F1 value (at the breakeven point itself Prec. = Rec., so F1 takes that same value).
For the trivial acceptor, Prec. -> 0 and Rec. = 1, so F1 -> 0.
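
To see the trivial acceptor's behaviour numerically, here is a small sketch. The function names and the document counts are made up for illustration; the only assumption about the data is that a single document in the test set truly belongs to the category.

```python
def f1(precision, recall):
    """F1 = 2 * Prec. * Rec. / (Prec. + Rec.), taken as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def trivial_acceptor_scores(n_docs, n_relevant):
    """Scores for a classifier that puts every document under the category."""
    tp = n_relevant            # every relevant document is accepted
    fp = n_docs - n_relevant   # so is every irrelevant one
    fn = 0                     # nothing is ever rejected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1(precision, recall)


for n_docs in (10, 100, 1000):
    p, r, f = trivial_acceptor_scores(n_docs, n_relevant=1)
    print(f"{n_docs:5d} docs: Prec.={p:.4f} Rec.={r:.1f} F1={f:.4f}")
```

With one relevant document among n, F1 works out to 2 / (n + 1), which shrinks toward 0 as the test set grows even though recall stays at 1.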
