http://www.cs.cmu.edu/~bsettles/pub/law.ecml10.pdf

"Abstract. Most approaches to classifying media content assume a fixed,
closed vocabulary of labels. In contrast, we advocate machine learning
approaches which take advantage of the millions of free-form tags
obtainable via online crowd-sourcing platforms and social tagging websites.
The use of such open vocabularies presents learning challenges due to
typographical errors, synonymy, and a potentially unbounded set of tag
labels. In this work, we present a new approach that organizes these noisy
tags into well-behaved semantic classes using topic modeling, and learn to
predict tags accurately using a mixture of topic classes. This method can
utilize an arbitrary open vocabulary of tags, reduces training time by 94%
compared to learning from these tags directly, and achieves comparable
performance for classification and superior performance for retrieval. We
also demonstrate that on open vocabulary tasks, human evaluations are
essential for measuring the true performance of tag classifiers, which
traditional evaluation methods will consistently underestimate. We focus
on the domain of tagging music clips, and demonstrate our results using
data collected with a human computation game called TagATune."
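
To make the "mixture of topic classes" idea concrete, here is a minimal sketch of how tag scores could be derived from topics. It assumes (my reading, not something the abstract spells out) that a clip's tag score is the sum over topics of P(tag | topic) weighted by P(topic | clip). The function name `tag_scores`, the topic names, and all probabilities are made up for illustration; in practice the per-topic tag distributions would come from a topic model and the clip's topic weights from a classifier trained on audio features.

```python
def tag_scores(topic_given_clip, tag_given_topic):
    """Score each tag as sum_k P(tag | topic k) * P(topic k | clip)."""
    scores = {}
    for topic, weight in topic_given_clip.items():
        for tag, prob in tag_given_topic[topic].items():
            scores[tag] = scores.get(tag, 0.0) + weight * prob
    return scores


# Hypothetical per-topic tag distributions (each sums to 1).
tag_given_topic = {
    "classical": {"violin": 0.5, "strings": 0.4, "guitar": 0.1},
    "rock": {"guitar": 0.6, "drums": 0.3, "violin": 0.1},
}

# Hypothetical predicted topic weights for one music clip (sums to 1).
topic_given_clip = {"classical": 0.75, "rock": 0.25}

scores = tag_scores(topic_given_clip, tag_given_topic)

# Rank tags from most to least likely for this clip.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because the classifier only has to predict a handful of topic weights rather than one output per tag, adding new tags to the vocabulary does not add new prediction targets, which is one plausible reason for the training-time savings the abstract mentions.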