|Sparse coded images from Caltech256|
Images are represented using "visterms", a combination of color and texture histograms.
|Gene expression time courses ("Impulse")|
76 time courses with 5-13 time points each. Gene expression was measured in yeast following changes in environment.
|NIPS 1-17 data|
Distribution of words in all NIPS papers from 1988 to 2003, including the joint distribution of words and authors. This extends a database prepared by Sam Roweis, which used papers from NIPS 1-12 that were OCR'ed by Yann LeCun. Word occurrences were calculated from PDF or PS files, mostly available on the NIPS web site. Data was processed with the help of Amir Globerson.
Distribution of words, documents and authors
Download the data in MATLAB (V6, R12) format (35 MB)
Data preprocessing description.
If you use this data, please refer to "Euclidean Embedding of Co-occurrence Data", Amir Globerson, Gal Chechik, Fernando Pereira and Naftali Tishby, JMLR 8, 2007. bibtex
Raw text files