January 2011. Krzysztof Gorgolewski
February 2012. Victor Hernandez-Urbina
School of Informatics, University of Edinburgh.
wget http://mallet.cs.umass.edu/dist/mallet-2.0.6.tar.gz
tar zxvf mallet-2.0.6.tar.gz
alias mallet={write here the path where you extracted the file}/mallet-2.0.6/bin/mallet
Also have a look at the MALLET documentation related to topic analysis.
mallet import-file --input imdb-reviews.txt --output imdb-reviews.mallet --keep-sequence
mallet train-topics --input imdb-reviews.mallet --num-topics 50 --inferencer-filename inferencer.mallet --num-iterations 100 --output-topic-keys imdb-reviews-topics.txt
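After training, imdb-reviews-topics.txt holds the top words of each of the 50 topics. The Python sketch below (not part of the original lab) reads such a file. It assumes each line is tab-separated into the topic number, the topic's Dirichlet weight (alpha) and its space-separated top words; check this against your own output before relying on it. The two sample lines are invented for illustration.

```python
def parse_topic_keys(text):
    """Map topic number -> (alpha, top words) from MALLET topic-keys text.

    Assumes lines of the form: topic<TAB>alpha<TAB>word word word ...
    """
    topics = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic, alpha, words = line.split("\t", 2)
        topics[int(topic)] = (float(alpha), words.split())
    return topics


# Invented two-topic excerpt, for illustration only (not real MALLET output).
sample = "0\t0.02\tfilm plot acting story\n1\t0.05\tmusic score song soundtrack\n"
topics = parse_topic_keys(sample)
for number, (alpha, words) in sorted(topics.items()):
    print(number, alpha, " ".join(words[:3]))
```

To use it on the real file, pass in its contents, e.g. parse_topic_keys(open("imdb-reviews-topics.txt").read()).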
mallet import-file --input new_sample_review.txt --output new_sample_review.mallet --keep-sequence --remove-stopwords --use-pipe-from imdb-reviews.mallet
mallet infer-topics --input new_sample_review.mallet --inferencer inferencer.mallet --output-doc-topics inferred_topics.txt
The results will be saved in inferred_topics.txt.
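inferred_topics.txt records, for each document, how strongly each topic is present. As a hedged sketch, the Python function below extracts the topic proportions per document. It assumes a "#" header line followed by rows of the form "index name topic proportion topic proportion ..." (the layout we expect from MALLET 2.0.6; this format has changed between MALLET versions, so inspect your own file first). The sample row and the document name new_review are invented.

```python
def parse_doc_topics(text):
    """Map document name -> [(topic, proportion), ...], largest proportion first.

    Assumes a '#' header, then whitespace-separated rows:
    index name topic proportion topic proportion ...
    """
    docs = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        name, rest = fields[1], fields[2:]
        pairs = [(int(rest[i]), float(rest[i + 1]))
                 for i in range(0, len(rest) - 1, 2)]
        docs[name] = sorted(pairs, key=lambda p: p[1], reverse=True)
    return docs


# Invented example row, for illustration only.
sample = "#doc name topic proportion ...\n0 new_review 7 0.61 3 0.22 12 0.05\n"
docs = parse_doc_topics(sample)
topic, proportion = docs["new_review"][0]
print(topic, proportion)  # 7 0.61
```

The dominant topic number can then be looked up in imdb-reviews-topics.txt to see its top words.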
install.packages("lsa")
library(lsa)
When the system asks whether you would like to create a personal library for R, answer yes.
matrix<-textmatrix("docs/", stopwords=c("the","a","an","in","of","for","to","and"))
Then inspect the contents of this new variable. (In the command above, don't forget to replace "docs/" with the name of the folder in which you uncompressed the files!)
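Conceptually, textmatrix() builds a term-document matrix: one row per term, one column per file, and raw term counts as entries. The Python sketch below mimics that on two invented in-memory documents with the same stopword list. It is a simplification: textmatrix() has further options (such as word-length limits) that are ignored here.

```python
import re
from collections import Counter

# Same stopword list as in the textmatrix() call above.
STOPWORDS = {"the", "a", "an", "in", "of", "for", "to", "and"}


def term_document_matrix(docs, stopwords=STOPWORDS):
    """Return (terms, doc_names, rows) where rows[i][j] counts terms[i] in doc j."""
    counts = {}
    for name, text in docs.items():
        tokens = re.findall(r"[a-z]+", text.lower())  # lowercase, letters only
        counts[name] = Counter(t for t in tokens if t not in stopwords)
    doc_names = sorted(counts)
    terms = sorted(set().union(*counts.values()))
    rows = [[counts[d][t] for d in doc_names] for t in terms]
    return terms, doc_names, rows


# Two invented documents standing in for the files in the docs folder.
docs = {"d1.txt": "The computer and the user", "d2.txt": "A user of the system"}
terms, doc_names, rows = term_document_matrix(docs)
for term, row in zip(terms, rows):
    print(term, row)
```

Note how the stopwords never appear as rows, and how "user" gets a count in both columns.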
LSAspace<-lsa(matrix,dims=dimcalc_raw())
Take a look at what the function dimcalc_raw() does. Also, inspect the contents of the LSA space. Do you understand what you see? If not, look at the lecture notes and at the help page of the function lsa(). Remember that LSA performs a singular value decomposition (SVD) of the term-document matrix, a transformation closely related to PCA.
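dimcalc_raw() is one of the functions lsa() can use to decide how many singular values to keep. As we read the lsa documentation, it keeps all of them, i.e. no dimensionality reduction. The Python sketch below contrasts that rule with a hypothetical share-based rule (similar in spirit to the package's dimcalc_share(), but not its actual code) on made-up singular values.

```python
def dims_raw(singular_values):
    """Keep every dimension: no reduction (the behaviour attributed to dimcalc_raw())."""
    return len(singular_values)


def dims_share(singular_values, share=0.5):
    """Hypothetical rule: smallest k whose top-k singular values
    account for at least `share` of their total sum."""
    total = sum(singular_values)
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s
        if running >= share * total:
            return k
    return len(singular_values)


sk = [4.0, 2.0, 1.0, 1.0]  # made-up singular values, sorted largest first
print(dims_raw(sk))         # 4
print(dims_share(sk, 0.5))  # 1  (4 out of 8 already reaches half)
print(dims_share(sk, 0.7))  # 2
```

With dimcalc_raw() the LSA space therefore has as many dimensions as there are singular values, which is why it looks so much like the plain SVD.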
svd(matrix)
Compare this to the output of lsa(). What do you notice?
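To see why svd() and lsa() look alike, here is the same decomposition in Python with NumPy on a small invented matrix. We assume that lsa()'s tk, sk and dk correspond to the u, d and v components of R's svd(). Singular vectors are only determined up to sign, so individual columns may come out flipped between implementations.

```python
import numpy as np

# Invented 4-term x 3-document count matrix (rows = terms, columns = documents).
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

# Thin SVD: X = U @ diag(s) @ Vt, with singular values in s sorted largest first.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Assumed correspondence with lsa(): tk = term vectors, sk = singular values,
# dk = document vectors (one row per document).
tk, sk, dk = U, s, Vt.T
print(tk.shape, sk.shape, dk.shape)  # (4, 3) (3,) (3, 3)
print(np.round(sk, 3))
```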
round(LSAspace$tk %*% diag(LSAspace$sk) %*% t(LSAspace$dk))
What do you notice here? What does this remind you of from the lecture?
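When no dimensions are dropped, multiplying the three factors back together recovers the original matrix. A NumPy sketch of the same product on a small invented matrix:

```python
import numpy as np

# Invented 4-term x 3-document count matrix.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
tk, sk, dk = U, s, Vt.T

# Same product as tk %*% diag(sk) %*% t(dk) in R, keeping every dimension.
X_back = tk @ np.diag(sk) @ dk.T
print(np.round(X_back).astype(int))
print(np.allclose(X_back, X))  # True
```

The rounding plays the same role as round() in the R command: it hides tiny floating-point errors.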
newLSAspace<-lsa(matrix, dims=2)
This command is identical to the previous lsa() call, except that now we specify the number of dimensions of the LSA space explicitly.
Now reconstruct the term-document matrix from this two-dimensional space:
newMatrix<-round(as.textmatrix(newLSAspace),2)
Take a look at it and compare it to the original matrix. Do you notice anything strange in this reconstruction?
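With dims=2 only the two largest singular values and their vectors are kept, so the product is the best rank-2 approximation of the matrix in the least-squares sense, not the matrix itself. The NumPy sketch below, on an invented 4x3 matrix, truncates the factors and rebuilds the matrix; in general the entries are no longer whole counts, and they can even be negative.

```python
import numpy as np

# Invented 4-term x 3-document count matrix.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

k = 2  # analogous to dims=2
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values and their vectors.
tk, sk, dk = U[:, :k], s[:k], Vt[:k, :].T
X2 = tk @ np.diag(sk) @ dk.T

print(np.round(X2, 2))            # compare with X entry by entry
print(np.linalg.matrix_rank(X2))  # 2
```

The size of the reconstruction error is set by the discarded singular values; here only one is dropped, so the Frobenius norm of X - X2 equals s[2].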
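Now compare which terms are associated with "computer" in the original matrix:
associate(matrix,"computer")
And then in the reduced one:
associate(newMatrix,"computer")
What do you see? Try this with other terms.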
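As we understand it, associate() returns the terms whose row vectors are close to the given term's vector, by default using the cosine measure. A pure-Python sketch of that idea follows; the terms, the counts and the 0.7 threshold are illustrative assumptions, and the real function's options and output format differ.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def associated_terms(terms, rows, term, threshold=0.7):
    """Other terms whose rows have cosine >= threshold with `term`, best first."""
    target = rows[terms.index(term)]
    scores = [(t, cosine(target, r)) for t, r in zip(terms, rows) if t != term]
    return sorted([(t, round(c, 3)) for t, c in scores if c >= threshold],
                  key=lambda p: p[1], reverse=True)


# Invented term-document counts (rows = terms, columns = documents).
terms = ["computer", "system", "user", "tree"]
rows = [[1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 1]]
print(associated_terms(terms, rows, "computer"))  # [('system', 0.816)]
```

Applying the same measure to the rows of a rank-reduced matrix is what can make terms that never share a document end up associated.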
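Finally, compute the coordinates of the terms in the two-dimensional LSA space and plot them:
t.locs<-newLSAspace$tk %*% diag(newLSAspace$sk)
plot(t.locs)
What do you see in this plot? If this is not clear, try running the following two commands and then try again to interpret the plot.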
plot(t.locs,type="n")
text(t.locs, labels=rownames(newLSAspace$tk))
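t.locs scales each retained term vector by its singular value, giving every term a coordinate in the two-dimensional LSA space. A NumPy sketch of the same computation on an invented matrix with hypothetical term labels, printing the coordinates instead of plotting them (signs may differ from R's output):

```python
import numpy as np

# Invented 4-term x 3-document count matrix with hypothetical term labels.
terms = ["computer", "system", "user", "tree"]
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Same as tk %*% diag(sk) in R: scale each retained dimension by its singular value.
t_locs = U[:, :k] @ np.diag(s[:k])

for term, (x, y) in zip(terms, t_locs):
    print(f"{term:10s} {x:7.3f} {y:7.3f}")
```

Because the retained document vectors are orthonormal, cosines between rows of the rank-2 reconstruction equal cosines between these coordinate rows (up to the rounding applied to newMatrix), so terms lying in a similar direction from the origin are the ones associate(newMatrix, ...) reports as related.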
q()
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh.