NLP and Speech Software under DICE

Information about corpora and other language and speech data can be found here.

General Information

A central repository has been created that contains software for research in NLP and speech processing. It is meant as a place where users can install software that is of general interest, but that is not used widely enough to justify full DICE support.

The software in this repository is maintained by individual users; please do not email central Computing Support if you have any queries or problems regarding the software. Instead, email the contact person in the list below. For general questions, please email Frank Keller.

This page only contains the most essential information regarding NLP and speech software under DICE. Please go to the NLP and Speech Software Wiki for local documentation and tips and tricks contributed by users of the software.

Using the Software

In order to use the software installed here, you have to augment your PATH, MANPATH and PERLLIB variables, e.g., by adding the following lines to your .brc file:
export PATH="/group/project/nlp-speech/bin:${PATH}"
export MANPATH="/group/project/nlp-speech/man:${MANPATH}"
export PERLLIB="/group/project/nlp-speech/lib/perl5/5.8.5:/group/project/nlp-speech/lib/perl5/site_perl/5.8.5:${PERLLIB}"
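If your .brc is sourced more than once, the lines above add the same entries repeatedly. The following is a minimal sketch of one way to avoid that; prepend_once and NLP_ROOT are hypothetical names introduced here for illustration and are not provided by DICE. MANPATH is deliberately left out, because some man implementations give an empty entry in MANPATH a special meaning, so the plain export line may be the safer choice for it.

```shell
# Hypothetical helper (not part of DICE): prepend a directory to a
# colon-separated variable only if it is not already listed there.
NLP_ROOT="/group/project/nlp-speech"

prepend_once() {
    # $1 = variable name, $2 = directory to prepend
    eval "_po_current=\${$1:-}"
    case ":${_po_current}:" in
        *":$2:"*) ;;   # already present: leave the variable unchanged
        *) export "$1=$2${_po_current:+:${_po_current}}" ;;
    esac
}

prepend_once PATH "${NLP_ROOT}/bin"
# Prepend in reverse order so that the 5.8.5 directory ends up first,
# matching the order of the PERLLIB line above.
prepend_once PERLLIB "${NLP_ROOT}/lib/perl5/site_perl/5.8.5"
prepend_once PERLLIB "${NLP_ROOT}/lib/perl5/5.8.5"
```

Calling prepend_once a second time with the same arguments leaves the variable untouched.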

Directory Structure

The central directory for NLP and speech software is:
/group/project/nlp-speech/
This directory should be mounted on all DICE machines. The directory structure is explained in more detail in the next section.

Installing Software

To install software in the central repository, you have to be a member of the Unix group nlp-speech. Please email Frank Keller for more information.
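To check whether your account is already in the group, something like the following sketch may help. The has_group function is a hypothetical helper written for this example; it only inspects the list of group names reported by id.

```shell
# Sketch: check whether the current user is in the nlp-speech Unix group.
has_group() {
    # $1 = group name, $2 = space-separated list of group names
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_group nlp-speech "$(id -Gn 2>/dev/null)"; then
    echo "You are a member of nlp-speech."
else
    echo "You are not a member of nlp-speech."
fi
```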

When installing software, please pay attention to the following:

  1. Respect the directory structure. Instead of creating a new directory for each package, use the pre-existing directories as follows:
    bin/        binary executables (see below)
    doc/        documentation
    etc/        configuration files
    include/    header files
    lib/        library files
    man/        manual pages
    share/      shared data files
    src/        compiled sources (see below)
    pkg/        packages (pristine sources; see below)

  2. Please bear in mind that you won't be able to do anything that requires root privileges during the installation. In most cases, this means that you have to set the installation prefix correctly, as otherwise the software will attempt to install itself in global directories such as /usr/bin. For example, if the package uses a configure script and a makefile, run:
    ./configure --prefix=/group/project/nlp-speech/
    make install
    If the package you want to install is a Perl library, then you typically have to use:
    perl Makefile.PL PREFIX=/group/project/nlp-speech/
    make install
  3. When you install the software, please put the compiled sources in the src directory (see above). At the same time, please also keep the pristine sources (e.g., the tar or rpm file), and put them in the pkg directory (see above).

    Having the pristine sources is important if your software needs to be recompiled later on a different architecture or on a different version of DICE. Note that RPMs are the preferred way of archiving software, as we anticipate moving to an RPM-based local distribution of NLP and speech software in the next release of DICE.

  4. Once you have successfully installed a software package, please add it to the list on this page. Please also include your name (in case there are any questions), and a link to a web page with documentation for the software.

    In order to be able to edit this page (web/resources/nlp/index.html), you will have to have the requisite CVS permissions. All members of the group nlp-speech should have been given these permissions. If that's not the case, please file a Support Request. More information on how to use CVS can be found here.
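Steps 1 to 3 above can be sketched as a single shell function. This is an illustration under stated assumptions, not an official install script: the package name foo-1.0 is made up, run and install_tarball are hypothetical helpers, and the package is assumed to ship a configure script and to unpack into a directory named after itself. DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Sketch only: "foo-1.0" is a made-up package, and nothing is executed
# unless DRY_RUN is set to 0.
NLP_ROOT="${NLP_ROOT:-/group/project/nlp-speech}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The body runs in a subshell, so the cd does not affect the caller.
install_tarball() (
    # $1 = pristine tarball, $2 = directory name the tarball unpacks to
    tarball="$1"
    name="$2"
    run cp "$tarball" "${NLP_ROOT}/pkg/" &&            # keep the pristine sources
    run tar -xzf "$tarball" -C "${NLP_ROOT}/src" &&    # build tree goes in src/
    run cd "${NLP_ROOT}/src/${name}" &&
    run ./configure --prefix="${NLP_ROOT}/" &&         # install under the repository, not /usr
    run make &&
    run make install
)

install_tarball foo-1.0.tar.gz foo-1.0
```

The steps are chained with &&, so a failed step stops the rest. For a Perl library, the configure line would be replaced by the perl Makefile.PL call shown in step 2.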

List of Centrally Installed Software

The following NLP/speech software has been installed by central DICE Support. To find out more about each package, please use the command rpm -qi package.

Name | Version | Description | Installed by | Documentation
CorpusWorkbench | 3.0 | Corpus query tools, incl. CQP | Central | here
R | 1.9.1 | Package for statistical computing | Central | here
WordNet | 1.7.1 | Lexical database | Central | here
bow | 0.2 | Toolkit for statistical language modeling, text retrieval, classification and clustering | Central | here
graphviz | 1.10 | Graph visualization software | Central | here
netlab | 3.2 | Neural network toolkit for Matlab | Central | here
NLTK | 2.0b6 | Natural Language Toolkit | Central | here
pipestat | 5.4 | Command-line-based statistical analysis | Central | here
splus | 7.0 | Statistical analysis package with graphical frontend | Central | here
tnt | 2.2 | Thorsten Brants' part-of-speech tagger | Central | here
weka | 3.2.3 | Machine learning algorithms in Java | Central | here

List of User-Installed Software

The following NLP/speech software has been installed by individual users.

Name | Version | Description | Contact | Documentation
Bilingual Sentence Aligner | 1.0 | Bob Moore's tool for sentence alignment in parallel bilingual corpora | Mirella Lapata | here
BoosTexter | 2.1 | Classifier using boosting | Mirella Lapata | here
BootCaT Toolkit | 0.1.2 | Simple Utilities for Bootstrapping Corpora and Terms from the Web | Mirella Lapata | here
C4.5 | release 8, 10/95 | Classification tree generator | Mirella Lapata | here
CDE | 1.0 | CCG and DRT Environment | Johan Bos | here
CMU-Cambridge Toolkit | 2.05 | Statistical Language Modeling Toolkit | Frank Keller | here
Cass/Scol | 1h | Steve Abney's partial parser | Mirella Lapata | here
Charniak_parser | 05Mar18 | Eugene Charniak's parser | Frank Keller | here
Cluto | 2.1.1 | Clustering high-dimensional datasets | Mirella Lapata | here
DBparser | 0.9.9a | Dan Bikel's parser | Frank Keller | here
ESPS | 6.0 | ESPS/waves+ with EnSig | Volker Strom | here
Evalb | -- | Parser bracketing evaluation tool | Frank Keller | here
Festival | 1.96 | Speech synthesis engine | Rob Clark | here
German Chunker | 1.0 | Chunker for German developed by Helmut Schmid and Sabine Schulte im Walde | Mirella Lapata | here
Giza++ | 2.0 | Training of statistical translation models | Mirella Lapata | here
Gsearch | 2.07 | Tool for finding syntactic patterns in unparsed text | Frank Keller | here
HTK | 3.4, Oct '06 | HMM Tool Kit | Volker Strom | here
Infomap NLP | 0.8.5 | Latent Semantic Analysis for NLP | Mirella Lapata | here
Kino | 0.6.5 | Digital video editor | Rob Clark | here
LDA-C | 1.0 | C implementation of latent Dirichlet allocation | Mirella Lapata | here
LP Solve | 5.5 | Mixed integer linear programming solver | Mirella Lapata | here
LT Chunk | 3.0 | LTG syntactic chunker | Mirella Lapata | here
LexChainer | 1.0 | A tool to find semantically related words within unrestricted texts | Mirella Lapata | here
LingPipe | 2.1.1 | Java tools for the linguistic analysis of natural language data | Mirella Lapata | here
LoPar | 3.0 | Helmut Schmid's left-corner parser | Frank Keller | here
MaltParser | 1.2 | Nivre's data-driven dependency parser | Ewan Klein | here
Mary TTS | 3.0 | Saarland University's TTS | Volker Strom | here
Megam | 0.3 | Maximum entropy model optimization package | Mirella Lapata | here
Minipar | 1.0 | Dekang Lin's broad-coverage parser | Mirella Lapata | here
Morpha/morphg/ana | 1.0 | John Carroll's morphological tools | Frank Keller | here
Mxpost | 1997 | Adwait Ratnaparkhi's POS tagger | Volker Strom | here
NSP | 0.67 | Ted Pedersen's N-gram Statistics Package | Mirella Lapata | here
Normalized Cut | 1.0 | Matlab code for normalized cut image segmentation | Mirella Lapata | here
PDTB Tools | 1.2.3 | Tools for the Penn Discourse Treebank | Frank Keller | here
Pharaoh | 1.2.3 | Beam search decoder for phrase-based statistical machine translation models | Mirella Lapata | here
Praat | 4.3.24 | Analyze and synthesize speech | Volker Strom | here
Primula | 1.0 | Inference with relational Bayesian networks | Mirella Lapata | here
Prover9 & Mace4 | 2009-02A | Automated first-order theorem prover & finite model builder | Ewan Klein | here
Proximity | 4.0 | System for relational knowledge discovery | Mirella Lapata | here
QuickNet | 3.11 | Tools and C++ library for using Multi-Layer Perceptrons (MLPs) | Partha Lal | here
RASP | October 2002 | John Carroll and Ted Briscoe's parser | Mirella Lapata | here
Ratingtest | 1.0 | Tool for web-based listening tests | Volker Strom | here
Rule Based Tagger | 1.14 | Eric Brill's rule-based part-of-speech tagger | Mirella Lapata | here
SNoW | 3.1 | Sparse Network of Winnows | Mirella Lapata | here
SRILM | 1.4.5 | LM toolkit from SRI International | Volker Strom | here
SVM light | 6.01 | Support vector machines | Mirella Lapata | here
SamIam | 2.3 | Modeling and reasoning with Bayesian networks | Mirella Lapata | here
SCTK | 2.1.7 | NIST's Speech Recognition Scoring Toolkit, version 2.1.7-20070222-1638 | Partha Lal | here
SenseClusters | 0.71 | Cluster similar contexts together using unsupervised methods | Mirella Lapata | here
Snack | 2.2.10 | Sound Toolkit for Tcl/Tk or Python | Frank Keller | here
Sonic | 2.0 beta 5 | University of Colorado Speech Recognizer | Volker Strom | here
Spade | 0.9 | Sentence-level parsing for discourse | Mirella Lapata | here
SPRACHcore | 2004-08-26 | ICSI Speech toolkit | Partha Lal | here
Text Similarity | 3.0 | Ted Pedersen's Perl package for computing text similarity measures | Frank Keller | here
Tgrep | 1.14 | Treebank search tool | Frank Keller | here
TigerSearch | 2.1 | Corpus search tools | Frank Keller | here
Timbl | 5.1 | Tilburg Memory-based Learner | Mirella Lapata | here
TinySVM | 0.09 | Support Vector Machines | Frank Keller | here
TreeView | 1.0 | Tree viewing tool | Frank Keller | here
WordNet QueryData | 3.0 | Jason Rennie's Perl package for querying WordNet | Frank Keller | here
WordNet Similarity | 3.0 | Ted Pedersen's Perl package for computing WordNet-based similarity measures | Frank Keller | here
Yale | 3.2 | Environment for machine learning experiments and data mining | Frank Keller | here
YamCha | 0.33 | Yet Another Multipurpose Chunk Annotator | Frank Keller | here

Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh