Please visit the software page at clulab.org.
This is the code and data from our EMNLP 2012 paper. Additionally, this package includes all the source code from our KBP 2011 paper on slot filling.
I contribute to Stanford's CoreNLP suite of natural language analysis tools. This includes tokenization, morphological analysis, POS tagging, named and numeric entity recognition, syntactic parsing (both constituents and dependencies), and coreference resolution. The software allows you to generate all these annotations with just two lines of code: first, create a StanfordCoreNLP object; then, call the annotate(new Annotation(String yourText)) method.
This software is the event parser component from the Stanford and FAUST submissions to the BioNLP shared task.
This code implements a linear interpolation of several linear-time parsing models (all based on MaltParser). Each individual parser runs in its own thread, which means that, if enough cores are available, the overall runtime is close to that of a single Malt parser. The resulting parser has state-of-the-art performance yet remains very fast.
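The idea of running each base parser in its own thread and then combining the outputs with interpolation weights can be sketched as follows. This is only an illustration under stated assumptions, not the code shipped in the package: the `BaseParser` interface, the `EnsembleSketch` class, and the per-token weighted vote over head indices are all hypothetical, and no MaltParser API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EnsembleSketch {
    /** Hypothetical stand-in for one base parser: heads[i] is the head of token i+1, 0 means root. */
    interface BaseParser {
        int[] parse(String[] tokens);
    }

    /**
     * Runs every base parser in its own thread, then picks for each token
     * the head with the largest weighted vote (assumed combination scheme).
     */
    static int[] parse(String[] tokens, List<BaseParser> parsers, double[] weights) {
        ExecutorService pool = Executors.newFixedThreadPool(parsers.size());
        try {
            // Submit all parsers first so that they run concurrently.
            List<Future<int[]>> futures = new ArrayList<>();
            for (BaseParser p : parsers) {
                futures.add(pool.submit(() -> p.parse(tokens)));
            }
            int n = tokens.length;
            // votes[i][h] = total weight of the parsers that attach token i+1 to head h.
            double[][] votes = new double[n][n + 1];
            for (int k = 0; k < futures.size(); k++) {
                int[] heads = futures.get(k).get();
                for (int i = 0; i < n; i++) {
                    votes[i][heads[i]] += weights[k];
                }
            }
            int[] result = new int[n];
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int h = 1; h <= n; h++) {
                    if (votes[i][h] > votes[i][best]) {
                        best = h;
                    }
                }
                result[i] = best;
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because all parsers are submitted before any result is collected, the wall-clock time is bounded by the slowest base parser plus the cheap voting step, provided there is one free core per parser. Note that this simplified per-token vote does not guarantee a well-formed tree; a real system would need an additional step to enforce that.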
SwiRL is a Semantic Role Labeling (SRL) system for English constructed on top of the full syntactic analysis of text. It achieved state-of-the-art performance in the CoNLL 2005 SRL evaluation.
Includes a named-entity recognizer, a syntactic chunker, a POS tagger, and a "smart" tokenizer. All processors are learned using the MiLL machine learning library (see below).
Includes SVM, Maximum Entropy, and Perceptron classifiers under a single, simple interface.
All algorithms support multi-class problems.
MiLL includes the novel Perceptron algorithm with dynamic uneven margins that I designed for my ACE Information Extraction system (see the publication page). MiLL is distributed together with BIOS, but it can be used independently of BIOS for any ML task.
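For readers unfamiliar with uneven margins, the sketch below shows a binary Perceptron with fixed, class-specific margins, in the spirit of the standard uneven-margin Perceptron from the literature. It is not the dynamic variant implemented in MiLL and it does not use MiLL's interface: the class name, the parameters `tauPos`, `tauNeg`, and `eta`, and the simplified bias update are all assumptions made for illustration.

```java
public class UnevenMarginPerceptron {
    private final double[] w;    // weight vector
    private double b = 0.0;      // bias term
    private final double tauPos; // margin required for positive examples
    private final double tauNeg; // margin required for negative examples
    private final double eta;    // learning rate

    UnevenMarginPerceptron(int dim, double tauPos, double tauNeg, double eta) {
        this.w = new double[dim];
        this.tauPos = tauPos;
        this.tauNeg = tauNeg;
        this.eta = eta;
    }

    double score(double[] x) {
        double s = b;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * x[j];
        }
        return s;
    }

    int predict(double[] x) {
        return score(x) > 0 ? 1 : -1;
    }

    /** One pass over the data; labels are +1 or -1. Returns the number of updates made. */
    int trainEpoch(double[][] xs, int[] ys) {
        int updates = 0;
        for (int i = 0; i < xs.length; i++) {
            // The required margin depends on the class of the example.
            double margin = ys[i] == 1 ? tauPos : tauNeg;
            if (ys[i] * score(xs[i]) <= margin) {
                for (int j = 0; j < w.length; j++) {
                    w[j] += eta * ys[i] * xs[i][j];
                }
                b += eta * ys[i]; // simplified bias update (assumption)
                updates++;
            }
        }
        return updates;
    }
}
```

Choosing a larger margin for the positive class than for the negative one pushes the decision boundary away from the positive examples, which is the usual motivation when positives are rare, as is often the case in information extraction.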
Syntactic parser heavily based on Michael Collins' Model 1 parser.
The Spear package also includes a corpus of parsed questions I created from the TREC 8-12 evaluations. This corpus was crucial in improving the parser's performance on questions.