The Spear Parser


This is yet another dependency parser, heavily based on Michael Collins' code. It was trained on TreeBank version 2.0 and on an additional QuestionBank developed from the TREC 8-12 questions.

Why should you use this parser instead of Michael Collins' original parser? There are at least four reasons:

  1. This parser contains code to completely retrain Collins' parsing Model 1 (fully compatible with his parser). Michael does not release his training code, so if you need to retrain the parser, this package provides all the needed software.

  2. The parser comes pre-trained not only on TreeBank version 2.0, which contains very few questions, but also on a corpus of questions I developed. This means that Spear parses questions better than Collins' parser, which makes it more suitable for applications where questions are common, e.g. Question Answering.

  3. The parsing model is stored in shared memory instead of the regular heap. This means that initialization is sped up significantly for subsequent executions of the parsing process.

  4. It has a clean, easy-to-use API.
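
To illustrate the shared-memory idea from reason 3, here is a minimal Python sketch. It is not Spear's actual code: the source does not describe how the parser lays out its model in memory, so the segment name, the 8-byte length header, and the helper names (load_model_bytes, attach_or_create, read_model) are all assumptions made for illustration. The first process to run pays the cost of building the model and copies it into a named segment; later processes find the segment by name and attach to it instead of rebuilding.

```python
import os
from multiprocessing import shared_memory

HEADER = 8  # bytes reserved for the payload length (assumed layout)


def load_model_bytes() -> bytes:
    """Stand-in for the slow step of reading and building the model."""
    return b"model-parameters" * 4


def attach_or_create(name: str):
    """Return (segment, created).

    Attach to the named segment if it already exists; otherwise build the
    model once and copy it into a newly created segment.
    """
    try:
        return shared_memory.SharedMemory(name=name), False
    except FileNotFoundError:
        data = load_model_bytes()
        seg = shared_memory.SharedMemory(
            name=name, create=True, size=HEADER + len(data)
        )
        # The OS may round the segment up to a page boundary, so store the
        # real payload length in the header.
        seg.buf[:HEADER] = len(data).to_bytes(HEADER, "little")
        seg.buf[HEADER:HEADER + len(data)] = data
        return seg, True


def read_model(seg) -> bytes:
    """Copy the model payload out of an attached segment."""
    length = int.from_bytes(bytes(seg.buf[:HEADER]), "little")
    return bytes(seg.buf[HEADER:HEADER + length])


if __name__ == "__main__":
    name = f"spear_model_demo_{os.getpid()}"  # hypothetical segment name
    seg, created = attach_or_create(name)
    try:
        print("built model" if created else "reused existing model")
        print(len(read_model(seg)), "bytes available")
    finally:
        seg.close()
        seg.unlink()  # a long-lived model would be left in place instead
```

In a real deployment the segment would be left in place between runs rather than unlinked, which is what lets later executions skip the loading step.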

Click here to download the latest version of the parser. The project README contains installation and usage instructions. Click here to download the question bank I created and used as additional knowledge for the parser.