NLP Projects at Reykjavik University

Our main current project is the development of IceNLP, an open-source natural language processing (NLP) toolkit for analysing and processing Icelandic.

IceNLP currently consists of a tokeniser, a morphological analyser (IceMorphy), a linguistic rule-based part-of-speech tagger (IceTagger), a trigram tagger (TriTagger), a perceptron tagger (IceStagger), and a shallow (finite-state) parser (IceParser). IceNLP is written as a collection of Java classes.
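To make the division of labour between these components concrete, here is a minimal sketch of a tokenise, tag, shallow-parse pipeline in Java. It is an illustration only: the class PipelineSketch, its methods, the tiny lexicon, the coarse tags (NOUN, VERB, ADV, PUNCT) and the bracket labels are all hypothetical and are not IceNLP's classes, tagset or annotation scheme. A dictionary lookup stands in for the real taggers, and a one-token-per-phrase rule stands in for the finite-state parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy pipeline; none of these names come from IceNLP.
class PipelineSketch {

    // Toy lexicon standing in for morphological analysis and tagging.
    static final Map<String, String> LEXICON =
            Map.of("hesturinn", "NOUN", "sefur", "VERB", "heima", "ADV");

    /** Splits on whitespace and detaches trailing punctuation as separate tokens. */
    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) continue;
            int end = chunk.length();
            while (end > 0 && !Character.isLetterOrDigit(chunk.charAt(end - 1))) end--;
            if (end > 0) tokens.add(chunk.substring(0, end));
            for (int i = end; i < chunk.length(); i++) {
                tokens.add(String.valueOf(chunk.charAt(i)));
            }
        }
        return tokens;
    }

    /** Assigns a coarse tag by lexicon lookup; unknown words get UNKNOWN. */
    static String tag(String token) {
        if (token.length() == 1 && !Character.isLetterOrDigit(token.charAt(0))) {
            return "PUNCT";
        }
        return LEXICON.getOrDefault(token.toLowerCase(Locale.ROOT), "UNKNOWN");
    }

    /** Wraps each tagged token in an illustrative phrase bracket. */
    static String bracket(String token, String tag) {
        switch (tag) {
            case "NOUN": return "[NP " + token + "]";
            case "VERB": return "[VP " + token + "]";
            case "ADV":  return "[AdvP " + token + "]";
            default:     return token; // punctuation and unknown words stay bare
        }
    }

    /** Runs the three stages in order and returns the bracketed sentence. */
    static String analyse(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenise(text)) {
            out.add(bracket(token, tag(token)));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Prints: [NP Hesturinn] [VP sefur] [AdvP heima] .
        System.out.println(analyse("Hesturinn sefur heima."));
    }
}
```

The point of the sketch is the staging: each step consumes the output of the previous one, so a tagger or parser can be swapped without touching the tokeniser.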

The taggers use the tagset that was constructed during the compilation of the Icelandic Frequency Dictionary corpus.

IceParser produces output according to a specific shallow annotation scheme.

You can test IceNLP here.

Source code and executables for IceNLP are available on GitHub.

IceNLP is a step towards the goal of developing a Basic Language Resource Kit (BLARK) for Icelandic. A BLARK for a language is the minimal set of basic resources (software modules, corpora, dictionaries, etc.) needed for further research and development in language technology.