Our main project at present is the development of IceNLP, an open-source natural language processing (NLP) toolkit for analysing and processing Icelandic.
IceNLP currently consists of a tokeniser, a morphological analyser (IceMorphy), a linguistic rule-based part-of-speech tagger (IceTagger), a trigram tagger (TriTagger), a perceptron tagger (IceStagger), and a shallow (finite-state) parser (IceParser). IceNLP is written as a collection of Java classes.
The taggers use the tagset that was constructed during the compilation of the Icelandic Frequency Dictionary corpus.
IceParser produces output according to a specific shallow annotation scheme.
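To illustrate how components of this kind fit together, here is a minimal sketch of a tokenise, tag, chunk pipeline. It is not IceNLP's API: the class `PipelineSketch`, its methods, the toy English lexicon, the coarse tags (DET, ADJ, NOUN, ...) and the bracketed output format are all assumptions made for illustration. The real taggers use the Icelandic Frequency Dictionary tagset, and IceParser follows its own annotation scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from IceNLP.
class PipelineSketch {

    /** A word paired with its (toy) part-of-speech tag. */
    static final class TaggedToken {
        final String word;
        final String tag;
        TaggedToken(String word, String tag) { this.word = word; this.tag = tag; }
    }

    /** Stage 1: split on whitespace and detach one trailing punctuation mark. */
    static List<String> tokenise(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String raw : sentence.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            char last = raw.charAt(raw.length() - 1);
            if (raw.length() > 1 && ".,!?".indexOf(last) >= 0) {
                tokens.add(raw.substring(0, raw.length() - 1));
                tokens.add(String.valueOf(last));
            } else {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    /** Stage 2: toy tagger, a plain lexicon lookup; unknown words get "UNK". */
    static List<TaggedToken> tag(List<String> tokens, Map<String, String> lexicon) {
        List<TaggedToken> tagged = new ArrayList<>();
        for (String t : tokens) {
            tagged.add(new TaggedToken(t, lexicon.getOrDefault(t.toLowerCase(), "UNK")));
        }
        return tagged;
    }

    /** Stage 3: toy shallow parser, brackets maximal runs of DET/ADJ/NOUN as [NP ... ]. */
    static String chunk(List<TaggedToken> tagged) {
        StringBuilder out = new StringBuilder();
        boolean inNp = false;
        for (TaggedToken t : tagged) {
            boolean nominal = t.tag.equals("DET") || t.tag.equals("ADJ") || t.tag.equals("NOUN");
            if (nominal && !inNp) {
                out.append("[NP ");
                inNp = true;
            } else if (!nominal && inNp) {
                out.append("] ");
                inNp = false;
            }
            out.append(t.word).append(' ');
        }
        if (inNp) out.append("]");
        return out.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, String> lexicon = Map.of(
            "the", "DET", "small", "ADJ", "dog", "NOUN", "barks", "VERB", ".", "PUNCT");
        List<String> tokens = tokenise("The small dog barks.");
        System.out.println(chunk(tag(tokens, lexicon)));
        // prints: [NP The small dog ] barks .
    }
}
```

Each stage consumes only the output of the previous one, which is why a rule-based, trigram or perceptron tagger could in principle be swapped in at the second stage without touching the tokeniser or the parser.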
You can test IceNLP here.
The source code and executables for IceNLP are available on GitHub.
IceNLP is a step towards developing a Basic Language Resource Kit (BLARK) for Icelandic. A BLARK is the minimal set of basic resources (software modules, corpora, dictionaries, etc.) that a language needs to support further research and development in language technology.