Created by UZEI, Koloka detects and automatically extracts multiword lexical units (collocations) in Basque and Spanish.
Koloka identifies multiword lexical units by combining linguistic methods (it defines patterns and models of possible category combinations, and detects word combinations that match those patterns) with statistical methods. This allows it to process large volumes of text effectively.
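As an illustration of how a linguistic filter and a statistical filter can be combined, the sketch below keeps only adjacent word pairs whose categories match a pattern, then ranks the survivors by pointwise mutual information (PMI). It is a minimal sketch and not Koloka's actual implementation: the function name `extract_candidates`, the pattern set, the category labels, the use of PMI as the statistic, and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical category patterns (assumption, not Koloka's real inventory).
PATTERNS = {("NOUN", "ADJ"), ("NOUN", "NOUN")}


def extract_candidates(tagged_sentences, patterns=PATTERNS, min_count=2):
    """Return (pair, count, pmi) tuples for adjacent word pairs.

    tagged_sentences: list of sentences, each a list of (word, category).
    A pair is kept only if its categories match a pattern (linguistic step)
    and it occurs at least min_count times (statistical step).
    """
    word_counts = Counter()
    pair_counts = Counter()
    total_pairs = 0

    for sentence in tagged_sentences:
        for word, _ in sentence:
            word_counts[word] += 1
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            total_pairs += 1
            if (t1, t2) in patterns:
                pair_counts[(w1, w2)] += 1

    total_words = sum(word_counts.values())
    scored = []
    for (w1, w2), count in pair_counts.items():
        if count < min_count:
            continue
        p_pair = count / total_pairs
        p_w1 = word_counts[w1] / total_words
        p_w2 = word_counts[w2] / total_words
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        scored.append(((w1, w2), count, pmi))

    # Strongest association first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[2], item[0]))
    return scored


# Tiny invented sample in Spanish, already tagged with categories.
SAMPLE = [
    [("el", "DET"), ("corpus", "NOUN"), ("paralelo", "ADJ"),
     ("es", "VERB"), ("grande", "ADJ")],
    [("un", "DET"), ("corpus", "NOUN"), ("paralelo", "ADJ"),
     ("ayuda", "VERB")],
    [("el", "DET"), ("texto", "NOUN"), ("largo", "ADJ"),
     ("ayuda", "VERB")],
]

if __name__ == "__main__":
    for pair, count, pmi in extract_candidates(SAMPLE):
        print(" ".join(pair), count, round(pmi, 2))
```

A real extractor would also need lemmatisation and patterns longer than two words, which matters especially for Basque, where inflection produces many surface forms of the same unit.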
An automatic extractor of multiword units is important, particularly for Basque, because units of this type can take a variety of forms or combinations within the sequence of words in a sentence. This is why, in terminology work for example, it is hard for a terminologist to systematically detect every candidate term of this type in the corpus being analysed.
The TermiGai term extractor uses Koloka, among other resources.