CoolJugator is a new kind of verb lookup tool. Started back in 2011, and restarted with a completely new approach in 2016, it seeks to alter the way dictionaries work. In late September 2016, we further relaunched with improved site navigation and stability features.
The languages on CoolJugator
Our aim is to develop conjugations for as many languages as possible. The original CoolJugator began with five languages: Lithuanian, Latvian, Estonian, Russian and Modern Greek. In the new 2016 edition of CoolJugator we have kept the data from the original and added further languages, including English, Polish and Esperanto. The new languages are built using the smart morphology approach described below, while the original five are currently stored in a legacy format and are scheduled to be updated to the new version.
We don't have the language you need? Send us your feedback and let us know which other languages you would like to see on CoolJugator!
How CoolJugator seeks to improve conjugation
In the same way as digital media has transformed the publishing industry, the dynamic nature of online language reference materials provides very different opportunities from their non-online counterparts. In particular:
- Word form recognition and on-the-fly generation: traditional reference materials usually only allow lookup of a word's main form (e.g. the infinitive in French, such as avoir – to have), whereas a dynamic tool makes it possible to search by any form, including rare ones (e.g. agîmes – we acted).
- Inter-linguistic compatibility: a decreasing share of internet users (and, by implication, foreign language learners) primarily use English, and only half or fewer of the websites on the internet now use English at all. Yet a disproportionate share of online dictionaries and reference materials is still produced primarily for English speakers and uses English as the primary language of reference. Cross-language implementation is lacking.
- Platform compatibility: up to 37% of users of online language reference websites use mobile devices alongside non-mobile ones. Users on mobile platforms face higher interaction costs and have a greater need for data prioritisation, which calls for more accurate verb lookup and data selection, as well as adaptability to each learner's language, level and preferences.
- Context-sensitive examples: applying machine learning and text corpus analysis makes it possible to supply context-sensitive usage examples for reference entries; to be accurate, however, such examples need to be informed by morphological information and take into account the multiple grammatical variations associated with each word.
How CoolJugator works behind the scenes
We have built or applied software for morphological analysis and generation of word forms, which relies on a hybrid approach that uses both rule-based and statistical models to identify all correct forms for any word of interest. To put it very simply, our software lets us both derive all forms of a word from a given form (for example, given the Spanish word hablar, the software identifies all its other forms, including hablé, hablas, habla, hablaba, hablaría, hablaré, etc.) and do a reverse lookup (an input of habláramos identifies its grammatical categories and the key form hablar).
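The two directions described above can be illustrated with a minimal sketch. It is not CoolJugator's actual code: the rule table, the tag names and the functions generate and analyse are hypothetical, and only a few endings of regular Spanish -ar verbs are covered.

```python
# Hypothetical rule table for regular Spanish -ar verbs: (category tag, ending).
# Only a handful of forms are listed; a real grammar would cover the full paradigm.
AR_RULES = [
    ("infinitive", "ar"),
    ("present.1sg", "o"),
    ("present.2sg", "as"),
    ("present.3sg", "a"),
    ("preterite.1sg", "é"),
    ("imperfect.1sg", "aba"),
    ("future.1sg", "aré"),
    ("conditional.1sg", "aría"),
    ("imperfect_subjunctive.1pl", "áramos"),
]


def generate(infinitive):
    """Forward direction: return {tag: form} for a regular -ar infinitive."""
    if not infinitive.endswith("ar"):
        raise ValueError("this sketch only handles -ar verbs")
    stem = infinitive[:-2]
    return {tag: stem + ending for tag, ending in AR_RULES}


def analyse(form):
    """Reverse direction: return every (infinitive, tag) reading
    whose ending matches the given form."""
    readings = []
    for tag, ending in AR_RULES:
        if form.endswith(ending) and len(form) > len(ending):
            readings.append((form[: -len(ending)] + "ar", tag))
    return readings


forms = generate("hablar")
print(forms["preterite.1sg"], forms["conditional.1sg"])
print(analyse("habláramos"))
```

Suffix matching alone is ambiguous: in this sketch analyse("hablaba") also proposes a non-existent infinitive alongside hablar, which is why candidate readings need to be validated in a further step.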
The software already has the following features:
- Generation and lookup of all forms of any word given any other form: the software takes a word form and can output both the result of its morphological analysis (i.e. the semantic category code) and the full set of forms derived from the stem of that word. This is done by matching the word against the defined rules and checking whether all forms prescribed by a rule pass the Hunspell spellchecking function, which serves as an industry-standard check. Selection of the correct rule can be supported by a statistical model derived from word usage frequency tables. This also allows words not in Hunspell to be inflected according to their most likely inflection patterns, so our software efficiently generates forms even for previously unseen words.
- Inter-linguistic operability: in what we consider the most important aspect of our software's functionality, semantic category codes allow correct forms to be matched between languages in a complex manner. For instance, it is possible to assign matches both between morphological features that are used in the same semantic context across languages (e.g. “я смотрю” and “I look”) and between semantic categories that have more complex inter-language mappings (e.g. the jussive in Esperanto can correspond to both the subjunctive and the imperative in other languages). This can be used to offer a choice of corresponding forms in a context-dependent manner. In the future, we plan to integrate a statistical model trained on word context to permit the generation of a ranked list of suggested matching forms.
- Extension of the software’s functionality to further languages: built on Gramtool, our software has at its core a structured semantic category tree, which is defined in YAML, includes grammatical definitions for each language and takes forms as inputs. We have applied a consistent and human-friendly format for the definition of grammars of natural languages. This format allows our software to work with different languages by having their grammars defined in it. In our experience, a person familiar with the target language can develop a new morphological grammar rule set for the language in this format in under a week. Each grammar is easily and intuitively expandable to include further grammatical category definitions as the need for them arises. Each word form is associated with a code that directly reflects its position within the semantic category tree.
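As a rough illustration of two of the features listed above, the sketch below picks a rule by checking generated forms against a word list, falls back to rule frequency for unseen words, and looks up one-to-many category correspondences between languages. Everything in it is an assumption made for illustration: the rule tables, frequencies, category codes and function names are hypothetical, and a plain Python set stands in for the Hunspell check rather than using any real Hunspell API.

```python
# Hypothetical candidate rules: a few form endings per conjugation class.
CANDIDATE_RULES = {
    "-ar": {"infinitive": "ar", "present.1sg": "o", "present.3sg": "a"},
    "-er": {"infinitive": "er", "present.1sg": "o", "present.3sg": "e"},
}

# Stand-in for a spellchecker lookup (assumption: a plain set, not Hunspell).
KNOWN_WORDS = {"hablar", "hablo", "habla", "comer", "como", "come"}

# Hypothetical relative rule frequencies, used only to rank rules.
RULE_FREQUENCY = {"-ar": 0.8, "-er": 0.2}

# Hypothetical category correspondences; one code may map to several.
CATEGORY_MAP = {
    ("eo", "jussive"): {"en": ["subjunctive", "imperative"]},
    ("ru", "present.1sg"): {"en": ["present.1sg"]},
}


def expand(stem, rule):
    """Generate every form the given rule prescribes for a stem."""
    return {tag: stem + end for tag, end in CANDIDATE_RULES[rule].items()}


def select_rule(form):
    """Return (rule, stem, validated) for the best matching rule, or None.

    A rule is validated when all of its generated forms are known words;
    among equally validated rules the more frequent one wins, so unseen
    words still receive their most likely pattern."""
    matches = []
    for rule, endings in CANDIDATE_RULES.items():
        for ending in set(endings.values()):
            if form.endswith(ending) and len(form) > len(ending):
                stem = form[: -len(ending)]
                forms = expand(stem, rule).values()
                validated = all(f in KNOWN_WORDS for f in forms)
                matches.append((validated, RULE_FREQUENCY[rule], rule, stem))
    if not matches:
        return None
    validated, _, rule, stem = max(matches)
    return rule, stem, validated


def corresponding_categories(lang, code, target):
    """Return the target-language categories matching a source category."""
    return CATEGORY_MAP.get((lang, code), {}).get(target, [])


print(select_rule("hablo"))   # both rules match the ending; one validates
print(select_rule("canto"))   # unseen word: falls back to rule frequency
print(corresponding_categories("eo", "jussive", "en"))
```

Keeping the validation flag in the result lets a caller distinguish forms confirmed by the word list from forms that were only guessed from the most frequent pattern.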
Acknowledgments: what technologies and sources we use
We have also drawn from Wiktionary and similar online resources for some conjugation patterns and information in a number of our languages. It is not our intention to offend or mislead anyone. However, in the interest of learning we strive for inclusiveness and often include verbs automatically, so we cannot be held responsible for the accuracy, quality or decency of the verbs included. Please be aware of this disclaimer and use this site only if you are willing to take on the risk of being offended or misled.
Verbs are conjugated to give them more precise meanings according to the context in which they are used, which in turn facilitates the understanding of more complex ideas. It is therefore important to see how the conjugated forms of a verb are used in examples of living language. In CoolJugator, we use examples extracted from the Open Parallel Corpus (OPUS) collection. In particular, many of our examples are derived from the OpenSubtitles2016, EUbookshop, DGT and Tatoeba corpora in this collection. Importantly, since we extract examples, we cannot be held responsible for their accuracy, quality or political correctness. Please peruse them at your own risk.
Plans for CoolJugator
We have only begun to grow our verb conjugation site. We have a lot of plans for it, which include, but are not limited to:
- Defining further languages – we view the current selection of target languages as merely a first step, as our software is designed from the start to be adaptable to further languages by defining new grammars.
- Further software development – we plan to further improve key features of our software outlined above, with a special emphasis on inter-linguistic operability, which will allow us to focus our services on better localisation and matching of different language pairs.
- Integration with machine translation – we seek to further integrate the existing statistical machine translation methods (corpus-based approaches) into our system, to provide even more accurate form matches for our tools.
Legal: Usage of data, privacy and disclaimer
You may use any data on this website without explicit permission from us only for non-commercial use, and only to the extent that such use does not make available to the public any significant portion of the data by any means, whether online or offline. Please also note that any information that you manually or automatically submit to this website (including questions, suggestions or searches, and corresponding user-identifying information available to the server by virtue of your visit, such as IP addresses) may be logged and analysed by us. Insofar as such information is not individually identifying (or where you have chosen to be individually identified), it can be publicly shared by us; this applies in particular to questions and suggestions submitted by our users. In addition, and this applies to all of the forms, translations or any other data that we have on the website: we do our best to provide you with accurate forms and translations, but we cannot guarantee full accuracy, and thus we cannot accept any responsibility for whatever context you use these forms in.