TOBAR NA GAEDHILGE 1.4

Ciarán Ó Duibhín, 2009/05/01

What it does
    • What the results may look like
    • What texts can be searched
    • Requesting word-forms
    • More about what to search for
    • More languages!
    • Lemmatization    NEW!
    • Translation equivalents    NEW!
Installation
History
New this time
The texts in detail
Keyboard layouts
Miscellaneous

What it does

Tobar na Gaedhilge is a searchable textbase of high-quality 20th-century Gaelic texts (mostly Irish, with some Scottish), containing over 3 million words, freely downloadable for installation on a personal computer under MS Windows.  After the program is installed, a word-form may be requested, and examples of its use may be viewed.

• What the results may look like

We may begin by finding sentences containing a selected word-form, and we will do one example each from the Munster, Connacht and Scottish texts. Thereafter, we will draw our examples from the Ulster texts, which form by far the largest part of the material stored.

Figure 1: We looked for examples in the Munster texts of the word-form cábóg (a country person). We found two examples in Pádraig Ua Maoileoin, Na hAird Ó Thuaidh, and we show the second example here. Page and line reference is given to the published book.

Screenshot of Munster sentence view

The navigation panel (at upper right) allows us to move around the retrieved examples. The panel at the lower right allows the form in which the sentence is displayed to be modified. All the options are described under Figure 6 below.

Figure 2: We looked for examples in the Connacht texts of the sequence of word-forms lúb gaoil (blood relationship), and we found two examples, both in Séamus Mag Uidhir, Fánaidheacht i gConndae Mhuigheo. We show the first sentence here.

Screenshot of Connacht sentence view

Figure 3: We looked for examples in the Scottish texts of word-forms beginning with càr. With a request like this for word-forms matching a general pattern, clearly more than one word-form may match — we refer to this situation as matching a disjunction of word-forms. The sentences for each matching word-form are presented (over all the relevant books) before presenting the sentences for the next matching word-form. Here, we show a sentence containing the matching form càraich (fix).

Screenshot of Scottish sentence view

Besides viewing complete sentences as above, two other ways of viewing results are provided, both more compact for large quantities of text. Output from these methods is shown in the next two figures.

Figure 4: Uses of the word saoghal (life) shown as its frequency in the various Ulster texts.

Screenshot of frequency view

Scrolling may be necessary to reveal the information for all the texts. If the request matched a disjunction of word-forms, a menu option AthFhocal (next word-form) will proceed to the next matching word-form. A menu option Réidh (finished) leaves the displayed results and returns to the search screen.

Figure 5: A keyword-in-context index of the word-form athair (father) in Séamus 'ac Grianna, Thiar i dTír Chonaill. The first batch of 40 occurrences is shown.

Screenshot of concordance view

The navigation panel at the upper right allows several options. First of all, we may move up and down through the concordance lines, which are in screenfuls of 40 (unless the user resizes the window). We may move down a screenful (Síos), up a screenful (Suas), to start of text (Suas go bárr), to end of text (Síos go bun). However, these options will not take us into another text, nor (if the request matched several word-forms in a disjunction) will they take us into another word-form.

Second, we may move, forwards only, through the whole collection: AthThlámán (next batch), AthLeabhar (next book), AthFhocal (next word-form, if the request matched several in disjunction), Réidh (finished). If we wish to examine all the examples from the beginning again, we need only choose Réidh and then click OK. This second set of options is provided with keyboard shortcuts, which may prove convenient on occasion.

Cóipeáil (copy) copies the current display to a textfile. By default, this file is called samplaí.txt and is placed in the My Documents folder, and the copied material is appended to it. Comhad Cóipeála (copy file) allows the name and location of the file to be changed, and also the mode from append to overwrite (but it will revert to appending after the first copy).

Réidh (finished) leaves the displayed results and returns to the search screen.

Figure 6: Returning now to display by sentences, we give an example of the word-form oidhreógach (ice) from the Ulster texts. The example shown is from Seosamh 'ac Grianna, Pádraic Ó Conaire agus Aistí Eile.

Screenshot of sentence view

The panel at the lower right allows the text of the sentence to be shown in a choice of ways: plain (Lom), including mark-up (Marcáilte), or as a list of the word-forms by which it is indexed (Innéacsáilte). The third of these options allows you to see which index terms will produce this sentence. You can also investigate our failure to achieve consistency in indexing over this body of text! The panel also allows the sentence to be requested in other languages, when this is possible (see later); here the language names are in italics, which shows that they are unavailable for this sentence.

The navigation panel at the upper right allows several options.

First of all, we may move around the examples. We may move down a sentence (Síos), up a sentence (Suas), to start of text (Suas go bárr), to end of text (Síos go bun). However, these options will not take us into another text, nor (if the request matched several word-forms in a disjunction) will they take us into another word-form.

Second, we may move, forwards only, through the whole collection: AthAbairt (next sentence), AthLeabhar (next book), AthFhocal (next word-form, if the request matched several in disjunction), Réidh (finished). If we wish to examine all the examples from the beginning again, we need only choose Réidh and then click OK. This second set of options is provided with keyboard shortcuts, which may prove convenient on occasion.

Cóipeáil (copy) copies the current display to a textfile. By default, this file is called samplaí.txt and is placed in the My Documents folder, and the copied material is appended to it. Comhad Cóipeála (copy file) allows the name and location of the file to be changed, and also the mode from append to overwrite (but it will revert to appending after the first copy). The darker elongated panel shown above is the result of clicking Comhad Cóipeála.

Réidh (finished) leaves the displayed results and returns to the search screen.

If you accidentally lose the navigation panel from the screen while examining the results, a reminder labelled Preab-liosta! will appear from underneath it.  Clicking on this reminder will recover the navigation panel.

Multi-word retrievals may require the creation of workfiles. These will be placed in the current folder, if possible. If this is not possible, e.g. because the current folder is read-only in a network situation, you will be prompted for the name of a folder to hold workfiles.

• What texts can be searched

Gaelic is found in several slightly different forms, and the texts are organized into collections to reflect this and to keep each collection fairly homogeneous in language.  The four collections supplied are (as of 2009/05/01):

  • Ulaidh (Ulster). Index: Gaedhilg. 17 authors; 50 books; 44,252 word-forms; 3,161,844 word-tokens
  • Connachta (Connacht). Index: Gaedhilge. 3 authors; 3 books; 10,066 word-forms; 139,184 word-tokens
  • Mumhain (Munster). Index: Gaolainn. 2 authors; 2 books; 6,755 word-forms; 101,386 word-tokens
  • Alba (Scotland). Index: Gàidhlig. 4 authors; 4 books; 7,401 word-forms; 98,909 word-tokens

At the present stage of development, the Ulaidh collection is much larger than any of the others.

A different division into collections would be, of course, possible.

The identities of the texts in all collections are listed in full below.

Searching may be restricted to any chosen subset of the texts of a collection, by deselecting temporarily individual texts or authors.

Each collection has at least one pre-compiled index associated with it, made from the word-forms found in the relevant books, and in which requests for words are looked up. The statistics just given for the collections refer to these indexes, which are indexes of Gaelic word-forms. The Ulaidh collection has several additional indexes. Tobar na Gaedhilge contains English and French translations or originals of some of the books in the Ulaidh collection, and Béarla (English) and Fraincis (French) indexes of word-forms are provided to the respective subsets of books. Further, from version 1.4, a rough and ready lemmatization has been applied to the English and French texts, so that the Béarla and Fraincis indexes are offered in terms of lemmata (lemmas) as well as in terms of foirmeacha (word-forms).

The statistics for the additional indexes to the Ulaidh collection are as follows, at 2009/05/01:

  • Ulaidh (Ulster). Index: Béarla (foirmeacha). 6 translators; 11 books; 30,865 word-forms; 1,073,265 word-tokens
  • Ulaidh (Ulster). Index: Béarla (lemmata). 6 translators; 11 books; 22,545 lemmata; 1,074,221 word-tokens
  • Ulaidh (Ulster). Index: Fraincis (foirmeacha). 2 translators; 4 books; 21,562 word-forms; 372,439 word-tokens
  • Ulaidh (Ulster). Index: Fraincis (lemmata). 2 translators; 4 books; 10,786 lemmata; 372,439 word-tokens

(The discrepancy between the English token counts arises from taking words like won't or can't as one form-token but as two lemma-tokens: will not, can not).
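The effect of this convention on token counts can be illustrated with a minimal sketch in Python. The expansion table holds only the two contractions named above, and the function names and sample tokens are our own illustrative assumptions, not the program's actual data or code.

```python
# Hypothetical expansion table: only the two contractions named in the text.
EXPANSIONS = {"won't": ["will", "not"], "can't": ["can", "not"]}

def count_form_tokens(tokens):
    """Each word-form counts as one token."""
    return len(tokens)

def count_lemma_tokens(tokens):
    """A contracted form counts once for each lemma it expands to."""
    return sum(len(EXPANSIONS.get(token, [token])) for token in tokens)

# Illustrative sample only, not taken from the texts.
SAMPLE = ["i", "won't", "go", "and", "you", "can't", "stay"]

print(count_form_tokens(SAMPLE), count_lemma_tokens(SAMPLE))
```

Each contracted form adds one to the lemma-token count, which is the kind of difference seen between the two Béarla indexes.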

Much more detail will be given later about what may be found in the indexes. But it is well to say at the outset that a Gaelic (for example) index means an index to everything that is found in the Gaelic versions of books. It does not mean an index containing exclusively Gaelic words — whatever is in the book may appear in the index, so each index is, to an extent, multi-lingual.  The same applies to English and French indexes.

Figure 7: This is the program's opening screen, and the first task is to choose the desired collection/index.

Screenshot of opening screen

The program should show a list of the available collections, as in blue above — together with some statistics of the highlighted collection (below) and a pick list of its index (on the right). If there are no collections listed, you may be in the wrong folder, and you can browse (using Cuirtear Lorg) to a different folder. The indexes available to the highlighted collection are shown in brown. When you have marked the desired collection and chosen your index to it, click on Isteach to enter the collection/index.

Amach is to exit the program.

Treoir is for help. The help file supplied with Tobar na Gaedhilge is in Windows Help format (.hlp), which is not supported by Windows Vista. In order to use Windows Help files in Vista, download and install the Microsoft Vista upgrade of WinHlp32.exe at http://support.microsoft.com/kb/917607

• Requesting word-forms

When a collection has been selected, and Isteach clicked, the display changes to that shown in the next figure, which allows you to type word-forms, among other things.

Figure 8: Requesting word-forms.

Screenshot of requesting wordforms

Before entering our own word-forms, however, first notice that this screen allows you to change to a different collection/index, by using the Athruigh button on the Cnuasacht panel. You may also see which authors and books are included in the current collection/index by using the Athruigh button on the Leabharthaí panel, and you may choose to select temporarily a subset of those books.

From the Radharc panel, you may choose your display mode for the results: frequencies (Minicidheacht), a keyword-in-context concordance (KWIC) or sentences (Abairteacha). Samples of each form of results have already been shown above.

And now we come to the Focal panel, where the desired word-form or word-forms may be typed into the box provided, or may be inserted there by double-clicking them on the pick list, which is a displayed segment of the collection's index. The pick list accommodates itself to the existing contents of the box, as a guide to what word-forms are available.

A word chosen by double-clicking it on the pick list is appended to anything already in the box (and not highlighted).

In the word-form box you may put:
• a word-form, such as oidhreógach or saoghal or athair, as used in our previous examples
• two or more word-forms occurring together, either consecutively or within the same sentence (eg. lúb gaoil)
• any word-form may contain a wild-card (*), that is, an asterisk which matches any number of letters, including none.  For example, all word-forms with a particular stem may be sought (eg. beir*), or all word-forms with a particular termination (eg. *stin).
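The wild-card behaviour described above can be sketched in Python as follows. This is only an illustration of the matching rule (an asterisk matches any number of letters, including none); the function names are hypothetical, the sample index contains a handful of word-forms mentioned in this document, and nothing here is taken from the program itself.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a request such as 'càr*' into a regular expression.

    Each asterisk becomes '.*'; everything else is matched literally.
    """
    pieces = (re.escape(piece) for piece in pattern.split("*"))
    return re.compile(".*".join(pieces))

def match_index(pattern, index):
    """Return every word-form in the index that matches the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [form for form in index if regex.fullmatch(form)]

# Illustrative index fragment, using word-forms mentioned in this document.
SAMPLE_INDEX = ["athair", "càraich", "cábóg", "comhartha", "cómhartha", "saoghal"]

print(match_index("càr*", SAMPLE_INDEX))    # forms with a given beginning
print(match_index("*artha", SAMPLE_INDEX))  # forms with a given termination
```

A request without an asterisk simply matches itself, if it is present in the index.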

If the word-form box contains more than one word-form (ie. there is a space within it), you are asked to choose between seeking the word-forms directly adjacent and in the given order; or within the same sentence in any order.
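The difference between the two kinds of multi-word search can be sketched in Python. We assume, for illustration only, that a sentence is held as a list of its index terms; the function names and the placeholder tokens "x" and "y" are hypothetical.

```python
def contains_adjacent(sentence, request):
    """True if the requested word-forms occur side by side, in the given order."""
    n = len(request)
    return any(sentence[i:i + n] == request
               for i in range(len(sentence) - n + 1))

def contains_all(sentence, request):
    """True if every requested word-form occurs somewhere in the sentence."""
    return all(word in sentence for word in request)

# Placeholder sentences: "x" and "y" stand for any other index terms.
FIRST = ["x", "lúb", "gaoil", "y"]
SECOND = ["gaoil", "x", "lúb"]

print(contains_adjacent(FIRST, ["lúb", "gaoil"]))   # adjacent and in order
print(contains_adjacent(SECOND, ["lúb", "gaoil"]))  # present, but not adjacent
print(contains_all(SECOND, ["lúb", "gaoil"]))       # same sentence, any order
```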

It is even possible to give one or more of the words as simply the asterisk (*), which matches any word; the search is then assumed to be a consecutive one. (But avoid giving * as the final word, as the search will be slow.) As we will see below (under demutation) a hyphen is, in most circumstances, counted as a separate word, so search for sean-bhean as three words: sean - bean (as well as sean bean and seanbhean to cover any unhyphenated instances).

You may tick the Gan beinn ar an tsíneadh fhada checkbox if you want to include word-forms which differ from that requested only by the presence or absence of an accent, eg. comhradh with this checkbox ticked will match comhrádh, cómhradh and cómhrádh as well. Ticking this box also includes word-forms which differ from that requested by the addition of a hyphen, apostrophe or period — though most Gaelic words containing apostrophes or hyphens are indexed as two or more separate words anyway, as explained under decliticisation below.
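The effect of ticking this checkbox can be approximated by comparing "folded" forms, as in the following Python sketch. The folding rule (remove accents, then remove hyphens, apostrophes and periods) is our reading of the description above; the function names are hypothetical and the program's own method may differ.

```python
import unicodedata

# Characters ignored when the checkbox is ticked (an assumption based on the text).
IGNORED = "-'."

def fold(form):
    """Strip accents, hyphens, apostrophes and periods from a word-form."""
    decomposed = unicodedata.normalize("NFD", form)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in unaccented if ch not in IGNORED)

def accent_insensitive_matches(request, index):
    """Return every indexed form that is equal to the request after folding."""
    target = fold(request)
    return [form for form in index if fold(form) == target]

# Illustrative index fragment, using word-forms mentioned in this document.
SAMPLE_INDEX = ["comhartha", "comhradh", "comhrádh", "cómhradh", "cómhrádh"]

print(accent_insensitive_matches("comhradh", SAMPLE_INDEX))
```

With the checkbox unticked, only the form exactly as typed would be retrieved.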

Word-form searches are always case-insensitive; you should not enter capital letters in word-forms (for lemmata, see later).

To type accented vowels, use your normal method of doing this under Windows. For information about keying accented letters under Windows, see the section "Keyboard layouts" near the end of this file. (Remember that, in Tobar na Gaedhilge, lenition is not indicated by a dot accent, but always by suffixing the letter h.)

When all this is complete, you may click the OK button to produce the results.

Further hints on the selection of word-forms will come in the next section of this document.

• More about what to search for

Here are some pointers regarding what kinds of word-forms are worth requesting.

When a word-form is requested, it is matched against a pre-compiled index of word-forms from the chosen collection.  For Gaelic, this index consists of word-forms which are aggregated in a number of ways to increase coverage:

• lowercased: the word-forms in the index have been converted to lowercase by replacing any capital letters by small letters; this even applies to proper names. So you should use only small letters in your request. Keep an eye on the scrolling alphabetic list for guidance on what forms are available. (For lemmatized indexes, see later.)

• decliticised: common enclitics, such as d' in d'ól, or 's in 'seadh, or -sa in agam-sa, are treated as separate words in the index (d' + ól; 's + eadh; agam + - + sa), and should also be detached in your request. Enclitics are normally signalled in running text by a hyphen or an apostrophe. But when there is no overt signal in similar cases (eg. agamsa, seadh), the splitting in the index will have been performed manually and is unlikely to have been exhaustive.

A number of common contracted words have been indexed under their parts, e.g. 'na (from ina) under ' and a; 'na (from chun an) under 'n and a'; 'na (from chun na) under ' and na; ab (from a ba) under a and b, or under a and b'; gurab under gur and a and b; and many other similar cases.

Compound words, with the parts connected by a hyphen, are also generally split in the index, in three parts, eg. leith-phighinn under leith and - and pighinn.

• demutated: initial mutations are removed from word-forms in the index; so, for example, fear, fhear and bhfear are all indexed as fear, while t-olc, n-olc and holc are all indexed as olc—but, where the mutation is permanent, it is retained, e.g. chugam, thart (in one of its senses), (go) dtí. You may have noticed the benefits of demutation and decliticisation in our athair example above. An initial mutation does not leave any trace in the index; this is true of a hyphen which is nothing but part of an initial mutation. When typing word-forms of Gaelic, remember to remove initial mutations, unless they are a permanent part of the word. Removal of initial mutations may seem unintuitive when a sequence of word-forms is requested (eg. ár athair), but it is nonetheless required.

But the word-forms in an index are not lemmatized, i.e. terminally inflected forms, such as fear, fir, feara, must be searched for separately—although the wild card may often be used to advantage to retrieve the several related forms.
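As a rough aid to preparing requests, the removal of initial mutations described above might be sketched in Python as follows. This is a deliberately simplified illustration and not the procedure used to build the indexes: the rules are our own assumptions, the list of permanent mutations holds only the three examples given above, and the sketch cannot tell senses apart (as with thart) or recognise words that genuinely begin with h.

```python
# Only the three permanent-mutation examples named in the text (an incomplete list).
PERMANENT = {"chugam", "thart", "dtí"}

# Eclipsed initial -> radical initial (simplified assumption).
ECLIPSIS = {"bhf": "f", "mb": "b", "gc": "c", "nd": "d",
            "ng": "g", "bp": "p", "dt": "t"}

VOWELS = "aeiouáéíóú"
LENITABLE = "bcdfgmpst"

def demutate(form):
    """Return a lowercase word-form with its initial mutation removed."""
    if form in PERMANENT:
        return form
    # Prefixed t- or n- before a vowel: t-olc, n-olc -> olc
    if len(form) > 2 and form[0] in "tn" and form[1] == "-" and form[2] in VOWELS:
        return form[2:]
    # Eclipsis: bhfear -> fear
    for prefix, radical in ECLIPSIS.items():
        if form.startswith(prefix):
            return radical + form[len(prefix):]
    # Prefixed h before a vowel: holc -> olc
    if len(form) > 1 and form[0] == "h" and form[1] in VOWELS:
        return form[1:]
    # Lenition: fhear -> fear
    if len(form) > 1 and form[0] in LENITABLE and form[1] == "h":
        return form[0] + form[2:]
    return form

for word in ["fear", "fhear", "bhfear", "t-olc", "n-olc", "holc", "chugam"]:
    print(word, "->", demutate(word))
```

The pick list remains the reliable guide to what is actually in the index.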

You may choose (by ticking Gan beinn ar an tsíneadh fhada) to make the search insensitive to accents. A suitable case might be searching together for word-forms like comhartha and cómhartha. Both word-forms are to be found separately in the index, and with this option selected a request for either of them will fetch both.

Finally, note that, wherever the text is clearly in error, we may silently correct the indexed form of a word, but we never correct the text itself, that is, the original text form—right or wrong—will be displayed in the contexts (e.g. the misprint comhhartha is indexed as comhartha, but remains comhhartha in contexts).

• More languages!

It happens that a minority of the texts in the Ulaidh collection also exist in translated (or original) English forms, and some too in French. When a Gaelic sentence is being viewed, then, it is possible to request the display of the English and/or French equivalent sentences too, and they will be shown if they are available. You may have noticed buttons, on the sentence-level displays above, labelled Béarla and Fraincis — this is their purpose. When a translation is not available for the sentence being displayed, the name of the unavailable language is italicised.

Figure 9: Sentences containing the word-form creafadaigh (shaking); the first of two examples from Seosamh 'ac Grianna, Seideán Bruithne/Amy Foster. English and French translations are available and are shown.

Screenshot of Gaelic sentences with translations

In this and the following examples, remember that the sentence may be shown in a choice of ways in the first language: plain (Lom), including mark-up (Marcáilte), or as a list of the word-forms by which it is indexed (Innéacsáilte). In other languages it is shown plain.

Likewise, when we use the Béarla or Fraincis word-form indexes to the Ulaidh collection, we may view Gaelic sentences which serve to translate a particular word-form of English or French. Like the Gaelic word-form indexes, the Béarla and Fraincis word-form indexes are lowercased, decliticised and unlemmatized.

Figure 10: Search of the English index of the Ulaidh collection for the word-form bunch. An example is shown from Ben-Hur, and the English, Gaelic and French of the sentence are displayed.

Screenshot of English sentences with translations

Figure 11: Search of the French index of the Ulaidh collection for the word-form accroché. An example is shown from Iascaire Inse Tuile, and the French and Gaelic of the sentence are displayed.

Screenshot of French sentences with translations

• Lemmatization

Alongside the indexes of word-forms, we now offer lemmatized Béarla and Fraincis indexes to parts of the Ulaidh collection. (Lemmatized Irish indexes are not yet in sight.) Thus, a request for the English lemma man will match the word-forms man or men; while a request for the French lemma homme will match the word-forms homme or hommes.

Figure 12: A KWIC list of the examples of the lemma listen in Gadaidheacht le Láimh Láidir, according to our English lemmatized index. The corresponding Gaelic material may be inspected, one example at a time, in the sentence display mode.

Screenshot of KWIC using English lemmatized index

Figure 13: A KWIC list of the examples of the lemma abandonner (to abandon) in Ben-Hur, according to our lemmatized French index. The corresponding Gaelic material may be inspected, one example at a time, in the sentence display mode.

Screenshot of KWIC using French lemmatized index

It is important to understand, however, that our lemmatization of English and French is severely limited. It has been performed automatically, using the Stuttgart Tree Tagger and, as with all statistical operations, a percentage of errors is inevitable, despite much manual checking. Also, we have not tried to separate the several senses of homographic headwords, such as pack or stamp or well in English, or pas or tendre or vague in French; all lemmas with a common headform are simply produced together.

In our lemmatized indexes, most words have again been made lowercase; but an initial capital has been retained in some words, mostly names. Therefore the letters you type in your request to a lemmatized index should be small letters, except where an initial capital is appropriate to the lemma which you seek. Keep an eye on the scrolling alphabetic list for guidance on what lemmas are available and where capital letters have been retained.

If you wish to receive lemmas which differ from your request only by an accent, put a check mark on Gan beinn ar an tsíneadh fhada just as with word-form indexes.

When displaying the results from a lemmatized index, Innéacsáilte shows the sentence in the first language as a list of the lemmas with which it is indexed.

• Translation equivalents

A related innovation is the calculation of translation equivalents. Given a word in the source language, this consists of a listing of the relatively most common words in the corresponding segments of the target language. This will clearly be more effective using lemmas than using word-forms, so it is offered only with lemmatized English or lemmatized French as the source index. At present the target produced is a list of word-forms (Gaelic, English or French), though again lemmas would be preferable but are not yet available. This technique has potential, but is limited at present by the amount of text available (English/Gaelic: about 1 million words; French/Gaelic: about 350,000 words; English/French: about 300,000 words). Results with the present amounts of text will vary from useful to comical.

Where a lemmatized source index (English or French) is in use, translation equivalents may be chosen as a fourth output display mode, named Freagar-fhocla. The calculation, for a selected source-language lemma, may take a few moments. The resulting display is a list of target-language word-forms, each accompanied by a score, and sorted on these scores (the user may have it re-sorted alphabetically on the word-forms themselves). The scores, which are not raw word counts, may range from 99,999,999 down to 100,000. They measure how many times more common the target-language word-form is in the neighbourhood of the source-language lemma than in the target-language corpus on average.

Figure 14: Search for Gaelic word-forms collocating with the English lemma child, in the Ulster texts.

Screenshot of translation equivalents

The chosen source-language word (ie. lemma) defines a set of sentences in the source-language corpus — those sentences in which it occurs — and a corresponding set of sentences in the target-language corpus — those sentences which translate them. This "select part" of the target-language corpus is studied, looking for word-forms (freagar-fhocla, word-equivalents) which are more frequent there than in the target-language corpus on average.

If the source-language word is uncommon (read: selects less than one-thousandth part of the source language corpus), a warning is issued that the results may not be statistically useful, but no impediment is placed on calculating them.

If a target-language word turns out to be equally frequent in the select part and on average, it is given a score of 100,000; if it turns out to be twice as frequent as on average in the select part, it is assigned 200,000; and so on. Words less frequent in the select part than on average are discarded as uninteresting, so that 100,000 is the minimum score among those retained. At the other end of the range, the score 99,999,999 is assigned to any word which is 100 times or more as common in the select part as on average.
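The scoring rule just described might be sketched in Python as follows. The function and parameter names are hypothetical, and relative frequency is assumed to mean occurrences divided by corpus size in word-tokens. The cap is applied at a ratio of 100, exactly as stated above; the program's treatment of ratios near the cap may differ from this sketch.

```python
MIN_SCORE = 100_000      # equally frequent in the select part and on average
MAX_SCORE = 99_999_999   # assigned at or above the cap ratio

def equivalence_score(count_select, size_select, count_total, size_total,
                      cap_ratio=100.0):
    """Score a target-language word-form, or return None if it is discarded.

    count_select / size_select: occurrences in, and size of, the select part.
    count_total / size_total:   occurrences in, and size of, the whole corpus.
    """
    # Ratio of relative frequency in the select part to the corpus average.
    ratio = (count_select * size_total) / (count_total * size_select)
    if ratio < 1.0:
        return None          # less frequent than average: uninteresting
    if ratio >= cap_ratio:
        return MAX_SCORE
    return round(ratio * MIN_SCORE)

print(equivalence_score(10, 1_000, 100, 10_000))   # equally frequent
print(equivalence_score(20, 1_000, 100, 10_000))   # twice as frequent
print(equivalence_score(5, 1_000, 100, 10_000))    # half as frequent: discarded
```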

Even if a word falls within the range 100,000 to 99,999,999, it is still omitted from the displayed list if its absolute frequency is small. This is intended to overcome "accidental" collocations, which will disappear naturally as more text is added, but may mask significant data while they remain. A suitable empirical lower cutoff for absolute frequency of a word-equivalent is found to be the square root of one-tenth of the frequency of the source-language word.
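As a worked illustration of this cutoff, consider the following Python sketch. The function names are hypothetical, and we assume that a word-equivalent whose frequency exactly equals the cutoff is retained; the text does not say which way the boundary falls.

```python
import math

def frequency_cutoff(source_frequency):
    """Square root of one-tenth of the source-language word's frequency."""
    return math.sqrt(source_frequency / 10)

def passes_cutoff(target_frequency, source_frequency):
    """True if a word-equivalent is frequent enough to be displayed.

    Assumption: a frequency equal to the cutoff is kept.
    """
    return target_frequency >= frequency_cutoff(source_frequency)

# A source lemma occurring 1,000 times gives a cutoff of 10 occurrences.
print(frequency_cutoff(1000))
print(passes_cutoff(9, 1000), passes_cutoff(10, 1000))
```

So, for a source lemma found 1,000 times, a candidate equivalent seen fewer than 10 times would be left off the list, whatever its score.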

Results are still poor enough with the amount of text available, but will improve as the quantity increases. Even at the present time, however, it may be of interest to input English lemmas from the following list, and to compare the results with the content of existing English–Irish dictionaries, noting what is found in the dictionaries but absent in the corpus, as well as what relevant equivalents are found in the corpus but not in the dictionaries: smoke, minute, also, yet, dog, ice, bee, garden, help, interest, gravel, cave, busy, cell, kitchen, open.

Installation

It is suggested that you uninstall any earlier version of Tobar na Gaedhilge before installing this one. Uninstallation may be performed using the Start/Programs menu, or using the Add/Remove Programs control panel. If you want to retain older versions, you may install this one to a different folder.

System requirements are:

(It may also be possible to run Tobar na Gaedhilge on a Macintosh through Windows emulation — success has been reported here using Virtual PC on MacOS9, and using Virtual PC 7 on MacOSX.)

If you have Windows, from Win95 to Vista, download the file TOBAR2009.EXE to any folder, and double-click on it to install Tobar na Gaedhilge.

Installation uses the EZInstall installer, and may take a few minutes.

This product is completely free of adware, spyware or other harmful inclusions. During installation you will be shown — once — the website address of EZInstall, and invited to visit it.

By default, installation is to the folder C:\Program Files\Tobar, where the following files will be created:

The texts: 89 files, with names bearing the extension .MRK, containing the texts. These files are not intended to be used in any way other than through the program TOBAR. The full list of texts, with acknowledgements to those who originally produced them in machine-readable form, is given later in this file.

The indexes to the collections (each index consists of four files):

A number of files containing optional keyboard layouts:

To run the textbase, after installing it, use the shortcut "Tobar na Gaedhilge" already placed on your Start/Programs menu. Or double-click on the TOBAR.EXE icon, which you will find in the Program Files\Tobar folder (you may, if you find it convenient, create a short-cut by dragging this icon to your desktop).

History

Version 1.4 released May 2009.

Version 1.3 released September 2006.

Version 1.2 released November 2004.

Version 1.1 released November 2003.

Version 1.0 released as a native Windows application in February 2002.

The system was previously an MS-DOS application, known as GAELDICT or FOCAL, of which four versions were released between 1995 and 1998.

New this time

Thanks to Foclóir Stairiúil na Nua-Ghaeilge, to Pádraig Ó Mianáin and to Rita Nic Aodha Bhuí for contributing new texts, and to Foclóir Stairiúil na Nua-Ghaeilge for contributing corrections to Eachtraí Sherlock Holmes.

The texts in detail

In selecting and preparing texts for Tobar na Gaedhilge, the emphasis is on authenticity, accuracy and added-value.

Authenticity: Texts chosen were written in the early to middle 20th century, by native speakers of Gaelic (or occasionally, by non-native speakers who modelled their speech on well-defined local Gaelic). Our policy is to adhere as closely as possible to the language in which the authors wrote them.  In general, the earliest editions have been preferred, and standardized or school editions avoided. Manuscripts, where available, may be consulted in the interests of accuracy, as may earlier publication in serial form, where it exists. Remaining obvious errors have been corrected, but only in the indexing, not in displayed contexts.  We do not wish to impose our own subjective filter between the texts and their users, or to conceal anything which could have the slightest linguistic significance.

Accuracy: Texts are either typed through the keyboard, or scanned. The input may be performed specifically for Tobar na Gaedhilge, or texts may be donated by individuals or projects, to all of whom we are grateful, and who are acknowledged under "Original e-text" in the lists below. The contribution of Foclóir Stairiúil na Nua-Ghaeilge at Acadamh Ríoga na hÉireann, Dublin, is particularly significant. On incorporation into Tobar na Gaedhilge, all texts are subject to continuous correction, as transcription errors are uncovered.

Added-value: It has proved possible to augment the texts, in their stored medium, in various ways which add to their value. Undisputed errors are corrected in the indexing. The Gaelic versions are aligned with English and/or French versions, where these are available to us. Lemmatization, in a rough-and-ready form, has been applied to the English and French versions. New possibilities for analysis arise from these enhancements. Future objectives include lemmatization of the Gaelic texts, and addition of aligned sound files. 

A full list of texts at version 1.4 follows. The text identification scheme used in this list comes from Foclóir Stairiúil na Nua-Ghaeilge.

ULAIDH:

LU006: Na Rosa go Bráthach; Mághnus 'ac Comhghaill (1885–1965)                NEW!
Publisher: Oifig an tSoláthair, Dublin, 1939 (we used: An Cúigiú Cló, 1946)
Original e-text: Rita Nic Aodha Bhuí
LU010: Na Glúnta Rosannacha; Niall Ó Domhnaill (1908–1995)
Publisher: Oifig an tSoláthair, Dublin, 1952
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU013: Troid Bhaile an Droichid; Séamus 'ac an Bháird (1871–1951)
Publisher: Connradh na Gaedhilge, Dublin, 1907 (we used: Preas Dhún Dealgan, Dundalk, undated, 1920s)
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU016: Eoghan Ruadh Ó Néill; Seosamh 'ac Grianna (1900–1990)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1931
Original e-text: Rita Nic Aodha Bhuí, for FNG
LU018: Dochartach Dhuibhlionna; Seosamh 'ac Grianna (1900–1990)
Publisher: Cú Uladh, Dublin, undated (1925)
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU019: An Grádh agus an Ghruaim; Seosamh 'ac Grianna (1900–1990)
Publisher: C S Ó Fallamhain, Dublin, 1929
Original e-text: Ciarán Ó Duibhín
LU020: Pádraic Ó Conaire agus Aistí Eile; Seosamh 'ac Grianna (1900–1990)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1936
Original e-text: Ciarán Ó Duibhín
LU023: Indé agus Indiu; Seaghán 'ac Meanman (1886–1962)
Publisher: C S Ó Fallamhain, Dublin, 1929
Original e-text: Ciarán Ó Duibhín
LU024: Fear Siubhail; Seaghán 'ac Meanman (1886–1962)
Publisher: Preas Dhún Dealgan, Dundalk, 1924 (we used: Oifig Díolta Foillseacháin Rialtais, Dublin, 1931)
Original e-text: Ciarán Ó Duibhín
The 1924 edition contains passages in English which appear in Irish in the 1931 edition. There is little if any difference in the rest.
LU025: Sgéalta Goiride Geimhridh; Seaghán 'ac Meanman (1886–1962)
Publisher: Clódhanna Teo, 1918 (we also used: Preas Dhún Dealgan, Dundalk, 1922)
Original e-text: Ciarán Ó Duibhín
The stories had previously been published in the "Weekly Freeman" and the "Claidheamh Soluis". The author may have had more control over the 1922 edition, which (despite what is said in the foreword) is considerably changed from the 1918 one, and is much better overall. The 1922 edition is generally followed here, but mistakes newly introduced in the 1922 edition are silently corrected. Many of the systematic changes between the 1918 and 1922 editions were incompletely carried out; for each such change, we have had to choose, for the purposes of the index, either to complete it, to leave it incomplete as found, or to undo it.
LU026: Ó Chamhaoir go Clap-Sholas; Seaghán 'ac Meanman (1886–1962)
Publisher: Oifig an tSoláthair, Dublin, 1940
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU027: Mám as mo Mhála; Seaghán 'ac Meanman (1886–1962)
Publisher: Oifig an tSoláthair, Dublin, 1940
Original e-text: Ciarán Ó Duibhín
LU028: Mám Eile as an Mhála Chéadna; Seaghán 'ac Meanman (1886–1962)
Publisher: Oifig an tSoláthair, Dublin, 1954
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU029: Crathadh an Phocáin; Seaghán 'ac Meanman (1886–1962)
Publisher: Oifig an tSoláthair, Dublin, 1955
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU032: Saoghal Corrach; Séamus 'ac Grianna (1889–1969)
Publisher: An Press Náisiúnta, Dublin, undated (1945)
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU033: Mo Dhá Róisín; Séamus 'ac Grianna (1889–1969)
Publisher: Preas Dhún Dealgan, Dundalk, undated (1921)
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU034: Nuair a Bhí Mé Óg; Séamus 'ac Grianna (1889–1969)
Publisher: Clólucht an Talbóidigh, Dublin, 1942
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU035: Cioth is Dealán; Séamus 'ac Grianna (1889–1969)
Publisher: Preas Dhún Dealgan, Dundalk, undated (1926)
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU036: Caisleáin Óir; Séamus 'ac Grianna (1889–1969)
Publisher: Preas Dhún Dealgan, Dundalk, 1924
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU037: Thiar i dTír Chonaill; Séamus 'ac Grianna (1889–1969)
Publisher: Faoi Chomhartha na dTrí gCoinneal, Dublin, 1940
Original e-text: Ciarán Ó Duibhín
LU038: Bean Ruadh de Dhálach; Séamus 'ac Grianna (1889–1969)
Publisher: Oifig an tSoláthair, Dublin, 1966
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU039: Micheál Ruadh; Séamus 'ac Grianna (1889–1969)
Publisher: Preas Dhún Dealgan, Dundalk, undated (1925)
Original e-text: Ailbhe Ó Corráin, then of Queen's University Belfast
LU040: Rann na Feirste; Séamus 'ac Grianna (1889–1969)
Publisher: An Press Náisiúnta, undated (1942)
Original e-text: Ciarán Ó Duibhín
LU041: An Clár is an Fhoireann; Séamus 'ac Grianna (1889–1969)
Publisher: Oifig an tSoláthair, Dublin, 1955
Original e-text: Ciarán Ó Duibhín
LU044: Le Clap-Sholus; Séamus 'ac Grianna (1889–1969)
Publisher: Oifig an tSoláthair, Dublin, 1967
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU047: Scéal Úr agus Sean-Scéal; Séamus 'ac Grianna (1889–1969)
Publisher: Oifig an tSoláthair, Dublin, 1945 (we used: Oifig an tSoláthair, Dublin, 1950)
Original e-text: Ciarán Ó Duibhín
LU050: An Teach nár Tógadh; Séamus 'ac Grianna (1889–1969)
Publisher: Oifig an tSoláthair, Dublin, 1948
Original e-text: Ciarán Ó Duibhín
LU054: Cloich Cheann-Fhaolaidh; Séamus Ó Searcaigh (1886–1965)
Publisher: M.H. Gill & Son Ltd, Dublin, 1908
Original e-text: Rita Nic Aodha Bhuí, for FNG
LU056: Thiar i nGleann Ceo; Tadhg Ó Rabhartaigh (1909–1982)
Publisher: Oifig an tSoláthair, Dublin, 1953
Original e-text: Seosamh Ó Labhraí, Coláiste Ollscoile Naomh Muire, Béal Feirste
LU057: Mian na Marbh; Tadhg Ó Rabhartaigh (1909–1982)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1937
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU065: Rácáil agus Scuabadh; Seaghán 'ac Meanman (1886–1962)
Publisher: Oifig an tSoláthair, Dublin, 1955
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU072: Cáitheamh na dTonn; Pádraig Ó Gallchobhair (1892–1961)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1934
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LU073: Na Lochlannaigh; Seosamh 'ac Grianna (1900–1990)
Publisher: Oifig an tSoláthair, Dublin, 1938
Original e-text: Lá, Belfast
LU079: Sgéilíní na Finne; Aindrias Ó Baoighill (1888–1972)
Publisher: C S Ó Fallamhain, Dublin, undated (1928)
Original e-text: Ciarán Ó Duibhín
U032: Dírbheathaisnéis; Niall 'ac Giolla Bhrighde (1861–1942), (ed. Liam Ó Connacháin)        NEW!
Publisher: Brún agus Ó Nualláin Teór., Dublin, undated (1940)
Original e-text: Pádraig Ó Mianáin
U043: Scéalta Johnny Sheimisín; ed. Niall Ó Domhnaill (1908–1995)
Publisher: Comhaltas Uladh, Belfast & Dundalk, 1948
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
U131: Scéal Hiúdaí Sheáinín; Eoghan Ó Domhnaill (1908–1966)                NEW!
Publisher: Oifig an tSoláthair, 1940
Original e-text: Rita Nic Aodha Bhuí
A221: Scéalta Sealgaire; trans. Máighréad Nic Mhaicín (1899–1983)                NEW!
Publisher: Oifig an tSoláthair, 1954
Original e-text: Rita Nic Aodha Bhuí
AU002: Gadaidheacht le Láimh Láidir; trans. Domhnall 'ac Grianna (1894–1962)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1936
Original e-text: Brian Mac Lochlainn
AU011: 'Teacht fríd an tSeagal; trans. Seosamh 'ac Grianna (1900–1990)                NEW!
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, undated (1934)
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
AU012: Séideán Bruithne / Amy Foster; trans. Seosamh 'ac Grianna (1900–1990)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1935
Original e-text: Nollaig Ó hUrmoltaigh, then of Queen's University, Belfast
AU013: Muinntir an Oileáin; trans. Seosamh 'ac Grianna (1900–1990)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1935
Original e-text: Rita Nic Aodha Bhuí
AU018: Scairt an Dúthchais; trans. Niall Ó Domhnaill (1908–1995)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1932
Original e-text: Ciarán Ó Duibhín
AU020: Ben-Hur; trans. Seosamh 'ac Grianna (1900–1990)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1933
Original e-text: Ciarán Ó Duibhín, using facilities at Irish Studies, University of Ulster, Coleraine
AU021: Eachtraí Sherlock Holmes; trans. Proinnsias Ó Brógáin (1905–1997)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1936
Original e-text: Ciarán Ó Duibhín
AU022: Iascaire Inse Tuile; trans. Séamus 'ac Grianna (1889–1969)
Publisher: Oifig an tSoláthair, Dublin, 1952
Original e-text: Rita Nic Aodha Bhuí
AU023: Dith Céille Almayer; trans. Seosamh 'ac Grianna (1900–1990)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1936
Original e-text: Rita Nic Aodha Bhuí
AU024: Faoi Chrann Smola; trans. Séamus 'ac Grianna (1889–1969)
Publisher: Oifig Díolta Foillseacháin Rialtais, 1934
Original e-text: Brian Mac Lochlainn
AU025: Uaigheanna Chill Mhóirne; trans. Domhnall 'ac Grianna (1894–1962)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1933
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin

AU027: Néall Dearg; trans. Niall 'ac Suibhne (1895–1949)
Publisher: Oifig Díolta Foillseacháin Rialtais, Dublin, 1935
Original e-text: Brian Mac Lochlainn

ULAIDH/BÉARLA:

AU002B: Robbery Under Arms; Rolf Boldrewood
Original e-text: Alan R. Light, via Project Gutenberg
AU011B: Comin' thro' the Rye; Helen Mathers                         NEW!
Original e-text: Ciarán Ó Duibhín
AU012B: Typhoon / Amy Foster; Joseph Conrad
Original e-text: Judith Boss, Omaha, Nebraska, via Project Gutenberg
AU013B: Islanders; Peadar O'Donnell
Original e-text: Ciarán Ó Duibhín
AU018B: The Call of the Wild; Jack London
Original e-text: Oxford Text Archive
AU020B: Ben-Hur; Lew Wallace
Original e-text: Virginia Tech
AU021B: The Memoirs of Sherlock Holmes; Arthur Conan Doyle
Original e-text: Roger Squires
AU022B: Iceland Fisherman; Pierre Loti, English translation by Jules Cambon
Original e-text: Dagny, and John Bickers, via Project Gutenberg
AU023B: Almayer's Folly; Joseph Conrad
Original e-text: David Price, via Project Gutenberg
AU025B: The Graves of Kilmorna; Canon Sheehan
Original e-text: Ciarán Ó Duibhín

AU027B: Red Cloud; General Sir William F Butler
Original e-text: Ciarán Ó Duibhín

ULAIDH/FRAINCIS:

AU012F: Typhon; Joseph Conrad, French translation by André Gide
Original e-text: ATILF, CNRS, Nancy (FranText) (http://atilf.atilf.fr/artis/nvlbiblio.htm, http://www.frantext.fr/)
AU020F: Ben-Hur; Lew Wallace, French translation by Philippe Mazoyer                NEW!
Original e-text: Ciarán Ó Duibhín
AU022F: Pêcheur d'Islande; Pierre Loti
Original e-text: L'Institut National de la Langue Française

AU024F: La Terre qui Meurt; René Bazin
Original e-text: Ciarán Ó Duibhín

CONNACHTA:

LC023: Feamainn Bhealtaine; Máirtín Ó Diréain
Publisher: 
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin
LC027: An Mothall Sin Ort; Seán Ó Ruadháin
Publisher: 
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin

LC093: Fánaidheacht i gConndae Mhuigheo; Séamus Mag Uidhir
Publisher: Oifig an tSoláthair, 1944
Original e-text: Peter K Griffin
This book is an abridged collection of pieces previously published in "An tÉireannach".

MUMHAIN:

LM054: Timcheall Chinn Sléibhe; Seán Ó Dálaigh
Publisher: Oifig an tSoláthair, 1933
Original e-text: Aindí Coyle
LM066: Na hAird Ó Thuaidh; Pádraig Ua Maoileoin
Publisher: Sáirséal agus Dill, Dublin, 1960
Original e-text: Foclóir na Nua-Ghaeilge, Acadamh Ríoga na hÉireann, Dublin

ALBA:

COMPANAC: Companach na Cloinne; Iain Mac Phàidein
Publisher: Eneas MacKay, Stirling, 1912
Original e-text: Ciarán Ó Duibhín
TRIDEALB: Tri Dealbhan Cluiche; Alasdair Caimbeul
Publisher: Cló Ostaig, An t-Eilean Sgìtheanach, 1990
Original e-text: Caoimhín Ó Donnghaile, Sabhal Mòr Ostaig
BODACH: Am Measg nam Bodach; various authors
Publisher: An Comunn Gàidhealach, Glaschu, 1938
Original e-text: Ciarán Ó Duibhín
SEANCH: Seanchaidhe na Tràghad; Iain Mac Cormaic
Publisher: Eneas MacKay, Stirling, 1911
Original e-text: Ciarán Ó Duibhín

Keyboard layouts

If you are not already using a satisfactory method of keyboarding Gaelic text, you may find useful one of the Gaelic keyboard layouts which are included with Tobar na Gaedhilge. However, these layouts and Tobar na Gaedhilge can be used completely independently of one another.

Information regarding three layouts may be found in the directory into which Tobar na Gaedhilge was installed.  The layouts are:

Miscellaneous

• Citing Tobar na Gaedhilge

If you publish results which have benefitted from the use of Tobar na Gaedhilge, it would be appreciated if you would cite us among your sources. A suggested form of citation for this version is:

Ciarán Ó Duibhín, Tobar na Gaedhilge, version 1.4 (2009), Gaelic textbase and retrieval system for use under MS Windows, freely downloadable from http://www.smo.uhi.ac.uk/~oduibhin/tobar/index.htm

• WARNING: Always check the sources

Although great care has been taken to make the textbase accurate, and it is constantly being improved, it is advisable not to draw linguistic conclusions without checking examples against the original printed matter.

• Using Tobar na Gaedhilge on college networks

Colleges wishing to make Tobar na Gaedhilge available on their student computing networks are welcome to do so.

• Like to help?

The compiler intends to continue to add Ulster texts to the textbase. He would welcome volunteers to: