Acmhainn Díchlaochlaithe agus Scoilte Focal:
Word Demutation
and Segmentation Tool
Ciarán Ó Duibhín
Más fada leat an teideal thuas, ní miste liom má bheir tú "An Duibhíneach" ar an acmhainn seo!
If you find the title above too long, I don't mind if you call this tool "An Duibhíneach"!
Bainidh an acmhainn seo na claochlaithe tosaigh ón fhocal a bheirtear dó, agus, má bhíonn uaschamóg nó fleiscín san fhocal, scoiltidh sé an focal ina chodanna dá réir. Oibridheacht riachtanach seo i dtéacs-phróiseáil na Gaedhilge.
This tool attempts to remove initial mutations from a supplied Gaelic word, and also to segment any word containing an apostrophe or a hyphen into its constituent parts. This is a basic operation in Gaelic text processing.
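As a rough illustration of what "removing initial mutations" involves, here is a minimal Python sketch. It is not the tool's own algorithm (which is written in Delphi and is not reproduced on this page); it only applies the standard spelling patterns of lenition, eclipsis and prefixed n-, t- and h. The names `naive_demutate` and `exceptions` are hypothetical, and segmentation at apostrophes is left out.

```python
# Eclipsis spellings of standard Irish orthography. The radical consonant is
# always the last letter, so dropping everything before it undoes the mutation.
ECLIPSIS = ("bhf", "mb", "gc", "nd", "ng", "bp", "dt")
LENITABLE = "bcdfgmpst"   # consonants that take a following h when lenited
VOWELS = "aeiouáéíóú"


def naive_demutate(word, exceptions=frozenset()):
    """Strip one initial mutation by surface rules only.

    `exceptions` is a hypothetical set of lower-case word-forms that must be
    left alone (an assumption made for this sketch).
    """
    lower = word.lower()
    if lower in exceptions or len(word) < 2:
        return word
    # Hyphenated n- or t- before a vowel: n-éan -> éan
    if len(word) > 2 and lower[0] in "nt" and word[1] == "-" and lower[2] in VOWELS:
        return word[2:]
    # Lower-case n or t before a capital vowel: nÉan -> Éan
    if word[0] in "nt" and word[1].isupper() and lower[1] in VOWELS:
        return word[1:]
    # Prefixed h before a vowel: héanacha -> éanacha
    if lower[0] == "h" and lower[1] in VOWELS:
        return word[1:]
    # Eclipsis: bhfear -> fear (checked before lenition so "bhf" is not read as "bh")
    for prefix in ECLIPSIS:
        if lower.startswith(prefix):
            return word[len(prefix) - 1:]
    # Lenition: fhear -> fear
    if lower[0] in LENITABLE and lower[1] == "h":
        return word[0] + word[2:]
    return word


print(naive_demutate("bhfear"))            # fear
print(naive_demutate("hata"))              # ata (wrong: "hata" is not mutated)
print(naive_demutate("hata", {"hata"}))    # hata
```

The `hata` case shows why surface rules alone are not enough: some word-forms only look mutated, so a real tool needs some further source of knowledge about exceptional words.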
An taisbeanadh — Using the demo
Is féidir taisbeanadh beag fá choinne MS-Windows a tharraingt anuas as seo. Leis an taisbeanadh seo, cuirtear isteach focla fríd an mhéarchlár, do réir ceann is ceann, lena bhfuascladh. Bain triail as ar shamplaí mar iad seo:
fhear, bhfear, n-éan, nÉan, héanacha, hata, d'ól, arsa'n, b'fhada, 'seadh, nárbh' etc.
A simple demo application for MS-Windows is provided for downloading from here. In this demo, words to be resolved are simply typed in, one at a time. Try it out on a few of the examples just suggested.
Más eol dó go bhfuil an dá fhéidearthacht ann leis an fhocal a bheirtear dó (m.sh. thart), fiafruighidh sé díot an mbainfidh sé an claochlú de nó nach mbainfidh. In gnáth-úsáid na h-acmhainne, agus na focla dá ndealbhú as téacs reatha, thiocfadh leis an chomhthéacs bheith ina chuidiú leis an cheist a fhreagairt.
If you give it a word-form which it knows to be ambiguous with regard to demutation (e.g. thart), it will ask you to choose whether to demutate or not. In a realistic application, where the words are being drawn from a running text, you could arrange to show some context to inform the decision.
Tá an acmhainn seo in úsáid leis na bliadhanta ag Foclóir na Nua-Ghaeilge in Acadamh Ríoga na hÉireann. Is mar thoradh ar an fhéacháil chruaidh a cuireadh uirthi annsin a tháinig cuid mhaith den fhorbairt atá deánta uirthi.
This tool has been in use for many years by Foclóir na Nua-Ghaeilge at Acadamh Ríoga na hÉireann. Much of its development is due to the range of data which it has encountered there.
I dteannta na modhanna oibre a mbeifí ag súil leo, bainidh an acmhainn leas as comhad ina bhfuil liostaí de fhocla eisceachtamhla. Tá dhá chomhad den tseort seo in éineacht leis an oideas taisbeanaidh, agus caithfear ceann acu a cheangal leis an oideas nuair a chuirtear a dhul é. Is iad na comhaid seo
The tool uses obvious algorithms, backed up by a file containing lists of exceptional words. Two such files are supplied with the demo, and one or other of them must be selected as the demo starts up. They are
An t-oideas bunaidh — Using the source
Sa dóigh is go dtig leat an acmhainn seo a chur go feidhm in do obair oideas-chumtha féin, cuirtear ar fagháil é san fhoirm bhunaidh (in Delphi 5); thig a tharraingt chugat as seo.
If you wish to use the tool in your own programming, it is supplied in source form (in Delphi 5), downloadable from here.
Glaoidhtear air mar fhó-oideas ó do fheidhm-oideas féin:
The interface takes the form of a function which may be called from an application:
function enrichword
(word: string; action: affirmtype;
segment, demutate: boolean;
endofline, prefixnow: boolean;
splitter, padesc: char;
continuation: boolean): string;
word: the word to be processed. Normally a complete word, but it
may also be either part of a broken word (e.g. one hyphenated at the end of a
line in running text). Special treatment as the initial or final part of a
broken word is secured by setting endofline or continuation
respectively to true; for a complete word, both should be set to false.
action: the name of a user-supplied function to handle queries from enrichword and
pass back the user's
replies. The specification is:
function action (prompt, default: string): boolean;
The function should be written to display the prompt string, and if desired the default value of the
reply — these are given to it by enrichword — and should invite a reply (typically,
In the demo, word is as typed by the user; action uses a one-line edit box on the screen; segment and demutate are always true; endofline, prefixnow and continuation are always false; splitter is '+' and padesc is '^', but before the output is displayed splitter is replaced by a number of spaces, while padesc and the character following it are removed.
In general, it is the responsibility of the user program which calls enrichword to:
The file of word lists will be opened when first required by the application. If you wish to
open it before then, you may call the procedure:
processlists (action)
where action has the same specification as the similarly-named parameter
of enrichword above. The same function that was supplied there may be reused
here.
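The display step that the demo applies to the output of enrichword (splitter replaced by spaces, padesc and the character after it removed) can be sketched as follows. This is a Python illustration under stated assumptions, not the demo's Delphi code: the function name `display_form`, the width of the gap (three spaces) and the sample strings are all hypothetical, and the sketch assumes that padesc is handled first, so a splitter directly after padesc is dropped rather than turned into spaces.

```python
def display_form(enriched, splitter="+", padesc="^", gap="   "):
    """Prepare an enriched string for display.

    Every `splitter` becomes `gap`; every `padesc` is removed together with
    the single character that follows it.
    """
    out = []
    i = 0
    while i < len(enriched):
        ch = enriched[i]
        if ch == padesc:
            i += 2          # skip padesc and the character following it
            continue
        out.append(gap if ch == splitter else ch)
        i += 1
    return "".join(out)


# Hypothetical inputs, made up for illustration only:
print(display_form("d'+ól"))   # d'   ól
print(display_form("a^bc"))    # ac
```

A program that consumes the output for further processing, rather than for display, would presumably keep the splitter and padesc characters and interpret them itself.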
Tá sé ceadmhach úsáid ar bith is mian leat a bhaint as an acmhainn seo, agus a chur in oireamhaint duit féin. Má bhaintear leas as, ba mhór liom dá dtabharfaí creideamhaint don áit a bhfuarthas é.
This tool may be used freely and adapted in any way. If it is found useful, it would be appreciated if its source were credited.