Using the FreeLing Analyzer Program under Windows
Ciarán Ó Duibhín
FreeLing is a system for the linguistic analysis of text (tagging, lemmatization, etc.),
developed at the Universitat Politècnica de Catalunya by a team including Lluís
Padró. The system includes extensive language data for Spanish, Catalan,
Galician, English and Italian; from Version 2.2, also for Welsh, Portuguese and
Asturian; and from Version 3.0, also for Russian and Ancient Spanish. The system takes the form of a library, which can be called from
within a computer program, but there is also a fully-compiled application
program, called analyzer.exe, which allows most of
the functionality of FreeLing to be used. We will be concerned here only with
the installation and use of this stand-alone analyzer program.
FreeLing is written in C++, and development takes place in a Unix environment.
The latest version as of 22 October 2012 is FreeLing 3.0. A Windows port
(173 MB), containing the binary and the language data, is available, and a patch for it is available here.
(This port also contains project files and instructions to recompile FreeLing on Windows using MSVC.)
The user manual is available here.
A number of Windows ports of earlier versions of FreeLing were also created, and will be mentioned later. Unlike the Version 3.0 port, these earlier ports were made by
individuals on a voluntary basis and were not supported by the developers, but their use could be (and has been) discussed by their users on the
FreeLing Forum (to contribute to the discussion, you must register and then log in on the FreeLing home page).
The zip file for Version 3.0 named in the link above should
be downloaded and extracted
into a suitable folder, taking care to preserve the subfolder
structure (by checking the "Use folder names" button or similar). If you choose to
extract to C:\Program Files, the package will be extracted into
C:\Program Files\freeling_win and its subfolders. It may simplify later steps if
you now rename this folder to C:\Program Files\FreeLing-3.0; this will be called the "extraction folder." No installer program is required. You should now replace
C:\Program Files\FreeLing-3.0\freeling\lib\freeling.dll
with the version of freeling.dll in the downloaded patch.
The Version 3.0 User Manual may be downloaded from here.
Following zip extraction, installation instructions may be found in C:\Program Files\FreeLing-3.0\README,
and there is not much to add to them. Our suggested modifications are minor: (a) changing the extraction folder to a subfolder of Program Files,
in accordance with usual Windows practice; and (b) possibly making the changes to environment variables temporary, by using a batch file.
If the redistributable component (at least) of Microsoft Visual C++ 2010, to which msvcr100.dll and msvcp100.dll belong, is not already installed on your computer, you should now install it. Download it
from here, and run it. Alternatively, if the files
msvcr100.dll, msvcp100.dll and tlkernel.dll already exist on your
computer in the folder of some other application program, it MAY suffice to copy them to C:\Program Files\FreeLing-3.0\freeling\bin.
As a Unix program in origin, the analyzer expects to read its options from the
MS-DOS command line (other possibilities are mentioned later). The success of
the installation so far may be verified by opening an MS-DOS window, moving to
the C:\Program Files\FreeLing-3.0\freeling folder, and typing
bin\analyzer -h
This should run the analyzer program and display a list of allowed program
options (the options are fully described in the user manual).
To try out further program options, we need to make some changes to the Windows environment variables (unless these are made – and unmade – in a batch file,
see below). To make the changes now, press Win+Break, and choose Advanced system settings –> Environment variables –> System variables:
1. Append C:\Program Files\FreeLing-3.0\freeling\bin;C:\Program Files\FreeLing-3.0\freeling\lib;C:\Program Files\FreeLing-3.0\boost_1.47\lib;C:\Program Files\FreeLing-3.0\icu\bin; to Path
2. Create new variable FREELINGSHARE=C:\Program Files\FreeLing-3.0\freeling\data
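Before going further, it may be worth confirming that the folders named in these two steps really exist under your extraction folder, since a mistyped Path entry may only show up later as a missing-DLL error. The Python sketch below is not part of FreeLing; it assumes the 3.0 folder layout described above and an extraction folder of C:\Program Files\FreeLing-3.0, and the name missing_folders is hypothetical.

```python
import os

# Subfolders (relative to the extraction folder) named in steps 1 and 2,
# assuming the folder layout of the FreeLing 3.0 Windows port.
REQUIRED = [
    ("freeling", "bin"),
    ("freeling", "lib"),
    ("boost_1.47", "lib"),
    ("icu", "bin"),
    ("freeling", "data"),
    ("freeling", "data", "config"),
]


def missing_folders(root):
    """Return the expected subfolders that do not exist under root."""
    return ["\\".join(parts) for parts in REQUIRED
            if not os.path.isdir(os.path.join(root, *parts))]


if __name__ == "__main__":
    root = r"C:\Program Files\FreeLing-3.0"  # adjust to your extraction folder
    missing = missing_folders(root)
    print("Missing: " + ", ".join(missing) if missing
          else "All expected folders are present.")
```

An empty result means only that the folders exist; it does not check that the DLLs inside them are the right ones.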
The most important option for the analyzer is the name of a configuration
file. This is a file containing a set of program options, so that the
remainder of the command line need contain only options absent from the
configuration file, or those which we wish to differ from what is in the
configuration file. If no configuration file is specified on the command line,
the default is analyzer.cfg in the \freeling subfolder of the extraction
folder.
Other possible command-line content includes redirection of the input (text for
analysis) and/or output (results of analysis) to named files. In the absence
of redirection, input comes from the keyboard and output goes to the MS-DOS
window. End of input from the keyboard may be signalled by keying
Ctrl/Z followed by Enter.
Following zip extraction, you will find a configuration file for each
supported language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg, old-es.cfg, ru.cfg — in the C:\Program Files\FreeLing-3.0\freeling\data\config
subfolder.
If you intend to use Russian, change the 'Locale' option in C:\Program Files\FreeLing-3.0\freeling\data\config\ru.cfg from
ru_RU.UTF8 to rus. (In the other *.cfg files, the Locale option is already set to default.)
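If you prefer to script this change rather than make it in a text editor, the following Python sketch rewrites the value of a named option. It assumes (this page does not say so) that each option in a .cfg file is written on its own line as Name=value, and that lines starting with # are comments; the function names are hypothetical.

```python
import re


def set_option(text, key, value):
    """Return text with the value of each 'key=...' line replaced by value.

    Comment lines ('#...') are not matched, and the original line endings
    (LF or CRLF) are kept because the value stops at the line break.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*)[^\r\n]*", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + value, text)


def set_option_in_file(path, key, value):
    """Apply set_option to a configuration file in place."""
    # newline="" keeps CRLF/LF exactly as found in the file.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(set_option(text, key, value))
```

For the Russian case above, the call would be set_option_in_file(r"C:\Program Files\FreeLing-3.0\freeling\data\config\ru.cfg", "Locale", "rus"). Writing to a file under Program Files may require administrator rights.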
This completes the installation of the analyzer program. It can be run by
opening an MS-DOS window, and – from any folder – typing a
command line. A typical command-line might begin with
analyzer -f "%FREELINGSHARE%\config\en.cfg"
Note the double quotes, required because of the internal space in Program Files (after substitution of the value of FREELINGSHARE).
On its own, this command line will accept text from the keyboard, analyse it
according to typical rules for English, and output the results to the MS-DOS
window. Further options and redirections could be added on the command line,
to name input and/or output files, or to vary some of the settings in the
distributed English configuration file.
However, typing text directly from the keyboard in an MS-DOS window is unsatisfactory, because MS-DOS does not use the UTF-8 character-set
(as required by FreeLing 3.0), or even the Latin-1 character-set (as required by earlier versions of FreeLing). As a result, an accented letter typed
in MS-DOS is likely to mean something different to FreeLing. Instead, the input text should be placed in a file beforehand, and this file should be
named on the command line, e.g.
analyzer -f "%FREELINGSHARE%\config\en.cfg" <input.txt
A suitable text file can easily be created or edited in Windows using a plain-text editor such as Notepad (just make sure that the file is saved in UTF-8 encoding).
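If a text has already been saved in the Windows "ANSI" encoding (Windows-1252 on Western European and US systems) rather than UTF-8, it can be re-encoded before analysis. A minimal Python sketch, assuming Windows-1252 as the source encoding; the function name and the file names in the example are hypothetical. It writes UTF-8 without a byte-order mark, in case a mark at the start of the file were taken as part of the first word (this has not been checked against FreeLing).

```python
def to_utf8(src, dst, source_encoding="cp1252"):
    """Copy text file src to dst, re-encoding it as UTF-8 without a BOM.

    newline="" on both sides leaves the line endings untouched. Raises
    UnicodeDecodeError if src contains bytes undefined in source_encoding.
    """
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Example (hypothetical file names):
# to_utf8("input-ansi.txt", "input.txt")
```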
Note that this Windows port of FreeLing 3.0 has not been tested here under any
version of Windows other than Vista and XP SP3.
Instead of opening an MS-DOS window and typing a command-line to run the Analyzer program, you can use a batch file, which will allow you
to stay in Windows the whole time.
To create the batch file (called, for example, freeling.bat), go (in Windows) to the \freeling\bin subfolder of the extraction folder
and type the content of your command line into a plain-text file, freeling.bat, using an application such as Notepad. Add a further line containing
the command pause, so that you will have a chance to view the output before the window closes. The batch file should be saved with encoding OEM or MS-DOS
if either of these is available; otherwise encoding ANSI or Latin-1 will do for most purposes.
Double-clicking on the batch file's icon will run the analyzer. You can even make a shortcut to the batch file, and place the shortcut on your desktop,
on your Start Menu, or in any folder. If filenames mentioned in the batch file contain non-ASCII characters (such as accented letters), you
have to give these in the MS-DOS character-set rather than in the Windows character-set; in that case, unless you have a Windows plain-text editor that
can save in OEM or MS-DOS encoding, you may find it easier to edit the batch file under MS-DOS than under Windows.
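Another possibility is to generate the batch file with a script that encodes it in the OEM code page directly. The Python sketch below assumes code page 850 (common in Western Europe; US systems often use 437, and the chcp command in an MS-DOS window shows the one in use); write_batch and the file name in the example are hypothetical.

```python
def write_batch(path, lines, oem_encoding="cp850"):
    """Write lines to a batch file in an OEM (MS-DOS) code page.

    Lines end in CRLF, as is usual for batch files. A character that has
    no equivalent in the chosen code page raises UnicodeEncodeError rather
    than being silently replaced.
    """
    with open(path, "w", encoding=oem_encoding, newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")


# Example: an input file name containing an accented letter (hypothetical).
EXAMPLE_LINES = [
    "@Echo Off",
    r'analyzer -f "%FREELINGSHARE%\config\en.cfg" <"téacs.txt"',
    "Pause",
]
# write_batch("freeling.bat", EXAMPLE_LINES)
```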
If a batch file is used, the changes to environment variables described under installation above, instead of being made permanent, can be made in the batch file
before calling the analyzer, and cancelled in the batch file immediately afterwards. The following batch file embeds the previous sample analyzer command line
within such a sequence, and prompts for the name of the input file:
@Echo Off
Set OLDPATH=%PATH%
Set PATH=C:\Program Files\FreeLing-3.0\freeling\bin;C:\Program Files\FreeLing-3.0\freeling\lib;C:\Program Files\FreeLing-3.0\boost_1.47\lib;C:\Program Files\FreeLing-3.0\icu\bin;%PATH%
Set FREELINGSHARE=C:\Program Files\FreeLing-3.0\freeling\data
Set /P _input=Input file: || Set _input=input.txt
Echo On
analyzer -f "%FREELINGSHARE%\config\en.cfg" <"%_input%"
@Echo Off
Pause
Set PATH=%OLDPATH%
Set OLDPATH=
Set FREELINGSHARE=
Set _input=
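For anyone who prefers a scripting language to batch files, the same pattern (settings that exist only while the analyzer runs) can be sketched in Python. Instead of saving and restoring PATH, the script passes a modified copy of its own environment to the analyzer process. It assumes the 3.0 folder layout and extraction folder used above; freeling_env and run_analyzer are hypothetical names.

```python
import os
import subprocess

ROOT = r"C:\Program Files\FreeLing-3.0"  # assumed extraction folder


def freeling_env(root=ROOT, base=None):
    """Return a copy of the environment with the FreeLing 3.0 settings added.

    Only the copy is modified, so the settings last just as long as the
    child process, like the Set ... / restore lines of the batch file.
    """
    env = dict(os.environ if base is None else base)
    extra = [
        root + r"\freeling\bin",
        root + r"\freeling\lib",
        root + r"\boost_1.47\lib",
        root + r"\icu\bin",
    ]
    env["PATH"] = ";".join(extra + [env.get("PATH", "")])
    env["FREELINGSHARE"] = root + r"\freeling\data"
    return env


def run_analyzer(input_path, output_path, lang="en", root=ROOT):
    """Run analyzer.exe as with: analyzer -f <cfg> <input >output."""
    env = freeling_env(root)
    cfg = env["FREELINGSHARE"] + "\\config\\" + lang + ".cfg"
    exe = root + r"\freeling\bin\analyzer.exe"
    # Bytes are passed through unchanged, so the UTF-8 input reaches
    # the analyzer as-is; check=True raises an error on a non-zero exit.
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        subprocess.run([exe, "-f", cfg], stdin=fin, stdout=fout,
                       env=env, check=True)


# Example (Windows only): run_analyzer("input.txt", "output.txt")
```

Because the arguments are passed as a list, Python supplies the quoting around paths containing spaces that the batch file had to do by hand.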
If FreeLing is always to be called in this way, the permanent changes to environment variables described under installation above may be revoked.
We hope at some stage to offer a graphical Windows interface to FreeLing, which
will allow the options to be selected visually, and will then launch the application automatically.
There is no reason to install an earlier version of FreeLing, unless your version of Windows is too old to run the current one.
If required, earlier versions of FreeLing are available for Windows as follows. Version 2.2 is still downloadable from the FreeLing website.
Version 1.4 (28.8 MB) has been compiled for Windows by Jordi Atserias using Cygwin.
Version 1.5 has been compiled twice for Windows:
firstly, by Bruno Martínez using MS Visual C++ 2005, giving Version 1.5 (Martínez) (22.8 MB);
and secondly, by Javier Puche using MinGW + MSYS + msysDTK, giving Version 1.5 (Puche) (27.59 MB).
Version 2.0 has been compiled using Cygwin and is available for download as a set of three zip files:
the Version 2.0 program files (5.72 MB),
the Version 2.0 data files for English and Italian (14.19 MB), and
the Version 2.0 data files for Spanish, Catalan and Galician (20.95 MB).
Version 2.2 was ported by Israel Olalla, cross-compiled on Linux using MinGW32.
Here is Version 2.2 information
and here are Version 2.2 binaries (8.4 MB). Data files
for the supported languages, as well as the user manual, are to be found in the
Version 2.2 machine-independent distribution (40.5 MB).
A comparison of the subfolder structures created by unzipping the various ports
may be helpful. Those ports without a \bin subfolder hold the binary in the
root folder.
| 3.0 | 2.2 (Olalla) | 2.0 | 1.5 (Puche) | 1.5 (Martínez) | 1.4 (Atserias) |
| freeling | |||||
| bin | bin | bin | |||
| doc | doc | doc | |||
| userman | userman | ||||
| html | html | ||||
| refman | |||||
| html | |||||
| latex | |||||
| diagrams | |||||
| include | include | include | include | ||
| freeling | |||||
| morpho | freeling | freeling | |||
| morpho | fries | fries | |||
| omlet | omlet | omlet | |||
| utf8 | |||||
| lib | lib | ||||
| devel | |||||
| java | |||||
| dynamic | |||||
| java | |||||
| util | |||||
| data | share | data | |||
| common | common | common | common | common | |
| nec | nec | nec | nec | nec | |
| connector | |||||
| lang_ident | |||||
| config | config | config | config | config | |
| ca | ca | ca | ca | ca | |
| nec | nec | nec | nec | nec | |
| chunker | |||||
| dep | |||||
| en | en | en | en | en | |
| nec | nec | nec | nec | nec | |
| chunker | |||||
| dep | |||||
| ner | |||||
| es | es | es | es | es | |
| nec | nec | nec | nec | nec | |
| chunker | |||||
| dep | dep | ||||
| coref | |||||
| corrector | |||||
| ner | |||||
| old-es | |||||
| gl | gl | gl | gl | gl | |
| nec | nec | nec | nec | nec | |
| chunker | |||||
| dep | |||||
| ner | |||||
| it | it | it | it | it | |
| nec | nec | nec | nec | nec | |
| as | |||||
| nec | |||||
| chunker | |||||
| dep | |||||
| cy | |||||
| nec | |||||
| pt | |||||
| nec | |||||
| chunker | |||||
| ner | |||||
| ru | |||||
| icu | |||||
| bin | |||||
| include | |||||
| layout | |||||
| unicode | |||||
| lib | |||||
| boost_1.47 | |||||
| boost | |||||
| (80 subfolders) | |||||
| lib |
The zip files for Version 2.2 named in the links above should
be downloaded. The zip file containing the Windows binary should be extracted
into a suitable directory, taking care to preserve the internal subdirectory
structure (by checking the "Use folder names" button or similar). If you choose to
extract to C:\Program Files, the package will be extracted into
C:\Program Files\freeling-2.2-mingw and its subdirectories. It may simplify later steps if
you now rename this directory to C:\Program Files\FreeLing-2.2. No installer program is required.
The Version 2.2 User Manual may be downloaded from here, or
alternatively, extracted from the machine-independent 2.2 distribution.
As a Unix program in origin, the analyzer expects to read its options from the
MS-DOS command line (other possibilities are mentioned later). The success of
the installation so far may be verified by opening an MS-DOS window, moving to
the extraction directory, and typing
bin\analyzer -h
This should run the analyzer program and display a list of allowed program
options (the options are fully described in the user manual).
The most important option for the analyzer is the name of a configuration
file. This is a file containing a set of program options, so that the
remainder of the command line need contain only options absent from the
configuration file, or those which are to be made different from what is in the
configuration file. If no configuration file is specified on the command line,
the default is analyzer.cfg in the extraction
directory.
Other possible command-line content includes redirection of the input (text for
analysis) and/or output (results of analysis) to named files. In the absence
of redirection, input comes from the keyboard and output goes to the MS-DOS
window. End of input from the keyboard may be signalled by keying
Ctrl/Z followed by Enter.
You should now extract the data files for the supported languages from the
downloaded zip file for the machine-independent 2.2 distribution. You need to
select only the files packed in FreeLing-2.2\data, and then extract them, having
ensured that "Use folder names" or similar is selected, and that the target is
set to C:\Program Files. You will now find a configuration file for each
supported language (ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg) in the data\config
subdirectory. Among the contents of a configuration file will be a number of
filenames, and these filenames (at least in those configuration files which
you intend to use) may have to be changed to suit (a) Windows syntax, in
particular by changing forward slashes in paths to backslashes; and (b) your
choice of extraction directory.
So, for example, if a configuration file contains the filename
$FREELINGSHARE/en/tokenizer.dat
it should be changed to
C:\Program Files\FreeLing-2.2\data\en\tokenizer.dat
on the assumption that you extracted to C:\Program Files\FreeLing-2.2.
You will also have to change two filenames in configuration files:
maco.db to dicc.src
senses30.db to senses30.src
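With several configuration files to edit, these changes can be scripted. The Python sketch below applies the rules just described (substitute the data folder for $FREELINGSHARE, turn forward slashes into backslashes, rename the two database files) to every line that mentions $FREELINGSHARE, and keeps a .bak copy of the original. The data folder and the function names are assumptions for illustration. The file is read and written as Latin-1 only because that maps every byte to a character and back unchanged, so the file's actual encoding is preserved.

```python
import os

DATA_DIR = r"C:\Program Files\FreeLing-2.2\data"  # assumed extraction folder

# Filenames that must be changed for this Windows port (see above).
RENAMES = {"maco.db": "dicc.src", "senses30.db": "senses30.src"}


def convert_line(line, data_dir=DATA_DIR):
    """Rewrite one configuration line; lines without $FREELINGSHARE are kept."""
    if "$FREELINGSHARE" not in line:
        return line
    line = line.replace("$FREELINGSHARE", data_dir).replace("/", "\\")
    for old, new in RENAMES.items():
        line = line.replace(old, new)
    return line


def convert_cfg(path, data_dir=DATA_DIR):
    """Rewrite a configuration file in place, keeping the original as .bak."""
    with open(path, encoding="latin-1", newline="") as f:
        lines = f.readlines()
    os.replace(path, path + ".bak")
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.writelines(convert_line(line, data_dir) for line in lines)


# Example: convert_cfg(DATA_DIR + r"\config\en.cfg")
```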
This completes the installation of the analyzer program. It can be run by
opening an MS-DOS window, moving to the extraction directory, and typing a
command line. A typical command-line might begin with
bin\analyzer -f data\config\en.cfg
On its own, this command line will accept text from the keyboard, analyse it
according to typical rules for English, and output the results to the MS-DOS
window. Further options and redirections could be added on the command line,
to name input and/or output files, or to vary some of the settings in the
distributed English configuration file.
Note that this Windows port of FreeLing 2.2 has not been tested here under any
version of Windows other than Vista.
The zip files linked above for Version 2.0 should be
downloaded, and extracted into a suitable directory (e.g. C:\Program Files\FreeLing20), taking care to preserve
the internal subdirectory structure. No installer program is used. The
remaining information required for Windows installation will be found in the
file README.txt, contained in the top-level
directory.
The Version 2.0 User Manual is included in the download of the program files,
as doc\userman\userman.pdf.
As a Unix program in origin, the analyzer expects to read its options from the
MS-DOS command line (other possibilities are mentioned later). For now, the
installation of the analyzer can be tested by opening an MS-DOS window, moving
to the extraction directory, and typing
bin\analyzer -h
This should display a list of allowed program options (the options are fully
described in the user manual).
The most important option for the analyzer is the name of a configuration
file. This is a file containing a set of program options, so that the
remainder of the command line need contain only options absent from the
configuration file, or those which are to be made different from what is in the
configuration file. If no configuration file is specified on the command line,
the default is analyzer.cfg in the extraction
directory.
Other possible command-line content includes redirection of the input (text for
analysis) and/or output (results of analysis) to named files. In the absence
of redirection, input comes from the keyboard and output goes to the MS-DOS
window. End of input from the keyboard may be signalled by keying
Ctrl/Z followed by Enter.
You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the share\config subdirectory. Among the contents of a
configuration file will be a number of filenames, and these filenames — at
least in those configuration files which you intend to use — may have to be
changed to suit (a) Windows syntax — in particular, by changing forward slashes
in paths to backslashes; and (b) your choice of extraction directory.
So, for example, if a configuration file contains the filename
$FREELINGSHARE/en/tokenizer.dat
it should be changed to
C:\Program Files\FreeLing20\share\en\tokenizer.dat
on the assumption that you extracted to C:\Program Files\FreeLing20.
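Because a single mistyped path in a hand-edited configuration file can stop the analyzer, it may help to check that every file the configuration names actually exists. The Python sketch below assumes (this is not stated above) that options are written one per line as Name=value, with # starting a comment line, and it treats any value containing a backslash as a path; the function names are hypothetical.

```python
import os


def path_options(cfg_text):
    """Yield (option, value) pairs whose value contains a backslash."""
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if "\\" in value:
            yield key, value


def missing_files(cfg_text, exists=os.path.exists):
    """Return the (option, path) pairs whose path does not exist."""
    return [(key, value) for key, value in path_options(cfg_text)
            if not exists(value)]


def report(cfg_path):
    """Print each option in cfg_path that names a missing file."""
    with open(cfg_path, encoding="latin-1") as f:
        for key, value in missing_files(f.read()):
            print(key + ": " + value)


# Example: report(r"C:\Program Files\FreeLing20\share\config\en.cfg")
```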
This completes the installation of the analyzer program. It can be run by
opening an MS-DOS window, moving to the extraction directory, and typing a
command line. A typical command-line might begin with
bin\analyzer -f share\config\en.cfg
On its own, this command line will accept text from the keyboard, analyse it
according to typical rules for English, and output the results to the MS-DOS
window. Further options and redirections could be added on the command line,
to name input and/or output files, or to vary some of the settings in the
distributed English configuration file. Note that, in older versions of
Windows, you may have to abbreviate filenames and directory names to their 8+3
forms.
The zip file linked above for Version 1.5 (Puche) should be
downloaded, and extracted into a suitable directory (e.g. C:\Program Files\freeling1.5-win-java-all-langs), taking
care to preserve the internal subdirectory structure. No installer program is
used. (Don't worry about the mention of Java: it is not involved in running
the compiled analyzer program.)
The Version 1.5 PDF User Manual is not included in the download, and has been
superseded on the FreeLing website. Therefore I make it available here.
As a Unix program in origin, the analyzer expects to read its options from the
MS-DOS command line (other possibilities are mentioned later). For now, the
installation of the analyzer can be tested by opening an MS-DOS window, moving
to the extraction directory, and typing
analyzer -h
This should display a list of allowed program options (the options are fully
described in the user manual).
The most important option for the analyzer is the name of a configuration
file. This is a file containing a set of program options, so that the
remainder of the command line need contain only options absent from the
configuration file, or those which are to be made different from what is in the
configuration file. If no configuration file is specified on the command line,
the default is analyzer.cfg in the extraction
directory.
Other possible command-line content includes redirection of the input (text for
analysis) and/or output (results of analysis) to named files. In the absence
of redirection, input comes from the keyboard and output goes to the MS-DOS
window. End of input from the keyboard may be signalled by keying
Ctrl/Z followed by Enter.
You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config
subdirectory. Among the contents of a configuration file will be a number of
filenames, and these filenames — at least in those configuration files which
you intend to use — may have to be changed to suit (a) Windows syntax — in
particular, by changing forward slashes in paths to backslashes; and (b) your
choice of extraction directory.
So, for example, if a configuration file contains the filename
en/tokenizer.dat
it may have to be changed to
C:\Program Files\freeling1.5-win-java-all-langs\en\tokenizer.dat
on the assumption that you extracted to C:\Program Files\freeling1.5-win-java-all-langs.
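Where such relative paths do need changing, the rewrite can be done mechanically. A Python sketch, assuming that each path appears as the value of a Name=value line and begins with one of the language folders listed above (or common); ROOT and absolutize are hypothetical names.

```python
import re

ROOT = r"C:\Program Files\freeling1.5-win-java-all-langs"  # assumed

# Data subfolders assumed to begin relative paths: the language folders
# listed above, plus "common".
FOLDERS = ("ca", "en", "es", "gl", "it", "common")

# "=", optional blanks, then e.g. en/tokenizer.dat (stops at blank or #).
REL_PATH = re.compile(r"(=[ \t]*)((?:%s)/[^\s#]+)" % "|".join(FOLDERS))


def absolutize(text, root=ROOT):
    r"""Turn values like en/tokenizer.dat into root\en\tokenizer.dat.

    Works on a single line or on a whole file's text, since a match never
    crosses a line break; values that are already absolute are untouched.
    """
    return REL_PATH.sub(
        lambda m: m.group(1) + root + "\\" + m.group(2).replace("/", "\\"),
        text)
```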
This completes the installation of the analyzer program. It can be run by
opening an MS-DOS window, moving to the extraction directory, and typing a
command line. A typical command-line might begin with
analyzer -f config\en.cfg
On its own, this command line will accept text from the keyboard, analyse it
according to typical rules for English, and output the results to the MS-DOS
window. Further options and redirections could be added on the command line,
to name input and/or output files, or to vary some of the settings in the
distributed English configuration file. Note that, in older versions of
Windows, you may have to abbreviate filenames and directory names to their 8+3
forms.
First, if you are using Windows 95, note that FreeLing 1.5 (Martínez) will NOT work under it: you may follow the
process below for some distance, but it will eventually fail.
The zip file linked above for Version 1.5 (Martínez) should be downloaded, and
extracted into a suitable directory (e.g. C:\Program Files\FreeLing-1.5), taking care to preserve the internal subdirectory
structure. No installer program is used. No information whatever is included
about how to install.
The Version 1.5 PDF User Manual is not included in the download, and has been
superseded on the FreeLing website. Therefore I make it available here.
As a Unix program in origin, the analyzer expects to read its options from the
MS-DOS command line (other possibilities are mentioned later). For now, the
installation of the analyzer can be tested by opening an MS-DOS window, moving
to the extraction directory, and typing
analyzer -h
This should display a list of allowed program options (the options are fully
described in the user manual).
However, an error may occur at this point because the download does NOT contain
the essential files MSVCR80.DLL and MSVCP80.DLL, which are runtime libraries required by
applications written in Visual C++ 2005 (that includes this port of FreeLing).
You may already have these files on your machine, if you have previously
installed Visual C++ 2005, or an application written in it.
The absence of these files produces different error messages in different
versions of Windows. In early versions (95; 98; ME? 2000?), it may say "A
required .DLL file, MSVCP80.DLL, was not found." With later versions of
Windows (XP; XP with SP2? 2003? Vista?), the message may be "The application
has failed to start because the application configuration is incorrect.
Reinstalling the application may fix this problem." Use of Resource Hacker shows that
analyzer.exe contains an embedded manifest which asks for version 8.0.50727.762
of the .dlls.
If these .dlls are missing or are causing errors, the best (and safest) way to
rectify this is to download and install the appropriate one of two free
Microsoft packages:
• for Windows 98; 98 Second Edition; ME: Microsoft
Visual C++ 2005 Redistributable Package (x86), v. 1.0 (actually,
8.0.50727.42), dated 2006/04/10 (2.6 MB);
• for Windows 2000; XP; 2003; Vista: Microsoft
Visual C++ 2005 SP1 Redistributable Package (x86), v. 8.0.50727.762, dated
2007/04/10 (2.6 MB)
As regards Windows 95, the first of these
Microsoft packages will actually install under Windows 95 — or at least, under
Windows 95B — but running the analyzer program now produces the message "The
MSVCR80.DLL file is linked to missing export
KERNEL32.DLL:GetLongPathNameW." This means that MSVCR80.DLL (and indeed
VC++ 2005 as a whole) is simply incompatible with Windows 95, which does not
have routines like GetLongPathNameW in its kernel. To run under Windows 95,
this port of FreeLing would need to be recompiled under an earlier version of
VC++, such as VC++ 7.1 (also known as Visual C++ .NET 2003), or possibly even
as far back as VC++ 6.
Under Windows 98 or later, if the analyzer -h
command is now producing output, we may try something more ambitious.
The most important option for the analyzer is the name of a configuration
file. This is a file containing a set of program options, so that the
remainder of the command line need contain only options absent from the
configuration file, or those which are to be made different from what is in the
configuration file. If no configuration file is specified on the command line,
the default is analyzer.cfg in the extraction
directory.
Other possible command-line content includes redirection of the input (text for
analysis) and/or output (results of analysis) to named files. In the absence
of redirection, input comes from the keyboard and output goes to the MS-DOS
window. End of input through the keyboard may be signalled by keying Ctrl/Z
followed by Enter.
You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config
subdirectory. Among the contents of a configuration file will be a number of
filenames, and these filenames — at least in those configuration files which
you intend to use — may have to be changed to suit (a) Windows syntax — in
particular, by changing forward slashes in paths to backslashes; and (b) your
choice of extraction directory.
So, for example, if a configuration file contains the filename
en/tokenizer.dat
it may have to be changed to
C:\Program Files\FreeLing-1.5\en\tokenizer.dat
on the assumption that you extracted to C:\Program Files\FreeLing-1.5.
You may now try a command line such as the following:
analyzer -f config\en.cfg
On its own, this command line will accept text from the keyboard, analyse it
according to typical rules for English, and output the results to the MS-DOS
window. Further options and redirections could be added on the command line,
to name input and/or output files, or to vary some of the settings in the
distributed English configuration file. Note that, in older versions of
Windows, you may have to abbreviate filenames and directory names to their 8+3
forms.
In Windows 98 — and the same is probably
true of Windows ME — the analyzer program
may now produce error messages such as "Error 14 while opening database
en\maco.db." This problem is cured if the distributed file libdb45.dll is replaced by this
one, which was kindly compiled by Andrei Costache of Oracle/BerkeleyDB for
Windows 98/ME as the target system. (It works on newer Windows systems too, but
may not perform as efficiently on the newer systems as the distributed
libdb45.dll.) Many thanks to Andrei for his patient help with this. Remember
that use of libdb45.dll is subject to the terms of the BerkeleyDB
licence agreement.
This completes the installation of the analyzer program.
The omissions in the distribution and the incompatibility of the executable
with Windows 95 are regrettable, as compilation under Visual C++ feels like
the best way to go.
The zip file linked above for Version 1.4 should be
downloaded, and extracted into a suitable directory (e.g. C:\Program Files\FreeLing-1.4), taking care to preserve
the internal subdirectory structure. No installer program is used. The
remaining information required for Windows installation will be found in the
file Readme, contained in the top-level
directory.
The Version 1.4 User Manual is included in the general download for that
version, as doc\userman\userman.pdf.
As a Unix program in origin, the analyzer expects to read its options from the
MS-DOS command line (other possibilities are mentioned later). For now, the
installation of the analyzer can be tested by opening an MS-DOS window, moving
to the extraction directory, and typing
analyzer -h
This should display a list of allowed program options (the options are fully
described in the user manual).
The most important option for the analyzer is the name of a configuration
file. This is a file containing a set of program options, so that the
remainder of the command line need contain only options absent from the
configuration file, or those which are to be made different from what is in the
configuration file. If no configuration file is specified on the command line,
the default is analyzer.cfg in the extraction
directory.
Other possible command-line content includes redirection of the input (text for
analysis) and/or output (results of analysis) to named files. In the absence
of redirection, input comes from the keyboard and output goes to the MS-DOS
window. End of input from the keyboard may be signalled by keying
Ctrl/Z followed by Enter.
You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the data\config
subdirectory. Among the contents of a configuration file will be a number of
filenames, and these filenames — at least in those configuration files which
you intend to use — may have to be changed to suit (a) Windows syntax — in
particular, by changing forward slashes in paths to backslashes; and (b) your
choice of extraction directory.
So, for example, if a configuration file contains the filename
N:/Eines-SL/freeling1.4/FreeLing/en/tokenizer.dat
or, in fact,
<anything>/en/tokenizer.dat
it should be changed to
C:\Program Files\FreeLing-1.4\data\en\tokenizer.dat
on the assumption that you extracted to C:\Program Files\FreeLing-1.4.
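The rule here (whatever precedes the language folder is discarded) lends itself to a regular expression. A Python sketch, assuming Name=value lines and that the language folders are those listed above; DATA_DIR and relocate are hypothetical names. It can be applied to one line or to the whole text of a configuration file, as a match never crosses a line break.

```python
import re

DATA_DIR = r"C:\Program Files\FreeLing-1.4\data"  # assumed
LANGS = ("ca", "en", "es", "gl", "it")

# "=", optional blanks, any prefix without blanks (as short as possible),
# then /<lang>/... ; only the <lang>/... part is kept.
ANY_PREFIX = re.compile(r"(=[ \t]*)\S*?/((?:%s)/\S+)" % "|".join(LANGS))


def relocate(text, data_dir=DATA_DIR):
    r"""Replace '<anything>/<lang>/file' values by 'data_dir\<lang>\file'."""
    return ANY_PREFIX.sub(
        lambda m: m.group(1) + data_dir + "\\" + m.group(2).replace("/", "\\"),
        text)
```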
This completes the installation of the analyzer program. It can be run by
opening an MS-DOS window, moving to the extraction directory, and typing a
command line. A typical command-line might begin with
analyzer -f data\config\en.cfg
On its own, this command line will accept text from the keyboard, analyse it
according to typical rules for English, and output the results to the MS-DOS
window. Further options and redirections could be added on the command line,
to name input and/or output files, or to vary some of the settings in the
distributed English configuration file. Note that, in older versions of
Windows, you may have to abbreviate filenames and directory names to their 8+3
forms.
This webpage is really about using FreeLing on Windows in the
form of the analyzer application, but it is also
possible to call FreeLing from a programming language, such as C++ or Java. I
have no personal experience of doing this, but here are a few basic notes,
which others are invited to correct and extend.
The FreeLing user manual usually contains a chapter entitled "Using the library
from your own application," but not necessarily written from a Windows perspective.
According to the manual, versions up to and including 2.0 had only
a C++ API, while versions 2.2 and 3.0 offer "a complete C++ API, a quite-complete Java API,
and half-complete perl and python APIs".
To use FreeLing from a programming language in Windows, FreeLing should be
available as a .dll file. However, only certain versions contain a large .dll file in their
distributions. Version 1.5 (Puche) supplies morfo.dll and morfo_java.dll
in the java folder. Version 2.2 (Olalla) supplies similarly-named .dll files in the bin folder. Version 3.0 supplies
freeling.dll and freeling-d.dll, in the freeling\lib folder.
From C++
To call FreeLing without recompilation from a C++ application under Windows, we may need to use the
.dll files which are distributed with certain versions as above, or to compile them for versions where they are not distributed.
The definitions are in the .h files in the \include subfolder, but these
are not distributed in either of the Version 1.5 ports (Puche or Martínez),
though they are probably to be found in the Linux distribution of that
version.
The user manual shows an example of a C++ program using the library. In compiling the example user's program, libraries named
morpho, db_cxx, pcre are linked from the earliest versions of FreeLing, with omlet and fries(?) added at version 2.0 and boost_filesystem
added at version 2.2. The user manual explains that morpho "links with libmorfo library, which is the final result of the FreeLing compilation process",
while db_cxx, pcre and the others refer to "other libraries required by FreeLing."
I am unclear how these libraries relate to the dll files supplied with certain versions as above, or if they relate at all.
Of course, the C++ programmer — unlike the user of other programming languages — has the alternative of recompiling any
version of FreeLing entirely from source along with his own application.
From Java
Puche's file java\USAGE.txt tells how to call FreeLing 1.5 (Puche) from a Java application, using the
two distributed .dlls and also a distributed file libmorfo_java.jar, which
contains the FreeLing API definitions, as well as some code. Such an application, myprog, is compiled as follows:
javac -classpath libmorfo_java.jar myprog.java
and is executed as follows:
java -classpath libmorfo_java.jar;. myprog
The application source file myprog.java makes
internal reference to morfo_java.dll.
From Delphi
From Perl
From Python
Contributions for inclusion here, concerning the use of FreeLing from any other programming language, will be welcomed.
This page is offered as a facility for corpus analysis on
Windows. By using it, you are deemed to accept that the author bears no
responsibility for any adverse consequences. Needless to say, he hopes that
there will be no such consequences. He will be pleased to receive comments,
but cannot promise to act upon them.