Using the FreeLing Analyzer Program under Windows

Ciarán Ó Duibhín

FreeLing is a system for the linguistic analysis of text (tagging, lemmatization etc.) developed at the Universitat Politècnica de Catalunya by a team including Lluís Padró.  The system includes extensive language data for Spanish, Catalan, Galician, English and Italian; and from Version 2.2, for Welsh, Portuguese and Asturian.  The system takes the form of a library, which can be called from within a computer program, but there is also a fully compiled application program, called analyzer.exe, which allows most of the functionality of FreeLing to be used.  We will be concerned here only with the installation and use of this stand-alone analyzer program.

FreeLing is written in C++, and development takes place in a Unix environment.  The latest version as of 03 September 2010 is FreeLing 2.2. A port of FreeLing 2.2 for Windows has been made by Israel Olalla, cross-compiled on Linux using MinGW32. Here is information and here are binaries (8.4 MB) and here is the user manual for 2.2 (407 KB). Data files for the supported languages are to be found in the machine-independent 2.2 distribution (40.5 MB). A number of ports of earlier versions of FreeLing are also downloadable, and will be mentioned later.

These Windows ports have been made by individuals on a voluntary basis, and are offered by the FreeLing developers for download from the FreeLing website merely "as a service".  The FreeLing developers are at pains to point out that they have no interest in Windows, and cannot assist users of FreeLing under Windows.  Some discussion among users of FreeLing under Windows may be read on the FreeLing Forum (to contribute to the discussion, you must register and then log in on the FreeLing home page).

Installing Analyzer from FreeLing 2.2 in Windows

The zip files for Version 2.2 named in the links above should be downloaded. The zip file containing the Windows binary should be extracted into a suitable directory, taking care to preserve the internal subdirectory structure (by checking the "Use folder names" button or similar). If you choose C:\Program Files as the extraction target, the package will be extracted into C:\Program Files\freeling-2.2-mingw and its subdirectories. It may facilitate later steps if you now rename this directory to C:\Program Files\FreeLing-2.2. No installer program is required.

The Version 2.2 User Manual may be downloaded from here, or alternatively, extracted from the machine-independent 2.2 distribution.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  The success of the installation so far may be verified by opening an MS-DOS window, moving to the extraction directory, and typing

        bin\analyzer -h

This should run the analyzer program and display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z.

You should now extract the data files for the supported languages from the downloaded zip file for the machine-independent 2.2 distribution. You need select only those files packed in FreeLing-2.2\data, and then extract, having ensured that "Use folder names" or similar is selected, and that the target is set to C:\Program Files. You will now find a configuration file for each supported language (ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg) in the data\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames, at least in those configuration files which you intend to use, may have to be changed to suit (a) Windows syntax, in particular by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        $FREELINGSHARE/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing-2.2\data\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-2.2
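
The substitution above can be applied mechanically to every line of a configuration file. The following is a minimal Python sketch, not part of FreeLing: the function name windows_path and the sample option line are illustrative assumptions, and it assumes that the filenames to be changed all begin with $FREELINGSHARE, as in the example.

```python
import re

# Assumed extraction directory, as in the example above.
DATA_DIR = r"C:\Program Files\FreeLing-2.2\data"

def windows_path(line, data_dir=DATA_DIR):
    """Rewrite each $FREELINGSHARE/... filename in one line of a config file."""
    def fix(match):
        # Turn the forward slashes of the Unix path into backslashes.
        tail = match.group(1).replace("/", "\\")
        return data_dir + tail
    return re.sub(r"\$FREELINGSHARE((?:/[^\s/]+)*)", fix, line)

# Illustrative option line; lines without $FREELINGSHARE are returned unchanged.
print(windows_path("TokenizerFile=$FREELINGSHARE/en/tokenizer.dat"))
```

It would be prudent to keep a copy of the distributed configuration file before overwriting it with the converted lines.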

You will also have to change two filenames in configuration files:

        maco.db to dicc.src

        senses30.db to senses30.src
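
These two renamings can also be made with a short script rather than by hand. This is only a sketch: the helper rename_databases is hypothetical, and the option names in the sample text are illustrative assumptions rather than a statement of what a given configuration file contains.

```python
# The two renamings listed above, as old name -> new name.
RENAMES = {"maco.db": "dicc.src", "senses30.db": "senses30.src"}

def rename_databases(config_text):
    """Apply both renamings to the text of a configuration file."""
    for old, new in RENAMES.items():
        config_text = config_text.replace(old, new)
    return config_text

# Illustrative sample lines (the option names are assumptions).
sample = ("DictionaryFile=$FREELINGSHARE/en/maco.db\n"
          "SenseFile=$FREELINGSHARE/common/senses30.db\n")
print(rename_databases(sample))
```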

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        bin\analyzer -f data\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.
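
For those who would rather launch the analyzer from a script than type the command, the same invocation with both redirections might be sketched in Python as follows. The function names analyzer_command and run_analyzer are hypothetical, and -f is the only analyzer option assumed here; input and output filenames are best given as full paths, since they are opened by the script and not by the analyzer.

```python
import ntpath
import subprocess

def analyzer_command(install_dir, config, extra_options=()):
    """Build the argument list; -f is the only option taken from the text above."""
    exe = ntpath.join(install_dir, "bin", "analyzer.exe")
    return [exe, "-f", config, *extra_options]

def run_analyzer(install_dir, config, input_file, output_file):
    """Run the analyzer with input and output redirected to named files."""
    with open(input_file, "rb") as src, open(output_file, "wb") as dst:
        # stdin and stdout stand in for '<' and '>' on the command line;
        # cwd makes the relative config path resolve as it would after
        # moving to the extraction directory in an MS-DOS window.
        result = subprocess.run(analyzer_command(install_dir, config),
                                stdin=src, stdout=dst, cwd=install_dir)
    return result.returncode

# Show the command that would be run (nothing is executed here).
print(analyzer_command(r"C:\Program Files\FreeLing-2.2", r"data\config\en.cfg"))
```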

Note that this Windows port of FreeLing 2.2 has not been tested here under any version of Windows other than Vista.

Using an MS-DOS batch file

As an alternative to typing command lines at the MS-DOS prompt, you can — while remaining in Windows — go to the extraction directory and edit the same lines into a text file with extension .bat (called, for example, freeling.bat).  Double-clicking on this file will open an MS-DOS window and run the program in it.  You can even add a shortcut to the batch file from your desktop or from your Start Menu.  If filenames mentioned in the batch file contain non-ASCII characters (such as accented letters), you may have to render these in the MS-DOS character-set rather than in the Windows character-set; in that case, you may find it easier to edit the batch file under MS-DOS than under Windows.
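
One way around the character-set difficulty is to have a script write the batch file in the MS-DOS character set directly. The sketch below assumes code page 850, which is only one possibility (the chcp command in an MS-DOS window reports the code page actually in use); the helper batch_file_bytes and the filenames español.txt and salida.txt are invented for illustration.

```python
def batch_file_bytes(command_lines, oem_codepage="cp850"):
    """Return the contents of a batch file encoded in an MS-DOS (OEM) code page.

    cp850 is an assumption; substitute the code page reported by chcp.
    """
    # Batch files use CR LF line endings.
    text = "".join(line + "\r\n" for line in command_lines)
    return text.encode(oem_codepage)

# Hypothetical command line with a non-ASCII input filename.
line = r"bin\analyzer -f data\config\es.cfg <español.txt >salida.txt"
data = batch_file_bytes([line])
print(data)

# To create the batch file itself in the extraction directory:
#     with open("freeling.bat", "wb") as f:
#         f.write(data)
```

The letter ñ illustrates the difference: it is a different byte in code page 850 than in the Windows character set (code page 1252), which is why a batch file saved by a Windows editor may name the wrong file.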

A graphic interface

The analyzer program has to be run from the MS-DOS command-line, where the required options and redirections are specified.

We hope at some stage to offer a Windows graphic interface to FreeLing, which will allow the options to be selected visually, and then the MS-DOS application to be launched automatically.

Ports of earlier versions of FreeLing

Earlier versions of FreeLing are available for Windows as follows, and are downloadable from the FreeLing website.  Version 1.4 (28.8 MB) has been compiled for Windows by Jordi Atserias using cygwin.  Version 1.5, the version prior to 2.0, has been compiled twice for Windows: firstly, by Bruno Martínez using MS Visual C++ 2005, as Version 1.5 (Martínez) (22.8 MB); and secondly, by Javier Puche using MinGW + MSYS + msysDTK, as Version 1.5 (Puche) (27.59 MB). Version 2.0 has been compiled using cygwin and is available for download as a set of three zip files: the Version 2.0 program files (5.72 MB), the Version 2.0 data files for English and Italian (14.19 MB), and the Version 2.0 data files for Spanish, Catalan and Galician (20.95 MB).

A comparison of the directory structures created by unzipping the various ports may be helpful. Those ports without a \bin subdirectory hold the binary in the root directory.

2.2 (Olalla) 2.0 1.5 (Puche) 1.5 (Martínez) 1.4 (Atserias)
bin bin      
doc doc     doc
    userman        userman
       html           html
           refman
              html
              latex
           diagrams
include include     include
  freeling   freeling      
  fries   fries      
  omlet   omlet      
lib        
    devel    
      java    
    dynamic    
    java    
    util    
  share     data
    common common common   common
       nec   nec   nec      nec
    config config config   config
    ca ca ca   ca
       nec   nec   nec      nec
    en en en   en
       nec   nec   nec      nec
    es es es    es
       nec   nec   nec      nec
    gl gl gl   gl
       nec   nec   nec      nec
    it it it   it
       nec   nec   nec      nec

Installing Analyzer from FreeLing 2.0 in Windows

The zip files linked above for Version 2.0 should be downloaded, and extracted into a suitable directory (e.g. C:\Program Files\FreeLing20), taking care to preserve the internal subdirectory structure. No installer program is used.  The remaining information required for Windows installation will be found in the file README.txt, contained in the top-level directory.

The Version 2.0 User Manual is included in the download of the program files, as doc\userman\userman.pdf.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        bin\analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the share\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        $FREELINGSHARE/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing20\share\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing20

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        bin\analyzer -f share\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

Installing Analyzer from FreeLing 1.5 (Puche) in Windows

The zip file linked above for Version 1.5 (Puche) should be downloaded, and extracted into a suitable directory (e.g. C:\Program Files\freeling1.5-win-java-all-langs), taking care to preserve the internal subdirectory structure.  No installer program is used.  (Don't worry about the mention of Java; it is not involved in running the compiled analyzer program.)

The Version 1.5 PDF User Manual is not included in the download, and has been superseded on the FreeLing website.  Therefore I make it available here.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        en/tokenizer.dat

it may have to be changed to

        C:\Program Files\freeling1.5-win-java-all-langs\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\freeling1.5-win-java-all-langs

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        analyzer -f config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

Installing Analyzer from FreeLing 1.5 (Martínez) in Windows

First off, if you are using Windows 95, FreeLing 1.5 (Martínez) will NOT work under it — you may follow the process below for a certain distance, but it will eventually fail.

The zip file linked above for Version 1.5 (Martínez) should be downloaded, and extracted into a suitable directory (e.g. C:\Program Files\FreeLing-1.5), taking care to preserve the internal subdirectory structure. No installer program is used, and the download includes no installation instructions.

The Version 1.5 PDF User Manual is not included in the download, and has been superseded on the FreeLing website.  Therefore I make it available here.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

However, an error may occur at this point because the download does NOT contain the essential files MSVCR80.DLL and MSVCP80.DLL, which are runtime libraries required by applications written in Visual C++ 2005 (that includes this port of FreeLing).  You may already have these files on your machine, if you have previously installed Visual C++ 2005, or an application written in it.

The absence of these files produces different error messages in different versions of Windows.  In early versions (95; 98; ME? 2000?), it may say "A required .DLL file, MSVCP80.DLL, was not found."  With later versions of Windows (XP; XP with SP2? 2003? Vista?), the message may be "The application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem."  Use of Resource Hacker shows that analyzer.exe contains an embedded manifest which asks for version 8.0.50727.762 of the .dlls.

If these .dlls are missing or are causing errors, the best (and safest) way to rectify this is to download and install the appropriate one of two free Microsoft packages:
• for Windows 98; 98 Second Edition; ME: Microsoft Visual C++ 2005 Redistributable Package (x86), v. 1.0 (actually, 8.0.50727.42), dated 2006/04/10 (2.6 MB);
• for Windows 2000; XP; 2003; Vista: Microsoft Visual C++ 2005 SP1 Redistributable Package (x86), v. 8.0.50727.762, dated 2007/04/10 (2.6 MB)

As regards Windows 95, the first of these Microsoft packages will actually install under Windows 95 — or at least, under Windows 95B — but running the analyzer program now produces the message "The MSVCR80.DLL file is linked to missing export KERNEL32.DLL:GetLongPathNameW."  This means that MSVCR80.DLL (and indeed VC++ 2005 as a whole) is simply incompatible with Windows 95, which does not have routines like GetLongPathNameW in its kernel.  To run under Windows 95, this port of FreeLing would need to be recompiled under an earlier version of VC++, such as VC++ 7.1 (also known as Visual C++ .NET 2003), or possibly even as far back as VC++ 6.

Under Windows 98 or later, if the analyzer -h command is now producing output, we may try something more ambitious.

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        en/tokenizer.dat

it may have to be changed to

        C:\Program Files\FreeLing-1.5\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-1.5

You may now try a command line such as the following

        analyzer -f config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

In Windows 98 — and the same is probably true of Windows ME — the analyzer program may now produce error messages such as "Error 14 while opening database en\maco.db."  This problem is cured if the distributed file libdb45.dll is replaced by this one, which was kindly compiled by Andrei Costache of Oracle/BerkeleyDB for Windows 98/ME as the target system. (It works on newer Windows systems too, but may not perform as efficiently on the newer systems as the distributed libdb45.dll.)  Many thanks to Andrei for his patient help with this.  Remember that use of libdb45.dll is subject to the terms of the BerkeleyDB licence agreement.

This completes the installation of the analyzer program.

The omissions in the distribution and the incompatibility of the executable with Windows 95 are regrettable, as compilation under Visual C++ feels like the best way to go.

Installing Analyzer from FreeLing 1.4 in Windows

The zip file linked above for Version 1.4 should be downloaded, and extracted into a suitable directory (e.g. C:\Program Files\FreeLing-1.4), taking care to preserve the internal subdirectory structure. No installer program is used.  The remaining information required for Windows installation will be found in the file Readme, contained in the top-level directory.

The Version 1.4 User Manual is included in the general download for that version, as doc\userman\userman.pdf.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the data\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        N:/Eines-SL/freeling1.4/FreeLing/en/tokenizer.dat

or, in fact,

        <anything>/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing-1.4\data\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-1.4
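
Since whatever precedes the language directory is to be discarded, the change can be described as "keep the path from the first known data subdirectory onwards, and put the new data directory in front". A rough Python sketch of that rule follows; the function relocate is hypothetical, and the list of subdirectory names is an assumption based on the languages named above.

```python
# Assumed extraction directory, as in the example above.
DATA_DIR = r"C:\Program Files\FreeLing-1.4\data"

# Assumed names of the subdirectories of the data directory.
DATA_SUBDIRS = ("common", "config", "ca", "en", "es", "gl", "it")

def relocate(filename, data_dir=DATA_DIR):
    """Keep the part of a filename from the first known subdirectory onwards."""
    parts = filename.replace("\\", "/").split("/")
    # The last component is the file itself, so it is not examined.
    for i, part in enumerate(parts[:-1]):
        if part in DATA_SUBDIRS:
            return data_dir + "\\" + "\\".join(parts[i:])
    return filename  # no known subdirectory found: leave unchanged

print(relocate("N:/Eines-SL/freeling1.4/FreeLing/en/tokenizer.dat"))
```

The rule is a heuristic: it would go wrong if an earlier component of the original path happened to share its name with one of the data subdirectories, so the converted file should be checked by eye.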

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        analyzer -f data\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

Using FreeLing from a Programming Language in Windows

This webpage is really about using FreeLing on Windows in the form of the analyzer application, but it is also possible to call FreeLing from a programming language, such as C++ or Java. I have no personal experience of doing this, but here are a few basic notes, which others are invited to correct and extend.

To use FreeLing from a programming language in Windows, FreeLing has to be compiled as a .dll file. Only the Version 2.2 (Olalla) and Version 1.5 (Puche) ports supply such a file — actually, it was found necessary there to break the .dll file into two parts, which are called morfo.dll and morfo_java.dll.

I will try to describe here Puche's use of these files. His file java\USAGE.txt tells how to call FreeLing from a Java application. A Java application which calls FreeLing 1.5 (Puche) requires the above two distributed .dlls and also a distributed file libmorfo_java.jar, which contains the FreeLing API definitions, as well as some code.  Such an application, myprog, is compiled as follows:
      javac -classpath libmorfo_java.jar myprog.java
and is executed as follows:
      java -classpath libmorfo_java.jar;. myprog
The application source file myprog.java makes internal reference to morfo_java.dll.

To call FreeLing from a C++ application under Windows, we may again use the .dll files, while the definitions are in the .h files in the \include directory.  These .h files are not distributed in either of the Version 1.5 ports (Puche or Martínez), though they are probably to be found in the Linux distribution of that version.  Of course, the C++ programmer has the alternative of recompiling any version of FreeLing entirely from source along with his own application.

Disclaimer

This page is offered as a facility for corpus analysis on Windows.  By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences.  Needless to say, he hopes that there will be no such consequences.  He will be pleased to receive comments, but cannot promise to act upon them.


Ciarán Ó Duibhín
2010/10/06