Using the FreeLing Analyzer Program under Windows

Ciarán Ó Duibhín

FreeLing is a system for the linguistic analysis of text (tagging, lemmatization, etc.) developed at the Universitat Politècnica de Catalunya by a team including Lluís Padró. The system includes extensive language data for Spanish, Catalan, Galician, English and Italian; from Version 2.2, also for Welsh, Portuguese and Asturian; and from Version 3.0, also for Russian and Ancient Spanish. The system takes the form of a library, which can be called from within a computer program, but there is also a fully compiled application program, called analyzer.exe, which gives access to most of the functionality of FreeLing. We are concerned here only with the installation and use of this stand-alone analyzer program.

FreeLing is written in C++, and development takes place in a Unix environment. The latest version as of 22 October 2012 is FreeLing 3.0. A Windows port (173 MB) containing the binary and the language data is available, together with a patch here. (This port also contains project files and instructions for recompiling FreeLing on Windows using MSVC.) The user manual is available here.

A number of Windows ports of earlier versions of FreeLing were also created, and are described later. Unlike the Version 3.0 port, these earlier ports were made by individuals on a voluntary basis and were not supported by the developers, but their use could be (and has been) discussed by their users on the FreeLing Forum (to contribute to the discussion, you must register and then log in on the FreeLing home page).

Installing Analyzer from FreeLing 3.0 in Windows

The zip file for Version 3.0 named in the link above should be downloaded and extracted into a suitable folder, taking care to preserve the subfolder structure (by checking the "Use folder names" button or similar). If you choose to extract to C:\Program Files, the package will be extracted into C:\Program Files\freeling_win and its subfolders. Later steps on this page assume that you now rename this folder to C:\Program Files\FreeLing-3.0, which will be called the "extraction folder." No installer program is required. You should now replace C:\Program Files\FreeLing-3.0\freeling\lib\freeling.dll with the version of freeling.dll in the downloaded patch.
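If you prefer to script these steps, here is a minimal Python sketch (FreeLing itself does not need Python). It assumes that the port zip unpacks into a top-level freeling_win folder, as described above, and that the patched freeling.dll has already been taken out of the patch archive as a file of its own; the function name install_freeling is hypothetical. Writing under C:\Program Files normally requires administrator rights on Vista and later, so the script would have to be run from a command prompt opened with "Run as administrator".

```python
import shutil
import zipfile
from pathlib import Path


def install_freeling(port_zip, patched_dll, parent=r"C:\Program Files"):
    """Unpack the FreeLing 3.0 Windows port under `parent` and apply the patch.

    Assumes the archive's top-level folder is freeling_win, as described above.
    """
    parent = Path(parent)
    with zipfile.ZipFile(port_zip) as zf:
        # extractall() recreates the stored subfolders ("Use folder names").
        zf.extractall(parent)
    target = parent / "FreeLing-3.0"
    # Rename freeling_win to the name used in the rest of this page.
    (parent / "freeling_win").rename(target)
    # Replace the distributed freeling.dll with the patched one.
    shutil.copyfile(patched_dll, target / "freeling" / "lib" / "freeling.dll")
    return target
```

For example, install_freeling("freeling_win.zip", r"patch\freeling.dll"), where both file names are placeholders for wherever you saved the downloads.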

The Version 3.0 User Manual may be downloaded from here.

Following zip extraction, installation instructions may be found in C:\Program Files\FreeLing-3.0\README, and there is not much to add to them here. Our suggested modifications are minor: (a) extracting to a subfolder of Program Files, in accordance with usual Windows practice; and (b) optionally making the changes to environment variables temporary, by using a batch file.

If at least the redistributable component of Microsoft Visual C++ 2010 is not already installed on your computer, you should now install it: download it from here, and run it. Alternatively, if the files msvcr100.dll, msvcp100.dll and tlkernel.dll already exist on your computer in the folder of some other application program, it MAY suffice to copy them to C:\Program Files\FreeLing-3.0\freeling\bin.
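Whether these runtime files can be found may be checked roughly by looking in the analyzer's own bin folder and in the folders on Path, which in most setups cover the places that matter (the Windows system folder is normally on Path). The following Python sketch does only that; it does not check DLL versions, and missing_runtime_dlls is a hypothetical helper name.

```python
import os
from pathlib import Path

# Runtime files named above.
RUNTIME_DLLS = ["msvcr100.dll", "msvcp100.dll", "tlkernel.dll"]


def missing_runtime_dlls(bin_dir, search_path=None):
    """Return those RUNTIME_DLLS found neither in bin_dir nor on the search path.

    search_path defaults to the current value of the PATH environment variable.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    folders = [Path(bin_dir)]
    folders += [Path(p) for p in search_path.split(os.pathsep) if p]
    return [name for name in RUNTIME_DLLS
            if not any((folder / name).is_file() for folder in folders)]
```

An empty list from missing_runtime_dlls(r"C:\Program Files\FreeLing-3.0\freeling\bin") suggests that nothing needs to be copied; otherwise the names returned are the files still to be found.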

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  The success of the installation so far may be verified by opening an MS-DOS window, moving to the C:\Program Files\FreeLing-3.0\freeling folder, and typing

        bin\analyzer -h

This should run the analyzer program and display a list of allowed program options (the options are fully described in the user manual).

To try out further program options, we need to make some changes to the Windows environment variables (unless these are to be made, and unmade, in a batch file, as described below). To make the changes now, press Win+Break and choose Advanced system settings –> Environment variables –> System variables:
1. Append the following to Path, preceded by a semicolon if the existing value does not already end with one: C:\Program Files\FreeLing-3.0\freeling\bin;C:\Program Files\FreeLing-3.0\freeling\lib;C:\Program Files\FreeLing-3.0\boost_1.47\lib;C:\Program Files\FreeLing-3.0\icu\bin;
2. Create new variable FREELINGSHARE=C:\Program Files\FreeLing-3.0\freeling\data
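A program that launches the analyzer can instead make these two changes for its own child process only, in the same spirit as the batch file described later. Here is a minimal Python sketch, assuming the extraction folder chosen above; freeling_env is a hypothetical helper that returns a modified copy of the environment and leaves the system settings untouched.

```python
import ntpath
import os

# Extraction folder assumed above.
FREELING_ROOT = r"C:\Program Files\FreeLing-3.0"

# Folders appended to Path in step 1, relative to the extraction folder.
PATH_DIRS = [("freeling", "bin"), ("freeling", "lib"),
             ("boost_1.47", "lib"), ("icu", "bin")]


def freeling_env(root=FREELING_ROOT, base=None):
    """Return a copy of `base` (default: this process's environment) with
    steps 1 and 2 applied; the original environment is not modified."""
    env = dict(os.environ if base is None else base)
    extra = [ntpath.join(root, *parts) for parts in PATH_DIRS]
    old = env.get("PATH", "").rstrip(";")
    env["PATH"] = ";".join(([old] if old else []) + extra)      # step 1
    env["FREELINGSHARE"] = ntpath.join(root, "freeling", "data")  # step 2
    return env
```

For example, subprocess.run([ntpath.join(FREELING_ROOT, "freeling", "bin", "analyzer.exe"), "-h"], env=freeling_env()). The analyzer is named by its full path because, on Windows, Python looks up the program using the calling process's own Path rather than the one passed in env; the modified Path does still govern where the analyzer finds its DLLs.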

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which we wish to differ from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the \freeling subfolder of the extraction folder.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files. In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window. End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

Following zip extraction, you will find a configuration file for each supported language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg, old-es.cfg, ru.cfg — in the C:\Program Files\FreeLing-3.0\freeling\data\config subfolder.

If you intend to use Russian, change the 'Locale' option in C:\Program Files\FreeLing-3.0\freeling\data\config\ru.cfg from ru_RU.UTF8 to rus. (In the other *.cfg files, the Locale option is already set to default.)
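The change can be made with any plain-text editor. For a scripted installation, the following Python sketch rewrites the value, assuming (check this against your own copy) that the option sits on a line of the form Locale=ru_RU.UTF8 and that the file is UTF-8; set_locale and fix_russian_cfg are hypothetical names.

```python
import re

# A whole "Locale = value" line; this layout is an assumption about ru.cfg.
LOCALE_LINE = re.compile(r"^([ \t]*Locale[ \t]*=[ \t]*)[^\r\n]*", re.MULTILINE)


def set_locale(cfg_text, value="rus"):
    """Return cfg_text with the value of the Locale option replaced by `value`."""
    return LOCALE_LINE.sub(lambda m: m.group(1) + value, cfg_text)


def fix_russian_cfg(path):
    # newline="" keeps the file's own line endings (CR LF or LF) when rewriting.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(set_locale(text))
```

Calling fix_russian_cfg(r"C:\Program Files\FreeLing-3.0\freeling\data\config\ru.cfg") should leave every other line, and its line ending, as it was.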

This completes the installation of the analyzer program. It can be run by opening an MS-DOS window and, from any folder, typing a command line. A typical command line might begin with

        analyzer -f "%FREELINGSHARE%\config\en.cfg"

Note the double quotes, which are required because, once %FREELINGSHARE% has been replaced by its value, the path contains the space in Program Files.

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.

However, typing text directly from the keyboard in an MS-DOS window is unsatisfactory, because MS-DOS uses neither the UTF-8 character set (as required by FreeLing 3.0) nor even the Latin-1 character set (as required by earlier versions of FreeLing). As a result, an accented letter typed in MS-DOS is likely to mean something different to FreeLing. Instead, the input text should be placed in a file beforehand, and this file should be named on the command line, eg.

        analyzer -f "%FREELINGSHARE%\config\en.cfg" <input.txt

A suitable file of text can easily be created or edited in Windows using a plain-text editor such as Notepad (just make sure that the file is saved in UTF-8 encoding).
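Where the text already exists in a file saved in a Windows "ANSI" encoding, it can be re-encoded rather than retyped. Here is a minimal Python sketch, assuming the source is in code page 1252 (the usual ANSI code page of Western European Windows); convert_to_utf8 is a hypothetical name. It writes UTF-8 without the byte-order mark that Notepad adds when saving as UTF-8, in case the mark would otherwise be read as part of the first word; this is a precaution, not a documented FreeLing requirement.

```python
def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file as UTF-8 without a byte-order mark.

    source_encoding is an assumption: use "latin-1", "cp850", "utf-8-sig",
    etc. according to how the source file was actually saved.
    """
    # newline="" on both sides keeps the original line endings unchanged.
    with open(src, encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

For example, convert_to_utf8("input-ansi.txt", "input.txt"). A file that Notepad has already saved as UTF-8 can be passed with source_encoding="utf-8-sig", which drops the byte-order mark on reading.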

Note that this Windows port of FreeLing 3.0 has not been tested here under any version of Windows other than Vista and XP SP3.
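The same invocation, with input and output both redirected to files, can also be issued from a program. Here is a minimal Python sketch, assuming that the Path and FREELINGSHARE changes above have been made permanently and that the configuration files are in the folder shown; analyzer_command and run_analyzer are hypothetical names. Because the arguments are passed as a list, Python itself adds the double quotes needed for the space in Program Files.

```python
import subprocess

# Value of FREELINGSHARE assumed from the installation steps above.
SHARE = r"C:\Program Files\FreeLing-3.0\freeling\data"


def analyzer_command(lang="en", share=SHARE):
    """Argument list for: analyzer -f "<share>\\config\\<lang>.cfg" """
    return ["analyzer", "-f", share + "\\config\\" + lang + ".cfg"]


def run_analyzer(input_file, output_file, lang="en", share=SHARE):
    """Same as: analyzer -f "...\\<lang>.cfg" <input_file >output_file"""
    # Binary mode passes the UTF-8 bytes through unchanged in both directions.
    with open(input_file, "rb") as fin, open(output_file, "wb") as fout:
        subprocess.run(analyzer_command(lang, share),
                       stdin=fin, stdout=fout, check=True)
```

For example, run_analyzer("input.txt", "output.txt") corresponds to analyzer -f "%FREELINGSHARE%\config\en.cfg" <input.txt >output.txt.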

Using a batch file

Instead of opening an MS-DOS window and typing a command-line to run the Analyzer program, you can use a batch file, which will allow you to stay in Windows the whole time.

To create the batch file (called, for example, freeling.bat), go (in Windows) to the \freeling\bin subfolder of the extraction folder and type the content of your command line into a plain-text file, freeling.bat, using an application such as Notepad. Add a further line containing pause, so that you will have a chance to view the output before it vanishes. The batch file should be saved with encoding OEM or MS-DOS if either is available; otherwise, encoding ANSI or Latin-1 will do for most purposes.

Double-clicking on the batch file's icon will run the analyzer. You can even make a shortcut to the batch file and place the shortcut on your desktop, on your Start Menu or in any folder. If filenames mentioned in the batch file contain non-ASCII characters (such as accented letters), you must give them in the MS-DOS character set rather than in the Windows character set; in that case, unless you have a Windows plain-text editor that can save in OEM or MS-DOS encoding, you may find it easier to edit the batch file under MS-DOS than under Windows.
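A batch file that names files with accented letters can also be written from a script, which sidesteps the question of which encodings your editor offers. Here is a minimal Python sketch; write_batch_file is a hypothetical name, and the OEM code page cp850 (Western European MS-DOS) is an assumption: typing chcp in an MS-DOS window shows the code page actually in use, which on many US systems is 437. A character that does not exist in the chosen code page raises an error instead of being silently replaced.

```python
def write_batch_file(path, lines, oem_codepage="cp850"):
    """Write lines as a batch file in an OEM (MS-DOS) code page.

    oem_codepage is an assumption; match it to the output of chcp.
    """
    # newline="\r\n" turns each "\n" written into the MS-DOS line ending.
    with open(path, "w", encoding=oem_codepage, newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")
```

For example, write_batch_file("freeling.bat", [r'analyzer -f "%FREELINGSHARE%\config\es.cfg" <"análisis.txt"', "pause"]).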

If a batch file is used, the changes to environment variables described under installation above, instead of being made permanent, can be made in the batch file before the analyzer is called, and cancelled in the batch file immediately afterwards. The following batch file embeds the previous sample analyzer command line within such a sequence, and prompts for the name of the input file (pressing Enter without typing a name selects input.txt):

        @Echo Off
        Set OLDPATH=%PATH%
        Set PATH=C:\Program Files\FreeLing-3.0\freeling\bin;C:\Program Files\FreeLing-3.0\freeling\lib;C:\Program Files\FreeLing-3.0\boost_1.47\lib;C:\Program Files\FreeLing-3.0\icu\bin;%PATH%
        Set FREELINGSHARE=C:\Program Files\FreeLing-3.0\freeling\data
        Set /P _input=Input file: || Set _input=input.txt
        Echo On
        analyzer -f "%FREELINGSHARE%\config\en.cfg" <"%_input%"
        @Echo Off
        Pause
        Set PATH=%OLDPATH%
        Set OLDPATH=
        Set FREELINGSHARE=
        Set _input=


If FreeLing is always to be called in this way, the permanent changes to environment variables described under installation above may be revoked.

A graphic interface

We hope at some stage to offer a Windows graphic interface to FreeLing, which will allow the options to be selected visually, and will then launch the application automatically.

Ports of earlier versions of FreeLing

There is no reason to install an earlier version of FreeLing unless your version of Windows is too old to run the current one. If required, earlier versions of FreeLing are available for Windows as follows. Version 2.2 is still downloadable from the FreeLing website.

Version 1.4 (28.8 MB) has been compiled for Windows by Jordi Atserias using cygwin.

Version 1.5 has been compiled twice for Windows:

firstly, by Bruno Martínez using MS Visual C++ 2005 — Version 1.5 (Martínez) (22.8 MB);

and secondly, by Javier Puche using MingW + Msys + msysDTK — Version 1.5 (Puche) (27.59 MB).

Version 2.0 has been compiled using cygwin and is available for download as a set of three zip files: the Version 2.0 program files (5.72 MB), the Version 2.0 data files for English and Italian (14.19 MB), and the Version 2.0 data files for Spanish, Catalan and Galician (20.95 MB).

Version 2.2 was ported by Israel Olalla, who cross-compiled it on Linux using MinGW32. Here is Version 2.2 information, and here are Version 2.2 binaries (8.4 MB). Data files for the supported languages, as well as the user manual, are to be found in the Version 2.2 machine-independent distribution (40.5 MB).

A comparison of the subfolder structures created by unzipping the various ports may be helpful. Those ports without a \bin subfolder hold the binary in the root folder.

3.0 2.2 (Olalla) 2.0 1.5 (Puche) 1.5 (Martínez) 1.4 (Atserias)
freeling          
  bin bin bin      
  doc doc     doc
      userman       userman
         html           html
             refman
                html
                latex
            diagrams
  include include include     include
    freeling          
      morpho   freeling   freeling      
      morpho   fries   fries      
      omlet   omlet   omlet      
      utf8          
  lib lib        
      devel    
        java    
      dynamic    
      java    
      util    
  data   share     data
    common     common common common   common
      nec        nec   nec   nec      nec
      connector          
      lang_ident          
    config     config config config   config
    ca     ca ca ca   ca
      nec        nec   nec   nec      nec
      chunker          
      dep          
    en     en en en   en
      nec        nec   nec   nec      nec
      chunker          
      dep          
      ner          
    es     es es es   es
      nec        nec   nec   nec      nec
      chunker          
      dep        dep      
      coref          
      corrector          
      ner          
      old-es          
    gl     gl gl gl   gl
      nec        nec   nec   nec      nec
      chunker          
      dep          
      ner          
    it     it it it   it
      nec        nec   nec   nec      nec
    as          
      nec          
      chunker          
      dep          
    cy          
      nec          
    pt          
      nec          
      chunker          
      ner          
    ru          
icu          
  bin          
  include          
    layout          
    unicode          
  lib          
boost_1.47          
  boost          
    (80 subfolders)          
  lib          

Installing Analyzer from FreeLing 2.2 in Windows

The zip files for Version 2.2 named in the links above should be downloaded. The zip file containing the Windows binary should be extracted into a suitable directory, taking care to preserve the internal subdirectory structure (by checking the "Use folder names" button or similar). If you choose to extract to C:\Program Files, the package will be extracted into C:\Program Files\freeling-2.2-mingw and its subdirectories. Later steps on this page assume that you now rename this directory to C:\Program Files\FreeLing-2.2. No installer program is required.

The Version 2.2 User Manual may be downloaded from here, or alternatively, extracted from the machine-independent 2.2 distribution.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  The success of the installation so far may be verified by opening an MS-DOS window, moving to the extraction directory, and typing

        bin\analyzer -h

This should run the analyzer program and display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files. In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window. End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You should now extract the data files for the supported languages from the downloaded zip file for the machine-independent 2.2 distribution. Select only the files packed in FreeLing-2.2\data, make sure that "Use folder names" or similar is checked and that the target is set to C:\Program Files, and then extract. You will now find a configuration file for each supported language (ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg) in the data\config subdirectory. A configuration file contains a number of filenames, and these filenames, at least in the configuration files you intend to use, may have to be changed to suit (a) Windows syntax, in particular by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        $FREELINGSHARE/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing-2.2\data\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-2.2

You will also have to change two filenames in configuration files:

        maco.db to dicc.src

        senses30.db to senses30.src
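With several configuration files to convert, these edits can be scripted. The following Python sketch applies both kinds of change to the text of a configuration file. It assumes that each $FREELINGSHARE path is written with forward slashes and contains no spaces, and that the data were extracted to the folder shown; convert_cfg_22 and convert_cfg_file are hypothetical names. The file is read and written as Latin-1, which maps every byte to a character and back, so the rest of the file is not disturbed.

```python
import re

# Location of the 2.2 data files assumed in the example above.
DATA_DIR = r"C:\Program Files\FreeLing-2.2\data"

# The two filename changes listed above.
RENAMES = {"maco.db": "dicc.src", "senses30.db": "senses30.src"}

# "$FREELINGSHARE" followed by one or more "/name" segments (no spaces).
SHARE_PATH = re.compile(r"\$FREELINGSHARE((?:/[^\s/]+)+)")


def convert_cfg_22(text, data_dir=DATA_DIR):
    """Apply the Version 2.2 configuration-file changes described above."""
    # $FREELINGSHARE/en/tokenizer.dat -> <data_dir>\en\tokenizer.dat
    text = SHARE_PATH.sub(lambda m: data_dir + m.group(1).replace("/", "\\"), text)
    for old, new in RENAMES.items():
        text = re.sub(r"\b" + re.escape(old) + r"\b", new, text)
    return text


def convert_cfg_file(path, data_dir=DATA_DIR):
    # Latin-1 round-trips every byte, so nothing else in the file changes.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(convert_cfg_22(text, data_dir))
```

It is prudent to keep a copy of each original configuration file before converting it with convert_cfg_file.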

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        bin\analyzer -f data\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.

Note that this Windows port of FreeLing 2.2 has not been tested here under any version of Windows other than Vista.

Installing Analyzer from FreeLing 2.0 in Windows

The zip files linked above for Version 2.0 should be downloaded, and extracted into a suitable directory (eg. C:\Program Files\FreeLing20), taking care to preserve the internal subdirectory structure. No installer program is used.  The remaining information required for Windows installation will be found in the file README.txt, contained in the top-level directory.

The Version 2.0 User Manual is included in the download of the program files, as doc\userman\userman.pdf.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        bin\analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files. In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window. End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the share\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        $FREELINGSHARE/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing20\share\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing20

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        bin\analyzer -f share\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

Installing Analyzer from FreeLing 1.5 (Puche) in Windows

The zip file linked above for Version 1.5 (Puche) should be downloaded, and extracted into a suitable directory (eg. C:\Program Files\freeling1.5-win-java-all-langs), taking care to preserve the internal subdirectory structure. No installer program is used. (Don't worry about the mention of Java: it is not involved in running the compiled analyzer program.)

The Version 1.5 PDF User Manual is not included in the download, and has been superseded on the FreeLing website.  Therefore I make it available here.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files. In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window. End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        en/tokenizer.dat

it may have to be changed to

        C:\Program Files\freeling1.5-win-java-all-langs\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\freeling1.5-win-java-all-langs

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        analyzer -f config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

Installing Analyzer from FreeLing 1.5 (Martínez) in Windows

First off, if you are using Windows 95, FreeLing 1.5 (Martínez) will NOT work under it — you may follow the process below for a certain distance, but it will eventually fail.

The zip file linked above for Version 1.5 (Martínez) should be downloaded, and extracted into a suitable directory (eg. C:\Program Files\FreeLing-1.5), taking care to preserve the internal subdirectory structure. No installer program is used. No information whatever is included about how to install.

The Version 1.5 PDF User Manual is not included in the download, and has been superseded on the FreeLing website.  Therefore I make it available here.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

However, an error may occur at this point because the download does NOT contain the essential files MSVCR80.DLL and MSVCP80.DLL, which are runtime libraries required by applications written in Visual C++ 2005 (that includes this port of FreeLing).  You may already have these files on your machine, if you have previously installed Visual C++ 2005, or an application written in it.

The absence of these files produces different error messages in different versions of Windows.  In early versions (95; 98; ME? 2000?), it may say "A required .DLL file, MSVCP80.DLL, was not found."  With later versions of Windows (XP; XP with SP2? 2003? Vista?), the message may be "The application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem."  Use of Resource Hacker shows that analyzer.exe contains an embedded manifest which asks for version 8.0.50727.762 of the .dlls.

If these .dlls are missing or are causing errors, the best (and safest) way to rectify this is to download and install the appropriate one of two free Microsoft packages:
• for Windows 98; 98 Second Edition; ME: Microsoft Visual C++ 2005 Redistributable Package (x86), v. 1.0 (actually, 8.0.50727.42), dated 2006/04/10 (2.6 MB);
• for Windows 2000; XP; 2003; Vista: Microsoft Visual C++ 2005 SP1 Redistributable Package (x86), v. 8.0.50727.762, dated 2007/04/10 (2.6 MB)

As regards Windows 95, the first of these Microsoft packages will actually install under Windows 95 — or at least, under Windows 95B — but running the analyzer program now produces the message "The MSVCR80.DLL file is linked to missing export KERNEL32.DLL:GetLongPathNameW."  This means that MSVCR80.DLL (and indeed VC++ 2005 as a whole) is simply incompatible with Windows 95, which does not have routines like GetLongPathNameW in its kernel.  To run under Windows 95, this port of FreeLing would need to be recompiled under an earlier version of VC++, such as VC++ 7.1 (also known as Visual C++ .NET 2003), or possibly even as far back as VC++ 6.

Under Windows 98 or later, if the analyzer -h command is now producing output, we may try something more ambitious.

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        en/tokenizer.dat

it may have to be changed to

        C:\Program Files\FreeLing-1.5\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-1.5

You may now try a command line such as the following

        analyzer -f config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

In Windows 98 — and the same is probably true of Windows ME — the analyzer program may now produce error messages such as "Error 14 while opening database en\maco.db."  This problem is cured if the distributed file libdb45.dll is replaced by this one, which was kindly compiled by Andrei Costache of Oracle/BerkeleyDB for Windows 98/ME as the target system. (It works on newer Windows systems too, but may not perform as efficiently on the newer systems as the distributed libdb45.dll.)  Many thanks to Andrei for his patient help with this.  Remember that use of libdb45.dll is subject to the terms of the BerkeleyDB licence agreement.

This completes the installation of the analyzer program.

The omissions in the distribution and the incompatibility of the executable with Windows 95 are regrettable, as compilation under Visual C++ feels like the best way to go.

Installing Analyzer from FreeLing 1.4 in Windows

The zip file linked above for Version 1.4 should be downloaded, and extracted into a suitable directory (eg. C:\Program Files\FreeLing-1.4), taking care to preserve the internal subdirectory structure. No installer program is used.  The remaining information required for Windows installation will be found in the file Readme, contained in the top-level directory.

The Version 1.4 User Manual is included in the general download for that version, as doc\userman\userman.pdf.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files. In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window. End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the data\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        N:/Eines-SL/freeling1.4/FreeLing/en/tokenizer.dat

or, in fact,

        <anything>/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing-1.4\data\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-1.4
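The rule used here (whatever precedes the language folder is replaced by the extraction path) can be applied to a whole file with a regular expression. Here is a Python sketch, assuming that language-data paths begin with one of the folder names ca, en, es, gl, it or common, optionally preceded by any prefix, and contain no spaces; localize_paths is a hypothetical name. With data_dir set to the extraction directory itself, the same function would also handle relative paths such as en/tokenizer.dat in the two Version 1.5 ports. Paths already converted (with backslashes only) are left alone.

```python
import re

# Folder names that may begin a language-data path in 1.x configuration files
# (an assumption based on the folder names listed on this page).
DATA_FOLDERS = ("ca", "en", "es", "gl", "it", "common")

# At the start of a value: an optional prefix ending in "/" (for instance
# "N:/Eines-SL/freeling1.4/FreeLing/"), then a data folder and the rest.
DATA_PATH = re.compile(
    r"(?<![^\s=])(?:[^\s=]*/)?((?:%s)/\S+)" % "|".join(DATA_FOLDERS))


def localize_paths(text, data_dir=r"C:\Program Files\FreeLing-1.4\data"):
    """Rewrite each language-data path in cfg text to a Windows path under data_dir."""
    return DATA_PATH.sub(
        lambda m: data_dir + "\\" + m.group(1).replace("/", "\\"), text)
```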

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        analyzer -f data\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8+3 forms.

Using FreeLing from a Programming Language in Windows

This webpage is really about using FreeLing on Windows in the form of the analyzer application, but it is also possible to call FreeLing from a programming language, such as C++ or Java. I have no personal experience of doing this, but here are a few basic notes, which others are invited to correct and extend.

The FreeLing user manual usually contains a chapter entitled "Using the library from your own application," but not necessarily written from a Windows perspective. According to the manual, versions up to and including 2.0 had only a C++ API, while versions 2.2 and 3.0 offer "a complete C++ API, a quite-complete Java API, and half-complete perl and python APIs".

To use FreeLing from a programming language in Windows, FreeLing should be available as a .dll file. However, only certain versions contain a large .dll file in their distributions. Version 1.5 (Puche) supplies morfo.dll and morfo_java.dll in the java folder. Version 2.2 (Olalla) supplies similarly-named .dll files in the bin folder. Version 3.0 supplies freeling.dll and freeling-d.dll, in the freeling\lib folder.

From C++

To call FreeLing without recompilation from a C++ application under Windows, we may need to use the .dll files which are distributed with certain versions as above, or to compile them for versions where they are not distributed. The definitions are in the .h files in the \include subfolder, but these are not distributed in either of the Version 1.5 ports (Puche or Martínez), though they are probably to be found in the Linux distribution of that version.

The user manual shows an example of a C++ program using the library. In compiling the example user's program, libraries named morpho, db_cxx, pcre are linked from the earliest versions of FreeLing, with omlet and fries(?) added at version 2.0 and boost_filesystem added at version 2.2. The user manual explains that morpho "links with libmorfo library, which is the final result of the FreeLing compilation process", while db_cxx, pcre and the others refer to "other libraries required by FreeLing." I am unclear how these libraries relate to the dll files supplied with certain versions as above, or if they relate at all.

Of course, the C++ programmer — unlike the user of other programming languages — has the alternative of recompiling any version of FreeLing entirely from source along with his own application.

From Java

Puche's file java\USAGE.txt tells how to call FreeLing 1.5 (Puche) from a Java application, using the two distributed .dlls and also a distributed file libmorfo_java.jar, which contains the FreeLing API definitions as well as some code. Such an application, myprog, is compiled as follows:

        javac -classpath libmorfo_java.jar myprog.java

and is executed as follows:

        java -classpath libmorfo_java.jar;. myprog

The application source file myprog.java makes internal reference to morfo_java.dll.
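The two commands can be wrapped in a script, for instance to rebuild after each change to myprog.java. Here is a minimal Python sketch, assuming that javac and java are on Path and that the script runs in the folder holding myprog.java, libmorfo_java.jar and the two .dlls (USAGE.txt remains the authority on the layout); java_commands and compile_and_run are hypothetical names. The ";" in the run-time classpath is the Windows separator between classpath entries (Unix uses ":"), and "." adds the current folder, where javac writes myprog.class.

```python
import subprocess

# Windows separates classpath entries with ";"; "." is the current folder.
RUN_CLASSPATH = ";".join(["libmorfo_java.jar", "."])


def java_commands(program="myprog"):
    """Return the compile and run commands from USAGE.txt as argument lists."""
    compile_cmd = ["javac", "-classpath", "libmorfo_java.jar", program + ".java"]
    run_cmd = ["java", "-classpath", RUN_CLASSPATH, program]
    return compile_cmd, run_cmd


def compile_and_run(program="myprog"):
    # check=True stops at the first command that fails, e.g. a compile error.
    for cmd in java_commands(program):
        subprocess.run(cmd, check=True)
```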

From Delphi
From Perl
From Python

Contributions for inclusion here, concerning the use of FreeLing from any other programming language, will be welcomed.

Disclaimer

This page is offered as a facility for corpus analysis on Windows.  By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences.  Needless to say, he hopes that there will be no such consequences.  He will be pleased to receive comments, but cannot promise to act upon them.


Ciarán Ó Duibhín
2012/11/08