Using the FreeLing Analyzer Program under Windows

Ciarán Ó Duibhín

FreeLing is a system for the linguistic analysis of text (tagging, lemmatization, etc.) developed at the Universitat Politècnica de Catalunya by a team including Lluís Padró.  The system includes extensive language data for Spanish, Catalan, Galician, English and Italian; from Version 2.2, for Welsh, Portuguese and Asturian; from Version 3.0, for Russian and Ancient Spanish; and from Version 3.1, for French and (limited) Slovene (or should that be Czech? the configuration file supplied is named cs.cfg).  The system takes the form of a library, which can be called from within a computer program, but there is also a fully-compiled application program, called analyzer.exe, which allows most of the functionality of FreeLing to be used off-the-shelf.  We will be concerned here only with the installation and use of this stand-alone analyzer program.

FreeLing is written in C++, and development takes place in a Unix environment.  The latest version as of 08 February 2014 is FreeLing 3.1; a Windows port (165.5 MB) is available, which contains the binary program and the language data. The user manual is available here. Version 3.1 omits the instructions, supplied in the 3.0 Readme, for calling FreeLing as a library when compiling your own programs without recompiling or accessing the source of FreeLing and other libraries, using only the contents of this download; the required project files appear to be omitted also.

A number of Windows ports of earlier versions of FreeLing were also created, and will be mentioned later. Ports of versions earlier than 3.0 were made by individuals on a voluntary basis, and were unsupported by the developers, but their use could be (and has been) discussed by their users on the FreeLing Forum (to contribute to discussion, you must register and then log in on the FreeLing home page). The present page was created mainly to help Windows users with those earlier versions of the FreeLing analyzer program; since FreeLing 3.0, there is little to add to the Readme file supplied by the developers in the Windows port itself.

Installing Analyzer from FreeLing 3.1 in Windows

The zip file for Version 3.1 named in the link above should be downloaded. The zip file should be extracted into a suitable folder, taking care to preserve the subfolder structure (by checking the "Use folder names" button or similar). If you choose to extract to C:\Program Files, the package will be extracted into C:\Program Files\freeling-3.1-win and subfolders. It may facilitate later steps if you now rename this folder to C:\Program Files\FreeLing-3.1. This will be called the "extraction folder." No installer program is required.

The Version 3.1 User Manual will be found in C:\Program Files\FreeLing-3.1\doc\userman\userman.pdf, or it may be downloaded from here.

Following zip extraction, installation instructions may be found in C:\Program Files\FreeLing-3.1\README and there is not much to add here. Our suggested modifications are minor: (a) change of extraction folder to a subfolder of Program Files, in accordance with usual Windows practice; (b) possibly make changes of environment variables temporary, by using a batch file.

If the redistributable component (at least) of Microsoft Visual C++ is not already installed on your computer, you should now install it. Download it from here, and run it. Alternatively, if the files msvcr100.dll and msvcp100.dll and tlkernel.dll already exist on your computer in the folder of some other application program, it MAY suffice to copy them to C:\Program Files\FreeLing-3.1\bin.
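Before copying anything, it may help to see which of the three files are actually missing. The following is a minimal Python sketch, not part of FreeLing: the function name missing_dlls is made up for illustration, and the folders to search are whatever you choose to pass in.

```python
from pathlib import Path

# Runtime DLL names taken from the paragraph above.
RUNTIME_DLLS = ("msvcr100.dll", "msvcp100.dll", "tlkernel.dll")


def missing_dlls(folders, names=RUNTIME_DLLS):
    """Return the DLL names that are not found in any of the given folders."""
    present = set()
    for folder in folders:
        folder = Path(folder)
        if not folder.is_dir():
            continue  # silently skip folders that do not exist
        for entry in folder.iterdir():
            # Windows filenames are case-insensitive, so compare in lower case.
            present.add(entry.name.lower())
    return [name for name in names if name.lower() not in present]
```

For example, missing_dlls([r"C:\Program Files\FreeLing-3.1\bin", r"C:\Windows\System32"]) returns an empty list if all three files can be found in one or other of those folders.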

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned later).  The success of the installation so far may be verified by opening an MS-DOS window, moving to the C:\Program Files\FreeLing-3.1 folder, and typing

        bin\analyzer -h

This should run the analyzer program and display a list of allowed program options (the options are fully described in the user manual).

To try out further program options, we need to make some changes to the Windows environment variables (unless these are made – and unmade – in a batch file, see below). To make the changes now, press Win+Break, and choose Advanced system settings –> Environment variables –> System variables:
1. Append ;C:\Program Files\FreeLing-3.1\bin to system variable Path
2. Create new system variable FREELINGSHARE=C:\Program Files\FreeLing-3.1\data
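Whether the two changes have taken effect can be checked from a newly opened window. The sketch below is only an illustration in Python (the function name check_freeling_env is hypothetical); it looks for FREELINGSHARE, for a config subfolder beneath it, and for analyzer.exe in some folder named in Path.

```python
import os


def check_freeling_env(env=None):
    """Return a list of problems found in the environment mapping."""
    env = os.environ if env is None else env
    problems = []

    share = env.get("FREELINGSHARE", "")
    if not share:
        problems.append("FREELINGSHARE is not set")
    elif not os.path.isdir(os.path.join(share, "config")):
        problems.append("no config subfolder under FREELINGSHARE")

    # Path entries are separated by os.pathsep (';' on Windows).
    path_dirs = [d for d in env.get("PATH", "").split(os.pathsep) if d]
    if not any(os.path.isfile(os.path.join(d, "analyzer.exe")) for d in path_dirs):
        problems.append("analyzer.exe not found in any Path folder")

    return problems
```

Calling check_freeling_env() with no argument inspects the current environment; an empty list means that nothing was found amiss.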

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which we wish to differ from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction folder.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input from the keyboard may be signalled by pressing Ctrl+Z followed by Enter.

Following zip extraction, you will find a configuration file for each supported language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg, old-es.cfg, ru.cfg, fr.cfg, cs.cfg — in the C:\Program Files\FreeLing-3.1\data\config subfolder.

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, and – from any folder, no bin\ required now – typing a command line. A typical command-line might begin with

        analyzer -f "%FREELINGSHARE%\config\en.cfg"

Note the double quotes, required because of the internal space in Program Files (after substitution of the value of FREELINGSHARE).

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.

However, typing text directly from the keyboard in an MS-DOS window is unsatisfactory, because MS-DOS does not use the UTF-8 character set (as required by FreeLing since 3.0), or even the Latin-1 character set (as required by earlier versions of FreeLing). As a result, any accented letter typed in MS-DOS is likely to mean something different to FreeLing. Instead, input text should be placed in a file beforehand, and this file should be named on the command line, e.g.

        analyzer -f "%FREELINGSHARE%\config\en.cfg" <input.txt

A suitable file of text can easily be created or edited in Windows using a plain-text editor such as NotePad (just make sure the file is saved in UTF-8 encoding).
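For those who would rather drive the analyzer from a script, the same steps (save the text as UTF-8, then feed the file to the analyzer) can be sketched in Python as below. This is an illustration under assumptions: the function names are made up, analyzer is assumed to be reachable through Path, and the output is assumed to come back in UTF-8.

```python
import os
import subprocess


def analyzer_command(language="en", share=None):
    """Build the argument list for the analyzer, as on the command line above."""
    share = share or os.environ["FREELINGSHARE"]
    # Backslashes are joined by hand so the result is the same on any platform.
    return ["analyzer", "-f", share.rstrip("\\") + "\\config\\" + language + ".cfg"]


def analyze_text(text, input_path="input.txt", language="en", share=None):
    """Save text as UTF-8, feed the file to the analyzer, return its output."""
    with open(input_path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(input_path, "rb") as stdin:
        result = subprocess.run(analyzer_command(language, share),
                                stdin=stdin, stdout=subprocess.PIPE, check=True)
    # Assumption: the analyzer writes its results in UTF-8.
    return result.stdout.decode("utf-8")
```

Because the command is passed as a list rather than as a single string, no double quotes are needed around the path containing Program Files.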

Note that this Windows port of FreeLing 3.1 has not been tested here under any version of Windows other than Vista.

Using a batch file

Instead of opening an MS-DOS window and typing a command-line to run the Analyzer program, you can use a batch file, which will allow you to stay in Windows the whole time.

To create the batch file (called, for example, freeling.bat), go (in Windows) to the \bin subfolder of the extraction folder (\freeling\bin in Version 3.0) and edit the content of your command-line into a plain text file, freeling.bat, using an application such as NotePad. Add a further line containing pause, so that you will have a chance to view the output before the window vanishes. The batch file should be saved with encoding OEM or MS-DOS if either of these is available; otherwise encoding ANSI or Latin-1 will do for most purposes.

Double-clicking on the batch file's icon will run the analyzer. You can even make a shortcut to the batch file, and place the short-cut on your desktop or on your Start Menu or in any folder. If filenames mentioned in the batch file contain non-ASCII characters (such as accented letters), you have to give these in the MS-DOS character-set rather than in the Windows character-set; in that case, unless you have a Windows plain-text editor that can save in OEM or MS-DOS encoding, you may find it easier to edit the batch file under MS-DOS than under Windows.

If using a batch file, the changes to environment variables described under installation above, instead of being made permanent, might be made in the batch file before calling the analyzer, and cancelled in the batch file immediately afterwards. The following batch file embeds the previous sample analyzer command-line within such a sequence, and prompts for the name of the input file:

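        @Echo Off
        Rem Remember the current Path so that it can be restored afterwards
        Set OLDPATH=%PATH%
        Rem Temporary equivalents of the two installation changes
        Set PATH=C:\Program Files\FreeLing-3.1\bin;%PATH%
        Set FREELINGSHARE=C:\Program Files\FreeLing-3.1\data
        Rem Prompt for the input file, falling back to input.txt if nothing is typed
        Set /P _input=Input file: || Set _input=input.txt
        Echo On
        analyzer -f "%FREELINGSHARE%\config\en.cfg" <"%_input%"
        @Echo Off
        Rem Keep the window open, then undo every change made above
        Pause
        Set PATH=%OLDPATH%
        Set OLDPATH=
        Set FREELINGSHARE=
        Set _input=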


If FreeLing is always to be called in this way, the permanent changes to environment variables described under installation above may be revoked.
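The same idea of temporary changes can be had without any undoing, by giving the altered variables only to the process that runs the analyzer. Here is a rough Python sketch of that approach; the function names and the EXTRACTION constant are illustrative, and the folder layout is that of Version 3.1 as described above.

```python
import os
import subprocess

# Assumed extraction folder, as suggested under installation above.
EXTRACTION = r"C:\Program Files\FreeLing-3.1"


def freeling_env(extraction=EXTRACTION, base=None):
    """Return a copy of the environment with the two FreeLing changes applied."""
    env = dict(os.environ if base is None else base)
    # The separator is written as ';' explicitly, since this targets Windows.
    env["PATH"] = extraction + r"\bin;" + env.get("PATH", "")
    env["FREELINGSHARE"] = extraction + r"\data"
    return env


def run_analyzer(input_path, language="en", extraction=EXTRACTION):
    """Run the analyzer on a file; the changed variables vanish with the child."""
    env = freeling_env(extraction)
    command = [extraction + r"\bin\analyzer.exe", "-f",
               env["FREELINGSHARE"] + "\\config\\" + language + ".cfg"]
    with open(input_path, "rb") as stdin:
        return subprocess.run(command, stdin=stdin, env=env).returncode
```

Nothing needs to be restored afterwards, because the environment of the calling process is never touched.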


A graphic interface

We hope at some stage to offer a Windows graphic interface to FreeLing, which will allow the options to be selected visually, and will then launch the application automatically.

Downloading FreeLing

For me, one of the hardest things is getting FreeLing downloaded from devel.clp.upc.edu. The download containing the version 3.1 binary is 166 MB in size. When I try any of the browsers (Internet Explorer, Firefox, Chrome), it will deliver only the first part of the file, as an invalid zip archive. How much is delivered before failure varies from one try to the next — did someone say that computers are predictable? — you nearly always get at least 37MB, sometimes twice that, sometimes three times it, but you may have to try scores of times if not hundreds before you will get all of the file. I note others having the same problem, according to the FreeLing forum.

What is going on here? Well, there are two ends to every connection, and each will blame the other. One end says "These files are successfully downloaded all the time, so your problem must be at the client end" — though I wonder how they know how many of those downloads were actually successful, and how many were truncated failures. The other end says "I download files successfully all the time and I never have this problem with other servers." Stalemate.

I must have made at least 50 attempts to download v3.1 using a variety of browsers, before turning to wget. Wget is a command-line downloader, available from here. It won't get over the problem; it gives the message "connection closed" whenever the axe falls on the download. But wget runs in the background, and can retry automatically — for ever, if you so instruct it. Unfortunately each retry starts afresh from the beginning of the file; wget can resume a download from the point of failure but only if the server supports it, and it seems this one doesn't. Nevertheless, this is what I recommend: there's nothing as obstinate as a computer, unless it's another computer. So put a couple of lines in a batch file

      wget -v -t 0 http://devel.clp.upc.edu/freeling/downloads/freeling-3.1-win.zip
      pause


using the name of whatever file you want to download, and let battle commence. One nice thing is that wget will only retain the longest download of the file to date, not all of them. As far as the progress of the download goes, the byte-counts given by wget seem more reliable than its percentages.
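Since a truncated download shows up as an invalid zip archive, the retrying can also be tied to a test of the archive itself. The Python sketch below illustrates this; it is not what wget does. The fetch argument is a hypothetical callable of your own that writes the download to the given path, for instance a wrapper around urllib.request.urlretrieve.

```python
import zipfile


def is_complete_zip(path):
    """True if path is a readable zip whose members all pass their CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # A truncated file has no central directory, so it lands here.
        return False


def fetch_until_complete(fetch, path, max_attempts=100):
    """Call fetch(path) until the file is a complete zip.

    Returns the number of attempts used, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            fetch(path)
        except OSError:
            pass  # e.g. connection closed; simply try again
        if is_complete_zip(path):
            return attempt
    return 0
```

Like wget on this server, each attempt starts afresh from the beginning of the file; the only gain is that success is judged by the archive rather than by the byte-count.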

I started running wget late on a Friday night. After 14 "connection closed" and 1 "read error", attempt 16 succeeded! As far as I can tell, all my attempts were logged in the server's download counts.

Ports of earlier versions of FreeLing

There is no reason to install an earlier version of FreeLing, unless your version of Windows is too old to run the current one. If required, earlier versions of FreeLing are available for Windows as follows. Versions 2.2 and later are still downloadable from the FreeLing website.

Version 1.4 (28.8 MB) has been compiled for Windows by Jordi Atserias using cygwin.

Version 1.5 has been compiled twice for Windows;

firstly, by Bruno Martínez using MS Visual C++ 2005 — Version 1.5 (Martínez) (22.8 MB);

and secondly, by Javier Puche using MingW + Msys + msysDTK — Version 1.5 (Puche) (27.59 MB).

Version 2.0 has been compiled using cygwin and is available for download as a set of three zip files: the Version 2.0 program files (5.72 MB), the Version 2.0 data files for English and Italian (14.19 MB), and the Version 2.0 data files for Spanish, Catalan and Galician (20.95 MB).

Version 2.2 was ported by Israel Olalla, cross-compiled on Linux using MingW32. Here is Version 2.2 information and here are Version 2.2 binaries (8.4 MB). Data files for the supported languages, as well as the user manual, are to be found in the Version 2.2 machine-independent distribution (40.5 MB).

Version 3.0 has been compiled using MS Visual C++ 2010. There are two files: Version 3.0 (173 MB) containing the binary program and the language data, as well as the project files needed to call FreeLing as a library when compiling your own programs, without recompiling or accessing the source of FreeLing and other libraries but using only the contents of this download (see Readme); and a patch (0.6 MB).

A comparison of the subfolder structures created by unzipping the various ports may be helpful. Those ports without a \bin subfolder hold the binary in the root folder.

3.1 3.0 2.2 (Olalla) 2.0 1.5 (Puche) 1.5 (Martínez) 1.4 (Atserias)
  freeling          
bin   bin bin bin      
doc   doc doc     doc
  userman       userman       userman
     html          html           html
  refman             refman
      html                 html
                  latex
  diagrams             diagrams
  grammars            
  multilingual            
  tagsets            
    tagset-ru.files            
data   data   share     data
  common     common     common common common   common
    nec       nec        nec   nec   nec      nec
        connector          
    lang_ident       lang_ident          
    alternatives            
  config     config     config config config   config
  ca     ca     ca ca ca   ca
    nec       nec        nec   nec   nec      nec
    chunker       chunker          
    dep       dep          
  en     en     en en en   en
    nerc       nec        nec   nec   nec      nec
    nerc       ner          
    chunker       chunker          
    dep       dep          
  es     es     es es es   es
    nerc       nec        nec   nec   nec      nec
    nerc       ner          
    chunker       chunker          
    dep       dep        dep      
    coref       coref          
        corrector          
    old-es       old-es          
  gl     gl     gl gl gl   gl
    nerc       nec        nec   nec   nec      nec
    nerc       ner          
    chunker       chunker          
    dep       dep          
  it     it     it it it   it
    nec       nec        nec   nec   nec      nec
  as     as          
    nerc       nec          
    chunker       chunker          
    dep       dep          
  cy     cy          
    nec       nec          
  pt     pt          
    nerc       nec          
    nerc       ner          
    chunker       chunker          
  ru     ru          
  fr            
  cs            
include   include include include     include
      freeling          
        morpho   freeling   freeling      
        morpho   fries   fries      
        omlet   omlet   omlet      
        utf8          
    lib lib        
        devel    
          java    
        dynamic    
        java    
        util    
  icu          
bin   bin          
include   include          
  layout     layout          
  unicode     unicode          
    lib          
include
boost_1.47          
  boost   boost          
    (80 subfolders)     (80 subfolders)          
    lib          

Installing Analyzer from FreeLing 3.0 in Windows

The zip file for Version 3.0 named in the link above should be downloaded. The zip file should be extracted into a suitable folder, taking care to preserve the subfolder structure (by checking the "Use folder names" button or similar). If you choose to extract to C:\Program Files, the package will be extracted into C:\Program Files\freeling_win and subfolders. It may facilitate later steps if you now rename this folder to C:\Program Files\FreeLing-3.0. This will be called the "extraction folder." No installer program is required. You should now replace C:\Program Files\FreeLing-3.0\freeling\lib\freeling.dll by the version of freeling.dll in the downloaded patch.

The Version 3.0 User Manual may be downloaded from here.

Following zip extraction, installation instructions may be found in C:\Program Files\FreeLing-3.0\README and there is not much to add here. Our suggested modifications are minor: (a) change of extraction folder to a subfolder of Program Files, in accordance with usual Windows practice; (b) possibly make changes of environment variables temporary, by using a batch file.

If the redistributable component (at least) of Microsoft Visual C++ is not already installed on your computer, you should now install it. Download it from here, and run it. Alternatively, if the files msvcr100.dll and msvcp100.dll and tlkernel.dll already exist on your computer in the folder of some other application program, it MAY suffice to copy them to C:\Program Files\FreeLing-3.0\freeling\bin.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned above).  The success of the installation so far may be verified by opening an MS-DOS window, moving to the C:\Program Files\FreeLing-3.0\freeling folder, and typing

        bin\analyzer -h

This should run the analyzer program and display a list of allowed program options (the options are fully described in the user manual). Beware, however! In this particular release, even a list of options requires access to freeling\lib\freeling.dll and icu\bin\icuuc49.dll (at least), so that the path needs to have been set as described in the section immediately following.

To try out further program options, we need to make some changes to the Windows environment variables (unless these are made – and unmade – in a batch file, see above). To make the changes now, press Win+Break, and choose Advanced system settings –> Environment variables –> System variables:
1. Append ;C:\Program Files\FreeLing-3.0\freeling\bin;C:\Program Files\FreeLing-3.0\freeling\lib;C:\Program Files\FreeLing-3.0\boost_1.47\lib;C:\Program Files\FreeLing-3.0\icu\bin to system variable Path
2. Create new system variable FREELINGSHARE=C:\Program Files\FreeLing-3.0\freeling\data
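With four folders to add, it is easy to mistype one or to add it twice. As an illustration only, the Python sketch below composes the new value of Path from the extraction folder, skipping any folder already present; the function names are made up, and the default extraction folder is the one suggested above.

```python
def path_entries_3_0(extraction=r"C:\Program Files\FreeLing-3.0"):
    """Folders that FreeLing 3.0 needs on Path, as listed above."""
    subfolders = (r"freeling\bin", r"freeling\lib", r"boost_1.47\lib", r"icu\bin")
    return [extraction + "\\" + sub for sub in subfolders]


def appended_path(current, extraction=r"C:\Program Files\FreeLing-3.0"):
    """Return current with the missing FreeLing folders appended, ';'-separated."""
    existing = [p for p in current.split(";") if p]
    # Compare case-insensitively and ignore a trailing backslash.
    lowered = {p.lower().rstrip("\\") for p in existing}
    for folder in path_entries_3_0(extraction):
        if folder.lower() not in lowered:
            existing.append(folder)
    return ";".join(existing)
```

The result of appended_path applied to the existing value of Path can be pasted into the Environment variables dialog; applying the function a second time changes nothing.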

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which we wish to differ from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the \freeling subfolder of the extraction folder.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input from the keyboard may be signalled by pressing Ctrl+Z followed by Enter.

Following zip extraction, you will find a configuration file for each supported language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg, old-es.cfg, ru.cfg — in the C:\Program Files\FreeLing-3.0\freeling\data\config subfolder.

If you intend to use Russian, change the 'Locale' option in C:\Program Files\FreeLing-3.0\freeling\data\config\ru.cfg from ru_RU.UTF8 to rus. (In the other *.cfg files, the Locale option is already set to default.)
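The change is easily made by hand in NotePad, but for completeness here is a small Python sketch of it. It assumes, without having checked every release, that the option appears in the file on a line of the form Locale=value; the function name set_locale is made up for illustration.

```python
import re


def set_locale(cfg_text, new_locale="rus"):
    """Replace the value on every 'Locale=...' line of a configuration text."""
    # Assumption: options are written as Name=value, one per line.
    pattern = re.compile(r"^([ \t]*Locale[ \t]*=[ \t]*)\S+", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + new_locale, cfg_text)
```

To apply it, read ru.cfg into a string, pass the string through set_locale, and write the result back, keeping a copy of the original file in case the assumption about the line format does not hold.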

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, and – from any folder – typing a command line. A typical command-line might begin with

        analyzer -f "%FREELINGSHARE%\config\en.cfg"

Note the double quotes, required because of the internal space in Program Files (after substitution of the value of FREELINGSHARE).

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.

Note that this Windows port of FreeLing 3.0 has not been tested here under any version of Windows other than Vista and XP SP3.

Installing Analyzer from FreeLing 2.2 in Windows

The zip files for Version 2.2 named in the links above should be downloaded. The zip file containing the Windows binary should be extracted into a suitable directory, taking care to preserve the internal subdirectory structure (by checking the "Use folder names" button or similar). If you choose to extract to C:\Program Files, the package will be extracted into C:\Program Files\freeling-2.2-mingw and subdirectories. It may facilitate later steps if you now rename this directory to C:\Program Files\FreeLing-2.2. No installer program is required.

The Version 2.2 User Manual may be downloaded from here, or alternatively, extracted from the machine-independent 2.2 distribution.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned above).  The success of the installation so far may be verified by opening an MS-DOS window, moving to the extraction directory, and typing

        bin\analyzer -h

This should run the analyzer program and display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input from the keyboard may be signalled by pressing Ctrl+Z followed by Enter.

You should now extract the data files for the supported languages from the downloaded zip file for the machine-independent 2.2 distribution. You need select only those files packed in FreeLing-2.2\data, and then extract, having ensured that "Use folder names" or similar is selected, and that the target is set to C:\Program Files. You will now find a configuration file for each supported language (ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg, as.cfg, cy.cfg, pt.cfg) in the data\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames (at least in those configuration files which you intend to use) may have to be changed to suit (a) Windows syntax, in particular by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        $FREELINGSHARE/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing-2.2\data\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-2.2

You will also have to change two filenames in configuration files:

        maco.db to dicc.src

        senses30.db to senses30.src
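Making these changes by hand in several configuration files is tedious, so here is a Python sketch that applies them one line at a time. It is an illustration under assumptions: that each filename follows $FREELINGSHARE/ on its line, that the two renamed files occur at the end of such a path, and that you extracted to C:\Program Files\FreeLing-2.2. The function name is made up.

```python
def windows_config_line(line, data_dir=r"C:\Program Files\FreeLing-2.2\data"):
    """Rewrite one configuration line for Windows, following the steps above."""
    if "$FREELINGSHARE/" not in line:
        return line  # nothing to change on this line
    head, tail = line.split("$FREELINGSHARE/", 1)
    # Only the path part gets backslashes; the text before it is untouched.
    tail = tail.replace("/", "\\")
    # The two data files that have different names in this port.
    for old, new in (("maco.db", "dicc.src"), ("senses30.db", "senses30.src")):
        if tail.rstrip().endswith(old):
            tail = tail.replace(old, new)
    return head + data_dir + "\\" + tail
```

Keep a copy of each original configuration file, pass every line of the file through the function, and write the results back. With data_dir set to C:\Program Files\FreeLing20\share, the same function would make the corresponding change for Version 2.0.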

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        bin\analyzer -f data\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.

Note that this Windows port of FreeLing 2.2 has not been tested here under any version of Windows other than Vista.

Installing Analyzer from FreeLing 2.0 in Windows

The zip files linked above for Version 2.0 should be downloaded, and extracted into a suitable directory (e.g. C:\Program Files\FreeLing20), taking care to preserve the internal subdirectory structure. No installer program is used.  The remaining information required for Windows installation will be found in the file README.txt, contained in the top-level directory.

The Version 2.0 User Manual is included in the download of the program files, as doc\userman\userman.pdf.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned above).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        bin\analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input from the keyboard may be signalled by pressing Ctrl+Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the share\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        $FREELINGSHARE/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing20\share\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing20

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        bin\analyzer -f share\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8.3 forms.

Installing Analyzer from FreeLing 1.5 (Puche) in Windows

The zip file linked above for Version 1.5 (Puche) should be downloaded, and extracted into a suitable directory (e.g. C:\Program Files\freeling1.5-win-java-all-langs), taking care to preserve the internal subdirectory structure.  No installer program is used.  (Don't worry about the mention of Java: it is not involved in running the compiled analyzer program.)

The Version 1.5 PDF User Manual is not included in the download, and has been superseded on the FreeLing website.  Therefore I make it available here.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned above).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input from the keyboard may be signalled by pressing Ctrl+Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        en/tokenizer.dat

it may have to be changed to

        C:\Program Files\freeling1.5-win-java-all-langs\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\freeling1.5-win-java-all-langs

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        analyzer -f config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8.3 forms.

Installing Analyzer from FreeLing 1.5 (Martínez) in Windows

First off, if you are using Windows 95, FreeLing 1.5 (Martínez) will NOT work under it — you may follow the process below for a certain distance, but it will eventually fail.

The zip file linked above for Version 1.5 (Martínez) should be downloaded, and extracted into a suitable directory (e.g. C:\Program Files\FreeLing-1.5), taking care to preserve the internal subdirectory structure. No installer program is used. No installation instructions whatever are included.

The Version 1.5 PDF User Manual is not included in the download, and has been superseded on the FreeLing website.  Therefore I make it available here.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned above).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

However, an error may occur at this point because the download does NOT contain the essential files MSVCR80.DLL and MSVCP80.DLL, which are runtime libraries required by applications written in Visual C++ 2005 (that includes this port of FreeLing).  You may already have these files on your machine, if you have previously installed Visual C++ 2005, or an application written in it.

The absence of these files produces different error messages in different versions of Windows.  In early versions (95; 98; ME? 2000?), it may say "A required .DLL file, MSVCP80.DLL, was not found."  With later versions of Windows (XP; XP with SP2? 2003? Vista?), the message may be "The application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem."  Use of Resource Hacker shows that analyzer.exe contains an embedded manifest which asks for version 8.0.50727.762 of the .dlls.
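
If Resource Hacker is not to hand, the manifest can often be read with a few lines of Python. This is a crude heuristic of my own, not a proper resource parser: it assumes that the manifest is stored in the .exe as uncompressed plain text, and that the first occurrence of "<assembly" in the file really is the manifest. The function names are hypothetical.

```python
import re


def manifest_dependencies(exe_bytes):
    """Return (name, version) pairs from assemblyIdentity elements of a
    manifest embedded as plain text (heuristic, not a resource parser)."""
    start = exe_bytes.find(b"<assembly")
    end = exe_bytes.find(b"</assembly>", start)
    if start == -1 or end == -1:
        return []
    text = exe_bytes[start:end].decode("utf-8", errors="replace")
    found = []
    for tag in re.findall(r"<assemblyIdentity\b[^>]*>", text):
        name = re.search(r"\bname\s*=\s*[\"']([^\"']*)[\"']", tag)
        version = re.search(r"\bversion\s*=\s*[\"']([^\"']*)[\"']", tag)
        if name and version:
            found.append((name.group(1), version.group(1)))
    return found


def dependencies_of(path):
    """Read a file from disk and list the assemblies its manifest names."""
    with open(path, "rb") as handle:
        return manifest_dependencies(handle.read())
```

Calling dependencies_of("analyzer.exe") in the extraction directory should then list the requested runtime version, if the assumptions above hold.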

If these .dlls are missing or are causing errors, the best (and safest) way to rectify this is to download and install the appropriate one of two free Microsoft packages:
• for Windows 98; 98 Second Edition; ME: Microsoft Visual C++ 2005 Redistributable Package (x86), v. 1.0 (actually, 8.0.50727.42), dated 2006/04/10 (2.6 MB);
• for Windows 2000; XP; 2003; Vista: Microsoft Visual C++ 2005 SP1 Redistributable Package (x86), v. 8.0.50727.762, dated 2007/04/10 (2.6 MB)

As regards Windows 95, the first of these Microsoft packages will actually install under Windows 95 — or at least, under Windows 95B — but running the analyzer program now produces the message "The MSVCR80.DLL file is linked to missing export KERNEL32.DLL:GetLongPathNameW."  This means that MSVCR80.DLL (and indeed VC++ 2005 as a whole) is simply incompatible with Windows 95, which does not have routines like GetLongPathNameW in its kernel.  To run under Windows 95, this port of FreeLing would need to be recompiled under an earlier version of VC++, such as VC++ 7.1 (also known as Visual C++ .NET 2003), or possibly even as far back as VC++ 6.

Under Windows 98 or later, if the analyzer -h command is now producing output, we may try something more ambitious.

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.
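
The relationship between the two sources of options can be pictured as follows. This Python sketch illustrates only the precedence just described (the command line wins over the configuration file); it is not how the analyzer is implemented, and the option names in the usage note are invented for the illustration.

```python
from collections import ChainMap


def effective_options(command_line, config_file):
    """Merge two option dictionaries; where both set the same option,
    the command-line value takes precedence."""
    return dict(ChainMap(command_line, config_file))
```

For instance, with hypothetical options {"lang": "en", "output": "tagged"} in the configuration file and {"output": "morfo"} on the command line, the result keeps lang from the file and takes output from the command line.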

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        en/tokenizer.dat

it may have to be changed to

        C:\Program Files\FreeLing-1.5\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-1.5

You may now try a command line such as the following

        analyzer -f config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8.3 short forms.

In Windows 98 — and the same is probably true of Windows ME — the analyzer program may now produce error messages such as "Error 14 while opening database en\maco.db."  This problem is cured if the distributed file libdb45.dll is replaced by this one, which was kindly compiled by Andrei Costache of Oracle/BerkeleyDB for Windows 98/ME as the target system. (It works on newer Windows systems too, but may not perform as efficiently on the newer systems as the distributed libdb45.dll.)  Many thanks to Andrei for his patient help with this.  Remember that use of libdb45.dll is subject to the terms of the BerkeleyDB licence agreement.

This completes the installation of the analyzer program.

The omissions in the distribution and the incompatibility of the executable with Windows 95 are regrettable, as compilation under Visual C++ seems the best approach for a Windows port.

Installing Analyzer from FreeLing 1.4 in Windows

The zip file linked above for Version 1.4 should be downloaded, and extracted into a suitable directory (eg. C:\Program Files\FreeLing-1.4), taking care to preserve the internal subdirectory structure. No installer program is used.  The remaining information required for Windows installation will be found in the file Readme, contained in the top-level directory.

The Version 1.4 User Manual is included in the general download for that version, as doc\userman\userman.pdf.

As a Unix program in origin, the analyzer expects to read its options from the MS-DOS command line (other possibilities are mentioned above).  For now, the installation of the analyzer can be tested by opening an MS-DOS window, moving to the extraction directory, and typing

        analyzer -h

This should display a list of allowed program options (the options are fully described in the user manual).

The most important option for the analyzer is the name of a configuration file.  This is a file containing a set of program options, so that the remainder of the command line need contain only options absent from the configuration file, or those which are to be made different from what is in the configuration file.  If no configuration file is specified on the command line, the default is analyzer.cfg in the extraction directory.

Other possible command-line content includes redirection of the input (text for analysis) and/or output (results of analysis) to named files.  In the absence of redirection, input comes from the keyboard and output goes to the MS-DOS window.  End of input through the keyboard may be signalled by keying Ctrl/Z followed by Enter.

You will find a configuration file for each language — ca.cfg, en.cfg, es.cfg, gl.cfg, it.cfg — in the data\config subdirectory.  Among the contents of a configuration file will be a number of filenames, and these filenames — at least in those configuration files which you intend to use — may have to be changed to suit (a) Windows syntax — in particular, by changing forward slashes in paths to backslashes; and (b) your choice of extraction directory.

So, for example, if a configuration file contains the filename

        N:/Eines-SL/freeling1.4/FreeLing/en/tokenizer.dat

or, in fact,

        <anything>/en/tokenizer.dat

it should be changed to

        C:\Program Files\FreeLing-1.4\data\en\tokenizer.dat

on the assumption that you extracted to C:\Program Files\FreeLing-1.4
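
Since every filename in a configuration file needs the same treatment, the edit lends itself to a small script. The following Python sketch is offered tentatively: it assumes that the configuration file consists of lines of the form Key=value, with comment lines beginning with #, and that all the filenames share one known prefix. The function names, and the key name TokenizerFile in the usage note, are used only for illustration.

```python
import ntpath  # Windows path rules, usable on any operating system


def relocate(filename, old_prefix, new_root):
    """Swap old_prefix for new_root and return a Windows-style path.
    Values that do not start with old_prefix are returned unchanged."""
    if not filename.startswith(old_prefix):
        return filename
    tail = filename[len(old_prefix):].lstrip("/")
    return ntpath.normpath(ntpath.join(new_root, tail))


def relocate_config_text(text, old_prefix, new_root):
    """Apply relocate() to the value of every 'Key=value' line,
    leaving comment lines and other lines as they are."""
    result = []
    for line in text.splitlines():
        key, equals, value = line.partition("=")
        if equals and not line.lstrip().startswith("#"):
            line = key + equals + relocate(value.strip(), old_prefix, new_root)
        result.append(line)
    return "\n".join(result)
```

Given the old prefix N:/Eines-SL/freeling1.4/FreeLing/ and the new root C:\Program Files\FreeLing-1.4\data, a line such as TokenizerFile=N:/Eines-SL/freeling1.4/FreeLing/en/tokenizer.dat would come out with the Windows path shown above. It would be wise to keep a copy of the original configuration file before overwriting it.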

This completes the installation of the analyzer program.  It can be run by opening an MS-DOS window, moving to the extraction directory, and typing a command line. A typical command-line might begin with

        analyzer -f data\config\en.cfg

On its own, this command line will accept text from the keyboard, analyse it according to typical rules for English, and output the results to the MS-DOS window.  Further options and redirections could be added on the command line, to name input and/or output files, or to vary some of the settings in the distributed English configuration file.  Note that, in older versions of Windows, you may have to abbreviate filenames and directory names to their 8.3 short forms.

Using FreeLing from a Programming Language in Windows

This webpage is really about using FreeLing on Windows in the form of the analyzer application, but it is also possible to call FreeLing from a programming language, such as C++ or Java. I have no personal experience of doing this, but here are a few basic notes, which others are invited to correct and extend.

The FreeLing user manual usually contains a chapter entitled "Using the library from your own application," but not necessarily written from a Windows perspective. According to the manual, versions up to and including 2.0 had only a C++ API, while versions 2.2 and 3.0 offer "a complete C++ API, a quite-complete Java API, and half-complete perl and python APIs".

To use FreeLing from a programming language in Windows, FreeLing should be available as a .dll file. However, only certain versions contain a large .dll file in their distributions. Version 1.5 (Puche) supplies morfo.dll and morfo_java.dll in the java folder. Version 2.2 (Olalla) supplies similarly-named .dll files in the bin folder. Version 3.0 supplies freeling.dll and freeling-d.dll, in the freeling\lib folder.

From C++

To call FreeLing without recompilation from a C++ application under Windows, we may need to use the .dll files which are distributed with certain versions as above, or to compile them for versions where they are not distributed. The definitions are in the .h files in the \include subfolder, but these are not distributed in either of the Version 1.5 ports (Puche or Martínez), though they are probably to be found in the Linux distribution of that version.

The user manual shows an example of a C++ program using the library. In compiling the example user's program, libraries named morfo, db_cxx, pcre are linked from the earliest versions of FreeLing, with omlet and fries(?) added at version 2.0 and boost_filesystem added at version 2.2. The user manual explains that morfo "links with libmorfo library, which is the final result of the FreeLing compilation process", while db_cxx, pcre and the others refer to "other libraries required by FreeLing." I am unclear how these libraries relate to the dll files supplied with certain versions as above, or if they relate at all.

Of course, the C++ programmer — unlike the user of other programming languages — has the alternative of recompiling any version of FreeLing entirely from source along with his own application.

From Java

Puche's file java\USAGE.txt tells how to call FreeLing 1.5 (Puche) from a Java application, using the two distributed .dlls and also a distributed file libmorfo_java.jar, which contains the FreeLing API definitions, as well as some code.  Such an application, myprog, is compiled as follows:

        javac -classpath libmorfo_java.jar myprog.java

and is executed as follows:

        java -classpath libmorfo_java.jar;. myprog

The application source file myprog.java makes internal reference to morfo_java.dll.

From Delphi
From perl
From python

Contributions for inclusion here, concerning the use of FreeLing from any other programming language, will be welcomed.

Disclaimer

This page is offered as a facility for corpus analysis on Windows.  By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences.  Needless to say, he hopes that there will be no such consequences.  He will be pleased to receive comments, but cannot promise to act upon them.


Ciarán Ó Duibhín
2014/02/08