Using the BabelNet disambiguation program in Windows

Ciarán Ó Duibhín

BabelNet is, among other things, a system for performing word-sense disambiguation (WSD) on running text in several languages. It was created at the Sapienza Università di Roma by a group headed by Roberto Navigli. The system is written in Java and implemented on Unix. As of May 2014, the current version is 2.5.

The API archive download below contains a BabelNet demonstration program, which explores the BabelNet resources, but does not process running text. The compiled program is found in bin\it\uniroma1\lcl\babelnet\BabelNetDemo.class, and the source is in src\it\uniroma1\lcl\babelnet\BabelNetDemo.java. Instructions to install and run the compiled program are given for Unix in the README file in the root of the same archive. The present file describes how to adapt these instructions for Windows, to help Windows users who want to try out BabelNet without studying Unix or Java.

There is also, in Figure 3 of the paper Multilingual WSD with Just a Few Lines of Code: the BabelNet API, by Roberto Navigli and Simone Paolo Ponzetto, a Java program which uses BabelNet to perform WSD on running text. However, the "path indexes" which must be downloaded to run this program have not been updated since version 1.0.1 of BabelNet, and it cannot be run on any more recent version. (To run v1.0.1, download the files named near the end of https://groups.google.com/forum/#!topic/babelnet-kb/1mrYql7FwrA and use the program given in https://groups.google.com/forum/#!topic/babelnet-kb/2EIKgvDVE2c . This process will not be explained here.)

Because of the work involved in updating the present file each time a BabelNet update is issued, I do not intend to update it again until I learn that the path indexes have been made available.

Pre-requisites

Before BabelNet can be used, the programming language Java and the lexical database WordNet have to be installed, and we begin with them.

Java

Java is a programming language. What you download depends on what you want to do. If you just want to run programs already written in Java by others (like the BabelNet demonstration program), you need only download the Java Runtime Environment (JRE). If you want to write or modify programs (as you will certainly want to do with the BabelNet WSD program), download the Java Development Kit (JDK), which includes the JRE. Each Java download is an .exe file, which just needs to be run in order to install Java. The default installation directories, as of November 2013, are C:\Program Files\Java\jre7 and C:\Program Files\Java\jdk1.7.0_45.

WordNet

Version 3.0 (at least) of the lexical database WordNet is required for BabelNet. (BabelNet may be configured to set the WordNet version to 2.1, but this will be ignored!) The WordNet website is contradictory as to whether WordNet 3.0 is usable under Windows: the WordNet 3.0 README talks of a self-extracting archive containing WordNet 3.0 for Windows, but the Download page and the Current Version page both say that WordNet 3.0 is for Unix only. WordNet have declined to answer my query on the matter. As of November 2013, the real position seems to be that the WordNet 3.0 download contains data files which work perfectly well with Windows applications, but lacks a Windows implementation of the WordNet GUI browser program, which is included in source and/or binary form in all Unix releases, and in Windows releases up to 2.1. This does not matter to us, as we will be using the data files with BabelNet. So download WordNet 3.0 for UNIX-like systems (probably any of the three downloads will work, but I used the tar-gzipped one) and unpack into C:\Program Files\WordNet-3.0. Disregard the mention of source code and binaries: the source code, if included in your download, can be ignored, and there are no binaries.

Downloading and unpacking BabelNet

Download the BabelNet Precompiled Index, Core, v2.5, CC_BY_NC_SA_30 licence (1.20 GB)
and unpack it to C:\Program Files\BabelNet, so that the following subdirectories are created directly under C:\Program Files\BabelNet:
      core_CC_BY_NC_SA_30
      graph_CC-BY_NC_SA_30
      dict
      gloss
      lexicon
Download also ONE or more of the following, according to the type of licence required:
• the BabelNet Precompiled Index, v2.5, CC_BY_30 licence (39.4 MB)
      dict_CC_BY_30
      gloss_CC_BY_30
      lexicon_CC_BY_30
• the BabelNet Precompiled Index, v2.5, CC_BY_SA_30 licence (2.94 GB)
      dict_CC_BY_SA_30
      gloss_CC_BY_SA_30
      lexicon_CC_BY_SA_30
• the BabelNet Precompiled Index, v2.5, CC_BY_NC_SA_30 licence (1.98 MB)
      dict_CC_BY_NC_SA_30
      gloss_CC_BY_NC_SA_30
      lexicon_CC_BY_NC_SA_30
• the BabelNet Precompiled Index, v2.5, APACHE-20 licence (1.36 MB)
      dict_APACHE_20
      gloss_APACHE_20
      lexicon_APACHE_20
• the BabelNet Precompiled Index, v2.5, CECILL-C licence (4.43 MB)
      dict_CECILL_C
      gloss_CECILL_C
      lexicon_CECILL_C

Alternatively, download the BabelNet Precompiled Index Bundle, v2.5 — note that this download is 5.20 GB! — and unpack it to C:\Program Files\BabelNet, which will create all 20 of the above-named subdirectories directly under C:\Program Files\BabelNet. This is certainly the easier option, in the absence of any guidance on choosing a licence.

WinRAR v5.0 can be used for all unpacking, but beware that the out-of-date version WinRAR 3.8 may report that the downloaded archive is corrupt and may not unpack all the files.

Next, download the BabelNet Java API, v2.5 (30.5 MB), and unpack to C:\Program Files\BabelNet, so that the subdirectories bin, config, docs, lib, licenses, resources and src are directly under C:\Program Files\BabelNet.

Uninstallation involves only the removal of the unpacked folders and files.

Running the BabelNetDemo program

BabelNet must be informed of the locations to which BabelNet and WordNet have been unpacked. Two files in the config subdirectory must be changed.

Assuming you have followed the unpacking suggestions given above, then in config/babelnet.var.properties, put
      babelnet.dir=C:/Program Files/BabelNet
and uncomment the line if necessary;
and in config/jlt.var.properties, put
      jlt.wordnetPrefix=C:/Program Files/WordNet
i.e. without the -3.0 suffix.

In the latter file, do NOT change the line
      jlt.wordnetVersion=3.0
as any such change will have no effect.
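
A mistyped path in either file only shows up later as a failure inside BabelNet, so it can help to check the two settings first. The following is a minimal sketch and not part of the BabelNet distribution: the class name ConfigCheck is hypothetical, and the assumption that the WordNet directory is named by joining jlt.wordnetPrefix, a hyphen and jlt.wordnetVersion is inferred from the settings above, not taken from BabelNet documentation. It needs the JDK to compile, and should be run from C:\Program Files\BabelNet.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper, not part of BabelNet: checks the two edited settings.
class ConfigCheck
{
  // Reads one key from a .properties file; returns null if the key is absent.
  static String read(String file, String key) throws IOException
  {
    Properties props = new Properties();
    try (InputStream in = new FileInputStream(file))
    {
      props.load(in);
    }
    String value = props.getProperty(key);
    return value == null ? null : value.trim();
  }

  // Assumption: the WordNet directory name is prefix + "-" + version.
  static String wordnetDir(String prefix, String version)
  {
    return prefix + "-" + version;
  }

  // Prints the path and whether it names an existing directory.
  static void report(String label, String path)
  {
    boolean ok = path != null && new File(path).isDirectory();
    System.out.println(label + ": " + path + (ok ? " (found)" : " (NOT FOUND)"));
  }

  public static void main(String[] args) throws IOException
  {
    report("BabelNet", read("config/babelnet.var.properties", "babelnet.dir"));
    String prefix = read("config/jlt.var.properties", "jlt.wordnetPrefix");
    String version = read("config/jlt.var.properties", "jlt.wordnetVersion");
    report("WordNet", wordnetDir(prefix, version));
  }
}
```

Note that the property values use forward slashes: in a Java .properties file a backslash is an escape character, so a value written as C:\Program Files\BabelNet would not be read back as intended.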

I don't change the Unix line-ends in these files, nor in any other BabelNet files.

Next, we come to the file run-babelnetdemo.sh, which is meant to run the demo program. As distributed, it contains the line
      java -classpath bin:lib/*:config it.uniroma1.lcl.babelnet.BabelNetDemo

• Change the file extension to .bat, like this: run-babelnetdemo.bat

• Change the two colons in the classpath value to semi-colons

• Comments in Windows batch files start with rem not with # — either make the change or just remove the comments

• Add a line containing pause at the end of the file, if you want to hold the command window when finished while you examine it

• You probably want to redirect BabelNet's output to a file, so add something like > output.txt to your java line

• If you are running out of Java heap space, add an argument like -Xmx512M on your java line

You should now have a file run-babelnetdemo.bat, containing perhaps

      java -Xmx512M -classpath bin;lib/*;config it.uniroma1.lcl.babelnet.BabelNetDemo > output.txt
      pause
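
The reason for changing colons to semi-colons is that Java separates classpath entries with a platform-dependent character, which it exposes as File.pathSeparator. The following sketch illustrates the conversion; the class ClasspathFix is a hypothetical name used only for this illustration, and the simple replacement assumes relative entries like those in the distributed script.

```java
import java.io.File;

// Illustrative helper, not part of BabelNet: rewrites a Unix-style classpath for Windows.
class ClasspathFix
{
  // Replaces the Unix entry separator ':' with the Windows one ';'.
  // Assumes relative entries such as bin, lib/* and config; an entry that
  // already contains a drive letter (C:/...) would be damaged by this.
  static String toWindows(String unixClasspath)
  {
    return unixClasspath.replace(':', ';');
  }

  public static void main(String[] args)
  {
    // On Windows this prints ';', on Unix ':'.
    System.out.println("Separator on this system: " + File.pathSeparator);
    System.out.println(toWindows("bin:lib/*:config"));
  }
}
```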

Double-clicking this batch file will run the demo program, without leaving the Windows GUI. If you are sending the output to a file, you will not see it yet, so allow the program enough time to finish.

You can compare the output of the demo program with the source in BabelNet/src/it/uniroma1/lcl/babelnet/BabelNetDemo.java — but remember the demo is not working from the source but from a compiled version in BabelNet/bin/it/uniroma1/lcl/babelnet/BabelNetDemo.class

Running WSD

This is not possible in BabelNet 2.5 using presently available downloads, but we will follow the process as far as we can.

A Java program for WSD is given in Figure 3 of Multilingual WSD with Just a Few Lines of Code: the BabelNet API, by Roberto Navigli and Simone Paolo Ponzetto.

In order to compile it, you should have downloaded and installed the Java JDK (see above). In any case, you will want to amend the program and recompile it, since it has the example sentence (of English) built-in. The command to compile a Java program is javac. If using this command (at the command prompt, or in a batch file) results in a message that javac is not recognized as an internal or external command, you may need to add the JDK's bin directory, which is where javac.exe is installed, to your System Path environment variable. Go to the System control panel, Advanced system settings, Environment Variables, System variables; scroll down to Path, and edit it by appending the string ;C:\Program Files\Java\jdk1.7.0_45\bin After restarting the command prompt, you should now be able to use the javac command.
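
Windows finds a command such as javac by searching each directory listed in Path in turn. The following sketch mimics that search and reports which directory, if any, contains javac.exe. The class name FindJavac is hypothetical, and the sketch ignores complications such as quoted Path entries. If javac is not yet found via Path, the sketch can still be compiled by invoking javac.exe with its full path.

```java
import java.io.File;
import java.util.regex.Pattern;

// Illustrative helper, not part of Java or BabelNet: searches a Path value for a file.
class FindJavac
{
  // Returns the first directory in pathValue that contains fileName, or null if none does.
  static String locate(String pathValue, String separator, String fileName)
  {
    if (pathValue == null)
    {
      return null;
    }
    for (String dir : pathValue.split(Pattern.quote(separator)))
    {
      if (dir.isEmpty())
      {
        continue; // skip empty entries such as those left by a doubled separator
      }
      if (new File(dir, fileName).isFile())
      {
        return dir;
      }
    }
    return null;
  }

  public static void main(String[] args)
  {
    String dir = locate(System.getenv("PATH"), File.pathSeparator, "javac.exe");
    System.out.println(dir == null ? "javac.exe is not on the Path"
                                   : "javac.exe found in " + dir);
  }
}
```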

You can paste the program source from the paper, e.g. into a file called wsddemo.java in C:\Program Files\BabelNet, and then make the following alterations:

• Replace the four pairs of matched left and right single quotes on lines 23 and 24 by ASCII apostrophes

• Catch an IOException in the method disambiguate, i.e. place the following outline around lines 3–18 of the source from the paper, with those lines replacing the ... below:
      try
      {
        ...
      }
      catch (IOException ioe)
      {
        System.out.println("Trouble: " + ioe.getMessage());
      }

• Place the following outline around the entire program, which replaces the ... below:
      import it.uniroma1.lcl.jlt.util.Language;
      import it.uniroma1.lcl.jlt.util.ScoredItem;
      import it.uniroma1.lcl.jlt.util.Strings;
      import it.uniroma1.lcl.jlt.ling.Word;
      import it.uniroma1.lcl.knowledge.*;
      import it.uniroma1.lcl.knowledge.graph.*;
      import java.io.IOException;
      import java.util.*;
      public class wsddemo
      {
        ...
      }

With these alterations, the program will compile. On the command line, or in a batch file, in C:\Program Files\BabelNet do:

      javac -classpath bin;lib/*;config wsddemo.java

and a compiled file wsddemo.class will be created in C:\Program Files\BabelNet. I suggest moving wsddemo.class to C:\Program Files\BabelNet\bin in preparation for the next step.

To run C:\Program Files\BabelNet\bin\wsddemo.class, I suggest creating a batch file, C:\Program Files\BabelNet\run-wsd.bat, with the following content:

      java -Xmx512M -classpath bin;lib/*;config wsddemo > wsdout.txt
      pause

This begins to run, but fails because it cannot find something called the "path index" when trying to load the knowledge base. The location of the path index can be specified by putting a line in config/knowledge.var.properties:
      knowledge.graph.pathIndex=C:/Program Files/BabelNet/data
It appears that the path index has not been included in the downloads of BabelNet since version 1.0.1.

I will update this information on how to perform WSD with the current version of BabelNet when I learn that it is again possible to do so.

Disclaimer

This page is offered as a facility for corpus analysis on Windows.  By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences.  Needless to say, he hopes that there will be no such consequences.  He will be pleased to receive comments, but cannot promise to act upon them.


Ciarán Ó Duibhín
2014/05/23