jWeb1T - Java Web 1T 5-gram Searcher

jWeb1T is an open source Java tool for efficiently searching the Web 1T 5-gram corpus. It uses a binary search algorithm to find n-grams and return their frequency counts in logarithmic time. Because the corpus is stored in many files, a simple index is used to retrieve the files containing the n-grams. The corpus must be installed and uncompressed on a hard drive (approx. 60 GB).
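
The two-step lookup described above (pick the file through the index, then binary-search inside it) can be illustrated with a minimal sketch. This is not jWeb1T's actual code or API: the class name NgramLookupSketch and its methods are hypothetical, in-memory arrays stand in for the corpus files on disk, and the sketch assumes each line has the form "n-gram TAB count" and that lines and files are sorted in an order consistent with Java's String.compareTo.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an index plus binary search lookup; not the jWeb1T API. */
public class NgramLookupSketch {

    // Index: first n-gram of each "file" -> that file's sorted lines.
    // Assumption: an in-memory array stands in for a sorted file on disk.
    private final TreeMap<String, String[]> index = new TreeMap<>();

    /** Registers one "file"; its lines must already be sorted by n-gram. */
    public void addFile(String[] sortedLines) {
        index.put(ngramOf(sortedLines[0]), sortedLines);
    }

    /** Returns the frequency count of the n-gram, or 0 if it is not found. */
    public long count(String ngram) {
        // Step 1: the only file that can contain the query is the one whose
        // first n-gram is the greatest one that is <= the query.
        Map.Entry<String, String[]> entry = index.floorEntry(ngram);
        if (entry == null) {
            return 0L; // the query sorts before every file
        }
        String[] lines = entry.getValue();

        // Step 2: classic binary search over the sorted lines of that file.
        int lo = 0;
        int hi = lines.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = ngramOf(lines[mid]).compareTo(ngram);
            if (cmp == 0) {
                return countOf(lines[mid]);
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return 0L;
    }

    // Assumed line format: "<n-gram>\t<count>".
    private static String ngramOf(String line) {
        return line.substring(0, line.indexOf('\t'));
    }

    private static long countOf(String line) {
        return Long.parseLong(line.substring(line.indexOf('\t') + 1));
    }

    public static void main(String[] args) {
        NgramLookupSketch searcher = new NgramLookupSketch();
        // Made-up sample data, for illustration only.
        searcher.addFile(new String[] {"a big house\t120", "a small dog\t45"});
        searcher.addFile(new String[] {"the big house\t300", "the red car\t75"});

        System.out.println(searcher.count("the red car"));  // prints 75
        System.out.println(searcher.count("a small dog"));  // prints 45
        System.out.println(searcher.count("no such gram")); // prints 0
    }
}
```

With the real corpus, the same search would have to run over byte offsets in each file (for example through random file access) rather than over an array, since the files are far too large to load into memory.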

jWeb1T was developed by Claudio Giuliano in the Human Language Technologies group at FBK. It is released as free software with full source code under the terms of the Apache License, Version 2.0. The latest version of jWeb1T is 1.0, released 19 July 2007. To download, install, and run it, see the User's Guide.

jWeb1T has been used in the FBK-irst system for the English Lexical Substitution Task at SemEval 2007:

Claudio Giuliano, Alfio Gliozzo and Carlo Strapparava. FBK-irst: Lexical Substitution Task Exploiting Domain and Syntagmatic Coherence. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), Prague, 23-24 June 2007.

jWeb1T was funded by the X-Media project.