by Lars Vogel

Quick getting started guide for Apache Lucene - Tutorial

Lars Vogel

Version 0.1

15.04.2008

Abstract

Apache Lucene provides a full-featured search engine library which can easily be integrated into your own Java applications.

This article explains how to run the examples delivered with Lucene and gives a short explanation of the objects they use.


Table of Contents

1. Installation
2. Using Lucene
3. Lucene Objects
3.1. Document and Fields
3.2. IndexWriter
3.3. Analyser
3.4. Searcher and Query
4. Thank you
5. Questions and Discussion
6. Links and Literature
6.1. Apache Lucene Links

1. Installation

Download Apache Lucene from the Lucene homepage: http://lucene.apache.org/

Add the JARs contained in the download to your classpath to make Lucene available.

2. Using Lucene

Using Lucene is simple. I recommend starting with the existing examples and modifying them from there. This is the approach I describe here.

Create a new Java project "LuceneTest" and add the Lucene JARs to the build path.

In your Lucene download, open the folder \src\demo\org\apache\lucene\demo, create the matching packages for these examples in your project and import the classes.

Select IndexFiles.java and run it with the argument "C:\temp" to index all files in the folder C:\temp. This creates the index in a directory named index inside your current project directory.

Then select SearchFiles.java and run it (no arguments are necessary). The program prompts you for the search term and then prints the results as a list.

Tip

Check the source code of the Java classes used in the examples. They are very simple and make it easy to create your own indexer and searcher.

3. Lucene Objects

The API of Lucene is surprisingly simple. Lucene uses only a few objects. I'll try to give a (very rough) overview here.

3.1. Document and Fields

The class org.apache.lucene.document.Document is the container for everything that goes into the index. Lucene requires every indexed object to be represented as an instance of Document. Each document defines one or several fields (org.apache.lucene.document.Field). Fields contain classified information about the document, for example the creation date of a file or its content. These fields later allow you to search for specific information within one classification.

Have a look at FileDocument from the example. It defines three fields (path, modified, content).
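To make the idea of fields more concrete, here is a minimal sketch in plain Java. It does not use the Lucene API. ToyDocument, FieldedSearchSketch and the sample values are hypothetical names and data which I made up for illustration. Only the three field names are taken from the description above. The sketch shows why it helps to classify information into fields: a search can be restricted to one field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: this is NOT Lucene's Document or Field class.
// All names and sample values here are assumptions made for illustration.
final class FieldedSearchSketch {

    // A document is modeled as a set of named fields.
    static final class ToyDocument {
        private final Map<String, String> fields = new LinkedHashMap<>();

        ToyDocument add(String name, String value) {
            fields.put(name, value);
            return this;
        }

        String get(String name) {
            return fields.get(name);
        }
    }

    // Returns the "path" field of every document whose given field contains the term.
    static List<String> search(List<ToyDocument> docs, String field, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> paths = new ArrayList<>();
        for (ToyDocument doc : docs) {
            String value = doc.get(field);
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                paths.add(doc.get("path"));
            }
        }
        return paths;
    }

    // Two made-up documents with the fields path, modified and content.
    static List<ToyDocument> sampleDocs() {
        List<ToyDocument> docs = new ArrayList<>();
        docs.add(new ToyDocument()
                .add("path", "notes/a.txt")
                .add("modified", "20080415")
                .add("content", "Lucene is a search library"));
        docs.add(new ToyDocument()
                .add("path", "notes/lucene.txt")
                .add("modified", "20080410")
                .add("content", "Shopping list"));
        return docs;
    }

    public static void main(String[] args) {
        List<ToyDocument> docs = sampleDocs();
        // Restricting the search to one field changes which file matches.
        System.out.println(search(docs, "content", "lucene"));
        System.out.println(search(docs, "path", "lucene"));
    }
}
```

Searching the field content finds only the first file, although the second file has "lucene" in its path. A real index is of course not searched by scanning every document as this sketch does.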

3.2. IndexWriter

The class org.apache.lucene.index.IndexWriter creates the index. With the method addDocument you add an existing Document to the index.

The constructor of IndexWriter expects the directory in which to store the index and the analyzer for the content of the files. In addition, you pass a boolean flag which indicates whether a new index should be created or an existing index should be extended.

After adding the documents, call the methods optimize() and close() to optimize the index and to release the resources.

3.3. Analyser

The class org.apache.lucene.analysis.standard.StandardAnalyzer provides a standard analyzer. The analyzer is responsible for breaking the text into terms and for filtering out certain stop words, e.g. "and".

You can create your own analyzer, but for me the standard one was sufficient.
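To illustrate what filtering out such words means, here is a small sketch in plain Java. It is not StandardAnalyzer and does not use the Lucene API. The class name StopWordAnalyzerSketch and the word list are assumptions I made for this example. The real analyzer does considerably more than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch only: NOT Lucene's StandardAnalyzer.
// The class name and the word list are assumptions made for illustration.
final class StopWordAnalyzerSketch {

    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "or", "the", "of", "is"));

    // Splits the text into lower-case terms and drops the words listed above.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on every run of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [cat, dog]
        System.out.println(analyze("The cat and the dog"));
    }
}
```

Only the remaining terms would end up in the index, which is why a search for "and" alone typically finds nothing.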

3.4. Searcher and Query

The class org.apache.lucene.search.Searcher provides the search functionality. org.apache.lucene.search.IndexSearcher searches over an index.

What to search for is specified via the query class (org.apache.lucene.search.Query).

The search supports wildcards (* and ?), logical operators (AND, OR, NOT) and much more, e.g. fuzzy search.
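The following sketch in plain Java shows the meaning of the two wildcards: * stands for any number of characters (including none) and ? stands for exactly one character. It does not use the Lucene API and says nothing about how Lucene evaluates such queries internally. The class WildcardSketch and its methods are hypothetical names for this example.

```java
import java.util.regex.Pattern;

// Conceptual sketch only: NOT Lucene's query classes.
// WildcardSketch, toRegex and matches are hypothetical names for illustration.
final class WildcardSketch {

    // Translates a wildcard expression into a regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // any number of characters, including none
            } else if (c == '?') {
                regex.append('.');    // exactly one character
            } else {
                // Every other character must match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns true if the whole term matches the wildcard expression.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("te?t", "test"));     // true
        System.out.println(matches("te?t", "tet"));      // false
        System.out.println(matches("test*", "testing")); // true
    }
}
```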

4. Thank you

Please help me to support this article:

Flattr this

5. Questions and Discussion

Before posting questions, please see the vogella FAQ. If you have questions or find an error in this article, please use the www.vogella.de Google Group. I have created a short list on how to write good questions, which might also help you.

6. Links and Literature

6.1. Apache Lucene Links

http://lucene.apache.org/