Version 0.1
Copyright © 2008 Lars Vogel
15.04.2008
Download Apache Lucene from the Lucene homepage: http://lucene.apache.org/
Add the contained jars to your classpath to make Lucene available.
Using Lucene is very simple. I recommend starting with the existing examples and modifying them from there; this is the approach I describe here.
Create a new Java project "LuceneTest" and add the Lucene jars to the build path.
From your Lucene download, open the folder \src\demo\org\apache\lucene\demo, create the matching packages for these examples in your project and import the classes.
Select IndexFiles.java and run it with the argument "C:\temp" to index all files in the folder C:\temp. This creates the index in a directory named "index" inside your current project directory.
Then select SearchFiles.java and run it (no arguments are necessary). The program prompts you for a search term and then presents the results as a printed list.
The Lucene API is surprisingly simple; it is built around only a few classes. Here is a (very rough) overview.
The class org.apache.lucene.document.Document is the container for everything that goes into the index: every object you want to index must be represented as a Document. Each document has one or more fields (org.apache.lucene.document.Field). A field holds one named piece of information about the document, for example the creation date of a file or its content. These fields later allow you to search for information in a specific field.
Have a look at FileDocument from the examples. It defines three fields (path, modified, contents).
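To make the Document/Field idea concrete without depending on a particular Lucene version, here is a minimal plain-Java sketch that models a document as an ordered map of named fields. ToyDocumentDemo, ToyDocument and the helper fileDocument are hypothetical illustrations, not Lucene classes; they only mirror the three kinds of information FileDocument stores (a path, a modification time and the file contents). Storing the modification time as a millisecond string is an assumption made for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a Lucene-style document: an ordered set of named fields.
// Hypothetical; this is NOT org.apache.lucene.document.Document.
final class ToyDocumentDemo {

    // Hypothetical stand-in for Document: field name -> field value.
    static final class ToyDocument {
        private final Map<String, String> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.put(name, value);
        }

        String get(String name) {
            return fields.get(name);
        }

        Map<String, String> fields() {
            return fields;
        }
    }

    // Hypothetical helper mirroring the three fields of the demo's FileDocument.
    // Assumption: the modification time is stored as milliseconds in a string.
    static ToyDocument fileDocument(String path, long modifiedMillis, String contents) {
        ToyDocument doc = new ToyDocument();
        doc.add("path", path);
        doc.add("modified", Long.toString(modifiedMillis));
        doc.add("contents", contents);
        return doc;
    }

    public static void main(String[] args) {
        ToyDocument doc = fileDocument("C:\\temp\\notes.txt", 1208217600000L, "Lucene is a search library");
        // Print every field as "name = value", in insertion order.
        for (Map.Entry<String, String> field : doc.fields().entrySet()) {
            System.out.println(field.getKey() + " = " + field.getValue());
        }
    }
}
```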
The class org.apache.lucene.index.IndexWriter creates the index. Its method addDocument() adds a Document to the index.
The IndexWriter constructor expects the directory in which to store the index and the analyzer to apply to the content of the files. In addition, it takes a boolean flag that indicates whether a new index should be created or an existing index should be extended.
After adding the documents, call optimize() to optimize the index and close() to release its resources.
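The following plain-Java sketch shows the kind of data structure an index writer builds: an inverted index that maps each term to the ids of the documents containing it. ToyIndexWriter is a hypothetical class, not org.apache.lucene.index.IndexWriter. It keeps everything in memory, uses a naive lower-casing tokenizer in place of an analyzer, and has no counterpart to optimize() or the create flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory inverted index: term -> ids of the documents containing it.
// Hypothetical; not org.apache.lucene.index.IndexWriter.
final class ToyIndexWriter {
    private final List<String> documents = new ArrayList<>();
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private boolean closed = false;

    // Adds one document's text and returns its id (0, 1, 2, ...).
    int addDocument(String contents) {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
        int docId = documents.size();
        documents.add(contents);
        // Naive "analysis": lower-case and split on anything that is not a letter or digit.
        for (String term : contents.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Stand-in for close(): afterwards no more documents may be added.
    void close() {
        closed = true;
    }

    // Returns the ids of all documents containing the given (lower-case) term.
    SortedSet<Integer> docsFor(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addDocument("Lucene builds an inverted index");
        writer.addDocument("An index maps terms to documents");
        writer.close();
        System.out.println(writer.docsFor("index")); // [0, 1]
    }
}
```

Because lookups go from term to documents, a search never has to scan the document texts themselves.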
The class org.apache.lucene.analysis.standard.StandardAnalyzer provides a standard analyzer. An analyzer breaks the text into terms and filters out certain filler words (stop words), e.g. "and".
You can create your own analyzer, but for my purposes the standard one was sufficient.
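As a rough illustration of what an analyzer does, this plain-Java sketch splits text into tokens, lower-cases them and drops stop words. ToyAnalyzer is hypothetical and much simpler than StandardAnalyzer; in particular, the stop-word list below is an assumed small subset chosen for the example, not Lucene's own list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analyzer: tokenize, lower-case, drop stop words.
// Hypothetical; not org.apache.lucene.analysis.standard.StandardAnalyzer.
final class ToyAnalyzer {
    // Assumed small subset of English stop words, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the", "to");

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are not letters or digits.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The cat and the hat")); // [cat, hat]
    }
}
```

The same analysis has to be applied to both the indexed text and the search terms; otherwise a query for "Cat" would never match the indexed term "cat".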
The abstract class org.apache.lucene.search.Searcher provides the search functionality; its subclass org.apache.lucene.search.IndexSearcher searches over an index.
What to search for is expressed as a query, via the class org.apache.lucene.search.Query.
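The following plain-Java sketch shows the basic idea behind answering a query from an inverted index: for a query such as "lucene AND java", look up the documents containing each term and intersect those sets. ToySearcher and its searchAnd method are hypothetical. They handle only single terms combined with AND and do no ranking, unlike Lucene's Query and IndexSearcher classes.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy searcher: answers "term1 AND term2 AND ..." by intersecting posting sets.
// Hypothetical; not org.apache.lucene.search.IndexSearcher or Query.
final class ToySearcher {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private final List<String> docs;

    // Builds a small inverted index over the given documents (id = list position).
    ToySearcher(List<String> docs) {
        this.docs = docs;
        for (int id = 0; id < docs.size(); id++) {
            for (String term : docs.get(id).toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    // Returns the ids of documents that contain every term of the AND query.
    SortedSet<Integer> searchAnd(String query) {
        SortedSet<Integer> result = null;
        for (String term : query.split("\\s+AND\\s+")) {
            SortedSet<Integer> docsWithTerm =
                postings.getOrDefault(term.trim().toLowerCase(Locale.ROOT), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docsWithTerm); // copy, so the index is not modified
            } else {
                result.retainAll(docsWithTerm);       // set intersection
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToySearcher searcher = new ToySearcher(List.of(
            "Lucene is a Java library",
            "Java is a programming language",
            "Lucene supports wildcard queries"));
        for (int id : searcher.searchAnd("lucene AND java")) {
            System.out.println(searcher.docs.get(id)); // prints: Lucene is a Java library
        }
    }
}
```

OR would use a set union instead of an intersection, and NOT would remove the matching ids from the result.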
Queries support wildcards (* and ?), logical operators (AND, OR, NOT) and much more, e.g. fuzzy search.
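To illustrate what wildcard and fuzzy matching mean at the level of individual terms, here is a plain-Java sketch. A wildcard pattern is translated into a regular expression ('*' matches any sequence of characters, '?' exactly one character), and fuzzy matching accepts terms within a fixed Levenshtein edit distance. ToyTermMatching and its methods are hypothetical. Lucene provides its own query classes for this (such as WildcardQuery and FuzzyQuery), and its fuzzy matching is configured differently from the fixed edit limit assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy term matching for wildcard and fuzzy queries.
// Hypothetical helpers; not Lucene's WildcardQuery or FuzzyQuery.
final class ToyTermMatching {

    // Translates a wildcard pattern into a regular expression, quoting every
    // literal character so that e.g. '.' is matched literally.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the terms that match the whole wildcard pattern.
    static List<String> wildcardMatches(String wildcard, List<String> terms) {
        Pattern p = wildcardToRegex(wildcard);
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            if (p.matcher(term).matches()) { // matches() requires a full-string match
                result.add(term);
            }
        }
        return result;
    }

    // Dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the terms within maxEdits edits of the query term.
    static List<String> fuzzyMatches(String term, int maxEdits, List<String> terms) {
        List<String> result = new ArrayList<>();
        for (String candidate : terms) {
            if (editDistance(term, candidate) <= maxEdits) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("lucene", "lucid", "license", "search", "searcher");
        System.out.println(wildcardMatches("luc*", terms));   // [lucene, lucid]
        System.out.println(wildcardMatches("se?rch", terms)); // [search]
        System.out.println(fuzzyMatches("lucine", 1, terms)); // [lucene]
    }
}
```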
Before posting questions, please see the vogella FAQ. If you have questions or find an error in this article, please use the www.vogella.de Google Group. I have created a short list of tips on how to ask good questions, which might also help you.