
Chris and Janet's website




Plucene (MAX_FIELD_LENGTH)

I had a problem where large PDF files (7-8 MB) were not being fully indexed. I was indexing PDF files attached to my TWiki site with the PluceneSearch plugin.

It turns out that Plucene's index writer stops after 10,000 terms (roughly, words) per field, so anything beyond that point in a large document never makes it into the index. The fix was to change the following line:

use constant MAX_FIELD_LENGTH => 10_000;

in /usr/share/perl5/Plucene/Index/Writer.pm, to:

use constant MAX_FIELD_LENGTH => 50_000;
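If you need to repeat this edit (for example, after reinstalling Plucene), it can be scripted. The following is a minimal sketch, assuming the constant appears on a single line of the form shown above; the function name `set_max_field_length` is hypothetical, not part of Plucene. It keeps a `.bak` copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Plucene): rewrite the MAX_FIELD_LENGTH
# constant in Plucene/Index/Writer.pm.
# Assumes the constant is defined on one line like:
#   use constant MAX_FIELD_LENGTH => 10_000;
set_max_field_length() {
    file=$1
    value=$2
    # -i.bak edits in place and saves the original as "$file.bak"
    # (accepted by both GNU and BSD sed). The capture group keeps the
    # "use constant MAX_FIELD_LENGTH => " prefix; only the number changes.
    sed -i.bak "s/\(use constant MAX_FIELD_LENGTH *=> *\)[0-9_]*;/\1${value};/" "$file"
}

# Example (needs write access to the system Perl library):
#   set_max_field_length /usr/share/perl5/Plucene/Index/Writer.pm 50_000
# Check the value Perl now sees ("use constant" defines a sub in the package):
#   perl -MPlucene::Index::Writer -le 'my $n = Plucene::Index::Writer::MAX_FIELD_LENGTH(); print $n'
```

Because the file lives under the system Perl tree, a package upgrade may overwrite it, so the `.bak` copy and the one-line check are handy for confirming the setting is still in place.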

Be warned: this slows indexing down considerably (I'm prepared to live with that, because I want the index to be comprehensive). According to the documentation for the Java version of Lucene, raising this limit can cause out-of-memory errors, although I haven't run into that problem myself.

Hopefully, the Plucene developers will provide a cleaner way to configure this parameter in the future, along the same lines as Java Lucene, which reads it from a system property (à la -D).


