Difference Topic Perl (r1.3 - 12 Jul 2005 - ChrisJones)

META TOPICPARENT TechStuff

Plucene (MAX_FIELD_LENGTH)

Line: 7 to 7

It turns out that the index writer has a limit of 10,000 words per field; anything beyond that point in a field is not indexed. I found the answer was to modify the following line:
Changed:
< use constant MAX_FIELD_LENGTH => 10_000;
> use constant MAX_FIELD_LENGTH => 10_000;

in /usr/share/perl5/Plucene/Index/Writer.pm, to:

Changed:
< use constant MAX_FIELD_LENGTH => 50_000;
> use constant MAX_FIELD_LENGTH => 50_000;

Be warned: this slows down the indexing process considerably (I'm prepared to live with that, as I want the index to be comprehensive). According to the documentation for Lucene (the Java version), raising this number can cause out-of-memory errors. I haven't experienced this problem.
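To make the effect of the limit concrete, here is a minimal sketch of a per-field word cap. It is not Plucene's code: the `indexed_tokens` helper, the whitespace tokenizer, and the tiny limit of 5 are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cap; deliberately tiny so the effect is visible.
use constant MAX_FIELD_LENGTH => 5;

# Return only the words that would be indexed under the given cap.
# Words are split on whitespace, which is an assumption, not Plucene's tokenizer.
sub indexed_tokens {
    my ($text, $limit) = @_;
    my @tokens = split ' ', $text;
    my $last = $#tokens < $limit - 1 ? $#tokens : $limit - 1;
    return @tokens[0 .. $last];
}

my @kept = indexed_tokens("one two three four five six seven", MAX_FIELD_LENGTH);
print scalar(@kept), " tokens indexed: @kept\n";
```

With the cap at 5, "six" and "seven" never reach the index, so a search for either word would not find this document. Raising the cap trades that loss for more work per field.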

Changed:
< Hopefully, the Plucene developers will provide a cleaner way to configure this parameter in the future, along the same lines as the java Lucene code that uses a property (ala -D).
> Hopefully, the Plucene developers will provide a cleaner way to configure this parameter in the future, along the same lines as the Java Lucene code that uses a runtime property (via -D).

As an aside, I found an online version of the Porter stemming algorithm. I'm not sure whether the standard analyser in Plucene uses this algorithm, but it's interesting to enter words with apostrophes. It certainly shows the importance of using the same analyser for both index creation and query parsing.
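The point about matching analysers can be sketched with a toy example. Both analysers below are hypothetical and deliberately simplistic (no stemming); neither is Plucene's standard analyser. They differ only in how they treat apostrophes.

```perl
use strict;
use warnings;

# Hypothetical analyser A: keeps apostrophes inside words, "don't" -> ("don't").
sub analyse_keep  { return map { lc($_) } ($_[0] =~ /[\w']+/g) }

# Hypothetical analyser B: splits on apostrophes, "don't" -> ("don", "t").
sub analyse_split { return map { lc($_) } ($_[0] =~ /\w+/g) }

# Build a toy index (just a set of terms) using analyser A.
my %index = map { $_ => 1 } analyse_keep("I don't know");

# Count how many of the given query terms are present in the index.
sub hits { return scalar grep { $index{$_} } @_ }

print "same analyser:      ", hits(analyse_keep("don't")),  " hit(s)\n";
print "different analyser: ", hits(analyse_split("don't")), " hit(s)\n";
```

Querying with the analyser that built the index finds the term; querying with the other one produces "don" and "t", neither of which was ever indexed, so the search comes back empty.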


Revision r1.2 - 12 Jul 2005 - 03:57 - ChrisJones
Revision r1.3 - 12 Jul 2005 - 18:35 - ChrisJones