java - Querying a Lucene index with single characters, e.g. a person's initials

April 15, 2010

I have a database of companies and people that I want to query using Lucene (via Hibernate Search). The search feature is implemented as an autocomplete-style lookup, where the web page suggests matches as the user types.

Some of the companies and people are identified using initials, e.g.:

G & H Civil Engineering
J G van der Merwe

I want the user to start getting matches after typing a couple of letters, progressively refining the search as they add more text (possibly including spaces). I'm querying a couple of different fields, e.g. name, trade name, ID numbers, phone numbers etc., using a single term, so that the user can type part of a name, ID number, trade name or cell number.

However, I'm having trouble setting up the index and the query such that the term "g & h" will match the document. Using the term "civil", there are a lot of matches. Single characters with spaces in between aren't matching anything.

The test below fails on the last line. I'm unsure which combination of analyzers, tokenizers, filters and queries I should be using.

    @Test
    public void testSearching() throws Exception {
        Analyzer analyzer = new ReusableAnalyzerBase() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
                StandardTokenizer tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);
                LowerCaseFilter lowerCaseFilter = new LowerCaseFilter(Version.LUCENE_36, tokenizer);
                NGramTokenFilter filter = new NGramTokenFilter(lowerCaseFilter, 3, 20);
                return new TokenStreamComponents(tokenizer, filter);
            }
        };
        Directory ramDirectory = new RAMDirectory();

        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
        IndexWriter w = new IndexWriter(ramDirectory, config);

        Document doc = new Document();
        doc.add(new Field("id", "819", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("particulars.registeredname", "g & h civil engineering", Field.Store.NO, Field.Index.ANALYZED));

        w.addDocument(doc);
        w.close();

        // search
        int numberOfHits = 200;
        TopScoreDocCollector collector = TopScoreDocCollector.create(numberOfHits, true);
        IndexSearcher searcher = new IndexSearcher(IndexReader.open(ramDirectory));

        PhraseQuery q = new PhraseQuery();
        q.add(new Term("particulars.registeredname", "civil"));
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        assertThat(hits.length, greaterThan(0));

        PhraseQuery phraseQuery = new PhraseQuery();
        phraseQuery.add(new Term("particulars.registeredname", "g & h"));
        searcher.search(q, collector);
        hits = collector.topDocs().scoreDocs;
        assertThat(hits.length, greaterThan(0)); // fails - no matches
    }

I'm new to Lucene - any pointers would be appreciated.

Your particular issue is related to the fact that you're reusing the collector, which is stateful and designed for one-time use only. Using a new collector for the second query should do the trick.
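To make the "stateful, one-time use" point concrete, here is a minimal toy sketch in plain Java. The classes `OneShotCollectorSketch` and `TopScores` are hypothetical stand-ins invented for illustration; they do not reproduce the internals of Lucene's `TopScoreDocCollector`. The only assumption carried over is that reading the results consumes the collector's internal queue, so each search should get its own instance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy stand-in for a stateful top-N collector; hypothetical, not Lucene code. */
final class OneShotCollectorSketch {

    /** Keeps the best `limit` scores seen so far in a min-heap. */
    static final class TopScores {
        private final int limit;
        private final PriorityQueue<Double> heap = new PriorityQueue<>();

        TopScores(int limit) {
            this.limit = limit;
        }

        void collect(double score) {
            heap.add(score);
            if (heap.size() > limit) {
                heap.poll(); // drop the current lowest score
            }
        }

        /** Returns the scores best-first and empties the heap: reading is destructive. */
        List<Double> drain() {
            List<Double> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                out.add(heap.poll());
            }
            out.sort(Comparator.reverseOrder());
            return out;
        }
    }

    /** One "search" uses one fresh collector, so no state leaks between queries. */
    static List<Double> search(double[] scores, int limit) {
        TopScores collector = new TopScores(limit);
        for (double s : scores) {
            collector.collect(s);
        }
        return collector.drain();
    }

    public static void main(String[] args) {
        TopScores shared = new TopScores(2);
        shared.collect(0.9);
        shared.collect(0.4);
        System.out.println(shared.drain()); // first read sees both hits
        System.out.println(shared.drain()); // second read: the heap was already emptied
        System.out.println(search(new double[] {0.9, 0.4, 0.7}, 2));
    }
}
```

In the test from the question, the equivalent change would be to call `TopScoreDocCollector.create(numberOfHits, true)` again before the second `searcher.search(...)` call instead of passing the first collector a second time.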

However, please note that with Hibernate Search you're not supposed to touch Lucene internals that much: Hibernate Search will automatically derive Lucene documents from your entities at indexing time, and will build index readers and collectors as necessary when querying. I'd encourage you to avoid using Lucene directly if you're still new to Lucene/Hibernate Search: Lucene is powerful, but not an easy tool to use.

That would mean using annotated (or programmatically mapped) entities instead of building a document manually. Please refer to the documentation, in particular the section on entity mapping and the section on analysis.
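On the analysis side, it may help to look at which terms an analyzer chain like the one in the test would put into the index. The following is only a rough plain-Java approximation: the class `NGramSketch` and its methods are made up for illustration and do not use Lucene's `StandardTokenizer` or `NGramTokenFilter`. It assumes tokens are split on anything that is not a letter or digit, lowercased, and then expanded into every substring between 3 and 20 characters long.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java approximation of tokenizer + lowercase + n-gram (3..20); not Lucene code. */
final class NGramSketch {

    /** Lowercases and splits on runs of characters that are neither letters nor digits. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** All substrings of the token whose length lies in [min, max]. */
    static List<String> grams(String token, int min, int max) {
        List<String> out = new ArrayList<>();
        int longest = Math.min(max, token.length());
        for (int len = min; len <= longest; len++) {
            for (int start = 0; start + len <= token.length(); start++) {
                out.add(token.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the grams each token of the sample name would contribute.
        for (String token : tokens("g & h civil engineering")) {
            System.out.println(token + " -> " + grams(token, 3, 20));
        }
    }
}
```

Under these assumptions the tokens "g" and "h" are shorter than the minimum gram size and contribute no terms at all, while "civil" contributes six grams including "civil" itself. That would be consistent with "civil" matching and the initials not matching, and no indexed term would contain spaces or "&", so a single term "g & h" has nothing to match against. A smaller minimum gram size in the analyzer definition might be worth experimenting with.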

Also, when querying, you may use the Hibernate Search DSL. It is easier than building raw Lucene queries. And once your query has been built, you can ask Hibernate Search to retrieve the results for you.
