Lucene/Luwak can't match number values in NOT query
I've created a small Lucene/Luwak prototype. I add a query in Lucene syntax, and then I want to provide an InputDocument that should match that query.
For text fields, everything seems to work. However, when I try to do the same with numbers (DoublePoint), I never get a match for NOT queries (reverse search).
With text values it works:
storeRuleQuery("ruleID_1" , "textA:* -textA:A");
textValues.put("textA" , "B");
Console output: Match in Luwak: ruleID_1:textA:* -textA:A
Compare this with a number value:
storeRuleQuery("ruleID_1" , "numberA:* -numberA:500");
numberValues.put("numberA" , 900d);
Console output: no match.
Here is the code I'm using.
First I create a RAMDirectory for my monitor:
fsDirectory = new RAMDirectory();
I also define a field type:
private static final FieldType FIELD_TYPE = new FieldType();
static {
    // statements like these must live in a static initializer, not directly in the class body
    FIELD_TYPE.setStored(false);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
}
Then I'm creating my monitor:
QueryIndexConfiguration config = new QueryIndexConfiguration();
config.storeQueries(true);
monitor = new Monitor(new LuwakQueryParser(null, new KeywordAnalyzer(), number, text), new TermFilteredPresearcher(), fsDirectory, config);
To support DoublePoint fields, I've created my own query parser (LuwakQueryParser):
public class LuwakQueryParser implements MonitorQueryParser {

    private final QueryParser parser;

    /**
     * Creates a parser with a given default field and analyzer.
     * @param defaultField the default field
     * @param analyzer an analyzer used to analyze query terms
     * @param numbers names of the fields that hold numeric (DoublePoint) values
     * @param text names of the fields that hold text values
     */
    public LuwakQueryParser(String defaultField, Analyzer analyzer, List<String> numbers, List<String> text) {
        this.parser = new RangeQueryParser(defaultField, analyzer, numbers, text);
        this.parser.setLowercaseExpandedTerms(false);
        this.parser.setAllowLeadingWildcard(true);
        this.parser.setDefaultOperator(Operator.OR);
    }

    @Override
    public Query parse(String query, Map<String, String> metadata) throws Exception {
        return parser.parse(query);
    }
}
As you can see, I use a custom RangeQueryParser to parse the queries:
public class RangeQueryParser extends QueryParser {

    private final List<String> numbers;
    private final List<String> text;

    public RangeQueryParser(String f, Analyzer a, List<String> numbers, List<String> text) {
        super(f, a);
        this.numbers = numbers;
        this.text = text;
    }

    @Override
    protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws ParseException {
        if (StringUtils.isNotBlank(queryText) && isNumber(field) && NumberUtils.isNumber(queryText)) {
            // needed for single values: an exact query is a range with equal bounds (e.g. [500 TO 500])
            return DoublePoint.newExactQuery(field, Double.parseDouble(queryText));
        }
        // text fields (and anything else) use the default field query
        return super.newFieldQuery(analyzer, field, queryText, quoted);
    }

    // helper not shown in the original snippet: a field is numeric if it was declared in the numbers list
    private boolean isNumber(String field) {
        return numbers.contains(field);
    }
}
I've removed code from this example that isn't needed here.
As you can see, the newFieldQuery method checks whether the field is a number field holding a numeric value and adapts the query accordingly: text values go through the normal field query, while numbers are turned into a DoublePoint.newExactQuery. For example, it turns "numberA:500" into "numberA:[500 TO 500]".
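To make that conversion concrete, here is a plain-Java model of what an exact point query means, under the simplifying assumption that it behaves like a closed range whose lower and upper bounds are both the given value. The names ExactAsRange, DoubleRange and exact are hypothetical and are not Lucene's API. The model shows that 900 lies outside [500 TO 500], so the NOT clause -numberA:500 on its own should not exclude the document.

```java
// Hypothetical model (not Lucene's API): an exact numeric query treated as a degenerate closed range.
public final class ExactAsRange {

    static final class DoubleRange {
        final double lower;
        final double upper;

        DoubleRange(double lower, double upper) {
            this.lower = lower;
            this.upper = upper;
        }

        // Inclusive on both ends, like [lower TO upper] in query syntax.
        boolean contains(double value) {
            return value >= lower && value <= upper;
        }
    }

    // Models the idea behind an exact query: lower bound == upper bound == value.
    static DoubleRange exact(double value) {
        return new DoubleRange(value, value);
    }

    public static void main(String[] args) {
        DoubleRange fiveHundred = exact(500d);
        System.out.println(fiveHundred.contains(500d)); // true
        System.out.println(fiveHundred.contains(900d)); // false: -numberA:500 does not exclude 900
    }
}
```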
Then I'm adding a query to the monitor:
//input: storeRuleQuery("ruleID_1" , "numberA:* -numberA:500");
public void storeRuleQuery(String ruleID, String query) throws IOException, UpdateException {
    if (query.trim().length() > 0) {
        MonitorQuery monitorQuery = new MonitorQuery(ruleID, query);
        monitor.deleteById(ruleID);
        monitor.update(monitorQuery);
    }
}
This is the BooleanQuery which gets created by the monitor.update() method call:
I then want to match ruleID_1 by providing an InputDocument like this:
Map<String, Double> numberValues = new HashMap<>();
Map<String, String> textValues = new HashMap<>();
numberValues.put("numberA", 900d);

InputDocument.Builder builder = InputDocument.builder("document_1");
// numeric values are indexed as points (DoublePoint), not as terms
for (Map.Entry<String, Double> entry : numberValues.entrySet()) {
    builder.addField(new DoublePoint(entry.getKey(), entry.getValue()));
}
// text values are indexed as terms using FIELD_TYPE
for (Map.Entry<String, String> entry : textValues.entrySet()) {
    builder.addField(new Field(entry.getKey(), entry.getValue(), FIELD_TYPE));
}

List<InputDocument> documents = new ArrayList<>();
documents.add(builder.build());
DocumentBatch batch = DocumentBatch.of(documents);
Matches<HighlightsMatch> matches = monitor.match(batch, HighlightingMatcher.FACTORY);
These are the BooleanQueries created from this InputDocument by monitor.match():
Then I retrieve the matches (I get 0 hits in this case):
// query ID -> query string of every query that matched
Map<String, String> results = new HashMap<>();
for (DocumentMatches<HighlightsMatch> docMatches : matches) {
    for (HighlightsMatch match : docMatches) {
        MonitorQuery mq = monitor.getQuery(match.getQueryId());
        results.put(match.getQueryId(), mq.getQuery());
    }
}
for (Map.Entry<String, String> entry : results.entrySet()) {
    System.out.println("Match in Luwak: " + entry.getKey() + ":" + entry.getValue());
}
The Luwak version I'm using:
<dependency>
    <groupId>com.github.flaxsearch</groupId>
    <artifactId>luwak</artifactId>
    <version>1.5.0</version>
</dependency>
Solution 1:[1]
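Short answer: a wildcard query can only match indexed terms (characters), so it can't be used for numeric DoublePoint fields, which are indexed as points rather than terms. As a result numberA:* matches nothing, and since the only other clause is a negation, the whole query never matches. I'm now representing a numeric wildcard as a range instead, like numberA:[-Double.MAX_VALUE TO Double.MAX_VALUE].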
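The reasoning above can be checked with a small toy model in plain Java. It assumes, as a simplification, that a document keeps terms and points in separate structures (text fields in a term map, DoublePoint values in a point map), that field:* only looks at terms, and that a query made of one SHOULD and one MUST_NOT clause matches when the SHOULD clause matches and the MUST_NOT clause does not. All names (NumericWildcardModel, Doc, wildcard, pointRange and so on) are hypothetical; this is not Luwak's or Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: mimics how terms and points live in separate index structures.
public final class NumericWildcardModel {

    static final class Doc {
        final Map<String, String> terms = new HashMap<>();  // e.g. textA -> "B"
        final Map<String, Double> points = new HashMap<>(); // e.g. numberA -> 900.0
    }

    // field:* in this model only inspects terms, so a points-only field never matches it.
    static Predicate<Doc> wildcard(String field) {
        return doc -> doc.terms.containsKey(field);
    }

    // field:value on a text field: exact term comparison.
    static Predicate<Doc> term(String field, String value) {
        return doc -> value.equals(doc.terms.get(field));
    }

    // field:[lower TO upper] on a numeric field: inclusive point range.
    static Predicate<Doc> pointRange(String field, double lower, double upper) {
        return doc -> {
            Double v = doc.points.get(field);
            return v != null && v >= lower && v <= upper;
        };
    }

    // One SHOULD clause and one MUST_NOT clause: SHOULD must match, MUST_NOT must not.
    static boolean matches(Doc doc, Predicate<Doc> should, Predicate<Doc> mustNot) {
        return should.test(doc) && !mustNot.test(doc);
    }

    // textA:* -textA:A
    static boolean textQuery(Doc doc) {
        return matches(doc, wildcard("textA"), term("textA", "A"));
    }

    // numberA:* -numberA:500 (the exact value is modeled as the range [500 TO 500])
    static boolean numericWildcardQuery(Doc doc) {
        return matches(doc, wildcard("numberA"), pointRange("numberA", 500d, 500d));
    }

    // numberA:[-Double.MAX_VALUE TO Double.MAX_VALUE] -numberA:500
    static boolean numericRangeQuery(Doc doc) {
        return matches(doc, pointRange("numberA", -Double.MAX_VALUE, Double.MAX_VALUE),
                pointRange("numberA", 500d, 500d));
    }

    static Doc textDoc(String value) {
        Doc doc = new Doc();
        doc.terms.put("textA", value);
        return doc;
    }

    static Doc numberDoc(double value) {
        Doc doc = new Doc();
        doc.points.put("numberA", value);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(textQuery(textDoc("B")));               // true
        System.out.println(numericWildcardQuery(numberDoc(900d))); // false: the wildcard never sees points
        System.out.println(numericRangeQuery(numberDoc(900d)));    // true
        System.out.println(numericRangeQuery(numberDoc(500d)));    // false: excluded by -numberA:500
    }
}
```

If you'd rather keep numberA:* in the stored queries, one option (an untested sketch, not something the original answer does) is to override getWildcardQuery(String field, String termStr) in RangeQueryParser and, for a numeric field when termStr is "*", return DoublePoint.newRangeQuery(field, -Double.MAX_VALUE, Double.MAX_VALUE) instead of calling super. This assumes the classic QueryParser hands a bare field:* to getWildcardQuery, which is worth checking against the Lucene version that Luwak 1.5.0 depends on.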
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | theMahaloRecords |