Lucene/Luwak can't match number values in NOT query

I've created a small Lucene/Luwak prototype. I register a query written in Lucene syntax, and afterwards I provide an InputDocument which should give me a match on that query.

For TextFields, everything works. However, when I do the same with numbers (DoublePoint), I never get a match for NOT queries (reverse search).

With text values, it works:

storeRuleQuery("ruleID_1" , "textA:* -textA:A");
textValues.put("textA" , "B");
Console output: Match in Luwak: ruleID_1:textA:* -textA:A

With number values, it does not:

storeRuleQuery("ruleID_1" , "numberA:* -numberA:500");
numberValues.put("numberA" , 900d);
Console output: No Match

So let me explain the code which I'm using:

First I'm creating a RAMDirectory for my monitor:

fsDirectory = new RAMDirectory();

And I'm also defining a field type:

private static final FieldType FIELD_TYPE = new FieldType();
FIELD_TYPE.setStored(false);
FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

Then I'm creating my monitor:

 QueryIndexConfiguration config = new QueryIndexConfiguration();
        config.storeQueries(true);
        monitor = new Monitor(new LuwakQueryParser(null, new KeywordAnalyzer(), number, text), new TermFilteredPresearcher(), fsDirectory, config);

To use DoublePoints, I've created my own MonitorQueryParser (LuwakQueryParser):

public class LuwakQueryParser implements MonitorQueryParser {

    private QueryParser parser = null;

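    /**
     * Creates a parser with a given default field and analyzer
     * @param defaultField the default field
     * @param analyzer an analyzer to use to analyze query terms
     * @param numbers names of the fields that hold number values
     * @param text names of the fields that hold text values
     */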
    public LuwakQueryParser(String defaultField, Analyzer analyzer, List<String> numbers, List<String> text) {
        this.parser = new RangeQueryParser(defaultField, analyzer, numbers, text);
        this.parser.setLowercaseExpandedTerms(false);
        this.parser.setAllowLeadingWildcard(true);
        this.parser.setDefaultOperator(Operator.OR);
    }


    @Override
    public Query parse(String query, Map<String, String> metadata) throws Exception {
        return parser.parse(query);
    }
}

As you can see, it delegates to a custom RangeQueryParser, which does the actual parsing of the queries:

public class RangeQueryParser extends QueryParser {

    private final List<String> numbers;
    private final List<String> text;

    public RangeQueryParser(String f, Analyzer a, List<String> numbers, List<String> text) {
        super(f, a);
        this.numbers = numbers;
        this.text = text;
    }

    @Override
    protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws ParseException {
        if (StringUtils.isNotBlank(queryText) && isNumber(field) && NumberUtils.isNumber(queryText)) {
            // Needed for a single value: builds an exact point query, equivalent to a range (e.g. [500 TO 500])
            return DoublePoint.newExactQuery(field, Double.parseDouble(queryText));
        }
        // Text fields (and anything else) fall back to the default field query
        return super.newFieldQuery(analyzer, field, queryText, quoted);
    }

I've removed code from this example that isn't needed here.

As you can see, the newFieldQuery method checks whether the field holds a text or a number value and adapts the query accordingly. Texts are handled as a normal field query, while numbers are turned into a DoublePoint.newExactQuery. As an example, it turns "numberA:500" into "numberA:[500 TO 500]".
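A side note on the number check above: as far as I know, NumberUtils.isNumber also accepts type-qualified and hexadecimal literals such as "500L" or "0x1F", which Double.parseDouble rejects with a NumberFormatException. The sketch below is a plain-JDK alternative that parses and validates in one step. SafeDoubleParser and tryParse are hypothetical names for illustration, not part of Lucene, Luwak, or the code above.

```java
import java.util.OptionalDouble;

// Hypothetical helper (not from the original code): parse a query term as a
// double and report failure through an empty OptionalDouble, not an exception.
final class SafeDoubleParser {

    private SafeDoubleParser() {
    }

    static OptionalDouble tryParse(String text) {
        if (text == null || text.trim().isEmpty()) {
            return OptionalDouble.empty();
        }
        try {
            double value = Double.parseDouble(text.trim());
            // Treat NaN and infinities as "not a usable number" for exact queries.
            return Double.isFinite(value) ? OptionalDouble.of(value) : OptionalDouble.empty();
        } catch (NumberFormatException e) {
            // Covers inputs like "500L" or "abc".
            return OptionalDouble.empty();
        }
    }
}
```

Inside newFieldQuery, the result could decide between the exact point query (value present) and the default field query (value absent).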

Then I'm adding a query to the monitor:

//input: storeRuleQuery("ruleID_1" , "numberA:* -numberA:500");

public void storeRuleQuery(String ruleID, String query) throws IOException, UpdateException {
        String queryString = query;
        if (queryString.trim().length() > 0) {
            MonitorQuery monitorQuery = new MonitorQuery(ruleID, queryString);
            monitor.deleteById(ruleID);
            monitor.update(monitorQuery);
        }
    }

This is the BooleanQuery which gets created by the monitor.update() method call:

I then would like to match the ruleID_1 by providing an InputDocument like so:

Map<String, Double> numberValues = new HashMap<>();
Map<String, String> textValues = new HashMap<>();

numberValues.put("numberA" , 900d);
InputDocument.Builder builder = InputDocument.builder("document_1");
        for(String numberField : numberValues.keySet()){
            builder.addField(new DoublePoint(numberField, (numberValues.get(numberField))));
        }

        for(String textField : textValues.keySet()){
            builder.addField(new Field(textField, (textValues.get(textField)), FIELD_TYPE));
        }

        List<InputDocument> documents = new ArrayList<>();
        documents.add(builder.build());
        DocumentBatch batch = DocumentBatch.of(documents);
        Matches<HighlightsMatch> matches;
        matches = monitor.match(batch, HighlightingMatcher.FACTORY);

These are the BooleanQueries which are created from this InputDocument and our matcher.match():

Then I'm retrieving the matches (where I get 0 hits in this case):

 Set<Map<String, String>> matchingIds = new HashSet<>();
        for (DocumentMatches<HighlightsMatch> docMatches : matches) {
            for (HighlightsMatch match : docMatches) {
                MonitorQuery mq = monitor.getQuery(match.getQueryId());
                HashMap<String, String> q = new HashMap<>();
                q.put(match.getQueryId(), mq.getQuery());
                matchingIds.add(q);
            }
        }

        Map<String, String> results = new HashMap<>();

        for (Map<String, String> v : matchingIds) {
            results.put(v.keySet().iterator().next(), v.values().iterator().next());
        }

        for(String key : results.keySet()){
            System.out.println("Match in Luwak: " + key + ":" + results.get(key));
        }

The version of luwak that I'm using:

 <dependency>
       <groupId>com.github.flaxsearch</groupId>
       <artifactId>luwak</artifactId>
       <version>1.5.0</version>
 </dependency>

java lucene

Solution 1:^[1]

Short answer: a wildcard can only match characters; it can't be used for numbers. I'm now going to represent a numeric wildcard as a range, like numberA:[-Double.MAX_VALUE TO Double.MAX_VALUE].
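One way to apply this is to rewrite the query string before it is stored. The following is a minimal sketch of that idea using only the JDK. NumericWildcardRewriter is a hypothetical helper, not part of Lucene or Luwak. It assumes the set of numeric field names is known, that the parser turns range syntax on those fields into a point range query, and that field names are simple identifiers. It does not handle quoted phrases or escaped characters.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (an assumption, not library API): replaces "field:*"
// with an explicit full double range for fields known to be numeric.
final class NumericWildcardRewriter {

    // "field:*" where the star is the whole term, i.e. it is followed by
    // whitespace, a closing parenthesis, or the end of the query string.
    private static final Pattern FIELD_STAR =
            Pattern.compile("([A-Za-z_][A-Za-z0-9_]*):\\*(?=\\s|\\)|$)");

    private final Set<String> numberFields;

    NumericWildcardRewriter(Set<String> numberFields) {
        this.numberFields = numberFields;
    }

    String rewrite(String query) {
        Matcher m = FIELD_STAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String field = m.group(1);
            // Text fields keep their wildcard; numeric fields get a range
            // spanning every finite double value.
            String replacement = numberFields.contains(field)
                    ? field + ":[" + (-Double.MAX_VALUE) + " TO " + Double.MAX_VALUE + "]"
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With "numberA" registered as a numeric field, the rule "numberA:* -numberA:500" would be stored as "numberA:[-1.7976931348623157E308 TO 1.7976931348623157E308] -numberA:500", while "textA:* -textA:A" is left unchanged.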

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	theMahaloRecords
