Lucene.NET throwing Fatal Error: AccessViolationException

I am using Lucene.NET 4.8 beta, and I have a method that calls MaybeRefresh on a SearcherManager every 5 seconds. 99.9% of the time, everything works fine. However, 0.1% of the time, I get a fatal AccessViolationException. I am not sure what is causing it. This is the full stack trace:

```
at System.IO.UnmanagedMemoryAccessor.ReadByte(Int64)
at Lucene.Net.Store.BufferedChecksumIndexInput.ReadByte()
at Lucene.Net.Store.DataInput.ReadInt32()
at Lucene.Net.Index.SegmentInfos+FindSegmentsFile.Run(Lucene.Net.Index.IndexCommit)
at Lucene.Net.Index.SegmentInfos.Read(Lucene.Net.Store.Directory)
at Lucene.Net.Index.StandardDirectoryReader.IsCurrent()
at Lucene.Net.Index.StandardDirectoryReader.DoOpenNoWriter(Lucene.Net.Index.IndexCommit)
at Lucene.Net.Index.DirectoryReader.OpenIfChanged(Lucene.Net.Index.DirectoryReader)
at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(Lucene.Net.Search.IndexSearcher)
at Lucene.Net.Search.ReferenceManager`1[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].DoMaybeRefresh()
at Lucene.Net.Search.ReferenceManager`1[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MaybeRefresh()
...my method that calls MaybeRefresh...
```

Please note:

I have 2 separate services. Service A periodically writes to the index via IndexWriter, and service B searches the index and calls MaybeRefresh every 5 seconds. It is service B that sees this fatal error. Service A works fine and does not have any errors. So I believe this has something to do with service B, but I am mentioning service A for full transparency in case I missed something.

If anyone can give any insight into this fatal error caused by Lucene methods, that would be appreciated!

Please also let me know of any additional details I should add to describe this error, if it helps.



Solution 1:[1]

First of all, the error most likely indicates that you are opening an MMapDirectory multiple times on the same set of index files (the UnmanagedMemoryAccessor frame at the top of the stack trace points to memory-mapped I/O), and you are getting the exception because both instances are writing to the same memory space. I am not sure whether that can be considered a bug, but note that for writing you don't need to open a RAM-intensive MMapDirectory; you can just use a SimpleFSDirectory.

Directory dir = new SimpleFSDirectory(filePath);

That being said, the following advice will make the above point moot.

Option 1

Typically, you should limit the number of processes opening a single index to 1. If you need to write at the same time that reads happen, you can use the near real-time search feature of Lucene.

The steps involved in doing this are:

  1. Open an IndexWriter and keep it open (register it as a singleton).
  2. Create a SearcherManager with the IndexWriter as a parameter (or alternatively, use writer.GetReader()).
  3. Use SearcherManager to search.
  4. Use IndexWriter for indexing operations.
  5. Commit() after indexing.
  6. Call searcherManager.MaybeRefresh() after adding a document.
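
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the answer author's code: the SearchIndex class, the index path, and the "id"/"body" field names are hypothetical, and it assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 beta packages are referenced.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical wrapper: register ONE instance per process (e.g. as a singleton).
public sealed class SearchIndex : IDisposable
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    private readonly Directory _directory;
    private readonly IndexWriter _writer;
    private readonly SearcherManager _searcherManager;

    public SearchIndex(string indexPath)
    {
        // Step 1: open a single IndexWriter and keep it open.
        _directory = FSDirectory.Open(indexPath);
        var config = new IndexWriterConfig(MatchVersion, new StandardAnalyzer(MatchVersion));
        _writer = new IndexWriter(_directory, config);

        // Step 2: build the SearcherManager from the writer (near real-time).
        _searcherManager = new SearcherManager(_writer, true, null);
    }

    // Steps 4-6: index, commit, then refresh so the change becomes searchable.
    public void Add(string id, string body)
    {
        var doc = new Document
        {
            new StringField("id", id, Field.Store.YES),
            new TextField("body", body, Field.Store.NO)
        };
        _writer.AddDocument(doc);
        _writer.Commit();
        _searcherManager.MaybeRefresh();
    }

    // Step 3: always search through a searcher acquired from the manager.
    // StandardAnalyzer lowercases tokens, so pass the term in lowercase.
    public int CountMatches(string term)
    {
        IndexSearcher searcher = _searcherManager.Acquire();
        try
        {
            var query = new TermQuery(new Term("body", term));
            return searcher.Search(query, 10).TotalHits;
        }
        finally
        {
            // Release (never dispose) the acquired searcher.
            _searcherManager.Release(searcher);
        }
    }

    public void Dispose()
    {
        _searcherManager.Dispose();
        _writer.Dispose();
        _directory.Dispose();
    }
}
```

Because the searcher and the writer share one process and one Directory instance, nothing else needs to open the index files.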

As pointed out in the linked tutorial, you can use ControlledRealTimeReopenThread to periodically refresh the IndexReader in the background.
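
A background refresh along those lines might look like the sketch below. It is an assumption-laden illustration, not code from the tutorial: the NrtExample class, the "index" path, and the "body" field are hypothetical, and the 5-second maximum staleness simply mirrors the refresh interval from the question.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class NrtExample
{
    public static void Main()
    {
        const LuceneVersion matchVersion = LuceneVersion.LUCENE_48;

        var directory = FSDirectory.Open("index"); // hypothetical path
        var writer = new IndexWriter(directory,
            new IndexWriterConfig(matchVersion, new StandardAnalyzer(matchVersion)));

        // TrackingIndexWriter hands back a generation number for each change.
        var trackingWriter = new TrackingIndexWriter(writer);
        var searcherManager = new SearcherManager(writer, true, null);

        // Reopen at most every 0.1 s when someone is waiting for a change,
        // and at least every 5 s otherwise (max stale, then min stale).
        var reopenThread = new ControlledRealTimeReopenThread<IndexSearcher>(
            trackingWriter, searcherManager, 5.0, 0.1);
        reopenThread.Start();

        var doc = new Document
        {
            new TextField("body", "hello world", Field.Store.NO)
        };
        long generation = trackingWriter.AddDocument(doc);

        // Optional: block until this specific change is visible to searches.
        reopenThread.WaitForGeneration(generation);

        IndexSearcher searcher = searcherManager.Acquire();
        try
        {
            var hits = searcher.Search(new TermQuery(new Term("body", "hello")), 10);
            System.Console.WriteLine(hits.TotalHits);
        }
        finally
        {
            searcherManager.Release(searcher);
        }

        // Stop the background thread before closing what it depends on.
        reopenThread.Dispose();
        searcherManager.Dispose();
        writer.Dispose();
        directory.Dispose();
    }
}
```

With the reopen thread running, the manual 5-second MaybeRefresh loop is no longer needed in the search service.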

Finally, to solve the problem of opening multiple Directory instances (which is what is ultimately causing this issue), use a single process for both writing and reading. Since writes typically happen less often than reads, I recommend doing all of this inside your search service and having the write service send it messages over the network (TCP, HTTP, etc.) to write to, update, or delete from the index.

Option 2

If you want to open the same index in multiple processes, you can use the Lucene.Net.Replicator module to write your index with one service and then publish it for replication to other services. Lucene.Net.Replicator is typically recommended for replicating the same index across multiple nodes in a web farm, but can also be used for writing the index in one service and reading it in another service. Essentially, for your use case you would have a separate index directory for each one of your services.

However, this will also require you to build a network service to publish the updates to. The primary difference is that you wouldn't need to design a specialized web API to write to, update, or delete from the index; instead, you could use an existing API to publish your index after it is written.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
