Lucene.Net.Linq is available on the NuGet Gallery at http://nuget.org/packages/Lucene.Net.Linq.
Recent versions of Lucene.Net.Linq added support for the Unit of Work pattern.
The standard Unit of Work has methods like registerNew and registerDeleted, but I decided to use more generic names that make the interface appear more like a simple collection of documents.
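A rough sketch of what such a collection-like unit of work could look like is shown here. It is not the library's source: the name IDocumentSession, the exact member list and the LuceneQuery placeholder (standing in for Lucene.Net's Query type so the sketch compiles on its own) are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Placeholder for Lucene.Net's Query type, so this sketch has no dependencies.
public abstract class LuceneQuery { }

// Hypothetical shape of a collection-like unit of work. The member list is an
// assumption based on the description, not a copy of the real interface.
public interface IDocumentSession<T> : IDisposable
{
    // Entry point for LINQ queries against the index.
    IQueryable<T> Query();

    // Plays the role of registerNew: stage documents to be written on commit.
    void Add(params T[] items);

    // Plays the role of registerDeleted: stage retrieved documents for deletion.
    void Delete(params T[] items);

    // Escape hatch: stage deletion of every document matching the given queries.
    void Delete(params LuceneQuery[] queries);

    // Write all staged changes to the index.
    void Commit();
}
```

Note that nothing in this shape corresponds to registerDirty or registerClean.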
The Delete method that takes one or more Query objects is a special case specific to Lucene.Net. This is an escape hatch for when you want to delete one or more documents without first retrieving them. For example, you may want to delete all documents that match a query like type:person. I have mixed feelings about this escape hatch, because it smells like a Leaky Abstraction. On the other hand, since it is an overload of a more abstract method, I think it makes sense from a performance standpoint.
Anyway, one notable method pair is missing from the interface: registerDirty and registerClean. That’s because the session does some bookkeeping behind the scenes to automatically detect dirty documents when the session is committed. This makes using Lucene.Net.Linq as a Repository dead simple, and keeps the client code nice and clean.
Document Tracking
So how does it work?
When you ask for an IQueryable from the session, the session attaches an instance of IRetrievedDocumentTracker to the queryable before returning it. That internal interface has only one method.
When the LuceneQueryExecutor starts returning results one item at a time, if it detects that a document tracker has been attached, it makes two copies of each result. One is returned to the client and the other is passed only to the tracker as a hidden copy. This gives the library the ability to compare pristine objects that came unmodified from the index with ones that may have been modified by the client.
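The two-copy idea can be sketched without any Lucene types. In the sketch below, IDocumentTracker, its TrackDocument signature, RecordingTracker and ExecutorSketch are all hypothetical names chosen for illustration; the real tracker interface and executor are internal to the library and may look different. The only behavior taken from the description above is that each raw result is materialized twice when a tracker is attached.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the internal tracker interface; the method name
// and signature are assumptions.
public interface IDocumentTracker<T>
{
    void TrackDocument(T item, T hiddenCopy);
}

// Simple tracker that remembers each (client item, hidden copy) pair.
public class RecordingTracker<T> : IDocumentTracker<T>
{
    public List<KeyValuePair<T, T>> Pairs { get; } = new List<KeyValuePair<T, T>>();

    public void TrackDocument(T item, T hiddenCopy)
    {
        Pairs.Add(new KeyValuePair<T, T>(item, hiddenCopy));
    }
}

public static class ExecutorSketch
{
    // Maps each raw document to an object and yields it lazily. When a tracker
    // is attached, the raw document is mapped a second time so the tracker
    // holds a pristine copy that the client never sees.
    public static IEnumerable<T> Execute<TRaw, T>(
        IEnumerable<TRaw> rawDocuments,
        Func<TRaw, T> map,
        IDocumentTracker<T> tracker = null)
    {
        foreach (var raw in rawDocuments)
        {
            var item = map(raw);
            if (tracker != null)
            {
                tracker.TrackDocument(item, map(raw));
            }
            yield return item;
        }
    }
}
```

Mapping the raw document twice, rather than cloning the mapped object, means the document type does not need to support cloning. The cost is that every tracked result is materialized twice.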
Finally, when the session is committed, all tracked documents are compared with their hidden copies (using reflection) to detect which documents were modified. Modified documents are written back to the index.
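A reflection-based comparison of this kind could be sketched as follows. DirtyCheck, IsModified and the Person type are hypothetical names used only for illustration, and the sketch assumes that comparing public property values with Equals is enough. The library's actual comparison may differ.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class DirtyCheck
{
    // Returns true when any public, readable, non-indexed instance property
    // differs between the client's object and its hidden copy.
    public static bool IsModified<T>(T item, T hiddenCopy)
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Any(p => !object.Equals(p.GetValue(item, null), p.GetValue(hiddenCopy, null)));
    }
}

// Hypothetical document type used only for illustration.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
```

For example, two Person objects with the same Name and Age compare as unmodified, and changing Age on one of them makes IsModified return true. Because object.Equals is reference equality for most collection types, a property holding a list would be reported as modified even when its contents are unchanged, so this sketch would need element-wise comparison for such properties.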
Eventual Consistency
The concepts of eventual consistency describe what expectations a client may have about the ordering and visibility of writes in a distributed asynchronous persistence system. Werner Vogels has written about different consistency definitions.
Since the unit of work in Lucene.Net.Linq is implemented by an interface named ISession, it would have been nice to provide Session Consistency. Whether this behavior is achieved is open to some interpretation. Since the underlying Lucene engine will not make changes visible to an IndexReader until a commit happens and the IndexReader is reopened, it is difficult or impossible to allow a client to see the effects of adding, deleting or modifying a document in subsequent queries within the same session. However, this problem can be side-stepped by simply calling Commit between making changes and subsequent queries.
Calling Commit explicitly may have drawbacks if the client will make further changes and wants all of its changes to appear atomically. Knowing the details of when staged changes become visible will help clients to make changes smartly.
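The visibility rule can be modeled with a small in-memory stand-in. StagedSession is a hypothetical class, not part of Lucene.Net.Linq, and it models only the behavior described above: staged changes stay invisible to queries until Commit is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for an index plus session; it models visibility only.
public class StagedSession<T>
{
    private readonly List<T> committed = new List<T>();      // what queries can see
    private readonly List<T> pendingAdds = new List<T>();    // staged additions
    private readonly List<T> pendingDeletes = new List<T>(); // staged deletions

    // Queries see a snapshot of committed documents only.
    public IQueryable<T> Query()
    {
        return committed.ToList().AsQueryable();
    }

    public void Add(params T[] items)
    {
        pendingAdds.AddRange(items);
    }

    public void Delete(params T[] items)
    {
        pendingDeletes.AddRange(items);
    }

    // Applies all staged changes as one batch, making them visible together.
    public void Commit()
    {
        foreach (var item in pendingDeletes)
        {
            committed.Remove(item);
        }
        committed.AddRange(pendingAdds);
        pendingDeletes.Clear();
        pendingAdds.Clear();
    }
}
```

With this model, a document added with Add does not show up in Query() until Commit has run. Replacing one document with another in a single Commit never exposes a state where both or neither exist, whereas committing between the delete and the add would expose exactly such an intermediate state to other readers.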