1.3 KiB

Raw Blame History

Explanation

Merge policy

When adding documents to a tantivy index, the indexed data will be recorded in multiple sections, called segments. There is more information about the Life of a Segment on the tantivy wiki at Github.

Currently, tantivy-py does not offer a way to customize the merge policy, but fortunately the default merge policy is the LogMergePolicy which is a good choice for most use cases. It is aliased as the default merge policy here.

Segment merging is performed in background threads. After adding documents to an index, it is important to allow time for those threads to complete merges. This is done by calling writer.wait_merging_threads() as the final step after adding data. This method will consume the writer and the identifier will no longer be usable.

Here is a short description of the steps in pseudocode:

schema = Schema(...)
index = Index(schema)
writer = index.writer()
for ... in data:
    document = Document(...)
    writer.add_document(...)
writer.commit()
writer.wait_merging_threads()

1.3 KiB Raw Blame History

Explanation

Merge policy

1.3 KiB

Raw Blame History