Updated Readme (#54)

2022-08-19 22:41:10 +10:00 · 2022-08-19 22:41:10 +10:00 · 440584f0f9
commit 440584f0f9
parent e1ffc79ac4
1 changed files with 88 additions and 3 deletions
--- a/README.md
+++ b/README.md
@ -44,6 +44,8 @@ The Python bindings have a similar API to Tantivy. To create a index first a sch
 needs to be built. After that documents can be added to the index and a reader
 can be created to search the index.
 ## Building an index and populating it
 ```python
 import tantivy
@ -51,29 +53,112 @@ import tantivy
 schema_builder = tantivy.SchemaBuilder()
 schema_builder.add_text_field("title", stored=True)
 schema_builder.add_text_field("body", stored=True)
 schema_builder.add_integer_field("doc_id", stored=True, indexed=True)
 schema = schema_builder.build()
-# Creating our index (in memory, but filesystem is available too)
+# Creating our index (in memory)
 index = tantivy.Index(schema)
 ```
 To make the index persistent, pass the `path`
 parameter to store it on disk, e.g.:
-# Adding one document.
+```python
 import os  # needed for os.getcwd() below
 index = tantivy.Index(schema, path=os.getcwd() + '/index')
 ```
 By default, tantivy offers the following tokenizers,
 which can be used in tantivy-py:
 -  `default`
 `default` is the tokenizer that is used if you do not
 assign a specific tokenizer to your text field.
 It splits your text on punctuation and whitespace,
 removes tokens that are longer than 40 characters, and lowercases your text.
 -  `raw`
 Does not actually tokenize your text; it keeps it entirely unprocessed.
 This can be useful for indexing UUIDs or URLs, for instance.
 -  `en_stem`
 In addition to what `default` does, the `en_stem` tokenizer also
 applies stemming to your tokens. Stemming consists of trimming words to
 remove their inflection. This tokenizer is slower than the default one,
 but is recommended to improve recall.
 To use one of the above tokenizers, pass its name as the `tokenizer_name` parameter of `add_text_field`, e.g.:
 ```python
 schema_builder.add_text_field("body", stored=True, tokenizer_name='en_stem')
 ```
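To make the `default` behaviour described above concrete, here is a minimal pure-Python sketch of its three steps (split, drop long tokens, lowercase). It is an illustration only, not tantivy's implementation: the helper name `sketch_default_tokenizer` is hypothetical, and the exact splitting rule and length boundary are assumptions based on the description.

```python
import re

MAX_TOKEN_LEN = 40  # limit quoted in the description; boundary handling is assumed


def sketch_default_tokenizer(text):
    """Hypothetical helper approximating the described `default` steps."""
    # Split on runs of punctuation, whitespace, and underscores.
    pieces = re.split(r"[\W_]+", text)
    # Drop empty pieces and over-long tokens, then lowercase the rest.
    return [p.lower() for p in pieces if p and len(p) <= MAX_TOKEN_LEN]


print(sketch_default_tokenizer("The Old Man and the Sea"))
```

By contrast, `raw` would keep the whole string as a single token, which is why it suits identifiers such as UUIDs.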
 ### Adding one document.
 ```python
 writer = index.writer()
 writer.add_document(tantivy.Document(
     doc_id=1,
    title=["The Old Man and the Sea"],
    body=["""He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish."""],
 ))
 # ... and committing
 writer.commit()
 ```
 ## Building and Executing Queries
 First you need to get a searcher for the index:
 ```python
 # Reload the index to ensure it points to the last commit.
 index.reload()
 searcher = index.searcher()
-query = index.parse_query("fish days", ["title", "body"])
+```
 Then you need to get a valid query object by parsing your query on the index.
 ```python
 query = index.parse_query("fish days", ["title", "body"])
 (best_score, best_doc_address) = searcher.search(query, 3).hits[0]
 best_doc = searcher.doc(best_doc_address)
 assert best_doc["title"] == ["The Old Man and the Sea"]
 print(best_doc)
 ```
 ### Valid Query Formats
 tantivy-py supports the query language used in tantivy.
 Some basic query formats:
 - AND and OR conjunctions.
 ```python
 query = index.parse_query('(Old AND Man) OR Stream', ["title", "body"])
 (best_score, best_doc_address) = searcher.search(query, 3).hits[0]
 best_doc = searcher.doc(best_doc_address)
 ```
 - `+` (include) and `-` (exclude) operators.
 ```python
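 query = index.parse_query('+Old +Man chef -fished', ["title", "body"])
 # The sample document above contains "fished", so this query excludes it;
 # check for hits before indexing into the result list.
 hits = searcher.search(query, 3).hits
 if hits:
     (best_score, best_doc_address) = hits[0]
     best_doc = searcher.doc(best_doc_address)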
 ```
 Note: in a query like the one above, a word with no `+`/`-` prefix acts like an OR term.
 - Phrase search.
 ```python
 query = index.parse_query('"eighty-four days"', ["title", "body"])
 (best_score, best_doc_address) = searcher.search(query, 3).hits[0]
 best_doc = searcher.doc(best_doc_address)
 ```
 - Integer search.
 ```python
 query = index.parse_query('1', ["doc_id"])
 (best_score, best_doc_address) = searcher.search(query, 3).hits[0]
 best_doc = searcher.doc(best_doc_address)
 ```
 Note: for integer search, the integer field must be indexed (for example, by passing `indexed=True` to `add_integer_field`).
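The `+`/`-` rules noted above can be sketched in plain Python. This is a simplified model of the matching logic only (no scoring, no real tokenizer, no stemming); `tokens_of` and `matches` are hypothetical helpers and are not part of tantivy-py.

```python
import re


def tokens_of(text):
    """Lowercased word set; a rough stand-in for a real tokenizer."""
    return {t.lower() for t in re.split(r"[\W_]+", text) if t}


def matches(doc_text, must=(), should=(), must_not=()):
    """Return True if the document satisfies the simplified +/- rules."""
    words = tokens_of(doc_text)
    if any(term.lower() in words for term in must_not):
        return False  # a "-" term rules the document out
    if must:
        # Every "+" term is required; bare terms are then optional.
        return all(term.lower() in words for term in must)
    # With no "+" terms, at least one bare term has to match (plain OR).
    return any(term.lower() in words for term in should)


body = "He was an old man who fished alone in a skiff in the Gulf Stream"
# '+Old +Man chef -fished': excluded because the text contains "fished".
print(matches(body, must=["Old", "Man"], should=["chef"], must_not=["fished"]))
# '+Old +Man chef': both required terms are present, "chef" is optional.
print(matches(body, must=["Old", "Man"], should=["chef"]))
```

Under this model the first call prints `False` and the second prints `True`, which is a reminder that an exclusion term can leave a search with no hits at all.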
 For more query formats and query options, see the [Tantivy Query Parser Docs](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).