doc: add MkDocs documentation (#94)
parent 2a9bb86a73
commit a01ccd99cb
|
@ -0,0 +1 @@
|
||||||
|
# About
|
|
@ -0,0 +1 @@
|
||||||
|
# Explanation
|
|
@ -0,0 +1,46 @@
|
||||||
|
# How-to Guides
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
tantivy-py can be installed from [PyPI](https://pypi.org) using pip:
|
||||||
|
|
||||||
|
pip install tantivy
|
||||||
|
|
||||||
|
If no binary wheel is available for your operating system, the bindings will be
built from source, which means that Rust needs to be installed before the build
can succeed.
|
||||||
|
|
||||||
|
Note that the bindings use [PyO3](https://github.com/PyO3/pyo3), which
only supports Python 3.
|
||||||
|
|
||||||
|
## Set up a development environment to work on tantivy-py itself
|
||||||
|
|
||||||
|
A development environment can be set up either in a virtual environment using
[`nox`](https://nox.thea.codes) or with local packages using the provided `Makefile`.
|
||||||
|
|
||||||
|
For the `nox` setup, install the virtual environment and build the bindings using:
|
||||||
|
|
||||||
|
python3 -m pip install nox
|
||||||
|
nox
|
||||||
|
|
||||||
|
For the `Makefile` based setup run:
|
||||||
|
|
||||||
|
make
|
||||||
|
|
||||||
|
Run the tests using:
|
||||||
|
|
||||||
|
make test
|
||||||
|
|
||||||
|
## Working on tantivy-py documentation
|
||||||
|
|
||||||
|
Please be aware that this documentation is structured using the [Diátaxis](https://diataxis.fr/) framework. In very simple terms, this framework will suggest the correct location for different kinds of documentation. Please make sure you gain a basic understanding of the goals of the framework before making large pull requests with new documentation.
|
||||||
|
|
||||||
|
This documentation uses the [MkDocs](https://mkdocs.readthedocs.io/en/stable/) framework. This package is specified as an optional dependency in the `pyproject.toml` file. To install all optional dev dependencies into your virtual env, run the following command:
|
||||||
|
|
||||||
|
pip install .[dev]
|
||||||
|
|
||||||
|
The [MkDocs](https://mkdocs.readthedocs.io/en/stable/) documentation itself is comprehensive. MkDocs provides some additional context and help around [writing with markdown](https://mkdocs.readthedocs.io/en/stable/user-guide/writing-your-docs/#writing-with-markdown).
|
||||||
|
|
||||||
|
If all you want to do is make a few edits right away, the documentation content is in the `/docs` directory and consists of [Markdown](https://www.markdownguide.org/) files, which can be edited with any text editor.
|
||||||
|
|
||||||
|
The most efficient way to work is to run a MkDocs livereload server in the background. This will launch a local web server on your dev machine, serve the docs (by default at `http://localhost:8000`), and automatically reload the page after you save any changes to the documentation files.
|
|
@ -0,0 +1,22 @@
|
||||||
|
# Welcome to tantivy-py
|
||||||
|
|
||||||
|
tantivy-py is a wrapper for the [tantivy](https://github.com/quickwit-oss/tantivy) full-text search engine, which is inspired by Apache Lucene.
|
||||||
|
|
||||||
|
tantivy-py is [licensed](https://github.com/quickwit-oss/tantivy-py/blob/master/LICENSE) under the [MIT License](https://www.tldrlegal.com/license/mit-license).
|
||||||
|
|
||||||
|
## Important links
|
||||||
|
|
||||||
|
- [tantivy-py code repository](https://github.com/quickwit-oss/tantivy-py)
|
||||||
|
- [tantivy code repository](https://github.com/quickwit-oss/tantivy)
|
||||||
|
- [tantivy Documentation](https://docs.rs/crate/tantivy/latest)
|
||||||
|
- [tantivy query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query)
|
||||||
|
|
||||||
|
## How to use this documentation
|
||||||
|
|
||||||
|
This documentation uses the [Diátaxis](https://diataxis.fr/) framework. The following sections are clearly separated:
|
||||||
|
|
||||||
|
- [Tutorials](tutorials.md): when you want to learn
|
||||||
|
- [How-to Guides](howto.md): when you need to accomplish a task
|
||||||
|
- [Explanation](explanation.md): when you need a broader understanding and the thinking behind why certain things are set up in a particular way
|
||||||
|
- [Reference](reference.md): when you need precise, detailed information
|
||||||
|
|
|
@ -0,0 +1,38 @@
|
||||||
|
# Reference
|
||||||
|
|
||||||
|
## Valid Query Formats
|
||||||
|
|
||||||
|
tantivy-py supports the [query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query) used in tantivy.
|
||||||
|
A few basic query formats are shown below:
|
||||||
|
|
||||||
|
- AND and OR conjunctions.
|
||||||
|
```python
|
||||||
|
query = index.parse_query('(Old AND Man) OR Stream', ["title", "body"])
|
||||||
|
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
|
||||||
|
best_doc = searcher.doc(best_doc_address)
|
||||||
|
```
|
||||||
|
|
||||||
|
- `+` (includes) and `-` (excludes) operators.
|
||||||
|
```python
|
||||||
|
query = index.parse_query('+Old +Man chef -fished', ["title", "body"])
|
||||||
|
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
|
||||||
|
best_doc = searcher.doc(best_doc_address)
|
||||||
|
```
|
||||||
|
Note: in a query like the one above, a word with no `+`/`-` prefix acts like an OR.
|
||||||
|
|
||||||
|
- phrase search.
|
||||||
|
```python
|
||||||
|
query = index.parse_query('"eighty-four days"', ["title", "body"])
|
||||||
|
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
|
||||||
|
best_doc = searcher.doc(best_doc_address)
|
||||||
|
```
|
||||||
|
|
||||||
|
- integer search
|
||||||
|
```python
|
||||||
|
query = index.parse_query("1", ["doc_id"])
|
||||||
|
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
|
||||||
|
best_doc = searcher.doc(best_doc_address)
|
||||||
|
```
|
||||||
|
Note: for integer search, the integer field should be indexed.
|
||||||
|
|
||||||
|
For more query formats and query options, see the [Tantivy Query Parser Docs](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
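
To make the `+`/`-` note above more tangible, here is a small pure-Python model of how required, optional, and excluded terms are conventionally combined in boolean queries. This is a sketch under assumptions, not tantivy's implementation: the `matches` function and its parameter names are hypothetical, tokens are assumed to be already lowercased, and scoring is ignored entirely.

```python
from typing import Iterable


def matches(doc_tokens: Iterable[str], must=(), should=(), must_not=()) -> bool:
    """Toy model of boolean matching (hypothetical helper, not a tantivy API)."""
    tokens = set(doc_tokens)
    # Any excluded ("-") term rules the document out.
    if any(term in tokens for term in must_not):
        return False
    # If there are required ("+") terms, all of them must be present;
    # plain terms then only matter for ranking, which this model ignores.
    if must:
        return all(term in tokens for term in must)
    # With no required terms, plain terms behave like an OR.
    return any(term in tokens for term in should)


doc = ["he", "was", "an", "old", "man", "who", "fished", "alone"]
print(matches(doc, must=["old", "man"], should=["chef"]))                       # True
print(matches(doc, must=["old", "man"], should=["chef"], must_not=["fished"]))  # False
print(matches(doc, should=["chef", "stream"]))                                  # False
```

Under this model, a query whose excluded term appears in the only indexed document returns no hits, so indexing into `hits[0]` is only safe when at least one result is expected.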
|
|
@ -0,0 +1,82 @@
|
||||||
|
# Tutorials
|
||||||
|
|
||||||
|
## Building an index and populating it
|
||||||
|
|
||||||
|
```python
|
||||||
|
import tantivy
|
||||||
|
|
||||||
|
# Declaring our schema.
|
||||||
|
schema_builder = tantivy.SchemaBuilder()
|
||||||
|
schema_builder.add_text_field("title", stored=True)
|
||||||
|
schema_builder.add_text_field("body", stored=True)
|
||||||
|
schema_builder.add_integer_field("doc_id", stored=True)
|
||||||
|
schema = schema_builder.build()
|
||||||
|
|
||||||
|
# Creating our index (in memory)
|
||||||
|
index = tantivy.Index(schema)
|
||||||
|
```
|
||||||
|
|
||||||
|
To have a persistent index, use the `path`
parameter to store the index on disk, e.g.:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import os
index = tantivy.Index(schema, path=os.path.join(os.getcwd(), "index"))
|
||||||
|
```
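
A persistent index needs a directory to live in. The sketch below prepares one before the index is opened; it assumes (this is not stated above) that `tantivy.Index` expects the directory to exist already. The helper name `prepare_index_dir` is hypothetical, and a temporary directory is used here only so the example does not write into your working directory.

```python
import os
import tempfile


def prepare_index_dir(base_dir: str, name: str = "index") -> str:
    """Create `base_dir/name` if needed and return its path (hypothetical helper)."""
    index_path = os.path.join(base_dir, name)
    # exist_ok=True makes the call safe to repeat on later runs.
    os.makedirs(index_path, exist_ok=True)
    return index_path


index_path = prepare_index_dir(tempfile.mkdtemp())
# The returned path can then be passed on, as in the example above:
# index = tantivy.Index(schema, path=index_path)
```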
|
||||||
|
|
||||||
|
By default, tantivy offers the following tokenizers
|
||||||
|
which can be used in tantivy-py:
|
||||||
|
- `default`
|
||||||
|
`default` is the tokenizer that will be used if you do not
|
||||||
|
assign a specific tokenizer to your text field.
|
||||||
|
It splits your text on punctuation and whitespace,
removes tokens that are longer than 40 characters, and lowercases your text.
|
||||||
|
|
||||||
|
- `raw`
|
||||||
|
Does not actually tokenize your text. It keeps it entirely unprocessed.
It can be useful for indexing UUIDs or URLs, for instance.
|
||||||
|
|
||||||
|
- `en_stem`
|
||||||
|
|
||||||
|
In addition to what `default` does, the `en_stem` tokenizer also
applies stemming to your tokens. Stemming consists of trimming words to
|
||||||
|
remove their inflection. This tokenizer is slower than the default one,
|
||||||
|
but is recommended to improve recall.
|
||||||
|
|
||||||
|
To use one of the above tokenizers, pass its name to `add_text_field` via the `tokenizer_name` parameter, e.g.:
|
||||||
|
```python
|
||||||
|
schema_builder.add_text_field("body", stored=True, tokenizer_name='en_stem')
|
||||||
|
```
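
To get a feel for what the `default` tokenizer described above does to a piece of text, here is a rough pure-Python approximation of its three steps (split, drop long tokens, lowercase). It is only an illustration: the function name is hypothetical, the 40-character limit is taken from the description above, and tantivy's actual tokenizer may differ in edge cases such as Unicode handling.

```python
import re
from typing import List

MAX_TOKEN_LENGTH = 40  # limit taken from the description of `default` above


def approximate_default_tokenizer(text: str) -> List[str]:
    """Approximate the `default` tokenizer (hypothetical helper, not tantivy code)."""
    # Split on punctuation and whitespace by keeping runs of letters and digits.
    words = re.findall(r"[^\W_]+", text)
    # Drop overly long tokens, then lowercase what remains.
    return [word.lower() for word in words if len(word) <= MAX_TOKEN_LENGTH]


print(approximate_default_tokenizer("The Old Man and the Sea, eighty-four days!"))
# → ['the', 'old', 'man', 'and', 'the', 'sea', 'eighty', 'four', 'days']
```

Note how `eighty-four` becomes two tokens; the `raw` tokenizer would instead keep the whole input as a single unprocessed token.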
|
||||||
|
|
||||||
|
## Adding one document
|
||||||
|
|
||||||
|
```python
|
||||||
|
writer = index.writer()
|
||||||
|
writer.add_document(tantivy.Document(
    doc_id=1,
    title=["The Old Man and the Sea"],
    body=["""He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish."""],
))
|
||||||
|
# ... and committing
|
||||||
|
writer.commit()
|
||||||
|
```
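
The same two calls, `add_document` and `commit`, extend naturally to several documents. The helper below is a minimal sketch: `index_records` is a hypothetical name, and the writer and document factory are passed in as parameters so the function relies only on the calls shown above.

```python
def index_records(writer, document_factory, records):
    """Add one document per record, commit once, and return the number added.

    `writer` needs `add_document` and `commit`, as used above;
    `document_factory` is called with each record's fields as keyword arguments.
    """
    count = 0
    for record in records:
        writer.add_document(document_factory(**record))
        count += 1
    # Commit once at the end rather than after every document.
    writer.commit()
    return count


# Intended usage with the objects from this tutorial:
# records = [{"doc_id": 1, "title": ["The Old Man and the Sea"], "body": ["..."]}]
# index_records(index.writer(), tantivy.Document, records)
```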
|
||||||
|
|
||||||
|
## Building and Executing Queries
|
||||||
|
|
||||||
|
First, you need to get a searcher for the index:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Reload the index to ensure it points to the last commit.
|
||||||
|
index.reload()
|
||||||
|
searcher = index.searcher()
|
||||||
|
```
|
||||||
|
|
||||||
|
Then you need to get a valid query object by parsing your query on the index.
|
||||||
|
|
||||||
|
```python
|
||||||
|
query = index.parse_query("fish days", ["title", "body"])
|
||||||
|
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
|
||||||
|
best_doc = searcher.doc(best_doc_address)
|
||||||
|
assert best_doc["title"] == ["The Old Man and the Sea"]
|
||||||
|
print(best_doc)
|
||||||
|
```
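
The example above looks only at the best hit and would raise an `IndexError` if the query matched nothing. A small helper can walk over all returned hits instead. This is a sketch that uses only the calls shown above (`searcher.search(...).hits` and `searcher.doc(...)`); the name `collect_hits` and its `field` parameter are hypothetical.

```python
def collect_hits(searcher, query, limit=3, field="title"):
    """Return a list of (score, field value) pairs for up to `limit` hits."""
    results = []
    # Each hit is a (score, doc_address) pair, as in the example above.
    for score, doc_address in searcher.search(query, limit).hits:
        document = searcher.doc(doc_address)
        results.append((score, document[field]))
    return results


# Intended usage with the objects from this tutorial:
# for score, title in collect_hits(searcher, query):
#     print(score, title)
```

An empty result simply yields an empty list, so the caller does not need a special case for queries without matches.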
|
||||||
|
|
|
@ -0,0 +1,15 @@
|
||||||
|
site_name: tantivy-py
|
||||||
|
# site_url: https://example.com
|
||||||
|
nav:
|
||||||
|
- Home: index.md
|
||||||
|
- Tutorials: tutorials.md
|
||||||
|
- How-to Guides: howto.md
|
||||||
|
- Explanation: explanation.md
|
||||||
|
- Reference: reference.md
|
||||||
|
- About: about.md
|
||||||
|
theme: readthedocs
|
||||||
|
|
||||||
|
# Can nest documents under above sections
|
||||||
|
# - 'User Guide':
|
||||||
|
# - 'Writing your docs': 'writing-your-docs.md'
|
||||||
|
# - 'Styling your docs': 'styling-your-docs.md'
|
|
@ -6,5 +6,11 @@ build-backend = "maturin"
|
||||||
name = "tantivy"
|
name = "tantivy"
|
||||||
requires-python = ">=3.7"
|
requires-python = ">=3.7"
|
||||||
|
|
||||||
|
[project.optional-dependencies]
|
||||||
|
dev = [
|
||||||
|
"nox",
|
||||||
|
"mkdocs",
|
||||||
|
]
|
||||||
|
|
||||||
[tool.maturin]
|
[tool.maturin]
|
||||||
bindings = "pyo3"
|
bindings = "pyo3"
|
||||||
|
|