doc: add MkDocs documentation (#94)

master
Caleb Hattingh 2023-08-04 13:27:52 +02:00 committed by GitHub
parent 2a9bb86a73
commit a01ccd99cb
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
8 changed files with 211 additions and 0 deletions

1
docs/about.md Normal file

@ -0,0 +1 @@
# About

1
docs/explanation.md Normal file

@ -0,0 +1 @@
# Explanation

46
docs/howto.md Normal file

@ -0,0 +1,46 @@
# How-to Guides
## Installation
tantivy-py can be installed from [PyPI](https://pypi.org) using pip:
    pip install tantivy
If no binary wheel is available for your operating system, the bindings will be
built from source, which means that Rust must be installed before the build
can succeed.
Note that the bindings use [PyO3](https://github.com/PyO3/pyo3), which
only supports Python 3.
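After installing, it can be handy to confirm which version pip actually resolved. The following is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is made up for this example and is not part of tantivy-py:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "tantivy" is the distribution name used in the pip command above.
version = installed_version("tantivy")
print(version if version is not None else "tantivy is not installed")
```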
## Set up a development environment to work on tantivy-py itself
A development environment can be set up either in a virtual environment using
[`nox`](https://nox.thea.codes) or with local packages using the provided `Makefile`.
For the `nox` setup, install the virtual environment and build the bindings using:
    python3 -m pip install nox
    nox
For the `Makefile`-based setup, run:
    make
To run the tests, use:
    make test
## Working on tantivy-py documentation
Please be aware that this documentation is structured using the [Diátaxis](https://diataxis.fr/) framework. In very simple terms, this framework will suggest the correct location for different kinds of documentation. Please make sure you gain a basic understanding of the goals of the framework before making large pull requests with new documentation.
This documentation uses the [MkDocs](https://mkdocs.readthedocs.io/en/stable/) framework. This package is specified as an optional dependency in the `pyproject.toml` file. To install all optional dev dependencies into your virtual env, run the following command:
    pip install ".[dev]"
The [MkDocs](https://mkdocs.readthedocs.io/en/stable/) documentation itself is comprehensive. MkDocs provides some additional context and help around [writing with markdown](https://mkdocs.readthedocs.io/en/stable/user-guide/writing-your-docs/#writing-with-markdown).
If all you want to do is make a few edits right away, the documentation content is in the `/docs` directory and consists of [Markdown](https://www.markdownguide.org/) files, which can be edited with any text editor.
The most efficient way to work is to run a MkDocs livereload server (`mkdocs serve`) in the background. This launches a local web server on your dev machine, serves the docs (by default at `http://localhost:8000`), and automatically reloads the page after you save any changes to the documentation files.

22
docs/index.md Normal file

@ -0,0 +1,22 @@
# Welcome to tantivy-py
tantivy-py is a wrapper for the [tantivy](https://github.com/quickwit-oss/tantivy) full-text search engine, which is inspired by Apache Lucene.
tantivy-py is [licensed](https://github.com/quickwit-oss/tantivy-py/blob/master/LICENSE) under the [MIT License](https://www.tldrlegal.com/license/mit-license).
## Important links
- [tantivy-py code repository](https://github.com/quickwit-oss/tantivy-py)
- [tantivy code repository](https://github.com/quickwit-oss/tantivy)
- [tantivy Documentation](https://docs.rs/crate/tantivy/latest)
- [tantivy query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query)
## How to use this documentation
This documentation uses the [Diátaxis](https://diataxis.fr/) framework. The following sections are clearly separated:
- [Tutorials](tutorials.md): when you want to learn
- [How-to Guides](howto.md): when you need to accomplish a task
- [Explanation](explanation.md): when you need a broader understanding and the thinking behind why certain things are set up in a particular way
- [Reference](reference.md): when you need precise, detailed information

38
docs/reference.md Normal file

@ -0,0 +1,38 @@
# Reference
## Valid Query Formats
tantivy-py supports the [query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query) used in tantivy.
A few basic query formats are shown below:
- AND and OR conjunctions.
```python
query = index.parse_query('(Old AND Man) OR Stream', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```
- `+` (includes) and `-` (excludes) operators.
```python
query = index.parse_query('+Old +Man chef -fished', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```
Note: in a query like the one above, a word with no `+`/`-` prefix acts like an OR.
- phrase search.
```python
query = index.parse_query('"eighty-four days"', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```
- integer search.
```python
query = index.parse_query('1', ["doc_id"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```
Note: for integer search, the integer field should be indexed.
For more query formats and query options, see the [Tantivy Query Parser Docs](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
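The `+`/`-` rule noted above follows the usual boolean-query convention. Below is a toy model in plain Python that operates on pre-tokenized, lowercased words. It only illustrates the rule and is not how tantivy evaluates or scores queries; `matches` is a made-up helper:

```python
def matches(doc_words, must=(), should=(), must_not=()):
    """Toy model: `+word` -> must, `-word` -> must_not, bare word -> should."""
    words = set(doc_words)
    if any(w in words for w in must_not):
        return False
    if must:
        # Bare words are optional once a required word is present.
        return all(w in words for w in must)
    # With no required words, bare words behave like an OR.
    return any(w in words for w in should)


doc = "he was an old man who fished alone".split()
# '+Old +Man chef -fished' after lowercasing; "fished" occurs, so no match.
excluded = matches(doc, must=["old", "man"], should=["chef"], must_not=["fished"])
# 'chef Man' (no +/-): one matching word is enough.
either = matches(doc, should=["chef", "man"])
print(excluded, either)
```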

82
docs/tutorials.md Normal file

@ -0,0 +1,82 @@
# Tutorials
## Building an index and populating it
```python
import tantivy
# Declaring our schema.
schema_builder = tantivy.SchemaBuilder()
schema_builder.add_text_field("title", stored=True)
schema_builder.add_text_field("body", stored=True)
schema_builder.add_integer_field("doc_id", stored=True)
schema = schema_builder.build()
# Creating our index (in memory)
index = tantivy.Index(schema)
```
To have a persistent index, use the `path`
parameter to store the index on disk, e.g.:
```python
import os
index = tantivy.Index(schema, path=os.path.join(os.getcwd(), "index"))
```
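The sketch below prepares that directory first. It assumes the target directory has to exist before the index is opened on it, which is worth verifying against your tantivy-py version. `prepare_index_dir` is a hypothetical helper, not part of the library:

```python
import tempfile
from pathlib import Path


def prepare_index_dir(base_dir, name="index"):
    """Create (if needed) and return the directory that will hold the index."""
    index_dir = Path(base_dir) / name
    index_dir.mkdir(parents=True, exist_ok=True)
    return index_dir


# A temporary directory stands in for your project directory here.
with tempfile.TemporaryDirectory() as base:
    index_dir = prepare_index_dir(base)
    created = index_dir.is_dir()
    # With a schema in hand, the index could then be opened on it:
    # index = tantivy.Index(schema, path=str(index_dir))
```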
By default, tantivy offers the following tokenizers,
which can be used in tantivy-py:
- `default`
`default` is the tokenizer that is used if you do not
assign a specific tokenizer to your text field.
It splits your text on punctuation and whitespace,
removes tokens that are longer than 40 characters, and lowercases your text.
- `raw`
Does not actually tokenize your text. It keeps it entirely unprocessed.
It can be useful for indexing UUIDs or URLs, for instance.
- `en_stem`
In addition to what `default` does, the `en_stem` tokenizer also
applies stemming to your tokens. Stemming consists of trimming words to
remove their inflection. This tokenizer is slower than the default one,
but is recommended to improve recall.
To use one of the above tokenizers, pass its name to `add_text_field` via the `tokenizer_name` parameter, e.g.:
```python
schema_builder.add_text_field("body", stored=True, tokenizer_name='en_stem')
```
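To get a feel for what the `default` tokenizer's description means in practice, here is a rough approximation in plain Python. It is a sketch of the three steps described above (split, drop long tokens, lowercase), not tantivy's implementation, and the exact length boundary and Unicode handling may differ; `default_like_tokens` is a made-up name:

```python
import re

MAX_TOKEN_CHARS = 40  # limit quoted in the description of `default`

# Runs of letters/digits; everything else (punctuation, whitespace) separates tokens.
_WORD = re.compile(r"[^\W_]+")


def default_like_tokens(text):
    """Approximate the described behaviour of the `default` tokenizer."""
    tokens = _WORD.findall(text)
    return [t.lower() for t in tokens if len(t) <= MAX_TOKEN_CHARS]


print(default_like_tokens("He had gone eighty-four days, WITHOUT a fish."))
```

Note how the hyphenated "eighty-four" becomes two separate tokens in this model, while `raw` would keep the whole string as a single unprocessed value.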
## Adding one document
```python
writer = index.writer()
writer.add_document(tantivy.Document(
    doc_id=1,
    title=["The Old Man and the Sea"],
    body=["""He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish."""],
))
# ... and committing
writer.commit()
```
## Building and Executing Queries
First, you need to get a searcher for the index:
```python
# Reload the index to ensure it points to the last commit.
index.reload()
searcher = index.searcher()
```
Then you need to get a valid query object by parsing your query on the index:
```python
query = index.parse_query("fish days", ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
assert best_doc["title"] == ["The Old Man and the Sea"]
print(best_doc)
```
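Indexing `hits[0]` would raise an `IndexError` when nothing matches. The sketch below walks over every hit instead, using only the calls shown above (`search`, `.hits`, `doc`). `top_titles` is a hypothetical helper, and it assumes a stored `title` field as in the schema from this tutorial:

```python
def top_titles(searcher, query, limit=3):
    """Return (score, title) pairs for up to `limit` hits; empty list if none."""
    results = []
    for score, doc_address in searcher.search(query, limit).hits:
        doc = searcher.doc(doc_address)
        results.append((score, doc["title"]))
    return results


# Usage with the objects built earlier in this tutorial (not executed here):
# for score, title in top_titles(searcher, query):
#     print(f"{score:.3f} {title}")
```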

15
mkdocs.yml Normal file

@ -0,0 +1,15 @@
site_name: tantivy-py
# site_url: https://example.com
nav:
- Home: index.md
- Tutorials: tutorials.md
- How-to Guides: howto.md
- Explanation: explanation.md
- Reference: reference.md
- About: about.md
theme: readthedocs
# Can nest documents under above sections
# - 'User Guide':
# - 'Writing your docs': 'writing-your-docs.md'
# - 'Styling your docs': 'styling-your-docs.md'

View File

@ -6,5 +6,11 @@ build-backend = "maturin"
name = "tantivy"
requires-python = ">=3.7"
[project.optional-dependencies]
dev = [
"nox",
"mkdocs",
]
[tool.maturin]
bindings = "pyo3"