API Reference
match
This module contains the async interface for matching a needle against haystacks in batches.
- async pfzy.match.fuzzy_match(needle, haystacks, key='', batch_size=4096, scorer=None)
Fuzzy-find the needle within a list of haystacks and return the matched results with their matching indices.
Note
The key argument is optional when the provided haystacks argument is a list of str. It will be given a default key value if not present.
Warning
The key argument is required when the provided haystacks argument is a list of dict. If not present, TypeError will be raised.
- Parameters
needle (str) – String to search within the haystacks.
haystacks (List[Union[str, Dict[str, Any]]]) – List of haystack/longer strings to be searched.
key (str) – If haystacks is a list of dictionaries, provide the key whose value is the haystack string to search.
batch_size (int) – Number of entries to be processed together.
scorer (Callable[[str, str], SCORE_indices]) – Desired scorer to use. Currently only fzy_scorer() and substr_scorer() are supported.
- Raises
TypeError – When the haystacks argument is a list of dict and the key argument is missing.
- Returns
List of matching haystacks, each with the additional keys indices and score.
- Return type
Examples
>>> import asyncio
>>> asyncio.run(fuzzy_match("ab", ["acb", "acbabc"]))
[{'value': 'acbabc', 'indices': [3, 4]}, {'value': 'acb', 'indices': [0, 2]}]
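The role of batch_size can be pictured with a minimal sketch. This is not pfzy's implementation: it only assumes that the haystacks are split into chunks of batch_size entries, each chunk is scored in its own coroutine, and the results are merged and sorted by score. The names toy_scorer, score_batch and batched_match are hypothetical, and toy_scorer is a trivial stand-in rather than either of the library's scorers.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


def toy_scorer(needle: str, haystack: str) -> Tuple[float, Optional[List[int]]]:
    """Hypothetical stand-in scorer: earlier substring matches score higher."""
    start = haystack.find(needle)
    if start < 0:
        # Same "no match" shape as documented for the real scorers.
        return float("-inf"), None
    return -float(start), list(range(start, start + len(needle)))


async def score_batch(needle: str, batch: List[str]) -> List[Dict[str, Any]]:
    """Score one batch and keep only the entries that matched."""
    results = []
    for value in batch:
        score, indices = toy_scorer(needle, value)
        if indices is not None:
            results.append({"value": value, "indices": indices, "score": score})
    return results


async def batched_match(
    needle: str, haystacks: List[str], batch_size: int = 4096
) -> List[Dict[str, Any]]:
    """Split haystacks into batches, score them concurrently, merge and rank."""
    batches = [
        haystacks[i : i + batch_size] for i in range(0, len(haystacks), batch_size)
    ]
    scored = await asyncio.gather(*(score_batch(needle, b) for b in batches))
    merged = [item for batch in scored for item in batch]
    # Higher score means higher rank.
    merged.sort(key=lambda item: item["score"], reverse=True)
    return merged


matches = asyncio.run(batched_match("ab", ["acb", "xab", "ab"], batch_size=2))
print([m["value"] for m in matches])
```

With a batch_size of 2 the three haystacks are scored in two batches; "acb" is dropped because the stand-in scorer requires a contiguous substring.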
score
This module contains the score calculation algorithms.
- pfzy.score.fzy_scorer(needle, haystack)
Use the fzy matching algorithm to match needle against haystack.
Note
The fzf unordered search is not supported due to performance concerns. When the provided needle is not a subsequence of haystack at all, (-inf, None) is returned.
- Parameters
- Returns
A tuple of matching score with a list of matching indices.
- Return type
Examples
>>> fzy_scorer("ab", "acb")
(0.89, [0, 2])
>>> fzy_scorer("ab", "acbabc")
(0.98, [3, 4])
>>> fzy_scorer("ab", "wc")
(-inf, None)
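The subsequence condition mentioned in the note can be sketched on its own. The helper below, subsequence_indices, is hypothetical and is not part of pfzy: it performs a greedy, case-sensitive, leftmost match and does no scoring, so its indices can differ from those chosen by fzy_scorer (for "acbabc" it picks [0, 2], whereas the example above reports [3, 4]).

```python
from typing import List, Optional


def subsequence_indices(needle: str, haystack: str) -> Optional[List[int]]:
    """Return the leftmost indices matching needle as a subsequence, or None."""
    indices: List[int] = []
    pos = 0
    for ch in needle:
        # Look for the next needle character at or after the current position.
        pos = haystack.find(ch, pos)
        if pos < 0:
            return None  # needle is not a subsequence of haystack
        indices.append(pos)
        pos += 1
    return indices


print(subsequence_indices("ab", "acb"))
print(subsequence_indices("ab", "wc"))
```

A None result corresponds to the case where a scorer would report (-inf, None).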
- pfzy.score.substr_scorer(needle, haystack)
Match needle against haystack using str.find().
Note
Scores may be negative, but the higher the score, the higher the match rank. A score of -inf means no match was found.
See also
https://github.com/aslpavel/sweep.py/blob/3f4a179b708059c12b9e5d76d1eb3c70bf2caadc/sweep.py#L837
- Parameters
- Returns
A tuple of matching score with a list of matching indices.
- Return type
Example
>>> substr_scorer("ab", "awsab")
(-1.3, [3, 4])
>>> substr_scorer("ab", "abc")
(0.5, [0, 1])
>>> substr_scorer("ab", "iop")
(-inf, None)
>>> substr_scorer("ab", "asdafswabc")
(-1.6388888888888888, [7, 8])
>>> substr_scorer(" ", "asdf")
(0, [])
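To illustrate how these scores translate into a ranking, here is a small sketch that reuses the (score, indices) pairs from the example above instead of calling the library. The results and ranked names are illustrative only; the sketch simply drops -inf entries and sorts the rest from highest to lowest score.

```python
import math

# (score, indices) pairs copied from the substr_scorer examples above.
results = {
    "awsab": (-1.3, [3, 4]),
    "abc": (0.5, [0, 1]),
    "iop": (-math.inf, None),
    "asdafswabc": (-1.6388888888888888, [7, 8]),
}

# Discard non-matches (-inf), then rank by score, highest first.
ranked = sorted(
    (name for name, (score, _) in results.items() if score != -math.inf),
    key=lambda name: results[name][0],
    reverse=True,
)

print(ranked)
```

Negative scores still rank as matches; only -inf is treated as "no match".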