API Reference

match

This module contains the async interface for matching a needle against haystacks in batches.

async pfzy.match.fuzzy_match(needle, haystacks, key='', batch_size=4096, scorer=None)

Fuzzy find the needle within a list of haystacks and return the matched results with their matching indices.

Note

The key argument is optional when the provided haystacks argument is a list of str. When key is omitted, a default key is used ('value' in the example output below).

Warning

The key argument is required when the provided haystacks argument is a list of dict. If it is not present, TypeError will be raised.
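
The two rules above can be sketched as a small normalization step. This is an illustration only, not pfzy's implementation: normalize_haystacks is a hypothetical helper, and the default key 'value' is an assumption taken from the example output in this reference.

```python
from typing import Any, Dict, List, Union


def normalize_haystacks(
    haystacks: List[Union[str, Dict[str, Any]]], key: str = ""
) -> List[Dict[str, Any]]:
    """Hypothetical helper: wrap str haystacks in dicts, require key for dicts."""
    normalized = []
    for haystack in haystacks:
        if isinstance(haystack, dict):
            if not key:
                # Mirrors the documented behavior: dict haystacks need a key.
                raise TypeError("'key' is required when haystacks contains dict")
            normalized.append(haystack)
        else:
            # Assumed default key name, based on the documented example output.
            normalized.append({key or "value": haystack})
    return normalized


print(normalize_haystacks(["acb", "acbabc"]))
print(normalize_haystacks([{"name": "acb"}], key="name"))
```

Calling normalize_haystacks([{"name": "acb"}]) without a key raises TypeError in this sketch, which is the situation the warning describes.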

Parameters
  • needle (str) – String to search within the haystacks.

  • haystacks (List[Union[str, Dict[str, Any]]]) – List of haystacks (the longer strings) to be searched.

  • key (str) – If haystacks is a list of dictionaries, the key under which each dictionary stores the haystack value to search.

  • batch_size (int) – Number of entries to be processed together.

  • scorer (Callable[[str, str], SCORE_indices]) – Desired scorer to use. Currently only fzy_scorer() and substr_scorer() are supported.

Raises

TypeError – When haystacks is a list of dict and the key argument is missing.

Returns

List of matching haystacks with additional key indices and score.

Return type

List[Dict[str, Any]]

Examples

>>> import asyncio
>>> asyncio.run(fuzzy_match("ab", ["acb", "acbabc"]))
[{'value': 'acbabc', 'indices': [3, 4]}, {'value': 'acb', 'indices': [0, 2]}]
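
As a rough illustration of what batch_size controls, the sketch below splits the haystacks into batches, ranks each batch in its own coroutine, and merges the ranked batches by score. It is not pfzy's implementation: batched_match, rank_batch, and the stand-in find_scorer are hypothetical names, and the scoring rule is an assumption made up for the example.

```python
import asyncio
import heapq
from typing import Any, Callable, Dict, List, Optional, Tuple

ScoreIndices = Tuple[float, Optional[List[int]]]


def find_scorer(needle: str, haystack: str) -> ScoreIndices:
    """Stand-in scorer (assumption): earlier substring matches score higher."""
    start = haystack.find(needle)
    if start == -1:
        return float("-inf"), None
    return -float(start), list(range(start, start + len(needle)))


async def rank_batch(
    needle: str, batch: List[str], scorer: Callable[[str, str], ScoreIndices]
) -> List[Tuple[float, Dict[str, Any]]]:
    """Score one batch and return its matches sorted by descending score."""
    ranked = []
    for haystack in batch:
        score, indices = scorer(needle, haystack)
        if indices is not None:
            ranked.append((score, {"value": haystack, "indices": indices}))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


async def batched_match(
    needle: str,
    haystacks: List[str],
    batch_size: int = 2,
    scorer: Callable[[str, str], ScoreIndices] = find_scorer,
) -> List[Dict[str, Any]]:
    """Rank every batch concurrently, then merge the sorted batches."""
    batches = await asyncio.gather(
        *(
            rank_batch(needle, haystacks[offset : offset + batch_size], scorer)
            for offset in range(0, len(haystacks), batch_size)
        )
    )
    merged = heapq.merge(*batches, key=lambda item: item[0], reverse=True)
    return [entry for _, entry in merged]


results = asyncio.run(batched_match("ab", ["xxab", "ab", "zz", "xab"]))
print([entry["value"] for entry in results])
```

Because each batch is already sorted, heapq.merge can combine them without re-sorting the full result list.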

score

This module contains the score calculation algorithms.

pfzy.score.fzy_scorer(needle, haystack)

Use the fzy matching algorithm to match the needle against the haystack.

Note

The fzf unordered search is not supported due to performance concerns. When the provided needle is not a subsequence of haystack, (-inf, None) is returned.

Parameters
  • needle (str) – Substring to find in haystack.

  • haystack (str) – String to be searched and scored against.

Returns

A tuple of matching score with a list of matching indices.

Return type

Tuple[float, Optional[List[int]]]

Examples

>>> fzy_scorer("ab", "acb")
(0.89, [0, 2])
>>> fzy_scorer("ab", "acbabc")
(0.98, [3, 4])
>>> fzy_scorer("ab", "wc")
(-inf, None)
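
The subsequence condition mentioned in the note can be checked with a short helper. The sketch below is illustrative only: is_subsequence is a hypothetical name, it does not compute any score, and it compares characters case-sensitively, which is an assumption rather than a statement about how fzy_scorer treats case.

```python
def is_subsequence(needle: str, haystack: str) -> bool:
    """Return True if the characters of needle appear in haystack in order."""
    remaining = iter(haystack)
    # `char in remaining` consumes the iterator up to the first match,
    # so each needle character must be found after the previous one.
    return all(char in remaining for char in needle)


print(is_subsequence("ab", "acb"))
print(is_subsequence("ab", "wc"))
```

The first call corresponds to a haystack that fzy_scorer can match, the second to the (-inf, None) case shown above.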

pfzy.score.substr_scorer(needle, haystack)

Match needle against haystack using str.find().

Note

Scores may be negative, but a higher score always means a higher match rank. A score of -inf means no match was found.

Parameters
  • needle (str) – Substring to find in haystack.

  • haystack (str) – String to be searched and scored against.

Returns

A tuple of matching score with a list of matching indices.

Return type

Tuple[float, Optional[List[int]]]

Examples

>>> substr_scorer("ab", "awsab")
(-1.3, [3, 4])
>>> substr_scorer("ab", "abc")
(0.5, [0, 1])
>>> substr_scorer("ab", "iop")
(-inf, None)
>>> substr_scorer("ab", "asdafswabc")
(-1.6388888888888888, [7, 8])
>>> substr_scorer(" ", "asdf")
(0, [])
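
The matching indices in the examples above follow directly from str.find(). A minimal sketch of that step, assuming a hypothetical helper find_indices; it leaves out the score calculation and the blank-needle case, neither of which is specified in this reference:

```python
from typing import List, Optional


def find_indices(needle: str, haystack: str) -> Optional[List[int]]:
    """Return the index of every matched character, or None when not found."""
    start = haystack.find(needle)
    if start == -1:
        # str.find() returns -1 when the needle does not occur.
        return None
    return list(range(start, start + len(needle)))


print(find_indices("ab", "awsab"))
print(find_indices("ab", "iop"))
```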