pyahocorasick 1.1

Author: Wojciech Muła, wojciech_mula@poczta.onet.pl
Last update: 2011-04-14
Added on: 2011-03-2x

Introduction

pyahocorasick is a Python module that implements two kinds of data structures: a trie and an Aho-Corasick string-matching automaton.

A trie is a dictionary indexed by strings that retrieves the item associated with a string in time proportional to the string's length. An Aho-Corasick automaton finds all occurrences of strings from a given set in a single pass over the text.
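To illustrate the trie idea, here is a minimal standalone sketch in pure Python. It is not pyahocorasick's API: the class names `TrieNode` and `Trie` and the methods `insert` and `get` are hypothetical, chosen only for this example. Each node keeps its children in a dict keyed by character, so a lookup visits one node per character of the key.

```python
class TrieNode:
    """One trie node: child links keyed by character, plus an optional value."""
    __slots__ = ("children", "has_value", "value")

    def __init__(self):
        self.children = {}
        self.has_value = False
        self.value = None


class Trie:
    """Hypothetical minimal dictionary indexed by strings (not the module's API)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        # Create missing nodes along the path spelled by `key`.
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.has_value = True
        node.value = value

    def get(self, key, default=None):
        # One step per character, so the cost is proportional to len(key).
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.has_value else default


t = Trie()
t.insert("he", 1)
t.insert("hers", 2)
print(t.get("hers"))  # 2
print(t.get("her"))   # None: "her" is only a prefix, no value stored
```

Note that lookup time depends on the key length, not on how many keys the trie holds.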

(BTW, in order to use an Aho-Corasick automaton, a trie has to be created first; this is why these two distinct entities exist in a single module.)
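The dependency on a trie can be seen in a standalone sketch of the classical Aho-Corasick construction. This is illustrative pure Python, not pyahocorasick's implementation or API; the function names `build_automaton` and `search` are hypothetical. The sketch first builds a trie of the patterns, then adds failure links with a breadth-first traversal, and finally scans the text once.

```python
from collections import deque


def build_automaton(words):
    """Hypothetical sketch: a trie plus failure links and per-state outputs."""
    goto = [{}]    # goto[state]: character -> next trie state
    fail = [0]     # fail[state]: state of the longest proper suffix present in the trie
    output = [[]]  # output[state]: patterns recognized on reaching this state

    # Step 1: build the trie of all patterns.
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # Step 2: compute failure links breadth-first (depth-1 states fail to the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Also report patterns that are suffixes of the current one.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, automaton):
    """Yield (end_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            yield i, word


automaton = build_automaton(["he", "she", "his", "hers"])
print(list(search("ushers", automaton)))
# [(3, 'she'), (3, 'he'), (5, 'hers')]
```

Overlapping matches such as "she" and "he" ending at the same position are all reported, because each state inherits the outputs of its failure state.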

There are two versions: a C extension and a Python module.

Documentation

Documentation of the C extension API is available on a separate page.

The Python module API is similar, but not identical.

Last changes

2011-04-13
  • selectable support for either unicode or bytes
  • simple pattern matching

License

The library is licensed under the very liberal two-clause BSD license. Some portions have been released into the public domain.

The full text of the license is available in the LICENSE file.

Downloads

The following files are available:

See also