pyDAWG 1.1

Author:Wojciech Muła, wojciech_mula@poczta.onet.pl
Last update:2011-04-14
Added on:2011-04-08

Contents

Introduction

PyDAWG is a python module implements DAWG graph structure, which allow to store set of strings and check existence of a string in linear time (in terms of a string length). There is support for both unicode strings and bytes objects.

DAWG is constructed by incremental algorithm described in Incremental algorithm for sorted data, Jan Daciuk, Stoyan Mihov, Bruce Watson, and Richard Watson, Computational Linguistics, 26(1), March 2000. Prof. Jan Daciuk offers also some useful documentation, presentations and even sample code on his site.

The algorithm asserts that input words are sorted in lexicographic order; default Python sort() orders strings correctly.

Also minimal perfect hashing (MPH) is supported, i.e. there is a function that maps words to unique number; this function is bidirectional, its possible to find number for given word or get word from number.


There are two versions of module:

Documentation

Documentation of C extension API is available on separate page.

Python module API is similar, but isn't exactly the same, some methods are not implemented.

License

Library is licensed under very liberal three-clauses BSD license. Some portions has been released into public domain.

Full text of license is available in LICENSE file.

Downloads

Following files are available:

Changes

2011-04-14:
  • minimal perfect hashing
  • full unicode support
  • pattern matching

See also