aspell-python 1.12

Treść

Introduction

aspell-python is a Python wrapper for GNU Aspell. There are two versions, one compatible with Python 2.x and other with Python 3.x.

Version for Py2 has been tested with Python 2.1, Python 2.3.4 and Python 2.4.1. Probably it works fine with all Python versions not older then 2.0.

Version for Py3 has been tested with Python 3.2.

News

Whats new in version 1.12?

  • Py3k: added support for in operator.
  • Py3k: method/function ConfigKeys returns dictionary.

Whats new in version 1.11?

  • Fixed ConfigKeys method, now works fine with aspell 0.60, but I do not know and I have not possibility to test if it now works with aspell 0.50. Thanks to Gora Mohanty for mention the bug. Now ConfigKeys returns key's value of that type which certain field declare. Previously all values were string, now can be boolean (True/False), integer, string and list, so if your program plays with config, you have to modify code a bit.
  • Fixed Speller method.
  • test.py — simple test script

Download

Copyrights

Module is covered by BSD license. Author tried to make code usable and free of errors, but can't give any guarantees and is not responsible for any damages in your software, hardware or anything else.

Wojciech Muła, wojciech_mula@poczta.onet.pl

Installation

First extract the sources:

$ tar -xzf aspell-python[some-number].tar.gz
$ cd aspell-python

You need to have libaspell headers installed, Debian package is called libaspell-dev, other distributions should have similar package.

In order to install aspell-python for all users, you must be a root. If you are, type following command:

$ python setup.py install

It builds package and installs aspell.so in directory /usr/lib/{python}/site-packages.

If you don't have root login, you can build package locally. Type following command:

$ python setup.py build

It builds package and places aspell.so in build/lib.{something}. Change to this directory and when you will run python it will be able to import module (you may also copy aspell.so wherever you want, and there run python).

Documentation

Aspell-python module is seen in python under name aspell. So, aspell-python module is imported in following way:

import aspell

The module provides Speller class, two methods, and three types of exceptions — all described below.

Methods

ConfigKeys() => dictionary

Method returns a dictionary, where keys are names of configuration item, values are 3-tuples:

  • key type (string, integer, boolean, list)
  • default value for the key
  • short description - "internal" means that aspell doesn't provide any description of item and you shouldn't set/change it, unless you know what you do

Aspell's documentation covers in details all of keys and their meaning. Below is a list of most useful and obvious options (it is a filtered output of ConfigKeys).

('data-dir', 'string', '/usr/lib/aspell-0.60', 'location of language data files')
('dict-dir', 'string', '/usr/lib/aspell-0.60', 'location of the main word list')
('encoding', 'string', 'ISO-8859-2', 'encoding to expect data to be in')
('home-dir', 'string', '/home/wojtek', 'location for personal files')
('ignore', 'integer', 1, 'ignore words <= n chars')
('ignore-accents', 'boolean', False, 'ignore accents when checking words -- CURRENTLY IGNORED')
('ignore-case', 'boolean', False, 'ignore case when checking words')
('ignore-repl', 'boolean', False, 'ignore commands to store replacement pairs')
('keyboard', 'string', 'standard', 'keyboard definition to use for typo analysis')
('lang', 'string', 'pl_PL', 'language code')
('master', 'string', 'pl_PL', 'base name of the main dictionary to use')
('personal-path', 'string', '/home/wojtek/.aspell.pl_PL.pws', 'internal')
('repl-path', 'string', '/home/wojtek/.aspell.pl_PL.prepl', 'internal')
('run-together', 'boolean', False, 'consider run-together words legal')
('save-repl', 'boolean', True, 'save replacement pairs on save all')
('warn', 'boolean', True, 'enable warnings')
('backup', 'boolean', True, 'create a backup file by appending ".bak"')
('reverse', 'boolean', False, 'reverse the order of the suggest list')
('suggest', 'boolean', True, 'suggest possible replacements')

Classes

Speller()

Method creates an AspellSpeller object which is an interface to the GNU Aspell.

Speller called with no parameters creates speller using default configuration. If you want to change or set some parameter you can pass pair of strings: key and it's value. One can get available keys using ConfigKeys.

>>> aspell.Speller("key", "value")

If you want to set more than one pair of key&value, pass the list of pairs to the Speller().

>>> aspell.Speller( ("k1","v1"), ("k2","v2"), ("k3","v3") )

Exceptions

Module defines following errors:

Additionally TypeError is raised when you pass wrong parameters to method.

AspellConfigError

Error is reported by methods Speller and ConfigKeys. The most common error is passing unknown key.

>>> s = aspell.Speller('python', '2.3')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
aspell.AspellConfigError: The key "python" is unknown.
>>>

AspellModuleError

Error is reported when module can't allocate aspell structures.

AspellSpellerError

Error is reported by libaspell.

>>> # we set master dictionary file, the file doesn't exist
>>> s = Speller('master', '/home/dictionary.rws')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
aspell.AspellSpellerError: The file "/home/dictionary.rws" can not be opened for reading.
>>>

AspellSpeller Object

The AspellSpeller object provides interface to the aspell. It has several methods, described below.

In examples the assumption is that following code has been executed earlier:

>>> import aspell
>>> s = aspell.Speller('lang', 'en')
>>> s
<AspellSpeller object at 0x40209050>
>>>

ConfigKeys() => dictionary

New in version 1.1, changed in 1.13.

Method returns current configuration of speller.

Result has the same meaning as ConfigKeys() procedure.

check(word) => boolean

Method checks spelling of given word. If word is present in the main or personal (see addtoPersonal) or session dictionary (see addtoSession) returns True, otherwise False.

>>> s.check('word') # correct word
True
>>> s.check('wrod') # incorrect
False
>>>

New in version 1.13.

It's possible to use operator in or not in instead of check().

>>> 'word' in s
True
>>> 'wrod' in s
False
>>>

suggest (word) => list of suggestions

Method returns a list of suggested spellings for given word. Even if word is correct, i.e. method check returned 1, action is performed.

>>> s.suggest('wrod') # we made mistake, what aspell suggests?
['word', 'Rod', 'rod', 'Brod', 'prod', 'trod', 'Wood', 'wood', 'wried']
>>>

Warning! suggest() in aspell 0.50 is very, very slow. I recommend caching it's results if program calls the function several times with the same argument.

addReplecement(incorrect, correct) => None

Adds a replacement pair, it affects order of words in suggest result.

>>> # we choose 7th word from previous result
>>> s.addReplecement('wrod', 'trod')
>>> # and the selected word appears at the 1st position
>>> s.suggest('word')
['trod', 'word', 'Rod', 'rod', 'Brod', 'prod', 'Wood', 'wood', 'wried']

If config key save-repl is true method saveAllwords saves the replacement pairs to file ~/.aspell.{lang_code}.prepl.

addtoPersonal(word) => None

Adds word to the personal dictionary, which is stored in file ~./.aspell.{lang_code}.pws. The added words are available for AspellSpeller object, but they remain unsaved until method saveAllwords is called.

# personal dictionary is empty now
$ cat ~/.aspell.en.pws
personal_ws-1.1 en 0

$ python
>>> import aspell
>>> s = aspell.Speller('lang', 'en')
# word 'aspell' doesn't exist
>>> s.check('aspell')
0

# we add it to the personal dictionary
>>> s.addtoPersonal('aspell')

# and now aspell knows it
>>> s.check('aspell')
1

# we save personal dictionary
>>> s.saveAllwords()

# new word appeared in the file
$ cat ~/.aspell.en.pws
personal_ws-1.1 en 1
aspell

# check it once again
$ python
>>> import aspell
>>> s = aspell.Speller('lang', 'en')

# aspell still knows it's own name
>>> s.check('aspell')
1

>>> s.check('aaa')
0
>>> s.check('bbb')
0
# add incorrect words, they shouldn't be saved
>>> s.addtoPersonal('aaa')
>>> s.addtoPersonal('bbb')
>>> s.check('aaa')
1
>>> s.check('bbb')
1

# we've exit without saving, words 'aaa' and 'bbb' doesn't exists
$ cat ~/.aspell.en.pws
personal_ws-1.1 en 1
aspell
$

addtoSession(word) => None

Adds word to the session dictionary. The session dictionary is volatile, it is not saved to any file. It is destroyed with AspellSpeller object or when method clearSession is called.

saveAllwords() => None

Save all words from personal dictionary.

clearSession() => None

Clears session dictionary.

>>> import aspell
>>> s = aspell.Speller('lang', 'en')
>>> s.check('linux')
0
>>> s.addtoSession('linux')
>>> s.check('linux')
1
>>> s.clearSession()
>>> s.check('linux')
0

getPersonalwordlist() => [list of strings]

Returns list of words from personal dictionary.

getSessionwordlist() => [list of strings]

Returns list of words from session dictionary.

>>> s.addtoSession('aaa')
>>> s.addtoSession('bbb')
>>> s.getSessionwordlist()
['aaa', 'bbb']
>>> s.clearSession()
>>> s.getSessionwordlist()
[]
>>>

getMainwordlist() => [list of strings]

Returns list of words from the main dictionary.

Known problems

All version of aspell I've tested have the same error - calling method getMainwordlist produces SIGKILL. It is aspell problem and if you really need a full list of words, use external program word-list-compress.

method aspell 0.50.5 aspell 0.60.2 aspell 0.60.3
ConfigKeys ok ok ok
Speller ok ok ok
check ok ok ok
suggest ok ok ok
addReplecement ok ok ok
addtoPersonal ok ok ok
saveAllwords ok ok ok
addtoSession ok ok ok
clearSession ok AspellSpellerError ok
getPersonalwordlist ok SIGKILL ok
getSessionwordlist ok SIGKILL ok
getMainwordlist SIGKILL SIGKILL SIGKILL

Character encoding

Aspell uses 8-bit encoding. The encoding depend on dictionary setting and is stored in key encoding. One can obtain this key using speller's ConfigKeys.

If your application uses other encoding then aspell, the translation is needed. Here is a sample session (polish dictionary is used).

>>> import aspell
>>> s=aspell.Speller('lang', 'pl')
>>>
>>> s.ConfigKeys()['encoding']
[('encoding', 'string', 'iso8859-2')]
>>> enc =s.ConfigKeys()['encoding'][2]
>>> enc  # dictionary encoding
'iso8859-2'
>>> word # encoding of word is utf8
# 'gżegżółka' means in some polish dialects 'cuckoo'
'g\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> s.check(word)
0
>>> s.check( unicode(word, 'utf-8').encode(enc) )
1