# pyuca: Python Unicode Collation Algorithm implementation (http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm/) This is my preliminary attempt at a Python implementation of the [Unicode Collation Algorithm (UCA)](http://unicode.org/reports/tr10/). I originally posted it to my blog in 2006 but it seems to get enough usage it really belongs here (and in PyPI). What do you use it for? In short, sorting non-English strings properly. The core of the algorithm involves multi-level comparison. For example, ``café`` comes before ``caff`` because at the primary level, the accent is ignored and the first word is treated as if it were ``cafe``. The secondary level (which considers accents) only applies then to words that are equivalent at the primary level. The Unicode Collation Algorithm and pyuca also support contraction and expansion. **Contraction** is where multiple letters are treated as a single unit. In Spanish, ``ch`` is treated as a letter coming between ``c`` and ``d`` so that, for example, words beginning ``ch`` should sort after all other words beginnings with ``c``. **Expansion** is where a single letter is treated as though it were multiple letters. In German, ``ä`` is sorted as if it were ``ae``, i.e. after ``ad`` but before ``af``. ## Here is how to use the ``pyuca`` module: `` git clone https://github.com/jtauber/pyuca.git cd pyuca pip install pyuca `` **Usage example:** `` from pyuca import Collator c = Collator("allkeys.txt") sorted_words = sorted(words, key=c.sort_key) `` ``allkeys.txt`` (1 MB) is available at http://www.unicode.org/Public/UCA/latest/allkeys.txt