Tuesday, July 12, 2005

Word Algebras and Strange Grammars

Ars Mathematica recently announced the issue of Algebraic Combinatorics on Words, by the anonymous group of authors called 'M. Lothaire'. This is a sequel to the original Combinatorics on Words.

It may come as a surprise to some (at least it did to me) that you can define interesting algebraic structures on words (sequences of letters from a (finite) alphabet). Combinatorics on Words is a great book that builds up the theory of free algebras over words. It was in this book that I first came across a beautiful proof of the precursor of the Robertson-Seymour program:

Consider any infinite sequence of finite-lengthwords w1, w2, ..., over a finite alphabet. There always exists an i and a j > i such that wi is a subsequence of wj.
Viewed another way, what this says is that if we define a partial ordering on strings such that one string is greater than another if it contains the second as a substring, then

there is no infinite antichain

Why is this a precursor of the Robertson-Seymour program ? Well, the above fact shows that the set of strings with the given order is a well-quasi order, which in turn means that any filter (i.e any set that is "upward" closed, or for each element x contains all elements y greater than x) has a finite basis, which means that there is a finite set of strings whose upward closure defines the filter). Conversely, this means that any ideal (a set of strings closed under the substring operation) has as complement a filter, and thus membership in the ideal can be described in terms of "forbidden" strings (the elements of the finite basis)

The Robertson-Seymour program can be stated as: graphs with the order defined by minors (G is greater than G' if G' is a minor of G) form a well-quasi order, which thus means that any upward closed family has a finite basis, and thus any idea (any minor-closed family) has a finite set of excluded graphs that defines it.

The result on strings is not hard to prove: I used it in this paper to show that membership for a certain grammar was decidable. Unfortunately, we were never able to prove a stronger result, and the grammar seemed so simple that we felt there must be a more efficient membership result.

Here's the grammar, for your entertainment.

All production rules are of the form
a -> ba
cd -> d

where a, b, c, d are elements of the alphabet. There is a special symbol x, and the language of such a grammar is defined as all strings w such that wx ->* x. Notice that in the productions, a symbol either inserts or deletes a new symbol to its left. We called these grammers (somewhat clunkily) "leftist grammars".

Update (12/23/05): Leftist grammars are not context-free. In FCT 2005, Tomasz Jurdzinski and Krzysztof Lorys present a context-sensitive language that can be expressed as a leftist grammer.



  1. your link for algebraic combinatorics is broken. it has a spelling error.

    also in this context what is a chain and further what is meant by an anti-chain? 

    Posted by Anonymous

  2. thanks, the link is fixed.

    A chain is a sequence of ordered elements. i.e s_1, s_2, and so on, where s_i <= s_{i+1}. An antichain is a set of elements, none of whom have any relationship between them (since this is a partial order, such a thing can happen) 

    Posted by Suresh


Disqus for The Geomblog