Esrapy - Parsing in Python

esrapy is yparse backwards

Latest version: 0.6.1 posted March 25, 2006 (changes)

Esrapy is a parsing library written entirely in python. (If you're thinking "what the heck is 'parsing'?", click here.)

Esrapy can parse pretty much any context-free grammar, including left-recursive ones; can compile patterns (grammars) from textual or procedural descriptions; unifies parsing and tokenizing so tokenization can be contextual; has support for precedence (ambiguity resolution) and attributes (limited context-sensitivity); can return "first match" or a forrest of all possible parsings (for an ambiguous grammar); is very easy to use and results in very readable applications; and is only about 600 lines of code (not counting comments)--or only 450 if you don't need the textual compiler. Esrapy works as a simple translator from a raw source text to a friendly(!) parse tree. Much of the emphasis in esrapy is in making the returned parse tree very easy to traverse.

Esrapy is not terribly fast, and may be a bit of a memory hog while it's running (this may be an understatement). It's probably most useful for language prototyping or small interactive applications where generality and ease of use are more important than speed. (I could be wrong about the speed--I haven't run comparisons.) The parsing method it uses is somewhere between chart parsing and a packrat recursive descent.

Here is an entire program using esrapy which implements a simple infix calculator with operator precedence (see here for a trivial natural language example):

import esrapy

# Here's a Pattern that parses expressions like "a = b*5+(2^c)":
pattern = esrapy.compile(r"""
SPACE = <[ \t]*>
expr  = name:<[a-zA-Z_]+> op:'=' b:expr
		|1; a:expr@1 op:<[+-]> b:expr@2
		|2; a:expr@2 op:<[*/]> b:expr@3
		|3; a:expr@3 op:'^'    b:expr@4
		|4; name:<[a-zA-Z_]+> op:\var
		|    val:<[0-9]+>     op:\int
		| '(' _:expr ')'
""")

# And here's code that evaluates expressions like that...
vars = {}
def eval_pt(pt):
	if pt.op == 'int': return int(pt.val)
	if pt.op ==   '+': return eval_pt(pt.a)  + eval_pt(pt.b)
	if pt.op ==   '-': return eval_pt(pt.a)  - eval_pt(pt.b)
	if pt.op ==   '*': return eval_pt(pt.a)  * eval_pt(pt.b)
	if pt.op ==   '/': return eval_pt(pt.a)  / eval_pt(pt.b)
	if pt.op ==   '^': return eval_pt(pt.a) ** eval_pt(pt.b)
	if pt.op == 'var': return vars[pt.name]
	if pt.op ==   '=':
		val = eval_pt(pt.b)
		vars[pt.name] = val
		return val
	raise "Unhandled op: %s"%pt.op

while True:
	try:
		s = raw_input("Expr: ")
	except:
		break
	try:
		pt = pattern.match(s, debug=1)
	except:
		print "Sorry, couldn't parse that."
		continue
	print "Value:", eval_pt(pt)

More info and docs available in the tarball: esrapy.0.6.1.tgz

It is released under the GPL license version 2.

It is currently fairly green, so expect possible hiccups. Please let me know of any bugs you find, or enhancements you make.

See also: pyparsing and spark.

History: I needed a very general parser that was easy to install, maintain, and pass along, and didn't like anything I found, so I wrote one. Spark was close but I wanted contextual lexing (i.e., not a distinct tokenization pass) and prefer a simple parse tree to having code hooks into the parsing process itself (i.e., just go parse the thing and hand me the results, don't get me involved). After esrapy was up and running, I happend upon pyparsing which was so similar under the hood in some ways I had to strike up a conversation with its author Paul McGuire to share amusement and ideas, which we did, so thanks to him for that.

-Simon (simonfunk@gmail.com) March 16, 2006