
Transforming parse trees with the Visitor pattern
==================================================

The traditional way to use esrapy is to call pattern.match() and work with
the compact() representation -- a nested structure of strings, lists, and
MetaDicts.  This works well for simple cases, but for non-trivial grammars
the compact() representation discards structural information and requires
verbose, fragile tuple unpacking.

The visitor pattern is a cleaner alternative for building ASTs or evaluating
parse trees.  Instead of post-processing compact() output, you pass
compact=False to match() to get Match objects directly, then walk the tree
with a visitor whose methods are dispatched by pattern name.


Basic usage
-----------

    m = pattern.match(sourceText, compact=False)
    result = m.visit(myVisitor)

visit() dispatches to a method on the visitor named "on_<patternname>", where
patternname is the name assigned to the pattern in the grammar definition.
For example, if the grammar defines a rule named "number", the visitor method
is on_number(self, m) or on_number(self, m, parts) -- see "The parts
parameter" below.

If no matching on_* method exists, visit() applies the default structural
behavior for the matched pattern type (see "Default behavior by pattern type"
below) and returns the result directly, with no handler involved.


The parts parameter
-------------------

If a handler declares a parameter named "parts", visit() automatically calls
the default structural decomposition for that match and passes it as parts=,
saving the handler from doing so explicitly.  The type of parts depends on
the pattern:

  - Sequence with multiple named children: parts is a MetaDict, e.g.:
        def on_binop(self, m, parts):
            return parts.a + parts.b   # parts.a, parts.b already visited

  - Sequence with sole '_' child: parts is the single visited child value.

  - Regex (terminal): parts is the matched string, e.g.:
        def on_number(self, m, parts):
            return int(parts)

  - NtoM (repetition): parts is a list of visited children.

  - Block: parts is a list of visited statements.

  - Indent (indented block): parts is MetaDict(head=..., body=...).

In all cases, child matches within parts have already been visited -- the
visitor has been applied recursively before parts is passed to the handler.
This means handlers are typically just one or two lines:

    def on_assign(self, m, parts):
        vars[parts.name] = parts.val
        return parts.val

    def on_binop(self, m, parts):
        op = parts.op
        if op == '+': return parts.a + parts.b
        if op == '-': return parts.a - parts.b
        if op == '*': return parts.a * parts.b
        if op == '/': return parts.a / parts.b
        if op == '^': return parts.a ** parts.b

    def on_number(self, m, parts):
        return int(parts)

    def on_var(self, m, parts):
        return vars[parts]

The 'm' parameter is always available regardless of whether 'parts' is
declared.  Omit 'parts' when you need to control or defer the default
recursion -- for instance, to set up context before children are visited,
or to handle a particular sub-tree specially rather than letting the default
decomposition recurse into it:

    def on_with_stmt(self, m):
        # Set up context before children are visited
        self.context.push(...)
        parts = m.visit_parts(self)
        self.context.pop()
        return parts.body

    def on_expr_cmp(self, m):
        # Special handling: visit 'a' normally but handle 'rest' manually
        parts = m.visit_parts(self)
        # chain comparisons into 'and' of pairwise comparisons...


Default behavior by pattern type
---------------------------------

When no handler exists for a named pattern, or when visit_parts() is called
explicitly, each pattern type behaves as follows:

  Regex           Returns the matched string (m.value).

  Sequence        If labeled with a sole '_' child: returns that child's
                  visited value (passthrough).
                  If multiple named children: returns MetaDict of visited
                  named children; unnamed children (punctuation, whitespace)
                  are silently discarded.
                  If no named children: returns list of all visited children.

  OneOfSet        If the winning alternative has a string label in the set,
                  dispatches to that handler (which must exist -- no fallback).
                  Otherwise recurses into the winning alternative's visit().

  NtoM            Returns list of visited children.

  Block           Returns list of visited children.

  Indent          Returns MetaDict(head=visited_head, body=visited_body).

The passthrough rules that exist purely for operator precedence topology
(left_mul, right_sum, etc.) are typically anonymous OneOfSets whose winning
alternative is a named rule -- so they dispatch directly to the right handler
with no client code needed.

Anonymous Sequence wrappers added by compile() (for EOT matching, trailing
SPACE, etc.) use the '_' passthrough convention internally, so the outermost
match is transparently unwrapped before dispatch reaches any client handler.


Labeled alternatives and the OneOfSet
--------------------------------------

When a set of alternatives is labeled with string names in the grammar:

    expr = 1,binop; a:expr@1 op:<[+-]> b:expr@2
         | 2,binop; a:expr@2 op:<[*/]> b:expr@3
         | var
         | number

visit() detects which labeled alternative matched and dispatches to
on_binop, on_var, on_number etc. directly.  The match passed to the handler
is the inner match of the winning alternative, so parts contains the
alternative's own structure (a, op, b for binop).

A label can be shared across multiple alternatives (as "binop" is above) --
the same handler is called regardless of which precedence level matched.

Numeric precedence and string label can be combined as "N,name;" as shown.
The numeric part controls operator precedence (via @N subset references);
the string part controls visitor dispatch.  Neither is required.

Note: labeled alternatives within a set should always be anonymous in-place
patterns.  If you label a reference to an existing named rule, the handler
will receive the inner match but the named rule's own on_* handler will not
be called -- which is confusing.  If you want to delegate to a named rule,
leave it unlabeled and let it dispatch on its own name.


Navigating children directly: m.children
-----------------------------------------

For handlers that need to navigate the Match tree directly rather than
using parts, m.children returns a MatchView with attribute-style access
to named children:

    def on_binop(self, m):
        c = m.children
        a  = c.a.visit(self)
        op = c.op.value
        b  = c.b.visit(self)
        return ...

MatchView recurses transparently through anonymous intermediate Sequence
nodes to collect labeled descendants.  Use c.get('name', default) for
optional children.  Iterating m.children yields all named children in order.

m.value returns the raw first parsing value, which for terminal Regex
matches is the matched string.


Structuring your grammar for clean visitor dispatch
----------------------------------------------------

1.  Give every semantically meaningful alternative its own named rule or
    a string label within a set.  Unnamed alternatives are handled by the
    default structural behavior, which may or may not be what you want.

2.  Label all sequence items you need to access by name.  Unlabeled items
    are treated as punctuation and discarded in the parts MetaDict.

3.  Use '_' as the label for a sole passthrough item in a sequence:
        paren = '(' _:expr ')'
    This causes the sequence to transparently return the visited expr value.
    Note that '_' is only treated as passthrough when it is the sole label;
    if other labels are present, '_' is treated as a regular named child.

4.  For if/elif/else chains and similar constructs where the grammar
    produces siblings but the semantics require parent-child chaining,
    fold the chain while handling the enclosing block rather than in a
    separate post-processing pass:

        def on_block(self, m, parts):
            result = []
            for stmt in parts:
                if is_else_like(stmt) and result and is_if_like(result[-1]):
                    attach_else(result[-1], stmt)
                else:
                    result.append(stmt)
            return result


Complete example
----------------

See examples/maxicalc.py for a complete working example of a visitor-based
expression evaluator, including operator precedence, variable assignment,
and the parts parameter pattern.
