Pyli/Python Implementation

From Jonathan Gardner's Tech Wiki
Jump to: navigation, search

Parser

The parser is written in pure Python. It consists of two parts, the lexer and the parser.

lexer

Found in lib/pyli/parser.py.

The lexer identifies the following and produces them:

Rule Produces !
End of file ('end', None)
Whitespace / \t\r\n/ Nothing (skipped)
Comment ('#' to end of line) Nothing (skipped)
[](){}&$@. ('symbol', the symbol)
' or " to matching quote, interpreting escape sequences ('quote', the interpreted quoted text)
Numbers ('number', (string of the number, type))
Anything else ('text', the text)

parser

The parser is very simple. Note that whitespace and comments are not seen by the parser.

The top level is a collection of exprs, seperated by whitespace or comments or nothing at all.

exprs := expr*

expr := non_dot_expr | dot_expr

non_dot_expr := paren_expr
              | square_expr
              | amp_expr
              | at_expr
              | dollar_expr
              | quote_expr
              | text_expr
              | number_expr


a.b means (attr a &b). It is important that it is right-associative so that a.b.c means (attr (attr a &b) &c), or a.b's c.

dot_expr := expr '.' non_dot_expr

(a b c) is interpreted as "A vector with three items, 'a', 'b', and 'c'." The interpreter will read this as, "Evaluate a, b, and c and then call a with parameters (b,c).

paren_expr := '(' exprs ')'

[a b c] is interpreted as "a vector with four items, 'vector', 'a', 'b', and 'c'." The interpreter will read this as, "Evaluate "vector", "a", "b", and "c", and call "vector" with the parameters (a, b, c). This will create a vector with items of a, b, and c, each evaluated.

square_expr := '[' exprs ']'

&expr is interpreted as "a vector with two items, "quote" and "expr". The interpreter will see this and simply return 'expr', unmodified and unevaluated.

amp_expr := '&' expr

@expr is interpreted as "a vector with two items, 'inline' and 'expr'." The interpreter will see this when evaluated a series of items, evaluate 'expr', and splice the results into the current list.

at_expr := '@' expr

$expr is interpreted as "a vector with two items, 'eval' and 'expr'." The interpreter will see this and evaluate 'expr' once, and then evaluate what it evaluates to a second time, returning the result.

dollar_expr := '$' expr

"..." means a vector with two items, 'quote' and '...'. The interpreter will interpret this as the literal string '...'. Note the similarities to &.

quote_expr := quote

abc means the text 'abc'. The interpreter interprets this as a lexical lookup of the value for 'abc'.

text_expr := text

123 means a vector with two items, 'int' (or 'decimal' or 'float') and '123'. The interpreter will see this and evaluate it to an int, decimal, or float of the number represented.

number_expr := number

Future Directions

It is apparent (7/15/08) that we need a syntax for lisp's `(... ,... @...). I don't know how to do this quite right, and I can think of a few pathological cases that are not clear.

This syntax can be approximated by replacing ` with &, , with $, and keeping @ as-is, roughly. It is not an exact translation.

Today, the code I have written relies on & or quote meaning, "quote it verbatim, as-is." In other words, it is what ' is in Lisp. I may need to add a new operation--fancy-quote--which will do the substitutions as proscribed by @ and $.

Interpreter

The interpreter is a simple state machine that toggles between three states: eval, return, and done.

I need these three states so that I can interrupt actions midway in a stackless manner. I need 'eval' to be deferred because if I don't, then I can't interrupt anything.

I need 'return' to be deferred because if not, then a long return chain can make a single instruction not only extraorinarily long, but complicated and deep in the Python or C stack.

I need 'done' so that there is some way to exit out of everything.

Actions

There are basically 3 actions you can do to the interpreter state while it is handling an operation:

  1. push: Push something on the stack to indicate where you would like control to go once the next expression is evaluated.
  2. save_tail_recursive: Save an expression to evaluate, indicating that whatever is the result you would like that to be the result of the current expression. (Tail recursion.)
  3. save: Save an expression to evaluate, indicating where you would like the result to go. (Roughly the same as push + save_tail_recursive)
  4. return_: Return a result.

There is also signal, which signals a condition. In reality, this sets up the stack to call (signal condition), which handles the signals.

TODO: signal assumes you are done because you are signaling an error. From time to time, you may want to signal a warning condition, and so signal should actually accept what to do if the condition doesn't result in termination of the current process.

Attributes

The interpreter also has the following core attributes. These are most useful.

  • lex_env: The lexical environment.
  • dyn_env: The dynamic environment.
  • stack: The stack. Each entry in the stack is a tuple of:
    • lex_env at the time the item was pushed on the stack.
    • dyn_env at the time the item was pushed on the stack.
    • exp at the time the item was pushed on the stack
    • cont, the name of the method to call when the value is ready
    • args, additional parameters to send to cont when the value is ready.

Internally, these attributes are used to handle state. Calling save, save_tail_recursive, return_, or done (or signal) will modify these.

  • state: Either 'eval' (evaluate exp), 'return' (return val), or 'done' (final value is val.)
  • exp: The expression to evaluate for 'eval', or the expression that was pushed on the stack along with this one.
  • val: The value that exp was calculated to.
  • cont: The function to call with val and args.
  • args: Additional arguments to call cont with.

How it all works together

Let's follow a simple story, the evaluation of (+ 1 1). This will exercise the stack and show how save and return_ work. We will, for clarity, ignore lex_env and dyn_env since they will not change. The columns indicate what the state of the interpreter was before the operation on the right occurred.

state exp val stack What the operation did
Initialization. Set state to 'eval', exp to (+ 1 1), put "(+ 1 1) done ()" on stack.
eval (+ 1 1) (+ 1 1), done, () eval_ sees that this is a tuple, so it preps the stack to evaluate each item and then call _apply_got_exprs.
eval + (+ 1 1) done ()

(+ 1 1) _apply_got_exprs ()

(+ 1 1) _eetv_got_value ([], [1 1])

eval_ sees that + is a single word, so it looks it up in lex_env and returns it.
return (+ 1 1) + (+ 1 1) done ()

(+ 1 1) _apply_got_exprs ()

As part of the return, the topmost item is popped off, state is restored, and it is called. _eetv_got_values now puts itself back on the stack and evaluates 1.
eval 1 (+ 1 1) done ()

(+ 1 1) _apply_got_exprs ()

(+ 1 1) _eetv_got_value ([+], [1])

eval_ sees that 1 is a number, so it returns the equivalent value.
return (+ 1 1) 1 (+ 1 1) done ()

(+ 1 1) _apply_got_exprs ()

(+ 1 1) _eetv_got_value ([+], [1])

As part of the return, the topmost item is popped off, state is restored, and it is called. _eetv_got_values now puts itself back on the stack and evaluates 1.
eval 1 (+ 1 1) done ()

(+ 1 1) _apply_got_exprs ()

(+ 1 1) _eetv_got_value ([+ 1], [])

eval_ sees that 1 is a number, so it returns the equivalent value.
return (+ 1 1) 1 (+ 1 1) done ()

(+ 1 1) _apply_got_exprs ()

(+ 1 1) _eetv_got_value ([+ 1], [])

As part of the return, the topmost item is popped off, state is restored, and it is called. _eetv_got_values now has the result ready, so it returns it.
return (+ 1 1) [+ 1 1] (+ 1 1) done ()

(+ 1 1) _apply_got_exprs ()

_apply_got_exprs takes the first item from val (+) and calls it with the rest of the items. The addition operation sets the interp state to return the value 2.
return (+ 1 1) 2 (+ 1 1) done () The done method simply sets the state to done.
done (+ 1 1) 2 And the final result is returned: 2.