Pyli/Regular Expressions

From Jonathan Gardner's Tech Wiki

Regular Expressions

Many languages support a regular expression syntax similar to Perl's. Perl's RE syntax, however, stopped being simple long ago: it now allows arbitrary expressions inside a regular expression.

I plan on abandoning the special syntax altogether, and only allowing expressions.

Regular expressions are used for one or more of the following:

  1. Identify whether a string matches an expression.
  2. Identify what matches the expression, including sub-expressions.
  3. Split a string based on an expression.
  4. Provide a lexer.

The lexer is discussed more fully as part of Parsing. It is simply an iterator that returns successive matches (as in use case 2).
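To make the "iterator that returns successive matches" idea concrete, here is a minimal Python sketch. It is not Pyli and not the design described under Parsing. The names `lex`, `match_at` and `digits_at` are hypothetical, and so is the convention that a matcher returns the end position of a match or None.

```python
def lex(match_at, text):
    """Yield (start, end, matched_text) for successive non-overlapping matches.

    `match_at(text, pos)` is an assumed interface: it returns the end
    position of a match starting at `pos`, or None if nothing matches there.
    """
    pos = 0
    while pos < len(text):
        end = match_at(text, pos)
        if end is None or end == pos:
            # No (or empty) match here: skip one character.
            # A real lexer would probably report an error instead.
            pos += 1
        else:
            yield pos, end, text[pos:end]
            pos = end


def digits_at(text, pos):
    """Hypothetical matcher: a run of one or more decimal digits."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return end if end > pos else None
```

For example, `list(lex(digits_at, "ab12 c345"))` gives `[(2, 4, "12"), (6, 9, "345")]`.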

Whether a string matches an expression

This is the simplest case of all.

There are several functions that build matching functions. A matching function simply returns True or False given a particular string.

Individual Characters

These look only at the first character of the string and ignore the rest.

  • (matches-exact <char>) : Matches an exact char
  • (matches-one-of <string>) : Matches one of the characters in the string.
  • (matches-any-but <string>) : Matches any character but one in the string.
  • (matches-class <description>) : Matches any character in the class named by the description (space, non-space, letter, etc.)
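The four builders above could be sketched in Python as functions that return closures. This is only an illustration under assumptions of mine: the Python names mirror the Pyli ones, the class table `CLASSES` is hypothetical and covers just the three classes mentioned, and an empty string is assumed never to match.

```python
def matches_exact(char):
    """Build a matcher that accepts strings starting with exactly `char`."""
    def matcher(s):
        return s[:1] == char
    return matcher


def matches_one_of(chars):
    """Build a matcher that accepts strings whose first character is in `chars`."""
    def matcher(s):
        return s != "" and s[0] in chars
    return matcher


def matches_any_but(chars):
    """Build a matcher that accepts strings whose first character is not in `chars`."""
    def matcher(s):
        return s != "" and s[0] not in chars
    return matcher


# Hypothetical table of character class descriptions.
CLASSES = {
    "space": str.isspace,
    "non-space": lambda c: not c.isspace(),
    "letter": str.isalpha,
}


def matches_class(description):
    """Build a matcher from a character class description."""
    predicate = CLASSES[description]
    def matcher(s):
        return s != "" and predicate(s[0])
    return matcher
```

Each result is a plain function from string to True or False, so `matches_one_of("abc")("banana")` is True and `matches_any_but("abc")("banana")` is False.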

Repeaters

These apply one of the matchers above repeatedly, each repetition consuming the next part of the string.

  • (matches-zero-or-once <matcher>) : Similar to '?'
  • (matches-once-plus <matcher>) : Similar to '+'
  • (matches-zero-plus <matcher>) : Similar to '*'
  • (matches-range <matcher> <from> <to>) : Similar to {from,to}

The minimal variants will attempt to match the minimum number of times.

  • (matches-zero-or-once-minimal <matcher>) : Similar to '??'
  • (matches-once-plus-minimal <matcher>) : Similar to '+?'
  • (matches-zero-plus-minimal <matcher>) : Similar to '*?'
  • (matches-range-minimal <matcher> <from> <to>) : Similar to {from,to}?
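A True/False result cannot show the difference between the greedy and minimal variants, since both accept the same strings and differ only in which match they prefer. The Python sketch below therefore assumes a richer matcher interface than the one described above: a matcher takes a string and a position and yields candidate end positions in order of preference. That interface, the `exact` helper and the `minimal` keyword are my assumptions for illustration, not part of Pyli.

```python
def exact(char):
    """Hypothetical single-character matcher: yields the end position on success."""
    def matcher(s, pos):
        if s.startswith(char, pos):
            yield pos + len(char)
    return matcher


def matches_range(inner, low, high, minimal=False):
    """Repeat `inner` between `low` and `high` times (high=None means no limit).

    Greedy (default) yields the longest candidates first;
    minimal yields the shortest candidates first.
    """
    def matcher(s, pos):
        def walk(p, count):
            can_stop = count >= low
            can_continue = high is None or count < high
            if minimal and can_stop:
                yield p
            if can_continue:
                for nxt in inner(s, p):
                    if nxt > p:  # guard against looping on empty matches
                        yield from walk(nxt, count + 1)
            if not minimal and can_stop:
                yield p
        yield from walk(pos, 0)
    return matcher


# The other repeaters are special cases of the range form.
def matches_zero_or_once(inner, minimal=False):
    return matches_range(inner, 0, 1, minimal)

def matches_once_plus(inner, minimal=False):
    return matches_range(inner, 1, None, minimal)

def matches_zero_plus(inner, minimal=False):
    return matches_range(inner, 0, None, minimal)
```

With `a = exact("a")`, `list(matches_zero_plus(a)("aaab", 0))` gives `[3, 2, 1, 0]`, while `list(matches_zero_plus(a, minimal=True)("aaab", 0))` gives `[0, 1, 2, 3]`. Whatever drives the match takes the first candidate that lets the rest of the expression succeed.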

Alternates

These match if any one of the given expressions matches.

  • (matches-alternate <matcher> ...) : Similar to '|'
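For the simple True/False matchers, alternation could be sketched in Python as below. The name `matches_alternate` mirrors the Pyli one. The `starts_with` helper is hypothetical and exists only to make the example self-contained.

```python
def starts_with(prefix):
    """Hypothetical helper: build a matcher that tests for a literal prefix."""
    def matcher(s):
        return s.startswith(prefix)
    return matcher


def matches_alternate(*matchers):
    """Build a matcher that succeeds if any of the given matchers succeeds."""
    def matcher(s):
        # any() stops at the first matcher that returns True.
        return any(m(s) for m in matchers)
    return matcher


cat_or_dog = matches_alternate(starts_with("cat"), starts_with("dog"))
```

Here `cat_or_dog("dogma")` is True and `cat_or_dog("bird")` is False. Because the result is itself a matcher, alternates can be nested inside other alternates.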