Python/Language

From Jonathan Gardner's Tech Wiki

Overview

This page tries to describe the complete Python programming language.

What's in a Version?

The version number is a sequence of three numbers separated by periods. The first number is the major version, the second the minor, and the third the release number.

Major version changes may break backwards compatibility. A Python 2 program is not guaranteed to run in Python 3.

Within a major version, Python is very strict about backwards compatibility. A program written for a lower minor version will generally run in a higher minor version. For example, 2.5 programs work with 2.7, and 3.2 programs run in 3.5. This doesn't work the other way, though. Python 3.5 programs may not run in Python 3.2, because they may use features that 3.2 lacks.

The release number (the third number) increments when the Python developers release bug fixes or modify things that are not visible to the programmer.
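
As a small illustration of the three-part version number, the running interpreter exposes it through sys.version_info. This is only a sketch; the variable names are mine and are not part of any standard.

```python
import sys

# sys.version_info starts with (major, minor, micro).
# "micro" is what this page calls the release number.
major, minor, micro = sys.version_info[:3]
version_text = "{}.{}.{}".format(major, minor, micro)

# A script that needs a 3.4 feature can compare (major, minor) before going on.
supports_34_features = (major, minor) >= (3, 4)
print(version_text, supports_34_features)
```

Comparing tuples like this avoids parsing the version string by hand.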

This page describes Python 3.4.3. See https://docs.python.org/3.4/reference/index.html

Program

A program consists of a sequence of Logical Lines.

Logical Lines

A logical line ends with a NEWLINE token. It is built from a single physical line, or from several physical lines joined explicitly with '\' or implicitly inside brackets. A simple statement cannot cross a logical line boundary. The first two statements below are one logical line each:

b = 7
a = b + c - d \
    * f
if answer == 'Y':
     return True

Note that the if statement at the end is a compound statement. It spans more than one logical line: the header line, followed by the logical lines of the statement body.
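
One way to see the difference between physical and logical lines is to let the standard tokenize module count NEWLINE tokens, since each NEWLINE token ends one logical line. This is a minimal sketch using the first two statements of the example above. The source text is only tokenized, never executed, so the undefined names do not matter.

```python
import io
import tokenize

# Three physical lines; the backslash joins the last two into one logical line.
source = "b = 7\na = b + c - d \\\n    * f\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
logical_lines = sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)
physical_lines = len(source.splitlines())
print(physical_lines, logical_lines)  # prints: 3 2
```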

Physical Lines

A Physical Line is the line of text that ends with the control sequence '\n' (Linux), '\r\n' (Windows) or '\r' (Old Macs).

Encoding declarations

If a comment on the first or second line matches one of the following forms, then it is read as an encoding declaration. Otherwise, the file is assumed to be encoded in UTF-8.

# -*- coding: <encoding-name> -*-
# vim:fileencoding=<encoding-name>
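
To check which encoding Python would pick for a given file, the standard tokenize module offers detect_encoding(). The sketch below feeds it two made-up source texts; the helper name declared_encoding is mine.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the encoding Python would use to read this source text."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

with_declaration = declared_encoding(b"# vim:fileencoding=cp1252\nx = 1\n")
without_declaration = declared_encoding(b"x = 1\n")
print(with_declaration)     # prints: cp1252
print(without_declaration)  # prints: utf-8
```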


Comments

Comments start with a '#' that is not in a string literal and run to the end of the physical line. (A '\' at the end of a comment is ignored, so a comment cannot be continued onto the next line.)

Unix Note: The shebang line tells the shell how to run the text file. It must be the first line and begins:

#!/usr/bin/env python

When Python reads the file, it ignores this line as it is a comment.

Explicit Line Joining (\)

A physical line that ends with '\', with nothing after it (not even spaces), joins with the following line into a single Logical Line. The '\' must not be inside a comment.
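
A minimal sketch of explicit joining with a trailing backslash (the variable name is arbitrary):

```python
# The backslash is the very last character of the first physical line,
# so both physical lines form one logical line.
total = 1 + 2 + \
        3 + 4
print(total)  # prints: 10
```

A space after the backslash would be a syntax error, which is one reason many people prefer the implicit joining described next.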

Implicit Line Joining

Inside a pair of brackets, '( )', '[ ]', or '{ }', the physical lines are all joined together into a single logical line. No '\' is needed, and each physical line may end with a comment.

a = [
   "a", "b",
   "c", "d"]

Indentation

The whitespace at the beginning of a logical line sets its indentation level. Consecutive lines with the same indentation group together into a block. The first statement of a program must have no indentation. The body of a statement is indented one level deeper than the line that introduces it.

Do not mix tabs and spaces. I recommend 4 spaces per indentation level. That is what PEP 8 recommends, and it seems to be the standard that everyone uses. Consult your text editor's documentation to configure it properly.
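
The following sketch shows how indentation groups lines into blocks, using 4 spaces per level. The function name describe is made up for the example.

```python
def describe(number):
    # The function body is indented one level (4 spaces).
    if number < 0:
        # The body of the if statement is indented one level further.
        sign = "negative"
    else:
        sign = "non-negative"
    # Back at the first level: this line runs after the if/else finishes.
    return sign

print(describe(-3))  # prints: negative
```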

Blank lines

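Blank lines (lines that contain only whitespace, possibly followed by a comment) are ignored. The one exception is the interactive interpreter, where an entirely blank line ends a multi-line statement.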

Whitespace

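Whitespace between tokens is ignored, except that it is needed to separate two tokens that would otherwise run together (such as 'a' and 'b' in "a b"). Whitespace at the beginning of a line is indentation, and whitespace inside string literals is kept.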

Tokenizer

The tokenizer takes a Python program and converts it into a list of tokens. The following tokens exist:

  • NEWLINE indicating the end of one logical line and the beginning of another
  • INDENT / DEDENT indicating the beginning or end of an indentation block.
  • identifiers
  • keywords
  • literals
  • operators
  • delimiters.
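
The standard tokenize module gives a rough view of what the tokenizer produces. A minimal sketch follows. Note that this module is coarser than the list above: it reports both identifiers and keywords as NAME, and both operators and delimiters as OP.

```python
import io
import tokenize

# A two-line program: an if statement with a one-line body.
source = "if ready:\n    go = 1\n"

names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(source).readline)]
print(names)
# prints: ['NAME', 'NAME', 'OP', 'NEWLINE', 'INDENT',
#          'NAME', 'OP', 'NUMBER', 'NEWLINE', 'DEDENT', 'ENDMARKER']
# (shown here wrapped over several lines)
```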

Keywords

The following keywords exist:

  • and
  • as
  • assert
  • break
  • class
  • continue
  • def
  • del
  • elif
  • else
  • except
  • False
  • finally
  • for
  • from
  • global
  • if
  • import
  • in
  • is
  • lambda
  • None
  • nonlocal
  • not
  • or
  • pass
  • raise
  • return
  • True
  • try
  • while
  • with
  • yield

Don't bother memorizing this list. As you learn the features of Python, you'll know what all of them mean.
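
If you ever need the list in a program, the standard keyword module has it. A small sketch (later Python versions add a few keywords, so the length of keyword.kwlist depends on the interpreter):

```python
import keyword

# A few entries from the list above.
sample = ["and", "lambda", "nonlocal", "yield"]
all_are_keywords = all(keyword.iskeyword(word) for word in sample)

print(all_are_keywords)            # prints: True
print(keyword.iskeyword("print"))  # prints: False (print is a function in Python 3)
```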

Identifiers

Special Identifiers

  • identifiers that start with '_' are not imported with "from Z import *" syntax.
  • identifiers that start and end with '__' are special to Python. Don't use them unless they are documented in the Python standard.
  • identifiers that start with '__' (and do not also end with '__') in a class are renamed from '__XYZ' to '_<classname>__XYZ' so that they become private class members. (But they're not really private.)
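
The renaming of '__' identifiers inside a class can be seen directly. In this sketch the class Account and its attribute are invented for the example.

```python
class Account:
    def __init__(self):
        self.__balance = 100       # actually stored as _Account__balance

    def balance(self):
        return self.__balance      # renamed the same way inside the class

acct = Account()
print(acct.balance())              # prints: 100
print(acct._Account__balance)      # prints: 100 (so not really private)
print(hasattr(acct, "__balance"))  # prints: False
```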


Literals

String or Byte Literals

There are two kinds of character sequences in Python.

Strings (str type) (really, unicode strings) are sequences of unicode code points. Their corresponding literals:

  • Start with an optional prefix 'r', 'R', 'u' or 'U'.
  • Are delimited by ', ", ''', or """
  • Hold a sequence of unicode code points that are encoded as below.

Bytes (bytes type) are sequences of bytes and not unicode code points:

  • Start with a required prefix: 'b' or 'B', optionally combined with 'r' or 'R' in either order. For example, 'B', 'br', 'BR', 'rB', etc.
  • Are delimited by ', ", ''', or """.
  • Hold a sequence of bytes that have the encoding described below.

Note that whitespace is not allowed between the prefix and the delimiter.

Short literals use ' or ". They cannot span physical lines (without using \, of course.)

Long literals use ''' or """. They can span physical lines.

Byte literals can only hold values between 0 and 255. The text inside of them must be ASCII, with escape sequences to denote values of 128 and above.

If the 'r' or 'R' prefix is used, then the literal is a "raw" literal. The escape sequences below are not processed, and every backslash is kept as written. A backslash in front of the quote character ' or " still keeps that quote from ending the literal, but the backslash stays in the result. Thus, these literals cannot end in a single backslash \. These raw literals are typically used with regular expressions (the re module) where the backslash has a meaning and escaping it would be bothersome.

The 'u' or 'U' prefix has no effect in Python 3. It indicated a unicode string in Python 2, and Python 3.3 and later accept it again to make porting easier.
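
A short sketch of the literal kinds described above. All names are made up for the example. The é is written as an escape only to keep the example pure ASCII.

```python
text = "caf\u00e9"           # str: 4 unicode code points
data = b"caf\xc3\xa9"        # bytes: 5 values, the non-ASCII ones escaped
pattern = r"\d+\.\d+"        # raw str: the backslashes are kept as written
long_text = """line one
line two"""                  # long literal spanning two physical lines

print(len(text), len(data))          # prints: 4 5
print(data.decode("utf-8") == text)  # prints: True
print(len(r"\n"), len("\n"))         # prints: 2 1
print(u"abc" == "abc")               # prints: True
```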


The following escape sequences exist:

  • \ at the end of the line: Joins this line with the next.
  • \\: A single \
  • \': '
  • \": "
  • \a: ASCII BEL
  • \b: ASCII BS
  • \f: ASCII FF
  • \n: ASCII LF (newline, line feed, Unix line ending)
  • \r: ASCII CR (carriage return. \r\n is used on Windows and in Internet protocols for line ending.)
  • \t: ASCII TAB (tab)
  • \v: ASCII VT (vertical tab)
  • \ooo: Octal code ooo, one to three digits. (Unicode in string, value in bytes)
  • \xhh: Two digit hex code hh. (unicode in string, value in bytes)
  • \N{unicode name}: The unicode code point so named (Only string)
  • \uxxxx: The four-digit hex xxxx unicode code point (Only string)
  • \Uxxxxxxxx: The eight-digit hex xxxxxxxx unicode code point (Only string)
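
A few of the escape sequences above, checked against the characters they stand for. This is only a sketch; the variable names are arbitrary.

```python
hex_a = "\x41"      # hex 41 is 65, the letter A
octal_a = "\101"    # octal 101 is also 65
e_acute = "\u00e9"
named = "\N{LATIN SMALL LETTER E WITH ACUTE}"
wide = "\U0001F600" # a code point that needs more than four hex digits

print(hex_a == "A", octal_a == "A")  # prints: True True
print(e_acute == named)              # prints: True
print(len(wide))                     # prints: 1
print(b"\x41\n" == bytes([65, 10]))  # prints: True
```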

String or Byte Literal Concatenation

When two string literals, or two byte literals, follow each other on a single logical line with only whitespace between them, they are combined into one literal. A string literal and a byte literal cannot be combined this way.

These two strings are equivalent:

"Mary had a little lamb"
"Mary had a" \
" little lamb" 

Numeric Literals

There are ints, floats, and imaginary number literals.

  • int:
    • decimal: [1-9][0-9]* or 0+ (A non-zero number doesn't start with a 0!)
    • octal: 0[oO][0-7]+ (starts with a 0o or 0O)
    • hex: 0[xX][0-9a-fA-F]+ (starts with 0x or 0X)
    • binary: 0[bB][01]+ (starts with 0b or 0B)
  • float:
    • Any decimal number that contains a '.', including one that starts or ends with it: 3.14, 10. and .001
    • Any number in the form XeY (or XEY), where X may have a . and Y is an integer with an optional sign. This is just X × 10^Y
  • imaginary:
    • Any float or sequence of decimal digits followed by 'j' or 'J'.
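
The numeric literal forms above, as a quick sketch:

```python
print(0o17, 0xFF, 0b101)       # prints: 15 255 5
print(type(10.), 1.5e3)        # prints: <class 'float'> 1500.0
print(type(2j), (2j).imag)     # prints: <class 'complex'> 2.0
print(3j * 3j == -9)           # prints: True
```

An imaginary literal produces a complex number whose real part is zero.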


Operators

+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=

Python 3.5 adds @ as an operator meaning "matrix multiplication".
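
A few of the operators from the table above, applied to small integers. This is a sketch to show what the less familiar symbols do, not a full description of each operator.

```python
print(7 / 2, 7 // 2, 7 % 2)   # prints: 3.5 3 1
print(2 ** 10)                # prints: 1024
print(6 & 3, 6 | 3, 6 ^ 3)    # prints: 2 7 5
print(~6, 1 << 4, 256 >> 2)   # prints: -7 16 64
print(3 <= 3, 3 != 4)         # prints: True True
```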

Delimiters

Although some of these look like operators in other languages, they are not operators in Python. They have a special meaning.

(       )       [       ]       {       }
,       :       .       ;       @       =       ->
+=      -=      *=      /=      //=     %=
&=      |=      ^=      >>=     <<=     **=
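
A short sketch of some of the delimiters above in use: the augmented assignments, and ':' and '->' in a function definition. The names count and double are made up for the example.

```python
count = 10
count += 5     # count is now 15
count //= 2    # count is now 7
count **= 2    # count is now 49
count <<= 1    # count is now 98

def double(n: int) -> int:   # ':' and '->' introduce annotations here
    return n * 2

print(count, double(count))  # prints: 98 196
```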

These are not delimiters but are treated specially by the tokenizer:

 '       "       #       \

And these are never valid outside of a string or byte literal:

$       ?       `

Simple Statements

The following statements exist.


Compound Statements

Expressions

Any expression is a valid statement. See #Expressions below.

Execution Model

Importing

Data Types