MacroPy 1.0.3

MacroPy is an implementation of Syntactic Macros in the Python Programming Language. MacroPy provides a mechanism for user-defined functions (macros) to perform transformations on the abstract syntax tree (AST) of a Python program at import time. This is an easy way to enhance the semantics of a Python program in ways which are otherwise impossible, for example providing an extremely concise way of declaring classes:

>>> import macropy.console
0=[]=====> MacroPy Enabled <=====[]=0
>>> from macropy.case_classes import macros, case

>>> @case
class Point(x, y): pass

>>> p = Point(1, 2)
>>> print p.x
1
>>> print p
Point(1, 2)

Try it out in the REPL; it should just work! You can also see the docs/examples/using_macros folder for a minimal example of using MacroPy's existing macros.

MacroPy has been used to implement features such as:

  • Case Classes and Enums
  • Quick Lambdas, Lazy and Interned
  • String Interpolation
  • Tracing and Smart Asserts
  • MacroPEG Parser Combinators

As well as a number of more experimental macros such as:

  • Pattern Matching
  • Tail-call Optimization
  • PINQ to SQLAlchemy

Browse the high-level overview, or look at the Tutorials, which go into greater detail and walk you through writing macros of your own.

The Reference Documentation contains information about:

  • Data Model, what MacroPy gives you to work with
  • Arguments, what a macro is given to do its work
  • Quasiquotes, a quick way to manipulate AST fragments
  • Walkers, a flexible tool to traverse and transform ASTs
  • Hygiene, how to avoid weird bugs related to name collisions and shadowing
  • Expansion Failures, what happens when a macro doesn't work.
  • Expansion Order of nested macros within a file
  • Line Numbers, or what errors you get when something goes wrong.

Or just skip ahead to the Discussion and Conclusion. We're open to contributions, so send us your ideas/questions/issues/pull-requests and we'll do our best to accommodate you! You can ask questions on the Google Group or file bugs on the issues page. See the changelist to see what's changed recently.

MacroPy is tested to run on CPython 2.7.2 and PyPy 2.0, with only partial support for Python 3.x (you'll need to clone the python3 branch yourself) and no support for Jython. MacroPy is also available on PyPI, using a standard setup.py to manage dependencies, installation and other things. Check out this gist for an example of setting it up on a clean system.

30,000ft Overview

Macro functions are defined in three ways:

from macropy.core.macros import *

macros = Macros()

@macros.expr
def my_expr_macro(tree, **kw):
    ...
    return new_tree

@macros.block
def my_block_macro(tree, **kw):
    ...
    return new_tree

@macros.decorator
def my_decorator_macro(tree, **kw):
    ...
    return new_tree

The line macros = Macros() is required to mark the file as providing macros, and the macros object then provides the methods expr, block and decorator which can be used to decorate functions to mark them out as the three different kinds of macros.

Each macro function is passed a tree. The tree is an AST object, the sort provided by Python's ast module. The macro is able to do whatever transformations it wants, and it returns a modified (or even an entirely new) AST object which MacroPy will use to replace the original macro invocation. The macro also takes **kw, which contains other useful things which you may need.
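For instance, here is a minimal sketch of an expression macro that doubles its argument, built with the Quasiquotes described later (the name double and this exact expansion are illustrative, not part of MacroPy):

# double_macro.py: a hypothetical expr macro, for illustration only
from macropy.core.macros import *
from macropy.core.quotes import macros, q, ast

macros = Macros()

@macros.expr
def double(tree, **kw):
    # rewrite double[expr] into ((expr) * 2), unquoting the captured tree
    return q[ast[tree] * 2]

A client module would then use it via from double_macro import macros, double, after which double[1 + 2] evaluates to 6.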

These three types of macros are called via:

from my_macro_module import macros, my_expr_macro, my_block_macro, my_decorator_macro

val = my_expr_macro[...]

with my_block_macro:
    ...

@my_decorator_macro
class X():
    ...

Where the line from my_macro_module import macros, ... is necessary to tell MacroPy which macros this module relies on. Multiple things can be imported from each module, but macros must come first for macros from that module to be used.

Any time any of these syntactic forms is seen, if a matching macro exists in any of the packages from which macros has been imported, the abstract syntax tree captured by these forms (the ... in the code above) is given to the respective macro to handle. The tree (new, modified, or even unchanged) which the macro returns is substituted into the original code in-place.

MacroPy intercepts the module-loading workflow, via the functionality provided by PEP 302: New Import Hooks. The workflow is roughly as follows (a code sketch follows the list):

  • Intercept an import
  • Parse the contents of the file into an AST
  • Walk the AST and expand any macros that it finds
  • Compile the modified AST and resume loading it as a module
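
In code, a stripped-down version of such an import hook might look like the following sketch (illustrative only, not MacroPy's actual implementation; expand_macros stands in for MacroPy's macro expander):

import ast
import imp
import sys

class MacroLoader(object):
    def __init__(self, module_name, source, pathname):
        self.module_name = module_name
        self.source = source
        self.pathname = pathname

    def load_module(self, name):
        tree = ast.parse(self.source)    # parse the file's contents into an AST
        tree = expand_macros(tree)       # hypothetical: walk the AST, expanding macros
        code = compile(tree, self.pathname, "exec")
        mod = imp.new_module(name)       # compile and resume loading as a module
        mod.__file__ = self.pathname
        sys.modules[name] = mod
        exec code in mod.__dict__
        return mod

class MacroFinder(object):
    def find_module(self, module_name, package_path=None):
        try:
            file, pathname, description = imp.find_module(
                module_name.split('.')[-1], package_path)
            source = file.read()
            file.close()
            return MacroLoader(module_name, source, pathname)
        except (ImportError, AttributeError):
            return None                  # fall back to the normal import machinery

sys.meta_path.append(MacroFinder())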


Note that this means you cannot use macros in a file that is run directly, as it will not be passed through the import hooks. Hence the minimum viable setup is:

# run.py
import macropy.activate     # sets up macro import hooks
import other                # imports other.py and passes it through import hooks


# my_macro_module.py
from macropy.core.macros import *

macros = Macros()

... define some macros ...


# other.py
from macropy.macros.my_macro_module import macros, ...

... do stuff with macros ...

Where you run run.py instead of other.py. For the same reason, you cannot run MacroPy's own unit tests directly using unittest or nose: you need to run the macropy/run_tests.py file from the project root for the tests to run. See the runnable, self-contained no-op example to see exactly what this looks like, or the example for using existing macros.

MacroPy also works in the REPL:

PS C:\Dropbox\Workspace\macropy> python
Python 2.7 (r27:82525, Jul  4 2010, 07:43:08) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import macropy.console
0=[]=====> MacroPy Enabled <=====[]=0
>>> from macropy.tracing import macros, trace
>>> trace[[x*2 for x in range(3)]]
range(3) -> [0, 1, 2]
x*2 -> 0
x*2 -> 2
x*2 -> 4
x*2 for x in range(3) -> [0, 2, 4]
[0, 2, 4]

This example demonstrates the usage of the Tracing macro, which helps trace the evaluation of a Python expression. Although support for the REPL is still experimental, most examples on this page will work when copied and pasted into the REPL verbatim. MacroPy also works in the PyPy and IPython REPLs.

Demo Macros

Below are a few example uses of macros that are implemented (together with test cases!) in the macropy and macropy/experimental folders. These are also the ideal places to look when learning to write your own macros: check out the source code of the String Interpolation or Quick Lambda macros for some small (<30 lines), self-contained examples. Their unit tests demonstrate how these macros are used.

Feel free to open up a REPL and try out the examples in the console; simply import macropy.console, and most of the examples should work right off the bat when pasted in! Macros in this section are also relatively stable and well-tested, and you can rely on them to work and not to suddenly change from version to version (as much as can be said for a two-month-old project!).

Case Classes

from macropy.case_classes import macros, case

@case
class Point(x, y): pass

p = Point(1, 2)

print str(p) # Point(1, 2)
print p.x    # 1
print p.y    # 2
print Point(1, 2) == Point(1, 2) # True
x, y = p
print x, y   # 1 2

Case classes are classes with extra goodies:

  • Nice __str__ and __repr__ methods autogenerated
  • An autogenerated constructor
  • Structural equality by default
  • A copy-constructor, for creating modified copies of instances
  • A __slots__ declaration, to improve memory efficiency
  • An __iter__ method, to allow destructuring

The reasoning being that although you may sometimes want complex, custom-built classes with custom features and fancy inheritance, very (very!) often you want a simple class with a constructor, pretty __str__ and __repr__ methods, and structural equality which doesn't inherit from anything. Case classes provide you just that, with an extremely concise declaration:

@case
class Point(x, y): pass

As opposed to the equivalent class, written manually:

class Point(object):
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point(" + self.x + ", " + self.y + ")"

    def __repr__(self):
        return self.__str__()

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def __ne__(self, other):
        return not self.__eq__(other)

    def __iter__(self):
        yield self.x
        yield self.y

Whew, what a lot of boilerplate! This is clearly a pain to write, error-prone to deal with, and violates DRY in an extreme way: each member of the class (x and y in this case) has to be repeated 8 times, with loads and loads of boilerplate. It is also buggy, and will fail at runtime when the above example is run, so see if you can spot the bug in it! Given how tedious writing all this code is, it is no surprise that most Python classes do not come with proper __str__ or useful __eq__ functions! With case classes, there is no excuse, since all this will be generated for you.

Case classes also provide a convenient copy-constructor, which creates a shallow copy of the case class with modified fields, leaving the original unchanged:

a = Point(1, 2)
b = a.copy(x = 3)
print a # Point(1, 2)
print b # Point(3, 2)

Like any other class, a case class may contain methods in its body:

@case
class Point(x, y):
    def length(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

print Point(3, 4).length() # 5.0

or class variables. The only restrictions are that the __init__, __str__, __repr__ and __eq__ methods will be set for you, and that the initializer/class body and inheritance are treated specially.

Body Initializer

@case
class Point(x, y):
    self.length = (self.x**2 + self.y**2) ** 0.5

print Point(3, 4).length # 5.0

Case classes allow you to add initialization logic by simply placing the initialization statements in the class body: any statements within the class body which are not class or function definitions are taken to be part of the initializer, and so you can use e.g. the self variable to set instance members just like in a normal __init__ method.

Any additional assignments to self.XXX in the body of the class scope are detected and the XXX added to the class' __slots__ declaration, meaning you generally don't need to worry about __slots__ limiting what you can do with the class. As long as there is an assignment to the member somewhere in the class' body, it will be added to slots. This means if you try to set a member of an instance via my_thing.XXX = ... somewhere else, but aren't setting it anywhere in the class' body, it will fail with an AttributeError. The solution to this is to simply add a self.XXX = None in the class body, which will get picked up and added to its __slots__.

The body initializer also means you cannot set class members on a case class, since any bare assignment XXX = ... will get treated as a local variable assignment in the scope of the class' __init__ method. This is one of several limitations.
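
For example, a short sketch of the behavior described above:

@case
class Counter(start):
    self.value = None    # assigned in the body, so 'value' lands in __slots__

c = Counter(0)
c.value = 10             # fine: 'value' is a declared slot
c.other = 10             # AttributeError: 'other' is never assigned in the body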

Defaults, *args and **kwargs

Case classes also provide a syntax for default values:

@case
class Point(x | 0, y | 0):
    pass

print str(Point(y = 5)) # Point(0, 5)

For *args:

@case
class PointArgs(x, y, [rest]):
    pass

print PointArgs(3, 4, 5, 6, 7).rest # (5, 6, 7)

and **kwargs:

@case
class PointKwargs(x, y, {rest}):
    pass

print PointKwargs(1, 2, a=1, b=2).rest # {'a': 1, 'b': 2}

All these behave as you would expect, and can be combined in all the normal ways. The strange syntax (rather than the normal x=0, *args or **kwargs) is due to limitations in the Python 2.7 grammar, which are removed in Python 3.3.
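
Assuming the normal combination rules, a declaration mixing all three features would look something like this (an untested sketch, for illustration):

@case
class Mixed(x, y | 10, [args], {kwargs}):
    pass

m = Mixed(1, 2, 3, 4, foo=5)
print m.x, m.y     # 1 2
print m.args       # (3, 4)
print m.kwargs     # {'foo': 5}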

Inheritance

Instead of manual inheritance, inheritance for case classes is defined by nesting, as shown below:

@case
class List():
    def __len__(self):
        return 0

    def __iter__(self):
        return iter([])

    class Nil:
        pass

    class Cons(head, tail):
        def __len__(self):
            return 1 + len(self.tail)

        def __iter__(self):
            current = self

            while len(current) > 0:
                yield current.head
                current = current.tail

print isinstance(List.Cons(None, None), List)    # True
print isinstance(List.Nil(), List)               # True

my_list = List.Cons(1, List.Cons(2, List.Cons(3, List.Nil())))
empty_list = List.Nil()

print my_list.head              # 1
print my_list.tail              # List.Cons(2, List.Cons(3, List.Nil()))
print len(my_list)              # 3
print sum(iter(my_list))        # 6
print sum(iter(empty_list))     # 0

This is an implementation of a singly linked cons list, providing both head and tail (LISP's car and cdr) as well as the ability to get the len or iter for the list.

As the classes Nil and Cons are nested within List, both of them get transformed into case classes which inherit from it. This nesting can go arbitrarily deep.

Overriding

Except for the __init__ method, all the methods provided by case classes are inherited from macropy.case_classes.CaseClass, and can thus be overridden, with the overridden method still accessible via the normal mechanisms:

from macropy.case_classes import CaseClass

@case
class Point(x, y):
    def __str__(self):
        return "mooo " + CaseClass.__str__(self)

print Point(1, 2) # mooo Point(1, 2)

The __init__ method is generated, not inherited. For the common case of adding additional initialization steps after the assignment of arguments to members, you can use the body initializer described above. However, if you want a different modification (e.g. changing the number of arguments) you can achieve this by manually defining your own __init__ method:

@case
class Point(x, y):
    def __init__(self, value):
        self.x = value
        self.y = value


print Point(1) # Point(1, 1)

You cannot access the replaced __init__ method, due to the fact that it is generated, not inherited. Nevertheless, this provides additional flexibility in the cases where you really need it.

Limitations

Case classes provide a lot of functionality to the user, but come with their own set of limitations:

  • No class members: as a consequence of the body initializer, you cannot assign class variables in the body of a class via the foo = ... syntax. However, static and class methods (via @staticmethod and @classmethod) work fine
  • Restricted inheritance: A case class only inherits from macropy.case_classes.CaseClass, as well as any case classes it is lexically scoped within. There is no way to express any other form of inheritance
  • __slots__: case classes get __slots__ declarations by default. Thus you cannot assign ad-hoc members which are not defined in the class signature (the class Point(x, y) line).

Overall, case classes are similar to Python's namedtuple, but far more flexible (methods, inheritance, etc.), and provide the programmer with a much better experience (e.g. no arguments-as-space-separated-string definition). Unlike namedtuples, they are flexible enough that they can be used to replace a large fraction of user-defined classes, rather than being relegated to niche uses.
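
For comparison, here is the namedtuple declaration of the basic Point, with the arguments-as-space-separated-string definition mentioned above:

from collections import namedtuple

Point = namedtuple("Point", "x y")   # fields declared as a space-separated string
p = Point(1, 2)
print p                              # Point(x=1, y=2)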

In the cases where you desperately need additional flexibility not afforded by case classes, you can always fall back on normal Python classes and do without the case class functionality.

Enums

from macropy.case_classes import macros, enum

@enum
class Direction:
    North, South, East, West

print Direction(name="North") # Direction.North

print Direction.South.name    # South

print Direction(id=2)         # Direction.East

print Direction.West.id       # 3

print Direction.North.next    # Direction.South
print Direction.West.prev     # Direction.East

print Direction.all
# [Direction.North, Direction.South, Direction.East, Direction.West]

MacroPy also provides an implementation of Enumerations, heavily inspired by the Java implementation and built upon Case Classes. These are effectively case classes with

  • A fixed set of instances
  • Auto-generated name, id, next and prev fields
  • Auto-generated all list, which enumerates all instances.
  • A __new__ method that retrieves an existing instance, rather than creating new ones

Note that instances of an Enum cannot be created manually: calls such as Direction(name="North") or Direction(id=2) attempt to retrieve an existing Enum with that property, throwing an exception if there is none. This means that reference equality is always used to compare instances of Enums for equality, allowing for much faster equality checks than if you had used Case Classes.
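
A quick sketch of what this retrieval behavior looks like in practice:

print Direction(name="North") is Direction.North   # True: retrieved, not constructed
print Direction(id=2) is Direction.East            # True: the same singleton instance
Direction(name="Up")                               # raises an exception: no such instance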

Definition of Instances

The instances of an Enum can be declared on a single line, as in the example above, or they can be declared on subsequent lines:

@enum
class Direction:
    North
    South
    East
    West

or in a mix of the two styles:

@enum
class Direction:
    North, South
    East, West

The basic rule here is that the body of an Enum can only contain bare names, function calls (shown below), tuples of these, or function defs: no other statements are allowed. In turn, the bare names and function calls are turned into instances of the Enum, while function defs (shown later) are turned into their methods. This also means that, unlike Case Classes, Enums cannot have body initializers.

Complex Enums

@enum
class Direction(alignment, continents):
    North("Vertical", ["Northrend"])
    East("Horizontal", ["Azeroth", "Khaz Modan", "Lordaeron"])
    South("Vertical", ["Pandaria"])
    West("Horizontal", ["Kalimdor"])

    @property
    def opposite(self):
        return Direction(id=(self.id + 2) % 4)

    def padded_name(self, n):
        return ("<" * n) + self.name + (">" * n)

# members
print Direction.North.alignment # Vertical
print Direction.East.continents # ['Azeroth', 'Khaz Modan', 'Lordaeron']

# properties
print Direction.North.opposite  # Direction.South

# methods
print Direction.South.padded_name(2) # <<South>>

Enums are not limited to the auto-generated members shown above. Apart from the fact that Enums have no constructor, and no body initializer, they can contain fields, methods and properties just like Case Classes do. This allows you to associate arbitrary data with each instance of the Enum, and have them perform as full-fledged objects rather than fancy integers.

Quick Lambdas

from macropy.quick_lambda import macros, f, _

print map(f[_ + 1], [1, 2, 3])    # [2, 3, 4]
print reduce(f[_ * _], [1, 2, 3]) # 6

MacroPy provides a syntax for lambda expressions similar to Scala's anonymous functions. Essentially, the transformation is:

f[_ * _] -> lambda a, b: a * b

where the underscores get replaced by identifiers, which are then set to be the parameters of the enclosing lambda. This works too:

print map(f[_.split(' ')[0]], ["i am cow", "hear me moo"])
# ['i', 'hear']

Quick Lambdas can also be used as a concise, lightweight, more-readable substitute for functools.partial

from macropy.quick_lambda import macros, f
basetwo = f[int(_, base=2)]
print basetwo('10010') # 18

is equivalent to

import functools
basetwo = functools.partial(int, base=2)
print basetwo('10010') # 18

Quick Lambdas can also be used entirely without the _ placeholders, in which case they wrap the target in a no argument lambda: ... thunk:

from random import random
thunk = f[random() * 2 + 3]
print thunk() # 4.522011062548173
print thunk() # 4.894243231792029

This reduces the number of characters needed to make a thunk from 7 (using lambda) to 2, making it much easier to use thunks to do things like emulating by-name parameters. The implementation of quicklambda is about 30 lines of code, and is worth a look if you want to see how a simple (but extremely useful!) macro can be written.

Lazy

from macropy.quick_lambda import macros, lazy

# count how many times expensive_func runs
count = [0]
def expensive_func():
    count[0] += 1

thunk = lazy[expensive_func()]

print count[0] # 0

thunk()
print count[0] # 1
thunk()
print count[0] # 1

The lazy macro is used to create a memoizing thunk. Wrapping an expression with lazy creates a thunk which needs to be applied (e.g. thunk()) in order to get the value of the expression out. This macro then memoizes the result of that expression, such that subsequent calls to thunk() will not cause re-computation.

This macro is a tradeoff between declaring the value as a variable:

var = expensive_func()

Which evaluates exactly once, even when not used, and declaring it as a function

thunk = lambda: expensive_func()

Which no longer evaluates when not used, but now re-evaluates every single time. With lazy, you get an expression that evaluates 0 or 1 times. This way, you don't have to pay the cost of computation if it is not used at all (the problem with variables) or the cost of needlessly evaluating it more than once (the problem with lambdas).
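
A minimal sketch of the kind of memoizing helper that lazy[...] could expand into (illustrative only, not MacroPy's actual implementation):

class Lazy(object):
    # calls func at most once, then caches and returns the result
    def __init__(self, func):
        self.func = func
        self.computed = False
        self.value = None

    def __call__(self):
        if not self.computed:
            self.value = self.func()
            self.computed = True
        return self.value

thunk = Lazy(lambda: expensive_func())   # roughly how lazy[expensive_func()] behaves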

This is handy to have if you know how to compute an expression in a local scope that may be used repeatedly later. It may depend on many local variables, for example, which would be inconvenient to pass along to the point at which you know whether the computation is necessary. This way, you can simply compute the lazy value and pass it along, just as you would compute the value normally, but with the benefit of only-if-necessary evaluation.

Interned

from macropy.quick_lambda import macros, interned

# count how many times expensive_func runs
count = [0]
def expensive_func():
    count[0] += 1

def func():
    return interned[expensive_func()]

print count[0] # 0
func()
print count[0] # 1
func()
print count[0] # 1

The interned macro is similar to the lazy macro in that the code within the interned[...] block is wrapped in a thunk and evaluated at most once. Unlike the lazy macro, however, interned does not create a memoizing thunk that you can pass around your program; instead, the memoization is done on a per-use-site basis.

As you can see in the example above, although func is called repeatedly, the expensive_func() call within the interned block is only ever evaluated once. This is handy in that it gives you a mechanism for memoizing a particular computation without worrying about finding a place to store the memoized values. It's just memoized globally (often what you want) while being scoped locally, which avoids polluting the global namespace with names only relevant to a single function (also often what you want).
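
Conceptually, each interned[...] use site behaves as if it carried its own hidden one-element cache; a rough hand-written equivalent of the example above might look like this (a sketch with a hypothetical _site_cache name, not the actual expansion):

_site_cache = []                 # one hidden cache per use site

def func():
    if not _site_cache:
        _site_cache.append(expensive_func())   # evaluated at most once
    return _site_cache[0]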

String Interpolation

from macropy.string_interp import macros, s

a, b = 1, 2
print s["{a} apple and {b} bananas"]
# 1 apple and 2 bananas

Unlike the normal string interpolation in Python, MacroPy's string interpolation allows the programmer to specify the variables to be interpolated inline inside the string. The macro s then takes the string literal

"{a} apple and {b} bananas"

and expands it into the expression

"%s apple and %s bananas" % (a, b)

Which is evaluated at run-time in the local scope, using whatever values a and b happen to hold at the time. The contents of the {...} can be any arbitrary Python expression, and are not limited to variable names:

from macropy.string_interp import macros, s
A = 10
B = 5
print s["{A} + {B} = {A + B}"]
# 10 + 5 = 15

Tracing

from macropy.tracing import macros, log
log[1 + 2]
# 1 + 2 -> 3
# 3

log["omg" * 3]
# ('omg' * 3) -> 'omgomgomg'
# 'omgomgomg'

Tracing allows you to easily see what is happening inside your code. Many a time programmers have written code like

print "value", value
print "sqrt(x)", sqrt(x)

and the log macro (shown above) helps remove this duplication by automatically expanding log[1 + 2] into wrap("(1 + 2)", (1 + 2)). wrap then evaluates the expression, printing out the source code and final value of the computation.
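
A wrap function with the behavior described could be as simple as the following sketch (a hypothetical stand-in for the helper MacroPy actually uses):

def wrap(source, value):
    # print the source code alongside the computed value, then pass the value through
    print source, "->", repr(value)
    return value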

In addition to simple logging, MacroPy provides the trace() macro. This macro not only logs the source and result of the given expression, but also the source and result of all sub-expressions nested within it:

from macropy.tracing import macros, trace
trace[[len(x)*3 for x in ["omg", "wtf", "b" * 2 + "q", "lo" * 3 + "l"]]]
# "b" * 2 -> 'bb'
# "b" * 2 + "q" -> 'bbq'
# "lo" * 3 -> 'lololo'
# "lo" * 3 + "l" -> 'lololol'
# ["omg", "wtf", "b" * 2 + "q", "lo" * 3 + "l"] -> ['omg', 'wtf', 'bbq', 'lololol']
# len(x) -> 3
# len(x)*3 -> 9
# len(x) -> 3
# len(x)*3 -> 9
# len(x) -> 3
# len(x)*3 -> 9
# len(x) -> 7
# len(x)*3 -> 21
# [len(x)*3 for x in ["omg", "wtf", "b" * 2 + "q", "lo" * 3 + "l"]] -> [9, 9, 9, 21]
# [9, 9, 9, 21]

As you can see, trace logs the source and value of all sub-expressions that get evaluated in the course of evaluating the list comprehension.

Lastly, trace can be used as a block macro:

from macropy.tracing import macros, trace
with trace:
    sum = 0
    for i in range(0, 5):
        sum = sum + 5

# sum = 0
# for i in range(0, 5):
#     sum = sum + 5
# range(0, 5) -> [0, 1, 2, 3, 4]
# sum = sum + 5
# sum + 5 -> 5
# sum = sum + 5
# sum + 5 -> 10
# sum = sum + 5
# sum + 5 -> 15
# sum = sum + 5
# sum + 5 -> 20
# sum = sum + 5
# sum + 5 -> 25

Used this way, trace will print out the source code of every statement that gets executed, in addition to tracing the evaluation of any expressions within those statements.

Apart from simply printing out the traces, you can also redirect the traces wherever you want by having a log() function in scope:

result = []

def log(x):
    result.append(x)

The tracer uses whatever log() function it finds, falling back on printing only if none exists. Instead of printing, this log() function appends the traces to a list, and is used in our unit tests.

We think that tracing is an extremely useful macro. For debugging what is happening, for teaching newbies how evaluation of expressions works, or for a myriad of other purposes, it is a powerful tool. The fact that it can be written as a <100 line macro is a bonus.

Smart Asserts

from macropy.tracing import macros, require
require[3**2 + 4**2 != 5**2]
# Traceback (most recent call last):
#   File "<console>", line 1, in <module>
#   File "macropy.tracing.py", line 67, in handle
#     raise AssertionError("Require Failed\n" + "\n".join(out))
# AssertionError: Require Failed
# 3**2 -> 9
# 4**2 -> 16
# 3**2 + 4**2 -> 25
# 5**2 -> 25
# 3**2 + 4**2 != 5**2 -> False

MacroPy provides a variant on the assert keyword called require. Like assert, require throws an AssertionError if the condition is false.

Unlike assert, require automatically tells you what code failed the condition, and traces all the sub-expressions within the code so you can more easily see what went wrong. Pretty handy!

require can also be used in block form:

from macropy.tracing import macros, require
with require:
    a > 5
    a * b == 20
    a < 2

# Traceback (most recent call last):
#   File "<console>", line 4, in <module>
#   File "macropy.tracing.py", line 67, in handle
#     raise AssertionError("Require Failed\n" + "\n".join(out))
# AssertionError: Require Failed
# a < 2 -> False

This requires every statement in the block to be a boolean expression. Each expression will then be wrapped in a require(), throwing an AssertionError with a nice trace when a condition fails.

show_expanded

from ast import *
from macropy.core.quotes import macros, q
from macropy.tracing import macros, show_expanded

print show_expanded[q[1 + 2]]
# BinOp(left=Num(n=1), op=Add(), right=Num(n=2))

show_expanded is a macro which is similar to the simple log macro shown above, but prints out what the wrapped code looks like after all macros have been expanded. This makes it extremely useful for debugging macros, where you need to figure out exactly what your code is being expanded into. show_expanded also works in block form:

from macropy.core.quotes import macros, q
from macropy.tracing import macros, show_expanded, trace

with show_expanded:
    a = 1
    b = q[1 + 2]
    with q as code:
        print a

# a = 1
# b = BinOp(left=Num(n=1), op=Add(), right=Num(n=2))
# code = [Print(dest=None, values=[Name(id='a', ctx=Load())], nl=True)]

These examples show how the quasiquote macro works: it turns an expression or block of code into its AST, assigning the AST to a variable at runtime for other code to use.

Here is a less trivial example: case classes are a pretty useful macro, which saves us the hassle of writing a pile of boilerplate ourselves. By using show_expanded, we can see what the case class definition expands into:

from macropy.case_classes import macros, case
from macropy.tracing import macros, show_expanded

with show_expanded:
    @case
    class Point(x, y):
        pass

# class Point(CaseClass):
#     def __init__(self, x, y):
#         self.x = x
#         self.y = y
#         pass
#     _fields = ['x', 'y']
#     _varargs = None
#     _kwargs = None
#     __slots__ = ['x', 'y']

Pretty neat!


If you want to write your own custom logging, tracing or debugging macros, take a look at the 100 lines of code that implements all the functionality shown above.

MacroPEG Parser Combinators

from macropy.peg import macros, peg
from macropy.quick_lambda import macros, f

"""
PEG grammar from Wikipedia

Op      <- "+" / "-" / "*" / "/"
Value   <- [0-9]+ / '(' Expr ')'
Expr <- Value (Op Value)*

Simplified to remove operator precedence
"""
def reduce_chain(chain):
    chain = list(reversed(chain))
    o_dict = {
        "+": f[_+_],
        "-": f[_-_],
        "*": f[_*_],
        "/": f[_/_],
    }
    while len(chain) > 1:
        a, [o, b] = chain.pop(), chain.pop()
        chain.append(o_dict[o](a, b))
    return chain[0]

with peg:
    op = '+' | '-' | '*' | '/'
    value = '[0-9]+'.r // int | ('(', expr, ')') // f[_[1]]
    expr = (value, (op, value).rep is rest) >> reduce_chain([value] + rest)

print expr.parse("123")             # 123
print expr.parse("((123))")         # 123
print expr.parse("(123+456+789)")   # 1368
print expr.parse("(6/2)")           # 3
print expr.parse("(1+2+3)+2")       # 8
print expr.parse("(((((((11)))))+22+33)*(4+5+((6))))/12*(17+5)")    # 1804

MacroPEG is an implementation of Parser Combinators, an approach to building recursive descent parsers for when the task is too large for regexes but too small for the heavy-duty parser generators. MacroPEG is inspired by Scala's parser combinator library, utilizing Python macros to make the syntax as clean as possible.

The above example describes a simple parser for arithmetic expressions, which roughly follows the PEG syntax. Note that in the example, the bulk of the code goes into the loop that reduces sequences of numbers and operators to a single number, rather than into the recursive-descent parser itself!

Any assignment (xxx = ...) within a with peg: block is transformed into a Parser. A Parser comes with a .parse(input) method, which returns the parsed result if parsing succeeds and raises a ParseError in the case of failure. The ParseError contains a nice human-readable string detailing exactly what went wrong.

json_exp.parse('{"omg": "123", "wtf": , "bbq": "789"}')
# ParseError: index: 22, line: 1, col: 23
# json_exp / obj / pair / json_exp
# {"omg": "123", "wtf": , "bbq": "789"}
#                       ^
# expected: (obj | array | string | true | false | null | number)

In addition to .parse(input), a Parser also contains:

  • parse_string(input), a more program-friendly version of parse that returns successes and failures as boxed values (with metadata).
  • a parse_partial(input) method, which is identical to parse_string, but does not require the entire input to be consumed, as long as some prefix of the input string matches. The remaining attribute of the Success indicates how far into the input string parsing proceeded.

Basic Combinators

Parsers are generally built up from a few common building blocks. The fundamental atoms include:

  • String literals like '+' match the input against their literal value (e.g. '+') and return it as the parse result, failing if the input does not match.
  • Regexes like '[0-9]+'.r match the regex to the input if possible, and return it.
  • Tuples like ('(', expr, ')') match each of the elements within sequentially, and return a list containing the result of each element. It fails if any of its elements fails.
  • Parsers separated by |, for example '+' | '-' | '*' | '/', attempt to match each of the alternatives from left to right, and return the result of the first success.
  • Parsers separated by &, for example '[1234]'.r & '[3456]'.r, require both parsers succeed, and return the result of the left side.
  • parser.rep attempts to match the parser 0 or more times, returning a list of the results from each successful match.
  • -parser negates the parser: if parser succeeded (with any result), -parser fails. If parser failed, -parser succeeds with the result "", the empty string.

Apart from the fundamental atoms, MacroPeg also provides combinators which are not strictly necessary, but are nevertheless generally useful in almost all parsing scenarios:

  • parser.rep1 attempts to match the parser 1 or more times, returning a list of the results from each successful match. If parser does not succeed at least once, parser.rep1 fails. Equivalent to parser.rep & parser.
  • parser.rep_with(other) and parser.rep1_with(other) repeat the parser 0 or more or 1 or more times respectively, except now the other parser is invoked in between invocations of parser. The output of other is discarded, and these methods return a list of values similar to rep and rep1.
  • parser * n attempts to match the parser exactly n times, returning a list of length n containing the result of the n successes. Fails otherwise.
  • parser.opt matches the parser 0 or 1 times, returning either [] or [result] where result is the result of parser. Equivalent to parser | Succeed([])
  • parser.join takes a parser that returns a list of strings (e.g. tuples, rep, rep1, etc.) and returns a parser which returns the strings concatenated together. Equivalent to parser // "".join.
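
For example, a couple of these combinators in action (a small sketch; the output shapes follow the descriptions above):

from macropy.peg import macros, peg

with peg:
    digit = '[0-9]'.r
    four_digits = digit * 4               # exactly four repetitions
    csv_nums = '[0-9]+'.r.rep_with(',')   # numbers separated by commas

print four_digits.parse("2014")    # ['2', '0', '1', '4']
print csv_nums.parse("1,23,456")   # ['1', '23', '456']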

Transforming values using //

So far, these building blocks all return the raw parse tree: all the things like whitespace, curly-braces, etc. will still be there. Often, you want to take a parser e.g.

from macropy.peg import macros, peg
with peg:
    num = '[0-9]+'.r

print repr(num.parse("123")) # '123'

which returns a string of digits, and convert it into a parser which returns an int with the value of that string. This can be done with the // operator:

from macropy.peg import macros, peg
with peg:
    num = '[0-9]+'.r // int

print repr(num.parse("123")) # 123

The // operator takes a function which will be used to transform the result of the parser: in this case, it is the function int, which transforms the returned string into an integer.

Another example is:

with peg:
    laugh = 'lol'
    laughs1 = 'lol'.rep1
    laughs2 = laughs1 // "".join

print laughs1.parse("lollollol") # ['lol', 'lol', 'lol']
print laughs2.parse("lollollol") # lollollol

Where the function "".join is used to join together the list of results from laughs1 into a single string. As mentioned earlier, laughs2 can also be written as laughs2 = laughs1.join.

Binding Values using >>

Although // is sufficient for everyone's needs, it is not always convenient. In the example above, a value is defined to be:

value = ... | ('(', expr, ')') // (lambda x: x[1])

As you can see, we need to strip off the unwanted parentheses from the parse tree, and we do it with a lambda that only selects the middle element, which is the result of the expr parser. An alternate way of representing this is:

value = ... | ('(', expr is result, ')') >> result

In this case, the is keyword is used to bind the result of expr to the name result. The >> (bind) operator can be used to transform the parser by only operating on the bound results within the parser. >> also binds the results of other parsers to their name. Hence the above is equivalent to:

value = ... | ('(', expr, ')') >> expr

The expr on the left refers to the parser named expr in the with peg: block, while the expr on the right refers to the results of the parser named expr in case of a successful parse. The parser on the left has to be outside any is expressions for it to be captured as above, and so in this line in the above parser:

expr = (value, (op, value).rep is rest) >> reduce_chain([value] + rest)

The result of the first value on the left of >> is bound to value on the right, while the second value is not because it is within an is expression bound to the name rest. If you have multiple parsers of the same name on the left of >>, you can always refer to each individual explicitly using the is syntax shown above.

Although this seems like a lot of shuffling variables around and meddling with the local scope and semantics, it goes a long way to keep things neat. For example, a JSON parser may define an array to be:

with peg:
    ...
    # parses an array and extracts the relevant bits into a Python list
    array = ('[', (json_exp, (',', json_exp).rep), space.opt, ']') // (lambda x: [x[1][0]] + [y[1] for y in x[1][1]])
    ...

Where the huge lambda is necessary to pull out the necessary parts of the parse tree into a Python list. Although it works, it's difficult to write correctly and equally difficult to read. Using the is operator, this can be rewritten as:

array = ('[', json_exp is first, (',', json_exp is rest).rep, space.opt, ']') >> [first] + rest

Now, it is clear that we are only interested in the result of the two json_exp parsers. The >> operator allows us to use those, while the rest of the parse tree (the '['s, ','s, etc.) is conveniently discarded. Of course, one could go a step further and use the rep_with method, which is intended for exactly this purpose:

array = ('[', json_exp.rep_with(',') is arr, space.opt, ']') >> arr

Which arguably looks the cleanest of all!

Cut

from macropy.peg import macros, peg, cut
with peg:
    expr1 = ("1", "2", "3") | ("1", "b", "c")
    expr2 = ("1", cut, "2", "3") | ("1", "b", "c")

print expr1.parse("1bc") # ['1', 'b', 'c']
print expr2.parse("1bc")
# ParseError: index: 1, line: 1, col: 2
# expr2
# 1bc
#  ^
# expected: '2'

cut is a special token used in a sequence of parsers, which commits the parsing to the current sequence. As you can see above, without cut, the left alternative fails and the parsing then attempts the right alternative, which succeeds. In contrast, with expr2, the parser is committed to the left alternative once it reaches the cut (after successfully parsing 1) and thus when the left alternative fails, the right alternative is not tried and the entire parse fails.

The purpose of cut is two-fold:

Increasing performance by removing unnecessary backtracking

Using JSON as an example: if your parser sees a {, begins parsing a JSON object, but some time later fails, it does not need to bother backtracking and attempting to parse an Array ([...), a String ("...), or a Number. None of those could possibly succeed, so cutting the backtracking and failing fast prevents this unnecessary computation.

Better error reporting.

For example, if you try to parse the following JSON string:

{    : 1, "wtf": 12.4123}

if your JSON parser looks like:

with peg:
    ...
    json_exp = obj | array | string | num | true | false | null
    obj = '{', pair.rep_with(",") , space, '}'
    ...

Without cut, the only information you could gain from attempting to parse that is something like:

index: 0, line: 1, col: 1
json_exp
{    : 1, "wtf": 12.4123}
^
expected: (obj | array | string | true | false | null | number)

On the other hand, using a cut inside the object parser immediately after parsing the first {, we could provide a much more specific error:

index: 5, line: 1, col: 6
json_exp / obj
{    : 1, "wtf": 12.4123}
     ^
expected: '}'

In the first case, after failing to parse obj, the json_exp parser goes on to try all the other alternatives. After all of them fail to parse, it only knows that trying to parse json_exp starting from character 0 doesn't work; it has no way of knowing that the alternative that was supposed to work was obj.

In the second case, cut is inserted inside the object parser, something like:

obj = '{', cut, pair.rep_with(",") , space, '}'

Once the first { is parsed, the parser is committed to that alternative. Thus, when it subsequently fails, it knows it cannot backtrack and can immediately end the parsing. It can now give a much more specific source location (character 5) as well as better information on what it was trying to parse (json_exp / obj).

Full Example

MacroPEG is not limited to toy problems, like the arithmetic expression parser above. Below is the full source of a JSON parser, provided in the unit tests:

from macropy.peg import macros, peg, cut
from macropy.quick_lambda import macros, f

def decode(x):
    x = x.decode('unicode-escape')
    try:
        return str(x)
    except:
        return x

escape_map = {
    '"': '"',
    '/': '/',
    '\\': '\\',
    'b': '\b',
    'f': '\f',
    'n': '\n',
    'r': '\r',
    't': '\t'
}

"""
Sample JSON PEG grammar for reference, shamelessly stolen from
https://github.com/azatoth/PanPG/blob/master/grammars/JSON.peg

JSON <- S? ( Object / Array / String / True / False / Null / Number ) S?

Object <- "{"
             ( String ":" JSON ( "," String ":" JSON )*
             / S? )
         "}"

Array <- "["
            ( JSON ( "," JSON )*
            / S? )
        "]"

String <- S? ["] ( [^ " \ U+0000-U+001F ] / Escape )* ["] S?

Escape <- [\] ( [ " / \ b f n r t ] / UnicodeEscape )

UnicodeEscape <- "u" [0-9A-Fa-f]{4}

True <- "true"
False <- "false"
Null <- "null"

Number <- Minus? IntegralPart fractPart? expPart?

Minus <- "-"
IntegralPart <- "0" / [1-9] [0-9]*
fractPart <- "." [0-9]+
expPart <- ( "e" / "E" ) ( "+" / "-" )? [0-9]+
S <- [ U+0009 U+000A U+000D U+0020 ]+
"""
with peg:
        json_doc = (space, (obj | array), space) // f[_[1]]
        json_exp = (space, (obj | array | string | true | false | null | number), space) // f[_[1]]

        pair = (string is k, space, ':', cut, json_exp is v) >> (k, v)
        obj = ('{', cut, pair.rep_with(",") // dict, space, '}') // f[_[1]]
        array = ('[', cut, json_exp.rep_with(","), space, ']') // f[_[1]]

        string = (space, '"', (r'[^"\\\t\n]'.r | escape | unicode_escape).rep.join is body, '"') >> "".join(body)
        escape = ('\\', ('"' | '/' | '\\' | 'b' | 'f' | 'n' | 'r' | 't') // escape_map.get) // f[_[1]]
        unicode_escape = ('\\', 'u', ('[0-9A-Fa-f]'.r * 4).join).join // decode

        true = 'true' >> True
        false = 'false' >> False
        null = 'null' >> None

        number = decimal | integer
        integer = ('-'.opt, integral).join // int
        decimal = ('-'.opt, integral, ((fract, exp).join) | fract | exp).join // float

        integral = '0' | '[1-9][0-9]*'.r
        fract = ('.', '[0-9]+'.r).join
        exp = (('e' | 'E'), ('+' | '-').opt, "[0-9]+".r).join

        space = '\s*'.r

Testing it out with some input, we can see it works as we would expect:

test_string = """
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": 10021
        },
        "phoneNumbers": [
            {
                "type": "home",
                "number": "212 555-1234"
            },
            {
                "type": "fax",
                "number": "646 555-4567"
            }
        ]
    }
"""

import json
print json_exp.parse(test_string) == json.loads(test_string)
# True

import pprint
pp = pprint.PrettyPrinter(4)
pp.pprint(json_exp.parse(test_string))
#{   'address': {   'city': 'New York',
#                   'postalCode': 10021.0,
#                   'state': 'NY',
#                   'streetAddress': '21 2nd Street'},
#    'age': 25.0,
#    'firstName': 'John',
#    'lastName': 'Smith',
#    'phoneNumbers': [   {   'number': '212 555-1234', 'type': 'home'},
#                        {   'number': '646 555-4567', 'type': 'fax'}]}

You can see that json_exp parses that non-trivial blob of JSON into an identical structure as Python's in-built json package. In addition, the source of the parser looks almost identical to the PEG grammar it is parsing, shown above. This parser makes good use of the // and >> operators to transform the output of its individual components, as well as using rep_with method to easily parse the comma-separated JSON objects and arrays. This parser is almost fully compliant with the test cases found on the json.org website (it doesn't fail, as it should, for deeply-nested JSON arrays), which isn't bad for 50 lines of code.

As mentioned earlier, MacroPEG parsers also provide exceptions with nice error messages when the parse method fails, and the JSON parser is no exception. Even when parsing larger documents, the error reporting rises to the challenge:

json_exp.parse("""
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": 10021
        },
        "phoneNumbers": [
            {
                "type": "home",
                "number": "212 555-1234"
            },
            {
                "type": "fax",
                "number": 646 555-4567"
            }
        ]
    }
""")

# ParseError: index: 456, line: 19, col: 31
# json_exp / obj / pair / json_exp / array / json_exp / obj
#                 "number": 646 555-4567"
#                               ^
# expected: '}'

Pretty neat! This full example of a JSON parser demonstrates what MacroPEG provides to a programmer trying to write a parser:

  • Excellent error reporting
  • Simple AST processing, on the fly
  • An extremely clear PEG-like syntax
  • Extremely concise parser definitions

Not bad for an implementation that spans 350 lines of code!

Experimental Macros

Below are a selection of macros which demonstrate the cooler aspects of MacroPy, but are not currently stable or tested enough that we would be comfortable using them in production code.

Pattern Matching

from macropy.case_classes import macros, case
from macropy.experimental.pattern import macros, switch

@case
class Nil():
    pass

@case
class Cons(x, xs):
    pass

def reduce(op, my_list):
    with switch(my_list):
        if Cons(x, Nil()):
            return x
        elif Cons(x, xs):
            return op(x, reduce(op, xs))

print reduce(lambda a, b: a + b, Cons(1, Cons(2, Cons(4, Nil()))))
# 7
print reduce(lambda a, b: a * b, Cons(1, Cons(3, Cons(5, Nil()))))
# 15
print reduce(lambda a, b: a * b, Nil())
# None

Pattern matching allows you to quickly check a variable against a series of possibilities, sort of like a switch statement on steroids. Unlike a switch statement in other languages (Java, C++), the switch macro allows you to match against the inside of a pattern: in this case, not just that my_list is a Cons object, but also that the xs member of my_list is a Nil object. This can be nested arbitrarily deep, and allows you to easily check if a data-structure has a particular shape that you are expecting. Out of convenience, the values of the leaf nodes in the pattern are bound to local variables, so you can immediately use x and xs inside the body of the if-statement without having to extract them (again) from my_list.

The reduce function above (a simple, cons-list specific implementation of reduce) takes a Cons list (defined using case classes) and quickly checks if it is either a Cons with a Nil right hand side, or a Cons with something else. This is converted (roughly) into:

def reduce(op, my_list):
    if isinstance(my_list, Cons) and isinstance(my_list.xs, Nil):
        x = my_list.x
        return x
    elif isinstance(my_list, Cons):
        x = my_list.x
        xs = my_list.xs
        return op(x, reduce(op, xs))

Which is significantly messier to write, with all the isinstance checks cluttering up the code and having to manually extract the values you need from my_list after the isinstance checks have passed.

Another common use case for pattern matching is working with tree structures, like ASTs. This macro is a stylized version of the MacroPy code used to identify with ...: macros:

def expand_macros(node):
    with switch(node):
        if With(Name(name)):
            return handle(name)
        else:
            return node

Compare it to the same code written manually using if-elses:

def expand_macros(node):
    if isinstance(node, With) \
            and isinstance(node.context_expr, Name) \
            and node.context_expr.id in macros.block_registry:
        name = node.context_expr.id
        return handle(name)
    else:
        return node

As you can see, matching against With(Name(name)) is a quick and easy way of checking that the value in node matches a particular shape, and is much less cumbersome than a series of conditionals.

It is also possible to use pattern matching outside of a switch, by using the patterns macro. Within patterns, any left shift (<<) statement attempts to match the value on the right to the pattern on the left, allowing nested matches and binding variables as described earlier.

from macropy.experimental.pattern import macros, patterns
from macropy.case_classes import macros, case

@case
class Rect(p1, p2): pass

@case
class Line(p1, p2): pass

@case
class Point(x, y): pass

def area(rect):
    with patterns:
        Rect(Point(x1, y1), Point(x2, y2)) << rect
        return (x2 - x1) * (y2 - y1)

print area(Rect(Point(1, 1), Point(3, 3))) # 4

If the match fails, a PatternMatchException will be thrown.

print area(Line(Point(1, 1), Point(3, 3)))
# macropy.macros.pattern.PatternMatchException: Matchee should be of type <class 'scratch.Rect'>

Class Matching Details

When you pattern match Foo(x, y) against a value Foo(3, 4), what happens behind the scenes is that the constructor of Foo is inspected. We may find that it takes two parameters a and b. We assume that the constructor then contains lines like:

self.a = a
self.b = b

We don't have access to the source of Foo, so this is the best we can do. Then Foo(x, y) << Foo(3, 4) is transformed roughly into

tmp = Foo(3,4)
tmp_matcher = ClassMatcher(Foo, [NameMatcher('x'), NameMatcher('y')])
tmp_matcher.match(tmp)
x = tmp_matcher.getVar('x')
y = tmp_matcher.getVar('y')

In some cases, constructors will not be so standard. In this case, we can use keyword arguments to pattern match against named fields. For example, an equivalent to the above which doesn't rely on the specific implementation of the constructor is Foo(a=x, b=y) << Foo(3, 4). Here the semantics are that the field a is extracted from Foo(3,4) to be matched against the simple pattern x. We could also replace x with a more complex pattern, as in Foo(a=Bar(z), b=y) << Foo(Bar(2), 4).

Custom Patterns

It is also possible to completely override the way in which a pattern is matched by defining an __unapply__ class method of the class which you are pattern matching. The 'class' need not actually be the type of the matched object, as in the following example borrowed from Scala. The __unapply__ method takes as arguments the value being matched, as well as a list of keywords.

The method should then return a tuple of a list of positional matches, and a dictionary of the keyword matches.

class Twice(object):
    @classmethod
    def __unapply__(clazz, x, kw_keys):
        if not isinstance(x, int) or x % 2 != 0:
            raise PatternMatchException()
        else:
            return ([x/2], {})

with patterns:
    Twice(n) << 8
    print n     # 4

Tail-call Optimization

from macropy.experimental.tco import macros, tco

@tco
def fact(n, acc=1):
    if n == 0:
        return acc
    else:
        return fact(n-1, n * acc)

print fact(10000)  # doesn't stack overflow
# 28462596809170545189064132121198688901...

Tail-call Optimization is a technique which will optimize away the stack usage of function calls which are in a tail position. Intuitively, if a function A calls another function B, but does not do any computation after B returns (i.e. A returns immediately when B returns), we don't need to keep around the stack frame for A, which is normally used to store where to resume the computation after B returns. By optimizing this, we can prevent really deep tail-recursive functions (like the factorial example above) from overflowing the stack.

The @tco decorator macro doesn't just work with tail-recursive functions, but also with any generic tail-calls (of either a function or a method) via trampolining, such as this mutually recursive example:

from macropy.experimental.tco import macros, tco

class Example(object):

    @tco
    def odd(n):
        if n < 0:
            return odd(-n)
        elif n == 0:
            return False
        else:
            return even(n - 1)

    @tco
    def even(n):
        if n == 0:
            return True
        else:
            return odd(n-1)

print Example().even(100000)  # No stack overflow
# True

Note that odd and even were both decorated with @tco. All functions which would ordinarily use too many stack frames must be decorated.

Trampolining

How is tail recursion implemented? The idea is that if a function f would return the result of a recursive call to some function g, it could instead return g, along with whatever arguments it would have passed to g. Then instead of running f directly, we run trampoline(f), which will call f, call the result of f, call the result of that f, etc. until finally some call returns an actual value.

A transformed (and simplified) version of the tail-call optimized factorial would look like this

def trampoline_decorator(func):
    def trampolined(*args):
        if not in_trampoline():
            return trampoline(func, args)
        return func(*args)
    return trampolined

def trampoline(func, args):
    _enter_trampoline()
    while True:
        result = func(*args)
        with patterns:
            if ('macropy-tco-call', func, args) << result:
                pass
            else:
                if ignoring:
                    _exit_trampoline()
                    return None
                else:
                    _exit_trampoline()
                    return result

@trampoline_decorator
def fact(n, acc):
    if n == 0:
        return acc
    else:
        return ('macropy-tco-call', fact, [n-1, n * acc])

PINQ to SQLAlchemy

from macropy.experimental.pinq import macros, sql, query, generate_schema
from sqlalchemy import *

# prepare database
engine = create_engine("sqlite://")
for line in open("macropy/experimental/test/world.sql").read().split(";"):
    engine.execute(line.strip())

db = generate_schema(engine)

# Countries in Europe with a GNP per Capita greater than the UK
results = query[(
    x.name for x in db.country
    if x.gnp / x.population > (
        y.gnp / y.population for y in db.country
        if y.name == 'United Kingdom'
    ).as_scalar()
    if (x.continent == 'Europe')
)]
for line in results: print line
# (u'Austria',)
# (u'Belgium',)
# (u'Switzerland',)
# (u'Germany',)
# (u'Denmark',)
# (u'Finland',)
# (u'France',)
# (u'Iceland',)
# (u'Liechtenstein',)
# (u'Luxembourg',)
# (u'Netherlands',)
# (u'Norway',)
# (u'Sweden',)

PINQ (Python INtegrated Query) to SQLAlchemy is inspired by C#'s LINQ to SQL. In short, code used to manipulate lists is lifted into an AST which is then cross-compiled into a snippet of SQL. In this case, it is the query macro which does this lifting and cross-compilation. Instead of performing the manipulation locally on some data structure, the compiled query is sent to a remote database to be performed there.

This allows you to write queries to a database in the same way you would write queries on in-memory lists, which is really very nice. The translation is a relatively thin layer over the SQLAlchemy Query Language, which does the heavy lifting of converting the query into a raw SQL string. If we start with a simple query:

# Countries with a land area greater than 10 million square kilometers
print query[((x.name, x.surface_area) for x in db.country if x.surface_area > 10000000)]
# [(u'Antarctica', Decimal('13120000.0000000000')), (u'Russian Federation', Decimal('17075400.0000000000'))]

This is equivalent to the SQLAlchemy query:

print engine.execute(select([country.c.name, country.c.surface_area]).where(country.c.surface_area > 10000000)).fetchall()

To verify that PINQ is actually cross-compiling the python to SQL, and not simply requesting everything and performing the manipulation locally, we can use the sql macro to perform the lifting of the query without executing it:

query_string = sql[((x.name, x.surface_area) for x in db.country if x.surface_area > 10000000)]
print type(query_string)
# <class 'sqlalchemy.sql.expression.Select'>
print query_string
# SELECT country_1.name, country_1.surface_area
# FROM country AS country_1
# WHERE country_1.surface_area > ?

As we can see, PINQ converts the python list-comprehension into a SQLAlchemy Select, which when stringified becomes a valid SQL string. The ?s are there because SQLAlchemy uses parametrized queries, and doesn't interpolate values into the query itself.

Consider a less trivial example: we want to find all countries in Europe that have a GNP per capita greater than the United Kingdom's. This is the SQLAlchemy code to do so:

query = select([db.country.c.name]).where(
    db.country.c.gnp / db.country.c.population > select(
        [(db.country.c.gnp / db.country.c.population)]
    ).where(
            db.country.c.name == 'United Kingdom'
    ).as_scalar()
).where(
    db.country.c.continent == 'Europe'
)

The SQLAlchemy query looks pretty odd to somebody who knows Python but isn't familiar with the library. This is because SQLAlchemy cannot lift Python code into an AST to manipulate, and instead has to construct the AST manually using Python objects. Although it works pretty well, the syntax and semantics of the queries are completely different from Python's.

Already we are bumping into edge cases: the db.country in the nested query is referred to the same way as the db.country in the outer query, although they are clearly different! One may wonder, what if, in the inner query, we wish to refer to the outer query's values? Naturally, there will be solutions to all of these requirements. In the end, SQLAlchemy ends up effectively creating its own mini programming language, with its own concept of scoping, name binding, etc., basically duplicating what Python already has but with messier syntax and subtly different semantics.

In the equivalent PINQ code, the scoping of which db.country you are referring to is much more explicit, and in general the semantics are identical to a typical python comprehension:

query = sql[(
    x.name for x in db.country
    if x.gnp / x.population > (
        y.gnp / y.population for y in db.country
        if y.name == 'United Kingdom'
    ).as_scalar()
    if (x.continent == 'Europe')
)]

As we can see, rather than mysteriously referring to db.country all over the place, we clearly bind it in two places: once to the variable x in the outer query, and once to the variable y in the inner query.
