A simple filtering syntax tree in Python

A simple filtering syntax tree in Python

Working on various pieces of software those last years, I noticed that there's always a feature that requires implementing some DSL.

The problem with DSL is that it is never the road that you want to go. I remember how creating my first DSL was fascinating: after using programming languages for years, I was finally designing my own tiny language!

A new language that my users would have to learn and master. Oh, it had nothing new, it was a subset of something, inspired by my years of C, Perl or Python, who knows. And that's the terrible part about DSL: they are an marvelous tradeoff between the power that they give to users, allowing them to define precisely their needs and the cumbersomeness of learning a language that will be useful in only one specific situation.

In this blog post, I would like to introduce a very unsophisticated way of implementing the syntax tree that could be used as a basis for a DSL. The goal of that syntax tree will be filtering. The problem it will solve is the following: having a piece of data, we want the user to tell us if the data matches their conditions or not.

To give a concrete example: a machine wants to grant the user the ability to filter the beans that it should keep. What the machine passes to the filter is the size of the current grain, and the filter should return either true or false, based on the condition defined by the user: for example, only keep beans that are bigger that are between 1 and 2 centimeters or between 4 and 6 centimeters.

The number of conditions that the users can define could be quite considerable, and we want to provide at least a basic set of predicate operators: equal, greater than and lesser than. We also want the user to be able to combine those, so we'll add the logical operators or and and.

A set of conditions can be seen as a tree, where leaves are either predicates, and in that case, do not have children, or are logical operators, and have children. For example, the propositional logic formula Ο†1 ∨ (Ο†2 ∨ Ο†3) can be represented with as a tree like this:

Mqs5i-1

Starting with this in mind, it appears that the natural solution is going to be recursive: handle the predicate as terminal, and if the node is a logical operator, recurse over its children.
Since we will be doing Python, we're going to use Python to evaluate our syntax tree.

The simplest way to write a tree in Python is going to be using dictionaries. A dictionary will represent one node and will have only one key and one value: the key will be the name of the operator (equal, greater than, or, and…) and the value will be the argument of this operator if it is a predicate, or a list of children (as dictionaries) if it is a logical operator.

For example, to filter our bean, we would create a tree such as:

{"or": [
  {"and": [
    {"ge": 1},
    {"le": 2},
  ]},
  {"and": [
    {"ge": 4},
    {"le": 6},
  ]},
]}

The goal here is to walk through the tree and evaluate each of the leaves of the tree and returning the final result: if we passed 5 to this filter, it would return True, and if we passed 10 to this filter, it would return False.

Here's how we could implement a very depthless filter that only handles predicates (for now):

import operator

class InvalidQuery(Exception):
    pass

class Filter(object):
    binary_operators = {
        "eq": operator.eq,
        "gt": operator.gt,
        "ge": operator.ge,
        "lt": operator.lt,
        "le": operator.le,
    }

    def __init__(self, tree):
        # Parse the tree and store the evaluator
        self._eval = self.build_evaluator(tree)

    def __call__(self, value):
        # Call the evaluator with the value
        return self._eval(value)

    def build_evaluator(self, tree):
        try:
            # Pick the first item of the dictionary.
            # If the dictionary has multiple keys/values
            # the first one (= random) will be picked.
            # The key is the operator name (e.g. "eq")
            # and the value is the argument for it
            operator, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery("Unable to parse tree %s" % tree)
        try:
            # Lookup the operator name
            op = self.binary_operators[operator]
        except KeyError:
            raise InvalidQuery("Unknown operator %s" % operator)
        # Return a function (lambda) that takes
        # the filtered value as argument and returns
        # the result of the predicate evaluation
        return lambda value: op(value, nodes)

You can use this Filter class by passing a predicate such as {"eq": 4}:

>>> f = Filter({"eq": 4})
>>> f(2)
False
>>> f(4)
True

This Filter class works but is quite limited as we did not provide logical operators. Here's a complete implementation that supports binary operators and and or:

import operator


class InvalidQuery(Exception):
    pass


class Filter(object):
    binary_operators = {
        u"=": operator.eq,
        u"==": operator.eq,
        u"eq": operator.eq,

        u"<": operator.lt,
        u"lt": operator.lt,

        u">": operator.gt,
        u"gt": operator.gt,

        u"<=": operator.le,
        u"≀": operator.le,
        u"le": operator.le,

        u">=": operator.ge,
        u"β‰₯": operator.ge,
        u"ge": operator.ge,

        u"!=": operator.ne,
        u"β‰ ": operator.ne,
        u"ne": operator.ne,
    }

    multiple_operators = {
        u"or": any,
        u"∨": any,
        u"and": all,
        u"∧": all,
    }

    def __init__(self, tree):
        self._eval = self.build_evaluator(tree)

    def __call__(self, value):
        return self._eval(value)

    def build_evaluator(self, tree):
        try:
            operator, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery("Unable to parse tree %s" % tree)
        try:
            op = self.multiple_operators[operator]
        except KeyError:
            try:
                op = self.binary_operators[operator]
            except KeyError:
                raise InvalidQuery("Unknown operator %s" % operator)
            return lambda value: op(value, nodes)
        # Iterate over every item in the list of the value linked
        # to the logical operator, and compile it down to its own
        # evaluator.
        elements = [self.build_evaluator(node) for node in nodes]
        return lambda value: op((e(value) for e in elements))

To support the and and or operators, we leverage the all and any built-in Python functions. They are called with an argument that is a generator that evaluates each one of the sub-evaluator, doing the trick.

Unicode is the new sexy, so I've also added Unicode symbols support.

And it is now possible to implement our full example:

>>> f = Filter(
...     {"∨": [
...         {"∧": [
...             {"β‰₯": 1},
...             {"≀": 2},
...         ]},
...         {"∧": [
...             {"β‰₯": 4},
...             {"≀": 6},
...         ]},
...     ]})
>>> f(5)
True
>>> f(8)
False
>>> f(1)
True

As an exercise, you could try to add the not operator, which deserve its own category as it is a unary operator!

In the next blog post, we will see how to improve that filter with more features, and how to implement a domain-specific language on top of it, to make humans happy when writing the filter!

IMG_20180427_180044--1-

Hole and Henni – FranΓ§ois Charlier, 2018
In this drawing, the artist represents the deepness of functional programming and how its horse power can help you escape many dark situations.