<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>python — jd:/dev/blog</title><description>Posts tagged &quot;python&quot; on jd:/dev/blog.</description><link>https://julien.danjou.info/</link><item><title>Fixing Alembic&apos;s Multiple Heads Problem with Git</title><link>https://julien.danjou.info/blog/fixing-alembics-multiple-heads-problem-with-git/</link><guid isPermaLink="true">https://julien.danjou.info/blog/fixing-alembics-multiple-heads-problem-with-git/</guid><description>Every team using Alembic with parallel branches hits the same wall. We built a library that uses git commit history to fix it.</description><pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/alembic-multiple-heads.webp&quot; alt=&quot;Alembic multiple heads illustration&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If you&apos;ve used &lt;a href=&quot;https://alembic.sqlalchemy.org/&quot;&gt;Alembic&lt;/a&gt; on a team with more than one developer, you&apos;ve seen this error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;FAILED: Multiple head revisions are present for given argument &apos;head&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It means two migrations landed with the same &lt;code&gt;down_revision&lt;/code&gt;, and Alembic doesn&apos;t know which one comes first. The fix is always the same: someone manually edits the migration file to re-chain the revisions. At &lt;a href=&quot;https://mergify.com&quot;&gt;Mergify&lt;/a&gt;, with anywhere between 1 and 500 active migrations (depending on when we last squashed), this was happening multiple times a week. We got tired of it, so we built a library that makes it impossible.&lt;/p&gt;
&lt;h2&gt;How Alembic migrations work&lt;/h2&gt;
&lt;p&gt;Every Alembic migration file contains a &lt;code&gt;revision&lt;/code&gt; (its own ID) and a &lt;code&gt;down_revision&lt;/code&gt; (the ID of the migration it depends on). Together, they form a linked list:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/alembic-linear-chain.svg&quot; alt=&quot;A normal Alembic migration chain: rev_a → rev_b → rev_c → rev_d, linear&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This works perfectly as long as only one developer creates migrations at a time. Each new migration points to the previous head, and the chain stays linear.&lt;/p&gt;
&lt;h2&gt;What breaks with parallel branches&lt;/h2&gt;
&lt;p&gt;Real teams don&apos;t work sequentially. Two developers create feature branches from the same &lt;code&gt;main&lt;/code&gt;, and both generate a migration. Both migrations set their &lt;code&gt;down_revision&lt;/code&gt; to the current head, &lt;code&gt;rev_c&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/alembic-multiple-heads.svg&quot; alt=&quot;Two branches both create migrations pointing to rev_c as down_revision, creating a fork with multiple heads&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The first PR merges fine. The second one breaks because Alembic now has two heads: &lt;code&gt;rev_d&lt;/code&gt; and &lt;code&gt;rev_e&lt;/code&gt; both claim &lt;code&gt;rev_c&lt;/code&gt; as their parent.&lt;/p&gt;
&lt;p&gt;We use a &lt;a href=&quot;https://merge-queue.academy/introduction/what-is-a-merge-queue/&quot;&gt;merge queue&lt;/a&gt;, so this never breaks &lt;code&gt;main&lt;/code&gt; directly. The queue detects the conflict and tells the developer to fix it. But it doesn&apos;t fix it &lt;em&gt;for&lt;/em&gt; them. Someone still has to stop what they&apos;re doing, pull the latest &lt;code&gt;main&lt;/code&gt;, update the &lt;code&gt;down_revision&lt;/code&gt; by hand, and push again. The queue turns a production incident into a developer interruption, but the interruption still costs time.&lt;/p&gt;
&lt;p&gt;Without a merge queue, it&apos;s worse: the second PR merges, &lt;code&gt;main&lt;/code&gt; breaks, and now everyone is blocked.&lt;/p&gt;
&lt;p&gt;Alembic has a built-in &lt;code&gt;alembic merge&lt;/code&gt; command for this. It creates a merge revision that joins the two heads. But it doesn&apos;t solve the ordering problem: which migration runs first? If both touch the same table, the order matters, and &lt;code&gt;alembic merge&lt;/code&gt; won&apos;t pick it for you. You also end up with empty migration files that clutter the history, and you still need manual intervention every time. It patches the symptom without fixing the cause.&lt;/p&gt;
&lt;h2&gt;The insight: git already knows the order&lt;/h2&gt;
&lt;p&gt;We deploy migrations as soon as they merge. We never modify a migration after it&apos;s merged. Those two constraints matter: they mean the migration chain is a single timeline, and the order is determined by when each migration file was first committed to &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Git already tracks that. &lt;code&gt;git log --reverse --diff-filter=A&lt;/code&gt; tells you exactly when each file was added to the repository, in order. That&apos;s the chain. It works with squash merges, regular merges, and rebases, because all that matters is the final commit order on &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;
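&lt;p&gt;For example (the &lt;code&gt;alembic/versions&lt;/code&gt; path is illustrative; adapt it to your project layout):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# List migration files in the order they were first committed, oldest first;
# --diff-filter=A keeps only file additions, --format= suppresses commit headers.
git log --reverse --diff-filter=A --format= --name-only -- alembic/versions/
&lt;/code&gt;&lt;/pre&gt;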
&lt;h2&gt;The fix&lt;/h2&gt;
&lt;p&gt;We built &lt;a href=&quot;https://github.com/mergifyio/alembic-git-revisions&quot;&gt;&lt;code&gt;alembic-git-revisions&lt;/code&gt;&lt;/a&gt;, an open source library that replaces the hardcoded &lt;code&gt;down_revision&lt;/code&gt; with a function call:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from alembic_git_revisions import get_down_revision

revision = &quot;5c9eb899ede0&quot;
down_revision = get_down_revision(revision)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At runtime, &lt;code&gt;get_down_revision&lt;/code&gt; builds the chain from git history and returns the correct parent. Two migrations pointing to the same &lt;code&gt;down_revision&lt;/code&gt;? Doesn&apos;t matter. Git commit order figures out the rest:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/alembic-git-fix.svg&quot; alt=&quot;The fix: git commit order determines rev_d → rev_e, chain stays linear&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The ordering stays linear regardless of how many branches merge in what order. To be clear: this solves the &lt;em&gt;ordering&lt;/em&gt; problem (multiple heads), not semantic conflicts. If two developers both add a column with the same name, that&apos;s a schema conflict and no tool can auto-resolve it. But the &quot;who comes after whom&quot; question? Git already answered it.&lt;/p&gt;
&lt;p&gt;Existing migrations with hardcoded &lt;code&gt;down_revision&lt;/code&gt; values keep working. The library detects which migrations use the old static style and which use the new dynamic resolution, then stitches them together into a single chain.&lt;/p&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;Install the library:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install alembic-git-revisions
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then update your Alembic migration template (&lt;code&gt;script.py.mako&lt;/code&gt;) to import &lt;code&gt;get_down_revision&lt;/code&gt; instead of hardcoding the revision. The package includes a &lt;a href=&quot;https://github.com/mergifyio/alembic-git-revisions&quot;&gt;reference template&lt;/a&gt; you can copy.&lt;/p&gt;
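&lt;p&gt;The relevant part of the template ends up looking roughly like this (a sketch based on Alembic&apos;s default template variables; use the reference template for the exact content):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from alembic_git_revisions import get_down_revision

# The revision ID is still generated by Alembic; only the parent
# is resolved dynamically instead of being hardcoded.
revision = ${repr(up_revision)}
down_revision = get_down_revision(revision)
&lt;/code&gt;&lt;/pre&gt;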
&lt;p&gt;That&apos;s it. Every &lt;code&gt;alembic revision --autogenerate&lt;/code&gt; from that point on produces migrations that chain themselves.&lt;/p&gt;
&lt;h2&gt;What about Docker and CI?&lt;/h2&gt;
&lt;p&gt;Git history isn&apos;t always available. Docker builds often use shallow clones or no repository at all. (If your &lt;a href=&quot;https://julien.danjou.info/blog/your-ci-pipeline-wasnt-built-for-this&quot;&gt;CI pipeline is already under pressure&lt;/a&gt;, you don&apos;t want migration resolution adding to the problem.) For those environments, the library can pre-generate a &lt;code&gt;revision_chain.json&lt;/code&gt; file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;alembic-git-revisions /path/to/alembic/versions
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This produces a JSON mapping of every revision to its parent. At runtime, the library checks for this file first and falls back to git only if it&apos;s missing.&lt;/p&gt;
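&lt;p&gt;Conceptually, the mapping is of this shape (revision IDs are placeholders; the exact layout may differ):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;rev_e&quot;: &quot;rev_d&quot;,
  &quot;rev_d&quot;: &quot;rev_c&quot;,
  &quot;rev_c&quot;: &quot;rev_b&quot;
}
&lt;/code&gt;&lt;/pre&gt;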
&lt;h2&gt;One month in&lt;/h2&gt;
&lt;p&gt;We&apos;ve been running &lt;code&gt;alembic-git-revisions&lt;/code&gt; for a month. Zero multiple heads errors. The manual re-chaining that used to interrupt developers multiple times a week just stopped happening.&lt;/p&gt;
&lt;p&gt;The multiple heads problem is one of those things that&apos;s small enough to ignore for a while and annoying enough to slow your team down over time. If you&apos;re using Alembic with more than one developer, &lt;a href=&quot;https://github.com/mergifyio/alembic-git-revisions&quot;&gt;give it a try&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>python</category><category>engineering</category></item><item><title>The Hidden Cost of Badly Typed Python Wrappers</title><link>https://julien.danjou.info/blog/the-hidden-cost-of-badly-typed-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/the-hidden-cost-of-badly-typed-python/</guid><description>And How to Fix Them</description><pubDate>Tue, 25 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;What about some technical stuff this week?&lt;/p&gt;
&lt;p&gt;Writing wrappers in Python is a common practice. Whether it’s to simplify function calls, encapsulate complexity, or create a cleaner API, wrapping functions can be a great way to organize code. But there’s a catch: &lt;strong&gt;if you’re not typing your wrappers correctly, you might be introducing subtle bugs that your type checker won’t catch.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you’re using &lt;strong&gt;&lt;a href=&quot;https://mypy-lang.org/&quot;&gt;Mypy&lt;/a&gt;&lt;/strong&gt; (or another static type checker, possibly alongside &lt;a href=&quot;https://julien.danjou.info/blog/the-journey-of-embracing-linters&quot;&gt;a linter like ruff&lt;/a&gt;), you should be careful about &lt;strong&gt;blindly passing&lt;/strong&gt; &lt;code&gt;*args&lt;/code&gt; &lt;strong&gt;and&lt;/strong&gt; &lt;code&gt;**kwargs&lt;/code&gt; &lt;strong&gt;as&lt;/strong&gt; &lt;code&gt;Any&lt;/code&gt;—because doing so effectively turns off your type checker, making your code vulnerable to runtime errors that should have been caught statically.&lt;/p&gt;
&lt;p&gt;Let’s dive into &lt;strong&gt;why this is a problem, why traditional approaches fail, and what the correct way to handle wrapped functions is.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/d2a0693b-8828-4e0f-955d-0fdf30dd363e_1376x864.png&quot; alt=&quot;Illustration of type checking pitfalls in Python wrapper functions&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;The Common but Flawed Wrapper Pattern&lt;/h2&gt;
&lt;p&gt;Here’s a classic example of an incorrectly typed wrapper function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import typing

def make_request(url: str, *args: typing.Any, **kwargs: typing.Any):
    return send_request(HttpClient(url), *args, **kwargs)

def send_request(client: &quot;HttpClient&quot;, method: str = &quot;GET&quot;, timeout: int = 5) -&amp;gt; str:
    return f&quot;Request sent to {client.url} with method {method} and timeout {timeout}s&quot;

class HttpClient:
    def __init__(self, url: str):
        self.url = url
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What’s the issue here?&lt;/p&gt;
&lt;p&gt;At first glance, this seems fine. We’re creating an &lt;code&gt;HttpClient&lt;/code&gt; for a given &lt;code&gt;url&lt;/code&gt; and passing all additional arguments directly to &lt;code&gt;send_request()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;But the problem arises when you pass the wrong arguments:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;make_request(&quot;https://example.com&quot;, method=&quot;POST&quot;, timout=10)  # ❌ Typo in &quot;timeout&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will result in a &lt;strong&gt;runtime error&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;TypeError: send_request() got an unexpected keyword argument &apos;timout&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since &lt;code&gt;make_request()&lt;/code&gt; uses &lt;code&gt;*args: Any&lt;/code&gt; and &lt;code&gt;**kwargs: Any&lt;/code&gt;, &lt;strong&gt;Mypy won’t flag this mistake.&lt;/strong&gt; The type checker has no way to verify whether the arguments passed to &lt;code&gt;make_request()&lt;/code&gt; are valid for &lt;code&gt;send_request()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Using&lt;/strong&gt; &lt;code&gt;Any&lt;/code&gt; &lt;strong&gt;like this completely disables type checking, making Mypy useless for catching argument mismatches.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/485a66d5-412f-479a-b2d5-c4607c1ee06e_1376x864.webp&quot; alt=&quot;Illustration of using Any disabling Mypy type checking in Python&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;What About Using ParamSpec? (And Why It Doesn’t Work)&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;A natural instinct is to use &lt;code&gt;ParamSpec&lt;/code&gt; to tell Mypy that &lt;code&gt;make_request()&lt;/code&gt; should take the exact same arguments as &lt;code&gt;send_request()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from typing import ParamSpec, Callable

P = ParamSpec(&quot;P&quot;)

def make_request(url: str, *args: P.args, **kwargs: P.kwargs):
    return send_request(HttpClient(url), *args, **kwargs)  # ❌ Won&apos;t work
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Why doesn’t this work?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;ParamSpec&lt;/code&gt; is &lt;strong&gt;only useful for decorators and higher-order functions&lt;/strong&gt;—functions that accept or return another function, where it can be bound to a &lt;code&gt;Callable&lt;/code&gt; (see the sketch after this list).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It &lt;strong&gt;does not work for simple wrappers&lt;/strong&gt; like this, where you’re directly calling the function inside the wrapper.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you try this, &lt;strong&gt;Mypy will complain&lt;/strong&gt; that &lt;code&gt;ParamSpec&lt;/code&gt; is being used incorrectly.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
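&lt;p&gt;For contrast, here is a minimal sketch of the case where &lt;code&gt;ParamSpec&lt;/code&gt; &lt;em&gt;does&lt;/em&gt; work: a decorator that preserves the wrapped function’s signature (Python 3.10+; the &lt;code&gt;log_calls&lt;/code&gt; name is made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from typing import Callable, ParamSpec, TypeVar

P = ParamSpec(&quot;P&quot;)
R = TypeVar(&quot;R&quot;)

def log_calls(func: Callable[P, R]) -&amp;gt; Callable[P, R]:
    # P is bound to func&apos;s parameters, so Mypy checks calls to the
    # returned wrapper against the wrapped function&apos;s real signature.
    def wrapper(*args: P.args, **kwargs: P.kwargs) -&amp;gt; R:
        print(f&quot;calling {func.__name__}&quot;)
        return func(*args, **kwargs)
    return wrapper
&lt;/code&gt;&lt;/pre&gt;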
&lt;p&gt;&lt;strong&gt;This means that traditional wrapper functions in Python&lt;/strong&gt;—where you take &lt;code&gt;*args&lt;/code&gt; and &lt;code&gt;**kwargs&lt;/code&gt; and pass them blindly—&lt;strong&gt;are no longer a good practice in a world where static typing matters.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;The Correct Approach: Using&lt;/strong&gt; &lt;code&gt;functools.partial&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Instead of directly calling &lt;code&gt;send_request()&lt;/code&gt; within &lt;code&gt;make_request()&lt;/code&gt;, we should &lt;strong&gt;return a callable function using&lt;/strong&gt; &lt;code&gt;functools.partial&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here’s how you do it properly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from functools import partial

def make_request(url: str):
    return partial(send_request, HttpClient(url))

# Correct Usage
request = make_request(&quot;https://example.com&quot;)
print(request(method=&quot;POST&quot;, timeout=10))  # ✅ Works correctly

# Incorrect Usage
print(request(method=&quot;POST&quot;, timout=10))  # ❌ Mypy will catch this!
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;strong&gt;Why This Works&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;✅ &lt;strong&gt;Mypy can now properly check argument correctness:&lt;/strong&gt; &lt;code&gt;request&lt;/code&gt; has the same signature as &lt;code&gt;send_request()&lt;/code&gt; minus the bound &lt;code&gt;client&lt;/code&gt; argument, ensuring proper type safety.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;✅ &lt;strong&gt;No more unexpected runtime errors:&lt;/strong&gt; if you pass an invalid argument, &lt;strong&gt;Mypy will flag it before you even run the code.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;✅ &lt;strong&gt;More maintainable code:&lt;/strong&gt; this pattern makes it clear &lt;strong&gt;what arguments belong to what function&lt;/strong&gt; instead of having them blindly passed along.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stop Using&lt;/strong&gt; &lt;code&gt;*args: Any, **kwargs: Any&lt;/code&gt; &lt;strong&gt;in Wrappers:&lt;/strong&gt; this disables type checking and opens your code to hard-to-debug errors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ParamSpec is NOT a fix:&lt;/strong&gt; it only works for decorators and cannot be used to type generic wrapper functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use&lt;/strong&gt; &lt;code&gt;functools.partial&lt;/code&gt; &lt;strong&gt;Instead&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This ensures that type checkers can properly verify arguments &lt;strong&gt;while keeping the flexibility of a wrapper.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Python’s type system has evolved significantly, and many old habits—like blindly wrapping functions with &lt;code&gt;Any&lt;/code&gt;—should now be considered bad practice.&lt;/p&gt;
&lt;p&gt;By using &lt;code&gt;functools.partial&lt;/code&gt;, you ensure that your wrapped functions remain type-safe, predictable, and error-free.&lt;/p&gt;
&lt;p&gt;Start refactoring your wrappers today—you’ll have fewer bugs, cleaner code, and a much happier type checker.&lt;/p&gt;
&lt;p&gt;Have you encountered issues with typing wrappers in Python? Do you have alternative approaches? Let’s discuss in the comments! 🚀&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>A Decade of Writing Books and Selling 25,000 Copies</title><link>https://julien.danjou.info/blog/a-decade-of-writing-books-and-selling/</link><guid isPermaLink="true">https://julien.danjou.info/blog/a-decade-of-writing-books-and-selling/</guid><description>Reflecting on the Journey and Impact of Writing Technical Books.</description><pubDate>Wed, 12 Jun 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Ten years ago, I embarked on a journey that profoundly shaped my career and personal growth. Writing my first book, &lt;em&gt;The Hacker&apos;s Guide to Python&lt;/em&gt; (later updated and renamed &lt;em&gt;Serious Python&lt;/em&gt;), marked the beginning of a series of literary endeavors that allowed me to share my knowledge, experiences, and passion for Python programming with a global audience. Today, I reflect on this journey, the lessons learned, and the incredible milestones achieved along the way.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/a7a65530-4515-43ba-b7d2-2dc29f8af2af_2448x3264.webp&quot; alt=&quot;First print of The Hacker&apos;s Guide to Python in 2014&quot; /&gt;
&lt;em&gt;First print of The Hacker’s Guide to Python in 2014&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;The Genesis: The Hacker&apos;s Guide to Python&lt;/h2&gt;
&lt;p&gt;In March 2014, I published my first book, &lt;em&gt;The Hacker&apos;s Guide to Python&lt;/em&gt;. This book was born out of a desire to provide a comprehensive resource for Python developers, offering insights and techniques I had gathered over the years. The response was overwhelmingly positive, and it motivated me to continue writing and sharing my expertise. I sold over 3,000 copies of the book in a couple of years, which is a very good number in its category.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Challenges and Time Investment&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Writing was an enormous undertaking that demanded significant time and effort. I spent around 150 hours on each book, covering various activities from writing and editing to marketing and publishing. Most of the time, the process was spread over a year.&lt;/p&gt;
&lt;p&gt;One of the toughest challenges was constructing a coherent and comprehensive table of contents. This initial step was crucial, as it guided the entire writing process, making the subsequent task of filling in the blanks somewhat more manageable. Additionally, I had to balance my time between my day job as a software engineer and this side project, making time management a critical aspect of the endeavor.&lt;/p&gt;
&lt;p&gt;Another significant difficulty was the proofreading process. I needed both technical and language reviews to ensure the content was accurate and well-written, considering English is not my native language. Finding reliable reviewers who could provide timely and constructive feedback was challenging. Despite reaching out to many contacts, only a fraction responded and contributed consistently.&lt;/p&gt;
&lt;p&gt;Self-publishing also taught me &lt;em&gt;marketing&lt;/em&gt;, one of the best skills I could have learned, and I’m still leveraging it to this day while working on &lt;a href=&quot;https://mergify.com&quot;&gt;Mergify&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Scaling Python and Serious Python&lt;/h2&gt;
&lt;p&gt;Following the success of my first book, I continued exploring new topics and challenges within the Python ecosystem. &lt;em&gt;&lt;a href=&quot;http://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;&lt;/em&gt;, published in 2017, delved into the complexities of scaling applications, a topic that resonated with many developers facing similar challenges. I distributed around 1,000 copies of this book.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/86b0bfff-9015-4bf6-a5d6-a027cbc6f98a_329x459.png&quot; alt=&quot;Cover of Scaling Python&quot; /&gt;
&lt;em&gt;Scaling Python cover&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In 2019, after being approached by &lt;a href=&quot;https://nostarch.com/&quot;&gt;No Starch&lt;/a&gt;, I released &lt;em&gt;&lt;a href=&quot;https://serious-python.com&quot;&gt;Serious Python&lt;/a&gt;&lt;/em&gt;, a book aimed at helping developers write more efficient, maintainable, and scalable code. Both books received praise for their practical approach and in-depth coverage of advanced topics. Being backed by No Starch helped the book to be distributed widely, and made it reach &lt;strong&gt;20,000 copies&lt;/strong&gt; as of today.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/1ef41626-ec9b-4637-9e5d-2962a2ce27b6_2284x2284.jpeg&quot; alt=&quot;Cover of Serious Python published by No Starch Press&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Impact on My Career&lt;/h2&gt;
&lt;p&gt;Writing these books significantly impacted my career and established me as an authority in the Python community. When I joined &lt;a href=&quot;https://datadoghq.com&quot;&gt;Datadog&lt;/a&gt; in 2019, I remember seeing my books casually lying around at the entrance of the Paris office.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/3664e543-72d2-4659-92e5-84d7328a74df_3456x3492.jpeg&quot; alt=&quot;Julien&apos;s books displayed at the Datadog Paris office entrance in 2019&quot; /&gt;
&lt;em&gt;My books chilling in the Datadog Paris office entrance in 2019&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This moment was a profound realization of the reach and influence of my work. Colleagues and peers often treated the content of my books as definitive guides. The books provided answers and insights so clearly that people often didn&apos;t feel the need to ask me questions about the topics I covered during interviews; they trusted my written word as a reliable source. This validation opened new opportunities and allowed me to connect with an extensive network of professionals who recognized and respected my expertise.&lt;/p&gt;
&lt;h2&gt;Connecting with the Community&lt;/h2&gt;
&lt;p&gt;Writing allowed me to talk to anyone, reach out to amazing hackers worldwide, and forge new friendships. Writing books was the best excuse to meet fantastic people: I discovered great engineers and learned from their experiences while interviewing them. This journey has been incredibly rewarding, not just professionally but also personally, as I connected with a vibrant community of developers who share my passion for Python and open-source software.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/images/blog/07db89c4-6430-4a4c-b10b-61a84baff249_595x595.jpeg&quot; alt=&quot;Julien presenting his book on stage at PyCon FR 2017&quot; /&gt;
&lt;em&gt;Sharing the knowledge of my book on stage during PyCon FR 2017&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I also saw my books being translated into multiple languages, including Chinese and Korean.&lt;/p&gt;
&lt;p&gt;This feeling is awesome as it gives even more impact to your writing, knowing that your knowledge is spreading across the globe. Having your work translated and accessible to a wider audience is a great reward, and it emphasizes the importance and value of sharing knowledge on such a large scale.&lt;/p&gt;
&lt;h2&gt;The Joy of Writing&lt;/h2&gt;
&lt;p&gt;While writing is hard, it is also refreshing. Producing content that people love and are happy to recommend is a fantastic feeling. My golden rule was, and still is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Produce content that you&apos;d be happy consuming.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The rest then becomes history. This philosophy guided me through the writing process and ensured that my books remained valuable and relevant to readers.&lt;/p&gt;
&lt;h2&gt;A New Era of Writing&lt;/h2&gt;
&lt;p&gt;Reflecting on my writing journey, it resonates deeply with a post I wrote titled “&lt;em&gt;&lt;a href=&quot;https://julien.danjou.info/p/i-used-to-write&quot;&gt;I used to write&lt;/a&gt;&lt;/em&gt;.” In that post, I shared my journey from writing extensively in my early years to facing the challenges of balancing life and work, which decreased my writing output. The desire to return to the keyboard lingered, and despite the rise of AI-generated content, I realized that authentic, human writing still holds immense value.&lt;/p&gt;
&lt;p&gt;Over the last year, I toyed with GPT, generating tons of content and using it to brainstorm, change sentences, and rewrite text. This experimentation reaffirmed my belief that AI could never truly replace the nuanced and creative process of human writing. As AI-generated content grows, the need for genuine, human-crafted writing becomes even more critical. This new writing era challenges us to strengthen our signal amidst the growing noise.&lt;/p&gt;
&lt;p&gt;The past ten years have been an incredible journey of learning, teaching, and connecting with developers worldwide. I look forward to continuing this journey, exploring new topics, and sharing my insights through future books and blog posts.&lt;/p&gt;
&lt;p&gt;Thank you for being a part of this journey.&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>Debugging C code on macOS</title><link>https://julien.danjou.info/blog/debugging-c-code-on-macoss/</link><guid isPermaLink="true">https://julien.danjou.info/blog/debugging-c-code-on-macoss/</guid><description>I started to write C 25 years ago now, with many different tools over the years. Like many open source developers, I spent most of my life working with the GNU tools out there.  As I&apos;ve been using an App</description><pubDate>Thu, 11 Feb 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I started to write C 25 years ago now, with many different tools over the years. Like many open source developers, I spent most of my life working with the GNU tools out there.&lt;/p&gt;
&lt;p&gt;As I&apos;ve been using an Apple computer over the last few years, I had to adapt to this environment and learn the tricks of the trade. Here are some of my notes so a search engine can index them — and I&apos;ll be able to find them later.&lt;/p&gt;
&lt;h2&gt;Debugger: lldb&lt;/h2&gt;
&lt;p&gt;I was used to &lt;code&gt;gdb&lt;/code&gt; for most of my years doing C. I never managed to install gdb correctly on macOS, as it needs certificates, authorization, you name it, to work properly.&lt;/p&gt;
&lt;p&gt;macOS provides a native debugger named &lt;code&gt;lldb&lt;/code&gt;, which really looks like gdb to me — it runs in a terminal with a prompt.&lt;/p&gt;
&lt;p&gt;I had to learn the few commands I mostly use, which are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;lldb -- myprogram -options&lt;/code&gt; to run the program with options&lt;/li&gt;
&lt;li&gt;&lt;code&gt;r&lt;/code&gt; to run the program&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bt&lt;/code&gt; or &lt;code&gt;bt N&lt;/code&gt; to get a backtrace of the latest N frames&lt;/li&gt;
&lt;li&gt;&lt;code&gt;f N&lt;/code&gt; to select frame N&lt;/li&gt;
&lt;li&gt;&lt;code&gt;p V&lt;/code&gt; to print some variable value or memory address&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those commands cover 99% of my use case with a debugger when writing C, so once I lost my old &lt;code&gt;gdb&lt;/code&gt; habits, I was good to go.&lt;/p&gt;
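&lt;p&gt;A typical session looks something like this (program name, options, and output are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ lldb -- ./myprogram --verbose
(lldb) r
Process 1234 stopped
(lldb) bt 5
(lldb) f 1
(lldb) p buffer
&lt;/code&gt;&lt;/pre&gt;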
&lt;h2&gt;Debugging Memory Overflows&lt;/h2&gt;
&lt;h3&gt;On GNU/Linux&lt;/h3&gt;
&lt;p&gt;One of my favorite tools when writing C has always been &lt;a href=&quot;https://en.wikipedia.org/wiki/Electric_Fence&quot;&gt;Electric Fence&lt;/a&gt; (and &lt;a href=&quot;http://duma.sourceforge.net/&quot;&gt;DUMA&lt;/a&gt; more recently). It&apos;s a library that overrides the standard memory manipulation functions (e.g., &lt;code&gt;malloc&lt;/code&gt;) and instantly makes the program crash when an out-of-bounds memory access happens, rather than letting it silently corrupt the heap.&lt;/p&gt;
&lt;p&gt;Heap corruption issues are hard to debug without such tools as they can happen at any time and stay unnoticed for a while, crashing your program in a totally different location later.&lt;/p&gt;
&lt;p&gt;There&apos;s no need to compile your program with those libraries. By using the dynamic loader, you can preload them and overload the standard C library functions.&lt;/p&gt;
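&lt;p&gt;Outside of a debugger, that is a one-liner (the library path matches the configuration below; adjust it for your distribution):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Run the program with Electric Fence&apos;s allocator overrides preloaded.
LD_PRELOAD=/usr/lib/libefence.so.0.0 ./myprogram
&lt;/code&gt;&lt;/pre&gt;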
&lt;p&gt;My &lt;code&gt;gdb&lt;/code&gt; configuration has been sprinkled with my friends &lt;em&gt;efence&lt;/em&gt; and &lt;em&gt;duma&lt;/em&gt;, and I would activate them from &lt;code&gt;gdb&lt;/code&gt; easily with this configuration in &lt;code&gt;~/.gdbinit&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;define efence
        set environment EF_PROTECT_BELOW 0
        set environment EF_ALLOW_MALLOC_0 1
        set environment LD_PRELOAD /usr/lib/libefence.so.0.0
        echo Enabled Electric Fence\n
end
document efence
Enable memory allocation debugging through Electric Fence (efence(3)).
        See also nofence and underfence.
end

define underfence
        set environment EF_PROTECT_BELOW 1
        set environment EF_ALLOW_MALLOC_0 1
        set environment LD_PRELOAD /usr/lib/libefence.so.0.0
        echo Enabled Electric Fence for underflow detection\n
end
document underfence
Enable memory allocation debugging for underflows through Electric Fence
(efence(3)).
        See also nofence and efence.
end

define nofence
        unset environment LD_PRELOAD
        echo Disabled Electric Fence\n
end
document nofence
Disable memory allocation debugging through Electric Fence (efence(3)).
end

define duma
        set environment DUMA_PROTECT_BELOW 0
        set environment DUMA_ALLOW_MALLOC_0 1
        set environment LD_PRELOAD /usr/lib/libduma.so
        echo Enabled DUMA\n
end
document duma
Enable memory allocation debugging through DUMA (duma(3)).
        See also noduma and underduma.
end
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;On macOS&lt;/h3&gt;
&lt;p&gt;I&apos;ve been looking for equivalent features in macOS, and after many hours of research, I found out that this feature is shipped natively with &lt;code&gt;libgmalloc&lt;/code&gt;. It works in the same way, and &lt;a href=&quot;https://developer.apple.com/library/archive/documentation/Performance/Conceptual/ManagingMemory/Articles/MallocDebug.html&quot;&gt;its features are documented by Apple&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My &lt;code&gt;~/.lldbinit&lt;/code&gt;  file now contains the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;command alias gm _regexp-env DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command alias allows enabling &lt;code&gt;gmalloc&lt;/code&gt; by just typing &lt;code&gt;gm&lt;/code&gt; at the lldb prompt and then typing &lt;code&gt;run&lt;/code&gt; again to see if the program crashes with &lt;code&gt;gmalloc&lt;/code&gt; enabled.&lt;/p&gt;
&lt;h2&gt;Debugging CPython&lt;/h2&gt;
&lt;p&gt;It&apos;s not a mystery that I spend a lot of time writing Python code — that&apos;s the main reason I&apos;ve been doing C lately.&lt;/p&gt;
&lt;p&gt;When playing with CPython, it can be useful to, e.g., dump the content of &lt;code&gt;PyObject&lt;/code&gt; structs on the heap or get the Python backtrace.&lt;/p&gt;
&lt;p&gt;I&apos;ve been using &lt;a href=&quot;https://github.com/malor/cpython-lldb&quot;&gt;&lt;em&gt;cpython-lldb&lt;/em&gt;&lt;/a&gt; for this with great success. It adds a few bells and whistles when debugging CPython or extensions inside &lt;code&gt;lldb&lt;/code&gt;. For example, the alias &lt;code&gt;py-bt&lt;/code&gt; is handy to get the Python traceback of your calls rather than a bunch of cryptic C frames.&lt;/p&gt;
&lt;p&gt;Now, you should be ready to debug your nasty issues and memory problems on macOS efficiently!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Interview: The Performance of Python</title><link>https://julien.danjou.info/blog/interview-the-performance-of-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/interview-the-performance-of-python/</guid><description>Earlier this year, I was supposed to participate in dotPy, a one-day Python conference happening in Paris. This event has unfortunately been cancelled due to the COVID-19 pandemic.</description><pubDate>Mon, 11 May 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Earlier this year, I was supposed to participate in &lt;a href=&quot;https://dotpy.io&quot;&gt;dotPy&lt;/a&gt;, a one-day Python conference happening in Paris. This event has unfortunately been cancelled due to the COVID-19 pandemic.&lt;/p&gt;
&lt;p&gt;Both Victor Stinner and I were supposed to attend that event. Victor had prepared a presentation about Python performance, while I was planning on talking about profiling.&lt;/p&gt;
&lt;p&gt;Rather than being completely discouraged, Victor and I sat down (remotely) with Anne Laure from &lt;a href=&quot;https://www.welcometothejungle.com/en/collections/behind-the-code&quot;&gt;Behind the Code&lt;/a&gt; (a blog run by Welcome to the Jungle, the organizers of the &lt;a href=&quot;https://dotpy.io&quot;&gt;dotPy&lt;/a&gt; conference).&lt;/p&gt;
&lt;p&gt;We discussed Python performance, profiling, speed, projects, problems, analysis, optimization, and the GIL.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.welcometothejungle.com/en/articles/btc-performance-python&quot;&gt;You can read the interview here.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/05/image-5.png&quot; alt=&quot;Screenshot of the Behind the Code interview about Python performance&quot; /&gt;&lt;/p&gt;
</content:encoded><category>career</category><category>python</category></item><item><title>Attending FOSDEM 2020</title><link>https://julien.danjou.info/blog/attending-fosdem-2020/</link><guid isPermaLink="true">https://julien.danjou.info/blog/attending-fosdem-2020/</guid><description>This weekend, I&apos;ve been lucky to attend the FOSDEM conference again, one of the largest open-source conferences out there.</description><pubDate>Thu, 06 Feb 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This weekend, I&apos;ve been lucky to attend the &lt;a href=&quot;https://fosdem.org/2020/&quot;&gt;FOSDEM&lt;/a&gt; conference again, one of the largest open-source conferences out there.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/02/Screenshot-2020-02-05-at-15.54.48.png&quot; alt=&quot;Screenshot of the FOSDEM 2020 Python devroom schedule&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I had a talk scheduled in the &lt;a href=&quot;https://fosdem.org/2020/schedule/track/python/&quot;&gt;Python devroom&lt;/a&gt; on Saturday about &lt;a href=&quot;https://fosdem.org/2020/schedule/event/python2020_profiling/&quot;&gt;building a production-ready profiler in Python&lt;/a&gt;. This was a good overview of the work I&apos;ve been doing at &lt;a href=&quot;https://datadoghq.com&quot;&gt;Datadog&lt;/a&gt; for the last few months.&lt;/p&gt;
&lt;p&gt;The video and slides are available below.&lt;/p&gt;
&lt;p&gt;The talk went well and was attended by a few hundred people. I had a few interesting exchanges afterward with people who were interested in the work and had ideas for improvements.&lt;/p&gt;
</content:encoded><category>python</category><category>talks</category></item><item><title>Python Logging with Datadog</title><link>https://julien.danjou.info/blog/python-logging-with-datadog/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-logging-with-datadog/</guid><description>At Mergify, we generate a pretty large amount of logs. Every time an event is received from GitHub for a particular pull request, our engine computes a new state for it.</description><pubDate>Mon, 03 Feb 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At &lt;a href=&quot;https://mergify.io&quot;&gt;Mergify&lt;/a&gt;, we generate a pretty large amount of logs. Every time an event is received from GitHub for a particular pull request, our engine computes a new state for it. Doing so, it logs some informational statements about what it&apos;s doing — and any error that might happen.&lt;/p&gt;
&lt;p&gt;This information is precious to us. Without proper logging, it&apos;d be utterly impossible for us to debug any issue. As we needed to store and index our logs somewhere, we picked Datadog as our log storage provider.&lt;/p&gt;
&lt;p&gt;Datadog offers real-time indexing of our logs. The ability to search our records that fast is compelling, as we&apos;re able to retrieve logs about a GitHub repository or a pull request with a single click.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/01/Screenshot-2020-01-06-at-17.23.58.png&quot; alt=&quot;Our custom Datadog log facets&quot; /&gt;&lt;/p&gt;
&lt;p&gt;To achieve this result, we had to inject our Python application logs into Datadog. To set up the Python logging mechanism, we rely on &lt;a href=&quot;https://github.com/jd/daiquiri&quot;&gt;&lt;em&gt;daiquiri&lt;/em&gt;&lt;/a&gt;, a fantastic library I&apos;ve maintained for several years now. &lt;em&gt;Daiquiri&lt;/em&gt; leverages the regular Python &lt;code&gt;logging&lt;/code&gt; module, making it a no-brainer to set up while offering a few extra features.&lt;/p&gt;
&lt;p&gt;We recently added native support for the Datadog agent in &lt;em&gt;daiquiri&lt;/em&gt;, making it even more straightforward to log from your Python application.&lt;/p&gt;
&lt;h2&gt;Enabling log on the Datadog agent&lt;/h2&gt;
&lt;p&gt;Datadog has &lt;a href=&quot;https://docs.datadoghq.com/agent/logs/?tab=tailexistingfiles&quot;&gt;extensive documentation on how to configure its agent&lt;/a&gt;. This can be summarized to adding &lt;code&gt;logs_enabled: true&lt;/code&gt; in your agent configuration. Simple as that.&lt;/p&gt;
&lt;p&gt;You then need to create a new source for the agent. The easiest way to connect your application and the Datadog agent is using the TCP socket. Your application will write logs directly to the Datadog agent, which will forward the entries to Datadog backend.&lt;/p&gt;
&lt;p&gt;Create a configuration file in &lt;code&gt;conf.d/python.d/conf.yaml&lt;/code&gt; with a TCP log source along these lines (the &lt;code&gt;service&lt;/code&gt; value is a placeholder to adapt; the port matches &lt;em&gt;daiquiri&lt;/em&gt;&apos;s default used below):&lt;/p&gt;
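&lt;pre&gt;&lt;code&gt;logs:
  - type: tcp
    port: 10518
    service: myapp
    source: python
&lt;/code&gt;&lt;/pre&gt;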
&lt;h2&gt;Setting up &lt;code&gt;daiquiri&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Once this is done, you need to configure your Python application to log to the TCP socket configured in the agent above.&lt;/p&gt;
&lt;p&gt;The Datadog agent expects logs to be sent in JSON format, which is what &lt;em&gt;daiquiri&lt;/em&gt; does for you. Using JSON allows embedding any extra fields to leverage fast search and indexing. As &lt;em&gt;daiquiri&lt;/em&gt; provides native handling for extra fields, you&apos;ll be able to send those extra fields without trouble.&lt;/p&gt;
&lt;p&gt;First, list &lt;em&gt;daiquiri&lt;/em&gt; in your application dependencies. Then, set up logging in your application this way:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging

import daiquiri

daiquiri.setup(
  outputs=[
    daiquiri.output.Datadog(),
  ],
  level=logging.INFO,
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This configuration logs to the default TCP destination &lt;code&gt;localhost:10518&lt;/code&gt; — though you can pass the &lt;code&gt;host&lt;/code&gt; and &lt;code&gt;port&lt;/code&gt; argument to change that. You can customize the outputs as you wish by checking out &lt;a href=&quot;https://daiquiri.readthedocs.io/en/latest/&quot;&gt;daiquiri documentation&lt;/a&gt;. For example, you could also include logging to &lt;code&gt;stdout&lt;/code&gt; by adding &lt;code&gt;daiquiri.output.Stream(sys.stdout)&lt;/code&gt; in the output list.&lt;/p&gt;
&lt;h2&gt;Using &lt;code&gt;extra&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;When using &lt;em&gt;daiquiri&lt;/em&gt;, you&apos;re free to use &lt;code&gt;logging.getLogger&lt;/code&gt; to get your regular logging object. However, by using the alternative &lt;code&gt;daiquiri.getLogger&lt;/code&gt; function, you&apos;re enabling the native use of extra arguments — which is quite handy. That means you can pass any arbitrary key/value pair to your log call and see it end up embedded in your log data — all the way up to Datadog.&lt;/p&gt;
&lt;p&gt;Here&apos;s an example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import daiquiri

[…]

log = daiquiri.getLogger(__name__)
log.info(&quot;User did something important&quot;, user=user, request_id=request_id)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The extra keyword argument passed to &lt;code&gt;log.info&lt;/code&gt; will be directly shown as attributes in Datadog logs:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/01/Screenshot-2020-01-06-at-18.22.04.png&quot; alt=&quot;One of the log line of our Mergify engine&quot; /&gt;&lt;/p&gt;
&lt;p&gt;All those attributes can then be used to search or to display custom views. This is really powerful to monitor and debug any kind of service.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/01/Screenshot-2020-01-06-at-18.39.05.png&quot; alt=&quot;Screenshot of Datadog log explorer showing custom attributes for search and display&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;A log object per object&lt;/h2&gt;
&lt;p&gt;When passing &lt;em&gt;extra&lt;/em&gt; arguments, it is easy to make mistakes and forget some. This can especially happen when your application wants to log information about a particular object.&lt;/p&gt;
&lt;p&gt;The best pattern to avoid this is to create a custom log object per object:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import daiquiri

class MyObject:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.log = daiquiri.getLogger(&quot;MyObject&quot;, x=self.x, y=self.y)

    def do_something(self):
        try:
            self.call_this()
        except Exception:
            self.log.error(&quot;Something bad happened&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By using the &lt;code&gt;self.log&lt;/code&gt; object as defined above, there&apos;s no way for your application to miss some extra fields for an object. All your logs will follow the same style and will end up being indexed correctly in Datadog.&lt;/p&gt;
&lt;h2&gt;Log Design&lt;/h2&gt;
&lt;p&gt;The &lt;em&gt;extra&lt;/em&gt; arguments from the Python loggers are often dismissed, and many developers stick to logging strings with various information embedded inside. Having a proper explanation string, plus a few extra key/value pairs that are parsable by machines and humans, is a better way to do logging. Leveraging engines such as Datadog allows storing and querying those logs in a snap.&lt;/p&gt;
&lt;p&gt;This is way more efficient than trying to parse and grep strings yourself!&lt;/p&gt;
</content:encoded><category>python</category><category>mergify</category><category>monitoring</category></item><item><title>Atomic lock-free counters in Python</title><link>https://julien.danjou.info/blog/atomic-lock-free-counters-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/atomic-lock-free-counters-in-python/</guid><description>At Datadog, we&apos;re really into metrics. We love them, we store them, but we also generate them. To do that, you need to juggle with integers that are incremented, also known as counters.</description><pubDate>Mon, 06 Jan 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At &lt;a href=&quot;https://datadog.com&quot;&gt;Datadog&lt;/a&gt;, we&apos;re really into metrics. We love them, we store them, but we also &lt;em&gt;generate&lt;/em&gt; them. To do that, you need to juggle with integers that are incremented, also known as &lt;em&gt;counters&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;While having an integer that changes its value sounds dull, it might not be without some surprises in certain circumstances. Let&apos;s dive in.&lt;/p&gt;
&lt;h2&gt;The Straightforward Implementation&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;class SingleThreadCounter(object):
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pretty easy, right?&lt;/p&gt;
&lt;p&gt;Well, not so fast, buddy. As the class name implies, this works fine with a single-threaded application. Let&apos;s take a look at the instructions in the &lt;code&gt;increment&lt;/code&gt; method:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import dis
&amp;gt;&amp;gt;&amp;gt; dis.dis(&quot;self.value += 1&quot;)
  1           0 LOAD_NAME                0 (self)
              2 DUP_TOP
              4 LOAD_ATTR                1 (value)
              6 LOAD_CONST               0 (1)
              8 INPLACE_ADD
             10 ROT_TWO
             12 STORE_ATTR               1 (value)
             14 LOAD_CONST               1 (None)
             16 RETURN_VALUE
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;self.value += 1&lt;/code&gt; line of code compiles down to several bytecode operations, and Python can switch to a different thread between any two of them — a thread that could also be incrementing the counter.&lt;/p&gt;
&lt;p&gt;Indeed, the &lt;code&gt;+=&lt;/code&gt; operation is not atomic: one needs to do a &lt;code&gt;LOAD_ATTR&lt;/code&gt; to read the current value of the counter, then an &lt;code&gt;INPLACE_ADD&lt;/code&gt; to add 1, to finally &lt;code&gt;STORE_ATTR&lt;/code&gt; to store the final result in the &lt;code&gt;value&lt;/code&gt; attribute.&lt;/p&gt;
&lt;p&gt;If another thread executes the same code at the same time, you can end up adding 1 to a stale value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Thread-1 reads the value as 23
Thread-1 adds 1 to 23 and get 24
Thread-2 reads the value as 23
Thread-1 stores 24 in value
Thread-2 adds 1 to 23
Thread-2 stores 24 in value
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Boom. Your &lt;code&gt;SingleThreadCounter&lt;/code&gt; class is not thread-safe. 😭&lt;/p&gt;
&lt;h2&gt;The Thread-Safe Implementation&lt;/h2&gt;
&lt;p&gt;To make this thread-safe, a &lt;em&gt;lock&lt;/em&gt; is necessary. We need a lock each time we want to increment the value, so we are sure the increments are done serially.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import threading

class FastReadCounter(object):
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()
        
    def increment(self):
        with self._lock:
            self.value += 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This implementation is thread-safe. There is no way for multiple threads to increment the value at the same time, so there&apos;s no way that an increment is lost.&lt;/p&gt;
&lt;p&gt;The only downside of this counter implementation is that you need to take the lock each time you increment. There might be a lot of contention around this lock if you have many threads updating the counter.&lt;/p&gt;
&lt;p&gt;On the other hand, if it&apos;s barely updated and often read, this is an excellent implementation of a thread-safe counter.&lt;/p&gt;
&lt;h2&gt;A Fast Write Implementation&lt;/h2&gt;
&lt;p&gt;There&apos;s a way to implement a thread-safe counter in Python that does not need to be locked on write. It&apos;s a trick that should only work on CPython because of the &lt;em&gt;Global Interpreter Lock&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;While everybody is unhappy with it, this time, the GIL is going to help us. When a C function is executed and does not do any I/O, it cannot be interrupted by any other thread. It turns out there&apos;s a counter-like class implemented in Python: &lt;a href=&quot;https://docs.python.org/3/library/itertools.html#itertools.count&quot;&gt;itertools.count&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We can use this &lt;code&gt;count&lt;/code&gt; class to our advantage, avoiding the need to use a lock when incrementing the counter.&lt;/p&gt;
&lt;p&gt;If you read the documentation for &lt;code&gt;itertools.count&lt;/code&gt;, you&apos;ll notice that there&apos;s no way to read the current value of the counter. This is tricky, and this is where we&apos;ll need to use a lock to bypass this limitation. Here&apos;s the code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import itertools
import threading

class FastWriteCounter(object):
    def __init__(self):
        self._number_of_read = 0
        self._counter = itertools.count()
        self._read_lock = threading.Lock()

    def increment(self):
        next(self._counter)

    def value(self):
        with self._read_lock:
            value = next(self._counter) - self._number_of_read
            self._number_of_read += 1
        return value
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;increment&lt;/code&gt; code is quite simple in this case: the counter is just incremented without any lock. The GIL protects concurrent access to the internal data structure in C, so there&apos;s no need for us to lock anything.&lt;/p&gt;
&lt;p&gt;On the other hand, Python does not provide any way to read the value of an &lt;code&gt;itertools.count&lt;/code&gt; object. We need to use a small trick to get the current value. The &lt;code&gt;value&lt;/code&gt; method increments the counter and then gets the value while subtracting the number of times the counter has been read (and therefore incremented for nothing).&lt;/p&gt;
&lt;p&gt;This counter is, therefore, lock-free for writing, but not for reading: the opposite of our previous implementation.&lt;/p&gt;
&lt;h2&gt;Measuring Performance&lt;/h2&gt;
&lt;p&gt;After writing all of this code, I wanted to measure how the different implementations impacted speed. Using the &lt;a href=&quot;https://docs.python.org/3/library/timeit.html&quot;&gt;timeit&lt;/a&gt; module and my fancy laptop, I&apos;ve measured the performance of reading and writing to this counter.&lt;/p&gt;
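&lt;p&gt;A minimal sketch of this kind of measurement (the exact benchmark setup is an assumption):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import timeit

counter = FastWriteCounter()

# Average cost of a single call, converted to nanoseconds.
n = 1_000_000
print(timeit.timeit(counter.increment, number=n) / n * 1e9, &quot;ns&quot;)
print(timeit.timeit(counter.value, number=n) / n * 1e9, &quot;ns&quot;)
&lt;/code&gt;&lt;/pre&gt;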
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Operation&lt;/th&gt;&lt;th&gt;SingleThreadCounter&lt;/th&gt;&lt;th&gt;FastReadCounter&lt;/th&gt;&lt;th&gt;FastWriteCounter&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;increment&lt;/code&gt;&lt;/td&gt;&lt;td&gt;176 ns&lt;/td&gt;&lt;td&gt;390 ns&lt;/td&gt;&lt;td&gt;169 ns&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;value&lt;/code&gt;&lt;/td&gt;&lt;td&gt;26 ns&lt;/td&gt;&lt;td&gt;26 ns&lt;/td&gt;&lt;td&gt;529 ns&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I&apos;m glad that the performance measurements in practice match the theory 😅. Both &lt;code&gt;SingleThreadCounter&lt;/code&gt; and &lt;code&gt;FastReadCounter&lt;/code&gt; have the same performance for reading. Since they use a simple variable read, it makes absolute sense.&lt;/p&gt;
&lt;p&gt;The same goes for &lt;code&gt;SingleThreadCounter&lt;/code&gt; and &lt;code&gt;FastWriteCounter&lt;/code&gt;, which have the same performance for incrementing the counter. Again, they&apos;re using the same kind of lock-free code to add 1 to an integer, making the code fast.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;It&apos;s pretty obvious, but if you&apos;re using a single-threaded application and do not have to care about concurrent access, you should stick to using a simple incremented integer.&lt;/p&gt;
&lt;p&gt;For fun, I&apos;ve published a Python package named &lt;a href=&quot;https://pypi.org/project/fastcounter/&quot;&gt;fastcounter&lt;/a&gt; that provides those classes. The &lt;a href=&quot;https://github.com/jd/fastcounter&quot;&gt;sources are available on GitHub&lt;/a&gt;. Enjoy!&lt;/p&gt;
</content:encoded><category>python</category><category>coding</category></item><item><title>Finding definitions from a source file and a line number in Python</title><link>https://julien.danjou.info/blog/finding-definitions-from-a-source-file-and-a-line-number-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/finding-definitions-from-a-source-file-and-a-line-number-in-python/</guid><description>My job at Datadog keeps me busy with new and questioning challenges. I recently stumbled upon a problem that sounded easy but was more difficult than I imagined.</description><pubDate>Mon, 04 Nov 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;My job at &lt;a href=&quot;https://datadog.com&quot;&gt;Datadog&lt;/a&gt; keeps me busy with new and questioning challenges. I recently stumbled upon a problem that sounded easy but was more difficult than I imagined.&lt;/p&gt;
&lt;p&gt;Here&apos;s the thing: considering a filename and a line number, can you tell which function, method or class this line of code belongs to?&lt;/p&gt;
&lt;p&gt;I started to dig into the standard library, but I did not find anything solving this problem. It sounded like I had to write this myself.&lt;/p&gt;
&lt;p&gt;The first steps sound easy. Open a file, read it, find the line number. Right.&lt;/p&gt;
&lt;p&gt;Then, how do you know which function this line is in? You don&apos;t, except if you parse the whole file and keep track of function definitions. Could a regular expression parsing each line be a solution?&lt;/p&gt;
&lt;p&gt;Well, you&apos;d have to be careful, as function definitions can span multiple lines.&lt;/p&gt;
&lt;h2&gt;Using the AST&lt;/h2&gt;
&lt;p&gt;I decided that a good and robust strategy was not to use manual parsing or the like, but to use Python&apos;s abstract syntax tree (AST) directly. By leveraging Python&apos;s own parsing code, I was sure I was not going to fail while parsing a Python source file.&lt;/p&gt;
&lt;p&gt;This can simply be accomplished with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import ast

def parse_file(filename):
    with open(filename) as f:
        return ast.parse(f.read(), filename=filename)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And you&apos;re done. Are you? No, because that only works in 99.99% of cases. If your source file is using an encoding that is not ASCII or UTF-8, then the function fails. I know you think I&apos;m crazy to think about this, but I like my code to be robust.&lt;/p&gt;
&lt;p&gt;It turns out Python has an encoding cookie to specify the encoding, in the form of &lt;code&gt;# encoding: utf-8&lt;/code&gt;, as defined in &lt;a href=&quot;https://www.python.org/dev/peps/pep-0263/&quot;&gt;PEP 263&lt;/a&gt;. Reading this cookie would help to find the encoding.&lt;/p&gt;
&lt;p&gt;To do that, we need to open the file in binary mode, use a regular expression to match the data, and… Well, it&apos;s dull, and somebody already implemented it for us, so let&apos;s use the fantastic &lt;a href=&quot;https://docs.python.org/3/library/tokenize.html#tokenize.open&quot;&gt;&lt;code&gt;tokenize.open&lt;/code&gt;&lt;/a&gt; function provided by Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import ast
import tokenize

def parse_file(filename):
    with tokenize.open(filename) as f:
        return ast.parse(f.read(), filename=filename)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That should work 100% of the time. Until proven otherwise.&lt;/p&gt;
&lt;h2&gt;Browsing the AST&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;parse_file&lt;/code&gt; function now returns a Python AST. If you never played with Python AST, it&apos;s a gigantic tree that represents your source code just before it is compiled down to Python bytecode.&lt;/p&gt;
&lt;p&gt;In the tree, there are statements and expressions. In our case, we&apos;re interested in finding the function definition that is the closest to our line number. Here&apos;s an implementation of that function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def filename_and_lineno_to_def(filename, lineno):
    candidate = None
    for item in ast.walk(parse_file(filename)):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if item.lineno &amp;gt; lineno:
                # Ignore whatever is after our line
                continue
            if candidate:
                distance = lineno - item.lineno
                if distance &amp;lt; (lineno - candidate.lineno):
                    candidate = item
            else:
                candidate = item

    if candidate:
        return candidate.name
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This iterates over all the nodes of the AST and keeps the closest definition that starts at or before our line number. If we have a file that contains:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class A(object):
    X = 1
    def y(self):
        return 42
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;the function &lt;code&gt;filename_and_lineno_to_def&lt;/code&gt; returns the following for lines 1 to 5 (reconstructed by tracing the algorithm above):&lt;/p&gt;
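&lt;pre&gt;&lt;code&gt;1: A
2: A
3: y
4: y
5: y
&lt;/code&gt;&lt;/pre&gt;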
&lt;p&gt;It works!&lt;/p&gt;
&lt;h2&gt;Closures?&lt;/h2&gt;
&lt;p&gt;The naive approach described earlier likely works for 90% of your code, but there are some edge cases. For example, when defining function closures, the above algorithm fails. With the following code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class A(object):
   X = 1
   def y(self):
       def foo():
           return 42
       return foo
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;the function &lt;code&gt;filename_and_lineno_to_def&lt;/code&gt; returns the following for lines 1 to 7 (again, reconstructed by tracing the algorithm):&lt;/p&gt;
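&lt;pre&gt;&lt;code&gt;1: A
2: A
3: y
4: foo
5: foo
6: foo
7: foo
&lt;/code&gt;&lt;/pre&gt;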
&lt;p&gt;Oops. Clearly, lines 6 and 7 do not belong to the &lt;code&gt;foo&lt;/code&gt; function. Our approach is too naive to see that starting at line 6, we&apos;re back in the &lt;code&gt;y&lt;/code&gt; method.&lt;/p&gt;
&lt;h2&gt;Interval Trees&lt;/h2&gt;
&lt;p&gt;The correct way of handling that is to consider each function definition as an interval:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/06/interval-tree.png&quot; alt=&quot;Piece of code seen as interval.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Whatever line number we request, we should return the node responsible for the smallest interval containing that line.&lt;/p&gt;
&lt;p&gt;What we need in this case is the correct data structure to solve our problem: an &lt;a href=&quot;https://en.wikipedia.org/wiki/Interval_tree&quot;&gt;interval tree&lt;/a&gt; fits our use case perfectly. It allows rapidly searching for the pieces of code that contain our line number.&lt;/p&gt;
&lt;p&gt;To solve our problem we need several things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A way to compute the beginning and end line numbers for a function.&lt;/li&gt;
&lt;li&gt;A tree that is fed with the intervals we computed just before.&lt;/li&gt;
&lt;li&gt;A way to select the best matching intervals if a line is part of several functions (closure).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Computing Function Intervals&lt;/h2&gt;
&lt;p&gt;The interval of a function is the first and last lines that compose its body. It&apos;s pretty easy to find those by walking through the function AST node:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def _compute_interval(node):
    min_lineno = node.lineno
    max_lineno = node.lineno
    # Walk the whole subtree to find the first and last lines it spans
    for child in ast.walk(node):
        if hasattr(child, &quot;lineno&quot;):
            min_lineno = min(min_lineno, child.lineno)
            max_lineno = max(max_lineno, child.lineno)
    return (min_lineno, max_lineno + 1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Given any AST node, the function returns a tuple of the first and last line number of that node.&lt;/p&gt;
&lt;h2&gt;Building The Tree&lt;/h2&gt;
&lt;p&gt;Rather than implementing an interval tree ourselves, we&apos;ll use the &lt;a href=&quot;https://pypi.org/project/intervaltree/&quot;&gt;intervaltree&lt;/a&gt; library. We need to create a tree and feed it with the computed intervals:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import ast
import tokenize

import intervaltree

def file_to_tree(filename):
    with tokenize.open(filename) as f:
        parsed = ast.parse(f.read(), filename=filename)
    tree = intervaltree.IntervalTree()
    for node in ast.walk(parsed):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = _compute_interval(node)
            tree[start:end] = node
    return tree
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here you go: the function parses the Python file passed as an argument and converts it to its AST representation. It then walks it and feeds the interval tree with every class and function definition.&lt;/p&gt;
&lt;h2&gt;Querying the Tree&lt;/h2&gt;
&lt;p&gt;Now that the tree is built, it should be queried with the line number. This is pretty simple:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def filename_and_lineno_to_def(filename, lineno):
    matches = file_to_tree(filename)[lineno]
    if matches:
        return min(matches, key=lambda i: i.length()).data.name
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The built tree might return several matches if several intervals contain our line number. In that case, we pick the smallest interval and return the name of its node: our class or function name!&lt;/p&gt;
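&lt;p&gt;As a quick check, assuming the closure example from earlier is saved in a file named &lt;code&gt;example.py&lt;/code&gt; (a name picked for illustration), the interval-tree version now resolves line 6 correctly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;print(filename_and_lineno_to_def(&quot;example.py&quot;, 6))
## prints &apos;y&apos;, not &apos;foo&apos;
&lt;/code&gt;&lt;/pre&gt;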
&lt;h2&gt;Mission Success&lt;/h2&gt;
&lt;p&gt;We did it! We started with a naive approach and iterated to a final solution covering 100% of our cases. Picking the right data structure, interval trees here, helped us solve this problem elegantly.&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Sending Emails in Python — Tutorial with Code Examples</title><link>https://julien.danjou.info/blog/sending-emails-in-python-tutorial-code-examples/</link><guid isPermaLink="true">https://julien.danjou.info/blog/sending-emails-in-python-tutorial-code-examples/</guid><description>What do you need to send an email with Python? Some basic programming and web knowledge along with the elementary Python skills. I assume you’ve already had a web app built with this language and now</description><pubDate>Tue, 15 Oct 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;What do you need to send an email with Python? Some basic programming and web knowledge along with elementary Python skills. I assume you already have a web app built with this language, and now you need to extend its functionality with notifications or other email sending. This tutorial will guide you through the most essential steps of sending emails via an SMTP server:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configuring a server for testing (do you know why it’s important?)&lt;/li&gt;
&lt;li&gt;Local SMTP server&lt;/li&gt;
&lt;li&gt;Mailtrap test SMTP server&lt;/li&gt;
&lt;li&gt;Different types of emails: HTML, with images, and attachments&lt;/li&gt;
&lt;li&gt;Sending multiple personalized emails (Python is just invaluable for email automation)&lt;/li&gt;
&lt;li&gt;Some popular email sending options like Gmail and transactional email services&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Served with numerous code examples written and tested on Python 3.7!&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Sending an email using an SMTP&lt;/h3&gt;
&lt;p&gt;The first good news about Python is that it has a built-in module for sending emails via SMTP in its standard library. No extra installations or tricks are required. You can import the module using the following statement:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import smtplib
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To make sure that the module has been imported properly and get the full description of its classes and arguments, type in an interactive Python session:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;help(smtplib)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At our next step, we will talk a bit about servers: choosing the right option and configuring it.&lt;/p&gt;
&lt;h4&gt;An SMTP server for testing emails in Python&lt;/h4&gt;
&lt;p&gt;When creating a new app or adding any functionality, especially when doing it for the first time, it’s essential to experiment on a test server. Here is a brief list of reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You won’t hit your friends’ and customers’ inboxes. This is vital when you test bulk email sending or work with an email database.&lt;/li&gt;
&lt;li&gt;You won’t flood your own inbox with testing emails.&lt;/li&gt;
&lt;li&gt;Your domain won’t be blacklisted for spam.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;Local SMTP server&lt;/h4&gt;
&lt;p&gt;If you prefer working in the local environment, the local SMTP debugging server might be an option. For this purpose, Python offers an &lt;em&gt;smtpd&lt;/em&gt; module. It has a &lt;code&gt;DebuggingServer&lt;/code&gt; feature, which will discard messages you are sending out and will print them to &lt;code&gt;stdout&lt;/code&gt;. It is compatible with all operating systems.&lt;/p&gt;
&lt;p&gt;Set your SMTP server to &lt;em&gt;localhost:1025&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python -m smtpd -n -c DebuggingServer localhost:1025
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In order to run SMTP server on port 25, you’ll need root permissions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo python -m smtpd -n -c DebuggingServer localhost:25
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It will help you verify whether your code is working and point out the possible problems if there are any. However, it won’t give you the opportunity to check how your HTML email template is rendered.&lt;/p&gt;
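&lt;p&gt;A minimal sketch of sending a message to it, assuming the &lt;code&gt;DebuggingServer&lt;/code&gt; above is running on port 1025 (the addresses are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import smtplib

sender = &quot;from@example.com&quot;
receiver = &quot;to@example.com&quot;
message = f&quot;&quot;&quot;\
Subject: Local test
To: {receiver}
From: {sender}

This message is printed to stdout, not delivered.&quot;&quot;&quot;

## The local DebuggingServer requires no authentication
with smtplib.SMTP(&quot;localhost&quot;, 1025) as server:
  server.sendmail(sender, receiver, message)
&lt;/code&gt;&lt;/pre&gt;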
&lt;h4&gt;Fake SMTP server&lt;/h4&gt;
&lt;p&gt;A fake SMTP server imitates the work of a real 3rd-party web server. In further examples in this post, we will use &lt;a href=&quot;https://mailtrap.io&quot;&gt;Mailtrap&lt;/a&gt;. Beyond testing email sending, it will let us check how the email will be rendered and displayed, review the raw message data, and provide us with a spam report. Mailtrap is very easy to set up: you just need to copy the credentials generated by the app and paste them into your code.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/xBVM7uyt4Q6mtpLLTiCBze9lNV-dpkO2rMBLSazZ9gb8LImFDgzZWVIOTCtke87LBixqrsJF-pii7usO3ezPbgjOWGRj7isa_ap2-EXK5GiHmSz4mtwenUIi-f_s05CfxQJoHGvl&quot; alt=&quot;Screenshot of Mailtrap fake SMTP server setup interface&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Here is how it looks in practice:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import smtplib

port = 2525
smtp_server = &quot;smtp.mailtrap.io&quot;
login = &quot;1a2b3c4d5e6f7g&quot; # your login generated by Mailtrap
password = &quot;1a2b3c4d5e6f7g&quot; # your password generated by Mailtrap
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Mailtrap makes things even easier. Go to the &lt;em&gt;Integrations&lt;/em&gt; section in the SMTP settings tab and get the ready-to-use template of the simple message, with your Mailtrap credentials in it. The most basic option for instructing your Python script on who sends what to whom is the &lt;em&gt;sendmail()&lt;/em&gt; instance method:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/eKUJ__R4SYnY5jdvPiPPucHnoaOMBUxHZIu0DT2NjnTMU2FhvObBzqVN-qgCOTeSIm7yc_ifUAe5a0RofkbNdxOqNrzAw1icea4c9WIyb6NGk8KMmIvctLgUPlblmzFMSeaRnbGQ&quot; alt=&quot;Screenshot of Mailtrap integration settings with Python SMTP credentials&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The code looks pretty straightforward, right? Let’s take a closer look at it and add some error handling (see the comments in between). To catch errors, we use the &lt;code&gt;try&lt;/code&gt; and &lt;code&gt;except&lt;/code&gt; blocks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## The first step is always the same: import all necessary components:
import smtplib
from socket import gaierror

## Now you can play with your code. Let’s define the SMTP server separately here:
port = 2525
smtp_server = &quot;smtp.mailtrap.io&quot;
login = &quot;1a2b3c4d5e6f7g&quot; # paste your login generated by Mailtrap
password = &quot;1a2b3c4d5e6f7g&quot; # paste your password generated by Mailtrap

## Specify the sender’s and receiver’s email addresses:
sender = &quot;from@example.com&quot;
receiver = &quot;mailtrap@example.com&quot;

## Type your message: use two newlines (\n) to separate the subject from the message body, and use &apos;f&apos; to  automatically insert variables in the text
message = f&quot;&quot;&quot;\
Subject: Hi Mailtrap
To: {receiver}
From: {sender}

This is my first message with Python.&quot;&quot;&quot;

try:
  # Send your message with credentials specified above
  with smtplib.SMTP(smtp_server, port) as server:
    server.login(login, password)
    server.sendmail(sender, receiver, message)
except (gaierror, ConnectionRefusedError):
  # tell the script to report if your message was sent or which errors need to be fixed
  print(&apos;Failed to connect to the server. Bad connection settings?&apos;)
except smtplib.SMTPServerDisconnected:
  print(&apos;Failed to connect to the server. Wrong user/password?&apos;)
except smtplib.SMTPException as e:
  print(&apos;SMTP error occurred: &apos; + str(e))
else:
  print(&apos;Sent&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once you get the &lt;em&gt;Sent&lt;/em&gt; result in Shell, you should see your message in your Mailtrap inbox:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/xCCQOuWFyqmvbiOaLa7VgYyBdCu5c2q5oXzyn2aeFFE8tkfbUvDwi_H19fSNAempeUWIoDuHVn5ETqr34lO8WkT8vZh8iJVChjnCZgoAA3TsTJF2n32sGUl1GX89WcYUdChJZ2Ux&quot; alt=&quot;Screenshot of a test email received in the Mailtrap inbox&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Sending emails with HTML content&lt;/h3&gt;
&lt;p&gt;In most cases, you need to add some formatting, links, or images to your email notifications. We can simply put all of these with the HTML content. For this purpose, Python has an &lt;em&gt;email&lt;/em&gt; package.&lt;/p&gt;
&lt;p&gt;We will deal with the MIME message type, which is able to combine HTML and plain text. In Python, it is handled by the &lt;em&gt;email.mime&lt;/em&gt; module.&lt;/p&gt;
&lt;p&gt;It is better to write a text version and an HTML version separately, and then merge them with the &lt;code&gt;MIMEMultipart(&quot;alternative&quot;)&lt;/code&gt; instance. This means that such a message has two rendering options. If the HTML isn’t rendered successfully for some reason, the text version will still be available.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

port = 2525
smtp_server = &quot;smtp.mailtrap.io&quot;
login = &quot;1a2b3c4d5e6f7g&quot; # paste your login generated by Mailtrap
password = &quot;1a2b3c4d5e6f7g&quot; # paste your password generated by Mailtrap

sender_email = &quot;mailtrap@example.com&quot;
receiver_email = &quot;new@example.com&quot;

message = MIMEMultipart(&quot;alternative&quot;)
message[&quot;Subject&quot;] = &quot;multipart test&quot;
message[&quot;From&quot;] = sender_email
message[&quot;To&quot;] = receiver_email
## Write the plain text part
text = &quot;&quot;&quot;\
Hi,
Check out the new post on the Mailtrap blog:
SMTP Server for Testing: Cloud-based or Local?
https://blog.mailtrap.io/2018/09/27/cloud-or-local-smtp-server/
Feel free to let us know what content would be useful for you!&quot;&quot;&quot;

## write the HTML part
html = &quot;&quot;&quot;\
&amp;lt;html&amp;gt;
  &amp;lt;body&amp;gt;
    &amp;lt;p&amp;gt;Hi,&amp;lt;br&amp;gt;
    Check out the new post on the Mailtrap blog:&amp;lt;/p&amp;gt;
    &amp;lt;p&amp;gt;&amp;lt;a href=&quot;https://blog.mailtrap.io/2018/09/27/cloud-or-local-smtp-server&quot;&amp;gt;SMTP Server for Testing: Cloud-based or Local?&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;
    &amp;lt;p&amp;gt;Feel free to &amp;lt;strong&amp;gt;let us&amp;lt;/strong&amp;gt; know what content would be useful for you!&amp;lt;/p&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&quot;&quot;&quot;

## convert both parts to MIMEText objects and add them to the MIMEMultipart message
part1 = MIMEText(text, &quot;plain&quot;)
part2 = MIMEText(html, &quot;html&quot;)
message.attach(part1)
message.attach(part2)

## send your email
with smtplib.SMTP(&quot;smtp.mailtrap.io&quot;, 2525) as server:
  server.login(login, password)
  server.sendmail( sender_email, receiver_email, message.as_string() )

print(&apos;Sent&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/jRh9dfieiWa1JAH1o5eb62Pv4wPUgIPGWpyz5RJkcFaflS-JnWJ7nQfdkr5hp87iOoDT-dx9WyvPwngJsvQnMoe9iKqa7jg6hDklFOxaLeftGqNp8MgtE8YDS13UmLLkBeee5cPT&quot; alt=&quot;The resulting output&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Sending Emails with Attachments in Python&lt;/h3&gt;
&lt;p&gt;The next step in mastering sending emails with Python is attaching files. Attachments are still the MIME objects but we need to encode them with the &lt;em&gt;base64&lt;/em&gt; module. A couple of important points about the attachments:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Python lets you attach text files, images, audio files, and even applications. You just need to use the appropriate email class like &lt;code&gt;email.mime.audio.MIMEAudio&lt;/code&gt; or &lt;code&gt;email.mime.image.MIMEImage&lt;/code&gt;&lt;em&gt;.&lt;/em&gt; For the full information, refer to &lt;a href=&quot;https://docs.python.org/3/library/email.mime.html&quot;&gt;this section&lt;/a&gt; of the Python documentation.&lt;/li&gt;
&lt;li&gt;Remember about the file size: sending files over 20MB is a bad practice.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In transactional emails, the PDF files are the most frequently used: we usually get receipts, tickets, boarding passes, order confirmations, etc. So let’s review how to send a boarding pass as a PDF file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

port = 2525
smtp_server = &quot;smtp.mailtrap.io&quot;
login = &quot;1a2b3c4d5e6f7g&quot; # paste your login generated by Mailtrap
password = &quot;1a2b3c4d5e6f7g&quot; # paste your password generated by Mailtrap

subject = &quot;An example of boarding pass&quot;
sender_email = &quot;mailtrap@example.com&quot;
receiver_email = &quot;new@example.com&quot;

message = MIMEMultipart()
message[&quot;From&quot;] = sender_email
message[&quot;To&quot;] = receiver_email
message[&quot;Subject&quot;] = subject

## Add body to email
body = &quot;This is an example of how you can send a boarding pass in attachment with Python&quot;
message.attach(MIMEText(body, &quot;plain&quot;))

filename = &quot;yourBP.pdf&quot;
## Open the PDF file in binary mode
## We assume that the file is in the directory where you run your Python script from
with open(filename, &quot;rb&quot;) as attachment:
  ## The content type &quot;application/octet-stream&quot; means that a MIME attachment is a binary file
  part = MIMEBase(&quot;application&quot;, &quot;octet-stream&quot;)
  part.set_payload(attachment.read())

## Encode to base64
encoders.encode_base64(part)
## Add header
part.add_header(&quot;Content-Disposition&quot;, f&quot;attachment; filename={filename}&quot;)
## Add attachment to your message and convert it to string
message.attach(part)

text = message.as_string()
## send your email
with smtplib.SMTP(&quot;smtp.mailtrap.io&quot;, 2525) as server:
  server.login(login, password)
  server.sendmail(sender_email, receiver_email, text)

print(&apos;Sent&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/xxqg_Ro8uggpJxjCKMmukQ2jJmwDeXasadM5HA0LeOUktOPYc-0iXp2xQZHkILyfdFWroJEz-UqgTr_zBEKISuydHmoqCAPrvikrC23VgCDawHBVH-9-ufmmfF556nsU-1vPJ2Ng&quot; alt=&quot;The received email with your PDF&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To attach several files&lt;/strong&gt;, you can call the &lt;code&gt;message.attach()&lt;/code&gt; method several times.&lt;/p&gt;
&lt;h4&gt;How to send an email with image attachment&lt;/h4&gt;
&lt;p&gt;Images, even if they are a part of the message body, are attachments as well. There are three types of them: CID attachments (embedded as a MIME object), &lt;em&gt;base64&lt;/em&gt; images (inline embedding), and linked images.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For adding a CID attachment,&lt;/strong&gt; we will create a MIME multipart message with &lt;code&gt;MIMEImage&lt;/code&gt; component:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import smtplib
from email.mime.text import MIMEText
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart

port = 2525
smtp_server = &quot;smtp.mailtrap.io&quot;
login = &quot;1a2b3c4d5e6f7g&quot; # paste your login generated by Mailtrap
password = &quot;1a2b3c4d5e6f7g&quot; # paste your password generated by Mailtrap

sender_email = &quot;mailtrap@example.com&quot;
receiver_email = &quot;new@example.com&quot;

message = MIMEMultipart(&quot;alternative&quot;)
message[&quot;Subject&quot;] = &quot;CID image test&quot;
message[&quot;From&quot;] = sender_email
message[&quot;To&quot;] = receiver_email

## write the HTML part
html = &quot;&quot;&quot;\
&amp;lt;html&amp;gt;
&amp;lt;body&amp;gt;
&amp;lt;img src=&quot;cid:myimage&quot;&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&quot;&quot;&quot;
part = MIMEText(html, &quot;html&quot;)
message.attach(part)

## We assume that the image file is in the same directory that you run your Python script from
with open(&apos;mailtrap.jpg&apos;, &apos;rb&apos;) as img:
  image = MIMEImage(img.read())
## Specify the Content-ID according to the img src in the HTML part
image.add_header(&apos;Content-ID&apos;, &apos;&amp;lt;myimage&amp;gt;&apos;)
message.attach(image)

## send your email
with smtplib.SMTP(&quot;smtp.mailtrap.io&quot;, 2525) as server:
  server.login(login, password)
  server.sendmail(sender_email, receiver_email, message.as_string())

print(&apos;Sent&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/VzdSmA1lJli_ZX_m6KmJW7VW-am20z5Vr_RUxJP5ZHxC72fRImhDuZxEXV0o2mDr09JTEMzPykskHKWh1DuMLF_yoKl5eIsMiKpebmILpvYioDGfzU70hFjfxFIu-fPVZqWF7vc8&quot; alt=&quot;The received email with CID image&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The CID image is shown both as a part of the HTML message and as an attachment. Messages with this image type are often considered spam: check the &lt;em&gt;Analytics&lt;/em&gt; tab in Mailtrap to see the spam rate and recommendations on its improvement. Many email clients — Gmail in particular — don’t display CID images in most cases. So let’s review &lt;strong&gt;how to embed a base64 encoded image instead.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here we will use &lt;em&gt;base64&lt;/em&gt; module and experiment with the same image file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import base64

port = 2525
smtp_server = &quot;smtp.mailtrap.io&quot;
login = &quot;1a2b3c4d5e6f7g&quot; # paste your login generated by Mailtrap
password = &quot;1a2b3c4d5e6f7g&quot; # paste your password generated by Mailtrap
sender_email = &quot;mailtrap@example.com&quot;
receiver_email = &quot;new@example.com&quot;

message = MIMEMultipart(&quot;alternative&quot;)
message[&quot;Subject&quot;] = &quot;inline embedding&quot;
message[&quot;From&quot;] = sender_email
message[&quot;To&quot;] = receiver_email

## We assume that the image file is in the same directory that you run your Python script from
with open(&quot;image.jpg&quot;, &quot;rb&quot;) as image:
  encoded = base64.b64encode(image.read()).decode()

html = f&quot;&quot;&quot;\
&amp;lt;html&amp;gt;
&amp;lt;body&amp;gt;
&amp;lt;img src=&quot;data:image/jpeg;base64,{encoded}&quot;&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&quot;&quot;&quot;
part = MIMEText(html, &quot;html&quot;)
message.attach(part)

## send your email
with smtplib.SMTP(&quot;smtp.mailtrap.io&quot;, 2525) as server:
  server.login(login, password)
  server.sendmail(sender_email, receiver_email, message.as_string())

print(&apos;Sent&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/zMSMgzypDp3lL1o1M21RB1nr6Dcc5Tekq8ucJktZqzWHynM8-YR2I4Ze6Rp7TkHtDxmcfYMyZXe1F_5sQihWL7kwpEFmQhnCRrDhe9aPjlJ0E7FzmdNvvibUOIU2yGqqC3U3ULEl&quot; alt=&quot;A base64 encoded image&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now the image is embedded into the HTML message and is not available as an attached file. Python has encoded our JPEG image, and if we go to the &lt;em&gt;HTML Source&lt;/em&gt; tab, we will see the long image data string in the &lt;code&gt;img src&lt;/code&gt; attribute.&lt;/p&gt;
&lt;h3&gt;How to Send Multiple Emails&lt;/h3&gt;
&lt;p&gt;Sending multiple personalized emails to different recipients is where Python shines.&lt;/p&gt;
&lt;p&gt;To add several more recipients, you can just type their addresses in, separated by commas, or add &lt;code&gt;Cc&lt;/code&gt; and &lt;code&gt;Bcc&lt;/code&gt; fields. But if you work with bulk email sending, Python will save you with loops.&lt;/p&gt;
&lt;p&gt;One of the options is to create a database in a &lt;em&gt;CSV&lt;/em&gt; format (we assume it is saved to the same folder as your Python script).&lt;/p&gt;
&lt;p&gt;We often see our names in transactional or even promotional emails. Here is how we can do that with Python.&lt;/p&gt;
&lt;p&gt;Let’s organize the list in a simple table with just two columns: name and email address. It should look like the following example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#name,email
John Johnson,john@johnson.com
Peter Peterson,peter@peterson.com
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code below will open the file and loop over its rows line by line, replacing the &lt;code&gt;{name}&lt;/code&gt; with the value from the “name” column.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import csv
import smtplib

port = 2525
smtp_server = &quot;smtp.mailtrap.io&quot;
login = &quot;1a2b3c4d5e6f7g&quot; # paste your login generated by Mailtrap
password = &quot;1a2b3c4d5e6f7g&quot; # paste your password generated by Mailtrap

message = &quot;&quot;&quot;Subject: Order confirmation
To: {recipient}
From: {sender}

Hi {name}, thanks for your order! We are processing it now and will contact you soon&quot;&quot;&quot;
sender = &quot;new@example.com&quot;
with smtplib.SMTP(&quot;smtp.mailtrap.io&quot;, 2525) as server:
  server.login(login, password)
  with open(&quot;contacts.csv&quot;) as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    for name, email in reader:
      server.sendmail(
        sender,
        email,
        message.format(name=name, recipient=email, sender=sender),
      )
      print(f&apos;Sent to {name}&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In our Mailtrap inbox, we see two messages: one for John Johnson and another for Peter Peterson, delivered simultaneously:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/Q6fRy7tMexzLqSsfAwEDdkZVh7Onb4impsLJkqLs40HsuVo43JV0eAUjJiWvxf-L0t9vdoTgEfeiN3MYX0wBU0vUKVZCRbmstlHk2RqvQWnPqr9WJbMX7LciUO9ebj89B5UZLrLd&quot; alt=&quot;Screenshot of Mailtrap inbox showing two emails sent to different recipients&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Sending emails with Python via Gmail&lt;/h3&gt;
&lt;p&gt;When you are ready for sending emails to real recipients, you can configure your production server. It also depends on your needs, goals, and preferences: your localhost or any external SMTP.&lt;/p&gt;
&lt;p&gt;One of the most popular options is Gmail so let’s take a closer look at it.&lt;/p&gt;
&lt;p&gt;We can often see titles like “How to set up a Gmail account for development”. In fact, it means that you will create a new Gmail account and will use it for a particular purpose.&lt;/p&gt;
&lt;p&gt;To be able to send emails via your Gmail account, you need to provide access to it for your application. You can &lt;a href=&quot;https://myaccount.google.com/lesssecureapps&quot;&gt;&lt;em&gt;Allow less secure apps&lt;/em&gt;&lt;/a&gt; or take advantage of the &lt;a href=&quot;https://developers.google.com/gmail/api/quickstart/python&quot;&gt;OAuth2 authorization protocol&lt;/a&gt;. The latter is way more difficult but recommended for security reasons.&lt;/p&gt;
&lt;p&gt;Further, to use a Gmail server, you need to know:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the server name = &lt;em&gt;smtp.gmail.com&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;port = &lt;em&gt;465 for SSL/TLS&lt;/em&gt; connection (preferred)&lt;/li&gt;
&lt;li&gt;or port = &lt;em&gt;587 for STARTTLS&lt;/em&gt; connection&lt;/li&gt;
&lt;li&gt;username = your Gmail email address&lt;/li&gt;
&lt;li&gt;password = your password&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;import smtplib
import ssl

port = 465
password = input(&quot;your password&quot;)
context = ssl.create_default_context()

with smtplib.SMTP_SSL(&quot;smtp.gmail.com&quot;, port, context=context) as server:
  server.login(&quot;my@gmail.com&quot;, password)
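  ## Connected and authenticated; from here you can send messages
  ## with server.sendmail(...) exactly as in the earlier examples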
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you prefer simplicity, you can use &lt;a href=&quot;https://pypi.org/project/yagmail/&quot;&gt;Yagmail&lt;/a&gt;, a dedicated Gmail/SMTP client. It makes email sending really easy. Just compare the above examples with these several lines of code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import yagmail

yag = yagmail.SMTP()
contents = [
  &quot;This is the body, and here is just text http://somedomain/image.png&quot;,
  &quot;You can find an audio file attached.&quot;,
  &apos;/local/path/to/song.mp3&apos;,
]
yag.send(&apos;to@someone.com&apos;, &apos;subject&apos;, contents)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Next steps with Python&lt;/h3&gt;
&lt;p&gt;Those are just basic options of sending emails with Python. To get great results, review the Python documentation and experiment with your own code!&lt;/p&gt;
&lt;p&gt;There are a bunch of Python frameworks and libraries which can make creating apps more elegant and, in particular, improve your experience building email-sending functionality. The most popular are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Flask, which offers a simple interface for email sending: Flask Mail.&lt;/li&gt;
&lt;li&gt;Django, which can be a great option for building HTML templates.&lt;/li&gt;
&lt;li&gt;Zope comes in handy for a website development.&lt;/li&gt;
&lt;li&gt;Marrow Mailer is a dedicated mail delivery framework adding various helpful configurations.&lt;/li&gt;
&lt;li&gt;Plotly and its Dash can help with mailing graphs and reports.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Also, here is a &lt;a href=&quot;https://awesome-python.com/&quot;&gt;handy list&lt;/a&gt; of Python resources sorted by their functionality.&lt;/p&gt;
&lt;p&gt;Good luck and don’t forget to stay on the safe side when sending your emails!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published at Mailtrap’s blog: &lt;a href=&quot;https://blog.mailtrap.io/sending-emails-in-python-tutorial-with-code-examples/&quot;&gt;Sending emails with Python&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>python</category><category>email</category></item><item><title>Python and fast HTTP clients</title><link>https://julien.danjou.info/blog/python-and-fast-http-clients/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-and-fast-http-clients/</guid><description>Nowadays, it is more than likely that you will have to write an HTTP client for your application that will have to talk to another HTTP server. The ubiquity of REST API makes HTTP a first class citize</description><pubDate>Mon, 07 Oct 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Nowadays, it is more than likely that you will have to write an HTTP client for your application that will have to talk to another HTTP server. The ubiquity of REST APIs makes HTTP a first-class citizen. That&apos;s why knowing optimization patterns is a prerequisite.&lt;/p&gt;
&lt;p&gt;There are many HTTP clients in Python; the most widely used and easiest to work with is &lt;em&gt;&lt;a href=&quot;https://requests.kennethreitz.org/&quot;&gt;requests&lt;/a&gt;&lt;/em&gt;. It is the de-facto standard nowadays.&lt;/p&gt;
&lt;h2&gt;Persistent Connections&lt;/h2&gt;
&lt;p&gt;The first optimization to take into account is the use of a persistent connection to the Web server. Persistent connections are a standard since HTTP 1.1 though many applications do not leverage them. This lack of optimization is simple to explain if you know that when using &lt;em&gt;requests&lt;/em&gt; in its simple mode (e.g. with the &lt;code&gt;get&lt;/code&gt; function) the connection is closed on return. To avoid that, an application needs to use a &lt;code&gt;Session&lt;/code&gt; object that allows reusing an already opened connection.&lt;/p&gt;
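&lt;p&gt;A minimal sketch of the difference (the URL is just a placeholder):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import requests

session = requests.Session()

## The underlying TCP connection is established once and then
## reused for the second request
session.get(&quot;http://example.org&quot;)
session.get(&quot;http://example.org&quot;)
&lt;/code&gt;&lt;/pre&gt;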
&lt;p&gt;Each connection is stored in a pool of connections (10 by default), the size of which is also configurable.&lt;/p&gt;
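&lt;p&gt;For example, here is a sketch of raising the pool size (the &lt;code&gt;pool_connections&lt;/code&gt; and &lt;code&gt;pool_maxsize&lt;/code&gt; parameters come from &lt;code&gt;requests.adapters.HTTPAdapter&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import requests

session = requests.Session()
## Keep up to 100 connection pools with up to 100 connections each
adapter = requests.adapters.HTTPAdapter(pool_connections=100,
                                        pool_maxsize=100)
session.mount(&quot;http://&quot;, adapter)
&lt;/code&gt;&lt;/pre&gt;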
&lt;p&gt;Reusing the TCP connection to send out several HTTP requests offers a number of performance advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lower CPU and memory usage (fewer connections opened simultaneously).&lt;/li&gt;
&lt;li&gt;Reduced latency in subsequent requests (no TCP handshaking).&lt;/li&gt;
&lt;li&gt;Exceptions can be raised without the penalty of closing the TCP connection.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The HTTP protocol also provides &lt;a href=&quot;https://en.wikipedia.org/wiki/HTTP_pipelining&quot;&gt;pipelining&lt;/a&gt;, which allows sending several requests on the same connection without waiting for the replies to come (think batch). Unfortunately, this is not supported by the &lt;em&gt;requests&lt;/em&gt; library. However, pipelining requests may not be as fast as sending them in parallel. Indeed, the HTTP 1.1 protocol forces the replies to be sent in the same order as the requests were sent – first-in first-out.&lt;/p&gt;
&lt;h2&gt;Parallelism&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;requests&lt;/em&gt; also has one major drawback: it is synchronous. Calling &lt;code&gt;requests.get(&quot;http://example.org&quot;)&lt;/code&gt; blocks the program until the HTTP server replies completely. Having the application waiting and doing nothing can be a drawback here. It is possible that the program could do something else rather than sitting idle.&lt;/p&gt;
&lt;p&gt;A smart application can mitigate this problem by using a pool of threads like the ones provided by &lt;code&gt;concurrent.futures&lt;/code&gt;. It allows parallelizing the HTTP requests in a very rapid way.&lt;/p&gt;
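&lt;p&gt;A short sketch of this approach (the worker count and URL are arbitrary):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from concurrent import futures

import requests

URL = &quot;http://example.org&quot;

## Each request runs in its own thread; result() blocks until done
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    pending = [executor.submit(requests.get, URL) for _ in range(8)]
    results = [f.result() for f in pending]
&lt;/code&gt;&lt;/pre&gt;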
&lt;p&gt;This pattern being quite useful, it has been packaged into a library named &lt;em&gt;&lt;a href=&quot;https://github.com/ross/requests-futures&quot;&gt;requests-futures&lt;/a&gt;&lt;/em&gt;. The usage of &lt;code&gt;Session&lt;/code&gt; objects is made transparent to the developer:&lt;/p&gt;
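&lt;p&gt;A minimal example of its usage (eight requests to a placeholder URL):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from requests_futures import sessions

session = sessions.FuturesSession()

## get() returns concurrent.futures.Future objects immediately
futures = [session.get(&quot;http://example.org&quot;) for _ in range(8)]
results = [f.result() for f in futures]
&lt;/code&gt;&lt;/pre&gt;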
&lt;p&gt;By default an executor with two threads is created, but a program can easily customize this value by passing the &lt;code&gt;max_workers&lt;/code&gt; argument or even its own executor to the &lt;code&gt;FuturesSession&lt;/code&gt; object – for example like this: &lt;code&gt;FuturesSession(executor=ThreadPoolExecutor(max_workers=10))&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Asynchronicity&lt;/h2&gt;
&lt;p&gt;As explained earlier, &lt;em&gt;requests&lt;/em&gt; is entirely synchronous. That blocks the application while waiting for the server to reply, slowing down the program. Making HTTP requests in threads is one solution, but threads do have their own overhead and this implies parallelism, which is not something everyone is always glad to see in a program.&lt;/p&gt;
&lt;p&gt;Starting with version 3.5, Python offers asynchronicity at its core using &lt;em&gt;asyncio&lt;/em&gt;. The &lt;a href=&quot;https://aiohttp.readthedocs.io/&quot;&gt;aiohttp&lt;/a&gt; library provides an asynchronous HTTP client built on top of &lt;em&gt;asyncio&lt;/em&gt;. This library allows sending requests in series but without waiting for the first reply to come back before sending the new one. In contrast to HTTP pipelining, &lt;em&gt;aiohttp&lt;/em&gt; sends the requests over multiple connections in parallel, avoiding the ordering issue explained earlier.&lt;/p&gt;
&lt;p&gt;All those solutions (using &lt;code&gt;Session&lt;/code&gt;, &lt;em&gt;threads&lt;/em&gt;, &lt;em&gt;futures&lt;/em&gt; or &lt;em&gt;asyncio&lt;/em&gt;) offer different approaches to making HTTP clients faster.&lt;/p&gt;
&lt;h2&gt;Performances&lt;/h2&gt;
&lt;p&gt;The snippet below is an HTTP client sending requests to &lt;code&gt;httpbin.org&lt;/code&gt;, an HTTP API that provides (among other things) an endpoint simulating a long request (a second here). This example implements all the techniques listed above and times them. A reconstruction along those lines (assuming &lt;em&gt;requests-futures&lt;/em&gt; and &lt;em&gt;aiohttp&lt;/em&gt; are installed):&lt;/p&gt;
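&lt;pre&gt;&lt;code&gt;import asyncio
import contextlib
import time

import aiohttp
import requests
from requests_futures import sessions

URL = &quot;http://httpbin.org/delay/1&quot;
TRIES = 10


@contextlib.contextmanager
def report_time(test):
    t0 = time.time()
    yield
    print(&quot;Time needed for `%s&apos; called: %.2fs&quot; % (test, time.time() - t0))


## One request after another, a new connection every time
with report_time(&quot;serialized&quot;):
    for i in range(TRIES):
        requests.get(URL)

## Same, but reusing the TCP connection
session = requests.Session()
with report_time(&quot;Session&quot;):
    for i in range(TRIES):
        session.get(URL)

## Threaded parallelism with two workers
session = sessions.FuturesSession(max_workers=2)
with report_time(&quot;FuturesSession w/ 2 workers&quot;):
    futures = [session.get(URL) for i in range(TRIES)]
    for f in futures:
        f.result()

## Threaded parallelism with one worker per request
session = sessions.FuturesSession(max_workers=TRIES)
with report_time(&quot;FuturesSession w/ max workers&quot;):
    futures = [session.get(URL) for i in range(TRIES)]
    for f in futures:
        f.result()


async def get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            await response.read()

## asyncio-based parallelism over multiple connections
loop = asyncio.get_event_loop()
with report_time(&quot;aiohttp&quot;):
    loop.run_until_complete(
        asyncio.gather(*[get(URL) for i in range(TRIES)]))
&lt;/code&gt;&lt;/pre&gt;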
&lt;p&gt;Running this program gives the following output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Time needed for `serialized&apos; called: 12.12s
Time needed for `Session&apos; called: 11.22s
Time needed for `FuturesSession w/ 2 workers&apos; called: 5.65s
Time needed for `FuturesSession w/ max workers&apos; called: 1.25s
Time needed for `aiohttp&apos; called: 1.19s
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/07/20190716092338_hd.png&quot; alt=&quot;Benchmark chart comparing HTTP client performance in Python&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Without any surprise, the slowest result comes from the dumb serialized version, since all the requests are made one after another without reusing the connection — 12 seconds to make 10 requests.&lt;/p&gt;
&lt;p&gt;Using a &lt;code&gt;Session&lt;/code&gt; object and therefore reusing the connection means saving 8% in terms of time, which is already a big and easy win. At a minimum, you should always use a session.&lt;/p&gt;
&lt;p&gt;If your system and program allow the usage of threads, it is a good call to use them to parallelize the requests. However threads have some overhead, and they are not weight-less. They need to be created, started and then joined.&lt;/p&gt;
&lt;p&gt;Unless you are still using old versions of Python, without a doubt using &lt;em&gt;aiohttp&lt;/em&gt; should be the way to go nowadays if you want to write a fast and asynchronous HTTP client. It is the fastest and the most scalable solution as it can handle hundreds of parallel requests. The alternative, managing hundreds of threads in parallel is not a great option.&lt;/p&gt;
&lt;h2&gt;Streaming&lt;/h2&gt;
&lt;p&gt;Another speed optimization that can be efficient is streaming the requests. When making a request, by default the body of the response is downloaded immediately. The &lt;code&gt;stream&lt;/code&gt; parameter provided by the &lt;em&gt;requests&lt;/em&gt; library or the &lt;code&gt;content&lt;/code&gt; attribute for &lt;code&gt;aiohttp&lt;/code&gt; both provide a way to not load the full content in memory as soon as the request is executed.&lt;/p&gt;
&lt;p&gt;Not loading the full content is extremely important in order to avoid allocating potentially hundreds of megabytes of memory for nothing. If your program does not need to access the entire content as a whole but can work on chunks, it is probably better to just use those methods. For example, if you&apos;re going to save and write the content to a file, reading only a chunk and writing it at the same time is going to be much more memory efficient than reading the whole HTTP body, allocating a giant pile of memory, and then writing it to disk.&lt;/p&gt;
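&lt;p&gt;With &lt;em&gt;requests&lt;/em&gt;, that pattern looks like this sketch (the URL, filename, and chunk size are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import requests

## stream=True defers downloading the body until it is iterated over
with requests.get(&quot;http://example.org/big-file&quot;, stream=True) as r:
    with open(&quot;big-file&quot;, &quot;wb&quot;) as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
&lt;/code&gt;&lt;/pre&gt;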
&lt;p&gt;I hope that&apos;ll make it easier for you to write proper HTTP clients and requests. If you know any other useful technique or method, feel free to write it down in the comment section below!&lt;/p&gt;
</content:encoded><category>python</category><category>web</category></item><item><title>Dependencies Handling in Python</title><link>https://julien.danjou.info/blog/dependencies-handling-in-python-automatic-update/</link><guid isPermaLink="true">https://julien.danjou.info/blog/dependencies-handling-in-python-automatic-update/</guid><description>Dependencies are a nightmare. Here&apos;s how to handle them properly in Python with pipenv, poetry, Dependabot, and Mergify for fully automatic updates.</description><pubDate>Mon, 02 Sep 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Dependencies are a nightmare for many people. &lt;a href=&quot;https://thenewstack.io/to-reduce-tech-debt-eliminate-dependencies-and-refactoring/&quot;&gt;Some even argue they are technical debt&lt;/a&gt;. Managing the list of libraries your software depends on is a horrible experience. Updating them — automatically? — sounds like madness.&lt;/p&gt;
&lt;p&gt;Stick with me here as I am going to help you get a better grasp on something that you cannot, in practice, get rid of — unless you&apos;re incredibly rich and talented and can live without the code of others.&lt;/p&gt;
&lt;p&gt;First, we need to be clear about something regarding dependencies: there are two types of them. &lt;a href=&quot;https://caremad.io/posts/2013/07/setup-vs-requirement/&quot;&gt;Donald Stufft wrote better than I would about the subject&lt;/a&gt; years ago. To make it simple, one can say that there are two types of code packages depending on external code: applications and libraries.&lt;/p&gt;
&lt;h3&gt;Libraries Dependencies&lt;/h3&gt;
&lt;p&gt;Python libraries should specify their dependencies in a generic way. A library should not require &lt;code&gt;requests 2.1.5&lt;/code&gt;: it does not make sense. If every library out there needs a different version of &lt;code&gt;requests&lt;/code&gt;, they can&apos;t be used at the same time.&lt;/p&gt;
&lt;p&gt;Libraries need to declare dependencies based on ranges of version numbers. Requiring &lt;code&gt;requests&amp;gt;=2&lt;/code&gt; is correct. Requiring &lt;code&gt;requests&amp;gt;=1,&amp;lt;2&lt;/code&gt; is also correct if you know that &lt;code&gt;requests 2.x&lt;/code&gt; does not work with the library. The problem that your version range specification is solving is the &lt;strong&gt;API compatibility issue&lt;/strong&gt; between your code and your dependencies — &lt;em&gt;nothing else&lt;/em&gt;. That&apos;s a good reason for libraries to use &lt;a href=&quot;https://semver.org/&quot;&gt;Semantic Versioning&lt;/a&gt; whenever possible.&lt;/p&gt;
&lt;p&gt;Therefore, dependencies should be written in &lt;code&gt;setup.py&lt;/code&gt; as something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from setuptools import setup

setup(
    name=&quot;MyLibrary&quot;,
    version=&quot;1.0&quot;,
    install_requires=[
        &quot;requests&quot;,
    ],
    # ...
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This way, it is easy for any application to use the library and co-exist with others.&lt;/p&gt;
&lt;h3&gt;Applications Dependencies&lt;/h3&gt;
&lt;p&gt;An application is just a particular case of a library. Applications are not intended to be reused (imported) by other libraries or applications — though nothing would prevent it in practice.&lt;/p&gt;
&lt;p&gt;In the end, that means that you should specify the dependencies the same way that you would do for a library in the application&apos;s &lt;code&gt;setup.py&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The main difference is that an application is usually &lt;em&gt;deployed&lt;/em&gt; in production to provide its service. Deployments need to be reproducible. For that, you can&apos;t rely solely on &lt;code&gt;setup.py&lt;/code&gt;: the requested version ranges of the dependencies are too broad. You&apos;re at the mercy of random version changes at any time when re-deploying your application.&lt;/p&gt;
&lt;p&gt;You, therefore, need a different version management mechanism to handle deployment than just &lt;code&gt;setup.py&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;pipenv&lt;/em&gt; has &lt;a href=&quot;https://docs.pipenv.org/en/latest/advanced/#pipfile-vs-setuppy&quot;&gt;an excellent section recapping this&lt;/a&gt; in its documentation. It splits dependency types into &lt;em&gt;abstract&lt;/em&gt; and &lt;em&gt;concrete&lt;/em&gt; dependencies: &lt;em&gt;abstract&lt;/em&gt; dependencies are based on ranges (e.g., libraries) whereas &lt;em&gt;concrete&lt;/em&gt; dependencies are specified with precise versions (e.g., application deployments) — as we&apos;ve just seen here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Handling Deployment&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;requirements.txt&lt;/code&gt; file has been used to solve application deployment reproducibility for a long time now. Its format is usually something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;requests==3.1.5
foobar==2.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each library sees itself specified to the micro version. That makes sure each of your deployment is going to install the same version of your dependency. Using a &lt;code&gt;requirements.txt&lt;/code&gt; is a simple solution and a first step toward reproducible deployment. However, it&apos;s not &lt;em&gt;enough&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Indeed, while you can specify which version of &lt;code&gt;requests&lt;/code&gt; you want, if &lt;code&gt;requests&lt;/code&gt; depends on &lt;code&gt;urllib3&lt;/code&gt;, that could make &lt;code&gt;pip&lt;/code&gt; install &lt;code&gt;urllib3 2.1&lt;/code&gt; or &lt;code&gt;urllib3 2.2&lt;/code&gt;. You can&apos;t know which one will be installed, which does not make your deployment 100% reproducible.&lt;/p&gt;
&lt;p&gt;Of course, you &lt;em&gt;could&lt;/em&gt; duplicate all &lt;code&gt;requests&lt;/code&gt; dependencies yourself in your &lt;code&gt;requirements.txt&lt;/code&gt;, but that would be &lt;strong&gt;madness&lt;/strong&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/08/image.png&quot; alt=&quot;An application dependency tree can be quite deep and complex sometimes.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;There are various hacks available to fix this limitation, but the real saviors here are &lt;a href=&quot;https://github.com/pypa/pipenv&quot;&gt;&lt;em&gt;pipenv&lt;/em&gt;&lt;/a&gt; and &lt;a href=&quot;https://poetry.eustace.io/&quot;&gt;&lt;em&gt;poetry&lt;/em&gt;&lt;/a&gt;. The way they solve it is similar to many package managers in other programming languages. They generate a &lt;em&gt;lock file&lt;/em&gt; that contains the list of all installed dependencies (and their own dependencies, etc.) with their version numbers. That makes sure the deployment is 100% reproducible.&lt;/p&gt;
&lt;p&gt;Check out their documentation on how to set up and use them!&lt;/p&gt;
&lt;h3&gt;Handling Dependencies Updates&lt;/h3&gt;
&lt;p&gt;Now that you have your &lt;em&gt;lock file&lt;/em&gt; that makes sure your deployment is reproducible in a snap, you&apos;ve another problem. How do you make sure that your dependencies are up-to-date? There is a real security concern about this, but also bug fixes and optimizations that you might miss by staying behind.&lt;/p&gt;
&lt;p&gt;If your project is hosted on &lt;a href=&quot;https://github.com&quot;&gt;GitHub&lt;/a&gt;, &lt;a href=&quot;https://dependabot.com/&quot;&gt;Dependabot&lt;/a&gt; is an excellent solution to solve this issue. Enabling this application on your repository automatically creates pull requests whenever a new version of a library listed in your lock file is available. For example, if you&apos;ve deployed your application with &lt;code&gt;redis 3.3.6&lt;/code&gt;, Dependabot will create a pull request updating to &lt;code&gt;redis 3.3.7&lt;/code&gt; as soon as it gets released. Furthermore, Dependabot supports &lt;code&gt;requirements.txt&lt;/code&gt;, &lt;em&gt;pipenv&lt;/em&gt;, and &lt;em&gt;poetry&lt;/em&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/08/Screenshot-2019-08-14-at-17.57.47.png&quot; alt=&quot;Dependabot updating jinja2 for you&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Automatic Deployment Update&lt;/h2&gt;
&lt;p&gt;You&apos;re almost there. You have a bot that is letting you know that a new version of a library your project needs is available.&lt;/p&gt;
&lt;p&gt;Once the pull request is created, your continuous integration system is going to kick in, deploy your project, and run the tests. If everything works fine, your pull request is ready to be merged. But are &lt;em&gt;you&lt;/em&gt; really needed in this process?&lt;/p&gt;
&lt;p&gt;Unless you have a particular and personal aversion to specific version numbers —&quot;Gosh I hate versions that end with a 3. It&apos;s always bad luck.&quot;— or unless you have zero automated testing, you, human, are useless. This merge can be fully automatic.&lt;/p&gt;
&lt;p&gt;This is where &lt;a href=&quot;https://mergify.io&quot;&gt;&lt;em&gt;Mergify&lt;/em&gt;&lt;/a&gt; comes into play. Mergify is a GitHub application that lets you define precise rules about how to merge your pull requests. Here&apos;s a rule that I use in every project:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pull_requests_rules:
  - name: automatic merge from dependabot
    conditions:
      - author~=^dependabot(|-preview)\[bot\]$
      - label!=work-in-progress
      - &quot;status-success=ci/circleci: pep8&quot;
      - &quot;status-success=ci/circleci: py37&quot;
    actions:
      merge:
        method: merge
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/08/Screenshot-2019-08-14-at-18.38.25.png&quot; alt=&quot;Mergify reports when the rule fully matches&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As soon as your continuous integration system passes, Mergify merges the pull request for you.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/08/Screenshot-2019-08-14-at-18.38.37.png&quot; alt=&quot;Screenshot of Mergify automatically merging a Dependabot pull request&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can then automatically trigger your deployment hooks to update your production deployment and get the new library version installed right away. This leaves your application always up-to-date with newer libraries and not lagging behind several years of releases.&lt;/p&gt;
&lt;p&gt;If anything goes wrong, you&apos;re still able to revert the commit from Dependabot — which you can also automate if you wish with a Mergify rule.&lt;/p&gt;
&lt;h2&gt;Beyond&lt;/h2&gt;
&lt;p&gt;This is to me the state of the art of dependency management lifecycle right now. And while this applies exceptionally well to Python, it can be applied to many other languages that use a similar pattern — such as Node and &lt;em&gt;npm&lt;/em&gt;.&lt;/p&gt;
</content:encoded><category>python</category><category>mergify</category><category>github</category></item><item><title>Handling multipart/form-data natively in Python</title><link>https://julien.danjou.info/blog/handling-multipart-form-data-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/handling-multipart-form-data-python/</guid><description>RFC7578 (which obsoletes RFC2388) defines the multipart/form-data type that is usually transported over HTTP when users submit forms on your Web page.</description><pubDate>Mon, 01 Jul 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc7578&quot;&gt;RFC7578&lt;/a&gt; (which obsoletes &lt;a href=&quot;https://tools.ietf.org/html/rfc2388&quot;&gt;RFC2388&lt;/a&gt;) defines the &lt;code&gt;multipart/form-data&lt;/code&gt; type that is usually transported over HTTP when users submit forms on your Web page. Nowadays, it tends to be replaced by JSON encoded payloads; nevertheless, it is still widely used.&lt;/p&gt;
&lt;p&gt;While you could decode an HTTP body request made with JSON natively with Python — thanks to the &lt;code&gt;json&lt;/code&gt; module — there is no such way to do that with &lt;code&gt;multipart/form-data&lt;/code&gt;. That&apos;s something barely understandable considering how old the format is.&lt;/p&gt;
&lt;p&gt;There is a wide variety of ways available to encode and decode this format. Libraries such as &lt;em&gt;requests&lt;/em&gt; support it natively without making you notice, and the same goes for the majority of Web server frameworks such as &lt;em&gt;Django&lt;/em&gt; or &lt;em&gt;Flask&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;However, in certain circumstances, you might be on your own to encode or decode this format, and it might not be an option to pull (significant) dependencies.&lt;/p&gt;
&lt;h2&gt;Encoding&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;multipart/form-data&lt;/code&gt; format is quite simple to understand and can be summarised as an easy way to encode a list of keys and values, i.e., a portable way of serializing a dictionary.&lt;/p&gt;
&lt;p&gt;There&apos;s nothing in Python to generate such an encoding. The format is quite simple and consists of the key and value surrounded by a random boundary delimiter. This delimiter must be passed as part of the &lt;code&gt;Content-Type&lt;/code&gt;, so that the decoder can decode the form data.&lt;/p&gt;
&lt;p&gt;There&apos;s a simple implementation in &lt;em&gt;urllib3&lt;/em&gt; that does the job. It can be summarized as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import binascii
import os

def encode_multipart_formdata(fields):
    boundary = binascii.hexlify(os.urandom(16)).decode(&apos;ascii&apos;)

    body = (
        &quot;&quot;.join(&quot;--%s\r\n&quot;
                &quot;Content-Disposition: form-data; name=\&quot;%s\&quot;\r\n&quot;
                &quot;\r\n&quot;
                &quot;%s\r\n&quot; % (boundary, field, value)
                for field, value in fields.items()) +
        &quot;--%s--\r\n&quot; % boundary
    )

    content_type = &quot;multipart/form-data; boundary=%s&quot; % boundary

    return body, content_type
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can use it by passing a dictionary where keys and values are strings. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;encode_multipart_formdata({&quot;foo&quot;: &quot;bar&quot;, &quot;name&quot;: &quot;jd&quot;})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which returns the body:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;--00252461d3ab8ff5c25834e0bffd6f70
Content-Disposition: form-data; name=&quot;foo&quot;

bar
--00252461d3ab8ff5c25834e0bffd6f70
Content-Disposition: form-data; name=&quot;name&quot;

jd
--00252461d3ab8ff5c25834e0bffd6f70--
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and the corresponding &lt;code&gt;Content-Type&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;multipart/form-data; boundary=00252461d3ab8ff5c25834e0bffd6f70
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can use the returned content type in your HTTP reply header &lt;code&gt;Content-Type&lt;/code&gt;. Note that this format is used for forms: it can also be used by emails.&lt;/p&gt;
&lt;p&gt;Emails did you say?&lt;/p&gt;
&lt;h2&gt;Encoding with &lt;code&gt;email&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Right, emails are usually encoded using MIME, which is defined by yet another RFC, &lt;a href=&quot;https://tools.ietf.org/html/rfc2046&quot;&gt;RFC2046&lt;/a&gt;. It turns out that &lt;code&gt;multipart/form-data&lt;/code&gt; is just a particular MIME format, and that if you have code that implements MIME handling, it&apos;s easy to use it to implement this format.&lt;/p&gt;
&lt;p&gt;Fortunately for us, Python standard library comes with a module that handles exactly that: &lt;code&gt;email.mime&lt;/code&gt;. I told you it was heavily used by email — I guess that&apos;s why they put that code in the &lt;code&gt;email&lt;/code&gt; subpackage.&lt;/p&gt;
&lt;p&gt;Here&apos;s a piece of code that handles &lt;code&gt;multipart/form-data&lt;/code&gt; in a few lines of code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from email.mime import multipart
from email.mime import nonmultipart

class MIMEFormdata(nonmultipart.MIMENonMultipart):
    def __init__(self, keyname, *args, **kwargs):
        super(MIMEFormdata, self).__init__(*args, **kwargs)
        self.add_header(
            &quot;Content-Disposition&quot;, &quot;form-data; name=\&quot;%s\&quot;&quot; % keyname)

def encode_multipart_formdata(fields):
    m = multipart.MIMEMultipart(&quot;form-data&quot;)

    for field, value in fields.items():
        data = MIMEFormdata(field, &quot;text&quot;, &quot;plain&quot;)
        data.set_payload(value)
        m.attach(data)

    return m
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using this piece of code returns the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Content-Type: multipart/form-data; boundary=&quot;===============1107021068307284864==&quot;
MIME-Version: 1.0

--===============1107021068307284864==
Content-Type: text/plain
MIME-Version: 1.0
Content-Disposition: form-data; name=&quot;foo&quot;

bar
--===============1107021068307284864==
Content-Type: text/plain
MIME-Version: 1.0
Content-Disposition: form-data; name=&quot;name&quot;

jd
--===============1107021068307284864==--
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This method has several advantages over our first implementation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It handles &lt;code&gt;Content-Type&lt;/code&gt; for each of the added MIME parts. We could add other data types than just &lt;code&gt;text/plain&lt;/code&gt; like it is implicitly done in the first version. We could also specify the charset (encoding) of the textual data.&lt;/li&gt;
&lt;li&gt;It&apos;s very likely more robust by leveraging the widely tested Python standard library.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The main downside, in that case, is that the &lt;code&gt;Content-Type&lt;/code&gt; header is included with the content. When handling HTTP, this is problematic, as that header needs to be sent as part of the HTTP headers and not as part of the payload.&lt;/p&gt;
&lt;p&gt;It should be possible to build a particular generator from &lt;code&gt;email.generator&lt;/code&gt; that does this. I&apos;ll leave that as an exercise to you, reader.&lt;/p&gt;
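&lt;p&gt;If you want a head start on that exercise, here is one rough sketch of my own (flatten the message, then split off everything up to the first blank line) rather than a proper &lt;code&gt;email.generator&lt;/code&gt; subclass:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import io
from email import generator

def formdata_body(msg):
    ## Flatten the whole MIME message to a string, then drop the
    ## top-level headers: everything before the first blank line.
    buf = io.StringIO()
    generator.Generator(buf).flatten(msg)
    headers, _, body = buf.getvalue().partition(&quot;\n\n&quot;)
    ## The value for the HTTP Content-Type header remains
    ## available through msg[&quot;Content-Type&quot;].
    return body
&lt;/code&gt;&lt;/pre&gt;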
&lt;h2&gt;Decoding&lt;/h2&gt;
&lt;p&gt;We must be able to use that same &lt;code&gt;email&lt;/code&gt; package to decode our encoded data, right? It turns out that&apos;s the case, with a piece of code that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import email.parser

## `my_multipart_data` holds the raw bytes of the encoded message.
msg = email.parser.BytesParser().parsebytes(my_multipart_data)

print({
    part.get_param(&apos;name&apos;, header=&apos;content-disposition&apos;): part.get_payload(decode=True)
    for part in msg.get_payload()
})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the example data above, this returns:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{&apos;foo&apos;: b&apos;bar&apos;, &apos;name&apos;: b&apos;jd&apos;}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Amazing, right?&lt;/p&gt;
&lt;p&gt;The moral of this story is that you should never underestimate the power of the standard library. While it&apos;s easy to add a single line in your list of dependencies, it&apos;s not always required if you dig a bit into what Python provides for you!&lt;/p&gt;
</content:encoded><category>python</category><category>email</category><category>web</category></item><item><title>Advanced Functional Programming in Python: lambda</title><link>https://julien.danjou.info/blog/python-functional-programming-lambda/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-functional-programming-lambda/</guid><description>A few weeks ago, I introduced you to functional programming in Python. Today, I&apos;d like to go further into this topic and show you some more interesting features.</description><pubDate>Mon, 03 Jun 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few weeks ago, I introduced you to &lt;a href=&quot;https://julien.danjou.info/blog/python-and-functional-programming&quot;&gt;functional programming in Python&lt;/a&gt;. Today, I&apos;d like to go further into this topic and show you some more interesting features.&lt;/p&gt;
&lt;h2&gt;Lambda Functions&lt;/h2&gt;
&lt;p&gt;What do we call lambda functions? They are, in essence, anonymous functions. To create one, you use the &lt;code&gt;lambda&lt;/code&gt; keyword:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; lambda x: x
&amp;lt;function &amp;lt;lambda&amp;gt; at 0x102e23620&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In Python, lambda functions are quite limited. They can take any number of arguments; however, their body is restricted to a single expression, written on a single line.&lt;/p&gt;
&lt;p&gt;They are mostly useful for passing to higher-order functions, such as &lt;code&gt;map()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; list(map(lambda x: x * 2, range(10)))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will apply the anonymous function &lt;code&gt;lambda x: x * 2&lt;/code&gt; to every item returned by &lt;code&gt;range(10)&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;functools.partial&lt;/h2&gt;
&lt;p&gt;Since lambda functions are limited to a single line, they are often used to &lt;em&gt;specialize&lt;/em&gt; an existing function by fixing some of its arguments:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def between(number, min=0, max=1000):
    return max &amp;gt; number &amp;gt; min

## Only keep numbers between 10 and 1000
filter(lambda x: between(x, min=10), range(10000))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our lambda is really just a wrapper around the &lt;code&gt;between&lt;/code&gt; function with one of its arguments already set. What if we had a better way to write that, without the various lambda limitations? That&apos;s where &lt;code&gt;functools.partial&lt;/code&gt; comes in handy.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import functools
def between(number, min=0, max=1000):
    return max &amp;gt; number &amp;gt; min

## Specialize `between` with min already set to 10
atleast_10_and_upto = functools.partial(between, min=10)
## Keep numbers between 10 and 1000
filter(atleast_10_and_upto, range(10000))

## Keep numbers between 10 and 20
filter(lambda x: atleast_10_and_upto(x, max=20), range(10000))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;functools.partial&lt;/code&gt; function returns a specialized version of the &lt;code&gt;between&lt;/code&gt; function where &lt;code&gt;min&lt;/code&gt; is already set. We can store it in a variable and reuse it as much as we want. We can still pass it a &lt;code&gt;max&lt;/code&gt; argument, as shown in the second part, using a &lt;code&gt;lambda&lt;/code&gt;! You can mix and match those two however seems clearer to you.&lt;/p&gt;
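&lt;p&gt;You can even skip the lambda entirely by deriving another partial; a quick sketch, with names of my own choosing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import functools

def between(number, min=0, max=1000):
    return max &amp;gt; number &amp;gt; min

## Fix both bounds at once: no lambda needed.
between_10_and_20 = functools.partial(between, min=10, max=20)
list(filter(between_10_and_20, range(10000)))
## [11, 12, ..., 19]
&lt;/code&gt;&lt;/pre&gt;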
&lt;h2&gt;Common lambda&lt;/h2&gt;
&lt;p&gt;There is a type of lambda function that is pretty common: the attribute or item getter. They are typically used as a &lt;code&gt;key&lt;/code&gt; function for sorting or filtering.&lt;/p&gt;
&lt;p&gt;Here&apos;s a list of 200 tuples, each containing two integers &lt;code&gt;(i1, i2)&lt;/code&gt;. If you want to use only &lt;code&gt;i2&lt;/code&gt; as the sorting key, you would write:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mylist = list(zip(range(40, 240), range(-100, 100)))

sorted(mylist, key=lambda i: i[1])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This works fine, but it makes you use &lt;code&gt;lambda&lt;/code&gt;. You could instead use the &lt;code&gt;operator&lt;/code&gt; module:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import operator

mylist = list(zip(range(40, 240), range(-100, 100)))

sorted(mylist, key=operator.itemgetter(1))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This does the same thing, except it avoids using &lt;code&gt;lambda&lt;/code&gt; altogether. Cherry on the cake: it is actually 10% faster on my laptop.&lt;/p&gt;
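&lt;p&gt;The attribute flavor works the same way with &lt;code&gt;operator.attrgetter&lt;/code&gt;. A small sketch, using a &lt;code&gt;namedtuple&lt;/code&gt; of my own for illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import operator
from collections import namedtuple

Point = namedtuple(&quot;Point&quot;, [&quot;x&quot;, &quot;y&quot;])
points = [Point(3, -2), Point(1, 5), Point(2, 0)]

## Sort on the `y` attribute, again without a lambda:
sorted(points, key=operator.attrgetter(&quot;y&quot;))
## [Point(x=3, y=-2), Point(x=2, y=0), Point(x=1, y=5)]
&lt;/code&gt;&lt;/pre&gt;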
&lt;p&gt;I hope that&apos;ll make you write more functional code!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>An Introduction to Functional Programming with Python</title><link>https://julien.danjou.info/blog/python-and-functional-programming/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-and-functional-programming/</guid><description>Many Python developers are unaware of the extent to which you can use functional programming in Python, which is a shame: with few exceptions, functional programming allows you to write more concise a</description><pubDate>Mon, 06 May 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Many Python developers are unaware of the extent to which you can use functional programming in Python, which is a shame: with few exceptions, functional programming allows you to write more concise and efficient code. Moreover, Python’s support for functional programming is extensive.&lt;/p&gt;
&lt;p&gt;Here I&apos;d like to talk a bit about how you can actually have a functional approach to programming with our favorite language.&lt;/p&gt;
&lt;h2&gt;Pure Functions&lt;/h2&gt;
&lt;p&gt;When you write code using a functional style, your functions are designed to have no side effects: instead, they take an input and produce an output without keeping state or modifying anything not reflected in the return value. Functions that follow this ideal are referred to as purely functional.&lt;/p&gt;
&lt;p&gt;Let’s start with an example of a regular, non-pure function that removes the last item in a list:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def remove_last_item(mylist):
    &quot;&quot;&quot;Removes the last item from a list.&quot;&quot;&quot;
    mylist.pop(-1)  # This modifies mylist
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This function is not pure: it has a side effect as it modifies the argument it is given. Let&apos;s rewrite it as purely functional:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def butlast(mylist):
    &quot;&quot;&quot;Like butlast in Lisp; returns the list without the last element.&quot;&quot;&quot;
    return mylist[:-1]  # This returns a copy of mylist
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We define a &lt;code&gt;butlast()&lt;/code&gt; function (like &lt;code&gt;butlast&lt;/code&gt; in Lisp) that returns the list without its last element, without modifying the original. Instead, it returns a copy of the list with the change applied, allowing us to keep the original intact. The practical advantages of using functional programming include the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Modularity.&lt;/em&gt; Writing with a functional style forces a certain degree of separation in solving your individual problems and makes sections of code easier to reuse in other contexts. Since the function does not depend on any external variable or state, calling it from a different piece of code is straightforward.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Brevity.&lt;/em&gt; Functional programming is often less verbose than other paradigms.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Concurrency.&lt;/em&gt; Purely functional functions are thread-safe and can run concurrently. Some functional languages do this automatically, which can be a big help if you ever need to scale your application, though this is not quite the case yet in Python.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Testability.&lt;/em&gt; Testing a functional program is incredibly easy: all you need is a set of inputs and an expected set of outputs. Pure functions are deterministic, meaning that calling the same function over and over with the same arguments will always return the same result (see the tiny test sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
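&lt;p&gt;To illustrate that last point, a test for &lt;code&gt;butlast()&lt;/code&gt; needs nothing more than inputs and expected outputs; this tiny sketch would run under any test runner:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def test_butlast():
    assert butlast([1, 2, 3]) == [1, 2]
    assert butlast([42]) == []
    assert butlast([]) == []
&lt;/code&gt;&lt;/pre&gt;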
&lt;p&gt;Note that concepts such as &lt;a href=&quot;https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions&quot;&gt;list comprehension&lt;/a&gt; in Python are already functional in their approach, as they are designed to avoid side effects. We&apos;ll see in the following that some of the functional functions Python provides can actually be expressed as list comprehensions!&lt;/p&gt;
&lt;h2&gt;Python Functional Functions&lt;/h2&gt;
&lt;p&gt;You might repeatedly encounter the same set of problems when manipulating data using functional programming. To help you deal with this situation efficiently, Python includes a number of functions for functional programming. Here, we&apos;ll take a quick look at some of these built-in functions, which allow you to build fully functional programs. Once you have an idea of what’s available, I encourage you to research further and try out functions where they might apply in your own code.&lt;/p&gt;
&lt;h3&gt;Applying Functions to Items with &lt;code&gt;map&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;map()&lt;/code&gt; function takes the form &lt;code&gt;map(function, iterable)&lt;/code&gt; and applies &lt;code&gt;function&lt;/code&gt; to each item in &lt;code&gt;iterable&lt;/code&gt; to return an iterable map object:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; map(lambda x: x + &quot;bzz!&quot;, [&quot;I think&quot;, &quot;I&apos;m good&quot;])
&amp;lt;map object at 0x7fe7101abdd0&amp;gt;
&amp;gt;&amp;gt;&amp;gt; list(map(lambda x: x + &quot;bzz!&quot;, [&quot;I think&quot;, &quot;I&apos;m good&quot;]))
[&apos;I thinkbzz!&apos;, &quot;I&apos;m goodbzz!&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could also write an equivalent of &lt;code&gt;map()&lt;/code&gt; using list comprehension, which would look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; (x + &quot;bzz!&quot; for x in [&quot;I think&quot;, &quot;I&apos;m good&quot;])
&amp;lt;generator object &amp;lt;genexpr&amp;gt; at 0x7f9a0d697dc0&amp;gt;
&amp;gt;&amp;gt;&amp;gt; [x + &quot;bzz!&quot; for x in [&quot;I think&quot;, &quot;I&apos;m good&quot;]]
[&apos;I thinkbzz!&apos;, &quot;I&apos;m goodbzz!&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Filtering Lists with &lt;code&gt;filter&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;filter()&lt;/code&gt; function takes the form &lt;code&gt;filter(function or None, iterable)&lt;/code&gt; and filters the items in &lt;code&gt;iterable&lt;/code&gt; based on the result returned by &lt;code&gt;function&lt;/code&gt;. It returns an iterable &lt;code&gt;filter&lt;/code&gt; object:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; filter(lambda x: x.startswith(&quot;I &quot;), [&quot;I think&quot;, &quot;I&apos;m good&quot;])
&amp;lt;filter object at 0x7f9a0d636dd0&amp;gt;
&amp;gt;&amp;gt;&amp;gt; list(filter(lambda x: x.startswith(&quot;I &quot;), [&quot;I think&quot;, &quot;I&apos;m good&quot;]))
[&apos;I think&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could also write an equivalent of &lt;code&gt;filter()&lt;/code&gt; using list comprehension, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; (x for x in [&quot;I think&quot;, &quot;I&apos;m good&quot;] if x.startswith(&quot;I &quot;))
&amp;lt;generator object &amp;lt;genexpr&amp;gt; at 0x7f9a0d697dc0&amp;gt;
&amp;gt;&amp;gt;&amp;gt; [x for x in [&quot;I think&quot;, &quot;I&apos;m good&quot;] if x.startswith(&quot;I &quot;)]
[&apos;I think&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Getting Indexes with &lt;code&gt;enumerate&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;enumerate()&lt;/code&gt; function takes the form &lt;code&gt;enumerate(iterable[, start])&lt;/code&gt; and returns an iterable object that provides a sequence of tuples, each consisting of an integer index (starting with &lt;code&gt;start&lt;/code&gt;, if provided) and the corresponding item in &lt;code&gt;iterable&lt;/code&gt;. This function is useful when you need to write code that refers to array indexes. For example, instead of writing this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;i = 0
while i &amp;lt; len(mylist):
    print(&quot;Item %d: %s&quot; % (i, mylist[i]))
    i += 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could accomplish the same thing more efficiently with &lt;code&gt;enumerate()&lt;/code&gt;, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for i, item in enumerate(mylist):
    print(&quot;Item %d: %s&quot; % (i, item))
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Sorting a List with &lt;code&gt;sorted&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;sorted()&lt;/code&gt; function takes the form &lt;code&gt;sorted(iterable, key=None, reverse=False)&lt;/code&gt; and returns a sorted version of &lt;code&gt;iterable&lt;/code&gt;. The key argument allows you to provide a function that returns the value to sort on:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; sorted([(&quot;a&quot;, 2), (&quot;c&quot;, 1), (&quot;d&quot;, 4)])
[(&apos;a&apos;, 2), (&apos;c&apos;, 1), (&apos;d&apos;, 4)]
&amp;gt;&amp;gt;&amp;gt; sorted([(&quot;a&quot;, 2), (&quot;c&quot;, 1), (&quot;d&quot;, 4)], key=lambda x: x[1])
[(&apos;c&apos;, 1), (&apos;a&apos;, 2), (&apos;d&apos;, 4)]
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Finding Items That Satisfy Conditions with &lt;code&gt;any&lt;/code&gt; and &lt;code&gt;all&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;any(iterable)&lt;/code&gt; and &lt;code&gt;all(iterable)&lt;/code&gt; functions both return a Boolean depending on the values returned by &lt;code&gt;iterable&lt;/code&gt;. These simple functions are equivalent to the following full Python code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def all(iterable):
    for x in iterable:
        if not x:
            return False
    return True

def any(iterable):
    for x in iterable:
        if x:
            return True
    return False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These functions are useful for checking whether any or all of the values in an iterable satisfy a given condition. For example, the following checks a list for two conditions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mylist = [0, 1, 3, -1]
if all(map(lambda x: x &amp;gt; 0, mylist)):
    print(&quot;All items are greater than 0&quot;)
if any(map(lambda x: x &amp;gt; 0, mylist)):
    print(&quot;At least one item is greater than 0&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The key difference here, as you can see, is that &lt;code&gt;any()&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt; when at least one element meets the condition, while &lt;code&gt;all()&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt; only if every element meets the condition. The &lt;code&gt;all()&lt;/code&gt; function will also return &lt;code&gt;True&lt;/code&gt; for an empty iterable, since none of the elements is &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;
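&lt;p&gt;That empty-iterable edge case is easy to verify in a REPL:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; all([])
True
&amp;gt;&amp;gt;&amp;gt; any([])
False
&lt;/code&gt;&lt;/pre&gt;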
&lt;h3&gt;Combining Lists with &lt;code&gt;zip&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;zip()&lt;/code&gt; function takes the form &lt;code&gt;zip(iter1 [,iter2 [...]])&lt;/code&gt; and combines multiple sequences into tuples. This is useful when you need to combine a list of keys and a list of values into a &lt;code&gt;dict&lt;/code&gt;. Like the other functions described here, &lt;code&gt;zip()&lt;/code&gt; returns an iterable. Here we have a list of keys that we map to a list of values to create a dictionary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; keys = [&quot;foobar&quot;, &quot;barzz&quot;, &quot;ba!&quot;]
&amp;gt;&amp;gt;&amp;gt; map(len, keys)
&amp;lt;map object at 0x7fc1686100d0&amp;gt;
&amp;gt;&amp;gt;&amp;gt; zip(keys, map(len, keys))
&amp;lt;zip object at 0x7fc16860d440&amp;gt;
&amp;gt;&amp;gt;&amp;gt; list(zip(keys, map(len, keys)))
[(&apos;foobar&apos;, 6), (&apos;barzz&apos;, 5), (&apos;ba!&apos;, 3)]
&amp;gt;&amp;gt;&amp;gt; dict(zip(keys, map(len, keys)))
{&apos;foobar&apos;: 6, &apos;barzz&apos;: 5, &apos;ba!&apos;: 3}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;What&apos;s Next?&lt;/h2&gt;
&lt;p&gt;While Python is often advertised as being object oriented, it can be used in a very functional manner. A lot of its built-in concepts, such as generators and list comprehension, are functionally oriented and don’t conflict with an object-oriented approach. Python provides a large set of built-in functions that can help you keep your code free of side effects. That also limits the reliance on a program’s global state, for your own good.&lt;/p&gt;
&lt;p&gt;In the next blog post, we&apos;ll see how you can leverage Python&apos;s &lt;em&gt;functools&lt;/em&gt; and &lt;em&gt;itertools&lt;/em&gt; modules to enhance your functional adventure. Stay tuned!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Writing Your Own Filtering DSL in Python</title><link>https://julien.danjou.info/blog/writing-your-own-filtering-dsl-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/writing-your-own-filtering-dsl-in-python/</guid><description>A few months ago, we saw how to write a filtering syntax tree in Python. The idea behind this was to create a data structure — in the form of a dictionary — that would allow filtering data based.</description><pubDate>Mon, 01 Apr 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few months ago, &lt;a href=&quot;https://julien.danjou.info/blog/multi-value-syntax-tree-filtering-in-python&quot;&gt;we saw how to write a filtering syntax tree&lt;/a&gt; in Python. The idea behind this was to create a data structure — in the form of a dictionary — that would allow filtering data based on conditions.&lt;/p&gt;
&lt;p&gt;Our API looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; f = Filter(
  {&quot;and&quot;: [
    {&quot;eq&quot;: (&quot;foo&quot;, 3)},
    {&quot;gt&quot;: (&quot;bar&quot;, 4)},
   ]
  },
)
&amp;gt;&amp;gt;&amp;gt; f(foo=3, bar=5)
True
&amp;gt;&amp;gt;&amp;gt; f(foo=4, bar=5)
False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While such a mechanism is pretty powerful to use, the input data structure format might not be user-friendly. It&apos;s great to use, for example, with a JSON-based REST API, but it&apos;s pretty terrible to use from a command-line interface.&lt;/p&gt;
&lt;p&gt;A good solution to that problem is to build our own &lt;em&gt;language&lt;/em&gt;. That&apos;s called a DSL.&lt;/p&gt;
&lt;h2&gt;Building a DSL&lt;/h2&gt;
&lt;p&gt;What&apos;s a Domain-Specific Language (DSL)? It&apos;s a computer language specialized to a certain domain. In our case, our domain is filtering, as we&apos;re providing a &lt;em&gt;Filter&lt;/em&gt; class that allows filtering a set of values.&lt;/p&gt;
&lt;p&gt;How do you build a data structure such as &lt;code&gt;{&quot;and&quot;: [{&quot;eq&quot;: (&quot;foo&quot;, 3)}, {&quot;gt&quot;: (&quot;bar&quot;, 4)}]}&lt;/code&gt; from a string? Well, you define a language, parse it, and then convert it to the right format.&lt;/p&gt;
&lt;p&gt;In order to parse a language, there are a lot of different solutions, from implementing manual parsers to using regular expressions. In this case, we&apos;ll use &lt;a href=&quot;https://en.wikipedia.org/wiki/Lexical_analysis&quot;&gt;lexical analysis&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;First Iteration&lt;/h3&gt;
&lt;p&gt;Let&apos;s start small and define the base of our grammar. That should be something simple, so we&apos;ll go with &lt;code&gt;&amp;lt;identifier&amp;gt;&amp;lt;operator&amp;gt;&amp;lt;value&amp;gt;&lt;/code&gt;. For example, &lt;code&gt;&quot;foobar&quot;=&quot;baz&quot;&lt;/code&gt; is a valid sentence in our grammar and will convert to &lt;code&gt;{&quot;=&quot;: (&quot;foobar&quot;, &quot;baz&quot;)}&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The following code snippet leverages &lt;a href=&quot;https://pypi.org/project/pyparsing/&quot;&gt;&lt;em&gt;pyparsing&lt;/em&gt;&lt;/a&gt; for parsing the string and specifying the grammar:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pyparsing

identifier = pyparsing.QuotedString(&apos;&quot;&apos;)
operator = (
    pyparsing.Literal(&quot;=&quot;) |
    pyparsing.Literal(&quot;≠&quot;) |
    pyparsing.Literal(&quot;≥&quot;) |
    pyparsing.Literal(&quot;≤&quot;) |
    pyparsing.Literal(&quot;&amp;lt;&quot;) |
    pyparsing.Literal(&quot;&amp;gt;&quot;)
)
value = pyparsing.QuotedString(&apos;&quot;&apos;)

match_format = identifier + operator + value

print(match_format.parseString(&apos;&quot;foobar&quot;=&quot;123&quot;&apos;))

## Prints:
## [&apos;foobar&apos;, &apos;=&apos;, &apos;123&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With that simple grammar, we can parse and get a token list composed of our 3 items: the identifier, the operator and the value.&lt;/p&gt;
&lt;h3&gt;Transforming the Data&lt;/h3&gt;
&lt;p&gt;The list above in the format &lt;code&gt;[identifier, operator, value]&lt;/code&gt; is not really what we need in the end. We need something like &lt;code&gt;{operator: (identifier, value)}&lt;/code&gt;. We can leverage the &lt;em&gt;pyparsing&lt;/em&gt; API to help us with that.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def list_to_dict(pos, tokens):
    return {tokens[1]: (tokens[0], tokens[2])}

match_format = (identifier + operator + value).setParseAction(list_to_dict)

print(match_format.parseString(&apos;&quot;foobar&quot;=&quot;123&quot;&apos;))

## Prints:
## [{&apos;=&apos;: (&apos;foobar&apos;, &apos;123&apos;)}]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;setParseAction&lt;/code&gt; method allows modifying the value returned for a grammar token. In this case, we transform the list into the dict we need.&lt;/p&gt;
&lt;h3&gt;Plugging the Parser and the Filter&lt;/h3&gt;
&lt;p&gt;In the following code, we&apos;ll reuse the &lt;code&gt;Filter&lt;/code&gt; class we wrote in &lt;a href=&quot;https://julien.danjou.info/blog/multi-value-syntax-tree-filtering-in-python&quot;&gt;our previous post&lt;/a&gt;. We&apos;ll just add the following code to our previous example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def parse_string(s):
    return match_format.parseString(s, parseAll=True)[0]

f = Filter(parse_string(&apos;&quot;foobar&quot;=&quot;baz&quot;&apos;))
print(f(foobar=&quot;baz&quot;))
print(f(foobar=&quot;biz&quot;))

## Prints:
## True
## False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, we have a pretty simple parser and a good way to build a &lt;code&gt;Filter&lt;/code&gt; object from a string.&lt;/p&gt;
&lt;p&gt;As our &lt;em&gt;Filter&lt;/em&gt; object supports complex and nested operations, such as &lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt;, we could also add those to the grammar. I&apos;ll leave that to you, reader, as an exercise!&lt;/p&gt;
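&lt;p&gt;If you want a starting point for that exercise, &lt;em&gt;pyparsing&lt;/em&gt; ships an &lt;code&gt;infixNotation&lt;/code&gt; helper that layers boolean combinators on top of an existing expression, such as our &lt;code&gt;match_format&lt;/code&gt; from above. The sketch below is one possible shape; the parse action and its output format are my own guesses, not the official answer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pyparsing

def to_logical(tokens):
    tokens = tokens[0]
    ## tokens is interleaved, e.g. [operand, &quot;and&quot;, operand, ...]:
    ## every other element is an operand.
    return {tokens[1]: list(tokens[::2])}

expression = pyparsing.infixNotation(
    match_format,
    [
        (pyparsing.Keyword(&quot;and&quot;), 2, pyparsing.opAssoc.LEFT, to_logical),
        (pyparsing.Keyword(&quot;or&quot;), 2, pyparsing.opAssoc.LEFT, to_logical),
    ],
)

print(expression.parseString(&apos;&quot;foo&quot;=&quot;1&quot; and &quot;bar&quot;=&quot;2&quot;&apos;, parseAll=True))

## Prints something like:
## [{&apos;and&apos;: [{&apos;=&apos;: (&apos;foo&apos;, &apos;1&apos;)}, {&apos;=&apos;: (&apos;bar&apos;, &apos;2&apos;)}]}]
&lt;/code&gt;&lt;/pre&gt;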
&lt;h3&gt;Building your own Grammar&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;pyparsing&lt;/em&gt; makes it easy to build your own grammar. However, it should not be abused: building a DSL means that your users will have to discover and learn it. If it&apos;s very different from what they already know, it might be cumbersome for them.&lt;/p&gt;
&lt;p&gt;Finally, if you&apos;re curious and want to see a real world usage, &lt;a href=&quot;https://doc.mergify.io/conditions.html#grammar&quot;&gt;Mergify condition system&lt;/a&gt; leverages &lt;em&gt;pyparsing&lt;/em&gt; to &lt;a href=&quot;https://github.com/Mergifyio/mergify-engine/blob/master/mergify_engine/rules/parser.py&quot;&gt;implement its parser&lt;/a&gt;. Check it out!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Python + Memcached: Efficient Caching in Distributed Applications</title><link>https://julien.danjou.info/blog/python-memcached-efficient-caching-in-distributed-applications/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-memcached-efficient-caching-in-distributed-applications/</guid><description>When writing Python applications, caching is important. Using a cache to avoid recomputing data or accessing a slow database can provide you with a great performance boost.  Python offers built-in pos</description><pubDate>Mon, 04 Mar 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When writing Python applications, caching is important. Using a cache to avoid recomputing data or accessing a slow database can provide you with a great performance boost.&lt;/p&gt;
&lt;p&gt;Python offers built-in possibilities for caching, from a simple dictionary to a more complete data structure such as &lt;a href=&quot;https://docs.python.org/3/library/functools.html#functools.lru_cache&quot;&gt;&lt;code&gt;functools.lru_cache&lt;/code&gt;&lt;/a&gt;. The latter can cache any item using a &lt;a href=&quot;https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_Recently_Used_(LRU)&quot;&gt;Least-Recently Used algorithm&lt;/a&gt; to limit the cache size.&lt;/p&gt;
&lt;p&gt;Those data structures are, however, by definition &lt;em&gt;local&lt;/em&gt; to your Python process. When several copies of your application run across a large platform, using an in-memory data structure prevents sharing the cached content. This can be a problem for large-scale and distributed applications.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://files.realpython.com/media/python-memcached.97e1deb2aa17.png&quot; alt=&quot;Python + Memcached System Design Diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Therefore, when a system is distributed across a network, it also needs a cache that is running on the network. Nowadays, there are plenty of network servers that offer caching capability—for example, &lt;a href=&quot;https://redis.io&quot;&gt;Redis&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As you’re going to see in this tutorial, &lt;a href=&quot;http://memcached.org/&quot;&gt;memcached&lt;/a&gt; is another great option for caching. After a quick introduction to basic memcached usage, you’ll learn about advanced patterns such as “cache and set” and using fallback caches to avoid cold cache performance issues.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Installing memcached&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Memcached&lt;/em&gt; is &lt;a href=&quot;https://github.com/memcached/memcached/wiki/Install&quot;&gt;available for many platforms&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you run &lt;strong&gt;Linux&lt;/strong&gt;, you can install it using &lt;code&gt;apt-get install memcached&lt;/code&gt; or &lt;code&gt;yum install memcached&lt;/code&gt;. This will install memcached from a pre-built package, but you can also build memcached from source, &lt;a href=&quot;https://github.com/memcached/memcached/wiki/Install&quot;&gt;as explained here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;macOS&lt;/strong&gt;, using &lt;a href=&quot;https://brew.sh/&quot;&gt;Homebrew&lt;/a&gt; is the simplest option. Just run &lt;code&gt;brew install memcached&lt;/code&gt; after you’ve installed the Homebrew package manager.&lt;/li&gt;
&lt;li&gt;On &lt;strong&gt;Windows&lt;/strong&gt;, you would have to compile memcached yourself or find &lt;a href=&quot;https://commaster.net/content/installing-memcached-windows&quot;&gt;pre-compiled binaries&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once installed, &lt;em&gt;memcached&lt;/em&gt; can simply be launched by calling the &lt;code&gt;memcached&lt;/code&gt; command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ memcached
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Before you can interact with memcached from Python-land you’ll need to install a memcached &lt;em&gt;client&lt;/em&gt; library. You’ll see how to do this in the next section, along with some basic cache access operations.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Storing and Retrieving Cached Values Using Python&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;If you&apos;ve never used &lt;em&gt;memcached&lt;/em&gt;, it is pretty easy to understand. It basically provides a giant network-available dictionary. This dictionary has a few properties that are different from a classical Python dictionary, mainly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keys and values have to be bytes&lt;/li&gt;
&lt;li&gt;Keys and values are automatically deleted after an expiration time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, the two basic operations for interacting with &lt;em&gt;memcached&lt;/em&gt; are &lt;code&gt;set&lt;/code&gt; and &lt;code&gt;get&lt;/code&gt;. As you might have guessed, they’re used to assign a value to a key or to get a value from a key, respectively.&lt;/p&gt;
&lt;p&gt;My preferred Python library for interacting with &lt;em&gt;memcached&lt;/em&gt; is &lt;a href=&quot;https://pypi.python.org/pypi/pymemcache&quot;&gt;&lt;code&gt;pymemcache&lt;/code&gt;&lt;/a&gt;—I recommend using it. You can simply &lt;a href=&quot;https://realpython.com/learn/python-first-steps/#11-pythons-power-packagesmodules&quot;&gt;install it using pip&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pip install pymemcache
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following code shows how you can connect to &lt;em&gt;memcached&lt;/em&gt; and use it as a network cache in your Python applications:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from pymemcache.client import base
## Don&apos;t forget to run `memcached&apos; before running this next line:
&amp;gt;&amp;gt;&amp;gt; client = base.Client((&apos;localhost&apos;, 11211))
## Once the client is instantiated, you can access the cache:
&amp;gt;&amp;gt;&amp;gt; client.set(&apos;some_key&apos;, &apos;some value&apos;)
## Retrieve previously set data again:
&amp;gt;&amp;gt;&amp;gt; client.get(&apos;some_key&apos;)
&apos;some value&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;em&gt;memcached&lt;/em&gt; network protocol is really simple and its implementation extremely fast, which makes it useful to store data that would otherwise be slow to retrieve from the canonical source of data or to compute again.&lt;/p&gt;
&lt;p&gt;While straightforward enough, this example allows storing key/value tuples across the network and accessing them through multiple, distributed, running copies of your application. This is simplistic, yet powerful. And it’s a great first step towards optimizing your application.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Automatically Expiring Cached Data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;When storing data into &lt;em&gt;memcached&lt;/em&gt;, you can set an expiration time—a maximum number of seconds for &lt;em&gt;memcached&lt;/em&gt; to keep the key and value around. After that delay, &lt;em&gt;memcached&lt;/em&gt; automatically removes the key from its cache.&lt;/p&gt;
&lt;p&gt;What should you set this cache time to? There is no magic number for this delay, and it will entirely depend on the type of data and application that you are working with. It could be a few seconds, or it might be a few hours.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Cache invalidation&lt;/em&gt;, which defines when to remove the cache because it is out of sync with the current data, is also something that your application will have to handle, especially if presenting data that is too old or &lt;em&gt;stale&lt;/em&gt; is to be avoided.&lt;/p&gt;
&lt;p&gt;Here again, there is no magical recipe; it depends on the type of application you are building. However, there are several edge cases that should be handled—which we haven’t yet covered in the above example.&lt;/p&gt;
&lt;p&gt;A caching server cannot grow infinitely—memory is a finite resource. Therefore, keys will be flushed out by the caching server as soon as it needs more space to store other things.&lt;/p&gt;
&lt;p&gt;Some keys might also be expired because they reached their expiration time (also sometimes called the “time-to-live” or TTL.) In those cases the data is lost, and the canonical data source must be queried again.&lt;/p&gt;
&lt;p&gt;This sounds more complicated than it really is. You can generally work with the following pattern when working with &lt;em&gt;memcached&lt;/em&gt; in Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pymemcache.client import base

def do_some_query():
    # Replace with actual querying code to a database,
    # a remote REST API, etc.
    return 42

## Don&apos;t forget to run `memcached&apos; before running this code
client = base.Client((&apos;localhost&apos;, 11211))
result = client.get(&apos;some_key&apos;)

if result is None:
    # The cache is empty, need to get the value
    # from the canonical source:
    result = do_some_query()
    # Cache the result for next time:
    client.set(&apos;some_key&apos;, result)

# Whether we needed to update the cache or not,
# at this point you can work with the data
# stored in the `result` variable:
print(result)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Handling missing keys is mandatory because of normal flush-out operations. It is also obligatory to handle the cold cache scenario, i.e. when &lt;em&gt;memcached&lt;/em&gt; has just been started. In that case, the cache will be entirely empty and the cache needs to be fully repopulated, one request at a time.&lt;/p&gt;
&lt;p&gt;This means you should view any cached data as ephemeral. And you should never expect the cache to contain a value you previously wrote to it.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Warming Up a Cold Cache&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Some of the cold cache scenarios cannot be prevented, for example a &lt;em&gt;memcached&lt;/em&gt; crash. But some can, for example migrating to a new &lt;em&gt;memcached&lt;/em&gt; server.&lt;/p&gt;
&lt;p&gt;When it is possible to predict that a cold cache scenario will happen, it is better to avoid it. A cache that needs to be refilled means that, all of a sudden, the canonical storage of the cached data will be massively hit by all the cache users missing their cached data (also known as the &lt;a href=&quot;https://en.wikipedia.org/wiki/Thundering_herd_problem&quot;&gt;thundering herd problem&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;pymemcache&lt;/em&gt; provides a class named &lt;code&gt;FallbackClient&lt;/code&gt; that helps in implementing this scenario as demonstrated here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pymemcache.client import base
from pymemcache import fallback

def do_some_query():
    # Replace with actual querying code to a database,
    # a remote REST API, etc.
    return 42
    
## Set `ignore_exc=True` so it is possible to shut down
## the old cache before removing its usage from
## the program, if ever necessary.
old_cache = base.Client((&apos;localhost&apos;, 11211), ignore_exc=True)
new_cache = base.Client((&apos;localhost&apos;, 11212))

client = fallback.FallbackClient((new_cache, old_cache))

result = client.get(&apos;some_key&apos;)

if result is None:
    # The cache is empty, need to get the value
    # from the canonical source:
    result = do_some_query()
    # Cache the result for next time:
    client.set(&apos;some_key&apos;, result)
    print(result)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;FallbackClient&lt;/code&gt; queries the caches passed to its constructor in order. In this case, the new cache server will always be queried first, and in case of a cache miss, the old one will be queried, avoiding a possible round-trip to the primary source of data.&lt;/p&gt;
&lt;p&gt;If any key is set, it will only be set to the new cache. After some time, the old cache can be decommissioned and the &lt;code&gt;FallbackClient&lt;/code&gt; can be replaced directly with the &lt;code&gt;new_cache&lt;/code&gt; client.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Check And Set&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;When communicating with a remote cache, the usual concurrency problem comes back: there might be several clients trying to access the same key at the same time. &lt;em&gt;memcached&lt;/em&gt; provides a &lt;em&gt;check and set&lt;/em&gt; operation, shortened to &lt;em&gt;CAS&lt;/em&gt;, which helps to solve this problem.&lt;/p&gt;
&lt;p&gt;The simplest example is an application that wants to count the number of users it has. Each time a visitor connects, a counter is incremented by 1. Using &lt;em&gt;memcached&lt;/em&gt;, a simple implementation would be:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def on_visit(client):
    result = client.get(&apos;visitors&apos;)
    if result is None:
        result = 1
    else:
        result += 1
    client.set(&apos;visitors&apos;, result)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, what happens if two instances of the application try to update this counter at the same time?&lt;/p&gt;
&lt;p&gt;The first call &lt;code&gt;client.get(&apos;visitors&apos;)&lt;/code&gt; will return the same number of visitors for both of them, let’s say it’s 42. Then both will add 1, compute 43, and set the number of visitors to 43. That number is wrong, and the result should be 44, i.e. 42 + 1 + 1.&lt;/p&gt;
&lt;p&gt;To solve this concurrency issue, the CAS operation of &lt;em&gt;memcached&lt;/em&gt; is handy. The following snippet implements a correct solution:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def on_visit(client):
    while True:
        result, cas = client.gets(&apos;visitors&apos;)
        if result is None:
            result = 1
        else:
            result += 1
        if client.cas(&apos;visitors&apos;, result, cas):
            break
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;gets&lt;/code&gt; method returns the value, just like the &lt;code&gt;get&lt;/code&gt; method, but it also returns a &lt;em&gt;CAS value&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;What is in this value is not relevant, but it is used for the next method &lt;code&gt;cas&lt;/code&gt; call. This method is equivalent to the &lt;code&gt;set&lt;/code&gt; operation, except that it fails if the value has changed since the &lt;code&gt;gets&lt;/code&gt; operation. In case of success, the loop is broken. Otherwise, the operation is restarted from the beginning.&lt;/p&gt;
&lt;p&gt;In the scenario where two instances of the application try to update the counter at the same time, only one succeeds in moving the counter from 42 to 43. The second instance gets a &lt;code&gt;False&lt;/code&gt; value returned by the &lt;code&gt;client.cas&lt;/code&gt; call and has to retry the loop. It will retrieve 43 as the value this time, increment it to 44, and its &lt;code&gt;cas&lt;/code&gt; call will succeed, thus solving our problem.&lt;/p&gt;
&lt;p&gt;Incrementing a counter is interesting as an example to explain how CAS works because it is simplistic. However, &lt;em&gt;memcached&lt;/em&gt; also provides the &lt;code&gt;incr&lt;/code&gt; and &lt;code&gt;decr&lt;/code&gt; methods to increment or decrement an integer in a single request, rather than doing multiple &lt;code&gt;gets&lt;/code&gt;/&lt;code&gt;cas&lt;/code&gt; calls. In real-world applications, &lt;code&gt;gets&lt;/code&gt; and &lt;code&gt;cas&lt;/code&gt; are used for more complex data types or operations.&lt;/p&gt;
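&lt;p&gt;For the counter case specifically, that could look like the following sketch, assuming &lt;em&gt;pymemcache&lt;/em&gt;&apos;s &lt;code&gt;add&lt;/code&gt; and &lt;code&gt;incr&lt;/code&gt; behave as documented:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def on_visit(client):
    ## `add` only succeeds if the key does not exist yet,
    ## so the counter is seeded exactly once.
    client.add(&apos;visitors&apos;, 0, noreply=False)
    ## Atomic server-side increment: no gets/cas loop needed.
    return client.incr(&apos;visitors&apos;, 1)
&lt;/code&gt;&lt;/pre&gt;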
&lt;p&gt;Most remote caching servers and data stores provide such a mechanism to prevent concurrency issues. It is critical to be aware of those cases to make proper use of their features.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Beyond Caching&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The simple techniques illustrated in this article showed you how easy it is to leverage &lt;em&gt;memcached&lt;/em&gt; to speed up the performances of your Python application.&lt;/p&gt;
&lt;p&gt;Just by using the two basic “set” and “get” operations you can often accelerate data retrieval or avoid recomputing results over and over again. With &lt;em&gt;memcached&lt;/em&gt; you can share the cache across a large number of distributed nodes.&lt;/p&gt;
&lt;p&gt;Other, more advanced patterns you saw in this tutorial, like the &lt;em&gt;Check And Set (CAS)&lt;/em&gt; operation, allow you to update data stored in the cache concurrently across multiple Python threads or processes while avoiding data corruption.&lt;/p&gt;
&lt;p&gt;If you are interested into learning more about advanced techniques to write faster and more scalable Python applications, check out &lt;a href=&quot;https://scaling-python.com/&quot;&gt;Scaling Python&lt;/a&gt;. It covers many advanced topics such as network distribution, queuing systems, distributed hashing, and code profiling.&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>How to Log Properly in Python</title><link>https://julien.danjou.info/blog/how-to-log-properly-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/how-to-log-properly-in-python/</guid><description>Logging is one of the most underrated features. Often ignored by software engineers, it can save you time when your application&apos;s running in production.  Most teams don&apos;t think about it until it&apos;s to</description><pubDate>Mon, 04 Feb 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Logging is one of the most underrated features. Often ignored by software engineers, it can save you time when your application&apos;s running in production.&lt;/p&gt;
&lt;p&gt;Most teams don&apos;t think about it until it&apos;s too late in their development process. It&apos;s when things start to get wrong in deployments that somebody realizes too late that logging is missing.&lt;/p&gt;
&lt;h2&gt;Guidelines&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://12factor.net&quot;&gt;Twelve-Factor App&lt;/a&gt; &lt;a href=&quot;https://12factor.net/logs&quot;&gt;defines logs&lt;/a&gt; as a &lt;em&gt;stream of aggregated, time-ordered events collected from the output streams of all running processes&lt;/em&gt;. It also describes how applications should handle their logging. We can summarize those guidelines as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Logs have no fixed beginning or end.&lt;/li&gt;
&lt;li&gt;Print logs to &lt;code&gt;stdout&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Print logs unbuffered.&lt;/li&gt;
&lt;li&gt;The environment is responsible for capturing the stream.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From my experience, this set of rules is a good trade-off. Logs have to be kept pretty simple to be efficient and reliable. Building complex logging systems might make it harder to get insight into a running application.&lt;/p&gt;
&lt;p&gt;There&apos;s also no point in duplicating effort on log management (e.g., log file rotation, archival policy, etc.) across your different applications. Having an external workflow that can be shared across different programs seems more efficient.&lt;/p&gt;
&lt;h2&gt;In Python&lt;/h2&gt;
&lt;p&gt;Python provides a logging subsystem with its &lt;a href=&quot;https://docs.python.org/3/library/logging.html&quot;&gt;&lt;em&gt;logging&lt;/em&gt;&lt;/a&gt; module. This module provides a &lt;em&gt;Logger&lt;/em&gt; object that allows you to emit messages with different levels of criticality. Those messages can then be filtered and sent to different handlers.&lt;/p&gt;
&lt;p&gt;Let&apos;s have an example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging

logger = logging.getLogger(&quot;myapp&quot;)
logger.error(&quot;something wrong&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Depending on the version of Python you&apos;re running you&apos;ll either see:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;No handlers could be found for logger &quot;myapp&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;something wrong
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Python 2 used to have no logging setup by default, so it would print an error message about no handler being found. Since Python 3, a last-resort handler that outputs to the console is installed by default, which is close to the requirements of the Twelve-Factor App.&lt;/p&gt;
&lt;p&gt;However, this default setup is far from being perfect.&lt;/p&gt;
&lt;h2&gt;Shortcomings&lt;/h2&gt;
&lt;p&gt;The default format that Python uses does not embed any contextual information. There is no way to know the name of the logger — &lt;code&gt;myapp&lt;/code&gt; in the previous example — nor the date and time of the logged message.&lt;/p&gt;
&lt;p&gt;You &lt;strong&gt;must&lt;/strong&gt; configure Python logging subsystem to enhance its output format.&lt;/p&gt;
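&lt;p&gt;For reference, a plain standard-library setup achieving a similar result could look like this; the format string is only an illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging

## Embed the timestamp, process ID, level and logger name
## into every record:
logging.basicConfig(
    level=logging.INFO,
    format=&quot;%(asctime)s [%(process)d] %(levelname)-8s %(name)s: %(message)s&quot;,
)

logging.getLogger(&quot;myapp&quot;).error(&quot;something wrong&quot;)
## 2018-12-13 10:24:04,373 [38550] ERROR    myapp: something wrong
&lt;/code&gt;&lt;/pre&gt;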
&lt;p&gt;Rather than repeating that boilerplate in every program, I advise using the &lt;em&gt;&lt;a href=&quot;https://github.com/jd/daiquiri&quot;&gt;daiquiri&lt;/a&gt;&lt;/em&gt; module. It provides an excellent default configuration and a simple API to configure logging, plus some exciting features.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/12/markus-spiske-109588-unsplash.jpg&quot; alt=&quot;Close-up of code on a screen illustrating logging configuration&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Logging Setup&lt;/h2&gt;
&lt;p&gt;When using &lt;em&gt;daiquiri&lt;/em&gt;, the first thing to do is to set up your logging correctly. This can be done with the &lt;code&gt;daiquiri.setup&lt;/code&gt; function as this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import daiquiri

daiquiri.setup()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As simple as that. You can tweak the setup further by asking it to log to file, to change the default string formats, etc, but just calling &lt;code&gt;daiquiri.setup&lt;/code&gt; is enough to get a proper logging default.&lt;/p&gt;
&lt;p&gt;See:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import daiquiri

daiquiri.setup()
daiquiri.getLogger(&quot;myapp&quot;).error(&quot;something wrong&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;outputs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;2018-12-13 10:24:04,373 [38550] ERROR    myapp: something wrong
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If your terminal supports writing text in colors, the line will be printed in red since it&apos;s an error. The format provided by &lt;em&gt;daiquiri&lt;/em&gt; is better than Python&apos;s default: this one includes a timestamp, the process ID, the criticality level and the logger&apos;s name. Needless to say, this format can also be customized.&lt;/p&gt;
&lt;h2&gt;Passing Contextual Information&lt;/h2&gt;
&lt;p&gt;Logging strings are boring. Most of the time, engineers end up writing code such as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;logger.error(&quot;Something wrong happened with %s when writing data at %d&quot;, myobject.myfield, myobject.mynumber)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The issue with this approach is that you have to think about each field of your object that is worth logging, and make sure they are inserted correctly into your sentence. If you forget an essential field to describe your object and the problem, you&apos;re screwed.&lt;/p&gt;
&lt;p&gt;A reliable alternative to this manual crafting of log strings is to pass interesting objects as keyword arguments. &lt;em&gt;Daiquiri&lt;/em&gt; supports it, and it works that way:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import attr
import daiquiri
import requests

daiquiri.setup()
logger = daiquiri.getLogger(&quot;myapp&quot;)

@attr.s
class Request:
    url = attr.ib()
    status_code = attr.ib(init=False, default=None)

    def get(self):
        r = requests.get(self.url)
        self.status_code = r.status_code
        r.raise_for_status()
        return r

user = &quot;jd&quot;
req = Request(&quot;https://google.com/not-this-page&quot;)
try:
    req.get()
except Exception:
    logger.error(&quot;Something wrong happened during the request&quot;,
                 request=req, user=user)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If anything goes wrong with the request, it will be logged with the stack trace, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;2018-12-14 10:37:24,586 [43644] ERROR    myapp [request: Request(url=&apos;https://google.com/not-this-page&apos;, status_code=404)] [user: jd]: Something wrong happened during the request
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, the call to &lt;code&gt;logger.error&lt;/code&gt; is pretty straightforward: a line that explains what&apos;s wrong, and then the different interesting objects are passed as keyword arguments.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Daiquiri&lt;/em&gt; logs those keyword arguments with a default format of &lt;code&gt;[key: value]&lt;/code&gt; that is included as a prefix to the log string. The value is printed using its &lt;code&gt;__format__&lt;/code&gt; method — that&apos;s why I&apos;m using the &lt;em&gt;&lt;a href=&quot;http://www.attrs.org/en/stable/&quot;&gt;attr&lt;/a&gt;&lt;/em&gt; module here: it automatically generates this method for me and includes all fields by default. You can also customize &lt;em&gt;daiquiri&lt;/em&gt; to use any other format.&lt;/p&gt;
&lt;p&gt;Following those guidelines should be a perfect start for logging correctly with Python!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Serious Python released!</title><link>https://julien.danjou.info/blog/serious-python-released/</link><guid isPermaLink="true">https://julien.danjou.info/blog/serious-python-released/</guid><description>Today I&apos;m glad to announce that my new book, Serious Python, has been released.  However, you wonder… what is Serious Python?  Well, Serious Python is the new name of The Hacker&apos;s Guide to Python</description><pubDate>Thu, 17 Jan 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today I&apos;m glad to announce that my new book, Serious Python, has been released.&lt;/p&gt;
&lt;p&gt;However, you wonder… what is &lt;em&gt;Serious Python&lt;/em&gt;?&lt;/p&gt;
&lt;p&gt;Well, Serious Python is the new name of &lt;em&gt;The Hacker&apos;s Guide to Python&lt;/em&gt; — the first book I published. Serious Python is the 4th update of that book — but with a brand new name and a new editor!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/01/serious-python.png&quot; alt=&quot;Cover of Serious Python&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For more than a year, I&apos;ve been working with the editor &lt;a href=&quot;https://nostarch.com&quot;&gt;No Starch Press&lt;/a&gt; to enhance this book and bring it to the next level! I&apos;m very proud of what we achieved, and working with a whole team on this book has been a fantastic experience.&lt;/p&gt;
&lt;p&gt;The content has been updated to be ready for 2019: &lt;em&gt;pytest&lt;/em&gt; is now a de-facto standard for testing, so I had to write about it. On the other hand, Python 2 support was less a focus, and I removed many mentions of Python 2 altogether. Some chapters have been reorganized, regrouped and others got enhanced with new content!&lt;/p&gt;
&lt;p&gt;The good news: you can get this new edition of the book with a &lt;strong&gt;15% discount&lt;/strong&gt; for the next 24 hours using the coupon code &lt;strong&gt;SERIOUSPYTHONLAUNCH&lt;/strong&gt; on the &lt;a href=&quot;https://serious-python.com&quot;&gt;book page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The book is also released as part of the No Starch collection. They are also in charge of distributing the paperback copy of the book. If you want a version of the book that you can touch and hold in your arms, look for it in the &lt;a href=&quot;https://nostarch.com/seriouspython&quot;&gt;No Starch shop&lt;/a&gt;, on &lt;a href=&quot;https://www.amazon.com/gp/product/B074S4G1L5/ref=as_li_tl?ie=UTF8&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=B074S4G1L5&amp;amp;linkCode=as2&amp;amp;tag=juliendanjou-20&amp;amp;linkId=2d68dde537d79ba5e334d4291ad37fff&quot;&gt;Amazon&lt;/a&gt; or in your favorite book shop!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/01/hackerspython_cover-front_v5.png&quot; alt=&quot;No Starch version of Serious Python cover&quot; /&gt;&lt;/p&gt;
</content:encoded><category>python</category><category>books</category></item><item><title>A multi-value syntax tree filtering in Python</title><link>https://julien.danjou.info/blog/multi-value-syntax-tree-filtering-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/multi-value-syntax-tree-filtering-in-python/</guid><description>A while ago, we saw how to write a simple filtering syntax tree with Python. The idea was to provide a small abstract syntax tree with an easy-to-write data structure that would be able to filter.</description><pubDate>Mon, 03 Dec 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A while ago, we saw &lt;a href=&quot;https://julien.danjou.info/blog/simple-filtering-syntax-tree-in-python&quot;&gt;how to write a simple filtering syntax tree with Python&lt;/a&gt;. The idea was to provide a small abstract syntax tree with an easy-to-write data structure that would be able to filter a value. Filtering, meaning that once evaluated, our AST would return either &lt;code&gt;True&lt;/code&gt; or &lt;code&gt;False&lt;/code&gt; based on the passed value.&lt;/p&gt;
&lt;p&gt;With that, we were able to write small rules like &lt;code&gt;Filter({&quot;eq&quot;: 3})(4)&lt;/code&gt; that would return &lt;code&gt;False&lt;/code&gt; since, well, 4 is not equal to 3.&lt;/p&gt;
&lt;p&gt;In this new post, I propose we enhance our filtering ability to support multiple values. The idea is to be able to write something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; f = Filter(
  {&quot;and&quot;: [
    {&quot;eq&quot;: (&quot;foo&quot;, 3)},
    {&quot;gt&quot;: (&quot;bar&quot;, 4)},
   ]
  },
)
&amp;gt;&amp;gt;&amp;gt; f(foo=3, bar=5)
True
&amp;gt;&amp;gt;&amp;gt; f(foo=4, bar=5)
False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The biggest change here is that the binary operators (&lt;code&gt;eq&lt;/code&gt;, &lt;code&gt;gt&lt;/code&gt;, &lt;code&gt;le&lt;/code&gt;, etc.) now support getting two values, and not only one, and that we can pass multiple values to our filter by using keyword arguments.&lt;/p&gt;
&lt;p&gt;How should we implement that? Well, we can keep the same data structure we built previously. However, this time we&apos;re gonna do the following change:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The left value of the binary operator will be a string that will be used as the key to access the keyword arguments passed to our &lt;code&gt;Filter.__call__&lt;/code&gt; values.&lt;/li&gt;
&lt;li&gt;The right value of the binary operator will be kept as it is (like before).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We therefore need to change our &lt;code&gt;Filter.build_evaluator&lt;/code&gt; to accommodate this, as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def build_evaluator(self, tree):
    try:
        operator, nodes = list(tree.items())[0]
    except Exception:
        raise InvalidQuery(&quot;Unable to parse tree %s&quot; % tree)
    try:
        op = self.multiple_operators[operator]
    except KeyError:
        try:
            op = self.binary_operators[operator]
        except KeyError:
            raise InvalidQuery(&quot;Unknown operator %s&quot; % operator)
        assert len(nodes) == 2 # binary operators take 2 values
        def _op(values):
            return op(values[nodes[0]], nodes[1])
        return _op
    # Iterate over every item in the list of the value linked
    # to the logical operator, and compile it down to its own
    # evaluator.
    elements = [self.build_evaluator(node) for node in nodes]
    return lambda values: op((e(values) for e in elements))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The algorithm is pretty much the same, the tree being browsed recursively.&lt;/p&gt;
&lt;p&gt;First, the operator and its arguments (nodes) are extracted.&lt;/p&gt;
&lt;p&gt;Then, if the operator takes multiple arguments (such as the &lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt; operators), each node is recursively evaluated and a function evaluating those nodes is returned. If the operator is a binary operator (such as &lt;code&gt;eq&lt;/code&gt;, &lt;code&gt;lt&lt;/code&gt;, etc.), it checks that the passed argument list length is 2. Then, it returns a function that will apply the operator (e.g., &lt;code&gt;operator.eq&lt;/code&gt;) to &lt;code&gt;values[nodes[0]]&lt;/code&gt; and &lt;code&gt;nodes[1]&lt;/code&gt;: the former accesses the arguments (&lt;code&gt;values&lt;/code&gt;) passed to the filter&apos;s &lt;code&gt;__call__&lt;/code&gt; function, while the latter is directly the passed argument.&lt;/p&gt;
&lt;p&gt;The full class looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import operator

class InvalidQuery(Exception):
    pass

class Filter(object):
    binary_operators = {
        u&quot;=&quot;: operator.eq,
        u&quot;==&quot;: operator.eq,
        u&quot;eq&quot;: operator.eq,

        u&quot;&amp;lt;&quot;: operator.lt,
        u&quot;lt&quot;: operator.lt,

        u&quot;&amp;gt;&quot;: operator.gt,
        u&quot;gt&quot;: operator.gt,

        u&quot;&amp;lt;=&quot;: operator.le,
        u&quot;≤&quot;: operator.le,
        u&quot;le&quot;: operator.le,

        u&quot;&amp;gt;=&quot;: operator.ge,
        u&quot;≥&quot;: operator.ge,
        u&quot;ge&quot;: operator.ge,

        u&quot;!=&quot;: operator.ne,
        u&quot;≠&quot;: operator.ne,
        u&quot;ne&quot;: operator.ne,
    }

    multiple_operators = {
        u&quot;or&quot;: any,
        u&quot;∨&quot;: any,
        u&quot;and&quot;: all,
        u&quot;∧&quot;: all,
    }

    def __init__(self, tree):
        self._eval = self.build_evaluator(tree)

    def __call__(self, **kwargs):
        return self._eval(kwargs)

    def build_evaluator(self, tree):
        try:
            operator, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery(&quot;Unable to parse tree %s&quot; % tree)
        try:
            op = self.multiple_operators[operator]
        except KeyError:
            try:
                op = self.binary_operators[operator]
            except KeyError:
                raise InvalidQuery(&quot;Unknown operator %s&quot; % operator)
            assert len(nodes) == 2 # binary operators take 2 values
            def _op(values):
                return op(values[nodes[0]], nodes[1])
            return _op
        # Iterate over every item in the list of the value linked
        # to the logical operator, and compile it down to its own
        # evaluator.
        elements = [self.build_evaluator(node) for node in nodes]
        return lambda values: op((e(values) for e in elements))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can check that it works by building some filters:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;x = Filter({&quot;eq&quot;: (&quot;foo&quot;, 1)})
assert x(foo=1)

x = Filter({&quot;eq&quot;: (&quot;foo&quot;, &quot;bar&quot;)})
assert not x(foo=1)

x = Filter({&quot;or&quot;: (
    {&quot;eq&quot;: (&quot;foo&quot;, &quot;bar&quot;)},
    {&quot;eq&quot;: (&quot;bar&quot;, 1)},
)})
assert x(foo=1, bar=1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Supporting multiple values is handy as it allows passing complete dictionaries to the filter, rather than just one value. That enables users to filter more complex objects.&lt;/p&gt;
&lt;h2&gt;Sub-dictionary support&lt;/h2&gt;
&lt;p&gt;It&apos;s also possible to support deeper data structures, like a dictionary of dictionaries. By replacing &lt;code&gt;values[nodes[0]]&lt;/code&gt; with &lt;code&gt;self._resolve_name(values, nodes[0])&lt;/code&gt;, where &lt;code&gt;_resolve_name&lt;/code&gt; is a method like this one, the filter is able to traverse dictionaries:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Both of these are defined on the Filter class
ATTR_SEPARATOR = &quot;.&quot;

def _resolve_name(self, values, name):
    try:
        for subname in name.split(self.ATTR_SEPARATOR):
            values = values[subname]
        return values
    except KeyError:
        raise InvalidQuery(&quot;Unknown attribute %s&quot; % name)
&lt;/code&gt;&lt;/pre&gt;
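&lt;p&gt;For illustration, here is a minimal sketch of what the binary-operator branch of &lt;code&gt;build_evaluator&lt;/code&gt; looks like with that substitution, assuming &lt;code&gt;_resolve_name&lt;/code&gt; and &lt;code&gt;ATTR_SEPARATOR&lt;/code&gt; are both defined on the &lt;code&gt;Filter&lt;/code&gt; class:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def _op(values):
    # Resolve dotted names such as &quot;baz.sub&quot; through nested
    # dictionaries instead of doing a plain key lookup.
    return op(self._resolve_name(values, nodes[0]), nodes[1])
return _op
&lt;/code&gt;&lt;/pre&gt;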
&lt;p&gt;It then works like that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;x = Filter({&quot;eq&quot;: (&quot;baz.sub&quot;, 23)})
assert x(foo=1, bar=1, baz={&quot;sub&quot;: 23})

x = Filter({&quot;eq&quot;: (&quot;baz.sub&quot;, 23)})
assert not x(foo=1, bar=1, baz={&quot;sub&quot;: 3})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By using the syntax &lt;code&gt;key.subkey.subsubkey&lt;/code&gt;, the filter is able to access items inside dictionaries of more complex data structures.&lt;/p&gt;
&lt;p&gt;That basic filter engine can evolve quite easily into something powerful, as you can add new operators or new ways to access and manipulate the passed data structure.&lt;/p&gt;
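&lt;p&gt;As a sketch of such an extension, here is one way to add a membership operator by subclassing &lt;code&gt;Filter&lt;/code&gt; (the &lt;code&gt;in&lt;/code&gt; operator name and the class name are my own choices, not part of the code above):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class MembershipFilter(Filter):
    # Copy the parent table, then extend it:
    # {&quot;in&quot;: (&quot;foo&quot;, [1, 2, 3])} checks that the value
    # passed as `foo` is in the given list.
    binary_operators = dict(Filter.binary_operators)
    binary_operators[u&quot;in&quot;] = lambda value, choices: value in choices

f = MembershipFilter({&quot;in&quot;: (&quot;foo&quot;, [1, 2, 3])})
assert f(foo=2)
assert not f(foo=5)
&lt;/code&gt;&lt;/pre&gt;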
&lt;p&gt;If you have other ideas on nifty features that could be added, feel free to add a comment below!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>The Best flake8 Extensions for your Python Project</title><link>https://julien.danjou.info/blog/the-best-flake8-extensions/</link><guid isPermaLink="true">https://julien.danjou.info/blog/the-best-flake8-extensions/</guid><description>In the last blog post about coding style, we dissected what the state of the art was regarding coding style check in Python.</description><pubDate>Mon, 05 Nov 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In the &lt;a href=&quot;https://julien.danjou.info/blog/code-style-checks-in-python&quot;&gt;last blog post about coding style&lt;/a&gt;, we dissected what the state of the art was regarding coding style check in Python.&lt;/p&gt;
&lt;p&gt;As we&apos;ve seen, Flake8 is a wrapper around several tools and is extensible via plugins: meaning that you can add your own checks. I&apos;m a heavy user of Flake8 and rely on a few plugins to extend its coverage of common programming mistakes in Python. Here&apos;s the list of the ones I can&apos;t work without. As a bonus, you&apos;ll find at the end of this post a sample of my go-to &lt;code&gt;tox.ini&lt;/code&gt; file.&lt;/p&gt;
&lt;h2&gt;flake8-import-order&lt;/h2&gt;
&lt;p&gt;The name is quite explicit: this extension checks the order of your &lt;code&gt;import&lt;/code&gt; statements at the beginning of your files. By default, it uses a style that I enjoy, which looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import os
import sys

import requests

import yaml

import myproject
from myproject.utils import somemodule
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The built-in modules are grouped first. Then comes a group for each third-party module that is imported. Finally, the last group contains the modules of the current project. I find this way of organizing imports quite clear and easy to read.&lt;/p&gt;
&lt;p&gt;To make sure flake8-import-order knows about your project&apos;s module name, you need to specify it in &lt;code&gt;tox.ini&lt;/code&gt; with the &lt;code&gt;application-import-names&lt;/code&gt; option.&lt;/p&gt;
&lt;p&gt;If you beg to differ, you can use &lt;a href=&quot;https://github.com/PyCQA/flake8-import-order/#styles&quot;&gt;any of the other styles that flake8-import-order offers by default&lt;/a&gt; by setting the &lt;code&gt;import-order-style&lt;/code&gt; option. You can obviously &lt;a href=&quot;https://github.com/PyCQA/flake8-import-order/#extending-styles&quot;&gt;provide your own style&lt;/a&gt;.&lt;/p&gt;
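&lt;p&gt;For example, both options go in the &lt;code&gt;[flake8]&lt;/code&gt; section (the project name and style below are placeholders to adapt):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[flake8]
application-import-names = myproject
# Only needed if you prefer another style than the default:
import-order-style = google
&lt;/code&gt;&lt;/pre&gt;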
&lt;h2&gt;flake8-blind-except&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/elijahandrews/flake8-blind-except&quot;&gt;flake8-blind-except extension&lt;/a&gt; checks that no &lt;code&gt;except&lt;/code&gt; statement is used without specifying an exception type. The following excerpt is, therefore, considered invalid:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    do_something()
except:
    pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;except&lt;/code&gt; without any exception type specified is considered bad practice as it might catch unwanted exceptions. It forces the developer to think about what kind of errors might happen and should &lt;em&gt;really&lt;/em&gt; be caught.&lt;/p&gt;
&lt;p&gt;In the rare cases where every exception really should be caught, it&apos;s still possible to use &lt;code&gt;except Exception&lt;/code&gt; explicitly.&lt;/p&gt;
&lt;h2&gt;flake8-builtins&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/gforcada/flake8-builtins&quot;&gt;flake8-builtins plugin&lt;/a&gt; checks that there is no name collision between your code and the Python builtin variables.&lt;/p&gt;
&lt;p&gt;For example, this code would trigger an error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def first(list):
    return list[0]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As &lt;code&gt;list&lt;/code&gt; is a builtin in Python (to create a list!), shadowing its definition by using &lt;code&gt;list&lt;/code&gt; as the name of a parameter in a function signature would trigger a warning from flake8-builtins.&lt;/p&gt;
&lt;p&gt;While the code is valid, it&apos;s a bad habit to override Python built-in functions. It might lead to tricky errors: in the above example, if you ever need to call &lt;code&gt;list()&lt;/code&gt; inside the function, you won&apos;t be able to.&lt;/p&gt;
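&lt;p&gt;To make the failure mode concrete, here is a small sketch of what happens when the shadowed name is actually needed:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def first(list):
    # The parameter shadows the builtin, so this call raises
    # TypeError: &apos;list&apos; object is not callable
    as_list = list(list)
    return as_list[0]

first([1, 2, 3])
&lt;/code&gt;&lt;/pre&gt;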
&lt;h2&gt;flake8-logging-format&lt;/h2&gt;
&lt;p&gt;This &lt;a href=&quot;https://github.com/globality-corp/flake8-logging-format&quot;&gt;module&lt;/a&gt; is handy as it is still slapping my fingers once in a while. When using the &lt;code&gt;logging&lt;/code&gt; module, it prevents you from writing this kind of code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mylogger.info(&quot;Hello %s&quot; % mystring)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While this works, it&apos;s suboptimal as it forces the string interpolation to happen whether or not the message is emitted. If the logger is configured to print only messages with a logging level of warning or above, doing the string interpolation here is pointless.&lt;/p&gt;
&lt;p&gt;Therefore, one should instead write:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mylogger.info(&quot;Hello %s&quot;, mystring)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same goes if you use &lt;code&gt;format&lt;/code&gt; to do any formatting.&lt;/p&gt;
&lt;p&gt;Be aware that contrary to other flake8 modules, this one does not enable the check by default. You&apos;ll need to add &lt;code&gt;enable-extensions=G&lt;/code&gt; in your &lt;code&gt;tox.ini&lt;/code&gt; file.&lt;/p&gt;
&lt;h2&gt;flake8-docstrings&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://gitlab.com/pycqa/flake8-docstrings&quot;&gt;flake8-docstrings&lt;/a&gt; module checks the content of your Python docstrings for compliance with &lt;a href=&quot;https://www.python.org/dev/peps/pep-0257/&quot;&gt;PEP 257&lt;/a&gt;. This PEP is full of small details about formatting your docstrings the right way, which is something you wouldn&apos;t be able to check without such a tool. A simple example would be:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Foobar:
    &quot;&quot;&quot;A foobar&quot;&quot;&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While this seems valid, the docstring is missing a period at the end.&lt;/p&gt;
&lt;p&gt;Trust me, especially if you are writing a library that is consumed by other developers, this is a must-have.&lt;/p&gt;
&lt;h2&gt;flake8-rst-docstrings&lt;/h2&gt;
&lt;p&gt;This &lt;a href=&quot;https://pypi.org/project/flake8-rst-docstrings/&quot;&gt;extension&lt;/a&gt; is a good complement to flake8-docstrings: it checks that the content of your docstrings is valid RST. It&apos;s a no-brainer, so I&apos;d install it without question. Again, if your project exports a documented API that is built with &lt;a href=&quot;https://sphinx-doc.org&quot;&gt;Sphinx&lt;/a&gt;, this is a must-have.&lt;/p&gt;
&lt;h2&gt;My standard tox.ini&lt;/h2&gt;
&lt;p&gt;Here&apos;s the standard &lt;code&gt;tox.ini&lt;/code&gt; excerpt that I use in most of my projects. You can copy-paste it and use it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[testenv:pep8]
deps = flake8
       flake8-import-order
       flake8-blind-except
       flake8-builtins
       flake8-docstrings
       flake8-rst-docstrings
       flake8-logging-format
commands = flake8

[flake8]
exclude = .tox
# If you need to ignore some error codes in the whole source code
# you can write them here
# ignore = D100,D101
show-source = true
enable-extensions=G
application-import-names = &amp;lt;myprojectname&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Before disabling an error code for your entire project, remember that you can force flake8 to ignore a particular instance of the error by adding the &lt;code&gt;# noqa&lt;/code&gt; tag at the end of the line.&lt;/p&gt;
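&lt;p&gt;For instance, to keep an import that flake8 would otherwise flag as unused (F401 is the pyflakes &quot;imported but unused&quot; code):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Re-exported for backward compatibility, not used in this module.
from myproject.utils import somemodule  # noqa: F401
&lt;/code&gt;&lt;/pre&gt;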
&lt;p&gt;If you have any flake8 extension that you think is useful, please let me know in the comment section!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Code Style Checks in Python</title><link>https://julien.danjou.info/blog/code-style-checks-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/code-style-checks-in-python/</guid><description>After starting your first Python project, you might realize that it is actually not that obvious to be consistent with the way you write Python code.</description><pubDate>Mon, 01 Oct 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;After &lt;a href=&quot;https://julien.danjou.info/blog/starting-your-first-python-project&quot;&gt;starting your first Python project&lt;/a&gt;, you might realize that it is actually not that obvious to be consistent with the way you write Python code. If you collaborate with other developers, your code style might differ, and the code can become somehow unreadable.&lt;/p&gt;
&lt;p&gt;I hate coding style discussions as much as every engineer. Who has not seen hours of nitpicking on code reviews, a heated debate around the coffee machine or nerf gun battles to decide where the semicolon should be?&lt;/p&gt;
&lt;p&gt;When I start a new project, the first thing I do is set up an automated style check. With that in place, there&apos;s no time wasted during code reviews on manually checking what a program is better at checking: coding style consistency. Since coding style is a touchy subject, it&apos;s a good reason to tackle it at the beginning of the project.&lt;/p&gt;
&lt;p&gt;Python has an amazing quality that few other languages have: it uses indentation to define blocks. While it offers a solution to the age-old question of &quot;where should I put my curly braces?&quot;, it introduces a new question in the process: &quot;how should I indent?&quot;.&lt;/p&gt;
&lt;p&gt;I imagine that it was one of the first questions raised in the community, so the Python folks, in their vast wisdom, came up with the &lt;a href=&quot;http://www.python.org/dev/peps/pep-0008/&quot;&gt;PEP 8&lt;/a&gt;: &lt;em&gt;Style Guide for Python Code&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;This document defines the standard style for writing Python code. The list of guidelines boils down to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use 4 spaces per indentation level.&lt;/li&gt;
&lt;li&gt;Limit all lines to a maximum of 79 characters.&lt;/li&gt;
&lt;li&gt;Separate top-level function and class definitions with two blank lines.&lt;/li&gt;
&lt;li&gt;Encode files using ASCII or UTF-8.&lt;/li&gt;
&lt;li&gt;One module import per &lt;code&gt;import&lt;/code&gt; statement and per line, at the top of the file, after comments and docstrings, grouped first by standard, then third-party, and finally local library imports.&lt;/li&gt;
&lt;li&gt;No extraneous whitespaces between parentheses, brackets, or braces, or before commas.&lt;/li&gt;
&lt;li&gt;Name classes in &lt;code&gt;CamelCase&lt;/code&gt;; suffix exceptions with &lt;code&gt;Error&lt;/code&gt; (if applicable); name functions in lowercase with words &lt;code&gt;separated_by_underscores&lt;/code&gt;; and use a leading underscore for &lt;code&gt;_private&lt;/code&gt; attributes or methods.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These guidelines really aren&apos;t hard to follow and they make a lot of sense. Most Python programmers have no trouble sticking to them as they write code.&lt;/p&gt;
&lt;p&gt;However, &lt;em&gt;errare humanum est&lt;/em&gt;, and it&apos;s still a pain to look through your code to make sure it fits the PEP 8 guidelines. That&apos;s what the &lt;a href=&quot;http://pycodestyle.pycqa.org/en/latest/&quot;&gt;pycodestyle&lt;/a&gt; tool (formerly called &lt;em&gt;pep8&lt;/em&gt;) is there for: it can automatically check any Python file you send its way.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pycodestyle hello.py
hello.py:4:1: E302 expected 2 blank lines, found 1
$ echo $?
1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;pycodestyle&lt;/em&gt; indicates which lines and columns do not conform to PEP 8 and reports each issue with a code. Violations of &lt;em&gt;MUST&lt;/em&gt; statements in the specification are reported as &lt;em&gt;errors&lt;/em&gt; — their error codes start with an &lt;em&gt;E&lt;/em&gt;. Minor issues are reported as &lt;em&gt;warnings&lt;/em&gt; — their error codes start with a &lt;em&gt;W&lt;/em&gt;. The three-digit code following the first letter indicates the exact kind of error or warning.&lt;/p&gt;
&lt;p&gt;You can tell the general category of an error code at a glance by looking at the hundreds digit: for example, errors starting with &lt;code&gt;E2&lt;/code&gt; indicate issues with whitespace; errors starting with &lt;code&gt;E3&lt;/code&gt; indicate issues with blank lines; and warnings starting with &lt;code&gt;W6&lt;/code&gt; indicate deprecated features being used.&lt;/p&gt;
&lt;p&gt;I advise you to consider it and run a PEP 8 validation tool against your source code on a regular basis. An easy way to do this is to integrate it into your continuous integration system: it&apos;s a good way to ensure that you continue to respect the PEP 8 guidelines in the long term.&lt;/p&gt;
&lt;p&gt;Most open source projects enforce PEP 8 conformance through automatic checks. Doing so from the beginning of the project might frustrate newcomers, but it also ensures that the codebase always looks the same in every part of the project. This is very important for a project of any size where there are multiple developers with differing opinions on whitespace ordering. You know what I mean.&lt;/p&gt;
&lt;p&gt;It&apos;s also possible to ignore certain kinds of errors and warnings by using the &lt;code&gt;--ignore&lt;/code&gt; option:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pycodestyle --ignore=E3 hello.py
$ echo $?
0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This allows you to effectively ignore parts of the PEP 8 specification that you don&apos;t want to follow. If you&apos;re running &lt;em&gt;pycodestyle&lt;/em&gt; on an existing code base, it also allows you to ignore certain kinds of problems so you can focus on fixing issues one category at a time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you write C code for Python (e.g. modules), the &lt;a href=&quot;http://www.python.org/dev/peps/pep-0007/&quot;&gt;PEP 7&lt;/a&gt; standard describes the coding style that you should follow.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Other tools also exist that check for actual coding errors rather than style errors. Some notable examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://launchpad.net/pyflakes&quot;&gt;pyflakes&lt;/a&gt;, which is also extendable via plugins.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pypi.python.org/pypi/pylint&quot;&gt;pylint&lt;/a&gt;, which also checks PEP 8 conformance while performing more checks by default. It also can be extended via plugins.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These tools all make use of static analysis — that is, they parse the code and analyze it rather than running it outright.&lt;/p&gt;
&lt;p&gt;If you choose to use &lt;em&gt;pyflakes&lt;/em&gt; — which I recommend — note that it doesn&apos;t check PEP 8 conformance on its own — you would still need &lt;em&gt;pycodestyle&lt;/em&gt; to do that. That means you need two different tools to get proper coverage.&lt;/p&gt;
&lt;p&gt;In order to simplify things, a project named &lt;em&gt;&lt;a href=&quot;https://pypi.python.org/pypi/flake8&quot;&gt;flake8&lt;/a&gt;&lt;/em&gt; exists and combines &lt;em&gt;pyflakes&lt;/em&gt; and &lt;em&gt;pycodestyle&lt;/em&gt; into a single command. It also adds some new fancy features: for example, it can skip checks on lines containing &lt;code&gt;# noqa&lt;/code&gt; and is extensible via plugins.&lt;/p&gt;
&lt;p&gt;There are a large number of plugins available for &lt;em&gt;flake8&lt;/em&gt; that you can just use. For example, installing &lt;em&gt;flake8-import-order&lt;/em&gt; (with &lt;code&gt;pip install flake8-import-order&lt;/code&gt;) will extend &lt;em&gt;flake8&lt;/em&gt; so it also checks that your &lt;code&gt;import&lt;/code&gt; statements are sorted alphabetically in your source code.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;flake8&lt;/em&gt; is now heavily used in most open source projects for code style verification. Some large open source projects even wrote their own plugins, adding checks for errors such as odd usage of &lt;code&gt;except&lt;/code&gt;, Python 2/3 portability issues, import style, dangerous string formatting, possible localization issues, etc.&lt;/p&gt;
&lt;p&gt;If you&apos;re starting a new project, I strongly recommend you use one of these tools and rely on it for automatic checking of your code quality and style. If you already have a codebase, a good approach is to run them with most of the warnings disabled and fix issues one category at a time.&lt;/p&gt;
&lt;p&gt;While none of these tools may be a perfect fit for your project or your preferences, using &lt;em&gt;flake8&lt;/em&gt; and its plugins together is a good way to improve the quality of your code and make it more durable. If nothing else, it&apos;s a good start toward that goal.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Many text editors, including the famous &lt;a href=&quot;http://www.gnu.org/software/emacs/&quot;&gt;GNU Emacs&lt;/a&gt; and &lt;a href=&quot;http://www.vim.org/&quot;&gt;vim&lt;/a&gt;, have plugins available (such as &lt;em&gt;Flycheck&lt;/em&gt;) that can run tools such as &lt;em&gt;pep8&lt;/em&gt; or &lt;em&gt;flake8&lt;/em&gt; directly in your code buffer, interactively highlighting any part of your code that isn&apos;t PEP 8-compliant. This is a handy way to fix most style errors as you write your code.&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><category>python</category></item><item><title>High-Performance in Python with Zero-Copy and the Buffer Protocol</title><link>https://julien.danjou.info/blog/high-performance-in-python-with-zero-copy-and-the-buffer-protocol/</link><guid isPermaLink="true">https://julien.danjou.info/blog/high-performance-in-python-with-zero-copy-and-the-buffer-protocol/</guid><description>Whatever your programs are doing, they often have to deal with vast amounts of data. This data is usually represented and manipulated in the form of strings. However, handling such a large quantity of</description><pubDate>Mon, 03 Sep 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Whatever your programs are doing, they often have to deal with vast amounts of data. This data is usually represented and manipulated in the form of &lt;em&gt;strings&lt;/em&gt;. However, handling such a large quantity of input in strings can be very inefficient once you start manipulating them by copying, slicing, and modifying. Why?&lt;/p&gt;
&lt;p&gt;Let&apos;s consider a small program which reads a large file of binary data, and&lt;br /&gt;
copies it partially into another file. To examine the memory usage of this program, we will use &lt;a href=&quot;https://pypi.python.org/pypi/memory_profiler&quot;&gt;memory_profiler&lt;/a&gt;, an excellent Python package that allows us to see the memory usage of a program line by line.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@profile
def read_random():
    with open(&quot;/dev/urandom&quot;, &quot;rb&quot;) as source:
        content = source.read(1024 * 10000)
        content_to_write = content[1024:]
    print(&quot;Content length: %d, content to write length %d&quot; %
          (len(content), len(content_to_write)))
    with open(&quot;/dev/null&quot;, &quot;wb&quot;) as target:
        target.write(content_to_write)

if __name__ == &apos;__main__&apos;:
    read_random()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running the above program using &lt;em&gt;memory_profiler&lt;/em&gt; produces the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ python -m memory_profiler memoryview/copy.py
Content length: 10240000, content to write length 10238976
Filename: memoryview/copy.py

Mem usage    Increment   Line Contents
======================================
                         @profile
 9.883 MB     0.000 MB   def read_random():
 9.887 MB     0.004 MB       with open(&quot;/dev/urandom&quot;, &quot;rb&quot;) as source:
19.656 MB     9.770 MB           content = source.read(1024 * 10000)
29.422 MB     9.766 MB           content_to_write = content[1024:]
29.422 MB     0.000 MB       print(&quot;Content length: %d, content to write length %d&quot; %
29.434 MB     0.012 MB             (len(content), len(content_to_write)))
29.434 MB     0.000 MB       with open(&quot;/dev/null&quot;, &quot;wb&quot;) as target:
29.434 MB     0.000 MB           target.write(content_to_write)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The call to &lt;code&gt;source.read&lt;/code&gt; reads 10 MB from &lt;code&gt;/dev/urandom&lt;/code&gt;. Python needs to allocate around 10 MB of memory to store this data as a string. The instruction on the line just after, &lt;code&gt;content[1024:]&lt;/code&gt;, copies the entire block of data minus the first KB — allocating 10 more megabytes.&lt;/p&gt;
&lt;p&gt;So what&apos;s interesting here is to notice that the memory usage of the program increased by about 10 MB when building the variable &lt;code&gt;content_to_write&lt;/code&gt;. The slice operator is copying the entirety of &lt;code&gt;content&lt;/code&gt;, minus the first KB, into a new string object.&lt;/p&gt;
&lt;p&gt;When dealing with extensive data, performing this kind of operation on large byte arrays is going to be a disaster. If you have already written C code, you know that using &lt;code&gt;memcpy()&lt;/code&gt; has a significant cost, both in terms of memory usage and general performance: copying memory is slow.&lt;/p&gt;
&lt;p&gt;However, as a C programmer, you also know that strings are arrays of characters and that nothing stops you from looking at only part of this array without copying it, through the use of basic pointer arithmetic – assuming that the entire string is in a contiguous memory area.&lt;/p&gt;
&lt;p&gt;This is possible in Python using objects which implement the &lt;em&gt;buffer protocol&lt;/em&gt;. The buffer protocol is defined in &lt;a href=&quot;http://www.python.org/dev/peps/pep-3118/&quot;&gt;PEP 3118&lt;/a&gt;, which explains the C API used to provide this protocol to various types, such as strings.&lt;/p&gt;
&lt;p&gt;When an object implements this protocol, you can use the &lt;code&gt;memoryview&lt;/code&gt; class constructor on it to build a new &lt;em&gt;memoryview&lt;/em&gt; object that references the original object memory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; s = b&quot;abcdefgh&quot;
&amp;gt;&amp;gt;&amp;gt; view = memoryview(s)
&amp;gt;&amp;gt;&amp;gt; view[1]
98
&amp;gt;&amp;gt;&amp;gt; limited = view[1:3]
&amp;gt;&amp;gt;&amp;gt; limited
&amp;lt;memory at 0x7fca18b8d460&amp;gt;
&amp;gt;&amp;gt;&amp;gt; bytes(view[1:3])
b&apos;bc&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: &lt;code&gt;98&lt;/code&gt; is the ASCII code for the letter &lt;code&gt;b&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In the example above, we use the fact that the &lt;code&gt;memoryview&lt;/code&gt; object&apos;s slice operator itself returns a &lt;code&gt;memoryview&lt;/code&gt; object. That means it does &lt;strong&gt;not&lt;/strong&gt; copy any data but merely references a particular slice of it.&lt;/p&gt;
&lt;p&gt;The graph below illustrates what happens:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/08/serious-python__3.png&quot; alt=&quot;serious-python__3&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Therefore, it is possible to rewrite the program above in a more efficient manner. We need to reference the data that we want to write using a &lt;em&gt;memoryview&lt;/em&gt; object, rather than allocating a new string.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@profile
def read_random():
    with open(&quot;/dev/urandom&quot;, &quot;rb&quot;) as source:
        content = source.read(1024 * 10000)
        content_to_write = memoryview(content)[1024:]
    print(&quot;Content length: %d, content to write length %d&quot; %
          (len(content), len(content_to_write)))
    with open(&quot;/dev/null&quot;, &quot;wb&quot;) as target:
        target.write(content_to_write)

if __name__ == &apos;__main__&apos;:
    read_random()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s run the program above with the memory profiler:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ python -m memory_profiler memoryview/copy-memoryview.py
Content length: 10240000, content to write length 10238976
Filename: memoryview/copy-memoryview.py

Mem usage    Increment   Line Contents
======================================
                         @profile
 9.887 MB     0.000 MB   def read_random():
 9.891 MB     0.004 MB       with open(&quot;/dev/urandom&quot;, &quot;rb&quot;) as source:
19.660 MB     9.770 MB           content = source.read(1024 * 10000)
19.660 MB     0.000 MB           content_to_write = memoryview(content)[1024:]
19.660 MB     0.000 MB       print(&quot;Content length: %d, content to write length %d&quot; %
19.672 MB     0.012 MB             (len(content), len(content_to_write)))
19.672 MB     0.000 MB       with open(&quot;/dev/null&quot;, &quot;wb&quot;) as target:
19.672 MB     0.000 MB           target.write(content_to_write)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In that case, the &lt;code&gt;source.read&lt;/code&gt; call still allocates 10 MB of memory to read the content of the file. However, when using &lt;code&gt;memoryview&lt;/code&gt; to refer to the offset content, no more memory is allocated.&lt;/p&gt;
&lt;p&gt;This version of the program ends up allocating 50% less memory than the original version!&lt;/p&gt;
&lt;p&gt;This kind of trick is especially useful when dealing with sockets. When sending data over a socket, all the data might not be sent in a single call.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import socket
s = socket.socket(…)
s.connect(…)
# Build a bytes object with more than 100 million copies of the letter `a`
data = b&quot;a&quot; * (1024 * 100000)
while data:
    sent = s.send(data)
    # Remove the first `sent` bytes sent
    data = data[sent:]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a mechanism like the one implemented above, the program copies the data over and over until the socket has sent everything. By using &lt;code&gt;memoryview&lt;/code&gt;, it is possible to achieve the same functionality with zero copy, and therefore higher performance:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import socket
s = socket.socket(…)
s.connect(…)
# Build a bytes object with more than 100 million copies of the letter `a`
data = b&quot;a&quot; * (1024 * 100000)
mv = memoryview(data)
while mv:
    sent = s.send(mv)
    # Build a new memoryview object pointing to the data which remains to be sent
    mv = mv[sent:]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As this won&apos;t copy anything, it won&apos;t use any more memory than the 100 MB&lt;br /&gt;
initially needed for the &lt;code&gt;data&lt;/code&gt; variable.&lt;/p&gt;
&lt;p&gt;So far we&apos;ve used &lt;code&gt;memoryview&lt;/code&gt; objects to write data efficiently, but the same method can also be used to read data. Most I/O operations in Python know how to deal with objects implementing the buffer protocol. They can read from it, but also write to it. In this case, we don&apos;t need &lt;code&gt;memoryview&lt;/code&gt; objects – we can ask an I/O function to write into our pre-allocated object:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; ba = bytearray(8)
&amp;gt;&amp;gt;&amp;gt; ba
bytearray(b&apos;\x00\x00\x00\x00\x00\x00\x00\x00&apos;)
&amp;gt;&amp;gt;&amp;gt; with open(&quot;/dev/urandom&quot;, &quot;rb&quot;) as source:
...     source.readinto(ba)
... 
8
&amp;gt;&amp;gt;&amp;gt; ba
bytearray(b&apos;`m.z\x8d\x0fp\xa1&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With such techniques, it&apos;s easy to pre-allocate a buffer (as you would do in C to mitigate the number of calls to &lt;code&gt;malloc()&lt;/code&gt;) and fill it at your convenience.&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;memoryview&lt;/code&gt;, you can even place data at any point in the memory area:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; ba = bytearray(8)
&amp;gt;&amp;gt;&amp;gt; # Reference the _bytearray_ from offset 4 to its end
&amp;gt;&amp;gt;&amp;gt; ba_at_4 = memoryview(ba)[4:]
&amp;gt;&amp;gt;&amp;gt; with open(&quot;/dev/urandom&quot;, &quot;rb&quot;) as source:
... # Write the content of /dev/urandom from offset 4 to the end of the
... # bytearray, effectively reading 4 bytes only
...     source.readinto(ba_at_4)
... 
4
&amp;gt;&amp;gt;&amp;gt; ba
bytearray(b&apos;\x00\x00\x00\x00\x0b\x19\xae\xb2&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The buffer protocol is fundamental to achieve low memory overhead and great performances. As Python hides all the memory allocations, developers tend to forget what happens under the hood, at a high cost for the speed of their programs!&lt;/p&gt;
&lt;p&gt;It&apos;s also good to know that both the objects in the &lt;code&gt;array&lt;/code&gt; module and the functions in the &lt;code&gt;struct&lt;/code&gt; module handle the buffer protocol correctly, and can therefore perform efficiently when targeting zero copy.&lt;/p&gt;
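&lt;p&gt;As a small sketch of that last point, &lt;code&gt;struct.pack_into&lt;/code&gt; and &lt;code&gt;struct.unpack_from&lt;/code&gt; can write to and read from a pre-allocated buffer directly, without building intermediate &lt;code&gt;bytes&lt;/code&gt; objects:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import struct

buf = bytearray(8)  # pre-allocated, writable buffer

# Write two little-endian unsigned 32-bit integers straight into the buffer
struct.pack_into(&quot;&amp;lt;II&quot;, buf, 0, 1, 2)

# Read the second integer back, again without copying the buffer
print(struct.unpack_from(&quot;&amp;lt;I&quot;, buf, 4))  # (2,)
&lt;/code&gt;&lt;/pre&gt;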
</content:encoded><category>python</category></item><item><title>Starting your first Python project</title><link>https://julien.danjou.info/blog/starting-your-first-python-project/</link><guid isPermaLink="true">https://julien.danjou.info/blog/starting-your-first-python-project/</guid><description>There&apos;s a gap between learning the syntax of the Python programming language and being able to build a project from scratch. When you finish reading your first tutorial or book about Python, you&apos;re go</description><pubDate>Thu, 26 Jul 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;There&apos;s a gap between learning the syntax of the Python programming language and being able to build a project from scratch. When you finish reading your first tutorial or book about Python, you&apos;re good to go for writing a Fibonacci sequence calculator, but that does not help you start your &lt;em&gt;actual&lt;/em&gt; project.&lt;/p&gt;
&lt;p&gt;There are a few questions that pop up in your mind, and that&apos;s normal. Let&apos;s take a stab at those!&lt;/p&gt;
&lt;h3&gt;Which Python version should I use?&lt;/h3&gt;
&lt;p&gt;It&apos;s not a secret that Python has several versions that are supported at the same time. Each minor version of the interpreter gets bugfix support for 18 months and security support for 5 years. For example, Python 3.7, released on 27th June 2018, will be supported until Python 3.8 is released, around October 2019 (15 months later). Around December 2019, the last bugfix release of Python 3.7 will occur, and everyone is expected to switch to Python 3.8.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/07/python-release-timeline.png&quot; alt=&quot;Current Python 3.7/3.8 release schedule&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That&apos;s important to be aware of, as the version of the interpreter will be an integral part of your software lifecycle.&lt;/p&gt;
&lt;p&gt;On top of that, we should take into consideration the Python 2 versus Python 3 question. That still might be an open question for people working with (very) old platforms.&lt;/p&gt;
&lt;p&gt;In the end, the question of which version of Python one should use is well worth asking.&lt;/p&gt;
&lt;p&gt;Here are some short answers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Versions 2.6 and older are really obsolete by now, so you don&apos;t have to worry about supporting them at all. If you intend on supporting these older versions anyway, be warned that you&apos;ll have an even harder time ensuring that your program supports Python 3.x as well. Though you might still run into Python 2.6 on some older systems; if that&apos;s the case, sorry for you!&lt;/li&gt;
&lt;li&gt;Version 2.7 is and will remain the last version of Python 2.x. I don&apos;t think there is a system where Python 3 is not available one way or the other nowadays. So unless you&apos;re doing archeology once again, forget it. Python 2.7 will not be supported after the year 2020, so the last thing you want to do is build new software based on it.&lt;/li&gt;
&lt;li&gt;Version 3.7 is the most recent version of the Python 3 branch as of this writing, and that&apos;s the one that you should target. Most recent operating systems ship at least 3.6, so in the case where you&apos;d target those, you can make sure your application also works with 3.7.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Project Layout&lt;/h2&gt;
&lt;p&gt;Starting a new project is always a puzzle. You never know how to organize your files. However, once you have a proper understanding of the best practices out there, it&apos;s pretty simple.&lt;/p&gt;
&lt;p&gt;First, your project structure should be fairly basic. Use packages and hierarchy wisely: a deep hierarchy can be a nightmare to navigate, while a flat hierarchy tends to become bloated.&lt;/p&gt;
&lt;p&gt;Then, avoid making a few common mistakes. Don&apos;t leave unit tests outside the package directory. These tests should be included in a sub-package of your software so that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They don&apos;t get automatically installed as a &lt;em&gt;tests&lt;/em&gt; top-level module by &lt;em&gt;setuptools&lt;/em&gt; (or some other packaging library) by accident.&lt;/li&gt;
&lt;li&gt;They can be installed and eventually used by other packages to build their unit tests.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following diagram illustrates what a standard file hierarchy should look like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/07/serious-python__1-3.png&quot; alt=&quot;A Python project files and directories hierarchy&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;setup.py&lt;/code&gt; is the standard name for Python installation script, along with its companion &lt;code&gt;setup.cfg&lt;/code&gt;, which should contain the installation script configuration. When run, &lt;code&gt;setup.py&lt;/code&gt; installs your package using the Python distribution utilities.&lt;/p&gt;
&lt;p&gt;You can also provide valuable information to users in &lt;code&gt;README.rst&lt;/code&gt; (or &lt;code&gt;README.txt&lt;/code&gt;, or whatever filename suits your fancy). Finally, the &lt;code&gt;docs&lt;/code&gt; directory should contain the package&apos;s documentation in &lt;em&gt;reStructuredText&lt;/em&gt; format, that will be consumed by &lt;em&gt;Sphinx&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Packages often have to provide extra data, such as images, shell scripts, and so forth. Unfortunately, there&apos;s no universally accepted standard for where these files should be stored. Just put them wherever makes the most sense for your project: depending on their functions, for example, Web application templates could go in a &lt;code&gt;templates&lt;/code&gt; directory in your package root directory.&lt;/p&gt;
&lt;p&gt;The following top-level directories also frequently appear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;etc&lt;/code&gt; for sample configuration files.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tools&lt;/code&gt; for shell scripts or related tools.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bin&lt;/code&gt; for binary scripts you&apos;ve written that will be installed by &lt;code&gt;setup.py&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&apos;s another design issue that I often encounter. When creating files or modules, some developers create them based on the type of code they will store. For example, they would create &lt;code&gt;functions.py&lt;/code&gt; or &lt;code&gt;exceptions.py&lt;/code&gt; files. This is a &lt;em&gt;terrible&lt;/em&gt; approach. It doesn&apos;t help any developer when navigating the code. The code organization doesn&apos;t benefit from this, and it forces readers to jump between files for no good reason. There are a few exceptions, such as libraries in some instances, because they do expose a complete API to consumers. However, other than that, think twice before doing that in your application.&lt;/p&gt;
&lt;p&gt;Organize your code based on features, not based on types.&lt;/p&gt;
&lt;p&gt;Creating a module directory with just an &lt;code&gt;__init__.py&lt;/code&gt; file in it is also a bad idea. For example, don&apos;t create a directory named &lt;code&gt;hooks&lt;/code&gt; with a single file named &lt;code&gt;hooks/__init__.py&lt;/code&gt; in it where &lt;code&gt;hooks.py&lt;/code&gt; would have been enough instead. If you create a directory, it should contain several other Python files that belong to the category the directory represents.&lt;/p&gt;
&lt;p&gt;Also be very careful about the code that you put in &lt;code&gt;__init__.py&lt;/code&gt; files: it is going to be called and executed the first time that any of the modules contained in the directory is loaded. This can have unwanted side effects. Those &lt;code&gt;__init__.py&lt;/code&gt; files should be empty most of the time unless you know what you&apos;re doing.&lt;/p&gt;
&lt;h2&gt;Version Numbering&lt;/h2&gt;
&lt;p&gt;Software versions need to be stamped so that you know which one is more recent than another. As every piece of code evolves, it&apos;s a requirement for every project to be able to organize its timeline.&lt;/p&gt;
&lt;p&gt;There is an infinite number of ways to organize your version numbers, but &lt;a href=&quot;http://www.python.org/dev/peps/pep-0440/&quot;&gt;PEP 440&lt;/a&gt; introduces a version format that every Python package, and ideally every application, should follow. This way, programs and packages will be able to quickly and reliably identify which versions of your package they require.&lt;/p&gt;
&lt;p&gt;PEP 440 defines the following regular expression format for version numbering:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;N[.N]+[{a|b|c|rc}N][.postN][.devN]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This allows for standard numbering like &lt;code&gt;1.2&lt;/code&gt; or &lt;code&gt;1.2.3&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;However, please do note that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;1.2&lt;/code&gt; is equivalent to &lt;code&gt;1.2.0&lt;/code&gt;; &lt;code&gt;1.3.4&lt;/code&gt; is equivalent to &lt;code&gt;1.3.4.0&lt;/code&gt;, and so forth.&lt;/li&gt;
&lt;li&gt;Versions matching &lt;code&gt;N[.N]+&lt;/code&gt; are considered &lt;em&gt;final releases&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Date-based versions such as &lt;code&gt;2013.06.22&lt;/code&gt; are considered invalid. Automated tools designed to detect PEP 440-format version numbers will (or should) raise an error if they detect a version number greater than or equal to &lt;code&gt;1980&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Final components can also use the following format:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;N[.N]+aN&lt;/code&gt; (e.g. &lt;code&gt;1.2a1&lt;/code&gt;) denotes an &lt;em&gt;alpha&lt;/em&gt; release, a version that might be unstable and missing features.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;N[.N]+bN&lt;/code&gt; (e.g. &lt;code&gt;2.3.1b2&lt;/code&gt;) denotes a &lt;em&gt;beta&lt;/em&gt; release, a version that might be feature-complete but still buggy.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;N[.N]+cN&lt;/code&gt; or &lt;code&gt;N[.N]+rcN&lt;/code&gt; (e.g. &lt;code&gt;0.4rc1&lt;/code&gt;) denotes a &lt;em&gt;(release) candidate&lt;/em&gt;, a version that might be released as the final product unless significant bugs emerge. While the &lt;code&gt;rc&lt;/code&gt; and &lt;code&gt;c&lt;/code&gt; suffixes have the same meaning, if both are used, &lt;code&gt;rc&lt;/code&gt; releases are considered to be newer than &lt;code&gt;c&lt;/code&gt; releases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These suffixes can also be used:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.postN&lt;/code&gt; (e.g.&lt;code&gt;1.4.post2&lt;/code&gt;) indicates a &lt;em&gt;post-release&lt;/em&gt;. These are typically used to address minor errors in the publication process (e.g. mistakes in release notes). You shouldn&apos;t use &lt;code&gt;.postN&lt;/code&gt; when releasing a bugfix version; instead, you should increment the minor version number.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.devN&lt;/code&gt; (e.g. &lt;code&gt;2.3.4.dev3&lt;/code&gt;) indicates a &lt;em&gt;developmental release&lt;/em&gt;. This suffix is discouraged because it is harder for humans to parse. It indicates a prerelease of the version that it qualifies: e.g. &lt;code&gt;2.3.4.dev3&lt;/code&gt; indicates the third developmental version of the &lt;code&gt;2.3.4&lt;/code&gt; release, before any alpha, beta, candidate or final release.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This scheme should be sufficient for most common use cases.&lt;/p&gt;
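&lt;p&gt;If you want to check how such version numbers compare, the &lt;code&gt;packaging&lt;/code&gt; library (the PEP 440 implementation used by &lt;em&gt;pip&lt;/em&gt;) lets you experiment; a quick sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from packaging import version

# 1.2 and 1.2.0 identify the same release
assert version.Version(&quot;1.2&quot;) == version.Version(&quot;1.2.0&quot;)

# Developmental releases come before alpha, beta, rc and final releases
assert version.Version(&quot;2.3.4.dev3&quot;) &amp;lt; version.Version(&quot;2.3.4a1&quot;)
assert version.Version(&quot;0.4rc1&quot;) &amp;lt; version.Version(&quot;0.4&quot;)
&lt;/code&gt;&lt;/pre&gt;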
&lt;blockquote&gt;
&lt;p&gt;You might have heard of &lt;a href=&quot;http://semver.org/&quot;&gt;Semantic Versioning&lt;/a&gt;, which provides its own guidelines for version numbering. This specification partially overlaps with PEP 440, but unfortunately, they&apos;re not entirely compatible. For example, Semantic Versioning&apos;s recommendation for prerelease versioning uses a scheme such as &lt;code&gt;1.0.0-alpha+001&lt;/code&gt; that is not compliant with PEP 440.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Many DVCS platforms, such as Git and Mercurial, can generate version numbers using an identifying hash (for Git, refer to &lt;code&gt;git describe&lt;/code&gt;). Unfortunately, this system isn&apos;t compatible with the scheme defined by PEP 440: for one thing, identifying hashes aren&apos;t orderable.&lt;/p&gt;
&lt;p&gt;Those are only some of the first questions you could have. If you have any other ones that you would like me to answer, feel free to write a comment below. The same goes if you have any other pieces of advice you&apos;d like to share!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>A simple filtering syntax tree in Python</title><link>https://julien.danjou.info/blog/simple-filtering-syntax-tree-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/simple-filtering-syntax-tree-in-python/</guid><description>Working on various pieces of software those last years, I noticed that there&apos;s always a feature that requires implementing some DSL.  The problem with DSL is that it is never the road that you want to</description><pubDate>Thu, 03 May 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Working on various pieces of software those last years, I noticed that there&apos;s always a feature that requires implementing some DSL.&lt;/p&gt;
&lt;p&gt;The problem with a DSL is that it is never the road that you want to take. I remember how creating my first DSL was fascinating: after using programming languages for years, I was finally designing my own tiny language!&lt;/p&gt;
&lt;p&gt;A new language that my users would have to learn and master. Oh, it had nothing new, it was a subset of something, inspired by my years of C, Perl or Python, who knows. And that&apos;s the terrible part about DSLs: they are a marvelous tradeoff between the power that they give to users, allowing them to define precisely their needs, and the cumbersomeness of learning a language that will be useful in only one specific situation.&lt;/p&gt;
&lt;p&gt;In this blog post, I would like to introduce a very unsophisticated way of implementing the syntax tree that could be used as a basis for a DSL. The goal of that syntax tree will be filtering. The problem it will solve is the following: having a piece of data, we want the user to tell us if the data matches their conditions or not.&lt;/p&gt;
&lt;p&gt;To give a concrete example: a machine wants to grant the user the ability to filter the beans that it should keep. What the machine passes to the filter is the size of the current bean, and the filter should return either &lt;code&gt;true&lt;/code&gt; or &lt;code&gt;false&lt;/code&gt;, based on the condition defined by the user: for example, only keep beans that are between 1 and 2 centimeters or between 4 and 6 centimeters.&lt;/p&gt;
&lt;p&gt;The number of conditions that the users can define could be quite considerable, and we want to provide at least a basic set of predicate operators: &lt;code&gt;equal&lt;/code&gt;, &lt;code&gt;greater than&lt;/code&gt; and &lt;code&gt;less than&lt;/code&gt;. We also want the user to be able to combine those, so we&apos;ll add the logical operators &lt;code&gt;or&lt;/code&gt; and &lt;code&gt;and&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A set of conditions can be seen as a tree, where nodes are either predicates, in which case they are leaves and do not have children, or logical operators, which have children. For example, the propositional logic formula &lt;code&gt;φ1 ∨ (φ2 ∨ φ3)&lt;/code&gt; can be represented as a tree like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/05/Mqs5i-1.png&quot; alt=&quot;Mqs5i-1&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Starting with this in mind, it appears that the natural solution is going to be recursive: handle the predicate as terminal, and if the node is a logical operator, recurse over its children.&lt;br /&gt;
Since we will be doing Python, we&apos;re going to use Python to evaluate our syntax tree.&lt;/p&gt;
&lt;p&gt;The simplest way to write a tree in Python is going to be using dictionaries. A dictionary will represent one node and will have only one key and one value: the key will be the name of the operator (&lt;code&gt;equal&lt;/code&gt;, &lt;code&gt;greater than&lt;/code&gt;, &lt;code&gt;or&lt;/code&gt;, &lt;code&gt;and&lt;/code&gt;…) and the value will be the argument of this operator if it is a predicate, or a list of children (as dictionaries) if it is a logical operator.&lt;/p&gt;
&lt;p&gt;For example, to filter our bean, we would create a tree such as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{&quot;or&quot;: [
  {&quot;and&quot;: [
    {&quot;ge&quot;: 1},
    {&quot;le&quot;: 2},
  ]},
  {&quot;and&quot;: [
    {&quot;ge&quot;: 4},
    {&quot;le&quot;: 6},
  ]},
]}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The goal here is to walk through the tree and evaluate each of the leaves of the tree and returning the final result: if we passed &lt;code&gt;5&lt;/code&gt; to this filter, it would return &lt;code&gt;True&lt;/code&gt;, and if we passed &lt;code&gt;10&lt;/code&gt; to this filter, it would return &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s how we could implement a very shallow filter that only handles predicates (for now):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import operator

class InvalidQuery(Exception):
    pass

class Filter(object):
    binary_operators = {
        &quot;eq&quot;: operator.eq,
        &quot;gt&quot;: operator.gt,
        &quot;ge&quot;: operator.ge,
        &quot;lt&quot;: operator.lt,
        &quot;le&quot;: operator.le,
    }

    def __init__(self, tree):
        # Parse the tree and store the evaluator
        self._eval = self.build_evaluator(tree)

    def __call__(self, value):
        # Call the evaluator with the value
        return self._eval(value)

    def build_evaluator(self, tree):
        try:
            # Pick the first item of the dictionary.
            # If the dictionary has multiple keys/values
            # the first one (= random) will be picked.
            # The key is the operator name (e.g. &quot;eq&quot;)
            # and the value is the argument for it
            operator, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery(&quot;Unable to parse tree %s&quot; % tree)
        try:
            # Lookup the operator name
            op = self.binary_operators[operator]
        except KeyError:
            raise InvalidQuery(&quot;Unknown operator %s&quot; % operator)
        # Return a function (lambda) that takes
        # the filtered value as argument and returns
        # the result of the predicate evaluation
        return lambda value: op(value, nodes)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can use this &lt;code&gt;Filter&lt;/code&gt; class by passing a predicate such as &lt;code&gt;{&quot;eq&quot;: 4}&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; f = Filter({&quot;eq&quot;: 4})
&amp;gt;&amp;gt;&amp;gt; f(2)
False
&amp;gt;&amp;gt;&amp;gt; f(4)
True
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This &lt;code&gt;Filter&lt;/code&gt; class works but is quite limited as we did not provide logical operators. Here&apos;s a complete implementation that also supports the logical operators &lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import operator

class InvalidQuery(Exception):
    pass

class Filter(object):
    binary_operators = {
        u&quot;=&quot;: operator.eq,
        u&quot;==&quot;: operator.eq,
        u&quot;eq&quot;: operator.eq,

        u&quot;&amp;lt;&quot;: operator.lt,
        u&quot;lt&quot;: operator.lt,

        u&quot;&amp;gt;&quot;: operator.gt,
        u&quot;gt&quot;: operator.gt,

        u&quot;&amp;lt;=&quot;: operator.le,
        u&quot;≤&quot;: operator.le,
        u&quot;le&quot;: operator.le,

        u&quot;&amp;gt;=&quot;: operator.ge,
        u&quot;≥&quot;: operator.ge,
        u&quot;ge&quot;: operator.ge,

        u&quot;!=&quot;: operator.ne,
        u&quot;≠&quot;: operator.ne,
        u&quot;ne&quot;: operator.ne,
    }

    multiple_operators = {
        u&quot;or&quot;: any,
        u&quot;∨&quot;: any,
        u&quot;and&quot;: all,
        u&quot;∧&quot;: all,
    }

    def __init__(self, tree):
        self._eval = self.build_evaluator(tree)

    def __call__(self, value):
        return self._eval(value)

    def build_evaluator(self, tree):
        try:
            operator, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery(&quot;Unable to parse tree %s&quot; % tree)
        try:
            op = self.multiple_operators[operator]
        except KeyError:
            try:
                op = self.binary_operators[operator]
            except KeyError:
                raise InvalidQuery(&quot;Unknown operator %s&quot; % operator)
            return lambda value: op(value, nodes)
        # Iterate over every item in the list of the value linked
        # to the logical operator, and compile it down to its own
        # evaluator.
        elements = [self.build_evaluator(node) for node in nodes]
        return lambda value: op((e(value) for e in elements))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To support the &lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt; operators, we leverage the &lt;code&gt;all&lt;/code&gt; and &lt;code&gt;any&lt;/code&gt; built-in Python functions. They are called with a generator argument that evaluates each one of the sub-evaluators, which does the trick.&lt;/p&gt;
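&lt;p&gt;To make that mechanism concrete, here is how &lt;code&gt;any&lt;/code&gt; and &lt;code&gt;all&lt;/code&gt; behave when fed such a generator:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; any(x &amp;gt; 3 for x in (1, 2, 5))
True
&amp;gt;&amp;gt;&amp;gt; all(x &amp;gt; 3 for x in (1, 2, 5))
False
&lt;/code&gt;&lt;/pre&gt;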
&lt;p&gt;Unicode is the new sexy, so I&apos;ve also added Unicode symbols support.&lt;/p&gt;
&lt;p&gt;And it is now possible to implement our full example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; f = Filter(
...     {&quot;∨&quot;: [
...         {&quot;∧&quot;: [
...             {&quot;≥&quot;: 1},
...             {&quot;≤&quot;: 2},
...         ]},
...         {&quot;∧&quot;: [
...             {&quot;≥&quot;: 4},
...             {&quot;≤&quot;: 6},
...         ]},
...     ]})
&amp;gt;&amp;gt;&amp;gt; f(5)
True
&amp;gt;&amp;gt;&amp;gt; f(8)
False
&amp;gt;&amp;gt;&amp;gt; f(1)
True
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As an exercise, you could try to add the &lt;code&gt;not&lt;/code&gt; operator, which deserves its own category as it is a unary operator!&lt;/p&gt;
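&lt;p&gt;If you want to compare your solution with one possible answer, here is a minimal sketch (the &lt;code&gt;unary_operators&lt;/code&gt; table and the subclass name are my own choices) that handles a single sub-tree per unary operator:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import operator

class NotFilter(Filter):
    unary_operators = {
        u&quot;not&quot;: operator.not_,
        u&quot;¬&quot;: operator.not_,
    }

    def build_evaluator(self, tree):
        try:
            op_name, argument = list(tree.items())[0]
        except Exception:
            raise InvalidQuery(&quot;Unable to parse tree %s&quot; % tree)
        if op_name in self.unary_operators:
            op = self.unary_operators[op_name]
            # The argument is a single sub-tree, e.g. {&quot;not&quot;: {&quot;eq&quot;: 4}}
            sub = self.build_evaluator(argument)
            return lambda value: op(sub(value))
        return super(NotFilter, self).build_evaluator(tree)

f = NotFilter({&quot;not&quot;: {&quot;eq&quot;: 4}})
assert f(2)
assert not f(4)
&lt;/code&gt;&lt;/pre&gt;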
&lt;p&gt;In the next blog post, we will see how to improve that filter with more features, and how to implement a domain-specific language on top of it, to make humans happy when writing the filter!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/05/IMG_20180427_180044--1-.jpg&quot; alt=&quot;Hole and Henni – François Charlier, 2018. In this drawing, the artist represents the deepness of functional programming and how its horse power can help you escape many dark situations.&quot; /&gt;&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Correct HTTP scheme in WSGI with Cloudflare</title><link>https://julien.danjou.info/blog/correct-http-scheme-in-wsgi-with-cloudflare/</link><guid isPermaLink="true">https://julien.danjou.info/blog/correct-http-scheme-in-wsgi-with-cloudflare/</guid><description>I&apos;ve recently been using Cloudflare as an HTTP frontend for some applications, and getting things working correctly with WSGI was unobvious.</description><pubDate>Wed, 25 Apr 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve recently been using &lt;a href=&quot;https://cloudflare.com&quot;&gt;Cloudflare&lt;/a&gt; as an HTTP frontend for some applications, and getting things working correctly with WSGI was unobvious.&lt;/p&gt;
&lt;p&gt;In Python, &lt;a href=&quot;https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface&quot;&gt;WSGI&lt;/a&gt; is the standard protocol to write a Web application. All Web frameworks that I know follow it. And many of those Web frameworks leverage some request environment variables to learn how the request has been made.&lt;/p&gt;
&lt;p&gt;One of those environment variables is &lt;code&gt;wsgi.url_scheme&lt;/code&gt;, and it contains either &lt;code&gt;http&lt;/code&gt; or &lt;code&gt;https&lt;/code&gt;, depending on the protocol that has been used to connect to your WSGI server.&lt;/p&gt;
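&lt;p&gt;To make this concrete, here is the smallest WSGI application I can think of that uses it – purely illustrative:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def app(environ, start_response):
    # Either &quot;http&quot; or &quot;https&quot;, as seen by the WSGI server.
    scheme = environ[&quot;wsgi.url_scheme&quot;]
    start_response(&quot;200 OK&quot;, [(&quot;Content-Type&quot;, &quot;text/plain&quot;)])
    return [(&quot;scheme is %s&quot; % scheme).encode()]
&lt;/code&gt;&lt;/pre&gt;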
&lt;p&gt;And that&apos;s where things can get messy. If you enable SSL at Cloudflare in &quot;Flexible&quot; mode, your visitor will connect to your Web site using HTTPS, but Cloudflare will connect to your backend using HTTP. That means that for your application, the traffic will appear to be over HTTP, and not HTTPS: &lt;code&gt;wsgi.url_scheme&lt;/code&gt; will be set to &lt;code&gt;http&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/04/Screen-Shot-2018-04-19-at-22.43.55.png&quot; alt=&quot;Cloudflare SSL setting&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That can lead to several problems with some frameworks. For example, the function &lt;code&gt;url_for&lt;/code&gt; of &lt;a href=&quot;http://flask.pocoo.org/&quot;&gt;Flask&lt;/a&gt; will rely on this variable to generate the scheme part of any URL. In this case, it would, therefore, generate URLs starting with &lt;code&gt;http://&lt;/code&gt; whereas your visitors are using &lt;code&gt;https&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The usual workaround is to leverage the &lt;code&gt;X-Forwarded-Proto&lt;/code&gt; header that is actually &lt;a href=&quot;https://support.cloudflare.com/hc/en-us/articles/200170986-How-does-Cloudflare-handle-HTTP-Request-headers-&quot;&gt;set by Cloudflare&lt;/a&gt;. When a visitor reaches Cloudflare over HTTPS and Cloudflare proxies the request to your HTTP host, this header is set to &lt;code&gt;https&lt;/code&gt;. By using the &lt;a href=&quot;http://werkzeug.pocoo.org/docs/contrib/fixers/#werkzeug.contrib.fixers.ProxyFix&quot;&gt;werkzeug.contrib.fixers.ProxyFix&lt;/a&gt; module, the variable &lt;code&gt;wsgi.url_scheme&lt;/code&gt; will be set to whatever &lt;code&gt;X-Forwarded-Proto&lt;/code&gt; contains.&lt;/p&gt;
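&lt;p&gt;For a Flask application, wiring it up would look like this (assuming &lt;code&gt;app&lt;/code&gt; is your Flask application object):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from werkzeug.contrib.fixers import ProxyFix

# Rewrites wsgi.url_scheme from X-Forwarded-Proto before each request.
app.wsgi_app = ProxyFix(app.wsgi_app)
&lt;/code&gt;&lt;/pre&gt;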
&lt;p&gt;That would work fine for any application that is directly behind Cloudflare, or any single HTTP reverse proxy.&lt;/p&gt;
&lt;p&gt;But that does not work as soon as you have multiple reverse proxies. If your application runs on top of &lt;a href=&quot;https://heroku.com&quot;&gt;Heroku&lt;/a&gt; for example, they already provide a reverse proxy and overwrite those headers. That gives the following: &lt;code&gt;Visitor -HTTPS-&amp;gt; Cloudflare -HTTP-&amp;gt; Heroku proxy -HTTP-&amp;gt; Heroku dyno&lt;/code&gt;. Once your dyno is reached, &lt;code&gt;X-Forwarded-Proto&lt;/code&gt; will be set to &lt;code&gt;http&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Damn it!&lt;/p&gt;
&lt;p&gt;The proper solution is, therefore, to have all your proxies implement &lt;a href=&quot;https://tools.ietf.org/html/rfc7239&quot;&gt;RFC7239&lt;/a&gt;. This RFC defines a new &lt;code&gt;Forwarded&lt;/code&gt; header that can contain all the hops that have forwarded this request, including all the schemes and IP addresses. Unfortunately, this is not implemented by either Cloudflare or Heroku. Bummer!&lt;/p&gt;
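&lt;p&gt;For reference, a &lt;code&gt;Forwarded&lt;/code&gt; header describing two hops would look something like this, each hop recording its own scheme:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Forwarded: for=192.0.2.60;proto=https;by=203.0.113.43, for=198.51.100.17;proto=http
&lt;/code&gt;&lt;/pre&gt;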
&lt;p&gt;Finally, Cloudflare provides yet another custom header named &lt;code&gt;Cf-Visitor&lt;/code&gt;. It contains a JSON payload with the original HTTP scheme used by the visitor: we can use that to solve our issue. Here&apos;s a WSGI middleware to do that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import json

class CloudflareProxy(object):
    &quot;&quot;&quot;This middleware sets the proto scheme based on the Cf-Visitor header.&quot;&quot;&quot;

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        cf_visitor = environ.get(&quot;HTTP_CF_VISITOR&quot;)
        if cf_visitor:
            try:
                cf_visitor = json.loads(cf_visitor)
            except ValueError:
                pass
            else:
                proto = cf_visitor.get(&quot;scheme&quot;)
                if proto is not None:
                    environ[&quot;wsgi.url_scheme&quot;] = proto
        return self.app(environ, start_response)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can then use it to encapsulate your WSGI application with &lt;code&gt;app = CloudflareProxy(app)&lt;/code&gt;.&lt;/p&gt;
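&lt;p&gt;With Flask, as with &lt;code&gt;ProxyFix&lt;/code&gt; above, you would wrap the &lt;code&gt;wsgi_app&lt;/code&gt; attribute instead:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;app.wsgi_app = CloudflareProxy(app.wsgi_app)
&lt;/code&gt;&lt;/pre&gt;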
&lt;p&gt;If you&apos;re using JavaScript, I noticed that the &lt;a href=&quot;https://github.com/jshttp/forwarded&quot;&gt;forwarded&lt;/a&gt; library provides that same support for Cloudflare along with all the other headers – even RFC7239!&lt;/p&gt;
</content:encoded><category>python</category><category>web</category></item><item><title>Is Python a Good Choice for Entreprise Projects?</title><link>https://julien.danjou.info/blog/is-python-a-good-choice-for-entreprise-projects/</link><guid isPermaLink="true">https://julien.danjou.info/blog/is-python-a-good-choice-for-entreprise-projects/</guid><description>A few weeks ago, one of my followers, Morteza, reached out and asked me the following:  &gt; I develop projects mostly with Python, but I am scared that Python is not a good choice for enterprise project</description><pubDate>Wed, 04 Apr 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few weeks ago, one of my followers, Morteza, reached out and asked me the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I develop projects mostly with Python, but I am scared that Python is not a good choice for enterprise projects. In many cases, I&apos;ve encountered a situation where Python performance was not sufficient, like thread spawning and so on, and as you know, the GIL supports one thread at the time.&lt;br /&gt;
Some friends told me to try to use Java, C++ or even Go for enterprise projects instead of Python. I see many job boards that require Python just for testing, QA or some small projects. I feel that Python is a small gun for showing my experiences and that I&apos;d have to choose an alternative language.&lt;br /&gt;
As you are advanced and professional in many topics especially in Python, I&apos;d need your advice. Is Python good enough for enterprise systems? Or should I choose an alternative language which fills the gaps that exist in Python?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you have been following me for a long time, you know I&apos;ve been doing Python for more than ten years now and even wrote two books about it. So while I&apos;m obviously biased, and before writing a reply, I would also like to take a step back and reassure you, dear reader, that I&apos;ve used plenty of other programming languages those last 20 years: Perl, C, PHP, Lua, Lisp, Java, etc. I&apos;ve built tiny to big projects with some of them, and I consider that Lisp is the best programming language. 😅 Therefore, I like to think that I&apos;m not overly partial.&lt;/p&gt;
&lt;p&gt;To reply to Morteza, I would say that you first need to acknowledge that a language itself is not slow or fast. English is not faster than French; however, some French people speak faster than English people.&lt;/p&gt;
&lt;p&gt;So then, yes, CPython, the chief implementation of the Python programming language has some limitations: the GIL (&lt;em&gt;Global Interpreter Lock&lt;/em&gt;) as Morteza says, is the most significant parallelism limiter. The rest of the language is being optimized regularly, and you can follow the work done in each Python version to see where this is going. CPython gets faster on each minor version.&lt;/p&gt;
&lt;p&gt;On the other hand, don&apos;t think that Go or Java are miracles: they both have their limitations. For example, you can read this compelling presentation from Ben Bangert at Mozilla entitled &quot;&lt;a href=&quot;https://docs.google.com/presentation/d/1LO_WI3N-3p2Wp9PDWyv5B6EGFZ8XTOTNJ7Hd40WOUHo/edit?pli=1#slide=id.g70b0035b2_1_168&quot;&gt;From Python to Go and back again&lt;/a&gt;&quot;. Ben explains some of the limitations that he encountered while switching to Go.&lt;/p&gt;
&lt;p&gt;I&apos;m sure you can find problems and limitations with the Java Virtual Machine too.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://images.unsplash.com/photo-1475539175801-4f770d7d1a49?ixlib=rb-0.3.5&amp;amp;q=80&amp;amp;fm=jpg&amp;amp;crop=entropy&amp;amp;cs=tinysrgb&amp;amp;w=1080&amp;amp;fit=max&amp;amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;amp;s=08d0de01765dce0d3f464715abc656ce&quot; alt=&quot;Two jockeys riding horses head-to-head during a race&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;https://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;, I wrote a few chapters covering the GIL and how you can circumvent its limitation. If you write widely scalable applications, the GIL is not such a big deal, as you need, anyway, to spread the load across multiple servers, not only on several processors.&lt;/p&gt;
&lt;p&gt;There are tons of companies running Python applications at large scale, e.g. &lt;a href=&quot;https://thenewstack.io/instagram-makes-smooth-move-python-3/&quot;&gt;Instagram&lt;/a&gt;, &lt;a href=&quot;https://www.python.org/about/quotes/&quot;&gt;Google and YouTube&lt;/a&gt;, &lt;a href=&quot;https://blogs.dropbox.com/tech/?s=python&quot;&gt;Dropbox&lt;/a&gt; or &lt;a href=&quot;https://www.paypal-engineering.com/tag/python/&quot;&gt;PayPal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Therefore, no, Python is not only for QA applications – no more than Java is only good for browser applets, or Go only for devops, or whatever.&lt;/p&gt;
&lt;p&gt;They all are different languages that approach problems from different angles. Depending on your mindset and on the solution that you want to implement, some might appear better equipped than others. Their virtual machines or compilers are marvelous, but also have their limitations and shortcomings that you need to be aware of so you can avoid falling into a big trap.&lt;/p&gt;
&lt;p&gt;Of course, another approach is to remove all those issues by going down a layer and use a lower level language, e.g. C or C++. That&apos;ll remove those limitations for sure: no Python GIL, no Go resources leaking, no JVM startup slowness, etc. However, it&apos;ll add a &lt;em&gt;ton&lt;/em&gt; of extra work and problems that YOU will have to solve – puzzles that are already resolved by higher-level languages. That&apos;s a matter of trade-offs: do you want to write a blazingly fast program in 10 years or do you want to write a decently fast program in 1 year? 😏&lt;/p&gt;
&lt;p&gt;In the end, picking a language is not only a matter of performance but also a concern of support, community, and ecosystem. Picking battle-tested languages like Python and Java is the assurance of reliability and trustworthiness, while selecting a younger language like Rust might be an exciting ride. Doing some &quot;reality check&quot; is always worth considering before choosing a language. If you wanted to write an application that uses, e.g., AMQP and HTTP/2, are you sure that there are libraries providing those features and that are broadly used and supported? Or are you ready to commit time to maintain them yourself?&lt;/p&gt;
&lt;p&gt;Again, Python is pretty solid here. Considering its extensive track record, there are tons of widely used libraries for everything you could ever need. The community is large and the ecosystem is flourishing.&lt;/p&gt;
&lt;p&gt;In the end, I do think that yes, Python is a terrific choice for any enterprise project, and considering the number of existing projects written in it, I&apos;m not the only one thinking that way.&lt;/p&gt;
&lt;p&gt;Feel free to share your experience – or even projects – in the comments section below!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>On blog migration</title><link>https://julien.danjou.info/blog/blog-migration-ghost/</link><guid isPermaLink="true">https://julien.danjou.info/blog/blog-migration-ghost/</guid><description>I&apos;ve started my first Web page in 1998 and one could say that it evolved quite a bit in the meantime. From a Frontpage designed Web site with frames, it evolved to plain HTML files. I&apos;ve started blogg</description><pubDate>Wed, 21 Mar 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve started my first Web page in 1998 and one could say that it evolved quite a bit in the meantime. From a Frontpage designed Web site with frames, it evolved to plain HTML files. I&apos;ve started blogging in 2003, though the archives of this blog only go back to 2007. Truth is, many things I wrote in the first years were short (there was no Twitter) and not that relevant nowadays. Therefore, I never migrated them through the many migrations this site has had.&lt;/p&gt;
&lt;p&gt;The last time I switched this site&apos;s engine was in 2011, when I switched from &lt;a href=&quot;https://www.gnu.org/software/emacs-muse/index.html&quot;&gt;Emacs Muse&lt;/a&gt; (and my custom &lt;em&gt;muse-blog.el&lt;/em&gt; extension) to &lt;a href=&quot;https://github.com/hyde/hyde&quot;&gt;Hyde&lt;/a&gt;, a static Web site generator written in Python.&lt;/p&gt;
&lt;p&gt;That taught me a few things.&lt;/p&gt;
&lt;p&gt;First, you can&apos;t really know for sure which project will be a ghost in 5 years. I had no clue back then that Hyde&apos;s author would lose interest and struggle to pass maintainership to someone else. The community was not big but it existed. Betting on a horse is part skill and part chance. My skills were probably lower seven years ago and I also may have had bad luck.&lt;/p&gt;
&lt;p&gt;Secondly, maintaining a Web site is painful. I used to blog more regularly a few years ago, as the friction of using a dynamic blog engine was lower than spawning my deprecated static engine. Knowing that it takes 2 minutes to generate a static Web site really makes it difficult to compose and see the result at the same time without losing patience. It took me a few years to decide it was time to invest in the migration. I just jumped from Hyde to &lt;a href=&quot;https://ghost.org/&quot;&gt;Ghost&lt;/a&gt;, hosted on their Pro engine as I don&apos;t want to do any maintenance. Let&apos;s be honest, I&apos;ve no will to inflict the maintenance of a JavaScript blogging engine on myself.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://images.unsplash.com/photo-1486262715619-67b85e0b08d3?ixlib=rb-0.3.5&amp;amp;q=80&amp;amp;fm=jpg&amp;amp;crop=entropy&amp;amp;cs=tinysrgb&amp;amp;w=1080&amp;amp;fit=max&amp;amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;amp;s=b279db1ffb6919c7cc32df9b5300cdc7&quot; alt=&quot;Macro of motor engine with gears and screws&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The positive side is that this is still Markdown based, so the migration job was not so painful. Ghost offers a &lt;a href=&quot;https://api.ghost.org/&quot;&gt;REST API&lt;/a&gt; which allows manipulating most of the content. It works fine, and I was able to leverage the &lt;a href=&quot;https://github.com/rycus86/ghost-client&quot;&gt;Python ghost-client&lt;/a&gt; to write a tiny migration script to migrate every post.&lt;/p&gt;
&lt;p&gt;I am looking forward to sharing most of the things that I work on during the next months. I have really enjoyed reading the content of great hackers these last years, and I&apos;ve learned a ton of things by reading the adventures of smarter engineers.&lt;/p&gt;
&lt;p&gt;It might be my time to share.&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Scaling a polling Python application with tooz</title><link>https://julien.danjou.info/blog/scaling-a-python-application-tooz/</link><guid isPermaLink="true">https://julien.danjou.info/blog/scaling-a-python-application-tooz/</guid><description>This article is the final one of the series I wrote about scaling a large number of connections in a Python application. If you don&apos;t remember what the problem we&apos;re trying to solve is, here it is, co</description><pubDate>Mon, 05 Mar 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This article is the final one of the series I wrote about scaling a large number of connections in a Python application. If you don&apos;t remember what the problem we&apos;re trying to solve is, here it is, coming from one of my followers:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It so happened that I&apos;m currently working on scaling some Python app. Specifically, now I&apos;m trying to figure out the best way to scale SSH connections - when one server has to connect to thousands (or even tens of thousands) of remote machines in a short period of time (say, several minutes).&lt;br /&gt;
How would you write an application that does that in a scalable way?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href=&quot;https://julien.danjou.info/blog/scaling-python-application-threads&quot;&gt;first blog post&lt;/a&gt; was exploring a solution based on threads, while the &lt;a href=&quot;https://julien.danjou.info/blog/scaling-python-application-asyncio&quot;&gt;second blog post&lt;/a&gt; was exploring an architecture around &lt;em&gt;asyncio&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In the first two articles, we wrote programs that could handle this problem by using multiple &lt;em&gt;threads&lt;/em&gt; or &lt;em&gt;asyncio&lt;/em&gt; – or both. While this worked pretty well, it had some limitations, such as only using one computer. So this time, we&apos;re going to take a different approach and use multiple computers!&lt;/p&gt;
&lt;h3&gt;The job&lt;/h3&gt;
&lt;p&gt;As we&apos;ve already seen, writing a Python application that connects to a host by ssh can be done using &lt;a href=&quot;http://docs.paramiko.org/en/&quot;&gt;Paramiko&lt;/a&gt; or &lt;a href=&quot;https://github.com/ronf/asyncssh&quot;&gt;asyncssh&lt;/a&gt;. Here again, that will not be the focus of this blog post since it is pretty straightforward to do.&lt;/p&gt;
&lt;p&gt;To keep this exercise simple, we&apos;ll reuse our &lt;code&gt;ping&lt;/code&gt; function from the first article. It looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import subprocess

def ping(hostname):
    p = subprocess.Popen([&quot;ping&quot;, &quot;-c&quot;, &quot;3&quot;, &quot;-W&quot;, &quot;1&quot;, hostname],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
    return p.wait() == 0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a reminder, running this program alone and serially pinging 255 IP addresses takes more than 10 minutes. Let&apos;s try to make it faster by running the pings in parallel.&lt;/p&gt;
&lt;h3&gt;The architecture&lt;/h3&gt;
&lt;p&gt;Remember: if pinging 255 hosts takes 10 minutes, pinging the whole Internet is going to take forever – several centuries at this rate.&lt;/p&gt;
&lt;p&gt;With our ping experiment, we already divided our mission (e.g. &quot;who&apos;s alive on the Internet&quot;) into very small tasks (&quot;ping&quot;). If we want to ping 4 billion hosts, we need to run those tasks in parallel. But one computer is not going to be enough: we need to distribute those tasks to different hosts, so we can use some massive parallelism to go even faster!&lt;/p&gt;
&lt;p&gt;There are two ways to distribute such a set of tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use a queue. That works well for jobs that are not determined in advance, such as user-submitted tasks, or jobs that are going to be executed only once.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use a distribution algorithm. That works only for tasks that are determined in advance and that are scheduled regularly, such as polling.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are going to pick the second option here, as those ping tasks (or polling tasks, in the original problem) should be run regularly. That approach will allow us to spread the jobs across several processes, which can even be spread across several nodes over a network. We also won&apos;t have to &quot;maintain&quot; the queue (e.g. keep it working and monitor it), so that&apos;s also a bonus point.&lt;/p&gt;
&lt;p&gt;That&apos;s infinite horizontal scalability!&lt;/p&gt;
&lt;h3&gt;The distribution algorithm&lt;/h3&gt;
&lt;p&gt;The algorithm we&apos;re going to use to distribute this task is based on a &lt;a href=&quot;https://en.wikipedia.org/wiki/Consistent_hashing&quot;&gt;consistent hashring&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s how it works in short. Picture a circular ring. We map objects onto this ring. The ring is then split into partitions. Those partitions are distributed among all the workers. The workers take care of jobs that are in the partitions they are responsible for.&lt;/p&gt;
&lt;p&gt;In the case where a new node joins the ring, it is inserted between 2 nodes and takes a bit of their workload. In the case where a node leaves the ring, the partitions it was taking care of are reassigned to its adjacent nodes.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/consistent-hashing.png&quot; alt=&quot;Diagram of consistent hashing ring with partitions distributed among worker nodes&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If you want more details, there are plenty of explanations of how this algorithm works. Feel free to look online!&lt;/p&gt;
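&lt;p&gt;To give you the flavor, here is a deliberately naive hashring sketch – my own illustration, not the Tooz implementation we are about to use:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import bisect
import hashlib

class TinyHashRing(object):
    &quot;&quot;&quot;Naive consistent hashring, just to illustrate the principle.&quot;&quot;&quot;

    def __init__(self, nodes, replicas=16):
        self._ring = []
        for node in nodes:
            # Give each node several points on the ring to even out
            # the distribution a little.
            for r in range(replicas):
                self._ring.append((self._hash(&quot;%s-%d&quot; % (node, r)), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, obj):
        # Walk clockwise from the object&apos;s point to the next node point.
        index = bisect.bisect(self._ring, (self._hash(obj),))
        return self._ring[index % len(self._ring)][1]

ring = TinyHashRing([&quot;client1&quot;, &quot;client2&quot;])
for i in range(10):
    print(&quot;%d handled by %s&quot; % (i, ring.node_for(str(i))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Removing a node from such a ring only remaps the objects that pointed at it, which is the whole point of the &lt;em&gt;consistency&lt;/em&gt;.&lt;/p&gt;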
&lt;p&gt;However, to make this work, we need to know which nodes are alive or dead. This is another problem to solve, and the best way to tackle it is to use a coordination mechanism. There are plenty of those, from &lt;a href=&quot;https://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt; to &lt;a href=&quot;https://coreos.com/etcd/&quot;&gt;etcd&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Without going too much into details, those pieces of software provide a network service that every node can connect to in order to manage its state. If a client gets disconnected or crashes, it&apos;s then easy to consider it as removed. That enables the application to get the full list of nodes and split the ring accordingly. There&apos;s no need to have any shared state between the nodes other than who&apos;s alive and running.&lt;/p&gt;
&lt;h3&gt;Using group membership&lt;/h3&gt;
&lt;p&gt;To get a list of nodes that are available to help us pinging the Internet, we need a service that provides this and a library to interact with it. Since the use case is pretty simple and I don&apos;t know which backends you like the most, we&apos;re going to use the &lt;a href=&quot;https://pypi.python.org/pypi/tooz&quot;&gt;Tooz&lt;/a&gt; library.&lt;/p&gt;
&lt;p&gt;Tooz provides a coordination mechanism on top of a large variety of backends: ZooKeeper or etcd, as suggested earlier, but also &lt;a href=&quot;https://redis.io&quot;&gt;Redis&lt;/a&gt; or &lt;a href=&quot;https://memcached.org&quot;&gt;memcached&lt;/a&gt; for those who want to live more dangerously. Indeed, while ZooKeeper or etcd can be set up as a synchronized cluster, memcached is a &lt;a href=&quot;https://en.wikipedia.org/wiki/Single_point_of_failure&quot;&gt;SPOF&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For the sake of the exercise, we&apos;re going to use a single instance of etcd here. Thanks to Tooz, switching to another backend would be a one-line change anyway.&lt;/p&gt;
&lt;p&gt;Tooz provides a &lt;code&gt;tooz.coordination.Coordinator&lt;/code&gt; object that represents the connection to the coordination subsystem. It then exposes an API based on groups and members. A member is a node connected through a &lt;code&gt;Coordinator&lt;/code&gt; instance. A group is a place that members can join or leave.&lt;/p&gt;
&lt;p&gt;Here&apos;s a first implementation of a member joining a group and printing the member list:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sys
import time

from tooz import coordination

## Check that a client and group ids are passed as arguments
if len(sys.argv) != 3:
    print(&quot;Usage: %s &amp;lt;client id&amp;gt; &amp;lt;group id&amp;gt;&quot; % sys.argv[0])
    sys.exit(1)

## Get the Coordinator object
c = coordination.get_coordinator(
    &quot;etcd3://localhost&quot;,
    sys.argv[1].encode())
## Start it (initiate connection).
c.start(start_heart=True)

group = sys.argv[2].encode()

## Create the group
try:
    c.create_group(group).get()
except coordination.GroupAlreadyExist:
    pass

## Join the group
c.join_group(group).get()

try:
    while True:
        # Print the members list
        members = c.get_members(group)
        print(members.get())
        time.sleep(1)
finally:
    # Leave the group
    c.leave_group(group).get()

    # Stop when we&apos;re done
    c.stop()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Don&apos;t forget to run etcd on your machine before running this program. Running a first instance of this program will print &lt;code&gt;set([&apos;client1&apos;])&lt;/code&gt; every second. As soon as you run a second instance of this program, they both start to print &lt;code&gt;set([&apos;client1&apos;, &apos;client2&apos;])&lt;/code&gt;. If you shut down one of the clients, the remaining one will print the member list with only one member in it.&lt;/p&gt;
&lt;p&gt;This can work with any number of clients. If a client crashes rather than disconnecting properly, its membership will automatically expire after a few seconds – you can configure this expiration period by passing a &lt;code&gt;timeout&lt;/code&gt; value in the Tooz URL.&lt;/p&gt;
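&lt;p&gt;For instance, something like this should bring the expiration down to five seconds – the exact option name and support depend on the backend driver, so check the Tooz documentation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;c = coordination.get_coordinator(
    &quot;etcd3://localhost/?timeout=5&quot;,
    sys.argv[1].encode())
&lt;/code&gt;&lt;/pre&gt;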
&lt;h3&gt;Using consistent hashing&lt;/h3&gt;
&lt;p&gt;Now that we have a group, which will turn out to be our &lt;em&gt;ring&lt;/em&gt;, we can implement a consistent hashring on top of it. Fortunately, Tooz also provides an implementation of this that is ready to be used. Rather than using the &lt;code&gt;join_group&lt;/code&gt; method, we&apos;re gonna use the &lt;code&gt;join_partitioned_group&lt;/code&gt; method.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sys
import time

from tooz import coordination

## Check that a client and group ids are passed as arguments
if len(sys.argv) != 3:
    print(&quot;Usage: %s &amp;lt;client id&amp;gt; &amp;lt;group id&amp;gt;&quot; % sys.argv[0])
    sys.exit(1)

## Get the Coordinator object
c = coordination.get_coordinator(
    &quot;etcd3://localhost&quot;,
    sys.argv[1].encode())
## Start it (initiate connection).
c.start(start_heart=True)

group = sys.argv[2].encode()

## Join the partitioned group
p = c.join_partitioned_group(group)

try:
    while True:
        # Process membership changes so the ring stays up to date.
        c.run_watchers()
        # Print which member handles each of our ten objects.
        for i in range(10):
            print(&quot;%d handled by %s&quot; % (i, p.members_for_object(i)))
        time.sleep(1)
finally:
    # Leave the group
    c.leave_group(group).get()

    # Stop when we&apos;re done
    c.stop()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running this program on one node (or just one terminal) will output the following every second:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ python distribution.py client1 foobar
0 handled by set([&apos;client1&apos;])
1 handled by set([&apos;client1&apos;])
2 handled by set([&apos;client1&apos;])
3 handled by set([&apos;client1&apos;])
4 handled by set([&apos;client1&apos;])
5 handled by set([&apos;client1&apos;])
6 handled by set([&apos;client1&apos;])
7 handled by set([&apos;client1&apos;])
8 handled by set([&apos;client1&apos;])
9 handled by set([&apos;client1&apos;])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As soon as a second member joins (just run another copy of the script in another terminal), the output changes and both running programs output the same thing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0 handled by set([&apos;client2&apos;])
1 handled by set([&apos;client1&apos;])
2 handled by set([&apos;client1&apos;])
3 handled by set([&apos;client1&apos;])
4 handled by set([&apos;client1&apos;])
5 handled by set([&apos;client2&apos;])
6 handled by set([&apos;client2&apos;])
7 handled by set([&apos;client1&apos;])
8 handled by set([&apos;client1&apos;])
9 handled by set([&apos;client2&apos;])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;They just shared the ten objects between them. They &lt;strong&gt;did not communicate with each other&lt;/strong&gt;. They just know each other&apos;s presence, and since they are using the same algorithm to compute where an object should belong, they share the same results. You can do the test with a third copy of the program:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0 handled by set([&apos;client2&apos;])
1 handled by set([&apos;client1&apos;])
2 handled by set([&apos;client1&apos;])
3 handled by set([&apos;client1&apos;])
4 handled by set([&apos;client1&apos;])
5 handled by set([&apos;client2&apos;])
6 handled by set([&apos;client2&apos;])
7 handled by set([&apos;client3&apos;])
8 handled by set([&apos;client1&apos;])
9 handled by set([&apos;client3&apos;])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we got a third client in the mix, excellent! If we stop one of the clients, the rebalancing is done automatically.&lt;/p&gt;
&lt;p&gt;While the consistent hashing approach is great, it has a few characteristics you might want to know about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The distribution algorithm is not made to be perfectly even. If you have a vast number of objects, it might seem pretty even statistically, but if you are trying to distribute two objects on two nodes, it&apos;s probable that one node will handle both objects and the other none.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The distribution is not done in real time, meaning there&apos;s a small chance that an object might be owned by two nodes at the same time. This is not a problem in a scenario such as this one, since pinging a host twice is not a big deal, but if your job needs to be unique and executed once and only once, this might not be an adequate method of distribution. In that case, rather use a queue, which has the proper characteristics.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Distributed ping&lt;/h3&gt;
&lt;p&gt;Now that we have our hashring ready to distribute our job, we can implement our final program!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sys
import subprocess
import time

from tooz import coordination

## Check that a client and group ids are passed as arguments
if len(sys.argv) != 3:
    print(&quot;Usage: %s &amp;lt;client id&amp;gt; &amp;lt;group id&amp;gt;&quot; % sys.argv[0])
    sys.exit(1)

## Get the Coordinator object
c = coordination.get_coordinator(
    &quot;etcd3://localhost&quot;,
    sys.argv[1].encode())
## Start it (initiate connection).
c.start(start_heart=True)

group = sys.argv[2].encode()

## Join the partitioned group
p = c.join_partitioned_group(group)

class Host(object):
    def __init__(self, hostname):
        self.hostname = hostname

    def __tooz_hash__(self):
        &quot;&quot;&quot;Returns a unique byte identifier so Tooz can distribute this object.&quot;&quot;&quot;
        return self.hostname.encode()

    def __str__(self):
        return &quot;&amp;lt;%s: %s&amp;gt;&quot; % (self.__class__.__name__, self.hostname)

    def ping(self):
        p = subprocess.Popen([&quot;ping&quot;, &quot;-q&quot;, &quot;-c&quot;, &quot;3&quot;, &quot;-W&quot;, &quot;1&quot;,
                              self.hostname],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
        return p.wait() == 0

hosts_to_ping = [Host(&quot;192.168.2.%d&quot; % i) for i in range(255)]

try:
    while True:
        for host in hosts_to_ping:
            c.run_watchers()
            if p.belongs_to_self(host):
                print(&quot;Pinging %s&quot; % host)
                if host.ping():
                    print(&quot;  %s is alive&quot; % host)
        time.sleep(1)
finally:
    # Leave the group
    c.leave_group(group).get()

    # Stop when we&apos;re done
    c.stop()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the first client starts, it starts iterating over the hosts, and since it is alone, all hosts belong to it. So it starts pinging all of them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ python3 ping.py client1 ping
Pinging &amp;lt;Host: 192.168.2.0&amp;gt;
  &amp;lt;Host: 192.168.2.0&amp;gt; is alive
Pinging &amp;lt;Host: 192.168.2.1&amp;gt;
  &amp;lt;Host: 192.168.2.1&amp;gt; is alive
Pinging &amp;lt;Host: 192.168.2.2&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, a second client starts pinging too, and automatically the jobs are split. The &lt;code&gt;client1&lt;/code&gt; instance starts skipping some nodes that now belong to &lt;code&gt;client2&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## client1 output
Pinging &amp;lt;Host: 192.168.2.8&amp;gt;
  &amp;lt;Host: 192.168.2.8&amp;gt; is alive
Pinging &amp;lt;Host: 192.168.2.9&amp;gt;
Pinging &amp;lt;Host: 192.168.2.11&amp;gt;
Pinging &amp;lt;Host: 192.168.2.12&amp;gt;

## client2 output
Pinging &amp;lt;Host: 192.168.2.7&amp;gt;
Pinging &amp;lt;Host: 192.168.2.10&amp;gt;
Pinging &amp;lt;Host: 192.168.2.13&amp;gt;
  &amp;lt;Host: 192.168.2.13&amp;gt; is alive
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On the other hand, &lt;code&gt;client2&lt;/code&gt; is skipping nodes that belong to &lt;code&gt;client1&lt;/code&gt;. To scale our application further, we can start new clients on other nodes of the network and expand our pinging system!&lt;/p&gt;
&lt;h3&gt;Just a first step&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://scaling-python.com&quot;&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-scaling-python.png&quot; alt=&quot;Cover of The Hacker&apos;s Guide to Scaling Python&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This &lt;code&gt;ping&lt;/code&gt; job does not use a lot of CPU time or I/O bandwidth, and neither would the original ssh case from Alon. However, if it did, this method would be even more efficient, as the ability to scale resources horizontally would become key.&lt;/p&gt;
&lt;p&gt;These are just the first steps of the distribution and scalability mechanisms that you can implement using Python. There are a few other options available on top of this mechanism, such as defining different weights for different nodes or using replicas to achieve high-availability scenarios. I&apos;ve covered those in my book &lt;a href=&quot;https://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;, if you&apos;re interested in learning more!&lt;/p&gt;
</content:encoded><category>python</category><category>openstack</category></item><item><title>Scaling a polling Python application with asyncio</title><link>https://julien.danjou.info/blog/scaling-python-application-asyncio/</link><guid isPermaLink="true">https://julien.danjou.info/blog/scaling-python-application-asyncio/</guid><description>This article is a follow-up of my previous blog post about scaling a large number of connections . If you don&apos;t remember, I was trying to solve one of my followers&apos; problem:  &gt; It so happened that I&apos;m</description><pubDate>Mon, 12 Feb 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This article is a follow-up of my &lt;a href=&quot;https://julien.danjou.info/blog/scaling-python-application-threads&quot;&gt;previous blog post about scaling a large number of connections&lt;/a&gt;. If you don&apos;t remember, I was trying to solve one of my followers&apos; problem:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It so happened that I&apos;m currently working on scaling some Python app. Specifically, now I&apos;m trying to figure out the best way to scale SSH connections - when one server has to connect to thousands (or even tens of thousands) of remote machines in a short period of time (say, several minutes).&lt;br /&gt;
How would you write an application that does that in a scalable way?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In the first article, we wrote a program that could handle this problem at a large scale by using multiple &lt;em&gt;threads&lt;/em&gt;. While this worked pretty well, it had some severe limitations. This time, we&apos;re going to take a different approach.&lt;/p&gt;
&lt;h2&gt;The job&lt;/h2&gt;
&lt;p&gt;The job has not changed and is still about connecting to a remote server via ssh. This time, rather than faking it by using &lt;em&gt;ping&lt;/em&gt;, we are going to connect for real to an ssh server. Once connected to the remote server, the mission will be to run a single command. For the sake of this example, the command that will be run here is just a simple &quot;echo hello world&quot;.&lt;/p&gt;
&lt;h2&gt;Using an event loop&lt;/h2&gt;
&lt;p&gt;This time, rather than leveraging threads, we are using &lt;a href=&quot;https://docs.python.org/3/library/asyncio.html&quot;&gt;asyncio&lt;/a&gt;. &lt;em&gt;Asyncio&lt;/em&gt; is the leading Python event loop system implementation. It allows executing multiple functions (named &lt;em&gt;coroutines&lt;/em&gt;) concurrently. The idea is that each time a coroutine performs an I/O operation, it yields back the control to the event loop. As the input or output might be blocking (e.g., the socket has no data yet to be read), the event loop will reschedule the coroutine as soon as there is work to do. In the meantime, the loop can schedule another coroutine that has something to do – or wait for that to happen.&lt;/p&gt;
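&lt;p&gt;Here is a tiny illustration of that cooperative scheduling – two coroutines whose &lt;code&gt;asyncio.sleep&lt;/code&gt; calls stand in for blocking I/O, so their output ends up interleaved:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import asyncio

async def chat(name):
    for i in range(3):
        print(&quot;%s: %d&quot; % (name, i))
        # Yield control back to the event loop while &quot;waiting&quot;.
        await asyncio.sleep(0.1)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(chat(&quot;alice&quot;), chat(&quot;bob&quot;)))
&lt;/code&gt;&lt;/pre&gt;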
&lt;p&gt;Not all libraries are compatible with the &lt;em&gt;asyncio&lt;/em&gt; framework. In our case, we need an ssh library that has support for &lt;em&gt;asyncio&lt;/em&gt;. It happens that &lt;a href=&quot;https://github.com/ronf/asyncssh&quot;&gt;&lt;em&gt;AsyncSSH&lt;/em&gt;&lt;/a&gt; is a Python library that provides ssh connection handling support for asyncio. It is particularly easy to use, and the &lt;a href=&quot;http://asyncssh.readthedocs.io/&quot;&gt;documentation has plenty of examples&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s the function that we&apos;re going to use to execute our command on a remote host:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import asyncssh

async def run_command(host, command):
    async with asyncssh.connect(host) as conn:
        result = await conn.run(command)
        return result.stdout
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The function &lt;code&gt;run_command&lt;/code&gt; runs a &lt;code&gt;command&lt;/code&gt; on a remote &lt;code&gt;host&lt;/code&gt; once connected to it via ssh. It then returns the standard output of the command. The function uses the keywords &lt;code&gt;async&lt;/code&gt; and &lt;code&gt;await&lt;/code&gt; that are specific to Python &amp;gt;= 3.5 and &lt;em&gt;asyncio&lt;/em&gt;. They indicate that the called functions are coroutines that might block, and that control is yielded back to the event loop.&lt;/p&gt;
&lt;p&gt;As I don&apos;t own hundreds of servers I can connect to, I will be using a single remote server as the target – but the program will connect to it multiple times. The server is at a latency of about 6 ms, so that will magnify the results a bit.&lt;/p&gt;
&lt;p&gt;The first version of this program is simple and stupid. It runs the &lt;code&gt;run_command&lt;/code&gt; function N times serially, providing the tasks one at a time to the &lt;em&gt;asyncio&lt;/em&gt; event loop:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;loop = asyncio.get_event_loop()

outputs = [
    loop.run_until_complete(
        run_command(&quot;myserver&quot;, &quot;echo hello world %d&quot; % i))
    for i in range(200)
]
print(outputs)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once executed, the program prints the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ time python3 asyncssh-test.py
[&apos;hello world 0\n&apos;, &apos;hello world 1\n&apos;, &apos;hello world 2\n&apos;, … &apos;hello world 199\n&apos;]
python3 asyncssh-test.py  6.11s user 0.35s system 15% cpu 41.249 total
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It took 41 seconds to connect 200 times to the remote server and execute a simple printing command.&lt;/p&gt;
&lt;p&gt;To make this faster, we&apos;re going to schedule all the coroutines at the same time. We just need to feed the event loop with the 200 coroutines at once. That will give it the ability to schedule them efficiently.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;outputs = loop.run_until_complete(asyncio.gather(
    *[run_command(&quot;myserver&quot;, &quot;echo hello world %d&quot; % i)
      for i in range(200)]))
print(outputs)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By using &lt;code&gt;asyncio.gather&lt;/code&gt;, it is possible to pass a list of coroutines and wait for all of them to be finished. Once run, this program prints the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ time python3 asyncssh-test.py
[&apos;hello world 0\n&apos;, &apos;hello world 1\n&apos;, &apos;hello world 2\n&apos;, … &apos;hello world 199\n&apos;]
python3 asyncssh-test.py  4.90s user 0.34s system 35% cpu 14.761 total
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This version only took ⅓ of the original execution time to finish! As a fun note, the main limitation here is that my remote server has trouble handling more than 150 connections in parallel, so this program is a bit tough on it alone.&lt;/p&gt;
&lt;h2&gt;Scalability&lt;/h2&gt;
&lt;p&gt;To show how great this method is, I&apos;ve built a chart below that shows the difference in execution time between the two approaches, depending on the number of hosts the application has to connect to.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/chart-asyncssh.png&quot; alt=&quot;Chart comparing serial and concurrent execution times with asyncssh&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The trend lines highlight the difference in execution time and how important concurrency is here. For 10,000 nodes, the time needed for a serial execution would be around 40 minutes, whereas it would be only 7 minutes with a cooperative approach – quite a difference. The concurrent approach allows executing one command 205 times a day rather than only 36 times!&lt;/p&gt;
&lt;h2&gt;That was the second step&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://scaling-python.com&quot;&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-scaling-python.png&quot; alt=&quot;Cover of The Hacker&apos;s Guide to Scaling Python&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Using an event loop for tasks that can run concurrently due to their I/O intensive nature is really a great way to maximize the throughput of a program. This simple change made the program up to &lt;em&gt;6× faster&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Anyhow, this is not the only way to scale a Python program. There are a few other options available on top of this mechanism – I&apos;ve covered those in my book &lt;a href=&quot;https://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;, if you&apos;re interested in learning more!&lt;/p&gt;
&lt;p&gt;Until then, stay tuned for the next article of this series!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Scaling a polling Python application with parallelism</title><link>https://julien.danjou.info/blog/scaling-python-application-threads/</link><guid isPermaLink="true">https://julien.danjou.info/blog/scaling-python-application-threads/</guid><description>A few weeks ago, Alon contacted me and asked me the following:  &gt; It so happened that I&apos;m currently working on scaling some Python app. Specifically, now I&apos;m trying to figure out the best way to scale</description><pubDate>Tue, 23 Jan 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few weeks ago, Alon contacted me and asked me the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It so happened that I&apos;m currently working on scaling some Python app. Specifically, now I&apos;m trying to figure out the best way to scale SSH connections - when one server has to connect to thousands (or even tens of thousands) of remote machines in a short period of time (say, several minutes).&lt;br /&gt;
How would you write an application that does that in a scalable way?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Alon is using such an application to gather information on the hosts it connects to, though that&apos;s not important in this case.&lt;/p&gt;
&lt;p&gt;In a series of blog posts, I&apos;d like to help Alon solve this problem! We&apos;re gonna write an application that can manage millions of hosts.&lt;/p&gt;
&lt;p&gt;Well, if you have enough hardware, obviously.&lt;/p&gt;
&lt;h2&gt;The job&lt;/h2&gt;
&lt;p&gt;Writing a Python application that connects to a host by ssh can be done using, for example, &lt;a href=&quot;http://docs.paramiko.org/en/&quot;&gt;Paramiko&lt;/a&gt;. That will not be the focus of this blog post since it is pretty straightforward to do.&lt;/p&gt;
&lt;p&gt;To keep this exercise simple, we&apos;ll just use a &lt;code&gt;ping&lt;/code&gt; function that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import subprocess

def ping(hostname):
    p = subprocess.Popen([&quot;ping&quot;, &quot;-c&quot;, &quot;3&quot;, &quot;-W&quot;, &quot;1&quot;, hostname],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
    return p.wait() == 0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The function &lt;code&gt;ping&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt; if the host is reachable and alive, or &lt;code&gt;False&lt;/code&gt; if an error occurs (bad hostname, network unreachable, ping timeout, etc.). We&apos;re also not trying to make &lt;code&gt;ping&lt;/code&gt; fast by specifying a lower timeout or a smaller number of packets. The goal is to scale this task while knowing it &lt;em&gt;takes time&lt;/em&gt; to execute.&lt;/p&gt;
&lt;p&gt;So &lt;code&gt;ping&lt;/code&gt; is going to be the job to be executed by our application. It&apos;ll replace &lt;code&gt;ssh&lt;/code&gt; in this example, but you&apos;ll see it&apos;ll be easy to replace it with any other job &lt;em&gt;you&lt;/em&gt; might have.&lt;/p&gt;
&lt;p&gt;We&apos;re going to use this job to accomplish a bigger mission: determine which hosts in my home network are up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for i in range(255):
    ip = &quot;192.168.2.%d&quot; % i
    if ping(ip):
        print(&quot;%s is alive&quot; % ip)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running this program alone and pinging all 255 IP addresses takes more than 10 minutes.&lt;/p&gt;
&lt;p&gt;It is pretty slow because each time we ping a host, we wait for the ping to succeed or time out before starting the next ping. So if you need 3 seconds on average to ping each host, then to ping 255 nodes you&apos;ll need 3 seconds × 255 = 765 seconds, which is more than 12 minutes.&lt;/p&gt;
&lt;h2&gt;The solution&lt;/h2&gt;
&lt;p&gt;If 255 hosts need 12 minutes to be pinged, you can imagine how long it&apos;s going to take to test which hosts are alive on the IPv4 Internet – 4 294 967 296 addresses to ping!&lt;/p&gt;
&lt;p&gt;Since those ping (or ssh) jobs are not CPU intensive, we can consider that one multi-processor host is going to be powerful enough – at least for a beginning.&lt;/p&gt;
&lt;p&gt;The real issue here currently is that those tasks are I/O intensive and executing them serially is &lt;em&gt;very&lt;/em&gt; long.&lt;/p&gt;
&lt;p&gt;So let&apos;s run them in parallel!&lt;/p&gt;
&lt;p&gt;To do this, we&apos;re going to use &lt;em&gt;threads&lt;/em&gt;. Threads are not efficient in Python when your tasks are CPU intensive, but in case of blocking I/O, they are good enough.&lt;/p&gt;
&lt;h2&gt;Using &lt;code&gt;concurrent.futures&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;With &lt;code&gt;concurrent.futures&lt;/code&gt;, it&apos;s easy to manage a pool of threads and schedule the execution of tasks. Here&apos;s how we&apos;re going to do it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import functools
from concurrent import futures
import subprocess

def ping(hostname):
    p = subprocess.Popen([&quot;ping&quot;, &quot;-q&quot;, &quot;-c&quot;, &quot;3&quot;, &quot;-W&quot;, &quot;1&quot;,
                          hostname],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
    return p.wait() == 0

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    futs = [
        (host, executor.submit(functools.partial(ping, host)))
        for host in (&quot;192.168.2.%d&quot; % i for i in range(255))
    ]

    for ip, f in futs:
        if f.result():
            print(&quot;%s is alive&quot; % ip)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; is an engine, called executor, that allows us to submit tasks to it. Each task submitted is put into an internal queue using the &lt;code&gt;executor.submit&lt;/code&gt; method. This method takes a function to execute as argument.&lt;/p&gt;
&lt;p&gt;Then, the executor pulls jobs out of its queue and executes them. In order to execute them, it starts a thread that is going to be responsible for the execution. The maximum number of threads to start is controlled by the &lt;code&gt;max_workers&lt;/code&gt; parameter.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;executor.submit&lt;/code&gt; returns a &lt;code&gt;Future&lt;/code&gt; object, that holds the future result of the submitted task. &lt;code&gt;Future&lt;/code&gt; objects expose methods to know if the task is finished or not; here we just use &lt;code&gt;Future.result()&lt;/code&gt; to get the result. This method will block until the result is ready.&lt;/p&gt;
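&lt;p&gt;As a side note, if you would rather handle results as soon as they are ready instead of in submission order, the same module provides &lt;code&gt;as_completed&lt;/code&gt;. A possible variation of the loop above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its host so we can print on completion.
    futs = {
        executor.submit(functools.partial(ping, host)): host
        for host in (&quot;192.168.2.%d&quot; % i for i in range(255))
    }
    for f in futures.as_completed(futs):
        if f.result():
            print(&quot;%s is alive&quot; % futs[f])
&lt;/code&gt;&lt;/pre&gt;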
&lt;p&gt;There&apos;s no magic recipe to find how many max workers you should use. It really depends on the nature of the tasks that are submitted. In this case, using a value of 4 brings down the execution time to 3 minutes – roughly 12 minutes divided by 4, which makes sense. Setting &lt;code&gt;max_workers&lt;/code&gt; to 255 (i.e. the number of tasks submitted) will make all the pings start at the same time, producing a CPU usage spike, but bringing down the total execution time to less than 5 seconds!&lt;/p&gt;
&lt;p&gt;Obviously, you wouldn&apos;t be able to start 4 billion threads in parallel, but if your system is big and fast enough, and your tasks use more I/O than CPU, you can use a pretty high value here. Memory should also be taken into account – in this case, usage is very low since the ping task does not need a lot of memory.&lt;/p&gt;
&lt;h2&gt;Just a first step&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://scaling-python.com&quot;&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-scaling-python.png&quot; alt=&quot;Cover of The Hacker&apos;s Guide to Scaling Python&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As already said, this &lt;code&gt;ping&lt;/code&gt; job does not use a lot of CPU time or I/O bandwidth, and neither would the original ssh case from Alon. However, if it did, this method would be limited pretty quickly. Threads are not always the best option to maximize your throughput, especially with Python.&lt;/p&gt;
&lt;p&gt;These are just the first steps of the distribution and scalability mechanism that you can implement using Python. There are a few other options available on top of this mechanism – I&apos;ve covered those in my book &lt;a href=&quot;https://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;, if you&apos;re interested in learning more!&lt;/p&gt;
&lt;p&gt;If you&apos;re curious, go read &lt;a href=&quot;https://julien.danjou.info/blog/scaling-python-application-asyncio&quot;&gt;the next article of this series&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Scaling Python released</title><link>https://julien.danjou.info/blog/scaling-python-released/</link><guid isPermaLink="true">https://julien.danjou.info/blog/scaling-python-released/</guid><description>I am proud to announce today the immediate release of Scaling Python, my second book about Python! It talks about the distribution and performance of applications written in Python, and how to build.</description><pubDate>Tue, 05 Dec 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I am proud to announce today the immediate release of &lt;em&gt;&lt;a href=&quot;https://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;&lt;/em&gt;, my second book about Python! It talks about the distribution and performance of applications written in Python, and how to build them properly!&lt;/p&gt;
&lt;p&gt;It took me a year to build this entirely new product around Python. It&apos;s an exciting moment and I am sure it will delight many of my dear readers who have been waiting for it for a while now!&lt;/p&gt;
&lt;p&gt;I&apos;ve been able to build this using my last three years of experience working on &lt;em&gt;&lt;a href=&quot;http://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;&lt;/em&gt; – an amazing adventure.&lt;/p&gt;
&lt;p&gt;Starting now, you can enjoy reading the book and learn a bit more about building distributed and scalable applications with Python. I really hope it&apos;ll help you bring your Python-fu to a new level, and that it will help you build great projects!&lt;/p&gt;
&lt;p&gt;Since these are the first days of sale, you will enjoy &lt;strong&gt;a 15% discount&lt;/strong&gt; on all packages for the next 48 hours!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://scaling-python.com&quot;&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-scaling-python.png&quot; alt=&quot;Cover of The Hacker&apos;s Guide to Scaling Python&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>Scaling Python: the interviewees</title><link>https://julien.danjou.info/blog/scaling-python-interviews/</link><guid isPermaLink="true">https://julien.danjou.info/blog/scaling-python-interviews/</guid><description>The release date for Scaling Python is now very close! Today, I&apos;d like to talk a bit about the interviews that I&apos;ve run those last months that are featured in the book.</description><pubDate>Tue, 28 Nov 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The release date for &lt;a href=&quot;http://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt; is now very close! Today, I&apos;d like to talk a bit about the interviews that I&apos;ve run those last months that are featured in the book.&lt;/p&gt;
&lt;p&gt;I&apos;m glad that, during those long weeks of work, I managed to find a Python expert for each of the major topics covered in the book. They provide insight into the different subjects covered and share their experience so you can benefit from it!&lt;/p&gt;
&lt;p&gt;Without further delay, ladies and gentlemen, here they are:&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Mehdi Abaakouk&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/mabaakouk.png&quot; alt=&quot;Mehdi Abaakouk portrait&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Mehdi is a French free software hacker, working at Red Hat, who has been using Linux for almost twenty years now. He works daily on OpenStack, the largest open source project using Python. He also regularly builds and contributes to distributed applications and is responsible for several widely used Python libraries – &lt;em&gt;Cotyledon&lt;/em&gt;, &lt;em&gt;oslo.messaging&lt;/em&gt;, etc.&lt;/p&gt;
&lt;p&gt;In the book, Mehdi gives excellent tips on how to build distributed daemons.&lt;/p&gt;
&lt;h2&gt;Naoki Inada&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/ninada.png&quot; alt=&quot;Naoki Inada portrait&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Naoki is a Japanese software engineer, who happens to also be one of the CPython developers. He worked on several significant features in CPython, such as &lt;em&gt;asyncio&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;You&apos;ll be able to read Naoki&apos;s opinion on Python and other programming languages when it comes to asynchronous workflows.&lt;/p&gt;
&lt;h2&gt;Chris Dent&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/cdent.png&quot; alt=&quot;Chris Dent portrait&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Chris Dent has been using Python for more than 15 years now and is an expert on WSGI. He has extensive knowledge of REST APIs – he is one of the early organizers of the &lt;a href=&quot;https://specs.openstack.org/openstack/api-wg/&quot;&gt;OpenStack API working group&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Chris has, among other things, created Gabbi, a fabulous Python tool for testing HTTP APIs. In &lt;em&gt;Scaling Python&lt;/em&gt;, he provides best practices on building REST APIs.&lt;/p&gt;
&lt;h2&gt;Joshua Harlow&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/harlowja.png&quot; alt=&quot;Joshua Harlow portrait&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Joshua is a highly experienced engineer in distributed systems. He maintains a few Python libraries, such as &lt;em&gt;Kazoo&lt;/em&gt; (ZooKeeper client) or &lt;em&gt;TaskFlow&lt;/em&gt; (distributed tasks).&lt;/p&gt;
&lt;p&gt;In the book, Joshua lays down principles that make Python application resilient and fault tolerant.&lt;/p&gt;
&lt;h2&gt;Alexys Jacob-Monier&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/ajacobmonier.png&quot; alt=&quot;Alexys Jacob-Monier portrait&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Alexys is the CTO of 1000mercis and has been part of the open-source software community for a few years now. He regularly speaks at Python conferences about how to leverage Python when distributing applications.&lt;/p&gt;
&lt;p&gt;Alexys talks about advanced techniques, e.g. using consistent hash rings, and how they should be applied.&lt;/p&gt;
&lt;h2&gt;Victor Stinner&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/vstinner.png&quot; alt=&quot;Victor Stinner portrait&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Victor is a long-time CPython core developer, working on the language itself for several years now. He is well known in the community for making CPython faster and leads several performance-oriented projects.&lt;/p&gt;
&lt;p&gt;In &lt;em&gt;Scaling Python&lt;/em&gt;, Victor talks about optimizations, profiling, and performance when using Python, and how to make the right decisions.&lt;/p&gt;
&lt;h2&gt;Jason Myers&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/jmyers.png&quot; alt=&quot;Jason Myers portrait&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Jason is a Python developer and an author – he wrote an entire book on SQLAlchemy, the famous Python SQL library. He worked on cloud computing platforms, as a Web developer, and as a data engineer.&lt;/p&gt;
&lt;p&gt;In the book, Jason and I discuss caching and RDBMS usage.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a href=&quot;https://scaling-python.com&quot;&gt;&lt;img src=&quot;https://scaling-python.com/img/the-hacker-guide-to-scaling-python.png&quot; alt=&quot;Cover of The Hacker&apos;s Guide to Scaling Python&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It was marvelous to chat with all those developers and pick their brains about different subjects. Their contributions broaden the scope and expand the view of the themes covered throughout the chapters. I can&apos;t thank them all enough!&lt;/p&gt;
&lt;p&gt;If you want to be informed of the release of the book, subscribe using the following form! You&apos;ll be the first to be notified and to enjoy an exclusive offer. ;-)&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>My interview with Cool Python Codes</title><link>https://julien.danjou.info/blog/interview-coolpythoncodes/</link><guid isPermaLink="true">https://julien.danjou.info/blog/interview-coolpythoncodes/</guid><description>A few days ago, I was contacted by Godson Rapture from Cool Python codes to answer a few questions about what I work on in open source.</description><pubDate>Thu, 05 Oct 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few days ago, I was contacted by Godson Rapture from &lt;a href=&quot;http://coolpythoncodes.com/&quot;&gt;Cool Python codes&lt;/a&gt; to answer a few questions about what I work on in open source. Godson regularly interviews developers and I invite you to check out his website!&lt;/p&gt;
&lt;p&gt;Here&apos;s a copy of &lt;a href=&quot;http://coolpythoncodes.com/julien-danjou/&quot;&gt;my original interview&lt;/a&gt;. Enjoy!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Good day, Julien Danjou, welcome to Cool Python Codes. Thanks for taking your precious time to be here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You’re welcome!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Could you kindly tell us about yourself like your full name, hobbies, nationality, education, and experience in programming?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sure. I’m Julien Danjou, I’m French and live in Paris, France. I studied computer science for 5 years, around 15 years ago, and I have pursued my career in that field since then, specializing in open source projects.&lt;/p&gt;
&lt;p&gt;These last few years, I’ve been working as a software engineer at Red Hat. I’ve spent the last 10 years working with the Python programming language. Now I work on the Gnocchi project, which is a time series database.&lt;/p&gt;
&lt;p&gt;When I’m not coding, I enjoy running half-marathons and playing FPS games.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/pyconfr-2017-jd.jpg&quot; alt=&quot;Julien Danjou at PyCon France 2017&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Can you narrate your first programming experience and what got you to start learning to program?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I started programming around 2001, and my first serious programs were in Perl. I was contributing to a hosting platform for free software named VHFFS. It was a free software project itself, and I enjoyed being able to learn from other, more experienced developers and to contribute back to it. That’s what got me hooked on the world of open source projects.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Which programming language do you know and which is your favorite?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I know quite a few, I’ve been doing serious programming in Perl, C, Lua, Common Lisp, Emacs Lisp and Python.&lt;/p&gt;
&lt;p&gt;Obviously, my favorite is Common Lisp, but I was never able to use it for any serious project, for various reasons. So I spend most of my time hacking with Python, which I really enjoy as it is close to Lisp, in some ways. I see it as a small subset of Lisp.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What inspired you to venture into the world of programming and drove you to learn a handful of programming languages?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It was mostly scratching my own itches when I started. Each time I saw something I wanted to do or a feature I wanted in an existing piece of software, I learned what I needed to get going and get it working.&lt;/p&gt;
&lt;p&gt;I studied C and Lua while writing awesome – the window manager that I created 10 years ago and used for a while. I learned Emacs Lisp while writing extensions that I wanted to see in Emacs, etc. It’s the best way to start.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What is your blog about?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My blog is a platform where I write about what I work on most of the time. Nowadays, it&apos;s mostly about Python and the main project I contribute to, Gnocchi.&lt;/p&gt;
&lt;p&gt;When writing about Gnocchi, I usually try to explain what part of the project I worked on, what new features we achieved, etc.&lt;/p&gt;
&lt;p&gt;On Python, I try to share solutions to common problems I encountered or identified while doing code reviews, for example. Or I present a new library I created!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Tell us more about your book, The Hacker’s Guide to Python.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It’s a compilation of everything I learned those last years building large Python applications. I spent the last 6 years developing on a large code base with thousands of other developers.&lt;/p&gt;
&lt;p&gt;I’ve reviewed tons of code and identified the biggest issues, mistakes, and bad practices that developers tend to have. I decided to compile that into a guide, helping developers who have played a bit with Python learn the steps to get really productive with it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenStack is the biggest open source project in Python, Can you tell us more about OpenStack?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenStack is a cloud computing platform, started 7 years ago now. Its goal is to provide a programmatic platform to manage your infrastructure while being open source and avoiding vendor lock-in.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Who uses OpenStack? Is it for programmers, website owners?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It’s used by a lot of different organizations – not really by individuals. It’s a big piece of software. You can find it in some famous public cloud providers (Dreamhost, Rackspace…), and also as a private cloud in a lot of different organizations, from Bloomberg to eBay or CERN in Switzerland, a big OpenStack user. Tons of telecom providers also leverage OpenStack for their own internal infrastructure.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Have you participated in any OpenStack conference? What did you speak on if you did?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I’ve attended the last 9 OpenStack summits and a few other OpenStack events around the world. I’ve been engaged in the upstream community for the last 6 years now.&lt;/p&gt;
&lt;p&gt;My area of expertise is telemetry, the stack of software that is in charge of collecting and storing metrics from the various OpenStack components. This is what I regularly talk about during those events.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How can one join the OpenStack community?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There’s dedicated documentation about that, called the &lt;a href=&quot;https://docs.openstack.org/infra/manual/developers.html&quot;&gt;Developer’s Guide&lt;/a&gt;. It explains how to set up your environment to send patches, and how to join the community using the mailing lists or IRC.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What makes your book, &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker’s Guide to Python&lt;/a&gt; stand out from other Python books? Also, who exactly did you write this book for?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wrote the book that I always wanted to read about Python, but never found. It’s not a book for people who want to learn Python from scratch. It’s a great guide for those who know the language but don’t know the details that experienced developers know and that make the difference. The best practices, the elegant solutions to common problems, etc. That’s why it also includes interviews with prominent Python developers, so they can share their advice on different areas.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How can someone get your book?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I’ve decided to self-publish my book, so it does not have a publisher like you might be used to. The best place to get it is online, where you can pick the format you want, electronic or paper.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What do you mean when you say you hack with Python?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Unfortunately, most people refer to hacking as the activity of some bad guys trying to get access to whatever they&apos;re not supposed to see. In the book title, I mean “hacking” as the elegant way of writing code and making things work smoothly, even when you did not expect to pull it off.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You mentioned earlier that Gnocchi is a time series database. Can you please be more elaborate about Gnocchi? Is there also any documentation about Gnocchi?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So Gnocchi is a project I started a few years ago to store time series at large scale. A time series is basically a series of tuples, each composed of a timestamp and a value.&lt;/p&gt;
&lt;p&gt;Imagine you wanted to store the temperature of all the rooms in the world at any point in time. You’d need a dedicated database for that, with the right data structure. This is what Gnocchi does: it provides this data structure storage at very, very large scale.&lt;/p&gt;
&lt;p&gt;The primary use case is infrastructure monitoring, so most people use it to store tons of metrics about their hardware, software, etc. It’s fully documented on &lt;a href=&quot;http://gnocchi.xyz&quot;&gt;its website&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How can a programmer without much experience contribute to open source projects?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The best way to start is to try to fix something that irritates you in some way. It might be a bug, it might be a missing feature. Start small. Don’t try big things first or you could be discouraged.&lt;/p&gt;
&lt;p&gt;Never stop.&lt;/p&gt;
&lt;p&gt;Also, don’t plunge right away into the community and start poking random people or spamming them with questions. Do your homework, and listen to the community for a while to get a sense of how things are going. That can be joining IRC and lurking, or following the mailing lists, for example.&lt;/p&gt;
&lt;p&gt;Big open source communities run dedicated programs to help you become engaged. It might be worth a try. Generic programs like Outreachy or Google Summer of Code are a great way to start if you don’t feel confident enough to jump into a community by your own means.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Just out of curiosity, do you write code in French?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Never ever. I think it’s acceptable to write in your language if you are sure that your code will never be open sourced and that your whole team is talking in that language, no matter what – but it’s a ballsy assumption, clearly.&lt;/p&gt;
&lt;p&gt;Truth is that if you do open source, English is the standard, so go with it. Be sad if you want, but please be pragmatic.&lt;/p&gt;
&lt;p&gt;I’ve seen projects being open sourced by companies where all the source code comments were in Korean. It was impossible for non-Korean people to get a sense of what the code and the project were doing, so it just failed and disappeared.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How does a team of programmers handle bugs in a large open source project?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wish there was some magic recipe, but I don’t think it’s the case. What you want is to have a place where your users can feel safe reporting bugs. Include a template so they don’t forget any details: how to reproduce the bugs, what they expected, etc. The worst thing is to have users reporting “That does not work.” with no details. It’s a waste of time.&lt;/p&gt;
&lt;p&gt;What tool to use to log all of that really depends on the team size and culture.&lt;/p&gt;
&lt;p&gt;Once that works, the actual fixing of bugs doesn’t follow any rule. Most developers fix the bugs they encounter or the ones that are the most critical for users. Smaller problems might not be fixed for a long time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Can you tell us about the new book you are working on and when do we expect to get it?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That new book is entitled &lt;a href=&quot;https://scaling-python.com&quot;&gt;“Scaling Python”&lt;/a&gt; and it provides insight into how to build highly scalable and distributed applications using Python.&lt;/p&gt;
&lt;p&gt;It is also based on my experience building this kind of software over the past years. This book also includes interviews with great Python hackers who work on scalable systems or know a thing or two about writing applications for performance – an important requirement for scalable applications.&lt;/p&gt;
&lt;p&gt;The book is in its final stage now, and it should be out at the beginning of 2018.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How can someone get in contact with you?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I’m reachable at &lt;a href=&quot;mailto:julien@danjou.info&quot;&gt;julien@danjou.info&lt;/a&gt; by email or via Twitter, &lt;a href=&quot;https://twitter.com/juldanjou&quot;&gt;@juldanjou&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>career</category><category>python</category><category>books</category><category>gnocchi</category><category>openstack</category></item><item><title>Attending PyCon FR 2017</title><link>https://julien.danjou.info/blog/pyconfr-announce/</link><guid isPermaLink="true">https://julien.danjou.info/blog/pyconfr-announce/</guid><description>The French edition of the annual Python conference, PyCon FR, will happen in Toulouse from 21st to 24th September.</description><pubDate>Thu, 31 Aug 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The French edition of the annual Python conference, &lt;a href=&quot;https://pycon.fr/2017/&quot;&gt;PyCon FR&lt;/a&gt;, will happen in Toulouse from 21st to 24th September.&lt;/p&gt;
&lt;p&gt;I skipped the last few PyCon FR, but this year I will be back with a one-hour long talk entitled &quot;&lt;em&gt;Scalable and distributed applications in Python&lt;/em&gt;&quot;. It will take place on Saturday afternoon. I will lay out many topics that will be covered in the book I&apos;m working on, &lt;a href=&quot;http://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;The Thursday and Friday will be dedicated to development sprints. I will be there with my friend &lt;a href=&quot;https://blog.sileht.net/&quot;&gt;Mehdi&lt;/a&gt; running a session for &lt;a href=&quot;http://gnocchi.xyz&quot;&gt;Gnocchi&lt;/a&gt;! We&apos;ll spend time teaching new contributors how to use it or how to send love and patches to the project. If you&apos;re into Python and want to learn about timeseries management, it&apos;s an excellent occasion to join us for some fun. 😎&lt;/p&gt;
&lt;p&gt;To join the sprint and the conference, visit the &lt;a href=&quot;http://pyconfr.org&quot;&gt;PyCon FR website&lt;/a&gt; and &lt;a href=&quot;https://www.eventbrite.fr/e/billets-pycon-fr-2017-a-toulouse-37380880219&quot;&gt;register&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>talks</category><category>python</category></item><item><title>Easy Python logging with daiquiri</title><link>https://julien.danjou.info/blog/python-logging-easy-with-daiquiri/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-logging-easy-with-daiquiri/</guid><description>After more than 10 years of writing Python, there&apos;s something I always have been annoyed with: logging.</description><pubDate>Tue, 04 Jul 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;After more than 10 years of writing Python, there&apos;s something I always have been annoyed with: logging.&lt;/p&gt;
&lt;p&gt;Don&apos;t get me wrong: I like the &lt;a href=&quot;https://docs.python.org/3/library/logging.html&quot;&gt;Python logging subsystem&lt;/a&gt;. It&apos;s easy to use and works like a charm in most cases. If you have never used it, logging in Python turns out to be as simple as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging

logger = logging.getLogger()
logger.info(&quot;Something useful&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It could barely be easier. What annoys me is that if you run the example above, an error happens. See for yourself:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import logging
&amp;gt;&amp;gt;&amp;gt; logger = logging.getLogger()
&amp;gt;&amp;gt;&amp;gt; logger.error(&quot;Something useful&quot;)
No handlers could be found for logger &quot;root&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nothing is printed, except an error. No log file is created. Logging does not work &quot;by default&quot;. I hate it.&lt;/p&gt;
&lt;p&gt;Each time I write a new application, I need to remember how to set logging up. There&apos;s a full API, documented, that explains how to set up handlers, formatters, filters, or a record factory. And each time, I need to dig into all that documentation to remember how to set some sane defaults (e.g. log to stderr in a format with a timestamp). I could use &lt;a href=&quot;https://docs.python.org/3/library/logging.html#logging.basicConfig&quot;&gt;&lt;code&gt;logging.basicConfig&lt;/code&gt;&lt;/a&gt;, but it&apos;s usually too basic (e.g. it does not print any timestamp).&lt;/p&gt;
&lt;p&gt;Each time, I go down the rabbit hole of tweaking logging.&lt;/p&gt;
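&lt;p&gt;For the record, here is roughly the kind of boilerplate this means with the standard library alone – a quick sketch, where the exact format string is only an example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging
import sys

# The part I never remember: create a handler, attach a formatter
# with a timestamp, and wire both to the root logger.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter(
    &quot;%(asctime)s [%(process)d] %(levelname)s %(name)s: %(message)s&quot;))
logger = logging.getLogger()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info(&quot;Now this prints to stderr with a timestamp&quot;)
&lt;/code&gt;&lt;/pre&gt;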
&lt;h2&gt;Here comes daiquiri&lt;/h2&gt;
&lt;p&gt;I finally took some time recently to bootstrap a tiny library to do this job for me. It&apos;s named &lt;em&gt;daiquiri&lt;/em&gt;, and it does only one thing: configure the Python logging subsystem for modern Python applications.&lt;/p&gt;
&lt;p&gt;It&apos;s small and the 1.0.0 version I just released contains 228 lines of code and 79 lines of tests. That&apos;s it!&lt;/p&gt;
&lt;p&gt;Its promise is to set up a complete standard Python logging system with just one function call. Nothing more, nothing less. The interesting features are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Logs to stderr by default.&lt;/li&gt;
&lt;li&gt;Uses colors if logging to a terminal.&lt;/li&gt;
&lt;li&gt;Supports file logging.&lt;/li&gt;
&lt;li&gt;Uses the program name as the name of the log file, so providing just a directory for logging works.&lt;/li&gt;
&lt;li&gt;Supports syslog.&lt;/li&gt;
&lt;li&gt;Supports journald.&lt;/li&gt;
&lt;li&gt;Supports JSON output.&lt;/li&gt;
&lt;li&gt;Supports passing arbitrary key/value context information.&lt;/li&gt;
&lt;li&gt;Captures the warnings emitted by the &lt;a href=&quot;https://docs.python.org/3/library/warnings.html&quot;&gt;&lt;code&gt;warnings&lt;/code&gt;&lt;/a&gt; module.&lt;/li&gt;
&lt;li&gt;Natively logs any exception.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And it&apos;s used by &lt;a href=&quot;http://gnocchi.xyz&quot;&gt;Gnocchi&lt;/a&gt; starting with version 4.0. That should tell you how production-ready it is, right? 😀&lt;/p&gt;
&lt;p&gt;Enough selling. Let&apos;s see how it looks by default!&lt;/p&gt;
&lt;h2&gt;Basic working&lt;/h2&gt;
&lt;p&gt;Here&apos;s the basic usage of daiquiri:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import daiquiri

daiquiri.setup()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I told you I want it to be simple. Just doing this is already doing a better job than &lt;code&gt;logging.basicConfig&lt;/code&gt;, since it&apos;ll do something useful by default:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import daiquiri
&amp;gt;&amp;gt;&amp;gt; daiquiri.setup()
&amp;gt;&amp;gt;&amp;gt; logger = daiquiri.getLogger()
&amp;gt;&amp;gt;&amp;gt; logger.error(&quot;something wrong happened&quot;)
2017-07-04 18:03:04,929 [16876] ERROR root: something wrong happened
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It does print the message on &lt;code&gt;stderr&lt;/code&gt; using a useful format and a timestamp by default. Just what everybody wants, isn&apos;t it? If you run this in a terminal, the line will be printed in red, as it is an error that is logged. Other colors will be used for different logging levels (green for debug, etc.).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/daiquiri.png&quot; alt=&quot;daiquiri&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Better, daiquiri will log any exception in your program:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import daiquiri
&amp;gt;&amp;gt;&amp;gt; daiquiri.setup()
&amp;gt;&amp;gt;&amp;gt; raise Exception(&quot;boom!&quot;)
2017-07-04 18:05:43,378 [16959] CRITICAL root: Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
Exception: boom!
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As soon as an exception is uncaught, it&apos;ll be logged as a critical log message.&lt;/p&gt;
&lt;h2&gt;More advanced features&lt;/h2&gt;
&lt;p&gt;If you want to tweak the default output, you can pass some arguments to &lt;code&gt;daiquiri.setup&lt;/code&gt;. This function accepts an &lt;code&gt;outputs&lt;/code&gt; argument that must be an iterable of &lt;code&gt;daiquiri.output.Output&lt;/code&gt; objects. This is typically a list of &lt;code&gt;daiquiri.output.File&lt;/code&gt; objects to log to a file, &lt;code&gt;daiquiri.output.Syslog&lt;/code&gt; to log to &lt;em&gt;syslog&lt;/em&gt;, or &lt;code&gt;daiquiri.output.Stream&lt;/code&gt; to log to any stream (e.g. an opened file, &lt;code&gt;stdout&lt;/code&gt; or &lt;code&gt;stderr&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;If you want to log via &lt;code&gt;syslog&lt;/code&gt; but also to &lt;code&gt;stderr&lt;/code&gt;, here&apos;s what you&apos;ll have to do:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;daiquiri.setup(outputs=(
    daiquiri.output.Syslog(),
    daiquiri.output.STDERR,
))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you want to log to a file, you can just specify a directory: daiquiri will guess the program name and create the appropriate file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# If the program name is foobar-server then the logging will
# be done to /var/log/foobar-server.log
daiquiri.setup(outputs=(
     daiquiri.output.File(directory=&quot;/var/log&quot;),
))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Those examples might be too easy. So let&apos;s log to journald and also to a network server using JSON output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import socket
import daiquiri

# Let&apos;s connect to the server first
s = socket.socket()
# You can run a simple server in another terminal by typing `nc -l 2333`
s.connect((&quot;localhost&quot;, 2333))
f = s.makefile()

daiquiri.setup(outputs=(
     daiquiri.output.Journal(),
     daiquiri.output.Stream(f, formatter=daiquiri.formatter.JSON_FORMATTER),
))
daiquiri.getLogger().error(&quot;oops&quot;, somekey=42, anotherkey=&quot;foobar&quot;)
# Server will receive:
# {&quot;message&quot;: &quot;oops&quot;, &quot;somekey&quot;: 42, &quot;anotherkey&quot;: &quot;foobar&quot;}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can obviously extend it with your own formatter or outputs, the API is pretty simple. But the default should be usable for 99% of applications.&lt;/p&gt;
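&lt;p&gt;For instance, since daiquiri builds on the standard logging module, a plain &lt;code&gt;logging.Formatter&lt;/code&gt; should work as a custom formatter – a quick sketch, where the format string is just an illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging
import sys

import daiquiri

# A plain logging.Formatter passed to a daiquiri output.
daiquiri.setup(outputs=(
    daiquiri.output.Stream(
        sys.stdout,
        formatter=logging.Formatter(&quot;%(asctime)s %(levelname)s %(message)s&quot;)),
))
daiquiri.getLogger().warning(&quot;custom-formatted message&quot;)
&lt;/code&gt;&lt;/pre&gt;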
&lt;p&gt;Let me know what you think and feel free to pip install and git clone it! The library is available at &lt;a href=&quot;http://pypi.python.org/pypi/daiquiri&quot;&gt;PyPI&lt;/a&gt;, the source is on &lt;a href=&quot;https://github.com/jd/daiquiri&quot;&gt;GitHub&lt;/a&gt; and the &lt;a href=&quot;http://daiquiri.readthedocs.io/&quot;&gt;documentation is published online&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Python never gives up: the tenacity library</title><link>https://julien.danjou.info/blog/python-tenacity/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-tenacity/</guid><description>A couple of years ago, I wrote about the Python retrying library . This library was designed to retry the execution of a task when a failure occurred.  I started to spread usage of this library in var</description><pubDate>Thu, 02 Mar 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A couple of years ago, I &lt;a href=&quot;https://julien.danjou.info/blog/python-retrying&quot;&gt;wrote about the Python &lt;em&gt;retrying&lt;/em&gt; library&lt;/a&gt;. This library was designed to retry the execution of a task when a failure occurred.&lt;/p&gt;
&lt;p&gt;I started to spread usage of this library in various projects, such as &lt;a href=&quot;http://gnocchi.xyz&quot;&gt;Gnocchi&lt;/a&gt;, these last years. Unfortunately, it started to get very hard to contribute and send patches to the upstream &lt;em&gt;retrying&lt;/em&gt; project. I spent several months trying to work with the original author. But after a while, I had to come to the conclusion that I would be unable to fix bugs and enhance it at the pace I would like to. Therefore, I had to take a difficult decision and decided to fork the library.&lt;/p&gt;
&lt;h2&gt;Here comes &lt;em&gt;tenacity&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;I picked a new name and rewrote parts of the API of &lt;em&gt;retrying&lt;/em&gt; that were not working correctly or were too complicated. I also fixed bugs with the help of Joshua, and named this new library &lt;em&gt;tenacity&lt;/em&gt;. It works in the same manner as &lt;em&gt;retrying&lt;/em&gt; does, except that it is written in a more functional way and offers some nifty new features.&lt;/p&gt;
&lt;h2&gt;Basic usage&lt;/h2&gt;
&lt;p&gt;The basic usage is to use it as a decorator:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

@tenacity.retry
def do_something_and_retry_on_any_exception():
    pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will make the function &lt;code&gt;do_something_and_retry_on_any_exception&lt;/code&gt; be called over and over again until it stops raising an exception. It would have been hard to design anything simpler. Obviously, this is a pretty rare case, as one usually wants to e.g. wait some time between retries. For that, &lt;em&gt;tenacity&lt;/em&gt; offers a large panel of waiting methods:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

@tenacity.retry(wait=tenacity.wait_fixed(1))
def do_something_and_retry():
    do_something()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or a simple exponential back-off method can be used instead:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

@tenacity.retry(wait=tenacity.wait_exponential())
def do_something_and_retry():
    do_something()
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Combination&lt;/h2&gt;
&lt;p&gt;What is especially interesting with &lt;em&gt;tenacity&lt;/em&gt; is that you can easily combine several methods. For example, you can combine &lt;code&gt;tenacity.wait_random&lt;/code&gt; with &lt;code&gt;tenacity.wait_fixed&lt;/code&gt; to wait a number of seconds defined within an interval:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

@tenacity.retry(wait=tenacity.wait_fixed(10) + tenacity.wait_random(0, 3))
def do_something_and_retry():
    do_something()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will make the function being retried wait randomly between 10 and 13 seconds before trying again.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;tenacity&lt;/em&gt; offers more customization, such as retrying on some exceptions only. You can retry every second to execute the function only if the exception raised by &lt;code&gt;do_something&lt;/code&gt; is an instance of &lt;code&gt;IOError&lt;/code&gt;, e.g. a network communication error.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

@tenacity.retry(wait=tenacity.wait_fixed(1),
                retry=tenacity.retry_if_exception_type(IOError))
def do_something_and_retry():
    do_something()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can easily combine several conditions by using the &lt;code&gt;|&lt;/code&gt; or &lt;code&gt;&amp;amp;&lt;/code&gt; binary operators. In the following example, they are used to make the code retry if an &lt;code&gt;IOError&lt;/code&gt; exception is raised, or if no result is returned. A stop condition is also added with the &lt;code&gt;stop&lt;/code&gt; keyword argument: it allows you to specify a condition unrelated to the function&apos;s result or exception, such as a maximum number of attempts or a delay.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

@tenacity.retry(wait=tenacity.wait_fixed(1),
                stop=tenacity.stop_after_delay(60),
                retry=(tenacity.retry_if_exception_type(IOError) |
                       tenacity.retry_if_result(lambda result: result is None)))
def do_something_and_retry():
    do_something()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The functional approach of &lt;em&gt;tenacity&lt;/em&gt; makes it easy and clean to combine a lot of conditions for various use cases with simple binary operators.&lt;/p&gt;
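&lt;p&gt;Stop conditions compose the same way. As a quick sketch – assuming the top-level &lt;code&gt;stop_after_attempt&lt;/code&gt; helper – this gives up after 60 seconds or 10 attempts, whichever comes first:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

# Stop conditions also combine with |: stop retrying after 60 seconds
# or after 10 attempts, whichever happens first.
@tenacity.retry(wait=tenacity.wait_fixed(1),
                stop=(tenacity.stop_after_delay(60) |
                      tenacity.stop_after_attempt(10)))
def do_something_and_retry():
    do_something()
&lt;/code&gt;&lt;/pre&gt;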
&lt;h2&gt;Standalone usage&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;tenacity&lt;/em&gt; can also be used without a decorator, by using the &lt;code&gt;Retrying&lt;/code&gt; object, which implements its main behaviour, and using its &lt;code&gt;call&lt;/code&gt; method. This allows you to call any function with different retry conditions, or to retry any piece of code that does not use the decorator at all – like code from an external library.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

r = tenacity.Retrying(
    wait=tenacity.wait_fixed(1),
    retry=tenacity.retry_if_exception_type(IOError))
r.call(do_something)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This also allows you to re-use that object without creating a new one each time, saving some memory!&lt;/p&gt;
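&lt;p&gt;A tiny sketch of that re-use – the &lt;code&gt;fetch&lt;/code&gt; function and URLs are placeholders:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import tenacity

def fetch(url):
    ...

# One Retrying object, shared across several calls:
r = tenacity.Retrying(
    wait=tenacity.wait_fixed(1),
    retry=tenacity.retry_if_exception_type(IOError))
r.call(fetch, &quot;http://example.com/a&quot;)
r.call(fetch, &quot;http://example.com/b&quot;)
&lt;/code&gt;&lt;/pre&gt;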
&lt;p&gt;I hope you&apos;ll like it and will find some use for it. Feel free to fork it, report bugs or ask for new features on &lt;a href=&quot;https://github.com/jd/tenacity&quot;&gt;its GitHub&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;If you want to learn more about retrying strategy and how to handle failure, there&apos;s even more in &lt;a href=&quot;https://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;. Check it out!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>FOSDEM 2017, recap</title><link>https://julien.danjou.info/blog/fosdem-2017-recap/</link><guid isPermaLink="true">https://julien.danjou.info/blog/fosdem-2017-recap/</guid><description>Last weekend, I was in Brussels, Belgium for the 2017 edition of FOSDEM, one of the greatest open source developer conferences.</description><pubDate>Mon, 06 Feb 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/fosdem.png&quot; alt=&quot;FOSDEM 2017 conference logo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Last weekend, I was in Brussels, Belgium for the 2017 edition of &lt;a href=&quot;http://fosdem.org&quot;&gt;FOSDEM&lt;/a&gt;, one of the greatest open source developer conferences.&lt;/p&gt;
&lt;p&gt;This year, I decided to propose a talk about &lt;a href=&quot;http://gnocchi.xyz&quot;&gt;Gnocchi&lt;/a&gt; which was accepted in the &lt;a href=&quot;https://fosdem.org/2017/schedule/track/python/&quot;&gt;Python devroom&lt;/a&gt;. The track was very well organized (thanks to &lt;a href=&quot;https://wirtel.be/&quot;&gt;Stéphane Wirtel&lt;/a&gt;) and I was able to present Gnocchi to a room full of Python developers!&lt;/p&gt;
&lt;p&gt;I explained why and how we created Gnocchi, and finally gave a brief overview of how to use it with the command-line interface or in a Python application using the &lt;a href=&quot;http://gnocchi.xyz/gnocchiclient&quot;&gt;SDK&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can check the slides below and watch the &lt;a href=&quot;https://video.fosdem.org/2017/UD2.120/storing_metrics_gnocchi.mp4&quot;&gt;video of the talk&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>talks</category><category>python</category></item><item><title>Scaling Python is on its way</title><link>https://julien.danjou.info/blog/announcing-scaling-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/announcing-scaling-python/</guid><description>My day-to-day activities still revolve around the Python programming language, as I continue working on the OpenStack project as part of my job at Red Hat.</description><pubDate>Mon, 16 Jan 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;My day-to-day activities still revolve around the &lt;a href=&quot;http://python.org&quot;&gt;Python&lt;/a&gt; programming language, as I continue working on the &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; project as part of my job at &lt;a href=&quot;http://redhat.com&quot;&gt;Red Hat&lt;/a&gt;. OpenStack is still the biggest Python project out there, and attracts a lot of Python hackers.&lt;/p&gt;
&lt;p&gt;Those last few years, however, things have taken a different turn for me when I made the choice with my team to rework the telemetry stack architecture. We decided to make a point of making it scale way beyond what had been done in the project so far.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://scaling-python.com&quot;&gt;&lt;img src=&quot;https://scaling-python.com/img/the-hacker-guide-to-scaling-python.png&quot; alt=&quot;Cover of The Hacker&apos;s Guide to Scaling Python&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I started to dig into a lot of different fields around Python. Topics you don&apos;t often look at when writing a simple and straight-forward application. It turns out that writing scalable applications in Python is not impossible, nor that difficult. There are a few hiccups to avoid, and various tools that can help, but it really is possible – without switching to another whole language, framework, or exotic tool set.&lt;/p&gt;
&lt;p&gt;Working on those projects seemed to me like a good opportunity to share with the rest of the world what I learned. Therefore, I decided to share my most recent knowledge addition around distributed and scalable Python application in a new book, entitled &lt;a href=&quot;https://scaling-python.com&quot;&gt;The Hacker&apos;s Guide to Scaling Python&lt;/a&gt; (or &lt;em&gt;Scaling Python&lt;/em&gt;, in short). The book should be released in a few months – fingers crossed.&lt;/p&gt;
&lt;p&gt;And as the book is still a work-in-progress, I&apos;ll be happy to hear any remark, question, or topic idea you might have, or any particular angle you would like me to take in this book (reply in the &lt;a href=&quot;#disqus_thread&quot;&gt;comments section&lt;/a&gt; or shoot me an &lt;a href=&quot;mailto:julien@danjou.info&quot;&gt;email&lt;/a&gt;). And if you&apos;d like to be kept updated on this book&apos;s advancement, you can subscribe using the following form or from the &lt;a href=&quot;https://scaling-python.com&quot;&gt;book homepage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The adventure of working on my previous book, &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;, has been so tremendous and the feedback so great, that I&apos;m looking forward to releasing this new book later this year!&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>Packaging Python software with pbr</title><link>https://julien.danjou.info/blog/packaging-python-with-pbr/</link><guid isPermaLink="true">https://julien.danjou.info/blog/packaging-python-with-pbr/</guid><description>Packaging Python has been a painful experience for long. The history of the various distribution that Python offered along the years is really bumpy, and both the user and developer experience has bee</description><pubDate>Mon, 02 Jan 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Packaging Python has been a painful experience for a long time. The history of the various distribution tools that Python has offered over the years is really bumpy, and both the user and developer experience have been pretty bad.&lt;/p&gt;
&lt;p&gt;Fortunately, things have improved a lot in recent years, with the reconciliation of &lt;em&gt;&lt;a href=&quot;https://setuptools.readthedocs.io&quot;&gt;setuptools&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;distribute&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Though in the context of the &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; project, a solution on top of &lt;em&gt;setuptools&lt;/em&gt; was already started a while back. Its usage is now spread across a whole range of software and libraries.&lt;/p&gt;
&lt;p&gt;This project is called &lt;em&gt;&lt;a href=&quot;http://docs.openstack.org/developer/pbr/&quot;&gt;pbr&lt;/a&gt;&lt;/em&gt;, for &lt;em&gt;Python Build Reasonableness&lt;/em&gt;. Don&apos;t be put off by the OpenStack-colored theme of the documentation – it is a bad habit of OpenStack folks not to advertise their tooling in an agnostic fashion. The tool has no dependency on the cloud platform, and can be used painlessly with any package.&lt;/p&gt;
&lt;h2&gt;How it works&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;pbr&lt;/em&gt; takes inspiration from &lt;em&gt;distutils2&lt;/em&gt; (a now abandoned project) and uses a &lt;code&gt;setup.cfg&lt;/code&gt; file to describe the packager&apos;s intents. This is what a &lt;code&gt;setup.py&lt;/code&gt; using &lt;em&gt;pbr&lt;/em&gt; looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import setuptools

setuptools.setup(setup_requires=[&apos;pbr&apos;], pbr=True)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two lines of code – it&apos;s that simple. The actual metadata that the setup requires is stored in &lt;code&gt;setup.cfg&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[metadata]
name = foobar
author = Dave Null
author-email = foobar@example.org
summary = Package doing nifty stuff
license = MIT
description-file =
    README.rst
home-page = http://pypi.python.org/pypi/foobar
requires-python = &amp;gt;=2.6
classifier = 
    Development Status :: 4 - Beta
    Environment :: Console
    Intended Audience :: Developers
    Intended Audience :: Information Technology
    License :: OSI Approved :: Apache Software License
    Operating System :: OS Independent
    Programming Language :: Python

[files]
packages =
    foobar
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This syntax is way easier to write and read than the standard &lt;code&gt;setup.py&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;pbr&lt;/em&gt; also offers other features such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;automatic dependency installation based on &lt;code&gt;requirements.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;automatic documentation building and generation using Sphinx&lt;/li&gt;
&lt;li&gt;automatic generation of &lt;code&gt;AUTHORS&lt;/code&gt; and &lt;code&gt;ChangeLog&lt;/code&gt; files based on &lt;em&gt;git&lt;/em&gt; history&lt;/li&gt;
&lt;li&gt;automatic creation of the list of files to include using &lt;em&gt;git&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;version management based on &lt;em&gt;git&lt;/em&gt; tags&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of this comes with little to no effort on your part.&lt;/p&gt;
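&lt;p&gt;For example, with a &lt;code&gt;requirements.txt&lt;/code&gt; like the following at the root of your project – the package names are just placeholders – &lt;em&gt;pbr&lt;/em&gt; turns it into the package&apos;s install requirements, with no duplication in &lt;code&gt;setup.cfg&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# requirements.txt, read automatically by pbr
requests&amp;gt;=2.0
six&amp;gt;=1.9
&lt;/code&gt;&lt;/pre&gt;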
&lt;h2&gt;Using flavors&lt;/h2&gt;
&lt;p&gt;One of the feature that I use a lot, is the definition of flavors. It&apos;s not tied particularly to &lt;em&gt;pbr&lt;/em&gt; – it&apos;s actually provided by &lt;em&gt;setuptools&lt;/em&gt; and &lt;em&gt;pip&lt;/em&gt; themselves – but &lt;em&gt;pbr&lt;/em&gt; &lt;code&gt;setup.cfg&lt;/code&gt; file makes it easy to use.&lt;/p&gt;
&lt;p&gt;When distributing a piece of software, it&apos;s common to have different drivers for it. For example, your project could support both PostgreSQL and MySQL – but nobody is going to use both at the same time. The usual trick to make it work is to add the needed libraries to the requirements list (e.g. &lt;code&gt;requirements.txt&lt;/code&gt;). The upside is that the software will work directly with either RDBMS, but the downside is that this installs both libraries, whereas only one is needed. Using flavors, you can specify different scenarios:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[extras]
postgresql =
    psycopg2
mysql =
    pymysql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When installing your package, the user can then just pick the right flavor by using &lt;em&gt;&lt;a href=&quot;https://pip.pypa.io/&quot;&gt;pip&lt;/a&gt;&lt;/em&gt; to install the package:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pip install foobar[postgresql]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will install &lt;em&gt;foobar&lt;/em&gt;, all its dependencies listed in &lt;code&gt;requirements.txt&lt;/code&gt;, plus whatever dependencies are listed in the &lt;code&gt;[extras]&lt;/code&gt; section of &lt;code&gt;setup.cfg&lt;/code&gt; matching the flavor. You can also combine several flavors, e.g.:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pip install foobar[postgresql,mysql]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;would install both flavors.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;pbr&lt;/em&gt; is well-maintained and in very active development, so if you have any plans to distribute your software, you should seriously consider including &lt;em&gt;pbr&lt;/em&gt; in those plans.&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>From decimal to timestamp with MySQL</title><link>https://julien.danjou.info/blog/python-sqlalchemy-from-decimal-to-timestamp/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-sqlalchemy-from-decimal-to-timestamp/</guid><description>When working with timestamps, one question that often arises is the precision of those timestamps. Most software is good enough with a precision up to the second, and that&apos;s easy. But in some cases, l</description><pubDate>Thu, 08 Sep 2016 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When working with timestamps, one question that often arises is the precision of those timestamps. Most software is good enough with a precision up to the second, and that&apos;s easy. But in some cases, like working on metering, a finer precision is required.&lt;/p&gt;
&lt;p&gt;I don&apos;t know exactly why, and it makes me suffer every day, but &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; is really tied to &lt;a href=&quot;http://mysql.com&quot;&gt;MySQL&lt;/a&gt; (and its clones). It hurts because MySQL is a very poor solution if you want to leverage your database to actually solve problems. But that&apos;s how life is, unfair. And in the context of the projects I work on, that boils down to that we can&apos;t afford to not support MySQL.&lt;/p&gt;
&lt;p&gt;So here we are, needing to work with MySQL and at the same time requiring timestamp with a finer precision than just seconds. And guess what: MySQL did not support that until 2011.&lt;/p&gt;
&lt;h2&gt;No microseconds in MySQL? No problem: DECIMAL!&lt;/h2&gt;
&lt;p&gt;MySQL 5.6.4 (released in 2011), a beta version of MySQL 5.6 (hello MySQL, ever heard of &lt;a href=&quot;http://semver.org&quot;&gt;Semantic Versioning&lt;/a&gt;?), brought microsecond precision to timestamps. But the first stable version supporting that, MySQL 5.6.10, was only released in 2013. So for a long time, there was a problem without any solution.&lt;/p&gt;
&lt;p&gt;The obvious workaround, in this case, is to reassess your choices in technologies, discover that &lt;a href=&quot;https://www.postgresql.org/docs/7.1/static/datatype-datetime.html&quot;&gt;PostgreSQL has supported microsecond precision for at least a decade&lt;/a&gt;, and problem solved.&lt;/p&gt;
&lt;p&gt;This is not what happened in our case, and in order to support MySQL, a workaround had to be found. And so we did in our &lt;a href=&quot;http://launchpad.net/ceilometer&quot;&gt;Ceilometer&lt;/a&gt; project, using a &lt;a href=&quot;https://dev.mysql.com/doc/refman/5.7/en/precision-math-decimal-characteristics.html&quot;&gt;&lt;code&gt;DECIMAL&lt;/code&gt;&lt;/a&gt; type instead of &lt;code&gt;DATETIME&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;DECIMAL&lt;/code&gt; type takes 2 arguments: the total number of digits you need to store, and how many in that total will be used for the fractional part. Knowing that the internal storage of MySQL uses 1 byte for 2 digits, 2 bytes for 4 digits, 3 bytes for 6 digits and 4 bytes for 9 digits, and that each part is stored independently, in order to maximize your storage space, you want to pick a number of digits that fits that correctly.&lt;/p&gt;
&lt;p&gt;This is why Ceilometer picked 14 for the integer part (9 digits on 4 bytes and 5 digits on 3 bytes) and 6 for the decimal part (3 bytes).&lt;/p&gt;
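&lt;p&gt;To make that packing rule concrete, here is a small sketch computing the storage cost of each part of a &lt;code&gt;DECIMAL&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def decimal_part_bytes(digits):
    # Each part of a DECIMAL is packed independently: 4 bytes per
    # group of 9 digits, plus 1 byte per 2 leftover digits, rounded up.
    full, leftover = divmod(digits, 9)
    return full * 4 + (0, 1, 1, 2, 2, 3, 3, 4, 4)[leftover]

print(decimal_part_bytes(14))  # 7 bytes for the integer part
print(decimal_part_bytes(6))   # 3 bytes for the fractional part
&lt;/code&gt;&lt;/pre&gt;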
&lt;p&gt;Wait. It&apos;s stupid because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;DECIMAL(20, 6)&lt;/code&gt; implies that you use 14 digits for the integer part, which, using the epoch as a reference, makes you able to encode timestamps up to &lt;code&gt;(10^14) - 1&lt;/code&gt;, which is year 3170843. I am certain Ceilometer won&apos;t last that far.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;14 digits is 9 + 5 digits in MySQL, which is 7 bytes, the same size that is used for 9 + 6 digits. So you could have had &lt;code&gt;DECIMAL(21, 6)&lt;/code&gt; for the same storage space (and gone up to year 31690708, which is a nice bonus, right?)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Well, I guess the original author of the patch did not read the documentation entirely (&lt;code&gt;DECIMAL(20, 6)&lt;/code&gt; being on the MySQL documentation page as an example, I imagine it has just been copy-pasted blindly?).&lt;/p&gt;
&lt;p&gt;The best choice for this use case would have been &lt;code&gt;DECIMAL(17, 6)&lt;/code&gt;, which allows storing 11 digits for the integer part (5 bytes), supporting timestamps up to &lt;code&gt;(10^11) - 1&lt;/code&gt; (year 5138), and 6 digits for the decimal part (3 bytes), using only 8 bytes in total per timestamp.&lt;/p&gt;
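&lt;p&gt;A quick Python 3 sketch to double-check that arithmetic:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import datetime

# 11 integer digits of seconds since the UNIX epoch:
max_ts = 10**11 - 1
print(datetime.datetime.utcfromtimestamp(max_ts).year)
# prints 5138
&lt;/code&gt;&lt;/pre&gt;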
&lt;p&gt;Nonetheless, this workaround has been implemented using a &lt;a href=&quot;http://sqlalchemy.org&quot;&gt;SQLAlchemy&lt;/a&gt; custom type and works as expected:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class PreciseTimestamp(sqlalchemy.types.TypeDecorator):
    &quot;&quot;&quot;Represents a timestamp precise to the microsecond.&quot;&quot;&quot;

    impl = sqlalchemy.DateTime

    def load_dialect_impl(self, dialect):
        if dialect.name == &apos;mysql&apos;:
            return dialect.type_descriptor(
                sqlalchemy.types.DECIMAL(precision=20,
                                         scale=6,
                                         asdecimal=True))
        return dialect.type_descriptor(self.impl)
&lt;/code&gt;&lt;/pre&gt;
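&lt;p&gt;For illustration, here is a hypothetical model using the &lt;code&gt;PreciseTimestamp&lt;/code&gt; type defined above – the table and column names are made up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sqlalchemy
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Measure(Base):
    # On MySQL, mytime is stored as DECIMAL(20, 6); on any other
    # back-end, as a native DATETIME.
    __tablename__ = &quot;measure&quot;
    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    mytime = sqlalchemy.Column(PreciseTimestamp(), nullable=False)
&lt;/code&gt;&lt;/pre&gt;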
&lt;h2&gt;Microseconds in MySQL? Damn, migration!&lt;/h2&gt;
&lt;p&gt;As I said, MySQL 5.6.4 brought microseconds precision to the table (pun intended). Therefore, it&apos;s a great time to migrate away from this hackish format to the brand new one.&lt;/p&gt;
&lt;p&gt;First, be aware that the default &lt;code&gt;DATETIME&lt;/code&gt; type has no microseconds precision: &lt;a href=&quot;http://dev.mysql.com/doc/refman/5.7/en/datetime.html&quot;&gt;you have to specify how many digits you want as an argument&lt;/a&gt;. To support microseconds, you should therefore use &lt;code&gt;DATETIME(6)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If we were using a great RDBMS, let&apos;s say, hum, PostgreSQL, we could do that very easily, see:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;postgres=# CREATE TABLE foo (mytime decimal);
CREATE TABLE
postgres=# \d foo
      Table &quot;public.foo&quot;
 Column │  Type   │ Modifiers
────────┼─────────┼───────────
 mytime │ numeric │
postgres=# INSERT INTO foo (mytime) VALUES (1473254401.234);
INSERT 0 1
postgres=# ALTER TABLE foo ALTER COLUMN mytime SET DATA TYPE timestamp with time zone USING to_timestamp(mytime);
ALTER TABLE
postgres=# \d foo
              Table &quot;public.foo&quot;
 Column │           Type           │ Modifiers
────────┼──────────────────────────┼───────────
 mytime │ timestamp with time zone │

postgres=# select * from foo;
           mytime
────────────────────────────
 2016-09-07 13:20:01.234+00
(1 row)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And since this is a pretty common use case, it&apos;s even &lt;a href=&quot;https://www.postgresql.org/docs/9.5/static/sql-altertable.html&quot;&gt;an example in the PostgreSQL documentation&lt;/a&gt;. The version from the documentation uses a calculation based on epoch, whereas my example here leverages the &lt;code&gt;to_timestamp()&lt;/code&gt; function. That&apos;s my personal touch.&lt;/p&gt;
&lt;p&gt;Obviously, doing this conversion in a single line is not possible with MySQL: it does not implement the &lt;code&gt;USING&lt;/code&gt; keyword on &lt;code&gt;ALTER TABLE … ALTER COLUMN&lt;/code&gt;. So what&apos;s the solution gonna be? Well, it&apos;s a 4-step job:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a new column of type &lt;code&gt;DATETIME(6)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Copy data from the old column to the new column, converting them to the new format&lt;/li&gt;
&lt;li&gt;Delete the old column&lt;/li&gt;
&lt;li&gt;Rename the new column to the old column name.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;But I know what you&apos;re thinking: there are 4 steps, but that&apos;s not a problem, we&apos;ll just use a transaction and embed these operations inside.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://dev.mysql.com/doc/refman/5.7/en/cannot-roll-back.html&quot;&gt;MySQL does not support transactions on data definition language (DDL)&lt;/a&gt;. So if any of those steps fails, you&apos;ll be unable to roll back steps 1, 3 and 4. Who knew that using MySQL was like living on the edge, right?&lt;/p&gt;
&lt;h2&gt;Doing this in Python with our friend Alembic&lt;/h2&gt;
&lt;p&gt;I like &lt;a href=&quot;http://alembic.zzzcomputing.com/&quot;&gt;Alembic&lt;/a&gt;. It&apos;s a Python library based on &lt;a href=&quot;http://sqlalchemy.org&quot;&gt;SQLAlchemy&lt;/a&gt; that handles schema migration for your favorite RDBMS.&lt;/p&gt;
&lt;p&gt;Once you&apos;ve created a new alembic migration script using &lt;code&gt;alembic revision&lt;/code&gt;, it&apos;s time to edit it and write something along those lines:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import mysql
from sqlalchemy.sql import func

class Timestamp(sa.types.TypeDecorator):
    &quot;&quot;&quot;Represents a timestamp precise to the microsecond.&quot;&quot;&quot;

    impl = sa.DateTime

    def load_dialect_impl(self, dialect):
        if dialect.name == &apos;mysql&apos;:
            return dialect.type_descriptor(mysql.DATETIME(fsp=6))
        return self.impl

def upgrade():
    bind = op.get_bind()
    if bind and bind.engine.name == &quot;mysql&quot;:
        existing_type = sa.types.DECIMAL(
            precision=20, scale=6, asdecimal=True)
        existing_col = sa.Column(&quot;mytime&quot;, existing_type, nullable=False)
        temp_col = sa.Column(&quot;mytime_ts&quot;, Timestamp(), nullable=False)
        # Step 1: ALTER TABLE mytable ADD COLUMN mytime_ts DATETIME(6)
        op.add_column(&quot;mytable&quot;, temp_col)
        t = sa.sql.table(&quot;mytable&quot;, existing_col, temp_col)
        # Step 2: UPDATE mytable SET mytime_ts=from_unixtime(mytime)
        op.execute(t.update().values(mytime_ts=func.from_unixtime(existing_col)))
        # Step 3: ALTER TABLE mytable DROP COLUMN mytime
        op.drop_column(&quot;mytable&quot;, &quot;mytime&quot;)
        # Step 4: ALTER TABLE mytable CHANGE mytime_ts mytime
        # Note: MySQL needs to have all the old/new information to just rename a column…
        op.alter_column(&quot;mytable&quot;,
                        &quot;mytime_ts&quot;,
                        nullable=False,
                        type_=Timestamp(),
                        existing_nullable=False,
                        existing_type=existing_type,
                        new_column_name=&quot;mytime&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In MySQL, the function to convert a UNIX timestamp to a date and time is &lt;code&gt;from_unixtime()&lt;/code&gt;, so the script leverages it to convert the data. As said, you&apos;ll notice we don&apos;t bother using any kind of transaction, so if anything goes wrong, there&apos;s no rollback, and it won&apos;t be possible to re-run the migration without a manual intervention.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Timestamp&lt;/code&gt; is a custom class that implements &lt;code&gt;sqlalchemy.DateTime&lt;/code&gt; using a &lt;code&gt;DATETIME(6)&lt;/code&gt; type for MySQL, and a regular &lt;code&gt;sqlalchemy.DateTime&lt;/code&gt; type for other back-ends. It is used by the rest of the code (e.g. the ORM model) but I&apos;ve pasted it in this example for a better understanding.&lt;/p&gt;
&lt;p&gt;Once written, you can easily test your migration using &lt;a href=&quot;https://github.com/jd/pifpaf&quot;&gt;&lt;em&gt;pifpaf&lt;/em&gt;&lt;/a&gt; to run a temporary database:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pifpaf run mysql $SHELL
$ alembic -c alembic/alembic.ini upgrade 1c98ac614015 # upgrade to the initial revision
$ mysql -S $PIFPAF_MYSQL_SOCKET pifpaf
mysql&amp;gt; INSERT INTO mytable (mytime) VALUES (1325419200.213000);
Query OK, 1 row affected (0.00 sec)

mysql&amp;gt; SELECT * FROM mytable;
+-------------------+
| mytime            |
+-------------------+
| 1325419200.213000 |
+-------------------+
1 row in set (0.00 sec)

$ alembic -c alembic/alembic.ini upgrade head

$ mysql -S $PIFPAF_MYSQL_SOCKET pifpaf
mysql&amp;gt; SELECT * FROM mytable;
+----------------------------+
| mytime                     |
+----------------------------+
| 2012-01-01 13:00:00.213000 |
+----------------------------+
1 row in set (0.00 sec)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And voilà, we just unsafely migrated our data to a fancy new format. Thank you Alembic for solving a problem we would not have without MySQL. 😊&lt;/p&gt;
</content:encoded><category>python</category><category>databases</category><category>openstack</category></item><item><title>The definitive guide to Python exceptions</title><link>https://julien.danjou.info/blog/python-exceptions-guide/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-exceptions-guide/</guid><description>Three years after my definitive guide on Python classic, static, class and abstract methods, it seems to be time for a new one. Here, I would like to dissect and discuss Python exceptions.</description><pubDate>Thu, 11 Aug 2016 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Three years after my definitive guide on &lt;a href=&quot;https://julien.danjou.info/blog/guide-python-static-class-abstract-methods&quot;&gt;Python classic, static, class and abstract methods&lt;/a&gt;, it seems to be time for a new one. Here, I would like to dissect and discuss &lt;a href=&quot;https://docs.python.org/3/tutorial/errors.html&quot;&gt;Python exceptions&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Dissecting the base exceptions&lt;/h2&gt;
&lt;p&gt;In Python, the base exception class is named &lt;code&gt;BaseException&lt;/code&gt;. Being rarely used in any program or library, it ought to be considered as an &lt;em&gt;implementation detail&lt;/em&gt;. But to discover how it&apos;s implemented, you can go and read &lt;a href=&quot;https://github.com/python/cpython/blob/master/Objects/exceptions.c&quot;&gt;Objects/exceptions.c&lt;/a&gt; in the CPython source code. In that file, what is interesting is to see that the &lt;code&gt;BaseException&lt;/code&gt; class defines all the basic methods and attributes of exceptions. The basic well-known &lt;code&gt;Exception&lt;/code&gt; class is then simply defined as a subclass of &lt;code&gt;BaseException&lt;/code&gt;, nothing more:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/*
 *    Exception extends BaseException
 */
SimpleExtendsException(PyExc_BaseException, Exception,
                       &quot;Common base class for all non-exit exceptions.&quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only other exceptions that inherit directly from &lt;code&gt;BaseException&lt;/code&gt; are &lt;code&gt;GeneratorExit&lt;/code&gt;, &lt;code&gt;SystemExit&lt;/code&gt; and &lt;code&gt;KeyboardInterrupt&lt;/code&gt;. All the other builtin exceptions inherit from &lt;code&gt;Exception&lt;/code&gt;. The whole hierarchy can be seen by running &lt;code&gt;pydoc2 exceptions&lt;/code&gt; or &lt;code&gt;pydoc3 builtins&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here are the graphs representing the builtin exception inheritance in Python 2 and Python 3 (generated using &lt;a href=&quot;https://github.com/jd/julien.danjou.info/blob/master/bin/generate-python-exceptions-graph.py&quot;&gt;this script&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/python2-exceptions-graph.png&quot; alt=&quot;Python 2 builtin exceptions inheritance graph&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/python3-exceptions-graph.png&quot; alt=&quot;Python 3 builtin exceptions inheritance graph&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;BaseException.__init__&lt;/code&gt; signature is actually &lt;code&gt;BaseException.__init__(*args)&lt;/code&gt;. This initialization method stores any arguments that are passed in the &lt;code&gt;args&lt;/code&gt; attribute of the exception. This can be seen in the &lt;code&gt;exceptions.c&lt;/code&gt; source code – and is true for both Python 2 and Python 3:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;static int
BaseException_init(PyBaseExceptionObject *self, PyObject *args, PyObject *kwds)
{
    if (!_PyArg_NoKeywords(Py_TYPE(self)-&amp;gt;tp_name, kwds))
        return -1;

    Py_INCREF(args);
    Py_XSETREF(self-&amp;gt;args, args);

    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only place where this &lt;code&gt;args&lt;/code&gt; attribute is used is in the &lt;code&gt;BaseException.__str__&lt;/code&gt; method. This method uses &lt;code&gt;self.args&lt;/code&gt; to convert an exception to a string:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;static PyObject *
BaseException_str(PyBaseExceptionObject *self)
{
    switch (PyTuple_GET_SIZE(self-&amp;gt;args)) {
    case 0:
        return PyUnicode_FromString(&quot;&quot;);
    case 1:
        return PyObject_Str(PyTuple_GET_ITEM(self-&amp;gt;args, 0));
    default:
        return PyObject_Str(self-&amp;gt;args);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This can be translated into Python as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def __str__(self):
    if len(self.args) == 0:
        return &quot;&quot;
    if len(self.args) == 1:
        return str(self.args[0])
    return str(self.args)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Therefore, the message to display for an exception should be passed as the first and the only argument to the &lt;code&gt;BaseException.__init__&lt;/code&gt; method.&lt;/p&gt;
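&lt;p&gt;To illustrate, here is what that gives in an interpreter – passing a second argument turns the string representation into a tuple:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; str(Exception(&quot;unable to connect&quot;))
&apos;unable to connect&apos;
&amp;gt;&amp;gt;&amp;gt; str(Exception(&quot;unable to connect&quot;, 42))
&quot;(&apos;unable to connect&apos;, 42)&quot;
&lt;/code&gt;&lt;/pre&gt;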
&lt;h2&gt;Defining your exceptions properly&lt;/h2&gt;
&lt;p&gt;As you may already know, in Python, exceptions can be raised in any part of the program. The basic exception is called &lt;code&gt;Exception&lt;/code&gt; and can be used anywhere in your program. In real life, however, no program or library should ever raise &lt;code&gt;Exception&lt;/code&gt; directly: it&apos;s not specific enough to be helpful.&lt;/p&gt;
&lt;p&gt;Since all exceptions are expected to be derived from the base class &lt;code&gt;Exception&lt;/code&gt;, this base class can easily be used as a catch-all:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    do_something()
except Exception:
    # This will catch any exception!
    print(&quot;Something terrible happened&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To define your own exceptions correctly, there are a few rules and best practices that you need to follow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Always inherit from (at least) &lt;code&gt;Exception&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;class MyOwnError(Exception):
    pass
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Leverage what we saw earlier about &lt;code&gt;BaseException.__str__&lt;/code&gt;: it prints the first argument passed to &lt;code&gt;BaseException.__init__&lt;/code&gt;, so always call &lt;code&gt;BaseException.__init__&lt;/code&gt; with &lt;strong&gt;only one argument&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When building a library, define a base class inheriting from &lt;code&gt;Exception&lt;/code&gt;. It will make it easier for consumers to catch any exception from the library:&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;class ShoeError(Exception):
    &quot;&quot;&quot;Basic exception for errors raised by shoes&quot;&quot;&quot;

class UntiedShoelace(ShoeError):
    &quot;&quot;&quot;You could fall&quot;&quot;&quot;

class WrongFoot(ShoeError):
    &quot;&quot;&quot;When you try to wear your left shoe on your right foot&quot;&quot;&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It then makes it easy to use &lt;code&gt;except ShoeError&lt;/code&gt; when doing anything with that piece of code related to shoes. For example, &lt;a href=&quot;https://docs.djangoproject.com/en/1.9/_modules/django/core/exceptions/&quot;&gt;Django does not do that&lt;/a&gt; for some of its exceptions, making it hard to catch &quot;any exception raised by Django&quot;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Provide details about the error. This is extremely valuable for logging errors correctly, or for taking further action and trying to recover:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;class CarError(Exception):
    &quot;&quot;&quot;Basic exception for errors raised by cars&quot;&quot;&quot;
    def __init__(self, car, msg=None):
        if msg is None:
            # Set some default useful error message
            msg = &quot;An error occurred with car %s&quot; % car
        super(CarError, self).__init__(msg)
        self.car = car

class CarCrashError(CarError):
    &quot;&quot;&quot;When you drive too fast&quot;&quot;&quot;
    def __init__(self, car, other_car, speed):
        super(CarCrashError, self).__init__(
            car, msg=&quot;Car crashed into %s at speed %d&quot; % (other_car, speed))
        self.speed = speed
        self.other_car = other_car
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, any code can inspect the exception to take further action:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    drive_car(car)
except CarCrashError as e:
    # If we crash at high speed, we call emergency
    if e.speed &amp;gt;= 30:
        call_911()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, this is leveraged in &lt;a href=&quot;http://gnocchi.xyz&quot;&gt;Gnocchi&lt;/a&gt; to raise specific application exceptions (&lt;code&gt;NoSuchArchivePolicy&lt;/code&gt;) on expected foreign key violations raised by SQL constraints:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    with self.facade.writer() as session:
        session.add(m)
except exception.DBReferenceError as e:
    if e.constraint == &apos;fk_metric_ap_name_ap_name&apos;:
        raise indexer.NoSuchArchivePolicy(archive_policy_name)
    raise
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Inherit from builtin exception types when it makes sense. This allows programs to handle your errors without being tied to your application or library:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;class CarError(Exception):
    &quot;&quot;&quot;Basic exception for errors raised by cars&quot;&quot;&quot;

class InvalidColor(CarError, ValueError):
    &quot;&quot;&quot;Raised when the color for a car is invalid&quot;&quot;&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That allows many programs to catch errors in a generic way without knowing about your own defined type. If a program already knows how to handle a &lt;code&gt;ValueError&lt;/code&gt;, it won&apos;t need any specific code nor modification.&lt;/p&gt;
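&lt;p&gt;As a quick illustration, assuming a hypothetical &lt;code&gt;paint_car()&lt;/code&gt; function that raises &lt;code&gt;InvalidColor&lt;/code&gt;, generic code can stay oblivious to the library-specific type:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    paint_car(car, &quot;octarine&quot;)
except ValueError:
    # InvalidColor is caught here too, since it also inherits
    # from ValueError
    print(&quot;That is not a valid color&quot;)
&lt;/code&gt;&lt;/pre&gt;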
&lt;h2&gt;Organization&lt;/h2&gt;
&lt;p&gt;Organizing code can be quite tricky and complicated. I cover more general rules in &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;, but here are a few rules concerning exceptions in particular.&lt;/p&gt;
&lt;p&gt;There is no limitation on where and when you can define exceptions. As they are, after all, normal classes, they can be defined in any module, function or class – even as closures.&lt;/p&gt;
&lt;p&gt;Most libraries package their exceptions into a specific exception module: &lt;a href=&quot;http://sqlalchemy.org&quot;&gt;SQLAlchemy&lt;/a&gt; has them in &lt;a href=&quot;http://docs.sqlalchemy.org/en/latest/core/exceptions.html&quot;&gt;&lt;code&gt;sqlalchemy.exc&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;http://docs.python-requests.org/&quot;&gt;requests&lt;/a&gt; has them in &lt;a href=&quot;http://docs.python-requests.org/en/master/_modules/requests/exceptions/&quot;&gt;&lt;code&gt;requests.exceptions&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;http://werkzeug.pocoo.org/&quot;&gt;Werkzeug&lt;/a&gt; has them in &lt;a href=&quot;http://werkzeug.pocoo.org/docs/0.11/exceptions/&quot;&gt;&lt;code&gt;werkzeug.exceptions&lt;/code&gt;&lt;/a&gt;, etc.&lt;/p&gt;
&lt;p&gt;That makes sense for libraries to export exceptions that way, as it makes it very easy for consumers to import their exception module and know where the exceptions are defined when writing code to handle errors.&lt;/p&gt;
&lt;p&gt;This is not mandatory, and smaller Python modules might want to keep their exceptions in their sole module. Typically, if your module is small enough to be kept in one file, don&apos;t bother splitting your exceptions into a different file/module.&lt;/p&gt;
&lt;p&gt;While this wisely applies to libraries, applications tend to be different beasts. Usually, they are composed of different subsystems, where each one might have its own set of exceptions. This is why I generally discourage having only one exception module in an application, and recommend splitting exceptions across the different parts of the program. There might be no need for a special &lt;code&gt;myapp.exceptions&lt;/code&gt; module.&lt;/p&gt;
&lt;p&gt;For example, if your application is composed of an HTTP REST API defined in the module &lt;code&gt;myapp.http&lt;/code&gt; and of a TCP server contained in &lt;code&gt;myapp.tcp&lt;/code&gt;, it&apos;s likely they can both define different exceptions tied to their own protocol errors and life cycle. Defining those exceptions in a &lt;code&gt;myapp.exceptions&lt;/code&gt; module would just scatter the code for the sake of some useless consistency. If the exceptions are local to a file, just define them somewhere at the top of that file. It will simplify the maintenance of the code.&lt;/p&gt;
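&lt;p&gt;To illustrate with the hypothetical application above, each subsystem would simply carry its own exceptions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# myapp/http.py
class HTTPError(Exception):
    &quot;&quot;&quot;Base class for errors raised by the HTTP REST API&quot;&quot;&quot;

# myapp/tcp.py
class ConnectionLost(Exception):
    &quot;&quot;&quot;Raised when a TCP client disconnects unexpectedly&quot;&quot;&quot;
&lt;/code&gt;&lt;/pre&gt;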
&lt;h2&gt;Wrapping exceptions&lt;/h2&gt;
&lt;p&gt;Wrapping exceptions is the practice by which one exception is encapsulated into another:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class MylibError(Exception):
    &quot;&quot;&quot;Generic exception for mylib&quot;&quot;&quot;
    def __init__(self, msg, original_exception):
        super(MylibError, self).__init__(msg + (&quot;: %s&quot; % original_exception))
        self.original_exception = original_exception

try:
    requests.get(&quot;http://example.com&quot;)
except requests.exceptions.ConnectionError as e:
    raise MylibError(&quot;Unable to connect&quot;, e)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This makes sense when writing a library which leverages other libraries. If a library uses &lt;code&gt;requests&lt;/code&gt; and does not encapsulate &lt;code&gt;requests&lt;/code&gt; exceptions into its own defined error classes, it commits a layering violation. Any application using your library might receive a &lt;code&gt;requests.exceptions.ConnectionError&lt;/code&gt;, which is a problem because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The application has no clue that the library was using &lt;code&gt;requests&lt;/code&gt; and does not need/want to know about it.&lt;/li&gt;
&lt;li&gt;The application will have to import &lt;code&gt;requests.exceptions&lt;/code&gt; itself and therefore will depend on &lt;code&gt;requests&lt;/code&gt; – even if it does not use it directly.&lt;/li&gt;
&lt;li&gt;As soon as &lt;code&gt;mylib&lt;/code&gt; changes from &lt;code&gt;requests&lt;/code&gt; to e.g. &lt;code&gt;httplib2&lt;/code&gt;, the application code catching &lt;code&gt;requests&lt;/code&gt; exceptions will become irrelevant.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/openstack/tooz&quot;&gt;Tooz&lt;/a&gt; library is a good example of wrapping, as it uses a driver-based approach and depends on a lot of different Python modules to talk to different backends (ZooKeeper, PostgreSQL, etcd…). Therefore, it wraps exceptions from other modules on every occasion into its own set of error classes. Python 3 introduced the &lt;code&gt;raise from&lt;/code&gt; form to help with that, and that&apos;s what Tooz leverages to raise its own errors.&lt;/p&gt;
&lt;p&gt;It&apos;s also possible to encapsulate the original exception into a custom defined exception, as done above. That makes the original exception easily available for inspection.&lt;/p&gt;
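&lt;p&gt;Here is a minimal sketch of the &lt;code&gt;raise from&lt;/code&gt; form, using a simplified variant of the example above. The original exception is automatically stored in the &lt;code&gt;__cause__&lt;/code&gt; attribute of the new one, so it does not need to be kept by hand, and both tracebacks are displayed:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import requests

class MylibError(Exception):
    &quot;&quot;&quot;Generic exception for mylib&quot;&quot;&quot;

try:
    requests.get(&quot;http://example.com&quot;)
except requests.exceptions.ConnectionError as e:
    # Python 3 only: e becomes the __cause__ of the MylibError
    raise MylibError(&quot;Unable to connect&quot;) from e
&lt;/code&gt;&lt;/pre&gt;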
&lt;h2&gt;Catching and logging&lt;/h2&gt;
&lt;p&gt;When designing exceptions, it&apos;s important to remember that they should be targeted both at humans and computers. That&apos;s why they should include an explicit message, and embed as much information as possible. That will help to debug and to write resilient programs that can pivot their behavior depending on the attributes of the exception, as seen above.&lt;/p&gt;
&lt;p&gt;Also, silencing exceptions completely is considered bad practice. You should not write code like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    do_something()
except Exception:
    # Whatever
    pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not having any kind of information when an exception occurs in a program is a nightmare to debug.&lt;/p&gt;
&lt;p&gt;If you use (and you should) the &lt;a href=&quot;https://docs.python.org/3/library/logging.html&quot;&gt;&lt;code&gt;logging&lt;/code&gt;&lt;/a&gt; library, you can use the &lt;code&gt;exc_info&lt;/code&gt; parameter to log a complete traceback when an exception occurs, which might help debugging severe and unrecoverable failures:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    do_something()
except Exception:
    logging.getLogger().error(&quot;Something bad happened&quot;, exc_info=True)
&lt;/code&gt;&lt;/pre&gt;
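&lt;p&gt;Since logging an error with its traceback is so common, the &lt;code&gt;logging&lt;/code&gt; library also provides the &lt;code&gt;exception()&lt;/code&gt; shortcut, which logs at the error level with &lt;code&gt;exc_info&lt;/code&gt; enabled:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
    do_something()
except Exception:
    # Equivalent to error(..., exc_info=True)
    logging.getLogger().exception(&quot;Something bad happened&quot;)
&lt;/code&gt;&lt;/pre&gt;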
&lt;p&gt;If you often forget how to set up the &lt;code&gt;logging&lt;/code&gt; library, you should check out &lt;a href=&quot;https://github.com/jd/daiquiri&quot;&gt;daiquiri&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Further reading&lt;/h2&gt;
&lt;p&gt;If you understood everything so far, congratulations, you might be ready to handle exceptions in Python! If you want a broader perspective on exceptions and what Python misses, I encourage you to read about &lt;a href=&quot;https://en.wikipedia.org/wiki/Exception_handling#Condition_systems&quot;&gt;condition systems&lt;/a&gt; and discover the generalization of exceptions – one that I hope we&apos;ll see in Python one day!&lt;/p&gt;
&lt;p&gt;I hope this will help you build better libraries and applications. Feel free to shoot any question in the comments section!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>The Hacker&apos;s Guide to Python 3rd edition is out</title><link>https://julien.danjou.info/blog/the-hacker-guide-to-python-third-edition/</link><guid isPermaLink="true">https://julien.danjou.info/blog/the-hacker-guide-to-python-third-edition/</guid><description>Exactly a year ago, I released the second edition of my book The Hacker&apos;s Guide to Python . One more time, it has been a wonderful release and I received a lot of amazing feedback from my readers all</description><pubDate>Wed, 04 May 2016 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Exactly a year ago, I &lt;a href=&quot;https://julien.danjou.info/blog/the-hacker-guide-to-python-second-edition&quot;&gt;released the second edition of my book The Hacker&apos;s Guide to Python&lt;/a&gt;. Once again, it has been a wonderful release, and I have received a lot of amazing feedback from my readers throughout the year.&lt;/p&gt;
&lt;p&gt;Since then, the book has been &lt;strong&gt;translated into 2 languages&lt;/strong&gt;: Korean and Chinese. A few thousand copies have been distributed there, and I&apos;m very glad the book has been such a success. I&apos;m looking into getting it translated into more languages – don&apos;t hesitate to get in touch with me if you have any interesting connections in your country.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/thgtp-korean.jpg&quot; alt=&quot;thgtp-korean&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For those who still don&apos;t know about this guide, which I first released a couple of years ago, let me sum up by saying it&apos;s &lt;strong&gt;the Python book that I always wanted to read&lt;/strong&gt;, never found, and finally wrote. It does not cover the basics of the language, but deals with concrete problems, best practices and some of the language&apos;s internals.&lt;/p&gt;
&lt;p&gt;It includes content about unit testing, methods, decorators, AST, distribution, documentation, functional programming, scaling, Python 3, etc. All of that made it pretty &lt;strong&gt;successful&lt;/strong&gt;! It comes with &lt;strong&gt;9 awesome interviews&lt;/strong&gt; that I conducted with some of my fellow experienced Python hackers and developers!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/thgtp-v3-photo-stack.jpg&quot; alt=&quot;thgtp-v3-photo-stack&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this &lt;strong&gt;3rd edition&lt;/strong&gt;, there are, as in each new edition, a few fixes on code, typos, etc. I guess books need a lot of time to become perfect! I also updated some of the content: things have evolved a bit since I last revised it a year ago. Finally, a new chapter about timestamp and timezone handling has made its appearance too.&lt;/p&gt;
&lt;p&gt;If you didn&apos;t get the book yet, it&apos;s time to go &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;check it out&lt;/a&gt; and use the coupon &lt;strong&gt;THGTP3LAUNCH&lt;/strong&gt; to get 20 % off during the next 48 hours!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-python-darken-v2-1.png&quot; alt=&quot;the-hacker-guide-to-python-darken-v2-1&quot; /&gt;&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>Pifpaf, or how to run any daemon briefly</title><link>https://julien.danjou.info/blog/pifpaf-a-tool-to-run-daemon-briefly/</link><guid isPermaLink="true">https://julien.danjou.info/blog/pifpaf-a-tool-to-run-daemon-briefly/</guid><description>There&apos;s a lot of situation where you end up needing a software deployed temporarily. This can happen when testing something manually, when running a script or when launching a test suite.  Indeed, man</description><pubDate>Fri, 08 Apr 2016 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;There are a lot of situations where you end up needing some software deployed temporarily. This can happen when testing something manually, when running a script, or when launching a test suite.&lt;/p&gt;
&lt;p&gt;Indeed, many applications need to use and interconnect with external software: an RDBMS (&lt;a href=&quot;http://postgressql.org&quot;&gt;PostgreSQL&lt;/a&gt;, &lt;a href=&quot;http://mysql.org&quot;&gt;MySQL&lt;/a&gt;…), a cache (&lt;a href=&quot;http://memcached.org&quot;&gt;memcached&lt;/a&gt;, &lt;a href=&quot;http://redis.io&quot;&gt;Redis&lt;/a&gt;…) or any other external component. This tends to make running the software (or its test suite) more difficult. If you want to rely on this component being installed and deployed, you end up needing a full environment set up and properly configured to run your tests, which is discouraging.&lt;/p&gt;
&lt;p&gt;The different &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; projects I work on pretty soon ended up spawning some of their back-ends temporarily to run their tests. Some of those unit tests somehow became entirely what you would call functional or integration tests. But that&apos;s just a name. In the end, what we ended up doing is testing that the software really works. And there&apos;s no better way of doing that than talking to a real PostgreSQL instance rather than mocking every call.&lt;/p&gt;
&lt;h2&gt;Pifpaf to the rescue&lt;/h2&gt;
&lt;p&gt;To solve that issue, I created a new tool, named &lt;em&gt;&lt;a href=&quot;https://github.com/jd/pifpaf&quot;&gt;Pifpaf&lt;/a&gt;&lt;/em&gt;. &lt;em&gt;Pifpaf&lt;/em&gt; makes it easy to run any daemon in test mode for a brief moment, before making it disappear completely. It&apos;s pretty easy to install, as &lt;a href=&quot;http://pypi.python.org/pypi/pifpaf&quot;&gt;it is available on PyPI&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pip install pifpaf
Collecting pifpaf
[…]
Installing collected packages: pifpaf
Successfully installed pifpaf-0.0.7
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can then use it to run any of the listed daemons:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pifpaf list
+---------------+
| Daemons       |
+---------------+
| redis         |
| postgresql    |
| mongodb       |
| zookeeper     |
| aodh          |
| influxdb      |
| ceph          |
| elasticsearch |
| etcd          |
| mysql         |
| memcached     |
| rabbitmq      |
| gnocchi       |
+---------------+
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Pifpaf&lt;/em&gt; accepts any shell command line to execute after its arguments:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pifpaf run postgresql -- psql
Expanded display is used automatically.
Line style is unicode.
SET
psql (9.5.2)
Type &quot;help&quot; for help.

template1=# \l
                              List of databases
   Name    │ Owner │ Encoding │   Collate   │    Ctype    │ Access privileges
───────────┼───────┼──────────┼─────────────┼─────────────┼───────────────────
 postgres  │ jd    │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │
 template0 │ jd    │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd            ↵
           │       │          │             │             │ jd=CTc/jd
 template1 │ jd    │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd            ↵
           │       │          │             │             │ jd=CTc/jd
(3 rows)

template1=# create database foobar;
CREATE DATABASE
template1=# \l
                              List of databases
   Name    │ Owner │ Encoding │   Collate   │    Ctype    │ Access privileges
───────────┼───────┼──────────┼─────────────┼─────────────┼───────────────────
 foobar    │ jd    │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │
 postgres  │ jd    │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │
 template0 │ jd    │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd            ↵
           │       │          │             │             │ jd=CTc/jd
 template1 │ jd    │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd            ↵
           │       │          │             │             │ jd=CTc/jd
(4 rows)

template1=# \q
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What &lt;em&gt;pifpaf&lt;/em&gt; does is run the different commands needed to create a new PostgreSQL cluster, and then run PostgreSQL on a temporary port for you. So your &lt;em&gt;psql&lt;/em&gt; session actually connects to a temporary PostgreSQL server, which is trashed as soon as you quit &lt;em&gt;psql&lt;/em&gt;. And all of that in less than 10 seconds, without the use of any virtualization or container technology!&lt;/p&gt;
&lt;p&gt;You can see what it does in detail using the &lt;em&gt;debug&lt;/em&gt; mode:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pifpaf --debug run mysql $SHELL
DEBUG: pifpaf.drivers: executing: [&apos;mysqld&apos;, &apos;--initialize-insecure&apos;, &apos;--datadir=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg&apos;]
DEBUG: pifpaf.drivers: executing: [&apos;mysqld&apos;, &apos;--datadir=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg&apos;, &apos;--pid-file=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg/mysql.pid&apos;, &apos;--socket=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg/mysql.socket&apos;, &apos;--skip-networking&apos;, &apos;--skip-grant-tables&apos;]
DEBUG: pifpaf.drivers: executing: [&apos;mysql&apos;, &apos;--no-defaults&apos;, &apos;-S&apos;, &apos;/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg/mysql.socket&apos;, &apos;-e&apos;, &apos;CREATE DATABASE test;&apos;]
[…]
$ exit
[…]
DEBUG: pifpaf.drivers: mysqld output: 2016-04-08T08:52:04.202143Z 0 [Note] InnoDB: Starting shutdown...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Pifpaf&lt;/em&gt; also supports my pet project &lt;a href=&quot;http://launchpad.net/gnocchi&quot;&gt;Gnocchi&lt;/a&gt;, so you can run and try that timeseries database in a snap:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pifpaf run gnocchi $SHELL
$ gnocchi metric create
+------------------------------------+-----------------------------------------------------------------------+
| Field                              | Value                                                                 |
+------------------------------------+-----------------------------------------------------------------------+
| archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, median, mean                        |
| archive_policy/back_window         | 0                                                                     |
| archive_policy/definition          | - points: 12, granularity: 0:05:00, timespan: 1:00:00                 |
|                                    | - points: 24, granularity: 1:00:00, timespan: 1 day, 0:00:00          |
|                                    | - points: 30, granularity: 1 day, 0:00:00, timespan: 30 days, 0:00:00 |
| archive_policy/name                | low                                                                   |
| created_by_project_id              | admin                                                                 |
| created_by_user_id                 | admin                                                                 |
| id                                 | ff825d33-c8c8-46d4-b696-4b1e8f84a871                                  |
| name                               | None                                                                  |
| resource/id                        | None                                                                  |
+------------------------------------+-----------------------------------------------------------------------+
$ exit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And it takes less than 10 seconds to launch Gnocchi on my laptop using &lt;em&gt;pifpaf&lt;/em&gt;. I&apos;m then able to play with the &lt;code&gt;gnocchi&lt;/code&gt; command line tool. It&apos;s by far faster than using OpenStack &lt;a href=&quot;http://devstack.org&quot;&gt;devstack&lt;/a&gt; to deploy the whole software stack.&lt;/p&gt;
&lt;h2&gt;Using &lt;em&gt;pifpaf&lt;/em&gt; with your test suite&lt;/h2&gt;
&lt;p&gt;We leverage &lt;em&gt;Pifpaf&lt;/em&gt; in several of our OpenStack telemetry related projects now, and even in &lt;a href=&quot;http://launchpad.net/tooz&quot;&gt;tooz&lt;/a&gt;. For example, to run unit/functional tests with a &lt;em&gt;memcached&lt;/em&gt; server available, a &lt;code&gt;tox.ini&lt;/code&gt; file should look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[testenv:py27-memcached]
commands = pifpaf run memcached -- python setup.py testr
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The tests can then use the environment variable &lt;code&gt;PIFPAF_MEMCACHED_PORT&lt;/code&gt; to connect to &lt;em&gt;memcached&lt;/em&gt; and run tests against it. As soon as the tests are finished, &lt;em&gt;memcached&lt;/em&gt; is killed by &lt;em&gt;pifpaf&lt;/em&gt; and the temporary data is trashed.&lt;/p&gt;
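&lt;p&gt;A test consuming that variable can be as simple as this sketch (assuming the temporary &lt;em&gt;memcached&lt;/em&gt; listens on localhost):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import os
import socket
import unittest

class TestMemcached(unittest.TestCase):
    def test_connection(self):
        # pifpaf exports the port of the memcached it spawned
        port = int(os.environ[&quot;PIFPAF_MEMCACHED_PORT&quot;])
        with socket.create_connection((&quot;localhost&quot;, port)) as sock:
            sock.sendall(b&quot;version\r\n&quot;)
            self.assertTrue(sock.recv(1024).startswith(b&quot;VERSION&quot;))
&lt;/code&gt;&lt;/pre&gt;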
&lt;p&gt;We already moved a few OpenStack projects to using &lt;em&gt;Pifpaf&lt;/em&gt;, and I&apos;m planning to make use of it in a few more. My fellow developer &lt;a href=&quot;http://sileht.net&quot;&gt;Mehdi Abaakouk&lt;/a&gt; added support for &lt;a href=&quot;http://rabbitmq.com&quot;&gt;RabbitMQ&lt;/a&gt; in &lt;em&gt;Pifpaf&lt;/em&gt; and &lt;a href=&quot;https://review.openstack.org/#/c/301771&quot;&gt;added support for more advanced tests&lt;/a&gt; in &lt;a href=&quot;http://launchpad.net/oslo.messaging&quot;&gt;oslo.messaging&lt;/a&gt; (such as failure scenarios) using &lt;em&gt;Pifpaf&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Pifpaf&lt;/em&gt; is a very small and handy tool. Give it a try and let me know how it works for you!&lt;/p&gt;
</content:encoded><category>python</category><category>openstack</category></item><item><title>Timeseries storage and data compression</title><link>https://julien.danjou.info/blog/gnocchi-carbonara-timeseries-compression/</link><guid isPermaLink="true">https://julien.danjou.info/blog/gnocchi-carbonara-timeseries-compression/</guid><description>The first major version of the scalable timeserie database I work on, Gnocchi was a released a few months ago. In this first iteration, it took a rather naive approach to data storage.</description><pubDate>Mon, 15 Feb 2016 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The first major version of the scalable timeseries database I work on, &lt;a href=&quot;http://gnocchi.xyz&quot;&gt;Gnocchi&lt;/a&gt;, was released a few months ago. In this first iteration, it took a rather naive approach to data storage. We had little idea about whether and how our distributed back-ends were going to be heavily used, so we stuck to the code of the first proof-of-concept written a couple of years ago.&lt;/p&gt;
&lt;p&gt;Recently, we got more feedback from our users and ran a few &lt;a href=&quot;https://julien.danjou.info/blog/gnocchi-benchmarks&quot;&gt;benchmarks&lt;/a&gt;. That gave us enough insight to start improving our storage strategy.&lt;/p&gt;
&lt;h2&gt;Data split&lt;/h2&gt;
&lt;p&gt;Up to Gnocchi 1.3, all the data for a single metric is stored in a single gigantic file per aggregation method (&lt;em&gt;min&lt;/em&gt;, &lt;em&gt;max&lt;/em&gt;, &lt;em&gt;average&lt;/em&gt;…). This means that the file can grow to several megabytes in size, which makes it slow to manipulate. For the next version of Gnocchi, our first task has been to rework that storage and split the data into smaller parts.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/gnocchi-carbonara-split.png&quot; alt=&quot;gnocchi-carbonara-split&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The diagram above shows how data is organized inside Gnocchi. Until version 1.3, there would have been only one file for each aggregation method.&lt;/p&gt;
&lt;p&gt;In the upcoming 2.0 version, Gnocchi will split all this data into smaller parts, where each data split is stored in a file/object. This makes it possible to manipulate smaller pieces of data and to increase the parallelism of the CRUD operations on the back-end – leading to large speed improvements.&lt;/p&gt;
&lt;p&gt;In order to split timeseries into several chunks, Gnocchi defines a maximum number of N points to keep per chunk, limiting their maximum size. It then defines a hash function that produces a non-unique key for any timestamp, which makes it easy to find in which chunk any timestamp should be stored or retrieved.&lt;/p&gt;
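&lt;p&gt;Conceptually, such a function only has to round a timestamp down to the window covering N points. Here is a minimal sketch of the idea (not Gnocchi&apos;s actual code), working on UNIX timestamps in seconds:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def chunk_key(timestamp, points_per_chunk, granularity):
    # All points falling into the same window of
    # points_per_chunk * granularity seconds share the same
    # (non-unique) key, and therefore land in the same chunk.
    chunk_span = points_per_chunk * granularity
    return int(timestamp // chunk_span) * chunk_span

chunk_key(41237, points_per_chunk=3600, granularity=5)  # =&amp;gt; 36000
&lt;/code&gt;&lt;/pre&gt;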
&lt;h2&gt;Data compression&lt;/h2&gt;
&lt;p&gt;Up to Gnocchi 1.3, the data stored for each metric is simply serialized using &lt;a href=&quot;http://msgpack.org&quot;&gt;msgpack&lt;/a&gt;, a fast and small serialization format. However, this format does not provide any compression. That means that storing data points needs 8 bytes for a timestamp (64 bits timestamp with nanosecond precision) and 8 bytes for a value (64 bits double-precision floating-point), plus some overhead (extra information and &lt;em&gt;msgpack&lt;/em&gt; itself).&lt;/p&gt;
&lt;p&gt;After looking around for ways to compress all these measures, I stumbled upon a paper from some &lt;a href=&quot;http://facebook.com&quot;&gt;Facebook&lt;/a&gt; engineers about Gorilla, their in-memory timeseries database, entitled &quot;&lt;em&gt;&lt;a href=&quot;http://www.vldb.org/pvldb/vol8/p1816-teller.pdf&quot;&gt;Gorilla: A Fast, Scalable, In-Memory Time Series Database&lt;/a&gt;&lt;/em&gt;&quot;. For reference, part of this encoding is also used by &lt;a href=&quot;https://docs.influxdata.com/influxdb/v0.9/concepts/storage_engine/&quot;&gt;InfluxDB&lt;/a&gt; in its new storage engine.&lt;/p&gt;
&lt;p&gt;The first technique I implemented is easy enough, and it&apos;s inspired by delta-of-delta encoding. Instead of storing each timestamp for each data point, and since all the data points are aggregated on a regular interval, we transpose points to be the time difference divided by the interval. For example, the series of timestamps &lt;code&gt;timestamps = [41230, 41235, 41240, 41250, 41255]&lt;/code&gt; is encoded into &lt;code&gt;timestamps = [41230, 1, 1, 2, 1], interval = 5&lt;/code&gt;. This allows regular compression algorithms to reduce the size of the integer list using &lt;a href=&quot;https://en.wikipedia.org/wiki/Run-length_encoding&quot;&gt;run-length encoding&lt;/a&gt;.&lt;/p&gt;
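&lt;p&gt;A minimal sketch of that transposition (not the exact Gnocchi code) reproduces the example above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def encode_timestamps(timestamps, interval):
    # Keep the first timestamp as-is, then only store the number
    # of intervals elapsed between consecutive points.
    deltas = [(b - a) // interval
              for a, b in zip(timestamps, timestamps[1:])]
    return [timestamps[0]] + deltas

encode_timestamps([41230, 41235, 41240, 41250, 41255], 5)
# =&amp;gt; [41230, 1, 1, 2, 1]
&lt;/code&gt;&lt;/pre&gt;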
&lt;p&gt;To actually compress the values, I tried two different algorithms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)&quot;&gt;LZ4&lt;/a&gt;, a fast compression/decompression algorithm&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The XOR based compression scheme described in the Gorilla paper mentioned above – which &lt;a href=&quot;https://gist.github.com/jd/b0aa5cbfa42f4eb23eb9&quot;&gt;I had to implement myself&lt;/a&gt;. For reference, there is also a &lt;a href=&quot;http://golang.org&quot;&gt;Go&lt;/a&gt; implementation in &lt;a href=&quot;https://github.com/dgryski/go-tsz&quot;&gt;go-tsz&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I then benchmarked these solutions:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/gnocchi-carbonara-compression-speed.png&quot; alt=&quot;gnocchi-carbonara-compression-speed&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The XOR algorithm implemented in Python is pretty slow compared to LZ4. The truth is that &lt;a href=&quot;https://github.com/steeve/python-lz4&quot;&gt;python-lz4&lt;/a&gt; is fully implemented in C, which makes it fast. I profiled my XOR implementation in Python and discovered that one operation took 20 % of the time: &lt;code&gt;count_lead_and_trail_zeroes&lt;/code&gt;, which is in charge of counting the number of leading and trailing zeroes in a binary number.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/gnocchi-carbonara-xor-profiling.png&quot; alt=&quot;gnocchi-carbonara-xor-profiling&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I tried 2 Python implementations of the same algorithm (and submitted them to my friend and Python developer &lt;a href=&quot;http://haypo-notes.readthedocs.org/&quot;&gt;Victor Stinner&lt;/a&gt;, by the way).&lt;/p&gt;
&lt;p&gt;The first version, using string search with &lt;code&gt;.index()&lt;/code&gt;, is 10× faster than the second one, which only does integer computation. Ah, Python… As Victor explained, each Python operation is slow and there are a lot of them in the second version, whereas &lt;code&gt;.index()&lt;/code&gt; is implemented in C, is really well optimized, and only needs 2 Python operations.&lt;/p&gt;
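&lt;p&gt;To give an idea of the two approaches, here is a sketch (not the exact code I submitted) of counting the leading zeroes of a 64-bit value either through string search or through pure integer operations:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def count_leading_zeroes_str(value, width=64):
    # String version: format() and .index() are implemented in C,
    # so very few Python-level operations are needed.
    try:
        return format(value, &quot;0%db&quot; % width).index(&quot;1&quot;)
    except ValueError:
        return width  # value is 0

def count_leading_zeroes_int(value, width=64):
    # Integer version: each shift, test and increment is a
    # separate (slow) Python-level operation.
    count = 0
    while count &amp;lt; width and not value &amp;gt;&amp;gt; (width - 1 - count):
        count += 1
    return count
&lt;/code&gt;&lt;/pre&gt;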
&lt;p&gt;Finally, I ended up optimizing that code by leveraging &lt;a href=&quot;https://cffi.readthedocs.org/en/latest/&quot;&gt;cffi&lt;/a&gt; to use &lt;code&gt;ffsll()&lt;/code&gt; and &lt;code&gt;flsll()&lt;/code&gt; directly. That decreased the run-time of &lt;code&gt;count_lead_and_trail_zeroes&lt;/code&gt; by 45 %, increasing the speed of the entire XOR compression code by a small 7 %. This is not enough to catch up with LZ4&apos;s speed. At this stage, the only solution to achieve high speed would probably be to go with a full C implementation.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/gnocchi-carbonara-compression-size.png&quot; alt=&quot;gnocchi-carbonara-compression-size&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Considering the compression ratio of the different algorithms, they are pretty much identical. The worst case scenario (random values) for LZ4 compresses down to 9 bytes per data point, whereas XOR can go down to 7.38 bytes per data point. In general, XOR encoding beats LZ4 by 15 %, except for cases where all values are 0 or 1. However, LZ4 is faster than XOR by a factor of 4×-70× depending on the case.&lt;/p&gt;
&lt;p&gt;That means that we&apos;ll use LZ4 for data compression in Gnocchi 2.0. It&apos;s possible that we could achieve an equally fast compression/decompression implementation, but I don&apos;t think it&apos;s worth the effort right now – it&apos;d represent a lot of code to write and to maintain.&lt;/p&gt;
</content:encoded><category>gnocchi</category><category>python</category></item><item><title>Profiling Python using cProfile: a concrete case</title><link>https://julien.danjou.info/blog/guide-to-python-profiling-cprofile-concrete-case-carbonara/</link><guid isPermaLink="true">https://julien.danjou.info/blog/guide-to-python-profiling-cprofile-concrete-case-carbonara/</guid><description>Writing programs is fun, but making them fast can be a pain. Python programs are no exception to that, but the basic profiling toolchain is actually not that complicated to use. Here, I would like to</description><pubDate>Mon, 16 Nov 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Writing programs is fun, but making them fast can be a pain. Python programs are no exception to that, but the basic profiling toolchain is actually not that complicated to use. Here, I would like to show you how you can quickly profile and analyze your Python code to find what part of the code you should optimize.&lt;/p&gt;
&lt;h2&gt;What&apos;s profiling?&lt;/h2&gt;
&lt;p&gt;Profiling a Python program means doing a dynamic analysis that measures the execution time of the program and of everything that composes it. That means measuring the time spent in each of its functions. This will give you data about where your program is spending time, and what area might be worth optimizing.&lt;/p&gt;
&lt;p&gt;It&apos;s a very interesting exercise. Many people focus on local optimizations, such as determining e.g. which of the Python functions &lt;code&gt;range&lt;/code&gt; or &lt;code&gt;xrange&lt;/code&gt; is going to be faster. It turns out that knowing which one is faster may never be an issue in your program, and that the time gained by one of the functions above might not be worth the time you spend researching that, or arguing about it with your colleague.&lt;/p&gt;
&lt;p&gt;Trying to blindly optimize a program without measuring where it is actually spending its time is a useless exercise. Following your gut alone is not always sufficient.&lt;/p&gt;
&lt;p&gt;There are many types of profiling, as there are many things you can measure. In this exercise, we&apos;ll focus on CPU utilization profiling, meaning the time spent by each function executing instructions. Obviously, we could do many more kind of profiling and optimizations, such as memory profiling which would measure the memory used by each piece of code – something I talk about in &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;cProfile&lt;/h2&gt;
&lt;p&gt;Since Python 2.5, Python provides a C module called &lt;em&gt;&lt;a href=&quot;https://docs.python.org/2/library/profile.html&quot;&gt;cProfile&lt;/a&gt;&lt;/em&gt;, which has a reasonable overhead and offers a good enough feature set. The basic usage comes down to:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import cProfile
&amp;gt;&amp;gt;&amp;gt; cProfile.run(&apos;2 + 2&apos;)
         2 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 &amp;lt;string&amp;gt;:1(&amp;lt;module&amp;gt;)
        1    0.000    0.000    0.000    0.000 {method &apos;disable&apos; of &apos;_lsprof.Profiler&apos; objects}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Though you can also run a script with it, which turns out to be handy:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ python -m cProfile -s cumtime lwn2pocket.py
         72270 function calls (70640 primitive calls) in 4.481 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.004    0.004    4.481    4.481 lwn2pocket.py:2(&amp;lt;module&amp;gt;)
        1    0.001    0.001    4.296    4.296 lwn2pocket.py:51(main)
        3    0.000    0.000    4.286    1.429 api.py:17(request)
        3    0.000    0.000    4.268    1.423 sessions.py:386(request)
      4/3    0.000    0.000    3.816    1.272 sessions.py:539(send)
        4    0.000    0.000    2.965    0.741 adapters.py:323(send)
        4    0.000    0.000    2.962    0.740 connectionpool.py:421(urlopen)
        4    0.000    0.000    2.961    0.740 connectionpool.py:317(_make_request)
        2    0.000    0.000    2.675    1.338 api.py:98(post)
       30    0.000    0.000    1.621    0.054 ssl.py:727(recv)
       30    0.000    0.000    1.621    0.054 ssl.py:610(read)
       30    1.621    0.054    1.621    0.054 {method &apos;read&apos; of &apos;_ssl._SSLSocket&apos; objects}
        1    0.000    0.000    1.611    1.611 api.py:58(get)
        4    0.000    0.000    1.572    0.393 httplib.py:1095(getresponse)
        4    0.000    0.000    1.572    0.393 httplib.py:446(begin)
       60    0.000    0.000    1.571    0.026 socket.py:410(readline)
        4    0.000    0.000    1.571    0.393 httplib.py:407(_read_status)
        1    0.000    0.000    1.462    1.462 pocket.py:44(wrapped)
        1    0.000    0.000    1.462    1.462 pocket.py:152(make_request)
        1    0.000    0.000    1.462    1.462 pocket.py:139(_make_request)
        1    0.000    0.000    1.459    1.459 pocket.py:134(_post_request)
[…]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This prints out all the functions called, with the time spent in each and the number of times they have been called.&lt;/p&gt;
&lt;h3&gt;Advanced visualization with KCacheGrind&lt;/h3&gt;
&lt;p&gt;While useful, the output format is very basic and does not make it easy to grasp what is going on in complete programs. For more advanced visualization, I leverage &lt;a href=&quot;https://kcachegrind.github.io/html/Home.html&quot;&gt;KCacheGrind&lt;/a&gt;. If you did any C programming and profiling these last years, you may have used it, as it is primarily designed as a front-end for &lt;a href=&quot;http://valgrind.org/&quot;&gt;Valgrind&lt;/a&gt;-generated call-graphs.&lt;/p&gt;
&lt;p&gt;In order to use it, you need to generate a &lt;em&gt;cProfile&lt;/em&gt; result file, then convert it to the KCacheGrind format. To do that, I use &lt;em&gt;&lt;a href=&quot;https://pypi.python.org/pypi/pyprof2calltree&quot;&gt;pyprof2calltree&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ python -m cProfile -o myscript.cprof myscript.py
$ pyprof2calltree -k -i myscript.cprof
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And the KCacheGrind window magically appears!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/kcachegrind.png&quot; alt=&quot;kcachegrind&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Concrete case: Carbonara optimization&lt;/h2&gt;
&lt;p&gt;I was curious about the performance of &lt;a href=&quot;https://git.openstack.org/cgit/openstack/gnocchi/tree/gnocchi/carbonara.py&quot;&gt;Carbonara&lt;/a&gt;, the small timeseries library I wrote for &lt;a href=&quot;http://launchpad.net/gnocchi&quot;&gt;Gnocchi&lt;/a&gt;. I decided to do some basic profiling to see if there was any obvious optimization to do.&lt;/p&gt;
&lt;p&gt;In order to profile a program, you need to run it. But running the whole program in profiling mode can generate &lt;em&gt;a lot&lt;/em&gt; of data that you don&apos;t care about, and adds noise to what you&apos;re trying to understand. Since Gnocchi has thousands of unit tests and a few for Carbonara itself, I decided to profile the code used by these unit tests, as it&apos;s a good reflection of the basic features of the library.&lt;/p&gt;
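&lt;p&gt;Wrapping a test run in a profiler only takes a few lines; a sketch (assuming a hypothetical &lt;code&gt;run_tests()&lt;/code&gt; entry point) looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import cProfile

profiler = cProfile.Profile()
profiler.enable()
run_tests()  # hypothetical entry point running the unit tests
profiler.disable()
# Dump the stats in a file that pyprof2calltree can convert
profiler.dump_stats(&quot;carbonara.cprof&quot;)
&lt;/code&gt;&lt;/pre&gt;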
&lt;p&gt;Note that this is a good strategy for a curious and naive first-pass profiling. There&apos;s no way to make sure that the hotspots you will see in the unit tests are the actual hotspots you will encounter in production. Therefore, profiling in conditions and with a scenario that mimics what&apos;s seen in production is often a necessity if you need to push your program optimization further and want to achieve perceivable and valuable gains.&lt;/p&gt;
&lt;p&gt;I activated &lt;em&gt;cProfile&lt;/em&gt; using the method described above, creating a &lt;code&gt;cProfile.Profile&lt;/code&gt; object around my tests (I actually &lt;a href=&quot;https://github.com/testing-cabal/testtools/pull/163&quot;&gt;started to implement that in testtools&lt;/a&gt;). I then ran &lt;em&gt;KCacheGrind&lt;/em&gt; as described above and used it to generate the following figures.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/kcachegrind-carbonara-old-list.png&quot; alt=&quot;kcachegrind-carbonara-old-list&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The test I profiled here is called &lt;code&gt;test_fetch&lt;/code&gt; and is pretty easy to understand: it puts data in a timeseries object, and then fetches the aggregated result. The above list shows that 88 % of the ticks are spent in &lt;code&gt;set_values&lt;/code&gt; (44 ticks out of 50). This function is used to insert values into the timeseries, not to fetch the values. That means that it&apos;s really slow to insert data, and pretty fast to actually retrieve it.&lt;/p&gt;
&lt;p&gt;Reading the rest of the list indicates that several functions share the rest of the ticks: &lt;code&gt;update&lt;/code&gt;, &lt;code&gt;_first_block_timestamp&lt;/code&gt;, &lt;code&gt;_truncate&lt;/code&gt;, &lt;code&gt;_resample&lt;/code&gt;, etc. Some of the functions in the list are not part of Carbonara, so there&apos;s no point in trying to optimize them. The only thing that can sometimes be optimized is the number of times they&apos;re called.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/kcachegrind-carbonara-old-graph.png&quot; alt=&quot;kcachegrind-carbonara-old-graph&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The call graph gives me a bit more insight about what&apos;s going on here. Using my knowledge about how Carbonara works, I don&apos;t think that the whole stack on the left for &lt;code&gt;_first_block_timestamp&lt;/code&gt; makes much sense. This function is supposed to find the first timestamp for an aggregate, e.g. with a timestamp of 13:34:45 and a period of 5 minutes, the function should return 13:30:00. The way it currently works is by calling the &lt;code&gt;resample&lt;/code&gt; function from Pandas on a timeseries with only one element, but that seems to be very slow. Indeed, this function currently represents 25 % of the time spent by &lt;code&gt;set_values&lt;/code&gt; (11 ticks out of 44).&lt;/p&gt;
&lt;p&gt;Fortunately, I recently added a small function called &lt;code&gt;_round_timestamp&lt;/code&gt; that does exactly what &lt;code&gt;_first_block_timestamp&lt;/code&gt; needs, without calling any Pandas function, so no &lt;code&gt;resample&lt;/code&gt;. So I ended up rewriting that function this way:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;     def _first_block_timestamp(self):
-        ts = self.ts[-1:].resample(self.block_size)
-        return (ts.index[-1] - (self.block_size * self.back_window))
+        rounded = self._round_timestamp(self.ts.index[-1], self.block_size)
+        return rounded - (self.block_size * self.back_window)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then I re-ran the exact same test to compare the output of &lt;em&gt;cProfile&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/kcachegrind-carbonara-new-list.png&quot; alt=&quot;kcachegrind-carbonara-new-list&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The list of functions looks quite different this time. The share of time spent in &lt;code&gt;set_values&lt;/code&gt; dropped from 88 % to 71 %.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/kcachegrind-carbonara-new-graph.png&quot; alt=&quot;kcachegrind-carbonara-new-graph&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The call stack for &lt;code&gt;set_values&lt;/code&gt; shows that pretty well: we can&apos;t even see the &lt;code&gt;_first_block_timestamp&lt;/code&gt; function, as it is so fast that it has totally disappeared from the display. It&apos;s now considered insignificant by the profiler.&lt;/p&gt;
&lt;p&gt;So we just sped up the whole value insertion process in Carbonara by a nice 25 % in a few minutes. Not that bad for a first naive pass, right?&lt;/p&gt;
&lt;p&gt;If you want to know more, I wrote a whole chapter about optimizing code in &lt;a href=&quot;https://scaling-python.com&quot;&gt;Scaling Python&lt;/a&gt;. Check it out!&lt;/p&gt;
</content:encoded><category>python</category><category>gnocchi</category></item><item><title>My interview in le Journal du Hacker</title><link>https://julien.danjou.info/blog/interview-journal-du-hacker/</link><guid isPermaLink="true">https://julien.danjou.info/blog/interview-journal-du-hacker/</guid><description>Le Journal du Hacker interviewed me about my work on OpenStack, my job at Red Hat, and my self-published book The Hacker&apos;s Guide to Python.</description><pubDate>Thu, 17 Sep 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few days ago, the French equivalent of &lt;a href=&quot;https://news.ycombinator.com/&quot;&gt;Hacker News&lt;/a&gt;, called &quot;&lt;a href=&quot;https://www.journalduhacker.net/&quot;&gt;Le Journal du Hacker&lt;/a&gt;&quot;, &lt;a href=&quot;https://www.journalduhacker.net/s/l5qktw/journal_du_hacker_entretien_avec_julien_danjou_d_veloppeur_openstack&quot;&gt;interviewed me&lt;/a&gt; about my work on &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt;, my job at &lt;a href=&quot;http://redhat.com&quot;&gt;Red Hat&lt;/a&gt; and my self-published book &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;. I&apos;ve spent some time translating it into English so you can read it if you don&apos;t understand French! I hope you&apos;ll enjoy it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hi Julien, and thanks for participating in this interview for the Journal du Hacker. For our readers who don&apos;t know you, can you introduce yourself briefly?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You&apos;re welcome! My name is Julien, I&apos;m 31 years old, and I live in Paris. I have now been developing free software for around fifteen years. I had the pleasure to work (among other things) on &lt;a href=&quot;http://debian.org&quot;&gt;Debian&lt;/a&gt;, &lt;a href=&quot;https://www.gnu.org/software/emacs/&quot;&gt;Emacs&lt;/a&gt; and &lt;a href=&quot;http://awesome.naquadah.org&quot;&gt;awesome&lt;/a&gt; these last years, and more recently on OpenStack. For a few months now, I have been working at Red Hat as a Principal Software Engineer on &lt;a href=&quot;http://opensack.org&quot;&gt;OpenStack&lt;/a&gt;. I am in charge of doing upstream development for that cloud-computing platform, mainly around the Ceilometer, Aodh and Gnocchi projects.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Being a system architect myself, I have been following your work in &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; for a while. It&apos;s uncommon to have the point of view of someone as involved as you are. Can you give us a summary of the state of the project, and then detail your activities in this project?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; project has grown and changed a lot since I started 4 years ago. It started as a few projects providing the basics, like &lt;a href=&quot;https://launchpad.net/nova&quot;&gt;Nova&lt;/a&gt; (compute), &lt;a href=&quot;https://launchpad.net/swift&quot;&gt;Swift&lt;/a&gt; (object storage), &lt;a href=&quot;https://launchpad.net/cinder&quot;&gt;Cinder&lt;/a&gt; (volume), &lt;a href=&quot;https://launchpad.net/keystone&quot;&gt;Keystone&lt;/a&gt; (identity) or &lt;a href=&quot;https://launchpad.net/neutron&quot;&gt;Neutron&lt;/a&gt; (network), which are the basis for a cloud-computing platform, and it finally became composed of a lot more projects.&lt;/p&gt;
&lt;p&gt;For a while, the inclusion of projects was the subject of a strict review from the technical committee. But in the last few months, the rules have been relaxed, and we see a lot more projects connected to cloud-computing &lt;a href=&quot;http://governance.openstack.org/reference/projects/&quot;&gt;joining us&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As far as I&apos;m concerned, I started the &lt;a href=&quot;http://governance.openstack.org/reference/projects/ceilometer.html&quot;&gt;Ceilometer&lt;/a&gt; project in 2012 with a few other people; it is devoted to handling metrics of OpenStack platforms. Our goal is to be able to collect all the metrics and record them for later analysis. We also have a module providing the ability to trigger actions on threshold crossing (alarms).&lt;/p&gt;
&lt;p&gt;The project grew in a monolithic way, with a linear growth in the number of contributors, during the first two years. I was the PTL (Project Technical Leader) for a year. This leadership position demands a lot of time for bureaucratic things and people management, so I decided to leave the spot in order to spend more time solving the technical challenges that Ceilometer offered.&lt;/p&gt;
&lt;p&gt;I started the &lt;a href=&quot;https://launchpad.net/gnocchi&quot;&gt;Gnocchi&lt;/a&gt; project in 2014. The first stable version (1.0.0) was released a few months ago. It&apos;s a timeseries database offering a REST API and a strong ability to scale. It was a necessary development to solve the problems tied to the large amount of metrics created by a cloud-computing platform, where tens of thousands of virtual machines have to be metered as often as possible. This project works as a standalone deployment or with the rest of OpenStack.&lt;/p&gt;
&lt;p&gt;More recently, I started &lt;a href=&quot;https://launchpad.net/aodh&quot;&gt;Aodh&lt;/a&gt;, the result of moving the code and features of Ceilometer related to threshold action triggering (alarming) out into their own project. That&apos;s the logical follow-up to what we started with Gnocchi. It means Ceilometer is being split into independent modules that can work together – with or without OpenStack. It seems to me that the features provided by Ceilometer, Aodh and Gnocchi can also be interesting for operators running more classical infrastructures. That&apos;s why I&apos;ve pushed the projects in that direction, and also toward a more service-oriented architecture (&lt;a href=&quot;https://fr.wikipedia.org/wiki/Architecture_orient%C3%A9e_services&quot;&gt;SOA&lt;/a&gt;).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I&apos;d like to stop for a moment on Ceilometer. I think this solution was much anticipated, especially by the cloud-computing providers using OpenStack to bill the resources sold to their customers. I remember reading a blog post where you were talking about the high-speed construction of this brick, and features that were not supposed to be there. Nowadays, with Gnocchi and Aodh, what is the quality of the Ceilometer brick and of the programs it relies on?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Indeed, one of the first use-cases for Ceilometer was tied to the ability to get metrics to feed a billing tool. That goal has now been reached, since we have billing tools for OpenStack using Ceilometer, such as &lt;a href=&quot;https://wiki.openstack.org/wiki/CloudKitty&quot;&gt;CloudKitty&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, other use-cases appeared rapidly, such as the ability to trigger alarms. This feature was necessary, for example, to implement the auto scaling feature that &lt;a href=&quot;http://launchpad.net/heat&quot;&gt;Heat&lt;/a&gt; needed. At the time, for technical and political reasons, it was not possible to implement this feature in a new project, and the functionality ended up in Ceilometer, since it was using the metrics collected and stored by Ceilometer itself.&lt;/p&gt;
&lt;p&gt;Though, like I said, this feature is now in its own project, Aodh. The alarm feature has been used in production for a few cycles now, and the Aodh project brings new features to the table. It makes it possible to trigger threshold actions and is one of the few solutions able to work at high scale, with several thousands of alarms.&lt;br /&gt;
It&apos;s impossible to make Nagios run against millions of instances to fetch metrics and trigger alarms. Ceilometer and Aodh can do that easily on a few tens of nodes, automatically.&lt;/p&gt;
&lt;p&gt;On the other hand, Ceilometer was for a long time painted as slow and complicated to use, because its metrics storage system used &lt;a href=&quot;https://www.mongodb.org/&quot;&gt;MongoDB&lt;/a&gt; by default. Clearly, the data structure model picked was not optimal for what users were doing with the data.&lt;/p&gt;
&lt;p&gt;That&apos;s why I started Gnocchi last year, which is perfectly designed for this use case. It allows constant access time to metrics (O(1) complexity) and fast access to the resource data via an index.&lt;/p&gt;
&lt;p&gt;Today, with three projects – Ceilometer, Aodh and Gnocchi – each having its own well-defined perimeter of features and able to work together, the biggest problems and defects of the initial project have finally been erased.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To end with OpenStack, one last question. You&apos;ve been a &lt;a href=&quot;http://www.python.org/&quot;&gt;Python&lt;/a&gt; developer for a long time and a fervent user of software testing and &lt;a href=&quot;https://en.wikipedia.org/wiki/Test_driven_development&quot;&gt;test-driven development&lt;/a&gt;. Several of your blog posts point out how important their usage is. Can you tell us more about the usage of tests in OpenStack, and the test prerequisites to contribute to OpenStack?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don&apos;t know any project that is as tested on every layer as OpenStack is. At the start of the project, test coverage was vague, made of a few unit tests. For each release, a bunch of new features were provided, and you had to keep your fingers crossed to have them working. That&apos;s already almost unacceptable. But the big issue was that there were also a lot of regressions, and things that were working stopped working. It was often corner cases that developers forgot about that broke.&lt;/p&gt;
&lt;p&gt;Then the project decided to change its policy and started to refuse all patches – new features or bug fixes – that did not implement a minimal set of unit tests proving the patch would work. Quickly, regressions were history, and the number of bugs dropped significantly month after month.&lt;/p&gt;
&lt;p&gt;Then came the functional tests, with the &lt;a href=&quot;http://launchpad.net/tempest&quot;&gt;Tempest&lt;/a&gt; project, which runs a test battery on a complete OpenStack deployment.&lt;/p&gt;
&lt;p&gt;OpenStack now possesses a &lt;a href=&quot;http://status.openstack.org/zuul/&quot;&gt;complete test infrastructure&lt;/a&gt;, with operators hired full-time to maintain it. The developers have to write the tests, and the operators maintain an architecture based on Gerrit, Zuul, and Jenkins, which runs the test battery of each project for each patch sent.&lt;/p&gt;
&lt;p&gt;Indeed, for each version of a patch sent, a full OpenStack is deployed into a virtual machine, and a battery of thousands of unit and functional tests is run to check that no regression is introduced.&lt;/p&gt;
&lt;p&gt;To contribute to OpenStack, you need to know how to write a unit test – the policy on functional tests is laxer. The tools used are standard Python tools: unittest for the framework, and &lt;a href=&quot;https://pypi.python.org/pypi/tox&quot;&gt;tox&lt;/a&gt; to create a virtual environment (venv) and run the tests.&lt;/p&gt;
&lt;p&gt;It&apos;s also possible to use &lt;a href=&quot;http://docs.openstack.org/developer/devstack/&quot;&gt;DevStack&lt;/a&gt; to deploy an OpenStack platform on a virtual machine and run the functional tests. However, since the project infrastructure also does that when a patch is submitted, it&apos;s not mandatory to do it yourself locally.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The tools and tests you write for OpenStack are written in Python, a language which is very popular today. You seem to like it more than most, since you wrote a book about it, &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;, that I really enjoyed. Can you explain what brought you to Python, the main strong points you attribute to this language (quickly), and how you went from developer to author?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I stumbled upon Python by chance, around 2005. I don&apos;t remember how I heard about it, but I bought a first book to discover it and started toying with the language. At that time, I didn&apos;t find any project to contribute to or to start. My first project with Python was rebuildd for Debian in 2007, a bit later.&lt;/p&gt;
&lt;p&gt;I like Python for its simplicity, its rather clean object orientation, its ease of deployment, and its rich open source ecosystem. Once you get the basics, it&apos;s very easy to evolve and to use it for anything, because the ecosystem makes it easy to find libraries to solve any kind of problem.&lt;/p&gt;
&lt;p&gt;I became an author by chance, writing blog posts from time to time about Python. I finally realized that after a few years studying Python internals (CPython), I had learned a lot of things. While writing a post about&lt;br /&gt;
&lt;a href=&quot;https://julien.danjou.info/blog/2013/guide-python-static-class-abstract-methods&quot;&gt;the differences between method types in Python&lt;/a&gt; – which is still one of the most read posts on my blog – I realized that a lot of things that seemed obvious to me were not for other developers.&lt;/p&gt;
&lt;p&gt;I wrote that initial post after thousands of hours spent doing code reviews on OpenStack. I therefore decided to note down all the developer pain points and to write a book about them: a compilation of what years of experience taught me, and taught the other developers I decided to interview in the book.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I&apos;ve been very interested by the publication of your book, for the subject itself, but also for the process you chose. You self-published the book, which seems very relevant nowadays. Was that the choice from the start? Did you look for a publisher? Can you tell us more about that?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&apos;ve been lucky to find out about other self-published authors, such as &lt;a href=&quot;http://nathanbarry.com/&quot;&gt;Nathan Barry&lt;/a&gt; – who even wrote a book on that subject, called &lt;a href=&quot;http://nathanbarry.com/authority/&quot;&gt;Authority&lt;/a&gt;. That&apos;s what convinced me it was possible and gave me hints for that project.&lt;/p&gt;
&lt;p&gt;I started to write in August 2013, and I ran the first interviews with other developers at that time. I started with the table of contents and then filled the pages with what I knew and what I wanted to share. I managed to finish the book around January 2014. The proof-reading took more time than I expected, so the book was only released in March 2014. I wrote a &lt;a href=&quot;https://julien.danjou.info/blog/making-of-the-hacker-guide-to-python&quot;&gt;complete report&lt;/a&gt; about that on my blog, where I explain the full process in detail, from writing to launching.&lt;/p&gt;
&lt;p&gt;I did not look for a publisher, though a few were proposed to me. The idea of self-publishing really convinced me, so I decided to go on my own, and I have no regrets. It&apos;s true that you have to wear two hats at the same time and handle a lot more things, but with a minimal audience and some help from the Internet, anything&apos;s possible!&lt;/p&gt;
&lt;p&gt;I&apos;ve been approached by two publishers since then, a &lt;a href=&quot;http://item.jd.com/11685556.html&quot;&gt;Chinese&lt;/a&gt; and a &lt;a href=&quot;https://twitter.com/juldanjou/status/552056642322583552&quot;&gt;Korean&lt;/a&gt; one. I gave them the rights to translate and publish the book in their countries, so you can buy the Chinese and Korean versions of the first edition out there.&lt;/p&gt;
&lt;p&gt;Seeing how successful it was, I decided to launch a second edition in May 2015, and it&apos;s likely that a third edition will be released in 2016.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Nowadays, you work for &lt;a href=&quot;http://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;, a company that represents the success of using Free Software as a commercial business model. This company fascinates a lot in our community. What can you say about your employer from your point of view?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It has only been a year since I joined Red Hat (when they bought &lt;a href=&quot;http://www.enovance.com/&quot;&gt;eNovance&lt;/a&gt;), so my experience is quite recent.&lt;/p&gt;
&lt;p&gt;Though, Red Hat is really a special company on every level. It&apos;s hard to see from the outside how open it is, and how it works. It&apos;s really close to – and it really looks like – an open source project. For more details, you should read &lt;a href=&quot;https://www.redhat.com/en/explore/the-open-organization-book&quot;&gt;The Open Organization&lt;/a&gt;, a book written by Jim Whitehurst (CEO of Red Hat), which was just published. It describes perfectly how Red Hat works. To summarize, meritocracy and the lack of organization in silos are what make Red Hat a strong organization and put it among&lt;br /&gt;
&lt;a href=&quot;http://www.forbes.com/innovative-companies/list/&quot;&gt;the most innovative companies&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the end, I&apos;m lucky enough to be autonomous on the projects I work on with my team around OpenStack, and I can spend 100% of my time working upstream and enhancing the Python ecosystem.&lt;/p&gt;
</content:encoded><category>career</category><category>openstack</category><category>books</category><category>python</category></item><item><title>Data validation in Python with voluptuous</title><link>https://julien.danjou.info/blog/python-schema-validation-voluptuous/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-schema-validation-voluptuous/</guid><description>Continuing my post series on the tools I use these days in Python, this time I would like to talk about a library I really like, named voluptuous.</description><pubDate>Fri, 04 Sep 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Continuing my post series on the tools I use these days in Python, this time I would like to talk about a library I really like, named &lt;em&gt;&lt;a href=&quot;https://pypi.python.org/pypi/voluptuous&quot;&gt;voluptuous&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;It&apos;s no secret that most of the time, when a program receives data from the outside, handling it properly is a big deal. Indeed, most of the time your program has no guarantee that the stream is valid and that it contains what is expected.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/Robustness_principle&quot;&gt;robustness principle&lt;/a&gt; says you should be liberal in what you accept, though &lt;a href=&quot;http://cacm.acm.org/magazines/2011/8/114933-the-robustness-principle-reconsidered/fulltext&quot;&gt;that is not always a good idea&lt;/a&gt; either. Whatever policy you choose, you need to process the incoming data and implement a policy that will work – lax or not.&lt;/p&gt;
&lt;p&gt;That means the program needs to look into the data received, check that everything it needs is there, complete what might be missing (e.g. set some defaults), transform some of the data, and maybe reject it in the end.&lt;/p&gt;
&lt;h2&gt;Data validation&lt;/h2&gt;
&lt;p&gt;The first step is to validate the data, which means checking all the fields are there and all the types are right or understandable (parseable). &lt;em&gt;Voluptuous&lt;/em&gt; provides a single interface for all that called a &lt;code&gt;Schema&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema
&amp;gt;&amp;gt;&amp;gt; s = Schema({
...   &apos;q&apos;: str,
...   &apos;per_page&apos;: int,
...   &apos;page&apos;: int,
... })
&amp;gt;&amp;gt;&amp;gt; s({&quot;q&quot;: &quot;hello&quot;})
{&apos;q&apos;: &apos;hello&apos;}
&amp;gt;&amp;gt;&amp;gt; s({&quot;q&quot;: &quot;hello&quot;, &quot;page&quot;: &quot;world&quot;})
voluptuous.MultipleInvalid: expected int for dictionary value @ data[&apos;page&apos;]
&amp;gt;&amp;gt;&amp;gt; s({&quot;q&quot;: &quot;hello&quot;, &quot;unknown&quot;: &quot;key&quot;})
voluptuous.MultipleInvalid: extra keys not allowed @ data[&apos;unknown&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The argument to &lt;code&gt;voluptuous.Schema&lt;/code&gt; should be the data structure that you expect. &lt;em&gt;Voluptuous&lt;/em&gt; accepts any kind of data structure, so it could also be a simple string or an array of dicts of arrays of integers. You get it. Here it&apos;s a &lt;code&gt;dict&lt;/code&gt; with a few keys that, if present, should be validated as certain types. By default, &lt;em&gt;Voluptuous&lt;/em&gt; does not raise an error if some keys are missing. However, extra keys in a dict are invalid by default. If you want to allow extra keys, it is possible to specify it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema
&amp;gt;&amp;gt;&amp;gt; s = Schema({&quot;foo&quot;: str}, extra=True)
&amp;gt;&amp;gt;&amp;gt; s({&quot;bar&quot;: 2})
{&apos;bar&apos;: 2}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It is also possible to make some keys mandatory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema, Required
&amp;gt;&amp;gt;&amp;gt; s = Schema({Required(&quot;foo&quot;): str})
&amp;gt;&amp;gt;&amp;gt; s({})
voluptuous.MultipleInvalid: required key not provided @ data[&apos;foo&apos;]
&lt;/code&gt;&lt;/pre&gt;
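&lt;p&gt;Depending on your version of &lt;em&gt;Voluptuous&lt;/em&gt;, &lt;code&gt;Required&lt;/code&gt; can also carry a default value, which is handy to complete missing data as mentioned earlier. A minimal sketch, assuming a version that supports the &lt;code&gt;default&lt;/code&gt; argument:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema, Required
&amp;gt;&amp;gt;&amp;gt; s = Schema({Required(&quot;page&quot;, default=1): int})
&amp;gt;&amp;gt;&amp;gt; s({})
{&apos;page&apos;: 1}
&amp;gt;&amp;gt;&amp;gt; s({&quot;page&quot;: 2})
{&apos;page&apos;: 2}
&lt;/code&gt;&lt;/pre&gt;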
&lt;p&gt;You can create custom data types very easily. &lt;em&gt;Voluptuous&lt;/em&gt; data types are actually just functions that are called with one argument, the value, and that should either return the value or raise an &lt;code&gt;Invalid&lt;/code&gt; or &lt;code&gt;ValueError&lt;/code&gt; exception.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema, Invalid
&amp;gt;&amp;gt;&amp;gt; def StringWithLength5(value):
...     if isinstance(value, str) and len(value) == 5:
...             return value
...     raise Invalid(&quot;Not a string with 5 chars&quot;)
...
&amp;gt;&amp;gt;&amp;gt; s = Schema(StringWithLength5)
&amp;gt;&amp;gt;&amp;gt; s(&quot;hello&quot;)
&apos;hello&apos;
&amp;gt;&amp;gt;&amp;gt; s(&quot;hello world&quot;)
voluptuous.MultipleInvalid: Not a string with 5 chars
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Most of the time though, there is no need to create your own data types. &lt;em&gt;Voluptuous&lt;/em&gt; provides logical operators that, combined with a few other provided primitives such as &lt;code&gt;voluptuous.Length&lt;/code&gt; or &lt;code&gt;voluptuous.Range&lt;/code&gt;, can create a large range of validation schemes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema, Length, All
&amp;gt;&amp;gt;&amp;gt; s = Schema(All(str, Length(min=3, max=5)))
&amp;gt;&amp;gt;&amp;gt; s(&quot;hello&quot;)
&apos;hello&apos;
&amp;gt;&amp;gt;&amp;gt; s(&quot;hello world&quot;)
voluptuous.MultipleInvalid: length of value must be at most 5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href=&quot;https://pypi.python.org/pypi/voluptuous&quot;&gt;voluptuous documentation&lt;/a&gt; has a good set of examples that you can check to have a good overview of what you can do.&lt;/p&gt;
&lt;h2&gt;Data transformation&lt;/h2&gt;
&lt;p&gt;What&apos;s important to remember is that each data type you use is a function that is called and returns a value if the input is considered valid. That returned value is what is actually used and returned after the schema validation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import uuid
&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema
&amp;gt;&amp;gt;&amp;gt; def UUID(value):
...     return uuid.UUID(value)
...
&amp;gt;&amp;gt;&amp;gt; s = Schema({&quot;foo&quot;: UUID})
&amp;gt;&amp;gt;&amp;gt; data_converted = s({&quot;foo&quot;: &quot;uuid?&quot;})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data[&apos;foo&apos;]
&amp;gt;&amp;gt;&amp;gt; data_converted = s({&quot;foo&quot;: &quot;8B7BA51C-DFF5-45DD-B28C-6911A2317D1D&quot;})
&amp;gt;&amp;gt;&amp;gt; data_converted
{&apos;foo&apos;: UUID(&apos;8b7ba51c-dff5-45dd-b28c-6911a2317d1d&apos;)}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By defining a custom &lt;code&gt;UUID&lt;/code&gt; function that converts a value to a UUID, the schema converts the string passed in the data to a Python UUID object – validating the format at the same time.&lt;/p&gt;
&lt;p&gt;Note a little trick here: it&apos;s not possible to use &lt;code&gt;uuid.UUID&lt;/code&gt; directly in the schema, otherwise &lt;em&gt;Voluptuous&lt;/em&gt; would check that the data is actually an instance of &lt;code&gt;uuid.UUID&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema
&amp;gt;&amp;gt;&amp;gt; s = Schema({&quot;foo&quot;: uuid.UUID})
&amp;gt;&amp;gt;&amp;gt; s({&quot;foo&quot;: &quot;8B7BA51C-DFF5-45DD-B28C-6911A2317D1D&quot;})
voluptuous.MultipleInvalid: expected UUID for dictionary value @ data[&apos;foo&apos;]
&amp;gt;&amp;gt;&amp;gt; s({&quot;foo&quot;: uuid.uuid4()})
{&apos;foo&apos;: UUID(&apos;60b6d6c4-e719-47a7-8e2e-b4a4a30631ed&apos;)}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And that&apos;s not what is wanted here.&lt;/p&gt;
&lt;p&gt;That mechanism is really neat to transform, for example, strings to timestamps.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema
&amp;gt;&amp;gt;&amp;gt; def Timestamp(value):
...     return datetime.datetime.strptime(value, &quot;%Y-%m-%dT%H:%M:%S&quot;)
...
&amp;gt;&amp;gt;&amp;gt; s = Schema({&quot;foo&quot;: Timestamp})
&amp;gt;&amp;gt;&amp;gt; s({&quot;foo&quot;: &apos;2015-03-03T12:12:12&apos;})
{&apos;foo&apos;: datetime.datetime(2015, 3, 3, 12, 12, 12)}
&amp;gt;&amp;gt;&amp;gt; s({&quot;foo&quot;: &apos;2015-03-03T12:12&apos;})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data[&apos;foo&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Recursive schemas&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Voluptuous&lt;/em&gt; has one limitation so far: it cannot express recursive schemas directly. The simplest way to circumvent that is to use another function as an indirection.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from voluptuous import Schema, Any
&amp;gt;&amp;gt;&amp;gt; def _MySchema(value):
...     return MySchema(value)
...
&amp;gt;&amp;gt;&amp;gt; MySchema = Schema({&quot;foo&quot;: Any(&quot;bar&quot;, _MySchema)})
&amp;gt;&amp;gt;&amp;gt; MySchema({&quot;foo&quot;: {&quot;foo&quot;: &quot;bar&quot;}})
{&apos;foo&apos;: {&apos;foo&apos;: &apos;bar&apos;}}
&amp;gt;&amp;gt;&amp;gt; MySchema({&quot;foo&quot;: {&quot;foo&quot;: &quot;baz&quot;}})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data[&apos;foo&apos;][&apos;foo&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Usage in REST API&lt;/h2&gt;
&lt;p&gt;I started to use &lt;em&gt;Voluptuous&lt;/em&gt; to validate data in the REST API provided by &lt;a href=&quot;http://launchpad.net/gnocchi&quot;&gt;Gnocchi&lt;/a&gt;. So far it has been a really good tool, and we&apos;ve been able to &lt;a href=&quot;http://docs.openstack.org/developer/gnocchi/rest.html&quot;&gt;create a complete REST API&lt;/a&gt; that is very easy to validate on the server side. I would definitely recommend it for that. It blends with any Web framework easily.&lt;/p&gt;
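&lt;p&gt;To give an idea of how little glue is needed, here is a minimal, framework-agnostic sketch – the &lt;code&gt;handle_search&lt;/code&gt; function and its error format are hypothetical examples, not Gnocchi code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import voluptuous

search_schema = voluptuous.Schema({
    voluptuous.Required(&quot;q&quot;): str,
    &quot;page&quot;: int,
})

def handle_search(body):
    # body is the already-decoded JSON payload of the request
    try:
        params = search_schema(body)
    except voluptuous.MultipleInvalid as e:
        # Turn the validation error into a 400-style response
        return {&quot;status&quot;: 400, &quot;error&quot;: str(e)}
    return {&quot;status&quot;: 200, &quot;params&quot;: params}
&lt;/code&gt;&lt;/pre&gt;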
&lt;p&gt;One of the upsides compared to solutions like &lt;a href=&quot;http://json-schema.org/&quot;&gt;JSON Schema&lt;/a&gt; is the ability to create or re-use your own custom data types while converting values at validation time. It is also very Pythonic and extensible – it&apos;s pretty great to use for all of that. It&apos;s also not tied to any serialization format.&lt;/p&gt;
&lt;p&gt;On the other hand, JSON Schema is language agnostic and is serializable itself as JSON. That makes it easy to be exported and provided to a consumer so it can understand the API and validate the data potentially on its side.&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Reading LWN.net with Pocket</title><link>https://julien.danjou.info/blog/announcing-lwn2pocket/</link><guid isPermaLink="true">https://julien.danjou.info/blog/announcing-lwn2pocket/</guid><description>I&apos;ve started to use Pocket a few months ago to store my backlog of things to read. It&apos;s especially useful as I can use it to read content offline since we still don&apos;t have any Internet access in.</description><pubDate>Thu, 13 Aug 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve started to use &lt;a href=&quot;https://pocket.co&quot;&gt;Pocket&lt;/a&gt; a few months ago to store my backlog of things to read. It&apos;s especially useful as I can use it to read content offline since we still don&apos;t have any Internet access in places such as airplanes or the Paris metro. It&apos;s only 2015 after all.&lt;/p&gt;
&lt;p&gt;I have also been an &lt;a href=&quot;http://lwn.net&quot;&gt;LWN.net&lt;/a&gt; subscriber for years now, and I really like their articles from the weekly edition. Unfortunately, as access is restricted to subscribers, you need to log in, which makes it impossible to add these articles to Pocket directly. Sad.&lt;/p&gt;
&lt;p&gt;Yesterday, I thought about that and decided to start hacking on it. LWN provides a feature called &quot;Subscriber Link&quot; that allows you to share an article with a friend. I managed to use that feature to share the articles with my friend… Pocket!&lt;/p&gt;
&lt;p&gt;As doing that every week is tedious, I wrote a small Python program called &lt;a href=&quot;https://github.com/jd/lwn2pocket&quot;&gt;lwn2pocket&lt;/a&gt; that I published on &lt;a href=&quot;http://github.com&quot;&gt;GitHub&lt;/a&gt;. Feel free to use it, hack it and send pull requests.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/lwn2pocket.png&quot; alt=&quot;lwn2pocket&quot; /&gt;&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Timezones and Python</title><link>https://julien.danjou.info/blog/python-and-timezones/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-and-timezones/</guid><description>Recently, I&apos;ve been fighting with the never ending issue of timezones. I never thought I would have plunged into this rabbit hole, but hacking on OpenStack and Gnocchi I felt into that trap easily is,</description><pubDate>Tue, 16 Jun 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Recently, I&apos;ve been fighting with the never ending issue of timezones. I never thought I would have plunged into this rabbit hole, but hacking on OpenStack and Gnocchi I felt into that trap easily is, thanks to Python.&lt;/p&gt;
&lt;h2&gt;“Why you really, really, should never ever deal with timezones”&lt;/h2&gt;
&lt;p&gt;To get a glimpse of the complexity of timezones, I recommend that you watch &lt;a href=&quot;http://www.tomscott.com/&quot;&gt;Tom Scott&lt;/a&gt;&apos;s video on the subject. It&apos;s fun and it summarizes remarkably well the nightmare that timezones are and why you should stop thinking that you&apos;re smart.&lt;/p&gt;
&lt;h2&gt;The importance of timezones in applications&lt;/h2&gt;
&lt;p&gt;Once you&apos;ve heard what Tom says, I think it gets pretty clear that a timestamp without any timezone attached does not give any useful information. It should be considered irrelevant and useless. Without the necessary context given by the timezone, you cannot infer what point in time your application is really referring to.&lt;/p&gt;
&lt;p&gt;That means your application should never handle timestamps with no timezone information. It should try to guess the timezone, or raise an error if none is provided in the input.&lt;/p&gt;
&lt;p&gt;Of course, you can infer that having no timezone information means UTC. This sounds very handy, but can also be dangerous in certain applications or languages – such as Python, as we&apos;ll see.&lt;/p&gt;
&lt;p&gt;Indeed, in certain applications, converting timestamps to UTC and losing the timezone information is a terrible idea. Imagine that a user creates a recurring event every Wednesday at 10:00 in their local timezone, say CET. If you convert that to UTC, the event will end up being stored as every Wednesday at 09:00.&lt;/p&gt;
&lt;p&gt;Now imagine that the CET timezone switches from UTC+01:00 to UTC+02:00: your application will compute that the event starts at 11:00 CET every Wednesday. Which is wrong, because as the user told you, the event starts at 10:00 CET, whatever the definition of CET is. Not at 11:00 CET. So CET means CET, not necessarily UTC+1.&lt;/p&gt;
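&lt;p&gt;You can see that shift with &lt;em&gt;pytz&lt;/em&gt;: the same wall-clock time in &lt;code&gt;Europe/Paris&lt;/code&gt; maps to different UTC offsets depending on the date. A quick sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; import pytz
&amp;gt;&amp;gt;&amp;gt; paris = pytz.timezone(&quot;Europe/Paris&quot;)
&amp;gt;&amp;gt;&amp;gt; paris.localize(datetime.datetime(2015, 1, 7, 10, 0)).isoformat()
&apos;2015-01-07T10:00:00+01:00&apos;
&amp;gt;&amp;gt;&amp;gt; paris.localize(datetime.datetime(2015, 7, 1, 10, 0)).isoformat()
&apos;2015-07-01T10:00:00+02:00&apos;
&lt;/code&gt;&lt;/pre&gt;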
&lt;p&gt;As for endpoints like REST APIs, something I deal with daily, all timestamps should include timezone information. It&apos;s nearly impossible to know otherwise what timezone the timestamps are in: UTC? Server local? User local? No way to know.&lt;/p&gt;
&lt;h2&gt;Python design &amp;amp; defect&lt;/h2&gt;
&lt;p&gt;Python comes with a timestamp object named &lt;code&gt;datetime.datetime&lt;/code&gt;. It can store date and time precise to the microsecond, and is qualified as timezone &quot;aware&quot; or &quot;unaware&quot;, depending on whether it embeds timezone information or not.&lt;/p&gt;
&lt;p&gt;To build such an object based on the current time, one can use &lt;code&gt;datetime.datetime.utcnow()&lt;/code&gt; to retrieve the date and time for the UTC timezone, and &lt;code&gt;datetime.datetime.now()&lt;/code&gt; to retrieve the date and time for the current timezone, whatever it is.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; datetime.datetime.utcnow()
datetime.datetime(2015, 6, 15, 13, 24, 48, 27631)
&amp;gt;&amp;gt;&amp;gt; datetime.datetime.now()
datetime.datetime(2015, 6, 15, 15, 24, 52, 276161)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can notice, none of these results contains timezone information. Indeed, this Python &lt;code&gt;datetime&lt;/code&gt; API always returns unaware &lt;code&gt;datetime&lt;/code&gt; objects, which is very unfortunate: as soon as you get one of these objects, there is no way to know what the timezone is, so these objects are pretty &quot;useless&quot; on their own.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://lucumr.pocoo.org/2011/7/15/eppur-si-muove/&quot;&gt;Armin Ronacher proposes that an application always consider that the unaware &lt;code&gt;datetime&lt;/code&gt; objects from Python are considered as UTC&lt;/a&gt;. As we just saw, that statement cannot be considered true for objects returned by &lt;code&gt;datetime.datetime.now()&lt;/code&gt;, so I would not advise doing so. &lt;code&gt;datetime&lt;/code&gt; objects with no timezone should be considered as a &quot;bug&quot; in the application.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/timezone-map.jpg&quot; alt=&quot;timezone-map&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Recommendations&lt;/h2&gt;
&lt;p&gt;My recommendation list comes down to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Always use aware &lt;code&gt;datetime&lt;/code&gt; object, i.e. with timezone information. That makes sure you can compare them directly (aware and unaware &lt;code&gt;datetime&lt;/code&gt; objects are not comparable) and will return them correctly to users. Leverage &lt;a href=&quot;http://pytz.sourceforge.net/&quot;&gt;pytz&lt;/a&gt; to have timezone objects.&lt;/li&gt;
&lt;li&gt;Use &lt;a href=&quot;https://en.wikipedia.org/wiki/ISO_8601&quot;&gt;ISO 8601&lt;/a&gt; as input and output string format. Use &lt;code&gt;datetime.datetime.isoformat()&lt;/code&gt; to return timestamps as string formatted using that format, which includes the timezone information.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In Python, that&apos;s equivalent to having:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; import pytz
&amp;gt;&amp;gt;&amp;gt; def utcnow():
...     return datetime.datetime.now(tz=pytz.utc)
&amp;gt;&amp;gt;&amp;gt; utcnow()
datetime.datetime(2015, 6, 15, 14, 45, 19, 182703, tzinfo=&amp;lt;UTC&amp;gt;)
&amp;gt;&amp;gt;&amp;gt; utcnow().isoformat()
&apos;2015-06-15T14:45:21.982600+00:00&apos;
&lt;/code&gt;&lt;/pre&gt;
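&lt;p&gt;And as the first recommendation notes, mixing aware and unaware objects fails loudly, which is a useful safety net:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; import pytz
&amp;gt;&amp;gt;&amp;gt; datetime.datetime.utcnow() &amp;lt; datetime.datetime.now(tz=pytz.utc)
TypeError: can&apos;t compare offset-naive and offset-aware datetimes
&lt;/code&gt;&lt;/pre&gt;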
&lt;p&gt;If you need to parse strings containing ISO 8601 formatted timestamps, you can rely on the &lt;em&gt;&lt;a href=&quot;https://pypi.python.org/pypi/iso8601&quot;&gt;iso8601&lt;/a&gt;&lt;/em&gt; module, which returns timestamps with correct timezone information. This makes timestamps directly comparable:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import iso8601
&amp;gt;&amp;gt;&amp;gt; iso8601.parse_date(utcnow().isoformat())
datetime.datetime(2015, 6, 15, 14, 46, 43, 945813, tzinfo=&amp;lt;FixedOffset &apos;+00:00&apos; datetime.timedelta(0)&amp;gt;)
&amp;gt;&amp;gt;&amp;gt; iso8601.parse_date(utcnow().isoformat()) &amp;lt; utcnow()
True
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you need to store those timestamps, the same rule should apply. If you rely on &lt;a href=&quot;http://mongodb.org&quot;&gt;MongoDB&lt;/a&gt;, it assumes that all the timestamps are in UTC, so be careful when storing them – you will have to normalize the timestamps to UTC.&lt;/p&gt;
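&lt;p&gt;Normalizing an aware timestamp to UTC before storage is a one-liner with &lt;code&gt;astimezone()&lt;/code&gt; – a minimal sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; import pytz
&amp;gt;&amp;gt;&amp;gt; paris = pytz.timezone(&quot;Europe/Paris&quot;)
&amp;gt;&amp;gt;&amp;gt; ts = paris.localize(datetime.datetime(2015, 6, 16, 10, 0))
&amp;gt;&amp;gt;&amp;gt; ts.astimezone(pytz.utc)
datetime.datetime(2015, 6, 16, 8, 0, tzinfo=&amp;lt;UTC&amp;gt;)
&lt;/code&gt;&lt;/pre&gt;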
&lt;p&gt;For &lt;a href=&quot;http://mysql.org&quot;&gt;MySQL&lt;/a&gt;, nothing is assumed, it&apos;s up to the application to insert them in a timezone that makes sense to it. Obviously, if you have multiple applications accessing the same database with different data sources, this can end up being a nightmare.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://postgresql.org&quot;&gt;PostgreSQL&lt;/a&gt; has a &lt;a href=&quot;http://www.postgresql.org/docs/9.4/static/datatype-datetime.html&quot;&gt;special data type that is recommended&lt;/a&gt; called &lt;code&gt;timestamp with timezone&lt;/code&gt;. That does not mean you should not use UTC in most cases; it just means you can be sure the timestamps are stored in UTC once written to the database, and that you can tell if any other application inserted timestamps with a different timezone.&lt;/p&gt;
&lt;h2&gt;OpenStack status&lt;/h2&gt;
&lt;p&gt;As a side note, I&apos;ve improved the OpenStack situation recently by changing the &lt;a href=&quot;http://docs.openstack.org/developer/oslo.utils/api/timeutils.html&quot;&gt;oslo.utils.timeutils&lt;/a&gt; module to deprecate some useless and dangerous functions. I&apos;ve also added support for returning timezone-aware objects when using the &lt;code&gt;oslo_utils.timeutils.utcnow()&lt;/code&gt; function. Unfortunately, it&apos;s not possible to make it the default for backward compatibility reasons, but it&apos;s there nevertheless, and it&apos;s advised to use it. Thanks to my colleague &lt;a href=&quot;http://haypo-notes.readthedocs.org/&quot;&gt;Victor&lt;/a&gt; for the help!&lt;/p&gt;
&lt;p&gt;Have a nice day, whatever your timezone is!&lt;/p&gt;
</content:encoded><category>python</category><category>openstack</category></item><item><title>Get back up and try again: retrying in Python</title><link>https://julien.danjou.info/blog/python-retrying/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-retrying/</guid><description>The library presented in this article is becoming obsolete and un-maintained. I recommend you to read this post about tenacity   instead.  I don&apos;t often write about tools I use when for my daily softw</description><pubDate>Tue, 02 Jun 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;The library presented in this article is becoming obsolete and un-maintained. I recommend you to read this post about &lt;a href=&quot;https://julien.danjou.info/blog/python-tenacity&quot;&gt;tenacity&lt;/a&gt; instead.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I don&apos;t often write about the tools I use for my daily software development tasks. I recently realized that I really should start sharing my workflows and weapons of choice more often.&lt;/p&gt;
&lt;p&gt;One thing that I have a hard time enduring while doing Python code reviews is people writing utility code that is not directly tied to the core of their business. This looks to me like wasted time maintaining code that should be reused from elsewhere.&lt;/p&gt;
&lt;p&gt;So today I&apos;d like to start with &lt;a href=&quot;https://pypi.python.org/pypi/retrying&quot;&gt;retrying&lt;/a&gt;, a Python package that you can use to… retry anything.&lt;/p&gt;
&lt;h3&gt;It&apos;s OK to fail&lt;/h3&gt;
&lt;p&gt;Often in computing, you have to deal with external resources. That means accessing resources you don&apos;t control. Resources that can fail, become flapping, unreachable or unavailable.&lt;/p&gt;
&lt;p&gt;Most applications don&apos;t deal with that at all, and explode in flight, leaving a skeptical user in front of the computer. A lot of software engineers refuse to deal with failure, and don&apos;t bother handling this kind of scenario in their code.&lt;/p&gt;
&lt;p&gt;In the best case, applications simply handle the case where the external system they reach is out of order. They log something, and inform the user that they should try again later.&lt;/p&gt;
&lt;p&gt;In this cloud computing era, we tend to design software components with &lt;a href=&quot;https://en.wikipedia.org/wiki/Service-oriented_architecture&quot;&gt;service-oriented architecture&lt;/a&gt; in mind. That means having a lot of different services talking to each other over the network. And we all know that networks tend to fail, and distributed systems too. Writing software that treats failure as part of normal operation is a terrific idea.&lt;/p&gt;
&lt;h3&gt;Retrying&lt;/h3&gt;
&lt;p&gt;In order to help applications with the handling of these potential failures, you need a plan. Leaving the burden of &quot;trying again later&quot; to the user is rarely a good choice. Therefore, most of the time you want your application to &lt;em&gt;retry&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Retrying an action is a full strategy on its own, with a lot of options. You can retry only on certain conditions, with attempts spaced by time (e.g. every second), bounded by a number of attempts (e.g. retry 3 times and abort), driven by the problem encountered, or even all of those.&lt;/p&gt;
&lt;p&gt;For all of that, I use the &lt;a href=&quot;https://github.com/rholder/retrying&quot;&gt;retrying&lt;/a&gt; library that you can retrieve easily on &lt;a href=&quot;https://pypi.python.org/pypi/retrying&quot;&gt;PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;retrying&lt;/em&gt; provides a decorator called &lt;code&gt;retry&lt;/code&gt; that you can use on top of any function or method in Python to make it retry in case of failure. By default, &lt;code&gt;retry&lt;/code&gt; calls your function endlessly until it returns rather than raising an error.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import random
from retrying import retry

@retry
def pick_one():
    if random.randint(0, 10) != 1:
        raise Exception(&quot;1 was not picked&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will execute the function &lt;code&gt;pick_one&lt;/code&gt; until &lt;code&gt;1&lt;/code&gt; is returned by &lt;code&gt;random.randint&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;retry&lt;/code&gt; accepts a few arguments, such as the minimum and maximum delays to use, which also can be randomized. Randomizing the delay is a good strategy to avoid detectable patterns or congestion. Moreover, it supports exponential delay, which can be used to implement &lt;a href=&quot;https://en.wikipedia.org/wiki/Exponential_backoff&quot;&gt;exponential backoff&lt;/a&gt;, a good solution for retrying tasks while really avoiding congestion. It&apos;s especially handy for background tasks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@retry(wait_exponential_multiplier=1000, wait_exponential_max=10000)
def wait_exponential_1000():
    print &quot;Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards&quot;
    raise Exception(&quot;Retry!&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can mix that with a maximum delay, which can give you a good strategy to retry for a while, and then fail anyway:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Stop retrying after 30 seconds anyway
&amp;gt;&amp;gt;&amp;gt; @retry(wait_exponential_multiplier=1000, wait_exponential_max=10000, stop_max_delay=30000)
... def wait_exponential_1000():
...     print &quot;Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards&quot;
...     raise Exception(&quot;Retry!&quot;)
...
&amp;gt;&amp;gt;&amp;gt; wait_exponential_1000()
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
  File &quot;/usr/local/lib/python2.7/site-packages/retrying.py&quot;, line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File &quot;/usr/local/lib/python2.7/site-packages/retrying.py&quot;, line 212, in call
    raise attempt.get()
  File &quot;/usr/local/lib/python2.7/site-packages/retrying.py&quot;, line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File &quot;/usr/local/lib/python2.7/site-packages/retrying.py&quot;, line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 4, in wait_exponential_1000
  Exception: Retry!
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A pattern I use very often is the ability to retry only based on some exception type. You can specify a function to filter out the exceptions you want to ignore or the ones on which you want to retry.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def retry_on_ioerror(exc):
    return isinstance(exc, IOError)

@retry(retry_on_exception=retry_on_ioerror)
def read_file():
    with open(&quot;myfile&quot;, &quot;r&quot;) as f:
        return f.read()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;retry&lt;/code&gt; will call the function passed as &lt;code&gt;retry_on_exception&lt;/code&gt; with the exception raised as first argument. It&apos;s up to the function to then return a boolean indicating if a retry should be performed or not. In the example above, this will only retry to read the file if an &lt;code&gt;IOError&lt;/code&gt; occurs; if any other exception type is raised, no retry will be performed.&lt;/p&gt;
&lt;p&gt;The same pattern can be implemented using the keyword argument &lt;code&gt;retry_on_result&lt;/code&gt;, where you can provide a function that analyzes the result and retries based on it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def retry_if_file_empty(result):
    return len(result) &amp;lt;= 0

@retry(retry_on_result=retry_if_file_empty)
def read_file():
    with open(&quot;myfile&quot;, &quot;r&quot;) as f:
        return f.read()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This example will read the file until it stops being empty. If the file does not exist, an &lt;code&gt;IOError&lt;/code&gt; is raised, and the default behavior, which triggers a retry on all exceptions, kicks in – the retry is therefore performed.&lt;/p&gt;
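&lt;p&gt;If you&apos;d rather not retry forever on a file that stays empty, you can combine that with a stop condition. A short sketch using the &lt;code&gt;stop_max_attempt_number&lt;/code&gt; argument – once the attempts are exhausted, &lt;em&gt;retrying&lt;/em&gt; gives up and raises:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from retrying import retry

def retry_if_file_empty(result):
    return len(result) &amp;lt;= 0

# Give up after 5 attempts instead of retrying endlessly
@retry(retry_on_result=retry_if_file_empty, stop_max_attempt_number=5)
def read_file():
    with open(&quot;myfile&quot;, &quot;r&quot;) as f:
        return f.read()
&lt;/code&gt;&lt;/pre&gt;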
&lt;p&gt;That&apos;s it! &lt;em&gt;retrying&lt;/em&gt; is really a good and small library that you should leverage rather than implementing your own half-baked solution!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>My interview about software tests and Python</title><link>https://julien.danjou.info/blog/interview-software-tests-in-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/interview-software-tests-in-python/</guid><description>Johannes Hubertz interviewed me for his upcoming German book about Python software testing, covering my work on OpenStack and testing best practices.</description><pubDate>Mon, 11 May 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve recently been contacted by &lt;a href=&quot;http://hubertz.de/blog/&quot;&gt;Johannes Hubertz&lt;/a&gt;, who is writing a new book about Python in German called &lt;em&gt;&quot;Softwaretests mit Python&quot;&lt;/em&gt; which will be published by &lt;em&gt;Open Source Press, Munich&lt;/em&gt; this summer. His book will feature some interviews, and he was kind enough to let me write a bit about software testing. This is the interview that I gave for his book. Johannes translated to German and it will be included in Johannes&apos; book, and I decided to publish it on my blog today. Following is the original version.&lt;/p&gt;
&lt;h2&gt;How did you come to Python?&lt;/h2&gt;
&lt;p&gt;I don&apos;t recall exactly, but around ten years ago, I saw more and more people using it and decided to take a look. Back then, I was more used to Perl. I didn&apos;t really like Perl and was not getting a good grip on its object system.&lt;/p&gt;
&lt;p&gt;As soon as I found an idea to work on – if I remember correctly that was rebuildd – I started to code in Python, learning the language at the same time.&lt;/p&gt;
&lt;p&gt;I liked how Python worked, and how fast I was able to develop and learn it, so I decided to keep using it for my next projects. I ended up diving into Python core for various reasons, even doing things like briefly hacking on projects like Cython at some point, and finally ended up working on OpenStack.&lt;/p&gt;
&lt;p&gt;OpenStack is a cloud computing platform entirely written in Python. So I&apos;ve been writing Python every day since working on it.&lt;/p&gt;
&lt;p&gt;That&apos;s what pushed me to write &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt; in 2013 and then self-publish it a year later in 2014, a book where I talk about doing smart and efficient Python.&lt;/p&gt;
&lt;p&gt;It had a great success, has even been translated in Chinese and Korean, so I&apos;m currently working on a second edition of the book. It has been an amazing adventure!&lt;/p&gt;
&lt;h2&gt;Zen of Python: Which line is the most important for you and why?&lt;/h2&gt;
&lt;p&gt;I like the &quot;There should be one – and preferably only one – obvious way to do it&quot;. The opposite is probably something that scared me in languages like Perl. But having one obvious way to do it is something I tend to like in functional languages like Lisp, which are, in my humble opinion, even better at that.&lt;/p&gt;
&lt;h2&gt;For a python newbie, what are the most difficult subjects in Python?&lt;/h2&gt;
&lt;p&gt;I haven&apos;t been a newbie for a while, so it&apos;s hard for me to say. I don&apos;t think the language is hard to learn. There are some subtleties in the language itself when you dive deep into the internals, but for beginners most of the concepts are pretty straightforward. If I had to pick, in the language basics, the most difficult thing would be around generator objects (yield).&lt;/p&gt;
&lt;p&gt;Nowadays I think the most difficult subjects for newcomers are which version of Python to use, which libraries to rely on, and how to package and distribute projects. Fortunately, things are getting better.&lt;/p&gt;
&lt;h2&gt;When did you start using Test Driven Development and why?&lt;/h2&gt;
&lt;p&gt;I learned unit testing and TDD at school where teachers forced me to learn Java, and I hated it. The frameworks looked complicated, and I had the impression I was losing my time. Which I actually was, since I was writing disposable programs – that&apos;s the only thing you do at school.&lt;/p&gt;
&lt;p&gt;Years later, when I started to write real and bigger programs (e.g. rebuildd), I quickly ended up fixing bugs… that I had already fixed. That reminded me of unit tests, and that it might be a good idea to start using them to stop fixing the same things over and over again.&lt;/p&gt;
&lt;p&gt;For a few years, I wrote less Python and more C code and Lua (for the &lt;a href=&quot;http://awesome.naquadah.org&quot;&gt;awesome window manager&lt;/a&gt;), and I didn&apos;t use any testing. I probably lost hundreds of hours testing manually and fixing regressions – that was a good lesson. Though I had good excuses at that time – it is/was way harder to do testing in C/Lua than in Python.&lt;/p&gt;
&lt;p&gt;Since that period, I have never stopped writing &quot;tests&quot;. When I started to hack on OpenStack, the project was adopting a &quot;no test? no merge!&quot; policy due to the high number of regressions it had during the first releases.&lt;/p&gt;
&lt;p&gt;I honestly don&apos;t think I could work on any project that does not have – at least a minimal – test coverage. It&apos;s impossible to hack efficiently on a code base that you&apos;re not able to test with just a simple command. It&apos;s also a real problem for newcomers in the open source world. When there are no tests, you can hack something and send a patch, and get a &quot;you broke this&quot; in response.&lt;/p&gt;
&lt;p&gt;Nowadays, this kind of response sounds unacceptable to me: if there is no test, then I didn&apos;t break anything!&lt;/p&gt;
&lt;p&gt;In the end, it&apos;s just too much frustration to work on non tested projects as I demonstrated in &lt;a href=&quot;https://julien.danjou.info/blog/python-bad-practice-concrete-case&quot;&gt;my study of whisper source code&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;What do you think to be the most often seen pitfalls of TDD and how to avoid them best?&lt;/h2&gt;
&lt;p&gt;The biggest problems are when and at what rate writing tests.&lt;/p&gt;
&lt;p&gt;On one hand, some people start writing overly precise tests way too soon. Doing that slows you down, especially when you are prototyping an idea or concept you just had. That does not mean you should not write tests at all, but you should probably start with light coverage, until you are pretty sure that you&apos;re not going to rip everything out and start over. On the other hand, some people postpone writing tests forever, and end up with no tests at all or a layer of tests that is too thin, which leaves the project with pretty low coverage.&lt;/p&gt;
&lt;p&gt;Basically, your test coverage should reflect the state of your project. If it&apos;s just starting, you should build a thin layer of tests so you can hack on it easily and remodel it if needed. The more your project grows, the more you should make it solid and lay down more tests.&lt;/p&gt;
&lt;p&gt;Overly detailed tests make it painful to evolve the project at the start. Not having enough tests in a big project makes it painful to maintain.&lt;/p&gt;
&lt;h2&gt;Do you think, TDD fits and scales well for the big projects like OpenStack?&lt;/h2&gt;
&lt;p&gt;Not only do I think it fits and scales well, I also think it&apos;s just impossible not to use TDD in such big projects.&lt;/p&gt;
&lt;p&gt;When unit and functional tests coverage was weak in OpenStack – at its beginning – it was just impossible to fix a bug or write a new feature without breaking a lot of things without even noticing. We would release version N, and a ton of old bugs present in N-2 – but fixed in N-1 – were reopened.&lt;/p&gt;
&lt;p&gt;For big projects, with a lot of different use cases, configuration options, etc., you need belt and braces. You cannot throw code into a repository thinking it&apos;s ever going to work, and you can&apos;t afford to test everything manually at each commit. That&apos;s just insane.&lt;/p&gt;
</content:encoded><category>career</category><category>python</category><category>books</category><category>openstack</category></item><item><title>The Hacker&apos;s Guide to Python, 2nd edition!</title><link>https://julien.danjou.info/blog/the-hacker-guide-to-python-second-edition/</link><guid isPermaLink="true">https://julien.danjou.info/blog/the-hacker-guide-to-python-second-edition/</guid><description>A year passed since the first release of The Hacker&apos;s Guide to Python in March 2014. A few hundreds copies have been distributed so far, and the feedback is wonderful!</description><pubDate>Mon, 04 May 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A year passed since the &lt;a href=&quot;https://julien.danjou.info/blog/2014/the-hacker-guide-to-python-has-been-released&quot;&gt;first release of The Hacker&apos;s Guide to Python&lt;/a&gt; in March 2014. A few hundreds copies have been distributed so far, and the feedback is wonderful!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-python-darken-v2.png&quot; alt=&quot;the-hacker-guide-to-python-darken-v2&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I already wrote extensively about the &lt;a href=&quot;https://julien.danjou.info/blog/making-of-the-hacker-guide-to-python&quot;&gt;making of that book&lt;/a&gt; last year, and I cannot emphasize enough how this adventure has been amazing so far. That&apos;s why I decided a few months ago to update the guide and add some new content.&lt;/p&gt;
&lt;p&gt;So let&apos;s talk about what&apos;s new in this second edition of the book!&lt;/p&gt;
&lt;p&gt;First, I obviously fixed a few things. I had some reports about small mistakes and typos which I applied as I received them. Not a lot fortunately, but it&apos;s still better to have fewer errors in a book, right?&lt;/p&gt;
&lt;p&gt;Then, I updated some of the content. Things changed since I wrote the first chapters of that guide 18 months ago. Therefore I had to rewrite some of the sections and take into account new software or libraries that were released.&lt;/p&gt;
&lt;p&gt;At last, I decided to enhance the book with one more interview. I&apos;ve requested my fellow OpenStack developer &lt;a href=&quot;https://github.com/harlowja&quot;&gt;Joshua Harlow&lt;/a&gt;, who is leading a few interesting Python projects, to join the long list of interviewees in the book. I hope you&apos;ll enjoy it!&lt;/p&gt;
&lt;p&gt;If you didn&apos;t get the book yet, go &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;check it out&lt;/a&gt; and use the coupon &lt;strong&gt;THGTP2LAUNCH&lt;/strong&gt; to get 20% off during the next 48 hours!&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>Hacking Python AST: checking methods declaration</title><link>https://julien.danjou.info/blog/python-ast-checking-method-declaration/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-ast-checking-method-declaration/</guid><description>A few months ago, I wrote the definitive guide about Python method declaration, which had quite a good success.</description><pubDate>Mon, 16 Feb 2015 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few months ago, I wrote &lt;a href=&quot;https://julien.danjou.info/blog/guide-python-static-class-abstract-methods&quot;&gt;the definitive guide about Python method declaration&lt;/a&gt;, which had quite a good success. I still fight every day in &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; to have the developers declare their methods correctly in the patches they submit.&lt;/p&gt;
&lt;h2&gt;Automation plan&lt;/h2&gt;
&lt;p&gt;The thing is, I really dislike doing the same things over and over again. Furthermore, I&apos;m not perfect either, and I miss a lot of these kinds of problems in the reviews I make. So I decided to replace myself with a program – a more scalable and less error-prone version of my brain.&lt;/p&gt;
&lt;p&gt;In OpenStack, we rely on &lt;em&gt;&lt;a href=&quot;http://flake8.readthedocs.org/en/2.2.3/&quot;&gt;flake8&lt;/a&gt;&lt;/em&gt; to do static analysis of our Python code in order to spot common programming mistakes.&lt;/p&gt;
&lt;p&gt;But we are really pedantic, so we wrote some extra hacking rules that we enforce on our code. To that end, we wrote a &lt;em&gt;flake8&lt;/em&gt; extension called &lt;a href=&quot;https://pypi.python.org/pypi/hacking&quot;&gt;hacking&lt;/a&gt;. I really like these rules, and I even recommend applying them in your own projects. Though I might be biased, or a victim of Stockholm syndrome. Your call.&lt;/p&gt;
&lt;p&gt;Anyway, it&apos;s pretty clear that I need to add a check for method declaration in &lt;em&gt;hacking&lt;/em&gt;. Let&apos;s write a &lt;em&gt;flake8&lt;/em&gt; extension!&lt;/p&gt;
&lt;h2&gt;Typical error&lt;/h2&gt;
&lt;p&gt;The typical error I spot is the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Foo(object):
    # self is not used, the method does not need
    # to be bound, it should be declared static
    def bar(self, a, b, c):
        return a + b - c
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That would be the correct version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Foo(object):
    @staticmethod
    def bar(a, b, c):
        return a + b - c
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This kind of mistake is not a show-stopper. It&apos;s just not optimized. Why you have to manually declare static or class methods might be a language issue, but I don&apos;t want to debate about Python misfeatures or design flaws.&lt;/p&gt;
&lt;h2&gt;Strategy&lt;/h2&gt;
&lt;p&gt;We could probably use some big magical regular expression to catch this problem. &lt;em&gt;flake8&lt;/em&gt; is based on the &lt;em&gt;&lt;a href=&quot;https://pypi.python.org/pypi/pep8&quot;&gt;pep8&lt;/a&gt;&lt;/em&gt; tool, which can do a line-by-line analysis of the code. But this method would make detecting this pattern very hard and error-prone.&lt;/p&gt;
&lt;p&gt;Though it&apos;s also possible to do an AST-based analysis on a per-file basis with &lt;em&gt;pep8&lt;/em&gt;. So that&apos;s the method I picked, as it&apos;s the most solid.&lt;/p&gt;
&lt;h2&gt;AST analysis&lt;/h2&gt;
&lt;p&gt;I won&apos;t dive deeply into Python AST and how it works. You can find plenty of sources on the Internet, and I even talk about it a bit in my book &lt;em&gt;&lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/python-ast.png&quot; alt=&quot;python-ast&quot; /&gt;&lt;/p&gt;
&lt;p&gt;To check that all the methods in a Python file are correctly declared, we need to do the following (a minimal illustration follows the list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Iterate over all the statement node of the AST&lt;/li&gt;
&lt;li&gt;Check that the statement is a class definition (&lt;code&gt;ast.ClassDef&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Iterate over all the function definitions (&lt;code&gt;ast.FunctionDef&lt;/code&gt;) of that class statement to check if it is already declared with &lt;code&gt;@staticmethod&lt;/code&gt; or not&lt;/li&gt;
&lt;li&gt;If the method is not declared static, we need to check if the first argument (&lt;code&gt;self&lt;/code&gt;) is used somewhere in the method&lt;/li&gt;
&lt;/ul&gt;
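&lt;p&gt;As a quick illustration of these steps, before plugging anything into &lt;em&gt;flake8&lt;/em&gt;, here is a minimal sketch that parses a snippet and walks its tree with the standard &lt;code&gt;ast&lt;/code&gt; module:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import ast

code = &quot;&quot;&quot;
class Foo(object):
    def bar(self):
        return 42
&quot;&quot;&quot;

tree = ast.parse(code)
for stmt in ast.walk(tree):
    # Steps 1 and 2: find the class definitions
    if isinstance(stmt, ast.ClassDef):
        # Step 3: iterate over the methods of that class
        for body_item in stmt.body:
            if isinstance(body_item, ast.FunctionDef):
                print(body_item.name)  # prints: bar
&lt;/code&gt;&lt;/pre&gt;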
&lt;h2&gt;&lt;em&gt;Flake8&lt;/em&gt; plugin&lt;/h2&gt;
&lt;p&gt;In order to register a new plugin in &lt;em&gt;flake8&lt;/em&gt; via &lt;em&gt;hacking&lt;/em&gt;, we just need to add an entry in &lt;code&gt;setup.cfg&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[entry_points]
flake8.extension =
    […]
    H904 = hacking.checks.other:StaticmethodChecker
    H905 = hacking.checks.other:StaticmethodChecker
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We register 2 &lt;em&gt;hacking&lt;/em&gt; codes here. As you will notice later, we are actually going to add an extra check in our code for the same price. Stay tuned.&lt;/p&gt;
&lt;p&gt;The next step is to write the actual plugin. Since we are using an AST based check, the plugin needs to be a class following a certain signature:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@core.flake8ext
class StaticmethodChecker(object):
    def __init__(self, tree, filename):
        self.tree = tree

    def run(self):
        pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far, so good and pretty easy. We store the tree locally, then we just need to use it in &lt;code&gt;run()&lt;/code&gt; and &lt;code&gt;yield&lt;/code&gt; the problems we discover following &lt;code&gt;pep8&lt;/code&gt;&apos;s expected signature, which is a tuple of &lt;code&gt;(lineno, col_offset, error_string, code)&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;This AST is made for walking ♪ ♬ ♩&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;ast&lt;/code&gt; module provides a &lt;code&gt;walk&lt;/code&gt; function that makes it easy to iterate over a tree. We&apos;ll use it to run through the AST. First, let&apos;s write a loop that ignores statements that are not class definitions.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@core.flake8ext
class StaticmethodChecker(object):
    def __init__(self, tree, filename):
        self.tree = tree

    def run(self):
        for stmt in ast.walk(self.tree):
            # Ignore non-class
            if not isinstance(stmt, ast.ClassDef):
                continue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We still don&apos;t check anything, but we now know how to ignore statements that are not class definitions. The next step is to ignore whatever is not a function definition. We just iterate over the body members of the class definition.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it&apos;s a class, iterate over its body members to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&apos;re all set for checking the method, which is &lt;code&gt;body_item&lt;/code&gt;. First, we need to check if it&apos;s already declared as static. If so, we don&apos;t have to do any further check and we can bail out.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it&apos;s a class, iterate over its body members to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
               and decorator.id == &apos;staticmethod&apos;):
                # It&apos;s a static function, it&apos;s OK
                break
        else:
            # Function is not static, we do nothing for now
            pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we use the special &lt;code&gt;for/else&lt;/code&gt; form of Python, where the &lt;code&gt;else&lt;/code&gt; is evaluated unless we used &lt;code&gt;break&lt;/code&gt; to exit the &lt;code&gt;for&lt;/code&gt; loop.&lt;/p&gt;
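&lt;p&gt;If that construct is new to you, here is a minimal, standalone illustration of its semantics:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for n in (1, 2, 3):
    if n == 4:
        break
else:
    # The loop finished without hitting break, so this runs
    print(&quot;4 was not found&quot;)
&lt;/code&gt;&lt;/pre&gt;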
&lt;pre&gt;&lt;code&gt;for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it&apos;s a class, iterate over its body members to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
               and decorator.id == &apos;staticmethod&apos;):
                # It&apos;s a static function, it&apos;s OK
                break
        else:
            try:
                first_arg = body_item.args.args[0]
            except IndexError:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    &quot;H905: method misses first argument&quot;,
                    &quot;H905&quot;,
                )
                # Check next method
                continue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We finally added a check! We grab the first argument from the method signature. If that fails, we know there&apos;s a problem: you can&apos;t have a bound method without the &lt;code&gt;self&lt;/code&gt; argument, so we yield the &lt;code&gt;H905&lt;/code&gt; code to signal a method that is missing its first argument.&lt;/p&gt;
&lt;p&gt;Now you know why we registered this second &lt;code&gt;pep8&lt;/code&gt; code along with &lt;code&gt;H904&lt;/code&gt; in &lt;code&gt;setup.cfg&lt;/code&gt;. We have here a good opportunity to kill two birds with one stone.&lt;/p&gt;
&lt;p&gt;The next step is to check if that first argument is used in the code of the method.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it&apos;s a class, iterate over its body members to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
               and decorator.id == &apos;staticmethod&apos;):
                # It&apos;s a static function, it&apos;s OK
                break
        else:
            try:
                first_arg = body_item.args.args[0]
            except IndexError:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    &quot;H905: method misses first argument&quot;,
                    &quot;H905&quot;,
                )
                # Check next method
                continue
            for func_stmt in ast.walk(body_item):
                if six.PY3:
                    if (isinstance(func_stmt, ast.Name)
                       and first_arg.arg == func_stmt.id):
                        # The first argument is used, it&apos;s OK
                        break
                else:
                    if (func_stmt != first_arg
                       and isinstance(func_stmt, ast.Name)
                       and func_stmt.id == first_arg.id):
                        # The first argument is used, it&apos;s OK
                        break
            else:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    &quot;H904: method should be declared static&quot;,
                    &quot;H904&quot;,
                )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To that end, we iterate using &lt;code&gt;ast.walk&lt;/code&gt; again and look for any use of a variable with the same name (usually &lt;code&gt;self&lt;/code&gt;, but it could be anything, like &lt;code&gt;cls&lt;/code&gt; for &lt;code&gt;@classmethod&lt;/code&gt;) in the body of the function. If it&apos;s not found, we finally yield the &lt;code&gt;H904&lt;/code&gt; error code. Otherwise, we&apos;re good.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I&apos;ve &lt;a href=&quot;https://review.openstack.org/#/c/151952/&quot;&gt;submitted this patch to &lt;em&gt;hacking&lt;/em&gt;&lt;/a&gt;, and, fingers crossed, it might be merged one day. If it&apos;s not, I&apos;ll create a new Python package with that check for flake8. The actual submitted code is a bit more complex, to take into account the use of the &lt;a href=&quot;https://docs.python.org/2/library/abc.html&quot;&gt;&lt;code&gt;abc&lt;/code&gt;&lt;/a&gt; module, and it includes some tests.&lt;/p&gt;
&lt;p&gt;As you may have noticed, the code walks over the module&apos;s AST several times. There might be a couple of optimizations to browse the AST in only one pass, but I&apos;m not sure they&apos;re worth it considering the actual usage of the tool. I&apos;ll leave that as an exercise for the reader interested in contributing to OpenStack. 😉&lt;/p&gt;
&lt;p&gt;Happy hacking!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Distributed group management and locking in Python with tooz</title><link>https://julien.danjou.info/blog/python-distributed-membership-lock-with-tooz/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-distributed-membership-lock-with-tooz/</guid><description>With OpenStack embracing the Tooz library more and more over the past year, I think it&apos;s a good start to write a bit about it.</description><pubDate>Fri, 21 Nov 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;With &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; embracing the &lt;a href=&quot;http://launchpad.net/python-tooz&quot;&gt;Tooz&lt;/a&gt; library more and more over the past year, I think it&apos;s a good start to write a bit about it.&lt;/p&gt;
&lt;h2&gt;A bit of history&lt;/h2&gt;
&lt;p&gt;A little more than a year ago, with my colleague Yassine Lamgarchal and others at &lt;a href=&quot;http://enovance.com&quot;&gt;eNovance&lt;/a&gt;, we investigated how to solve a problem often encountered inside OpenStack: the synchronization of multiple distributed workers. And while many people in our ecosystem continue to drive development by adding new bells and whistles, we made a point of solving new problems with a generic solution able to address the technical debt at the same time.&lt;/p&gt;
&lt;p&gt;Yassine wrote the first ideas of what should be the &lt;a href=&quot;https://wiki.openstack.org/wiki/Oslo/blueprints/service-sync&quot;&gt;group membership service&lt;/a&gt; that was needed for OpenStack, identifying several projects that could make use of this. I&apos;ve presented this concept during the &lt;a href=&quot;https://www.openstack.org/summit/openstack-summit-hong-kong-2013/&quot;&gt;OpenStack Summit in Hong-Kong&lt;/a&gt; during an Oslo session. It turned out that the idea was well-received, and the week following the summit we started the &lt;a href=&quot;http://launchpad.net/python-tooz&quot;&gt;tooz&lt;/a&gt; project on &lt;a href=&quot;http://ci.openstack.org/stackforge.html&quot;&gt;StackForge&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Goals&lt;/h2&gt;
&lt;p&gt;Tooz is a Python library that provides a coordination API. Its primary goal is to handle groups and membership of these groups in distributed systems.&lt;/p&gt;
&lt;p&gt;Tooz also provides another useful feature which is distributed locking. This allows distributed nodes to acquire and release locks in order to synchronize themselves (for example to access a shared resource).&lt;/p&gt;
&lt;h2&gt;The architecture&lt;/h2&gt;
&lt;p&gt;If you are familiar with distributed systems, you might be thinking that there are a lot of solutions already available to solve these issues: &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;ZooKeeper&lt;/a&gt;, the &lt;a href=&quot;http://raftconsensus.github.io/&quot;&gt;Raft consensus algorithm&lt;/a&gt; or even &lt;a href=&quot;http://redis.io/&quot;&gt;Redis&lt;/a&gt; for example.&lt;/p&gt;
&lt;p&gt;You&apos;ll be thrilled to learn that Tooz is not the result of the &lt;a href=&quot;http://en.wikipedia.org/wiki/Not_invented_here&quot;&gt;NIH&lt;/a&gt; syndrome, but is an abstraction layer on top of all these solutions. It uses drivers to provide the real functionality behind the scenes, and does not try to do anything fancy.&lt;/p&gt;
&lt;p&gt;Not all the drivers have the same amount of functionality or robustness, but depending on your environment, any available driver might suffice. Like most of OpenStack, we let the deployers/operators/developers choose whichever backend they want to use, informing them of the potential trade-offs they will make.&lt;/p&gt;
&lt;p&gt;So far, Tooz provides drivers based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://pypi.python.org/pypi/kazoo&quot;&gt;Kazoo&lt;/a&gt; (ZooKeeper)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pypi.python.org/pypi/zake&quot;&gt;Zake&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://memcached.org&quot;&gt;memcached&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://redis.io&quot;&gt;redis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.tldp.org/LDP/lpg/node21.html&quot;&gt;SysV IPC&lt;/a&gt; (only for distributed locks for now)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://postgresql.org&quot;&gt;PostgreSQL&lt;/a&gt; (only for distributed locks for now)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://mysql.org&quot;&gt;MySQL&lt;/a&gt; (only for distributed locks for now)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All drivers are distributed across processes. Some can be distributed across the network (ZooKeeper, memcached, redis…) and some are only available on the same host (IPC).&lt;/p&gt;
&lt;p&gt;Also note that the Tooz API is completely asynchronous, allowing it to be more efficient, and potentially included in an event loop.&lt;/p&gt;
&lt;h2&gt;Features&lt;/h2&gt;
&lt;h3&gt;Group membership&lt;/h3&gt;
&lt;p&gt;Tooz provides an API to manage group membership. The basic operations provided are: the creation of a group, the ability to join it, leave it and list its members. It&apos;s also possible to be notified as soon as a member joins or leaves a group.&lt;/p&gt;
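&lt;p&gt;To give you an idea of what the API looks like, here is a minimal sketch using the Zake driver; the group and member identifiers are arbitrary, and the exact details may vary between versions, so check the documentation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from tooz import coordination

# Each member builds a coordinator from a backend URL and a unique id
coordinator = coordination.get_coordinator(&apos;zake://&apos;, b&apos;member-1&apos;)
coordinator.start()

# Operations return asynchronous results; get() waits for completion
coordinator.create_group(b&apos;my-group&apos;).get()
coordinator.join_group(b&apos;my-group&apos;).get()

# List the current members of the group
print(coordinator.get_members(b&apos;my-group&apos;).get())

coordinator.leave_group(b&apos;my-group&apos;).get()
coordinator.stop()
&lt;/code&gt;&lt;/pre&gt;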
&lt;h3&gt;Leader election&lt;/h3&gt;
&lt;p&gt;Each group can have a leader elected. Each member can decide if it wants to run for the election. If the leader disappears, another one is elected from the list of current candidates. It&apos;s possible to be notified of the election result and to retrieve the leader of a group at any moment.&lt;/p&gt;
&lt;h3&gt;Distributed locking&lt;/h3&gt;
&lt;p&gt;When trying to synchronize several workers in a distributed environment, you may need a way to lock access to some resources. That&apos;s what a distributed lock can help you with.&lt;/p&gt;
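&lt;p&gt;In its simplest form, a Tooz lock can be used as a context manager. A sketch, assuming a memcached server running locally:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from tooz import coordination

coordinator = coordination.get_coordinator(&apos;memcached://localhost&apos;, b&apos;worker-1&apos;)
coordinator.start()

# The lock is acquired on entry and released on exit
with coordinator.get_lock(b&apos;my-resource&apos;):
    pass  # access the shared resource here

coordinator.stop()
&lt;/code&gt;&lt;/pre&gt;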
&lt;h2&gt;Adoption in OpenStack&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;http://launchpad.net/ceilometer&quot;&gt;Ceilometer&lt;/a&gt; is the first project in OpenStack to use Tooz. It has replaced part of the old alarm distribution system, where RPC was used to detect active alarm evaluator workers. The group membership feature of Tooz was leveraged by Ceilometer to coordinate between alarm evaluator workers.&lt;/p&gt;
&lt;p&gt;Another new feature, part of the Juno release of Ceilometer, is the distribution of the central agent&apos;s polling tasks among multiple workers. There&apos;s again a group membership issue – knowing which nodes are online and available to receive polling tasks – so Tooz is also being used here.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://wiki.openstack.org/Oslo&quot;&gt;Oslo&lt;/a&gt; team &lt;a href=&quot;https://review.openstack.org/#/c/122439/&quot;&gt;has accepted the adoption of Tooz&lt;/a&gt; during this release cycle. That means that it will be maintained by more developers, and will be part of the OpenStack release process.&lt;/p&gt;
&lt;p&gt;This opens the door to pushing Tooz further into OpenStack. Our next candidate would be to write a service group driver for &lt;a href=&quot;http://launchpad.net/nova&quot;&gt;Nova&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://tooz.rtfd.org/&quot;&gt;complete documentation for Tooz is available online&lt;/a&gt; and has examples for the various features described here, go read it if you&apos;re curious and adventurous!&lt;/p&gt;
</content:encoded><category>python</category><category>openstack</category></item><item><title>Python bad practice, a concrete case</title><link>https://julien.danjou.info/blog/python-bad-practice-concrete-case/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-bad-practice-concrete-case/</guid><description>A lot of people read up on good Python practice, and there&apos;s plenty of information about that on the Internet. Many tips are included in the book I wrote this year, The Hacker&apos;s Guide to Python.</description><pubDate>Mon, 15 Sep 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A lot of people read up on good Python practice, and there&apos;s plenty of information about that on the Internet. Many tips are included in the book I wrote this year, &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;. Today I&apos;d like to show a concrete case of code that I don&apos;t consider being the state of the art.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/python-thumb-down.png&quot; alt=&quot;python-thumb-down&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In my &lt;a href=&quot;https://julien.danjou.info/blog/openstack-ceilometer-the-gnocchi-experiment&quot;&gt;last article&lt;/a&gt;, where I talked about my new project Gnocchi, I wrote about how I tested, hacked on, and then ditched &lt;em&gt;&lt;a href=&quot;http://graphite.wikidot.com/whisper&quot;&gt;whisper&lt;/a&gt;&lt;/em&gt;. Here I&apos;m going to explain part of my thought process and a few things that raised my eyebrows when hacking on this code.&lt;/p&gt;
&lt;p&gt;Before I start, please don&apos;t get the spirit of this article wrong. It&apos;s in no way a personal attack on the authors and contributors (whom I don&apos;t know). Furthermore, &lt;em&gt;whisper&lt;/em&gt; is a piece of code that is in production in thousands of installations, storing metrics for years. While I can argue that I consider the code not to follow best practice, it definitely works well enough and is valuable to a lot of people.&lt;/p&gt;
&lt;h2&gt;Tests&lt;/h2&gt;
&lt;p&gt;The first thing I noticed when trying to hack on &lt;em&gt;whisper&lt;/em&gt; is the lack of tests. There&apos;s only one file containing tests, named &lt;code&gt;test_whisper.py&lt;/code&gt;, and the coverage it provides is pretty low. One can check that using the &lt;em&gt;coverage&lt;/em&gt; tool.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ coverage run test_whisper.py
...........
----------------------------------------------------------------------
Ran 11 tests in 0.014s

OK
$ coverage report
Name           Stmts   Miss  Cover
----------------------------------
test_whisper     134      4    97%
whisper          584    227    61%
----------------------------------
TOTAL            718    231    67%
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While one might think that 61% is &quot;not so bad&quot;, taking a quick peek at the actual test code shows that the tests are incomplete. What I mean by incomplete is that, for example, they use the library to store values into a database, but they never check whether the results can be fetched and whether the fetched results are accurate. Here&apos;s a good reason one should never blindly trust the test coverage percentage as a quality metric.&lt;/p&gt;
&lt;p&gt;When I tried to modify &lt;em&gt;whisper&lt;/em&gt;, since the tests do not check the entire cycle of the values fed into the database, I ended up making wrong changes while the tests still passed.&lt;/p&gt;
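&lt;p&gt;For the record, here is a sketch of the kind of round-trip test that was missing, using &lt;em&gt;whisper&lt;/em&gt;&apos;s public functions (I&apos;m assuming the call signatures from the version I read, so adapt as needed):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import os
import tempfile
import time

import whisper

def test_update_and_fetch():
    path = os.path.join(tempfile.mkdtemp(), &apos;test.wsp&apos;)
    # One archive: 1-second resolution, 60 points
    whisper.create(path, [(1, 60)])
    now = int(time.time())
    whisper.update(path, 42.0, now - 1)
    # Read the value back: the full write/read cycle is what matters
    timeInfo, values = whisper.fetch(path, now - 60)
    assert 42.0 in values
&lt;/code&gt;&lt;/pre&gt;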
&lt;h2&gt;No PEP 8, no Python 3&lt;/h2&gt;
&lt;p&gt;The code doesn&apos;t respect PEP 8. A run of &lt;a href=&quot;https://flake8.readthedocs.org/&quot;&gt;flake8&lt;/a&gt; + &lt;a href=&quot;https://pypi.python.org/pypi/hacking&quot;&gt;hacking&lt;/a&gt; shows 732 errors… While this does not impact the code itself, it makes it more painful to hack on than most Python projects.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;hacking&lt;/em&gt; tool also shows that the code is not Python 3 ready as there is usage of Python 2 only syntax.&lt;/p&gt;
&lt;p&gt;A good way to fix that would be to set up &lt;a href=&quot;https://testrun.org/tox/latest/&quot;&gt;tox&lt;/a&gt; and add a few targets for PEP 8 checks and Python 3 tests. Even if the test suite is not complete, starting by having flake8 run without errors and the few unit tests working with Python 3 would put the project in a better light.&lt;/p&gt;
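&lt;p&gt;Such a &lt;code&gt;tox.ini&lt;/code&gt; could look like the following sketch – the environment names and commands are mine, not the project&apos;s:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[tox]
envlist = py27,py34,pep8

[testenv]
commands = python test_whisper.py

[testenv:pep8]
deps = flake8
commands = flake8 whisper.py
&lt;/code&gt;&lt;/pre&gt;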
&lt;h2&gt;Not using idiomatic Python&lt;/h2&gt;
&lt;p&gt;A lot of the code could be simplified by using idiomatic Python. Let&apos;s take a simple example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def fetch(path,fromTime,untilTime=None,now=None):
  fh = None
  try:
    fh = open(path,&apos;rb&apos;)
    return file_fetch(fh, fromTime, untilTime, now)
  finally:
    if fh:
      fh.close()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That piece of code could be easily rewritten as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def fetch(path,fromTime,untilTime=None,now=None):
  with open(path, &apos;rb&apos;) as fh:
    return file_fetch(fh, fromTime, untilTime, now)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This way, the function actually looks so simple that one may even wonder why it should exist at all – but why not.&lt;/p&gt;
&lt;p&gt;Usage of loops could also be made more Pythonic:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for i,archive in enumerate(archiveList):
  if i == len(archiveList) - 1:
    break
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;could be actually:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for archive in itertools.islice(archiveList, len(archiveList) - 1):
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That reduces the code size and makes it easier to read through.&lt;/p&gt;
&lt;h2&gt;Wrong abstraction level&lt;/h2&gt;
&lt;p&gt;Also, one thing that I noticed in &lt;em&gt;whisper&lt;/em&gt;, is that it abstracts its features at the wrong level.&lt;/p&gt;
&lt;p&gt;Take the &lt;code&gt;create()&lt;/code&gt; function, it&apos;s pretty obvious:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def create(path,archiveList,xFilesFactor=None,aggregationMethod=None,sparse=False,useFallocate=False):
  # Set default params
  if xFilesFactor is None:
    xFilesFactor = 0.5
  if aggregationMethod is None:
    aggregationMethod = &apos;average&apos;

  #Validate archive configurations...
  validateArchiveList(archiveList)

  #Looks good, now we create the file and write the header
  if os.path.exists(path):
    raise InvalidConfiguration(&quot;File %s already exists!&quot; % path)
  fh = None
  try:
    fh = open(path,&apos;wb&apos;)
    if LOCK:
      fcntl.flock( fh.fileno(), fcntl.LOCK_EX )

    aggregationType = struct.pack( longFormat, aggregationMethodToType.get(aggregationMethod, 1) )
    oldest = max([secondsPerPoint * points for secondsPerPoint,points in archiveList])
    maxRetention = struct.pack( longFormat, oldest )
    xFilesFactor = struct.pack( floatFormat, float(xFilesFactor) )
    archiveCount = struct.pack(longFormat, len(archiveList))
    packedMetadata = aggregationType + maxRetention + xFilesFactor + archiveCount
    fh.write(packedMetadata)
    headerSize = metadataSize + (archiveInfoSize * len(archiveList))
    archiveOffsetPointer = headerSize

    for secondsPerPoint,points in archiveList:
      archiveInfo = struct.pack(archiveInfoFormat, archiveOffsetPointer, secondsPerPoint, points)
      fh.write(archiveInfo)
      archiveOffsetPointer += (points * pointSize)

    #If configured to use fallocate and capable of fallocate use that, else
    #attempt sparse if configure or zero pre-allocate if sparse isn&apos;t configured.
    if CAN_FALLOCATE and useFallocate:
      remaining = archiveOffsetPointer - headerSize
      fallocate(fh, headerSize, remaining)
    elif sparse:
      fh.seek(archiveOffsetPointer - 1)
      fh.write(&apos;\x00&apos;)
    else:
      remaining = archiveOffsetPointer - headerSize
      chunksize = 16384
      zeroes = &apos;\x00&apos; * chunksize
      while remaining &amp;gt; chunksize:
        fh.write(zeroes)
        remaining -= chunksize
      fh.write(zeroes[:remaining])

    if AUTOFLUSH:
      fh.flush()
      os.fsync(fh.fileno())
  finally:
    if fh:
      fh.close()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The function is doing &lt;strong&gt;everything&lt;/strong&gt;: checking if the file doesn&apos;t exist already, opening it, building the structured data, writing this, building more structure, then writing that, etc.&lt;/p&gt;
&lt;p&gt;That means that the caller has to give a file path, even if it just wants a &lt;em&gt;whisper&lt;/em&gt; data structure to store itself elsewhere. &lt;code&gt;StringIO()&lt;/code&gt; could be used to fake a file handler, but it will fail if the call to &lt;code&gt;fcntl.flock()&lt;/code&gt; is not disabled – and it is inefficient anyway.&lt;/p&gt;
&lt;p&gt;There are a lot of other functions in the code, such as &lt;code&gt;setAggregationMethod()&lt;/code&gt;, that mix file handling – even doing things like &lt;code&gt;os.fsync()&lt;/code&gt; – with the manipulation of structured data. This is definitely not a good design, especially for a library, as it turns out that reusing these functions in a different context is nearly impossible.&lt;/p&gt;
&lt;h2&gt;Race conditions&lt;/h2&gt;
&lt;p&gt;There are race conditions, for example in &lt;code&gt;create()&lt;/code&gt; (see added comment):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;if os.path.exists(path):
  raise InvalidConfiguration(&quot;File %s already exists!&quot; % path)
fh = None
try:
  # TOO LATE I ALREADY CREATED THE FILE IN ANOTHER PROCESS YOU ARE GOING TO
  # FAIL WITHOUT GIVING ANY USEFUL INFORMATION TO THE CALLER :-(
  fh = open(path,&apos;wb&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That code should be:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;try:
  fh = os.fdopen(os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL), &apos;wb&apos;)
except OSError as e:
  if e.errno == errno.EEXIST:
    raise InvalidConfiguration(&quot;File %s already exists!&quot; % path)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;to avoid any race condition.&lt;/p&gt;
&lt;h2&gt;Unwanted optimization&lt;/h2&gt;
&lt;p&gt;We saw earlier that the &lt;code&gt;fetch()&lt;/code&gt; function is barely useful, so let&apos;s take a look at the &lt;code&gt;file_fetch()&lt;/code&gt; function that it calls.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def file_fetch(fh, fromTime, untilTime, now = None):
  header = __readHeader(fh)
[...]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first thing the function does is to read the header from the file handler.&lt;/p&gt;
&lt;p&gt;Let&apos;s take a look at that function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def __readHeader(fh):
  info = __headerCache.get(fh.name)
  if info:
    return info

  originalOffset = fh.tell()
  fh.seek(0)
  packedMetadata = fh.read(metadataSize)

  try:
    (aggregationType,maxRetention,xff,archiveCount) = struct.unpack(metadataFormat,packedMetadata)
  except:
    raise CorruptWhisperFile(&quot;Unable to read header&quot;, fh.name)
[...]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first thing the function does is to look into a cache. Why is there a cache?&lt;/p&gt;
&lt;p&gt;It actually caches the header, with an index based on the file path (&lt;code&gt;fh.name&lt;/code&gt;). Except that if one decides, for example, not to use a file and to cheat using &lt;code&gt;StringIO&lt;/code&gt;, then the handler does not have any &lt;code&gt;name&lt;/code&gt; attribute, and this code path will raise an &lt;code&gt;AttributeError&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;One has to set a fake name manually on the &lt;code&gt;StringIO&lt;/code&gt; instance, and it must be unique so nobody messes with the cache:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import StringIO

packedMetadata = &amp;lt;some source&amp;gt;
fh = StringIO.StringIO(packedMetadata)
fh.name = &quot;myfakename&quot;
header = __readHeader(fh)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The cache may actually be useful when accessing files, but it&apos;s definitely useless when not using files. And it&apos;s not even clear that the complexity the cache adds (even if small) is worth it. I doubt most &lt;em&gt;whisper&lt;/em&gt;-based tools are long-running processes, so the cache that is really used when accessing the files is the one handled by the operating system kernel; that one is going to be much more efficient anyway, and shared between processes. There&apos;s also no expiry of that cache, which could end up with tons of memory used and wasted.&lt;/p&gt;
&lt;h2&gt;Docstrings&lt;/h2&gt;
&lt;p&gt;None of the docstrings are written in a parsable syntax like &lt;a href=&quot;http://sphinx-doc.org/&quot;&gt;Sphinx&lt;/a&gt;&apos;s. This means you cannot generate documentation in a nice format that a developer using the library could read easily.&lt;/p&gt;
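&lt;p&gt;For example, here is what &lt;code&gt;fetch()&lt;/code&gt;&apos;s docstring could look like using Sphinx&apos;s info field syntax (the parameter descriptions are my guesses, for illustration only):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def fetch(path, fromTime, untilTime=None, now=None):
  &quot;&quot;&quot;Fetch values from a whisper database.

  :param path: path to the whisper file
  :param fromTime: epoch time to fetch values from
  :param untilTime: epoch time to stop fetching at, defaults to now
  :param now: override the current time, mostly useful for tests
  :return: a (timeInfo, values) tuple
  &quot;&quot;&quot;
&lt;/code&gt;&lt;/pre&gt;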
&lt;p&gt;The documentation is also not up to date:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def fetch(path,fromTime,untilTime=None,now=None):
  &quot;&quot;&quot;fetch(path,fromTime,untilTime=None)
[...]
&quot;&quot;&quot;

def create(path,archiveList,xFilesFactor=None,aggregationMethod=None,sparse=False,useFallocate=False):
  &quot;&quot;&quot;create(path,archiveList,xFilesFactor=0.5,aggregationMethod=&apos;average&apos;)
[...]
&quot;&quot;&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is something that could be avoided if a proper format were picked for the docstrings. A tool could then be used to notice when the actual function signature diverges from the documented one, such as a missing argument.&lt;/p&gt;
&lt;h2&gt;Duplicated code&lt;/h2&gt;
&lt;p&gt;Last but not least, there&apos;s a lot of code duplicated across the scripts provided by &lt;em&gt;whisper&lt;/em&gt; in its &lt;code&gt;bin&lt;/code&gt; directory. These scripts should be very lightweight and use the &lt;code&gt;console_scripts&lt;/code&gt; facility of &lt;em&gt;setuptools&lt;/em&gt;, but they actually contain a lot of (untested) code. Furthermore, some of that code is partially duplicated from the &lt;code&gt;whisper.py&lt;/code&gt; library, which goes against &lt;a href=&quot;http://en.wikipedia.org/wiki/Don&apos;t_repeat_yourself&quot;&gt;DRY&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;There are a few more things that made me stop considering &lt;em&gt;whisper&lt;/em&gt;, but those are about &lt;em&gt;whisper&lt;/em&gt;&apos;s features, not necessarily its code quality. One can also point out that the code is very condensed and hard to read, which is a more general problem with how it is organized and abstracted.&lt;/p&gt;
&lt;p&gt;A lot of these defects are actually points that made me start writing &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt; a year ago. Running into this kind of code makes me think it was a really good idea to write a book of advice on how to write better Python code!&lt;/p&gt;
</content:encoded><category>python</category><category>monitoring</category></item><item><title>Making of The Hacker&apos;s Guide to Python</title><link>https://julien.danjou.info/blog/making-of-the-hacker-guide-to-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/making-of-the-hacker-guide-to-python/</guid><description>As promised, today I would like to write a bit about the making of The Hacker&apos;s Guide to Python. It has been a very interesting experimentation, and I think it is worth sharing it with you.</description><pubDate>Wed, 07 May 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;As promised, today I would like to write a bit about the making of &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;. It has been a very interesting experimentation, and I think it is worth sharing it with you.&lt;/p&gt;
&lt;h2&gt;The inspiration&lt;/h2&gt;
&lt;p&gt;It all started out at the beginning of August 2013. I was spending my summer, as the rest of the year, hacking on OpenStack.&lt;/p&gt;
&lt;p&gt;As years passed, I got more and more deeply involved in the various tools that we either built or contributed to within the OpenStack community. And I somehow got the feeling that my experience with Python, the way we used it inside OpenStack and other applications during these last years, was worth sharing – worth writing something bigger than a few blog posts.&lt;/p&gt;
&lt;p&gt;The OpenStack project does code reviews, and therefore so have I for almost two years. That inspired a lot of topics, like &lt;a href=&quot;https://julien.danjou.info/blog/guide-python-static-class-abstract-methods&quot;&gt;the definitive guide to method decorators&lt;/a&gt; that I wrote at the time I started the hacker&apos;s guide. Stumbling upon the same mistakes or misunderstandings over and over is, somehow, inspiring.&lt;/p&gt;
&lt;p&gt;I also stumbled upon &lt;a href=&quot;http://nathanbarry.com&quot;&gt;Nathan Barry&lt;/a&gt;&apos;s blog and his book &lt;a href=&quot;http://nathanbarry.com/authority/&quot;&gt;Authority&lt;/a&gt;, which were very helpful for getting started and served as some sort of guideline.&lt;/p&gt;
&lt;p&gt;All of that brought me enough ideas to start writing a book about Python software development for people already familiar with the language.&lt;/p&gt;
&lt;h2&gt;The writing&lt;/h2&gt;
&lt;p&gt;The first thing I did was list all the topics I wanted to write about. The list turned out to include subjects that had no direct interest for a practical guide. For example, on one hand, very few developers know in detail how metaclasses work; on the other hand, I never had to write a metaclass during these last years. That&apos;s the kind of subject I decided not to write about: I dropped all subjects that I felt were not going to help my reader be more productive, even if they could be technically interesting.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-python-opened.png&quot; alt=&quot;the-hacker-guide-to-python-opened&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Then, I gathered all problems I saw during the code reviews I did during these last two years. Some of them I only recalled in the days following the beginning of that project. But I kept adding them to the table of contents, reorganizing stuff as needed.&lt;/p&gt;
&lt;p&gt;After a couple of weeks, I had a pretty good overview of the contents I would write about. All I had to do was fill in the blanks (that sounds so simple now).&lt;/p&gt;
&lt;p&gt;The entire writing of the book took a hundred hours, spread from August to November, during my spare time. I had to stop all my other side projects for that.&lt;/p&gt;
&lt;h2&gt;The interviews&lt;/h2&gt;
&lt;p&gt;While writing the book, I tried to parallelize everything I could. That included asking people for the interviews to be included in the book. I already had a pretty good list of the people I wanted to feature, so I took some time as early as possible to ask them and send them detailed questions.&lt;/p&gt;
&lt;p&gt;I discovered two categories of interviewees. Some of them were very fast to answer (≤ 1 week), and others were much, much slower. A couple of them even set up Git repositories to answer the questions, because it probably looked like an entire project to them. :-) So I had to keep an eye on things, kindly asking from time to time if everything was all right, and at some point I started to gently set deadlines.&lt;/p&gt;
&lt;p&gt;In the end, the quality of the answers was awesome, and I like to think that was because I picked the right people!&lt;/p&gt;
&lt;h2&gt;The proof-reading&lt;/h2&gt;
&lt;p&gt;Once the book was finished, I needed people to proof-read it. This was probably the hardest part of this experiment. I needed two different types of review: a technical review, to check that the content was correct and interesting, and a language review. The latter is even more important since English is not my native language.&lt;/p&gt;
&lt;p&gt;Finding technical reviewers seemed easy at first, as I had a ton of contacts that I identified as being able to review the book. I started by asking a few people if they would be comfortable reading a single chapter and giving me feedback. I started doing that in September: having the writing and the reviews done in parallel was important to me in order to minimize latency and the book&apos;s release delay.&lt;/p&gt;
&lt;p&gt;Everyone I contacted answered positively that they would be interested in doing a technical review of a chapter. So I started sending chapters to them. But in the end, only 20% replied. And even after that, a large portion stopped reviewing after a couple of chapters.&lt;/p&gt;
&lt;p&gt;Don&apos;t get me wrong: you can&apos;t be mad at people for not wanting to spend their spare time on book editing like you do.&lt;/p&gt;
&lt;p&gt;However, from the few people who gave their time to review a few chapters, I got tremendous feedback at all levels. That was very important and helped a lot in building confidence. Writing a book alone for months, without anyone looking over your shoulder, can make you doubt that you are creating something worthwhile.&lt;/p&gt;
&lt;p&gt;As for English proof-reading, I went ahead and used &lt;a href=&quot;http://odesk.com&quot;&gt;ODesk&lt;/a&gt; to recruit a professional proof-reader. I looked for people with the right skills: a good level of English (being at least a native English speaker), the ability to understand what the book was about, and the ability to work within reasonable deadlines. I had mixed results with the people I hired, but I guess that&apos;s normal. The only error I made was not parallelizing those reviews enough, so I probably lost a couple of months on that.&lt;/p&gt;
&lt;h2&gt;The toolchain&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-python-2.png&quot; alt=&quot;the-hacker-guide-to-python-2&quot; /&gt;&lt;/p&gt;
&lt;p&gt;While writing the book, I took a few breaks to build a toolchain. What I call a toolchain is the set of tools used to render the final PDF, EPUB and MOBI files of the guide.&lt;/p&gt;
&lt;p&gt;After some research, I decided to settle on &lt;a href=&quot;http://www.methods.co.nz/asciidoc/&quot;&gt;AsciiDoc&lt;/a&gt;, using the &lt;a href=&quot;http://www.docbook.org&quot;&gt;DocBook&lt;/a&gt; output, which is then being transformed to LaTeX, and then to PDF, or either to EPUB directly. I rely on &lt;a href=&quot;http://calibre-ebook.com/&quot;&gt;Calibre&lt;/a&gt; to convert the EPUB file to MOBI. It took me a few hours to do what I wanted, using some magic LaTeX tricks to have a proper render, but it was worth it and I&apos;m particularly happy with the result.&lt;/p&gt;
&lt;p&gt;For the cover design, I asked my talented friend &lt;a href=&quot;http://nicolas-veyret.com/&quot;&gt;Nicolas&lt;/a&gt; to do something for me, and he designed the wonderful cover and its little snake!&lt;/p&gt;
&lt;h2&gt;The publishing&lt;/h2&gt;
&lt;p&gt;Publishing is an interesting topic people kept asking me about. This is what I had to answer a few dozen times:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&quot;Who is your editor?&quot;&lt;/li&gt;
&lt;li&gt;&quot;Me.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I never had any plan for asking an editor to publish this book. Nowadays, asking an editor to publish a book feels to me like asking a major company to publish a CD. It feels awkward.&lt;/p&gt;
&lt;p&gt;However, don&apos;t get me wrong: there can be a few upsides to having an editor. They will find reviewers and review your book for you. Having the book review handled for you is probably a very good thing, considering how hard it was for me to get that in place. It can be especially important for a technical book.&lt;/p&gt;
&lt;p&gt;Also, your book may end up in brick-and-mortar stores and be part of a collection, both improving visibility. That may improve your book&apos;s sales, though the editor and all the intermediaries are going to keep the largest share of the money anyway.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&quot;Oh, you will publish it yourself, great. So you will print it and sell it to people?&quot;&lt;/li&gt;
&lt;li&gt;&quot;Not really.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&apos;ve heard good stories about people using &lt;a href=&quot;http://gumroad.com&quot;&gt;Gumroad&lt;/a&gt; to sell electronic content, so after looking at competitors in that market, I picked them. I also had the idea of selling the book for Bitcoins, so I settled on &lt;a href=&quot;http://coinbase.com&quot;&gt;Coinbase&lt;/a&gt;, because they have a nice API for that.&lt;/p&gt;
&lt;p&gt;Setting everything up was quite straightforward, especially with Gumroad. It only took me a few hours. Writing the Coinbase application took a few hours too.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&quot;Oh, you will sell it only as an ebook? That&apos;s too bad. You need a paper version. Many people will want a paper version.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My initial plan was to only sell online an electronic version. On the other hand, since I kept hearing that a printed version should exist, I decided to give it a try. I chose to work with &lt;a href=&quot;http://lulu.com&quot;&gt;Lulu&lt;/a&gt; because I knew people using it, and it was pretty simple to set up.&lt;/p&gt;
&lt;h2&gt;The launch&lt;/h2&gt;
&lt;p&gt;Once I had everything ready, I built the selling page and connected everything between Mailchimp, Gumroad, Coinbase, Google Analytics, etc.&lt;/p&gt;
&lt;p&gt;Writing the launch email was really exciting. I used a Mailchimp feature to send the launch mail in several batches, just to have some margin in case of a sudden last-minute problem. But everything went fine. Hurrah!&lt;/p&gt;
&lt;p&gt;I distributed around 200 copies of the ebook in the first 48 hours, for about $5000. That covered all the costs I had from writing the book, and then some, so I was already pretty happy with the launch.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/thgtp-sell-graph.png&quot; alt=&quot;thgtp-sell-graph&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Retrospective&lt;/h2&gt;
&lt;p&gt;In retrospect, something I probably didn&apos;t do the best way possible was building a solid mailing list of interested people, and creating strong anticipation and an incentive to buy the book at launch. My mailing list counted around 1500 people who subscribed because they were interested in the launch of the book; in the end, probably only 10-15% of them bought the book during the launch, which is a bit lower than what I could have expected.&lt;/p&gt;
&lt;p&gt;But more than a month later, I have distributed in total almost 500 copies of the book (including physical units) for more than $10000, so I tend to think this was a success. I still sell a few copies of the book each week, but the numbers are small compared to the launch.&lt;/p&gt;
&lt;p&gt;I sold less than 10 copies of the ebook using Bitcoins, and I admit I&apos;m a bit disappointed and surprised about that.&lt;/p&gt;
&lt;p&gt;Physical copies represent 10% of the book&apos;s distribution. That&apos;s probably a lot lower than what most people who pushed me to do it thought it would be, but still higher than what I expected. So I would still advise having a paperback version of your book, at least because it&apos;s nice to have it in your library.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/thgtp-paperback.jpg&quot; alt=&quot;thgtp-paperback&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I only got positive feedback, a few typo notices, and absolutely no refund requests, which I really find amazing.&lt;/p&gt;
&lt;p&gt;The good news is also that I&apos;ve been contacted by a couple of Korean and Chinese publishers to get the book translated and published in those countries. If everything goes well, the book should be translated in the upcoming months and be available in those markets in 2015!&lt;/p&gt;
&lt;p&gt;If you didn&apos;t get a copy, &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;it&apos;s still time to do so&lt;/a&gt;!&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>The Hacker&apos;s Guide to Python released!</title><link>https://julien.danjou.info/blog/the-hacker-guide-to-python-has-been-released/</link><guid isPermaLink="true">https://julien.danjou.info/blog/the-hacker-guide-to-python-has-been-released/</guid><description>And done! It took me just 8 months to do this entire book project around Python. From the first day I started writing to today, where I finally publish and sell – almost entirely – myself this book. I</description><pubDate>Tue, 25 Mar 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;And done! It took me just 8 months to do this entire book project around Python. From the first day I started writing to today, where I finally publish and sell – almost entirely – myself this book. I&apos;m really proud of what I&apos;ve achieved so far, as this was something totally new to me.&lt;/p&gt;
&lt;p&gt;Doing all of that has been a great adventure, and I promise I&apos;ll write something about it later on: a making-of.&lt;/p&gt;
&lt;p&gt;For now, you can enjoy reading the book and learn a bit more about Python. I really hope it&apos;ll help you bring your Python-fu to a new level, and help you build great projects!&lt;/p&gt;
&lt;p&gt;Go check it out, and since this is first day of sale, enjoy 20% off by using the offer code &lt;strong&gt;THGTP20&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/the-hacker-guide-to-python-4.png&quot; alt=&quot;Cover of The Hacker&apos;s Guide to Python&quot; /&gt;&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>Databases integration testing strategies with Python</title><link>https://julien.danjou.info/blog/db-integration-testing-strategies-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/db-integration-testing-strategies-python/</guid><description>The Ceilometer project supports various database backends that can be used as storage. Among them are MongoDB, SQLite, MySQL, PostgreSQL, HBase, DB2… All Ceilometer&apos;s code is unit tested, but when.</description><pubDate>Mon, 06 Jan 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The &lt;a href=&quot;http://launchpad.net/ceilometer&quot;&gt;Ceilometer&lt;/a&gt; project supports various database backends that can be used as storage. Among them are &lt;a href=&quot;http://www.mongodb.org/&quot;&gt;MongoDB&lt;/a&gt;, &lt;a href=&quot;http://sqlite.org&quot;&gt;SQLite&lt;/a&gt;, &lt;a href=&quot;http://mysql.com&quot;&gt;MySQL&lt;/a&gt;, &lt;a href=&quot;http://postgresql.org&quot;&gt;PostgreSQL&lt;/a&gt;, &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;HBase&lt;/a&gt;, DB2… All Ceilometer&apos;s code is unit tested, but when dealing with external storage services, one cannot be sure that the code is really working. You could be inserting data with an incorrect SQL statement, or into the wrong table. Only having the real database storage running and being used can tell you that.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/python_db_tests.png&quot; alt=&quot;python_db_tests&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Over the months, we developed integration testing on top of our unit testing to validate that our storage drivers are able to deal with real world databases. That is not really different from generic &lt;a href=&quot;http://en.wikipedia.org/wiki/Integration_testing&quot;&gt;integration testing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Integration testing is about plugging all the pieces of your software together and running it. In what I call &quot;database integration testing&quot;, the pieces are both your software and the database system that you are going to rely on.&lt;/p&gt;
&lt;p&gt;The only difference here is that one of the modules is not coming from the application itself but is an external project. The type of database that you use (RDBMS, NoSQL…) does not matter. Taking a step back, what I will describe here could also apply to a lot of other software modules, even something that would not be a database system at all.&lt;/p&gt;
&lt;h3&gt;Writing tests for integration&lt;/h3&gt;
&lt;p&gt;Presumably, your Python application has unit tests. In order to test against a database back-end, you need to write a few specific classes of tests that will use the database subsystem for real. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import unittest
import os
import sqlalchemy

class TestDB(unittest.TestCase):
    def setUp(self):
       url = os.getenv(&quot;DB_TEST_URL&quot;)
       if not url:
           self.skipTest(&quot;No database URL set&quot;)
       self.engine = sqlalchemy.create_engine(url)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code will try to fetch the database URL to use from an environment variable, and then will rely on &lt;a href=&quot;http://sqlalchemy.org&quot;&gt;SQLAlchemy&lt;/a&gt; to create a database connection.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import unittest
import os
import sqlalchemy

import myapp

class TestDB(unittest.TestCase):
    def setUp(self):
       url = os.getenv(&quot;DB_TEST_URL&quot;)
       if not url:
           self.skipTest(&quot;No database URL set&quot;)
       self.engine = sqlalchemy.create_engine(url)

    def test_foobar(self):
        self.assertTrue(myapp.store_integer(self.engine, 42))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can then add as many tests as you want, using the connection stored in &lt;code&gt;self.engine&lt;/code&gt;. If no test database URL is set, the tests will be skipped; that decision is up to you, however. You may want to have these tests always run, and fail if they can&apos;t be.&lt;/p&gt;
&lt;p&gt;In the &lt;code&gt;setUp()&lt;/code&gt; method, you may also need to do more work, like creating a database and deleting it afterwards.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import unittest
import os
import sqlalchemy

class TestDB(unittest.TestCase):
    def setUp(self):
       url = os.getenv(&quot;DB_TEST_URL&quot;)
       if not url:
           self.skipTest(&quot;No database URL set&quot;)
       self.engine = sqlalchemy.create_engine(url)
       self.connection = self.engine.connect()
       self.connection.execute(&quot;CREATE DATABASE testdb&quot;)

    def tearDown(self):
        self.connection.execute(&quot;DROP DATABASE testdb&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will make sure that the database you need is clean and ready to be used for testing.&lt;/p&gt;
&lt;h3&gt;Launching modules, a.k.a. databases&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/postgresql.png&quot; alt=&quot;postgresql&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The main problem we encountered when building integration testing with databases is finding a way to start them. Most users are used to starting them system-wide with some sort of init script, but when running sandboxed tests, that is not really a good option. Browsing the documentation of each storage system allowed us to find a way to start them in the foreground and control them &quot;interactively&quot; via a shell script.&lt;/p&gt;
&lt;p&gt;The following is a script that you can use to run Python tests using &lt;a href=&quot;http://nose.readthedocs.org/&quot;&gt;nose&lt;/a&gt;; it is heavily inspired by the one we wrote for Ceilometer.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/bash
set -e

clean_exit() {
    local error_code=&quot;$?&quot;
    kill -9 $(jobs -p) &amp;gt;/dev/null 2&amp;gt;&amp;amp;1 || true
    rm -rf &quot;$PGSQL_DATA&quot;
    return $error_code
}

check_for_cmd () {
    if ! which &quot;$1&quot; &amp;gt;/dev/null 2&amp;gt;&amp;amp;1
    then
        echo &quot;Could not find $1 command&quot; 1&amp;gt;&amp;amp;2
        exit 1
    fi
}

wait_for_line () {
    while read line
    do
        echo &quot;$line&quot; | grep -q &quot;$1&quot; &amp;amp;&amp;amp; break
    done &amp;lt; &quot;$2&quot;
    # Read the fifo for ever otherwise process would block
    cat &quot;$2&quot; &amp;gt;/dev/null &amp;amp;
}

check_for_cmd postgres

trap &quot;clean_exit&quot; EXIT

## Start PostgreSQL process for tests
PGSQL_DATA=`mktemp -d /tmp/PGSQL-XXXXX`
PGSQL_PATH=`pg_config --bindir`
${PGSQL_PATH}/initdb ${PGSQL_DATA}
mkfifo ${PGSQL_DATA}/out
${PGSQL_PATH}/postgres -F -k ${PGSQL_DATA} -D ${PGSQL_DATA} &amp;amp;&amp;gt; ${PGSQL_DATA}/out &amp;amp;
## Wait for PostgreSQL to start listening to connections
wait_for_line &quot;database system is ready to accept connections&quot; ${PGSQL_DATA}/out
export DB_TEST_URL=&quot;postgresql:///?host=${PGSQL_DATA}&amp;amp;dbname=template1&quot;

## Run the tests
nosetests
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you use &lt;a href=&quot;http://tox.readthedocs.org&quot;&gt;tox&lt;/a&gt; to automate your test runs, you can use this script (I call it &lt;code&gt;run-tests.sh&lt;/code&gt;) in your &lt;code&gt;tox.ini&lt;/code&gt; file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[testenv]
commands = {toxinidir}/run-tests.sh {posargs}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/mysql.png&quot; alt=&quot;mysql&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Most databases are able to run in some sort of standalone mode, where you can connect to them using either a Unix domain socket or a fixed port. Here are the snippets used in Ceilometer to run with MongoDB and MySQL:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Start MongoDB process for tests
MONGO_DATA=$(mktemp -d /tmp/MONGODB-XXXXX)
MONGO_PORT=29000
mkfifo ${MONGO_DATA}/out
mongod --maxConns 32 --nojournal --noprealloc --smallfiles --quiet --noauth --port ${MONGO_PORT} --dbpath &quot;${MONGO_DATA}&quot; --bind_ip localhost &amp;amp;&amp;gt;${MONGO_DATA}/out &amp;amp;
## Wait for Mongo to start listening to connections
wait_for_line &quot;waiting for connections on port ${MONGO_PORT}&quot; ${MONGO_DATA}/out
export DB_TEST_URL=&quot;mongodb://localhost:${MONGO_PORT}/testdb&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/mongodb.png&quot; alt=&quot;mongodb&quot; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Start MySQL process for tests
MYSQL_DATA=$(mktemp -d /tmp/MYSQL-XXXXX)
mkfifo ${MYSQL_DATA}/out
mysqld --datadir=${MYSQL_DATA} --pid-file=${MYSQL_DATA}/mysql.pid --socket=${MYSQL_DATA}/mysql.socket --skip-networking --skip-grant-tables &amp;amp;&amp;gt; ${MYSQL_DATA}/out &amp;amp;
## Wait for MySQL to start listening to connections
wait_for_line &quot;mysqld: ready for connections.&quot; ${MYSQL_DATA}/out
export DB_TEST_URL=&quot;mysql://root@localhost/testdb?unix_socket=${MYSQL_DATA}/mysql.socket&amp;amp;charset=utf8&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The mechanism is always the same: we create a &lt;em&gt;fifo&lt;/em&gt; with &lt;code&gt;mkfifo&lt;/code&gt;, then run the database daemon with its output redirected to that fifo. We then read from it until we find a line stating that the database is ready to be used. At that point, we can continue and start running the tests. You have to keep reading from the fifo continuously, otherwise the process writing to it will block. We redirect the output to &lt;code&gt;/dev/null&lt;/code&gt;, but you could also redirect it to a log file, or not at all.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: &lt;a href=&quot;http://www.die-welt.net/&quot;&gt;Evgeni Golov&lt;/a&gt; pointed out that there exists a &lt;a href=&quot;https://alioth.debian.org/scm/loggerhead/pkg-postgresql/postgresql-common/trunk/view/head:/pg_virtualenv&quot;&gt;pg_virtualenv&lt;/a&gt; for PostgreSQL and a &lt;a href=&quot;https://github.com/evgeni/my_virtualenv&quot;&gt;my_virtualenv&lt;/a&gt; for MySQL that do the same kind of thing, but with more bells and whistles.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;One step further: using parallelism and scenarios&lt;/h3&gt;
&lt;p&gt;The described approach is quite simple, as it only supports one database type. When using an abstraction layer such as SQLAlchemy, it would be a good idea to run all these tests against different RDBMS, such as MySQL and PostgreSQL for example.&lt;/p&gt;
&lt;p&gt;The snippets above allow running both RDBMS in parallel, but the classic approach of unit tests does not take advantage of that. Using one scenario for each database backend is a great way to do it. To that end, you can use the &lt;a href=&quot;https://launchpad.net/testscenarios&quot;&gt;testscenarios&lt;/a&gt; library.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import unittest
import os
import sqlalchemy
import testscenarios

import myapp

load_tests = testscenarios.load_tests_apply_scenarios

class TestDB(unittest.TestCase):
    scenarios = [
        (&apos;mysql&apos;, dict(database_connection=os.getenv(&quot;MYSQL_TEST_URL&quot;))),
        (&apos;postgresql&apos;, dict(database_connection=os.getenv(&quot;PGSQL_TEST_URL&quot;))),
    ]

    def setUp(self):
       if not self.database_connection:
           self.skipTest(&quot;No database URL set&quot;)
       self.engine = sqlalchemy.create_engine(self.database_connection)
       self.connection = self.engine.connect()
       self.connection.execute(&quot;CREATE DATABASE testdb&quot;)

    def tearDown(self):
        self.connection.execute(&quot;DROP DATABASE testdb&quot;)

    def test_foobar(self):
        self.assertTrue(myapp.store_integer(self.engine, 42))
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;$ python -m subunit.run test_scenario | subunit2pyunit
test_scenario.TestDB.test_foobar(mysql)
test_scenario.TestDB.test_foobar(mysql) ... ok
test_scenario.TestDB.test_foobar(postgresql)
test_scenario.TestDB.test_foobar(postgresql) ... ok

---------------------------------------------------------
Ran 2 tests in 0.061s

OK
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To speed up test runs, you could also run the tests in parallel. That can be interesting, as you&apos;ll be able to spread the workload among a lot of different CPUs. However, note that it can require a different database for each test, or a locking mechanism to be in place: it&apos;s likely that your tests won&apos;t all be able to work on only one database at the same time.&lt;/p&gt;
&lt;p&gt;(Both usage of scenarios and parallelism in testing will be covered in &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;, in case you wonder.)&lt;/p&gt;
</content:encoded><category>python</category><category>openstack</category></item><item><title>Python 3.4 single dispatch, a step into generic functions</title><link>https://julien.danjou.info/blog/python-3-4-single-dispatch-generic-function/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-3-4-single-dispatch-generic-function/</guid><description>I love to say that Python is a nice subset of Lisp, and I discover that it&apos;s getting even more true as time passes.</description><pubDate>Tue, 17 Sep 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I love to say that Python is a nice subset of Lisp, and I discover that it&apos;s getting even more true as time passes. Recently, I&apos;ve stumbled upon the &lt;a href=&quot;http://python.org/dev/peps/pep-0443/&quot;&gt;PEP 443&lt;/a&gt; that describes a way to dispatch generic functions, in a way that looks like what CLOS, the Common Lisp Object System, provides.&lt;/p&gt;
&lt;h2&gt;What are generic functions&lt;/h2&gt;
&lt;p&gt;If you come from the Lisp world, this won&apos;t be something new to you. The Lisp object system provides a really good way to define and handle method dispatching; it&apos;s a cornerstone of the Common Lisp Object System. For my own pleasure of seeing Lisp code in a Python post, I&apos;ll show you how generic methods work in Lisp first.&lt;/p&gt;
&lt;p&gt;To begin, let&apos;s define a few very simple classes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(defclass snare-drum ()
  ())

(defclass cymbal ()
  ())

(defclass stick ()
  ())

(defclass brushes ()
  ())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This defines a few classes: &lt;code&gt;snare-drum&lt;/code&gt;, &lt;code&gt;cymbal&lt;/code&gt;, &lt;code&gt;stick&lt;/code&gt; and &lt;code&gt;brushes&lt;/code&gt;, without any parent class or attribute. These classes compose a drum kit, and we can combine them to play sound. So we define a &lt;code&gt;play&lt;/code&gt; method that takes two arguments and returns a sound (as a string).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(defgeneric play (instrument accessory)
  (:documentation &quot;Play sound with instrument and accessory.&quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This only defines a generic method: it has no body, and cannot be called with any instance yet. At this stage, we only inform the object system that the method is generic and can then be implemented for various types of arguments. We&apos;ll start by implementing versions of this method that know how to play with the snare-drum.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(defmethod play ((instrument snare-drum) (accessory stick))
  &quot;POC!&quot;)

(defmethod play ((instrument snare-drum) (accessory brushes))
  &quot;SHHHH!&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we just defined concrete methods with code. They also take two arguments: &lt;code&gt;instrument&lt;/code&gt;, which is an instance of &lt;code&gt;snare-drum&lt;/code&gt;, and &lt;code&gt;accessory&lt;/code&gt;, which is an instance of &lt;code&gt;stick&lt;/code&gt; or &lt;code&gt;brushes&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;At this stage, you should note the first difference with an object system built into a language like Python: the method isn&apos;t tied to any class in particular. The methods are &lt;em&gt;generic&lt;/em&gt;, and any class can implement them, or not.&lt;/p&gt;
&lt;p&gt;Let&apos;s try it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;* (play (make-instance &apos;snare-drum) (make-instance &apos;stick))
&quot;POC!&quot;

* (play (make-instance &apos;snare-drum) (make-instance &apos;brushes))
&quot;SHHHH!&quot;

* (play (make-instance &apos;cymbal) (make-instance &apos;stick))
debugger invoked on a SIMPLE-ERROR in thread
#&amp;lt;THREAD &quot;main thread&quot; RUNNING {1002ADAF23}&amp;gt;:
  There is no applicable method for the generic function
    #&amp;lt;STANDARD-GENERIC-FUNCTION PLAY (2)&amp;gt;
  when called with arguments
    (#&amp;lt;CYMBAL {1002B801D3}&amp;gt; #&amp;lt;STICK {1002B82763}&amp;gt;).

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY] Retry calling the generic function.
  1: [ABORT] Exit debugger, returning to top level.

((:METHOD NO-APPLICABLE-METHOD (T)) #&amp;lt;STANDARD-GENERIC-FUNCTION PLAY (2)&amp;gt; #&amp;lt;CYMBAL {1002B801D3}&amp;gt; #&amp;lt;STICK {1002B82763}&amp;gt;) [fast-method]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you see, the function called depends on the classes of the arguments. The object system &lt;strong&gt;dispatches&lt;/strong&gt; the function calls to the right function for us, depending on the argument classes. If we call &lt;code&gt;play&lt;/code&gt; with instances that are not known to the object system, an error is thrown.&lt;/p&gt;
&lt;p&gt;Inheritance is also supported, and the (more powerful and less error prone) equivalent of Python&apos;s &lt;code&gt;super()&lt;/code&gt; is available via &lt;code&gt;(call-next-method)&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(defclass snare-drum () ())
(defclass cymbal () ())

(defclass accessory () ())
(defclass stick (accessory) ())
(defclass brushes (accessory) ())

(defmethod play ((c cymbal) (a accessory))
  &quot;BIIING!&quot;)

(defmethod play ((c cymbal) (b brushes))
  (concatenate &apos;string &quot;SSHHHH!&quot; (call-next-method)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this example, we define the &lt;code&gt;stick&lt;/code&gt; and &lt;code&gt;brushes&lt;/code&gt; classes as subclasses of the &lt;code&gt;accessory&lt;/code&gt; class. The &lt;code&gt;play&lt;/code&gt; method defined will return the sound &lt;em&gt;BIIING!&lt;/em&gt; regardless of the accessory instance used to play the cymbal, except when it&apos;s a &lt;code&gt;brushes&lt;/code&gt; instance: the most specific method is always called. The &lt;code&gt;(call-next-method)&lt;/code&gt; function is used to call the closest parent method, in this case the one returning &lt;em&gt;&quot;BIIING!&quot;&lt;/em&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;* (play (make-instance &apos;cymbal) (make-instance &apos;stick))
&quot;BIIING!&quot;

* (play (make-instance &apos;cymbal) (make-instance &apos;brushes))
&quot;SSHHHH!BIIING!&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that CLOS is also able to dispatch on object instances themselves by using the &lt;code&gt;eql&lt;/code&gt; specializer.&lt;/p&gt;
&lt;p&gt;But if you&apos;re really curious about all features CLOS provides, I suggest you read the &lt;a href=&quot;http://www.aiai.ed.ac.uk/~jeff/clos-guide.html&quot;&gt;brief guide to CLOS by Jeff Dalton&lt;/a&gt; as a starter.&lt;/p&gt;
&lt;h2&gt;Python implementation&lt;/h2&gt;
&lt;p&gt;Python implements a simpler equivalent of this workflow with the &lt;code&gt;singledispatch&lt;/code&gt; function. It will be shipped with Python 3.4 as part of the &lt;code&gt;functools&lt;/code&gt; module. Here&apos;s a rough equivalent of the above Lisp program.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import functools

class SnareDrum(object): pass
class Cymbal(object): pass
class Stick(object): pass
class Brushes(object): pass

@functools.singledispatch
def play(instrument, accessory):
    raise NotImplementedError(&quot;Cannot play these&quot;)

@play.register(SnareDrum)
def _(instrument, accessory):
    if isinstance(accessory, Stick):
        return &quot;POC!&quot;
    if isinstance(accessory, Brushes):
        return &quot;SHHHH!&quot;
    raise NotImplementedError(&quot;Cannot play these&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We define our four classes, and a base &lt;code&gt;play&lt;/code&gt; function that raises &lt;code&gt;NotImplementedError&lt;/code&gt;, indicating that by default we don&apos;t know what to do. We can then write a specialized version of this function for a first instrument, the &lt;code&gt;SnareDrum&lt;/code&gt;. We check the type of the accessory we get, and return the appropriate sound, or raise &lt;code&gt;NotImplementedError&lt;/code&gt; again if we don&apos;t know what to do with it.&lt;/p&gt;
&lt;p&gt;If we run it, it works as expected:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; play(SnareDrum(), Stick())
&apos;POC!&apos;
&amp;gt;&amp;gt;&amp;gt; play(SnareDrum(), Brushes())
&apos;SHHHH!&apos;
&amp;gt;&amp;gt;&amp;gt; play(Cymbal(), Brushes())
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
  File &quot;/home/jd/Source/cpython/Lib/functools.py&quot;, line 562, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File &quot;/home/jd/sd.py&quot;, line 10, in play
    raise NotImplementedError(&quot;Cannot play these&quot;)
NotImplementedError: Cannot play these
&amp;gt;&amp;gt;&amp;gt; play(SnareDrum(), Cymbal())
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
  File &quot;/home/jd/Source/cpython/Lib/functools.py&quot;, line 562, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File &quot;/home/jd/sd.py&quot;, line 18, in _
    raise NotImplementedError(&quot;Cannot play these&quot;)
NotImplementedError: Cannot play these
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;singledispatch&lt;/code&gt; mechanism looks through the class hierarchy of the first argument passed to the &lt;code&gt;play&lt;/code&gt; function, and calls the right version of it. The first defined version of the &lt;code&gt;play&lt;/code&gt; function is always registered for the &lt;code&gt;object&lt;/code&gt; class, so if our instrument is an instance of a class we did not register, this base function is called.&lt;/p&gt;
&lt;p&gt;For those eager to try and use it, the &lt;code&gt;singledispatch&lt;/code&gt; function is &lt;a href=&quot;https://pypi.python.org/pypi/singledispatch/&quot;&gt;provided for Python 2.6 to 3.3 through the Python Package Index&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Limitations&lt;/h2&gt;
&lt;p&gt;First, as you noticed in the Lisp version, CLOS provides a multiple dispatcher that can dispatch on the type of &lt;strong&gt;any of the arguments&lt;/strong&gt; defined in the method prototype, not only the first one. Unfortunately, Python&apos;s dispatcher is named &lt;em&gt;singledispatch&lt;/em&gt; for a good reason: it only knows how to dispatch on the first argument. Guido van Rossum wrote a short article about the subject, which he called &lt;a href=&quot;http://www.artima.com/weblogs/viewpost.jsp?thread=101605&quot;&gt;multimethods&lt;/a&gt;, a few years ago.&lt;/p&gt;
&lt;p&gt;Then, there&apos;s no way to call the parent function directly. There&apos;s no equivalent of Lisp&apos;s &lt;code&gt;(call-next-method)&lt;/code&gt;, nor of the &lt;code&gt;super()&lt;/code&gt; function from Python&apos;s class system. This means you have to use various tricks to bypass this limitation.&lt;/p&gt;
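&lt;p&gt;One such trick, continuing the snare-drum example above, is to look up another implementation explicitly with the &lt;code&gt;dispatch()&lt;/code&gt; method that &lt;code&gt;singledispatch&lt;/code&gt;-decorated functions expose. A sketch, not a real &lt;code&gt;(call-next-method)&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@play.register(Cymbal)
def _(instrument, accessory):
    if isinstance(accessory, Brushes):
        return &quot;SSHHHH!&quot;
    # No super() for generic functions: fetch and call the
    # implementation registered for object, i.e. the base play
    # function (which raises NotImplementedError here).
    return play.dispatch(object)(instrument, accessory)
&lt;/code&gt;&lt;/pre&gt;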
&lt;p&gt;So while I am really glad that Python is going toward that direction, as it&apos;s a really powerful way to enhance an object system, it really lacks a lot of more advanced features that CLOS provides out of the box.&lt;/p&gt;
&lt;p&gt;Though, improving this could be an interesting challenge. Especially to bring more CLOS power to &lt;a href=&quot;http://hylang.org&quot;&gt;Hy&lt;/a&gt;. :-)&lt;/p&gt;
</content:encoded><category>python</category><category>lisp</category></item><item><title>Announcing The Hacker&apos;s Guide to Python</title><link>https://julien.danjou.info/blog/announcing-the-hacker-guide-to-python/</link><guid isPermaLink="true">https://julien.danjou.info/blog/announcing-the-hacker-guide-to-python/</guid><description>I&apos;ve been hacking on Python for a lot of years now, on various projects. For the last two years, I&apos;ve been heavily involved in OpenStack, which makes heavy use of Python.</description><pubDate>Tue, 03 Sep 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve been hacking on Python for a lot of years now, on various projects. For the last two years, I&apos;ve been heavily involved in &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt;, which makes heavy use of Python.&lt;/p&gt;
&lt;p&gt;Once you start working with a hundred hackers, on several pieces of software and libraries representing more than half a million lines of Python source code, things change. The scalability, testing and deployment problems inherent to a cloud platform weigh on every aspect of component design.&lt;/p&gt;
&lt;p&gt;During these two years working on OpenStack development, I&apos;ve learned a lot on Python from astounding Python hackers. From general architecture and design principles to various tips and tricks of the language.&lt;/p&gt;
&lt;p&gt;It seemed to me like a good opportunity to share all this, so you can benefit from it in other projects. I&apos;ve started working on a book, entitled &quot;The Hacker&apos;s Guide to Python&quot;, where I will try to share what I learnt while working with Python.&lt;/p&gt;
&lt;p&gt;The book is still a work in progress at this stage, but if you&apos;d like to get in touch and keep updated on its advancement, you can subscribe in the following form or from the &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;book homepage&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>books</category><category>python</category></item><item><title>The definitive guide on how to use static, class or abstract methods in Python</title><link>https://julien.danjou.info/blog/guide-python-static-class-abstract-methods/</link><guid isPermaLink="true">https://julien.danjou.info/blog/guide-python-static-class-abstract-methods/</guid><description>Doing code reviews is a great way to discover things that people might struggle to comprehend. While proof-reading OpenStack patches recently, I spotted that people were not correctly using the.</description><pubDate>Thu, 01 Aug 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Doing code reviews is a great way to discover things that people might struggle to comprehend. While proof-reading &lt;a href=&quot;http://review.openstack.org&quot;&gt;OpenStack patches&lt;/a&gt; recently, I spotted that people were not correctly using the various decorators Python provides for methods. So here&apos;s my attempt at providing myself with a link to send people in my next code reviews. :-)&lt;/p&gt;
&lt;h2&gt;How methods work in Python&lt;/h2&gt;
&lt;p&gt;A method is a function that is stored as a class attribute. You can declare and access such a function this way:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; class Pizza(object):
...     def __init__(self, size):
...         self.size = size
...     def get_size(self):
...         return self.size
...
&amp;gt;&amp;gt;&amp;gt; Pizza.get_size
&amp;lt;unbound method Pizza.get_size&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What Python tells you here is that the attribute &lt;em&gt;get_size&lt;/em&gt; of the class &lt;em&gt;Pizza&lt;/em&gt; is a method that is &lt;strong&gt;unbound&lt;/strong&gt;. What does this mean? We&apos;ll know as soon as we try to call it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Pizza.get_size()
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
TypeError: unbound method get_size() must be called with Pizza instance as first argument (got nothing instead)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can&apos;t call it because it&apos;s not bound to any instance of &lt;em&gt;Pizza&lt;/em&gt;. And a method wants an instance as its first argument (in Python 2 it &lt;strong&gt;must&lt;/strong&gt; be an instance of that class; in Python 3 it could be anything). Let&apos;s try to do that then:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Pizza.get_size(Pizza(42))
42
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It worked! We called the method with an instance as its first argument, so everything&apos;s fine. But you will agree with me if I say this is not a very handy way to call methods; we have to refer to the class each time we want to call one. And if we don&apos;t know what class our object is, this is not going to work for very long.&lt;/p&gt;
&lt;p&gt;So what Python does for us, is that it binds all the methods from the class &lt;code&gt;Pizza&lt;/code&gt; to any instance of this class. This means that the attribute &lt;code&gt;get_size&lt;/code&gt; of an instance of &lt;code&gt;Pizza&lt;/code&gt; is a bound method: a method for which the first argument will be the instance itself.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Pizza(42).get_size
&amp;lt;bound method Pizza.get_size of &amp;lt;__main__.Pizza object at 0x7f3138827910&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&amp;gt; Pizza(42).get_size()
42
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As expected, we don&apos;t have to provide any argument to &lt;code&gt;get_size&lt;/code&gt;, since it&apos;s bound, its &lt;code&gt;self&lt;/code&gt; argument is automatically set to our &lt;code&gt;Pizza&lt;/code&gt; instance. Here&apos;s an even better proof of that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; m = Pizza(42).get_size
&amp;gt;&amp;gt;&amp;gt; m()
42
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Indeed, you don&apos;t even have to keep a reference to your &lt;code&gt;Pizza&lt;/code&gt; object. Its method is bound to the object, so the method is self-sufficient.&lt;/p&gt;
&lt;p&gt;But what if you wanted to know which object this bound method is bound to? Here&apos;s a little trick:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; m = Pizza(42).get_size
&amp;gt;&amp;gt;&amp;gt; m.__self__
&amp;lt;__main__.Pizza object at 0x7f3138827910&amp;gt;
&amp;gt;&amp;gt;&amp;gt; # You could guess, look at this:
...
&amp;gt;&amp;gt;&amp;gt; m == m.__self__.get_size
True
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Obviously, we still have a reference to our object, and we can find it back if we want.&lt;/p&gt;
&lt;p&gt;In Python 3, the functions attached to a class are not considered unbound methods anymore, but simple functions, which are bound to an object if required. The principle stays the same; the model is just simplified.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; class Pizza(object):
...     def __init__(self, size):
...         self.size = size
...     def get_size(self):
...         return self.size
...
&amp;gt;&amp;gt;&amp;gt; Pizza.get_size
&amp;lt;function Pizza.get_size at 0x7f307f984dd0&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Static methods&lt;/h2&gt;
&lt;p&gt;Static methods are a special case of methods. Sometimes, you&apos;ll write code that belongs to a class, but that doesn&apos;t use the object itself at all. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Pizza(object):
    @staticmethod
    def mix_ingredients(x, y):
        return x + y

    def cook(self):
        return self.mix_ingredients(self.cheese, self.vegetables)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In such a case, writing &lt;code&gt;mix_ingredients&lt;/code&gt; as a non-static method would work too, but it would provide it with a &lt;code&gt;self&lt;/code&gt; argument that would not be used. Here, the decorator &lt;code&gt;@staticmethod&lt;/code&gt; buys us several things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Python doesn&apos;t have to instantiate a bound-method for each &lt;code&gt;Pizza&lt;/code&gt; object we instantiate. Bound methods are objects too, and creating them has a cost. Having a static method avoids that:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Pizza().cook is Pizza().cook
False
&amp;gt;&amp;gt;&amp;gt; Pizza().mix_ingredients is Pizza.mix_ingredients
True
&amp;gt;&amp;gt;&amp;gt; Pizza().mix_ingredients is Pizza().mix_ingredients
True
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It eases the readability of the code: seeing &lt;code&gt;@staticmethod&lt;/code&gt;, we know that the method does not depend on the state of the object itself;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It allows us to override the &lt;code&gt;mix_ingredients&lt;/code&gt; method in a subclass. If we used a function &lt;code&gt;mix_ingredients&lt;/code&gt; defined at the top-level of our module, a class inheriting from &lt;code&gt;Pizza&lt;/code&gt; wouldn&apos;t be able to change the way we mix ingredients for our pizza without overriding &lt;code&gt;cook&lt;/code&gt; itself.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Class methods&lt;/h2&gt;
&lt;p&gt;Having said that, what are class methods? Class methods are methods that are&lt;br /&gt;
not bound to an object, but to… a class!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; class Pizza(object):
...     radius = 42
...     @classmethod
...     def get_radius(cls):
...         return cls.radius
... 
&amp;gt;&amp;gt;&amp;gt; 
&amp;gt;&amp;gt;&amp;gt; Pizza.get_radius
&amp;lt;bound method type.get_radius of &amp;lt;class &apos;__main__.Pizza&apos;&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&amp;gt; Pizza().get_radius
&amp;lt;bound method type.get_radius of &amp;lt;class &apos;__main__.Pizza&apos;&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&amp;gt; Pizza.get_radius == Pizza().get_radius
True
&amp;gt;&amp;gt;&amp;gt; Pizza.get_radius()
42
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Whichever way you access this method, it will always be bound to the class it is attached to, and its first argument will be the class itself (remember that classes are objects too).&lt;/p&gt;
&lt;p&gt;When should you use this kind of method? Class methods are mostly useful for two types of methods:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Factory methods, which are used to create an instance of a class, using for example some sort of pre-processing. If we used a &lt;code&gt;@staticmethod&lt;/code&gt; instead, we would have to hardcode the &lt;code&gt;Pizza&lt;/code&gt; class name in our function, making any class inheriting from &lt;code&gt;Pizza&lt;/code&gt; unable to use our factory for its own use.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;class Pizza(object):
    def __init__(self, ingredients):
        self.ingredients = ingredients

    @classmethod
    def from_fridge(cls, fridge):
        return cls(fridge.get_cheese() + fridge.get_vegetables())
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Static methods calling static methods: if you split a static method into several static methods, you shouldn&apos;t hard-code the class name but use class methods. Declared this way, the &lt;code&gt;Pizza&lt;/code&gt; name is never directly referenced, and inheritance and method overriding will work flawlessly.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;class Pizza(object):
    def __init__(self, radius, height):
        self.radius = radius
        self.height = height

    @staticmethod
    def compute_area(radius):
         return math.pi * (radius ** 2)

    @classmethod
    def compute_volume(cls, height, radius):
         return height * cls.compute_area(radius)

    def get_volume(self):
        return self.compute_volume(self.height, self.radius)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Abstract methods&lt;/h2&gt;
&lt;p&gt;An abstract method is a method defined in a base class, but that may not provide any implementation. In Java, it would describe the methods of an interface.&lt;/p&gt;
&lt;p&gt;So the simplest way to write an abstract method in Python is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Pizza(object):
    def get_radius(self):
        raise NotImplementedError
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any class inheriting from &lt;code&gt;Pizza&lt;/code&gt; should implement and override the &lt;code&gt;get_radius&lt;/code&gt; method, otherwise an exception would be raised.&lt;/p&gt;
&lt;p&gt;This particular way of implementing abstract methods has a drawback. If you write a class that inherits from &lt;code&gt;Pizza&lt;/code&gt; and forget to implement &lt;code&gt;get_radius&lt;/code&gt;, the error will only be raised when you try to use that method.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Pizza()
&amp;lt;__main__.Pizza object at 0x7fb747353d90&amp;gt;
&amp;gt;&amp;gt;&amp;gt; Pizza().get_radius()
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 3, in get_radius
NotImplementedError
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There&apos;s a way to trigger this way earlier, when the object is being instantiated, using the &lt;a href=&quot;http://docs.python.org/2/library/abc.html&quot;&gt;abc&lt;/a&gt; module that&apos;s provided with Python.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import abc

class BasePizza(object):
    __metaclass__  = abc.ABCMeta

    @abc.abstractmethod
    def get_radius(self):
         &quot;&quot;&quot;Method that should do something.&quot;&quot;&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;abc&lt;/code&gt; and its special class, as soon as you try to instantiate &lt;code&gt;BasePizza&lt;/code&gt; or any class inheriting from it, you&apos;ll get a &lt;code&gt;TypeError&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; BasePizza()
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
TypeError: Can&apos;t instantiate abstract class BasePizza with abstract methods get_radius
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Mixing static, class and abstract methods&lt;/h2&gt;
&lt;p&gt;When building classes and inheritance trees, the time will come when you have to mix all these method decorators. So here are some tips about it.&lt;/p&gt;
&lt;p&gt;Keep in mind that declaring a method abstract doesn&apos;t freeze its prototype. That means it must be implemented, but it can be implemented with any argument list.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import abc

class BasePizza(object):
    __metaclass__  = abc.ABCMeta

    @abc.abstractmethod
    def get_ingredients(self):
         &quot;&quot;&quot;Returns the ingredient list.&quot;&quot;&quot;

class Calzone(BasePizza):
    def get_ingredients(self, with_egg=False):
        # Use a list in both branches so the concatenation always works.
        egg = [Egg()] if with_egg else []
        return self.ingredients + egg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is valid, since &lt;code&gt;Calzone&lt;/code&gt; fulfills the interface requirement we defined for &lt;code&gt;BasePizza&lt;/code&gt; objects. That means that we could also implement it as being a class or a static method, for example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import abc

class BasePizza(object):
    __metaclass__  = abc.ABCMeta

    @abc.abstractmethod
    def get_ingredients(self):
         &quot;&quot;&quot;Returns the ingredient list.&quot;&quot;&quot;

class DietPizza(BasePizza):
    @staticmethod
    def get_ingredients():
        return None
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is also correct and fulfills the contract we have with our abstract &lt;code&gt;BasePizza&lt;/code&gt; class. The fact that the &lt;code&gt;get_ingredients&lt;/code&gt; method doesn&apos;t need to know about the object to return a result is an implementation detail, not a criterion for fulfilling our contract.&lt;/p&gt;
&lt;p&gt;Therefore, you can&apos;t force an implementation of your abstract method to be a regular, class or static method, and arguably you shouldn&apos;t. Starting with Python 3 (this won&apos;t work as you would expect in Python 2, see &lt;a href=&quot;http://bugs.python.org/issue5867&quot;&gt;issue5867&lt;/a&gt;), it&apos;s now possible to use the &lt;code&gt;@staticmethod&lt;/code&gt; and &lt;code&gt;@classmethod&lt;/code&gt; decorators on top of &lt;code&gt;@abstractmethod&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import abc

class BasePizza(object):
    __metaclass__  = abc.ABCMeta

    ingredients = [&apos;cheese&apos;]

    @classmethod
    @abc.abstractmethod
    def get_ingredients(cls):
         &quot;&quot;&quot;Returns the ingredient list.&quot;&quot;&quot;
         return cls.ingredients
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Don&apos;t misread this: if you think this is going to force your subclasses to implement &lt;code&gt;get_ingredients&lt;/code&gt; as a class method, you are wrong. This simply implies that your implementation of &lt;code&gt;get_ingredients&lt;/code&gt; in the &lt;code&gt;BasePizza&lt;/code&gt; class is a class method.&lt;/p&gt;
&lt;p&gt;An implementation in an abstract method? Yes! In Python, contrary to methods in Java interfaces, you can have code in your abstract methods and call it via &lt;code&gt;super()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import abc

class BasePizza(object):
    __metaclass__  = abc.ABCMeta

    default_ingredients = [&apos;cheese&apos;]

    @classmethod
    @abc.abstractmethod
    def get_ingredients(cls):
         &quot;&quot;&quot;Returns the ingredient list.&quot;&quot;&quot;
         return cls.default_ingredients

class DietPizza(BasePizza):
    def get_ingredients(self):
        return [&apos;egg&apos;] + super(DietPizza, self).get_ingredients()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In such a case, every pizza you will build by inheriting from &lt;code&gt;BasePizza&lt;/code&gt; will have to override the &lt;code&gt;get_ingredients&lt;/code&gt; method, but will be able to use the default mechanism to get the ingredient list by using &lt;code&gt;super()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you&apos;re interested in knowing more, I&apos;ve covered this topic extensively in &lt;a href=&quot;https://thehackerguidetopython.com&quot;&gt;The Hacker&apos;s Guide to Python&lt;/a&gt;. Check it out!&lt;/p&gt;
</content:encoded><category>python</category></item><item><title>Hy, Lisp in Python</title><link>https://julien.danjou.info/blog/lisp-python-hy/</link><guid isPermaLink="true">https://julien.danjou.info/blog/lisp-python-hy/</guid><description>I&apos;ve meant to look at Hy since Paul Tagliamonte started to talk to me about it, but never took a chance until now.</description><pubDate>Wed, 03 Apr 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve meant to look at &lt;a href=&quot;http://github.com/paultag/hy&quot;&gt;Hy&lt;/a&gt; since &lt;a href=&quot;http://blog.pault.ag/&quot;&gt;Paul Tagliamonte&lt;/a&gt; started to talk to me about it, but never took a chance until now. Yesterday, Paul indicated it was a good time for me to start looking at it, so I spent a few hours playing.&lt;/p&gt;
&lt;h2&gt;But what&apos;s Hy?&lt;/h2&gt;
&lt;p&gt;Python is very nice: it has a great community and a wide range of useful libraries. But let&apos;s face it, it misses a great language.&lt;/p&gt;
&lt;p&gt;Hy is an implementation of a &lt;a href=&quot;http://en.wikipedia.org/wiki/Lisp_(programming_language)&quot;&gt;Lisp&lt;/a&gt; on top of Python.&lt;/p&gt;
&lt;p&gt;Technically, Hy is built directly with a custom made parser (for now) which then translates expressions using the &lt;a href=&quot;http://docs.python.org/2/library/ast.html&quot;&gt;Python AST&lt;/a&gt; module to generate code, which is then run by Python. Therefore, it shares the same properties as Python, and is a Lisp-1 (i.e. with a single namespace for symbols and functions).&lt;/p&gt;
&lt;p&gt;If you&apos;re interested in hearing Paul talk about Hy at the last PyCon US, I recommend watching his lightning talk. As the name implies, it&apos;s only a few minutes long.&lt;/p&gt;
&lt;h2&gt;Does it work?&lt;/h2&gt;
&lt;p&gt;I cloned the code and played around a bit with Hy. And to my greatest surprise and pleasure, it works quite well. You can imagine writing Python from there easily. Part of the syntax resembles Clojure&apos;s, which looks like a good thing since they&apos;re playing in the same area.&lt;/p&gt;
&lt;p&gt;You can try a &lt;a href=&quot;http://hy.pault.ag/&quot;&gt;Hy REPL&lt;/a&gt; in your Web browser if you want.&lt;/p&gt;
&lt;p&gt;Here&apos;s what some code look like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(import requests)

(setv req (requests.get &quot;http://hy.pault.ag&quot;))
(if (= req.status_code 200)
  (for (kv (.iteritems req.headers))
    (print kv))
  (throw (Exception &quot;Wrong status code&quot;)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code would output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(&apos;date&apos;, &apos;Wed, 03 Apr 2013 12:09:23 GMT&apos;)
(&apos;connection&apos;, &apos;keep-alive&apos;)
(&apos;content-encoding&apos;, &apos;gzip&apos;)
(&apos;transfer-encoding&apos;, &apos;chunked&apos;)
(&apos;content-type&apos;, &apos;text/html; charset=utf-8&apos;)
(&apos;server&apos;, &apos;nginx/1.2.6&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, it&apos;s really simple to write Lispy code that still uses Python idioms.&lt;/p&gt;
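&lt;p&gt;For comparison, here is a direct translation of the snippet above into plain Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import requests

req = requests.get(&quot;http://hy.pault.ag&quot;)
if req.status_code == 200:
    # Print each (header, value) pair, like the Hy version does
    for kv in req.headers.items():
        print(kv)
else:
    raise Exception(&quot;Wrong status code&quot;)
&lt;/code&gt;&lt;/pre&gt;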
&lt;p&gt;There are obviously still a lot of missing features in Hy. The language is far from complete and many parts are moving, but it&apos;s really promising, and Paul&apos;s doing a great job implementing every idea.&lt;/p&gt;
&lt;p&gt;I actually started to hack a bit on Hy, and will try to continue to do so, since I&apos;m really eager to learn a bit more about both Lisp and Python internals in the process. I&apos;ve already sent a few patches for small bugs I&apos;ve encountered, and proposed a few ideas. It&apos;s really exciting to be able to influence the design of a language early, one that I&apos;ll love to use! Being a recent fan of Common Lisp, I tend to grab the good stuff from it and add it to Hy.&lt;/p&gt;
</content:encoded><category>python</category><category>lisp</category><category>talks</category></item><item><title>OpenStack Swift eventual consistency analysis &amp; bottlenecks</title><link>https://julien.danjou.info/blog/openstack-swift-consistency-analysis/</link><guid isPermaLink="true">https://julien.danjou.info/blog/openstack-swift-consistency-analysis/</guid><description>Swift is the software behind the OpenStack Object Storage service.</description><pubDate>Mon, 23 Apr 2012 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://launchpad.net/swift&quot;&gt;Swift&lt;/a&gt; is the software behind the &lt;a href=&quot;http://openstack.org/projects/storage/&quot;&gt;OpenStack Object Storage&lt;/a&gt; service.&lt;/p&gt;
&lt;p&gt;This service provides a simple storage service for applications using &lt;a href=&quot;http://docs.openstack.org/api/openstack-object-storage/1.0/content/&quot;&gt;RESTful interfaces&lt;/a&gt;, providing maximum data availability and storage capacity.&lt;/p&gt;
&lt;p&gt;I explain here how some parts of the storage and replication in Swift works, and show some of its current limitations.&lt;/p&gt;
&lt;p&gt;If you don&apos;t know Swift and want to read a more &quot;shallow&quot; overview first, you can read John Dickinson&apos;s &lt;a href=&quot;http://programmerthoughts.com/openstack/swift-tech-overview/&quot;&gt;Swift Tech Overview&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;How Swift storage works&lt;/h2&gt;
&lt;p&gt;If we refer to the &lt;a href=&quot;http://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;CAP theorem&lt;/a&gt;, Swift chose &lt;strong&gt;availability&lt;/strong&gt; and &lt;strong&gt;partition tolerance&lt;/strong&gt; and dropped &lt;strong&gt;consistency&lt;/strong&gt;. That means that you&apos;ll always get your data: it will be dispersed across many places, but you could get an old version of it (or no data at all) in some odd cases (like a server overload or failure). This compromise is made to allow maximum availability and scalability of the storage platform.&lt;/p&gt;
&lt;p&gt;But there are mechanisms built into Swift to minimize the potential data inconsistency window: they are responsible for data replication and consistency.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://swift.openstack.org/&quot;&gt;official Swift documentation&lt;/a&gt; explains the internal storage in a certain way, but I&apos;m going to write my own explanation here about this.&lt;/p&gt;
&lt;h3&gt;Consistent hashing&lt;/h3&gt;
&lt;p&gt;Swift uses the principle of &lt;a href=&quot;http://en.wikipedia.org/wiki/Consistent_hashing&quot;&gt;consistent hashing&lt;/a&gt;. It builds what it calls a &lt;em&gt;ring&lt;/em&gt;. A ring represents the space of all possible computed hash values divided into equal parts. Each part of this space is called a &lt;em&gt;partition&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The following schema (stolen from the &lt;a href=&quot;http://wiki.basho.com/&quot;&gt;Riak&lt;/a&gt; project) shows the principle nicely:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/riak-ring.png&quot; alt=&quot;riak-ring&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In a simple world, if you wanted to store some objects and distribute them on 4 nodes, you would split your hash space in 4. You would have 4 partitions, and computing &lt;em&gt;hash(object) modulo 4&lt;/em&gt; would tell you where to store your object: on node 0, 1, 2 or 3.&lt;/p&gt;
&lt;p&gt;But since you want to be able to extend your storage cluster to more nodes without breaking the whole hash mapping and moving everything around, you need to build a lot more partitions. Let&apos;s say we&apos;re going to build 2^10 partitions. Since we have 4 nodes, each node will have &lt;code&gt;2^10 ÷ 4 = 256&lt;/code&gt; partitions. If we ever want to add a 5th node, it&apos;s easy: we just have to re-balance the partitions and move 1⁄5 of the partitions from each node to this 5th node. That means all our nodes will end up with &lt;code&gt;2^10 ÷ 5 ≈ 204&lt;/code&gt; partitions. We can also define a &lt;em&gt;weight&lt;/em&gt; for each node, in order for some nodes to get more partitions than others.&lt;/p&gt;
&lt;p&gt;With 2^10 partitions, we can have up to 2^10 nodes in our cluster. Yeepee.&lt;/p&gt;
&lt;p&gt;For reference, Gregory Holt, one of the Swift authors, also wrote &lt;a href=&quot;http://greg.brim.net/page/building_a_consistent_hashing_ring.html&quot;&gt;an explanation post about the ring&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Concretely, when building a Swift ring, you&apos;ll have to say how many partitions you want, and this is what this value is really about.&lt;/p&gt;
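&lt;p&gt;To make the idea concrete, here is a toy sketch of such a mapping in Python (a real ring, including Swift&apos;s, also handles weights, zones and replica placement):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib

PARTITION_POWER = 10
PARTITIONS = 2 ** PARTITION_POWER    # 2^10 = 1024 partitions

def partition_for(name):
    # Map an object name to a partition of the hash space.
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) % PARTITIONS

# Assign partitions to 4 nodes round-robin: 256 partitions each.
NODES = 4
partition_to_node = [p % NODES for p in range(PARTITIONS)]

node = partition_to_node[partition_for(&quot;my-object&quot;)]
&lt;/code&gt;&lt;/pre&gt;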
&lt;h3&gt;Data duplication&lt;/h3&gt;
&lt;p&gt;Now, to ensure availability and partition tolerance (as seen in the &lt;em&gt;CAP theorem&lt;/em&gt;), we also want to store replicas of our objects. By default, Swift stores 3 copies of every object, but that&apos;s configurable.&lt;/p&gt;
&lt;p&gt;In that case, we need to store each partition defined above not only on 1 node, but on 2 others too. So Swift adds another concept: zones. A zone is an isolated space that does not depend on the other zones, so in case of an outage in one zone, the other zones are still available. Concretely, a zone is likely to be a disk, a server, or a whole cabinet, depending on the size of your cluster. It&apos;s up to you to choose anyway.&lt;/p&gt;
&lt;p&gt;Consequently, each partition is not mapped to 1 host only anymore, but to N hosts. Each node will therefore store this number of partitions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;number of partitions stored on one node = number of replicas × total number of partitions ÷ number of nodes
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Examples:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We split the ring in 2^10 = 1024 partitions. We have 3 nodes. We want 3 replicas of data.&lt;br /&gt;
→ Each node will store a copy of the full partition space: &lt;code&gt;3 × 2^10 ÷ 3 = 2^10 = 1024 partitions&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;We split the ring in 2^11 = 2048 partitions. We have 5 nodes. We want 3 replicas of data.&lt;br /&gt;
→ Each node will store &lt;code&gt;2^11 × 3 ÷ 5 ≈ 1229 partitions&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;We split the ring in 2^11 = 2048 partitions. We have 6 nodes. We want 3 replicas of data.&lt;br /&gt;
→ Each node will store &lt;code&gt;2^11 × 3 ÷ 6 = 1024 partitions&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Three rings to rule them all&lt;/h3&gt;
&lt;p&gt;In Swift, there are 3 categories of things to store: &lt;em&gt;accounts&lt;/em&gt;, &lt;em&gt;containers&lt;/em&gt; and &lt;em&gt;objects&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;account&lt;/strong&gt; is what you&apos;d expect it to be, a user account. An account contains &lt;strong&gt;containers&lt;/strong&gt; (the equivalent of Amazon S3&apos;s buckets). Each container can contain user-defined keys and values (just like a hash table or a dictionary): values are what Swift calls &lt;strong&gt;objects&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Swift wants you to build 3 different and independent rings to store its 3 kinds of things (&lt;em&gt;accounts&lt;/em&gt;, &lt;em&gt;containers&lt;/em&gt; and &lt;em&gt;objects&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;Internally, the first two categories are stored as &lt;a href=&quot;http://www.sqlite.org/&quot;&gt;SQLite&lt;/a&gt; databases, whereas the last one is stored using regular files.&lt;/p&gt;
&lt;p&gt;Note that these 3 rings can be stored and managed on 3 completely different sets of servers.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/openstack-swift-storage-1.png&quot; alt=&quot;openstack-swift-storage-1&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Data replication&lt;/h2&gt;
&lt;p&gt;Now that we have our storage theory in place (accounts, containers and objects distributed into partitions, themselves stored in multiple zones), let&apos;s move on to the replication practice.&lt;/p&gt;
&lt;p&gt;When you put something in one of the 3 rings (be it an account, a container or an object), it is uploaded into all the zones responsible for the ring partition the object belongs to. This upload into the different zones is the responsibility of the &lt;em&gt;swift-proxy&lt;/em&gt; daemon.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/openstack-swift-replication.png&quot; alt=&quot;openstack-swift-replication&quot; /&gt;&lt;/p&gt;
&lt;p&gt;But if one of the zones is failing, you can&apos;t upload all your copies to all zones at upload time. So you need a mechanism to be sure the failing zone will catch up to a correct state at some point.&lt;/p&gt;
&lt;p&gt;That&apos;s the role of the &lt;em&gt;swift-{container,account,object}-replicator&lt;/em&gt; processes. These processes are &lt;strong&gt;running on each node that is part of a zone&lt;/strong&gt; and replicate their contents to the nodes of the other zones.&lt;/p&gt;
&lt;p&gt;When they run, they walk through all the contents of all the partitions on the whole file system and, for each partition, issue a special &lt;em&gt;REPLICATE&lt;/em&gt; HTTP request to all the other zones responsible for that same partition. The other zone responds with information about the local state of the partition. That allows the replicator process to decide whether the remote zone has an up-to-date version of the partition.&lt;/p&gt;
&lt;p&gt;In the case of accounts and containers, it doesn&apos;t check at the partition level, but checks each account/container contained inside each partition.&lt;/p&gt;
&lt;p&gt;If something is not up-to-date, it will be pushed using &lt;em&gt;rsync&lt;/em&gt; by the replicator process. This is why you&apos;ll read that the replication updates are &lt;em&gt;&quot;push based&quot;&lt;/em&gt; in the Swift documentation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Pseudo code describing the replication process for accounts
## The principle is exactly the same for containers
for account in accounts:
    # Determine the partition used to store this account
    partition = hash(account) % number_of_partitions
    # The number of zones is the number of replicas configured
    for zone in partition.get_zones_storing_this_partition():
        # Send a HTTP REPLICATE command to the remote swift-account-server process
        version_of_account = zone.send_HTTP_REPLICATE_for(account)
        if version_of_account &amp;lt; account.version():
            account.sync_to(zone)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This replication process is &lt;em&gt;O(number of accounts × number of replicas)&lt;/em&gt;. The more accounts you have, and the more replicas you want for your data, the longer the replication of your accounts will take. The same rule applies for containers.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Pseudo code describing the replication process for objects
for partition in partitions_storing_objects:
    # The number of zones is the number of replicas configured
    for zone in partition.get_zones_storing_this_partition():
        # Send a HTTP REPLICATE command to the remote swift-object-server process
        version_of_partition = zone.send_HTTP_REPLICATE_for(partition)
        if version_of_partition &amp;lt; partition.version():
            # Use rsync to synchronize the whole partition
            # and all its objects
            partition.rsync_to(zone)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This replication process is &lt;em&gt;O(number of object partitions × number of replicas)&lt;/em&gt;. The more object partitions you have, and the more replicas you want for your data, the longer the replication of your objects will take.&lt;/p&gt;
&lt;p&gt;I think this is something important to know when deciding how to build your Swift architecture. Choose the number of replicas, partitions and nodes carefully.&lt;/p&gt;
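&lt;p&gt;As a small sketch, the formula above translates into a back-of-the-envelope helper (integer division, so results are rounded down):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def partitions_per_node(partition_power, replicas, nodes):
    # Partitions each node stores, and therefore has to re-check
    # at every replication pass.
    return (2 ** partition_power) * replicas // nodes

print(partitions_per_node(10, 3, 3))   # 1024
print(partitions_per_node(11, 3, 5))   # 1228 (1228.8 rounded down)
&lt;/code&gt;&lt;/pre&gt;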
&lt;h2&gt;Replication process bottlenecks&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://julien.danjou.info/content/images/03/copy-cat.jpg&quot; alt=&quot;copy-cat&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;File accesses&lt;/h3&gt;
&lt;p&gt;The problem, as you might have guessed, is that to replicate, &lt;strong&gt;it walks through every damn thing&lt;/strong&gt;, things being accounts, containers, or object partition hash files. This means it needs to open and read (part of) every file your node stores to check whether data needs to be replicated!&lt;/p&gt;
&lt;p&gt;For accounts &amp;amp; containers replication, this is done every 30 seconds by default, but it will likely take more than 30 seconds as soon as you hit around 12 000 containers on a node (see measurements below). Therefore you&apos;ll end up checking the consistency of accounts &amp;amp; containers on each node &lt;strong&gt;all the time&lt;/strong&gt;, obviously using a lot of CPU time.&lt;/p&gt;
&lt;p&gt;For reference, &lt;a href=&quot;http://web.archive.org/web/20120903043209/http://alexyang.sinaapp.com/?p=115&quot;&gt;Alex Yang also did an analysis&lt;/a&gt; of that same problem.&lt;/p&gt;
&lt;h3&gt;TCP connections&lt;/h3&gt;
&lt;p&gt;Worse, the HTTP connections used to send the &lt;em&gt;REPLICATE&lt;/em&gt; commands are not pooled: a new TCP connection is established each time something has to be checked against the same thing stored in a remote zone.&lt;/p&gt;
&lt;p&gt;This is why you&apos;ll see these lines listed in the &lt;a href=&quot;http://swift.openstack.org/deployment_guide.html&quot;&gt;Swift Deployment Guide&lt;/a&gt; under &lt;a href=&quot;http://swift.openstack.org/deployment_guide.html#general-system-tuning&quot;&gt;&quot;general system tuning&quot;&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## disable TIME_WAIT.. wait..
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1

## double amount of allowed conntrack
net.ipv4.netfilter.ip_conntrack_max = 262144
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In my humble opinion, this is more an ugly hack than tuning. If you don&apos;t activate this and you have a lot of containers on your node, you&apos;ll soon end up with thousands of connections in the &lt;em&gt;TIME_WAIT&lt;/em&gt; state, and you do risk overloading the IP conntrack module.&lt;/p&gt;
&lt;h3&gt;Container deletion&lt;/h3&gt;
&lt;p&gt;We should also talk about container deletion. When a user deletes a container from their account, the container is &lt;strong&gt;marked as deleted&lt;/strong&gt;. And that&apos;s it. It&apos;s not deleted. Therefore the SQLite database file representing the container will continue to be checked for synchronization, over and over.&lt;/p&gt;
&lt;p&gt;The only way to have a container permanently deleted is to &lt;strong&gt;mark an account as deleted&lt;/strong&gt;. This way the &lt;em&gt;swift-account-reaper&lt;/em&gt; will delete all its containers and, finally, the account.&lt;/p&gt;
&lt;h2&gt;Measurement&lt;/h2&gt;
&lt;p&gt;On a pretty big server, I measured the replication to run at a speed of around 350 {account,container,object-partition} checks/second, which can be a real problem if you choose to build a lot of partitions and you have a low &lt;em&gt;number_of_nodes ⁄ number_of_replicas&lt;/em&gt; ratio.&lt;/p&gt;
&lt;p&gt;For example, the default parameters run the container replication every 30 seconds. To check the replication status of 12 000 containers stored on one node at a speed of 350 containers/second, you&apos;ll need around 34 seconds to do so. In the end, you&apos;ll never stop checking the replication of your containers, and the more containers you have, the more your &lt;strong&gt;inconsistency window will increase&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Until some of the code is fixed (the HTTP connection pooling probably being the &quot;easiest&quot; one), I warmly recommend choosing the different Swift parameters for your setup carefully. Optimizing the replication process consists of having the minimum number of partitions per node, which can be done by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;decreasing the number of partitions&lt;/li&gt;
&lt;li&gt;decreasing the number of replicas&lt;/li&gt;
&lt;li&gt;increasing the number of nodes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For very large setups, some code to speed up account and container synchronization, and to remove deleted containers, will be required, but this does not exist yet, as far as I know.&lt;/p&gt;
</content:encoded><category>openstack</category><category>python</category></item><item><title>First release of PyMuninCli</title><link>https://julien.danjou.info/blog/pymunincli-0-1/</link><guid isPermaLink="true">https://julien.danjou.info/blog/pymunincli-0-1/</guid><description>Today I release a Python client library to query Munin servers.</description><pubDate>Tue, 17 Apr 2012 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today I release a &lt;a href=&quot;http://python.org&quot;&gt;Python&lt;/a&gt; client library to query &lt;a href=&quot;http://munin-monitoring.org/&quot;&gt;Munin&lt;/a&gt; servers.&lt;/p&gt;
&lt;p&gt;I wrote it as part of some experiments I did a few weeks ago. I discovered there was no client library to query a Munin server. There&apos;s &lt;a href=&quot;http://aouyar.github.com/PyMunin/&quot;&gt;PyMunin&lt;/a&gt; or &lt;a href=&quot;http://samuelks.com/python-munin/&quot;&gt;python-munin&lt;/a&gt;, which help with developing Munin plugins, but nothing to access a &lt;em&gt;munin-node&lt;/em&gt; and retrieve its data.&lt;/p&gt;
&lt;p&gt;So I decided to write a quick and simple one, and it&apos;s released under the name of &lt;a href=&quot;https://github.com/jd/pymunincli&quot;&gt;PyMuninCli&lt;/a&gt;, providing the &lt;em&gt;munin.client&lt;/em&gt; Python module.&lt;/p&gt;
</content:encoded><category>python</category><category>monitoring</category></item><item><title>xpyb 1.3 released</title><link>https://julien.danjou.info/blog/xpyb-1-3/</link><guid isPermaLink="true">https://julien.danjou.info/blog/xpyb-1-3/</guid><description>It took a while to get it out, but finally, 3 years after the previous release (1.2), version 1.3 of xpyb (the XCB Python bindings) is out.</description><pubDate>Thu, 22 Mar 2012 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It took a while to get it out, but finally, 3 years after the previous release (1.2), version 1.3 of &lt;a href=&quot;http://cgit.freedesktop.org/xcb/xpyb/&quot;&gt;xpyb&lt;/a&gt; (the &lt;a href=&quot;http://xcb.freedesktop.org&quot;&gt;XCB&lt;/a&gt; Python bindings) is out.&lt;/p&gt;
&lt;p&gt;This version has a lot of improvements and major bug fixes (memory corruptions and memory leaks were tracked down and fixed).&lt;/p&gt;
&lt;p&gt;One amazing feature that now ships with this release is &lt;a href=&quot;https://julien.danjou.info/blog/python-cairo-and-xcb-support&quot;&gt;my code to export the xpyb API to other Python modules&lt;/a&gt;, which makes it possible to draw with &lt;a href=&quot;http://www.cairographics.org/pycairo/&quot;&gt;Pycairo&lt;/a&gt; in Python using XCB.&lt;/p&gt;
&lt;p&gt;Here is an example of a Python program that draws a spiral in a window using xpyb and Pycairo. You need xpyb &amp;gt;= 1.3 and Pycairo &amp;gt;= 1.10 to make this work.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import cairo
import xcb
from xcb.xproto import *

WIDTH, HEIGHT = 600, 600

def draw_spiral(ctx, width, height):
    &quot;&quot;&quot;Draw a spiral with lines!&quot;&quot;&quot;
    wd = .02 * width
    hd = .02 * height

    width -= 2
    height -= 2

    ctx.move_to (width + 1, 1-hd)
    for i in range(9):
        ctx.rel_line_to (0, height - hd * (2 * i - 1))
        ctx.rel_line_to (- (width - wd * (2 *i)), 0)
        ctx.rel_line_to (0, - (height - hd * (2*i)))
        ctx.rel_line_to (width - wd * (2 * i + 1), 0)

    ctx.set_source_rgb (0, 0, 1)
    ctx.stroke()

## Connect to the X server
conn = xcb.connect()
## Get the X server setup
setup = conn.get_setup()
## Generate X ID for our X &quot;objects&quot;
window = conn.generate_id()
pixmap = conn.generate_id()
gc = conn.generate_id()
## Create a new window
conn.core.CreateWindow(setup.roots[0].root_depth, window,
                       # Parent is the root window
                       setup.roots[0].root,
                       0, 0, WIDTH, HEIGHT, 0, WindowClass.InputOutput,
                       setup.roots[0].root_visual,
                       CW.BackPixel | CW.EventMask,
                       [ setup.roots[0].white_pixel, EventMask.ButtonPress | EventMask.EnterWindow | EventMask.LeaveWindow | EventMask.Exposure ])

## Create a pixmap: it will be used to draw with cairo
conn.core.CreatePixmap(setup.roots[0].root_depth, pixmap, setup.roots[0].root,
                       WIDTH, HEIGHT)

## We just need a GC to copy later the pixmap on the window, so create one
## very simple
conn.core.CreateGC(gc, setup.roots[0].root, GC.Foreground | GC.Background,
                   [ setup.roots[0].black_pixel, setup.roots[0].white_pixel ])

## Create a cairo surface
surface = cairo.XCBSurface (conn, pixmap,
                            setup.roots[0].allowed_depths[0].visuals[0], WIDTH, HEIGHT)
## Create a cairo context with that surface
ctx = cairo.Context(surface)

## Paint everything in white
ctx.set_source_rgb (1, 1, 1)
ctx.set_operator (cairo.OPERATOR_SOURCE)
ctx.paint()

## Draw our spiral
draw_spiral (ctx, WIDTH, HEIGHT)

## Map the window on the screen so it gets visible
conn.core.MapWindow(window)

## Flush all X requests to the X server
conn.flush()

while True:
    try:
        event = conn.wait_for_event()
    except xcb.ProtocolException, error:
        print &quot;Protocol error %s received!&quot; % error.__class__.__name__
        break
    except:
        break

    # ExposeEvent are received when we need to refresh the content of the
    # window, so we copy the content of the pixmap (where cairo drew) in the
    # window
    if isinstance(event, ExposeEvent):
        conn.core.CopyArea(pixmap, window, gc, 0, 0, 0, 0, WIDTH, HEIGHT)
    # You click, I quit.
    elif isinstance(event, ButtonPressEvent):
        break
    conn.flush()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Seeing how complex it is to draw something simple with this technology, I somehow understand why nobody bothered to release or use the code during the last 3 years.&lt;/p&gt;
&lt;p&gt;But hey, now that it&apos;s out, you can build the next Python based desktop environment with bleeding edge technologies. :-)&lt;/p&gt;
</content:encoded><category>x11</category><category>python</category></item><item><title>Google Calendar notifications using pynotify</title><link>https://julien.danjou.info/blog/google-calendar-pynotify/</link><guid isPermaLink="true">https://julien.danjou.info/blog/google-calendar-pynotify/</guid><description>I use Google Calendar to manage my calendars, and I really missed something to warn me whenever I have an appointment with an alert set.</description><pubDate>Tue, 03 Jan 2012 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I use &lt;a href=&quot;http://google.com/calendar&quot;&gt;Google Calendar&lt;/a&gt; to manage my calendars, and I really missed something to warn me whenever I have an appointment with an alert set.&lt;/p&gt;
&lt;p&gt;So here is an example of a Python program to do such a thing. It is written using the &lt;a href=&quot;http://code.google.com/p/gdata-python-client/&quot;&gt;Google Data APIs Python client library&lt;/a&gt; and pynotify.&lt;/p&gt;
&lt;p&gt;I&apos;ll detail the code here, so you can build your own and adapt it to your needs.&lt;/p&gt;
&lt;p&gt;First, we need to import GTK+ and pynotify, and initialize pynotify.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sys
import gtk
import pynotify
pynotify.init(sys.argv[0])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, we need to import the gdata Calendar API and connect to the calendar. I&apos;ll use the simple email/password way to log in, which is clearly not the best, but it&apos;s also the simplest. Feel free to use OAuth 2.0. :-)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import gdata.calendar.service

calendar_service = gdata.calendar.service.CalendarService()
calendar_service.email = &apos;mygooglelogin&apos;
calendar_service.password = &apos;mygooglepassword&apos;
calendar_service.ProgrammaticLogin()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we&apos;re ready to request stuff and notify! First, request the events from the default calendar.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;feed = calendar_service.GetCalendarEventFeed()
&lt;/code&gt;&lt;/pre&gt;
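&lt;p&gt;Note that this fetches the entire default calendar. If you only care about a given period, the gdata client can narrow the request with a date-range query instead; a minimal sketch (the dates are just examples):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;query = gdata.calendar.service.CalendarEventQuery(&apos;default&apos;, &apos;private&apos;, &apos;full&apos;)
query.start_min = &apos;2012-01-01&apos;
query.start_max = &apos;2012-01-08&apos;
feed = calendar_service.CalendarQuery(query)
&lt;/code&gt;&lt;/pre&gt;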
&lt;p&gt;Now we can iterate over &lt;em&gt;feed&lt;/em&gt; and do various checks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import datetime

for event in feed.entry:
    # If the event status is not confirmed, go to the next event.
    if event.event_status.value != &quot;CONFIRMED&quot;:
        continue
    # Now iterate over all the event dates (usually there is only one)
    for when in event.when:
        # Parse start and end time
        try:
            start_time = datetime.datetime.strptime(when.start_time.split(&quot;.&quot;)[0], &quot;%Y-%m-%dT%H:%M:%S&quot;)
            end_time = datetime.datetime.strptime(when.end_time.split(&quot;.&quot;)[0], &quot;%Y-%m-%dT%H:%M:%S&quot;)
        except ValueError:
            # ValueError happens on parsing errors. Parsing errors
            # usually happen for &quot;all day&quot; events since they have
            # no time, but we do not care about these events.
            continue
        now = datetime.datetime.now()
        # Check that the event hasn&apos;t already ended
        if end_time &amp;gt; now:
            # Check each alert
            for reminder in when.reminder:
                # We handle only reminders with method &quot;alert&quot;
                # and whose start time minus the reminder delay has passed
                if reminder.method == &quot;alert&quot; \
                        and start_time - datetime.timedelta(0, 60 * int(reminder.minutes)) &amp;lt; now:
                    # Build the notification
                    notification = pynotify.Notification(summary=event.title.text,
                                                         message=event.content.text)
                    # Set an icon from the GTK+ stock icons
                    notification.set_icon_from_pixbuf(gtk.Label().render_icon(gtk.STOCK_DIALOG_INFO,
                                                                              gtk.ICON_SIZE_LARGE_TOOLBAR))
                    notification.set_timeout(0)
                    # Show the notification
                    notification.show()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you run this program, you should see a notification for any appointment whose alert is due at that time.&lt;/p&gt;
&lt;p&gt;This should be enough to start building something.&lt;/p&gt;
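&lt;p&gt;For instance, to turn this into a real notifier, you could wrap the whole check in a function and let the GLib main loop run it every minute. A minimal sketch, assuming the feed iteration above lives in a &lt;em&gt;check_calendar()&lt;/em&gt; function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import gobject

def check_calendar():
    feed = calendar_service.GetCalendarEventFeed()
    # ... iterate over feed.entry and notify, as shown above ...
    return True  # returning True keeps the timeout active

# Check every minute, then hand control over to the GTK+ main loop
gobject.timeout_add_seconds(60, check_calendar)
gtk.main()
&lt;/code&gt;&lt;/pre&gt;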
&lt;p&gt;If you don&apos;t want to program this into Python, you might want to take a look at &lt;a href=&quot;http://code.google.com/p/gcalcli/wiki/HowTo&quot;&gt;gcalcli&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>python</category><category>x11</category><category>google</category></item><item><title>Using GTK+ stock icons with pynotify</title><link>https://julien.danjou.info/blog/python-notify-with-gtk-stock-icon/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-notify-with-gtk-stock-icon/</guid><description>It took me a while to find this, so I&apos;m just blogging it so other people will be able to find it.</description><pubDate>Tue, 27 Dec 2011 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It took me a while to find this, so I&apos;m just blogging it so other people will be able to find it.&lt;/p&gt;
&lt;p&gt;I wanted to send a &lt;a href=&quot;http://www.galago-project.org/specs/notification/&quot;&gt;desktop notification&lt;/a&gt; using pynotify, but using a &lt;a href=&quot;http://developer.gnome.org/gtk/2.24/gtk-Stock-Items.html&quot;&gt;GTK+ stock icons&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With the following snippet, I managed to do it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pynotify
pynotify.init(&quot;myapp&quot;)
import gtk
n = pynotify.Notification(summary=&quot;Summary&quot;, message=&quot;Message!&quot;)
n.set_icon_from_pixbuf(gtk.Label().render_icon(gtk.STOCK_HARDDISK, gtk.ICON_SIZE_LARGE_TOOLBAR))
n.show()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the use of a &lt;em&gt;Label&lt;/em&gt; is just to have a widget instantiated so we can call the &lt;em&gt;render_icon()&lt;/em&gt; method. It could be any widget type, as far as I understand.&lt;/p&gt;
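&lt;p&gt;If instantiating a throwaway widget bothers you, the icon theme API should work too, since GTK+ registers its stock icons as icon names; a sketch I have not battle-tested (note that &lt;em&gt;load_icon()&lt;/em&gt; takes a pixel size, not a &lt;em&gt;gtk.ICON_SIZE_*&lt;/em&gt; constant):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pynotify
pynotify.init(&quot;myapp&quot;)
import gtk
n = pynotify.Notification(summary=&quot;Summary&quot;, message=&quot;Message!&quot;)
# Look the icon up in the current icon theme, 24 pixels wide
n.set_icon_from_pixbuf(gtk.icon_theme_get_default().load_icon(gtk.STOCK_HARDDISK, 24, 0))
n.show()
&lt;/code&gt;&lt;/pre&gt;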
</content:encoded><category>python</category><category>x11</category></item><item><title>New job, new blog</title><link>https://julien.danjou.info/blog/new-job-new-blog/</link><guid isPermaLink="true">https://julien.danjou.info/blog/new-job-new-blog/</guid><description>It has been a while since I blogged but I&apos;ve been very busy, with my new job and this new blog!  New job! I quitted my job last September, and found another one that I started in October. I&apos;m now the</description><pubDate>Wed, 07 Dec 2011 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It has been a while since I blogged but I&apos;ve been very busy, with my new job and this new blog!&lt;/p&gt;
&lt;h2&gt;New job!&lt;/h2&gt;
&lt;p&gt;I quit my job last September and found another one that I started in October. I&apos;m now the lead developer of &lt;a href=&quot;http://www.enovance.com/fr/produits-solutions/opencloud-opensource/enovance-labs&quot;&gt;eNovance Labs&lt;/a&gt;, where I work on the &lt;a href=&quot;http://openstack.org/&quot;&gt;OpenStack&lt;/a&gt; project. So far, this has allowed me to contribute heavily to the &lt;a href=&quot;https://alioth.debian.org/projects/openstack&quot;&gt;Debian packaging of OpenStack&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;New blog!&lt;/h2&gt;
&lt;p&gt;In the meantime, I took some time to redesign my personal homepage and this blog, which now uses &lt;a href=&quot;https://github.com/hyde/hyde&quot;&gt;Hyde&lt;/a&gt;, the &lt;a href=&quot;http://python.org&quot;&gt;Python&lt;/a&gt; equivalent of &lt;a href=&quot;http://jekyllrb.com/&quot;&gt;Jekyll&lt;/a&gt;, which is written in &lt;a href=&quot;http://www.ruby-lang.org/&quot;&gt;Ruby&lt;/a&gt;. Since I dislike Ruby (sorry), I preferred a Python-based generator, and I admit Hyde is really cool.&lt;br /&gt;
Since I really suck at Web design, this one is obviously based on &lt;a href=&quot;http://twitter.github.com/bootstrap/&quot;&gt;Twitter&apos;s Bootstrap&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>career</category><category>openstack</category><category>python</category><category>debian</category></item><item><title>Python sets comparisons</title><link>https://julien.danjou.info/blog/python-sets-comparisons/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-sets-comparisons/</guid><description>This week I lost some time playing with Python&apos;s sets.</description><pubDate>Tue, 17 May 2011 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This week I lost some time playing with &lt;a href=&quot;http://python.org&quot;&gt;Python&lt;/a&gt;&apos;s &lt;a href=&quot;http://docs.python.org/library/stdtypes.html#set&quot;&gt;sets&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;After digging into the Python source code, I finally discovered what seems to be a little bug. Fortunately, it has been &quot;fixed&quot; in Python 3. I did not find it reported anywhere, but since it&apos;s fixed, it&apos;s not a big deal.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Python 2.7.1+ (default, Apr 20 2011, 10:53:33) 
[GCC 4.5.2] on linux2
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.
&amp;gt;&amp;gt;&amp;gt; class A(object):
...     def __eq__(self, other):
...             return True
... 
&amp;gt;&amp;gt;&amp;gt; A() == A()
True
&amp;gt;&amp;gt;&amp;gt; [A()] == [A()]
True
&amp;gt;&amp;gt;&amp;gt; set([A()]) == set([A()])
False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This clearly did not make any sense to me. I&apos;ve then tested under Python 3.2:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Python 3.2.1a0 (default, May  4 2011, 19:59:25) 
[GCC 4.6.1 20110428 (prerelease)] on linux2
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.
&amp;gt;&amp;gt;&amp;gt; class A(object):
...     def __eq__(self, other):
...             return True
... 
&amp;gt;&amp;gt;&amp;gt; set([A()]) == set([A()])
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
TypeError: unhashable type: &apos;A&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At least, raising an error is saner: in Python 3, defining &lt;em&gt;__eq__&lt;/em&gt; without &lt;em&gt;__hash__&lt;/em&gt; makes the class unhashable. It actually helped me understand what I needed to do to have my sets working correctly with Python 2:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Python 2.7.1+ (default, Apr 20 2011, 10:53:33) 
[GCC 4.5.2] on linux2
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.
&amp;gt;&amp;gt;&amp;gt; class A(object):
...     def __eq__(self, other):
...             return True
...     def __hash__(self):
...             return 123456789
... 
&amp;gt;&amp;gt;&amp;gt; set([A()]) == set([A()])
True
&lt;/code&gt;&lt;/pre&gt;
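&lt;p&gt;The explanation, as far as I understand it, is that set operations look elements up by hash before ever calling &lt;em&gt;__eq__&lt;/em&gt;. Without a custom &lt;em&gt;__hash__&lt;/em&gt;, Python 2 falls back on the default identity-based hash, so two distinct instances never hash equal and their &lt;em&gt;__eq__&lt;/em&gt; is never even tried. A membership test on the original class shows the same mechanism:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; a = A()
&amp;gt;&amp;gt;&amp;gt; A() == a
True
&amp;gt;&amp;gt;&amp;gt; A() in set([a])  # hashes differ, so __eq__ is never called
False
&lt;/code&gt;&lt;/pre&gt;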
</content:encoded><category>python</category></item><item><title>Python cairo and XCB support</title><link>https://julien.danjou.info/blog/python-cairo-and-xcb-support/</link><guid isPermaLink="true">https://julien.danjou.info/blog/python-cairo-and-xcb-support/</guid><description>cairo has a Python binding (pycairo) since a long time, and some months ago a Python binding for XCB (xpyb) has been released.</description><pubDate>Tue, 22 Dec 2009 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;http://www.cairographics.org&quot;&gt;cairo&lt;/a&gt; has a &lt;a href=&quot;http://www.cairographics.org/pycairo/&quot;&gt;Python binding (pycairo)&lt;/a&gt; since a long time, and some months ago a &lt;a href=&quot;http://cgit.freedesktop.org/xcb/xpyb/&quot;&gt;Python binding for XCB (xpyb)&lt;/a&gt; has been released.&lt;/p&gt;
&lt;p&gt;Pycairo has no support for creating Xlib surfaces. You can get an Xlib surface from PyGTK and then use Pycairo to draw on it, but there&apos;s no way to create one directly.&lt;/p&gt;
&lt;p&gt;What I&apos;ve done is make Pycairo aware of xpyb so it can directly create an XCB surface from an XCB connection and a drawable.&lt;/p&gt;
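&lt;p&gt;With the patch applied, the intended usage looks like the following; a minimal sketch drawing on a 100x100 pixmap (the constructor takes the connection, a drawable, a visual, and the size):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import xcb
import xcb.xproto
import cairo

conn = xcb.connect()
root = conn.get_setup().roots[0]

## Create a pixmap to draw on
pixmap = conn.generate_id()
conn.core.CreatePixmap(root.root_depth, pixmap, root.root, 100, 100)

## Pycairo can now build an XCB surface directly from the connection
surface = cairo.XCBSurface(conn, pixmap,
                           root.allowed_depths[0].visuals[0], 100, 100)
ctx = cairo.Context(surface)
&lt;/code&gt;&lt;/pre&gt;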
&lt;p&gt;As said in &lt;a href=&quot;http://lists.freedesktop.org/archives/xcb/2009-December/005438.html&quot;&gt;my mail to the XCB list&lt;/a&gt;, I&apos;m now waiting for a review before pushing this upstream. :-)&lt;/p&gt;
&lt;p&gt;For the first time, I guess, XCB has beaten Xlib support! ;-)&lt;/p&gt;
</content:encoded><category>python</category><category>x11</category></item><item><title>Teething troubles</title><link>https://julien.danjou.info/blog/teething-troubles/</link><guid isPermaLink="true">https://julien.danjou.info/blog/teething-troubles/</guid><description>It&apos;s not that often that I start something from scratch. It&apos;s an amazing feeling to start a new project, to start writing something new. I like that. It&apos;s creation, it&apos;s an artistic part of our comput</description><pubDate>Sun, 20 Dec 2009 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It&apos;s not that often that I start something from scratch. It&apos;s an amazing feeling to start a new project, to start writing something new. I like that. It&apos;s creation, it&apos;s an artistic part of our computing stuff. I feel like a code artist.&lt;/p&gt;
&lt;p&gt;And what I like even more is that little feeling that you are going in an unknown land. Some area in this tech world where nobody ever came before you, or only a few pioneers.&lt;/p&gt;
&lt;p&gt;That&apos;s the sensation I got when starting to use &lt;a href=&quot;http://www.cython.org&quot;&gt;Cython&lt;/a&gt;, &lt;a href=&quot;http://www.python.org&quot;&gt;Python 3&lt;/a&gt; and various other tools. I just spent half of my time trying to fix problems rather than working on &lt;em&gt;my&lt;/em&gt; code. Problems with autoconf macros not knowing about Python 2.6 or Python 3.1. Problems and limitations in Cython. And a problem in Python.&lt;/p&gt;
&lt;p&gt;That last one was a hard one. I&apos;m still a beginner in the Python world: I barely know anything. And I was trying to do something nobody had ever done: building an embedded Python with a set of built-in modules.&lt;/p&gt;
&lt;p&gt;I spent hours trying to find out why one type of module import was failing badly. I finally found the answer thanks to a guy who had the same problem. A guy? No. A pioneer. What am I saying? A hero. He&apos;s been my hero of the week! Thank you Miguel Lobo, because you found the bug I chased for hours and because you even reported it as &lt;a href=&quot;http://bugs.python.org/issue1644818&quot;&gt;issue 1644818&lt;/a&gt;, including a patch! How damn wonderful is that?&lt;/p&gt;
&lt;p&gt;I will not bore you with the technical details of that bug, since nobody cares. Nobody cares, not even the Python guys, since that bug has been open for 3 years and nobody even reviewed it in that time. I found an old thread about that bug where some guys were wanking about how they should do the review, because Miguel pushed for several weeks to get one, back in 2007.&lt;/p&gt;
&lt;p&gt;But that bug was in my way. I had to do something. So I prepared my mail reader, mounted my web browser, and here I was, on a unique quest: getting a Python bug fixed.&lt;/p&gt;
&lt;p&gt;At that point, if you did not stop reading earlier, you might be getting very excited. Don&apos;t be; spoiler: it&apos;s still not fixed. You&apos;ll have to wait for the end of the season and watch all the episodes I&apos;ll have to write to get the end of the story!&lt;/p&gt;
&lt;p&gt;Let&apos;s continue.&lt;/p&gt;
&lt;p&gt;I had to create an account on the Python bug tracking system. That was a trivial task for a man like me (you bet). Then, I launched a verbal attack, something you rarely see in a bug tracking system. Something I knew would wake up any developer caring about their software.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Julien Danjou:&lt;br /&gt;
Is there any chance to see this &lt;em&gt;bug&lt;/em&gt; fixed someday?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I had the deep feeling that my quest was starting here. How many days would I have to wait until I got an answer? Time was passing. Minutes were ticking by while I was waiting, sitting on a comfortable sofa in a softly lit room. It seemed like all my life was shorter than the delay I had to wait to get an answer.&lt;/p&gt;
&lt;p&gt;After waiting for hours, suddenly, and only 15 minutes later, I got an answer:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Martin v. Löwis:&lt;br /&gt;
Please ask on python-dev. I may be willing to revive my five-for-one offer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Martin? Don&apos;t know that guy. Who is he? What is he like? Will he fix that bug? What is this offer? So many questions without an answer. But he asked me to ask on python-dev, and I said: challenge accepted! I would write a mail to python-dev to get that bug fixed.&lt;/p&gt;
&lt;p&gt;Which I did. I sent a short (but well written, you know, I made efforts) &quot;WTF?&quot; to python-dev.&lt;/p&gt;
&lt;p&gt;And then the guy asked me to review 5 bugs so he would review and fix this one. And this is how I came to tell him that he was pissing me off by blackmailing me into fixing a bug that was his &quot;duty&quot;.&lt;/p&gt;
&lt;p&gt;Therefore, this is the end of the story so far. Will that bug be fixed someday? There&apos;s hope, because another guy jumped in and took the bug assignment.&lt;/p&gt;
&lt;p&gt;To be continued.&lt;/p&gt;
&lt;p&gt;My conclusion about this whole story: it is a little rough to start something new, with new tools, and quickly run into teething troubles. It&apos;s even harsher to enter a community because you just found bugs, and to not be very well received when you ask for a 10-line fix somebody wrote 3 years ago to be applied.&lt;/p&gt;
&lt;p&gt;I&apos;ll probably still use Python :-), but I get a darker image of its community now.&lt;/p&gt;
</content:encoded><category>python</category></item></channel></rss>