This article is a follow-up to my previous blog post about scaling a large number of connections. If you don't remember, I was trying to solve a problem submitted by one of my followers:

It so happened that I'm currently working on scaling some Python app. Specifically, now I'm trying to figure out the best way to scale SSH connections - when one server has to connect to thousands (or even tens of thousands) of remote machines in a short period of time (say, several minutes).
How would you write an application that does that in a scalable way?

In the first article, we wrote a program that tackled this problem at scale by using multiple threads. While that worked pretty well, it had some severe limitations. This time, we're going to take a different approach.

The job

The job has not changed and is still about connecting to a remote server via
ssh. This time, rather than faking it by using ping instead, we are going to connect for real to an ssh server. Once connected to the remote server, the mission will be to run a single command. For the sake of this example, the command that will be run here is just a simple "echo hello world".

Using an event loop

This time, rather than leveraging threads, we are using asyncio. Asyncio is the leading Python event-loop implementation. It allows executing multiple functions (named coroutines) concurrently. The idea is that each time a coroutine performs an I/O operation, it yields control back to the event loop. As the input or output might be blocking (e.g., the socket has no data to read yet), the event loop will reschedule the coroutine as soon as there is work to do. In the meantime, the loop can schedule another coroutine that has something to do – or wait for that to happen.
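As a minimal sketch of this cooperative scheduling – where asyncio.sleep stands in for a blocking I/O call, and asyncio.run is used as the entry point – two coroutines that each "wait on I/O" for 0.2 seconds finish together in roughly the time of one:

```python
import asyncio
import time

async def fake_io(label, delay):
    # await hands control back to the event loop until the "I/O" is done
    await asyncio.sleep(delay)
    return label

async def main():
    start = time.monotonic()
    # Schedule both coroutines on the loop, then wait for their results
    task_a = asyncio.ensure_future(fake_io("a", 0.2))
    task_b = asyncio.ensure_future(fake_io("b", 0.2))
    results = [await task_a, await task_b]
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)         # ['a', 'b']
print(elapsed < 0.35)  # True: concurrent (~0.2 s), not serial (0.4 s)
```

While task_a is sleeping, the loop runs task_b; neither blocks the other.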

Not all libraries are compatible with the asyncio framework. In our case, we need an ssh library with asyncio support. AsyncSSH is a Python library that provides exactly that: ssh connection handling on top of asyncio. It is particularly easy to use, and the documentation has plenty of examples.

Here's the function that we're going to use to execute our command on a remote host:

import asyncssh

async def run_command(host, command):
    async with asyncssh.connect(host) as conn:
        result = await conn.run(command)
        return result.stdout

The function run_command runs a command on a remote host once connected
to it via ssh. It then returns the standard output of the command. The function uses the keywords async and await that are specific to Python >= 3.5 and asyncio. They indicate that the called functions are coroutines that might block, and that control is yielded back to the event loop.

As I don't own hundreds of servers that I can connect to, I will be using a single remote server as the target – but the program will connect to it multiple times. The server has a latency of about 6 ms, which will magnify the results a bit.

The first version of this program is simple and stupid. It runs the run_command function N times serially, providing the tasks one at a time to the asyncio event loop:

loop = asyncio.get_event_loop()

outputs = [
    loop.run_until_complete(
        run_command("myserver", "echo hello world %d" % i))
    for i in range(200)
]
Once executed, the program prints the following:

$ time python3
['hello world 0\n', 'hello world 1\n', 'hello world 2\n', … 'hello world 199\n']
python3  6.11s user 0.35s system 15% cpu 41.249 total

It took 41 seconds to connect 200 times to the remote server and execute a simple printing command.

To make this faster, we're going to schedule all the coroutines at the same time. We just need to feed the event loop with the 200 coroutines at once. That will give it the ability to schedule them efficiently.

outputs = loop.run_until_complete(asyncio.gather(
    *[run_command("myserver", "echo hello world %d" % i)
      for i in range(200)]))

By using asyncio.gather, it is possible to pass a list of coroutines and wait for all of them to be finished. Once run, this program prints the following:

$ time python3
['hello world 0\n', 'hello world 1\n', 'hello world 2\n', … 'hello world 199\n']
python3  4.90s user 0.34s system 35% cpu 14.761 total

This version took only a third of the original execution time to finish! As a fun note, the main limitation here is that my remote server has trouble handling more than 150 connections in parallel, so this program is a bit tough on it on its own.
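One way to be gentler on the remote side is to cap how many coroutines hold a connection at once with an asyncio.Semaphore. The sketch below is an illustration under stated assumptions: the limit of 150 matches my server's observed capacity, and asyncio.sleep stands in for the real asyncssh call so the snippet runs without a server.

```python
import asyncio

async def run_command(host, command, semaphore):
    # At most 150 coroutines may enter this block concurrently, so the
    # remote server never sees more than 150 simultaneous connections.
    async with semaphore:
        # Placeholder for the asyncssh-based run_command shown earlier;
        # asyncio.sleep simulates the network round-trip.
        await asyncio.sleep(0.01)
        return "hello world\n"

async def main():
    semaphore = asyncio.Semaphore(150)  # assumed safe connection limit
    return await asyncio.gather(
        *[run_command("myserver", "echo hello world %d" % i, semaphore)
          for i in range(200)])

outputs = asyncio.run(main())
print(len(outputs))  # 200
```

The semaphore trades a little latency (the last 50 tasks wait for a slot) for a remote server that stays responsive.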


To show how great this method is, I've built a chart below that shows the difference in execution time between the two approaches, depending on the number of hosts the application has to connect to.


The trend lines highlight the difference in execution time and how important concurrency is here. For 10,000 nodes, the time needed for a serial execution would be around 40 minutes, whereas it would be only 7 minutes with a cooperative approach – quite a difference. The concurrent approach allows executing one command 205 times a day rather than only 36 times!

That was the second step

Using an event loop for tasks that can run concurrently due to their I/O-intensive nature is really a great way to maximize the throughput of a program. This simple change made the program up to 6× faster.

Anyhow, this is not the only way to scale a Python program. There are a few other options available on top of this mechanism – I've covered those in my book Scaling Python, if you're interested in learning more!

Until then, stay tuned for the next article of this series!