Doing A/B testing with Apache httpd

Doing A/B testing with Apache httpd

When I started writing the landing page for The Hacker's Guide to Python, I wanted to try new things at the same time. I read about A/B testing a while ago, and I figured it was a good opportunity to test it out.

A/B testing

If you do not know what A/B testing is about, take a quick look at the Wikipedia page on that subject. Long story short, the idea is to serve two different version of a page to your visitors and check which one is getting the most success. When you found which version is better, you can definitely switch to it.

In the case of my book, I used that technique on the pre-launch page where people were able to subscribe to the newsletter. I didn't have a lot of things I wanted to test out on that page, so I just used that approach on the subtitle, being either "Learn everything you need to build a successful Python project" or "It's time you make the most out of Python".

Statistically, each version would be served half of the time, so both would get the same number of view. I then would build statistics about which page was attracting the most subscribers. With the results I would be able to switch definitively to that version of the landing page.

Technical design

My Web site, this Web site, is entirely static and served by Apache httpd. I didn't want to use any dynamic page, language or whatever. Mainly because I didn't want to have something else to install and maintain just for that on my server.

It turns out that Apache httpd is powerful enough to implement such a feature. There are different ways to build it, and I'm going to describe my choices here.

The first thing to pick is a way to balance the display of the page. You need to find a way so that if you get 100 visitors, around 50 will see the version A of your page, and around 50 will see the version B of the page.

You could use a random number generator, pick a random number for each visitor, and decides which page he's going to see. But it turns out that I didn't find a way to do that with Apache httpd at first sight.

My second thought was to pick the client IP address. But it's not such a good idea, because if you got visitors from, for example, people behind a company firewall, they are all going to be served the same page, so that kind of kills the statistics.

Finally, I picked time based balancing: if you visit the page on a second that is even, you get version A of the page, and if you visit the page on a second that is odd, you get version B. Simple, and so far nothing proves there are more visitors on even than odd seconds, or vice-versa.

The next thing is to always serve the same page to a returning visitor. I mean that if the visitor comes back later and get a different version, that's cheating. I decided the system should always serve the same page once a visitor "picked" a version. To do that, it's simple enough, you just have to use cookies to store the page the visitor has been attributed, and then use that cookie if he comes back.


To do that in Apache httpd, I used the powerful mod_rewrite that is shipped with it. I put 2 files in the books directory, named either "the-hacker-guide-to-python-a.html" and "the-hacker-guide-to-python-b.html" that got served when you requested "".

RewriteEngine On
RewriteBase /books

# If there's a cookie called thgtp-pre-version set,
# use its value and serve the page
RewriteCond %{HTTP_COOKIE} thgtp-pre-version=([^;])
RewriteRule ^the-hacker-guide-to-python$ %{REQUEST_FILENAME}-%1.html [L]

# No cookie yet and…
RewriteCond %{HTTP_COOKIE} !thgtp-pre-version=([^;]+)
# … the number of seconds of the time right now is even
RewriteCond %{TIME_SEC} [02468]$
# Then serve the page A and store "a" in a cookie
RewriteRule ^the-hacker-guide-to-python$ %{REQUEST_FILENAME}-a.html [,L]

# No cookie yet and…
RewriteCond %{HTTP_COOKIE} !thgtp-pre-version=([^;]+)
# … the number of seconds of the time right now is odd
RewriteCond %{TIME_SEC} [13579]$
# Then serve the page B and store "b" in a cookie
RewriteRule ^the-hacker-guide-to-python$ %{REQUEST_FILENAME}-b.html [,L]

With that few lines, it worked flawlessly.


The results were very good, as it worked perfectly. Combined with Google Analytics, I was able to follow the score of each page. It turns out that testing this particular little piece of content of the page was, as expected, really useless. The final score didn't allow to pick any winner. Which also kind of proves that the system worked perfectly.


But it still was an interesting challenge!