Recently, I've been fighting with the never ending issue of timezones. I never thought I would have plunged into this rabbit hole, but hacking on OpenStack and Gnocchi I felt into that trap easily is, thanks to Python.
“Why you really, really, should never ever deal with timezones”
To get a glimpse of the complexity of timezones, I recommend that you watch Tom Scott's video on the subject. It's fun and it summarizes remarkably well the nightmare that timezones are and why you should stop thinking that you're smart.
The importance of timezones in applications
Once you've heard what Tom says, I think it gets pretty clear that a timestamp without any timezone attached does not give any useful information. It should be considered irrelevant and useless. Without the necessary context given by the timezone, you cannot infer what point in time your application is really referring to.
That means your application should never handle timestamps with no timezone information. It should try to guess or raises an error if no timezone is provided in any input.
Of course, you can infer that having no timezone information means UTC. This sounds very handy, but can also be dangerous in certain applications or language – such as Python, as we'll see.
Indeed, in certain applications, converting timestamps to UTC and losing the timezone information is a terrible idea. Imagine that a user create a recurring event every Wednesday at 10:00 in its local timezone, say CET. If you convert that to UTC, the event will end up being stored as every Wednesday at 09:00.
Now imagine that the CET timezone switches from UTC+01:00 to UTC+02:00: your application will compute that the event starts at 11:00 CET every Wednesday. Which is wrong, because as the user told you, the event starts at 10:00 CET, whatever the definition of CET is. Not at 11:00 CET. So CET means CET, not necessarily UTC+1.
As for endpoints like REST API, a thing I daily deal with, all timestamps should include a timezone information. It's nearly impossible to know what timezone the timestamps are in otherwise: UTC? Server local? User local? No way to know.
Python design & defect
Python comes with a timestamp object named
datetime.datetime. It can store date and time precise to the microsecond, and is qualified of timezone "aware" or "unaware", whether it embeds a timezone information or not.
To build such an object based on the current time, one can use
datetime.datetime.utcnow() to retrieve the date and time for the UTC timezone, and
datetime.datetime.now() to retrieve the date and time for the current timezone, whatever it is.
>>> import datetime >>> datetime.datetime.utcnow() datetime.datetime(2015, 6, 15, 13, 24, 48, 27631) >>> datetime.datetime.now() datetime.datetime(2015, 6, 15, 15, 24, 52, 276161)
As you can notice, none of these results contains timezone information. Indeed, Python
datetime API always returns unaware
datetime objects, which is very unfortunate. Indeed, as soon as you get one of this object, there is no way to know what the timezone is, therefore these objects are pretty "useless" on their own.
Armin Ronacher proposes that an application always consider that the unaware
datetime objects from Python are considered as UTC. As we just saw, that statement cannot be considered true for objects returned by
datetime.datetime.now(), so I would not advise doing so.
datetime objects with no timezone should be considered as a "bug" in the application.
My recommendation list comes down to:
- Always use aware
datetimeobject, i.e. with timezone information. That makes sure you can compare them directly (aware and unaware
datetimeobjects are not comparable) and will return them correctly to users. Leverage pytz to have timezone objects.
- Use ISO 8601 as input and output string format. Use
datetime.datetime.isoformat()to return timestamps as string formatted using that format, which includes the timezone information.
In Python, that's equivalent to having:
>>> import datetime >>> import pytz >>> def utcnow(): return datetime.datetime.now(tz=pytz.utc) >>> utcnow() datetime.datetime(2015, 6, 15, 14, 45, 19, 182703, tzinfo=<UTC>) >>> utcnow().isoformat() '2015-06-15T14:45:21.982600+00:00'
If you need to parse strings containing ISO 8601 formatted timestamp, you can rely on the iso8601, which returns timestamps with correct timezone information. This makes timestamps directly comparable:
>>> import iso8601 >>> iso8601.parse_date(utcnow().isoformat()) datetime.datetime(2015, 6, 15, 14, 46, 43, 945813, tzinfo=<FixedOffset '+00:00' datetime.timedelta(0)>) >>> iso8601.parse_date(utcnow().isoformat()) < utcnow() True
If you need to store those timestamps, the same rule should apply. If you rely on MongoDB, it assumes that all the timestamp are in UTC, so be careful when storing them – you will have to normalize the timestamp to UTC.
For MySQL, nothing is assumed, it's up to the application to insert them in a timezone that makes sense to it. Obviously, if you have multiple applications accessing the same database with different data sources, this can end up being a nightmare.
PostgreSQL has a special data type that is recommended called
timestamp with timezone. That does not mean you should not use UTC in most cases; that just means you are sure that the timestamp are stored in UTC since it's written in the database, and you check if any other application inserted timestamps with different timezone.
As a side note, I've improved OpenStack situation recently by changing the oslo.utils.timeutils module to deprecate some useless and dangerous functions. I've also added support for returning timezone aware objects when using the
oslo_utils.timeutils.utcnow() function. It's not possible to make it a default unfortunately for backward compatibility reason, but it's there nevertheless, and it's advised to use it. Thanks to my colleague Victor for the help!
Have a nice day, whatever your timezone is!