j-bennet codes

On coding and data analysis, by Irina Truong (j-bennet@github)


Dealing with datetimes like a pro in Pandas

In my previous article (https://codeburst.io/dealing-with-datetimes-like-a-pro-in-python-fb3ac0feb94b), I wrote about challenges related to the datetime type in Python. My recommended approach for solving them was to use the Pendulum library (https://github.com/sdispater/pendulum).

But guess what?

Pandas can solve those problems just as well!

What is Pandas?

Pandas is an open-source Python library designed for data analysis. If you haven’t heard about it before, check out the comprehensive documentation here: http://pandas.pydata.org/.

Challenge #1: Parsing datetimes

Let’s see how Pandas would help with your Google Analytics-like application. In that application, you were parsing log lines that looked like this:

Here is how you’d do that with Pandas:

This code:

  • reads the log lines
  • splits each line into parts, preserving only the relevant fields, and
  • converts the resulting list of tuples into a Pandas DataFrame.
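Here is a minimal sketch of those three steps, under a few assumptions: the log lines follow the Apache Common Log Format (whose timestamps use the colon-separated style described below), the timestamps are in UTC, the lines sit in a list instead of being read from a file, and the sample lines and the column names ip, date, request and status are hypothetical.

```python
import pandas as pd

# Hypothetical sample lines in Apache Common Log Format.
log_lines = [
    '127.0.0.1 - - [12/Jan/2018:13:45:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.5 - - [12/Jan/2018:13:59:40 +0000] "GET /about.html HTTP/1.1" 200 1024',
    '127.0.0.1 - - [12/Jan/2018:14:02:13 +0000] "GET /index.html HTTP/1.1" 404 128',
]


def parse_line(line):
    """Split one log line, keeping only the fields we care about."""
    # Splitting on double quotes keeps the request (which contains spaces) intact.
    head, request, tail = line.split('"')
    ip = head.split()[0]
    # The timestamp sits between square brackets; drop the "+0000" offset,
    # since these logs are assumed to be in UTC.
    date = head[head.index('[') + 1:head.index(']')].split()[0]
    status = tail.split()[0]
    return ip, date, request, status


# The list of tuples becomes a four-column DataFrame.
df = pd.DataFrame([parse_line(line) for line in log_lines],
                  columns=['ip', 'date', 'request', 'status'])
print(df)
```

Every value in this DataFrame is still a plain string, including the dates and the status codes.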

Think of the DataFrame object as a table-like structure. It has 4 columns and contains the following data:

At this point, every field is still a string (or, to be exact, every column has NumPy’s generic object dtype). Now you get to the datetime parsing part:

The code above:

  • provides the format string, because the log file uses a non-standard date format (date and time parts are separated by a colon “:” instead of a space “ ”)
  • provides utc=True to tell Pandas that your dates and times should be UTC rather than naive.
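A self-contained sketch of this parsing step, using pandas.to_datetime with an explicit format and utc=True. The two sample rows are hypothetical, and the format string assumes Apache-style timestamps such as 12/Jan/2018:13:45:01.

```python
import pandas as pd

# A tiny stand-in for the parsed log DataFrame (hypothetical values).
df = pd.DataFrame({
    'ip': ['127.0.0.1', '10.0.0.5'],
    'date': ['12/Jan/2018:13:45:01', '12/Jan/2018:13:59:40'],
    'request': ['GET /index.html HTTP/1.1', 'GET /about.html HTTP/1.1'],
    'status': ['200', '200'],
})

# The explicit format handles the ":" between the date and the time;
# utc=True makes the result timezone-aware (UTC) instead of naive.
df['date'] = pd.to_datetime(df['date'], format='%d/%b/%Y:%H:%M:%S', utc=True)
print(df.dtypes)
```

Passing the format explicitly also saves Pandas from guessing it, which matters for a format this unusual.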

That’s all it takes.

Challenge #2: Displaying datetimes with timezones

First, let’s use your date field as the dataframe’s index. This will give you a DatetimeIndex with lots of useful methods:
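A minimal sketch of moving the parsed column into the index with set_index. The sample data is hypothetical and already parsed as UTC.

```python
import pandas as pd

# Hypothetical parsed log data with a timezone-aware "date" column.
df = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-12 13:45:01', '2018-01-12 13:59:40'], utc=True),
    'request': ['GET /index.html HTTP/1.1', 'GET /about.html HTTP/1.1'],
})

# Move the datetimes out of the columns and into the index.
df = df.set_index('date')
print(type(df.index).__name__)
```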

Now, you can convert datetimes to the user’s timezone:

And get a localized dataframe:
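A sketch of both conversions, assuming a hypothetical user in the America/Los_Angeles timezone: DatetimeIndex.tz_convert converts just the index, while DataFrame.tz_convert returns a whole DataFrame whose index is in the target timezone.

```python
import pandas as pd

# Hypothetical hits, indexed by UTC timestamps.
df = pd.DataFrame(
    {'request': ['GET /index.html HTTP/1.1', 'GET /about.html HTTP/1.1']},
    index=pd.DatetimeIndex(['2018-01-12 13:45:01', '2018-01-13 02:10:00'],
                           tz='UTC', name='date'),
)

# Hypothetical user timezone; any IANA timezone name works here.
user_tz = 'America/Los_Angeles'

# Convert only the index...
local_index = df.index.tz_convert(user_tz)

# ...or get a DataFrame whose index is in the user's timezone.
local_df = df.tz_convert(user_tz)
print(local_df)
```

Note that the second hit happened on January 13 in UTC, but it lands on January 12 for a user in Los Angeles, which is exactly why you convert before grouping by day.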

Challenge #3: Rounding (truncating) datetimes

To aggregate things by hour, you have to round datetimes down to the hour. DatetimeIndex has a method for that:

If you want to round up to the hour instead, there’s a corresponding ceiling method, ceil.
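A short sketch of rounding in both directions with DatetimeIndex.floor and DatetimeIndex.ceil; the two timestamps are hypothetical.

```python
import pandas as pd

# Hypothetical hit times, already parsed as UTC.
idx = pd.DatetimeIndex(['2018-01-12 13:45:01', '2018-01-12 14:02:13'], tz='UTC')

# floor truncates each timestamp to the start of its hour...
hour_start = idx.floor('h')
# ...and ceil rounds it up to the next hour boundary.
hour_end = idx.ceil('h')
print(hour_start)
print(hour_end)
```

The hourly alias is spelled 'h' here; older code often uses 'H', which pandas 2.2 deprecated.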

Now, to count things in this dataframe, group by date and request:

Here is your aggregate:
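A self-contained sketch of the hourly count, assuming hypothetical request strings and a UTC index named date: truncate the index to the hour, then group by the index level and the request column and count rows with size.

```python
import pandas as pd

# Hypothetical hits, indexed by UTC timestamps.
df = pd.DataFrame(
    {'request': ['GET /index.html', 'GET /index.html',
                 'GET /about.html', 'GET /index.html']},
    index=pd.DatetimeIndex(
        ['2018-01-12 13:05:00', '2018-01-12 13:45:01',
         '2018-01-12 13:59:40', '2018-01-12 14:02:13'],
        tz='UTC', name='date',
    ),
)

# Truncate every timestamp to its hour (keeping the index name "date").
df.index = df.index.floor('h').rename('date')

# groupby accepts index level names and column names side by side;
# size() counts the rows in each (hour, request) group.
counts = df.groupby(['date', 'request']).size()
print(counts)
```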

Challenge #4: Finding edges of an interval

Here is how you can calculate the start of a week:
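One way to compute it, shown as a sketch rather than the only approach: go back to midnight with normalize, then step back dayofweek days. It assumes weeks start on Monday (which pandas numbers as dayofweek 0) and uses a hypothetical timestamp. pd.DateOffset is used instead of pd.Timedelta because it works in calendar days rather than fixed 24-hour periods, which matters once you are in a timezone with daylight saving time.

```python
import pandas as pd

ts = pd.Timestamp('2018-01-12 13:45:01', tz='UTC')  # a Friday

# Assumption: weeks start on Monday (dayofweek == 0).
# Go back to midnight, then back dayofweek calendar days.
week_start = ts.normalize() - pd.DateOffset(days=ts.dayofweek)
print(week_start)  # Monday, January 8, 2018, at midnight UTC
```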

And the start of next week:
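A sketch that builds on the same assumptions (Monday-based weeks, a hypothetical timestamp): the next week starts one calendar week after the current week’s start.

```python
import pandas as pd

ts = pd.Timestamp('2018-01-12 13:45:01', tz='UTC')  # a Friday

# Start of the current (Monday-based) week, as before.
week_start = ts.normalize() - pd.DateOffset(days=ts.dayofweek)

# Add one calendar week to get the start of the next week.
next_week_start = week_start + pd.DateOffset(weeks=1)
print(next_week_start)  # Monday, January 15, 2018, at midnight UTC
```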

Challenge #5: Creating ranges

Creating a range of dates is extremely easy. You can define the number of points you need:
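For example, pd.date_range can take a start point, a number of periods and a frequency; the start date here is hypothetical.

```python
import pandas as pd

# Seven consecutive days, starting from a (hypothetical) Monday.
days = pd.date_range(start='2018-01-08', periods=7, freq='D', tz='UTC')
print(days)
```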

Or provide a start and end date, and generate every point in between:
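A sketch with hypothetical endpoints: given both start and end, pd.date_range generates every point at the requested frequency, including both ends when they fall on the frequency.

```python
import pandas as pd

# Every hour between two (hypothetical) points, both ends included.
hours = pd.date_range(start='2018-01-12 00:00', end='2018-01-12 06:00',
                      freq='h', tz='UTC')
print(hours)
```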

I would not necessarily recommend installing Pandas just for its datetime functionality — it’s a pretty heavy library, and you may run into installation issues on some systems (*cough* Windows). But if you already use Pandas to process data, there’s no need for any additional libraries to deal with datetimes. You have this great tool right there, in Pandas’ toolbox.

Published in j-bennet codes