Common Applications of Time Series Data (Python x Pandas)

Firstly, time in Python can be represented as either a string or a timedelta or datetime object. We’ll have more on those two below. Whenever reading data from a .csv file using the pd.read_csv function, time is generally read as a string, which can be converted to a datetime object with a very simply function. But more importantly, it is imperative to understand how time is actually represented.

YYYY-MM-DD HH:mm:ss.msmsms (m is minute, s is seconds, ms is milliseconds).

This notation is both simple and effective when dealing with time. Now, while reading, your code sees this format as a string, so it doesn’t whether this represents time or not. The pd.to_datetime is a vectorized function that converts all values in a Series (for dataframes, a specific column) to datetime objects. Intuitively, the first half is the ‘date’, and the second part the ‘time.’

Perhaps the most important element of dealing with time in python is your purpose. Generally, one would use this data to either calculate the amount of time passed between readings, or to calculate the amount of time between a starting time and current time. Let’s look at each separately.

Between Readings

To start off, I’ll write down an example. Say you have a pH sensor that gives you the following values: 7, 8, 9, 8. Now, your computer also logs the time at which these values are received: 2019-06-19 4:03:12, 2019-06-19 9:06:10, 2019-06-19 23:56:55 and 2019-06-20 1:01:04. Now, we have to find the amount of time passed between readings. Thus, you have a pandas dataframe as follows. I will explain why the last row is as such below.

        pH       time
0 7 2019-06-19 4:03:12
1 8 2019-06-19 9:06:10
2 9 2019-06-19 23:56:55
3 8 2019-06-20 1:01:04
Now, what we will do here is shift each time value down by 1 and store it in a series. This is done using the shift() function from pandas that takes in 1 required argument, n, that determines the number of spaces to shift by and whether to shift from left to right or top to bottom (default is top to bottom).
 
shifted_time = df.time.shift(1)

creating this series

0    NaN
1    2019-06-19 4:03:12
2    2019-06-19 9:06:15
3    2019-06-19 23:56:55
name: shifted_time

now, creating a last row of NaN and 0 respectively ensures that our last value is taken into consideration. The difference can now be calculated as follows.

diff = [df.datetime[i] - diff[i] for i in diff[]
diff[0] = 0    # as initial value has nothing to be compared against
df['difference'] = diff

creating the following data frame and giving us the amount of time between reading

    pH           time               difference
0 7 2019-06-19 4:03:12 ~
1 8 2019-06-19 9:06:15 0 days 05:03:03
2 9 2019-06-19 23:56:55 0 days 14:50:30
3 8 2019-06-20 1:01:04 0 days 01:05:09
This difference can be converted to seconds by specific functions, such as dt.total_seconds This just does this:

    pH           time               difference
0 7 2019-06-19 4:03:12 ~
1 8 2019-06-19 9:06:15 18183
2 9 2019-06-19 23:56:55 53430
3 8 2019-06-20 1:01:04 3909
difference is in seconds now. note: subtracting two datatime objects automatically creates a timedelta object, no need to do manually for latest version of pandas

Now, these sort of computations have many real world applications. For example, formula 1. If you’ve ever noticed, the person who comes first (Hamilton most of the time) has his time written in absolute terms, i.e, his time is written as it is. This is done by initializing the time column of the first row to 0. Then, from second place onwards, their time is represented as the difference by which they lost to the first person. I suggest you google F1 and have a look at it yourself.

Differences from starting point

This methodology isn’t so prominent to customers, but is extremely useful to data scientists looking to check whether the time data they receive has breaks or not. This is done by taking the minimum time value (normally the first one), and then subtracting it from each value – giving the time passed from the start to that specific point. These ‘differences’ are then plotted on the vertical axis, with the index of the time data in the entire data set taken on the horizontal axis.

Consequently, you get graphs like these that display the continuity of the data you have.

A log scale has been taken on the y-axis to ensure that small difference values are noticeable.

Now, Python, and especially pandas has many functions and methods to deal with time series data. This link would be the best to explore these functions –> https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html. One functionality that I absolutely adore is the ability to convert data as per timezone – adding both flexibility and adaptability to your code. This is extremely prominent on the web. For example, if you’ve ever noticed, most websites display the timings for a sport event as per your timezone. This is independent from python and pandas, but has the same functionality. Using your location, the time data employed by the server is converted to your timezone