Working with datetime data sets can be one of the most frustrating aspects of computer programming. Not only do you need to keep track of the date, but you also need to learn how to represent dates and times in each language, create indices from those data points and ensure all your data sets handle daylight saving time in the same manner. Fortunately, this primer on datetime data will help you get started working in Python.
What Is a Datetime Object in Python?
Most representations of date and time in Python are datetime objects, created with the datetime module from Python’s standard library. This means that knowing the datetime module and how to use it is critical!
Fundamentally, a datetime object is a variable that contains information about (no surprise coming here) a date and time. It can also include time zone information, and there are tools for changing time zones as needed.
Let’s look at a few examples of datetime objects created with the datetime module. First, we can use datetime.datetime.now() to create a variable storing the current time as follows:
import datetime
import pytz
now = datetime.datetime.now(pytz.timezone('US/Pacific'))
print(now)
print(now.tzinfo)
The first two lines import the packages needed for this task. The first is the datetime module, which enables us to create and manipulate datetime objects. The second is the third-party pytz package, which provides time zone information.
The third line calls the datetime.datetime.now() function to create a datetime object representing the moment at which we run the code. Passing pytz.timezone('US/Pacific') as its argument also attaches a time zone, so the datetime represents a time in the U.S. Pacific time zone.
The fourth and fifth lines print the datetime and its time zone so we can inspect the result.
The outputs from that code are:
2022-05-11 09:19:01.859385-07:00
US/Pacific
The first output shows the full contents of the variable now. It shows that the variable was created on May 11, 2022 at 9:19:01.86 a.m. Since I set the time zone to 'US/Pacific', the program attaches the appropriate offset of -7 hours from UTC (Coordinated Universal Time); in May the Pacific time zone observes daylight saving time, which puts it at UTC-7. The second output confirms the time zone information by printing that the variable’s time zone is the U.S. Pacific time zone.
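To make the difference between naive and aware datetimes concrete, here is a minimal sketch using only datetime and pytz. It also uses pytz’s localize() method, which attaches a zone to a naive datetime, to show that the Pacific offset depends on the date: -7 hours during daylight saving time and -8 hours otherwise. The January date is an arbitrary choice for illustration.

```python
import datetime

import pytz

pacific = pytz.timezone('US/Pacific')

# Without a time zone argument, now() returns a "naive" datetime.
naive_now = datetime.datetime.now()
print(naive_now.tzinfo)  # None

# With a time zone argument, now() returns an "aware" datetime.
aware_now = datetime.datetime.now(pacific)
print(aware_now.utcoffset() is not None)  # True

# The offset depends on the date: Pacific Daylight Time is UTC-7,
# Pacific Standard Time is UTC-8.
may = pacific.localize(datetime.datetime(2022, 5, 11, 9, 19))
january = pacific.localize(datetime.datetime(2022, 1, 11, 9, 19))
print(may.utcoffset() == datetime.timedelta(hours=-7))      # True
print(january.utcoffset() == datetime.timedelta(hours=-8))  # True
```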
You’ll notice that in the above code I set the time zone to the U.S. Pacific time zone by calling pytz.timezone('US/Pacific'). If you want to use a different time zone, you need to know its name. The names come from the IANA time zone database and follow a consistent pattern (for example 'America/New_York' or 'Europe/London', plus legacy aliases such as 'US/Pacific'), so they’re fairly predictable. If you want to find your time zone, you can print a list of all options with the following command:
print(pytz.all_timezones)
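If the full list is too long to scan, filtering it is one option. The sketch below keeps only the names that start with 'US/' and shows what happens with a name pytz does not recognize; 'US/Atlantis' is a made-up name used only to trigger the error.

```python
import pytz

# pytz.all_timezones lists every zone name pytz knows about.
# Filtering by prefix narrows it down to the legacy U.S. aliases.
us_zones = [name for name in pytz.all_timezones if name.startswith('US/')]
print(us_zones)

# Checking a name up front avoids an error later on.
print('US/Eastern' in pytz.all_timezones)  # True

# An unrecognized name raises pytz.UnknownTimeZoneError.
# 'US/Atlantis' is a deliberately made-up name.
try:
    pytz.timezone('US/Atlantis')
except pytz.UnknownTimeZoneError:
    print('Unknown time zone')
```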
We can also call the datetime.datetime constructor to create a datetime for a specified date and time. Notice the format of the datetime object showing the current time in the previous example: the inputs to datetime.datetime are provided in the same order (year, month, day, hour, minute, second). In other words, if we want to create a datetime object representing May 11, 2022 at 12:11:03 in the U.S. Pacific time zone, we can use the following code:
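specified_datetime = pytz.timezone('US/Pacific').localize(datetime.datetime(2022, 5, 11, 12, 11, 3))
print(specified_datetime)
print(specified_datetime.tzinfo)
Notice how the inputs to datetime.datetime appear above. That call creates a naive datetime (one without time zone information), and the time zone’s .localize() method then attaches the U.S. Pacific time zone to it. Calling .astimezone() on a naive datetime instead would treat it as the computer’s local time and convert it, so the result would only match this example on a machine set to Pacific time.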
The outputs from that code are:
2022-05-11 12:11:03-07:00
US/Pacific
The code created the variable we wanted: specified_datetime now holds May 11, 2022 at 12:11:03 in the U.S. Pacific time zone.
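A common pitfall with pytz is passing a time zone directly as the tzinfo of a datetime. The sketch below, assuming the same date as above, compares that approach with localize(), which the pytz documentation recommends for attaching a zone to a naive datetime.

```python
import datetime

import pytz

pacific = pytz.timezone('US/Pacific')
naive = datetime.datetime(2022, 5, 11, 12, 11, 3)

# Recommended with pytz: localize() picks the offset in effect on that date.
localized = pacific.localize(naive)
print(localized)  # 2022-05-11 12:11:03-07:00

# Pitfall: attaching a pytz zone via tzinfo uses the zone's earliest
# historical offset (local mean time) rather than PDT, so the offset is wrong.
wrong = naive.replace(tzinfo=pacific)
print(wrong.utcoffset() == localized.utcoffset())  # False
```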
Now what if we want to represent the same moment in a different time zone? We could calculate the new time by hand and create a new datetime object, but that would require us to know the time difference and do the math ourselves. Another option is to convert the datetime to the desired time zone and save the output to a new variable. So if we want to convert specified_datetime to the U.S. Eastern time zone, we can use the following code:
eastern_datetime = specified_datetime.astimezone(pytz.timezone('US/Eastern'))
print(eastern_datetime)
print(eastern_datetime.tzinfo)
That code calls the .astimezone() method of specified_datetime with a new time zone object representing the U.S. Eastern time zone. The printed outputs are:
2022-05-11 15:11:03-04:00
US/Eastern
Notice the changes between specified_datetime and eastern_datetime. The hour has changed from 12 to 15 because U.S. Eastern time is three hours ahead of U.S. Pacific time. The UTC offset has changed from -7 to -4 because, during daylight saving time, U.S. Eastern time is four hours behind UTC instead of seven. Finally, notice that the printed time zone information is now US/Eastern instead of US/Pacific.
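Converting between time zones changes how a moment is displayed, not the moment itself. This sketch rebuilds the same two datetimes, checks that they compare as equal and confirms that both map to the same UTC time.

```python
import datetime

import pytz

specified_datetime = pytz.timezone('US/Pacific').localize(
    datetime.datetime(2022, 5, 11, 12, 11, 3))
eastern_datetime = specified_datetime.astimezone(pytz.timezone('US/Eastern'))

# Aware datetimes compare by the instant they represent, so these are equal
# even though their wall-clock hours (12 and 15) differ.
print(specified_datetime == eastern_datetime)  # True
print(eastern_datetime.hour)  # 15

# Converting both to UTC makes the shared instant explicit.
print(specified_datetime.astimezone(pytz.utc))  # 2022-05-11 19:11:03+00:00
print(eastern_datetime.astimezone(pytz.utc))    # 2022-05-11 19:11:03+00:00
```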
Pandas Indices
Pandas dataframes often use datetime objects as their indices because this lets a data set track the date and time at which each measurement was recorded. As a result, Pandas provides many tools for working with datetime indices.
Your first encounter with a datetime index was most likely when importing somebody else’s data set that happens to use one. Consider the following example and the outputs provided:
import pandas as pd
data = pd.read_csv(r'C:\Users\Peter Grant\Desktop\Sample_Data.csv', index_col=0)
print(data.index)
print(type(data.index))
print(type(data.index[0]))
The index column in the sample data set contains dates and times, so you’d expect the dataframe’s index to be a datetime index, right? Unfortunately, you’d be wrong. Let’s take a look at the outputs:
Index(['10/1/2020 0:00', '10/1/2020 0:00', '10/1/2020 0:00', '10/1/2020 0:01',
'10/1/2020 0:01', '10/1/2020 0:01', '10/1/2020 0:01', '10/1/2020 0:02',
'10/1/2020 0:02', '10/1/2020 0:02',
...
'4/1/2021 2:01', '4/1/2021 2:01', '4/1/2021 2:02', '4/1/2021 2:02',
'4/1/2021 2:02', '4/1/2021 2:02', '4/1/2021 2:03', '4/1/2021 2:03',
'4/1/2021 2:03', '4/1/2021 2:03'],
dtype='object', length=1048575)
<class 'pandas.core.indexes.base.Index'>
<class 'str'>
At first glance the index looks like what we’d expect. Each entry represents a date and time, and the entries step forward through time to the end of the index. Samples are recorded every 15 seconds, so there are four per minute, which is good, but the values don’t show the seconds, which is a bit odd.
But then things get strange. The index is a generic Index (with dtype='object') when we want it to be a DatetimeIndex. Finally, the type of the first entry is a string instead of a datetime object.
Pandas has read in the index as strings, not as a datetime index. This is the default behavior of read_csv() unless you ask it to parse dates (for example, with its parse_dates argument). Fortunately, Pandas also has a to_datetime() function that solves this problem! Consider the following code:
data.index = pd.to_datetime(data.index)
print(data.index)
print(type(data.index))
print(type(data.index[0]))
And the outputs:
DatetimeIndex(['2020-10-01 00:00:00', '2020-10-01 00:00:00',
'2020-10-01 00:00:00', '2020-10-01 00:01:00',
'2020-10-01 00:01:00', '2020-10-01 00:01:00',
'2020-10-01 00:01:00', '2020-10-01 00:02:00',
'2020-10-01 00:02:00', '2020-10-01 00:02:00',
...
'2021-04-01 02:01:00', '2021-04-01 02:01:00',
'2021-04-01 02:02:00', '2021-04-01 02:02:00',
'2021-04-01 02:02:00', '2021-04-01 02:02:00',
'2021-04-01 02:03:00', '2021-04-01 02:03:00',
'2021-04-01 02:03:00', '2021-04-01 02:03:00'],
dtype='datetime64[ns]', length=1048575, freq=None)
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Ah ha. This looks much better. The index of the dataframe is now a DatetimeIndex, and the type of the first entry is a Pandas Timestamp (a subclass of Python’s datetime object, so it behaves like one). This is what we wanted, and something we can work with.
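Since the original CSV file isn’t available here, the sketch below uses a small hypothetical in-memory CSV (csv_text) with the same date layout. It also passes an explicit format string to to_datetime(), an optional argument that spells out the layout instead of leaving Pandas to infer it; the format '%m/%d/%Y %H:%M' is an assumption based on the printed index, not something confirmed about the file.

```python
import datetime
import io

import pandas as pd

# Hypothetical stand-in for Sample_Data.csv: a few rows using the same
# month/day/year hour:minute layout as the index printed above.
csv_text = """Timestamp,Value
10/1/2020 0:00,1.0
10/1/2020 0:00,1.1
10/1/2020 0:01,1.2
4/1/2021 2:03,1.3
"""

data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(type(data.index[0]))  # <class 'str'>

# An explicit format tells to_datetime exactly how to read each string
# instead of leaving Pandas to guess the layout.
data.index = pd.to_datetime(data.index, format='%m/%d/%Y %H:%M')
print(type(data.index))
print(data.index[0])  # 2020-10-01 00:00:00

# A Timestamp is a subclass of datetime.datetime, so it can be used
# wherever a datetime is expected.
print(issubclass(pd.Timestamp, datetime.datetime))  # True
```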
But what if you want to create a datetime index of your own? If you know the date range and frequency you want, you can use the Pandas date_range() function. Here’s an example:
index = pd.date_range(datetime.datetime(2022, 1, 1, 0, 0), datetime.datetime(2022, 12, 31, 23, 55), freq = '5min')
print(index)
This code returns the following output:
DatetimeIndex(['2022-01-01 00:00:00', '2022-01-01 00:05:00',
'2022-01-01 00:10:00', '2022-01-01 00:15:00',
'2022-01-01 00:20:00', '2022-01-01 00:25:00',
'2022-01-01 00:30:00', '2022-01-01 00:35:00',
'2022-01-01 00:40:00', '2022-01-01 00:45:00',
...
'2022-12-31 23:10:00', '2022-12-31 23:15:00',
'2022-12-31 23:20:00', '2022-12-31 23:25:00',
'2022-12-31 23:30:00', '2022-12-31 23:35:00',
'2022-12-31 23:40:00', '2022-12-31 23:45:00',
'2022-12-31 23:50:00', '2022-12-31 23:55:00'],
dtype='datetime64[ns]', length=105120, freq='5T')
Comparing the call to date_range() with the function’s documentation, you can see that the first two arguments set the start and end of the range. The start was set to January 1, 2022 at midnight, and the end was set to December 31, 2022 at 23:55:00. The third argument sets the frequency of the datetime index to five minutes. Notice that the code for five minutes is '5min' (older versions of Pandas display it as '5T', as in the output above). To get the frequency you want, you need to use the correct code. Fortunately, the Pandas documentation lists these codes, which it calls offset aliases.
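A quick way to sanity-check a generated index is to compare its length with the number of steps you expect. This sketch rebuilds the index above and also shows the periods argument, which takes a start and a number of entries instead of an end date.

```python
import datetime

import pandas as pd

# Same range as above; date_range includes both the start and the end.
index = pd.date_range(datetime.datetime(2022, 1, 1, 0, 0),
                      datetime.datetime(2022, 12, 31, 23, 55), freq='5min')

# 2022 has 365 days, and each day has 24 * 60 / 5 = 288 five-minute steps.
expected = 365 * 24 * 60 // 5
print(len(index), expected)  # 105120 105120

# Alternatively, give a start and a number of periods instead of an end.
daily = pd.date_range('2022-01-01', periods=3, freq='D')
print(daily)
```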
Indexing with Pandas datetime indices can also be a bit of a pain. At first glance it seems that you need to create complex datetime objects to reference the correct part of the dataframe. Consider the following example wherein I create a new dataframe using the index from the last example, set a value in the dataframe and print that value to ensure it updates correctly.
df = pd.DataFrame(index = index, columns = ['Example'])
df.loc[datetime.datetime(2022, 1, 1, 0, 0, 0), 'Example'] = 2
print(df.loc[datetime.datetime(2022, 1, 1, 0, 0, 0), 'Example'])
The output from this code is exactly what I wanted. It prints 2, showing that the dataframe’s value at [datetime.datetime(2022, 1, 1, 0, 0, 0), 'Example'] is 2 as desired. But specifying that datetime over and over again gets tedious.
Fortunately, you can also reference a datetime index by position. If you want to edit the first entry in the index, you can do so as follows:
df = pd.DataFrame(index = index, columns = ['Example'])
df.loc[df.index[0], 'Example'] = 2
print(df.loc[df.index[0], 'Example'])
Notice how that code does the exact same thing, except it provides the desired index value by looking up the first value of the index with df.index[0]. You don’t even need to know what that value is; you just need to know that you want to work with the first one, or the second, or the third, or whichever position you want. You only need to update the position in the call accordingly.
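Two other options can save typing. The sketch below, using a smaller two-day version of the index, writes a value with the purely positional .iloc and then passes a date string to .loc, which Pandas treats as selecting every row on that day (partial string indexing).

```python
import pandas as pd

# A smaller, two-day version of the five-minute index used above.
index = pd.date_range('2022-01-01', periods=2 * 288, freq='5min')
df = pd.DataFrame(index=index, columns=['Example'], dtype=float)

# Purely positional access with .iloc: row 0 of the 'Example' column.
col = df.columns.get_loc('Example')
df.iloc[0, col] = 2.0
print(df.iloc[0, col])  # 2.0

# Partial string indexing: a date string selects every row on that day.
first_day = df.loc['2022-01-01']
print(len(first_day))  # 288
```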
Daylight Saving Time in Python
Working with daylight saving time is one of the biggest pains in Python, and one that can cause serious data analysis errors. Consider comparing physical measurements to theoretical approximations. You have two data sets, and you want to make sure they say the same thing. What if one data set uses daylight saving time and the other doesn’t? Suddenly, for much of the year, you’re comparing data sets that disagree by one hour.
One way to resolve this issue is to subtract one hour from the index of the data set that uses daylight saving time, but only for the timestamps that fall during daylight saving time. To help with this, time zone aware Pandas timestamps have a .dst() method, which returns the daylight saving time offset at that moment. If the timestamp occurs during daylight saving time, it returns a datetime.timedelta of one hour (for U.S. time zones). If it does not, it returns a datetime.timedelta of zero. This lets us identify timestamps that occur during daylight saving time and subtract one hour from them.
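Here is a short sketch of what .dst() returns for a summer timestamp, a winter timestamp and a timestamp without a time zone. The dates are arbitrary examples.

```python
import datetime

import pandas as pd

summer = pd.Timestamp('2022-07-01 12:00', tz='US/Pacific')
winter = pd.Timestamp('2022-01-01 12:00', tz='US/Pacific')
naive = pd.Timestamp('2022-07-01 12:00')

print(summer.dst())  # 1:00:00
print(winter.dst())  # 0:00:00
print(naive.dst())   # None, because a naive timestamp has no zone to check
```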
Imagine that you have a dataframe with a time zone aware datetime index that includes a daylight saving time offset. To remove daylight saving time, you can iterate through the index, make each timestamp time zone naive and subtract one hour from those that fall during daylight saving time. Unfortunately, indices are immutable, so you can’t edit them in place. What you can do is create an external list, add your updated index values to it and replace the index with that list at the end. That code would look like this:
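import datetime

temp = []
for ts in df.index:
    # .dst() is one hour during daylight saving time and zero otherwise
    if ts.dst() == datetime.timedelta(hours=1):
        # Drop the time zone (keeping local clock time), then undo the DST shift
        temp.append(ts.tz_localize(None) - datetime.timedelta(hours=1))
    else:
        # Outside daylight saving time, only drop the time zone
        temp.append(ts.tz_localize(None))
# Indices are immutable, so replace the whole index at once
df.index = temp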
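The loop can be written more compactly and, for a single known time zone, vectorized. The sketch below uses hypothetical daily data in the U.S. Pacific time zone. The vectorized line swaps in a different technique: it converts to the fixed-offset zone 'Etc/GMT+8' (Pacific Standard Time all year), which is only equivalent for Pacific data; another zone would need its own standard offset.

```python
import pandas as pd

# Hypothetical example data: one reading per day at local midnight in 2022,
# with a time zone aware index so that .dst() is meaningful.
index = pd.date_range('2022-01-01', '2022-12-31', freq='D', tz='US/Pacific')
df = pd.DataFrame({'Example': range(len(index))}, index=index)

# Compact version of the loop: .dst() is already a timedelta (one hour or
# zero), so it can be subtracted directly from each naive timestamp.
loop_index = pd.DatetimeIndex([ts.tz_localize(None) - ts.dst() for ts in df.index])

# Vectorized alternative (assumes U.S. Pacific data): convert to a fixed UTC-8
# zone and drop the zone. 'Etc/GMT+8' means UTC-8; Etc/ names invert the sign.
vector_index = df.index.tz_convert('Etc/GMT+8').tz_localize(None)

print((loop_index == vector_index).all())  # True
print(vector_index[181])  # 2022-06-30 23:00:00
```

Both approaches express the whole year in Pacific Standard Time; midnight on July 1 (a daylight saving date) becomes 23:00 on June 30.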
And there you have it. Now you know how to work with datetime objects, use them to form the indices of your Pandas dataframes and remove daylight saving time from your data sets.