Getting started with Time Series using Pandas
Time series analysis is the backbone of many companies, since most businesses analyze their past data to inform future decisions. Analyzing such data can be tricky, but Python can help. It has both built-in tools and external libraries that make the whole analysis process easier. Python’s pandas library is frequently used to import, manage, and analyze datasets in various formats. In this article, we’ll use it to analyze stock prices and perform some basic time-series operations.
Time Series Data
Time series data is a sequence of data points listed in time order: a set of observations taken at specified times, usually at equal intervals. It is common in our day-to-day lives; the daily stock prices analyzed in this article are one example.
Time series data usually has a date-time index and a corresponding value for each entry in that index.
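As a minimal sketch of that structure, here is a tiny pandas Series with a date-time index. The dates and prices are made up for illustration and are not taken from the Maruti dataset.

```python
import pandas as pd

# Hypothetical daily prices; the values are invented for illustration
dates = pd.date_range(start="2020-01-01", periods=5, freq="D")
prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1], index=dates, name="price")

# Each value is paired with exactly one entry of the date-time index
print(prices)
print(type(prices.index))
```

Because the index holds real dates rather than strings, pandas can slice, resample, and shift the values by time, which is what the rest of this article relies on.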
Components of Time Series
Time series data mainly consists of four components:
- Trend Component: A variation that moves up or down in a reasonably predictable pattern over a long period.
- Seasonality Component: A variation that is regular and periodic and repeats itself over a specific period such as a day, week, month, or season.
- Cyclical Component: A variation that corresponds with business or economic ‘boom-bust’ cycles, or follows cycles of its own.
- Random Component: A variation that is erratic or residual and does not fall under any of the above three classifications.
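To make the components more concrete, the sketch below builds a synthetic monthly series from a trend, a seasonal pattern, and random noise, and then recovers a rough trend estimate with a 12-month rolling mean. Everything here is an assumption made for illustration: the slope, the sine-shaped seasonality, and the noise level are invented, and the cyclical component is left out to keep the example short.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Four years of hypothetical monthly observations
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
t = np.arange(48)

trend = 0.5 * t                              # steady upward drift
seasonal = 10 * np.sin(2 * np.pi * t / 12)   # pattern repeating every 12 months
noise = rng.normal(0, 1, 48)                 # erratic, residual variation

series = pd.Series(trend + seasonal + noise, index=idx)

# Averaging over one full seasonal cycle cancels the seasonal pattern
# and dampens the noise, leaving an estimate of the trend
trend_estimate = series.rolling(window=12, center=True).mean()
print(trend_estimate.dropna().head())
```

The rolling mean used here is one simple way to expose a trend; it is covered in more detail in the Rolling Windows section of this article.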
Here is a visual interpretation of the various components of the time series to make this concept clearer. You can view the original diagram with its context here.
Case study: Analyzing the stock prices of Maruti
This article aims to introduce some standard techniques used in time-series analysis and walks through the iterative steps required to manipulate and visualize time-series data.
Maruti Suzuki India Limited, formerly known as Maruti Udyog Limited, is an automobile manufacturer in India. It is a 56.21% owned subsidiary of the Japanese car and motorcycle manufacturer Suzuki Motor Corporation. As of July 2018, it had a market share of 53% of the Indian passenger car market [Wikipedia].
Importing necessary libraries and the Stock Data
Fire up the editor of your choice and type in the following code to import the required libraries and data. The data has been taken from Kaggle. The code along with the dataset can be accessed from here.
# Importing required modules
import warnings  # To silence warnings below
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
from datetime import datetime  # To access datetime
from pandas import Series  # To work on series

# Settings for pretty nice plots (Jupyter/IPython magic; omit in a plain script)
%matplotlib inline

# Ignore warnings
warnings.filterwarnings('ignore')

# Reading in the data
df = pd.read_csv('maruti_stock.csv')
A first look at Maruti’s stock Prices
Let’s look at the first few rows of the dataset.
# Inspecting the data
df.head()
To keep things simple, let’s limit the number of columns.
data = df[['Date','Open','High','Low','Close','Volume','VWAP']].copy()  # copy, so later edits do not touch df
Datetime objects in Python
Let us now look at the datatypes of the various columns.
It appears that the Date column is being treated as a string rather than as dates. Let’s make things right. For this, we shall use the pandas to_datetime function, which converts its arguments to dates. Lastly, we want to make sure that the Date column is the index column.
# Convert string to datetime64 (vectorized over the whole column)
data['Date'] = pd.to_datetime(data['Date'])
# Make Date the index column
data = data.set_index('Date')
The data has been imported, and we are ready to begin our analysis. Next, we will look at some of the essential attributes of stock data.
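The conversion step can be sketched on a small stand-in table. The three rows below are made up and only mimic the shape of the stock CSV; the column names Date and VWAP follow the dataset used in this article.

```python
import pandas as pd

# Small stand-in for the stock CSV; the rows are invented for illustration
raw = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "VWAP": [7300.5, 7285.1, 7310.9],
})

# Parse the strings into datetime64 values, then use them as the index
raw["Date"] = pd.to_datetime(raw["Date"])
raw = raw.set_index("Date")

print(raw.index)
```

If the file is known to contain a date column, the same result can usually be obtained at load time by passing parse_dates=['Date'] and index_col='Date' to pd.read_csv.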
Understanding the stock data
Before beginning any analysis, we must understand the data first. Typical stock data consists of many columns. Let’s dive a bit deeper into some of them:
Manipulating Time Series dataset with Pandas
Because pandas was developed in the context of financial modeling, it contains a comprehensive set of tools for working with dates, times, and time-indexed data. The name pandas is derived from the term “panel data,” an econometrics term for data sets that include observations over multiple time periods for the same individuals [source]. Let’s look at the main pandas data structures for working with time-series data.
1. DateTime Manipulations ⚙️
Python’s basic objects for working with dates and times reside in the built-in datetime module. In pandas, a single point in time is represented as a Timestamp, and the to_datetime() function creates Timestamps from strings in a wide variety of date/time formats. Let’s start with a plain datetime object.
from datetime import datetime
my_year = 2019
my_month = 4
my_day = 21
my_hour = 10
my_minute = 5
my_second = 30
We can now create datetime objects from the above values.
test_data = datetime(my_year,my_month,my_day)
datetime.datetime(2019, 4, 21, 0, 0)
We have selected only the day, month, and year. We could also include more details like an hour, minute, and second, if necessary.
test_data = datetime(my_year,my_month,my_day,my_hour,my_minute,my_second)
print("The day is : ",test_data.day)
print("The hour is : ",test_data.hour)
print("The month is : ",test_data.month)
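The same point in time can also be expressed as a pandas Timestamp. The sketch below shows two ways to get one: parsing a string with pd.to_datetime, and wrapping a standard datetime object. The date is the one used in the example above.

```python
from datetime import datetime

import pandas as pd

# Parse a string into a pandas Timestamp
ts_from_string = pd.to_datetime("2019-04-21 10:05:30")

# Wrap a standard-library datetime in a Timestamp
ts_from_datetime = pd.Timestamp(datetime(2019, 4, 21, 10, 5, 30))

# Both describe the same instant and expose the same attributes
print(ts_from_string == ts_from_datetime)
print(ts_from_string.year, ts_from_string.month, ts_from_string.day)
print(ts_from_string.day_name())
```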
For our stock price dataset, the index is of type DatetimeIndex. We can use pandas to obtain the minimum and maximum dates in the data.
We can also calculate the latest date index location and the earliest date index location as follows:
# Earliest date index location
print('Earliest date index location is: ',data.index.argmin())
# Latest date location
print('Latest date location: ',data.index.argmax())
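Here is a small sketch of the difference between the dates themselves and their positions in the index. The frame below is hypothetical and deliberately unsorted, so that the positions are not simply the first and last rows.

```python
import pandas as pd

# Hypothetical, deliberately unsorted dates
idx = pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"])
frame = pd.DataFrame({"VWAP": [3.0, 1.0, 2.0]}, index=idx)

# min()/max() return the dates themselves
earliest = frame.index.min()
latest = frame.index.max()

# argmin()/argmax() return the integer positions of those dates
earliest_pos = frame.index.argmin()
latest_pos = frame.index.argmax()

print(earliest, latest)
print(earliest_pos, latest_pos)
```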
2. Subsetting the time series data ✂️
Instead of working with the entire data, it is prudent to slice the time series data to highlight the portion of the data we are interested in. Since the volume-weighted average price (VWAP) is a trading benchmark, we shall limit our analysis to only that column.
df_vwap = df[['Date','VWAP']].copy() # df is the original dataframe; copy avoids modifying a view of it
df_vwap['Date'] = pd.to_datetime(df_vwap['Date'])
df_vwap.set_index("Date", inplace = True)
The data can also be sliced on year, month, or day, as follows:
# Slicing on year (.loc makes the row selection explicit)
vwap_subset = df_vwap.loc['2017':'2020']
# Slicing on month
vwap_subset = df_vwap.loc['2017-01':'2020-12']
# Slicing on day
vwap_subset = df_vwap.loc['2017-01-01':'2020-12-15']
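To see what these slices return, here is a sketch on a hypothetical frame with one row per calendar day. Note that both endpoints of a date-string slice are included, and that a partial string such as '2020' covers the whole year.

```python
import pandas as pd

# Hypothetical frame with one row per calendar day, 2016 through 2021
idx = pd.date_range("2016-01-01", "2021-12-31", freq="D")
frame = pd.DataFrame({"VWAP": range(len(idx))}, index=idx)

by_year = frame.loc["2017":"2020"]               # all of 2017 up to all of 2020
by_month = frame.loc["2018-10":"2018-11"]        # all of October and November 2018
by_day = frame.loc["2018-10-25":"2018-11-05"]    # both end dates included

print(len(by_year), len(by_month), len(by_day))
```

Slicing by date strings like this assumes the index is sorted in time order, which is the case for the date_range above.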
3. Visualizing the Time Series data 📊
Visualizing the time series data can offer a better perspective than merely looking at the numbers.
df_vwap['VWAP'].plot(figsize=(16,8),title='Volume weighted average price')
It appears that Maruti had a more or less steady increase in its stock price from 2004 to mid-2018. There seems to be some drop in 2019, though. Let’s further analyze the data for the year 2018.
ax = df_vwap.loc['2018', 'VWAP'].plot(figsize=(15,6))
ax.set_title('Month-wise Trend in 2018');
We see a dip in the stock prices, particularly around the end of October and November. Let’s further zoom in on these dates.
ax = df_vwap.loc['2018-10':'2018-11','VWAP'].plot(marker='o', linestyle='-',figsize=(15,6))
ax.set_title('Oct-Nov 2018 trend');
So there is a dip in stock prices around the last week of October and the first week of November. One could investigate it further by finding out if there was some special event on those days.
4. Feature Extraction ⚒
Let’s extract time and date features from the Date index.
# Date is already the index of df_vwap, so the features are read from the index
df_vwap['year'] = df_vwap.index.year
df_vwap['month'] = df_vwap.index.month
df_vwap['day'] = df_vwap.index.day
df_vwap['day of week'] = df_vwap.index.dayofweek  # Monday=0, Sunday=6
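The way to reach these features depends on where the dates live. As a sketch on made-up rows: when the dates are an ordinary column, the features are available through the .dt accessor, and when they are the index, the same attributes are read from the index directly.

```python
import pandas as pd

# Hypothetical rows spanning a month boundary
frame = pd.DataFrame({
    "Date": pd.date_range("2018-10-29", periods=4, freq="D"),
    "VWAP": [1.0, 2.0, 3.0, 4.0],
})

# Dates stored in a column: use the .dt accessor
frame["month"] = frame["Date"].dt.month
frame["day"] = frame["Date"].dt.day

# Dates stored in the index: read the attributes from the index itself
indexed = frame.set_index("Date")
indexed["day of week"] = indexed.index.dayofweek  # Monday=0, Sunday=6

print(indexed)
```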
5. Time resampling ⏳
Time resampling is a way to aggregate data with respect to a defined time period. We have the stock price data for each day, but this doesn’t make much sense if we want to see the trend for a financial institution. What is useful is the aggregated information for every month or every quarter. This helps the management to get an overview instantly and then make decisions based on this overview.
The pandas library has a resample() method which resamples time-series data. It is similar to the groupby method, since it essentially groups by a specific time span.
df_vwap.resample(rule = 'A').mean()[:5]
Let’s understand what this means:
- df_vwap.resample() is used to resample the stock data.
- The ‘A’ stands for year-end frequency and denotes the offset by which we want to resample the data. (pandas 2.2 and later spell this alias ‘YE’.)
- mean() indicates that we want the average stock price during this period.
The output looks like this:
So here we have the average VWAP for each year, labeled with the last day of that year (31 December). A complete list of the offset values can be found in the pandas documentation.
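Here is a minimal sketch of year-end resampling on hypothetical data where the yearly means are easy to check by eye. It passes an offset object, pd.offsets.YearEnd(), instead of a string alias; that is a choice made for this sketch, because the string aliases have changed between pandas versions while the offset objects have stayed the same.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: 1.0 throughout 2018, 3.0 throughout 2019
idx = pd.date_range("2018-01-01", "2019-12-31", freq="D")
values = np.where(idx.year == 2018, 1.0, 3.0)
frame = pd.DataFrame({"VWAP": values}, index=idx)

# Group the daily rows by calendar year and average each group;
# each result is labeled with the last day of its year
yearly_mean = frame["VWAP"].resample(pd.offsets.YearEnd()).mean()
print(yearly_mean)
```

Swapping mean() for max(), min(), or sum() gives other per-period summaries in the same way.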
We can also use time resampling to plot charts for some specific columns.
plt.rcParams['figure.figsize'] = (8, 6)
df_vwap['VWAP'].resample('A').mean().plot(kind='bar')
plt.title('Yearly Mean VWAP for Maruti')
The above bar plot shows Maruti’s mean VWAP for each year, labeled at year-end. Similarly, here is the yearly mean VWAP labeled at the start of each year:
df_vwap['VWAP'].resample('AS').mean().plot(kind='bar',figsize = (10,4))
plt.title('Yearly start Mean VWAP for Maruti')
6. Time Shifting ⏲️
Sometimes, it is desirable to shift or move the data forward or backward in time. This shifting is done along a time index by the desired number of time-frequency increments. Here is the original dataset without any shifting.
6.1 Forward Shifting
To shift our data forward, we pass the desired number of periods (or increments) to the shift() method, which in this case needs to be a positive value. Let’s move our data forward by one period or index, which means that all values which earlier corresponded to row N will now belong to row N+1. Here is the output:
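A forward shift can be sketched on a tiny made-up series. One common use, shown here as an illustration rather than as part of the original analysis, is lining up each day with the previous day's value to compute a day-over-day change.

```python
import pandas as pd

# Made-up prices on four consecutive days
idx = pd.date_range("2020-01-01", periods=4, freq="D")
prices = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)

# Move every value one row down; the first row has no predecessor and becomes NaN
previous_day = prices.shift(1)

# Each day's value minus the value of the day before
day_over_day = prices - previous_day

print(pd.DataFrame({"price": prices, "previous": previous_day, "change": day_over_day}))
```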
6.2 Backward Shifting
Similarly, there is a concept of backward shifting. To shift our data backward, the number of periods (or increments) must be negative.
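As a sketch with made-up values, a negative period pulls each value one row up, so every date is paired with the value that follows it. The last row then has nothing to pull from and becomes empty.

```python
import pandas as pd

# Made-up prices on four consecutive days
idx = pd.date_range("2020-01-01", periods=4, freq="D")
prices = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)

# Move every value one row up; the last row becomes NaN
next_day = prices.shift(-1)

print(pd.DataFrame({"price": prices, "next": next_day}))
```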
6.3 Shifting based on a Time String Code
We can also use an offset from the offset table (from the Time Resampling section) for time-shifting. Older pandas versions provided a tshift() function for this; it was deprecated in pandas 1.1 and removed in pandas 2.0, so we pass the freq parameter to shift() instead. The periods parameter defines the number of steps to shift, and the freq parameter denotes the size of those steps. With freq set, the index is shifted rather than the values.
Let’s say we want to shift the data 3 months forward:
df_vwap.shift(periods=3, freq = 'M').head()
We would get the following as an output.
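The sketch below shows what a frequency-based shift does on two made-up rows. It uses shift() with offset objects rather than string codes, which is an assumption made here to keep the example independent of the pandas version. With a month-end offset, dates are moved onto month ends; with pd.DateOffset(months=3), they move by three calendar months and keep their day of the month.

```python
import pandas as pd

# Two made-up observations in the middle of a month
idx = pd.to_datetime(["2020-01-15", "2020-02-10"])
prices = pd.Series([1.0, 2.0], index=idx)

# Three month-end steps forward: the index changes, the values stay put
month_end_shift = prices.shift(periods=3, freq=pd.offsets.MonthEnd())

# Three calendar months forward, keeping the day of the month
calendar_shift = prices.shift(periods=1, freq=pd.DateOffset(months=3))

print(month_end_shift)
print(calendar_shift)
```

Unlike a plain shift(3), no values are lost here, because the dates move instead of the data.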
7. Rolling Windows 🧈
Time series data can be noisy, and as a result, it becomes difficult to gauge the trend or pattern due to the high fluctuations. For instance, here is the visualization of the VWAP price of the Maruti stock over the years.
df_vwap['VWAP'].plot(figsize = (10,6))
There’s quite a bit of noise here because this is daily data. It would be nice to average this out over a week, which is where a rolling mean comes in. A rolling mean, or moving average, is a transformation method that averages out this noise in the data. The idea is simple: split the data into windows, then aggregate the data in each window with some function such as the mean. For this example, we shall use a rolling mean for seven days.
The first six values have become blank because there wasn’t enough data to fill since we chose a window of 7 days.
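Here is a sketch of a seven-day rolling mean on ten made-up values, small enough to verify by hand. The series name and numbers are hypothetical.

```python
import pandas as pd

# Ten made-up daily values: 1, 2, ..., 10
idx = pd.date_range("2020-01-01", periods=10, freq="D")
values = pd.Series(range(1, 11), index=idx, dtype="float64")

# Each result is the mean of the current row and the six rows before it
rolling_7 = values.rolling(window=7).mean()

print(rolling_7)
```

The seventh row is the first one with a full window, so it holds the mean of 1 through 7, and the six rows before it are NaN.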
So what are the significant effects of calculating a moving average or using this rolling method? Our data becomes a lot less noisy and becomes more reflective of the trend than the actual data. Let’s plot this out. We shall plot the original data and then the rolling data for 30 days in the same graph.
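A plot along these lines can be sketched as follows. The series below is a synthetic random walk standing in for the VWAP column, so the shape of the curve is not Maruti's; only the technique carries over.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Synthetic stand-in for a year of daily VWAP values
idx = pd.date_range("2019-01-01", periods=365, freq="D")
vwap = pd.Series(7000 + np.cumsum(rng.normal(0, 50, 365)), index=idx, name="VWAP")

# 30-day rolling mean of the same series
rolling_30 = vwap.rolling(window=30).mean()

# Draw both on the same axes so they can be compared directly
ax = vwap.plot(figsize=(10, 6), label="Daily VWAP")
rolling_30.plot(ax=ax, label="30-day rolling mean")
ax.legend()
```

Outside a notebook, a call to plt.show() is needed to display the figure.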
The blue line is the original noisy data, while the orange line with a 30-day rolling window ends up having not as much noise as the blue line. Now, once you run this code, an important aspect to keep in mind is that the first 29 days aren’t going to have that orange line because there wasn’t enough data actually to calculate that rolling mean.
Pandas is a powerful library with a lot of built-in functions for analyzing time-series data. This article showed how Python’s pandas library can be used for wrangling and visualizing time series data. We also performed tasks like time resampling, time-shifting, and rolling on the stock data. These are usually the first steps in investigating any time series data. Going forward, we could use this data in several ways. One way could be to perform a basic financial analysis by calculating the daily percentage change in stocks to get an idea about the stock price volatility. Another way would be to use this data to predict Maruti’s stock price for the next few days by employing machine learning techniques. Whatever assignment you choose, the preliminary steps shown in this article would come in handy.
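As a starting point for the daily percentage change mentioned above, here is a minimal sketch on made-up prices. Using the standard deviation of the daily changes as a rough volatility measure is an assumption of this sketch, not something computed in the article.

```python
import pandas as pd

# Made-up prices on four consecutive days
idx = pd.date_range("2020-01-01", periods=4, freq="D")
prices = pd.Series([100.0, 110.0, 99.0, 99.0], index=idx)

# Percentage change from the previous day; the first day has no predecessor
daily_change = prices.pct_change() * 100

# One simple volatility measure: the spread of the daily changes
volatility = daily_change.std()

print(daily_change)
print(volatility)
```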
Originally published here