# Getting started with Time Series using Pandas

Time series analysis underpins decision-making at many companies, since most businesses analyze their past data to inform future decisions. Analyzing such data can be tricky, but Python, with its built-in tools and external libraries, makes the process much easier. Python’s pandas library is frequently used to import, manage, and analyze datasets in various formats. In this article, we’ll use it to analyze stock prices and perform some basic time-series operations.

# Time Series Data

Time series data is a sequence of data points listed in time order, typically a set of observations taken at specified, equally spaced times. It is common in day-to-day life; the daily stock prices analyzed in this article are one example.

Time series data usually has some `date-time index` and a corresponding `value` for that date-time index.

## Components of Time Series

Time series data mainly consists of four components:

• Trend Component: A variation that moves up or down in a reasonably predictable pattern over a long period.
• Seasonality Component: A variation that is regular and periodic and repeats itself over a specific period such as a day, week, month, or season.
• Cyclical Component: A variation that corresponds with business or economic ‘boom-bust’ cycles, or follows its own peculiar cycle.
• Random Component: A variation that is erratic or residual and does not fall under any of the above three classifications.
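To make the four components concrete, here is a minimal sketch that builds a synthetic daily series by adding them together. The additive form and every number in it (slope, periods, amplitudes, noise level) are illustrative assumptions, not properties of the Maruti data.

```python
import numpy as np
import pandas as pd

# Three years of daily dates (synthetic, for illustration only)
dates = pd.date_range("2018-01-01", periods=3 * 365, freq="D")
t = np.arange(len(dates))

trend = 0.05 * t                                  # slow upward drift
seasonal = 5 * np.sin(2 * np.pi * t / 365)        # repeats every year
cyclical = 3 * np.sin(2 * np.pi * t / (365 * 3))  # one longer, slower cycle
rng = np.random.default_rng(seed=0)
random_part = rng.normal(scale=1.0, size=len(t))  # irregular residual

# Additive model: observed value = trend + seasonal + cyclical + random
series = pd.Series(trend + seasonal + cyclical + random_part, index=dates)
print(series.head())
```

Plotting `series` next to each individual component is a quick way to see how much of the overall shape each one explains.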

Here is a visual interpretation of the various components of the time series to make this concept clearer. You can view the original diagram with its context here.

# Case study: Analyzing the stock prices of Maruti

This article introduces some standard techniques used in time-series analysis and walks through the iterative steps required to manipulate and visualize time-series data.

## Dataset

Maruti Suzuki India Limited, formerly known as Maruti Udyog Limited, is an automobile manufacturer in India. It is a 56.21% owned subsidiary of the Japanese car and motorcycle manufacturer Suzuki Motor Corporation. As of July 2018, it had a 53% share of the Indian passenger car market [Wikipedia].

## Importing necessary libraries and the Stock Data

Fire up the editor of your choice and type in the following code to import the required libraries and data. The data has been taken from Kaggle. The code along with the dataset can be accessed from here.

```python
# Importing required modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
from datetime import datetime    # To access datetime
from pandas import Series        # To work on series

# Settings for pretty nice plots
plt.style.use('fivethirtyeight')
%matplotlib inline

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Reading in the data
df = pd.read_csv('maruti_stock.csv')
```

## A first look at Maruti’s stock Prices

Let’s look at the first few rows of the dataset.

```python
# Inspecting the data
df.head()
```

A look at the first five rows of the dataset | Image by Author

To keep things simple, let’s limit the number of columns.

```python
# .copy() gives an independent DataFrame, so later column assignments are safe
data = df[['Date','Open','High','Low','Close','Volume','VWAP']].copy()
```

## Datetime objects in Python

Let us now look at the datatypes of the various columns.

`data.info()`

It appears that the `Date` column is being treated as a string rather than as dates. Let’s make things right. For this, we shall use the pandas `to_datetime` function, which converts its arguments to dates. Lastly, we want to make sure that the `Date` column is the index column.

```python
# Convert string to datetime64
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)
data.head()
```

The data has been imported, and we are ready to begin our analysis. Next, we will look at some of the essential attributes of stock data.
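As an alternative to converting the column after loading, `read_csv` can parse the dates and set the index in one step. The sketch below uses a tiny in-memory CSV with made-up rows as a stand-in for the real file; only the column names mirror the dataset.

```python
import io
import pandas as pd

# A tiny stand-in for the CSV file (made-up rows, same column names)
csv_text = """Date,Open,Close,VWAP
2019-04-18,7400.0,7450.0,7432.5
2019-04-22,7440.0,7300.0,7351.2
"""

# parse_dates converts the column while reading; index_col makes it the index
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(sample.index.dtype)
print(sample.head())
```

With the real file, the same two arguments would be passed to `pd.read_csv('maruti_stock.csv', ...)`.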

# Understanding the stock data

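Before beginning any analysis, we must first understand the data. A typical stock dataset consists of many columns. Let’s dive a bit deeper into the ones we kept:

• Open and Close: the prices at which the stock started and ended trading on a given day.
• High and Low: the highest and lowest prices at which the stock traded that day.
• Volume: the number of shares traded that day.
• VWAP: the volume-weighted average price, i.e. the total value traded divided by the total volume traded.

The volume-weighted average price (VWAP) is crucial because it provides traders with insight into both the trend and the value of a security [Investopedia].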
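As a small worked example of what "volume-weighted" means, the sketch below computes a VWAP by hand. The three trades are hypothetical numbers chosen for easy arithmetic; the dataset already ships a precomputed `VWAP` column, so this is only to show the formula.

```python
import pandas as pd

# Hypothetical trades within one session (prices and volumes are made up)
trades = pd.DataFrame({
    "price": [100.0, 102.0, 101.0],
    "volume": [10, 20, 10],
})

# VWAP = sum(price * volume) / sum(volume)
vwap = (trades["price"] * trades["volume"]).sum() / trades["volume"].sum()
print(vwap)  # 101.25
```

The plain average of the three prices is 101.0; the VWAP is pulled toward 102.0 because more shares changed hands at that price.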

# Manipulating Time Series dataset with Pandas

Because pandas was developed in the context of financial modeling, it contains a comprehensive set of tools for working with dates, times, and time-indexed data. The name pandas is derived from the term “panel data,” an econometrics term for data sets that include observations over multiple time periods for the same individuals [source]. Let’s look at the main pandas data structures for working with time-series data.

## 1. DateTime Manipulations ⚙️

Python’s basic objects for working with dates and times reside in the built-in `datetime` module. In pandas, a single point in time is represented as a `Timestamp`, and the `to_datetime()` function creates `Timestamp` objects from strings in a wide variety of date/time formats. Let’s start with the standard library’s `datetime()` constructor.

```python
from datetime import datetime

my_year = 2019
my_month = 4
my_day = 21
my_hour = 10
my_minute = 5
my_second = 30
```

We can now create a `datetime` object from the values above.

```python
test_data = datetime(my_year, my_month, my_day)
test_data

# Output
# datetime.datetime(2019, 4, 21, 0, 0)
```

We have selected only the day, month, and year. We could also include more details like an hour, minute, and second, if necessary.

```python
test_data = datetime(my_year, my_month, my_day, my_hour, my_minute, my_second)
print("The day is : ", test_data.day)
print("The hour is : ", test_data.hour)
print("The month is : ", test_data.month)
```
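The pandas counterpart of the `datetime` object is the `Timestamp`. A minimal sketch of creating one from a string, using the same date as above:

```python
import pandas as pd

# Both calls parse a string into a pandas Timestamp
ts_iso = pd.to_datetime("2019-04-21 10:05:30")
ts_date = pd.Timestamp("2019-04-21")

print(ts_iso.hour, ts_iso.minute, ts_iso.second)  # 10 5 30
print(ts_date.day_name())                          # Sunday

# A list of strings becomes a DatetimeIndex
idx = pd.to_datetime(["2019-04-21", "2019-04-22"])
print(type(idx).__name__)                          # DatetimeIndex
```

A `Timestamp` exposes the same `year`, `month`, `day`, and `hour` attributes as a `datetime` object, plus pandas-specific helpers such as `day_name()`.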

For our stock price dataset, the index column is of the type `DatetimeIndex`. We can use pandas to obtain the minimum and maximum dates in the data.

```python
print(data.index.max())
print(data.index.min())
```

We can also find the index locations (integer positions) of the earliest and latest dates as follows:

```python
# Earliest date index location
print('Earliest date index location is: ', data.index.argmin())

# Latest date location
print('Latest date location: ', data.index.argmax())
```
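The difference between a date and its location is easiest to see on a small, deliberately unsorted index. The dates below are made up for illustration.

```python
import pandas as pd

# Deliberately unsorted dates (made up for illustration)
idx = pd.DatetimeIndex(["2020-01-03", "2020-01-01", "2020-01-02"])

print(idx.min(), idx.argmin())  # earliest date and its integer position
print(idx.max(), idx.argmax())  # latest date and its integer position
```

Here the earliest date sits at position 1 and the latest at position 0; on a chronologically sorted index they would be the first and last positions.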

## 2. Subsetting the time series data ✂️

Instead of working with the entire data, it is prudent to slice the time series data to highlight the portion of the data we are interested in. Since the `volume-weighted average price (VWAP)` is a trading benchmark, we shall limit our analysis to only that column.

```python
# df is the original dataframe; .copy() avoids modifying a view of it
df_vwap = df[['Date','VWAP']].copy()
df_vwap['Date'] = pd.to_datetime(df_vwap['Date'])
df_vwap.set_index("Date", inplace=True)
df_vwap.head()
```

The data can also be sliced on year, month, or day, as follows:

```python
# Slicing on year
vwap_subset = df_vwap.loc['2017':'2020']

# Slicing on month
vwap_subset = df_vwap.loc['2017-01':'2020-12']

# Slicing on day
vwap_subset = df_vwap.loc['2017-01-01':'2020-12-15']
```
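One detail worth knowing is that these string slices include both endpoints, and a partial string such as a year or a month covers that whole period. A minimal sketch on a synthetic daily series (the values are just a running counter):

```python
import pandas as pd

# Synthetic daily series spanning six years
dates = pd.date_range("2016-01-01", "2021-12-31", freq="D")
demo = pd.Series(range(len(dates)), index=dates)

# '2017':'2020' runs from the first day of 2017 to the last day of 2020
by_year = demo.loc["2017":"2020"]
print(by_year.index.min(), by_year.index.max())

# '2018-10':'2018-11' covers all of October and all of November
by_month = demo.loc["2018-10":"2018-11"]
print(len(by_month))  # 61
```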

## 3. Visualizing the Time Series data 📊

Visualizing the time series data can offer a better perspective than merely looking at the numbers.

`df_vwap['VWAP'].plot(figsize=(16,8),title='Volume weighted average price')`

It appears that Maruti’s stock price increased more or less steadily from 2004 to mid-2018. There seems to be some drop in 2019, though. Let’s further analyze the data for the year 2018.

```python
ax = df_vwap.loc['2018', 'VWAP'].plot(figsize=(15,6))
ax.set_title('Month-wise Trend in 2018')
ax.set_ylabel('VWAP')
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'));
```

We see a dip in the stock prices, particularly around the end of October and November. Let’s further zoom in on these dates.

```python
ax = df_vwap.loc['2018-10':'2018-11','VWAP'].plot(marker='o', linestyle='-', figsize=(15,6))
ax.set_title('Oct-Nov 2018 trend')
ax.set_ylabel('VWAP')
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MONDAY))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'));
```

So there is a dip in stock prices around the last week of October and the first week of November. One could investigate it further by finding out whether there was some special event during that period.

## 4. Feature Extraction ⚒

Let’s extract time and date features from the Date column.

```python
df_vwap.reset_index(inplace=True)
df_vwap['year'] = df_vwap.Date.dt.year
df_vwap['month'] = df_vwap.Date.dt.month
df_vwap['day'] = df_vwap.Date.dt.day
df_vwap['day of week'] = df_vwap.Date.dt.dayofweek

# Set Date column as the index column.
df_vwap.set_index('Date', inplace=True)
df_vwap.head()
```
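The same features can also be read straight off a `DatetimeIndex`, which avoids resetting and restoring the index. A minimal sketch on a two-row frame with made-up VWAP values; `is_month_end` is an extra feature added here purely for illustration.

```python
import pandas as pd

demo = pd.DataFrame(
    {"VWAP": [7432.5, 7351.2]},  # made-up values
    index=pd.to_datetime(["2019-04-21", "2019-04-22"]),
)

# A DatetimeIndex exposes the date parts directly, without the .dt accessor
demo["year"] = demo.index.year
demo["month"] = demo.index.month
demo["day of week"] = demo.index.dayofweek      # Monday=0 ... Sunday=6
demo["is_month_end"] = demo.index.is_month_end  # boolean flag per row
print(demo)
```

The `.dt` accessor used above is needed only when the dates live in a regular column rather than in the index.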

## 5. Time resampling ⏳

Time resampling is a way to aggregate data with respect to a defined time period. We have the stock price data for each day, but this doesn’t make much sense if we want to see the trend for a financial institution. What is useful is the aggregated information for every month or every quarter. This helps the management to get an overview instantly and then make decisions based on this overview.

The pandas library has a `resample()` method for resampling time-series data. It is similar to the `groupby` method, since it essentially groups the rows by a specific time span.

`df_vwap.resample(rule = 'A').mean()[:5]`

Let’s understand what this means:

• `df_vwap.resample()` is used to resample the stock data.
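• The ‘A’ stands for year-end frequency and is the offset alias that sets the interval at which the data is resampled. Recent pandas versions (2.2 and later) spell this alias ‘YE’ and warn when ‘A’ is used.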
• `mean()` indicates that we want the average stock price during this period.

The output looks like this:

So here we have the average of each column for every year, displayed against the 31st of December of that year. The complete list of offset aliases can be found in the pandas documentation.
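To illustrate the similarity between `resample()` and `groupby()`, here is a minimal sketch on three months of synthetic daily data, where each value is simply the day of the month. It uses the month-start alias `'MS'` rather than the yearly alias only to keep the example small.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: each value is the day of the month
dates = pd.date_range("2020-01-01", "2020-03-31", freq="D")
demo = pd.Series(np.asarray(dates.day, dtype=float), index=dates)

# Resample to month-start frequency and average each month
monthly = demo.resample("MS").mean()

# The same aggregation written as an explicit groupby on the month
grouped = demo.groupby(demo.index.to_period("M")).mean()

print(monthly)
print(grouped)
```

Both results hold the same three monthly means; they differ only in how the index is labelled (a timestamp at the start of each month versus a monthly period).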

We can also use time resampling to plot charts for specific columns.

```python
plt.rcParams['figure.figsize'] = (8, 6)
df_vwap['VWAP'].resample('A').mean().plot(kind='bar')
plt.title('Yearly Mean VWAP for Maruti')
```

The above bar plot shows Maruti’s mean VWAP for each year, labelled at year-end. Similarly, here is the yearly mean VWAP labelled at the start of each year, using the ‘AS’ (year-start) alias:

```python
df_vwap['VWAP'].resample('AS').mean().plot(kind='bar', figsize=(10,4))
plt.title('Yearly start Mean VWAP for Maruti')
```

## 6. Time Shifting ⏲️

Sometimes, it is desirable to shift the data forward or backward in time. This shifting is done along the time index by the desired number of time-frequency increments. Here is the original dataset without any shifting.

`df_vwap.head()`

### 6.1 Forward Shifting

To shift our data forward, we pass the desired number of periods (or increments) to the `shift()` method; for a forward shift this number must be positive. Let’s move our data forward by one period, which means that every value that earlier belonged to row N now belongs to row N+1. Here is the output:

`df_vwap.shift(1).head()`

### 6.2 Backward Shifting

Similarly, there is a concept of backward shifting. To shift our data backward, the number of periods (or increments) must be negative.

`df_vwap.shift(-1).head()`
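Both directions are easier to follow on a three-row series. The sketch below uses made-up prices and also shows a common use of forward shifting: computing the change from the previous row.

```python
import pandas as pd

prices = pd.Series(
    [10.0, 12.0, 15.0],  # made-up values
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

previous = prices.shift(1)       # value from the previous row; first row is NaN
following = prices.shift(-1)     # value from the next row; last row is NaN
daily_change = prices - previous # change relative to the previous row

print(pd.DataFrame({
    "price": prices,
    "previous": previous,
    "following": following,
    "change": daily_change,
}))
```

Note that the index stays the same in both cases; only the values move, and the vacated rows are filled with `NaN`.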

### 6.3 Shifting based off Time String Code

We can also use an offset alias (see the Time Resampling section) for time-shifting. Older versions of pandas provided the `tshift()` method for this; it was deprecated in pandas 1.1 and removed in pandas 2.0, and `shift()` with the `freq` argument does the same job. We only need to pass in the `periods` and `freq` parameters. `periods` defines the number of steps to shift, and `freq` denotes the size of those steps.

Let’s say we want to shift the data 3 months forward:

```python
# Equivalent to the older df_vwap.tshift(periods=3, freq='M')
df_vwap.shift(periods=3, freq='M').head()
```

We would get the following as an output.
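The key difference from a plain `shift()` is that passing `freq` moves the index rather than the values, so no `NaN` rows appear. A minimal sketch with made-up prices, using a three-day shift to keep the output easy to check:

```python
import pandas as pd

prices = pd.Series(
    [10.0, 12.0, 15.0],  # made-up values
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# With freq, the dates move forward by three days and the values stay put
shifted = prices.shift(periods=3, freq="D")
print(shifted)
```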

## 7. Rolling Windows 🧈

Time series data can be noisy, and as a result, it becomes difficult to gauge the trend or pattern due to the high fluctuations. For instance, here is the visualization of the VWAP price of the Maruti stock over the years.

`df_vwap['VWAP'].plot(figsize = (10,6))`

There’s quite a bit of noise here because this is the daily data. It would be nice to average this out by a week, which is where a rolling mean comes in. A `rolling mean` or `moving average` is a transformation method that tends to average out this noise from the data. The idea is simple: split the data into windows, then aggregate the data in each window with a function such as `mean()`, `median()`, or `count()`.

For this example, we shall use a rolling mean for seven days.

`df_vwap.rolling(7).mean().head(10)`

The first six values are `NaN` because a window of 7 rows needs seven observations, and the first six rows do not have enough preceding data to fill it.
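If those leading gaps are unwanted, the `min_periods` argument lets a window produce a value as soon as it holds a minimum number of observations. A minimal sketch on five made-up values with a window of 3:

```python
import pandas as pd

values = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],  # made-up values
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# Default: a result only once the window is full (first two rows are NaN)
strict = values.rolling(window=3).mean()

# min_periods=1: average whatever is available so far
lenient = values.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"value": values, "strict": strict, "lenient": lenient}))
```

The trade-off is that the early `lenient` values are averages over fewer points, so they are noisier than the rest of the series.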

So what are the significant effects of calculating a moving average or using this rolling method? Our data becomes a lot less noisy and becomes more reflective of the trend than the actual data. Let’s plot this out. We shall plot the original data and then the rolling data for 30 days in the same graph.

```python
df_vwap['VWAP'].plot()
df_vwap.rolling(window=30).mean()['VWAP'].plot(figsize=(16, 6))
```
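An integer window such as `rolling(7)` or `rolling(window=30)` counts rows, not calendar days, and stock data has no rows for weekends and holidays. When a true calendar window is wanted, pandas also accepts an offset string on a datetime index. A minimal sketch with made-up values and a deliberate gap in the dates:

```python
import pandas as pd

# Made-up values with a one-week gap before the last observation
values = pd.Series(
    [1.0, 2.0, 3.0, 10.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-10"]),
)

# Each window covers the three calendar days ending at the current row
calendar_mean = values.rolling("3D").mean()
print(calendar_mean)
```

The last row averages only itself, because no other observation falls within the three days before 10 January; a row-based `rolling(3)` would have mixed it with values from a week earlier.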