Calculate differences with sparse date/value dataframes

Mon 07 January 2019

Goal¶

Find the difference between two dataframes with a date and a value column
Create a continuous time axis to eliminate gaps in the date column
Fill the interpolated with the value of the previous date

In [1]:

import pandas as pd

Create a dataframe with two entries:

In [2]:

df_old = pd.DataFrame(
    [
        {
            'date': '2019-01-01',
            'value': 0.844
        },{
            'date': '2019-01-02',
            'value': 0.842
        }
    ]
)

Create a second dataframe with additional values for three more days:

In [3]:

df_new = pd.DataFrame(
    [
        {
            'date': '2019-01-01',
            'value': 0.844
        },{
            'date': '2019-01-02',
            'value': 0.842
        },{
            'date': '2019-01-03',
            'value': 0.84
        },{
            'date': '2019-01-04',
            'value': 0.84
        },{
            'date': '2019-01-07',
            'value': 0.838
        }
    ]
)

Create a time axis between the minimum and maximum date found in the bigger dataframe:

In [4]:

time_index = pd.date_range(df_new['date'].min(), df_new['date'].max())

Set the index to the date column:

In [5]:

df_new.set_index('date', inplace=True)

Convert the index to a DatetimeIndex:

In [6]:

df_new.index = pd.DatetimeIndex(df_new.index)

Reindex the existing date index with the continuous time_index to interpolate missing dates:

In [7]:

df_new = df_new.reindex(time_index, method='ffill')

Convert the date index back to a column:

In [8]:

df_new = df_new.reset_index().rename(columns={'index': 'date'})

Convert the date column to the proper datatype for the smaller dataframe:

In [9]:

df_old['date'] = pd.to_datetime(df_old['date'])

Combine the two dataframes and remove all the duplicate couples to only keep the none duplicate rows:

In [10]:

df_diff = pd.concat([df_new, df_old]).drop_duplicates(keep=False)

In [11]:

df_diff

Out[11]:

	date	value
2	2019-01-03	0.840
3	2019-01-04	0.840
4	2019-01-05	0.840
5	2019-01-06	0.840
6	2019-01-07	0.838

notebook Python Jupyter interpolation Pandas dataframes