Identifying Rows not Present in Another Dataframe using Pandas

21 Jan, 2023

When working with data in Python, the pandas package is commonly used to manipulate and analyze data in the form of DataFrames. Finding rows that are present in one DataFrame but not in another is a common problem when working with numerous DataFrames. In this article, we will explore how to execute this task using the pandas library.

The first method to achieve this is by using the .isin() method and the ~ operator. The .isin() method is used to filter a DataFrame based on a list of values, and the ~ operator is used to negate the filter. Here is an example of how to use these tools to find the rows from the first DataFrame that are not included in the second DataFrame:

import pandas as pd

# Create two example DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 3], 'B': [4, 6]})

# Get the rows from df1 that are not in df2
df1[~df1.isin(df2).all(1)]

This code will return a new DataFrame containing the rows from df1 that are not in df2.

Another approach is to use the .merge() function with the indicator parameter set to True, and filter on the resulting DataFrame.

result_df = pd.merge(df1, df2, on=['A', 'B'], indicator=True, how='outer')
df1_not_in_df2 = result_df[result_df['_merge']=='left_only']

This will give you a DataFrame df1_not_in_df2 containing only the rows from df1 which are not included in df2.

In summary, finding the rows in one DataFrame that are not present in another DataFrame is a common task when working with data in pandas. This can be accomplished by using the .isin() method and the ~ operator or by using the .merge() function with the indicator parameter set to True. By understanding these methods, you will be able to efficiently work with DataFrames and manipulate data in pandas.

#pandas #python