What is df in pandas

Last updated: April 1, 2026

Quick Answer: df is the conventional variable name for a DataFrame, the primary data structure in pandas (Python Data Analysis Library). A DataFrame stores two-dimensional tabular data with labeled rows and columns, much like a spreadsheet or SQL table.

Overview

In pandas, df is the conventional variable name for a DataFrame object, the core data structure used for data analysis in Python. A DataFrame is a two-dimensional, table-like object made of rows and columns, similar to a spreadsheet or relational database table. Each column can hold a different data type, and most operations can be applied along either rows or columns.
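The sketch below illustrates the points above: a DataFrame bound to the name df, with columns of different types. The column names and values are made up for illustration.

```python
import pandas as pd

# By convention the DataFrame is bound to the name "df"; any valid name works.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo"],   # text column
        "age": [28, 35, 42],              # integer column
        "score": [88.5, 92.0, 79.25],     # floating-point column
    }
)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one data type per column
```

Each column keeps its own dtype, which is why a single DataFrame can mix text, integers, and floats.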

How DataFrames Work

DataFrames are traditionally backed by NumPy arrays (pandas 2.0 and later can also use Apache Arrow-backed columns) and provide a higher-level abstraction for data manipulation. They carry row labels (the index) and column labels, which let you select and filter data by name instead of by position. Operations such as filtering, grouping, merging, and statistical calculations run as vectorized array operations rather than Python loops, which is what makes them efficient.
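To make the role of row and column labels concrete, here is a minimal sketch using a small, hypothetical inventory table whose rows are labeled by fruit name instead of the default 0, 1, 2.

```python
import pandas as pd

# Hypothetical inventory data; the row labels (index) are fruit names.
df = pd.DataFrame(
    {"price": [2.5, 1.0, 4.0], "stock": [10, 0, 3]},
    index=["apple", "banana", "cherry"],
)

print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column labels

# .loc selects by label: a row label plus a column label gives a single value.
banana_stock = df.loc["banana", "stock"]

# .iloc selects by integer position, ignoring the labels.
first_row = df.iloc[0]  # the "apple" row, returned as a Series
```

Label-based access (.loc) keeps code readable when row order changes; position-based access (.iloc) is useful when only the order matters.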

Creating DataFrames

You can create a DataFrame in several ways: from dictionaries, lists, NumPy arrays, or by reading external files like CSV or Excel. The most common approach is using a dictionary where keys become column names and values become the data in each column.
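A short sketch of three in-memory construction routes described above (a dictionary, a list of rows, and a NumPy array), all producing the same table. The column names x and y are arbitrary.

```python
import numpy as np
import pandas as pd

# 1. Dictionary: keys become column names, values become column data.
from_dict = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

# 2. List of rows: column names are supplied separately.
from_rows = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=["x", "y"])

# 3. 2-D NumPy array: each array row becomes a DataFrame row.
#    dtype is fixed to int64 so the result matches the other two exactly.
from_array = pd.DataFrame(
    np.array([[1, 4], [2, 5], [3, 6]], dtype="int64"), columns=["x", "y"]
)

# DataFrame.equals compares values, dtypes, and labels.
print(from_dict.equals(from_rows), from_dict.equals(from_array))
```

The dictionary form is usually the most readable because column names sit right next to their data.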

Common Operations

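DataFrames support indexing, slicing, filtering with boolean masks, grouping (groupby), aggregation functions (sum, mean, count), and merging with other DataFrames. Most of these operations are vectorized, and missing values (NaN) are handled consistently: aggregations such as sum and mean skip them by default, and methods such as isna(), dropna(), and fillna() detect, drop, or replace them.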
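The following sketch chains the operations listed above on a small, made-up sales table that contains one missing value, so the NaN behavior is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one amount is missing (NaN).
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "north"],
        "amount": [100.0, 250.0, np.nan, 80.0, 40.0],
    }
)

# Boolean mask: keep rows where amount exceeds 50 (NaN > 50 evaluates to False).
large = sales[sales["amount"] > 50]

# Group by region and aggregate; sum and mean skip NaN by default,
# and count counts only non-missing values.
summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])

# Merge the summary with a second (hypothetical) table on the shared column.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Kim", "Lee"]})
report = summary.reset_index().merge(managers, on="region")
print(report)
```

For the north region the missing value is ignored, so the mean is (100 + 40) / 2 rather than being dragged to NaN.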

Why Use DataFrames

DataFrames are essential for data scientists and analysts because they provide an intuitive interface for data exploration and transformation. They include tools for cleaning messy real-world data (missing values, mixed types, duplicate rows) and work directly with other Python libraries such as NumPy, Matplotlib, and scikit-learn.
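As a small illustration of the NumPy integration mentioned above, this sketch applies a NumPy function directly to a DataFrame and then extracts the raw array. The column names and values are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [16.0, 25.0, 36.0]})

# NumPy ufuncs applied to a DataFrame return a DataFrame with labels intact.
roots = np.sqrt(df)

# Code that expects a plain array can receive the underlying values.
matrix = df.to_numpy()
print(type(roots).__name__, matrix.shape)
```

Many libraries, scikit-learn included, accept DataFrames directly; to_numpy() is a fallback for code that only understands arrays.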

Related Questions

How do I create a pandas DataFrame?

DataFrames can be created with pd.DataFrame() from dictionaries, lists, or NumPy arrays, or loaded from CSV files with pd.read_csv(). The most common pattern passes a dictionary whose keys are column names and whose values are lists of data.
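A minimal pd.read_csv() sketch. To keep it self-contained, an in-memory string wrapped in io.StringIO stands in for a file on disk; the city names and temperatures are made-up sample values.

```python
import io

import pandas as pd

# In-memory text stands in for a file; with a real file you would pass its
# path instead, e.g. pd.read_csv("data.csv") (hypothetical filename).
csv_text = """city,temp_c
Lisbon,21.5
Helsinki,12.0
"""

# The first line becomes the column names; types are inferred per column.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```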

What is the difference between a Series and a DataFrame?

A Series is a one-dimensional, labeled array-like object (a single column of data), while a DataFrame is two-dimensional, with rows and columns. A DataFrame can be thought of as a collection of Series objects that share a common index.
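The relationship between the two types can be sketched like this: two Series with the same (hypothetical) row labels are combined into a DataFrame, and pulling a column back out yields a Series again.

```python
import pandas as pd

# Two Series that share the same row labels.
height = pd.Series([1.62, 1.80], index=["p1", "p2"], name="height_m")
weight = pd.Series([55, 80], index=["p1", "p2"], name="weight_kg")

# A dict of Series becomes a DataFrame: one Series per column, aligned on the index.
df = pd.DataFrame({"height_m": height, "weight_kg": weight})

print(height.ndim, df.ndim)            # prints: 1 2
print(type(df["height_m"]).__name__)   # prints: Series
```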

How do I select specific columns in a DataFrame?

You can select columns with bracket notation: df['column_name'] for a single column or df[['col1', 'col2']] for several. A single label returns a Series, while a list of labels returns a DataFrame, even when the list contains only one name.
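The sketch below checks the return type for each form of bracket selection described above, using placeholder column names.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

single = df["col1"]              # one label -> Series
several = df[["col1", "col2"]]   # list of labels -> DataFrame
one_as_frame = df[["col1"]]      # a one-item list still returns a DataFrame

print(type(single).__name__, type(several).__name__, type(one_as_frame).__name__)
# prints: Series DataFrame DataFrame
```

The double brackets are the detail to watch: the outer pair is the selection, the inner pair is a Python list of column names.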
