Pandas for Data Manipulation

75 min

Introduction to Pandas

Pandas is one of the most widely used Python libraries for data manipulation and analysis. Its two core data structures, Series and DataFrame, attach labels to rows and columns, which makes working with structured (tabular) data intuitive and efficient.

Why Pandas for Machine Learning?

1. Data Loading: Easily read data from CSV, Excel, JSON, SQL databases, and more
2. Data Cleaning: Handle missing values, duplicates, and data type conversions
3. Data Transformation: Filter, sort, group, and reshape data effortlessly
4. Feature Engineering: Create new features from existing data
5. Integration: Works seamlessly with NumPy, Scikit-learn, and visualization libraries
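As a rough illustration of the five points above, the sketch below walks a tiny dataset through each step. The CSV content, the column names, and the derived `salary_per_year_of_age` feature are all made up for this example; a real project would read from a file path or database instead of an in-memory string.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
raw_csv = io.StringIO(
    "name,age,salary\n"
    "Alice,25,50000\n"
    "Bob,,60000\n"
    "Bob,,60000\n"
    "Charlie,35,70000\n"
)

# 1. Data loading
people = pd.read_csv(raw_csv)

# 2. Data cleaning: drop the duplicated row, fill the missing age with the median
people = people.drop_duplicates().reset_index(drop=True)
people["age"] = people["age"].fillna(people["age"].median()).astype(int)

# 3. Data transformation: filter by salary, then sort in descending order
high_earners = people[people["salary"] >= 60000].sort_values(
    "salary", ascending=False
)

# 4. Feature engineering: a hypothetical derived column
people["salary_per_year_of_age"] = people["salary"] / people["age"]

# 5. Integration: hand the numeric columns to NumPy as a 2D array
features = people[["age", "salary"]].to_numpy()

print(people)
print(high_earners)
print(features.shape)
```

Filling a missing value with the median is only one possible choice; dropping the row or using a domain-specific default may suit other datasets better.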

Installation

pip install pandas

Basic Import Convention

import pandas as pd
import numpy as np

# Create a simple DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

print(df)
#       name  age  salary
# 0    Alice   25   50000
# 1      Bob   30   60000
# 2  Charlie   35   70000
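Once a DataFrame exists, a few attributes give a quick overview of what it holds. The following is a minimal sketch that rebuilds the same three-row table and inspects it; the choice of which attributes to show is ours, not part of the original lesson.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000],
})

print(df.shape)           # tuple of (number of rows, number of columns)
print(list(df.columns))   # column labels
print(df.dtypes)          # data type of each column
print(df["age"].mean())   # a simple aggregate over one column
```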

Core Data Structures

# Series: 1D labeled array
s = pd.Series([1, 2, 3, 4, 5], name='numbers')
print(s)
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# Name: numbers, dtype: int64
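The "labeled" part of a Series is easier to see with a custom index. The sketch below uses hypothetical data (ages keyed by name) to show label-based lookup, position-based lookup, filtering, and vectorized arithmetic.

```python
import pandas as pd

# Hypothetical example data: ages keyed by name
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="age")

bob_age = ages["Bob"]      # label-based lookup
first_age = ages.iloc[0]   # position-based lookup
older = ages[ages > 26]    # boolean filtering keeps the matching labels
plus_one = ages + 1        # arithmetic applies element-wise

print(bob_age, first_age)
print(older)
print(plus_one)
```

When no index is passed, as in the `numbers` example, pandas falls back to the default integer labels 0, 1, 2, and so on.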

# DataFrame: 2D labeled table (most commonly used)
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z']
})

# Access column as Series
print(df['A'])  # Returns Series
print(df.A)     # Alternative syntax
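The two column-access forms are not fully interchangeable. Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below uses hypothetical column names picked to show both cases.

```python
import pandas as pd

# Hypothetical column names chosen to show where attribute access breaks down
df = pd.DataFrame({
    "A": [1, 2, 3],
    "unit price": [9.5, 12.0, 7.25],  # contains a space
    "count": [10, 20, 30],            # same name as the DataFrame.count method
})

same = df["A"].equals(df.A)      # both forms return the same Series here
unit_price = df["unit price"]    # only bracket syntax can reach this column
count_col = df["count"]          # the column
count_attr = df.count            # the method, not the column
subset = df[["A", "count"]]      # a list of names returns a DataFrame

print(same)
print(type(count_col).__name__, callable(count_attr))
print(type(subset).__name__)
```

For this reason bracket syntax is the safer default, especially when column names come from an external file.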