Pandas for Data Manipulation
Introduction to Pandas
Pandas is one of the most widely used Python libraries for data manipulation and analysis. Its core data structures, Series and DataFrame, make working with structured (tabular) data intuitive and efficient.
Why Pandas for Machine Learning?
1. Data Loading: Easily read data from CSV, Excel, JSON, SQL databases, and more
2. Data Cleaning: Handle missing values, duplicates, and data type conversions
3. Data Transformation: Filter, sort, group, and reshape data effortlessly
4. Feature Engineering: Create new features from existing data
5. Integration: Works seamlessly with NumPy, Scikit-learn, and visualization libraries
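The five points above can be sketched in a few lines. This is a minimal illustration rather than a recommended pipeline: the inline CSV text, the column names (`name`, `age`, `salary`, `dept`), and the derived `salary_per_year_of_age` feature are all made up for the example.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file on disk.
raw_csv = io.StringIO(
    "name,age,salary,dept\n"
    "Alice,25,50000,eng\n"
    "Bob,30,,ops\n"
    "Bob,30,,ops\n"
    "Charlie,35,70000,eng\n"
)

# 1. Data loading: read_csv accepts a path or any file-like object.
df = pd.read_csv(raw_csv)

# 2. Data cleaning: drop exact duplicate rows, then fill the missing
#    salary with the median of the observed salaries.
df = df.drop_duplicates().reset_index(drop=True)
df["salary"] = df["salary"].fillna(df["salary"].median())

# 3. Data transformation: filter rows, then aggregate per group.
older_than_26 = df[df["age"] > 26]
mean_salary_by_dept = df.groupby("dept")["salary"].mean()

# 4. Feature engineering: derive a new column from existing ones.
df["salary_per_year_of_age"] = df["salary"] / df["age"]

# 5. Integration: hand the numeric columns to NumPy-based tools as an array.
features = df[["age", "salary"]].to_numpy()

print(df)
print(mean_salary_by_dept)
print(features.shape)
```

Filling with the median is only one possible choice for missing values. Depending on the data, dropping the rows or using a different fill value may be more appropriate.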
Installation
pip install pandas
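To confirm that the installation worked, one quick check is to import the library and print its version. The version string depends on your environment, so no output is shown here.

```python
import pandas as pd

# Prints the version of pandas that Python actually imports.
print(pd.__version__)
```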
Basic Import Convention
import pandas as pd
import numpy as np

# Create a simple DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

print(df)
#       name  age  salary
# 0    Alice   25   50000
# 1      Bob   30   60000
# 2  Charlie   35   70000
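Once a DataFrame exists, a common first step is to inspect its size, column names, and types. The sketch below rebuilds the same illustrative DataFrame so that it runs on its own. The choice of inspection calls is only a suggestion, not a fixed checklist.

```python
import pandas as pd

# Same illustrative DataFrame as in the example above.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # ['name', 'age', 'salary']
print(df.dtypes)         # one dtype per column
print(df.head(2))        # first two rows
print(df.describe())     # summary statistics for the numeric columns
```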
Core Data Structures
# Series: 1D labeled array
s = pd.Series([1, 2, 3, 4, 5], name='numbers')
print(s)
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# Name: numbers, dtype: int64
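The "labeled" part of a Series is easier to see with an explicit index. The following sketch uses made-up labels and values, and shows label-based lookup, position-based lookup, boolean filtering, and element-wise arithmetic.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index.
# The labels and values are made up for illustration.
ages = pd.Series([25, 30, 35], index=['alice', 'bob', 'charlie'], name='age')

print(ages['bob'])          # label-based lookup -> 30
print(ages.iloc[0])         # position-based lookup -> 25
print(ages[ages > 26])      # boolean filtering keeps 'bob' and 'charlie'
print((ages + 1).tolist())  # element-wise arithmetic -> [26, 31, 36]
```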
# DataFrame: 2D labeled table (most commonly used)
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z']
})

# Access column as Series
print(df['A'])  # Returns Series
print(df.A)     # Alternative syntax
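Both forms above return the same Series for column `A`, but they are not interchangeable in general. The sketch below uses hypothetical column names (`count`, `unit price`) to show two cases where attribute access does not reach the column, and how a list of labels returns a DataFrame rather than a Series.

```python
import pandas as pd

# Column names chosen for illustration: 'count' collides with a DataFrame
# method, and 'unit price' contains a space.
df = pd.DataFrame({
    'A': [1, 2, 3],
    'count': [10, 20, 30],
    'unit price': [1.5, 2.5, 3.5],
})

single = df['A']             # one label -> Series
subset = df[['A', 'count']]  # list of labels -> DataFrame

print(type(single).__name__)   # Series
print(type(subset).__name__)   # DataFrame

# Attribute access resolves to the DataFrame.count method, not the column.
print(callable(df.count))      # True
print(df['count'].tolist())    # [10, 20, 30]

# Names that are not valid Python identifiers only work with brackets.
print(df['unit price'].sum())  # 7.5
```

For this reason, bracket syntax is the safer default, with attribute access kept for quick interactive exploration.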