Data Visualization with Matplotlib
Introduction to Matplotlib
Matplotlib is Python's foundational plotting library. Understanding it is essential for machine learning because visualization helps you understand your data, debug models, and communicate results.
Why Visualization for ML?
1. Exploratory Data Analysis: understand distributions, relationships, and outliers
2. Feature Engineering: visualize feature correlations and transformations
3. Model Evaluation: plot learning curves, confusion matrices, and ROC curves
4. Communication: present findings to stakeholders
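As one concrete instance of the Model Evaluation item, a confusion matrix can be drawn as a heatmap with imshow. This is a minimal sketch: y_true and y_pred are made-up labels for a hypothetical 3-class classifier, and the counts are computed with NumPy alone (scikit-learn also offers a ConfusionMatrixDisplay helper for this).

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical labels for a 3-class problem (illustrative only, not real results)
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1, 0, 2])
n_classes = 3

# Count (true, predicted) pairs: rows are true classes, columns are predictions
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(cm, cmap="Blues")  # darker cells hold larger counts
fig.colorbar(im, ax=ax)

# Write each count in the center of its cell
for i in range(n_classes):
    for j in range(n_classes):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")

ax.set_xticks(range(n_classes))
ax.set_yticks(range(n_classes))
ax.set_xlabel("Predicted class")
ax.set_ylabel("True class")
ax.set_title("Confusion Matrix")
fig.tight_layout()
plt.show()
```

Correct predictions land on the diagonal, so off-diagonal cells show at a glance which classes the model confuses.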
Installation
pip install matplotlib
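To confirm the installation worked, you can print the installed version and the backend pyplot will use. This short check relies only on matplotlib.__version__ and matplotlib.get_backend(); it is handy when an example depends on a newer feature or when plots fail to appear.

```python
import matplotlib

# Installed version string; compare it with the version of the docs you are reading
print(matplotlib.__version__)

# Active backend: an interactive GUI backend on a desktop, or the
# non-interactive "Agg" backend on a machine without a display
print(matplotlib.get_backend())
```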
Basic Import Convention
import matplotlib.pyplot as plt
import numpy as np

# Your first plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.show()
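plt.show() opens an interactive window, which is not available on a remote server or in a batch job. A common alternative is to switch to the non-interactive Agg backend and write the figure to a file with savefig. The sketch below redraws the sine wave that way; the filename sine_wave.png, the dpi value, and bbox_inches='tight' are illustrative choices, not requirements.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('x')
plt.ylabel('sin(x)')

# Save instead of show; bbox_inches='tight' trims extra whitespace around the plot
out = Path('sine_wave.png')
plt.savefig(out, dpi=150, bbox_inches='tight')
plt.close()  # release the figure once it has been written
```

The file format is inferred from the extension, so changing the name to sine_wave.pdf or sine_wave.svg produces a vector file instead of a PNG.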
Two Interfaces
Matplotlib offers two ways to create plots:
1. Pyplot interface (quick and simple): functions in matplotlib.pyplot draw on the current figure and axes, which pyplot tracks for you
plt.plot(x, y)
plt.title('Simple Plot')
plt.show()
2. Object-Oriented interface (more control, recommended for complex plots): you create Figure and Axes objects explicitly and call methods on them
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('OO Plot')
plt.show()
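The two interfaces are not separate systems: pyplot functions operate on the "current" Axes that pyplot keeps track of. The sketch below checks that plt.gca() returns the Axes created by plt.subplots() and that a pyplot call and an Axes method draw onto the same plot. Mixing styles here only makes the relationship visible; in practice it is usually clearer to pick one style per figure.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()

# pyplot's "current" Axes is the one plt.subplots() just created
current = plt.gca()

plt.plot(x, np.sin(x), label='sin(x)')   # pyplot call ...
ax.plot(x, np.cos(x), label='cos(x)')    # ... and OO call draw on the same Axes

plt.title('Mixed Interfaces')            # same effect as ax.set_title(...)
ax.legend()
plt.show()
```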
Figure and Axes
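A Figure is the whole canvas; each Axes is one plot inside it, with its own data area, ticks, labels, and title. plt.subplots() creates a Figure together with a grid of Axes.
# Create a figure with a 2x2 grid of subplots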
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))  # figsize is (width, height) in inches

# Access individual subplots with [row, column] indexing
axes[0, 0].plot(x, y)
axes[0, 0].set_title('Subplot 1')
axes[0, 1].scatter(x, y)
axes[0, 1].set_title('Subplot 2')
axes[1, 0].bar([1, 2, 3], [4, 5, 6])
axes[1, 0].set_title('Subplot 3')
axes[1, 1].hist(np.random.randn(1000), bins=30)
axes[1, 1].set_title('Subplot 4')
plt.tight_layout()  # Adjust spacing so subplot titles and labels don't overlap
plt.show()
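Because axes is a NumPy array of Axes objects, a grid can also be filled in a loop rather than subplot by subplot. The sketch below iterates with axes.flat and passes sharex=True and sharey=True so every subplot uses the same axis limits; the sine frequencies are arbitrary illustrative data.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# sharex/sharey link the axis limits of all subplots in the grid
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8), sharex=True, sharey=True)
print(axes.shape)  # (2, 2): a NumPy array of Axes objects

# axes.flat walks the grid row by row, avoiding nested loops
for k, ax in enumerate(axes.flat):
    freq = k + 1
    ax.plot(x, np.sin(freq * x))
    ax.set_title(f'sin({freq}x)')

plt.tight_layout()
plt.show()
```

With nrows=1 or ncols=1, plt.subplots() returns a 1-D array instead (and a single Axes object when both are 1), because its squeeze parameter defaults to True; passing squeeze=False always returns a 2-D array.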