Convolutional Neural Networks
Convolution Operations: The Heart of CNNs
Convolutional Neural Networks (CNNs) are designed to process grid-like data, especially images. The key operation is the convolution.
What is a Convolution?
A convolution slides a small matrix (called a kernel or filter) across the input image. At each position it multiplies the kernel element-wise with the image patch it overlaps and sums the products into a single output value.
Input (5x5)      Kernel (3x3)      Output (3x3)
1 2 3 4 5         1  0 -1
2 3 4 5 6    *    2  0 -2     =    Result values
3 4 5 6 7         1  0 -1
4 5 6 7 8
5 6 7 8 9
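As a worked check of the diagram, the following minimal sketch computes the output for the 5x5 input and 3x3 kernel shown above, assuming no padding, stride 1, and no kernel flip. The variable names are illustrative only.

```python
import numpy as np

# The 5x5 input and 3x3 kernel from the diagram.
image = np.array([
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [3, 4, 5, 6, 7],
    [4, 5, 6, 7, 8],
    [5, 6, 7, 8, 9],
])
kernel = np.array([
    [1, 0, -1],
    [2, 0, -2],
    [1, 0, -1],
])

# Top-left output value: multiply the top-left 3x3 patch by the kernel and sum.
top_left = int(np.sum(image[0:3, 0:3] * kernel))

# Full 3x3 output: repeat for every valid kernel position.
output = np.array([
    [np.sum(image[i:i + 3, j:j + 3] * kernel) for j in range(3)]
    for i in range(3)
])

print(top_left)  # -8
print(output)    # every entry is -8
```

Every entry is the same because the input rises by exactly 1 per column: in each patch the left column minus the right column is -2 on every row, and the kernel weights those rows by 1 + 2 + 1 = 4.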
Why Convolutions for Images?
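1. Translation Equivariance (often loosely called translation invariance): Shifting the input shifts the feature map by the same amount, so a cat is detected wherever it appears. Pooling layers then add approximate invariance.
2. Parameter Sharing: The same kernel is used across the entire image, so the layer needs far fewer parameters than a fully connected one.
3. Local Connectivity: Each output value depends only on a small region of the input (its receptive field).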
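To put a number on the parameter-sharing point, here is a rough sketch that counts learnable parameters for a convolutional layer and for a fully connected layer mapping the same input to the same output size. The helper names and the layer sizes (3 to 16 channels on a 224x224 image) are assumptions chosen for illustration.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square-kernel conv layer."""
    return out_channels * (in_channels * kernel_size * kernel_size) + out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features


# Hypothetical sizes: RGB input, 16 output feature maps, 224x224 pixels.
h = w = 224
conv = conv_params(in_channels=3, out_channels=16, kernel_size=3)
dense = dense_params(in_features=3 * h * w, out_features=16 * h * w)

print(conv)   # 448
print(dense)  # roughly 1.2e11
```

The convolutional count does not depend on the image size at all, which is the practical payoff of reusing one kernel at every position.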
Mathematical Definition
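(I * K)[i, j] = Σ_m Σ_n I[i+m, j+n] × K[m, n]
Here m and n range over the rows and columns of the kernel. Strictly speaking this is cross-correlation (the kernel is not flipped), which is the operation deep learning libraries implement under the name convolution.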
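The formula indexes the image with I[i+m, j+n], so the kernel is applied without flipping. The sketch below transcribes the double sum literally and compares it with the textbook convolution, which flips the kernel on both axes first. The function names are hypothetical, and the example assumes no padding and stride 1.

```python
import numpy as np


def cross_correlate2d(image, kernel):
    """Literal transcription of the sum over m, n of I[i+m, j+n] * K[m, n]."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            for m in range(kh):
                for n in range(kw):
                    out[i, j] += image[i + m, j + n] * kernel[m, n]
    return out


def true_convolve2d(image, kernel):
    """Textbook convolution: flip the kernel on both axes, then cross-correlate."""
    return cross_correlate2d(image, kernel[::-1, ::-1])


image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)

corr = cross_correlate2d(image, kernel)
conv = true_convolve2d(image, kernel)
print(np.allclose(conv, -corr))  # True: flipping this kernel only changes the sign
```

For learned kernels the distinction rarely matters, since the network can learn the flipped weights just as easily.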
NumPy Implementation
import numpy as np

def convolve2d(image, kernel):
    """Simple 2D convolution (no padding, stride=1)"""
    h, w = image.shape
    kh, kw = kernel.shape
    output_h = h - kh + 1
    output_w = w - kw + 1
    output = np.zeros((output_h, output_w))
    for i in range(output_h):
        for j in range(output_w):
            # Extract patch and compute dot product
            patch = image[i:i+kh, j:j+kw]
            output[i, j] = np.sum(patch * kernel)
    return output

# Edge detection kernel (Sobel)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Placeholder grayscale image (assumption: any 2D array works here)
image = np.random.rand(8, 8)

# Apply convolution
edges = convolve2d(image, sobel_x)
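To see what the Sobel kernel responds to, the sketch below applies it to a synthetic image with a single vertical edge. It also shows one possible way to replace the double loop with NumPy's `sliding_window_view` (available in NumPy 1.20 and later) and `einsum`. The function name `convolve2d_vectorized` and the test image are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def convolve2d_vectorized(image, kernel):
    """Same operation as the looped version: no padding, stride 1."""
    # windows has shape (out_h, out_w, kh, kw): one kernel-sized patch per output pixel.
    windows = sliding_window_view(image, kernel.shape)
    # Multiply each patch by the kernel and sum over the patch axes.
    return np.einsum("ijmn,mn->ij", windows, kernel)


sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Hypothetical test image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = convolve2d_vectorized(image, sobel_x)
print(edges)  # each row is [0. 4. 4. 0.]
```

The response is zero in the flat regions and nonzero only where the 3x3 window straddles the dark-to-bright boundary, which is why this kernel is used as a vertical edge detector.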
PyTorch Convolutions
import torch
import torch.nn as nn

# 2D Convolution layer
conv = nn.Conv2d(
    in_channels=3,     # RGB input
    out_channels=16,   # Number of filters
    kernel_size=3,     # 3x3 kernels
    stride=1,          # Move 1 pixel at a time
    padding=1          # Keep same spatial size
)

# Input: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)
output = conv(x)  # Shape: (1, 16, 224, 224)
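The output shape follows from the kernel size, stride, and padding. As a small sketch, assuming a dilation of 1 and the same settings along height and width, the spatial size along one dimension can be computed as below. The helper `conv_output_size` is a hypothetical name and not part of PyTorch.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output size along one dimension: floor((size + 2*padding - kernel_size) / stride) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Settings from the layer above: 3x3 kernel, stride 1, padding 1 keeps 224 -> 224.
same = conv_output_size(224, kernel_size=3, stride=1, padding=1)

# No padding, as in the NumPy implementation: 5 -> 3.
valid = conv_output_size(5, kernel_size=3)

# Stride 2 roughly halves the spatial size.
halved = conv_output_size(224, kernel_size=3, stride=2, padding=1)

print(same, valid, halved)  # 224 3 112
```

With a 3x3 kernel, padding of 1 offsets exactly the two border pixels the kernel would otherwise lose, which is why the layer above preserves the 224x224 spatial size.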