Convolutional Neural Networks

120 min
Deep Learning

Convolution Operations: The Heart of CNNs

Convolutional Neural Networks (CNNs) are designed to process grid-like data, especially images. The key operation is the convolution.

What is a Convolution?

A convolution slides a small matrix (called a kernel or filter) across the input image. At each position, it multiplies the kernel element-wise with the overlapping patch of the image and sums the products to produce one output value.

Input (5x5)       Kernel (3x3)      Output (3x3)
1 2 3 4 5         1 0 -1            -8 -8 -8
2 3 4 5 6    *    2 0 -2     =      -8 -8 -8
3 4 5 6 7         1 0 -1            -8 -8 -8
4 5 6 7 8
5 6 7 8 9
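The output for this example can be reproduced with a few lines of NumPy. This is a minimal sketch that builds the 5x5 input as a ramp (entry (r, c) equals r + c + 1) and evaluates the sum at each of the 3x3 positions where the kernel fits. Because the input increases by exactly 1 per column, every patch has the same left-to-right gradient, so every output value is the same.

```python
import numpy as np

# The 5x5 input from the figure: entry (r, c) equals r + c + 1
image = np.add.outer(np.arange(5), np.arange(5)) + 1

# The 3x3 kernel from the figure (left column minus right column, weighted 1-2-1)
kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]])

# Slide the kernel over every position where it fits entirely inside the input
out = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]      # 3x3 window whose top-left corner is (i, j)
        out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum

print(out)
# [[-8 -8 -8]
#  [-8 -8 -8]
#  [-8 -8 -8]]
```

At the top-left position, for example, the left column of the patch contributes 1·1 + 2·2 + 3·1 = 8 and the right column contributes −(3·1 + 4·2 + 5·1) = −16, giving −8.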

Why Convolutions for Images?

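1. Translation Equivariance: The same kernel is applied at every position, so shifting the input shifts the feature map by the same amount. Combined with pooling, this gives approximate translation invariance: a cat is a cat regardless of where it appears.
2. Parameter Sharing: The same kernel is used across the entire image, so the number of weights depends on the kernel size and channel counts, not on the image size.
3. Local Connectivity: Each output value depends only on a small region of the input (its receptive field).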
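Two of these properties are easy to check numerically. The sketch below uses NumPy only; the helper name `correlate_valid` and the 12x12 impulse image are illustrative choices, not part of the original material. It counts the weights of a 3x3 convolution (3 to 16 channels) against a fully connected layer mapping a 3x224x224 input to a 16x224x224 output, then checks that moving a bright pixel moves the response by the same offset (equivariance holds exactly away from the image borders).

```python
import numpy as np

def correlate_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Parameter sharing: weight counts (biases excluded)
conv_weights = 16 * 3 * 3 * 3                        # 16 filters, each 3x3 over 3 channels
dense_weights = (3 * 224 * 224) * (16 * 224 * 224)   # every input connected to every output
print(conv_weights)   # 432
print(dense_weights)  # 120846286848

# Translation equivariance: shift a single bright pixel by (2, 3)
rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
image = np.zeros((12, 12))
image[4, 4] = 1.0
shifted = np.roll(image, shift=(2, 3), axis=(0, 1))

out = correlate_valid(image, kernel)
out_shifted = correlate_valid(shifted, kernel)
# The response to the shifted image equals the original response shifted by (2, 3)
print(np.allclose(np.roll(out, shift=(2, 3), axis=(0, 1)), out_shifted))  # True
```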

Mathematical Definition

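$$(I * K)[i,j] = \sum_{m=0}^{k_h-1} \sum_{n=0}^{k_w-1} I[i+m,\, j+n]\, K[m,n]$$

where k_h × k_w is the kernel size and (i, j) ranges over the positions where the kernel fits inside the image. Strictly speaking, this formula (which does not flip the kernel) is cross-correlation; the textbook convolution uses I[i−m, j−n]. Deep learning libraries such as PyTorch compute cross-correlation but still call it convolution, and because the kernel weights are learned, the distinction makes no practical difference.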
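A useful consequence of this definition is the output size. For an H × W input and a k_h × k_w kernel with no padding and stride 1, the indices i and j can only take values for which the kernel still fits entirely inside the input, i.e. 0 ≤ i ≤ H − k_h:

$$H_{\text{out}} = H - k_h + 1, \qquad W_{\text{out}} = W - k_w + 1$$

With zero-padding p on each side and stride s (the form PyTorch documents for `nn.Conv2d`, taking dilation as 1), this generalizes to

$$H_{\text{out}} = \left\lfloor \frac{H + 2p - k_h}{s} \right\rfloor + 1$$

For the 5x5 example above, 5 − 3 + 1 = 3; for a 224x224 input with a 3x3 kernel, p = 1 and s = 1, the result is (224 + 2 − 3)/1 + 1 = 224, which is why padding=1 keeps the spatial size unchanged.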

NumPy Implementation

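import numpy as np

def convolve2d(image, kernel):
    """Simple 2D convolution (no padding, stride=1).

    Like deep learning libraries, this does not flip the kernel,
    so it is technically a cross-correlation.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    output_h = h - kh + 1
    output_w = w - kw + 1
    output = np.zeros((output_h, output_w))
    for i in range(output_h):
        for j in range(output_w):
            # Extract patch and compute dot product
            patch = image[i:i+kh, j:j+kw]
            output[i, j] = np.sum(patch * kernel)
    return output

# Edge detection kernel (Sobel, horizontal gradient)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Example input: the 5x5 grid from above (any 2D array works)
image = np.add.outer(np.arange(5), np.arange(5)) + 1

# Apply convolution: a 3x3 result, every entry 8 for this left-to-right ramp
edges = convolve2d(image, sobel_x)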
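The double loop above is clear but slow in pure Python. A common NumPy alternative, sketched below under the assumption of NumPy 1.20 or later (which added `numpy.lib.stride_tricks.sliding_window_view`), builds all patches at once and contracts them with the kernel using `np.einsum`. The name `convolve2d_fast` and the test image (dark left half, bright right half) are illustrative; `sobel_x` responds only where intensity changes from left to right.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve2d_fast(image, kernel):
    """Vectorized 2D convolution (no padding, stride 1, kernel not flipped)."""
    # windows has shape (out_h, out_w, kh, kw): one kh x kw patch per output position
    windows = sliding_window_view(image, kernel.shape)
    # Multiply each patch element-wise by the kernel and sum over the patch axes
    return np.einsum("ijkl,kl->ij", windows, kernel)

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Hypothetical test image: 6x6, value 0 in columns 0-2 and 10 in columns 3-5
image = np.zeros((6, 6))
image[:, 3:] = 10.0

edges = convolve2d_fast(image, sobel_x)
print(edges)  # 4x4; every row is [0, 40, 40, 0]: nonzero only where the window spans the edge
```

Each nonzero value is (1 + 2 + 1) × 10 = 40, since the kernel's right column sees the bright side and its left column the dark side.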

PyTorch Convolutions

import torch
import torch.nn as nn

# 2D convolution layer
conv = nn.Conv2d(
    in_channels=3,    # RGB input
    out_channels=16,  # Number of filters
    kernel_size=3,    # 3x3 kernels
    stride=1,         # Move 1 pixel at a time
    padding=1,        # Keep same spatial size
)

# Input: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)
output = conv(x)
# Shape: (1, 16, 224, 224)
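`nn.Conv2d` does a little more than the single-channel NumPy function: each of the `out_channels` filters spans all input channels, the per-channel responses are summed, and a bias is added. The sketch below mirrors that computation in NumPy so the shapes can be checked without PyTorch. `conv2d_multichannel` is a hypothetical helper; it assumes groups=1 and dilation=1, and uses the (out_channels, in_channels, kh, kw) weight layout that `conv.weight` has in PyTorch.

```python
import numpy as np

def conv2d_multichannel(x, weight, bias, stride=1, padding=0):
    """NumPy sketch of a Conv2d forward pass (groups=1, dilation=1).

    x:      (batch, in_channels, H, W)
    weight: (out_channels, in_channels, kh, kw)
    bias:   (out_channels,)
    """
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    # Zero-pad only the two spatial dimensions
    x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out = np.zeros((n, c_out, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Patch spanning all input channels: (batch, in_channels, kh, kw)
            patch = x[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Each filter sums over input channels and its kh x kw window
            out[:, :, i, j] = np.einsum("nchw,ochw->no", patch, weight) + bias
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 32, 32))       # smaller than 224x224 to keep the loop fast
weight = rng.standard_normal((16, 3, 3, 3))   # 16 filters, each 3x3 over 3 channels
bias = np.zeros(16)

print(conv2d_multichannel(x, weight, bias, stride=1, padding=1).shape)  # (1, 16, 32, 32)
print(conv2d_multichannel(x, weight, bias, stride=2, padding=1).shape)  # (1, 16, 16, 16)
print(weight.size + bias.size)  # 448
```

With stride=2 the spatial size roughly halves (floor((32 + 2 − 3)/2) + 1 = 16). The 16 × 3 × 3 × 3 = 432 weights plus 16 biases give 448 parameters, the same count `sum(p.numel() for p in conv.parameters())` reports for the PyTorch layer above.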