Master CNN like a Top Expert, Even if You Don’t Know NumPy

Have you ever wondered what the difference is between an AI developer with 10 years of experience who has a low-paying, dead-end job and a FANG AI engineer earning $100K with stock options and getting promoted to team leader next month?

What Makes the Difference?

The difference lies in what each one knows: the person stuck in the dead-end, low-paying job only knows how to use models from Hugging Face, while the developer at the opposite end of the spectrum knows the ins and outs of complex AI architectures like CNNs.

This blog is for those who want to understand the algorithms just like a top expert, whether to secure a dream job or to make an impact on the world.

This is Part 2 of the Master CNN like a Top Expert series. If you are new, I recommend reading Part 1, Master CNN like Top Expert, Even if You Terrible at Maths, first and then coming back to this one.

A quick recap of what we’ve covered so far:

Part 1:

  • What exactly are CNNs
  • Explanation of how CNNs work
  • Different layers and how they function
  • The basic mathematics of CNNs
  • The basic mathematics of each layer

In this blog, you will learn:

  • Basics of NumPy
  • Handling Images in NumPy
  • Creating all The Layers in NumPy from Scratch

Basics of NumPy

Before we build a Convolutional Neural Network (CNN) from scratch, we need to master the one tool every top AI expert uses daily: NumPy.

Think of NumPy as the bedrock of AI programming in Python. While libraries like PyTorch and TensorFlow steal the spotlight, underneath, they all rely on NumPy-like tensor operations. So if you can master NumPy, you’re halfway to mastering deep learning.

Let’s start with the essentials:

1. What is NumPy?

NumPy (short for Numerical Python) is a high-performance library for numerical computing. It gives you:

  • ndarrays (N-dimensional arrays) that are faster and more powerful than Python lists
  • Vectorized operations (no for-loops!)
  • Matrix multiplication, dot products, reshaping, slicing, and all the tools needed to handle image data like a pro
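
To make the list above concrete, here is a minimal sketch comparing a plain Python loop with the vectorized NumPy equivalent, plus a dot product and a matrix multiplication. The variable names are illustrative only.

```python
import numpy as np

# Two small vectors, first as plain Python lists
a_list = [1.0, 2.0, 3.0]
b_list = [4.0, 5.0, 6.0]

# Element-wise product with an explicit Python loop
loop_result = [x * y for x, y in zip(a_list, b_list)]

# The same computation, vectorized
a = np.array(a_list)
b = np.array(b_list)
vec_result = a * b          # element-wise product, no loop
dot_result = np.dot(a, b)   # 1*4 + 2*5 + 3*6

# Matrix multiplication with the @ operator
m = np.array([[1, 2], [3, 4]])
identity = np.eye(2, dtype=int)
mat_result = m @ identity   # multiplying by the identity leaves m unchanged

print(vec_result.tolist())  # [4.0, 10.0, 18.0]
print(dot_result)           # 32.0
```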

NumPy Intro

NumPy (Numerical Python) is the foundation for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them. It’s fast, thanks to its implementation in C, and is a must-know for AI developers working with data.

Getting Started

To use NumPy, first install it using pip if you haven’t already: pip install numpy. Then, import it in your Python script:

import numpy as np

Creating Arrays

NumPy arrays are the core of the library. You can create them from Python lists or use built-in functions:

# From a list
arr = np.array([1, 2, 3, 4])

# 2D array
arr_2d = np.array([[1, 2], [3, 4]])

# Using built-in functions
zeros = np.zeros((2, 3))  # 2x3 array of zeros
ones = np.ones((3, 2))    # 3x2 array of ones
arange = np.arange(0, 10, 2)  # Array from 0 to 9, step 2: [0, 2, 4, 6, 8]

Array Indexing

Access elements in a NumPy array using indices, similar to Python lists:

arr = np.array([10, 20, 30, 40])
print(arr[1])  # Output: 20

# 2D array indexing
arr_2d = np.array([[1, 2], [3, 4]])
print(arr_2d[1, 0])  # Output: 3

Array Slicing

Slicing extracts a portion of the array. Use the syntax arr[start:end:step]:

arr = np.array([0, 1, 2, 3, 4, 5])
print(arr[1:4])  # Output: [1 2 3]

# 2D slicing
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d[0, 1:3])  # Output: [2 3]
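
The start:end:step syntax also supports a step, negative indices, and whole-axis selection with a bare colon. A short sketch (the array names here are just for illustration):

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5])

every_other = arr[::2]    # start and end omitted, step 2
reversed_arr = arr[::-1]  # a negative step walks backwards
last_two = arr[-2:]       # a negative start counts from the end

grid = np.arange(12).reshape(3, 4)  # rows: 0..3, 4..7, 8..11
second_column = grid[:, 1]          # every row, column 1
top_left = grid[:2, :2]             # first two rows and first two columns

print(every_other.tolist())    # [0, 2, 4]
print(second_column.tolist())  # [1, 5, 9]
print(top_left.tolist())       # [[0, 1], [4, 5]]
```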

Data Types

NumPy arrays have specific data types (e.g., int32, float64). You can check or set the type:

arr = np.array([1, 2, 3])
print(arr.dtype)  # Output: int64 (or int32 depending on system)

# Specify type
arr_float = np.array([1, 2, 3], dtype='float32')
print(arr_float.dtype)  # Output: float32

Copy vs View

A copy creates a new array, while a view references the original array:

arr = np.array([1, 2, 3])
copy_arr = arr.copy()  # Independent copy
view_arr = arr.view()  # View of the original

copy_arr[0] = 10
print(arr)  # Output: [1 2 3]

view_arr[0] = 10
print(arr)  # Output: [10  2  3]
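
This distinction matters later because basic slicing also returns a view, not a copy. The sketch below, which uses a small zero-filled array as a stand-in for an image, shows a write to a sliced patch changing the original, and np.shares_memory confirming it:

```python
import numpy as np

image = np.zeros((4, 4), dtype=np.uint8)

patch_view = image[1:3, 1:3]         # basic slicing returns a view
patch_copy = image[1:3, 1:3].copy()  # an explicit copy owns its own data

patch_view[:] = 255  # writes through to `image`
patch_copy[:] = 7    # leaves `image` untouched

print(np.shares_memory(image, patch_view))  # True
print(np.shares_memory(image, patch_copy))  # False
print(image[1, 1], image[0, 0])             # 255 0
```

If you crop a region and plan to modify it, calling .copy() first is the safer habit.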

Array Shape

The shape of an array is its dimensions:

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # Output: (2, 3) - 2 rows, 3 columns

Array Reshape

Reshape changes the array’s dimensions without altering its data:

arr = np.array([1, 2, 3, 4, 5, 6])
reshaped = arr.reshape(2, 3)
print(reshaped)  # Output: [[1 2 3] [4 5 6]] (printed on two rows)
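
Two details are worth a quick sketch: passing -1 lets NumPy infer one dimension, and the total number of elements must stay the same. The feature_map array below is a made-up example shaped like (height, width, channels):

```python
import numpy as np

feature_map = np.arange(24).reshape(2, 3, 4)  # e.g. (height, width, channels)

# -1 asks NumPy to infer that dimension from the total element count
rows_inferred = feature_map.reshape(-1, 4)    # shape (6, 4)
flat = feature_map.reshape(-1)                # shape (24,)
flat_copy = feature_map.flatten()             # same values, always a copy

# Reshaping to an incompatible element count raises ValueError
try:
    feature_map.reshape(5, 5)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(rows_inferred.shape, flat.shape, mismatch_raised)  # (6, 4) (24,) True
```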

Array Iterating

Iterate over arrays using regular loops, or use NumPy’s nditer to visit every element of a multi-dimensional array without nesting loops:

arr = np.array([[1, 2], [3, 4]])
for x in np.nditer(arr):
    print(x, end=' ')  # Output: 1 2 3 4

Array Join

Combine arrays using functions like concatenate, stack, or hstack:

arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
joined = np.concatenate((arr1, arr2))
print(joined)  # Output: [1 2 3 4]
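
The other two functions mentioned above behave a little differently: hstack joins along the existing axis, while stack creates a new one. As a sketch, the last part builds a tiny RGB image from three single-channel planes (the plane names are illustrative):

```python
import numpy as np

arr1 = np.array([1, 2])
arr2 = np.array([3, 4])

hstacked = np.hstack((arr1, arr2))             # shape (4,), same as concatenate for 1D
stacked = np.stack((arr1, arr2))               # new leading axis, shape (2, 2)
stacked_last = np.stack((arr1, arr2), axis=1)  # new trailing axis, shape (2, 2)

# Building an RGB image from three single-channel planes
red = np.full((2, 2), 255, dtype=np.uint8)
green = np.zeros((2, 2), dtype=np.uint8)
blue = np.zeros((2, 2), dtype=np.uint8)
rgb = np.stack((red, green, blue), axis=-1)    # shape (2, 2, 3)

print(stacked.tolist())       # [[1, 2], [3, 4]]
print(stacked_last.tolist())  # [[1, 3], [2, 4]]
print(rgb.shape)              # (2, 2, 3)
```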

Array Split

Split an array into multiple sub-arrays:

arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.array_split(arr, 3)
print(split_arr)  # Output: [array([1, 2]), array([3, 4]), array([5, 6])]

Array Search

Find indices where a condition is met using where:

arr = np.array([10, 20, 30, 20])
indices = np.where(arr == 20)
print(indices)  # Output: (array([1, 3]),)

Array Sort

Sort arrays with sort:

arr = np.array([3, 1, 4, 2])
sorted_arr = np.sort(arr)
print(sorted_arr)  # Output: [1 2 3 4]

Array Filter

Filter arrays using boolean indexing:

arr = np.array([1, 2, 3, 4])
filter_arr = arr > 2
filtered = arr[filter_arr]
print(filtered)  # Output: [3 4]

Mastering these NumPy basics will set a strong foundation for building and manipulating CNN layers in the upcoming sections.

Handling Images in NumPy

Now that you’ve got the basics of NumPy under your belt, let’s move on to something more exciting: handling images! In CNNs, images are the primary input, and NumPy is your go-to tool for loading, manipulating, and preprocessing them before feeding them into your model. Let’s break this down step by step.

Loading an Image as a NumPy Array

Images are essentially arrays of pixel values, and NumPy is perfect for working with them. To load an image, you’ll need a library like Pillow (PIL) or imageio to read the image file and convert it into a NumPy array. Here’s how you can do it with Pillow:

from PIL import Image
import numpy as np

# Load the image
image = Image.open("example.jpg")  # Replace with your image path
image_array = np.array(image)

print(image_array.shape)  # Output: (height, width, channels), e.g., (224, 224, 3) for an RGB image

The resulting array has a shape of (height, width, channels), where channels is typically 3 for RGB images (red, green, blue) or 1 for grayscale.
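
If you don’t have an image file at hand, you can build a small synthetic image directly in NumPy and use it with the rest of this section. This is only a sketch: the 4×6 size and the gradient pattern are arbitrary choices.

```python
import numpy as np

height, width = 4, 6

# Horizontal gradient from 0 to 255, one value per column
gradient = np.linspace(0, 255, width).astype(np.uint8)  # shape (width,)

# Repeat the row `height` times, then copy it into three channels
gray_plane = np.tile(gradient, (height, 1))             # shape (height, width)
image_array = np.stack([gray_plane] * 3, axis=-1)       # shape (height, width, 3)

print(image_array.shape)  # (4, 6, 3)
print(image_array.dtype)  # uint8
print(image_array[0, 0].tolist(), image_array[0, -1].tolist())  # [0, 0, 0] [255, 255, 255]
```

With Pillow installed, Image.fromarray(image_array) should turn an array like this back into an image object.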

Visualizing the Image

To check your work, you can display the image using a library like matplotlib:

import matplotlib.pyplot as plt

plt.imshow(image_array)
plt.axis('off')
plt.show()

This will display the image, helping you verify that your preprocessing steps are working as expected.

Converting to Grayscale

Color images can be computationally heavy, so you might want to convert them to grayscale for simpler processing. Here’s how:

# Convert to grayscale
gray_image = np.mean(image_array, axis=2).astype(np.uint8)  # Average across the channel axis
print(gray_image.shape)  # Output: (height, width), e.g., (224, 224)

This reduces the image to a 2D array, where each value represents the intensity of the pixel (0 to 255).
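
A plain average treats red, green, and blue equally. A common alternative, shown here as a sketch and not as the method used in the rest of this blog, is a weighted sum with the ITU-R BT.601 luma coefficients, which give green the most weight. The 1×3 test image is made up for the demonstration.

```python
import numpy as np

# Tiny 1x3 RGB "image": one pure red, one pure green, one pure blue pixel
image_array = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# Plain average, as in the text: every channel counts equally
gray_mean = np.mean(image_array, axis=2).astype(np.uint8)

# Weighted sum using the ITU-R BT.601 luma coefficients
weights = np.array([0.299, 0.587, 0.114])
# [..., :3] keeps only R, G, B, so an alpha channel (if present) is ignored
gray_luma = np.round(image_array[..., :3] @ weights).astype(np.uint8)

print(gray_mean.tolist())  # [[85, 85, 85]]
print(gray_luma.tolist())  # [[76, 150, 29]]
```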

Normalizing Pixel Values

CNNs often perform better when pixel values are normalized to a range like [0, 1] or [-1, 1]. Since pixel values in images are typically between 0 and 255, you can normalize them easily:

normalized_image = image_array / 255.0
print(normalized_image.min(), normalized_image.max())  # Output: 0.0 1.0 (if the image contains both pure black and pure white pixels)

This step is crucial for training your CNN, as it ensures all input values are on a consistent scale.
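
The [-1, 1] range mentioned above takes one extra step. Here is a minimal sketch on a handful of made-up pixel values, including an optional cast to float32 to save memory:

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

unit_range = pixels / 255.0                 # scaled to [0, 1]
signed_range = pixels / 127.5 - 1.0         # scaled to [-1, 1]
as_float32 = unit_range.astype(np.float32)  # float32 uses half the memory of float64

print(unit_range.min(), unit_range.max())      # 0.0 1.0
print(signed_range.min(), signed_range.max())  # -1.0 1.0
print(as_float32.dtype)                        # float32
```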

Now I would like you all to print the images yourselves to encourage practice—not because I’m too lazy to include them in every code block.

Resizing Images

Images often come in different sizes, but CNNs typically require a fixed input size (e.g., 224×224 for many models). You can resize images using libraries like Pillow and then convert them back to a NumPy array:

# Resize the image
resized_image = image.resize((224, 224))  # Resize to 224x224
resized_array = np.array(resized_image)

print(resized_array.shape)  # Output: (224, 224, 3)

Cropping and Padding

Sometimes, you need to crop or pad images to focus on specific regions or match a required size. NumPy makes this straightforward:

# Cropping
cropped_array = image_array[50:150, 50:150, :]  # Crop a 100x100 section
print(cropped_array.shape)  # Output: (100, 100, 3)

# Padding (e.g., to add a border of zeros)
padded_array = np.pad(image_array, ((10, 10), (10, 10), (0, 0)), mode='constant', constant_values=0)
print(padded_array.shape)  # Output: (height+20, width+20, 3)
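
In practice the crop offsets and padding widths usually depend on the image size. The two helpers below, center_crop and pad_to_square, are hypothetical names written for this sketch; they compute the offsets for you.

```python
import numpy as np

def center_crop(image, crop_h, crop_w):
    """Return the central crop_h x crop_w region (hypothetical helper)."""
    h, w = image.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]

def pad_to_square(image):
    """Zero-pad the shorter side so height == width (hypothetical helper)."""
    h, w = image.shape[:2]
    size = max(h, w)
    pad_h, pad_w = size - h, size - w
    pad_spec = [(pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)]
    pad_spec += [(0, 0)] * (image.ndim - 2)  # leave any channel axis untouched
    return np.pad(image, pad_spec, mode='constant', constant_values=0)

image_array = np.ones((6, 10, 3), dtype=np.uint8)
cropped = center_crop(image_array, 4, 4)
squared = pad_to_square(image_array)

print(cropped.shape)  # (4, 4, 3)
print(squared.shape)  # (10, 10, 3)
```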

Flipping and Rotating

Data augmentation—like flipping or rotating images—can help improve your CNN’s performance by increasing the variety of training data:

# Flip vertically
flipped_vertical = np.flipud(image_array)

# Flip horizontally
flipped_horizontal = np.fliplr(image_array)

# Rotate 90 degrees
rotated = np.rot90(image_array)
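
To see exactly what these functions do, it helps to try them on a tiny array. This sketch checks that the flips are equivalent to reversed slices, and shows that np.rot90 rotates counterclockwise by default and swaps height and width:

```python
import numpy as np

image_array = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# The flip helpers are equivalent to reversed slices
assert np.array_equal(np.flipud(image_array), image_array[::-1, :])
assert np.array_equal(np.fliplr(image_array), image_array[:, ::-1])

# rot90 rotates counterclockwise and swaps height and width
rotated = np.rot90(image_array)           # shape (3, 2)
rotated_cw = np.rot90(image_array, k=-1)  # negative k rotates clockwise
rotated_180 = np.rot90(image_array, k=2)  # two quarter turns

print(rotated.tolist())      # [[2, 5], [1, 4], [0, 3]]
print(rotated_cw.tolist())   # [[3, 0], [4, 1], [5, 2]]
print(rotated_180.tolist())  # [[5, 4, 3], [2, 1, 0]]
```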

Accessing and Modifying Pixel Values

You can directly manipulate pixel values using NumPy’s indexing. For example, let’s brighten an image by increasing pixel values:

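# Cast to a wider integer type first; adding 50 directly to a uint8 array would wrap around past 255
brightened_image = np.clip(image_array.astype(np.int16) + 50, 0, 255).astype(np.uint8)  # Add 50 to all pixels, clip to valid range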

This is useful for experimenting with image preprocessing techniques that can enhance your CNN’s ability to learn.

Now let’s get started with the more fun stuff—creating different layers of a CNN on our own.

Walking Through the Layers

Input Layer

The input_layer function loads example.jpg, resizes it to 64×64 pixels, converts it to grayscale (to simplify processing), and normalizes the pixel values to [0, 1]. The output is a 2D array of shape (64, 64). This preprocessing ensures the image is in a format suitable for the CNN.

def input_layer(image_path, target_size=(64, 64)):
    """
    Loads an image, converts to grayscale, resizes, and normalizes it.
    Args:
        image_path (str): Path to the image file.
        target_size (tuple): Desired (width, height) for resizing, the order Pillow's resize expects.
    Returns:
        np.ndarray: Preprocessed image array of shape (height, width).
    """
    # Load the image
    image = Image.open(image_path)
    # Resize to target size
    image = image.resize(target_size)
    # Convert to NumPy array
    image_array = np.array(image)
    # Convert to grayscale
    if len(image_array.shape) == 3:  # If RGB, convert to grayscale
        image_array = np.mean(image_array, axis=2).astype(np.uint8)
    # Normalize to [0, 1]
    image_array = image_array / 255.0
    return image_array

[Image: Input Layer (Grayscale Normalized)]

Convolutional Layer

The convolutional_layer function applies convolution using a 3×3 kernel (a simple averaging filter for this example, which blurs the image slightly). We use padding='same', which adds a 1-pixel border of zeros around the input (since (3-1)/2 = 1) so that, with a stride of 1, the output spatial dimensions match the input (64×64). We use 1 filter, so the output shape is (64, 64, 1). The convolution operation slides the kernel over the image, computing the weighted sum at each position; in a trained CNN the kernel weights are learned, which is how the layer comes to extract features like edges or textures.

def convolutional_layer(input_data, kernel_size=3, num_filters=1, stride=1, padding='same'):
    """
    Applies convolution to the input with specified kernel size, filters, stride, and padding.
    Args:
        input_data (np.ndarray): Input array of shape (height, width).
        kernel_size (int): Size of the kernel (e.g., 3 for 3x3).
        num_filters (int): Number of filters to apply.
        stride (int): Step size for convolution.
        padding (str): 'same' to add padding, 'valid' for no padding.
    Returns:
        np.ndarray: Convolved output of shape (out_height, out_width, num_filters).
    """
    height, width = input_data.shape
    kernel = np.ones((kernel_size, kernel_size)) / (kernel_size * kernel_size)  # Simple averaging kernel

    # Calculate padding
    if padding == 'same':
        pad_size = (kernel_size - 1) // 2
        input_padded = np.pad(input_data, ((pad_size, pad_size), (pad_size, pad_size)), mode='constant', constant_values=0)
    else:
        input_padded = input_data
        pad_size = 0

    # Calculate output dimensions
    out_height = (height + 2 * pad_size - kernel_size) // stride + 1
    out_width = (width + 2 * pad_size - kernel_size) // stride + 1
    output = np.zeros((out_height, out_width, num_filters))

    # Apply convolution for each filter
    for f in range(num_filters):
        for i in range(0, out_height):
            for j in range(0, out_width):
                region = input_padded[i*stride:i*stride+kernel_size, j*stride:j*stride+kernel_size]
                output[i, j, f] = np.sum(region * kernel)

    return output

[Image: After Convolution (1 Filter)]
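
The averaging kernel above only blurs. To see a kernel that responds to edges, here is a small sketch using a Sobel kernel on a synthetic image with a single vertical edge. It is not the function from this blog: conv2d_valid is a hypothetical helper that handles only stride 1 with no padding, and it relies on sliding_window_view, which I am assuming is available (NumPy 1.20 or newer). Like the loop version, it does not flip the kernel, so strictly speaking it computes cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(image, kernel):
    """Stride-1, no-padding cross-correlation (hypothetical helper)."""
    windows = sliding_window_view(image, kernel.shape)  # (out_h, out_w, k_h, k_w)
    # Multiply every window by the kernel and sum over the kernel axes
    return np.einsum('ijkl,kl->ij', windows, kernel)

# 6x6 image: dark left half, bright right half, so one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

averaging = np.ones((3, 3)) / 9.0             # blurs, as in the text
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # responds to vertical edges

blurred = conv2d_valid(image, averaging)
edges = conv2d_valid(image, sobel_x)

print(edges.shape)        # (4, 4)
print(edges[0].tolist())  # [0.0, 4.0, 4.0, 0.0]
```

The Sobel output is zero in the flat regions and non-zero only where the window straddles the edge, while the averaging output simply smooths the step.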

Pooling Layer

The pooling_layer function applies max pooling with a 2×2 window and a stride of 2. This reduces the spatial dimensions by half: from (64, 64, 1) to (32, 32, 1). Max pooling takes the maximum value in each 2×2 region, which helps reduce computational complexity and makes the model more robust to small translations in the image.

def pooling_layer(input_data, pool_size=2, stride=2, pool_type='max'):
    """
    Applies pooling to the input to reduce spatial dimensions.
    Args:
        input_data (np.ndarray): Input array of shape (height, width, channels).
        pool_size (int): Size of the pooling window (e.g., 2 for 2x2).
        stride (int): Step size for pooling.
        pool_type (str): 'max' for max pooling, 'avg' for average pooling.
    Returns:
        np.ndarray: Pooled output of shape (out_height, out_width, channels).
    """
    height, width, channels = input_data.shape
    out_height = (height - pool_size) // stride + 1
    out_width = (width - pool_size) // stride + 1
    output = np.zeros((out_height, out_width, channels))

    for c in range(channels):
        for i in range(out_height):
            for j in range(out_width):
                region = input_data[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size, c]
                if pool_type == 'max':
                    output[i, j, c] = np.max(region)
                elif pool_type == 'avg':
                    output[i, j, c] = np.mean(region)

    return output

[Image: After Max Pooling]
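
For the common case of a 2×2 window with stride 2, the three nested loops can be replaced by a reshape. The sketch below is an alternative to the function above, not a replacement for it: max_pool_2x2 is a hypothetical helper that assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max pooling, 2x2 window, stride 2, via reshape (hypothetical helper).

    Assumes an input of shape (height, width, channels) with even height and width.
    """
    h, w, c = feature_map.shape
    # Split each spatial axis into (number of blocks, position inside block)
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    # Take the maximum over the two "inside block" axes
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(x)

print(pooled.shape)              # (2, 2, 1)
print(pooled[:, :, 0].tolist())  # [[5.0, 7.0], [13.0, 15.0]]
```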

Output Layer

The output_layer function flattens the pooling output (from (32, 32, 1) to a 1D array of size 32*32*1=1024) and applies a fully connected layer. We simulate a binary classification task (2 classes) by using random weights and biases to compute logits, then apply the softmax function to get probabilities. The output is a 1D array of shape (2,), representing the probability of each class.

def output_layer(input_data, num_classes=2):
    """
    Flattens the input and applies a fully connected layer for classification.
    Args:
        input_data (np.ndarray): Input array of shape (height, width, channels).
        num_classes (int): Number of output classes.
    Returns:
        np.ndarray: Output probabilities of shape (num_classes,).
    """
    # Flatten the input
    flattened = input_data.flatten()
    # Simulate a fully connected layer with random weights
    weights = np.random.randn(flattened.size, num_classes) * 0.01  # Small random weights
    biases = np.zeros(num_classes)
    # Compute output (logits)
    logits = np.dot(flattened, weights) + biases
    # Apply softmax to get probabilities (subtract the max logit first for numerical stability)
    exp_logits = np.exp(logits - np.max(logits))
    probabilities = exp_logits / np.sum(exp_logits)
    return probabilities
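
One caveat about softmax: exponentiating large logits can overflow. A common safeguard is to subtract the largest logit before calling np.exp, which leaves the probabilities unchanged. Here it is in isolation as a sketch; softmax is a standalone helper written for this example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D array (hypothetical helper)."""
    shifted = logits - np.max(logits)  # the largest exponent becomes 0
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted)

small = softmax(np.array([0.0, 0.0]))
large = softmax(np.array([1000.0, 1000.0]))  # np.exp(1000.0) on its own overflows to inf
uneven = softmax(np.array([1.0, 2.0, 3.0]))

print(small.tolist())  # [0.5, 0.5]
print(large.tolist())  # [0.5, 0.5]
```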

So, to sum it up and get some image SEO juice, here’s an image capturing the evolution of an image as it passes through all the CNN layers. It’s fascinating how, with each layer, we lose some visual quality but gain deeper insight about the image. It goes to show that sometimes, you must break things down in order to build something greater.

[Image: An Image Going Through the Different Layers of a CNN]

Until we meet again

In the next blog, we will code a whole CNN using just NumPy. Stay tuned, and for updates, follow me on X.

Amarnath Pandey
