
Deep Learning with TensorFlow & Keras

Explore neural network architectures, CNNs, and RNNs. Build and deploy deep learning models for image and text classification.

1

Deep Learning: What It Actually Is (and Isn't)

Deep learning is pattern recognition with neural networks — layered mathematical functions that can learn incredibly complex mappings from input to output. It's the technology behind voice assistants, self-driving cars, and generative AI. But it's not magic, and it's not always the right tool.

Where deep learning genuinely shines: images, text, audio, video — unstructured data where traditional feature engineering is impractical. You can't manually describe every possible way a cat appears in a photo. A CNN learns those patterns from examples.

Where it's often overkill: structured tabular data with a few dozen features. A gradient-boosted tree will frequently match or beat a neural network on your typical business dataset, train in a fraction of the time, and be far easier to interpret. I've watched teams spend months building deep learning pipelines that a well-tuned XGBoost model would have outperformed.

Know when to use each tool. That judgment is more valuable than knowing every TensorFlow API.

2

Setting Up TensorFlow and Keras

TensorFlow is Google's open-source deep learning framework. Keras is its high-level API — the part you'll actually interact with 95% of the time. Since TensorFlow 2.x, Keras is built in.

pip install tensorflow

Quick sanity check:

import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU available: {len(tf.config.list_physical_devices('GPU')) > 0}")

If you don't have a GPU, don't worry. Everything in this tutorial runs on CPU — training will just take longer. For serious projects, use Google Colab (free GPU access) or a cloud instance.

The Keras workflow is the same across every model type: define layers, compile (choose optimizer + loss), fit (train), evaluate, predict. Once you've built one model, the second one takes five minutes instead of fifty.
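
That workflow can be sketched end-to-end on synthetic data (toy shapes, illustrative only — the real thing follows in the next sections):

```python
import numpy as np
from tensorflow import keras

# Toy regression data: y ≈ 3x
X = np.random.rand(256, 1).astype("float32")
y = 3 * X + np.random.normal(0, 0.05, X.shape).astype("float32")

# 1. Define layers
model = keras.Sequential([keras.Input(shape=(1,)), keras.layers.Dense(1)])
# 2. Compile: choose optimizer + loss
model.compile(optimizer="adam", loss="mse")
# 3. Fit (train)
model.fit(X, y, epochs=5, verbose=0)
# 4. Evaluate
loss = model.evaluate(X, y, verbose=0)
# 5. Predict
preds = model.predict(X[:3], verbose=0)
```

Every model in this tutorial follows this exact shape; only the layers change.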

3

How Neural Networks Learn

A neural network is a chain of simple mathematical operations, stacked in layers. Each layer takes numbers in, multiplies them by learnable weights, adds a bias, passes the result through an activation function, and hands the output to the next layer.

The training process:

  1. Forward pass — Data flows through the network, producing a prediction
  2. Loss calculation — A loss function measures how wrong the prediction was
  3. Backward pass (backpropagation) — The error signal propagates backward, computing how each weight contributed to the mistake
  4. Weight update — The optimizer adjusts weights to reduce the error. Learning rate controls step size.

That's one training step. Repeat it thousands of times across millions of examples and the network gradually improves.
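
The four steps map directly onto TensorFlow primitives. Keras runs this loop for you inside `fit`, but a hand-rolled single step (toy model and data, for illustration) makes the mechanics concrete:

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([keras.Input(shape=(1,)), keras.layers.Dense(1)])
loss_fn = keras.losses.MeanSquaredError()
optimizer = keras.optimizers.SGD(learning_rate=0.1)

x = tf.constant([[1.0], [2.0]])
y = tf.constant([[2.0], [4.0]])

with tf.GradientTape() as tape:
    pred = model(x, training=True)   # 1. forward pass
    loss = loss_fn(y, pred)          # 2. loss calculation
# 3. backward pass: gradients of the loss w.r.t. each weight
grads = tape.gradient(loss, model.trainable_variables)
# 4. weight update, scaled by the learning rate
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```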

The activation function deserves special mention. Without it, stacking layers would be pointless — a chain of linear operations is still linear. ReLU (Rectified Linear Unit) is the default choice for hidden layers: it outputs the input if positive, zero otherwise. Simple, fast, and it works. For the output layer, use sigmoid (binary classification), softmax (multi-class), or nothing (regression).
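
Concretely:

```python
import tensorflow as tf

print(tf.nn.relu(tf.constant([-2.0, 0.0, 3.0])).numpy())   # [0. 0. 3.]
print(tf.nn.sigmoid(tf.constant(0.0)).numpy())             # 0.5
```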

4

Your First Neural Network in Keras

Let's classify handwritten digits — the MNIST dataset. It's the "hello world" of deep learning, and for good reason: the data is clean, the task is clear, and you get to see results fast.

import tensorflow as tf
from tensorflow import keras
import numpy as np

# Load and preprocess
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0

# Build the network
model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train
history = model.fit(X_train, y_train, epochs=15, batch_size=64,
                    validation_split=0.15)

# Evaluate
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.4f}")

That should give you around 97–98% accuracy. Not bad for a few lines of code. The Dropout layers randomly shut off 30% of neurons during each training step, which prevents the network from memorising the training data. Think of it as forcing different parts of the network to be independently useful.
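
To turn the softmax output into an actual digit, take the argmax over the ten class probabilities — sketched here with an untrained stand-in model of the same shape, so the snippet is self-contained:

```python
import numpy as np
from tensorflow import keras

# Untrained stand-in for the MNIST model above (same input/output shapes)
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])

sample = np.random.rand(1, 784).astype("float32")
probs = model.predict(sample, verbose=0)    # shape (1, 10); each row sums to 1
digit = int(np.argmax(probs, axis=-1))      # predicted class, 0-9
```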

5

Convolutional Neural Networks for Image Recognition

The network above treats each pixel independently — it has no concept of spatial structure. That works okay for centred, uniform images like MNIST, but falls apart on real photos where objects can appear anywhere in the frame.

CNNs solve this by using filters — small windows that slide across the image, detecting local patterns. Early layers learn edges and textures. Deeper layers learn shapes and object parts. The deepest layers learn entire objects. This hierarchical feature learning is what makes CNNs so powerful for vision tasks.

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(10, activation='softmax')
])

On MNIST, this pushes accuracy to ~99.2%. On real-world image datasets — medical scans, satellite imagery, product photos — CNNs achieve performance that was considered science fiction ten years ago.

The MaxPooling2D layers downsample the feature maps, reducing computation and making the network more robust to small translations. It's a form of built-in abstraction: "I found an edge somewhere in this 2x2 region" is more useful than "I found an edge at pixel (14, 23)."
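
You can see the effect on a tiny example:

```python
import tensorflow as tf
from tensorflow import keras

# One 2x2 single-channel feature map holding the values 1, 3, 2, 4
x = tf.constant([1.0, 3.0, 2.0, 4.0], shape=(1, 2, 2, 1))
pooled = keras.layers.MaxPooling2D((2, 2))(x)
print(float(tf.squeeze(pooled)))  # 4.0 — the maximum of the 2x2 window
```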

6

Recurrent Networks and LSTMs for Sequential Data

Images have spatial structure. Text, time series, and audio have temporal structure — the order matters. "The dog bit the man" means something very different from "The man bit the dog."

RNNs (Recurrent Neural Networks) process sequences one step at a time, maintaining a hidden state that carries information from previous steps. In theory, this lets them capture dependencies across a sequence. In practice, vanilla RNNs struggle with anything longer than about 20 steps because gradients either vanish to zero or explode to infinity during backpropagation.
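
The recurrence itself is small enough to write by hand — each step mixes the current input with the previous hidden state (NumPy sketch, toy sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 4, 8
W_x = rng.normal(scale=0.1, size=(n_features, n_hidden))  # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))    # hidden-to-hidden weights
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """h_t = tanh(x_t @ W_x + h_prev @ W_h + b)"""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_features)):  # a 5-step sequence
    h = rnn_step(x_t, h)                      # the hidden state carries history forward
```

The repeated multiplication by `W_h` is exactly where the vanishing/exploding gradient problem comes from: backpropagating through many steps multiplies gradients by that matrix over and over.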

LSTMs (Long Short-Term Memory) fix this with a gating mechanism — three gates that control what information to keep, discard, and output. The internal "cell state" can carry information across hundreds of steps without degradation.

# Sentiment analysis example
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 128),
    keras.layers.LSTM(64, return_sequences=True),
    keras.layers.LSTM(32),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(1, activation='sigmoid')
])

Worth noting: for most NLP tasks in 2025+, Transformer-based models (BERT, GPT) have largely superseded LSTMs. But LSTMs remain the go-to for time series forecasting, signal processing, and situations where you don't have the compute budget for a Transformer. Know both.

7

Transfer Learning: Standing on Giants' Shoulders

Training a CNN from scratch requires millions of images and days of GPU time. Transfer learning lets you skip most of that by reusing a model that someone else already trained on a massive dataset.

The idea: a model trained on ImageNet (1.2 million images, 1,000 categories) has already learned to detect edges, textures, shapes, and objects. You take those learned features and fine-tune them for your specific task — even if you only have a few hundred images.

base_model = keras.applications.EfficientNetB0(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3)
)
base_model.trainable = False  # Freeze the pre-trained weights

model = keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(num_classes, activation='softmax')
])

Start with the base model frozen (only training your new layers). Once that converges, unfreeze the last 20–30 layers and fine-tune with a very small learning rate (1e-5). This two-stage approach consistently outperforms either training from scratch or keeping everything frozen.
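
Stage two might look like this (a sketch — `weights=None` is used only to keep the example self-contained without the ImageNet download; in practice you keep the pre-trained weights loaded above, and the layer and class counts here are illustrative):

```python
from tensorflow import keras

base_model = keras.applications.EfficientNetB0(
    weights=None, include_top=False, input_shape=(224, 224, 3)
)

# Stage 2: unfreeze only the top of the backbone
base_model.trainable = True
for layer in base_model.layers[:-30]:
    layer.trainable = False

model = keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(5, activation="softmax"),  # hypothetical 5-class head
])

# Recompile with a very small learning rate so fine-tuning
# doesn't destroy the pre-trained features
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

In practice you'd also call the base model with `training=False` so its BatchNormalization layers stay in inference mode during fine-tuning, as in the end-to-end pipeline later in this tutorial.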

Transfer learning changed the economics of deep learning. Tasks that once required a research lab with a GPU cluster can now be done on a laptop with a few hundred labelled examples.

8

Fighting Overfitting: When Your Model Memorises Instead of Learning

Overfitting is the most common failure mode in deep learning. Your training accuracy climbs to 99%. Your validation accuracy stalls at 82%. The network has memorised the training examples instead of learning general patterns.

How to detect it: Plot training loss and validation loss over epochs. If training loss keeps dropping but validation loss starts rising, you're overfitting. The divergence point is where you should have stopped training.

The toolkit for fighting it:

  • Dropout — Randomly disables neurons during training. Forces the network to be redundant. Rates of 0.2–0.5 are typical.
  • Early stopping — Monitor validation loss, stop when it stops improving. The simplest and most effective technique.
  • Data augmentation — Generate variations of your training images (flips, rotations, crops). More diverse training data = better generalisation.
  • L2 regularisation — Penalises large weights, encouraging the model to find simpler solutions.
  • Reduce model capacity — Fewer layers or fewer neurons. If a smaller model performs nearly as well, use the smaller model.

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True
)

model.fit(X_train, y_train, epochs=100, validation_split=0.2,
          callbacks=[early_stop])

Setting patience=5 gives the model five epochs to improve. If it doesn't, training stops and the weights from the best epoch are restored. This alone prevents the majority of overfitting cases I encounter.
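
Of the techniques listed above, L2 regularisation is a one-argument change per layer:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),
    keras.layers.Dense(10, activation="softmax"),
])
```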

9

Data Augmentation for Better Generalisation

Data augmentation is underrated. It's essentially free extra training data — your model sees slightly different versions of each image every epoch, which dramatically improves its ability to handle real-world variation.

augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.15),
    keras.layers.RandomContrast(0.1),
])

Use domain knowledge to choose augmentations. Horizontal flips make sense for photos of animals (a cat facing left or right is still a cat). They don't make sense for text recognition (a mirrored "b" looks like a "d", which silently changes the label). Random rotation of ±10° is safe for most natural images. Going beyond ±30° rarely helps and can create unnatural examples.

Augmentation is applied only during training. Validation and test images are always evaluated unaugmented. I've seen people accidentally augment their test set and report inflated metrics — don't be that person.
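
Keras preprocessing layers enforce this for you: they are only active when called in training mode (as they are inside `fit`) and act as the identity at inference time:

```python
import tensorflow as tf
from tensorflow import keras

aug = keras.layers.RandomFlip("horizontal")
img = tf.reshape(tf.range(4, dtype=tf.float32), (1, 2, 2, 1))

out = aug(img, training=False)          # inference mode: the layer is a pass-through
same = bool(tf.reduce_all(out == img))  # True
```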

On small datasets (under 1,000 images per class), augmentation combined with transfer learning is the winning formula. I've built production classifiers with as few as 200 images per class using this approach.

10

Training Tricks That Actually Help

A few techniques that consistently make a difference in practice:

Learning rate scheduling — Starting with a moderate learning rate and reducing it when progress stalls is almost always better than a fixed rate:

reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.2, patience=3, min_lr=1e-6
)

Batch size — Smaller batches (16–32) add noise that acts as implicit regularisation, often leading to better generalisation. Larger batches (128–512) train faster and are more stable. If you're not sure, 32 is a safe default for most tasks.

Optimiser choice — Adam works out of the box 90% of the time. If you want to squeeze out slightly better results during fine-tuning, try AdamW (Adam with decoupled weight decay). SGD with momentum can sometimes achieve better final performance, but requires careful learning rate tuning.

Mixed precision training — If you have a modern GPU (Volta or newer), enabling mixed precision gives you nearly 2x speedup with no accuracy loss:

tf.keras.mixed_precision.set_global_policy('mixed_float16')
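
One caveat from the TensorFlow mixed-precision guide: keep the output layer in float32 so the softmax and loss are computed at full precision (shapes here are illustrative):

```python
from tensorflow import keras

keras.mixed_precision.set_global_policy("mixed_float16")

model = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(128, activation="relu"),  # computes in float16
    keras.layers.Dense(10, activation="softmax",
                       dtype="float32"),         # output stays float32
])
```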

11

Saving, Loading, and Exporting Models

Training a model and losing it because you forgot to save is a rite of passage. Do it once, and you'll never skip this step again.

# Save the full model
model.save('my_model.keras')

# Load it back
loaded = keras.models.load_model('my_model.keras')

# Checkpoint the best model during training
checkpoint = keras.callbacks.ModelCheckpoint(
    'best_model.keras',
    monitor='val_accuracy',
    save_best_only=True,
    mode='max'
)

model.fit(X_train, y_train, epochs=50,
          callbacks=[checkpoint, early_stop, reduce_lr])

For deployment, you have options depending on the target environment:

  • TensorFlow Serving — Production API server for high-throughput inference
  • TensorFlow Lite — Compressed models for mobile and edge devices
  • TensorFlow.js — Run models directly in the browser
  • ONNX — Universal format if you need to run in non-TensorFlow environments
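
As one example, the TensorFlow Lite path is a few lines (sketched with a stand-in model; the output filename is arbitrary):

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])  # stand-in

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantisation
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```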

12

Putting It Together: End-to-End Image Classification

Here's a complete pipeline that mirrors how I'd structure a real project — data loading, augmentation, transfer learning, training with callbacks, evaluation:

import tensorflow as tf
from tensorflow import keras

# Data
train_ds = keras.utils.image_dataset_from_directory(
    'data/train', image_size=(224, 224), batch_size=32
)
val_ds = keras.utils.image_dataset_from_directory(
    'data/val', image_size=(224, 224), batch_size=32
)

# Augmentation
augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])

# Pre-trained backbone
base = keras.applications.EfficientNetB0(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False

# Full model
inputs = keras.Input(shape=(224, 224, 3))
x = augmentation(inputs)
x = keras.applications.efficientnet.preprocess_input(x)
x = base(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.3)(x)
outputs = keras.layers.Dense(num_classes, activation='softmax')(x)
model = keras.Model(inputs, outputs)

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train with callbacks
callbacks = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3),
    keras.callbacks.ModelCheckpoint('best.keras', save_best_only=True)
]

model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=callbacks)

This pattern — augmentation + frozen backbone + custom head + callbacks — is the standard approach for image classification in production. I've deployed variations of this exact architecture for medical imaging, retail product recognition, and document classification.

13

NLP with Deep Learning: Text Classification

Processing text with neural networks requires converting words into numbers. The standard pipeline:

  1. Tokenisation — Split text into words or subwords
  2. Vectorisation — Map each token to an integer ID
  3. Padding — Make all sequences the same length (pad short ones, truncate long ones)
  4. Embedding — Convert integer IDs into dense vectors where similar words are near each other

from tensorflow.keras.layers import TextVectorization

vectoriser = TextVectorization(max_tokens=20000, output_sequence_length=256)
vectoriser.adapt(train_texts)

model = keras.Sequential([
    vectoriser,
    keras.layers.Embedding(20000, 128),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(1, activation='sigmoid')
])

This works well for sentiment analysis, spam detection, and simple text classification. For anything more complex — question answering, summarisation, translation — you'll want Transformer-based models. The Hugging Face transformers library gives you access to BERT, RoBERTa, and hundreds of other pre-trained models with a Keras-compatible API.

14

Your Deep Learning Roadmap

You now have the foundational skills to build, train, and deploy deep learning models. Here's where to go next, roughly in order of priority:

  • Work through a Kaggle image classification competition — The feedback loop of seeing your score on a public leaderboard is incredibly motivating
  • Learn about Transformers and attention mechanisms — They've fundamentally changed both NLP and computer vision
  • Explore object detection with YOLO or Faster R-CNN — going from "there's a cat in this image" to "there's a cat at coordinates (120, 340, 280, 510)"
  • Try generative models — GANs, VAEs, and diffusion models are the basis of all image generation tools
  • Deploy a model end-to-end — Serve predictions via FastAPI, add monitoring, handle edge cases. This is where most learning happens.

The field moves fast. Papers from six months ago can be outdated. But the fundamentals — how backpropagation works, why CNNs detect spatial features, when to use regularisation — those are permanent knowledge. Invest in understanding the "why" behind each technique, not just the "how."
