Ian H. Ryan - Projects

GameBattles Forums
\ Xbox 360
\ Halo 3
\
ians projects

View First Unread

Thread Tools

Display Modes

4 Feb 2010, 10:39 PM EST

#1

ian ryan

forum dweller

join date: Apr 2007

Posts: 9,000

Watermark Removal Deep Learning CNN (Self-Supervised) via PyTorch

https://github.com/ianhenryryan/watermarkcnn

Machine Learning Engineer (aspiring): Ian H. Ryan
Version: 0.1
Timeline: May 22, 2025 - June 24th, 2025

Upon making a variety of image scrapers for the world wide web, I noticed that a portion of the images I found contained watermarks. This sprung the idea of creating a deep learning model with the use case of removing watermarks from images. I found a research paper on ArXiv called A self-supervised CNN for image watermark removal . I used it as a reference for this project. This is a Image Restoration (Regression) Deep Learning Model.

Environment

Make & Model: Alienware m15 R7
GPU: NVIDIA GeForce RTX 3060 Mobile (6 GB VRAM)
Secondary GPU: Integrated AMD Radeon Graphics
CPU: AMD Ryzen 7 6800H (16 threads @ 4.78 GHz)
RAM: 16 GB DDR5
CUDA Version: CUDA Version: 12.8
Operating System: Pop!_OS 22.04 LTS
Kernel: 6.12.10-76061203-generic

Notebook Index

Transform .............................. 17
Pathing ............................... 18
Data Loader .............................. 19
Criterion, Optimizer, & Scaler for AMP ......... 20
Training CNN Model ........................................ 21
Save Summary for Regression Task Report JSON .............. 22
Log Training History ............................. 23
Save Model Weights or Save Model of Training .................................... 31
Test Accuracy of CNN Training ............................... 24
Visualizations .................................. 25

Metrics .......................... 26

Summary - Everything .......................... 27
Model Performance Function .................... 28
Training & Validation Loss Plots .............. 29
Confusion Matrices ............................ 30
Feature Maps ................................. 31
Kernel Visualizations ........................ 32
Gradient Visualizations ...................... 33
CAM / Grad-CAM ............................... 34
Training/Validation Curves ................... 35
Explainability Tools ......................... 36

Visuals .......................... 36

Residual Histogram .................... 37
Local PSNR/SSIM Maps .............. 38
Side by side Comparison ............................ 39
Watermark Residual ................................. 40
Training Progress ........................ 41
Watermark Attention ...................... 42
Batch Processing ............................... 43
Learning Rate Visual ................... 44
Train Loss, Peak Signal-to-Noise Ratio, Structural Similarity Index ......................... 45
Residual Error ................... 46
Multi-Layer Activation ................... 47
Feature Activation Maps (Decoder Focus) ................... 48
Kernel/Visualizations ................... 49
Gradient Visualization ................... 50

Literature Cited ................................ 51
Environment .................................... 52
Recommended Resources .......................... 53
Permission ............................ 54

Dataset Description

Utilized self-supervised pairing & watermark synthesis. Thus not being reliant on finding a myriad of paired images with watermark and without watermark.

This project is trained from data that I scraped off of the internet from various sources while making image scrapers. With that being said, I will not be providing downloads to the data.

1000 Images, 996 jpg, 4 png file types.

Aim to create or find a dataset that is diverse in textures, colors, brightness, edges, and backgrounds.

If you need datasets consider checking out: Kaggle | Roboflow | COCO

Heterogeneous U-Net CNN Architecture

Encoder-Decoder Backbone - multi-resolution feature extraction.
DoubleConv Blocks - ReLU and LeakyReLU to capture diverse activations.
Attention Gates - at each skip connection for feature relevance gating.
Learnable Upsampling (Transpose Convs) avoid interpolation artifcats from bilinear.
Self-Attention Bottleneck - global spatial context.
Perceptual Feature Extractor (VGG16) - texture-aware loss.

Model Diagram

Loss Function

There are multiple components to the loss function.

1) Masked L1 Loss

Penalizes per pixel differences between predicted & ground truth images, optionally focusing on the masked region (where the watermark is located).

Formula:
L1_masked = (1/N) ∑_(i=1)^(N) M_i * ∣P_i − T_i∣

Where:
- P_i: Predicted Pixel
- T_i: Target Pixel
- M_i: Binary Mask (1=focus, 0=ignore)
- N: Sum of Mask Values (non-zero pixels)

2) LAB Color Loss

Transforms images to LAB color space & weights L-channel more (70%) to prioritize luminance, which human eyes are more sensitive to.

Formula:
LAB_loss = (0.7) * L1(L_p,L_t) + (0.3) * L1(𝐴𝐵_𝑝, 𝐴𝐵_𝑡)

Where:
- L_p, L_t: Luminance channels (predicted, target) - 𝐴𝐵_𝑝, 𝐴𝐵_𝑡: Chrominance channels

3) Perceptual Loss (VGG16)

Uses pre-trained VGG16 to extract deep features from intermediate layers and compares their activations.

Formula:
Perceptual_loss = L1(ϕ(P),ϕ(T))

Where:
- ϕ: Feature extractor from pretrained VGG16
- P,T: Predicted and target images

4) SSIM (Structural Similarity Index)

Captures structural similarity of images over patches.

Formula:
SSIM_loss = (1 - SSIM(P,T))

SSIM compares means, variances, and covariances of image patches. It’s luminance-driven and more perceptually aligned than L1.

5) Laplacian Edge Loss

Applies a Laplacian kernel to both predicted and target images to extract edges, then uses L1 on the resulting edge maps.

Formula:
Edge_loss = L1((∇^2)P,(∇^2)T)

Where:
- ∇^2 is the Laplacian operator.

Total Loss

Formula:
Total Loss = L1 + α * Perceptual + β * LAB + γ * SSIM + δ * Laplacian

Weights:
- α=0.3
- β=0.3
- γ=0.4
- δ=0.1

Loss Diagram

Total Parameters

Total Parameters: 5383366
Trainable Parameters: 3647878

The reason only 3647878 are trainable is that the other 1735488 are being used in VGG.

Data Generation (Self-Supervised)

This section contains: Watermark Generation, Watermark Application, Generate Pair, Data Augmentation, & Watermark Dataset Class.
The model learns from synthetic watermark pairs that are generated from the myriad of 1000 clean images I used in the dataset.

Watermark Generation

Creates a diverse variety of synthetic watermarks.

Watermark text contains 25 multi-langual phrases or watermarks. For example, ENG, SPAN, FR, GER, CH, JAP, KOR, RUSS, ARA, HIN.
The watermarks vary in size, rotations, opacity, & positions. As well as partial occlusion.
A slight amount of watermarks generated were lines, rectangles, & circles as opposed to text for more diversity.
Distortions: post processing with image filters to replicate low quality scanning or camera captures.

Semi-Random Alpha Blending (Applying Watermark)

The watermark is integrated into the clean image via spatially varying alpha mask:

Random opacity levels across image.
Occasional brightness & contrast adjustments.
Binary supervision mask generated simultaneously to mark affected regions.

Generate Pair (Pairwise Training)

Each training sample is a triplet:

wm_img1: The input with a synthetic watermark
Twm_img2: A second watermarked version (reference)
mask: The region where the watermark is present (used for masked L1 and Laplacian loss)

This mirrors the core idea in the research paper referenced, Simulates watermark removal in the absence of ground truth. It simulates the learning signal by generating different but related views of the same clean image.

Data Augmentation Pipeline (Albumentations)

Goal of enhancing robustness & domain generalization:

Geometric: crop, rotate, flip
Photometric: jitter, grayscale, gamma
Noise/Artifacts: blur, compression, rain, shadows, pixel dropout

Generate Pair (Pairwise Training)

Each training sample is a triplet:

wm_img1: The input with a synthetic watermark
Twm_img2: A second watermarked version (reference)
mask: The region where the watermark is present (used for masked L1 and Laplacian loss)

Hyperparameters

batch_size = 8

epochs = 25

learning_rate = 0.001

weight_decay = 1e-5

Transform

create transform variable and set image size to 256 pixels.
transform = get_augmentations(image_size=256)

Pathing

data/clean_images/.jpg & data/clean_images/.png
convert clean image data to RGB

split %:
train: 70%
validate: 20%
test: 10%

Train: 699, Val: 200, Test: 100

Loss, Optimizer, GradScaler for AMP & Learning Rate Scheduler

criterion = ColorAwareLoss(alpha=0.3, beta=0.3, gamma=0.4, delta=0.1).to(device)

Perceptual weight (alpha): = 0.3
Color weight (beta): = 0.3
SSIM weight (gamma): = 0.4
Laplacian weight (delta): = 0.1

AdamW Optimizer Algorithm

optimizer = torch.optim.AdamW(net.parameters(), lr=learning_rate, weight_decay=weight_decay)

Model is trained on AdamW optimization algorithm. It decouples weight decay from gradient updates which allows for better generalization & stability in training.
AdamW applies weight decay directly to the parameters, not the gradients like Adam which in turn improves generalization.

GradScaler for AMP

scaler = torch.cuda.amp.GradScaler()

Keeping GPU memory & training speed in mind, I utilized Automatic Mixed Precision (AMP) through the GradScaler. It dynamically scales gradients preventing underflow during the mixed precision training.

OneCycleLR Learning Rate Scheduler

scheduler = torch.optim.lr_scheduler.OneCycleLR(
optimizer,
max_lr=0.001,
steps_per_epoch=len(train_loader),
epochs=epochs + 2,
pct_start=0.3,
div_factor=10,
final_div_factor=100
)

LR Scheduler Diagram

Using OneCycleLR Learning Rate Scheduler. It is a dynamic scheduelr that:

warms up learning rate early in training.
Anneal to lower value nearing end of training.
Improves convergence & generalization.

Training

Forward Propogation, Loss Calculation, Backward Propagation, Weight Update.

Warmup Loop

Making adjustments to the hyperparameters, architecture, loss functions, etc can cause early epochs to act up adjusting to the changes made. Thus implementing a warmup loop.

A 2-epoch warmup with a reduced learning rate (1e-4) using a separate AdamW optimizer to stabilize initial convergence.

Training Loop

training runs for 25 epochs (27 including the 2 warmup) using the main AdamW optimizer & OneCycleLR learning rate scheduler.

Manual learning rate decay is applied at epoch 15 (10x drop) to encourage finer convergence.
Epochs compute: Train Loss, Train PSNR, Train SSIM, Val Loss, VaL PSNR, Val SSIM, LR
Best-performing model (lowest val loss) is saved to wm_best_model.pth
Training halts early if no improvement is seen for early_stop_patience epochs.

Results:

Warmup Epoch 1 | Loss: 6.72903
Warmup Epoch 2 | Loss: 3.8560
Epoch 1: Train Loss=4.7524, Train PSNR=14.79, Train SSIM=0.3273 , Val Loss=4.1361, Val PSNR=15.20, Val SSIM=0.3542, LR=0.000134
Epoch 2: Train Loss=3.8627, Train PSNR=16.61, Train SSIM=0.3853 , Val Loss=2.8606, Val PSNR=18.33, Val SSIM=0.4380, LR=0.000229
Epoch 3: Train Loss=2.6319, Train PSNR=19.94, Train SSIM=0.4862 , Val Loss=1.6640, Val PSNR=22.47, Val SSIM=0.5786, LR=0.000372
Epoch 4: Train Loss=1.5781, Train PSNR=23.88, Train SSIM=0.6011 , Val Loss=1.2799, Val PSNR=24.28, Val SSIM=0.6117, LR=0.000542
Epoch 5: Train Loss=1.3406, Train PSNR=25.09, Train SSIM=0.6366 , Val Loss=1.1182, Val PSNR=25.67, Val SSIM=0.6214, LR=0.000713
Epoch 6: Train Loss=1.3366, Train PSNR=25.31, Train SSIM=0.6567 , Val Loss=0.9333, Val PSNR=26.98, Val SSIM=0.6998, LR=0.000860
Epoch 7: Train Loss=1.3522, Train PSNR=25.02, Train SSIM=0.6596 , Val Loss=1.0313, Val PSNR=26.46, Val SSIM=0.6463, LR=0.000960
Epoch 8: Train Loss=1.2893, Train PSNR=25.61, Train SSIM=0.6700 , Val Loss=1.0174, Val PSNR=26.29, Val SSIM=0.6912, LR=0.001000
Epoch 9: Train Loss=1.2259, Train PSNR=25.95, Train SSIM=0.6788 , Val Loss=0.9617, Val PSNR=26.69, Val SSIM=0.6800, LR=0.000994
Epoch 10: Train Loss=1.2163, Train PSNR=26.00, Train SSIM=0.6792 , Val Loss=0.9782, Val PSNR=26.75, Val SSIM=0.6751, LR=0.000975
Epoch 11: Train Loss=1.2225, Train PSNR=26.00, Train SSIM=0.6952 , Val Loss=0.8795, Val PSNR=27.65, Val SSIM=0.7164, LR=0.000943
Epoch 12: Train Loss=1.2029, Train PSNR=26.03, Train SSIM=0.6865 , Val Loss=0.9174, Val PSNR=27.46, Val SSIM=0.6902, LR=0.000898
Epoch 13: Train Loss=1.1444, Train PSNR=26.32, Train SSIM=0.6899 , Val Loss=0.8914, Val PSNR=27.29, Val SSIM=0.6994, LR=0.000843
Epoch 14: Train Loss=1.2011, Train PSNR=26.00, Train SSIM=0.6932 , Val Loss=0.8762, Val PSNR=27.71, Val SSIM=0.7081, LR=0.000778
Epoch 15: Train Loss=1.1981, Train PSNR=26.01, Train SSIM=0.6880 , Val Loss=0.8976, Val PSNR=27.41, Val SSIM=0.7213, LR=0.000705
Lowering learning rate by 10x for fine-tuning phase.
Epoch 16: Train Loss=1.1729, Train PSNR=26.21, Train SSIM=0.6848 , Val Loss=0.9213, Val PSNR=27.10, Val SSIM=0.6916, LR=0.000627
Epoch 17: Train Loss=1.1742, Train PSNR=26.08, Train SSIM=0.6929 , Val Loss=0.9165, Val PSNR=26.96, Val SSIM=0.7061, LR=0.000545
Epoch 18: Train Loss=1.1629, Train PSNR=26.25, Train SSIM=0.6963 , Val Loss=0.8613, Val PSNR=27.75, Val SSIM=0.7229, LR=0.000462
Epoch 19: Train Loss=1.1667, Train PSNR=26.12, Train SSIM=0.6958 , Val Loss=0.8495, Val PSNR=27.56, Val SSIM=0.7105, LR=0.000380
Epoch 20: Train Loss=1.1428, Train PSNR=26.28, Train SSIM=0.6810 , Val Loss=0.8530, Val PSNR=27.83, Val SSIM=0.7173, LR=0.000302
Epoch 21: Train Loss=1.1608, Train PSNR=26.27, Train SSIM=0.6959 , Val Loss=0.8109, Val PSNR=27.90, Val SSIM=0.7294, LR=0.000229
Epoch 22: Train Loss=1.1382, Train PSNR=26.19, Train SSIM=0.6985 , Val Loss=0.8479, Val PSNR=27.54, Val SSIM=0.7122, LR=0.000163
Epoch 23: Train Loss=1.0897, Train PSNR=26.64, Train SSIM=0.7037 , Val Loss=0.8687, Val PSNR=27.52, Val SSIM=0.7099, LR=0.000107
Epoch 24: Train Loss=1.1151, Train PSNR=26.60, Train SSIM=0.7008 , Val Loss=0.8570, Val PSNR=27.33, Val SSIM=0.7125, LR=0.000061
Epoch 25: Train Loss=1.1098, Train PSNR=26.65, Train SSIM=0.6970 , Val Loss=0.8431, Val PSNR=27.54, Val SSIM=0.7151, LR=0.000028

Training Time Elapsed: 228.00 Minutes

Summary:

The warmup phase helped stabilize initial training as seen with the loss going from 6.72903 -> 3.8560 from first to second warmup epoch.
Train PSNR started at ~14.8 dB and reached ~26.6 dB
Val PSNR steadily improved to ~27.5–28 dB, a solid metric for image restoration tasks.
Val SSIM reached ~0.72, indicating perceptual quality and structure recovery are good.

Metrics & Visuals

All of the outputs that I used to make adjustments to the model as needed after each training run.

Metrics

Evaluation

Test Loss: 0.7706
Test PSNR: 28.67 dB
Test SSIM: 0.7202

Visuals

Train & Val Loss, PSNR Progress

Plots show loss & PSNR over time for both training and validation sets. These curves verify the convergence behavior of the network, confirm generalization to unseen validation data, and track performance metrics epoch-by-epoch.

Training Loss, Peak Signal to Noise Ratio, & Structural Similarity Plots

Three synchronized subplots provide a compact summary of the network’s learning trajectory:

Loss Curve: steady convergence from high initial error.
PSNR Curve: clear climb toward 26–27 dB, a common benchmark in high-fidelity restoration.
SSIM Curve: shows perceptual similarity nearing ~0.70, indicating structural retention of clean images.

Learning Rate Scheduler

OneCycleLR plot shows the cyclical learning rate schedule used during training. The LR increases for the first 30% of epochs (warm-up), then decays gradually helping the model converge quickly while avoiding sharp minima.

Local PSNR & Local SSIM Maps

These heatmaps depict spatially localized Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) metrics across image patches. Brighter regions indicate higher fidelity restoration, offering insight into performance variability across different spatial contexts.

Residual Histogram

histogram displays the distribution of absolute pixel-wise residuals between the predicted image and the original watermark-free image. A sharp peak near zero indicates strong restoration fidelity, where most pixels are accurately reconstructed with minimal error.

Side by Side Comparison

Visual triplets are shown in three columns: the clean original, the watermarked version, and the predicted output. These qualitative results help highlight how well the model removes diverse watermark styles, maintaining visual structure and content.

Watermark Residual

A three-panel plot visualizes the residual heatmap, a binary thresholded mask, and the original vs predicted absolute difference. This highlights exactly where watermark artifacts remain and how well they’re suppressed in the output.

Watermark Attention

Decoder-level attention maps (heatmaps) illustrate how different layers in the network respond to watermark perturbations. This diagnostic tool verifies that deeper layers are focusing on regions corrupted by watermarks—demonstrating task-specific feature learning.

Batch Processing

This view shows the entire training batch as input, reference, and predicted samples across multiple augmentations. It’s useful to understand the model’s robustness to various synthetic watermark styles generated on-the-fly.

Residual Error

Highlights pixel-level discrepancies between the original and predicted outputs using three subplots:

Residual Heatmap: This shows the absolute error (|original - predicted|) in pixel intensities. Brighter areas signify larger prediction errors, often correlated with watermark regions.
Thresholded Mask (> 0.1): Highlights only those pixels where the residual exceeds a certain perceptual threshold (0.1). This isolates major prediction faults or watermark remnants.
Overlay of Original vs Predicted: Gives a side-by-side visual of the predicted error distribution relative to the original image, making it easy to spot spatial trends in reconstruction quality.

Helps pinpoint where the model struggles, and guides improvements to the network or loss function.

Multi-Layer Activation

Visualizes encoder and decoder layer outputs (e.g., enc1, enc2, dec1, dec2, dec3). Which reveals internal representation learning and feature abstraction across layers. It is useful for understanding how watermark-related features propagate through the network.

Feature Activation Maps (Decoder Focus)

Feature maps from deeper decoder layers of the U-Net architecture:

The right-side plot shows the activation map from dec3 (final decoder layer before output), scaled from 0 to 1.
It reveals where the model is focusing its attention during reconstruction, particularly in recovering watermark-affected areas.

The decoder's high response in watermark-heavy zones suggests successful spatial attention learning. This also acts as a diagnostic tool to ensure the decoder is not ignoring important visual features.

Referenced Paper

@article{liu2024heterogeneous,
  title={Heterogeneous U-Net for Image Restoration},
  author={Liu, Jiang and Zhang, Yulun and Li, Wangmeng and others},
  journal={arXiv preprint arXiv:2403.05807},
  year={2024},
  url={https://arxiv.org/abs/2403.05807}
}

Grá Mór

4 Feb 2010, 10:39 PM EST

#1

ian ryan

forum dweller

join date: Apr 2007

Posts: 9,000

Capstone - Classification Convolutional Neural Network by Ian Ryan

https://github.com/ianhenryryan/capstone

Welcome to My Fall 2024 Capstone Project advised by Dr. Soltys.

The objective of this semester-long project was to create a Convolutional Neural Network (CNN) using PyTorch by scratch, rather than using pre-trained models like ResNet-18. After developing an acceptable classification model, the next goal was to adapt it into an object detection model by implementing YOLOv8.

Although my primary interest was detecting humans, I chose to create a multiclass model to avoid building a simple perceptron. To up the ante, I also made the dataset that the model was trained on an imbalanced dataset. Developing my own model presented a challenging learning opportunity and helped strengthen my understanding of CNN fundamentals for computer vision, a field in which I aspire to build a career.

This repository contains the majority of my work, including various iterations of fine-tuning, visualizations, the CNN architecture schematic, datasets used, comprehensive Jupyter Notebooks, my capstone poster, final capstone pitch, literature citations, and more. The Jupyter Notebooks feature detailed indexes for easy navigation through different sections of the programs.

Convolutional Neural Network Architecture

Notebook Index

Class Confirmation in Training Data .............. 29
Profile Memory Usage ............................. 30
Training Prep .................................... 31
Training CNN Model ............................... 32
Test Accuracy of Convolutional Neural Network ... 33
Visualizations .................................. 34

Summary - Everything .......................... 35
Model Performance Function .................... 36
Training & Validation Loss Plots .............. 37
Confusion Matrices ............................ 38
Feature Maps ................................. 39
Kernel Visualizations ........................ 40
Gradient Visualizations ...................... 41
CAM / Grad-CAM ............................... 42
Training/Validation Curves ................... 43
Explainability Tools ......................... 44

Acknowledgements ................................ 45
Literature Cited ................................ 46
Environment .................................... 47
Recommended Resources .......................... 48
Creator Information ............................ 49

Acknowledgements

I would like to express my gratitude to Dr. Michael Soltys and Dr. William Barber for sparking my interest in Machine Learning and Artificial Intelligence. Over the past two years at California State University Channel Islands (CSUCI), it has been a pleasure taking multiple courses with both professors, each bringing their own perspectives and experiences. They both were able to articulate complex information in a digestible way effortlessly.

Dr. William Barber
Dr. Barber, with his background in Physics and a distinguished career in Medical Imaging and Research, provided a scientific and mathematical foundation that enhanced my understanding of computational models, data analysis, and data extraction. His extensive industry experience includes serving as a Director of Medical Imaging. His engaging teaching style and passion for image processing techniques and pattern recognition concepts inspired me to explore this field further.

Dr. Michael Soltys
Dr. Soltys, with his comprehensive background in Algorithms, Machine Learning, and Cloud Computing, fostered my interest in pursuing a Computer Vision Capstone Project. As an accomplished author of two books and more than 60 published research papers, and as a Principal Scientist and Software Engineer, Dr. Soltys brings a wealth of in-depth knowledge and real-world applications to his teaching. His courses and recommendations proved to be instrumental during the development of my Convolutional Neural Network (CNN) Capstone Project.

Bayne H. Ryan
Bayne, my dear nephew. You arrived the first week of December, right in the middle of preparations for my Capstone Showcase. I truly admire your impeccable sense of urgency to stop being a submarine and surface just in time to appreciate my Convolutional Neural Network (CNN) Capstone project. Your surprise guest appearance was undoubtedly the highlight of my year. I cannot wait to forcibly teach you calculus when you reach the prestigious age of juice boxes and nap time (also known as four years old).

Literature Cited

Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv preprint arXiv:1901.08688.

Brownlee, J. How to Develop Convolutional Neural Network Models for Time Series Forecasting. Machine Learning Mastery. Available at: https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/.

Dougherty, G. Pattern Recognition and Classification: An Introduction. Springer, 2012.

Eliot, D. Deep Learning with PyTorch Step-by-Step: A Beginner's Guide, Volume I: Fundamentals. Self-published, 2020.

Jocher, G., Chaurasia, A., & Qiu, J. (2023). Ultralytics YOLOv8. Available at: https://github.com/ultralytics/ultralytics.

Environment

AWS SageMaker Jupyter Notebook

Notebook Instance Type: ml.g4dn.2xlarge

Processor: NVIDIA T4 Tensor Core GPU with 16 GB GDDR6 VRAM

Memory: 32 GB System RAM

Storage: 5 GB EBS Volume

Operating System: Amazon Linux 2 with Jupyter Lab 3 (notebook-al2-v2)

Lifecycle Configuration: None

Elastic Inference: Not Enabled

Minimum IMDS Version: 2

Recommended Resources

Courses for Computer Science Students at California State University Channel Islands (CSUCI) Interested in Machine Learning & AI

Channel Island Specific Courses:

COMP 345 - Digital Image Processing (Fall Course)
COMP 354 - Analysis of Algorithms
COMP 445 - Image Analysis and Pattern Recognition (Spring Course)
COMP 454 - Automata, Languages & Computation
COMP 469 - Intro to Artificial Intelligence / Machine Learning (Fall Course)
COMP 491 - Capstone Prep (Specifically with Dr. Soltys)
COMP 499 - Capstone (Specifically with Dr. Soltys)

LinkedIn Learning Courses (if offered):
If a College/University student, it is worth seeing what resources your institution provides at your disposal. For example, CSUCI offers LinkedIn Learning and access to the O'Reilly website.

Deep Learning and Generative AI: Data Prep, Analysis, and Visualization with Python - Leverage Generative AI for Analytics and Insights by Gwendolyn Stripling
Deep Learning: Image Recognition - Learning Image Recognition by Isil Berkun
Advanced AI: Transformers for Computer Vision by Jonathan Fernandes
Building Computer Vision Applications with Python by Eduardo Corpeño
Applied Machine Learning: Algorithms by Matt Harrison
Building Deep Learning Applications with Keras by Isil Berkun

Helpful Resources:

Think Python: How to Think Like a Computer Scientist (2nd Edition, Version 2.4.0) by Allen Downey
Deep Learning with PyTorch Step-by-Step: A Beginner's Guide - Volume I: Fundamentals by Daniel Voigt Godoy
Pattern Recognition and Classification: An Introduction (Springer, 2012) by Geoff Dougherty
An Introduction to the Analysis of Algorithms: (3rd Edition, 2018) by Michael Soltys
Ultralytics YOLOv8 Documentation
Object Detection and Localization with YOLOv3 by B. Rupadevi and J. Pallavi
Digital Image Processing for Medical Applications by Geoff Dougherty

Creator Information

GitHub: https://github.com/ianhenryryan
LinkedIn: https://linkedin.com/in/ianhenryryan/
Websites:
- http://ianhryan.com/
Kaggle: https://kaggle.com/ianryan
Hugging Face: https://huggingface.co/Ianryan

Permission

Students, educators, and anyone else keen on learning about Convolutional Neural Networks (CNNs), Computer Vision, Supervised Learning Algorithms, Machine Learning, Deep Learning, AI, or related fields are welcome to use this notebook in any capacity as a learning resource.

This notebook is part of my Fall 2024 Capstone Project for my Bachelor’s degree in Computer Science at California State University Channel Islands. I structured the content to be approachable and digestible, reflecting my own learning journey in AI and Machine Learning.

I hope this notebook can be of use to those exploring similar topics. This specific Jupyter Notebook focuses on Image Classification and demonstrates combining two datasets to create a class imbalance for training purposes. A separate notebook dedicated to Object Detection will be available soon (if not already).

Important Note on Datasets: The datasets used in this project are not my property. Credit is given to the original dataset creators, and their links are provided within the notebook in the dataset section at the beginning.

I understand that grasping the fundamentals of CNNs and related AI concepts can be overwhelming at first. My goal is to make these topics more accessible through these notebooks.

Permissions:
You are free to download, use, edit, and reference the notebooks, Python code, and Markdown content. I aim for accuracy in the explanations provided, though I acknowledge that scientific understanding is always evolving. I welcome constructive feedback and corrections.

This project is intended as a learning resource.

Ian Ryan

Grá Mór

4 Feb 2010, 11:24 PM EST	#2
ian ryan forum dweller join date: Apr 2007 posts: 9,000	Deconvolution (De-blur) & Enhancement by Ian Ryan A deep dive into digital image processing using ImageJ to correct blurry images through deconvolution and enhance poorly illuminated images. Techniques used include Wiener filtering, custom PSFs, and enhancement via brightness/contrast normalization, often layered with macros to test results efficiently. Table of Contents Introduction Methods Deconvolution Results blurred.tif (License Plate) brainGB2.tif (Skull) carnivalride.png (Rotational Blur) vintagephotographer.tif Enhancement Results bad_background.jpg dark_foreground.jpg Conclusion Resources & Environment Downloadable Macros Introduction The project focused on sharpening blurry images and enhancing dark ones using various digital image processing techniques. This was done in ImageJ, with careful experimentation on gamma values, deconvolution filters, and visual comparison of results. Deconvolution (De-Blur) initial images Enhancement initial images Methods Software ImageJ (Fiji distribution) Filters Wiener Filter (Deconvolution) Enhance Contrast Sharpen Despeckle Brightness/Contrast Shadow Enhancement Point Spread Functions (PSF) pillbox.tif GaussianBlur.tif (with custom modifications) Handcrafted disc-shaped PSF (for carnivalride.png) Custom Macros One macro per image with step-by-step automation Used for documentation and reproducibility Hardware Alienware M15 Nvidia 3060 Windows 11 with ImageJ Deconvolution Results blurred.tif (License Plate) Used: Wiener Filter (gamma = 0.05), pillbox PSF Enhancements: Sharpen → Shadows (SW) → Invert Goal: Make the plate readable Before / After: brainGB2.tif (Skull) Custom PSF based on modified Gaussian blur Steps: Wiener Filter (0.005) → Sharpen → Enhance Contrast Also explored a non-convolution deblur trick using subtraction Before / After: carnivalride.png (Rotational Blur) Most difficult case (rotational motion blur) Hand-drawn disc-shaped PSF used Multiple iterations of deconvolution and blending Before / After: vintagephotographer.tif Modified Gaussian PSF Complex multi-step process involving mean filter, math gamma, image subtraction, despeckle, etc. Before / After: Custom Made Point Spread Functions for Project. Enhancement Results bad_background.jpg Brightness adjusted -> Enhance contrast Result: Partial face revealed Before / After: dark_foreground.jpg MATH gamma failed, fallback to brightness + histogram equalization Result: Visible facial details without losing the background Before / After: Conclusion The project was a rewarding challenge, I despised dealing with rotational blur in the carnival ride. Experimenting with the Wiener filter, custom PSFs, and enhancement orders provided valuable insight into digital image restoration. Environment Software ImageJ (Fiji distribution) System Alienware m15 Nvidia GeForce RTX 3060 Windows 11 File Types .tif .jpg .png Macros Capable of sending macro zip upon request if interested. Grá Mór

5 Feb 2010, 12:15 AM EST

#3

ian ryan

forum dweller

join date: Apr 2007

Posts: 9,000

Why.So.Saucey is a glorious beacon of light

Seed Pattern Recognition by Ian Ryan

This project explores seed classification using pattern recognition techniques. I used self-captured seed images, processed them in ImageJ to extract measurable features, and then trained a K-Means Clustering model in JMP. The goal was to accurately differentiate between 5 seed types based on shape-based features.

Seed Classes

This project contains 5 classes: Weed, Carrot, Pumpkin, Pea, Spinach
Training Data: 5 classes - 4 images per class
Testing Data: 5 classes - 3 images per class Seed Class Example

Image Capturing

This project focuses on classifying a variety of seeds. Conveniently, my parents grow vegetables and my dad reliving his glory days from the 1970s, grows weed. This left a plethora of seeds to choose from to capture images of and then extract features from. The chosen classes of seed types utilized are weed, carrot, pumpkin, pea, and spinach.
Seed Class Example

first raw training image of each class.

For capturing the images I placed a piece of 8.5x11 white paper into an In-N-Out tray. After set-up of the environment I poured seeds into the tray and captured 4 images per seed type to gather the training data using an Iphone 13. I repeated this process again and captured 3 images per seed type to gather testing data since you cannot test with the data you trained on. All images captured at various distances, meaning area cannot be used as a feature for the classes.

Image Processing + Segmentation + Feature Extraction

For the processing of the images in both the training and testing images I utilized a standard approach to all of them using the same techniques and made minor adjustments to the techniques as needed. Over a few hours of tinkering in ImageJ with the images and extracting their data, I successfully segmented and feature extracted both sets of images.

Techniques Performed

8-bit
Duplicate
Subtract Background
Enhance Contrast
Duplicate
Gaussian Blur
ImageCalc: Duplicate of Duplicate subtract Duplicate
Enhance Contrast
Threshold
Remove Outliers
Fill Holes
Set Measurements
Analyze Particles
Invert LUT

Train Weed before/after:
Seed Class Example

Train Carrot before/after:
Seed Class Example

Train Pumpkin before/after:
Seed Class Example

Train Pea before/after:
Seed Class Example

Train Spinach before/after:
Seed Class Example

Classification Models

Now for putting the extracted training data to use. Following the image processing and segmentation we now have acceptable data/features/measures to train the model with. The next step is to export all the data from Google Sheets to an XLS file. Once that is done I open the statistical software JMP (student edition). Here I click on New Data Table which brings up an empty array table. Then press open and click on the all XLS file created earlier. Import it and then save it as a JMP file. Seed Class Example

K-Means Clustering

Used JMP Statistical Software (Student Version)
Imported all labeled feature data into a data table
Set number of clusters = 5 (one per seed class)
Results:
- Decent clustering overall
- Noticeable overlap between certain seed types
- Visually similar seeds (like carrot and pea) contributed to the overlap

Here is the scatterplot matrix, As you can see there is a lot of overlap and overlapping in the clusters. Seed Class Example

Neural Net (out of curiousity, not enough data for accurate NN)

Conclusion

In conclusion, the segmentation portion of the project took me the longest to complete. There were cases in which I had to keep trying to process images over and over again until I produced an acceptable number of objects in each image. It is important to crop the testing and training images so there is less noise to try and eliminate from the images. It is also a good idea to overly take measurements/data to give more options on how to manipulate it in certain ways to benefit the model more. Using seeds that are more varying in circularity and shape would have been beneficial as a lot of my data gets misread in the models.

Specifications

Camera: iPhone 13
Alienware m15 R7 Laptop
Processor:
- AMD Ryzen 7 6800H (20 MB total cache, 8 cores, 16 threads)
Graphics Card:
- NVIDIA GeForce RTX 3060, 6GB GDDR6
Memory:
- 16 GB (2 x 8 GB, DDR5, 4800 MHz)
Hard Drive:
- Crucial T500 2TB PCIe Gen4 NVMe M.2 SSD
Operating Systems:
- Windows 11
Software:
- ImageJ
- Google Sheets
- JMP: Statistical Software: Student Version

Resources

Images:
- Training Images:
  - 4 x Images (self-taken) x 5 different classes (seed types)
- Testing Images:
  - 3 x Images (self-taken) x 5 different classes (seed types)
Code:
- ImageJ Macros:
  - 4 x Segmentation Training Macros x 5 different classes XLS files
  - 3 x Segmentation Testing Macros x 5 different classes XLS files
Data:
- Training Data:
  - 4 x Classes (images) x 5 different classes (seed types) XLS files
  - 1 x All Training Data XLS file with label
- Testing Data:
  - 3 x Classes (images) x 5 different classes (seed types) XLS files
  - 1 x All Testing Data XLS file with label

Please reach out if interested in the project zip file containing all resources.

Grá Mór

Previous Thread | Next Thread

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode

Posting Rules
You may not post new threads You may not post replies You may not post attachments You may not edit your posts vB code is On Smilies are On [IMG] code is On HTML code is Off

All times are GMT -4. The time now is 4:39 AM EST.
Powered by: vBulletin, Copyright \A92000 - 2010, Jelsoft Enterprises Ltd.

Watermark Removal Deep Learning CNN (Self-Supervised) via PyTorch

Environment

Notebook Index

Table of Contents

Dataset Description

Heterogeneous U-Net CNN Architecture

Model Diagram

Loss Function

1) Masked L1 Loss

2) LAB Color Loss

3) Perceptual Loss (VGG16)

4) SSIM (Structural Similarity Index)

5) Laplacian Edge Loss

Total Loss

Loss Diagram

Total Parameters

Data Generation (Self-Supervised)

Watermark Generation

Semi-Random Alpha Blending (Applying Watermark)

Generate Pair (Pairwise Training)

Data Augmentation Pipeline (Albumentations)

Generate Pair (Pairwise Training)

Hyperparameters

Transform

Pathing

Loss, Optimizer, GradScaler for AMP & Learning Rate Scheduler

AdamW Optimizer Algorithm

GradScaler for AMP

OneCycleLR Learning Rate Scheduler

LR Scheduler Diagram

Training

Warmup Loop

Training Loop

Results:

Metrics & Visuals

Metrics

Evaluation

Visuals

Train & Val Loss, PSNR Progress

Training Loss, Peak Signal to Noise Ratio, & Structural Similarity Plots

Learning Rate Scheduler

Local PSNR & Local SSIM Maps

Residual Histogram

Side by Side Comparison

Watermark Residual

Watermark Attention

Batch Processing

Residual Error

Multi-Layer Activation

Feature Activation Maps (Decoder Focus)

Referenced Paper

Capstone - Classification Convolutional Neural Network by Ian Ryan

Convolutional Neural Network Architecture

Notebook Index

Table of Contents

Acknowledgements

Literature Cited

Environment

AWS SageMaker Jupyter Notebook

Recommended Resources

Creator Information

Permission

Deconvolution (De-blur) & Enhancement by Ian Ryan

Table of Contents

Introduction

Methods

Deconvolution Results

Enhancement Results

Conclusion

Environment

Macros

Seed Pattern Recognition by Ian Ryan

Table of Contents

Seed Classes

Image Capturing

Image Processing + Segmentation + Feature Extraction

Classification Models

K-Means Clustering

Neural Net (out of curiousity, not enough data for accurate NN)

Conclusion