YieldOpsAcademy
Zone 00 · Clean Room

The Wafer's Journey

Follow Wafer #5521 through final optical inspection. 50,000 candidate defect images. Human review capacity: 200 per wafer. The CNN classifier has been running for 18 months. Last Tuesday a new defect morphology appeared. The model has been passing it as nuisance ever since. Master CNN-based defect classification through the failure mode that no tabular model can solve: a defect whose identity is in its shape.

Today's Subject
CNN Defect Classification
Convolutional neural networks for wafer map pattern recognition

What a Number Cannot See

Every defect pattern on a wafer map tells a physical story. A ring of defects near the edge means edge bead removal failed. A linear scratch means a robot arm made contact during handling. A dense cluster in a repeating grid position means a reticle has a particle. A donut shape centered on the wafer means a focus offset hit a specific layer. These stories are encoded in the spatial arrangement of the defects, not in any individual sensor reading. An XGBoost model trained on tabular FDC (fault detection and classification) features cannot read these stories, because it never sees the shape. A CNN can. It learns to recognize defect morphologies the way a trained engineer does: by looking at the spatial pattern as a whole.
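To make the point concrete, here is a minimal NumPy sketch, assuming a wafer map is stored as a binary die grid (1 = defective). It builds an edge-ring map, then moves the same number of defective dies to random positions. The `ring_map` and `tabular_features` helpers are hypothetical: `tabular_features` stands in for the per-wafer scalar summaries a tabular model consumes, and it returns identical values for both maps even though one is a ring and the other is not. For simplicity the sketch uses a square grid and ignores the circular wafer boundary.

```python
import numpy as np

def ring_map(size=32, r_inner=12.0, r_outer=14.0):
    """Hypothetical helper: binary die grid (1 = defective) with an edge ring."""
    yy, xx = np.mgrid[0:size, 0:size]
    center = (size - 1) / 2.0
    radius = np.hypot(yy - center, xx - center)
    return ((radius >= r_inner) & (radius <= r_outer)).astype(np.uint8)

def tabular_features(wafer):
    """Hypothetical per-wafer scalar summaries, the kind of input a tabular model sees."""
    n_defects = int(wafer.sum())
    return {"defect_count": n_defects, "defect_fraction": n_defects / wafer.size}

rng = np.random.default_rng(0)
ring = ring_map()
# Same number of defective dies, shuffled to random positions: no ring left.
scattered = rng.permutation(ring.ravel()).reshape(ring.shape)

print(tabular_features(ring) == tabular_features(scattered))  # True
print(np.array_equal(ring, scattered))                          # False
```

Any feature that only aggregates over the wafer is blind to this difference; the information lives in where the defects sit relative to each other.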

01
Convolutional Feature Detection
A convolutional layer slides a small filter (e.g., 3×3 pixels) across the wafer map image. Each filter's weights are learned during training so that it responds to a specific local pattern: an edge, a corner, a gradient. Many filters run in parallel, each learning a different low-level feature. Early layers detect edges and local density; later layers combine these into increasingly abstract representations such as "cluster," "ring," and "scratch."
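The sliding-filter operation can be sketched in a few lines of NumPy. This is a minimal illustration rather than an efficient implementation: the `conv2d_valid` helper is hypothetical, uses stride 1 and no padding, and, like deep-learning libraries, computes cross-correlation without flipping the kernel. The 3×3 vertical-edge weights are set by hand here; in a trained CNN they would be learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products.

    Computes cross-correlation (no kernel flip), as CNN layers do."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 wafer-map patch: left half clean (0), right half defective (1).
patch = np.zeros((6, 6))
patch[:, 3:] = 1.0

# Hand-set vertical-edge filter; a trained network would learn weights like these.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

response = conv2d_valid(patch, vertical_edge)
print(response)
```

Every output row comes out as `[0, 3, 3, 0]`: the filter fires only where its window straddles the clean/defective boundary, which is the kind of low-level edge signal that deeper layers assemble into shapes.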
02
Pooling Reduces Spatial Sensitivity
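Max pooling keeps only the strongest activation in each small region (for example, each 2×2 window) and discards the rest. Because the same filter is applied at every position, a scratch activates it wherever the scratch appears; pooling then makes the downsampled representation tolerant of small shifts, so a scratch that lands a pixel or two differently on one wafer than on another produces nearly the same pooled features. For defect classification, this tolerance is a feature: a ring should be recognized whether or not it sits slightly off center. Pooling does not make the network scale-invariant, however. To recognize the same ring on a 200mm and a 300mm wafer, wafer maps are usually resized to a common input resolution before classification.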
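A minimal NumPy sketch of non-overlapping 2×2 max pooling, assuming a single-channel feature map (`max_pool2d` is a hypothetical helper, not a library function). It shows both the benefit and the limit of pooling's shift tolerance: a one-pixel shift that stays inside a pooling window leaves the pooled output unchanged, while a shift that crosses a window boundary does not.

```python
import numpy as np

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling (stride k).

    Rows/columns that do not fill a complete window are dropped."""
    h, w = (x.shape[0] // k) * k, (x.shape[1] // k) * k
    x = x[:h, :w]
    # View as (h/k, k, w/k, k) blocks and take the max inside each block.
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

# A single strong filter response on a 4x4 feature map...
fmap = np.zeros((4, 4)); fmap[0, 0] = 1.0
# ...shifted by one pixel but still inside the same 2x2 window...
shifted_in_window = np.zeros((4, 4)); shifted_in_window[1, 1] = 1.0
# ...and shifted far enough to cross into the neighbouring window.
shifted_across = np.zeros((4, 4)); shifted_across[0, 2] = 1.0

print(np.array_equal(max_pool2d(fmap), max_pool2d(shifted_in_window)))  # True
print(np.array_equal(max_pool2d(fmap), max_pool2d(shifted_across)))     # False
```

Stacking several pooling stages widens the range of shifts the network tolerates, since each stage works on an already downsampled map.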
03
Classification Head
After several convolutional and pooling layers, the spatial feature map is flattened and passed to one or more fully connected layers. The final layer outputs one score (logit) per defect class, and a softmax turns these scores into a probability distribution: Ring (0.82), Scratch (0.09), Cluster (0.06), Random (0.03). The class with the highest probability is the predicted defect type, unless that probability falls below a confidence threshold, in which case the defect is labeled "nuisance."
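A minimal sketch of the head's last step, assuming the network has already produced one logit per class. The class list mirrors the example distribution above; the `classify` helper and its 0.5 confidence threshold are hypothetical, and a real cutoff would be tuned on labeled review data.

```python
import numpy as np

CLASSES = ["Ring", "Scratch", "Cluster", "Random"]

def softmax(logits):
    z = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(logits, threshold=0.5):
    """Return (label, probabilities); below `threshold` (hypothetical) -> "nuisance"."""
    probs = softmax(np.asarray(logits, dtype=float))
    top = int(np.argmax(probs))
    label = CLASSES[top] if probs[top] >= threshold else "nuisance"
    return label, probs

# Logits chosen so the softmax reproduces the example distribution exactly.
logits = np.log([0.82, 0.09, 0.06, 0.03])
label, probs = classify(logits)
print(label, np.round(probs, 2))  # Ring [0.82 0.09 0.06 0.03]

# A flat score vector: no class is confident, so it is routed to nuisance.
print(classify([0.2, 0.1, 0.0, 0.1])[0])  # nuisance
```

A fixed confidence cutoff is also one way a genuinely new morphology can end up as nuisance: if it resembles none of the trained classes, its probability mass can spread out and fall under the threshold.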
04
Class Activation Mapping
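Grad-CAM (Gradient-weighted Class Activation Mapping) highlights which regions of the input image contributed most to the predicted class. It weights each feature map in the last convolutional layer by the average gradient of the class score with respect to that map, sums the weighted maps, and keeps only the positive part. For a ring prediction, the activation map highlights the circular edge region; for a scratch, it highlights the linear feature. This spatial attribution is what makes CNN defect classification explainable to a process engineer: the model can show which regions drove its decision, at the coarse resolution of the last convolutional layer rather than pixel by pixel.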
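The Grad-CAM computation itself is short once a framework has supplied the last convolutional layer's activations and the gradients of the class score with respect to them. This NumPy sketch assumes both are already available as `(K, H, W)` arrays; the 4×4 maps and gradients below are synthetic stand-ins (hypothetical data, not model output), and `grad_cam` and `upsample_nearest` are illustrative helpers.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one input and one target class.

    activations: (K, H, W) feature maps A^k from the last conv layer.
    gradients:   (K, H, W) gradients dy^c/dA^k of the class score y^c.
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k^c: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum over k of alpha_k^c * A^k
    cam = np.maximum(cam, 0.0)                        # ReLU: keep evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling of the coarse map to input resolution."""
    return np.kron(cam, np.ones((factor, factor)))

# Synthetic 4x4 feature maps: one channel fires on the edge, one in the middle.
A_ring = np.ones((4, 4)); A_ring[1:3, 1:3] = 0.0
A_center = np.zeros((4, 4)); A_center[1:3, 1:3] = 1.0
activations = np.stack([A_ring, A_center])

# Hypothetical gradients: the "Ring" score rises with channel 0, falls with channel 1.
gradients = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])

heat = grad_cam(activations, gradients)
heat_full = upsample_nearest(heat, 8)  # 4x4 -> 32x32, matching a 32x32 input
print(heat)
```

The heatmap lights up the border and stays dark in the center, and the 8×8 blocks in `heat_full` show why the attribution is region-level: it can never be finer than the feature map it was computed from.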

The CNN does not memorize specific defects. It learns the abstract spatial grammar of defect patterns: circularity, linearity, clustering, periodicity. This is why a well-trained CNN tends to generalize to new lots and new product nodes better than a hand-engineered feature set.


Continue the journey

Zones 01 through 04 cover the problem scenario, algorithm analysis, alternative comparisons, interview gauntlet, and production checklist for this journey.

All six journeys are included with full access.
