Part 0

The Reset

What you must unlearn before you walk into an interview at a semiconductor company.

Chapter 0.1

The Interview Reality Check

LeetCode rewards speed and correctness on clean, bounded problems. Consider a typical medium problem: "Find the longest substring without repeating characters." The input is a string. The output is an integer. The constraints are explicit. You have 30 minutes. Your solution is judged on time and space complexity and nothing else.

Now consider the fab equivalent: "Detect when a plasma etch process has reached endpoint." The input is a 100 Hz time series of optical emission spectroscopy intensity at 520 nanometers. The output is a binary decision: endpoint reached or not. The constraints are physical: the plasma micro-arc that destroys transistors lasts 50 milliseconds. Your model has 50 milliseconds to detect it and trigger the hardware interlock. False negative: $2.5 million scrap. False positive: 4 hours of unnecessary chamber cleaning, idle downstream tools, missed production targets.

The LeetCode problem is about algorithmic elegance. The fab problem is about operating within constraints that bankrupt divisions if you get them wrong.

The Three Constraints That Reshape Everything

The summaries below are tuned for interview execution: the numbers to integrate and what the interviewer is testing for. For the full mechanical explanation of each constraint, see the Field Manual Introduction.

Constraint 1

Experiments Cost $50,000 Per Wafer

In software ML, data is abundant and cheap. In a fabrication facility, every experimental wafer costs approximately $50,000 at advanced nodes: raw silicon, processing costs through 500+ sequential steps, opportunity cost of the production slot, and engineering time. A five-factor optimization at three levels each is 3^5 = 243 runs as a full factorial. Even a two-level design over the same five factors needs 32 runs for the full factorial, or 16 for a Resolution V half fraction. At 32 runs, that's $1.6 million in wafer cost alone, before the six-week MES coordination and Yield Review Board approval process. You cannot brute-force the search space. Every experiment must be justified with a specific hypothesis, minimum detectable effect size, and power calculation.
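The run counts and dollar figures in this part follow from simple arithmetic, sketched below. The sketch assumes one wafer per run (real experiments may use more) and uses hypothetical helper names. A two-level half fraction over five factors, 2^(5-1), is the standard Resolution V design for that factor count.

```python
WAFER_COST_USD = 50_000  # per-wafer figure quoted in the text


def full_factorial_runs(factors: int, levels: int) -> int:
    """Runs in a full factorial design: levels ** factors."""
    return levels ** factors


def half_fraction_runs(factors: int) -> int:
    """Runs in a two-level half-fraction design: 2 ** (factors - 1)."""
    return 2 ** (factors - 1)


def wafer_cost(runs: int, wafers_per_run: int = 1) -> int:
    """Wafer cost only; excludes engineering time and approval delay."""
    return runs * wafers_per_run * WAFER_COST_USD


designs = {
    "three-level full factorial": full_factorial_runs(5, 3),
    "two-level full factorial": full_factorial_runs(5, 2),
    "two-level half fraction": half_fraction_runs(5),
}
for name, runs in designs.items():
    print(f"{name}: {runs} runs, ${wafer_cost(runs):,}")
```

Quoting the run count and the cost together is what turns a design choice into an economic argument.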

Constraint 2

Ground Truth Arrives 60-90 Days Late

In consumer internet ML, ground truth arrives in milliseconds to days. In semiconductor manufacturing, ground truth for many critical predictions arrives when wafers finally reach electrical test, 60 to 90 days after the process step you're modeling. During those 60-90 days, you are flying blind. If your model has a systematic bias that wasn't caught in offline evaluation, the financial consequences accumulate silently. By the time you detect the problem, thousands of wafers have been processed, each accumulating additional value through subsequent steps, each potentially misprocessed based on your predictions.

Constraint 3

Deployment Targets Are Air-Gapped Industrial Computers

In software ML, deployment means pushing to a Kubernetes cluster on modern hardware. In a fabrication facility, the tool controller is often a machine from 2012: dual-core processor, 2GB RAM, no GPU, no internet connection, Python 3.7 if you're lucky. A PyTorch model that runs in 80ms on your development laptop may take 1.5 seconds on that hardware. If the plasma event you're monitoring happens in 50ms, 1.5 seconds is a missed event, 30x too slow. Deployment means physical media transfer, antivirus kiosk, IT change management ticket, and 2-6 weeks of change control approval.

The Interview Disconnect

These constraints explain every "weird" practice you'll encounter. When interviewers ask about anomaly detection, they're not testing whether you know Isolation Forest; they're testing whether you'd check timestamps, sensor health, and spatial patterns before running any algorithm. When they ask about model validation, they're testing whether you understand operating blind for 60-90 days. When they ask about deployment, they're testing whether you recognize that your cloud assumptions are actively wrong for their environment.

Chapter 0.2

The Banned List

The fastest way to fail a semiconductor data science interview is to confidently propose something that violates fundamental constraints. Study this list not as rote prohibition, but as insight into the physics. Each banned item is an opportunity to demonstrate domain awareness. The interviewer isn't testing whether you know SMOTE exists; they're testing whether you understand why physics makes it dangerous here.

BANNED #1

SMOTE and Synthetic Sampling

What it is

Synthetic Minority Over-sampling Technique generates synthetic training examples by interpolating between existing minority class samples. Standard for imbalanced datasets in software ML.

Why it's banned

SMOTE can create physically impossible process states. A synthetic wafer at 450W, 65 mTorr, 125 sccm may represent a plasma operating point that cannot physically exist: the plasma ignites and extinguishes differently at that combination. Training on physically impossible states teaches your model that the world is different than it is.

What to say instead

I'd use cost-sensitive learning with class weights, or design a split-lot experiment to collect more minority class data. For rare defect modes, I'd implement active learning to prioritize uncertain samples for physical inspection. Synthetic sampling risks physically invalid feature combinations.
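As a rough illustration of the class-weight option above, the sketch below computes inverse-frequency weights and applies them to a log loss. The weighting rule n / (k * count) is one common heuristic, not something the text prescribes. The 98/2 class split and the function names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Binary log loss where each sample is scaled by its class weight."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        w = weights[y]
        total -= w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical lot: 98 good wafers (0), 2 defective wafers (1).
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)

# A lazy model that always predicts a 2% defect probability.
lazy_probs = [0.02] * len(labels)
unweighted = weighted_log_loss(labels, lazy_probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, lazy_probs, weights)
print(weights, unweighted, weighted)
```

The weighted loss penalizes the lazy model far more heavily, and no synthetic wafer is invented to do it.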

BANNED #2

Cloud APIs for Real-Time Inference

What it is

Calling AWS SageMaker, Google Vertex, or Azure ML for model predictions. Standard in software for scalable, low-maintenance deployment.

Why it's banned

Three insurmountable barriers: (1) Latency: round-trip to cloud is 50-200ms minimum; plasma arcs destroy transistors in 50ms. (2) Air gap: production tool controllers have no internet connectivity; this is a physical security requirement. (3) Reliability: network jitter or outages become production outages, which is unacceptable for safety-critical processes.

What to say instead

I'd export to ONNX with INT8 quantization and deploy to the edge controller. No network dependency, deterministic latency under 10ms, hardware interlock as fail-safe. Model runs bare-metal on the tool PC.

BANNED #3

Neural Networks as Default Choice

What it is

Reaching for MLPs, CNNs, RNNs, or Transformers as the first architecture. Standard in software where interpretability is secondary to accuracy.

Why it's banned

Three problems: (1) Process engineers with 20 years of tool experience will not trust black-box predictions; they need to understand why, with a connection to physical mechanisms they can control. (2) Even small neural networks can be too slow for 50ms inference budgets on 2012 hardware. (3) Many fab processes are locally linear or governed by well-understood physics, so EWMA controllers are sufficient, interpretable, and fast.

What to say instead

I'd start with linear models or EWMA control for interpretability and speed. If non-linearities dominate, I'd use gradient-boosted trees with SHAP for feature importance. I'd reserve neural networks for image-based metrology (CD-SEM, defect inspection), where CNNs are appropriate and latency constraints are relaxed.

BANNED #4

Standard K-Fold Cross-Validation

What it is

Randomly splitting data into K folds, training on K-1, testing on 1. Standard validation in software ML.

Why it's banned

Temporal leakage. Random splits put future wafers in training, past wafers in test. In production, you can never train on future data. PM events create step changes, recipe changes create distribution shifts, and seasonal effects create trends. Random mixing blends these distinct regimes and produces unrealistically optimistic validation.

What to say instead

I'd use walk-forward validation with expanding windows. Train on [0:t], validate on [t:t+delta_t], never shuffle. Respect the arrow of time. For PM-aware validation, ensure training and test sets come from the same PM window or explicitly test generalization across PM events.
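The expanding-window scheme described above can be sketched as an index generator. It is a minimal illustration that assumes samples are already sorted by process timestamp; the function and parameter names are hypothetical.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train, test) index ranges with an expanding training window.

    Assumes index order equals time order. Every test index is strictly
    later than every training index, and nothing is shuffled.
    """
    split = initial_train
    while split + horizon <= n_samples:
        yield range(0, split), range(split, split + horizon)
        split += horizon


splits = list(walk_forward_splits(n_samples=10, initial_train=4, horizon=2))
for train, test in splits:
    print(f"train 0..{train[-1]}  ->  test {test[0]}..{test[-1]}")
```

For PM-aware validation, the same generator could be run within a single PM window, or the split points could be placed at PM events to test generalization across them explicitly.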

BANNED #5

Retraining on Recent Data Without Investigation

What it is

Detecting distribution shift via PSI, then automatically retraining the model on recent data. Standard continuous learning in software.

Why it's banned

In software, distribution shift means user behavior changed, and retraining adapts. In fabs, distribution shift often means hardware degradation: RF match network capacitor wearing, polishing pad glazing, thermocouple drift. Retraining absorbs the degradation into the model as the new normal. The model learns that broken hardware is acceptable. It stops detecting the degradation that engineering should fix.

What to say instead

I'd distinguish expected shift (recipe change, new product) from unexpected shift (hardware degradation). For expected shift, document and update the golden baseline. For unexpected shift, investigate root cause before any retraining. PSI against commissioning baseline, not rolling window. CUSUM on key physics parameters for drift accumulation.
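For the CUSUM part of that answer, a minimal tabular CUSUM sketch follows. The reference value k = 0.5 sigma and decision limit h = 5 sigma are common textbook defaults, not values from this guide. The temperature-like data are made up for illustration.

```python
def cusum_first_alarm(values, target, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardized deviations from a fixed target.

    Returns the index of the first alarm, or None if no alarm fires.
    k (slack) and h (decision limit) are in units of sigma.
    """
    upper = lower = 0.0
    for i, x in enumerate(values):
        z = (x - target) / sigma
        upper = max(0.0, upper + z - k)  # accumulates upward drift
        lower = max(0.0, lower - z - k)  # accumulates downward drift
        if upper > h or lower > h:
            return i
    return None


# Hypothetical readings: 50 in-control samples, then a persistent 1-sigma shift.
in_control = [100 + 0.5 * (-1) ** i for i in range(50)]
shifted = [101 + 0.5 * (-1) ** i for i in range(50)]

print(cusum_first_alarm(in_control, target=100.0, sigma=1.0))
print(cusum_first_alarm(in_control + shifted, target=100.0, sigma=1.0))
```

The target stays fixed at the commissioning value, so small deviations keep accumulating instead of being absorbed into a moving baseline.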

BANNED #6

Forward-Filling Missing Values

What it is

df.ffill() (or the older, now-deprecated df.fillna(method="ffill")) or equivalent. Standard imputation for time series.

Why it's banned

Creates dangerous flatlines. A thermocouple fracturing at 3:47 AM reports the last valid reading, 121.5 degrees, indefinitely. Forward-filling propagates this frozen value. The run-to-run controller sees perfect stability, maintains heater power, actual temperature rises unchecked. 847 wafers processed at wrong temperature. FDC interprets zero variance as excellent tool health. The chamber overheats while the monitoring system reports green.

What to say instead

I'd diagnose first: sentinel value (-9999.0, 9999.0)? Frozen sensor (rolling variance = 0)? Communication timeout (timestamp gaps)? Then drop and ticket, never impute without root cause. Implement frozen-sensor detection: rolling variance below healthy minimum triggers alarm and control suspension.
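The diagnose-first checks above might look like the sketch below. It flags suspicious samples and leaves imputation out entirely. The window length, variance floor, and gap threshold are hypothetical placeholders that would need to be set per sensor.

```python
from statistics import pvariance

SENTINELS = {-9999.0, 9999.0}  # sentinel codes mentioned in the text


def diagnose(timestamps, values, window=5, min_variance=1e-6, max_gap_s=1.5):
    """Return (index, reason) flags. Nothing is imputed or overwritten."""
    flags = []
    for i, (t, v) in enumerate(zip(timestamps, values)):
        if v in SENTINELS:
            flags.append((i, "sentinel"))
        if i > 0 and t - timestamps[i - 1] > max_gap_s:
            flags.append((i, "timestamp_gap"))
        if i + 1 >= window:
            recent = values[i + 1 - window : i + 1]
            # Healthy sensors show some noise; near-zero variance is suspect.
            if pvariance(recent) < min_variance:
                flags.append((i, "frozen"))
    return flags


# Hypothetical 1 Hz thermocouple trace that freezes at 121.5.
timestamps = [float(i) for i in range(10)]
values = [121.4, 121.6, 121.5, 121.3, 121.5, 121.5, 121.5, 121.5, 121.5, 121.5]
flags = diagnose(timestamps, values)
print(flags)
```

In a real deployment the "frozen" flag would raise an alarm and suspend control, as the answer above describes, instead of just being printed.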

BANNED #7

Docker or Kubernetes on Tool Controllers

What it is

Containerized deployment. Standard for reproducible, scalable software deployment.

Why it's banned

Three barriers: (1) 2012 industrial PCs with 2GB RAM have little headroom for the Docker daemon. (2) Container base images require internet access to build, which the air gap prohibits. (3) Container abstraction adds latency and memory overhead that violate the constraints.

What to say instead

I'd build a static binary with all dependencies compiled in, or export to ONNX for runtime-only inference. Zero external dependencies. Validated on mirror hardware before change control submission. Deployment is file copy, not container orchestration.

BANNED #8

Attention Mechanisms for Temporal Sensor Data

What it is

Transformer architectures, self-attention for time series. State-of-the-art in NLP and many software ML applications.

Why it's banned

Three problems: (1) Attention computation is O(n^2) in sequence length, which is too slow for 100 Hz real-time processing on tool hardware. (2) Attention weights don't map to physical mechanisms, so they cannot explain why to process engineers. (3) 100 Hz sensor streams from physical processes rarely exhibit the long-range dependencies that justify attention; local patterns (sliding windows, derivatives) are sufficient.

What to say instead

I'd use 1D convolutions or LSTM if sequence matters. Attention only for spatial wafer maps (defect pattern recognition across a 300mm surface), never for temporal tool data where local features suffice.

BANNED #9

Population Stability Index on Rolling Windows

What it is

Computing PSI by comparing the recent distribution to a trailing window distribution. Common drift detection implementation.

Why it's banned

The Boiling Frog problem. If the process drifts gradually (consumable degradation, electrode wear), the rolling baseline adapts with it. PSI stays low even as the process moves far from its original state. The monitoring system accurately compares today to yesterday, missing that both are far from known-good.

What to say instead

I'd compute PSI against a fixed golden baseline established during commissioning or post-PM validation. Never against rolling windows. For seasonal variation, use decomposition and monitor residuals. For consumable degradation, use CUSUM on accumulated deviation from the fixed baseline.
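The Boiling Frog effect can be reproduced with a small simulation. The sketch below uses a generic decile-binned PSI and synthetic Gaussian data whose mean drifts 0.1 sigma per window. The bin count, window size, and drift rate are assumptions chosen for illustration.

```python
import bisect
import math
import random


def psi(reference, actual, bins=10, eps=1e-4):
    """Population Stability Index with bin edges at reference quantiles."""
    ref_sorted = sorted(reference)
    edges = [ref_sorted[len(ref_sorted) * q // bins] for q in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect.bisect_right(edges, x)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(sample), eps) for c in counts]

    expected, observed = proportions(reference), proportions(actual)
    return sum((o - e) * math.log(o / e) for o, e in zip(observed, expected))


rng = random.Random(0)


def window(mean, n=500):
    return [rng.gauss(mean, 1.0) for _ in range(n)]


golden = window(0.0)                               # commissioning baseline
windows = [window(0.1 * step) for step in range(1, 11)]  # slow drift

psi_fixed = psi(golden, windows[-1])         # today vs. known-good
psi_rolling = psi(windows[-2], windows[-1])  # today vs. yesterday
print(f"fixed baseline PSI: {psi_fixed:.3f}, rolling PSI: {psi_rolling:.3f}")
```

The rolling comparison stays small because each window is compared with a neighbor that has drifted almost as far. The fixed comparison reflects the full accumulated shift.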

BANNED #10

Feature Engineering by Pure Algorithm

What it is

Genetic programming, deep feature synthesis, or other automated feature engineering that optimizes predictive power without human interpretation.

Why it's banned

Features must be physically interpretable. Process engineers with decades of experience must understand what each feature measures and why it affects yield. A feature called "genetic_program_feature_47" that correlates with yield but has no physical meaning will not be trusted, and it cannot inform process improvement; it only predicts.

What to say instead

Every feature must have a one-sentence physical interpretation. "RF_Forward_Power_mean reflects average energy delivered to plasma, controlling ion bombardment flux and etch rate." Domain-driven engineering: start with physics, validate with data, never algorithm-first.

BANNED #11

P-Values for Model Significance

What it is

Frequentist hypothesis testing with fixed alpha = 0.05. Standard statistical practice.

Why it's banned

Fab data is streaming and sequential with implicit stopping rules. Fixed-n hypothesis testing assumes sample size is determined before data collection, violated in production where you observe continuously. Multiple testing across thousands of sensors creates false positive floods that shut down equipment unnecessarily.

What to say instead

I'd use sequential probability ratio test (SPRT) for online hypothesis testing, or Bayesian decision theory with pre-specified effect sizes and costs. Pre-define false positive cost (dollars per unnecessary investigation) and false negative cost (dollars per missed excursion), and let the optimal test fall out of the cost function.
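A minimal sketch of Wald's SPRT for a Gaussian mean shift with known sigma is shown below. The hypotheses, error rates, and function name are illustrative assumptions. In practice alpha and beta would be derived from the false positive and false negative costs mentioned above.

```python
import math


def sprt_gaussian(samples, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT: H0 mean = mu0 (in control) vs. H1 mean = mu1 (excursion).

    Returns (decision, samples_used). The decision is made as soon as the
    cumulative log-likelihood ratio crosses either boundary.
    """
    upper = math.log((1 - beta) / alpha)  # cross upward: accept H1
    lower = math.log(beta / (1 - alpha))  # cross downward: accept H0
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "excursion", n
        if llr <= lower:
            return "in_control", n
    return "continue_sampling", len(samples)


# Hypothetical standardized sensor readings.
print(sprt_gaussian([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0))
print(sprt_gaussian([0.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0))
```

Unlike a fixed-n test, the stopping rule is part of the procedure, so looking at the data after every wafer does not inflate the error rates.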

Chapter 0.3

The Numbers to Memorize

Specific numbers signal credibility. They demonstrate you understand the economics, the physics, and the constraints. Don't recite them; integrate them. The difference: "I know wafers cost $50,000" is weak. "When the process engineer asked for a 243-run full factorial, I calculated $12.15 million in wafer cost and proposed a two-level design instead: 32 runs, $1.6 million, 87% savings, with all main effects and two-factor interactions still estimable" establishes credibility.

Financial Anchors

$50,000

Cost per advanced-node wafer

Example usage: "A 32-run factorial design is $1.6 million in wafer cost alone, before engineering time and the six-week approval process. That's why we use Resolution V fractional designs: 16 runs instead of 32, $800K in savings, and the main effects and two-factor interactions we cared about stay unaliased with each other."

$2.5M

Cost of a 200ms endpoint detection miss

Example usage: "Latency isn't a nice-to-have optimization. A 200ms miss costs $2.5 million. That's why we deploy to edge controllers with 10ms inference budgets, not cloud APIs with 200ms round-trips."

$8.4M/mo

Cost of 2.1% unexplained yield loss

Example usage: "Spatial statistics aren't academic. Missing a reticle defect because we monitored aggregate count cost $50 million over six months. Moran's I at the reticle field scale would have caught it in week one."

Physics Anchors

50ms

Plasma micro-arc duration

Example usage: "1 Hz sampling misses most 50ms arcs: the 1-second sample interval is 20x longer than the event, so only about 1 arc in 20 lands on a sample. We need Interface A at 100 Hz minimum for detection, which puts about five samples inside each arc."
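The sampling argument can be checked with a few lines of arithmetic. The sketch below counts how many samples fall inside a 50ms event at each rate. The event start time is an arbitrary assumption, and times are kept in integer milliseconds to avoid floating-point edge cases.

```python
def samples_inside_event(rate_hz, event_start_ms, event_ms, duration_ms=1000):
    """Count samples that land inside the event over one second of data."""
    period_ms = 1000 // rate_hz
    event_end_ms = event_start_ms + event_ms
    return sum(
        1
        for t in range(0, duration_ms, period_ms)
        if event_start_ms <= t < event_end_ms
    )


def capture_probability(rate_hz, event_ms):
    """Chance that at least one sample hits an event with random start time."""
    return min(1.0, event_ms * rate_hz / 1000)


# Hypothetical arc starting 300 ms into the second and lasting 50 ms.
for rate in (1, 100):
    hits = samples_inside_event(rate, event_start_ms=300, event_ms=50)
    prob = capture_probability(rate, event_ms=50)
    print(f"{rate} Hz: {hits} samples inside the arc, capture probability {prob:.2f}")
```

At 1 Hz the arc is seen only if it happens to straddle a sample instant. At 100 Hz several consecutive samples fall inside it, which also gives a detector some shape to work with.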

60-90 days

Ground truth latency (electrical test)

Example usage: "My VM model processes 800 wafers before any physical validation. That's why OOD detection is mandatory: autoencoder reconstruction error flags when we're extrapolating and routes those wafers to physical metrology instead of issuing confident wrong predictions."

3nm = 15 atoms

Scale of advanced nodes

Example usage: "At 3nm, a 5-degree post-exposure bake shift moves critical dimensions by 2-4nm. That's 10+ atoms of error, enough to turn a working transistor into leakage current. Temperature control isn't optimization, it's survival."

Operational Anchors

85% OEE

World-class operational efficiency

Example usage: "My R2R tuning targeted availability first, not just yield. Improving quality by 2% but reducing availability by 5% drops OEE net. We computed all three components before claiming ROI."
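The OEE trade-off in that example is easy to verify, since OEE is the product of availability, performance, and quality. The baseline component values below are hypothetical, and the 2% and 5% changes are read as percentage points.

```python
def oee(availability, performance, quality):
    """Overall Equipment Effectiveness = Availability x Performance x Quality."""
    return availability * performance * quality


# Hypothetical baseline components (fractions of 1.0).
before = oee(availability=0.90, performance=0.95, quality=0.97)

# Quality improves by 2 points, availability falls by 5 points.
after = oee(availability=0.85, performance=0.95, quality=0.99)

print(f"before: {before:.3f}, after: {after:.3f}, change: {after - before:+.3f}")
```

Because the components multiply, a gain in one term has to be weighed against losses in the others before any ROI claim is made.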

1 Hz vs 100 Hz

SECS/GEM vs Interface A sampling

Example usage: "The Averaged Arc happened because we relied on the 1 Hz SECS/GEM historian. The 50ms arc occurred between samples. Interface A at 100 Hz would have caught it, with about five samples across a 50ms event."

2,000+ wafers/day

High-volume fab throughput

Example usage: "My VM model made production decisions on 800 wafers before first physical validation. At 2,000 wafers/day, that's half a day of production. OOD abstention threshold was calibrated to false negative cost at that scale."

Chapter 0.4

The Vocabulary Checkpoint

You must recognize these terms instantly. Mispronouncing or misusing them signals inexperience. When you encounter an unfamiliar term, don't fake it; demonstrate learning. "I'm not familiar with the specific thermal design of your implant chucks. What I do know is that thermal gradients create non-uniform ion range, which affects threshold voltage across the wafer. How do you currently monitor chuck temperature uniformity?" This acknowledges the gap, demonstrates adjacent knowledge, and asks an intelligent question.

Tier 1: Must Know Cold

Use these correctly without hesitation. Confusing any Tier 1 term is an immediate signal that you haven't worked with or studied real fab systems.

Term	Definition	Common Mistake
FDC	Fault Detection and Classification server (ISA-95 Level 2)	Confusing with "feedback control" or "feedforward"
MES	Manufacturing Execution System (ISA-95 Level 3)	Thinking it's "just the database" rather than the operational system of record
MFC	Mass Flow Controller (gas flow valve with thermal measurement)	Not knowing about the 0.5-2 second thermal lag in readings
PM	Preventive Maintenance (scheduled chamber cleaning, consumable replacement)	Confusing with "project manager" or "post-merger"
POR	Process of Record (formally approved production recipe)	Not understanding change control requirements to modify
WIP	Work in Progress (unfinished wafers currently in fab)	Missing that it's also a financial valuation (cumulative processing cost)
Q-Time	Queue Time limit (maximum allowable time at a step before degradation)	Not knowing expiration = mandatory scrap
CEID	Collection Event ID (SECS/GEM trigger message for state changes)	Not knowing Step_Start and Step_End are primary join keys
EWMA	Exponentially Weighted Moving Average (R2R controller core)	Not knowing stability condition 0 < G * lambda < 2
CUSUM	Cumulative Sum control chart (detects small persistent shifts)	Confusing with simple cumulative sum; missing the control chart aspect
OES	Optical Emission Spectroscopy (plasma chemistry monitoring)	Saying it as a word instead of spelling out the letters ("oh-ee-ess")
R2R	Run-to-Run control (recipe adjustment between wafers)	Confusing with "reinforcement learning"
VM	Virtual Metrology (predicting metrology from process data)	Not understanding the ground truth latency problem
OOD	Out-of-Distribution (input outside training distribution)	Missing that abstention is better than a wrong prediction
PSI	Population Stability Index (distribution shift metric)	Using on rolling windows instead of fixed baselines
SEMI S2	Safety standard for semiconductor equipment (sensor classification)	Not knowing Category 0/1/2 instrument distinction
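The EWMA stability condition in the table can be seen in a small simulation. The sketch assumes a linear process y = offset + gain * u, an EWMA estimate of the offset, and reads G as the ratio of true process gain to model gain. That reading, and all names and numbers below, are assumptions for illustration.

```python
def simulate_ewma_r2r(true_gain, model_gain, lam,
                      target=100.0, true_offset=20.0, runs=50):
    """Run-to-run EWMA controller on a linear process.

    Returns the absolute error from target after each run.
    """
    offset_est = 0.0
    errors = []
    for _ in range(runs):
        u = (target - offset_est) / model_gain  # recipe setting for this run
        y = true_offset + true_gain * u         # measured process output
        # EWMA update of the offset estimate from the model residual.
        offset_est = lam * (y - model_gain * u) + (1 - lam) * offset_est
        errors.append(abs(y - target))
    return errors


# G = true_gain / model_gain. Stable when 0 < G * lambda < 2.
stable = simulate_ewma_r2r(true_gain=1.5, model_gain=1.0, lam=0.4)    # 0.6
unstable = simulate_ewma_r2r(true_gain=3.0, model_gain=1.0, lam=0.8)  # 2.4

print(f"stable:   first error {stable[0]:.1f}, last error {stable[-1]:.2e}")
print(f"unstable: first error {unstable[0]:.1f}, last error {unstable[-1]:.2e}")
```

In this model the estimate error is multiplied by (1 - G * lambda) each run, so it shrinks only when that factor has magnitude below one.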

Tier 2: Recognize and Ask Intelligent Questions

You don't need to be an expert, but when these terms appear, ask the right follow-up question. It signals curiosity and adjacent knowledge rather than ignorance.

Term	Context	Good Question to Ask
ALD	Atomic Layer Deposition (ultra-thin film deposition)	"How does ALD's self-limiting reaction affect sensor telemetry compared to CVD?"
CVD	Chemical Vapor Deposition (film deposition from gas reactions)	"What's the typical MFC response time for CVD precursors? Do you compensate for thermal lag in R2R?"
CMP	Chemical Mechanical Planarization (surface flattening)	"Pad glazing is monotonic degradation. Do you use fixed-baseline CUSUM or try to model the decay?"
OEE	Overall Equipment Effectiveness (Availability x Performance x Quality)	"What's your fab's biggest OEE loss: availability losses from PM or performance losses from R2R tuning?"
DOE	Design of Experiments (structured experimentation)	"What resolution do you typically run for 5-factor process optimizations?"
APC	Advanced Process Control (umbrella for R2R, FDC, VM)	"How integrated is your APC: separate systems or a unified platform?"
Interface A	High-frequency data stream (100 Hz+) from tools	"What percentage of your fleet has Interface A vs. SECS/GEM only?"
HSMS	High-Speed SECS Message Services (TCP/IP transport)	"Do you see latency advantages over SECS-I serial, or is it just reliability?"

Tier 3: Impress If Known

These indicate deep specialization. If you know them, deploy them. If not, do not fake; ask what they mean and demonstrate that you grasp the adjacent physics.

Preston equation

CMP removal rate modeling

"Prestonian behavior breaks down at nanoscale; we had to add non-Preston terms for advanced nodes."

Langmuir probe

Plasma diagnostics

"We used Langmuir probe data to validate our RF match model."

Strehl ratio

Optical quality

"Scanner lens heating degrades Strehl ratio; we model it for overlay prediction."

Zernike polynomials

Wavefront aberration representation

"We fit Zernike coefficients to lens heating deformation."

Knudsen number

Rarefied gas dynamics

"At our operating pressure, the Knudsen number indicates transitional flow, so continuum assumptions break down."

Part 0 Summary

Before You Proceed

Everything that follows assumes you speak this language. Before moving to the Eight Archetypes, ensure you can do each of the following:

Recite the three constraints and explain how each one reshapes an ML decision you would otherwise make differently

List all 11 banned items and pivot immediately to the acceptable alternative

Drop the nine key numbers naturally in conversation, as justification for a decision, not as recitation

Define all 16 Tier 1 vocabulary terms without hesitation and catch common mistakes

Ask intelligent follow-up questions for all Tier 2 terms when they appear

Recognize when a Tier 3 term signals deep specialization and engage with appropriate curiosity
