YieldOpsAcademy
Reference Cards

11 interview prep cards, 7 on-the-job cards.

I-1 Interview Prep
The Mindset Shift: Atoms, Not Clicks
Data volume | Ad tech: billions of events daily. Fab DS: thousands of wafers.
Cost of error | Missed ad click: $0.001. Escaped fab defect: $50K lot scrap.
Latency budget | Training: hours OK. Inference: <10ms hard deadline at the tool.
Physics constraint | Features must respect thermodynamics. No imaginary temperatures.
Iteration speed | Model deploy: weeks (change control), not minutes.
A/B testing | Costs $50K per run. DoE replaces A/B testing.
You are optimizing atoms, not engagement. Statistical significance takes weeks, not a 30-minute A/B test. Every experimental wafer has a dollar sign attached.
I-2 Interview Prep
The Four Data Domains
FDC | Sensor streams: 1Hz to 100Hz. Temperature, pressure, RF power.
Metrology | Sparse, expensive. ~1 measurement per wafer. CD-SEM, film thickness.
Defect inspection | Images + coordinates. Particles, scratches, pattern bridges.
WAT / ET | Electrical test: threshold voltage, drive current, yield bins.
MES | Tool status, alarms, recipe versions, lot genealogy.
The join problem | FDC (billions of rows) joined to metrology (thousands). Clocks drift.
Your job is almost always joining high-frequency FDC to sparse metrology. The join key is rarely a perfect timestamp; it is Wafer_ID + Step_ID with a tolerance window.
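A minimal sketch of the tolerance join described above, in plain Python. The row layout (`wafer_id`, `step_id`, `ts`, `mean_temp`, `cd_nm`) and the function name are hypothetical, chosen only for illustration. In pandas, `merge_asof` with its `by`, `tolerance`, and `direction` parameters covers the same idea.

```python
def join_with_tolerance(metrology, fdc_rows, tolerance_s):
    """Attach to each metrology row the nearest-in-time FDC row that shares
    its (wafer_id, step_id) key, or None if nothing is within tolerance_s."""
    # Group FDC rows by the real join key: wafer + step, never raw timestamp.
    by_key = {}
    for row in fdc_rows:
        by_key.setdefault((row["wafer_id"], row["step_id"]), []).append(row)

    joined = []
    for m in metrology:
        candidates = by_key.get((m["wafer_id"], m["step_id"]), [])
        # Nearest timestamp in either direction, since clocks can drift both ways.
        best = min(candidates, key=lambda r: abs(r["ts"] - m["ts"]), default=None)
        if best is not None and abs(best["ts"] - m["ts"]) > tolerance_s:
            best = None  # outside the window: leave unmatched rather than guess
        joined.append({**m, "fdc": best})
    return joined


# Hypothetical example rows (timestamps in seconds).
fdc = [
    {"wafer_id": "W1", "step_id": "ETCH", "ts": 100.0, "mean_temp": 350.2},
    {"wafer_id": "W2", "step_id": "ETCH", "ts": 200.0, "mean_temp": 351.0},
]
met = [
    {"wafer_id": "W1", "step_id": "ETCH", "ts": 103.5, "cd_nm": 45.1},
    {"wafer_id": "W2", "step_id": "ETCH", "ts": 900.0, "cd_nm": 44.8},
]
result = join_with_tolerance(met, fdc, tolerance_s=30.0)
```

Unmatched rows stay visible as `None` instead of silently disappearing, so a clock-skew problem shows up as a count you can audit.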
I-3 Interview Prep
Virtual Metrology: The Killer App
Problem | Metrology is slow (4-8 hours) and expensive. Cannot measure every wafer.
Solution | Predict metrology values from FDC sensor data before measurement completes.
Benefit | 100% inspection coverage, real-time feedback vs. 4-hour delay.
Component 1 | Feature extraction with physical interpretation for each feature.
Component 2 | GBT + SHAP for interpretable predictions. Not neural nets.
Component 3 | Autoencoder Reliability Index: OOD detection is MANDATORY.
Component 4 | Cost-calibrated router: high RI routes to physical metrology.
VM without OOD detection is more dangerous than no VM. One ghost excursion (812 wafers, no film) = $40M+ loss. The Reliability Index is the safety net.
I-4 Interview Prep
The Banned List: What NOT to Say
SMOTE | Creates physically impossible sensor states. Use anomaly detection instead.
Cloud inference | Fab tools are air-gapped. Arc destroys transistor in 50ms. Must be edge.
Rolling baseline | Adapts to the drift it should detect. Use fixed golden baseline.
Random shuffle CV | Temporal leakage. Future wafers in train, past in test. Use walk-forward.
RL for control | Cannot explore states on a $150M scanner. Use bounded R2R / MPC.
Black-box DL | Process engineer cannot interpret. Model will not be deployed.
Retrain after PM | Encodes broken hardware as normal. Investigate, do not adapt.
Forward-fill NaN | Flatlines a dead sensor. FDC sees zero variance, assumes stability.
Explicitly stating why you are NOT using a trendy approach due to physical or safety constraints signals senior-level thinking.
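As one illustration of the walk-forward alternative to random-shuffle CV, here is a sketch of an expanding-window splitter. It assumes the samples are already sorted by process time; the function name and parameters are hypothetical.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs where every test index is later
    than every train index. Assumes rows are sorted by process time."""
    fold = (n_samples - min_train) // n_folds
    if fold < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold  # training window grows each fold
        test_end = train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
```

In a fab setting the fold edges would more naturally sit on PM boundaries than on equal-sized blocks; equal blocks are used here only to keep the sketch short.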
I-5 Interview Prep
Algorithm Translation: LeetCode to Fab
Endpoint detection | Sequential change-point detection on 1D streaming array.
Tool degradation | CUSUM on fixed golden baseline. NOT rolling window.
Wafer defect maps | Spatial clustering. Moran's I to test for spatial autocorrelation (clustered vs. random defects).
Yield optimization | Bayesian optimization or fractional factorial DoE.
Timestamp skew | merge_asof() with tolerance. Never equality join.
Sensor missing data | Sentinel audit first. Frozen sensor = rolling variance = 0.
Fleet monitoring | Hotelling T-squared. Not individual Shewhart per sensor.
VM validation | Walk-forward splits. PM boundaries as natural split points.
Use this to map the interviewer's domain scenario back to the core algorithmic pattern. The fab-specific choice is usually more conservative than the textbook answer.
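For the tool-degradation pattern, a sketch of a two-sided tabular CUSUM whose mean and sigma come from a fixed golden run and are never re-estimated from the stream. The slack `k` and decision limit `h` (in sigma units) are illustrative defaults, not recommended settings.

```python
from statistics import mean, stdev


def cusum_alarm(golden, stream, k=0.5, h=5.0):
    """Two-sided tabular CUSUM against a FIXED golden baseline.
    Returns the index of the first alarm in `stream`, or None."""
    mu, sigma = mean(golden), stdev(golden)  # frozen: never updated from stream
    hi = lo = 0.0
    for i, x in enumerate(stream):
        z = (x - mu) / sigma
        hi = max(0.0, hi + z - k)  # accumulates upward deviation
        lo = max(0.0, lo - z - k)  # accumulates downward deviation
        if hi > h or lo > h:
            return i
    return None


# Hypothetical golden run and a slowly drifting stream.
golden = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9]
drifting = [10.0 + 0.05 * i for i in range(40)]
first_alarm = cusum_alarm(golden, drifting)
```

Because the baseline is frozen, the slow ramp keeps accumulating until it trips the limit; a rolling baseline would track the ramp and the sums would stay near zero.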
I-6 Interview Prep
Confusion Matrix Reality: Cost Calibration
False Negative | Missed defect. Scrap $2.5M lot. Worst case. Minimize at all cost.
False Positive | False alarm. Tool down, metrology delayed 4-8 hours. Expensive but survivable.
FN:FP cost ratio | Typically 5000:1 or higher. Justifies 15% abstention rate.
Threshold logic | Set threshold on FN:FP cost ratio, NOT on F1 or accuracy.
Abstention | VM should abstain when RI is high. "I don't know" > confident wrong.
ROI framing | 15% abstention at $500 each = $60K. One prevented excursion = $40M+.
Never evaluate a model in a vacuum. Always ask: "What is the operational cost of a FP versus FN for this specific chamber?" Then set the threshold accordingly.
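One way to turn the cost ratio into a threshold, sketched under two assumptions: the model outputs a calibrated defect probability, and a false positive costs about $500 (borrowed from the ROI row as a stand-in, which together with the $2.5M lot gives the 5000:1 ratio). Flagging is worthwhile when the expected miss cost `p * cost_fn` exceeds the expected false-alarm cost `(1 - p) * cost_fp`.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which flagging has lower expected cost than passing.
    Derived from p * cost_fn > (1 - p) * cost_fp."""
    return cost_fp / (cost_fp + cost_fn)


def should_flag(p_defect, cost_fp, cost_fn):
    """True if a wafer with calibrated defect probability p_defect should be flagged."""
    return p_defect > cost_threshold(cost_fp, cost_fn)


# Assumed costs: $500 per false alarm, $2.5M per escaped defect (5000:1).
threshold = cost_threshold(cost_fp=500.0, cost_fn=2_500_000.0)
```

At 5000:1 the threshold lands near 0.0002, far below the 0.5 default that F1 or accuracy tuning tends to favor.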
I-7 Interview Prep
The Hardware Latency Trap
Environment | Air-gapped 2012 industrial PC, 2GB RAM, no GPU, no Docker.
The event | Plasma micro-arc lasts 50 milliseconds.
Inference budget | < 10ms. Must complete before the arc finishes.
Deployment path | Physical media transfer. Change control: 2-6 weeks.
Banned formats | No Docker, no pip installs, no network calls.
Required format | ONNX export. Static binary. Zero external dependencies.
The answer | CUSUM, EWMA, or pre-quantized ONNX model at the edge.
When designing any real-time system, state your hardware assumptions before naming an algorithm. "Assuming an air-gapped 2012 PC with no GPU..."
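To make the "EWMA at the edge" option concrete, here is a sketch of a streaming EWMA detector that does constant work per sample and needs nothing outside the language runtime. The class name, the smoothing weight `lam`, and the limit width `L` are illustrative assumptions; the limit uses the standard asymptotic EWMA control-chart formula.

```python
class EwmaDetector:
    """Streaming EWMA control chart: O(1) time and memory per sample."""

    def __init__(self, mu, sigma, lam=0.2, L=3.0):
        self.mu = mu
        self.lam = lam
        # Asymptotic control limit: L * sigma * sqrt(lam / (2 - lam)).
        self.limit = L * sigma * (lam / (2.0 - lam)) ** 0.5
        self.z = mu  # EWMA statistic starts at the golden mean

    def update(self, x):
        """Fold in one sample; return True if the statistic is out of limits."""
        self.z = self.lam * x + (1.0 - self.lam) * self.z
        return abs(self.z - self.mu) > self.limit


# Hypothetical golden mean and sigma for one sensor.
detector = EwmaDetector(mu=100.0, sigma=1.0)
quiet = [detector.update(100.0) for _ in range(50)]
tripped = detector.update(110.0)
```

Each update is a multiply, an add, and a compare, which is the kind of footprint that fits a 2012 tool PC; whether it meets a specific millisecond budget still has to be measured on the target hardware.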
I-8 Interview Prep
The SMOTE / Synthetic Data Trap
Scenario | Yield is 99%. Predict the 1% failures (highly imbalanced data).
Bootcamp answer | Use SMOTE to generate synthetic minority class samples.
The physics trap | SMOTE interpolates geometrically between data points in feature space.
The reality | Interpolating between two broken machines creates a non-physical fake.
Fab answer | Treat as anomaly detection. Autoencoder or Isolation Forest.
The framing | Golden baseline = normal. Anomaly = deviation from golden. Simple.
Never generate synthetic sensor data unless using a validated physics-based simulation. Rare defects are anomalies to detect, not a minority class to synthesize.
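A toy illustration of the interpolation step at the heart of SMOTE. The two failure states and the golden point are made-up numbers for a hypothetical two-feature chamber (pressure, throttle valve position); they are not measured data.

```python
def smote_interpolate(a, b, gap):
    """Core SMOTE step: a synthetic point on the segment from a to b.
    gap is normally drawn uniformly from [0, 1]."""
    return [ai + gap * (bi - ai) for ai, bi in zip(a, b)]


# Hypothetical states: [chamber_pressure_mTorr, throttle_valve_pct]
golden = [50.0, 50.0]   # assumed normal operating point
fault_a = [20.0, 90.0]  # assumed failure mode A
fault_b = [80.0, 10.0]  # assumed failure mode B

synthetic = smote_interpolate(fault_a, fault_b, gap=0.5)
```

In this contrived case the synthetic "failure" sits exactly on the golden operating point, so the classifier would be taught that a healthy chamber is a defect. Real data is rarely this symmetric, but the mechanism is the same: the interpolated point corresponds to no failure the hardware ever produced.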
I-9 Interview Prep
The XGBoost Extrapolation Trap
Scenario | Predict tool degradation or consumable wear for next 30 days.
Bootcamp answer | Train XGBoost or Random Forest to predict future values.
The trap | Tree-based models CANNOT extrapolate beyond max training value.
The failure | Tool degrades further than any historical example: flat-line prediction.
Fab answer | Linear regression, survival analysis, or CUSUM for degradation.
GBTs ARE right for | VM prediction, FDC classification. Within distribution, not forward.
Never suggest a tree-based model for time-series forecasting where the future state may exceed historical bounds. Degradation always moves toward new territory.
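The flat-line failure can be reproduced with a tiny hand-rolled 1-D regression tree next to an ordinary least-squares line. This is a teaching sketch, not XGBoost; the wear data is synthetic (2 units per day for 30 days) and all names are hypothetical.

```python
def _sse(vals):
    """Sum of squared errors around the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def fit_tree(points, depth=3):
    """Tiny 1-D regression tree on (x, y) pairs. Returns predict(x)."""
    ys = [y for _, y in points]
    if depth == 0 or len(points) < 2:
        leaf = sum(ys) / len(ys)  # a leaf predicts the mean of its samples
        return lambda x: leaf
    points = sorted(points)
    # Pick the split index that minimizes total squared error.
    i = min(range(1, len(points)),
            key=lambda j: _sse([y for _, y in points[:j]])
            + _sse([y for _, y in points[j:]]))
    cut = (points[i - 1][0] + points[i][0]) / 2.0
    left = fit_tree(points[:i], depth - 1)
    right = fit_tree(points[i:], depth - 1)
    return lambda x: left(x) if x < cut else right(x)


def fit_line(points):
    """Ordinary least-squares line through (x, y) pairs. Returns predict(x)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return lambda x: my + slope * (x - mx)


wear = [(day, 2.0 * day) for day in range(30)]  # synthetic linear wear
tree, line = fit_tree(wear), fit_line(wear)
```

Asking both models about day 60 shows the difference: the tree returns the same value it gives for day 29, because every input past the last split falls into the same leaf, while the line keeps following the trend.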
I-10 Interview Prep
The Domain Expert Dynamic
The question | "How do you handle feature engineering for a process you don't understand?"
Bootcamp answer | "I run automated feature selection and let the math decide."
The reality | The math does not know the O-ring melts at 600°C.
Winning answer | "I sit with the process engineer and ask them to draw the physics on a whiteboard."
The dynamic | Data scientists are math translators. Engineers own the physics.
Safety corollary | SEMI S2 audit before feature selection. Life safety sensors never in model.
Process engineers have spent 20 years tuning these chambers. Show extreme humility about the physics. Your model earns their trust; it does not bypass them.
I-11 Interview Prep
Framing Failure: Downside Containment
The question | "Tell me about a time your model failed or drifted."
Bootcamp answer | "My validation F1 score dropped by 2%."
Fab framing | Failure is measured in scrapped wafers, tool downtime, engineering hours.
Winning answer | "The model drifted, but our OOD detection routed to physical metrology: a 2-hour delay instead of $40M scrap."
The emphasis | Your detection and containment strategy, not the failure itself.
The asymmetry | False negative (escaped defect) is catastrophic. False positive is recoverable.
Always demonstrate you understand the financial asymmetry. Engineers who talk about downside containment sound like production engineers, not notebook writers.
J-1 On the Job
SECS-II Message Anatomy
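Header | 10 bytes: device/session ID (2) + W-bit and stream (1) + function (1) + block number or PType/SType (2) + system bytes (4). The length field precedes the header.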
S1F13/F14 | Establish Communication (handshake)
S1F1/F2 | Are You There? / On-Line Data
S2F41/F42 | Host command send (R2R recipe adjust)
S6F11 | Event Report Send (CEID trigger)
S6F1 | Trace Data Send (trace set up with S2F23, Trace Initialize Send)
S9F3 / S9F5 | Unrecognized Stream / Function Type: bad SxFy code
HSMS | TCP/IP transport (SEMI E37). Port 5000 is a common default; the port is configurable per tool.
Some tools respond with SOFTREV set to 8 spaces: strip whitespace before comparing. Legacy SECS-I tools use an RS-232 serial link, commonly at 9600 baud.
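A sketch of decoding the 10-byte header of an HSMS data message with the standard library. The field layout assumed here is the one from SEMI E37 (2-byte session ID, W-bit plus 7-bit stream, function, PType, SType, 4 system bytes); the function and key names are hypothetical.

```python
import struct


def parse_hsms_header(header: bytes) -> dict:
    """Decode a 10-byte HSMS data-message header (layout assumed per SEMI E37)."""
    if len(header) != 10:
        raise ValueError("HSMS message header must be exactly 10 bytes")
    # Big-endian: u16 session ID, four single bytes, u32 system bytes.
    session_id, byte2, function, ptype, stype, system_bytes = struct.unpack(
        ">HBBBBI", header
    )
    return {
        "session_id": session_id,
        "w_bit": bool(byte2 & 0x80),  # top bit: reply expected
        "stream": byte2 & 0x7F,       # low 7 bits: the S in SxFy
        "function": function,         # the F in SxFy
        "ptype": ptype,
        "stype": stype,
        "system_bytes": system_bytes,
    }


# Build an S1F13 header with the W-bit set, then decode it.
raw = struct.pack(">HBBBBI", 1, 0x80 | 1, 13, 0, 0, 42)
parsed = parse_hsms_header(raw)

# Whitespace-only SOFTREV values compare equal to empty once stripped.
softrev_is_blank = " " * 8
softrev_is_blank = softrev_is_blank.strip() == ""
```

The 4-byte message length that precedes the header on the wire is not handled here; a real reader would consume it first to know how many bytes follow.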
J-2 On the Job
OEE Quick-Calc
OEE = Availability x Performance x Quality
Availability | Run Time / Planned Production Time
Performance | Actual Output / Theoretical Max Output
Quality | Good Parts / Total Parts
World class | > 85% OEE
Typical fab | 60-75% OEE
Example | 90% x 95% x 99% = 84.6% OEE
A model that improves Quality but slows Performance might lower OEE net. Always compute all three before claiming ROI.
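The three-factor product is easy to script, which makes the before/after comparison explicit. The card's own example is reproduced first; the "baseline" and "with model" figures after it are hypothetical, picked to show a quality gain being outweighed by a performance loss.

```python
def oee(availability, performance, quality):
    """Overall Equipment Effectiveness as a fraction in [0, 1]."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return availability * performance * quality


card_example = oee(0.90, 0.95, 0.99)  # the 84.6% example from the card

# Hypothetical: the model lifts quality 97% -> 99% but slows performance 95% -> 90%.
baseline = oee(0.90, 0.95, 0.97)
with_model = oee(0.90, 0.90, 0.99)
```

Here `with_model` comes out lower than `baseline`, which is the net-negative case the note above warns about.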
J-3 On the Job
R2R Control Loop Tuning
u[n] = u[n-1] + (lambda / G_est) * (y_target - y[n])
Lambda | Smoothing weight, 0 < lambda < 1
G | Process gain (slope of recipe-to-output). G_est is the controller's estimate of it.
Stable when | 0 < lambda * G / G_est < 2
Conservative | Start lambda = 0.3, tune up slowly
FF gain | alpha = -beta_upstream / beta_downstream
FF update | u_next = u_nominal + alpha*(x_upstream - x_target)
Unstable R2R amplifies variation. If wafer-to-wafer CD oscillates, reduce lambda before re-tuning G.
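A small simulation makes the stability bound tangible. It assumes a linear plant `y = plant_gain * u + disturbance` and writes the controller as `u <- u + step * (target - y)`, where `step` stands for whatever factor multiplies the error in the update row. Under that assumption the error is scaled by `(1 - plant_gain * step)` each run, so it shrinks only while that product stays between 0 and 2. All names and numbers are hypothetical.

```python
def simulate_r2r(plant_gain, step, target, disturbance, n_runs=50, u0=0.0):
    """Simulate an integral run-to-run controller on a linear plant.
    Returns the list of per-run errors (target - y)."""
    u, errors = u0, []
    for _ in range(n_runs):
        y = plant_gain * u + disturbance  # assumed plant model
        err = target - y
        errors.append(err)
        u += step * err                   # recipe correction for the next run
    return errors


# Loop product 2.0 * 0.15 = 0.3: inside (0, 2), error decays.
stable = simulate_r2r(plant_gain=2.0, step=0.15, target=50.0, disturbance=5.0)

# Loop product 2.0 * 1.25 = 2.5: outside (0, 2), error oscillates and grows.
unstable = simulate_r2r(plant_gain=2.0, step=1.25, target=50.0, disturbance=5.0)
```

In the unstable run the error flips sign every wafer while growing, which matches the oscillating CD symptom described above and is why reducing the smoothing weight is the first move.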
J-4 On the Job
Ghost Excursion Diagnostic
9999.0 | SECS/GEM timeout sentinel. Drop the sample.
-999.0 | Out-of-range sentinel. Drop the sample.
NaN | Sensor death. Drop the feature + ticket.
Flatline | Rolling std = 0. Frozen sensor. Drop + ticket.
> 5-sigma | Plausible arc/spike. Keep, but flag as event.
Rule: Diagnose before imputing. Never forward-fill a sentinel value. Flatline kills variance-based FDC.
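The diagnostic table above can be sketched as a pre-imputation audit. The sentinel values and the 5-sigma rule come from the card; the function names, return labels, and the golden mean and sigma in the example are hypothetical.

```python
import math
from statistics import pstdev


def audit_sample(x, golden_mean, golden_std):
    """Classify one reading BEFORE any imputation is attempted."""
    if math.isnan(x):
        return "nan"                    # sensor death: drop feature, open ticket
    if x == 9999.0:
        return "timeout_sentinel"       # SECS/GEM timeout: drop the sample
    if x == -999.0:
        return "out_of_range_sentinel"  # drop the sample
    # Sentinels are checked first so 9999.0 is never mistaken for a spike.
    if abs(x - golden_mean) > 5.0 * golden_std:
        return "spike_event"            # plausible arc: keep, but flag
    return "ok"


def is_flatlined(window):
    """True if a window of readings has zero spread (frozen sensor)."""
    return len(window) >= 2 and pstdev(window) == 0.0


# Hypothetical golden statistics for a temperature channel.
labels = [audit_sample(v, 350.0, 2.0)
          for v in (351.0, 9999.0, -999.0, float("nan"), 365.0)]
```

Running the audit before forward-fill keeps a sentinel from being copied forward into a run of identical values that would then look like a stable sensor.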
J-5 On the Job
ML Latency Tiers
Edge (Tool PC)
Budget | 10-50ms (safety hard limit)
Format | ONNX INT8, bare metal, no Docker
Fail-safe | Hardware interlock (PLC)
Fog (APC Server)
Budget | < 2s (SECS/GEM timeout)
Format | < 500MB model, fab intranet
Enterprise
Budget | Hours to days
Format | Spark job, GPU cluster, cloud OK
J-6 On the Job
ML Failure Mode Quick-Ref
R2 drops suddenly | Check PM log first, not data
R2 = 0.99 (too high) | Target leakage, check feature timeline
Rolling baseline flat | Boiling frog. Switch to fixed golden baseline
Feature: facility sensor | SEMI S2 audit. Remove immediately
SMOTE on sensor data | Creates physically impossible samples
Cloud API for arc | Arc destroys transistor in 50ms
Retrain after PM | Encodes broken hardware as normal
Forward-fill sentinel | Flatline kills variance-based FDC
J-7 On the Job
Drift vs. Shift
Signal | Drift (gradual) | Shift (abrupt)
PSI rate | < 0.01/day | > 0.25 in 24h
Sensors | All drift together | One or few step
MES log | No entry | PM ticket exists
CUSUM | Slow accumulation | Immediate alarm
Scope | Fleet-wide pattern | Single chamber
Response | Online learning OK | Change-point detect
Applying online learning to a shift teaches the model that broken hardware is normal. Diagnose before responding.
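For the PSI row, a sketch of the Population Stability Index over pre-binned counts, plus a first-pass triage using the card's two thresholds. The bin counts, the `eps` floor for empty bins, and the triage labels are illustrative assumptions; PSI alone should not decide the response without the MES log and sensor-scope checks from the table.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two histograms with matching bins."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / e_total, eps)  # floor avoids log(0) on empty bins
        pa = max(a / a_total, eps)
        total += (pa - pe) * math.log(pa / pe)
    return total


def triage(psi_24h):
    """First-pass label from the 24-hour PSI, using the card's thresholds."""
    if psi_24h > 0.25:
        return "shift"
    if psi_24h < 0.01:
        return "drift_or_stable"
    return "ambiguous"


# Hypothetical histograms: golden reference vs. the last 24 hours.
reference = [25, 25, 25, 25]
last_24h = [5, 15, 30, 50]
score = psi(reference, last_24h)
label = triage(score)
```

A "shift" label is a prompt to look for a PM ticket and a single-chamber step change before any retraining is considered.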