Reference Cards

11 interview prep cards, 7 on-the-job cards.

I-1 Interview Prep

The Mindset Shift: Atoms, Not Clicks

Data volume: Ad tech: billions of events daily. Fab DS: thousands of wafers.

Cost of error: Missed ad click: $0.001. Escaped fab defect: $50K lot scrap.

Latency budget: Training: hours OK. Inference: <10ms hard deadline at the tool.

Physics constraint: Features must respect thermodynamics. No imaginary temperatures.

Iteration speed: Model deploy: weeks (change control), not minutes.

A/B testing: Costs $50K per run. DoE replaces A/B testing.

You are optimizing atoms, not engagement. Statistical significance takes weeks, not a 30-minute A/B test. Every experimental wafer has a dollar sign attached.

I-2 Interview Prep

The Five Data Domains

FDC: Sensor streams: 1Hz to 100Hz. Temperature, pressure, RF power.

Metrology: Sparse, expensive. ~1 measurement per wafer. CD-SEM, film thickness.

Defect inspection: Images + coordinates. Particles, scratches, pattern bridges.

WAT / ET: Electrical test: threshold voltage, drive current, yield bins.

MES: Tool status, alarms, recipe versions, lot genealogy.

The join problem: FDC (billions of rows) joined to metrology (thousands). Clocks drift.

Your job is almost always joining high-frequency FDC to sparse metrology. The join key is rarely a perfect timestamp; it is Wafer_ID + Step_ID with a tolerance window.
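The tolerance-window join can be sketched in plain Python. This is a minimal illustration, not a production join: the row layout (`wafer_id`, `step_id`, `ts` in seconds, `mean_temp`, `cd_nm`) and the function name are assumptions made for the example, and pandas users would typically reach for `merge_asof` with `by=` and `tolerance=` instead.

```python
from bisect import bisect_left


def join_with_tolerance(fdc_rows, metrology_rows, tolerance_s):
    """Attach the nearest FDC summary to each metrology row.

    Rows match only when wafer_id and step_id agree AND the two
    timestamps differ by at most tolerance_s seconds.
    """
    # Index FDC summaries by (wafer_id, step_id), sorted by timestamp.
    index = {}
    for row in fdc_rows:
        index.setdefault((row["wafer_id"], row["step_id"]), []).append(row)
    for rows in index.values():
        rows.sort(key=lambda r: r["ts"])

    joined = []
    for m in metrology_rows:
        candidates = index.get((m["wafer_id"], m["step_id"]), [])
        times = [r["ts"] for r in candidates]
        pos = bisect_left(times, m["ts"])
        # The nearest neighbour sits just before or just after pos.
        nearby = candidates[max(pos - 1, 0):pos + 1]
        best = min(nearby, key=lambda r: abs(r["ts"] - m["ts"]), default=None)
        if best is not None and abs(best["ts"] - m["ts"]) <= tolerance_s:
            joined.append({**m, "fdc_mean_temp": best["mean_temp"]})
        else:
            # Outside the window: keep the row but mark it unmatched.
            joined.append({**m, "fdc_mean_temp": None})
    return joined


# Hypothetical rows; timestamps are seconds on two slightly skewed clocks.
fdc = [
    {"wafer_id": "W1", "step_id": "ETCH", "ts": 100.0, "mean_temp": 65.2},
    {"wafer_id": "W2", "step_id": "ETCH", "ts": 200.0, "mean_temp": 66.0},
]
metrology = [
    {"wafer_id": "W1", "step_id": "ETCH", "ts": 103.5, "cd_nm": 45.1},
    {"wafer_id": "W2", "step_id": "ETCH", "ts": 260.0, "cd_nm": 44.8},
]
result = join_with_tolerance(fdc, metrology, tolerance_s=5.0)
```

Unmatched rows are kept with a `None` feature rather than dropped, so a clock problem shows up as missing joins instead of silently shrinking the training set.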

I-3 Interview Prep

Virtual Metrology: The Killer App

Problem: Metrology is slow (4-8 hours) and expensive. Cannot measure every wafer.

Solution: Predict metrology values from FDC sensor data before measurement completes.

Benefit: 100% inspection coverage, real-time feedback vs. 4-hour delay.

Component 1: Feature extraction with physical interpretation for each feature.

Component 2: GBT + SHAP for interpretable predictions. Not neural nets.

Component 3: Autoencoder Reliability Index. OOD detection is MANDATORY.

Component 4: Cost-calibrated router: high RI routes to physical metrology.

VM without OOD detection is more dangerous than no VM. One ghost excursion (812 wafers, no film) = $40M+ loss. The Reliability Index is the safety net.
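A minimal sketch of the routing idea, assuming the Reliability Index is the mean squared reconstruction error of an autoencoder (higher means further from the training distribution). The function names and the threshold value are hypothetical; the reconstruction itself would come from whatever autoencoder the fab has validated.

```python
def reliability_index(observed, reconstructed):
    """Mean squared reconstruction error: higher = less like training data."""
    return sum((o - r) ** 2 for o, r in zip(observed, reconstructed)) / len(observed)


def route_wafer(vm_prediction, ri, ri_threshold):
    """Abstain and send the wafer to physical metrology when RI is high."""
    if ri > ri_threshold:
        return ("physical_metrology", None)
    return ("virtual_metrology", vm_prediction)


# Hypothetical feature vectors and autoencoder reconstructions.
ri_in_dist = reliability_index([1.0, 2.0], [1.0, 2.1])
ri_out_dist = reliability_index([1.0, 2.0], [3.0, 0.0])

normal_route = route_wafer(45.1, ri_in_dist, ri_threshold=0.05)
ghost_route = route_wafer(45.1, ri_out_dist, ri_threshold=0.05)
```

The abstaining branch returns no value at all, so a downstream consumer cannot accidentally use a prediction the model was not entitled to make.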

I-4 Interview Prep

The Banned List: What NOT to Say

SMOTE: Creates physically impossible sensor states. Use anomaly detection instead.

Cloud inference: Fab tools are air-gapped. Arc destroys transistor in 50ms. Must be edge.

Rolling baseline: Adapts to the drift it should detect. Use fixed golden baseline.

Random shuffle CV: Temporal leakage. Future wafers in train, past in test. Use walk-forward.

RL for control: Cannot explore states on a $150M scanner. Use bounded R2R / MPC.

Black-box DL: Process engineer cannot interpret. Model will not be deployed.

Retrain after PM: Encodes broken hardware as normal. Investigate, do not adapt.

Forward-fill NaN: Flatlines a dead sensor. FDC sees zero variance, assumes stability.

Explicitly stating why you are NOT using a trendy approach due to physical or safety constraints signals senior-level thinking.
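The walk-forward alternative to random shuffle CV can be sketched as follows. This is an illustrative splitter under the assumption that samples are already sorted by process time; the function name and parameters are made up for the example, and a real split would usually also align fold edges with PM boundaries.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs where training always precedes test.

    Assumes indices 0..n_samples-1 are in chronological order.
    """
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        # Expanding window: every fold trains on all history before its test block.
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
```

Every test index is later than every training index in the same fold, which is exactly the property a shuffled split breaks.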

I-5 Interview Prep

Algorithm Translation: LeetCode to Fab

Endpoint detection: Sequential change-point detection on 1D streaming array.

Tool degradation: CUSUM on fixed golden baseline. NOT rolling window.

Wafer defect maps: Spatial clustering. Moran's I for spatial autocorrelation (clustered vs. random patterns).

Yield optimization: Bayesian optimization or fractional factorial DoE.

Timestamp skew: merge_asof() with tolerance. Never equality join.

Sensor missing data: Sentinel audit first. Frozen sensor = rolling variance = 0.

Fleet monitoring: Hotelling T-squared. Not individual Shewhart per sensor.

VM validation: Walk-forward splits. PM boundaries as natural split points.

Use this to map the interviewer's domain scenario back to the core algorithmic pattern. The fab-specific choice is usually more conservative than the textbook answer.
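As one concrete instance of these mappings, here is a sketch of a two-sided tabular CUSUM against a fixed golden baseline. The slack `k` and decision limit `h` (both in units of golden standard deviations) are common textbook defaults, not values taken from any particular fab, and the function name is an assumption.

```python
def cusum_alarm(samples, golden_mean, golden_std, k=0.5, h=5.0):
    """Return the index of the first two-sided CUSUM alarm, or None.

    golden_mean and golden_std are frozen reference values; they are
    never updated from the stream, so slow drift cannot hide in them.
    """
    upper = lower = 0.0
    for i, x in enumerate(samples):
        z = (x - golden_mean) / golden_std
        upper = max(0.0, upper + z - k)   # accumulates upward deviation
        lower = max(0.0, lower - z - k)   # accumulates downward deviation
        if upper > h or lower > h:
            return i
    return None


# Hypothetical chamber temperature: stable, then a slow upward creep.
stable = [100.0] * 20
creeping = [100.0 + 0.1 * i for i in range(100)]

stable_alarm = cusum_alarm(stable, golden_mean=100.0, golden_std=1.0)
creep_alarm = cusum_alarm(creeping, golden_mean=100.0, golden_std=1.0)
```

The creep never exceeds a 2-sigma excursion before the alarm fires, which is why a per-sample Shewhart limit would stay silent on it.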

I-6 Interview Prep

Confusion Matrix Reality: Cost Calibration

False Negative: Missed defect. Scrap $2.5M lot. Worst case. Minimize at all cost.

False Positive: False alarm. Tool down, metrology delayed 4-8 hours. Expensive but survivable.

FN:FP cost ratio: Typically 5000:1 or higher. Justifies 15% abstention rate.

Threshold logic: Set threshold on FN:FP cost ratio, NOT on F1 or accuracy.

Abstention: VM should abstain when RI is high. "I don't know" > confident wrong.

ROI framing: 15% abstention at $500 each = $60K (implies about 800 wafers). One prevented excursion = $40M+.

Never evaluate a model in a vacuum. Always ask: "What is the operational cost of a FP versus FN for this specific chamber?" Then set the threshold accordingly.
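A small sketch of setting the threshold from the cost ratio rather than from F1. The scores, labels, and function names are invented for illustration; the only idea taken from the card is that the threshold minimizes total FN and FP cost.

```python
def expected_cost(threshold, scores, labels, cost_fn, cost_fp):
    """Total cost when every score >= threshold is flagged as a defect."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fn * cost_fn + fp * cost_fp


def pick_threshold(scores, labels, cost_fn, cost_fp):
    """Try each observed score (plus 'flag nothing') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(t, scores, labels, cost_fn, cost_fp),
    )


# Hypothetical defect scores; label 1 = real defect.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 0, 0, 1]

symmetric = pick_threshold(scores, labels, cost_fn=1, cost_fp=1)
fab_ratio = pick_threshold(scores, labels, cost_fn=5000, cost_fp=1)
```

With symmetric costs the cheapest choice tolerates the low-scoring defect; at 5000:1 the threshold drops until that defect is caught, accepting the extra false alarms.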

I-7 Interview Prep

The Hardware Latency Trap

Environment: Air-gapped 2012 industrial PC, 2GB RAM, no GPU, no Docker.

The event: Plasma micro-arc lasts 50 milliseconds.

Inference budget: < 10ms. Must complete before the arc finishes.

Deployment path: Physical media transfer. Change control: 2-6 weeks.

Banned formats: No Docker, no pip installs, no network calls.

Required format: ONNX export. Static binary. Zero external dependencies.

The answer: CUSUM, EWMA, or pre-quantized ONNX model at the edge.

When designing any real-time system, state your hardware assumptions before naming an algorithm. "Assuming an air-gapped 2012 PC with no GPU..."

I-8 Interview Prep

The SMOTE / Synthetic Data Trap

Scenario: Yield is 99%. Predict the 1% failures (highly imbalanced data).

Bootcamp answer: Use SMOTE to generate synthetic minority class samples.

The physics trap: SMOTE interpolates geometrically between data points in feature space.

The reality: Interpolating between two broken machines creates a non-physical fake.

Fab answer: Treat as anomaly detection. Autoencoder or Isolation Forest.

The framing: Golden baseline = normal. Anomaly = deviation from golden. Simple.

Never generate synthetic sensor data unless using a validated physics-based simulation. Rare defects are anomalies to detect, not a minority class to synthesize.
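One way the interpolation can go wrong, sketched with made-up numbers: two different fault signatures sit on opposite sides of the golden baseline, so the point halfway between them looks perfectly healthy while still carrying the "failure" label. The sensor values, tolerances, and function names are hypothetical, and real SMOTE picks a random point between a minority sample and one of its nearest minority neighbours rather than a fixed midpoint.

```python
def smote_interpolate(a, b, t):
    """SMOTE-style synthetic sample: a point on the segment from a to b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def within_golden(sample, golden_mean, golden_tol):
    """True when every sensor reading is inside its golden tolerance band."""
    return all(
        abs(x - m) <= tol for x, m, tol in zip(sample, golden_mean, golden_tol)
    )


# Hypothetical (pressure_mTorr, rf_power_W) readings for two distinct faults.
leak_fault = (30.0, 480.0)     # pressure high, RF low
match_fault = (10.0, 520.0)    # pressure low, RF high
golden_mean = (20.0, 500.0)
golden_tol = (2.0, 5.0)

synthetic_failure = smote_interpolate(leak_fault, match_fault, 0.5)
looks_healthy = within_golden(synthetic_failure, golden_mean, golden_tol)
```

A classifier trained on this sample is being told that a nominal chamber state is a failure, which is the opposite of what oversampling was meant to achieve.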

I-9 Interview Prep

The XGBoost Extrapolation Trap

Scenario: Predict tool degradation or consumable wear for next 30 days.

Bootcamp answer: Train XGBoost or Random Forest to predict future values.

The trap: Tree-based models CANNOT extrapolate beyond max training value.

The failure: Tool degrades further than any historical example: flat-line prediction.

Fab answer: Linear regression, survival analysis, or CUSUM for degradation.

GBTs ARE right for: VM prediction, FDC classification. Within distribution, not forward.

Never suggest a tree-based model for time-series forecasting where the future state may exceed historical bounds. Degradation always moves toward new territory.
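The flat-line failure can be reproduced without any ML library. The sketch below uses a piecewise-constant regressor as a stand-in for a tree (each "leaf" predicts the mean of its training targets, which is how tree leaves behave) and compares it with an ordinary least-squares line. The wear data and all names are invented for illustration; this is not XGBoost itself.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def fit_leaves(xs, ys, leaf_size):
    """Stand-in for a regression tree: sorted x is cut into constant leaves."""
    pairs = sorted(zip(xs, ys))
    leaves = []
    for i in range(0, len(pairs), leaf_size):
        chunk = pairs[i:i + leaf_size]
        upper_edge = chunk[-1][0]
        leaf_mean = sum(y for _, y in chunk) / len(chunk)
        leaves.append((upper_edge, leaf_mean))
    return leaves


def predict_leaves(leaves, x):
    for upper_edge, leaf_mean in leaves:
        if x <= upper_edge:
            return leaf_mean
    # Past the training range there is no new leaf: reuse the last one.
    return leaves[-1][1]


# Hypothetical consumable wear: 2 units per day over 30 days of history.
days = list(range(30))
wear = [2.0 * d for d in days]

slope, intercept = fit_line(days, wear)
leaves = fit_leaves(days, wear, leaf_size=5)

line_day_60 = slope * 60 + intercept
tree_day_60 = predict_leaves(leaves, 60)
```

The line keeps following the trend to day 60, while the leaf model returns the same value it gave for the last days of training, no matter how far ahead it is asked to look.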

I-10 Interview Prep

The Domain Expert Dynamic

The question: "How do you handle feature engineering for a process you don't understand?"

Bootcamp answer: "I run automated feature selection and let the math decide."

The reality: The math does not know the O-ring melts at 600°C.

Winning answer: "I sit with the process engineer and ask them to draw the physics on a whiteboard."

The dynamic: Data scientists are math translators. Engineers own the physics.

Safety corollary: SEMI S2 audit before feature selection. Life safety sensors never in model.

Process engineers have spent 20 years tuning these chambers. Show extreme humility about the physics. Your model earns their trust; it does not bypass them.

I-11 Interview Prep

Framing Failure: Downside Containment

The question: "Tell me about a time your model failed or drifted."

Bootcamp answer: "My validation F1 score dropped by 2%."

Fab framing: Failure is measured in scrapped wafers, tool downtime, engineering hours.

Winning answer: "The model drifted, but our OOD detection routed to physical metrology: a 2-hour delay instead of $40M in scrap."

The emphasis: Your detection and containment strategy, not the failure itself.

The asymmetry: False negative (escaped defect) is catastrophic. False positive is recoverable.

Always demonstrate you understand the financial asymmetry. Engineers who talk about downside containment sound like production engineers, not notebook writers.

J-1 On the Job

SECS-II Message Anatomy

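Header: 10 bytes: device/session ID (2) + W-bit and stream (1) + function (1) + block number or PType/SType (2) + system bytes (4). The length field precedes the header.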

S1F13/F14: Establish Communication (handshake)

S1F1/F2: Are You There? / On-Line Data

S2F41/F42: Host command send (R2R recipe adjust)

S6F11: Event Report Send (CEID trigger)

S6F1: Trace Data Send (trace set up with S2F23)

S9F3/F5: Unrecognized Stream / Function Type (bad SxFy code)

HSMS: TCP/IP transport; port 5000 is a common convention, and the port is configurable

Some tools respond with SOFTREV set to 8 spaces: strip whitespace before comparison. Legacy SECS-I tools use serial at 9600 baud.
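A sketch of decoding a message header, assuming the HSMS layout (2-byte session ID, one byte holding the W-bit and stream, one function byte, PType, SType, 4 system bytes), with the 4-byte length field already stripped off by the caller. The function names are hypothetical and the example bytes are built locally rather than captured from a tool.

```python
import struct


def parse_hsms_header(header):
    """Decode a 10-byte HSMS message header into its fields."""
    if len(header) != 10:
        raise ValueError("HSMS header must be exactly 10 bytes")
    # Big-endian: session ID, W-bit+stream, function, PType, SType, system bytes.
    session_id, byte2, function, ptype, stype, system_bytes = struct.unpack(
        ">HBBBBI", header
    )
    return {
        "session_id": session_id,
        "w_bit": bool(byte2 & 0x80),   # top bit: reply expected
        "stream": byte2 & 0x7F,        # low 7 bits: stream number
        "function": function,
        "ptype": ptype,
        "stype": stype,                # 0 for a data message
        "system_bytes": system_bytes,
    }


def normalize_softrev(raw):
    """Strip padding so an all-spaces SOFTREV compares equal to empty."""
    return raw.strip()


# Build an S1F13 request (W-bit set) locally for illustration.
example = struct.pack(">HBBBBI", 1, 0x80 | 1, 13, 0, 0, 42)
parsed = parse_hsms_header(example)
message_name = f"S{parsed['stream']}F{parsed['function']}"
```

Masking the W-bit before reading the stream matters: without it, an S1 request that expects a reply would decode as stream 129.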

J-2 On the Job

OEE Quick-Calc

OEE = Availability x Performance x Quality

Availability: Run Time / Planned Production Time

Performance: Actual Output / Theoretical Max Output

Quality: Good Parts / Total Parts

World class: > 85% OEE

Typical fab: 60-75% OEE

Example: 90% x 95% x 99% = 84.6% OEE

A model that improves Quality but slows Performance might lower net OEE. Always compute all three before claiming ROI.
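The quick-calc and the net-OEE warning fit in a few lines. The first call reproduces the card's own example; the "before" and "after" figures in the second comparison are hypothetical numbers chosen to show a Quality gain being outweighed by a Performance loss.

```python
def oee(availability, performance, quality):
    """OEE as the product of three fractions, each between 0 and 1."""
    for name, value in (
        ("availability", availability),
        ("performance", performance),
        ("quality", quality),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return availability * performance * quality


card_example = oee(0.90, 0.95, 0.99)

# Hypothetical: a model lifts Quality 97% -> 99% but costs 3 points of Performance.
before_model = oee(0.90, 0.95, 0.97)
after_model = oee(0.90, 0.92, 0.99)
net_change = after_model - before_model
```

Here `net_change` is negative even though Quality improved, which is the case the card warns about.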

J-3 On the Job

R2R Control Loop Tuning

u[n] = u[n-1] + (lambda / G) * (y_target - y[n])

Lambda: Smoothing weight, 0 < lambda < 1

G: Estimated process gain (slope of recipe-to-output)

Stable when: 0 < lambda * (G_true / G) < 2

Conservative: Start lambda = 0.3, tune up slowly

FF gain: alpha = -beta_upstream / beta_downstream

FF update: u_next = u_nominal + alpha*(x_upstream - x_target)

Unstable R2R amplifies variation. If wafer-to-wafer CD oscillates, reduce lambda before re-tuning G.
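The stability behaviour can be checked with a toy simulation. This sketch assumes a linear process, y = process_gain * u + disturbance, and writes the update as u += controller_step * (target - y), where `controller_step` stands for the whole factor that multiplies the error in the update law. Under that assumption the error is multiplied by (1 - process_gain * controller_step) on every run, so the loop settles only when that product lies between 0 and 2. All names and numbers are illustrative.

```python
def simulate_r2r(process_gain, controller_step, target, disturbance, n_runs, u0=0.0):
    """Simulate run-to-run integral control on a linear process.

    Returns the measured output of each run.
    """
    u = u0
    outputs = []
    for _ in range(n_runs):
        y = process_gain * u + disturbance      # this run's measurement
        outputs.append(y)
        u = u + controller_step * (target - y)  # recipe adjust for next run
    return outputs


# Loop gain = process_gain * controller_step.
settled = simulate_r2r(2.0, 0.15, target=50.0, disturbance=5.0, n_runs=40)      # 0.3
oscillating = simulate_r2r(2.0, 1.10, target=50.0, disturbance=5.0, n_runs=40)  # 2.2

settled_error = abs(settled[-1] - 50.0)
oscillating_error = abs(oscillating[-1] - 50.0)
```

With a loop gain of 2.2 the output overshoots the target by a larger amount on every run, alternating sides, which is the wafer-to-wafer oscillation the card describes.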

J-4 On the Job

Ghost Excursion Diagnostic

9999.0: SECS/GEM timeout sentinel. Drop the sample.

-999.0: Out-of-range sentinel. Drop the sample.

NaN: Sensor death. Drop the feature + ticket.

Flatline: Rolling std = 0. Frozen sensor. Drop + ticket.

> 5-sigma: Plausible arc/spike. Keep, but flag as event.

Rule: Diagnose before imputing. Never forward-fill a sentinel value. Flatline kills variance-based FDC.
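The diagnostic table can be sketched as an audit step that runs before any imputation. The sentinel values and actions come from the card; the function names, the returned action strings, and the golden mean and standard deviation are assumptions for the example.

```python
import math
from statistics import pstdev

SENTINELS = (9999.0, -999.0)  # timeout and out-of-range codes from the card


def audit_sample(value, golden_mean, golden_std):
    """Classify one reading before any imputation is attempted."""
    if math.isnan(value):
        return "drop_feature_and_ticket"
    # Check sentinels before the sigma test: 9999.0 would otherwise
    # be mistaken for a very large spike.
    if value in SENTINELS:
        return "drop_sample"
    if abs(value - golden_mean) / golden_std > 5.0:
        return "keep_flag_event"
    return "keep"


def is_flatline(window, eps=0.0):
    """True when a window of readings shows no spread (frozen sensor)."""
    return len(window) >= 2 and pstdev(window) <= eps


# Hypothetical chuck temperature readings with golden mean 65.0, std 0.5.
readings = [65.2, 9999.0, float("nan"), 70.0, -999.0]
actions = [audit_sample(r, 65.0, 0.5) for r in readings]
frozen = is_flatline([65.0] * 10)
alive = is_flatline([65.0, 65.1, 64.9])
```

A forward-filled sentinel would pass the flatline check as a frozen sensor and the sigma check as "normal", which is why the audit has to come first.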

J-5 On the Job

ML Latency Tiers

Edge (Tool PC)

Budget: 10-50ms (safety hard limit)

Format: ONNX INT8, bare metal, no Docker

Fail-safe: Hardware interlock (PLC)

Fog (APC Server)

Budget: < 2s (SECS/GEM timeout)

Format: < 500MB model, fab intranet

Enterprise

Budget: Hours to days

Format: Spark job, GPU cluster, cloud OK

J-6 On the Job

ML Failure Mode Quick-Ref

R2 drops suddenly: Check PM log first, not data

R2 = 0.99 (too high): Target leakage, check feature timeline

Rolling baseline flat: Boiling frog. Switch to fixed golden baseline

Facility sensor as feature: SEMI S2 audit. Remove immediately

SMOTE on sensor data: Creates physically impossible samples

Cloud API for arc: Arc destroys transistor in 50ms

Retrain after PM: Encodes broken hardware as normal

Forward-fill sentinel: Flatline kills variance-based FDC

J-7 On the Job

Drift vs. Shift

Signal | Drift (gradual) | Shift (abrupt)
PSI rate | < 0.01/day | > 0.25 in 24h
Sensors | All drift together | One or few step
MES log | No entry | PM ticket exists
CUSUM | Slow accumulation | Immediate alarm
Scope | Fleet-wide pattern | Single chamber
Response | Online learning OK | Change-point detect

Applying online learning to a shift teaches the model that broken hardware is normal. Diagnose before responding.
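The PSI signal from the table can be computed from binned counts. This is a sketch of the usual population stability index formula; the bin counts, the `eps` floor for empty bins, and the function names are assumptions. The thresholds are the ones listed on the card, and PSI alone is only one of the signals, so the function reports "inconclusive" for anything in between.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / e_total, eps)   # floor avoids log(0) on empty bins
        pa = max(a / a_total, eps)
        total += (pa - pe) * math.log(pa / pe)
    return total


def classify_psi_24h(psi_value):
    """Map a 24-hour PSI change onto the card's two thresholds."""
    if psi_value > 0.25:
        return "shift_suspected"
    if psi_value < 0.01:
        return "drift_or_stable"
    return "inconclusive"


# Hypothetical sensor histograms: golden baseline vs. last 24 hours.
golden_bins = [25, 25, 25, 25]
same_bins = [25, 25, 25, 25]
stepped_bins = [5, 15, 30, 50]

psi_same = psi(golden_bins, same_bins)
psi_stepped = psi(golden_bins, stepped_bins)
verdict = classify_psi_24h(psi_stepped)
```

A "shift_suspected" verdict is a prompt to check the MES log for a PM ticket before any retraining decision is made.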