Layer 7

Reference

Data schemas, mathematical derivations, the contextual glossary, checkpoint solutions, and the integrated capstone excursion.

Layer 7.1

Data Schema Reference

SECS/GEM message structures

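An HSMS message starts with a 4-byte length field, followed by a 10-byte header and then the data body. Field layout (byte offsets from the start of the message):

Bytes	Field	Type	Notes
0-3	Message length	UInt32, big-endian	Length of header + data body in bytes (the 4 length bytes themselves are not counted)
4-5	Session ID (Device ID)	UInt16, big-endian	Tool identifier in the lower 15 bits; high bit is 0 for data messages
6	Stream byte	UInt8	High bit = W-bit (sender expects a reply); lower 7 bits = Stream number (S)
7	Function byte	UInt8	Function number (F)
8	PType	UInt8	Presentation type; 0 = SECS-II encoding
9	SType	UInt8	Session type; 0 = data message, nonzero = control message (Select, Linktest, Separate)
10-13	System bytes	UInt32, big-endian	Transaction ID used to match a reply to its primary message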
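The stream and function bytes can be decoded with two bit operations. The following is a minimal sketch; the function names `decode_stream_function` and `format_sf` are illustrative and do not come from any SECS/GEM library.

```python
def decode_stream_function(stream_byte: int, function_byte: int) -> tuple:
    """Return (stream, function, w_bit) from the two header bytes."""
    w_bit = bool(stream_byte & 0x80)  # high bit: sender expects a reply
    stream = stream_byte & 0x7F       # lower 7 bits: stream number
    return stream, function_byte, w_bit


def format_sf(stream_byte: int, function_byte: int) -> str:
    """Render the pair in the usual SxFy notation, with ' W' if a reply is expected."""
    stream, function, w_bit = decode_stream_function(stream_byte, function_byte)
    return f"S{stream}F{function}" + (" W" if w_bit else "")


print(format_sf(0x86, 11))  # stream 6, function 11, W-bit set
print(format_sf(0x01, 2))   # stream 1, function 2, W-bit clear
```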

Key message types

S1F1   -  Are You There? (host to tool, connection check)
S1F2   -  I Am Here (tool response)

S6F11  -  Event Report Send (Stream 6, Function 11)
  Body: DATAID (UInt32) | CEID (UInt32) | RPT list
  RPT list: [(RPTID, [SV values]), ...]
  Note: tool-initiated async notification; no reply expected unless W-bit set

S2F41  -  Host Command Send (remote command, e.g. recipe select, process start)
  Body: RCMD (string) | CPLIST [(CPNAME, CPVAL), ...]

S5F1   -  Alarm Report Send (equipment alarm notification)
  Body: ALCD (alarm code) | ALID (alarm ID) | ALTX (alarm text)

S9F1   -  Unrecognized Device ID (error: bad device ID in message header)
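Once an S6F11 body has been decoded, the nested RPT list is usually easier to work with as flat rows. The sketch below assumes the body has already been decoded into Python values; the `EventReport` class, the `flatten` helper, and all IDs and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventReport:
    """Decoded S6F11 body: DATAID | CEID | RPT list."""
    dataid: int
    ceid: int
    reports: list  # [(RPTID, [SV values]), ...]


def flatten(report: EventReport) -> list:
    """One row per status variable: (CEID, RPTID, position in report, value)."""
    rows = []
    for rptid, values in report.reports:
        for position, value in enumerate(values):
            rows.append((report.ceid, rptid, position, value))
    return rows


# Illustrative values only; these IDs are not taken from a real tool.
example = EventReport(dataid=1, ceid=2001,
                      reports=[(10, [350.0, 12.5]), (11, ["STEP_3"])])
for row in flatten(example):
    print(row)
```

Keeping the position within each report matters because S6F11 carries values by order, not by name; the mapping from position to variable comes from the report definition on the host side.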

Interface A (EDA) XML schema fragment

<!-- Simplified EDA data collection plan response -->
<DMTGetDataCollectionPlanResponse>
  <DCP>
    <DCPName>ETCH_MAIN_100HZ</DCPName>
    <Frequency unit="Hz">100</Frequency>
    <Parameters>
      <Parameter>
        <ParameterID>RF_Forward_Power</ParameterID>
        <Units>W</Units>
        <DataType>Float32</DataType>
      </Parameter>
      <Parameter>
        <ParameterID>Chamber_Pressure</ParameterID>
        <Units>mTorr</Units>
        <DataType>Float32</DataType>
      </Parameter>
    </Parameters>
    <Trigger>
      <EventType>RecipeStepStart</EventType>
      <CEID>2001</CEID>
    </Trigger>
  </DCP>
</DMTGetDataCollectionPlanResponse>
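A response shaped like the fragment above can be read with the Python standard library. This is a sketch against the simplified fragment only; a real EDA response is namespaced SOAP and would need more handling. The `parse_dcp` helper and its output keys are illustrative names.

```python
import xml.etree.ElementTree as ET

DCP_XML = """
<DMTGetDataCollectionPlanResponse>
  <DCP>
    <DCPName>ETCH_MAIN_100HZ</DCPName>
    <Frequency unit="Hz">100</Frequency>
    <Parameters>
      <Parameter>
        <ParameterID>RF_Forward_Power</ParameterID>
        <Units>W</Units>
        <DataType>Float32</DataType>
      </Parameter>
      <Parameter>
        <ParameterID>Chamber_Pressure</ParameterID>
        <Units>mTorr</Units>
        <DataType>Float32</DataType>
      </Parameter>
    </Parameters>
    <Trigger>
      <EventType>RecipeStepStart</EventType>
      <CEID>2001</CEID>
    </Trigger>
  </DCP>
</DMTGetDataCollectionPlanResponse>
"""


def parse_dcp(xml_text: str) -> dict:
    """Extract plan name, sampling rate, parameter list, and trigger CEID."""
    dcp = ET.fromstring(xml_text.strip()).find("DCP")
    frequency = dcp.find("Frequency")
    return {
        "name": dcp.findtext("DCPName"),
        "frequency": float(frequency.text),
        "frequency_unit": frequency.get("unit"),
        "parameters": [
            (p.findtext("ParameterID"), p.findtext("Units"), p.findtext("DataType"))
            for p in dcp.iter("Parameter")
        ],
        "trigger_ceid": int(dcp.findtext("Trigger/CEID")),
    }


plan = parse_dcp(DCP_XML)
print(plan["name"], plan["frequency"], plan["frequency_unit"])
for parameter in plan["parameters"]:
    print(parameter)
```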

Layer 7.2

Mathematical Derivations

Partial derivatives and Lagrange multipliers

Lagrange multipliers handle constrained optimization: finding the maximum of f(x) subject to a constraint g(x) = 0. The method introduces a multiplier lambda and solves the system: gradient(f) = lambda * gradient(g), alongside g(x) = 0. In the CMP R2R context, this appears when the recipe optimizer must maximize removal rate uniformity across the wafer subject to a total material removal constraint. The Lagrangian is:

L(P, v, lambda) = Uniformity(P, v) - lambda * (RR(P,v) - RR_target)
where P = downforce, v = relative velocity, RR = removal rate (Preston equation), lambda = Lagrange multiplier
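To see the mechanics on something small, the sketch below solves a toy problem with the same sign convention as the Lagrangian above. It is not the CMP model: the quadratic objective stands in for Uniformity and the linear constraint stands in for RR - RR_target, both chosen only so the stationarity conditions are linear. The `solve_linear` helper is written out so the example needs no external libraries.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Toy problem: maximize f = -(x - 1)^2 - (y - 2)^2 subject to g = x + y - 1 = 0.
# Setting the partial derivatives of L = f - lam * g to zero gives:
#   2x      + lam = 2
#        2y + lam = 4
#   x  +  y       = 1
A = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0]]
B = [2.0, 4.0, 1.0]
x, y, lam = solve_linear(A, B)

grad_f = (-2 * (x - 1), -2 * (y - 2))
grad_g = (1.0, 1.0)
print("x, y, lambda:", x, y, lam)
print("grad f:", grad_f, "lambda * grad g:", tuple(lam * g for g in grad_g))
```

At the solution the two gradients are parallel, which is the condition gradient(f) = lambda * gradient(g) stated above. With the Preston equation as the constraint the system is nonlinear in P and v and would need an iterative solver instead.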

Complex numbers for RF impedance

RF impedance is a complex quantity: Z = R + jX where R is resistance (real part, in Ohms) and X is reactance (imaginary part, in Ohms). The magnitude is |Z| = sqrt(R^2 + X^2) and the phase angle is theta = arctan(X/R). Plasma etch chambers have capacitive reactance (X < 0) during normal operation; inductive behavior (X > 0) indicates a plasma instability or match network failure.

Normal operation:  X < 0 (capacitive), theta in (-90, 0) degrees
Match network saturated: |Z| drifts, theta approaches +/- 90 degrees
Plasma instability: abrupt sign change in X, theta crosses 0
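The magnitude, phase, and sign checks above translate directly into a few lines using Python's complex type. This is a sketch with made-up numbers; `describe_impedance` is an illustrative name, and the sign test is only the capacitive or inductive split described above, not a fault classifier.

```python
import cmath
import math


def describe_impedance(r_ohm: float, x_ohm: float) -> dict:
    """Magnitude, phase angle in degrees, and reactance type for Z = R + jX."""
    z = complex(r_ohm, x_ohm)
    # cmath.phase uses atan2(X, R), which stays well defined as R approaches 0.
    theta_deg = math.degrees(cmath.phase(z))
    if x_ohm < 0:
        kind = "capacitive"
    elif x_ohm > 0:
        kind = "inductive"
    else:
        kind = "resistive"
    return {"magnitude_ohm": abs(z), "theta_deg": theta_deg, "kind": kind}


# Illustrative values, not measurements from a real chamber.
print(describe_impedance(3.0, -4.0))
print(describe_impedance(3.0, 4.0))
```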

Layer 7.4

The Contextual Glossary

Definitions organized by where the term lives in the ISA-95 network hierarchy. Cross-references indicate which layers use each term in analytical context.

Equipment and Physical Layer (Level 0-1)

Electrostatic chuck (ESC) - Level 0

The wafer-holding mechanism inside a process chamber. Uses electrostatic force (applied voltage across a dielectric) to hold the wafer flat against a temperature-controlled pedestal. Helium backside pressure fills the gap for thermal coupling. ESC failure allows the wafer to move or overheat mid-process. See Layer 3.1 (He_BP monitoring) and Layer 6.5 (ESC seal degradation as drift pattern).

Match network - Level 0

The tunable impedance circuit between the RF generator and the process chamber. Adjusts in real time using variable capacitors to maintain 50-Ohm matching, maximizing power transfer to the plasma. A capacitor saturated at its 0% or 100% position means the network can no longer compensate for the plasma load - a fault condition that appears as a sudden shift in RF_Impedance_Real without a corresponding process change. See Layer 4.2 and Layer 7.2.

FOUP (Front Opening Unified Pod) - Level 0-1

The sealed plastic container that holds up to 25 wafers and moves between tools via the AMHS. The tool's load port is the physical handoff point between the AMHS and the tool. Q-Time starts when the FOUP is placed at the load port of the next process step - not when wafer processing begins.

AMHS (Automated Material Handling System) - Level 1

The overhead rail and robotic transport system that moves FOUPs between tools throughout the fab. AMHS dispatch decisions are made by the MES based on queue depth, Q-Time urgency, and tool availability. AMHS transport time (typically 1 to 5 minutes) is included in cycle time but not in process time - a distinction that matters for Little's Law calculations.

Process Control Layer (Level 2)

FDC (Fault Detection and Classification) - Level 2

The server that collects high-frequency trace data from tools via Interface A and applies real-time statistical process control. Generates alarms when sensor data deviates from a trained reference model. Distinct from R2R: FDC stops bad processes; R2R corrects drifting ones. The two systems run in parallel on overlapping data streams. See Layer 3.3 for the full FDC architecture.

APC (Advanced Process Control) - Level 2-3

The server that hosts R2R controllers and VM models. Receives metrology results from the MES and sends recipe adjustments to tools via SECS/GEM S2F41 (Host Command Send). In some fab configurations the APC server is a separate physical machine; in others it is a module within the MES. The APC server is typically on the fab floor network (not air-gapped) but separated from the process tool control network by a one-way data diode.

Interface A (EDA) - Level 2

The high-frequency data streaming interface defined by the SEMI EDA (Equipment Data Acquisition) standards. Provides a read-only Ethernet port separate from the SECS/GEM control port, streaming sensor data at 100 Hz to 10 kHz. Used for FDC trace collection and VM feature extraction. The 100 Hz table in the FDC historian comes from Interface A; the 1 Hz table comes from SECS/GEM. See Layer 7.1 for the XML schema.

Manufacturing Execution Layer (Level 3)

MES (Manufacturing Execution System) - Level 3

The software system that tracks every wafer's location, routing history, recipe assignments, and metrology results. Common vendors: Applied Materials Automation, Siemens Opcenter, PROMIS. The MES dispatches lots to tools based on priority rules and communicates recipe requirements via SECS/GEM. The MES is the authoritative source for lot genealogy trees and the join key between process telemetry and metrology results.

WIP (Work in Progress) - Level 3

Unfinished wafers currently on the factory floor. The financial value of a WIP lot equals the cumulative cost of all process steps completed so far. WIP at risk is the quantity processed after a failure event before the failure is detected and lots are put on hold. The goal of FDC and VM systems is to minimize WIP at risk during excursions. See Layer 5.2 for Little's Law.

Q-Time (Queue Time limit) - Level 3

The maximum allowable time between two process steps. Expiration results in mandatory scrap because the wafer surface has degraded beyond the specification for the next step (typically oxidation of exposed silicon). Q-Time limits are enforced by the MES. Q-Time violations corrupt training labels and must be filtered from ML training sets. See Layer 6.5.

SAH (Send-Ahead wafer) - Level 3

A protocol where one wafer from a lot is processed first, measured, and its result reviewed before the remainder of the lot is released to the tool. Eliminates WIP at risk at the cost of throughput. Typically used for new recipes, tool qualification, or recovery from a known process excursion. The SAH wafer result feeds directly into the R2R controller as the first data point for the next recipe adjustment.

POR (Process of Record) - Level 3

The approved production recipe for a given process step and product. Changes to the POR require passing through a formal change control process with experimental justification reviewed by the Yield Review Board. Data scientists proposing recipe changes based on ML optimization must present to this board. The POR is versioned; the recipe version must be logged with every inference to enable traceability back to what process conditions the model was predicting against.

Layer 7.6

Checkpoint Solutions

Solution: Layer 3.3 Checkpoint, FDC fault signature interpretation

Problem recap

PCA model with k=8 components retaining 88% variance across 40 sensors. Wafer 1,247 shows a high Q alarm with 78% of contribution from RF_Impedance_Real and RF_Impedance_Imag. T-squared = 2.3, well below the UCL of 22.4. Interpret the physical meaning.

What high Q with low T-squared means physically

High Q indicates the observation cannot be well-reconstructed from the retained 8 principal components. Something changed in the sensor correlation structure that the PCA model's retained components do not describe. Low T-squared means the observation is not unusual in the directions the model knows about (the principal component subspace): it is unusual in the directions the model discarded (the residual subspace). The fault is in the correlation structure, not in the magnitude of known variation.

Why RF impedance specifically

RF_Impedance_Real and RF_Impedance_Imag are the real and imaginary parts of the plasma load as a complex number. In normal operation they are strongly correlated: as chamber chemistry changes, both components shift in a predictable ratio determined by the plasma physics. If RF_Impedance_Real shifts without a corresponding shift in RF_Impedance_Imag (or vice versa), the complex impedance has decoupled from normal behavior. This decoupling is not captured in the retained principal components (which describe the normal correlated variation) but is captured in the Q-statistic (which describes the residual from that normal structure).
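The high-Q, low-T-squared pattern can be reproduced with a two-sensor toy model. The sketch below is not the 40-sensor, 8-component model from the checkpoint: it assumes two autoscaled sensors (standing in for the real and imaginary impedance) and one retained component along their shared direction. The loading vector and eigenvalue are assumed values.

```python
import math

# Toy PCA model: two autoscaled sensors, one retained component.
# Both constants are assumptions for illustration.
LOADING = (1 / math.sqrt(2), 1 / math.sqrt(2))  # real and imag move together
EIGENVALUE = 1.9                                # variance of the retained score


def t2_and_q(x):
    """Return (T-squared, Q) for one observation under the toy model."""
    score = sum(xi * pi for xi, pi in zip(x, LOADING))          # projection onto the PC
    t_squared = score ** 2 / EIGENVALUE                         # distance inside the model
    residual = [xi - score * pi for xi, pi in zip(x, LOADING)]  # part the PC cannot explain
    q = sum(r ** 2 for r in residual)                           # squared distance off the model
    return t_squared, q


correlated = (2.0, 2.0)   # both parts shift together: normal variation, just large
decoupled = (1.5, -1.5)   # parts shift in opposite directions: correlation broken
print("correlated:", t2_and_q(correlated))
print("decoupled: ", t2_and_q(decoupled))
```

The correlated observation lands almost entirely in T-squared with a residual near zero, while the decoupled one has a score of zero and all of its deviation in Q. That is the same split the checkpoint describes, reduced to two dimensions.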

Physical investigation priority

The decoupling of the real and imaginary components of RF impedance, with no shift in the overall operating mode (low T-squared), is most consistent with a change in the match network's tuning behavior. Specifically: a capacitor in the match network that is beginning to fail will shift one component of the impedance independently of the other, breaking the normal correlation. Recommended action: request a match network capacitor diagnostic on the tool that processed Wafer 1,247 before the next lot starts.

Solution: Layer 4.8 Checkpoint, die bin revenue calculation

Problem recap

Three bins: H (800 dies, +$40 premium over S), S (6,700 dies, nominal), L (2,500 dies, -$15 discount vs. S). Compute revenue impact vs. baseline policy of assigning all wafers to S-bin product. Identify the most costly misclassification cell.

Baseline revenue (relative, S-bin = $0)

Without any binning model, all 10,000 dies sell at S-bin price. The actual population contains 800 H-bin dies that could earn +$40 each and 2,500 L-bin dies that should be discounted -$15 each. Baseline policy treats all as S-bin, capturing zero premium on H-bin and absorbing zero discount on L-bin (by selling L-bin dies as S-bin, which satisfies the S-bin spec, but leaves L-bin quality signal unused).

Model value

A perfect binning model captures the H-bin premium: 800 x $40 = $32,000 per wafer lot in additional revenue. It also identifies L-bin dies so they are not shipped as S-bin (avoiding warranty claims). By price difference alone, the most costly misclassification cell is H-bin dies classified as S-bin (false negatives on the H-bin class): each such die costs $40 in lost premium revenue. The second most costly by price difference is L-bin dies shipped as S-bin, at $15 each, although each also risks a customer warranty return costing far more than the price difference.

Decision threshold implication

The asymmetric cost structure means standard accuracy optimization is wrong. The threshold for calling a die H-bin should be set conservatively (high confidence required) because incorrectly shipping an L-bin die as H-bin to a premium customer costs far more than missing a few H-bin premiums. Use a cost-weighted confusion matrix with H-misclassification penalty = max(warranty_cost, $40) when selecting the decision threshold.
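A cost-weighted comparison of two candidate thresholds might look like the sketch below. The bin prices come from the problem recap; the warranty cost and both sets of confusion counts are hypothetical, and applying the max(warranty_cost, $40) penalty to any non-H die called H is one reading of the rule above.

```python
H_DIES, H_PREMIUM = 800, 40.0   # from the problem recap
WARRANTY_COST = 500.0           # ASSUMPTION: hypothetical cost of one warranty return
FALSE_H_PENALTY = max(WARRANTY_COST, H_PREMIUM)

# Dollars per die, keyed by (true bin, predicted bin). Unlisted cells cost nothing here.
COST = {
    ("H", "S"): H_PREMIUM,        # missed premium
    ("S", "H"): FALSE_H_PENALTY,  # under-spec die shipped as premium
    ("L", "H"): FALSE_H_PENALTY,
}


def total_cost(confusion: dict) -> float:
    """Sum of count x per-die cost over all confusion-matrix cells."""
    return sum(n * COST.get(cell, 0.0) for cell, n in confusion.items())


# Hypothetical outcomes for two H-bin thresholds (counts are illustrative).
loose = {("H", "H"): 780, ("H", "S"): 20, ("L", "H"): 10}
strict = {("H", "H"): 700, ("H", "S"): 100, ("L", "H"): 0}

print("perfect-model H premium:", H_DIES * H_PREMIUM)
print("loose threshold cost:   ", total_cost(loose))
print("strict threshold cost:  ", total_cost(strict))
```

With these assumed numbers the strict threshold makes more errors (100 versus 30) yet costs less, which is why accuracy alone is the wrong selection criterion.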

Capstone

An Integrated Yield Excursion

The following scenario is a composite of documented excursion patterns from advanced logic fabs. Details have been modified. At each phase, identify which layer of knowledge the response team was drawing on and which gaps caused the initial delay.

The situation

A 3 nm logic fab is manufacturing a high-performance AI processor in production ramp, currently at 74% yield with a target of 88% for volume production in seven months. FDC, R2R, and VM models are deployed across all critical process steps. On Tuesday morning the weekly yield report shows a drop from 74% to 41% across all lots completed in the previous 36 hours.

600 wafers affected. At $50,000 per wafer, $30M of product is at risk. Every FDC chart is green. Every R2R controller shows normal adjustment history.

Phase 1

Spatial diagnosis (20 minutes)

Layer 4.4 (DBSCAN, coordinate systems) | Layer 4.2 (CMP ring signature physics)

The first action is not to query the sensor database. It is to look at the spatial distribution of failing dies. Pulling end-of-line electrical test data and applying the affine coordinate transformation from Layer 4.4 (converting tester die indices to wafer-level millimeter coordinates, corrected for notch orientation), then running DBSCAN, reveals a consistent ring pattern at 95 to 110 mm from wafer center across all affected lots. A ring at this radius is the spatial signature of a rotating mechanical tool - specifically CMP. The suspect list narrows from 180 tools to the CMP modules in 20 minutes.
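The coordinate step can be sketched as follows. Die pitch, center indices, and notch angle are hypothetical parameters that would come from the product's wafer map definition, and the function names are illustrative. DBSCAN itself is not shown; a simple radial test stands in for the clustering so the example stays dependency-free.

```python
import math


def die_to_wafer_mm(col, row, pitch_x, pitch_y, center_col, center_row, notch_deg=0.0):
    """Affine map from tester die indices to wafer-centered mm coordinates."""
    x = (col - center_col) * pitch_x  # scale and translate
    y = (row - center_row) * pitch_y
    a = math.radians(notch_deg)       # rotate into a common notch orientation
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


def in_ring(x_mm, y_mm, r_min=95.0, r_max=110.0):
    """True if the die center falls inside the suspect annulus."""
    return r_min <= math.hypot(x_mm, y_mm) <= r_max


# Hypothetical map: 10 mm x 10 mm dies, wafer center at index (15, 15).
for col, row in [(25, 15), (18, 15), (15, 5)]:
    x, y = die_to_wafer_mm(col, row, 10.0, 10.0, 15, 15)
    print((col, row), (x, y), in_ring(x, y))
```

Because the radius is invariant under rotation, a ring survives a wrong notch correction, but angular patterns do not, so the rotation still matters when comparing lots loaded at different orientations.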

Phase 2

Chamber identification (90 minutes)

Layer 3.1 (MES join patterns) | Layer 3.3 (FDC negative result) | Layer 2.4 (lot genealogy)

The MES lot history is queried, joining all 600 affected wafers to their CMP step assignments. Chamber CMP-07 processed 78% of the failing wafers during a continuous 14-hour window on Sunday night. The FDC historian for CMP-07 is pulled. Standard SPC charts show green throughout. Removal rate mean is within 3-sigma limits. Within-wafer uniformity is within spec. None of the monitored parameters explains a 74% to 41% yield drop. This is the point where less experienced engineers return to the sensor database and look harder. It is the wrong move. The monitored parameters have already returned negative. The missing information is in a parameter that was not monitored.
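Once the join has produced a wafer-to-chamber mapping, ranking chambers by their share of failing wafers is a short aggregation. The sketch below uses a hypothetical four-wafer mapping; `chamber_share` is an illustrative name, and in practice this would more likely be a GROUP BY in the MES query itself.

```python
from collections import Counter


def chamber_share(wafer_to_chamber: dict) -> list:
    """Return [(chamber, fraction of failing wafers)], largest share first."""
    counts = Counter(wafer_to_chamber.values())
    total = sum(counts.values())
    return [(chamber, n / total) for chamber, n in counts.most_common()]


# Illustrative join result: failing wafer ID -> CMP chamber that processed it.
joined = {"W01": "CMP-07", "W02": "CMP-07", "W03": "CMP-07", "W04": "CMP-03"}
for chamber, share in chamber_share(joined):
    print(chamber, f"{share:.0%}")
```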

Phase 3

Root cause identification

Layer 4.2 (CMP Preston equation, carrier mechanics) | Layer 3.3 (contribution plot) | Layer 1.x (rotating tool pattern recognition)

The equipment engineer opens CMP-07. The wafer carrier shows a 1.4 mm offset from nominal center position on the carrier rotation axis - a bearing wear signature. A bearing wear offset changes the radial velocity distribution at the wafer surface: the pad contact velocity at radius r from wafer center is no longer axially symmetric. Regions that align with the direction of the offset experience higher contact velocity and therefore higher removal rate (Preston equation). The increased removal rate at 95 to 110 mm produces excessive copper dishing in the interconnect layer, which causes open circuits in the dense wiring at that radius. The ring location at 95 to 110 mm is consistent with the geometric calculation to two significant figures. The FDC charts were green because motor current change from a 1.4 mm offset is approximately 2%, within the normal variance of the monitored signal at the configured alarm threshold.

Phase 4

Disposition and recovery

Layer 4.4 (spatial yield estimation) | Layer 2.2 (AEC-Q100 traceability) | Layer 5.3 (ROI calculation)

Of the 600 affected wafers, the spatial yield model from Phase 1 estimates which dies are likely functional. The ring at 95 to 110 mm covers approximately 22% of the die area. If yield degradation is spatially contained to that ring, inner dies (radius less than 90 mm) and outer dies (radius greater than 115 mm) may still be functional. The lot genealogy tree (Layer 2.4) is used to check automotive customer allocations. Six lots totaling 150 wafers are AEC-Q100 automotive-allocated. AEC-Q100 does not permit known-compromised process history even if final electrical test passes - the lot history itself fails traceability requirements. Decision: continue processing all 600 wafers to final test. The 150 automotive wafers represent $7.5M in unrecoverable scrap. The 450 non-automotive wafers recover approximately $16.8M of the original $22.5M at risk.
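The area and dollar figures can be checked with a few lines of arithmetic. The sketch assumes a 300 mm wafer (150 mm radius) and ignores edge exclusion and die granularity; neither assumption is stated in the scenario. Under those assumptions the bare 95 to 110 mm annulus is about 14% of the wafer area, and the quoted figure of roughly 22% is closer to the guard-banded 90 to 115 mm annulus.

```python
WAFER_RADIUS_MM = 150.0  # ASSUMPTION: 300 mm wafer, edge exclusion ignored
WAFER_VALUE = 50_000     # $ per wafer, from the scenario


def annulus_fraction(r_inner, r_outer, r_wafer=WAFER_RADIUS_MM):
    """Fraction of wafer area between two radii (the factor of pi cancels)."""
    return (r_outer ** 2 - r_inner ** 2) / r_wafer ** 2


ring = annulus_fraction(95.0, 110.0)          # failing ring as observed
guard_banded = annulus_fraction(90.0, 115.0)  # ring plus 5 mm margin on each side

automotive_scrap = 150 * WAFER_VALUE
non_automotive_at_risk = 450 * WAFER_VALUE
recovered_fraction = 16.8e6 / non_automotive_at_risk  # recovery quoted in the scenario

print(f"ring: {ring:.1%}, guard-banded: {guard_banded:.1%}")
print(f"automotive scrap: ${automotive_scrap:,}")
print(f"non-automotive at risk: ${non_automotive_at_risk:,}")
print(f"recovered fraction: {recovered_fraction:.1%}")
```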

Phase 5

Systemic fix

Layer 3.3 (CUSUM configuration) | Layer 3.6 (data lineage package) | Layer 2.4 (corrective action documentation)

The corrective action configures daily CUSUM monitoring on carrier motor current slope for all CMP tools. The CUSUM is anchored to a rolling 30-day baseline updated only after bearing inspection and qualification events - not after every maintenance event - to avoid the baseline contamination problem from Layer 3.3. The VM model for CMP removal rate uniformity is retrained to include carrier motor current as a feature, so future drift of this parameter appears in the model's uncertainty output before the physical effect reaches the wafer.

The 14-hour window during which 470 wafers processed through the failing chamber occurred on Sunday night. The on-call engineer who received the FDC alert at 2 AM had reviewed the standard SPC charts, found nothing, and cleared the alert as a false alarm. The engineer was correct that the SPC charts showed nothing unusual. The error was not diagnostic; it was architectural: the monitoring system had no visibility into the physical mechanism that was failing.
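The CUSUM monitoring in the corrective action could be sketched as a one-sided tabular CUSUM over daily values. This is a generic textbook form, not the fab's configuration: the slack `k`, the decision interval `h`, the baseline statistics, and the sample series are all assumed values.

```python
def cusum_upper(values, baseline_mean, baseline_std, k=0.5, h=5.0):
    """One-sided tabular CUSUM for upward drift.

    k (slack) and h (decision interval) are in units of baseline_std.
    Returns the index of the first alarm, or None if no alarm is raised.
    """
    s = 0.0
    for i, x in enumerate(values):
        z = (x - baseline_mean) / baseline_std  # standardize against the frozen baseline
        s = max(0.0, s + z - k)                 # accumulate only excess beyond the slack
        if s > h:
            return i
    return None


# Illustrative daily motor-current slope values, already in baseline sigma units.
# Every point is inside +/- 3 sigma, so a standard 3-sigma chart would stay green.
daily = [0.2, -0.1, 0.3, 1.5, 1.6, 1.4, 1.7, 1.5, 1.6]
print("first alarm index:", cusum_upper(daily, baseline_mean=0.0, baseline_std=1.0))
```

The baseline mean and standard deviation are passed in rather than recomputed from the incoming data, which mirrors the point above about updating the baseline only after inspection and qualification events.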

End of Field Manual

You have completed all seven layers. The algorithms, schemas, and failure modes in this manual were chosen because they appear repeatedly in production fab data science roles. The cases are composites of real incidents.

The gap between knowing this material and being effective in a fab is approximately one production excursion. The excursion will teach you things this manual cannot: the pace of a live investigation, the politics of presenting a root cause to a Fab Director, and the specific way your fab's data infrastructure diverges from the canonical patterns described here.

The manual closes the knowledge gap so you can survive the first excursion long enough to learn from it.
