Because technology never stops
Effective Environmental Test Technology in New Product
Introduction
May 13, 2009
Romano [email protected]
Reliability Program at Teradyne
2009
• Formal reliability program
• Full-time reliability engineers (5)
• Negligible retrofit
• Environmental testing during NPI
• 2 lab facilities
  • North Reading, MA: 3 HALT/HASS chambers, 2 walk-in, 1 humidity, 2 ESS
  • Agoura Hills, CA: 2 HALT chambers
Before 2000
• No formal reliability program
• No reliability engineers
• High retrofit cost due to field reliability
• No environmental testing
• No lab facility
• More complex designs in the pipeline
Types of Reliability Testing
HALT: Identify operating and destruct limits (engineering tool).
HASS: Limits derived from HALT. Process screen to identify process and component variation (NPI).
POS (Proof of Screen): Verify that the HASS profile does not damage good hardware and does not take too much life.
ARG (Accelerated Reliability Growth): Verify early-life reliability. Aging process.
ESS (Environmental Stress Screen): Used for process validation of new components or materials.
TBH (Temperature Bias Humidity): Used for qualifying new material or processing. Mostly for leakage failure modes.
Failure Analysis
• Random locations
• No damage to components
• All failures occurred after 3-6 months of operation
• Burned failures across multiple board types
• Most likely cause: the PCBs
Formed a QIT with the supplier
What’s a Cabosil Particle and Why is it a Problem?
• Cabosil is a thickening material used in the formation of Megtron and G-Tek.
• Cabosil is extremely hygroscopic.
• Cabosil is normally filtered to less than 75 microns.
So what's the problem?
• If the filtration system fails and larger particles make their way into the material, you get a defect between copper planes that provides a path for the copper to migrate.
• If there is moisture present this migration will happen faster.
• If there is a high voltage gradient (> 5 Volts/mil) it will happen faster yet.
TBH Chamber
Purpose: Evaluate PCB material
TBH equipment: Blue M chamber, power supply, bare PCBs
Typical profile: 85C, 85% RH, up to 4 weeks
Need to measure results at regular intervals
TBH RESULTS
TBH is ideal for accelerated conductive anodic filament (CAF) formation experiments.
Four bare boards were put in a chamber at 55ºC and 85% relative humidity.
The power planes were biased at 7.5 volts (5 volts/mil rule of thumb).
Current was limited with a 100K resistor in series.
The Result: Shorts appeared in just 48 hours. These boards should be able to provide a smoking gun for root cause analysis.
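The bias conditions above reduce to simple arithmetic. The following is a minimal sketch, assuming a hypothetical 1.5 mil spacing between the planes (the slides do not state the spacing) and treating a CAF short as a dead short, so that the series resistor alone sets the worst-case current.

```python
# Sketch of the TBH bias arithmetic. The plane spacing is a hypothetical
# value chosen for illustration; it is not given in the slides.

BIAS_VOLTS = 7.5               # bias applied to the power planes (from the slide)
SERIES_RESISTOR_OHMS = 100e3   # 100K current-limiting resistor (from the slide)


def voltage_gradient(volts: float, spacing_mils: float) -> float:
    """Return the voltage gradient in volts per mil across a dielectric."""
    if spacing_mils <= 0:
        raise ValueError("spacing must be positive")
    return volts / spacing_mils


def worst_case_current_amps(volts: float, resistor_ohms: float) -> float:
    """Current through the series resistor if the planes short completely."""
    return volts / resistor_ohms


assumed_spacing_mils = 1.5  # hypothetical plane-to-plane spacing
gradient = voltage_gradient(BIAS_VOLTS, assumed_spacing_mils)
short_current = worst_case_current_amps(BIAS_VOLTS, SERIES_RESISTOR_OHMS)

print(f"gradient: {gradient:.1f} V/mil")
print(f"short-circuit current: {short_current * 1e6:.0f} uA")
```

Under these assumptions the resistor caps the current through a shorted site at 7.5 V / 100 kΩ, regardless of the spacing chosen.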
The Smoking Gun!
After weeks of grinding, polishing and looking, we finally got a picture of copper migration through a crystal between the power and ground planes. This is proof positive that the short is the result of conductive anodic filament (CAF) formation. The remaining mysteries are why it happens so fast and what the crystal is. Normal CAF takes years to form, while this CAF forms in a few months (hours in a chamber).
Current Use of Humidity Chamber
Evaluate new PCB materials
Evaluate new PCB supplier
Evaluate leakage on sensitive instruments (picoamps)
Evaluation of contamination related issues
WHAT IS HALT?
• HALT is Highly Accelerated Life Test
• Not an indicator of MTBF
• It is a process to quickly identify potential design, supplier and manufacturing problems by:
  • Subjecting a system to step stresses (vibration, temperature, voltage margining, etc.)
  • Precipitating hard failures
  • Treating soft failures as margin improvement opportunities
  • Investigating root cause for each failure
  • Implementing solutions that improve product reliability
  • Verifying that design fixes work, and that the fixes didn't insert new problems
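The step-stress idea can be sketched in a few lines of Python. This is an illustrative simulation only: the `run_diagnostics` callback, the step size, and the simulated failure temperatures are hypothetical stand-ins for a real chamber and diagnostic suite.

```python
from typing import Callable, Optional, Tuple

# Outcome codes returned by the (hypothetical) diagnostic callback.
PASS, SOFT_FAIL, HARD_FAIL = "pass", "soft", "hard"


def step_stress(
    start: float,
    stop: float,
    step: float,
    run_diagnostics: Callable[[float], str],
) -> Tuple[Optional[float], Optional[float]]:
    """Raise the stress in fixed steps until a hard failure or the stop level.

    Returns (operating_limit, destruct_limit): the first level with a soft
    failure and the first level with a hard failure, or None if not reached.
    """
    operating_limit = None
    destruct_limit = None
    level = start
    while level <= stop:
        outcome = run_diagnostics(level)
        if outcome == SOFT_FAIL and operating_limit is None:
            operating_limit = level
        elif outcome == HARD_FAIL:
            destruct_limit = level
            break  # unit is damaged; stop and investigate root cause
        level += step
    return operating_limit, destruct_limit


def simulated_unit(temp_c: float) -> str:
    """Stand-in for real diagnostics: soft fails at 100C, hard fails at 130C."""
    if temp_c >= 130:
        return HARD_FAIL
    if temp_c >= 100:
        return SOFT_FAIL
    return PASS


limits = step_stress(start=30, stop=140, step=10, run_diagnostics=simulated_unit)
print(limits)
```

In practice each soft or hard failure found this way would be debugged to root cause before stepping further, as the list above describes.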
HALT UNCOVERS DESIGN LIMITS
HALT tests beyond the product specification to:
1) Identify design margins (operating limits)
2) Identify potential field failures (destruct limits)
[Figure: applied-stress axis centered on the product spec, with the lower and upper operating limits (soft failure) beyond it and the lower and upper destruct limits (hard failure) beyond those.]
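One way to express these limits numerically is sketched below. The limit values are made up for illustration; real values come out of HALT, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StressLimits:
    """Limits for one stress axis (for example temperature in degrees C)."""
    spec_low: float
    spec_high: float
    operating_low: float
    operating_high: float
    destruct_low: float
    destruct_high: float

    def operating_margin(self) -> tuple:
        """Margin between the product spec and the operating limits."""
        return (self.spec_low - self.operating_low,
                self.operating_high - self.spec_high)

    def classify(self, stress: float) -> str:
        """Name the region of the applied-stress axis a level falls in."""
        if stress <= self.destruct_low or stress >= self.destruct_high:
            return "hard failure"
        if stress <= self.operating_low or stress >= self.operating_high:
            return "soft failure"
        if self.spec_low <= stress <= self.spec_high:
            return "within spec"
        return "design margin"


# Hypothetical temperature limits, not taken from the slides.
temp = StressLimits(spec_low=0, spec_high=50,
                    operating_low=-30, operating_high=100,
                    destruct_low=-50, destruct_high=130)

print(temp.operating_margin())
print(temp.classify(75))
print(temp.classify(110))
```

The wider the gap between the spec and the operating limits, the more design margin the product has against process and component variation.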
HALT - BE CAREFUL HOW YOU TEST
• Identify physical limitations that will prevent testing to the HALT stress limits:
  • Lower temp limit -50ºC, or as close as possible
  • Upper temp limit +140ºC, or as close as possible
    • With fast temperature transitions, both up and down
  • Upper vibration limit 60 to 80 Grms
  • Combined stresses of temperature, vibration and voltage margining
  • Add any other special test conditions, such as electrical noise or with jitter present
HALT Planning
• Stress only the instrument/board of interest
• Card cages are OK as a fixture but need to be fortified for vibration
• Diagnostic software
  • Needs high fault coverage
  • Ideally the same software as Final Test
• Resources available for live debug
HALT SET UP
Instrument tied to chamber floor. Data bus, control and power connected via umbilical cable.
After a few months of experiments, the HALT fixture was improved.
HALT Mitigation
Problem: The DC-DC Converters will not run above 85ºC when testing the power board. The rest of the board can run to 140ºC chamber temperature.
Solution: Cool the DC-DC converters with compressed air so their temperature stays below 80ºC while the rest of the board is tested.
Cool cover for test
Typical HALT Failures
Design issues
• Timing (mostly with FPGAs)
• Voltage offsets
• Voltage margin
• Underrated components
Software and diagnostics
• Timing
• Wrong limits
• Must debug ALL soft failures
Manufacturing
• Solder issues
• Components not meeting spec
Example of HASS Proof of Screen (POS) that Failed
Precipitation profile
• HFE: -35C to +65C
• Air temp dwell at each temp: 17 min
• Vibration: 30 Grms continuous
[Figure: precipitation profile; labels: Vibe, HFE, L1, L2, L3]
• POS is successful when 20 or more failure-free HASS profiles are executed
• POS stopped at 11 profiles due to the high number of failures of the same Device A
  • 1st failure during the 2nd profile
  • 9 total failures after 11 profiles
Device A was the only component type that failed (CTE mismatch).
A design solution is required, or all instruments may fail within 5 years.
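The POS pass criterion can be written down directly. The following is a minimal sketch, assuming each HASS profile run is recorded as the number of failures seen in that profile. The per-profile counts are invented for illustration and chosen only so that they agree with the totals quoted above (first failure in the 2nd profile, 9 failures after 11 profiles); the early-stop decision is not modeled.

```python
REQUIRED_FAILURE_FREE_PROFILES = 20  # pass criterion from the slide


def pos_passed(failures_per_profile: list) -> bool:
    """Return True if enough profiles ran and none of them produced a failure.

    Assumption: "20 or more failure-free profiles" is read as at least 20
    executed profiles with zero failures overall.
    """
    enough_profiles = len(failures_per_profile) >= REQUIRED_FAILURE_FREE_PROFILES
    return enough_profiles and sum(failures_per_profile) == 0


def first_failing_profile(failures_per_profile: list):
    """Return the 1-based index of the first profile with a failure, or None."""
    for index, failures in enumerate(failures_per_profile, start=1):
        if failures > 0:
            return index
    return None


# Hypothetical per-profile failure counts for the 11 profiles that were run.
device_a_run = [0, 1, 0, 1, 1, 0, 2, 1, 1, 0, 2]

print(pos_passed(device_a_run))
print(first_failing_profile(device_a_run))
print(sum(device_a_run))
```

A run of 20 clean profiles, for example `pos_passed([0] * 20)`, would satisfy the criterion under this reading.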
Example of HASS Proof of Screen (POS) that Failed
• 1 Gig TSOP memory failed during the 2nd profile (cracked solder)
• Multiple devices failed after 17 profiles (18)
• Large die in the device caused CTE mismatch
• Stress occurred during the reflow process
Underfill was added to the manufacturing process. Life tests were performed to ensure a minimum 10-year life.
Product Screen - HASS
Purpose
• Evaluate process variation
• Maintain margins
• Pilot use (30-50 boards)
HASS equipment
• HALT/HASS chamber
• LN2
• Vibration
• HASS fixture
• TBU
Typical profile
• Precipitation (-30C to 70C, 20 Grms)
• Detection (0C and 55C, 5 Grms)
Not ideal for time-dependent failure modes and drift
Liquid-cooled HASS setup
Process Validation - Life Test (ESS)
Purpose
• Evaluation of new technology/process
• Design of experiments
Typical profile for solder joint evaluation
• Minimum 1000 cycles
• Range from 0C to 100C
• Matrix measures up to 1024 nodes
• Data-log results on cold and hot cycles
• Can detect a 200 milliohm change
Past projects
• TSOP underfill evaluation (POS failure)
• Device A perimeter bonding (POS failure)
• SON4 evaluation
Planned
• Memory channel card
• Digital Pet board
• RoHS (lead-free conversion)
Controller
Test setup and work space
Device A Solder Joint Reliability Results
1) Reliability testing in HLA

Test # | Test | Thermal cycle | Test status | 1st failure | 50% failure
1 | HLA #11 with cold plate | -25 to +70C | Stopped at 139 cycles with 81% failing | 79 | 126
2 | HLA #12 with cold plate | 0 to +70C | Stopped at 239 cycles with 73% failing | 148 | 215
3 | HLA #17 with Black "Resin Lab" epoxy | -25 to +70C | Stopped at 315 cycles | ? | ?

2) Thermal study of Device A solder joint temperature running "real world" applications found:
• Case 1: ΔT = 18C for worst-case temperature swing due to power cycling
• Case 2: ΔT = 2C for worst-case temperature swing running application programs

3) Product life, based on the reliability testing and customer use conditions, is estimated as follows (Condition 1: 5 power cycles/week; Condition 2: 2 power cycles/week):

Test | Acceleration factor, Case 1 (90% conf. level) | Acceleration factor, Case 2 (90% conf. level) | Projected failures (Condition 1) | Projected failures (Condition 2)
HLA with no changes to QFN (current state) | 17.5 | N/A | 5 yrs | 12 yrs
HLA with Black "Resin Lab" epoxy | 17.5 | N/A | >20 yrs | >20 yrs
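A rough sketch of how a chamber result might be translated into field life, using the acceleration factor of 17.5 and the two power-cycling conditions listed above. The conversion used here (test cycles times acceleration factor, divided by field cycles per year) is a simplifying assumption for illustration and is not necessarily how the projections on the slide were derived. Using the 79-cycle first failure of test 1 as the input is likewise an assumption.

```python
WEEKS_PER_YEAR = 52


def field_life_years(test_cycles: float,
                     acceleration_factor: float,
                     power_cycles_per_week: float) -> float:
    """Convert chamber cycles to years of field use.

    Assumes each field power cycle does 1/acceleration_factor of the damage
    of one chamber cycle, and that power cycling is the only damage source.
    """
    field_cycles = test_cycles * acceleration_factor
    return field_cycles / (power_cycles_per_week * WEEKS_PER_YEAR)


ACCELERATION_FACTOR = 17.5   # value listed on the slide
FIRST_FAILURE_CYCLES = 79    # test 1 first failure, used here as an example

for label, cycles_per_week in (("Condition 1", 5), ("Condition 2", 2)):
    years = field_life_years(FIRST_FAILURE_CYCLES, ACCELERATION_FACTOR,
                             cycles_per_week)
    print(f"{label}: {years:.1f} years")
```

The point of the sketch is the sensitivity to use conditions: fewer power cycles per week stretches the same chamber result over proportionally more years.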
QFN Daisy Chain Solder Joint Reliability Results (ESS)
Failure times are from Weibull models for each life test.
• Sn plated leads (tests 1 & 2) performed the worst. Every HLA would fail.
• NiPdAu plated leads (tests 3 & 4) improved reliability but do not meet the 10-year life requirement (6.8 years). Not much better.
• New mold compound with NiPdAu leads (test 5) improved reliability and meets the 10-year life requirement (17 years). Package delamination killed this option.
• The perimeter epoxy did not fail and exceeds 20-year life. In use today.

Test # | Test | Status | 1st failure | 10% failure | 50% failure | 63.2% failure | % imp. from 50%
1 | Tin (Sn) lead plating & solder mask removed in middle area | All parts failed by cycle 94 | 28 | 46 | 69 | 74 | 30%
2 | Tin lead plating (baseline) | All parts failed by cycle 89 | 21 | 36 | 53 | 57 | NA
3 | Nickel Palladium Gold (NiPdAu) lead plating | All parts failed by cycle 183 | 92 | 119 | 146 | 152 | 175%
4 | NiPdAu lead plating & solder mask removed in middle area | All parts failed by cycle 161 | 75 | 102 | 131 | 137 | 147%
5 | NiPdAu lead plating & new mold compound (7730) | 56% of the parts failed by cycle 346 | 201 | 276 | 354 | 372 | 568%
6 | NiPdAu lead plating & perimeter epoxy (EP1325) | 530 cycles and no failures; 1000 cycles - no failures | | | | |
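Because the 63.2% failure point of a Weibull distribution is its characteristic life (eta), the 50% and 63.2% figures are enough to back out the shape parameter (beta) and cross-check the other figures. The sketch below assumes a two-parameter Weibull, F(t) = 1 - exp(-(t/eta)^beta), which the slides do not state explicitly, and uses the baseline tin-plated test (test 2) as input. The function names are illustrative.

```python
import math


def weibull_shape(t50: float, eta: float) -> float:
    """Solve F(t50) = 0.5 for beta, given the characteristic life eta."""
    return math.log(math.log(2.0)) / math.log(t50 / eta)


def weibull_time_to_fraction(fraction: float, eta: float, beta: float) -> float:
    """Return the time at which the given fraction of parts has failed."""
    return eta * (-math.log(1.0 - fraction)) ** (1.0 / beta)


def percent_improvement(t50: float, baseline_t50: float) -> float:
    """Improvement in 50% failure time relative to the baseline, in percent."""
    return 100.0 * (t50 - baseline_t50) / baseline_t50


# Baseline (test 2) values from the results above, in cycles.
BASELINE_T50 = 53
BASELINE_ETA = 57   # 63.2% failure point

beta = weibull_shape(BASELINE_T50, BASELINE_ETA)
t10 = weibull_time_to_fraction(0.10, BASELINE_ETA, beta)

print(f"beta = {beta:.2f}")
print(f"10% failure at about {t10:.1f} cycles")  # the results list 36
print(f"test 3 improvement = {percent_improvement(146, BASELINE_T50):.1f}%")
```

The same `percent_improvement` calculation applied to the other 50% failure values (69, 131, 354) gives figures that round to the 30%, 147% and 568% reported for tests 1, 4 and 5.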
Reliability Validation - Burn-in, ARG and ELT
Purpose
• Time-dependent failures
• Software bugs
• Product stability
• Minimize DOA and early-life failures
System requirements
• Tester platform
• Insulated room ($5K)
• Heating/cooling control ($1K)
Process
• TPM controlled
• Temp set to 50C
• Daily power cycle
• Formal tracking system
Test duration
• 48 hrs - Burn-in
• 3-4 wks - ARG (Accelerated Reliability Growth)
• 6 months - ELT (Early Life Test)
Costa Rica
North Reading
Example of Accelerated reliability testing results (As of 10/31/08)
[Figure: Monthly Yield Chart. Left axis: # of boards (0 to 100). Right axis: yield (0% to 100%). Series: Boards, Fails, Yield.]
• Tested 373 PCBAs
• 40728 run hrs
• 340 48-hr burn-in, 6 units ARG, 2 units ELT
• 45 failure modes (44 hardware / 1 software)
• 140 hardware faults
• 6 software faults
• 146 total failures
Final Comments
Environmental testing is key for reliability engineering
Tools are complementary and should be used appropriately
Testing must be a discovery tool, not just a screen
Goal is to stop in-production testing when yields are high (reliability growth)
Reliability growth is a function of how quickly failures are understood and eliminated