A BEGINNER'S GUIDE TO HALT

Testing with HALT by Chris Peterson

There are always beginners in any field. The economic conditions over the past year or so have changed the business climate. Many are opting to take early retirement if offered. Because of massive layoffs, people are being asked to do work that they have never done before. Mergers may mean that you suddenly own equipment that you don't know how to use. And there will always be people who are new to a field by choice, perhaps fresh out of college or transferring from another department.

Working with HALT is an exciting challenge. I've been deeply involved with the process for over a decade now. I meet a lot of engineers who have a very good understanding of the process, but lately I have also been meeting a lot of people who don't know very much about it and would like to learn more. There is also a crowd who would rather debate semantics than get down to work.

Let me start by saying that there is no one right way to run HALT (Highly Accelerated Life Testing). Let's take a minute to review what HALT is, and what it isn't.

HALT is: An excellent tool for learning what types of failures you can expect to see during the lifetime of a product.
HALT isn't: The only tool in the toolbox.

HALT is: A way of finding the absolute limits of a product.
HALT isn't: A pass/fail test. You are expecting it to fail; that is the reason for the test.

HALT is: Similar to classical ESS (Environmental Stress Screening) tests, but taken through to the failure stage (suggested as the first test with ESS, though many choose to skip that).
HALT isn't: As slow as ESS. It purposely tests the product hard.

HALT is: A way of saving money on warranty costs by catching possible failures ahead of time.
HALT isn't: A guarantee of any given dollar amount saved. That depends upon the product.

HALT is: A process that allows you to understand your product more fully.
HALT isn't: Magic that fixes everything.

Usefulness of Accelerated Testing

No matter how you look at it, business is tough right now. Customers are expecting more for less, whether that customer is a personal consumer, a government agency, or another department within your own business. People want high reliability, low cost, the latest technology, and something that will last.

Let's face it: it's difficult to please everyone. You need to beat your competitor to market, but make sure that your product is going to last. How do you go about doing this?

One of the easiest ways to address most of these issues is to accelerate the testing process. You can't afford a 20-year test to see if a light bulb is going to last 20 years in your neighbor's kitchen. You need to speed things up.

You can make the light bulb fail simply by pushing it off a table. It doesn't teach you much, though, other than that gravity still works. What is needed is a logical approach that accelerates time, turning the test environment into a kind of time machine that shows you which failures are bound to happen. There are computer programs and scientific models that can help you correlate test results with expected lifetime, but the main point is that you will see which failures are most likely to happen out in the field. You should be able to find out within a matter of days what you might otherwise not have found out for years.
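One widely used example of such a formula is the Arrhenius model, which estimates how much faster a temperature-driven failure mechanism proceeds at a higher temperature. The Python sketch below is an illustration under assumptions, not part of the HALT procedure described here: the 0.7 eV activation energy and the 25°C and 85°C temperatures are hypothetical inputs, and a real activation energy depends on the specific failure mechanism. As discussed under No magic bullet, HALT results on their own usually cannot be turned into a lifetime figure.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Acceleration factor of a stress temperature relative to a use temperature.

    ea_ev is the activation energy of the failure mechanism (an assumed input).
    """
    t_use = use_temp_c + 273.15        # convert Celsius to kelvin
    t_stress = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


# Illustrative only: 0.7 eV is an assumed activation energy, not a measured one.
af = arrhenius_acceleration_factor(0.7, use_temp_c=25.0, stress_temp_c=85.0)
print(f"acceleration factor ~ {af:.0f}")
print(f"1000 h at 85 C ~ {1000 * af:.0f} h at 25 C for this one mechanism")
```

The factor only applies to mechanisms that really are driven by temperature in this way; vibration and thermal-cycling fatigue need different models.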

Why is this important? One major company expense is warranty costs. Every time that Ford Motor Company finds a failure and fixes the weakness before the part can get into a car out on the road, their accounting department comes back and tells them how much Engineering just saved the company. You can literally save yourself millions by catching something early. It also builds customer confidence. You win by saving money, and you also make more money by getting repeat business.

Starting Temperature

The best way to start the test is at laboratory ambient. Why, you may wonder, do I use that term? I am a member of several working groups for the IEC (International Electrotechnical Commission). We have found that the term ambient has, over the years, come to mean different things to different people. In the older definition, ambient is the chamber temperature around the product at any given time. However, most people use it to mean the temperature of the room they are testing in. By using the phrase laboratory ambient, I am trying to make sure that people understand to start their testing at about room temperature. I don't want anyone to start from whatever temperature the last test happened to leave the chamber at. Better safe than sorry!

Remember to constantly monitor the product. Ford Electronics, before becoming Visteon, reported publicly that 50% of failures were intermittent and would not have been caught without constant monitoring. It's not good enough just to take a reading at the beginning and a reading at the end.
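A short sketch shows why. It scans a continuously logged signal against pass/fail limits and compares the result with checking only the first and last readings. The supply-voltage readings, the 4.75 V to 5.25 V limits, and the function name are hypothetical; your own limits come from how you define a failure.

```python
def find_out_of_spec(samples, low, high):
    """Return (index, value) for every logged sample outside [low, high]."""
    return [(i, v) for i, v in enumerate(samples) if not (low <= v <= high)]


# Hypothetical supply-voltage log, one reading per sample interval.
log = [5.01, 5.00, 4.99, 3.10, 5.00, 5.02, 4.98, 5.01]
low, high = 4.75, 5.25  # assumed pass/fail limits

# Checking only the beginning and end of the test.
first_last_ok = all(low <= v <= high for v in (log[0], log[-1]))
# Checking every logged sample.
excursions = find_out_of_spec(log, low, high)

print("start/end check passes:", first_last_ok)  # the dropout at sample 3 is missed
print("excursions found by continuous monitoring:", excursions)
```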

A Series of Tests

Keeping in mind that there is no one perfect way to run HALT, here are a few basic points. HALT is actually a series of tests. You should have more than one unit available for the testing, preferably one for each test in the series.

From the purist standpoint, the best way to start is by testing with single environments, then running combined environments for comparison. These are the six standard tests that we recommend, though they could be done in other ways: cold step, heat step, vibration only, thermal swings, heat with vibration, and thermal swings with vibration.

Others, of course, can be added. If you are concerned about lower temperatures, you may want to add cold with vibration. If you are worried about humidity, you should do a humidity-only test and then start combining it with other environments such as thermal and vibration. Power cycling can also be very beneficial. Before the tests even start, you should be thoroughly familiar with the product to be tested. You should know as much as possible about the end environment it will be placed in, then test accordingly. Remember: end users are always harder on a product than you expect them to be, and they will expect it to keep working anyway.

Step by step

Once you've decided which environment, how do you start? The key is knowing your product. Is it as small as a PCB? Is it as big as a tank? How long will it take to stabilize at a temperature? Are some components more likely to fail than others?

Along these lines, how do you know if your product is stabilized at temperature? Let's use the example of what I will be testing today. It is a console for a van, roughly the same width as the van's ceiling. There is no way that you could choose one representative spot on it and assume that the entire unit is at the same temperature. The simple rule of thumb is: the larger the product, or the more diverse its components, the more thermocouples you need. You will still control the chamber from one main thermocouple. This should be placed where it is either representative of the entire unit or on the most sensitive component, depending upon your main concern.
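A rule like this can be checked automatically. The sketch below treats the product as stabilized only when every thermocouple is within a tolerance of the setpoint and has stopped drifting over its last few readings. The 2°C tolerance, 0.5°C drift limit, five-reading window, thermocouple names, and temperatures are all assumptions for illustration, not values from the test described here.

```python
def is_stabilized(readings, setpoint_c, tolerance_c=2.0, max_drift_c=0.5, window=5):
    """readings maps each thermocouple name to its logged temperatures, oldest first.

    Stabilized means that, for every thermocouple, the last `window` readings are
    all within tolerance_c of the setpoint and span no more than max_drift_c.
    """
    for temps in readings.values():
        recent = temps[-window:]
        if len(recent) < window:
            return False  # not enough data logged yet
        if any(abs(t - setpoint_c) > tolerance_c for t in recent):
            return False  # still too far from the setpoint
        if max(recent) - min(recent) > max_drift_c:
            return False  # still drifting
    return True


# Hypothetical console with three thermocouples during a -20 C step.
readings = {
    "control_tc":   [-19.0, -19.6, -19.8, -19.9, -20.0, -20.0],
    "center_panel": [-16.0, -19.3, -19.5, -19.6, -19.7, -19.7],
    "far_end":      [-5.0, -9.0, -12.5, -15.0, -16.8, -18.1],
}
print(is_stabilized({"control_tc": readings["control_tc"]}, setpoint_c=-20.0))
print(is_stabilized(readings, setpoint_c=-20.0))
```

Checking only the control thermocouple would declare the unit ready while the far end of the console is still cooling, which is the situation the extra thermocouples are there to catch.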

All of the tests should start at laboratory ambient, typically somewhere around 25°C. Fixture your product, if necessary, and hook up any wiring that needs to be in place. Keep in mind that your wiring or cabling will be seeing the same extremes as your product!

What is a failure?

Different companies will judge failures differently. I went to one company where they had product set aside because of pinpoint scratches in the paint job. To them, that was considered enough of a failure in the product that they refused to ship it out to a customer.

To some, the first intermittent failure is as far as they want to go. Others will want to get all of the way to a hard failure. This is something that you should keep in mind when you are planning your test - what will you consider as a failure?

The tests

Now that we've reviewed the main points that you need to think about before the test, let's take a look at the tests themselves. It's important to know that there is more to it than just pushing the Run button on the computer. You've already seen that you have to think things through in advance. We'll take a look at the different considerations for each of the tests, and the reasons for running them.

Cold step test

Cold tends to be the least destructive of all of the single environments, which makes it a good starting point for testing.

You can choose the size of the steps based upon your knowledge of the product. If, for instance, you are concerned about the effects of cold on it, you should make the steps smaller. If you are looking for baseline data, you may want to start with larger steps, then make them smaller if you have a premature failure. With cold, many engineers are comfortable starting with steps of 10°C per minute change rate.

Once you've decided on your ramp rate, you need to decide your dwell time. How long should your product stay at temperature before ramping again? The prevailing opinion is that you keep the dwell time to the minimum needed to stabilize the product. For something like a PCB, this may be only five minutes. If you are testing an assembly it may need to be longer. Again, you need to use your own expertise to decide.
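Once you have chosen a step size, a dwell time, and a limit, laying out the step profile itself is simple. The sketch below builds the list of setpoints starting from laboratory ambient. The 25°C start, 10°C step, 10-minute dwell, and -100°C chamber floor are illustrative assumptions; the limit you use should come from your own chamber and product. A positive step gives the heat step test.

```python
def step_profile(start_c, step_c, dwell_min, limit_c):
    """Return (setpoint_c, dwell_min) pairs stepping from start_c toward limit_c.

    Use a negative step_c for a cold step test and a positive one for heat.
    The limit itself is included only if a step lands on it exactly.
    """
    if step_c == 0:
        raise ValueError("step_c must be non-zero")
    profile = []
    temp = start_c
    # Keep stepping while we have not passed the limit in the step direction.
    while (step_c < 0 and temp >= limit_c) or (step_c > 0 and temp <= limit_c):
        profile.append((temp, dwell_min))
        temp += step_c
    return profile


cold = step_profile(start_c=25, step_c=-10, dwell_min=10, limit_c=-100)  # assumed values
for setpoint, dwell in cold:
    print(f"{setpoint:>5} C for {dwell} min")
```

In practice you stop at the first failure you care about rather than running the whole list, and you may shrink the step once you get close to a known weak point.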

Heat step test

The heat test is very similar, just going the other direction. It is based on the same principles as the cold step test. First, settle in at laboratory ambient, then begin ramping and dwelling. Heat tends to be more destructive than cold, so you may choose to raise your temperature only 5°C per minute instead of going faster.

I had the chance to work with a company once whose product was going to end up in a hospital environment. Their first hard failure came at only 2°C above laboratory ambient. At first they weren't concerned, reasoning that hospitals are air conditioned and so they should never have to worry about the surrounding air getting too warm. I let them know of an extended hospital stay that I had where the temperature was controlled based on time of year, and the heating automatically kicked in because of the date, not because it was needed. Patient rooms ended up at close to 30°C. The customer reworked the board, and their unit is now number one in the market.

Moral: Make sure that you have a gap between the expected conditions of use and what your product can actually survive. Picture the worst-case scenario, and then add a percentage. To paraphrase Murphy's Law: if it can go wrong, it will go wrong.
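The margin rule can be put in numbers using the hospital example. The sketch assumes laboratory ambient was 25°C, so the first hard failure at 2°C above ambient is taken as 27°C. It takes the worst-case room temperature as 30°C and uses an assumed 20% margin. The percentage is a design choice, not a figure from that project, and the function names are hypothetical.

```python
def required_limit(worst_case, margin_fraction):
    """Worst-case use condition plus a percentage margin (rule of thumb)."""
    return worst_case * (1.0 + margin_fraction)


def has_margin(measured_limit, worst_case, margin_fraction):
    """True if the product survived at least the worst case plus margin."""
    return measured_limit >= required_limit(worst_case, margin_fraction)


lab_ambient_c = 25.0                         # assumed laboratory ambient
first_hard_failure_c = lab_ambient_c + 2.0   # "only 2 C above laboratory ambient"
worst_case_room_c = 30.0                     # patient rooms reached close to 30 C
margin = 0.20                                # assumed 20% margin

print("target limit:", required_limit(worst_case_room_c, margin), "C")
print("original board has margin?", has_margin(first_hard_failure_c, worst_case_room_c, margin))
```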

Vibration only test

You've gotten through the easy tests, heat and cold. You've monitored and datalogged. You've made any changes that you feel are necessary. Now you are ready for vibration.

Once again you will start with the temperature at laboratory ambient. We do this to make sure that temperature is not going to affect the test. It may not seem like a real-world situation, but right now we are concentrating purely on the vibration.

Vibration test levels are normally expressed in g. Here a g, there a g: where do we measure the g from?

If you measure from the bottom of the vibration table, what you are really reading is what the table is doing. This may have very little to do with what your product is doing. The best placement for the accelerometer is on or near your product. Some prefer to attach the accelerometer to the fixture holding the product in place. This is perfectly acceptable. If there is no easy way to attach it to the product or fixture, then mount it on the table top near the fixture. This will give at least a close approximation of what the product is seeing.

Vibration, as opposed to temperature, needs to be handled in very small increments. It can be difficult to control very tightly, so we suggest starting at 2 g and moving up 2 g at a time. The dwell time, again, is up to you and your knowledge of your product. There should be ample settling time, so ten minutes is often used. You continue stepping up until you see what you consider to be a failure, again monitoring and data logging as you go.
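The stepping logic for the vibration test can be sketched as a loop that raises the level until the product no longer meets your failure criteria. Here `passes_at` is a hypothetical stand-in for running one dwell and checking your monitored data. The 2 g start and step match the suggestion above, while the 60 g ceiling is an assumed table limit.

```python
def find_vibration_limit(passes_at, start_g=2.0, step_g=2.0, max_g=60.0):
    """Step vibration up until the product fails your criteria.

    passes_at(level_g) is a hypothetical callback that runs one dwell at
    level_g and returns True if the monitored data met your criteria.
    Returns (last passing level, first failing level); the second value is
    None if no failure was seen up to max_g.
    """
    last_pass = None
    level = start_g
    while level <= max_g:
        if not passes_at(level):
            return last_pass, level
        last_pass = level
        level += step_g
    return last_pass, None


# Simulated product that stops working at 10 g (a stand-in for a real test run).
last_good, first_fail = find_vibration_limit(lambda g: g < 10)
print(last_good, first_fail)
```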

Combinations

Now comes the fun part. You're done with the boring steps. You've already learned more about your product, and about the responses of your other team members. ("You're doing what to my design?")

Thermal swing test

You've already done cold, heat, and vibration. The next logical test is thermal swings, combining the heat test with the cold test.

Set the chamber to begin at laboratory ambient and allow the product to settle in. Decide whether you would rather go hotter or colder first. Adding nitrogen to the chamber air will help to get rid of any latent humidity in the product, so this should be part of the consideration.

Decide on the step size, typically either 5 or 10°C at a time, and decide upon your dwell time. Then get started.
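One way to lay out a swing profile is to alternate hot and cold setpoints around laboratory ambient, widening the swing by one step on each cycle. This is a sketch under assumptions: the 10°C step, the number of cycles, and the 120°C and -80°C chamber limits are illustrative. Whether you go hotter or colder first is your call, so the order is a parameter.

```python
def swing_profile(ambient_c, step_c, cycles, hot_first=True,
                  hot_limit_c=120.0, cold_limit_c=-80.0):
    """Alternate hot and cold setpoints that widen by step_c on each cycle.

    Starts at laboratory ambient. Setpoints are clipped to the chamber
    limits, which are assumed values here; use your own chamber's ratings.
    """
    profile = [ambient_c]
    for n in range(1, cycles + 1):
        hot = min(ambient_c + n * step_c, hot_limit_c)
        cold = max(ambient_c - n * step_c, cold_limit_c)
        profile.extend([hot, cold] if hot_first else [cold, hot])
    return profile


print(swing_profile(ambient_c=25, step_c=10, cycles=3))
print(swing_profile(ambient_c=25, step_c=10, cycles=3, hot_first=False))
```

The dwell at each setpoint and any nitrogen purge are left to the chamber program; the list only fixes the order and size of the swings.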

Note: It is not unusual for a product to fail at extremely low temperatures, then come back to working order once more when warmed up. If you find a failure during the cold portion of this test, it is advisable to go on to the next hot step and find out if it starts working again.
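The recovery check in the note above can be recorded systematically. A minimal sketch, assuming you log a simple works/doesn't-work result at each setpoint in test order: a failure that clears at any later step is labeled recoverable, and one that never clears is labeled hard. The labels and the log are hypothetical.

```python
def classify_failures(results):
    """results: list of (setpoint_c, works) pairs in test order.

    Returns (setpoint_c, label) for each setpoint where the unit failed:
    'recoverable' if it worked again at any later step, otherwise 'hard'.
    A failure at the final step is labeled 'hard' because there is no
    later step to show recovery.
    """
    labels = []
    for i, (setpoint, works) in enumerate(results):
        if works:
            continue
        recovered = any(later_ok for _, later_ok in results[i + 1:])
        labels.append((setpoint, "recoverable" if recovered else "hard"))
    return labels


# Hypothetical swing test: the unit stops at -45 C and -55 C but recovers when
# warmed; after -65 C it never works again.
log = [(25, True), (45, True), (-45, False), (55, True),
       (-55, False), (65, True), (-65, False), (75, False)]
print(classify_failures(log))
```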

There are valid reasons for doing the swing test. Differences in thermal expansion coefficients between materials are what can cause parts to pull away from each other, sometimes causing cracks. By applying these changes as fast as possible, we are stressing the unit beyond what it would normally see. There are people who claim that you will never see a temperature change of over 30°C per minute in most circumstances, but consider the following:

I live in Michigan. We are not the coldest winter state, but it is very typical to get a wind chill of -40°C at least once every winter. Say that my car gets stuck in the snow within a mile or so of home. I decide to walk (we Michiganders are a hardy breed).

My cell phone just went from around 25°C to -40°C as I hold it to my ear to call my husband and tell him of my poor luck. Since he is out on the road, I have to pull out my trusty palm computing device to get his number.

I get home, shivering, turn the heat up and stand over the heater - still using my trusty phone. The phone and computer warm back up to 25°C with the hot air rushing against them. By this point, not only have the electronics been stressed but so have I! I'm sure that you can think of other severe circumstances that we tend to put electronics products through.
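The point about thermal coefficients can be made concrete with a first-order estimate: the mismatch strain between two bonded materials is roughly the difference in their coefficients of thermal expansion (CTE) times the temperature change. The sketch uses the 25°C to -40°C swing from the cell phone example. The CTE values are approximate typical figures assumed for illustration (about 2.6 ppm/°C for silicon and about 17 ppm/°C in-plane for an FR-4 board), and the 10 mm length is hypothetical; check your own material data.

```python
def mismatch_strain(cte_a_ppm, cte_b_ppm, delta_t_c):
    """First-order thermal mismatch strain: |alpha_a - alpha_b| * |delta T|."""
    return abs(cte_a_ppm - cte_b_ppm) * 1e-6 * abs(delta_t_c)


def mismatch_displacement_um(strain, length_mm):
    """Difference in free expansion between the two materials over length_mm, in micrometres."""
    return strain * length_mm * 1000.0


SILICON_PPM = 2.6   # approximate, assumed
FR4_PPM = 17.0      # approximate in-plane value, assumed

delta_t = -40.0 - 25.0  # phone goes from 25 C to -40 C
strain = mismatch_strain(SILICON_PPM, FR4_PPM, delta_t)
print(f"strain ~ {strain:.2e}")
print(f"over 10 mm ~ {mismatch_displacement_um(strain, 10.0):.1f} um")
```

Faster swings do not change this first-order number, but they make the parts change temperature at different rates, which adds transient stress on top of it.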

Heat with vibration test

Let's do some shake and bake. Not dinner - although it's been done! It's time to use heat and vibration together to test your product out.

What do you choose for your vibration level? Let's say that you know that your product showed a failure at 10 g in the previous vibration-only test. Since you already know that this is the breaking point, you don't need to go right up to it. You are trying to learn what will happen when you combine thermal factors with the vibration.

Most of the engineers that I have worked with choose a level that is about 80% of the breaking point. In this case that would mean 8 g.

What's the best way to do a combined heat and vibration test? Now you get to rely on your expertise once again. It is best to follow the original heat test closely, once again starting at laboratory ambient. That will give you a guide as to where you would expect your product to show a heat-related failure. If the failure temperature was extremely high, you may want to skip some of the lower temperatures. Say that your product didn't show a failure until 110°C: you may want to settle in at 60°C or so and start your steps from there.

The example in the above chart shows ten-minute dwells, with vibration (below the failure level) run for two minutes out of every five. Pulsing can cause failures, but so can constant running. It is up to you to decide the best way to run the test.
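A minute-by-minute schedule for this kind of combined test can be generated from the single-environment results. This sketch takes its numbers from the text: a 10 g vibration-only failure run at 80% (8 g), a start at 60°C for a product that failed heat-only at 110°C, and ten-minute dwells with vibration on for two minutes out of every five. The 10°C step, stopping at 110°C, and the function name are assumptions; the combined failure will typically come somewhat lower, and you stop there.

```python
def heat_vib_schedule(start_c, step_c, stop_c, dwell_min, vib_fail_g,
                      vib_fraction=0.8, on_min=2, period_min=5):
    """Return (setpoint_c, minute_in_dwell, g_level) for every minute of the test.

    Vibration runs at vib_fraction of the vibration-only failure level for the
    first on_min minutes of every period_min minutes; g_level is 0.0 otherwise.
    """
    vib_g = vib_fraction * vib_fail_g  # e.g. 80% of 10 g = 8 g
    schedule = []
    temp = start_c
    while temp <= stop_c:
        for minute in range(dwell_min):
            g = vib_g if minute % period_min < on_min else 0.0
            schedule.append((temp, minute, g))
        temp += step_c
    return schedule


sched = heat_vib_schedule(start_c=60, step_c=10, stop_c=110, dwell_min=10, vib_fail_g=10.0)
first_dwell = [g for temp, _, g in sched if temp == 60]
print(first_dwell)
```

Setting `on_min` equal to `period_min` gives constant vibration instead of pulsing, so both options mentioned above can be compared with the same code.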

This is where comparison gets interesting. The typical result is that the product will fail at a slightly lower temperature once vibration is added than it did with heat alone. There are always exceptions to every rule. This particular test should tell you a lot about how your product will survive combined stresses, which is what it will no doubt see in real-life use.

Thermal swings with vibration test

Using the data that you learned from your heat plus vibration test, apply the same principles to your thermal swings plus vibration. Again, you may want to skip ahead a number of steps if your failure was quite a ways from laboratory ambient.

I learned something very interesting from a machine design magazine article. Some companies are using extremely cold temperatures to de-stress equipment that will go into a high vibration area. Typically we learn that anything that we do to a product will end up taking life out of it. After applying the principles found in this article, I ran a few experiments with customers with totally different types of products. In each case, by combining vibration with cold temperatures, we found that they could actually withstand more vibration than they could at ambient. You may find this same result while you are doing your swing test with vibration. You may also want to add a cold test with vibration if your product is bound to have to survive low temperatures.

Now what?

You've got all of these numbers and failure times. Now what do you do? What do you want to do?

There's no reason to build a brick outhouse. (This is a family magazine - I'm sure that you're familiar with the more common phraseology.) If your product has lived up to what it needs for real life, plus a reasonable margin, you may not want to do anything to change it. Microsoft, for instance, after doing testing found that they could extend their warranty from one year to three on their mouse and still not worry about warranty issues.

If you don't end up breaking something, that doesn't mean that you have failed. It most likely means that you have an extremely tough product ready for market. If you do find a failure, think of it as a good thing. HALT is a learning tool. You are looking for a premature end of life to your product. It allows you to learn something quickly that might have taken months, and a lot of warranty replacements, to learn otherwise.

No magic bullet

If you do a HALT test, will you somehow be able to tell exactly how long the life of the product will be when put out on the market? Probably not. Unless you already have field failures on the same product, or one almost identical, it can be very hard to make a true time correlation. If you have field failures and can reproduce them in the lab under certain stresses and a given amount of time, the answer is yes. In that case you can make a good correlation.
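When you do have a field failure that you can reproduce in the lab, the simplest correlation is an empirical ratio between field time and lab time for that same failure mode. The sketch below is a hypothetical illustration with made-up numbers and hypothetical function names. The ratio only holds for the failure mode and stress profile it was measured under.

```python
def empirical_acceleration(field_hours, lab_hours):
    """Ratio of field time to lab time for the SAME reproduced failure mode."""
    if lab_hours <= 0:
        raise ValueError("lab_hours must be positive")
    return field_hours / lab_hours


def estimate_field_hours(lab_hours_new_design, acceleration):
    """Apply a previously measured ratio to a new lab result (same mode, same stresses)."""
    return lab_hours_new_design * acceleration


# Hypothetical: a crack seen after about 8,760 h (one year) in the field
# was reproduced after 12 h of the same lab stress profile.
af = empirical_acceleration(field_hours=8760.0, lab_hours=12.0)
print(f"acceleration ~ {af:.0f}x")

# Hypothetical redesign survives 36 h of the same profile before the crack appears.
print(f"estimated field life ~ {estimate_field_hours(36.0, af):.0f} h")
```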

Is a rapid thermal cycling chamber with tri-axial vibration the only chamber I will ever need again? If that were so, I'd be a millionaire. I have a hammer, screwdriver and wrench in my toolbox. I can't choose just one tool and feel that I can fix everything. The same goes for test equipment. HALT is an extremely valuable tool, but should not be considered your only one.

One of the best industry changes that I've seen is the willingness of engineers to share more non-proprietary information. If you want to learn more about HALT, look into groups like the IEEE AST (Advanced Stress Testing) group, take a seminar, and look into user groups. It is wonderful that people are starting to share more information so that we don't all have to start at square one. Look to people that you know you can trust.

Final note

The most important thing to remember when you are doing HALT testing is to rely upon your own knowledge of the product. Preplan, knowing the end use environment. If you have plastics, know your melting points. Keep an open mind as you test, remembering that a product failure is not the same as a personal failure. Work with design engineers, project engineers, production engineers, management, and anyone else who can give good input.

I love working with HALT. There is no other industry that I would rather be in. I feel like the people at BASF - We don't make the _____ ... we make it better. I get the chance to help people make a better product. That leads to higher reliability, lower costs, more safety and security. As a child I wanted to help bring peace to the world. As an adult I found that it is hard to make such widespread changes. Now, as a teacher and manufacturer, I find that each one of us can make a difference and help the world to be a little bit better place to live in. You now have the possibility of doing your job better than ever before, turning out a better product for less money, and making someone else's life a little bit safer, easier, less expensive, and more reliable.