On reliability and availability

Reliability and availability are quality indicators that must be thoroughly discussed between buyers and suppliers of decentralized energy installations. It is also crucial that all parties involved use the same definitions, writes Dr Jacob Klimstra.

The ultimate goal is that power is continuously available

Credit: Aprovis Energy Systems

Prospective owners of decentralized electricity generation installations always hope that their new installation will be of good quality. However, what quality means has often not been properly defined when the selection process started. The quality of generating equipment can have many aspects. For electricity supply, the ultimate quality goal is that power is continuously available with the right voltage and frequency at competitive costs.

For a heating device, quality might mean that heat will be produced with high fuel efficiency and low emissions at any moment that the user wants it. In response, manufacturers and packagers never fail to highlight the performance of their products when advertising and when talking with customers.

In practice, operators of generating equipment put a high value on the reliability and availability of their installation. For them, a slight deviation in the fuel efficiency is less of a problem than an installation that fails all the time. The economic consequences can be very negative when an installation cannot run when it is supposed to run. The big question is whether the buyers ask for the proper information during the purchasing process.

Reliability and availability are quality indicators that must be thoroughly discussed between buyers and suppliers. The results have to be properly put on paper in the contracts. It is important that all parties involved use the same language in this respect: the definitions have to be clear.

Prospective owners and operators of decentralized units should master these definitions as well as the many aspects related to them. They should also understand the implications of reliability and availability for the operation of the installation. Only then is it possible to assess the consequences of aspects such as maintainability, spare-part logistics and minimum maintenance intervals.

Running to failure is generally not an option for generating equipment. All running equipment is subject to wear, and therefore regular replacement and readjustment of wearing components is required. Components of an installation can also fail suddenly, resulting in unscheduled downtime.

For a designer of the machinery, the skill is in selecting components whose lifetime is highly predictable. Naturally, proper testing under real operating conditions is also necessary. That is why manufacturers often use a launching customer for a new product to check and optimize the design. A combination of proper design and proper operating experience is required to give a good prediction of the reliability and availability that can be expected.

Component reliability

Most equipment for decentralized electricity generation has typical parts, often called consumables, which cannot be repaired but have to be replaced. Most of the time, the life of such components is not exactly predictable.

Typical examples are the spark plugs of reciprocating engines and the igniter of gas turbines. The life of a spark plug depends on the conditions in the engine cylinders and these depend, e.g., on the tuning of the engines and on the quality of the fuel. Although a multiple-cylinder engine can run for a short while with a single failing spark plug, it is always recommended to stop as soon as possible in such cases and replace it. Failing ignition of a cylinder will result in poor fuel efficiency, high hydrocarbon emissions and running speed instability.

Another example is the lubricating oil. Lubricating oil wears due to fouling, evaporation of the volatile components, acidification from combustion end products and oxidation due to contact with air. The oil therefore has to be replaced regularly. The maintenance intervals are normally chosen in such a way that consumable parts will seldom fail within that interval. The replacement actions carried out at these intervals are called preventive maintenance.

For consumables, so-called component reliability is an important quality indicator. Component reliability is, by definition, the probability that components will operate for a specified period of time under the design conditions without failure. Therefore, the component reliability is not a fixed number. A single spark plug from a box of twenty can fail immediately after installation due to a manufacturing error or an incorrect action by the person who carried out its mounting. Touching the insulating porcelain with one’s fingers drastically reduces its flash-over resistance. Using an incorrect fixation torque can cause leakage of hot gases from the combustion chamber and will deteriorate the cooling of the spark plug.

These are problems in the early region of failure. Such early faults can be reduced to a minimum through well-described working procedures during manufacturing and mounting, and through testing the consumable before installation. However, consumables also wear during operation due to erosion, corrosion and oxidation.

The component reliability function of a properly mounted, good-quality consumable part might be like the curve of Figure 1. During the first 2000 running hours in this example, the component reliability remains very close to one. At 2000 hours, the probability of failure is about 0.01 or 1%, meaning that one out of 100 identical components will fail in the time interval between zero and 2000 hours of running. If an installation has 20 such components, the probability of a problem occurring with at least one of them within the 2000-hour interval is almost 20%. In other words, if the components are replaced every 2000 hours, one can expect a stop caused by failure of one of these 20 components about every 10,000 hours. This might be quite acceptable for an operator of the installation.

Using these consumables for 4000 hours before replacing them will drastically increase the number of failure-related stops. At 4000 hours, the component reliability is only 0.9, meaning that the probability of failure is 10%. For a situation with 10 such consumables in an installation, one failure-related stop can be expected, on average, in every time interval of 4000 hours; if the components fail independently of one another, the probability of at least one such stop in that interval is about 65%.
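The arithmetic behind these two examples can be sketched in a few lines of Python. The sketch is only an illustration and assumes that the components fail independently of one another; the function names are hypothetical, and the reliability values are the ones read from the example curve. It shows the difference between the expected number of failures in an interval and the probability that at least one failure occurs.

```python
def expected_failures(component_reliability: float, n_components: int) -> float:
    """Expected number of failed components within the interval."""
    return n_components * (1.0 - component_reliability)


def prob_any_failure(component_reliability: float, n_components: int) -> float:
    """Probability that at least one of n components fails within the interval.

    Assumes independent failures (an assumption, not stated in the article).
    """
    return 1.0 - component_reliability ** n_components


# (replacement interval in hours, component reliability, number of components)
cases = [(2000, 0.99, 20), (4000, 0.90, 10)]

for hours, reliability, count in cases:
    print(
        f"{hours} h, {count} components: "
        f"expected failures {expected_failures(reliability, count):.2f}, "
        f"probability of at least one {prob_any_failure(reliability, count):.1%}"
    )
```

For 20 components with a reliability of 0.99, this gives an expected 0.2 failures per interval and a probability of about 18% of at least one failure. For 10 components with a reliability of 0.9, it gives an expected 1.0 failure per interval and a probability of about 65%.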

The optimum replacement strategy of consumables depends on many factors. The costs of early replacement have to be weighed against the costs of unplanned stops. It might be that a particular season of the year can allow more stops than another. A strategy can then be to renew sensitive consumables slightly before the crucial running timespan occurs.

It also depends on collateral damage that can occur when a consumable fails. Worn out lubricating oil can cause irreparable harm to an installation. A failed blade of a gas turbine’s axial air compressor can damage the complete rotor. Insurance policies covering machine breakdown can also demand early replacement of wearing components.

Again, close co-operation between users and manufacturers can increase insight into wear patterns and hence lead to an optimum maintenance strategy.

Operational availability

In order to estimate the fraction of time that an installation such as a decentralized generating unit can properly perform its duties, the operator first has to know the time that the unit requires for scheduled maintenance. Insight into the availability helps to optimize contracts for electricity and heat supply and for determining if there is need for backup capacity.

Many definitions exist for availability. By definition, the operational availability OA is the time remaining when all the scheduled maintenance actions are subtracted from the total time and this remaining time is divided by the total time:

OA = (average time between maintenance actions) / (average time between maintenance actions + maintenance time + logistic delay) × 100%

A total time for maintenance actions of 1000 hours in a total period of 40,000 hours renders an operational availability of 97.5%. The time required for maintenance actions includes the actual maintenance time and the logistic delay in delivering the part to be replaced. Maintenance contractors often guarantee the time required for maintenance but exclude this logistic delay in the contracts. This can mean that the operator of an installation has to wait a long time without the possibility of claiming compensation from the servicing company for excessive standstill time.
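As a minimal sketch, the definition of operational availability can be written as a small Python function. The function and parameter names are hypothetical. The first call takes the 40,000 hours of the example as the total period, so that 39,000 hours remain between maintenance actions; the extra 500 hours of logistic delay in the second call is an invented figure, used only to show how a delay excluded from a maintenance contract lowers the result.

```python
def operational_availability(
    time_between_maintenance_h: float,
    maintenance_h: float,
    logistic_delay_h: float = 0.0,
) -> float:
    """Operational availability in percent, following the definition in the text."""
    total_h = time_between_maintenance_h + maintenance_h + logistic_delay_h
    return 100.0 * time_between_maintenance_h / total_h


# Example from the text: 1000 h of maintenance actions in a 40,000 h period.
print(f"{operational_availability(39_000, 1_000):.1f}%")

# Hypothetical: the same unit with an extra 500 h spent waiting for parts.
print(f"{operational_availability(39_000, 1_000, 500):.1f}%")
```

Keeping the logistic delay as a separate argument makes it easy to compare the availability a contractor guarantees with the availability the operator actually experiences.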

Figure 3. A spark plug is an example of a part for which component reliability must be known

Credit: Champion

In the case of a unique installation or a remote location, the spare parts logistics might take more time than otherwise. An operator can opt to keep a stock of frequently used spare parts, although this results in additional capital expenditure.

The maintenance time also depends on the way of working of the crew that carries out the work. Operating in shifts with sufficiently skilled personnel can substantially shorten the work, compared to the case where a single mechanic unfamiliar with the unit does the work. In the case of, e.g., weekend stops and summer stops because of lack of heat and power demand, the bulk of the maintenance work might be carried out during those times.

Unscheduled outages and system reliability

The time lost to unscheduled outages is not included in the definition of operational availability as given above. Unscheduled outages can be caused by real failure of the equipment, by operator mistakes or by nuisance trips from inadequately functioning control and monitoring equipment. Such outages are commonly expressed in reliability values; here we will use the so-called system reliability.

Unreliability is also closely connected with the reproducibility of the wear rate and wear pattern of installation components. Inadequate maintenance and operating outside the normal window increase unreliability.

A complex installation such as a cogeneration unit consists of many individual components that can only run in dependence on other components. Figure 2 is an example of the speed control system of a rotating generating set. This is a chain of interrelated elements in series. In this case, the series system is the speed sensor and its holder, the connecting cable between the sensor and the signal converter, the control unit, the connecting cable between the control unit and the fuel supply valve, and the fuel valve itself. Officially, the connectors of the wires are also part of the system.

The connecting cables normally have very high component reliability, but they sometimes fail due to loose connections. The sensor might have a component reliability of 99.99%, while the component reliability of each cable plus its connectors might be 99.9% at a certain moment in time. The signal conditioner and the control unit might each have a component reliability of 99.5% and the fuel supply valve one of 99%. All of these individual components together make a series system.

The combined reliability of such a series system is the product of the individual component reliabilities. The combined reliability of this series chain is 0.9999 × 0.999 × 0.995 × 0.999 × 0.995 × 0.999 × 0.99 = 0.977, equalling 97.7%. In statistics, this is the application of the so-called product rule.

In this example, the reliability of the series chain is slightly lower than that of the fuel valve. The values given here are just used as an illustration of the mechanism. In reality, the fuel supply system has a much higher reliability than 97.7%.
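The product rule for this chain can be checked with a short Python sketch. The reliability values are the illustrative ones from the example; the labels and the position of the three cables in the chain are assumptions made here for readability, and the function name is hypothetical.

```python
from math import prod

# Illustrative component reliabilities from the speed control example.
# The labels and the ordering of the cables are assumptions.
SPEED_CONTROL_CHAIN = [
    ("speed sensor", 0.9999),
    ("cable and connectors", 0.999),
    ("signal conditioner", 0.995),
    ("cable and connectors", 0.999),
    ("control unit", 0.995),
    ("cable and connectors", 0.999),
    ("fuel supply valve", 0.99),
]


def series_reliability(reliabilities) -> float:
    """Product rule: a series chain works only if every element works."""
    return prod(reliabilities)


chain_reliability = series_reliability(r for _, r in SPEED_CONTROL_CHAIN)
weakest_element = min(r for _, r in SPEED_CONTROL_CHAIN)

print(f"series chain: {chain_reliability:.3f}")
print(f"weakest element: {weakest_element:.3f}")
```

The sketch also makes the general point visible: the reliability of a series chain can never be higher than that of its weakest element, and every added element lowers it further.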

A decentralized energy installation can only run if its essential parts are all in proper order. All these parts are, in effect, a series chain, since if one element in the chain fails, the whole unit will stop.

In practice, it is close to impossible to determine the combined system reliability via all the individual component reliabilities. Designing for 100% system reliability is almost impossible. However, by using well-tested components with high component reliability and proper manufacturing techniques, and by sticking to timely preventive maintenance, it is possible to create an installation that is able to run 97% to 99% of the time that it should run. The sector therefore uses the following definition for system reliability:

System reliability = (scheduled running time − unscheduled outage time) / (scheduled running time) × 100%

The actual combined practical availability of an installation is therefore the operational availability minus the fraction of time that the unit cannot run due to unreliability. In a case with 99% operational availability and 97% system reliability the unit is, on average, not available during 4% of the time.
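The two measures can be combined in a small Python sketch. The function names and the figures of 8000 scheduled hours and 240 outage hours are hypothetical. The first estimate simply adds the two loss fractions, as in the example above. The second multiplies the two fractions, on the assumption (made here, not in the article) that unscheduled outages can only occur during the time left over after scheduled maintenance.

```python
def system_reliability(scheduled_running_h: float, unscheduled_outage_h: float) -> float:
    """System reliability in percent, following the definition in the text."""
    return 100.0 * (scheduled_running_h - unscheduled_outage_h) / scheduled_running_h


def unavailability_additive(oa_pct: float, sr_pct: float) -> float:
    """Unavailable time in percent, adding the two loss fractions."""
    return (100.0 - oa_pct) + (100.0 - sr_pct)


def unavailability_product(oa_pct: float, sr_pct: float) -> float:
    """Unavailable time in percent, assuming outages occur only outside maintenance."""
    return 100.0 * (1.0 - (oa_pct / 100.0) * (sr_pct / 100.0))


# Hypothetical year: 8000 scheduled running hours, 240 hours of unscheduled outage.
sr = system_reliability(8_000, 240)
print(f"system reliability: {sr:.1f}%")

# Example from the text: 99% operational availability, 97% system reliability.
print(f"additive estimate: {unavailability_additive(99.0, sr):.2f}%")
print(f"product estimate: {unavailability_product(99.0, sr):.2f}%")
```

At availability and reliability levels this high, the two estimates differ by only a few hundredths of a percentage point, so the simple subtraction is adequate for a first assessment.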

It is important that warranties properly indicate the timespan for which the availability applies. Sometimes the values apply only until a major overhaul is required. A major overhaul might take a couple of weeks or even months.

Other reliability criteria

Other ways to express reliability are the mean time between failures and the mean time between undesired stops. Sometimes many trips occur without causing lengthy outages. In one known case, a component had a poor component reliability that could be expressed as a straight line between 1 and 0 over a timespan of 4000 hours. This meant that, on average, the component life was 2000 hours.

This particular installation had 16 such components. Of these components, 10% had already failed within 400 hours. This meant that, on average, the machine stopped every 125 hours due to the failure of one of these components. Replacing a component takes only a few minutes, but it is the undesired stop and the associated unavailability of the output that causes the nuisance.
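The figures in this case follow from the straight-line reliability function, as the Python sketch below illustrates. The function names are hypothetical. The mean life is taken as the area under the reliability curve, and the mean time between stops assumes (an assumption made here) that each failed component is replaced at once and that the ages of the 16 components are well mixed.

```python
def linear_reliability(t_h: float, end_of_life_h: float) -> float:
    """Reliability falling in a straight line from 1 at t = 0 to 0 at end of life."""
    return max(0.0, 1.0 - t_h / end_of_life_h)


def mean_life(end_of_life_h: float, steps: int = 1000) -> float:
    """Mean component life: area under the reliability curve (midpoint rule)."""
    dt = end_of_life_h / steps
    area = sum(linear_reliability((i + 0.5) * dt, end_of_life_h) for i in range(steps))
    return area * dt


def mean_time_between_stops(mean_life_h: float, n_components: int) -> float:
    """Average time between stops when any single failure stops the unit."""
    return mean_life_h / n_components


life_h = mean_life(4000)
print(f"fraction failed after 400 h: {1.0 - linear_reliability(400, 4000):.0%}")
print(f"mean component life: {life_h:.0f} h")
print(f"mean time between stops with 16 components: {mean_time_between_stops(life_h, 16):.0f} h")
```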

Figure 4. High quality air filtering is essential for high reliability and low maintenance of gas turbines

Credit: ACS-AFPRO Filters – Alkmaar, the Netherlands – www.acsfilters.com

An undetected loose sensor connection is just an example. Issues can also arise where the decentralized installation is not properly matched with, e.g., a heating system. Suddenly occurring high return water temperatures caused by switching off a large absorption chiller or by incorrectly tuned temperature control valves can result in undesired unit trips. Nuisance trips also occur due to, e.g., electromagnetic interference or errors in control algorithms.

In many cases, poor reliability does not arise because of bad basic design, but because of neglect, poor operation and poor maintenance approaches.

Asking the right questions

This article does not intend to draw prospective investors away from buying equipment for decentralized generation. Many manufacturers are offering equipment of a very high quality. Decentralized generation is the technique of the future in a world with many renewable resources and an optimum use of available fuels. Rather, this article intends to give some definitions and to assist potential buyers in focussing their needs and ensuring that they ask the right questions. The article can also be useful for insurers, helping them to properly define which issues must be addressed in a policy. If our readers use different definitions which can be useful for the sector, they are kindly requested to respond.

In a subsequent article, solutions will be given for the fact that a single generator cannot have 100% reliability and availability.

Dr Jacob Klimstra is Managing Editor of Decentralized Energy
