Improving Inherent Reliability of a System

The inherent reliability of a system is determined by the system’s design. In other words, the design sets the upper limit of reliability that the system can exhibit during operation. Suppose, for example, that a system with the best possible maintenance is able to achieve an availability of, say, 90%. We can then say that this is the upper limit of the system’s capability, as determined by its design. A good “preventive maintenance” plan can never improve a system’s inherent reliability. Preventive maintenance, contrary to what many believe, cannot make a system “better”. At best, it can only help realise the inherent reliability determined by the physical design.

Hence the suggested process to “improve” the inherent reliability of a system may be framed as follows:

  1. Understand the dynamics through tools like vibration analysis.
  2. Monitor changes and the rate of change.
  3. Eliminate unnecessary maintenance tasks.
  4. Change the design of the system interactions to eliminate inherent “imperfections” and revise the maintenance plan.

In most cases, this would be the general approach.
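The step of monitoring changes and the rate of change can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Reading` type, the function names and the sample trend are hypothetical and are not taken from any plant data or monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    day: float       # time of measurement, in days since the first reading
    velocity: float  # overall vibration velocity, mm/sec


def changes_and_rates(readings):
    """Return (change, rate_per_day) for each consecutive pair of readings."""
    results = []
    for prev, curr in zip(readings, readings[1:]):
        elapsed = curr.day - prev.day
        if elapsed <= 0:
            raise ValueError("readings must be in strictly increasing time order")
        change = curr.velocity - prev.velocity
        results.append((change, change / elapsed))
    return results


def is_accelerating(readings):
    """True when the rate of change itself increases from one interval to the next."""
    rates = [rate for _, rate in changes_and_rates(readings)]
    return len(rates) >= 2 and all(b > a for a, b in zip(rates, rates[1:]))


# Hypothetical trend, for illustration only.
trend = [Reading(0, 2.0), Reading(30, 2.3), Reading(60, 3.2), Reading(90, 5.0)]
for change, rate in changes_and_rates(trend):
    print(f"change = {change:+.2f} mm/sec, rate = {rate:+.4f} mm/sec per day")
print("accelerating:", is_accelerating(trend))
```

A level that rises by a steady amount is a different situation from one whose rate of rise keeps growing, which is why the sketch reports both the change and the rate.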

Until we can effectively undertake design changes (Design Out Maintenance – DOM) or take measures to eliminate inappropriate maintenance actions (Review of Equipment Maintenance – REM), it is not possible to go beyond the inherent reliability of a piece of equipment, even when that reliability is unacceptable in the business context. For example, a vertical pump in a power plant kept failing very frequently, or had to be stopped quite often when its vibration shot beyond the trip limits. This behaviour is determined by the design of the system. Unless the design (specifically the interactions between components) is corrected, the system (the vertical pump) will continue to behave in that manner indefinitely. Likewise, if the MTBF of a machine is, say, 90 days, it is not possible to improve the MTBF considerably beyond 90 days unless the undesirable interactions (which I call system “imperfections”) are corrected and the existing maintenance system is properly reviewed.
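For readers who want to see the arithmetic behind figures such as an MTBF of 90 days or an availability of 90%, here is a minimal sketch using the standard textbook definitions. The operating record and the repair time (MTTR) are hypothetical numbers chosen for illustration; they do not describe the vertical pump above.

```python
def mtbf_days(total_operating_days, number_of_failures):
    """Mean time between failures: operating time divided by the failure count."""
    if number_of_failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_operating_days / number_of_failures


def availability(mtbf, mttr):
    """Steady-state availability from MTBF and mean time to repair (same units)."""
    return mtbf / (mtbf + mttr)


# Hypothetical record: 360 operating days with 4 failures, 10 days per repair.
mtbf = mtbf_days(360, 4)
print(mtbf)                    # 90.0
print(availability(mtbf, 10))  # 0.9
```

Better maintenance can shorten repairs, but the failure count in the first function is driven by the design interactions discussed above, which is why the MTBF stays near its design-determined value until those interactions are changed.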

Such “imperfections” can be both physical and non-physical. Design features, above all the interactions between physical and non-physical components, are arguably the most important characteristics of a system that determine its inherent reliability.

In addition, many physical design features influence reliability, such as redundancy, component selection and the overall integration of the various pieces of the system.

In the context of RCM, design extends far beyond the physical makeup of the system. A number of non-physical design features can affect, sometimes profoundly, the inherent reliability of a system. Among these are operating procedures, errors in manufacturing, training and technical documentation. When a proper RCM analysis is conducted on a system or sub-system, there’s a good chance that the resulting maintenance actions will enable the system to achieve its inherent reliability as determined by its physical design features. However, if the inherent reliability is below the user’s expectation or need, then the design features must be improved to achieve the desired level of inherent reliability.

Moreover, eliminating unwarranted maintenance tasks greatly reduces the risk of suffering the Waddington Effect. There is also a good chance that if operating procedures, training, technical documentation and so forth are found to negatively impact inherent reliability, these issues will be identified and corrected. As the Waddington Effect shows, less than optimal non-physical design features almost always have a negative impact on inherent reliability. Therefore, in an RCM analysis, a thorough review of the existing maintenance plan (REM) along with DOM is necessary to improve the inherent reliability of a system.

In brief, the right amount of Condition Based Maintenance (CBM) tasks, Scheduled Inspections (which are a part of CBM activity), REM and DOM would not only help us realise the inherent reliability as determined by the physical design but also improve it, if the original inherent reliability is below business expectations.

 

Dibyendu De


Structure of a 2 day workshop on RCM

Day 1 
Session 1 – Introduction to RCM, History and 7 Questions
* Definition of Reliability, RCM and the 7 Vital Questions
* Maintenance Strategies
* Waddington Effect
* Nowlan & Heap’s Failure Patterns
* Inherent Reliability and its improvement strategy
Session 2 — Operating Context and Functions 
* Introduction to Operating Context
* Operating Context for a System
* Elements to be included
* Operating Context and Functions
* 5 general operating contexts
* Operating Context and Functional Failures
Session 3 – Failure Modes and Failure Effects  
* Introduction to Failure Modes
* A few thoughts about data
* Exploring Failure Modes
* 4 Rules for Physical Failure Modes
* Failure Effect
* Evidence that failure is occurring
Session 4 — Failure Consequence and Risk 
* Introduction to Decision Diagram
* Risk assessment – how each failure matters
* Is the function hidden or evident
* Relation of time and Hidden vs Evident
* Safety and Environmental Consequences
* Operational and non-operational Consequences
Day 2 
Session 5 — Strategies and Proactive Tasks 
* Introduction to Proactive Tasks and PF interval
* CBM/On-condition tasks
* Scheduled Restoration and Scheduled Discard Tasks
* Determining Task Effectiveness
* Risk and Tolerability
* General Rules for following the decision diagram
Session 6 — Default Actions 
* Introduction to Default Actions
* Default tasks for hidden failures
* Failure Finding Task
* Failure finding Interval
* Design Out Maintenance — to do or to be
* Walk around checks with right timing
Session 7 — RCM Audits 
* Introduction to Audits
* Fundamentals of Technical Audit
* Technical Audit process
* Fundamentals of Management Audit
* General Management Audit process
* What RCM achieves
Session 8 — Setting up a Successful Living Program 
* Using the power of a facilitated group
* RCM Training
* Knowledge development and its process
* Failure Modes and Design Maturity
* RCM during scale up or expansion
* Summary and Conclusion

The Sad Story of the HFO pump

This is an HFO (Heavy Fuel Oil) screw pump used in a power plant for running boilers. The pump failed catastrophically. Though it was regularly monitored for vibration (in velocity mode, mm/sec), the monitoring gave no indication of the impending failure.

The screws of the pump rubbed against each other, and the case-hardened layers of both screws were crushed. The force was so great that the body of the pump also cracked. Evidence of corrosion was also noticed.

What caused it? 

For want of HFO, the plant personnel had been forced to pump LDO (Light Diesel Oil) through this HFO pump for the preceding year.

Hence the I, A, R factors that contributed to this catastrophic failure are the following:

Initiator(s) (I): factor(s) that trigger the problem. The low viscosity of LDO compared to that of HFO was the significant ‘initiator’ in this case. While the viscosity of LDO ranges from 2.5 to 5 cSt, the viscosity of HFO varies between 30 and 50 cSt (depending on the additives used). The use of lower-viscosity oil resulted in metal-to-metal contact, thereby increasing the Hertz stress, which led to the collapse of the hardened layer of the screws.

Accelerator(s) (A): factor(s) that accelerate the process of failure. a) Indian HFO does not contain friction modifiers such as vanadium and magnesium. Their absence causes higher friction between the screws (an approximately 70-fold increase in friction), which accelerates the wear process. b) Moreover, vanadium and magnesium additives, when present in HFO and LDO, act as anti-corrosive agents. Notice that the failure happened a year after the management decided to pump LDO rather than HFO through the HFO pump, which is enough time for corrosion to take effect. So we may say that at least two factors accelerated the failure process. There are other effects on system performance too, which are discussed in the “Note” below.

Retarder(s) (R): factors that slow down the failure process. These are a) the surface finish of the screws, b) the right clearance of the bearings and c) the presence of chromium in the screws.

Surface finish plays a very important role in reducing metal-to-metal friction and also allows fluid film development. Ideally the surface finish should be between 3 and 6 microns CLA (Centre Line Average) for best effect. This can be introduced as a specification of the MOC (Material of Construction).

Similarly, excessive clearance in the bearings would modify the Hertz stress zone or profile, both in width and depth, which would cause shear of the hard layer (the depth of which depends on the type of hardening and the type of steel used) and the soft layer (core material). The depth and type of hardening might also be specified in the MOC to prevent failures and extend the life of the equipment. The presence of chromium in the metal would help the formation of a Vanadium–Oxygen–Chromium bond, which would effectively enhance life by providing better lubricating properties, which in turn would ensure a high level of reliability of the equipment.

Hence, once the Is, As and Rs are identified, appropriate measures can be taken to modify the maintenance plan, MOC, etc., to ensure long life of the equipment without negative safety consequences (the heart of reliability improvement).

Example:

  1. Specify addition of Vanadium and Magnesium in the HFO during supply or these may be added at site after receipt of supply. (Material specification during purchase)
  2. Ensure the right viscosity of the oil to be pumped through HFO pumps. (Monitor the viscosity of the supply oil: not higher than 50 cSt and not lower than 30 cSt.)
  3. Specify surface roughness of the screws — 3 to 6 microns (CLA).
  4. Specify depth of hardness of the screws (below 580 microns so that the interface between the hard layer and the soft core remains unaffected by the Hertz stress) during procurement and supply. Preferable type of hardening of the screws would be nitriding.
  5. Specify chromium percentage in the screws (during purchase).
  6. Monitor bearing clearance on a regular basis and change as needed (by vibration analysis based on velocity and acceleration parameters).
  7. Monitor the body temperature of the pump to notice adverse frictional effects.
  8. Monitor growth of incipient failures in the screws by vibration monitoring (acceleration and displacement parameters).
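The numeric limits in the list above lend themselves to a simple automated check. The sketch below is a hypothetical illustration: the function name and the sample values are assumptions, and only the limits themselves (30 to 50 cSt, 3 to 6 microns CLA, hardening depth below 580 microns) come from the text.

```python
# Limits quoted in the example list above.
HFO_VISCOSITY_CST = (30.0, 50.0)      # supply oil viscosity, cSt
ROUGHNESS_CLA_MICRONS = (3.0, 6.0)    # screw surface roughness, microns CLA
MAX_HARDENING_DEPTH_MICRONS = 580.0   # hardening depth must stay below this


def check_pump_specs(viscosity_cst, roughness_cla, hardening_depth):
    """Return a list of deviations; an empty list means every check passed."""
    problems = []

    low, high = HFO_VISCOSITY_CST
    if not (low <= viscosity_cst <= high):
        problems.append(f"viscosity {viscosity_cst} cSt is outside {low} to {high} cSt")

    low, high = ROUGHNESS_CLA_MICRONS
    if not (low <= roughness_cla <= high):
        problems.append(f"roughness {roughness_cla} microns CLA is outside {low} to {high}")

    if hardening_depth >= MAX_HARDENING_DEPTH_MICRONS:
        problems.append(
            f"hardening depth {hardening_depth} microns is not below "
            f"{MAX_HARDENING_DEPTH_MICRONS}"
        )
    return problems


# Hypothetical case: LDO at 4 cSt pumped through an HFO pump.
for problem in check_pump_specs(viscosity_cst=4.0, roughness_cla=4.5, hardening_depth=500.0):
    print(problem)
```

With these sample values only the viscosity check fails, which mirrors the initiator identified in this case.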

Note

1. (Effect of IAR on system performance — i.e. the boiler – superheater – pipes):

Problems of high-temperature corrosion and brittle deposits drastically impair the performance of the high-capacity steam boilers of power plants that use HFO. Research* shows that heavy fuel oil (HFO) can be suitably burned in high-capacity boilers. If the HFO is chemically treated with anticorrosive additives like Vanadium and Magnesium, the treatment diminishes the high-temperature corrosion that affects operational parameters such as the pressure in the furnace, the pressure drop in the superheaters and the pipe metal temperature, as well as the atomization and combustion processes. The inclusion of the right additives has therefore been found to diminish high-temperature corrosion and improve system performance. It makes sense to monitor these parameters, which can provide direct information on the degree of fouling as well as on the effectiveness of the treatment during normal boiler operating conditions.

*Source

2. Effect of Vanadium Oxide nano particles on friction and wear reduction


By Dibyendu De

Two approaches to improve — Plant wide Equipment Reliability

The first approach is to conduct a series of training programs along with hand-holding. During such programs, participants apply the concepts discussed to the critical machines in order to modify the existing maintenance plan or methods and improve equipment reliability over a period of time. It is effective if the organization fulfills two vital conditions. First, the organization has a reasonably competent condition monitoring team in place, and the condition based maintenance strategy is widely accepted and applied throughout the plant. Second, the number of failures or component replacements in the plant in a year is not more than, say, 60. We call this the Interactive Training Method.
The second approach is more hands-on, direct and intensely collaborative. Each piece of critical equipment is thoroughly examined in its dynamic condition to find the inherent imperfections that cause failures. Such imperfections, once identified by deep study, are then systematically addressed to eliminate the existing and potential failure modes and so improve MTBF and Safety. Based on the findings, the maintenance plan is formulated or appropriately modified to sustain the gains of implementing the findings. This activity is done during the program. This approach is effective when the failure rate in the plant is random and high (more than 60 failures or component replacements in a year) and/or the maintenance load is heavy and repetitive, with high maintenance cost, in spite of having a reasonably equipped condition monitoring team in place. We call this the Deep Dive Approach.
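The selection criteria in the two paragraphs above can be condensed into a small decision rule. The following is a simplified sketch, not a formal procedure: the function and parameter names are hypothetical, and the threshold of 60 failures or component replacements per year is the approximate figure given in the text.

```python
FAILURE_THRESHOLD_PER_YEAR = 60  # "say 60" failures/component replacements a year


def suggest_approach(failures_per_year, cbm_widely_used, heavy_repetitive_load=False):
    """Suggest one of the two approaches, or None when the text gives no guidance.

    failures_per_year      failures or component replacements in a year
    cbm_widely_used        competent condition monitoring team with widespread CBM
    heavy_repetitive_load  heavy, repetitive and costly maintenance load
    """
    if failures_per_year > FAILURE_THRESHOLD_PER_YEAR or heavy_repetitive_load:
        return "Deep Dive Approach"
    if cbm_widely_used:
        return "Interactive Training Method"
    return None  # low failure count but no CBM practice: not covered above


print(suggest_approach(40, cbm_widely_used=True))
print(suggest_approach(85, cbm_widely_used=True))
```

The rule treats the failure count and the maintenance load as alternatives because the text joins them with "and/or"; a plant could reasonably weigh them differently.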
 
Outline of the two methods: the processes involved, along with approximate costs.
 
The Interactive Training Method:  
 
1. Such training sessions are conducted once every two months for a duration of 4 days each over a period of 24 months.
2. The training programs would essentially focus on the following: a) the RCM process focussed on Failure Modes; b) Vibration Analysis; c) Lubrication analysis and management; d) Bearing failures and practical reasons; e) Root Cause Failure Analysis method (FRETTLSM method); f) Friction, Wear Flow, Heat; g) Foundations and Structures; h) Condition Monitoring of Electrical failures; i) Maintenance Planning based on nature of Failure Modes; j) Life Cycle Costing; k) Auditing RAMS (Reliability, Availability, Maintainability and Safety), which would help in self-auditing the process. In total there are 12 programs.
3. Accordingly, there would be 12 visits to the plant. During each visit one of the above topics would be covered. Once the improvement concepts are delivered, the participants (assigned for focussed plant improvement) would collaboratively design appropriate measures to improve or modify the existing maintenance plan of each critical machine to improve its MTBF and Safety. This activity, which involves a fair amount of hand-holding, would be done during the visit. The number of critical machines to be taken up in each visit would be decided by the management or the participants. Number of participants = 10 maximum.
4. Subsequent paid audits to refine the process would be optional after the completion of the 24-month intervention period.
The Deep Dive Approach:
 
1. Such interactive sessions would be conducted once every two months for a duration of 4 days each over a period of 18 months.
2. Each interactive session of 4 days would focus on one piece of critical equipment at a time. In total, 9 pieces of critical equipment would be covered during the 18-month period with a selected group of people assigned to the project of improving reliability. During each session, the equipment would be examined deeply and in totality to find the inherent imperfections that cause different failures in the system. Once these imperfections are identified, time is taken to address them appropriately and simultaneously formulate or modify the existing equipment maintenance plan to sustain the gains on an ongoing basis. This collaborative activity would be done during the program. In this process, participants learn by doing.
3. In total there would be 9 visits to the plant. During each visit one piece of critical equipment would be taken up for the deep dive study, which would be carried to its full logical conclusion. Number of participants = 10 maximum.
4. Subsequent paid audits of the progress are optional after the completion of the 18-month intervention period.
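The visit counts quoted for the two methods follow directly from the session interval and the program duration. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def program_schedule(duration_months, interval_months=2, days_per_visit=4):
    """Return (number_of_visits, total_days_on_site) for a program."""
    visits = duration_months // interval_months
    return visits, visits * days_per_visit


print("Interactive Training Method:", program_schedule(24))  # (12, 48)
print("Deep Dive Approach:", program_schedule(18))           # (9, 36)
```

One visit every two months gives 12 visits over 24 months and 9 visits over 18 months, matching the figures in both outlines.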