Short Quiz on RCM

Max marks = 10
1 mark for one right answer
-1 mark for a wrong answer
1. Which of the following statements is true?
a)  RCM does not bother about the consequence of a failure
b)  RCM does not try to achieve the inherent reliability of a system
c)  Time-based Preventive Maintenance is the cornerstone of the RCM output
d)  RCM does not believe that failure rate increases with the age of the equipment
2. Which of the following statements is true in case of RCM?
a) Age of individual components can not be determined in a statistical manner
b) RCM believes that run-to-failure strategy is often the best strategy
c) RCM is not bothered about consequences of a failure
d) RCM is not interested in achieving the inherent reliability of equipment
3. Waddington Effect states:
a) The performance of an equipment improves upon regular overhauls
b) The failure rate goes up if an equipment is regularly maintained by time-based preventive maintenance
c) The failure rate goes down if an equipment is regularly maintained by time-based preventive maintenance
d) The failure rate remains the same with regular time-based preventive maintenance
4. The most important piece of information in RCM is:
a) Number of critical equipment
b) Failure Modes
c) Mean Time Between Failures (MTBF)
d) Mean Time To Repair (MTTR)
5. Most failures in a plant are:
a) Early
b) Wear Out
c) Random
d) Constant
6) Which of the following is not a failure mode?
a) Bearing seized
b) Bearing problem
c) Shaft sheared
d) Circuit opened
7) Which of the following must not be included in the Function of a machine?
a) Design Data
b) What the user wants an equipment to do?
c) Parameters that define the standard of performance
d) Operating Context
8) Secondary Functions are:
a) Additional functions an equipment is supposed to do
b) Functions of other machinery that ensure performance of a critical machine
c) Functions that are undesirable
d) Functions that are desirable but not essential for the performance of a machine
9) Which strategy might be the cornerstone of an RCM strategy?
a) Time-based maintenance (scheduled replacement/repair)
b) Condition Monitoring or Condition Based Maintenance (CBM)
c) Detective Maintenance (Inspections of Hidden Failures)
d) Design Out Maintenance (DOM)
10) To improve the inherent reliability of a system the best strategy is:
a) Condition Based Maintenance
b) Design Out Maintenance
c) Time-based Maintenance
d) Run-to-Failure
@Dibyendu De
1) d 2) a 3) b 4) b 5) c 6) b 7) a 8) b 9) b 10) b

A Movement towards RCM

29th December 2017, Kolkata

On 29th December 1978, F. Stanley Nowlan and Howard F. Heap, in their seminal work Reliability Centered Maintenance, revealed the fallacy of the two basic principles adopted by traditional PM (Preventive Maintenance) programs, a concept dating from World War II:

  •  A strong correlation exists between equipment age and failure rate: the older the equipment, the higher the failure rate must be.
  •  Individual component and equipment probability of failure can be determined statistically, and therefore components can be replaced or refurbished prior to failure.

However, the first person to reveal the fallacy was Waddington, who conducted his research on British fighter planes during World War II. He found that the failure rate of the planes always increased immediately after time-based preventive maintenance, which was scheduled after every 60 hours of operation or flying time.

By the 1980s, alternatives to traditional Preventive Maintenance (PM) programs began to migrate to the maintenance arena. While computer power first supported interval-based maintenance by specifying failure probabilities, continued advances in the 1990s began to change maintenance practices yet again. The development of affordable microprocessors and increased computer literacy in the workforce made it possible to improve upon interval-based maintenance techniques by distinguishing other equipment failure characteristics like a pattern of randomness exhibited by most failures. These included the precursors of failure, quantified equipment condition, and improved repair scheduling.

The emergence of new maintenance techniques called Condition Monitoring (CdM) or Condition-based Maintenance (CBM) supported the findings of Waddington, Nowlan and Heap.

Subsequently, industry emphasis on CBM increased, and the reliance upon PM decreased. However, CBM should not replace all time-based maintenance. Time-based or interval-based maintenance is still appropriate for failure cases that exhibit a distinct time-based pattern (generally dominated by wear phenomena), where abrasive, erosive, or corrosive wear takes place, or where material properties change due to fatigue, embrittlement, or similar processes. In short, PM (time-based or interval-based maintenance) is still applicable when a clear correlation between age and functional reliability exists.

While many industrial organizations were expanding PM efforts to nearly all other assets, the airline industry, led by the efforts of Nowlan and Heap, took a different approach and developed a maintenance process based on system functions, the consequence of failure, and failure modes. Their work led to the development of Reliability-Centered Maintenance, first published on 29th December 1978 and sponsored by the Office of the Assistant Secretary of Defense (Manpower, Reserve Affairs, and Logistics). Additional independent studies confirmed their findings.

In 1982 the United States Navy expanded the scope of RCM beyond aircraft and addressed more down-to-earth equipment. These studies noted that a difference existed between the perceived and intrinsic design life for the majority of equipment and components. For example, the intrinsic design life of anti-friction bearings is taken to be five years or two years. But as observed in industry, the life of anti-friction bearings usually exhibits randomness over a large range. In most cases, bearings exhibit a life that either greatly exceeds or falls short of the stated design life. Clearly, in such cases, time-directed, interval-based preventive maintenance is neither effective (it initiates unnecessary forced outages) nor cost-effective.

The process of determining the difference between perceived and intrinsic design life is known as Age Exploration (AE). AE was used by the U.S. Submarine Force in the early 1970s to extend the time between periodic overhauls and to replace time-based tasks with condition-based tasks. The initial program was limited to Fleet Ballistic Missile submarines. The use of AE was expanded continually until it included all submarines, aircraft carriers, other major combatants, and ships of the Military Sealift Command. The Navy stated the requirements of RCM and Condition-based Monitoring as part of the design specifications.

Continual development of relatively affordable test equipment and computerized maintenance management software (CMMS, like MIMIC developed by WM Engineering of the University of Manchester) from the 1990s to date has made it possible to:

  •  Determine the actual condition of equipment without relying on traditional techniques which base the probability of failure on age and appearance instead of the actual condition of an equipment or item.
  •  Track and analyze equipment history as a means of determining failure patterns and life-cycle cost.

    RCM has long been accepted by the aircraft industry, the spacecraft industry, the nuclear industry, and the Department of Defense (DoD), but is a relatively new way of approaching maintenance for the majority of facilities outside of these four areas. The benefits of an RCM approach far exceed those of any one type of maintenance program.

    Fortunately, RCM has been applied in a few Indian manufacturing industries from 1990 onwards with relatively great success. I am particularly happy to have been involved in the development and application of RCM in Indian industries, which has continually evolved in terms of techniques and method of application to meet contextual industrial needs.

    I am also happy to report that RCM for industrial use has now reached a mature stage of its development, which can be replicated for any manufacturing industry.

    I am of the opinion that this maturity would provide the necessary stepping stone to develop Industry 4.0 and develop meaningful IOT applications for manufacturing industries.

    Wish RCM a very happy birthday!


    Dibyendu De

Fractional Gear Mesh Frequencies

Recently I received an email which asked me to give my opinion on a phenomenon the analyst observed.


Observing high vibs on pressing and lifting pinion Drive End (DE) and Non Driven End (NDE) bearing on a ball mill. Motor and main Gear Box drive are OK. Clear predominant gear mesh frequency is appearing in the spectrum along with harmonics and side bands. But 1st GMF (Gear Mesh Frequency) is predominant. side bands with pinion speed is also seen. no Girth Gear speed side band was observed

Some of the vibration data, spectrums and photos shown in the attachment. Phase measurements indicate inconsistency in the readings near pressing pinion bearing. Impacts were also seen in time waveform data along with modulation.pinion speed 122 rpm.  On pressing side bearing 2.03 Hz side bands are seen, On lifting side i can see side bands spaced at 6.09 Hz (That is 3 times of pinion speed). Both Pinion lifting and pressing bearings are behaving differently. the vibs are high on DE as compared to NDE on both pinions. Can we suspect eccentric moment of the pinions with looseness. Why am i seeing 2.03 side bands on pressing and 6.09 side band on lifting side bearings. What is the significance of this. One sample of GG tooth photo shows uneven shining surface on either side (refer photo). In this case I am seeing  (30 T = 1 X 2 X 3 X 5) and (210 = 1 X 2 X 3 X 5 X 7) 2 X 3 X 5 as the common factor. Pinion 30 teeth and GG has 210 teeth. Will this create gear ratio issues uneven locking and releasing of 2 mating teeths. But no 1/2 or 1/3 or 1/5 GMF seen in the data.


My reply was:


But after a quick look this is what I see as the problem: –

1. We are seeing 1/2, 1/3 and 1/5 of GMF — these appear due to common factors 2, 3, 5 as you wrote.

This means that the pinion is badly worn out and as the common factor teeth mesh they generate these fractional frequencies.

It also means that the GMF and the natural frequency are not separated by 2.5 times. [The natural frequency in the horizontal direction = 28.5 Hz; natural frequency = 30.9 Hz; Gear Mesh Frequency = 60.6 Hz]

Looking at the signatures it is clear that the GMF falls within 2.5 times the natural frequencies.

Also note how the GMF (60.6 Hz) falls right between two natural frequencies in both the vertical and horizontal  directions. (31.1 Hz and 83 Hz). This makes the situation worse.

2. From the time waveform, we can see vibration relaxation waves. It means that the wear out or damage is towards the addendum region of the pinion/gear
3. This means that the spray nozzles are wrongly placed or jammed. The nozzles must be placed after the gear mesh not at or before the gear mesh. Also ask the client to check for jamming of the nozzles and the present viscosity of the grease/oil and the quantity that is fed per hour.
4. We can also suspect eccentricity of the pinion and looseness.
5. There is a strong resonance. This appears to have generated from the top cover.
Dibyendu De
Deeper Lessons:

It is important to question as to what else we can do other than detect a problem or detect an incipient fault?

With the above analysis and information we can easily see the relationship between fractional GMF and lubrication and wear. It means we can build an algorithm that would warn us about an imperfect lubrication system that would in fact accelerate wear and put the system out of service.
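A minimal sketch of such a warning rule in Python, assuming the spectrum has already been reduced to a list of (frequency, amplitude) peaks. The function name, the tolerance, the amplitude threshold and the sample peak values are all illustrative assumptions, not data from this case.

```python
def find_fractional_gmf_peaks(peaks, gmf_hz, common_factors,
                              tol_hz=0.5, min_amp=1.0):
    """Return (factor, frequency, amplitude) for peaks near GMF / factor.

    tol_hz and min_amp are assumed thresholds; tune them per machine.
    """
    hits = []
    for factor in common_factors:
        target_hz = gmf_hz / factor  # expected fractional GMF
        for freq_hz, amp in peaks:
            if abs(freq_hz - target_hz) <= tol_hz and amp >= min_amp:
                hits.append((factor, freq_hz, amp))
    return hits


# Made-up peak list (Hz, mm/sec), for illustration only.
sample_peaks = [(20.3, 1.2), (30.4, 2.1), (60.9, 6.5), (121.8, 1.9)]

hits = find_fractional_gmf_peaks(sample_peaks, gmf_hz=60.9,
                                 common_factors=(2, 3, 5))

# Any fractional GMF peak raises a flag to check lubrication and wear.
lubrication_warning = len(hits) > 0
```

In practice such a flag would only prompt a check of nozzle placement, lubricant flow and tooth wear; it does not diagnose the fault by itself.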

Further, we can refine the purchase specification for a gearbox. The specification should state: a) the number of pinion teeth should be a prime number to prevent accelerated wear; b) if a prime number can’t be achieved, then the natural frequencies in the three directions must be separated from the GMF by a factor of at least 2.5.

Similarly, we can specify that the natural frequency of the gearbox top cover should be at least 4 times the GMF.

Scheduled running checks may include — a) rate of lubricant flow b) motor current c) placement of lubricant nozzles etc.


Details of the case: (relevant data)


Steps for vibration measurements

Impact test was carried out at selected locations on the torsion bar to know its natural frequency
Normal vibration signatures were recorded with motor speed being 994 rpm and pinion speed 122 rpm

Vibration data was recorded on selected bearing locations of motor, gearbox and pinion bearings
Data was recorded along horizontal, vertical and axial direction with 90% load on the mill

Phase measurements were recorded to know the behavior of pinion DE with respect to pinion NDE of pressing and lifting side


Vibration signatures recorded on pinion DE and NDE bearings of both the pressing and lifting sides show a predominant gear mesh frequency and its harmonics
Sidebands were observed along with the gear mesh frequency and its harmonics
The gear mesh frequency of 60.9 Hz appears predominantly in all directions (horizontal, vertical and axial)

Time waveform recorded on pinion DE and NDE bearings clearly shows modulation which occurs due to above phenomena
Impacting of the gear teeth was also observed. Refer time plots provided in this report in subsequent pages

Only sidebands at pinion speed (2.03 Hz or 122 rpm) are seen; no girth gear sidebands were seen in the data
The phase measurements recorded on pressing pinion DE and NDE along the axial and horizontal directions show that the phase is not consistent with time, suggesting looseness due to uneven movement of the pinion

Vertical vibrations recorded on the lifting-side pinion DE bearing are low (5 mm/sec) on one end while they are high (11 mm/sec) on the other end, even though the bearing has a common top cover
For any two normal mating gears, the number of teeth on each gear should be selected so that, on factorizing, no common factor is found apart from 1

In this case the pinion has 30 teeth and the girth gear has 210 teeth. Then, as per calculations:
Pinion: 30 = 1 x 2 x 3 x 5; GG: 210 = 1 x 2 x 3 x 5 x 7
So the common factors are 2 x 3 x 5
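The arithmetic above can be checked with a short Python sketch. It uses the tooth counts and pinion speed reported in this case. The helper names are hypothetical, and listing one fractional GMF per common prime factor follows the interpretation used in this post rather than a general rule. Note that the nominal 122 rpm gives a GMF of 61 Hz, close to the measured 60.9 Hz.

```python
from math import gcd


def prime_factors(n):
    """Return the prime factors of n in ascending order (with repeats)."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


pinion_teeth = 30
girth_gear_teeth = 210
pinion_rpm = 122

# Gear mesh frequency = shaft speed (Hz) x number of teeth on that shaft.
gmf_hz = pinion_rpm * pinion_teeth / 60

# Common prime factors of the two tooth counts.
common_primes = sorted(set(prime_factors(gcd(pinion_teeth, girth_gear_teeth))))

# Fractional GMFs to look for in the spectrum, one per common prime factor.
fractional_gmf_hz = {k: gmf_hz / k for k in common_primes}

# A gear pair shares no factor other than 1 only if the gcd is 1.
teeth_are_coprime = gcd(pinion_teeth, girth_gear_teeth) == 1
```

With these inputs the sketch gives common prime factors 2, 3 and 5, so the frequencies of interest are roughly 30.5 Hz, 20.3 Hz and 12.2 Hz.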



Eccentric Gears

Typical Symptoms: 1x radial (in Vertical and Horizontal directions)

Like eccentric pulleys, Eccentric gears generate strong 1x radial components, especially in the direction parallel to the gear.

They would also generate sidebands of the running speed of the eccentric gear around the GMF (gear mesh frequency). However, harmonics of GMF may also be generated (depends on the severity of the problem). Natural frequency might also be excited.

Time waveform: The waveform will have a combination of the 1x running speeds of the input and output shafts plus strong gear mesh vibration modulated by the running speed of the shaft having the eccentric gear.

Phase: Not applicable.

Eccentric Pulleys

Typical Symptom: High 1x in the direction parallel to the belts, though a 1x component can be found in both the Vertical and Horizontal directions.

Instead of the typical Vertical and Horizontal directions it is best to choose the directions parallel and perpendicular to the belts.

The high 1x can be found on both sub-assemblies (e.g. the motor and fan). Since the motor and the fan would run at different speeds we would also find two distinct peaks on the signature corresponding to the motor and fan running speeds. Confirmation about which pulley is eccentric can be obtained by removing the belts and checking for the presence of high 1x on motor in the direction parallel to the belts.

Time waveform would be sinusoidal when viewed in velocity.

Phase: Phase reading taken parallel and perpendicular to belts will either be in phase or 180 degrees out of phase.


Eccentric rotor

Symptom: Pole pass sidebands around 1x N (N=running speed) and 2xLf (Lf = line frequency)

Eccentric rotors will produce a rotating variable air gap between the rotor and the stator, which induces a pulsating source of vibration. We would see 2xLf. However, there will also be pole pass sidebands around the 2xLf and 1xN peaks. 1xN is expected to be high.

Note: Pole pass frequency is the slip frequency times the number of poles. The slip frequency is the difference (in terms of frequency) between the actual RPM and the synchronous speed.
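The definition in the note above can be sketched in Python as follows. The function name is hypothetical, and the example values (a 50 Hz supply, a 6-pole motor running at 994 rpm) are assumptions chosen only to illustrate the calculation.

```python
def pole_pass_frequency_hz(line_freq_hz, poles, actual_rpm):
    """Pole pass frequency = slip frequency (Hz) x number of poles."""
    # Synchronous speed of an induction motor in rpm.
    sync_rpm = 120.0 * line_freq_hz / poles
    # Slip frequency: synchronous speed minus actual speed, in Hz.
    slip_hz = (sync_rpm - actual_rpm) / 60.0
    return slip_hz * poles


# Assumed example: 50 Hz supply, 6 poles, 994 rpm actual speed.
line_freq_hz = 50.0
ppf_hz = pole_pass_frequency_hz(line_freq_hz, poles=6, actual_rpm=994.0)

# Expected sideband positions around 2xLf.
twice_lf_hz = 2 * line_freq_hz
sidebands_hz = (twice_lf_hz - ppf_hz, twice_lf_hz + ppf_hz)
```

For these assumed values the pole pass frequency is about 0.6 Hz, which shows why fine spectral resolution is needed to separate the sidebands from the 100 Hz peak.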

Presence of pole pass sidebands around 1N and 2Lf is the key indicator of this fault. One needs sufficient resolution to see those sidebands. Else we would either miss them altogether or mistake them for resonance (a broadening of the base of the peak).

Waveform: A time waveform that covers many seconds will reveal the pole pass frequency modulation. Due to the lack of impacting, the waveform will be smooth and will be a combination of the 1N and 2Lf frequencies of vibration.

Phase: Not applicable for this fault unless eccentric forces are high in magnitude.


Dibyendu De

Structure of a 2 day workshop on RCM

Day 1 
Session 1 – Introduction to RCM, History and 7 Questions
* Definition of Reliability, RCM and the 7 Vital Questions
* Maintenance Strategies
* Waddington Effect
* Nowlan & Heap’s Failure Patterns
* Inherent Reliability and its improvement strategy
Session 2 — Operating Context and Functions 
* Introduction to Operating Context
* Operating Context for a System
* Elements to be included
* Operating Context and Functions
* 5 general operating contexts
* Operating Context and Functional Failures
Session 3 – Failure Modes and Failure Effects  
* Introduction to Failure Modes
* Few thoughts about data
* Exploring Failure Modes
* 4 Rules for Physical Failure Modes
* Failure Effect
* Evidence that failure is occurring
Session 4 — Failure Consequence and Risk 
* Introduction to Decision Diagram
* Risk assessment — how each failure matters
* Is the function Hidden or Evident
* Relation of time and Hidden vs Evident
* Safety and Environmental Consequences
* Operational and non-operational Consequences
Day 2 
Session 5 — Strategies and Proactive Tasks 
* Introduction to Proactive Tasks and PF interval
* CBM/On-condition tasks
* Scheduled Restoration and Scheduled Discard Tasks
* Determining Task Effectiveness
* Risk and Tolerability
* General Rules for following the decision diagram
Session 6 — Default Actions 
* Introduction to Default Actions
* Default tasks for hidden failures
* Failure Finding Task
* Failure finding Interval
* Design Out Maintenance — to do or to be
* Walk around checks with right timing
Session 7 — RCM Audits 
* Introduction to Audits
* Fundamentals of Technical Audit
* Technical Audit process
* Fundamentals of Management Audit
* General Management Audit process
* What RCM achieves
Session 8 — Setting up a Successful Living Program 
* Using the power of facilitated group
* RCM Training
* Knowledge development and its process
* Failure Modes and Design Maturity
* RCM during scale up or expansion
* Summary and Conclusion

Rethinking Maintenance Strategy

As of now, maintenance strategy looks similar to strategy taken by the medical fraternity in themes, concepts and procedures.

If things go suddenly wrong we just fix the problem as quickly as possible. A person is healthy to the point when the person becomes unhealthy.

That might work fine for simple diseases like harmless flu, infections, wounds and fractures. And it is rather necessary to do so during such infrequent periods of crisis.

But that does not work for more serious diseases or chronic ones.

For such serious and chronic ones we either go for preventive measures like general cleanliness, hygiene, food and restoring normal living conditions, or for predictive measures through regular check-ups that detect problems like high or low blood pressure, diabetes and cancer.

Once detected, we treat the symptoms posthaste, resorting to prolonged doses of medication or surgery or both, as in the case of cancer. But unfortunately, the chance of survival or of prolonging the life of a patient is rather low.

However, it is time we rethink our strategy of maintaining health of a human being or any machine or system.

We may do so by orienting our strategy to understand the dynamics of a disease. By doing so, our approach changes radically. For example, let us take Type 2 diabetes, which is becoming a global epidemic. Acute or chronic stress initiates or triggers the disease (Initiator). Poor or inadequate nutrition or the wrong choice of food accelerates the process (Accelerator), whereas taking regular physical exercise retards or slows down the process (Retarder). It is worthwhile to mention that the Initiator(s), Accelerator(s) and Retarder(s) get together to produce changes that trigger unhealthy or undesirable behavior or failure patterns. Such interactions between initiator(s), accelerator(s) and retarder(s), which I call ‘imperfections‘, change the gene expression, which gives rise to a disease that often has to be treated over the entire lifecycle of a patient or system with a low probability of success.

The present strategy to fight diabetes is to modulate insulin levels through oral medication or injections to keep blood sugar to an acceptable level. It often proves to be a frustrating process for patients to maintain their blood sugar levels in this manner. But more importantly, the present strategy is not geared to reverse Type 2 diabetes or eliminate the disease.

The difference between the two approaches lies in “respond to the symptom” (high blood sugar) versus “respond to the imperfection”, that is, the interaction between Initiators, Accelerators and Retarders. The response to the symptom is done through constant monitoring and action based on the condition of the system, without attempting to take care of the inherent imperfections. On the other hand, the response to imperfections involves appropriate and adequate actions around the I, A and R s and monitoring their presence and levels of severity.

So a successful strategy to reverse diabetes would be to eliminate or avoid the initiator (or keep it as low as possible); weaken or eliminate the Accelerator and strengthen or improve the Retarder. A custom made successful strategy might be formulated by careful observation and analysis of the dynamics of the patient.

As a passing note, by following this simple strategy of addressing the “system imperfections“, I could successfully reverse my Type 2 Diabetes, which even doctors considered impossible. Moreover, the consequences of diabetes were also reversed.

Fixing diseases as and when they surface or appear is similar to the Breakdown Maintenance strategy, which most industries adopt. Clearly, other than cases where the consequences of a failure are really low, adoption of this strategy is not beneficial in terms of maintenance effort, safety, availability and costs.

As a parallel in engineering, tackling a disease through preventive measures is like Preventive Maintenance and Total Productive Maintenance, a highly evolved form of Preventive Maintenance. Though such a strategy can prove to be very useful to maintain basic operating conditions, the limitation, as in the case of human beings, is that it does not usually ensure successful ‘mission reliability’ (high chance of survival or prolonging healthy life to the maximum), as demonstrated by the Waddington Effect. (You may refer to my posts on Waddington Effect here 1 and here 2)

Similarly, predictive strategy along with its follow up actions in medical science, is similar to Predictive Maintenance, Condition Based Maintenance and Reliability Centered Maintenance in engineering discipline. Though we can successfully avoid or eliminate the consequences of failures; improvement in reliability (extending MTBF — Mean Time Between Failures) or performance is limited to the degree of existing “imperfections” in the system (gene expression of the system), which the above strategies hardly address.

For the purpose of illustration of IAR method, you may like to visit my post on — Application of IAR technique

To summarize, a successful maintenance strategy that aims at zero breakdown and zero safety and performance failures and useful extension of MTBF of any system may be as follows:

  1. Observe the dynamics of the machine or system. This might be done by observing  energy flows or materials movement and its dynamics or vibration patterns or analysis of failure patterns or conducting design audits, etc. Such methods can be employed individually or in combination, which depends on the context.
  2. Understand the failures, abnormal behavior or performance patterns from equipment history or a review of the existing equipment maintenance plan.
  3. Identify the Initiators, Accelerators and Retarders (IARs)
  4. Formulate a customized comprehensive strategy  and detailed maintenance and improvement plan around the identified IARs keeping in mind the action principles of elimination, weakening and strengthening the IARs appropriately. This ensures Reliability of Equipment Usage over the lifecycle of an equipment at the lowest possible costs and efforts. The advantage lies in the fact that once done, REU gives ongoing benefits to a manufacturing plant over years.
  5. Keep upgrading the maintenance plan, sensors and analysis algorithms based on new evidences and information. This leads to custom built Artificial Intelligence for any system that proves invaluable in the long run.
  6. Improve the system in small steps that give measurable benefits.

By Dibyendu De



Doing Nothing yet Everything is Done

From 21st June to 23rd June I conducted a live workshop on Streamlined Reliability Centered Maintenance (SRCM) at the Power Management Institute (PMI) of National Thermal Power Corporation (NTPC).

But what the heck is SRCM?

It is a structured process of risk based decision making against black swans.

In brief, it is about:

  1. How to detect an incipient black swan in time?
  2. How to improve the stability of a system?
  3. How to improve the longevity of a system?
  4. How to mitigate consequences of failures?

When we are able to do all that to a system we may call it “smart maintenance.” After all, as human beings we create, maintain and destroy systems. Given a system, smart maintenance is about doing all three: create, maintain and destroy. Surely, it is one of the most complex forms of project management we can engage with.

However, the smart maintenance can really happen when one simply does nothing yet everything is done.