Plant Reliability Improvement — RAPID

A) The Background

Engineers have tried to address the reliability of equipment and systems in various ways, and many methods and techniques therefore exist. The usual approach pivots around the concept of “failure modes” and how best to guard against them so as to prevent the consequences of failures. However, the existing methods do not take into account the flow of energy as an important factor that determines the reliability of plant and machinery.

Taking this into consideration, I am putting together some rules and principles that hopefully point to a path that might enable engineers to improve plant reliability with minimum effort, resources and time.

But before I do that I would like to put forward an easy understanding of the term — reliability.

B) Reliability of a Machine:

The period of time for which a machine would run (fulfil its function) without a problem or trouble.

The longer the period of trouble-free running, the better the reliability of the plant.

It therefore addresses the heart of reliability improvement — i.e. enhancement of useful operating life of a machine or a system of machines.

C) What does that mean in terms of energy flow?

We may say that a machine that can continue with the smooth flow of energy, with the minimum wastage of energy for a long period of time is more reliable than a machine where energy flow is disrupted frequently in some manner or the energy wastage is high or the energy is pushed out of equilibrium condition, which invariably stops the machine from functioning effectively.

D) So what might be the job of engineers?

It is evident that the fundamental job of engineers (both operation and maintenance) is to run and maintain a machine or system in such a way that energy wastage is minimized and smooth flow is ensured for a long or desired period of time.

E) How can this be done?

This may be done in three fundamental ways, which are as follows:

  1. Observe the dynamics of a machine or system to ascertain energy flow patterns and the degree of energy wastage and the reasons for such wastage. Also determine the degree of stability or instability of the system and what affects a system’s stability.
  2. Adjust or maintain the conditions within its operating context to ensure smoother flow of energy with minimum wastage. In the process, learn what changes in a machine/system would disrupt energy flow or push it out of equilibrium conditions and prevent such disruptions.
  3. Change, monitor, modify the system (made up of physical asset and components, process, information flow, analysis and decision making, teams) as necessary, for smoother flow of energy to continue over a longer period of time with minimum energy wastage.

F) How to apply it in a real plant?

I have found the following method useful in various plants where I implemented Reliability Improvement Programs.

  1. Make a list of critical machines.
  2. Select a critical machine along with its sub-systems (machines that support its functioning)
  3. Apply the three steps as outlined in Section E (above) — within the operating context.
  4. Improve and stabilize performance; record the learning.
  5. Create a custom made monitoring system to spot changes in time along with custom made expert system to guide engineers to quickly decide the course of actions to be undertaken.
  6. Record the decisions, actions and changes in the form of equipment history.
  7. Move to the next critical machine and its sub-systems.
  8. As we go along, first check whether all the consequences of failures have been taken care of. Next check whether overall plant/area/section MTBF (Mean Time Between Failures) is going up and whether MTTR (Mean Time to Repair) is going down along with consequential lowering of maintenance and operation costs. Lastly check the accuracy of the custom made expert system (usually made up of multivariate parameters) in its ability to forewarn and guide maintenance decisions.
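
The MTBF and MTTR checks in step 8 can be sketched as follows. This is only a minimal illustration, assuming the equipment history is available as a list of (running hours before failure, repair hours) pairs; that layout, the function names and the sample numbers are hypothetical and are not part of the method described above.

```python
from typing import List, Tuple

# One equipment-history entry: (uptime_hours, repair_hours). Hypothetical layout.
FailureRecord = Tuple[float, float]


def mtbf(records: List[FailureRecord]) -> float:
    """Mean Time Between Failures: total running hours / number of failures."""
    if not records:
        raise ValueError("at least one failure record is required")
    return sum(uptime for uptime, _ in records) / len(records)


def mttr(records: List[FailureRecord]) -> float:
    """Mean Time to Repair: total repair hours / number of failures."""
    if not records:
        raise ValueError("at least one failure record is required")
    return sum(repair for _, repair in records) / len(records)


def reliability_improving(before: List[FailureRecord],
                          after: List[FailureRecord]) -> bool:
    """True when MTBF went up and MTTR went down between two periods."""
    return mtbf(after) > mtbf(before) and mttr(after) < mttr(before)


# Illustrative numbers only, not data from any real plant.
before = [(400.0, 10.0), (600.0, 14.0)]
after = [(900.0, 6.0), (1100.0, 8.0)]
print("MTBF before/after:", mtbf(before), mtbf(after))
print("MTTR before/after:", mttr(before), mttr(after))
print("Improving:", reliability_improving(before, after))
```

Comparing two periods in this way keeps the check simple; in practice the periods and the plant/area/section grouping would be chosen to suit the operating context.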

G) General Rules:

Keeping the above in mind I formulated the following four rules that might help engineers and managers stay on the path of plant reliability improvement:

  1. For any machine, energy tries to move in sync through all elements of a machine through various interfaces against many contradictions and constraints but always choosing the path of least resistance.
  2. Changes in contradictions, constraints and interfaces change the quality of energy flow forcing energy flow to go out of equilibrium (instability) to either cause degradation of performance or cause failures that lead to unwarranted plant stoppages, affecting costs and productivity.
  3. Changes and the causes of such changes are reflected in the dynamics of a machine in terms of interdependent parameters like vibrations, heat, flow, wear, humidity, temperature, pressure etc.
  4. Reliability of any machine or system can be improved by either maintaining the contradictions, interfaces and constraints to “just right conditions” or changing those to enable smoother flow of energy for a longer period of time with minimum wastage.

H) Applications:

Having applied these basic rules, some industrial plants were able to gain on-going benefits for years. Here are some examples of plant-wide reliability improvement.

Fixing Organizational Problems

Every day, managers in different organizations face an array of problems. Usually, such problems keep repeating — either randomly or at regular intervals. After a while, it then becomes clear to the managers that such problems resist current ways of thinking and actions as practiced within the organization. 

The response to such problems is: “How can this problem be fixed permanently?”

A manager would then try to apply known theories, methods, and tools to solve the problem. And in this process, the managers can also increase their skills. But the problem is that the problems simply don’t vanish. They have a bad habit of sneaking back through the backdoor. 

Why is that?

The short answer is: “No problem can be fixed, at least permanently.”

This is because the nature of the problems keeps changing with time or the same problem comes back with different intensity or frequency. 

However, one can find and install new guiding ideas. And one can intently engage in redesigning an organization’s infrastructure, policies, rules, methods, and the tools presently used to find new ways of dealing with work and problems. 

The key is to closely observe what is going on in the present and then discover the organizational subconsciousness (mindset) that allows such events to happen with alarming regularity or randomly. Once that mindset is found, a manager can then find new ways of thinking and practices to replace the old governing mindset. 

If one keeps going in this way one can gradually evolve a new type of organization that is responsive, agile and observant about the numerous interactions that go on within an organizational environment, and so becomes a better and fitter organization.

It would then be able to deal with the problems and opportunities of today and invest in its capacity with the right resources and efforts to embrace a better future. This happens because its members are focused on enhancing and expanding their collective consciousness  — where individual members are able to observe, learn and change together.  

In other words, they collectively create, support and sustain an organization that continually learns from its present situation.

Case of Missing Gear Mesh Frequency


“Why don’t we see the Gear Mesh Frequency (GMF) on the output side of a splash lubricated slow speed gear box?”

This is quite puzzling since common sense dictates that such peaks should be present.

My Answer:

The principles involved are the following:

1. Air, water and oil produce turbulence when worked on by machines like pumps, gears, fans, propellers etc.
2. Such turbulence creates damping force.
3. This is proportional to the square of the velocity.
4. But this damping force acts in quite a funny manner.
5. For slow speed machines (say below 750 rpm; the slower the better) damping is positive, that is, it goes against the motion and so neutralizes the entropy, as seen by the decrease in the vibration levels. Hence the gear mesh frequencies vanish. The Coriolis effect on the output side of the gear box also helps in attenuating the vibration.
6. But for high speed machines damping is negative, that is, it goes in the direction of the motion and therefore enhances the entropy, as seen by the increase in the vibration levels.
7. So, for low speed machines it goes against the motion and suppresses the GMF. In some cases it suppresses the fundamental peak as is found in the case of the vertical Cooling Water Pumps of Power Plants. GMF is produced when the fundamental frequency is superimposed onto the vibration generated through gear impacts.
8. It therefore follows that for high speed gear boxes it magnifies both fundamental and GMF peaks.

Missing peaks therefore indicate fluid turbulence, which might also be indicated by other peaks like vane pass frequencies. The condition monitoring of such gear boxes might best be done through Wear Debris Analysis/Ferrography.
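
Before calling a GMF peak “missing”, it helps to know where it should appear in the spectrum. The sketch below uses the standard relations (GMF = number of teeth × shaft rotational frequency; vane pass frequency = number of vanes × shaft rotational frequency) and then looks for a peak near each expected frequency. The spectrum format, the tolerance and the amplitude threshold are assumptions made for illustration, and the numbers are not from a real gear box.

```python
from typing import List, Tuple

# A spectrum as (frequency_hz, amplitude) pairs. Hypothetical format.
Spectrum = List[Tuple[float, float]]


def shaft_frequency_hz(rpm: float) -> float:
    """Rotational (1x) frequency of a shaft in Hz."""
    return rpm / 60.0


def gear_mesh_frequency_hz(teeth: int, rpm: float) -> float:
    """GMF = number of teeth on the gear x rotational frequency of its shaft."""
    return teeth * shaft_frequency_hz(rpm)


def vane_pass_frequency_hz(vanes: int, rpm: float) -> float:
    """Vane pass frequency = number of vanes x shaft rotational frequency."""
    return vanes * shaft_frequency_hz(rpm)


def has_peak_near(spectrum: Spectrum, target_hz: float,
                  tolerance_hz: float = 1.0,
                  min_amplitude: float = 0.1) -> bool:
    """True if any spectral line within the tolerance reaches the threshold.

    Both tolerance_hz and min_amplitude are assumed values to be tuned.
    """
    return any(abs(freq - target_hz) <= tolerance_hz and amp >= min_amplitude
               for freq, amp in spectrum)


# Illustrative case: a 60-tooth output gear turning at 300 rpm.
gmf = gear_mesh_frequency_hz(teeth=60, rpm=300.0)
vpf = vane_pass_frequency_hz(vanes=6, rpm=300.0)
spectrum = [(5.0, 0.8), (30.0, 0.4)]  # made-up readings with no line at the GMF
print("Expected GMF (Hz):", gmf, "peak present:", has_peak_near(spectrum, gmf))
print("Expected vane pass (Hz):", vpf, "peak present:", has_peak_near(spectrum, vpf))
```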

So, this is the mystery of the missing GMF in splash lubricated slow speed gear boxes.

Therefore, splash lubrication for a low speed gear box is a good idea. It enhances the life of the gear box since it balances the entropy in the system.

But at the same time, with a higher oil level in a splash lubricated high speed gear box the vibration level would increase, especially the fundamental and the GMF. That would spell trouble.

Similarly, it is better to have a turbulent air flow in low speed fans and blowers. It suppresses the vibrations and therefore enhances the life of bearings.

Does nature also use these principles of fluid turbulence and damping? Some applications:

1. Birds’ nests are made up of loosely placed twigs and leaves, usually not bound to each other. But these don’t break up or fall off in turbulent winds. Damping keeps them in place and provides the necessary security to birds.

2. Swift flowing rivers allow fishes to grow bigger and better.

3. Winds, storms etc neutralize the increase in entropy.

Design Ideas for Reliability & Sustainability?

1. Low speed gear boxes might best be lubricated by splash lubrication.
2. High speed gear boxes might best be lubricated by spray lubrication.
3. Hotter and turbulent air might best be handled by low speed fans and blowers.

Eccentricity in general

Symptoms are generally 1x radial (Vertical and Horizontal for a horizontally mounted machine).

Eccentricity occurs when the centre of rotation is offset (like offset misalignment) from the geometric centreline of a gear, motor rotor or a pulley.

It would generate strong 1x radial peak — in the direction parallel to the rotor/gear/pulley. This condition is common and mimics unbalance.

For gear eccentricity we would see 1x sidebands.

For motor rotor eccentricity we would see pole pass sidebands.

Time waveform would be sinusoidal when viewed in velocity. Vibration from gear will also have gear mesh vibration and modulation of the turning shaft of the offending gear.

Phase: If belt driven, phase readings taken parallel and perpendicular to belts will either be in phase or 180 degrees out of phase. For a direct driven component, vertical and horizontal readings will be 90 degrees out of phase.
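
The symptoms above can be turned into a few quick checks. The sketch below is a minimal illustration only: it places 1x sidebands around the GMF, computes pole pass frequency from the commonly used relation (number of poles × slip frequency), and tests whether a phase difference matches the eccentricity pattern described above. The function names and the 20 degree tolerance are assumptions, not established limits.

```python
from typing import List


def gear_sidebands_hz(gmf_hz: float, shaft_rpm: float, orders: int = 2) -> List[float]:
    """Frequencies at GMF +/- k x (1x of the offending gear's shaft)."""
    one_x = shaft_rpm / 60.0
    return [gmf_hz + k * one_x for k in range(-orders, orders + 1) if k != 0]


def pole_pass_frequency_hz(line_hz: float, poles: int, rotor_rpm: float) -> float:
    """Pole pass frequency = number of poles x slip frequency."""
    sync_rpm = 120.0 * line_hz / poles          # synchronous speed
    slip_hz = (sync_rpm - rotor_rpm) / 60.0     # slip frequency
    return poles * slip_hz


def phase_matches_eccentricity(phase_a_deg: float, phase_b_deg: float,
                               belt_driven: bool,
                               tolerance_deg: float = 20.0) -> bool:
    """Check the phase pattern described in the text.

    Belt driven: readings parallel and perpendicular to the belts are about
    0 or 180 degrees apart. Direct driven: vertical and horizontal readings
    are about 90 degrees apart. tolerance_deg is an assumed value.
    """
    diff = abs(phase_a_deg - phase_b_deg) % 360.0
    if diff > 180.0:
        diff = 360.0 - diff  # fold the difference into the 0..180 range
    if belt_driven:
        return diff <= tolerance_deg or abs(diff - 180.0) <= tolerance_deg
    return abs(diff - 90.0) <= tolerance_deg


# Illustrative values only.
print(gear_sidebands_hz(gmf_hz=300.0, shaft_rpm=300.0))
print(pole_pass_frequency_hz(line_hz=50.0, poles=4, rotor_rpm=1470.0))
print(phase_matches_eccentricity(30.0, 120.0, belt_driven=False))
```

Since eccentricity mimics unbalance, a check of this kind is best read as one clue among several rather than as a diagnosis.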

Applying IAR Technique

The above photo is that of an underground pipe carrying a fluid high in chloride concentration.

As can be seen in the photo, there is a big rupture of the pipeline leading to an unwarranted and unplanned plant outage.

What can engineers do about it? The usual way of thinking about it is to quickly find a way to monitor the development of such failures in time so as to attend to the problem as quickly as possible with ruthless efficiency.

However, a technique to monitor the development of pitting/crevice corrosion (the failure mode in this case) is neither easy to apply nor readily available. And even if there were such a technique, its benefit, as in all cases of condition based maintenance or condition monitoring, is fundamentally limited. Why? Simply because it doesn’t help improve the MTBF (Mean Time Between Failures). Without substantial improvement in MTBF, we can hardly expect to improve performance, productivity and profitability through maintenance of plant assets.

Such an approach to maintenance is more in keeping with the present requirements and needs of the industries. Maintenance is not about maintaining and restoring assets to their original condition. In the present context, maintenance is directly linked to an organization’s profitability and survival under all economic conditions, at the minimum possible cost.

With this in mind, I invented the IAR (Initiator, Accelerator, Retarder) technique.

It means that for every failure or failure pattern or behavior pattern there would be at least one element in the system that would initiate or start the failure process. Similarly, there would be at least one element in the system that would accelerate the process. Likewise, there would be at least one element that would help retard the failure process.

Once these I, A, R elements are identified, the job of prolonging the life of a system at the least cost becomes relatively easy and more effective, which can be stated as follows:

1. Eliminate/Avoid the Initiator (s)

2. Prevent the Accelerator (s) from acting

3. Strengthen the Retarder (s)

The next step is to monitor the presence or development of the I, A and R elements, if found contextually appropriate and effective.

Coming back to our case, the I, A and R elements are the following:

Initiator — Material

Accelerator — Lack of cathodic protection

Retarder — Steady process that would prevent sudden increase in chloride concentration.

Therefore the set of solutions would be:

Eliminate Initiator: Select material having a high PREN (Pitting Resistance Equivalent Number).

Prevent Accelerator: Install cathodic protection.

Strengthen Retarder: Closely monitor the process to prevent sharp fluctuation of chloride concentration.

The above measures would not only help in preventing pitting/crevice corrosion but also prolong the life of the pipeline — helping the plant to become more productive and remain so for a longer time.
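
Material selection by PREN can be sketched as below. The formula used is the widely quoted form PREN = %Cr + 3.3 × %Mo + 16 × %N; other variants exist (for example with a different nitrogen factor or a tungsten term), so treat this as an assumption. The alloy names and compositions are hypothetical placeholders, not recommendations for the pipeline discussed here.

```python
from typing import Dict, List


def pren(cr_pct: float, mo_pct: float, n_pct: float = 0.0) -> float:
    """Pitting resistance number from weight percentages of Cr, Mo and N.

    Uses the common form Cr + 3.3*Mo + 16*N (an assumed variant).
    """
    return cr_pct + 3.3 * mo_pct + 16.0 * n_pct


def rank_by_pren(candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Return candidate names ordered from highest to lowest PREN."""
    return sorted(candidates, key=lambda name: pren(**candidates[name]), reverse=True)


# Hypothetical compositions for illustration only.
candidates = {
    "Alloy A (hypothetical)": {"cr_pct": 17.0, "mo_pct": 2.0, "n_pct": 0.05},
    "Alloy B (hypothetical)": {"cr_pct": 22.0, "mo_pct": 3.0, "n_pct": 0.15},
}

for name in rank_by_pren(candidates):
    print(name, round(pren(**candidates[name]), 1))
```

A higher number indicates better expected resistance to pitting in chloride service, which is why it serves here as the measure for eliminating the Initiator.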

Moreover, it proves to be a more effective maintenance planning tool than existing approaches.

Dibyendu De

Doing Nothing yet Everything is Done

From 21st June to 23rd June I conducted a live workshop on Streamlined Reliability Centered Maintenance (SRCM) at the Power Management Institute (PMI) of National Thermal Power Corporation (NTPC).

But what the heck is SRCM?

It is a structured process of risk based decision making against black swans.

In brief, it is about:

  1. How to detect an incipient black swan in time?
  2. How to improve the stability of a system?
  3. How to improve the longevity of a system?
  4. How to mitigate consequences of failures?

When we are able to do all that to a system we may call it “smart maintenance.” After all, as human beings we create, maintain and destroy systems. Given a system, smart maintenance is about doing all three: create, maintain and destroy. Surely, it is one of the most complex forms of project management we can engage with.

However, the smart maintenance can really happen when one simply does nothing yet everything is done.

The Case of Burning BagHouse Filters

Recently I was invited to investigate a case of frequent burning of baghouse filter bags.

There were five such baghouses connected to five furnaces of a steel plant.

The client reasoned that the material of the bags was not suitable for the temperature of the gas it handled. However, with change of material the frequency of bag burning did not change. So it needed a different approach to home in on the reasons for the failures.

Hence, this is how I went about solving the case:

First I did a Weibull analysis of the failures. Engineers use the Weibull distribution to quickly find out the failure pattern of a system. Once such a pattern is obtained an engineer can then go deeper in studying the probability density function (pdf). Such a pdf provides an engineer with many important clues. The most important clue it provides is the reason for such repeated failures, which are broadly classified as follows:

  1. Design related causes
  2. Operation and Maintenance related causes
  3. Age related causes.

In this case it turned out to be a combination of Design and Age related causes.

It was a vital clue that then guided me to look deeper to isolate the design and age related factors affecting the system.
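
A Weibull analysis of this kind can be sketched as follows. The example fits the shape (beta) and scale (eta) parameters by median rank regression using Benard's approximation, then reads the shape parameter in the conventional way: below 1 suggests early-life failures, near 1 random failures, above 1 wear-out. This is a swapped-in textbook approach shown for illustration, not necessarily the procedure used in the investigation; the mapping to the three cause classes and the width of the “near 1” band are assumptions.

```python
import math
from typing import List, Tuple


def fit_weibull(times_to_failure: List[float]) -> Tuple[float, float]:
    """Fit a two-parameter Weibull by median rank regression.

    Returns (beta, eta): shape and scale. Assumes complete (uncensored) data.
    """
    n = len(times_to_failure)
    if n < 2:
        raise ValueError("at least two failure times are required")
    ordered = sorted(times_to_failure)
    xs, ys = [], []
    for i, t in enumerate(ordered, start=1):
        f = (i - 0.3) / (n + 0.4)                 # Benard's median rank
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - f)))   # linearized Weibull CDF
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("failure times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                              # slope of the fitted line
    intercept = mean_y - beta * mean_x            # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


def interpret_shape(beta: float, band: float = 0.2) -> str:
    """Conventional reading of beta; the band around 1 is an assumed value."""
    if beta < 1.0 - band:
        return "early-life failures (often design or installation related)"
    if beta > 1.0 + band:
        return "wear-out failures (age related)"
    return "random failures (often operation and maintenance related)"


# Made-up running hours between bag failures, for illustration only.
hours = [310.0, 450.0, 520.0, 640.0, 700.0, 880.0]
beta, eta = fit_weibull(hours)
print("beta:", round(beta, 2), "eta:", round(eta, 1), "->", interpret_shape(beta))
```

A single fitted shape parameter cannot by itself show a combination of causes such as the one found in this case; a mixed pattern usually shows up as a bend in the Weibull plot, which calls for fitting the segments separately.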

I then did a modified FMEA (Failure Mode and Effect Analysis) for the two causes.

The FMEA revealed many inherent imperfections that were related to either design or aging.

Broadly, the causes were:

  1. Inability of the FD cooler (Forced Draft cooler) to take out excess heat up to the design limit before allowing the hot gas to enter the bag house.
  2. Inappropriate sequence of cleaning of the bag filters. It was out of sync with the operational sequence thus allowing relatively hot dust to build up on the surface of the bags.

Next, the maintenance plan was reviewed. The method used was Review of Equipment Maintenance (REM). The goal of such a review is to find maintenance tasks that are either missing or redundant, so that tasks can be added, deleted or modified. With such modification of the maintenance plan the aim is to achieve a balance between tasks that help find incipient signals of deterioration and tasks that help maintain the longevity and stability of the system for a desired period of time.

Finally the investigation was wrapped up by formulating the Task Implementation Plan (TIP). It comprised 13 broad tasks that were then broken up into more than 100 sub-tasks with scheduled dates for completion and accountability.


Love complex problems? Read Poetry

Solving difficult and complex problems is not easy. This is because established techniques cannot be used to solve them. There are no ready-made formulas, equations, methods or fixed viewpoints to sense and address a complex issue. At least that is what I have come to believe through my experience. Instead, as I understand, human qualities play a more important role than processes, viewpoints, methods and techniques.

To my mind, one of the basic human qualities that is needed is empathy. Empathy helps us to be one with the problem. We no longer act as passive, neutral or judgmental observers viewing a problem from a distance or from proverbial ivory towers. Unless we can become one with the problem the problem would continue to elude us. And failing to make any realistic sense of a problem would only spawn superficial solutions that would invariably fail to address the complexity of an issue.

But how to develop empathy? I am sure that there are many ways to do that. However, I find that reading and appreciating poetry is one of the sure ways to develop empathy.

Let us take an example to make it clear. Europe is facing a “refugee crisis.” Leaders of Europe are fighting over it. Their differing opinions are so misaligned that the stability of the European Union stands threatened. Now, if someone asks me to think about the “refugee problem” and come up with at least an opinion I would certainly fail to do so. I am far away from the scene. I am not affected. I am not a “refugee” myself. I don’t understand the pain and the reason why people are forced to leave their homes, desperately trying to secure a foothold in a foreign land. Well, I can sympathize with the refugees but at the same time I would be unable to form a proper understanding of the effectiveness of various government policies crafted with the intention to tackle the situation. I can of course decide to read up a lot on the issue to make an informed judgement. Reports appearing in social media, newspapers and journals, and reports in the form of videos and pictures, would perhaps serve my purpose. I can also decide to engage in intelligent dialogues with people who are knowledgeable about the subject. But it would still not help me much. In fact, I have been trying to make sense of the problem through such means ever since the problem surfaced and gained international prominence. But frankly I did not have a clue about the reason or its solution. When my friends wanted to discuss this issue I always felt dumb. So, instead of airing my views I was content to be a good listener.

Then, one day, I happened to read this poem “Home” by Warsan Shire.

no one leaves home unless
home is the mouth of a shark
you only run for the border
when you see the whole city running as well

your neighbors running faster than you
breath bloody in their throats
the boy you went to school with
who kissed you dizzy behind the old tin factory
is holding a gun bigger than his body
you only leave home
when home won’t let you stay.

no one leaves home unless home chases you
fire under feet
hot blood in your belly
it’s not something you ever thought of doing
until the blade burnt threats into
your neck
and even then you carried the anthem under
your breath
only tearing up your passport in an airport toilets
sobbing as each mouthful of paper
made it clear that you wouldn’t be going back.

you have to understand,
that no one puts their children in a boat
unless the water is safer than the land
no one burns their palms
under trains
beneath carriages
no one spends days and nights in the stomach of a truck
feeding on newspaper unless the miles travelled
means something more than journey.
no one crawls under fences
no one wants to be beaten

no one chooses refugee camps
or strip searches where your
body is left aching
or prison,
because prison is safer
than a city of fire
and one prison guard
in the night
is better than a truckload
of men who look like your father
no one could take it
no one could stomach it
no one skin would be tough enough

go home blacks
dirty immigrants
asylum seekers
sucking our country dry
niggers with their hands out
they smell strange
messed up their country and now they want
to mess ours up
how do the words
the dirty looks
roll off your backs
maybe because the blow is softer
than a limb torn off

or the words are more tender
than fourteen men between
your legs
or the insults are easier
to swallow
than rubble
than bone
than your child body
in pieces.
i want to go home,
but home is the mouth of a shark
home is the barrel of the gun
and no one would leave home
unless home chased you to the shore
unless home told you
to quicken your legs
leave your clothes behind
crawl through the desert
wade through the oceans
be hunger
forget pride
your survival is more important

no one leaves home until home is a sweaty voice in your ear
run away from me now
i dont know what i’ve become
but i know that anywhere
is safer than here

By now, if you have felt something stir inside you, I am sure that I need not say one more word about what empathy is and how one can develop this quality.

Astonishingly, by the time we come to the end of the poem, both the reason for the crisis and the probable solutions are self-evident even though the poet has neither offered a reason nor proposed any possible solution.

That is the power of empathy, which as I have discovered, may be greatly enhanced and sustained through the power of poetry.


Role of Critical Thinking in Solving Complex Problems

Within the next five years the ability to solve complex problems would be the number one skill people would be desperately looking for.

However, development of this skill rests on three fundamental pillars, which are:

  1. Critical Thinking 
  2. Creativity 
  3. “Seeing” things differently

In this post, we focus on the skill of critical thinking. To do so, we draw inspiration from ten examples of critical thinking and critical thinkers that changed our world and our world views.

“Being bold enough to let your mind go where good arguments take you, even if it’s to places that make you feel uncomfortable, may lead you to discoveries about the world and yourself.”

(Critical Thinking: The Art of Argument, by George W. Rainbolt and Sandra L. Dwyer)

Creativity in Solving Complex Problems

The other day, at the end of my seminar on “Solving Complex Engineering Problems” a delegate asked me as to whether the entire process of solving complex problems can be automated in some way by means of a software instead of relying on human creativity.

Such a response wasn’t unexpected. In the corporate world the word “creativity” is often looked at with suspicion. They would rather prefer structured and standard approaches like “brainstorming” at 10.00 am sharp or team work or collaborative effort, which in my opinion do little to help anyone solve complex problems or even address complex problems correctly.

That might be the single most important reason why “complex problems” remain unresolved for years affecting profitability and long term sustenance of an organization. Failing to resolve complex problems for years often earns such problems the sobriquet of “wicked problems”, which means that such problems are too tough for “any expert” to come to grips with.

What they sadly miss out is the role of creativity in solving complex problems, which no automation or technology can ever replicate. They miss this because most organizations systemically smother or mercilessly boot out any remnant of creativity in their people since they think that it is always easier to control and manage a regimented workforce devoid of even elementary traces of creativity.

So, is managing creativity and creative people a messy affair? On the surface it seems so. This is simply because we generally have only a vague idea of what drives, inspires and really sustains creativity.

Creativity is not about wearing hair long or wearing weird clothes, singing strange tunes, coming to office late and being rude to bosses for no apparent reasons. These things hardly make anyone creative or help anyone become a more creative person.

Actually, things like “being attentive and aware”, “sensitive”, “passionate”, “concerned”, “committed” and above all “inventive” just might be the necessary ingredients to drive, inspire and sustain creativity.


Though there are many ways of describing and defining creativity, what I like best is: “creativity is the expression of one’s understanding and expression of oneself”. The deeper the understanding, the better the expression of creativity.

When we look at creativity in this manner it is obvious that we are all creative though the expression and its fidelity might vary to a great extent. Clearly, some are simply better than others.

Further, if creativity may be thought about as a process, then the inputs and the clarity of understanding of ourselves are more valuable elements of the system than the outputs that the process anyway consistently churns out (remember the uncountable hours we spent in organizational meetings, discussing and brainstorming to solve complex problems).

In these days of economic depression, organizations can really do themselves a huge favor if only they pay more attention to facilitating such inputs to people rather than get overly worried about control and management by conformity.