Rules of Thumb about Decision Making

  1. Consider what you want to achieve, avoid, sustain or improve. Focus on the results you want, not on actions, and create goals accordingly. In short, a goal is the gap between the vision and current reality.
  2. Generate many options to achieve the goal. Don’t stick to a few options.
  3. Decisions are based on published, measurable criteria and never on an ad hoc basis.
  4. Criteria can usually be classified into “Must” and “Want”. The Must criteria must be satisfied for the option/recommended solution to be viable.
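
The Must/Want split above can be sketched in a few lines of Python. This is only an illustration: the options, the criteria names, the weights and the 0 to 10 scoring scale are all hypothetical and not taken from any real decision. Options that fail any Must are dropped; the remaining ones are ranked by a weighted sum of their Want scores.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    name: str
    # Must criteria: criterion name -> True if the option satisfies it.
    musts: dict
    # Want criteria: criterion name -> score on a hypothetical 0-10 scale.
    wants: dict = field(default_factory=dict)


def rank_options(options, want_weights):
    """Drop options that fail any Must, then rank the rest by weighted Want score."""
    viable = [o for o in options if all(o.musts.values())]

    def score(option):
        return sum(w * option.wants.get(c, 0.0) for c, w in want_weights.items())

    return sorted(((o.name, score(o)) for o in viable),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical example: choosing a replacement pump.
want_weights = {"low energy use": 3.0, "short lead time": 1.0}
options = [
    Option("Pump A", {"meets flow": True, "within budget": True},
           {"low energy use": 6, "short lead time": 9}),
    Option("Pump B", {"meets flow": True, "within budget": False},
           {"low energy use": 10, "short lead time": 10}),
    Option("Pump C", {"meets flow": True, "within budget": True},
           {"low energy use": 8, "short lead time": 4}),
]
ranking = rank_options(options, want_weights)
print(ranking)
```

Pump B has the best Want scores but is never ranked, because it fails a Must. Publishing the weights before scoring keeps the decision from becoming ad hoc.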

Rules of Thumb — Engineering Communication

One of the most important tasks of engineers, managers, facilitators, guides, mentors, consultants and trainers is to communicate.

Without the right, effective communication, nothing seems to get done. One may work very hard and still fail to see results on the ground, which is indeed very frustrating. The secret is that unless people are involved, nothing worthwhile gets done. The goal of communication is to involve people.

In this article, I would like to focus on the Communication Process.

Here are some basic rules of thumb on the Process of Engineering Communication:

  1. Audience Based: Use an audience-based, not a writer-based, approach. The content of the communication (whether written or verbal) must address only the questions the audience wants answered. There is little or no point in dumping whatever information the writer happens to know on the subject. A knowledgeable writer or communicator is internally motivated to share everything he/she knows, so the length of such a presentation or written communication ends up directly proportional to the time the communicator has spent researching the subject. This becomes quite boring for an audience. It is worth remembering that a communicator has no need to be perceived as a highly knowledgeable person; the audience will decide that anyway. If one is forced to communicate in a structured fashion, the structure should be hierarchical, i.e. arranged in order of importance to the audience, starting from what the audience “must know” about a question and moving to what is merely “good to know”, which in most cases can be left out.

    In any case, the communication structure must not be historical or chronological, unless the audience is interested in the history of a subject. In my workshops, I start off by asking what the audience wants to know and then address their questions one by one. Once you adopt this approach, you will see how time flies and how stress-free the environment becomes. The golden rule: no universally accepted template fits all communications; each communication must answer the questions of its audience.
  2. Be Effective: Effective communication is linked to two important things: a) problem solving skills and b) clear thinking skills. However, if I were to choose between the two, I would go for problem solving skills alone, because good and effective problem solvers develop clear thinking skills along the way; without them it is difficult to solve any problem worth its salt. But why is effective communication linked to problem solving skills? Because any audience loves to hear stories, and a problem solver has real-life stories to tell. Real-life stories keep an audience glued to the communicator. When a communicator tells a story, the tension and suspense created are palpable, and this moves the audience to be attentive. Moreover, as I have seen, an audience learns most from stories. Needless to say, helping others learn through their own self-awareness, and then seeing them act upon it, is the fundamental effect of any good communication.
  3. Welcome Confusion: Confusion is always welcome since it forces the communicator to check back and rethink his/her thinking. A good way to bring confusion out into the open is to trigger a feedback loop. Simply stated, this means asking the audience which parts of the communication weren’t well understood or appreciated. This not only helps the audience stick to the flow of the communication but also brings the communication to life.
  4. Let ideas flow: Ideas must flow coherently from one to the next. In short, the ideas must be well knitted together and expressed cogently. This helps an audience see the whole picture and appreciate the depth of a topic. It generates interest on an ongoing basis, which propels the audience to think and act on their understanding. There is no need to stick to one idea throughout a presentation. One may, of course, dwell on an idea for some time before logically connecting it to the next. Connecting ideas triggers the thinking process of the audience, which is worth its weight in gold.
  5. Revise: Be willing to rethink, revise and rework the whole structure if the structure of the communication doesn’t meet the needs of an audience. Though grammar and style are important, there is no great need to polish them endlessly. And be more than willing to delete or discard entire sections already written instead of hoping, incorrectly, to deliver everything that is written.
  6. Coherent Written Plan: Develop a coherent written plan, especially when one is supposed to speak on a topic or present verbally. A simple cue card (5 x 7 inches) with a few bullet points may be sufficient to keep one pegged to the overall picture. This also helps keep the presenter’s mind poised and calm. Any tension in the presenter’s mind soon shows up and gets communicated; the audience would notice it and wouldn’t like it much.
  7. Rehearse: Don’t forget to rehearse. Communication, like all performance arts, needs deep rehearsing. There are many ways of rehearsing, some of which are: a) read aloud (it helps to uncover flaws very quickly); b) present it to your children, spouse or friends; c) rehearse in front of a mirror; d) mentally rehearse the topic, including the gestures, pauses etc. one is likely to incorporate in the presentation (this can be done even while taking a shower).

By Dibyendu De

Why Systems Fail to Achieve RAM requirements?

The following is a summary of the reasons why manufacturing plants fail to achieve the desired level of RAM (Reliability, Availability and Maintainability):

• Poorly defined or unrealistically high RAM requirements.

• Lack of priority on achieving Reliability & Maintainability

• Too little engineering for RAM

  • Failure to design-in reliability early in the development process
  • Inadequate lower level testing at component or subcomponent level
  • Reliance on prediction instead of conducting engineering design analysis.
  • Failure to perform engineering analysis of commercial off the shelf equipment
  • Lack of reliability improvement incentives
  • Inadequate planning for reliability
  • Ineffective implementation of Reliability Tasks in improving reliability
  • Failure to give adequate priority to the importance of Integrated Diagnostics (ID) design and its influence on overall maintainability attributes, mission readiness, maintenance concept design and associated LCC support concepts.
  • Unanticipated complex software integration issues affecting all aspects of RAM
  • Lack of adequate Integrated Diagnostic (ID) maturation efforts during system integration.
  • Failure to anticipate design integration problems where incremental design approaches influence RAM performance.



Strange case of Semi-elliptical cracks

Assembly: Mill having a power input of more than 1000 kW

Sub-assembly: Gear box

Part: Casing

Location: Around the area of the torsion shaft on the output side

Failure Mode: Casing crack in semi-elliptical pattern (a probabilistic failure mode)

Description: Semi-elliptical crack (the pink area in the photo)

Semi-elliptical crack on the gearbox casing – output side around torsion shaft area — Figure 1
Zoomed view of the crack as shown above — Figure 2


Phenomenon: Thermal Fatigue (such semi-elliptical cracks are distinguishing patterns for thermal fatigue; but not necessarily the only pattern of failure due to thermal fatigue).

Such a crack is a combination of many interactions taking place simultaneously.

However, the initiator of such a problem with the casing is likely to be blowholes or casting defects in the casing. On examination this was confirmed.


1. Thermal fatigue then accelerates the development of such a crack, especially if the casing is thin.
2. The crack is induced by rotating bending of the torsion shaft. Hence it appeared around the middle of the casing, in the torsion shaft area.
3. The temperature of the gearbox was higher on the output side, at around 70 °C.

Nature of Crack growth and development:

When the depth of the crack is around 0.025 to 0.25 mm, the shape of the crack becomes semi-elliptical (check depth of crack). The threshold limit is around 0.16 mm, at which the initial crack starts to propagate.

When such a crack starts to develop and propagate, a cyclic noise is audible.

Other areas where Thermal Fatigue can cause failures:

1. Rolls of rolling mills: can be experienced anytime during operation
2. Gear Boxes: usually experienced in winters or relatively cold ambient temperature
3. Motor Windings: usually experienced in tropical summers
4. Anti-friction bearings: can be experienced anytime during operation

Methods of Monitoring (varied); some of the methods might be as follows:

1. Infra-red thermal imaging — especially for motors
2. Visual — powerful technique — both sight and sound
3. Vibration analysis — especially for rolls and anti-friction bearings
4. Temperature monitoring (differential or temperature distribution) — for gearbox casings and motor casings.

Precaution during purchase: (Design Review)

1. Casing thickness
2. Allowable temperature rise
3. Lubricant
4. Cooling system

Reliability Improvement of Physical Assets

What does reliability improvement of physical assets mean?

To answer this question, we must first understand what we mean by reliability.

The word reliability conjures up different meanings for different people. To some it might mean quality. For others it might mean safety. While for some it might mean integrity. And for some it might mean a long and useful life of a machine.

That might seem very confusing.

Hence, in short, the word reliability means — “Whatever a user wants of his/her physical assets.”

If that is so, then reliability improvement would simply steer us towards problem solving activities that help a user of physical assets achieve his/her desired intention.

However, solving problems to improve reliability or MTBF can be quite daunting. This is because the upper limit of reliability of machines is practically set during the design and manufacturing stage. So, to improve reliability during the operation stage is quite difficult though not impossible.

But solving problems to improve reliability during operation can be complex. Hence the entire process of problem solving has to be broken up into manageable chunks and executed in a step-by-step manner.

However, the end results of improving reliability of a machine would be:

  1. Relax: to continue in an operational mode as long as possible. This is achieved by ensuring that a machine loses minimum energy during running.
  2. Resilient: to make a machine work with minimum stress under varying conditions, interactions and contexts. This is done by modifying interactions and initial conditions to extend the MTBF to a level dictated by business goals.
  3. Rejuvenate: to make and keep a machine in a healthy state of performance without disturbing production and production cycles.

Behind all these activities the most important factor that is essential is the application of human awareness. Without it, success of a reliability improvement project is simply not possible, whatever the tools, techniques, process or methods might be.

How we recognize, develop and apply human awareness is the million-dollar question.

Plant Reliability Improvement — RAPID

A) The Background

Engineers have tried to address the issue of reliability of equipment and systems in various ways, so many methods and techniques exist. The usual approach pivots around the concept of “failure modes” and how best to guard against them so as to prevent the consequences of failures. However, the existing methods do not take into account the flow of energy as an important factor that determines the reliability of plant and machinery.

Taking this into consideration, I am putting together some rules and principles that hopefully point to a path that might enable engineers to improve plant reliability with minimum effort, resources and time.

But before I do that I would like to put forward an easy understanding of the term — reliability.

B) Reliability of a Machine:

The period of time a machine would run (fulfil its function) without a problem or trouble.

The longer the period of trouble-free running, the better the reliability of the plant.

It therefore addresses the heart of reliability improvement — i.e. enhancement of useful operating life of a machine or a system of machines.

C) What does that mean in terms of energy flow?

We may say that a machine that can continue with the smooth flow of energy, with the minimum wastage of energy for a long period of time is more reliable than a machine where energy flow is disrupted frequently in some manner or the energy wastage is high or the energy is pushed out of equilibrium condition, which invariably stops the machine from functioning effectively.

D) So what might be the job of engineers?

It is evident that the fundamental job of engineers (both operation and maintenance) is to run and maintain a machine or system in such a way that energy wastage is minimized and smooth flow is ensured for a long or desired period of time.

E) How can this be done?

This may be done in three fundamental ways, which are as follows:

  1. Observe the dynamics of a machine or system to ascertain energy flow patterns and the degree of energy wastage and the reasons for such wastage. Also determine the degree of stability or instability of the system and what affects a system’s stability.
  2. Adjust or maintain the conditions within its operating context to ensure smoother flow of energy with minimum wastage. In the process, learn what changes in a machine/system would disrupt energy flow or push it out of equilibrium conditions and prevent such disruptions.
  3. Change, monitor, modify the system (made up of physical asset and components, process, information flow, analysis and decision making, teams) as necessary, for smoother flow of energy to continue over longer period of time with minimum energy wastage.

F) How to apply it in a real plant?

I have found the following method useful in various plants where I implemented Reliability Improvement Programs.

  1. Make a list of critical machines.
  2. Select a critical machine along with its sub-systems (machines that support its functioning)
  3. Apply the three steps as outlined in Section E (above) — within the operating context.
  4. Improve and stabilize performance; record the learning.
  5. Create a custom made monitoring system to spot changes in time along with custom made expert system to guide engineers to quickly decide the course of actions to be undertaken.
  6. Record the decisions, actions and changes in the form of equipment history.
  7. Move to the next critical machine and its sub-systems.
  8. As we go along, first check whether all the consequences of failures have been taken care of. Next check whether the overall plant/area/section MTBF (Mean Time Between Failures) is going up and whether MTTR (Mean Time to Repair) is going down, along with a consequential lowering of maintenance and operation costs. Lastly check the accuracy of the custom-made expert system (usually made up of multivariate parameters) in its ability to forewarn and guide maintenance decisions.
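
The MTBF and MTTR check in the last step can be sketched as below. This is a minimal illustration, assuming the equipment history gives the length of each trouble-free run and the duration of each repair in hours; the function names and the figures are hypothetical and not taken from any plant.

```python
def mean_time_between_failures(run_hours):
    """Average length of the trouble-free runs between failures, in hours."""
    if not run_hours:
        raise ValueError("need at least one run between failures")
    return sum(run_hours) / len(run_hours)


def mean_time_to_repair(repair_hours):
    """Average duration of a repair, in hours."""
    if not repair_hours:
        raise ValueError("need at least one repair record")
    return sum(repair_hours) / len(repair_hours)


def review(before, after):
    """Compare two periods; each is a dict with 'runs' and 'repairs' lists (hours)."""
    mtbf_before = mean_time_between_failures(before["runs"])
    mtbf_after = mean_time_between_failures(after["runs"])
    mttr_before = mean_time_to_repair(before["repairs"])
    mttr_after = mean_time_to_repair(after["repairs"])
    return {
        "mtbf_before": mtbf_before,
        "mtbf_after": mtbf_after,
        "mttr_before": mttr_before,
        "mttr_after": mttr_after,
        "mtbf_up": mtbf_after > mtbf_before,
        "mttr_down": mttr_after < mttr_before,
    }


# Hypothetical equipment history for one plant section.
before = {"runs": [400, 350, 450], "repairs": [8, 12, 10]}
after = {"runs": [700, 900], "repairs": [6, 4]}
result = review(before, after)
print(result)
```

In this made-up history the MTBF rises from 400 to 800 hours and the MTTR falls from 10 to 5 hours, which is the direction the check looks for.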

G) General Rules:

Keeping the above in mind, I formulated the following four rules that might help engineers and managers stay on the path of plant reliability improvement:

  1. For any machine, energy tries to move in sync through all elements of a machine through various interfaces against many contradictions and constraints but always choosing the path of least resistance.
  2. Changes in contradictions, constraints and interfaces change the quality of energy flow forcing energy flow to go out of equilibrium (instability) to either cause degradation of performance or cause failures that lead to unwarranted plant stoppages, affecting costs and productivity.
  3. Changes and the causes of such changes are reflected in the dynamics of a machine in terms of interdependent parameters like vibrations, heat, flow, wear, humidity, temperature, pressure etc.
  4. Reliability of any machine or system can be improved by either maintaining the contradictions, interfaces and constraints to “just right conditions” or changing those to enable smoother flow of energy for a longer period of time with minimum wastage.

H) Applications:

Having applied these basic rules some industrial plants were able to gain on-going benefits for years. Here are some examples of — Plant Wide Reliability Improvement

Fixing Organizational Problems

Every day, managers in different organizations face an array of problems. Usually, such problems keep repeating — either randomly or at regular intervals. After a while, it then becomes clear to the managers that such problems resist current ways of thinking and actions as practiced within the organization. 

The response to such problems is: “How can this problem be fixed permanently?”

A manager would then try to apply known theories, methods, and tools to solve the problem. And in this process, the managers can also increase their skills. But the problem is that the problems simply don’t vanish. They have a bad habit of sneaking back through the backdoor. 

Why is that?

The short answer is: “No problem can be fixed, at least not permanently.”

This is because the nature of the problems keeps changing with time or the same problem comes back with different intensity or frequency. 

However, one can find and install new guiding ideas. And one can intently engage in redesigning an organization’s infrastructure, policies, rules, methods, and the tools presently used to find new ways of dealing with work and problems. 

The key is to closely observe what is going on in the present and then discover the organizational subconsciousness (mindset) that allows such events to happen with alarming regularity or randomly. Once that mindset is found, a manager can then find new ways of thinking and practices to replace the old governing mindset. 

If one keeps going in this way, one can gradually evolve a new type of organization that is responsive, agile and observant about the numerous interactions that go on within an organizational environment, and so becomes a better and fitter organization.

It would then be able to deal with the problems and opportunities of today and invest in its capacity with the right resources and efforts to embrace a better future. This happens because its members are focused on enhancing and expanding their collective consciousness, where individual members are able to observe, learn and change together.

In other words, they collectively create, support and sustain an organization that continually learns from its present situation.

Case of Missing Gear Mesh Frequency


“Why don’t we see the Gear Mesh Frequency (GMF) on the output side of a splash lubricated slow speed gear box?”

This is quite puzzling since common sense dictates that such peaks should be present.
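
For reference, the peak in question is normally looked for at the number of teeth multiplied by the rotational frequency of the shaft carrying that gear. The sketch below uses this standard relation, which is not spelled out in the original question; the tooth counts and input speed are hypothetical. It also shows that both gears in a mesh give the same GMF, so the peak would be expected at the same frequency on the input and output sides of a single-stage box.

```python
def gear_mesh_frequency_hz(teeth, shaft_rpm):
    """Gear mesh frequency in Hz: teeth times shaft speed in revolutions per second."""
    return teeth * shaft_rpm / 60.0


# Hypothetical single-stage reduction: 20-tooth pinion driving a 100-tooth gear.
pinion_teeth, gear_teeth = 20, 100
input_rpm = 750.0
output_rpm = input_rpm * pinion_teeth / gear_teeth  # speed of the driven shaft

gmf_from_pinion = gear_mesh_frequency_hz(pinion_teeth, input_rpm)
gmf_from_gear = gear_mesh_frequency_hz(gear_teeth, output_rpm)
print(output_rpm, gmf_from_pinion, gmf_from_gear)
```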

My Answer:

The principles involved are the following:

1. Air, water and oil produce turbulence when worked on by machines like pumps, gears, fans, propellers etc.
2. Such turbulence creates damping force.
3. This is proportional to the square of the velocity.
4. But this damping force acts in quite a funny manner.
5. For slow speed machines (say below 750 rpm; the slower the better) damping is positive, that is, it goes against the motion and so neutralizes the entropy, as seen in the decrease in vibration levels. Hence the gear mesh frequencies vanish. The Coriolis effect on the output side of the gear box also helps in attenuating the vibration.
6. But for high speed machines damping is negative, that is, it goes in the direction of the motion and therefore enhances the entropy, as seen in the increase in vibration levels.
7. So, for low speed machines it goes against the motion and suppresses the GMF. In some cases it suppresses the fundamental peak as is found in the case of the vertical Cooling Water Pumps of Power Plants. GMF is produced when the fundamental frequency is superimposed onto the vibration generated through gear impacts.
8. It therefore follows that for high speed gear boxes it magnifies both fundamental and GMF peaks.

Missing peaks therefore indicate fluid turbulence, which might also be indicated by other peaks like vane pass frequencies. The condition monitoring of such gear boxes might best be done through Wear Debris Analysis/Ferrography.

So, this is the mystery of the missing GMF in splash lubricated slow speed gear boxes.

Therefore, splash lubrication for a low speed gear box is a good idea. It enhances the life of the gear box since it balances the entropy in the system.

But at the same time, with a higher oil level in a splash lubricated high speed gear box, the vibration level would increase, especially the fundamental and the GMF. That would spell trouble.

Similarly, it is better to have a turbulent air flow in low speed fans and blowers. It suppresses the vibrations and therefore enhances the life of bearings.

Does nature also use these principles of fluid turbulence and damping? Some applications:

1. Birds’ nests are made up of loosely placed twigs and leaves, usually not bound to each other. But these don’t break up or fall off in turbulent winds. Damping keeps them in place and provides the necessary security to birds.

2. Swift flowing rivers allow fishes to grow bigger and better.

3. Winds, storms etc neutralize the increase in entropy.

Design Ideas for Reliability & Sustainability?

1. Low speed gear boxes might best be lubricated by splash lubrication.
2. High speed gear boxes might best be lubricated by spray lubrication.
3. Hotter and turbulent air might best be handled by low speed fans and blowers.

Eccentricity in general

Symptoms are generally 1x radial (Vertical and Horizontal for a horizontally mounted machine).

Eccentricity occurs when the centre of rotation is offset (like offset misalignment) from the geometric centreline of a gear, motor rotor or a pulley.

It would generate a strong 1x radial peak in the direction parallel to the rotor/gear/pulley. This condition is common and mimics unbalance.

For gear eccentricity we would see 1x sidebands.

For motor rotor eccentricity we would see pole pass sidebands.

The time waveform would be sinusoidal when viewed in velocity. Vibration from a gear will also show gear mesh vibration, modulated at the turning speed of the shaft carrying the offending gear.

Phase: If belt driven, phase readings taken parallel and perpendicular to belts will either be in phase or 180 degrees out of phase. For a direct driven component, vertical and horizontal readings will be 90 degrees out of phase.
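
The 1x sidebands mentioned above for gear eccentricity can be located with a short calculation. The sketch assumes the sidebands sit on either side of the gear mesh frequency, spaced at the running speed of the shaft carrying the offending gear; that reading, the function name and the numbers are illustrative assumptions, not values from a measured spectrum.

```python
def sideband_frequencies_hz(center_hz, spacing_hz, orders=2):
    """Return center - k*spacing and center + k*spacing for k = 1..orders, ascending."""
    lower = [center_hz - k * spacing_hz for k in range(orders, 0, -1)]
    upper = [center_hz + k * spacing_hz for k in range(1, orders + 1)]
    return lower + upper


# Hypothetical gear: mesh frequency 250 Hz, offending gear turning at 150 rpm.
gmf_hz = 250.0
shaft_hz = 150.0 / 60.0  # 1x running speed of the offending gear, in Hz

sidebands = sideband_frequencies_hz(gmf_hz, shaft_hz)
print(sidebands)
```

Peaks near these frequencies, together with a strong 1x radial peak, would point towards the eccentric gear rather than plain unbalance.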

Applying IAR Technique

The above photo is that of an underground pipe carrying a fluid high in chloride concentration.

As can be seen in the photo, there is a big rupture of the pipeline leading to an unwarranted and unplanned plant outage.

What can engineers do about it? The usual way of thinking about it is to quickly find a way to monitor the development of such failures in time so as to attend to the problem as quickly as possible with ruthless efficiency.

However, monitoring the development of pitting/crevice corrosion (the failure mode in this case) is not easy, nor is a technique readily available. And even if there were a technique, its benefit, as in all cases of condition based maintenance or condition monitoring, is fundamentally limited. Why? Simply because it doesn’t help improve the MTBF (Mean Time Between Failures). Without substantial improvement in MTBF, we can hardly expect to improve performance, productivity and profitability through maintenance of plant assets.

An approach that improves MTBF is more in keeping with the present requirements and needs of industry. Maintenance is not about maintaining and restoring assets to their original condition. In the present context, maintenance is directly linked to an organization’s profitability and survival under all economic conditions, at the minimum possible cost.

With this in mind, I invented the IAR (Initiator, Accelerator, Retarder) technique.

It means that for every failure, failure pattern or behavior pattern, there would be at least one element in the system that would initiate or start the failure process. Similarly, there would be at least one element in the system that would accelerate the process. Likewise, there would be at least one element that would help retard the failure process.

Once these I, A, R elements are identified, the job of prolonging the life of a system at the least cost becomes relatively easy and more effective, which can be stated as follows:

1. Eliminate/Avoid the Initiator(s)

2. Prevent the Accelerator(s) from acting

3. Strengthen the Retarder(s)

The next step is to monitor the presence or development of the I, A and R elements, if found contextually appropriate and effective.

Coming back to our case, the IARs are the following:

Initiator: Material

Accelerator: Lack of cathodic protection

Retarder: Steady process that would prevent sudden increase in chloride concentration.

Therefore the set of solutions would be:

Eliminate Initiator: Select material having a high PREN (Pitting Resistance Equivalent Number).

Prevent Accelerator: Install cathodic protection.

Strengthen Retarder: Closely monitor the process to prevent sharp fluctuations in chloride concentration.

The above measures would not only help in preventing pitting/crevice corrosion but also prolong the life of the pipeline — helping the plant to become more productive and remain so for a longer time.

Moreover, it proves to be a more effective maintenance planning tool than existing approaches.
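
The pipeline case can be written down as a small IAR record, with a helper to compare candidate materials by PREN. This is only a sketch. The PREN relation used (%Cr + 3.3 x %Mo + 16 x %N) is the commonly quoted form and is not given in the text above; the two alloy compositions are nominal, illustrative values, not specifications; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IARElement:
    role: str     # "initiator", "accelerator" or "retarder"
    element: str  # what plays that role in the system
    action: str   # what to do about it

# The three generic actions of the IAR technique.
ACTION_VERB = {"initiator": "Eliminate", "accelerator": "Prevent", "retarder": "Strengthen"}


def action_plan(elements):
    """Turn identified I, A, R elements into a list of action statements."""
    return [f"{ACTION_VERB[e.role]} {e.role}: {e.action}" for e in elements]


def pren(cr_pct, mo_pct, n_pct):
    """Commonly quoted PREN relation (assumed here): %Cr + 3.3 * %Mo + 16 * %N."""
    return cr_pct + 3.3 * mo_pct + 16.0 * n_pct


pipeline_case = [
    IARElement("initiator", "Material", "select material with a high PREN"),
    IARElement("accelerator", "Lack of cathodic protection", "install cathodic protection"),
    IARElement("retarder", "Steady process", "monitor chloride concentration closely"),
]
plan = action_plan(pipeline_case)

# Nominal, illustrative compositions in weight percent (Cr, Mo, N).
candidates = {"austenitic (316L type)": (17.0, 2.1, 0.05),
              "duplex (2205 type)": (22.0, 3.1, 0.17)}
pren_by_material = {name: pren(*comp) for name, comp in candidates.items()}
best_material = max(pren_by_material, key=pren_by_material.get)
print(plan)
print(pren_by_material, best_material)
```

With these illustrative figures the duplex grade comes out well ahead on PREN, which is the kind of comparison the "Eliminate Initiator" step calls for.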

Dibyendu De