
Planning is the single most important element in effective outage management. Phillip Webster, Stephen Nelson and Steven Maloney from Black & Veatch outline the major steps that need to be implemented in order to successfully execute a maintenance outage for a power generating facility.

A combination of factors, in Africa and around the world, is making electricity increasingly critical to the success of individuals, countries and regions. Beyond the value electricity brings to developing communities, it is also crucial to production in nearly every industry. That value, however, carries a risk because of the interdependence between industry and individuals: any failure or shortcoming in the maintenance of a country’s electric power infrastructure has a profound impact on its development, and in particular on its economy.

Power generating asset maintenance is necessary to ensure the long-term reliability of the generating plant. Failure to execute maintenance outages properly leads to poor outage performance, which extends the outage. If the necessary maintenance is not completed in the time allowed, it can also contribute to poor reliability after the outage.

Most outage practitioners would therefore agree that planning is the single most important element in effective outage management. Wherever possible, every major step before the outage starts, during the outage and through the post-outage activities should result from a carefully planned sequence of steps. This is simple advice, but without context and detail it is not particularly helpful. The objective here is to provide a better understanding of the details that should be addressed well in advance of the start of an outage.

Figure 1 depicts a typical outage lifecycle, beginning with initial planning and progressing through the completion of the post-outage activities and feedback loop to subsequent planning. The timing of each of the activities shown can vary slightly.

Figure 1: Schematic outlining a typical outage project lifecycle

The outage lifecycle graphic proposes the process starts 18 months before the expected outage start date. However, many companies with very mature outage management processes start planning a full two years ahead, and it is not uncommon in the case of major capital projects for the lead time for design and/or procurement to necessitate starting earlier. What is important to capture is the sequence of activities that must occur and to understand the objectives of each major step in the process.

Strategic planning

The initial determination of what work will be completed during the outage should be based upon a maintenance strategy for each major system, the preventive maintenance programme and any condition assessment (i.e. predictive maintenance) that identifies a need for maintenance in order to ensure the long-term reliability of the plant.

It is often considered to be the engineers’ responsibility to identify the issues to be addressed and the means to resolve them. However, this understates the importance of the Operations and Maintenance (O&M) staff.

The maintenance strategy should be championed by the engineering function with direct input and support from Operations. It is the plant operators who see problems and can identify deviations from expected or historical performance. Likewise, the plant maintenance staff have intimate knowledge of the equipment they maintain and in many cases know when it is not performing as designed.

Including these functional groups in the very early stages of the process fosters a sense that everyone is involved and, even more importantly, that everyone has a responsibility to contribute. This collected knowledge from the respective stakeholders (Engineering, O&M) should be incorporated into an ‘Outage Charter’. The charter describes the primary functions to be achieved during the outage and in essence becomes the contract between Operations and Outage Management.

Once the charter defines what needs to be done, the next step is the ‘Who’. To facilitate this, an outage organisation should be identified. Usually these are individuals with this role on an ongoing basis, but sometimes O&M can designate individuals, take them out of their normal roles and assign them outage responsibilities. The charter is complemented by a defined set of standard roles and responsibilities for each position on the organisation chart (Figure 2). It is as important to define what the individuals will be responsible for performing as it is to define who the individuals are.

Figure 2: A typical outage organisation chart

The Outage team and the standard roles and responsibilities from the organisation chart are incorporated into the outage documentation and plan. It is important to note that some of the roles are not part of the typical maintenance organisation. For example, it is recommended that the Outage team include personnel responsible for budget/cost control and procurement, and that an individual (not shown in Figure 2) be designated Environmental Health and Safety Co-ordinator. Each position should be associated with a named individual or individuals, a job description, roles and responsibilities, and limits of authority.

Outage duration

When the outage is initially conceived and scheduled as part of the overall plant operating plan, it is necessary to make a reasonable estimate of its expected duration. In most cases, these estimates are based on prior outage projects with comparable work scopes. That prior experience may come from within the plant organisation or may be based on information provided by external resources, such as an engineering firm, equipment manufacturer, or a trusted contractor.

The ‘critical path’, the longest sequence of dependent activities and therefore the factor that sets the shortest achievable outage duration, should be determined. It depends on many factors, such as resource and equipment availability, and interferences and/or overlaps between milestones that affect the schedule. There may be many milestones, each with its own critical path, but the true Critical Path Method (CPM) critical path comprises the activities that make up the longest path between the opening and closing of the main output breaker (Figure 3).

Figure 3: An example of a critical path – red indicates the longest or ‘critical’ path
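As a minimal sketch of the CPM idea (not the authors’ tooling), the Python below runs a forward pass over a small, entirely hypothetical set of outage activities and then traces back the longest chain from breaker open to breaker close. The activity names, durations and dependencies are invented for illustration. A real outage schedule would come from the scheduling tool, and full CPM would also need a backward pass to compute float for the non-critical activities.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical activities: name -> (duration in days, predecessors).
ACTIVITIES = {
    "breaker_open": (0, []),
    "cool_down": (3, ["breaker_open"]),
    "scaffold_boiler": (2, ["cool_down"]),
    "boiler_inspection": (5, ["scaffold_boiler"]),
    "boiler_repairs": (8, ["boiler_inspection"]),
    "hydro_test": (2, ["boiler_repairs"]),
    "turbine_disassembly": (6, ["breaker_open"]),
    "turbine_repair": (10, ["turbine_disassembly"]),
    "turbine_reassembly": (6, ["turbine_repair"]),
    "startup_testing": (3, ["hydro_test", "turbine_reassembly"]),
    "breaker_close": (0, ["startup_testing"]),
}


def critical_path(activities, end):
    """Return (duration, path) of the longest dependency chain ending at `end`."""
    graph = {name: set(preds) for name, (_, preds) in activities.items()}
    finish = {}  # earliest finish of each activity
    driver = {}  # predecessor that determines each activity's earliest start
    for name in TopologicalSorter(graph).static_order():
        duration, preds = activities[name]
        # Forward pass: an activity starts when its latest-finishing predecessor ends.
        driver[name] = max(preds, key=finish.__getitem__) if preds else None
        start = finish[driver[name]] if preds else 0
        finish[name] = start + duration
    # Walk back along the driving predecessors to recover the critical path.
    path, node = [], end
    while node is not None:
        path.append(node)
        node = driver[node]
    return finish[end], path[::-1]


duration, path = critical_path(ACTIVITIES, "breaker_close")
print(duration, " -> ".join(path))
```

With these illustrative numbers the turbine chain is critical (25 days), while the boiler chain finishes 2 days earlier than the turbine work it merges with. Shortening boiler work alone would therefore not shorten the outage.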

It is at this phase in the project that the Outage team begins to establish the primary control mechanisms in the Computerised Maintenance Management System (CMMS) or Enterprise Asset Management (EAM) system. By now the team will have identified all of the major projects to be completed during the outage, and will initiate any long-lead-time procurement required.

The CMMS or EAM system includes both financial controls and maintenance management functions, and the project team will use it to record the major projects it has identified and the budget estimates it has established. A Level 1 schedule should be produced showing the major milestones for the outage. It is this structure that will be used to manage the scope, schedule and budget throughout the execution of the outage.

Work scope development

The team’s system subject matter experts should create task level scope documents for each work package. These documents will begin to describe each project and to define the specific activities and requirements necessary to manage the work. The details to be developed at this point in the process include:

  • Work orders and task sequence
  • Labour resources
  • Material
  • Equipment
  • External contracts
  • Testing requirements
  • Special tools/fabrication
  • Commissioning sequences
  • Operations/clearance requirements
  • Safety considerations/concerns
  • Permitting requirements
  • QA/QC

These details are usually developed by the responsible engineer with significant input from O&M, and if possible subject matter experts from a corporate or central engineering function. Often the contractors and/or the OEM are contacted for input. Once the initial documents are created, the package should be validated via a formal peer review.
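One way to make sure no detail area is overlooked before a package goes to peer review is a simple completeness check over the scope document. The sketch below is an assumption rather than a practice the article prescribes: the section keys mirror the list above, while `ScopeDocument` and its methods are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

# One key per detail area listed above for each work package.
REQUIRED_SECTIONS = (
    "work_orders_and_task_sequence",
    "labour_resources",
    "material",
    "equipment",
    "external_contracts",
    "testing_requirements",
    "special_tools_fabrication",
    "commissioning_sequences",
    "operations_clearance_requirements",
    "safety_considerations",
    "permitting_requirements",
    "qa_qc",
)


@dataclass
class ScopeDocument:
    work_package: str
    sections: dict[str, str] = field(default_factory=dict)  # section key -> content

    def missing_sections(self) -> list[str]:
        """Required sections that are absent or blank.

        An explicit "Not applicable" entry counts as addressed, so reviewers can
        tell a deliberate omission from a forgotten one.
        """
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s, "").strip()]

    def ready_for_peer_review(self) -> bool:
        return not self.missing_sections()


doc = ScopeDocument("HP turbine overhaul")
doc.sections.update({s: "Not applicable" for s in REQUIRED_SECTIONS})
doc.sections["permitting_requirements"] = ""  # forgotten, not deliberately omitted
print(doc.missing_sections())
```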


Peer review is an Outage Management team responsibility performed prior to scope freeze, i.e. sometime before T-12 (12 months prior to the outage execution date). Outage Management acts as the facilitator for the meeting of all of the responsible work teams. During this peer review there should be a discussion of the potential issues that could compromise the scope, schedule or budget. Many companies start this process with a review internal to the engineering function only, but that step alone is not adequate and a more comprehensive review can be achieved by including the O&M staff and plant management.

In all of these reviews it is critical to refer back to the Outage Charter to ensure that the work envisioned still accomplishes the intended objectives. The engineers should use the results of this review to create contingency plans that quantify the potential variances. Once these variances are defined, the engineers should, wherever possible, identify alternatives to mitigate, or at least manage, each contingency.

The official start of the outage is the moment the generator breaker is opened, even though a number of activities typically start before then. Twelve months before this point (i.e. T-12), the project should declare a scope freeze. Once the scope is frozen, any change, whether an addition or a removal, should be managed via a formal change management process with the appropriate documentation and formal approvals.
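The T-minus milestones can be worked back from the planned breaker-open date. The sketch below assumes whole-month offsets taken from the lifecycle described above (planning start at T-18, or T-24 for mature programmes, and scope freeze at T-12); the function names are hypothetical. It also flags whether a proposed change falls after scope freeze and so needs formal change control.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Move a date back by whole months, clamping to the last day of the month."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


# Offsets in months before breaker open (T-0), per the lifecycle described above.
MILESTONES = {
    "Planning start (mature programmes)": 24,
    "Planning start": 18,
    "Scope freeze": 12,
    "Breaker open (outage start)": 0,
}


def milestone_dates(breaker_open: date) -> dict[str, date]:
    return {name: months_before(breaker_open, m) for name, m in MILESTONES.items()}


def needs_formal_change_control(change_date: date, breaker_open: date) -> bool:
    """After scope freeze, any addition or removal goes through change management."""
    return change_date >= months_before(breaker_open, MILESTONES["Scope freeze"])


for name, when in milestone_dates(date(2026, 3, 31)).items():
    print(f"{name:36} {when.isoformat()}")
```

Clamping matters for month-end dates: 18 months before 31 March lands on 30 September, not on a non-existent 31 September.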

It is this phase of the project where the real detail of the work scope and schedule are determined. The primary requirements of this stage include:

  • Create work-order and task-level planning;
  • Each project and work package is scheduled to a Level 3 (resource) or Level 4 (detailed);
  • All projects integrated into a master outage schedule;
  • Create commissioning plans;
  • Perform detailed contingency planning;
  • Establish required engineering support for selected projects; and
  • Create quality assurance plans.

The deliverables to be completed by the end of this phase focus on scope and on the detailed integrated schedule. With respect to scope, each work package will have detailed engineering scope documents, an approved master scope document and project contingency lists. In parallel, the task cost estimates will be updated to reflect the more detailed analysis. Lastly, the project schedule should be updated to reflect the detailed task lists, the resource requirements and the inter-relationships between tasks, both within a single project and across projects. The levels of schedule definition are:

  • Level 1 – Planning without resources/project management & strategy (major milestones)
  • Level 2 – Monitoring progress without resources/project control level
  • Level 3 – Scheduling with resources/cost account & work packages
  • Level 4 – Monitoring progress of a resourced schedule/measurable items
  • Level 5 – Work steps

Optimisation and mobilisation

This phase more than any other is the point at which the experience of the Outage Management team will be evident. The optimisation of the outage elements requires extensive experience and in-depth knowledge of the outage process in order to identify and take advantage of opportunities to improve on the outage plan.

Several types of risk may occur during the various phases of the project. Safety, environment, equipment reliability and capacity are only a few examples of risk sources that may impact a project. Each should be evaluated against the expected probability of occurrence and the impact of the occurrence. These risks may be influenced by time, money, resources and/or equipment constraints. Identification, response and control of these risks are essential elements that will contribute to the successful execution of a project.
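Evaluating each risk against probability and impact is often done with a simple scoring matrix. The sketch below assumes 1 to 5 scales, a score equal to probability multiplied by impact, and rating thresholds that are purely hypothetical; each organisation calibrates its own. The example risks are likewise invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    source: str       # e.g. safety, environment, equipment reliability, capacity
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) to 5 (severe)

    def __post_init__(self):
        for name in ("probability", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        return self.probability * self.impact


def rating(score: int) -> str:
    """Hypothetical thresholds; each organisation calibrates its own matrix."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def ranked(risks):
    """Risks ordered from highest to lowest score, with their rating."""
    return [(r.description, r.score, rating(r.score))
            for r in sorted(risks, key=lambda r: r.score, reverse=True)]


register = [
    Risk("Boiler tube damage exceeds contingency", "equipment reliability", 3, 4),
    Risk("Oil spill during lube oil flush", "environment", 2, 2),
    Risk("Late delivery of long-lead valve", "schedule", 4, 4),
]
for row in ranked(register):
    print(row)
```

Ranking by score gives a defensible order in which to assign responses and controls, though impact-only screening (for example, for safety) may still override the product.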

Execution

The execution of the outage is obviously the phase that is most visible and most critical to overall success. Even though the term ‘execution’ implies that all the activities occur after the outage has started (i.e. breaker open), the pre-outage activities also fall into this stage. By pre-outage we mean the mobilisation and staging activities that can be completed while the unit is still online.

An example of work that can and should be done pre-outage is staging materials in the general areas where they will be needed. A classic example is to locate the boiler scaffolding materials immediately adjacent to the boiler, so that scaffold assembly can start without delay as soon as clearances are available to enter the boiler. Likewise, clearances can be filled out, ready to sign. With reasonable effort it is possible to identify a number of activities that can be completed before the breaker is opened and before components have been safely de-energised.

At this point all of the planning efforts to date should begin to pay dividends in terms of managing the execution. This planning should provide the Outage Management team with the ability to assess the schedule, costs and resource requirements constantly throughout the execution phase. The schedule should be updated at least daily in order to provide the opportunity to address any potential schedule slip as it happens. This update needs to compare the work required, work completed and the work remaining.
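As a minimal sketch of such a daily comparison, assuming effort is tracked in hours and physical progress is assessed in the field as a percentage, the Python below contrasts the naive remaining-work figure (planned minus spent) with one projected from the productivity achieved so far, in the spirit of an earned-value estimate at completion. The task, its numbers and the class name are hypothetical. The example illustrates why work remaining is not simply planned minus completed work.

```python
from dataclasses import dataclass


@dataclass
class TaskProgress:
    name: str
    planned_hours: float     # effort budgeted for the whole task
    hours_spent: float       # effort actually charged to date
    percent_complete: float  # physical progress assessed in the field, 0..100

    def naive_remaining(self) -> float:
        """Planned minus spent: the shortcut that assumes the plan was right."""
        return max(self.planned_hours - self.hours_spent, 0.0)

    def estimated_remaining(self) -> float:
        """Project the rest of the task at the productivity achieved so far."""
        if self.percent_complete >= 100:
            return 0.0
        if self.percent_complete <= 0:
            return self.planned_hours  # no progress yet, so no better basis than the plan
        hours_per_percent = self.hours_spent / self.percent_complete
        return hours_per_percent * (100 - self.percent_complete)


def daily_update(tasks):
    """One row per task: planned hours, % complete, naive and projected remaining."""
    return [
        (t.name, t.planned_hours, t.percent_complete,
         t.naive_remaining(), round(t.estimated_remaining(), 1))
        for t in tasks
    ]


task = TaskProgress("HP turbine blade inspection", planned_hours=200,
                    hours_spent=120, percent_complete=40)
print(daily_update([task]))
```

Here the naive figure suggests 80 hours are left, but at the productivity achieved so far (3 hours per percentage point) the remaining 60% would take about 180 hours.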

It is important to note, however, that work remaining is not simply the difference between work planned and work completed. The remaining effort should be estimated from an assessment of how effective the effort to date has been, weighed against an evaluation of the effort still required to complete the task.

It is also very important that the equipment worked on during the outage is able to function reliably until at least the next scheduled outage window, or whichever outage window matches the equipment’s maintenance frequency (e.g. every second or third outage).


To ensure the equipment can function until the next scheduled due date a detailed and rigorous post-maintenance testing (PMT) process must be in place. To ensure equipment functionality testing is adequate, other support equipment may be required to be in service prior to executing the PMT. The CMMS/EAM system and schedule must reflect the PMT and usually Operations signs off on completion of the PMT task.
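The PMT rules above can be thought of as two gates on a work order: the PMT cannot run until its support equipment is in service, and the order cannot close until the PMT has passed and Operations has signed off. The sketch below models that under stated assumptions; `WorkOrder`, its fields and the equipment names are hypothetical and do not correspond to any particular CMMS/EAM product.

```python
from dataclasses import dataclass, field


@dataclass
class WorkOrder:
    number: str
    equipment: str
    # Support systems that must be in service before the PMT is meaningful.
    pmt_prerequisites: list[str] = field(default_factory=list)
    pmt_passed: bool = False
    operations_signoff: bool = False


def pmt_blockers(order: WorkOrder, in_service: set[str]) -> list[str]:
    """Prerequisite equipment not yet in service (an empty list means PMT can run)."""
    return [e for e in order.pmt_prerequisites if e not in in_service]


def can_close(order: WorkOrder) -> bool:
    """A work order closes only after a passed PMT and Operations sign-off."""
    return order.pmt_passed and order.operations_signoff


wo = WorkOrder("WO-1001", "Boiler feed pump A",
               pmt_prerequisites=["Lube oil system", "Deaerator"])
print(pmt_blockers(wo, in_service={"Lube oil system"}))
```

Keeping the prerequisites on the work order itself means the schedule can sequence the PMT after the support systems return to service, rather than discovering the dependency during testing.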

Schedule revisions and reviews of resource requirements and costs should be performed daily, if not more often, but this has proven to be less than common practice. There are a number of reasons for this, but probably the most important is simply that many managers have never had the resources to produce a quality schedule. Having never seen such a tool, they understandably find it difficult to appreciate the value that can be achieved from its use.

Managing the schedule and tracking progress throughout the outage is critical, but the next step in a comprehensive management process is to focus on resources. Part of the reason for this focus is to manage costs: the number of contractor personnel on site correlates very closely with the fees paid to the contractor. Equally important, however, is understanding and managing the production necessary to execute the scope. One tool commonly used to monitor progress is the ‘S Curve’, so called because its shape typically resembles the letter ‘S’ rotated 90° clockwise.

This shape is typical of a project that starts slowly, with a low level of resources, staffs up to complete the bulk of the task, and then finishes with a lower level of resources, and therefore a slower rate of progress as the overall task is completed.

In the example given in Figure 4, the actual percentage complete is tracked against the plan. The project was well behind schedule from 2 April until around 25 April, when it had recovered, and it ultimately finished one day ahead of schedule. This suggests that the Outage Management team noted the lack of early progress and responded to address the issues.

Figure 4: An example of an S Curve tracking the progress of a turbine overhaul
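The comparison behind an S Curve can be sketched as cumulative planned versus cumulative actual percent complete. The Python below assumes progress is measured in earned hours (work completed, valued at its planned hours), so both series reach 100% when the scope is done. The six-day task and its daily figures are hypothetical, chosen only to reproduce the slow-start, peak and taper shape described above.

```python
from itertools import accumulate


def cumulative_percent(daily_earned_hours, total_planned_hours):
    """Running total of earned hours as a percentage of the planned total."""
    return [100.0 * c / total_planned_hours for c in accumulate(daily_earned_hours)]


def s_curve_variance(planned_daily, actual_daily):
    """Actual minus planned cumulative percent complete, day by day.

    Negative values mean the work is behind plan. Both series are measured in
    earned (planned) hours, so each reaches 100% when the scope is complete.
    """
    total = sum(planned_daily)
    planned = cumulative_percent(planned_daily, total)
    actual = cumulative_percent(actual_daily, total)
    return [round(a - p, 1) for p, a in zip(planned, actual)]


# Hypothetical six-day task: slow start, peak, taper (the typical S shape).
planned = [10, 20, 40, 40, 20, 10]
actual = [5, 10, 30, 50, 35, 10]  # late start, recovered by day 5
print(s_curve_variance(planned, actual))
```

A persistently negative variance, like the early period in Figure 4, is the signal for the Outage Management team to investigate and respond. The day on which it returns to zero shows when the recovery took hold.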

Managing the work scope and schedule is only part of the requirements of the outage. It is also necessary to actively manage safety requirements and expectations for the project. Safety performance goals are typically included in the Outage Charter, and any goal short of zero incidents is not sufficient. The Outage team should not only track safety statistics actively but also make safety a priority in all discussions. Team members should actively promote safety both by example and with frequent observations. Work scope documents should always have safety precautions and requirements prominently displayed. Work teams should be required to carry out a hazard analysis before starting any work. The list of specific actions could go on, but these are best addressed as part of a comprehensive safety programme designed and managed by a senior member of the outage leadership.

Post outage and feedback

Once the outage is completed it would seem that the work is done and everyone can now relax. However, if we refer back to the original lifecycle in Figure 1, the 18–24 month lead time most likely means your outage staff should be actively engaged in the planning for the next outage, especially if they are resident at or responsible for multiple generating units.


Before beginning the next outage, it is important to capture and document the results of the outage just completed. Some activities are likely still to be completed: contractors need to demobilise, clean-up activities remain, and financial closure still requires resolution of final invoices and of any disputes, especially those involving warranty registration and/or claims.

In addition to the completion of outstanding outage tasks, it is also important to document the work completed and the lessons learned from the process. Once all these items are completed it is possible to pronounce the project complete.

Although this article primarily refers to planned outages, many of the elements of scope determination, evaluation, prioritisation and execution also hold true for maintenance and forced outages.