Db2 Administration: How to Break Your Cycle of Constant Firefighting and Embrace Proactive Management

February 4, 2021 | Valery Aranouski

It’s a common problem for Db2 administrators.

You do everything you can to create a stable, harmonious, balanced environment.

You run your transactions in a smooth, efficient, and predictable manner.

You fine-tune everything to apparent perfection.

But then, something unexpected happens.

A big performance issue pops up out of nowhere.

You try to resolve this issue, but it’s too late…

The issue has already created a spike in resource usage.

This spike has already snowballed through your environment and dropped application performance levels.

And your phone has already rung a dozen times from users who have lost access to their applications.

You scramble. You find a way to resolve the issue. You create another period of smooth, efficient, predictable performance in your environment.

But then, out of nowhere, a new issue appears, and the cycle repeats.

Over and over again.

No matter what you do to create a smooth, efficient, and predictable Db2 environment, you remain trapped in this cycle of constant firefighting. We have seen it a hundred times. But we have also found a way to break this cycle. And we wrote this article to share this new approach with you.

In this article, you will learn:

  1. Why you should not accept this cycle as a normal part of the job.
  2. Why standard approaches to Db2 administration create this cycle.
  3. How you can tweak your approach to get ahead of these unpredictable issues.
  4. And how you can move from reactive to proactive Db2 administration.

Let’s begin.

Why You Shouldn’t Accept Constant Firefighting as Just “Part of the Job”

To start, let’s be clear about one thing…

No Db2 administrator wants to remain trapped in this cycle of constant firefighting. They simply don’t see a way to break this cycle, so they accept it as “part of the job.”

But if constant firefighting is just “part of the job”, then so is:

  • Operating in a constant state of stress and frustration.
  • Devoting the majority of your time and attention to firefighting.
  • Never having the chance to focus on bigger, more strategic items for long.

Over time, these problems will take their toll on you. They will lower your day-to-day enjoyment of your work, and limit your long-term professional opportunities. But even if you don’t care about these things — even if you are more concerned about your teams and your business than yourself — you still must break this cycle. After all, by constantly firefighting Db2 issues you will also create numerous problems for your teams and your business.

These include:

  • Higher Costs to Run Customer Workloads: Every time an issue creates excess CPU and IO usage, you risk raising your peak Rolling 4-Hour Average (R4HA), and with it your Monthly License Charge (MLC) software costs.
  • Decreased Performance of Key Customer Workloads: Every issue can snowball into inefficient workload distributions, slower response times, more time-outs, more system failures, and reduced application availability.
  • Friction in Your Stakeholder Relationships: Every time an application goes down, you cause lost productivity, potential financial losses, and mounting pressure from your executive-level business stakeholders.
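To make the cost point concrete, here is a minimal Python sketch of how a short spike can lift the monthly peak Rolling 4-Hour Average (R4HA). It assumes one MSU sample per hour and invented numbers; the function names are hypothetical, and this is a simplification rather than how IBM's reporting tools actually compute the figure.

```python
# Hypothetical sketch: how one short spike can raise the monthly peak R4HA.
# Assumes one MSU sample per hour (real reporting uses finer intervals)
# and invented MSU values; no real workload data is involved.

WINDOW_HOURS = 4  # the "4-hour" in Rolling 4-Hour Average


def rolling_4h_averages(msu_per_hour: list[float]) -> list[float]:
    """Average of each run of WINDOW_HOURS consecutive hourly samples."""
    return [
        sum(msu_per_hour[end - WINDOW_HOURS:end]) / WINDOW_HOURS
        for end in range(WINDOW_HOURS, len(msu_per_hour) + 1)
    ]


def peak_r4ha(msu_per_hour: list[float]) -> float:
    """Highest rolling 4-hour average in the period (simplified)."""
    return max(rolling_4h_averages(msu_per_hour))


steady = [400.0] * 24  # a calm day at 400 MSU every hour
spiky = steady.copy()
spiky[10] = spiky[11] = 700.0  # the same day with a two-hour runaway spike

print(peak_r4ha(steady))        # 400.0
print(peak_r4ha(spiky))         # 550.0
print(sum(spiky) / len(spiky))  # 425.0
```

In this made-up example the daily average rises only from 400 to 425 MSU, yet the peak R4HA jumps from 400 to 550 MSU. Under sub-capacity pricing it is the monthly peak that matters, which is why a single two-hour spike can show up on the bill.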

Worst of all — these issues add up.

Every unexpected spike you have to resolve pushes your monthly costs up a bit more, drags your average performance levels down a bit more, and erodes your business stakeholders’ trust in you and your teams a bit more.

In sum: Constant firefighting is a big problem for a lot of people in your organization.

And you won’t solve this problem by following standard approaches to Db2 administration.

Why Standard Approaches Create Constant Firefighting: Three Reasons

On paper, the standard approach to Db2 administration sounds logical — just create a perfectly stable environment, and you won’t encounter unexpected problems. But in practice, this approach never seems to deliver on its promise. Here’s why.

First, there are no perfectly stable environments.

Yes, you can configure and fine-tune every little thing… You can optimize every minute detail to run smoothly and seamlessly… And you can create an environment that operates predictably in its present state…

But sooner or later, something is going to change in your environment. These changes might come from:

  • New business requirements
  • New technology installations
  • New software updates and patches
  • New security requirements
  • New workloads, methodologies, development tools, and interfaces

Every one of these changes — no matter how small and inconsequential — can throw off the fine-tuned parameters you established and create problems. And when they do, you might not know they happened — let alone get the chance to adjust your parameters and re-stabilize your environment before big problems erupt.

After all, these changes can come from multiple groups in your organization.

Developers. Architects. Business analysts. Product owners. System programmers. Infrastructure admins. Business stakeholders. Executive management. Any of these groups can make changes to your Db2 environment — with or without your knowledge, oversight, and governance.

In sum: Even the most conservative and time-tested environments are subject to constant change… and any of these changes can create big downstream problems.

Second, even small, isolated problems can snowball into big, systemic issues.

Db2 environments are complex. Their transactions are highly interconnected, and a problem with one transaction can impact many others. Picture your environment. Your Db2 workloads share and consume the same pool of resources. If “transaction one” encounters a problem and consumes too many resources, then “transaction two” cannot execute until those resources free up.

From there, “transaction three” cannot execute until “transaction two” completes its work and frees up its resources, and then “transaction four” gets delayed, and… well, you get the picture. A single transaction that encounters a problem can delay many other transactions, and ultimately snowball into system-wide performance loss and outages in transactions that seem to have nothing to do with the original problem. To prevent these system-wide failures, you must identify and remediate small problems before they spread throughout your entire system. Unfortunately, most Db2 administrators are not equipped to do this.
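The chain reaction described above can be illustrated with a toy model. The Python sketch below assumes, purely for illustration, that transactions share a single resource and use it one at a time in arrival order; real Db2 workloads run concurrently, but waiting on a contended resource follows the same basic pattern. The helper name and the numbers are hypothetical.

```python
# Toy model (hypothetical): transactions share ONE resource and use it one at
# a time, in arrival order. Real Db2 work runs concurrently; this only shows
# how waiting on a shared resource propagates delay down the line.


def response_times(arrivals: list[float], service: list[float]) -> list[float]:
    """Time from arrival to completion for each transaction (FIFO, one server)."""
    resource_free_at = 0.0
    result = []
    for arrive, work in zip(arrivals, service):
        start = max(arrive, resource_free_at)  # wait until the resource frees up
        resource_free_at = start + work        # hold it for `work` seconds
        result.append(resource_free_at - arrive)
    return result


arrivals = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # one transaction per second
normal = [0.5] * 6                         # each needs the resource for 0.5 s
problem = [4.0] + [0.5] * 5                # transaction one runs 8x longer

print(response_times(arrivals, normal))   # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(response_times(arrivals, problem))  # [4.0, 3.5, 3.0, 2.5, 2.0, 1.5]
```

With normal service times every transaction finishes in 0.5 seconds. When only transaction one slows down, each of the five transactions behind it also takes longer (3.5, 3.0, 2.5, 2.0, and 1.5 seconds), even though nothing is wrong with them.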

Third, common approaches fail to fix problems before they snowball.

Before we sound too negative, let’s clarify one point… Common approaches to Db2 administration get a lot right.

Consider a real-world example. One of our clients runs SAP workloads on Db2 for z/OS. To do so, they leverage SAP GUI. They also leverage Tivoli Enterprise Portal (TEP), which is applicable to a broader range of Db2 workloads.

Our client is right to use these tools — they are specialized and powerful. Our client can use them to identify and solve many of their Db2 performance issues. These tools are critical elements of our client’s Db2 stack and our client must keep using them.

But our client recognizes that these tools are not enough on their own. There are some performance issues that these tools cannot identify and fix. And when our client encountered an outlier performance problem, they found themselves in one of the following uncomfortable positions.

  • They Knew the Problem, But Not the Cause. They were able to identify the performance problem but not its root cause. They always had to ask their system programmer or administrator to come in and troubleshoot the issue.
  • They Knew the Cause, But Not the Solution. They could identify the problem and its root cause, but they didn’t know how to solve it. They again had to ask their system programmer or administrator to come in and give them the solution.
  • They Knew the Solution, But Could Not Implement It. They identified the problem and its root cause, and knew how to solve it. But they lacked the right tools or skills, and their attempts to implement the solution failed.
  • They Implemented the Solution, But Nothing Happened. They identified the problem, its root cause, and its solution. They appeared to implement the solution successfully, but they still did not see any performance improvement.

The result: Even though our client used good tools that helped them with a lot of problems, many other problems remained active in their environment and regularly created snowballing performance loss and outages.

Their example is common.

We have seen, across multiple clients using a range of Db2 tech stacks, that common approaches and tool sets create this exact same situation. They solve many problems, but they cannot solve everything.

Ultimately, common approaches are useful but not sufficient.

If you only follow common approaches to Db2 administration, you will…

  • Maintain an environment filled with unknown, unsolved problems. By the time you find them, it might be too late to solve them before they snowball.
  • Require multiple tools and dedicated technical support teams to fix the problems that you are able to identify in your environment. You may even need your teams to be on-call 24x7x365 to ensure consistent performance.
  • Create a flood of alerts — many of which are false positives, and many of which will repeat even though they never require any action — burdening you and your teams and preventing you from dealing with the real problems.
  • Operate in a reactive mode of constantly firefighting big performance issues when they pop up — without fixing their underlying causes, and without any hope that one day your environment will remain calm, stable, and ordered.

Clearly, you need a new approach to Db2 administration.

One that:

  • Discovers, identifies, and fixes problems ignored by common solutions, before those problems snowball into system-wide failures.
  • Consolidates tools and reduces the burden on technical support teams, or even eliminates the need to keep these teams on call around the clock.
  • Reduces alerts and false positives — especially repeating alerts — so you can focus your time and attention on solving big, fundamental problems.
  • Moves you into a proactive mode where you solve problems before they snowball, all while you fix big, fundamental problems and improve the overall health and performance of your Db2 environment.
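Cutting down repeating alerts does not have to be elaborate. Here is one minimal approach, sketched in Python under hypothetical assumptions: an alert is identified by its source and the check that fired, and a repeat of the same pair within a 15-minute window is suppressed. The `Alert` type, its fields, the subsystem names, and the window length are illustrative only, and the sketch handles repeats, not false positives.

```python
from dataclasses import dataclass

# Hypothetical sketch: suppress repeats of the same alert within a window.
SUPPRESS_SECONDS = 15 * 60  # illustrative 15-minute window


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since an arbitrary reference point
    source: str       # e.g. a Db2 subsystem name (hypothetical values below)
    check: str        # which condition fired, e.g. "lock timeout"


def deduplicate(alerts: list[Alert]) -> list[Alert]:
    """Keep an alert unless the same (source, check) was kept within the window."""
    last_kept: dict[tuple[str, str], float] = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.check)
        previous = last_kept.get(key)
        if previous is None or alert.timestamp - previous >= SUPPRESS_SECONDS:
            kept.append(alert)
            last_kept[key] = alert.timestamp  # only kept alerts restart the window
    return kept


alerts = [
    Alert(0.0, "DB2A", "lock timeout"),
    Alert(60.0, "DB2A", "lock timeout"),    # repeat within 15 min: suppressed
    Alert(120.0, "DB2B", "lock timeout"),   # different source: kept
    Alert(1000.0, "DB2A", "lock timeout"),  # window has passed: kept again
]
print([(a.timestamp, a.source) for a in deduplicate(alerts)])
# [(0.0, 'DB2A'), (120.0, 'DB2B'), (1000.0, 'DB2A')]
```

The sketch updates the last-kept time only when an alert is actually kept, so a persistent condition still re-alerts once per window instead of being silenced indefinitely.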

Here’s what that new approach looks like.

How to Stop Firefighting and Develop Proactive Db2 Administration

Over the years, we have developed a new approach to Db2 administration. We have developed this new approach in the field, and built it around a few core principles that every Db2 administrator must follow if they wish to create a calm, stable, ordered environment that does not produce performance spikes or snowballs.

These core principles are:

  1. You Must Expect Issues and Hunt For Them: It’s time to abandon the idea that you can create a perfect environment that is 100% stable at all times. No matter how good you are at Db2 administration, you must accept the fact that not everything is under your control. Things will change and some of those changes will create unexpected problems.
  2. You Must Proactively Remediate Issues: It doesn’t matter who created the performance problems in your environment — it’s your job to fix them before they snowball out of control. To do so, you must find a way to continuously monitor your environment for these problems without adding to the flood of alerts and false positives you are already dealing with.
  3. You Must Adopt New Db2 Tools and Capabilities: Finally, you have to be willing to evolve beyond the status quo. You don’t need to give up the Db2 tools and approaches you have been following, but you do need to augment your approach with new tools and techniques without creating needless complexity or solution overload.
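One simple way to hunt for issues continuously without adding to the alert flood is to compare each metric with its own recent baseline instead of a fixed threshold. The Python sketch below is not the authors’ method; it assumes a hypothetical series of per-interval values (for example, average CPU time per transaction) and flags the newest value when it sits more than three standard deviations from the previous twelve. The function name, window size, threshold, and data are all illustrative.

```python
from statistics import mean, stdev

# Hypothetical sketch: flag a metric that drifts away from its own baseline.
BASELINE_SIZE = 12  # illustrative: the previous 12 intervals form the baseline
THRESHOLD = 3.0     # illustrative: standard deviations that count as drift


def drifting(history: list[float]) -> bool:
    """True if the newest value deviates sharply from the values just before it."""
    if len(history) < BASELINE_SIZE + 1:
        return False  # not enough history to judge yet
    baseline = history[-BASELINE_SIZE - 1:-1]
    latest = history[-1]
    spread = stdev(baseline)
    if spread == 0:
        return latest != baseline[0]  # flat baseline: any change is notable
    return abs(latest - mean(baseline)) / spread > THRESHOLD


# Invented per-interval average CPU milliseconds per transaction.
cpu_ms = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.1, 1.9]
print(drifting(cpu_ms + [2.05]))  # False: within normal variation
print(drifting(cpu_ms + [3.5]))   # True: a sharp jump worth investigating
```

Because the baseline moves with the data, a gradual, legitimate change in workload stops being flagged once it becomes the new normal, while a sudden jump still stands out.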

If you adopt these principles of Db2 administration, you will create many benefits. You will prevent problems, rationalize costs, and reduce risk. You will improve the efficiency, performance, and availability of your Db2 workloads. And — most important — you will simplify your job, and stop firefighting all day long!

Is This New Approach to Db2 Administration Right For You?

While we advocate for this new approach — and while we have seen our clients adopt it — we recognize it might not be of interest to everyone. To decide if you might benefit from our new approach, ask yourself a few questions:

  • Am I happy with my Db2 costs and performance levels?
  • Are my business stakeholders happy with our costs and performance levels?
  • Do we often encounter unexpected performance problems?
  • Do we always know how to identify, investigate, and solve these problems?
  • Do we always solve these problems before they snowball out of control?
  • How much of our day do we spend reactively putting out these fires?
  • Is there something else I’d rather be doing with those work hours?

If you are not happy with your answers to these questions, then read on.

In part two of this series, we will give you the practical, tactical, and technical details on how to follow our new approach to Db2 administration. If you require immediate, personalized help bringing this new approach to life in your organization, then reach out today and schedule a no-obligation consultation.

