As the lead for Service Management with NTT DATA's UK Telco, Media and Comms practice, I have the privilege of meeting with a broad range of operators and service providers.
One thing is clear: most of the time, I hear about the same problems, ambitions, hopes and dreams.
In this post I'll cover the common themes from my conversations with organisations over the past 12 months, and share some of my views on how to approach them. But I'd like to start with a quote from Oren Harari: "The electric light did not come from the continuous improvement of candles."
Hold that thought and let's get going.
Squaring the circle of increased complexity and reduced cost
The first theme is about striking the balance between service and cost.
Let's first take a step back. The majority of organisations classify service impact using a severity scale: severity zero (sev 0) is of greatest impact, such as users unable to make or receive calls across a large geographic area, while severity 4 is of negligible impact, such as an intermittent problem with a content page on the company intranet.
If incidents are stacked in a pyramid, with severity 0 at the top and severity 4 at the bottom, the result is a very flat, wide pyramid.
The economics of working this pyramid are self-evident. Typically, the majority of the effort is spent on the middle layers of the pyramid, with relatively little spent on fixing severity 0 incidents. Given the lowest-severity incidents rarely get touched, even less is spent on them.
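To make the volume game concrete, here's a minimal sketch of how incident volume and effort might distribute across severities. All the figures are hypothetical, chosen only to show the shape of the pyramid, not drawn from any real operator:

```python
# Illustrative incident pyramid: monthly volumes and effort by severity.
# Every number here is a made-up assumption for illustration only.

monthly_incidents = {0: 2, 1: 40, 2: 800, 3: 5000, 4: 9000}
avg_hours_each = {0: 120, 1: 24, 2: 4, 3: 0.5, 4: 0.1}

effort = {sev: n * avg_hours_each[sev] for sev, n in monthly_incidents.items()}
total = sum(effort.values())

for sev in sorted(effort):
    share = 100 * effort[sev] / total
    print(f"sev {sev}: {monthly_incidents[sev]:>5} incidents, "
          f"{effort[sev]:>7.1f} hours ({share:4.1f}% of effort)")
```

Even with these rough assumptions, the middle layers (sev 2 and 3) swallow most of the effort, while the headline-grabbing sev 0s account for only a few percent - which is exactly why the prevention play pays off on volume, not on headlines.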
What does this tell us? Well, it's a volume game. The tactics of the game can be boiled down to a couple of simple plays:
- React with SWAT-like effectiveness: for severity 0 incidents, operate highly repeatable, well-drilled major incident management procedures and ensure the systems of record that enable the management tasks are accurate.
- Prevent by mobilising as many failure antibodies as possible: for the volume incidents, adopt a prevention tactic.
Another way of explaining this is the Warrior and the Guardian. The Warrior is about being the hero, reacting to failures with military precision. The Guardian is about preventing the failures. As is said in medicine, prevention is better than cure. It's certainly less costly too, in the long run.
I've seen a variety of service management endeavours hit the rocks because 'the in-year benefits aren't there'. The simple truth is that organisations have already optimised the 'as-is' to harvest the in-year benefits; there's not much left to take.
The key takeaway here: changing the economics of technology service is about transforming gradually, ideally in smaller blocks. Those blocks could be technology service management journeys.
Journeys are the vehicles of business benefit. Target transforming the journeys that are focussed on Guardian outcomes. Good examples are Fault Root Cause Analysis, Change Impact Assessment, and Asset and Configuration Management. A focus on this will generate pecuniary benefit to existing operations, improving the effectiveness of your army of Warriors too.
Convergence, decentralisation and a population explosion at the edge

Next up is how to manage the new world that's already starting to materialise around:
- Convergence: The blurring of boundaries between IT and Telco Networks - this includes virtualisation (NFV and SDN)
- Proliferation: The explosion of moving parts outside of the core - IoT, 5G cloud RAN and small cells.
In lay terms, it's getting really complex. The fact is, with this change, traditional service management processes, tools and approaches cannot scale to meet the new demands that this complexity and dynamic asset base is creating.
But wait just a minute. Before we get carried away, it's vital to recognise that all of the clever stuff - and that includes AI - is predicated on having quality data that describes assets and inherent configuration, within and between assets.
The pyramid below is one that I regularly use as a discussion piece. It's missing one thing though - foundations. The foundation that underpins all of the layers is accurate asset/inventory and configuration data.
Let's come back to scalability of operation. What exactly does 'cannot scale' mean, I hear you ask? Well, here are a couple of examples.
Let's first take a look at an IoT use-case. Operators will have little or no knowledge of what types of 'things' are using their [IoT] connectivity. That's good though, isn't it? Well, sort of, but it creates a separate challenge. Let's assume something goes wrong with a load of these 'things' - how do you mitigate the risk that the users flood your contact points looking for support?
Second, virtualisation requires automation at scale. At some point, virtualised assets require something physical. Most of the operators I interact with don't have much sophistication in the asset and inventory management space; this has in turn resulted in asset and configuration data in a pretty poor state of repair.
The challenge here is how can you automate at scale if your base asset and configuration data is a) Managed using manual and often ad-hoc means and b) Of poor quality? The answer is 'with extreme difficulty'.
For these challenges, I point in the direction of three 'must-dos':
- Clean and keep clean: Clean up the source data, and more importantly, instigate processes that enforce accountability for data quality through clear stewardship.
- Automate: Automate every possible aspect of your operation (build and run) that adds, changes, uses or removes assets, inventory and configuration; include also the maintenance of the data that describes interrelationships between assets too.
- Diagnostics: Develop weapons-grade, zero-touch (i.e. shifted-left) diagnostic capabilities that will not only distinguish between false positives/negatives and real problems, but will also provide a next 'go-to' point for the user (remember, in the ecosystem, you're part of one happy family - help others out to deliver a great end-user experience).
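The 'clean and keep clean' must-do can be expressed as a simple automated gate on the inventory source of truth. Here's a minimal sketch; the record fields (asset_id, site, model, owner) are hypothetical stand-ins - substitute whatever your asset and configuration system actually holds:

```python
# Minimal sketch of a 'clean and keep clean' gate for asset records.
# Field names are illustrative assumptions, not any particular tool's schema.

REQUIRED = ("asset_id", "site", "model", "owner")

def find_issues(records):
    """Return (asset_id, problem) pairs for records that would
    undermine automation at scale if left uncorrected."""
    issues, seen = [], set()
    for rec in records:
        asset_id = rec.get("asset_id", "<missing>")
        for field in REQUIRED:
            if not rec.get(field):
                issues.append((asset_id, f"missing {field}"))
        if asset_id in seen:
            issues.append((asset_id, "duplicate asset_id"))
        seen.add(asset_id)
    return issues

records = [
    {"asset_id": "RTR-001", "site": "LDN-1", "model": "X9", "owner": "net-ops"},
    {"asset_id": "RTR-002", "site": "", "model": "X9", "owner": "net-ops"},
    {"asset_id": "RTR-001", "site": "MAN-2", "model": "X7", "owner": "net-ops"},
]

for asset_id, problem in find_issues(records):
    print(asset_id, problem)
```

The point isn't the code itself - it's that checks like these run continuously, with each failure routed to a named data steward, so accountability for quality is enforced rather than hoped for.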
Delivering these capabilities is not just a case of picking one, then opening up your CAPEX taps. As suggested earlier, identify what 'user journeys' contain business value opportunity and tailor data cleansing, automation and diagnostic tactics in a pragmatic way. For example, look at the fault root cause analysis journey for a given product set and a specific segment, rather than just looking at fault management. Be specific and be led by where the benefits live - don't be a slave to process or try to make something work for everything.
The business case for technology Service Management
Finally, quite literally, the money question. Most of you will be able to name the film this line comes from, and quote it too. It's a line that forms the cornerstone of any investment case that hits a corporate investment board: "Show. Me. The money."
For technology service management, I often see the Pavlovian response of looking for cost savings in people intensity - in other words, do a time-and-motion study, cut out wasteful time, and thereby reduce the FTE on the P&L. It should be simple. But it's not. Why?
I see a couple of reasons why it's not that easy. First, the processes in most organisations are already as optimised as they can be. Secondly, whilst anybody in command of a spreadsheet can find a logical saving, operational managers know the first point all too well - so when it comes to the investment board, the operations director won't sign up to sustaining service with fewer resources. So once again, we arrive at the position of having to transform, rather than increment on the as-is.
There are a few recommendations that I've found work well in this situation:
- Focus on journeys: Once again, journeys are the best forms of scope. In particular give strong consideration to 'Guardian' processes; it will help stem the flow of factors that generate incidents/faults. If scoped correctly, a journey is small enough to be delivered in a relatively short period. A journey can also bring great focus to a particular product [area] and/or segment - this helps to deliver tangible and near term business benefits and change outcomes.
- Look beyond FTE: Consider the value of an earlier time to market, and of mitigating security and health & safety corporate risks. Consider also whether technology can move from supporting the business to helping it create value; the best example is how Amazon went from an online store to holding almost 50% of the IaaS market. Take lazy assets and put them to work. Consider how technology service data can dovetail with and benefit your omni-channel ambitions - data should flow to wherever it can create value.
- Create a treasure island: Making transformative change stick is hard, especially when you try to change the entire organisation in one hit. Let's be frank, most attempts like this fail in one way or another. Instead, establish an island of transformed working. To start with, it may just be operating one journey. Over time, move more and more journeys to that island. Before long, it will be the treasure island. Benefits will be clear and working methods will be transformed and materially differentiated from the as-is. Over time, everything can be moved to the new island.
There's one more very important trip hazard. Don't believe that reaching nirvana is all about the applications and tools. It isn't. They enable it, but don't drive it. My message here is: focus on the journeys, processes and people (skills and organisation) first. Then do technology. Whether your technology is blue, orange or red matters far less than the other points.
Concluding with the candle

So why the candle at the start?
Well, simply put: to square the circle of cost and service, whilst the canvas you're supporting is changing shape and size, and in a way that keeps the CFO happy with in-year benefits, things have to change. Radically. Simply making small improvements to current capabilities isn't going to cut it. You need a new type of light, not a longer candle.