Tech debt in banking: how did we get here?
The term ‘technical debt’ has been around a long time in software engineering. It shares some common characteristics with financial debt: You can accumulate it, you can ‘pay it off’.
Ward Cunningham coined the term, and his original definition was about the mistakes you make when you start something new, before you know any better. You know how, months or years into a given project, you say, ‘If we were starting again, we would do [all these things] differently’? That’s technical debt. Of course, these days it’s broadened to describe any shortcut, ‘hackery’ or ‘we don’t have time right now; we’ll fix it later’.
[Narrator: They didn’t fix it later]
It’s now often used to describe any system or service where the builders have taken shortcuts or made quality sacrifices in the name of speed and, as a result, slowed down any future changes (go figure).
You’ll hear the term used in banks. A lot.
And what are banks, if not enormous, sprawling software projects? Banks are technology. If you think about how much money moves through your bank every day, then consider how little of that involves putting bundles of notes in safes and taking them out again, you have an idea of just how intertwined with technology banks are. Even the really old ones.
If you work in a bank, you’ll have heard someone ascribe some problem or barrier to ‘tech debt’. You were probably asking for a simple change to a system that would save or make the bank a lot of money, or asking why a project is so behind schedule. If you’re responsible for parts of the bank that deal with customers, you might be forgiven for thinking ‘technical debt’ is a financial term (‘it’s technically debt…’), but it’s now a catch-all for ‘this technology is messy and fragile, and we really don’t want to fiddle with it’.
A short history of banking technology
Banks’ raison d’être is the processing of vast volumes of numbers, and they’ve always invested heavily in technology to become bigger and faster. They’ve operated at scale since long before the internet and ‘Big Data’. If you consider the plethora of systems in a bank, and the tech debt surrounding each one, you start to appreciate why big banks move so slowly.
Part of the problem is that many bank systems of record were first commissioned decades ago. Back then, they were cutting edge, performant, fit for purpose. To be fair, they still do their job well. They have to. But they follow the model of yesteryear: end-of-day batches running on mainframes to process and reconcile transactions overnight. These things are monolithic and slablike and, as with Stonehenge or the Pyramids, the knowledge and understanding of how they were built has been lost in the mists of time. Times have changed. Technology has sprawled, with data volumes increasing, systems being linked to other systems, and tech debt accumulating all the time. As time has gone on, we’ve advanced through AS/400s, Solaris, Linux, virtual machines and cloud, each bringing access to new techniques and new paradigms. And while that has been happening, we’ve been creating new, ever more cunning and esoteric financial instruments, chased by tighter regulations, more complex reporting and greater risk management.
And all the time you’re not updating or replacing these systems, they’re racking up technical debt.
Where we are today
So with every system in the bank evolving and intertwining, your architecture starts to resemble a drawer full of old cables: hopelessly tangled.
And then we consider Conway’s Law:
Any organisation that designs a system (defined broadly) will produce a design whose structure is a copy of the organisation's communication structure.
— Melvin E. Conway
Broadly, what this means is that your architecture tends to end up resembling your organisational structure. Which is okay if your organisational structure is efficient, logical and avoids duplication.
You see a lot of duplication in banks.
The fact is that banks have never been structured to support their architecture. Instead, every part of the bank shifts continuously, and independently - typically every 12-18 months. So if your architecture tends to match your org structure, and parts of your org structure change every year, what’s that going to look like? Layer upon layer of legacy technology. And don’t forget the calcified process that surrounds it and grows year-on-year (banks aren’t great at ‘less process, please’).
So, much like the derivatives - the CDOs that landed the economy in so much trouble in 2008 - we layer systems on systems, adding complexity and technical debt over the years so it’s very hard to understand end-to-end how it all works. And we fudge things together that don’t quite fit (‘this system is real-time, but we need to get its data into a system that only understands nightly batches’). So you end up with a system that works - mostly - but that is fragile, brittle and highly resistant to change: pretty much the antithesis of the qualities needed to compete.
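To make that ‘fudge’ concrete, here is a minimal sketch of the kind of glue code it implies: a bridge that buffers real-time events until a batch-only system is ready for them. Every name here (PaymentEvent, NightlyBatchBridge, on_event, flush) is hypothetical and invented for illustration; it is not taken from any real banking system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class PaymentEvent:
    """A hypothetical real-time event; amounts are in pence to avoid float rounding."""
    account: str
    amount_pence: int
    occurred_at: datetime


class NightlyBatchBridge:
    """Holds real-time events until the downstream system's nightly batch run."""

    def __init__(self) -> None:
        # Events are grouped by the calendar date on which they occurred.
        self._pending: dict[date, list[PaymentEvent]] = defaultdict(list)

    def on_event(self, event: PaymentEvent) -> None:
        """Called as each event arrives; nothing reaches the batch system yet."""
        self._pending[event.occurred_at.date()].append(event)

    def flush(self, business_date: date) -> list[PaymentEvent]:
        """Return and clear one day's events, oldest first, for the batch run."""
        events = self._pending.pop(business_date, [])
        return sorted(events, key=lambda e: e.occurred_at)
```

Even in this toy version, the downstream system sees nothing until flush is called, so the ‘real-time’ data is up to a day stale by the time it arrives. Multiply that by every pair of mismatched systems and you can see where the fragility comes from.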
What can be done about it
While we can’t start again, what we can do is establish what the base-level capabilities - the ‘primitives’ - are. A bank needs a ledger capability, it needs an unsecured lending capability, and so on. Pick one, and build (or buy) something that does that one thing well.
Once you’ve picked something, get it live, then use it to accelerate new higher-level capabilities by building on top of it. Add further services where needed. And yes, there’s duplication here. This is often one of the hardest barriers to overcome in a big bank, because the owners of the existing systems will see what you’re proposing as an existential threat, and will argue very hard that you should couple to the existing system because, ‘Why would you build something new when we have one already?’ These conversations happen all the time, and they happen at a level so far above the people who will actually do the implementation work that anyone can argue that one ledger is the same as another. Using a legacy system is never quicker, and you’ll likely never untangle your service from it either. And so the cycle begins again.
On the other hand, keep pushing with the strategy of building composable primitives, and eventually you’ll have an estate of services that are loosely coupled and independently upgradable. And then you can start to migrate your existing services on to these.
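As a rough illustration of what a composable primitive might look like in code, here is a minimal sketch of a ledger capability with an unsecured lending service built on top of it. All of it is assumption made for the example: the Ledger interface, the InMemoryLedger stand-in, the UnsecuredLending class, the account names and the simplified convention that a posting just moves a positive amount from one account to another. A real ledger is far more involved.

```python
from typing import Protocol


class Ledger(Protocol):
    """The hypothetical 'primitive': the only thing higher-level services depend on."""

    def post(self, from_account: str, to_account: str, amount_pence: int) -> None: ...

    def balance(self, account: str) -> int: ...


class InMemoryLedger:
    """A toy implementation; swapping it out should not affect its callers."""

    def __init__(self) -> None:
        self._balances: dict[str, int] = {}

    def post(self, from_account: str, to_account: str, amount_pence: int) -> None:
        if amount_pence <= 0:
            raise ValueError("amount_pence must be positive")
        # Every posting moves the same amount out of one account and into another,
        # so the sum of all balances stays at zero.
        self._balances[from_account] = self.balance(from_account) - amount_pence
        self._balances[to_account] = self.balance(to_account) + amount_pence

    def balance(self, account: str) -> int:
        return self._balances.get(account, 0)


class UnsecuredLending:
    """A higher-level capability that knows about the Ledger interface and nothing else."""

    def __init__(self, ledger: Ledger, funding_account: str = "bank:loan-book") -> None:
        self._ledger = ledger
        self._funding_account = funding_account

    def disburse(self, customer_account: str, amount_pence: int) -> None:
        """Pay a loan out to the customer by posting against the funding account."""
        self._ledger.post(self._funding_account, customer_account, amount_pence)


ledger = InMemoryLedger()
lending = UnsecuredLending(ledger)
lending.disburse("customer:42", 50_000)
```

The point of the sketch is the shape, not the detail: UnsecuredLending depends only on the small Ledger interface, so the ledger behind it can be upgraded or replaced without the lending service needing to change. That is what ‘loosely coupled and independently upgradable’ means in practice.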
We delve into the solutions in more detail in our report, Rebuilding financial services from the inside.