Tuesday, 3 July 2012

Assessing the RBS meltdown


Guest blog post by Daniel Mayo, practice leader, financial services technology, Ovum

Whilst the specifics of what went wrong at RBS last week are still unknown, we have a good indication of the nature of the fault: a relatively minor update to the batch scheduling software failed. On its own this would have been minor and easy to fix, but the problem was unfortunately exacerbated by an employee deleting the scheduling queue in his attempts to fix it.

Rebuilding the scheduling queue is a much lengthier and more complex process, one complicated even further by UK banks’ relatively heavy reliance on legacy systems. RBS needs a team with a detailed understanding of the scheduling order and the core system’s processing quirks, as well as knowledge of older IBM assembly languages.

Assessing this scenario therefore shows two of the IT infrastructure pressures that banks face today:

First, the shortage of skilled staff experienced in older systems is a growing operational risk that is difficult for banks to address. Senior staff with the knowledge necessary to perform complicated operations inevitably retire, and new IT professionals (unsurprisingly) concentrate on newer technologies. With most banks under heavy cost pressures, relatively junior staff are often given responsibility for systems where they have little experience beyond the routine, which matters most in a stress situation (as with RBS) where things go outside normal operations. The risk can become particularly acute where maintenance of systems is outsourced or offshored, as documentation on these systems and the processes they support is hard to come by, if it exists at all.

Secondly, the growth of mobile banking will increase pressure to reduce the batch window while transaction volumes increase, further reducing room for error. Batch processing largely worked well in the old world of restricted-hour, branch-based banking, where branches closed at 15:30 and at weekends. This gave IT a large “batch window” to complete processing, with time to roll back and re-run if necessary. However, in an age of online banking, and with growing uptake of mobile banking, IT is increasingly under pressure to reduce system offline time, and is being asked to run batches within a relatively tight window. This results in less room for error if things do go wrong.
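To make the shrinking margin concrete, here is a rough sketch of the arithmetic. Only the 15:30 branch closing time comes from the discussion above; the 09:00 reopening, the 01:00 to 05:00 online-era window, the three-hour batch duration, the calendar dates, and the function name `rerun_attempts` are all illustrative assumptions, not figures from RBS or any other bank.

```python
from datetime import datetime, timedelta


def rerun_attempts(window_start, window_end, batch_duration):
    """Return how many full re-runs fit in the window after the first run.

    0 means the batch fits exactly once, with no time to re-run;
    -1 means the batch does not fit in the window at all.
    """
    window = window_end - window_start
    # timedelta // timedelta gives the whole number of runs that fit.
    return window // batch_duration - 1


# Assumed batch duration (hypothetical).
batch = timedelta(hours=3)

# Branch era: branches close at 15:30, assumed to reopen at 09:00 next day.
branch_era = rerun_attempts(
    datetime(2012, 7, 2, 15, 30), datetime(2012, 7, 3, 9, 0), batch
)

# Online era: assumed offline window of 01:00 to 05:00.
online_era = rerun_attempts(
    datetime(2012, 7, 3, 1, 0), datetime(2012, 7, 3, 5, 0), batch
)

print(branch_era)  # 4
print(online_era)  # 0
```

Under these assumed numbers, the branch-era window leaves room for four full re-runs after a failure, whereas the tighter window leaves none: a single failed run already pushes processing past the point where systems are expected to be back online.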

In the short term, the main response by banks will be to focus on processes and governance, to ensure that disaster-handling policies are understood across all support staff. This is appropriate. However, this glitch should also be a catalyst for banks to take a longer look at their core system strategies. While legacy systems may be mature and stable, at some point old age will get the better of them.
