Millions of lines of code without a single automated test, stressful development, and growing business needs. It’s not a new problem for the software industry; it has been around for the last decade.

Yes, we had our own Death Star, but we tried to learn from others and manage the situation using current best practices.

Stabilising and then de-structuring a monolith, and moving towards a system based on services, taught us valuable lessons. I’m very pleased to have been part of the evolution of a huge, terrifying piece of software into a maintainable and scalable application. I’d like to share our story and motivate you to take on your own legacy, too.

The age of the Monolith

When I said we had a Death Star, I meant a piece of software that sends out billions of marketing campaigns through multiple channels like email or social platforms, analyzes the behavior of the recipients, makes predictions and automates the next campaigns. It was all pretty much one piece of software, which was a common pattern in the web applications of the early 2000s. Not surprisingly, our monolith, called Suite, was written entirely in PHP, using MySQL and the file system to store data, assets and logs. Over the years, all of the new features were developed straight into this single application codebase, without distributing anything into separate components.

 
Some server rack serving the Monolithic application

The ever-growing business needs and the importance of new customers forced our Operations Team to create distinct environments of the same application. These were completely separate server clusters, each with its own database, running the same codebase but serving different customers.

At the time, copying everything to distinct servers seemed to be the easiest way to solve the problem of scaling.

However, cloning everything to new environments wasn’t a sustainable solution, and managing them caused us massive operational issues.

The time of awakening

In 2010, after almost ten years of developing Suite, our stakeholders realized that the slowing releases and the rising number of bugs did not bode well for the future. The company had almost all of the classic problems that a huge monolith can cause. Besides the software’s scaling issues, we had problems with maintainability, reliability and testability.

We could only release once per year due to the insanely hard work of merging and comprehensively testing both the new and the old features.

The company agreed that we needed a completely new approach to development.

From the beginning it was clear to us that getting rid of our legacy was not the best path to follow. We wanted to improve our application iteratively, so we could keep the value we already had instead of starting over with no guarantees and the risk of over-designing the new application. We agreed that in every aspect of the development process we had to use a surgical knife, cutting and repairing small pieces. Improving an application in small but meaningful steps has its cost, but we accepted it.

Rebooting the Development

First of all, we reformed our processes by restructuring the development department. Instead of big teams we created small, independent ones, each with the mindset and skill set to cover development, testing and operations. We adopted and promoted eXtreme Programming techniques in order to reach these goals and to be more flexible.

From a technical perspective, we decided to stabilize the application by writing unit, integration and system tests and running them on a Jenkins-based CI pipeline.

Funnily enough, at the very beginning the pipeline was IRC-based and a build could be started by asking a bot to do it, long before the ChatOps movement showed up.

We agreed to follow Clean Code principles, along with the rule of refactoring the surrounding legacy code on the spot whenever you add new functionality. It was a long and hard process, but slowly we gained faith in our codebase.

Thanks to these improvements, we managed to shorten the release cycle from several months to two deployments per day. We also steadily reduced the number of bugs released with each deployment.

Pioneers of the future

We didn’t stop improving our main application. To get faster, more flexible release and operations processes that aren’t bound to the problems of the Monolith, we decided three years ago to move to a service-based infrastructure.

 
Emarsys Operational Dashboard, circa 2013

With this step we hoped that deploying a service would no longer have to wait for the relatively slow deploy of the Suite. Each service could use its own attached resources, which would also improve the autonomy of the teams. Finally, a single service can serve all of the former environments. Moving from distributing our architecture by customer to splitting our application by feature was one of the most effective steps we’ve made, and it almost completely solved our scaling issues.
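
To make the difference between the two ways of splitting more concrete, here is a minimal sketch in Python. It is only an illustration: the environment names, customer names, feature names and service names are all hypothetical and do not come from the Suite codebase.

```python
# Illustrative sketch only: every name below is hypothetical.

# Old model: each environment is a full clone of the monolith
# (same codebase, own database) that serves a fixed set of customers.
ENVIRONMENTS = {
    "env-a": {"acme"},
    "env-b": {"globex"},
}

# New model: the application is split by feature, and each
# service handles its feature for every customer.
SERVICES = {
    "email": "email-service",
    "prediction": "prediction-service",
}


def route_by_customer(customer: str) -> str:
    """Return the monolith clone that hosts the given customer."""
    for environment, customers in ENVIRONMENTS.items():
        if customer in customers:
            return environment
    raise KeyError(f"unknown customer: {customer}")


def route_by_feature(feature: str) -> str:
    """Return the service responsible for the given feature."""
    return SERVICES[feature]
```

In the first model, taking on more customers means adding another full clone to operate. In the second, the routing no longer depends on the customer at all, so each service can be scaled and deployed on its own.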

In my next post I will get into more details about the additional challenges, so stay tuned!

This post originally appeared on the Emarsys Craftlab blog.