Software quality matters. Learn from a real use case how we followed the Software Craftsmanship method to push a hard project forward.
Software is a weird craft. From the code to the final product, through all the intermediate states, we as product developers need a special kind of wisdom to measure where we are and how to adjust course (because we always need to adjust course) to meet our milestones. That wisdom is acquired only by pursuing development as a craft.
In the following lines, I will describe how we followed the Craftsmanship approach with one long-term client, with great results. Spoiler alert: it was the only way to do it.
Some years ago there was a movement called Software Craftsmanship, with a lot of books, blog posts and discussions all over the internet. The approach extends the Agile manifesto with a manifesto of its own, adding constraints on how the Agile guidelines should be followed. Here I merged both manifestos to put craftsmanship in context; the items in parentheses come from the Agile manifesto:
Not only (working software), but also well-crafted software.
Not only (responding to change), but also steadily adding value.
Not only (individuals and interactions), but also a community of professionals.
Not only (customer collaboration), but also productive partnerships.
The intention behind these restrictions is to make clear that quality matters and that the end doesn’t justify the means. At first this might seem impractical for a field like engineering, where most of the time the results come out of a black box. Think of a remote API, for example: do I really care how “well-crafted” it is if it gives me the right answer?
Of course, as with every complex problem, the answer to that question is “it depends.” Are you the one using the black box, or the one building what’s inside? To some extent we are in the business of selling black boxes, so it’s our responsibility as developers to build the best box we can. That’s what Software Craftsmanship is all about. In some cases, there is no option other than to apply some craftsmanship to help your clients build their products.
The client was a big company located in Silicon Valley with a team on the order of hundreds of people, including many overseas contractors. For years they had been operating an online product that handles live interactions between multiple users in real time, so it’s not a trivial site. Our team there consisted of five to eight senior engineers across multiple technologies. The initial mission was to help them push forward a couple of internal projects and, with time, that evolved into much more.
Their codebase was huge and diverse. They had everything: legacy backend services in PHP, old frontend apps in Angular, modern Go microservices, many Java Spring Boot services, and external services such as Salesforce, Box, DocuSign and DataDog. Everything was sprinkled with Kafka and nearly a hundred databases (plus some other pieces I’m forgetting or didn’t get to see in my almost year and a half on the project).
Besides the large technological surface and the hard problem (real-time interaction between multiple users), the company also went through structural changes at multiple levels during this time frame. Some important people on the engineering team left the company, a migration from on-premise infrastructure to a cloud provider was on the horizon and, of course, there was a product roadmap with people trying to deliver new features. It was a really fun place to be. Just the kind of project we love to work on: difficult on many levels.
One of the first walls we hit was the legacy components: a considerable amount of code, split across multiple projects with different technologies and poorly documented, if documented at all. The trickier part, however, was that this code was still key to the everyday operations of the company, and the plans to replace it sat at the bottom of a long (and ever-growing) roadmap. Introducing changes in a codebase that was not meant to last so long is dangerous and has unpredictable results. It is even worse when several of these systems talk to each other over the network, so measuring the impact of a change was, given the circumstances, impossible.
Still, we had to deal with it somehow, so we followed two simple rules. First, avoid the legacy components. Don’t touch them, but if you have to, make sure to measure and communicate the impact as much as possible ahead of time. Second, stop producing any code similar to that. If there is one characteristic of poorly written code, it is that it ages badly because it was not meant to last in the first place, so every iteration that extends some functionality (even just a bug fix) only makes the problem worse. This is where well-crafted software shines.
While we were deep in this tech stack, digesting things like the legacy projects, the roadmap we were working against was already being rescheduled, so we were behind the deadline. When working on a project like this you have to be really careful about how you use the tools and how you communicate with others. Having a conversation in the wrong Slack channel or setting the wrong status on a Jira board could make people lose hours before reaching a dead end.
For a team to move fast, the people in it need to be effective communicators so they can collaborate instead of colliding. They have to take ownership of their tasks and be diligent about pushing them to completion so the rest of the team doesn’t have to worry about them. They also have to help each other so no one is left behind. These are the characteristics you would expect of a community of professionals. If you don’t follow this approach, chances are that you will be a drag instead of a help.
Once we had delivered some features and relieved the pressure of the roadmap, we started looking for other ways to steadily add value. We ran an experiment to improve the architecture of one of the frontend apps (a ReactJS codebase of approx. 110K lines) by integrating redux-query. The result was an important improvement in the performance and quality of new components, which led to a new guideline for writing them.
The final stage of the project involved migrating all of their on-premise infrastructure to a cloud provider, using Kubernetes plus some ad-hoc machines for the legacy systems. We had previously been in conversations with the client about how to improve the delivery speed of the engineering team. They had a great team full of experienced and well-prepared people, so the reasons for the bottleneck weren’t clear. The biggest pain point we quickly identified was that debugging was really hard given the number of components involved in every interaction. Sometimes a simple button in the app could fire a process that hit five different services, each with a different database and no easy access to the logs. So we suggested adding DataDog to the cloud migration roadmap. It also proved key during the migration itself, to monitor the changes made to services and applications. DataDog’s cross-service APM Tracing & Analytics module, as well as its log collection system, greatly improved our ability to understand how the systems were interacting with each other and simplified debugging incoming bug reports. This is an example of a productive partnership, where we try to help the team we work with get better.
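To make the debugging pain above a bit more concrete, here is a minimal sketch of the general idea behind cross-service tracing: every request carries one identifier, and every service writes it into its logs so the lines can be stitched back together later. This is not DataDog’s API and not the client’s code; the header name `X-Trace-Id`, the service names and the helper functions are all made up for illustration.

```python
import json
import uuid

# Hypothetical header name, chosen only for this illustration.
TRACE_HEADER = "X-Trace-Id"


def ensure_trace_id(headers: dict) -> dict:
    """Return a copy of the headers that carries a trace ID.

    An incoming ID is reused, so the same value follows the request
    through every service it touches.
    """
    out = dict(headers)
    out.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return out


def log_event(service: str, message: str, headers: dict) -> str:
    """Emit one JSON log line tagged with the service name and the trace ID."""
    line = json.dumps(
        {
            "service": service,
            "trace_id": headers[TRACE_HEADER],
            "message": message,
        },
        sort_keys=True,
    )
    print(line)
    return line


def handle_click() -> list:
    """Simulate one button click fanning out to two (made-up) services."""
    headers = ensure_trace_id({})  # the entry point creates the ID
    lines = [log_event("frontend-api", "button clicked", headers)]
    for service in ("billing", "notifications"):
        downstream = ensure_trace_id(headers)  # the ID is reused, not regenerated
        lines.append(log_event(service, "request handled", downstream))
    return lines


if __name__ == "__main__":
    handle_click()
```

With logs shaped like this, filtering by a single `trace_id` shows everything one click triggered, whichever service or database was involved. A tracing product automates this propagation and adds timing on top.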
When a project has such a high level of difficulty, there is no option other than to follow an approach like the one Software Craftsmanship describes. The constraints described there come directly from experience on complex projects. If you decide not to follow a similar approach, chances are high that you will fall short and the whole project could fail.
At Expero we take every project with this mindset of excellence as a consequence of years of experience in the industry and the conviction that this is the way to bring challenging ideas to reality.
We as developers are always trying to save time for one reason or another; deadlines are the number one concern of any developer writing code on a team. Well-crafted code doesn’t compete with deadlines. On the contrary, the bad code you write today will increase the cost in the medium to long term. This also holds for short projects like POCs, MVPs or similar. You don’t want a disposable proof of concept that you need to throw away when you scale it up to a full product. A big part of the development craft is knowing how much technical debt you can take on to move faster; this is purely an artisan’s work.
A piece of code is not better because you spent half an hour deliberating about whether “n++” is more suitable than “n += 1”. It’s better because it carries the knowledge you need to understand the decisions made when it was written, and because it is flexible enough to be adapted to a new situation. That’s why software development is kind of a weird craft: you need some special wisdom to make the decisions that give a piece of code this quality without a name.