Many providers of business-critical SaaS applications eventually reach a point where incremental improvements no longer suffice to move their products forward. It may be that a major architectural change is needed, or foundational aspects of the platform are due for upgrade, or perhaps just that development has outpaced the team’s ability to deploy and integrate important new features. When a company has reached this point, it has a choice: migrate or stagnate.
Large-scale migrations can be intimidating, especially for the highly available, always-on cloud services that are becoming ever more critical to the way our world runs. How can we approach migrations at scale in a way that minimises the impact and manages risk without getting locked into overcaution?
Start with the business
Before even beginning the technical plan, consider the perspective of the business. First, what value will this migration create for the company and your customers? Some of the most critical migrations are internal projects, and being able to justify their value relative to customer-facing product priorities can be crucial to resourcing and execution.
Second, what impact, if any, is acceptable to customers? It’s essential to determine how much downtime is tolerable and what level of customer engagement is feasible for your team. The answers set the business constraints. Clarity upfront is essential because the most brilliant technical plan is doomed to failure if it ignores the fundamental business requirements.
Factoring in downtime
The most critical consideration will be downtime. How much downtime is deemed acceptable will significantly shape the technical plan. The requirements and process for a zero-downtime migration are completely different from those for projects where systems or applications can be taken offline for an agreed period, so this should be the starting point for all migration planning.
Zero-downtime migrations allow teams to move on their own schedules because they are not constrained to specific maintenance windows. Teams must keep the original system online, which can create complexity, but doing so ensures that it remains live and can still serve users if any issues arise. Essentially, teams can repeatedly test the migration process without impacting the original system.
However, the constraints of zero downtime are unnecessary if there are natural maintenance windows that can be utilised. A bank that is only open 9-5 offers a natural maintenance window in which its lobby WiFi can be migrated, because downtime outside opening hours is acceptable; however, a patient records system in a hospital emergency department or a public transport signalling system cannot afford to be offline.
Establishing the migration plan
One of the keys to establishing the migration path is to break down the system to see if each component – each service, unit of code, customer, customer group, or even geography – can be migrated with zero downtime. Once this is understood, a path starts to present itself. There may be a hybrid approach where a subset of customers who use a discrete service are transferred with downtime, while others require zero downtime. Or perhaps SLAs allow for a certain amount of downtime each month, and that allowance can be used to the company’s advantage.
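As a rough illustration of the SLA point, the sketch below works out how many minutes of downtime a monthly availability target leaves and whether a planned migration window fits inside it. The function names, the 30-day month, and the 99.9% figure are assumptions chosen for the example, not terms from any particular contract.

```python
def monthly_downtime_budget_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime that an availability SLA permits in one month.

    Assumes the SLA is measured over the whole calendar month.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


def fits_in_budget(
    planned_minutes: float,
    sla_percent: float,
    already_used_minutes: float = 0.0,
    days_in_month: int = 30,
) -> bool:
    """True if a planned outage fits in what remains of the monthly allowance."""
    budget = monthly_downtime_budget_minutes(sla_percent, days_in_month)
    return planned_minutes <= budget - already_used_minutes


if __name__ == "__main__":
    # A hypothetical 99.9% SLA leaves roughly 43 minutes in a 30-day month.
    print(round(monthly_downtime_budget_minutes(99.9), 1))
    # A 30-minute cutover fits, unless 20 minutes were already lost to incidents.
    print(fits_in_budget(30, 99.9))
    print(fits_in_budget(30, 99.9, already_used_minutes=20))
```

Under these assumptions, a 30-minute cutover is affordable once a month for a 99.9% customer group but not for a 99.99% one, which is the kind of split that points towards a hybrid plan.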
Some migration plans involve building parallel systems that run for months as one group of customers at a time is migrated to the new system. Bear in mind, however, that the longer this takes, the more it will impact the business. Companies can’t freeze development for 12 months, so teams are stuck in a cycle of building features for two systems and perpetually testing those changes. The migration cycle can seem painful and never-ending.
Ultimately, zero downtime rarely needs to be all or nothing. It is a matter of perception. If customers fail to notice any downtime or negative impact on their experience, then for them, there was no downtime.
Leadership matters
Steering the path between what is acceptable to the business and what the engineers responsible for the migration need to achieve requires good leadership. Engaging with both sides is the leader’s job, and their first task is to narrow down the many options and challenges that can throw a migration off course. In the beginning, it is more important to look at the big picture and work on solving high-level problems; the small, technical details can be tackled later. Don’t get caught up in the minutiae.
Top tips for leading migration project meetings:
– Build a minimum viable product (MVP) quickly: This will provide a baseline for feedback so that the engineering team can iterate more quickly.
– Meet in small groups: Dozens of engineers might be involved in some migrations. Making decisions is difficult in large groups and will rarely lead to full engagement and input from all participants. It’s better to work in small groups, where a leader can engage directly with everyone in the room, even if that means holding multiple meetings. Bringing these smaller groups together, especially in the early stages, allows rich insight to be gathered and engenders confidence that everyone is on board with the developing plan.
– Address problems separately: Limit meetings to 30-40 minutes and group problems into no more than four buckets; otherwise, it will be easy to get stuck in the weeds. It is more productive to hold separate planning meetings focused on those problem areas and avoid making each one monolithic.
– Propose a solution and welcome challenges: Rather than starting with a blank whiteboard, ask engineers to reason their way through a plan presented by the project leader. The aim should be to gather critical feedback. Not all of it needs to be tackled immediately, but it can be captured, considered, and discussed, leading to better ideas that can be integrated into the overall plan at the right moment. It’s about involvement: engage engineers’ creativity by proposing a solution and asking what won’t work about it.
– Don’t put everything on the table: If the most significant problems are yet to be solved, don’t bother with the small ones. It’s the biggest problems that will determine the overall shape of the project, so solve those first, and don’t waste time discussing secondary issues early on. Create a master list of challenges to revisit later, but then move on, and focus on the big issues. Deal with everything in its own time.
System migrations do not have to be stressful experiences. The best approach to managing the project starts by focusing on the big issues, gradually building a workable plan, and agreeing on a strategic migration path. This allows progress to be made and stops the migration from stagnating. IT leaders can build confidence in their team by instilling in everyone the sense that their contribution is making a difference and that they are united in a common, achievable goal. Following these steps will give projects the greatest chance of success.
Nate Daly
Senior Technical Architect at NS1