It’s not often that you hear engineers talk about the games they’ve been working on. It’s like a peek inside the machine, or under the hood. So it’s no surprise that I was pretty geeked out when I heard that two CodeCraft panels would be running at this year’s BlizzCon. As it turns out, the sessions were a trove of information on how Blizzard develops and operates its games.
I was surprised by just how much Overwatch has led the way in new techniques for the studio, being a testing ground for everything from zero-downtime patch days to a hybrid cloud infrastructure. Then there’s World of Warcraft, and what it’s like to manage a 10+ year-old codebase on a living game. Beneath it all, it was surprising to discover just how much falls under Battle.net, and that’s without considering the areas covered by the Service Support or Web & Mobile teams.
If I were going to ask for more from the presentations, it’d mainly be about the architecture. As a former software engineer turned enterprise architect, I’d love to know whether they have an enterprise domain model in place, or whether Blizzard uses a methodology like TOGAF. Putting my personal asks aside, here are some of the juicy nuggets from the two Q&A sessions held at BlizzCon 2017.
Overwatch: Shiny New Tech
The more I heard lead software engineer Bill Warnecke talk about Overwatch, the more I discovered just how much it challenged Blizzard. A key example was getting patches out with zero downtime, yet with a high degree of confidence that they’d be deployed without a hitch. The solution was blue-green deployment, which basically means having two production-ready platforms named ‘blue’ and ‘green’. While blue is supporting the live game, green is updated, tested and readied. When the patch is deployed, blue and green are swapped over, resulting in zero downtime. It means that patches can be tested on production systems with full integration, and the platforms can be swapped back quickly if any last-minute issues arise.
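The swap itself can be sketched in a few lines. This is a hypothetical illustration of the blue-green pattern, not Blizzard’s actual tooling; the `Platform` class and the `smoke_test` hook are assumptions made for the example.

```python
# Hypothetical blue-green deployment sketch (illustrative, not Blizzard code).

class Platform:
    def __init__(self, name, version):
        self.name = name
        self.version = version

class BlueGreenRouter:
    """Routes live traffic to one of two production platforms."""

    def __init__(self, blue, green):
        self.live, self.standby = blue, green

    def deploy(self, new_version, smoke_test):
        # Patch and verify the idle platform while the live one keeps serving.
        self.standby.version = new_version
        if not smoke_test(self.standby):
            raise RuntimeError(f"{self.standby.name} failed checks; live untouched")
        # The swap: the freshly patched platform becomes live with no downtime.
        self.live, self.standby = self.standby, self.live

    def rollback(self):
        # The previous platform is still warm, so reverting is just another swap.
        self.live, self.standby = self.standby, self.live

router = BlueGreenRouter(Platform("blue", "1.0"), Platform("green", "1.0"))
router.deploy("1.1", smoke_test=lambda p: True)
print(router.live.name, router.live.version)  # green now serves 1.1
```

The appeal of the pattern is visible even in a toy like this: the live platform is never modified in place, and rollback is the same cheap pointer swap as deployment.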
Blizzard’s commitment to providing a solid gaming experience was also surprising, especially considering how latency-sensitive shooters like Overwatch can be. The studio’s solution in this case was a hybrid cloud setup, getting servers as close to gamers globally as possible. Where it makes sense, Blizzard uses its own cloud infrastructure, but where it needs extra reach, it leans on Amazon Web Services and similar providers to deliver that sweet low ping. And, to ensure that game servers are all updated correctly, patches are MD5 hashed to detect corrupt file transfers.
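The checksum step is simple to illustrate. The sketch below uses Python’s standard `hashlib`; the payload is made up, and the details of Blizzard’s actual distribution pipeline weren’t shared.

```python
# Sketch of verifying a transferred patch file with an MD5 checksum.
import hashlib

def md5_of(data: bytes) -> str:
    """Hex digest of the payload, as published alongside the patch."""
    return hashlib.md5(data).hexdigest()

def verify_patch(payload: bytes, expected_md5: str) -> bool:
    # Reject the transfer if even a single byte was corrupted in transit.
    return md5_of(payload) == expected_md5

patch = b"example-patch-payload"  # hypothetical patch contents
checksum = md5_of(patch)

assert verify_patch(patch, checksum)                  # clean transfer passes
assert not verify_patch(patch + b"\x00", checksum)    # corruption is detected
```

Note that MD5 is fine for spotting accidental corruption like this, though it’s no longer considered safe against a deliberate attacker; integrity checking and security are separate concerns.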
Overwatch has also been leading the way internally when it comes to AI. Instead of the scripted state machines that World of Warcraft and Heroes of the Storm used to control their NPCs, Overwatch used a goal-driven AI that could build goals dynamically. That learning was then applied to HotS, collapsing 200 or so states down into a series of goals. It’s even being added to the new Island Expeditions feature in WoW’s upcoming expansion Battle for Azeroth, where NPCs will react to the party of players, changing tactics as players move towards the various objectives and behaving in a more believable manner.
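The contrast with a scripted state machine can be sketched as a utility-style goal picker: rather than hard-coding transitions between hundreds of states, the NPC scores candidate goals against the current world state each tick and pursues the best one. Every name below (the goals, the scoring functions, the world keys) is invented for illustration, not taken from the panel.

```python
# Illustrative goal-driven NPC sketch; all goals and scores are hypothetical.

def score_attack(world):
    return 0.9 if world["enemy_visible"] else 0.0

def score_capture(world):
    return 0.6 if not world["point_held"] else 0.1

def score_retreat(world):
    return 1.0 - world["health"]  # low health makes retreating attractive

GOALS = {"attack": score_attack, "capture": score_capture, "retreat": score_retreat}

def choose_goal(world):
    # Goals are evaluated dynamically from the world state each tick,
    # instead of following fixed scripted transitions.
    return max(GOALS, key=lambda goal: GOALS[goal](world))

print(choose_goal({"enemy_visible": True, "point_held": False, "health": 0.8}))
print(choose_goal({"enemy_visible": False, "point_held": False, "health": 0.05}))
```

A healthy NPC with an enemy in sight attacks; the same NPC at 5% health retreats, with no explicit "attack → retreat" transition ever written down.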
World of Warcraft – Expanding the Legacy
Taking a cue from Overwatch, WoW has also managed to compress patch windows down from almost half a day to a single hour. Senior software engineer Omar Gonzalez explained that the team moved over to ‘semi red/blue’ where they have two production environments. Services are switched off, platforms are switched, final tests are run, then realms are reopened.
Sometimes, though, updating a game as old as World of Warcraft with new features just isn’t simple. With a living codebase that’s older than a decade in places, it becomes an engineering skill to know when to extend or ‘build a scaffold’ on an existing system, or when it needs to be pulled out and rebuilt, particularly when new features push those systems in ways they weren’t intended to go. As a result, the team has a mantra: ‘WoW has to keep running’, which is used to weigh up adding new features versus improving the underlying services and APIs.
Even so, much of the engineering team’s work is placing hooks in the various game systems so that content creators and game designers can hook into them. An example was the Mythic+ keystone system, where affixes could be generated and stored. When the keystone is read, the dungeon is created with that affix, and each entity inside may have additional behaviour or attributes depending on it. Engineering simply set up the hooks so that designers could give meaning to the values presented. It’s a similar story with UI development, where hooks are built for the base UI team to develop on.
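As a rough sketch of that hook pattern, the engine side below stores an opaque affix identifier and exposes a registration point, while design-side code supplies what the affix actually does. All names here are hypothetical; WoW’s real implementation wasn’t shown.

```python
# Hypothetical engineering-vs-design split: the engine exposes hooks,
# designers register the meaning. Names are illustrative only.

AFFIX_BEHAVIOURS = {}  # populated from design data, not engine code

def register_affix(affix_id, apply_fn):
    # Engineering hook: designers attach behaviour to an opaque affix id.
    AFFIX_BEHAVIOURS[affix_id] = apply_fn

def spawn_entity(base_stats, keystone_affix):
    entity = dict(base_stats)
    apply_fn = AFFIX_BEHAVIOURS.get(keystone_affix)
    if apply_fn:
        apply_fn(entity)  # each entity may gain extra attributes per affix
    return entity

# A designer-defined affix: boost entity health by 20%.
register_affix("fortified", lambda e: e.update(health=int(e["health"] * 1.2)))

grunt = spawn_entity({"name": "grunt", "health": 100}, "fortified")
print(grunt["health"])  # 120
```

The engine never needs to know what ‘fortified’ means; adding a new affix is pure design data, which is exactly what makes the hook approach scale.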
Keeping all that growing code in shape is a significant task, which is why the team has also started using automated tools. All code commits go through static analysis before they get merged into any codebase, and are tested by a ‘phenomenal functional QA team’. The WoW engineering team is also ramping up its use of clang-tidy after running a pass over the entire codebase and finding many patterns that needed to be pulled out or repaired. Code Collab is also used to streamline the code review process.
Battle.net – Blizzard’s Bedrock
I was always vaguely aware that Battle.net was a large undertaking, but I didn’t appreciate just how big until technical lead Brenna Moore started listing everything it covers. There’s the client team, account management and purchasing, distribution, and game services. Then there’s all the build systems, the core technology and core systems group, the data platforms and data team, and automated testing. Almost everything from development through to build and deploy seems to be managed or touched by it.
The team also manages Blizzard’s vast infrastructure, which needed to change from a single data centre with a single internet connection, to a distributed backbone connected to multiple datacentres worldwide. It also manages that hybrid cloud infrastructure, both internally and with partnered providers, alongside peering arrangements with ISPs. Strong partner relationships have proven to be crucial, in both ensuring that players get a solid experience when playing online, and fending off those DDoS attacks that the studio is a ripe target for.
It was also interesting to hear how the Battle.net team acts as an enabler for the games, rather than pushing back. Lead engineer Andrew Murphy described a scenario where the HotS team wanted new players to get a heap of rewards the first time they logged in, which could have stressed the account and purchasing systems beyond their original limits, so the team redesigned those systems and removed bottlenecks to cope with the increase in demand.
Likewise, the data science teams came in for particular praise, whether for pulling data out of the Battle.net client to detect and respond to network or system issues as they happen, or for blending it with business intelligence to deliver a more enjoyable gaming experience. Hearthstone engineering manager Derek Dupras described it as ‘where the rubber meets the road’: using the vast amount of player data to deliver a smooth matchmaking curve that evolves with the ever-changing meta.
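As a toy example of rating-driven matchmaking, here is the classic Elo update rule. The panel didn’t describe Blizzard’s actual models, so this stands in only for the general idea of letting match outcomes continuously reshape player ratings; the numbers and the K-factor are conventional defaults, not anything Blizzard stated.

```python
# Classic Elo rating update, as a stand-in for data-driven matchmaking.

def expected_score(rating_a, rating_b):
    # Modelled probability that player A beats player B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating, expected, actual, k=32):
    # Move the rating towards the observed result, scaled by K.
    return rating + k * (actual - expected)

a, b = 1500.0, 1600.0
e = expected_score(a, b)          # underdog A has ~36% win probability
a_new = update(a, e, actual=1.0)  # A wins the upset and gains ~20 points
b_new = update(b, 1.0 - e, actual=0.0)
print(round(a_new), round(b_new))
```

The useful property for matchmaking is that upsets move ratings more than expected results do, so the system converges towards pairing players of genuinely similar skill.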
Web & Mobile and Service Support – Meeting the Players
These days, game experiences are about more than just playing the game. We carry games with us, whether it’s the WoW Armory or Legion app, the Battle.net app, and so on. Mobile software engineer Ryan Newsome explained that this gets a little tricky when developing for both iOS and Android, as the team has to make each user interface feel like it respects the smartphone platform while still being consistently Blizzard. To that end, engineers and designers analyse every single component to make sure it’s suitable, occasionally scrapping ideas that don’t work out.
On the web side, the team is also responsible for the WoW Armory, forums, and so on. It also built capabilities like the new eSports framework, as senior software engineer Andy Tran described. A shared platform was needed for bracketing, team management, and more, for events including the Overwatch World Cup and Hearthstone World Championship. To ensure that the new capability could be widely reused, automated testing was updated to auto-generate code samples and fail if the documentation wasn’t updated, ensuring a high level of documentation coverage.
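A docs-stay-fresh check of this kind can be sketched as a test that regenerates the sample for each endpoint and fails the build when the documentation no longer contains it. The endpoints, payloads and doc format below are all made up for illustration; Blizzard’s framework wasn’t shown.

```python
# Hypothetical docs-vs-API consistency check; all names are illustrative.

ENDPOINTS = {
    "create_bracket": '{"teams": 8, "seeded": true}',
    "report_match": '{"winner": "team_a"}',
}

def generate_sample(name, payload):
    """Auto-generate the documentation sample for one endpoint."""
    return f"POST /esports/{name}\n{payload}"

def stale_endpoints(doc_text):
    # Return endpoints whose auto-generated sample is missing from the docs;
    # a non-empty result would fail the build.
    return [name for name, payload in ENDPOINTS.items()
            if generate_sample(name, payload) not in doc_text]

# Docs that only cover one of the two endpoints are flagged as stale.
docs = generate_sample("create_bracket", ENDPOINTS["create_bracket"])
print(stale_endpoints(docs))  # ['report_match']
```

Tying the check to sample generation means the docs can’t silently drift: any change to an endpoint’s shape regenerates a sample the old documentation won’t contain.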
Service Support, meanwhile, covers the whole customer support experience, from handling tickets to managing complaints and penalties that drive player behaviour. Group technical architect Ryan Karg described how the department has grown, from simply supporting Battle.net purchases to several live games across multiple platforms. It’s a change that created growing pains, which were initially tackled through business process alignment. Eventually, though, the only real solution was to perform less work, which is why the team needed a single common support platform with which all games could integrate the features they need.
Even so, some of the Service Support challenges sounded eerily familiar to me, coming from a background with mobile network operators. The solutions are also similar, with Blizzard using AI to route tickets more effectively, and now looking at sharing internal support articles publicly so that players can help themselves quickly. Personally, I’m curious whether the studio is looking at more advanced AI chatbots that learn how agents handle tickets and follow similar steps, or other advanced techniques to scale support.
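As a minimal illustration of automated ticket routing, here is a keyword-overlap router. Blizzard’s actual approach wasn’t detailed in the session, and the queues and keywords below are assumptions; a production system would use a trained classifier rather than hand-picked words.

```python
# Toy keyword-based ticket router; queues and keywords are hypothetical.

ROUTES = {
    "billing": {"refund", "charge", "payment", "purchase"},
    "technical": {"crash", "disconnect", "lag", "install"},
    "account": {"password", "login", "authenticator", "hacked"},
}

def route_ticket(text):
    words = set(text.lower().split())
    # Score each queue by keyword overlap with the ticket text.
    scores = {queue: len(words & keywords) for queue, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Fall back to human triage when nothing matches.
    return best if scores[best] > 0 else "triage"

print(route_ticket("My payment failed and I want a refund"))  # billing
print(route_ticket("Where is the new cinematic?"))            # triage
```

Even this crude version shows the payoff: obvious tickets skip straight to the right queue, and only the ambiguous ones consume human triage time.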
Fostering a Blizzard Culture
Aside from the tech talk, both Q&A sessions also gave an insight into Blizzard’s unique culture. Most striking was that each game team has operational responsibility for keeping its game running, and will double down on crushing bugs and minimising downtime. Blame is avoided, with importance placed on learning from mistakes and improving processes and techniques wherever possible.
Pushing it further, many of the Battle.net teams have embedded DevOps and embedded QA in order to shorten development cycles while still deploying in a stable manner – vital when it’s the platform that underpins some of the most popular games played.
Teams work closely together, sharing knowledge and experience as much as code and features. Beyond the AI techniques mentioned earlier, HotS developed pathing and behaviour tree code that could be decoupled from core systems and reused elsewhere. While the game teams might use individual version control, those departments responsible for core systems are centralising their code repositories to share it studio-wide.
There’s also a constant desire to provide more information back to the community so that they can build their own apps and websites from it. The Game Data initiative at dev.battle.net is intended to support third-party developers, and the team behind it is constantly striving to put out more. This way, fans can contribute back to the playerbase with their own creative ingenuity. More formally, Blizzard also makes contributions to standards and open-source software, with premake, Spring and RabbitMQ getting mentions.
As for recruitment, all the engineers had clear advice: completed projects count for much more than just starting them. Learn one programming language well rather than lots of languages poorly. Consider how you’d architect software to scale horizontally, segregate data, tolerate faults and degrade gracefully. Alongside being able to demonstrate a cultural fit, having a portfolio on GitHub goes a long way towards demonstrating your abilities.