Coding in the Crease

Mitigate Risk by Optimizing the Development Workflow and Trusting Your Team

06/21/2016, 9:15am CDT

By Luke Ludwig

A look at the evolution of the Sport Ngin daily development workflow from a 5-person team to a 60-person team.

Sport Ngin has a development workflow built on trust. It enables us to continuously deliver value to our customers by coding small. Our workflow empowers developers to do their jobs quickly and efficiently. We provide the right amount of structure to reduce risk without bogging developers down with too much process. Our workflow wasn’t always this way.

In the Beginning

Back before the company was called Sport Ngin, we were Team Sport Tech. We had around 5 to 7 developers. Our software development methodology was ad hoc cowboy coding with some tests thrown in for flair. We had daily stand-ups and a Ruby on Rails code base. We were Agile.

The only people who could deploy code to our Staging environment were my boss and me. It wasn’t that we consciously decided that no one else was allowed to deploy to Staging. That’s just the way it was. No one else knew how or had access. Back then we didn’t code small, so deploys to Staging were infrequent anyway. Deployments to production happened whenever something was done, which was once or twice a week.

Our development team grew to 10 to 12 developers. The company name changed to TST Media. We switched from Subversion to Git. Deployments to production had increased to about one per day, and Staging deploys were even more frequent. It quickly became clear that I was a bottleneck in our workflow, so we changed it to allow all developers to deploy to Staging. We empowered our developers to do their work.

The Sport Ngin Pipeline is Formed

One day, one of our developers said we should use GitHub pull requests. Around the same time I went to a Code Freeze event on Continuous Delivery and listened to Jez Humble speak. The next thing we knew, we had formulated the basis of what our development workflow is today. All work would be done in GitHub pull requests. To be deployed to production, every pull request was required to pass three checkpoints.

Original Checkpoints of Sport Ngin Pipeline

* Automated test suite passes
* Code review sign-off from any one developer
* QA sign-off from any one team member

The concept of a deployment pipeline comes from Continuous Delivery and represents how code gets from our developers’ desks to our production systems.

Requiring the test suite to pass is the obvious one. We hooked the automated unit tests running on our Continuous Integration server directly into GitHub pull requests, and we simply didn’t deploy a pull request if the test suite was failing.

We require a code review sign-off on all code changes deployed to production because we care about the quality of our code base. This has helped to create a collaborative and social coding environment within our team. No one is allowed to write code and put it into production without at least one other person reviewing it, so at least two people think the code is production-worthy. Allowing this sign-off to come from any developer, as opposed to, say, a Change Approval Board or only senior developers, is all about trust. By trusting our developers we empower them to do their work. Sport Ngin developers critique each other and learn from each other on a daily basis.

We didn’t actually have a Quality Assurance team, but we still required that someone look at the code change manually on our Staging environment and sign off that the change looked good before it was deployed to production. This QA sign-off was usually done by the same developer who did the code review. What if the code change was so small that it didn’t need QA, or what if it only affected the development environment? No problem. Simply state in the pull request that no QA was necessary and then follow through with the QA sign-off. We trusted our team to make this subjective call and we absolutely did not want too much rigidity in our process. A rigid process is not built on trust.

Code Small

The company name changed to Sport Ngin. Our team grew to 20 to 30 developers and we started doing deploys to production twice daily. I began to really push our Development team to Code Small. The real benefits of Continuous Delivery are realized when developers build things in small pieces. A continuous delivery pipeline is the easy part. Learning to actually code a multi-month feature in small pieces that are deployed daily is the hard part.

A Fluid Workflow

We formed a Product team for the first time. We needed people dedicated to driving the direction of our products and understanding our customers’ needs. As the Product team formed, we decided that major feature changes required a Release sign-off from Product. This change to our workflow totally failed! It wasn’t used very much, and when it was used it did not go smoothly. Our Product team members were not very interested in viewing code changes on GitHub. Sign-offs would be done by a developer saying they got verbal sign-off from a Product Owner. Once we realized this Release sign-off was not working, we removed it. Since we roll out major feature changes through feature toggles, this Release sign-off really wasn’t needed anyway. Having a fluid workflow that your team can change as needed is critical to optimizing your development team’s efficiency.
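To make the feature toggle point concrete, here is a minimal sketch of why a toggle can stand in for a release sign-off: the code ships to production dark, and turning the feature on is a separate decision from deploying it. The toggle name, the `is_enabled` helper and the page function are hypothetical and only illustrate the idea; they are not Sport Ngin's actual implementation.

```python
# Hypothetical toggle store; a real system would likely read this from
# configuration or a database rather than a module-level dict.
ENABLED_FEATURES = {"new_registration_flow": False}


def is_enabled(feature, toggles=ENABLED_FEATURES):
    """Return True only if the named feature is switched on."""
    return toggles.get(feature, False)


def registration_page(toggles=ENABLED_FEATURES):
    """Serve the new code path only when its toggle is on."""
    if is_enabled("new_registration_flow", toggles):
        return "new registration flow"
    return "existing registration flow"


# The new flow is deployed but stays hidden until the toggle is flipped.
print(registration_page())
print(registration_page({"new_registration_flow": True}))
```

Under this assumption, small pull requests for a large feature can be deployed daily behind the toggle, and the release decision becomes flipping one value.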

We started following Scrum with consistent two-week sprints and all of the typical Scrum meetings and ceremonies that come along with it. Now we were Agile!

Quality Assurance Team Forms

We eventually added a small Quality Assurance team. They weren’t large enough to do all of our QA sign-offs, so we decided to change absolutely nothing about our workflow. We simply had more horsepower dedicated to reviewing changes on Staging and signing off on QA. This helped to free up our developers to focus more on coding. Our Quality Assurance team members joined our development teams directly instead of working as a separate Quality Assurance department.

With the addition of the Quality Assurance team, we soon decided that it was not enough to simply require a QA sign-off. We also wanted a documented list of what was verified. It sounds obvious, but often we would check several things, sign off on QA, and not list the items that were verified. Now we require a QA Plan to be briefly documented directly in the pull request. This simple step allows anyone to review the code change and reason about its risk level.

Managing Risk Within a Complex System

Making changes to a complex system without causing downtime or new bugs is all about managing risk. One day we deployed a code change that broke a critical piece of the Sport Ngin platform. It was the kind of failure that literally stops the flow of money coming into our bank. We quickly reverted the change and deployed again to fix the problem. Upon review, it was immediately apparent to our senior developers that the code change was a risky one, yet the QA plan was sparse. The QA plan did not match the level of risk inherent in the code change.

All it takes is one critical failure like this to have our entire process questioned. We made two key additions to our workflow following this critical failure. Instead of changing everything, the new additions were designed to empower our team to mitigate risk while continuing to put the emphasis on trusting our developers.

First, we built directly into our workflow a point in time where the risk level of the code change is evaluated. We did this by asking that all pull requests be labeled as either Low Risk, Medium Risk or High Risk. The original developer initially sets the risk level, but anyone can raise it. The QA plan is expected to match the risk. Low Risk changes usually need only a minute or two of QA, whereas High Risk changes are expected to have a detailed and comprehensive QA plan, which can take anywhere from hours to a few days to complete.

The second change we made as a result of the critical failure was to add a new checkpoint, the High Risk sign-off. If a pull request is labeled as High Risk, then we require a High Risk sign-off from one of our senior developers. The senior developer is expected to compare the QA plan to the risk level and determine whether anything more can be done to reduce the risk.
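The labeling rule can be sketched in a few lines. This is only an illustration under stated assumptions: the `Risk` enum and both function names are hypothetical, and the sketch assumes that a reviewer's label can raise the author's risk level but never lower it, which is one reading of "anyone can raise it".

```python
from enum import IntEnum


class Risk(IntEnum):
    """Ordered risk labels, so that levels can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def raise_risk(current: Risk, proposed: Risk) -> Risk:
    """Apply a reviewer's proposed label; the level only ever goes up."""
    return max(current, proposed)


def needs_high_risk_signoff(risk: Risk) -> bool:
    """Only High Risk pull requests need the senior developer checkpoint."""
    return risk is Risk.HIGH


# The author labels the change Low Risk; a reviewer disagrees.
risk = raise_risk(Risk.LOW, Risk.HIGH)
print(risk.name, needs_high_risk_signoff(risk))
```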

Since we Code Small, most of our code changes are low risk. In the last year we’ve deployed 5,164 pull requests: 86% were low risk, 12% medium risk and 2% high risk. By requiring the extra senior-level sign-off only on High Risk code changes, we are able to focus our extra efforts on the 2% of code changes that are the most risky. We could have required all code changes to be signed off by a senior developer, but this would have significantly slowed our workflow and reduced our efficiency. Instead we trust our developers to identify High Risk code changes.
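To put those percentages in absolute terms, the short calculation below converts them into approximate pull request counts. The percentages in the text are rounded, so the counts are estimates rather than exact figures; the variable and function names are only illustrative.

```python
TOTAL_PULL_REQUESTS = 5164
# Rounded shares as reported in the text.
SHARES = {"low": 0.86, "medium": 0.12, "high": 0.02}


def approximate_counts(total, shares):
    """Estimate how many pull requests fall under each risk label."""
    return {label: round(total * share) for label, share in shares.items()}


counts = approximate_counts(TOTAL_PULL_REQUESTS, SHARES)
print(counts)  # {'low': 4441, 'medium': 620, 'high': 103}
```

In other words, the senior-level sign-off applied to roughly 100 pull requests over the year instead of more than 5,000.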

The latest change we’ve made is intended to increase our focus on our users’ experience. In similar fashion to the High Risk sign-off, our developers can label a pull request as ux-required, at which point a member of our User Experience team is required to review and sign off prior to deploy.

The Current State of Sport Ngin Development

Current Checkpoints of the Sport Ngin Pipeline

* Automated test suite passes
* Code review sign-off from any one developer
* QA sign-off from any one team member with presence of QA plan
* Pull request must be labeled as Low Risk, Medium Risk or High Risk
* If labeled as High Risk, then High Risk sign-off required from Senior Developer
* If labeled as ux-required, then UX sign-off required from User Experience Developer
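
The checkpoints above can be read as a simple gate on each pull request. The following is a minimal sketch of that gate, not the actual Sport Ngin or octopolo implementation: the `PullRequest` fields, the label strings and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed label strings; the real label names may differ.
RISK_LABELS = {"low risk", "medium risk", "high risk"}


@dataclass
class PullRequest:
    """Hypothetical record of the sign-offs collected on a pull request."""
    tests_passed: bool = False
    code_review_signoff: bool = False
    qa_signoff: bool = False
    qa_plan: str = ""
    labels: set = field(default_factory=set)
    high_risk_signoff: bool = False
    ux_signoff: bool = False


def missing_checkpoints(pr):
    """Return the checkpoints a pull request has not yet passed."""
    missing = []
    if not pr.tests_passed:
        missing.append("automated test suite")
    if not pr.code_review_signoff:
        missing.append("code review sign-off")
    if not (pr.qa_signoff and pr.qa_plan.strip()):
        missing.append("QA sign-off with QA plan")
    if len(pr.labels & RISK_LABELS) != 1:
        missing.append("exactly one risk label")
    if "high risk" in pr.labels and not pr.high_risk_signoff:
        missing.append("High Risk sign-off")
    if "ux-required" in pr.labels and not pr.ux_signoff:
        missing.append("UX sign-off")
    return missing


def is_deployable(pr):
    """A pull request may be deployed once no checkpoint is missing."""
    return not missing_checkpoints(pr)


pr = PullRequest(
    tests_passed=True,
    code_review_signoff=True,
    qa_signoff=True,
    qa_plan="Verified the change manually on Staging",
    labels={"high risk"},
)
print(missing_checkpoints(pr))  # ['High Risk sign-off']
```

Returning the list of missing checkpoints, rather than a bare yes or no, keeps the gate informative: a developer can see exactly which sign-off is still outstanding.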

Today we have around 40 developers writing code and we deploy an average of 25 code changes a day to production. Our development workflow is designed to reduce risk and encourage coding small while maximizing our developers’ efficiency. This means making already simple tasks, like creating a new git branch, even simpler. Merging code into the staging branch and getting it deployed should be dead simple. Creating a pull request from a common template that reflects our custom development workflow is a must. We accomplish all of these things using our open source octopolo toolset.

At Sport Ngin we recognize that optimizing our development team’s daily workflow and deployment pipeline is critical to delivering maximum value to our customers. We customize our workflow and pipeline to suit our changing needs. As we change our workflow we maintain trust as a foundational principle. Trust and empower developers and good things happen.

Tag(s): Home DevOps Continuous Delivery Agile