An Elegant Puzzle, Part 1 with Will Larson
Will Larson is the author of my new favorite engineering management book, An Elegant Puzzle: Systems of Engineering Management. Will has been an engineering leader and software engineer at technology companies of many shapes and sizes including Digg, Uber, and Stripe. He currently leads foundation engineering at Stripe, where he is responsible for the infrastructure and platform organization. If you haven’t heard of Stripe, they’re a huge player in the online payment space, building the economic infrastructure for the internet. They’ve raised over $700M to date and show no signs of slowing their growth any time soon.
In this episode, we talk about the importance of increasing your offer acceptance rate, how Stripe defines developer productivity, and why sometimes, the best policy is to throw away your policies.
Listen to the Full Episode Here:
It's much easier for everyone involved, and much more equitable for everyone involved, to actually have a system that you follow. But having the courage to follow the system when it's hard is a real thing, and I think that's why it's so hard in some situations to actually commit to this approach.
Welcome to Scaling Software Teams, a weekly podcast to help software leaders navigate fast growth without losing the magic that made that growth possible. I'm your host, Wes Winham.
Today, we're joined by Will Larson. Will is the author of my new favorite engineering management book, "An Elegant Puzzle: Systems of Engineering Management". Will has been an engineering leader and software engineer at technology companies of many shapes and sizes, including Digg, Uber, and Stripe. He currently leads Foundation Engineering at Stripe, where he is responsible for the infrastructure and platform organizations. If you haven't heard of Stripe, they're a huge player in the online payment space. They're building the economic infrastructure of the internet. They've raised over $700 million to date, and they show no signs of slowing their growth any time soon.
This week is part one of our conversation, where we dive deep into some of the principles I love from his book. Next week, we're going to be doing a Q&A where we dive into specific stories submitted by listeners on Twitter and via email. If you want to ask one of our guests about a situation you're working through at your organization, send me an email at firstname.lastname@example.org. And now, here's part one of my conversation with Will Larson.
Will Larson, thank you so much for joining us today on Scaling Software Teams.
Thank you for having me. Really, really excited to get to chat about this.
You just released a book called "An Elegant Puzzle: Systems of Engineering Management". At a high level, what is your book about and who is it for?
Well, I think anyone who has $22 is really the target audience for the book, as you might imagine.
That's a good audience.
Hopefully a large audience or, you know, $10 if you want the digital. But really, it's a book for, I think, folks who are engineering leaders and I think most applicable to managers, but I think anyone who's leading engineering teams or organizations is going to get a lot of value from it.
And what I've noticed is that there are so many different decisions you have to make when leading an organization or a team, and we often don't have the right mental models. These are things you come up with over years yourself, that you kind of reverse-engineer. But what if we actually wrote down some of these mental models and made it possible for people to learn them through our experience, and not just by kind of messing up for the first five years of their tech lead or management career?
Yeah, that five years, that's about right for me. I was messing up the hardest in those first five years. So one of the mental models... maybe it's a meta-mental model that I love most from your book is thinking in systems. What is systems thinking and how does it help us make better decisions as leaders?
So my introduction to systems thinking actually came from my father, who was a professor of economics. Once, when I was about 17, he took me to a two-day training course to work with STELLA, which is this system dynamics software. If you actually want to go try to buy it right now, it's like $3,000 for a single seat, so it's kind of out of my price range and probably out of most people's, but if you're in a university system, you can get a free or nearly free education license. I've aged out of that, unfortunately. But I went there and got to just work with this software and start modeling some systems, and that, for me, led to reading Thinking in Systems: A Primer by Donella Meadows, and a lot more exposure over time.
And really, systems thinking is this idea that often when we have a goal, we're like, "Okay, we want..." Say we're having 10 incidents a quarter and we want to have two incidents a quarter. We'll say something like, "We'll reduce incidents by eight through better reliability tooling." Systems thinking doesn't let you say that. It doesn't let you state the outcome. Instead, it only lets you change the inputs to a model of how you think the system works. So instead of saying, "We'll reduce the number of incidents we have," you might say, "We'll reduce the number of deploys we do," and that might lead, through your system design, to reducing incidents. Or, "We'll increase the number or percentage of canary deploys we do," or, "We'll increase the use of our pre-production checklist."
It forces you to actually figure out the inputs that will lead to the outcomes, not just pick the outcomes. It's very easy to pick the outcomes you want; it's much harder to actually know if your plan is a good one, and working that way forces you to find out.
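Will's incident example can be sketched as a tiny input-driven model. Every number here (the base incident rate, the share of bad deploys each mitigation catches) is an illustrative assumption, not real data:

```python
# A minimal sketch of the incident system described above, with
# incidents expressed purely as a function of inputs we can change.

def incidents_per_quarter(deploys, canary_fraction, checklist_fraction,
                          base_incident_rate=0.01):
    """Model quarterly incidents from the inputs, not the outcome.

    base_incident_rate: assumed chance a plain deploy causes an incident.
    Canary deploys and pre-production checklists are assumed to catch a
    fixed share of would-be incidents before they reach users.
    """
    caught_by_canary = 0.7 * canary_fraction
    caught_by_checklist = 0.4 * checklist_fraction
    catch_rate = min(0.95, caught_by_canary + caught_by_checklist)
    return deploys * base_incident_rate * (1 - catch_rate)

# Today: 1000 deploys, no canaries, no checklist -> about 10 incidents.
print(incidents_per_quarter(1000, 0.0, 0.0))
# Move the inputs: 80% canaried, half using the checklist -> far fewer.
print(incidents_per_quarter(1000, 0.8, 0.5))
```

The point of the sketch is that "incidents" never appears as a knob: you can only move it by changing deploys, canary coverage, or checklist adoption.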
So I'd love to work through a quick system, so that was an incident system. A common complaint among teams that are growing really fast is they're hiring and it feels like they're actually slowing down. They're less able to get output, so the bad way is, "Let's do more deploys per developer," or some output metric. If we're using systems thinking, how might we break this problem of reduced productivity?
This is one of my favorite ones to think about. When you're in a fast-growing company, you have this intuition that we're getting slower as we hire faster, but then you go talk to, say, your CEO or your manager and you're like, "We're getting slower," and they're like, "No, we're not." And there's kind of this impasse where you're like, "But we are," "No, we're not," and it's kind of like a belief system. How do you actually go from this belief system to describing something that could convince someone who doesn't initially believe the same thing you do? The place to start is your hiring funnel, and then you build on top of it.
So, hiring funnel: you have inbound candidates, maybe from sourcing, maybe from referrals, maybe from inbound applications. Then we have some sort of interview process where we actually vet them, and then we go to offer, and then we go to hire. And there are already some interesting things we can think about. For example, the most important thing to improve is typically your actual offer accept rate. You've done so much work by the time you finally decide to extend an offer, and if people are declining you, you're wasting all of that time ahead of it, so you know initially where to start optimizing.
But the thing that's not clear in this is: how does this impact productivity? So then you start adding on. You have someone accept. They start. Then you have to train them, so your existing engineers are now spending time training those new folks. Those existing engineers are also training the new folks on how to interview candidates, and those existing engineers are also doing the interviews for new candidates. So all of a sudden, when they're interviewing, when they're training people on interviews, and when they're training people on how to work in the codebase, they're not actually building new functionality, right?
And so you can see pretty quickly, as you start drawing the feedback loops into these different activities, that this thing that was supposed to help can help if you do it at the right rate of hiring and the right rate of training. If your training program is good, for example, you can hire at a much faster rate than if it's not. But it's not necessarily true in any given timeframe that you'll get faster, and there are many ways, particularly if your training program is bad or your accept rate is bad, that you can completely soak yourself with the activity around growth without actually getting any of the business benefit.
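The feedback loop Will describes can be made concrete with a toy quarter-long model. All of the coefficients below (interview load, training hours, first-quarter ramp) are assumptions for illustration:

```python
# A toy model of the hiring/productivity feedback loop sketched above.
# Every coefficient here is an illustrative assumption.

def quarterly_output(engineers, hires, interviews_per_hire=20,
                     hours_per_interview=2, training_hours_per_hire=100,
                     hours_per_engineer=480):
    """Net productive hours from existing engineers in one quarter.

    Existing engineers lose time to interviewing and to training new
    hires; new hires are assumed to contribute nothing in their first
    quarter while they ramp up.
    """
    interview_cost = hires * interviews_per_hire * hours_per_interview
    training_cost = hires * training_hours_per_hire
    gross = engineers * hours_per_engineer
    return max(0, gross - interview_cost - training_cost)

# 20 engineers, hiring 0 vs. hiring 5 this quarter:
print(quarterly_output(20, 0))  # all hours go to building
print(quarterly_output(20, 5))  # more headcount, less output right now
```

Even this crude sketch shows the shape of the argument: headcount goes up while net output this quarter goes down, and whether it recovers later depends on the training program and the accept rate.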
What's a decent accept rate, or at least what are some factors that go into calculating what a decent accept rate is?
So depending on where your company is and its kind of public appeal, there are going to be a lot of different levels here. I remember when I was at Digg, we had just gone through two rounds of layoffs and we were trying to get people to come join us, and we're like, "Hey, we have like six to nine months of runway. We're running out of money. We just had a disastrous product launch, and we've laid off a bunch of people. Please join." It wasn't a super-compelling pitch.
And so sometimes I talk with hiring managers who've only worked at companies on the up, who've, say, only worked at Facebook or only worked at Airbnb, and they don't have a lot of empathy for the person at a company that doesn't have a particularly compelling pitch at that given moment in time. They're like, "Well, why would you ever have lower than like a 60, 65 percent accept rate?" Sometimes, just depending on the situation you and the company are in, you're going to go through some rough spots, and I think the actual brand and perception of the company influences accept rate expectations more than anything else.
But yeah, I do think about a two-thirds accept rate as something that's reasonable to aim for in the San Francisco market if you're doing everything right. Really, though, it varies market by market and tenure level by tenure level. We're certainly seeing a lot more folks who are interviewing at five or six different companies. An industry trend I've seen in this regard is just more and more folks interviewing at many different companies, picking the best, and being more deliberate about that, where I feel like I saw a little bit less intentionality about the interviewing process in the market five or six years ago.
I'm seeing more and more candidates use talent marketplaces, like Hired.com and Triplebyte, which kind of create that bidding. It's interesting that you said the offer accept rate is the most important part of your process. I don't think that would be obvious without a system. I just hear people talk about getting more candidates.
So let's say we're modeling that system about slowing down. We're focused on our offer accept rate, and now we've realized that it's our training rate. I think there's a step before that that I skipped, which is how do we define developer productivity, or how do you define developer productivity in a way that we can communicate to anyone outside of the team?
This is one of the great unsolved problems. I interviewed a candidate a year or two ago, and he talked about the P-word, which is "prioritization". Prioritization is one of the unsolved problems, and measuring "velocity" is a second one. The V-word, I guess. But it's quite difficult. The best attempt I've seen at this is "Accelerate", by Nicole Forsgren, Jez Humble, and Gene Kim, which is a phenomenal book. They talk about the approximate metrics, the signal metrics, that you can use to reflect on how your productivity is going. They don't let you measure productivity directly, but they're great ways to get indicators about it.
I don't know if you ever read "The Phoenix Project" or "The Goal". "The Phoenix Project" is kind of a modernized version of "The Goal", better, I think, on all counts. Some of the indicators they use in "Accelerate" to approximate productivity are the defect rate, which is basically the number of deploys that get reverted; the deploy rate itself, like how many deploys are going out; and the time from a problem getting entered into the ticket queue to the feature actually getting shipped, which is kind of the delay in the system, a proxy for the backlog. There might be one other, but these are some useful indicators that help you understand the health of your systems for deploying software.
But the challenge is if you take off your engineering manager hat and put on your product manager hat, none of these measure the value of any of the things we're shipping, right? And it turns out that if you ship a huge amount of stuff that doesn't help your users, maybe you're actually not very productive.
And at Stripe, you're on the infrastructure team. How do you measure internal productivity or have you been able to side-step that as even a problem?
So, a couple of different strategies we've tried. One: if we look at the business lines that we're supporting and their growth rates, for our older business lines we actually have a deep understanding of the expected growth rate as those businesses continue to evolve. So that's one bucket of high-level productivity. We're also launching more and more new business lines, and we can use the number of business lines we're able to bootstrap and launch as a second measure of productivity. This gives us a great sense of how we're supporting the overall business. A little bit abstract, but those are actually the business outcomes that we care about.
Instead of thinking about new business lines, you could think about the revenue driven by the new business lines, but the challenge is that revenue is often a lagging indicator on new business lines. So for mature business lines, thinking about revenue, or maybe margin, is right, but for the new business lines, just the ability to launch and validate them is the critical piece, not whether they actually work. We're not measuring our developer productivity on the in-market success of these new business lines, although longer term, as we do more of them, that would also be a good way for us to be measuring our productivity.
Zooming all the way in, though, we also look at the indicators we discussed. The deploy rate, for example, and we've been working to measure the human time involved in a deploy. But a lot of it is around deployment, and there's certainly a gap: we focus on deployment because the test, build, deploy loop is critical, but most of your time is actually spent designing and implementing, and if you elide those, you're missing a huge part of it that we do want to figure out better ways to measure over time.
So at a super high level, for existing business lines, it's the rate of change of revenue, so margin and revenue are your top-level infrastructure goals. For the other part of the business, launching new initiatives, it's the launch rate, the number of those launches that occur.
Yeah, exactly that. They are a little bit abstract, but I think we found them to be the correct alignment between us and the business.
What does it mean to "work the policy" instead of "working the exception", and what advantages does that give us?
Anyone who's ever worked with me can articulate that I'm a very systems and justice oriented kind of manager, which is a slightly different feel than many folks that you'll work with, but it's really important to me that we get to the same outcome independently of who's making the decision, for example. To me, that's an important signifier of quality in an organization.
A good example you could think about is when you're hiring candidates, how do you figure out how much to pay them? And so a couple of different models, right? One model is that you pay them as much as necessary for them to say yes. You're optimizing for your accept rate, so you're just going to pay them as much as you need for them to say yes, and that works to optimize the funnel for accepts, but then all of the sudden internally you have folks who come to realize, quickly or slowly, that they're paid radically differently than their peers.
And then you want to standardize, so you start having pay bands, where everyone gets paid the same amount at a given level, or within a certain range at a given level, because you've realized these exception-only systems don't work. You roll out a standard process, but then one fateful day you have the best candidate you've ever interviewed. They are going to completely remake your company if you can just hire them.
Caveat, I actually don't think there are heroes just waiting to come remake your company. This is, I think, usually a belief based in desperation rather than in reality, but you have this person. You're-
It's a great story. I want to believe in that story.
The stories help us get through the day, right? Some of the days are challenging. But there is this person we desperately want to hire, and we give them an offer, and they're like, "You know what? It's just not enough money." Now you have this challenge of what to do, because this person you've come to believe is a critical hire, but you would then have to pay them inconsistently with the other folks you've hired.
And so one approach is to say, "We're just going to give it to you, but we hope you never talk to anyone else about it, because it would be demoralizing for them, and then they'd be upset with us." The Golden Rule of management is basically, "People who do what you ask them to should be rewarded at least as well as people who don't." So you can do that and just hope they don't realize that you've broken this compact with them, or you can figure out how to go back to your offer system and fix it so that you're handling this consistently.
So one fix could be that your pay bands aren't right and you need to update them for everyone. That's something we do here at Stripe: every year, we do this level setting to make sure everyone is consistently in the right pay bands together, and I think most modern companies of a certain size, with people processes in place, do something similar to this.
But you could also find other ways to extend the policy, like maybe there is a bonus component that you're willing to give to some candidates and you're just willing to have a little bit of flexibility in it since bonuses tend to be the least impactful to your long-term compensation versus increasing your base salary or increasing your equity component or something like that. Finding some way to actually update it though, where it's not just this custom thing you did for one person, but something that can apply to everyone.
So the policy would be pay bands, whereas without one, the de facto policy might be that everyone gets what they ask for. An exception would be someone coming in who breaks our rule, or someone asking for a raise early, and we would be better off figuring out a new policy rather than making a one-off decision for that exception.
That's exactly right. You're trying to direct people's energy, and if people know that you're going to make exceptions, they'll spend a tremendous amount of energy pursuing those exceptions, and you'll end up either giving out more of them or having to explain why you're not giving out more of them. Neither of these is actually useful for anyone involved. It's much easier for everyone involved, and much more equitable for everyone involved, to actually have a system that you follow. But having the courage to follow the system when it's hard is a real thing, and I think that's why it's so hard in some situations to actually commit to this approach.
So let's say we've got a policy about who has to go to what office based on where they live, and we have exceptions piling up. We got one person that's asking for a one-off, and another person's asking for a one-off. How do you implement those exceptions, because I can imagine a world where you're just like... All you're doing is updating the policy constantly based on exceptions. How do you balance those two things?
I think there are many policies which are inert, where they're literally just useless. Like if a policy doesn't actually change people's behaviors, you should just throw it away, just get rid of it. In the scenario you're describing, that sounds like a policy that's just not even worth having any more. It's just not affecting people's behavior. The other choice, though, could be to look at all these different exceptions, all these different asks, and figure out the thing you're willing to commit to.
And I think about norms versus policies, where norms are things you want people to do but there's no actual enforcement or penalty. It's just, "This is what we prefer you to do." Sometimes taking your policies and turning them into norms, if you're not willing to enforce them, is a much easier way, because then when people don't follow them, you're not betraying your process. You're just like, "This is what we want you to do, but you're not doing it, and that's okay. This is a recommendation based on what we've found effective, not a consistent standard that we're holding people to."
So I'm imagining a world where you have written policies and you also have written norms. Are these typically written where you've seen it being more successful?
I've seen both. With smaller companies, almost nothing is written, which I think has pros and cons. I've increasingly come to believe over my career that there are two different types of folks. There are folks who come into a company and really study their surroundings; before they do anything, they watch three or four other people doing the same thing and fit in with that. And there are people who come in and don't watch. Making people in that second category successful really depends on writing things down. If you don't have things written down, you're kind of setting them up to fail. They have a different style, and if you don't have the stuff written where they can find it, they're just going to bulldoze. It's on you to make sure they have the information so that, doing their best, they don't consistently fall out of alignment with the processes around them.
That resonates super hard with me. I'm definitely in that second category, and it can get me into trouble. I don't pick up on norms quickly, so if things aren't written down, I feel like I step on toes without knowing it. I'm very sensitive to that type of person. I would guess most people are in that first category, but not everybody.
I think it's really company specific. For example, a lot of larger companies get into this mode where there's so much bureaucracy that if you intentionally try to follow the process, you actually can't get your work done. Many of those companies don't know how to fix that problem, but they do know how to create a culture of ignoring bureaucracy and just getting it done somehow. So folks in those environments get trained to ignore the policies the environments themselves create. People just get trained in different cultural values, and if you assume any sort of consistency in them, you're kind of doomed to have some weird conflict as you realize that is not, in fact, true.
It's the written norms versus the actual norms.
Thanks again to Will for sharing his experience with us. Here are a few takeaways I had from our chat.
Number one, to improve hiring efficiency, optimize the offer acceptance rate. Hiring is extremely time intensive. It takes lots of engineering effort and causes lots of distraction. Every rejected offer effectively wastes the time we spent vetting that candidate. At a 50% acceptance rate, every hire costs double the vetting time, and not only our time: the opportunity cost of that time is expensive.
Acceptance rate is absolutely the place to optimize if we're looking for efficiency, and our acceptance rate depends on a lot of factors, especially employer brand, which is a nebulous concept. In Will's experience at Digg after their v4 launch, when they had two rounds of layoffs and a flopped product release, acceptance got really hard. They saw a huge dip. If you want a benchmark, about a 66% offer accept rate is very good in the Bay Area. Your mileage may vary. For the companies I see in the Midwest, two thirds is also very good.
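The efficiency argument can be put in simple arithmetic. Assuming, hypothetically, 30 engineering hours of screening and interviewing behind each extended offer:

```python
# Back-of-the-envelope: how acceptance rate scales the cost of a hire.
# The hours figure is an assumption for illustration.

def vetting_hours_per_hire(accept_rate, hours_per_offer=30):
    """Expected engineering hours of vetting per accepted offer.

    hours_per_offer: assumed total screening and interview time behind
    one extended offer. Declined offers waste that entire investment,
    so the expected cost per hire scales with 1 / accept_rate.
    """
    if not 0 < accept_rate <= 1:
        raise ValueError("accept_rate must be in (0, 1]")
    return hours_per_offer / accept_rate

print(vetting_hours_per_hire(1.0))   # every offer lands
print(vetting_hours_per_hire(0.66))  # the two-thirds benchmark
print(vetting_hours_per_hire(0.5))   # half the offers declined doubles the cost
```

That 1/accept_rate curve is why a dip from two-thirds to one-half acceptance hurts so much more than it sounds.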
Number two, how we define developer productivity could determine what type of productivity we get. Will calls measuring developer productivity one of the great unsolved problems. It all comes down to the P-word, which is "prioritization" - what do we actually build? What is valuable?
Will recommends reading "Accelerate", which is one of my other favorite books. "Accelerate" measures organizational productivity with four metrics. Number one, what's the defect rate, meaning what percentage of deploys get reverted? Number two, what's the deploy rate? How many deploys or releases are going out per unit time (maybe divided by the number of developers we have)? Number three, what's our lead time, the time from a problem getting entered into the ticket queue to a feature being shipped? And number four, what's our mean time to restore, that is, how quickly can we recover from failures?
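As a rough illustration, here's how those four indicators might be computed from simple event logs. The record shapes and all the numbers are hypothetical, not any real schema or tooling:

```python
# A sketch of computing the four "Accelerate"-style indicators
# from hypothetical deploy, ticket, and incident logs.
from datetime import datetime, timedelta

deploys = [
    {"at": datetime(2019, 7, 1), "reverted": False},
    {"at": datetime(2019, 7, 2), "reverted": True},
    {"at": datetime(2019, 7, 3), "reverted": False},
    {"at": datetime(2019, 7, 4), "reverted": False},
]
tickets = [  # when a problem was filed vs. when the fix/feature shipped
    {"filed": datetime(2019, 6, 20), "shipped": datetime(2019, 7, 2)},
    {"filed": datetime(2019, 6, 28), "shipped": datetime(2019, 7, 4)},
]
incidents = [  # outage start vs. recovery
    {"start": datetime(2019, 7, 2, 9, 0), "resolved": datetime(2019, 7, 2, 11, 0)},
]

# 1. Defect rate: share of deploys that had to be reverted.
defect_rate = sum(d["reverted"] for d in deploys) / len(deploys)
# 2. Deploy rate: deploys per day over the observed window.
window_days = (deploys[-1]["at"] - deploys[0]["at"]).days + 1
deploy_rate = len(deploys) / window_days
# 3. Lead time: average days from ticket filed to shipped.
lead_time = sum((t["shipped"] - t["filed"]).days for t in tickets) / len(tickets)
# 4. Mean time to restore: average incident duration.
mttr = sum((i["resolved"] - i["start"] for i in incidents), timedelta()) / len(incidents)

print(defect_rate, deploy_rate, lead_time, mttr)  # 0.25 1.0 9.0 2:00:00
```

None of these require fancy tooling, which is part of their appeal: most teams already have the deploy and ticket timestamps needed to start tracking them.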
Now, the interesting thing is that none of these actually measure the value users get, which is ultimately why we exist as engineers. If we ship a lot of things that don't have value, even if we ship them efficiently with a low defect rate, we're probably still not being productive. "Accelerate" makes the case that in real-world organizations, those four metrics really do predict value. They predict organizational success because we mostly hold prioritization constant, but prioritization is really important.
In Will's organization, the infrastructure team, they focus on the growth rate of existing businesses, so the rate of change of revenue, and they also focus on the number of new initiatives that are able to ship to production. This is the launch rate of new business lines within Stripe. Those are measures of value.
Number three, we may need to throw away some of our policies to be more successful. Policies are best when they're written down and enforced universally, when we capture exceptions but otherwise enforce the policy. Will recommends working the policy, not the exceptions. The idea is that we record exception requests without granting them, then batch them up every month or every few weeks and ask, "Is this still the right policy, given the pull we're seeing to make exceptions?"
On the other side, if we find ourselves routinely making exceptions and we feel like that's the right choice, maybe the policy is not useful anymore. When we have policies we don't actually follow, that dilutes the power of the other policies we have. It causes people to commit time to try to get an exception. This causes politics, unfairness, lots of organizational behavior that we do not want. It's better to throw out a policy that we're constantly making exceptions to than to keep it and water down our entire policy regime.
Remember, without a policy we create reverse incentives. The squeaky wheel gets the grease, and we don't want to create an organization of squeaky wheels. We want people to be rewarded for the behavior we would like to see copied throughout the rest of our organization. We also lose our chance at designing a fair system when everything is run by exceptions. That's why Will says, "Work the policy, not the exception," and why we might need to throw away some of our policies.
Thanks again to Will for coming on the show. Now, I want to hear from you. Please leave us a review on iTunes, Google Play, or wherever you get your podcasts. This will help others find our community. Until next week, keep on scaling!
Scaling Software Teams is brought to you by Woven. We started Woven because I'm passionate about helping engineering leaders build better teams, and Will Larson’s book, "An Elegant Puzzle: Systems of Engineering Management" would be a huge asset for any engineering leader. That's why we're giving away free copies to anyone who signs up to learn more about Woven.
Go to WovenTeams.com/book and schedule a meeting with me to talk about your hiring plans, and we'll send you a free copy of the book. Click here to get started now!