Top positive review
Chesterton's Fence, Beyoncé Rule and 1 billion line Change List - How Google does it at scale
Reviewed in the United States on June 2, 2020
Google has open sourced “how” it does everything. This book is about “how Google runs code”. Systems engineers should pair it with “Site Reliability Engineering” (also from Google) on “how Google runs infra”. Senior org leaders should at minimum read the first part (Culture), paired with “Work Rules!” (Laszlo Bock on “how Google leads”). Security engineers should buy the very recent “Building Secure and Reliable Systems” on “how Google implements defense”. Executives should read “How Google Works” by Eric Schmidt on, well, “how Google works”!
First, it is admirable how the company wants to spread its hard-earned wisdom. Thank you! Many places do many things exceptionally well, but the earnestness to coach and educate others is an extreme rarity among the “Big Techs” (a version of Survivorship Bias). Collectively, these five books, the “Top N” Google Tech Talks (check YouTube) and, depending on your role, Google's coding best practices (check their site) amount to a top-notch technical and business education in themselves.
I very rarely give a 5-star rating to a technical book, but SEaG was completely worth the two weeks I spent immersed in the culture, processes and tools of Google's software teams. The writers are deeply knowledgeable, passionate about their craft and offer insight after insight on topics ranging from human psychology to the flaws of over-embracing mocks in testing.
Rather than targeting algorithms, languages, tools or libraries, it focuses on “Software Engineering” as a system, i.e., the stuff that is not taught at any school. The book distinguishes “programming” from “engineering” and walks through the hardest engineering challenge: how to scale.
It also offers a treasure trove of insights and ideas, from how Google changed a billion lines of code at one time, to how its automated tests could successfully process 50,000 change requests a day, to how the Google production system is perhaps one of the best machines humans have ever engineered. Even if you are working nowhere near Google scale, the book covers a lot of fundamentals, especially how to develop software sustainably and make objective trade-offs while doing so. I would very highly recommend SEaG to any engineering leader trying to improve her team's game. For developers, the four extensive chapters on testing alone should be worth the price of admission.
Notes from “Culture”, the first of the book's three parts (most of these should be organization-agnostic):
Software Engineering is “programming integrated over time”. Viewed in reverse, code thus becomes the derivative of Software Engineering.
A project is sustainable if, for the expected lifespan of the software, you are capable of reacting to valuable changes for either business or technical reasons.
The higher the stakes, the more imperfect the trade-off value metrics.
Your job as a leader is to aim for sustainability and to manage scaling costs for the org, the product and the development workflow.
Hyperbolic Discounting - when we start coding, the lifespan we implicitly assign to the code is often measured in hours or days. As the late-'90s joke went: WORA - Write Once, Run Away!
Hyrum’s Law: With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody. Conceptually it is similar to entropy - your system will inevitably progress toward a high degree of disorder, chaos and instability. I.e., all abstractions are leaky.
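A minimal Python sketch of how this happens, using a hypothetical API (`active_users_v1`, `active_users_v2`, `first_user_report` and the `USERS` data are invented for illustration, not taken from the book): the contract only promises which users are active, but a caller quietly comes to rely on the order in which they happen to be returned.

```python
# Hypothetical data: user name -> whether the account is active.
USERS = {"mallory": True, "alice": True, "bob": False}


def active_users_v1(users: dict[str, bool]) -> list[str]:
    """Contract: return the names of active users (no order promised)."""
    # Implementation detail: the result happens to come back sorted.
    return sorted(name for name, active in users.items() if active)


def active_users_v2(users: dict[str, bool]) -> list[str]:
    """Same contract; a refactor that drops the (never promised) sort."""
    # Python dicts preserve insertion order, so names now come back in that order.
    return [name for name, active in users.items() if active]


def first_user_report(fetch) -> str:
    """A caller that silently depends on the observable ordering."""
    return fetch(USERS)[0]


# Both versions satisfy the contract...
assert set(active_users_v1(USERS)) == set(active_users_v2(USERS))
# ...but the caller's output changes when the implementation does.
print(first_user_report(active_users_v1))  # alice
print(first_user_report(active_users_v2))  # mallory
```

Both versions honor the stated contract, yet the refactor changes what the caller reports; with enough users, even a contract-preserving change like this one will break somebody.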
Code sits on a spectrum - at one end it is hacky and clever, at the other it is clean and maintainable. The most important decision is which way the codebase leans. It is programming if ‘clever’ is a compliment, and software engineering if ‘clever’ is an accusation!
Google's SRE book talks about managing one of the most complex machines created by humankind - Google's production system. This book focuses on the organizational scale and processes that keep that machine running over time.
Scaling an org means sublinear scaling with regard to human interactions.
The Beyoncé Rule: “If you liked it, you should have put a CI test on it”. I.e., if a change breaks something that no CI test covered, the breakage is not the fault of the change's author.
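A minimal sketch of the rule in practice, assuming a hypothetical shared helper `parse_version` owned by another team and pytest-style tests run by the product team's CI (none of these names come from the book). The product team pins the one behavior it relies on, numeric rather than string ordering of release tags, so a change that breaks it fails in CI instead of in production.

```python
# Hypothetical shared infrastructure helper owned by another team.
def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a release tag such as 'v1.12.3' into (1, 12, 3)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


# Product-team tests, written in pytest style so a CI runner can collect them.
# They pin the behavior this product depends on: numeric, not string, ordering.
def test_numeric_not_lexicographic_ordering():
    tags = ["v1.9.0", "v1.12.3", "v1.10.1"]
    assert max(tags, key=parse_version) == "v1.12.3"


def test_strips_leading_v():
    assert parse_version("v2.0.1") == (2, 0, 1)
```

Running `pytest` in CI picks these up. Plain string comparison would pick "v1.9.0" as the newest tag, which is exactly the kind of behavior worth pinning with a test rather than assuming.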
An average Software Engineer produces a constant number of lines of code per unit time.
Treat whiteboard markers as precious goods. Well-functioning teams use them a lot!
60-70% of developers build locally, but Google built its own distributed build system. Ultimately, even the distributed build gets bloated - Jevons Paradox: consumption of a resource increases in response to greater efficiency in its use.
Humans are mostly a collection of intermittent bugs.
Nobody likes to be criticized, especially for things that aren’t finished.
DevOps in one sentence: Get feedback as early as possible, run tests as early as possible, think about security & production environments as early as possible - also known as “shifting left”.
Many eyes make all bugs shallow. Make reviews mandatory.
Three Pillars of Social Interaction - Humility, Respect, Trust.
Relationships always outlast projects.
You have two paths to choose from - one, learn, adapt and use the system; two, fight it steadily, as a small undeclared war, for the whole of your life.
Good Postmortem output - Summary, Timeline, Proximate Cause, Impact, Containment (Y/N), Resolved (Y/N) and Lessons Learned.
Psychological safety is the biggest thing in leading teams - take risks and learn to fail occasionally.
Software engineering is the multi-person development of multi-version programs.
Understand context before changing things - Chesterton’s Fence is a good mental model.
Google tends toward email-based workflows by default.
Being an expert and being kind are not mutually exclusive. No brilliant jerks!
Testing on the Toilet (testing tips) and Learning on the Loo (productivity tips) are single-page newsletters posted inside toilet stalls.
1-2% of Google engineers are readability reviewers. Readability is Google's per-language code review certification, and these reviewers have demonstrated the ability to consistently write clear, idiomatic and maintainable code in a given language. Code is read far more than it is written. Readability is high-cost: it trades increased short-term review latency and upfront costs for the long-term payoff of higher-quality code.
Knowledge is the most important, though intangible, capital for a software engineering organization.
At a systemic level, encourage and reward those who take time to teach and broaden their expertise beyond (a) themselves, (b) their team, and (c) their organization.
On diversity - Bias is the default. Don’t build for everyone; build with everyone. Don’t assume equity; measure equity throughout your systems.
A good reason to become tech lead or manager is to scale yourself.
First rule of management? “Above all, resist the urge to manage.” The cure for “management disease” is a liberal application of “servant leadership”: assume you’re the butler.
Traditional managers worry about how to get things done, whereas great managers worry about what things get done (and trust their team to figure out how to do it).
Being a manager - “sometimes you get to be the tooth fairy, other times you have to be the dentist”.
Managers, don’t try to be everyone’s friend.
Hiring: A players hire A players, while B players hire C players.
Engineers develop an excellent sense of skepticism and cynicism, but this is often a liability when leading a team. You would do well to be less vocally skeptical while “still letting your team know you’re aware of the intricacies and obstacles involved in your work”.
As a leader - track happiness (the best leaders are amateur psychologists); let your team know when they’re doing well; it’s easy to say “yes” to something that’s easy to undo; focus on intrinsic motivation through autonomy, mastery and purpose. Delegation is really difficult to learn, as it goes against all our instincts for efficiency and improvement.
Three “always” of leadership - Always be Deciding, Always be Leaving, Always be Scaling.
“Code Yellow” is Google’s term for “emergency hackathon to fix a critical problem”.
Good Management = 95% observation and listening + 5% making critical adjustments in just the right place.
To scale, aim to be in “uncomfortably exciting” space.
All your means of communication - email, chat, meetings - could be a Denial-of-Service attack against your time and attention. You are the “finally” clause in a long list of code blocks!
In pure reactive mode, you spend every moment of your life on urgent things, but almost none of it is important. Mapping a path through the forest is incredibly important but rarely ever urgent.
Your brain operates in natural 90-minute cycles. Take breaks!
Give yourself permission to take a mental health day.
Everyone has a goal to be “data driven”, but we often fall into the trap of “anecdata”.
Google uses the GSM (Goals/Signals/Metrics) framework to select metrics for measuring engineering productivity.
QUANTS - the five components of productivity:
Quality of code; Attention from engineers; iNtellectual complexity; Tempo and velocity (how quickly engineers can accomplish something, and how fast they can push their releases out); Satisfaction.
Quantitative metrics are useful because they give you power and scale. However, they don’t provide any context or narrative. When quantitative and qualitative metrics disagree, it is because the former do not capture the expected result!
Let go of the idea of measuring individuals and embrace measuring the aggregate. Before measuring productivity, ask whether the result is actionable. If not, it is not worth measuring.