Similar authors to follow
Manage your follows
About Betsy Beyer
Betsy is a Technical Writer for Google in NYC specializing in Site Reliability Engineering. She has previously written documentation for Google's Data Center and Hardware Operations Teams in Mountain View and across its globally-distributed data centers. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University. En route to her current career, Betsy studied International Relations and English Literature, and holds degrees from Stanford and Tulane.
Customers Also Bought Items By
The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?
In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient—lessons directly applicable to your organization.
This book is divided into four sections:
- Introduction—Learn what site reliability engineering is and why it differs from conventional IT industry practices
- Principles—Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
- Practices—Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems
- Management—Explore Google's best practices for training, communication, and meetings that your organization can use
Can a system be considered truly reliable if it isn't fundamentally secure? Or can it be considered secure if it's unreliable? Security is crucial to the design and operation of scalable systems in production, as it plays an important part in product quality, performance, and availability. In this book, experts from Google share best practices to help your organization design scalable and reliable systems that are fundamentally secure.
Two previous O’Reilly books from Google—Site Reliability Engineering and The Site Reliability Workbook—demonstrated how and why a commitment to the entire service lifecycle enables organizations to successfully build, deploy, monitor, and maintain software systems. In this latest guide, the authors offer insights into system design, implementation, and maintenance from practitioners who specialize in security and reliability. They also discuss how building and adopting their recommended best practices requires a culture that’s supportive of such change.
You’ll learn about secure and reliable systems through:
- Design strategies
- Recommendations for coding, testing, and debugging practices
- Strategies to prepare for, respond to, and recover from incidents
- Cultural best practices that help teams across your organization collaborate effectively
In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today—and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment.
This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.
- How to run reliable services in environments you don’t completely control—like cloud
- Practical applications of how to create, monitor, and run your services via Service Level Objectives
- How to convert existing ops teams to SRE—including how to dig out of operational overload
- Methods for starting SRE from either greenfield or brownfield
Nesta coletânea de dissertações e artigos, membros essenciais da equipe de SRE (Site Reliability Engineering – Engenharia de Confiabilidade) do Google explicam como e por que seu comprometimento com todo o ciclo de vida tem permitido que a empresa desenvolva, implante, monitore e mantenha alguns dos maiores sistemas de software do mundo com sucesso. Você conhecerá os princípios e as práticas que possibilitam aos engenheiros do Google deixar os sistemas mais escaláveis, confiáveis e eficientes – lições que podem ser diretamente aplicáveis à sua empresa.
Este livro está dividido em quatro partes:
•Introdução – Saiba o que é SRE e por que ela difere das práticas convencionais do mercado de TI.
•Princípios – Analise os padrões, os comportamentos e as áreas de preocupação que influenciam o trabalho de um SRE (Site Reliability Engineer – Engenheiro de Confiabilidade).
•Práticas – Entenda a teoria e a prática do trabalho cotidiano de um SRE: desenvolver e operar sistemas computacionais distribuídos de grande porte.
•Gerenciamento – Explore as melhores práticas do Google para treinamento, comunicação e reuniões, que poderão ser usadas pela sua empresa.