Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines 1st Edition

4.4 out of 5 stars 118 ratings
ISBN-13: 978-1492079392
ISBN-10: 1492079391
List Price: $69.99

From the Publisher

Overview of the Chapters

Chapter 1 provides an overview of the broad and deep Amazon AI and ML stack, an enormously powerful and diverse set of services, open source libraries, and infrastructure to use for data science projects of any complexity and scale.

Chapter 2 describes how to apply the Amazon AI and ML stack to real-world use cases for recommendations, computer vision, fraud detection, natural language understanding (NLU), conversational devices, cognitive search, customer support, industrial predictive maintenance, home automation, Internet of Things (IoT), healthcare, and quantum computing.

Chapter 3 demonstrates how to use AutoML to implement a specific subset of these use cases with SageMaker Autopilot.

Chapters 4–9 dive deep into the complete model development life cycle (MDLC) for a BERT-based NLP use case, including data ingestion and analysis, feature selection and engineering, model training and tuning, and model deployment with Amazon SageMaker, Amazon Athena, Amazon Redshift, Amazon EMR, TensorFlow, PyTorch, and serverless Apache Spark.

Chapter 10 ties everything together into repeatable pipelines using MLOps with SageMaker Pipelines, Kubeflow Pipelines, Apache Airflow, MLflow, and TFX.

Chapter 11 demonstrates real-time ML, anomaly detection, and streaming analytics on real-time data streams with Amazon Kinesis and Apache Kafka.

Chapter 12 presents a comprehensive set of security best practices for data science projects and workflows, including IAM, authentication, authorization, network isolation, data encryption at rest, post-quantum network encryption in transit, governance, and auditability.

Throughout the book, we provide tips to reduce cost and improve performance for data science projects on AWS.

Editorial Reviews

Review

"Wow--this book will help you to bring your data science projects from idea all the way
to production. Chris and Antje have covered all of the important concepts and the
key AWS services, with plenty of real-world examples to get you started
on your data science journey."
--Jeff Barr,
Vice President & Chief Evangelist,
Amazon Web Services

"It's very rare to find a book that comprehensively covers the full end-to-end process of
model development and deployment! If you're an ML practitioner, this book is a must!"
--Ramine Tinati,
Managing Director/Chief Data Scientist Applied Intelligence,
Accenture

"This book is a great resource for building scalable machine learning solutions on AWS
cloud. It includes best practices for all aspects of model building, including training,
deployment, security, interpretability, and MLOps."
--Geeta Chauhan,
AI/PyTorch Partner Engineering Head,
Facebook AI

"The landscape of tools on AWS for data scientists and engineers can be absolutely
overwhelming. Chris and Antje have done the community a service by providing a map
that practitioners can use to orient themselves, find the tools they need to get the
job done and build new systems that bring their ideas to life."
--Josh Wills,
Author, Advanced Analytics with Spark (O'Reilly)

"Successful data science teams know that data science isn't just modeling but needs a
disciplined approach to data and production deployment. We have an army of tools for all
of these at our disposal in major clouds like AWS. Practitioners will appreciate this
comprehensive, practical field guide that demonstrates not just how to apply
the tools but which ones to use and when."
--Sean Owen,
Principal Solutions Architect,
Databricks

From the Author

With this practical book, AI and machine learning (ML) practitioners will learn how
to successfully build and deploy data science projects on Amazon Web Services
(AWS). The Amazon AI and ML stack unifies data science, data engineering, and
application development to help level up your skills. This guide shows you how to
build and run pipelines in the cloud, then integrate the results into applications in
minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth
demonstrate how to reduce cost and improve performance.
* Apply the Amazon AI and ML stack to real-world use cases for natural language
processing, computer vision, fraud detection, conversational devices, and more.
* Use automated ML (AutoML) to implement a specific subset of use cases with
Amazon SageMaker Autopilot.
* Dive deep into the complete model development life cycle for a BERT-based natural
language processing (NLP) use case including data ingestion and analysis,
and more.
* Tie everything together into a repeatable ML operations (MLOps) pipeline.
* Explore real-time ML, anomaly detection, and streaming analytics on real-time
data streams with Amazon Kinesis and Amazon Managed Streaming for Apache
Kafka (Amazon MSK).
* Learn security best practices for data science projects and workflows, including
AWS Identity and Access Management (IAM), authentication, authorization, and
more.

Who Should Read This Book
This book is for anyone who uses data to make critical business decisions. The guidance
here will help data analysts, data scientists, data engineers, ML engineers,
research scientists, application developers, and DevOps engineers broaden their
understanding of the modern data science stack and level up their skills in the cloud.
The Amazon AI and ML stack unifies data science, data engineering, and application
development to help users level up their skills beyond their current roles. We show
how to build and run pipelines in the cloud, then integrate the results into applications
in minutes instead of days.

Ideally, and to get the most out of this book, we suggest readers have the following
knowledge:
* Basic understanding of cloud computing
* Basic programming skills with Python, R, Java/Scala, or SQL
* Basic familiarity with data science tools such as Jupyter Notebook, pandas,
NumPy, or scikit-learn

Product details

  • Publisher: O'Reilly Media; 1st edition (April 27, 2021)
  • Language: English
  • Paperback: 524 pages
  • ISBN-10: 1492079391
  • ISBN-13: 978-1492079392
  • Item Weight: 1.82 pounds
  • Dimensions: 7 x 1.05 x 9.19 inches
  • Customer Reviews: 4.4 out of 5 stars (118 ratings)

Customer reviews

Top reviews from other countries

octavian
2.0 out of 5 stars Poor printing quality for the price
Reviewed in the United Kingdom on February 2, 2022
Just received the book and the contents are printed in black and white. For the price, I expected the same printing quality as the Hands On Machine Learning with Keras and Tensorflow book, also by O'Reilly, which has glossy pages and colour printing. Not to mention that most of the figures/graphs are not visible because the image quality is very bad, and the black and white printing makes it difficult to read charts properly. Disappointed...

Content-wise, it is a well-written and very concise book, hence 5 stars out of respect for the authors, who had no control over the printing quality of the book.

EDIT 22.03.2022

After going through a good part of the book, I have to say that it has become almost impossible to follow. Code snippets are hard or impossible to follow even if you refer to the full notebooks/code files on GitHub. They do not match, and even when they do, it is very difficult to relate the code snippets in the book to the code on GitHub. Also, using BERT embeddings to demonstrate data science on AWS is absolutely weird. It can be very confusing for most people starting their journey in data science on AWS or data science in general. Could have used easier or more conventional datasets to demonstrate how the presented concepts work in practice. All in all, I do not recommend this book at all. Apart from the first 4 chapters, there is nothing else you can do with it. Use the tutorials on YouTube instead.
Adebiyi Abdurrahman
5.0 out of 5 stars Worth it
Reviewed in the United Kingdom on June 2, 2021
Veeresh Shringari
5.0 out of 5 stars Amazing book
Reviewed in the United Kingdom on June 13, 2021
Very informative book for data science on AWS
Michal Szarkowicz
5.0 out of 5 stars Excellent book for those pursuing AWS certification
Reviewed in the United Kingdom on June 26, 2021
龍王
5.0 out of 5 stars good
Reviewed in Japan on October 14, 2021