Practical Statistics for Data Scientists: 50 Essential Concepts 1st Edition
From the Publisher
About this Book
Data science is a fusion of multiple disciplines, including statistics, computer science, information technology and domain-specific fields. As a result, several different terms could be used to reference a given concept. Key terms and their synonyms will be highlighted throughout the book in a sidebar within the text.
This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics. Both of us came to the world of data science from the world of statistics, and have some appreciation of the contribution that statistics can make to the art of data science. At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a discipline is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia worthy of an ocean liner.
Two goals underlie this book:
- To lay out, in digestible, navigable and easily referenced form, key concepts from statistics that are relevant to data science.
- To explain which concepts are important and useful from a data science perspective, which are less so, and why.
About the Author
Peter Bruce founded and grew the Institute for Statistics Education at Statistics.com, which now offers about 100 courses in statistics, roughly a third of which are aimed at the data scientist. In recruiting top authors as instructors and forging a marketing strategy to reach professional data scientists, Peter has developed both a broad view of the target market, and his own expertise to reach it.
Andrew Bruce has over 30 years of experience in statistics and data science in academia, government and business. He has a Ph.D. in statistics from the University of Washington and has published numerous papers in refereed journals. He has developed statistical-based solutions to a wide range of problems faced by a variety of industries, from established financial firms to internet startups, and offers a deep understanding of the practice of data science.
* Decent review of core concepts
* Good coverage of importance of distinguishing between sample and population statistics
* Better discussion of bootstrapping than I've seen anywhere else
* Good ideas on dealing with non-normal data and avoiding the assumption that all data is normally distributed
* Assumes that you know R. Lots of code, no explanations of the code.
* Inconsistent level of detail and depth. Detailed coverage of mean, range, quartile, but rampant hand-waving when you get to bagging and boosting
* Many of the math explanations are unclear or incomplete. The authors make you do a lot of work to figure things out and you will need external resources
* The last part of the book is a thin and purely practical survey of ML models. You don't get much understanding of how or why things work.
I found this book a very engaging read: it sets itself apart from other books on statistics by clearly telling which concepts are not so relevant for the modern computerized exploratory analysis toolset. Many concepts that are presented in classic books on the subject are rooted in the 1920s and 1930s, when computing power wasn't available and researchers resorted to various pre-calculated distributions and formulas to do their work. A modern data scientist's approach would eschew some of the old ways and instead rely on randomization, resampling and computing power.
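To make the resampling idea concrete, here is a minimal sketch of a percentile bootstrap interval for a mean, written in Python rather than the book's R. It is not taken from the book: the function name `bootstrap_ci`, its parameters, and the sample values are all made up for illustration, and the percentile method is only one of several ways to form a bootstrap interval.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper, illustration only)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same interval
    n = len(data)
    # Draw n values with replacement, recompute the statistic, repeat many times.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Read the interval bounds off the sorted resampled statistics.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements; any small numeric sample would do.
sample = [12.1, 9.8, 11.4, 10.3, 15.7, 9.9, 10.8, 13.2, 11.0, 10.5]
low, high = bootstrap_ci(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

No distribution table or closed-form standard error is involved; the spread of the resampled means stands in for them, which is the trade of formulas for computing power described above.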
This book not only tells what something is, but also why it is that way and if a concept is still relevant today.
I can recommend this book if your statistics knowledge is spotty or ephemeral. It serves its purpose well and doesn't bog down the reader with (sometimes) unnecessary mathematical concepts to demonstrate an idea.
Why the four stars:
1. Lack of examples in programming languages.
2. Complete lack of exercises (at least 1-2 exercises are necessary).
3. All scarce examples that are available are in R. No Python. :(
However, with that said...
The concepts discussed in this book are surface level at best. You end up learning more from Google as you try to grasp a better understanding of what concept is being talked about. An intuitive understanding will not be learned as math examples are replaced by steps and R scripts. Also, one big caveat to the R in this book:
*** The R scripts in this book are not complete compared to the R scripts that you get from the GitHub page! ***
- You will end up debugging a lot of the scripts to make the examples work.
- Not all of the data you get from the GitHub page matches what you see in the graphs of the book. It ranges from small errors in percentage points, to entire directions being the opposite.
- Good luck when you get to Chapter 5. Some of the code is incomplete or contains a typo.
The only positive I can comment on is the resources cited, though I'd avoid purchasing anything by the Bruces at this point.
Save yourself the $18.
It is true that the textbook does not provide in-depth coverage for all topics, but I don't think that was the intent of the authors. However, the text DOES provide an excellent introduction to topics relevant to students and data scientists. After reading the text and working through the examples, you will be equipped to further your knowledge in whichever topic you require for your data analysis task.
Top international reviews
In short: the book is nothing more than a glossary that touches on the methods and hardly compares them with one another. Google will inform you far better.
Above all, this book is aimed at data scientists and people who have already worked with R. Anyone who has done that does not need this book!
Writing style: dry, repetitive, with many forward and backward references.
So, no recommendation!
The code snippets are in the 'R' language and unlike Python, R can be much more verbose and difficult to grasp. Fortunately in my case, I was a little familiar with R already but I have to say this also makes it a little hard to recommend if someone doesn't have time to learn a whole new programming language.
Regarding the quality of the book, the paper is decent at best and might get spoiled easily if you are fond of using a highlighter. Graphs are also usually not very useful because not only is the book grayscale but the images themselves are quite pixelated.
Overall, definitely not a must-have book, but I can't say it is useless either; at least it wasn't for me. For more specific areas of 'Data Science', better resources definitely exist.
I definitely recommend this book as a great read; however if you have access to Safari, you really don’t need this version of the book.
The only thing I think it lacks is exercises at the end of each section, which would be helpful. There were a few typos (minor only) in the book that need correction.