- File Size: 17058 KB
- Print Length: 318 pages
- Simultaneous Device Usage: Unlimited
- Publisher: O'Reilly Media; 1 edition (May 10, 2017)
- Publication Date: May 10, 2017
- Language: English
- ASIN: B071NVDFD6
- Text-to-Speech: Enabled
- Word Wise: Not Enabled
- Lending: Not Enabled
- Amazon Best Sellers Rank: #388,031 Paid in Kindle Store
Practical Statistics for Data Scientists: 50 Essential Concepts 1st Edition, Kindle Edition
About the Author
Peter Bruce founded and grew the Institute for Statistics Education at Statistics.com, which now offers about 100 courses in statistics, roughly a third of which are aimed at the data scientist. In recruiting top authors as instructors and forging a marketing strategy to reach professional data scientists, Peter has developed both a broad view of the target market, and his own expertise to reach it.
Andrew Bruce has over 30 years of experience in statistics and data science in academia, government, and business. He has a Ph.D. in statistics from the University of Washington and has published numerous papers in refereed journals. He has developed statistics-based solutions to a wide range of problems faced by a variety of industries, from established financial firms to internet startups, and offers a deep understanding of the practice of data science.
* Decent review of core concepts
* Good coverage of importance of distinguishing between sample and population statistics
* Better discussion of bootstrapping than I've seen anywhere else
* Good ideas on dealing with non-normal data and avoiding the assumption that all data is normally distributed
* Assumes that you know R. Lots of code, no explanations of the code.
* Inconsistent level of detail and depth. Detailed coverage of mean, range, quartile, but rampant hand-waving when you get to bagging and boosting
* Many of the math explanations are unclear or incomplete. The authors make you do a lot of work to figure things out and you will need external resources
* The last part of the book is a thin and purely practical survey of ML models. You don't get much understanding of how or why things work.
I found this book a very engaging read: it sets itself apart from other books on statistics by clearly stating which concepts are not so relevant to the modern, computerized exploratory analysis toolset. Many concepts presented in classic books on the subject are rooted in the 1920s and 1930s, when computing power wasn't available and researchers resorted to various pre-calculated distributions and formulas to do their work. A modern data scientist's approach would eschew some of the old ways and instead rely on randomization, resampling, and computing power.
This book not only tells what something is, but also why it is that way and if a concept is still relevant today.
I can recommend this book if your statistics knowledge is spotty or ephemeral. It serves its purpose well and doesn't bog the reader down with (sometimes) unnecessary mathematical concepts to demonstrate an idea.
Why the four stars:
1. Lack of examples in programming languages.
2. Complete lack of exercises (at least 1-2 exercises are necessary).
3. All scarce examples that are available are in R. No Python. :(
However, with that said...
The concepts discussed in this book are surface level at best. You end up learning more from Google as you try to get a better grasp of the concept being discussed. You will not develop an intuitive understanding, because math examples are replaced by steps and R scripts. Also, one big caveat about the R in this book:
*** The R scripts in this book are not complete compared to the R scripts that you get from the GitHub page! ***
- You will end up debugging a lot of the scripts to make the examples work.
- Not all of the data you get from the GitHub page matches what you see in the graphs of the book. The discrepancies range from small errors in percentage points to trends pointing in entirely the opposite direction.
- Good luck when you get to Chapter 5. Some of the code is incomplete or contains a typo.
The only positive I can comment on is the resources cited, though I'd avoid purchasing anything by the Bruces at this point.
Save yourself the $18.
It is true that the textbook does not provide in-depth coverage of all topics, but I don't think that was the intent of the authors. However, the text DOES provide an excellent introduction to topics relevant to students and data scientists. After reading the text and working through the examples, you will be equipped to further your knowledge in whichever topic you require for your data analysis task.
Top international reviews
In short: the book is no more than a glossary that touches on the methods and hardly compares them with one another at all. You will be considerably better informed by Google.
Above all, this book is aimed at data scientists and people who have already worked with R. Anyone who has already done so does not need this book!
Writing style: dry, repetitive, and full of forward and backward references.
Hence, no recommendation!
The code snippets are in the 'R' language and unlike Python, R can be much more verbose and difficult to grasp. Fortunately in my case, I was a little familiar with R already but I have to say this also makes it a little hard to recommend if someone doesn't have time to learn a whole new programming language.
Regarding the quality of the book, the paper is decent at best and might get spoiled easily if you are fond of using a highlighter. Graphs are also usually not very useful because not only is the book grayscale but the images themselves are quite pixelated.
Overall, definitely not a must-have book, but I can't say it is useless either; at least it wasn't for me. For more specific areas of 'Data Science', better resources definitely exist.
I definitely recommend this book as a great read; however if you have access to Safari, you really don’t need this version of the book.
The only thing I think it lacks is exercises at the end of each section; some would have been helpful. There were a few typos (minor only) in the book that need correction.