Original Translation | April Reading: 10 Free Machine Learning and Data Science Ebooks

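Reprint notice

This article is original content from 燈塔大數據. Individuals are welcome to share it on WeChat Moments; other organizations that reprint it should include the following note at the top of the article:

A year's plan starts in spring. If you don't read in spring, which season are you counting on? Skip the books in April, and in May you won't be able to join your peers' discussions, in June you won't understand the industry news, and in July your boss will send you to the finance office to collect your final pay and leave. The self-cultivation of a big data professional: never be without a book in hand.

We have prepared 10 free machine learning and data science ebooks for you. So, are you going to read them or not?!

The list is ordered by depth: from basic statistics to machine learning foundations, then on to more advanced monographs; from discussions of specific topics to analyses of the whole field. It includes both data science classics and recently published titles, and we hope you find something of interest.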

1.《統計思維:程序員的概率與統計學》

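Think Stats: Probability and Statistics for Programmers

By Allen B. Downey

Think Stats is a book on probability and statistics written for Python programmers.

Think Stats emphasizes simple techniques for exploring real datasets and answering interesting questions. The book presents a case study using data from the U.S. National Institutes of Health. The author encourages readers to learn by doing, through a project with real datasets.

Note: Reply "4.1" in the official account to download this book.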

2. 《黑客的概率編程與貝葉斯方法》

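Probabilistic Programming & Bayesian Methods for Hackers

By Cam Davidson-Pilon

An introduction to Bayesian methods and probabilistic programming from a computation-and-understanding-first, mathematics-second point of view.

Bayesian methods are a natural approach to inference, yet most texts hide them behind chapters of slow mathematical analysis. A typical text on Bayesian inference spends two to three chapters on probability theory before explaining what Bayesian inference is.

Because the mathematics behind most Bayesian models is too difficult for general readers, many books show only simple, idealized examples when introducing them. This can leave readers with a misleading picture of Bayesian models.

In fact, the author wrote this book precisely to avoid that outcome.

Note: Reply "4.2" in the official account to download this book.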

3. 《深入理解機器學習:從原理到演算法》

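Understanding Machine Learning: From Theory to Algorithms

By Shai Shalev-Shwartz and Shai Ben-David

This book is a machine learning textbook from Cambridge University Press. Machine learning is one of the fastest growing areas of computer science, with far-reaching applications.

The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. It gives a theoretical account of the fundamentals of machine learning and of the mathematical derivations that turn these principles into practical algorithms.

Beyond the basics, the book covers a range of central topics not addressed by previous textbooks: the computational complexity of learning; the concepts of convexity and stability; important algorithmic paradigms, including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds.

Note: Reply "4.3" in the official account to download this book.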

4. 《統計學習基礎:數據挖掘、統計與預測》

The Elements of Statistical Learning

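By Trevor Hastie, Robert Tibshirani and Jerome Friedman

This book describes the important ideas of the data science field within a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics.

Many examples are given, with liberal use of color graphics. It is a valuable, must-read resource for anyone interested in data science or in data mining in industry.

The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. Topics include neural networks, support vector machines, classification trees, and boosting, which it was the first book to treat comprehensively.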

5. 《統計學習導論:基於R應用》

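An Introduction to Statistical Learning with Applications in R

By Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

This book is an introduction to statistical learning methods, aimed at upper-level undergraduates, master's students, and Ph.D. students in the non-mathematical sciences. It also includes a number of R labs with detailed explanations of how to implement the various methods in real-life settings, which makes it a useful resource for practicing data scientists.

Note: Reply "4.5" in the official account to download this book.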

6. 《數據學基礎》

Foundations of Data Science

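By Avrim Blum, John Hopcroft, and Ravindran Kannan

While traditional areas of computer science remain highly important, researchers will increasingly use computers to understand and extract usable information from the massive data arising in applications, rather than only to solve specific problems.

With this in mind, the book covers the theory likely to be useful over the next 40 years, just as automata theory, algorithms, and related topics played a major role over the past 40 years.

Note: Reply "4.6" in the official account to download this book.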

7. 《寫給程序員的數據挖掘實踐指南》

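A Programmer's Guide to Data Mining: The Ancient Art of the Numerati

By Ron Zacharski

This guide follows a step-by-step, learn-by-doing approach. The author encourages readers to practice and experiment with the Python code provided as they read.

The author hopes readers will actively try out and program data mining techniques. The book offers hands-on guidance, and finishing it lays a solid foundation for understanding data mining techniques in greater depth.

Note: Reply "4.7" in the official account to download this book.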

8. 《大數據:互聯網大規模數據挖掘與分散式處理》

Mining of Massive Datasets

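By Jure Leskovec, Anand Rajaraman and Jeff Ullman

This book is based on the Stanford computer science course CS246: Mining Massive Datasets.

Like the Stanford course, the book is designed at the undergraduate computer science level with no formal prerequisites. To support deeper exploration, most chapters end with further-reading references.

Note: Reply "4.8" in the official account to download this book.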

9. 《深度學習》

Deep Learning

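By Ian Goodfellow, Yoshua Bengio and Aaron Courville

The Deep Learning textbook is intended to help students and practitioners enter the field of machine learning in general and deep learning in particular.

Note: Reply "4.9" in the official account to download this book.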

10. 《嚮往的機器學習》

Machine Learning Yearning

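By Andrew Ng (吳恩達)

AI, machine learning, and deep learning are transforming numerous industries. But before building a machine learning system, you must make several practical decisions:

Should you collect more training data?

Should you use end-to-end deep learning?

How do you deal with your training set not matching your test set?

And so on...

Historically, the only way to learn how to answer these questions and make sound decisions was to go back to school for a graduate program or to apprentice under senior colleagues at a company. To change that, the author wrote this book to help readers get better at building AI systems.

Note: Reply "4.10" in the official account to download this book.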

Original English text

10 Free Must-Read Books for Machine Learning and Data Science

Spring. Rejuvenation. Rebirth. Everything's blooming. And, of course, people want free ebooks. With that in mind, here's a list of 10 free machine learning and data science titles to get your spring reading started right.

What better way to enjoy this spring weather than with some free machine learning and data science ebooks? Right? Right?

Here is a quick collection of such books to start your fair weather study off on the right foot.

The list begins with a base of statistics, moves on to machine learning foundations, progresses to a few bigger picture titles, has a quick look at an advanced topic or 2, and ends off with something that brings it all together.

A mix of classic and contemporary titles, hopefully you find something new (to you) and of interest here.

1. Think Stats: Probability and Statistics for Programmers

By Allen B. Downey

Think Stats is an introduction to Probability and Statistics for Python programmers.

Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real datasets.

2. Probabilistic Programming & Bayesian Methods for Hackers

By Cam Davidson-Pilon

An intro to Bayesian methods and probabilistic programming from a computation or understanding-first, mathematics-second point of view.

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is.

Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.

3. Understanding Machine Learning: From Theory to Algorithms

By Shai Shalev-Shwartz and Shai Ben-David

Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way.

The book provides a theoretical account of the fundamentals underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics, the book covers a wide array of central topics unaddressed by previous textbooks.

These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds.

4. The Elements of Statistical Learning

By Trevor Hastie, Robert Tibshirani and Jerome Friedman

This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics.

Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning.

The many topics include neural networks, support vector machines, classification trees and boosting--the first comprehensive treatment of this topic in any book.

5. An Introduction to Statistical Learning with Applications in R

By Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences.

The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.

6. Foundations of Data Science

By Avrim Blum, John Hopcroft, and Ravindran Kannan

While traditional areas of computer science remain highly important, increasingly researchers of the future will be involved with using computers to understand and extract usable information from massive data arising in applications, not just how to make computers useful on specific well-defined problems.

With this in mind we have written this book to cover the theory likely to be useful in the next 40 years, just as an understanding of automata theory, algorithms, and related topics gave students an advantage in the last 40 years.

7. A Programmer's Guide to Data Mining: The Ancient Art of the Numerati

By Ron Zacharski

This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide.

I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques.

8. Mining of Massive Datasets

By Jure Leskovec, Anand Rajaraman and Jeff Ullman

The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).

The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites. To support deeper explorations, most of the chapters are supplemented with further reading references.

9. Deep Learning

By Ian Goodfellow, Yoshua Bengio and Aaron Courville

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free.

10. Machine Learning Yearning

By Andrew Ng

AI, Machine Learning and Deep Learning are transforming numerous industries. But building a machine learning system requires that you make practical decisions:

Should you collect more training data?

Should you use end-to-end deep learning?

How do you deal with your training set not matching your test set?

and many more.

Historically, the only way to learn how to make these "strategy" decisions has been a multi-year apprenticeship in a graduate program or company. I am writing a book to help you quickly gain this skill, so that you can become better at building AI systems.

To download the full list of recommended books, reply "四月計劃" in the official account!

Translation: 燈塔大數據

This article was provided by 一點資訊.