- Machine Learning, Tom Mitchell: a very readable introductory ML book. Surprisingly relevant today even after 20-30 years. Starts simple with discrete data, finite hypothesis classes, inductive bias, and an intuitive exposure to no free lunch. Works through decision trees, neural networks (perhaps the most dated aspect), basic statistics, and a high-level exposure to PAC-learning. Scores major points as an intro text for broad scope, conversational tone, and readability. [Amazon] [Free]
- Pattern Recognition and Machine Learning, Christopher Bishop: probably not the textbook to chug through when you're new to the field (you will have to survive 100-200 pages before reaching your first practical model, and it is not hands-on), but perhaps the best second textbook for someone who already knows enough to be dangerous and wants to develop a more rigorous understanding. [Amazon] [Free]
- A Mathematical Theory of Communication, Claude Shannon: the foundational paper of information theory, and it still reads crystal clear.
- Probabilistic Reasoning in Intelligent Systems, Judea Pearl: the classic text on probabilistic graphical models (PGMs). Great for getting a feel for where the ideas came from and for seeing how they were argued for in a pre-probabilistic AI community. Be careful, though: a lot has changed since 1988.
- Mathematical Monk's explanation of the forward-backward algorithm for HMMs
- Understanding Machine Learning, Shai Shalev-Shwartz and Shai Ben-David: a great textbook introducing the key ideas of PAC-learning and VC-dimension.
- Reinforcement Learning: An Introduction, Sutton and Barto: the definitive introductory text in reinforcement learning. Easy to read, with few prerequisites. Long on intuition and short on maths, a perfect first exposure to the topic. [Amazon] [Free (Draft of 2nd Ed.)]
- David Silver's UCL RL Course Lectures: starts at the very beginning. What is reinforcement learning? How does it fit with other related disciplines? Walks through the basics of Markov processes (MPs), Markov reward processes (MRPs), and Markov decision processes (MDPs). Introduces the basics of policy evaluation and improvement in the tabular case: value iteration, policy iteration, TD learning, MC rollouts, Q-learning, SARSA. Then goes into function approximation and the familiar modern tools of DQNs, policy gradient, actor-critic, etc. Wonderfully delivered.
Within each topic, sub-bullet by type of resource, or flag each entry with its type: (i) books, (ii) lecture notes, (iii) videos, (iv) code, (v) blog posts, (vi) foundational/historical papers.