How to self-study Computer Science and Data Science

The Lord of self-studying: The Fellowship of studying

cs
learning
data
Author

Cuong Pham

Published

August 15, 2019

Today, a friend asked me how to study computer science well, a question many other people wonder about too. So I decided to write an all-in-one note explaining in detail how to learn computer science, and data science as well. I’m far from an expert in computer science, and this note only shares some of my own experience.

Before you start

Knowing where you are is important. You need to understand how much CS knowledge you already have. This journey is long and exhausting, but I can assure you it’ll be fun as well if you really like this stuff.

Prepare some things:

  • A notebook and a pen to note down and visualize important things while you learn, such as how HTTP requests are sent and received, how the CPU and RAM work, and so on. Visualization always helps you understand things faster and more easily.

Here are some examples of things I drew when learning about CPU and RAM:

CPU and RAM notes

More visualization notes
  • A laptop/computer running a Linux/OSX operating system. Yup, you can use Windows if you like, but I prefer Linux for learning programming.

  • A lot of time

  • And last but not least, a persistent, unbreakable mind

This note has 3 parts:

  1. Learning Maths
  2. Learning Computer Science
  3. Learning Data Science

Maths is the foundation that supports everything else, so I put it in its own part.

Note

All 3 parts contain only the materials that I personally like and find most useful.

Important

Courses in bold are must-takes, unless you are already familiar with what they teach (in which case you don’t have to retake them). The other courses are useful and good to learn.

Learning Path

For those who want to learn programming to apply it in non-CS areas (like business, finance, health care, …), this is the learning path that I think will help you learn effectively:

  1. Read 2 starter books
  2. Do maths courses Math_01, Math_02, Math_03, or make sure you understand all the concepts in each course
  3. Do core courses in Computer Science: 1, 2, 3, 8 (for course 8, you only need Lectures 1 to 8)
  4. Learn Python: Py_2, Py_5 (do all projects)
  5. Learn Data Structures and Algorithms: Al_01, Al_02, Al_05
  6. Do real projects

Mathematics

The courses below should be taken in order.

No. | Course | What you can expect to learn
Math_01 | Single Variable Calculus | Limits, Derivatives, Chain Rule, Approximation, Mean Value Theorem, Riemann Sums, The Fundamental Theorem of Calculus, Integrals and Applications, L’Hospital’s Rule, Infinite Series, Taylor’s Series
Math_02 | Multivariable Calculus | Vectors and operations on vectors, Determinants and Matrices (Basics), Partial Derivatives, Second Derivative Test, Total Differentials and Chain Rule, Gradient, Lagrange Multipliers, Double Integrals, Vector Fields, Curl, Green’s Theorem, Triple Integrals, Flux, Stokes’ Theorem
Math_03 | Math 54 Linear Algebra and Differential Equations | Vector Spaces and Linear Transformations, Subspaces, Kernels, Range, Eigenvalues and Eigenvectors, The Gram-Schmidt Process, Least-Squares Problems, Linear Systems of Differential Equations, Fourier Series
Math_04 | Math 55 - Discrete Mathematics | Propositional logic and equivalences, Sets and set operations, Functions, Sequences, Cardinality, Solving congruences, Mathematical induction, Recursion, Counting, Permutations and combinations, Graphs and graph models, Graph isomorphism

Computer Science

Before studying courses

You need to read a few books to understand the big picture of computer science. Read these books in order:

1. Code: The Hidden Language of Computer Hardware and Software

A great book that gives you a first sense of computer science: what binary is, how a machine understands something like binary, and more. I like how the author explains the physical mechanisms of computers, so if you are not so good at physics, this book is even better for you. It is only about 400 pages long and easy to read, so I highly recommend it as your first read when starting to learn CS.
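To make the idea of binary a bit more concrete, here is a minimal Python sketch. It is my own illustration, not something taken from the book, and the function names `to_binary` and `from_binary` are made up for this example. It converts a non-negative integer to binary digits by repeatedly dividing by 2, then rebuilds the integer from those digits.

```python
def to_binary(n: int) -> str:
    """Return the binary digits of a non-negative integer as a string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next lowest bit
        n //= 2                    # drop that bit and continue
    return "".join(reversed(digits))  # bits were collected lowest first


def from_binary(bits: str) -> int:
    """Rebuild the integer: each step doubles the value and adds the next bit."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value


if __name__ == "__main__":
    print(to_binary(10))
    print(from_binary("1010"))
```

Python already ships `bin()` and `int(text, 2)` for this; writing the loop by hand is only meant to show what happens underneath.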

2. Structure and Interpretation of Computer Programs (SICP)

A legendary book. SICP is the kind of book that gets recommended everywhere in the CS world. It covers many aspects of programming, from functional programming and abstraction to instruction processing and compilers. It might be quite hard for complete beginners, and because the book uses the Scheme programming language, it can be pretty hard to read as well.

Core Courses

Tip

There are many things this path does not cover, even after you complete the CS61 series. If you want to discover other courses on security, networks, or machine learning, look for them on the Internet or search in this awesome guide.

1. CS50’s Introduction to Computer Science - Harvard

Link: edX Course

Arguably the best intro CS course I’ve ever taken.

What you can learn:

  • Basic understanding of computer science and programming
  • Basic algorithms and problem-solving thinking
  • Concepts like abstraction, data structures, encapsulation, web development
  • Languages: Python, C, SQL, JavaScript

2. CS 61A Structure and Interpretation of Computer Programs - Berkeley

Link: Course Website

A truly great course that makes me extremely eager to learn everything from Berkeley. However, to really enjoy it and get useful knowledge out of it, you need to put in a large amount of effort: commit to doing all the homework and assignments and to taking all the exams. It might take you a few months to get everything done.

What you can learn:

  • Python and its implementation
  • Higher-Order Functions, Recursion, Environment diagrams
  • Data Abstraction
  • Scheme, Exceptions, Iterators, Streams, SQL
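As a small taste of two of these topics, here is a hedged Python sketch of a higher-order function and of recursion. It is my own illustration, not code from the course, and the names `compose`, `factorial`, and `apply_n_times` are chosen only for this example.

```python
from typing import Callable

IntFn = Callable[[int], int]


def compose(f: IntFn, g: IntFn) -> IntFn:
    """Higher-order function: takes two functions, returns x -> f(g(x))."""
    def composed(x: int) -> int:
        return f(g(x))
    return composed


def factorial(n: int) -> int:
    """Recursion: n! is defined in terms of (n - 1)!."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:  # base case stops the recursion
        return 1
    return n * factorial(n - 1)


def apply_n_times(f: IntFn, n: int) -> IntFn:
    """Both ideas together: recursively build f applied n times."""
    if n == 0:
        return lambda x: x  # applying f zero times changes nothing
    return compose(f, apply_n_times(f, n - 1))


if __name__ == "__main__":
    add_one_after_double = compose(lambda x: x + 1, lambda x: x * 2)
    print(add_one_after_double(5))
    print(factorial(5))
    print(apply_n_times(lambda x: x * 2, 3)(1))
```

Drawing an environment diagram for `apply_n_times` by hand is a good exercise for the notebook mentioned earlier.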

3. CS 61B Data Structures - Berkeley

Videos: InfoCoBuild
Course: datastructur.es

This one is quite hard for me because I don’t like learning Java much. But I think it’s useful if you want to learn data structures.

What you can learn:

  • Basic Java
  • Data structures: Lists, Abstract Classes, Trees, BST, Graphs
  • Algorithms: Sorting and Algorithmic Bounds

4. CS 70 Discrete Mathematics and Probability Theory - Berkeley

Videos: InfoCoBuild
Course: Course Website

I’ve not done this course yet. Discrete maths is cool and probability is interesting so I’ll finish this course soon.

What you can learn:

  • Induction and Recursion
  • Graphs, Eulerian Tour
  • Modular Arithmetic
  • Polynomials, Secret Sharing, Erasure Codes
  • Probability: Counting, Sample Spaces, Events, Independence, Conditional Probability
  • Inference

5. CS 61C Great Ideas in Computer Architecture - Berkeley

Videos: InfoCoBuild
Course: Course Website

My favorite one, because you will learn a lot about how machines work under the hood. It doesn’t dig too deep, but it is quite good.

What you can learn:

  • C programming
  • Assembly Language, MIPS Intro

6. CS 162 Operating Systems and Systems Programming - Berkeley

Videos: InfoCoBuild
Course: Course Website

A very tough course that I haven’t finished yet. It goes into more detail about operating systems than CS61C.

What you can learn:

  • Processes, Fork, I/O, Files
  • Threads, Scheduling, Address Translation, Caching
  • Performance, Storage Devices, Queueing theory
  • Distributed Systems, TCP/IP

7. CS164 Programming Languages and Compilers - Berkeley

Videos: InfoCoBuild
Course: Course Website

I have not taken this course, but knowing about programming languages and compilers is surely useful for a software engineer.

What you can learn:

  • Unit Calculator
  • Scoping and desugaring
  • Coroutines and Lazy Iterators
  • Regular expressions
  • CYK and Earley Parsers

8. CS169 Software Engineering - Berkeley

Videos: InfoCoBuild
Course: Course Website

I have not taken this course; I just read the syllabus and think it’s practical and useful.

What you can learn:

  • Unit Testing
  • Git
  • UML
  • Software processes
  • MVC, Web development

Strengthen your knowledge

I consider the 8 courses above to be core courses that you should take seriously. Once you finish them, you will have a good understanding of computer science and be ready to improve yourself even more.

The first thing I want to dig into is programming languages. I’m gonna cover a few of the most popular ones, like C, C++, and Python.

Tip

To practice coding, join some popular competitive coding sites like Codeforces and TopCoder.

Programming Languages Resources

No. | Language | Resource | Link | Description | Difficulty
C_1 | C | The C Programming Language | Amazon | Most popular C book. Short and great for learning basic concepts | ★★☆☆☆
C_2 | C | Practical Programming in C | MIT OCW | MIT course with good practice problems and solutions | ★★★☆☆
C++_1 | C++ | Learn C++ | learncpp.com | Great and detailed tutorial about C++ | ★★☆☆☆
C++_2 | C++ | Accelerated C++ | Amazon | Practices and examples | ★★★☆☆
C++_3 | C++ | The C++ Programming Language | Amazon | Detailed book from the author of C++ itself | ★★★☆☆
Py_1 | Python | Official Docs | python.org | Where I started learning Python; detailed and up-to-date | ★★★☆☆
Py_2 | Python | Learn Python the Hard Way | Website | Simple intro to Python | ★☆☆☆☆
Py_3 | Python | Real Python | realpython.com | Detailed examples for practice | ★★☆☆☆
Py_4 | Python | Full Stack Python | fullstackpython.com | Learn what you need to build Python apps | ★★★★☆
Py_5 | Python | Automate the Boring Stuff | Website | Learn to build cool Python projects | ★★☆☆☆

Data Structures and Algorithms

No. | Resource | Link | What you can expect to learn
Al_01 | Introduction to Algorithms | MIT OCW | Basic algorithms from MIT
Al_02 | Design and Analysis of Algorithms | MIT OCW | Part 2: divide and conquer, dynamic programming, complexity
Al_03 | Advanced Algorithms | MIT OCW | Part 3 with advanced algorithms
Al_04 | Data Structures and Algorithms | YouTube Playlist | Richard Buckland’s course on basic data structures
Al_05 | CS 186: Introduction to Database Systems | InfoCoBuild | Database management systems, SQL

Data Science

Probability and Statistics

1. Probability and Statistics - Stanford Online

Link: Stanford Lagunita

A solid foundation in probability and statistics from Stanford.

2. Probabilistic Systems Analysis and Applied Probability - MIT

Link: MIT OCW

This course and the Stanford one are independent of each other, so you can choose just one of them. I took both and studied them at the same time. The MIT course is more engineering-focused, while the Stanford course is more statistics-focused.