These days, many people want to master machine learning. But the intimidating curriculum offered by most instructors discourages many beginners.
In this tutorial, I’ll turn the curriculum upside down and describe what I believe is the fastest and easiest way to get a solid understanding of machine learning.
The syllabus I propose is a repetitive cycle of steps:
Step 0: Immerse yourself in the field of machine learning
Step 1: Study a project similar to what you want to create
Step 2: Learn a programming language
Step 3: Explore libraries from top to bottom
Step 4: Create a project you are interested in, in a maximum of one month
Step 5: Identify the biggest gap in your knowledge and fill it
Step 6: Repeat steps 0 to 5
This is a cyclical learning plan because step 6 is a GOTO to step 0!
I should point out that my study plan may seem strange to you. But I tested it in practice when I taught machine learning to students at McGill University.
In general, I have tried many curricula, starting with the theoretically better bottom-up approach. But practice has shown that the pragmatic top-down approach discussed in this article yields the best results.
My critics often point out that people who are not starting with the basics, such as statistics or linear algebra, will have a poor grasp of machine learning and will not know what they are doing when modeling.
In theory, yes, that’s true, and that’s why I started teaching machine learning from the bottom up. But in practice, it turned out differently.
It turned out that students who learned to model at a high level were much more likely to delve into low-level stuff on their own because they saw the direct benefit it would bring to their higher-level skills.
Starting at the bottom, they wouldn’t have been able to gain that context. This is why I think most teachers lose their students.
So, with that said, let’s move on to the teaching plan itself!
Step 0: Immerse yourself in the field of machine learning
The very first part of learning anything is learning the boundaries of the field and where the piece that interests you personally sits within it.
By knowing the dimensions of the field, you’ll know you’re not missing out on anything more interesting. This allows you to concentrate better. Also, knowing what the area you are walking through looks like will make it easier for you to mentally chart a path to the goal you want.
To properly dive into the ML sphere and hone your study plan, you need to answer three questions in order:
- What can you do with ML at all?
- What do you want to do with ML?
- How will you do it?
These questions will allow you to focus on something very specific and accessible to learn and to see the big picture.
Let’s look at each of these questions in more detail.
What can you do with ML?
This is a very broad question, the answer to which will change all the time. The great thing about my curriculum is that at each iteration you will spend some time learning what is possible in this area.
This will allow you to refine your mental model of machine learning. You probably won’t have a full picture of what’s even possible here at first. But that doesn’t matter much. A rough understanding is better than nothing.
What do you want to do with ML?
This is the most important question. You can’t be good at everything, either in machine learning or in any other field. You have to be very picky about what you spend your time on.
One way to make this choice is to make a list of your interests and arrange them in descending order. Then just pick the most interesting topic and anchor it somewhere you’ll see it all the time. You’ll study it and nothing else, at least until your interest ranking changes.
Yes, keep in mind that this ranking may well change. If you were very interested in a topic, but after getting to know it more closely you no longer find it that interesting, you can pursue something else. This is what planning at the beginning is for.
If you are equally interested in more than one subject, I highly recommend devoting one cycle to just one of them. All subjects are interrelated in one way or another. Going deep into one subject will allow you to see those connections, but jumping from subject to subject will not.
If I were going to learn something new on my hundredth pass of this cycle right now, I would dive into graph neural networks and their application to supply chain management.
How are you going to do what interests you?
Now that you’ve determined what you’re interested in and where that direction belongs in the overall context, spend some time figuring out how people do it.
Most people involved in machine learning use Python and its packages. Python is a relatively easy-to-understand programming language with a thriving ecosystem. This means that people building machine learning tools are more likely to build them with Python interfaces.
Tools aren’t usually created in pure Python because that language is pretty slow. But thanks to the interface, the user doesn’t realize that what’s really in front of them is a C++ library wrapped in Python.
If you don’t understand that last part, that’s okay. Just keep in mind that Python and its libraries are the safest choice to learn.
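To make the "compiled library wrapped in Python" idea concrete, here is a minimal sketch that computes the same dot product twice: once with an explicit Python loop and once with NumPy, whose core routines run as compiled code. The array size and the helper name dot_pure_python are illustrative assumptions, and the timings will vary from machine to machine.

```python
import time

import numpy as np


def dot_pure_python(a, b):
    """Dot product with an explicit Python loop (hypothetical helper)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


n = 200_000  # illustrative size, chosen only to make the gap visible
a_list = [float(i % 7) for i in range(n)]
b_list = [float(i % 5) for i in range(n)]
a_arr = np.array(a_list)
b_arr = np.array(b_list)

# Time the interpreted loop.
start = time.perf_counter()
slow = dot_pure_python(a_list, b_list)
python_seconds = time.perf_counter() - start

# Time the same computation delegated to NumPy's compiled code.
start = time.perf_counter()
fast = float(np.dot(a_arr, b_arr))
numpy_seconds = time.perf_counter() - start

print(f"pure Python: {python_seconds:.4f} s, NumPy: {numpy_seconds:.4f} s")
print("same result:", abs(slow - fast) < 1e-6)
```

Both versions are called from Python in exactly the same way. The only difference is where the loop actually runs, which is why the user rarely notices the compiled layer.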
Tools used for machine learning
The usual set of tools for those wishing to learn machine learning is as follows:
- Python for high-level programming
- Pandas for working with datasets
- NumPy for numerical computing
- Scikit-learn for ML models (no deep learning)
- TensorFlow or PyTorch for deep learning models
- High-level deep learning libraries like Keras and fast.ai
- Git basics for working on a project
- Jupyter Notebook or Google Colab to experiment with code
Of course, there are many more tools available! Keep them in mind, but don’t chase after the newest libraries. The technologies mentioned above are good enough for most projects.
True, there are still specialized libraries that you may need to add to your stack.
Say, for studying graph neural networks and their applications in supply chain management, all of these packages are suitable. However, there are more specialized packages in the PyTorch ecosystem that would speed up my graph neural network development, for example the PyTorch Geometric library.
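To give a feel for how the core tools in the list above fit together, here is a minimal sketch that loads a small built-in dataset as a pandas DataFrame and trains a Scikit-learn model on it. The choice of the iris dataset, logistic regression, and the split parameters are assumptions made for illustration, not a recommendation for any particular project.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the data as a pandas DataFrame (as_frame=True requires pandas).
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a "target" column

# pandas: a quick look at the average measurements for each class.
print(df.groupby("target").mean())

# Split features and labels, then hold out a test set.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scikit-learn: fit a simple classifier and score it on unseen data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

A script like this runs unchanged in a Jupyter Notebook or Google Colab cell, which is a convenient way to experiment with it.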
Step 1: Study one project that is similar to what you want to create
Now that you know exactly what you want to do and have a rough idea of how you’ll do it, it’s time to refine the details.
The best way to learn how something is done is to watch a real master at work. You can think of it as an asynchronous apprenticeship.
Being able to see what can be accomplished and what the finished result might look like will give you more context for learning than any theory.
The best way to do this is to go to GitHub or Kaggle and look for publicly available projects. Browse through several of them until you find one you like.
It could be a full library, a simple analysis, or an off-the-shelf AI. Either way, find a few different projects, and then choose the one that interests you most.
Once you find the right project, take some time to familiarize yourself with its documentation, codebase structure, and code. Chances are you will get lost, especially if you are not well versed in programming. But this way you will learn a lot, and learning new things is good and enjoyable!
Take notes on recurring patterns you see, interesting pieces of code you understand, and topics you don’t understand at all. Add this project to your bookmarks and come back to it as you progress through the learning curve.
A good place to start your search is this list on GitHub. But you can just use a search on Kaggle or GitHub. Search for keywords related to your ML interests.
For my syllabus, a good simple project by Thomas Kipf will do. It’s simple enough that I can go through it and understand what’s going on in each section, learning the basics of the structure along the way.
Step 2: Learn the programming language
Now that you have a clear idea of where you need to go next and what to study to get there, it’s time to learn how to understand the code.
The code will most likely be in Python. But it could also be Julia, C++, or Java – it all depends on what you want to learn and which project you’ve bookmarked.
Whatever language it is, you should spend some time on learning the basics to understand how to write scripts.
A very good course to learn Python, enough to get you started with the language, is Scientific Computing with Python from freeCodeCamp. You can also try a very short course on Python from Kaggle.
You don’t need to understand 100% how the language works. As you go through the machine learning cycle, regularly take some time to improve your knowledge of your chosen programming language. That way the learning will become iterative.
For my syllabus, the freeCodeCamp course will do just fine.
Step 3: Study the libraries from top to bottom
I often notice that machine learning tutorials, right after covering the basics, move on to implementing algorithms from scratch.
I think this is a great project to do on your own. But I don’t think it should be the main focus of early ML learning.
The fact is that almost no one implements algorithms from scratch, except the people who create the packages that developers use. And even then, they often rely on other packages created by linear algebra specialists to do most of the low-level work.
My point is that understanding what’s going on under the hood is extremely useful, but I don’t think that should be a beginner’s goal.
At this point, I suggest learning the highest-level library for your programming language of choice that will get you the results you want. To create something working, you will only need to learn how that library works.
Of course, at this stage, you won’t know why something works or doesn’t work, but that’s not too important.
Much more important is being able to work with the tools that ML professionals use in their day-to-day work. Once you understand what a high-level library does, move on to a slightly lower-level library.
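As a minimal sketch of what "top to bottom" can look like, the example below fits the same linear model twice: first with Scikit-learn's high-level LinearRegression, then one level down by solving the least-squares problem directly with NumPy. The synthetic data and its coefficients are assumptions made up for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative): y = 3*x0 - 2*x1 + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=100)

# High level: one call hides how the fit is computed.
model = LinearRegression().fit(X, y)

# One level down: add a column of ones for the intercept and solve
# the least-squares problem explicitly.
X_design = np.column_stack([X, np.ones(len(X))])
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("scikit-learn:", model.coef_, model.intercept_)
print("NumPy lstsq: ", weights[:2], weights[2])
```

Both approaches should print nearly identical numbers. Seeing that agreement is often the moment the lower-level library stops feeling like magic.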
That said, make sure you don’t get too deep into learning the library (if you get to LAPACK by reading about Fortran, you’ve gone too far!).
For my project, the main library I need to learn is PyTorch or a higher-level wrapper around it, so the hands-on fast.ai course would be in order.
Step 4: Create one project you’re passionate about in one month at the most
Let’s move on to the stage where most of the learning happens. By this point, you should already have the minimum knowledge needed to create a project that is at least somewhat useful.
A note: if you feel fully confident about taking on a project, it means you did not go through steps 0 to 3 fast enough.
Think about what interests you, and what you’d like to create. Don’t get too carried away: you have a maximum of a month for this project.
Mark a deadline on your calendar. When working on a project, a time limit is motivating and adds just enough stress so that you can still get the job done.
The idea here is to discover major knowledge gaps when faced with challenges and experience what a true machine learning developer experiences.
Working independently, i.e., without resorting to a course or book, you’ll have to do the difficult parts of a project that you would normally skip if you followed a tutorial:
- Planning, scoping work, and tracking the progress of your ML project
- Reading online documentation of libraries
- Reading StackOverflow and GitHub threads, posts on some randomly found developer’s blog, and posts on a mysterious help forum, all to solve a single bug
- Creating a project in a suboptimal way and then improving it
- Fixing problems with overfitting, underfitting, and generalization
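The last item in the list above is easier to recognize once you have seen it in numbers. Here is a minimal sketch, on made-up noisy sine data, that fits polynomials of increasing degree and compares the error on the training set with the error on held-out validation data. The degrees, sample sizes, and noise level are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)


def make_data(n):
    """Noisy samples of a sine wave on [0, 1] (synthetic, for illustration)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((model(x) - y) ** 2))


x_train, y_train = make_data(15)   # small training set
x_val, y_val = make_data(200)      # held-out data the model never sees

results = {}
for degree in (1, 3, 10):
    # Least-squares polynomial fit on the training data only.
    model = Polynomial.fit(x_train, y_train, degree, domain=[0.0, 1.0])
    results[degree] = (mse(model, x_train, y_train), mse(model, x_val, y_val))
    train_err, val_err = results[degree]
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, validation MSE {val_err:.4f}")
```

A model that is too simple typically shows a high error on both sets (underfitting). A model that is too flexible typically shows a very low training error and a noticeably higher validation error (overfitting). The gap between the two numbers is the thing to watch.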
To pick an interesting project, try these three small exercises:
- Think carefully about what interests you now
- Look at the list of project ideas
- Pay attention to open data sets
All of this together will give you an understanding of what is even possible to create. And by combining that with your interests, you can create something truly your own.
This list on GitHub can be a great place to find inspiration when creating a mini-project. To find the right data for your project, you can use Google Dataset Search.
Don’t underestimate the importance of data!
Even if you have very good ideas, a lack of data will seriously hinder your progress.
For my purposes, I found this neat data set about a mining company’s global supply chain. My project will involve using graph neural networks to determine sales prices for an excavator, which is the central theme of this data set.
Step 5: Identify one gap in your knowledge and address it
At this point, you’ve already spent some time developing your project and are very impressed with how far you’ve come. You probably haven’t even come close to what you imagined, but you’ve already encountered countless problems along the way.
You now realize how little you know and that there are gaps in your knowledge that need to be filled.
That’s great! Make a list of all the gaps you discover and arrange them in order of perceived priority. This can be difficult for you, as everything will look equally important at this point. But learning to make informed decisions about what to study next is almost as valuable as learning itself.
Now for the weirdest part: remove everything but the most important topic from your list.
When I say “delete,” I mean exactly that. Delete everything except item #1. In the next iteration of the cycle, your current assessment of what needs to be studied will be mostly wrong. You will be missing other, more important knowledge that you don’t know about now.
Now that you only have one topic left to study, give yourself one day to one week to do so. It may seem like very little time, but you don’t need to learn the topic thoroughly, just well enough that you can use the knowledge in the next round of the cycle.
In practice, you may dive deep enough into this topic to notice how it relates to other important topics (such as probability, statistics, or even dull linear algebra).
Pay close attention to these connections, revisit related topics if you wish, and reinforce your mental model of machine learning to make it more accurate.
Step 6: Repeat steps 0 through 5
Your first run through this pipeline is likely to be so-so at best. But in a very short period, you will learn much more than you could by going “from the bottom up.”
On each new pass of the cycle, the “output” will increase rapidly. Each new round will be easier, and the big picture will become clearer.
This approach is based on lean manufacturing methodology, which I have learned to apply to my startup with great success. Doing multiple iterations is the fastest way to get there.
You might be able to go all the way through 12 times in a year, which means 12 machine learning projects and a very hands-on understanding of the field.
This will make you an attractive candidate in the job market, and you’ll also understand what you need to develop further.
So, if you want to learn machine learning in practice, you should:
- Understand what the field of machine learning is all about and mentally map it out.
- Find a cool project similar to what you would like to do, and study it.
- Learn the necessary programming language.
- Master enough libraries to do something useful.
- Create a project in a month at most.
- Identify the one biggest gap in your knowledge and fill it.
- Do it again!
I hope this article is useful to you. Good luck!