In these lessons, you will learn the most commonly used classification algorithms and what problems they typically apply to. We will cover several modeling styles including: regression, Naive Bayes, support vector machines, decision trees, random forest models, K-nearest neighbors, and multi-class models.
For each of these modeling styles, you will learn its relative strengths and weaknesses and the trade-offs between them. This chapter also introduces neural networks and their use in common classification problems, such as image classification with convolutional neural networks.
Learning Objectives
- Learn the most commonly used classification algorithms
- Learn the algorithms’ strengths and weaknesses
- Introduce neural networks and their use in classification problems
Skills you’ll gain
- Data Classification
- Decision Tree Learning
- Machine Learning
- Machine Learning Algorithms
- Machine Learning Methods
- Machine Learning Model Training
What You'll Learn
- Identify the most commonly used classification algorithms and the problems they typically apply to
- Compare the strengths, weaknesses, and trade-offs of regression, Naive Bayes, support vector machines, decision trees, random forest models, and K-nearest neighbors
- Apply multi-class models to classification problems
- Explain perceptrons and neural networks and their use in classification problems
- Use convolutional neural networks for image classification
Key Takeaways
- The course covers several modeling styles including regression, Naive Bayes, support vector machines, decision trees, random forest models, gradient boosting, K-nearest neighbors, and multi-class models.
- For each modeling style, the course examines its relative strengths and weaknesses and the trade-offs between them.
- The course introduces neural networks and their use in common classification problems such as image classification.
- Convolutional neural networks are presented in the context of image classification.
Frequently Asked Questions
What classification algorithms does this course cover?
It covers regression, Naive Bayes, support vector machines, decision trees, random forest models, gradient boosting, K-nearest neighbors, multi-class models, perceptrons and neural networks, and convolutional neural networks for image classification.
Will I learn the differences between these algorithms?
Yes. For each modeling style, you will learn its relative strengths and weaknesses and the trade-offs between them.
Does the course cover neural networks?
Yes. It introduces neural networks and their use in common classification problems, including image classification with convolutional neural networks.
What skills does this course help build?
It builds skills in data classification, decision tree learning, machine learning, machine learning algorithms, machine learning methods, and machine learning model training.
Transcript
Transcript of the free preview lesson. Remaining lessons unlock with the full course.
In this chapter, we'll get hands-on with some of the specific algorithms that you might use for classification modeling. One of the most common methods of doing classification modeling is logistic regression. Regression is one of the most common styles of machine learning used in practice, and it has a really long history. Logistic regression is based on the same theory as linear regression, but instead of predicting a specific value, it predicts the probability of belonging to a specific class. To do this, it uses the logistic, or sigmoid, function. This function goes from zero probability to 100% probability. An advantage of using logistic regression is that you get a probability that can be interpreted fairly straightforwardly. This allows you to look at one specific input variable and see how changes in the values of that input variable impact the overall probability of a data point belonging to a specific class.
So, let's see how it might work. Here I've got an example program that will run a specific model on three different synthetic data sets. The synthetic data sets are set up with specific properties so that, when we compare the results of the different models, we can determine where some of the models do really well and other models do relatively poorly. In the case of logistic regression, first I need to import the module from sklearn. Then I need to title my graph, and finally I need to pass in the actual classifier that I want to run, in this case the logistic regression. When we run that, it first shows the synthetic data sets and then it shows us the performance of the logistic regression. The first data set is like two interlocking moons, the second one is like two circles, one around the other, and the final one is essentially a blob that is approximately linearly separable. Okay, so how did it do?
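The lesson's example program is not shown in the transcript, so the following is only a minimal sketch of the same idea, assuming scikit-learn's `make_moons`, `make_circles`, and `make_classification` generators stand in for the three synthetic data sets. The sample sizes, noise levels, and random seeds are illustrative assumptions, and test accuracy is printed in place of the plots.

```python
from sklearn.datasets import make_circles, make_classification, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Three synthetic data sets similar in shape to the ones described in the
# lesson. All generator parameters here are assumptions for illustration.
datasets = {
    "moons": make_moons(n_samples=400, noise=0.2, random_state=0),
    "circles": make_circles(n_samples=400, noise=0.1, factor=0.5, random_state=0),
    "linear": make_classification(
        n_samples=400,
        n_features=2,
        n_informative=2,
        n_redundant=0,
        n_clusters_per_class=1,
        class_sep=1.5,
        random_state=0,
    ),
}

scores = {}
for name, (X, y) in datasets.items():
    # Hold out 30% of each data set to measure accuracy on unseen points.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    clf = LogisticRegression()
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name:8s} test accuracy: {scores[name]:.2f}")
```

Swapping `LogisticRegression()` for another scikit-learn classifier is the only change needed to compare a different model on the same three data sets.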
The shading in the results graphs tells us where the model is predicting one value versus another, that is, whether the data point belongs to the red class or the blue class, and the grid shows us the regions that the model associates much more strongly with either the red class or the blue class. So you can see here in the first one: remember, we have that sort of linear relationship between the two variables in the model. What that means is that it's going to overlay that probability-based sigmoid function on the grid in a linear fashion. For our examples, this doesn't really work all that well: it doesn't catch the structure within the moons, and especially not within the circles. On the final one, the linearly separable data set, you can see that it does much better. The interesting thing is that the 100% probability areas, where the model is quite certain that the data point belongs to that class, are actually quite far away. You can see that it's categorizing most of the data points really close to that 50% mark, either just above or just below.
So this should give you some intuition about how logistic regression might work on real-life data, and about the kinds of data sets that might not be modeled well by it. There are challenges to this model. Even though it's an extremely common method for modeling, you can definitely tell that it doesn't work very well for certain kinds of relationships between your input variables. In fact, it doesn't infer any kind of complex relationship among your input variables. You would need to create those relationships as new features in order to use them. The nice thing is that it's fairly easy to interpret what's going on in the model.
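The point about creating new features can be sketched on the circles data. This is an illustrative assumption rather than the lesson's own code: the helper `add_radius_feature` is hypothetical and appends the squared distance from the origin, which turns the circular structure into something a linear boundary can use. The sketch also reports how many of the plain model's predicted probabilities sit close to the 50% mark.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Concentric circles; the generator parameters are illustrative assumptions.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def add_radius_feature(X):
    """Hypothetical helper: append x1^2 + x2^2 as a third column."""
    return np.column_stack([X, (X ** 2).sum(axis=1)])


# Model on the two raw inputs versus model with the engineered feature.
raw_model = LogisticRegression().fit(X_train, y_train)
eng_model = LogisticRegression().fit(add_radius_feature(X_train), y_train)

raw_acc = raw_model.score(X_test, y_test)
eng_acc = eng_model.score(add_radius_feature(X_test), y_test)

# Share of test points whose predicted probability is within 0.1 of 50%.
raw_proba = raw_model.predict_proba(X_test)[:, 1]
near_half = float(np.mean(np.abs(raw_proba - 0.5) < 0.1))

print(f"raw features accuracy:        {raw_acc:.2f}")
print(f"with radius feature accuracy: {eng_acc:.2f}")
print(f"raw predictions near 50%:     {near_half:.0%}")
```

The model itself is unchanged between the two fits. Only the inputs differ, which is the sense in which logistic regression relies on you to supply any nonlinear relationships as features.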
If the model achieves fairly good accuracy on your data set, then interpreting it is quite easy, because you only have a few coefficients to interpret to understand how important each of those elements is. You can put them directly in the form of probabilities and think about them as probabilities, which is a really intuitive way to think about classification modeling.
Another problem with logistic regression is that it's quite sensitive to outliers. That means that your training set can really affect the modeled outcome. So be aware of outliers in your data. It's something you should always be checking for anyway, but it really matters when you're using logistic regression.
The other challenge with logistic regression is that it's built on a set of assumptions that may not be met by many of the data sources that you use. For example, having unequal variance across your different input variables is something that can massively affect the model that you build. Remember our data from card deco: we saw a lot of unequal variance in the input variables. In that case, we would want to make sure that we had input variables that would work in our model.
So now you should have a good idea of how logistic regression can be implemented using the sklearn library, and of ways that you can use some of the visualizations to interpret what the model is telling you and what is going on with the prediction. You can build your intuition on some of these synthetic data sets so that you get an idea of how the different models compare when you use them on your real challenges.
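To make the interpretation of coefficients as probabilities concrete, here is a small worked sketch using only the standard library. The intercept and coefficient values are hypothetical, chosen for illustration rather than taken from any fitted model in the lesson.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted parameters for a model with one input variable.
intercept = -1.0
coef = 0.8


def predicted_probability(x):
    """Probability of the positive class for input value x."""
    return sigmoid(intercept + coef * x)


def odds(p):
    """Convert a probability into odds (p against 1 - p)."""
    return p / (1.0 - p)


# A one-unit increase in x multiplies the odds by exp(coef).
odds_ratio = math.exp(coef)

for x in (0.0, 1.0, 2.0):
    p = predicted_probability(x)
    print(f"x = {x:.1f}  probability = {p:.3f}  odds = {odds(p):.3f}")
print(f"odds multiply by {odds_ratio:.3f} per unit increase in x")
```

Because the coefficient acts on the log-odds, its effect on the probability itself depends on where you start: the same one-unit step moves the probability most near 50% and least near 0% or 100%.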