Machine learning sounds complicated. But at its core, most machine learning tasks come down to one simple question: are you predicting a number or a category?
If you are predicting a number — like the price of a house or tomorrow's temperature — that is regression. If you are predicting a category — like whether an email is spam or not, or which animal is in a photo — that is classification.
Both are types of supervised learning, which is the most common branch of machine learning. This guide explains both from scratch, with clear everyday analogies and working Python code you can run yourself.
What Is Supervised Learning
Supervised learning means teaching a machine using examples that already have the right answer. You show the model thousands of examples — here is the input, here is the correct output — and the model learns the pattern that connects them.
Think of it like teaching a child to recognise apples. You show them 100 apples and say "this is an apple". You show them 100 oranges and say "this is not an apple". After enough examples, the child can look at a fruit they have never seen before and make a good guess.
Supervised learning works the same way. You give it labelled training data, it learns the pattern, and then it can make predictions on new data it has never seen.
Regression — Predicting a Number
Regression is used when the output you want to predict is a continuous number. The model learns to draw a line or curve through your data that best fits the relationship between your inputs and outputs.
The simplest mental model: imagine you have a scatter plot of house sizes on the x axis and house prices on the y axis. Regression draws the line that fits through all those points. When you give it a new house size it has never seen, it looks at where that size falls on the line and reads off the predicted price.
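That line-fitting idea can be sketched in a few lines of NumPy, using invented size and price numbers (`np.polyfit` with degree 1 fits a plain least-squares straight line):

```python
import numpy as np

# Invented data: house sizes in square metres and prices in thousands
sizes = np.array([50, 70, 90, 110, 130])
prices = np.array([150, 200, 260, 310, 370])

# Fit a straight line (degree-1 polynomial): price = slope * size + intercept
slope, intercept = np.polyfit(sizes, prices, 1)

# Predict the price of a 100 m² house by reading off the line
predicted = slope * 100 + intercept
print(f"Predicted price for a 100 m² house: {predicted:.1f}k")
```

Everything scikit-learn does for regression is a more powerful version of this one step: fit a line (or curve) to past data, then read predictions off it.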
Real World Examples of Regression
- Predicting the price of a house based on its size, location and number of rooms
- Predicting tomorrow's temperature based on today's weather data
- Predicting how many sales a product will make based on its price and ad spend
- Predicting a student's exam score based on hours studied
- Predicting a patient's blood pressure based on their age and weight
- Predicting the fuel efficiency of a car based on its engine size and weight
Regression in Python
Here is a working example using scikit-learn. We will predict house prices from a list of features:
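A minimal sketch; the sizes, room counts and prices below are invented for illustration:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Each row is one house: [size in m², number of rooms] (invented data)
X = [[50, 2], [70, 3], [90, 3], [110, 4], [130, 4], [60, 2], [100, 3], [120, 4]]
# Target: price in thousands
y = [150, 200, 260, 310, 370, 170, 280, 340]

# Hold back a quarter of the data to test on
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)          # learn the best-fit relationship

predictions = model.predict(X_test)  # continuous numbers, not categories
print(predictions)
```

The output is a continuous number for each test house, which is exactly what makes this regression rather than classification.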
How to Measure Regression Accuracy
For regression, you measure how far your predictions are from the actual values. Here are the three most common ways to do that:
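Those three are MAE (mean absolute error: the average size of your errors), RMSE (root mean squared error: like MAE, but it punishes big errors more) and the R² score (how much of the variation in the data the model explains, where 1.0 is a perfect fit). A quick sketch with invented actual and predicted prices:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented actual prices and model predictions, in thousands
actual = [300, 250, 400, 180]
predicted = [280, 260, 430, 170]

mae = mean_absolute_error(actual, predicted)           # average error size
rmse = np.sqrt(mean_squared_error(actual, predicted))  # punishes large errors more
r2 = r2_score(actual, predicted)                       # 1.0 = perfect fit

print(f"MAE:  {mae:.1f}")   # off by 17.5k on average
print(f"RMSE: {rmse:.1f}")
print(f"R²:   {r2:.3f}")
```

Notice that RMSE comes out larger than MAE here: the single 30k miss weighs more heavily once errors are squared.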
Classification — Predicting a Category
Classification is used when the output you want to predict is a discrete label or category. Instead of drawing a line, the model learns to draw a boundary that separates one category from another.
Think of it like sorting your email. Every incoming message gets sorted into a box: "spam" or "not spam". The model has learned from thousands of past emails what separates spam from legitimate mail, so it can put each new email into the right box.
There are two types of classification. Binary classification is where there are only two possible answers (yes or no, spam or not spam, cat or dog). Multi-class classification is where there are more than two categories (cat, dog, bird, fish).
Real World Examples of Classification
- Detecting whether an email is spam or not spam
- Deciding if a bank transaction is fraudulent or genuine
- Diagnosing whether a tumour is malignant or benign
- Recognising which handwritten digit is in an image (0 to 9)
- Predicting whether a customer will churn or stay
- Identifying which language a sentence is written in
- Classifying a photo as a cat, dog or bird
Classification in Python
Here is a working example. We will predict whether a customer will buy a product based on their age and salary:
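A minimal sketch; the ages, salaries and buy/no-buy labels are invented for illustration. Salaries are thousands of times larger than ages, so we put both features on the same scale first:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Each row is one customer: [age, salary]; labels: 1 = bought, 0 = did not buy
X = [[22, 25000], [25, 32000], [47, 85000], [52, 110000],
     [46, 70000], [29, 40000], [56, 120000], [23, 28000]]
y = [0, 0, 1, 1, 1, 0, 1, 0]

# Scale the features so salary does not drown out age
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = LogisticRegression()
model.fit(X_scaled, y)

# Predict for a new 40-year-old earning 60,000
new_customer = scaler.transform([[40, 60000]])
print(model.predict(new_customer))        # a discrete label: 0 or 1
print(model.predict_proba(new_customer))  # the probability of each class
```

Unlike the regression example, `.predict()` here returns one of a fixed set of labels, and `.predict_proba()` tells you how confident the model is in each option.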
How to Measure Classification Accuracy
For classification, the main question is how many predictions the model got right. But just counting correct answers is not always enough. Here are the four most important metrics:
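Those four are accuracy, precision, recall and the F1 score. A quick sketch with invented spam predictions, using scikit-learn's metric functions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Invented results: 1 = spam, 0 = not spam
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print(accuracy_score(actual, predicted))   # share of all predictions that were right
print(precision_score(actual, predicted))  # of the "spam" calls, how many really were spam
print(recall_score(actual, predicted))     # of the real spam, how much was caught
print(f1_score(actual, predicted))         # balance of precision and recall
```

Precision and recall matter most when one class is rare: a fraud model that predicts "genuine" every time scores high accuracy but zero recall on the fraud it was built to catch.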
Side by Side Comparison
| Feature | Regression | Classification |
|---|---|---|
| Output type | A continuous number | A discrete category or label |
| Example output | $347,000; 22.5 degrees; 89.3% | Spam / Not spam; Cat / Dog / Bird |
| Key question | How much? How many? | Which one? Yes or no? |
| Main metrics | MAE, RMSE, R² score | Accuracy, Precision, Recall, F1 |
| Simple algorithm | Linear Regression | Logistic Regression |
| Tree algorithm | Decision Tree Regressor | Decision Tree Classifier |
| Ensemble algorithm | Random Forest Regressor | Random Forest Classifier |
How to Choose Between Regression and Classification
The choice is almost always obvious once you ask one question: what kind of answer do I need?
Ask yourself: can the answer be any number on a scale, or does it have to be one of a fixed set of options?
- If the answer is a number from a continuous range (price, temperature, score, duration) → use regression
- If the answer is one of a fixed list of options (yes/no, which category, which label) → use classification
Common Algorithms for Each Type
Most algorithms in scikit-learn come in two versions — one for regression and one for classification. They follow the same workflow and API but produce different types of output:
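A quick sketch showing the two versions of the decision tree side by side, trained on invented data. Note the identical `.fit()`/`.predict()` calls:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# The same inputs for both models (invented data)
X = [[1], [2], [3], [4], [5], [6]]

# Same algorithm, same API — only the type of target (and output) differs
reg = DecisionTreeRegressor().fit(X, [1.5, 3.1, 4.4, 6.2, 7.8, 9.1])
clf = DecisionTreeClassifier().fit(X, ["low", "low", "low", "high", "high", "high"])

print(reg.predict([[3]]))  # a continuous number
print(clf.predict([[3]]))  # a category label
```

Swapping `Regressor` for `Classifier` (or vice versa) is usually the only code change needed when you move between the two problem types.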
Key Takeaways
- Supervised learning means training a model on labelled data — examples where you already know the correct answer.
- Regression predicts a continuous number. Use it when the answer could be any value on a scale, like a price, temperature or score.
- Classification predicts a discrete category or label. Use it when the answer must be one of a fixed set of options, like yes/no or cat/dog/bird.
- To choose between them, ask one question: is the output a number on a continuous range, or one of a fixed set of categories?
- Measure regression with MAE (average error), RMSE (punishes big errors) and R² score (how well the model explains the data).
- Measure classification with accuracy (overall correct), precision (of your yes calls, how many were right), recall (of actual yes cases, how many did you catch) and F1 score (balance of both).
- Most scikit-learn algorithms come in two versions ending in Regressor or Classifier. The API is identical for both. You call .fit() then .predict().
- Logistic Regression is a classifier despite its name. Do not let the word "regression" confuse you.
