How I Predicted Paris Saint-Germain Will Win the Champions League 2025 🏆


Over the past few days, I put an ambitious project to the test: using artificial intelligence to predict the winner of the UEFA Champions League 2025. The result? The model picked Paris Saint-Germain to lift the trophy.

In this article, I walk you through the project step by step, from data collection and model training to a graphical user interface for the predictions.

Why did I decide to create an AI that predicts the Champions League?

It all started with a YouTube video. Fascinated by the idea, I decided to replicate (and improve) the experiment using data from soccer matches played over the past 25 years.

Where I got the data: the dataset used

The first step was to obtain complete and reliable datasets. I downloaded two, both available on Kaggle:

🔗 Club Football Match Data 2000-2025

1. Match Dataset.

Contains detailed results of all matches played from 2000 to 2025, including goals, shots, fouls, cards, and betting odds.

2. ELO score dataset.

Includes the evolution of teams’ ELO scores over time. If you are not familiar with the ELO system, find a full explanation here:

📖 Wikipedia – World Football Elo Ratings
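To make the link between ratings and match outcomes concrete, here is a minimal sketch of the standard Elo expected-score formula, which turns the rating gap between two teams into an expected result on a 400-point logistic scale. Whether the ratings in this dataset follow exactly this convention (for example, with or without a home-advantage bonus like the one World Football Elo Ratings applies) is an assumption, and the function name and ratings below are hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of team A against team B under the standard Elo formula.

    A win counts as 1, a draw as 0.5 and a loss as 0, so the value is A's win
    probability plus half its draw probability, not a pure win probability.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, chosen only to illustrate the formula.
print(round(elo_expected_score(1900, 1900), 3))  # 0.5
print(round(elo_expected_score(2000, 1600), 3))  # 0.909
```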

Data preparation for machine learning

Cleaning and normalizing the dataset

After loading the data with Pandas, I normalized the formats and unified the team names to avoid errors related to upper/lower case or poorly formatted dates.

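import pandas as pd

def clean_match_data(match_data: pd.DataFrame) -> pd.DataFrame:
    # Unparseable dates become NaT instead of raising an error
    match_data['MatchDate'] = pd.to_datetime(match_data['MatchDate'], errors='coerce')
    # Upper-case team names so differently capitalized spellings match
    match_data['HomeTeam'] = match_data['HomeTeam'].str.upper()
    match_data['AwayTeam'] = match_data['AwayTeam'].str.upper()
    return match_data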
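Upper-casing alone leaves stray or doubled spaces in place, and it cannot merge genuine aliases of the same club. As a minimal sketch, assuming the raw files can contain such whitespace noise, a hypothetical clean_team_names() helper could also strip and collapse spaces. The toy rows below are made up for illustration.

```python
import pandas as pd


def clean_team_names(names: pd.Series) -> pd.Series:
    """Hypothetical helper: trim, collapse inner whitespace, then upper-case."""
    return (
        names.astype("string")
        .str.strip()
        .str.replace(r"\s+", " ", regex=True)
        .str.upper()
    )


# Made-up rows with whitespace noise and one unparseable date.
raw = pd.DataFrame({
    "MatchDate": ["2024-09-18", "not a date"],
    "HomeTeam": ["  Paris SG", "Man City "],
    "AwayTeam": ["Girona", "inter   milan"],
})
raw["MatchDate"] = pd.to_datetime(raw["MatchDate"], errors="coerce")
for col in ("HomeTeam", "AwayTeam"):
    raw[col] = clean_team_names(raw[col])

# Rows whose date could not be parsed are NaT and can be inspected or dropped.
print(raw["MatchDate"].isna().sum())  # 1
print(raw["AwayTeam"].tolist())  # ['GIRONA', 'INTER MILAN']
```

Aliases such as two different spellings of the same club name would still need an explicit mapping table.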

How the model figures out who won: defining the target variable

I created a get_result() function to generate the target variable for the model:

1 = home team win
-1 = away team win
0 = draw

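from typing import Optional

import pandas as pd

def get_result(row: pd.Series) -> Optional[int]:
    # Return None when the final score is missing or not numeric
    try:
        home_goals = int(row['FTHome'])
        away_goals = int(row['FTAway'])
    except (KeyError, TypeError, ValueError):
        return None

    if home_goals > away_goals:
        return 1   # home win
    elif home_goals < away_goals:
        return -1  # away win
    else:
        return 0   # draw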

The function is applied to each row and the results are saved in the new match_result column.
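Applying get_result() row by row amounts to match_data['match_result'] = match_data.apply(get_result, axis=1). On 25 years of matches a vectorised version is usually much faster. Here is a minimal sketch, assuming the same FTHome/FTAway columns; add_match_result() is a hypothetical helper that should give the same labels for valid scores, with <NA> in place of None.

```python
import numpy as np
import pandas as pd


def add_match_result(match_data: pd.DataFrame) -> pd.DataFrame:
    """Vectorised alternative to applying get_result() row by row.

    1 = home win, -1 = away win, 0 = draw; missing or non-numeric scores give <NA>.
    """
    home = pd.to_numeric(match_data["FTHome"], errors="coerce")
    away = pd.to_numeric(match_data["FTAway"], errors="coerce")
    # np.sign maps a positive goal difference to 1, a negative one to -1,
    # zero to 0, and leaves NaN as NaN.
    match_data["match_result"] = np.sign(home - away).astype("Int64")
    return match_data


df = pd.DataFrame({"FTHome": [2, 0, 1, None], "FTAway": [1, 3, 1, 2]})
print(add_match_result(df)["match_result"].tolist())
```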

Choosing the most relevant features for prediction

To predict the outcome of a match, I selected features covering team strength, recent form, bookmaker expectations and match statistics:

ELO scores (HomeElo, AwayElo)
Recent form (Form3Home, Form3Away)
Betting odds (OddHome, OddDraw, OddAway)
Match statistics: shots, shots on target, fouls, yellow/red cards

features = [
    'HomeElo', 'AwayElo',
    'Form3Home', 'Form3Away',
    'OddHome', 'OddDraw', 'OddAway',
    'HomeShots', 'AwayShots',
    'HomeTarget', 'AwayTarget',
    'HomeFouls', 'AwayFouls',
    'HomeYellow', 'AwayYellow',
    'HomeRed', 'AwayRed'
]

After selection, I removed all rows with missing values.
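A minimal sketch of this step, assuming the target sits in the match_result column: dropping missing values on the feature columns and the target together keeps X and y aligned row by row. The build_xy() helper and the toy frame are hypothetical.

```python
import pandas as pd


def build_xy(match_data: pd.DataFrame, features: list[str], target: str = "match_result"):
    """Hypothetical helper: keep the model columns and drop incomplete rows."""
    # Dropping on features + target at once keeps X and y aligned.
    subset = match_data[features + [target]].dropna()
    X = subset[features]
    y = subset[target].astype(int)
    return X, y


toy = pd.DataFrame({
    "HomeElo": [1800.0, 1750.0, None],
    "AwayElo": [1700.0, 1820.0, 1650.0],
    "match_result": [1, -1, 0],
})
X, y = build_xy(toy, ["HomeElo", "AwayElo"])
print(len(X), y.tolist())  # 2 [1, -1]
```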

Normalizing and splitting the dataset

To prevent variables with large numerical values (e.g., ELO scores) from dominating the others, I standardized the features (zero mean and unit variance per column) using StandardScaler:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()  # zero mean, unit variance per feature
X_scaled = scaler.fit_transform(X)

Then I split the data into a training set (80%) and a test set (20%):

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
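In the code above the scaler is fitted on every row before the split, so statistics from the test matches leak into preprocessing. A common alternative, sketched here on synthetic data (the random arrays stand in for the real features and labels), is to split first and fit the scaler on the training rows only. stratify=y is an extra assumption that keeps the home/draw/away proportions similar in both sets. Tree ensembles such as Random Forest split on per-feature thresholds, so they are largely unaffected by standardization; the ordering matters more if you swap in distance-based or linear models.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix and labels (assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=1500, scale=200, size=(500, 4))
y = rng.choice([-1, 0, 1], size=500)

# Split first, so the test rows play no part in fitting the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse the train statistics

print(X_train_scaled.shape, X_test_scaled.shape)  # (400, 4) (100, 4)
```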

Training the Random Forest model

I used scikit-learn's RandomForestClassifier, which generally performs well on tabular data like this. Its implementation expects numeric inputs, which all the selected features already are:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 trees, reproducible
model.fit(X_train, y_train)

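After training, I evaluated the model on the test set:

from sklearn.metrics import accuracy_score, classification_report

# Predict the held-out matches and compare with the true outcomes
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2%}')
print("\nClassification Report:\n", classification_report(y_test, y_pred))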
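An accuracy figure is easier to judge next to a trivial baseline: on a three-class problem, always predicting the most common outcome (typically the home win in football) can already beat a uniform random guess. Here is a minimal sketch using scikit-learn's DummyClassifier. The synthetic labels and their proportions are made up; in the real pipeline you would pass the actual X_train, y_train, X_test and y_test.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Synthetic labels with made-up proportions, standing in for the real targets.
rng = np.random.default_rng(1)
y_train = rng.choice([1, 0, -1], size=800, p=[0.46, 0.26, 0.28])
y_test = rng.choice([1, 0, -1], size=200, p=[0.46, 0.26, 0.28])
X_train = np.zeros((len(y_train), 1))  # features are ignored by this baseline
X_test = np.zeros((len(y_test), 1))

# Always predicts the most frequent class seen during fit().
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print(f"Majority-class baseline accuracy: {baseline_acc:.2%}")
```

If the Random Forest's accuracy is only slightly above this number, most of its apparent skill comes from the class balance rather than the features.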

Saving the model

Once I was satisfied with the results, I saved both the model and the scaler in .joblib files:

import joblib

joblib.dump(model, 'rf_model.joblib')
joblib.dump(scaler, 'scaler.joblib')
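The scaler is saved alongside the model because any new match has to go through the same transformation before predict() or predict_proba() is called. A minimal round-trip sketch, with a tiny synthetic model standing in for the real one and the files written to a temporary directory:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Tiny synthetic model standing in for the real one (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, 1, -1)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "rf_model.joblib")
    scaler_path = os.path.join(tmp, "scaler.joblib")
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)

    # Later, e.g. in the web app: load both objects back into memory.
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Scale a new (synthetic) match with the saved scaler before predicting.
new_match = rng.normal(size=(1, 3))
probs = loaded_model.predict_proba(loaded_scaler.transform(new_match))[0]
# classes_ gives the label behind each predict_proba column.
print(dict(zip(loaded_model.classes_.tolist(), probs.round(3).tolist())))
```

joblib.load relies on pickle, so it should only be used on files you created or trust. Here classes_ is [-1, 1] because the toy labels contain no draws; with the real data it would include 0 as well.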

I created a web app to test the model ⚽️💻

To make the project accessible and interactive, I built a simple web interface with Flask. The user selects two teams, and the system estimates in real time which one is more likely to win the game.

The interface uses HTML, CSS and JavaScript for the frontend, while Flask handles the backend and the loading of the saved model.

Conclusion: will PSG win the Champions League?

According to my model, Paris Saint-Germain is the team most likely to win the Champions League 2025. Of course, soccer is unpredictable, but it was fascinating to see how AI can analyze thousands of matches to make predictions based on the data.

Want to See the Code?

If you're curious and want to explore the entire project, you can find the full repository on GitHub. Comments and feedback are very welcome!
👉 Visit the GitHub repository

In the repository you’ll find:

Full preprocessing code
Scripts for training the model
The Flask app for the user interface