Sentiment Analysis

This case study addresses a sentiment analysis task on movie reviews extracted from websites such as IMDb. The objective is to build an AI model that classifies each review as positive or negative (binary classification).

Dataset description

The provided dataset contains labeled movie reviews that can be used to train an AI model. The training dataset is balanced, so accuracy is a suitable metric for measuring the performance of your AI model.
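As a minimal sketch of the accuracy metric mentioned above, the snippet below computes the fraction of predictions that match the true labels. The `accuracy` helper and the toy label lists are assumptions made for illustration; they are not part of the provided material.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels, invented for illustration only (0 = negative, 1 = positive).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

score = accuracy(y_true, y_pred)
print(f"accuracy = {score:.3f}")
```

On a balanced binary dataset, a model that always predicts the same class scores 0.5, which gives a natural baseline to compare against.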

Column specifications:

  • id: Unique identifier for each review. When making a submission, you will need to include ids (see below).
  • text: The movie review to analyse.
  • label: Sentiment of the review, either 0 (negative) or 1 (positive).
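The column layout above could be read as follows. This is only a sketch: it assumes the training dataset is a CSV file with a header row, which is not stated here, and it parses an in-memory sample with invented rows instead of the real file. The `load_reviews` helper is hypothetical.

```python
import csv
import io

# Stand-in for the downloaded file; these rows are invented for illustration.
SAMPLE_CSV = """id,text,label
1,"A moving, beautifully shot film.",1
2,"Two hours I will never get back.",0
"""


def load_reviews(file_obj):
    """Parse rows into (id, text, label) tuples, casting label to int."""
    reader = csv.DictReader(file_obj)
    rows = []
    for row in reader:
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} for id {row['id']}")
        rows.append((row["id"], row["text"], label))
    return rows


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
print(reviews)
```

To read a real file, the same function can be given an open file handle, for example one opened with `newline=""` and `encoding="utf-8"`.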

Download training dataset

Submission

Your AI model will be evaluated on a test dataset that remains private during the competition. This dataset can only be accessed by the CI server.

Submission format

The output file of your solution must be named prediction.csv and must contain the following two columns:

  • id: Unique identifier for each review.
  • label: Sentiment of the review, either 0 (negative) or 1 (positive).
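A file in this format could be produced as sketched below, using only the Python standard library. The `write_predictions` helper, the example ids, and the presence of a header row with the column names are assumptions; check them against what the CI server actually expects.

```python
import csv


def write_predictions(path, ids, labels):
    """Write one (id, label) row per review, preceded by a header row."""
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])  # assumed header row
        for review_id, label in zip(ids, labels):
            if label not in (0, 1):
                raise ValueError(f"label must be 0 or 1, got {label!r}")
            writer.writerow([review_id, int(label)])


# Hypothetical ids and predicted labels, for illustration only.
write_predictions("prediction.csv", ["101", "102", "103"], [1, 0, 1])
```

Validating the labels before writing helps catch a model that outputs probabilities or strings instead of the required 0 or 1.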

If your submission is successful, you should be able to read the test accuracy and environmental impacts of your AI model on the leaderboard.

Leaderboard

You can check your ranking on the hackathon's leaderboard. Each submission is ranked by its "Ranking Score", which combines the "Impact Score" and the "Model Accuracy".

\[ \text{Impact Score} = 1 - \frac{\text{Job Total Impact GWP}}{\max(\text{Job Total Impact GWP})}, \]
\[ \text{Ranking Score} = \text{Impact Score} \times \text{Model Accuracy} \]
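The two formulas above can be worked through numerically. The sketch below assumes that the maximum is taken over the GWP (global warming potential) of all submissions being compared, which is not spelled out here; the submission names, GWP values, and accuracies are invented, and the GWP unit is left unspecified.

```python
def impact_score(job_gwp, max_gwp):
    """Impact Score = 1 - job GWP / maximum GWP."""
    if max_gwp <= 0:
        raise ValueError("max_gwp must be positive")
    return 1 - job_gwp / max_gwp


def ranking_score(job_gwp, max_gwp, accuracy):
    """Ranking Score = Impact Score x Model Accuracy."""
    return impact_score(job_gwp, max_gwp) * accuracy


# Hypothetical submissions: name -> (job total impact GWP, model accuracy).
submissions = {
    "small_model": (2.0, 0.85),
    "large_model": (10.0, 0.93),
}

# Assumption: the maximum runs over all submissions in the comparison.
max_gwp = max(gwp for gwp, _ in submissions.values())

scores = {
    name: ranking_score(gwp, max_gwp, acc)
    for name, (gwp, acc) in submissions.items()
}
for name, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Under this reading, the submission with the largest GWP gets an Impact Score of 0 and therefore a Ranking Score of 0 regardless of its accuracy, so a lighter model with slightly lower accuracy can rank higher.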