Plankton Classification Challenge (preprocessed)

Organized by gaiasavers - Current server time: Oct. 27, 2020, 11:29 a.m. UTC

Development Phase (previous): Nov. 15, 2018, midnight UTC
Final Phase (current): April 30, 2020, midnight UTC
Competition Ends: Never

Plankton classification (preprocessed data)

Brought to you by the GAIASAVERS team

It is now common knowledge that many species are endangered, if not already disappearing. As we humans are responsible for the current situation, it is our duty to help protect what can still be protected and save what can still be saved.
About 70% of the surface of our planet is covered by water, which means that a large part of Earth's biodiversity is found underwater. To protect the life forms living there, a good first step is to quickly identify the places where biodiversity is particularly endangered, and this is what we do here.
Plankton are a very diverse group of life forms, some of them unicellular, that are unable to swim against a current. They are found in every ocean and sea around the globe, more precisely in the pelagic zone of these waters. This zone is neither near the coasts nor near the abyssal plain. Plankton can include bacteria, algae, larvae, gametes, or even jellyfish. They form the very base of the ocean food chain and are a good indicator of biodiversity. Consequently, studying plankton and determining the variety of species found in a given location is a good way to find out whether that location is endangered.

In this challenge, we classify images of different categories of plankton. See the "Evaluation" section for more details.

References and credits:

This challenge was made by the GAIASAVERS team:

  • Alban Petit 
  • Eric Wang
  • Maxime Chor
  • Sébastien Warichet
  • Timothée Babinet
  • Wafa Bouzouita

Contact: gaiasavers@chalearn.org

This dataset contains pictures of life forms found in the Bering Sea and was created by Kaichang Cheng. It was used in his 2019 paper to show the effectiveness of an enhanced convolutional neural network. The dataset is composed of 7 different classes with 2560 images each.
The competition protocol was designed by Isabelle Guyon.
The starting kit was adapted from a Jupyter notebook designed by Balazs Kegl for the RAMP platform.
This challenge was generated using Chalab, a competition wizard designed by Laurent Senta.

Evaluation

This is a multiclass classification problem. Submissions are evaluated with the balanced_accuracy metric, the average of the recall obtained on each class, so that unbalanced classes are taken into account.

For more details, see: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html
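To make the metric concrete, here is a minimal sketch that computes balanced accuracy as the mean of per-class recall, using only the Python standard library. It is an illustration of the definition, not the scoring program of this challenge; the function name `balanced_accuracy` and the toy labels are assumptions made for the example.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not total:
        raise ValueError("empty input")
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, deliberately unbalanced example (hypothetical labels):
# a model that always answers "copepoda".
y_true = ["copepoda"] * 8 + ["medusae"] * 2
y_pred = ["copepoda"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, score)
```

In this toy case the plain accuracy looks good because the majority class dominates, while the balanced accuracy drops because the recall on "medusae" is zero.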

Each element is a plankton photograph. Originally, the images were much larger and of different sizes, so we added white padding to normalize the sizes and then reduced them to 100x100 pixels for memory reasons. We then used the vertical and horizontal histograms as well as the mean, the variance and the length of the contour as features.
As such, each image is characterized by 203 features, which matches 100 values for each of the two histograms plus the three scalar values.
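The sketch below shows one possible way to obtain 203 features of this kind from a 100x100 grayscale image. The exact definitions used by the organizers are not given here, so everything specific is an assumption: the image is a list of rows with values in 0..255 on a white background, a pixel counts as foreground when it is darker than a hypothetical `THRESHOLD`, the histograms count foreground pixels per row and per column, and the contour length is the number of foreground pixels touching the background.

```python
from statistics import fmean, pvariance

SIZE = 100       # image side after resizing, as stated in the text
THRESHOLD = 128  # assumption: pixels darker than this are foreground


def extract_features(image):
    """Return 203 features for a SIZE x SIZE grayscale image (list of rows)."""
    if len(image) != SIZE or any(len(row) != SIZE for row in image):
        raise ValueError("expected a 100x100 image")
    mask = [[pixel < THRESHOLD for pixel in row] for row in image]

    # Horizontal histogram: foreground pixel count in each row (100 values).
    row_hist = [sum(row) for row in mask]
    # Vertical histogram: foreground pixel count in each column (100 values).
    col_hist = [sum(col) for col in zip(*mask)]

    # Mean and (population) variance of all pixel intensities.
    pixels = [pixel for row in image for pixel in row]
    mean = fmean(pixels)
    variance = pvariance(pixels, mean)

    def is_background(r, c):
        # Positions outside the image are treated as background.
        return not (0 <= r < SIZE and 0 <= c < SIZE) or not mask[r][c]

    # Contour length (assumed definition): foreground pixels with at
    # least one background pixel among their 4 direct neighbours.
    contour = sum(
        1
        for r in range(SIZE)
        for c in range(SIZE)
        if mask[r][c]
        and any(
            is_background(r + dr, c + dc)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
        )
    )
    return row_hist + col_hist + [mean, variance, contour]


# Synthetic test image: white background with a dark 10x10 square.
image = [[255] * SIZE for _ in range(SIZE)]
for r in range(40, 50):
    for c in range(30, 40):
        image[r][c] = 0

features = extract_features(image)
print(len(features))
```

The padding and resizing steps are left out on purpose, since they depend on the image library used.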

There are 7 categories of plankton to predict:

  • chaetognatha
  • copepoda
  • euphausiids
  • fish larvae
  • limacina
  • medusae
  • other

You can see below the values of the dataset for some of the features and how the different classes are distributed along these features.


For training, you are given a data matrix X_train of dimension num_training_samples x num_features and an array y_train of labels of dimension num_training_samples. You must train a model that predicts the labels for two test matrices, X_valid and X_test.
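As an illustration of this train-then-predict workflow, here is a minimal sketch with a nearest-centroid classifier written in plain Python. It is a stand-in for whatever model you choose, not the model of the starting kit; the helper names `fit_centroids` and `predict` and the tiny two-feature matrices are hypothetical and only mimic the shapes of X_train, y_train, X_valid and X_test.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, X):
    """Assign each row to the class with the closest centroid."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [
        min(centroids, key=lambda label: sq_dist(row, centroids[label]))
        for row in X
    ]


# Toy data standing in for the real matrices (2 features instead of 203).
X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
y_train = ["copepoda", "copepoda", "medusae", "medusae"]
X_valid = [[0.1, 0.1], [0.9, 0.9]]
X_test = [[0.7, 0.8]]

centroids = fit_centroids(X_train, y_train)
y_valid_pred = predict(centroids, X_valid)  # scored during the development phase
y_test_pred = predict(centroids, X_test)    # scored during the final phase
print(y_valid_pred, y_test_pred)
```

Predictions are produced for both unlabeled matrices in the same run, since both sets must be predicted with each submission.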
There are 2 phases:

  • Phase 1: development phase. We provide you with labeled training data and unlabeled validation and test data. Make predictions for both datasets. However, you will receive feedback on your performance on the validation set only. The performance of your LAST submission will be displayed on the leaderboard.
  • Phase 2: final phase. You do not need to do anything. Your last submission of phase 1 will be forwarded automatically. Your performance on the test set will appear on the leaderboard once the organizers have finished checking the submissions.

This sample competition allows you to submit either:

  • Only prediction results (no code).
  • A pre-trained prediction model.
  • A prediction model that must be trained and tested.

 

Rules

Submissions must be made before the end of phase 1. You may make up to 5 submissions per day and 100 in total.

This challenge is governed by the general ChaLearn contest rules.

Development Phase

Start: Nov. 15, 2018, midnight

Description: Development phase: tune your models and submit prediction results, a trained model, or an untrained model.

Final Phase

Start: April 30, 2020, midnight

Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).

Competition Ends

Never

# Username Score
1 greenforce 0.7681
2 OCEAN 0.7595
3 PLANKTON 0.7402