Malaria detection challenge - Raw dataset

Organized by medichal


Development Phase
Nov. 15, 2018, midnight UTC


Final Phase
April 30, 2050, midnight UTC


Competition Ends

Detection of malaria using cell images



Malaria is a disease spread by the bite of an infected female mosquito. It is caused by Plasmodium, a genus of parasites. Five species of Plasmodium are known to infect humans. Malaria is a global concern: every 2 minutes, a child dies of malaria, and each year more than 200 million new cases of the disease are reported, mostly in Africa. Early diagnosis could help treat and control the disease.



Once a human is infected with malaria, the Plasmodium parasite begins to spread and appears in the host's blood. By detecting Plasmodium through a blood smear test and the "thick drop" method, it is possible to know whether a person has contracted malaria.

In this challenge, you will have access to raw grayscale images of segmented cells from thin blood smear slides. Your goal is to distinguish parasitized cell images from uninfected ones in order to diagnose malaria.

Brought to you by the Medichal team

Team members

  • Clément Veyssiere
  • Théo Deschamps Berger
  • Nicolas Devatine
  • Ramine Hamidi
  • Simon Monteiro
  • Xinneng Xu
  • Corentin Leloup

Data were provided by NIH


The problem is a binary classification problem. Each sample is an image of a cell, which can be infected or not.
For training, you are given a set of raw images as a matrix called X_train, which contains the value of each pixel of each image (50x50) for 60% of the total number of images. The labels are given as an array of the same length as the matrix, filled with zeros and ones that indicate whether each cell is infected or not.
Your task is to train a model that predicts the labels of the test sample matrix X_test and the validation sample matrix X_valid.
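
The setup described above can be sketched with a toy baseline. This is only an illustration under stated assumptions, not the organizers' starting kit: the synthetic data generator, the "infected rows are brighter" rule, and all function names are hypothetical, and each image is assumed to be flattened into one row of 50 x 50 = 2500 pixel values. A nearest-centroid rule stands in for a real model, and only the standard library is used to keep the sketch self-contained. In practice a numerical library would be the more natural choice.

```python
import random

IMAGE_SIDE = 50                      # images are 50x50 per the challenge text
N_PIXELS = IMAGE_SIDE * IMAGE_SIDE   # assumed layout: one flattened row per image


def make_synthetic_images(n_samples, seed):
    """Hypothetical stand-in for the real data: random rows plus 0/1 labels."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2                # alternate classes so both are present
        # Assumption for the toy data only: "infected" rows are slightly brighter.
        shift = 0.2 if label == 1 else 0.0
        X.append([rng.random() + shift for _ in range(N_PIXELS)])
        y.append(label)
    return X, y


def class_centroids(X, y):
    """Mean pixel vector of each class (0 and 1)."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict_scores(X, centroids):
    """Higher score means closer to the class-1 centroid than to class 0."""
    return [
        squared_distance(row, centroids[0]) - squared_distance(row, centroids[1])
        for row in X
    ]


# In the real challenge X_train and its labels are provided, while the labels
# of X_valid and X_test are hidden; here everything is synthetic.
X_train, y_train = make_synthetic_images(60, seed=0)
X_valid, y_valid = make_synthetic_images(20, seed=1)

centroids = class_centroids(X_train, y_train)
valid_scores = predict_scores(X_valid, centroids)
valid_labels = [1 if s > 0 else 0 for s in valid_scores]
```

The same `predict_scores` call would be repeated on X_test, since predictions are expected for both unlabeled sets. Returning real-valued scores rather than hard 0/1 labels is a deliberate choice here, because ranking-based metrics can make use of the ordering of the predictions.
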
There are 2 phases:

  • Phase 1: development phase. We provide you with labeled training data and unlabeled validation and test data. Make predictions for both datasets. However, you will receive feedback on your performance on the validation set only. The performance of your LAST submission will be displayed on the leaderboard.
  • Phase 2: final phase. You do not need to do anything. Your last submission of phase 1 will be automatically forwarded. Your performance on the test set will appear on the leaderboard when the organizers finish checking the submissions.

This sample competition allows you to submit either:

  • Only prediction results (no code).
  • A pre-trained prediction model.
  • A prediction model that must be trained and tested.

Submissions are evaluated using the area under the ROC curve (AUC) metric.
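
As a rough illustration of this metric, the sketch below computes the area under the ROC curve from labels and prediction scores using the rank-sum (Mann-Whitney U) formulation, which equals the fraction of (infected, uninfected) pairs that the scores order correctly, with ties counted as half. The function name `roc_auc` is hypothetical, and this is not the organizers' scoring program, only a minimal stand-in for checking predictions locally.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: sequence of 0/1 ground-truth values.
    scores: sequence of real-valued predictions, higher meaning "more likely 1".
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Assign 1-based ranks, giving tied scores their average rank,
    # and accumulate the ranks of the positive samples.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        rank_sum_pos += average_rank * sum(label for _, label in pairs[i:j])
        i = j

    # U statistic for the positive class, normalised by the number of pairs.
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ordered correctly
```

A score of 1.0 means every infected cell is ranked above every uninfected one, while 0.5 corresponds to a ranking no better than chance.
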


Submissions must be made before the end of phase 1. You may make up to 5 submissions per day and 100 in total.

This challenge is governed by the general ChaLearn contest rules.

Development Phase

Start: Nov. 15, 2018, midnight

Description: Development phase: tune your models and submit prediction results, trained model, or untrained model.

Final Phase

Start: April 30, 2050, midnight

Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).

Competition Ends


#  Username  Score
1  Africa    0.9824
2  MOSQUITO  0.9707
3  medichal  0.7566