Organized by survival


Development Phase
Nov. 15, 2018, midnight UTC


Final Phase
April 30, 2019, midnight UTC


Competition Ends


Solve Survival Analysis Problem

Brought to you by the Survival Team

In this challenge we tackle a survival analysis problem from a machine learning perspective. The goal of survival analysis is to predict the expected time before a given event occurs. In our case, we try to estimate the survival time of patients given some data from the NHANES dataset.

References and credits:
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188 (1936).
The competition protocol was designed by Isabelle Guyon.
The starting kit was adapted from a Jupyter notebook designed by Balazs Kegl for the RAMP platform.
This challenge was generated using Chalab, a competition wizard designed by Laurent Senta.
The authors are Xihui Wang, Aurélien Mascaro, Yani Zani, Antoine de Scoraille, Nouredine Nour and Louis Mouline.
Special thanks to our coordinators Kristin Bennett and Alexander New for providing a clean aggregated version of the dataset.


Introduction of the problem

This is a regression problem. The goal is to predict how long a patient survives after admission to a hospital. To do so, each sample (i.e., each patient) is characterized by the following information:

  • Age
  • Ethnicity (with a one-hot representation)
  • Gender
  • Blood pressure (systolic and diastolic)
  • Glycated hemoglobin concentration
  • Body Mass Index (or BMI)

This gives you a total of 10 features to predict the following pair:

  • Target: the survival time to predict
  • Event: 1 if the observation is censored (the patient left the study or the study ended before the event was observed), 0 otherwise

For training, you are given a data matrix X_train of dimension 19297 x 10 and an array y_train of labels of dimension 19297 x 2. You must train a model that predicts the labels for two unlabeled matrices, X_valid and X_test.
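The workflow above can be sketched as follows. This is only an illustration under stated assumptions: the data are random stand-ins rather than the NHANES files, the column order of y_train (survival time first, event flag second) is assumed from the order of the list above, and `MeanBaseline` is a hypothetical class, not part of the starting kit.

```python
import random
from statistics import mean

random.seed(0)
N_FEATURES = 10  # number of features stated above


def fake_matrix(n_rows):
    """Random stand-in for a feature matrix; NOT the NHANES data."""
    return [[random.random() for _ in range(N_FEATURES)] for _ in range(n_rows)]


# Small synthetic stand-ins; the real X_train has 19297 rows.
X_train = fake_matrix(200)
X_valid = fake_matrix(50)
X_test = fake_matrix(50)
# Assumed column order: [survival time, event flag].
y_train = [[random.uniform(1.0, 100.0), random.randint(0, 1)] for _ in X_train]


class MeanBaseline:
    """Hypothetical baseline that ignores the features entirely."""

    def fit(self, X, y):
        # Keep only the survival-time column; the event flag is unused here.
        self.mean_time_ = mean(row[0] for row in y)
        return self

    def predict(self, X):
        # One predicted survival time per input row.
        return [self.mean_time_ for _ in X]


model = MeanBaseline().fit(X_train, y_train)
pred_valid = model.predict(X_valid)  # predictions for the validation set
pred_test = model.predict(X_test)    # predictions for the test set
print(len(pred_valid), len(pred_test))
```

A constant predictor like this one carries no ranking information, so it is useful only as a check that the shapes line up: one prediction per row of X_valid and X_test.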

Evaluation metric

To evaluate the performance of your model, you will need a metric: the concordance index. The formula is:
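A common form, assumed here to be the intended one, is Harrell's c-index. The symbols are introduced for illustration and are not from the challenge text: $T_i$ is the observed time for patient $i$, $\hat{T}_i$ is the predicted survival time, and $d_i = 1$ if the event was actually observed for patient $i$ (the record is not censored), $0$ otherwise.

```latex
C = \frac{\sum_{i \neq j} \mathbb{1}[T_i < T_j] \, \mathbb{1}[\hat{T}_i < \hat{T}_j] \, d_i}
         {\sum_{i \neq j} \mathbb{1}[T_i < T_j] \, d_i}
```

The denominator counts the pairs whose true ordering is known; the numerator counts those among them that the model ranks in the same order.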

In other words, the concordance index is a “global” index for validating the predictive ability of a survival model. It is the fraction of pairs in your data in which the observation with the higher survival time also has the higher probability of survival predicted by your model.

The index is not calculated per observation or subject, so the c-index cannot be interpreted as the risk of an individual subject. High values mean that your model predicts higher probabilities of survival for higher observed survival times.
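The pairwise counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring program used by the challenge: the function name, the handling of ties (tied predictions count as half, tied observed times are skipped) and the toy data are all assumptions. The `observed` flag is True when the event was actually observed, that is, when the record is not censored.

```python
from itertools import combinations


def concordance_index(times, predictions, observed):
    """Pairwise c-index sketch.

    times:       observed survival times
    predictions: predicted survival times (higher means longer survival)
    observed:    True where the event was observed (record not censored)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not observed[a]:
            continue  # shorter time is censored: true ordering unknown
        comparable += 1
        if predictions[a] < predictions[b]:
            concordant += 1.0
        elif predictions[a] == predictions[b]:
            concordant += 0.5  # assumed convention: tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration.
times = [5.0, 8.0, 12.0, 3.0]
observed = [True, False, True, True]
perfect = list(times)                # ranks every pair correctly
reversed_pred = [-t for t in times]  # ranks every pair incorrectly
constant = [1.0] * len(times)        # carries no ranking information

print(concordance_index(times, perfect, observed))
print(concordance_index(times, reversed_pred, observed))
print(concordance_index(times, constant, observed))
```

Under these conventions a perfect ranking scores 1, a fully reversed ranking scores 0, and a constant prediction scores 0.5, which is why 0.5 is the usual reference point for an uninformative model.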

Different steps

There are 2 phases:

  • Phase 1: development phase. We provide you with labeled training data and unlabeled validation and test data. Make predictions for both datasets. However, you will receive feedback on your performance on the validation set only. The performance of your LAST submission will be displayed on the leaderboard.
  • Phase 2: final phase. You do not need to do anything. Your last submission of phase 1 will be automatically forwarded. Your performance on the test set will appear on the leaderboard when the organizers finish checking the submissions.

Throughout this competition, you are allowed to submit any of the following:

  • Only prediction results (no code)
  • A pre-trained prediction model
  • A prediction model that must be trained and tested

The submissions are evaluated using the above metric: the concordance index.


Submissions must be made before the end of phase 1. You may make up to 5 submissions per day and 100 in total.

This challenge is governed by the general ChaLearn contest rules.

Development Phase

Start: Nov. 15, 2018, midnight

Description: Development phase: tune your models and submit prediction results, a trained model, or an untrained model.

Final Phase

Start: April 30, 2019, midnight

Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).

# Username Score
1 l2rpn 0.8528
2 StevenIQ 0.7863
3 SURVIVERS 0.7861