In this challenge, we tackle a survival analysis problem from a machine learning perspective. The goal of survival analysis is to predict the expected time before a given event occurs. In our case, we estimate the survival time of patients using data from the NHANES dataset.
References and credits:
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188 (1936).
The competition protocol was designed by Isabelle Guyon.
The starting kit was adapted from a Jupyter notebook designed by Balazs Kegl for the RAMP platform.
This challenge was generated using Chalab, a competition wizard designed by Laurent Senta.
The authors are Xihui Wang, Aurélien Mascaro, Yani Zani, Antoine de Scoraille, Nouredine Nour and Louis Mouline.
Special thanks to our coordinators Kristin Bennett and Alexander New for providing a clean aggregated version of the dataset.
The problem is a regression problem. The goal is to predict how long a patient survives after admission to a hospital. To do so, each sample (i.e., each patient) is characterized by the following information:
This gives you a total of 10 features to predict the following pair:
For training, you are given a data matrix X_train of dimension 19297 x 10 and an array y_train of labels of dimension 19297 x 2. You must train a model that predicts the labels for two test matrices, X_valid and X_test.
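To make the shapes concrete, here is a minimal baseline sketch in Python with NumPy. It is not the starting kit's code. Random placeholder arrays stand in for the real data, and it assumes that the first column of y_train holds the survival time and the second column some other label (for example an event flag), which this baseline ignores. The helper names fit_linear and predict are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data with the shapes stated above; the real arrays would be loaded instead.
n_train, n_features = 19297, 10
X_train = rng.normal(size=(n_train, n_features))
true_w = rng.normal(size=n_features)

# Assumption: column 0 of y_train is the survival time, column 1 a second label.
time = X_train @ true_w + 50.0 + rng.normal(scale=0.1, size=n_train)
flag = rng.integers(0, 2, size=n_train)
y_train = np.column_stack([time, flag])


def fit_linear(X, t):
    """Least-squares fit of t ~ X w + b; returns (w, b)."""
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coef[:-1], coef[-1]


def predict(X, w, b):
    """Predicted survival time for each row of X."""
    return X @ w + b


w, b = fit_linear(X_train, y_train[:, 0])

# Hypothetical stand-in for X_valid: same number of columns as X_train.
X_valid = rng.normal(size=(5, n_features))
pred_valid = predict(X_valid, w, b)
```

Because the metric is rank-based, only the ordering of the predictions matters, so even a plain regression on the time column can serve as a first reference point. A model that accounts for the second label would be the natural next step.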
To evaluate the performance of your model, you will need a metric: the concordance index. The formula is:
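One common way to write the index is Harrell's C. The notation below is an assumption, not necessarily the organizers' exact definition: $T_i$ is the observed time of subject $i$, $\hat{T}_i$ the survival score predicted by the model (higher means longer predicted survival), and $\delta_i = 1$ if the event was observed for subject $i$ and $0$ if the observation was censored.

```latex
C = \frac{\sum_{i \neq j} \delta_i \, \mathbb{1}[T_i < T_j] \, \mathbb{1}[\hat{T}_i < \hat{T}_j]}
         {\sum_{i \neq j} \delta_i \, \mathbb{1}[T_i < T_j]}
```

The denominator counts the comparable pairs, that is, pairs in which the subject with the shorter observed time actually experienced the event. The numerator counts those pairs that the model ranks in the same order.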
In other words, the concordance index is a “global” index for validating the predictive ability of a survival model. It is the fraction of pairs in your data where the observation with the higher survival time also has the higher probability of survival predicted by your model.
The index is not calculated for each observation/subject, so the c-index cannot be interpreted as the risk of an individual subject. High values mean that your model predicts higher probabilities of survival for higher observed survival times.
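The pairwise description above can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the scoring program used by the challenge: the function name concordance_index and its arguments are hypothetical, pairs with tied observed times are skipped, tied predictions count as half concordant, and a pair is treated as comparable only when the shorter observed time corresponds to an observed (uncensored) event.

```python
from itertools import combinations


def concordance_index(times, predictions, events=None):
    """Fraction of comparable pairs that the model ranks in the observed order.

    times:       observed survival times
    predictions: predicted survival scores (higher = longer predicted survival)
    events:      1 if the event was observed, 0 if censored
                 (assumption: all events observed when omitted)
    """
    n = len(times)
    if events is None:
        events = [1] * n

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(n), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this sketch
        # a is the subject with the shorter observed time, b the longer one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: the true order is unknown
        comparable += 1
        if predictions[a] < predictions[b]:
            concordant += 1.0
        elif predictions[a] == predictions[b]:
            concordant += 0.5  # tied predictions count as half

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, with observed times [1, 2, 3] and predictions [1, 3, 2], two of the three pairs are ordered correctly, so the function returns 2/3. A perfect ranking gives 1.0, a fully reversed ranking gives 0.0, and random predictions give about 0.5 on average.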
There are 2 phases:
Throughout this sample competition, you are allowed to submit either:
The submissions are evaluated using the above metric: the concordance index.
Submissions must be made before the end of phase 1. You may make up to 5 submissions per day and 100 in total.
This challenge is governed by the general ChaLearn contest rules.
Start: Nov. 15, 2018, midnight
Description: Development phase: tune your models and submit prediction results, trained model, or untrained model.
Start: April 30, 2019, midnight
Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).