The aim of this challenge is to use predictive modelling to identify significant opioid prescriptions. The data have already been preprocessed and prepared so that participants can use them directly. The challenge is a binary classification task. The data contain prescription records for a couple of hundred non-opioid drugs written by medical professionals in 2014. The target is a binary variable that indicates, for each medical prescription, whether an opioid has been prescribed or not. The features include, among others, the specialty of the doctor who made the prescription and the number of times a non-opioid drug had been prescribed by the doctor in the past.
By participating in this challenge, you are helping to save lives. Opioids are often prescribed systematically as pain relievers, so identifying unnecessary prescriptions could help limit opioid addiction rates and overdose deaths.
References and credits:
The competition protocol was designed by Isabelle Guyon.
This challenge was generated using ChaLab.
Credits for this challenge go first to the Centers for Medicare & Medicaid Services (CMS): cms.gov
We acknowledge BioMed team members for preprocessing and preparing the dataset:
Mariem BOUHAHA (Team Coordinator), Guillaume COLLIN, Guillaume WELSCH, Thomas FOLTETE and Hoël PLANTEC.
Contact our team: email@example.com
The problem is a binary classification problem. Each sample (a prescription) is represented by categorical features, such as the gender, specialty and state of origin of the prescriber, and by numerical features giving the number of times the corresponding drug was prescribed by that medical professional, for a total of 243 features. You must predict the prescriber's class: 1 for an opioid prescriber, 0 otherwise.
You are given, for training, a data matrix X_train of dimension num_training_samples x num_features and an array y_train of labels of dimension num_training_samples. You must train a model which predicts the labels for two test matrices X_valid and X_test.
We are providing train and test data for participants. Both datasets have 243 features describing the physicians' historical prescriptions of non-opioid drugs.
Train data has the following statistics for a subset of features:
As for the target variable, it is binary and takes the values 0 or 1, depending on whether or not the physicians' historical prescriptions included opioids. It has the following distribution:
To prepare your submission, remember to use predict_proba, which provides a matrix of prediction scores scaled between 0 and 1. The dimension of the matrix is num_samples x num_classes. In our case, num_classes = 2. Each row holds the probabilities of class membership for one sample, which sum to one. The easiest way to prepare your submission is to use the starting kit.
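As an illustration of the shape and properties of the predict_proba output described above, here is a minimal sketch using scikit-learn. It is not the starting kit's code: the synthetic arrays stand in for the real X_train, y_train and X_valid, the choice of LogisticRegression is arbitrary, and whether the submission expects the full matrix or only the positive-class column is an assumption to check against the starting kit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): the real matrices come from the challenge files.
rng = np.random.default_rng(0)
num_features = 243
X_train = rng.poisson(2.0, size=(200, num_features)).astype(float)
y_train = (X_train[:, 0] + rng.normal(0.0, 1.0, 200) > 2.0).astype(int)
X_valid = rng.poisson(2.0, size=(50, num_features)).astype(float)

# Any classifier exposing predict_proba would do; this one is only an example.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# One row per sample, one column per class, ordered as in clf.classes_.
proba = clf.predict_proba(X_valid)

# Probability of class 1 (opioid prescriber) for each validation sample.
positive_scores = proba[:, list(clf.classes_).index(1)]
```

Each row of proba sums to one, so with two classes the second column carries the same information as the first.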
There are 2 phases:
This sample competition allows you to submit either:
Classifier performance is more than just a count of correct classifications. By considering wrong results as well as correct ones, we get much greater insight into the performance of the classifier. In this challenge, we propose the auc_metric to evaluate participants' submissions. This metric computes the normalized Area Under the ROC Curve score. ROC curves were originally developed for signal detection in radar returns in the 1950s, and have since been applied to a wide range of problems.
To generate a Receiver Operating Characteristic (ROC) curve, we plot the true positive rate (sensitivity) against the false positive rate (1 – specificity) for each threshold used. Because the score is normalized, 0 is no better than random guessing (a raw AUC of 0.5), while 1 corresponds to perfect predictions.
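The threshold sweep described above can be sketched in plain Python. The function names are hypothetical, and the normalization 2 * AUC - 1 is an assumption chosen to match the stated scale (0 for random guessing, 1 for perfect predictions); the auc_metric shipped with the challenge may differ in its details.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    num_pos = sum(1 for y in y_true if y == 1)
    num_neg = len(y_true) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]
    # Use each distinct score as a threshold, from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / num_neg, tp / num_pos))
    return points


def roc_auc(y_true, scores):
    """Area under the ROC curve, integrated with the trapezoidal rule."""
    points = roc_points(y_true, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def normalized_auc(y_true, scores):
    """Assumed normalization: maps a raw AUC of 0.5 to 0 and a raw AUC of 1 to 1."""
    return 2.0 * roc_auc(y_true, scores) - 1.0


labels = [0, 0, 1, 1]
predictions = [0.1, 0.4, 0.35, 0.8]
raw = roc_auc(labels, predictions)          # 0.75
score = normalized_auc(labels, predictions)  # 0.5
```

In this toy example three of the four positive/negative pairs are ranked correctly, which gives a raw AUC of 0.75 and a normalized score of 0.5.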
For more details about AUC, see https://en.wikipedia.org/wiki/Receiver_operating_characteristic and http://gim.unmc.edu/dxtests/roc3.htm.
Submissions must be made before the end of phase 1. You may make up to 5 submissions per day and 100 in total.
This challenge is governed by the general ChaLearn contest rules.
This competition is organized solely for test purposes. No prizes will be awarded.
The authors decline responsibility for mistakes, incompleteness or lack of quality of the information provided in the challenge website. The authors are not responsible for any contents linked or referred to from the pages of this site, which are external to this site. The authors intended not to use any copyrighted material or, if not possible, to indicate the copyright of the respective object. The authors intended not to violate any patent rights or, if not possible, to indicate the patents of the respective objects. The payment of royalties or other fees for use of methods, which may be protected by patents, remains the responsibility of the users.
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". THE ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE THROUGH THIS WEBSITE.
Participation in the organized challenge is non-binding and without obligation. Parts of the pages or the complete publication and information might be extended, changed or partly or completely deleted by the authors without notice.
Start: Oct. 22, 2017, 6:53 p.m.
Description: Development phase: create models and submit them, or directly submit results on validation and/or test data; feedback is provided on the validation set only.
Start: April 30, 2018, 6:53 p.m.
Description: Final phase: submissions from the previous phase are automatically cloned and used to compute the final score. The results on the test set will be revealed when the organizers make them available.