Iris for Python 3, info232 2019

Organized by guyon

Previous: Development Phase (Nov. 15, 2018, midnight UTC)
Current: Final Phase (April 30, 2019, midnight UTC)
End: Competition Ends (Never)

Solve Fisher's Famous Iris Problem

This is the well-known Iris dataset from Fisher's classic paper (Fisher, 1936). The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Motivation:

You have just been hired by "Super-Flora", a flower distribution chain, as a "Data Scientist". The Flower Purchasing Manager suspects one of his suppliers of dishonesty. Versicolor Irises usually last longer and are more expensive than others. But they look a lot like Iris Setosa, and the supplier may have sold you a certain number of Setosa instead to increase profit margins. Your task is to check the batches that arrive. The botanist has indicated that those flowers have features, such as the dimensions of the petals and sepals, stem length, color, etc., which allow us to distinguish them. The Flower Purchasing Manager hired the botanist to measure some features of a few flowers in each lot, to create a small dataset for a pilot study.

The Flower Purchasing Manager gives you access to a training set with "truth values" of the identity of the flowers. You must provide a "classifier", which is a program capable of predicting the identity of the flowers given the measured features, for new test examples.

References and credits:
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188 (1936).
The competition protocol was designed by Isabelle Guyon.
The starting kit was adapted from a Jupyter notebook designed by Balazs Kegl for the RAMP platform.
This challenge was generated using Chalab, a competition wizard designed by Laurent Senta.

Evaluation

The problem is a multiclass classification problem. Each sample (an Iris) is characterized by 4 features: sepal length, sepal width, petal length, and petal width. You must predict the Iris category: setosa, virginica, or versicolor.
For training, you are given a data matrix X_train of dimension num_training_samples x num_features and an array y_train of labels of dimension num_training_samples. You must train a model that predicts the labels for two test matrices, X_valid and X_test.
There are 2 phases:

  • Phase 1: development phase. We provide you with labeled training data and unlabeled validation and test data. Make predictions for both datasets. However, you will receive feedback on your performance on the validation set only. The performance of your LAST submission will be displayed on the leaderboard.
  • Phase 2: final phase. You do not need to do anything. Your last submission of phase 1 will be automatically forwarded. Your performance on the test set will appear on the leaderboard when the organizers finish checking the submissions.
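
The workflow described above (fit on X_train and y_train, then predict labels for both X_valid and X_test) can be sketched in plain Python as follows. This is only an illustration, not the starting kit's code: the NearestCentroid class is a hypothetical stand-in for whatever model you choose, the feature order is an assumption, and the measurements are made-up toy values rather than the competition data.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Toy classifier: predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        # Group the training rows by their label.
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One centroid (per-feature mean) per class.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        # For each row, return the label of the nearest centroid (Euclidean distance).
        return [
            min(self.centroids_, key=lambda label: dist(row, self.centroids_[label]))
            for row in X
        ]


# Made-up toy values, NOT the competition data. Assumed feature order:
# sepal length, sepal width, petal length, petal width.
X_train = [
    [5.0, 3.4, 1.5, 0.2],
    [4.8, 3.1, 1.4, 0.2],
    [6.0, 2.8, 4.3, 1.3],
    [5.8, 2.7, 4.1, 1.2],
    [6.6, 3.0, 5.6, 2.1],
    [6.8, 3.1, 5.8, 2.2],
]
y_train = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
X_valid = [[5.1, 3.5, 1.4, 0.3], [6.7, 3.0, 5.7, 2.0]]
X_test = [[5.9, 2.8, 4.2, 1.3]]

# Train once, then predict for BOTH unlabeled sets, as both phases use the same submission.
model = NearestCentroid().fit(X_train, y_train)
y_valid_pred = model.predict(X_valid)
y_test_pred = model.predict(X_test)
```

Any model exposing the same fit/predict pattern could be swapped in; the important point is that predictions for the validation and test sets are produced together.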

This sample competition allows you to submit either:

  • Only prediction results (no code).
  • A pre-trained prediction model.
  • A prediction model that must be trained and tested.

The submissions are evaluated using the accuracy metric.
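
As a rough illustration of the metric, accuracy is the fraction of samples whose predicted label equals the true label. The sketch below uses the standard definition; the function name and the example labels are assumptions made for illustration, and the competition's actual scoring program is not shown on this page.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for illustration only: 3 of 4 predictions are correct.
y_true = ["setosa", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica"]
score = accuracy(y_true, y_pred)
```

A score of 1.0000, as on the leaderboard, means every sample in the evaluated set was classified correctly.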

Rules

Submissions must be made before the end of phase 1. You may make up to 5 submissions per day and 100 in total.

This challenge is governed by the general ChaLearn contest rules.

Development Phase

Start: Nov. 15, 2018, midnight

Description: Development phase: tune your models and submit prediction results, a trained model, or an untrained model.

Final Phase

Start: April 30, 2019, midnight

Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).

Competition Ends

Never

#  Username      Score
1  tochenj       1.0000
2  pnamvu        1.0000
3  loba.ambemou  1.0000