Real-world Reinforcement Learning Challenge Forum



## Baseline Scheme and Code

Participants are not restricted to any particular technical method. This post introduces the baseline scheme provided by [Polixir](http://polixir.ai/) for the challenge. The baseline is built on the `Polixir Revive SDK` (download link: https://revive.cn). For detailed code examples, please refer to the `sample_submission` folder in `starting_kit.zip`.
In this scheme, we use the `Revive SDK` to learn an environment model from the historical data. The environment model simulates each individual user, and the promotion policy is then trained inside this model. The process consists of the following steps:

1. Define user state (user portrait)

2. Learn virtual environment

3. Learn promotion policy

The following sections describe each step in turn.
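To make the flow of these three steps concrete, here is a deliberately simplified, self-contained sketch in plain Python/NumPy. It is not the Revive SDK API and not the actual baseline code (the toy data, the linear reward model, and the greedy policy are all illustrative assumptions); it only mirrors the order of the steps: build user states, fit a virtual environment on the offline log, then derive a policy inside that learned model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline log: 1000 users, each with a 2-d state (e.g. recent activity
# and recent spend), the discount level they were offered, and the net
# revenue observed afterwards. All numbers are made up for illustration.
n_users = 1000
states = rng.normal(size=(n_users, 2))
actions = rng.integers(0, 3, size=n_users)          # discount level 0/1/2
buy_prob = 1 / (1 + np.exp(-(states[:, 0] + 0.5 * actions - 0.3)))
rewards = rng.binomial(1, buy_prob) * (10 - 2 * actions)

# Step 2 (toy version): learn a "virtual environment" from the data,
# here just a least-squares reward model over (state, action).
X = np.column_stack([states, actions, np.ones(n_users)])
w, *_ = np.linalg.lstsq(X, rewards, rcond=None)

def virtual_env_reward(state, action):
    """Predicted reward of offering `action` to a user in `state`."""
    x = np.concatenate([state, [action, 1.0]])
    return float(x @ w)

# Step 3 (toy version): derive a promotion policy inside the learned model
# by greedily picking the action the virtual environment scores highest.
def promotion_policy(state, candidate_actions=(0, 1, 2)):
    return max(candidate_actions, key=lambda a: virtual_env_reward(state, a))

print("action for first user:", promotion_policy(states[0]))
```

In the actual baseline, step 2 is handled by the user model learned with the Revive SDK and step 3 trains a reinforcement-learning policy against it; see `sample_submission` for the real entry points.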

##### Define Individual User State

To learn the user model, we first need to define the user state (user portrait). As a baseline, we take the simplest (not necessarily the best) approach to defining each user's state.
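For illustration only, a minimal sketch of such a state, computed from a user's recent history, might look like the following. The field names (`day_orders`, `day_coupons`) and the chosen statistics are assumptions and do not correspond to the competition's actual data schema:

```python
import numpy as np

def user_state(day_orders, day_coupons, window=60):
    """Build a simple user-portrait vector from the last `window` days.

    `day_orders` / `day_coupons` are per-day arrays of the user's order
    amounts and the coupons they were offered; the names are hypothetical
    stand-ins for the actual competition columns.
    """
    day_orders = np.asarray(day_orders[-window:], dtype=float)
    day_coupons = np.asarray(day_coupons[-window:], dtype=float)
    return np.array([
        day_orders.sum(),            # total spend over the window
        (day_orders > 0).mean(),     # fraction of active days
        day_coupons.mean(),          # average coupon offered
        day_orders[-7:].sum(),       # spend in the most recent week
    ])

# Example: a user with sparse purchases over 60 days.
orders = np.zeros(60)
orders[[3, 20, 55]] = [15.0, 8.5, 22.0]
coupons = np.full(60, 2.0)
print(user_state(orders, coupons))
```

A richer portrait (rolling averages, day-of-week effects, coupon-redemption rates, and so on) would likely work better; the baseline deliberately keeps this step simple.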

The competition data provides a 60-day history of promotion actions and user actions. The

Posted by: yezh @ Dec. 25, 2021, 7:58 a.m.