A Reproducibility Study of Deep and Surface Machine Learning Methods for Human-Related Trajectory Prediction
Bardh Prenkaj, Paola Velardi, Sapienza University of Rome
Source code and data: benchmark.zip
As examples of trajectory-shaped risk prediction problems, we consider dropout prediction in online learning platforms and survival prediction in a critical care tele-health system. We employ two datasets in the first domain and one in the second: XuetangX, KDDCup15, and eICU. The first two are benchmark datasets for student dropout prediction and contain e-tivities (homework submissions, video views, navigational actions, and so on) representing the students' interactions within several online courses.
For XuetangX, we prune all e-courses with fewer than 350 student trajectories, which leaves us with 19 courses overall, whereas we leave KDDCup15 as is, since all of its courses are sufficiently populated. The medical dataset eICU describes fatalities of patients admitted to multi-centre critical care units in US hospitals between 2014 and 2015. Patient-related events include laboratory tests, medications, admissions, physical examinations, and visits. We prune all patient trajectories containing zero events, which results in a final set of 65k trajectories. For each dataset, as suggested in [10], we perform a daily grouping of events concerning each individual (student or patient). Therefore, we represent each individual $u$'s trajectory with a time-matrix $\mathcal{T}_u \in \mathbb{R}^{\ell \times m}$, where $\ell$ is the length of the adopted time window in days and $m$ is the number of distinct event types (i.e., features) available in the dataset. In detail, cell $(i,j)$ of $\mathcal{T}_u$ contains the number of events of type $j$ found on day $i$ of trajectory $u$.
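To make the daily-grouping step concrete, the following is a minimal sketch of how such a time-matrix can be assembled from raw event logs. It assumes a pandas DataFrame with hypothetical columns `user_id`, `event_type`, and `timestamp`; the column names and the helper itself are illustrative, not the study's actual preprocessing code.

```python
import numpy as np
import pandas as pd

def build_time_matrix(events: pd.DataFrame, user_id, event_types, ell: int):
    """Assemble the time-matrix T_u (ell x m): cell (i, j) counts the
    events of type j observed on day i of individual u's trajectory."""
    u = events[events["user_id"] == user_id]
    # Day index i, relative to the individual's first recorded event.
    day0 = u["timestamp"].min().normalize()
    days = (u["timestamp"].dt.normalize() - day0).dt.days
    col = {e: j for j, e in enumerate(event_types)}  # event type -> column j
    T = np.zeros((ell, len(event_types)), dtype=np.int64)
    for i, etype in zip(days, u["event_type"]):
        if i < ell and etype in col:  # clip to the adopted time window
            T[i, col[etype]] += 1
    return T
```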
The table below reports the chosen parameters and activation functions for each deep method.
| Method | Component | Parameters | Activation (intermediate layer) | Activation (output layer) |
|---|---|---|---|---|
| DNN | - | num. layers = [3,5], $\alpha = 2$ | ReLU | Sigmoid |
| Simple CNN | CNN | channels = [6,16], kernel size = 5, padding = 2, pooling kernel = 2, pooling stride = 2 | ReLU | ReLU |
| | Dense | num. layers = 3, output size = [20,10,1] | ReLU | Sigmoid |
| Simple LSTM | LSTM | hidden size = 100 | ReLU | ReLU |
| | Dense | num. layers = 1 | - | Sigmoid |
| ConRec | CNN | kernels = [20,50], kernel size = 2, padding = 0, stride = 2 | ReLU | ReLU |
| | RNN | hidden size = 50 | ReLU | ReLU |
| | Dense | num. layers = 1 | - | Sigmoid |
| CFIN | CNN | 1D convolution, kernel size inferred | - | ReLU |
| | Attention | context vector uniformly initialised, $\mu = 0$, $\sigma = 0.05$ | - | - |
| | Dense | num. layers = 1 | - | Sigmoid |
| HAN | Local LSTM | embedding size = 5, hidden size = 50 | ReLU | ReLU |
| | Local LSTM | hidden size = 50 | ReLU | ReLU |
| | Dense | num. layers = 1 | - | Sigmoid |
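As a concrete reading of one row of the table, below is a minimal PyTorch sketch of the Simple LSTM configuration (hidden size 100, ReLU on the intermediate output, a single dense layer with a Sigmoid output). PyTorch itself, the class name, and the choice of feeding the last time step's hidden state to the dense layer are our assumptions, not necessarily the benchmark's implementation.

```python
import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    """Simple LSTM per the table: LSTM (hidden size = 100) with ReLU,
    followed by a single dense layer with a Sigmoid output."""
    def __init__(self, num_event_types: int, hidden_size: int = 100):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_event_types,
                            hidden_size=hidden_size, batch_first=True)
        self.relu = nn.ReLU()
        self.dense = nn.Linear(hidden_size, 1)  # num. layers = 1

    def forward(self, x):                     # x: (batch, ell, m) time-matrices
        out, _ = self.lstm(x)                 # (batch, ell, hidden_size)
        h = self.relu(out[:, -1, :])          # last time step, ReLU activation
        return torch.sigmoid(self.dense(h))   # risk probability in [0, 1]
```

The same pattern extends to the other rows, e.g. by swapping the recurrent layer for the convolutional stack of Simple CNN.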