A Reproducibility Study of Deep and Surface Machine Learning Methods for Human-Related Trajectory Prediction

Bardh Prenkaj, Paola Velardi, Sapienza University of Rome
Damiano Distante, Stefano Faralli, University of Rome Unitelma Sapienza

Source code and data: benchmark.zip

As examples of trajectory-shaped risk prediction problems, we consider dropout prediction in online learning platforms and survival prediction in a critical-care tele-health system. We employ two datasets in the first domain and one in the second: XuetangX, KDDCup15, and eICU. The first two are benchmark datasets for student dropout prediction and contain e-tivities (homework submissions, video views, navigational actions, and so on) representing the students' interactions within several online courses.

For XuetangX, we prune all e-courses with fewer than 350 student trajectories, which leaves us with 19 courses overall, whereas we leave KDDCup15 as is, since all of its courses are sufficiently populated. The medical dataset eICU describes fatalities among patients admitted to multi-centre critical care units in US hospitals between 2014 and 2015. Patient-related events include laboratory tests, medications, admissions, physical examinations, and visits. We prune all patient trajectories containing no events, which results in a final set of 65k trajectories. For each dataset, as suggested in [10], we perform a daily grouping of the events concerning each individual (student or patient). Therefore, we represent each individual's trajectory $u$ with a time-matrix $\mathcal{T}_u \in \mathbb{R}^{\ell \times m}$, where $\ell$ is the length of the adopted time window in days and $m$ is the number of different event types (i.e., features) available in the dataset. In detail, cell $(i,j)$ of $\mathcal{T}_u$ contains the number of events of type $j$ occurring on day $i$ of trajectory $u$.
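To make the construction of $\mathcal{T}_u$ concrete, the sketch below builds the time-matrix from a per-individual event log. The input format (an iterable of (day, event type) pairs) and the function name build_time_matrix are our assumptions for illustration; the paper does not prescribe a particular preprocessing interface.

```python
import numpy as np

def build_time_matrix(events, ell, event_types):
    """Build the time-matrix T_u in R^{ell x m} for a single individual.

    `events` is assumed to be an iterable of (day, event_type) pairs, where
    `day` is an integer offset in [0, ell) from the start of the observation
    window; names and input format are illustrative only.
    """
    type_index = {t: j for j, t in enumerate(event_types)}   # event type -> column j
    T = np.zeros((ell, len(event_types)), dtype=np.int64)
    for day, etype in events:
        if 0 <= day < ell:                  # ignore events outside the time window
            T[day, type_index[etype]] += 1  # cell (i, j): count of type-j events on day i
    return T

# Toy example: a 3-day window over two event types
events = [(0, "video"), (0, "video"), (2, "homework")]
T_u = build_time_matrix(events, ell=3, event_types=["video", "homework"])
# T_u == [[2, 0], [0, 0], [0, 1]]
```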


[10] L. Haiyang, Z. Wang, P. Benachour, and P. Tubman. 2018. A Time Series Classification Method for Behaviour-Based Dropout Prediction. In ICALT 2018, 191–195.

The table below reports the chosen hyperparameters and activation functions for each deep method.


Method        Component    Parameters                                            Act. (intermediate)  Act. (output)
------------  -----------  ----------------------------------------------------  -------------------  -------------
DNN           -            num. layers = [3, 5], $\alpha = 2$                    ReLU                 Sigmoid
Simple CNN    CNN          channels = [6, 16], kernel size = 5, padding = 2,     ReLU                 ReLU
                           pooling kernel = 2, pooling stride = 2
              Dense        num. layers = 3, output size = [20, 10, 1]            ReLU                 Sigmoid
Simple LSTM   LSTM         hidden size = 100                                     ReLU                 ReLU
              Dense        num. layers = 1                                       -                    Sigmoid
ConRec        CNN          kernels = [20, 50], kernel size = 2, padding = 0,     ReLU                 ReLU
                           stride = 2
              RNN          hidden size = 50                                      ReLU                 ReLU
              Dense        num. layers = 1                                       -                    Sigmoid
CFIN          CNN          1D convolution, kernel size inferred                  -                    ReLU
              Attention    context vector uniformly initialised,                 -                    -
                           $\mu = 0$, $\sigma = 0.05$
              Dense        num. layers = 1                                       -                    Sigmoid
HAN           Local LSTM   embedding size = 5, hidden size = 50                  ReLU                 ReLU
              Local LSTM   hidden size = 50                                      ReLU                 ReLU
              Dense        num. layers = 1                                       -                    Sigmoid
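As a worked example of how a row of this table translates into a model, below is a minimal PyTorch sketch of the Simple CNN configuration: two convolutional blocks (channels [6, 16], kernel size 5, padding 2, 2x2 max-pooling) followed by a three-layer dense head with output sizes [20, 10, 1]. Treating the time-matrix $\mathcal{T}_u$ as a single-channel $\ell \times m$ image is our assumption, and the class name and input layout are illustrative, not the reference implementation.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Sketch of the 'Simple CNN' row: conv blocks with channels [6, 16],
    kernel 5, padding 2, and 2x2 max-pooling, then a dense head [20, 10, 1].
    Input layout (1-channel ell x m image) is an assumption for illustration.
    """
    def __init__(self, ell, m):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # padding=2 preserves height/width; each pooling halves them (floor)
        flat = 16 * (ell // 4) * (m // 4)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 20), nn.ReLU(),
            nn.Linear(20, 10), nn.ReLU(),
            nn.Linear(10, 1), nn.Sigmoid(),  # risk probability in [0, 1]
        )

    def forward(self, x):  # x: (batch, 1, ell, m)
        return self.head(self.features(x))

# Usage with a 30-day window and 8 event types (arbitrary example values)
model = SimpleCNN(ell=30, m=8)
probs = model(torch.zeros(4, 1, 30, 8))  # -> (4, 1) predicted probabilities
```

The sigmoid output matches the binary nature of both tasks (dropout vs. completion, survival vs. fatality); the other table rows differ only in the feature extractor placed before the dense head.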