A Reproducibility Study of Deep and Surface Machine Learning Methods for Human-Related Trajectory Prediction
Bardh Prenkaj, Paola Velardi, Sapienza University of Rome
Source code and data: benchmark.zip
As examples of trajectory-shaped risk prediction problems, we consider dropout prediction in online learning platforms and survival prediction in a critical care tele-health system. We employ two datasets in the first domain and one in the second: XuetangX, KDDCup15, and eICU. The first two are benchmark datasets for student dropout prediction and contain e-tivities (homework submissions, video views, navigational actions, and so on) representing the students' interactions within several online courses.
For XuetangX, we prune all e-courses with fewer than 350 student trajectories, which leaves us with 19 courses overall, whereas we leave KDDCup15 as is, since all of its courses are sufficiently populated. The medical dataset eICU describes fatalities of patients admitted to multi-centre critical care units in US hospitals between 2014 and 2015. Patient-related events include laboratory tests, medications, admissions, physical examinations, and visits. We prune all patient trajectories containing zero events, which results in a final set of 65k trajectories. For each dataset, as suggested in [10], we perform a daily grouping of events concerning each individual (student or patient). Therefore, we represent each individual $u$'s trajectory with a time-matrix $\mathcal{T}_u \in \mathbb{R}^{\ell \times m}$, where $\ell$ is the length of the adopted time window in days and $m$ is the number of distinct event types (i.e., features) available in the dataset. In detail, cell $(i,j)$ of $\mathcal{T}_u$ contains the number of events of type $j$ found on day $i$ of trajectory $u$.
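To make the daily-grouping step concrete, the following is a minimal sketch of how such a time-matrix can be assembled from raw event logs. It assumes a pandas DataFrame with hypothetical columns `user_id`, `event_type`, and `timestamp`; the column names and the helper itself are illustrative, not the study's actual preprocessing code.

```python
import numpy as np
import pandas as pd

def build_time_matrix(events: pd.DataFrame, user_id, event_types, ell: int):
    """Assemble the time-matrix T_u (ell x m): cell (i, j) counts the
    events of type j observed on day i of individual u's trajectory."""
    u = events[events["user_id"] == user_id]
    # Day index i, relative to the individual's first recorded event.
    day0 = u["timestamp"].min().normalize()
    days = (u["timestamp"].dt.normalize() - day0).dt.days
    col = {e: j for j, e in enumerate(event_types)}  # event type -> column j
    T = np.zeros((ell, len(event_types)), dtype=np.int64)
    for i, etype in zip(days, u["event_type"]):
        if i < ell and etype in col:  # clip to the adopted time window
            T[i, col[etype]] += 1
    return T
```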
The table below reports the chosen parameters and activation functions for each deep method.
| Method | Component | Parameters | Activation (intermediate layer) | Activation (output layer) |
|---|---|---|---|---|
| DNN | - | num. layers = [3,5], $\alpha = 2$ | ReLU | Sigmoid |
| Simple CNN | CNN | channels = [6,16], kernel size = 5, padding = 2, pooling kernel = 2, pooling stride = 2 | ReLU | ReLU |
| | Dense | num. layers = 3, output size = [20,10,1] | ReLU | Sigmoid |
| Simple LSTM | LSTM | hidden size = 100 | ReLU | ReLU |
| | Dense | num. layers = 1 | - | Sigmoid |
| ConRec | CNN | kernels = [20,50], kernel size = 2, padding = 0, stride = 2 | ReLU | ReLU |
| | RNN | hidden size = 50 | ReLU | ReLU |
| | Dense | num. layers = 1 | - | Sigmoid |
| CFIN | CNN | 1D convolution, kernel size inferred | - | ReLU |
| | Attention | context vector uniformly initialised, $\mu = 0$, $\sigma = 0.05$ | - | - |
| | Dense | num. layers = 1 | - | Sigmoid |
| HAN | Local LSTM | embedding size = 5, hidden size = 50 | ReLU | ReLU |
| | Local LSTM | hidden size = 50 | ReLU | ReLU |
| | Dense | num. layers = 1 | - | Sigmoid |
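As a concrete reading of one row of the table, below is a minimal PyTorch sketch of the Simple LSTM configuration (hidden size 100, ReLU on the intermediate output, a single dense layer with a Sigmoid output). PyTorch itself, the class name, and the choice of feeding the last time step's hidden state to the dense layer are our assumptions, not necessarily the benchmark's implementation.

```python
import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    """Simple LSTM per the table: LSTM (hidden size = 100) with ReLU,
    followed by a single dense layer with a Sigmoid output."""
    def __init__(self, num_event_types: int, hidden_size: int = 100):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_event_types,
                            hidden_size=hidden_size, batch_first=True)
        self.relu = nn.ReLU()
        self.dense = nn.Linear(hidden_size, 1)  # num. layers = 1

    def forward(self, x):                     # x: (batch, ell, m) time-matrices
        out, _ = self.lstm(x)                 # (batch, ell, hidden_size)
        h = self.relu(out[:, -1, :])          # last time step, ReLU activation
        return torch.sigmoid(self.dense(h))   # risk probability in [0, 1]
```

The same pattern extends to the other rows, e.g. by swapping the recurrent layer for the convolutional stack of Simple CNN.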