Morphology-preserving reconstruction of time series with missing data for enhancing deep learning-based classification

Overfitting is a growing concern for deep learning-based decoding methods applied to biomedical time series. On small datasets, particularly those that rely mainly on subject-specific analyses, these decoding techniques fit the training data too closely and may consequently fail to generalize to future observations. To address this overfitting issue, methods that expand datasets without introducing extra noise or losing important information are in high demand. To this end, this work proposes using delay-embedding-based nonlinear principal component analysis (DE-NLPCA) to generate synthetic time series. The idea is inspired by unsupervised learning of a topological representation of the input space, which can benefit the augmentation of biomedical time series, as these tend to be high-dimensional and morphologically complex. Several types of time series with different temporal complexity were used for evaluation. One was an open dataset on activities of daily living (ADL), collected from 10 healthy participants performing 186 ADL-related instances of activity while wearing 9-axis inertial measurement units. Another was an experimental dataset from patients with healthy brains undergoing surgery (N = 20), recorded with the BrainStatus device using 10 EEG channels. Under leave-one-subject-out cross-validation, an increase in classification accuracy of up to 14.72% was observed on the anesthesia dataset when DE-NLPCA-based augmented data were introduced during training. Classification performance also improved more with the DE-NLPCA-based technique than with augmentation using a conditional generative adversarial network (CGAN). The DE-NLPCA-based approach was further shown to recover the time–frequency characteristics of contaminated signals.
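To make the delay-embedding step concrete, the following is a minimal sketch and not the authors' implementation. It builds a delay-embedding matrix from a one-dimensional series, projects it onto a few principal components, and maps the result back to a series by averaging overlapping entries. Ordinary linear PCA (via SVD) is used here as a stand-in for the nonlinear PCA of DE-NLPCA, and the function names `delay_embed`, `inverse_embed`, and `low_rank_reconstruct`, as well as the parameters `dim`, `tau`, and `n_components`, are hypothetical names chosen for illustration.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Delay-embedding matrix of a 1-D series.

    Row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]].
    """
    x = np.asarray(x, dtype=float)
    n_rows = x.size - (dim - 1) * tau
    if n_rows < 1:
        raise ValueError("series too short for this dim and tau")
    idx = np.arange(n_rows)[:, None] + tau * np.arange(dim)[None, :]
    return x[idx]

def inverse_embed(emb, tau=1):
    """Map an embedding matrix back to a series.

    Each time index appears in several rows; the entries are averaged.
    """
    n_rows, dim = emb.shape
    if n_rows < tau:
        raise ValueError("too few rows to cover every time index")
    n = n_rows + (dim - 1) * tau
    idx = np.arange(n_rows)[:, None] + tau * np.arange(dim)[None, :]
    total = np.zeros(n)
    count = np.zeros(n)
    np.add.at(total, idx, emb)   # accumulate values per time index
    np.add.at(count, idx, 1.0)   # count contributions per time index
    return total / count

def low_rank_reconstruct(x, dim, n_components, tau=1):
    """Reconstruct x from the leading principal components of its embedding.

    Assumption: linear PCA replaces the nonlinear PCA used in DE-NLPCA.
    """
    emb = delay_embed(x, dim, tau)
    mean = emb.mean(axis=0)
    u, s, vt = np.linalg.svd(emb - mean, full_matrices=False)
    k = n_components
    approx = (u[:, :k] * s[:k]) @ vt[:k] + mean
    return inverse_embed(approx, tau)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(500)
    clean = np.sin(2 * np.pi * t / 50)
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    smooth = low_rank_reconstruct(noisy, dim=40, n_components=2)
    print("noisy RMSE: ", np.sqrt(np.mean((noisy - clean) ** 2)))
    print("smooth RMSE:", np.sqrt(np.mean((smooth - clean) ** 2)))
```

In this sketch, keeping fewer components yields a smoother reconstruction. A nonlinear variant would replace the SVD projection with a learned nonlinear mapping such as an autoencoder bottleneck, which is beyond this illustration.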