TY - GEN
T1 - Analysis of encoder representations as features using sparse autoencoders in gradient boosting and ensemble tree models
AU - Aguilar, Luis
AU - Aguilar, L. Antonio
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
AB - The performance of learning algorithms relies on factors such as the training strategy, the parameter tuning approach, and data complexity; in this scenario, extracted features play a fundamental role. Since not all features carry useful information, some can add noise and thereby decrease the performance of the algorithms. To address this issue, a variety of techniques such as feature extraction, feature engineering, and feature selection have been developed, most of which fall into the unsupervised learning category. This study explores the generation of such features using a set of k encoder layers, which produce a low-dimensional feature set F. The encoder layers were trained using a two-layer sparse autoencoder model, where PCA was used to estimate the appropriate number of hidden units in the first layer. Then, a set of four algorithms belonging to the gradient boosting and ensemble families was trained using the generated features. Finally, the performance obtained with the encoder features was compared against that obtained with the original features. The results show that using the reduced features achieves equal or better performance, and that the approach yields larger improvements on highly imbalanced data sets.
KW - Ensemble models
KW - Feature generation
KW - Gradient boosting models
KW - Sparse autoencoders
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85057128635&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-03928-8_13
DO - 10.1007/978-3-030-03928-8_13
M3 - Conference contribution
AN - SCOPUS:85057128635
SN - 9783030039271
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 159
EP - 169
BT - Advances in Artificial Intelligence – IBERAMIA 2018 - 16th Ibero-American Conference on AI, Proceedings
A2 - Fermé, Eduardo
A2 - Simari, Guillermo R.
A2 - Gutiérrez Segura, Flabio
A2 - Rodríguez Melquiades, José Antonio
PB - Springer Verlag
T2 - 16th Ibero-American Conference on Artificial Intelligence, IBERAMIA 2018
Y2 - 13 November 2018 through 16 November 2018
ER -