Separating the wheat from the chaff: on feature selection and feature importance in regression random forests and symbolic regression

Stijven, Sean ; Minnebo, Wouter ; Vladislavleva, Katya

Proceedings of the 13th annual conference companion on genetic and evolutionary computation, 2011, p.623-630

ACM

Full text available

  • Title:
    Separating the wheat from the chaff: on feature selection and feature importance in regression random forests and symbolic regression
  • Authors: Stijven, Sean ; Minnebo, Wouter ; Vladislavleva, Katya
  • Subjects: feature selection ; genetic programming ; random forests ; symbolic regression ; variable importance ; variable selection
  • Is part of: Proceedings of the 13th annual conference companion on genetic and evolutionary computation, 2011, p.623-630
  • Description: Feature selection in high-dimensional data sets is an open problem with no universal satisfactory method available. In this paper we discuss the requirements for such a method with respect to the various aspects of feature importance and explore them using regression random forests and symbolic regression. We study 'conventional' feature selection with both methods on several test problems and a case study, compare the results, and identify the conceptual differences in generated feature importances. We demonstrate that random forests might overlook important variables (significantly related to the response) for various reasons, while symbolic regression identifies all important variables if models of sufficient quality are found. We explain the results by the fact that variable importances obtained by these methods have different semantics.
  • Publisher: ACM
  • Language: English