ISBN-13: 9783659556401 / English / Paperback / 2014 / 112 pp.
In microarray data, a common characteristic is that the number of parameters p is far greater than the number of samples n (n≪p). Because of this feature, many existing methods derived for the usual "small p and large n" problem either cannot be applied or may not perform well. To classify tumor types in real and simulated microarray data using regularized regression and classification approaches, we studied three regression methods, namely the Least Absolute Shrinkage and Selection Operator (LASSO), ridge regression and the elastic net. We also studied four classification methods, namely k-nearest neighbors (KNN), support vector machines (SVM), regularized discriminant analysis (RDA) and diagonal linear discriminant analysis (DLDA). For evaluation, we used four readily available real microarray data sets: Colon, Brain, SRBCT and Spira. The lasso imposes an L1 penalty and ridge regression imposes an L2 penalty, whereas the elastic net balances the two. Results on both real and simulated data show that the elastic net outperforms the lasso, although both are derived from a similar concept. The comparative study also shows that RDA performs best on the Brain, SRBCT and Spira cancer data, while KNN performs better on the Colon cancer data.
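The difference between the L1, L2 and combined penalties can be made concrete with a short sketch. The pure-Python code below is a minimal illustration, not the method used in the book. It rests on several assumptions:

- It uses squared-error loss on a continuous response rather than a classification loss on tumor labels.
- It uses the elastic-net parameterization of the R package glmnet, (1/(2n))‖y − Xβ‖² + λ[α‖β‖₁ + (1 − α)/2 ‖β‖²₂], where α = 1 gives the lasso and α = 0 gives ridge regression.
- It fits simulated data with n = 20 samples and p = 50 features, mimicking the n ≪ p setting.

The helper names `standardize`, `soft_threshold` and `elastic_net`, and the values λ = 0.3 and α = 0.5, are hypothetical choices made for illustration.

```python
import random

# Minimal elastic-net sketch (pure Python, no dependencies).
# Assumed glmnet-style objective; the book's own formulation may differ:
#   (1/(2n)) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)


def standardize(X):
    """Centre each column and scale it to mean square 1 (population SD)."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mu) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def soft_threshold(z, gamma):
    """sign(z) * max(|z| - gamma, 0): the step that lets the L1 penalty zero out coefficients."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent; alpha=1 is the lasso, alpha=0 is ridge.
    Assumes standardized columns of X and a centred y, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - X b, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) x_j^T (partial residual); relies on mean(x_j^2) == 1
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + b[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync with b
                b[j] = new
    return b


# Hypothetical simulated data: far fewer samples than features (n << p).
random.seed(0)
n, p = 20, 50
X = standardize([[random.gauss(0, 1) for _ in range(p)] for _ in range(n)])
true_b = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
y = [sum(X[i][j] * true_b[j] for j in range(p)) + random.gauss(0, 0.5) for i in range(n)]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

fits = {}
for name, alpha in [("lasso", 1.0), ("elastic net", 0.5), ("ridge", 0.0)]:
    fits[name] = elastic_net(X, y, lam=0.3, alpha=alpha)
    nonzero = sum(1 for v in fits[name] if v != 0.0)
    print(f"{name:<12} nonzero coefficients: {nonzero} of {p}")
```

In a run like this, the lasso typically sets many coefficients exactly to zero. Ridge shrinks every coefficient without zeroing any. The elastic net usually keeps more nonzero coefficients than the lasso at the same λ. The exact zeros come from the soft-thresholding step of the L1 penalty. The added L2 term makes the objective strictly convex, so the ridge and elastic-net fits have a unique solution even when n ≪ p.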