Principal component analysis
Principal component analysis is a versatile statistical method for reducing a cases-by-variables data table to its essential features, called principal components. Principal components are a few linear combinations of the original variables that maximally explain the variance of all the variables. In the process, the method provides an approximation of the original data table using only these few major components. This Primer presents a comprehensive review of the method’s definition and geometry, as well as the interpretation of its numerical and graphical results. The main graphical result is often a biplot, which uses the major components to map the cases and adds the original variables to support the distance interpretation of the cases’ positions. Variants of the method are also treated, such as the analysis of grouped data and the analysis of categorical data, known as correspondence analysis. Also described and illustrated are recent innovative applications of principal component analysis: estimating missing values in huge data matrices, sparse component estimation, and the analysis of images, shapes and functions. Supplementary material includes video animations and computer scripts in the R environment.
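The core idea summarized above can be illustrated with a minimal sketch: principal components are linear combinations that maximally explain variance, and the first few of them give a low-rank approximation of the data table. The sketch assumes Python with NumPy (it is not the R code that accompanies the Primer) and computes PCA from the singular value decomposition (SVD) of the centred, optionally standardized, data. The function name `pca_svd` and its arguments are hypothetical, chosen only for illustration.

```python
import numpy as np


def pca_svd(X, k=2, standardize=True):
    """Minimal PCA sketch via the singular value decomposition (SVD).

    X is an n-by-p cases-by-variables array. Returns the case scores,
    the variable loadings, the proportion of total variance explained
    by each component, and the rank-k approximation of X.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Y = X - mean                          # centre each variable (column)
    scale = np.ones(X.shape[1])
    if standardize:
        # Unit variance per variable, i.e. PCA of the correlation matrix
        # (assumes no variable is constant).
        scale = Y.std(axis=0, ddof=1)
        Y = Y / scale
    # Thin SVD: Y = U diag(s) Vt, singular values in decreasing order.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    explained = s**2 / np.sum(s**2)       # share of total variance per component
    scores = U[:, :k] * s[:k]             # case coordinates on the first k components
    loadings = Vt[:k].T                   # p-by-k, each column has unit length
    # Rank-k least-squares approximation of Y (Eckart-Young), mapped back
    # to the original units by undoing the scaling and centring.
    approx = (scores @ loadings.T) * scale + mean
    return scores, loadings, explained, approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    scores, loadings, explained, approx = pca_svd(X, k=2)
    print("variance explained:", explained.round(3))
```

Standardizing first corresponds to PCA of the correlation matrix, a common choice when variables are measured on different scales; with `standardize=False` the analysis uses the covariance matrix instead. The matrix `approx` is optimal in the least-squares sense on the scale at which the SVD was computed. Plotting `scores` as points and `loadings` as arrows gives one common biplot scaling (cases in principal coordinates, variables as unit-length directions); other scalings exist.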
Code availability
Several datasets and the R scripts that produce certain results in this Primer can be found on GitHub at: https://github.com/michaelgreenacre/PCA.
Acknowledgements
This review is dedicated to the memory of Professor Cas Troskie, who was the head of the Department of Statistics at the University of Cape Town, both teacher and mentor to M.G. and T.H., and who planted the seeds of principal component analysis in them at an early age. T.H. was partially supported by grants DMS2013736 and IIS1837931 from the National Science Foundation, and grant 5R01 EB001988-21 from the National Institutes of Health. E.T. was supported by the Stanford Data Science Institute.
Author information
Authors and Affiliations
- Department of Economics and Business, Universitat Pompeu Fabra and Barcelona School of Management, Barcelona, Spain Michael Greenacre
- Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, Netherlands Patrick J. F. Groenen
- Departments of Statistics and Biomedical Science, Stanford University, Stanford, CA, USA Trevor Hastie
- Department of Political Sciences, University of Naples Federico II, Naples, Italy Alfonso Iodice D’Enza
- Department of Primary Education, Democritus University of Thrace, Alexandroupolis, Greece Angelos Markos
- Department of Statistics, Stanford University, Stanford, CA, USA Elena Tuzhilina