Cluster Analysis for High-Dimensionality Population Health Data

Population health research seeks to understand how social, cultural, environmental, occupational and economic factors determine health status. While most population health research tests specific hypotheses, examining the bigger picture can yield insights on a larger scale. Important questions include how socio-economic factors influence or correlate with health status, how diseases group together in constellations, and how these relate to health services usage, medication usage and health-driven outcomes. Cluster analysis (CA) is a class of statistical techniques that can be applied to data exhibiting natural patterns. However, current CA methods are poorly suited to broad population health data, which may contain hundreds of variables with many dimensions. As a result, patterns between classes of variables can be lost in the statistical “noise.” Eric Sayre is developing a new method of CA called Cluster Analysis for High-Dimensionality Data (CAHDD), which provides a means of filtering statistical noise and allowing important patterns to emerge from the data. By applying CAHDD to Canadian population health data, Eric’s research seeks to answer big-picture questions about socio-economic factors and health status. CAHDD will be available to other health researchers for interpreting population health data, leading to significant advances in our understanding of the determinants of health status in our population.