Bayesian Models for Sparse Regression Analysis of High Dimensional Data*

Sylvia Richardson, Leonardo Bottolo and Jeffrey S. Rosenthal

in Bayesian Statistics 9

Published in print October 2011 | ISBN: 9780199694587
Published online January 2012 | e-ISBN: 9780191731921 | DOI:

This paper considers the task of building efficient regression models for sparse multivariate analysis of high dimensional data sets. In particular, it focuses on cases where the number q of responses Y = (y_k, 1 ≤ k ≤ q) and the number p of predictors X = (x_j, 1 ≤ j ≤ p) to analyse jointly are both large with respect to the sample size n, a challenging bi-directional task. The analysis of such data sets arises commonly in genetical genomics, with X linked to the DNA characteristics and Y corresponding to measurements of fundamental biological processes such as transcription, protein or metabolite production. Building on the Bayesian variable selection set-up for the linear model and associated efficient MCMC algorithms developed for single responses, we discuss the generic framework of hierarchical related sparse regressions, where parallel regressions of y_k on the set of covariates X are linked in a hierarchical fashion, in particular through the prior model of the variable selection indicators γ_kj, which indicate among the covariates x_j those which are associated with the response y_k in each multivariate regression. Structures for the joint model of the γ_kj, which correspond to different compromises between the aim of controlling sparsity and that of enhancing the detection of predictors that are associated with many responses ("hot spots"), will be discussed, and a new multiplicative model for the probability structure of the γ_kj will be presented. To perform inference for these models in high dimensional set-ups, novel adaptive MCMC algorithms are needed. As sparsity is paramount and most of the associations are expected to be zero, new algorithms that progressively focus on the part of the space where the most interesting associations occur are of great interest. We shall discuss their formulation and theoretical properties, and demonstrate their use on simulated and real data from genomics.
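As a rough illustration of what a multiplicative probability structure for the γ_kj could look like, the sketch below assumes that the inclusion probability factorises as a response-level term times a predictor-level term, truncated to [0, 1]. The abstract does not state this factorisation, so the names `omega` and `rho`, the Beta and Gamma distributions and all hyperparameter values are assumptions chosen for illustration only, not the chapter's specification.

```python
import numpy as np

def simulate_indicators(q, p, seed=0):
    """Draw a q x p matrix of selection indicators gamma_kj.

    Assumed (hypothetical) structure: P(gamma_kj = 1) = omega_k * rho_j,
    clipped to [0, 1]. omega_k controls the sparsity of regression k;
    rho_j inflates the probability for predictors linked to many responses.
    """
    rng = np.random.default_rng(seed)
    # Response-level sparsity: small values keep each regression sparse.
    omega = rng.beta(1.0, 20.0, size=q)
    # Predictor-level propensity with mean 1; large values mimic "hot spots".
    rho = rng.gamma(shape=2.0, scale=0.5, size=p)
    # Multiplicative probability, truncated so that it remains a probability.
    prob = np.clip(np.outer(omega, rho), 0.0, 1.0)
    # Independent Bernoulli draws given the probabilities.
    gamma = rng.random((q, p)) < prob
    return omega, rho, prob, gamma

omega, rho, prob, gamma = simulate_indicators(q=50, p=200, seed=1)
# Number of responses associated with each predictor; hot-spot candidates
# are the predictors with the largest counts.
responses_per_predictor = gamma.sum(axis=0)
```

Under these assumptions, `omega` governs how many predictors each response selects, while predictors with a large `rho` are selected across many responses at once, which is the trade-off between sparsity control and hot-spot detection that the abstract describes.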

Keywords: Adaptive MCMC scanning; eQTL; Genomics; Hierarchically related regressions; Variable selection

Chapter.  17010 words.  Illustrated.

Subjects: Probability and Statistics
