Robust, Scalable and Fast Bootstrap Method for Analyzing Large Scale Data

Location:

JACOBS HALL (EBU1) // UC San Diego

BOOKER CONFERENCE SUITE (#2512)

11/03/2016

This talk addresses the problem of performing statistical inference on large-scale data sets, i.e., Big Data. The volume and dimensionality of the data may be so high that they cannot be processed or stored on a single computing node. We propose a scalable, statistically robust and computationally efficient bootstrap method that is compatible with distributed processing and storage systems. The proposed method combines distributed bootstrap with computationally efficient fixed-point equations; many statistically robust and highly efficient estimators lend themselves to such computation. Bootstrap resamples are constructed from a small number of distinct data points drawn from multiple disjoint subsets of the data, similar to the Bag of Little Bootstraps (BLB) method of Kleiner et al. This facilitates distributed storage and computation in inference. Significant computational savings are achieved by avoiding re-computation of the estimator for each bootstrap sample. Instead, an initial estimate is improved using an efficient fixed-point estimating equation, and an analytically derived correction term compensates for the underestimated variability. The proposed bootstrap method thus facilitates the use of highly robust statistical methods in analyzing large-scale data sets. Its favorable statistical properties are established analytically. Numerical examples of finding confidence intervals in parameter estimation and hypothesis testing problems demonstrate the scalability, low complexity and robust statistical performance of the method on large data sets.
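The workflow in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the speaker's implementation. The robust estimator is a Huber M-estimator of location with a fixed MAD scale (a stand-in for the highly robust estimators the method targets). The tuning constant C = 1.345, the subset size b = n^0.7 and the percentile interval are illustrative choices, and all function names (`huber_weight`, `fixed_point_map`, `huber_location`, `correction_factor`, `blb_fast_robust_ci`) are hypothetical. Each disjoint subset gets one full fixed-point solve. Every bootstrap resample (multinomial counts over the subset's b distinct points, summing to n) then costs only one step of the fixed-point map g, started from that estimate, followed by the linear correction 1/(1 - g'(μ̂)). This is the scalar form of the correction used in fast robust bootstrap schemes; whether the talk's correction term takes exactly this form is an assumption. Per-subset interval endpoints are averaged, as in BLB.

```python
import random
import statistics
from collections import Counter

# Hypothetical sketch: a Huber location M-estimator with a fixed MAD scale
# stands in for the robust estimators discussed in the talk (assumption).
C = 1.345  # Huber tuning constant (assumed, conventional choice)


def huber_weight(r):
    # w(r) = psi(r) / r for Huber's psi: 1 inside [-C, C], C / |r| outside.
    return 1.0 if abs(r) <= C else C / abs(r)


def fixed_point_map(mu, xs, counts, scale):
    # g(mu): the weighted mean on the right-hand side of mu = g(mu).
    # counts are resample multiplicities of the distinct points xs.
    ws = [k * huber_weight((x - mu) / scale) for x, k in zip(xs, counts)]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


def huber_location(xs, scale, tol=1e-10, max_iter=500):
    # Full fixed-point iteration from the median; done once per subset.
    ones = [1] * len(xs)
    mu = statistics.median(xs)
    for _ in range(max_iter):
        new = fixed_point_map(mu, xs, ones, scale)
        if abs(new - mu) < tol:
            return new
        mu = new
    return mu


def correction_factor(mu_hat, xs, scale):
    # Linear correction 1 / (1 - g'(mu_hat)). At the fixed point only the
    # downweighted points (|r| > C) contribute to g', giving
    # g'(mu_hat) = sum of their weights / sum of all weights.
    ws = [huber_weight((x - mu_hat) / scale) for x in xs]
    outer = sum(w for w, x in zip(ws, xs) if abs(x - mu_hat) / scale > C)
    return 1.0 / (1.0 - outer / sum(ws))


def blb_fast_robust_ci(data, n_subsets=5, b=None, n_resamples=100,
                       alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(data)
    b = b or int(n ** 0.7)  # subset size (assumed BLB-style choice)
    # Disjoint subsets of b distinct points; each could live on its own node.
    picked = rng.sample(data, n_subsets * b)
    lowers, uppers = [], []
    for j in range(n_subsets):
        xs = picked[j * b:(j + 1) * b]
        med = statistics.median(xs)
        scale = statistics.median(abs(x - med) for x in xs) / 0.6745  # MAD
        mu_hat = huber_location(xs, scale)  # costly step, once per subset
        corr = correction_factor(mu_hat, xs, scale)
        boot = []
        for _ in range(n_resamples):
            # Resample of nominal size n supported on the b distinct points.
            tally = Counter(rng.choices(range(b), k=n))
            counts = [tally[i] for i in range(b)]
            # One cheap fixed-point step from mu_hat, then the correction.
            mu_one = fixed_point_map(mu_hat, xs, counts, scale)
            boot.append(mu_hat + corr * (mu_one - mu_hat))
        boot.sort()
        lowers.append(boot[int(alpha / 2 * n_resamples)])
        uppers.append(boot[int((1 - alpha / 2) * n_resamples) - 1])
    # BLB aggregation: average the per-subset percentile interval endpoints.
    return statistics.mean(lowers), statistics.mean(uppers)


if __name__ == "__main__":
    gen = random.Random(1)
    data = [gen.gauss(10.0, 1.0) for _ in range(950)] + [100.0] * 50
    print(blb_fast_robust_ci(data))
```

Each resample touches only the b distinct points of its subset and never reruns the full iteration. Here the counts are drawn with `random.choices`, which is O(n) per resample; a direct multinomial sampler would remove even that cost.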


Visa Koivunen (IEEE Fellow) has been Professor of Signal Processing at Aalto University since 1999. He leads the Statistical and Sensor Array Signal Processing team at Aalto. He was a postdoctoral researcher at the University of Pennsylvania from 1992 to 1995. From 2010 to 2014 he was an Academy Professor. From 2006 to 2012 he was a Nokia Visiting Fellow. He has spent two full sabbaticals at Princeton University and has held a visiting fellow appointment there since 2010. His research interests include statistical, communications and sensor array signal processing, as well as large-scale data analysis. He has received four conference Best Paper Awards and the IEEE Signal Processing Society Best Paper Award in 2007. He has served on the editorial boards of the IEEE Transactions on Signal Processing and IEEE Signal Processing Magazine. He is a member of the IEEE Fourier Award board and a Distinguished Lecturer of the IEEE Signal Processing Society (SPS). He received the 2015 EURASIP Technical Achievement Award for fundamental contributions to statistical signal processing and its applications in wireless communications, radar and related fields.