Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/11301
Title: A comparative study on some techniques for fitting linear regression models to big data
Authors: Mankoe, Mathias
Keywords: Big Data; Data Reduction; Data Simulation; Merge and Reduce; Random Projections; Regression Analysis
Issue Date: Oct-2023
Publisher: University of Cape Coast
Abstract: This study examines the applicability of two Random Projection methods and the Merge and Reduce method, all widely used in Computer Science, to linear regression analysis of big data in Statistics. The Clarkson-Woodruff and Rademacher Matrix projections, as well as the Merge and Reduce technique, are used to reduce big data sets before a linear regression analysis is performed. The Classical Merge and Reduce approach uses parameter estimates and standard errors as summary values. The Bayesian Merge and Reduce approach uses characteristics of the posterior distribution as summary statistics. The study reveals that the techniques considered in this thesis are good data reduction techniques for fitting linear regression models to big data sets. The Clarkson-Woodruff method is faster and provides more reliable reduced data sets for linear regression analysis. The Merge and Reduce models better approximate the true Poisson and linear regression models, provided there are enough observations per variable per block (5000 observations per block). However, for data sets with unbalanced factor variables, the Bayesian Merge and Reduce models approximate the true models better than the Classical Merge and Reduce models. The Merge and Reduce models approximate the true models well when outliers are evenly distributed among blocks, although the standard errors are overestimated for models without intercept terms. When outliers are unevenly distributed, the Random Projection methods provide reliable results. The methods considered in this thesis are largely used in Computer Science, but they can also be used for efficient linear regression analysis of big data sets.
Description: xi, 109 p.; ill.
URI: http://hdl.handle.net/123456789/11301
Appears in Collections: Department of Mathematics & Statistics

Files in This Item:
File | Description | Size | Format
MANKOE, 2022.pdf | MPhil thesis | 3.21 MB | Adobe PDF

