University of Cape Coast Institutional Repository

A comparative study on some techniques for fitting linear regression models to big data

dc.contributor.author Mankoe, Mathias
dc.date.accessioned 2024-12-04T13:58:57Z
dc.date.available 2024-12-04T13:58:57Z
dc.date.issued 2023-10
dc.identifier.uri http://hdl.handle.net/123456789/11301
dc.description xi, 109 p. : ill. en_US
dc.description.abstract This study examines the applicability of two Random Projection methods and the Merge and Reduce methods, all widely used in Computer Science, to linear regression analysis of big data in Statistics. The Clarkson-Woodruff and Rademacher Matrix projections, as well as the Merge and Reduce techniques, are used to reduce the data before a linear regression analysis is performed on big data sets. The Classical Merge and Reduce approach uses parameter estimates and standard errors as summary values. The Bayesian Merge and Reduce approach uses characteristics of the posterior distribution as summary statistics. The study reveals that the techniques considered in this thesis are good data reduction techniques for fitting linear regression models to big data sets. The Clarkson-Woodruff method provides faster and more reliable reduced data sets for linear regression analysis. The Merge and Reduce models better approximate the true Poisson and linear regression models provided there are enough observations per variable per block (5000 observations per block). However, for data sets with unbalanced factor variables, the Bayesian Merge and Reduce models approximate the true models better than the Classical Merge and Reduce models. The Merge and Reduce models approximate the true models well when outliers are evenly distributed among blocks, but the standard errors are overestimated for models without intercept terms. When outliers are unevenly distributed, the Random Projection methods provide reliable results. The methods considered in this thesis are largely used in Computer Science, but they can also be used for efficient linear regression analysis of big data sets. en_US
dc.language.iso en en_US
dc.publisher University of Cape Coast en_US
dc.subject Big Data; Data Reduction; Data Simulation; Merge and Reduce; Random Projections; Regression Analysis en_US
dc.title A comparative study on some techniques for fitting linear regression models to big data en_US
dc.type Thesis en_US
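
The abstract describes reducing a large data set with a random projection before fitting a linear regression. The following is a minimal Python sketch of that general idea, not the thesis's own code: the function names, the simulated data, and the sizes n, d, and k are all assumptions chosen for illustration. It compresses n rows to k rows with a Clarkson-Woodruff (CountSketch) projection and with a dense Rademacher (random sign) matrix, then fits ordinary least squares on each reduced data set.

```python
import numpy as np

def clarkson_woodruff_sketch(X, y, k, rng):
    """Hypothetical helper: each row is sent to one random bucket
    (out of k) and added there with a random sign."""
    n = X.shape[0]
    buckets = rng.integers(0, k, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SX = np.zeros((k, X.shape[1]))
    Sy = np.zeros(k)
    # Unbuffered accumulation so rows sharing a bucket are summed.
    np.add.at(SX, buckets, signs[:, None] * X)
    np.add.at(Sy, buckets, signs * y)
    return SX, Sy

def rademacher_sketch(X, y, k, rng):
    """Hypothetical helper: dense k x n matrix of +/-1 entries,
    scaled by 1/sqrt(k)."""
    n = X.shape[0]
    S = rng.choice([-1.0, 1.0], size=(k, n)) / np.sqrt(k)
    return S @ X, S @ y

def ols(X, y):
    """Least-squares coefficient estimates."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n, d, k = 5000, 5, 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta_true = np.arange(1.0, d + 1.0)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_full = ols(X, y)                                   # fit on all n rows
beta_cw = ols(*clarkson_woodruff_sketch(X, y, k, rng))  # fit on k rows
beta_rad = ols(*rademacher_sketch(X, y, k, rng))        # fit on k rows

print("full data        :", np.round(beta_full, 3))
print("Clarkson-Woodruff:", np.round(beta_cw, 3))
print("Rademacher       :", np.round(beta_rad, 3))
```

In this sketch the Clarkson-Woodruff projection touches each row only once, whereas the Rademacher projection requires a full k by n matrix product, which is one plausible reason for the speed difference the abstract reports.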
