Projects

B.S. Statistics - Data Science Track

Minor - Computer Science

Highlights #

Big Data & High Performance Statistical Computing #

High-performance computing in high-level data analysis languages; different computational approaches and paradigms for efficient analysis of big data; interfaces to compiled languages; high-level parallel computing; MapReduce; parallel algorithms and reasoning. Python. R. SQL.

Categorical Data #

Varieties of categorical data, cross-classifications, contingency tables, tests for independence. Multidimensional tables and log-linear models, maximum likelihood estimation; tests of goodness-of-fit. Logit models, linear logistic models. Analysis of incomplete tables. Packaged computer programs, analysis of real data. R.

Project: Byssinosis (Farmer’s Lung Disease) Determination. Link TBU (Apr 2020).

Data and Web Technologies for Data Analysis #

Essentials of using relational databases and SQL. Processing data in blocks. Scraping Web pages and using Web services/APIs. Basics of text mining. Interactive data visualization with Web technologies. Computational data workflow and best practices. Statistical methods. Team lead. R. SQL.

Project: Temporal Birth Trends (2000-2015). Link TBU (Apr 2020).

Database Systems #

Team development and testing of an L-Store-style database system. Database modeling and design (E/R model, relational model), relational algebra, query languages, file and index structures, query processing, transaction management. Python. SQL.

Project: JellyDB - GitHub

Machine Learning #

Supervised & unsupervised learning, including classification, dimensionality reduction, regression & clustering using modern machine learning methods. Applications of machine learning in biology (oncology) and engineering. Python.

Computational Foundation and Focus #

Agent-Based Modeling #

Agent-based computer simulation and analysis with emphasis on learning how to model animals, including humans, to achieve insight into social and group behavior. Referred by instructor for continued studies in mate-choice and evolutionary game-theory modeling. Java.

Project: Effects of Similarity and Attractiveness in Mate Selection. Link TBU (Apr 2020).

Algorithm Design and Analysis #

Complexity of algorithms, bounds on complexity, analysis methods. Searching, sorting, pattern matching, graph algorithms. Algorithm design techniques: divide-and-conquer, greedy, dynamic programming. Approximation methods. NP-complete problems. Python.

Computational Linguistics #

Understanding the nature of language through computer modeling of linguistic abilities. Relationships between human cognition and computer representations of cognitive processing.

Practice in Data Science #

Principles and practice of interdisciplinary, collaborative data analysis; complete case study review and team data analysis project. R.

Statistical Data Science #

Introduction to computing for data analysis and visualization, and simulation, using a high-level language. Computational reasoning, computationally intensive statistical methods, reading tabular and non-standard data. R.

Statistical Foundation #

ANOVA #

Foundational experiment design. One- and two-way ANOVA. Random effects modeling. R.

Project: Bird-Nest Size Relationships. Link TBU (Apr 2020).

Applied Linear Algebra #

Extensive problem solving. Applications of linear algebra; LU and QR matrix factorizations, eigenvalue and singular value matrix decompositions. MATLAB.

Project: Classification of Handwritten Digits. Link TBU (Apr 2020).

Mathematical Statistics #

Sampling, methods of estimation, bias-variance decomposition, sampling distributions, Fisher information, confidence intervals, and some elements of hypothesis testing.

Testing theory, tools and applications from probability theory, linear model theory, ANOVA, goodness-of-fit.

Multivariate Analysis #

Multivariate normal distribution; Mahalanobis distance; sampling distributions of the mean vector and covariance matrix; Hotelling's T²; simultaneous inference; one-way MANOVA; discriminant analysis; principal components; canonical correlation; factor analysis. Intensive use of computer analyses and real data sets. R.

Probability Theory #

Fundamental concepts of probability theory, discrete and continuous random variables, standard distributions, moments and moment-generating functions, laws of large numbers and the central limit theorem. R.

Regression Analysis #

Simple and multiple linear regression, variable selection techniques, stepwise regression, analysis of covariance, influence measures. R.

Project: Regression Notes

Statistical Learning #

Fundamental concepts and methods in statistical learning with emphasis on supervised learning. Principles, methodologies and applications of parametric and nonparametric regression, classification, resampling and model selection techniques. Python.
