An intense review of the elementary aspects of computer programming using both R and Python, and an introduction to a variety of numerical and computational problems. Topics include functions, recursion, loops, list comprehensions, reading and writing files, importing web sites, generating random numbers, the method of inverse transformations, acceptance/rejection sampling, gradient descent, bootstrapping techniques, matrix and vector operations, and graphics.
An intense review of linear algebra. Topics include matrix operations, special matrices, linear systems of equations, the inverse matrix, and determinants; vectors, subspaces, linear independence, basis and dimension, row space, column space, rank, and the rank-nullity theorem; eigenvectors, eigenvalues, computational methods for finding eigenvectors and eigenvalues, and diagonalization of matrices; the LU decomposition and singular value decomposition.
An introduction to the history of "big data" and four ideas driving the revolution in data analytics: volume, velocity, variety, and veracity. Students will read current newspaper and journal articles, listen to guest speakers, and complete case studies. After finishing this gateway course, students should understand how businesses, governments, and not-for-profit institutions are creating stakeholder value by more effectively capturing, curating, storing, searching, sharing, analyzing, and visualizing data.
An intense review of elementary probability and statistics. Topics include random variables, probability mass functions, density functions, the cumulative distribution function, moments, maximum likelihood estimation, and the method of moments; one- and two-sample hypothesis tests and confidence intervals involving proportions, means, and correlation coefficients; the axioms of Kolmogorov, independence, the law of total probability, and Bayes' Theorem; and multivariate distributions, indicator random variables, and conditional expectation.
This course is an intensive introduction to linear models, with a focus on both principles and practice. Examples from finance, business, marketing, and economics are emphasized. Large data sets are used frequently. Topics include simple and multiple linear regression; weighted, generalized, and outlier-resistant least squares regression; interaction terms; transformations; regression diagnostics and addressing violations of regression assumptions; variable selection techniques such as backward elimination and forward selection; and logit/probit models. Statistical packages include R and SAS.
In this course, students will read case studies and hear from guest speakers about challenges and opportunities generated by the advent of "big data." Students will make group presentations and write critical response papers related to these case studies. Students will consider some of the traditional business frameworks (e.g., SWOT analysis) for evaluating the strategic opportunities available to a company in the "big data" space.
A survey of the theory and application of time series models, with a particular emphasis on financial and business applications (e.g., exchange rates, sales data, and Value-at-Risk). Tools for model identification, estimation, and assessment are developed in depth. Smoothing methods and trend/seasonal decomposition methods are covered as well, including moving average, exponential, Holt-Winters, and Lowess smoothing techniques. Finally, volatility clustering is modeled through ARCH, GARCH, EGARCH, and GARCH-in-mean specifications. Statistical packages include R and SAS.
Provides both skills and experience in working with clients and opportunities to practice the professional skills required by business. The course features frequent presentations by program partners about real analytical problems and how they are addressed. It also features significant one-on-one mentoring and integration of topics presented in the program’s courses.
In this course, students will learn essential concepts related to business communication and, in particular, the communication of technical material. Students will learn how to competently create, organize, and support ideas in their business presentations. They will deliver both planned and extemporaneous public presentations on topics related to data analysis, business, and economics. This course will particularly emphasize the creation of presentation slides and other supporting materials, the correct use of data visualization techniques, and learning how to listen to and critically evaluate presentations made by other students.
Algorithms to classify unknown data and make predictions. Support vector machines, k-nearest neighbors (kNN), naive Bayes, association rules (Apriori algorithm), decision trees, feature selection, classifier accuracy measures, and neural networks.
This course will address basic information and data visualization techniques, as well as design principles. Students will primarily use R with the ggplot2 and shiny packages to prototype visualizations. Students will obtain practical experience with the presentation of complex visual data, including multivariate data, geospatial data, textual data, and network data.
This course trains students in the use of multivariate statistical methods other than multiple linear regression, which is covered in MSAN 601. Applications to finance, social science, and marketing data are emphasized (e.g., dimension reduction for Treasury yield curves and consumer microdata). Topics include factor analysis, linear and nonlinear discriminant analysis, ANOVA and MANOVA, regression with longitudinal data, repeated measures ANOVA, and both hierarchical and k-means cluster analysis. Statistical packages include R and SAS.
In this course, students will learn how companies harness their digital marketing data to drive insights that convert into better customer experiences. Topics may include survival analysis, longitudinal data analysis, heat maps, geographic information systems, fraud detection, and market basket analysis. Areas of application may include customer targeting, election management, and ecommerce.
Students are placed with a client as part of a semester-long project with weekly deliverables and meetings. Continued mentoring and development of professional business skills are also provided.
Deriving information such as sentiment from unstructured text like tweets or web documents. Distance measures for documents and email messages. Application of clustering and classification algorithms to high-dimension feature spaces from text documents.
This course introduces the fundamental concepts and methods underlying the field of social network analysis including network centrality, cohesive subgroups, structural and role equivalence, visualization and hypothesis testing. Emphasis is on students learning from analyzing data and answering empirical questions using routines written in R.
Continuation of Practicum. Students also receive “soft skills” training in creating a CV, interviewing, and networking, and study the venture capital and startup process.
In this course, students receive a brief, intense, and focused review of programming in SAS Enterprise Guide. This review will augment the SAS training that students receive in other analytics courses and specifically prepare students to take the SAS Base Programming examination.
Students create a distributed MongoDB cluster and study partitioning strategies such as sharding and horizontal partitioning. Topics include SQL and NoSQL queries and data insertion.
Analysts spend the majority of their time just collecting data and contorting it into an appropriate or convenient form for analysis. In this course, students write programs to scrape data from websites such as Yahoo Finance and use REST APIs to extract data from Twitter. Topics also include log file filtering, table merging, data cleaning, and data reorganization.
In this introductory course, students will learn to perform basic data exploration techniques in both R and Python, as well as manipulate unstructured text in these two environments. Students will learn elementary techniques for visualizing and exploring patterns in data while practicing basic presentation skills. Furthermore, students will understand basic text classification techniques, implement algorithms for sentiment analysis, and evaluate and compare classification algorithms.
Big data does not fit on a single machine, and analysts must resort to clusters of machines cooperating to compute results. This course introduces students to map-reduce systems such as Hadoop and domain-specific languages such as Pig. Students learn to re-express programs as map-reduce jobs and present them to environments such as Amazon's "Elastic Map-Reduce."
The study of website traffic analysis for the purpose of understanding how visitors use a site or services. Topics include Google Analytics, A/B testing, and the analysis of incoming traffic characteristics such as client browser, language, computer attributes, and geolocation.
Students learn how to prepare for an interview, answer interview questions successfully, and present themselves. Labs include interviews and answering technical questions quickly and accurately.