# Online courses directory (209)

Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions. Machine learning brings together computer science and statistics to harness that predictive power. It’s a must-have skill for all aspiring data analysts and data scientists, or anyone else who wants to wrestle all that raw data into refined trends and predictions. This is a class that will teach you the end-to-end process of investigating data through a machine learning lens. It will teach you how to extract and identify useful features that best represent your data, a few of the most important machine learning algorithms, and how to evaluate the performance of your machine learning algorithms. This course is also a part of our Data Analyst Nanodegree.

Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise and often run 10-100 times faster than Hadoop MapReduce jobs. As companies realize this, Spark developers are becoming increasingly valued.

This statistics and data analysis course will teach you the basics of working with Spark and will provide you with the necessary foundation for diving deeper into Spark. You’ll learn about Spark’s architecture and programming model, including commonly used APIs. After completing this course, you’ll be able to write and debug basic Spark applications. This course will also explain how to use Spark’s web user interface (UI), how to recognize common coding errors, and how to proactively prevent errors. The focus of this course will be Spark Core and Spark SQL.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.

We begin with an introduction to the biology, explaining what we measure and why. Then we focus on the two main measurement technologies: next-generation sequencing and microarrays. We then move on to describing how raw data and experimental information are imported into R and how we use Bioconductor classes to organize these data, whether generated locally or harvested from public repositories or institutional archives. Genomic features are generally identified using intervals in genomic coordinates, and highly efficient algorithms for computing with genomic intervals will be examined in detail. Statistical methods for testing gene-centric or pathway-centric hypotheses with genome-scale data are found in packages such as limma; some of these techniques will be illustrated in lectures and labs.
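To make the idea of computing with genomic intervals concrete, here is a minimal Python sketch that counts how many features overlap a query range. It is not Bioconductor's implementation and not the R code used in the course. It assumes closed, 1-based intervals on a single chromosome, and the function name `count_overlaps` and all coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right

def count_overlaps(intervals, query):
    """Count closed intervals (start, end) that overlap the closed query range."""
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    q_start, q_end = query
    # Intervals that start at or before the query end ...
    started = bisect_right(starts, q_end)
    # ... minus those that ended strictly before the query start.
    finished = bisect_left(ends, q_start)
    return started - finished

# Hypothetical feature coordinates on one chromosome
features = [(1, 5), (3, 8), (10, 12), (20, 25)]
print(count_overlaps(features, (4, 10)))   # 3
print(count_overlaps(features, (13, 19)))  # 0
```

The sketch sorts on every call for brevity. With many queries, the sorted start and end lists would be built once and reused, so that each query costs two binary searches.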

Given the diversity in the educational backgrounds of our students, we have divided the series into seven parts. You can take the entire series or individual courses that interest you. If you are a statistician, you should consider skipping the first two or three courses; similarly, if you are a biologist, you should consider skipping some of the introductory biology lectures. Note that the statistics and programming aspects of the class ramp up in difficulty relatively quickly across the first three courses. By the third course we will be teaching advanced statistical concepts such as hierarchical models, and by the fourth, advanced software engineering skills such as parallel computing and reproducible research concepts.

These courses make up 2 XSeries and are self-paced:

PH525.2x: Introduction to Linear Models and Matrix Algebra

PH525.4x: High-Dimensional Data Analysis

PH525.5x: Introduction to Bioconductor: annotation and analysis of genomes and genomic assays

PH525.6x: High-performance computing for reproducible genomics

PH525.7x: Case studies in functional genomics

This class was supported in part by NIH grant R25GM114818.

HarvardX requires individuals who enroll in its courses on edX to abide by the terms of the edX honor code. HarvardX will take appropriate corrective action in response to violations of the edX honor code, which may include dismissal from the HarvardX course; revocation of any certificates received for the HarvardX course; or other remedies as circumstances warrant. No refunds will be issued in the case of corrective action for such violations. Enrollees who are taking HarvardX courses as part of another program will also be governed by the academic policies of those programs.

HarvardX pursues the science of learning. By registering as an online learner in an HX course, you will also participate in research about learning. Read our research statement to learn more.

Harvard University and HarvardX are committed to maintaining a safe and healthy educational and work environment in which no member of the community is excluded from participation in, denied the benefits of, or subjected to discrimination or harassment in our program. All members of the HarvardX community are expected to abide by Harvard policies on nondiscrimination, including sexual harassment, and the edX Terms of Service. If you have any questions or concerns, please contact harvardx@harvard.edu and/or report your experience through the edX contact form.

This course aims to give students the tools and training to recognize convex optimization problems that arise in scientific and engineering applications, presenting the basic theory, and concentrating on modeling aspects and results that are useful in applications. Topics include convex sets, convex functions, optimization problems, least-squares, linear and quadratic programs, semidefinite programming, optimality conditions, and duality theory. Applications to signal processing, control, machine learning, finance, digital and analog circuit design, computational geometry, statistics, and mechanical engineering are presented. Students complete hands-on exercises using high-level numerical software.

## Acknowledgements

The course materials were developed jointly by Prof. Stephen Boyd (Stanford), who was a visiting professor at MIT when this course was taught, and Prof. Lieven Vandenberghe (UCLA).

Matrix algebra underlies many of the current tools for experimental design and the analysis of high-dimensional data. In this introductory data analysis course, we will use matrix algebra to represent the linear models that are commonly used to model differences between experimental units, and we will perform statistical inference on these differences. Throughout the course we will use the R programming language.
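As an illustration of representing a linear model with matrix algebra, the sketch below fits a two-group comparison by solving the normal equations $(X^\top X)\,\beta = X^\top y$ for a design matrix with an intercept column and a 0/1 group indicator. It is written in Python rather than the R used in the course. The data and the helper name `fit_two_parameter_model` are hypothetical, and the closed-form 2x2 solve is only meant for this small case.

```python
def fit_two_parameter_model(X, y):
    """Least-squares estimate for a design matrix with two columns,
    obtained by solving the normal equations (X^T X) b = X^T y."""
    # Entries of the symmetric 2x2 matrix X^T X
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    # Entries of the vector X^T y
    u = sum(r[0] * yi for r, yi in zip(X, y))
    v = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b
    # Apply the inverse of [[a, b], [b, d]] to (u, v)
    return ((d * u - b * v) / det, (a * v - b * u) / det)

# Hypothetical measurements: three control units, then three treated units
y = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
# Column 1 is the intercept, column 2 indicates the treated group
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]

intercept, difference = fit_two_parameter_model(X, y)
print(intercept, difference)  # 5.0 3.0
```

With this design, the intercept equals the control-group mean and the second coefficient equals the difference between the two group means, which is the quantity on which inference would then be performed.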

This course provides an elementary introduction to probability and statistics with applications. Topics include: basic combinatorics, random variables, probability distributions, Bayesian inference, hypothesis testing, confidence intervals, and linear regression.

The Spring 2014 version of this subject employed the residential MITx system, which enables on-campus subjects to provide MIT students with learning and assessment tools such as online problem sets, lecture videos, reading questions, pre-lecture questions, problem set assistance, tutorial videos, exam review content, and even online exams.

This course is part of the *Microsoft Professional Program Certificate in Data Science*.

R is rapidly becoming the leading language in data science and statistics. Today, R is the tool of choice for data science professionals in every industry and field. Whether you are a full-time number cruncher or just an occasional data analyst, R will suit your needs.

This introduction to R programming course will help you master the basics of R. In seven sections, you will cover its basic syntax, making you ready to undertake your own first data analysis using R. Starting from variables and basic operations, you will eventually learn how to handle data structures such as vectors, matrices, data frames and lists. In the final section, you will dive deeper into the graphical capabilities of R, and create your own stunning data visualizations. No prior knowledge in programming or data science is required.

What makes this course unique is that you will continuously practice your newly acquired skills through interactive in-browser coding challenges using the DataCamp platform. Instead of passively watching videos, you will solve real data problems while receiving instant and personalized feedback that guides you to the correct solution.

Enjoy!

This course will provide a solid foundation in probability and statistics for economists and other social scientists. We will emphasize topics needed for further study of econometrics and provide basic preparation for 14.32. Topics include elements of probability theory, sampling theory, statistical estimation, and hypothesis testing.

This course exposes students to the logic of statistical reasoning and its application in the quantitative social sciences. It is meant as a thorough but accessible introduction to the topics of descriptive statistics, probability theory, and statistical inference with hands-on exercises.

We are surrounded by information, much of it numerical, and it is important to know how to make sense of it. Stat2x is an introduction to the fundamental concepts and methods of statistics, the science of drawing conclusions from data.

The course is the online equivalent of Statistics 2, a 15-week introductory course taken in Berkeley by about 1,000 students each year. Stat2x is divided into three 5-week components. Stat2.1x is the first of the three.

The focus of Stat2.1x is on descriptive statistics. The goal of descriptive statistics is to summarize and present numerical information in a manner that is illuminating and useful. The course will cover graphical as well as numerical summaries of data, starting with a single variable and progressing to the relation between two variables. Methods will be illustrated with data from a variety of areas in the sciences and humanities.
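As a small illustration of numerical summaries, the Python sketch below summarizes one variable and then measures the linear relation between two variables with the correlation coefficient. The data are hypothetical, the helper `correlation` is written out by hand for clarity, and the sample standard deviation (dividing by n - 1) is an assumption; the course may use a different convention.

```python
import math
import statistics

# Hypothetical paired observations
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

def correlation(x, y):
    """Correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Single-variable summaries of the scores
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "sd": statistics.stdev(scores),  # sample SD, divides by n - 1
}
print(summary)

# Relation between the two variables
r = correlation(hours, scores)
print(round(r, 3))
```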

There will be no mindless memorization of formulas and methods. Throughout Stat2.1x, the emphasis will be on understanding the reasoning behind the calculations, the assumptions under which they are valid, and the correct interpretation of results.

**FAQ**

- What is the format of the class?
- Instruction will consist of brief lectures and exercises to check comprehension. Grades (Pass or Not Pass) will be decided based on a combination of scores on short assignments, quizzes, and a final exam.

- How much does it cost to take the course?
- Nothing! The course is free.

- Will the text of the lectures be available?
- Yes. All of our lectures will have transcripts synced to the videos.

- Do I need to watch the lectures live?
- No. You can watch the lectures at your leisure.

- Can I contact the Instructor or Teaching Assistants?
- Yes, but not directly. The discussion forums are the appropriate venue for questions about the course. The instructors will monitor the discussion forums and try to respond to the most important questions; in many cases, responses from other students and peers will be adequate and faster.

- Do I need any other materials to take the course?
- If you have any questions about edX generally, please see the edX FAQ.

Statistics 2 at Berkeley is an introductory class taken by about 1,000 students each year. Stat2.3x is the last in a sequence of three courses that make up Stat2x, the online equivalent of Berkeley's Stat 2. The focus of Stat2.3x is on statistical inference: how to make valid conclusions based on data from random samples. At the heart of the main problem addressed by the course will be a population (which you can imagine for now as a set of people) connected with which there is a numerical quantity of interest (which you can imagine for now as the average number of MOOCs the people have taken). If you could talk to each member of the population, you could calculate that number exactly. But what if the population is so large that your resources will not stretch to interviewing every member? What if you can only reach a subset of the population?

Stat 2.3x will discuss good ways to select the subset (yes, at random); how to estimate the numerical quantity of interest, based on what you see in your sample; and ways to test hypotheses about numerical or probabilistic aspects of the problem.

The methods that will be covered are among the most commonly used of all statistical techniques. If you have ever read an article that claimed, "The margin of error in such surveys is about three percentage points," or, "Researchers at the University of California at Berkeley have discovered a highly significant link between ...," then you should expect that by the end of Stat 2.3x you will have a pretty good idea of what that means. Examples will range all the way from a little girl's school science project (seriously – she did a great job and her results were published in a major journal) to rulings by the U.S. Supreme Court.
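To give a rough sense of where a figure like "three percentage points" can come from, the sketch below computes the conventional margin of error for a sample proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), the normal approximation, and the worst case p = 0.5; the sample size of 1,067 is hypothetical and was chosen only because it gives roughly three points.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll of 1,067 respondents
moe = margin_of_error(1067)
print(f"{moe:.3f}")  # 0.030
```

Because the margin shrinks with the square root of the sample size, halving it requires about four times as many respondents.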

The fundamental approach of the series was provided in the description of Stat2.1x and appears here again: There will be no mindless memorization of formulas and methods. Throughout the course, the emphasis will be on understanding the reasoning behind the calculations, the assumptions under which they are valid, and the correct interpretation of results.

Statistics 2 at Berkeley is an introductory class taken by about 1000 students each year. Stat2.2x is the second of three five-week courses that make up Stat2x, the online equivalent of Berkeley's Stat 2.

The focus of Stat2.2x is on probability theory: exactly what is a random sample, and how does randomness work? If you buy 10 lottery tickets instead of 1, does your chance of winning go up by a factor of 10? What is the law of averages? How can polls make accurate predictions based on data from small fractions of the population? What should you expect to happen "just by chance"? These are some of the questions we will address in the course.
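The lottery question can be explored with exact arithmetic once the setup is pinned down. The sketch below compares two hypothetical scenarios and is not the course's own treatment: ten distinct tickets in a single draw with exactly one winning ticket, versus one ticket in each of ten independent draws. The single-ticket chance of 1 in 1,000 is an assumption.

```python
from fractions import Fraction

# Hypothetical chance that a single ticket wins
p = Fraction(1, 1000)

# Ten distinct tickets in one draw with exactly one winning ticket:
# the winning events are mutually exclusive, so the chances add.
same_draw = 10 * p

# One ticket in each of ten independent draws:
# chance of winning at least once.
separate_draws = 1 - (1 - p) ** 10

ratio_same = same_draw / p
ratio_separate = separate_draws / p
print(ratio_same)             # 10
print(float(ratio_separate))  # slightly less than 10
```

Under these assumptions the chance is multiplied by exactly 10 in the first scenario, and by slightly less than 10 in the second, because winning more than once is counted only once.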

We will start with exact calculations of chances when the experiments are small enough that exact calculations are feasible and interesting. Then we will step back from all the details and try to identify features of large random samples that will help us approximate probabilities that are hard to compute exactly. We will study sums and averages of large random samples, discuss the factors that affect their accuracy, and use the normal approximation for their probability distributions.
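A small example shows how an exact calculation and the normal approximation compare. The sketch below computes the chance of at least 60 heads in 100 tosses of a fair coin, first exactly from the binomial distribution and then with a normal curve using a continuity correction. The choice of 100 tosses and 60 heads is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.5, 60  # tosses, chance of heads, threshold

# Exact chance of k or more heads: count favorable outcomes out of 2**n
exact = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Normal approximation with the same mean and SD as the number of heads,
# using a continuity correction of one half
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(k - 0.5)

print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

The two values agree closely here, which is the sense in which the normal curve approximates probabilities for sums of large random samples.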

Be warned: by the end of Stat2.2x you will not want to gamble. Ever. (Unless you're really good at counting cards, in which case you could try blackjack, but perhaps after taking all these edX courses you'll find other ways of earning money.)

The fundamental approach of the series was provided in the description of Stat2.1x and appears here again: There will be no mindless memorization of formulas and methods. Throughout the course, the emphasis will be on understanding the reasoning behind the calculations, the assumptions under which they are valid, and the correct interpretation of results.

**FAQ**

- What is the format of the class?
- Instruction will consist of brief lectures and exercises to check comprehension. Grades (Pass or Not Pass) will be decided based on a combination of scores on short assignments, quizzes, and a final exam.

- How much does it cost to take the course?
- Nothing! The course is free.

- Will the text of the lectures be available?
- Yes. All of our lectures will have transcripts synced to the videos.

- Do I need to watch the lectures live?
- No. You can watch the lectures at your leisure.

- Will certificates be awarded?
- Yes. Online learners who achieve a passing grade in a course can earn a certificate of achievement. These certificates will indicate you have successfully completed the course, but will not include a specific grade. Certificates will be issued by edX under the name of BerkeleyX, designating the institution from which the course originated.

- Can I contact the Instructor or Teaching Assistants?
- Yes, but not directly. The discussion forums are the appropriate venue for questions about the course. The instructors will monitor the discussion forums and try to respond to the most important questions; in many cases, responses from other students and peers will be adequate and faster.

- Do I need any other materials to take the course?
- If you have any questions about edX generally, please see the edX FAQ.

This course is for students interested in studying the Project Maths Junior Certificate Higher Level Course in its entirety. This free online course provides students with tutorial videos on all the higher level topics in one location listed by module and topic. In addition, a comprehensive assessment is provided which tests learners on the entire content of the Project Maths Higher Level Syllabus. These topics include Probability and Statistics, Geometry and Trigonometry, Numbers and Shapes and Algebra.

This course is for students interested in studying the Project Maths Junior Certificate Ordinary Level Course in its entirety. This free online course provides students with tutorial videos on all the ordinary level topics in one location listed by module and topic. In addition, a comprehensive assessment is provided which tests learners on the entire content of the Project Maths Ordinary Level Syllabus. These topics include Probability and Statistics, Geometry and Trigonometry, Numbers and Shapes and Algebra.

Probability and Statistics is one of the strands of the new Project Maths Course in the Irish curriculum. Statistics is used in real life to make sense of the information around us and how it affects us. Statistics looks at the data handling cycle and the analysis of the data collected. This involves posing a question, collecting data on that question, presenting that data, analysing the data (using measures of spread and centre) and interpreting the results. In answering questions, it is essential that you can contextualise and justify your findings. Probability is concerned with the likelihood of an event or events happening, and this information can be used to make informed decisions. Probability is commonly used in finance, insurance and sport, among other fields. It can also be used to infer the fairness of an event or series of events, and it can be evaluated using a diagram or a rule-based approach. This strand attempts to merge the mathematical aspects of Probability and Statistics with their real-life applications. It is an interesting topic that is very accessible to all students.
