DEPARTMENT OF ENGINEERING MATHEMATICS

EMAT20210

DATA ANALYSIS 

EMAT20210 (10 credits)

Organiser:

Dr Tijl De Bie

Lecturer:

Dr Tijl De Bie

Description:

The first part of the course covers probability and statistics. The second part covers using and manipulating (large) data sets with MATLAB.

Prerequisites:

EMAT 10004 (Mathematics with Maple I) or equivalent. Working knowledge of a procedural programming language.

Co-requisite: EMAT 20920 (Numerical Methods with Matlab)

Aims: The overarching aim is to present a brisk, applied overview of topics in applied statistics, with an emphasis on techniques that can be used to analyse experimental data sets.

Learning outcomes:

At the end of this unit, students will be able to:

- write short MATLAB programs, and use MATLAB to manipulate and visualise data sets;
- understand basic stochastic processes and simulate them on a computer;
- estimate distribution parameters from empirical data;
- perform basic hypothesis testing and compute confidence intervals;
- perform exploratory data analysis using correlation and covariance;
- load large, noisy data sets into a computer and analyse them with basic statistical techniques;
- select and apply appropriate methods for particular tasks, such as (multivariate) linear regression, clustering and classification, and understand how these methods work and how accurate they are.

Organisation & timetable:

Check the university timetabling service.

Assessments:

The unit has eight assessed lab sessions (weighted 2.5% each, 20% in total), a midterm project (weighted 40%) and a final project (weighted 40%).

Each lab session is completed in the lab, and students hand in their results at the end of the session. The results are marked and the marks are returned one week later. Feedback on lab work is given only at class level, not individually; however, students can get continuous individual feedback during the labs themselves.

The deadline for the midterm project is 17 December 2010. The marked midterm project, with some individual feedback, will be returned on 17 January 2011.

The deadline for the final project is 4 January 2011. The marked final project, with some individual feedback, will be returned by 25 February 2011.

Syllabus

- Descriptive statistics and basic data handling and visualisation using MATLAB.

- Basic probability: random variables and their expectation, variance and standard deviation. Types of random variables: Gaussian, binomial, Poisson. Correlation coefficient of two random variables.

- Basic statistics: inference from data. Estimation of parameters and fitting distributions to data. Significance of relations found in data (hypothesis testing), and confidence intervals.

- Multivariate statistics. Decision trees.

- Least squares regression. Linear discriminant analysis.

- Principal Components Analysis.

Materials: Homework, assignments and handouts are available on Blackboard.


Books:

Christopher Chatfield, Statistics for Technology, 3rd edition, Chapman and Hall, 1983

Glyn James (ed.), Advanced Modern Engineering Mathematics (Ch. 11), 3rd edition, Pearson Education, 2004

Past exams: None
Deadlines: TBC
