DEPARTMENT OF ENGINEERING MATHEMATICS
EMAT20210 DATA ANALYSIS
Organiser: 
Dr Tijl De Bie 
Lecturer: 
Dr Tijl De Bie 
Description: 
The first part of the course covers probability and statistics. The second part covers using and manipulating (large) data sets in MATLAB.
Prerequisites: 
EMAT 10004 (Mathematics with Maple I) or equivalent. Working knowledge of a procedural programming language. 
Corequisite:  EMAT 20920 (Numerical Methods with Matlab)
Aims:  The overarching aim is to present a brisk, practical overview of topics in applied statistics, with an emphasis on techniques that may be used to analyse experimental data sets.
Learning outcomes: 
At the end of this unit, students will be able to:
Write short MATLAB programs, and use MATLAB to manipulate and visualise data sets.
Understand basic stochastic processes and simulate them on a computer.
Estimate distribution parameters from empirical data.
Perform basic hypothesis testing and compute confidence intervals.
Perform exploratory data analysis using correlation and covariance.
Load large noisy data sets into a computer, and analyse them with basic statistical techniques.
Select and apply appropriate methods for particular tasks, such as (multivariate) linear regression, clustering and classification, and understand how these methods work and how accurate they are.
Timetable:  Check on timetabling
Assessment:
Data Analysis has 8 assessed lab sessions (weighted 2.5% each, 20% in total), a midterm project (weighted 40%), and a final project (weighted 40%). The lab sessions are completed in the lab, and students hand in their results at the end of each session. The results will be marked and the marks returned one week later. Feedback on lab work will be given at class level only, not individually; students can, however, get continuous individual feedback during the labs. The deadline for the midterm project is 17 December 2010. The marked midterm project, with some individual feedback, will be returned on 17 January 2011. The deadline for the final project is 4 January 2011. The marked final project, with some individual feedback, will be returned by 25 February 2011.

Syllabus:
Descriptive statistics and basic data handling and visualization using MATLAB.
Basic probability: random variables and their expectation, variance, and standard deviation. Types of random variables: Gaussian, binomial, Poisson. Correlation coefficient of two random variables.
Basic statistics: inference from data. Estimation of parameters and fitting distributions to data. Significance of relations found in data (hypothesis testing), and confidence intervals.
Multivariate statistics. Decision trees.
Least squares regression. Linear discriminant analysis.
Principal components analysis.

Materials:  Homework, assignments, and handouts are available on Blackboard.
Textbooks:
Christopher Chatfield, Statistics for Technology, 3rd edition, Chapman and Hall, 1983
Glyn James (ed), Advanced Modern Engineering Mathematics (Ch 11), 3rd edition, Pearson Education, 2004

Past exams:  None 
Deadlines:  TBC 