STAT828 – Data Mining

2013 – S1 Evening

General Information

Unit convenor and teaching staff
Unit Convenor
Ayse Bilgin
Contact via ayse.bilgin@mq.edu.au
E4A515
Credit points
4
Prerequisites
Admission to MAppStat or PGDipAppStat or PGCertAppStat
Corequisites
Co-badged status
STAT728: Data Mining
Unit description
Data mining is emerging as an important analytical tool as organisations deal with increasingly large data sets. In particular, data mining is widely used in large insurance companies, market research organisations, banks and other financial institutions. This unit introduces relevant data mining techniques and the underlying algorithms and statistical principles. The applications will be demonstrated using two of the major data mining packages and interesting case studies.
Data mining is about discovering patterns in large data sets and converting data into information. The emphasis is on the data and on the ways to convert data into information.

Important Academic Dates

Information about important academic dates, including deadlines for withdrawing from units, is available at https://www.mq.edu.au/study/calendar-of-dates

Learning Outcomes

On successful completion of this unit, you will be able to:

  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

Assessment Tasks

Name                              Weighting   Due
Data Mining Project Plan          5%          20/3/2013
Market Basket Analysis Report     10%         10/4/2013
Data Mining Project Report        20%         29/5/2013
Data Mining Project Poster        5%          5/6/2013
Participation in Lab Exercises    5%          5/6/2013
Final Exam                        55%         Examination Period
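
Summing the weightings listed in the table shows how the marks split between in-semester work and the final exam. This is only arithmetic on the listed weightings; it says nothing further about how grades are determined.

```latex
\underbrace{5\% + 10\% + 20\% + 5\% + 5\%}_{\text{semester assessments}} = 45\%,
\qquad 45\% + \underbrace{55\%}_{\text{final exam}} = 100\%
```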

Data Mining Project Plan

Due: 20/3/2013
Weighting: 5%

A project plan template will be provided in iLearn.


Market Basket Analysis Report

Due: 10/4/2013
Weighting: 10%

The Market Basket Analysis Report is an individual assessment task.

If you work with another student, you must acknowledge this in your report.

Students may bring in a data set from their workplace to work on; however, they need to consult Dr Bilgin for approval of the data set's suitability for the project.

A model format and examples of earlier reports will be provided through iLearn when the projects are issued.


On successful completion you will be able to:
  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

Data Mining Project Report

Due: 29/5/2013
Weighting: 20%

The Data Mining Project is a group project.

Students will be placed into groups as soon as possible (i.e. by Week 3) and will be given the opportunity to work on their projects during tutorials.

Students may bring in a data set from their workplace to work on; however, they need to consult Dr Bilgin for approval of the data set's suitability for the project.

A model format and examples of earlier reports will be provided through iLearn when the projects are issued.


On successful completion you will be able to:
  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

Data Mining Project Poster

Due: 5/6/2013
Weighting: 5%

Each group must submit one poster (PowerPoint document or PDF) on iLearn by the due date, clearly stating the group members. Also include a summary handout (see iLearn) with your submission, preferably as a PDF document.


On successful completion you will be able to:
  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

Participation in Lab Exercises

Due: 5/6/2013
Weighting: 5%

Both lab exercise submissions and contributions to tutorial discussions will be taken into account when allocating marks. See iLearn for the due dates of individual lab exercises.


On successful completion you will be able to:
  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users

Final Exam

Due: Examination Period
Weighting: 55%

The final examination is 3 hours long, with 10 minutes reading time, and will be held during the examination period. You will be permitted to bring one A4 sheet of notes, handwritten or typed on both sides, into the final examination. This sheet must be submitted with your exam paper.

Calculators are permitted, but may be used only as calculators and not as storage devices. No electronic devices (e.g. mobile phones, mp3 players) other than calculators are allowed during the exam.

The final examination will be timetabled in the official University examination timetable. The timetable will be available in draft form approximately eight weeks before the examinations begin, and in final form approximately four weeks before, at: http://www.exams.mq.edu.au/exam/

Attendance at the examination is compulsory. The only exceptions to sitting an examination at the designated time are documented illness or unavoidable disruption. In these circumstances you may wish to consider applying for Special Consideration. Information about unavoidable disruption and the special consideration process is available at http://www.mq.edu.au/policy/docs/special_consideration/policy.html

You can submit your special consideration request(s) through the following link https://ask.mq.edu.au/index.php

Your final grade in STAT828 will be based on your work during the semester and in the final examination. To be awarded a particular grade, you need to achieve the corresponding standard in both the semester assessments and the final exam, as set out in the Grading Policy (http://www.mq.edu.au/policy/docs/grading/policy.html).

On successful completion you will be able to:
  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

Delivery and Resources

Changes to Content

The initial course notes for this unit were developed by Associate Prof Julian Leslie and Dr Ayse Bilgin in 2007. Ms Gillian Miller made changes and added new topics to the course notes in 2010 and 2011. In 2012 and 2013, the notes were revised to reflect current developments in the data mining discipline and student feedback on the unit.

Classes

Lectures

Lectures begin in Week 1.  Students should attend ONE 2-hour session per week: Wednesdays between 6:00 and 8:00pm in EMC-G220 Faculty PC Lab. 

Tutorials   

Tutorials also begin in Week 1. The aim of tutorials is to practise techniques learnt in lectures. They are designed so that students work through the exercises, asking as many questions as they need to improve their understanding. Tutors act as facilitators in the tutorial groups: they assist students and create an environment that encourages thinking and discussion among students. Tutorials will be held on Wednesdays in EMC-G220 Faculty PC Lab between 8:00pm and 10:00pm.

Teaching and Learning Strategy

  • Students are expected to attend all the lectures and the tutorials.
  • Additional readings will be provided through iLearn to give students opportunities to extend their knowledge.
  • Weekly tutorial exercises are set for individual development and are considered formative assessment: no marks are awarded, but each student will receive suggestions for improvement each week through the marked lab exercises. Therefore, if students decide to work together, the final product should be written individually and the collaboration should be acknowledged so that the lecturer is aware of it.
  • Projects are extensions of the lab exercises. They require students to apply the techniques learnt to unseen data sets and to write professional reports.

Relationship between Assessment and Learning Outcomes

While attendance at classes is important, it is only a small part of the total workload for the unit: reading, research in the library, working with other students in groups, completing assignments, using the computer packages to develop models and private study are all part of the work involved. At Macquarie, the average student is expected to spend three hours per week per credit point.
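
Applied to this unit, which is worth 4 credit points (see General Information), the three-hours-per-credit-point guideline gives the expected weekly workload for an average student:

```latex
4~\text{credit points} \times 3~\frac{\text{hours}}{\text{week} \cdot \text{credit point}} = 12~\text{hours per week}
```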

Weekly lab (tutorial) exercises are due at the BEGINNING of your lecture session in the week following their date of issue (e.g. the Week 2 lab exercise solution is due in Week 3, before the lecture, i.e. by 6pm). You need to submit them through iLearn.

You will receive timely individual feedback on your submitted lab exercises; however, individual weekly exercises are not marked. If you have any problems, it is important to discuss the feedback with your tutor or lecturer to improve your learning. In addition to individual feedback, suggested solutions to the lab exercises will be provided through iLearn on a timely basis.

You are expected to submit at least 8 of the lab exercises. Failure to comply with this may result in exclusion from the unit. Instead of content marks for the weekly lab exercises, each student will receive a participation mark at the end of the semester based on the quality of their submissions (which will be shared with all students; details will be provided in the first lecture and within each lab exercise).

See Assessment Section for other assessment tasks.

If, for any reason, students cannot hand in their assessment tasks on time, they must contact the teaching staff in advance. No extensions for the lab exercises will be granted unless satisfactory documentation of illness or misadventure is submitted.

The marked assessment papers (lab exercises or projects) will be distributed during the tutorials by the Lecturer. If you are unable to submit your assessment through iLearn due to technical problems, an electronic (Word) file can be e-mailed to Dr Ayse Bilgin (ayse.bilgin@mq.edu.au). Only Word format files will be accepted, and each page should include the student ID and student name in the footer to avoid any problems. When naming files, please adopt the following convention: StudentID-(Your Surname)(Initial of Your First Name)-(Assessment Task, e.g. Lab 1 or Assignment 1), for example 40000000-BilginA-Project 1. No other file naming format will be accepted.
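
The file naming convention can be sketched as a small Python function. This is a hypothetical helper for illustration only, not a tool provided by the unit; it assumes the student ID contains digits only, takes the initial from the first character of the first name, and leaves out the file extension that Word adds.

```python
def submission_name(student_id: str, surname: str, first_name: str, task: str) -> str:
    """Return a file name of the form StudentID-SurnameInitial-Task.

    Hypothetical helper: assumes the student ID contains digits only.
    The .doc/.docx extension is left to the word processor.
    """
    if not student_id.isdigit():
        raise ValueError("student ID should contain digits only")
    # Only the first letter of the first name is used, in upper case
    initial = first_name.strip()[0].upper()
    # Join ID, surname+initial and task with hyphens, as in 40000000-BilginA-Project 1
    return f"{student_id}-{surname}{initial}-{task}"


print(submission_name("40000000", "Bilgin", "Ayse", "Project 1"))
# prints: 40000000-BilginA-Project 1
```

The example call reproduces the sample name given in the naming convention.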

Unit Schedule

Week 1: Introduction to Data Mining & Introduction to R

Week 2: Data Preprocessing, missing data, outliers & Further R

Week 3: Descriptive and exploratory data mining, concept hierarchies & graphical displays with R

Week 4: Graphics and data explorations & Introduction to IBM SPSS Modeler

Week 5: Market Basket Analysis

Week 6: Classification (1)

Week 7: Classification (2)

Week 8: Classification (3)

Week 9: Classification (4)

Week 10: Classification (5)

Week 11: Cluster Analysis (1)

Week 12: Cluster Analysis (2)

Week 13: Revision and Data Mining Project Poster Presentations

Note that the order of the lectures may change, and that all lab exercises are due by 5:30pm one week after they are issued.

Policies and Procedures

Macquarie University policies and procedures are accessible from Policy Central. Students should be aware of the following policies in particular with regard to Learning and Teaching:

Academic Honesty Policy http://www.mq.edu.au/policy/docs/academic_honesty/policy.html

Assessment Policy  http://www.mq.edu.au/policy/docs/assessment/policy.html

Grading Policy http://www.mq.edu.au/policy/docs/grading/policy.html

Grade Appeal Policy http://www.mq.edu.au/policy/docs/gradeappeal/policy.html

Grievance Management Policy http://mq.edu.au/policy/docs/grievance_management/policy.html

Special Consideration Policy http://www.mq.edu.au/policy/docs/special_consideration/policy.html

In addition, a number of other policies can be found in the Learning and Teaching Category of Policy Central.

Student Support

Macquarie University provides a range of Academic Student Support Services. Details of these services can be accessed at: http://students.mq.edu.au/support/

UniWISE provides:

  • Online learning resources and academic skills workshops http://www.students.mq.edu.au/support/learning_skills/
  • Personal assistance with your learning & study related questions.
  • The Learning Help Desk is located in the Library foyer (level 2).
  • Online and on-campus orientation events run by Mentors@Macquarie.

Student Services and Support

Students with a disability are encouraged to contact the Disability Service, which can provide appropriate help with any issues that arise during their studies.

Student Enquiries

Details of student enquiry services can be accessed at http://www.student.mq.edu.au/ses/.

IT Help

If you wish to receive IT help, we would be glad to assist you at http://informatics.mq.edu.au/help/

When using the university's IT, you must adhere to the Acceptable Use Policy. The policy applies to everyone who connects to the MQ network, including students, and outlines what is permitted.

Graduate Capabilities

PG - Discipline Knowledge and Skills

Our postgraduates will be able to demonstrate a significantly enhanced depth and breadth of knowledge, scholarly understanding, and specific subject content knowledge in their chosen fields.

This graduate capability is supported by:

Learning outcomes

  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

PG - Critical, Analytical and Integrative Thinking

Our postgraduates will be capable of utilising and reflecting on prior knowledge and experience, of applying higher level critical thinking skills, and of integrating and synthesising learning and knowledge from a range of sources and environments. A characteristic of this form of thinking is the generation of new, professionally oriented knowledge through personal or group-based critique of practice and theory.

This graduate capability is supported by:

Learning outcomes

  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

PG - Research and Problem Solving Capability

Our postgraduates will be capable of systematic enquiry; able to use research skills to create new knowledge that can be applied to real world issues, or contribute to a field of study or practice to enhance society. They will be capable of creative questioning, problem finding and problem solving.

This graduate capability is supported by:

Learning outcomes

  • demonstrate an extensive understanding of the principles and concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

PG - Effective Communication

Our postgraduates will be able to communicate effectively and convey their views to different social, cultural, and professional audiences. They will be able to use a variety of technologically supported media to communicate with empathy using a range of written, spoken or visual formats.

This graduate capability is supported by:

Learning outcomes

  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

PG - Engaged and Responsible, Active and Ethical Citizens

Our postgraduates will be ethically aware and capable of confident transformative action in relation to their professional responsibilities and the wider community. They will have a sense of connectedness with others and country and have a sense of mutual obligation. They will be able to appreciate the impact of their professional roles for social justice and inclusion related to national and global issues.

This graduate capability is supported by:

Learning outcomes

  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

PG - Capable of Professional and Personal Judgment and Initiative

Our postgraduates will demonstrate a high standard of discernment and common sense in their professional and personal judgment. They will have the ability to make informed choices and decisions that reflect both the nature of their professional work and their personal perspectives.

This graduate capability is supported by:

Learning outcomes

  • apply creative thinking to resolve complex problems or issues, and summarise complex multivariate data and create visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare different decision trees, and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods, and use the results to create an action plan for management
  • apply market basket analysis to a company's sales data and synthesise the results in a professional data mining report
  • demonstrate knowledge and technical expertise in data mining activities, including cleaning and transforming data and presenting the results of mining and modelling to potential users
  • demonstrate high-level research, analytical and conceptual skills, and apply these skills to model development and client profiling

Software Packages

R: We will use the open-source software R. You can download and install a copy of the program from the developers' web page: http://cran.r-project.org/ or www.R-project.org

R is command-line software, so it may be hard to learn if you are not used to this kind of environment; however, the benefits of learning to use it outweigh its disadvantages. The benefits include, but are not limited to: it is free; it is very flexible; the R community provides strong support through news groups; and you can continue to use it after you complete the course.

IBM SPSS Modeler: This is graphical data mining software owned by IBM and widely used in business.

Learning management system (LMS)

There is an iLearn site (a modified version of Moodle) for this unit, where the required course materials will be posted. In addition, the weekly forums will enable us to communicate within the unit without the risk of messages being caught by spam filters. The lecturers may make announcements via the online unit page, so you should log in and read the posts at least twice a week.

The web page for the LMS is https://ilearn.mq.edu.au/login/MQ/; use your Macquarie OneID to log in.