STAT828 – Data Mining

2018 – S1 External

General Information

Unit convenor and teaching staff
Unit Convenor
Ayse Bilgin
Contact via ayse.bilgin@mq.edu.au
Credit points
4
Prerequisites
Corequisites
((Admission to MAppStat or GradCertAppStat or GradDipAppStat or MSc) and (STAT683 or STAT680)) or (admission to MActPrac or MInfoTech or MDataSc)
Co-badged status
STAT728: Data Mining
Unit description
Data mining is an important analytical tool as organisations deal with increasingly large data sets. It is about discovering patterns in big data sets and converting data into information, or learning from data. Data mining uses techniques from different disciplines such as statistics, computing and machine learning. This unit introduces relevant data mining techniques using a white box approach, so that students gain a deeper understanding of the algorithms and statistical principles underlying the techniques. At least two different software packages will be used to apply the different methods to discover information from different data sources. The first part of the unit will cover descriptive data mining, concentrating on exploratory tools such as graphical displays and descriptive statistics, using R and IBM SPSS Modeler. The second part will introduce model building and predictive data mining, such as classification, market basket analysis and clustering.

Important Academic Dates

Information about important academic dates, including deadlines for withdrawing from units, is available at https://www.mq.edu.au/study/calendar-of-dates

Learning Outcomes

On successful completion of this unit, you will be able to:

  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

General Assessment Information

All within-session assessment tasks must be submitted online via iLearn.

Only Word or PDF files will be accepted for lab exercises. Project(s) will also require additional files, such as R scripts and IBM SPSS Modeler stream(s), to be submitted. Each page in a Word or PDF file should have the student ID and student name as a footer to eliminate any problems. When naming files, please adopt the following convention: StudentID-(Your Surname)(Initial of Your First Name) – Assessment Task (Lab 1 or Assignment 1), e.g., 40000000-BilginA-Project 1. No other format of naming the assessment tasks will be accepted. If you are unable to submit your assessment through iLearn due to technical problems, a single electronic (Word or PDF) file can be e-mailed to A/Prof Ayse Bilgin (ayse.bilgin@mq.edu.au).
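
As an informal illustration of the naming convention, the following Python sketch builds a file name in the same form as the example 40000000-BilginA-Project 1. The function name `submission_filename`, the appended file extension and the list of allowed extensions are assumptions made for this sketch; they are not part of the unit's requirements.

```python
# Hypothetical helper: builds a file name that follows the pattern of the
# example "40000000-BilginA-Project 1". The extension handling is an
# assumption; the guide only says Word or PDF files are accepted.
ALLOWED_EXTENSIONS = {"pdf", "doc", "docx"}


def submission_filename(student_id: str, surname: str, first_name: str,
                        task: str, ext: str = "pdf") -> str:
    """Return StudentID-SurnameInitial-Task.ext for one submission."""
    ext = ext.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext}")
    first_name = first_name.strip()
    if not first_name:
        raise ValueError("first name must not be empty")
    # Only the initial of the first name is used, in upper case.
    initial = first_name[0].upper()
    return f"{student_id.strip()}-{surname.strip()}{initial}-{task.strip()}.{ext}"


if __name__ == "__main__":
    print(submission_filename("40000000", "Bilgin", "Ayse", "Project 1"))
```

Always check the final file name against the convention stated above before submitting.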

In the case of the late submission of an assignment, if no special consideration has been granted, 10% of the earned mark will be deducted for each day that the assignment is late, up to a maximum of 50%. After 5 days, including weekends and public holidays, a mark of 0% will be awarded for the assignment.
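
The late penalty arithmetic can be sketched in Python as follows. This is only one reading of the rule, not an official calculator: it assumes that any part of a day counts as a full day late, that a submission exactly 5 days late loses 50% of the earned mark, and that anything later receives 0. The function name `late_mark` is hypothetical.

```python
import math


def late_mark(earned_mark: float, days_late: float) -> float:
    """Return the mark after the late penalty, under the stated assumptions."""
    if days_late <= 0:
        # Submitted on time: no deduction.
        return earned_mark
    # Assumption: any part of a day counts as a whole day late
    # (weekends and public holidays are included in the count).
    whole_days = math.ceil(days_late)
    if whole_days > 5:
        # More than 5 days late: a mark of 0% is awarded.
        return 0.0
    # 10% of the earned mark per day late, up to a maximum of 50%.
    deduction_percent = min(10 * whole_days, 50)
    return earned_mark * (100 - deduction_percent) / 100


if __name__ == "__main__":
    for days in (0, 1, 2, 5, 6):
        print(days, late_mark(80, days))
```

For example, under these assumptions an assignment that earned 80 marks and was submitted 2 days late would lose 20% of 80, leaving 64.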

Assessment Tasks

Name                              Weighting   Hurdle   Due
Data Mining Project Plan          0%          No       Week 4
Market Basket Analysis Report     15%         No       Week 7
Data Mining Project Draft         5%          No       Week 10
Data Mining Project Report        20%         No       Week 12
Data Mining Project Poster        5%          No       Week 13
Participation in Lab Exercises    5%          No       Weekly
Final Exam                        50%         No       Examination Period

Data Mining Project Plan

Due: Week 4
Weighting: 0%

A project plan template will be provided in iLearn.


On successful completion you will be able to:
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data

Market Basket Analysis Report

Due: Week 7
Weighting: 15%

The Undirected Knowledge Discovery (Cluster Analysis and Market Basket Analysis) project is an individual assessment task.

Students are allowed to use data sets from their workplaces. However, they need to consult A/Prof Bilgin for approval of the suitability of the data set for the project.

Examples of earlier student reports will be provided within iLearn.


On successful completion you will be able to:
  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Data Mining Project Draft

Due: Week 10
Weighting: 5%

A draft of the Data Mining Project Report.


On successful completion you will be able to:
  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Data Mining Project Report

Due: Week 12
Weighting: 20%

The Directed Knowledge Discovery (Data Mining) project is an individual project; however, students may decide to form a group to work on it. Students who form a group need to inform A/Prof Ayse Bilgin as soon as possible.

Students are allowed to use data sets from their workplaces. However, they need to consult A/Prof Bilgin for approval of the suitability of the data set for the project.

The expected format for the report and the examples of earlier reports will be provided within iLearn.


On successful completion you will be able to:
  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Data Mining Project Poster

Due: Week 13
Weighting: 5%

The poster will have a separate submission on iLearn. The expected format is a PowerPoint or PDF file. Also include a summary handout (see iLearn) with your submission (possibly as a PDF document).


On successful completion you will be able to:
  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Participation in Lab Exercises

Due: Weekly
Weighting: 5%

Lab exercise submission and contribution to tutorial discussions will be taken into account when allocating the marks. For individual due dates of lab exercises see iLearn.


On successful completion you will be able to:
  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Final Exam

Due: Examination Period
Weighting: 50%

The final examination is 3 hours long, with 10 minutes of reading time, and will be held during the exam period. You will be permitted to bring an A4 sheet of notes, handwritten or typed, on both sides, into the final examination. This summary must be submitted with your exam paper.

Calculators are permitted, but may be used only as calculators, and not as storage devices. No electronic devices other than calculators are permitted to be used during the exam.

The final examination will be timetabled in the official University examination timetable. The University Examination timetable will be available in draft form approximately eight weeks before the commencement of the examinations and in final form approximately four weeks before the commencement of the examinations at: http://students.mq.edu.au/student_admin/exams/

The only acceptable reasons for not sitting an examination at the designated time are documented illness or unavoidable disruption. If this happens, you may wish to consider applying for Special Consideration. Students need to apply for Special Consideration online at https://ask.mq.edu.au/

If you receive special consideration for the final exam, a supplementary exam will be scheduled in the interval between the regular exam period and the start of the next session. By making a special consideration application for the final exam you are declaring yourself available for a resit during the supplementary examination period and will not be eligible for a second special consideration approval based on pre-existing commitments. Please ensure you are familiar with the policy prior to submitting an application. You can check the supplementary exam information page on FSE101 in iLearn (bit.ly/FSESupp) for dates, and approved applicants will receive an individual notification one week prior to the exam with the exact date and time of their supplementary examination.

Your final grade in STAT828 will be based on your work during the semester and in the final examination.


On successful completion you will be able to:
  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Delivery and Resources

Classes

Lectures

Lectures begin in Week 1. 

Tutorials   

Tutorials also begin in Week 1.  The aim of tutorials is to practise techniques learned in lectures.  They are designed so that students work through the exercises asking as many questions as they need to improve their understanding.

Teaching and Learning Strategy

  • Students are expected to read the relevant lecture material, and to complete and submit tutorial exercises each week.
  • Students must satisfactorily submit a minimum of 10 (ten) of the weekly lab exercises to achieve the participation mark of 5%.
  • Additional readings will be provided through iLearn to provide opportunities for students to increase their knowledge.
  • Weekly tutorial exercises are set for individual development and considered formative assessment, although participation (see assessments) will be based on the submitted lab exercises. These lab exercises are designed to be completed by each individual to achieve the best learning. Therefore, it is suggested that if students decide to work together, the final product should be written individually and group work should be acknowledged to draw the attention of the lecturer. 
  • Projects are extensions to the lab exercises. They require applying the learned techniques to unseen data sets and writing professional reports.

Relationship between Assessment and Learning Outcomes

While reading the relevant lecture material is important, it is only a small proportion of the total workload for the unit: additional readings, research in the library (or internet), working with other students in (virtual) groups, completing assignments, using the computer packages to develop models and private study are all parts of the work involved.

Weekly lab (tutorial) exercises are due on the Monday of the week following their date of issue (e.g. the Week 2 lab exercise submission is due in Week 3, on Monday by 6pm). You need to submit them through iLearn.

Suggested solutions to lab exercises will be provided through iLearn in a timely manner. You are expected to submit at least 8 of the lab exercises. Failure to comply with this may result in getting a zero participation mark. Instead of content marking for the weekly lab exercises, a participation mark will be given to each student at the end of the semester based on the quality of their submissions.

See Assessment Section for other assessment tasks.

If for any reason, students cannot complete their assessment tasks on time, they have to contact the lecturer in advance. No extensions for the lab exercises will be granted unless satisfactory documentation outlining illness or misadventure is submitted.

The marked assessments (projects) will be returned by the Lecturer. Only Word or PDF files will be accepted; each page should have the student ID and student name as a footer to eliminate any problems. When naming files, please adopt the following convention: StudentID-(Your Surname)(Initial of Your First Name) – Assessment Task (Lab 1 or Assignment 1), e.g., 40000000-BilginA-Project 1. No other format of naming the assessment tasks will be accepted. If you are unable to submit your assessment through iLearn due to technical problems, a single electronic (Word or PDF) file can be e-mailed to A/Prof Ayse Bilgin (ayse.bilgin@mq.edu.au).

Unit Schedule

Week 1: Introduction to Data Mining & Introduction to R

Week 2: Data Preprocessing, missing data, outliers & Further R

Week 3: Descriptive and exploratory data mining, concept hierarchies & Graphical displays with R        

Week 4: Graphics and data explorations & Introduction to IBM SPSS Modeler

Week 5: Market Basket Analysis 

Week 6: Cluster Analysis (1)

Week 7: Classification (1) 

Week 8: Classification (2)

Week 9:  Classification (3)

Week 10: Classification (4)

Week 11: Classification (5)

Week 12: Cluster Analysis (2)

Week 13: Revision and Data Mining Project Poster Presentations

Note that the order of the lectures might change. All lab exercises are due by 5:30pm a week after they are issued.

Policies and Procedures

Macquarie University policies and procedures are accessible from Policy Central (https://staff.mq.edu.au/work/strategy-planning-and-governance/university-policies-and-procedures/policy-central). Students should be aware of the following policies in particular with regard to Learning and Teaching:

Undergraduate students seeking more policy resources can visit the Student Policy Gateway (https://students.mq.edu.au/support/study/student-policy-gateway). It is your one-stop-shop for the key policies you need to know about throughout your undergraduate student journey.

If you would like to see all the policies relevant to Learning and Teaching visit Policy Central (https://staff.mq.edu.au/work/strategy-planning-and-governance/university-policies-and-procedures/policy-central).

Student Code of Conduct

Macquarie University students have a responsibility to be familiar with the Student Code of Conduct: https://students.mq.edu.au/study/getting-started/student-conduct

Results

Results shown in iLearn, or released directly by your Unit Convenor, are not confirmed as they are subject to final approval by the University. Once approved, final results will be sent to your student email address and will be made available in eStudent. For more information visit ask.mq.edu.au.

Student Support

Macquarie University provides a range of support services for students. For details, visit http://students.mq.edu.au/support/

Learning Skills

Learning Skills (mq.edu.au/learningskills) provides academic writing resources and study strategies to improve your marks and take control of your study.

Macquarie University offers various workshops for postgraduate students, which you might find useful. The overviews and timetables can be accessed at http://www.students.mq.edu.au/support/learning_skills/workshops/postgraduate_workshops/

There are also specific workshops that help international students to integrate into the Australian education system: http://www.international.mq.edu.au/

Student Services and Support

Students with a disability are encouraged to contact the Disability Service, which can provide appropriate help with any issues that arise during their studies.

Student Enquiries

For all student enquiries, visit Student Connect at ask.mq.edu.au

IT Help

For help with University computer systems and technology, visit http://www.mq.edu.au/about_us/offices_and_units/information_technology/help/

When using the University's IT, you must adhere to the Acceptable Use of IT Resources Policy. The policy applies to all who connect to the MQ network including students.

Graduate Capabilities

PG - Capable of Professional and Personal Judgment and Initiative

Our postgraduates will demonstrate a high standard of discernment and common sense in their professional and personal judgment. They will have the ability to make informed choices and decisions that reflect both the nature of their professional work and their personal perspectives.

This graduate capability is supported by:

Learning outcomes

  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Assessment tasks

  • Data Mining Project Plan
  • Market Basket Analysis Report
  • Data Mining Project Draft
  • Data Mining Project Report
  • Data Mining Project Poster
  • Participation in Lab Exercises
  • Final Exam

PG - Discipline Knowledge and Skills

Our postgraduates will be able to demonstrate a significantly enhanced depth and breadth of knowledge, scholarly understanding, and specific subject content knowledge in their chosen fields.

This graduate capability is supported by:

Learning outcomes

  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Assessment tasks

  • Data Mining Project Plan
  • Market Basket Analysis Report
  • Data Mining Project Draft
  • Data Mining Project Report
  • Data Mining Project Poster
  • Participation in Lab Exercises
  • Final Exam

PG - Critical, Analytical and Integrative Thinking

Our postgraduates will be capable of utilising and reflecting on prior knowledge and experience, of applying higher level critical thinking skills, and of integrating and synthesising learning and knowledge from a range of sources and environments. A characteristic of this form of thinking is the generation of new, professionally oriented knowledge through personal or group-based critique of practice and theory.

This graduate capability is supported by:

Learning outcomes

  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Assessment tasks

  • Data Mining Project Plan
  • Market Basket Analysis Report
  • Data Mining Project Draft
  • Data Mining Project Report
  • Data Mining Project Poster
  • Participation in Lab Exercises
  • Final Exam

PG - Research and Problem Solving Capability

Our postgraduates will be capable of systematic enquiry; able to use research skills to create new knowledge that can be applied to real world issues, or contribute to a field of study or practice to enhance society. They will be capable of creative questioning, problem finding and problem solving.

This graduate capability is supported by:

Learning outcomes

  • have an extensive understanding of the principles and the concepts of data mining methods and their applications
  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Assessment tasks

  • Data Mining Project Plan
  • Market Basket Analysis Report
  • Data Mining Project Draft
  • Data Mining Project Report
  • Data Mining Project Poster
  • Participation in Lab Exercises
  • Final Exam

PG - Effective Communication

Our postgraduates will be able to communicate effectively and convey their views to different social, cultural, and professional audiences. They will be able to use a variety of technologically supported media to communicate with empathy using a range of written, spoken or visual formats.

This graduate capability is supported by:

Learning outcomes

  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • explain the link between descriptive and predictive data mining to support good decision making
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Assessment tasks

  • Data Mining Project Plan
  • Market Basket Analysis Report
  • Data Mining Project Draft
  • Data Mining Project Report
  • Data Mining Project Poster
  • Participation in Lab Exercises
  • Final Exam

PG - Engaged and Responsible, Active and Ethical Citizens

Our postgraduates will be ethically aware and capable of confident transformative action in relation to their professional responsibilities and the wider community. They will have a sense of connectedness with others and country and have a sense of mutual obligation. They will be able to appreciate the impact of their professional roles for social justice and inclusion related to national and global issues.

This graduate capability is supported by:

Learning outcomes

  • apply creative thinking to resolve complex problems or issues as well as summarising complex multivariate data and creating visual summaries of such data
  • examine and compare the differences between different decision trees and interpret sophisticated decision tree models for decision makers by writing a professional data mining report
  • analyse data sets by applying classification and cluster analysis methods and use their results to create an action plan for the management
  • apply market basket analysis to the sales data of a company, synthesise the results for a professional data mining report
  • demonstrate level of knowledge and technical expertise in data mining activities, including cleaning and transformation of data; presentation of results of mining and modelling to possible users

Assessment tasks

  • Data Mining Project Plan
  • Market Basket Analysis Report
  • Data Mining Project Draft
  • Data Mining Project Report
  • Data Mining Project Poster
  • Participation in Lab Exercises
  • Final Exam

Software Packages

R: We use the open-source software R. You can download and install a copy of the program from the developers’ web page: http://cran.r-project.org/ or www.R-project.org

R is command line software. It might be hard to learn if you are not used to this kind of environment. However, the benefits of learning to use this software outweigh the disadvantages. The benefits include, but are not limited to, the following: it is free, it is very flexible, and it has great support from the R community through newsgroups.

RStudio: We also use RStudio, a user interface for R, which can be downloaded from https://www.rstudio.com/products/rstudio/download/

IBM SPSS Modeler: This is graphically based data mining software from IBM that is widely used by business. It can be accessed through iLab, Macquarie University's personal computer laboratory on the Internet.

Learning management system (LMS)

There is an iLearn site for this unit (iLearn is a modification of Moodle) where the required course materials will be posted. Communication within the unit is possible via iLearn forums. Remember to log in and read the posts at least twice a week, as the lecturer makes important announcements there.

The web page for the LMS is ilearn.mq.edu.au; use your Macquarie OneID to log in.