
COMP3220 – Document Processing and Semantic Technologies

2022 – Session 1, In person-scheduled-weekday, North Ryde

General Information

Unit convenor and teaching staff
Rolf Schwitter
Contact via Email
4 Research Park Drive; Office 359
By appointment
Diego Molla-Aliod
Contact via Email
4 Research Park Drive; Office 358
By appointment
Credit points
10
Prerequisites
130cp at 1000 level or above including COMP2110 or COMP249 or COMP2200 or COMP257
Corequisites
Co-badged status
Unit description

This unit explores the issues involved in building natural language processing (NLP) applications that operate on large bodies of real text, such as those found on the World Wide Web. We discuss core methods and tools for dealing with data on the web, in particular machine learning platforms widely used in industry. The unit also explores recent developments of the web, such as emerging semantic web technologies and the corresponding standards promoted by the World Wide Web Consortium (W3C). Application areas covered include web search, sentiment analysis, and information extraction.

Important Academic Dates

Information about important academic dates, including deadlines for withdrawing from units, is available at https://www.mq.edu.au/study/calendar-of-dates

Learning Outcomes

On successful completion of this unit, you will be able to:

  • ULO1: Explain the main techniques that are used to develop and implement intelligent document processing applications.
  • ULO2: Describe the functionality of the key components in document processing architectures.
  • ULO3: Implement text processing applications using a programming language.
  • ULO4: Apply web technology to document processing.

General Assessment Information

The assessment of this unit consists of three assignments and a final exam. You will submit your solutions to the three assignments via iLearn by the due date. The final examination is a closed-book examination and will be taken in person during the exam period.

Late Submission

Late submissions will not be accepted without an approved Special Consideration request. Without one, an assessment submitted after the due date will receive a mark of zero.

Supplementary Exam

If you receive Special Consideration for the final exam, a supplementary exam will be scheduled after the normal exam period, following the release of marks. By making a special consideration application for the final exam you are declaring yourself available for a resit during the supplementary examination period and will not be eligible for a second special consideration approval based on pre-existing commitments. Please ensure you are familiar with the policy prior to submitting an application. Approved applicants will receive an individual notification one week prior to the exam with the exact date and time of their supplementary examination.

Assessment Tasks

Name Weighting Hurdle Due
Assignment 1 10% No Week 3
Assignment 2 20% No 2nd Week of Recess
Assignment 3 20% No Week 12
Final Exam 50% No Examination Period

Assignment 1

Assessment Type 1: Programming Task
Indicative Time on Task 2: 10 hours
Due: Week 3
Weighting: 10%

In this assignment you will implement a simple document processing application that uses pre-packaged tools.

On successful completion you will be able to:
  • Explain the main techniques that are used to develop and implement intelligent document processing applications.
  • Implement text processing applications using a programming language.
  • Apply web technology to document processing.

Assignment 2

Assessment Type 1: Programming Task
Indicative Time on Task 2: 20 hours
Due: 2nd Week of Recess
Weighting: 20%

This assignment will use more powerful techniques such as those used in commercial and research applications. You will experience the processing of real text data, which can be messy and unpredictable at times. At the end of the assignment you will submit a report describing the system, its implementation, and its evaluation.

On successful completion you will be able to:
  • Explain the main techniques that are used to develop and implement intelligent document processing applications.
  • Describe the functionality of the key components in document processing architectures.
  • Implement text processing applications using a programming language.
  • Apply web technology to document processing.

Assignment 3

Assessment Type 1: Programming Task
Indicative Time on Task 2: 20 hours
Due: Week 12
Weighting: 20%

In this assignment you will experiment with the integration of Semantic Web technology into document processing. You will be asked to study a particular domain and report on the integration of Semantic Web technologies suitable for the domain, including what sort of SPARQL queries would be applicable to solve specific user needs.

On successful completion you will be able to:
  • Explain the main techniques that are used to develop and implement intelligent document processing applications.
  • Describe the functionality of the key components in document processing architectures.
  • Implement text processing applications using a programming language.
  • Apply web technology to document processing.

Final Exam

Assessment Type 1: Examination
Indicative Time on Task 2: 2 hours
Due: Examination Period
Weighting: 50%

The final exam will focus on the theoretical aspects of the unit. There will be few questions about implementation issues.

On successful completion you will be able to:
  • Explain the main techniques that are used to develop and implement intelligent document processing applications.
  • Describe the functionality of the key components in document processing architectures.

1 If you need help with your assignment, please contact:

  • the academic teaching staff in your unit for guidance in understanding or completing this type of assessment
  • the Writing Centre for academic skills support.

2 Indicative time-on-task is an estimate of the time required to complete the assessment task and is subject to individual variation.

Delivery and Resources

Required and Recommended Texts

Most of the content of the unit will be based on the following books:

  • Steven Bird, Ewan Klein, Edward Loper. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Available online.
  • François Chollet (2017). Deep Learning with Python. Manning Publications. Available in the library.
  • Dan Jurafsky and James H. Martin (2021). Speech and Language Processing (3rd ed. draft, Dec 29, 2021). Available online.

Additional material will be made available during the semester, in conjunction with the lecture notes. See the unit schedule for a listing of the most relevant reading for each week.

Technology Used and Required

The following software is used in COMP3220:

  1. Anaconda for Python 3.9
  2. NLTK (bundled with Anaconda)
  3. scikit-learn (bundled with Anaconda)
  4. gensim (can be installed using Anaconda)
  5. spaCy (can be installed using Anaconda)
  6. Keras (can be installed using Anaconda)
  7. TensorFlow (can be installed using Anaconda)
  8. rdflib (can be installed using Anaconda)
  9. rdfizer (https://pypi.org/project/rdfizer/)
  10. Protégé (https://protege.stanford.edu/)
  11. Clingo (https://potassco.org/clingo/)

This software is installed in the labs; you should also ensure that you have working copies of all the above on your own machine. Note that many packages come in various versions; to avoid potential incompatibilities, you should install versions as close as possible to those used in the labs.
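One way to compare your own installation against the lab machines is to print the versions of the Python packages listed above. The following is a minimal sketch that uses only the Python standard library. The distribution names in PACKAGES are assumed to be the names under which the packages are published on PyPI (names used by conda may differ), and Protégé and Clingo are left out because they are installed separately.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the Python packages in the list above.
PACKAGES = [
    "nltk",
    "scikit-learn",
    "gensim",
    "spacy",
    "keras",
    "tensorflow",
    "rdflib",
    "rdfizer",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:15s} {ver or 'not installed'}")
```

Running the script inside your Anaconda environment lists one package per line, so the output can be compared with the same listing produced on a lab machine.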

Unit Web Page

Note that most of the unit material is publicly available, while some material requires you to log in to iLearn to access it.

The unit will make extensive use of discussion boards hosted within iLearn. Please post questions there; the boards are monitored by the staff on the unit.

Unit Schedule

 

 

Week    Topic                                     Reading
1       Python for Text Processing                NLTK Ch 1
2       Information Retrieval                     Manning et al. (2008)
3       Text Classification                       NLTK Ch 6
4       Deep Learning for Text                    Chollet, Ch. 2 & 3
5       Processing Text Sequences                 Chollet, Ch. 6
6       Advanced Use of Deep Learning for Text    See lecture notes
7       Semantic Technologies                     A Review of the Semantic Web Field
-       Recess
8       RDF, RDF Schema and SPARQL                RDF Primer; SPARQL
9       DBpedia and Wikidata                      Wikipedia and DBpedia: a Comparative Study
10      Ontologies                                OWL Primer
11      Rule Languages                            Applications of Answer Set Programming
12      Recent Trends in Semantic Technologies    See lecture notes
13      Revision

Policies and Procedures

Macquarie University policies and procedures are accessible from Policy Central (https://policies.mq.edu.au). Students should be aware of the following policies in particular with regard to Learning and Teaching:

Students seeking more policy resources can visit Student Policies (https://students.mq.edu.au/support/study/policies). It is your one-stop-shop for the key policies you need to know about throughout your undergraduate student journey.

To find other policies relating to Teaching and Learning, visit Policy Central (https://policies.mq.edu.au) and use the search tool.

Student Code of Conduct

Macquarie University students have a responsibility to be familiar with the Student Code of Conduct: https://students.mq.edu.au/admin/other-resources/student-conduct

Results

Results published on platforms other than eStudent (e.g. iLearn, Coursera), or released directly by your Unit Convenor, are not confirmed, as they are subject to final approval by the University. Once approved, final results will be sent to your student email address and will be made available in eStudent. For more information visit ask.mq.edu.au or, if you are a Global MBA student, contact globalmba.support@mq.edu.au

Academic Integrity

At Macquarie, we believe academic integrity – honesty, respect, trust, responsibility, fairness and courage – is at the core of learning, teaching and research. We recognise that meeting the expectations required to complete your assessments can be challenging. So, we offer you a range of resources and services to help you reach your potential, including free online writing and maths support, academic skills development and wellbeing consultations.

Student Support

Macquarie University provides a range of support services for students. For details, visit http://students.mq.edu.au/support/

The Writing Centre

The Writing Centre provides resources to develop your English language proficiency, academic writing, and communication skills.

The Library provides online and face to face support to help you find and use relevant information resources. 

Student Services and Support

Macquarie University offers a range of Student Support Services including:

Student Enquiries

Got a question? Ask us via AskMQ, or contact Service Connect.

IT Help

For help with University computer systems and technology, visit http://www.mq.edu.au/about_us/offices_and_units/information_technology/help/

When using the University's IT, you must adhere to the Acceptable Use of IT Resources Policy. The policy applies to all who connect to the MQ network including students.

Assessment Standards: COMP3220

COMP3220 will be assessed and graded according to the University assessment and grading policies.

The following general standards of achievement will be used to assess each of the assessment tasks with respect to the letter grades. 

Grade Range Description
HD 85-100 Provides consistent evidence of deep and critical understanding in relation to the learning outcomes. There is substantial originality, insight or creativity in identifying, generating and communicating competing arguments, perspectives or problem solving approaches; critical evaluation of problems, their solutions and their implications; creativity in application as appropriate to the course/program.
D 75-84 Provides evidence of integration and evaluation of critical ideas, principles and theories, distinctive insight and ability in applying relevant skills and concepts in relation to learning outcomes. There is demonstration of frequent originality or creativity in defining and analysing issues or problems and providing solutions; and the use of means of communication appropriate to the course/program and the audience.
CR 65-74 Provides evidence of learning that goes beyond replication of content knowledge or skills relevant to the learning outcomes. There is demonstration of substantial understanding of fundamental concepts in the field of study and the ability to apply these concepts in a variety of contexts; convincing argumentation with appropriate coherent justification; communication of ideas fluently and clearly in terms of the conventions of the course/program.
P 50-64 Provides sufficient evidence of the achievement of learning outcomes. There is demonstration of understanding and application of fundamental concepts of the course/program; routine argumentation with acceptable justification; communication of information and ideas adequately in terms of the conventions of the course/program. The learning attainment is considered satisfactory or adequate or competent or capable in relation to the specified outcomes.
F 0-49 Does not provide evidence of attainment of learning outcomes. There is missing or partial or superficial or faulty understanding and application of the fundamental concepts in the field of study; missing, undeveloped, inappropriate or confusing argumentation; incomplete, confusing or lacking communication of ideas in ways that give little attention to the conventions of the course/program.

Assessment Process

These assessment standards will be used to give a numeric mark to each assessment submission during marking. Each mark corresponds to a letter grade according to the ranges above. The final mark for the unit will be calculated by combining the marks for all assessment tasks according to the percentage weightings shown in the assessment summary.


Unit information based on version 2022.02 of the Handbook