MAS8600/MAS8505/MAS8504 Graduate Foundations of Statistics and Data Science Report Briefing 2026-27

University: Newcastle University (NU)
Subject: MAS8600/MAS8505/MAS8504 Graduate Foundations of Statistics and Data Science

MAS8600/MAS8505/MAS8504 Report Briefing

Goals of this Session

In this lecture we will:

▶ Introduce the project task for the assessed coursework in this module

▶ Talk through the data we will be analysing for the project

▶ Answer some FAQs about the project

The Project

The Task

This assignment will require you to carry out an exploratory analysis of a previously unseen dataset, making use of the tools and techniques we’ve seen in the module

You will have complete free choice as to what aspect of the data to investigate, and you will be assessed solely on how you follow best practice when carrying out your analysis

You are not required to carry out any sophisticated statistical or computational analysis for this coursework. This is simply an exploratory analysis into the data (hence numerical and graphical summaries of the data will be perfectly adequate in terms of results)

The deadline for you to submit this coursework to Canvas is Friday 16th January at 4pm

The Context

You are provided with data from 7 years of a massive open online course (MOOC) developed by Newcastle University and run by the online skills provider FutureLearn

The course was titled “Cyber Security: Safety At Home, Online, and in Life”. It took 3 weeks to complete and was free to access (although learners could pay for a certificate to show that they had completed the course if they wished)

We have access to all the raw data collected by FutureLearn on the learners as they progressed through and interacted with the course

This is available via a zip file on the module Canvas page.

The Data

The data that you have access to contains .csv files pertaining to different aspects of student engagement (i.e., how much of the course the learner completed), their profiles (i.e., where the learners were from, what language they spoke), and sentiment (i.e., what they thought of the course)

This dataset is quite large, and so you should choose an aspect of it that you find interesting to investigate for your project. Learner IDs allow you to track individuals across the different data files.
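
As a minimal sketch of tracking learners across files, the example below joins two toy data frames on a shared ID column. The column names (`learner_id`, `country`, `step`) and the values are assumptions made for illustration, so check them against the actual .csv files before reusing the idea.

```r
library(dplyr)

# Toy stand-ins for two of the data files; column names and values are assumptions.
enrolments <- tibble(
  learner_id = c("a1", "b2", "c3"),
  country    = c("GB", "IN", "GB")
)
step_activity <- tibble(
  learner_id = c("a1", "a1", "c3"),
  step       = c("1.1", "1.2", "1.1")
)

# Count the steps each learner started.
steps_per_learner <- step_activity %>%
  count(learner_id, name = "steps_started")

# Attach the counts to the profile data; learners with no activity get 0.
learner_summary <- enrolments %>%
  left_join(steps_per_learner, by = "learner_id") %>%
  mutate(steps_started = coalesce(steps_started, 0L))
```

A left join keeps every enrolled learner, including those who never started a step, which is usually what an engagement question needs.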

Different Data Files for Each Run of the Course

▶ Archetype Survey Responses: the different ‘learner types’ of the different individuals taking the course

▶ Enrolments: data on the different learners: sex, country, age, etc., as well as when they enrolled and unenrolled

▶ Leaving Survey Responses: learner responses to a survey on why they left the course

▶ Question Response: how learners performed on different quiz questions during the course

▶ Step Activity: when learners first attempted/completed the different steps of the course

▶ Video Stats: where and how the different videos from the course were watched

▶ Weekly Sentiment Survey Responses: weekly feedback responses from the different learners
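
Because each run of the course has its own set of files, it can help to stack the same file type from several runs into one data frame with a column recording the run. The sketch below writes two tiny example files to a temporary folder so that it runs anywhere. The file-name pattern (`run-1_enrolments.csv`) is hypothetical and may not match the names in the zip file.

```r
library(dplyr)

# Create two tiny example files in a temporary folder (hypothetical file names).
data_dir <- file.path(tempdir(), "fl_example")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(learner_id = c("a1", "b2")),
          file.path(data_dir, "run-1_enrolments.csv"), row.names = FALSE)
write.csv(data.frame(learner_id = "c3"),
          file.path(data_dir, "run-2_enrolments.csv"), row.names = FALSE)

# Find every enrolments file and name each path after the run it belongs to.
files <- list.files(data_dir, pattern = "_enrolments\\.csv$", full.names = TRUE)
names(files) <- sub("_enrolments\\.csv$", "", basename(files))

# Read each file and stack them, storing the list names in a "run" column.
all_enrolments <- bind_rows(
  lapply(files, read.csv, stringsAsFactors = FALSE),
  .id = "run"
)
```

ProjectTemplate can also load the files placed in its data directory automatically, so a manual read like this is only one possible approach.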

The Project

You are to carry out an exploratory investigation into the FutureLearn data, to generate insights that you think would be relevant/interesting to stakeholders

You have complete freedom to choose any aspect of the data to investigate, and you won’t be judged on how sophisticated your analysis is, or whether you actually manage to generate the insight(s) you looked for.

When choosing your area of investigation, you should be mindful of the timescale of the project, and so should avoid investigations which would be particularly difficult or time consuming to carry out.

The Analysis

You should identify a research question related to the FutureLearn data that you find interesting, and then carry out an exploratory analysis to investigate it, following the steps and structure of CRISP-DM.

For your analysis, you should complete 2 cycles of CRISP-DM, i.e., you should investigate two questions, where the second question builds on the results of the first. For example, a first investigation of “which countries/continents are engaging with the FutureLearn course?” might lead to a second investigation of “how has this changed over the different years of the course?”
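
To make the two-cycle idea concrete, here is a rough sketch on toy data in which the second summary refines the first. The column names (`country`, `enrol_year`) and the values are invented for illustration.

```r
library(dplyr)

# Toy enrolment data; column names and values are assumptions.
enrolments <- tibble(
  country    = c("GB", "GB", "IN", "IN", "IN", "US"),
  enrol_year = c(2016, 2017, 2016, 2017, 2017, 2016)
)

# Cycle 1: which countries are engaging with the course?
by_country <- enrolments %>%
  count(country, sort = TRUE, name = "learners")

# Cycle 2: how has this changed over the different years?
by_country_year <- enrolments %>%
  count(enrol_year, country, name = "learners")
```

The second table only becomes an interesting question once the first has shown which countries dominate, which is the kind of link between cycles the report should make explicit.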

You should document your analysis and findings in an R Markdown report (with a strict page limit of 15 pages)

CRISP-DM Cycles

Deliverables

There are 2 deliverables that you are required to submit for this project:

▶ Report (50% of total mark) – A report describing the steps in your analysis, making reference to the different stages of CRISP-DM

▶ Code (50%) – A zip file containing the folder created by ProjectTemplate, containing all of the code needed to run your analysis and generate your report

Code Submission Deliverables

Within your code submission, you should have:

▶ ALL directories created by ProjectTemplate (data, munge, cache, etc.)

▶ A README file at the top level which describes what the analysis does, gives instructions on how to run it, etc.

▶ A file containing the Git log for your project

▶ A lockfile created by renv which describes the package versions used to build your analysis
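
The last two items can be produced from within R. The helper functions below are a hypothetical sketch: the function names and the output file name `git-log.txt` are assumptions, and the code presumes that git is installed, that the working directory is inside your Git repository, and that renv has already been initialised for the project.

```r
# Write the Git history of the current project to a text file.
# Assumes git is on the PATH and the working directory is inside the repository.
export_git_log <- function(path = "git-log.txt") {
  system2("git", c("log", "--date=iso"), stdout = path)
  invisible(path)
}

# Record the package versions currently in use in renv.lock.
# Assumes renv::init() has already been run for this project.
snapshot_packages <- function() {
  renv::snapshot(prompt = FALSE)
}
```

Calling `export_git_log()` and `snapshot_packages()` shortly before zipping the project folder should leave both files up to date.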

Carrying Out the Analysis

You should make use of the tools and techniques taught in the module when carrying out your analysis. Specifically, you should:

▶ Structure your analysis following the steps of CRISP-DM, and document this in your analysis report

▶ Use ProjectTemplate to structure your directory and run your analysis

▶ Make use of dplyr to handle data transformation/wrangling

▶ Use ggplot2 to produce any plots which showcase your findings

▶ Write your analysis report using R Markdown

▶ Use Git to back up your project

▶ Use renv to ensure that your project is reproducible
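
As an illustration of the dplyr and ggplot2 items in the list above, the sketch below summarises toy data and plots the result. The data, column names, and labels are all invented for the example.

```r
library(dplyr)
library(ggplot2)

# Toy enrolment data; the column name and values are assumptions.
enrolments <- tibble(country = c("GB", "GB", "IN", "IN", "IN", "US"))

# Wrangle with dplyr: one row per country with its learner count.
country_counts <- enrolments %>%
  count(country, name = "learners")

# Plot with ggplot2: a horizontal bar chart ordered by count, with readable labels.
p <- ggplot(country_counts, aes(x = reorder(country, learners), y = learners)) +
  geom_col() +
  coord_flip() +
  labs(
    x     = "Country",
    y     = "Number of learners",
    title = "Enrolments by country (toy data)"
  )
```

Printing `p` in an R Markdown chunk renders the plot in the knitted report.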

Stages of CRISP-DM

▶ Business Understanding – Who is your stakeholder? Why will your investigation help them? What are your analysis goals?

▶ Data Understanding – What data are you working with? What is the data like? Are there any quality issues?

▶ Data Preparation – What data transformation steps did you make, and why?

▶ Modelling – Does NOT necessarily mean that you need to build a stats/ML model. Just a phase where you generate insight to answer your question, so just using plots/tables is fine!

▶ Evaluation – Were you able to achieve your original goal? If not, why not and how could you change things? If so, what could you do next to take your insight further?

▶ Deployment – What are the main findings and takeaways from your project that need to be communicated?
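
For the Data Understanding phase, a quick check of missing or uninformative values is often a useful starting point. The sketch below uses toy data. The column names and the use of "Unknown" as a placeholder value are assumptions, so inspect the real files to see how missing answers are coded.

```r
library(dplyr)

# Toy profile data; the column names and the "Unknown" coding are assumptions.
enrolments <- tibble(
  learner_id = c("a1", "b2", "c3", "d4"),
  gender     = c("female", "Unknown", "male", "Unknown"),
  country    = c("GB", NA, "IN", "GB")
)

# Number of missing (NA) values in each column.
na_counts <- enrolments %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Share of rows where the gender field holds the placeholder value.
unknown_share <- enrolments %>%
  summarise(gender_unknown = mean(gender == "Unknown"))
```

Summaries like these give you concrete numbers to report when discussing data quality, and they can motivate the choices you describe under Data Preparation.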

Report Advice

▶ The exact structure and format of your report is up to you, but you should make sure to discuss all relevant aspects of CRISP-DM where appropriate

▶ How long you spend talking about the different aspects of CRISP-DM will depend on your own specific investigation. It’s perfectly fine for you to spend more/less time talking about something than someone else, so just focus on what you need to say for your own investigation

▶ If your second cycle is closely related to your first cycle (e.g., you’re using the same data from a different run of the course), you don’t need to repeat what you said in the early phases of your first cycle. It’s fine just to say that you’re using the same/similar variables rather than introducing them again (you don’t need to do lots of copying and pasting!). It’s normal for the second-cycle part of your report to be shorter than the first!

▶ Be sure to highlight how your first CRISP-DM cycle feeds into your second cycle

Report: What Are We Looking For

In the report, we do want to see:

▶ A description of 2 cycles of CRISP-DM, with a clear link from the first cycle to the second

▶ That the report covers all phases of CRISP-DM

▶ A well-presented report, with plots made using ggplot and tables nicely presented (not unformatted R output!)

▶ A clear narrative, with results tied to the research question identified
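
One way to avoid unformatted R output is to pass a summary table through `knitr::kable()`. This is only one option among several table packages, and the data below is invented for the example.

```r
library(knitr)

# Toy summary table; the values are invented.
country_counts <- data.frame(
  country  = c("IN", "GB", "US"),
  learners = c(3, 2, 1)
)

# Format the table with readable column names and a caption.
tbl <- kable(
  country_counts,
  col.names = c("Country", "Learners"),
  caption   = "Enrolments by country (toy data)"
)
```

Printing `tbl` inside a chunk renders a formatted table in the knitted report instead of raw console output.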

Report: What Are We NOT Looking For

In the report, we do not want to see:

▶ Only one cycle of CRISP-DM (or none!)

▶ More than two cycles of CRISP-DM

▶ Lots of results/plots with no descriptive text

▶ Important phases of CRISP-DM not mentioned

▶ Results/plots that don’t relate to the research question

▶ Complicated plots which are difficult to read and interpret

Analysis Advice

▶ Make sure your code is annotated and well formatted – make it as easy as possible for someone else (e.g., the marker) to understand what each piece of code does

▶ We’re interested in how you carry out your analysis (using the tools from the module) rather than what your analysis is/what it finds – avoid trying to carry out an investigation that is too big/complex to complete before the deadline

▶ Use tidyverse packages (e.g., dplyr and ggplot2) to carry out your data analysis and visualisation

▶ Make regular Git commits throughout your work on your project

▶ Your project should be fully executable by clicking Knit in your .Rmd file (i.e., there should be a chunk in your .Rmd file which runs load.project())
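
A possible shape for the chunks that make the report knit end to end is sketched below. It assumes the .Rmd file sits in the project's reports folder, one level below the ProjectTemplate root, so adjust the path if your layout differs. The two parts belong in separate chunks, because a `root.dir` setting only takes effect in the chunks that follow it.

```r
# Chunk 1 (setup): point knitr at the project root for all later chunks.
# Assumes the .Rmd sits one level below the ProjectTemplate root.
knitr::opts_knit$set(root.dir = normalizePath(".."))

# Chunk 2: load the project (data, munge scripts, libraries) from that root.
# The guard only exists so that this sketch does nothing when run outside
# a ProjectTemplate project; in your report you can call load.project() directly.
if (file.exists("config/global.dcf")) {
  library(ProjectTemplate)
  load.project()
}
```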

Making the Submission

All submissions should be made to Canvas under the Assignments tab

Please make sure well in advance that you know how to submit your deliverables

Please please please make sure to check you’re submitting the correct files when you make your submission

Any submissions made after the deadline will be capped as late. This is a central university process, which has nothing to do with module leaders and can’t be changed

Tip: Don’t Wait Until the Last Minute

The deadline for all submissions is 4pm on Friday 16th January 2027, but please don’t plan to start uploading at 15:59!

You have to make two separate uploads, which can be quite large files, and may take a little bit of time to upload (especially if you’re somewhere with not fantastic WiFi).

Work submitted late will be capped as late (again a central university process, nothing we can do) so please don’t take the risk.

Aim to submit by 3:30pm at the latest on deadline day to avoid stress!
