Evaluation of centre assessment grades and grading gaps in summer 2020

This research used data made available via the Office for National Statistics (ONS) Secure Research Service, which is being expanded and improved with ADR UK funding.

Authors: Dr Tim Stratton, Nadir Zanini (Ofqual) and Dr Philip Noden (Ofsted)

Date: July 2021

Research summary

A research project by Ofqual, the Department for Education, Ofsted, and UCAS used secure data to showcase the potential of the new Grading and Admissions Data for England (GRADE) dataset. The project evaluated the use of centre assessment grades to inform policy decisions around the future grading of GCSEs and A-Levels.

Due to the Covid-19 pandemic, the 2020 GCSE and A-Level summer exams were cancelled across England. Schools and colleges were asked to allocate grades based on teacher judgement: centre assessment grades (CAGs). CAGs were intended to be standardised, resulting in calculated grades. However, the standardisation process did not gain public confidence, and a policy change in August 2020 meant that grades awarded to students were based on the higher grade from their CAG or calculated grade.
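The August 2020 rule can be written down compactly. The sketch below is illustrative only: the function name `awarded_grade` and the two scale lists are assumptions made for this example, not a description of how grades were processed in practice.

```python
# Illustrative sketch of the August 2020 rule: a student receives the
# higher of their centre assessment grade (CAG) and their calculated grade.
# The scale lists and the function name are assumptions for this example.

GCSE_SCALE = ["U", "1", "2", "3", "4", "5", "6", "7", "8", "9"]  # lowest to highest
A_LEVEL_SCALE = ["U", "E", "D", "C", "B", "A", "A*"]             # lowest to highest


def awarded_grade(cag, calculated, scale):
    """Return whichever of the two grades sits higher on the given scale.

    Raises ValueError if either grade is not on the scale.
    """
    return max(cag, calculated, key=scale.index)


print(awarded_grade("B", "C", A_LEVEL_SCALE))  # CAG is higher
print(awarded_grade("6", "7", GCSE_SCALE))     # calculated grade is higher
```

Because the rule only ever moves a grade upwards, it is one reason to expect higher overall outcomes in 2020 than in previous years.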

This project sought to understand any bias or unfairness which may have been created by the change. Findings indicated that most relationships between grades and candidate or school features had not substantially changed compared to previous years, and the process did not unfairly affect any subgroups studied.

Hear Dr Tim Stratton, Nadir Zanini and Dr Philip Noden discuss this research as part of the ONS Research Excellence series on Thursday 20 April, 11:00-12:00.

Data used

A preliminary version of the Grading and Admissions Data for England (GRADE) dataset was the primary data source for this project. It contained candidate level results in England from 2018-2020 combined with candidate and school information from the National Pupil Database. This version of the data was held by Ofqual before it was uploaded to the ONS Secure Research Service.

GRADE is a dataset created through a sharing initiative between Ofqual, the Department for Education, and UCAS. It is the largest available dataset in England collecting teacher grading judgements.

Data was also matched to publicly available data on school attributes.

Methods used

For the first strand of analysis, researchers analysed patterns of CAGs compared to grading of previous years through multilevel models. Grades were converted into a numeric score – the dependent variable. Models were then nested by layering in variables from different levels: prior attainment, candidate, centre (school or college), subject-within-centre, and subject. These nested models were run separately for each year: for exam grades in 2018 and 2019, and CAGs in 2020. Random effects of candidate ID and centre number were included in all models to account for the hierarchical structure of the data and the non-independence of datapoints within these groups.
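The two preparatory steps described above, converting grades to a numeric score and building up nested model specifications level by level, might look something like the following. This is a sketch under stated assumptions: the point values, the variable names inside each block, and the helper names are all hypothetical, and the mixed-model fitting itself is not shown.

```python
# Hypothetical grade-to-score mappings; the scoring actually used by the
# researchers is not given in this summary.
GCSE_POINTS = {"U": 0, **{str(n): n for n in range(1, 10)}}
A_LEVEL_POINTS = {"U": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5, "A*": 6}


def grade_to_score(grade, qualification):
    """Convert a grade to the numeric score used as the dependent variable."""
    points = {"GCSE": GCSE_POINTS, "A-Level": A_LEVEL_POINTS}[qualification]
    return points[grade]


# The levels come from the text; the variable names are illustrative only.
VARIABLE_BLOCKS = [
    ("prior attainment", ["prior_attainment"]),
    ("candidate", ["gender"]),
    ("centre", ["centre_type"]),
    ("subject-within-centre", ["entries_in_subject"]),
    ("subject", ["has_non_exam_assessment"]),
]


def nested_specifications(blocks):
    """Return one cumulative variable list per level, each adding a block."""
    specs, included = [], []
    for level, variables in blocks:
        included = included + variables
        specs.append((level, list(included)))
    return specs


for level, variables in nested_specifications(VARIABLE_BLOCKS):
    print(f"up to {level}: {variables}")
```

Each specification would then be fitted separately for 2018, 2019 and 2020, with random effects for candidate and centre, so that the change in explained variance from one specification to the next can be attributed to the level just added.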

These models identified how much of the variation in grades was explained by the variables at each level, and whether that amount changed between years. Researchers could then pinpoint characteristics that had larger effects on candidates’ grades in 2020 than in previous years. They also compared the coefficients produced by the models for each year to observe any notable changes between 2020 and previous years.
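The idea of comparing explained variance across years can be illustrated with a much simpler stand-in. The sketch below uses ordinary least squares with a single predictor, not the multilevel models used in the research, and the data are invented purely for illustration.

```python
from statistics import mean


def share_of_variance_explained(predictor, score):
    """R-squared from a one-predictor least-squares fit of score on predictor."""
    x_bar, y_bar = mean(predictor), mean(score)
    sxx = sum((x - x_bar) ** 2 for x in predictor)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predictor, score))
    syy = sum((y - y_bar) ** 2 for y in score)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(predictor, score)
    )
    return 1 - residual_ss / syy


# Invented toy data: the same prior attainment, two sets of outcomes.
prior_attainment = [3, 4, 5, 6, 7, 8]
exam_scores = [4, 3, 6, 5, 8, 7]   # stands in for an exam year
cag_scores = [4, 4, 6, 6, 8, 8]    # stands in for 2020 CAGs

r2_exam = share_of_variance_explained(prior_attainment, exam_scores)
r2_cag = share_of_variance_explained(prior_attainment, cag_scores)
print(f"exam year: {r2_exam:.2f}, CAG year: {r2_cag:.2f}")
```

In the toy data the predictor explains more of the variance in the second set of scores; in the research, the analogous comparison was made between years for each level of variables.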

The second strand of research focused on ‘grading gaps’: the difference between a candidate’s CAGs and calculated grades. Binary logistic regression was used to identify whether candidates with certain characteristics were more likely to have a large grading gap (three or more grades). Further steps in the analysis identified how these associations changed once clustering at centre level and the subjects taken by individual candidates were accounted for.
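As a minimal illustration of the grading-gap outcome, the sketch below flags large gaps and compares two groups with an odds ratio. For a logistic regression with a single binary predictor, the exponentiated coefficient equals this sample odds ratio; the models in the research included many more variables. Treating the gap as an absolute difference, and all function names and data, are assumptions for this example.

```python
def grading_gap(cag_score, calculated_score):
    """Size of the gap between CAG and calculated grade, in grades.

    Using the absolute difference is an assumption made for this sketch.
    """
    return abs(cag_score - calculated_score)


def has_large_gap(cag_score, calculated_score, threshold=3):
    """Binary outcome: True when the gap is three or more grades."""
    return grading_gap(cag_score, calculated_score) >= threshold


def odds_ratio(flags_a, flags_b):
    """Odds of a large gap in group A relative to group B.

    Each group must contain at least one entry without a large gap, and
    group B at least one with, otherwise the ratio is undefined.
    """
    def odds(flags):
        with_gap = sum(flags)
        return with_gap / (len(flags) - with_gap)

    return odds(flags_a) / odds(flags_b)


# Invented (CAG score, calculated score) pairs for two hypothetical groups.
group_a = [(7, 4), (6, 6), (5, 4), (8, 7)]
group_b = [(7, 3), (6, 3), (5, 5), (4, 4)]

flags_a = [has_large_gap(cag, calc) for cag, calc in group_a]
flags_b = [has_large_gap(cag, calc) for cag, calc in group_b]
print(odds_ratio(flags_a, flags_b))
```

An odds ratio below 1 means large gaps are less common in group A than in group B; a value near 1 means the groups look alike on this outcome.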

Research findings

Compared to previous years, there was an overall increase in outcomes of around half a grade in 2020 for both GCSEs and A-Levels. However, most relationships between grades and the other features studied had not substantially changed. This suggests that teacher grading in 2020 did not introduce substantial bias, or substantially alter the relationships between grades and the observable characteristics studied.

The strongest predictor of grade outcomes was a candidate’s prior attainment, for both GCSE and A-Level. In 2020, this relationship was slightly stronger than in previous years, while candidate and school features explained less of the variance. This increase in the predictive power of prior attainment may represent CAGs factoring out ‘unpredictable’ variations in student outcomes. Unpredictable variations are often seen in normal years due to factors such as exam anxiety, last-minute revision, or the combination of questions which come up on exam papers. Alternatively, it may represent teachers’ over-reliance on prior attainment as a source of data without sufficiently considering individual candidate differences in performance.

Due to the increase in mean grade, there was some evidence that the relationship with prior attainment plateaued at the top of the grade distribution. On average, 2020 candidates with the highest prior attainment received slightly smaller increases in grades compared to previous years. This is because the top of the available grade range acts as a ceiling, limiting how far their grades could rise.
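The ceiling effect is simple arithmetic, sketched below. The half-grade uplift echoes the overall increase reported above, but the underlying scores and the function name are invented for illustration.

```python
TOP_GRADE = 9  # highest grade on the GCSE 9 to 1 scale


def realised_increase(underlying_score, uplift, top=TOP_GRADE):
    """Increase actually received once the score is capped at the top grade."""
    return min(underlying_score + uplift, top) - underlying_score


# The same uplift applied at different points of the distribution.
for score in (5.0, 8.8, 9.0):
    print(score, round(realised_increase(score, uplift=0.5), 2))
```

A candidate in the middle of the range receives the full uplift, while one already at or near the top grade can receive only part of it, or none.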

The most notable candidate level change was a decrease in the attainment gap between male and female students at A-Level. Though boys’ grades had previously somewhat exceeded girls’ grades on average, this effect was reduced in 2020. Other marginal effects explained a very small amount of the variation in grades.

At subject level, for both GCSE and A-Levels, subjects with more non-exam assessments tended to have the largest increase in grades in 2020. This could be due to teachers using results from internally marked non-exam assessments, which are usually a candidate’s highest graded element, to inform the allocation of CAGs.

Analysis also suggests that ‘facilitating’ subjects at A-Level - subjects considered to provide access to the widest range of university courses - tended to be more generously graded than non-facilitating subjects.

Although there were small differences between groups of students in their probability of a 3-grade gap between their CAGs and calculated grades, these disappeared once centre level clustering and subject choice were accounted for.

Research impact

This research helped to provide an evaluation of the grading approach taken by Ofqual in summer 2020 by giving additional assurance that the approach did not unfairly affect any subgroups of students.

By providing a nuanced view of the pros and cons of teacher grading, it gave valuable insights to inform Ofqual and the Department for Education’s decisions and policies around exams and grading in 2021 and 2022. It provided reassurance that using elements of teacher judgement in teacher assessment grades was a valid approach in the circumstances of 2021, and it supported the move back to exam-based assessments for 2022.

This research also provided a valuable example of how the GRADE dataset could be used and created collaborative links across different organisations. It highlighted the value of gaining insight from researchers and analysts with different perspectives across the education system, and has helped pave the way for future collaborative projects.

Research outputs

Publications and reports

Blogs, news posts, and videos

Presentations and awards

About the ONS Secure Research Service

The ONS Secure Research Service is an accredited trusted research environment, using the Five Safes Framework to provide secure access to de-identified, unpublished data. If you use ONS SRS data and would like to discuss writing a future case study with us, please ensure you have reported your outputs here: Outputs Reporting Form.
