Census microdata guide


"Samples of individual person-level records drawn from the 1991, 2001 and 2011 Censuses"

For information on accessing the data, see Get Data, Microdata

What are census microdata?

Census microdata are datasets containing random samples of anonymous individual records. Because of this structure, they were previously known as 'Samples of Anonymised Records' (SARs), a term that is still used in the names of the 1991 and 2001 files.

Each file contains a broad range of socio-demographic characteristics for respondents, with a particular emphasis on either individual, household, or geographical detail.

Downloadable files are designed to ensure that sample members cannot be identified. To achieve this confidentiality, the amount of detail is restricted to a non-disclosive level and each respondent appears in only one file. Despite these measures, the data still resemble what you might collect if you conducted a survey yourself, and can be analysed in the same way. They are particularly well suited to analysis in a statistical package such as SPSS, Stata or R.

The census microdata have the further advantage of much larger sample sizes than are typical in alternative survey data sources. For example, the 2011 Safeguarded Individual file is a 5% sample containing nearly three million cases. The largest files, the 2011 secure files for England and Wales, contain over five million cases each; however, these need to be accessed in a secure setting.
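Because the files are random samples with a known sampling fraction, a count observed in a file can be scaled up to an approximate population total. The following is a minimal sketch only: it assumes a simple random sample and ignores any weighting or design details documented for a particular file, and the function name and the example figures are hypothetical.

```python
import math


def estimate_population_count(sample_count, sample_size, sampling_fraction):
    """Scale a sample count up to an approximate population total.

    Assumes a simple random sample (an illustrative assumption, not a
    description of how any particular microdata file was drawn).
    Returns the estimate and an approximate standard error.
    """
    p = sample_count / sample_size                     # proportion in the sample
    population_size = sample_size / sampling_fraction  # implied population size
    estimate = p * population_size
    # Approximate standard error of a proportion, with the usual
    # finite population correction (1 - sampling fraction).
    se_p = math.sqrt(p * (1.0 - p) / sample_size * (1.0 - sampling_fraction))
    return estimate, se_p * population_size


# Hypothetical figures: 1,500 members of a group found in a 5% sample
# of 2,800,000 records.
estimate, se = estimate_population_count(1500, 2_800_000, 0.05)
print(f"Estimated population count: {estimate:.0f} (approx. SE {se:.0f})")
```

With a 5% sample each record stands for roughly 20 people, so even a group with only a few thousand members in the population is represented by a workable number of cases.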

The term 'census microdata' has been adopted to describe data for a single point in time. This contrasts with other individual-level (or microdata) census products such as the Office for National Statistics Longitudinal Study, which links individual data records over time. Unlike the Longitudinal Study, however, most SARs files can be downloaded and used at your own place of work rather than requiring access from a safe setting.

What can the census microdata tell us?

Microdata files allow researchers to analyse data in a very flexible manner. Users can:

  • apply their own definitions and create new variables
  • define tables
  • work with sub-populations
  • conduct multivariate analyses
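The first three of these tasks can be sketched in a few lines of code. This is an illustrative sketch only: the records, variable names and category labels are hypothetical and do not correspond to the coding of any real microdata file, and multivariate modelling is left to a statistical package.

```python
from collections import Counter

# Hypothetical person-level records; real files use their own
# variable names and category codes.
records = [
    {"age": 34, "sex": "F", "ethnic_group": "Indian", "economic_activity": "employed"},
    {"age": 52, "sex": "M", "ethnic_group": "Indian", "economic_activity": "unemployed"},
    {"age": 71, "sex": "F", "ethnic_group": "White British", "economic_activity": "retired"},
    {"age": 28, "sex": "M", "ethnic_group": "White British", "economic_activity": "employed"},
    {"age": 45, "sex": "F", "ethnic_group": "Indian", "economic_activity": "employed"},
    {"age": 12, "sex": "M", "ethnic_group": "White British", "economic_activity": "child"},
    {"age": 60, "sex": "F", "ethnic_group": "White British", "economic_activity": "employed"},
]


def age_band(age):
    """A user-defined age banding (the cut points are our own choice)."""
    if age < 16:
        return "0-15"
    if age < 65:
        return "16-64"
    return "65+"


# 1. Apply our own definition to create a new variable.
for record in records:
    record["age_band"] = age_band(record["age"])

# 2. Work with a sub-population: people aged 16 to 64.
working_age = [r for r in records if r["age_band"] == "16-64"]

# 3. Define a table: ethnic group by economic activity.
table = Counter((r["ethnic_group"], r["economic_activity"]) for r in working_age)
for (group, activity), count in sorted(table.items()):
    print(f"{group:15} {activity:12} {count}")
```

Published aggregate tables fix the categories in advance; with microdata the banding and the cross-classification are chosen by the analyst.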

Because the files are very large, they also permit analysis of relatively small sub-populations for which it is often difficult to obtain sufficient sample sizes in other survey data. Consequently, a major use of the census microdata has been the analysis of individual ethnic groups.

Download an Excel spreadsheet showing the topics available in each microdata file.

2011 DATA

Files for 2011 are available in a range of forms.

Open Teaching Data

Open Government Licence Teaching files are 1% samples containing simple key variables. They are downloadable in CSV format from the Office for National Statistics for England and Wales, the Northern Ireland Statistics and Research Agency for Northern Ireland, and National Records of Scotland for Scotland.

The England and Wales, Scotland, and Northern Ireland teaching files are also available in Nesstar/SPSS/Stata format via the UK Data Service without registration.

Safeguarded Data available under licence

Each census office has produced two files. Each is a 5% sample of cases:

  • A regional file with enhanced detail on socio-economic classifications like age and country of birth.
  • A file with less detail on age and country of birth, but with more geography. Local authorities with populations above 120,000 are distinguished, while smaller authorities are grouped with a neighbour.

These files are restricted to users who have agreed to some data management conditions to ensure that the confidentiality of respondents is protected.

More information about each file is available through the appropriate catalogue record; see Get Census Microdata.

Secure data

Access to these data is tightly controlled because they contain additional detail, which means that they are deemed to be 'personal' under the Statistics and Registration Service Act 2007. While the data are still anonymous, they contain full local authority detail, full age and full country of birth.

Additionally, sample sizes are larger than those of the equivalent safeguarded files. In 2001 the files were 5% samples. In 2011 the files were 10% samples. No similar file was released for 1991.

There is no current plan to make secure data available from the UK Data Service. Instead, users will be able to access these data through the Virtual Microdata Laboratory. At the time of writing, data were available for England and Wales (at the Office for National Statistics) and Northern Ireland (at NISRA). Scottish secure data are anticipated.

Users who wish to use these data will need to contact the relevant census office.

THE 2001 INDIVIDUAL SAR

The 2001 Individual SAR is a good all-round, easy-to-access file with considerable individual-level detail, allowing comparisons between UK countries and regions. For example, Popham (2006) used this file to demonstrate how Scotland's higher levels of ill health, compared to England, can largely be explained through differences in employment and socio-economic position.

THE 2001 SAM

[Map: London boroughs, based on data from the 2001 SAM]

The 2001 SAM enables analyses to be undertaken at the local authority level. This level of geography supports mapping, local-level tabulations, and multivariate analyses that include local-area-level variables. These data differ from aggregate outputs such as the Census Area Statistics in that users can examine individual-level characteristics within areas and define area-level measures of their own choice. Users can define and use subsets, and can create new classifications by grouping existing categories or combining information from more than one socio-economic characteristic. The accompanying map is based on data from the SAM and illustrates the proportion of residents in London boroughs who are working-age males with a professional qualification.
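An area-level measure of the kind shown on the map can be derived by classifying each individual record and then aggregating by local authority. The sketch below is illustrative only: the records and variable names are hypothetical, and 'working age' is assumed here to mean ages 16 to 64, which may not match the definition used for the map.

```python
from collections import defaultdict

# Hypothetical individual records; the borough names are real but the
# data and variable names are invented for illustration.
records = [
    {"local_authority": "Camden", "age": 40, "sex": "M", "professional_qualification": True},
    {"local_authority": "Camden", "age": 30, "sex": "F", "professional_qualification": True},
    {"local_authority": "Camden", "age": 70, "sex": "M", "professional_qualification": True},
    {"local_authority": "Camden", "age": 25, "sex": "M", "professional_qualification": False},
    {"local_authority": "Hackney", "age": 50, "sex": "M", "professional_qualification": True},
    {"local_authority": "Hackney", "age": 35, "sex": "M", "professional_qualification": True},
    {"local_authority": "Hackney", "age": 20, "sex": "F", "professional_qualification": False},
    {"local_authority": "Hackney", "age": 45, "sex": "F", "professional_qualification": True},
]


def is_target(record):
    """Working-age male with a professional qualification.

    'Working age' is assumed to be 16-64 for this sketch.
    """
    return (
        record["sex"] == "M"
        and 16 <= record["age"] <= 64
        and record["professional_qualification"]
    )


residents = defaultdict(int)  # all sample members per local authority
matches = defaultdict(int)    # sample members meeting the definition

for record in records:
    residents[record["local_authority"]] += 1
    if is_target(record):
        matches[record["local_authority"]] += 1

# Area-level measure: proportion of residents meeting the definition.
proportions = {la: matches[la] / residents[la] for la in residents}
for la, proportion in sorted(proportions.items()):
    print(f"{la}: {proportion:.2f}")
```

Because the definition lives in `is_target`, changing the age range or adding a further condition redefines the area-level measure without needing a new published table.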

Further Reading


Li, Y. (2004) Samples of Anonymised Records (SARs) from the UK censuses: a unique source for social science research, Sociology, 38(3): 553-572.

Popham, F. (2006) Is there a 'Scottish effect' for self reports of health? Individual level analysis of the 2001 UK census, BMC Public Health, 6: 191.
