Data explained

Sourcing data

What data do I need?

The types of data that are most commonly linked for health and health-related research are: administrative data, clinical data and project-specific data.

Administrative data is information that is routinely collected as part of the delivery of a service. These data are collected from service providers by government departments and agencies, and other organisations, and used for policy, planning, management, monitoring, evaluation, and research purposes.

Key features of administrative data include:

  • Collected during service delivery 
  • Population level coverage
  • Availability of historical data
  • Collection by government generally required or authorised by law
  • Mandated reporting requirements typically allow this data to be collected without patient consent 
  • Used by government departments for planning, monitoring, and funding

Clinical data are the detailed patient records that are collected and used by health professionals during the provision of care to patients.

Examples of clinical data include medical notes, pathology results, imaging (for example, X-rays and CT scans) and other health test results.

Key features of clinical data include:

  • Collected during service delivery  
  • Collected either directly from the patient or with their explicit knowledge and consent 
  • Used by health professionals for the provision of clinical care.

Project-specific data are collected from time to time by governments and other entities, such as research organisations, for a particular purpose such as a survey of health behaviours or a specific research project. Subject to relevant approvals, researchers can request data linkage units to link these data with administrative data and/or other project-specific data collections.

Key features of project-specific data collections include:

  • Collected for research 
  • Collected with the participant's consent 
  • Sometimes includes consent for access to other information 
  • Used for research or a specific purpose which is not administrative or clinical.

Data collections are often categorised based on whether they are part of a Data Linkage Unit’s Master Linkage Map. The Master Linkage Map is a system of continuously updated links within and between datasets. 

Core data collections: Data collections that are routinely linked into the Master Linkage Map. 

Ad hoc linkage: Data collections that are linked on a project-by-project basis.

A list of the core datasets (those that are routinely linked) can be accessed here.

A list of the data collections currently available is provided on each of the data linkage unit websites.

Most data linkage units provide information to researchers about data quality and the linkage processes used to produce their linked data.

Who has the data?

Each data collection is assigned a Data Custodian who is responsible for the day-to-day management of the data, including:

  • Data collection/input
  • Database management and development
  • User support
  • Metadata development
  • Data validation
  • Reporting and analysis

Client Services

  • Usually liaise with Data Custodians on behalf of the researcher
  • May suggest the researcher contact the Data Custodian to discuss the type of information held in the data collection, its quality, and whether the proposed research question is likely to be answerable with the data requested.

Details of the Data Linkage Unit and Data Custodian for each of the core datasets (those that are routinely linked) can be accessed here.


Data flow

Who will I get the data from?

The researcher will receive the data from each of the data custodians they requested it from, from the data linkage unit, or from a combination of both. Some data linkage units assist with preparing data before release to researchers. This service includes pre-merge checking of data extracts, adding derived variables to the extracts, merging the extracts, post-merge checking, and providing the data to researchers.

How will I get my data?

It is usually recommended or required that linked data be made available to researchers in an appropriate Secure Access Environment, such as SURE.

Data format

What will my data look like when I get it?

As a researcher, you will receive only the Project Person Numbers (PPN), Project Event Numbers (PPE) and their associated content variables, as listed in your approved application.

The amount of data researchers receive and how it's structured depends on the number of data files and variables requested, the temporal scope, and the size of the requested cohort.

Depending on the data linkage unit involved, the data may be provided to the researcher already merged. In most cases, however, the researcher will receive the data as multiple files and be required to merge the data themselves. A separate file is usually provided for each data collection in each year. For example, a researcher applying for data from the birth registry, perinatal data collection and admitted patient data collection, for the date range 2000-2009, would typically receive 30 files in total (3 data collections × 10 years).
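As a rough illustration of the example above, the following Python sketch counts the expected files and shows one way records from several collections could be grouped under each person's PPN. The collection names, the file naming pattern, the "ppn" field name and the sample records are all assumptions made for illustration; actual names and layouts depend on the data linkage unit and the approved application.

```python
from collections import defaultdict

# Hypothetical labels for the three collections in the example above.
collections = ["birth_registry", "perinatal", "admitted_patient"]
years = range(2000, 2010)  # 2000 to 2009 inclusive

# One file per collection per year (the naming pattern is an assumption).
expected_files = [f"{c}_{y}.txt" for c in collections for y in years]
print(len(expected_files))  # 30


def group_by_ppn(records_by_collection):
    """Group records from several collections under each person's PPN."""
    merged = defaultdict(lambda: defaultdict(list))
    for collection, records in records_by_collection.items():
        for record in records:
            merged[record["ppn"]][collection].append(record)
    return merged


# Made-up records standing in for rows read from the delivered files.
sample = {
    "birth_registry": [{"ppn": "P001", "birth_year": 2001}],
    "admitted_patient": [
        {"ppn": "P001", "admission_year": 2003},
        {"ppn": "P002", "admission_year": 2005},
    ],
}
linked = group_by_ppn(sample)
print(sorted(linked))  # ['P001', 'P002']
```

Grouping by PPN rather than joining row to row reflects that one person can have many events (for example, several hospital admissions) within a single collection.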

The data will be delivered in a variety of formats, depending on the data linkage unit and data collection involved. Some data linkage units may deliver the data in a standardised format that can be easily read into any statistical analysis software, e.g. tab-delimited text files. The format and standards relating to core datasets (those that are routinely linked) can be accessed here. In addition to the data files, researchers may also be given metadata for each corresponding data collection, including a data dictionary. The data dictionary provides coding information to assist researchers in interpreting the data.
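To make this concrete, here is a minimal Python sketch of reading a tab-delimited extract and using a data dictionary to turn coded values into labels. The variable names, codes and labels are invented for illustration and do not come from any real data collection; the structure of a real data dictionary will vary between data linkage units.

```python
import csv
import io

# Made-up extract held in memory; a real file would be opened with
# open(path, newline="") instead.
extract = io.StringIO("ppn\tsex\nP001\t1\nP002\t2\nP003\t9\n")

# Made-up data dictionary: variable name -> {code: label}.
data_dictionary = {"sex": {"1": "Male", "2": "Female"}}


def decode_row(row, dictionary):
    """Replace coded values with labels where the dictionary defines them."""
    return {
        name: dictionary.get(name, {}).get(value, value)
        for name, value in row.items()
    }


reader = csv.DictReader(extract, delimiter="\t")
rows = [decode_row(row, data_dictionary) for row in reader]
for row in rows:
    print(row)
```

Codes that are missing from the dictionary (the "9" above) are left unchanged, so unexpected values remain visible for checking rather than being silently dropped.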