Selection Process
Definition:
The selection process defines how each candidate primary study is included in or excluded from the set of primary studies used in a systematic review, mapping study, tertiary study, or rapid review.
Goal:
To ensure that every candidate primary study is correctly classified as suitable or unsuitable for inclusion in the review.
Input:
A list of candidate primary studies including citation information and abstracts.
The protocol, in particular the inclusion and exclusion criteria. The protocol should also specify:
- Methods for handling documents that report two or more different empirical studies.
- Methods for handling documents authored by review team members.
- Methods used for assessing candidate studies.
- The process steps to be adopted.
Methods:
Several different methods of assessing candidate studies have been used in systematic reviews. The process recommended by SR guidelines is to use two or more members of the review team, working independently, to assess each candidate primary study at each stage.
Other methods can also be used:
- A less rigorous procedure, which can be adopted by graduate students with only a supervisor as a second reviewer, is for both reviewers to assess a random sample of the candidate primary studies, apply the eligibility criteria and calculate their agreement statistics. Any disagreements are discussed and clarified, and the student then assesses all the remaining candidate studies. In this case, additional safeguards can be used to reduce some of the inherent risks. For example, the supervisor can be asked to confirm the exclusion decision for any studies excluded after reviewing the full text.
- If only a single reviewer is available, a test-retest method can be used. The reviewer assesses each study, then waits several days and repeats the assessments (preferably changing the order in which specific studies are assessed). Agreement statistics can then be calculated and problems with the clarity of the eligibility criteria identified. Any candidate studies where the consecutive assessments disagree need to be assessed in more detail until a final decision is made. Note that this approach can also be used when a graduate student assesses the majority of the candidate primary studies.
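The agreement statistics referred to above are commonly computed as Cohen's kappa, which corrects raw percentage agreement for the agreement expected by chance. A minimal sketch (the reviewer labels below are invented for illustration; the same function works for test-retest comparisons by treating the two passes as the two raters):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters' include/exclude decisions."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two raters labelled studies independently.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Two reviewers screen ten candidate studies (I = include, E = exclude).
reviewer_1 = ["I", "I", "E", "E", "I", "E", "E", "I", "E", "E"]
reviewer_2 = ["I", "E", "E", "E", "I", "E", "E", "I", "E", "I"]
print(round(cohen_kappa(reviewer_1, reviewer_2), 2))  # chance-corrected agreement
```

A low or declining kappa is the signal, mentioned under Risk Mitigation below, that the eligibility criteria may need refining.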
Healthcare SR guidelines also suggest the use of text analysis tools[1]. Such tools need to be trained on a random selection of candidate primary studies. This method is a good fit for reviews where a supervisor only has time to assess a subset of the candidate primary studies. Tools are also useful for improving the consistency of assessment decisions when there are a large number of candidate primary studies, and for reducing the risk of errors that arise when a single researcher assesses studies.
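As an illustration of how such a tool might work, a simple classifier can be trained on a labelled sample of titles or abstracts and then used to suggest decisions for the remaining candidates. This is a toy naive Bayes sketch, not a real screening tool; the training texts and labels are invented:

```python
import math
from collections import Counter

def train(labelled):
    """Count words per class from (text, label) pairs."""
    counts = {"include": Counter(), "exclude": Counter()}
    priors = Counter()
    for text, label in labelled:
        counts[label].update(text.lower().split())
        priors[label] += 1
    return counts, priors

def classify(text, counts, priors):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    vocab = set(counts["include"]) | set(counts["exclude"])
    scores = {}
    for label in counts:
        lp = math.log(priors[label] / sum(priors.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            lp += math.log((counts[label][word] + 1) / denom)
        scores[label] = lp
    return max(scores, key=scores.get)

# Invented training sample: labelled titles from a pilot screening round.
sample = [
    ("randomized experiment on test driven development", "include"),
    ("controlled experiment comparing agile practices", "include"),
    ("editorial opinion on software engineering trends", "exclude"),
    ("keynote talk summary with no empirical data", "exclude"),
]
counts, priors = train(sample)
print(classify("controlled experiment on agile development", counts, priors))
```

Real screening tools are considerably more sophisticated, but the workflow is the same: train on a labelled random sample, then use the tool's suggestions as an independent check on, not a replacement for, human assessments.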
Methods of resolving disagreements include:
- Discussions between the reviewers until an agreement is reached.
- Allocating the disputed candidate primary study to another independent reviewer and accepting the consensus decision.
- Trialing the data collection process on the paper – if it does not contain the required information it can be excluded from the review.
Usually the first option is adopted and, if the original reviewers cannot reach agreement, the team leader needs to invoke the second or third option.
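The first two options form an escalation chain, which can be sketched as follows (a simplified illustration with invented labels; the third option, trialing data collection, is a manual check and is not modelled here):

```python
def resolve(a, b, discussion_result=None, third_opinion=None):
    """Escalation chain for two reviewers' decisions on one candidate study.

    Agreement stands as-is; otherwise a discussion outcome is used if one
    was reached; otherwise a third reviewer's opinion decides by majority.
    """
    if a == b:
        return a
    if discussion_result is not None:
        return discussion_result  # first option: discussion until agreement
    if third_opinion is not None:
        votes = [a, b, third_opinion]  # second option: independent third reviewer
        return max(set(votes), key=votes.count)
    return "unresolved"  # left for the team leader to escalate

print(resolve("include", "exclude", third_opinion="exclude"))  # prints "exclude"
```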
Process:
The selection process recommended in systematic review guidelines is usually performed in two stages:
- Each candidate primary study is assessed based on its title, abstract and keywords. It is classified as excluded or not excluded (studies the assessor believes should be included and studies about which the assessor is unsure are both assigned to the not-excluded category). The reason for exclusion should be reported.
- Each remaining candidate primary study is re-assessed based on the full text of the article. In this stage, assessors need to specify whether a candidate primary study should be excluded or included. Occasionally, a preliminary stage is used to eliminate studies based only on the title and keywords.
Usually two assessors are allocated to each candidate primary study in each stage. If the assessors disagree, the disagreement must be resolved. In the first stage, if there is no consensus for eliminating the study it should progress to the second stage. In the second stage, there should be consensus either for including or excluding the study.
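The per-stage rules for combining two assessors' decisions can be summarized as follows (a sketch of the rules just described; the decision labels are invented):

```python
def stage1(a, b):
    """Title/abstract stage: a study is dropped only if both assessors exclude it."""
    return "excluded" if a == b == "exclude" else "progress to full text"

def stage2(a, b):
    """Full-text stage: a consensus decision is required; otherwise the
    disagreement must be resolved before the study can be classified."""
    return a if a == b else "resolve disagreement"
```

Note the asymmetry: in the first stage any doubt keeps a study in play, whereas the second stage cannot end without an explicit include/exclude consensus.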
In most cases, tools are needed to organize the process effectively. They are used to maintain information about the status of each candidate primary study and to monitor its progress. The larger the number of candidate primary studies and review team members, the more important it is:
- To employ database tools rather than spreadsheets and bibliographic tools to help manage the process.
- To have a team leader responsible for reviewing the progress of the selection process and organizing the process used to resolve disagreements.
Iteration:
If the main search process is based on snowballing, the entire process is iterative, with each iteration of the selection process followed by an iteration of the search process. If automated searches are used, and a snowballing activity is being used as a secondary method to reduce the risk of missing relevant studies, this process will first be invoked for the results of the automated searches. Then the primary studies identified among the automated search results will be subjected to a round of snowballing, and the selection process will be re-activated to assess any new candidate primary studies.
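The alternation between selection and snowballing amounts to a fixed-point loop: select the eligible studies, follow their references, and repeat until no new candidates appear. A sketch, where the reference graph and eligibility test are invented placeholders:

```python
def snowball(seeds, references_of, is_eligible):
    """Alternate selection and snowballing until no new candidates are found."""
    included, frontier, seen = set(), set(seeds), set()
    while frontier:
        selected = {s for s in frontier if is_eligible(s)}  # selection iteration
        included |= selected
        seen |= frontier
        # Search iteration: snowball from the newly included studies only.
        frontier = {r for s in selected for r in references_of(s)} - seen
    return included

# Toy reference graph: study A cites B and C, and B cites D.
refs = {"A": ["B", "C"], "B": ["D"]}
found = snowball(["A"], lambda s: refs.get(s, []), lambda s: s != "C")
print(sorted(found))  # prints ['A', 'B', 'D']
```

Excluded studies (here "C") are remembered but their references are not followed, which is why the selection decision has to precede each new search iteration.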
Outputs:
The final inclusion/exclusion decision about each candidate primary study. The reason for excluding a candidate primary study, and the stage in the process at which it was excluded, should be recorded.
Agreement statistics should be reported for the first comparison of assessments from different review team members at each stage in the process.
The selection process may be represented graphically as a flow chart showing the number of candidate studies entering each stage of the process and the number excluded at that stage.
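The counts that such a flow chart reports can be derived directly from the recorded exclusion stages. A sketch, with invented study identifiers and stage names:

```python
from collections import Counter

def selection_flow(outcomes, stages=("title/abstract", "full text")):
    """outcomes maps each study to the stage where it was excluded, or "included".

    Returns per-stage (stage, entering, excluded) rows and the final included total.
    """
    excluded_at = Counter(outcomes.values())
    entering, rows = len(outcomes), []
    for stage in stages:
        rows.append((stage, entering, excluded_at[stage]))
        entering -= excluded_at[stage]
    return rows, entering

outcomes = {"s1": "title/abstract", "s2": "full text",
            "s3": "included", "s4": "title/abstract"}
print(selection_flow(outcomes))
```

Because each row records both the studies entering a stage and those excluded at it, the numbers can be transcribed directly onto the flow chart.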
Verification:
Verification methods include:
- Monitoring agreement statistics and responding to poor or declining agreement statistics.
- Tools that maintain records of the assessment status and final classification of each candidate primary study and ensure that all candidate primary studies are properly assessed.
- Additional assessments of all candidate studies that were excluded after their full text was assessed by team members.
- Using text analysis tools to provide an independent method of assessing study eligibility, to compare with the assessments made by team members.
- Using citation and visualization tools to investigate the relationships among studies and the final classification of the studies. These can be used to identify studies that are assessed differently from other related studies and should then be re-assessed.
Risks of Systematic Review Bias:
The main risk of bias during selection is misclassifying studies that should be included as excluded. Including a study that should be excluded is also a risk, but not as serious, because it is likely to be detected during data collection. Misclassification is mainly due to:
- Human error caused by fatigue, misunderstandings or mis-transcription[2], or misleading reporting by the authors of primary studies (e.g., misleading titles, unclear abstracts, invalid keywords, or unjustified claims).
- Personal biases on the part of team members.
- Ambiguities or errors in the inclusion/exclusion criteria.
- The complexity involved in managing a multi-step, multi-person process.
Risk Mitigation:
Risks associated with human error are addressed by requiring two or more team members to independently assess each candidate primary study, and by having a well-defined procedure for handling disagreements.
Risks associated with personal biases are addressed by ensuring that team members do not assess studies that they themselves authored.
Ambiguities or errors in the eligibility criteria can lead to inconsistent eligibility decisions by different review team members. Problems can be identified by monitoring agreement statistics at various times during the process. If agreement statistics are poor, the reasons should be investigated, and then either the eligibility criteria refined or team members given additional help with interpreting them. Team members should be encouraged to report any problems with the eligibility criteria, since it is possible that some aspects of eligibility were not anticipated when the protocol was developed.
Administering a process of multiple assessments and agreement activities across different stages is difficult and error-prone without tool support. For small-scale reviews a spreadsheet or bibliographic tool may be sufficient, but for large-scale reviews with many reviewers, a special-purpose systematic review tool may be needed.
[2] Any software engineers who have been involved in pair-programming will have realized how frequently mistakes and transcription errors occur.