Reviewing medical literature
a primer for those living with Celiac disease
The safety, toxicity or otherwise of gluten…
Author: Dr Geoff Forbes (MBBS, MD, FRACP) is a Gastroenterologist at Royal Perth Hospital, Perth, Australia, and Clinical Professor with the University of Western Australia. He is a clinician with expertise in immune conditions of the gut and in clinical nutrition. He is a teacher and researcher and, as he has family members with Celiac disease, he has first-hand knowledge of living with the gluten-free diet. The author has no conflicts of interest.
Interpreting medical literature is difficult, time consuming, and easy to get wrong, even for the most knowledgeable. However, when someone has a life-long, but readily treatable condition such as Celiac disease, being informed is important. And before I receive objections to this statement, you will notice I said readily treatable, not easily treatable! Understanding empowers the individual, makes living with a GF diet easier, and is likely to result in improved health. I would not like to think that my missive will lead to CD patients turning to the medical literature in droves, but I hope that it provides some basis from which to be more enquiring of published data and reports.
Example: the manuscript by Carlo Catassi and colleagues, ‘A prospective, double blind, placebo-controlled trial to establish a safe gluten threshold for patients with celiac disease’ (2007; the full text is freely available).
Analysing medical literature can be difficult for medical practitioners, so how does the lay person achieve this? More often than not, an internet search provides ready access to information, but it is information that has been interpreted by others, or that represents their personal views. Studies published in medical journals are almost always independently reviewed by medical peers, and by the editorial board of the journal, prior to publication. This leads to improved publication quality, and reduces the risk of flawed research adversely influencing subsequent medical progress. It is important to appreciate that it is uncommon for any single manuscript to result in a paradigm shift in thinking for any single disease. Most manuscripts add a small piece to the jigsaw. Accordingly, any single manuscript needs to be read and interpreted in the context of published data from other research groups.
Celiac disease (CD) patients or, if children, their parents need to establish “ownership” of their treatment by following a gluten-free (GF) diet, and this contrasts with many other diseases where patients can more readily have treatment supervised or ‘provided’ by a doctor or other health-care professional. Accordingly, CD patients are often well informed, and seek a deeper understanding of the need to remain GF and how to best achieve this. The aim of this manuscript is to provide a basis to understand how a clinical trial is established, run, interpreted and reported. This is done by taking the reader through one published study, considered by some experts highly influential in informing current and proposed future GF standards internationally.
First, a mathematics question as a prerequisite to evaluating the study cited below: what is the relationship between 20 parts per million (ppm) and 10 mg gluten per day? 20 ppm gluten (equivalent to 20 mg gluten in 1 kg [1 million mg] food) is the gluten threshold that food manufacturers must stay below for foods to be labelled ‘gluten free’ in Europe, Canada and, from August 2014, the USA. If a person eats 500 g of food per day, and that food contains gluten at 20 ppm, the person consumes 10 mg of gluten. 10 mg is the amount present in 1/250th of a slice of bread containing 2.5 g of gluten.
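The arithmetic above can be checked with a short calculation. This is only an illustrative sketch: the function name `gluten_mg` and the variable names are made up for this example, and the figures (500 g of food per day, 20 ppm, 2.5 g of gluten per slice of bread) are the ones quoted in the paragraph above.

```python
def gluten_mg(food_grams: float, ppm: float) -> float:
    """Milligrams of gluten in `food_grams` of food containing gluten at `ppm`."""
    # 1 ppm means 1 mg of gluten per 1,000,000 mg (1 kg) of food
    food_kg = food_grams / 1000
    return food_kg * ppm


# 500 g of food per day at the 20 ppm threshold
daily_mg = gluten_mg(500, 20)

# A slice of bread containing 2.5 g (2500 mg) of gluten
slice_gluten_mg = 2.5 * 1000
slice_fraction_denominator = slice_gluten_mg / daily_mg

print(daily_mg)                    # 10.0 mg gluten per day
print(slice_fraction_denominator)  # 250.0, i.e. 1/250th of a slice
```

The same function shows how the daily amount scales with how much food is eaten: a person eating 1 kg of food at 20 ppm would consume 20 mg of gluten.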
Establishing a reason for conducting a study
The manuscript by Carlo Catassi and colleagues, ‘A prospective, double blind, placebo-controlled trial to establish a safe gluten threshold for patients with celiac disease’, represents a major logistic undertaking. The underlying premise for the study is extremely important for patients with CD, but also for the Food Industry and regulatory authorities. The abstract to this, and all abstracted articles, provides a ‘snap-shot’ of the entire article. The article itself should then take the reader through the logic of establishing the study, the aim or objective of the research, how the study was conducted, what the results were, and then a discussion of key findings and how they relate to other published research. The first section within the body of the Catassi manuscript, the introduction, appropriately outlines the background to CD and the need for a GF diet. It points to uncertainty about the potential ‘toxicity’ for CD patients of trace amounts of gluten in foods. ‘Toxicity’ can be defined in a variety of ways, such as the causation of symptoms, or the development of complications (e.g. bone loss, anemia, infertility), or damage to the small bowel lining. Against this background, the researchers proposed to ‘investigate the toxicity of gluten traces in the celiac diet’, by studying the effect of gluten on the small bowel lining. This forms the ‘aim’, or ‘objective’ of the study; often a study will also state a ‘hypothesis’ to be proven or disproven, defined before the study is developed, and around which the aim is then established. This does not occur in the Catassi manuscript; if it had, it might have been, for example, ‘that 10 mg gluten per day is safe in 95% of CD patients’, or something to that effect.
The aim of a study is sometimes reflected in the study title: in this instance, the authors confuse the expressed aim of investigating ‘the toxicity of gluten’, with the title of establishing ‘a safe gluten threshold’; safety and toxicity of gluten have very different meanings for those with CD, and this is expanded upon later.
Ethics and study design
Having decided that a clinical study is required, investigators discuss how it should be designed, or carried out, to address the proposed aim. The design of a study needs to be practical and allow it to be completed in a timely fashion, but not subject participants to unreasonable risk. This can be a fine balance, and ultimately a study protocol requires approval by an Ethics Committee that adheres to certain principles including respect for human beings, research merit and integrity, justice, and beneficence (www.nhmrc.gov.au/guidelines/publications/e72). Many medical journals also require that research studies are listed on a web-based registry (as was the Catassi study, on clinicaltrials.gov; identifier number NCT00250146) in advance of the research being undertaken, and this provides an additional level of governance over study conduct. Finally, it is usual for ‘informed consent’ to be obtained from individual participants; this involves disclosure of the rationale behind the study, how it is conducted, and benefits or risks for the participant.
In the ‘Methods’ section of a manuscript, the study design is described. The Catassi study sought to compare the effect of two different doses of dietary gluten in patients with well controlled CD. The study has a number of positive attributes, as outlined in the study title. A prospective study (collecting data in a forward-planning way over time) is usually better than a retrospective analysis (collecting data on events that have already occurred). A double-blind study is one in which neither investigator nor participant knows which treatment is being received; this is better than either a single-blind study or an unblinded study. Further, the Catassi study is ‘placebo controlled’: in one of the three study groups subjects receive a placebo, and this is the study group against which the two ‘gluten’ study groups are compared. This study is multicenter, involving teams at different hospitals; this assists with patient recruitment, and can help improve the study because of regular review or advice from research colleagues.
Defining the study population, study intervention and outcome measures
Ideally, the characteristics of the study population (for example, age; gender; type of, or severity of disease; other illnesses) are such that the outcomes in the study population can be applied to the broader population, in this case all those with CD. In the Catassi study there were 37 women and 12 men, aged 20-55, who had well-controlled CD. These demographics are reasonable, because it would be ethically difficult to justify this study in children, and there is a requirement for good disease control in order to evaluate the adverse effects of added gluten. However, by having study entry restrictions, there can be unwanted consequences: for example, the data arising from the Catassi study may not be applicable to children. Further, because study participants were those with reasonably well controlled disease, they may have been a group less likely to respond adversely to additional gluten. As Catassi does not give the size of the overall CD population (possibly hundreds of CD patients) from which the study population was selected (49 patients) and studied (39 patients), it is difficult to know how applicable these data are to the broader population of CD patients.
In understanding the study intervention, it can be useful to evaluate the study along a time-line. In the Catassi manuscript, 49 subjects were initially identified and asked to remain strictly GF, and then had a more detailed assessment a month later. This assessment included clinical examination, interview about diet, blood test for celiac antibodies (serology), and a small bowel biopsy. At this time, seven subjects were found not suitable for further study participation; for four of these seven this was because of inadequately controlled CD at initial small bowel biopsy. The possible impact on applying study findings to the broader CD population by studying only those with well controlled CD has been referred to in the previous paragraph. The remaining 39 patients were then randomised to one of three groups; 13 to receive 10 mg gluten per day, 13 to receive 50 mg gluten per day, or 13 to receive a placebo each day, for 3 months. Neither investigator nor patient knew which group they were allocated to, as capsules appeared identical; in other words, the study intervention was ‘double-blind’. It is also useful to understand what monitoring occurred during the study, both to check compliance with the study protocol, and to check subject wellbeing. In this instance, patients had weekly telephone calls, but not a more rigorous assessment of compliance, such as a count of pills taken.
Study outcome measures need to be meaningful and relevant. In other words, by what measure or parameter does a researcher best evaluate the outcome, or result, of the study intervention in participants? The best, or most consistent, marker of CD control remains histology (how biopsies appear under a microscope), which Catassi evaluated for each participant, before and after three months of daily placebo, 10 mg, or 50 mg gluten. Serology (Celiac antibody blood tests) was also measured although, as Catassi acknowledges, serology is not as accurate as histology. In order to maximise the rigour by which histology was assessed, two experienced pathologists examined the specimens. Exposure to gluten in CD leads to microscopic changes in the surface lining of the small bowel including shortening of villi, deepening of crypts (resulting in a reduced ratio of villous height to crypt depth, or Vh/Cd), and an increase in inflammatory cells (intraepithelial lymphocytes, or IEL). Accordingly, these were the histology outcome measures in the Catassi study. It is unreasonable to expect the non-specialist to know whether this is an acceptable way to measure outcome, and so this is an example of where it can be difficult to interpret medical literature without a detailed specialty understanding. In fact, it is a reasonable outcome measure, but with three caveats. Firstly, if the study were being conducted in 2014, subjects would also likely have a biopsy of the first part of the small bowel, in addition to the biopsies taken in the Catassi study, for this improves the accuracy of picking up CD abnormalities. Secondly, although we have no better marker of disease control, it can reasonably be asked whether this is an acceptable marker of what is now recognised as a systemic disease (one that causes problems beyond the bowel), but with origin in the small bowel.
Finally, and as acknowledged by Catassi, the study was limited to a 3 month challenge on ethical grounds, but it is known that abnormal histology may take longer than this to develop, and so might only appear with longer exposure to gluten.
Statistical advice is often sought from an expert biostatistician prior to seeking Ethics Committee approval, to obtain guidance on how best to compare one set of measures before an intervention with those after it. For example, in the Catassi paper, there is a statistical comparison of Vh/Cd ratio and IEL count, before and after the gluten challenge. A statistician can also give advice on the number of patients that a study should have, based on hypothesised outcomes, in order to ensure the study results are meaningful. For the Catassi study, the number of patients to be studied, or sample size, was calculated based on anticipated changes in histology.
Presenting the data
The results section of a manuscript should summarise the details of the population studied, together with the actual outcome measures of the study. It should describe any side effects within the study period, and any transgressions from the study protocol.
Omitting a participant, ‘intention-to-treat’ or ‘per-protocol’ analysis: If a subject drops out of a study, or cannot be evaluated, consideration needs to be given to whether data should be analysed in the final report on an ‘intention-to-treat’ basis, or ‘per-protocol’. If, for example, a subject withdraws from a study researching a new drug because of intolerable drug side effects, then the drug has not been successful in treating the underlying condition, and this subject should generally be included in an analysis that is ‘intention-to-treat’. Conversely, if a subject withdraws for reasons unrelated to the study (such as a change of residence), then a per-protocol analysis is more appropriate, where the subject is not included in the final analysis.
Catassi first describes the characteristics of the population studied, as is usual. One subject within the 10 mg gluten group withdrew from the study because of typical symptoms of CD. This subject did not have small bowel histology evaluated at 3 months, and was removed from the analysis. This is only acceptable inasmuch as the outcome measure of histology required small bowel biopsies to be taken. However, this subject was considered by the investigators to have CD symptoms, and so clearly had not tolerated 10 mg / day gluten, and was likely to have had worse appearances on histology. Had the authors undertaken an intention-to-treat analysis, based on any measure of worse disease (i.e. symptoms or reduced Vh/Cd ratio), there was worsening of disease in 2 of 13 subjects receiving placebo, at least 5 and perhaps as many as 8 of 14 subjects who received 10 mg gluten, and 10 of 13 subjects receiving 50 mg gluten. The reason for uncertainty for those receiving 10 mg gluten is that for three of the seven subjects with a reduction in Vh/Cd ratio, this reduction was small (Catassi, figure 2), and might be considered to reflect no real change in histology.
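To make these intention-to-treat counts easier to compare, they can be expressed as percentages. The sketch below uses only the counts quoted in the paragraph above (they are this article's re-reading of Catassi's figure 2, not figures reported by the study authors); the names `itt_counts` and `percent_worse` are invented for illustration. It is a simple descriptive tally, not a statistical test.

```python
# (number of subjects with worse disease, number of subjects in group),
# as quoted in the paragraph above
itt_counts = {
    "placebo": (2, 13),
    "10 mg gluten (at least)": (5, 14),
    "10 mg gluten (at most)": (8, 14),
    "50 mg gluten": (10, 13),
}


def percent_worse(worsened: int, total: int) -> float:
    """Percentage of a group whose disease worsened, to one decimal place."""
    return round(100 * worsened / total, 1)


for group, (worsened, total) in itt_counts.items():
    print(f"{group}: {worsened}/{total} = {percent_worse(worsened, total)}%")
```

On these counts, roughly 15% of the placebo group worsened, compared with somewhere between about 36% and 57% of the 10 mg group and about 77% of the 50 mg group.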
Outcome measures, statistical significance but clinical irrelevance? The results section then details the findings of serology and histology. Celiac antibody levels are usually elevated in CD patients not taking a GF diet. Within the serology section there is statistical comparison of celiac antibody levels before and after gluten challenge. It is important in clinical trials to recognise there can be numerical statistical significance, but clinical irrelevance. In this instance, the Celiac antibody levels remain well within the normal range. When changes in Celiac antibody levels occur within a normal range, these changes are not clinically meaningful.
Similarities between groups, and over time: When analysing the histology data, which represents the key outcome measure, it is important to be reassured that there is no difference between the three groups at baseline. This is hopefully achieved in clinical studies by the randomisation process, and is reflected in Catassi’s figure 2. The authors report that the correlation between changes in Vh/Cd and IEL over the three month study period was poor; in other words, that for individual patients, if one parameter improved, or worsened, it was not necessarily closely reflected by the other parameter. The non-specialist cannot be expected to know whether this is important or not; it reflects the difficulties in using histology scores to assess outcomes, but this finding supports measuring both, rather than one measure alone.
Benefit of a placebo group: The Catassi study provides a good example of the benefit a placebo group provides. When 13 CD subjects received placebo for 3 months, histology improved in 11 (see Catassi, figure 2). This is important to be aware of when interpreting data from the other two groups, to whom added gluten was given for 3 months. In other words, had 10 mg or 50 mg gluten per day not had any adverse effect on subjects, both these groups should have had a similar improvement in histology to the placebo group. The enquiring mind should also ask, ‘no change is expected with placebo, so why did they improve?’ The authors very reasonably suggest that this is because the GF diet became more strict during the study period, even though the participating subjects were thought to have had a strict GF diet at the outset.
‘Placebo’ versus ‘control’ group: A ‘placebo’ group can also be referred to as a ‘control’ group, that is, a group that does not receive the intervention being tested. Perhaps confusingly, but appropriately, the Catassi study had yet another ‘control’ group: a group of patients without CD against whom the CD subjects’ initial histology assessments were compared. This turns out to be important for, compared with non-Celiac controls, the CD subjects had abnormal histology before the study, despite taking a GF diet. That there was abnormal histology at study commencement, and improvement in the placebo group during the study, supports the concepts that, firstly, low levels of gluten contamination commonly lead to abnormal histology without symptoms and, secondly, that these microscopic changes can improve with greater diligence and care in taking a GF diet.
What should the reader expect from the discussion section?
The discussion of a manuscript summarises major findings and their implications, and places them in perspective to what is known about the topic by referencing relevant previous research. It is also appropriate for authors to be self-critical in respect of the study design and conduct. The primary conclusion of the Catassi study, as reflected in the abstract, is correct: ‘the ingestion of contaminating gluten should be kept lower than 50 mg / day in the treatment of CD’. This is because 50 mg / day caused histological abnormalities in the majority of CD subjects in this study.
It is stated that ‘we were not able to reach firm conclusions about the potential toxicity of 10 mg / d’. This statement is factually correct, for comparisons between groups are undertaken on statistical grounds, and there was not a statistical difference reached in the measured endpoints between the 10 mg / day group and placebo group. On the other hand, of 14 subjects receiving 10 mg gluten / day, one developed CD symptoms and histology worsened in at least four, and possibly as many as seven, other subjects. Accordingly it is unreasonable for the authors to conclude that ‘on the basis of the evidence from the current study and the quoted literature, it appears that 50 mg / day is the minimum dose required to produce measurable damage to small intestinal mucosa in CD patients’. Of course, the smaller dose of 10 mg / day also made some subjects worse.
The authors reflect upon subjects improving after receiving placebo; that serology is not the ideal test to follow disease control; and that there is significant variability in sensitivity to gluten between CD patients. All these relate in some way to the findings in the results section. The authors also acknowledge some of the limitations of the study: that it was only a 3 month evaluation; and uncertainties about the influence of the somewhat artificial, but scientifically necessary, way that gluten was administered.
Appropriately, the authors reflect on a gluten threshold in the penultimate paragraph of the discussion. The gluten threshold is referred to in the mathematics question at the start of this article. In establishing a threshold, many factors require consideration, most importantly toxicity, or conversely safety for CD patients. Unfortunately, Catassi and colleagues do not draw a distinction between safety and toxicity. When setting a threshold of gluten above ‘zero’, this should be at a level that is safe for the significant majority, and ‘toxic’ only for the negligible minority. Catassi establishes a lowest level threshold (50 mg / day) where gluten is ‘toxic’ for the significant majority. The lowest level threshold where gluten was safe for the significant majority, and ‘toxic’ only for the negligible minority, was ‘no additional gluten’ (the placebo group). The study has not demonstrated safety of 10 mg gluten / day for a significant majority of CD patients.
Interpretation by others of a study after publication
Dangers exist in the promulgation of statements that, unwittingly or otherwise, do not accurately reflect the findings or conclusions of a study. Although this may occur more easily in an internet era of readily accessible data, that same accessibility can also make it easier to re-examine source information. For example, Australian colleagues have unfortunately quoted the Catassi research as providing evidence that a 20 ppm GF threshold is safe, and as key evidence in support of raising the GF standard in Australia from ‘no detectable gluten’ to <20 ppm (http://www.pc.gov.au/__data/assets/pdf_file/0008/82286/sub046.pdf). In another example from the USA, in the lead-up to the FDA decision on a GF threshold, a Catassi co-author extolled the importance of evidence based science, and stated ‘limits must be established through double-blind, randomized trials such as the one conducted by our center in 2007. The three-month trial showed that a daily intake of 10 mg of gluten for three months by adults with celiac disease caused no intestinal damage’ (http://www.glutenfreediet.ca/img/Fasano_letter.pdf). Following the August 2013 FDA announcement on a GF threshold, the same co-author stated that ‘the evidence-based research published by our Center, which has been confirmed by studies from colleagues around the world, conclusively supports the 20 ppm level as a suitable safety threshold for gluten-free products.’ The media release went on to claim that research from the Center has shown that 10 mg per day of gluten consumption is a safe level for the vast majority of individuals with CD (http://www.massgeneral.org/children/about/pressrelease.aspx?id=1609).
Neither has this group published data conclusively supporting 20 ppm, or alternatively 10 mg gluten / day as a suitable safety threshold, nor has any other group around the world; and certainly not for the vast majority of celiac patients.
Summary and conclusion
Catassi and colleagues gave CD patients on a GF diet either placebo, 10 mg gluten, or 50 mg gluten each day for 3 months, and evaluated what happened to the small bowel lining. 10 mg gluten is found in 500 g of food containing gluten at a concentration of 20 ppm (equivalent to 20 mg gluten in 1 kg [1 million mg] food). 20 ppm is the gluten threshold that food manufacturers must stay below for foods to be labelled ‘gluten free’ in Europe, Canada and the USA. 10 mg is present in 1/250th of a slice of bread containing 2.5 g of gluten.
This study suggests that:
- 50 mg gluten each day for 3 months damages the small bowel lining in a majority of CD patients;
- 10 mg gluten each day for 3 months makes CD worse in some CD patients;
- no additional gluten to a GF diet was the only ‘dose’ which was ‘safe’ for the significant majority;
- there is variability between CD patients in the sensitivity to gluten;
- CD patients may develop small bowel damage to gluten contamination without developing symptoms, and
- this may occur despite taking what is thought to be a strict GF diet.
The exercise of interpreting this study has shown:
- unanticipated findings or observations may arise;
- there is an important difference between ‘safety’ and ‘toxicity’ of gluten, which in this instance the authors do not make clear;
- the characteristics of a study population need to be similar to those of the overall population for the study results to be applicable to the overall population;
- that the measures of study outcome need to adequately reflect the beneficial, or adverse, effects of an intervention;
- within the discussion section of a manuscript, the proposed clinical implications of the study should be accurately aligned with the study results;
- post-publication study interpretation, even by authoritative figures, may not necessarily reflect published data.