Mutagenesis, Vol. 18, No. 2, 167-175, March 2003
© 2003 UK Environmental Mutagen Society/Oxford University Press

Aspects of design and statistical analysis in the Comet assay

Stig Johan Wiklund2 and Eva Agurell1

1 AstraZeneca R&D Södertälje, S-151 85 Södertälje, Sweden


    Abstract
Some aspects of the statistical design and analysis of the Comet (single cell gel electrophoresis) assay have been evaluated by means of a simulation study. The tail length and tail moment were selected for the quantification of DNA migration. Results from the simulation study showed that the choice of measure to summarize the cells on each slide is extremely important in order to facilitate an efficient analysis. For tail moment, the mean of log transformed data is clearly superior to the other evaluated measures, whereas using the mean of raw data without transformation can lead to very inefficient analyses. The 90th percentile, capturing the upper tail of the distribution, performs well for the tail length, with a slight improvement obtained by applying a log transformation prior to calculations. Furthermore, the simulation study has been used to assess the appropriateness of some models for statistical analysis and to address the issue of design (i.e. number of cultures or animals in each group, number of slides per animal/culture and number of cells scored per slide). Combining the results from the simulations with practical experience from the pharmaceutical industry, we conclude the paper by providing concise recommendations regarding the design and statistical analysis in the Comet assay.


    Introduction
The Comet (single cell gel electrophoresis) assay is a sensitive method to assess DNA damage. It is based on the principle of quantifying the amount of denatured DNA fragments migrating out of the cell nuclei during electrophoresis. Over the last decade the assay has gained widespread use in various areas and has emerged as a standard tool in the pharmaceutical industry for the assessment of the safety of potential new drugs.

Attempts have been made to develop international guidelines to standardize the use of the assay. In particular, a consensus report was recently published (Tice et al., 2000) as an outcome of the International Workshop on Genotoxicity Procedures. However, with regard to statistical treatment of the results the paper does not provide any useful guidance as ‘there was no consensus among the expert panel as to the most appropriate statistical method(s) to use’.

A few authors have addressed issues of design, summary measures and analysis in the Comet assay. Bauer et al. (1998) pointed out that data from Comet assays do not generally follow a normal distribution. However, they do not provide any guidance as to the consequences of this for the practitioner in the field. Lovell et al. (1999) presented a simulation study which gave a compelling illustration of the hazard of neglecting the hierarchical nature of Comet assay data.

We feel that the few recommendations that have been made with respect to the Comet assay (see for example Lovell et al., 1999; Tice et al., 2000) are too general to be practically useful and they often lack empirical foundation. An actual comparison of statistical methods in the context of the Comet assay is still to be found in the literature. There is a need for this gap to be filled and to this end we have performed an extensive simulation study, evaluating various aspects of the statistical treatment of Comet assay data. The objective was to provide basic statistical results that would be useful to practitioners when deciding on design and statistical analysis. We present in this paper a set of clear and concise recommendations for experimental design and statistical analysis that are based on both the simulation results and on our own practical experience with the Comet assay.


    Material and methods
Empirical data
This study focuses entirely on the two historically most used measures of DNA migration: the tail length and the tail moment. The tail length is defined as the distance from the centre of gravity of the nucleus, i.e. the position of the maximum fluorescence intensity over the nucleus, to the end of the tail. The tail moment is defined as the product of DNA in the tail and the mean distance of migration in the tail according to Olive et al. (1990). The DNA migration was measured using an automatic image analyser (Colourmorph or Comet Assay II; Perceptive Instruments, UK) connected to a fluorescence microscope. Images were captured using a high sensitivity CCD black and white video camera connected to an IBM compatible PC equipped with a Colourmorph/Comet Assay II video card. The DNA migration was measured simultaneously with image capture. In order to obtain relevant results from the simulation study it was essential to base the simulation on assumptions that accurately reflect empirical data from experimentally performed Comet assays. The studies described below represent different designs and test compounds and the parameters for the simulation study were selected on the basis of these performed studies.

In vitro studies
Individual cell data from five individual Comet tests performed in rat hepatocytes in vitro form the basis of the data for the simulation study. Two slides were prepared from each culture. A total of 136 slides with 50 cells measured for DNA migration on each slide resulted in a total of 6800 scored cells with individual tail length and tail moment values. The designs of the studies were as follows.

In vitro study 1. Duplicate cultures of negative control, duplicate cultures of positive control and single cultures of six different concentrations of a test compound.

In vitro study 2. Duplicate cultures of negative control, positive control and seven different concentrations of test compound.

In vitro study 3. Duplicate cultures of negative control, positive control and seven different concentrations of test compound.

In vitro study 4. Duplicate cultures of negative control, positive control and three different concentrations of test compound.

In vitro study 5. Duplicate cultures of negative control, single cultures of two different concentrations of positive control and eight different concentrations of test compound.

In vivo studies
Individual cell data from three Comet studies performed in animals form the basis of the data for the simulation study. One to three cell types, parenchymal liver cells, non-parenchymal liver cells and white blood cells, were examined in the tests. Three slides were prepared from each animal and tissue. A total of 360 slides, with 50 cells measured for DNA migration on each slide, resulted in a total of 18 000 scored cells with individual tail length and tail moment values. The designs of the studies were as follows.

In vivo study 1. A negative control, positive control and two different sampling times with one dose level of a test compound were used. Five mice were used for each group. Parenchymal liver cells, non-parenchymal liver cells and white blood cells were examined.

In vivo study 2. A negative control, positive control and one dose level of a test compound with two different sampling times were used. Five rats were used for each group. Parenchymal liver cells and white blood cells were examined.

In vivo study 3. A negative control, positive control and one dose level of a test compound with three different sampling times were used. Four rats were used for each group and white blood cells were examined.

Statistical techniques
The mean, median and the 90th percentile were chosen as measures to summarize the tail length and tail moment values for each slide. The rationales for these choices are as follows: the mean was included as being the most commonly used summary measure; the median is a frequently recommended measure for highly skewed distributions, as are present for tail length and tail moment; the 90th percentile provides a measure which is not affected by the most extreme outliers yet focuses on the upper part of the distribution in which treatment effects for these variables are often revealed. Each of the three measures has been calculated on both the raw data and on data subjected to a logarithmic transformation. For tail moments, a small constant (0.001) was added to the data prior to calculations to circumvent the potential problem of taking the logarithm of 0.
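The six per-slide summaries described above can be illustrated with a minimal Python sketch. The function names, the example values and the percentile definition (linear interpolation between order statistics) are assumptions made for this illustration; the text does not state which percentile definition was used, and the sketch adds the constant only before the log transformation.

```python
import math
import statistics

DELTA = 0.001  # constant added to tail moments before taking logs (value from the text)

def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics.

    Assumption: the percentile definition is chosen for this sketch only.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def slide_summaries(cells, offset=0.0):
    """Mean, median and 90th percentile of one slide, on the raw and the log scale."""
    logs = [math.log(x + offset) for x in cells]
    return {
        "mean_raw": statistics.mean(cells),
        "median_raw": statistics.median(cells),
        "p90_raw": percentile(cells, 90),
        "mean_log": statistics.mean(logs),
        "median_log": statistics.median(logs),
        "p90_log": percentile(logs, 90),
    }

# Hypothetical tail moments for one slide (not data from the study).
moments = [0.0, 0.0, 0.4, 1.2, 0.9, 15.0, 0.3, 2.1, 0.0, 0.7]
summary = slide_summaries(moments, offset=DELTA)
```

In this made-up slide a single large moment dominates the raw mean, whereas the log-scale mean is far less affected by it.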

We have selected a few relatively simple statistical models and tests for our comparison, as summarized in Table I. The evaluated techniques comprise both tests for dose-related trend and tests for overall group comparisons. The tests for trend in the linear models are performed using linear contrasts. Most statistical models and tests use slide as the unit of measurement, the exception being one application of the 1-way linear model using culture/animal as the unit of measurement. The 3-way models both cater for the hierarchical nature of the data, allowing for variation between cultures/animals, although using data for each slide. All the reported results are based on two-sided tests. For the statistically less inclined reader, it may be useful to note that the general linear models referred to here share a lot of similarities with the better known concept of ANOVA models. As most standard statistical software nowadays supports both types of models, we have chosen to use the more flexible linear models rather than traditional ANOVA models. Calculations were made in the SAS® system, v.8.0 (SAS Institute Inc., 1999). The statistical models were estimated with the MIXED and NPAR1WAY procedures and the random numbers were generated using the RANUNI and RANNOR functions. Further details and references for the applied statistical models, in most cases together with the corresponding SAS code, are given in the following paragraphs.


Table I. Selected statistical models and tests
 
The Jonckheere–Terpstra test is a non-parametric test for a monotonic trend in the response to increasing doses (see for example Hollander and Wolfe, 1999, pp. 202–204). It is designed to detect alternatives of the type $\theta_0 \le \theta_1 \le \dots \le \theta_k$, with at least one of the inequalities being strict, where the parameter $\theta_i$ denotes the response in group $i$. The test was performed using a normal approximation for the test statistic, with the following core SAS® statements:
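Separately from the SAS statements, the computation behind the test can be illustrated with a minimal Python sketch. It uses the textbook statistic (counts of concordant pairs between lower and higher dose groups) and the usual large-sample mean and variance without a tie correction; the function name and the toy data are assumptions made for this illustration and do not reproduce the analysis code of the study.

```python
import math

def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra statistic with a normal approximation (no tie correction).

    groups: list of samples ordered by increasing dose.
    Returns (J, z, two_sided_p).
    """
    # J counts pairs (x in a lower group, y in a higher group) with x < y; ties count 1/2.
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    # Null mean and variance of J when there are no ties.
    mean = (n * n - sum(m * m for m in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 72.0
    z = (j_stat - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return j_stat, z, p

# Toy slide-level summaries for three ordered dose groups (hypothetical values).
j_stat, z, p = jonckheere_terpstra([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```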

The Kruskal–Wallis test (see for example Hollander and Wolfe, 1999, pp. 190–192) is a non-parametric equivalent to the 1-way ANOVA model, looking for general differences between groups, i.e. $\theta_i \ne \theta_j$ for at least some $i$ and $j$. The Kruskal–Wallis test was performed in its standard version, using a normal approximation for the test statistic, the SAS® statements being:

The linear regression model was the only analysis in which actual dose levels were incorporated. The doses chosen for the simulations were {0, 1, 3, 10, 30, 100, 300} for patterns A–C and {0, 1, 2, 3} for pattern D (doses in arbitrary units). Linear regression was performed in a standard fashion, with details to be found in any textbook on statistics.

Several versions of general linear models were evaluated. The models were all fitted with the SAS® procedure MIXED and with corresponding MODEL statements given below.

  • 1-way linear model, with group as a single fixed effect


    The 1-way model was fitted both using animal/culture as well as slide as the unit of measurement.

  • 2-way linear model, with group and electrophoresis as fixed effects


  • 3-way nested linear model, with group, electrophoresis and culture (in vitro) or animal (in vivo) as fixed effects. The culture or animal effect was defined as nested within group to ensure estimability of the model.


  • 3-way mixed effects linear model, with group and electrophoresis as fixed effects and culture (in vitro) or animal (in vivo) as random effects.



The tests for overall group difference in the linear models have been performed with the usual F-test (the SAS® Type III test) for the GROUP factor. The trend tests are based on an F-test for linear contrasts, $K'\theta$, in the GROUP factor, where $\theta$ is the vector of $\theta_i$ values and the vector of coefficients is:


Corresponding SAS® statements are:
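As an illustration only (this is not the SAS code used in the study), the arithmetic behind an F-test for a linear contrast in a 1-way model can be sketched in Python. The equally spaced, centred coefficients, the function names and the toy data are assumptions made for this sketch; the sketch returns the F statistic and its denominator degrees of freedom and leaves the look-up of the p-value to standard software.

```python
import statistics

def linear_contrast_coefficients(k):
    """Equally spaced, centred coefficients, e.g. k=4 gives [-3, -1, 1, 3] (assumed form)."""
    return [2 * i - (k - 1) for i in range(k)]

def contrast_f_statistic(groups):
    """F statistic (1 numerator df) for a linear contrast in a 1-way model.

    groups: list of samples (e.g. slide-level summaries), ordered by dose.
    Returns (F, denominator_df).
    """
    k = len(groups)
    coefs = linear_contrast_coefficients(k)
    means = [statistics.mean(g) for g in groups]
    # Pooled within-group (residual) mean square.
    sse = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    df = sum(len(g) for g in groups) - k
    mse = sse / df
    # Contrast estimate and its estimated variance.
    estimate = sum(c * m for c, m in zip(coefs, means))
    variance = mse * sum(c * c / len(g) for c, g in zip(coefs, groups))
    return estimate ** 2 / variance, df

# Toy data for three ordered groups (hypothetical values).
f_stat, df = contrast_f_statistic([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
```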


Simulation study
To facilitate a realistic simulation study, reasonable approximations of tail length and tail moment distributions were needed, and these were obtained using data from the negative control groups in the studies described earlier. It turned out that a mixture of two log-normal distributions provided an adequate approximation of the tail length distribution, $L'$, which consequently can be obtained by exponentiating two normal distributions:


where $L^*_1 \sim N(\mu_{l1}, \sigma_{l1})$, $L^*_2 \sim N(\mu_{l2}, \sigma_{l2})$ and $p_l$ gives the proportion of data arising from a ‘long-tailed’ part of the distribution. The distribution of tail lengths in empirical and simulated data is shown as histograms in Figure 2.
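A minimal Python sketch of drawing tail lengths from such a two-component log-normal mixture is given below. It assumes that the second normal component is the ‘long-tailed’ one, which the surviving text does not state, and the numerical values of the means and standard deviations are hypothetical placeholders rather than the values of Table II; only the mixing proportion 0.04 is the ‘null’ value quoted in the text.

```python
import math
import random

def simulate_tail_length(rng, mu1, sigma1, mu2, sigma2, p_long):
    """Draw one tail length from a mixture of two log-normal distributions.

    With probability p_long the value comes from the second component,
    assumed here to be the 'long-tailed' part of the distribution.
    """
    if rng.random() < p_long:
        return math.exp(rng.gauss(mu2, sigma2))
    return math.exp(rng.gauss(mu1, sigma1))

rng = random.Random(1)
# Hypothetical means and standard deviations; p_long = 0.04 is the null value from the text.
lengths = [simulate_tail_length(rng, 3.0, 0.2, 4.0, 0.3, 0.04) for _ in range(50)]
```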



Fig. 2. Histograms of the distribution of log-transformed tail lengths in (a) simulated and (b) empirical in vitro data.

 
The distribution of the tail moment is clearly different as 0 tail moments are common with the definition used. We applied the transformation $\log(M + \delta)$, with the constant $\delta$ added to allow the transformation of the 0 moments, and choosing $\delta = 0.001$ resulted in an approximate normal distribution for the non-zero part of the data. Consequently, the tail moment, $M'$, can be characterized by a mixture of a point mass at 0 and a log-normal distribution, defined as


where $M^* \sim N(\mu_m, \sigma_m)$, and with the additional proviso that potential negative values of $M'$ are replaced by 0. The distribution of tail moments in empirical and simulated data is shown as histograms in Figure 3. Figures 2 and 3 illustrate a good resemblance between simulated and empirical distributions for both tail lengths and tail moments.
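The mixture of a point mass at 0 and a shifted log-normal part can be sketched in Python as follows. The sketch assumes that the non-zero part is obtained by inverting the transformation $\log(M + \delta)$, i.e. as $\exp(M^*) - \delta$, which is consistent with the proviso about negative values but is a reconstruction rather than the authors' stated formula; the function name and the parameter values in the example are hypothetical.

```python
import math
import random

def simulate_tail_moment(rng, mu_m, sigma_m, p_zero, delta=0.001):
    """Draw one tail moment: 0 with probability p_zero, otherwise exp(M*) - delta.

    Negative values of exp(M*) - delta are replaced by 0, as described in the text.
    """
    if rng.random() < p_zero:
        return 0.0
    return max(0.0, math.exp(rng.gauss(mu_m, sigma_m)) - delta)

rng = random.Random(2)
# Hypothetical mu_m and sigma_m; p_zero = 0.2 is the in vitro null value from the text.
moments = [simulate_tail_moment(rng, -1.0, 2.0, 0.2) for _ in range(50)]
```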



Fig. 3. Histograms of the distribution of log-transformed tail moments in (a) simulated and (b) empirical in vitro data. Note that the leftmost bar mainly consists of those cells with zero tail moments.

 
The simulation study allowed for variation between slides, S, between electrophoreses, E, between cultures (in vitro), C, or animals (in vivo), A. Accounting for these sources of variation, the length and moment were generated as:


S, E, C and A are generated according to log-normal distributions, e.g. for variation between slides $\log S \sim N(0, \sigma_s)$, and similarly for the other random effects. The factors $k_l$ and $k_m$ are constants included to represent a multiplicative treatment effect. As given by the equations above, the characterization of the simulated data is given by the parameters $\mu_{l1}$, $\sigma_{l1}$, $\mu_{l2}$, $\sigma_{l2}$, $\mu_m$, $\sigma_m$, $\sigma_s$, $\sigma_e$, $\sigma_c$ and $\sigma_a$. The values assigned to these parameters were based on empirical study data and are presented in Table II. One may note the large standard deviation, $\sigma_m$, in the non-zero part of the tail moment distribution, which results in highly variable and skewed data, with a few very large moments together with a majority of small ones.
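How multiplicative log-normal effects for slide and culture might be layered on top of the cell-level distribution is sketched below in Python. The sketch assumes that each cell value is the product of the treatment factor, the electrophoresis, culture and slide effects and a cell-level draw; this product form, the function names, the stand-in cell distribution and all numerical values are assumptions made for this illustration, not the authors' formula or parameters.

```python
import math
import random

def lognormal_effect(rng, sigma):
    """Multiplicative random effect whose logarithm is N(0, sigma)."""
    return math.exp(rng.gauss(0.0, sigma))

def simulate_culture(rng, draw_cell, k, n_slides, n_cells, sigma_s, sigma_c, e_effect=1.0):
    """Return n_slides lists of n_cells values for one culture (or animal)."""
    c_effect = lognormal_effect(rng, sigma_c)      # shared by all slides of the culture
    slides = []
    for _ in range(n_slides):
        s_effect = lognormal_effect(rng, sigma_s)  # shared by all cells on the slide
        scale = k * e_effect * c_effect * s_effect
        slides.append([scale * draw_cell(rng) for _ in range(n_cells)])
    return slides

def base_cell(rng):
    """Hypothetical stand-in for the cell-level tail length distribution."""
    return math.exp(rng.gauss(3.0, 0.2))

rng = random.Random(7)
culture = simulate_culture(rng, base_cell, k=1.0, n_slides=2, n_cells=50,
                           sigma_s=0.1, sigma_c=0.1)
```

Because the slide and culture effects are shared within a slide and within a culture, cells from the same slide are correlated, which is the hierarchical structure that the nested and mixed models are meant to capture.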


Table II. Parameter values used in the simulation study
 
A treatment effect can be defined by varying the four parameters $p_l$, $p_m$, $k_l$ and $k_m$. Treatment effects are presumed to cause an increase in tail length or tail moment, which corresponds to an increase in $p_l$, $k_l$ or $k_m$, but a decrease in $p_m$. The ‘null’ values assigned to unaffected groups, e.g. the negative control, are $k_l = k_m = 1$, $p_l = 0.04$, $p_m = 0.2$ (in vitro) and $p_m = 0.3$ (in vivo). When the treatment affects $p_l$ and $p_m$ it will be referred to as an effect of type P, changes in $k_l$ and $k_m$ will be referred to as an effect of type M and the combination of the two will be referred to as effect type C. Capital letters for effect type represent change in proportion, multiplicative effect and combined effect, respectively. It may be noted that the defined distribution and treatment effects imply that data have higher variance in groups affected by treatment.

For the in vitro simulations four different patterns (dose–response relationships) for treatment effects were selected (schematically illustrated in Figure 1). The four patterns were chosen to reflect situations that occur in practice. Patterns A–C comprise seven dose groups (including a negative control) and might represent early phase studies, prior to which little is known about the effective dose range, whereas pattern D, with four dose groups, represents a later phase study in which doses can be chosen more accurately. Additionally, the ‘null’ situation of no treatment effect (pattern 0) is included in the simulations. The in vivo study was evaluated with three treatment groups, including a negative control. The treatment groups in vivo are frequently two exposure times and we have evaluated both the situation in which the larger treatment effect is seen after the longer exposure time (pattern E), as well as the situation where a larger effect is seen after shorter exposure (pattern F). The values used to define treatment effects in the different treatment patterns are given in Table III. The choice of treatment effect types and patterns, as described in the previous section, was guided by the empirical, non-control data together with our previous experience from Comet assays.



Fig. 1. Simulated patterns of in vitro treatment effects.

 

Table III. Parameter values defining treatment effects in the main part of the simulation study
 
For the in vitro data we have simulated designs with 2 or 3 cultures/group, 2, 3 or 4 slides/culture and 25, 50 or 100 cells/slide. For in vivo data the simulated design comprises 3, 4 or 5 animals/group, 2, 3 or 4 slides/animal and 25, 50 or 100 cells/slide. For each combination of treatment effect type, effect pattern and study design, 2000 data sets were generated, except for the no-effect situation (pattern 0), for which 10 000 data sets were generated. For each data set, the summary measures mean, median and the 90th percentile were calculated. The data were analysed with each of the statistical models given in Table I. The main end-point for evaluation was statistical power, i.e. the proportion of data sets for which the null hypothesis of no treatment effect could be rejected by the performed analysis.
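The definition of power as a proportion of rejections translates directly into a Monte Carlo loop, sketched here in Python. The function `estimate_power`, the two-group toy data generator and the z-test with known unit variance are hypothetical stand-ins chosen to keep the example self-contained; they are not the designs or models evaluated in the study.

```python
import math
import random

def estimate_power(simulate_dataset, p_value, n_sim, alpha=0.05, seed=0):
    """Proportion of simulated data sets for which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        data = simulate_dataset(rng)
        if p_value(data) < alpha:
            rejections += 1
    return rejections / n_sim

def toy_dataset(rng, shift=1.0, n=10):
    """Two groups of n normal values; the treated group is shifted (toy example)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(shift, 1.0) for _ in range(n)]
    return control, treated

def toy_z_test(data):
    """Two-sided z-test for a difference in means, assuming known unit variance."""
    control, treated = data
    n = len(control)
    diff = sum(treated) / n - sum(control) / n
    z = diff / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))

power = estimate_power(toy_dataset, toy_z_test, n_sim=2000)
```

Running the same loop with a data generator that has no treatment effect gives an estimate of the type I error rate instead of the power.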

Sensitivity analysis
Results from simulation studies are potentially dependent on the assumptions made and on the conditions chosen for the simulations. In the present study we judged the comparison between study designs to be the part most sensitive to alterations in assumptions. Additional simulations were therefore performed to assess the reliability of findings under varying conditions. The size of different sources of variability was altered by changing the values of the parameters $\sigma_s$, $\sigma_e$, $\sigma_c$ and $\sigma_a$. Each of these parameters was assigned a lower and a higher value, resulting in six situations evaluated in vitro, represented by $\sigma_s^{\mathrm{low}}$, $\sigma_s^{\mathrm{high}}$, $\sigma_e^{\mathrm{low}}$, $\sigma_e^{\mathrm{high}}$, $\sigma_c^{\mathrm{low}}$ and $\sigma_c^{\mathrm{high}}$, respectively, and similarly six situations evaluated in vivo. The lower/higher values were taken to be half/twice the parameter values given in Table II.
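The six situations per data type can be enumerated mechanically, as in the following Python sketch, which halves or doubles one variance parameter at a time while leaving the others at their baseline. The baseline numbers and the dictionary keys are hypothetical placeholders; the actual baseline values are those of Table II.

```python
def sensitivity_scenarios(base, names):
    """One scenario per (parameter, direction): that parameter halved or doubled."""
    scenarios = {}
    for name in names:
        for label, factor in (("low", 0.5), ("high", 2.0)):
            params = dict(base)  # copy, so the other parameters keep their baseline values
            params[name] = base[name] * factor
            scenarios[f"{name}_{label}"] = params
    return scenarios

# Hypothetical baseline values for the in vitro case (not the values of Table II).
base_in_vitro = {"sigma_s": 0.10, "sigma_e": 0.05, "sigma_c": 0.08}
scenarios = sensitivity_scenarios(base_in_vitro, ["sigma_s", "sigma_e", "sigma_c"])
```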


    Results
Comparing summary measures
Simulation results facilitating comparisons between summary measures are given in Tables IV (in vitro) and V (in vivo). For convenience, the tables only include results for the 3-way nested model, but the main conclusions are also valid for other statistical models. For both data types, the largest differences between measures are found for effect type P, i.e. when the treatment causes an increase in the proportion of high migration cells.


Table IV. Comparisons of descriptive measures for in vitro Comet data
 
For the tail length, the best performance for effect type P is obtained with the 90th percentile, whereas a very poor performance, only slightly above the nominal significance level of 5%, is obtained when basing the analysis on the median. The differences between measures are generally smaller under effect type M. Under the combined effect type C, the 90th percentile and the mean have similar performances in the in vitro patterns A–D, but the 90th percentile is clearly superior under the in vivo pattern E.

The results show that the choice of summary measure is immensely important when dealing with the tail moment. Among the measures in our study, the mean of log transformed data is clearly superior to any of the other measures. In particular, taking the mean of the raw data leads to extremely inefficient analyses with almost no power at all to detect several patterns of effect. The superiority of the mean of log transformed data is most obvious in effect type P and also in the combined effect type C, whereas differences between measures are relatively small for effect type M.

Comparing statistical models
Simulation results facilitating comparisons between statistical models are given in Tables VI (in vitro) and VII (in vivo). The tables only give results for selected summary measures, but the main conclusions also apply to other summary measures. The differences in power between the models are generally smaller than the differences seen between summary measures reported in the previous section. The best general performance is found for the fixed effects linear models using data for each slide, among which there are only small differences in power. A slightly lower power was observed for the 3-way mixed effects and the 1-way linear models using animal/culture as the unit of measurement.


Table VI. Comparisons of statistical models for in vitro Comet data
 
The outcome of comparisons between statistical models is similar for both in vitro and in vivo data. The non-parametric tests, i.e. the Jonckheere–Terpstra and Kruskal–Wallis tests, are generally less efficient than the linear models. The Jonckheere–Terpstra test is sensitive to the pattern of effect and although it compares well with the linear models for the well-behaved dose–response patterns (i.e. C, D and E), it has less attractive properties for the slightly more problematic patterns A and B. In the linear regression, the high dose group will have a large influence on the results and, consequently, the regression is extremely efficient in detecting pattern A effects, with the effect basically represented by substantially higher DNA migration in the high dose group. However, linear regression behaves poorly with other patterns of response, in particular pattern B.

In the in vitro data, the trend tests are more powerful than the corresponding overall tests for group differences, irrespective of scenario. The trend tests within the fixed effects linear models show relatively good performance even under the non-monotonic dose–response pattern B. In the simulated in vivo data there is clearly no monotonicity in pattern F and the trend tests show unacceptably poor properties. One may note that patterns E and F are equivalent in terms of the overall tests, yielding similar simulation results for both patterns in the overall tests.

The type I error rate, i.e. the proportion of rejections with no treatment effect (pattern 0), is around or below the nominal 5% for all statistical models.

Comparing study designs
Results from simulations to compare different study designs are given in Tables VIII (in vitro) and IX (in vivo). Only the results for selected summary measures and for the 3-way nested model are included in the tables, but the findings would be similar for other measures and statistical models.


Table VIII. Comparisons of designs for in vitro Comet studies
 
The results show that a substantial increase in power can be obtained by increasing the number of cells per slide from 25 to 50, whereas further doubling of the number of cells to 100 per slide generally does not provide a corresponding increase in power. For the tail length in vitro data a similar gain in power is obtained by either increasing the number of cultures per group or the number of slides per culture from 2 to 3 or by increasing the number of cells from 25 to 50. For the tail moment in vitro increasing the number of slides or the number of cultures gives a higher gain in power than doubling the number of cells per slide. For the in vivo data performance is substantially improved by increasing the number of slides per animal. Doubling the number of cells per slide or adding another animal to each group has less effect on the power than adding another slide for each animal. This finding is similar for both tail length and tail moment.

Sensitivity analysis
Selected results from the sensitivity analysis are given in Tables X (in vitro) and XI (in vivo). Obviously, the conditions with lower variability, i.e. $\sigma_s^{\mathrm{low}}$, $\sigma_c^{\mathrm{low}}$ and $\sigma_a^{\mathrm{low}}$, yield higher statistical power than the corresponding high variability conditions $\sigma_s^{\mathrm{high}}$, $\sigma_c^{\mathrm{high}}$ and $\sigma_a^{\mathrm{high}}$. The comparisons between various designs are, however, less affected by the change in variability and the relative efficiencies of different designs are not substantially different from those found in the main study in Tables VIII and IX. Altering the variability between electrophoreses only marginally affected the results, and the situations corresponding to $\sigma_e^{\mathrm{low}}$ and $\sigma_e^{\mathrm{high}}$ were therefore not included in Tables X and XI.


Table X. Sensitivity analysis for comparisons between study designs for in vitro Comet studies

Table IX. Comparisons of designs for in vivo Comet studies

Table XI. Sensitivity analysis for comparisons between study designs for in vivo Comet studies.
 

    Discussion
The most interesting and important finding of our study is the dramatic difference in performance between the different summary measures for cells on a slide. Note, for instance, the difference in power between the mean on raw data and the mean on log transformed data revealed for the tail moment in Tables IV and VI. The importance of many other design issues fades in light of the finding that the power to detect a treatment effect can be increased from 5 to 37% or from 30 to 84% (Table IV, scenarios pattern A, type P and pattern A, type C, respectively) merely by taking the logarithm of the data prior to calculation. Further studies are encouraged on this apparently trivial, but obviously vital, issue of how to summarize the measurements from cells on a slide. Other summary measures deserve evaluation and studies in situations different from those used in our simulations would be valuable.

In an overall evaluation of the comparison between statistical models, the 2-way linear model yields the highest power, with the difference being only marginal to the other linear models. Hence, adding a factor for culture (in vitro) or animal (in vivo), as in the 3-way nested model, did not increase power. However, the 3-way model is still to be preferred as it captures the hierarchical nature of data and identifies the culture/animal as the highest experimental unit. Failing to do so can have severe consequences in situations where interculture/interanimal variation is larger than in our simulations. In this study we have deliberately chosen to evaluate only relatively simple statistical tools, excluding such potentially useful techniques as non-linear regression and generalized linear models. It is plausible that the analysis can be slightly improved by the use of more elaborate statistical techniques, but for most practical applications avoiding the worst pitfalls should be a priority and a substantial increase in complexity is not warranted by a potential slight increase in efficiency.

When evaluating the comparisons between designs, both the statistical power and the practical feasibility have to be taken into account. In the in vitro data a similar gain in power can be obtained by either increasing (from 2 to 3) the number of cultures per group or the number of slides per culture. Since in practice it is easier to increase the number of slides than the number of cultures, increasing the number of slides appears to be the more attractive option. Especially for tail length in vitro, the power is sensitive to the number of cells per slide. Hence, we argue that at least 50 cells should be scored from each slide. For in vivo data a substantial gain in power is obtained by increasing the number of slides. Correspondingly, attempts to enhance the power of an in vivo study are likely to be most easily accomplished by adding another slide for each animal, rather than including more animals or scoring more cells. However, when a large variation is anticipated the number of cultures or animals should also be increased.

It may be noted that the results to some extent depend on the conditions on which the study is based. Effort was therefore put into creating a realistic simulation study, with parameter settings based on empirical data. Additionally, a sensitivity analysis was conducted to assess the reliability of the results. The sensitivity analysis focused on the comparisons between study designs, by altering the size of various sources of variation, as this was suspected to be the part of the study most sensitive to changes in underlying conditions. Results from the sensitivity analysis indicated that the findings are reasonably robust against moderate changes in assumptions. Nevertheless, this does not preclude the possibility that other approaches or designs may be called for in applications clearly different from those anticipated in our simulations.

Our study has focused on some aspects of the Comet study, i.e. choice of statistical model, choice of summary measure and the design/size of the study. There are several other issues that are vital, such as randomization of experimental units to treatment groups, blinding of personnel with respect to treatment, etc. These are not explicitly dealt with in our study, but nevertheless are of utmost importance, and we refer to Hauschke et al. (1997) for a useful listing and comments on such issues. Our simulations are entirely based on statistical tests of a ‘no-effect’ null hypothesis and on the concept of the power of rejecting such a hypothesis. This choice was based on commonality and simplicity; statistical tests are commonly used in evaluating Comet assays and the power of a test is easy to calculate and interpret and it is therefore useful as an end-point for evaluation. We would like to emphasize, however, that in many toxicological applications the role of the null hypothesis could be reversed [cf. the discussion in Hauschke (1997) on ‘proof of hazard’ versus ‘proof of safety’]. Furthermore, it should be stressed that methods for statistical estimation are often more relevant than testing of hypotheses (Hauschke et al., 1997, section 7.7). Estimating the effect of a treatment, e.g. through a confidence interval for the effect size, yields an analysis better suited to bridge the alleged gap between statistical and biological significance. Many statistical models, the linear models (ANOVA) in particular, allow for both estimation and testing within the same framework. The relative efficiency of summary measures, models and designs is also likely to be the same when the statistical analysis is geared towards estimation, albeit we have for practical reasons used testing procedures for this evaluation.

The issue of whether to use one- or two-sided tests in applications of statistical tests is often discussed. It is obvious that with a one-sided test, if the significance level is kept constant, one will have a greater power to detect an increase (decrease) than by using a two-sided test. On the other hand, it is sometimes argued that instead of using a one-sided test one might as well use a two-sided test with twice the significance level. We consider this discussion to be a general one, not specific to the analysis of the Comet assay, and it is therefore left without recommendation in this paper.
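
The two arguments above can be illustrated numerically. The following is a minimal sketch, assuming a simple z-test on a standardized effect rather than the linear models evaluated in our simulations; the function names and the value of `shift` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def critical_value(alpha, two_sided):
    """Critical value of a z-test at significance level alpha."""
    tail = alpha / 2 if two_sided else alpha
    return STD_NORMAL.inv_cdf(1 - tail)


def power(shift, alpha, two_sided):
    """Probability of rejecting the null when the true standardized effect is `shift`."""
    z = critical_value(alpha, two_sided)
    upper = 1 - STD_NORMAL.cdf(z - shift)  # rejection in the direction of an increase
    if not two_sided:
        return upper
    lower = STD_NORMAL.cdf(-z - shift)     # rejection in the opposite direction
    return upper + lower


shift = 2.0  # hypothetical standardized increase, for illustration only
print("one-sided, alpha = 0.05:", power(shift, 0.05, two_sided=False))
print("two-sided, alpha = 0.05:", power(shift, 0.05, two_sided=True))
print("two-sided, alpha = 0.10:", power(shift, 0.10, two_sided=True))
```

In this sketch the one-sided test at level 0.05 and the two-sided test at level 0.10 share the same critical value, so their power to detect an increase differs only by the small probability of rejecting in the wrong direction.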

Recommendations
Based on our interpretations of the results from the performed simulations, as well as our practical experience from the pharmaceutical industry, we provide the following recommendations on the design and statistical treatment of data from Comet assays.

  • Perform the statistical analysis with slide as the unit of measurement but with a statistical model that captures the hierarchical nature of data, e.g. a nested effects linear model.
  • For tail moments, use the mean of log transformed data (or equivalently the log of the geometric mean) as a summary measure for the cells on a slide.
  • For tail lengths, use a percentile from the upper tail of the distribution, e.g. the 90th percentile, of log transformed data as a summary measure for the cells on a slide.
  • A suitably designed linear model (ANOVA) provides a relatively simple yet efficient means of detecting treatment effects and important factors can be accounted for within the model. We recommend a model including factors for treatment group, electrophoresis (or other experimental conditions if appropriate) and culture (in vitro) or animal (in vivo).
  • Perform the test for treatment effect as a test for trend whenever there is a natural ordering of the treatment groups, such as increasing doses. We have used a contrast that is linear in the treatment groups and this has proven to be efficient and robust over a range of patterns of treatment effect.
  • For in vitro studies we recommend the following study design: 50 cells from 3 slides/culture and 2 or 3 cultures/group.
  • For in vivo studies we recommend the following study design: 50 cells from 3 slides/animal and 4 or 5 animals/group.
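
As a minimal sketch of the two slide-level summary measures recommended above, the following Python functions compute the mean of log transformed tail moments and the 90th percentile of log transformed tail lengths for the cells scored on one slide. The function names are hypothetical, the percentile is defined here by linear interpolation between order statistics (one of several common definitions, assumed for illustration), and the treatment of zero values is left open: the sketch simply rejects non-positive input.

```python
import math
from statistics import fmean


def mean_log(values):
    """Mean of log transformed values, i.e. the log of the geometric mean."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return fmean(math.log(v) for v in values)


def percentile(values, p):
    """p-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one value is required")
    rank = (len(ordered) - 1) * p / 100
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (rank - low) * (ordered[high] - ordered[low])


def log_percentile(values, p=90):
    """p-th percentile of log transformed values (assumed definition, see above)."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return percentile([math.log(v) for v in values], p)


# Hypothetical measurements for the cells on one slide, for illustration only.
tail_moments = [0.8, 1.5, 2.1, 0.6, 9.4, 1.1, 3.3, 0.9]
tail_lengths = [12.0, 15.5, 9.8, 30.2, 11.1, 14.7, 22.9, 10.4]
print("mean of log tail moment:", mean_log(tail_moments))
print("90th percentile of log tail length:", log_percentile(tail_lengths))
```

Each slide then contributes a single value to the statistical model, which is what allows the slide to serve as the unit of measurement.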

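The test for trend recommended above can be sketched in the same way. This is a deliberately simplified illustration, not the nested model used in our simulations: it assumes the slide summaries have already been averaged to one value per animal or culture, and it uses the residual variance of a one-way layout. The function names and the data are hypothetical. The sketch returns the t statistic and its degrees of freedom; a P value would be obtained from the t distribution in a statistics package.

```python
import math
from statistics import fmean


def linear_contrast(n_groups):
    """Centred, equally spaced coefficients, e.g. [-1.5, -0.5, 0.5, 1.5]."""
    centre = (n_groups - 1) / 2
    return [i - centre for i in range(n_groups)]


def trend_t_statistic(groups):
    """t statistic and residual df for a linear trend over ordered groups.

    `groups` is a list of lists ordered by dose, holding one summary
    value per animal or culture (a simplifying assumption).
    """
    coefs = linear_contrast(len(groups))
    means = [fmean(g) for g in groups]
    # Pooled within-group variance: the residual mean square of a one-way ANOVA.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df = sum(len(g) for g in groups) - len(groups)
    if df <= 0:
        raise ValueError("need more observations than groups")
    mse = ss_within / df
    estimate = sum(c * m for c, m in zip(coefs, means))
    std_err = math.sqrt(mse * sum(c * c / len(g) for c, g in zip(coefs, groups)))
    return estimate / std_err, df


# Hypothetical animal-level values for a control and three increasing doses.
groups = [
    [1.0, 1.2, 0.8, 1.0],
    [1.1, 1.3, 0.9, 1.1],
    [1.4, 1.6, 1.2, 1.4],
    [1.8, 2.0, 1.6, 1.8],
]
t_stat, df = trend_t_statistic(groups)
print("t =", t_stat, "on", df, "degrees of freedom")
```

Because the coefficients sum to zero, the contrast is unaffected by the overall level of the response and reflects only the change across the ordered groups.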

Table V. Comparisons of descriptive measures for in vivo Comet data

Table VII. Comparisons of statistical models for in vivo Comet data

    Notes
 
1 Present address: Medical Products Agency, Box 26, S-751 03 Uppsala, Sweden

2 To whom correspondence should be addressed. Tel: +46 8 553 26983; Fax: +46 8 553 28947; Email: stigjohan.wiklund{at}astrazeneca.com


    References

    Bauer,E., Recknagel,R.D., Fiedler,U., Wollweber,L., Bock,C. and Greulich,K.O. (1998) The distribution of the tail moments in single cell gel electrophoresis (comet assay) obeys a chi-square (χ²) not a gaussian distribution. Mutat. Res., 398, 101–110.

    Hauschke,D. (1997) Statistical proof of safety in toxicological studies. Drug Inf. J., 31, 357–361.

    Hauschke,D., Hayashi,M., Lin,K.K., Lovell,D.P., Robinson,W.D. and Yoshimura,I. (1997) Recommendations for biostatistics of mutagenicity studies. Drug Inf. J., 31, 323–326.

    Hollander,M. and Wolfe,D.A. (1999) Nonparametric Statistical Methods, 2nd Edn. John Wiley & Sons, New York, NY.

    Lovell,D.P., Thomas,G. and Dubow,R. (1999) Issues related to the experimental design and subsequent analysis of in vivo and in vitro Comet studies. Teratog. Carcinog. Mutagen., 19, 109–119.

    Olive,P.L., Banath,J.P. and Durand,R.E. (1990) Heterogeneity in radiation-induced DNA damage and repair in tumor and normal cells using the ‘comet’ assay. Radiat. Res., 122, 86–94.

    SAS Institute Inc. (1999) SAS/STAT® User’s Guide, Version 8. SAS Institute Inc., Cary, NC.

    Tice,R.R., Agurell,E., Anderson,D., Burlinson,B., Hartmann,A., Kobayashi,H., Miyamae,Y., Rojas,E., Ryu,J.-C. and Sasaki,Y.F. (2000) Single cell gel/comet assay: guidelines for in vitro and in vivo genetic toxicology testing. Environ. Mol. Mutagen., 35, 206–221.

Received on January 17, 2002; revised on November 4, 2002; accepted on November 7, 2002.





