Mutagenesis Advance Access originally published online on April 2, 2008
Mutagenesis 2008 23(3):171-182; doi:10.1093/mutage/gen015
Statistical issues in the use of the comet assay
Department of Biostatistics, Postgraduate Medical School, University of Surrey, Daphne Jackson Road, Manor Park, Guildford, Surrey GU2 7WG, UK 1Department of Biostatistics, Kyoto University School of Public Health, Onogawa 16-2, Tsukuba, Japan
The comet or single-cell gel electrophoresis assay is now widely used in regulatory, mechanistic and biomonitoring studies using a range of in vitro and in vivo systems. Each of these has issues associated with the experimental design which determine to a large extent the statistical analyses that can be used. A key concept is that the experimental unit is the smallest amount of experimental material that can be randomly assigned to a treatment: the animal for in vivo studies and the culture for in vitro studies. Biomonitoring studies, being observational rather than experimental, are vulnerable to confounding and biases. Critical factors in any statistical analysis include the identification of suitable end points, the choice of measure to represent the distribution of the comet end point in a sample of cells, estimates of variability between experimental units and the identification of the size of effects that could be considered biologically important. Power and sample size calculations can be used in conjunction with this information to identify optimum experimental sizes and provide help in combining the results of statistical analyses with other information to aid interpretation. Interpretation based upon the size of effects and their confidence intervals is preferred to that based solely upon statistical significance tests. Statistical issues associated with the design and subsequent analyses of current validation studies for the comet assay include the identification of acceptable levels of intra- and inter-laboratory repeatability and reproducibility and criteria for dichotomizing results into positive or negative.
| Introduction |
|---|
The comet or single-cell gel electrophoresis assay is now widely used as a quick, sensitive and cheap method for measuring DNA strand breaks in eukaryotic cells for the investigation of genetic damage associated with exposures to potentially genotoxic agents. The method has evolved over the last 20 years since first described by Östling and Johanson (1
An overview of the uses of the comet assay is given by Collins (3) and the recent Comet Workshop (papers in this issue) showed the range of applications from traditional genotoxicity in vivo and in vitro studies, through mechanistic studies to its use in ecotoxicology such as for aquatic toxicology. Other examples of its use include investigations on industrial chemicals, pharmaceuticals, biocides, agrochemicals and food chemicals such as additives. It is also used as a biomarker in cancer and nutrition studies (papers in this issue).
A number of protocols have been developed for use in different types of investigations, for instance, for the neutral and the alkaline versions. Guidelines and recommendations for the conduct of studies have been published (5,6). A number of modifications of the comet assay have been developed, for instance, to measure cross-links by determining the reduction of induced DNA migration (7) and for investigating base excision repair and nucleotide excision repair (8).
Continuing development of the assay means that a range of statistical methods may be used. Many of them are likely to be interchangeable and, although giving numerically different results, are likely to lead to qualitatively similar conclusions. Where different statistical methods produce different conclusions, this is often an indication that a careful inspection of the data is needed. It is, though, unlikely that a single statistical method will meet every requirement (9).
This paper not only addresses generic issues associated with the comet assay but also discusses specific issues associated with the protocols being developed for in vivo and in vitro studies and which will be used in validation studies. The statistical input into the development of experimental design is emphasized. Many of the experimental design and statistical analysis issues associated with the comet assay are common to many other genotoxicity studies and, in general, to other experimental systems. Comments made here may apply to other assays. It is stressed that the size of an effect (and some indication of the confidence interval (CI) associated with it) is more important than determining statistical significance by itself.
| Image analysis and end points |
|---|
The comet has a complex form which after visualization can be simplified to a set of multivariate data representing the shape. Image analysis methods are capable of collecting a large amount of information on the image and various proprietary automated systems exist and public domain programmes have been developed for image analysis (10).
Debate continues over whether manual or automatic scoring is best (4). For instance, while manual scoring using an eyepiece micrometer to measure tail length is permitted, image analysis is recommended (11). The key point is that the method should be consistent over a study or combination of studies. The method of scoring is relevant if meta-analysis is planned and where the absolute size of effect is important. It is important to avoid bias in the identification of cells to measure comets (3). Cells with large tails may overlap and thus may not be selected for measurement, which could violate the assumption underlying statistical tests that the cells represent a random sample.
Three measures of DNA migration are commonly used: tail length, tail moment and % of the DNA in tail (% tail DNA). Metrics on tail or head length and moment are measured in arbitrary units and may vary from study to study or from laboratory to laboratory. Tail length is considered unsatisfactory as a measure because the length only increases at relatively low damage levels and is sensitive to the background intensity of the image analysis system which affects the criteria for determining the end of the tail (3). Tail moment, an index taking account of both the migration of the genetic material and the relative amount of DNA in the tail, can be calculated in a number of ways. The Olive tail moment, for instance, is the product of the tail length and the % tail DNA. The % tail DNA is a measure of the relative fluorescent intensity in the head and tail (3). The % tail DNA values are constrained to a maximum of 100 and a minimum of 0 with no variability at the extremes and maximum variability at intermediate values such as 50%.
The % tail DNA has the advantage that it can be standardized over studies while tail length and moment, although consistent within a study, may not be comparable across studies. There is an increasing emphasis on the use of the % tail DNA as the preferred metric or the primary end point (12) and it was recognized as the most suitable primary end point at the International Workshop on Genotoxicity Test Procedures at San Francisco in 2005 (4). Hartmann et al. (11), for instance, in describing the various measures suggest that there is much to recommend the use of per cent DNA in tail. Relative tail intensity (the % tail DNA) was linearly related to DNA damage over a wide range of damage and is related to DNA break frequency. Collins (3) viewed % tail DNA as the most useful measure because it covered a wide range of damage (from 0 to 100%), was independent of the threshold settings of the image analysis program used and gave some feel for what the comet looked like. In contrast, tail moment was not linearly related to dose and did not provide an indication of what the comet looked like. One difficulty with % tail DNA is that the presence of zero values would complicate statistical analysis. Collins (3), however, suggests that a check of whether cells are in satisfactory condition for the assay is that untreated control cells should have a background level of breaks (i.e. 10% DNA in tail) and there are suggestions that negative control cells should have between 10 and 20% DNA in tail which would obviate statistical problems.
Other variations on the measures made include comet moments (13) and tail inertia (14). Bowden et al. (15) developed a tail profile which identifies more DNA damage than measured by the tail moment. They derived a profile plot, a visual representation of a series of comets on a slide, which could identify features in the data that were not otherwise apparent.
An alternative scoring system is to classify DNA migration data using a five-category classification scheme (0, no damage to 4, almost all the DNA in the tail). The system is manual and relies on a subjective assessment based upon comparisons with standard images. Collins (3) illustrates the five classes (0–4). Each grade is equivalent to a 20% band on the % tail DNA score. The scores for a sample of 100 comets from a slide can be combined to provide an overall score for the slide on an arbitrary scale from 0 to 400. This score shows close agreement with scores based upon % tail DNA (3).
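As a minimal sketch of the visual scoring scheme described above, the slide score can be computed as the sum of the per-comet grades. The function name and the example grades are hypothetical and chosen only for illustration.

```python
def slide_visual_score(grades):
    """Sum the visual grades (each 0-4) of the comets scored on one slide.

    With 100 comets per slide the result lies between 0 and 400.
    """
    if any(g not in (0, 1, 2, 3, 4) for g in grades):
        raise ValueError("each grade must be an integer from 0 to 4")
    return sum(grades)


# Hypothetical slide of 100 comets: mostly undamaged, a few heavily damaged.
grades = [0] * 40 + [1] * 30 + [2] * 15 + [3] * 10 + [4] * 5
score = slide_visual_score(grades)
print(score)
```

The score is still a single summary per slide, so the slide (or the animal or culture it came from) remains the unit carried into any later analysis.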
Comet cells can also be categorized as responder or non-responder cells based upon the degree of damage and the proportion of responder cells on a slide then used as the measure of damage. Altman and Royston (16), however, point to the costs of dichotomizing continuous variables and note that dichotomizing at the median is comparable with losing a third of the data.
Comparisons of results with different comet end points may be useful. Lee and Steiner, in the context of environmental studies, suggest the use of both tail moment and % tail DNA data in the analysis (17). Other data collected in comet assay are measures of cell toxicity, damage or viability and information on the percentage of hedgehogs (cells with a small or non-existent head and large, diffuse tails). These data are not usually included in the formal statistical analysis of the comet measures but are important for an assessment of the quality of the study.
| Experimental unit |
|---|
The concept of the experimental unit is fundamental to the statistical analysis of designed experiments. Misspecification of the experimental unit can lead to serious misinterpretation of the statistical analysis. The US National Institute of Standards and Technology defines the experimental unit as the entity to which a specific treatment combination is applied, the US Food and Drug Administration as the standard subject to which a treatment is applied and a measurement is made. More precisely, it is the smallest amount of experimental material that can be randomly assigned to a treatment.
Both the animal (6) and the culture in in vitro studies (11) have been clearly identified as the experimental unit (Figure 1). In some protocols, cells may be scored from a number of slides and a summary statistic for the slide may be used in the analysis. It is possible that there may be appreciable variability between slides and this may need to be taken into account in the statistical analysis.
The individual cell may be the smallest unit which can be measured but cells from the same animal or culture are all assigned to the same treatment and repeated measures taken from the same experimental unit are likely to be autocorrelated or more similar to one another than two cells each taken from different samples. The degree of similarity is measured by the intra-class correlation (ICC). The ICC compares the within-group variance with the between-group variance and, in its usual form, is calculated as

$$\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}$$

where $\sigma_b^2$ is the between-group variance and $\sigma_w^2$ is the within-group variance.
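The ICC can be illustrated with a short sketch. It assumes a balanced layout (the same number of cells per animal or culture) and uses the standard one-way ANOVA estimator, in which the between-group variance component is estimated from the between- and within-group mean squares and the ICC is that component divided by the total variance. The function name and the data are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """Estimate the ICC from a balanced one-way layout.

    groups: list of equal-length lists, one list of cell values per
    experimental unit (animal or culture).
    """
    k = len(groups)
    m = len(groups[0])
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need at least 2 groups of equal size (>= 2)")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group mean squares.
    msb = m * sum((gm - grand) ** 2 for gm in group_means) / (k - 1)
    msw = sum(
        (x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g
    ) / (k * (m - 1))
    # Variance component between groups, truncated at zero.
    var_between = max((msb - msw) / m, 0.0)
    return var_between / (var_between + msw)


# Hypothetical % tail DNA values: 4 cells from each of 3 animals.
animals = [[10, 12, 11, 13], [20, 22, 21, 23], [30, 32, 31, 33]]
icc = icc_oneway(animals)
print(round(icc, 3))
```

An ICC close to 1, as in this contrived example, means that cells from the same animal carry very little independent information, which is why the animal rather than the cell is the unit of analysis.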
There is a need with in vitro designs to ensure that there is adequate replication, with the culture playing the same role as experimental unit that the animal plays in the in vivo test (Figure 2). There may be differences between cultures, between subcultures within a culture and between cells within a subculture. Any analysis needs to take into account these different levels of variability; otherwise hidden levels of variability can distort the estimates of variability and lead to errors in interpretation.
An in vitro design where each of a series of subcultures receives a different treatment and the cells within the subculture are treated as the experimental unit in the analysis may lead to significant but artifactual results such as apparent non-dose-related effects (Figure 2c). (A similar ex vivo design with single subcultures per treatment has the same problem.) These small but significant differences between subcultures probably represent their underlying variability rather than a true treatment effect. The more cells that are measured the more likely a significant difference will be detected. An experiment with replicate subcultures will provide a valid estimate of subculture variability and a valid, if low power, test of the treatments for that specific culture (Figure 2b). For power calculations, replication of cultures is needed for an estimate of the variability across cultures.
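The point about artifactual significance can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real data set: there is no treatment effect at all, subculture means vary with a standard deviation of 2 and cells within a subculture with a standard deviation of 5 (both values invented), and the cell-level comparison uses a normal approximation to the critical value.

```python
import random
from statistics import mean, variance


def simulate(n_sims=500, cells=50, sd_culture=2.0, sd_cell=5.0, seed=1):
    """Return false-positive rates for two analyses when no effect exists."""
    rng = random.Random(seed)

    def culture():
        # Each (sub)culture has its own mean; no treatment effect is added.
        mu = rng.gauss(20.0, sd_culture)
        return [rng.gauss(mu, sd_cell) for _ in range(cells)]

    cell_hits = culture_hits = 0
    for _ in range(n_sims):
        # Design A: one subculture per treatment, cells treated as units.
        a, b = culture(), culture()
        se = (variance(a) / cells + variance(b) / cells) ** 0.5
        if abs(mean(a) - mean(b)) / se > 1.96:  # normal approximation
            cell_hits += 1
        # Design B: three cultures per treatment, culture means as units.
        ga = [mean(culture()) for _ in range(3)]
        gb = [mean(culture()) for _ in range(3)]
        pooled = (variance(ga) + variance(gb)) / 2
        t = abs(mean(ga) - mean(gb)) / (pooled * 2 / 3) ** 0.5
        if t > 2.776:  # two-sided 5% critical value of t with 4 df
            culture_hits += 1
    return cell_hits / n_sims, culture_hits / n_sims


cell_rate, culture_rate = simulate()
print(cell_rate, culture_rate)
```

Under these assumptions the cell-level analysis declares a "significant" difference far more often than the nominal 5%, while the analysis based on replicate cultures stays close to it.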
Common to all experimental studies should be the standard design features of randomization, replication and blocking, which together aim to reduce biases and manage uncontrollable variables. Examples include randomization of the position of slides on platforms, the use of electrophoretic runs as blocks and the blind scoring of cells.
| Within sample distributions |
|---|
Many distributions of comet measures (e.g. % tail DNA) within a sample (intra-sample) from a culture or animal are not normally distributed but rather may be asymmetric, skewed, bi- or multi-modal, a mixture of different distributions or just idiosyncratic especially if an administered compound has caused some DNA damage. Some of the end points may include many small or zero values plus some extreme values. Finding any best measure may be difficult if the distribution is not simple. In the case of normally distributed data, the central value can be described by the mean and the spread by the standard deviation (SD) but description of the distribution may be difficult if a number of parameters are needed to explain the distribution. In these cases, it is unlikely that a single statistical distribution will be able to describe the distribution and any single measure will capture only part of the information in the sample data and may be unrepresentative of any actual value.
Various statistics have been suggested to represent the sample. For instance, the 90th and 95th percentiles have been suggested because they capture the upper tail of the distribution. (However, for a precise estimate, this may require large numbers of cells—more than the 50 cells per slide often measured.) Values of the mean, median and 75th percentile are usually highly correlated.
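A minimal sketch of how these candidate summary statistics might be computed for one sample of cells is given below. It uses the Python standard library; the function name is hypothetical and the choice of the "inclusive" percentile method is an assumption, since different packages interpolate percentiles differently.

```python
from statistics import mean, median, quantiles


def sample_summaries(values):
    """Return candidate single-number summaries for one sample of cells."""
    # 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "mean": mean(values),
        "median": median(values),
        "p75": cuts[74],
        "p90": cuts[89],
        "p95": cuts[94],
    }


# Hypothetical % tail DNA values for 20 cells from one slide.
cells = [1, 2, 2, 3, 3, 4, 5, 5, 6, 8, 9, 11, 12, 15, 18, 22, 30, 41, 55, 70]
summary = sample_summaries(cells)
print(summary)
```

With only a few dozen cells the upper percentiles rest on a handful of observations, which is the precision concern raised in the text.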
Duez et al. (18) noted that the heterogeneity of the distribution curves of comet measures made the use of standard parametric and non-parametric methods difficult because assumptions underlying them were violated. They suggested using either the median or 75th percentile of the sample in subsequent analyses. They concluded that a trend analysis on medians of the samples was satisfactory. They also noted that non-parametric tests such as the Kruskal–Wallis and Mann–Whitney tests were oversensitive in detecting small differences between replicate samples and were not suitable for use in detecting genotoxic effects.
Data can be transformed to try to make the distribution of the data conform to normality. The logarithmic transformation has the convenient property that back transformation (taking antilogs) to biologically meaningful values is easier than with other transformations. The problem of the logarithm of zero values can be overcome by the addition of small positive values (such as 0.001) to the data. It is important to appreciate, though, that while a transformation may correct one violation, say normality, it can result in another, such as heterogeneity of variances. The logarithmic transformation is a special case from the family of Box–Cox power transformations. The optimum value for the power term in the transformation can be derived from an analysis of the data sets using a range of values for the power term (19).
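The transformations described above can be sketched as follows. The Box–Cox transform is (x^λ − 1)/λ, with the logarithm as the limiting case λ = 0, and the power term is chosen here by comparing the standard profile log-likelihood over a small grid. The function names, the grid and the example data are assumptions for illustration; this is not necessarily the procedure used in reference 19.

```python
import math


def log_transform(values, offset=0.001):
    """Natural log after adding a small positive offset so zeros are defined."""
    return [math.log(v + offset) for v in values]


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values; lam = 0 is the log."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lam under a normal model for transformed data."""
    y = box_cox(values, lam)
    n = len(y)
    centre = sum(y) / n
    var = sum((yi - centre) ** 2 for yi in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values, grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the power term on the grid with the highest log-likelihood."""
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))


# Hypothetical tail moment values that are symmetric on the log scale.
tail_moment = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
chosen = best_lambda(tail_moment)
print(chosen)
```

For these contrived data the grid search selects λ = 0, i.e. the logarithmic transformation; real comet data may point to a different power term or to no satisfactory value at all.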
In the case of % tail DNA which is measured on a scale from 0 to 100%, the data may be suitable for transformation by a logistic or arcsin transformation. Collins et al. (20), for instance, suggested the use of either an angular or arcsin transformation or the use of generalized linear models with binomial error distribution and a logit link function.
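A minimal sketch of the two transformations for a percentage scale follows. The angular transformation is taken here as the arcsine of the square root of the proportion, its usual form, and the logit is log(p/(1 − p)). The clipping constant `eps`, used to keep the logit finite at 0 and 100%, is an assumption introduced for illustration.

```python
import math


def angular(pct):
    """Arcsine square-root (angular) transform of a percentage, in radians."""
    return math.asin(math.sqrt(pct / 100.0))


def logit(pct, eps=0.5):
    """Logit of a percentage, clipped to [eps, 100 - eps] to avoid infinities."""
    clipped = min(max(pct, eps), 100.0 - eps)
    p = clipped / 100.0
    return math.log(p / (1.0 - p))


# Hypothetical % tail DNA values, including the two extremes of the scale.
for pct in (0, 10, 50, 90, 100):
    print(pct, round(angular(pct), 3), round(logit(pct), 3))
```

Both transformations stretch the ends of the scale relative to the middle, which is where the text notes that the variability of % tail DNA is constrained.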
A number of probability distributions have been proposed for modelling the distributions. These include the Weibull, exponential, logistic, normal, log-normal and log-logistic distributions (21). Others investigated include the Poisson, beta, gamma, Erlang and Weibull (22). Debon et al. (23) suggest the use of the sum of two Gaussian (normal) curves.
Ejchart and Sadlej-Sosnowska (22) found that the Weibull distribution was the best fit to data from an in vitro dose–response study. The Weibull is widely used in other fields, such as to model the failure of mechanical components, and can be characterized by two parameters relating to the shape and the scale (β) of the distribution, both of which increased with increasing dose. Ejchart and Sadlej-Sosnowska argued that changes in the values of these parameters would, therefore, be evidence for genotoxicity and suggested using simulations to derive CIs for the estimates (22).
Tice et al. (6) suggested the use of the H statistic as a measure of the migration patterns among cells within a sample. H, the coefficient of dispersion, is calculated as the variance/mean and is sometimes referred to as Fisher's coefficient of dispersion or the variance–mean ratio. In the case of quantitative measures, H is equivalent to the coefficient of variation (CV) times the SD. The larger the CV, the larger the value of H. One problem with H is that it can be susceptible to one or a small number of outliers. The coefficient of dispersion is also used to see if data are distributed according to a Poisson distribution where H = 1. Values of H > 1 suggest over-dispersion of the data.
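The relationship between H, the CV and the SD can be checked with a short sketch. Since CV = SD/mean, the product CV × SD equals variance/mean. The sample variance (n − 1 denominator) is used here as an assumption, and the data are hypothetical.

```python
from statistics import mean, stdev, variance


def dispersion_h(values):
    """Coefficient of dispersion H: sample variance divided by the mean."""
    return variance(values) / mean(values)


# Hypothetical comet measurements from one sample.
values = [2, 4, 4, 4, 5, 5, 7, 9]
h = dispersion_h(values)
cv = stdev(values) / mean(values)
print(round(h, 4), round(cv * stdev(values), 4))
```

Replacing the largest value with an extreme one changes H sharply, which illustrates its sensitivity to outliers.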
The fit of the data to a distribution has often been tested using a goodness of fit statistic such as the Kolmogorov–Smirnov test. However, problems can arise in the interpretation of a goodness of fit test as the null hypothesis that the data fit a normal distribution is likely to be rejected when the sample size is large simply because real data are unlikely to be perfectly normally distributed. Duez et al. (18) found that another test for normality, the Shapiro–Wilk test, was very sensitive for detecting non-normality of untransformed and log-transformed tail length and tail moment when applied to samples of 100 cells. A similar issue arises with tests of the assumptions of equal variances. Care is, therefore, needed to avoid trawling for a best distribution. The finding that data conform to a particular distribution does not mean that this distribution is the correct one. There should be some biological basis for the choice of a distribution.
In practice, concerns about the assumptions underlying the ANOVA methodology are, in decreasing order of concern: independence, equal (homogeneous) variances and normality (24). Of these, independence is by far the most important, while the ANOVA is robust enough to conduct when the within-group variances differ by a factor of two (some even say five) and non-normality is only a minor violation (24).
Complications arising from the complex distribution of comet end points may be in part avoided because of the implications of the central limit theorem. While the original distribution of a set of data may not be normally distributed, the distribution of the means of random samples from the distribution will be approximately normal, particularly as the sample size increases. Median values for a slide may, therefore, represent data which are amenable to standard statistical analyses.
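The central limit theorem argument can be illustrated by simulation. The sketch below draws cell values from an exponential distribution (chosen purely as a convenient skewed distribution, not as a model of comet data) and compares the skewness of individual values with the skewness of the means of samples of 50 cells. The seed and sample sizes are arbitrary.

```python
import random
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance to the 3/2."""
    n = len(values)
    centre = mean(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(2008)
# 2000 individual "cells" from a strongly right-skewed distribution (mean 10).
raw = [rng.expovariate(1 / 10) for _ in range(2000)]
# 2000 sample means, each based on 50 cells from the same distribution.
sample_means = [
    mean(rng.expovariate(1 / 10) for _ in range(50)) for _ in range(2000)
]

print(round(skewness(raw), 2), round(skewness(sample_means), 2))
```

The means are much closer to symmetric than the individual values. The theorem is stated for means; how well it carries over to medians or percentiles of a slide depends on the underlying distribution.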
| Suggested approaches for statistical analysis |
|---|
Currently, there is no consensus on standard statistical methods for the analysis of comet data (6).
In practice, however, comet assay data should be basically straightforward to analyse with the exception that the measures of damage to the cells in a sample have a complex distribution especially if there has been an effect of the chemical. As discussed above, there is probably no simple statistical/mathematical distribution that would explain the observed distributions and this makes statistical analysis using the individual cell scores difficult. On the other hand, analyses concentrating on a single measure from each animal (the experimental unit) may provide robust results which, with care, can be interpreted satisfactorily. Duez et al. (18) have listed some of the standard methods available.
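Reducing the cell-level data to a single measure per experimental unit might look like the following sketch. It assumes, for illustration only, that the slide median is the within-slide summary and that slide medians are averaged within each animal; the names and data are hypothetical.

```python
from statistics import mean, median


def unit_summaries(cells_by_unit):
    """Collapse cells -> slide median -> mean of slide medians per unit.

    cells_by_unit maps a unit label (animal or culture) to a list of
    slides, each slide being a list of cell-level measurements.
    """
    return {
        unit: mean(median(slide) for slide in slides)
        for unit, slides in cells_by_unit.items()
    }


# Hypothetical % tail DNA values: two animals, two slides each.
data = {
    "animal_1": [[2, 3, 5, 8, 40], [1, 4, 6, 9, 35]],
    "animal_2": [[10, 12, 15, 30, 60], [8, 14, 16, 25, 55]],
}
per_animal = unit_summaries(data)
print(per_animal)
```

The resulting values, one per animal, are what would then be passed to a t-test, ANOVA or trend test.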
Statistical analysis can be carried out by using parametric approaches such as ANOVA techniques which reduce, in the special case of two groups, to one of a range of t-tests based upon the degree of variability in the two groups. ANOVA methods, part of the wider general linear model (GLM) approach, can be used to further explore the difference within a group of means by specific contrasts. Some contrasts have clearly defined hypotheses such as tests for linear and quadratic trends in a dose–response experiment. More sophisticated designs and analyses move from a traditional hypothesis testing approach into a modelling methodology where estimates are derived for various model components and attempts made to identify the best fitting and hopefully the most predictive model.
Non-parametric methods shadow the simpler parametric tests: the Mann–Whitney test corresponds to the t-test, the Kruskal–Wallis test to the one-way ANOVA and the Jonckheere–Terpstra trend test to the linear dose–response trend test. Non-parametric tests are slightly less powerful than their parametric equivalents but give potentially a more accurate Type I error rate when the assumptions underlying parametric tests are violated. (A Type I error is the risk of rejecting the null hypothesis in a statistical test when, in fact, it is true.) Importantly, the non-parametric tests may be distribution free but are not assumption free, so are probably as vulnerable, if not more so, to differences in the distributions between the groups. Non-parametric tests aim to ensure that the correct Type I error rates are maintained but are less suitable for more complex designs, estimation and model fitting. The distribution of the comet end point can create complications in finding an appropriate transformation of the data and the assumptions underlying parametric analyses may be challenged even if not violated. Small sample sizes (e.g. 4 or 5 units per group) also mean that comparisons using non-parametric tests may have low power even when there are quite large treatment effects.
Qualitative data (present/absent) can be analysed by chi-square and Fisher exact tests of 2 x 2 tables and chi-square tests of difference between groups and trends which mirror ANOVA approaches but with appreciably less power. The choice of experimental unit for inclusion in the analysis is, however, a critical concern for the appropriate analysis and interpretation of such tests.
One-sided tests are directional and slightly increase the power of a statistical analysis. There is an argument that the statistical test should be two sided if either a decrease or an increase in DNA migration could be envisaged.
The ANOVA, especially the one-way ANOVA, is an omnibus test of an overall difference between means. A more targeted hypothesis, such as a linear or other specified dose response, also has more power. A linear effect may therefore be found even when there is no significant overall difference between means, as a consequence of the increased power of the specific hypothesis test compared with the general one. This should be borne in mind when applying the rule that no further testing between groups should be carried out if the overall ANOVA test of a difference between means is not significant. Another argument against overly formalized analyses concerns the rule that an experiment is only declared satisfactory if there is a significant difference between the positive and negative control groups: such a rule does not take into account the complications that can arise from small sample sizes and unequal variances as a result of variable responses in the positive control group.
Multiple comparison approaches are sometimes used to address concerns that when a large number of comparisons (e.g. between pairs of treatments) are made, there is a risk of Type I errors (declaring results significant when they are not). A common method often used in the analysis of comet assay (and other toxicological) data is Dunnett's test (25). This is a specialized multiple comparison test that allows a comparison of a single control group with all other groups. This test was specifically designed to adjust the error rate when multiple comparisons are made between a number of new treatments and the standard treatment group with the objective of avoiding wrongly replacing a satisfactory standard treatment with a new treatment which just happened to perform better by chance in a single particular study. Dunnett's test aims to keep the experiment-wise (or family-wise, in contrast to the individual) error rate at 0.05, which means that on average only 1 in 20 experiments will reach a false conclusion. The implication is that testing is done at a more conservative α value, so in effect lowering the power of the design but without taking any account of any other structure in the design. A multiple comparison procedure in effect dampens down the number of significant results reported. There are a number of different multiple comparison methods available, each addressing a different aspect of the comparisons across a range of treatments and with different properties. The Bonferroni correction, for instance, is a highly conservative approach which carries out hypothesis testing at the α/n level where n is the number of multiple comparisons being made. The use of multiple comparison methods, however, remains controversial, with some statisticians arguing against their indiscriminate use (26).
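The arithmetic behind the family-wise error rate and the Bonferroni correction can be sketched briefly. The family-wise formula below assumes the comparisons are independent, which is a simplification (comparisons against a shared control group are correlated), and the function names are hypothetical.

```python
def familywise_error(alpha, n):
    """Chance of at least one Type I error in n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n


def bonferroni_alpha(alpha, n):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n


# Five comparisons, each nominally tested at the 5% level.
uncorrected = familywise_error(0.05, 5)
per_test = bonferroni_alpha(0.05, 5)
corrected = familywise_error(per_test, 5)
print(round(uncorrected, 3), per_test, round(corrected, 3))
```

Testing each of five comparisons at 0.05 gives a family-wise rate of roughly 0.23 under independence; testing each at 0.01 brings it back just under 0.05, at the cost of power for each individual comparison.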
| Hierarchical design, random effect models and generalized linear modelling |
|---|
The comet assay is a hierarchical or nested design with animals (in the in vivo design) and cultures (in the in vitro design) within doses, a number of slides from each animal or culture and a number of cells measured per slide (Figure 1). The statistical models underlying these designs go under various names (hierarchical linear models, multilevel models, mixed-effects models, random-effects models, random coefficient regression models and covariance components models). The models make use of information on the various levels of variability in the design but are quite complex, need sophisticated software and can be complicated to interpret and explain. Their advantage is that they are able to provide estimates for the variability at each level in the design and make use of information at the cell level so increasing the power of the study somewhat. However, if there is appreciable between animal or culture variability, the extra power available may be small. There is also the added difficulty that the variability between cells within the same animal may have a complex distribution which may be difficult to include in the model.
Wiklund and Agurell (27) and Verde et al. (21) provide examples of more sophisticated statistical analyses where attempts have been made to model both the variability between samples and between cells within a sample based upon GLM approaches. The GLM is a generalization of the ordinary least squares approach (used in the ANOVA, analysis of covariance and multivariate ANOVA) and is a special case of the generalized linear model (GLZ). The generalized linear model is a unified method used to extend the GLM approach to incorporate responses other than those based upon the normal distribution. Nelder and Wedderburn (28) developed the concept of the GLZ which placed all the commonly used models, binomial, logit, probit and normal in a unified framework. Generalized linear modelling uses a link function which can be considered equivalent to the transformations applied in traditional analyses and provides the link between the linear part of the model and the random part of the model.
The GLZ can be further generalized. The generalized linear mixed model (GLMM), also called the generalized linear mixed-effects model, is an extension of the GLZ with random effects. Verde et al. (21) used it in their modelling approach to the analysis of comet data. Generalized estimating equations (GEE) are another extension of GLZ involving algorithmic adjustments used to model longitudinal or clustered data and to estimate regression coefficients. GEE use a working correlation matrix as an approximation of the true within subject/unit correlation for each unit (29).
Wiklund and Agurell (27) provide concise recommendations regarding the design and statistical analysis of comet assay studies. They used simulations to identify the optimum number of cultures or animals, slides per culture or animal and cells per slide based upon data derived from studies performed in house. They investigated the performance of a number of standard statistical methods on a range of scenarios of in vitro and in vivo study results. The non-parametric tests investigated (the Kruskal–Wallis and Jonckheere–Terpstra trend test) were generally less efficient than the corresponding parametric tests. The use of parametric linear trend tests was recommended as they generally performed better than the corresponding overall tests for treatment differences especially when the dose–response pattern was monotonic. They noted that the 90th percentile is not affected by extreme outliers but focuses on the upper part of distribution. They recommended, however, using the mean of the log-transformed tail moment data and the 90th percentile of the log-transformed tail length as the end point in the analysis. They cautioned strongly against the use of the untransformed mean tail moment. They recommended designs with 50 cells from three slides per culture and either four or five animals per group or two or three cultures per treatment group in in vitro studies. They suggested analysis based upon a GLM/ANOVA approach which included factors for treatment groups, experimental conditions such as electrophoresis runs and cultures or animals. Similar simulation approaches could be applied to the use of % tail DNA as the end point.
Verde et al. modelled tail moment data using GLMM (21). They recommend choosing, from a set of candidate distributions, the one that gives the best fit to the data. The approach is based upon the use of survival models and the distributions derived from the family of accelerated life models to develop a two-level hierarchical model of within and between individual tail moment measures. Treatment effects were assessed by constructing a probability-over-damage graph, which visualized the degree of damage produced by the treatment by plotting the probability of the damage to a cell being greater than a certain value.
A number of statistical software packages such as SPSS, Genstat and SAS as well as R (a public domain open source statistical analysis software language) have procedures for carrying out analyses using some of these models. For instance, the SAS procedures PROC GLM and MIXED can be used for the analysis of comet data. PROC GLM is the SAS procedure for analysing data using a GLM approach. PROC GLM is able to handle repeated measures by including various postulated correlation structures in the analysis. SAS PROC MIXED was developed for the analysis of designs where there is a mix of both random and fixed effects. It is based upon approaches developed by Wolfinger (30) and is more powerful but more complex to use than PROC GLM. The general linear mixed model in PROC MIXED directly models the covariance structure so dealing with problems that might arise from inefficient analyses and incorrect conclusions being drawn from ignoring the problems associated with the correlations between repeated measures.
| Control groups |
|---|
Two comparisons using control groups are relevant: comparisons between the concurrent positive and negative control groups and comparisons of the concurrent controls with historical control information. An important ethical issue relates to the purpose of the positive controls in in vivo studies and how big a group size is needed. As discussed earlier, formal statistical tests may fail to show statistical significance if the variability of response of the positive control group is large and sample sizes are small.
It is often an expectation, such as meeting a requirement for entry to a validation study, that a laboratory can show evidence of a successful record of carrying out the assay by providing historical control data. The compilation of such data sets can be developed as part of a formal quality control (QC) process using the range of statistical methods available (31). Assessment against these criteria could be useful both for the laboratory and the regulator. The development of Bayesian approaches to make the optimum use of historical control data is one potential development.
Hauschke et al. (32) argue for an approach which relates the classification of a result as positive or negative to the size of the response in the positive control group. This involves determining the maximum safe dose by incorporating a biologically meaningful threshold value (f), which is a fraction of the difference between the positive control and vehicle control responses.
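As a rough illustration of the threshold idea only, the sketch below sets a cut-off at the vehicle control mean plus a fraction f of the positive-minus-vehicle difference and reports the highest dose whose mean response stays below it. The function names, the group means and f = 0.25 are hypothetical, and the sketch compares point estimates; a formal analysis would also have to account for sampling variability, which is not attempted here.

```python
def relevance_threshold(vehicle_mean, positive_mean, f):
    """Cut-off = vehicle mean + f * (positive control mean - vehicle mean)."""
    if not 0 < f < 1:
        raise ValueError("f must lie strictly between 0 and 1")
    return vehicle_mean + f * (positive_mean - vehicle_mean)


def max_dose_below_threshold(dose_means, threshold):
    """Highest dose such that it and all lower doses have means below the threshold."""
    safe = None
    for dose in sorted(dose_means):
        if dose_means[dose] >= threshold:
            break
        safe = dose
    return safe


# Hypothetical group means of % tail DNA (illustrative values only).
vehicle, positive = 5.0, 45.0
cutoff = relevance_threshold(vehicle, positive, f=0.25)
dose_means = {10: 6.0, 30: 9.5, 100: 21.0}
print(cutoff, max_dose_below_threshold(dose_means, cutoff))
```

The choice of f is a scientific judgement about what fraction of the positive control response is biologically meaningful; the code only makes the arithmetic explicit.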
| Dose–response modelling |
|---|
The number of doses to include in a comet assay remains an important consideration with the need to ensure non-linear dose–response curves (or downturns) can be detected and positive responses at multiple dose levels reinforcing the biological relevance of results (4
In the context of dose–response modelling, it is important to note that the identification of a dose as a no-observable effect level (NOEL) using a statistical test does not mean that a threshold exists or that effects do not occur below this level. The NOEL classifies a result into an effect/no-effect dichotomy. This may be wrongly interpreted as implying that the response is either non-linear or thresholded. The NOEL detectable in an experiment is a function of the statistical test applied to the data. The larger the experiment carried out, the smaller the difference between a negative control and a treated group that it is capable of detecting as statistically significant. In contrast, a small, poorly designed study with appreciable variability is liable to fail to detect effects and thus provide estimates of NOEL above the level where effects should be detected. This is a well-known limitation of the NOEL methodology (33).
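The dependence of the NOEL on experiment size can be made concrete with a small sketch. It uses a normal approximation for a two-sided comparison of two group means (an assumption made here for simplicity, not a method taken from the text), and the function name and the group sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power
    (two-sided, two-group comparison, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


# Differences are expressed in SD units by setting sd=1.0.
for n in (5, 20, 80):
    print(n, round(detectable_difference(n, sd=1.0), 2))
```

Under this approximation, quadrupling the number of units per group halves the detectable difference, so a dose classed as a NOEL in a small study may show a statistically significant effect in a larger one.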
| Design of experiment approaches |
|---|
The comet assay continues to develop and now exists in a number of forms. Further extension of the methodology makes the assay a good candidate for systematic development using a design of experiment (DOE) methodology. This approach finds multiple factors (and interactions between them) that affect results appreciably and identifies the levels of these factors which optimize results while minimizing the number of experiments that need to be run and the material used (34
| Power and sample sizes |
|---|
A key concept in the design of a study is a determination of the number of experimental units needed. A range of software packages, web-based resources, books and formulae are available for estimating sample sizes for a given power and vice versa. Power is defined as the probability of detecting an effect of a specified size if it is present and is related to the Type II or beta error associated with hypothesis testing. Most formulations represent very simple situations: comparison of two groups for differences in means or proportions. More complex hypotheses such as tests for specific dose–response relationships are more difficult as the power depends very much on the specific hypothesis being tested. Statistical packages such as nQuery Advisor have options for sample sizing for more complex designs and hypotheses. An alternative approach to the more complex problem is simulation and modelling of the design (36
In the case of quantitative end points, four things are needed to determine sample sizes: the significance level the hypothesis will be tested at, the chosen power (conventionally 80 or 90%), the size of effect considered biologically important and some measure of the variability of the experimental units (e.g. the between-unit SD). Note that the sample size is for the number of experimental units. If the power for a specific sample size is required, then the sample size is entered instead. Ignoring strata in the design can lead to serious misinterpretation.
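The four inputs listed above map directly onto the usual approximate formula for comparing two group means. The sketch below uses the normal approximation, which slightly underestimates the sample sizes given by t-test-based calculations when groups are small; the function name and the example inputs are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate number of experimental units per group needed to detect a
    difference `delta` between two means with between-unit SD `sd`
    (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # significance level
    z_beta = z.inv_cdf(power)           # chosen power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


print(n_per_group(delta=1.0, sd=1.0))              # 80% power
print(n_per_group(delta=1.0, sd=1.0, power=0.90))  # 90% power
```

The result is a number of experimental units (animals or cultures), not a number of cells.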
For qualitative end points, besides the alpha and beta levels, the control and treated proportions are needed to obtain sample sizes. The sample sizes needed are likely to be appreciably larger with qualitative compared with quantitative end points because of the lower information content of qualitative data.
The background level is important in determining the size of effect that can be detected by a design. This level affects how easy it is to detect absolute as opposed to relative changes. For instance, with a low background level, a small absolute difference may equate to a large fold change, while with a high background level a large absolute difference will equate to a smaller fold change. This becomes important, for instance, if the negative control group were to have little or no variability for a measure like % tail DNA.
The challenge in power and sample size calculation for the comet assay is to identify what size of effect can be considered biologically important and to have appropriate measures of the inter-experimental unit SD. One source of such measures is from previous studies or data from the literature. For example (37), an estimate of the inter-individual SD of % tail DNA of comets from buccal cells from seven healthy, young female non-smokers was 6.1% [taken from day 0; Figure 5 of reference (37)]. As another example, Frenzilli et al. (38) reported mean (SD) comet lengths in leukocytes of 16.5 (4.6) for 39 children from Pisa, 26.3 (9.6) for 16 healthy and 16.0 (4.3) in 27 tumour-affected children from Belarus.
It is important to remember that any power/sample size calculation is only an approximation and depends upon the assumptions, particularly of the inter-experimental unit SD. The simple calculations are also based upon the assumption that a t-test will be appropriate for the analysis. If the treated animals are appreciably more variable, then the sample sizes can be underestimates depending upon the nature of the response. Alternatively, in some cases where there is an appreciable difference in variability between the groups, a transformation may be appropriate and may result in the power being retained. Power calculations are possible with log-transformed data.
An alternative approach is based upon the work of Cohen (39). He developed the concept of expressing the size of differences as effect sizes in SD units, calling 0.2 small, 0.5 medium and 0.8 large effects. A simple rule of thumb is that, for a two-sided test of two group means at 80% power, every halving of the effect size in SD units increases the sample size in each group by a factor of 4 (Table I).
Although the approach is useful and widely applied in circumstances where information on biologically important differences and variability is difficult to obtain, it is not without its critics (40
Figure 3 shows the implications of reducing sample sizes below n = 5. Based upon standard methods, a two-sided test with n = 5 has 80 and 90% power to detect differences of 2.02 and 2.35 SD units, respectively, in a two-sample t-test. The comparable values for sample sizes of n = 4 are differences of 2.38 and 2.77 SD units, and the graph shows the appreciable information gain from an extra experimental unit when sample sizes are small.
Hierarchical designs of the comet assay share features in common with hierarchical or cluster randomized controlled trials. These are common in studies involving interventions in units such as general practices or schools where individuals within the same practice or school show similarities (41
A failure to take clustering into account results in the standard errors of the estimates being underestimated, potentially leading to false conclusions. Even small ICC values can have big effects on the precision of an estimate. An ICC of just 0.05 with cluster sizes (m) of 20 per group can lead to a 30% underestimation of the true precision of an estimate, which leads to an increased risk of Type I error (29). The variance of the estimate increases by a factor of (1 + (m – 1) × ICC) (called the design effect), where m is the number per unit. Table II illustrates the effective sample sizes associated with different ICCs.
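The design effect and the resulting effective sample size are simple to compute. The following sketch uses the standard formula 1 + (m − 1) × ICC; the function names and the total of 100 observations are illustrative assumptions.

```python
from math import sqrt


def design_effect(m, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc


def effective_sample_size(n_total, m, icc):
    """Number of independent observations that n_total clustered ones are worth."""
    return n_total / design_effect(m, icc)


deff = design_effect(m=20, icc=0.05)
# Proportion by which the standard error is understated if clustering is ignored.
se_understatement = 1 - 1 / sqrt(deff)
print(round(deff, 2), round(se_understatement, 2))
print(round(effective_sample_size(n_total=100, m=20, icc=0.05), 1))
```

With m = 20 and an ICC of 0.05 the design effect is about 1.95, so a naive standard error is roughly 28% too small, in line with the figure of about 30% quoted above.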
| Criteria for a positive result |
|---|
Guidelines explicitly refer to the identification for regulatory purposes of a positive or negative result to categorize the result as genotoxic or non-genotoxic and note that an equivocal result may require further testing. The criteria needed for a definitive result are usually not explicitly defined. In some cases, there is reference to the need for a statistically significant result. However, statistical significance is not a measure of the size of effect alone but depends upon a number of other factors, especially the size of the experiment and the variability of the material. Different laboratories may also use their own statistical methods to define a positive.
The criteria for a positive result should, therefore, be defined. Given the limitations of basing this solely on statistical significance, the criteria should be related to the size of the difference, for instance, in the mean % tail DNA between a negative control and treated group. This approach has the advantage that the study can be explicitly designed to have sufficient power to detect differences large enough to be considered biologically important.
It should always be remembered that dichotomization of results into genotoxic or non-genotoxic leads to a loss of information with the consequence that some weak mutagens will be called negative and disagreements will occur when different criteria are used by different laboratories.
| Replication and repeat experiments |
|---|
Repeating experiments with adequate replication within them is usually considered good experimental practice. The terminology is not always standardized but a repeat experiment can be considered a separate experiment while replication occurs within an experiment. It is not always clear just how independent repeat experiments are or whether they should be based upon the same or a different design. A further consideration is whether the conditions for a repeat experiment should be decided on before or after the first study. Replicate samples should, in general, be prepared. Replicates can be considered as either biological and/or technical replicates. Biological replicates are samples taken from different independent experimental units such as subjects, animals or cultures. Technical replicates are repeat samples taken from the same animal or culture. They may be replicates from the same unit or from replicate samples from the same experimental unit. The need for biological replication should take precedence over the need for technical replication.
Pooled samples may sometimes be used. However, samples should not be pooled if information on individual experimental units is important and information is required on within-group variability in in vivo studies. Pooling samples may make an experiment technically easier because there are an adequate number of cells for analysis but this advantage is offset by the loss of any measure of inter-sample variability, the loss of statistical power and the potential for outliers with idiosyncratic responses masking the effects seen in the other units.
The OECD guidelines (45) note that equivocal results should be clarified by further testing, preferably using a modification of experimental conditions. A follow-up study may fail to confirm initial results because of the regression to the mean effect. This is because a study that generated a follow-up experiment may be at the upper end of possible results and the results of subsequent studies are likely to approach the true but smaller effect.
| Validation studies |
|---|
Validation is the process by which the reliability and accuracy of a procedure are established for a specific purpose (46
In an inter-comparison of the comet assay, there are three different levels of potential variability: between laboratories, between experiments carried out in the same laboratory and within an experiment. (The latter can be broken down further into variability between animals/cultures within the same dose level and between cells within the same animal/culture.) The criteria for acceptable levels of variability are a scientific rather than a purely statistical issue. These criteria should be defined before the study begins.
The International Standards Organization and the American Society for Testing and Materials have developed guidelines for the investigation of repeatability and reproducibility of inter-laboratory comparisons (47,48). Repeatability is defined as the closeness of agreement between results obtained under identical conditions in the same laboratory (equivalent to a best case) (measured by r and the within-laboratory consistency statistic k) and reproducibility as the variability between laboratories using the same methods (equivalent to a realistic case) (measured by R and the between-laboratory consistency statistic h). Guidelines for conducting the analysis are provided (including issues such as inclusion or exclusion of potential outlier laboratories).
Qualitative agreement between laboratories should be expected for potent positive control chemicals. Determining acceptable levels of variability in a quantitative measure is different from whether a particular individual experiment is significant or not. The criteria need to be defined for assessing how much variability in results using reference chemicals is acceptable.
An alternative definition of accuracy is the proportion of correct outcomes of a test method (46). In validation studies, dichotomization (genotoxic/non-genotoxic) allows the calculation of diagnostic statistics such as sensitivity and specificity. Large sample sizes (of chemicals) are needed for precise estimates, i.e. narrow CIs. These CIs are usually derived by standard methods based upon the binomial distribution. The choice of the cut-off point for dichotomization can be investigated using receiver operating characteristic (ROC) curves, but any choice will be a trade-off because some misclassifications will occur so that sensitivity and specificity estimates will be less than one. The prevalence of the classes and criteria for how good the agreement or concordance needs to be to claim that the method is validated need to be predetermined. Scientific judgement is required on how big a sample of chemicals is needed for adequate precision (i.e. width of CIs) of the diagnostic statistics.
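To show how the number of chemicals drives the precision of these diagnostic statistics, the sketch below computes sensitivity and specificity with a Wilson score interval. The Wilson interval is one common binomial-based method chosen here for illustration, not necessarily the one used in any particular validation study, and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical validation set: 20 genotoxic and 20 non-genotoxic chemicals.
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=16, fp=4)
print(sens, wilson_interval(18, 20))
print(spec, wilson_interval(16, 20))
```

Even with 18 of 20 genotoxic chemicals correctly classified, the 95% interval for sensitivity spans roughly 0.70 to 0.97, which illustrates why large sets of chemicals are needed for narrow CIs.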
| Observational and biomonitoring studies |
|---|
The comet assay is used in biomonitoring and molecular epidemiological studies. Observational studies differ from experimental studies, in that there is, in effect, no choice of who is allocated to the control or exposed group. Individuals are observed unlike experimental studies where animals or cultures are randomly assigned to the treatments. In randomized trials, the design explicitly tries to ensure that the observed effects are not a consequence of some differences in the baseline characteristics of the groups. However, in non-randomized human biomonitoring studies such as case–control and cohort studies, the groups being compared are likely to differ with respect to a large number of potentially uncontrolled confounding factors such as sex, age, weight, diet, cigarette smoking, alcohol consumption, lifestyle and genetic polymorphisms. These factors may be unequally distributed between the control or reference group and the exposed groups. Bias in the selection of the groups is also a major risk for such studies.
Observational studies produce methodological problems but can, if carefully done, generate important results not easily otherwise obtained. Considerable care is needed, however, with such studies to ensure that the problems of confounding and bias do not influence results. Standard statistical methods such as ANOVA make assumptions about randomization which are unlikely to hold for observational studies.
If confounding cannot be avoided, identifying the causal relationships involving the factors becomes more difficult. There are several more or less complex statistical methods for dealing with confounding including techniques such as matching, stratification and regression. Modelling approaches such as multiple regression, logistic regression and Cox's proportional hazards modelling are often used. It is important that the report of the study shows if and how adjustments for confounding have been done. However, Müllner et al. (49) reported that there is often inadequate reporting in papers of the statistical methods used to adjust for confounding factors. Recommendations for reporting such approaches are given by Campbell (29).
Collins (3) pointed to the need in human epidemiological studies to carry out power calculations to establish the group sizes needed and noted that pilot studies might be needed to estimate intra- and inter-individual variability in the end point under investigation.
Correlations between variables are often reported although it is widely appreciated that an association identified by a significant correlation does not imply causation. In some cases, correlations are reported when a regression analysis would be more appropriate as there are clearly dependent and independent variables. Large sample sizes can result in small but statistically significant correlations, and if many end points are measured, large numbers of correlations can be calculated with a serious risk of Type I errors: with 10 end points there are 45 possible correlations. Similar multiple comparison problems can arise when subgroup analyses are carried out, particularly post hoc analyses. Consequently, considerable care should be exercised in the interpretation of significant correlation coefficients.
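The scale of the multiple comparison problem is easy to quantify. The sketch below counts the pairwise correlations among k end points and gives the chance of at least one false positive, under the simplifying assumption (made here, not in the text) that the tests are independent. The Bonferroni adjustment shown is one common correction, included only as an illustration.

```python
from math import comb


def n_pairwise_correlations(k):
    """Number of distinct pairs among k end points: k * (k - 1) / 2."""
    return comb(k, 2)


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise level at alpha."""
    return alpha / n_tests


pairs = n_pairwise_correlations(10)
print(pairs, round(familywise_error(pairs), 2), round(bonferroni_alpha(pairs), 4))
```

With 45 tests each at the 5% level, the chance of at least one spurious "significant" correlation is about 90% under independence.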
A large number of factors can affect the quality of biological samples before their analysis. Guidelines on sample collection and processing of samples should be followed to prevent these factors introducing systematic biases into data derived from the analyses of the samples (50).
| Recommendations |
|---|
There is nothing especially unusual about the statistical issues associated with the comet assay. Lovell et al. (9
Identifying the experimental unit is crucial. The experimental unit is the unit to which treatments are randomized. In an in vivo study, this is the animal while in in vitro studies it is the culture. Statistical analyses which treat the cell rather than the animal or culture as the experimental unit can produce incorrect results with a risk of overestimating the statistical significance of a result.
A clearly defined end point for the comet should be used. The % tail DNA is a suitable end point for analysis and has the advantage of a defined scale from 0 to 100% which is comparable across studies. Statistical analysis of other end points, e.g. tail moment and tail length, is possible although they are less directly comparable across studies. Data sets where interpretation of the statistical analyses would differ appreciably between different end points should be investigated to identify the cause of the divergence. Transformations such as the use of logarithms of the end points may be appropriate and, in the case of the % tail DNA, a logistic transformation may be appropriate.
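A logistic (logit) transformation of % tail DNA can be sketched as follows. Because the logit is undefined at exactly 0 and 100%, the sketch clamps values to a small margin first; the margin of 0.5 percentage points and the function name are arbitrary assumptions for illustration, not recommendations from the text.

```python
from math import log


def logit_percent(percent, eps=0.5):
    """Logit of a percentage, clamped to [eps, 100 - eps] to avoid log(0)."""
    p = min(max(percent, eps), 100 - eps) / 100
    return log(p / (1 - p))


for value in (0.0, 5.0, 50.0, 95.0, 100.0):
    print(value, round(logit_percent(value), 3))
```

The transformation stretches the scale near 0 and 100%, which may help when the variability of the end point depends on its mean.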
Graphical presentation of results is useful but should not supersede formal statistical methods. Histograms should be presented of the individual data with bins representing the number of cells falling in particular ranges. Care should be taken in comparisons of the shifts across histograms because the measures within the same sample will show autocorrelations so that apparent visual evidence for treatment effects must be critically examined.
Summary statistics for the distribution of cells within a sample can be used for statistical analysis. These could be the mean, the log mean and various percentiles such as the median, 75th and 90th percentile. The more cells measured per unit the more accurate the estimate of these statistics. However, sample sizes of 50 cells per slide are probably satisfactory as the central limit theorem begins to apply when the number of cells is >30. If appreciable variability exists between duplicate slides, then increasing the number of slides would be sensible. A summary statistic like the median value for the slide may be a suitable metric for the statistical analysis.
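One way to reduce cell-level data to a single value per experimental unit is to take the median per slide and then average the slide medians, as in the sketch below. The data values, the dictionary layout and the function name are hypothetical, and only three cells per slide are shown for brevity.

```python
from statistics import mean, median


def unit_summary(slides):
    """Median % tail DNA per slide, then the mean of the slide medians."""
    return mean(median(cells) for cells in slides)


# Hypothetical % tail DNA values: two animals, two slides each.
animals = {
    "animal_1": [[2.0, 4.0, 6.0], [3.0, 5.0, 10.0]],
    "animal_2": [[8.0, 9.0, 30.0], [7.0, 11.0, 12.0]],
}
per_animal = {name: unit_summary(slides) for name, slides in animals.items()}
print(per_animal)
```

The statistical analysis would then use one value per animal or culture, which keeps the experimental unit, rather than the cell, as the unit of analysis.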
Statistical analysis could be carried out on the multiple end points collected in the comet assay to see if there are alternative combinations of the measures that could be used in the analysis of a study. Multivariate analysis (MVA) methods could be applied to the data collected on individual comet shape to see whether this could provide extra information for use in the interpretation of results. Similarly, MVA could be used to investigate cells from a sample to see if there is an optimal representative end point for the sample. Estimates of the ICC coefficients could be calculated to help in designing studies with the optimum number of observations at each level in the design.
Statistical tests to identify suitable distributions of the data have limited use because a significant lack of fit may be more a consequence of the sample size than the degree of departure from a distribution.
There should be increased emphasis on the estimate of the size of an effect and its CI rather than solely concentrating on the statistical significance level (P-value) determined in a specific experiment using a particular statistical test. The criteria for a positive effect should be based upon size of effect produced that would be considered biologically important. This should form the basis for power and sample size calculations for study designs. A retrospective analysis of data could be carried out to explore potential improvements to statistical analyses and to provide estimates for sample size/power calculations.
A suitable background incidence of % tail DNA should be identified for the negative control group in a study. The implications for the power of the study design of using different expected levels of % tail DNA in negative control samples should be explored.
Historical control data may help in the interpretation of results but should not preclude the need for concurrent control data to be collected in the study. The use of QC statistics should be considered for the monitoring and assessment of historical control data.
A range of parametric tests such as t-tests and ANOVA and their non-parametric equivalents are appropriate for analysing simple experiments. Tests of dose-related effects such as linear trends can be used and will have higher statistical power. Care should be taken using tests of proportions such as chi-square and Fisher exact tests because these tests assume independence of the data and can seriously overestimate significance levels if the cell is wrongly considered the experimental unit.
Statistical analyses based upon GLMs/ANOVA methodology provide a general approach to the analysis. The continuing development of more sophisticated methods making use of the hierarchical structure such as random effect modelling (e.g. GEE) may also be suitable approaches. Methods for the effective reporting and interpretation of these more sophisticated analyses will need to be developed. DOE approaches should be considered in the context of developing new or modified protocols for the comet assay.
Biomonitoring studies are observational rather than experimental studies. They are thus vulnerable to the problems of bias and confounding. Especial care is needed in the analysis and interpretation of such studies to avoid drawing incorrect conclusions.
In validation studies, acceptable levels of intra- and inter-laboratory variability should be defined to help assess the reliability of an assay. An adequate number of chemicals are needed to get precise estimates of the accuracy of the method.
Considerable care is needed in the design of in vitro studies to ensure that there is adequate replication of cultures. A failure to take into account hidden variability can result in the overestimation of effects such as the identification of artifactual non-dose-related effects as a consequence of differences being detected between subcultures rather than treatments.
| Acknowledgments |
|---|
Conflict of interest statement: None declared.
| Notes |
|---|
* To whom correspondence should be addressed. Tel: +44 1483 688609; Fax: +44 1483 688501; Email: d.lovell{at}surrey.ac.uk
| References |
|---|
1. Östling O, Johanson KJ. Microelectrophoretic study of radiation-induced DNA damage in individual mammalian cells. Biochem. Biophys. Res. Commun. (1984) 123:291–298.[CrossRef][Web of Science][Medline]
2. Brendler-Schwaab S, Hartmann A, Pfuhler S, Speit G. The in vivo comet assay: use and status in genotoxicity testing. Mutagenesis (2005) 20:245–254.
3. Collins AR. The comet assay for DNA damage and repair: principles, applications, and limitations. Mol. Biotechnol. (2004) 26:249–261.[CrossRef][Web of Science][Medline]
4. Burlinson B, Tice RR, Speit G, et al. Fourth International Workgroup on Genotoxicity Testing: result of the in vivo comet assay workgroup. Mutat. Res. (2006) 627:31–35.[Web of Science][Medline]
5. Singh NP, McCoy MT, Tice RR, Schneider EL. A simple technique for quantitation of low levels of DNA damage in individual cells. Exp. Cell Res. (1988) 75:184–191.
6. Tice RR, Agurell E, Anderson D, et al. Single cell gel/comet assay: guidelines for in vitro and in vivo genetic toxicology testing. Environ. Mol. Mutagen. (2000) 35:206–221.[CrossRef][Web of Science][Medline]
7. Merk O, Speit G. Detection of crosslinks with the comet assay in relationship to genotoxicity and cytotoxicity. Environ. Mol. Mutagen. (1999) 33:167–172.[CrossRef][Web of Science][Medline]
8. Langie SAS, Knaapen AM, Brauers KJJ, van Berlo D, van Schooten F-J, Godschalk RWL. Development and validation of a modified comet assay to phenotypically assess nucleotide excision repair. Mutagenesis (2006) 21:153–158.
9. Lovell DP, Thomas G, Dubow R. Issues related to the experimental design and subsequent statistical analysis of in vivo and in vitro comet studies. Teratog. Carcinog. Mutagen. (1999) 19:109–119.[CrossRef][Web of Science][Medline]
10. Helma C, Uhl M. A public domain image-analysis program for the single-cell-gel-electrophoresis (comet) assay. Mutat. Res. (2000) 466:9–15.[Web of Science][Medline]
11. Hartmann A, Agurell E, Beevers C, et al. Recommendations for conducting the in vivo alkaline Comet assay. 4th International Comet Assay Workshop. Mutagenesis (2003) 18:45–51.
12. Kumaravel TS, Jha AN. Reliable comet assay measurements for detecting DNA damage induced by ionising radiation and chemicals. Mutat. Res. (2006) 605:7–16.[Web of Science][Medline]
13. Kent CRH, Eady JJ, Ross GM, Steel GG. The comet moment as a measure of DNA damage in the Comet assay. Int. J. Radiat. (1995) 67:660–665.
14. Hellman BH, Vaghef H, Bostorm B. The concepts of tail moment and tail inertia in the single cell gel electrophoresis assay. Mutat. Res. (1995) 336:123–131.[Web of Science][Medline]
15. Bowden RD, Buckwalter MR, McBride JF, Johnson DA, Murray BK, O'Neill KL. Tail profile: a more accurate system for analyzing DNA damage using the Comet assay. Mutat. Res. (2003) 537:1–9.[Web of Science][Medline]
16. Altman DG, Royston P. The cost of dichotomising continuous variables. Br. Med. J. (2006) 332:1080.
17. Lee RF, Steiner S. Use of the single cell gel electrophoresis/comet assay for detecting DNA damage in aquatic (marine and freshwater) animals. Mutat. Res. (2003) 544:43–64.[CrossRef][Web of Science][Medline]
18. Duez P, Dehon G, Kumps A, Dubois J. Statistics of the comet assay: a key to discriminate between genotoxic effects. Mutagenesis (2003) 18:159–166.
19. Box GEP, Cox DR. An analysis of transformations. J. R. Stat. Soc. B (1964) 26:211–246.
20. Collins A, Dusinská M, Franklin M, et al. Comet assay in human biomonitoring studies: reliability, validation, and applications. Environ. Mol. Mutagen. (1997) 30:139–146.[CrossRef][Web of Science][Medline]
21. Verde PE, Geracitano LA, Amado LL, Rosa CE, Bianchini A, Monserrat JM. Application of public-domain statistical analysis software for evaluation and comparison of comet assay data. Mutat. Res. (2006) 604:71–82.[Web of Science][Medline]
22. Ejchart A, Sadlej-Sosnowska N. Statistical evaluation and comparison of comet assay results. Mutat. Res. (2003) 534:85–92.[Web of Science][Medline]
23. Debon G, Bogaerts P, Duez P, Catoire L, Dubois J. Curve fitting of combined comet intensity profiles: a new global concept to quantify DNA damage by the comet assay. Chemomet. Intell. Lab. Syst. (2004) 73:235–243.[CrossRef]
24. van Belle G. Statistical Rules of Thumb (2002) Hoboken, NJ: Wiley Series in Probability and Statistics.
25. Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. (1955) 50:1096–1121.[CrossRef][Web of Science]
26. Finney DJ. Thoughts suggested by a recent paper: questions on non-parametric analysis of quantitative data. J. Toxicol. Sci. (1995) 20:165–170.[Medline]
27. Wiklund SJ, Agurell E. Aspects of design and statistical analysis in the Comet assay. Mutagenesis (2003) 18:167–175.
28. Nelder JA, Wedderburn RWM. Generalized linear models. J. R. Stat. Soc. A (1972) 135:370–384.[CrossRef]
29. Campbell MJ. Statistics at Square Two: Understanding Modern Statistical Applications in Medicine (2001) London: BMJ Publishing Group.
30. Wolfinger RD, Chang M. Comparing the SAS GLM and MIXED Procedures for Repeated Measures (1998) Cary, NC: SAS Institute Inc.
31. Ryan TP. Statistical Methods for Quality Improvement (2000) 2nd edn. New York, NY: John Wiley and Sons.
32. Hauschke D, Slacik-Erben R, Hensen S, Kaufmann R. Biostatistical assessment of mutagenicity studies by including the positive control. Biom. J. (2005) 47:82–87.[CrossRef][Web of Science][Medline]
33. Gaylor D, Ryan L, Krewski D, Zhu Y. Procedures for calculating benchmark doses for health risk assessment. Regul. Toxcol. Pharmacol. (1998) 28:150–164.[CrossRef]
34. Box GEP, Hunter WG, Hunter JS. Statistics for Experimenters. An Introduction to Design, Data Analysis, and Model Building (1978) New York, NY: John Wiley and Sons.
35. Montgomery DC. Design and Analysis of Experiments (1997) New York, NY: John Wiley and Sons Inc.
36. Eng J. Sample size estimation: a glimpse beyond simple formulas. Radiology (2004) 230:606–612.
37. Szeto YT, Benzie IFF, Collins AR, Choi SW, Cheng CY, Yow CMN, Tsec MMY. A buccal cell model comet assay: development and evaluation for human biomonitoring and nutritional studies. Mutat. Res. (2005) 578:371–381.[Web of Science][Medline]
38. Frenzilli G, Bosco E, Antonelli A, Panasiuk G, Barale R. DNA damage evaluated by alkaline single cell gel electrophoresis (SCGE) in children of Chernobyl, 10 years after the disaster. Mutat. Res. (2001) 491:139–149.
39. Cohen J. Statistical Power Analysis for the Behavioral Sciences (1988) 2nd edn. New York, NY: Academic Press.
40. Lenth RV. Statistical power calculations. J. Anim. Sci. (2007) 85:E24–E29.
41. Bland JM, Kerry SM. Statistics notes. Trials randomised in clusters. Br. Med. J. (1997) 315:600.
42. Kerry SM, Bland JM. Statistics notes. Analysis of a trial randomised in clusters. Br. Med. J. (1998) 316:54.
43. Kerry SM, Bland JM. Statistics notes. Sample size in cluster randomisation. Br. Med. J. (1998) 316:549.
44. Kerry SM, Bland JM. Statistics notes. The intra-cluster correlation coefficient in cluster randomisation. Br. Med. J. (1998) 316:1455.
45. OECD. OECD Guidelines for the Testing of Chemicals: Genotoxicity, Revised and New Guidelines, Adopted 1997 (1997) Paris: Organization for Economic Cooperation and Development.
46. ICCVAM. ICCVAM Guidelines for the Nomination and Submission of New, Revised, and Alternative Test Methods (NIH 03-4508) (2003) Appendix C: glossary. http://iccvam.niehs.nih.gov/SuppDocs/SubGuidelines/SD_subg034508.pdf.
47. ISO. Precision of Test Methods—Determination of Repeatability and Reproducibility for a Standard Test by Inter-Laboratory Tests. ISO 5725 (1986) International Standards Organization. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=11832.
48. ASTM. ASTM E691-99 Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method (1999) American Society for Testing and Materials.
49. Müllner M, Matthews H, Altman DG. Reporting on statistical methods to adjust for confounding: a cross-sectional survey. Ann. Intern. Med. (2002) 136:122–126.
50. Holland NT, Smith MT, Eskenazi B, Bastaki M. Biological sample collection and processing for molecular epidemiological studies. Mutat. Res. (2003) 543:217–234.
Received on October 19, 2007; revised on February 6, 2008; accepted on February 18, 2008.