Mutagenesis, Vol. 14, No. 3, 271-281, May 1999
© 1999 UK Environmental Mutagen Society/Oxford University Press
Review
Issues for conducting the microtiter version of the mouse lymphoma thymidine kinase (tk) assay and a critical review of data generated in a collaborative trial using the microtiter method
Environmental Carcinogenesis Division, National Health and Environmental Effects Research Laboratory, US Environmental Protection Agency, Research Triangle Park, NC 27711, USA and 1 Medical Research Council, Cell Mutation Unit, University of Sussex, Falmer, Brighton BN1 9RR, UK
Introduction
In response to a need to evaluate and validate the mouse lymphoma thymidine kinase (tk) assay in their laboratories, a large group of Japanese researchers worked together to establish the microtiter version of the assay and to compare the data from the mouse lymphoma assay with cytogenetic data. The results of this impressive and logistically difficult collaboration were submitted to Mutagenesis for publication by Honma et al. (1999). The peer reviewers of the paper, while recognizing the significant accomplishment required to generate the data and the need to publish the data, did not feel that the data should be published without comment. They suggested that the manuscript be published concurrently with a detailed review of the data.
Subsequently, we were asked by Dr James Parry to conduct such a detailed review. In so doing, we are taking the opportunity to: (i) identify issues that are particularly important in the conduct of the microtiter version of the assay; (ii) inform reviewers of mouse lymphoma data about what does, and does not, represent an adequate experiment; (iii) provide a general framework for evaluating a series of mouse lymphoma experiments. It is our hope that this information will be helpful to individuals who are initiating use of the assay, who are long time practitioners of the assay and also those who review and make decisions based on data generated by the assay. Although our comments are focused toward the microtiter version of the assay, much of what we say applies equally to the soft agar cloning method.
We cannot overemphasize the fact that it is not our intention to belittle the considerable feat that was achieved by the Japanese in organizing and successfully completing such a large trial involving so many laboratories, many of them inexperienced in gene mutation assays, in such a short time. We feel that the resulting data can be used as an exercise in data interpretation that will be helpful to both practitioners and regulators. We hope that it will encourage informed, wide-ranging and constructive discussion in the Mouse Lymphoma Workgroup at the International Workshop for Genotoxicity Test Procedures (IWGTP) that will be held in Washington, DC, in March 1999. We also sincerely hope that our comments will not be interpreted as dismissive of a laudable effort on the part of the trial participants. In fact, we feel that the many people involved in this trial are to be congratulated for their achievement.
Background
The mouse lymphoma assay has several characteristics that make it highly suitable as a mammalian mutation screening tool. (i) First, and most important, the assay detects a broad array of mutational events known to be important in human disease. An overview of our current knowledge is provided in a companion paper (Cole et al., 1999).
Furthermore, the various cytotoxicity parameters obtained in the mouse lymphoma assay provide critical measures by which assay quality and adequacy can and must be assessed. One must ensure that: (i) the untreated cultures are grown and plated in an acceptable manner; (ii) the positive control cultures show the expected degree of cytotoxicity, plate with expected efficiency and have a mutant frequency within the expected range; (iii) the treated cultures show the expected pattern (generally a decline with increasing dose) of cell growth and plating efficiency; (iv) the cultures do not show unreasonable amounts of variability. All of this information provides the ability to assess the adequacy of experiments and requires that laboratories using the assay pay very careful attention to all aspects of cell culture and cell plating. It also requires that laboratories new to the assay spend adequate time becoming proficient with all of the technical procedures prior to the generation of actual test data.
Quality of the published data using the soft agar method
Unfortunately, much of the published mouse lymphoma literature (using the soft agar method) does not demonstrate proper use of the above-mentioned assay capabilities and/or proper evaluation of the various cell culture parameters that ensure optimal assay conduct. In fact, the recently published US EPA Gene-Tox Mouse Lymphoma Report (Mitchell et al., 1997) for the soft agar version of the assay highlights the fact that a high proportion of the data (published between 1976 and 1993) could not be used to determine whether the chemical was positive or negative. Moreover, although not emphasized in the Gene-Tox report, significant portions of the data that were used to make a positive or negative call were certainly not representative of high quality mouse lymphoma data (M.M.Moore, observations). As identified by the Gene-Tox Committee, two major difficulties with much of those data were: (i) an obvious inefficiency in small colony recovery; (ii) test data that did not cover the complete cytotoxicity range. It should be noted that much of the data evaluated by the Gene-Tox Committee was completed prior to the publication of various consensus agreements (Clive et al., 1995) and recommendations concerning the requirement for adequate small colony detection (Dearfield et al., 1991). Some of this published data clearly demonstrated inadequate small colony detection (Moore et al., 1999).
While recognizing and strongly emphasizing the importance of adequate small colony recovery, the Gene-Tox Committee wanted to provide definitive calls for as many chemicals as possible. Therefore, they did not require that the positive controls demonstrate proper small colony recovery if the test chemical met the other criteria of a positive response. Although this approach should not provide an incorrect positive call, it often underestimates the actual mutagenic potential of the chemical, i.e. the reported induced mutant frequency (IMF) is much lower than it should be. In addition, the Gene-Tox Committee did not require that a response be replicated either within an experiment or by an additional experiment. Thus, many calls were made based on single data points. Such an approach provides little or no assurance that the chemical was properly evaluated.
However, as stated clearly in the Gene-Tox report, all of the future published mouse lymphoma data should meet all of the now established acceptability criteria. Indeed, it is essential to utilize only high quality data for decision making. In our review of this data set we are providing: (i) a discussion of issues important to the microtiter assay; (ii) a strategy to approach the analysis of a data set; (iii) detailed guidance for data evaluation; (iv) a careful chemical-by-chemical assessment of this data set, the details of which may be found in the Mutagenesis database which is currently available, upon request, from the Mutagenesis Editorial Office.
Important issues for microtiter assay conduct and general observations concerning this data set
Scoring of large and small colonies using microtiter plates
While agar plates are scored and sized by an automated colony counter, at present microtiter plates are scored by eye, making the analysis much more subjective. In addition, the successful classification of microtiter plate colonies into size categories depends in large part on experience. Originally, when developing the microtiter version of the assay, Cole et al. (1990) classified colonies by size and a large number of colonies were isolated and checked for growth rate to ensure that colonies classified as 'small' conformed to the definition of slow growing used by Moore et al. (1985). Clearly, there was not enough time during the Japanese trial for each laboratory, new to the assay, to conduct these analyses. Therefore, it is not surprising that the percent small colony (%SC) analysis in this data set is often different than one might expect. Furthermore, a good deal of variation in the %SC for the control cultures (both negative and positive) is seen in this data set. In our view, it is essential that all laboratories who use the microtiter protocol perform the same analysis, as was done by Cole et al. (1990), to ensure that they are using the proper criteria for colony sizing.
Relative small and large colony mutant induction is sometimes determined using colony number found in the microtiter wells. However, this is not the optimal method (Clive et al., 1995). In order to make an accurate comparison with agar-generated data, it is necessary to calculate microtiter mutant frequencies (MFs) for large and small colonies separately. This, in turn, requires separately scoring wells with no large colonies and no small colonies. When the cells are plated at the standard cell density for mutant selection of 2000 cells/well, at high induced MFs, such as generally found for the positive control, many wells are positive and each well is likely to contain several mutant colonies, which may merge and make enumeration difficult. In addition, large mutant colonies may obscure small colonies, etc. Under these circumstances, only the overall MF can be determined (by counting the negative wells/plate). Using the microtiter protocol, the only way to assess large and small colony induction at high MFs is to plate at a lower cell density to ensure that each positive well will only contain a single colony (Clive et al., 1995).
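The size-specific calculation described above can be sketched as follows. This is a minimal illustration, assuming the usual zero-term Poisson estimate (the fraction of wells lacking a given colony type equals exp(-cells per well × efficiency)); the function names and the well counts in the usage lines are hypothetical and are not taken from the trial data.

```python
import math

def poisson_efficiency(empty_wells, total_wells, cells_per_well):
    """Zero-term Poisson estimate of cloning efficiency per plated cell."""
    if not 0 < empty_wells <= total_wells:
        raise ValueError("need 0 < empty_wells <= total_wells")
    return -math.log(empty_wells / total_wells) / cells_per_well

def size_specific_mf(wells_without_large, wells_without_small, total_wells,
                     cells_per_well, viability_pe):
    """Return (large colony MF, small colony MF).

    Wells lacking large colonies and wells lacking small colonies are
    scored separately, as the text requires. viability_pe is the day 2
    plating efficiency expressed as a fraction.
    """
    large = poisson_efficiency(wells_without_large, total_wells, cells_per_well)
    small = poisson_efficiency(wells_without_small, total_wells, cells_per_well)
    return large / viability_pe, small / viability_pe

# Hypothetical scores: 192 wells plated at 2000 cells/well, PE2 of 80%
large_mf, small_mf = size_specific_mf(150, 100, 192, 2000, 0.80)
print(f"large: {large_mf * 1e6:.0f} x 10^-6, small: {small_mf * 1e6:.0f} x 10^-6")
```

When nearly every well is positive, the count of empty wells approaches zero and the estimate becomes unstable, which is one way to see why a lower plating density is needed at high MFs.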
For this trial methyl methanesulfonate (MMS) and cyclophosphamide (CP) were used as the positive controls. Based on our experience, MMS (at 10 µg/ml) should induce a ratio of ~1:3 large to small colonies. CP (3 µg/ml) should give an ~1:5 ratio. It is clear from the data provided in Honma et al. (1999) that, in nearly every case, the percentage of small colonies is very much lower than expected. The reason for this may well be a combination of inexperience in colony sizing and the use of too high a cell plating density, making accurate small colony mutant estimation impossible.
Cloning efficiencies
Absolute cloning efficiencies for viability are very variable in this data set and often well over 100%. This is an indication of inexperience. It should be noted that, in the microtiter method, the cells are plated at very low density in the microtiter wells (only 1-2 cells/well) and inaccuracies can make a substantial difference in the apparent cloning efficiency. Therefore, it is essential to pay particular attention to the following points: (i) ensure a single cell suspension by vigorous pipetting or, if necessary, spinning and resuspending the cells, before estimating the cell density; (ii) count the cell suspension accurately and dilute to the plating density with great care; (iii) do not incubate after counting (an hour or two in the incubator will mean more cell division and clumping of the cells); (iv) resuspend the cells carefully before plating to ensure they are distributed evenly over the plate or the Poisson formula will not apply.
Plating efficiency (PE) is more than just a measure of cytotoxicity. The PE2 (plating efficiency day 2 cloning) is the denominator in the calculation of the MF. Errors in assessing the PE2 can lead to substantial miscalculation of the MF. The variability seen in the PEs in these experiments makes it difficult to have confidence that the small MF increases are accurate reflections of chemically induced mutations. As discussed further below, and in detail in the compound-by-compound commentary, this factor alone keeps us from concurring with the decision to call some small increases in MF positive. It also causes problems with evaluating a chemical as negative when small increases in MF are observed. We note such experiments as inconclusive in the chemical-by-chemical evaluation.
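To illustrate why errors in PE2 matter, the short sketch below holds the mutant cloning efficiency fixed and varies only the PE2 used as the denominator. The function name and the numbers are illustrative assumptions, not values from the data set.

```python
def mutant_frequency(mutant_efficiency, pe2):
    """MF = cloning efficiency in selective medium / day 2 plating efficiency."""
    if pe2 <= 0:
        raise ValueError("pe2 must be positive")
    return mutant_efficiency / pe2

# Same (hypothetical) mutant cloning efficiency, three different PE2 estimates
mutant_efficiency = 100e-6
for pe2 in (0.70, 1.00, 1.30):
    mf = mutant_frequency(mutant_efficiency, pe2)
    print(f"PE2 = {pe2:.0%}: MF = {mf * 1e6:.0f} x 10^-6")
```

Across the 70 to 130% span the calculated MF changes by almost a factor of two with no change in the number of mutants recovered, which is comparable in size to the small increases discussed in the text.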
Estimating the toxic effect of the chemical
It is generally agreed that when using the mouse lymphoma assay, chemicals should be tested to a concentration resulting in 10-20% survival. Two methods have been used to estimate this. (i) Relative survival (RS) is obtained by counting and cloning immediately after treatment and expressing the PE relative to the negative control. This method is generally used for the microtiter version of the assay. (ii) Relative total growth (RTG) is calculated as the relative cumulative suspension growth through day 2 multiplied by the day 2 relative PE (Clive and Spector, 1975), and is used for the soft agar method. The merits of these two methods have been debated on many occasions. In practice, both have been acceptable for defining the test cultures that are used for evaluating the positive or negative response. However, these differences in cytotoxicity assessment have very real implications for defining the required/acceptable level of cytotoxicity in dose selection and ultimately in determining whether a response is positive or negative.
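The two cytotoxicity measures defined above can be written out as a minimal sketch. The function and parameter names are hypothetical; suspension growth is taken here to be the cumulative fold increase in cell number through day 2, which is an assumption about how a laboratory records it.

```python
def relative_survival(pe0_treated, pe0_control):
    """%RS: day 0 plating efficiency relative to the negative control."""
    return 100.0 * pe0_treated / pe0_control

def relative_total_growth(sg_treated, sg_control, pe2_treated, pe2_control):
    """%RTG: relative suspension growth multiplied by relative day 2 PE."""
    relative_suspension_growth = sg_treated / sg_control
    relative_pe2 = pe2_treated / pe2_control
    return 100.0 * relative_suspension_growth * relative_pe2

# Hypothetical culture: plates at half the control PE on day 0,
# then grows 4-fold (control 16-fold) and plates at 60% vs 80% on day 2
print(relative_survival(0.40, 0.80))
print(relative_total_growth(4.0, 16.0, 0.60, 0.80))
```

The example shows how a culture can look moderately toxic by %RS yet fall below 20% by %RTG when growth in suspension is also impaired.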
There is an additional difference in procedure that can further exacerbate the differences between the two methods. The majority of current microtiter method users, following treatment and chemical removal, count the cultures and then adjust the cell density to a standard value such as 0.2 × 10^6 cells/ml and then conduct the day 0 plating. However, some laboratories, particularly those who have historically used the soft agar method, simply resuspend the cells in a defined volume of medium and, therefore, do not adjust the cell density.
In some circumstances (for instance when a substantial proportion of cells are lost by lysis during the treatment period) these two different approaches to the day 0 plating would be expected to give very different %RS and %RTG values. The %RTG reflects: (i) the loss of cells during treatment; (ii) the growth in suspension following treatment; (iii) the plating efficiency at the time of cloning. However, %RS, when determined after the cell density is adjusted, is simply an estimate of the relative plating efficiency of those cells recovered after treatment. A mathematical adjustment in the calculations to take account of cells lost during treatment has been suggested, but to our knowledge is not in wide use. This adjustment can be made as follows. The cell counts performed post-treatment are compared with the solvent control. This value is referred to as the cell count factor (CCF). Once the cloning efficiency plates are scored, the calculated PE can be multiplied by the relevant CCF value.
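The cell count factor adjustment described above can be sketched as below. The names are hypothetical, and applying the adjusted PE to obtain an adjusted %RS is our reading of the intent rather than a published formula.

```python
def cell_count_factor(count_treated, count_control):
    """CCF: post-treatment cell count relative to the solvent control."""
    return count_treated / count_control

def adjusted_relative_survival(pe0_treated, pe0_control,
                               count_treated, count_control):
    """%RS after multiplying the treated culture's PE0 by its CCF."""
    ccf = cell_count_factor(count_treated, count_control)
    return 100.0 * (pe0_treated * ccf) / pe0_control

# Hypothetical culture: half the cells lost during treatment, but the
# survivors plate as well as the control once density is re-adjusted
print(adjusted_relative_survival(0.80, 0.80, 0.3e6, 0.6e6))
```

Without the adjustment this culture would report an RS of 100%; with it, the loss of cells during treatment is carried into the survival estimate.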
While, as already indicated, it would be generally expected that the %RS and %RTG for a particular culture would not be exactly the same, it is notable in this data set that there are sometimes dramatic differences between cytotoxicity estimated by RS and RTG. In one instance, for one culture, the first laboratory got a PE0 of 71%, an RS of 53%, but an RTG of only 4% (following a PE2 of 60%). The RS value is well above the 10% toxicity cut-off and therefore would be an acceptable data point. The RTG, which is below 10%, would have caused the data point to be rejected as being too toxic! For the same dose of the chemical, the other laboratory found very good agreement between the two toxicity estimates, namely PE0 = 17%, RS = 17% and RTG = 16% (with PE2 = 75%). It is likely that the first laboratory was either not diluting and plating the cells carefully enough or the cells were not growing properly post-treatment (e.g. compound was not washed out following treatment) or both.
Because all laboratories produced data using MMS and CP, the %RS and %RTG for a large number of test cultures can be compared and evaluated (Figure 1). It is clear, at least in this data set, that the two measures give very different assessments of the cytotoxicity. Both for the MMS- and CP-treated cultures, the %RS gives a generally higher value for the relative survival. However, perhaps more significant, there is no real relationship between the two measures, i.e. cultures with high %RS do not necessarily have a high %RTG, etc. These differences probably reflect excessive variability, most likely due to the inexperience of the laboratories conducting the experiments. It is also important to note that many of the MMS-treated cultures actually plated better on day 0 than the negative control, i.e. they have %RS values that exceed 100%; this should not be the case.
Dose selection
It is not possible to achieve the ideal concentration range in every experiment, particularly when only a small (4 or 5) number of concentrations are used. Often, in this data set, there was overemphasis on the low (non-toxic) concentrations and insufficient close spacing of doses over the 30-10% cytotoxicity range. When the critical dose range is missed, it is often impossible to make a definitive call, particularly for weak mutagens and non-mutagens. In these situations, the chemical must be retested. We want to emphasize that this problem is not unique to this data set. In our experience evaluating mouse lymphoma assay data, one of the most common problems is the insufficient number of test cultures evaluated.
We would note that experienced laboratories generally start an experiment with 12 or more different concentrations and then clone those cultures that (based on their suspension growth at day 2) appear to cover the dose range, while emphasizing the 30-10% range. Such a strategy is almost essential when evaluating chemicals that are either negative or weakly mutagenic. Whereas it is relatively easy to demonstrate the activity of chemicals inducing high MFs, the successful evaluation of non-mutagens or weak mutagens is dependent upon obtaining more than one treated culture in the 30-10% survival range.
Statistical analyses of the data
Honma et al. (1999) report that statistical analysis was used on all the experimental results. In the first part of the Japanese trial, the protocol and statistical analysis were based on UK Environmental Mutagen Society (UKEMS) Guidelines which require the use of duplicate cultures. Only the mean of these duplicate cultures is reported in the Honma et al. (1999) paper. In the second half of this trial, single cultures were used (making the UKEMS statistics inappropriate for use) and the data were evaluated by a newly developed procedure (M.Hayashi, in preparation) that adjusts the type I error in the UKEMS procedure. According to Honma et al. (1999), this 'procedure consists of four steps, i.e. identification of clear negative data by comparison of the MF in the treatment group and concurrent control, elimination of data showing a downturn phenomenon using the Simpson-Margolin procedure, dose-response evaluation by the weighted least square method and multiple comparisons with the concurrent control by a modified Dunnett's procedure'.
In order to evaluate the suitability of the UKEMS guidelines and statistical approach to this, or any, data set it is important to have some understanding as to their basis. The UKEMS produced two published guidelines for mammalian cell assays, one for fluctuation (microtiter) assays (Robinson et al., 1989) and the other for colony-forming assays (Arlett et al., 1989). Both were based on extensive raw databases provided by experienced laboratories. Given the variability seen in the spontaneous MFs, recommendations were made for the numbers of cells to be treated and subcultured (taking cell kill into account), the numbers of plates necessary to determine the mutant frequency accurately and the number of cultures necessary to evaluate the variation in any given experiment. The recommendations were first evaluated during a collaborative trial (Arlett and Cole, 1990; Green et al., 1990). As already mentioned, the 1989 Guidelines strongly recommended the use of replicate cultures, which was confirmed by the UK trial. In addition, it was also noted in the UKEMS trial that variation between experiments was significantly greater than between replicates within an experiment in about half the experiments. This, it was suggested (Green et al., 1990), might create problems of interpretation. These guidelines and those for the other routine assays are, at present, being re-evaluated in the light of the last 8 years of experience.
We would comment that statistical analysis (by any method) is not the final word on the compounds in this data set because inevitably, probably due to inexperience, these experiments were often not optimally performed. Statistics have a place in assessing mutagenicity data, but it is important that experiments meet all acceptability criteria prior to statistical analysis.
Criteria for defining individual experiment acceptability
There are general criteria that can be used to determine if an individual experiment is acceptable. These include: (i) plating efficiency; (ii) mutant frequency for the negative and positive controls; (iii) internal consistency between the RS and the RTG; (iv) RS and/or RTG for the positive control. Our preliminary analysis of the data set identified problems in all of these areas.
There are, in fact, published guidelines concerning experimental acceptability for the mouse lymphoma microtiter method. As noted by Honma et al. (1999), a meeting of mouse lymphoma experts in Portland, Oregon, in May 1994 yielded a consensus agreement concerning a number of assay parameters (Clive et al., 1995). In particular, it was recommended that for the microtiter version of the assay, absolute PE for solvent controls be between 60 and 140% for PE0 (plating immediately after treatment) and between 70 and 130% for PE2 (plating day 2). We note that this is a very wide range, and probably too wide. The group recommended a range for the soft agar version of the assay that was between 70 and 120% (day 2 plating). We feel that 70-120% (for both PE0 and PE2) would be more appropriate as the acceptable range for the microtiter method as well. For our assessment of this data set, however, we have used the Portland microtiter consensus values to establish assay acceptability for plating efficiency. With regard to the minimum acceptable value for the MF of the negative control, the Portland group recommended 60 × 10^-6 for the microtiter method. Although the group did not define an upper cut-off for the spontaneous mutant frequency (SMF), we are concerned about values >250 × 10^-6 for example and have used that as our allowable upper limit for the spontaneous control.
As the first step in our evaluation, we identified individual experiments that have PEs and SMFs that do not meet the recommended values of the Portland consensus. It should be noted that there are several experiments in which the negative control met the criteria, but one or more of the treated cultures had either a PE0 or a PE2 that was above either 140 or 130%, respectively. While the Portland consensus did not identify this as an issue, it is clear that these values are just as unacceptable for treated cultures as they are for the negative control and we have identified them as unacceptable. This analysis found 67 of 185 (36%) experiments that do not meet the Portland consensus and therefore should be considered to be unacceptable. A table detailing this evaluation can be found in the Mutagenesis database.
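The first screening step can be expressed as a simple check. This is a sketch only: the limits are those quoted above (the Portland ranges for PE0 and PE2, the Portland minimum SMF and our upper SMF limit), while the function names and data layout are hypothetical.

```python
PE0_RANGE = (60.0, 140.0)   # % absolute PE, plating immediately after treatment
PE2_RANGE = (70.0, 130.0)   # % absolute PE, day 2 plating
SMF_RANGE = (60.0, 250.0)   # spontaneous MF, x 10^-6 (upper limit is ours)

def control_problems(pe0, pe2, smf):
    """List the reasons a negative (solvent) control fails the criteria."""
    problems = []
    if not PE0_RANGE[0] <= pe0 <= PE0_RANGE[1]:
        problems.append("PE0 outside 60-140%")
    if not PE2_RANGE[0] <= pe2 <= PE2_RANGE[1]:
        problems.append("PE2 outside 70-130%")
    if not SMF_RANGE[0] <= smf <= SMF_RANGE[1]:
        problems.append("SMF outside 60-250 x 10^-6")
    return problems

def treated_problems(pe0, pe2):
    """Treated cultures are flagged only when PE exceeds the upper limits."""
    problems = []
    if pe0 > PE0_RANGE[1]:
        problems.append("PE0 above 140%")
    if pe2 > PE2_RANGE[1]:
        problems.append("PE2 above 130%")
    return problems

# Hypothetical experiment: acceptable control, one over-plating treated culture
print(control_problems(pe0=95.0, pe2=88.0, smf=120.0))
print(treated_problems(pe0=152.0, pe2=101.0))
```

An experiment is treated as unacceptable if either list is non-empty for the control or for any treated culture.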
Another basic criterion that should be used for assay acceptability is the evaluation of the positive control. Positive controls are used to demonstrate that the experimental conditions were acceptable and that it is possible to obtain MFs, following treatment, that are within a predicted range. For positive controls, it is somewhat difficult to establish an acceptable range that is applicable across laboratories. For this reason, the Portland Expert Group did not define an acceptable range for positive controls.
The positive controls in the present data set show a very large degree of variability. Tables I and II show, for all experiments in this trial using 10 µg/ml MMS or 3 µg/ml CP, the values for four different parameters that were obtained for the positive controls: (i) %RS; (ii) %RTG; (iii) calculated MF; (iv) IMF (IMF = MF - SMF). We have also included the corresponding SMF for each experiment. Although we cannot define an absolute range for acceptability based on an established consensus, we can, based on our collective experience, make some observations and establish some conservative cut-offs for acceptable values. A review of the values in Table I clearly reveals an extreme amount of variability in the MMS data. The CP data (Table II) appears to be somewhat more consistent, although still variable. The reason(s) for this difference is not clear. However, based on the generally low MF and apparent lack of cytotoxicity seen with MMS, it is reasonable to speculate that there may have been some problem with the MMS samples used in the testing.
Some variability would be expected in the MF for MMS (or any other chemical) between experiments and over long time periods. MMS can present particular problems because it does have a relatively short half-life in the presence of water; it is hygroscopic and hydrolyzes. Thus, if not handled and stored properly, different samples could have very different mutagenic potencies. It should be noted that not all laboratories had low MF and low cytotoxicities; in fact, some of the values are within the ranges that we might expect to see.
One can evaluate whether the actual mutagenic potency varied among the MMS samples used in these experiments by graphing the IMF values against the cytotoxicity. This plot would be expected to show that experiments with low cytotoxicity have relatively low MF and those with high cytotoxicity have relatively high MF. This, in fact, is generally the case for the CP-treated cultures when the MF is plotted against the %RTG (Figure 2). However, it is not so clear that the CP MF is related to the %RS (Figure 3). For MMS, the MF does not seem to be associated with either the %RTG (Figure 4) or %RS (Figure 5). We find this to be very puzzling. We feel that this lack of association between the level of cytotoxicity and MF is reflective of a very wide degree of unacceptable variability around the three different measurements. The wide range of PE0s and PE2s observed in the negative control certainly supports this hypothesis. This variability is probably explained by the fact that cell culture handling and conditions were less than ideal. It can also be clearly seen in Figures 1 and 5 that a very high proportion of the MMS-treated cultures had a %RS that was very close to or >100%. This means that these MMS-treated cultures had plating efficiencies at day 0 that were as good as, or better than, the negative control! Taking all of these observations into account, although it does appear that there may have been some problem with the MMS used, the data obtained with MMS is clearly reflective of excess variability.
With regard to the CP positive control cultures, while a plot of MF against %RTG does show a general trend toward high cytotoxicity and high MF, there is still a great deal of variation in the MF for cultures that attained the same level of cytotoxicity. This variation, particularly at %RTGs that are <~25%, is more than we would expect to see.
Having evaluated the overall qualities of the positive control data, we find ourselves in somewhat of a dilemma. Based on our experience, we would ideally like to accept only those experiments in which the 10 µg/ml MMS culture gives an RS <50%, an RTG <50% and an MF of at least 750 × 10^-6. These are values that we would expect to be on the extreme end of acceptability. Although we have less collective experience with CP, we would expect similar (or perhaps lower) limits for the %RS and %RTG and the MF to be at least 1500 × 10^-6. However, if we used these criteria, almost every experiment (particularly in the S9 data set) would be eliminated.
The attainment of certain minimum frequencies for positive controls is required to demonstrate the adequate quantification of the small colony mutants. Mutant frequencies that are too low are often reflective of inadequate small colony recovery (Moore et al., 1999). Positive controls are therefore very important in evaluating whether a chemical is or is not mutagenic. It is inappropriate to determine that a compound is negative when the positive control IMF is unacceptably low.
After substantial deliberation, we settled on the following criteria to identify experiments, based on the positive controls, that we feel should be considered unacceptable. While these cut-offs are arbitrary, they are very conservative. For both MMS and CP, we identified as unacceptable those experiments in which the %RS or %RTG were >80%. For MMS, we identified as unacceptable those experiments in which the IMF was <500 × 10^-6. For CP, we identified as unacceptable those experiments in which the IMF was <900 × 10^-6.
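These positive control cut-offs can be sketched as a single check. The thresholds are the ones stated above; the function name, argument order and example values are hypothetical, and MF values are expressed in units of 10^-6.

```python
IMF_MINIMUM = {"MMS": 500.0, "CP": 900.0}  # induced MF, x 10^-6
MAX_SURVIVAL = 80.0                        # % limit for both %RS and %RTG

def positive_control_acceptable(chemical, rs, rtg, mf, smf):
    """Apply the conservative positive control cut-offs.

    IMF is the induced mutant frequency, i.e. MF minus the spontaneous MF
    of the concurrent negative control.
    """
    imf = mf - smf
    if rs > MAX_SURVIVAL or rtg > MAX_SURVIVAL:
        return False
    return imf >= IMF_MINIMUM[chemical]

# Hypothetical MMS controls
print(positive_control_acceptable("MMS", rs=45.0, rtg=38.0, mf=820.0, smf=110.0))
print(positive_control_acceptable("MMS", rs=104.0, rtg=70.0, mf=820.0, smf=110.0))
```

The second example fails on cytotoxicity alone, mirroring the MMS cultures in this data set that plated as well as or better than the negative control.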
As with the negative control, there is no generally defined and accepted upper limit for the MF of the positive control. We do note in the chemical-by-chemical analysis (in the Mutagenesis database) a few values that appear to be outliers.
Using the above outlined criteria to determine which experiments are acceptable based on plating efficiency, negative control values and positive control values, we identified 49 of 185 (26%) experiments that meet the acceptability criteria. Tables detailing the specific acceptable and unacceptable experiments are found in the Mutagenesis database. Unfortunately, that does not mean that 26% of the experiments provide definitive information as to whether or not the chemical is mutagenic.
Evaluation criteria
The general criteria that we used to determine whether a chemical was positive or negative are the same as those used by the US EPA Gene-Tox expert panel (Mitchell et al., 1997), who evaluated only data generated using the soft agar protocol. Recognizing that there is no universally agreed-upon statistical approach to evaluating mouse lymphoma data, the Gene-Tox criteria are based on the combined experience of its members. However, it should be emphasized that it is not appropriate to simply transfer these criteria, which were developed based on extensive experience with the soft agar method, to the microtiter data. As detailed below, the Gene-Tox responses required for positive and negative calls are based on an understanding of the normal variation of the background mutant frequency seen in the soft agar method. This understanding comes from a significant number of experiments designed specifically to investigate variability and a significant number of experiments that are then pooled to establish historical databases in the laboratories of the committee members. Such an extensive analysis has not been published for the microtiter method. However, we do know that individual laboratories with several years experience with the assay have accumulated sufficiently large databases to define the variability under their conditions.
We have already emphasized the importance of obtaining test data across a complete dose-response curve. For the soft agar method, cytotoxicity is universally measured by the %RTG originally defined by Clive and Spector (1975). Most laboratories conducting the microtiter method use the %RS to define cytotoxicity. Of course, the microtiter method also allows the calculation of a %RTG (which, as already noted elsewhere, may or may not be, depending on how cell density is handled following treatment, equivalent to the %RTG calculated by users of the soft agar method).
As already mentioned, it is generally agreed that chemicals should be evaluated using concentrations that result in 1020% survival. However, the %RS and %RTG are often somewhat different and, in fact, there are valid reasons why, for specific chemicals, the RS and RTG might be quite different. Because there are two different ways to assess survival, it can be difficult to decide which treated cultures should and should not be included in the evaluation.
In fact, the two measures in this data set are often substantially different. The conclusion that the differences are most probably due to technical rather than biological differences is drawn because laboratory A and laboratory B (and in some cases laboratory C) often differ considerably in this respect. This makes it extremely difficult to make decisions as to which data points are and are not acceptable. For our analysis, we have taken the strategy that definitive calls can only be made on data points that have both measures of cytotoxicity within the acceptable range. This is clearly a conservative approach. However, in the absence of evidence that one measure is superior to the other, and due to the importance of this data set, we feel this is appropriate.
Honma et al. (1999) state that they used this same approach for excluding a particular data point, i.e. they state that doses were not used if either the %RS or the %RTG was <10%. However, it is clear that this policy was not uniformly followed. There are several examples of experiments where data points were used in the evaluation when either the %RS or %RTG was <10%. This is also a significant issue when deciding to call a response negative. Our criteria require that there be doses that give both %RS and %RTG in the 10–20% survival range. It is clear that such stringent criteria were not used and responses were called negative when either the %RS or %RTG (or in some cases both) were substantially >20%.
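The dual-measure rule described above can be sketched in a few lines of Python. This is an illustration only, not code used in the trial or in our analysis: the `DosePoint` record, its field names and the function names are hypothetical, and treating the 10% and 20% limits as inclusive is an assumption, because the text does not specify how boundary values are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DosePoint:
    """One treated culture (hypothetical record layout)."""
    dose: float      # test concentration, in whatever units were reported
    pct_rs: float    # % relative survival
    pct_rtg: float   # % relative total growth
    imf: float       # induced mutant frequency, in units of 10^-6


def usable(point, floor=10.0):
    """A dose is used only if BOTH cytotoxicity measures are at or above the floor."""
    return point.pct_rs >= floor and point.pct_rtg >= floor


def in_target_range(point, low=10.0, high=20.0):
    """True when both %RS and %RTG fall in the 10-20% survival range."""
    return (low <= point.pct_rs <= high) and (low <= point.pct_rtg <= high)


def supports_negative_call(points):
    """A negative call needs at least one dose with both measures in range."""
    return any(in_target_range(p) for p in points)
```

For example, a culture with a %RS of 18 and a %RTG of 8 is excluded by `usable`, even though its %RS alone would have qualified.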
We discuss below the Gene-Tox criteria for soft agar data and how we have modified the criteria for the present analysis.
Positive. The Gene-Tox Committee defined two categories of positive calls, definitive and limited. The Gene-Tox criterion for a definitive positive requires an IMF of 100 × 10⁻⁶ at a %RTG >20%. Although somewhat arbitrary, the requirement for an induced MF of 100 × 10⁻⁶ is based on a combination of the Don Clive laboratory Monte Carlo analysis of the variation of background frequencies (D.Clive, unpublished data) and a general consensus that every statistical method and every experienced soft agar mouse lymphoma laboratory would agree that such a response is positive. While this may strike many as an unscientific analysis, for an assay that can give MF responses >3000 × 10⁻⁶, a response of 100 is a relatively minimal response to define a positive. Likewise, the requirement for a %RTG >20% is arbitrary. Many laboratories use the 10% RTG cut-off as originally suggested by Clive et al. (1979). The Gene-Tox Committee did agree that to call a chemical definitively positive it was reasonable to see the positive response at a %RTG >20%. The Gene-Tox Committee assigned a limited positive when the data did not meet the definitive positive criteria but did show an induced MF >70 × 10⁻⁶ at a %RTG >10%.
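As a rough sketch of how the two soft agar positive categories relate to each other, the thresholds above can be written as a single decision function. The function name and return strings are hypothetical, and reading "an IMF of 100" as "at least 100" is an assumption; the Gene-Tox report itself should be consulted for the exact wording.

```python
def gene_tox_positive_call(imf, pct_rtg):
    """Classify one dose point under the soft agar Gene-Tox positive criteria.

    imf      -- induced mutant frequency in units of 10^-6
    pct_rtg  -- % relative total growth
    Returns a label, or None when neither positive category applies.
    """
    # Definitive: IMF of (at least) 100 seen at a %RTG above 20% (assumed >=).
    if imf >= 100.0 and pct_rtg > 20.0:
        return "definitive positive"
    # Limited: fails the definitive test but IMF > 70 at a %RTG above 10%.
    if imf > 70.0 and pct_rtg > 10.0:
        return "limited positive"
    return None
```

Under this sketch, an IMF of 120 at a %RTG of 15 is only a limited positive, because the response was not seen above 20% RTG.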
As stated above, the major factor used to define the Gene-Tox requirement for an increase of either 70 or 100 was a good understanding of the normal variability of the SMF. In the Japanese trial database the SMF varies widely, the lowest observed value being 23 × 10⁻⁶ and the highest being 628 × 10⁻⁶. This is the range over a large number of laboratories, each of which conducted a small number of experiments. For the first half of the data set, these values represent the means of two cultures and we have not had access to the duplicate culture data. Moreover, because each laboratory only tested a limited number of chemicals once and because we cannot identify which laboratories conducted which experiments, we are not able to comment upon the variability of the SMF within a given laboratory. Therefore, we have used our combined experience to define the acceptable range of values to be between 60 (the Portland Consensus value) and 250 × 10⁻⁶. Working on the hypothesis that the MF for a positively responding culture should be above the highest acceptable MF for the negative control, an increase of 190 × 10⁻⁶ (250 − 60 = 190) would be required. The term 'weak' was applied to positive responses that showed relatively small increases in mutant frequencies (increases of <~300 × 10⁻⁶). Given the uncertainty as to the appropriate variability in the SMF, a limited positive response could not be defined.
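The arithmetic behind the modified threshold can be made explicit with a short Python sketch. The constant and function names are hypothetical, the range limits are assumed to be inclusive, and the 'weak' ceiling of 300 is only approximate in the text, so the boundary behaviour shown here is an assumption rather than a rule we applied mechanically.

```python
# All frequencies are expressed in units of 10^-6.
SMF_MIN = 60.0    # lowest acceptable spontaneous MF (the Portland Consensus value)
SMF_MAX = 250.0   # highest acceptable spontaneous MF
POSITIVE_IMF = SMF_MAX - SMF_MIN   # increase needed to clear the top of the range
WEAK_CEILING = 300.0               # approximate upper bound for a 'weak' response


def smf_acceptable(smf):
    """True when the spontaneous MF lies inside the acceptable range (assumed inclusive)."""
    return SMF_MIN <= smf <= SMF_MAX


def modified_positive_call(imf):
    """Label an induced MF under the modified criteria; None if below threshold."""
    if imf < POSITIVE_IMF:
        return None
    return "weak positive" if imf < WEAK_CEILING else "positive"
```

With these values, both extremes observed in the trial (23 and 628) fall outside the acceptable SMF range, and an induced MF of 220 would be labelled a weak positive.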
Equivocal. The Gene-Tox Committee used this category to define chemicals that fluctuated between being very weakly positive and negative either between experiments or within an experiment. The literal meaning of equivocal is equal voice. This call is a definitive call, because it is not likely that additional testing using the same protocol will provide useful information. It should be noted that the equivocal call requires many data points and two or more very informative experiments. For the current data set, we could not apply this call to any of the chemicals.
Negative and non-toxic negative. The Gene-Tox Committee defined two categories of negative: cytotoxic and non-cytotoxic. To be determined a cytotoxic negative, one or more doses must have fallen between 10 and 20% RTG. The Gene-Tox criteria required that the induced MF of all tested doses (with cytotoxicity >10%) must be <70 × 10⁻⁶. The non-cytotoxic negative classification was applied to those chemicals tested to 5000 µg/ml that did not attain cytotoxicity <20% RTG or a positive response. Using the same reasoning that we applied for determining a compound positive, we required the IMF to be <190 × 10⁻⁶ in order to define a chemical as negative in this data set.
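The two negative categories can be sketched in the same way. This is a minimal illustration under stated assumptions: the function name, the tuple layout and the handling of values exactly at 10% or 20% RTG are hypothetical choices, and the sketch considers %RTG only, whereas our analysis of the microtiter data also required the %RS to be in range. Passing `imf_limit=190.0` substitutes the threshold we used for this data set.

```python
def gene_tox_negative_call(points, imf_limit=70.0, limit_dose=5000.0):
    """Classify one experiment as a cytotoxic or non-cytotoxic negative.

    points -- list of (dose_ug_per_ml, pct_rtg, imf) tuples, IMF in units of 10^-6
    Returns a label, or None when no negative call can be made.
    """
    # Only doses with %RTG above 10% are considered for the call.
    evaluable = [p for p in points if p[1] > 10.0]
    # Every evaluable dose must show an induced MF below the limit.
    if not evaluable or any(imf >= imf_limit for _, _, imf in evaluable):
        return None
    # Cytotoxic negative: at least one evaluable dose at or below 20% RTG.
    if any(rtg <= 20.0 for _, rtg, _ in evaluable):
        return "cytotoxic negative"
    # Non-cytotoxic negative: never reached 20% RTG despite testing to the limit dose.
    all_nontoxic = all(rtg > 20.0 for _, rtg, _ in points)
    top_dose = max(dose for dose, _, _ in points)
    if all_nontoxic and top_dose >= limit_dose:
        return "non-cytotoxic negative"
    return None
```

An experiment that shows no response but reaches neither the 10–20% RTG range nor the 5000 µg/ml limit returns `None`, which corresponds to an inconclusive result rather than a negative.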
Not testable (or inappropriate for testing). The Gene-Tox Committee devised this category and applied it to chemicals that could not be adequately tested by the 'standard' protocol. Chemicals were placed in this category if they: (i) were either insoluble or became insoluble during the testing; (ii) caused drastic shifts in pH; (iii) elevated the osmolality. It is realized that there are different philosophies as to the relevance of test data obtained in the insoluble range. It is generally agreed that suspension cultures are particularly problematic because the test chemical may not be washed away when the cells are pelleted and resuspended in fresh medium. Insoluble chemicals are good candidates for evaluation under longer (up to perhaps 24 h) treatment times. It might be noted that most laboratories that use soft agar cloning also use a 4 rather than a 3 h treatment time. This extra hour might make it possible to use slightly lower (i.e. soluble) doses, thus allowing for the testing of a chemical that would not be testable using a 3 h treatment. In addition, it is often possible to refine (narrow) the concentration range and still obtain useful data for chemicals that become insoluble, alter pH or elevate osmolality. When widely spaced doses are used, it is easy to miss the critical range and find a chemical negative at one concentration and insoluble at the next test concentration. Therefore, one must evaluate a chemical across the appropriate dose range before placing it into this particular category.
Inconclusive. This category is used both by us and the Gene-Tox Committee for those experiments that could not be evaluated as positive, negative or equivocal. As discussed in more detail elsewhere, there were several difficulties that plagued this data set: (i) excessive variability in plating efficiency; (ii) excessive variability between the %RS and the %RTG; (iii) inadequate or insufficient dose points to cover the required dose–response range; (iv) positive control MFs that were too low; (v) lack of demonstrated cytotoxicity by the positive control; (vi) SMFs that were either too low or too high.
| Data evaluation |
|---|
|
|
|---|
Table III
|
As already mentioned, our detailed chemical-by-chemical evaluation can be found in the Mutagenesis database.
Overall, 21 of the 41 chemicals were determined to be inconclusive both with and without S9 activation. Twelve chemicals were inconclusive under one test condition (with or without S9) but could be evaluated under the other test condition. Six (cytosine arabinoside, hydroxyurea, methotrexate, monocrotaline, mitomycin C and N-aminoethyl ethanolamine) were positive both with and without S9. Cinnamyl anthranilate was weakly positive both with and without S9. 1,3-Dimethylxanthine was a non-toxic negative.
While this may not seem to be a particularly good result, we want to emphasize the difficulty in establishing the assay by conducting a few experiments and then attempting to provide useful test evaluation by using a very small number of dose points. In fact, this is an impossible objective, particularly with a set of chemicals that are likely to be, at best, weakly mutagenic. We commend the participants in this trial for their excellent efforts.
In fact, the utility of this collaborative trial far exceeds its ability to determine whether this particular set of chemicals is or is not mutagenic. The data set provides an ideal opportunity for others, particularly first time users and those who do not personally conduct mutation assays (yet utilize mutation assay information), to learn about the evaluation of mouse lymphoma data. Most of the laboratories demonstrate the usual problems that one encounters when first attempting to establish an in vitro mammalian gene mutation assay. Given the sensitivity of mouse lymphoma cells (and other tissue culture cells) to subtle differences in handling and culture conditions, any protocol that closely measures growth and cytotoxicity will reveal great variability when any aspect of the experimentation is less than optimal. It should be noted that the cytogenetic assays, in which the Japanese laboratories are expert, do not include such careful monitoring of cell growth and cloning ability. Therefore, the shift to protocols that require very stringent control of cell handling and culture requires some additional experience.
Although we have identified many problems with the data generated in this trial, the difficulties seen are not substantially different from those observed in our own laboratories while new individuals are learning to perform the assay. For this reason, new trainees are required to conduct a large number of practice assays to gain experience prior to using the assay for evaluating unknown chemicals. In addition, multiple determinations of MF from a single culture and other experimental designs geared toward evaluating variability provide the user with substantial insight into just how much variability is inherent in this biological system. Because we have this experience, we are hesitant to call chemicals positive based on single cultures showing very small increases in MF. As mentioned in the specific chemical analyses, there are several examples in this data set that provide some insight into the variability of the background MF.
In summary, while the Japanese trial required remarkable effort and demonstrated that the laboratories can conduct the assay, the resultant data set is not yet sufficient to be used to form conclusions concerning assay capability and performance. When laboratories become proficient in the technical aspects of the mouse lymphoma assay, the assay is relatively easy to conduct and is sensitive to the broad array of mutational events that are important in human health risk assessment. By using the various cytotoxicity measures as guides, one can readily attain complete dose–response information. Properly conducted and evaluated, the microtiter version of the assay should allow a reliable evaluation of the mutagenic properties of test chemicals.
| Final comments |
|---|
|
|
|---|
One of the stated goals of the Honma et al. (1999) paper was to evaluate whether the mouse lymphoma assay is capable of detecting chemical clastogens. They compared their results obtained in the mouse lymphoma assay with cytogenetic data obtained in other cell lines. There are a number of pitfalls with this approach. In particular, with cytogenetic assays the measure of cytotoxicity is generally less well defined than it is in the mouse lymphoma assay and it is difficult to ensure that a chemical is evaluated across the entire cytotoxicity range. Furthermore, with cytogenetic assays, because one has difficulty assessing where a particular dose falls on the cytotoxicity curve, the analysis is often performed on cultures treated with concentrations causing substantially more cytotoxicity than is normally possible, or generally accepted, with the mouse lymphoma assay. Thus, when comparing the mouse lymphoma and cytogenetic analysis, one would expect some differences in test results, based solely on the fact that the chemicals were tested to different levels of cytotoxicity (to different maximum concentrations). While it is premature to draw conclusions using much of the data generated in this trial, we would expect to see differences between the results of well-conducted mouse lymphoma assays and cytogenetic analysis.
We feel that the most appropriate way to determine whether the mouse lymphoma tk gene assay detects clastogens is to conduct the analysis using mouse lymphoma cells. Treated cultures can be aliquoted and used for both cytogenetic and mutation analysis, allowing a direct comparison of results. A rather extensive database has been generated using this approach and the analysis of this information supports the use of the mouse lymphoma tk gene mutation end point to detect clastogens (Doerr et al., 1989; Moore and Doerr, 1990).
| Notes |
|---|
This manuscript has been reviewed by the National Health and Environmental Effects Research Laboratory, US Environmental Protection Agency and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Agency nor does mention of trade names of commercial products constitute endorsement or recommendation for use.
2 To whom correspondence should be addressed: Tel: +1 919 541 3933; Fax: +1 919 541 0694; Email: moore.martha{at}epamail.epa.gov
| References |
|---|
|
|
|---|
Arlett,C.F. and Cole,J. (1990) The third United Kingdom Environmental Mutagen Society collaborative trial: overview, a summary and assessment. Mutagenesis, 5 (suppl.), 85–88.
Arlett,C.F., Smith,D.M., Clarke,G.M., Green,M.H.L., Cole,J., McGregor,D.B. and Asquith,J.C. (1989) Mammalian cell gene mutation assays based upon colony formation. In Kirkland,D.J. (ed.) Statistical Evaluation of Mutagenicity Test Data. Cambridge University Press, Cambridge, UK, pp. 66–101.
Clive,D. and Spector,J.F. (1975) Laboratory procedure for assessing specific locus mutations at the TK locus in cultured L5178Y mouse lymphoma cells. Mutat. Res., 31, 17–29.
Clive,D., Johnson,K.O., Spector,J.F.S., Batson,A.G. and Brown,M.M. (1979) Validation and characterization of the L5178Y/TK+/− mouse lymphoma mutagen assay system. Mutat. Res., 59, 61–108.
Clive,D., Bolcsfoldi,G., Clements,J., Cole,J., Honma,M., Majeska,J., Moore,M., Muller,L., Myhr,B., Oberly,T., Oudelhkim,M.-C., Rudd,C., Shimada,H., Sofuni,T., Thybaud,V. and Wilcox,P. (1995) Consensus agreement regarding protocol issues discussed during the mouse lymphoma workshop: Portland, Oregon, May 7, 1994. Environ. Mol. Mutagen., 25, 165–168.
Cole,J., Diot,M.C., Richmond,F.N. and Bridges,B.A. (1990) Comparative induction of gene mutations and chromosome damage by 1-methoxy-1,3,5-cycloheptatriene (MCHT), 2. Results using L5178Y mouse lymphoma cells to detect both gene and chromosome damage; validation with ionizing radiation, methyl methane sulphonate, ethyl methane sulphonate and benzo[a]pyrene. Mutat. Res., 230, 81–91.
Cole,J., Harrington-Brock,K. and Moore,M.M. (1999) The mouse lymphoma assay in the wake of ICH4 – where are we now? Mutagenesis, 14, 265–270.
Dearfield,K., Auletta,A., Cimino,M. and Moore,M. (1991) Considerations in the U.S. Environmental Protection Agency's testing approach for mutagenicity. Mutat. Res., 258, 259–283.
Doerr,C.L., Harrington-Brock,K. and Moore,M.M. (1989) Micronucleus, chromosome aberration and small-colony TK mutant analysis to quantitate chromosomal damage in L5178Y mouse lymphoma cells. Mutat. Res., 222, 191–203.
Green,M.H., Cook,S.K., Cole,J. and Arlett,C.F. (1990) A statistical analysis of the third UKEMS collaborative trial. Mutagenesis, 5 (suppl.), 71–84.
Honma,M., Hayashi,M., Shimada,H., Tanaka,N., Wakuri,S., Awogi,T., Yamamoto,K., Kodani,N.-U., Nishi,Y., Nakadate,M. and Sofuni,T. (1999) Evaluation of the mouse lymphoma tk assay (microwell method) as an alternative to the in vitro chromosomal aberration test. Mutagenesis, 14, 5–22.
Mitchell,A.D., Auletta,A.E., Clive,D., Kirby,P.E., Moore,M.M. and Myhr,B.C. (1997) The L5178Y/tk+/− mouse lymphoma specific gene and chromosomal mutation assay: a phase III report of the U.S. Environmental Protection Agency Gene-Tox Program. Mutat. Res., 394, 177–303.
Moore,M.M. and Doerr,C.L. (1990) Comparison of chromosome aberration frequency and small-colony TK-deficient mutant frequency in L5178Y/TK+/−-3.7.2C mouse lymphoma cells. Mutagenesis, 5, 609–614.
Moore,M.M., Clive,D., Hozier,J.C., Howard,B.E., Batson,A.G., Turner,N.T. and Sawyer,J. (1985) Analysis of trifluorothymidine-resistant (TFTr) mutants of L5178Y/TK+/− mouse lymphoma cells. Mutat. Res., 151, 161–174.
Moore,M.M., Collard,D.C. and Harrington-Brock,K. (1999) Failure to adequately use positive control data leads to poor quality mouse lymphoma data assessments. Mutagenesis, 14, 261–263.
Robinson,W.D., Healy,M.J.R., Green,M.H.L., Cole,J., Gatehouse,D. and Garner,R.C. (1989) Statistical evaluation of bacterial/mammalian fluctuation tests. In Kirkland,D.J. (ed.) Statistical Evaluation of Mutagenicity Test Data. Cambridge University Press, Cambridge, UK, pp. 102–140.
Received on August 24, 1998; accepted on January 25, 1999.