In this paper we discuss the problem of predicting the fish toxicity property of chemical compounds, and show how this can be approached using a computational intelligence method. There are two views on assessing toxicities: One says that such properties can be derived from the whole molecular structure, the other that some specific functional substructures, called Structural Alerts (SA), are able to explain the toxicity. In this work, a new Structure-Activity Relationship (SAR) approach is proposed to mine molecular fragments that act like SAs for the biological activity. We apply our data mining method, called SARpy, to a dataset about LC50 for the fathead minnow, and build a multiclass classifier in the categories defined by the legislation. We test the model on an external test set of data about trout toxicity. The new model shows marked prediction skills and, more interestingly, it is based on mined structural alerts. Discovering new knowledge about substructures statistically strongly connected to toxicity opens to other future in-silico methods. The model is freely available in the VEGA huba among other models for aquatic toxicity.ahttps://www.vegahub.eu
Structure activity relationships (SAR), Data mining, Knowledge discovery, Fish toxicity, Structural alerts
In our industrialized society, a huge amount of chemical substances are used and produced every day. This increasing number of chemicals around us raises the problem of characterization, prediction and evaluation of their consequences to the human health and to the environment.
Toxicology provides the knowledge of mechanisms, rules, and data characterized by quality levels and defines the limits of safety of chemical agents. Chemistry offers knowledge on chemical descriptors and physico-chemical properties. Biology studies the mechanisms of the action of the chemicals on the animals and the other organisms used for tests.
Predictive Toxicology is a multi-disciplinary science that requires the close collaboration among toxicologists, chemists, biologists, statisticians and Artificial Intelligence researchers. Statistics and Machine Learning integrates all these items to analyze the existing data and, especially, to extract new knowledge from the data, or at least to generate reliable toxicity predictions for chemical compounds [1-4]. In this way, the main drawbacks in toxicity studies, like high costs of experiments, duration of the tests, using animals in scientific experiments , can be surmounted.
The goal of toxicity prediction is to describe the relationship between chemical properties and biological and toxicological processes. SAR relates features of a chemical structure to a property, effect or biological activity associated with the chemical. There are two kinds of SAR: qualitative and quantitative. A qualitative relationship is a general rule, which provides a class label. Quantitative structure-activity relationships (QSARs) can be developed using continuous (quantitative) data, mostly through a regression process. This area was developed in the last forty years to assess the value of drugs and more recently to assess general toxicity.
Three "postulates" of modeling QSARs are accepted:
P1: The molecular structure is responsible for all the activities shown.
P2: Similar compounds have similar biological and chemo-physical properties.
P3: QSAR is applicable only to similar compounds.
In toxicity prediction various machine learning techniques were proposed, from expert systems to Artificial Neural Networks (ANN), from decision trees to neuro-fuzzy models. There were also local versus global models, and different combinations of classifiers .
Human experts usually estimate toxicity through the detection of particular structural fragments, already known to be responsible for the toxic property under investigation. In the literature such fragments are referred to as structural alerts (SAs) and can be derived by human-experts, from knowledge of the biochemical mechanism of action (such as the activation of an enzyme cascade or the opening of an ion channel, which leads to a biological response); these mechanisms are still poorly understood and largely unknown.
Only a few approaches have been developed to assist experts in directly extracting such knowledge from data. Some are based on inductive logic programming (ILP) ; they cannot be directly applied to standard chemical formats for molecule representation and require extra computation. Other, including NIKE , uses a neuro-fuzzy approach to integrate explicit and implicit knowledge; however they use chemical descriptors derived after complex simulations and not directly the chemical 2D structure.
In the following we introduce the problem of our investigation, i.e. acute fish toxicity, the data available, our method for knowledge extraction, the obtained model and its statistics.
Among the available data sets with experimental results, the acute toxicity database to fathead minnow (Pimephales promels) is considered a "gold standard" for quality of toxicity measurement. The chemicals were selected from the Toxic Substances Control Act (TSCA) inventory of chemicals to represent a cross-section of industrial organic chemicals. Thus, the fathead minnow database is chemically heterogeneous and the substances in it represent a large spectrum of mechanisms of toxic action. In fact it includes compounds acting by different modes and mechanisms such as narcosis (type I, II and III), uncouplers of the oxidative phosphorylation, reactive electrophiles/proelectrophiles, acetylcholinesterase inhibitors, and central nervous system (seizure) agents.
Recent work is expanding the set of chemicals for which more testing results are available. Recently  analyzed the effect of some antibiotics on fish larvae (Tilapia nilotica) toxicity. Another important area of investigation is the effect on fishes of pollutants derived from agriculture. Recent papers [10,11] report about tests on fish of the toxicity of chlorpyrifos, carbaryl, diuron, nemacur, and malathion, showing how their presence in water poses a concern, in particular for chlorpyrifos (for which the experimental LC50 was 0.08 µ mol/L).
The QSAR models so far developed on those data predict either the LC50 value or a class, using the classification adopted at international levels. We shortly present this topic in the next section. Predicting a class is more useful for regulatory purposes, and this is our choice.
The endpoint considered in this paper is fish toxicity, an important test for water quality assessment. All toxicity data were retrieved from Russom, et al. . Tests were conducted at EPA (Environmental Protection Agency) using Lake Superior water at 25 ± 1 ℃. Aqueous toxicant concentrations were measured in all tests with quality assurance criteria requiring 80% agreement between duplicate samples and 90% to 100% spike recovery. Flow-though exposures were conducted using cycling proportional, modified Benoit or electronic diluters. Median lethal concentrations (LC50, in mg/L) were calculated using the Trimmed Spearman-Karber method, with 95% confidence intervals being calculated when possible. LC50 (96 h), i.e. the lethal concentration for 50% of a population within 96 hours. For the purposes of this study the logarithm of their reciprocal values (log (1/LC50)) was used for QSAR modeling.
The data set contains 568 different compounds. Data are quite representative for most industrial chemicals, but they are still a very small percentage of the commercialized chemicals, and an even minor part of all possible chemicals humans can be exposed to. Nevertheless, they represent a unique collection of data. As typical in data from studies in the life sciences, the cost of experimental data is high. Thus, the number of experiments is quite low.
Since these values are widely spread and to take into account the regulations, the results were also transformed into the classification for toxicity to fish as provided by Directive 92/32/EEC of the EU for dangerous substances  (Table 1).
Table 1: EU classification for fish toxicity. View Table 1
According to EPA the data set can be studied also considering the chemical classes included, which are indicated in (Table 2).
Table 2: Chemical classes in the data set. View Table 2
In a previous work on toxicity prediction in the fathead minnow Russom, et al.  encoded human expert knowledge in the definition of modes of action (MOA). They classified eight mechanisms involved in aquatic toxicity. Then specific chemical fragments have been identified, and a heuristic has been developed to find the MOA on the basis of these fragments. Finally, quite simple QSAR models have been used to predict toxicity for specific MOA.
Another interesting QSAR model that uses a low number of chemical 3D descriptors and a linear regression is described in .
The fish dataset contains a diversity of compounds, with a diversity of structures. Because of the lack of homogeneity it is hard for a single technique to model this data and to obtain good results. One attempt to overcome the diversity of the data, dividing the set using the official chemical classification from EPA, was developed in . The results obtained in prediction are quite good for some models, with the coefficient of determination on the test set R2test ranging from 0.98 for ethers to 0.60 for aldehydes. The accuracy prediction of the combined model has R2test of about 0.8.
Since the classification in chemical classes is sometimes ambiguous because more than one functional group could be inside one molecule, an alternative approach grouped the chemicals together with a clustering technique in the input space (the descriptors space) using Euclidian distance. In fact the chemicals with similar biological and physico-chemical properties have similar descriptors and are grouped in the same cluster; moreover, for similar descriptors is more likely to have similar behavior. This work, as reported in , obtained accuracy very similar to the previous one. Using an unsupervised method it built ten clusters and ten models (one for each). All the models were tested on an external test set. The performance of every local model and of the global model on the test set was about 0.77 in terms of R2test.
Other approaches have been reported, using Genetic Algorithms, Support Vector Machines, and integration of them [17-19]. The EC project ANTARESb listed about 250 implemented QSAR models, including many on fish toxicity, with R2 similar to those above reported.bhttp://www.antares-life.eu/index.php?sec=modellist
To evaluate the performance of the models, a test set of 351 compounds with toxicity data on Rainbow trout (Oncorhynchus mykiss) was used. This dataset was compiled from two sources: the beta version of the OECD toolboxc selecting the OECD-HPV inventory, and the pesticide dataset used to develop the Demetrad model for fish. The data were checked to eliminate duplicates, mixture and inorganic compounds. Salt were considered in their neutral form.chttp://www.oecd.org/document/23/0,3746,en_2649_34379_33957015_1_1_1_1,00.html
Instead of computing chemical descriptors and extracting a good subset for modeling, our approach relies on extracting fragments and combining them into rules. Our method has been implemented in SARpy [20,21], a Python script employing the open source OpenBabel 2.2.3 library. SARpy also is public and available at https://sourceforge.net/projects/sarpy/.
The model is built in SARpy starting from the training dataset given in input as a matrix of chemical structures and their experimental activity label, written as a Comma Separated Values (CSV) table, with structures expressed in Simplified Molecular Input Line Entry Specification (SMILES) notation. SMILES are ASCII strings obtained by printing the symbol nodes (chemical elements) encountered in a depth-first tree visit of the chemical graph.
From the training set of molecular structures along with their experimental activity binary labels, SARpy generates every substructure of the chemicals in the set, and mines correlations between the incidence of a particular molecular substructure and the activity of the molecules that contain it. Such task is carried out in three subsequent steps:
1. Fragmentation: A recursive algorithm that considers every combination of bond breakages working directly on the SMILES string and computes every substructure of the molecules in the input set.
2. Evaluation: Each substructure is validated as potential SA on the training set.
3. Rule set extraction: From the huge set of substructures collected, a reduced set of rules is extracted in the form: "IF contains
The aim of this phase is to detect all the chemical substructures present in the training set of chemicals. The substructures are identified by recursively applying a simple fragmentation algorithm, which at each step breaks a bond in the molecule and collects the two fragments that result. Applying the next fragmentation step to the output of the previous, all the possibilities of a second bond breakage are explored; and so on, until no more new fragments could be extracted. The fragmentation of chemical structures is performed directly on their SMILES strings. Thus the problem of processing a two-dimensional molecular graph is reduced to a fast ASCII strings processing. Considering that parsing with context-free grammars has worst-case time complexity cubic, the complexity of SARpy in enumerating the substrings is polynomial. In addition it has been decided to consider rings as single entities during the fragmentation phase: the idea is that the same substructures contained in a ring might be found as open skeleton in other compounds in the training data, otherwise, if always embedded in a ring, then only the whole ring itself has to be taken into account.
A further consideration concerns the length, in terms of number of atoms, of relevant fragments. Several experiments gave evidences that fragments longer than 18 atoms don't affect the final model.
Once all the substructures have been collected, the next phase consists in their individual evaluation as potential SAs using the training set of chemicals. Here the binary case of a "positive" or "negative" experimental activity label, associated to each structure, is considered, so to focus on the search for SAs for positive activity.
First of all, each substructure is matched against every molecular structure in the training data. Such matches can be divided into matches against positive structures, called True Positives (TP), and matches against negative structures, called False Positives (FP). The structural comparison is carried out by the OpenBabel SMARTS matching function after being optimized by the use of fingerprints and by taking advantage of the hierarchical organization of the fragmentation process. In fact, the search for structures potentially containing a given fragment can be restricted to the ones that contain a substructure of the fragment itself. Therefore, the evaluation process is performed backward in the hierarchy of fragments.
Then every fragment is paired with its TPs and FPs matches. From these two values, the likelihood ratio, which is a measure of precision intrinsic to the test (does not depend on prevalence of activity labels in the training set) is computed:
The likelihood ratios are used to rank all the potential SAs; also sensitivity is used as a secondary sorting key. The evaluation is aimed at identifying the substructures that show high precision and good sensitivity; likelihood ratio will be used in the next phase to dynamically extract the best set of SAs.
The final goal is to obtain, from the huge list of potential SAs, a reduced set able to predict the target class with the best precision. Such requirement is implemented as follows:
1. Order the list of potential alerts by likelihood ratio.
2. Select the top ranked one, add it at the rule set and remove it from the list of potential alerts.
3. Update TP and FP values of the remaining potential SAs, by discarding TPs and FPs in common with the alert just selected.
4. Update likelihood ratios of potential SAs.
5. Return to 1.
To avoid rules with irrelevant behavior, a lowest bound on the TP value is considered; furthermore, rules that show a precision worse than the prevalence of the target class in the training set (i.e., likelihood ratio < 1) are removed, since they predict worse than a "random rule". Such rules pruning is dynamically re-executed before the extraction of every rule, using the actual TP values and likelihood ratios.
With this procedure it is possible to select the next SA keeping into account the effect of the SAs already in the rule set, The definition of a termination condition affects the behavior of the rule set, making it more sensitive (late stop) or more specific (early stop). The output is presented to the user as an ordered set of rules in the form "IF contains
The resulting set of rules can be checked on an external test set or many-folds cross-validated.
As already said, the definition of the toxicity in case of fish acute toxicity follows thresholds defined on a conventional basis, as reported in Table 1. It means that the toxic effect is defined according to opportunistic values, and there are different thresholds used by regulators, but not necessarily connected to a unique effect (see the discussion about MOA).
Another point to underline is that the robustness of the model depends on the number of chemicals in each class. Thus, for instance, in case of the class of chemicals with the highest toxicity (meaning toxic at a concentration < 1 mg/L), the number of chemicals is quite low, and thus the model has to work on unbalanced data.
Due to the relatively low number of chemicals with experimental data on the fathead minnow, we decided to use all of them for modeling purposes, and then to use a second set of compounds for testing purposes. However, the second set of compounds is for another fish, trout, and includes a high number of pesticides (more then 50%). These two factors should be carefully considered, because they represent differences compared to the population of the values used to train the model.
We applied SARpy to identify the fragments related to toxicity, according to the different thresholds of Table 1. Thus, SARpy discovers a series of fragments which serves to label chemicals as toxic with the threshold of 100 mg/L, another series of fragments for the threshold at 10 mg/L, and so on. The overall model is composed by three individual models done with SARpy.
The overall model has been then implemented within the VEGA platform (http://www.vega-qsar.eu) , where the user can freely get the predicted class simply introducing the SMILES code of the chemical(s) of interest.
Table 3 shows the numbers of predicted molecules in each class as obtained for the four toxicity classes. Computing the accuracy on the training set, the accuracy is 87.5% the highest threshold of 100 mg/L, is 85% for the threshold of 10 mg/L, and it is 92% for the threshold at 1 mg/L. The average accuracy is so greater than 88%.
Table 3: Chemical classes in the data set. View Table 3
Thus, the models for the three thresholds provide good results.
To check the generalization capability of the model we tested it with an external test set. This test set contains compounds for which results on trout toxicity were available. Table 4 shows the results for this test set. As before stated, there is difference in the fish, which may get different toxicity values. Furthermore, this data set contains in most of the cases pesticides, and thus there is quite a lot of chemicals which are labeled as very toxic, and may contains SAs not covered within the training set. The accuracy for the threshold at 100 mg/L is still quite good (79%), but it is lower for the thresholds at 10 mg/L (63%) and at 1 mg/L (65%). The average accuracy is so 69% on the trout test set. The reasons are quite probably both the diversity in the chemicals used and the fish kind, as expected. In fact applying a QSAR model to another species has to consider the inter-species differences too.
Table 4: Results for the external test set (TROUT). View Table 4
SARpy has been able to extract fragments that can provide useful insights on the understanding of the toxic mode of action. For instance, SARpy has found that fragments related to toxicity have Sn, dithiophosphates, and halogenatated aromatic rings, which are known to be associated to aquatic toxicity. Using the same tool those fragments would be eventually correlated to MOA, an area where there is not sufficient structural knowledge. The full list of the alerts found by SARpy and applied by the model to make predictions according to the three thresholds is in Appendix 1.
This new model can be compared to other available models.
In a recent exercise , researchers tested five available models for fish toxicity, namely ECOSA, KATE, TEST, ADMET, and CADRE-AT, all based on logP, against a new test set, not used in developing any of the models. This new set contains 83 chemicals, distributed in many chemical classes. The output of the models was one of the 4 categories. CADRE-AT (a model based on neural nets, and not freely available) obtained the high total accuracy of 83%, missing only 3 predictions. The other models had a total accuracy lower that 58% and many missing predictions. For the majority of compounds is was impossible to classify them into MOA.
The same EPA Fathead Minnow Acute Toxicity database has been used to create a model as reported in the EU project OpenToxe. The models were developed using descriptors and regression on a training set of 90% (522 compounds) and test set of 10% (58 compounds). The reported statistical parameters are: R2test about 0.7, and Q2train about 0.6 .ehttp://www.opentox.org
The use of (Q)SAR models is expected to increase in various regulations. The European Plant Protection Products Regulation requires that registrants establish whether pesticide metabolites pose a risk to the environment. Fish acute toxicity is one of the required endpoints. EFSA recently published a "Guidance on tiered risk assessment for plant protection products for aquatic organisms in edge-of-field surface waters" in which it outlines the opportunity to apply non-testing methods, such as SAR and QSAR. Considering this issue Burden and coworkers developed a retrospective analysis about the state of the art in predicting fish toxicity. They extracted experimental fish LC50 values for 150 metabolites from the Pesticide Properties Databasef. Then predicted them using the US EPA's ECOSAR software, and compared the predictions to the experimental data. They observed a significant correlation between predicted and experimental fish LC50 values and concluded that for 91% of the substances the predictions were sufficiently predictive. They concluded that the applicability of QSAR models in the metabolite assessment could be recommended .fhttp://sitem.herts.ac.uk/aeru/ppdb/en/atoz.htm
It is worth discussing our results in perspective, considering other available approaches and future developments.
Recent research has individuated toxicokinetic models as tools to help in assessing the environmental risk. For aquatic species, and in particular for fish and for some chemical classes, some PBTK (physiologically based toxicokinetic) models are available . This software can be of great interest in studying the acute toxicity of fish.
Another recent direction in filling the data-gap in toxicity prediction is providing new software to improve read-across. In our experience, after the results on a round-robin exercise on read across  involving about 20 experts, we found that the results were largely user-dependent on assessing fish toxicity (spanning from non toxic to high toxic for the same chemical), even considering that most of the experts used tools as OECD toolbox. In case of other toxicological endpoints, when experts used a property tailored system as ToxRead , the answers were more consistent with each other. ToxRead presents in a structured and hierarchical way the structural alerts present in the target and the similar molecules.
It is our future task to extend ToxRead by inserting new properties, as fish toxicity, and the structural alerts so far collected. We are aware that those alerts have now only a statistical meaning (the full process of transformations not being available for most of them); however we expect that they will help experts in the assessment.
Alongside classical methods as in vivo and in vitro experiments, the use of computational tools is gaining more and more interest in the scientific community and in the industrial world as accompaniment or replacement of existing techniques. For regulatory purposes it is important to obtain satisfactory classification accuracy on new chemical families not well studied. In this area it is important to develop models that can take advantage of statistical analysis on great numbers and can be further refined to improve or confirm the results and give more insights into the domain.
To this end we have developed a new model for fish toxicity using SARpy, a system able to focus on the important structural features hidden in the database. The difference of SARpy with respect to other (Q)SAR approaches is its ability to extract relevant knowledge in the form of structural alerts during the learning stage. The models show their capability to predict the fish toxicity, even though they suffer in some cases for the unbalanced distribution of the data and chemical nature.