We evaluated five descriptor-based machine learning algorithms for compound toxicity prediction: Random Forest (RF), Support Vector Regression (SVR), Deep Neural Network (DNN), Gradient Boosting Tree (GBM), and AdaBoost Tree (ADA), as well as four graph-based neural network models: Directed Message Passing Neural Network (D-MPNN), Attentive FP, Graph Convolutional Network (GCN), and Graph Attention Network (GAT). A three-layer Stacking ensemble model was then constructed using the Super Learner method.

You can download the demo here:

The following model structure is recommended:

id= 0,Random Forest Regressor using ECFP

id= 1,deepchem.models.AttentiveFPModel: Model for Graph Property Prediction

id= 2,Deep neural network using ECFP

id= 3,Support vector regression using ECFP


Structure of the Stacking model's second layer:

id= 0,Support vector regression for the second layer of Stacking_model

id= 1,Random Forest Regressor for the second layer of Stacking_model

id= 2,Support vector regression for the second layer of Stacking_model


The third layer uses NNLS (non-negative least squares).
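To make the third layer concrete, here is a minimal sketch of how non-negative least squares can combine second-layer predictions into one output. It is an illustration under stated assumptions, not this project's implementation: the function name `nnls_weights`, the toy data, and the projected-gradient solver are all hypothetical. In practice a library routine such as `scipy.optimize.nnls` would normally be used instead.

```python
def nnls_weights(preds, y, steps=5000, lr=0.01):
    """Approximate NNLS by projected gradient descent (illustrative only).

    preds: list of rows; each row holds the second-layer models'
           predictions for one sample.
    y:     list of true target values, one per sample.
    Returns one non-negative weight per second-layer model.
    """
    n_models = len(preds[0])
    n_samples = len(y)
    w = [1.0 / n_models] * n_models  # start from equal weights
    for _ in range(steps):
        # Residual of the weighted combination for every sample.
        residuals = [
            sum(wk * pk for wk, pk in zip(w, row)) - target
            for row, target in zip(preds, y)
        ]
        # Gradient of the mean squared error with respect to each weight.
        grad = [
            2.0 / n_samples * sum(r * row[k] for r, row in zip(residuals, preds))
            for k in range(n_models)
        ]
        # Gradient step, then projection onto the constraint w >= 0.
        w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]
    return w


# Hypothetical predictions from three second-layer models on five samples.
layer2_preds = [
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 0.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 2.5],
    [5.0, 5.0, 0.0],
]
# Toy targets built as 0.7 * model0 + 0.3 * model1, so model2 is uninformative.
targets = [0.7 * row[0] + 0.3 * row[1] for row in layer2_preds]

weights = nnls_weights(layer2_preds, targets)

# The final prediction is the weighted sum of the second-layer predictions.
final_preds = [sum(w * p for w, p in zip(weights, row)) for row in layer2_preds]
```

Because the weights cannot go negative, an uninformative second-layer model ends up with a weight near zero rather than being used to cancel out another model's errors.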

Model Selection

Molecular Descriptor Selection

For machine learning algorithms other than graph neural networks, a molecular descriptor and its parameters must be selected. Currently, this project supports Extended-Connectivity Fingerprints (ECFP), which require two parameters to be set in advance:

  1. Fingerprint Length: the number of bits in the generated fingerprint.
  2. Fingerprint Radius: the neighborhood range around each atom that the fingerprint encodes.
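The effect of the two parameters can be illustrated with a simplified, ECFP-like circular fingerprint written in plain Python. This is a toy sketch and not the algorithm used by RDKit or by this project: the function `toy_circular_fingerprint`, the SHA-256 based hashing, and the hand-written molecular graph are assumptions made for illustration, and real ECFP atom identifiers also encode properties such as bond order, charge, and ring membership.

```python
import hashlib


def _stable_hash(value: str) -> int:
    """Deterministic 32-bit hash (the built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


def toy_circular_fingerprint(atoms, bonds, radius=2, length=1024):
    """Return the sorted positions of the set bits for a toy molecular graph.

    atoms:  list of element symbols, e.g. ["C", "C", "O"]
    bonds:  list of (i, j) atom-index pairs
    radius: number of neighborhood-expansion iterations
    length: number of bits in the folded fingerprint
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: each identifier depends only on the atom itself.
    identifiers = [_stable_hash(symbol) for symbol in atoms]
    features = set(identifiers)

    # Each iteration widens every atom's environment by one bond.
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            env = sorted(identifiers[j] for j in neighbors[i])
            updated.append(_stable_hash(f"{identifiers[i]}|{env}"))
        identifiers = updated
        features.update(identifiers)

    # Fold all identifiers into a bit vector of the requested length.
    return sorted({f % length for f in features})


# Ethanol heavy atoms as a hand-written graph: C-C-O.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

fp_radius0 = toy_circular_fingerprint(ethanol_atoms, ethanol_bonds, radius=0, length=1024)
fp_radius2 = toy_circular_fingerprint(ethanol_atoms, ethanol_bonds, radius=2, length=1024)
```

In this sketch, a larger radius can only add features describing wider atom environments, while a smaller length makes it more likely that different features collide on the same bit.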

Click the button to start the training process. It may take some time.

Before the next prediction step, you need to download the pre-trained model.

Download the packaged model.

Click the button to start the prediction process. It may take some time.