Data.DrData
class DrData(pair_ls: list,
cell_ft: str or dict,
drug_ft: str or dict,
smiles_dict: dict = None,
mpg_dict: dict = None):
It stores all the data needed for drug response prediction.
Use .clean to remove pairs that lack cell or drug data,
and .split to split the response data into training, validation, and test sets.
PARAMETERS:
pair_ls (list) - The cell-drug pairs. Each element of the list is a sub-list of three elements: the cell name, the drug name, and the drug response. You can build it yourself or obtain it through Data.DrRead.PairCSV or Data.DrRead.PairDef.
cell_ft (str or dict) - "EXP", "PES", "MUT", "CNV", a CellFeat obtained by Data.DrRead.FeatCell, or your own dict in which each key is a cell name and each value is the corresponding feature vector, e.g. VAE_dict.pkl.
drug_ft (str or dict) - "ECFP", "SMILES", "Graph", "Image", or your own dict in which each key is a drug name and each value is the corresponding feature vector, e.g. SMILESVec_dict.pkl.
smiles_dict (dict, optional) - SMILES_dict obtained by Data.DrRead.FeatDrug. (default: None)
mpg_dict (dict, optional) - MPG_dict obtained by Data.DrRead.FeatDrug. (default: None)
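As an illustration of the expected input shapes, the sketch below builds a toy pair_ls and two custom feature dicts. All cell names, drug names, responses, feature values, and the file name toy_cell_dict.pkl are hypothetical. It also assumes that a custom dict file such as VAE_dict.pkl is an ordinary pickled Python dict, which may not match the files shipped with the library.

```python
import os
import pickle
import tempfile

# Hypothetical cell-drug pairs: each sub-list is [cell name, drug name, drug response].
pair_ls = [
    ["CELL_A", "DRUG_X", 0.52],
    ["CELL_A", "DRUG_Y", 1.37],
    ["CELL_B", "DRUG_X", -0.21],
]

# Hypothetical custom feature dicts: key = name, value = feature vector.
cell_ft = {"CELL_A": [0.1, 0.0, 2.3], "CELL_B": [1.2, 0.4, 0.0]}
drug_ft = {"DRUG_X": [1, 0, 1, 1], "DRUG_Y": [0, 1, 0, 1]}

# Assumption: a custom dict file (like VAE_dict.pkl) is a pickled dict,
# so it can be written and read back with the standard pickle module.
path = os.path.join(tempfile.mkdtemp(), "toy_cell_dict.pkl")
with open(path, "wb") as f:
    pickle.dump(cell_ft, f)
with open(path, "rb") as f:
    loaded_cell_ft = pickle.load(f)

print(loaded_cell_ft == cell_ft)  # True
```

Either such a dict or one of the preset strings (e.g. "EXP" for cells, "ECFP" for drugs) would then be passed as cell_ft or drug_ft.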
self.clean
def clean(self, cell_ft_ls: list = None):
It can be used to remove pairs that lack cell or drug data.
PARAMETERS:
cell_ft_ls (list, optional) - Each element should have the same form as cell_ft. (default: None)
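This page does not spell out the exact filtering rule. A plausible reading, assumed in the sketch below, is that a pair is kept only if its cell has a feature vector in cell_ft (and in every dict of cell_ft_ls, when given) and its drug has one in drug_ft. clean_pairs is a hypothetical helper that handles dict features only; it is not the library's .clean method.

```python
def clean_pairs(pair_ls, cell_ft, drug_ft, cell_ft_ls=None):
    """Keep only pairs whose cell and drug both have features.

    Hypothetical re-implementation for illustration; assumes features are dicts.
    When cell_ft_ls is given, the cell must also appear in every dict in it.
    """
    cell_dicts = [cell_ft] + list(cell_ft_ls or [])
    return [
        [cell, drug, response]
        for cell, drug, response in pair_ls
        if all(cell in d for d in cell_dicts) and drug in drug_ft
    ]


pair_ls = [
    ["CELL_A", "DRUG_X", 0.52],
    ["CELL_B", "DRUG_X", -0.21],  # CELL_B has no features -> removed
    ["CELL_A", "DRUG_Z", 0.90],   # DRUG_Z has no features -> removed
    ["CELL_C", "DRUG_Y", 1.10],   # CELL_C is missing from the extra cell dict -> removed
]
cell_ft = {"CELL_A": [0.1, 2.3], "CELL_C": [0.7, 0.2]}
drug_ft = {"DRUG_X": [1, 0, 1], "DRUG_Y": [0, 1, 1]}
extra_cell_ft = {"CELL_A": [5.0]}  # hypothetical second cell feature, same form as cell_ft

print(clean_pairs(pair_ls, cell_ft, drug_ft, cell_ft_ls=[extra_cell_ft]))
# [['CELL_A', 'DRUG_X', 0.52]]
```

Without cell_ft_ls, the CELL_C pair would survive, because only cell_ft and drug_ft are checked.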
self.split
def split(self, mode: str,
fold: int,
ratio: list,
seed: int,
save: bool = True,
save_path: str = None):
It can be used to split the response data into training, validation, and test sets.
PARAMETERS:
mode (str) - The splitting mode. "common", "cell_out", "drug_out", and "strict" are available.
fold (int) - The number of folds for k-fold cross-validation. It should be greater than or equal to 1. Setting fold=1 disables k-fold cross-validation.
ratio (list) - The splitting ratio. If fold=1, it should be a list of 3 floats giving the ratios of the training, validation, and test sets, respectively. If fold>1, it should be a list of 2 floats giving the ratios of the non-test set and the test set, respectively.
seed (int) - The random seed.
save (bool, optional) - Whether to save the return value. (default: True)
save_path (str, optional) - Save path for the return value. It must end in ".pkl". If it is set to None, the default path will be used. (default: None)
OUTPUTS:
train_dr_data_ls (list) - The segmented training sets.
val_dr_data_ls (list) - The segmented validation sets.
test_dr_data (DrData) - The segmented test set.
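The outputs suggest that, for fold>1, the test set is carved out once and the non-test pairs are then divided into fold training/validation pairs, which is why the training and validation outputs are lists while the test output is a single object. The hypothetical kfold_split below sketches that shape for random pair-level splitting only, returning plain lists instead of DrData objects. Saving the result as one pickled tuple is an assumption about what save and save_path do, not the library's actual file format.

```python
import os
import pickle
import random
import tempfile


def kfold_split(pair_ls, fold, ratio, seed, save_path=None):
    """Return (train_ls, val_ls, test) for fold > 1 using random pair-level splitting.

    Hypothetical sketch: it mirrors the documented output shapes but is not the
    library's implementation, and it ignores the cell_out/drug_out/strict modes.
    """
    if fold < 2 or len(ratio) != 2:
        raise ValueError("this sketch expects fold > 1 and two ratios")
    pairs = list(pair_ls)
    random.Random(seed).shuffle(pairs)
    n_test = round(len(pairs) * ratio[1])
    test, non_test = pairs[:n_test], pairs[n_test:]
    # Partition the non-test pairs into `fold` nearly equal chunks.
    chunks = [non_test[i::fold] for i in range(fold)]
    train_ls, val_ls = [], []
    for k in range(fold):
        # Chunk k is the validation set of fold k; the other chunks form its training set.
        val_ls.append(chunks[k])
        train_ls.append([p for j, chunk in enumerate(chunks) if j != k for p in chunk])
    if save_path is not None:
        # Assumption: the result is stored as one pickled tuple in a ".pkl" file.
        with open(save_path, "wb") as f:
            pickle.dump((train_ls, val_ls, test), f)
    return train_ls, val_ls, test


pairs = [[f"CELL_{i % 7}", f"DRUG_{i % 11}", float(i)] for i in range(100)]
path = os.path.join(tempfile.mkdtemp(), "split_result.pkl")
train_ls, val_ls, test = kfold_split(pairs, fold=5, ratio=[0.9, 0.1], seed=42, save_path=path)
print(len(train_ls), len(val_ls), len(test))  # 5 5 10
print([len(v) for v in val_ls])               # [18, 18, 18, 18, 18]
```

Because each chunk serves as the validation set exactly once, every non-test pair appears in one validation set and in the remaining fold-1 training sets.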