Convert a SMILES dataset to graph
I want to build a VAE or a GAN capable of generating new drugs, using graphs to represent my molecules. Here is my actual question:
I started the project with a simple Pandas dataframe made up of SMILES strings and various features, like this one:
CC(=O)Nc1ccc(O)cc1, weight = 151.16, …
CC(=O)Oc1ccccc1C(=O)O, weight = 180, …
Is it possible to convert the strings into a graph data format? If so, could you give me some suggestions on how to do that?
Thank you all!
Solution 1:[1]
Yes. Use DGL-LifeSci; it has a few functions for converting SMILES to graphs, depending on the kind of graph you want:
https://github.com/awslabs/dgl-lifesci/blob/master/python/dgllife/utils/mol_to_graph.py
DeepChem also has similar functionality in its built-in featurizers: https://github.com/deepchem/deepchem/blob/master/deepchem/feat/molecule_featurizers/mol_graph_conv_featurizer.py
Going straight from SMILES to a graph can be confusing. Wherever a function name mentions mol (e.g. mol_to_graph), it expects an RDKit Mol object, which you can get from a SMILES string with the MolFromSmiles function in rdkit.Chem:
from rdkit import Chem
mol = Chem.MolFromSmiles('Cc1ccccc1')
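If you would rather see what such a conversion does before picking a library, here is a minimal sketch that builds a plain graph from an RDKit Mol by hand. It is not how dgl-lifesci or deepchem do it internally. The helper names mol_to_graph and smiles_to_graph, and the choice of atomic number and bond order as the only features, are assumptions made for illustration.

```python
def mol_to_graph(mol):
    """Turn an RDKit Mol into (node_features, edge_index, edge_features)."""
    # One node per atom; the atomic number serves as a minimal node feature.
    node_features = [atom.GetAtomicNum() for atom in mol.GetAtoms()]

    src, dst, edge_features = [], [], []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        order = bond.GetBondTypeAsDouble()  # 1.0 single, 1.5 aromatic, 2.0 double
        # Store each bond in both directions so the graph is undirected.
        src += [i, j]
        dst += [j, i]
        edge_features += [order, order]
    return node_features, [src, dst], edge_features


def smiles_to_graph(smiles):
    """Parse a SMILES string with RDKit and convert it to a graph."""
    # Imported here so mol_to_graph itself has no hard RDKit dependency.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return mol_to_graph(mol)


if __name__ == "__main__":
    # First SMILES string from the question.
    nodes, edge_index, bond_orders = smiles_to_graph("CC(=O)Nc1ccc(O)cc1")
    print(len(nodes), "atoms,", len(edge_index[0]) // 2, "bonds")
```

For a whole dataframe you could map the function over the SMILES column, e.g. df["smiles"].map(smiles_to_graph), where the column name "smiles" is a guess at your schema. The two-row edge_index layout matches what most graph libraries expect, so the lists can be wrapped in tensors later.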
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mrw |