Of late I have been concerned with R&D information and with various homebrew means of storing and retrieving it. Institutionalizing R&D results into easily accessed knowledge can roll into a real hairball if you’re not careful. More on that another time.

My adventures with CHETAH 9.0 have caused me to look deeply into SMILES strings and what utility might be found there. This led me to rediscover ChemSpider and the many free services it provides to the user.

Consider the following: if you generate a SMILES string for, say, acetylsalicylic acid in ChemDraw, you get O=C(O)C1=C(OC(C)=O)C=CC=C1. Use this character string as a search term in ChemSpider and it will take you to the entry for aspirin, which is a treasure trove of information on the substance. Go to ChemSpider, copy and paste the SMILES string above into the search box, and let her rip. I’m not your Momma. Just try it.
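To see what that short string actually encodes, here is a rough, stdlib-only Python sketch. The function name molecular_formula is hypothetical, and this is a toy rather than a real SMILES parser: it only reads unbracketed, Kekulé-style SMILES (no aromatic lowercase atoms, no brackets, charges, or stereo marks) and fills in implicit hydrogens from default valences. Fed the ChemDraw string above, it recovers C9H8O4, the formula of aspirin. For anything serious, a proper cheminformatics toolkit is the right tool.

```python
import re
from collections import Counter

# Allowed valences for the unbracketed SMILES "organic subset" atoms.
VALENCES = {"B": [3], "C": [4], "N": [3, 5], "O": [2], "P": [3, 5],
            "S": [2, 4, 6], "F": [1], "Cl": [1], "Br": [1], "I": [1]}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]|\d")


def molecular_formula(smiles: str) -> str:
    """Hill-order formula for a simple, Kekule-style SMILES string (toy parser)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES features (brackets, aromatic atoms, ...)")

    atoms, bond_sum = [], []      # element per atom; explicit bond-order total per atom
    branches, rings = [], {}      # branch stack; open ring-closure digit -> (atom, order)
    prev, pending = None, 1       # previous atom index; order of the next bond

    for tok in tokens:
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
        elif tok == "(":
            branches.append(prev)
        elif tok == ")":
            prev = branches.pop()
        elif tok.isdigit():
            if tok in rings:               # second occurrence of a digit closes the ring
                other, order = rings.pop(tok)
                order = max(order, pending)
                bond_sum[other] += order
                bond_sum[prev] += order
            else:                          # first occurrence opens it
                rings[tok] = (prev, pending)
            pending = 1
        else:                              # an atom symbol: bond it to the previous atom
            atoms.append(tok)
            bond_sum.append(0)
            if prev is not None:
                bond_sum[prev] += pending
                bond_sum[-1] += pending
            prev, pending = len(atoms) - 1, 1

    if branches or rings:
        raise ValueError("unbalanced parentheses or unclosed ring")

    counts = Counter(atoms)
    # Implicit H: fill each atom up to its smallest allowed valence >= its bond total.
    h = 0
    for sym, used in zip(atoms, bond_sum):
        valence = next((v for v in VALENCES[sym] if v >= used), used)
        h += valence - used
    if h:
        counts["H"] = h

    if "C" in counts:   # Hill order: C, then H, then the rest alphabetically
        order = ["C"] + (["H"] if "H" in counts else []) + sorted(
            e for e in counts if e not in ("C", "H"))
    else:               # no carbon: everything alphabetically
        order = sorted(counts)
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order)


print(molecular_formula("O=C(O)C1=C(OC(C)=O)C=CC=C1"))  # C9H8O4
```

Writing the same molecule another way, for example CC(=O)OC1=CC=CC=C1C(=O)O, gives the same formula, which is one reason a structure search has to compare molecules rather than raw strings.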

The breadth of references is encyclopedic. But the truly amazing part is at the bottom of the page, where there is a drop-down panel for SimBioSys LASSO. ChemSpider is working to provide LASSO data for its large database of compounds. LASSO generates a structure for the compound, grinds it through a neural-network module, and produces a score between zero and one. The closer the score is to 1.00, the greater the surface conformity, or compatibility, of the ligand with a target receptor site. As you would expect, aspirin scores high against the COX-1 receptor. From what I can tell, the software is self-learning in some fashion.
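SimBioSys does not spell out LASSO’s network here, and the sketch below is not it. It is only a generic illustration of why a neural-network score naturally lands between zero and one: the hypothetical toy_score function feeds a descriptor vector through one tanh hidden layer and a sigmoid (logistic) output unit. The descriptors and weights are made-up placeholders, not LASSO’s.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; maps any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def toy_score(descriptors: Sequence[float],
              w_hidden: Sequence[Sequence[float]],
              b_hidden: Sequence[float],
              w_out: Sequence[float],
              b_out: float) -> float:
    """Hypothetical two-layer net: tanh hidden units, then one sigmoid output unit."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, descriptors)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Placeholder descriptor vector and weights, for illustration only.
descriptors = [3.0, 1.0, 0.0]
w_hidden = [[0.5, -0.2, 0.1], [0.3, 0.3, -0.4]]
b_hidden = [0.0, 0.1]
w_out = [1.5, -0.7]
print(toy_score(descriptors, w_hidden, b_hidden, w_out, 0.2))
```

Whatever the inputs, the sigmoid squashes the final sum into the 0 to 1 range, so a trained network of this shape can be read as "closer to 1.00 means more like the actives it was trained on". Training the weights, which is where any self-learning would come in, is left out of this sketch.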

The uses are many. Substances can be screened for drug-like attributes against the 40 receptor types provided. I would like to hear from anyone who has something to say about using LASSO to estimate possible toxic effects of substances that have not been biologically tested. I fully realize the hazards of this, but perhaps LASSO scores could help flag particular substances for closer examination by testing.
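As a minimal sketch of that flagging idea, assume the scores have been collected into a hypothetical layout: a dictionary mapping each compound to its scores across receptor types. The hypothetical flag_for_testing function below simply lists every compound/receptor pair scoring at or above a threshold, highest first. The 0.8 default is an arbitrary placeholder, not a validated cutoff, and the demo numbers are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: compound name -> {receptor name: score in [0, 1]}.
Scores = Dict[str, Dict[str, float]]


def flag_for_testing(scores: Scores, threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return (compound, receptor, score) hits at or above threshold, highest first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    hits = [
        (compound, receptor, s)
        for compound, by_receptor in scores.items()
        for receptor, s in by_receptor.items()
        if s >= threshold
    ]
    # Highest score first; ties broken by compound, then receptor name.
    hits.sort(key=lambda h: (-h[2], h[0], h[1]))
    return hits


# Made-up numbers, purely to exercise the function.
demo = {
    "compound_A": {"receptor_1": 0.91, "receptor_2": 0.40},
    "compound_B": {"receptor_1": 0.55, "receptor_2": 0.83},
}
print(flag_for_testing(demo))
# [('compound_A', 'receptor_1', 0.91), ('compound_B', 'receptor_2', 0.83)]
```

A flag here only ranks what to test first; the decision about what a high score means still rests on the testing itself.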