DMT-Nexus
On the problem of searching for molecular structure and/or proteins.
 
muladharma
#1 Posted : 1/16/2021 12:09:52 PM

DMT-Nexus member


Posts: 128
Joined: 03-Jun-2017
Last visit: 14-Jun-2022
Location: European Union
Starting this thread to discuss the problem of searching for chemicals and molecules.

There are many standards for naming and representing molecules, especially as strings of characters meant for computers (IUPAC names, SMILES and InChI, for example). No single representation is used everywhere, so any application that lets you perform this kind of search could be relying on a combination of several.

One such application is:
https://opsin.ch.cam.ac.uk/ OPSIN: Open Parser for Systematic IUPAC nomenclature

The OPSIN app produces a CML (Chemical Markup Language) file, which is an XML description of the atoms and the bonds between them.
There can be more than one name for a molecule, and the application handles this and other problems by detecting ambiguity.
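To make the CML idea concrete, here is a minimal sketch of reading atoms and bonds out of such a file with the Python standard library. The fragment below is hand-written for illustration (methanol, heavy atoms only) and is not actual OPSIN output; real files carry more attributes, and `read_cml` is just a name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written CML fragment for illustration only (not real OPSIN output).
CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C"/>
      <atom id="a2" elementType="O"/>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="S"/>
    </bondArray>
  </molecule>
</cml>"""

# CML elements live in this XML namespace.
NS = {"cml": "http://www.xml-cml.org/schema"}


def read_cml(text):
    """Return ({atom_id: element}, [(atom_id_1, atom_id_2, bond_order), ...])."""
    root = ET.fromstring(text)
    atoms = {a.get("id"): a.get("elementType")
             for a in root.iterfind(".//cml:atom", NS)}
    bonds = [tuple(b.get("atomRefs2").split()) + (b.get("order"),)
             for b in root.iterfind(".//cml:bond", NS)]
    return atoms, bonds


atoms, bonds = read_cml(CML)
print(atoms)
print(bonds)
```

Once the structure is in plain dictionaries and tuples like this, it can be compared or searched without caring which name was typed in.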

The result might come from a combination of operations on the n-grams of the input, which by some logic finds or constructs a match. That said, without studying the code and/or documentation one cannot know how complete the operation is (whether every valid input produces an output) or whether it is invertible (whether an input name can be recovered from a given output).

It's not clear if this can be used to search for related compounds, but some ideas are: backtracking inputs, searching for names using structure and backtracking structures.
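As a rough illustration of the n-gram idea applied to finding related names, here is a sketch that ranks candidate names by character trigram overlap (Jaccard similarity). This is an assumption about how one might do it, not how OPSIN works internally, and the function names are made up for the example. Similar strings do not guarantee similar structures, so treat it as a first filter at best.

```python
def char_ngrams(name, n=3):
    """Set of lowercase character n-grams of a chemical name."""
    s = name.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of the n-gram sets of two names (0.0 to 1.0)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0  # both names shorter than n: treat as identical
    return len(ga & gb) / len(ga | gb)


def rank_related(query, candidates, n=3):
    """Candidates sorted from most to least similar to the query name."""
    return sorted(candidates, key=lambda c: jaccard(query, c, n), reverse=True)


if __name__ == "__main__":
    names = ["benzene", "tryptamine", "5-methoxytryptamine"]
    for name in rank_related("5-hydroxytryptamine", names):
        print(name, round(jaccard("5-hydroxytryptamine", name), 3))
```

A structure-based comparison (on the parsed atoms and bonds rather than on the name) would be the more reliable route, but it needs a cheminformatics toolkit.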

Examples:
Hydroxytryptamine gives a result with the hydroxy group on the amine, but 5-hydroxytryptamine removes the ambiguity. It could be that ambiguous parts are resolved in order of construction priority.

For names with multiple possible configurations, for example dimethoxybenzene, the ortho form is preferred.

Some inputs cannot detect stereocenters, example L-Glycine, but it gives a sign that it can search for that.
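For anyone who wants to repeat these examples from a script rather than the web form, here is a small sketch. It assumes the site exposes a web service of the form `https://opsin.ch.cam.ac.uk/opsin/<name>.<format>` with formats such as `smi` and `cml`; please check the OPSIN site for the current URL layout before relying on it. The helper names are made up for this example.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint layout; verify against the OPSIN site before use.
BASE = "https://opsin.ch.cam.ac.uk/opsin/"


def opsin_url(name, fmt="smi"):
    """Build the request URL for a chemical name, percent-encoding it."""
    return f"{BASE}{quote(name, safe='')}.{fmt}"


def fetch(name, fmt="smi", timeout=10):
    """Download the service response as text (requires network access)."""
    with urlopen(opsin_url(name, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URLs; call fetch(name) to actually query the service.
    for name in ["hydroxytryptamine", "5-hydroxytryptamine", "dimethoxybenzene"]:
        print(opsin_url(name))
```

Comparing the result for an ambiguous name with the result for its fully numbered form is a quick way to see which interpretation the parser picked.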


Using neural networks can yield other insights because of the complex nature of the search space that is generated by combining natural language with structural geometry.
Find the wisdom to practice loving-kindness.
 

downwardsfromzero
#2 Posted : 1/16/2021 10:23:16 PM

Boundary condition

Moderator | Chemical expert

Posts: 8617
Joined: 30-Aug-2008
Last visit: 07-Nov-2024
Location: square root of minus one
Quote:
Some inputs cannot detect stereocenters, example L-Glycine, but it gives a sign that it can search for that.
This specific example has the problem that glycine, out of all the amino acids, is not in fact chiral.
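A minimal sketch of why that is, using a deliberately simplified rule: a tetrahedral carbon can only be a stereocentre if its four substituents are all different. The substituent labels are typed in by hand here, whereas real software compares the full attached groups, so take this as an illustration and not a stereochemistry checker.

```python
def may_be_stereocenter(substituents):
    """Simplified test: four substituents, all different from each other."""
    return len(substituents) == 4 and len(set(substituents)) == 4


# Hand-assigned substituents on the alpha carbon (illustrative labels).
glycine_alpha = ["H", "H", "NH2", "COOH"]    # two identical hydrogens
alanine_alpha = ["H", "CH3", "NH2", "COOH"]  # four different groups

print("glycine:", may_be_stereocenter(glycine_alpha))
print("alanine:", may_be_stereocenter(alanine_alpha))
```

Glycine fails the test because its alpha carbon carries two hydrogens, which is why an L- or D- prefix on it carries no information.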

The onus is, er, on us to name our molecules as accurately as possible while simultaneously being aware of the range of alternative trivial names and/or possible (common or otherwise) nomenclatural errors that may be encountered.

As you may have noticed, nomenclature is one of my specialised fields of focus. Please feel free to address me with questions directly.


Proteins are far more tricky beasts as their secondary structure is outside the scope of ordinary systems of nomenclature. And, of course, the larger the molecule, the more of a mouthful the systematic name becomes.



The OPSIN link is great, thanks!




“There is a way of manipulating matter and energy so as to produce what modern scientists call 'a field of force'. The field acts on the observer and puts him in a privileged position vis-à-vis the universe. From this position he has access to the realities which are ordinarily hidden from us by time and space, matter and energy. This is what we call the Great Work.”
― Jacques Bergier, quoting Fulcanelli
 
 