How to replace Hydrogen atom with another atom using RDKit? - replace

Assume that I have a molecule as below.
smile = 'C(CN)SS'
mol = Chem.MolFromSmiles(smile)
As can be seen, N has two Hydrogen atoms and S has one Hydrogen atom. I would like to replace those Hydrogens with Carbon.
I can find neighbor atoms by the following code. But, it only returns non-Hydrogen atoms. And, I don't know how to replace Hydrogen with Carbon.
for atom in mol.GetAtoms():
if atom.GetAtomicNum() != 6 and atom.GetTotalNumHs() > 0:
print("Neighbors", [x.GetAtomicNum() for x in atom.GetNeighbors()])
If you have any idea, I really appreciate it to guide me.

The reason you cannot see the hydrogens as Atom objects is because here these hydrogens are modelled as implicit hydrogens. The difference between implicit and explicit hydrogens is summarised nicely in this article. This can also be seen using rdkit:
smiles = 'C(CN)SS'
mol = Chem.MolFromSmiles(smiles)
for atom in mol.GetAtoms():
if atom.GetAtomicNum() != 6 and atom.GetTotalNumHs() > 0:
element = atom.GetSymbol()
implicit = atom.GetNumImplicitHs()
explicit = atom.GetNumExplicitHs()
print(element, 'Implicit:', implicit, 'Explicit:', explicit)
>>> N Implicit: 2 Explicit: 0
>>> S Implicit: 1 Explicit: 0
You can set atoms as explicit easily:
mol = Chem.AddHs(mol)
The easiest way to change substructure is to use the function Chem.ReplaceSubstructs:
match = Chem.MolFromSmarts('[NH2]')
repl = Chem.MolFromSmarts('N(-C)-C')
new_mol = Chem.ReplaceSubstructs(mol, match, repl)
Okay considering you want to change any hydrogen connected to a non-carbon atom into a carbon you could do something like this:
smiles = 'C(CN)SS'
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)
edit = Chem.RWMol(mol)
for atom in edit.GetAtoms():
if atom.GetAtomicNum() != 6:
for nbr in atom.GetNeighbors():
if nbr.GetAtomicNum() == 1:
nbr.SetAtomicNum(6)
Chem.SanitizeMol(edit)
mol = Chem.RemoveHs(edit) # Also converts back to standard Mol

Here is a way to do it with ReplaceSubstructs().
from rdkit import Chem
repl = Chem.MolFromSmiles('C')
patt = Chem.MolFromSmarts('[#1;$([#1][!#6])]') # hydrogen not on carbon
#patt = Chem.MolFromSmarts('[#1;$([#1][#7,#16])]') # hydrogen on nitrogen or sulfur
mol = Chem.AddHs(Chem.MolFromSmiles('C(CN)SS'))
rms = Chem.ReplaceSubstructs(mol, patt, repl, replaceAll=True)
newMol = Chem.RemoveHs(rms[0])

Related

Find starting and ending index of each unique charcters in a string in python

I have a string with characters repeated. My Job is to find starting Index and ending index of each unique characters in that string. Below is my code.
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
mo = re.search(item,x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
Output :
a 0 1
b 3 4
c 7 8
Here the end index of the characters are not correct. I understand why it's happening but how can I pass the character to be matched dynamically to the regex search function. For instance if I hardcode the character in the search function it provides the desired output
x = 'aabbbbccc'
xs = set(x)
mo = re.search("[b]+",x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
output:
b 2 5
The above function is providing correct result but here I can't pass the characters to be matched dynamically.
It will be really a help if someone can let me know how to achieve this any hint will also do. Thanks in advance
String literal formatting to the rescue:
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
# for patterns better use raw strings - and format the letter into it
mo = re.search(fr"{item}+",x) # fr and rf work both :) its a raw formatted literal
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n) # fix upper limit by n-1
Output:
a 0 3 # you do see that the upper limit is off by 1?
b 3 7 # see above for fix
c 7 9
Your pattern does not need the [] around the letter - you are matching just one anyhow.
Without regex1:
x = "aaabbbbcc"
last_ch = x[0]
start_idx = 0
# process the remainder
for idx,ch in enumerate(x[1:],1):
if last_ch == ch:
continue
else:
print(last_ch,start_idx, idx-1)
last_ch = ch
start_idx = idx
print(ch,start_idx,idx)
output:
a 0 2 # not off by 1
b 3 6
c 7 8
1RegEx: And now you have 2 problems...
Looking at the output, I'm guessing that another option would be,
import re
x = "aaabbbbcc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
print(output)
Output
a 0 3
b 3 7
c 7 9
I think it'll be in the Order of N, you can likely benchmark it though, if you like.
import re, time
timer_on = time.time()
for i in range(10000000):
x = "aabbbbccc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
timer_off = time.time()
timer_total = timer_off - timer_on
print(timer_total)

ROT 13 Cipher: Creating a Function Python

I need to create a function that replaces a letter with the letter 13 letters after it in the alphabet (without using encode). I'm relatively new to Python so it has taken me a while to figure out a way to do this without using Encode.
Here's what I have so far. When I use this to type in a normal word like "hello" it works but if I pass through a sentence with special characters I can't figure out how to JUST include letters of the alphabet and skip numbers, spaces or special characters completely.
def rot13(b):
b = b.lower()
a = [chr(i) for i in range(ord('a'),ord('z')+1)]
c = []
d = []
x = a[0:13]
for i in b:
c.append(a.index(i))
for i in c:
if i <= 13:
d.append(a[i::13][1])
elif i > 13:
y = len(a[i:])
z = len(x)- y
d.append(a[z::13][0])
e = ''.join(d)
return e
EDIT
I tried using .isalpha() but this doesn't seem to be working for me - characters are duplicating for some reason when I use it. Is the following format correct:
def rot13(b):
b1 = b.lower()
a = [chr(i) for i in range(ord('a'),ord('z')+1)]
c = []
d = []
x = a[0:13]
for i in b1:
if i.isalpha():
c.append(a.index(i))
for i in c:
if i <= 12:
d.append(a[i::13][1])
elif i > 12:
y = len(a[i:])
z = len(x)- y
d.append(a[z::13][0])
else:
d.append(i)
if message[0].istitle() == True:
d[0] = d[0].upper()
e = ''.join(d)
return e
Following on from comments. OP was advised to use isalpha, and wondering why that's causing duplication (see OP's edit)
This isn't tied to the use of isalpha, it's to do with the second for loop
for i in c:
isn't necessary, and is causing the duplication. You should remove that. Instead you can do the same by just using index = a.index(i). You were already doing this, but for some reason appending to a list instead and causing confusion
Use the index variable any time you would have used i inside the for i in c loop. On a side note, in nested for loops try not to reuse the same variables. It just causes confusion...but that's a matter for code review
Assuming you do all that right it should work.

Excel | Get all column/row names in which a specific text is as a list

It is difficult for me to describe the problem in the title, so excuse any misleading description.
The easiest way to describe what I need is with an example. I have a table like:
A B C
1 x
2 x x
3 x x
Now what I want is the formula in a cell for every single column and row with each of the column or row name for every x that is placed. In the example like:
A B C
1,2 2,3 3
1 A x
2 A, B x x
3 B, C x x
The column and row names are not equivalent to the excel designation. It works with an easy WHEN statement for single cells (=WHEN(C3="x";C1)), but not for a bunch of them (=WHEN(C3:E3="x";C1:E1)). How should/can such a formula look like?
So I found the answer to my problem. Excel provides the normal CONCATENATE function. What is needed is something like a CONCATENATEIF (in German = verkettenwenn) function. By adding a module in VBA based on a thread from ransi from 2011 on the ms-office-forum.net the function verkettenwenn can be used. The code for the German module looks like:
Option Explicit
Public Function verkettenwenn(Bereich_Kriterium, Kriterium, Bereich_Verketten)
Dim mydic As Object
Dim L As Long
Set mydic = CreateObject("Scripting.Dictionary")
For L = 1 To Bereich_Kriterium.Count
If Bereich_Kriterium(L) = Kriterium Then
mydic(L) = Bereich_Verketten(L)
End If
Next
verkettenwenn = Join(mydic.items, ", ")
End Function
With that module in place one of the formula for the mentioned example looks like: =verkettenwenn(C3:E3;"x";$C$1:$K$1)
The English code for a CONCATENATEIF function should probably be:
Option Explicit
Public Function CONCATENATEIF(Criteria_Area, Criterion, Concate_Area)
Dim mydic As Object
Dim L As Long
Set mydic = CreateObject("Scripting.Dictionary")
For L = 1 To Criteria_Area.Count
If Criteria_Area(L) = Criterion Then
mydic(L) = Concate_Area(L)
End If
Next
CONCATENATEIF = Join(mydic.items, ", ")
End Function

Selecting elements in numpy array using regular expressions

One may select elements in numpy arrays as follows
a = np.random.rand(100)
sel = a > 0.5 #select elements that are greater than 0.5
a[sel] = 0 #do something with the selection
b = np.array(list('abc abc abc'))
b[b==a] = 'A' #convert all the a's to A's
This property is used by the np.where function to retrive indices:
indices = np.where(a>0.9)
What I would like to do is to be able to use regular expressions in such element selection. For example, if I want to select elements from b above that match the [Aab] regexp, I need to write the following code:
regexp = '[Ab]'
selection = np.array([bool(re.search(regexp, element)) for element in b])
This looks too verbouse for me. Is there any shorter and more elegant way to do this?
There's some setup involved here, but unless numpy has some kind of direct support for regular expressions that I don't know about, then this is the most "numpytonic" solution. It tries to make iteration over the array more efficient than standard python iteration.
import numpy as np
import re
r = re.compile('[Ab]')
vmatch = np.vectorize(lambda x:bool(r.match(x)))
A = np.array(list('abc abc abc'))
sel = vmatch(A)

Erlang record item list

For example i have erlang record:
-record(state, {clients
}).
Can i make from clients field list?
That I could keep in client filed as in normal list? And how can i add some values in this list?
Thank you.
Maybe you mean something like:
-module(reclist).
-export([empty_state/0, some_state/0,
add_client/1, del_client/1,
get_clients/1]).
-record(state,
{
clients = [] ::[pos_integer()],
dbname ::char()
}).
empty_state() ->
#state{}.
some_state() ->
#state{
clients = [1,2,3],
dbname = "QA"}.
del_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = lists:delete(Client, C)}.
add_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = [Client|C]}.
get_clients(#state{clients = C, dbname = _D}) ->
C.
Test:
1> reclist:empty_state().
{state,[],undefined}
2> reclist:some_state().
{state,[1,2,3],"QA"}
3> reclist:add_client(4).
{state,[4,1,2,3],"QA"}
4> reclist:del_client(2).
{state,[1,3],"QA"}
::[pos_integer()] means that the type of the field is a list of positive integer values, starting from 1; it's the hint for the analysis tool dialyzer, when it performs type checking.
Erlang also allows you use pattern matching on records:
5> reclist:get_clients(reclist:some_state()).
[1,2,3]
Further reading:
Records
Types and Function Specifications
dialyzer(1)
#JUST MY correct OPINION's answer made me remember that I love how Haskell goes about getting the values of the fields in the data type.
Here's a definition of a data type, stolen from Learn You a Haskell for Great Good!, which leverages record syntax:
data Car = Car {company :: String
,model :: String
,year :: Int
} deriving (Show)
It creates functions company, model and year, that lookup fields in the data type. We first make a new car:
ghci> Car "Toyota" "Supra" 2005
Car {company = "Toyota", model = "Supra", year = 2005}
Or, using record syntax (the order of fields doesn't matter):
ghci> Car {model = "Supra", year = 2005, company = "Toyota"}
Car {company = "Toyota", model = "Supra", year = 2005}
ghci> let supra = Car {model = "Supra", year = 2005, company = "Toyota"}
ghci> year supra
2005
We can even use pattern matching:
ghci> let (Car {company = c, model = m, year = y}) = supra
ghci> "This " ++ c ++ " " ++ m ++ " was made in " ++ show y
"This Toyota Supra was made in 2005"
I remember there were attempts to implement something similar to Haskell's record syntax in Erlang, but not sure if they were successful.
Some posts, concerning these attempts:
In Response to "What Sucks About Erlang"
Geeking out with Lisp Flavoured Erlang. However I would ignore parameterized modules here.
It seems that LFE uses macros, which are similar to what provides Scheme (Racket, for instance), when you want to create a new value of some structure:
> (define-struct car (company model year))
> (define supra (make-car "Toyota" "Supra" 2005))
> (car-model supra)
"Supra"
I hope we'll have something close to Haskell record syntax in the future, that would be really practically useful and handy.
Yasir's answer is the correct one, but I'm going to show you WHY it works the way it works so you can understand records a bit better.
Records in Erlang are a hack (and a pretty ugly one). Using the record definition from Yasir's answer...
-record(state,
{
clients = [] ::[pos_integer()],
dbname ::char()
}).
...when you instantiate this with #state{} (as Yasir did in empty_state/0 function), what you really get back is this:
{state, [], undefined}
That is to say your "record" is just a tuple tagged with the name of the record (state in this case) followed by the record's contents. Inside BEAM itself there is no record. It's just another tuple with Erlang data types contained within it. This is the key to understanding how things work (and the limitations of records to boot).
Now when Yasir did this...
add_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = [Client|C]}.
...the S#state.clients bit translates into code internally that looks like element(2,S). You're using, in other words, standard tuple manipulation functions. S#state.clients is just a symbolic way of saying the same thing, but in a way that lets you know what element 2 actually is. It's syntactic saccharine that's an improvement over keeping track of individual fields in your tuples in an error-prone way.
Now for that last S#state{clients = [Client|C]} bit, I'm not absolutely positive as to what code is generated behind the scenes, but it is likely just straightforward stuff that does the equivalent of {state, [Client|C], element(3,S)}. It:
tags a new tuple with the name of the record (provided as #state),
copies the elements from S (dictated by the S# portion),
except for the clients piece overridden by {clients = [Client|C]}.
All of this magic is done via a preprocessing hack behind the scenes.
Understanding how records work behind the scenes is beneficial both for understanding code written using records as well as for understanding how to use them yourself (not to mention understanding why things that seem to "make sense" don't work with records -- because they don't actually exist down in the abstract machine...yet).
If you are only adding or removing single items from the clients list in the state you could cut down on typing with a macro.
-record(state, {clients = [] }).
-define(AddClientToState(Client,State),
State#state{clients = lists:append([Client], State#state.clients) } ).
-define(RemoveClientFromState(Client,State),
State#state{clients = lists:delete(Client, State#state.clients) } ).
Here is a test escript that demonstrates:
#!/usr/bin/env escript
-record(state, {clients = [] }).
-define(AddClientToState(Client,State),
State#state{clients = lists:append([Client], State#state.clients)} ).
-define(RemoveClientFromState(Client,State),
State#state{clients = lists:delete(Client, State#state.clients)} ).
main(_) ->
%Start with a state with a empty list of clients.
State0 = #state{},
io:format("Empty State: ~p~n",[State0]),
%Add foo to the list
State1 = ?AddClientToState(foo,State0),
io:format("State after adding foo: ~p~n",[State1]),
%Add bar to the list.
State2 = ?AddClientToState(bar,State1),
io:format("State after adding bar: ~p~n",[State2]),
%Add baz to the list.
State3 = ?AddClientToState(baz,State2),
io:format("State after adding baz: ~p~n",[State3]),
%Remove bar from the list.
State4 = ?RemoveClientFromState(bar,State3),
io:format("State after removing bar: ~p~n",[State4]).
Result:
Empty State: {state,[]}
State after adding foo: {state,[foo]}
State after adding bar: {state,[bar,foo]}
State after adding baz: {state,[baz,bar,foo]}
State after removing bar: {state,[baz,foo]}