I have tried the function below.
Example:
dna = "ACGTGGTCTTAA"
function to_rna(dna)
for (nucleotides1, nucleotides2) in zip("GCTA", "CGAU")
dna = replace(dna, nucleotides1 => nucleotides2)
end
return dna
end
Output: "UGGUGGUGUUUU", which is not what I expected.
Expected output: "UGCACCAGAAUU"
Can somebody point out what went wrong?
You're performing the replacement of each letter in sequence:
julia> function to_rna(dna)
for (nucleotides1, nucleotides2) in zip("GCTA", "CGAU")
dna = replace(dna, nucleotides1 => nucleotides2)
@show nucleotides1 => nucleotides2
@show dna
end
return dna
end
to_rna (generic function with 1 method)
julia> to_rna(dna)
nucleotides1 => nucleotides2 = 'G' => 'C'
dna = "ACCTCCTCTTAA"
nucleotides1 => nucleotides2 = 'C' => 'G'
dna = "AGGTGGTGTTAA"
nucleotides1 => nucleotides2 = 'T' => 'A'
dna = "AGGAGGAGAAAA"
nucleotides1 => nucleotides2 = 'A' => 'U'
dna = "UGGUGGUGUUUU"
"UGGUGGUGUUUU"
julia> dna
"ACGTGGTCTTAA"
That is, after the first step you can no longer tell a freshly written RNA C from an original DNA C, and every later step has the same problem.
You'd want it to work in parallel -- somehow like this:
julia> to_rna2(dna) = map(dna) do nucleotide
NUCLEOTIDE_MAPPING[nucleotide]
end
to_rna2 (generic function with 1 method)
julia> NUCLEOTIDE_MAPPING = Dict(n1 => n2 for (n1, n2) in zip("GCTA", "CGAU"))
Dict{Char,Char} with 4 entries:
'A' => 'U'
'G' => 'C'
'T' => 'A'
'C' => 'G'
julia> to_rna2(dna)
"UGCACCAGAAUU"
This also removes the unnecessary work of iterating over the string four times.
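As a rough cross-language illustration of the same point (in Python, not the Julia of the question), chained replacements reproduce the bug, while a one-pass translation table does not. The function names here are made up for this sketch.

```python
DNA = "GCTA"
RNA = "CGAU"

def to_rna_sequential(dna: str) -> str:
    """Buggy on purpose: each pass also rewrites letters written by earlier passes."""
    for src, dst in zip(DNA, RNA):
        dna = dna.replace(src, dst)
    return dna

# str.maketrans builds a per-character lookup table;
# str.translate applies it to every character in a single pass.
_TABLE = str.maketrans(DNA, RNA)

def to_rna_parallel(dna: str) -> str:
    """Map each nucleotide independently of all the others."""
    return dna.translate(_TABLE)

print(to_rna_sequential("ACGTGGTCTTAA"))  # UGGUGGUGUUUU
print(to_rna_parallel("ACGTGGTCTTAA"))    # UGCACCAGAAUU
```

The table lookup plays the same role as NUCLEOTIDE_MAPPING: every character is looked up exactly once, so a replacement can never be replaced again.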
replace is already able to do that by itself -- if you give it an array and pass it multiple replacement arguments:
julia> replace(collect(dna), NUCLEOTIDE_MAPPING...)
12-element Array{Char,1}:
'U'
'G'
'C'
'A'
'C'
'C'
'A'
'G'
'A'
'A'
'U'
'U'
To get back a string instead of an array, you just have to join it again:
julia> replace(collect(dna), NUCLEOTIDE_MAPPING...) |> join
"UGCACCAGAAUU"
I'm having trouble asking the user to re-enter input if it doesn't match one of the string values in a specific list. What have I missed?
El_List = 'H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne'
user_instructions = ( 'Select Elements value from following list to create a molecule')
print (user_instructions )
while true :
try :
elem = str(input('add molecular formula: '))
if elem = 'H' or = 'He' or = 'Li' or = 'Be' or = 'B' or = 'C' or = 'N' or = 'O' or = 'F' or = 'Ne':
print (elem)
break,
else :
print ('Not Found in Element list)
except: ValueError :
print('add molecular formula: ')
continue
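Welcome to SO. First of all, you have a lot of indentation errors in your code. Beyond that, each comparison needs == with the variable repeated in every clause, True must be capitalized, one string was missing its closing quote, there was a stray comma after break, and except ValueError: takes no extra colon. After cleaning all of that up, it works fine on my side.
Try this: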
El_List = 'H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne'
user_instructions = ('Select Elements value from following list to create a molecule')
print(user_instructions)
while True:
    try:
        elem = str(input('add molecular formula: '))
        if elem == 'H' or elem == 'He' or elem == 'Li' or elem == 'Be' or elem == 'B' or elem == 'C' or elem == 'N' or elem == 'O' or elem == 'F' or elem == 'Ne':
            print(elem)
            break
        else:
            print('Not Found in Element list')
    except ValueError:
        print('add molecular formula: ')
        continue
Your except was indented inside your try block; it has to line up with try. Make sure all the indentation is correct. Hope this helps!
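The long chain of == comparisons can probably be replaced by a membership test with in. The following is only a sketch: the function name ask_for_element and its read/show parameters are hypothetical, added so the loop can be exercised without a keyboard, and the .strip() call is an extra assumption about how forgiving the input should be. The try/except is dropped because input() already returns a str and does not raise ValueError.

```python
ELEMENTS = ('H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne')

def ask_for_element(read=input, show=print):
    """Keep prompting until the reply is one of ELEMENTS, then return it."""
    while True:
        # strip() is an assumption: it tolerates accidental surrounding spaces
        elem = read('add molecular formula: ').strip()
        if elem in ELEMENTS:  # one membership test instead of ten comparisons
            return elem
        show('Not Found in Element list')
```

Calling ask_for_element() with no arguments prompts on the console; passing a different read function makes it easy to test.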
Consider the sentence: W U T Sample A B C D
I'm trying to use re.groups after re.search to fetch A, B, C, D (the capital letters after 'Sample'). There could be a variable number of letters.
A few unsuccessful attempts:
A = re.search('Sample\s([A-Z])\s*([A-Z])*', 'W U T Sample A B C D')
A.groups()
('A', 'B')
A = re.search('Sample\s([A-Z])(\s*([A-Z]))*', 'W U T Sample A B C D')
A.groups()
('A', ' D', 'D')
A = re.search('Sample\s([A-Z])(?:\s*([A-Z]))*', 'W U T Sample A B C D')
A.groups()
('A', 'D')
I'm expecting A.groups() to give ('A', 'B', 'C', 'D')
Taking another example, 'XSS 55 D W Sample R G Y BH' should give the output ('R', 'G', 'Y', 'B', 'H')
Most regex engines, including Python's, will overwrite a repeating capture group. So, the repeating capture group you see will just be the final one, and your current approach will not work. As a workaround, we can try first isolating the substring you want, and then applying re.findall:
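import re
sentence = "W U T Sample A B C D"  # renamed from input to avoid shadowing the builtin
# Isolate the run of capital letters that follows "Sample"
text = re.search(r'Sample\s([A-Z](?:\s*[A-Z])*)', sentence).group(1) # A B C D
# Then pull out each letter individually
result = re.findall(r'[A-Z]', text)
print(result)
['A', 'B', 'C', 'D']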
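The two-step approach can be wrapped in a small helper and checked against both sentences from the question. The name letters_after and the choice to return an empty tuple when the keyword is missing are assumptions made for this sketch.

```python
import re

def letters_after(keyword: str, sentence: str) -> tuple:
    """Return every capital letter in the run that directly follows keyword."""
    # Step 1: capture the whole run as ONE group, so nothing gets overwritten
    match = re.search(re.escape(keyword) + r'\s([A-Z](?:\s*[A-Z])*)', sentence)
    if match is None:
        return ()  # assumption: a missing keyword yields an empty result
    # Step 2: split the captured run into individual letters
    return tuple(re.findall(r'[A-Z]', match.group(1)))

print(letters_after('Sample', 'W U T Sample A B C D'))        # ('A', 'B', 'C', 'D')
print(letters_after('Sample', 'XSS 55 D W Sample R G Y BH'))  # ('R', 'G', 'Y', 'B', 'H')
```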
Right now I have the following code...
%strings = ( 'a' => 'x',
'b0' => 'y',
'b1' => 'y',
'b2' => 'y',
...
'bN' => 'y',
'c' => 'z');
....
if(grep { $_ eq $line[0] } keys %strings){
....
}
So overall, I set up this hash. $line is created by reading a file. I then check whether the first string in the line is contained in my hash. This code works perfectly. However, my problem is that the set of b keys in the hash keeps growing. For instance, right now I have to explicitly list b0 through b63. That is 64 different definitions that all just need to have the same value. Is there a way to have a regex for the hash key like b\/d\?
If you want to use a regular expression, nothing prevents you from doing so:
%strings = (
'a' => 'x',
'b\d+' => 'y',
'c' => 'z'
);
...
if( grep { $line[0] =~ /^$_$/ } keys %strings ) {
...
}
The ^ and $ are necessary to make sure the full string $line[0] matches and not only a part of it.
Bear in mind that this will be much slower than the eq comparison. On the other hand, the number of expressions to evaluate by grep will be much lower, so you may want to profile different options if the speed of execution is an issue.
Also, keep in mind that you may want to refine the regular expression. For instance, ^b\d{1,2}$ will match a b followed by one or two digits. Or even ^b[1-6]?\d$...
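For readers more at home in Python than Perl, the same idea (treating each key as a pattern that must match the whole token) might look like the sketch below. The names STRINGS and lookup are invented for illustration; re.fullmatch does the job of the ^ and $ anchors.

```python
import re

# Keys are regular expressions; plain keys like 'a' simply match themselves
STRINGS = {'a': 'x', r'b\d+': 'y', 'c': 'z'}

def lookup(token, table=STRINGS):
    """Return the value of the first key pattern matching all of token, else None."""
    for pattern, value in table.items():
        if re.fullmatch(pattern, token):  # whole-string match, like /^...$/
            return value
    return None

print(lookup('b17'))  # y
print(lookup('ab1'))  # None
```

As with the Perl version, this is a linear scan over the patterns, so it trades lookup speed for a shorter table.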
If I understood you correctly,
b\d+
This will match "b" followed by one or more digits.
my %strings = ('a' => 'x',
# the outer parentheses stop map from also consuming 'c' => 'z'
(map { ("b$_" => 'y') } 0..63),
'c' => 'z');
That should do the trick, if it is what you want.
If you need to add a b value later in the code, you can still do $strings{"b$value"} = 'y'; to add the new key to the hash.
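The same approach, generating the b keys programmatically and keeping plain key lookup, can be sketched in Python with a dict comprehension. The variable name strings mirrors the Perl hash, and the range of 64 keys is taken from the question.

```python
# Build b0..b63 in a comprehension and splice them between the fixed entries
strings = {'a': 'x', **{f'b{i}': 'y' for i in range(64)}, 'c': 'z'}

# Adding another b key later is an ordinary assignment
value = 64
strings[f'b{value}'] = 'y'

print(len(strings))     # 67
print('b63' in strings) # True
```

Because every key is a literal string, membership tests stay constant-time, unlike the regex scan.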
I wrote a regex containing named groups. I can see that the RegexMatch object contains the keys and values, but I'm not sure how I would extract a dictionary from it...
For example, if I have
r"(?<my_num>\d*) (?<the_rest>.*)"
how would I get a dictionary containing the keys "my_num" and "the_rest", without having to keep track of those keys elsewhere?
To simply access the groups, use:
julia> m = match(r"^([a-z]*)([1-9]*)([a-z]*)$","abc123abc")
RegexMatch("abc123abc", 1="abc", 2="123", 3="abc")
julia> m[1]
"abc"
julia> m[2]
"123"
or to create a dictionary with the group numbers as keys:
julia> d = Dict(i=>m[i] for i=1:3)
Dict{Int64,SubString{String}} with 3 entries:
2 => "123"
3 => "abc"
1 => "abc"
or if you meant to work with named groups as keys:
julia> re = r"(?P<group1>a*)(?P<group2>b*)$"
r"(?P<group1>a*)(?P<group2>b*)$"
julia> m = match(re,"aaabbb")
RegexMatch("aaabbb", group1="aaa", group2="bbb")
julia> d = Dict(Symbol(n)=>m[Symbol(n)] for n in values(Base.PCRE.capture_names(m.regex.regex)))
Dict{Symbol,SubString{String}} with 2 entries:
:group1 => "aaa"
:group2 => "bbb"
julia> d[:group1]
"aaa"
julia> m[2]
"bbb"
I have this cell array in MATLAB:
y = { 'd' 'f' 'a' 'g' 'g' 'a' 'w' 'h'}
I use unique(y) to get rid of the duplicates but it rearranges the strings in alphabetical order:
>> unique(y)
ans =
'a' 'd' 'f' 'g' 'h' 'w'
I want to remove the duplicates but keep the same order. I know I could write a function to do this, but I was wondering whether there is a simpler way to use unique so that the duplicates are removed and the original order is kept.
I want it to return this:
>> unique(y)
ans =
'd' 'f' 'a' 'g' 'w' 'h'
Here's one solution that uses some additional input and output arguments that UNIQUE has:
>> y = { 'd' 'f' 'a' 'g' 'g' 'a' 'w' 'h'}; %# Sample data
>> [~,index] = unique(y,'first'); %# Capture the index, ignore the actual values
>> y(sort(index)) %# Index y with the sorted index
ans =
'd' 'f' 'a' 'g' 'w' 'h'
In MATLAB R2012a, a new order flag was added:
>> y = {'d' 'f' 'a' 'g' 'g' 'a' 'w' 'h'};
>> unique(y, 'stable')
ans =
'd' 'f' 'a' 'g' 'w' 'h'
If you look at the documentation for unique, there's an option to return an index along with the sorted array. You can also specify whether the index should point to the first or the last occurrence of each value.
For example:
a=[5, 3, 4, 2, 1, 5, 4];
[b,order]=unique(a,'first')
returns
b=[1, 2, 3, 4, 5] and order=[5, 4, 2, 3, 1]
You can then sort your order array and keep the resulting index
[~,index]=sort(order) %# use a throw-away variable instead of ~ for older versions
and finally re-index b
b=b(index)
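For comparison, both ideas from the answers (a built-in stable option, and sorting the first-occurrence indices) can be sketched in Python. The function names are invented for this illustration, and note that Python indices start at 0 where MATLAB's start at 1.

```python
def unique_stable(items):
    """Drop duplicates, keeping first occurrences in their original order."""
    # dicts preserve insertion order (Python 3.7+), so the keys come out ordered
    return list(dict.fromkeys(items))

def unique_via_first_index(items):
    """Same result, spelled out like the index-based approach."""
    first_index = {}
    for i, item in enumerate(items):
        first_index.setdefault(item, i)  # remember only the first position
    # Sorting the first positions restores the original order
    return [items[i] for i in sorted(first_index.values())]

y = ['d', 'f', 'a', 'g', 'g', 'a', 'w', 'h']
print(unique_stable(y))                               # ['d', 'f', 'a', 'g', 'w', 'h']
print(unique_via_first_index([5, 3, 4, 2, 1, 5, 4]))  # [5, 3, 4, 2, 1]
```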