I am trying to draw a line that goes through the given blobs. The following is an example.
I want a curved line that goes horizontally through multiple blobs, as shown below.
Just as an example:
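import cv2
import numpy as np

img = cv2.imread('image.jpg')
# cv2.imread returns BGR, so convert with COLOR_BGR2GRAY
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# findContours expects a binary image (the threshold value 127 is an assumption)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# OpenCV 3 returns (image, contours, hierarchy), OpenCV 4 returns (contours, hierarchy);
# taking element [-2] works with both
contours = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
# biggest area
target = max(contours, key=cv2.contourArea)
cv2.drawContours(img, [target], -1, [0, 0, 255], -1)  # debug
# just an example of fitting
x = target[:, :, 0].flatten()
y = target[:, :, 1].flatten()
poly = np.poly1d(np.polyfit(x, y, 5))
for _x in range(min(x), max(x), 5):  # too lazy for line/curve :)
    cv2.circle(img, (int(_x), int(poly(_x))), 3, [0, 255, 0])
cv2.imshow('result', img)
cv2.waitKey(0)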
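The snippet above fits only the largest contour. Since the question asks for one curve running horizontally through several blobs, one possible variation is to reduce each blob to a single point and fit through those points. Below is a minimal NumPy-only sketch, assuming OpenCV-style contours (integer arrays of shape (N, 1, 2), as returned by cv2.findContours). The helper names blob_centroids and fit_curve_through are hypothetical, and the mean of the boundary points is used as a rough stand-in for the area centroid you would get from cv2.moments.

```python
import numpy as np


def blob_centroids(contours):
    """Approximate centre of each contour as the mean of its boundary points.

    Assumes OpenCV-style contours: integer arrays of shape (N, 1, 2) holding (x, y).
    """
    return np.array([c.reshape(-1, 2).mean(axis=0) for c in contours])


def fit_curve_through(centroids, degree=2, step=5):
    """Fit y = poly(x) through the centroids and sample it every `step` pixels."""
    xs, ys = centroids[:, 0], centroids[:, 1]
    poly = np.poly1d(np.polyfit(xs, ys, degree))
    x_samples = np.arange(xs.min(), xs.max() + 1, step)
    pts = np.column_stack([x_samples, poly(x_samples)]).round().astype(np.int32)
    return poly, pts.reshape(-1, 1, 2)  # (N, 1, 2) int32, the layout cv2.polylines accepts


# Example with synthetic square "blobs" centred on y = x**2 / 100 + 10
centres = [(50, 35), (150, 235), (250, 635), (350, 1235), (450, 2035)]
contours = [
    np.array([[[cx - 5, cy - 5]], [[cx + 5, cy - 5]], [[cx + 5, cy + 5]], [[cx - 5, cy + 5]]],
             dtype=np.int32)
    for cx, cy in centres
]
poly, pts = fit_curve_through(blob_centroids(contours), degree=2)
print(poly)
```

The returned pts can then be drawn as one continuous curve with cv2.polylines(img, [pts], False, (0, 255, 0), 2) instead of a series of circles. Keep degree below the number of blobs, otherwise the fit is underdetermined.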
Just for fun, and employing the Perl philosophy of TMTOWTDI ("There's More Than One Way To Do It"), I extracted all the white points of your contours into a file called points.dat and fed that into gnuplot to fit a curve, which gave me this formula for the best-fit line:
y=3.10869110524588e-07*x*x*x -0.000972406154863963*x*x + 0.861790477479291*x + 307.220397010312
And then I plotted that in red on your original contours using awk and ImageMagick.
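#!/bin/bash
# Threshold the image and dump the coordinates of every white pixel as "x y"
convert contours.jpg -colorspace gray -threshold 50% txt: | awk -F: '/white/{print $1}' | tr ',' ' ' > points.dat

# Fit a cubic with gnuplot and print the fitted coefficients
{ echo 'f(x) = a*x**3 + b*x**2 + c*x + d'; \
  echo 'fit f(x) "points.dat" via a, b, c, d'; \
  echo 'print a,"*u^3 + ",b,"*u^2 + ",c,"*u + ",d'; \
} | gnuplot 2>&1 | tail -1

# Write an ImageMagick MVG file with one red point per x-coordinate from 0 to 1503
awk 'BEGIN{
   print "fill red"
   for(x=0;x<1504;x++){
      y=3.10869110524588e-07*x*x*x -0.000972406154863963*x*x + 0.861790477479291*x + 307.220397010312
      y=int(y)
      print "point",x,y
   }
}' /dev/null > p.mvg

# Draw the points from the MVG file onto the original image
convert contours.jpg -draw @p.mvg z.png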
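For readers without gnuplot, the same least-squares cubic fit can be sketched with NumPy's np.polyfit. Because the model is linear in a, b, c and d, gnuplot's iterative fit should converge to essentially the same coefficients. This is an illustrative equivalent, not what the answer above used: it assumes points.dat holds whitespace-separated "x y" pairs (which np.loadtxt("points.dat") can read), and the helper names fit_cubic and mvg_points are hypothetical. The self-check generates points from the cubic quoted above instead of reading the real file.

```python
import numpy as np


def fit_cubic(points):
    """Least-squares fit of y = a*x**3 + b*x**2 + c*x + d to an (N, 2) array of (x, y)."""
    x, y = points[:, 0], points[:, 1]
    return np.poly1d(np.polyfit(x, y, 3))  # coefficients ordered a, b, c, d


def mvg_points(poly, width, colour="red"):
    """ImageMagick MVG lines: one point per x in [0, width), y truncated like awk's int()."""
    lines = [f"fill {colour}"]
    lines += [f"point {x} {int(poly(x))}" for x in range(width)]
    return lines


# Self-check: points generated from the cubic in the answer (stand-in for np.loadtxt("points.dat"))
true_coeffs = [3.10869110524588e-07, -0.000972406154863963,
               0.861790477479291, 307.220397010312]
xs = np.arange(0, 1504, dtype=float)
points = np.column_stack([xs, np.polyval(true_coeffs, xs)])

poly = fit_cubic(points)
lines = mvg_points(poly, 1504)
print(lines[:5])
# To write the file: open("p.mvg", "w").write("\n".join(lines) + "\n")
```

The first few generated lines line up with the start of p.mvg shown below.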
The start of points.dat looks like this:
769 453
770 453
771 453
772 453
773 453
769 454
765 455
766 455
767 455
768 455
...
...
The start of p.mvg looks like this:
fill red
point 0 307
point 1 308
point 2 308
point 3 309
point 4 310
point 5 311
point 6 312
point 7 313
point 8 314
...
...
I have a function that takes all the (non-distinct) MatchId rows together with the paired xG_Team1 and xG_Team2 values and returns an array, which is then summed to give the SSE.
The problem is that the function iterates over every row, so each MatchId is processed once per row instead of once per match. I want to stop this.
For each distinct MatchId I need the corresponding home and away goal times as lists, i.e. Home_Goal and Away_Goal, taken from the Home_Goal_Time and Away_Goal_Time columns of the dataframe, to be used in each iteration. The lists below don't seem to work.
MatchId Event_Id EventCode Team1 Team2 Team1_Goals
0 842079 2053 Goal Away Huachipato Cobresal 0
1 842079 2053 Goal Away Huachipato Cobresal 0
2 842080 1029 Goal Home Slovan lava 3
3 842080 1029 Goal Home Slovan lava 3
4 842080 2053 Goal Away Slovan lava 3
5 842080 1029 Goal Home Slovan lava 3
6 842634 2053 Goal Away Rosario Boca Juniors 0
7 842634 2053 Goal Away Rosario Boca Juniors 0
8 842634 2053 Goal Away Rosario Boca Juniors 0
9 842634 2054 Cancel Goal Away Rosario Boca Juniors 0
Team2_Goals xG_Team1 xG_Team2 CurrentPlaytime Home_Goal_Time Away_Goal_Time
0 2 1.79907 1.19893 2616183 0 87
1 2 1.79907 1.19893 3436780 0 115
2 1 1.70662 1.1995 3630545 121 0
3 1 1.70662 1.1995 4769519 159 0
4 1 1.70662 1.1995 5057143 0 169
5 1 1.70662 1.1995 5236213 175 0
6 2 0.82058 1.3465 2102264 0 70
7 2 0.82058 1.3465 4255871 0 142
8 2 0.82058 1.3465 5266652 0 176
9 2 0.82058 1.3465 5273611 0 0
For example, for MatchId = 842079: Home_Goal = [], Away_Goal = [87, 115]
import numpy as np

x1 = [1, 0, 0]
x2 = [0, 1, 0]
x3 = [0, 0, 1]
m = 1  # arbitrary constant used to optimise the SSE
k = 196
total_timeslot = 196
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1 - (xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)
results
sum(results.sum())
For the three matches above, the desired outcome should look like the following.
If I need an individual SSE, sum(sum_squared_diff(x1, x2, x3, y)) gives me the following:
MatchId = 842079 = 3.984053038520635
MatchId = 842080 = 7.882189570700502
MatchId = 842634 = 5.929085973050213
Given the size of the original data, realistically I am after the total SSE. For the sample data above, simply adding up these values gives a total SSE = 17.79532858227135. Once I achieve this, I will try to optimise the SSE by updating the arbitrary value m.
Here are the lists I hoped the function would iterate over:
Home_scored = s.groupby('MatchId')['Home_Goal_Time'].apply(list)
Away_scored = s.groupby('MatchId')['Away_Goal_Time'].apply(list)
type(Home_scored)
pandas.core.series.Series
Then I convert them to lists:
Home_Goal = Home_scored.tolist()
Away_Goal = Away_scored.tolist()
type(Home_Goal)
list
Home_Goal
Out[303]: [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]
Away_Goal
Out[304]: [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]
But the function still treats Home_Goal and Away_Goal as empty lists.
If you only want to consider one MatchId at a time, you should .groupby('MatchId') first:
df.groupby('MatchId').apply(...)
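Building on that suggestion, here is a minimal sketch that computes one SSE per MatchId and then the total. It passes each match's goal lists into the function explicitly instead of reading the global Home_Goal and Away_Goal, which is why the original version always saw empty lists. It assumes, as the sample data suggests, that xG_Team1 and xG_Team2 are identical on every row of a match and that a goal time of 0 means "no goal" on that row. The helper name match_sse and the changed sum_squared_diff signature are hypothetical. On the sample rows it should reproduce the per-match values quoted in the question, up to the rounding of the displayed xG values.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (only the columns the SSE needs).
df = pd.DataFrame({
    "MatchId": [842079] * 2 + [842080] * 4 + [842634] * 4,
    "xG_Team1": [1.79907] * 2 + [1.70662] * 4 + [0.82058] * 4,
    "xG_Team2": [1.19893] * 2 + [1.1995] * 4 + [1.3465] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

x1 = np.array([1, 0, 0])  # no goal in the timeslot
x2 = np.array([0, 1, 0])  # home goal
x3 = np.array([0, 0, 1])  # away goal
k = 196
total_timeslot = 196


def sum_squared_diff(y, home_goals, away_goals):
    """SSE over all timeslots; the goal lists are arguments, not globals."""
    total = 0.0
    for t in range(total_timeslot):
        target = x2 if t in home_goals else x3 if t in away_goals else x1
        total += np.sum((target - y) ** 2)
    return total


def match_sse(group, m=1.0):
    """SSE for one match; assumes the xG values are the same on every row of the match."""
    xg1, xg2 = group["xG_Team1"].iloc[0], group["xG_Team2"].iloc[0]
    y = np.array([1 - (xg1 * m + xg2 * m) / k, xg1 * m / k, xg2 * m / k])
    # Assumption: a goal time of 0 marks a row without a goal for that side.
    home = set(group.loc[group["Home_Goal_Time"] > 0, "Home_Goal_Time"].tolist())
    away = set(group.loc[group["Away_Goal_Time"] > 0, "Away_Goal_Time"].tolist())
    return sum_squared_diff(y, home, away)


# One SSE per distinct MatchId, so no match is counted once per row.
per_match = pd.Series({mid: match_sse(g) for mid, g in df.groupby("MatchId")})
total = per_match.sum()
print(per_match)
print(total)
```

To tune m afterwards, the total could be wrapped as a function of m, e.g. lambda m: sum(match_sse(g, m) for _, g in df.groupby("MatchId")), and handed to an optimiser such as scipy.optimize.minimize_scalar.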
I have the following DataFrame named mydf:
A B
0 3de (1ABS) Adiran
1 3SA (SDAS) Adel
2 7A (ASA) Ronni
3 820 (SAAa) Emili
I want to remove the " (xxxx)" part and keep the remaining values in column A, so the DataFrame (mydf) will look like:
A B
0 3de Adiran
1 3SA Adel
2 7A Ronni
3 820 Emili
I have tried:
print(mydf['A'].apply(lambda x: re.sub(r" \(.+\)", "", x)))
but then I get a Series object back, not a DataFrame.
I have also tried to use replace:
df.replace([' \(.*\)'], [""], regex=True), but it didn't change anything.
What am I doing wrong?
Thank you!
You can use the str.split() method:
In [3]: df.A = df.A.str.split(r'\s+\(').str[0]
In [4]: df
Out[4]:
A B
0 3de Adiran
1 3SA Adel
2 7A Ronni
3 820 Emili
or the str.extract() method:
In [9]: df.A = df.A.str.extract(r'([^\(\s]*)', expand=False)
In [10]: df
Out[10]:
A B
0 3de Adiran
1 3SA Adel
2 7A Ronni
3 820 Emili
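The underlying issue in the question is that both attempts produce a new object: Series.apply returns a Series, and DataFrame.replace returns a new DataFrame unless the result is assigned back (or inplace=True is used). A minimal sketch, rebuilding the sample mydf from the question, that keeps the cleaned values by assigning them back:

```python
import pandas as pd

mydf = pd.DataFrame({
    "A": ["3de (1ABS)", "3SA (SDAS)", "7A (ASA)", "820 (SAAa)"],
    "B": ["Adiran", "Adel", "Ronni", "Emili"],
})

# Series.str.replace returns a new Series; assigning it back updates the DataFrame.
# The pattern removes optional whitespace followed by a parenthesised group.
mydf["A"] = mydf["A"].str.replace(r"\s*\(.*?\)", "", regex=True)

# The DataFrame.replace attempt from the question also works once its result is kept.
other = pd.DataFrame({"A": ["3de (1ABS)"], "B": ["Adiran"]})
other = other.replace(r" \(.*\)", "", regex=True)

print(mydf)
```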
I want to print the atom sequences that describe the ribose puckering.
Script in Perl:
open(filehandler, "List_NAD_ID.txt") or die $!;  # input file: one PDB file name per line
my @file1 = <filehandler>;
close(filehandler);
my $OutputDir = 'C:\Users\result';  # output directory path
foreach my $line (@file1)
{
    chomp $line;
    open(fh, "$line") or die $!;
    open(out, ">$OutputDir/$line.pdb") or die $!;
    print out "\n", "$line ";
    print out "\n";
    while (my $file = <fh>)
    {
        # keep only the five ribose atoms (the original alternation listed them repeatedly)
        if ($file =~ /^HETATM.{7}(?:C4B|O4B|C1B|C2B|C3B)/)
        {
            print out "$file";
        }
    }
    close(fh);
    close(out);
    print "Completed", "\n";
}
I have a PDB input file:
HETATM 3934 C4B NAD A 255 10.495 -11.444 1.016 1.00 50.46 C
HETATM 3935 O4B NAD A 255 10.768 -11.615 2.448 1.00 48.17 O
HETATM 3936 C3B NAD A 255 10.445 -12.867 0.431 1.00 49.69 C
HETATM 3938 C2B NAD A 255 10.431 -13.759 1.675 1.00 48.46 C
HETATM 3940 C1B NAD A 255 11.323 -12.898 2.593 1.00 46.97 C
HETATM 3978 C4B NAD B 256 14.596 1.733 33.219 1.00 50.48 C
HETATM 3979 O4B NAD B 256 14.370 0.578 32.357 1.00 48.22 O
HETATM 3980 C3B NAD B 256 14.940 1.177 34.603 1.00 49.64 C
HETATM 3982 C2B NAD B 256 14.987 -0.347 34.401 1.00 48.48 C
HETATM 3984 C1B NAD B 256 14.066 -0.517 33.189 1.00 46.98 C
Expected result:
I want to copy the following atoms and then paste them in the sequence below. Everything should be done chain-wise (chain A, B, C, ...).
HETATM 3934 C4B NAD A 255 10.495 -11.444 1.016 1.00 50.46 C
HETATM 3935 O4B NAD A 255 10.768 -11.615 2.448 1.00 48.17 O
HETATM 3938 C2B NAD A 255 10.431 -13.759 1.675 1.00 48.46 C
HETATM 3940 C1B NAD A 255 11.323 -12.898 2.593 1.00 46.97 C
HETATM 3935 O4B NAD A 255 10.768 -11.615 2.448 1.00 48.17 O
HETATM 3940 C1B NAD A 255 11.323 -12.898 2.593 1.00 46.97 C
HETATM 3938 C2B NAD A 255 10.431 -13.759 1.675 1.00 48.46 C
HETATM 3936 C3B NAD A 255 10.445 -12.867 0.431 1.00 49.69 C
.
.
.
I have five levels of paste sequence: v0, v1, v2, v3, v4.
The sequences are:
C4B-O4B-C1B-C2B
O4B-C1B-C2B-C3B
C1B-C2B-C3B-C4B
C2B-C3B-C4B-O4B
C3B-C4B-O4B-C1B
I want to print the data in the order given by each of these sequences. I have also edited the expected result.
I want to sort the data according to the above sequences, chain-wise, but I am not getting the expected result. I have tried in Perl. I am new to Perl and Python, so please help me solve this problem.
It's like a matrix problem:
for example, with five values 1, 2, 3, 4, 5:
Row 1 - 1 2 3 4
Row 2 - 2 3 4 5
Row 3 - 3 4 5 1
Row 4 - 4 5 1 2
I want to print the data like that for each chain, from chain A to Z.
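A minimal pure-Python sketch of that "matrix" idea, for illustration only. It assumes the five ribose atoms form the ring C4B-O4B-C1B-C2B-C3B (the order implied by the five sequences listed above), that each chain holds one NAD, and that records are whitespace-separated as in the pasted sample (atom name in the third field, chain ID in the fifth). Real PDB files are fixed-column, so line[12:16] and line[21] would be more robust there. The names RING, ring_windows and reorder, and the "v0 (chain A)" label lines, are hypothetical.

```python
from collections import defaultdict

# Ring order implied by the five sequences in the question (assumption).
RING = ["C4B", "O4B", "C1B", "C2B", "C3B"]


def ring_windows(ring, size=4):
    """Every run of `size` consecutive ring atoms, wrapping around (the 'matrix' rows)."""
    n = len(ring)
    return [[ring[(start + i) % n] for i in range(size)] for start in range(n)]


def reorder(lines):
    """Group ribose HETATM records by chain, then emit them in each window's order."""
    chains = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) > 4 and fields[0] == "HETATM" and fields[2] in RING:
            chains[fields[4]][fields[2]] = line.rstrip("\n")  # fields[4] is the chain ID
    out = []
    for chain_id in sorted(chains):
        atoms = chains[chain_id]
        for level, window in enumerate(ring_windows(RING)):
            out.append(f"v{level} (chain {chain_id})")  # hypothetical label line
            out.extend(atoms[name] for name in window if name in atoms)
    return out


SAMPLE = """\
HETATM 3934 C4B NAD A 255 10.495 -11.444 1.016 1.00 50.46 C
HETATM 3935 O4B NAD A 255 10.768 -11.615 2.448 1.00 48.17 O
HETATM 3936 C3B NAD A 255 10.445 -12.867 0.431 1.00 49.69 C
HETATM 3938 C2B NAD A 255 10.431 -13.759 1.675 1.00 48.46 C
HETATM 3940 C1B NAD A 255 11.323 -12.898 2.593 1.00 46.97 C
HETATM 3978 C4B NAD B 256 14.596 1.733 33.219 1.00 50.48 C
HETATM 3979 O4B NAD B 256 14.370 0.578 32.357 1.00 48.22 O
HETATM 3980 C3B NAD B 256 14.940 1.177 34.603 1.00 49.64 C
HETATM 3982 C2B NAD B 256 14.987 -0.347 34.401 1.00 48.48 C
HETATM 3984 C1B NAD B 256 14.066 -0.517 33.189 1.00 46.98 C
""".splitlines()

for record in reorder(SAMPLE):
    print(record)
```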
If you want to use Biopython, you have to create all the Chains and insert the Atoms into them. But the atoms must be held in a Residue for this to work:
from Bio.PDB import PDBParser, PDBIO, Chain, Residue

# This is your source structure
pdb = PDBParser().get_structure("UGLY", "ugly.pdb")

# Now you cycle all your chains
for chain in pdb.get_chains():
    # Load all the atoms and residues in each Chain
    atoms = list(chain.get_atoms())
    residues = list(chain.get_residues())

    # Start a new structure to save the output
    io = PDBIO()
    this_chain = Chain.Chain("A")
    this_residue = Residue.Residue(residues[0].id,
                                   residues[0].resname,
                                   residues[0].segid)

    # Now get the atoms in your source structure that match your sort keys
    # You should refactor this out to a function that accepts a sort key
    # and returns a list of atoms or a residue with the atoms added.
    for atom_name in "O4B-C1B-C2B-C3B".split("-"):
        for atom in atoms:
            if atom.get_name() == atom_name:
                this_residue.add(atom)

    # Add the residue to a structure and save it
    this_chain.add(this_residue)
    io.set_structure(this_chain)
    # And now write your output file. Remember to change the name!
    io.save("temp.pdb")
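Following the comment in the answer about refactoring, here is a hedged sketch of such a helper. It relies only on atom.get_name(), which Bio.PDB atoms provide, so it can be tried without Biopython using the small stand-in class below. It assumes atom names are unique within a chain (one NAD per chain). If you build several residues from the same atoms, adding atom.copy() rather than the atom itself is probably safer, because Entity.add() re-parents the atom it is given. The names SORT_KEYS, atoms_for_key and FakeAtom are hypothetical.

```python
# The five sort keys listed in the question.
SORT_KEYS = ["C4B-O4B-C1B-C2B", "O4B-C1B-C2B-C3B", "C1B-C2B-C3B-C4B",
             "C2B-C3B-C4B-O4B", "C3B-C4B-O4B-C1B"]


def atoms_for_key(atoms, sort_key):
    """Return the atoms named in sort_key (e.g. "O4B-C1B-C2B-C3B"), in that order."""
    by_name = {atom.get_name(): atom for atom in atoms}
    return [by_name[name] for name in sort_key.split("-") if name in by_name]


class FakeAtom:
    """Stand-in for Bio.PDB.Atom.Atom in this example: only get_name() is needed."""

    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name


chain_atoms = [FakeAtom(n) for n in ["C4B", "O4B", "C3B", "C2B", "C1B"]]
for key in SORT_KEYS:
    print(key, [a.get_name() for a in atoms_for_key(chain_atoms, key)])
```

In the Biopython loop above, the inner double loop would become one call per key, for example for atom in atoms_for_key(atoms, key): this_residue.add(atom.copy()), with one new Residue and one output file per key.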
I have a large chunk of class data that I need to run a regular expression on and get data back from. The problem is that I need a repeating capturing group to accomplish that.
Womn St 157A QUEERHISTORY MAKING
CCode Typ Sec Unt Instructor Time Place Max Enr Req Rstr Status
32680 LEC A 4 SHAH, P. TuTh 11:00-12:20p IAB 131 35 37 60 FULL
Womn St 171 SEX/RACE & CONQUEST
CCode Typ Sec Unt Instructor Time Place Max Enr Req Rstr Status
32710 LEC A 4 O'TOOLE, R. TuTh 2:00- 3:20p DBH 1300 52 13/45 24 OPEN
~ Same as 25610 (GlblClt 103B, Lec A); 26350 (History 169, Lec A); and
~ 60320 (Anthro 139, Lec B).
32711 DIS 1 0 MONSON, A. W 9:00- 9:50 HH 105 25 5/23 8 OPEN
O'TOOLE, R.
~ Same as 25612 (GlblClt 103B, Dis 1); 26351 (History 169, Dis 1); and
~ 60321 (Anthro 139, Dis 1).
The result I need would return two matches:
Match
Group1:Womn St 157A
Group2:QUEERHISTORY MAKING
Group3:32680
Group4:LEC
Group5:A
Group6:SHAH, P.
Group7:TuTh 11:00-12:20p
Group8:IAB 131
Match
Group1:Womn St 171
Group2:SEX/RACE & CONQUEST
Group3:32710
Group4:LEC
Group5:A
Group6:O'TOOLE, R.
Group7:TuTh 2:00- 3:20p
Group8:DBH 1300
Group9:25610
Group10:26350
Group11:60320
Group12:32711
Group13:DIS
Group14:1
Group15:MONSON, A.
Group16: W 9:00- 9:50
Group17:HH 105
Group18:25612
Group19:26351
Group20:60321
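Python's re module (like most regex engines) keeps only the last iteration of a repeated capturing group, so a single pattern cannot return a variable number of numbered groups per match. A hedged alternative is to match each kind of line with its own pattern and accumulate the captures per course, which yields the same flat field lists as the desired output. The sketch assumes the single-space layout shown in the question (a fixed-width original might need adjusted patterns), skips the Unt and Max/Enr/Req/Rstr/Status columns and extra instructor lines, and requires Python 3.8+ for the := operator. The pattern and function names are hypothetical. The third-party regex module also offers Match.captures() for repeated groups, if a single pattern is preferred.

```python
import re

TEXT = """\
Womn St 157A QUEERHISTORY MAKING
CCode Typ Sec Unt Instructor Time Place Max Enr Req Rstr Status
32680 LEC A 4 SHAH, P. TuTh 11:00-12:20p IAB 131 35 37 60 FULL
Womn St 171 SEX/RACE & CONQUEST
CCode Typ Sec Unt Instructor Time Place Max Enr Req Rstr Status
32710 LEC A 4 O'TOOLE, R. TuTh 2:00- 3:20p DBH 1300 52 13/45 24 OPEN
~ Same as 25610 (GlblClt 103B, Lec A); 26350 (History 169, Lec A); and
~ 60320 (Anthro 139, Lec B).
32711 DIS 1 0 MONSON, A. W 9:00- 9:50 HH 105 25 5/23 8 OPEN
O'TOOLE, R.
~ Same as 25612 (GlblClt 103B, Dis 1); 26351 (History 169, Dis 1); and
~ 60321 (Anthro 139, Dis 1).
"""

# Course header, e.g. "Womn St 157A QUEERHISTORY MAKING"
COURSE_RE = re.compile(r"^(?P<course>[A-Za-z][A-Za-z ]*? \d+[A-Z]*) (?P<title>[A-Z].*)$")
# Section row: CCode, Typ, Sec, Unt (skipped), instructor "NAME, X.", days/time, place
SECTION_RE = re.compile(
    r"^(?P<ccode>\d{5}) (?P<typ>[A-Z]{3}) (?P<sec>\S+) \S+ "
    r"(?P<instructor>.+?, [A-Z]\.) "
    r"(?P<time>.+?\d{1,2}:\d{2}p?) "
    r"(?P<place>[A-Z]+ \d+)\b"
)
# Cross-listed codes on "~ Same as ..." lines: five digits followed by " ("
SAME_AS_RE = re.compile(r"\b(\d{5}) \(")


def parse_courses(text):
    courses = []  # one flat list of captured fields per course
    for line in text.splitlines():
        line = line.strip()
        if m := COURSE_RE.match(line):
            courses.append([m["course"], m["title"]])
        elif courses and (m := SECTION_RE.match(line)):
            courses[-1].extend(m.group("ccode", "typ", "sec", "instructor", "time", "place"))
        elif courses and line.startswith("~"):
            courses[-1].extend(SAME_AS_RE.findall(line))
        # anything else (column headers, extra instructor lines) is ignored
    return courses


for fields in parse_courses(TEXT):
    print(fields)
```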
I have a data.frame called rbp that contains a single column like the following:
>rbp
V1
dd_smadV1_39992_0_1
Protein: AGBT(Dm)
Sequence Position
234
290
567
126
Protein: ATF1(Dm)
Sequence Position
534
890
105
34
128
301
Protein: Pox(Dm)
201
875
453
*********************
dd_smadv1_9_02
Protein: foxc2(Mm)
Sequence Position
145
987
345
907
Protein: Lor(Hs)
876
512
I would like to discard the sequence positions and extract only specific details, namely the sequence names and their corresponding protein names, like the following:
dd_smadV1_39992_0_1 AGBT(Dm);ATF1(Dm);Pox(Dm)
dd_smadv1_9_02 foxc2(Mm);Lor(Hs)
I tried the following code in R but it failed:
library(gsubfn)
Sub(rbp$V1,"Protein:(.*?) ")
Could anyone guide me, please?
Here's one way to do it:
m <- gregexpr("Protein: (.*?)\n", x <- strsplit(paste(rbp$V1, collapse = "\n"), "*********************", fixed = TRUE)[[1]])
proteins <- lapply(regmatches(x, m), function(x) sub("Protein: (.*)\n", "\\1", x))
names <- sub(".*?([A-z0-9_]+)\n.*", "\\1", x)
sprintf("%s %s", names, sapply(proteins, paste, collapse = ";"))
# [1] "dd_smadV1_39992_0_1 AGBT(Dm);ATF1(Dm);Pox(Dm)"
# [2] "dd_smadv1_9_02 foxc2(Mm);Lor(Hs)"
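For comparison, the same grouping logic can be sketched in Python as a line-by-line pass instead of a regex over the pasted text. It assumes the column's values are available as a list of strings in the order shown, and that any line other than a "Protein:" line, a "Sequence Position" header, a bare number or a row of asterisks starts a new sequence block (a "Protein:" line before any name would raise an error). The function name summarize is hypothetical.

```python
def summarize(values):
    """Collapse the single-column layout into 'name protein1;protein2;...' strings."""
    records = []  # list of (sequence name, [protein names])
    for raw in values:
        line = raw.strip()
        if not line or line.isdigit() or line == "Sequence Position" or set(line) == {"*"}:
            continue  # positions, headers and separators are discarded
        if line.startswith("Protein:"):
            records[-1][1].append(line[len("Protein:"):].strip())
        else:
            records.append((line, []))  # anything else starts a new sequence block
    return [f"{name} {';'.join(proteins)}" for name, proteins in records]


V1 = [
    "dd_smadV1_39992_0_1", "Protein: AGBT(Dm)", "Sequence Position", "234", "290", "567", "126",
    "Protein: ATF1(Dm)", "Sequence Position", "534", "890", "105", "34", "128", "301",
    "Protein: Pox(Dm)", "201", "875", "453", "*********************",
    "dd_smadv1_9_02", "Protein: foxc2(Mm)", "Sequence Position", "145", "987", "345", "907",
    "Protein: Lor(Hs)", "876", "512",
]

for row in summarize(V1):
    print(row)
```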