I know how to move, rotate, and scale, but how does skewing work? What would I have to do to a set of vertices to skew them?
Thanks
Offset X values by an amount that varies linearly with the Y value (or vice versa).
Edit: Doing this with a rectangle:
Let's say you start with a rectangle (0, 0), (4, 0), (4, 4), (0, 4). Let's assume you want to skew it with a slope of 2, so as it goes two units up, it'll move one to the right, something like this (hand drawn, so the angle's undoubtedly a bit wrong, but I hope it gives the general idea):
To get this, each X value is adjusted like:
X = X + Y * S
where S is the inverse of the slope of the skew. In this case, the slope is 2, so S = 1/2. Working that for our four corners, we get:
(0, 0) => 0 + 0 / 2 = 0 => (0, 0)
(4, 0) => 4 + 0 / 2 = 4 => (4, 0)
(4, 4) => 4 + 4 / 2 = 6 => (6, 4)
(0, 4) => 0 + 4 / 2 = 2 => (2, 4)
Skewing / shearing is described in detail at http://en.wikipedia.org/wiki/Shear_mapping and http://mathworld.wolfram.com/ShearMatrix.html
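As a small illustrative sketch (Python, assumed here just for demonstration), the same shear applied to a list of vertices:

```python
def skew_x(vertices, s):
    """Shear along X: offset each x by s * y (s is the inverse slope of the skew)."""
    return [(x + s * y, y) for x, y in vertices]

rect = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(skew_x(rect, 0.5))  # the skewed corners worked out above
```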
So far I have managed to calculate the distances between a point P(x, y) and a multitude of points stored in a list l = [(x1,y1), (x2,y2), (x3,y3), ...]. Here is the code:
import math
import pprint
l = [(1,2), (2,3), (4,5)]
p = (3,3)
dists = [math.sqrt((p[0]-l0)**2 + (p[1]-l1)**2) for l0, l1 in l]
pprint.pprint(dists)
Output:
[2.23606797749979, 1.0, 2.23606797749979]
Now I want to calculate the distances from multiple points in a new list to the points in the list l.
I haven't found a solution yet, so does anyone have an idea how this could be done?
Here is a possible solution:
from math import sqrt
def distance(p1, p2):
    return sqrt((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2)

lst1 = [(1,2), (2,3), (4,5)]
lst2 = [(6,7), (8,9), (10,11)]

for p1 in lst1:
    for p2 in lst2:
        d = distance(p1, p2)
        print(f'Distance between {p1} and {p2}: {d}')
Output:
Distance between (1, 2) and (6, 7): 7.0710678118654755
Distance between (1, 2) and (8, 9): 9.899494936611665
Distance between (1, 2) and (10, 11): 12.727922061357855
Distance between (2, 3) and (6, 7): 5.656854249492381
Distance between (2, 3) and (8, 9): 8.48528137423857
Distance between (2, 3) and (10, 11): 11.313708498984761
Distance between (4, 5) and (6, 7): 2.8284271247461903
Distance between (4, 5) and (8, 9): 5.656854249492381
Distance between (4, 5) and (10, 11): 8.48528137423857
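For what it's worth, on Python 3.8+ the same pairwise table can be built without a hand-written distance function, using math.dist and a nested comprehension:

```python
from math import dist  # available since Python 3.8

lst1 = [(1, 2), (2, 3), (4, 5)]
lst2 = [(6, 7), (8, 9), (10, 11)]

# dmatrix[i][j] is the distance from lst1[i] to lst2[j]
dmatrix = [[dist(p1, p2) for p2 in lst2] for p1 in lst1]
print(dmatrix[0])  # distances from (1, 2) to each point in lst2
```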
In a large code written in Fortran 2008 for calculating thermodynamic equilibria and phase diagrams, I use many symmetric matrices, which I store as 1D arrays and index using a small function:
integer function ixsym(i,j)
  if(i.gt.j) then
    ixsym=j+i*(i-1)/2
  else
    ixsym=i+j*(j-1)/2
  endif
  return
end
This works perfectly, but after improving the speed of various other parts of the code, this routine now takes 15-20% of the calculation time (it is called very often). I assume there are various ways of speeding this up, but I do not know C or any other way to replace this function, so I am looking for help. I use gfortran, but the replacement has to be portable.
Bo Sundman
The only thing you might consider is to get rid of the branching in that function:
The minimum and maximum of two numbers can be computed as:
max = (a+b + abs(a-b))/2
min = (a+b - abs(a-b))/2 = a+b - max
So you can use this as
integer function ixsym(i,j)
  integer :: p, q
  q = i+j; p = (q + abs(i-j))/2; q = q - p
  ixsym = q + (p*(p-1))/2
  return
end
which you can further reduce as
integer function ixsym(i,j)
  integer :: p
  ixsym = i+j; p = (ixsym + abs(i-j))/2
  ixsym = ixsym + (p*(p-3))/2
  return
end
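As a quick sanity check (sketched in Python rather than Fortran, purely for illustration), the reduced branch-free formula agrees with the original branching version for 1-based indices:

```python
# Branching original: min(i,j) + max(i,j)*(max(i,j)-1)/2
def ixsym_branch(i, j):
    return j + i * (i - 1) // 2 if i > j else i + j * (j - 1) // 2

# Branch-free version: p = max(i, j) computed via abs, then the algebraically
# equivalent form i + j + p*(p-3)/2 (both factors guarantee even products)
def ixsym_branchless(i, j):
    s = i + j
    p = (s + abs(i - j)) // 2   # p = max(i, j)
    return s + p * (p - 3) // 2

assert all(ixsym_branch(i, j) == ixsym_branchless(i, j)
           for i in range(1, 40) for j in range(1, 40))
```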
Fortran compilers used to have optimization on par with or better than C compilers, so I would not expect a gain just by switching languages; rather, focus on algorithmic improvements.
How about replacing the calculation of the index transformation with a lookup table?
Do you have the memory to store the ixsym values for given i and j indices?
Yes, this runs counter to the usual memory-for-CPU trade-off, but if you have many matrices this one extra table might help.
Is it really necessary to calculate the transformation every time? E.g. if you iterate over elements: ixsym(i, j+1) = ixsym(i, j) + 1, as long as j < i.
Another idea, though hardware specific, might be to order your data differently, so that it stays within cache areas of the CPU. (Link)
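The lookup-table idea might be sketched like this (in Python for brevity; the table size N and the 1-based index range are assumptions):

```python
N = 100  # assumed largest matrix dimension

# Same index transformation as the Fortran function
def ixsym(i, j):
    lo, hi = (i, j) if i <= j else (j, i)
    return lo + hi * (hi - 1) // 2

# Precompute once; afterwards IXSYM[i][j] replaces every call to ixsym(i, j)
IXSYM = [[ixsym(i, j) for j in range(N + 1)] for i in range(N + 1)]
```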
About your index transformation:
I initially thought you used some variation of the Cantor pairing function to enumerate your symmetric 2D array. I asked my friend Ruby to plot a few pairs and she told me:
(0, 0) -> 0 (0, 1) -> 0 (0, 2) -> 1 (0, 3) -> 3 (0, 4) -> 6
(1, 0) -> 0 (1, 1) -> 1 (1, 2) -> 2 (1, 3) -> 4 (1, 4) -> 7
(2, 0) -> 1 (2, 1) -> 2 (2, 2) -> 3 (2, 3) -> 5 (2, 4) -> 8
(3, 0) -> 3 (3, 1) -> 4 (3, 2) -> 5 (3, 3) -> 6 (3, 4) -> 9
(4, 0) -> 6 (4, 1) -> 7 (4, 2) -> 8 (4, 3) -> 9 (4, 4) -> 10
I would have expected only two occurrences of each calculated index, but I see three for some pairs. Is this intended?
Update:
It was the index start, as fellow user Jean-Claude Arbaut pointed out in his comment.
Here is Ruby's answer with indices starting at 1:
(1, 1) -> 1 (1, 2) -> 2 (1, 3) -> 4 (1, 4) -> 7 (1, 5) -> 11
(2, 1) -> 2 (2, 2) -> 3 (2, 3) -> 5 (2, 4) -> 8 (2, 5) -> 12
(3, 1) -> 4 (3, 2) -> 5 (3, 3) -> 6 (3, 4) -> 9 (3, 5) -> 13
(4, 1) -> 7 (4, 2) -> 8 (4, 3) -> 9 (4, 4) -> 10 (4, 5) -> 14
(5, 1) -> 11 (5, 2) -> 12 (5, 3) -> 13 (5, 4) -> 14 (5, 5) -> 15
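The 1-based table above can be reproduced like this (a Python stand-in for the Ruby script, which isn't shown):

```python
# Same transformation as the Fortran ixsym, with 1-based indices
def ixsym(i, j):
    lo, hi = (i, j) if i <= j else (j, i)
    return lo + hi * (hi - 1) // 2

for i in range(1, 6):
    print('  '.join(f'({i}, {j}) -> {ixsym(i, j):2d}' for j in range(1, 6)))
```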
I am new to NLP. Please clarify how the TF-IDF values are transformed using fit_transform.
The formula below for calculating the IDF works fine:
log((total number of documents + 1) / (number of documents containing the term + 1)) + 1
E.g. the IDF value for the term "this" in document 1 ("this is a string") is 1.51082562.
After applying fit_transform, the values for all the terms change. What is the formula/logic used for the transformation?
TFIDF = TF * IDF
E.g. the TFIDF value for the term "this" in document 1 ("this is a string") is 0.61366674.
How is this value of 0.61366674 arrived at?
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
d = pd.Series(['This is a string','This is another string',
'TFIDF Computation Calculation','TFIDF is the product of TF and IDF'])
df = pd.DataFrame(d)
tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(df[0])
print (tfidf_vectorizer.idf_)
#output
#[1.91629073 1.91629073 1.91629073 1.91629073 1.91629073 1.22314355 1.91629073
#1.91629073 1.51082562 1.91629073 1.51082562 1.91629073 1.51082562]
##-------------------------------------------------
##how the above values are getting transformed here
##-------------------------------------------------
print (tfidf.toarray())
#[[0. 0. 0. 0. 0. 0.49681612 0.
#0. 0.61366674 0. 0. 0. 0.61366674]
# [0. 0.61422608 0. 0. 0. 0.39205255
# 0. 0. 0.4842629 0. 0. 0. 0.4842629 ]
# [0. 0. 0.61761437 0.61761437 0. 0.
# 0. 0. 0. 0. 0.48693426 0. 0. ]
# [0.37718389 0. 0. 0. 0.37718389 0.24075159
# 0.37718389 0.37718389 0. 0.37718389 0.29737611 0.37718389 0. ]]
These are normed TF-IDF vectors, because by default norm='l2' according to the documentation. So in the output of tfidf.toarray(), each row of the array represents a document and each column represents a unique word, with the sum of squares of the vector elements for each document being equal to 1. You can check this with print([sum([word ** 2 for word in doc]) for doc in tfidf.toarray()]).
norm : ‘l1’, ‘l2’ or None, optional (default=’l2’)
Each output row will have unit norm, either:
* ‘l2’: Sum of squares of vector elements is 1. The cosine similarity between two vectors is their dot product when l2 norm has been applied.
* ‘l1’: Sum of absolute values of vector elements is 1.
See preprocessing.normalize
print(tfidf) #the same values you find in tfidf.toarray() but more readable
output: ([index of document on array lvl 0 / row], [index of unique word on array lvl 1 / column]) normed TF-IDF value
(0, 12) 0.6136667440107333 #1st word in 1st sentence: 'This'
(0, 5) 0.4968161174826459 #'is'
(0, 8) 0.6136667440107333 #'string', see that word 'a' is missing
(1, 12) 0.48426290003607125 #'This'
(1, 5) 0.3920525532545391 #'is'
(1, 8) 0.48426290003607125 #'string'
(1, 1) 0.6142260844216119 #'another'
(2, 10) 0.48693426407352264 #'TFIDF'
(2, 3) 0.6176143709756019 #'Computation'
(2, 2) 0.6176143709756019 #'Calculation'
(3, 5) 0.2407515909314943 #'is'
(3, 10) 0.2973761110467491 #'TFIDF'
(3, 11) 0.37718388973255157 #'the'
(3, 7) 0.37718388973255157 #'product'
(3, 6) 0.37718388973255157 #'of'
(3, 9) 0.37718388973255157 #'TF'
(3, 0) 0.37718388973255157 #'and'
(3, 4) 0.37718388973255157 #'IDF'
Because these are normed TF-IDF values, the sum of squares of the vector elements will be equal to 1. E.g. for the first document at index 0, the sum of squares of the vector elements is 1: sum([0.6136667440107333 ** 2, 0.4968161174826459 ** 2, 0.6136667440107333 ** 2])
You can turn off this transformation by setting norm=None.
print(TfidfVectorizer(norm=None).fit_transform(df[0])) #the same values you find in TfidfVectorizer(norm=None).fit_transform(df[0]).toarray(), but more readable
output: ([index of document on array lvl 0 / row], [index of unique word on array lvl 1 / column]) TF-IDF value
(0, 12) 1.5108256237659907 #1st word in 1st sentence: 'This'
(0, 5) 1.2231435513142097 #'is'
(0, 8) 1.5108256237659907 #'string', see that word 'a' is missing
(1, 12) 1.5108256237659907 #'This'
(1, 5) 1.2231435513142097 #'is'
(1, 8) 1.5108256237659907 #'string'
(1, 1) 1.916290731874155 #'another'
(2, 10) 1.5108256237659907 #'TFIDF'
(2, 3) 1.916290731874155 #'Computation'
(2, 2) 1.916290731874155 #'Calculation'
(3, 5) 1.2231435513142097 #'is'
(3, 10) 1.5108256237659907 #'TFIDF'
(3, 11) 1.916290731874155 #'the'
(3, 7) 1.916290731874155 #'product'
(3, 6) 1.916290731874155 #'of'
(3, 9) 1.916290731874155 #'TF'
(3, 0) 1.916290731874155 #'and'
(3, 4) 1.916290731874155 #'IDF'
Because every word just appears once in each document, the TF-IDF values are the IDF values of each word times 1:
tfidf_vectorizer = TfidfVectorizer(norm=None)
tfidf = tfidf_vectorizer.fit_transform(df[0])
print(tfidf_vectorizer.idf_)
output: Smoothed IDF-values
[1.91629073 1.91629073 1.91629073 1.91629073 1.91629073 1.22314355
1.91629073 1.91629073 1.51082562 1.91629073 1.51082562 1.91629073
1.51082562]
I hope the above is helpful to you.
Unfortunately, I cannot reproduce the transformation exactly, because
The cosine similarity between two vectors is their dot product when l2
norm has been applied.
seems to be an additional step. Because the TF-IDF values will be biased by the number of words in each document when you use the default setting norm='l2', I would simply turn this setting off with norm=None. I found that you cannot simply do the transformation using:
tfidf_norm_calculated = [
[(word/sum(doc))**0.5 for word in doc]
for doc in TfidfVectorizer(norm=None).fit_transform(df[0]).toarray()]
print(tfidf_norm_calculated)
print('Sum of squares of vector elements is 1: ', [sum([word**2 for word in doc]) for doc in tfidf_norm_calculated])
print('Compare to:', TfidfVectorizer().fit_transform(df[0]).toarray())
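For reference, the l2 transformation itself is just dividing each row by its Euclidean norm. A minimal sketch, with document 0's norm=None values from the listing above hardcoded rather than recomputed with sklearn:

```python
import math

# Unnormalized TF-IDF values of document 0 ("This is a string"),
# copied from the norm=None output above: 'this', 'is', 'string'
row = [1.5108256237659907, 1.2231435513142097, 1.5108256237659907]

l2 = math.sqrt(sum(v * v for v in row))  # Euclidean (l2) norm of the row
normed = [v / l2 for v in row]           # divide every element by it
print(normed[0])  # matches the 0.61366674 value in tfidf.toarray()
```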
So in a program I am creating, I have a list that contains tuples, and each tuple contains 3 numbers. For example...
my_list = [(1, 2, 4), (2, 4, 1), (1, 5, 2), (1, 4, 1),...]
Now I want to delete any tuple whose last two numbers are both less than the last two numbers of any other tuple.
The first number has to be the same for the tuple to be deleted.
So with the list of tuples above this would happen...
my_list = [(1, 2, 4), (2, 4, 1), (1, 5, 2), (1, 4, 1),...]
# some code...
result = [(1, 2, 4), (2, 4, 1), (1, 5, 2)]
The first tuple is not deleted because (2, 4) is not less than (5, 2) (2 < 5 but 4 > 2) or (4, 1) (2 < 4 but 4 > 1), the last two numbers of the other tuples starting with 1.
The second tuple is not deleted because its first number (2) is different from every other tuple's first number.
The third tuple is not deleted for the same reason the first tuple is not deleted.
The fourth tuple is deleted because (4, 1) is less than (5, 2) (4 < 5 and 1 < 2).
I really need help because I am stuck in my program and I have no idea what to do. I'm not asking for a solution, but just some guidance as to how to even begin solving this. Thank you so much!
I think this might actually work. I just figured it out. Is this the best solution?
results = [(1, 2, 4), (2, 4, 1), (1, 5, 2), (1, 4, 1)]
for position in results:
    for check in results:
        if position[0] == check[0] and position[1] < check[1] and position[2] < check[2]:
            results.remove(position)
Simple list comprehension to do this:
[i for i in my_list if not any(i[0] == j[0] and i[1] < j[1] and i[2] < j[2] for j in my_list)]
Your loop would work too, but be sure not to modify the list as you are iterating over it.
my_list = [(1, 2, 4), (2, 4, 1), (1, 5, 2), (1, 4, 1)]
results = []
for position in my_list:
    keep = True
    for check in my_list:
        if position[0] == check[0] and position[1] < check[1] and position[2] < check[2]:
            keep = False
            break
    if keep:
        results.append(position)
results
>[(1, 2, 4), (2, 4, 1), (1, 5, 2)]
I'd like to be able to calculate the 'mean brightest point' in a line of pixels. It's for a primitive 3D scanner.
For testing, I simply stepped through the pixels, and if the current pixel is brighter than the one before, the brightest point of that line is set to the current pixel. This of course gives very jittery results throughout the image(s).
I'd like to get the 'average center of the brightness' instead, if that makes sense.
This has to be a common thing; I'm simply lacking the right words for a Google search.
Calculate the intensity-weighted average of the offset.
Given your example's intensities (guessed) and offsets:
0 0 0 0 1 3 2 3 1 0 0 0 0 0
1 2 3 4 5 6 7 8 9 10 11 12 13 14
this would give you (5+3*6+2*7+3*8+9)/(1+3+2+3+1) = 7
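In code, using the guessed intensities above:

```python
# Intensity-weighted average of the offsets (example values from above)
intensities = [0, 0, 0, 0, 1, 3, 2, 3, 1, 0, 0, 0, 0, 0]
offsets = range(1, 15)  # offsets 1..14

center = sum(w * x for w, x in zip(intensities, offsets)) / sum(intensities)
print(center)  # -> 7.0
```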
You're looking for 1D convolution, which takes a filter with which you "convolve" the image. For example, you can use a median filter (strictly speaking not a convolution, but the same sliding-window idea; borrowing an example from Wikipedia):
x = [2 80 6 3]
y[1] = Median[2 2 80] = 2
y[2] = Median[2 80 6] = Median[2 6 80] = 6
y[3] = Median[80 6 3] = Median[3 6 80] = 6
y[4] = Median[6 3 3] = Median[3 3 6] = 3
so
y = [2 6 6 3]
So here, the window size is 3 since you're looking at 3 pixels at a time and replacing the pixel around this window with the median. A window of 3 means, we look at the first pixel before and first pixel after the pixel we're currently evaluating, 5 would mean 2 pixels before and after, etc.
For a mean filter, you do the same thing except replace the pixel around the window with the average of all the values, i.e.
x = [2 80 6 3]
y[1] = Mean[2 2 80] = 28
y[2] = Mean[2 80 6] = 29.33
y[3] = Mean[80 6 3] = 29.667
y[4] = Mean[6 3 3] = 4
so
y = [28 29.33 29.667 4]
So for your problem, y[3] is the "mean brightest point".
Note how the borders are handled for y[1] (no pixels before it) and y[4] (no pixels after it)- this example "replicates" the pixel near the border. Therefore, we generally "pad" an image with replicated or constant borders, convolve the image and then remove those borders.
This is a standard operation which you'll find in many computational packages.
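A minimal sketch of such a mean filter in plain Python (no package), padding with replicated border pixels as in the example above:

```python
def mean_filter(x, window=3):
    """Sliding-window mean with replicated ('edge') borders."""
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half  # replicate the edges
    return [sum(padded[i:i + window]) / window for i in range(len(x))]

print(mean_filter([2, 80, 6, 3]))  # the y values worked out above
```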
Your problem is like the longest-sequence problem: once you are able to determine a sequence (the starting point and the length), all that remains is finding the median, which is the central element.
For finding the sequence, a definition of bright and dark has to be present: either relative (to the previous value, or a couple of previous values) or absolute (a fixed threshold).