How do programs calculate square roots?

I understand that this is a pretty math-y question, but how do programs get square roots? From what I've read, this is something that is usually native to the CPU of a device, but I need to be able to do it myself, probably in C++ (although the language is irrelevant).
The reason I need to know about this specifically is that I have an intranet server and I am getting started with crowdsourcing. For this, I am going to start with finding a lot of digits of a certain square root, like sqrt(17) or something.
All that Python provides for this is math.sqrt().
I am going to make a client that can work with other identical clients, so I need complete control over the processes of the math. Heck, this question might not even have an answer, but thanks for your help anyway.
[edit]
I got it working; this is the 'final' product (many thanks to @djhaskin987):
def square_root(number):
    old_guess = 1
    guess = 2
    guesssquared = 0
    while round(guesssquared, 10) != round(number, 10):
        old_guess = guess
        guess = ((number / guess) + guess) / 2
        print(guess)
        guesssquared = guess * guess
    return guess
solution = square_root(7)  # finds square root of 7
print(solution)

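Computers use a method that people have actually been using since Babylonian times:
def square_root(number):
    # Start from floats so the division is not truncated under Python 2.
    old_guess = 1.0
    guess = 2.0
    while old_guess != guess:
        old_guess = guess
        # Average the guess with number / guess to get a better guess.
        guess = ((number / guess) + guess) / 2
    return guess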
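Since the question asks for many digits of something like sqrt(17), here is a minimal sketch of the same Babylonian iteration done in integer arithmetic, so precision is not capped by floating point. The function names isqrt_newton and sqrt_digits are made up for illustration, and the sketch assumes Python 3 and a non-negative integer input.

```python
def isqrt_newton(n):
    """Floor of the square root of a non-negative integer (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Initial guess: a power of two that is at least sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        # Same update as the float version, with integer division.
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y


def sqrt_digits(number, digits):
    """Square root of a non-negative integer, truncated to `digits` decimals, as a string."""
    # sqrt(number * 10**(2*digits)) == sqrt(number) * 10**digits
    scaled = isqrt_newton(number * 10 ** (2 * digits))
    whole, frac = divmod(scaled, 10 ** digits)
    return "{}.{:0{}d}".format(whole, frac, digits)


print(sqrt_digits(17, 10))  # 4.1231056256
```

The result is truncated rather than rounded. Asking for more digits only makes the integers larger, which Python handles natively.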

x86 has several square-root instructions in hardware, starting with FSQRT for x87 floating point (SSE adds SQRTSS and SQRTSD).
In general, if your function is too complicated or has no hardware implementation, and is C^\infty (infinitely differentiable), you can approximate it with a polynomial via a Taylor expansion. This is extremely common in HPC.
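As a rough illustration of the Taylor idea, the sketch below approximates sqrt(1 + x) with the first few terms of its expansion about 0. The function name sqrt1p_taylor and the default of 10 terms are arbitrary choices for this example. The series only converges for |x| < 1, so a real implementation would first reduce the argument into a small range.

```python
import math


def sqrt1p_taylor(x, terms=10):
    """Approximate sqrt(1 + x) for |x| < 1 using `terms` Taylor terms about 0."""
    total = 0.0
    coeff = 1.0  # generalized binomial coefficient C(1/2, k), starting at k = 0
    power = 1.0  # x**k
    for k in range(terms):
        total += coeff * power
        coeff *= (0.5 - k) / (k + 1)  # C(1/2, k+1) from C(1/2, k)
        power *= x
    return total


# Compare against the library result for a small argument.
print(sqrt1p_taylor(0.1), math.sqrt(1.1))
```

The closer x is to 0, the fewer terms are needed for a given accuracy.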

Related

Boolean logic on my fragments takes a lot of VRAM, how can I avoid this?

I have a very simple request from GLSL 330:
if (colorOut.r <= 1.0 && colorOut.r > 0.7)
{
    colorOut.r *= color_1.r;
}
I have over 40 compares like this.
However, this is creating a world of trouble for me, as I've been told that AND, NOT, and similar operations take a lot of video memory. I'm developing a plugin for After Effects, and the people who use it mostly don't have strong GPUs (I ran a survey, and most of them use mobile versions of mid-range GPUs). So I thought I'd ask whether there's a possible alternative to using AND, or even to using if at all, because I've been told fragment shaders don't like if in the main branch.
Thanks.

For a multiplexing scenario like yours, you can use branchless programming. You could, for example, use something like this, where the boolean operators are "approximated":
colorOut.r = mix(colorOut.r, (colorOut.r*color_1.r),
                 ( clamp(pow(1.0-colorOut.r, 20.0), 0.0, 1.0)
                 * clamp(pow(colorOut.r-0.7, 20.0), 0.0, 1.0) ) );
Note that a ternary usually doesn't cause many problems, and this version should be easy on resources, since it doesn't cause diverging branches:
colorOut.r = mix(colorOut.r, (colorOut.r*color_1.r),
                 ( colorOut.r <= 1.0 && colorOut.r > 0.7 ? 1.0 : 0.0 ) );
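To make the select-by-blending idea easier to check off the GPU, here is a small Python emulation of it. The helpers step and mix mimic the GLSL built-ins of the same names, and tint_red is a made-up name for this illustration. Note that this Python step uses a conditional expression internally, whereas on a GPU it is a built-in.

```python
def step(edge, x):
    """GLSL-style step: 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0


def mix(a, b, t):
    """GLSL-style mix: linear blend a*(1-t) + b*t."""
    return a * (1.0 - t) + b * t


def tint_red(r, color_r):
    """Multiply r by color_r only when 0.7 < r <= 1.0, selected via a 0/1 mask."""
    above_low = 1.0 - step(r, 0.7)  # 1.0 when r > 0.7
    below_high = step(r, 1.0)       # 1.0 when r <= 1.0
    mask = above_low * below_high   # product of 0/1 masks acts as logical AND
    return mix(r, r * color_r, mask)


print(tint_red(0.8, 0.5), tint_red(0.5, 0.5))
```

Multiplying the two masks plays the role of &&, and mix with a 0 or 1 weight plays the role of the if.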

How to use tf.contrib.rnn.convLSTMCell class in tensorflow

I would like to use a convolution LSTM in my research but I'm having a difficult time figuring out the exact way to implement this class in tensorflow. Here is what I have so far. I get no errors, but I am seriously doubting my implementation. Can anyone confirm if I am doing this correctly?
n_input = 4
x = tf.placeholder(tf.float32,shape=[None,n_input,HEIGHT,WIDTH,2])
y = tf.placeholder(tf.float32,shape=[None,HEIGHT,WIDTH,2])
convLSTM_cell = tf.contrib.rnn.ConvLSTMCell(
    conv_ndims=2,
    input_shape=[HEIGHT,WIDTH,DEPTH],
    output_channels=2,
    kernel_shape=[3,3]
)
outputs, states = tf.nn.dynamic_rnn(convLSTM_cell, x, dtype=tf.float32)
weights = tf.Variable(tf.random_normal([3,3,2,2]))
biases = tf.Variable(tf.random_normal([2]))
conv_out = tf.nn.conv2d(outputs[-1],weights,strides=[1,1,1,1],padding='SAME')
out = tf.nn.sigmoid(conv_out + biases)
UPDATE:
printing the size of outputs gives the shape=(?,4,436,1024,2) but I think I want (?,5,436,1024,2) or (?,1,436,1024,2).
UPDATE2:
So according to a fellow lab mate, the 4 outputs correspond to the LSTM outputs for each frame, and so it is working correctly. Apparently all I have to do is take output #4, and that is the predicted future time frame.
A Stack Overflow confirmation would put my mind at ease on this whole thing.

Yes, you are correct!
The output dimension will match the input dimension. If you actually want the (?,5,436,1024,2) output, you will have to look at the history, state.h. The last four [-4] of it will still correspond to the output.

Declaring variables in Python 2.7x to avoid issues later

I am new to Python, coming from MATLAB, and long ago from C. I have written a script in MATLAB which simulates sediment transport in rivers as a Markov process. The code randomly places circles of a random diameter within a rectangular area of a specified dimension. The circles are non-uniform in size, drawn randomly from a specified range of sizes. I do not know how many times I will step through the circle placement operation, so I use a while loop to complete the process. In an attempt to be more community oriented, I am translating the MATLAB script to Python. I used the online tool OMPC to get started, and have been working through it manually from the auto-translated version (which was not that helpful, unsurprisingly). To debug the code as I go, I use the MATLAB-generated results to compare and contrast against results in Python. It seems clear to me that I have declared variables in a way that introduces problems as calculations proceed in the script. Here are two examples of consistent problems between different instances of code execution. First, the code generated what I think are arrays within arrays because the script is returning results which look like:
array([[ True]
[False]], dtype=bool)
This result was generated for the following code snippet at the overlap_logix operation:
CenterCoord_Array = np.asarray(CenterCoordinates)
Diameter_Array = np.asarray(Diameter)
dist_check = ((CenterCoord_Array[:,0] - x_Center) ** 2 + (CenterCoord_Array[:,1] - y_Center) ** 2) ** 0.5
radius_check = (Diameter_Array / 2) + radius
radius_check_update = np.reshape(radius_check,(len(radius_check),1))
radius_overlap = (radius_check_update >= dist_check)
# Now actually check the overlap condition.
if np.sum([radius_overlap]) == 0:
    # The new circle does not overlap so proceed.
    newCircle_Found = 1
    debug_value = 2
elif np.sum([radius_overlap]) == 1:
    # The new circle overlaps with one other circle
    overlap = np.arange(0,len(radius_overlap), dtype=int)
    overlap_update = np.reshape(overlap,(len(overlap),1))
    overlap_logix = (radius_overlap == 1)
    idx_true = overlap_update[overlap_logix]
    radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
A similar result for the same run was produced for variables:
radius_check_update
radius_overlap
overlap_update
Here is the same code snippet for the working MATLAB version (as requested):
distcheck = ((Circles.CenterCoordinates(1,:)-x_Center).^2 + (Circles.CenterCoordinates(2,:)-y_Center).^2).^0.5;
radius_check = (Circles.Diameter ./ 2) + radius;
radius_overlap = (radius_check >= distcheck);
% Now actually check the overlap condition.
if sum(radius_overlap) == 0
    % The new circle does not overlap so proceed.
    newCircle_Found = 1;
    debug_value = 2;
elseif sum(radius_overlap) == 1
    % The new circle overlaps with one other circle
    temp = 1:size(radius_overlap,2);
    idx_true = temp(radius_overlap == 1);
    radius = distcheck(1,idx_true) - (Circles.Diameter(1,idx_true)/2);
In the Python version I have created arrays from lists to more easily operate on the contents (the first two lines of the code snippet). The array within array result and creating arrays to access data suggests to me that I have incorrectly declared variable types, but I am not sure. Furthermore, some variables have a size, for example, (2L,) (the numerical dimension will change as circles are placed) where there is no second dimension. This produces obvious problems when I try to use the array in an operation with another array with a size (2L,1L). Because of these problems I started reshaping arrays, and then I stopped because I decided these were hacks because I had declared one, or more than one variable incorrectly. Second, for the same run I encountered the following error:
TypeError: 'numpy.ndarray' object is not callable
for the operation:
radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
which occurs at the bottom of the above code snippet. I have posted the entire script at the following link because it is probably more useful to execute the script for oneself:
https://github.com/smchartrand/MarkovProcess_Bedload
I have set-up the code to run with some initial parameter values so decisions do not need to be made; these parameter values produce the expected results in the MATLAB-based script, which look something like this when plotted:
So, I seem to specifically be having issues with operations on lines 151-165, depending on the test value np.sum([radius_overlap]) and I think it is because I incorrectly declared variable types, but I am really not sure. I can say with confidence that the Python version and the MATLAB version are consistent in output through the first step of the while loop, and code line 127 which is entering the second step of the while loop. Below this point in the code the above documented issues eventually cause the script to crash. Sometimes the script executes to 15% complete, and sometimes it does not make it to 5% - this is due to the random nature of circle placement. I am preparing the code in the Spyder (Python 2.7) IDE and will share the working code publicly as a part of my research. I would greatly appreciate any help that can be offered to identify my mistakes and misapplications of python coding practice.

I believe I have answered my own question, and maybe it will be of use for someone down the road. The main sources of instruction for me can be found at the following three web pages:
Stackoverflow Question 176011
SciPy FAQ
SciPy NumPy for Matlab users
The third web page was very helpful for me coming from MATLAB. Here is the modified and working python code snippet which relates to the original snippet provided above:
dist_check = ((CenterCoordinates[0,:] - x_Center) ** 2 + (CenterCoordinates[1,:] - y_Center) ** 2) ** 0.5
radius_check = (Diameter / 2) + radius
radius_overlap = (radius_check >= dist_check)
# Now actually check the overlap condition.
if np.sum([radius_overlap]) == 0:
    # The new circle does not overlap so proceed.
    newCircle_Found = 1
    debug_value = 2
elif np.sum([radius_overlap]) == 1:
    # The new circle overlaps with one other circle
    overlap = np.arange(0,len(radius_overlap[0]), dtype=int).reshape(1, len(radius_overlap[0]))
    overlap_logix = (radius_overlap == 1)
    idx_true = overlap[overlap_logix]
    radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
In the end it was clear to me that it was more straightforward for this example to use numpy arrays vs. lists to store results for each iteration of filling the rectangular area. For the corrected code snippet this means I initialized the variables:
CenterCoordinates, and
Diameter
as numpy arrays whereas I initialized them as lists in the posted question. This made a few mathematical operations more straightforward. I was also incorrectly indexing into variables with parentheses () as opposed to the correct method using brackets []. Here is an example of a correction I made which helped the code execute as envisioned:
Incorrect: radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
Correct: radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
This example also shows that I had issues with array dimensions which I corrected variable by variable. I am still not sure if my working code is the most pythonic or most efficient way to fill a rectangular area in a random fashion, but I have tested it about 100 times with success. The revised and working code can be downloaded here:
Working Python Script to Randomly Fill Rectangular Area with Circles
Here is an image of the final result for a successful run of the working code:
The main lessons for me were (1) numpy arrays are more efficient for repetitive numerical calculations, and (2) the dimensionality of the arrays I created was not always what I expected, so care must be taken when setting up arrays. Thanks to those who looked at my question and asked for clarification.
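The shape mismatch described in the question, a (2,) array meeting a (2,1) array, can be reproduced in a few lines. This is only an illustrative sketch: the numbers are made up, the variable names are borrowed from the question, and it assumes NumPy is installed.

```python
import numpy as np

# Made-up values standing in for two already-placed circles.
dist_check = np.array([3.0, 0.5])    # shape (2,)
radius_check = np.array([1.0, 1.0])  # shape (2,)

# Comparing two 1-D arrays is elementwise and stays 1-D.
overlap_1d = radius_check >= dist_check
print(overlap_1d.shape)  # (2,)

# Reshaping one side to a column makes NumPy broadcast (2,1) against (2,),
# which produces every pairwise comparison in a (2,2) matrix.
overlap_2d = radius_check.reshape(-1, 1) >= dist_check
print(overlap_2d.shape)  # (2, 2)

# Square brackets index an array; parentheses would try to call it.
idx_true = np.flatnonzero(overlap_1d)  # positions of the True entries
print(idx_true)              # [1]
print(dist_check[idx_true])  # [0.5]
```

Keeping both operands 1-D avoids the reshape entirely, and np.flatnonzero replaces the hand-built index array.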

OverflowError in a for loop

I'm working on problem 3 of Project Euler using Python, but I can't seem to solve the problem without running into the following error: "OverflowError: range() result has too many items"
I'm wondering if there's a way to increase the allowed range? My code looks as follows:
target = 600851475143
largest_prime_factor = 1
#find largest prime factor of target
for possible_factor in range(2,(target/2)+1):
    if target % possible_factor == 0:
        is_prime = True
        for i in range(2,(possible_factor/2)+1):
            if possible_factor % i == 0:
                is_prime = False
                break
        if is_prime:
            largest_prime_factor = possible_factor
print largest_prime_factor

If you run into limitations of your computer or language while trying to solve a puzzle problem, or if it takes too long, it is an indication that probably there exists a better way (read: algorithm) to solve the problem. In your case, you do not need to loop to target / 2 + 1 (though that is a good educated upper bound). You only need to go as far as ceil(sqrt(target)).
And, as a side note, in Python 2 you can overcome this limitation by using xrange, which produces its values lazily, instead of range, which builds a full list. In Python 3, range returns a lazy sequence type instead of a list by default.
Thanks to @Fernando for the clarification in the comments.
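A minimal sketch of the square-root bound in practice is shown below. It goes one step beyond the answer by also dividing each factor out of the target as it is found, which shrinks the bound further. The function name largest_prime_factor is chosen for illustration, and the code assumes Python 3 and an integer n of at least 2.

```python
def largest_prime_factor(n):
    """Largest prime factor of an integer n >= 2, by trial division up to sqrt(n)."""
    factor = 2
    largest = 1
    while factor * factor <= n:
        if n % factor == 0:
            # factor is prime here: all smaller primes were already divided out.
            largest = factor
            n //= factor
        else:
            factor += 1
    if n > 1:
        # Whatever remains has no divisor up to its square root, so it is prime.
        largest = n
    return largest


print(largest_prime_factor(600851475143))  # 6857
```

No range object is created at all, so the OverflowError cannot occur, and the loop for this target stops after fewer than 1500 candidate factors.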

Code understanding Question

I was going through some code I found online.
I did not understand the following logic. The code works, and it runs really fast.
for (int i = 0; i < typo_word_vec.size(); i++)
{
    float each_typo_word_len = (float)typo_word_vec[i].size();
    int start_range = each_typo_word_len - floor((each_typo_word_len / lower_bound_word_size) * each_typo_word_len) - 1;
    if (start_range < 1)
        start_range = 1;
    int end_range = each_typo_word_len + ceil((each_typo_word_len / upper_bound_word_size) * each_typo_word_len) + 1;
    if (end_range > src_word_max_len)
        end_range = src_word_max_len - 1;
    call_get_dist(i, start_range, end_range);
}
But I do not understand the logic behind using start_range and end_range. What underlying algorithm or theory is used here?

You really should have posted a few more lines - we definitely need to check the whole code to understand something.
As I understand it, the 'source' words are ordered by size. The 'candidate' words may be shorter or longer than their potential match. That is what start_range and end_range are used for.
Though I have a hard time figuring out why the author doesn't use
start_range = 0;
end_range = src_word_max_len;
EDIT:
OK, this is just an optimization on his part (quoting readme.txt):
I solved this using python and php first, however, my solution were continuously rejected since it takes too much time to solve it (my guess). In the "cpp" directory, I uploaded my solution with c++ using STL, and finally accepted (The basic idea of algorithm is almost the same: prunning the scan range of source files) Currently, I plan to try this problem using other language such as java next time. The statement of the problem can be found here: http://www.facebook.com/careers/puzzles.php?puzzle_id=17
He just arbitrarily defines a range large enough to have a high enough probability of finding the right matching word in it.
