cs231n: how to make the program run faster?

cs231n: how to make the program run faster? - python-2.7

I am interested in this course and also new to python. I try the first NN program but it is quite slow (mostly in the following loop).
# loop over all test rows
for i in xrange(num_test):
distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
min_index = np.argmin(distances)
Ypred[i] = self.ytr[min_index]
Is there way to accelerate it?
Thanks.

Answer myself: The parallel approach introduced in this link (Parallelise python loop with numpy arrays and shared-memory) seems to work, basically cython, prange, gil, openmp and other tweaks.

Related

How can I run Tensorflow on one single core, single thread CPP?

I am trying to restrict the number of threads that TensorFlow spawns. In python, I understand we need to use the following steps as pointed out Here. I was trying to do the same in CPP, but it doesn't seem that straight forward.
Questions:
How to modify intra_op_parallelism_threads and inter_op_parallelism_threads correctly?
How to modify the device_count to control the core as well?
SessionOptions options;
ConfigProto* config = &options.config;
string key = "CPU";
//not sure if this is the correct way to do it.
(*config->mutable_device_count())[key] = 1;
config->set_inter_op_parallelism_threads(1);
config->set_intra_op_parallelism_threads(1);

Answer to 1 as Fisa pointed out is correct. With just minor adjustment as config is a pointer.
SessionOptions options;
ConfigProto* config = &options.config;
//single thread control//
config->set_inter_op_parallelism_threads(1);
config->set_intra_op_parallelism_threads(1);
fSession.reset(NewSession(options));

Answer to question 1:
tensorflow::SessionOptions options;
tensorflow::ConfigProto & config = options.config;
config.set_inter_op_parallelism_threads(1);
config.set_intra_op_parallelism_threads(1);
session->reset(tensorflow::NewSession(options));
This will reduce the total number of threads (but not to 1) generated by TensorFlow. The total number of threads generated by TensorFlow will still be multiple, depending on the number of cores in the CPU. In most of the cases, only one thread will be active while others will be in sleeping mode. I don't think it is possible to have a single-threaded TensorFlow.
The following github issues support my point of view.
https://github.com/tensorflow/tensorflow/issues/33627
https://github.com/usnistgov/frvt/issues/30

Declaring variables in Python 2.7x to avoid issues later

I am new to Python, coming from MATLAB, and long ago from C. I have written a script in MATLAB which simulates sediment transport in rivers as a Markov Process. The code randomly places circles of a random diameter within a rectangular area of a specified dimension. The circles are non-uniform is size, drawn randomly from a specified range of sizes. I do not know how many times I will step through the circle placement operation so I use a while loop to complete the process. In an attempt to be more community oriented, I am translating the MATLAB script to Python. I used the online tool OMPC to get started, and have been working through it manually from the auto-translated version (was not that helpful, which is not surprising). To debug the code as I go, I use the
MATLAB generated results to generally compare and contrast against results in Python. It seems clear to me that I have declared variables in a way that introduces problems as calculations proceed in the script. Here are two examples of consistent problems between different instances of code execution. First, the code generated what I think are arrays within arrays because the script is returning results which look like:
array([[ True]
[False]], dtype=bool)
This result was generated for the following code snippet at the overlap_logix operation:
CenterCoord_Array = np.asarray(CenterCoordinates)
Diameter_Array = np.asarray(Diameter)
dist_check = ((CenterCoord_Array[:,0] - x_Center) ** 2 + (CenterCoord_Array[:,1] - y_Center) ** 2) ** 0.5
radius_check = (Diameter_Array / 2) + radius
radius_check_update = np.reshape(radius_check,(len(radius_check),1))
radius_overlap = (radius_check_update >= dist_check)
# Now actually check the overalp condition.
if np.sum([radius_overlap]) == 0:
# The new circle does not overlap so proceed.
newCircle_Found = 1
debug_value = 2
elif np.sum([radius_overlap]) == 1:
# The new circle overlaps with one other circle
overlap = np.arange(0,len(radius_overlap), dtype=int)
overlap_update = np.reshape(overlap,(len(overlap),1))
overlap_logix = (radius_overlap == 1)
idx_true = overlap_update[overlap_logix]
radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
A similar result for the same run was produced for variables:
radius_check_update
radius_overlap
overlap_update
Here is the same code snippet for the working MATLAB version (as requested):
distcheck = ((Circles.CenterCoordinates(1,:)-x_Center).^2 + (Circles.CenterCoordinates(2,:)-y_Center).^2).^0.5;
radius_check = (Circles.Diameter ./ 2) + radius;
radius_overlap = (radius_check >= distcheck);
% Now actually check the overalp condition.
if sum(radius_overlap) == 0
% The new circle does not overlap so proceed.
newCircle_Found = 1;
debug_value = 2;
elseif sum(radius_overlap) == 1
% The new circle overlaps with one other circle
temp = 1:size(radius_overlap,2);
idx_true = temp(radius_overlap == 1);
radius = distcheck(1,idx_true) - (Circles.Diameter(1,idx_true)/2);
In the Python version I have created arrays from lists to more easily operate on the contents (the first two lines of the code snippet). The array within array result and creating arrays to access data suggests to me that I have incorrectly declared variable types, but I am not sure. Furthermore, some variables have a size, for example, (2L,) (the numerical dimension will change as circles are placed) where there is no second dimension. This produces obvious problems when I try to use the array in an operation with another array with a size (2L,1L). Because of these problems I started reshaping arrays, and then I stopped because I decided these were hacks because I had declared one, or more than one variable incorrectly. Second, for the same run I encountered the following error:
TypeError: 'numpy.ndarray' object is not callable
for the operation:
radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
which occurs at the bottom of the above code snippet. I have posted the entire script at the following link because it is probably more useful to execute the script for oneself:
https://github.com/smchartrand/MarkovProcess_Bedload
I have set-up the code to run with some initial parameter values so decisions do not need to be made; these parameter values produce the expected results in the MATLAB-based script, which look something like this when plotted:
So, I seem to specifically be having issues with operations on lines 151-165, depending on the test value np.sum([radius_overlap]) and I think it is because I incorrectly declared variable types, but I am really not sure. I can say with confidence that the Python version and the MATLAB version are consistent in output through the first step of the while loop, and code line 127 which is entering the second step of the while loop. Below this point in the code the above documented issues eventually cause the script to crash. Sometimes the script executes to 15% complete, and sometimes it does not make it to 5% - this is due to the random nature of circle placement. I am preparing the code in the Spyder (Python 2.7) IDE and will share the working code publicly as a part of my research. I would greatly appreciate any help that can be offered to identify my mistakes and misapplications of python coding practice.

I believe I have answered my own question, and maybe it will be of use for someone down the road. The main sources of instruction for me can be found at the following three web pages:
Stackoverflow Question 176011
SciPy FAQ
SciPy NumPy for Matlab users
The third web page was very helpful for me coming from MATLAB. Here is the modified and working python code snippet which relates to the original snippet provided above:
dist_check = ((CenterCoordinates[0,:] - x_Center) ** 2 + (CenterCoordinates[1,:] - y_Center) ** 2) ** 0.5
radius_check = (Diameter / 2) + radius
radius_overlap = (radius_check >= dist_check)
# Now actually check the overalp condition.
if np.sum([radius_overlap]) == 0:
# The new circle does not overlap so proceed.
newCircle_Found = 1
debug_value = 2
elif np.sum([radius_overlap]) == 1:
# The new circle overlaps with one other circle
overlap = np.arange(0,len(radius_overlap[0]), dtype=int).reshape(1, len(radius_overlap[0]))
overlap_logix = (radius_overlap == 1)
idx_true = overlap[overlap_logix]
radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
In the end it was clear to me that it was more straightforward for this example to use numpy arrays vs. lists to store results for each iteration of filling the rectangular area. For the corrected code snippet this means I initialized the variables:
CenterCoordinates, and
Diameter
as numpy arrays whereas I initialized them as lists in the posted question. This made a few mathematical operations more straightforward. I was also incorrectly indexing into variables with parentheses () as opposed to the correct method using brackets []. Here is an example of a correction I made which helped the code execute as envisioned:
Incorrect: radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
Correct: radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
This example also shows that I had issues with array dimensions which I corrected variable by variable. I am still not sure if my working code is the most pythonic or most efficient way to fill a rectangular area in a random fashion, but I have tested it about 100 times with success. The revised and working code can be downloaded here:
Working Python Script to Randomly Fill Rectangular Area with Circles
Here is an image of a final results for a successful run of the working code:
The main lessons for me were (1) numpy arrays are more efficient for repetitive numerical calculations, and (2) dimensionality of arrays which I created were not always what I expected them to be and care must be practiced when establishing arrays. Thanks to those who looked at my question and asked for clarification.

Moving matrix from c++ to Matlab

I'm trying to take a matrix from c++ and import it to Matlab to run bintprog on this matrix, call it m. My c++ code generates these matrices of a certain type, and I need to run bintprog on them quickly, and with ideally millions of matrices.
So any of the following would be great:
A way to import a bunch of matrices at once so I can run a lot of iterations thru my Matlab code.
Or
If I could implement Matlab code right in c++ nicely.
If this is not clear leave me comments and I'll update what I can.

You can call Matlab commands from C++ code (and vice versa):
Compile your C++ code into a mex function and call bintprog using mexCallMatlab.
As proposed by Mark, you may call Matlab engine from native C++ code using matlab engine.
You may compile your C++ code as a shared library and call it from Matlab using calllib.

I suggest the simple solution, assuming that your matrices are kept in 3-dimmensional array:
Build a loop in C++, to save your matrices... Something like this:
ofstream arquivoOut0("myMatrices.dat");
for(int m=0;m<numberMatrices;m++){
for (int i=0; i< numberlines;i++){
for(int j=0;j<numberColumns;j++)
if(j!=numberColumns-1) arquivoOut0<< matrices[m][i][j] << "\t";
else arquivoOut0<< matrices[m][i][j] << "\n";
}
}
}
arquivoOut0.close();
Ok. You have saved your matrices in an ascii file! Now you have to read it in Matlab!
load myMatrices.dat
for m=1:numberMatrices
for i=1:numberLines
for j=1:numberColumns
myMatricesInMatlab(m,i,j)=myMatrices((m-1)*numberLines+i,j);
end
end
end
Now, you can use the toolbox that you need:
for i=1:numberMatrices
Apply the toolbox for myMatricesInMatlab(i,:,:);
end
I think it works, it the processing time is not an issue!

SFML getFullscreenModes

Have you ever run into issue where function in SFML 2 to get availiable modes returns you:
availiableVideoModes [3]({width=3131961357 height=3131961357 bitsPerPixel=3131961357 },{width=3131961357 height=3131961357 bitsPerPixel=3131961357 },{width=3131961357 height=3131961357 bitsPerPixel=3131961357 }) std::vector >
max int values in vector? Interesting is why 3? I tried quick debugging without luck so in parallel I thought to raise question here.
code:
std::vector<sf::VideoMode> availiableVideoModes;
availiableVideoModes = sf::VideoMode::getFullscreenModes();
interesting is that
desktopVideoMode = sf::VideoMode::getDesktopMode();
returns correct value.

The issue was in libraries link, I have linked 32bits one instead of 64bits.

Embed a stateful Python script into C++ program using boost::python

I have a C++ program that keeps generating data. I have a python class that process these data. I want to use this python class to process the data: when each time a data point is generated, I can use this python script to process the data. But this python script must be "stateful", i.e. it should be able to remember what it did before this data point.
One super basic example is, my C++ program just generates numbers, and my python class calculates the cumulative sums of the numbers generated:
Python:
class CumSum:
def addone(x):
self._cumsum += x;
print self._cumsum;
C++
[Somehow construct a CumSum instance, say c]
for (int i=0; i<100000; i++) {
int x = rand() % 1000;
[Call c.addone(x)]
}
I heard boost::python is a good way to handle this. Can anyone sketch out how to do it? I tried to read boost documents but they were too huge for me to digest.
I appreciate your help.

For basic information about how to execute your python script:
http://www.boost.org/doc/libs/1_47_0/libs/python/doc/tutorial/doc/html/python/embedding.html
For details on manipulating python objects in C++
http://www.boost.org/doc/libs/1_47_0/libs/python/doc/tutorial/doc/html/python/object.html
Much of boost-python is concerned with exporting your C++ classes to python but you aren't doing that so you can ignore it.
You may be better off using a simpler wrapper like SCXX
http://davidf.sjsoft.com/mirrors/mcmillan-inc/scxx.html

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

cs231n: how to make the program run faster? - python-2.7

Answer myself: The parallel approach introduced in this link (Parallelise python loop with numpy arrays and shared-memory) seems to work, basically cython, prange, gil, openmp and other tweaks.

Related

How can I run Tensorflow on one single core, single thread CPP?

Declaring variables in Python 2.7x to avoid issues later

Moving matrix from c++ to Matlab

SFML getFullscreenModes

Embed a stateful Python script into C++ program using boost::python

Categories

Resources