So my current project is mostly in Python, but I'm looking to rewrite the most computationally expensive portions in C++ to try and boost performance. Much of this I can achieve via simple functions loaded from DLL files, but not everything. I have a multidimensional array in Python that I want to perform operations on in C++ (particularly A* pathfinding), but I'm not really sure how to translate them, and constantly sending data one piece at a time into a loaded function seems really inefficient (the array's first two dimensions are in the low hundreds, and the functions will need to deal with scores, if not hundreds, of elements in the array at a time).
My idea was to have a class in C++ that creates its own copy of the array at setup (where performance isn't as much of an issue) and has methods that are performed on the array and return data to the main Python program. However, I'm not sure how to accomplish this, and even if this is the proper way to go about such a thing; that would seem to imply having C++ code running parallel to the main Python program, and intuition tells me that's a bad idea.
I don't know much about integrating C++ and Python beyond loading simple functions via ctypes in Python, so I'd really appreciate some pointers here. Keep in mind that I'm relatively new to C++; most of my programming experience is in Python. What would be the best way to fit the two together in this situation?
First and foremost, when you are working with multidimensional arrays in Python, you should really be using NumPy. Chances are that your program is already fast enough once you let NumPy do the number crunching (use array arithmetic instead of Python for loops).
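To make "array arithmetic instead of Python for loops" concrete, here is a minimal sketch; the grid size and the Manhattan-distance heuristic are made-up stand-ins for your pathfinding data:

import numpy as np

# Hypothetical cost grid; the first two dimensions are "in the low hundreds"
costs = np.random.rand(200, 300)
goal = (150, 220)

# Loop version (slow): Manhattan-distance heuristic to the goal cell
h_loop = np.empty_like(costs)
for i in range(costs.shape[0]):
    for j in range(costs.shape[1]):
        h_loop[i, j] = abs(i - goal[0]) + abs(j - goal[1])

# Vectorized version (fast): the same result from array arithmetic
ii, jj = np.indices(costs.shape)
h_vec = np.abs(ii - goal[0]) + np.abs(jj - goal[1])

assert np.array_equal(h_loop, h_vec)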
If this is not enough, consider writing parts of your program using Cython. Cython also supports NumPy arrays and provides a painless way to write C code using Python-like syntax.
If it really must be C++, I highly recommend using Boost.Python. Bridging Python and C++ has never been this easy. Plus, Boost.Python comes with NumPy support as well (boost::python::numeric::array).
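If you stay with ctypes in the meantime, you can at least avoid sending the data one piece at a time by handing the whole NumPy array to the native side in a single call. A rough sketch, where the library name pathfinding.so and the exported function process_grid are hypothetical:

import ctypes
import numpy as np
from numpy.ctypeslib import ndpointer

# Hypothetical shared library exposing:
#     void process_grid(double *grid, int rows, int cols);
lib = ctypes.CDLL("./pathfinding.so")  # .dll on Windows
lib.process_grid.argtypes = [
    ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),
    ctypes.c_int,
    ctypes.c_int,
]
lib.process_grid.restype = None

grid = np.ascontiguousarray(np.random.rand(200, 300))
# One call hands over a pointer to the whole array; nothing is copied element by element
lib.process_grid(grid, grid.shape[0], grid.shape[1])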
Take a look at Cython.
I use it to implement neural networks. I prefer NumPy, because it is more convenient to prepare data with Python; however, I am concerned that NumPy is not as fast as C++ libraries.
NumPy is implemented in C. So most of the time you are just calling C, and for some functionality, optimized Fortran functions or subroutines. Therefore, you will get decent speed with NumPy for many tasks. You need to vectorize your operations: don't write for loops over NumPy arrays. Of course, hand-optimized C code can be faster. On the other hand, NumPy contains a lot of already-optimized algorithms that might be faster than not-so-optimal C code written by less experienced C programmers.
You can gradually move from Python to C with Cython, and/or use Numba for JIT compilation to machine or GPU code.
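A minimal Numba sketch (assuming Numba is installed; the function and data are made up): the explicit loops that would be slow in pure Python are JIT-compiled on the first call.

import numpy as np
from numba import njit

@njit
def min_cost(grid):
    # Explicit loops are fine here: Numba compiles them to machine code
    best = np.inf
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            if grid[i, j] < best:
                best = grid[i, j]
    return best

grid = np.random.rand(500, 500)
print(min_cost(grid))  # first call compiles; later calls run at C-like speed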
I have to say that I think that the other answers here are missing things.
First, as @Mike Muller correctly points out, Python's numerical libraries have C or Fortran (or both) backends, so the performance of pure Python is almost irrelevant (as opposed to the performance of the backend, which can be significant). In this respect, whether you're driving something like MKL from Python or from C++ hardly makes a difference.
There are two differences, though:
On the plus side for Python - it is interactive. This means that, especially in conjunction with something like the IPython Notebook, you can perform an operation and plot the result, perform another operation and plot the result, etc. It's hard to get this effect for exploratory analysis with a compiled language like C++ or Java.
On the minus side for Python - it, and its scientific ecosystem, handle multicores imperfectly, to say the least. This is a fundamental problem of the language itself (read about the GIL).
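A common workaround, not covered above, is process-based parallelism, which sidesteps the GIL entirely; a small sketch using the standard library's multiprocessing module with a made-up workload:

import numpy as np
from multiprocessing import Pool

def chunk_sum(chunk):
    # CPU-bound work runs in a separate process, so the GIL is not a bottleneck
    return float(np.sum(chunk ** 2))

if __name__ == "__main__":
    data = np.random.rand(4, 1_000_000)
    with Pool(processes=4) as pool:
        partials = pool.map(chunk_sum, list(data))  # one row per worker
    print(sum(partials))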
I am looking for a way to write fast code and be able to use built-in vector operations (for the sake of readability).
Fortran seems to be a good candidate. However, almost all the resources I find on the web are about writing code without array expressions, and they contain only trivial examples of vector operations.
I feel a strong need for a good resource that covers the caveats and gives some insight into optimizing code with vector expressions.
Example:
Currently I am not even able to predict the behavior of code like this:
! a = [0], indices = [1, 1]
a(indices) = a(indices) + 1
After compiling I get a = [2], but is this correct? If I use OpenMP, will it behave the same way?
Personally, I would be very happy to have something like the following NumPy examples:
100 numpy exercises
numpy: tips and tricks to work with data
Getting the Best Performance out of NumPy
Your code is not standard conforming:
Fortran 2008 6.5.3.3.2.3:
"If a vector subscript has two or more elements with the same value, an array section with that vector subscript shall not appear in a variable definition context (16.6.7)." (See NOTE 6.15.)
Therefore the result of your operation is not defined by the standard.
Other parts of your question appear to be too broad to treat them here. There are many books about scientific programming in Fortran 90 and later.
Also be aware that by "vectorization" most people in Fortran and C or C++ mean the use of SIMD instructions, not the vectorized expressions from NumPy. The latter are just array expressions in Fortran.
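For comparison, since the question points at NumPy material: NumPy has the same pitfall with repeated indices. An in-place fancy-indexed update is buffered (the duplicate increment is lost), while np.add.at performs the unbuffered accumulation. A small sketch, using 0-based indices:

import numpy as np

a = np.zeros(1)
indices = np.array([0, 0])  # the 1-based [1, 1] from the question becomes [0, 0]

a[indices] += 1            # buffered fancy indexing: a == [1.], one increment is lost
print(a)

a = np.zeros(1)
np.add.at(a, indices, 1)   # unbuffered scatter-add: a == [2.]
print(a)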
I have scanned many sources (~20 books and dozens of web pages), so it would be hard luck if I missed something really important. The question I posted is indeed not well posed, and it comes from my initially high expectations about array operations in Fortran.
The answer I would have expected is: there are no tools to write short, readable code in Fortran with automatic parallelization (to be more precise: there are, but they are proprietary libraries).
The list of intrinsic functions available in Fortran is quite short (link) and consists only of functions that map easily to SIMD ops. There are lots of functions that one will miss.
While this could be resolved by a separate library with a separate implementation for each platform, Fortran doesn't provide one. There are commercial options (see this thread).
Brief examples of missing functions (NumPy counterparts are sketched after this list):
No built-in array sort or unique. The proposed way is to use this library, which provides single-threaded code (forget threads and CUDA).
No cumulative sum / running sum. One can trivially implement it, but the resulting code will never work well on threads/CUDA/Xeon Phi/whatever comes next.
No bincount, numpy.ufunc.at, or numpy.ufunc.reduceat (which is very useful in many applications).
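For reference, the NumPy counterparts mentioned above look like this on small made-up arrays:

import numpy as np

a = np.array([3, 1, 3, 2])
idx = np.array([0, 0, 2])

print(np.sort(a))        # array sort
print(np.unique(a))      # unique values
print(np.cumsum(a))      # cumulative / running sum
print(np.bincount(idx))  # counts per index value

out = np.zeros(3)
np.add.at(out, idx, 1.0)  # unbuffered scatter-add (numpy.ufunc.at)
print(out)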
In most cases Fortran provides a 2x speed-up even with simple implementations, but the code written will always be single-threaded, while MATLAB/NumPy functions can be reimplemented for a GPU or another parallel platform without any effort from the user's side (which has occasionally happened to MATLAB; also see gnumpy, theano, and parakeet).
To conclude, this is bad news for me. Fortran developers really care about having fast programs today, not in the future. I also can't lock my code into proprietary software. And I'm still looking for an appropriate tool. (Julia is the current candidate.)
See also:
STL analogue in Fortran, where ready-to-use algorithms are asked about.
Numerical Recipes: The Art of Parallel Programming, where the author implements basic MATLAB-like operations to make the code more expressive.
I also found these notes useful for seeing the recommended ways of optimizing code (and for seeing that there is no place for vector operations there).
numpy, fortran, blitz++: a case study
discussion about implementing unique in Fortran, where proprietary tools are recommended.
I'm debating whether to use C++ or Python for a largely math-based program.
Both have great math libraries, but which language is generally faster for complex math?
You could also consider a hybrid approach. Python is generally easier and faster to develop in, especially for things like user interfaces, input/output, etc.
C++ should certainly be faster for some math operations (although if your problem can be formulated in terms of vector operations or linear algebra, then NumPy provides a Python interface to very efficient vector manipulations).
Python is easy to extend with Cython, SWIG, Boost.Python, etc., so one strategy is to write all the bookkeeping-type parts of the program in Python and do only the computational code in C++.
I guess it is safe to say that C++ is faster, simply because it is a compiled language, which means that only your code is running, not an interpreter as with Python.
It is possible to write very fast code with python and very slow code with C++ though. So you have to program wisely in any language!
Another advantage is that C++ is type safe, which will help you to program what you actually want.
A disadvantage in some situations is that C++ is type safe, which will result in a design overhead. You have to think (maybe long and hard) about function and class interfaces, for instance.
I like Python for many reasons, so don't take this as a plea against Python.
It all depends on whether "faster" means faster to execute or faster to develop. Overall, Python will be quicker for development and C++ faster for execution. For working with integers (arithmetic), Python has full-precision integers and a lot of external tools (NumPy, pylab, ...). My advice would be to go with Python first; if you hit performance issues, then switch to C++ (or call external libraries written in C++ from Python, in a hybrid approach).
There is no single good answer; it all depends on what you want to do in terms of research / computation.
It goes without saying that C++ is going to be faster for intensive numeric computations. However, there are so many pre-existing libraries out there (written in C/C++/Haskell, etc.) with Python wrappers that it's often more convenient to use Python and let the existing libraries carry the load.
One comprehensive system is http://www.sagemath.org and a fairly interesting link is the components it uses at http://sagemath.org/links-components.html.
In my experience, a system with NumPy/SciPy and pandas is normally sufficient for most things.
Use the one you like better (and you should like python better :)).
In either case, any math-intensive computations should be carried out by existing libraries - which aren't language dependent (usually BLAS / LAPACK are used to perform the actual math).
If you choose to go with Python, use NumPy for calculations.
Edit: From your comments, it seems that you are very concerned with the speed of your program. The only way to know for sure how much time is wasted by the high-level Pythonic code is to profile your program (for example, use IPython with run -p).
In most cases, you will find that the high level stuff takes about 10% of the total running time, and therefore switching from python to C++ will only improve that 10% by some factor, for a total gain of perhaps 5% in running time.
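Outside IPython, the standard library's cProfile gives the same breakdown; a minimal sketch, where simulate is a made-up stand-in for the math-heavy entry point of your program:

import cProfile
import pstats

def simulate():
    # Placeholder for the real computation
    total = 0.0
    for i in range(1_000_000):
        total += i ** 0.5
    return total

cProfile.run("simulate()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)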
I sincerely doubt that Google and Stanford don't know C++.
"Generally faster" is more than just language. Algorithms can make or break a solution, regardless of what language it's written in. A poor choice written in C++ and be beaten by Java or Python if either makes a better algorithm choice.
For example, an in-memory, single CPU linear algebra library will have its doors blown in by a parallelized version done properly.
An implicit algorithm may actually be slower than an explicit one, despite time step stability restrictions, because the latter doesn't have to invert a matrix. This is often true for hyperbolic partial differential equations.
You shouldn't care about "generally faster". You ought to look deeply into the problem you're trying to solve and the algorithms used to solve it. You'll do better that way than a blind language choice.
I would go with Python running on the Java platform. This approach is implemented in the DataMelt program. Algorithms that call Java libraries from Python can be faster, since the JVM optimizes the code for you.
I'm looking for the best choice of C++ mathematics library to make some operations from LabVIEW blocks easier.
I need to implement many complex mathematical operations in C++, such as linear regression, peak detection, derivatives of a graph, and many others.
I found there are a lot of libraries for it: http://en.wikipedia.org/wiki/List_of_numerical_libraries#C_and_C.2B.2B
Which library is better to choose for my tasks?
(Currently I'm thinking about Boost uBLAS, but I have never worked with it before, so maybe this choice is wrong.)
Note that there isn't much more to boost uBLAS than basic linear algebra; and even if you consider the larger boost "Math and Numerics" section, it can hardly be considered a complete scientific computing package.
GSL is very good in that it's quite comprehensive. However, it's very much a 'C' library so you need to be prepared to work with raw pointers to array data and function pointer callbacks rather than higher level classes.
(Personally, these days I find myself using Python/NumPy/SciPy as much as possible; the scope of SciPy is truly incredible and NumPy arrays are fantastically easy to work with; if there were a LabVIEW/Python/SciPy integration that met your other requirements, it'd be the first thing I'd be looking at.)
I have to port a C++ STL application to Python. I am a Python newbie, but have been programming for over a decade. I have a great deal of experience with the STL, and find that it keeps me hooked to using C++. I have been searching for the following items on Google these past several days:
Python STL (in hope of leveraging my years of STL experience)
Python linked lists
Python advanced list usage
Python list optimization
Python ordered sets
And have found posts about the above topic, tutorials on Python lists that are decidedly NOT advanced, or dead ends. I am really surprised at my lack of success, I think I am just burned out from overworking and entering bad search terms!
(MY QUESTION) Can I get a Python STL wrapper, or an interface to Python lists that works like the STL? If not can someone point me to a truly advanced tutorial or paper on managing very large sorted collections of non trivial objects?
P.S. I can easily implement workarounds for one or two uses, but if management wants to port more code, I want to be ready to replace any STL code I find with equivalent Python code immediately. And YES, I HAVE MEASURED AND DO NEED TO HAVE TOTALLY OPTIMAL CODE! I CAN'T JUST DO REDUNDANT SORTS AND SEARCHES!
(ADDENDUM) Thanks for the replies, I have checked out some of the references and am pleased. In response to some of the comments here:
1 - It is being ported to Python because management says so; I would just as soon leave it alone. If it ain't broke, why fix it?
2 - Advanced list usage with non trivial objects, what I mean by that is: Many different ways to order and compare objects, not by one cmp method. I want to splice, sort, merge, search, insert, erase, and combine the lists extensively. I want lists of list iterators, I want to avoid copying.
3 - I now know that built in lists are actually arrays, and I should be looking for a different python class. I think this was the root of my confusion.
4 - Of course I am learning to do things the Python way, but I also have deadlines. The STL code I am porting is working correctly; I would like to change it as little as possible, because changes would introduce bugs.
Thanks to everyone for their input, I really appreciate it.
Python's "lists" are not linked lists -- they're like Java ArrayLists or C++'s std::vectors, i.e., in lower-level terms, a resizable compact array of pointers.
A good "advanced tutorial" on such subjects is Hettinger's Core Python containers: under the hood presentation (the video at the URL is of the presentation at an Italian conference, but it's in English; another, shorter presentation of essentially the same talk is here).
So the performance characteristics of Python lists are essentially those of C++'s std::vector: Python's .append, like C++'s push_back, is O(1), but insertion or removal "in the middle" is O(N). Consequently, keeping a list sorted (as can easily be done with the help of functions in Python's standard library module bisect) is costly: if items arrive and/or depart randomly, each insertion and removal is O(N), just like similarly maintaining order in an std::vector would be. For some purposes, such as priority queues, you may get away with a "heap queue", also easy to maintain with the help of functions in Python's standard library module heapq -- but of course that doesn't afford the same range of uses as a completely sorted list (or vector) would.
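Concretely, the two stdlib modules mentioned above look roughly like this:

import bisect
import heapq

# Keeping a list sorted with bisect: each insort is O(N) because of the shift
sorted_items = []
for x in [5, 1, 4, 2, 3]:
    bisect.insort(sorted_items, x)
print(sorted_items)         # [1, 2, 3, 4, 5]

# A heap gives O(log N) push/pop of the smallest item, but no full ordering
heap = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, x)
print(heapq.heappop(heap))  # 1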
So for purposes for which in C++ you'd use a std::set (and rely on its being ordered, i.e., a hashset wouldn't do -- Python's sets are hash-based, not ordered) you may be better off avoiding Python builtin containers in favor of something like this module (if you need to keep things pure-Python), or this one (which offers AVL trees, not RB ones, but is coded as a C-implemented Python extension and so may offer better performance) if C-coded extensions are OK.
If you do end up using your own module (be it pure Python, or C-coded), you may, if you wish, give it an STL-like veneer/interface (with .begin, .end, iterator objects that are advanced by incrementing rather than, as per normal Python behavior, by calling their next methods, ...), although it will never perform as well as "going with the grain" of the language would (the for statement is optimized to use normal Python iterators, i.e., one with next methods, and it will be faster than wrapping a somewhat awkward while around non-Python-standard, STL-like iterators).
To give an STL-like veneer to any Python built-in container, you'll incur substantial wrapping overhead, so the performance hit may be considerable. If you, as you say, "DO NEED TO HAVE TOTALLY OPTIMAL CODE", using such a veneer just for "syntax convenience" purposes would therefore seem to be a very bad choice.
Boost Python, the Python extension package that wraps the powerful C++ Boost library, might perhaps serve your purposes best.
If I were you I would take the time to learn how to properly use the various data structures available in Python instead of looking for things that are similar to what you know from C++.
It's not like you're looking for something fancy, just working with some data structures. In that case I would refer you to Python's documentation on the subject.
Doing this the 'Python' way would help you and more importantly future maintainers who will wonder why you try to program C++ in Python.
Just to whet your appetite, there's also no reason to prefer STL's style to Python's (and for the record, I'm also a C++ programmer who knows the STL thoroughly). Consider the most trivial example of constructing a list and traversing it:
The Pythonic way:
mylist = [1, 2, 3, 4]
for value in mylist:
    print(value)  # play around with value here
The STL way (I made this up, to resemble STL) in Python:
mylist = [1, 2, 3, 4]
mylistiter = mylist.begin()
while mylistiter != mylist.end():
    value = mylistiter.item()
    mylistiter.next()
For linked-list-like operations people usually use collections.deque.
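As a tiny sketch of why deque is the usual stand-in (constant-time operations at both ends, unlike a list):

from collections import deque

d = deque([2, 3, 4])
d.appendleft(1)  # O(1) at the left end, unlike list.insert(0, ...)
d.append(5)      # O(1) at the right end
d.popleft()      # O(1)
print(d)         # deque([2, 3, 4, 5])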
What operations do you need to perform fast? Bisection? Insertion?
I would say that your issues go beyond just STL porting. Since the list, dict, and set data structures, which are bolted on to C++ via the STL, are native to core Python, then their usage is incorporated into common Python code idioms. If you want to give Google another shot, try looking for references for "Python for C++ Programmers". One of your hits will be this presentation by Alex Martelli. It's a little dated, from way back in ought-three, but there is a side-by-side comparison of some basic Python code that reads through a text file, and how it would look using STL.
From there, I would recommend that you read up on these Python features:
iterators
generators
list and generator comprehensions
And these builtin functions:
zip
map
Once you are familiar with these, then you will be able to construct your own translation/mapping between STL usage and Python builtin data structures.
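A compact, made-up illustration of those features working together:

# Generator: lazily yields values, roughly analogous to an input iterator
def squares(n):
    for i in range(n):
        yield i * i

names = ["a", "b", "c"]
values = [sq for sq in squares(3)]    # list comprehension: [0, 1, 4]
pairs = list(zip(names, values))      # [('a', 0), ('b', 1), ('c', 4)]
labels = list(map(str.upper, names))  # ['A', 'B', 'C']
print(pairs, labels)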
As others have said, if you are looking for a "plug-and-chug" formula to convert STL C++ code to Python, you will just end up with bad Python. Such a brute-force approach will never result in the power, elegance, and brevity of a single-line list comprehension. I had this very experience when introducing Python to one of our managers, who was familiar with Java and C++ iterators. When I showed him this code:
numParams = 1000
paramRequests = [ ("EqptEmulator/ProcChamberI/Sensors",
"ChamberIData%d"%(i%250)) for i in range(numParams) ]
record.internalArray = [ParameterRequest(*pr) for pr in paramRequests]
and I explained that these replaced this code (or something like it, this might be a mishmash of C++ and Java APIs, sorry):
std::vector<ParameterRequest> prs = new std::vector<ParameterRequest>();
for (int i = 0; i<1000; ++i) {
    string idstr;
    strstream sstr(idstr);
    sstr << "ChamberIData" << (i%250);
    prs.add(new ParameterRequest("EqptEmulator/ProcChamberI/Sensors", idstr));
}
record.internalArray = new ParameterRequest[prs.size];
prs.toArray(record.internalArray);
One of your instincts from working with C++ will be a reluctance to create new lists from old, but rather to update or filter a list in place. We even see this on many forums from Python developers asking about how to modify a list while iterating over it. In Python, you are much better off building a new list from the old with a list comprehension.
allItems = [... some list of items, perhaps from a database query ...]
validItems = [it for it in allItems if it.isValid()]
As opposed to:
validItems = []
for it in allItems:
    if it.isValid():
        validItems.append(it)
or worse:
# get list of indexes of items to be removed
removeIndexes = []
for i in range(len(allItems)):
    if not allItems[i].isValid():
        removeIndexes.append(i)
# don't forget to remove items in descending order, or later indexes
# will be invalidated by earlier removals
removeIndexes.sort(reverse=True)
# copy list
validItems = allItems[:]
# now remove the invalid items from the copy
for idx in removeIndexes:
    del validItems[idx]
Python STL (in hope of leveraging my years of STL experience) - Start with the collections ABC's to learn what Python has. http://docs.python.org/library/collections.html
Python linked lists. Python lists have all the features you would want from a linked list.
Python advanced list usage. What does this mean?
Python list optimization. What does this mean?
Python ordered sets. You have several choices here; you could invent your own "ordered set" as a list that discards duplicates. Or you can build on the heapq module and add methods that discard duplicates: http://docs.python.org/library/heapq.html.
In many cases, however, the cost of maintaining an ordered set is actually excessive, because it only needs to be ordered once, at the end of the algorithm. In other cases, the "ordered set" really is a heapq -- you never needed the set-like features and only needed the ordering.
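One possible reading of "a list that discards duplicates", as a minimal sketch (the class name and API are made up):

import bisect

class OrderedSet:
    """A sorted list that silently discards duplicates."""
    def __init__(self):
        self._items = []

    def add(self, value):
        i = bisect.bisect_left(self._items, value)
        if i == len(self._items) or self._items[i] != value:
            self._items.insert(i, value)

    def __iter__(self):
        return iter(self._items)

s = OrderedSet()
for x in [3, 1, 3, 2]:
    s.add(x)
print(list(s))  # [1, 2, 3]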
Non-Trivial.
(I'm guessing at what you meant by "non-trivial".) All Python objects are equivalent: there are no "trivial" vs. "non-trivial" objects. They're all first-class objects and can all have "non-trivial" complexity without any real work. This is not C++, where there are primitive (non-object) values floating around. Everything's an object in Python.
Management Expectations.
For the most part the C++ brain-cramping doesn't exist in Python. Use the obvious Python classes the obvious way and you'll have much less code. The reduction in code volume is the big win. Often, the management reason for converting C++ to Python is to get rid of the C++ complexity.
Python code will be much simpler, making it much more reliable and much easier to maintain.
While it's generally true that Python is slower than C++, it's also true that picking the right algorithm and data structure can have dramatic improvements on performance. In one benchmark, someone found that Python was actually faster than C because the C program had such a poorly chosen data structure.
It's possible that your C++ has a really poor algorithm and you will see comparable performance from Python.
It's also possible that your C++ program is I/O bound, or has other limitations that will leave the Python running at a comparable speed.
The design of Python is quite intentionally "you can use just a few data structures (arrays and hash tables) for whatever you want to do, and if that isn't fast enough there's always C".
Python's standard library doesn't have a sorted-list data structure like std::set. You can download a red/black tree implementation or roll your own. (For small data sets, just using a list and periodically sorting it is a totally normal thing to do in Python.)
Rolling your own linked list is very easy.
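For example, a minimal singly linked list takes only a few lines (the names are made up):

class Node:
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def push_front(head, value):
    # Prepend in O(1) and return the new head
    return Node(value, head)

head = None
for v in [1, 2, 3]:
    head = push_front(head, v)  # builds 3 -> 2 -> 1

node = head
while node is not None:
    print(node.value)
    node = node.next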