C/C++ for Python programmer [closed]

I have to switch from Python to C/C++.
Do you know of a quick "reference tutorial" or something similar to help me get started? For example, something like the NumPy and SciPy tutorials.
I have read a lot of "documentation", for example
C++ for dummies
the K&R C Programming Language
a lot of blog and online documentation such as: http://eli.thegreenplace.net/2010/01/11/pointers-to-arrays-in-c/,
http://newdata.box.sk/bx/c/
tons of Q&A here on StackOverflow
...
but it's still not clear to me how to even start porting something like this to C/C++:
#!/usr/bin/env python
import time
import numpy as np
import tables as tb
"""Retrieve 3D positions from 1000 files and store them in one single HDF5 file.
"""
t = time.time()
# Empty array
sample = np.array([])
sample.shape = (0, 3)
# Loop over the files
for i in range(0, 1000):
    filename = "mill2sort-" + str(i) + "-extracted.h5"
    print "Doing ", filename
    # Open data file
    h5f = tb.openFile(filename, 'r')
    # Stack new data under previous data
    sample = np.vstack((sample, h5f.root.data.read()))
    h5f.close()
# Create the new file
h5 = tb.openFile("mill2sort-extracted-all", 'w')
# Save the array
h5.createArray(h5.root, 'data', sample, title='mill_2_sub_sample_all')
h5.flush()
h5.close()
print "Done in ", time.time()-t, " seconds."
in C or C++. In this example I was not even able to figure out how to pass a 3D array to a function that finds its dimensions, something like
int getArrayDimensions(int* array, int *dimensions){
    *dimensions = sizeof(*array)/sizeof(array[0]);
    return 0;
}
With array being
int array[3][3][3] = ...
Thank you for any suggestions! :)

OK, for that particular example:
You can get time services from the standard library here.
You can use Eigen for linear algebra. It's an amazing library; I'm in love with it.
Check here to learn how to manipulate files.
While using C++ you might miss some features from Python, but most of them are actually provided by the Boost libraries. For instance, returning multiple values from a function is very easy with the Boost.Tuple library, as in here. You can use boost::shared_ptr if you don't want to bother with memory management. Or, if you want to keep using Python to play with your C++ classes, you can use Boost.Python. Boost.Parameter helps you define functions with named arguments. There is also Boost.Lambda for lambda functions, but if your environment supports it, you can also use C++11 to get language support for lambdas. Boost is a gold mine; never stop digging. Just assume that it's part of the standard library. I develop C++ on many different platforms, and neither Eigen nor Boost has let me down yet.
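For example, a minimal sketch of returning multiple values with std::tuple from the standard library (boost::tuple looks almost identical on pre-C++11 compilers); the function here is made up purely for illustration:

#include <iostream>
#include <string>
#include <tuple>

// Return several values at once, roughly like "return a, b, c" in Python.
std::tuple<int, double, std::string> summarize()
{
    return std::make_tuple(42, 3.14, "hello");
}

int main()
{
    int count;
    double mean;
    std::string label;
    // std::tie unpacks the tuple, similar to Python's tuple unpacking.
    std::tie(count, mean, label) = summarize();
    std::cout << count << " " << mean << " " << label << "\n";
}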
Here's a good FAQ on C++ best practices. It describes a very important principle that you have to keep in mind at all times while working in C++. I extend it a bit in my own mind: if you're going to do something dangerous, such as allocate memory with a raw new, index a raw C-style array, pass around raw pointers, or use static_cast (or worse, reinterpret_cast), it should usually happen inside a class dedicated to it, with the code that keeps it from causing trouble living right next to it, so that you can see at a glance that everything is under control.
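A tiny sketch of what I mean, not taken from the FAQ: the raw new[]/delete[] live inside one small class, and nothing else ever touches them.

#include <cstddef>

// Minimal RAII wrapper: the raw new[]/delete[] live here and nowhere else.
class IntBuffer
{
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}
    ~IntBuffer() { delete[] data_; }

    // Forbid copying so two objects never delete the same pointer.
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};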
Finally, my favourite!!! Do you want to keep using generators in C++? Here's some dark magic.
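I won't spoil the dark magic, but if you only need something generator-ish, a plain function object with internal state already gets you part of the way. A made-up minimal sketch:

#include <iostream>

// A counter that behaves a little like a Python generator:
// each call to next() yields the following value.
class CountFrom
{
public:
    explicit CountFrom(int start) : current_(start) {}
    int next() { return current_++; }
private:
    int current_;
};

int main()
{
    CountFrom gen(10);
    for (int i = 0; i < 3; i++)
        std::cout << gen.next() << "\n";   // prints 10, 11, 12
}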

Alright, let's just start with C for now.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>

void readH5Data(FILE *file, int ***sample);  // this is for you to implement
void writeH5Data(FILE *file, int ***sample); // this is for you to implement

int main(int argc, const char *argv[])
{
#define width 3
#define height 3
#define depth 3
    time_t t = time(NULL);
    int ***sample = calloc(width, sizeof(*sample));
    for (int i = 0; i < width; i++)
    {
        sample[i] = calloc(height, sizeof(**sample));
        for (int j = 0; j < height; j++)
        {
            sample[i][j] = calloc(depth, sizeof(***sample));
        }
    }
    for (int i = 0; i < 1000; i++)
    {
        char filename[64]; // a plain char buffer, not an array of pointers
        sprintf(filename, "mill2sort-%i-extracted.h5", i);
        // open the file
        FILE *filePtr = fopen(filename, "r");
        if (filePtr == NULL || ferror(filePtr))
        {
            fprintf(stderr, "%s\n", strerror(errno));
            exit(EXIT_FAILURE);
        }
        readH5Data(filePtr, sample);
        fclose(filePtr);
    }
    char filename[] = "mill2sort-extracted-all";
    FILE *writeFile = fopen(filename, "w");
    if (writeFile == NULL || ferror(writeFile))
    {
        fprintf(stderr, "%s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    writeH5Data(writeFile, sample);
    fflush(writeFile);
    fclose(writeFile);
    printf("Done in %lli seconds\n", (long long int) (time(NULL) - t));
    for (int i = 0; i < width; i++)
    {
        for (int j = 0; j < height; j++)
        {
            free(sample[i][j]);
        }
        free(sample[i]);
    }
    free(sample);
}
As long as you remember that your array is 3x3x3, you shouldn't have any problems with overstepping the bounds in your writeH5Data function.
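The readH5Data/writeH5Data stubs are left as an exercise above; if you end up using the HDF5 C library for them, the read side might look roughly like this sketch. It assumes the dataset is named /data (as in the Python code) and stores doubles in an N x 3 layout; check the real type and dimensions before trusting it, and note that the HDF5 API opens files itself rather than taking a FILE*.

#include <hdf5.h>
#include <stdlib.h>

// Read an N x 3 dataset of doubles from "/data" into one contiguous buffer.
// Returns NULL on failure; the caller frees the buffer.
double *readPositions(const char *filename, hsize_t *rowsOut)
{
    hid_t file = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0) return NULL;

    hid_t dset  = H5Dopen2(file, "/data", H5P_DEFAULT);
    hid_t space = H5Dget_space(dset);

    hsize_t dims[2] = {0, 0};
    H5Sget_simple_extent_dims(space, dims, NULL);

    double *buf = (double *) malloc(dims[0] * dims[1] * sizeof(double));
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);

    *rowsOut = dims[0];
    return buf;
}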

This question is getting quite old, but here are a couple of references that have been useful to me:
A Transition Guide: Python to C++ (pdf)
A Brief Introduction to C++ for Python programmers (incomplete but quite good)

Related

What is the C++ equivalent to this MATLAB code? [closed]

In my project, I am responsible for migrating some MATLAB code to C++. The code below refers to serial communication from a computer to a microcontroller. The function CreatePackage generates a package which is then sent to the microcontroller using MATLAB's fwrite(serial) function.
function package = CreatePackage(V)
for ii = 1:size(V,2)
    if V(ii) > 100
        V(ii) = 100;
    elseif V(ii) < -100
        V(ii) = -100;
    end
end
vel = zeros(1, 6);
for ii = 1:size(V,2)
    if V(ii) > 0
        vel(ii) = uint8(V(ii));
    else
        vel(ii) = uint8(128 + abs(V(ii)));
    end
end
package = ['BD' 16+[6, vel(1:6)], 'P' 10 13]+0;
And then, to send the package:
function SendPackage(S, Package)
for ii = 1:length(S)
    fwrite(S(ii), Package);
end
How can I create an array/vector in C++ to represent the package variable used in the MATLAB code above?
I have no experience with MATLAB, so any help would be greatly appreciated.
Thank you!
The package variable is being streamed as 12 unsigned 8-bit integers in your MATLAB code, so I would use a char[12] array in C++. You can double-check sizeof(char) on your platform to ensure that char is only 1 byte.
Yes, MATLAB's default data type is a double, but that does not mean your vector V isn't filled with integer values. You have to look at this data or the specs from your equipment to figure this out.
Whatever the values are coming in, you are clipping the outgoing range to [-100, 100] and then offsetting them into the byte range [0, 255].
If you do not know a whole lot about MATLAB, you may be able to leverage what you know from C++ and use C as an interim. MATLAB's fwrite functionality lines up with that of C's, and you can include these functions in C++ with the #include <cstdio> preprocessor directive.
Here is an example solution:
#include <cstdio>    // fwrite
#include <cstdlib>   // abs
#include <algorithm> // min, max
...
void makeAndSendPackage(int *a6x1array, FILE **fHandles, int numHandles)
{
    // 16 + 6 = 22 is the third byte produced by the MATLAB expression
    unsigned char packageBuffer[13] = {'B', 'D', 22, 0, 0, 0, 0, 0, 0, 'P', '\n', '\r', 0};
    for (int i = 0; i < 6; i++)
    {
        int tmp = a6x1array[i];
        packageBuffer[i + 3] = tmp < 0 ? std::abs(std::max(-100, tmp)) + 144
                                       : std::min(100, tmp) + 16;
    }
    for (int i = 0; i < numHandles; i++)
    {
        fwrite(packageBuffer, sizeof(unsigned char), 12, fHandles[i]);
    }
}
Let me know if you have questions about the above code.
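If it helps, here is a hypothetical way to call it. Note that on a real system you would normally open and configure the serial port through the OS serial API rather than plain fopen, so this just dumps the package to an ordinary file to inspect the 12 bytes:

#include <cstdio>

int main()
{
    int velocities[6] = {50, -30, 100, -100, 0, 75};

    // Hypothetical smoke test: write the package into a regular file
    // instead of a real serial handle, just to look at the bytes.
    FILE *out = std::fopen("package.bin", "wb");
    if (out != NULL)
    {
        makeAndSendPackage(velocities, &out, 1);
        std::fclose(out);
    }
    return 0;
}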

How can I quickly printf 2 dimensional array of chars made of pointers to pointers without using a loop?

I'm making an ASCII game and I need performance, so I decided to go with printf(). But there is a problem: I designed my char array as a multidimensional char ** array, and printing it outputs memory garbage instead of the data. I know it's possible to print it with a for loop, but the performance drops rapidly that way. I need to printf it like a static array[][]. Is there a way?
I made an example with a working and a notWorking array. I only need printf() to work with the notWorking array.
edit: I'm using Visual Studio 2015 on Win 10, and yeah, I tested performance and cout is much slower than printf (but I don't really know why this is happening)
#include <iostream>
#include <cstdio>

int main()
{
    const int X_SIZE = 40;
    const int Y_SIZE = 20;

    char works[Y_SIZE][X_SIZE];
    char ** notWorking;
    notWorking = new char*[Y_SIZE];
    for (int i = 0; i < Y_SIZE; i++) {
        notWorking[i] = new char[X_SIZE];
    }

    for (int i = 0; i < Y_SIZE; i++) {
        for (int j = 0; j < X_SIZE; j++) {
            works[i][j] = '#';
            notWorking[i][j] = '#';
        }
        works[i][X_SIZE-1] = '\n';
        notWorking[i][X_SIZE - 1] = '\n';
    }
    works[Y_SIZE-1][X_SIZE-1] = '\0';
    notWorking[Y_SIZE-1][X_SIZE-1] = '\0';

    printf("%s\n\n", works);
    printf("%s\n\n", notWorking);

    system("PAUSE");
}
Note: I think I could make some kind of a buffer or static array for just copying and displaying data, but I wonder if that can be done without it.
If you would like to print a 2D structure with printf without a loop, you need to present it to printf as a contiguous, one-dimensional C string. Since your game needs access to the string as a 2D structure, you could make an array of pointers into this flat structure that would look like this:
The array of pointers partitions the buffer for use as a 2D structure, while the buffer itself can be printed by printf because it is one contiguous C string.
Here is the same structure in code:
// X_SIZE+1 is for '\n's; overall +1 is for '\0'
char buffer[Y_SIZE*(X_SIZE+1)+1];
char *array[Y_SIZE];
// Setup the buffer and the array
for (int r = 0 ; r != Y_SIZE ; r++) {
    array[r] = &buffer[r*(X_SIZE+1)];
    for (int c = 0 ; c != X_SIZE ; c++) {
        array[r][c] = '#';
    }
    array[r][X_SIZE] = '\n';
}
buffer[Y_SIZE*(X_SIZE+1)] = '\0';
printf("%s\n", buffer);
Demo.
Some things you can do to increase performance:
There is absolutely no reason to have an array of pointers, each pointing at an array. This will cause heap fragmentation, as your data will end up all over the heap. Allocating memory in adjacent cells has many benefits in terms of speed; for example, it may improve the use of the data cache.
Instead, allocate a true 2D array:
char (*array2D) [Y] = new char [X][Y];
printf as well as cout are both quite slow, as they come with tons of overhead and extra features which you don't need. Since they are just advanced wrappers around the system-specific console functions, you should consider using the system-specific functions directly, for example the Windows console API. It will, however, make your program non-portable.
If that's not an option, you could try to use puts instead of printf, since it has far less overhead.
The main performance issue with printf/cout is that they write to the end of the "standard output stream", meaning you can't write where you like but always at the bottom of the screen. This forces you to redraw the whole thing every time you change something, which is slow and can cause flicker.
Old DOS/Turbo C programs solved this with a non-standard function called gotoxy which allowed you to move the "cursor" and print where you liked. In modern programming, you can do this with the console API functions. Example for Windows.
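For reference, a minimal sketch of that gotoxy-style positioning with the Windows console API (error handling omitted):

#include <windows.h>
#include <cstdio>

// Move the console cursor to column x, row y, then print in place.
void printAt(short x, short y, const char *text)
{
    COORD pos = { x, y };
    SetConsoleCursorPosition(GetStdHandle(STD_OUTPUT_HANDLE), pos);
    fputs(text, stdout);
}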
You could/should separate graphics from the rest of the program. If you have one thread handling graphics only and the main thread handling algorithms, the graphics updates will work more smoothly, without having to wait for whatever else the program is doing. It makes the program far more advanced though, as you have to consider thread-safety issues.

C++ ASIO, accessing buffers

I have no experience in audio programming, and C++ is quite a low-level language, so I'm having some problems with it. I work with the ASIO SDK 2.3 downloaded from http://www.steinberg.net/en/company/developers.html.
I am writing my own host based on the example inside the SDK.
For now I've managed to go through the whole sample and it looks like it's working. I have an external sound card connected to my PC. I've successfully loaded the driver for this device, configured it, handled callbacks, converted data from analog to digital, etc. - common stuff.
And the part where I am stuck now:
When I play some track via my device I can see the bars moving in the mixer (the device's software). So the device is connected the right way. In my code I've picked the inputs and outputs with the names of the bars that are moving in the mixer. I've also used ASIOCreateBuffers() to create a buffer for each input/output.
Now correct me if I am wrong:
When ASIOStart() is called and the driver is in the running state, and I feed a sound signal to my external device, I believe the buffers get filled with data, right?
I am reading the documentation but I am a bit lost - how can I access the data sent by the device to the application, stored in the INPUT buffers? Or the signal? I need it for signal analysis or maybe recording in the future.
EDIT: If I've made it too complicated, then in a nutshell my question is: how can I access the input stream data from code? I don't see any objects/callbacks in the documentation letting me do so.
The hostsample in the ASIO SDK is pretty close to what you need. In the bufferSwitchTimeInfo callback there is some code like this:
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
    int ch = asioDriverInfo.bufferInfos[i].channelNum;
    if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
    {
        char* buf = asioDriver.bufferInfos[i].buffers[index];
        ....
Inside of that if block asioDriver.bufferInfos[i].buffers[index] is a pointer to the raw audio data (index is a parameter to the method).
The format of the buffer is dependent upon the driver, and that can be discovered by testing asioDriverInfo.channelInfos[i].type. The formats will be 32-bit int LSB first, 32-bit int MSB first, and so on. You can find the list of values in the ASIOSampleType enum in asio.h. At this point you'll want to convert the samples to some common format for your downstream signal processing code. If you're doing signal processing you'll probably want to convert to double. The file host\asioconvertsample.cpp will give you some idea of what's involved in the conversion. The most common format you're going to encounter is probably INT32 MSB. Here is how you'd convert it to double.
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
    int ch = asioDriverInfo.bufferInfos[i].channelNum;
    if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
    {
        switch (asioDriverInfo.channelInfos[i].type)
        {
        case ASIOInt32LSB:
        {
            double* pDoubleBuf = new double[_bufferSize];
            for (int i = 0 ; i < _bufferSize ; ++i)
            {
                pDoubleBuf[i] = *(int*)asioDriverInfo.bufferInfos.buffers[index] / (double)0x7fffffff;
            }
            // now pDoubleBuf contains one channels worth of samples in the range of -1.0 to 1.0.
            break;
        }
        // and so on...
Thank you very much. Your answer helped quite a bit, but as I am somewhat inexperienced with C++ :P I find it a bit problematic.
In general, I've written my own host based on hostsample. I didn't implement the asioDriverInfo structure and am using plain variables for now.
My first problem was:
char* buf = asioDriver.bufferInfos[i].buffers[index];
as I got an error that I can't cast (void*) to char*, but this probably solved the problem:
char* buf = static_cast<char*>(bufferInfos[i].buffers[doubleBufferIndex]);
My second problem is with the data conversion. I've checked the file you recommended, but it looks a bit like black magic to me. For now I am trying to follow your example and:
for (int i = 0; i < inputBuffers + outputBuffers; i++)
{
    if (bufferInfos[i].isInput)
    {
        switch (channelInfos[i].type)
        {
        case ASIOSTInt32LSB:
        {
            double* pDoubleBuf = new double[buffSize];
            for (int j = 0 ; j < buffSize ; ++j)
            {
                pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
            }
            break;
        }
        }
    }
I get an error there:
pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
which is:
error C2296: '/' : illegal, left operand has type 'void *'
What I don't get is that in your example there is no index after bufferInfos (asioDriverInfo.bufferInfos.buffers[index]), and even if I fix it... what type should I cast it to so it works?
PS: I am sure the ASIOSTInt32LSB data type is right for my PC.
The ASIO input and output buffers are accessible through void pointers, but using memcpy or memmove to access the I/O buffers will create a memory copy, which is to be avoided if you are doing real-time processing. I would suggest casting the pointer to int* so you can access the samples directly.
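For example, a small sketch of that direct access, assuming the channel type really is ASIOSTInt32LSB and the output array is preallocated:

#include <cstdint>

// Scale one channel's worth of 32-bit integer samples into floats,
// reading straight out of the ASIO buffer with no intermediate copy.
void convertInt32Buffer(const void *asioBuffer, float *out, long numSamples)
{
    const int32_t *in = static_cast<const int32_t *>(asioBuffer);
    for (long i = 0; i < numSamples; ++i)
        out[i] = in[i] / 2147483648.0f;   // roughly [-1.0, 1.0)
}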
Converting samples one at a time is also very slow in real-time processing when you have 100+ audio channels, especially since AVX2 is supported on most CPUs.
_mm256_loadu_si256() and _mm256_cvtepi32_ps() will do the conversion much faster.
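A sketch of that AVX2 version, assuming numSamples is a multiple of 8 (the remainder would need a scalar tail loop):

#include <immintrin.h>   // AVX intrinsics
#include <cstdint>

// Convert 32-bit integer samples to floats in roughly [-1.0, 1.0), 8 at a time.
void convertInt32BufferAVX2(const void *asioBuffer, float *out, long numSamples)
{
    const int32_t *in = static_cast<const int32_t *>(asioBuffer);
    const __m256 scale = _mm256_set1_ps(1.0f / 2147483648.0f);

    for (long i = 0; i < numSamples; i += 8)
    {
        __m256i raw = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(in + i));
        __m256 samples = _mm256_mul_ps(_mm256_cvtepi32_ps(raw), scale);
        _mm256_storeu_ps(out + i, samples);
    }
}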

Python around 40 times slower than c++ [closed]

I was implementing a rolling median solution and was not sure why my Python implementation was around 40 times slower than the C++ implementation.
Here are the complete implementations
C++
#include <iostream>
#include <vector>
#include <string.h>
using namespace std;
int tree[17][65536];
void insert(int x) { for (int i=0; i<17; i++) { tree[i][x]++; x/=2; } }
void erase(int x) { for (int i=0; i<17; i++) { tree[i][x]--; x/=2; } }
int kThElement(int k) {
int a=0, b=16;
while (b--) { a*=2; if (tree[b][a]<k) k-=tree[b][a++]; }
return a;
}
long long sumOfMedians(int seed, int mul, int add, int N, int K) {
    long long result = 0;
    memset(tree, 0, sizeof(tree));
    vector<long long> temperatures;
    temperatures.push_back( seed );
    for (int i=1; i<N; i++)
        temperatures.push_back( ( temperatures.back()*mul+add ) % 65536 );
    for (int i=0; i<N; i++) {
        insert(temperatures[i]);
        if (i>=K) erase(temperatures[i-K]);
        if (i>=K-1) result += kThElement( (K+1)/2 );
    }
    return result;
}
// default input
// 47 5621 1 125000 1700
// output
// 4040137193
int main()
{
    int seed,mul,add,N,K;
    cin >> seed >> mul >> add >> N >> K;
    cout << sumOfMedians(seed,mul,add,N,K) << endl;
    return 0;
}
Python
def insert(tree,levels,n):
    for i in xrange(levels):
        tree[i][n] += 1
        n /= 2

def delete(tree,levels,n):
    for i in xrange(levels):
        tree[i][n] -= 1
        n /= 2

def kthElem(tree,levels,k):
    a = 0
    for b in reversed(xrange(levels)):
        a *= 2
        if tree[b][a] < k:
            k -= tree[b][a]
            a += 1
    return a

def main():
    seed,mul,add,N,K = map(int,raw_input().split())
    levels = 17
    tree = [[0] * 65536 for _ in xrange(levels)]
    temps = [0] * N
    temps[0] = seed
    for i in xrange(1,N):
        temps[i] = (temps[i-1]*mul + add) % 65536
    result = 0
    for i in xrange(N):
        insert(tree,levels,temps[i])
        if (i >= K):
            delete(tree,levels,temps[i-K])
        if (i >= K-1):
            result += kthElem(tree,levels,((K+1)/2))
    print result

# default input
# 47 5621 1 125000 1700
# output
# 4040137193
main()
On the above-mentioned input (in the comments of the code), the C++ code took around 0.06 seconds while the Python code took around 2.3 seconds.
Can someone suggest the possible problems with my Python code and how to improve it to less than a 10x performance hit?
I don't expect it to be anywhere near the C++ implementation, but on the order of 5-10x. I know I can optimize this by using libraries like numpy (and/or scipy). I am asking this question from the point of view of using Python for solving programming challenges. These libraries are usually not allowed in these challenges. I am just asking if it is even possible to beat the time limit for this algorithm in Python.
If somebody is interested C++ code is borrowed from Floating median problem at http://community.topcoder.com/tc?module=Static&d1=match_editorials&d2=srm310
[Edit]
For those who think using numpy arrays will improve the performance, it does not. On the other hand, just using a numpy ndarray instead of a list of lists degraded performance further, to around 14 seconds, which is more than a 200x slowdown from C++.
Pure Python code which is compute-bound and written procedurally is likely to be slow, as you have found. If you want to make something in Python which runs quickly for tasks like this, you'll need to use some C (or C++, Fortran, or other) extensions, which are abundant. For example, statistics and math people use NumPy and SciPy and related tools, which are easy to use from Python but which are actually implemented in compiled languages and have high performance (if used carefully).
If you want to try to squeeze a bit more performance out of pure Python, you can try using the "cProfile" module to analyze your code. But it probably won't get anywhere near C++ speed unless you use smarter modules like NumPy or write your own extensions.
You might gain a small amount by refactoring this:
reversed(xrange(levels))
Especially if you are using Python 2.x, as this will create an actual list. You can instead do something like this:
xrange(levels - 1, -1, -1)
Can some one suggest [...] how to improve to less than 10x performance hit?
Profile the code.
Look into using NumPy instead of native lists.
If that turns out to not be enough, look into using Cython for the critical part.

CUDA 5.0 context management with single application thread in multiple GPU environment

It seems that most tutorials, guides, books and Q&A from the web refer to CUDA 3 and 4.x, so that is why I'm asking specifically about CUDA 5.0. To the question...
I would like to program for an environment with two CUDA devices but use only one thread, to keep the design simple (especially because it is a prototype). I want to know if the following code is valid:
float *x[2];
float *dev_x[2];
for(int d = 0; d < 2; d++) {
    cudaSetDevice(d);
    cudaMalloc(&dev_x[d], 1024);
}
for(int repeats = 0; repeats < 100; repeats++) {
    for(int d = 0; d < 2; d++) {
        cudaSetDevice(d);
        cudaMemcpy(dev_x[d],x[d],1024,cudaMemcpyHostToDevice);
        some_kernel<<<...>>>(dev_x[d]);
        cudaMemcpy(x[d],dev_x[d],1024,cudaMemcpyDeviceToHost);
    }
    cudaStreamSynchronize(0);
}
I would like to know specifically whether the cudaMalloc(...) allocations made before the test loop persist even with the interleaved cudaSetDevice() calls that happen in the same thread. Also, I would like to know if the same holds for context-dependent objects such as cudaEvent_t and cudaStream_t.
I am asking because I have an application in this style that keeps getting a mapping error, and I can't find what it is - whether it's some memory leak or wrong API usage.
Note: In my original code, I do check every single CUDA call. I did not put it here for code readability.
Is this just a typo?
for(int d = 0; d < 2; d++) {
    cudaSetDevice(0); // shouldn't that be 'd'
    cudaMalloc(&dev_x, 1024);
}
Please check the return value of all API calls!
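A common way to do that without cluttering the code is a small checking macro wrapped around every runtime call; this is just a sketch, adapt the error handling to your application:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line information whenever a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err), __FILE__, __LINE__);          \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaSetDevice(d));
//   CUDA_CHECK(cudaMalloc(&dev_x[d], 1024));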