Can __builtin_cpu_is be tricked into checking an array? - c++

I want to write a program that displays all CPU options of the current CPU.
Individual calls like this work:
if (__builtin_cpu_is("intel"))
Instead I would prefer to declare an array:
const char* cpuType[] = {
"intel", "atom", "core2", "corei7", "nehalem",
"westmere", "sandybridge", "amd", "amdfam10h", "barcelona",
"shanghai", "istanbul", "btver1", "amdfam15h",
"bdver1", "bdver2", "bdver3", "bdver4", "btver2"
};
and in a loop check the same thing:
for (int i = 0; i < sizeof(cpuType)/sizeof(char*); i++)
if (__builtin_cpu_is(cpuType[i]))
GCC refuses, saying the argument has to be a constant string.
Is there any way to make this work aside from writing the code over and over?
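One possible workaround, sketched here on the assumption that GCC (or a compatible compiler) is targeting x86: because the builtin only accepts a string literal, an X-macro can expand the list into one call per literal, so the names are still written only once. The names CPU_TYPES, COUNT_ONE, CHECK_CPU, kCpuTypeCount and detectedCpuTypes are hypothetical and do not come from the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical X-macro: every entry is a string literal from the question's list.
#define CPU_TYPES(X) \
    X("intel") X("atom") X("core2") X("corei7") X("nehalem") \
    X("westmere") X("sandybridge") X("amd") X("amdfam10h") X("barcelona") \
    X("shanghai") X("istanbul") X("btver1") X("amdfam15h") \
    X("bdver1") X("bdver2") X("bdver3") X("bdver4") X("btver2")

// Each entry expands to "+1", so the sum is the number of names in the list.
#define COUNT_ONE(name) +1
constexpr std::size_t kCpuTypeCount = 0 CPU_TYPES(COUNT_ONE);
#undef COUNT_ONE

// Returns the names for which __builtin_cpu_is reports a match.
// On other compilers or architectures this sketch returns an empty list.
std::vector<std::string> detectedCpuTypes() {
    std::vector<std::string> found;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // makes sure the CPU detection data is initialized
    // The preprocessor pastes each literal into its own builtin call.
#define CHECK_CPU(name) if (__builtin_cpu_is(name)) found.push_back(name);
    CPU_TYPES(CHECK_CPU)
#undef CHECK_CPU
#endif
    return found;
}
```

The loop is unrolled by the preprocessor rather than executed at run time, which is what lets every call see a constant string.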

Related

How to write directly into a std::vector of std::array

I'm trying to write directly into a
auto tablelisttt = std::make_shared<std::vector<std::array<std::string, 9>>>(size);
for (int j = 1; j <=max; j++)
{
//dummy example code
tablelisttt->operator[0](j)="Hello"; //Only one string at a time in an array
}
so that data from a database is written directly to the vector array as efficiently and quickly as possible.
But no matter what I try, I can't get it to write into the vector array directly with the MSVC compiler (the online GCC compilers work).
I use C++19.
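For illustration only, here is a minimal sketch of element access through a shared pointer to such a container. It assumes rows are indexed from 0 and that the first column is the one being written. Row and makeTable are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Row = std::array<std::string, 9>;  // hypothetical alias for one table row

// Creates a table with `rows` default-initialized rows and writes one string
// into the first column of every row.
std::shared_ptr<std::vector<Row>> makeTable(std::size_t rows) {
    auto table = std::make_shared<std::vector<Row>>(rows);
    for (std::size_t j = 0; j < rows; ++j) {
        // Dereference the shared_ptr first, then index the row, then the column.
        (*table)[j][0] = "Hello";
    }
    return table;
}
```

Dereferencing with (*table) before indexing avoids having to spell out operator[] by hand; table->at(j)[0] would be a bounds-checked alternative.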

Increasing the size of an array during runtime

I want to dynamically allocate an array in a for loop using pointers. As the for loop proceeds, the size of the array should increase by one and a new element should be added. The usual method involves using the new operator, but this always allocates a fixed amount of memory at the time of declaration. Is there any way to do so?
I tried to do so using the following code (simplified for explaining the problem):
sameCombsCount = 0;
int **matchedIndicesArray;
for(int i = 0; i<1000; i++) //loop condition a variable
{
sameCombsCount++;
matchedIndicesArray = new int*[sameCombsCount]; // ??
// Now add an element in the new block created...
}
The thing is, I do not know the number of loop iterations in advance. It can vary depending on execution conditions and the inputs given. I don't think this is the correct way to do it. Can someone suggest a better way?
std::vector handles the resizing for you:
sameCombsCount = 0;
std::vector<int> matchedIndicesArray;
for(int i = 0; i<1000; i++) //loop condition a variable
{
sameCombsCount++;
#if 0
matchedIndicesArray.resize(sameCombsCount);
matchedIndicesArray.back() = someValue;
#else
matchedIndicesArray.push_back(someValue);
#endif
}
The first version does what you asked for: it resizes the vector and then sets the value. The second version appends the element directly to the end of the vector and should be marginally more efficient.
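As a self-contained sketch of the push_back variant: the function name collectMatches and the stored value i * 2 are placeholders made up for this example, standing in for whatever the real loop computes.

```cpp
#include <vector>

// Grows the vector by one element per iteration; no manual new/delete needed.
std::vector<int> collectMatches(int iterations) {
    std::vector<int> matchedIndices;
    for (int i = 0; i < iterations; ++i) {
        matchedIndices.push_back(i * 2);  // placeholder for the real value
    }
    return matchedIndices;
}
```

The element count is available as matchedIndices.size(), so a separate counter such as sameCombsCount is no longer required.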

Putting multiple calculations into a for-loop that uses variables based on iteration number

What is the best way to organize the following into a for loop that iterates X times, but requires updating the variables (velocity, currentPose, targetPoint) depending on the iteration number?
velocity1 = computeVelocity(currentPose1, targetPoint1);
velocity2 = computeVelocity(currentPose2, targetPoint2);
...
velocityX = computeVelocity(currentPoseX, targetPointX);
The for loop would ideally look something like this:
for (int i=0; i<X; i++)
{
velocity_i = computeVelocity(currentPose_i, targetPoint_i);
}
Since each velocity has an associated (and possibly distinct) currentPose and targetPoint, one way to do it is to store all these variables in std::vectors, or in std::arrays if you know at compile time how many items you will have to store. Then your loop could look like this:
for (int i=0; i<X; i++)
{
velocity[i] = computeVelocity(currentPose[i], targetPoint[i]);
}
I don't think that making i part of the variables' names is doable (although there might be some way to do it using preprocessor macros and the ## token-pasting operator, I have not thought about it), nor would it be usual C++ code.
For a C++ programmer the vector/array approach is the more natural one.
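A minimal sketch of the vector approach follows. The Pose and Point types and the body of computeVelocity are invented here so the example compiles; the question does not show the real ones. computeAllVelocities is also a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Pose  { double x; };  // hypothetical stand-in for the real pose type
struct Point { double x; };  // hypothetical stand-in for the real target type

// Placeholder computation: the real function is not shown in the question.
double computeVelocity(const Pose& pose, const Point& target) {
    return target.x - pose.x;
}

// Computes one velocity per (pose, target) pair, indexed by position.
std::vector<double> computeAllVelocities(const std::vector<Pose>& currentPose,
                                         const std::vector<Point>& targetPoint) {
    // Only iterate over indices that exist in both inputs.
    const std::size_t n = std::min(currentPose.size(), targetPoint.size());
    std::vector<double> velocity(n);
    for (std::size_t i = 0; i < n; ++i) {
        velocity[i] = computeVelocity(currentPose[i], targetPoint[i]);
    }
    return velocity;
}
```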

modifying values in pointers is very slow?

I'm working with a huge amount of data stored in an array, and am trying to optimize the amount of time it takes to access and modify it. I'm using Windows, C++ and VS2015 (Release mode).
I ran some tests and don't really understand the results I'm getting, so I would love some help optimizing my code.
First, let's say I have the following class:
class foo
{
public:
int x;
foo()
{
x = 0;
}
void inc()
{
x++;
}
int X()
{
return x;
}
void addX(int &_x)
{
_x++;
}
};
I start by initializing 10 million pointers to instances of that class into a std::vector of the same size.
#include <vector>
int count = 10000000;
std::vector<foo*> fooArr;
fooArr.resize(count);
for (int i = 0; i < count; i++)
{
fooArr[i] = new foo();
}
When I run the following code, and profile the amount of time it takes to complete, it takes approximately 350ms (which, for my purposes, is far too slow):
for (int i = 0; i < count; i++)
{
fooArr[i]->inc(); //increment all elements
}
To test how long it takes to increment an integer that many times, I tried:
int x = 0;
for (int i = 0; i < count; i++)
{
x++;
}
Which returns in <1ms.
I thought maybe the number of integers being changed was the problem, but the following code still takes 250ms, so I don't think it's that:
for (int i = 0; i < count; i++)
{
fooArr[0]->inc(); //only increment first element
}
I thought maybe the array index access itself was the bottleneck, but the following code takes <1ms to complete:
int x;
for (int i = 0; i < count; i++)
{
x = fooArr[i]->X(); //set x
}
I thought maybe the compiler was doing some hidden optimizations on the loop itself for the last example (since the value of x will be the same during each iteration of the loop, so maybe the compiler skips unnecessary iterations?). So I tried the following, and it takes 350ms to complete:
int x = 0;
for (int i = 0; i < count; i++)
{
fooArr[i]->addX(x); //increment x inside foo function
}
So that one was slow again, but maybe only because I'm incrementing an integer with a pointer again.
I tried the following too, and it returns in 350ms as well:
for (int i = 0; i < count; i++)
{
fooArr[i]->x++;
}
So am I stuck here? Is ~350ms the absolute fastest that I can increment an integer, inside of 10million pointers in a vector? Or am I missing some obvious thing? I experimented with multithreading (giving each thread a different chunk of the array to increment) and that actually took longer once I started using enough threads. Maybe that was due to some other obvious thing I'm missing, so for now I'd like to stay away from multithreading to keep things simple.
I'm open to trying containers other than a vector too, if it speeds things up, but whatever container I end up using, I need to be able to easily resize it, remove elements, etc.
I'm fairly new to c++ so any help would be appreciated!
Let's look at it from the CPU's point of view.
Incrementing an integer means I have it in a CPU register and just increment it. This is the fastest option.
When I'm given an address (vector->member), I must load the value into a register, increment it, and store the result back to that address. Worse: my CPU cache is filled with the vector's pointers, not with the objects they point to. Too few hits, too many cache refills.
If all those members were stored contiguously in a vector, CPU cache hits would be much more frequent.
Try the following:
int count = 10000000;
std::vector<foo> fooArr;
fooArr.resize(count, foo());
for (auto it= fooArr.begin(); it != fooArr.end(); ++it) {
it->inc();
}
The new is killing you, and you don't actually need it, because resize inserts elements at the end if the new size is greater (check the docs: std::vector::resize).
The other thing is the use of pointers, which IMHO should be avoided until the last moment and is unnecessary in this case. Performance should be somewhat better this way, since you get better locality of reference (see cache locality). If the objects were polymorphic or something more complicated, it might be different.
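The by-value layout can be sketched as below. The class is a trimmed copy of the one in the question, and incrementAll is a hypothetical helper that also returns the sum of the members, so the work has an observable result. No timing figures are claimed for it.

```cpp
#include <vector>

// Trimmed version of the class from the question.
class foo {
public:
    int x = 0;
    void inc() { ++x; }
};

// Increments every element in place and returns the sum of all x values.
// The objects live contiguously inside the vector, with no pointer indirection.
long long incrementAll(std::vector<foo>& items) {
    long long sum = 0;
    for (foo& f : items) {
        f.inc();
        sum += f.x;
    }
    return sum;
}
```

Resizing and erasing still work as usual on a std::vector<foo>, though erasing from the middle shifts the elements after it.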

Recursive call segmentation fault issue

Quick question again.
I'm creating a recursive function that looks for elements in an array of "source" rules and applies those rules to a "target array" of rules if the "source" rule type is the same as the target character. Furthermore, the function checks whether the target character is in an array of symbols and adds it if it is not (and sets a few flags on the newly applied rule as well). This is all driven by a recursive call that uses a counter to track how many iterations have passed and to determine the spot in the target array where the new rule should be applied, so we don't overwrite anything.
I've put in a little debugging code to show the results too.
Here's the function itself:
//Recursively tack on any non terminal pointed elements
int recursiveTack(rule * inrule[], char target, rule * targetrule[],
int counter, char symbols[])
{
printf("Got into recursiveTack\n");
printf("target is %c\n", target);
printf("counter is %d", counter);
for (int k = 0; k < sizeof(inrule); k++)
{
if (inrule[k]->type == target)
{
//doublecheck to see if we're trying to overwrite
if (targetrule[counter]->used = true)
{
counter++;
}
targetrule[counter]->head = inrule[k]->head;
targetrule[counter]->type = inrule[k]->type;
targetrule[counter]->used = true;
//Check to see if the elements are new to the symbols table and need to be added
if (!contains(returnGotoChar(targetrule[counter]), symbols))
{
//If not then add the new symbol
addChar(returnGotoChar(targetrule[counter]), symbols);
//Also set the goto status of the rule
targetrule[counter]->needsGoto = true;
//Also set the rule's currentGotoChar
targetrule[counter]->currentGotoChar = returnGotoChar(
targetrule[counter]);
}
counter++;
//recursivly add elements from non terminal nodes
if (isNonTerm(targetrule[counter]))
{
char newTarget = returnGotoChar(targetrule[counter]);
counter = recursiveTack(inrule, newTarget, targetrule, counter,
symbols);
}
}
}
//return how many elements we've added
return counter;
}
Here's the call:
if(isNonTerm(I[i+first][second]))
{
printf("Confirmed non termainal\n");
printf("Second being passed: %d\n", second);
//Adds each nonterminal rule to the rules for the I[i+first] array
second = recursiveTack(I[i], targetSymbol, I[i+first], second, symbols[first]);
}
All the arrays being passed in have been initialized prior to this point.
However, the output I get indicates that the recursion is getting killed somewhere before it gets off the ground.
Output:
Second being passed: 0
Confirmed non termainal
Got into recursiveTack
target is E
Segmentation fault
Any help would be great. I've got the rest of the program available too if need be, though it's around 700 lines including comments. I'm pretty sure this is just another case of missing something simple, but let me know what you think.
for(int k = 0; k < sizeof(inrule); k++)
sizeof(inrule) is going to return the size of a pointer type (4 or 8). Probably not what you want. You need to pass the size of the arrays as parameters as well, if you are going to use these types of structures.
It would be better to use Standard Library containers like std::vector, though.
if(targetrule[counter]->used = true){
counter++;
}
What guarantees that targetrule[counter] is actually valid? Could you add a printf before and after it to check?
The biggest thing I see here is:
for(int k = 0; k < sizeof(inrule); k++)
This isn't going to do what you think. inrule is an array of pointers, so sizeof(inrule) is going to be the number of elements * sizeof(rule*). This could very quickly lead to running off the end of your array.
Try changing that to:
for (int k = 0; k < sizeof(inrule) / sizeof(rule*); ++k)
Something else you might consider is an fflush(stdout); after your print statements. You're crashing while some output is still buffered so it's likely hiding where your crash is happening.
EDIT:
That won't work. If you had a function that did something like:
int x[10];
for (int i = 0; i < sizeof(x) / sizeof(int); ++i) ...
It would work, but on the other side of the function call the type decays to int*, and sizeof(int*) is not the same as sizeof(int[10]). You either need to pass the size, or, better yet, use vectors instead of arrays.
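Both suggestions can be sketched as follows, assuming a stripped-down rule struct with only the type member, since the real definition is not shown. countMatches is a hypothetical helper that stands in for the loop in recursiveTack.

```cpp
#include <cstddef>
#include <vector>

struct rule { char type; };  // hypothetical, reduced to the one field used here

// Option 1: an array parameter is really a pointer, so sizeof cannot recover
// the element count inside the function. The caller passes it explicitly.
std::size_t countMatches(rule* const inrule[], std::size_t count, char target) {
    std::size_t matches = 0;
    for (std::size_t k = 0; k < count; ++k) {
        if (inrule[k]->type == target) {
            ++matches;
        }
    }
    return matches;
}

// Option 2: a std::vector carries its own size.
std::size_t countMatches(const std::vector<rule>& rules, char target) {
    std::size_t matches = 0;
    for (const rule& r : rules) {
        if (r.type == target) {
            ++matches;
        }
    }
    return matches;
}
```

At the call site, where the array type is still known, sizeof(arr) / sizeof(arr[0]) gives the count to pass to the first overload.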