I am new to C++ and I am solving some easy exercises. I was solving one problem when I came across a behavior that I cannot explain.
My function takes 2 arrays as arguments and I must return the sum of all elements in these arrays. My code:
#include <vector>
using namespace std;
int arrayPlusArray(vector<int> a, vector<int> b){
    int c=0;
    for (auto k : a){
        c += k;
    }
    for (auto k : b){
        c += k;
    }
    return c;
}
This works, but as soon as I write int c; instead of int c=0; it does not work correctly anymore. I thought that when I write int c; it sets c's value to 0. What is the matter?
Writing int c; does not initialise c (unless c is at global scope or has static storage). Initialisation costs CPU clock cycles so C++ allows the programmer to forgo such unnecessary CPU expenditure at the expense of potential program instability.
In fact, reading an uninitialised variable is undefined behaviour in C++ (apart from narrow exceptions for unsigned char and std::byte). Never do it.
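As a minimal sketch of the fix, assuming C++11 or later: writing int total{}; value-initialises the variable to zero, and std::accumulate takes an explicit starting value, so nothing is ever read before it has been set. The name sum_arrays is illustrative and not from the question.

```cpp
#include <numeric>
#include <vector>

// Sums every element of both vectors. Taking the vectors by const
// reference avoids copying them on each call.
int sum_arrays(const std::vector<int>& a, const std::vector<int>& b) {
    int total{};  // value-initialised: guaranteed to start at 0
    total = std::accumulate(a.begin(), a.end(), total);
    total = std::accumulate(b.begin(), b.end(), total);
    return total;
}
```

Calling sum_arrays({1, 2, 3}, {4, 5}) adds all five elements; two empty vectors give 0.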
I've seen a very experienced programmer do something like:
#include <iostream>
#include <string>
using namespace std;
typedef int BOOL;
BOOL is_max(int a) {
return a&3;
}
class Foo
{
public:
inline Foo() {}
inline ~Foo() {}
int min[3];
int max[3];
};
int main()
{
Foo foo = Foo();
for (int i=0; i<3; i++) {
foo.min[i] = i;
foo.max[i] = i+3;
}
BOOL b = is_max(75); // b==3
// Print out foo.max[1] (which is foo.min[4])
cout << foo.min[1+b] << endl;
}
This is in a very computationally expensive part of the code, so I guess it is indeed faster than branching with an if condition. Since both arrays (max and min) are of int type and contiguous in the class definition, this should always work.
Is there a reason one should avoid this approach? I know this is probably not the best for code readability and maintainability (e.g. if someone added a third member to the class definition in the wrong place, i.e. between min and max). Probably a better approach would then be to have
int extrema[6];
Other than that, would there be other downsides of that approach? Could this lead to premature termination/segmentation fault somehow?
There are two problems. First of all, it is an out-of-bounds access if b is larger than 1 (the index 1+b then exceeds 2), and that is undefined behavior.
Another problem is that the compiler only sees foo.max[i] = i+3; but finds no indication in your code that max is used at any point after that loop. So from the perspective of the optimizer, and because accessing max through min is not valid, it could assume that foo.max[i] = i+3; in the loop is useless and could theoretically optimize it away.
And based on a short look at the compiled output of gcc with optimizations turned on, this does indeed seem to be the case.
So even if there were no unknown padding involved and you could be sure about the memory layout, it is still definitely something you must not do.
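One way to keep the "index instead of if" idea without leaving the bounds of an array is to store both ranges in a single array object. This is only a sketch: Extrema, pick and make_example are hypothetical names, and whether this is actually faster than a branch is something to measure rather than assume.

```cpp
#include <array>
#include <cstddef>

// Hypothetical replacement for Foo: both ranges live in ONE array object,
// so every index computed below stays inside its bounds.
struct Extrema {
    // values[0] holds the minima, values[1] holds the maxima.
    std::array<std::array<int, 3>, 2> values{};
};

// No `if` in the source: `want_max` converts to 0 or 1 and selects the row.
inline int pick(const Extrema& e, bool want_max, std::size_t i) {
    return e.values[static_cast<std::size_t>(want_max)][i];
}

// Fills the structure the same way the loop in the question does.
inline Extrema make_example() {
    Extrema e;
    for (std::size_t i = 0; i < 3; ++i) {
        e.values[0][i] = static_cast<int>(i);      // min[i] = i
        e.values[1][i] = static_cast<int>(i) + 3;  // max[i] = i + 3
    }
    return e;
}
```

Because the optimizer can see that the maxima are read through a valid access path, it has no licence to discard the stores to them.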
I solved this introduction problem on hackerrank.
Here is something strange when I try to solve this problem.
The input is
4
1 4 3 2
I want to read the numbers into an array.
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;
int main() {
int a;
int arr[a];
scanf("%d",&a);
for(int i=0; i<=a-1; i++){
scanf("%d",&arr[i]);
printf("i = %d, a = %d\n", i, a);
}
return 0;
}
I got the output:
i = 0, a = 4
i = 1, a = 4
i = 2, a = 4
i = 3, a = 2
The array is correct.
My question is: why is the value of int a changed? And why is it changed to 2 instead of 3?
If I rearrange the following lines:
int a;
scanf("%d",&a);
int arr[a];
the value in int a is not changed,
i = 0, a = 4
i = 1, a = 4
i = 2, a = 4
i = 3, a = 4
This is wrong:
int a;
int arr[a];
scanf("%d",&a);
Two problems: You are using a before you read the value from the user. Using an uninitialized a is undefined behavior. The output of your code could be anything or nothing.
Then you cannot have a static array with a run-time size. Some compilers support variable length arrays as an extension, but they are not standard C++.
If you want to write C++, then you should actually use C++. Dynamically sized arrays are std::vector. Your code could look like this:
#include <vector>
#include <iostream>
int main() {
int a;
std::cin >> a; // read user input before you use the value
std::vector<int> x(a); // create vector with a elements
for (size_t i=0; i < x.size(); ++i) {
std::cin >> x[i];
std::cout << "i = " << i << " a = " << a << "\n";
}
}
My question is: why is the value of int a changed? And why is it changed to 2 instead of 3?
Undefined behavior means just that: the behavior of your program is undefined. Compilers are not made to compile invalid code. If you do compile invalid code, strange things can happen. Accessing arr[i] accesses some completely bogus memory address, and writing to it may happen to overwrite the value of a. However, it is important to note that what happens here has little to do with C++ and much more to do with your compiler and its output. If you really want to understand the details, you need to look at the assembly, but that won't tell you anything about how C++ "works". You can do that with https://godbolt.org/, but the better approach would be to pay attention to your compiler's warnings and try to write correct code.
int a;
int arr[a];
scanf("%d",&a);
This means:
Declare an uninitialised a, with some unspecified value that is not permitted to be used
Declare an array with runtime bounds a (which doesn't exist), which is not permitted
Read user input into a.
Even if these steps were performed in the correct order, you cannot have runtime bounds in C++. Some compilers permit it as an extension, though I've found these to work haphazardly, and it's certainly going to result in strange effects when you use an uninitialised value to do it!
In this case, in practice, you probably have all sorts of weirdness going on in your stack, since you're accessing "non-existent" elements of arr and overwriting the variables on the stack that are "below" it, such as a. I caution, though, that trying to analyse the results of undefined behaviour is kind of pointless, as they can change at any time for various black-boxed reasons.
Make a nice vector instead.
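A minimal sketch of what "a nice vector" could look like for this input format (a count followed by that many integers). The function name read_values and the string overload are illustrative additions, not part of the question.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a count followed by that many integers, e.g. "4\n1 4 3 2".
// The count is read BEFORE the vector is sized, which is the ordering
// the question got wrong.
std::vector<int> read_values(std::istream& in) {
    int n = 0;
    if (!(in >> n) || n < 0) {
        return {};  // no usable count: return an empty vector
    }
    std::vector<int> values(static_cast<std::size_t>(n));
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (!(in >> values[i])) {
            values.resize(i);  // keep only what was actually read
            break;
        }
    }
    return values;
}

// Convenience overload for trying the function without typing input.
std::vector<int> read_values(const std::string& text) {
    std::istringstream in(text);
    return read_values(in);
}
```

In a real program the first overload would be called as read_values(std::cin). Every element lives inside the vector's own storage, so writing to it cannot clobber the count variable.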
There are two versions of a function (the code below is a simplified version). Both versions are used in the program. In the actual function, the differences between the two versions can occur at two or three different places.
How to avoid writing both versions in the code without sacrificing performance, through template or other means? This is an attempt
to make the code more readable.
Performance is critical because it will get run many many times, and I am writing benchmark for different implementations.
(Also, is this an ok api, if I am writing a library for a few people?)
Example:
int set_intersect(const int* A, const int s_a,
const int* B, const int s_b,
int* C = 0){
//if (int* C == 0), we are running version
//0 of the function.
//int* C is not known during compilation
//time for version 1.
int Count0 = 0;
//counter for version 0 of the function.
const int* const C_original(C);
//counter and pointer for version 1 of
//the function
int a = 0;
int b = 0;
int A_now;
int B_now;
while(a < s_a && b < s_b){
A_now = A[a];
B_now = B[b];
a += (A_now <= B_now);
b += (B_now <= A_now);
if (A_now == B_now){
if (C == 0){
Count0++;
} else {
C++;
*(C)=A_now;
}
}
}
if (C == 0){
return Count0;
}else{
return C - C_original;
}
}
Thanks.
Updates:
Conditional compile-time inclusion/exclusion of code based on template argument(s)
(some of those templates look so long)
Remove/Insert code at compile time without duplication in C++
(this is more similar to my case. my case is simpler though.)
I guess the following can work, but it adds a new argument.
int set_intersect(const int* A, const int s_a,
const int* B, const int s_b,
char flag,
int* C = 0);
put all code for version 0 into if (flag == '0') { /* version 0 code */ }
put all code for version 1 into if (flag == '1') { /* version 1 code */}
Probably can put the flag variable into template (as Barmar suggested in comments), that way, it doesn't feel like adding another argument for the function. Can also replace the 0 and 1 with enum (like enum class set_intersection_type {find_set, size_only}). Calling the function will be like set_intersect<find_set>(const int* A, const int s_a, const int* B, const int s_b, int* C) or set_intersect<size_only>(const int* A, const int s_a, const int* B, const int s_b) Hopefully this is more readable than before, and the compiler is smart enough to see what is going on.
Another problem is, what if someone uses the find_set version (version 1) and then forgets to override the default argument (int* C = 0)? It is possible to call the function this way: set_intersect<find_set>(const int* A, const int s_a, const int* B, const int s_b).
Maybe I can use dasblinkenlight's idea from the comments: create two wrapper functions (set_intersection, set_intersection_size). Each wrapper calls the actual function with different arguments. Also make the actual function private so no one can call it directly.
For the different implementations of set intersections, maybe I can create a common wrapper with templates. Calling the wrapper would be similar to set_intersection<basic>, set_intersection<binary_search>, or set_intersection_size<simd> etc. This seems to look better.
It generally seems doable; the question is whether you want to do that. I would say no. From what I can tell, you do two different things:
Version 0 computes the size of the intersection
Version 1 computes the size of the intersection, and writes the intersection to the location past C, assuming there is enough space to store it.
I would make two distinct functions, set_intersection and set_intersection_size, not only for speed but also for clarity. If you insist on having one, I would benchmark your code against std::set_intersection and, if possible, just redirect to the std version if C != 0.
I would not use your library in its current version. However, I would also be hard-pressed to come up with a situation where I would prefer a custom-made version of set_intersection to the STL version. If I ever needed performance better than the STL, I would expect to have identified the point in code as the bottleneck, and I would not use a library call at all, but write the code myself, possibly in assembly and unrolling the loop, etc.
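A sketch of the two-function API suggested here, with the loop written only once. It is an illustration under stated assumptions rather than a tuned implementation: the namespace detail and the name intersect_sorted are hypothetical, sizes are std::size_t instead of int, both inputs are assumed sorted, and matches are written starting at out[0] (the question's code advances C before writing).

```cpp
#include <cstddef>

namespace detail {
// Shared implementation. `Store` is a compile-time constant, so optimising
// compilers typically drop the untaken branch; with C++17 the test could be
// written `if constexpr (Store)` instead.
template <bool Store>
std::size_t intersect_sorted(const int* a, std::size_t size_a,
                             const int* b, std::size_t size_b,
                             int* out) {
    std::size_t i = 0, j = 0, count = 0;
    while (i < size_a && j < size_b) {
        const int x = a[i];
        const int y = b[j];
        i += (x <= y);
        j += (y <= x);
        if (x == y) {
            if (Store) {
                out[count] = x;  // write the match, starting at out[0]
            }
            ++count;
        }
    }
    return count;
}
}  // namespace detail

// Counts the common elements of two sorted ranges.
inline std::size_t set_intersection_size(const int* a, std::size_t size_a,
                                         const int* b, std::size_t size_b) {
    return detail::intersect_sorted<false>(a, size_a, b, size_b, nullptr);
}

// Writes the common elements to `out` and returns how many were written.
// `out` must have room for min(size_a, size_b) elements.
inline std::size_t set_intersection(const int* a, std::size_t size_a,
                                    const int* b, std::size_t size_b,
                                    int* out) {
    return detail::intersect_sorted<true>(a, size_a, b, size_b, out);
}
```

The caller states the intent through the function name, so there is no default argument to forget, and each instantiation contains only the work it needs.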
What bugs me a bit is how this is supposed to work:
const int* const Count1(C);
//counter and pointer for version 1 of
//the function
...
Count1++;
*(Count1)=A_now;
If it is known at compilation time what version you want you can use Conditional Compilation.
#define Version_0 //assuming you know this compilation is version 0
Then you can go:
int set_intersect(...)
{
#ifdef Version_0
//Version 0 of the code
#else
//Version 1 of the code
#endif
}
This way only one version of the code gets compiled.
If you don't know which version it is for compilation, I suggest having two separate functions so you don't have to check for the version every instance of the function.
Have a type that you specialize on a bool parameter:
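template<bool b>
struct Counter
{
};
template<>
struct Counter<false>
{
    int c;
    Counter(int *)
        : c(0)
    {
    }
    int operator++() { return ++c; }
    void storeA(const int) {}
};
template<>
struct Counter<true>
{
    int* const c_orig; // start of the output buffer
    int* c;            // current write position
    Counter(int* c_start)
        : c_orig(c_start), c(c_start)
    {
    }
    // advance the write position and return the number of matches so far
    int operator++() { return static_cast<int>(++c - c_orig); }
    void storeA(const int a_now) { *c = a_now; }
};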
Then specialize your algorithm on Counter as a template argument. Note that this will be exactly the same for both cases, that is, you don't need to specialize:
template<typename Counter>
struct SetIntersectHelper
{
static int set_intersect(const int* A, const int s_a,
const int* B, const int s_b,
int* C)
{
// your function's body, using Counter
}
};
Now, you're ready to add the generic method:
int set_intersect(const int* A, const int s_a,
const int* B, const int s_b,
int* C = 0)
{
return C ? SetIntersectHelper< Counter< true > >::set_intersect(A, s_a, B, s_b, C):
SetIntersectHelper< Counter< false > >::set_intersect(A, s_a, B, s_b, C);
}
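For reference, here is a compact, self-contained variant of the same idea with the loop body filled in. It is a sketch, not the answer's exact code: the policy names CountOnly and StoreTo and the function intersect_with are hypothetical, the policy is passed as a function template argument rather than through a helper struct, and matches are written starting at C[0].

```cpp
// Policy that only counts matches.
struct CountOnly {
    int count = 0;
    void record(int /*value*/) { ++count; }
    int result() const { return count; }
};

// Policy that stores each match and derives the count from pointer distance.
struct StoreTo {
    int* const first;  // start of the output buffer
    int* next;         // next free slot
    explicit StoreTo(int* out) : first(out), next(out) {}
    void record(int value) { *next++ = value; }
    int result() const { return static_cast<int>(next - first); }
};

// One loop body for both cases; the policy decides what a match means.
template <typename Policy>
int intersect_with(const int* A, int s_a, const int* B, int s_b, Policy policy) {
    int a = 0;
    int b = 0;
    while (a < s_a && b < s_b) {
        const int A_now = A[a];
        const int B_now = B[b];
        a += (A_now <= B_now);
        b += (B_now <= A_now);
        if (A_now == B_now) {
            policy.record(A_now);
        }
    }
    return policy.result();
}

// The run-time check on C happens once per call, outside the hot loop.
inline int set_intersect(const int* A, int s_a,
                         const int* B, int s_b,
                         int* C = nullptr) {
    return C ? intersect_with(A, s_a, B, s_b, StoreTo(C))
             : intersect_with(A, s_a, B, s_b, CountOnly());
}
```

Each policy is a separate type, so the compiler generates two specialised loops and neither one tests C per iteration.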
If I want to declare a dynamic size array in the main function, I can do:-
int m;
cin>>m;
int *arr= new int[m];
The following cannot be done, because at compile time the compiler has to know the size of every symbol unless it is an external symbol:
int m;
cin>>m;
int arr[m];
My questions are:
Why does the compiler have to know the size of arr in the above code? It is a local symbol which is not defined in the symbol table. At runtime, the stack takes care of it (the same way as m). Is it because the compiler has to ascertain the size of main() (a global symbol), which is equal to the size of all objects defined in it?
If I have a function:
int func(int m)
Could I define int arr[m] inside the function, or would I still have to do
int *a= new int[m]
For instance :
int MyArray[5]; // correct
or
const int ARRAY_SIZE = 6;
int MyArray[ARRAY_SIZE]; // correct
but
int ArraySize = 5;
int MyArray[ArraySize]; // incorrect
Here is also what is explained in The C++ Programming Language, by Bjarne Stroustrup :
The number of elements of the array, the array bound, must be a constant expression (§C.5). If you need variable bounds, use a vector (§3.7.1, §16.3). For example:
To answer your questions:
1) Q: Why does the compiler have to know the size of arr in the above code?
A: If you generate assembly output, you'll notice a "subtract" of some fixed value to allocate your array on the stack
2) Q: Could I define int arr[m] i ... inside the function?
A: Only where your compiler accepts variable-length arrays as an extension; it is not standard C++. Either way, the array becomes invalid the moment you exit the function ;)
Basically, you don't want an "array". A C++ "vector" would be a good alternative:
std::vector<A> v(5, A(2));
Here are a couple of links you might enjoy:
http://www.parashift.com/c++-faq/arrays-are-evil.html
http://blogs.msdn.com/b/ericlippert/archive/2008/09/22/arrays-considered-somewhat-harmful.aspx
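To make the vector suggestion concrete for the int func(int m) case, here is a minimal sketch. The function name fill_and_sum and what it computes are purely illustrative; the point is only that the size is chosen at run time and no new/delete pair is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for `int func(int m)`: builds a run-time sized
// buffer, fills it, and returns the sum of its elements.
int fill_and_sum(int m) {
    if (m <= 0) {
        return 0;  // nothing to allocate for a non-positive size
    }
    // Size chosen at run time; the storage is released automatically
    // when `arr` goes out of scope.
    std::vector<int> arr(static_cast<std::size_t>(m));
    for (std::size_t i = 0; i < arr.size(); ++i) {
        arr[i] = static_cast<int>(i) * 2;
    }
    int sum = 0;
    for (int value : arr) {
        sum += value;
    }
    return sum;
}
```

If the contents must outlive the function, returning the vector by value hands ownership to the caller without manual memory management.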
I have a problem with passing 2d dynamic array to a function of my class.
void s::LoadData(long int &Num_Of_InputDataId,
long int **PresentData,
long int **InputDataId,
long int **InputData)
{
long int b;
for (long int i=0;i<Num_Of_InputDataId;i++)
{
b = InputDataId[i][0];
for(long int j=0;j<Num_Of_InputDataId;j++)
{
InputData[i][j]=PresentData[b][j]; //error occur here
} // end of internal for
} //end of external for
}
main:
long int Num_Of_InputDataId=10;
long int **PresentData;
PresentData = new long int *[Num_Of_InputDataId];
for (long int ii = 0; ii < Num_Of_InputDataId; ++ii)
PresentData[ii] = new long int[Num_Of_InputDataId];
long int ** InputDataId;
InputDataId = new long int *[Num_Of_InputDataId];
for (long int ii = 0; ii < Num_Of_InputDataId; ++ii)
InputDataId[ii] = new long int[2];
long int ** InputData;
InputData = new long int *[Num_Of_InputDataId];
for (long int ii = 0; ii < Num_Of_InputDataId; ++ii)
InputData[ii] = new long int[Num_Of_InputDataId];
Load.LoadData(Num_Of_InputDataId, PresentData, InputDataId, InputData);
Each of Num_Of_InputDataId, PresentData and InputDataId come from different functions.
For some i, InputDataId[i][0] is greater than or equal to Num_Of_InputDataId.
Therefore when you call:
b = InputDataId[i][0];
... = PresentData[b][...];
the array index to PresentData is out-of-bounds, causing a memory error and crash.
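A defensive version of the copy loop might validate each id before using it as an index. This is a sketch under assumptions: it uses std::vector instead of raw pointers, and the names Row, Matrix and load_data are hypothetical rather than taken from the asker's class.

```cpp
#include <cstddef>
#include <vector>

using Row = std::vector<long>;
using Matrix = std::vector<Row>;

// Copies present[id] into out[i] for each id = ids[i][0].
// Returns false, leaving `out` untouched, if any id is not a valid row index.
bool load_data(const Matrix& present, const Matrix& ids, Matrix& out) {
    Matrix result;
    result.reserve(ids.size());
    for (const Row& id_row : ids) {
        if (id_row.empty()) {
            return false;  // no id stored in this row
        }
        const long b = id_row[0];
        if (b < 0 || static_cast<std::size_t>(b) >= present.size()) {
            return false;  // id would index outside `present`
        }
        result.push_back(present[static_cast<std::size_t>(b)]);
    }
    out = result;
    return true;
}
```

With a check like this, an unexpected id becomes a reportable failure at the call site instead of a crash somewhere inside the loop.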
As you are not giving the precise error, I will make a common-sense guess as to the problem. The value of b is an element of your Id array. Given that it is a long int and that your code implies it is some sort of ID code (rather than an index into another array), it is highly likely that its value exceeds the index range of your other array, PresentData, which could be causing a memory error from the out-of-bounds access. You are treating b in the code as if it were an index when it seems likely that it is not.
The simplest way to check would be to step through the code using a debugger (which should automatically be your first attempt at identifying what your problem actually is) rather than guessing from the program error report. In fact, using a debugger would probably have taken less time than pasting your code here and asking the question, so if you are not using a debugging tool already, you really should; it will help you no end.
And as the comment from Superman suggests, given that you use C++, take advantage of all the nice things that the STL can do for you and use vectors.
Just as @Superman has commented on your question, here is a simple snippet to make life easy.
#define V(x) std::vector< x>
typedef V(int) VI;
typedef V(VI) VII;
void functionDoSomething(VII & arr) { }
(Passing the vector by reference.) By using such typedefs in your code, the readability of the code will be better, and you can define nested structures easily.
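The same aliases can also be written without a macro, using C++11 alias declarations. This is just a sketch: VI and VII follow the snippet above, while make_grid and add_to_all are hypothetical helpers added to show the aliases in use.

```cpp
#include <cstddef>
#include <vector>

// Alias declarations do the same job as the macro-based typedefs.
using VI = std::vector<int>;
using VII = std::vector<VI>;

// Builds a rows x cols grid where every cell starts at `value`.
inline VII make_grid(std::size_t rows, std::size_t cols, int value) {
    return VII(rows, VI(cols, value));
}

// Takes the grid by reference so the caller sees the changes.
inline void add_to_all(VII& grid, int delta) {
    for (VI& row : grid) {
        for (int& cell : row) {
            cell += delta;
        }
    }
}
```

Because each row knows its own size, the function needs no separate dimension arguments, unlike the long int** version.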