Address Sanitizer Heap buffer Overflow - c++

I was trying to solve this problem on LeetCode. It works fine in VS Code with the GCC compiler, but on the website I'm getting this
Runtime error: AddressSanitizer heap-buffer-overflow, with a long list of addresses. Help me fix it. Here's the code:
class Solution
{
public:
    char nextGreatestLetter(vector<char> v, char a)
    {
        int l = v.size() - 1;
        if (v[l] < a)
        {
            return v[0];
        }
        int i = 0;
        while (v[i] <= a)
        {
            i++;
        }
        return v[i];
    }
};
P.S. The array is sorted in ascending order.

This code snippet has a lot of problems:
The while loop isn't guaranteed to terminate. If the last character of v is == a, then the first v[l] < a test will be false, but v[i] <= a might be true all the way through the array (it looks like v is meant to be pre-sorted into ascending order), which will have you eventually accessing v[i] for a value of i >= v.size(). That is an illegal/undefined array access and might be the source of the error message you report, if the test platform had strict bounds-checking enabled.
The logic of returning v[0] if a is greater than any character in v (again, inferring from the loop that v is supposed to be pre-sorted into ascending order) also seems flawed. Why not return the value of a instead? The caller can easily see that if the return value was <= a, then there clearly was no element of v greater than a.
It's almost certainly worth your time to handle cases where the passed-in v array is empty (v.size() == 0) or not actually pre-sorted, e.g. by caching n = v.size() and changing the loop condition to while (i < n && v[i] <= a). Don't let fragile functions creep into your codebase!
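A minimal sketch of the bounds-checked loop described above. It assumes the wrap-around contract the original code was aiming for (return v[0] when nothing in v is greater than a). The name nextGreatestLetterSafe and the choice to return a for an empty vector are assumptions made for illustration, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bounds-checked variant of the function from the question.
// Assumes v is sorted ascending. Returns v[0] when no element is greater
// than a, and a itself when v is empty (an arbitrary illustrative choice).
char nextGreatestLetterSafe(const std::vector<char>& v, char a)
{
    const std::size_t n = v.size();
    if (n == 0) {
        return a;
    }
    std::size_t i = 0;
    // Testing i < n first guarantees v[i] is never read out of range.
    while (i < n && v[i] <= a) {
        ++i;
    }
    return (i < n) ? v[i] : v[0];
}
```

With this loop condition the case where the last element equals a falls through to the wrap-around return instead of reading past the end of the vector.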

C++ unordered_map operator[ ] vs unordered_map.find() performance

I was solving a competitive programming problem on interviewbit.com
I basically used an unordered_map to keep track of visited numbers. When I used operator[], my code could not finish in time, but it passes all tests when I use find. Both should have the same time complexity.
I tried timing both versions using clock() by running them 10 times and averaging the run times, and they both took more or less the same time. I used g++ 7.4.0 while the environment provided by the website has g++ 4.8.4. Could this be the reason?
int Solution::solve(vector<int> &A) {
unordered_map<long long, int> hashmap;
for(auto a : A)
hashmap[a] = 1;
int res = 0;
for(int i = 0; i < A.size(); ++i){
for(int j = i + 1; j < A.size(); ++j){
// if(hashmap.find((long long)A[i] + A[j]) != hashmap.end())
if(hashmap[(long long)A[i] + A[j]] == 1)
++res;
}
}
return res;
}
The problem was to find pairs in array whose sum also exist in the array. I got "time limit exceeded" on array size of about 900 when I used the [] operator.
There are two reasons why the []-operator will be slower than find:
The []-operator calls a non-const function on the map, probably preventing all kinds of optimizations, like loop unrolling.
The more important reason: the []-operator creates non-existing elements in the map with their default value. The map will be bloated with every sum A[i] + A[j] that was not previously in the map, each stored with the value 0. This will increase the map size and thus the time.
I think your performance measurements showed no difference between the two alternatives because of one or more of these reasons:
The input vector is too small to make a difference
Most combinations of A[i] + A[j] are already in the vector, so the unordered_map is not bloated enough to make a difference
You did not compile your code with optimizations enabled (-O3 or -Os)
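The bloating effect can be made visible by comparing the map size after each kind of lookup. The following is only a sketch: the PairCount struct and the two function names are hypothetical helpers introduced for this illustration, and the loop mirrors the code in the question.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

struct PairCount {
    int pairs;            // pairs (i, j), i < j, whose sum is a value in A
    std::size_t mapSize;  // number of entries left in the map afterwards
};

// Lookup with find(): never inserts, so the map keeps one entry per distinct value.
PairCount countWithFind(const std::vector<int>& A)
{
    std::unordered_map<long long, int> hashmap;
    for (int a : A) hashmap[a] = 1;
    int res = 0;
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = i + 1; j < A.size(); ++j)
            if (hashmap.find(static_cast<long long>(A[i]) + A[j]) != hashmap.end())
                ++res;
    return {res, hashmap.size()};
}

// Lookup with operator[]: every missing sum is inserted with the value 0.
PairCount countWithBracket(const std::vector<int>& A)
{
    std::unordered_map<long long, int> hashmap;
    for (int a : A) hashmap[a] = 1;
    int res = 0;
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = i + 1; j < A.size(); ++j)
            if (hashmap[static_cast<long long>(A[i]) + A[j]] == 1)
                ++res;
    return {res, hashmap.size()};
}
```

Both functions report the same number of pairs, but the second one leaves behind an extra map entry for every distinct sum that was not already a key.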

Why isn't the read only [] operator used?

I'm currently writing a Polynomial class in C++, which should represent a polynomial of the following form:
p(x) = a_0 + a_1*x^1 + a_2*x^2 + ... + a_i*x^i
where a_0, ..., a_i are all int's.
The class internally uses a member variable a_ of type std::vector<int> to store the constant factors a_0, ..., a_i. To access the constant factors, operator[] is overloaded in the following way:
Read and write:
int &operator[](int i)
{
return a_.at(i);
}
This will fail when trying to change one of the factors a_i with:
i > degree of polynomial = a_.size() - 1
Read-only:
int operator[](int i) const
{
if (i > this->degree()) {
return 0;
}
return a_.at(i);
}
The slightly different implementation allows rather comfortable looping over the factors of two different sized polynomials (without worrying about the degree of the polynomial).
Sadly I seem to be missing something here, since the operator* overload (which makes use of this comfortable read-only operator[]) fails.
operator* overload:
Polynomial operator*(const Polynomial &other) {
Polynomial res(this->degree() + other.degree());
for (int i = 0; i <= res.degree(); ++i) {
for (int k = 0; k <= i; ++k) {
res[i] += (*this)[k] * other[i-k];
}
}
return res;
}
Don't mind the math involved. The important point is, that the i is always in the range of
0 <= i < res.a_.size()
thus writing to res[i] is valid. However (*this)[k] and other[i-k] try to read from indices which don't necessarily lie in the range [0, (*this).a_.size() - 1].
This should be fine with our read-only-implementation of the operator[] right? I still get an error trying to access a_ at invalid indices. What could cause the compiler to use the read-write-implementation in the line:
res[i] += (*this)[k] * other[i-k];
Especially the part on the right side of the assignment.
I'm certain the error is caused by the "wrong" use of the read-and-write operator[], because adding an additional check fixes the invalid access:
if (k <= this->degree() && i-k <= other.degree()) {
res[i] += (*this)[k] * other[i-k];
}
What am I missing with the use of the operator[]-overloading? Why isn't the read-only-operator[] used here?
(*this)[k] is using the non-const this as the function containing it is not const.
Hence the non-const overload of [] is preferred by the compiler.
You could get round this using an ugly const_cast, but really you ought to keep the behaviour of the two versions of the [] operator as similar as possible. Besides, std::vector's own operator[] does not bounds-check the index, whereas at() must. Your code deviates from this convention and could therefore confuse readers of your code.
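To illustrate the overload selection, here is a small sketch in which operator* is declared const, so that *this is const inside it and (*this)[k] resolves to the read-only overload. The class Poly is a hypothetical stand-in for the Polynomial class in the question, reduced to the members needed for the demonstration. It keeps the question's differing behaviour of the two overloads only to show which one gets called, not as a recommendation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Polynomial class from the question.
class Poly {
public:
    explicit Poly(int degree) : a_(static_cast<std::size_t>(degree) + 1, 0) {}

    int degree() const { return static_cast<int>(a_.size()) - 1; }

    // Read and write: throws std::out_of_range for an invalid index.
    int& operator[](int i) { return a_.at(i); }

    // Read-only: indices outside the stored coefficients read as 0.
    int operator[](int i) const
    {
        if (i < 0 || i > degree()) {
            return 0;
        }
        return a_.at(i);
    }

    // Declared const, so *this is const here and (*this)[k] picks the
    // read-only overload. other is a const reference and does the same.
    Poly operator*(const Poly& other) const
    {
        Poly res(degree() + other.degree());
        for (int i = 0; i <= res.degree(); ++i) {
            for (int k = 0; k <= i; ++k) {
                res[i] += (*this)[k] * other[i - k];
            }
        }
        return res;
    }

private:
    std::vector<int> a_;
};
```

Calling the operator through a const reference, for example const Poly& cp = p; cp[5];, selects the same read-only overload from outside the class.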

Start iterating vector from the nth element

I'm trying to iterate a vector from the nth element onwards. I'm not sure how I should go about doing this.
I have a vector A and B. My vector A has 10 elements of PC1-PC10 and my vector B has 20 elements of User1-User20.
So what I want to do is that when both my vector A and B reaches the 10th element, meaning to say the last element for vector A, I want to repeat iterating vector A but start iterating vector B from the 11th element so that I can do some stuff with it.
Below is the simplified code that I came up with, but technically it's about the same thing:
vector<string>::iterator b = vecB.begin();
for (int i = 1; i < 2; i++) {
for (vector<string>::iterator a = vecA.begin(); a != vecA.end() ; a++) {
if (a == vecA.end()) {
b = vecB.begin() + 10; // here the iterator for vecB should start from the 11th element
}
++b;
}
}
Should I mess with the iterator for vector B? Or is there another alternative?
EDIT
It seems that I have been asking the wrong question after all. I have marked the answer to this question and will be posting another shortly. Thanks for the quick response to my question!
The if condition inside the nested loop will never be true, because it conflicts with the loop condition:
for (vector<string>::iterator a = vecA.begin(); a != vecA.end() ; a++) {
// This check ----------------------------------^^^^^^^^^^^^^^^
// guarantees that this will never succeed:
// vvvvvvvvvvvvvvv
if (a == vecA.end()) {
...
}
}
You should rewrite the code like this:
vector<string>::iterator b = vecB.begin();
// Check that vecB has sufficient number of elements before entering the loop.
for (int i = 1 ; i < 2 ; i++) {
for (vector<string>::iterator a = vecA.begin(); a != vecA.end() ; ++a, ++b) {
...
}
// At this point we know for sure that a == vecA.end(),
// because it is a post-condition of the for loop above.
b = std::next(vecB.begin(), 10);
}
The call of ++b can be moved into the loop header.
Note the use of std::next: although
b = vecB.begin() + 10;
compiles for vectors, it is not guaranteed for all kinds of containers. Use std::next instead:
b = std::next(vecB.begin(), 10);
Note: This code assumes that vecB has at least as many elements as vecA, and at least 10 elements in total. This may be OK if you check that assumption before entering the loop. If this assumption is broken, the code would have undefined behavior.
Others have already answered how to reset or advance an iterator, so I'll just answer, how to solve your problem in a simpler way. It's much simpler to iterate two vectors in parallel using the index rather than two iterators:
// assumes vecB is bigger than vecA as described in the question
for (std::size_t i = 0; i < vecB.size(); i++) {
auto user = vecB[i];
auto pc = vecA[i % vecA.size()];
}
Pay attention to how the smaller vector is iterated using the remainder operator.
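The index-based loop above can be wrapped into a small self-contained function. This is only a sketch: the function name assignUsers and the choice to return user/PC pairs are assumptions for illustration. The guard against an empty vecA is an addition, since the remainder operator with a zero divisor is undefined.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: pairs every element of vecB with an element of vecA,
// cycling through vecA as often as needed via the remainder operator.
std::vector<std::pair<std::string, std::string>>
assignUsers(const std::vector<std::string>& vecA,
            const std::vector<std::string>& vecB)
{
    std::vector<std::pair<std::string, std::string>> out;
    if (vecA.empty()) {
        return out;  // avoid i % 0, which is undefined
    }
    for (std::size_t i = 0; i < vecB.size(); ++i) {
        out.emplace_back(vecB[i], vecA[i % vecA.size()]);
    }
    return out;
}
```

With 10 PCs and 20 users, index 10 (the 11th user) maps back to index 0 of vecA, which is the restart the question asks for.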
In addition to using std::next, as shown in the answer by @dasblinkenlight, you can also use std::advance.
b = vecB.begin();
std::advance(b, 10);
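As a sketch of the difference between the two calls: std::next returns a new iterator, while std::advance moves an existing one in place, and both work for containers without random access such as std::list. The helper names elementAtNext and elementAtAdvance are hypothetical, and neither checks that the position is within range.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: element at zero-based position n, found with std::next.
// Precondition (not checked): 0 <= n < c.size().
template <typename Container>
typename Container::value_type elementAtNext(const Container& c, std::ptrdiff_t n)
{
    return *std::next(c.begin(), n);
}

// Same lookup with std::advance, which modifies the iterator it is given.
template <typename Container>
typename Container::value_type elementAtAdvance(const Container& c, std::ptrdiff_t n)
{
    auto it = c.begin();
    std::advance(it, n);
    return *it;
}
```

An offset of 10 from begin() lands on the 11th element in both versions.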
You don't need to change the iterator for B, it will automatically continue with 11th element. But you need to restart iteration on A at the beginning of the for loop (or you would work with a.end() which is not a valid element):
if (a == vecA.end()) {
a = vecA.begin();
}
Also you should iterate over both but check for end on b only; if you check on a, the for would end before the if would turn true:
for (auto a = vecA.begin(), b = vecB.begin(); b != vecB.end(); ++a, ++b)
I actually prefer to manually iterate over vectors pre-C++11 because it looks way cleaner and more readable than iterators:
for (unsigned int i = 0; i < my_vector.size(); i++) {
my_vector[i]; //Do Something
}
You can specify the range you want to iterate over simply by modifying the for loop initializer (e.g. unsigned int i = n)
Edit: Before downvoting actually read my entire answer. Using iterators on vectors is overly verbose and makes your code virtually unreadable. If there is a legitimate reason this method should not be used in favor of iterators, then please leave a comment.
Most people aren't looking for an ultra-generic, drop-in-any-container solution. Most people know they need a dynamic list, vector fits the bill, so why not make your code easy to read for the next guy?

Is there a more efficient way to do this algorithm?

To the best of my knowledge, this algorithm will search correctly and return true when it needs to. In class we are talking about Big O analysis, so this assignment is to show how the recursive search is faster than an iterative search. The point is to search for a number such that A[i] = i (find an index that is the same as the number stored at the index). This algorithm versus an iterative one only varies by about 100 nanoseconds, but sometimes the iterative one is faster. I set up the vector in main using the rand() function. I run the two algorithms a million times and record the times. The question I am asking is, is this algorithm as efficient as possible or is there a better way to do it?
bool recursiveSearch(vector<int> &myList, int beginning, int end)
{
int mid = (beginning + end) / 2;
if (myList[beginning] == beginning) //check if the vector at "beginning" is
{ //equal to the value of "beginning"
return true;
}
else if (beginning == end) //when this is true, the recursive loop ends.
{ //when passed into the method: end = size - 1
return false;
}
else
{
return (recursiveSearch(myList, beginning, mid) || recursiveSearch(myList, mid + 1, end));
}
}
Edit: The list is pre-ordered before being passed in and a check is done in main to make sure that beginning and the end both exist
One possible "improvement" would be to not copy the vector in each recursion by passing a reference:
bool recursiveSearch(const vector<int>& myList, int beginning, int end)
Unless you know something special about the ordering of the data, there is absolutely no advantage to performing a partitioned search like this.
Indeed, your code is actually [trying] to do a linear search, so it is actually implementing a simple for loop with the cost of a lot of stack and overhead.
Note that there is a weirdness in your code: If the first element doesn't match, you will call recursiveSearch(myList, beginning /*=0*/, mid). Since we already know that element 0 doesn't match, you're going to subdivide again, but only after re-testing the element.
So given a vector of 7 elements (end = 6) that has no matches, you're going to call:
recursiveSearch(myList, 0, 6);
-> < recursiveSearch(myList, 0, 3) || recursiveSearch(myList, 4, 6) >
-> < recursiveSearch(myList, 0, 1) || recursiveSearch(myList, 2, 3) > < recursiveSearch(myList, 4, 5) || recursiveSearch(myList, 6, 6) >
-> < recursiveSearch(myList, 0, 0) || recursiveSearch(myList, 1, 1) > < recursiveSearch(myList, 2, 2) || recursiveSearch(myList, 3, 3) > ...
In the end, you're failing on a given index because you reached the condition where beginning and end were both that value. That seems an expensive way of eliminating each node, and the end result is not a partitioned search; it is a simple linear search that just uses a lot of stack depth to get there.
So, a simpler and faster way to do this would be:
for (int i = beginning; i <= end; ++i) { // end is inclusive (size - 1)
    if (myList[i] != i)
        continue;
    return true;
}
return false;
Since we're trying to optimize here, it's worth pointing out that MSVC, GCC and Clang all assume that if expresses the likely case, so I'm optimizing here for the degenerate case where we have a large vector with no or late matches. In the case where we get lucky and we find a result early, then we're willing to pay the cost of a potential branch miss because we're leaving. I realize that the branch cache will soon figure this out for us, but again - optimizing ;-P
As others have pointed out, you could also benefit from not passing the vector by value (forcing a copy)
const std::vector<int>& myList
An obvious "improvement" would be to run threads on all the remaining cores. Simply divvy up the vector into number of cores - 1 pieces and use a condition variable to signal the main thread when found.
If you need to find an element in an unsorted array such that A[i] == i, then the only way to do it is to go through every element until you find one.
The simplest way to do this is like so:
bool find_index_matching_value(const std::vector<int>& v)
{
for (int i=0; i < v.size(); i++) {
if (v[i] == i)
return true;
}
return false; // no such element
}
This is O(n), and you're not going to be able to do any better than that algorithmically. So we have to turn our attention to micro-optimisations.
In general, I would be quite astonished if on modern machines, your recursive solution is faster in general than the simple solution above. While the compiler will (possibly) be able to remove the extra function call overhead (effectively turning your recursive solution into an iterative one), running through the array in order (as above) allows for optimal use of the cache, whereas, for large arrays, your partitioned search will not.
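The question's edit says the list is pre-ordered, which is the kind of "something special about the ordering" that can change the picture. As a sketch only, and under the additional assumption (not stated in the question) that the values are strictly increasing integers, v[i] - i never decreases, so a binary search can find an index with v[i] == i in O(log n). The function name hasFixedPointSorted is hypothetical.

```cpp
#include <vector>

// Hypothetical helper. Assumes v holds strictly increasing integers
// (sorted, no duplicates); the result is not meaningful otherwise.
bool hasFixedPointSorted(const std::vector<int>& v)
{
    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;
        if (v[mid] == mid) {
            return true;
        }
        if (v[mid] < mid) {
            // v[i] - i is non-decreasing, so nothing at or left of mid can match.
            lo = mid + 1;
        } else {
            // v[mid] > mid, so nothing at or right of mid can match.
            hi = mid - 1;
        }
    }
    return false;
}
```

If duplicates are possible, as they may be with values produced by rand(), this shortcut is not valid and the linear scan above remains the safe choice.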

Array Initialization

I am trying to assign values within an array within the condition of a for loop:
#include <iostream>
using namespace std;
int* c;
int n;
int main()
{
scanf("%d", &n);
c = new int[n];
for (int i = 0; i < n; c[i] = i++ )
{
printf("%d \n", c[i]);
}
return 0;
}
However, I am not obtaining the desired output, which for n = 5 is 0 1 2 3 4. Instead, if I use the instruction c[i] = ++i, I obtain the output -842150451 1 2 3 4. Could you please explain why my code behaves like this and how I can correct it?
The value of the expression ++i is the value after i has been incremented. So if it started at 0, you assign value 1 the first time and so on. You can see where the value got assigned, but asking why it got assigned there opens a can of worms.
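Using i in an expression where i is also modified via i++ or ++i is undefined behavior unless there is a so-called "sequence point" in between the two. In this case, there isn't. (Since C++17 the right-hand side of an assignment is sequenced before the left-hand side, so this applies to earlier standards.) See Undefined behavior and sequence points for this rather complicated part of the language.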
Although the behaviour is undefined by the standard, and may not be consistent from one run to another, clearly your program has done something. Apparently it didn't assign to index 0 at all (at least, not before the first print, which is understandable considering that the loop body happens before the last part of the "for"), so you got whatever just so happened to be in that raw memory when it was allocated to you. It assigned 1 to index 1 and so on.
This means that it may also have attempted to assign the value 5 to c[5], which is a class of bug known as a "buffer overrun", and more undefined behavior on top of what you've already got. Attempting to assign to it probably overwrites other memory, which on any given day may or may not contain something important.
The fix is to assign some value to c[0], and don't try to assign to c[5], which doesn't exist anyway, and don't try to invalidly use i "at the same time as" incrementing it. Normally you'd write this:
for (int i = 0; i < n; ++i) {
    c[i] = i;
    printf("%d \n", c[i]);
}
If you're desperate for some reason to assign in the third clause of a for loop, you could use the comma operator to introduce a sequence point:
for (int i = 0; i < n; c[i] = i, ++i) {
}
But of course if you do that then you can't print the value of c[i] in the loop body. It hasn't been assigned yet, because the third clause isn't evaluated until the end of each loop.
You could also try c[i] = i+1, ++i, but not ++i, c[i] = i because then we're back to trying to assign to c[5], on the last iteration.
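The point about the third clause running only after the body can be checked with a small sketch. The function name printedValues is hypothetical, and the array is pre-filled with -1 as a stand-in for the uninitialized memory in the original program so that the result is deterministic.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical demonstration: returns the values the loop body "sees"
// (first) and the final contents of c (second).
std::pair<std::vector<int>, std::vector<int>> printedValues(std::size_t n)
{
    std::vector<int> c(n, -1);  // -1 stands in for uninitialized memory
    std::vector<int> printed;
    // The comma operator sequences the assignment before the increment.
    for (std::size_t i = 0; i < n; c[i] = static_cast<int>(i), ++i) {
        printed.push_back(c[i]);  // runs before c[i] is assigned
    }
    return {printed, c};
}
```

Every value recorded in the body is still the placeholder, while the array itself ends up correctly filled once the loop has finished.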
First you need to understand that the last part of the for loop is executed at the end of each iteration, so the reason you see this:
-842150451 1 2 3 4
Is because you print c[0] before it is assigned, so the value could be anything. The rest falls into line as expected.
Lesson; don't be sneaky and stuff things into the last part of the for loop like that. Make your code clear and simple:
for (int i = 0; i < n; ++i )
{
c[i] = i;
printf("%d \n", c[i]);
}
Firstly, you are claiming that you want to assign the values "within the condition of the loop". In for loop the condition is the second part of the header (i < n in your case). You are performing the assignment in the third part, which is not a condition. So, why are you saying that you want to assign the values within the condition, and yet not doing that?
Secondly, expressions like c[i] = i++ or c[i] = ++i do not have any defined behavior in the C++ language (at least before C++17). It is illegal to modify a variable and at the same time read it for any other purpose without an intervening sequence point. Yet, you are doing exactly that. There's no meaningful explanation for the behavior of your code. The behavior of your code is undefined. It can do anything for any random reason.
Thirdly, initializing anything in the for condition is generally not a good idea. Could you explain in more detail what you are trying to do and why? Without it it is hard to come up with anything meaningful.
Your fundamental problem is how the statements in the for(;;) structure get broken down and executed. The for(st1; st2; st3) structure is intended to be identical to:
st1;
while (st2) {
<body>
st3;
}
Therefore, your 3rd statement, c[i] = i++, gets executed after the printf statement and you're printing uninitialized data.
The pre-increment vs. post-increment issue is obscuring this.
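The for/while equivalence above can be verified by logging the order of events in both forms. This is a sketch with hypothetical function names (traceFor, traceWhile); the strings "body" and "step" simply label the loop body and the third statement.

```cpp
#include <string>
#include <vector>

// Logs the order in which the body and the third statement of a for loop run.
std::vector<std::string> traceFor(int n)
{
    std::vector<std::string> log;
    for (int i = 0; i < n; log.push_back("step " + std::to_string(i)), ++i) {
        log.push_back("body " + std::to_string(i));
    }
    return log;
}

// The same loop written as st1; while (st2) { <body> st3; }
std::vector<std::string> traceWhile(int n)
{
    std::vector<std::string> log;
    int i = 0;
    while (i < n) {
        log.push_back("body " + std::to_string(i));
        log.push_back("step " + std::to_string(i));
        ++i;
    }
    return log;
}
```

Both functions produce the same log, with each "body" entry appearing before the matching "step" entry.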
The reason c[i] = ++i misbehaves is that it produces undefined behavior: it's undefined to both ++ a value (pre or post) and use it again within the same expression. In this case it appears that ++i is being evaluated before anything else and causing the execution to essentially be
c[1] = 1;
c[2] = 2;
...
This means c[0] is never initialized and instead has essentially a garbage value. It seems like the order you want is
c[0] = 0;
c[1] = 1;
To get this ordering you'll need to separate the initialization and increment into separate statements.
c[i] = i;
i++;