Help with C++ logic?

void* Something::methodname()
{
    unsigned char* ptr = (unsigned char*) m_pptr;
    while ((*ptr || *(ptr+1)) && (((unsigned char*)m_pptr+BUFSIZE)<ptr))
        ptr++;
    if (ptr == m_pptr)
        return ptr;
    return ptr + 1;
}
m_pptr is a protected member of the class; ptr is local to this function.
Could someone help me with the logic of this code? I know it compiles, but the results I'm getting are not the ones I expect. I memset the buffer to 0xA5 bytes and the while loop fails somehow: it skips right past it. Any help would be great.
The loop is meant to walk through the buffer, incrementing ptr while *ptr or *(ptr+1) is non-zero AND ptr has not passed the end of the buffer (m_pptr, the pointer to the beginning of the buffer, plus BUFSIZE). The if statement says that if ptr is still equal to m_pptr (the beginning of the buffer), return ptr itself.
This function returns a void* and takes no arguments.

The condition
((unsigned char*)m_pptr + BUFSIZE) < ptr
looks backward:
((unsigned char*)m_pptr + BUFSIZE) > ptr
would be more likely. Even saner:
while (ptr + 1 < ((unsigned char*) m_pptr + BUFSIZE)) // until end of buffer (leave room for ptr+1)
{
    if (!*ptr) // null char reached
        break;
    if (!*(ptr+1)) // null char almost reached
        break;
    // do stuff
    ptr++;
}
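The effect of the reversed bound can be checked directly. This is a small sketch, assuming a hypothetical kBufSize of 16 in place of BUFSIZE and a plain array in place of m_pptr. With the original test (end < ptr) the loop stops before its first step, which matches the "skips right past it" symptom; with the corrected bound (ptr + 1 < end) a buffer full of 0xA5 is scanned up to its last pair. Unlike the question's code, the sketch tests the bound before dereferencing.

```cpp
#include <cstddef>

// Hypothetical stand-in for the question's BUFSIZE.
constexpr std::size_t kBufSize = 16;

// Returns how far the question's loop would advance ptr through buf.
// reversed == true uses the original bound test (end < ptr);
// reversed == false uses the corrected one (ptr + 1 < end).
// The bound is tested before dereferencing, so *(ptr + 1) is never
// read outside the buffer.
std::size_t count_steps(const unsigned char* buf, bool reversed) {
    const unsigned char* ptr = buf;
    const unsigned char* end = buf + kBufSize;
    for (;;) {
        bool in_bounds = reversed ? (end < ptr) : (ptr + 1 < end);
        if (!in_bounds)
            break;                 // original test: false on the very first check
        if (!(*ptr || *(ptr + 1)))
            break;                 // found two consecutive zero bytes
        ++ptr;
    }
    return static_cast<std::size_t>(ptr - buf);
}
```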

This bit looks suspicious to me:
while ((*ptr || *(ptr+1))
Imagine that ptr is pointing to a valid character byte, followed by a NUL terminator byte.
The first sub-test of the above line evaluates to true, so ptr gets incremented. Now ptr points at the NUL terminator byte, and ptr+1 points at the byte AFTER the NUL terminator... which might be garbage, and therefore might be non-zero. In that case ptr is incremented again (because the second sub-test evaluated to true this time), so ptr now points to the byte after the NUL terminator. From there on your pointer heads off into la-la-land, trying to interpret data that was never meant to be part of the string it was parsing.

Wouldn't it look cleaner and simpler if you used a for loop instead?
for ( int i =0; i<BUFSIZE && (ptr[i] || ptr[i+1]); i++);
It would be easier to notice wrong comparison, wouldn't it?
And I think it would also be easier to see that in this case it should be
for ( int i =0; i<(BUFSIZE-1) && (ptr[i] || ptr[i+1]); i++);
or even
for ( int i =1; i<BUFSIZE && (ptr[i-1] || ptr[i]); i++);
unless, obviously, you accounted for that by making BUFSIZE equal to the buffer size minus one.
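The BUFSIZE - 1 bound can be packaged as a small function. This is a minimal sketch, assuming the goal is to locate the first pair of consecutive zero bytes in a buffer of known size; the name find_double_nul is hypothetical.

```cpp
#include <cstddef>

// Returns a pointer to the first byte of the first "\0\0" pair in
// buf[0 .. size), or nullptr if there is none. The loop only runs while
// i + 1 < size, so buf[i + 1] never reads past the end of the buffer.
const unsigned char* find_double_nul(const unsigned char* buf, std::size_t size) {
    for (std::size_t i = 0; i + 1 < size; ++i) {
        if (buf[i] == 0 && buf[i + 1] == 0)
            return buf + i;
    }
    return nullptr;
}
```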

Related

Efficiency & Readability of a C++ For Loop that Compares two C-style Strings

So I've created my own function to compare two C Strings:
bool list::compareString(const char array1[], const char array2[])
{
    unsigned char count;
    for (count = 0; array1[count] != '\0' && array2[count] != '\0' && (array1[count] == array2[count] || array1[count + 32] == array2[count] || array1[count] == array2[count+32]); count++);
    if (array1[count] == '\0' && array2[count] == '\0')
        return true;
    else
        return false;
}
The header of my for loop is very long because it brings count to the end of at least one of the strings, and compares each char in each array in such a way that their case won't matter (adding 32 to an uppercase char turns that char into its lowercase counterpart).
Now, I'm guessing that this is the most efficient way to compare two C strings, but the for loop is hard to read because of its length. I've been told to use a for loop instead of a while loop whenever possible, because a for loop has the starting, ending, and incrementing conditions in its header, but here that may not apply.
What I'm asking is, how should I format this loop, and is there a more efficient way to do it?
Instead of indexing into the arrays with count, which you don't know the size of, you can instead operate directly on the pointers:
bool list::compareString(const char* array1, const char* array2)
{
    while (*array1 != '\0' || *array2 != '\0')
        if (*array1++ != *array2++) return false; // not the same character
    return true;
}
For case insensitive comparison, replace the if condition with:
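if (tolower((unsigned char)*array1++) != tolower((unsigned char)*array2++)) return false;
This converts each character to lower case before comparing. The cast to unsigned char keeps tolower's behavior defined for characters that would otherwise be negative.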
The while loop checks whether the strings are terminated. It continues while at least one of the strings is not yet terminated. If only one string has terminated, the next line (the if statement) will see that the characters don't match, since only one of them is '\0', and return false.
If the strings differ at any point, the if statement returns false.
The if statement also post-increments the pointers so that it tests the next character in the next iteration of the while loop.
If both strings are equal, and terminate at the same time, at some point, the while condition will become false. In this case, the return true statement will execute.
If you want to write the tolower function yourself, you need to check that the character is a capital letter, and not a different type of character (e.g. a digit or symbol).
This would be:
inline char tolower(char ch)
{
    return (ch >= 'A' && ch <= 'Z' ? (ch + 'a' - 'A') : ch);
}
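Putting those pieces together, here is a minimal sketch of the pointer-walking comparison using std::tolower from <cctype>, with the unsigned char cast it needs for characters that might be negative. The name equals_ignore_case is hypothetical. Unlike adding or OR-ing 32, it leaves non-letters such as '@' and '`' distinct.

```cpp
#include <cctype>

// Case-insensitive equality for NUL-terminated strings.
// Passing a negative char to std::tolower is undefined behavior,
// so each character is converted to unsigned char first.
bool equals_ignore_case(const char* a, const char* b) {
    while (*a != '\0' || *b != '\0') {
        unsigned char ca = static_cast<unsigned char>(*a++);
        unsigned char cb = static_cast<unsigned char>(*b++);
        if (std::tolower(ca) != std::tolower(cb))
            return false;   // also catches the case where only one string has ended
    }
    return true;            // both strings ended at the same position
}
```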
I guess you are trying to do a case-insensitive comparison here. If you just need the fastest version, use a library function: strcasecmp or stricmp or strcmpi (name depends on your platform).
If you need to understand how to do it (I mean, is your question for learning purposes?), start with a readable version, something like this:
for (int index = 0; ; ++index)
{
    if (array1[index] == '\0' && array2[index] == '\0')
        return true; // end of string reached
    if (tolower(array1[index]) != tolower(array2[index]))
        return false; // different characters discovered
}
Then measure its performance. If it's good enough, done. If not, investigate why (by looking at the machine code generated by the compiler). The first step in optimization might be replacing the tolower library function by a hand-crafted piece of code (which disregards non-English characters - is it what you want to do?):
int tolower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 'a' - 'A';
    return c; // not an uppercase letter: leave it unchanged
}
Note that I am still keeping the code readable. Readable code can be fast, because the compiler is going to optimize it.
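array1[count + 32] == array2[count]
can read past the end of the array whenever count + 32 goes beyond the string's storage (for example, with any string shorter than 32 characters). C++ doesn't throw an exception for this; it is undefined behavior.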
You can use strcmp for comparing two strings (note that strcmp is case-sensitive).
You have a few problems with your code.
What I'd do here is move some of your logic into the body of the for loop. Cramming everything into the for loop header massively reduces readability without giving you any performance boost that I can think of. The code just ends up being messy. Keep the loop header for the iteration condition and put the actual work in the body.
I'd also point out that you're not adding 32 to the character at all. You're adding it to the index into the array, which puts you at risk of running out of bounds. You need to test the value at the index, not the index itself.
Using an unsigned char to index an array gives you no benefits and only serves to reduce the maximum length of the strings that you can compare. Use an int.
You could restructure the code so that it looks like this:
bool list::compareString(const char array1[], const char array2[])
{
    int count = 0;
    // Iterate over the strings until we find the string termination character
    for (; array1[count] != '\0' && array2[count] != '\0'; count++) {
        // Note 0x20 is hexadecimal 32. Setting that bit maps 'A'-'Z' onto
        // 'a'-'z', so letters compare case-insensitively (it also treats
        // some non-letters as equal, e.g. '@' and '`').
        if ( (array1[count] | 0x20) != (array2[count] | 0x20) ) {
            // Return false if the letters aren't equal
            return false;
        }
    }
    // At least one string has ended; they are equal only if both have.
    return array1[count] == array2[count];
}
As for efficiency, it looks to me like you were trying to reduce:
The size of the variables that you're using to store data in memory
The number of individual lines of code in your solution
Neither of these is worth your time. Efficiency is about how many steps (not lines of code, mind you) it takes to perform a task and how those steps scale as the inputs get bigger. For instance, how much slower would it be to compare the content of two novels for equality than two single-word strings?
I hope that helps :)

Segmentation fault : Address out of bounds for a pointer in C

I am trying to build and run some complicated code that was written by someone else; I don't know who they are and can't ask them for help. The code reads a bpf (brain potential file) and converts it to a readable ASCII format. It has 3 C files and 2 corresponding header files. I got it to build successfully with minor changes, but now it crashes with a segmentation fault.
I narrowed the problem down to FindSectionEnd() (in ReadBPFHeader.c) and find that the error occurs when sscanfLine() (in the file sscanfLine.c) is called (code for both is below).
ui1 is defined as unsigned char.
si1 is defined as char.
Just before returning from sscanfLine(), dp points to 0x7e5191, or something similar ending in 191. However, on returning to FindSectionEnd(), dp points to 0x20303035 and the debugger says 'Address 0x20303035 is out of bounds', which then causes a fault in strstr(). The loop in FindSectionEnd() runs without problems for 14 iterations before the fault occurs. I have no idea what is going wrong. I really hope the information I have given here is adequate.
ui1 *FindSectionEnd(ui1 *dp)
{
    si1 Line[256], String[256];
    int cnt=0;
    while (sscanfLine(dp, Line) != EOF){
        dp = (ui1 *)strstr(dp, Line);
        dp+= strlen(Line);
        sscanf(Line,"%s",String);
        if(SectionEnd(String))
            return(dp);
    }
    return(NULL);
}
si1 *sscanfLine(ui1 *dp, si1 *s)
{
    int i = 0;
    *s = NULL;
    int cnt = 0;
    while (sscanf(dp, "%c", s + i) != EOF){
        cnt++;
        dp++;
        if(*(s + i) == '\n') {
            *(s + i + 1) = '\0';
            return s;
        }
        ++i;
    }
    *(s + i) = '\0';
    return s;
}
The sscanfLine function doesn't respect the size of the buffer passed in, and if it doesn't find '\n' within the first 256 bytes, happily trashes the stack next to the Line array.
You may be able to work around this by making Line bigger.
If you're going to improve the code, you should pass the buffer size to sscanfLine and make it stop when the count is reached even if a newline wasn't found. While you're at it, instead of returning s, which the caller already has, make sscanfLine return the new value of dp, which will save the caller from needing to use strstr and strlen.
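A minimal sketch of that suggestion follows. The name read_line is hypothetical, and it uses const char* rather than the question's ui1*/si1* typedefs. It copies at most size - 1 bytes, always NUL-terminates, and returns the new read position instead of s.

```cpp
#include <cstddef>

// Hypothetical bounded replacement for sscanfLine.
// Copies the current line from dp into s (including its '\n' if it fits),
// never writing more than size bytes, and returns the position just past
// what was consumed. Returns nullptr at end of input, or if size < 2
// (no progress would be possible).
const char* read_line(const char* dp, char* s, std::size_t size) {
    if (size < 2 || *dp == '\0')
        return nullptr;
    std::size_t i = 0;
    while (*dp != '\0' && i + 1 < size) {
        char c = *dp++;
        s[i++] = c;
        if (c == '\n')
            break;          // line complete
    }
    s[i] = '\0';            // always terminated, even if the line was truncated
    return dp;
}
```

A caller could then loop with while ((dp = read_line(dp, Line, sizeof Line)) != nullptr) { ... } and drop the strstr/strlen step entirely.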
My first guess would be that your string is not null-terminated and strstr() segfaults because it reads past the boundaries of the array.

Realloc returning null?

I allocate some memory with malloc - about 128 bytes.
Later on, I call realloc with about 200 bytes, but it's returning null!
It returns a valid pointer if I do free, and then another malloc, however I would like to use realloc.
What could explain this behavior (I clearly am not running out of memory)? Is this valid behavior?
Code bits:
//class constructor
size = 0;
sizeAllocated = DEFAULT_BUFFER_SIZE; //64
data = (char*)malloc(sizeAllocated * sizeof(char)); //data is valid ptr now, I've checked it
data[0] = '\0';
//later on:
//append function
bool append(char** data, const char* str, size_t strLen) {
    if((size + strLen) >= sizeAllocated) {
        sizeAllocated += strLen + 1 + BUFFER_ALLOCATION_STEP;
        char* temp = (char*)realloc(*data, sizeAllocated * sizeof(char));
        if(temp)
            *data = temp;
        return( temp != NULL );
    }
    // ... (rest of append not shown)
}
EDIT: fixed. I was overloading the << operator for my class and had it return *this instead of void. Somehow this was screwing everything up! If anyone could explain why this happens, that would be nice!
The following comment was added to the question:
data = (char*)realloc(data, (size_t)(sizeAllocated * sizeof(char)));
"If I replace sizeAllocated with a constant that is the same value, it reallocs correctly."
Now we can figure out what happened. You replaced sizeAllocated with a constant that DID NOT have the same value. For debugging purposes, add a statement that will output the value of sizeAllocated and you will be surprised.
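As for the EDIT: the question never shows the class, so this is only a guess. If operator<< returned the class by value, every use would create a temporary copy. With a compiler-generated copy constructor that copy shares the same data pointer, frees it in its destructor, and the original's next realloc then works on freed memory, which is undefined behavior and could show up as realloc returning null. A minimal sketch of the safer shape, with hypothetical names (Buffer, append, c_str), returns a reference and forbids copying:

```cpp
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical growable string buffer, loosely modeled on the question.
class Buffer {
public:
    Buffer() : data_(static_cast<char*>(std::malloc(64))), size_(0), allocated_(64) {
        if (!data_) throw std::bad_alloc();
        data_[0] = '\0';
    }
    ~Buffer() { std::free(data_); }

    // A shallow copy would share data_ and free it twice; forbid copying.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Return by reference: chaining works and no temporary copy is destroyed.
    Buffer& operator<<(const char* str) {
        append(str, std::strlen(str));
        return *this;
    }

    bool append(const char* str, std::size_t len) {
        if (size_ + len >= allocated_) {
            std::size_t newSize = size_ + len + 1 + 64;
            char* temp = static_cast<char*>(std::realloc(data_, newSize));
            if (!temp) return false;   // data_ is still valid on failure
            data_ = temp;
            allocated_ = newSize;      // only updated once realloc succeeded
        }
        std::memcpy(data_ + size_, str, len);
        size_ += len;
        data_[size_] = '\0';
        return true;
    }

    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
    std::size_t allocated_;
};
```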

Substring search interview question

char* func( char* a, const char* b )
{
    while( *a )
    {
        char *s = a, *t = b;
        while( (*s++ == *t++) && *s && *t );
        if( *t == 0 )
            return a;
        a++;
    }
    return 0;
}
The above code was written to search for the first instance of string "b" inside string "a".
Is there any problem with the above program?
Is there any approach to improve the efficiency of it?
If a points to "cat" and b points to "ab", func will return a pointer to "at" (the wrong value) instead of 0 (the intended value) because the pointer t is incremented even though the comparison (*s++ == *t++) fails.
For completeness' sake and in order to answer the second question, I'd offer one solution (surely among other viable ones): Have the result of the comparison be assigned to another variable, e.g. while( ( flag = ( *s++ == *t++ ) ) && *s && *t ); and then if( flag && *t == 0 ).
I'm not a C developer, so I can't and won't comment on the correctness of the code, but with regard to efficiency, see:
http://en.wikipedia.org/wiki/String_searching_algorithm
I believe you have the naive searching version. Look at the Knuth-Morris-Pratt algorithm. You do a little work on the string b before you start searching in a, and then the search runs in O(|a|+|b|). And if |b| is larger than |a|, b can't be in a, so it becomes O(|a|).
The essence is that if a is:
abcabe
And b is:
aba
Then you know that if the third char fails, the search will also fail if you shift b by one or two chars. Therefore you don't have to check every possible substring:
a[1 .. 3] == b
a[2 .. 4] == b
...
which would be O(|a|*|b|) character comparisons; you only need a subset of them, which is O(|a|).
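A minimal sketch of Knuth-Morris-Pratt under these assumptions: it takes std::string rather than raw char pointers, returns the index of the first match or -1 (like the K&R strindex function), and kmp_find is a hypothetical name. The preprocessing of b is the fail table; the search never moves backward in a.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first occurrence of b in a, or -1.
long kmp_find(const std::string& a, const std::string& b) {
    if (b.empty()) return 0;   // the empty string matches at the start
    // fail[i] = length of the longest proper prefix of b[0..i]
    //           that is also a suffix of b[0..i]
    std::vector<std::size_t> fail(b.size(), 0);
    for (std::size_t i = 1, k = 0; i < b.size(); ++i) {
        while (k > 0 && b[i] != b[k]) k = fail[k - 1];
        if (b[i] == b[k]) ++k;
        fail[i] = k;
    }
    // k = number of characters of b matched so far
    for (std::size_t i = 0, k = 0; i < a.size(); ++i) {
        while (k > 0 && a[i] != b[k]) k = fail[k - 1];  // shift b, keep a's position
        if (a[i] == b[k]) ++k;
        if (k == b.size()) return static_cast<long>(i + 1 - b.size());
    }
    return -1;
}
```

The while loops only ever decrease k, and k increases at most once per character of a, which is where the linear bound comes from.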
Yeah...
t can't be initialized with b, because that discards b's const qualifier.
It doesn't match the last char in "b" properly.
Well, it does have the slight problem that it doesn't actually work.
Try running with a="xyz" and b="xw". The first time through the while loop, x==x, so you increment both pointers and loop around again. Then y!=w, so you exit the loop. But you've already incremented the pointers, so *t==0, and you report a hit.
In general, you report a hit regardless of whether the last character matches.
If b is a 1-character string, the last character is the only character, so a 1-character string matches anything.
I'd recommend against trying to do the loop with a single statement with side effects. As this example illustrates, this is tricky. Even if you get it right, it's very cryptic for people trying to read your code.
You could rewrite the while loop as (without using a flag):
while( (*s == *t) && *s && *t ){
    s++;
    t++;
}
Or use a for loop. The code below is copied from K&R's The C Programming Language:
/* strindex: return index of t in s, -1 if none */
int strindex(char s[], char t[])
{
    int i, j, k;
    for (i = 0; s[i] != '\0'; i++) {
        for (j=i, k=0; t[k]!='\0' && s[j]==t[k]; j++, k++)
            ;
        if (k > 0 && t[k] == '\0')
            return i;
    }
    return -1;
}
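Packaging the flag-free while loop into a complete function gives a sketch like the one below. It assumes strstr-like semantics (an empty b matches at the start of a) and uses a hypothetical name, find_substring. t is declared const char*, which avoids the const problem, and the loop only advances while the characters actually match, so a mismatch on the last character of b is no longer reported as a hit.

```cpp
// Returns a pointer to the first occurrence of b in a, or nullptr.
const char* find_substring(const char* a, const char* b) {
    if (*b == '\0') return a;           // empty b matches at the start, like strstr
    for (; *a != '\0'; ++a) {
        const char* s = a;
        const char* t = b;
        // *s == *t with *t != '\0' implies *s != '\0', so s never runs off the end
        while (*t != '\0' && *s == *t) {
            ++s;
            ++t;
        }
        if (*t == '\0') return a;       // matched every character of b
    }
    return nullptr;
}
```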
If a is not properly null-terminated, the function will die horribly.
If b is not properly null-terminated, the function will probably die horribly.
The indentation is strange.
This is going to do the job, but there are better ways to do this.
Check this article:
http://en.wikipedia.org/wiki/String_searching_algorithm
At first glance the line:
while( (*s++ == *t++) && *s && *t );
might look undefined, because s and t are read again after being post-incremented, and it is not obvious whether those reads see the pointers before or after the increment.
It is in fact well defined: && is a sequence point (in C++11 wording, its left operand is sequenced before its right operand), so both increments are complete before *s and *t are evaluated, and those reads see the incremented pointers. The bug in this line is the logic described in the other answers, not undefined behavior.
Very picky point, in addition to those raised by others:
If a and b are both 0-length, then this routine returns NULL. If it's supposed to be following the specification of strstr, then it must return a in that case. Which makes sense, since the empty string b is indeed a substring of the empty string a.
Why don't you use a library function for this? Do you know strstr()?
#include <cstring>

const char* mystrstr(const char* a,const char* b)
{
    size_t blen=std::strlen(b);
    while( *a )
    {
        if( !std::strncmp(a,b,blen) )
            return a;
        ++a;
    }
    return 0;
}
*t = b; //killing the const-ness of b....
Also, for clarity, you can write while(*a != '\0') instead of while(*a).
Also, the second while statement:
while( (*s++ == *t++) && *s && *t );
will fail. Try storing the comparison in a flag, int flag = (*s++ == *t++);
and simplify a bit.
The efficiency? It's horrible! (That doesn't mean I could do better, though... I'd do the same thing. ;))
Take a look at Knuth-Morris-Pratt.

How to know if the value of an array is composed of zeros?

Hey, if you can come up with a more descriptive title, please edit it.
I'm writing a little algorithm that involves checking values in a matrix.
Let's say:
char matrix[100][100];
char *ptr = &matrix[0][0];
Imagine I populate the matrix with a few (5 or 6) values of 1, like:
matrix[20][35]=1;
matrix[67][34]=1;
How can I tell whether the binary value of a range of the matrix is zero, for example (in pseudocode):
if((the value from ptr+100 to ptr+200)==0){ ... // do something
I'm trying to pick up C/C++ again. There should be a way of taking those one hundred bytes (which are all next to each other) and checking whether they are all zeros without having to check them one by one (considering that a char is one byte).
You can use std::find_if.
#include <algorithm>

bool not_0(char c)
{
    return c != 0;
}
char *next = std::find_if(ptr + 100, ptr + 200, not_0);
if (next == ptr + 200)
{
    // all 0's
}
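You can also use binders to remove the free function (although I think binders are hard to read; note also that std::bind2nd was deprecated in C++11 and removed in C++17, where a lambda does the same job):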
char *next = std::find_if(ptr + 100, ptr + 200,
std::bind2nd(std::not_equal_to<char>(), 0));
Dang, I just noticed the request not to do this byte by byte. find_if still goes byte by byte, although it's hidden. You will have to check every element one way or another, although using a larger type will help. Here's my final version.
#include <algorithm>
#include <cstddef>
#include <cstdint>

template <class T>
bool all_0(const char *begin, const char *end, std::ptrdiff_t cutoff = 10)
{
    if (end - begin < cutoff)
    {
        const char *next = std::find_if(begin, end, [](char c) { return c != 0; });
        return (next == end);
    }
    else
    {
        // Check bytes one at a time until begin is aligned for T
        while ((begin < end) && ((reinterpret_cast<std::uintptr_t>(begin) % sizeof(T)) != 0))
        {
            if (*begin != '\0')
                return false;
            ++begin;
        }
        // Same for end, walking backward
        while ((end > begin) && ((reinterpret_cast<std::uintptr_t>(end) % sizeof(T)) != 0))
        {
            --end;
            if (*end != '\0')
                return false;
        }
        // Check the aligned middle one T at a time (strictly speaking, reading
        // char data through a T* breaks the strict-aliasing rule)
        const T *nbegin = reinterpret_cast<const T *>(begin);
        const T *nend = reinterpret_cast<const T *>(end);
        const T *next = std::find_if(nbegin, nend, [](T v) { return v != 0; });
        return (next == nend);
    }
}
What this does is first check whether the data is long enough to make the more complex algorithm worthwhile. I'm not 100% sure this is necessary, but you can tune the minimum.
Assuming the data is long enough, it first aligns the begin and end pointers to match the alignment of the type used to do the comparisons. It then uses the new type to check the bulk of the data.
I would recommend using:
all_0<int>(); // 32 bit platforms
all_0<long>(); // 64 bit LP64 platforms (most (all?) Unix platforms)
all_0<long long>(); // 64 bit LLP64 platforms (Windows)
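For the simple byte-by-byte case, C++11's std::all_of states the test directly, with a lambda in place of the binder. This is a sketch under that assumption; range_is_zero is a hypothetical name, and like find_if it still examines one byte at a time.

```cpp
#include <algorithm>

// True if every byte in [first, last) is zero (an empty range counts as zero).
bool range_is_zero(const char* first, const char* last) {
    return std::all_of(first, last, [](char c) { return c == 0; });
}
```

With the question's setup, range_is_zero(ptr + 100, ptr + 200) checks matrix row 1.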
There's no built-in language feature to do that, nor is there a standard library function to do it. memcmp() could work, but you'd need a second array of all zeroes to compare against; that array would have to be large, and you'd also eat up unnecessary memory bandwidth in doing the comparison.
Just write the function yourself; it's not that hard. If this truly is the bottleneck of your application (which you should only conclude after profiling), then rewrite that function in assembly.
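You tagged this C++, so you can use a pointer as an iterator with an STL algorithm: std::max_element (std::max only compares two values). Then check whether the maximum is 0. This only works if no element can be negative, so treat the data as unsigned char.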
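A minimal sketch of the max_element idea, assuming the data is viewed as unsigned char so that "largest element is 0" means "all elements are 0"; all_zero_by_max is a hypothetical name. The empty-range check matters because max_element returns last for an empty range, which must not be dereferenced.

```cpp
#include <algorithm>

// With unsigned elements, the range is all zeros exactly when its maximum is 0.
bool all_zero_by_max(const unsigned char* first, const unsigned char* last) {
    if (first == last) return true;   // empty range: nothing to check
    return *std::max_element(first, last) == 0;
}
```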
You could cast your pointer to an int* and then check four bytes at a time rather than one (assuming a 4-byte int and a suitably aligned pointer).
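A variant of the same idea that sidesteps the alignment and strict-aliasing questions raised by casting a char pointer to int* is to copy each chunk into an integer with std::memcpy. This is a sketch, assuming 8-byte chunks via std::uint64_t and a hypothetical name, all_zero_words; compilers typically turn a fixed-size memcpy like this into a single load.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if the n bytes starting at p are all zero, checked 8 bytes per step.
bool all_zero_words(const char* p, std::size_t n) {
    std::size_t i = 0;
    for (; i + sizeof(std::uint64_t) <= n; i += sizeof(std::uint64_t)) {
        std::uint64_t word;
        std::memcpy(&word, p + i, sizeof word);   // no alignment requirement on p
        if (word != 0) return false;
    }
    for (; i < n; ++i)                            // leftover tail bytes
        if (p[i] != 0) return false;
    return true;
}
```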
There's no way to tell whether an array has any value other than zero other than by checking all elements one by one. But if you start with an array that you know has all zeros, then you can maintain a flag that states the array's zero state.
std::vector<int> vec(SIZE);
bool allzeroes = true;
// ...
vec[SIZE/2] = 1;
allzeroes = false;
// ...
if( allzeroes ) {
    // ...
}
Reserve row 0 of your matrix and keep it set to all zeros.
Then use memcmp to compare the range you're interested in against the corresponding range of row 0.
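A sketch of the reserved-row idea, assuming the question's 100x100 char matrix with row 0 kept all zeros and a range that stays within a single row; kN and range_matches_zero_row are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

constexpr int kN = 100;  // the matrix is kN x kN, as in the question

// base points at m[0][0]; row 0 is reserved and must stay all zeros.
// Compares m[row][col .. col+len) with m[0][col .. col+len).
// Assumes 0 < row < kN, col >= 0, and col + len <= kN.
bool range_matches_zero_row(const char* base, int row, int col, int len) {
    const char* target = base + row * kN + col;
    const char* zeros = base + col;   // same columns in the reserved row 0
    return std::memcmp(target, zeros, static_cast<std::size_t>(len)) == 0;
}
```

This keeps the extra "zero array" inside the matrix itself, at the cost of one row.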