Function isalnum(): unexpected results - c++

For an assignment, I am using std::isalnum to determine if the input is a letter or a number. The point of the assignment is to create a "dictionary." It works well on small paragraphs, but does horribly on pages of text. Here is the code snippet I am using.
custom::String string;
std::cin >> string;
custom::String original = string;
size_t size = string.Size();
char j;
size_t i = 0;
size_t beg = 0;
while( i < size)
{
j = string[i];
if(!!std::isalnum(static_cast<unsigned char>(j)))
{
--size;
}
if( std::isalnum( j ) )
{
string[i-beg] = tolower(j);
}
++i;
}//end while
string.SetSize(size - beg, '\0');

As I write this, the code presented does not make sense as a whole.
However, the calls to isalnum, as shown, would only work for plain ASCII, because the C character classification functions require a non-negative argument (or else EOF). In addition, for them to work with international characters, the encoding must be single-byte per character, and setlocale must have been called before the functions are used.
Regarding the first of these three points, you can wrap std::isalnum like this:
#include <cctype>
using Byte = unsigned char;
auto is_alphanumeric( char const ch )
-> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }
where the !! is just to silence a silly warning from Visual C++ (a warning about "performance", of all things).
Disclaimer: code untouched by compiler's hands.
Addendum: if you don't have a C++11 compiler, but only C++03,
typedef unsigned char Byte;
bool is_alphanumeric( char const ch )
{
return !!std::isalnum( static_cast<Byte>( ch ) );
}
As Bjarne remarked, C++11 feels like a whole new language! ;-)
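A short usage sketch of the C++11 wrapper, counting the alphanumeric characters in a std::string (count_alphanumeric is a hypothetical helper, not part of the question). Because each char is converted to unsigned char first, a byte such as '\xE9' is a valid argument even where plain char is signed; in the default "C" locale it is simply not classified as alphanumeric.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

using Byte = unsigned char;

auto is_alphanumeric( char const ch )
    -> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }

// Hypothetical helper: counts alphanumeric characters in s. Every char goes
// through the wrapper, so bytes above 127 (negative when char is signed)
// reach std::isalnum as non-negative values.
std::size_t count_alphanumeric( std::string const& s )
{
    std::size_t n = 0;
    for( char const ch : s )
    {
        if( is_alphanumeric( ch ) ) { ++n; }
    }
    return n;
}
```

For example, count_alphanumeric("ab1 !?") returns 3.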

I was able to create a solution to the problem. I noticed that isalnum did take care of some non-alphanumeric characters, but not all the time. Since the code above is part of a function, I called it multiple times, with refined results each time. I then came up with a do-while loop that stores the string's size, calls the function, stores the new size, and compares them. If they are not the same, the function may need to be called again; if they are the same, the string has been fully cleaned. I am guessing that isalnum was not working well because I was reading several chapters of a book into the string. Here is my code:
custom::String abc;
std::cin >> abc;
size_t first = 0;
size_t second = 0;
//clean the word
do{
first = abc.Size();
Cleanup(abc);
second = abc.Size();
}while(first != second);
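The repeated calls should not be necessary. In the snippet as posted, size is decremented for every alphanumeric character (which also shortens the loop) while beg is never incremented, so non-alphanumeric characters are never shifted out. A single-pass sketch with a separate read index and write index, assuming std::string because custom::String's interface is not shown (cleanup is a hypothetical stand-in for the OP's Cleanup):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Keeps only alphanumeric characters, lowercased, in a single pass.
// 'in' reads every character; 'out' is where the next kept one is written.
void cleanup(std::string& s)
{
    std::size_t out = 0;
    for (std::size_t in = 0; in < s.size(); ++in) {
        const unsigned char c = static_cast<unsigned char>(s[in]);
        if (std::isalnum(c))
            s[out++] = static_cast<char>(std::tolower(c));
    }
    s.resize(out); // drop the leftover tail
}
```

For example, "Hello, World! 42" becomes "helloworld42", and a second call leaves it unchanged, so no do-while loop is needed.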


Character pointers messed up in simple Boyer-Moore implementation

I am currently experimenting with a very simple Boyer-Moore variant.
In general my implementation works, but if I try to utilize it in a loop the character pointer containing the haystack gets messed up. And I mean that characters in it are altered, or mixed.
The result is consistent, i.e. running the same test multiple times yields the same screw up.
This is the looping code:
string src("This haystack contains a needle! needless to say that only 2 matches need to be found!");
string pat("needle");
const char* res = src.c_str();
while((res = boyerMoore(res, pat)))
++res;
This is my implementation of the string search algorithm (the above code calls a convenience wrapper which pulls the character pointer and length of the string):
unsigned char*
boyerMoore(const unsigned char* src, size_t srcLgth, const unsigned char* pat, size_t patLgth)
{
if(srcLgth < patLgth || !src || !pat)
return nullptr;
size_t skip[UCHAR_MAX + 1]; //this is the skip table, one slot per byte value 0..UCHAR_MAX (<climits>)
for(int i = 0; i <= UCHAR_MAX; ++i)
skip[i] = patLgth; //initialize it with default value
for(size_t i = 0; i < patLgth; ++i)
skip[(int)pat[i]] = patLgth - i - 1; //set skip value of chars in pattern
std::cout<<src<<"\n"; //just to see what's going on here!
size_t srcI = patLgth - 1; //our first character to check
while(srcI < srcLgth)
{
size_t j = 0; //char match ct
while(j < patLgth)
{
if(src[srcI - j] == pat[patLgth - j - 1])
++j;
else
{
//since the number of characters to skip may be negative, I just increment in that case
size_t t = skip[(int)src[srcI - j]];
if(t > j)
srcI = srcI + t - j;
else
++srcI;
break;
}
}
if(j == patLgth)
return (unsigned char*)&src[srcI + 1 - j];
}
return nullptr;
}
The loop produced this output (i.e. these are the haystacks the algorithm received):
This haystack contains a needle! needless to say that only 2 matches need to be found!
eedle! needless to say that only 2 matches need to be found!
eedless to say that eed 2 meed to beed to be found!
As you can see the input is completely messed up after the second run. What am I missing? I thought the contents could not be modified, since I'm passing const pointers.
Is the way of setting the pointer in the loop wrong, or is my string search screwing up?
Btw: This is the complete code, except for includes and the main function around the looping code.
EDIT:
The missing nullptr of the first return was due to a copy/paste error; in the source it is actually there.
For clarification, this is my wrapper function:
inline const char* boyerMoore(const string &src, const string &pat)
{
return (const char*) boyerMoore((const unsigned char*) src.c_str(), src.size(),
(const unsigned char*) pat.c_str(), pat.size());
}
In your boyerMoore() function, the first return isn't returning a value (you have just return; rather than return nullptr;). GCC doesn't always warn about missing return values, and not returning anything is undefined behavior. That means that when you store the return value in res and call the function again, there's no telling what will print out. You can see a related discussion here.
Also, you have omitted your convenience function that calculates the length of the strings that you are passing in. I would recommend double checking that logic to make sure the sizes are correct - I'm assuming you are using strlen or similar.
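Separately from the return value, the looping code passes res (a const char*) to the wrapper, whose parameter is const string&. Each call therefore builds a temporary std::string from res, and the pointer the wrapper returns points into that temporary's buffer, which is destroyed at the end of the full expression res = boyerMoore(res, pat). The next iteration then reads freed memory, which would explain the garbled haystacks in the output. A minimal sketch that avoids this, assuming a Boyer-Moore-Horspool variant (horspool and countMatches are hypothetical names), keeps every pointer inside the original buffer and passes the remaining length explicitly. It also sizes the skip table with UCHAR_MAX + 1 entries, since a table of only UCHAR_MAX entries has no slot for byte value 255.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Boyer-Moore-Horspool search: returns a pointer to the first occurrence of
// pat[0..patLen) in src[0..srcLen), or nullptr if there is none.
const char* horspool(const char* src, std::size_t srcLen,
                     const char* pat, std::size_t patLen)
{
    if (!src || !pat || patLen == 0 || srcLen < patLen)
        return nullptr;

    // One slot per byte value 0..UCHAR_MAX, i.e. UCHAR_MAX + 1 entries.
    std::size_t skip[UCHAR_MAX + 1];
    for (std::size_t c = 0; c <= UCHAR_MAX; ++c)
        skip[c] = patLen;
    // The last pattern character is left out so no shift is ever 0.
    for (std::size_t i = 0; i + 1 < patLen; ++i)
        skip[static_cast<unsigned char>(pat[i])] = patLen - 1 - i;

    std::size_t pos = 0; // where the pattern's first char is aligned in src
    while (pos + patLen <= srcLen) {
        std::size_t j = patLen;
        while (j > 0 && src[pos + j - 1] == pat[j - 1]) // compare right to left
            --j;
        if (j == 0)
            return src + pos;
        // Shift by the entry for the text byte under the pattern's last char.
        pos += skip[static_cast<unsigned char>(src[pos + patLen - 1])];
    }
    return nullptr;
}

// Counts (possibly overlapping) matches while keeping every pointer inside
// hay's own buffer; no temporary std::string is built from a pointer.
std::size_t countMatches(const std::string& hay, const std::string& pat)
{
    const char* const end = hay.c_str() + hay.size();
    const char* res = hay.c_str();
    std::size_t count = 0;
    while ((res = horspool(res, static_cast<std::size_t>(end - res),
                           pat.c_str(), pat.size())) != nullptr) {
        ++count;
        ++res; // resume one character after the start of the last match
    }
    return count;
}
```

With the haystack from the question, countMatches(src, "needle") returns 2.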

String compression (interview preparation)

I need to compress a string. I can assume that each character in the string doesn't appear more than 255 times. I need to return the compressed string and its length.
For the last 2 years I have worked with C# and have forgotten C++. I will be glad to hear your comments about the code, the algorithm and C++ programming practices.
// StringCompressor.h
class StringCompressor
{
public:
StringCompressor();
~StringCompressor();
unsigned long Compress(string str, string* strCompressedPtr);
string DeCompress(string strCompressed);
private:
string m_StrCompressed;
static const char c_MaxLen;
};
// StringCompressor.cpp
#include "StringCompressor.h"
const char StringCompressor::c_MaxLen = 255;
StringCompressor::StringCompressor()
{
}
StringCompressor::~StringCompressor()
{
}
unsigned long StringCompressor::Compress(string str, string* strCompressedPtr)
{
if (str.empty())
{
return 0;
}
char currentChar = str[0];
char count = 1;
for (string::iterator it = str.begin() + 1; it != str.end(); ++it)
{
if (*it == currentChar)
{
count++;
if (count == c_MaxLen)
{
return -1;
}
}
else
{
m_StrCompressed+=currentChar;
m_StrCompressed+=count;
currentChar = *it;
count = 1;
}
}
m_StrCompressed += currentChar;
m_StrCompressed += count;
*strCompressedPtr = m_StrCompressed;
return m_StrCompressed.length();
}
string StringCompressor::DeCompress(string strCompressed)
{
string res;
if (strCompressed.length() % 2 != 0)
{
return res;
}
for (string::iterator it = strCompressed.begin(); it != strCompressed.end(); it+=2)
{
char dup = *(it + 1);
res += string(dup, *it);
}
return res;
}
There are many possible improvements:
Do not return -1 from an unsigned long function.
Consider using size_t or ssize_t to represent sizes.
Learn const (for example, take input strings by const reference).
m_StrCompressed is left in a bogus state if Compress is called repeatedly: it is never cleared, so each call appends to the previous output. Since that member cannot be reused, you may as well make the function static.
Compressed data generally should not be treated as a string, but as a byte buffer. Redesign your interface.
Comments! Nobody knows you are doing RLE (run-length encoding) here.
Bonus: a fallback mechanism in case your compression yields a larger result, e.g. a flag to denote an uncompressed buffer, or just return failure.
I assume efficiency is not major concern here.
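Several of these suggestions (free functions instead of a class, size_t, const references, a byte buffer instead of a string, no -1 from an unsigned function) can be combined into a short sketch. The names rleCompress and rleDecompress are hypothetical. Counts are stored as unsigned char because plain char may be signed, in which case the value 255 in c_MaxLen does not fit. Instead of failing at 255, this version splits long runs into several pairs; that is a design choice, not something the question requires.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Run-length encoding (RLE): the output is a byte buffer of (char, count)
// pairs. Counts are unsigned char, so each pair covers at most 255 repeats;
// longer runs are split into several pairs instead of failing.
std::vector<unsigned char> rleCompress(const std::string& input)
{
    std::vector<unsigned char> out;
    std::size_t i = 0;
    while (i < input.size()) {
        const char current = input[i];
        std::size_t run = 1;
        while (i + run < input.size() && input[i + run] == current && run < 255)
            ++run;
        out.push_back(static_cast<unsigned char>(current));
        out.push_back(static_cast<unsigned char>(run));
        i += run;
    }
    return out;
}

// Inverse of rleCompress. Returns false, leaving 'output' untouched, if the
// buffer is not made of complete (char, count) pairs.
bool rleDecompress(const std::vector<unsigned char>& input, std::string& output)
{
    if (input.size() % 2 != 0)
        return false;
    std::string result;
    for (std::size_t i = 0; i < input.size(); i += 2)
        result.append(input[i + 1], static_cast<char>(input[i])); // count, then char
    output.swap(result);
    return true;
}
```

For example, "aaabcc" compresses to the six bytes 'a' 3 'b' 1 'c' 2, while "abc" also becomes six bytes, illustrating that RLE can make text without repeats larger.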
A few things:
I'm all for using classes, and perhaps you could use one here in a way that makes more sense. But given the scope of what you are trying to do, this would be better off as two functions: one for compression, one for decompression. For instance, why are you storing the string in the class as a member and never reusing it? How does grouping this as a class actually enhance the functionality or make it more reusable?
You should pass your compressed string return as a reference instead of a pointer.
It looks like you are trying to count the number of times characters are repeated in a row and save that. For most common strings this will make the size of your compressed string larger than uncompressed as it takes two bytes to store each non-repeated character.
There are many possible characters, but only two kinds of bits. If you apply this method to runs of repeated bits instead, you'd be more successful (and that's actually one simple method of lossless compression).
If you are allowed, just use a library like zlib to do compression of arbitrary data types.

Null Pointer issue using string::iterator in Visual Studio 2005

I am working with some legacy code. The legacy code works in production mode in the following scenario. I'm trying to build a command line version of the legacy code for testing purposes. I suspect there is an environmental setting issue at work here, but I'm relatively new to C++ and Visual Studio (long time eclipse/java guy).
This code is attempting to read in a string from a stream. It reads in a short, which in my debug scenario has a value of 11. Then, it is supposed to read in 11 chars. But this code craps out on the first char. Specifically, in the read method below, ptr is null, and so the fread call is throwing an exception. Why is ptr NULL?
Point of clarification, ptr becomes null between the operator>>(string) and operator>>(char) calls.
Mystream& Mystream::operator>>( string& str )
{
string::iterator it;
short length;
*this >> length;
if( length >= 0 )
{
str.resize( length );
for ( it = str.begin(); it != str.end(); ++it )
{
*this >> *it;
}
}
return *this;
}
The method for reading the short is here and looking at the file buffer etc. this looks like it is working properly.
Mystream& Mystream::operator>>(short& n )
{
read( ( char* )&n, sizeof( n ) );
SwapBytes( *this, ( char* )&n, sizeof( n ) );
return *this;
}
Now, the method for reading in a char is here:
Mystream& Mystream::operator>>(char& n )
{
read( ( char* )&n, sizeof( n ) );
return *this;
}
and the read method is:
Mystream& Mystream::read( char* ptr, int n )
{
fread( (void*)ptr, (size_t)1, (size_t)n, fp );
return *this;
}
One thing I don't understand: in the string input method, *it is a char, right? So why does the operator>>(char &n) method get dispatched on that line? In the debugger, it looks like *it is 0 (although a colleague tells me he doesn't trust the 2005 debugger on such things), and thus it looks like &n is treated as a null pointer and so the read method is throwing an exception.
Any insights you can provide would be most helpful!
Thanks
John
ps. For the curious, Swap Bytes looks like this:
inline void SwapBytes( Mystream& bfs, char * ptr, int nbyte, int nelem = 1)
{
// do we need to swap bytes?
if( bfs.byteOrder() != SYSBYTEORDER )
DoSwapBytesReally( bfs, ptr, nbyte, nelem );
}
And DoSwapBytesReally looks like:
void DoSwapBytesReally( Mystream& bfs, char * ptr, int nbyte, int nelem )
{
// if the byte order of the file
// does not match the system byte order
// then the bytes should be swapped
int i, n;
char temp;
#ifndef _DOSPOINTERS_
char *ptr1, *ptr2;
#else // _DOSPOINTERS_
char huge *ptr1, huge *ptr2;
#endif // _DOSPOINTERS_
int nbyte2;
nbyte2 = nbyte/2;
for ( n = 0; n < nelem; n++ )
{
ptr1 = ptr;
ptr2 = ptr1 + nbyte - 1;
for ( i = 0; i < nbyte2; i++ )
{
temp = *ptr1;
*ptr1++ = *ptr2;
*ptr2-- = temp;
}
ptr += nbyte;
}
}
I'd throw out this mess and start over. Extrapolating from the code, if what you had actually worked, it would be roughly equivalent to something like this:
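Mystream& Mystream::operator>>( string& s )
{
short size;
fread( (void*)&size, sizeof( size ), 1, fp );
SwapBytes( *this, (char*)&size, sizeof( size ) ); // ntohs() would be wrong here: swap only if the file's byte order differs
if( size >= 0 )
{
s.resize( size );
if( size > 0 )
fread( (void*)&s[0], 1, size, fp ); // never index an empty string
}
return *this;
}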
In this case, delegating most of the work to other functions doesn't seem to have gained much -- this does the work more directly, but still isn't significantly longer or more complex than the original (if anything, I'd say rather the opposite).
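For comparison, here is a sketch that reads the same layout (a 16-bit length followed by that many bytes) from a std::istream, assembling the length from its two bytes in an explicitly stated byte order so that no host-dependent swapping is needed. The function name readLengthPrefixed and the bigEndian flag are hypothetical; in the question the file's byte order comes from bfs.byteOrder(), whose details are not shown. Unlike the original, which silently skips a negative length, this version reports it as a failure.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for testing with in-memory bytes
#include <string>

// Reads a 16-bit length (byte order given by 'bigEndian') followed by that
// many bytes into 'out'. Returns false on a short read or a negative length;
// 'out' is only modified on success.
bool readLengthPrefixed(std::istream& in, bool bigEndian, std::string& out)
{
    unsigned char b[2];
    if (!in.read(reinterpret_cast<char*>(b), 2))
        return false;                   // not even a complete length field
    const unsigned int raw = bigEndian ? ((b[0] << 8) | b[1])
                                       : ((b[1] << 8) | b[0]);
    if (raw & 0x8000u)                  // sign bit set: a negative short
        return false;
    std::string buf(raw, '\0');
    if (raw > 0 && !in.read(&buf[0], static_cast<std::streamsize>(raw)))
        return false;                   // fewer bytes than announced
    out.swap(buf);
    return true;
}
```

For example, a stream holding the bytes 00 03 'a' 'b' 'c' read with bigEndian = true yields "abc".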
I found a gray beard in the company who could explain to me what's going on. (I had already spoken to 2 old timers, so I figured I had covered the old-timer avenue of attack.) The code above is not ANSI-compliant STL code. In Visual Studio 2005, Microsoft added checked iterators and debug iterator support to its STL implementation, and there were issues. In particular, older code that used to work would now fail in 2005 (I think 64-bit mode may play a role in this as well). Because of this, the code will not work in debug mode (but it will work in release mode). A partial explanation is in this article:
http://msdn.microsoft.com/en-us/library/aa985982%28v=vs.80%29.aspx
The particular issue I saw has to do with the line it = str.begin() in the first method in the question: str is an empty string, so using str.begin() there is technically not defined. Visual Studio treats this situation differently in debug and release modes (you can't do this in debug, but you can in release).
Bottom line, the gray beard suggested rewrite was exactly Jerry's. Ironically, the gray beard had fixed this problem in several files, but neglected to check it into the mainline. Uh oh. That scares the &#$!! out of me.

C++ exam on string class implementation

I just took an exam where I was asked the following:
Write the function body of each of the methods GetStrLen, InsertChar and StrReverse for the given code below. You must take into consideration the following:
How strings are constructed in C++
The string must not overflow
Insertion of character increases its length by 1
An empty string is indicated by StrLen = 0
class Strings {
private:
char str[80];
int StrLen;
public:
// Constructor
Strings() {
StrLen=0;
};
// A function for returning the length of the string 'str'
int GetStrLen(void) {
};
// A function to insert a character 'ch' at the end of the string 'str'
void InsertChar(char ch) {
};
// A function to reverse the content of the string 'str'
void StrReverse(void) {
};
};
The answer I gave was something like this (see below). One of my problems is that I used many extra variables, which makes me believe I am not doing it the best possible way; the other is that it is not working...
class Strings {
private:
char str[80];
int StrLen;
int index; // *** Had to add this ***
public:
Strings(){
StrLen=0;
}
int GetStrLen(void){
for (int i=0 ; str[i]!='\0' ; i++)
index++;
return index; // *** Here am getting a weird value, something like 1829584505306 ***
}
void InsertChar(char ch){
str[index] = ch; // *** Not sure if this is correct cuz I was not given int index ***
}
void StrRevrse(void){
GetStrLen();
char revStr[index+1];
for (int i=0 ; str[i]!='\0' ; i++){
for (int r=index ; r>0 ; r--)
revStr[r] = str[i];
}
}
};
I would appreciate it if anyone could roughly explain the best way to have answered the question, and why. Also, how come my professor closes each class function with " }; "? I thought that was only used for ending classes and constructors.
Thanks a lot for your help.
First, the trivial }; question is just a matter of style. I do that too when I put function bodies inside class declarations. In that case the ; is optional and doesn't change the meaning of the program. It can be left out at the end of the functions (but not at the end of the class).
Here are some major problems with what you wrote:
You never initialize the contents of str. It's not guaranteed to start out with \0 bytes.
You never initialize index, you only set it within GetStrLen. It could have value -19281281 when the program starts. What if someone calls InsertChar before they call GetStrLen?
You never update index in InsertChar. What if someone calls InsertChar twice in a row?
In StrReverse, you create a reversed string called revStr, but then you never do anything with it. The string in str stays the same afterwards.
The confusing part to me is why you created a new variable called index, presumably to track the index of one-past-the-last character in the string, when there was already a variable called StrLen for this purpose, which you totally ignored. The index of one-past-the-last character is the length of the string, so you should just have kept the length of the string up to date and used that, e.g.
int GetStrLen(void){
return StrLen;
}
void InsertChar(char ch){
if (StrLen < 80) {
str[StrLen] = ch;
StrLen = StrLen + 1; // Update the length of the string
} else {
// Do not allow the string to overflow. Normally, you would throw an exception here,
// but if you don't know what that is, your instructor was probably just expecting
// you to return without trying to insert the character.
throw std::overflow_error("string is full"); // needs #include <stdexcept>; there is no default constructor
}
}
Your algorithm for string reversal, however, is just completely wrong. Think through what that code says (assuming index is initialized and updated correctly elsewhere). It says "for every character in str, overwrite the entirety of revStr, backwards, with this character". If str started out as "Hello World", revStr would end up as "ddddddddddd", since d is the last character in str.
What you should do is something like this:
void StrReverse() {
char revStr[80];
for (int i = 0; i < StrLen; ++i) {
revStr[(StrLen - 1) - i] = str[i];
}
}
Take note of how that works. Say that StrLen = 10. Then we're copying position 0 of str into position 9 of revStr, then position 1 of str into position 8 of revStr, and so on, until we copy position StrLen - 1 of str into position 0 of revStr.
But then you've got a reversed string in revStr and you're still missing the part where you put that back into str, so the complete method would look like
void StrReverse() {
char revStr[80];
for (int i = 0; i < StrLen; ++i) {
revStr[(StrLen - 1) - i] = str[i];
}
for (int i = 0; i < StrLen; ++i) {
str[i] = revStr[i];
}
}
And there are cleverer ways to do this where you don't have to have a temporary string revStr, but the above is perfectly functional and would be a correct answer to the problem.
By the way, you really don't need to worry about NULL bytes (\0s) at all in this code. The fact that you are (or at least you should be) tracking the length of the string with the StrLen variable makes the end sentinel unnecessary since using StrLen you already know the point beyond which the contents of str should be ignored.
int GetStrLen(void){
for (int i=0 ; str[i]!='\0' ; i++)
index++;
return index; // *** Here am getting a weird value, something like 1829584505306 ***
}
You are getting a weird value because you never initialized index, you just started incrementing it.
Your GetStrLen() function doesn't work because the str array is uninitialized. It probably doesn't contain any zero elements.
You don't need the index member. Just use StrLen to keep track of the current string length.
There are lots of interesting lessons to learn from this exam question. Firstly, the examiner does not appear to be a fluent C++ programmer themselves! You might want to look at the style of the code, including whether the variable and method names are meaningful, as well as some of the other comments you've been given about the usage of (void), const, etc. Do the method names really need "Str" in them? We are operating with a "Strings" class, after all!
For "How strings are constructed in C++": (as in C) they are null-terminated and, unlike Pascal strings (and this class), don't store their length with them. [Gustavo, strlen() will not work here, since the string is not null-terminated.] In the "real world" we'd use the std::string class.
"The string must not overflow", but how does the user of the class know if they try to overflow the string? Tyler's suggestion of throwing a std::overflow_error (perhaps with a message) would work, but if you are writing your own string class (purely as an exercise; you're very unlikely to need to do so in real life), then you should probably provide your own exception class.
"Insertion of character increases its length by 1", this implies that GetStrLen() doesn't calculate the length of the string, but purely returns the value of StrLen initialised at construction and updated with insertion.
You might also want to think about how you're going to test your class. For illustrative purposes, I added a Print() method so that you can look at the contents of the class, but you should probably take a look at something like Cpp Unit Lite.
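For the custom exception class suggested above, a minimal sketch might look like the following. StringOverflow is a hypothetical name; deriving it from std::overflow_error lets callers catch either type. Only the members relevant to overflow are shown.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type for the exam's Strings class.
class StringOverflow : public std::overflow_error {
public:
    explicit StringOverflow(const std::string& what)
        : std::overflow_error(what) {}
};

class Strings {
private:
    char str[80];
    int StrLen;
public:
    Strings() : StrLen(0) {}
    int GetStrLen() const { return StrLen; }
    // Throws instead of writing past the fixed 80-char buffer; the string is
    // left unchanged when the exception is thrown.
    void InsertChar(char ch) {
        if (StrLen >= static_cast<int>(sizeof str))
            throw StringOverflow("Strings::InsertChar: buffer is full");
        str[StrLen++] = ch;
    }
};
```

A caller that only knows about the standard library can still write catch (const std::overflow_error&).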
For what it's worth, I'm including my own implementation. Unlike the other implementations so far, I have chosen to use raw-pointers in the reverse function and its swap helper. I have presumed that using things like std::swap and std::reverse are outside the scope of this examination, but you will want to familiarise yourself with the Standard Library so that you can get on and program without re-inventing wheels.
#include <iostream>
void swap_chars(char* left, char* right) {
char temp = *left;
*left = *right;
*right = temp;
}
class Strings {
private:
char m_buffer[80];
int m_length;
public:
// Constructor
Strings()
:m_length(0)
{
}
// A function for returning the length of the string 'm_buffer'
int GetLength() const {
return m_length;
}
// A function to insert a character 'ch' at the end of the string 'm_buffer'
void InsertChar(char ch) {
if (m_length < sizeof m_buffer) {
m_buffer[m_length++] = ch;
}
}
// A function to reverse the content of the string 'm_buffer'
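void Reverse() {
if (m_length < 2) {
return; // nothing to swap; &m_buffer[m_length - 1] would also be invalid for an empty string
}
char* left = &m_buffer[0];
char* right = &m_buffer[m_length - 1];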
for (; left < right; ++left, --right) {
swap_chars(left, right);
}
}
void Print() const {
for (int index = 0; index < m_length; ++index) {
std::cout << m_buffer[index];
}
std::cout << std::endl;
}
};
int main(int, char**) {
Strings test_string;
char test[] = "This is a test string!This is a test string!This is a test string!This is a test string!\000";
for (char* c = test; *c; ++c) {
test_string.InsertChar(*c);
}
test_string.Print();
test_string.Reverse();
test_string.Print();
// The output of this program should look like this...
// This is a test string!This is a test string!This is a test string!This is a test
// tset a si sihT!gnirts tset a si sihT!gnirts tset a si sihT!gnirts tset a si sihT
return 0;
}
Good luck with the rest of your studies!
void InsertChar(char ch){
str[index] = ch; // *** Not sure if this is correct cuz I was not given int index ***
}
This should be something more like
if (StrLen + 1 < 80) { //leave room for the null
str[StrLen] = ch; //overwrite the null with ch
str[StrLen + 1] = '\0'; //re-add the null
StrLen++;
}
Your teacher gave you very good hints in the question; read it again and try answering it yourself. Here's my untested solution:
class Strings {
private:
char str[80];
int StrLen;
public:
// Constructor
Strings() {
StrLen=0;
str[0]=0;
};
// A function for returning the length of the string 'str'
int GetStrLen(void) {
return StrLen;
};
// A function to insert a character 'ch' at the end of the string 'str'
void InsertChar(char ch) {
if(StrLen < 80)
str[StrLen++]=ch;
};
// A function to reverse the content of the string 'str'
void StrReverse(void) {
for(int i=0; i<StrLen / 2; ++i) {
char aux = str[i];
str[i] = str[StrLen - i - 1];
str[StrLen - i - 1] = aux;
}
};
};
When you initialize the char array, you should set its first element to 0, and do the same for index. You get a weird length in GetStrLen because, without that, it is up to the gods where you find the 0 you are looking for.
[Update] In C/C++ if you do not explicitly initialize your variables, you usually get them filled with random garbage (the content of the raw memory allocated to them). There are some exceptions to this rule, but the best practice is to always initialize your variables explicitly. [/Update]
In InsertChar, you should (after checking for overflow) use StrLen to index the array (as the comment specifies: "insert a character 'ch' at the end of the string 'str'"), then set the new terminating 0 character and increment StrLen.
You don't need index as a member data. You can have it a local variable if you so please in GetStrLen(): just declare it there rather than in the class body. The reason you get a weird value when you return index is because you never initialized it. To fix that, initialize index to zero in GetStrLen().
But there's a better way to do things: when you insert a character via InsertChar() increment the value of StrLen, so that GetStrLen() need only return that value. This will make GetStrLen() much faster: it will run in constant time (the same performance regardless of the length of string).
In InsertChar() you can use StrLen as your index rather than index, which we already determined is redundant. But remember that you must make sure the string terminates with a '\0' value. Also remember to maintain StrLen by incrementing it, to make GetStrLen()'s life easier. In addition, you must take the extra step in InsertChar() to avoid a buffer overflow. This happens when the user inserts a character into the string when the length of the string is already 79 characters. (Yes, 79: you must spend one character on the terminating null.)
I don't see an instruction as to how to behave when that happens, so it must be up to your good judgment call. If the user tries to add the 80th character you might ignore the request and return, or you might set an error flag -- it's up to you.
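A sketch of the null-terminated variant described above, assuming InsertChar reports overflow by returning false (one of the options mentioned; this changes the exam's void signature) and adding a hypothetical c_str() accessor for inspection. The reversal swaps from both ends toward the middle and leaves the terminating '\0' in place.

```cpp
// Null-terminated variant: at most 79 characters plus the '\0'.
class Strings {
private:
    char str[80];
    int StrLen;
public:
    Strings() : StrLen(0) { str[0] = '\0'; }
    int GetStrLen() const { return StrLen; }   // O(1): no scanning
    const char* c_str() const { return str; }  // hypothetical accessor
    // Returns false (and changes nothing) when the string is already full.
    bool InsertChar(char ch) {
        if (StrLen >= 79)                      // keep one slot for the '\0'
            return false;
        str[StrLen++] = ch;
        str[StrLen] = '\0';                    // re-terminate after every insert
        return true;
    }
    // Swap first with last (not the '\0'), second with second-to-last, ...
    void StrReverse() {
        for (int i = 0, j = StrLen - 1; i < j; ++i, --j) {
            char tmp = str[i];
            str[i] = str[j];
            str[j] = tmp;
        }
    }
};
```

After inserting 'a', 'b' and 'c' and calling StrReverse(), c_str() points at "cba".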
In your StrReverse() function you have a few mistakes. First, you call GetStrLen() but ignore its return value. Then why call it? Second, you're creating a temporary string and work on that, rather than on the string member of the class. So your function doesn't change the string member, when it should in fact reverse it. And last, you could reverse the string faster by iterating through half of it only.
Work on the member data string. To reverse a string you can swap the first element (character) of the string with its last (not the terminating null, the character just before that!), the second element with the second-to-last and so on. You're done when you arrive at the middle of the string. Don't forget that the string must terminate with a '\0' character.
While you were solving the exam, it would also have been a good opportunity to teach your instructor a thing or two about C++: we don't write f(void), because that belongs to C; in C++ we write f(). We also strive in C++ to use class initializer lists whenever we can. Also remind your instructor how important const-correctness is: when a function shouldn't change the object, it should be marked as such. int GetStrLen(void) should be int GetStrLen() const.
You don't need to figure out the length. You already know it: it is StrLen. Also, there was nothing in the original question to indicate that the buffer should contain a null-terminated string.
int GetStrLen(void){
return StrLen;
}
I'm just using an assertion here (it needs #include <cassert>), but another option is to throw an exception.
void InsertChar(char ch){
assert(StrLen < 80);
str[StrLen++] = ch;
}
Reversing the string is just a matter of swapping the elements in the str buffer.
void StrReverse(void){
int n = StrLen >> 1;
for (int i = 0; i < n; i++) {
char c = str[i];
str[i] = str[StrLen - 1 - i]; // the last character is at StrLen - 1
str[StrLen - 1 - i] = c;
}
}
I would use StrLen to track the length of the string. Since the length also indicates the end of the string, we can use that for inserting:
int GetStrLen(void) {
return StrLen;
}
void InsertChar(char ch)
{
if (StrLen < (int)sizeof(str))
{
str[StrLen] = ch;
++StrLen;
}
}
void StrReverse(void) {
for (int n = 0; n < StrLen / 2; ++n)
{
char tmp = str[n];
str[n] = str[StrLen - n - 1];
str[StrLen - n - 1] = tmp;
}
}
First of all, why don't you use strlen() from <string.h> for the string length?
strlen() returns the length of any null-terminated char array (as a size_t).
Your function returns a weird value because you never initialize index, and the array is not guaranteed to contain a zero; initialize first, then execute your method.

C code - need to clarify the effectiveness

Hi, I have written some code based on a requirement.
(field1_6)(field2_30)(field3_16)(field4_16)(field5_1)(field6_6)(field7_2)(field8_1).....
This is one bucket (8 fields) of data. We receive 20 buckets at a time, which means 160 fields in total.
I need to take the values of field3, field7 and field8 based on a predefined condition.
If the input argument is N, then take the three fields from the 1st bucket; if it is Y, I need
to take the three fields from a bucket other than the 1st one.
If the argument is Y, then I need to scan all 20 buckets one after another and check whether
the first field of the bucket is not equal to 0; if that is true, fetch the three fields of that bucket and exit.
I have written the code and it is working fine, but I'm not confident that it is effective.
I am afraid it may crash sometimes. Please make suggestions; the code is below.
int CMI9_auxc_parse_balance_info(char *i_balance_info,char *i_use_balance_ind,char *o_balance,char *o_balance_change,char *o_balance_sign
)
{
char *pch = NULL;
char *balance_id[MAX_BUCKETS] = {NULL};
char balance_info[BALANCE_INFO_FIELD_MAX_LENTH] = {0};
char *str[160] = {NULL};
int i=0,j=0,b_id=0,b_ind=0,bc_ind=0,bs_ind=0,rc;
int total_bukets ;
memset(balance_info,' ',BALANCE_INFO_FIELD_MAX_LENTH);
memcpy(balance_info,i_balance_info,BALANCE_INFO_FIELD_MAX_LENTH);
//balance_info[BALANCE_INFO_FIELD_MAX_LENTH]='\0';
pch = strtok (balance_info,"*");
while (pch != NULL && i < 160)
{
str[i]=(char*)malloc(strlen(pch) + 1);
strcpy(str[i],pch);
pch = strtok (NULL, "*");
i++;
}
total_bukets = i/8 ;
for (j=0;str[b_id]!=NULL,j<total_bukets;j++)
{
balance_id[j]=str[b_id];
b_id=b_id+8;
}
if (!memcmp(i_use_balance_ind,"Y",1))
{
if (atoi(balance_id[0])==1)
{
memcpy(o_balance,str[2],16);
memcpy(o_balance_change,str[3],16);
memcpy(o_balance_sign,str[7],1);
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
else
{
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
}
else if (!memcmp(i_use_balance_ind,"N",1))
{
for (j=1;balance_id[j]!=NULL,j<MAX_BUCKETS;j++)
{
b_ind=(j*8)+2;
bc_ind=(j*8)+3;
bs_ind=(j*8)+7;
if (atoi(balance_id[j])!=1 && atoi( str[bc_ind] )!=0)
{
memcpy(o_balance,str[b_ind],16);
memcpy(o_balance_change,str[bc_ind],16);
memcpy(o_balance_sign,str[bs_ind],1);
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
}
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
My feeling is that this code is very brittle. It may well work when given good input (I don't propose to desk check the thing for you) but if given some incorrect inputs it will either crash and burn or give misleading results.
Have you tested for unexpected inputs? For example:
Suppose i_balance_info is null?
Suppose i_balance_info is ""?
Suppose there are fewer than 8 items in the input string, what will this line of code do?
memcpy(o_balance_sign,str[7],1);
Suppose that the item in str[3] is less than 16 chars long; what will this line of code do?
memcpy(o_balance_change,str[3],16);
My approach to writing such code would be to protect against all such eventualities. At the very least I would add ASSERT() statements, I would usually write explicit input validation and return errors when it's bad. The problem here is that the interface does not seem to allow for any possibility that there might be bad input.
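As an example of explicit validation, here is a hypothetical helper (copyFixedField is not part of the original code) that copies a token into a fixed-width output field without reading past the token's end. It assumes short values should be space-padded, which the question does not specify, and reports a NULL pointer as an error instead of crashing. It is written in C++ for consistency with the rest of this page; the same logic works in C with NULL for nullptr and <stdbool.h>.

```cpp
#include <cstddef>
#include <cstring>

// Copies the token src into a fixed-width field dst of 'width' bytes.
// Never reads past the end of src (unlike a blind memcpy(dst, src, 16)),
// pads short tokens with spaces (an assumed output format), and returns
// false instead of crashing on NULL pointers.
bool copyFixedField(char* dst, std::size_t width, const char* src)
{
    if (dst == nullptr || src == nullptr)
        return false;
    const std::size_t len = std::strlen(src);   // src must be '\0'-terminated
    const std::size_t n = len < width ? len : width;
    std::memcpy(dst, src, n);
    std::memset(dst + n, ' ', width - n);
    return true;
}
```

The caller can then turn a false return into an error code instead of copying garbage into o_balance.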
I had a hard time reading your code but FWIW I've added some comments, HTH:
// do shorter functions, long functions are harder to follow and make errors harder to spot
// document all your variables, at the very least your function parameters
// also what the function is supposed to do and what it expects as input
int CMI9_auxc_parse_balance_info
(
char *i_balance_info,
char *i_use_balance_ind,
char *o_balance,
char *o_balance_change,
char *o_balance_sign
)
{
char *balance_id[MAX_BUCKETS] = {NULL};
char balance_info[BALANCE_INFO_FIELD_MAX_LENTH] = {0};
char *str[160] = {NULL};
int i=0,j=0,b_id=0,b_ind=0,bc_ind=0,bs_ind=0,rc;
int total_bukets=0; // good practice to initialize all variables
//
// check for null pointers in your arguments, and do sanity checks for any
// calculations
// also move variable declarations to just before they are needed
//
memset(balance_info,' ',BALANCE_INFO_FIELD_MAX_LENTH);
memcpy(balance_info,i_balance_info,BALANCE_INFO_FIELD_MAX_LENTH);
//balance_info[BALANCE_INFO_FIELD_MAX_LENTH]='\0'; // should be BALANCE_INFO_FIELD_MAX_LENTH-1
char *pch = strtok (balance_info,"*"); // this will potentially crash since no ending \0
while (pch != NULL && i < 160)
{
str[i]=(char*)malloc(strlen(pch) + 1);
strcpy(str[i],pch);
pch = strtok (NULL, "*");
i++;
}
total_bukets = i/8 ;
// you have declared char*str[160] check if enough b_id < 160
// asserts are helpful if nothing else assert( b_id < 160 );
for (j=0;str[b_id]!=NULL,j<total_bukets;j++)
{
balance_id[j]=str[b_id];
b_id=b_id+8;
}
// don't use memcmp, if ('Y'==i_use_balance_ind[0]) is better
if (!memcmp(i_use_balance_ind,"Y",1))
{
// atoi needs balance_id str to end with \0 has it?
if (atoi(balance_id[0])==1)
{
// length assumptions, and memcpy when it's only one byte
memcpy(o_balance,str[2],16);
memcpy(o_balance_change,str[3],16);
memcpy(o_balance_sign,str[7],1);
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
else
{
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
}
// if ('N'==i_use_balance_ind[0])
else if (!memcmp(i_use_balance_ind,"N",1))
{
// here I get a headache, this looks just at first glance risky.
for (j=1;balance_id[j]!=NULL,j<MAX_BUCKETS;j++)
{
b_ind=(j*8)+2;
bc_ind=(j*8)+3;
bs_ind=(j*8)+7;
if (atoi(balance_id[j])!=1 && atoi( str[bc_ind] )!=0)
{
// length assumptions, and memcpy when it's only one byte
// here u assume strlen(str[b_ind])>15 including \0
memcpy(o_balance,str[b_ind],16);
// here u assume strlen(str[bc_ind])>15 including \0
memcpy(o_balance_change,str[bc_ind],16);
// here, besides length assumption you could use a simple assignment
// since its one byte
memcpy(o_balance_sign,str[bs_ind],1);
// a common practice is to set pointers that are freed to NULL.
// maybe not necessary here since u return
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
}
// suggestion: write one function that frees your pointers, to avoid duplication
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
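One concrete source of that headache: both for loops combine their conditions with the comma operator, as in str[b_id]!=NULL,j<total_bukets. The comma operator evaluates its left operand, discards the result, and yields the right operand, so the NULL check never takes part in the loop condition. A sketch of the bucket-id loop with both checks joined by && (bounds first, so the array is never indexed out of range); collectBucketIds and its parameters are hypothetical names.

```cpp
#include <cstddef>

// Records the first field of each bucket in ids[] and returns how many were
// recorded. The conditions are joined with &&; with a comma, only the last
// condition would actually be tested.
std::size_t collectBucketIds(const char* const* fields, std::size_t fieldCount,
                             std::size_t fieldsPerBucket,
                             const char** ids, std::size_t maxIds)
{
    std::size_t j = 0;
    // Check array bounds before dereferencing, then the NULL sentinel.
    while (j < maxIds
           && j * fieldsPerBucket < fieldCount
           && fields[j * fieldsPerBucket] != nullptr) {
        ids[j] = fields[j * fieldsPerBucket];
        ++j;
    }
    return j;
}
```

With 8 fields per bucket, the loop stops at the first bucket whose first field is NULL, or at the end of the array, whichever comes first.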
A helpful technique when you want to access offsets in an array is to create a struct that maps the memory layout. Then you cast your pointer to a pointer to the struct and use the struct members to extract information instead of your various memcpy calls.
I would also suggest you reconsider your function's parameters in general: if you place all of them in structs, you have better control and the function becomes more readable, e.g.
int foo( input* inbalance, output* outbalance )
(or whatever it is you are trying to do)
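A sketch of the struct-mapping idea, assuming the raw data is a fixed-width layout using the field widths from the question ((field1_6)(field2_30)...(field8_1)), not the '*'-delimited text the current code tokenizes. Instead of casting the pointer, it copies the bytes into a Bucket with memcpy, which avoids alignment and strict-aliasing problems; Bucket and readBucket are hypothetical names. The static_assert checks that the compiler added no padding.

```cpp
#include <cstddef>
#include <cstring>

// Fixed-width bucket layout taken from the field widths in the question.
struct Bucket {
    char field1[6];
    char field2[30];
    char field3[16];
    char field4[16];
    char field5[1];
    char field6[6];
    char field7[2];
    char field8[1];
};
static_assert(sizeof(Bucket) == 78, "unexpected padding in Bucket");

// Copies bucket number 'index' out of a raw buffer of 'size' bytes.
// Returns false if a pointer is NULL or the buffer is too short.
bool readBucket(const char* raw, std::size_t size, std::size_t index, Bucket* out)
{
    if (raw == nullptr || out == nullptr)
        return false;
    const std::size_t offset = index * sizeof(Bucket);
    if (offset > size || size - offset < sizeof(Bucket))
        return false;
    std::memcpy(out, raw + offset, sizeof(Bucket));
    return true;
}
```

The fields of interest are then simply b.field3, b.field7 and b.field8, with their widths documented by the struct itself.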