reading buffer C++ - c++

I'm trying to read a buffer in C++ one character at a time until '\n', and to fill a char array with these characters using a do-while loop. I know I could use cin.getline(), but I want to try it on my own.
int main()
{
char buffer [1024];
int index = 0;
char temp;
do
{
cin.get( temp );
buffer [ index ] = temp;
index ++;
}
while ( temp != '\n' );
cout << buffer << endl;
return 0;
}
It gives me an incorrect result: the proper text followed by a couple of lines of square brackets mixed with other weird symbols.

First, after the whole text you have to append '\0' to mark the end of the string.
After the loop, buffer[ index ] = 0; terminates the string right after the '\n' you stored; use buffer[ index - 1 ] = 0; if you want to overwrite the '\n' as well.
Of course, there are other things you should check, but they are not your main problem:
the length of your input, because your buffer is limited: the maximum length is 1023 characters plus the null byte
the end of standard input, cin.eof()

You're not null-terminating your buffer.
Try to change the first line to
char buffer[1024] = "";
This will set all characters in buffer to 0. Alternatively, write a 0 only after the last character read, by doing
buffer[index] = 0;
after the loop.
Also (as others correctly pointed out), if the text does not fit in the 1024-byte buffer, you'll have a buffer overrun, one of the most often exploited causes of security issues in software.

Two things:
If the length of the line you are reading exceeds 1024, you write past the buffer, which is bad.
If the length is within the limit, you are not terminating the string with a null char.
You can try doing it the following way. If we find a line exceeding the buffer size, we truncate it, and we add the null char at the end, outside the loop.
#include <iostream>
using namespace std;
#define MAX 1024
int main()
{
    char buffer [MAX];
    int index = 0;
    char temp = '\0';
    do
    {
        // buffer full: keep the last slot for the null char.
        if(index == MAX-1)
            break;
        // stop on end of input or on a read error.
        if(!cin.get( temp ))
            break;
        buffer [ index ] = temp;
        index ++;
    }
    while ( temp != '\n' );
    // add null char at the end.
    buffer[index] = '\0';
    cout << buffer << endl;
    return 0;
}

Several issues I noted:
(1) What character encoding is the input in? You could be reading 8-, 16-, or 32-bit characters. Are you sure you're reading ASCII?
(2) You are searching for '\n', but the end-of-line sequence could be "\r\n", '\r', or '\n' depending on your platform. Perhaps the '\r' character by itself is your square bracket?

You stop filling the buffer when you get to a newline, so the rest is uninitialised. You can zero-initialise your buffer by defining it with: char buffer[1024] = {0}; This will fix your problem.

You are not putting a '\0' at the end of the string. Additionally, you should really check for buffer overflow conditions. Stop reading when index gets to 1023, so that there is still room for the '\0'.
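The points raised in the answers above (terminate the string, stay inside the buffer, stop at end of input) can be combined in one small function. This is only a sketch under assumptions: the names read_line and first_line_of are hypothetical, and the choice to consume the '\n' without storing it is one option among several.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line from `in` into `buf` (capacity `size`),
// one character at a time. It stops at '\n', at end of input, or when only
// the slot for the terminator is left. The '\n' is consumed but not stored.
// Returns the number of characters stored, not counting the terminator.
std::size_t read_line(std::istream& in, char* buf, std::size_t size)
{
    assert(buf != nullptr && size > 0);
    std::size_t index = 0;
    char temp;
    // size - 1 leaves one slot free for the terminating '\0'.
    while (index < size - 1 && in.get(temp) && temp != '\n')
    {
        buf[index++] = temp;
    }
    buf[index] = '\0';
    return index;
}

// Usage sketch: returns the first line of `text`, read through a
// 1024-byte buffer as in the question.
std::string first_line_of(const std::string& text)
{
    std::istringstream in(text);
    char buffer[1024];
    read_line(in, buffer, sizeof buffer);
    return std::string(buffer);
}
```

When the buffer fills up before a '\n' is seen, the rest of the line stays in the stream and is returned by the next call.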

Related

Reading text file cause invalid character at buffer end

Reading a simple text file in C++ displays invalid characters at the end of the buffer:
string filecontent="";
ifstream reader(fileName);
reader.seekg (0, reader.end);
int length = reader.tellg();
reader.seekg (0, reader.beg);
char *buffer=new char[length];
reader.read(buffer,length);
filecontent=buffer;
reader.close();
cout<<"File Contents"<<std::endl;
cout<<filecontent;
delete buffer;
return false;
But when I specify the buffer length incremented by one, i.e.
char *buffer=new char[length+1];
reader.read(buffer,length+1);
it works fine without invalid characters. I want to know the reason behind this.
You read a string without terminating it with a trailing zero (char(0) or '\0'). Increase the buffer length by one and store a zero at buffer[reader.tellg()], that is, right after the last byte read. Just increasing the buffer size is not good enough: it only appears to work because you happen to get a trailing zero by accident.
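A minimal sketch of that advice, assuming a hypothetical helper named read_bytes: the buffer gets one extra slot, the terminator is written at the number of bytes actually read, and the std::string is built with an explicit length so it does not depend on where a zero happens to sit.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `length` bytes from `reader` and returns
// them as a std::string sized by the byte count actually read.
std::string read_bytes(std::istream& reader, std::size_t length)
{
    std::vector<char> buffer(length + 1);        // +1 leaves room for '\0'
    reader.read(buffer.data(), static_cast<std::streamsize>(length));
    std::size_t got = static_cast<std::size_t>(reader.gcount());
    assert(got <= length);
    buffer[got] = '\0';                          // terminate explicitly
    return std::string(buffer.data(), got);      // length-aware construction
}

// Usage sketch on an in-memory stream; a std::ifstream is used the same way.
std::string read_prefix(const std::string& text, std::size_t length)
{
    std::istringstream reader(text);
    return read_bytes(reader, length);
}
```

With a file, length would come from the seekg/tellg pair shown in the question.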

How is memset working in this snippet of code?

I think this snippet of code is enough to get the idea of what I'm doing.
I'm using getline to read input data from a text file that has lines that might look something like this: The cat is fat/And likes to sing
From searching around the internet I was able to get it working, but I'd like to better understand WHY it is working. My primary question is how the
memcpy(id, buffer, temp - buffer);
line is working. I read what memcpy() does but do not understand how the temp - buffer part is working.
So from my understanding I'm setting *temp to the '/' in that line. Then I'm copying the line up until the '/' into it. But how does the temp, which is at '/' minus the buffer (which is the whole line from getline) work out to just be The cat is fat?
Hopefully that made some sense.
#define MAX_SIZE 255
char buffer[MAX_SIZE + 1] = { 0 };
cin.getline(buffer, MAX_SIZE);
memset(id, 0, 256);
memset(title, 0, 256);
char* temp = strchr(buffer, '/');
memcpy(id, buffer, temp - buffer);
temp++;
strcpy(title, temp);
Also, if I can double dip, why would MAX_SIZE be defined at 255 but MAX_SIZE+1 is often used. Does this have to do with a delimiter or white space at the end of a line?
Thanks for the help.
In my opinion it is simply bad code. :)
I would write it like
const size_t MAX_SIZE = 256;
char buffer[MAX_SIZE] = {};
std::cin.getline( buffer, MAX_SIZE );
id[0] = '\0';
title[0] = '\0';
if ( char* temp = strchr( buffer, '/' ) )
{
std::memcpy( id, buffer, temp - buffer );
id[temp - buffer] = '\0';
std::strcpy( title, temp + 1 );
}
else
{
std::strcpy( id, buffer );
}
As for memcpy in this statement
memcpy(id, buffer, temp - buffer);
it copies temp - buffer bytes from buffer to id. Since id was previously set to zeroes, after memcpy it will contain a zero-terminated string.
Your question concerns pointer-difference calculation, one of the arithmetic operations that can be done on pointers.
Most beginners don't have too much trouble grasping how pointer-addition works. Given this:
char buffer[256];
char *p = buffer + 10;
it is usually clear that p points to slot 10 (index 10) in the buffer char array. But you need to remember that the pointer type is important. The same construct you see above also works for more complicated data types:
struct Something
{
char name[128];
int ident;
int supervisor;
} people[64];
struct Something *p = people+10; // NOTE: same line, different types
Just as before, p points to the element at index 10 in the array, but note the arithmetic: the size of the underlying type is used to calculate the relevant memory offset. You don't need to do it yourself. No sizeof required here.
So why do you care? Because just like regular math, pointer math has certain properties, one of them being the following:
char buffer[256];
char *p = buffer+10; // p addresses slot 10 in the array
size_t len = p-buffer; // len is the typed difference between p and buffer.
In this case, len will be 10, the same as the offset of p. So how does this relate to your question? Well...
char* temp = strchr(buffer, '/');
memcpy(id, buffer, temp - buffer);
The horrid nature of this code aside (if there is no '/' in the buffer array, temp will be NULL, and the ensuing memcpy will all but guarantee a massive segfault), this code finds the location in the string where '/' resides. Once it has that, the calculation temp - buffer uses pointer arithmetic (specifically pointer differencing) to calculate the distance between the address in temp and the address at the base of the array. The result is the element count up to, but not including, the slash itself. Therefore this code copies everything up to, but not including, the discovered slash into the id buffer. The rest of the id buffer retains the 0 values written by the memset, and therefore the string is terminated (which is way more work than you need to do, btw).
After that line, the remainder:
temp++;
strcpy(title, temp);
post-increments the temp pointer, which says "move to the next element in the array". Then the strcpy copies the remaining chars of the null-terminated buffer string into title. Worth noting this could have simply been:
strcpy(title, ++temp);
And likewise:
strcpy(title, temp+1);
which retains temp at the '/' position. In all of the above, the result in title will be the same: all chars after the slash, but not including it.
I hope that explains what is going on. Best of luck.
MAX_SIZE+1 is reserving space for the null terminator at the end of the string ('\0')
memcpy(id, buffer, temp - buffer)
This copies (temp - buffer) bytes from buffer to id. Since strchr finds the '/' character in the input, temp points inside buffer (assuming it's found). So, for example, assume buffer points to a location in memory:
buffer = 0x781230001
and the third byte is the '/', after strchr, you have
temp = 0x781230003
temp - buffer therefore is 2.
HOWEVER: If the '/' is not found, then temp will be NULL, temp - buffer is meaningless, and the code will most likely crash. You should check the result of strchr before doing the pointer arithmetic.
Here you find the position of the first / in buffer.
char* temp = strchr(buffer, '/');
Now temp points to the / in buffer. If you want to copy this part of buffer, it's enough to have a pointer to the start and the length of the string. So temp - buffer evaluates to the length.
=================================
The cat is fat/And likes to sing
=================================
^             ^
buffer        temp
|   length    | = temp - buffer
The end of a null-terminated string is marked by \0 (or simply 0). So if you need to store N chars, you need to allocate a buffer of size N+1.
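The temp - buffer calculation can be tried out on the sample line from the question. This is a small sketch, assuming a hypothetical helper named before_slash; it also includes the NULL check that several answers ask for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the part of `line` before the first '/',
// or the whole line when there is no '/'.
std::string before_slash(const char* line)
{
    assert(line != nullptr);
    const char* temp = std::strchr(line, '/');
    if (temp == nullptr)
        return std::string(line);          // no '/': nothing to subtract
    // Pointer difference: number of chars between the start and the '/'.
    std::ptrdiff_t length = temp - line;
    return std::string(line, static_cast<std::size_t>(length));
}
```

For "The cat is fat/And likes to sing" the difference is 14, the length of "The cat is fat", so exactly that prefix is copied.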

My program is giving different output on different machines..!

#include<iostream>
#include<string.h>
#include<stdio.h>
int main()
{
char left[4];
for(int i=0; i<4; i++)
{
left[i]='0';
}
char str[10];
gets(str);
strcat(left,str);
puts(left);
return 0;
}
For any input it should concatenate 0000 with that string, but on one PC it shows a diamond sign between "0000" and the input string!
You append a string of possibly nine or more characters (gets has no bounds checking) to a four-element array that holds four characters and no string terminator. With no termination at all, puts will continue to print until it finds a string termination character, which may be anywhere in memory. This is, in short, a textbook example of a buffer overflow, and buffer overflows lead to undefined behavior, which is what you're seeing.
In C and C++ all C-style strings must be terminated. They are terminated by a special character: '\0' (or plain ASCII zero). You also need to provide enough space for destination string in your strcat call.
Proper, working program:
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(void)
{
/* Size is 4 + 10 + 1, the last +1 for the string terminator */
char left[15] = "0000";
/* The initialization above sets the four first characters to '0'
* and properly terminates it by adding the (invisible) '\0' terminator
* which is included in the literal string.
*/
/* Space for ten characters, plus terminator */
char str[11];
/* Read string from user, with bounds-checking.
* Also check that something was truly read, as `fgets` returns
* `NULL` on error or other failure to read.
*/
if (fgets(str, sizeof(str), stdin) == NULL)
{
/* There might be an error */
if (ferror(stdin))
printf("Error reading input: %s\n", strerror(errno));
return 1;
}
/* Unfortunately `fgets` may leave the newline in the input string
* so we have to remove it.
* This is done by changing the newline to the string terminator.
*
* First check that the newline really is there though. This is done
* by first making sure there is something in the string (using `strlen`)
* and then checking if the last character is a newline. The use of `-1`
* is because strings, like arrays, start their indexing at zero.
*/
if (strlen(str) > 0 && str[strlen(str) - 1] == '\n')
str[strlen(str) - 1] = '\0';
/* Here we know that `left` is currently four characters, and that `str`
* is at most ten characters (not including zero termination). Since the
* total length allocated for `left` is 15, we know that there is enough
* space in `left` to have `str` added to it.
*/
strcat(left, str);
/* Print the string */
printf("%s\n", left);
return 0;
}
There are two problems in the code.
First, left is not nul-terminated, so strcat will end up looking beyond the end of the array for the appropriate place to append characters. Put a '\0' at the end of the array.
Second, left is not large enough to hold the result of the call to strcat. There has to be enough room for the resulting string, including the nul terminator. So the size of left should be at least 4 + 9 + 1 = 14, to allow for the four characters that left starts out with, up to 9 characters coming from str, and the nul terminator (assuming that gets hasn't caused an overflow).
Each of these errors results in undefined behavior, which accounts for the different results on different platforms.
I do not know why you are bothering to include <iostream> as you aren't using any C++ features in your code. Your entire program would be much shorter if you had:
#include <iostream>
#include <string>
int main()
{
std::string line;
std::cin >> line;
std::cout << "You entered: " << line;
return 0;
}
Since std::string is going to be null-terminated, there is no reason to force it to be 4-null-terminated.
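The snippet above only echoes the input. A sketch of the std::string route that also performs the "0000" concatenation from the question could look like this; with_zero_prefix is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns "0000" followed by `input`.
// std::string grows as needed, so no buffer size has to be computed by hand.
std::string with_zero_prefix(const std::string& input)
{
    std::string left(4, '0');   // four '0' characters
    left += input;              // append; storage is managed by std::string
    assert(left.size() == input.size() + 4);
    return left;
}
```

The input line itself could be read with std::getline(std::cin, line), which has no fixed length limit.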
Problem #1 - not a legal string:
char left[4];
for(int i=0; i<4; i++)
{
left[i]='0';
}
String must end with a zero char, '\0' not '0'.
This causes what you describe.
Problem #2 - gets. It has no bounds checking, and you use it on a small buffer. Very dangerous.
Problem #3 - strcat. Yet again trying to fill a super small buffer which should have already been full with an extra string.
This code looks like an invitation to a buffer overflow attack.
In C, what we call a string is a null-terminated character array. All the functions in the string.h library rely on this null at the end of the character array. Your character array is not null-terminated and thus is not a string, so you cannot use the string library function strcat here.

I have an issue with fgets in c++

I'm doing a small exercise to read a file which contains one long string and load this into an array of strings. So far I have:
char* data[11];
char buf[15];
int i = 0;
FILE* indata;
indata = fopen( "somefile.txt", "r" );
while( i < 11)
{
fgets(buf, 16, indata);
data[i] = buf;
i++;
}
fclose( indata );
somefile.txt: "aaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbaahhhhhbbbbdddddddddddddbbbbb"
etc..
This reads in 15 characters, adds that string to the array and gets the next 15. The problem is the array always equals the last string, so if the last string is "ccccv" the whole array, data[0] = "ccccv", data[1] = "ccccv", data[2] = "ccccv" and so on.
Does anyone know why this is happening and whether there is a better way to do it? Thanks
Each pointer in data will point to the same memory area, which is buf.
You need to use strcpy + malloc.
Also, it seems like you have a "minor" buffer overflow: buf has size 15, and fgets(buf, 16, indata) may write up to 16 bytes into it (15 characters plus the terminator).
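In C++ the copy that strcpy + malloc would make can be left to std::string. The sketch below makes two assumptions that are not in the original: it swaps fgets for std::istream::read so that it can run on any stream, and the names read_chunks and split_text are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `max_chunks` pieces of at most
// `chunk_len` characters from `in`. Every piece is copied into its own
// std::string, so the entries do not all alias one shared buffer.
std::vector<std::string> read_chunks(std::istream& in, std::size_t chunk_len,
                                     std::size_t max_chunks)
{
    assert(chunk_len > 0);
    std::vector<std::string> data;
    std::vector<char> buf(chunk_len);
    while (data.size() < max_chunks)
    {
        in.read(buf.data(), static_cast<std::streamsize>(chunk_len));
        std::streamsize got = in.gcount();
        if (got <= 0)
            break;                               // nothing left to read
        // Construct a new string from the bytes read: a copy, not a pointer.
        data.emplace_back(buf.data(), static_cast<std::size_t>(got));
    }
    return data;
}

// Usage sketch: splits `text` into at most 11 pieces of `chunk_len` chars.
std::vector<std::string> split_text(const std::string& text,
                                    std::size_t chunk_len)
{
    std::istringstream in(text);
    return read_chunks(in, chunk_len, 11);
}
```

Unlike fgets, read does not stop at a newline, which matches the single long string described in the question.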

Trimming UTF8 buffer

I have a buffer with UTF8 data. I need to remove the leading and trailing spaces.
Here is the C code which does it (in place) for ASCII buffer:
char *trim(char *s)
{
while( isspace(*s) )
memmove( s, s+1, strlen(s) );
while( *s && isspace(s[strlen(s)-1]) )
s[strlen(s)-1] = 0;
return s;
}
How to do the same for UTF8 buffer in C/C++?
P.S.
Thanks for the performance tip regarding strlen(). Back to the UTF8 specifics: what if I need to remove all spaces altogether, not only at the beginning and at the tail? Also, I may need to remove all characters with ASCII code <32. Is there anything specific to the UTF8 case here, like using mbstowcs()?
Do you want to remove all of the various Unicode spaces too, or just ASCII spaces? In the latter case you don't need to modify the code at all.
In any case, the method you're using that repeatedly calls strlen is extremely inefficient. It turns a simple O(n) operation into at least O(n^2).
Edit: Here's some code for your updated problem, assuming you only want to strip ASCII spaces and control characters:
unsigned char *in, *out;
for (out = in; *in; in++) if (*in > 32) *out++ = *in;
*out = 0;
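The same loop written out as a complete function, as a sketch only: strip_ascii_space_and_control is a hypothetical name, and the function works on a std::string instead of raw pointers. It is safe for UTF-8 because every byte of a multi-byte sequence is 0x80 or higher, so a byte value of 32 or less can only be an ASCII control character or the space itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: removes every byte <= 32 (ASCII control characters
// and the space) from a UTF-8 encoded string, in place.
void strip_ascii_space_and_control(std::string& s)
{
    std::string::size_type out = 0;
    for (std::string::size_type in = 0; in < s.size(); ++in)
    {
        // Compare as unsigned so bytes >= 0x80 are not seen as negative.
        unsigned char c = static_cast<unsigned char>(s[in]);
        if (c > 32)
            s[out++] = s[in];   // keep the byte, compacting towards the front
    }
    assert(out <= s.size());
    s.resize(out);
}
```

Like the snippet above, this keeps DEL (127) and does not touch non-ASCII Unicode spaces.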
strlen() scans to the end of the string, so calling it multiple times, as in your code, is very inefficient.
Try looking for the first non-space and the last non-space and then memmove the substring:
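#include <ctype.h>
#include <string.h>
char *trim(char *s)
{
    char *first = s;
    char *last;
    /* skip leading whitespace; the cast keeps isspace defined for bytes >= 0x80 */
    while (isspace((unsigned char)*first))
        ++first;
    /* last points one past the final character to keep */
    last = first + strlen(first);
    while (last > first && isspace((unsigned char)last[-1]))
        --last;
    memmove(s, first, last - first);
    s[last - first] = '\0';
    return s;
}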
Also remember that the code modifies its argument.