Garbage chars at the beginning of file - c++

I'm reading a file, character by character using:
while(1)
{
char c ='\0';
c = infile.get();
cout << c << endl;
}
but I have a specific file where this code reads 3 (garbage = strange) characters before the actual data in my file (and only on the beginning of the file).
I've tried to open this file with some text editors (notepad and notepad++) but it seems right = no strange characters before my data...
Any idea why these strange characters are being read, and how I can avoid it?

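It is the UTF-8 Byte Order Mark (BOM): the byte sequence EF BB BF in hexadecimal, which appears as ï»¿ when shown in a Latin-1/Windows-1252 code page. Editors such as Notepad and Notepad++ do not display it, which is why the file looks fine there.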

Related

How to read a specific amount of characters

I can get the characters from console with this code:
Displays 2 characters each time in a new line
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
char ch[3] = "";
ifstream file("example.txt");
while (file.read(ch, sizeof(ch)-1))
{
cout << ch << endl;
}
return 0;
}
My problem is that if the number of characters is odd, it doesn't display the last character in the text file.
My text file contains this:
abcdefg
It doesn't display the letter g in the console.
It displays this:
ab
cd
ef
I wanna display like this:
ab
cd
ef
g
I want to use this to read 1000 characters at a time from a large file, so I don't want to read character by character, which takes a lot of time. If you can fix the problem or have a better suggestion, please share it.
The following piece of code should work:
while (file) {
file.read(ch, sizeof(ch) - 1);
int number_read_chars = file.gcount();
// print chars here ...
}
By moving the read call into the loop, you'll be able to handle the last call, where too few characters are available. The gcount method tells you how many characters were actually read by the last unformatted input operation, e.g. read.
Please note that when fewer than sizeof(ch) - 1 chars are read, you have to insert a NUL character manually at the position returned by gcount if you intend to use the buffer as a C string, as those are null-terminated:
ch[file.gcount()] = '\0';

Inserting a new line in a file deletes the next two characters C++

I have a file and I wish to separate items in the file by line. The indicator for the end of an item is a semicolon. So when I come across a semicolon, I want to put everything after the semicolon on a newline. And continue until i find the next semicolon and repeat.
char c;
fstream distances;
distances.open(argv[1]);
while(distances >> c) {
if(c == ';') {
distances << endl;
}
}
This is the code inside main. It is opening the file correctly. The file says
i;am;testing;this
but after running the program the file reads:
i;
;
sting;
is
I'm not sure why it would delete the two characters following the semicolon. Unless it is using that space for \n character. If anyone could help or suggest a more efficient solution I would appreciate it.
Despite the illusion presented by text editing software, you can't "insert" into a file. You can only read its contents, modify them in memory, then write them back out again.
Your two characters are being replaced by your newline which, on Windows, actually consists of a Carriage Return followed by a Line Feed.
The fstream object you have is a representation of the bytes on disk. If 'insert' writes were allowed, the program would have to move every byte in the file after the insert point up by one position, essentially rewriting the file.
The only way to get an 'insert' ability is to implement it yourself, generally using a new file (you could also do it in the same file by reading the rest of the file into memory, going back to the insert position, writing the new characters, and then writing back the buffered remainder of the file).
The reason that the next two characters are being overwritten is as follows:
// file buffer = [i][;][a][m][;][t]...
// position       ^
distances >> c;
// c = [i]
// file buffer = [i][;][a][m][;][t]...
// position          ^
distances >> c;
// c = [;]
// file buffer = [i][;][a][m][;][t]...
// position             ^
distances << endl;
std::endl writes a '\n' and then issues a std::flush to force the write. On Windows, a stream opened in text mode translates that '\n' into the two-character sequence \r\n, so the effect is as if you had written file << '\r' << '\n' << std::flush;
The stream position is where the 'a' is, not where the ';' is - by reading that character you advanced the stream position past it, so it writes the '\r' over the 'a' and the '\n' over the 'm'.
// file buffer = [i][;][a][m][;][t]...
// position             ^
distances << '\r';
// file buffer = [i][;][\r][m][;][t]...
// position                 ^
distances << '\n';
// file buffer = [i][;][\r][\n][;][t]...
// position                     ^
distances >> c;
// c = ';'
// file buffer = [i][;][\r][\n][;][t]...
// position                        ^
The most likely explanation is that you're using Microsoft Windows, where the newline sequence is two characters: a carriage return and a line feed: \r\n.
On Microsoft Windows a std::endl will write two characters: an \r and an \n.

how to test for white space c++: [duplicate]

I'm trying to parse a .csv file, and I need to be able to test for a carriage return. Here is a test .csv file called sample.csv:
2
3
As you'll notice, there are two rows and one column in this file. I now write the following C++ code:
ifstream myfile ("sample.csv"); //Import file
char nextchar;
myfile.get(nextchar);
cout<<nextchar<<'\n';
myfile.get(nextchar);
cout<< nextchar<<" If 0, then that was not a carriage return. If 1, it was. :"<<(nextchar=='\n')<<'\n';
myfile.get(nextchar);
cout<<nextchar<<'\n';
I expect the following output:
2
If 0, then that was not a carriage return. If 1, it was. :1
3
however, I get:
2
If 0, then that was not a carriage return. If 1, it was. :0
3
How is this possible? how do I test for a carriage return??
It may be a CR + LF pair of characters. In any case, you could print the numeric code of the character yourself to check.
You could also use the standard function std::isspace, declared in the header <cctype>.
I suggest using the standard function std::getline to read a whole line instead of using get.
There are a lot of things that can go wrong in the assumptions: OS behaviour, the text editor used to write the sample file, an undesired extra space or tab at the end of line, and the ios_base::openmode used to open the file, as well as all possible combination between those...
First insert this line to see what you actually read: is it 0x0d, 0x0a, or something else?
cout << "Char read: 0x0"<< std::hex << (int)nextchar<<"\n";
cout << "If 0 ... // Existing line
You can also replace your sample with the following, which opens the file in binary mode and displays in hex the characters that are really in the file:
ifstream myfile ("sample.csv", ifstream::binary); //Import file
while (myfile.good() ) {
char nextchar;
myfile.get(nextchar);
if (myfile.good())
cout << "0x0"<< std::hex << (int)nextchar
<< " " << (isprint(nextchar)? nextchar:'?') <<"\n";
}
If the second and third bytes are 0x0d and 0x0a, you'll know for sure that your text editor inserted the extra CR.
Then you can remove ifstream::binary from the code above. Normally, as you pointed out, you should then see only 0x0a on the second line. If that's not the case, investigate whether the default openmode was somehow altered.
By the way, I compiled your original code under Windows, prepared the sample file using Notepad, ran the program and got... what you expected! Then I redid the test with the following modification and finally got what you got.
Good luck !

How to ignore a character through strtok?

In the below code i would like to also ignore the character ” . But after adding that in i still get “Mr_Bishop” as my output.
I have the following code:
ifstream getfile;
getfile.open(file,ios::in);
char data[256];
char *line;
//loop till end of file
while(!getfile.eof())
{
//get data and store to variable data
getfile.getline(data,256,'\n');
line = strtok(data," ”");
while(line != NULL)
{
cout << line << endl;
line = strtok(NULL," ");
}
}//end of while loop
my file content :
hello 7 “Mr_Bishop”
hello 10 “0913823”
Basically all i want my output to be :
hello
7
Mr_Bishop
hello
10
0913823
With this code i only get :
hello
7
"Mr_Bishop"
hello
10
"0913823"
Thanks in advance! :)
I realise i have made an error in the inner loop missing out the quote. But now i receive the following output :
hello
7
Mr_Bishop
�
hello
10
0913823
�
any help? thanks! :)
It looks like you used WordPad or something similar to generate the file. Use an editor that saves plain ASCII, such as Notepad or Notepad++ on Windows, or an equivalent on Linux. Right now the file appears to be UTF-8 encoded.
In addition the proper escape sequence for " is \". For instance
line = strtok(data," \"");
Once you fix your file to be in ASCII encoding, you'll find you missed something in your loop.
//loop while a line can be read (testing eof() before reading runs the body once too often)
while(getfile.getline(data,256,'\n'))
{
line = strtok(data," \"");
while(line != NULL)
{
std::cout << line << std::endl;
line = strtok(NULL," \""); // THIS used to be strtok(NULL," ");
}
}//end of while loop
You missed a set of quotes there.
Correcting the file and this mistake yields the proper output.
Have a very careful look at your code:
line = strtok(data," ”");
Notice how the quotes lean at different angles (well, mine do; hopefully your font shows the same thing). You have included only the closing double quote in your strtok() call. However, your data file has:
hello 7 “Mr_Bishop”
which has two different kinds of quotes. Make sure you're using all the right characters, whatever "right" is for your data.
UPDATE: Your data is probably UTF-8 encoded (that's how you got those leaning double quotes in there) and you're using strtok() which is completely unaware of UTF-8 encoding. So it's probably doing the wrong thing, splitting up the multibyte UTF-8 characters, and leaving you with rubbish at the end of the line.

Load certificate and png into char*

I'm trying to load a certificate and a png file into a char* in C++:
char certPath[] = "./user.pem";
char dataPath[] = "./test.png";
char *certificate = loadFile(certPath);
char *datafile = loadFile(dataPath);
And this is my loadFile() method:
char* loadFile(char* filename) {
cout << endl << "Loading file: " << filename << endl;
char *contents;
ifstream file(filename, ios::in|ios::binary|ios::ate);
if (file.is_open())
{
int size = file.tellg();
contents = new char [size];
file.seekg (0, ios::beg);
file.read (contents, size);
file.clear();
file.close();
}
printf("contents: %s\n", contents);
cout << endl << "finished loading " << filename << endl;
return contents;
}
This is the output which it produces:
Loading file: ./user.pem
contents: -----BEGIN CERTIFICATE-----
MIID+TCCAuGgAwIBAgIJAJhxZybSGGMgMA0GCSqGSIb3DQEBBQUAMIGSMQswCQYD
VQQGEwJBVDEPMA0GA1UECAwGU3R5cmlhMQ0wCwYDVQQHDARHcmF6MQowCAYDVQQK
DAEvMQowCAYDVQQLDAEvMR0wGwYDVQQDDBRDaHJpc3RvZiBTdHJvbWJlcmdlcjEs
MCoGCSqGSIb3DQEJARYdc3Ryb21iZXJnZXJAc3R1ZGVudC50dWdyYXouYXQwHhcN
MTIwMjE0MjEwMzA4WhcNMTMwMjEzMjEwMzA4WjCBkjELMAkGA1UEBhMCQVQxDzAN
BgNVBAgMBlN0eXJpYTENMAsGA1UEBwwER3JhejEKMAgGA1UECgwBLzEKMAgGA1UE
CwwBLzEdMBsGA1UEAwwUQ2hyaXN0b2YgU3Ryb21iZXJnZXIxLDAqBgkqhkiG9w0B
CQEWHXN0cm9tYmVyZ2VyQHN0dWRlbnQudHVncmF6LmF0MIIBIjANBgkqhkiG9w0B
AQEFAAOCAQ8AMIIBCgKCAQEA15ISaiXMSTVnmGtEF+bbhmVQk+4voU1pUZlOMVBj
QKjfPgCtgrmRaY8L+d6Pu61urFE1QrsfNJdDJRYs87Cc1eZgkvOXz0fSE2DHVNE2
i9YdFR8ea5niU5ATFZwiDIEhfCAcXWcEHWtZBB4yYYISsBkFxq6UBniGV+p7XOtE
aAtriBP0PZ4KUo+arJLStbwt4f9tBeytKowaKVNGlOpBgj7TG4bw8yA7Avdx8s+k
sReSxYteo0o9clIqISdKL0pRdzXP0Zrix54mBIfsxojfCW2SvqvLLLxtJlRKriQj
JfBc4koS6yAoktx7CvzcepGQk65ZGl0TNlteG4FJqy5yBQIDAQABo1AwTjAdBgNV
HQ4EFgQU1/g63xTix2Vs0zv2d3wVX9FGvVQwHwYDVR0jBBgwFoAU1/g63xTix2Vs
0zv2d3wVX9FGvVQwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQUFAAOCAQEAHyvI
0L+ibesg45qUxx2OQb37HA9aRpR3wYpt6d5Rd1x2pfqumrKeV/42XWodZJSkU3sH
EX8V2xKwNoUBsPb/q54S9suCHwE33XtWjLvJyR9v2wd2HjNRYdGF9XoYdpsOpcAk
/kaZ2pExzLAPDg5pTsqY9dpCFWnyccZUO1CLEeljinOZ4raIj7d6EryWsn+u5pbs
WB12EFaoNCybQ6j5+TIcRs5xdGpVD6qMkm7HUnBn6mtz8Q7qVj9sqo5us4UBRWY8
ie9X494oW59nRuLiZ8dOPGuOXsuCILY44/3eyDh6yvW7G+wrp3eZ7L7eLRSI3+lm
mxqSJNq8Yi6ArfcB+Q==
-----END CERTIFICATE-----
finished loading ./user.pem
Loading file: ./test.png
First the content of the certificate should appear and then the content of the image. The certificate works but when I try to load the image it is really strange. Nothing works anymore. Even a simple cout or printf doesn't show up on the console but the program doesn't crash...
Any suggestions what's wrong?
Your error is that the PNG data contains \0 bytes near the beginning (right after the 8-byte signature), and printf with %s stops at the first \0.
EDIT:
Change:
printf("contents: %s\n", contents);
To:
std::cout.write( contents, size );
std::cout.flush();
You have to move size into the correct scope as well of course.
A PNG file is binary data and contains non-printable characters. Those will not be displayed meaningfully by any print function, be it printf or std::cout <<.
However, you can print the hexadecimal value of each byte, printable or not:
//write it inside the if-block
for(int i = 0 ; i < size; ++i)
// cast through unsigned char so bytes >= 0x80 are not sign-extended (e.g. ffffff89)
std::cout << std::hex << (int)(unsigned char) contents[i] << ' ';
This prints the hexadecimal value of each character.
You can test if a given character is printable or not, using isprint() function.
You can't print the contents of a png file to the console; it's a binary file. That is different from a certificate file, which contains the certificate Base64 (MIME) encoded and thus is a regular text file.
A printable file (i.e. text) contains only bytes representing printable standard-ASCII characters (0x20 - 0x7E) and uses ASCII formatting characters (CR, LF, etc.) in a predictable way. Furthermore, it doesn't contain a 0x00 byte, which is used in C/C++ to mark the end of a string. A binary file may contain any byte in any order.
So, two things will happen when you try to print it: a) it'll stop at the first 0x00 byte found; b) every byte containing a non-ASCII character will be printed as a special char (if it's in the code page active for the console), or nothing at all, and bytes that contain ASCII formatting chars will be "executed" as if they were actual formatting in a text file.
The result: either you won't see anything at all or just a few strange chars mixed with random line feeds, tabs & etc.
To get what you expect, first define exactly what that is. Do you want to see the png contents Base64 (MIME) encoded? Then you should use a Base64-encoding routine. Or do you want to print the hex value of each byte? Then you need to do std::cout << std::hex << byte (as Nawaz suggested) or printf("%02x") for each byte in a loop.
Also for the certificate file you should open as a text file, not binary. Otherwise, you'd have two undesired effects: no LF normalization (for instance, in Windows the EOL is marked by CR+LF, while in Unix/Linux it's just LF) and no handling for the EOF char.