Number getting stored as special character in C++

#include <fstream>
#include <string.h>
#include <iostream>
using namespace std;

class contact
{
    long long ph;
    unsigned char name[20], add[50], email[30];
public:
    void create_contact()
    {
        cout << "Phone: ";
        cin >> ph;
        cout << "Name: ";
        cin.ignore();
        cin >> name;
        cout << "Address: ";
        cin.ignore();
        cin >> add;
        cout << "Email address: ";
        cin.ignore();
        cin >> email;
        cout << "\n";
    }
    void show_contact()
    {
        cout << endl << "Phone Number: " << ph;
        cout << endl << "Name: " << name;
        cout << endl << "Address: " << add;
        cout << endl << "Email Address : " << email;
    }
    long long getPhone()
    {
        return ph;
    }
    unsigned char* getName()
    {
        return name;
    }
    unsigned char* getAddress()
    {
        return add;
    }
    unsigned char* getEmail()
    {
        return email;
    }
};

fstream fp;
contact cont;

void save_contact()
{
    fp.open("contactBook.txt", ios::out | ios::app);
    cont.create_contact();
    fp.write((char*)&cont, sizeof(contact));
    fp.close();
    cout << endl << endl << "Contact Has Been Successfully Created...";
    getchar();
}
Hey there, I am new to C++ as well as to this community, and this is the code I have been working on. The phone number of the contact is getting saved as random special characters. This is the half of the code where I think the problem occurs. Any ideas on how I could fix it? It would be of much help. Thanks!

I take it you expected to see the phone number written out in your text file as something like "15551234567." However, long long is not stored in this form in memory. It's actually stored as a 64-bit binary integer. The special characters you describe are likely the encoded version of that integer. If you read the data back in, you should find that it is still an integer.
However, there is one remaining issue. You are missing ios::binary in the fstream open call. Each of the ios flags imbues the stream with a particular behavior:
ios::out - indicates that this stream should be an output stream that you can write bytes to
ios::app - indicates that this stream should be opened in "append" mode. This means that it will not erase the contents of the file every time you open it, and any bytes outputted to the stream are appended to the end of the file.
ios::binary - opens the file in binary mode, which is needed when you want to input/output binary data, rather than just text.
You want to open the file with ios::out | ios::app | ios::binary. Forgetting binary is going to lead to very difficult to debug errors.
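As a minimal sketch, the only change needed in the question's save_contact is the open call:

fp.open("contactBook.txt", ios::out | ios::app | ios::binary);
// out = writable, app = append to the end, binary = no newline translation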
Now, binary mode is a bit of a pest. Sorry for this being a long read, but it's a lot easier to come to grips with this flag if you understand the history behind it.
Way back in the early days of computing, there was a disagreement about how to write a newline into a file. This was in the days of typewriters, where starting a new line was broken into two actions. There was the "carriage return," which moved the sliding part of the typewriter back to the start of the line (this was the loud part of the motion), and there was the "line feed," which moved the paper up one spot. Each of these was a separate action, so they were given separate characters in ASCII, one of the definitive ways to write text as a string of bytes. The 8-bit number 10 encoded a line feed (aka LF), and the 8-bit number 13 encoded a carriage return (aka CR). This would permit one to do things like overtyping, a trick where one types one character (like a letter) and then goes back to add another over the top (like an accent). You might write à by first typing a, doing a "carriage return," and then typing a `, just like you did on a typewriter.
Some operating systems (such as Windows) encoded the start of the next line as both of these characters, so you'd see CR LF in a text file. Other operating systems (such as Unix) decided that it wasn't worth wasting a precious byte at the end of every line, so they chose to represent the start of a new line just with a LF. Others (such as Macintosh), decided to represent the new line as CR. Nobody could agree.
To deal with this, many file reading/writing APIs treat these characters specially. fopen and fstream follow a pattern where if they see a CR LF or a CR in a text file, they silently turn it into a LF character when read. This lets you read every file type. Likewise, if it sees a LF character when writing, it expands it to whatever the platform specified a new line should look like. This lets you write cross-platform code which writes text files without having to pay attention to which new line character is used on each platform!
However, this causes huge problems for binary data. Consider the number 302,844,416 written as a 32 bit number. In hexadecimal, we would write that as 0x120D0A00 (hex is a popular way to write numbers in programming because every byte can be written as 2 characters in hex). The issue is the middle two bytes of the number, 0x0D and 0x0A. In decimal, these are 13 and 10, which you should recognize as the same bytes as CR and LF.
If the program tries to read that number in "text mode," it will see the CR LF pair and turn it into just a single LF, per the C rules. Now, instead of our number being 0x120D0A00, it's 0x120A00XX, where XX is whatever the next byte was in the file. Very bad things! Not only is this data corrupted, but you probably needed that next byte for whatever came next in the file!
ios::binary and the "b" flag for fopen resolve this. They tell C/C++ that the data is going to be binary. There won't be any newlines to convert. If you write bytes to a binary stream, they get written directly to the file, without any clever attempts to handle newlines.
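You can watch the translation happen with a few lines of code. A sketch, with the caveats that on Unix text and binary mode behave identically (so the effect only shows up on Windows) and that the byte layout shown assumes a little-endian machine:

#include <fstream>

int main()
{
    unsigned int n = 0x120D0A00;   // in memory (little-endian): 00 0A 0D 12
    std::ofstream out("n.dat");    // text mode: each 0x0A byte may be expanded
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    // On Windows, n.dat ends up 5 bytes long instead of 4: the 0x0A byte
    // was expanded to 0x0D 0x0A, so the stored value is damaged.
}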
Your phone number is stored as a long long, which is a binary integer format. Without ios::binary, you run the risk of the number just happening to have a CR LF pair in it, and fstream will corrupt your data. ios::binary tells fstream to not mess with the data in that way.

Related

Having problems with 0x0A character in C++ even in binary mode (interprets it as a new line)

Hi, this might seem a bit noobie, but here we go. I'm developing a program that downloads leaderboards of a certain game from the internet and transforms them into a proper format to work with (elaborate rankings, etc.).
The file contains the names, ordered by rank, but between each name there are 7 random control codes (obviously unprintable). The txt file looks like this:
..C...hName1..)...&Name2......)Name3..é...þName4..Ü...†Name5..‘...QName6..~...bName7..H...NName8..|....Name9..v...HName10.
I checked via a hex editor and saw that the first control code after each name is always a null character (0x00). So what I do is read everything, and then cout every character. When a 0x00 character is found, skip 7 characters and keep couting. Therefore you end up with the list, right?
At first I had the problem that among those random control codes you would sometimes find something like a "soft EOF" (0x1A), and the program would stop reading there. So I finally figured out to open the file in binary mode. It worked, and then everything would be couted... or that's what I thought.
But I came across another file which still didn't work, and finally found out that it contained a 0x0A character, which doesn't make sense since I'm opening the file in binary mode. But still, after reading that character, C++ interprets it as a new line, and hence skips 7 characters, so the name after that character will always appear cut.
Here's my current code:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    string scores;
    system("wget http://certainwebsite/001.txt"); // download file
    ifstream highin("001.txt", ios::binary);
    ofstream highout("board.txt", ios::binary);
    if (highin.is_open())
    {
        while (highin.good())
        {
            getline(highin, scores);
            for (int i = 0; i < scores.length(); i++)
            {
                if (scores[i] == 0x00)
                {
                    i = i + 7; // skip 7 characters if 'null' is found
                    cout << endl;
                    highout << endl;
                }
                cout << scores[i];
                highout << scores[i]; // cout names and save them in the output file
            }
        }
        highin.close();
    }
    else cout << "Unable to open file";
    system("pause>nul");
}
Not sure how to ignore that character if already being in binary mode doesn't work. Sorry for the long question, but I wanted to be detailed and specific. In this case, the 0x0A character is located before Name3, and this is how the output looks:
http://i.imgur.com/yu1NjoZ.png
By default getline() reads until the end of line and discards the newline character. However, the delimiter character could be customized (by supplying the third parameter). If you wish to read until the null character (not until the end of line), you could try using getline (highin, scores, '\0'); (and adjusting the logic of skipping the characters).
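A sketch of how the reading loop might look with that delimiter (file name kept from the question; this assumes the separator really is one null byte plus six more control bytes, as described above):

ifstream highin("001.txt", ios::binary);
string scores;
// Each "line" is now everything up to the next 0x00 byte;
// getline() consumes the null itself, leaving 6 control bytes to skip.
while (getline(highin, scores, '\0'))
{
    cout << scores << endl; // the name
    highin.ignore(6);       // skip the remaining control bytes
}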
I'm glad you figured it out and it doesn't surprise me that getline() was the culprit. I had a similar issue dealing with the newline character when I was trying to read in a CSV file. There are several different getline() functions in C++ depending on how you call the function and each seems to handle the newline character differently.
As a side note, in your for loop, I'd recommend against performing a method call in your test. That adds unnecessary overhead to the loop. It'd be better to call the method once and put that value into a variable, then enter the loop and test i against the length variable. Unless you expect the length to change, calling the length() method each iteration is a waste of system resources.
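For instance, a minimal illustration of hoisting the call (the len variable is just for the example):

size_t len = scores.length(); // computed once, before the loop
for (size_t i = 0; i < len; i++)
{
    // ... work with scores[i] ...
}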
Thank you all, it worked; it was indeed getline() that was giving me problems. Due to the while loop, each time it found a newline character it restarted the process, hence skipping those 7 characters.

C++ cin fails when reading more than 127 ASCII values

I've created a text file that has 256 characters, the first character of the text file being ASCII value 0 and the last character of the text file being ASCII value 255. The characters in between increment from 0 to 255 evenly. So character #27 is ASCII value 27. Character #148 should be ASCII value 148.
My goal is to read every character of this text file.
I've tried reading this with cin. I tried cin.get() and cin.read(), both of which are supposed to read unformatted input. But both fail when reading the 26th character. I think when I used an unsigned char, cin said it was reading in 255, which simply isn't true. And when I used a normal signed char, cin said it was reading in -1. It should be reading in whatever the character equivalent of ASCII 26 is. Perhaps cin thinks it's hit EOF? But I've read on separate StackOverflow posts that EOF isn't an actual character that one can write. So I'm lost as to why cin is coughing on character values that represent integer -1 or integer 255. Could someone please tell me what I'm doing wrong, why, and what the best solution is?
There's not much concrete code to paste. I've tried a few different non-working combinations, all involving either cin.get() or cin.read() with either char or unsigned char, and casts to char and int in between. I've had no luck with being able to read past the 26th character, except for this:
unsigned char character;
while ( (character = (unsigned char)cin.get()) != EOF) { ... }
Interestingly enough, though, this doesn't stop my while loop at the 26th character, but it doesn't move on either. It seems like cin, whether it's cin.get() or cin.read(), just refuses to advance to the next character the moment it detects something it doesn't like. I'm also aware that something like cin.ignore() exists, but my input isn't predictable; these 256 characters in my text file are just a test case, and the real input is rather random. This is part of a larger homework assignment, but this specific question is not related to the assignment; I'm just stuck on part of the process.
Note: I am reading from the standard input stream, not a specific text file. Still no straightforward solution, it seems. I can't believe this hasn't been done with cin before.
Update:
On Windows, it stops after character 26, probably due to that Ctrl-Z thing. I don't care that much about this problem. It only needs to work on Linux.
On Linux, though, it reads all characters from 0 to 127, but it doesn't seem to be reading the extended ASCII characters from 128 to 255. There's a "solution" program that produces output we're supposed to imitate, and that program is able to read all 256 characters somehow.
Question: How, using cin, can I read all 256 character values?
Solved
Using:
int characterInt;
unsigned char character;
while ((characterInt = getchar()) != EOF)
{
    // 'character' now stores values from 0 - 255
    character = (unsigned char)(characterInt);
}
I presume you are on Windows. On the Windows platform, character 26 is Ctrl-Z, which is used in a console to represent end of file, so iostreams thinks your file ends at that character.
It only does this in text mode, which cin is using; if you open a stream in binary mode, it won't do this.
std::cin reads text streams, not arbitrary binary data.
As to why the 26th character is interesting, you are probably using a CP/M derivative (such as MS-DOS or MS-Windows). In those operating systems, Control-Z is used as an EOF character in text files.
EDIT:
On Linux, using g++ 4.4.3, the following program behaves precisely as expected, printing the numbers 0 through 255, inclusive:
#include <iostream>
#include <iomanip>

int main()
{
    int ch;
    while ((ch = std::cin.get()) != std::istream::traits_type::eof())
        std::cout << ch << " ";
    std::cout << "\n";
}
There are two problems here. The first is that in Windows the default mode for cin is text and not binary, resulting in certain characters being interpreted instead of being input into the program. In particular the 26th character, Ctrl-Z, is being interpreted as end-of-file due to backwards compatibility taken to an extreme.
The other problem is due to the way cin >> works - it skips whitespace. This includes space obviously, but also tab, newline, etc. To read every character from cin you need to use cin.get() or cin.read().
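Putting the two problems together, here is a minimal sketch that reads every byte value from standard input. The key point is keeping the result of get() in an int, so that EOF (-1) stays distinguishable from the valid byte 255:

#include <iostream>

int main()
{
    int ch;
    // get() returns an int: 0-255 for real bytes, EOF (-1) at end of stream.
    // Storing the result straight into an unsigned char folds 255 and EOF together.
    while ((ch = std::cin.get()) != std::istream::traits_type::eof())
        std::cout << ch << " ";
}

On Windows you would additionally need to switch stdin into binary mode (for example with the CRT's _setmode) so that byte 26 is not treated as end of file; on Linux no such step is needed.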

Reading From A File Which Contains Unicode Characters

I have this huge file which contains Unicode strings at the beginning (the first ~10,000 characters or so).
I don't care about the Unicode part; the parts I'm interested in aren't Unicode. But whenever I try to read those parts I get '=', and if I load the entire file into a char array and write it to some temporary file (without altering the data) with ofstream, I get incorrect data; actually, all I get is a text file filled with Í. If I remove the Unicode part manually, everything works fine. So it seems ifstream cannot deal with streams which contain Unicode data, but if this assumption is true, is there any way to work on this file without introducing a new library to my project?
Thanks,
EDIT: Here's a sample; the program reads from a file which contains characters (some, not all) that can't be represented in ASCII.
ifstream inFile("somefile");
inFile.seekg(0, ios_base::end);
size_t size = inFile.tellg();
inFile.seekg(0, ios_base::beg);
char *book = new char[size];
inFile.read(book, size);
for (size_t i = 0; i < size; i++) {
    cout << book[i] << " " << i << endl; // book[i] will always be '='
}
ofstream outFile("TEST.txt");
outFile.write(book, size);
outFile.close();
delete[] book;
Keith Thompson's question is very important. Depending on which Unicode encoding, writing a small C routine that reads (and discards) the Unicode characters can be trivial, or slightly more complex.
Supposing the encoding is UTF-8, you will have a problem determining when to stop discarding, because ASCII is a subset of UTF-8; any time you encounter an ASCII char, you might be tempted to say "this is it, we're back in ASCII land," and the next char might still be outside the ASCII range.
So you need to read the file and determine where the last character > 127 is. Anything after that is plain ASCII -- hopefully.
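A rough sketch of that scan, assuming the file has already been read into the book/size buffer the way the question's code does:

// Find the index just past the last byte with the high bit set.
size_t start = 0;
for (size_t i = 0; i < size; i++)
{
    if (static_cast<unsigned char>(book[i]) > 127)
        start = i + 1;
}
// Everything from book[start] onward is plain ASCII -- hopefully.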
A text file is generally in just one encoding: UTF-8, UTF-16 (big or little endian), UTF-32 (big or little endian), ASCII, or another ANSI code page. Mixing encodings is only possible in some custom way.
That said, you will have to read both the data that you need and the data that you don't in the same encoding. If you know the format is UTF-8 you could, depending on what you are going to do with the data, read the file as a binary file into a char buffer piece by piece. Then you could use APIs like strnextc (on Windows; equivalent APIs must be available on other platforms) to move character by character over the buffer. Once you reach the end, you could move the balance to the front of the buffer and load the rest of the buffer from the file.
In fact, you could use the above approach in general for any encoding. But for UTF-16, you could try using wifstream, provided the endianness of the file and the platform you are running on are the same. And you need to check whether the implementation of wifstream is good at handling a change in endianness and is able to take care of the BOM (byte order mark), the 2-byte sequence ("FE FF" or "FF FE") that is generally present at the beginning of a file, let alone surrogate pairs.
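Checking for a BOM by hand is straightforward if you read the first few bytes raw. A sketch (the constants are the standard UTF-8 and UTF-16 BOM sequences; detect_bom is just an illustrative name):

#include <fstream>

// Returns a label describing the BOM found at the start of the file, if any.
const char* detect_bom(const char* path)
{
    std::ifstream f(path, std::ios::binary);
    unsigned char b[3] = {0, 0, 0};
    f.read(reinterpret_cast<char*>(b), 3);
    if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
    if (b[0] == 0xFE && b[1] == 0xFF) return "UTF-16 big endian";
    if (b[0] == 0xFF && b[1] == 0xFE) return "UTF-16 little endian";
    return "no BOM";
}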

How to get a single character from a UTF-8 encoded Urdu string written in a file?

I am working on Urdu-Hindi translation/transliteration. My objective is to translate an Urdu sentence into Hindi and vice versa. I am using Visual C++ 2010 with the C++ language. I have written an Urdu sentence in a text file saved in UTF-8 format. Now I want to get a single character, one by one, from that file so that I can work on it to convert it into its equivalent Hindi character. When I try to get a single character from the input file and write this single character to the output file, I get some unknown ugly-looking character placed in the output file. Kindly help me with proper code. My code is as follows:
#include <iostream>
#include <fstream>
#include <cwchar>
#include <cstdlib>
using namespace std;

int main()
{
    wchar_t arry[50];
    wifstream inputfile("input.dat", ios::in);
    wofstream outputfile("output.dat");
    if (!inputfile)
    {
        cerr << "File not open" << endl;
        exit(1);
    }
    // I am using this while just to make sure the copy-paste operation of
    // the written Urdu text from one file to another works; when I try to
    // pick only one character from the file, it does not work.
    while (!inputfile.eof())
    {
        inputfile >> arry;
    }
    // I want to get the Urdu character placed at each index so that I can
    // work on it to convert it into its equivalent Hindi character.
    int i = 0;
    while (arry[i] != '\0')
    {
        outputfile << arry[i] << endl;
        i++;
    }
    inputfile.close();
    outputfile.close();
    cout << "Hello world" << endl;
}
Assuming you are on Windows, the easiest way to get "useful" characters is to read a larger chunk of the file (for example a line, or the entire file), and convert it to UTF-16 using the MultiByteToWideChar function. Use the "pseudo" code page CP_UTF8. In many cases, decoding the UTF-16 isn't required, but I don't know about the languages you are referring to; if you expect non-BMP characters (with codes above 65535) you might want to consider decoding the UTF-16 (or decoding the UTF-8 yourself) to avoid having to deal with 2-word characters.
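A sketch of that conversion (Windows-only; error handling omitted, and utf8_to_utf16 is just an illustrative name):

#include <windows.h>
#include <string>

// Converts a UTF-8 buffer to a UTF-16 string via the Win32 converter.
std::wstring utf8_to_utf16(const std::string& in)
{
    // First call asks for the required length; second call does the conversion.
    int n = MultiByteToWideChar(CP_UTF8, 0, in.data(), (int)in.size(), NULL, 0);
    if (n <= 0) return std::wstring();
    std::wstring out(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, in.data(), (int)in.size(), &out[0], n);
    return out;
}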
You can also write your own UTF-8 decoder, if you prefer. It's not complicated; it just requires some bit-juggling to extract the proper bits from the input bytes and assemble them into the final Unicode value.
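The bit-juggling looks roughly like this. A minimal sketch that decodes one code point from a byte buffer; it trusts its input (no validation of continuation bytes), so real code should check more carefully:

// Decodes the UTF-8 sequence starting at s[i], advances i past it,
// and returns the Unicode code point.
unsigned decode_utf8(const unsigned char* s, size_t& i)
{
    unsigned char b = s[i++];
    if (b < 0x80) return b;                            // 1 byte: 0xxxxxxx
    int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1; // length from the lead byte
    unsigned cp = b & (0x3F >> extra);                 // payload bits of the lead byte
    while (extra-- > 0)
        cp = (cp << 6) | (s[i++] & 0x3F);              // 6 bits per continuation byte
    return cp;
}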
HINT: Windows also has a NormalizeString() function, which you can use to make sure the characters from the file are what you expect. This can be used to transform characters that have several representations in Unicode into their "canonical" representation.
EDIT: if you read up on UTF-8 encoding, you can easily see that you can read the first byte, figure out how many more bytes you need, read these as well, and pass the whole thing to MultiByteToWideChar or your own decoder (although your own decoder could just read from the file, of course). That way you could really do a "read one char at a time".
'w' classes do not read and write UTF-8. They read and write UTF-16. If your file is in UTF-8, reading it with this code will produce gibberish.
You will need to read it as bytes and then convert it, or write it in UTF-16 in the first place.

Output data not the same as input data

I'm doing some file I/O and created the test below, but I thought testoutput2.txt would be the same as testinputdata.txt after running it?
testinputdata.txt:
some plain
text
data with
a number
42.0
testoutput2.txt (in some editors it's on separate lines, but in others it's all on one line):
some plain
਍ऀ琀攀砀琀ഀഀ
data with
਍ 愀  渀甀洀戀攀爀ഀഀ
42.0
#include <fstream>
#include <vector>

int main()
{
    // Read plain text data
    std::ifstream filein("testinputdata.txt");
    filein.seekg(0, std::ios::end);
    std::streampos length = filein.tellg();
    filein.seekg(0, std::ios::beg);
    std::vector<char> datain(length);
    filein.read(&datain[0], length);
    filein.close();

    // Write data
    std::ofstream fileoutBinary("testoutput.dat");
    fileoutBinary.write(&datain[0], datain.size());
    fileoutBinary.close();

    // Read file
    std::ifstream filein2("testoutput.dat");
    std::vector<char> datain2;
    filein2.seekg(0, std::ios::end);
    length = filein2.tellg();
    filein2.seekg(0, std::ios::beg);
    datain2.resize(length);
    filein2.read(&datain2[0], datain2.size());
    filein2.close();

    // Write data
    std::ofstream fileout("testoutput2.txt");
    fileout.write(&datain2[0], datain2.size());
    fileout.close();
}
It's working fine on my side; I have run your program on VC++ 6.0 and checked the output in Notepad and MS Word. Can you specify the name of the editor where you are facing the problem?
You can't read Unicode text into a std::vector<char>. The char data type only works with narrow strings, and my guess is that the text file you're reading in (testinputdata.txt) is saved with either UTF-8 or UTF-16 encoding.
Try using the wchar_t type for your characters, instead. It is specifically designed to work with "wide" (or Unicode) characters.
Thou shalt verify thy input was successful! Although this would sort you out, you should also note that the number of bytes in the file has no direct relationship to the number of characters being read: there can be fewer characters than bytes (think of a Unicode character encoded as multiple bytes in UTF-8) or vice versa (although the latter doesn't happen with any of the Unicode encodings). All you experience is that read() couldn't read as many characters as you asked it to, but write() happily wrote the junk you gave it.
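A small sketch of that check, applied to the first read in the question's code (names unchanged; std::cerr needs <iostream>):

filein.read(&datain[0], length);
if (!filein)
{
    // read() stopped early; gcount() reports how many characters actually arrived.
    std::cerr << "only read " << filein.gcount()
              << " of " << length << " characters\n";
    datain.resize(static_cast<std::size_t>(filein.gcount()));
}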