ifstream read: where are extra bytes coming from? - c++

I am reading WHOIS record files. The first line of a sample file reads, in the editor: "id:0--0.ga"
In code, I check to verify that the first line starts with "id:" as follows:
// given ifstream * fs,
char id[3];
streampos pos = fs-> tellg();
fs -> read(&id[0],3);
fs -> seekg(pos);
if (// id[3] is "id:" ...
However, when I do this (and I am running a debugger; further it is compiled with clang rather than gcc), I get the following result in id:
The characters it read, in addition to an 'i', 'd', and ':' were:
\xb87#_?
Where the question mark has a stop sign around it. I am not sure how I could have read anything "extra," seeing as I am only reading three bytes into an array of the proper length...
Further, the if statement evaluates to true.
Could this just be a coding mistake, an error in the debugger, or is something else going on?

The debugger is assuming that id holds a null-terminated string, which it does not. id is an array of exactly three characters with no terminator, so the debugger keeps printing whatever bytes happen to follow it in memory. Nothing extra was read. You should probably just ignore the debugger when looking at things that aren't stored in formats you expect the debugger to understand.
The alternative is to mentally convert the debugger's display back into the raw memory contents and then interpret those contents in the correct format. The same area of memory that reads as "id:\xb87#_? ..." when understood as a string reads as just "id:" when understood as an array of only three characters.

C++ Null characters in string?

I want to read a txt file and convert two cells from each line to floats.
If I first run:
someString = someString.substr(1, tempLine.size());
And then:
std::stof(someString)
it only converts the first number in 'someString' to a number. The rest of the string is lost.
When I handled the string in my IDE I noticed that copying it and pasting it inside quotation marks gives me "\u00005\u00007\u0000.\u00007\u00001\u00007\u00007\u0000" and not 57.7177.
If I instead do:
std::string someOtherString = "57.7177"
std::stof(someOtherString)
I get 57.7177.
Minimal working example is:
int main() {
std::string someString = "\u00005\u00007\u0000.\u00007\u00001\u00007\u00007\u0000";
float someFloat = std::stof(someString);
return 0;
}
Same problem occurs using both UTF-8 and -16 encoding.
What is happening and what should I do differently? Should I remove the null-characters somehow?
"I want to read a txt file"
What is the encoding of the text file? "Text" is not an encoding. What I suspect is happening is that your code reads the file as if it were UTF-8 or Windows-1250 and stores the result in a std::string. From the bytes, I can see that the file is actually UTF-16BE, so you need to read it into a std::u16string. If your program will only ever run on Windows, then you can get by with a std::wstring.
You probably have followup questions, but your original question is vague enough that I can't predict what those questions would be.
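As a rough illustration of the UTF-16BE point, the sketch below turns big-endian UTF-16 bytes into a narrow string that std::stof can parse. It assumes the bytes were read in binary mode, that any byte order mark has already been skipped, and that the text is pure ASCII (digits, '.', and so on). The helper name asciiFromUtf16Be is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-16BE bytes to a narrow string.
// Only ASCII code points are accepted, which is enough for numbers.
std::string asciiFromUtf16Be(const std::string& bytes) {
    if (bytes.size() % 2 != 0)
        throw std::invalid_argument("odd number of bytes");
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        // Big-endian: the high byte of each 16-bit unit comes first.
        const unsigned char hi = static_cast<unsigned char>(bytes[i]);
        const unsigned char lo = static_cast<unsigned char>(bytes[i + 1]);
        if (hi != 0 || lo > 0x7F)
            throw std::invalid_argument("non-ASCII code unit");
        out.push_back(static_cast<char>(lo));
    }
    return out;
}
```

Passing the result to std::stof then sees "57.7177" rather than a string that starts with a null character. A general-purpose program would need a real UTF-16 decoder instead of this ASCII-only shortcut.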

What exactly is the string "^A"?

I run my code on an online judge and log the string key. Below is my code:
fprintf(stderr, "key=%s, and key.size()=%zu\n", key.c_str(), key.size()); // %zu matches size_t
But the result is this:
key=^A, and key.size()=8
I want to know what ^A represents in ASCII. ^A is 2 characters long rather than 8, yet the reported size is 8. I viewed the result in vim, and the log file is encoded as UTF-8. Why?
Your viewer is electing to show you the bytes interpreted using a character encoding of its choosing and electing to show the resulting characters in caret notation.
Other viewers could make different choices on both counts or allow you to indicate what you want. For example, a viewer might show control picture characters (␁) instead of caret notation. In caret notation, ^A stands for the single byte 0x01.
For a std::string, c_str() is terminated by an additional \x00 byte following the actual value. You often use c_str() with functions that expect a string to be \x00 terminated. This applies to fprintf. In such cases, what's read ends just before the first \x00 seen.
You have several \x00 bytes in your string, which, of course, contributes to size() but fprintf will stop right at the first one (and not count it).
I solved it myself. If you write a std::string "\x01\x00\x00\x00\x00end" to a file and open it with vim later, you will get '^A'.
This is my test code:
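// Note: this constructor takes a C string and stops at the first '\0',
// so sss holds only the single byte "\x01".
string sss("\x01\x00\x00\x00\x00end");
ofstream of("of.txt");
for (size_t i = 0; i < sss.size(); i++) {
of.put(sss[i]);
}
of.close();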
After I opened the file "of.txt", I saw "^A".
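One caveat about this test, shown as a sketch: a std::string built from a string literal through the const char* constructor stops at the first \x00, so it contains only the byte \x01. To keep the embedded zero bytes, pass the length explicitly. The literal is split before "end" because "\x00e" would otherwise be parsed as one hex escape. The function names are hypothetical.

```cpp
#include <string>

// The const char* constructor treats its argument as a C string
// and stops at the first '\0'.
inline std::string fromCString() {
    return std::string("\x01\x00\x00\x00\x00" "end");
}

// The (pointer, count) constructor copies exactly count bytes,
// embedded zero bytes included.
inline std::string withLength() {
    return std::string("\x01\x00\x00\x00\x00" "end", 8);
}
```

Writing the second string to a file puts all eight bytes on disk, and vim typically displays the zero bytes as ^@.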

Delete content in a text file between two specific characters

I'm making a simple bug tracker and am using a text file as the database. Right now I'm reading in all the information through keys and importing them into specific arrays.
for (int i = 0; i < 5; i++)
{
getline(bugDB, title[i], '#');
getline(bugDB, importance[i], '!');
getline(bugDB, type[i], '$');
getline(bugDB, description[i], '*');
}
Here is what's in my (terribly unreadable) file
Cant jump#Moderate!Bug$Every time I enter the cave of doom, I'm unable
to jump.*Horse too expensive#Moderate!Improvement$The horses cost way
too much gold, please lower the costs.*Crash on startup#Severe!Bug$I'm
crashing on startup on my Win8.1 machine, seems to be a 8.1
bug.*Floating tree at Imperial March#Minimal!Bug$There is a tree
floating about half a foot over the ground near the crafting
area.*Allow us to instance our group#Moderate!Improvement$We would
like a feature that gives us the ability to play with our groups alone
inside dungeons.*
This works great for me, but I'd like to be able to delete specific bugs. I'd be able to do this by letting the user choose a bug by number, find the corresponding * key, and delete all information until the program reaches the next * key.
I'd appreciate any suggestions, I don't know where to start here.
There is no direct mechanism for deleting a chunk of data from the middle of a file, no delete(file, start, end) function. To perform such a deletion you have to move the data that appears after the region: to delete ten bytes from the middle of a file, you'd have to move all of the subsequent bytes back by ten, looping over the data, and then truncate the file to make it ten bytes smaller.
In your case, however, you've already written code to parse the file into memory, populating your arrays. Why not just implement a function that writes the contents of the arrays back to the file? Truncate the file (open it in mode "w" rather than "w+"), then loop over the arrays writing their contents back in your preferred format, skipping the entry that you want to delete.
It is only possible by manually copying the data from the input file to an output file and leaving out the entry you want to delete.
But I strongly encourage using a small database to keep the information (look at SQLite).
Also, it's a poor bug tracker if solving a bug means "delete it from the database" (it's not even a tracker then). Give each bug a status field (open, refused, duplicate, fixed, working, ...).
Additional remarks:
1. Use one array that holds a structure with n fields, not n separate arrays.
2. Keep in mind that someone may use your delimiter characters in a description (use some uncommon character as the delimiter and replace any occurrence of it in the saved text).
Explanation for 1:
Instead of using
std::vector<std::string> title;
std::vector<int> importance;
std::vector<std::string> description;
define a structure or class and create a vector of this structure.
struct Bug{
std::string title;
int importance; // better define an enum for importance
std::string description;
};
std::vector<Bug> bugs;
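The remark about delimiter characters could look roughly like the sketch below. It assumes the ASCII unit separator (0x1F) and record separator (0x1E) as the uncommon characters, and it replaces any occurrence of them in user text with a space before saving. The names kFieldSep, kRecordSep, sanitize, and serialize are hypothetical.

```cpp
#include <algorithm>
#include <string>

struct Bug {
    std::string title;
    int importance;
    std::string description;
};

// Assumed delimiters: control characters that users are unlikely to type.
const char kFieldSep = '\x1F';   // ASCII unit separator
const char kRecordSep = '\x1E';  // ASCII record separator

// Replaces any delimiter found in user text with a space.
std::string sanitize(std::string text) {
    std::replace(text.begin(), text.end(), kFieldSep, ' ');
    std::replace(text.begin(), text.end(), kRecordSep, ' ');
    return text;
}

// Produces one record: title, importance, description, then the record separator.
std::string serialize(const Bug& bug) {
    return sanitize(bug.title) + kFieldSep + std::to_string(bug.importance) +
           kFieldSep + sanitize(bug.description) + kRecordSep;
}
```

With this layout, characters such as '#' or '*' can appear freely in titles and descriptions, and reading back can use std::getline with kFieldSep and kRecordSep as the delimiters.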

Preventing buffer overflow when using fscanf

I'm using fscanf to read some values from a CSV file and I want to ensure that the data read into the values will not be too large and cause a buffer overflow.
My csv file has the format int,string,string and my code to read is below (I will fix the while condition later):
while(fscanf(f, "%d,%[^,],%[^,]", &inArray[i].ID, inArray[i].label, inArray[i].brand)/*insert while condition here*/
When using scanf I would specify the length like so to prevent overflow: scanf("%20f", example);
But if I try the same with the above: while(fscanf(f, "%d,%20[^,],%10[^,]", &inArray[i].ID, inArray[i].label, inArray[i].brand)/*insert while condition here*/
I get a crash when the code executes.
Try fscanf_s, this function has security enhancements.
http://msdn.microsoft.com/en-us/library/6ybhk9kc(v=vs.90).aspx
You can't do that with fscanf when reading characters.
I would read the whole line first, e.g., with getline(), locate the separators (or tokenize the line), and then parse the individual elements.
Btw., the reason for your crash might also be a wrong definition/initialization of inArray.
OP likely used the wrong width in the fscanf().
Although OP did not post details about inArray[i] let's assume it was
struct {
int ID;
char label[20];
char brand[10];
} inArray[100];
The format should then be
"%d,%19[^,],%9[^,]"
The width of 19 needs to be 1 less than the size of the destination, thus allowing a spot for the '\0'.

fscanf multiple lines [c++]

I am reading in a file with multiple lines of data like this:
:100093000202C4C0E0E57FB40005D0E0020C03B463
:1000A3000105D0E0022803B40205D0E0027C03027C
:1000B30002E3C0E0E57FB40005D0E0020C0BB4011D
I am reading in values byte by byte and storing them in an array.
fscanf_s(in_file,"%c", &sc, 1); // start code (fscanf_s expects the buffer size after a %c argument)
fscanf_s(in_file,"%2X", &iByte_Count); // byte count
fscanf_s(in_file,"%4X", &iAddr); // 2 byte address
fscanf_s(in_file,"%2X", &iRec_Type); // record type
for(int i=0; i<iByte_Count; i++)
{
fscanf_s(in_file,"%2X", &iData[i]);
iArray[(iMaskedAddr/16)][iMaskedNumMove+3+i]=iData[i];
}
fscanf_s(in_file,"%2X", &iCkS);
This is working great except when I get to the end of the first line. I need this to repeat until I get to the end of the file but when I put this in a loop it craps out.
Can I force the position to the beginning of the next line?
I know I can use a stream and all that but I am dealing with this method.
Thanks for the help
My suggestion is to dump fscanf_s and use either fgets or std::getline.
That said, your issue is handling the newlines, and the next beginning of record token, the ':'.
One method is to use fscanf_s("%c") until the ':' character is read or the end of file is reached:
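char start_of_record = '\0';
// Skip newlines and anything else until the next ':' is read or reading fails (end of file).
// fscanf_s expects the buffer size after each %c argument.
while (fscanf_s(infile, "%c", &start_of_record, 1) == 1 && start_of_record != ':')
{
}
// Now process the header....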
The data the OP is reading is in a standard format for transmitting binary data (it matches the Intel HEX layout), usually used for downloading into Flash memories and EPROMs.
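For the std::getline route suggested above, a rough sketch of parsing one record from a line. It assumes the usual Intel HEX layout of start code, byte count, two-byte address, record type, data bytes, and a checksum chosen so that all bytes sum to zero modulo 256. The names HexRecord, hexDigit, and parseHexLine are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct HexRecord {
    unsigned byteCount = 0;
    unsigned address = 0;
    unsigned type = 0;
    std::vector<std::uint8_t> data;
};

// Converts one hex digit to its value, or returns -1 if it is not a hex digit.
int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Parses one ":LLAAAATTDD...CC" line; returns false on malformed input
// or when the checksum does not match.
bool parseHexLine(const std::string& line, HexRecord& rec) {
    if (line.size() < 11 || line[0] != ':') return false;
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 1; i + 1 < line.size(); i += 2) {
        const int hi = hexDigit(line[i]);
        const int lo = hexDigit(line[i + 1]);
        if (hi < 0 || lo < 0) break;         // stop at '\r' or other trailing characters
        bytes.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
    }
    // Expect count + address (2) + type + data + checksum.
    if (bytes.size() < 5 || bytes.size() != bytes[0] + 5u) return false;
    unsigned sum = 0;
    for (std::uint8_t b : bytes) sum += b;
    if (sum % 256 != 0) return false;        // checksum makes the total 0 mod 256
    rec.byteCount = bytes[0];
    rec.address = bytes[1] * 256u + bytes[2];
    rec.type = bytes[3];
    rec.data.assign(bytes.begin() + 4, bytes.end() - 1);
    return true;
}
```

A loop such as while (std::getline(file, line)) that calls parseHexLine on each line then handles the newlines and the next ':' without any manual repositioning.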
Your title clearly states that you are using C++, so, if I may, I suggest you use the standard library streams.
To read line by line, you can use ifstream::getline. But again, you are not reading the file line by line, you are reading it field by field. So, you should try using ifstream::read, which lets you choose the number of bytes to read from the stream.
UPDATE:
While doing an unrelated search over the net, I found out about a library called IOF which may help you with this task. Check it out.