How to change stream from text mode to binary in C++?


In a game I'm making I need to read a map from a file. Assuming some of the data in the beginning is written in characters, but the tile map is written in binary, I would open the file in text mode then switch it to binary mode once it reaches the tile data.
Is there an easy, or standard, way of changing an ifstream from text mode to binary mode while keeping the same position in the file?
This also applies to the writing part: I will need to start writing into the file using characters, then change to binary mode.
EDIT: I'm using text mode to make this readable and to read strings of unknown size. For example, this line:
map-name=TestMap
I'd read this with
getline( mapFile, attribute, '=' );
getline( mapFile, mapName, '\n' );
How would I read this in binary mode if there won't be newline characters?

The mode is established when the file is opened, and cannot be
changed later. If there is any binary data in the file, you
must use binary mode. But where is the problem? You can read
text in binary mode; line endings might appear a bit strange
(but not if you also wrote it in binary mode), but otherwise,
there should be no problem as long as the binary data actually
is text.
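A minimal sketch of this approach, assuming a hypothetical layout in which two "key=value" text lines are followed by raw tile bytes. The names MapData, writeMap and readMap, the tile-count attribute, and the layout itself are illustrative assumptions, not something from the question. Both functions open the file in binary mode, and the text header is still read with getline:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical layout: "map-name=<name>\n", "tile-count=<n>\n", then n raw bytes.
struct MapData {
    std::string name;
    std::vector<std::uint8_t> tiles;
};

bool writeMap(const std::string& path, const MapData& map) {
    std::ofstream out(path, std::ios::binary);  // binary mode for the whole file
    if (!out) return false;
    // Text header: '\n' is written as a single byte because of binary mode.
    out << "map-name=" << map.name << '\n';
    out << "tile-count=" << map.tiles.size() << '\n';
    // Raw tile data follows the header directly.
    out.write(reinterpret_cast<const char*>(map.tiles.data()),
              static_cast<std::streamsize>(map.tiles.size()));
    out.flush();
    return static_cast<bool>(out);
}

bool readMap(const std::string& path, MapData& map) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::string attribute;
    std::size_t count = 0;
    std::getline(in, attribute, '=');   // skips "map-name"
    std::getline(in, map.name, '\n');   // reads the name up to the newline
    std::getline(in, attribute, '=');   // skips "tile-count"
    in >> count;                        // formatted read of the number
    in.get();                           // consume the single '\n' after it
    if (!in) return false;
    map.tiles.resize(count);
    in.read(reinterpret_cast<char*>(map.tiles.data()),
            static_cast<std::streamsize>(count));
    return in.gcount() == static_cast<std::streamsize>(count);
}
```

Because the writer also used binary mode, the header lines end in a plain '\n' on every platform, so the getline calls from the question work unchanged.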

If you are responsible for writing the files as well, the simplest (and perhaps sanest) solution might be to write two files.
One in text, for text you wish to be human readable.
And the second as binary, for things like maps. In fact that way you could have one binary map file for each map.

Related

How to read .inp file in c++?

I have a dataset in a ".inp" format file, and I need to read this file in C++. However, the fopen()/fread() approach failed and read the wrong data (e.g. the first integer should be 262144, but fread yields a much larger integer).
To be more specific, my ".inp" file contains a few integers and floating-point numbers. How can I read them successfully in C++?
This is the screenshot of the "*.inp" file from Notepad++. Basically this is a text file.

I solved it by copying the data into a .txt file. However, I still don't know how to read "*.inp" directly.

I found some information about the INP file extension. It seems there are multiple variants of it, each meant for a different purpose. Where is your file coming from? As for a solution, if you can't open the file normally with fopen/fstream, you could treat it as binary and read each value in the way you specify. Other than that, I could think of calling system functions to get the file contents (like cat on Linux, for example); then, if there are some random characters, you could parse the string to omit them.
Here is example of how to call cat in C++:
Simple way to call 'cat' from c++?
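If the file really is plain text, as the Notepad++ screenshot suggests, fread() copies the digit characters as raw bytes instead of parsing them, which would explain the unexpectedly large integer. A minimal sketch of formatted reading instead, assuming the values are separated by whitespace (the function name readNumbers is hypothetical, and the actual layout of the asker's file is unknown):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads every whitespace-separated number from a plain-text file.
// Reading stops at the first token that is not a number, or at end of file.
std::vector<double> readNumbers(const std::string& path) {
    std::vector<double> values;
    std::ifstream in(path);  // the extension (.inp, .txt, ...) does not matter
    double value = 0.0;
    while (in >> value) {
        values.push_back(value);
    }
    return values;
}
```

Integers such as 262144 are represented exactly in a double, so a caller can convert them back when the position of each field is known.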

What is Eol in text file and normal file?

Now I am quite confused about the end of line character I am working with c++ and I know that text files have a end of line marker which sets the limit for reading a line which a single shifing operator(>>).Data is read continously untill eol character does not apprears and while opening a file in text mode carriage return(CR) is converted into CRLF which is eol marker so if i add white spaces in my text then would it act as eol maker cause it does.
Now i created a normal file i.e. a file without .txt
eg
ifstream("test"); // No .txt
Now what is eol marker in this case

The ".txt" at the end of the filename is just a convention. It's just part of the filename.
It does not signify any magical property of the file, and it certainly doesn't change how the file is handled by your operating system kernel or file system driver.
So, in short, what difference is there? None.
I know that text files have a end of line marker which sets the limit for reading a line which a single shifing operator(>>)
That is incorrect.
Data is read continously untill eol character does not apprears
Also incorrect. Some operating systems (e.g. Windows IIRC) inject an EOF (not EOL!) character into the stream to signify to calling applications that there is no more data to read. Other operating systems don't even do that. But in neither case is there an actual EOF character at the end of the actual file.
while opening a file in text mode carriage return(CR) is converted into CRLF which is eol marker
That conversion may or may not happen and, either way, EOL is not EOF.
if i add white spaces in my text then would it act as eol maker cause it does.
That's a negative, star command.
I'm not sure where you're getting all this stuff from, but you've been heavily mistaught. I suggest a good, peer-reviewed, well-recommended book from Amazon about how computer operating systems work.

When reading strings in C++ using the extraction operator >>, the default is to skip spaces.
If you want the entire line verbatim, use std::getline.
A typical input loop is:
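#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::string text_from_file;
    std::ifstream input_file("My_data.txt");
    if (!input_file)
    {
        std::cerr << "Error opening My_data.txt for reading.\n";
        return EXIT_FAILURE;
    }
    // operator>> skips leading whitespace and reads one word at a time.
    while (input_file >> text_from_file)
    {
        // Process the variable text_from_file.
    }
    return EXIT_SUCCESS;
}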
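To make the difference between the two concrete, here is a small sketch that splits the same text once with operator>> and once with std::getline. The helper names splitWords and splitLines are made up for illustration, and a string stream stands in for the file:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits on any whitespace (spaces, tabs, newlines), as operator>> does.
std::vector<std::string> splitWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}

// Splits only on '\n'; spaces inside a line are kept verbatim.
std::vector<std::string> splitLines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

For the input "map name = Test Map\nsecond line\n", splitWords yields seven words while splitLines yields two lines, the first being "map name = Test Map".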

A lot of old and mainframe operating systems required a record structure of all data files which, for text files, originated with a Hollerith (punch) card of 80 columns and was faithfully preserved through disk file records, magnetic tapes, output punch card decks, and line printer lines. No line ending was used because the record structure required that every record have 80 columns (and were typically filled with spaces). In later years (1960s+), having variable length records with an 80 column maximum became popular. Today, even OpenVMS still requires the file creator to specify a file format (sequential, indexed, or "stream") and record size (fixed, variable) where the maximum record size must be specified in advance.
In the modern era of computing (which effectively began with Unix) it is widely considered a bad idea to force a structure on data files. Any programmer is free to impose one on their own files, and there are plenty of record-oriented data formats, such as compiler/linker object files (.obj, .so, .o, .lib, .exe, etc.) and most media formats (.gif, .tiff, .flv, .mov, .mp3, etc.).
For communicating text lines, the paradigm is to target a terminal or printer and for that, line endings should be indicated. Most operating systems environments (except MSDOS and Windows) use the \n character which is encoded in ASCII as a linefeed (ASCII 10) code. MSDOS and ilk use '\r\n' which are encoded as carriage return then linefeed (ASCII 13, 10). There are advantages and disadvantages to both schemes. But text files may also contain other controls, most commonly the ANSI escape sequences which control devices in specific ways:
clear the screen, either in part or all of it
eject a printer page, skip some lines, reverse feed, and other little-used features
establish a scrolling region
change the text color
selecting a font, text weight, page size, etc.
For these operations, line endings are not a concern.
Also, data files encoded in ASCII such as JSON and XML (especially HTML with embedded Javascript), might not have any line endings, especially when the data is obfuscated or compressed.
To answer your questions:
I am quite confused about the end of line character I am working with c++ and I know that text files have a end of line marker
Maybe. Maybe not. From a C or C++ program's viewpoint, writing \n indicates to the runtime environment the end of a line. What the system does with that varies by runtime operating environment. For Unix and Linux, no translation occurs (though writing to a terminal-like device converts to \r\n). In MSDOS, '\n' is translated to \r\n. In OpenVMS, '\n' is removed and that record's size is set. Reading does the inverse translation.
which sets the limit for reading a line which a single shifing operator(>>).
There is no such limit: A program can choose to read data byte-by-byte if it wants as well as ignore the line boundaries.
The "shifting operators" are overloaded for filestreams to input or output data but are not related to bit twiddling shifts. These operators were chosen for visual approximation of input/output and due to their low operator precedence.
Data is read continously untill eol character does not apprears
This bit is confusing: I think you meant until eol character appears, which is indeed how the line-oriented functions gets() and fgets() work.
and while opening a file in text mode carriage return(CR) is converted into CRLF which is eol marker so if i add white spaces in my text then would it act as eol maker cause it does.
Opening the file does not convert anything, but reading from a file might. However, no environment (that I know of) converts input to CR LF. MSDOS converts CR LF on input to \n.
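One way to see what the runtime actually stores is to write a line in text mode and then read the same file back in binary mode. This is only an illustrative sketch (the function name rawBytesOfTextLine is hypothetical), and the result depends on the platform:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Writes "A\n" in text mode, then returns the file's raw bytes.
std::string rawBytesOfTextLine(const std::string& path) {
    {
        std::ofstream out(path);  // text mode: '\n' may be translated on output
        out << "A\n";
    }                             // stream closed here, data flushed to disk
    std::ifstream in(path, std::ios::binary);  // binary mode: no translation
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

On Unix-like systems the returned string should be the two bytes 'A' and '\n'; with a Windows runtime it should be the three bytes 'A', '\r', '\n'.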
Adding spaces has no effect on end of lines, end of file, or anything. Spaces are just data. However, the C++ streaming operations reading/writing numbers and some other datatypes use whitespace (a sequence of spaces, horizontal tabs, vertical tabs, form feed, and maybe some others) as a delimiter. This convenience feature may cause some confusion.
Now i created a normal file i.e. a file without .txt eg
ifstream("test"); // No .txt
Now what is eol marker in this case
The filename does not determine the file type. In fact, file.txt may not be a text file at all. Using a particular file extension is convenient for humans to communicate a file's purpose, but it is not obligatory.

How to read large files in C++ with mixed text and binary

I need to read a large file of either text, binary, or combination, such as a JPEG file, encrypt it, and write it to a file. At some later time I will need to read the encrypted data, and decrypt it.
The end goal is to verify that the decrypted data matches the original data.
My problem is that with large files greater than 1Meg, I don't want to read and write character by character. I am targeting this code for a phone and I/O will cause too long a delay for the user.
With a pure text file, using fread() and fwrite() convert the data to binary, and the result is different than the original. With a jpeg image, it appears that there is some textual content mixed in with the binary data.
Is there a way to efficiently read in an arbitrary type of file and write it back in the original format?
Or is character by character the only option?
Or am I still out of luck?

After debugging it turned out that the decrypt function had the plain text and cipher text buffers assigned backwards. After swapping the buffer assignments, the decrypted results matched the original data. I originally thought that maybe reading the text as binary and then rewriting as binary would not appear as text, but I was wrong.
Reading the entire file as binary works just fine.
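For large files, reading in binary mode also avoids character-by-character I/O, because the data can be moved in fixed-size chunks. The following is a sketch only: the function name copyFileInChunks and the 64 KiB default are arbitrary choices, and the comment marks where an encryption or decryption step (not shown, and not part of the original post) would go:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Copies a file of any content (text, JPEG, ...) byte for byte, chunk by chunk.
bool copyFileInChunks(const std::string& from, const std::string& to,
                      std::size_t chunkSize = 64 * 1024) {
    if (chunkSize == 0) return false;
    std::ifstream in(from, std::ios::binary);
    if (!in) return false;
    std::ofstream out(to, std::ios::binary);
    if (!out) return false;

    std::vector<char> buffer(chunkSize);
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // may be short on the last chunk
        if (got > 0) {
            // A transformation of buffer[0..got) would be applied here.
            out.write(buffer.data(), got);
        }
    }
    out.flush();
    // Success means the input ended at end-of-file and every write succeeded.
    return in.eof() && static_cast<bool>(out);
}
```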

Binary output to file in random positions in C++

I'm beginning to feel awkwardly stupid, but I have a problem with outputting binary data to a file.
I have a file, let's say, 1000 bytes long. What I would like to do, in C++, is simply opening the file, replace ONE byte in a given position (let's say, the i-th byte), and close it.
File before operation:
AAAAAA
File after operation:
AAABAA
What is the easiest way to do so? I tried to open it with ofstream.open, using the following modes:
ios::binary|ios::out
ios::binary|ios::app
ios::binary|ios::ate
All of these affected the actual size of the file after the operation. What should I do? I'm beginning to feel desperate.
Thank you very much and merry christmas to everybody!
Matteo

Besides binary mode, you need to open it in both out and in modes. This corresponds to the fopen mode "r+b", which opens a file for reading and writing and doesn't truncate the file if it exists (in fact, it must exist or you will fail to open the file).
References:
std::basic_filebuf::open
std::fopen
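A minimal sketch of that combination of modes, using a hypothetical helper named replaceByte. It assumes the position lies inside the existing file; seeking past the end and writing there would grow the file:

```cpp
#include <fstream>
#include <string>

// Overwrites the single byte at `position` without changing the file size.
bool replaceByte(const std::string& path, std::streamoff position, char value) {
    // in | out | binary corresponds to fopen's "r+b": no truncation,
    // and opening fails if the file does not exist.
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;
    file.seekp(position);  // move the write position to the target byte
    file.put(value);       // overwrite exactly one byte
    file.flush();
    return static_cast<bool>(file);
}
```

With the file from the question containing AAAAAA, calling replaceByte(path, 3, 'B') should leave AAABAA.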

getline() text with UNIX formatting characters

I am writing a C++ program which reads lines of text from a .txt file. Unfortunately the text file is generated by a twenty-something year old UNIX program and it contains a lot of bizarre formatting characters.
The first few lines of the file are plain, English text and these are read with no problems. However, whenever a line contains one or more of these strange characters mixed in with the text, that entire line is read as characters and the data is lost.
The really confusing part is that if I manually delete the first couple of lines so that the very first character in the file is one of these unusual characters, then everything in the file is read perfectly. The unusual characters obviously just display as little ascii squiggles -arrows, smiley faces etc, which is fine. It seems as though a decision is being made automatically, without my knowledge or consent, based on the first line read.
Based on some googling, I suspected that the issue might be with the locale, but according to the visual studio debugger, the locale property of the ifstream object is "C" in both scenarios.
The code which reads the data is as follows:
//Function to open file at location specified by inFilePath, load and process data
int OpenFile(const char* inFilePath)
{
    string line;
    ifstream codeFile;
    //open text file
    codeFile.open(inFilePath,ios::in);
    //read file line by line
    while ( codeFile.good() )
    {
        getline(codeFile,line);
        //check non-zero length
        if (line != "")
            ProcessLine(&line[0]);
    }
    //close file
    codeFile.close();
    return 1;
}
If anyone has any suggestions as to what might be going on or how to fix it, they would be very welcome.

From reading about your issues it sounds like you are reading in binary data, which will cause getline() to throw out content or simply skip over the line.
You have a couple of choices:
1. If you simply need lines from the data file, you can first sanitise them by removing all non-printable characters (that is the "official" name for those weird ASCII characters). On UNIX a tool such as strings would help you with that process.
You can of course also do this programmatically in your code by reading in X amount of data, storing it in a string, and then removing those characters that fall outside of the standard ASCII character range. This will most likely cause you to lose any Unicode that may be stored in the file.
2. Change your program to understand the format: basically, write a parser that allows you to parse the document in a more sane way.
If you can, I would suggest trying solution number 1, simply to see if the results are sane and can still be used. You mention that this is medical data; do you perchance know what file format this is? If you are trying to find out and have access to a UNIX/Linux machine, you can use the utility file, and maybe it can give you a clue (worst case, it will tell you it is simply data).
If possible, try getting a "clean" file that you can post the hex dump of, so that we can try to provide better help than what we are currently providing. By clean I mean that there is no personally identifying information in the file.
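The programmatic variant of the first option could look roughly like this. It is a sketch, not a tested fix for the asker's file: the function name stripNonPrintable is hypothetical, keeping tabs is an arbitrary choice, and in the default "C" locale every byte outside printable ASCII is dropped, including any multi-byte Unicode text:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a copy of `text` with all non-printable bytes removed (tabs are kept).
std::string stripNonPrintable(std::string text) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](unsigned char c) {
                                  // The unsigned char parameter avoids undefined
                                  // behaviour in std::isprint for negative chars.
                                  return !std::isprint(c) && c != '\t';
                              }),
               text.end());
    return text;
}
```

Each chunk or line read from the file would be passed through this function before further processing.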
For number 2, open the file in binary mode. You mentioned using Windows: there, binary and non-binary files in std::fstream objects are handled differently, whereas on UNIX systems this is not the case (on most systems; I'm sure I'll get a comment regarding the one system that doesn't match this description).
codeFile.open(inFilePath,ios::in);
would become
codeFile.open(inFilePath, ios::in | ios::binary);
Instead of getline() you will want to become intimately familiar with .read() which will allow unformatted operations on the ifstream.
Reading will be like this:
// This code has not been tested!
char input[1024];
codeFile.read(input, sizeof input);
std::streamsize actual_read = codeFile.gcount();
// Here you can process input, up to a maximum of actual_read characters.
// ProcessLine() is no longer appropriate: we didn't necessarily read a line!
ProcessData(input, actual_read);
The other thing as mentioned is that you can change the locale for the current stream and change the separator it considers a new line, maybe this will fix your issue without requiring to use the unformatted operators:
imbue the stream with a new locale that only knows about the newline. This method may or may not let your getline() function without issues.
