I have a binary file with the following repeating format: 6 float values + 3 unsigned char (byte, integer value from 0 to 255) values.
I am parsing it like this:
FILE *file = fopen("file.bin", "r");
bool valid = true;
while (!feof(file)) {
    float vals[6];
    valid = valid && (fread((void*)(&vals), sizeof(float), 6, file) == 6);
    unsigned char a, b, c;
    a = fgetc(file); b = fgetc(file); c = fgetc(file);
    (...)
}
This works fine for the first 30 iterations or so, but after that it simply stops parsing (way way before the end of the file).
What could be wrong?
I also tried parsing the unsigned char bytes with
fread((void*)&(a), sizeof(unsigned char), 1, file);
but it stops parsing in the same way, way before the end of the file.
You and the C Standard library are having a difference of opinion about where the end of the file is. The ASCII EOF character (on DOS/Windows: decimal 26, hex 1A, aka Ctrl+Z; on Unix/Linux: decimal 4, hex 04, aka Ctrl+D) is a control character meaning "end of file". There's also the file length stored in the filesystem's metadata.
The C stdio functions can operate in several modes: text, default, and binary. These control several behaviors:
- Newline translations (implementation-defined): enabled in text mode, disabled in binary mode, default: ???
- End of file: implementation-defined, but usually the EOF character in text mode and the filesystem's file length in binary mode, default: ???
Since your file contains binary data, you should force binary mode by using "b" in the mode string to fopen, e.g.
FILE* file = fopen("file.bin", "rb");
When you do so, characters with value 26 are treated like any other byte and lose their "EOF" meaning.
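For completeness, a minimal sketch of the corrected loop (a sketch only, error handling trimmed; note also that feof only reports EOF after a read has already failed, so the fread return values drive the loop here):
#include <stdio.h>

int main(void)
{
    FILE *file = fopen("file.bin", "rb");  /* "b" forces binary mode */
    if (file == NULL)
        return 1;

    float vals[6];
    unsigned char bytes[3];
    /* fread returns the number of items actually read; stop as soon
     * as a full record can no longer be obtained. */
    while (fread(vals, sizeof(float), 6, file) == 6
        && fread(bytes, 1, 3, file) == 3)
    {
        /* ... process vals[0..5] and bytes[0..2] ... */
    }
    fclose(file);
    return 0;
}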
Related
Basically, I am trying to read the binary data of a file using fread() and print it on screen using printf(). The problem is that when it prints, it doesn't show binary 1s and 0s but symbols I don't recognize.
This is how I am doing it:
#include <stdio.h>
#include <windows.h>
int main(){
    size_t sizeForB, sizeForT;
    char ForBinary[BUFSIZ], ForText[BUFSIZ];

    char RFB [] = "C:\\users\\(Unknown)\\Desktop\\hi.mp4" ; // Step 1
    FILE *ReadBFrom = fopen(RFB , "rb" );
    if(ReadBFrom == NULL){
        printf("Following File were Not found: %s", RFB);
        return -1;
    } else {
        printf("Following File were found: %s\n", RFB); // Step 2
        while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){ // Step 1
            printf("%s", ForBinary);
        }
        fclose(ReadBFrom);
    }
    return 0;
}
I would really appreciate it if someone could help me read the actual binary data of the file as binary (0s and 1s).
while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){
    printf("%s", ForBinary);
}
This is wrong on many levels. First of all, you said it is a binary file, which means it may not contain text at all, yet you are using the %s format specifier, which is for printing null-terminated strings. And even if there were text inside this file, you cannot be sure that fread read a "complete" null-terminated string that you could pass to printf with %s.
What you may want to do is read each byte from the file, convert it to a binary representation (google how to convert an integer to a binary string), and print that binary representation for each byte.
Basically pseudocode:
foreach (byte b in FileContents)
{
string s = convertToBinary(b);
println(s);
}
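A minimal C sketch of that pseudocode (the convertToBinary helper is just an illustration, printing the most significant bit first; the file name is taken from the question):
#include <stdio.h>

/* Illustrative helper: fill s with the 8-bit binary representation
 * of b, most significant bit first. */
static void convertToBinary(unsigned char b, char s[9])
{
    for (int i = 0; i < 8; ++i)
        s[i] = (b & (0x80 >> i)) ? '1' : '0';
    s[8] = '\0';
}

int main(void)
{
    FILE *f = fopen("C:\\users\\(Unknown)\\Desktop\\hi.mp4", "rb");
    if (f == NULL)
        return -1;

    int ch;
    char bits[9];
    while ((ch = fgetc(f)) != EOF) {
        convertToBinary((unsigned char)ch, bits);
        printf("%s\n", bits);
    }
    fclose(f);
    return 0;
}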
How to view files in binary in the terminal?
Either
"hexdump -C yourfile.bin" perhaps, unless you want to edit it of course. Most linux distros have hexdump by default (but obviously not all).
or
xxd -b file
To simply read a file and print it in binary (ones and zeros), read it one char at a time. Then for each bit, print a '0' or '1'. You can print the most or least significant bit first; I suggest MSb first.
// Fragment; requires <stdio.h> and <limits.h> (for CHAR_BIT).
if (ReadBFrom) {
    int ch;
    while ((ch = fgetc(ReadBFrom)) != EOF) {
        unsigned mask = 1u << (CHAR_BIT - 1); // CHAR_BIT is typically 8
        while (mask) {
            putchar(mask & ch ? '1' : '0');
            mask >>= 1;
        }
    }
    fclose(ReadBFrom);
}
When reading and parsing a CSV-file line, I need to process the NUL character that appears as the value of some row fields. It is complicated by the fact that sometimes the CSV file is in windows-1250 encoding, sometimes in UTF-8, and sometimes in UTF-16. Because of this, I started down one path and only later ran into the NUL character problem -- see below.
Details: I need to clean CSV files from a third party into the form common to our data extractor (that is, the utility works as a filter, converting one CSV form to another).
My initial approach was to open the CSV file in binary mode and check whether the first bytes form a BOM. I know that all the Unicode files I am given start with a BOM; if there is no BOM, I know the file is in windows-1250 encoding.
The converted CSV file should use the windows-1250 encoding. So, after checking the input file, I open it using the corresponding mode, like this:
// Open the file in binary mode first to see whether BOM is there or not.
FILE * fh{ nullptr };
errno_t err = fopen_s(&fh, fnameIn.string().c_str(), "rb"); // const fs::path & fnameIn
assert(err == 0);
vector<char> buf(4, '\0');
fread(&buf[0], 1, 3, fh);
::fclose(fh);

// Set the isUnicode flag and open the file according to that.
string mode{ "r" };      // init
bool isUnicode = false;  // pessimistic init

if (buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) // UTF-8 BOM
{
    mode += ", ccs=UTF-8";
    isUnicode = true;
}
else if ((buf[0] == 0xFE && buf[1] == 0xFF)     // UTF-16 BE BOM
         || (buf[0] == 0xFF && buf[1] == 0xFE)) // UTF-16 LE BOM
{
    mode += ", ccs=UNICODE";
    isUnicode = true;
}

// Open in the suitable mode.
err = fopen_s(&fh, fnameIn.string().c_str(), mode.c_str());
assert(err == 0);
After the successful open, each input line is read either via fgets or via fgetws, depending on whether Unicode was detected. The idea was then to convert the buffer content from Unicode to 1250 if Unicode was detected earlier, or otherwise leave the buffer untouched. The s variable should contain the string in the windows-1250 encoding. ATL::CW2A(buf, 1250) is used when conversion is needed:
const int bufsize = 4096;
wchar_t buf[bufsize];

// Read the line from the input according to the isUnicode flag.
while (isUnicode ? (fgetws(buf, bufsize, fh) != NULL)
                 : (fgets(reinterpret_cast<char*>(buf), bufsize, fh) != NULL))
{
    // If the input is in Unicode, convert the buffer content
    // to the string in cp1250. Otherwise, do not touch it.
    string s;
    if (isUnicode) s = ATL::CW2A(buf, 1250);
    else           s = reinterpret_cast<char*>(buf);
    ...
    // Now process the characters of `s` to form the output file.
}
It worked fine... until a file appeared with a NUL character used as a value in a row. The problem is that when the s variable is assigned, the NUL cuts off the rest of the line. In the observed case it happened with a file that used the 1250 encoding, but it can probably also happen in the UTF-encoded files.
How to solve the problem?
The NUL character problem is solved by using either C++ or Windows functions. In this case, the easiest solution is MultiByteToWideChar, which accepts an explicit string length, precisely so it doesn't stop on NUL.
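For the wide-to-cp1250 direction used in the question, the counterpart WideCharToMultiByte takes the same kind of explicit length. A minimal sketch, assuming the caller already knows the true length of the wide buffer (the WideToCp1250 helper name is purely illustrative):
#include <windows.h>
#include <string>

// Illustrative helper: convert len wide characters (possibly containing
// embedded NULs) to a windows-1250 string. Passing the explicit length
// instead of -1 is what keeps the conversion from stopping at the first NUL.
std::string WideToCp1250(const wchar_t* buf, int len)
{
    // First call: ask how many narrow bytes the result needs.
    int needed = WideCharToMultiByte(1250, 0, buf, len,
                                     nullptr, 0, nullptr, nullptr);
    if (needed <= 0)
        return std::string();
    std::string s(needed, '\0');
    // Second call: perform the conversion into the string's storage.
    WideCharToMultiByte(1250, 0, buf, len, &s[0], needed,
                        nullptr, nullptr);
    return s; // s.size() == needed; embedded NULs are preserved
}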
I need some explanation of file encodings when using g++ on Linux.
I have a simple program:
int main ()
{
    FILE * pFile;
    char buffer[] = { 'x' , 'y' , 'z' , 'é' };
    pFile = fopen ("myfile", "wt, ccs=UTF-8");
    //pFile = fopen ("myfile", "wt");
    fwrite (buffer , sizeof(char), sizeof(buffer), pFile);
    fclose (pFile);
    return 0;
}
Even with the "ccs=UTF-8" part added to the fopen line, the output file of this program is always encoded in ISO-8859-1. However, if I create a file containing these characters using vi on Linux, the resulting file is UTF-8 encoded (I use the command "file myfile" to see the encoding of the file, and "xxd -b myfile" confirms this behavior).
So I would like to understand:
1- Why doesn't g++ on Linux create a UTF-8 file by default?
2- What is the aim of ccs=UTF-8 if the file created is not encoded in UTF-8?
3- How can I create a UTF-8 file based on this simple code?
Thanks.
Your file may appear to be in ISO-8859-1, but it's actually not. It's simply broken.
Your file contains the byte A9, which is the lower byte of the UTF-8 representation of é.
When you wrote 'é', the compiler should have warned you:
aaa.c:4:38: warning: multi-character character constant [-Wmultichar]
char buffer[] = { 'x' , 'y' , 'z' ,'é' };
^
char is not a type for a character, it's a type for one byte. GCC treats multi-character constants like 'é' (two bytes in UTF-8 source) as big-endian integers. Here it is immediately converted to char, leaving only the lowest byte: A9.
(BTW, é in ISO-8859-1 is E9, not A9)
You open your file with an encoding, but then you write raw bytes into it. Those bytes correspond to the ISO-8859-1 characters xyz©.
If you want to write characters, not bytes, then use wchar_t instead of char and fputws instead of fwrite:
#include <stdio.h>
#include <wchar.h>

int main ()
{
    FILE * pFile;
    // note the final zero and the L indicating a wchar_t literal
    wchar_t buffer[] = { 'x' , 'y' , 'z' , L'é' , 0 };
    // note: no space before ccs
    pFile = fopen ("myfile", "wt,ccs=UTF-8");
    fputws(buffer, pFile);
    fclose (pFile);
    return 0;
}
I'm trying to write to disk an array containing 11.26 million uint16_t values. The total memory size should be ~22 MB. However, the size of my file is 52 MB. I'm using fprintf to write the array to disk. I thought maybe the values were being promoted, so I tried to be explicit, but it seems to make no difference; the size of my file is stubbornly unchanged.
What am I doing wrong? Code follows.
#define __STDC_FORMAT_MACROS
...
uint32_t dbsize_ = 11262336;
uint16_t* db_ = new uint16_t[dbsize_];
...
char fname[256] = "foo";
FILE* f = fopen(fname, "wb");
if(f == NULL)
{
    return;
}
fprintf(f, "%i\t", dbsize_);
for(uint32_t i = 0; i < dbsize_; i++)
{
    fprintf(f, "%" SCNu16 "", db_[i]);
}
fclose(f);
You're writing ASCII to your file, not binary.
Try writing your array like this instead of using fprintf in a loop.
fwrite(db_, sizeof(db_[0]), dbsize_, f);
fprintf always formats numbers and other types to text, whether you've opened the file in binary mode or not. Binary mode just keeps the runtime from doing things like converting \n to \r\n.
fprintf will convert your number to a series of ASCII characters and write them to the file. Depending on its value, a 32-bit int will be from 1 to 10 characters long when expressed as a string. You need to use fwrite to write raw binary values to a file.
The source of confusion is likely to be that the "b" in FILE* f = fopen(fname, "wb"); does not do what you think it does.
Most significantly, it doesn't change any of the print or scan statements to use binary values instead of ASCII values. Like others have said - use fwrite instead.
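To make the whole approach concrete, here is a minimal sketch of the round trip (names follow the question, error handling trimmed; the count is written raw with fwrite too, so the entire file stays binary, and reading it back simply mirrors the writes):
#include <cstdio>
#include <cstdint>

int main()
{
    uint32_t dbsize_ = 11262336;
    uint16_t* db_ = new uint16_t[dbsize_];
    for (uint32_t i = 0; i < dbsize_; i++)
        db_[i] = static_cast<uint16_t>(i); // sample data

    // Write: 4 raw bytes for the count, then 2 raw bytes per value
    // (about 22.5 MB total, matching the expected size).
    FILE* f = fopen("foo", "wb");
    fwrite(&dbsize_, sizeof dbsize_, 1, f);
    fwrite(db_, sizeof db_[0], dbsize_, f);
    fclose(f);

    // Read back: mirror the writes exactly.
    f = fopen("foo", "rb");
    uint32_t n = 0;
    fread(&n, sizeof n, 1, f);
    uint16_t* copy = new uint16_t[n];
    fread(copy, sizeof copy[0], n, f);
    fclose(f);

    delete[] db_;
    delete[] copy;
    return 0;
}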
I have a .dat (binary) file that I wish to convert to an ASCII (txt) file using C++, but I am very new to C++ programming. So far I have just opened my two files, myBinaryFile and myTxtFile, but I don't know how to read the data from the dat file and then write it into the new txt file. I want to write C++ code that takes a binary dat file as input and converts it to ASCII txt in an output file. If this is possible, please help me write this code. Thanks.
Sorry for asking the same question again, but I still haven't solved my problem. I will explain it more clearly: I have a txt file called "A.txt" that I want to convert into a binary file (B.dat), and vice versa. Two questions:
1. How to convert "A.txt" into "B.dat" in C++.
2. How to convert "B.dat" into "C.txt" in C++ (I need to convert the result of the 1st step back into a new ASCII file).
My text file looks like this (no header):
1st line: 1234.123 543.213 67543.210 1234.67 12.000
2nd line: 4234.423 843.200 60543.232 5634.60 72.012
It has more than 1000 lines in a similar style (5 columns per line).
Since I don't have experience in C++, I am struggling here and need your help. Many thanks.
All files are just a stream of bytes. You can open files in binary mode or in text mode; the latter simply means that it may get extra newline handling.
If you want your text file to contain only safe, human-readable characters, you could do something like base64 encode your binary data before saving it in the text file.
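A minimal sketch of the base64-encoding idea (standard alphabet with '=' padding; error handling trimmed):
#include <cstdio>
#include <cstddef>

// Encode len bytes from in to out as standard base64.
void base64_encode(const unsigned char* in, std::size_t len, FILE* out)
{
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::size_t i = 0;
    // Each full 3-byte group becomes 4 output characters.
    for (; i + 2 < len; i += 3) {
        fputc(b64[in[i] >> 2], out);
        fputc(b64[((in[i] & 0x03) << 4) | (in[i+1] >> 4)], out);
        fputc(b64[((in[i+1] & 0x0f) << 2) | (in[i+2] >> 6)], out);
        fputc(b64[in[i+2] & 0x3f], out);
    }
    // One or two leftover bytes get '=' padding.
    if (len - i == 1) {
        fputc(b64[in[i] >> 2], out);
        fputc(b64[(in[i] & 0x03) << 4], out);
        fputs("==", out);
    } else if (len - i == 2) {
        fputc(b64[in[i] >> 2], out);
        fputc(b64[((in[i] & 0x03) << 4) | (in[i+1] >> 4)], out);
        fputc(b64[(in[i+1] & 0x0f) << 2], out);
        fputc('=', out);
    }
}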
Very easy:
1. Create (open) the target or destination file.
2. Open the source file in binary mode, which prevents the OS from translating the content.
3. Read an octet (byte) from the source file; unsigned char is a good variable type for this.
4. Write the octet to the destination using your favorite conversion: hex, decimal, etc.
5. Repeat from step 3 until the read fails.
6. Close all files.
Research these keywords: ifstream, ofstream, hex modifier, dec modifier, istream::read, ostream::write.
There are utilities and applications that already perform this operation. On the *nix and Cygwin side, try od (octal dump) and pipe the contents to a file. On MS-DOS systems there is the debug utility.
A popular format is:
AAAAAA bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb cccccccccccccccc
where:
AAAAAA -- Offset from beginning of file in hexadecimal or decimal.
bb -- Hex value of byte using ASCII text.
c -- Character representation of byte, '.' if the value is not printable.
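A short sketch of a program producing roughly that layout with the iostream hex/setw/setfill modifiers (16 bytes per line; the file name and the use of isprint for the character column are assumptions):
#include <fstream>
#include <iostream>
#include <iomanip>
#include <cctype>

int main()
{
    std::ifstream in("binary.dat", std::ios::binary);
    char line[16];
    for (unsigned long offset = 0; in; offset += 16) {
        in.read(line, sizeof line);
        std::streamsize got = in.gcount();
        if (got == 0) break;
        // AAAAAA: offset from the beginning of the file, in hex.
        std::cout << std::hex << std::setfill('0')
                  << std::setw(6) << offset;
        // bb: hex value of each byte.
        for (std::streamsize i = 0; i < got; ++i)
            std::cout << ' ' << std::setw(2)
                      << (static_cast<unsigned>(line[i]) & 0xffu);
        // c: character representation, '.' if not printable.
        std::cout << ' ';
        for (std::streamsize i = 0; i < got; ++i) {
            unsigned char c = static_cast<unsigned char>(line[i]);
            std::cout << (std::isprint(c) ? static_cast<char>(c) : '.');
        }
        std::cout << '\n';
    }
    return 0;
}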
Please edit your post to provide more details, including an example layout for the target file.
Edit:
A complex example (not tested):
#include <iostream>
#include <fstream>
#include <cstdio>
#include <cstdlib>
using namespace std;
const unsigned int READ_BUFFER_SIZE = 1024 * 1024;
const unsigned int WRITE_BUFFER_SIZE = 2 * READ_BUFFER_SIZE;
unsigned char read_buffer[READ_BUFFER_SIZE];
unsigned char write_buffer[WRITE_BUFFER_SIZE];
int main(void)
{
    int program_status = EXIT_FAILURE;
    static const char hex_chars[] = "0123456789ABCDEF";
    do
    {
        ifstream srce_file("binary.dat", ios::binary);
        if (!srce_file)
        {
            cerr << "Error opening input file." << endl;
            break;
        }
        ofstream dest_file("binary.txt");
        if (!dest_file)
        {
            cerr << "Error creating output file." << endl;
            break;
        }
        // Read blocks of source data until a read yields nothing.
        // The final block may be shorter than READ_BUFFER_SIZE, so
        // gcount() (not the stream state) tells how much to process.
        while (srce_file)
        {
            srce_file.read(reinterpret_cast<char*>(&read_buffer[0]),
                           READ_BUFFER_SIZE);
            // Get the number of bytes actually read.
            const unsigned int bytes_read =
                static_cast<unsigned int>(srce_file.gcount());
            if (bytes_read == 0)
            {
                break;
            }
            // Define the index and byte variables outside
            // of the loop to maybe save some execution time.
            unsigned int i = 0;
            unsigned char byte = 0;
            // For each byte that was read:
            for (i = 0; i < bytes_read; ++i)
            {
                // Get source, binary value.
                byte = read_buffer[i];
                // Convert the Most Significant nibble to an
                // ASCII character using a lookup table.
                // Write the character into the output buffer.
                write_buffer[i * 2 + 0] = hex_chars[(byte >> 4) & 0x0f];
                // Convert the Least Significant nibble to an
                // ASCII character and put into output buffer.
                write_buffer[i * 2 + 1] = hex_chars[byte & 0x0f];
            }
            // Write the output buffer to the output, text, file.
            dest_file.write(reinterpret_cast<char*>(&write_buffer[0]),
                            2 * bytes_read);
            // Flush the contents of the stream buffer as a precaution.
            dest_file.flush();
        }
        dest_file.flush();
        dest_file.close();
        srce_file.close();
        program_status = EXIT_SUCCESS;
    } while (false);
    return program_status;
}
The above program reads 1 MB chunks from the binary file, converts each chunk to ASCII hex in an output buffer, then writes it to the text file.
I think you are misunderstanding: the difference between a binary file and a text file is in the interpretation of the contents.