Friends!
The programming language is C++.
I have a byte matrix:
unsigned char Map[Height][Width];
I initialize it with zeros.
Then I read a byte matrix from a text file.
Of course, the text file can be larger than my matrix. In this case I must read the information from the file like this:
The text file has extra information that I don't need.
In the other case, the matrix can be larger than the information in the text file:
In this case the program must read all the information from the file. The part of the matrix that didn't get information from the file is already initialized. That's OK.
What is the best way to read information from the file into the matrix in my case?
I tried to use fgets, but it reads the information from the file consecutively, byte after byte, and I don't need to read the extra bytes. Of course, I can read the file byte by byte and check a counter, but I am sure that this isn't the best solution. Does a better solution exist?
The formatting of my text file isn't relevant here. I read the information from the file as a field of bytes.
Assuming your text files look something like this:
3254352
6536543
8875687
4315254
5345435
1212122
the solution with standard library streams would be something like this:
#include <cstddef>
#include <fstream>
#include <string>
int main()
{
    const size_t Width = 5;
    const size_t Height = 5;
    unsigned char Map[Height][Width] = {}; // row-major, as in the question; starts zeroed
    std::ifstream matrix_file("matrix.txt");
    size_t line_count = 0;
    std::string line;
    // Test line_count first, so no extra line is read once the matrix is full.
    while (line_count < Height && std::getline(matrix_file, line)) {
        line.resize(Width); // cut long lines, pad short ones with '\0'
        for (size_t i = 0; i < line.length(); ++i)
            Map[line_count][i] = static_cast<unsigned char>(line[i]);
        ++line_count;
    }
    if (line_count < Height) {
        // the file didn't have enough lines to fill our matrix;
        // the remaining rows keep their initial zeroes
    }
}
The line.resize(Width); line above automatically puts null characters in the string when Width is greater than the length of the line in the file (and truncates lines that are longer than Width). Also, don't forget to check that the stream was opened successfully before you try to read from it (with fstream::is_open(), for example). I skipped that check for brevity.
If you can read all the lines with fgets(), you can copy the i-th line character by character to the i-th row of your table. Stop when you reach the end of the line or when you have copied Width characters. Be careful: fgets() keeps the newline character at the end of the line, which you will probably want to remove (a file with Windows CR LF line endings can also leave a CR before it).
Repeat the above procedure until you have read Height lines or reached the end of the file.
My solution is basically C (because you say you are not familiar with the C++ functions). You should use standard C++ libraries (fstream, string) if you want a real C++ implementation.
#include <cstdio>
#include <cstring>
#define Height (...)
#define Width (...)
// I assume that each input line in the file can contain at most width * 3 characters.
// Three extra characters for NL, CR, 0.
// Change this if you expect the file to contain longer lines.
#define BUFFER_WIDTH (Width * 3 + 3)
unsigned char Map[Height][Width];
char line[BUFFER_WIDTH]; // char, not unsigned char: fgets() and strlen() expect char*
// Remove CR, NL at the end of the line.
void clean_line(char *line)
{
    int len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
    {
        line[len - 1] = '\0';
        len--;
    }
}
void read_table(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        return; // could not open the file
    int row = 0;
    // Test the result of fgets() instead of feof(): feof() only becomes true
    // after a read has already failed, so the old loop could reuse a stale line.
    while (row < Height && fgets(line, BUFFER_WIDTH, fp) != NULL)
    {
        clean_line(line);
        int len = strlen(line);
        int rowLen = len > Width ? Width : len;
        for (int col = 0; col < rowLen; col++)
        {
            Map[row][col] = line[col];
        }
        row++;
    }
    fclose(fp);
}
Read the entire file into a buffer:
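#include <cstdio>
#include <cstdlib>

FILE *fp;
long len;
char *buf;
fp=fopen("thefileyouwanttoread.txt","rb");
if (fp == NULL) { /* handle the error */ }
fseek(fp,0,SEEK_END); //go to end
len=ftell(fp); //get position at end (length)
fseek(fp,0,SEEK_SET); //go to beg.
buf=(char *)malloc(len + 1); //malloc buffer, +1 for a terminating '\0'
fread(buf,len,1,fp); //read into buffer
buf[len] = '\0'; //terminate so that string functions such as strstr() work on it
fclose(fp);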
Copy the file into your byte array, checking which is bigger to determine how much to copy:
unsigned char *startByteArr;
unsigned char Map[Height][Width];
startByteArr = &Map[0][0];
if (len > Height*Width) {
    memcpy(startByteArr,buf,Height*Width); // file is bigger: copy only what fits
} else {
    memcpy(startByteArr,buf,len);          // file is smaller: copy all of it
}
This assumes, though, that each line in the file has exactly the width of a matrix row, with no line terminators in between. To account for a different line width in the file, you could change the memcpy like this:
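unsigned char Map[Height][Width];
unsigned char *startByteArr = &Map[0][0];
// strstr() needs a NUL-terminated buffer (buf[len] = '\0').
char *EndLineFile = strstr(buf, "\r\n");
if (EndLineFile != NULL) {
    int lineLengthFile = (int)(EndLineFile - buf); // data bytes per line, without "\r\n"
    int fileStride = lineLengthFile + 2;            // distance between line starts
    int fileRows = (int)((len + 2) / fileStride);   // +2 in case the last line has no "\r\n"
    int rows = fileRows < Height ? fileRows : Height;
    int copyWidth = lineLengthFile > Width ? Width : lineLengthFile;
    for (int i = 0; i < rows; i++) {
        memcpy(startByteArr + i * Width, buf + i * fileStride, copyWidth);
    }
}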
I believe that would be the fastest way, just grab the whole file into memory in one read, then memcpy the segments you need to a byte array.
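The same read-everything-then-copy idea can also be written with C++ streams, which copes with lines of differing length and with both LF and CR LF endings. This is a sketch, assuming one matrix row per text line and a row-major matrix like the question's Map; copyRows and readWholeFile are hypothetical helper names:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Copy the lines of `contents` (one matrix row per line) into a
// height x width byte matrix stored row-major at `map`.
// Extra characters and extra lines are ignored; cells the file does not
// cover are left untouched (the caller is assumed to have zeroed them).
// Returns the number of rows copied.
std::size_t copyRows(const std::string& contents, unsigned char* map,
                     std::size_t height, std::size_t width)
{
    std::size_t row = 0;
    std::size_t pos = 0;
    while (row < height && pos < contents.size()) {
        std::size_t end = contents.find('\n', pos);
        if (end == std::string::npos)
            end = contents.size();            // last line without a newline
        std::size_t lineLen = end - pos;
        if (lineLen > 0 && contents[end - 1] == '\r')
            --lineLen;                        // drop the CR of a CR LF ending
        std::size_t n = lineLen < width ? lineLen : width;
        for (std::size_t col = 0; col < n; ++col)
            map[row * width + col] = static_cast<unsigned char>(contents[pos + col]);
        ++row;
        pos = end + 1;
    }
    return row;
}

// Read a whole file into a string in one go (binary mode: no newline translation).
std::string readWholeFile(const char* filename)
{
    std::ifstream in(filename, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Usage would be along the lines of `copyRows(readWholeFile("matrix.txt"), &Map[0][0], Height, Width);`. Scanning for '\n' instead of assuming a fixed line length trades a little speed for not depending on every line having the same width.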
Related
I have the strangest problem here. I'm using the same code (copy-pasted) from Linux on Windows to READ and WRITE a BMP image. For some reason everything works perfectly fine on Linux, but on Windows 10 I can't open the images, and I receive an error message that says something like this:
"It looks like we don't support this file format."
Do you have any idea what should I do? I will put the code below.
EDIT:
I've solved the padding problem, and now it writes the images, but they are completely white. Any idea why? I've updated the code too.
struct BMP {
int width;
int height;
unsigned char header[54];
unsigned char *pixels;
int size;
int row_padded;
};
void writeBMP(string filename, BMP image) {
string fileName = "Output Files\\" + filename;
FILE *out = fopen(fileName.c_str(), "wb");
fwrite(image.header, sizeof(unsigned char), 54, out);
unsigned char tmp;
for (int i = 0; i < image.height; i++) {
for (int j = 0; j < image.width * 3; j += 3) {
// Convert (B, G, R) to (R, G, B)
tmp = image.pixels[j];
image.pixels[j] = image.pixels[j + 2];
image.pixels[j + 2] = tmp;
}
fwrite(image.pixels, sizeof(unsigned char), image.row_padded, out);
}
fclose(out);
}
BMP readBMP(string filename) {
BMP image;
string fileName = "Input Files\\" + filename;
FILE *f = fopen(fileName.c_str(), "rb");
if (f == NULL)
throw "Argument Exception";
fread(image.header, sizeof(unsigned char), 54, f); // read the 54-byte header
// extract image height and width from header
image.width = *(int *) &image.header[18];
image.height = *(int *) &image.header[22];
image.row_padded = (image.width * 3 + 3) & (~3);
image.pixels = new unsigned char[image.row_padded];
unsigned char tmp;
for (int i = 0; i < image.height; i++) {
fread(image.pixels, sizeof(unsigned char), image.row_padded, f);
for (int j = 0; j < image.width * 3; j += 3) {
// Convert (B, G, R) to (R, G, B)
tmp = image.pixels[j];
image.pixels[j] = image.pixels[j + 2];
image.pixels[j + 2] = tmp;
}
}
fclose(f);
return image;
}
From my point of view this code should be cross-platform... but it's not. Why?
Thanks for the help.
Check the header
The header must start with the following two signature bytes: 0x42 0x4D. If it's something different a third party application will think that this file doesn't contain a bmp picture despite the .bmp file extension.
The size and the way pixels are stored are also a little more complex than you expect: you assume that the number of bits per pixel is 24 and that no compression is used. This is not guaranteed. If it's not the case, you might read more data than is available, and corrupt the file when writing it back.
Furthermore, the size of the header also depends on the BMP version you are using, which you can detect using the 4-byte integer at offset 14.
Improve your code
When you load a file, check the signature, the bmp version, the number of bits per pixel and the compression. For debugging purpose, consider dumping the header to check it manually:
for (int i=0; i<54; i++)
    cout << hex << static_cast<int>(image.header[i]) << " "; // cast, or the bytes print as characters
cout << dec << endl;
Furthermore, when you call fread(), check that the number of bytes read corresponds to the size you wanted to read, so that you can be sure you're not working with uninitialized buffer data.
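The header checks suggested above (signature, header version, bits per pixel, compression) could be sketched like this. It assumes the classic layout of a 14-byte file header followed by a DIB header of at least 40 bytes, and only accepts uncompressed 24-bit images; BmpInfo and checkBmpHeader are hypothetical names, not part of the question's code:

```cpp
#include <cstdint>
#include <string>

// Values taken from the first 54 bytes of a BMP file.
struct BmpInfo {
    std::uint32_t dataOffset;   // offset 10: start of the pixel array
    std::uint32_t dibSize;      // offset 14: DIB header size (40 for BITMAPINFOHEADER)
    std::int32_t  width;        // offset 18
    std::int32_t  height;       // offset 22 (negative means rows are stored top-down)
    std::uint16_t bitsPerPixel; // offset 28
    std::uint32_t compression;  // offset 30 (0 = uncompressed)
};

// BMP fields are little-endian; assembling them byte by byte avoids
// depending on the host's endianness or on unaligned int reads.
static std::uint32_t readLE32(const unsigned char* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}
static std::uint16_t readLE16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Returns an empty string if the header describes an uncompressed 24-bit
// image, otherwise a short description of the problem.
std::string checkBmpHeader(const unsigned char header[54], BmpInfo& info) {
    if (header[0] != 0x42 || header[1] != 0x4D)
        return "missing 'BM' signature";
    info.dataOffset   = readLE32(header + 10);
    info.dibSize      = readLE32(header + 14);
    info.width        = static_cast<std::int32_t>(readLE32(header + 18));
    info.height       = static_cast<std::int32_t>(readLE32(header + 22));
    info.bitsPerPixel = readLE16(header + 28);
    info.compression  = readLE32(header + 30);
    if (info.dibSize < 40)
        return "unsupported DIB header version";
    if (info.bitsPerPixel != 24)
        return "not 24 bits per pixel";
    if (info.compression != 0)
        return "compressed pixel data";
    return "";
}
```

When the check passes, it is safer to seek to info.dataOffset before reading the pixels than to assume they start right after byte 54.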
Edit:
Having checked the dump, it appears that the format is as expected. But comparing the padded size in the header with the padded size that you have calculated, it appears that the error is here:
image.row_padded = (image.width * 3 + 3) & (~3); // ok size of a single row rounded up to multiple of 4
image.pixels = new unsigned char[image.row_padded]; // oops ! A little short ?
In fact, you read row by row, but you only keep the last row in memory! This is different from your first version, where you read all the pixels of the picture.
Similarly, you write the last row repeated height times.
Reconsider your padding, working with the total padded size.
image.row_padded = (image.width * 3 + 3) & (~3); // ok size of a single row rounded up to multiple of 4
image.size_padded = image.row_padded * image.height; // padded full size (add an int size_padded member to struct BMP)
image.pixels = new unsigned char[image.size_padded]; // yeah !
if (fread(image.pixels, sizeof(unsigned char), image.size_padded, f) != (size_t)image.size_padded) {
cout << "Error: all bytes couldn't be read"<<endl;
}
else {
... // process the pixels as expected
}
...
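With the whole padded pixel array in memory, the (B, G, R) to (R, G, B) swap also has to walk every row, stepping by the padded row size rather than by width * 3, and the writer should then output all row_padded * height bytes once instead of the same buffer once per row. A minimal sketch, assuming 24-bit pixels held in a std::vector; paddedRowSize and swapRedBlue are illustrative names:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bytes in one stored row of a 24-bit BMP: 3 bytes per pixel,
// rounded up to a multiple of 4.
std::size_t paddedRowSize(int width) {
    return (static_cast<std::size_t>(width) * 3 + 3) & ~static_cast<std::size_t>(3);
}

// Swap the first and third byte of every pixel (BGR <-> RGB) in a buffer that
// holds `height` padded rows; the padding bytes at the end of each row are
// left alone. Applying it twice restores the original order.
void swapRedBlue(std::vector<unsigned char>& pixels, int width, int height) {
    const std::size_t rowSize = paddedRowSize(width);
    for (int row = 0; row < height; ++row) {
        unsigned char* p = pixels.data() + row * rowSize;
        for (int col = 0; col < width; ++col)
            std::swap(p[3 * col], p[3 * col + 2]);
    }
}
```

Because the swap is its own inverse, the same function can be called after reading and again before writing.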
I'm working on a C++ text file compression program for a class.
I have everything working except being able to output to a file in binary mode.
I am using:
FILE* pFile;
pFile = fopen(c, "wb");
In order to open the file for writing in binary mode,
bool Buffer[8] = { 0, 0, 0,0, 0,0, 0,0 };
I'm using a buffer of bools (initialized to all 0's) to store the 1's and 0's of each byte of data until I call fwrite.
vector<bool> temp2 = bitstring[temp];//bitstring refers to a vector in a map at key [temp]
for (vector<bool>::iterator v = temp2.begin(); v != temp2.end(); ++v)
{
if (j < 8)//To create an 8 bit/byte buffer
{
if (*v == 1)//Checks the vector at that position to know what the bit is.
Buffer[j] =1;//sets the array at 'j' to 1
else
Buffer[j] =0 ;//sets the array at 'j' to 0
j++;
}
else //once the Buffer hits 8 it will print the buffer to the file
{
fwrite(Buffer,1,sizeof(Buffer), pFile);
clearBuffer(Buffer);//Clears the buffer to restart the process.
j = 0;
}
}
The vector iterator goes through a vector of bools that is assigned to a specific character; essentially, it is a unique binary string that represents the character. My problem is that instead of outputting binary from the buffer, it's outputting essentially ASCII characters in binary mode, instead of just the digits as bits. This ends up making the file WAY bigger than it needs to be. How could I change the buffer to output just bits? I was told to use bitwise operators, but I can't find much documentation on implementing this in C++. Any help is appreciated.
I would use std::bitset in the first place; it is flexible enough for this purpose.
std::bitset<8> BufferBits; // together with j, must persist across characters
vector<bool> temp2 = bitstring[temp];//bitstring refers to a vector in a map at key [temp]
for (vector<bool>::iterator v = temp2.begin(); v != temp2.end(); ++v)
{
    BufferBits[j] = *v; // store the current bit at position j (bit 0 = least significant)
    j++;
    if (j == 8) // once the buffer holds 8 bits, write it to the file as one byte
    {
        unsigned char c = static_cast<unsigned char>(BufferBits.to_ulong());
        fwrite(&c, sizeof(unsigned char), 1, pFile);
        BufferBits.reset(); // clear the buffer to restart the process
        j = 0;
    }
}
// after the last character, write any remaining bits (0 < j < 8) the same way
Note: I only addressed the issues with your bit vector.
To set a single bit in a byte, use a shift and an or. This code starts with the highest order bit in a byte when j is 0, which is the usual convention.
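unsigned char data = 0; // unsigned, so that 0x80 fits without sign issues
// ...
data |= 0x80 >> j; // sets bit j, counting from the most significant bit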
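Applied to a whole bit string such as the question's temp2, the shift-and-or approach might look like the sketch below. packBits is a hypothetical helper name, and it assumes the caller also stores the number of bits somewhere, since the last byte is padded with zeros:

```cpp
#include <cstddef>
#include <vector>

// Pack bits into bytes, most significant bit first (bit j of a byte is set
// with 0x80 >> j). The final byte is padded with zero bits.
std::vector<unsigned char> packBits(const std::vector<bool>& bits) {
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i])
            bytes[i / 8] |= static_cast<unsigned char>(0x80 >> (i % 8));
    }
    return bytes;
}
```

The packed result can then be written in one call, e.g. `fwrite(bytes.data(), 1, bytes.size(), pFile);`, instead of one fwrite per 8 bools.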
To set individual bits in a byte you can use a union and bitfields:
#include <iostream>
using namespace std;
typedef union
{
struct
{
unsigned char value;
} byte;
struct
{
unsigned char b0:1;
unsigned char b1:1;
unsigned char b2:1;
unsigned char b3:1;
unsigned char b4:1;
unsigned char b5:1;
unsigned char b6:1;
unsigned char b7:1;
} bits;
} u;
int main()
{
u buffer = {{0}};
buffer.bits.b0 = 1;
buffer.bits.b1 = 0;
buffer.bits.b2 = 1;
cout << static_cast<int>(buffer.byte.value) << endl;
return 0;
}
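which prints 5 with common compilers (for example GCC or MSVC on x86). Strictly speaking, though, the order of bit-fields within a byte is implementation-defined (it is not really a matter of endianness), and reading a union member other than the one last written is undefined behavior in C++, even though compilers generally support it.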
I'm having some trouble with PNM files (which is kinda obvious, or else I wouldn't be posting here XD). The thing is, my teacher asked us to develop a simple PNM reader in binary mode and then print the image to the screen. I'm using libEGL (a framework available here). My problem is that it works only with these two images and fails with any other one.
With birch.pnm and checkers.pnm it works, but with cathedral.pnm, cotton.pnm and fish_tile.pnm it simply enters an infinite loop or throws an error.
The images are available here.
My code is as follows:
#include <iostream>
#include <string>
#include <sstream>
#include <fstream>
#include "engcomp_glib.h"
using namespace std;
struct RGB{
char red, green, blue;
};
int main(int argc, char* argv[]){
RGB **image;
RGB pixel;
//ifstream _file("..\\bin\\birch.pnm");
ifstream _file("..\\bin\\checkers.pnm");
//ifstream _file("..\\bin\\cotton.pnm");
//ifstream _file("..\\bin\\cathedral.pnm");
//ifstream _file("..\\bin\\fish_tile.pnm");
string type, maxColor;
int width, height;
if(_file){
_file >> type;
if(type != "P6")
cout << "Error! File type is not allowed." << endl;
_file >> width >> height >> maxColor;
_file.close();
egl_inicializar(width, height, true);
image = new RGB *[height];
for(int row = 0; row < height; row++)
image[row] = new RGB[width];
//Working 8D
//_file.open("..\\bin\\birch.pnm", ios::binary);
_file.open("..\\bin\\checkers.pnm", ios::binary);
//Not working D:<
//_file.open("..\\bin\\cathedral.pnm", ios::binary);
//_file.open("..\\bin\\fish_tile.pnm", ios::binary);
//_file.open("..\\bin\\cotton.pnm", ios::binary);
//imagem img; img.carregar("..\\bin\\birch.pnm");
_file.seekg(0, _file.end);
int size = _file.tellg();
int currentSize = 0, counter = 0;
char byte;
_file.seekg(0, _file.beg);
do{
_file.read(reinterpret_cast<char *> (&byte), sizeof(char));
if(byte == 10 || byte == 13)
counter++;
}while(counter < 3);
int rows = 0, columns = 0;
while(size != currentSize){
_file.read(reinterpret_cast<char *> (&pixel), sizeof(RGB));
if(rows < height && columns < width){
image[rows][columns] = pixel;
rows++;
}
else if(rows == height){
rows = 0;
columns++;
image[rows][columns] = pixel;
rows++;
}
//else if(columns >= width)
//currentSize = size;
currentSize = _file.tellg();
}
_file.close();
while(!key[SDLK_ESCAPE]){
for(int row = 0; row < height; row++)
for(int column = 0; column < width; column++)
//egl_pixel(row, column, image[row][column].red, image[row][column].green, image[row][column].blue);
egl_pixel(column, row, image[column][row].red, image[column][row].green, image[column][row].blue);
//img.desenha(0, 0);
egl_desenha_frame(false);
}
}
egl_finalizar();
return 0;
}
It doesn't make sense: it works for two of them, so it should work for all of them.
I opened them all in a text editor and they have the header, so the problem is not there. What am I doing wrong? My colleague wrote code that stores the pixels in an array of size [height * width], and it can read all of the images except cathedral.pnm.
Thanks for the patience and help :)
The PNM specification states that the values in the header are separated by whitespace, usually newlines, but they could also be spaces or tabs (or something else I can't think of at the moment ;). The cathedral file, for instance, has a space as a separator.
And you're reading the files top to bottom, left to right, instead of left to right, top to bottom, as per the specs.
And if you want to be really correct: if maxColor is not less than 256, you should read two bytes per sample (most significant byte first) instead of chars.
You can find the specs here by the way:
http://netpbm.sourceforge.net/doc/ppm.html
Good luck!
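Putting these points together, a reader can skip any run of whitespace (and '#' comments, which the PPM format also allows in the header) and then read the raster row by row. A minimal sketch, assuming a P6 file with maxval below 256 (two-byte samples are not handled); RGB8, readP6 and readP6File are hypothetical names, and the pixels go into one flat row-major vector rather than the question's RGB** array:

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

struct RGB8 { unsigned char red, green, blue; };

// Skip whitespace and '#' comments between header fields; do not assume
// one newline per field.
static void skipSeparators(std::istream& in) {
    const int eof = std::istream::traits_type::eof();
    int c;
    while ((c = in.peek()) != eof) {
        if (c == '#') {
            std::string comment;
            std::getline(in, comment);      // a comment runs to the end of the line
        } else if (std::isspace(c)) {
            in.get();
        } else {
            break;
        }
    }
}

// Read a binary PPM (P6) with one byte per sample.
// Pixels are stored row by row, left to right: pixels[row * width + col].
bool readP6(std::istream& in, int& width, int& height, std::vector<RGB8>& pixels) {
    std::string magic;
    int maxval = 0;
    if (!(in >> magic) || magic != "P6") return false;
    skipSeparators(in);
    if (!(in >> width)) return false;
    skipSeparators(in);
    if (!(in >> height)) return false;
    skipSeparators(in);
    if (!(in >> maxval)) return false;
    if (width <= 0 || height <= 0 || maxval <= 0 || maxval >= 256) return false;
    in.get();                               // exactly one whitespace byte precedes the raster
    pixels.resize(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    for (RGB8& p : pixels) {
        char rgb[3];
        if (!in.read(rgb, 3)) return false; // file too short
        p.red   = static_cast<unsigned char>(rgb[0]);
        p.green = static_cast<unsigned char>(rgb[1]);
        p.blue  = static_cast<unsigned char>(rgb[2]);
    }
    return true;
}

// Convenience wrapper: the file must be opened in binary mode.
bool readP6File(const char* path, int& width, int& height, std::vector<RGB8>& pixels) {
    std::ifstream in(path, std::ios::binary);
    return in && readP6(in, width, height, pixels);
}
```

The drawing loop would then use pixels[row * width + column] for the pixel passed to egl_pixel(column, row, ...).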
I am trying to read a large file (~5GB) using ifstream in C++.
Since I'm on a 64-bit OS, I thought this shouldn't be a problem.
Still, I get a segfault. Everything runs fine with smaller files,
so I'm pretty sure that is where the problem is.
I'm using g++ (4.4.5-8) and libstdc++6 (4.4.5-8).
Thanks.
The code looks like this:
void load (const std::string &path, int _dim, int skip = 0, int gap = 0) {
std::ifstream is(path.c_str(), std::ios::binary);
BOOST_VERIFY(is);
is.seekg(0, std::ios::end);
size_t size = is.tellg();
size -= skip;
long int line = sizeof(float) * _dim + gap;
BOOST_VERIFY(size % line == 0);
long int _N = size / line;
reset(_dim, _N);
is.seekg(skip, std::ios::beg);
char *off = dims;
for (long int i = 0; i < N; ++i) {
is.read(off, sizeof(T) * dim);
is.seekg(gap, std::ios::cur);
off += stride;
}
BOOST_VERIFY(is);
}
The segfault occurs in the is.read line for i=187664.
T is float and I'm reading dim=1000 floats at a time.
When the segfault occurs, i * stride is way smaller than size, so I'm not running past the end of the file.
dims is allocated here:
void reset (int _dim, int _N)
{
BOOST_ASSERT((ALIGN % sizeof(T)) == 0);
dim = _dim;
N = _N;
stride = dim * sizeof(T) + ALIGN - 1;
stride = stride / ALIGN * ALIGN;
if (dims != NULL) delete[] dims;
dims = (char *)memalign(ALIGN, N * stride);
std::fill(dims, dims + N * stride, 0);
}
I don't know if this is the bug, but this code looks very C-like and has plenty of opportunities to leak. Anyway, try changing
void reset (int _dim, int _N)
to
void reset (size_t dim, size_t _N)
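(I would avoid leading underscores, which are usually used to identify elements of the standard library; names such as _N, with an underscore followed by an uppercase letter, are reserved for the implementation.)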
When you are dealing with the size or index of something in memory, ALWAYS use size_t: it is guaranteed to be able to hold the maximum size of any object, including arrays.
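To illustrate why this matters here: in reset(), N * stride is computed in int when both operands are int, so for a file of several gigabytes the product overflows before memalign ever sees it, which could explain a crash partway through the file. A minimal sketch of doing the multiplication in size_t instead; bufferBytes is a hypothetical helper, and a 64-bit size_t is assumed:

```cpp
#include <cstddef>

// Total bytes for `rows` rows of `stride` bytes each. Converting the operands
// to std::size_t first makes the multiplication itself happen in size_t;
// with two ints it would overflow (undefined behavior) past INT_MAX, about 2 GB.
std::size_t bufferBytes(int rows, int stride) {
    return static_cast<std::size_t>(rows) * static_cast<std::size_t>(stride);
}
```

Changing only the parameter types is not enough if the members N and stride stay int, since the product inside reset() is what gets passed to memalign.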
I think you have to use _ftelli64 etc. to get the correct size of your file, and use long long (or __int64) variables to manage it. But that's the C library (and _fseeki64/_ftelli64 are Microsoft-specific). I couldn't find how to use ifstream with such a big file (actually > 2 GB). Did you find a way?
P.S. In your case size_t is fine, but I'm not sure that's OK for 32-bit software. I'm sure it's OK for 64-bit.
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string name="tstFile.bin";
    FILE *inFile;
    // fopen_s, _fseeki64 and _ftelli64 are Microsoft CRT functions
    if (fopen_s(&inFile,name.c_str(),"rb") != 0 || !inFile)
    {
        cout<<"\r\n***error -> File not found\r\n";
        return 1;
    }
    _fseeki64 (inFile,0L,SEEK_END);
    long long fileLength = _ftelli64(inFile);
    _fseeki64 (inFile,0L,SEEK_SET);
    cout<<"file length : "<<fileLength<<endl;
    fclose(inFile);
    return 0;
}
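As for ifstream: where std::streamoff is a 64-bit type, which is typically the case with current standard libraries, opening the stream with std::ios::ate and calling tellg() is enough to get the size of a file larger than 2 GB, without the Microsoft-only _fseeki64/_ftelli64 calls. A sketch under that assumption (fileSize is a hypothetical name):

```cpp
#include <fstream>
#include <ios>

// Size of a file in bytes, or -1 if it cannot be opened.
// Relies on std::streamoff being wide enough (64 bits) for large files.
std::streamoff fileSize(const char* path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate); // open positioned at the end
    if (!in)
        return -1;
    return static_cast<std::streamoff>(in.tellg());
}
```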
How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it? I really need it to save a lot of true/false values.
Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That would save a lot of space.
At the beginning of the file, you can write the number of boolean values you want to write to the file; that number will help when reading the bytes from the file and converting them back into boolean values!
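That simplest approach, with the count stored first, might be sketched as follows. It assumes the bits are in a std::vector<bool> and that an 8-byte little-endian count is an acceptable file header; writeBools and readBools are hypothetical names:

```cpp
#include <cstdint>
#include <sstream>   // brings in std::istream/std::ostream; std::stringstream is handy for tests
#include <vector>

// Layout: an 8-byte little-endian bit count, then the bits packed 8 per byte,
// first bit in the most significant position.
void writeBools(std::ostream& out, const std::vector<bool>& bits) {
    const std::uint64_t n = bits.size();
    for (int i = 0; i < 8; ++i)
        out.put(static_cast<char>((n >> (8 * i)) & 0xFF));
    unsigned char byte = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        if (bits[i])
            byte |= static_cast<unsigned char>(0x80 >> (i % 8));
        if (i % 8 == 7 || i + 1 == n) {     // byte is full, or this is the last bit
            out.put(static_cast<char>(byte));
            byte = 0;
        }
    }
}

// Read back what writeBools wrote; returns false if the stream ends too early.
bool readBools(std::istream& in, std::vector<bool>& bits) {
    const int eof = std::istream::traits_type::eof();
    std::uint64_t n = 0;
    for (int i = 0; i < 8; ++i) {
        const int c = in.get();
        if (c == eof) return false;
        n |= static_cast<std::uint64_t>(c & 0xFF) << (8 * i);
    }
    bits.assign(n, false);
    int byte = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        if (i % 8 == 0) {
            byte = in.get();
            if (byte == eof) return false;
        }
        bits[i] = (byte & (0x80 >> (i % 8))) != 0;
    }
    return true;
}
```

With a file, the streams should be opened with std::ios::binary. Writing the count byte by byte keeps the file independent of the machine's endianness.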
If you want the bitset class that best supports converting to binary, and your bitset is more than the size of unsigned long, then the best option to use is boost::dynamic_bitset. (I presume it is more than 32 and even 64 bits if you are that concerned about saving space).
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now that you have the bits in their native format (Block), you still have the issue of writing them to a stream and reading them back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (e.g. ntohl; ideally, though, use a macro that is a no-op for your most common platform: if that is little-endian, you probably want to store that way and convert only on big-endian systems).
(Note: I am assuming that boost::dynamic_bitset standardly converts integral types the same way regardless of underlying endianness. Their documentation does not say.)
To write the blocks in binary to a stream, use os.write(reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks), and to read them use is.read(reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks), where data is assumed to be a vector<Block>; before the read you must call data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator, but resize() is probably better.)
Here is a try with two functions that will use a minimal number of bytes, without compressing the bitset.
#include <bitset>
#include <cstddef>
#include <istream>
#include <ostream>

template<std::size_t I> // std::size_t, to match std::bitset<N>; with int, deduction fails
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
    // export a bitset consisting of I bits to an output stream.
    // Eight bits are stored to a single stream byte.
    unsigned char c = 0; // the current byte
    short bits = 0;      // number of bits already in c
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        c = c << 1;
        if (in.test(i)) ++c; // adding 1 if bit is true (std::bitset has no at())
        ++bits;
        if (bits == 8)
        {
            out.put((char)c);
            c = 0;
            bits = 0;
        }
    }
    // dump remaining
    if (bits != 0) {
        // pad the byte so that first bits are in the most significant positions.
        while (bits != 8)
        {
            c = c << 1;
            ++bits;
        }
        out.put((char)c);
    }
}

template<std::size_t I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
    // read bytes from the input stream to a bitset of size I.
    unsigned char mask = 0x80; // current byte mask
    unsigned char c = 0;       // current byte in stream
    for (std::size_t i = 0; i < I; ++i)
    {
        if ((i % 8) == 0) // retrieve next character
        {
            int next = in.get();
            if (next == std::istream::traits_type::eof()) break; // stream ended early
            c = (unsigned char)next;
            mask = 0x80;
        }
        else mask = mask >> 1; // shift mask
        out.set(i, (c & mask) != 0);
    }
}
Note that using a reinterpret_cast of the memory used by the bitset as an array of chars could probably also work, but it is likely not portable across systems, because you don't know what the representation of the bitset is (endianness?).
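How about this (note that it relies on libstdc++ internals such as std::_S_word_bit, std::_Bit_type and the iterator's _M_p member, so it only works with GCC's standard library):
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>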
...
{
std::srand(std::time(nullptr));
std::vector<bool> vct1, vct2;
vct1.resize(20000000, false);
vct2.resize(20000000, false);
// insert some data
for (size_t i = 0; i < 1000000; i++) {
vct1[std::rand() % 20000000] = true;
}
// serialize to file
std::ofstream ofs("bitset", std::ios::out | std::ios::trunc | std::ios::binary); // binary: raw bytes, no newline translation
for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
auto vct1_iter = vct1.begin();
vct1_iter += i;
uint32_t block_num = i / std::_S_word_bit;
std::_Bit_type block_val = *(vct1_iter._M_p);
if (block_val != 0) {
// only write not-zero block
ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
}
}
ofs.close();
// deserialize
std::ifstream ifs("bitset", std::ios::in | std::ios::binary);
ifs.seekg(0, std::ios::end);
uint64_t file_size = ifs.tellg();
ifs.seekg(0);
uint64_t load_size = 0;
while (load_size < file_size) {
uint32_t block_num;
ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
std::_Bit_type block_value;
ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
auto offset = block_num * std::_S_word_bit;
if (offset >= vct2.size()) {
std::cout << "error! already touch end" << std::endl;
break;
}
auto iter = vct2.begin();
iter += offset;
*(iter._M_p) = block_value;
}
ifs.close();
// check result
int count_true1 = std::count(vct1.begin(), vct1.end(), true);
int count_true2 = std::count(vct2.begin(), vct2.end(), true);
std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
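#include <climits>
#include <cmath>
#include <vector>

std::vector<bool> data = /* obtain bits somehow */;
// Reserve an appropriate number of byte-sized buckets (CHAR_BIT comes from <climits>).
std::vector<unsigned char> bytes(static_cast<std::size_t>(std::ceil(static_cast<double>(data.size()) / CHAR_BIT)));
for (std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
    for (int bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
        std::size_t pos = byteIndex * CHAR_BIT + bitIndex;
        if (pos >= data.size()) break; // the last bucket may be only partly used
        int bit = data[pos];
        bytes[byteIndex] |= bit << bitIndex;
    }
}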
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize the number of bits that were actually stored (to cover cases where the bit count isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector as you had originally.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
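#include <cstdio>
#include <bitset>
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
std::bitset<size> bitbuffer;
...
// Writes the bitset's internal storage as-is: compact, but the layout is
// implementation-specific, so read it back only with the same compiler/platform.
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);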
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.