Put the file into an array in C++

I'm trying to read a txt file and put it into a char array. But can I read different files that contain different numbers of characters and put them into an array? Can I create a dynamic array to hold an unknown number of characters?

You can read a file of unknown size into a dynamic data structure such as std::vector.
Alternatively, you can use new to allocate memory dynamically. However, vectors are more convenient, at least to me :).

#include <vector>
#include <string>
#include <iostream>
#include <fstream>
int main(int argc, char **argv)
{
    std::vector<std::string> content;
    if (argc != 2)
    {
        std::cerr << "bad argument" << std::endl;
        return 1;
    }
    std::string file_name(argv[1]);
    std::ifstream file(file_name);  // C++11: ifstream accepts a std::string
    if (!file)
    {
        std::cerr << "can't open file" << std::endl;
        return 1;
    }
    std::string line;
    while (std::getline(file, line))  // getline overwrites line on each call
        content.push_back(line);
    for (std::vector<std::string>::const_iterator it = content.begin(); it != content.end(); ++it)
        std::cout << *it << std::endl;  // print every stored line
}
Here is a solution using std::vector and std::string.
The program takes a file name as its first argument, opens the file, and reads it line by line.
Each line is stored in the vector.
Then you can display the vector, as I did at the end of the function.
EDIT: the program uses a C++11 feature (the std::ifstream constructor that takes a std::string), so you have to compile it as C++11 (g++ -std=c++11 if you use g++).
I just tested it and it works perfectly.

There may be library routines that give you the size of a file without reading its contents (for example std::filesystem::file_size in C++17, or seeking to the end of a stream with seekg and calling tellg). In that case you can get the size, allocate a buffer of that size, and read the whole file at once. If your buffer is a plain char array, don't forget to allocate one extra char and store the trailing null character.

The best way is to use malloc(), realloc(), and free(), just as in an old C program. If you use a std::vector you may run out of memory as you approach the maximum available RAM: realloc() can grow and shrink a block in place (growing depends on the state of the heap, while shrinking almost always succeeds), whereas std::vector has to allocate a new buffer and copy the elements whenever it grows.
In particular:
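#include <cstddef>
#include <cstdlib>
#include <istream>
#include <new>
#include <stdexcept>
#include <tuple>
// TODO perhaps you want an auto-management class for this tuple.
// I'm not about to type up the whole auto_ptr, shared_ptr, etc.
// Mostly you don't do this enough to have to worry too hard.
// The caller owns the returned buffer and must release it with free().
std::tuple<char *, size_t> getarray(std::istream &input)
{
    const size_t step = 8192;         // initial size and linear growth step
    const size_t maxsize = (size_t)-1;
    size_t fsize = 0;                 // bytes read so far
    size_t asize = step;              // bytes currently allocated
    char *buf = static_cast<char *>(std::malloc(asize));
    if (!buf) throw std::bad_alloc();
    bool shift = true;                // double the buffer until that fails
    for (;;) {
        try {
            input.read(buf + fsize, asize - fsize);
        } catch (...) {               // only thrown if exceptions() is enabled
            std::free(buf);
            throw;
        }
        fsize += static_cast<size_t>(input.gcount());
        if (fsize < asize) {
            // short read: end of input (or a read error).
            // Shrink to fit; keep the old block if that fails.
            if (fsize > 0) {
                char *btmp = static_cast<char *>(std::realloc(buf, fsize));
                if (btmp) buf = btmp;
            }
            return std::tuple<char *, size_t>(buf, fsize);
        }
        // buffer is full: grow it
        char *btmp = nullptr;
        if (shift && asize <= maxsize / 2) {
            btmp = static_cast<char *>(std::realloc(buf, asize * 2));
            if (btmp) asize *= 2;
        }
        if (!btmp) {
            shift = false;            // fall back to linear growth
            if (asize > maxsize - step) {
                std::free(buf);
                throw std::length_error("File too big to fit in size_t");
            }
            btmp = static_cast<char *>(std::realloc(buf, asize + step));
            if (!btmp) {
                std::free(buf);
                throw std::bad_alloc();
            }
            asize += step;
        }
        buf = btmp;
    }
}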

Related

sprintf buffer issue, wrong assignment to char array

I have an issue with an sprintf buffer.
As you can see in the code below, I use sprintf to write a file name into buffer, so that fopen can check whether a file with that name exists in the folder. If it exists, the buffer value is assigned to timecycles[numCycles] and numCycles is incremented. Example: timecycles[0] = "timecyc1.dat". This part works: as the console output shows, it recognizes that only timecyc1.dat and timecyc5.dat are in the folder. But when I then read timecycles in a for loop, both entries have the value "timecyc9.dat", even though timecycles[0] should be "timecyc1.dat" and timecycles[1] should be "timecyc5.dat". Second, how can I write the code so that readTimecycles() returns timecycles, and I could just initialize it in main with char* timecycles[9] = readTimecycles() or something like that?
#include <iostream>
#include <cstdio>
char* timecycles[9];
void readTimecycles()
{
char buffer[256];
int numCycles = 0;
FILE* pFile = NULL;
for (int i = 1; i < 10; i++)
{
sprintf(buffer, "timecyc%d.dat", i);
pFile = fopen(buffer, "r");
if (pFile != NULL)
{
timecycles[numCycles] = buffer;
numCycles++;
std::cout << buffer << std::endl; //to see if the buffer is correct
}
}
for (int i = 0; i < numCycles; i++)
{
std::cout << timecycles[i] << std::endl; //here's the issue with timecyc9.dat
}
}
int main()
{
readTimecycles();
return 0;
}
With the assignment
timecycles[numCycles] = buffer;
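you make all pointers point to the same buffer, since you only have a single buffer. sprintf writes a new name into that buffer on every iteration, whether or not the file exists, so after the loop every entry shows the last name written, timecyc9.dat.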
Since you're programming in C++, you could easily solve your problem by using std::string instead.
If I were to rewrite your code into something a little more C++-ish and less C-ish, it could look something like this:
#include <array>
#include <fstream>
#include <string>
std::array<std::string, 9> readTimeCycles()
{
std::array<std::string, 9> timecycles;
for (size_t i = 0; i < timecycles.size(); ++i)
{
// Format the file-name
std::string filename = "timecyc" + std::to_string(i + 1) + ".dat";
std::ifstream file(filename);
if (file)
{
// File was opened okay
timecycles[i] = filename;
}
}
return timecycles;
}
References:
std::array
std::string
std::to_string
std::ifstream
The fundamental problem is that your notion of a string doesn't match what a "char array" is in C++. In particular, you think that because you assign timecycles[numCycles] = buffer; the chars of the char array are somehow copied. But in C++ all that is copied is a pointer, so timecycles ends up with multiple pointers to the same buffer. And that's not to mention the problem you will have when you exit readTimecycles: at that point you will have multiple pointers to a buffer that no longer exists, because it is destroyed when the function returns.
The way to fix this is to use C++ code that does match your expectations. In particular, a std::string copies in the way you expect it to. Here's how you can change your code to use std::string:
#include <string>
std::string timecycles[9]; // replaces char* timecycles[9];
timecycles[numCycles] = buffer; // now this really does copy a string

Read binary file and count specific number c++

Hey everyone, I've been looking everywhere for insight on how to do this particular assignment. I saw something similar, but it didn't have a clear explanation. I'm trying to read a bin file and count the number of times a specific number appears. I saw examples of this using a .txt file, and it seemed very straightforward using getline. I tried to replicate a similar structure, but using a binary file.
int main() {
int searching = 3;
int counter = 0;
unsigned char * memblock;
long long int size;
//open bin file
ifstream file;
file.open("threesData.bin", ios:: in | ios::binary | ios::ate);
//read bin file
if (file.is_open()) {
cout << "it opened\n";
size = file.tellg();
memblock = new unsigned char[size];
file.seekg(0, ios::beg);
file.read((char * ) memblock, size);
while (file.read((char * ) memblock, size)) {
for (int i = 0; i < size; i++) {
(int) memblock[i];
if (memblock[i] == searching) {
counter++;
}
}
}
}
file.close();
cout << "The number " << searching << " appears ";
cout << counter << " times!";
return 0;
}
When I run the program it's clear that it opens but it doesn't count the number I'm searching for. What am I doing wrong?
You seem to be thinking this through, but here's how I would go about it.
Initialize a buffer with a sensible size.
View it as integers, so you can use array[i] indexing instead of manual byte arithmetic.
Open the stream, and read while the stream is valid.
Convert the number of bytes read to the number of ints you would expect.
Increment the counter for each integer you find that matches.
Code
#include <cstddef>
#include <fstream>
#include <iostream>
bool check_character(int value)
{
    return value == 3;
}
int main(void)
{
    // choose the buffer size in bytes, allocate it as ints (so the int
    // view is well-defined), and initialize our counter
    static constexpr std::size_t size = 4096;
    int* ints = new int[size / sizeof(int)];
    char* buffer = reinterpret_cast<char*>(ints);  // byte view for read()
    std::size_t counter = 0;
    // create our stream
    std::ifstream stream("file.bin", std::ios_base::binary);
    while (stream) {
        // keep reading while the stream is valid; the last read may be
        // partial, so gcount() tells us how many bytes actually arrived
        stream.read(buffer, size);
        auto count = static_cast<std::size_t>(stream.gcount());
        // we only want to go to the last complete integer.
        // if we expect the file to be only integers,
        // we could do `assert(count % sizeof(int) == 0);`
        // otherwise, we may have trailing bytes; if so, we may want
        // to move them to the front of the buffer....
        auto ints_read = count / sizeof(int); // floor division
        for (std::size_t i = 0; i < ints_read; ++i) {
            // false == 0, true == 1, so we can just add
            // if the value is 3
            counter += check_character(ints[i]);
        }
    }
    std::cout << "Counter is: " << counter << std::endl;
    delete[] ints;
    return 0;
}
As NeilButterworth points out, you could also use a vector. I don't really like this, but "meh".
#include <fstream>
#include <iostream>
#include <vector>
/* ellipsed lines */
int main(void)
{
/* ellipsed lines */
static constexpr size_t size = 4096;
std::vector<int> ints;
ints.resize(size / sizeof(int));
char* buffer = (char*) ints.data();
/* ellipsed lines */
/* ellipsed lines */
std::cout << "Counter is: " << counter << std::endl;
// no delete[]
return 0;
}

How to update struct item in binary files

I have a binary file to which I write some struct items. Now I want to find and update a specific item in the file.
Note that my struct has a vector, and its size is not constant.
My struct:
struct mapItem
{
string term;
vector<int> pl;
};
Code that writes struct items to the file:
if (it==hashTable.end())//didn't find
{
vector <int> posting;
posting.push_back(position);
hashTable.insert ( pair<string,vector <int> >(md,posting ) );
mapItem* mi = new mapItem();
mi->term = md;
mi->pl = posting;
outfile.write((char*)mi, sizeof(mi));
}
else//finded
{
}
In the else block I want to find the item by its term (term is unique) and update it.
I have now changed my code as follows to serialize my vector:
if (it==hashTable.end())//didn't find
{
vector <int> posting;
posting.push_back(position);
hashTable.insert ( pair<string,vector <int> >(md,posting ) );
mapItem* mi = new mapItem();
mi->term = md;
mi->pl = posting;
if(!outfile.is_open())
outfile.open("sample.dat", ios::binary | ios::app);
size_t size = mi->term.size() + 1;
outfile.write((char*)&size, sizeof(size) );
outfile.write((char*)mi->term.c_str(), size);
size = (int)mi->pl.size() * sizeof(int);
outfile.write((char*)&size, sizeof(size) );
outfile.write((char*)&mi->pl[0], size );
outfile.close();
}
else//finded
{
(it->second).push_back(position);
mapItem* mi = new mapItem();
size_t size;
if(!infile.is_open())
{
infile.open("sample.dat", ios::binary | ios::in);
}
do{
infile.read((char*)&size, sizeof(size) ); // string size
mi->term.resize(size - 1); // make string the right size
infile.read((char*)mi->term.c_str(), size); // may need const_cast
infile.read((char*)&size, sizeof(size) ); // vector size
mi->pl.resize(size / sizeof(int));
infile.read((char*)&mi->pl[0], size );
}while(mi->term != md);
infile.close();
}
Well, my main question still remains: how can I update the data that I found?
Is there a better way to find them?
I evaluated the following solutions:
update in a new file, and rename it to the old name at the end;
update in the same file with a single stream that has two file positions, read and write, but I couldn't quickly find support for such a thing;
update in the same file with two streams, read and write, but the risk of overwriting data that hasn't been read yet is too big for me (even if overlaps are guarded against from outside).
So I chose the first one, which is the most straightforward anyway.
#include <string>
#include <vector>
#include <fstream>
#include <cstdio>
#include <cassert>
#include <iterator> // istreambuf_iterator, insert_iterator
I added the following function to your struct:
size_t SizeWrittenToFile() const
{
return 2*sizeof(size_t)+term.length()+pl.size()*sizeof(int);
}
The read and write functions are basically the same as yours, except that I chose not to write through the string::c_str() pointer (although that ugly solution should work with every known compiler).
bool ReadNext(std::istream& in, mapItem& item)
{
size_t size;
in.read(reinterpret_cast<char*>(&size), sizeof(size_t));
if (!in)
return false;
std::istreambuf_iterator<char> itIn(in);
std::string& out = item.term;
out.reserve(size);
out.clear(); // this is necessary if the string is not empty
for (std::insert_iterator<std::string> itOut(out, out.begin());
in && (out.length() < size); itIn++, itOut++)
*itOut = *itIn;
assert(in);
if (!in)
return false;
in.read(reinterpret_cast<char*>(&size), sizeof(size_t));
if (!in)
return false;
std::vector<int>& out2 = item.pl;
out2.resize(size); // unfortunately reserve doesn't work here
in.read(reinterpret_cast<char*>(out2.data()), size * sizeof(int)); // data() is valid even for an empty vector
assert(in);
return true;
}
// a "header" should be added to mark complete data (to write "atomically")
bool WriteNext(std::ostream& out, const mapItem& item)
{
size_t size = item.term.length();
out.write(reinterpret_cast<const char*>(&size), sizeof(size_t));
if (!out)
return false;
out.write(item.term.c_str(), size);
if (!out)
return false;
size = item.pl.size();
out.write(reinterpret_cast<const char*>(&size), sizeof(size_t));
if (!out)
return false;
out.write(reinterpret_cast<const char*>(item.pl.data()), size * sizeof(int)); // &pl[0] would be invalid for an empty vector
if (!out)
return false;
return true;
}
The update functions look like this:
bool UpdateItem(std::ifstream& in, std::ofstream& out, const mapItem& item)
{
mapItem it;
bool result;
for (result = ReadNext(in, it); result && (it.term != item.term);
result = ReadNext(in, it))
if (!WriteNext(out, it))
return false;
if (!result)
return false;
// write the new item content
assert(it.term == item.term);
if (!WriteNext(out, item))
return false;
for (result = ReadNext(in, it); result; result = ReadNext(in, it))
if (!WriteNext(out, it))
return false;
// failure or just the end of the file?
return in.eof();
}
bool UpdateItem(const char* filename, const mapItem& item)
{
std::ifstream in(filename, std::ios::binary); // binary: avoid newline translation
assert(in);
std::string filename2(filename);
filename2 += ".tmp";
std::ofstream out(filename2.c_str(), std::ios::binary);
assert(out);
bool result = UpdateItem(in, out, item);
// close them before delete
in.close();
out.close();
int err = 0;
if (result)
{
err = remove(filename);
assert(!err && "remov_140");
result = !err;
}
if (!result)
{
err = remove(filename2.c_str());
assert(!err && "remov_147");
}
else
{
err = rename(filename2.c_str(), filename);
assert(!err && "renam_151");
result = !err;
}
return result;
}
Questions?
This:
outfile.write((char*)mi, sizeof(mi));
does not make sense. First, mi is a pointer, so sizeof(mi) is the size of the pointer (typically 4 or 8 bytes), not of the struct. Second, even with sizeof(*mi) you would be writing the bits of the string's and the vector's implementations directly to disk. Some of those bits are extremely likely to be pointers. Pointers written to a file on disk are not useful, because they point into the address space of the process that wrote the file and won't work in another process that reads the same file.
You need to "serialize" your data to the file, e.g. with a for loop that writes each element.
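You can serialize the struct to a file this way:
write the length of the string (sizeof(size_t) bytes: 4 on a 32-bit platform, 8 on a typical 64-bit one);
write the string itself (the code below includes its terminating null character);
write the length of the vector (in bytes, which is easier to parse later);
write the vector data. &vec[0] is the address of the first element; you can write all elements in one shot, since this buffer is contiguous.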
Write:
size_t size = mi->term.size() + 1;                // include the terminating '\0'
outfile.write((const char*)&size, sizeof(size));
outfile.write(mi->term.c_str(), size);
size = mi->pl.size() * sizeof(int);               // vector payload in bytes
outfile.write((const char*)&size, sizeof(size));
outfile.write((const char*)mi->pl.data(), size);  // data() is safe even for an empty vector
Read:
infile.read((char*)&size, sizeof(size));          // string size, including '\0'
mi->term.resize(size - 1);                        // make string the right size
infile.read(&mi->term[0], size - 1);              // read the characters (writing through c_str() is undefined)...
infile.ignore(1);                                 // ...then skip the stored '\0'
infile.read((char*)&size, sizeof(size));          // vector size in bytes
mi->pl.resize(size / sizeof(int));
infile.read((char*)mi->pl.data(), size);

Read CString from buffer with unknown length?

Let's say I have a file. I read all the bytes into an unsigned char buffer. From there I'm trying to read a C string (null-terminated) without knowing its length.
I tried the following:
char* Stream::ReadCString()
{
char str[0x10000];
int len = 0;
char* pos = (char*)(this->buffer[this->position]);
while(*pos != 0)
str[len++] = *pos++;
this->position += len+ 1;
return str;
}
I thought I could fill in each char of the str array as I went, checking whether the char was the null terminator. This is not working. Any help?
this->buffer = array of bytes
this->position = position in the array
Are there any other methods to do this? I guess I could run it by the address of the actual buffer:
str[len++] = *(char*)(this->buffer[this->position++]) ?
Update:
My new function:
char* Stream::ReadCString()
{
this->AdvPosition(strlen((char*)&(this->buffer[this->position])) + 1);
return (char*)&(this->buffer[this->position]);
}
and calling it with:
printf( "String: %s\n", s.ReadCString()); //tried casting to char* as well just outputs blank string
Check this:
#include <cctype>
#include <cstring>
#include <iostream>
class A
{
unsigned char buffer[4096];
int position;
public:
A() : position(0)
{
memset(buffer, 0, 4096);
char *pos = reinterpret_cast<char*>(&(this->buffer[50]));
strcpy(pos, "String");
pos = reinterpret_cast<char*>(&(this->buffer[100]));
strcpy(pos, "An other string");
}
const char *ReadString()
{
if (this->position != 4096)
{
while (this->position != 4096 && !std::isalpha(this->buffer[this->position])) // check the bound first so buffer[4096] is never read
this->position++;
if (this->position == 4096)
return 0;
void *tmp = &(this->buffer[this->position]);
char *str = static_cast<char *>(tmp);
this->position += strlen(str);
return (str);
}
return 0;
}
};
The reinterpret_casts are only needed for the initialization here; in your case the buffer is filled from a file.
int main()
{
A test;
while (const char* s = test.ReadString()) // ReadString returns 0 when nothing is left; streaming a null char* is undefined
std::cout << s << std::endl;
}
http://ideone.com/LcPdFD
Edit: I have changed the end of ReadString().
str is a local C string. Dereferencing a pointer to str outside the function is undefined behavior (see "Undefined, unspecified and implementation-defined behavior"); it may or may not cause a noticeable problem.
Null termination is probably the best way to go as long as you're careful, but the most likely reason it's not working for you is that you are returning a pointer to memory allocated on the stack. That memory is released as soon as the function returns, so using it afterwards is undefined behaviour. Instead, allocate your chars on the heap:
char* str = new char[0x10000];
and release the memory with delete[] when the caller doesn't need it anymore.
It can be fixed with the following method. I was advancing the position first and then returning the address, so the returned pointer pointed just past the string instead of at it.
char* Stream::ReadCString()
{
u64 str_len = strlen((char*)&(this->buffer[this->position])) + 1;
this->AdvPosition(str_len);
return (char*)&(this->buffer[this->position - str_len]);
}
Hope this helps anyone.

Compare two files

I'm trying to write a function that compares the contents of two files.
I want it to return 1 if the files are the same, and 0 if they are different.
ch1 and ch2 work as buffers, and I used fgets to get the contents of my files.
I think there is something wrong with the eof pointer, but I'm not sure. The FILE variables are given on the command line.
P.S. It works with small files under 64 KB, but doesn't work with larger files (700 MB movies, for example, or 5 MB .mp3 files).
Any ideas how to make it work?
int compareFile(FILE* file_compared, FILE* file_checked)
{
bool diff = 0;
int N = 65536;
char* b1 = (char*) calloc (1, N+1);
char* b2 = (char*) calloc (1, N+1);
size_t s1, s2;
do {
s1 = fread(b1, 1, N, file_compared);
s2 = fread(b2, 1, N, file_checked);
if (s1 != s2 || memcmp(b1, b2, s1)) {
diff = 1;
break;
}
} while (!feof(file_compared) || !feof(file_checked));
free(b1);
free(b2);
if (diff) return 0;
else return 1;
}
EDIT: I've improved this function using your answers. But it still compares only the first buffer, with one exception: I figured out that it stops reading the file when it reaches a 0x1A character (see the attached file). How can I make it work?
EDIT2: Task solved (working code attached). Thanks to everyone for the help!
If you can give up a little speed, here is a C++ way that requires little code:
#include <fstream>
#include <iterator>
#include <string>
#include <algorithm>
bool compareFiles(const std::string& p1, const std::string& p2) {
std::ifstream f1(p1, std::ifstream::binary|std::ifstream::ate);
std::ifstream f2(p2, std::ifstream::binary|std::ifstream::ate);
if (f1.fail() || f2.fail()) {
return false; //file problem
}
if (f1.tellg() != f2.tellg()) {
return false; //size mismatch
}
//seek back to beginning and use std::equal to compare contents
f1.seekg(0, std::ifstream::beg);
f2.seekg(0, std::ifstream::beg);
return std::equal(std::istreambuf_iterator<char>(f1.rdbuf()),
std::istreambuf_iterator<char>(),
std::istreambuf_iterator<char>(f2.rdbuf()));
}
By using istreambuf_iterators you push the buffer size choice, actual reading, and tracking of eof into the standard library implementation. std::equal returns when it hits the first mismatch, so this should not run any longer than it needs to.
This is slower than Linux's cmp, but it's very easy to read.
Here's a C++ solution. It seems appropriate since your question is tagged C++. The program uses ifstreams rather than FILE*s. It also shows how to seek on a file stream to determine a file's size. Finally, it reads blocks of 4096 bytes at a time, so large files are processed as expected.
// g++ -Wall -Wextra equifile.cpp -o equifile.exe
#include <iostream>
using std::cout;
using std::cerr;
using std::endl;
#include <fstream>
using std::ios;
using std::ifstream;
#include <exception>
using std::exception;
#include <algorithm> // std::min
#include <cstring>
#include <cstdlib>
using std::exit;
using std::memcmp;
bool equalFiles(ifstream& in1, ifstream& in2);
int main(int argc, char* argv[])
{
if(argc != 3)
{
cerr << "Usage: equifile.exe <file1> <file2>" << endl;
exit(-1);
}
try {
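ifstream in1(argv[1], ios::binary);
ifstream in2(argv[2], ios::binary);
if(!in1 || !in2) // otherwise both sizes come back as -1 and compare equal
{
cerr << "Cannot open input file" << endl;
exit(-1);
}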
if(equalFiles(in1, in2)) {
cout << "Files are equal" << endl;
exit(0);
}
else
{
cout << "Files are not equal" << endl;
exit(1);
}
} catch (const exception& ex) {
cerr << ex.what() << endl;
exit(-2);
}
return -3;
}
bool equalFiles(ifstream& in1, ifstream& in2)
{
ifstream::pos_type size1, size2;
size1 = in1.seekg(0, ifstream::end).tellg();
in1.seekg(0, ifstream::beg);
size2 = in2.seekg(0, ifstream::end).tellg();
in2.seekg(0, ifstream::beg);
if(size1 != size2)
return false;
static const size_t BLOCKSIZE = 4096;
size_t remaining = size1;
while(remaining)
{
char buffer1[BLOCKSIZE], buffer2[BLOCKSIZE];
size_t size = std::min(BLOCKSIZE, remaining);
if(!in1.read(buffer1, size) || !in2.read(buffer2, size))
return false; // read error
if(0 != memcmp(buffer1, buffer2, size))
return false;
remaining -= size;
}
return true;
}
When the files are binary, use memcmp not strcmp as \0 might appear as data.
Since you've allocated your arrays on the stack, they contain indeterminate (garbage) values; they aren't zeroed out.
Secondly, strcmp only compares up to the first null byte, which in a binary file won't necessarily be at the end of the file. Therefore you should really be using memcmp on your buffers. But even then the results can be unpredictable because your buffers were allocated on the stack: even if you compare two files that are the same, the parts of the buffers past each file's EOF may differ, so if you compare whole buffers, memcmp will most likely report that the files differ when they don't.
To get around this, you should first measure the length of each file in bytes by iterating through it, then use malloc or calloc to allocate the buffers you're going to compare and fill them with the files' actual contents. Then you can make a valid comparison of the binary contents of each file. You'll also be able to work with files larger than 64K, since you're allocating the buffers dynamically at run time.
Switch's code looks good to me, but if you want an exact comparison, the while condition and the return need to be altered:
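#include <cstdio>
#include <cstring>
int compareFile(FILE* f1, FILE* f2) {
const int N = 10000; // a constant, so buf1/buf2 are ordinary arrays, not non-standard VLAs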
char buf1[N];
char buf2[N];
do {
size_t r1 = fread(buf1, 1, N, f1);
size_t r2 = fread(buf2, 1, N, f2);
if (r1 != r2 ||
memcmp(buf1, buf2, r1)) {
return 0; // Files are not equal
}
} while (!feof(f1) && !feof(f2));
return feof(f1) && feof(f2);
}
It's better to use fread and memcmp to avoid \0 character issues. Also, the !feof checks really should be || instead of &&, since there's a small chance that one file is bigger than the other and the smaller file's size is an exact multiple of your buffer size.
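#include <cstdio>
#include <cstring>
int compareFile(FILE* f1, FILE* f2) {
const int N = 10000; // a constant, so buf1/buf2 are ordinary arrays, not non-standard VLAs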
char buf1[N];
char buf2[N];
do {
size_t r1 = fread(buf1, 1, N, f1);
size_t r2 = fread(buf2, 1, N, f2);
if (r1 != r2 ||
memcmp(buf1, buf2, r1)) {
return 0;
}
} while (!feof(f1) || !feof(f2));
return 1;
}