reading data in C++

I have the following MATLAB code to read binary data:
nfft = 256;
navg = 1024;
nsamps = navg * nfft;
f_s = 8e6;
nblocks = floor(10 / (nsamps / f_s));
for i = 1:nblocks
    nstart = 1 + (i - 1) * nsamps;
    fid = fopen('data.dat'); % binary data, 320 MB
    fseek(fid, 4 * nstart, 'bof');
    y = fread(fid, [2, nsamps], 'short');
    x = complex(y(1,:), y(2,:));
end
It gives me complex data with length up to 8e6. I am trying to write C++ code that does the same thing the MATLAB code does, but I either cannot get all the data, or the values do not match the original.
Can anyone help with ideas?
Here is my C++ code which I am working on.
Thank you so much.
#include <cstdio>
#include <cstring>
#include <iostream>
#include <complex>
#include <vector>
#include <cstdlib>

struct myfunc {
    char* name;
};

int main() {
    int w[40];   // declarations taken from the asker's comment below
    myfunc c;
    FILE* r = fopen("data.bin", "rb");
    fread(w, sizeof(int), 30, r);
    fread(&c, sizeof(myfunc), 1, r);
    for (int i = 0; i < 30; i++) {
        std::cout << i << ". " << w[i] << std::endl;
    }
    return 0;
}

Based on a comment from the asker: c is an instance of the struct myfunc and w is the array, so the declarations are: int w[40]; myfunc c;
fread(&c, sizeof(myfunc),1,r);
will read one pointer's worth of data from the file stream r into c. This will not be particularly useful, as whatever address myfunc.name pointed at when the file was written will almost certainly be invalid when the file is read back.
Solution: serialize myfunc.name when writing to the file and deserialize it when reading. There is insufficient information in the question to suggest how best to do this. I would store the string Pascal style, prepending the length of myfunc.name to make reading it back easier:
int len = strlen(myfunc.name);
fwrite(&len, sizeof(len), 1, outfile); // write length
fwrite(myfunc.name, len, 1, outfile); // write string
and read it
int len;
fread(&len, sizeof(len), 1, infile); // read length
myfunc.name = new char[len+1]; // size string with space for terminator
fread(myfunc.name, len, 1, infile); // read string
myfunc.name[len] = '\0'; // terminate string
Note that the above code completely ignores endianness and error handling.
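Since the original question was about porting the MATLAB fread loop, here is a minimal sketch of one way to do that part in C++ (my example, not from the answer above), assuming the file holds interleaved 16-bit (I,Q) pairs as the MATLAB 'short' read implies; the sizes are taken from the question:

#include <complex>
#include <cstdio>
#include <vector>

int main() {
    const long nfft = 256, navg = 1024;
    const long nsamps = navg * nfft;                 // complex samples per block
    const double f_s = 8e6;
    const long nblocks = static_cast<long>(10.0 / (nsamps / f_s));

    FILE* fid = std::fopen("data.dat", "rb");
    if (!fid) { std::perror("fopen"); return 1; }

    std::vector<short> y(2 * nsamps);                // interleaved I/Q shorts
    std::vector<std::complex<float>> x(nsamps);

    for (long i = 0; i < nblocks; ++i) {
        long nstart = i * nsamps;                    // 0-based, unlike MATLAB's 1-based nstart
        std::fseek(fid, 4L * nstart, SEEK_SET);      // 4 bytes per complex sample
        size_t got = std::fread(y.data(), sizeof(short), y.size(), fid);
        for (size_t k = 0; k + 1 < got; k += 2)
            x[k / 2] = std::complex<float>(y[k], y[k + 1]);
        // ... process block x here ...
    }
    std::fclose(fid);
}

Each complex sample occupies 4 bytes (two shorts), which is why the MATLAB code multiplies nstart by 4 before seeking.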

Related

How can I get the latest changes in a file using ifstream?

It's a real-time capture system; I need to get the latest changes from a file which is occasionally edited (mostly by appending content) by other applications.
In other words, how can I get the content that is added while I have the file open, without reopening it?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(){
    ifstream tfile("temp.txt", ios::in);
    if(!tfile){
        cout << "open failed" << endl;
        return 0;
    }
    string str;
    while(1){
        if(tfile.eof())
            continue;
        getline(tfile, str);
        cout << str << endl;
    }
    tfile.close();
}
C++ / C Solution
If you are looking for a C++ solution, you can use the following function, which I created a while back:
#include <iostream>
#include <string>
// For sleep function
#ifdef _WIN32
#include <Windows.h>
#else
#include <unistd.h>
#endif
using namespace std;
void watchLogs(const char *FILENAME) {
    FILE *f = fopen(FILENAME, "r");
    unsigned long size = 0;
    int c;
    while (true) {
        if (!size) { // first pass: print the current content of the log file.
                     // If you just want the updates, you can skip this block.
            fseek(f, 0, SEEK_END);
            size = (unsigned long)ftell(f);
            fseek(f, 0, SEEK_SET);
            char buffer[size + 1];
            fread(buffer, 1, size, f);
            buffer[size] = '\0';
            cout << buffer << "\n";
        }
        else if ((c = fgetc(f)) != EOF) {
            fseek(f, 0, SEEK_END);                              // reach end of file
            int BUFFER_SIZE = (unsigned long)ftell(f) - size;   // length of the update to your logs
            char buffer[BUFFER_SIZE + 1];                       // prepare a buffer for the new characters
            fseek(f, -BUFFER_SIZE, SEEK_END);                   // rewind BUFFER_SIZE characters before the EOF
            int i = 0;
            do { buffer[i++] = (char)fgetc(f); } while (i < BUFFER_SIZE); // copy to buffer
            buffer[i] = '\0';                                   // don't forget to NUL-terminate the buffer
            cout << buffer << "\n";
            size += i;                                          // update the recorded size of the file
        }
        // Updates are checked every 3 seconds to avoid running the CPU at
        // full speed; set the interval to whatever suits you.
#ifdef _WIN32
        Sleep(3000);
#else
        sleep(3);
#endif
        clearerr(f); // clear the EOF indicator so the next fgetc can see appended data
    }
    fclose(f); // unreachable as written, but kept for completeness
}
And you can test it with:
int main(int argc, char **argv) {
if (argc < 2)
return 1;
const char *FILENAME = argv[1];
watchLogs(FILENAME);
return 0;
}
./a.out mysql_binary.log
I could have used a stringstream, but I like that this version would also work as C code with some minor tweaks (it can't use std::string).
I hope you will find it helpful!
NB: This assumes that your file will only grow and that changes are appended to the end of the file.
NB2: This program is not segfault-proof; you may want to check the return value of fopen, etc.
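If you would rather stay with ifstream, as the question asks, here is a minimal sketch of the same idea (my example, under the same only-grows assumption): once getline hits end-of-file, clear the stream state and retry after a delay.

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main() {
    std::ifstream tfile("temp.txt");            // file name from the question
    if (!tfile) { std::cout << "open failed\n"; return 1; }
    std::string str;
    while (true) {
        while (std::getline(tfile, str))        // drain everything currently available
            std::cout << str << '\n';
        tfile.clear();                          // clear eofbit so reading can resume
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

One caveat of this sketch: a last line that is still being written may be picked up before its newline arrives and thus be split across two reads.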
Inotify
If you use Linux you could also potentially go for inotify:
Install inotify-tools: sudo apt-get install -y inotify-tools
Then create the following script mywatch.sh:
while inotifywait -e close_write $1; do ./$1; done
Give it permission to execute:
chmod +x mywatch.sh
and call it with ./mywatch.sh mysql_binary.log

What are the fastest methods to read from a file in standard C++? [duplicate]

I am currently writing a program in C++ which includes reading lots of large text files. Each has ~400,000 lines with, in extreme cases, 4,000 or more characters per line. Just for testing, I read one of the files using ifstream and the implementation offered by cplusplus.com. It took around 60 seconds, which is far too long. Now I was wondering: is there a straightforward way to improve reading speed?
edit:
The code I am using is more or less this:
string tmpString;
ifstream txtFile(path);
if(txtFile.is_open())
{
    while(txtFile.good())
    {
        m_numLines++;
        getline(txtFile, tmpString);
    }
    txtFile.close();
}
edit 2: The file I read is only 82 MB in size. I mainly said that lines could reach 4,000 characters because I thought it might be necessary to know in order to do buffering.
edit 3: Thank you all for your answers, but it seems like there is not much room for improvement given my problem. I have to use getline, since I want to count the number of lines. Instantiating the ifstream as binary didn't make reading any faster either. I will try to parallelize it as much as I can; that should work at least.
edit 4: So apparently there are some things I can do. Big thank you to sehe for putting so much time into this, I appreciate it a lot! =)
Updates: Be sure to check the (surprising) updates below the initial answer
Memory mapped files have served me well [1]:
#include <boost/iostreams/device/mapped_file.hpp> // for mmap
#include <algorithm> // for std::find
#include <iostream> // for std::cout
#include <cstring>
int main()
{
    boost::iostreams::mapped_file mmap("input.txt", boost::iostreams::mapped_file::readonly);
    auto f = mmap.const_data();
    auto l = f + mmap.size();

    uintmax_t m_numLines = 0;
    while (f && f != l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l - f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}
This should be rather quick.
Update
In case it helps you test this approach, here's a version using mmap directly instead of using Boost: see it live on Coliru
#include <algorithm>
#include <iostream>
#include <cstring>
// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
const char* map_file(const char* fname, size_t& length);

int main()
{
    size_t length;
    auto f = map_file("test.cpp", length);
    auto l = f + length;

    uintmax_t m_numLines = 0;
    while (f && f != l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l - f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}

void handle_error(const char* msg) {
    perror(msg);
    exit(255);
}

const char* map_file(const char* fname, size_t& length)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    // obtain file size
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");
    length = sb.st_size;

    const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED)
        handle_error("mmap");

    // TODO close fd at some point in time, call munmap(...)
    return addr;
}
Update
The last bit of performance I could squeeze out of this I found by looking at the source of GNU coreutils wc. To my surprise using the following (greatly simplified) code adapted from wc runs in about 84% of the time taken with the memory mapped file above:
static uintmax_t wc(char const *fname)
{
    static const auto BUFFER_SIZE = 16*1024;
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    /* Advise the kernel of our access pattern. */
    posix_fadvise(fd, 0, 0, 1); // FDADVICE_SEQUENTIAL

    char buf[BUFFER_SIZE + 1];
    uintmax_t lines = 0;

    while (size_t bytes_read = read(fd, buf, BUFFER_SIZE))
    {
        if (bytes_read == (size_t)-1)
            handle_error("read failed");
        if (!bytes_read)
            break;

        for (char *p = buf; (p = (char*)memchr(p, '\n', (buf + bytes_read) - p)); ++p)
            ++lines;
    }

    return lines;
}
[1] See e.g. the benchmark here: How to parse space-separated floats in C++ quickly?
4,000 × 400,000 = 1.6 GB. If your hard drive isn't an SSD, you're likely getting ~100 MB/s sequential read; that's 16 seconds just in I/O.
Since you don't elaborate on the specific code you're using or how you need to parse these files (do you need to read them line by line? does the system have a lot of RAM, so you could read the whole file into a large buffer and then parse it?), there's little one can suggest to speed up the process.
Memory-mapped files won't offer any performance improvement when reading a file sequentially. Perhaps manually parsing large chunks for newlines rather than using getline would offer an improvement.
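As a sketch of that chunked approach (my example, not the answerer's; the file name is hypothetical): read fixed-size blocks and scan them with memchr instead of calling getline once per line.

#include <cstdint>
#include <cstring>
#include <fstream>
#include <iostream>

int main() {
    std::ifstream in("input.txt", std::ios::binary);
    char buf[1 << 16];                        // 64 KB chunks
    uintmax_t lines = 0;
    while (in) {
        in.read(buf, sizeof buf);
        std::streamsize n = in.gcount();      // bytes actually read; the last chunk may be short
        const char* p = buf;
        const char* end = buf + n;
        while ((p = static_cast<const char*>(std::memchr(p, '\n', end - p))))
            ++lines, ++p;
    }
    std::cout << "lines = " << lines << "\n";
}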
EDIT: After doing some learning (thanks @sehe), here's the memory mapped solution I would likely use.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <errno.h>
int main() {
    const char* fName = "big.txt";
    struct stat sb;
    long cntr = 0;
    int fd, lineLen;
    char *data;
    char *line;

    // map the file
    fd = open(fName, O_RDONLY);
    fstat(fd, &sb);
    //// int pageSize;
    //// pageSize = getpagesize();
    //// data = mmap((caddr_t)0, pageSize, PROT_READ, MAP_PRIVATE, fd, pageSize);
    data = (char*)mmap((caddr_t)0, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    line = data;

    // get lines
    while(cntr < sb.st_size) {
        lineLen = 0;
        line = data;
        // find the next line
        while(*data != '\n' && cntr < sb.st_size) {
            data++;
            cntr++;
            lineLen++;
        }
        // step past the newline, otherwise the loop would spin on it forever
        if(cntr < sb.st_size) {
            data++;
            cntr++;
        }
        /***** PROCESS LINE *****/
        // ... processLine(line, lineLen);
    }
    return 0;
}
Neil Kirk, unfortunately I cannot reply to your comment (not enough reputation), but I did a performance test on ifstream and stringstream, and the performance, reading a text file line by line, is exactly the same.
std::stringstream stream;
std::string line;
while(std::getline(stream, line)) {
}
This takes 1426ms on a 106MB file.
std::ifstream stream;
std::string line;
while(stream.good()) {
    getline(stream, line);
}
This takes 1433ms on the same file.
The following code is faster:
const int MAX_LENGTH = 524288;
char* line = new char[MAX_LENGTH];
while (iStream.getline(line, MAX_LENGTH) && strlen(line) > 0) {
}
This takes 884ms on the same file.
It is just a little tricky since you have to set the maximum size of your buffer (i.e. maximum length for each line in the input file).
As someone with a little background in competitive programming, I can tell you: at least for simple things like integer parsing, the main cost in C is locking the file streams (which is done by default to support multi-threading). Use the unlocked_stdio versions instead (fgetc_unlocked(), fread_unlocked()). For C++, the common lore is to use std::ios::sync_with_stdio(false), but I don't know if it's as fast as unlocked_stdio.
For reference here is my standard integer parsing code. It's a lot faster than scanf, as I said mainly due to not locking the stream. For me it was as fast as the best hand-coded mmap or custom buffered versions I'd used previously, without the insane maintenance debt.
int readint(void)
{
    int n, c;
    n = getchar_unlocked() - '0';
    while ((c = getchar_unlocked()) > ' ')
        n = 10*n + c-'0';
    return n;
}
(Note: This one only works if there is precisely one non-digit character between any two integers).
And of course avoid memory allocation if possible...
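For the C++ side mentioned above, a minimal sketch of the sync_with_stdio incantation (my example): detach iostreams from C stdio and untie cin from cout so every read doesn't flush the output.

#include <iostream>

int main() {
    std::ios::sync_with_stdio(false);  // stop synchronizing with C stdio
    std::cin.tie(nullptr);             // don't flush cout before each cin read
    int n, x;
    long long sum = 0;
    std::cin >> n;
    while (n-- > 0 && std::cin >> x)
        sum += x;                      // e.g. sum n integers from stdin
    std::cout << sum << '\n';
}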
Do you have to read all files at the same time? (at the start of your application for example)
If you do, consider parallelizing the operation.
Either way, consider using binary streams, or unbuffered reads of blocks of data; a sketch of the parallel variant follows.
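One way to parallelize, sketched here under the assumption that there are several independent files (the file names are hypothetical): run one line-counting task per file with std::async.

#include <cstdint>
#include <fstream>
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Count lines in one file by scanning fixed-size chunks.
static uintmax_t countLines(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char buf[1 << 16];
    uintmax_t lines = 0;
    while (in) {
        in.read(buf, sizeof buf);
        for (std::streamsize i = 0, n = in.gcount(); i < n; ++i)
            if (buf[i] == '\n') ++lines;
    }
    return lines;
}

int main() {
    std::vector<std::string> files = {"a.txt", "b.txt", "c.txt"};
    std::vector<std::future<uintmax_t>> tasks;
    for (const auto& f : files)                      // one asynchronous task per file
        tasks.push_back(std::async(std::launch::async, countLines, f));
    for (size_t i = 0; i < files.size(); ++i)
        std::cout << files[i] << ": " << tasks[i].get() << " lines\n";
}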
Use random file access or binary mode. For sequential reading the gain is big, but it still depends on what you are reading.

C++ Attempt to optimize printing data to binary file for every frame

void demodlg::printData(short* data)
{
    FILE* pF;
    char buf[50];
    snprintf(buf, sizeof(buf), "%s\\%s\\%s%d.binary", "test", "data", "data", frameNum++);
    pF = fopen(buf, "wb");

    int lines = frameDescr->m_numLines;
    int samples = frameDescr->m_pLineTypeDescr[0].m_numSamples;
    int l, s;

    fprintf(pF, "\t");
    for (l = 0; l < lines; l++)
    {
        fprintf(pF, "%d\t", l);
    }
    fprintf(pF, "\n");

    for (s = 0; s < samples; s++)
    {
        fprintf(pF, "%d)\t", s);
        for (l = 0; l < lines; l++)
        {
            fprintf(pF, "%d\t", *(data + l * samples + s));
        }
        fprintf(pF, "\n");
    }
    fclose(pF);
}
I have the code snippet above which just takes in some data and then writes it out to a binary file. This function gets called about 20-30 times per second, so I'm trying to optimize it as much as possible. Each file that it writes to is about 1 MB in size. Ideally, I'd be able to write 20-30 MB per second. As of now, it's not at that rate.
Does anyone have any ideas on how I can optimize this further?
I originally was writing to a txt file before changing to a binary file, but the difference isn't too noticeable, surprisingly.
Also, frameDescr gets updated for every frame so I believe I do need to get access to the lines and samples variables from inside, unfortunately.
I found this post to refer to (Writing a binary file in C++ very fast) but I'm not sure how I can apply it to mine.
Here is a short example of how I would write an array of data to a binary file and how I would read it back.
I do not understand the concept or purpose of lines in your code, so I did not attempt to replicate it. If you have additional data you need to write so the data can be reconstructed when read, I have placed comments noting where you could insert that code.
Keep in mind that data written as binary must be read back the same way. So if you were writing text in a particular format so that another program could consume it, a binary file will not work for you unless you modify that other program, or add a step that reads the binary data and writes the text format before consumption.
Assuming there is a speed advantage to writing the data as binary, adding such a conversion step is still beneficial, because you can run it offline when you're not trying to maintain a particular frame rate (a sketch of such a converter follows the code below).
Normally, since you tagged this C++, I would prefer manipulating the data in a vector and perhaps using C++ streams to write and read the data, but I tried to keep this as similar to your code as possible.
#include <cstdio>
#include <stdint.h>

const size_t kNumEntries = 128 * 1024;

void writeData(const char *filename, int16_t *data, size_t numEntries)
{
    FILE *f = fopen(filename, "wb");
    if (!f)
    {
        fprintf(stderr, "Error opening file: '%s'\n", filename);
        return;
    }
    //If you have additional data that must be in the file, write it here,
    //either as individual items that are mirrored in the reader,
    //or using the pattern shown below for variable-sized data.

    //Write the number of entries to the file so the reader
    //will know how much memory to allocate and how many to read.
    fwrite(&numEntries, sizeof(numEntries), 1, f);

    //Write the actual data
    fwrite(data, sizeof(*data), numEntries, f);

    fclose(f);
}

int16_t* readData(const char *filename)
{
    FILE *f = fopen(filename, "rb");
    if (!f)
    {
        fprintf(stderr, "Error opening file: '%s'\n", filename);
        return 0;
    }
    //If you have additional data to read, do it here.
    //This code should mirror the writing function.

    //Read the number of entries in the file.
    size_t numEntries;
    fread(&numEntries, sizeof(numEntries), 1, f);

    //Allocate memory for the entries and read them into it.
    int16_t *data = new int16_t[numEntries];
    fread(data, sizeof(*data), numEntries, f);

    fclose(f);
    return data;
}

int main()
{
    int16_t *dataToWrite = new int16_t[kNumEntries];

    for (size_t i = 0; i < kNumEntries; ++i)
        dataToWrite[i] = (int16_t)i;

    writeData("test.bin", dataToWrite, kNumEntries);
    int16_t *dataRead = readData("test.bin");

    for (size_t i = 0; i < kNumEntries; ++i)
    {
        if (dataToWrite[i] != dataRead[i])
        {
            fprintf(stderr,
                    "Data mismatch at entry %zu: dataToWrite = %d, dataRead = %d\n",
                    i, dataToWrite[i], dataRead[i]);
        }
    }

    delete[] dataToWrite;
    delete[] dataRead;
    return 0;
}
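As promised above, here is a sketch of the offline binary-to-text step (my example; the file names are hypothetical, and the output is a flat tab-separated list rather than the original line/sample matrix):

#include <cstdio>
#include <stdint.h>

int main() {
    FILE* in = fopen("test.bin", "rb");
    FILE* out = fopen("test.txt", "w");
    if (!in || !out) return 1;

    // Mirror the writer: first the entry count, then the raw int16_t data.
    size_t numEntries = 0;
    if (fread(&numEntries, sizeof(numEntries), 1, in) != 1) return 1;

    int16_t v;
    for (size_t i = 0; i < numEntries; ++i) {
        if (fread(&v, sizeof(v), 1, in) != 1) break;
        fprintf(out, "%d\t", (int)v);
    }
    fprintf(out, "\n");

    fclose(in);
    fclose(out);
    return 0;
}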

Program crashes when reading a long text file - "*.exe has stopped working"

The title describes it all. I am reading various files in my program, and once it reaches a relatively large file, the program crashes.
I wrote a shortened version of my program that replicates the issue.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>  // for strcpy
#include <assert.h>
#include <iostream>
#include <fstream>

char** load_File(char** preComputed, const int lines, const int sLength,
                 std::string fileName){
    //Declarations
    FILE *file;
    int C = lines+1;
    int R = sLength+2;
    int i; //Dummy index
    int len;

    //Create 2-D array on the heap
    preComputed = (char**) malloc(C*sizeof(char*));
    for(i = 0; i < C; i++) preComputed[i] = (char *) malloc(R*sizeof(char));
    //Need to free each element individually later on

    //Create temporary char array
    char* line = (char *) malloc(R*sizeof(char));
    assert(preComputed);

    //Open file to read and store values
    file = fopen(fileName.c_str(), "r");
    if(file == NULL){ perror("\nError opening file"); return NULL;}
    else{
        i = 0;
        while(fgets(line, R, file) != NULL){
            //Remove trailing newline
            len = R;
            if((line[len-1]) == '\n') (line[len-1]) = '\0';
            len--; // Decrement length by one because of replacing EOL
                   // with the null terminator
            //Copy character set
            strcpy(preComputed[i], line);
            i++;
        }
        preComputed[C-1] = NULL; //Append null terminator
        free(line);
    }
    return preComputed;
}

int main(void){
    char** preComputed = NULL;
    std::string name = "alphaLow3.txt";
    system("pause");
    preComputed = load_File(preComputed, 17576, 3, name);
    if(preComputed == NULL){
        std::cout << "\nAn error has been encountered...";
        system("PAUSE");
        exit(1);
    }
    //Free preComputed
    for(int y = 0; y < 17576; y++){
        free(preComputed[y]);
    }
    free(preComputed);
}
This program will crash when it is executed. Here are two links to the text files.
alphaLow3.txt
alphaLow2.txt
To run alphaLow2.txt, change the numbers in the load_File call to 676 and 2, respectively.
When this program reads alphaLow2.txt, it executes successfully. However, when it reads alphaLow3.txt, it crashes. That file is only 172 KB; I have files that are 1 MB or larger. I thought I allocated enough memory, but I may be missing something.
The program is supposed to be in C, but I've included some C++ functions for ease.
Any constructive input is appreciated.
You must confirm your file's length. The file alphaLow3.txt has a total of 35,152 lines, but in your program you set lines to 17576. This is the main reason for the crash.
In addition, regarding this line:
if((line[len-1]) == '\n') (line[len-1]) = '\0';
fgets stores the newline and then null-terminates the buffer. For example, the first line is stored as 'a' 'a' 'a' '\n' '\0'. So you should do it like this:
if((line[len-2]) == '\n') (line[len-2]) = '\0';
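A more robust sketch (mine, not from the answer) strips the newline whether or not the line filled the buffer; it needs <string.h>:

// strcspn returns the index of the first '\n', or of the terminator if
// there is none, so this works for both full and short lines from fgets.
line[strcspn(line, "\n")] = '\0';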

Reading and printing an entire file in binary mode using C++

A follow-up to my previous question (Reading an entire file in binary mode using C++).
After reading a jpg file in binary mode, the result of the read operation is always 4 bytes. The code is:
FILE *fd = fopen("c:\\Temp\\img.jpg", "rb");
if(fd == NULL) {
    cerr << "Error opening file\n";
    return;
}

fseek(fd, 0, SEEK_END);
long fileSize = ftell(fd);
int *stream = (int *)malloc(fileSize);
fseek(fd, 0, SEEK_SET);

int bytes_read = fread(stream, fileSize, 1, fd);
printf("%x\n", *stream);
fclose(fd);
The second-to-last statement, the printf, always prints the first 4 bytes and not the entire file contents. How can I print the entire content of the jpg file?
Thanks.
You want it in C++? This opens a file, reads the entire contents into an array and prints the output to the screen:
#include <fstream>
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdio>   // for printf
#include <cctype>   // for isprint

using namespace std;

void hexdump(void *ptr, int buflen)
{
    unsigned char *buf = (unsigned char*)ptr;
    int i, j;
    for (i = 0; i < buflen; i += 16) {
        printf("%06x: ", i);
        for (j = 0; j < 16; j++) {
            if (i+j < buflen)
                printf("%02x ", buf[i+j]);
            else
                printf("   ");
        }
        printf(" ");
        for (j = 0; j < 16; j++) {
            if (i+j < buflen)
                printf("%c", isprint(buf[i+j]) ? buf[i+j] : '.');
        }
        printf("\n");
    }
}
int main()
{
    ifstream in;
    in.open("C:\\ISO\\ITCHOUT.txt", ios::in | ios::binary);
    if(in.is_open())
    {
        // get the starting position
        streampos start = in.tellg();

        // go to the end
        in.seekg(0, std::ios::end);

        // get the ending position
        streampos end = in.tellg();

        // go back to the start
        in.seekg(0, std::ios::beg);

        // create a vector to hold the data that
        // is resized to the total size of the file
        std::vector<char> contents;
        contents.resize(static_cast<size_t>(end - start));

        // read it in
        in.read(&contents[0], contents.size());

        // print it out (for clarity)
        hexdump(contents.data(), contents.size());
    }
}
stream is a pointer to an int (the first element of the array you allocated [1]). *stream dereferences that pointer and gives you the first int.
A pointer is not an array. A pointer is not a buffer. Therefore, it carries no information about the size of the array it points to. There is no way you can print the entire array by providing only a pointer to the first element.
Whatever method you use to print that out, you'll need to provide the size information along with the pointer.
C++ happens to have a pointer + size package in its standard library: std::vector. I would recommend using that. Alternatively, you can just loop through the array yourself (which means using the size information) and print all its elements.
[1] Make sure the size of the file is a multiple of sizeof(int)!
Something like the following should do it. The variable bytes_read holds the number of blocks read; in your case the block size is the file size, so only one block can be read.
You should use a for loop to print the whole file. You're only printing one pointer address.
char *stream = (char *)malloc(fileSize);
fseek(fd, 0, SEEK_SET);
int bytes_read = fread(stream, fileSize, 1, fd);
for(int i = 0; i < fileSize; i++){
    printf("%d ", stream[i]);
}
I print the chars as numbers, since binary data is not readable in the console; I don't know how you wanted the data to be formatted.
This is just meant as a reference for your sample. You should really consider using Chad's sample; this is a far worse solution (it mixes C and C++ far too much), included just for the sake of completeness.