I want to read a big file and then write it to a new file using Qt.
I have tried reading a big file that contains only one line, testing both readAll() and readLine().
If the data file is about 600MB, my code runs, although it is slow.
If the data file is about 6GB, my code fails.
Can you give me some suggestions?
Update
My test code is as follows:
#include <QApplication>
#include <QFile>
#include <QTextStream>
#include <QTime>
#include <QDebug>
#define qcout qDebug()
void testFile07()
{
    QFile inFile("../03_testFile/file/bigdata03.txt");
    if (!inFile.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        qcout << inFile.errorString();
        return;
    }
    QFile outFile("../bigdata-read-02.txt");
    if (!outFile.open(QIODevice::WriteOnly | QIODevice::Truncate))
        return;

    QTime time1, time2;
    time1 = QTime::currentTime();
    while (!inFile.atEnd())
    {
        QByteArray arr = inFile.read(3 * 1024);
        outFile.write(arr);
    }
    time2 = QTime::currentTime();
    qcout << time1.msecsTo(time2);
}
void testFile08()
{
    QFile inFile("../03_testFile/file/bigdata03.txt");
    if (!inFile.open(QIODevice::ReadOnly | QIODevice::Text))
        return;
    QFile outFile("../bigdata-readall-02.txt");
    if (!outFile.open(QIODevice::WriteOnly | QIODevice::Truncate))
        return;

    QTime time1, time2, time3;
    time1 = QTime::currentTime();
    QByteArray arr = inFile.readAll();
    qcout << arr.size();
    time3 = QTime::currentTime();
    outFile.write(arr);    // write the buffer just read; a second readAll() would return nothing
    time2 = QTime::currentTime();
    qcout << time1.msecsTo(time2);
}
int main(int argc, char *argv[])
{
testFile07();
testFile08();
return 0;
}
After my tests, I can share my experience:
read() and readAll() are about equally fast at reading; in fact, read() is slightly faster.
The real difference is in writing.
With a 600MB file:
Using the read function, reading and writing the file cost about 2.1s, with 875ms for reading.
Using the readAll function, reading and writing the file cost about 10s, with 907ms for reading.
With a 6GB file:
Using the read function, reading and writing the file cost about 162s, with 58s for reading.
Using the readAll function, I get the wrong answer 0; it fails to run properly.
Open both files as QFiles. In a loop, read a fixed number of bytes, say 4K, into an array from the input file, then write that array into the output file. Continue until you run out of bytes.
However, if you just want to copy a file verbatim, you can use QFile::copy
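The chunked loop described above can also be sketched in portable standard C++; the buffer size, file names, and error-handling choices here are my own assumptions, not from the question:

```cpp
#include <fstream>
#include <vector>

// Copy a file in fixed-size chunks so memory use stays constant
// regardless of the file size.
bool copy_in_chunks(const char* src, const char* dst, std::size_t chunk = 4096)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out)
        return false;

    std::vector<char> buf(chunk);
    while (in)
    {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();   // bytes actually read this round
        if (got > 0)
            out.write(buf.data(), got);      // write only what was read
    }
    return static_cast<bool>(out);
}
```

The same pattern works for a 6GB file because at no point is more than one chunk held in memory.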
You can use QFile::map and use the pointer to the mapped memory to write in a single shot to the target file:
void copymappedfile(QString in_filename, QString out_filename)
{
    QFile in_file(in_filename);
    if (in_file.open(QFile::ReadOnly))
    {
        QFile out_file(out_filename);
        if (out_file.open(QFile::WriteOnly))
        {
            const qint64 filesize = in_file.size();
            uchar *mem = in_file.map(0, filesize, QFileDevice::MapPrivateOption);
            if (mem)    // map() returns nullptr on failure
            {
                out_file.write(reinterpret_cast<const char *>(mem), filesize);
                in_file.unmap(mem);
            }
            out_file.close();
        }
        in_file.close();
    }
}
One thing to keep in mind:
With read() you specify a maximum size for the chunk currently being read (3*1024 bytes in your example); with readAll() you tell the program to read the entire file at once.
In the first case you repeatedly fill a small 3072-byte buffer, write it out, and release it at the end of each loop iteration. In the second case the entire file has to be held in a single QByteArray in memory. Allocating one contiguous 600MB block may already be the reason for your performance issues, and with 6GB you may simply run out of memory or address space (and in Qt 5 a QByteArray cannot hold more than about 2GB in any case) - causing the read to fail and your program to misbehave.
Related
I am trying to update data in two JSON files, providing the filename at run time.
This is the updateToFile function, which writes the data stored in the JSON variable to two different files in two different threads.
void updateToFile()
{
    while (runInternalThread)
    {
        std::unique_lock<std::recursive_mutex> invlock(mutex_NodeInvConf);
        FILE *pFile;
        std::string conff = NodeInvConfiguration.toStyledString();
        pFile = fopen(filename.c_str(), "wb");
        std::ifstream file(filename);
        fwrite(conff.c_str(), sizeof(char), conff.length(), pFile);
        fclose(pFile);
        sync();
    }
}
thread 1:
std::thread nt(&NodeList::updateToFile,this);
thread 2:
std::thread it(&InventoryList::updateToFile,this);
Now it updates the files even if no data has changed since the previous execution. I want to update a file only if there is a change compared to what was previously stored; if there is no change, it should print that the data is the same.
Can anyone please help with this?
Thanks.
You can check if it has changed before writing.
void updateToFile()
{
    std::string previous;
    while (runInternalThread)
    {
        std::unique_lock<std::recursive_mutex> invlock(mutex_NodeInvConf);
        std::string conf = NodeInvConfiguration.toStyledString();
        if (conf != previous)
        {
            // TODO: error handling missing like in OP
            std::ofstream file(filename);
            file.write(conf.c_str(), conf.length());
            file.close();
            previous = std::move(conf);
            sync();
        }
    }
}
However, such constant polling in a loop is likely inefficient. You may add sleeps to make it less busy. Another option is to have NodeInvConfiguration itself track whether it has changed, and clear that flag when storing.
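A minimal sketch of that last idea: the configuration object tracks its own dirty flag, and the writer loop only touches the file when something actually changed, sleeping otherwise. All names here are made up for illustration:

```cpp
#include <atomic>
#include <chrono>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical configuration holder that tracks its own "dirty" state.
class Config {
public:
    void set(const std::string& s) {
        std::lock_guard<std::mutex> lock(m_);
        data_ = s;
        dirty_ = true;
    }
    // Copies the data out and returns true only if it changed since the
    // last successful fetch; clears the dirty flag.
    bool fetchIfChanged(std::string& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (!dirty_) return false;
        out = data_;
        dirty_ = false;
        return true;
    }
private:
    std::mutex m_;
    std::string data_;
    bool dirty_ = false;
};

void updateLoop(Config& cfg, const std::string& filename, std::atomic<bool>& run) {
    std::string snapshot;
    while (run) {
        if (cfg.fetchIfChanged(snapshot)) {
            std::ofstream file(filename, std::ios::trunc);
            file << snapshot;   // write only when something changed
        } else {
            // Nothing changed: back off instead of rewriting the file.
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }
}
```

The flag-plus-sleep combination avoids both the redundant writes and the busy loop of the original code.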
I have a recording application that is reading data from a network stream and writing it to file. It works very well, but I would like to display the file size as the data is being written. Every second the gui thread updates the status bar to update the displayed time of recording. At this point I would also like to display the current file size.
I originally consulted this question and have tried both the stat method:
struct stat stat_buf;
int rc = stat(recFilename.c_str(), &stat_buf);
std::cout << recFilename << " " << stat_buf.st_size << "\n";
(no error checking for simplicity) and the fseek method:
FILE *p_file = NULL;
p_file = fopen(recFilename.c_str(),"rb");
fseek(p_file,0,SEEK_END);
int size = ftell(p_file);
fclose(p_file);
but either way, I get 0 for the file size. When I go back and look at the file I write to, the data is there and the size is correct. The recording is happening on a separate thread.
I know that bytes are being written because I can print the size of the data as it is written in conjunction with the output of the methods shown above.
The filename plus the 0 is what I print out from the GUI thread. 'Bytes written x' is out of the recording thread.
You can read all about C++ file manipulations here http://www.cplusplus.com/doc/tutorial/files/
This is an example of how I would do it.
#include <fstream>
std::ifstream::pos_type filesize(const char* file)
{
    std::ifstream in(file, std::ifstream::ate | std::ifstream::binary);
    return in.tellg();
}
Hope this helps.
As a desperate alternative, you can use ftell in the thread that writes the data, or a variable to track the amount of data written; but coming back to the real problem, you must be making a mistake somewhere - maybe fopen never opens the file, or something like that.
I'll copy a test program to show that this works, at least in a single-threaded app:
int _tmain(int argc, _TCHAR* argv[])
{
    FILE *mFile;
    FILE *mFile2;

    mFile = fopen("hi.txt", "a+");
    // fseek(mFile, 0, SEEK_END);
    // ## this is to make sure that fputs and fwrite work equally
    // fputs("fopen example", mFile);
    fwrite("fopen ex", 1, 9, mFile);
    fseek(mFile, 0, SEEK_END);
    std::cout << ftell(mFile) << ":";

    mFile2 = fopen("hi.txt", "rb");
    fseek(mFile2, 0, SEEK_END);
    std::cout << ftell(mFile2) << std::endl;
    fclose(mFile2);
    fclose(mFile);
    getchar();
    return 0;
}
Just use freopen function before calling stat. It seems freopen refreshes the file length.
I realize this post is rather old at this point, but in response to @TylerD007: while that works, it is incredibly expensive if all you're trying to do is get the number of bytes written.
In C++17 and later, you can simply use the <filesystem> header and call

auto fileSize {std::filesystem::file_size(filePath)};

and now the variable fileSize holds the actual size of the file.
I have a problem in loading music from memory with SDL_mixer.
The following "minimal" example including a bit of error checking will always crash with an access violation in Music::play.
#include <SDL\SDL_mixer.h>
#include <SDL\SDL.h>
#include <vector>
#include <iostream>
#include <string>
#include <fstream>
class Music {
public:
    void play(int loops = 1);

    SDL_RWops* m_rw;
    std::vector<unsigned char> m_file;
    Mix_Music* m_music = nullptr;
};

void Music::play(int loops) {
    if (Mix_PlayMusic(m_music, loops) == -1)
        std::cout << "Error playing music " + std::string(Mix_GetError()) + " ...\n";
}

void readFileToBuffer(std::vector<unsigned char>& buffer, std::string filePath) {
    std::ifstream file(filePath, std::ios::binary);
    file.seekg(0, std::ios::end);
    int fileSize = file.tellg();
    file.seekg(0, std::ios::beg);
    fileSize -= file.tellg();
    buffer.resize(fileSize);
    file.read((char *)&(buffer[0]), fileSize);
    file.close();
}

void writeFileToBuffer(std::vector<unsigned char>& buffer, std::string filePath) {
    std::ofstream file(filePath, std::ios::out | std::ios::binary);
    for (size_t i = 0; i < buffer.size(); i++)
        file << buffer[i];
    file.close();
}

Music loadMusic(std::string filePath) {
    Music music;
    readFileToBuffer(music.m_file, filePath);
    music.m_rw = SDL_RWFromMem(&music.m_file[0], music.m_file.size());
    // Uncommenting the next block runs without problems
    /*
    writeFileToBuffer(music.m_file, filePath);
    music.m_rw = SDL_RWFromFile(filePath.c_str(), "r");
    */
    if (music.m_rw == nullptr)
        std::cout << "Error creating RW " + std::string(Mix_GetError()) + " ...\n";
    music.m_music = Mix_LoadMUSType_RW(music.m_rw, Mix_MusicType::MUS_OGG, SDL_FALSE);
    if (music.m_music == nullptr)
        std::cout << "Error creating music " + std::string(Mix_GetError()) + " ...\n";
    return music;
}

int main(int argc, char** argv) {
    SDL_Init(SDL_INIT_AUDIO);
    Mix_Init(MIX_INIT_MP3 | MIX_INIT_OGG);
    Mix_OpenAudio(MIX_DEFAULT_FREQUENCY, MIX_DEFAULT_FORMAT, MIX_DEFAULT_CHANNELS, 1024);

    Music music = loadMusic("Sound/music/XYZ.ogg");
    music.play();

    std::cin.ignore();
    return 0;
}
My ArchiveManager works for sure, which can also be seen from the fact that uncommenting the block that writes the buffer back to a file and creating an SDL_RW from that file runs just fine.
The music file I load is just assumed to be an ogg file, which it is in this case; hence creating an SDL_RW from the file works fine, meaning nothing crashes and the music plays properly from start to end.
The Music class is, to my understanding, much too big. I am just keeping the buffer m_file around, as well as the SDL_RW, to make sure that the problem does not come from that data being freed. Calling Mix_LoadMUSType_RW with SDL_FALSE should also make sure that the RW is not freed.
Notably a similar example loading a wav file from the same archive using Mix_LoadWAV_RW works just fine:
Mix_Chunk * chunk;
std::vector<unsigned char> fileBuf = ArchiveManager::loadFileFromArchive(filePath);
chunk = Mix_LoadWAV_RW(SDL_RWFromConstMem(&fileBuf[0], fileBuf.size()), SDL_TRUE);
And here I am not even keeping the buffer around until calling Mix_PlayChannel. Also, here I am calling the load function with SDL_TRUE because I am not creating an explicit SDL_RW. Trying the same for loading the music does not make a difference.
I studied the SDL_mixer source code, but it didn't help me. Maybe my knowledge is not sufficient or maybe I missed something crucial.
To get to the point: Where does that access violation come from and how can I prevent it?
EDIT: Changed the example code so it is straightforward for anyone to reproduce it. So no ArchiveManager or anything like that, just reading an ogg directly into memory. The crucial parts are just the few lines in loadMusic.
Music music = loadMusic("Sound/music/XYZ.ogg");
music.play();
The first line copies the Music object returned by loadMusic into the new object called music. This copies the vector m_file, including its data, so the data of the new object's vector is obviously stored at a different memory location than the vector of the object returned by loadMusic. The object returned by loadMusic is then destroyed and its vector's data is freed, which invalidates the previously created Mix_Music object (it still references the old buffer through the SDL_RWops) and causes the access violation on the second line.
This can be remedied by only ever creating one Music object, for example by creating it via new on the heap and having loadMusic return a pointer to that object.
Music* music = loadMusic("Sound/music/XYZ.ogg");
music->play();
It might in any case be the better choice to allocate the memory for a whole file on the heap instead of the stack, though vectors already store their elements on the heap internally.
So, short version: it was (what I consider) a rookie mistake, and I was too fixated on blaming SDL_mixer. Bad idea.
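The underlying pitfall can be shown without SDL at all: copying an object that holds a vector plus a raw pointer into that vector leaves the copy's pointer aimed at the original's buffer. The names here are made up for illustration:

```cpp
#include <vector>

// Like the Music class above: a vector plus a raw pointer into its storage.
struct Holder {
    std::vector<unsigned char> data;
    unsigned char* view = nullptr;   // meant to point into `data`
};

// Demonstrates the bug: after a copy, the copy's raw pointer still
// refers to the ORIGINAL object's buffer, not its own.
bool copied_pointer_is_stale() {
    Holder a;
    a.data = {1, 2, 3, 4};
    a.view = a.data.data();          // points into a's buffer

    Holder b = a;                    // copies the vector AND the raw pointer
    return b.view == a.view          // b.view still aims at a's buffer...
        && b.view != b.data.data();  // ...not at b's own copy of the data
}
```

Once the original object dies, that stale pointer dangles, which is exactly what happened to the Mix_Music referencing the temporary's m_file buffer.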
I've been having a problem that I have not been able to solve yet. It is related to reading files; I've looked at threads, even on this website, and they do not seem to solve it. The problem is reading files that are larger than a computer's system memory. When I asked about this a while ago, I was referred to using the following code.
string data("");
getline(cin, data);

std::ifstream is(data);//, std::ifstream::binary);
if (is)
{
    // get length of file:
    is.seekg(0, is.end);
    int length = is.tellg();
    is.seekg(0, is.beg);

    // allocate memory:
    char *buffer = new char[length];

    // read data as a block:
    is.read(buffer, length);
    is.close();

    // print content:
    std::cout.write(buffer, length);
    delete[] buffer;
}
system("pause");
This code works well, apart from the fact that it eats memory like a fat kid in a candy store.
So after a lot of ghetto and unrefined programming, I was able to figure out a way to sort of fix the problem. However, I more or less traded one problem for another in my quest.
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <stdio.h>
#include <stdlib.h>
#include <iomanip>
#include <windows.h>
#include <cstdlib>
#include <thread>

using namespace std;

/*======================================================*/
string *fileName = new string("tldr");
char data[36];
int filePos(0); // The pos of the file
int tmSize(0);  // The total size of the file
int split(32);
char buff;
int DNum(0);
/*======================================================*/

int getFileSize(std::string filename) // path to file
{
    FILE *p_file = NULL;
    p_file = fopen(filename.c_str(), "rb");
    fseek(p_file, 0, SEEK_END);
    int size = ftell(p_file);
    fclose(p_file);
    return size;
}

void fs()
{
    tmSize = getFileSize(*fileName);

    int AX(0);
    ifstream fileIn;
    fileIn.open(*fileName, ios::in | ios::binary);

    int n1, n2, n3;
    n1 = tmSize / 32;

    // Does the processing
    while (filePos != tmSize)
    {
        fileIn.seekg(filePos, ios_base::beg);
        buff = fileIn.get();

        // To take into account small files
        if (tmSize < 32)
        {
            int Count(0);
            char MT[40];
            if (Count != tmSize)
            {
                MT[Count] = buff;
                cout << MT[Count];// << endl;
                Count++;
            }
        }
        // Anything larger than 32
        else
        {
            if (AX != split)
            {
                data[AX] = buff;
                AX++;
                if (AX == split)
                {
                    AX = 0;
                }
            }
        }
        filePos++;
    }

    int tz(0);
    filePos = filePos - 12;
    while (tz != 2)
    {
        fileIn.seekg(filePos, ios_base::beg);
        buff = fileIn.get();
        data[tz] = buff;
        tz++;
        filePos++;
    }
    fileIn.close();
}

void main()
{
    fs();
    cout << tmSize << endl;
    system("pause");
}
What I tried to do with this code is work around the memory issue. Rather than allocating memory for a large file that simply does not fit on my system, I tried to use the memory I had, which is about 8GB, but I only wanted to use maybe a few kilobytes of it if at all possible.
To give you a layout of what I am talking about I am going to write a line of text.
"Hello my name is cake please give me cake"
Basically, what I did was read that piece of text letter by letter and put the letters into a box that could store 32 of them; from there I could apply something like XOR and then write them to another file.
The idea works, in a way, but it is horribly slow and leaves off parts of files.
So basically: how can I make something like this work without being slow or cutting off files? I would love to see how XOR works with very large files.
So if anyone has a better idea than what I have, then I would be very grateful for the help.
To read and process the file piece-by-piece, you can use the following snippet:
// Buffer size 1 Megabyte (or any number you like)
size_t buffer_size = 1 << 20;
char *buffer = new char[buffer_size];

std::ifstream fin("input.dat");
while (fin)
{
    // Try to read next chunk of data
    fin.read(buffer, buffer_size);

    // Get the number of bytes actually read
    size_t count = fin.gcount();

    // If nothing has been read, break
    if (!count)
        break;

    // Do whatever you need with first count bytes in the buffer
    // ...
}
delete[] buffer;
The buffer size of 32 bytes you are using is definitely too small. You make too many calls to library functions (and the library, in turn, makes calls to the OS - although probably not every time - which are typically slow, since they cause context switches). There is also no need for tell/seek.
If you don't need all the file content simultaneously, reduce the working set first - like a set of about 32 words; but since XOR can be applied sequentially, you may further simplify to a working set of constant size, like 4 kilobytes.
Now you have the option of calling is.read() in a loop and processing a small chunk of data each iteration, or using mmap() to map the file content to a memory pointer through which you can perform both read and write operations.
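To illustrate the sequential approach, here is a sketch of XOR over a file in fixed-size chunks; the one-byte key, chunk size, and file names are arbitrary assumptions:

```cpp
#include <fstream>
#include <vector>

// XOR every byte of `src` with `key` and write the result to `dst`,
// streaming through a small fixed-size buffer so memory use stays
// constant regardless of file size.
bool xor_file(const char* src, const char* dst, unsigned char key,
              std::size_t chunk = 4096)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out)
        return false;

    std::vector<char> buf(chunk);
    while (in)
    {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();          // bytes actually read
        for (std::streamsize i = 0; i < got; ++i)
            buf[i] = static_cast<char>(buf[i] ^ key);  // XOR in place
        if (got > 0)
            out.write(buf.data(), got);
    }
    return static_cast<bool>(out);
}
```

Since XOR is its own inverse, applying the function twice with the same key restores the original file, which makes for an easy correctness check.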
I'm using code like the following to check whether a file has been created before continuing. The thing is, the file shows up in the file browser much earlier than it is detected by stat... is there a problem with doing this?
//... do something
struct stat buf;
while (stat("myfile.txt", &buf))
    sleep(1);
//... do something else
Alternatively, is there a better way to check whether a file exists?
Using inotify, you can arrange for the kernel to notify you when a change to the file system (such as a file creation) takes place. This may well be what your file browser is using to know about the file so quickly.
The stat system call collects various pieces of information about the file, such as the number of hard links pointing to it or its inode number. You might want to look at the access system call, which you can use to perform an existence check only, by specifying the F_OK flag in mode.
There is, however, a little problem with your code: it puts the process to sleep for a second every time it finds the file doesn't exist yet. To avoid that, you can use the inotify API, as suggested by Jerry Coffin, to be notified by the kernel when the file you are waiting for appears. Keep in mind that inotify does not notify you if the file is already there, so you in fact need both access and inotify to avoid a race condition in case the file is created just before you start watching.
There is no better or faster way to check whether a file exists. If your file browser still shows the file slightly sooner than this program detects it, then Greg Hewgill's idea about renaming is probably what is happening.
Here is a C++ code example that sets up an inotify watch, checks if file already exists and waits for it otherwise:
#include <cstdio>
#include <cstring>
#include <string>
#include <unistd.h>
#include <sys/inotify.h>
int main()
{
    const std::string directory = "/tmp";
    const std::string filename = "test.txt";
    const std::string fullpath = directory + "/" + filename;

    int fd = inotify_init();
    int watch = inotify_add_watch(fd, directory.c_str(),
                                  IN_MODIFY | IN_CREATE | IN_MOVED_TO);

    if (access(fullpath.c_str(), F_OK) == 0)
    {
        printf("File %s exists.\n", fullpath.c_str());
        return 0;
    }

    char buf[1024 * (sizeof(inotify_event) + 16)];
    ssize_t length;
    bool isCreated = false;

    while (!isCreated)
    {
        length = read(fd, buf, sizeof(buf));
        if (length < 0)
            break;

        inotify_event *event;
        for (size_t i = 0; i < static_cast<size_t>(length);
             i += sizeof(inotify_event) + event->len)
        {
            event = reinterpret_cast<inotify_event *>(&buf[i]);
            if (event->len > 0 && filename == event->name)
            {
                printf("The file %s was created.\n", event->name);
                isCreated = true;
                break;
            }
        }
    }

    inotify_rm_watch(fd, watch);
    close(fd);
}
Your code checks whether the file is there every second. You can use inotify to get an event instead.