On my machine, I have a file which is regenerated by an application every second - it contains different data each time, as it's based on some realtime data.
I would like to have a copy of this file, which would contain what the original file contained 5 minutes ago. Is this somehow easily achievable? I would be happy to do this using some Bash scripting magic, but adding some wise, memory-efficient code to that original application (written in C++) would also satisfy me :)
You tagged your question with both linux and unix. This answer only applies to Linux.
You may be able to use inotify-tools (inotifywait man page) or incron (incrontab(5) man page) to watch the directory and make copies of the files as they are closed.
If disk space isn't an issue, you could make the program create a new file every second instead of writing to the same file. You would need a total of 300 files (5 min * 60 sec/min). The file name to write to would be $somename + timestamp() % 300. That way, to get the file 5 minutes ago, you would just access the file $somename + (timestamp()+1) % 300.
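A minimal sketch of that naming scheme in C++, assuming the timestamp is in whole seconds. The helper names and the "." separator between the base name and the slot number are made up for illustration:

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Number of one-second slots kept on disk (5 min * 60 sec/min).
const int kSlots = 300;

// Builds "<base>.<slot>" for the slot that timestamp t maps to.
std::string slotName(const std::string& base, std::time_t t) {
    assert(t >= 0);
    return base + "." + std::to_string(static_cast<long long>(t % kSlots));
}

// File the application should write to at time `now`.
std::string writeName(const std::string& base, std::time_t now) {
    return slotName(base, now);
}

// Oldest file still on disk at time `now`: the slot that will be
// overwritten next, written 299 seconds ago.
std::string laggedName(const std::string& base, std::time_t now) {
    return slotName(base, now + 1);
}
```

The application would call writeName(base, std::time(nullptr)) each second, and whoever wants the old data would open laggedName(base, std::time(nullptr)).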
In order to achieve that, you need the space to hold each of the 300 (5*60) files. Since you indicate that the files are only about 50K in size, this is doable in 15MB of memory (if you don't want to clutter your filesystem).
It should be as simple as: (something like)
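#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
struct { char* buf; size_t size; } hist[300] = {}; // initialize to all nulls
int n = 0;
// Repeat the following once per second:
struct stat st;
int ifd = open("file", O_RDONLY);
int ofd = open("file-lag", O_WRONLY | O_CREAT | O_TRUNC, 0644);
fstat(ifd, &st);                       // fstat, not stat, for a descriptor
free(hist[n].buf);                     // release the snapshot this slot held
hist[n].size = st.st_size;
hist[n].buf = (char*)malloc(hist[n].size);
read(ifd, hist[n].buf, hist[n].size);
n = (n + 1) % 300;                     // slot n now holds the oldest snapshot
write(ofd, hist[n].buf, hist[n].size); // writes nothing until the slot is filled
close(ifd);
close(ofd);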
I am developing a c++ program that writes a large amount of data to disk. The following function gzips the data and writes it out to a file. The compressed data is on the order of 100GB. The function to compress and write out the data is as follows:
void constructSNVFastqData(string const& fname) {
ofstream fastq_gz(fname.c_str());
stringstream ss;
for (int64_t i = 0; i < snvId->size(); i++) {
consensus_pair &cns_pair = snvId->getPair(i);
string qual(cns_pair.non_mutated.size(), '!');
ss << "#" + cns_pair.mutated + "[" + to_string(cns_pair.left_ohang) +
";" + to_string(cns_pair.right_ohang) + "]\n"
+ cns_pair.non_mutated + "\n+\n" + qual + "\n";
boost::iostreams::filtering_streambuf<boost::iostreams::input> out;
The function writes data to a string stream, which I then
write out to a file (fastq_gz) using boost's filtering_streambuf.
The file is not a log file. After the file has been written
it will be read by a child process. The file does not need to be viewed
by humans.
Currently, I am writing the data out to a single, large file (fastq_gz). This is taking a while, and the file system - according to our system manager - is very busy. I wonder if, instead of writing out a single large file, I should instead write out a number of smaller files? Would this approach be faster, or reduce the load on the file system?
Please note that it is not the compression that is slow - I have benchmarked.
I am running on a linux system and do not need to consider generalising the implementation to a windows filesystem.
So what your code is probably doing is (a) generating your file into memory swap space, (b) loading from swap space and compressing on the fly, (c) writing compressed data as you get it to the outfile.
(b) and (c) are great; (a) is going to kill you. It is two round trips of the uncompressed data, one of which happens while competing with your output file generation.
I cannot find one in boost iostreams, but you need an istream (source) or a device that gets data from you on demand. Someone must have written it (it seems so useful), but I don't see it in 5 minutes of looking at boost iostreams docs.
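For what it's worth, here is a rough sketch of what such an on-demand source could look like using only the standard library: a std::streambuf whose underflow() asks a callback for the next record. The class name GeneratorBuf and the callback signature are made up for illustration. It is not a Boost.Iostreams component, and hooking it up in front of a compressor is an assumption I have not tested.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical helper: produces data only when the reader asks for it,
// so at most one record is held in memory at a time.
class GeneratorBuf : public std::streambuf {
public:
    // next(out) fills `out` with one record; returns false when exhausted.
    explicit GeneratorBuf(std::function<bool(std::string&)> next)
        : next_(std::move(next)) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        do {  // skip empty records so the get area is never empty
            current_.clear();
            if (!next_(current_)) return traits_type::eof();
        } while (current_.empty());
        setg(&current_[0], &current_[0], &current_[0] + current_.size());
        return traits_type::to_int_type(*gptr());
    }

private:
    std::function<bool(std::string&)> next_;
    std::string current_;
};

// Demo: drain a lazily generated stream of three records into a string.
std::string drainDemo() {
    std::size_t i = 0;
    GeneratorBuf buf([&i](std::string& out) {
        if (i >= 3) return false;
        out = "record " + std::to_string(i++) + "\n";
        return true;
    });
    std::istream in(&buf);
    std::ostringstream sink;
    sink << in.rdbuf();
    return sink.str();
}
```

In the question's case, the callback would format one consensus pair per call instead of pushing everything into a stringstream first.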
0.) Devise an algorithm to divide the data into multiple files so that it could be recombined later.
1.) Write data to multiple files on separate threads in the background. Maybe shared threads. (maybe start n = 10 threads at a time or so)
2.) Query through the future attribute of the shared objects to check if writing is done. (size > 1 GB)
3.) Once above is the case; then recombine data when it is queried by the child process
4.) I would recommend writing a new file after every 1 GB
I would like to ask about inputting more than one file, or about the easiest way to put file names into some kind of queue.
The files need to be edited.
I have, let's say, 100 txt files, and for each one I want to open it, find something, and save it.
I have functions/methods for each operation.
But I ran into a problem loading the files into the program. So far it works for five or fewer files.
The process is: the program asks me for a file name in the program's root directory, or for a full path to the file,
for example C/myfile.txt,
and runs after I press Enter.
Unfortunately, things changed and I now have to process 100 files per day. I know that in C# it is possible to use an open-file dialog with multiple selection...
In this program, I was thinking about making a static array of strings and iterating over it with a for loop,
but I have no idea what the easiest way is to load these strings (file names) into the array.
I read something on MSDN, but it looks too complicated for me.
Can someone help? The program is just for me and I don't want to set up too many things. Is it possible?
What is the minimal amount of code I must add to my program?
Example of the result I want:
array[0] = C/text.txt
array[1] = C/texta.txt
array[50] = C/textdhsjfk.txt
Maybe it is not as easy as I think, maybe it is.
But I have no support in this field... So I only tried to find something on the internet, and I am not sure about what I found.
Thank you for your time and willingness to help.
A loop and a container look like good things to use:
#include <fstream>
// Array of C strings: note the pointer type and the braces.
static const char* const filenames[] = {
    "C:\\Fred\\file1.txt", "c:\\Fred\\file2.txt", "c:\\Barney\\file1.txt",
};
static const unsigned int quantity_of_files =
    sizeof(filenames) / sizeof(filenames[0]);
for (unsigned int i = 0u; i < quantity_of_files; ++i) {
    std::ifstream input_stream(filenames[i]);
    // find, edit, and save here
}
The task is to get the code working correctly and robustly first, then worry about performance.
Note: The program is set up so that file names can be added to the array without having to change or retest the code.
Purpose: I am monitoring file writes in a particular directory on iOS using BSD kernel queues, and poll for file sizes to determine write ends (when the size stops changing). The basic idea is to refresh a folder only after any number of file copies coming from iTunes sync. I have a completely working Objective-C implementation for this but I have my reasons for needing to implement the same thing in C++ only.
Problem: The one thing stopping me is that I can't find a C or C++ API that will get the correct file size during a write. Presumably, one must exist because Objective-C's [NSFileManager attributesOfItemAtPath:] seems to work and we all know it is just calling a C API underneath.
Failed Solutions:
I have tried using stat() and lstat() to get st_size and even st_blocks for the allocated block count. They return correct sizes for most files in a directory, but when a file write is happening, that file's size never changes between poll intervals, and every subsequent file iterated in that directory has a bad size.
I have tried using fseek and ftell but they are also resulting in a very similar issue.
I have also tried modified date instead of size using stat() and st_mtimespec, and the date doesn't appear to change during a write - not that I expected it to.
Going back to NSFileManager's ability to give me the right values, does anyone have an idea what C API call that [NSFileManager attributesOfItemAtPath:] is actually using underneath?
Thanks in advance.
It appears that this has less to do with in-progress write operations and more with specific files. After closer inspection there are some files which always return a size, and other files that never return a size when using the C API (but will work fine with the Objective-C API). Even creating a copy of the "good" files the C API does not want to give a size for the copy but works fine with the original "good" file. I have both failures and successes with text (xml) files and binary (zip) files. I am using iTunes to add these files to the iPad's app's Documents directory. It is an iPad Mini Retina.
Update 2 - Answer:
Probably any of the above file size methods will work, if your path isn't invisibly trashed, like mine was. See accepted answer on why the path was trashed.
Well this weird behavior turned out to be a problem with the paths, which result in strings that will print normally, but are likely trashed in memory enough that file descriptors sometimes didn't like it (thus only occurring in certain file paths). I was using the dirent API to iterate over the files in a directory and concatenating the dir path and file name erroneously.
Bad Path Concatenation: Obviously (or apparently not-so-obvious at runtime) str-copying over three times is not going to end well.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcpy(fullPath, "/");
strcpy(fullPath, file);
long sizeBytes = getSize(fullPath);
Correct Path Concatenation: Use proper str-concatenation.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcat(fullPath, "/");
strcat(fullPath, file);
long sizeBytes = getSize(fullPath);
Long story short, it was sloppy work on my part, via two typos.
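Since the goal was C++ anyway, the concatenation could be left to std::string, which removes the manual buffer sizing altogether. A small sketch, assuming POSIX stat() is available. joinPath and getSizeOf are hypothetical names, not the getSize() used above:

```cpp
#include <string>
#include <sys/stat.h>

// Joins a directory and a file name with exactly one '/' between them.
std::string joinPath(const std::string& dir, const std::string& file) {
    if (dir.empty()) return file;
    if (dir.back() == '/') return dir + file;
    return dir + "/" + file;
}

// Returns the size in bytes reported by stat(), or -1 if stat() fails.
long long getSizeOf(const std::string& path) {
    struct stat st;
    if (stat(path.c_str(), &st) != 0) return -1;
    return static_cast<long long>(st.st_size);
}
```

With this, the per-file call becomes getSizeOf(joinPath(dir, file)).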
Is it possible to delete part of a file (let's say from the beginning to its half), without having to use another file?
Yes, it is possible, but still you'll have to rewrite most of the file.
The rough idea is as follows:
open the file
pos = start of the fragment to be removed
len = length of the fragment to be removed
blocksize = 4096 -- example block size, may be any
datamoved = 0
do {
    fseek(pos + len + datamoved)
    actualread = fread(buffer, blocksize)
    if (actualread == 0) break -- end of file: finished!
    fseek(pos + datamoved)
    fwrite(buffer, actualread)
    datamoved += actualread
} while (true)
The last step after the loop is to truncate the file to pos + datamoved bytes. If the underlying filesystem does not support a truncate operation, then you have to rewrite the file, but most filesystems and libraries do support it.
The short answer is that no, most file systems don't attempt to support operations like that.
That leaves you with two choices. The obvious one is to create a copy of the data, leaving out the parts you don't want. You can do this either in-place (i.e., moving the data around in the same file) or by using an auxiliary file, typically copying the data to the new file, then doing something like renaming the new file to the old name.
The other major choice is to simply restructure your file and data so you don't have to get rid of the old data at all. For example, if you want to keep the most recent N amount of data from a process, you might structure (most of) the file as a circular buffer, with a couple of "pointers" at the beginning that tell you the head and tail points, so you know where to read data from and where to write data to. With a structure like this, you don't erase or remove the old data, you just overwrite it as needed.
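To illustrate the circular-buffer layout, here is an in-memory sketch. The class name RingLog is hypothetical, and a real on-disk version would store the head index and size in a small header at the start of the file instead of in member variables:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Keeps only the most recent `capacity` bytes; older bytes are overwritten.
class RingLog {
public:
    explicit RingLog(std::size_t capacity)
        : data_(capacity), head_(0), size_(0) {
        assert(capacity > 0);
    }

    void append(const std::string& bytes) {
        for (char c : bytes) {
            // Next write position is just past the newest byte.
            data_[(head_ + size_) % data_.size()] = c;
            if (size_ < data_.size()) {
                ++size_;                             // still filling up
            } else {
                head_ = (head_ + 1) % data_.size();  // oldest byte was overwritten
            }
        }
    }

    // Returns the stored bytes from oldest to newest.
    std::string contents() const {
        std::string out;
        for (std::size_t i = 0; i < size_; ++i)
            out.push_back(data_[(head_ + i) % data_.size()]);
        return out;
    }

private:
    std::vector<char> data_;
    std::size_t head_;  // index of the oldest byte
    std::size_t size_;  // number of valid bytes
};
```

Nothing is ever moved or deleted here: appending past the capacity simply advances the head.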
If you have enough memory, read its contents fully to the memory, copy it back to the front of the file, and truncate the file.
If you do not have enough memory, copy in blocks, and only when you are done truncate the file.
I'm making a simple DLL packet sniffer in C++ that hooks into an application and writes the received packets to an INI file. Unfortunately, after 20-30 minutes it crashes the main application.
When a packet is received, receivedPacket() is called. After 20+ minutes, the WriteCount value is around 150,000-200,000 and I start getting C++ runtime errors/crashes. The GetLastError() code that I get is 0x8, which is ERROR_NOT_ENOUGH_MEMORY, and WritePrivateProfileStringA() returns 0.
void writeToINI(LPCSTR iSec,LPCSTR iKey,int iVal){
sprintf(inival, _T("%d"), iVal);
//sprintf(strc, _T("%d \n"), WriteCount);
//WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), strc, strlen(strc), 0, 0);
void receivedPacket(char *packet,WORD size){
switch ( packet[2] )
case 0x30:
// Size : 0x5F
ID = *(signed char*)&packet[0x10];
X = *(signed short*)&packet[0x20];
Y = *(signed short*)&packet[0x22];
Z = *(signed short*)&packet[0x24];
sprintf(inisec, _T("PACKET_%d"), (ID+1));
[.....OTHER CASES.....]
Thanks :)
WritePrivateProfileString() and GetPrivateProfileString() are very slow (because they parse the INI file on each call). Instead, you can:
use one of the existing parsing libraries, though I am not sure about their memory efficiency or support for sequential writes.
write your own sequential INI writer:
read the file (or only part by part, if it is too big)
find the section and key (if not found, create a new section at the end of the file, or find the insertion position if you want sorted sections), and save the file positions of the key and of the next key
change the value
save (the beginning of the original file up to the position of the key + the changed key + the original file from the position of the next key to the end). If a new section is created at the end, you can simply append it to the original file. If packets rewrite the same ID often, you can pad each value with trailing whitespace, wide enough to hold any value of the desired type (example: change X=1---\n to X=100-\n, where - stands for a space), so that every key has a constant size and you can update only that part of the file.
use a database, for example MySQL
write a binary file (the fastest solution) and make a program to read the values or to convert them to text
A little note: a few years ago I used GetPrivateProfileString() to read a settings file (about 1KB in size). Reading from HDD took 50ms, reading from a USB flash disk took 1000ms! After changing it (1. read the file into memory, 2. run my own parser), it ran in 1ms on both HDD and USB.
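The padding trick from the sequential-writer option could look roughly like this. It is only a sketch: the function name fixedWidthEntry and the choice of width are mine, and the caller is assumed to pick a width large enough for any value of the type:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Formats "key=value" padded with spaces so every entry for a given key
// and width has the same length and can be overwritten in place.
std::string fixedWidthEntry(const std::string& key, int value,
                            std::size_t width) {
    std::string digits = std::to_string(value);
    assert(digits.size() <= width);  // value must fit in the reserved field
    return key + "=" + digits + std::string(width - digits.size(), ' ') + "\n";
}
```

Because fixedWidthEntry("X", 1, 4) and fixedWidthEntry("X", 100, 4) have the same length, the new entry can be written over the old one at its saved file position without touching the rest of the file.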
Thanks for the replies, guys, but it looks like the problem didn't come from WritePrivateProfileStringA().
I just needed to add extra size in the malloc() for the hook.