How can I get the latest changes in a file using ifstream? - c++

This is for a real-time capture system. I need to get the latest changes from a file that is occasionally edited (mostly by appending content) by other applications.
In other words, how can I read content that was added after I opened the file, without reopening it?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(){
    ifstream tfile("temp.txt",ios::in);
    if(!tfile){
        cout<<"open failed"<<endl;
        return 0;
    }
    string str;
}

C++ / C Solution
If you are looking for a C++ solution, you can use the following function that I wrote a while back:
#include <cstdio>
#include <iostream>
#include <string>
// For sleep function
#ifdef _WIN32
#include <Windows.h>
#define sleep(seconds) Sleep((seconds) * 1000)
#else
#include <unistd.h>
#endif
using namespace std;
void watchLogs(const char *FILENAME) {
    FILE *f = fopen(FILENAME, "rb");
    if (!f) { // the file could not be opened
        perror("fopen");
        return;
    }
    // Number of bytes already printed. Starting at 0 prints the current content of your log file first.
    // If you just want the updates, seek to the end here and set size = ftell(f) instead.
    long size = 0;
    while (true) {
        fseek(f, 0, SEEK_END); // reach end of file (this also clears the EOF flag)
        long end = ftell(f);
        if (end > size) {
            long BUFFER_SIZE = end - size; // save the length of the update to your logs
            string buffer(BUFFER_SIZE, '\0'); // prepare a buffer to hold the new characters
            fseek(f, size, SEEK_SET); // go back to the first byte that has not been printed yet
            size_t n = fread(&buffer[0], 1, buffer.size(), f); // copy to buffer
            buffer.resize(n);
            cout << buffer << endl;
            size += (long)n; // increment the size of the current file
        }
        sleep(3); // updates are checked every 3 seconds to avoid running the CPU at full speed; you could check every minute or every second instead, up to you.
    }
}
And you can test it with:
int main(int argc, char **argv) {
    if (argc < 2)
        return 1;
    const char *FILENAME = argv[1];
    watchLogs(FILENAME);
    return 0;
}
./a.out mysql_binary.log
I could have used stringstream, but I like that this version would also work in C with some minor tweaks (you can't use string there).
I hope you will find it helpful!
NB: This assumes that your file will only grow and that the changes will be appended to the end of your file.
NB2: This program is not crash-proof; you may want to check the return values of fopen, fseek, fread, etc.
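Since the question asks about ifstream specifically, here is a minimal sketch of the same polling idea with a stream that stays open. It relies on calling clear() to reset the end-of-file state so that later reads can pick up appended data; the helper name readNewLines and the partial buffer are assumptions made for illustration, and on some standard library implementations you may additionally need to seekg back to the last read position.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every complete line appended since the
// previous call. `partial` carries an unfinished last line between calls.
std::vector<std::string> readNewLines(std::ifstream &in, std::string &partial) {
    assert(in.is_open());
    std::vector<std::string> lines;
    in.clear(); // drop the eofbit/failbit left by the previous call
    char c;
    while (in.get(c)) {
        if (c == '\n') {
            lines.push_back(partial);
            partial.clear();
        } else {
            partial += c;
        }
    }
    return lines;
}
```

You would call readNewLines in a loop with a sleep between calls, as in watchLogs. Like that function, it assumes the file only grows.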
If you use Linux you could also potentially go for inotify:
Download inotify: sudo apt-get install -y inotify-tools
Then create the following script
while inotifywait -e close_write $1; do ./$1; done
Give permission to execute:
chmod +x
and call it with ./ mysql_binary.log


File Locks not Preventing File Overwrite

I have the following C++ code which writes "Line from #" to a file while managing a file lock. I am running this code on two different computers, which share at least some of their memory. That is, I can access my files by logging onto either of these computers.
On the first computer I run the program as ./test 1 (e.g. so it will print Line from 1 20,000 times) and on the second computer I run the program as ./test 17. I am starting these programs close enough in time so that the writes to file.txt should be interleaved and controlled by the file locks.
The problem is that I am losing output as the file has 22,770 newlines, but it should have exactly 40,000 newlines.
wc file.txt
22770 68310 276008 file.txt
cat -n file.txt | grep 18667
18667 ne from 17
My question is why are my file locks not preventing file overwriting, and how can I fix my code so that multiple processes can write to the same file without file loss.
#include <unistd.h>
#include <fcntl.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <iostream>
#include <string>
using namespace std;
void inline Set_Lck(struct flock &flck, const int fd)
{
    flck.l_type = F_WRLCK;
    if (fcntl(fd, F_SETLKW, &flck) == -1) {
        perror("fcntl");
        exit(EXIT_FAILURE);
    }
}
void inline Release_Lck(struct flock &flck, const int fd)
{
    flck.l_type = F_UNLCK;
    if (fcntl(fd,F_SETLK,&flck) == -1) {
        perror("fcntl");
        exit(EXIT_FAILURE);
    }
}
void Print_Spec(fstream &fout, ostringstream &oss,struct flock &flck, const int fd)
{
    Set_Lck(flck,fd);
    fout << oss.str() << endl;
    Release_Lck(flck,fd);
}
int main(int argc, char **argv)
{
    int fd_cd;
    struct flock flock_cd;
    ostringstream oss;
    fstream comp_data;
    const string s_cd_lck = "file_lock.txt";
    const string s_cd = "file.txt";
    int my_id;
    if (argc == 1) {
        my_id = 0;
    } else if (argc == 2) {
        my_id = atoi(argv[1]);
    } else {
        fprintf(stderr,"error -- usage ./test [my_id]\n");
        exit(EXIT_FAILURE);
    }
    /* Open file computed_data.txt for writing; create it if non-existent.*/
    comp_data.open(s_cd.c_str(),ios::app|ios::out);
    if (comp_data.fail()) {
        perror("comp_data.open");
        exit(EXIT_FAILURE);
    }
    /* Open file that we will be locking. */
    fd_cd = open(s_cd_lck.c_str(),O_CREAT|O_WRONLY,0777);
    if (fd_cd == -1) {
        perror("fd_cd = open");
        exit(EXIT_FAILURE);
    }
    /* Set up the lock. */
    flock_cd.l_type = F_WRLCK;
    flock_cd.l_whence = SEEK_SET;
    flock_cd.l_start = 0;
    flock_cd.l_len = 0;
    flock_cd.l_pid = getpid();
    for (int i = 0; i < 20000; ++i) {
        oss.str(""); /* Yes, this can be moved outside the loop. */
        oss << "Line from " << my_id << endl;
        Print_Spec(comp_data,oss,flock_cd,fd_cd);
    }
    return 0;
}
I am using c++ and this program is running on Red Hat Enterprise Linux Server release 7.2 (Maipo).
My Research
I am not sure if part of the answer comes from a Stack Overflow post where they state that "locks are bound to processes."
On another website, the author advises against using LOCK_UN with flock and suggests closing the file each time and reopening it as needed, so as to flush the file buffer. I don't know if this carries over to fcntl, or if it is even necessary if I flush the file buffer manually.
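To make the flushing concern concrete, here is a minimal sketch (not a confirmed fix for the problem above) that keeps the whole write inside the locked region: it takes the fcntl lock, appends with write(2) on a descriptor assumed to be opened with O_APPEND, calls fsync, and only then unlocks. The helper name lockedAppend is hypothetical, and whether this is sufficient on a filesystem shared between machines depends on how that filesystem handles locking and caching.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: appends one record to dataFd while holding an
// exclusive fcntl lock on lockFd. Returns true on success.
bool lockedAppend(int lockFd, int dataFd, const std::string &record) {
    assert(lockFd >= 0 && dataFd >= 0);
    struct flock lk = {};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0; // 0 means "lock the whole file"
    if (fcntl(lockFd, F_SETLKW, &lk) == -1) return false;

    // Use write(2) so no user-space stream buffer holds data past the unlock.
    bool ok = true;
    size_t done = 0;
    while (done < record.size()) {
        ssize_t n = write(dataFd, record.data() + done, record.size() - done);
        if (n < 0) { ok = false; break; }
        done += static_cast<size_t>(n);
    }
    if (ok && fsync(dataFd) == -1) ok = false; // push the data out before unlocking

    lk.l_type = F_UNLCK;
    if (fcntl(lockFd, F_SETLK, &lk) == -1) ok = false;
    return ok;
}
```

The data file would be opened once with open(path, O_CREAT | O_WRONLY | O_APPEND, 0644) and the lock file as in the question.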

OpenSSL SHA256 Wrong result

I have following piece of code that is supposed to calculate the SHA256 of a file. I am reading the file chunk by chunk and using EVP_DigestUpdate for the chunk. When I test the code with the file that has content
Test Message
Hello World
in Windows, it gives me SHA256 value of 97b2bc0cd1c3849436c6532d9c8de85456e1ce926d1e872a1e9b76a33183655f but the value is supposed to be 318b20b83a6730b928c46163a2a1cefee4466132731c95c39613acb547ccb715, which can be verified here too.
Here is the code:
#include <openssl\evp.h>
#include <iostream>
#include <string>
#include <fstream>
#include <cstdio>
#include <cstring>
const int MAX_BUFFER_SIZE = 1024;
std::string FileChecksum(std::string, std::string);
int main()
{
    std::string checksum = FileChecksum("C:\\Users\\Dell\\Downloads\\somefile.txt","sha256");
    std::cout << checksum << std::endl;
    return 0;
}
std::string FileChecksum(std::string file_path, std::string algorithm)
{
    EVP_MD_CTX *mdctx;
    const EVP_MD *md;
    unsigned char md_value[EVP_MAX_MD_SIZE];
    int i;
    unsigned int md_len;
    OpenSSL_add_all_digests();
    md = EVP_get_digestbyname(algorithm.c_str());
    if(!md) {
        printf("Unknown message digest %s\n",algorithm.c_str());
        return "";
    }
    mdctx = EVP_MD_CTX_create();
    std::ifstream readfile(file_path,std::ifstream::in|std::ifstream::binary);
    if(!readfile.is_open())
    {
        std::cout << "Could not open file\n";
        return "";
    }
    readfile.seekg(0, std::ios::end);
    long filelen = readfile.tellg();
    std::cout << "LEN IS " << filelen << std::endl;
    readfile.seekg(0, std::ios::beg);
    if(filelen == -1)
    {
        std::cout << "Return Null \n";
        return "";
    }
    EVP_DigestInit_ex(mdctx, md, NULL);
    long temp_fil = filelen;
    while(!readfile.eof() && readfile.is_open() && temp_fil>0)
    {
        int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
        char *buffer = new char[bufferS+1];
        buffer[bufferS] = 0;
        readfile.read(buffer, bufferS);
        std::cout << strlen(buffer) << std::endl;
        EVP_DigestUpdate(mdctx, buffer, strlen(buffer));
        temp_fil -= bufferS;
        delete[] buffer;
    }
    EVP_DigestFinal_ex(mdctx, md_value, &md_len);
    printf("Digest is: ");
    //char *checksum_msg = new char[md_len];
    //int cx(0);
    for(i = 0; i < md_len; i++)
    {
        printf("%02x", md_value[i]);
    }
    //std::string res(checksum_msg);
    //delete[] checksum_msg;
    printf("\n");
    EVP_MD_CTX_destroy(mdctx);
    /* Call this once before exit. */
    EVP_cleanup();
    return "";
}
I tried to write the hash generated by the program to a string using _snprintf, but it didn't work. How can I generate the correct hash and return the value as a string from the FileChecksum function? The platform is Windows.
EDIT: It seems the problem was caused by a CRLF issue. Because Windows saves the file using \r\n, the calculated checksum was different. How do I handle this?
MS-DOS used the CR-LF convention, so when the file is saved on Windows, each line ends with \r\n (carriage return and newline). The online tool you tested with only sees \n.
Thus you have to compare against the checksum of the string Test Message\r\nHello World\r\n, which is equivalent to creating and reading the file on Windows (as given above), which is the case here.
However, the checksum of a given file will be the same wherever it is computed.
Note: your code works fine :)
It seems the problem was associated with the length value I passed to EVP_DigestUpdate. I had passed the value from strlen, but replacing it with bufferS fixed the issue.
The code was modified as:
while(!readfile.eof() && readfile.is_open() && temp_fil>0)
{
    int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
    char *buffer = new char[bufferS+1];
    buffer[bufferS] = 0;
    readfile.read(buffer, bufferS);
    EVP_DigestUpdate(mdctx, buffer, bufferS);
    temp_fil -= bufferS;
    delete[] buffer;
}
and to send the checksum string, I modified the code as:
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
char str[2 * EVP_MAX_MD_SIZE + 1] = { 0 }; // two hex digits per byte plus the terminator
char *ptr = str;
std::string ret;
for(i = 0; i < md_len; i++)
{
    sprintf(ptr,"%02x", md_value[i]);
    ptr += 2;
}
ret = str;
EVP_MD_CTX_destroy(mdctx);
/* Call this once before exit. */
EVP_cleanup();
return ret;
As for the wrong checksum earlier, the problem was in how Windows stores line endings. As suggested by Zangetsu, Windows was saving the text file with CRLF, but Linux and the site I mentioned earlier use LF, so the checksum values differed. For files other than text, e.g. a DLL, the code now computes the correct checksum as a string.
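The two fixes above (pass the real byte count rather than strlen, and build the hex string) can be sketched without OpenSSL as follows. This is an illustration under stated assumptions: feedFile and toHex are hypothetical helper names, and the update callback stands in for a call such as EVP_DigestUpdate(mdctx, data, n).

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Feeds a file to `update` in chunks, passing the number of bytes actually
// read. Returns the total byte count, or -1 if the file cannot be opened.
long long feedFile(const std::string &path,
                   const std::function<void(const char *, std::size_t)> &update) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return -1;
    std::vector<char> buffer(1024);
    long long total = 0;
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           in.gcount() > 0) {
        std::size_t n = static_cast<std::size_t>(in.gcount());
        update(buffer.data(), n); // n, not strlen(): the data may contain '\0'
        total += static_cast<long long>(n);
    }
    return total;
}

// Converts raw digest bytes to a lowercase hex string.
std::string toHex(const unsigned char *bytes, std::size_t len) {
    assert(bytes != nullptr || len == 0);
    std::string out;
    char pair[3];
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(pair, sizeof pair, "%02x", bytes[i]);
        out += pair;
    }
    return out;
}
```

Using gcount() also removes the need to compute the file length up front, and a single reusable buffer avoids the new/delete pair on every chunk.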

Program crashes when reading a long text file - "*.exe has stopped working"

The title describes it all. I am reading various files in my program, and once it reaches a relatively large file, the program crashes.
I wrote a shortened version of my program that replicates the issue.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <iostream>
#include <fstream>
#include <string>
char** load_File(char** preComputed, const int lines, const int sLength,
                 std::string fileName){
    FILE *file;
    int C = lines+1;
    int R = sLength+2;
    int i; //Dummy index
    int len;
    //Create 2-D array on the heap
    preComputed = (char**) malloc(C*sizeof(char*));
    for(i = 0; i<C; i++) preComputed[i] = (char *) malloc(R*sizeof(char));
    //Need to free each element individually later on
    //Create temporary char array
    char* line = (char *) malloc(R*sizeof(char));
    //Open file to read and store values
    file = fopen(fileName.c_str(), "r");
    if(file == NULL){ perror("\nError opening file"); return NULL;}
    i = 0;
    while(fgets(line, R, file) != NULL){
        //Remove next line
        len = R;
        if((line[len-1]) == '\n') (line[len-1]) = '\0';
        len--; // Decrement length by one because of replacing EOL
               // with null terminator
        //Copy character set
        strcpy(preComputed[i], line);
        i++;
    }
    preComputed[C-1] = NULL; //Append null terminator
    free(line);
    fclose(file);
    return preComputed;
}
int main(void){
    char** preComputed = NULL;
    std::string name = "alphaLow3.txt";
    preComputed = load_File(preComputed, 17576, 3, name);
    if(preComputed == NULL){
        std::cout<<"\nAn error has been encountered...";
        return 1;
    }
    //Free preComputed
    for(int y = 0; y < 17576; y++){
        free(preComputed[y]);
    }
    free(preComputed);
    return 0;
}
This program will crash when it is executed. Here are two links to the text files.
To run alphaLow2.txt, change the numbers in the load_file call to 676 and 2 respectively.
When this program reads alphaLow2.txt, it executes successfully. However, when it reads alphaLow3.txt, it crashes. This file is only 172 KB, and I have files that are a MB or larger. I thought I had allocated enough memory, but I may be missing something.
The program is supposed to be in C, but I've included some C++ functions for ease.
Any constructive input is appreciated.
You must check the number of lines in your file. alphaLow3.txt has a total of 35152 lines, but your program allocates room for only 17576. This is the main reason for the crash: the loop keeps writing past the end of preComputed.
In addition, this statement is wrong:
if((line[len-1]) == '\n') (line[len-1]) = '\0';
fgets stores a terminating null character after the last character it reads. For example, the first line is stored as 'a' 'a' 'a' '\n' '\0', so the newline sits at index len-2, not len-1. You should write it like this:
if((line[len-2]) == '\n') (line[len-2]) = '\0';
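Since the program already uses some C++, one way to avoid both problems (a hard-coded line count and index arithmetic on the fgets buffer) is to let the containers grow on their own. The following is a minimal sketch; loadLines is a hypothetical name, and stripping a trailing '\r' is an assumption that the input may have Windows line endings.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Reads every line of a text file. The vector grows as needed, so the
// caller does not need to know the line count or line length in advance.
std::vector<std::string> loadLines(const std::string &fileName) {
    assert(!fileName.empty());
    std::vector<std::string> lines;
    std::ifstream file(fileName);
    std::string line;
    while (std::getline(file, line)) {
        // Drop a trailing '\r' left over from Windows (CRLF) line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

If the code has to stay in C, the equivalent safeguard is to stop the fgets loop once i reaches lines, so the array is never written past its end.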

C++ Read large data, parse, then write data

I'm trying to read a large dataset, format it the way I need, and then write it to another file. I'm trying to use C++ over SAS or STATA for the speed advantage. The data files are usually around 10 gigabytes, and my current code takes over an hour to run (and then I kill it because I'm sure that something is very inefficient in my code).
Is there a more efficient way to do this? Maybe read the file into memory and then analyze it using the switch statements? (I have 32gb ram linux 64bit). Is it possible that reading, and then writing within the loop slows it down since it is constantly reading, then writing? I tried to read it from one drive, and then write to another in an attempt to speed this up.
Are the switch cases slowing it down?
The process I have now reads the data using getline, uses the switch statement to parse it correctly, and then writes it to my outfile. And repeats for 300 million lines. There are about 10 more cases in the switch statement, but I didn't copy for brevity's sake.
The code is probably very ugly all being in the main function, but I wanted to get it working before I worked on attractiveness.
I've tried using read() but without any success. Please let me know if I need to clarify anything.
Thank you for the help!
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <stdio.h>
//#include <cstring>
//#include <boost/algorithm/string.hpp>
#include <vector>
using namespace std;
//using namespace boost;
struct dataline
{
    char type[0];
    double second;
    short mill;
    char event[1];
    char ticker[6];
    char marketCategory[1];
    char financialStatus[1];
    int roundLotSize;
    short roundLotOnly;
    char tradingState[1];
    char reserved[1];
    char reason[4];
    char mpid[4];
    char primaryMarketMaker[1];
    char primaryMarketMode[1];
    char marketParticipantState[1];
    unsigned long orderNumber;
    char buySell[0];
    double shares;
    float price;
    int executedShares;
    double matchNumber;
    char printable[1];
    double executionPrice;
    int canceledShares;
    double sharesBig;
    double crossPrice;
    char crossType[0];
    double pairedShares;
    double imbalanceShares;
    char imbalanceDirection[1];
    double fairPrice;
    double nearPrice;
    double currentReferencePrice;
    char priceVariationIndicator[1];
};
int main ()
string a;
string b;
string c;
string d;
string e;
string f;
string g;
string h;
string k;
string l;
string times;
string smalltimes;
short time; //counter to keep second filled
short smalltime; //counter to keep millisecond filled
double N;
double NN;
double NNN;
int length;
char M;
//vector<> fout;
string line;
ofstream fout ("/media/3tb/test.txt");
ifstream myfile;
myfile.open("S050508-v3.txt");
dataline oneline;
if (myfile.is_open())
while ( myfile.good() )
getline (myfile,line);
// cout << line<<endl;;
stringstream ss(a);
char type;
switch (type)
case 'T':
if (type == 'T')
stringstream s(times);
case 'M':
if (type == 'M')
stringstream ss(smalltimes);
ss>>smalltime; //oneline.mill;
// cout<<smalltime<<endl; //smalltime=oneline.mill;
case 'R':
stringstream ss(a);
stringstream sss(b);
stringstream ssss(c);
stringstream sssss(d);
stringstream ssssss(e);
stringstream sssssss(f);
}//End Case
}//End Switch
}//end While
}//End If
else cout << "Unable to open file";
return 0;
UPDATE So I've been trying to use memory map, but now I'm getting a segmentation fault.
I've been trying to follow different examples to piece together something that would work for mine. Why would I be getting a segmentation fault? I've taken the first part of my code, which looks like this:
int main (int argc, char** path)
long i;
int fd;
char *map;
char *FILEPATH = path;
unsigned long FILESIZE;
FILE* fp = fopen(FILEPATH, "/home/brian/Desktop/S050508-v3.txt");
fseek(fp, 0, SEEK_END);
FILESIZE = ftell(fp);
fseek(fp, 0, SEEK_SET);
fd = open(FILEPATH, O_RDONLY);
map = (char *) mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
char z;
stringstream ss;
for (long i = 0; i <= FILESIZE; ++i)
z = map[i];
if (z != '\n')
ss << z;
// c style tokenizing
if (munmap(map, FILESIZE) == -1) perror("Error un-mmapping the file");
The data file are usually around 10gigabytes.
Are the switch cases slowing it down?
Almost certainly not; it smells like you're I/O bound. But you should consider measuring it. Modern CPUs have performance counters which are pretty easy to leverage with the right tools. But let's start by partitioning the problem into some major domains: I/O to devices, load/store to memory, and CPU. You can place some markers in your code where you read a clock in order to understand how long each of the operations is taking. On Linux you can use clock_gettime() or the rdtsc instruction to access a clock with higher precision than the OS tick.
Consider mmap/CreateFileMapping, either of which might provide better efficiency/throughput to the pages you're accessing.
Consider large/huge pages if streaming through large amounts of data which has already been paged in.
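The "markers" idea can be sketched in portable C++ with std::chrono::steady_clock instead of clock_gettime(). This is a rough illustration; the PhaseTimer class and the phase names are hypothetical and not part of the asker's code.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock time per named phase.
class PhaseTimer {
public:
    // Runs fn() and adds its elapsed time to the total for `phase`.
    template <typename Fn>
    void measure(const std::string &phase, Fn &&fn) {
        assert(!phase.empty());
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        totals_[phase] += std::chrono::duration<double>(stop - start).count();
        ++calls_[phase];
    }
    // Total seconds spent in `phase` so far (0 if it never ran).
    double seconds(const std::string &phase) const {
        auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : it->second;
    }
    // Number of times `phase` was measured.
    long calls(const std::string &phase) const {
        auto it = calls_.find(phase);
        return it == calls_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, double> totals_;
    std::map<std::string, long> calls_;
};
```

Wrapping the getline call, the parsing switch, and the output write in separate measure calls (for example "read", "parse", "write") and printing the three totals at the end shows which domain dominates. For 300 million lines the timer overhead itself becomes noticeable, so it may be better to time batches of lines rather than every single one.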
From the manual for mmap():
mmap() creates a new mapping in the virtual address space of the
calling process. The starting address for the new mapping is specified
in addr. The length argument specifies the length of the mapping.
Here's an mmap() example:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#define FILEPATH "/tmp/mmapped.bin"
#define NUMINTS (1000)
#define FILESIZE (NUMINTS * sizeof(int))
int main(int argc, char *argv[])
{
    int i;
    int fd;
    int *map; /* mmapped array of int's */
    fd = open(FILEPATH, O_RDONLY);
    if (fd == -1) {
        perror("Error opening file for reading");
        exit(EXIT_FAILURE);
    }
    map = (int *) mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        perror("Error mmapping the file");
        exit(EXIT_FAILURE);
    }
    /* Read the file int-by-int from the mmap */
    for (i = 0; i < NUMINTS; ++i) { /* valid indices are 0 .. NUMINTS-1 */
        printf("%d: %d\n", i, map[i]);
    }
    if (munmap(map, FILESIZE) == -1) {
        perror("Error un-mmapping the file");
    }
    close(fd);
    return 0;
}
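For a text file such as the one in the question, the same idea can be sketched as follows: take the size from fstat, map the file read-only, and walk the lines with memchr, never touching the byte at offset size. This is a POSIX-only illustration under stated assumptions; countLinesMapped is a hypothetical name and the per-line parsing is left as a comment.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps a file read-only and counts its '\n' bytes; returns -1 on error.
long countLinesMapped(const char *path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; } // mmap rejects a zero length
    size_t size = static_cast<size_t>(st.st_size);
    void *addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1;

    const char *p = static_cast<const char *>(addr);
    const char *end = p + size; // one past the last valid byte
    long lines = 0;
    while (p < end) {
        const char *nl = static_cast<const char *>(
            std::memchr(p, '\n', static_cast<size_t>(end - p)));
        if (!nl) break;
        // [p, nl) is one line; it could be parsed here without copying.
        ++lines;
        p = nl + 1;
    }
    munmap(addr, size);
    return lines;
}
```

Two details in the asker's updated snippet differ from this sketch and are worth checking against it: the loop there runs while i <= FILESIZE, which reads one byte past the mapping, and the second argument of fopen is a mode string such as "r", not a path.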

Help Editing Code to Fix "Argument list too long" Error

I am currently doing some testing with a new addition to the ICU dictionary-based break iterator.
I have code that allows me to test the word-breaking on a text document but when the text document is too large it gives the error: bash: ./a.out: Argument list too long
I am not sure how to edit the code to break-up the argument list when it gets too long so that a file of any size can be run through the code. The original code author is quite busy, would someone be willing to help out?
I tried removing the printing of what is being examined to see if that would help, but I still get the error on large files (printing what is being examined isn't necessary - I just need the result).
If the code could be modified to read the source text file line by line and export the results line by line to another text file (ending up with all the lines when it is done), that would be perfect.
The code is as follows:
/*
Written by George Rhoten to test how word segmentation works.
Code inspired by the break ICU sample.
Here is an example to run this code under Cygwin.
PATH=$PATH:icu-test/source/lib ./a.exe "`cat input.txt`" > output.txt
Encode input.txt as UTF-8.
The output text is UTF-8.
*/
#include <stdio.h>
#include <stdlib.h>
#include <unicode/brkiter.h>
#include <unicode/ucnv.h>
using namespace icu; // required by ICU 61 and later
#define ZW_SPACE "\xE2\x80\x8B"
void printUnicodeString(const UnicodeString &s) {
    int32_t len = s.length() * U8_MAX_LENGTH + 1;
    char *charBuf = new char[len];
    len = s.extract(0, s.length(), charBuf, len, NULL);
    charBuf[len] = 0;
    printf("%s", charBuf);
    delete[] charBuf;
}
/* Creating and using text boundaries */
int main(int argc, char **argv)
{
    UnicodeString stringToExamine("Aaa bbb ccc. Ddd eee fff.");
    printf("Examining: ");
    if (argc > 1) {
        // Override the default charset.
        stringToExamine = UnicodeString(argv[1]);
        if (stringToExamine.charAt(0) == 0xFEFF) {
            // Remove the BOM
            stringToExamine = UnicodeString(stringToExamine, 1);
        }
    }
    printUnicodeString(stringToExamine);
    puts("");
    //print each sentence in forward and reverse order
    UErrorCode status = U_ZERO_ERROR;
    BreakIterator* boundary = BreakIterator::createWordInstance(NULL, status);
    if (U_FAILURE(status)) {
        printf("Failed to create sentence break iterator. status = %s",
               u_errorName(status));
        exit(1);
    }
    printf("Result: ");
    //print each word in order
    boundary->setText(stringToExamine);
    int32_t start = boundary->first();
    int32_t end = boundary->next();
    while (end != BreakIterator::DONE) {
        if (start != 0) {
            printf(ZW_SPACE);
        }
        printUnicodeString(UnicodeString(stringToExamine, start, end-start));
        start = end;
        end = boundary->next();
    }
    delete boundary;
    return 0;
}
Thanks so much!
The Argument list too long error message is coming from the bash shell and is happening before your code even gets started executing.
The only code you can fix to eliminate this problem is the bash source code (or maybe it is in the kernel) and then, you're always going to run into a limit. If you increase from 2048 files on command line to 10,000, then some day you'll need to process 10,001 files ;-)
There are numerous solutions to managing 'too big' argument lists.
The standardized solution is the xargs utility.
find / -print | xargs echo
is an unhelpful but working example.
See How to use "xargs" properly when argument list is too long for more info.
Even xargs has problems, because file names can contain spaces, new-line chars, and other unfriendly stuff.
I hope this helps.
The code below reads the content of a file whose name is given as the first parameter on the command line and places it in a std::string named buffer. Then, instead of calling the function UnicodeString with argv[1], use that buffer instead.
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char **argv)
{
    std::string buffer;
    if(argc > 1) {
        std::ifstream t;
        t.open(argv[1]);
        std::string line;
        while(std::getline(t, line)) {
            buffer += line + '\n';
        }
    }
    cout << buffer;
    return 0;
}
Input to UnicodeString should be char*. The function GetFileIntoCharPointer does that.
Note that only the most rudimentary error checking is implemented below!
#include <cstdio>
#include <iostream>
using namespace std;
char * GetFileIntoCharPointer(char *pFile, long &lRet)
{
    FILE * fp = fopen(pFile,"rb");
    if (fp == NULL) return 0;
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    char *pData = new char[size + 1];
    lRet = fread(pData, sizeof(char), size, fp);
    pData[lRet] = '\0'; // terminate so the data can be used as a C string
    fclose(fp);
    return pData;
}
int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    long Len;
    char * Data = GetFileIntoCharPointer(argv[1], Len);
    if (Data != NULL)
    {
        std::cout << Data << std::endl;
        delete [] Data;
    }
    return 0;
}
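The question also asks for the input to be read line by line with the results written line by line to another file. A minimal sketch of that driver loop, independent of ICU, could look like the following; processFile is a hypothetical name, and the segment callback stands in for whatever turns one input line into one output line (for example, the break-iterator loop from the question applied to a single line).

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <string>

// Hypothetical driver: reads inPath line by line, passes each line to
// `segment`, and writes the result to outPath. Returns the number of
// lines processed, or -1 if either file cannot be opened.
long processFile(const std::string &inPath, const std::string &outPath,
                 const std::function<std::string(const std::string &)> &segment) {
    assert(segment);
    std::ifstream in(inPath);
    if (!in) return -1;
    std::ofstream out(outPath);
    if (!out) return -1;
    long count = 0;
    std::string line;
    while (std::getline(in, line)) {
        out << segment(line) << '\n';
        ++count;
    }
    return count;
}
```

Because only one line is held in memory at a time, the size of the input file no longer matters, and the file name is the only thing passed on the command line.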