I'm reading a big file using fread. When I interrupt the program during the read using Ctrl+C, the program hangs and cannot be killed, not even with kill -9. It simply sits at 100% CPU, keeping the RAM it had already allocated. It would be great to get that fixed, but it would also be okay just to be able to kill the application from outside (the main problem being that I can't restart that machine myself).
Is there a way of doing that in Unix?
Thanks!
Here is the source:
int Read_New_Format(const char* prefix, const char* folder)
{
    char filename[500];
    long count_pos;
    // Open the file for reading (binary mode, since it holds raw values).
    snprintf(filename, sizeof(filename), "%s/%s.pos.mnc++", folder, prefix);
    FILE *pos = fopen(filename, "rb");
    if (pos == NULL)
    {
        printf("Could not open pos file %s\n", filename);
        return -1;  // do not continue with a NULL stream
    }
    // Read the number of entries.
    if (fread(&count_pos, sizeof(long), 1, pos) != 1)
    {
        fclose(pos);
        return -1;
    }
    printf("[...]");
    // Read the complete file into an array.
    float *data_pos = new float[3*count_pos];
    fread(data_pos, 3*sizeof(float), count_pos, pos);
    printf("Read files.\n");
    [...]
}
If your program cannot be interrupted by a signal, that almost surely means it's in an uninterruptible sleep state. This is normally an extremely short-lived state that only exists momentarily while waiting for the physical disk to perform a read or write, either due to an explicit read or write call that can't be satisfied by the cache, or one resulting from a page fault where a disk-backed page is not currently in physical memory.
If the uninterruptible sleep state persists, this is almost surely indicative of either extremely high load on the storage device (a huge number of IO requests all happening at once) or, much more likely, damaged hardware.
I suspect you have a failing hard disk or scratched optical disc.
The problem was no longer reproducible after a few days. Maybe it was a problem with the file system. As a workaround, using the Unix library routines directly instead of fread worked.
I went through all the answers that were available regarding real-time reading a text file but none seems to work.
In program 1, I have a continuously growing text file that is written by a piece of hardware and contains two coordinates (two columns).
In program 2, I want to read those coordinates in real time and move another piece of hardware to the coordinates that are being written.
The biggest problem is that I want to work with the shortest possible delay (under 50 ms).
I tried Notepad++, but its refresh interval is 3 seconds, which is too long.
Can anyone tell me how this can be done?
Your fastest response is to either poll (read the hardware) directly or to have the hardware create an event (interrupt) that calls your program.
Writing to a file takes time. The OS has to find space on the hard drive and write to it, not to mention the time required to spin up the drive's motors if it was idle.
Writing to memory is a lot quicker. A more efficient method is for the H/W to write to memory rather than a file. Alternately, a memory mapped file or RAM drive will be the next best option.
Also remember that Windows is not a real-time operating system. You have other tasks in your system being swapped out and executed. This takes time away from your "real time" requirements. You may want to research Windows to see if there is an API that allows your program exclusive access to the processor (or makes your program a very high priority).
Research "Windows Drivers" to write code that can service your H/W and perform activities in real time.
I tried this:
int main()
{
std::ifstream ifs("file.txt");
if (ifs.is_open())
{
std::string line;
while (true)
{
while (std::getline(ifs, line)) std::cout << line << "\n";
if (!ifs.eof()) break;
ifs.clear();
}
}
return 0;
}
But it reads until the end, and when I add more values to my text file, it doesn't read them. When I refresh my file, though, I get the output on the console.
I have also tried using tellg and seekg but that also doesn't help.
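For comparison, here is a minimal polling sketch of the "follow a growing file" idea. The helper names read_new_lines and follow are hypothetical, and the 5 ms poll interval is an arbitrary choice. It assumes the writer flushes each line as soon as it is produced; if the writer buffers its output (which may be why new data only appears after a "refresh"), no reader loop can see the data any sooner. Incomplete lines are held back until their newline arrives.

```cpp
#include <chrono>
#include <fstream>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: returns every complete line that has been appended
// since the last call. Text after the last '\n' is kept in `partial` and
// completed on a later call.
std::vector<std::string> read_new_lines(std::ifstream& in, std::string& partial) {
    std::vector<std::string> lines;
    std::string chunk;
    while (std::getline(in, chunk)) {
        if (in.eof()) {             // reached end of file before a '\n'
            partial += chunk;       // line is not finished yet
            break;
        }
        lines.push_back(partial + chunk);
        partial.clear();
    }
    in.clear();                     // reset eof/fail so the next call can read again
    return lines;
}

// Hypothetical helper: polls `path` and hands each new line to `on_line`
// until the callback returns false.
void follow(const std::string& path,
            const std::function<bool(const std::string&)>& on_line) {
    std::ifstream in(path);
    std::string partial;
    for (;;) {
        for (const auto& line : read_new_lines(in, partial))
            if (!on_line(line)) return;
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}
```

Whether a stream picks up appended data after clear() can depend on the standard library and platform, so treat this as something to verify on the target system rather than a guarantee.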
We have a project where multiple nodes write data to a file in sequence, and the file resides on NFS.
We were using synchronous NFS before, so flushing the file streams just worked. Now we have asynchronous NFS and it no longer works: caching comes into the picture, and the other nodes don't see the changes made by a particular node.
I wanted to know if there is a way to forcefully flush the data from the cache to disk. I know this is not efficient, but it will get things working until we get the real solution in place.
I've had a similar problem using NFS with VxWorks. After some experimentation I've found a way to surely flush data to the device:
int fd;
fd = open("/ata0a/test.dat", O_RDWR | O_CREAT, 0644);
write(fd, "Hallo", 5);
/* data is having a great time in some buffers... */
ioctl(fd, FIOSYNC, 0); // <-- may last quite a while...
/* data is flushed to file */
I've never worked with ofstreams, nor do I know whether your OS provides something similar to the code shown above...
But one thing to try is to simply close the file. This will cause all buffers to be flushed. But be aware that there may be some time between closing the file and all data being flushed which your application does not see since the "close" call may return before the data is written. Additionally this creates a lot of overhead since you have to re-open the file afterwards.
If this is no option you can also write as much "dummy-data" after your data to cause the buffers to fill up. This will also result in the data being written to the file. But this may waste a lot of disk space depending on the size of your data.
My application records audio samples from a microphone connected to my PC. So I chose the Windows WaveInXXX API to do the job.
After reading the documentation I decided to avoid using the callback mechanism with WaveInProc to save me the hassle of synchronizing the threads. The whole application is pretty big and I thought this would make debugging simpler. When the application requests a block of samples, I just iterate over my buffer queue, take one buffer out, copy the data, unprepare it, prepare it and add it back to the buffer queue. The basic program structure looks like this; I hope it makes the program flow clear:
WaveInOpen()
WaveInStart()
FunctionAddingPreparedBuffersToTheQueue()
while(someConditionThatEventuallyBecomesFalse)
    if(NextBufferInQueueIsMarkedDone)
        GetDataFromBuffer()
        UnpreparePrepareHeaderAndAddBuffer()
    else
        WaitForAShortTime()
WaveInStop()
WaveInClose()
Now the problem appears: After some time (and I am unable to reproduce the exact condition), WaveInAddBuffer() causes a deadlock although it's in the same thread as all the rest. The header for the buffer that shall be added when the deadlock happens is prepared and dwFlags == WHDR_PREPARED == 2.
Any ideas what could cause this deadlock?
I have not seen such a problem, but a guess might be something like fragmentation related to all the unprepare/prepare cycles. They are not necessary. You can do the prepare once for each buffer and then unprepare when finished recording. (Prepare locks the buffer into physical memory.)
I'm trying to work with nasty large xml and text documents: ~40GBs.
I'm using Visual Studio 2012 on Windows 7.
I'm going to use 'Xerces' to snag the header/'footer tag' from the xmls.
I want to map an area of the file, say.. 60-120MBs.
Then I split the map into (3 * processors/cores) equal parts, set each part up as a buffer, and load the buffers into an array.
Then, using (#processors/cores) while statements in new threads, I will count characters/lines/xml cycles in parallel while chewing through the buffer array. When one buffer is completed, the thread will jump to the next 'available' buffer and the completed buffer will be dropped from memory. At the end I will add the total results to a project log.
Afterwards, I will reference the log, Split the files by character count/size(Or other option) to the nearest line or cycle and drop in the header and 'footer tag' to all the splits.
I'm doing this so I can import massive data to a MySQL server over a network with multiple computers.
My Question is, how do I create the buffer array and the file map with new threads?
Can I use :
win CreateFile
win CreateFileMapping
win MapViewOfFile
with standard ifstream operations and char buffers or should I opt something else?
Further clarification:
My thinking is that if I can have the hard drive streaming the file into memory from one place and in one direction, I can use the full processing power of the machine to chew through separate but equal buffers.
~Flavor: It's kind of like being a shepherd trying to scoop food out of one huge bin with 3-6 large buckets and only two arms, for X sheep that need to stay inside the fenced area. But they all move at the speed of light.
A few ideas or pointers might help me along here.
Any thoughts are Most Welcome. Thanks.
while(getline(my_file, myStr))
{
    characterCount += myStr.length();
    lineCount++;
    if(my_file.eof()){
        break;
    }
}
This was the only code running during the test. It took 2 hours and 30+ minutes, at 45-50% total processor usage for the program, on a dual core 1.6 GHz laptop with 2GB RAM. Most of the RAM in use right now is 600+MB from ~50 tabs open in Firefox, Visual Studio at 60MB, and so on.
IMPORTANT: During the test, the program running the code, which is only a window and a dialog box, seemed to dump its own working and private set of RAM down to about 300K, and didn't respond for the length of the test. I'm sure I need to make another thread for the while statement. But this means that NONE of the file was read into a buffer. The CPU was struggling for the entire run to keep up with the tiniest effort from the hard drive.
P.S. Further proof of CPU bottlenecking: it might take me 20 min to transfer that entire file to another computer over my wireless network, which includes the read process and a socket-catch-to-write process on the other computer.
UPDATE
I used this adorable little thing to go from the previous test time to about 15-20min which is in line with what Mats Petersson was saying.
// read() reports failure on the final, partial block, so also check gcount().
while (my_file.read(&bufferOne[0], bufferOne.size()) || my_file.gcount() > 0)
{
    std::streamsize cc = my_file.gcount();
    for (std::streamsize i = 0; i < cc; i++)
    {
        if (bufferOne[i] == '\n')
            lineCount++;
        characterCount++;
    }
    currentPercent = characterCount/onePercent;
    SendMessage(GetDlgItem(hDlg, IDC_GENPROGRESS), PBM_SETPOS, currentPercent, 0);
}
Granted this is a single loop and it actually behaved much more appropriately than the previous test. This test was ~800% faster than the tight loop shown above this one with Getline. I set the buffer for this loop at 20MB. I jacked this code from: SOF - Fastest Example
BUT...
I would like to point out that while watching the process in Resource Monitor and Task Manager, it clearly showed the first core at 75-90% usage, the second fluctuating between 25-50% (pretty standard for some minor background stuff that I have open), and the hard drive at.. wait for it... 50%. There were some 100% disk time spikes but also some lows at 25%. All of which basically means that splitting the buffer processing between two different threads could very well be a benefit. It will use all the system resources but.. that's what I want. I'll update later today when I have the working prototype.
MAJOR UPDATE:
Finally finished my project after a bunch of learning. No file map needed, only a bunch of vector<char> buffers. I have successfully built a dynamically executing file stream line and character counter.
The good news, went from the previous 10-15min marker to ~3-4min on a 5.8GB file, BOOYA!~
Very short answer: Yes, you can use those functions.
For reading data, it's likely the most efficient method to map the file content into memory, since it saves having to copy the memory into a buffer in the application, just read it straight into the place it's supposed to go. So, no problem as long as you have enough address space available - 64-bit machines should certainly have plenty, in a 32-bit system it may be more of a scarce resource - but for sections of a few hundred MB, it shouldn't be a huge issue.
However, I'm not at all convinced about using multiple threads. I strongly suspect that reading more than one part of a very large file at the same time will be counterproductive: it increases the amount of head movement on the disk, and seek time is a large factor in the achievable transfer rate. You can count on some 50-100MB/s transfer rates for "ordinary" systems. If the system has some sort of RAID controller, maybe around double that; very exotic RAID controllers may achieve three times as much.
So reading 40GB will take somewhere in the order of 3-15 minutes.
The CPU is probably not going to be very busy, and running multiple threads is quite likely to worsen the overall performance of the system.
You may want to keep a thread for reading and one for writing, and only actually write out the data once you have a sufficient amount of it, again, to avoid unnecessary moves of the read/write head on the disk(s).
I'm developing an application over Qt.
In this application the main thread is a web server. Another thread sometimes reads data from big files (250 MB) and writes it to an output file (~2 GB).
This thread performs heavy I/O on the file, and CPU iowait is around 70%.
My problem is that while the file is being written, the web server does not respond quickly. As I understand it, the server's Qt socket (on Linux) is represented by a system socket connected to the poll or select event system, so Qt sends a signal to my application only when poll emits an event.
My guess is that very heavy file-writing I/O may block the poll system, so my Qt server doesn't receive socket events. When the thread has finished writing its data, everything returns to normal.
The file writing look like this:
while(dataToRead){
    // context has the list of files to read and current step
    dataToRead = extractData(context, &pBuffer, &sizeBuf);
    fwrite (pBuffer, 1, sizeBuf, pOutFile);
    free(pBuffer);
    pBuffer = NULL;
    // usleep(100000);
}
If I add a pause with usleep, it helps to avoid the problem, but not completely unless the sleep is long enough. A sleep that is too long destroys the performance, though, and I want the file generated as fast as possible.
What am I doing wrong? Is it safe to read from and write to a file as fast as possible? Is a sleep mandatory in the above function? And if so, how can we determine a good time slice?
I'm working on Mint LMDE, Linux 3.2.0 64 bits with Intel Core i5 2500 and standard HDD drive.
Edit:
A sample program that reproduces the problem is available here: https://bugreports.qt-project.org/secure/attachment/30436/TestQtBlocked.zip. You need Qt's qmake to compile it. If you run it, it will create an empty 3GB file; the worker thread is launched at startup and fills the file over a few seconds. During this time, if you connect to http://localhost:8081/ and press F5 repeatedly to refresh the page, you will see that sometimes it does not respond quickly.
It would be helpful if someone could reproduce my problem with my sample program and let me know.
If you are starving the main thread's select calls, create a separate thread to do the file I/O. When the event comes from Qt, trigger some kind of IPC that wakes up your worker thread to do the big file I/O, and return from your event handler immediately.
(This assumes that writing to the file asynchronously makes sense to your program logic. Only you can figure out if that is true.)
from the man page:
size_t fwrite(const void *ptr, size_t size, size_t nmemb,
FILE *stream);
You want to write one element of sizeBuf bytes, i.e. fwrite(pBuffer, sizeBuf, 1, pOutFile).
You may want to tune buffering with setvbuf.
setvbuf(pOutFile, NULL, _IONBF, 0) disables buffering.
Complete example at:
http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/index.jsp?topic=%2Frtref%2Fsetvbuf.htm
It is better to switch to working with file descriptors rather than file streams.
With file descriptors you can use sendfile and splice.
man sendfile
man splice