I am trying to build a tool similar to Task Manager. I was able to get the CPU and memory usage of each process, but I couldn't figure out the disk statistics. I was able to get the I/O read and write bytes, but they include all file, disk and network I/O. How could I get only the disk I/O used by each process? Otherwise, is it possible to separate the disk statistics from those I/O byte counts? If yes, how could I do it?
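For reference, here is a minimal sketch (assuming plain Win32 and a known PID) of reading those combined per-process I/O counters with GetProcessIoCounters; these counters lump file, disk and network I/O together, so they show the starting point rather than the per-device answer:

    // Sketch: read the combined per-process I/O counters on Windows.
    // Note: ReadTransferCount/WriteTransferCount aggregate file, disk and
    // network I/O; this API does not split them per device.
    #include <windows.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv)
    {
        DWORD pid = (argc > 1) ? static_cast<DWORD>(atoi(argv[1])) : GetCurrentProcessId();
        HANDLE process = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
        if (!process)
            return 1;

        IO_COUNTERS counters = {};
        if (GetProcessIoCounters(process, &counters))
        {
            printf("Read bytes:  %llu\n", (unsigned long long)counters.ReadTransferCount);
            printf("Write bytes: %llu\n", (unsigned long long)counters.WriteTransferCount);
        }
        CloseHandle(process);
        return 0;
    }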
I am receiving a large quantity of data at a fixed rate. I need to do some processing on this data on a different thread, but that may run slower than the data is coming in, so I need to buffer the data. Due to the quantity of data coming in, the available RAM would be quickly exhausted, so the buffer needs to overflow onto the hard disk. What I need is something like a filesystem-backed pipe, so the writer can be blocked by the filesystem, but not by the reader running too slowly.
Here's a rough set of requirements:
Writing should not be blocked by the reader running too slowly.
If data is read slow enough that the available RAM is exhausted it should overflow to the filesystem. It's ok for writes to the disk to block.
Reading should block if no data is available unless the stream has been closed by the writer.
If the reader is able to keep up with the data then it should never hit the hard disk as the RAM buffer would be sufficient (nice but not essential).
Disk space should be recovered as the data is consumed (or soon after).
Does such a mechanism exist in Windows?
This looks like a classic message queue. Did you consider MSMQ or something similar? MSMQ has all the properties you are asking for. You may want to use direct addressing to avoid Active Directory http://msdn.microsoft.com/en-us/library/ms700996(v=vs.85).aspx and use a local or TCP/IP queue address.
Use an actual file. Write to the file as the data is received, and in another process read the data from the file and process it.
You even get the added benefit of no multithreading.
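A very rough sketch of that idea, assuming a single spool file named spool.dat and a polling reader (both the file name and the 10 ms poll interval are made up for illustration; note this sketch never reclaims the disk space as data is consumed):

    // Sketch: the writer appends incoming chunks to a spool file; a reader
    // (possibly in another process) consumes from its own offset and waits
    // when it catches up with the writer.
    #include <chrono>
    #include <fstream>
    #include <thread>
    #include <vector>

    void write_chunk(const std::vector<char>& chunk)
    {
        std::ofstream out("spool.dat", std::ios::binary | std::ios::app);
        out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    }

    void read_loop()
    {
        std::streamoff offset = 0;
        std::vector<char> buffer(64 * 1024);
        for (;;)
        {
            std::ifstream in("spool.dat", std::ios::binary);
            in.seekg(offset);
            in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
            std::streamsize got = in.gcount();
            if (got > 0)
            {
                offset += got;
                // process buffer[0..got) here
            }
            else
            {
                // nothing new yet: approximate "blocking" by sleeping instead of spinning
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
        }
    }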
I have written a program in C/C++ which needs to fetch data from the disk. After some time the operating system ends up keeping some of that data in its caches. Is there some way in a C/C++ program to figure out whether the data was retrieved from the cache or read from the disk?
A simple solution would be to time the read operation, since disk reads are significantly slower. You can read a group of file blocks (4 KB) twice to get an estimate.
The problem is that if you run the program again or copy the file in a shell, the OS will cache it.
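A hedged sketch of that timing trick (data.bin and the 4 KB block size are placeholders, and the timings depend entirely on your hardware, so any threshold needs calibrating):

    // Sketch: time the same 4 KB read twice; if the two timings are comparable,
    // the data was most likely already in the OS cache on the first read.
    #include <chrono>
    #include <cstdio>
    #include <fstream>

    int main()
    {
        char block[4096];
        auto time_read = [&](const char* path) {
            std::ifstream in(path, std::ios::binary);
            auto start = std::chrono::steady_clock::now();
            in.read(block, sizeof(block));
            auto stop = std::chrono::steady_clock::now();
            return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        };

        long long first  = time_read("data.bin");   // hypothetical file name
        long long second = time_read("data.bin");   // almost certainly cached by now
        printf("first read: %lld us, second read: %lld us\n", first, second);
        return 0;
    }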
Background:
I'm developing the new SparkleDB NoSQL database. The database is ACID and has its own disk space manager (DSM) for all of its database file storage access. The DSM allows multiple threads to perform concurrent I/O operations on the same physical file, i.e. asynchronous I/O or overlapped I/O. We disable disk caching and write pages directly to the disk, as this is required for ACID databases.
My question is:
Is there a performance gain from arranging contiguous disk page writes from many threads before sending the I/O request to the underlying OS I/O subsystem (thus merging the data to be written when the pages are contiguous), or does the I/O subsystem do this for you? My question applies to UNIX, Linux, and Windows.
Example (all happens within a span of 100 ms):
Thread #1: Write 4k to physical file address 4096
Thread #2: Write 4k to physical file address 0
Thread #3: Write 4k to physical file address 8192
Thread #4: Write 4k to physical file address 409600
Thread #5: Write 4k to physical file address 413696
Using this information, the DSM arranges a single 12 KB write operation to physical file address 0, and a single 8 KB write operation to physical file address 409600.
Update:
The DSM does all the physical file address positioning by supplying an OVERLAPPED structure on Windows, io_prep_pwrite with Linux AIO, and aiocb's aio_offset with POSIX AIO.
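For illustration, here is a simplified, synchronous sketch of the coalescing step itself, assuming 4 KB pages and plain POSIX pwrite (the real DSM would issue the merged requests through overlapped I/O or AIO as described above):

    // Sketch: coalesce 4 KB page writes that are contiguous in file offset
    // and issue one pwrite() per contiguous run.
    #include <algorithm>
    #include <vector>
    #include <unistd.h>

    struct PageWrite {
        off_t offset;             // physical file address, multiple of 4096
        std::vector<char> data;   // exactly 4096 bytes
    };

    void flush_coalesced(int fd, std::vector<PageWrite> writes)
    {
        std::sort(writes.begin(), writes.end(),
                  [](const PageWrite& a, const PageWrite& b) { return a.offset < b.offset; });

        size_t i = 0;
        while (i < writes.size())
        {
            // Grow the run while the next page starts where the previous one ends.
            std::vector<char> run = writes[i].data;
            off_t start = writes[i].offset;
            size_t j = i + 1;
            while (j < writes.size() &&
                   writes[j].offset == start + static_cast<off_t>(run.size()))
            {
                run.insert(run.end(), writes[j].data.begin(), writes[j].data.end());
                ++j;
            }
            ssize_t written = pwrite(fd, run.data(), run.size(), start);
            (void)written;        // error handling omitted in this sketch
            i = j;
        }
    }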
The most efficient way to use a hard drive is to keep writing as much data as you can while the platters are still spinning. This means reducing the number of writes and increasing the amount of data per write. If this can be done, then having a disk area of contiguous sectors will help.
For each write, the OS needs to translate the write to your file into logical or physical coordinates on the drive. This may involve reading the directory, searching for your file and locating the mapping of your file within the directory.
After the OS determines the location, it sends data across the interface to the hard drive. Your data may be cached along the way many times until it is placed onto the platters. An efficient write will use the block sizes of the caches and data interfaces.
Now the questions are: 1) How much time does this save? and 2) Is the time saved significant? For example, if all this work saves you 1 second, that second gained may be lost waiting for a response from the user.
Many programs, OSes and drivers postpone writes to a hard drive to non-critical or off-peak periods. For example, while you are waiting for user input, you could be writing to the hard drive. This posting of writes may be less effort than optimizing the disk writes and have a more significant impact on your application.
BTW, this has nothing to do with C++.
I am writing to a USB disk from a lowest-priority thread, using chunked buffered writes, and still, from time to time, the system as a whole lags during this operation. If I disable only the writing to disk, everything works fine. I can't use the Windows file operation API calls, only the C write call. So I thought maybe there is a WinAPI function to turn USB disk write caching on or off, which I could use in conjunction with FlushBuffers or similar alternatives? The number of drives involved is not known in advance.
Ideally I would like the write call to never cause lag; caching is fine too, as long as it is performed transparently.
EDIT: would the _O_SEQUENTIAL flag on write-only operations be of any use here?
Try to reduce I/O priority for the thread.
See this article: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx
In particular use THREAD_MODE_BACKGROUND_BEGIN for your IO thread.
Warning: this doesn't work in Windows XP
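A minimal sketch of that call, assuming the writes happen on a dedicated worker thread:

    // Sketch: mark the writer thread as background work so the scheduler and
    // the I/O system lower its CPU and I/O priority (Windows Vista and later).
    #include <windows.h>

    DWORD WINAPI WriterThread(LPVOID)
    {
        SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);

        // ... perform the chunked writes to the USB disk here ...

        SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
        return 0;
    }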
The thread priority won't affect the delay that happens while the media is being written, because that work is done in kernel mode by the file system/disk drivers, which don't pay attention to the priority of the calling thread.
You might try to use "T" flag (_O_SHORTLIVED) and flush the buffers at the end of the operation, also try to decrease the buffer size.
There are different types of USB data transfer; for data there are three:
1. Bulk Transfer,
2. Isochronous Transfer, and
3. Interrupt Transfer.
Bulk Transfers provide:
Used to transfer large bursty data.
Error detection via CRC, with guarantee of delivery.
No guarantee of bandwidth or minimum latency.
Stream Pipe - Unidirectional
Full & high speed modes only.
Bulk transfer is good for data that does not require delivery in a guaranteed amount of time. The USB host controller gives a lower priority to bulk transfers than to the other types of transfer.
Isochronous Transfers provide:
Guaranteed access to USB bandwidth.
Bounded latency.
Stream Pipe - Unidirectional
Error detection via CRC, but no retry or guarantee of delivery.
Full & high speed modes only.
No data toggling.
Isochronous transfers occur continuously and periodically. They typically contain time-sensitive information, such as an audio or video stream. If there were a delay or retry of data in an audio stream, you would expect some erratic audio containing glitches. The beat may no longer be in sync. However, if a packet or frame is dropped every now and again, it is less likely to be noticed by the listener.
Interrupt Transfers provide:
Guaranteed Latency
Stream Pipe - Unidirectional
Error detection and next period retry.
Interrupt transfers are typically non-periodic, small device "initiated" communication requiring bounded latency. An Interrupt request is queued by the device until the host polls the USB device asking for data.
From the above, it seems that you want guaranteed latency, so you should use isochronous mode. There are some libraries you can use, like libusb, or you can read more on MSDN.
To find out what is making your system hang, you first need to drill down into the Windows hang. What was Windows doing while you experienced the hang?
To find this out you can take a kernel dump. How to get and analyze a kernel dump is described here.
Depending on what you find there, you then need to decide whether there is anything under your control you can do about it. Since you are using a third-party library to do the writing, there is little you can do except set the I/O priority and thread priority at the thread or process level. If the library you were given links against a specific CRT, you could try to build your own customized version of it, e.g. to flush after every write and prevent the OS from combining writes and flushing data back to disc only in big chunks.
Edit1
Your best bet would be to flush the device after every write. This would force the OS to write the currently pending data to disc immediately instead of caching writes up to a certain amount.
The second best thing would be to simply wait after each write, giving the OS a chance to write the pending changes, however small, back to disc after a certain time interval.
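A hedged sketch of the flush-after-every-write idea using only CRT calls, since the question rules out the Win32 file APIs (error handling omitted):

    // Sketch: push each chunk down to the device right away with _commit(),
    // so the OS cannot accumulate a large amount of cached write data.
    #include <io.h>

    void write_chunk_flushed(int fd, const char* data, unsigned int size)
    {
        _write(fd, data, size);   // the plain C write call the question mentions
        _commit(fd);              // flush this file's buffered data to the device
    }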
If you want to dig deeper into performance you should try XPerf, which has a nice GUI and even shows you the call stack where your process hung. The Windows team and many other teams at MS use this tool to troubleshoot hangs. The latest edition, with many more features, comes with the Windows 8 SDK. But beware that XPerf only works on Windows Vista and later.
I want to be able to gain access to the RAM of a program and send some of that RAM data to another program. I might try to send the data over an Ethernet wire and load up a similar program with the state of the program on the sending computer. I think it would be similar to hibernating.
Is this possible? I don't really know where to start to find out.
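As a starting point, and assuming Windows, here is a hedged sketch of reading a region of another process's memory with ReadProcessMemory (the PID and address below are placeholders; capturing, transferring and restoring a complete process state is a far larger job, closer to what checkpoint/restore or hibernation does):

    // Sketch: read 4 KB from another process's address space on Windows.
    // The target PID and address are made-up placeholders.
    #include <windows.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
        DWORD pid = 1234;                                        // hypothetical target process id
        LPCVOID address = reinterpret_cast<LPCVOID>(0x10000000); // hypothetical address

        HANDLE process = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, pid);
        if (!process)
            return 1;

        std::vector<char> buffer(4096);
        SIZE_T bytesRead = 0;
        if (ReadProcessMemory(process, address, buffer.data(), buffer.size(), &bytesRead))
            printf("read %zu bytes\n", static_cast<size_t>(bytesRead));

        CloseHandle(process);
        return 0;
    }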