I want to know about the utilities like winHex , which are the disk editor
. they access the hard disk & represent the data in hexadecimal of a whole harddisk of about 2TB .
How do they achieve this in a single scroll area & also provide the undo functionality in that....
which widget should be used to display such a huge amount of data.????
I want to make this application in QT.
How do they achieve this in a single scroll area
It is not a "single scroll area" containing the entire disk. It is a scrollbar and dynamically generated content for whatever disk content you are showing at the time.
Simply calculate the position based on the scroll location (unless your screen is 10000's of pixels tall, however, you will not be able to place the cursor EXACTLY on the disk sector you want).
also provide the undo functionality in that....
Undo functionality, I expect (I haven't looked at the code) is done by holding "address changed, old value" in some sort of container. Pretty much the same way you'd do undo information for any other large dataset.
which widget should be used to display such a huge amount of data.????
One which shows text and allows you to intercept the redraw and provide your own data on each redraw operation. I'm afraid I don't know QT very well, so can't advice on the details.
Obviously, one factor you haven't covered is "how do you open/mount the whole disk in read-write mode when it is already mounted" - I'm not sure if it allows this, but if it does, I expect there is a disk filter driver involved that has "sideways" interfaces to allow updates behind the scenes of the filesystem.
Edit: In answer to the question in the comment:
There are two options, either write to disk whenever the data is changed. In which case the code needs to remember all the original values, and restore them when the user does the undo operation. The alternative, which is approximately the same effort is to store all the edits, ("change value at 1000 to 05"), and then when asked to display some content, process any edits within the displayed range before the actual display operation.
Obviously, if someone decides to play "monkey on keyboard" for many many hours (weeks, months) to fill the ENTIRE disk with rather random content, then that would be a problem to "remember" all that without running out of memory, so you probably need a "I've run out of memory to store undo-information, do you want to save what you have done so far?" type option.
One could also consider a "same value stored in a large section" type compression (e.g. if you have a "fill from A to B with value X", you store simply that "from A to B we have filled with X", rather than store, potentially, many megabytes of "A = X, A+1 = X, A+2 = X ... B-1=X, B=X").
Related
I have to give some background first. I want to implement an optimized storage engine for OSM planet data (50GB+). The purpose of this engine is to enable map area extractions as fast as possible - while also remaining the ability for minutely updates. The design I've chosen for several reasons (not mentioning all of them here) is to use one data cell per grid. E.g. think of a all cells on a map being distinct files or databases: http://3.bp.blogspot.com/_CntRFtGsdQo/TTU5UMlLkTI/AAAAAAAAARk/_hW8n33t4Ok/s1600/utmworld.gif
(Jut to get the idea though, this is not the actual cell grid I'll be using)
I have never used leveldb before, but settled on it for it's bulk insert and update performance. However, I'd like to know about the "performance characteristics" when opening many very small and very large leveldb databases. very small meaning just a few kB, very large meaning a few hundred MB
I expect that I have to open / close somewhere between 10-100 dbs per minute. I'd rule out leveldb if it needs significant initialization time.
An answer to this question could be either concrete figures, or insight to what leveldb does during initialization and how it relates to data / index size.
PS. I'll also do my own measurements of course. But as with all tests, I may draw wrong conclusions from my sample data.
I've seen some similar questions out of which I have made a system which works for me but I need to optimize it because this program alone is taking up a lot of CPU load.
Here is the problem exactly.
I have an incoming signal/stream of data which I need to plot in real time. I only want a limited number of points to be displayed at a time (Say 1024 points) so I plot the data points along the y axis against an index from 0-1024 on the x-axis. The values of the incoming data range from 0-1023.
What I do currently (This is all in C++) is I put the data into a circular loop as it comes and each time the data gets updated (Or every second/third data point), I write out to a file and using a pipe, I plot the data from that file with gnuplot.
While this works almost perfectly, it causes a fair bit of load (Depending on the input data rate, I saw even 70% usage on both my cores of my Core 2 Duo). I'll need to be running some processor intensive code along with this short program so I feel that it is almost necessary to optimize it.
What I was hoping could be done is this: Can I only plot the differences between the current plot and the new data (Or plot each point as it comes in without replotting the whole graph such that the old item at that x index is removed).
I have a fixed number of points on the graph so replot wouldn't work. I want the old point at that x location to be removed.
Unfortunately, what you're trying to accomplish can't be done. You can mark a datafile as volatile or use the refresh keyword, but those only update the plot without re-reading the data. You want to re-read the data and then only update the differences.
There are a few things that might be helpful though. 1) your eye can only register ~26 frames per second. So, if you have a way to make sure that you only send data 26x per second to gnuplot, that might help. 2) How are you writing the datafiles? Are you dumping as ascii or binary? Doing a binary dump might be faster (both for writing and for gnuplot to read). You'll have to experiment.
There is one hack which will probably not make your script go faster, but you can try it (if you know a reasonable yrange to set, and are using points to plot the data)...
#set up code:
set style line 1 lc rgb "blue"
set xrange [0:1023]
set yrange [0:1]
plot NaN notitle #Only need to do this once.
for [i=0:1023] set label i+1 at i,0 point ls 1 #Labels must have tags > 0 :-(
#this part gets repeated by your C code.
#you could move a few points at a time to make it more responsive.
set label 401 at 400,0.8 #move point number 400 to a different y value
refresh #show it at it's new location.
You can use gnuplot to do dynamic plotting of data as explained in their FAQ, using the reread function. It seems to run at quite a low load and automatically scrolls the graph when it reaches the end. To run at low load I found I had to add a ; sleep 1 after the awk command (in their example file dyn-ping-loop.gp) otherwise it spends too much CPU on looping on the awk processing.
I have 2 hard-disk volumes(one is a backup image of the other), I want to compare the volumes and list all the modified files, so that the user can select the ones he/she wants to roll-back.
Currently I'm recursing through the new volume and comparing each file's time-stamps to the old volume's files (if they are int the old volume). Obviously this is a blunder approach. It's time consuming and wrong!
Is there an efficient way to do it?
EDIT:
- I'm using FindFirstFile and likes to recurse the volume, and gather info of each file (not very slow, just a few minutes).
- I'm using Volume Shadow Copy to backup.
- The backup-volume is remote so I cannot continuously monitor the actual volume.
Part of this depends upon how the two volumes are duplicated; if they are 'true' copies from the file system's point of view (e.g. shadow copies or other block-level copies), you can do a few tricky little things with respect to USN, which is the general technology others are suggesting you look into. You might want to look at an API like FSCTL_READ_FILE_USN_DATA, for example. That API will let you compare two different copies of a file (again, assuming they are the same file with the same file reference number from block-level backups). If you wanted to be largely stateless, this and similar APIs would help you a lot here. My algorithm would look something like this:
foreach( file in backup_volume ) {
file_still_exists = try_open_by_id( modified_volume )
if (file_still_exists) {
usn_result = compare_usn_values_of_files( file, file_in_modified_volume )
if (usn_result == equal_to) {
// file hasn't changed at all
} else {
// file has changed (somehow)
}
} else {
// file was deleted (possibly deleted and recreated)
}
}
// we still don't know about files new in modified_volume
All of that said, my experience leads me to believe that this will be more complicated than my off-the-cuff explanation hints at. This might be a good starting place, though.
If the volumes are not block-level copies of one another, then it will be very difficult to compare USN numbers and file IDs, if not impossible. Instead, you may very well be going by file name, which will be difficult if not impossible to do without opening every file (times can be modified by apps, sizes and times can be out of date in the findfirst/next queries, and you have to handle deleted-then-recreated cases, rename cases, etc.).
So knowing how much control you have over the environment is pretty important.
Instead of waiting until after changes have happened, and then scanning the whole disk to find the (usually few) files that have changed, I'd set up a program to use ReadDirectoryChangesW to monitor changes as they happen. This will let you build a list of files with a minimum of fuss and bother.
Assuming you're not comparing each file on the new volume to every file in the snapshot, that's the only way you can do it. How are you going to find which files aren't modified without looking at all of them?
I am not a Windows programmer.
However shouldn't u have stat function to retrieve the modified time of a file.
Sort the files based on mod time.
The files having mod time greater than your last backup time are the ones of your interest.
For the first time u can iterate over the back up volume to figure out the max mod time and created time from your interested set.
I am assuming the directories of interest don't get modified in the backup volume.
Without knowing more details about what you're trying to do here, it's hard to say. However, some tips about what I think you're trying to achieve:
If you're only concerned about NTFS volumes, I suggest looking into the USN / change journal API's. They have been around since 2000. This way, after the initial inventory you can only look at changes from that point on. A good starting point for this, though a very old article is here: http://www.microsoft.com/msj/0999/journal/journal.aspx
Also, utilizing USN API's, you could omit the hash step and just record information from the journal yourself (this will become more clear when/if you look into said APIs)
The first time through comparing a drive's contents, utilize a hash such as SHA-1 or MD5.
Store hashes and other such information in a database of some sort. For example, SQLite3. Note that this can take up a huge amount of space itself. A quick look at my audio folder with 40k+ files would result in ~750 megs of MD5 information.
I'm working on a Qt GUI for visualizing 'live' data which is received via a TCP/IP connection. The issue is that the data is arriving rather quickly (a few dozen MB per second) - it's coming in faster than I'm able to visualize it even though I don't do any fancy visualization - I just show the data in a QTableView object.
As if that's not enough, the GUI also allows pressing a 'Freeze' button which will suspend updating the GUI (but it will keep receiving data in the background). As soon as the Freeze option was disabled, the data which has been accumulated in the background should be visualized.
What I'm wondering is: since the data is coming in so quickly, I can't possibly hold all of it in the memory. The customer might even keep the GUI running over night, so gigabytes of data will accumulate. What's a good data storage system for writing this data to disk? It should have the following properties:
It shouldn't be too much work to use it on a desktop system
It should be fast at appending new data at the end. I never need to touch previously written data anymore, so writing into anywhere but the end is not needed.
It should be possible to randomly access records in the data. This is because scrolling around in my GUI will make it necessary to quickly display the N to N+20 (or whatever the height of my table is) entries in the data stream.
The data which is coming in can be separated into records, but unfortunately the records don't have a fixed size. I'd rather not impose a maximum size on them (at least not if it's possible to get good performance without doing so).
Maybe some SQL database, or something like CouchDB? It would be great if somebody could share his experience with such scenarios.
I think that sqlite might do the trick. It seems to be fast. Unfortunately, I have no data flow like yours, but it works well as a backend for a log recorder. I have a GUI where you can view the n, n+k logs.
You can also try SOCI as a C++ database access API, it seems to work fine with sqlite (I have not used it for now but plan to).
my2c
I would recommend a simple file based solution.
If you can use fixed size records: If the you get the data continuously with constant sample rate, random access to data is easy and very fast when you know the time stamp of first data point and the sample rate. If the sample rate varies, then write time stamp with each data point. Now random access requires binary search, but it is still fast enough.
If you have variable size records: Write the variable size data to one file and to other file write indexes (which are fixed size) to the data file. And if the sample rate varies, write time stamps too. Now you can do the random access fast using the index file.
If you are using Qt to implement this kind of solution, you need two sets of QFile and QDataStream instances, one for writing and one for reading.
And a note about performance: don't flush the file after every data point write. But remember to flush the file before doing any random access to it.
I have a MFC dialog with 32 CComboBoxes on it that all have the same data in the listbox. Its taking a while to come up, and it looks like part of the delay is the time I need to spend using InsertString() to add all the data to the 32 controls. How can I subclass CComboBox so that the 32 instances share the same data?
Turn off window redrawing when filling the combos. e.g.:
m_wndCombo.SetRedraw(FALSE);
// Fill combo here
...
m_wndCombo.SetRedraw(TRUE);
m_wndCombo.Invalidate();
This might help.
The first thing I would try is calling "InitStorage" to preallocate the internal memory for the strings.
From MSDN:
// Initialize the storage of the combo box to be 256 strings with
// about 10 characters per string, performance improvement.
int n = pmyComboBox->InitStorage(256, 10);
In addition to what has already been said, you might also turn off sorting in your combo box and presort the data before you insert it.
One way along the lines of your request would be to go owner drawn - you will be writing a fair chunk of code, but you won't have to add the data to all of them.
"CComboBox::DrawItem"
Support.microsoft have this article on subclassing a Combo box which might also be of interest
"How to subclass CListBox and Cedit inside of CComboBox"
Really one has to ask if it is worth the effort, and alot of that depends things like
number of entries in the list
number of times the dialog will show
variability of the combo content
optomising elsewhere
not drawing until the screen is complete
only building the dialog once and re showing it.
using the one combo but showing it in different locations at different times