We have written a feature that allows our users to upload a file to a NAS device using a UNC path. The feature has not been stress tested, and I'm not 100% convinced CFFILE can handle the load. Does CFFILE use multi-threading to perform writes to the file system, and what kind of load can it support?
I worked on a project that had numerous simultaneous writes of separate files to a local device without any problems. I doubt CF (or, for that matter, the underlying OS) would support simultaneous writes to the same file, regardless of location, but as long as you are writing different files, simultaneous writes should be fine.
As Ben's answer says, CF can write multiple files simultaneously. The first bottleneck you run into will probably be what your hardware can support. If you start running into issues, consider getting a Solid State Drive specifically for these files to be written to.
In our Windows 8 application, we are using the IXMLHTTPRequest2 interface to stream files over HTTP, files whose size can reach gigabytes. This all works perfectly, except for the fact that internally, WinRT has a caching system which stores everything streamed through IXMLHTTPRequest2 in the temporary internet cache. As we stream more and more files, the cache is never emptied and it just keeps taking up more and more space on disk, until the disk is full.
Optimally, we would like to disable this caching functionality entirely. Another option we could live with would be that the cached files would be removed after a short while (although we'd like to avoid having to browse the temporary internet cache and removing files manually).
We've tried adding the "Expires: 0" header to the server response, as well as disabling the caching directly inside IE (we thought this might have an influence on the call to IXMLHTTPRequest2), but to no avail.
Does anyone have any thoughts on this?
I realize this question is similar to another one posted here; however, our problem has more to do with the space taken up by the cache than with the "freshness" of the files.
EDIT:
We have also found this post on the MSDN forums where, according to an MSFT moderator, "The system will also periodically clean up the cache so you will not have to worry about running out of disk space", but that is not the case in our scenario.
According to this post on the MSDN forums, this isn't possible and is a known limitation with WinRT.
Sometimes the only answer is bad news. :-[
As ildjarn noted, this seems to be unavoidable on Windows 8. But it looks like there might be a way to fix this for clients running Windows 8.1.
I haven't tried it myself, but I just noticed that there is now "IXMLHTTPRequest3" which extends "IXMLHTTPRequest2" with some new features:
http://msdn.microsoft.com/en-us/library/windows/desktop/dn376398%28v=vs.85%29.aspx
The relevant feature is:
XHR_PROP_NO_CACHE – Suppresses cache reads and writes for the HTTP request.
That sounds promising.
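I haven't tried it either, but for anyone who lands here later, a minimal sketch of setting that property might look like the following (assuming the Windows 8.1 SDK, where IXMLHTTPRequest3 and XHR_PROP_NO_CACHE are declared in msxml6.h; the callback object and the Open/Send steps are omitted):

    // Sketch only: obtain IXMLHTTPRequest3 and suppress cache reads/writes for the request.
    // Assumes the Windows 8.1 SDK (msxml6.h) and an already-initialized COM apartment.
    #include <windows.h>
    #include <msxml6.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    HRESULT CreateNoCacheRequest(ComPtr<IXMLHTTPRequest3>& request)
    {
        ComPtr<IXMLHTTPRequest2> xhr2;
        HRESULT hr = CoCreateInstance(__uuidof(FreeThreadedXMLHTTP60), nullptr,
                                      CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&xhr2));
        if (FAILED(hr)) return hr;

        // IXMLHTTPRequest3 is only available on Windows 8.1 and later.
        hr = xhr2.As(&request);
        if (FAILED(hr)) return hr;

        // XHR_PROP_NO_CACHE suppresses cache reads and writes for this request.
        return request->SetProperty(XHR_PROP_NO_CACHE, TRUE);
    }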
I want to make sure no other process changes the contents of a particular folder. I'd like to stop other processes from creating, deleting, or modifying files within a folder. Further, I'd like this restriction to nest down into subfolders.
I can get close to what I want by enumerating the contents of the folder and calling CreateFile on each file so that other processes can't open them (sketched below). This has problems: it doesn't stop new files from being created, and it requires me to acquire a lot of handles.
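For reference, the per-file approach I'm describing looks roughly like this (a sketch only; a share mode of 0 denies other opens for as long as the handles stay open, and recursion into subfolders is omitted):

    // Sketch of the current approach: open every existing file with share mode 0
    // so no other process can open (and therefore modify or delete) it while we
    // hold the handle. New files and subfolder changes are still not prevented.
    #include <windows.h>
    #include <string>
    #include <vector>

    std::vector<HANDLE> LockExistingFiles(const std::wstring& folder)
    {
        std::vector<HANDLE> handles;
        WIN32_FIND_DATAW fd;
        HANDLE find = FindFirstFileW((folder + L"\\*").c_str(), &fd);
        if (find == INVALID_HANDLE_VALUE) return handles;

        do {
            if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
                HANDLE h = CreateFileW((folder + L"\\" + fd.cFileName).c_str(),
                                       GENERIC_READ,
                                       0,              // share mode 0: deny all other opens
                                       nullptr, OPEN_EXISTING,
                                       FILE_ATTRIBUTE_NORMAL, nullptr);
                if (h != INVALID_HANDLE_VALUE)
                    handles.push_back(h);   // keep the handle open while the program runs
            }
        } while (FindNextFileW(find, &fd));
        FindClose(find);
        return handles;
    }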
Is there an easier way to get what I want?
Update: Addressing some comments, what I want to do is prevent modification while my program is running. It's OK if the files get modified between runs.
In terms of ACLs, the app has read access to all files within the folder.
ACLs would be the best way to go, but if you can't get them to work for whatever reason (you're fairly thin on the details), then use a file system filter driver. Note that this isn't very straightforward. It's not rocket science either, but you have to be extra careful with driver development.
http://www.microsoft.com/whdc/driver/filterdrv/default.mspx
I'm trying to speed up directory enumeration in C++, where I'm recursing into subdirectories. I currently have an app which spends 95% of its time in the FindFirstFile/FindNextFile APIs, and it takes several minutes to enumerate all the files on a given volume. I know it's possible to do this faster, because there is an app that does: Everything. It enumerates my entire drive in seconds.
How might I accomplish something like this?
I realize this is an old post, but there is a project on SourceForge that does exactly what you are asking, and the source code is available.
You can find the project here: NTFS-Search
"Everything" builds an index in the background, so queries are against the index not the file system itself.
There are a few improvements to be made - at least over the straightforward algorithm:
First, breadth-first search over depth-first search. That is, enumerate and process all files in a single folder before recursing into the subfolders you found. This improves locality - usually a lot.
On Windows 7 / W2K8R2, you can use FindFirstFileEx with FindExInfoBasic, the main speedup being the omission of the short file name on NTFS file systems where short name generation is enabled (see the sketch below).
Separate threads help if you enumerate different physical disks (not just drives). For the same disk it only helps if it's an SSD ("zero seek time"), or you spend significant time processing a file name (compared to the time spent on disk access).
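To illustrate the first two points, here is a rough sketch of a queue-based (breadth-first) enumeration using FindFirstFileEx with FindExInfoBasic. The FIND_FIRST_EX_LARGE_FETCH flag is an extra assumption on my part; it requests larger directory buffers per call and, like FindExInfoBasic, needs Windows 7 / Server 2008 R2 or later:

    // Sketch: breadth-first enumeration of a volume using FindFirstFileEx.
    // FindExInfoBasic skips the short (8.3) name lookup; FIND_FIRST_EX_LARGE_FETCH
    // asks for bigger directory buffers per call.
    #include <windows.h>
    #include <deque>
    #include <string>
    #include <vector>

    std::vector<std::wstring> EnumerateVolume(const std::wstring& root)
    {
        std::vector<std::wstring> files;
        std::deque<std::wstring> pending;          // directories still to be scanned
        pending.push_back(root);

        while (!pending.empty()) {
            std::wstring dir = pending.front();
            pending.pop_front();

            WIN32_FIND_DATAW fd;
            HANDLE h = FindFirstFileExW((dir + L"\\*").c_str(),
                                        FindExInfoBasic, &fd,
                                        FindExSearchNameMatch, nullptr,
                                        FIND_FIRST_EX_LARGE_FETCH);
            if (h == INVALID_HANDLE_VALUE) continue;

            do {
                if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
                    if (wcscmp(fd.cFileName, L".") != 0 && wcscmp(fd.cFileName, L"..") != 0)
                        pending.push_back(dir + L"\\" + fd.cFileName);  // visit later, not now
                } else {
                    files.push_back(dir + L"\\" + fd.cFileName);
                }
            } while (FindNextFileW(h, &fd));
            FindClose(h);
        }
        return files;
    }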
[edit] Wikipedia actually has some comments -
Basically, they are skipping the file system abstraction layer and accessing NTFS directly. This way, they can batch calls and skip expensive services of the file system - such as checking ACLs.
A good starting point would be the NTFS Technical Reference on MSDN.
"Everything" accesses directory information at a lower level than the Win32 FindFirst/FindNext APIs.
I believe it reads and interprets the NTFS MFT structures directly, and that this is one of the main reasons for its performance. It's also why it requires admin privileges and why "Everything" only indexes local or removable NTFS volumes (not network drives, for example).
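I haven't looked at how "Everything" actually does it, so treat this as an assumption, but one documented low-level route to that information is FSCTL_ENUM_USN_DATA via DeviceIoControl, which hands back a record for every file on an NTFS volume. A rough sketch (requires administrator rights and a local NTFS volume; older SDKs name the input struct MFT_ENUM_DATA):

    // Sketch: enumerate every file record on an NTFS volume via the USN/MFT
    // enumeration FSCTL. Requires administrator privileges and local NTFS volumes.
    #include <windows.h>
    #include <winioctl.h>
    #include <cstdio>

    void EnumerateMft(const wchar_t* volume /* e.g. L"\\\\.\\C:" */)
    {
        HANDLE hVol = CreateFileW(volume, GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  nullptr, OPEN_EXISTING, 0, nullptr);
        if (hVol == INVALID_HANDLE_VALUE) return;

        MFT_ENUM_DATA_V0 med = {};      // start at file reference 0, accept all USNs
        med.StartFileReferenceNumber = 0;
        med.LowUsn  = 0;
        med.HighUsn = MAXLONGLONG;

        BYTE buffer[64 * 1024];
        DWORD bytes = 0;
        while (DeviceIoControl(hVol, FSCTL_ENUM_USN_DATA, &med, sizeof(med),
                               buffer, sizeof(buffer), &bytes, nullptr)) {
            // The first 8 bytes of the output are the next StartFileReferenceNumber;
            // USN_RECORD structures follow.
            auto* record = reinterpret_cast<USN_RECORD*>(buffer + sizeof(USN));
            while (reinterpret_cast<BYTE*>(record) < buffer + bytes) {
                wprintf(L"%.*s\n",
                        static_cast<int>(record->FileNameLength / sizeof(WCHAR)),
                        reinterpret_cast<const WCHAR*>(
                            reinterpret_cast<const BYTE*>(record) + record->FileNameOffset));
                record = reinterpret_cast<USN_RECORD*>(
                    reinterpret_cast<BYTE*>(record) + record->RecordLength);
            }
            med.StartFileReferenceNumber = *reinterpret_cast<DWORDLONG*>(buffer);
        }
        CloseHandle(hVol);
    }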
A couple of other utilities that do similar things are:
FindOnClick by 2Brightsparks
Search GT
A little reverse engineering with a debugger on these tools might give you some insight on the techniques they use.
Don't recurse immediately; save a list of the directories you find and dive into them when finished. You want linear access to each directory, to take advantage of locality of reference and whatever caching the OS is doing.
If you're already doing the best you can to get the maximum speed from the API, the next step is to do low-level disk accesses and bypass Windows altogether. You might get some guidance from the NTFS drivers for Linux, or perhaps you can use one directly.
If you are doing this on NTFS, here's a lib for low level access: NTFSLib.
You can enumerate through all file records in $MFT, each representing a real file on disk. You can get all file attributes from the record, including $DATA.
This may be the fastest way to enumerate all files/directories on NTFS volumes - around 200k to 300k files per minute in my testing.
I recall using little 'filesystems' before that basically provided an interface to something else. For example, I believe there was a GMail filesystem that created an entry in My Computer and could be used like any other drive on your local computer. How can I go about implementing something like this in C++?
Thank you!
Try Dokan. It's like FUSE, except for Windows. I think there are certain limitations to namespace extensions, such as not being accessible from the command line, but I'm really not sure at the moment.
Writing an actual file system involves writing a driver, which means kernel-mode code (scary stuff) and paying for the IFS DDK. (Edit: looks like they don't charge for it anymore.)
What you probably want is a “namespace extension”.
Try this: The Complete Idiot's Guide to Writing Namespace Extensions - CodeProject
This may be a starting point to extending NTFS in the way that the GMail filesystem used to do: Windows NT reparse points.
The GMail Filesystem is just a name; it is not a filesystem as such. It is just a namespace extension for Windows Explorer that links to your GMail account!
I don't know exactly what you are trying to do, but in any case, I believe the following link will be of some use to you:
http://msdn.microsoft.com/en-us/magazine/cc188741.aspx
Just as a reference: virtual drives can be created using our Callback File System product, which is a supported, documented and maintained solution.
I was thinking of this too - perhaps some example code? (Email me if I forget, please; I'm going through the SDK now.)
I'm thinking of a similar filesystem that would plug in as a driver and allow dynamic 'soft RAID' for larger files, mostly by spreading them across more than one disk, with perhaps some compression options and 'smart' filters to adjust behaviour more effectively in low-disk-space, low-usage, and other situations, plus status controls and indicators exposed through a fairly normal program as well.
It seems like I would load the driver kit, then hook the file-write events, mostly replacing fopen and similar functions automatically as an intermediate driver; I have a little Windows network driver experience.
I've also heard good things about developing inside a virtual machine, for fewer crashes and easier debugging.
I'd also like more metainfo on some or all files, including files in special folders, with options too: maybe both a fast and simple encryption option (obfuscation and/or a symmetric key) applied per folder, per drive letter, to specified files, or to everything, as well as a slower variant; and maybe an integrated, optional (and potentially profitable) online CVS-like, diff-style backup that mostly targets changes to hot files, uploading at configurable intervals and prices, perhaps matched to keyboard events - it might even be useful simply as a reasonably secure keylogging online backup service,
while avoiding common files like the Windows files or the normal contents of the 'Programs' directory, which can be copied easily with pirate tools, unlike your documents.
I'm using rsync to run backups of my machine twice a day, and the ten to fifteen minutes it spends searching my files for modifications, slowing everything down considerably, are starting to get on my nerves.
Now I'd like to use the inotify interface of my kernel (I'm running Linux) to write a small background app that collects notifications about modified files and adds their pathnames to a list which is then processed regularly by a call to rsync.
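For reference, the collector I have in mind would look roughly like this - a sketch only, watching a single directory (no recursion into subdirectories) and just printing the paths that would later be handed to rsync:

    // Sketch: collect paths of files modified under one directory using inotify.
    // A real version would add watches recursively and hand the list to rsync
    // at regular intervals.
    #include <sys/inotify.h>
    #include <unistd.h>
    #include <cstdio>
    #include <string>
    #include <set>

    int main()
    {
        const std::string dir = "/home/me/work";      // hypothetical path
        int fd = inotify_init();
        if (fd < 0) { perror("inotify_init"); return 1; }

        int wd = inotify_add_watch(fd, dir.c_str(),
                                   IN_CLOSE_WRITE | IN_MOVED_TO | IN_CREATE);
        if (wd < 0) { perror("inotify_add_watch"); return 1; }

        std::set<std::string> modified;               // paths to hand to rsync later
        alignas(inotify_event) char buf[4096];

        for (;;) {
            ssize_t len = read(fd, buf, sizeof(buf));
            if (len <= 0) break;
            for (char* p = buf; p < buf + len; ) {
                auto* ev = reinterpret_cast<inotify_event*>(p);
                if (ev->len > 0) {
                    modified.insert(dir + "/" + ev->name);
                    printf("queued: %s/%s\n", dir.c_str(), ev->name);
                }
                p += sizeof(inotify_event) + ev->len;
            }
        }
        close(fd);
        return 0;
    }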
Now, because this process by definition always works on files I've just been - and might still be - working on, I'm wondering whether I'll get loads of corrupted / partially updated files in my backup as rsync copies the files while I'm writing to them.
I couldn't find anything in the manpage and have so far been unsuccessful in googling for the answer. I could go read the source, but that might take quite a while. Does anybody know how concurrent file access is handled inside rsync?
It's not handled at all: rsync opens the file, reads as much as it can and copies that over.
So it depends on how your applications handle this: do they rewrite the file in place (not creating a new one), or do they create a temp file and rename it once all data has been written (as they should; see the sketch below)?
In the first case, there is little you can do: if two processes access the same data without any kind of synchronization, the result will be a mess. What you could do is defer the rsync for N minutes, assuming that the writing process will finish before then. Reschedule the file if it changes again within this time limit.
In the second case, you must tell rsync to ignore the temp files (*.tmp, *~, etc.), for example with --exclude patterns.
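To illustrate the write-to-a-temp-file-and-rename pattern mentioned above, here is a minimal sketch (the function name and the .tmp suffix are made up for the example; rename() is atomic within one filesystem on POSIX, so rsync sees either the old file or the complete new one):

    // Sketch of the write-to-temp-then-rename pattern: the target path always holds
    // either the old version or the complete new version, never a half-written file.
    #include <cstdio>
    #include <fstream>
    #include <string>

    bool SaveAtomically(const std::string& path, const std::string& contents)
    {
        const std::string tmp = path + ".tmp";    // temp name rsync should be told to exclude
        {
            std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
            if (!out) return false;
            out << contents;
            if (!out.flush()) return false;       // make sure everything reached the OS
        }
        return std::rename(tmp.c_str(), path.c_str()) == 0;   // atomic replace on POSIX
    }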
It isn't handled in any way. If it is a problem, you can use e.g. LVM snapshots, and take the backup from the snapshot. That won't in itself guarantee that the files will be in a usable state, but it does guarantee that, as the name implies, it's a snapshot at a specific time.
Note that this doesn't have anything to do with whether you let rsync handle the change detection itself or use your own app. Your app, or rsync itself, just produces a list of files that have been changed, and then for each file the rsync binary diff algorithm is run. The problem arises if the file changes while the rsync algorithm runs, not while the file list is being produced.