I am using SQLite3 in my RTOS system. I've set the configuration such that it will lock for each transaction. On my system I end up with one file on the drive
"SQLDB.db"
When there is a transaction you can usually see a lock file if you are fast enough. "SQLDB.db.lock".
What's driving me wild is that when I delete "SQLDB.db" I still have the ability to do SELECTs from the database, but I cannot insert. It's not a caching issue because I can do selects on multiple tables (that I haven't done any operations on before rebooting the system).
So my question is, is the DB file being cached? Is it saved in RAM somewhere? How is it possible to query this ghost database?
In Unix, when you delete a file, the directory entry is deleted immediately, but the actual file data is deleted only when all open file handles have been closed.
Apparently, your RTOS behaves the same.
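A minimal sketch of that behaviour, in Python and assuming POSIX-style unlink semantics (on Windows the delete would normally be refused while the file is open). The function name and the ".db" suffix are made up for illustration:

```python
import os
import tempfile

def read_after_unlink(payload: bytes):
    """Delete a file's directory entry, then read it through the open handle.

    Returns (bytes_read, path_still_exists). Assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp(suffix=".db")
    try:
        os.write(fd, payload)
        os.unlink(path)                     # the directory entry is gone now
        still_listed = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))    # data is still reachable via the handle
    finally:
        os.close(fd)                        # the storage is released only here
    return data, still_listed
```

This is the same situation as a database library that keeps the database file open after the file has been deleted.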
Related
I have files on disk that contain critical information that needs to be updated every so often. It's safe to assume that only one process accesses that file, that the app is running on Windows and on an NTFS file system.
My goal is to prevent possible data corruption (partial write, for example) in case of program crashes, power outage, etc. In this question, I'm not concerned with disk sector corruption.
This is my current approach:
On write:
Write new data in a temporary file.
Create a voucher file. (This file guarantees that the temporary file has valid contents.)
Write new data in the target file.
Delete the voucher file.
Delete the temporary file.
On read:
If the voucher file exists:
Copy the temporary file to the target file.
Delete the voucher file.
Read the target file.
Delete the temporary file, if it exists.
Questions:
With my current approach, if I'm overwriting only a few bytes of the file, is it possible for other bytes to erroneously change as a result of some failure?
With my current approach, is it necessary to disable file caching, or does Windows reliably recover the cache in case of power outage?
Is there a faster approach, by using some knowledge of how either Windows or NTFS work? If so, I would be interested if there are any differences on FAT file systems.
I'm aware of the transactional NTFS API in Win32, but I don't want to use it since it's deprecated.
I am developing an application for a small office to maintain their monetary accounts.
My application can help create a file which can store all the information.
But it should not be accessible to the user other than in my application.
Why? Because somebody may delete the file & all the records will vanish.
The environment is a Windows PC with a single account having admin privileges.
I am developing the application in C++ using the MinGW compiler.
I am sort of blank right now, as to how I can create such a file.
Any suggestions please?
If your application can modify it, then the user under whose credentials it runs can modify it, period. Also, if he has administrator privileges then you can't stop him from deleting stuff, even if your application runs under different credentials and the file is protected by ACLs.
Now, since the problem seems to be not of security, but of protecting the user from himself, I would just store the file in a location that is "out of sight" enough and be happy with it; write your data in %APPDATA%\yourappname, since such a directory is specifically for user-specific application data that is not intended to be touched directly by the user.
If you want to be paranoid you can enable every security setting you can find (hide the directory, protect it with a restrictive ACL when the app is not running, open it for exclusive access, ...), but if you ask me it's just wasted time:
the average user (our target AFAICT) doesn't mess in appdata, since it's a hidden folder to begin with;
the "power user" who messes around, if sufficiently determined to shoot himself in the foot (or voluntarily do damage), will find a way, since the security settings are easily circumventable in your situation (an admin can take ownership of any file and change its ACLs, and use applications like Unlocker to circumvent file locking);
the technician that has legitimate reasons to access the file (e.g. he must take/restore a backup of it) will be frustrated by all these useless precautions.
You can get the actual %APPDATA% path by expanding the corresponding environment variable or via SHGetFolderPath/SHGetKnownFolderPath (or whatever replacement they invented for it in new Windows versions).
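As a rough illustration of the environment-variable route (in Python rather than C++; the function name, the env parameter and the home-directory fallback are assumptions made for this sketch, not part of any Windows API):

```python
import os
from pathlib import Path

def app_data_dir(app_name: str, env=os.environ) -> Path:
    """Return the per-user data directory for app_name.

    Uses %APPDATA% when it is set (the normal case on Windows). The
    fallback to the home directory is an assumption for other systems.
    """
    base = env.get("APPDATA")
    root = Path(base) if base else Path.home()
    return root / app_name
```

In a C++ program the equivalent lookup would read the same variable or call SHGetKnownFolderPath, as mentioned above.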
Make sure your application loads on Windows boot and opens the file with the dwShareMode 0 option.
Here is an MSDN Example
You would need to give these files their own file extension and perhaps other security measures (i.e. passwords to files). If you want these files to be suggested by Windows then you will have to do some work with the registry.
Here's a good source since you're concerned with Windows only:
http://msdn.microsoft.com/en-us/library/windows/desktop/ff513920(v=vs.85).aspx
As far as keeping the data from being deleted: redundancy, my friend, redundancy. Talk to a network administrator about how they keep their data safe. I'd bet money on them naming lots of backups as one of their reasons.
But it should not be accessible to the user other than in my application.
You cannot do that.
Everything that exists on a machine the user has physical access to can be deleted if the user is sufficiently determined.
You can protect your file from being deleted while the program is running: on Windows, you can't delete open files. Keep the file open, and people won't delete it while your program is running. Instead, they will kill your program via Task Manager and delete the file anyway.
Either that, or you could upload it somewhere. Data that is not located on physically accessible device cannot be easily deleted by user. However, somebody will have to run the server (and deal with security + possibly write server software). In your case it might not be worth it.
I'd suggest to document location of user data in help file, and you should probably put "!do not delete this.txt" or something into folder with this file.
So often my applications want to save files to load again later. Having recently got unlucky with a crash, I want to write the operation in such a way that I am guaranteed to either have the new data, or the original data, but not a corrupted mess.
My first idea was to do something along the lines of (to save a file called example.dat):
Come up with a unique file name for the target directory, e.g. example.dat.tmp
Create that file and write my data to it.
Delete the original file (example.dat)
Rename ("Move") the temp file to where the original was (example.dat.tmp -> example.dat).
Then at load time the application can follow the following rules:
If no "example.dat" and no "example.dat.tmp", first run / new project, so load in the defaults / create new file.
If "example.dat" and no "example.dat.tmp", then load example.dat (normal load case)
If "example.dat.tmp" exists, offer the user the chance to potentially recover data. If "example.dat" also exists, do not overwrite it without explicit user consent.
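To make the three load-time rules concrete, here is a small sketch in Python. The function name and the returned action strings are invented for illustration; a real application would act on them rather than return them:

```python
import os

def choose_load_action(directory: str, name: str = "example.dat") -> str:
    """Map the load-time rules onto the files present in directory."""
    has_data = os.path.exists(os.path.join(directory, name))
    has_tmp = os.path.exists(os.path.join(directory, name + ".tmp"))
    if has_tmp:
        # Leftover temp file: a save was interrupted. Never overwrite
        # existing data without asking the user first.
        return "offer_recovery_keep_original" if has_data else "offer_recovery"
    if has_data:
        return "load"               # normal load case
    return "create_defaults"        # first run / new project
```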
However, having done a little research, I found that as well as OS caching, which I may be able to override with the file flush methods, some disk drives also cache internally and may even lie to the OS, saying they are done. So step 4 could complete while the data is not actually written, and if the system goes down I have lost my data...
I am not sure the disk problem is actually solvable by an application, but are the general rules above the correct thing to do? Should I keep an old recovery copy of the file for longer to be sure, what are the guidelines regarding such things (e.g. acceptable disk usage, should the user choose, where to put such files, etc.).
Also, how should I avoid potential conflicts with the user and other programs over "example.dat.tmp"? I recall seeing a "~example.dat" sometimes from some other software; is that a better convention?
If the disk drives report back to the OS that the data is
physically on the disk, and it's not, then there's not much you
can do about it. A lot of disks do cache a certain number of
writes, and report them done, but such disks should have
a battery backup, and finish the physical writes no matter what
(and they won't lose data in case of a system crash, since they
won't even see it).
For the rest, you say you've done some research, so you no doubt
know that you can't use std::ofstream (nor FILE*) for this;
you have to do the actual writes at the system level, and open
the files with special attributes for them to ensure full
synchronization. Otherwise, the operations can stick around in
the OS buffering for a while. And as far as I know,
there's no way of ensuring such synchronization for a rename.
(But I'm not sure that it's necessary, if you always keep two
versions: my usual convention in such cases is to write to
a file "example.dat.new", then when I'm done writing, delete
any file named "example.dat.bak", rename "example.dat" to
"example.dat.bak", and then rename "example.dat.new" to
"example.dat". Given this, you should be able to figure out
what did or did not happen, and find the correct file
(interactively, if need be, or insert an initial line with the
timestamp).)
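A sketch of that ".new" / ".bak" rotation, in Python and under stated assumptions: os.open/os.write/os.fsync stand in for "writes at the system level", the function name is hypothetical, and the renames themselves are not synchronized, which matches the caveat above.

```python
import os

def save_with_backup(path: str, data: bytes) -> None:
    new, bak = path + ".new", path + ".bak"
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(new, flags, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:          # os.write may write only part of the buffer
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                  # ask the OS to push the data to the disk
    finally:
        os.close(fd)
    if os.path.exists(bak):
        os.remove(bak)                # drop the oldest version
    if os.path.exists(path):
        os.rename(path, bak)          # current version becomes the backup
    os.rename(new, path)              # new version takes its place
```

If the process dies between any two steps, at least one of "example.dat", "example.dat.bak" or "example.dat.new" should still hold a complete version.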
You should lock the actual data file while you write its substitute, if there's a chance that a different process could be going through the same protocol that you are describing.
You can use flock for the file lock.
As for your temp file name, you could make your process ID part of it, for instance "example.dat.3124". No other simultaneously-running process would generate the same name.
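Both suggestions combined might look like the sketch below. It is written in Python and uses fcntl.flock, so it only applies to POSIX systems; the function name is hypothetical, and flock is advisory, so it only helps if every process follows the same protocol.

```python
import fcntl  # POSIX only; not available on Windows
import os

def write_substitute_locked(path: str, data: bytes) -> str:
    """Hold an exclusive flock on the data file while writing its substitute.

    Returns the name of the substitute file, which embeds this process's ID.
    """
    tmp = f"{path}.{os.getpid()}"
    with open(path, "ab") as guard:          # open without truncating
        fcntl.flock(guard, fcntl.LOCK_EX)    # blocks until we own the lock
        try:
            with open(tmp, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
        finally:
            fcntl.flock(guard, fcntl.LOCK_UN)
    return tmp
```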
I have a Windows process that runs in the background and periodically backs up files. The backup is done by uploading the file to a server.
During the backup, I don't want to lock any other application out of writing to or reading from the file; if another application wants to change the file, I should stop the upload and close the file.
Share mode is useless here; even though I'm sharing all access to the file being read, if the other process attempts to open it for writing without sharing read, it will be locked out of the file.
Is it possible to accomplish this on Windows, without writing a driver?
You may be interested in Volume Shadow Copy.
You certainly could copy the file and then check that the original and copy are identical (thus representing a consistent snapshot) prior to uploading to the server.
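A minimal sketch of the copy-then-compare idea, in Python. The function name and the retry count are assumptions; the comparison only shows that the copy matched the original at the moment it was checked, not that the original was in a consistent state while it was being copied.

```python
import filecmp
import shutil

def snapshot_copy(src: str, dst: str, attempts: int = 3) -> bool:
    """Copy src to dst and accept the copy only if it matches byte for byte."""
    for _ in range(attempts):
        shutil.copyfile(src, dst)
        filecmp.clear_cache()                    # force a fresh comparison
        if filecmp.cmp(src, dst, shallow=False):
            return True                          # dst can be uploaded
    return False                                 # src kept changing; give up
```

The upload would then read from the copy, so the original never needs to stay open for the duration of the transfer.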
According to this MSDN page, if using NTFS, you should be able to lock the file inside a transaction of yours, while uploading the file to a server. This will ensure your view of the file does not change, even if the file has been changed externally.
Windows Win32 C++ question about flushing file activity to disk.
I have an external application (run using CreateProcess) which does some file creation, i.e., when it returns it will have created a file with some content.
How can I ensure that the file the process created was really flushed to disk, before I proceed?
By this I mean not the C++ buffers but really flushing disk (e.g. FlushFileBuffers).
Remember that I don't have access to any file HANDLE - this is all of course hidden inside the external process.
I guess I could open up a handle of my own to the file and then use FlushFileBuffers, but it's not clear this would work (since my handle doesn't actually contain anything which needs flushing).
Finally, I want this to run in non-admin userspace so I cannot use FlushFileBuffers on a whole volume.
Any ideas?
UPDATE: Why do I think this is a problem?
I'm working on a data backup application. Essentially it has to create some files as described. It then has to update its internal DB (using SQLite embedded DB).
I recently had a data corruption issue which occurred during a bluescreen (the cause of which was unrelated to my app).
What I'm concerned about is application integrity during a system crash. And yes, I do care about this because this app is a data backup app.
The use case I'm concerned about is this:
A small data file is created using external process. This write is waiting in the OS cache to be written to disk.
I update the DB and commit. This is a disk activity. This write is also waiting in the OS cache.
A system failure occurs.
As I see it, we're now in a potential race condition. If "1" gets flushed and "2" doesn't then we're fine (as the DB transact wasn't then committed). If neither gets flushed or both get flushed then we're also OK.
As I understand it, the writes will be non-deterministic. i.e., I'm not aware that the OS will guarantee to write "1" before "2". (Am I wrong?)
So, if "2" gets flushed, but "1" doesn't then we have a problem.
What I observed was that the DB was correctly updated, but that the file had garbage in: the last 2 thirds of the data was binary "zeroes". Now, I don't know what it looks like when you have a file part flushed at the time of bluescreen, but I wouldn't be surprised if it looked like that.
Can I guarantee this is the cause? No I cannot guarantee this. I'm just speculating. It could just be that the file was "naturally" corrupted due to disk failure or as a result of the blue screen.
With regards to performance, this is something I believe I can deal with.
For example, the default behaviour of SQLite is to do a full file flush (using FlushFileBuffers) every time you commit a transaction. They are quite clear that if you don't do this then at the time of system crash, you might have a corrupted DB.
Also, I believe I can mitigate the performance hit by only flushing at "checkpoints". For example, writing 50 files, flushing the lot and then writing to the DB.
How likely is all this to be a problem? Beats me. But then my app might well be archiving at or around the time of system failure, so it might be more likely than you think.
Hope that explains why I want to do this.
Why would you want this? The OS will make sure that the data is flushed to the disk in due time. If you access it, it will either return the data from the cache or from disk, so this is transparent for you.
If you need some safety in case of disaster, then you must call FlushFileBuffers, for example by creating a process with admin rights after running the external process. But that can severely impact the performance of the whole machine.
Your only other option is to modify the source of the other process.
[EDIT] The most simple solution is probably to copy the file in your process and then flush the copy (since you have the handle). Save the copy under a name which says "not committed in the database".
Then update the database. Write into the database, "updated from file ...". If this entry already exists next time, don't update the database and skip this step.
Flush the database to disk.
Rename the file to "file has been processed into database". Rename is an atomic operation (so it either happens or not).
If you can't think of a good filename for the different states, then use subfolders and move the file between them.
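The steps from the edit could be sketched like this, in Python with the standard sqlite3 module. The table name "imported", its column, and the ".uncommitted" / ".processed" suffixes are all hypothetical, and os.fsync stands in for FlushFileBuffers:

```python
import os
import shutil
import sqlite3

def import_file(src: str, db_path: str) -> str:
    """Copy src, flush the copy, record it in the DB, then rename the copy."""
    pending = src + ".uncommitted"
    done = src + ".processed"

    # Copy in our own process so that we hold a handle we can flush.
    shutil.copyfile(src, pending)
    with open(pending, "rb+") as f:
        os.fsync(f.fileno())

    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS imported (source TEXT PRIMARY KEY)")
        # INSERT OR IGNORE: if the entry already exists, nothing is updated.
        con.execute(
            "INSERT OR IGNORE INTO imported (source) VALUES (?)", (src,))
        con.commit()    # with default settings SQLite syncs on commit
    finally:
        con.close()

    os.replace(pending, done)   # the rename marks the file as processed
    return done
```

After a crash, a leftover ".uncommitted" file tells you the database step may not have completed for that file, so it can simply be run again.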
Well, there are no attractive options here. There is no documented way to retrieve the file handle you need from the process. Although there are undocumented ones, go there (via DuplicateHandle) only with careful consideration.
Yes, calling FlushFileBuffers on a volume handle is the documented way. You can avoid the privilege problem by letting a service make the call. Talk to it from your app with one of the standard process interop mechanisms. A named pipe whose name is prefixed with Global\ is probably the easiest way to get that going.
After your update I think http://sqlite.org/atomiccommit.html gives you the answers you need.
The way SQLite ensures that everything is flushed to disc works. So it works for you as well - take a look at the source.