Scan for changed files - c++

I'm looking for a good efficient method for scanning a directory structure for changed files in Windows XP+. Something like how git does it is exactly what I'm looking for, when running a git status it displays all modified files, all new (untracked) files and deleted files very quickly which is exactly what I would like to do.
I have a basic model up and running which performs an initial scan and stores all filenames, size, dates and attributes.
On a subsequent scan it checks if the size, attributes or date have changed and marks as a changed file.
My issue now comes in detecting moved and deleted files. Is there a tried and tested method for this sort of thing? I'm struggling to come up with a good method.
I should mention that it will eventually use ReadDirectoryChangesW to monitor files and alert the user when something changes so a full scan is really a last resort after the initial scan.
Thanks,
J
EDIT: I think I may have described the problem badly. The issue I'm facing is not so much detecting the changes - I have ReadDirectoryChangesW() using IOCP on multiple threads to detected when a change happens, the issue is more what to do with the information. For example, a moved file is reported as a delete followed by a create and a rename comes in 2 parts, old name, followed by new name. So what I'm asking is how to differentiate between the delete as part of a move and an actual delete. I'm guessing buffering the changes and processing batches would be an option but feels messy.

In native code FileSystemWatcher is replaced by ReadDirectoryChangesW. Using this properly is not simple, there is a good baseline to build off here.
I have used this code in a previous job and it worked pretty well. The Win32 API itself (and FileSystemWatcher) are prone to problems that are described in the docs and also discussed in various places online, but impact of those will depending on your use cases.
EDIT: the exact change is indicated in the FILE_NOTIFY_INFORMATION structure that you get back - adds, removals, rename data including old and new name.

I voted Liviu M. up. However, another option if you don't want to use the .NET framework for some reason, would be to use the basic Win32 API call FindFirstChangeNotification.

You can use USN journaling if you are up to it, that is pretty low level (NTFS level) stuff.
Here you can find detailed information and source code included. It is written in C# but most of it is PInvoking C/C++ functions.

Related

Can you embed files into exe, and update them with each use?

I'm trying to make an application which to manage information about several providers.
Target system is windows and I'll be coding with c++.
The users are not expected to be handy on anything related to computers, so I want to make it as fool-proof as possible. Right now my objective is to distribute only an executable, which should store all the information they introduce in there.
Each user stores information of their own providers, so I don't need the aplication to share the data with other instances. They do upload the information into a preexisting system via csv, but I can handle that easily.
I expect them to introduce new information at least once a month, so I need to update the information embedded. Is that even possible? Making a portable exe and update its information? So far the only portable apps I've seen which allow saving some personification do so by making you drag files along with your exe.
I try to avoid SQL to avoid compatibility problems (for my own applications I use external TXTs and parse the data), but if you people tell me it's the only way, I'll use sql.
I've seen several other questions about embedding files, but it seems all of them are constants. My files need to be updatable
Thanks in advance!
Edit: Thanks everyone for your comments. I've understood that what I want is not worth the problems it'd create. I'll store the data separatedly and make an effort so my coworkers understand what's the difference between an executable and it's data (just like explaining the internet to your grandma's grandma...)
While I wouldn't go as far as to say that it's impossible, it will definitely be neither simple nor pretty nor something anyone should ever recommend doing.
The basic problem is: While your .exe is running, the .exe file is mapped into memory and cannot be modified. Now, one thing you could do is have your .exe, when it's started, create a temporary copy of itself somewhere, start that one, tell the new process where the original image is located (e.g., via commandline arguments), and then have the original exit. That temporary copy could then modify the original image. To put data into your .exe, you can either use Resources, or manually modify the PE image, e.g., using a special section created inside the image to hold your data. You can also simply append arbitrary data at the end of an .exe file without corrupting it.
However, I would like to stress again that I do not recommend actually doing stuff like that. I would simply store data in separate files. If your users are familiar with Excel, then they should be familiar with the idea that data is stored in files…

What's wrong with this user ignore file for Mercurial?

A little retrospective now that I've settled into Mercurial. Forget forget files combined with hg remove. It's crazy and bass-ackwards. You can use hg remove once you've established that something is in a forget file that isn't forgetting because the item in question was tracked before the original repo was created. Note that hg remove effectively clears tracked status but it also schedules the file for deletion in anything that gets changes from your repo. If ignored, however the tracking deactivation still happens but that delete-me change set won't ever reach another repo and for some reason will never delete in yours which IMO is counter-intuitive. It is a very sure sign that somebody and I don't know these guys, is UNWILLING TO COMPROMISE ON DUH DESIGN PROBLEMS. The important thing to understand is that you don't determine what's important, Mercurial does. Except when you're merging on a pull of course. It's entirely reasonable then. But I digress...
Ignore-file/remove is a good combo for already-tracked but very specific files you want forgotten but if you're dealing with a larger quantity of built files determined with broader patterns it's not worth the risk. Just go with a double-repo and pull -u from the remote repo to your syncing repo and then pull -u commits from your working repo and merge in a repo whose sole purpose is to merge changes and pass them on in a place where your not-quite tracked or untracked files (behavior is different when pulling rather than pushing of course because hey, why be consistent?) won't cause frustration. Trust me. The idea that you should have to have two repos just to get 'er done offends for good reason AND THAT SO MANY OF US ARE DOING IT should suggest a serioush !##$ing design problem, but it's much less painful than all the other awful things that will make you regret seeking a sensible alternative.
And use hg help. It's actually Mercurial's best feature and often better than the internet (which I don't fault for confusion on the matter of all things hg) for getting answers to everything that is confusing and counter-intuitive in this VCS.
/retrospective
# switch to regexp syntax.
syntax: regexp
#Config Files
#.Net
^somecompany\.Net[\\/]MasterSolution[\\/]SomeSolution[\\/]SomeApp[\\/]app\.config
^somecompany\.Net[\\/]MasterSolution[\\/]SomeSolution[\\/]SomeApp_test[\\/]App\.config
#and more of the same following
And in my mercurial.ini at the root of my user directory
[ui]
username = ereppen
merge = bcomp
ignore = C:\<path to user ignore file>\.hgignore-config
Context:
I wrote an auto-config utility in node. I just want changes to the files it changes to get ignored. We have two teams and both aren't on the same page with making this a universal thing so it needs to be user-specific for now.
The config file is in place and pointed at by my ini file. I clone. I run the config utility and change the files and stat reveals a list of every single file with an M next to it. I thought it was the utf-8 thing and explicitly set the file to utf-16 little endian. I don't think I'm doing with the regEx that any modern flavor of regEx worth actually calling regEx wouldn't support.
The .hgignore file has no effect for files that are tracked. Its function is to stop you from seeing files you want ignored listed as "untracked". If you're seeing "M" then they're already added (you got them with the clone) so .hgignore does nothing.
The usual way config files that differ from machine to machine are handled is to put a app.config.sample in source control, have app.config in .hgignore and have people do a copy when they're making their config edits.
Alternately if your config files allow for includes and overrides you end them with include app-local.config and override any settings in a app-local.config which you don't add and do include in .hgignore.

Wix: How to add files to the RemoveFiles table from c++

I've been following the advice in this question.
How to add a WiX custom action that happens only on uninstall (via MSI)?
I have an executable running as a custom action after InstallFinalize which I intend to purge all my files and folders. I was just going to write some standard deletion logic but I'm stuck on the point that Rob Mensching made that the windows installer should handle this incase someone bails midway through an uninstallation.
"create a CustomAction that adds temporary rows to the RemoveFiles table"
I'm looking for some more information on this. I'm not really sure how to achieve this in c++ and my searching hasn't turned up a whole lot.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa371201(v=vs.85).aspx
Thanks
Neil
EDIT: I've marked the answer due to the question being specific about how to add files to the removeFiles table in c++ however I'm inclined to agree that the better solution is to use the RemoveFolderEx functionality in wix even though it is currently in beta (3.6 I think)
Roughly you will have to use the following functions in this order:
MsiDatabaseOpenView - the (input) handle is the one you get inside your custom action functions
MsiCreateRecord - to create a record with the SQL stuff inside
MsiRecord* - set of functions to prepare the record
MsiViewExecute to insert the new record into whatever table you please ...
MsiCloseHandle - with the handle from the very first step and the record handle (from MsiCreateRecord)
Everything is explained in detail over at MSDN. However, pay special attention to the section "Functions Not for Use in Custom Actions".
The documentation of MsiViewExecute also explains how the SQL queries should look. To get a feel for them you may want to use one of the .vbs scripts that are part of the Windows Installer SDK.
If you use WiX to create your installation package, consider using RemoveFolderEx element. It does what you want and you don't have to write the code yourself.
Read Tactical directory nukes for an example of how to use it.
If you still want to implement it yourself, you can get your inspiration from this blog post, there's the code for doing this in VBScript.

sketch flow -- how do we rename .xaml and .cs files without breaking sketchflow map?

how do we rename .xaml and .cs files?
would like to be able to keep development in synch with the original sketchflow. i.e. sketchflow has features such as the ability to collect client feedback on a per screen basis, etc.
... I kind of answered my own question here, so I'll post it as a follow up. Asked the original question 9 hours ago on the MS site without response... still trying to work out where the best place is to talk to the community, so sorry for the duplicate.
THE ANSWER (IS THERE A BETTER ONE?)
Context: Sketchflow is a prototyping tool. In large teams possibly you want to keep the prototype seperate from the finished version, or there's a large prototyping phase.
My view is that I really like Sketchflow. It's one of the coolest things I've seen for a while (well done Microsoft).
... so for me, I want the prototype to become a the finished product. I want the designers to step in and make transitions whenever they want. I want the designers to kick the process off, and the developers to put in the detail. I'd like our customers to be able to post feedback at any time during the build process. btw: get your developers to check out MVVM. It's very cool.
My bet is that the feedback could get lost if you make a breaking change (a file rename) -- so just beware of that. That wont be a problem for us. We'll get our file names to make sense and then mostly leave it alone. Of course MS could fix this this by creating a globally unique id (Guid) for each screen that is created. Perhaps they've done this already. If someone from MS reads this, please put this on your requested features list.
THE ANSWER:
So here is the answer that works for me:
don't try to hand-edit the xaml / cs, as all the cross referencing that you might be doing with behaviors will break if you aren't really careful. Typical files that need to be modified: .csproj, Sketch.Flow, xxxx.xaml, and xxxx.cs.
To auto do it, download a tool like Ultraedit. Alternatively, you might be able to just use VS 2010 (untested).
Steps with ultraedit:
(BACKUP YOUR PROJECT FIRST)
Search/Replace In Files...
Find in files... "Screen_1_19"
Replace with... "Welcome"
In Files/Types... "."
Directory...
Match Whole Word Only
Hit "Start"
follow the prompts
rename the files (.xaml & .cs) to be Welcome.???? (where ???? is .xaml or .cs) . Since I use SVN, this step gets done for me in one step (no big deal).
If using VS2010 for steps 1 through 8, be careful do longer string replacements first e.g. Screen_1_19 before Screen_1. I think VS treats _ as a word break. On ultraedit you'll be fine.
If there's interest, in the spare time that I don't currently have, I could release a quick tool to do this on codeplex.
** note: because we are working with XML and XML is very particular about being correct, I close expression blend down, and then reopen it again after the replace/rename to see if I was successful + my screen map still has all the flow lines still drawn in.
answer is above in the body of the question.

Easiest way to sign/certify text file in C++?

I want to verify if the text log files created by my program being run at my customer's site have been tampered with. How do you suggest I go about doing this? I searched a bunch here and google but couldn't find my answer. Thanks!
Edit: After reading all the suggestions so far here are my thoughts. I want to keep it simple, and since the customer isn't that computer savy, I think it is safe to embed the salt in the binary. I'll continue to search for a simple solution using the keywords "salt checksum hash" etc and post back here once I find one.
Obligatory preamble: How much is at stake here? You must assume that tampering will be possible, but that you can make it very difficult if you spend enough time and money. So: how much is it worth to you?
That said:
Since it's your code writing the file, you can write it out encrypted. If you need it to be human readable, you can keep a second encrypted copy, or a second file containing only a hash, or write a hash value for every entry. (The hash must contain a "secret" key, of course.) If this is too risky, consider transmitting hashes or checksums or the log itself to other servers. And so forth.
This is a quite difficult thing to do, unless you can somehow protect the keypair used to sign the data. Signing the data requires a private key, and if that key is on a machine, a person can simply alter the data or create new data, and use that private key to sign the data. You can keep the private key on a "secure" machine, but then how do you guarantee that the data hadn't been tampered with before it left the original machine?
Of course, if you are protecting only data in motion, things get a lot easier.
Signing data is easy, if you can protect the private key.
Once you've worked out the higher-level theory that ensures security, take a look at GPGME to do the signing.
You may put a checksum as a prefix to each of your file lines, using an algorithm like adler-32 or something.
If you do not want to put binary code in your log files, use an encode64 method to convert the checksum to non binary data. So, you may discard only the lines that have been tampered.
It really depends on what you are trying to achieve, what is at stakes and what are the constraints.
Fundamentally: what you are asking for is just plain impossible (in isolation).
Now, it's a matter of complicating the life of the persons trying to modify the file so that it'll cost them more to modify it than what they could earn by doing the modification. Of course it means that hackers motivated by the sole goal of cracking in your measures of protection will not be deterred that much...
Assuming it should work on a standalone computer (no network), it is, as I said, impossible. Whatever the process you use, whatever the key / algorithm, this is ultimately embedded in the binary, which is exposed to the scrutiny of the would-be hacker. It's possible to deassemble it, it's possible to examine it with hex-readers, it's possible to probe it with different inputs, plug in a debugger etc... Your only option is thus to make debugging / examination a pain by breaking down the logic, using debug detection to change the paths, and if you are very good using self-modifying code. It does not mean it'll become impossible to tamper with the process, it barely means it should become difficult enough that any attacker will abandon.
If you have a network at your disposal, you can store a hash on a distant (under your control) drive, and then compare the hash. 2 difficulties here:
Storing (how to ensure it is your binary ?)
Retrieving (how to ensure you are talking to the right server ?)
And of course, in both cases, beware of the man in the middle syndroms...
One last bit of advice: if you need security, you'll need to consult a real expert, don't rely on some strange guys (like myself) talking on a forum. We're amateurs.
It's your file and your program which is allowed to modify it. When this being the case, there is one simple solution. (If you can afford to put your log file into a seperate folder)
Note:
You can have all your log files placed into a seperate folder. For eg, in my appplication, we have lot of DLLs, each having it's own log files and ofcourse application has its own.
So have a seperate process running in the background and monitors the folder for any changes notifications like
change in file size
attempt to rename the file or folder
delete the file
etc...
Based on this notification, you can certify whether the file is changed or not!
(As you and others may be guessing, even your process & dlls will change these files that can also lead to a notification. You need to synchronize this action smartly. That's it)
Window API to monitor folder in given below:
HANDLE FindFirstChangeNotification(
LPCTSTR lpPathName,
BOOL bWatchSubtree,
DWORD dwNotifyFilter
);
lpPathName:
Path to the log directory.
bWatchSubtree:
Watch subfolder or not (0 or 1)
dwNotifyFilter:
Filter conditions that satisfy a change notification wait. This parameter can be one or more of the following values.
FILE_NOTIFY_CHANGE_FILE_NAME
FILE_NOTIFY_CHANGE_DIR_NAME
FILE_NOTIFY_CHANGE_SIZE
FILE_NOTIFY_CHANGE_SECURITY
etc...
(Check MSDN)
How to make it work?
Suspect A: Our process
Suspect X: Other process or user
Inspector: The process that we created to monitor the folder.
Inpector sees a change in the folder. Queries with Suspect A whether he did any change to it.
if so,
change is taken as VALID.
if not
clear indication that change is done by *Suspect X*. So NOT VALID!
File is certified to be TAMPERED.
Other than that, below are some of the techniques that may (or may not :)) help you!
Store the time stamp whenever an application close the file along with file-size.
The next time you open the file, check for the last modified time of the time and its size. If both are same, then it means file remains not tampered.
Change the file privilege to read-only after you write logs into it. In some program or someone want to tamper it, they attempt to change the read-only property. This action changes the date/time modified for a file.
Write to your log file only encrypted data. If someone tampers it, when we decrypt the data, we may find some text not decrypted properly.
Using compress and un-compress mechanism (compress may help you to protect the file using a password)
Each way may have its own pros and cons. Strength the logic based on your need. You can even try the combination of the techniques proposed.