awesomeApp is a cross platform (Mac, Win, Linux) Qt application in C++ that will store information about all files recursively under a given folder ~/content. Here are my options:
Store all information replicated hierarchically or in a db in ~/.awesomeApp under home directory.
Decentralizing info under ~/content. For example ~/content/foo/.awesomeApp stores information about all files ~/content/foo/*
The first approach will be good for a multi-user environment where each user wants to have his own "view". It will also work when the directory is read-only.
However, the second approach will be good when a user moves or renames a sub-directory.
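To make the two layouts concrete, here is a minimal sketch of how a content file could map to its metadata file under each option. It uses only std::filesystem (C++17) rather than Qt, and the function names and the ".meta" suffix are hypothetical choices for illustration, not part of any library.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Option 1 (central store): mirror the content tree under the store root, e.g.
// ~/content/foo/a.txt -> ~/.awesomeApp/foo/a.txt.meta
// The ".meta" suffix is a hypothetical naming choice.
fs::path centralMetadataPath(const fs::path& storeRoot,
                             const fs::path& contentRoot,
                             const fs::path& file) {
    fs::path relative = file.lexically_relative(contentRoot);
    // Debug-build check: an empty result or a leading ".." means the file
    // does not live under contentRoot.
    assert(!relative.empty() && *relative.begin() != fs::path(".."));
    fs::path result = storeRoot / relative;
    result += ".meta";
    return result;
}

// Option 2 (decentralized): keep metadata next to the file, e.g.
// ~/content/foo/a.txt -> ~/content/foo/.awesomeApp/a.txt.meta
fs::path localMetadataPath(const fs::path& file) {
    fs::path result = file.parent_path() / ".awesomeApp" / file.filename();
    result += ".meta";
    return result;
}
```

With the second mapping, moving or renaming ~/content/foo carries the metadata along; with the first, the central store has to be updated to match.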
Is there any other point worth considering or known best practices?
Is there a library that achieves either of these two approaches?
Related
This question says the best place to store settings in linux is in ~/.config/appname
The program I'm writing needs to use a 99MB .dat file for recognizing facial landmarks. Embedding it in the binary doesn't seem like a good idea.
Is there some default place to store resources on Linux? Currently the file just sits in the directory next to the executable, but this requires that the program is run with its own directory as the current directory.
What's the best way to deal with resources like this on linux? (that could potentially be cross platform with at least OSX)
You should take a look at the Filesystem Hierarchy Standard (FHS). Depending on the data (will it change, is it constant across all installations, etc.), the standard prescribes a different path for it.
In general:
/usr/lib/program: includes object files, libraries, and internal binaries for an application
/usr/share/program: for all read-only architecture independent data files
/var/lib/program: holds state information pertaining to an application or the system
Those seem like pretty good places to start, and you can check the documentation to see if your app falls into one of those categories.
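As a rough sketch of how a program might locate a read-only resource in such directories instead of relying on the current directory: the code below builds a list of candidate directories and returns the first one that contains the file. The environment variable MYAPP_DATA_DIR and both function names are assumptions made up for this example; /usr/local/share and /usr/share are the FHS locations for locally installed and distribution-installed data.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Candidate directories, most specific first.
// MYAPP_DATA_DIR is a hypothetical override variable, not a standard one.
std::vector<fs::path> candidateDataDirs(const std::string& appName) {
    std::vector<fs::path> dirs;
    if (const char* overrideDir = std::getenv("MYAPP_DATA_DIR");
        overrideDir && *overrideDir) {
        dirs.emplace_back(overrideDir);
    }
    dirs.push_back(fs::path("/usr/local/share") / appName);
    dirs.push_back(fs::path("/usr/share") / appName);
    return dirs;
}

// Returns the first candidate that holds a regular file with the given name.
std::optional<fs::path> findResource(const std::vector<fs::path>& dirs,
                                     const std::string& fileName) {
    assert(!fileName.empty());
    for (const fs::path& dir : dirs) {
        fs::path candidate = dir / fileName;
        std::error_code ec;  // non-throwing overload: errors count as "not found"
        if (fs::is_regular_file(candidate, ec)) {
            return candidate;
        }
    }
    return std::nullopt;
}
```

A caller would use something like findResource(candidateDataDirs("program"), "landmarks.dat"), where the file name is again only a placeholder, and report a clear error if nothing is found.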
If the file is specific to the user running the app, it should be in a subdir of ~/ but AFAIK there's no standard, and the best choice depends much on the file type/usage. If it should be visible to the user via GUI, you could use ~/Desktop or ~/Downloads. If it's temporary, you can use ~/tmp or ~/var/tmp.
If it's not specific, you should place it in a subdir of /var. Again, the exact subdir may depend on its kind and other factors.
I have following scenario:
The main software I wrote uses a database created by a simulator. This database is around 10 GB big at the moment, so I want to keep only one copy of that data per system.
Assuming I have following projects:
Main Software using the data, located at /SimData
DLL using the data for debugging, searching for data at /SimData
Debugging tool to parse the image database, searching for the data at /SimData
I do not want each of these programs to have its own copy of SimData, both to save disk space and to ensure that all programs always use up-to-date simulation data.
I created for the DLL and Debugging Utility a link named SimData to MainSoftware/SimData, but when opening a file with "SimData\MyFile.data" it cannot find it, only the MainSoftware with the ACTUAL SimData folder can find it.
How can I use the MainSoftware/SimData folder without setting absolute paths?
This is on Windows 7 x64
I agree with Peter about adding the DB location as a configurable parameter. A common place to store that is in the registry.
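However, if you want to create links that will be recognized by your software, try hardlinks. fsutil should do the trick as described here. Note that on Windows hard links apply to files only; for a folder such as SimData, the equivalent is a directory junction or symbolic link (mklink /J or mklink /D).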
You need a way to configure the database location. You could use an INI or other configuration file, or a registry setting, or a command-line input, or an environment variable. Or you could write your program to search a directory hierarchy. For example, if the various modules are usually siblings of each other in your directory tree, you could search for SimData/MyFile.data, ../SimData/MyFile.data, ../../MainSoftware/SimData/MyFile.data, and use the first one found.
Which answer is the "right one" depends on your situation.
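The directory-search idea could look roughly like the sketch below. It is a simplified variant, assuming it is enough to look for a folder with a given name in the start directory and then in each of its ancestors; the function name is hypothetical, and a real tool would probably check a configured location first.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Looks for a directory called dirName in `start`, then in each parent of
// `start`, and returns the first match.
std::optional<fs::path> findDataDirUpward(const fs::path& start,
                                          const std::string& dirName) {
    assert(!dirName.empty());
    std::error_code ec;
    fs::path dir = fs::absolute(start, ec);
    if (ec) {
        return std::nullopt;
    }
    while (true) {
        fs::path candidate = dir / dirName;
        if (fs::is_directory(candidate, ec)) {
            return candidate;
        }
        fs::path parent = dir.parent_path();
        if (parent.empty() || parent == dir) {
            break;  // reached the filesystem root without a match
        }
        dir = parent;
    }
    return std::nullopt;
}
```

Each program could call this with the directory of its own executable and "SimData", then open MyFile.data relative to the returned path.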
In my program I have been using SHGetFolderPath to get the AppData path. However, I need to get the AppData path of the other users on the computer. The only way I can think of is to get the path for the current user and then replace the current user's name with the other users' names. I don't know how to get a list of the users. There's probably a much more elegant solution as well. If you have any insight, I would greatly appreciate it.
For your situation I would recommend the following:
Continue to store configuration files in AppData, but store it in CSIDL_COMMON_APPDATA (SHGetFolderPath). This AppData is shared with all users. Your setup program (or an administrator user) can set up a folder in this location named after your program that gives "Everyone" full access (this is very easy with Windows Installer). That way, any user can read/write to it. Everything in "Program Files" should never change. It should only contain read-only executables, DLLs, and other such resources. Microsoft has long discouraged writing to this location and many administrators no longer expect to encounter custom user data that requires regular backup & restore in Program Files.
When your software runs, you can check for data in the current user's AppData (i.e. stored by your old version) and merge it with the data in the machine's AppData (the shared location described above). To migrate the data for the user, log in as that user and run your software.
There really isn't a great way that I'm aware of for gathering all that data from other user profiles. Nothing supported by Microsoft, that is (that I'm aware of!).
Regarding storage of data in Program Files: http://msdn.microsoft.com/en-us/library/bb776776(VS.85).aspx "Do not store user data under the Program Files folder." There are many other references that say similar.
How can I find the user's home directory in a cross-platform manner in C++? i.e. /home/user on Linux, C:\Users\user\ on Windows Vista, C:\Documents And Settings\user\ on Windows XP, and whatever it is that Macs use. (I think it's /Users/user)
Basically, what I'm looking for is a C++ way of doing this (example in python)
os.path.expanduser("~")
I don't think it's possible to completely hide the Windows/Unix divide with this one (unless, maybe, Boost has something).
The most portable way would have to be getenv("HOME") on Unix and concatenating the results of getenv("HOMEDRIVE") and getenv("HOMEPATH") on Windows.
This is possible, and the best way to find out how is to study the source code of os.path.expanduser("~"). It is really easy to replicate the same functionality in C.
You'll have to add some #ifdef directives to cover different systems.
Here are the rules that will provide you the HOME directory
Windows: env USERPROFILE or if this fails, concatenate HOMEDRIVE+HOMEPATH
Linux, Unix and OS X: env HOME or if this fails, use getpwuid() (example code)
Important remark: many people assume that the HOME environment variable is always available on Unix, but this is not true. One good example is OS X.
On OS X, an application launched from the GUI (not from a console) will not have this variable set, so you need to use getpwuid().
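The rules above can be sketched as follows. This is one possible rendering rather than a canonical implementation: the function name homeDirectory is made up for the example, and it returns an empty optional when every lookup fails.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

#ifndef _WIN32
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>
#endif

// Returns the current user's home directory, or std::nullopt if it cannot
// be determined.
std::optional<std::string> homeDirectory() {
#ifdef _WIN32
    // Windows: USERPROFILE first, then HOMEDRIVE + HOMEPATH.
    if (const char* profile = std::getenv("USERPROFILE"); profile && *profile) {
        return std::string(profile);
    }
    const char* drive = std::getenv("HOMEDRIVE");
    const char* path = std::getenv("HOMEPATH");
    if (drive && path) {
        return std::string(drive) + path;
    }
    return std::nullopt;
#else
    // Linux, Unix, OS X: HOME first, then the password database.
    if (const char* home = std::getenv("HOME"); home && *home) {
        return std::string(home);
    }
    if (const passwd* pw = getpwuid(getuid()); pw && pw->pw_dir) {
        return std::string(pw->pw_dir);
    }
    return std::nullopt;
#endif
}
```

Note that getpwuid() returns a pointer to static storage, so the result is copied into a std::string before returning.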
The home directory isn't really a cross-platform concept. Your suggestion of the root of the profile directory (%USERPROFILE%) is a fair analogy, but depending what you want to do once you have the directory, you might want one of the Application Data directories, or the user's My Documents. On UNIX, you might create a hidden ".myapp" in the home directory to keep your files in, but that's not right on Windows.
Your best bet is to write specific code for each platform, to get at the directory you want in each case. Depending how correct you want to be, it might be enough to use env vars: HOME on UNIX, USERPROFILE or APPDATA (depending what you need) on Windows.
On UNIX at least (any Windows folks care to comment?), it's usually good practice to use the HOME environment variable if it's set, even if it disagrees with the directory specified in the password file. Then, on the odd occasion when users want all apps to read their data from a different directory, it will still work.
I'm trying to speed up directory enumeration in C++, where I'm recursing into subdirectories. I currently have an app which spends 95% of its time in the FindFirstFile/FindNextFile APIs, and it takes several minutes to enumerate all the files on a given volume. I know it's possible to do this faster because there is an app that does: Everything. It enumerates my entire drive in seconds.
How might I accomplish something like this?
I realize this is an old post, but there is a project on SourceForge that does exactly what you are asking, and the source code is available.
You can find the project here: NTFS-Search
"Everything" builds an index in the background, so queries are against the index not the file system itself.
There are a few improvements to be made, at least over the straightforward algorithm:
First, breadth-first search over depth-first search. That is, enumerate and process all files in a single folder before recursing into the subfolders you found. This improves locality, usually a lot.
On Windows 7 / W2K8R2, you can use FindFirstFileEx with FindExInfoBasic, the main speedup being omitting the short file name on NTFS file systems where this is enabled.
Separate threads help if you enumerate different physical disks (not just drives). For the same disk it only helps if it's an SSD ("zero seek time"), or you spend significant time processing a file name (compared to the time spent on disk access).
[edit] Wikipedia actually has some comments -
Basically, they are skipping the file system abstraction layer and accessing NTFS directly. This way, they can batch calls and skip expensive services of the file system, such as checking ACLs.
A good starting point would be the NTFS Technical Reference on MSDN.
"Everything" accesses directory information at a lower level than the Win32 FindFirst/FindNext APIs.
I believe it reads and interprets the NTFS MFT structures directly, and that this is one of the main reasons for its performance. It's also why it requires admin privileges and why "Everything" only indexes local or removable NTFS volumes (not network drives, for example).
A couple of other utilities that do similar things are:
FindOnClick by 2Brightsparks
Search GT
A little reverse engineering with a debugger on these tools might give you some insight on the techniques they use.
Don't recurse immediately, save a list of directories you find and dive into them when finished. You want to do linear access to each directory, to take advantage of locality of reference and any caching the OS is doing.
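A minimal sketch of this "queue the directories, finish the current one first" traversal is shown below. It uses the portable std::filesystem API (C++17) in place of FindFirstFile/FindNextFile, so it only illustrates the access pattern; the function name is hypothetical and the actual speed-up depends on the platform and volume.

```cpp
#include <deque>
#include <filesystem>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Lists every entry below `root`, finishing each directory completely
// before descending into the subdirectories found in it.
std::vector<fs::path> enumerateBreadthFirst(const fs::path& root) {
    std::vector<fs::path> entries;
    std::deque<fs::path> pending{root};
    const fs::directory_iterator end;

    while (!pending.empty()) {
        fs::path dir = std::move(pending.front());
        pending.pop_front();

        std::error_code ec;
        fs::directory_iterator it(
            dir, fs::directory_options::skip_permission_denied, ec);
        if (ec) {
            continue;  // unreadable directory: skip it
        }
        while (it != end) {
            entries.push_back(it->path());
            std::error_code typeEc;
            // Queue real subdirectories only; symlinks are not followed,
            // which avoids cycles.
            if (it->is_directory(typeEc) && !it->is_symlink(typeEc)) {
                pending.push_back(it->path());
            }
            it.increment(ec);
            if (ec) {
                break;
            }
        }
    }
    return entries;
}
```

Because subdirectories are appended to the back of the queue, all entries of one directory are read in a single pass before any of its children is opened.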
If you're already doing the best you can to get the maximum speed from the API, the next step is to do low-level disk accesses and bypass Windows altogether. You might get some guidance from the NTFS drivers for Linux, or perhaps you can use one directly.
If you are doing this on NTFS, here's a lib for low level access: NTFSLib.
You can enumerate through all file records in $MFT, each representing a real file on disk. You can get all file attributes from the record, including $DATA.
This may be the fastest way to enumerate all files/directories on NTFS volumes, 200k~300k files per minute as I tested.