Long- and Multi-format Path Manipulation Library? - c++

Is there any open-source path manipulation library which supports all of the following?
Unrestricted path lengths (i.e. the only restriction should be from the range of size_t, not arbitrary limitations like 256 characters)
Basic manipulations like canonicalization, the equivalent of basename, dirname, getting the file extension, getting the root, etc.
All valid Windows-style paths and file names, such as \Rooted, Dir/, C:\Dir/foo, File, \\Computer\Dir/File, \\.\C:, Foo\./.\Bar:ADS, or \\?\C:\Dir\Escaped:ADS:$DATA
I believe this should also cover POSIX-style paths, but if not, those should work too
I'd prefer C++, but C is also fine.

cwalk can do that. It's a small C path library.

Sounds like QDir and QFileInfo from Qt 4.

Related

C++ multiple files with common name beginning

Is there any way to open files which share a common name prefix without specifying their full names or how many there are? They should be opened one by one, not all at once.
For example, I have files:
certainstr_123.txt, certainstr_5329764.txt, certainstr_1323852.txt.
Or maybe it can be done more easily in another language?
Thank you.
C++ doesn't define a standard way for listing files in that way.
The best cross-platform approach is to use a library such as the boost filesystem module. I don't think boost::filesystem has wildcard search, so you have to filter the files yourself, but that isn't difficult.
You could use regular expressions, like in the other answer (it's the perfect-fit solution).
It is probably enough to check the file extension (i->path().extension()) and that the filename starts with "certainstr_" (boost::starts_with or std::string::substr).
If you choose the C++11 standard regex library, make sure you have a recent libstdc++ (std::regex was only fully implemented as of GCC 4.9).
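The filtering described above can be sketched as follows. This is a minimal sketch that swaps Boost.Filesystem for the C++17 std::filesystem library (which was modeled on it); the function names files_with_prefix and first_lines are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Collects the regular files in `dir` whose name starts with `prefix` and
// whose extension equals `ext` (for example ".txt"), sorted by path.
std::vector<fs::path> files_with_prefix(const fs::path& dir,
                                        const std::string& prefix,
                                        const std::string& ext) {
    std::vector<fs::path> matches;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file()) continue;
        const std::string name = entry.path().filename().string();
        // compare() returns 0 only when the first prefix.size() characters match.
        const bool has_prefix = name.compare(0, prefix.size(), prefix) == 0;
        if (has_prefix && entry.path().extension().string() == ext) {
            matches.push_back(entry.path());
        }
    }
    std::sort(matches.begin(), matches.end());
    return matches;
}

// Opens the matches one at a time and returns the first line of each file.
std::vector<std::string> first_lines(const std::vector<fs::path>& files) {
    std::vector<std::string> lines;
    for (const fs::path& file : files) {
        std::ifstream in(file);  // only one file is open at any moment
        std::string line;
        std::getline(in, line);
        lines.push_back(line);
    }
    return lines;
}
```

Calling files_with_prefix(".", "certainstr_", ".txt") would list the files from the question without naming them individually; the caller never needs to know how many there are.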
There are a lot of system specific functions. E.g. see:
How can I get the list of files in a directory using C or C++?
How do I get a list of files in a directory in C++?
for some Unix/Windows examples.
You could also try something like (Windows):
std::system("dir /b certainstr_*.txt > list.txt");
or (Unix):
std::system("ls -m1 certainstr_*.txt > list.txt");
parsing the list.txt output file (of course this is a hack).
Anyway, depending on your needs, Python-based (or script-based) solutions could be simpler (see also How to list all files of a directory?):
glob.glob('certainstr_*.txt')
or also:
import os, re
files = [f for f in os.listdir('.') if re.match(r'certainstr_\d+\.txt$', f)]
This is the Python equivalent of https://stackoverflow.com/a/26585425/3235496
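I suggest just using a regular expression. A sketch (needs <iostream>, <boost/filesystem.hpp>, and <boost/regex.hpp>):
using namespace boost::filesystem;
boost::regex reg("^certainstr_\\d+\\.txt$");
for (recursive_directory_iterator it("."); it != recursive_directory_iterator(); ++it)
{
    // Match against the file name only; the full path would start with "./".
    if (boost::regex_match(it->path().filename().string(), reg))
    {
        std::cout << it->path() << std::endl;
    }
}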

Writing French characters in a binary file using C++ / Qt

I have a binary file read/write module in C++. It works fine for English text but fails to read and write the French character set. What changes do I need to make? Does a special encoding type need to be specified? (I have access to the C++ standard library and Qt 4.7 library functions.)
You can try QString::fromUtf8(yourString)
For starters, make sure that your data files are UTF-8 and that you open them as UTF-8. Make sure that your source code files are UTF-8, too, especially if they contain any string literals, though it's better to avoid string literals altogether.
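A minimal sketch of that advice in plain C++, assuming the text is already held as UTF-8 bytes in a std::string (in Qt, QString::toUtf8() produces such bytes and QString::fromUtf8() decodes them). The record layout (a 32-bit byte count followed by the raw bytes) and the function names write_utf8 and read_utf8 are assumptions made for illustration, not part of the original module.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>

// Writes a UTF-8 string as a 32-bit byte count (host byte order) followed by
// the raw bytes. The count is in bytes, not characters: "é" occupies two.
bool write_utf8(std::ostream& out, const std::string& utf8) {
    const std::uint32_t size = static_cast<std::uint32_t>(utf8.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return out.good();
}

// Reads one record written by write_utf8; returns false on a short read.
bool read_utf8(std::istream& in, std::string& utf8) {
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof size)) return false;
    std::string buffer(size, '\0');
    if (size > 0 && !in.read(&buffer[0], static_cast<std::streamsize>(size))) {
        return false;
    }
    utf8 = std::move(buffer);
    return true;
}
```

Both streams should be opened with std::ios::binary so that no newline translation alters the bytes. Nothing in the file I/O itself needs to know about French; the bytes only have to be encoded and decoded consistently at the edges.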

Path and string chopping in Powershell

I'm doing some work involving some automated file moving, and these files contain relative paths that must be maintained. Unfortunately, I'm finding the facilities offered by System.IO.Path, System.String, and Powershell's operators to be a little ill-equipped to handle my work gracefully.
One function that would be very useful to me is the notion of a subtraction of paths, that would work in theory like subtracting vectors. Conceptually, A - B gets you a path from B to A. In the application to paths, D:\A\B\C\D - D:\A\B\ = \C\D. Likewise, D:\A\B\ - D:\A\B\C\D = \..\.. in this case. I can accept, for now, that this only makes sense when one path is wholly contained in the other.
This seems to consist of three steps: 1) determine containment of one path in the other. 2) remove the contained path from the containing path. 3) Optionally, replace folder names with the parent .. symbol based on the sidedness of the operation.
As I am concerned with NTFS, I need both containment and replacement operations to be case-insensitive. For containment, I can use select-string since it is case-insensitive and has the -simple switch, which lets me use a path as-is without having to escape it for regex.
Removing the string from the other is a little more annoying though. System.IO.Path has nothing for this, System.String's pertinent methods are all case-sensitive, and powershell's operators all require massaging so that the regex will match things.
All this seems like more work than it should be--are there any tools I'm missing that would better handle this?
Determine containment - convert your paths to absolute paths (if not already). You can use Resolve-Path for this. Then you can use $path1.StartsWith($path2, 'OrdinalIgnoreCase') to test for containment.
Remove contained path - $path1.Substring($path2.length)
Replace folder names with .. - although I don't have the regex off the top of my head, I'm pretty sure you could do this with a regular expression search/replace using PowerShell's -replace operator.
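The same three steps, sketched in C++ for illustration rather than in PowerShell. This is a sketch under stated assumptions: both inputs are already absolute and use backslashes, case folding is ASCII-only (NTFS uses its own case table), and the names subtract_path and to_parent_steps are hypothetical. Unlike a bare StartsWith test, it also rejects partial component matches such as D:\A\B against D:\A\Bar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Case-insensitive (ASCII only) test that `path` begins with `base`.
bool starts_with_ignore_case(const std::string& path, const std::string& base) {
    if (base.size() > path.size()) return false;
    for (std::size_t i = 0; i < base.size(); ++i) {
        const auto a = static_cast<unsigned char>(path[i]);
        const auto b = static_cast<unsigned char>(base[i]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}

// Steps 1 and 2: returns the part of `path` below `base` (with its leading
// backslash), or std::nullopt when `path` is not contained in `base`.
std::optional<std::string> subtract_path(const std::string& path,
                                         std::string base) {
    while (!base.empty() && base.back() == '\\') base.pop_back();
    if (!starts_with_ignore_case(path, base)) return std::nullopt;
    // The match must end on a component boundary.
    if (path.size() > base.size() && path[base.size()] != '\\') {
        return std::nullopt;
    }
    return path.substr(base.size());
}

// Step 3: replaces every folder name in `remainder` with "..".
std::string to_parent_steps(const std::string& remainder) {
    std::string result;
    std::size_t i = 0;
    while (i < remainder.size()) {
        while (i < remainder.size() && remainder[i] == '\\') ++i;  // skip separators
        if (i == remainder.size()) break;
        while (i < remainder.size() && remainder[i] != '\\') ++i;  // consume one name
        result += "\\..";
    }
    return result;
}
```

With the question's inputs, subtract_path("D:\\A\\B\\C\\D", "D:\\A\\B\\") yields \C\D, and passing that result to to_parent_steps yields \..\.. for the reversed operation.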
filedirectorypath, on CodePlex, may offer what you need
It's not a PowerShell specific API, but that's no reason not to use it from PowerShell.
Benefits of the NDepend.Helpers.FilePathDirectory over the .NET Framework class System.IO.Path include:
Strongly typed File/Directory path
Relative / absolute path conversion
Path normalization API
Path validity check API
Path comparison API
Path browsing API
Path rebasing API
List of path operations (TryGetCommonRootDirectory, GetListOfUniqueDirsAndUniqueFileNames, list equality…)

Programmatically search + replace in a .doc

If I'm given a .doc file with special tags in it such as [first_name], how do I go about replacing all occurrences of it with something like "Clark"? A simple binary replacement only works if the replacement string is the exact same length.
Haskell, C, and C++ answers would be best, but any compiled language would do. I'd also prefer to do this without an external library since it has to be deployed on Windows and Linux and cross-platform dependency handling is a bitch.
To summarize...
.doc -> magic program -> .doc with strings replaced
You could use the Word COM component ("Word.Application") on Windows to open the file, do the replacements, save the file, and close it. However, this is Windows-only and can be buggy.
Another thing you could do is use the OpenOffice.org command line interface to convert the file to the ODF format, unzip the file (ODF is mostly zipped XML), do the replacements with the files inside, re-zip the file, and re-convert it to .doc format. However, OpenOffice.org doesn't always read Word files correctly (especially if there is a lot of complex formatting) and it can make it harder to distribute (users must either have OpenOffice.org or you must distribute it with your program).
Also, if you have a file in the .docx format, you can unzip it, do the replacements, and re-zip it.
First read the Word Document Specification.
If that hasn't terrified you, then you should find it fairly straightforward to figure out how to read and write it. It must be possible; Word manages to do it most of the time.
You probably have to use .NET programming (VB or C#) to create a Word.Application object and then use the MS Word object model to manipulate your document.
Why do you want to be using C/C++/Haskell or another compiled language? I'm not too familiar with Haskell, but in general I would say that C is not a great language for performing text processing. A lot of interpreted languages (Perl, Python, etc.) also have powerful regular expression libraries that are suited for finding and replacing phrases.
With that said, as the other posters have noted, you will still have to deal with the eccentricities of the .doc format.

In C++, how do I validate a file or folder path?

A user input string for a destination path can potentially contain spaces or other invalid characters.
Example: " C:\users\username\ \directoryname\ "
Note that this has whitespace on both sides of the path as well as an invalid folder name of just a space in the middle. Checking to see if it is an absolute path is insufficient because that only really handles the leading whitespace. Removing trailing whitespace is also insufficient because you're still left with the invalid space-for-folder-name in the middle.
How do I prove that the path is valid before I attempt to do anything with it?
The only way to "prove" the path is valid is to open it.
SHLWAPI provides a set of path functions which can be used to canonicalize the path or verify that a path seems to be valid. This can be useful to reject obviously bad paths but you still cannot trust that the path is valid without going through the file system.
With NTFS, I believe the path you give is actually valid (though Explorer may not allow you to create a directory with only a space.)
The Boost Filesystem library provides helpers to manipulate files, paths, and so on. Take a look at the simple ls example and the exists function.
I use GetFileAttributes for checking for existence. Works for both folders (look for the FILE_ATTRIBUTE_DIRECTORY flag in the returned value) and for files. I've done this for years, never had a problem.
If you don't want to open the file you can also use something like the access() function on POSIX-like platforms or _access() and friends on Windows. However, I like the Boost.Filesystem method Ricardo pointed out.
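The existence checks suggested in these answers can be sketched as follows. This is a minimal sketch that uses the C++17 std::filesystem library in place of Boost.Filesystem; the PathKind enum and the trim and classify functions are hypothetical names. Trimming the surrounding whitespace is a policy assumption, and nothing here rejects a space-only folder name in the middle of the path, since (as noted above) the file system itself may accept it.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

enum class PathKind { Missing, File, Directory, Other };

// Removes leading and trailing spaces and tabs from user input.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Asks the file system what `input` refers to. Uses the error_code overload,
// so a bad or unreachable path is reported as Missing instead of throwing.
PathKind classify(const std::string& input) {
    std::error_code ec;
    const fs::file_status st = fs::status(fs::path(trim(input)), ec);
    if (ec || !fs::exists(st)) return PathKind::Missing;
    if (fs::is_directory(st)) return PathKind::Directory;
    if (fs::is_regular_file(st)) return PathKind::File;
    return PathKind::Other;
}
```

A result other than Missing only says the path existed at the moment of the check; the later open or create call can still fail and must be checked as well.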