C/C++ equivalent to Bash "readlink -f"

I'm coding a Linux tool in C/C++ that receives a full path to a directory as input. The path will always be rooted at "/", but elements of it may be symlinks rather than real directories. For example:
/tools/sometool/latest
where "latest" is a symlink to "1.0".
The path is then used to filter a list of other files. My problem is that the other files can use either form:
/tools/sometool/latest/foo.txt
/tools/sometool/1.0/foo.txt
and I need to treat both as matching the criteria of being contained inside /tools/sometool/latest. I therefore need a way to fully resolve all the symlinks in the path.
In Bash, "readlink -f /tools/sometool/latest" returns "/tools/sometool/1.0" which is perfect, but when I try "readlink()" in C, it just gives me "1.0".
I've searched but can't find any existing solution to this in C.
Is there a simple solution that I've missed, or do I need to build the equivalent of the bash command in C to make this work?
Thanks!
P.S. The tool doesn't need to be portable, so a Linux-only solution would be fine.

If you're using C++17, you can try std::filesystem::canonical.
Converts path p to a canonical absolute path, i.e. an absolute path that has no dot, dot-dot elements or symbolic links in its generic format representation.
#include <iostream>
#include <filesystem>
namespace fs = std::filesystem;
int main()
{
fs::path p = fs::path("/tools/sometool/latest");
std::cout << "Canonical path for " << p << " is " << fs::canonical(p) << '\n';
}

Try realpath() (see its man page). This does basically the same as readlink -f.
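A minimal sketch of how realpath() could be wrapped for this purpose. The helper name resolve_path is made up for illustration. It assumes a POSIX.1-2008 system (such as Linux with glibc), where passing a null buffer makes realpath() allocate the result itself. One difference to keep in mind: realpath() fails if any component of the path does not exist, whereas readlink -f tolerates a missing final component.

```cpp
#include <stdlib.h>   // realpath(), free() (POSIX)
#include <optional>
#include <string>

// Hypothetical helper: resolves symlinks, "." and ".." in `path`.
// Returns std::nullopt if the path cannot be resolved (for example,
// because some component does not exist).
std::optional<std::string> resolve_path(const std::string& path) {
    // A null second argument asks realpath() to malloc() the buffer.
    char* resolved = ::realpath(path.c_str(), nullptr);
    if (resolved == nullptr) {
        return std::nullopt;
    }
    std::string result(resolved);
    ::free(resolved);  // the caller owns the malloc()ed buffer
    return result;
}
```

With the layout from the question, resolve_path("/tools/sometool/latest") would be expected to yield "/tools/sometool/1.0". Resolving both the filter directory and each candidate file the same way puts them in the same form before comparing.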

Related

Can filesystem::canonical be used to prevent filepath injection for filepaths passed to fstream

I have a public folder pub with subfolders and files in it. A user gives me now a relative filepath, I perform some mappings, and I read the file with fstream and return it to the user.
The problem arises if the user gives me a path such as ../fileXY.txt, or some other fancy input that amounts to path traversal or another type of filepath injection. fstream will simply accept it and read files outside of my public pub folder or, even worse, give them a list of all files on my system.
Before reinventing the wheel, I searched the filesystem library and saw that there is a std::filesystem::canonical function, along with quite some discussion of the normal form. I have a general question here: can this function and its variant std::filesystem::weakly_canonical be used to prevent these types of vulnerabilities? In other words, is it enough?
Further, my system's filesystem library is still in experimental mode and std::filesystem::weakly_canonical is missing. I cannot use canonical because it requires the files to exist. In my case I have certain mappings, and the files don't exist in that sense. So I would need to mimic the weakly_canonical function, but how?
I have seen a related Stack Overflow question on realpath for nonexistent paths. The suggestion there was to apply canonical to the part of the path that exists and then append the nonexistent part to it, but that is again vulnerable to these types of injection. So do I have to roll my own weakly_canonical, or can I somehow mimic it by combining some std::experimental::filesystem functions?
Short answer: no.
Long answer: this function is modeled after POSIX realpath(), and I understand the source of the confusion. From the realpath() specification:
The realpath() function shall derive, from the pathname pointed to by file_name, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.', '..', or symbolic links.
From the cppreference page on path you can also see that the double dot is removed. However, the path still points to the same file; only the redundant elements are removed.
If you are processing values from a db/webapp/whatever where your program has different privileges than the user who supplied the path, you need to sanitize the filename first by escaping double dots. Single dots are fine.
Perhaps you can use a regex to escape double dots with a backslash, rendering them ineffective.
#include <iostream>
#include <filesystem>
#include <string>
#include <regex>
int main()
{
std::string bad = "../bad/../other";
std::filesystem::path p(bad);
std::cout << std::filesystem::weakly_canonical(p) << std::endl;
std::regex r(R"(\.\.)");
p = std::regex_replace(bad, r, "\\.\\.");
std::cout << std::filesystem::weakly_canonical(p) << std::endl;
}
Output
"/tmp/other"
"/tmp/1554895428.8689194/\.\./bad/\.\./other"
I can see how you could employ weakly_canonical() to prevent path traversal - similar to what is described here - by checking that the result is prefixed with your base path. E.g.
#include <iostream>
#include <filesystem>
#include <optional>
// Returns the canonical form of basepath/relpath if the canonical form
// is under basepath, otherwise returns std::nullopt.
// Note that one would probably require that basepath is sanitized,
// safe for use in this context and absolute.
// Thanks to https://portswigger.net/web-security/file-path-traversal
// for the basic idea.
std::optional<std::filesystem::path> abspath_no_traversal(
const std::filesystem::path & basepath,
const std::filesystem::path & relpath) {
const auto abspath = std::filesystem::weakly_canonical(basepath / relpath);
// thanks to https://stackoverflow.com/questions/1878001/how-do-i-check-if-a-c-stdstring-starts-with-a-certain-string-and-convert-a
const auto index = abspath.string().rfind(basepath.string(), 0);
if (index != 0) {
return std::nullopt;
}
return abspath;
}
Since I am no security expert, I welcome any corrections.
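One possible correction, offered as a sketch rather than a vetted security measure: a plain string-prefix test also accepts sibling directories that merely share a prefix (with base /srv/pub, the path /srv/pub_secret/x would pass). Comparing the paths component by component avoids that. The function name resolve_under_base is hypothetical, and the sketch assumes C++17 std::filesystem is available.

```cpp
#include <algorithm>
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical variant of the function above: returns the normalized
// form of basepath/relpath only if every component of the normalized
// base is a leading component of the result.
std::optional<fs::path> resolve_under_base(const fs::path& basepath,
                                           const fs::path& relpath) {
    fs::path base = fs::weakly_canonical(fs::absolute(basepath));
    if (!base.has_filename()) {
        base = base.parent_path();  // drop a trailing slash, if any
    }
    const fs::path target = fs::weakly_canonical(base / relpath);

    // Walk both paths element by element; the base must be exhausted
    // before the first difference.
    const auto diff = std::mismatch(base.begin(), base.end(),
                                    target.begin(), target.end());
    if (diff.first != base.end()) {
        return std::nullopt;
    }
    return target;
}
```

Note that this only validates the path at the time of the check. If the directory tree can change between the check and the open, for instance through a symlink swapped in afterwards, the result may no longer be inside the base.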

Printing boost path on windows with escaped backslashes

I need to print out a path (stored as boost filesystem path) to file, to be parsed back to path later.
The parser expects paths on the Windows platform to be escaped, so a path like
c:\path\to\file
will appear in the file as
c:\\path\\to\\file
Is there a method in boost path to do this? Or do I need to process the output of the string() method to add the escapes?
Did you hear about std::quoted?
It can be handy for things like this. Alternatively, use the power of your shell (e.g. Escape FileNames Using The Same Way Bash Do It)
#include <iomanip>
#include <iostream>
int main() {
std::cout << std::quoted(R"(c:\path\to\file)") << std::endl;
std::cout << std::quoted("c:\\path\\to\\file") << std::endl;
}
Prints
"c:\\path\\to\\file"
"c:\\path\\to\\file"
Note: also shows raw string literal
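To illustrate the round trip the question asks about, here is a small sketch that writes a path string with std::quoted and parses it back. The helper names escape_path and unescape_path are made up for this example, and it assumes the parser accepts the surrounding double quotes that std::quoted adds. With a boost path you would presumably pass the result of its string() method.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: wraps the text in double quotes and
// backslash-escapes any embedded quote or backslash.
std::string escape_path(const std::string& raw) {
    std::ostringstream out;
    out << std::quoted(raw);
    return out.str();
}

// Hypothetical helper: reverses escape_path(). Spaces inside the
// quotes are preserved.
std::string unescape_path(const std::string& escaped) {
    std::istringstream in(escaped);
    std::string raw;
    in >> std::quoted(raw);
    return raw;
}
```

If the parser must not see the surrounding quotes, they would have to be stripped from the output, or the backslashes doubled by hand instead.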

C++: Getting size of all files inside current directory

I'm new to C++ programming, and I'm trying to practice file reading and writing. I'm trying to get the sizes of all the files of the current directory. Thing is, after getting the names of the files in the current directory, I place them inside of a text file. So now I'm stuck, and don't know where to go from here.
#include <iostream>
#include <fstream>
#include <algorithm>
using namespace std;
// FILE FUNCTION
void fileStuff(){
}
// MAIN FUNCTION
int main(int argc, char const *argv[])
{
// ERROR CHECKING
if(argc != 3){ // IF USER DOESN'T TYPE ./nameOfFile, AND THE OTHER REQUIRED ARGUMENTS.
cout << "Incorrect. Try Again" << endl;
exit(-1);
}
ifstream file;
string fileContents;
system("find . -type f > temp.txt");
file.open("temp.txt");
if (!file){
cout << "Unable to open file: temp.txt" << endl;
exit(-1);
}
while(file){
getline(file, fileContents);
cout << fileContents << endl;
}
file.close();
return 0;
}
C++14 (and earlier versions, notably C++11) does not know about file systems and directories (yet). For C++17, see its file system library. Otherwise, your code is operating system specific, but Boost library has some file system support.
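For the C++17 route, a rough sketch could look like the following. The function name total_file_size is hypothetical. It sums the sizes of regular files below a directory, skips symlinks to roughly match what find . -type f selects, and uses the error_code overloads so that an unreadable entry does not throw.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: total size in bytes of all regular files
// found recursively under `dir`. Returns 0 if `dir` cannot be opened.
std::uintmax_t total_file_size(const fs::path& dir) {
    std::uintmax_t total = 0;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        dir, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;

    while (!ec && it != end) {
        std::error_code entry_ec;
        // Skip symlinks; count only real regular files.
        if (!it->is_symlink(entry_ec) && it->is_regular_file(entry_ec)) {
            const std::uintmax_t size = it->file_size(entry_ec);
            if (!entry_ec) {
                total += size;
            }
        }
        it.increment(ec);  // non-throwing advance
    }
    return total;
}
```

Calling total_file_size(".") would cover the current directory, as in the question.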
I am assuming you are running on Linux or some POSIX system.
Your program just uses an external command (find(1)); if you want to read from such a command, you might use popen(3) with pclose, then you won't need a temporary file. BTW, you could use find . -type f -ls.
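As a sketch of the popen(3) approach, the helper below (read_command_lines is a made-up name) runs a shell command and collects its output line by line, with no temporary file. It assumes a POSIX system, and it splits on newlines, so it shares the limitation about unusual file names.

```cpp
#include <stdio.h>   // popen(), pclose() (POSIX), fgetc()
#include <string>
#include <vector>

// Hypothetical helper: runs `command` through the shell and returns
// its standard output split into lines. Returns an empty vector if
// the command could not be started.
std::vector<std::string> read_command_lines(const char* command) {
    std::vector<std::string> lines;
    FILE* pipe = ::popen(command, "r");
    if (pipe == nullptr) {
        return lines;
    }
    std::string current;
    int ch;
    while ((ch = ::fgetc(pipe)) != EOF) {
        if (ch == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(static_cast<char>(ch));
        }
    }
    if (!current.empty()) {
        lines.push_back(current);  // last line without trailing newline
    }
    ::pclose(pipe);
    return lines;
}
```

For the program in the question, read_command_lines("find . -type f") would stand in for the system() call plus the temp.txt read loop.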
However, you don't need to use an external command, and it is safer (and faster) to avoid that.
Pedantically, a file name could contain a newline character, and with your approach you'd need to special-case that. A file name could also contain a tab character (or other control characters), in which case find . -type f behaves specially, and you would need to special-case that too. In practice, it is extremely poor taste and very unlikely to have a newline or tab character in a file name, so you might ignore these weird cases.
You could use nftw(3). You could recursively use opendir(3) & loop on readdir(3) (and later closedir).
Once you have a file path, you would use stat(2) to get that file's metadata, including its size (field st_size). BTW the /bin/ls and /usr/bin/find programs use that.
The readdir(3) function returns a struct dirent pointer; that struct ends with the d_name field. You probably want to skip the two entries for . and .. (so use strcmp(3) to compare with "." and "..", or compare the characters by hand). Then you'll build a complete file path using string concatenation. You might use (in genuine C++) std::string, or you could use snprintf(3) or asprintf(3) for that. If you readdir the current directory . you could call stat(2) directly on the d_name field.
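The steps above might be put together roughly as follows. This is a non-recursive sketch for a single directory; the function name list_file_sizes is hypothetical, and recursion into subdirectories (or nftw(3)) is left out to keep it short.

```cpp
#include <dirent.h>     // opendir(), readdir(), closedir()
#include <sys/stat.h>   // stat(), S_ISREG
#include <cstring>      // std::strcmp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (path, size in bytes) for every
// regular file directly inside `dir`. Empty if `dir` can't be opened.
std::vector<std::pair<std::string, long long>>
list_file_sizes(const std::string& dir) {
    std::vector<std::pair<std::string, long long>> result;
    DIR* handle = ::opendir(dir.c_str());
    if (handle == nullptr) {
        return result;
    }
    while (struct dirent* entry = ::readdir(handle)) {
        // Skip the "." and ".." entries.
        if (std::strcmp(entry->d_name, ".") == 0 ||
            std::strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        const std::string path = dir + "/" + entry->d_name;
        struct stat info;
        if (::stat(path.c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
            result.emplace_back(path, static_cast<long long>(info.st_size));
        }
    }
    ::closedir(handle);
    return result;
}
```

Since stat(2) follows symlinks, a symlink to a regular file is reported with the size of its target; lstat(2) would be the call to use if symlinks should be excluded.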
BTW, exit(-1) is incorrect (and certainly poor taste). See exit(3). A much more readable alternative is exit(EXIT_FAILURE).

C++ code for copying FILES : : confused about relative address (tilde)

I've written a simple program to copy files.
It gets two strings :
1) is for the path of the source file.
2) is for name of a copy file.
It works correctly when I give it the absolute or relative path(without tilde sign (~)).
But when I give it a relative path with tilde sign (~) it can't find the address of a file. And it makes me confused !
Here is my sample input :
1) /Users/mahan/Desktop/Copy.cpp
2) ~/Desktop/Copy.cpp
The first one works correctly, but the second one does not.
And here is my code :
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
string path, copy_name;
cin >> path >> copy_name;
ifstream my_file;
ofstream copy(copy_name);
my_file.open(path);
if(my_file.is_open())
{
copy << my_file.rdbuf();
copy.close();
my_file.close();
}
}
The ~ is handled by the shell you're using to auto expand to your $HOME directory.
std::ofstream doesn't handle the ~ character in the filepath, thus only your first sample works.
If you pass the filepath to your program from the command line using argv[1], and call it from your shell, you'll get the ~ automatically expanded.
With what was said above, if you want to expand the ~ character yourself, you can use the std::getenv() function to determine the value of $HOME, and replace it with that value.
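A minimal sketch of that replacement, under the assumption that only a leading ~ or ~/ needs handling. The function name expand_tilde is made up, and the ~username form that shells also support is deliberately left untouched.

```cpp
#include <cstdlib>   // std::getenv
#include <string>

// Hypothetical helper: replaces a leading "~" or "~/" with the value
// of $HOME. Any other path, including "~user/...", is returned as-is.
std::string expand_tilde(const std::string& path) {
    if (path.empty() || path[0] != '~') {
        return path;
    }
    if (path.size() > 1 && path[1] != '/') {
        return path;  // "~user" form, not handled in this sketch
    }
    const char* home = std::getenv("HOME");
    if (home == nullptr) {
        return path;  // no $HOME set; leave the path unchanged
    }
    return std::string(home) + path.substr(1);
}
```

In the program from the question, the call would go around both inputs before opening them, e.g. my_file.open(expand_tilde(path)).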
The second example does not work because the shell is what replaces ~ with $HOME, i.e. the path to your home directory.
fstream objects will not perform this replacement and will instead look for a directory actually called ~, which likely does not exist in your working directory.
std::ofstream can't handle ~. It is a shortcut to your home directory. You need to give the absolute path of your home directory, or a path relative to the directory the program is run from, for it to work.
To give a relative path: for example, if you are running your code in the Desktop directory, you needn't give ~/Desktop/Copy.cpp. Just give Copy.cpp and it should suffice.

How do I get the full path for a filename command-line argument?

I've found lots of libraries to help with parsing command-line arguments, but none of them seem to deal with handling filenames. If I receive something like "../foo" on the command line, how do I figure out the full path to the file?
You could use boost::filesystem to get the absolute path of a file, from its relative path:
namespace fs = boost::filesystem;
fs::path p("test.txt");
fs::path full_p = fs::complete(p); // complete == absolute
std::cout << "The absolute path: " << full_p;
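As far as I know, newer Boost.Filesystem versions name this function absolute rather than complete, and C++17 offers the same under std::filesystem. A sketch with the standard library, assuming C++17 (the helper name full_path is hypothetical):

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: turns a command-line argument such as "../foo"
// into an absolute path with "." and ".." removed. The file itself
// does not have to exist.
fs::path full_path(const fs::path& arg) {
    // absolute() prepends the current directory when needed;
    // weakly_canonical() then resolves symlinks in the existing part
    // and normalizes the rest lexically.
    return fs::weakly_canonical(fs::absolute(arg));
}
```

If the file is known to exist, fs::canonical(arg) alone gives the same result and reports an error otherwise.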
POSIX has realpath().
#include <stdlib.h>
char *realpath(const char *filename, char *resolvedname);
DESCRIPTION
The realpath() function derives, from the pathname pointed to by filename, an absolute pathname that names the same file, whose resolution does not involve ".", "..", or symbolic links. The generated pathname is stored, up to a maximum of {PATH_MAX} bytes, in the buffer pointed to by resolvedname.
Boost.Filesystem
In shell scripts, the command "readlink -f" has the functionality of realpath().