PathGetArgs/PathRemoveArgs vs. CommandLineToArgvW - is there a difference? - c++

I'm working on some path-parsing C++ code and I've been experimenting with a lot of the Windows APIs for this. Is there a difference between PathGetArgs/PathRemoveArgs and a slightly-massaged CommandLineToArgvW?
In other words, aside from length/cleanness, is this:
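#include <windows.h>
#include <shlwapi.h>  // PathRemoveArgsW; link with Shlwapi.lib
#include <string>
#include <vector>

std::wstring StripFileArguments(const std::wstring& filePath)
{
    // Writable, null-terminated copy of any length (no MAX_PATH overflow);
    // PathRemoveArgs edits it in place.
    std::vector<WCHAR> tempPath(filePath.begin(), filePath.end());
    tempPath.push_back(L'\0');
    PathRemoveArgsW(tempPath.data());
    return tempPath.data();
}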
different from this:
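#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW; link with Shell32.lib
#include <string>

std::wstring StripFileArguments(const std::wstring& filePath)
{
    int argCount = 0;
    LPWSTR* argList = CommandLineToArgvW(filePath.c_str(), &argCount);
    if (argList == NULL)
        return filePath;  // parsing failed
    //ignore any elements after the first because those are args, not the base app
    std::wstring tempPath = (argCount > 0) ? std::wstring(argList[0]) : filePath;
    LocalFree(argList);  // free the array on every path, not only when argCount > 0
    return tempPath;
}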
and is this
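#include <windows.h>
#include <shlwapi.h>  // PathGetArgsW; link with Shlwapi.lib
#include <string>

std::wstring GetFileArguments(const std::wstring& filePath)
{
    // PathGetArgs only reads its input and returns a pointer into it,
    // so no temporary buffer or wcscpy is needed.
    return PathGetArgsW(filePath.c_str());
}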
different from
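#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW; link with Shell32.lib
#include <string>

std::wstring GetFileArguments(const std::wstring& filePath)
{
    int argCount = 0;
    std::wstring tempArgs;
    LPWSTR* argList = CommandLineToArgvW(filePath.c_str(), &argCount);
    if (argList == NULL)
        return tempArgs;  // parsing failed
    for (int counter = 1; counter < argCount; counter++) //ignore the first element (counter = 0) because that's the base app, not args
    {
        if (counter > 1)
            tempArgs += L' ';  // separate arguments without a leading space
        tempArgs += argList[counter];
    }
    LocalFree(argList);
    return tempArgs;
}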
? It looks to me like PathGetArgs/PathRemoveArgs just provide a cleaner, simpler special-case implementation of the CommandLineToArgvW parsing, but I'd like to know if there are any corner cases in which the APIs will behave differently.

The functions are similar but not identical; the differences mostly concern how quoted strings are handled.
PathGetArgs returns a pointer to the first character following the first space in the input string. If a quote character is encountered before the first space, another quote is required before the function will start looking for spaces again. If no space is found, the function returns a pointer to the end of the string.
PathRemoveArgs calls PathGetArgs and then uses the returned pointer to terminate the string. It will also strip a trailing space if the first space encountered happened to be at the end of the line.
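To make the scanning rules above concrete, here is a portable sketch that mimics the described behaviour on a std::wstring. PathGetArgsSketch and PathRemoveArgsSketch are hypothetical names; the code follows the description above rather than Microsoft's source, and it returns an index instead of a pointer so it can run without Windows headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical re-implementation of the PathGetArgs rules described above
// (not Microsoft's code). Returns the index of the first character after the
// first space that is outside double quotes, or text.size() if there is none.
std::wstring::size_type PathGetArgsSketch(const std::wstring& text)
{
    bool inQuotes = false;
    for (std::wstring::size_type i = 0; i < text.size(); ++i)
    {
        if (text[i] == L' ' && !inQuotes)
            return i + 1;          // arguments start right after this space
        if (text[i] == L'"')
            inQuotes = !inQuotes;  // spaces are ignored until the next quote
    }
    return text.size();            // no space found: "end of the string"
}

// Hypothetical PathRemoveArgs counterpart: cut the string at the space that
// PathGetArgsSketch stopped at, which also drops a trailing space.
std::wstring PathRemoveArgsSketch(const std::wstring& text)
{
    std::wstring::size_type args = PathGetArgsSketch(text);
    if (args > 0 && text[args - 1] == L' ')
        return text.substr(0, args - 1);
    return text;
}
```

Note that the quotes themselves are kept: for "c:\my app\app.exe" -x the sketch returns the quoted path, quotes included, whereas CommandLineToArgvW would strip them.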
CommandLineToArgvW takes the supplied string and splits it into an array, using spaces to delimit the items. The first item in the array can be quoted to allow spaces. The second and subsequent items can also be quoted, but they get slightly more complex processing: they can also include embedded quotes, each preceded by a backslash. For example:
"c:\program files\my app\my app.exe" arg1 "argument 2" "arg \"number\" 3"
This would produce an array with four entries:
argv[0] - c:\program files\my app\my app.exe
argv[1] - arg1
argv[2] - argument 2
argv[3] - arg "number" 3
See the CommandLineToArgvW docs for a full description of the parsing rules, including how you can have embedded backslashes as well as quotes in the arguments.
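As an illustration of these rules, here is a simplified, portable splitter that reproduces the four-entry example above. SplitCommandLineSketch is a hypothetical name, and this is not the real CommandLineToArgvW: it assumes every item, including the first, gets the full backslash/quote handling (the real function treats the first item more simply, as described above), and it may differ from the real function in other edge cases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified sketch of the documented MSVC splitting rules (hypothetical name).
// Assumption: all items, including the first, get the same backslash/quote
// handling; the real CommandLineToArgvW treats the first item more simply.
std::vector<std::wstring> SplitCommandLineSketch(const std::wstring& cmd)
{
    std::vector<std::wstring> args;
    std::wstring current;
    bool inQuotes = false;
    bool haveArg = false;  // distinguishes an empty "" argument from no argument
    for (std::wstring::size_type i = 0; i < cmd.size(); ++i)
    {
        const wchar_t c = cmd[i];
        if (c == L'\\')
        {
            std::wstring::size_type n = 0;  // length of this run of backslashes
            while (i < cmd.size() && cmd[i] == L'\\') { ++n; ++i; }
            if (i < cmd.size() && cmd[i] == L'"')
            {
                current.append(n / 2, L'\\');     // each pair becomes one backslash
                if (n % 2 == 1) current += L'"';  // odd count: the quote is literal
                else inQuotes = !inQuotes;        // even count: the quote is a delimiter
            }
            else
            {
                current.append(n, L'\\');  // not before a quote: keep them all
                --i;                       // let the loop re-read the next character
            }
            haveArg = true;
        }
        else if (c == L'"')
        {
            inQuotes = !inQuotes;
            haveArg = true;
        }
        else if ((c == L' ' || c == L'\t') && !inQuotes)
        {
            if (haveArg) { args.push_back(current); current.clear(); haveArg = false; }
        }
        else
        {
            current += c;
            haveArg = true;
        }
    }
    if (haveArg) args.push_back(current);
    return args;
}
```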

Yes, I've observed different behaviour with the current SDK (VS2015 Update 3 + Windows 1607 Anniversary SDK with the SDK version set to 8.1):
Calling CommandLineToArgvW with an empty lpCmdLine (what you get from wWinMain when no arguments were passed) returns the program path and filename, split at every space. The path was not in the parameter, so the function must have added it itself, without accounting for spaces in the path:
lpCmdLine = ""
argv[0] = C:\Program
argv[1] = Files\Vendor\MyProgram.exe
Calling CommandLineToArgvW with an lpCmdLine that contains parameters does not include the program path and name, so it works as expected (as long as there are no further spaces in the parameters):
lpCmdLine = "One=1 Two=\"2\""
argv[0] = One=1
argv[1] = Two=2
Note that it also strips any other quotes inside the parameters.
CommandLineToArgvW doesn't handle a first parameter of the form Text=\"Quoted spaces\", so if you pass lpCmdLine to it directly, it incorrectly splits key=value pairs whose values contain spaces:
lpCmdLine = "One=\"Number One\" Two=\"Number Two\""
argv[0] = One=\"Number
argv[1] = One\"
argv[2] = Two=\"Number
argv[3] = Two\"
It's partly documented here:
https://msdn.microsoft.com/en-us/library/windows/desktop/bb776391(v=vs.85).aspx
But I did not expect this behaviour with spaces in the program path; it seems like a bug to me. I'd prefer the same data to be processed in both situations, because if I really wanted the path to the executable I would call GetCommandLineW() instead.
The only sensible, consistent solution in my opinion is to ignore lpCmdLine entirely, call GetCommandLineW(), pass the result to CommandLineToArgvW(), and then skip the first element if you are not interested in the program path. That way all combinations are supported, i.e. paths with and without spaces, and parameters with nested quotes, with and without spaces.
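int argumentCount = 0;
LPWSTR commandLine = GetCommandLineW();
LPWSTR *arguments = CommandLineToArgvW(commandLine, &argumentCount); // NULL on failure
// arguments[0] is the program path; skip it if you only need the parameters.
// ... use arguments[1] .. arguments[argumentCount - 1] ...
LocalFree(arguments); // the array must be freed with LocalFree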

Related

Create argument string from argv

Say I want to write an application that launches another application, like this:
# This will launch another_app.exe
my_app.exe another_app.exe
# This will launch another_app.exe with arg1, arg2 and arg3 arguments
my_app.exe another_app.exe arg1 arg2 arg3
The problem here is that I'm getting char* argv[] in my main function, but I need to merge it into an LPTSTR in order to pass it to CreateProcess.
There is a GetCommandLine function, but I cannot use it because I'm porting code from Linux and am tied to argc/argv (otherwise it would be a very ugly hack for me).
I cannot easily merge arguments by hand, because argv[i] might contain spaces.
Basically, I want the reverse of CommandLineToArgvW. Is there a standard way to do this?
The definitive answer on how to quote arguments is on Daniel Colascione's blog:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
I am reluctant to quote the code here because I don't know the license. The basic idea is:
for each single argument:
    if it is non-empty and contains none of: space \t \n \v "
        just use it as is
    else
        output "
        for each character
            backslashes = number of consecutive backslashes starting here (skip past them)
            if end of the argument
                output the backslashes doubled
                break
            else if char is "
                output the backslashes doubled
                output \"
            else
                output the backslashes (*not* doubled)
                output character
            fi
        rof
        output "
    fi // needs quoting
    output a space between arguments
rof // each argument
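Here is one way the pseudocode above might look in C++. QuoteArgument and BuildCommandLine are hypothetical names, and the code is written from the rules rather than copied from the blog. It quotes arguments that are empty or contain a space as well, since those would otherwise be lost or split when parsed. As noted, this targets the CommandLineToArgvW/MSVC rules; cmd.exe needs different escaping.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Quote one argument so that MSVC-style parsing gives it back unchanged.
std::string QuoteArgument(const std::string& arg)
{
    if (!arg.empty() && arg.find_first_of(" \t\n\v\"") == std::string::npos)
        return arg;  // nothing special: use as is

    std::string out = "\"";
    for (std::string::size_type i = 0; ; ++i)
    {
        std::string::size_type backslashes = 0;
        while (i < arg.size() && arg[i] == '\\') { ++backslashes; ++i; }

        if (i == arg.size())
        {
            out.append(backslashes * 2, '\\');      // before the closing quote: double them
            break;
        }
        else if (arg[i] == '"')
        {
            out.append(backslashes * 2 + 1, '\\');  // double them, plus one to escape the quote
            out += '"';
        }
        else
        {
            out.append(backslashes, '\\');          // not before a quote: copy unchanged
            out += arg[i];
        }
    }
    out += '"';
    return out;
}

// Join already-split arguments into one command-line string.
std::string BuildCommandLine(const std::vector<std::string>& args)
{
    std::string cmd;
    for (std::vector<std::string>::size_type i = 0; i < args.size(); ++i)
    {
        if (i > 0) cmd += ' ';
        cmd += QuoteArgument(args[i]);
    }
    return cmd;
}
```

For the launcher in the question, BuildCommandLine(std::vector<std::string>(argv + 1, argv + argc)) would produce the string for CreateProcess's lpCommandLine, which must then be copied into a writable buffer.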
If you need to pass the command line to cmd.exe, see the article (it's different).
I think it is crazy that the Microsoft C runtime library doesn't have a function to do this.
There is no Win32 API that does the reverse of CommandLineToArgvW(). You have to format the command line string yourself. This is nothing more than basic string concatenation.
Microsoft documents the format for command-line arguments (or at least the format expected by VC++-written apps, anyway):
Parsing C++ Command-Line Arguments
Microsoft C/C++ startup code uses the following rules when interpreting arguments given on the operating system command line:
Arguments are delimited by white space, which is either a space or a tab.
The caret character (^) is not recognized as an escape character or delimiter. The character is handled completely by the command-line parser in the operating system before being passed to the argv array in the program.
A string surrounded by double quotation marks ("string") is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (").
Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
If an even number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is interpreted as a string delimiter.
If an odd number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is "escaped" by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.
It should not be hard for you to write a function that takes an array of strings and concatenates them together, applying the reverse of the above rules to each string in the array.
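You need to recreate the command line, taking care to enclose the program name and every argument in double quotes, i.e. adding a " at the beginning and another at the end of each string. Note that this simple approach does not escape double quotes or trailing backslashes inside an argument.
Assuming the program to be launched is argv[1], its first argument is argv[2], and so on:
#include <string>
#include <vector>

std::string command;            // grows as needed, unlike a fixed char[1024]
for (int i = 1; i < argc; i++) {
    if (i > 1) command += ' ';
    command += '"';
    command += argv[i];         // embedded quotes/backslashes are not escaped
    command += '"';
}
Pass it as the 2nd argument of CreateProcess, which must point to a writable buffer:
std::vector<char> cmdBuf(command.begin(), command.end());
cmdBuf.push_back('\0');
CreateProcessA(NULL, cmdBuf.data(), ...);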
The code below may suit your needs; the resulting sz string can be used as the command line. It supports both Unicode and MBCS builds:
#include <tchar.h>
#include <string>
#include <vector>

#ifdef _UNICODE
typedef std::wstring String;
#else
typedef std::string String;
#endif

int _tmain(int argc, _TCHAR* argv[])
{
    std::vector<String> allArgs(argv, argv + argc);
    String sz;  // grows as needed instead of a fixed TCHAR[1024]
    for (std::vector<String>::size_type i = 1; i < allArgs.size(); i++)
    {
        if (i > 1)
            sz += _T(' ');  // separate arguments with one space
        sz += allArgs[i];   // arguments are appended as-is, without quoting
    }
    return 0;
}

c++ Function to add an extra '\' to a filepath?

I have about 3500 full file paths to sort through (e.g. "C:\Users\Nick\Documents\ReadIns\NC_000852.gbk"). I just learned that C++ does not recognize a single backslash in a file path, and changing each one by hand would be overly tedious.
I have this for loop that finds each single backslash and inserts another backslash at that index:
string filepath = "C:\Users\Nick\Documents\ReadIns\NC_000852.gbk";
for (unsigned int i = 0; i < filepath.size(); i++) {
if(filepath[i] == '\') {
filepath.insert(i, '\');
}
}
However, the code does not compile (in Code::Blocks) because of the backslash character. Is there a way to add the extra backslash with a function?
I am reading the file paths from a text file into the string variable filepath; the literal above is just a test.
Use a double backslash, as in '\\' and "C:\\Users...": a single backslash combines with the next character to form an escape sequence.
Also, string::insert() has no (index, char) overload; the (index, count, char) overload needs the number of characters as its second argument, which is missing in your code.
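With those fixes it compiles. The index must also skip the backslash that was just inserted; otherwise the loop finds it again and never terminates:
string filepath = "C:\\Users\\Nick\\Documents\\ReadIns\\NC_000852.gbk";
for (string::size_type i = 0; i < filepath.size(); i++) {
    if (filepath[i] == '\\') {        // '\\' is a single backslash character
        filepath.insert(i, 1, '\\');  // insert(position, count, character)
        i++;                          // skip the backslash just inserted
    }
}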
My preferred way, though, is to search with find():
for(auto pos = filepath.find('\\'); pos != string::npos; pos = filepath.find('\\', ++pos))
filepath.insert(++pos, 1, '\\');
If you only need to replace a single character with another single character (e.g. with forward slashes, as used on Linux and generally accepted by Windows too), you can use std::replace() from <algorithm> to avoid the loop, as mentioned in another answer:
std::replace(filepath.begin(), filepath.end(), '\\', '/');
I assumed that you already have a file containing paths with single backslashes and that you are parsing it.
But from your comments I see that you are apparently getting the file paths directly at run time (i.e. while the .exe is running). In that case, as @MSalters has mentioned, you don't need to worry about such transformations (i.e. changing the backslashes).
The problem you're seeing is that in C++, string literals are enclosed in "" quotes. This brings up one minor problem: how do you put a quote inside a string literal when that quote would end the literal? The solution is to escape it with a \. This can also be used to add a few other characters to a string, such as \n (newline). And since \ now has a special meaning in string literals, it's also used to escape itself. So "\\" is a string containing just one character (plus, of course, a trailing NUL).
This also applies to character literals: char example[4] = {'a', '\\', 'b', 0} is an alternative way to write "a\\b".
Now, this is all about compile time, when the compiler needs to separate C++ code from string contents. Once your executable is running, a backslash is just one char. std::cout << "a\\b" prints a single backslash, because there's only one in memory. With std::string word; std::cin >> word; you read a single word, and if you enter one backslash, word will contain one backslash. The compiler isn't involved in that.
So if you read 3500 filenames from a std::ifstream list_of_filenames and then use them to create a further 3500 std::ifstreams, you only need to worry about backslashes when specifying that very first filename in code. And if you take that filename from argv[1] instead, you don't need to care at all.
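A minimal sketch of that workflow: it reads one filename per line from any std::istream, so a std::istringstream can stand in for the std::ifstream list_of_filenames in a self-contained test. ReadFilenames is a hypothetical name. Only string literals in the source need doubled backslashes; data read at run time is used as-is.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one filename per line from any input stream. Backslashes in the data
// are ordinary characters at run time, so nothing needs to be doubled.
std::vector<std::string> ReadFilenames(std::istream& in)
{
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty())
            names.push_back(line);  // stored exactly as read, single backslashes
    }
    return names;
}
```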
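One way to avoid the special handling of backslashes is to keep all file names in a separate disk file, as they are, and read them with a file stream such as ifstream; backslashes read at run time need no escaping.
#include <fstream>
#include <string>

std::ifstream ObjInFiles("E:\\filenames.txt");  // escaping is needed only in this literal
std::string filename;
std::getline(ObjInFiles, filename);              // reads the first line as-is
ObjInFiles.close();
Suppose the first file name stored in filenames.txt is e:\temp\abc.txt. After getline() above, the filename variable holds e:\temp\abc.txt with single backslashes, exactly as stored in the file. A debugger may display it as "e:\\temp\\abc.txt", which is just the same string written as a C++ literal.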

c++: argv contains some spaces

I want to pass a single parameter containing spaces to the main function of another program. Here is an example:
string param = "{\"abc\" \"de\"}"; // the string is {"abc" "de"}
boost::replace_all(param, "\"", "\\\""); // now it becomes: {\"abc\" \"de\"}
boost::replace_all(param, " ", "\\40"); // now it becomes: {\"abc\"\40\"de\"}
ShellExecute(GetDesktopWindow(), "open", "myMainTest.exe", param.c_str(), "", SW_SHOWNORMAL); // execute my function main in another project
// in the function main of myMainTest.exe
cout<<argv[1];
I got this result:
{"abc"\40"de"}
It means that the double quote is OK but the space is not.
IMHO, this is directly tied to the way Windows processes its command line. Arguments are normally split on spaces, except that a string enclosed in double quotes (") is processed as a single parameter after the quotes are removed.
This is quite different from how Unix-like shells process input. There is no escape sequence for a space, but a quote inside an argument can be escaped with a backslash (\"). So the actual string you must pass to ShellExecute is "{\"abc\" \"de\"}": the outer quotes keep the space inside a single argument, and each \" becomes a literal quote. All that remains is writing that in C++ source:
string param = "\"{\\\"abc\\\" \\\"de\\\"}\"";
ShellExecute(GetDesktopWindow(), "open", "myMainTest.exe", param.c_str(), "", SW_SHOWNORMAL);
And myMainTest.exe should see a single parameter: {"abc" "de"}
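Stacked escapes like these are easy to get wrong. A C++11 raw string literal lets you write the command-line text once, exactly as the target program's parser should see it. The sketch below spells the parameter for the {"abc" "de"} example from the question both ways; kEscapedParam and kRawParam are hypothetical names.

```cpp
#include <cassert>
#include <string>

// The parameter for the {"abc" "de"} example, written two ways.
// Ordinary literal: every quote and backslash meant for the target program
// must itself be escaped for the C++ compiler.
const std::string kEscapedParam = "\"{\\\"abc\\\" \\\"de\\\"}\"";
// C++11 raw string literal: the text between R"( and )" is taken verbatim.
const std::string kRawParam = R"("{\"abc\" \"de\"}")";
```

Either string's c_str() can then be passed to ShellExecute in the same way.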

How can I terminate a string with regex_replace?

I'm using CreateProcess to run a bash script via Cygwin's bash.exe and redirecting the output (because that's what the customer wants). The only problem still left to solve is that if ReadFile doesn't fill up lpBuffer I end up with a bunch of junk characters at the end of it, which I would like to filter out. Usually, this is something like:
"ÌÌÌÌ...ÌÌÌÌÌuÆì¨õD"
for which the code below will give me:
"uÆì¨õD"
So, I'm at least partially successful =D
However, what I'd really like is to just terminate the string at the first junk character, preferably with a newline also, but I can't seem to find a variation of fmt that works.
void ReadAndHandleOutput(HANDLE hPipeRead) {
char lpBuffer[256];
DWORD nBytesRead;
wstringstream wss;
while(TRUE)
{
if(!ReadFile(hPipeRead, lpBuffer, sizeof(lpBuffer), &nBytesRead, NULL) || !nBytesRead)
{
break;
}
// Filter out the weird non-ascii characters.
std::string buffer(lpBuffer);
std::regex rx("[^[:alnum:][:punct:][:space:]]+");
std::string fmt("\n\0");
std::regex_constants::match_flag_type fonly = std::regex_constants::format_first_only;
std::string result = std::regex_replace(buffer, rx, fmt, fonly);
wss << result.c_str();
}
SetWindowText(GetDlgItem(HwndMain, IDC_OUTPUT), LPCWSTR(wss.str().c_str())); }
I'm not sure regex is the right fix. I believe you should put a \0 where the input ends; you can find that position from the number of bytes read, which ReadFile returns in nBytesRead.
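A minimal sketch of both options, with the Windows calls left out: the buffer is assumed to have been filled by ReadFile, which reported nBytesRead. BytesToString and TerminateAt are hypothetical helper names. The ÌÌÌÌ bytes in the question are most likely the 0xCC fill that MSVC debug builds use for uninitialised stack memory, which is why the junk always looks similar.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Option 1: build the string from exactly the bytes that were read.
// nBytesRead is the count ReadFile reported (a DWORD in the real code).
std::string BytesToString(const char* buffer, std::size_t nBytesRead)
{
    return std::string(buffer, nBytesRead);  // explicit length, no '\0' needed
}

// Option 2: write the terminator yourself. The buffer must have room for it,
// so ask ReadFile for at most sizeof(lpBuffer) - 1 bytes.
void TerminateAt(char* buffer, std::size_t nBytesRead)
{
    buffer[nBytesRead] = '\0';
}
```

In the question's loop, the first option would replace std::string buffer(lpBuffer) with BytesToString(lpBuffer, nBytesRead), and the regex filtering would no longer be needed.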
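That said, these are the printable (non-junk) ASCII characters:
[ -~]
That is, every character from space (0x20) to tilde (0x7E). So this is the desired pattern:
[^ -~]+
Note that newlines, carriage returns and tabs are outside that range too, so this pattern also matches them; use [^ -~\r\n\t]+ if the output contains line breaks you want to keep.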
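If you do stay with a regex, the question's actual goal (cut the output at the first junk character) can be reached with std::regex_search instead of regex_replace: find the first match and keep only what precedes it. TruncateAtJunk is a hypothetical helper; it uses the extended class [^ -~\r\n\t] so that line breaks in the bash output survive. This only hides the symptom; using nBytesRead addresses the cause.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Keep everything before the first character that is not printable ASCII
// (space to tilde) or CR, LF, tab; drop that character and the rest.
std::string TruncateAtJunk(const std::string& text)
{
    static const std::regex junk("[^ -~\r\n\t]");
    std::smatch match;
    if (std::regex_search(text, match, junk))
        return text.substr(0, static_cast<std::string::size_type>(match.position(0)));
    return text;
}
```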