Could someone explain C++ escape character " \ " in relation to Windows file system? - c++

I'm really confused about the escape character " \ " and its relation to the windows file system. In the following example:
const char* fwdslash = "c:/myfolder/myfile.txt";
const char* backslash = "c:\myfolder\myfile.txt";
const char* dblbackslash = "c:\\myfolder\\myfile.txt";
std::ifstream file1(fwdslash); // Works
std::ifstream file2(dblbackslash); // Works
std::ifstream file3(backslash); // Doesn't work
I get what you are doing here is escaping a special character so you can use it in this string. In no way by placing a backslash in a string literal or std::string do you actually change the string ---
---Edit: This is completely wrong, and the source of my confusion---
So it seems that the escape character is only treated by certain classes or things to mean something other than a backslash, like outputting on the console, i.e., std::cout << "\hello"; will not print the backslash. In the case of ifstream (I'm not sure whether the same applies to the C fopen() version), it must be that this class or function treats backslashes as escape characters. I'm wondering: since the Windows file system uses backslashes, wouldn't it make sense for it to accept the simple string with backslashes, i.e., "c:\myfolder\myfile.txt"? Trying it this way fails.
Also, in my compiler (Visual Studio) when I include headers I can use .\ and ..\ to mean either the current folder, or the parent folder. I'm pretty sure the \ in this isn't related to the escape character, but are these forms specific to Windows, part of the C preprocessor, or part of the C or C++ language? I know that backslashes are a Windows thing, so I can't see any reason another system would expect backslashes even when using .\ and ..\
Thanks.

In no way by placing a backslash in a string literal[...] do you
actually change the string
You do. The compiler actually modifies the literal you wrote before embedding it into the compiled program. If a backslash is found in a string or character literal while parsing source code, it is dropped and the next character is treated specially: \n becomes a newline (line feed), \r becomes a carriage return, and so on. For escaped characters without a special meaning, the treatment is implementation-defined. Usually the character is simply kept unchanged, often with a compiler warning.
You cannot just pass "c:\myfolder\file.txt" because that is not the string your program will see. \m has no special meaning, so the m is typically kept, but \f is a form feed, so your program will typically see "c:myfolder", then a form feed character, then "ile.txt". This is why the escaped backslash (\\) has a special meaning: it lets you embed a backslash in the actual string your program will see.
The solution is to either escape your backslashes, or use raw string literals (C++11 onwards):
const char* path = R"(c:\myfolder\file.txt)";
Filenames given to #include directive are not string literals, even if they are in form "path\to\header", so substitution rules are not applied to them.
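A minimal sketch of the three spellings discussed above may make the difference concrete. The path and the function names are made up for illustration, nothing is opened on disk, and the folder is called "new" so that both escapes involved (\n and \f) are standard ones rather than the implementation-defined \m.

```cpp
#include <cassert>
#include <string>

// Illustrative path only; these helpers are hypothetical names.
inline std::string escaped_path() { return "c:\\new\\file.txt"; }   // doubled backslashes
inline std::string raw_path()     { return R"(c:\new\file.txt)"; }  // raw literal (C++11)
inline std::string mangled_path() { return "c:\new\file.txt"; }     // \n and \f are escapes

inline void check_paths() {
    // Escaped and raw spellings produce the same 15 characters.
    assert(escaped_path() == raw_path());
    assert(escaped_path().size() == 15);
    // In the single-backslash spelling each escape collapses to one character.
    assert(mangled_path().size() == 13);
    assert(mangled_path()[2] == '\n');   // newline instead of "\n"
    assert(mangled_path()[5] == '\f');   // form feed instead of "\f"
}
```

Calling check_paths() from main() should pass silently, since the differences are already fixed in the compiled strings before any file API sees them.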

A single backslash escapes the next character. To get a literal backslash you need to double it. As for the forward slash, the fact that it is accepted is probably a compatibility feature that follows the Unix tradition.
The Java world is similar: a single forward slash works as a path separator on both Windows and Unix, as does a doubled backslash on Windows.
To make it clearer why a single backslash doesn't work, remember that the following string contains a newline, a backslash and a tab:
"\n\\\t"
i.e. in an example like:
"c:\my\next\file.txt"
would actually produce:
"c:my
ext
ile.txt"
(the break before "ile.txt" is a form feed character, produced by \f)
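One way to see what the compiler actually stored is to turn the control characters back into visible escape sequences. The helper below is a hypothetical sketch that only knows a handful of escapes; it is not a complete inverse of the compiler's processing.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rewrite a few control characters as the escape
// sequences that produce them, so the stored text becomes visible.
inline std::string show_escapes(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            case '\f': out += "\\f";  break;
            case '\r': out += "\\r";  break;
            case '\\': out += "\\\\"; break;
            default:   out += c;      break;
        }
    }
    return out;
}

inline void check_show_escapes() {
    // "\n\\\t" holds exactly three characters: newline, backslash, tab.
    assert(std::string("\n\\\t").size() == 3);
    assert(show_escapes("\n\\\t") == R"(\n\\\t)");
    // The backslashes before 'n' and 'f' were consumed by the compiler.
    assert(std::string("c:\next\file.txt").size() == 14);
    assert(show_escapes("c:\next\file.txt") == R"(c:\next\file.txt)");
}
```

The second check uses a shortened path without \m, because the result of that escape is implementation-defined.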

Because when declaring a C string literal, the backslash escapes the next character to form special characters. This is so you can write newlines (\n), nulls (\0), carriage returns (\r), etc.
char* backslash = "c:\myfolder \myfile.txt";

Related

Make variable string ignore escape sequences

I'm currently facing an issue with a method passing a string to another method. The problem is that I want to prevent it from interpreting possible escape sequences.
The string I want to pass is not constant, so (as far as I know) using the R prefix to make it a raw literal is not applicable here since I have to use variables.
Furthermore, in some cases user input is included in the string (unconverted), so simply escaping those sequences by replacing each "\" character with "\\" is not an option either, because the input can include those sequences too.
To be more precise on the issue:
A string formatted like, for example, "\x10\x4 \x6(" is automatically converted into a non-human-readable form as soon as it gets passed to the next function. I want to prevent that conversion so that the next function, which needs to work with it, gets the exact same string.
Hope someone can help me since I'm new to c++ programming. Thanks in advance :D
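#include "pch.h"
#include <iostream>
#include <string>
// The class declaration was missing from the snippet; reconstructed from the definitions below.
class stringTester
{
public:
    std::string exampleString();
    void stringOutput(std::string test);
};
int main()
{
    stringTester stringtester;
    std::string test = stringtester.exampleString();
    stringtester.stringOutput(test);
}
std::string stringTester::exampleString()
{
    // The compiler replaces every escape sequence in this literal.
    std::string exampleInput = "\x10\x5\x1a\aTestInput\\n \x6(";
    return exampleInput;
}
void stringTester::stringOutput(std::string test)
{
    std::cout << test << std::endl;
}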
The actual output here (copied from console) is " TestInput\n ( ", whereas the wanted output would be the original string "\x10\x5\x1a\aTestInput\n \x6("
Edit: It seems that SO can't show the unknown characters. There are extra characters before and after the "TestInput\n ("
When you write a string literal in your source code the compiler replaces escape sequences with the character that they represent. That's why the quoted string in your example gets turned into nonsense. The way to fix that is to either replace each backslash with two backslashes or to make it a raw string literal.
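To make that replacement visible, here is a rough sketch that prints non-printable characters in \x notation. The helper name to_visible is hypothetical, and the original spelling cannot be recovered exactly: \a and \x7, for instance, compile to the same byte, so the sketch always writes two hex digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: show non-printable bytes as \xNN and backslashes as \\.
inline std::string to_visible(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (c == '\\') {
            out += "\\\\";
        } else if (std::isprint(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[5];  // backslash, 'x', two hex digits, terminating null
            std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

inline void check_to_visible() {
    const std::string compiled = "\x10\x5\x1a\aTestInput\\n \x6(";
    const std::string raw = R"lit(\x10\x5\x1a\aTestInput\\n \x6()lit";
    assert(compiled.size() == 18);  // each escape became a single character
    assert(raw.size() == 30);       // the raw literal keeps every typed character
    assert(to_visible(compiled) == R"lit(\x10\x05\x1a\x07TestInput\\n \x06()lit");
}
```

The size difference (18 versus 30) is the whole story: the escaped literal and the raw literal are simply different strings by the time the program runs.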
When your program reads text input it doesn't do any of those adjustments. So if the code does
std::string input;
std::getline(std::cin, input); // operator>> would stop at the first space
and the user types the characters \x10\x5\x1a\aTestInput\\n \x6( into the console, input will end up with exactly the characters \x10\x5\x1a\aTestInput\\n \x6(.
Once you've got the string, whether as a string literal or as text from the console, you can do whatever you want with it.
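A small sketch of the run-time side, assuming a std::istringstream as a stand-in for the console so that it can be run without typing anything. The name read_line is made up, and std::getline is used so that the space inside the text is kept.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read one line exactly as typed, with no escape processing.
inline std::string read_line(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
}

inline void check_read_line() {
    // The raw literal models the keystrokes a user would type.
    std::istringstream typed(R"in(\x10\x5 \x6()in");
    const std::string input = read_line(typed);
    assert(input.size() == 12);                       // every typed character survives
    assert(input[0] == '\\' && input[1] == 'x');      // a real backslash, then 'x'
    assert(input == "\\x10\\x5 \\x6(");               // same text as an escaped literal
}
```

Passing std::cin instead of the string stream would behave the same way for real console input.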
You have two possibilities for a backslash to remain a backslash in your C/C++ strings (and Java, JavaScript, PHP...)
Double all the Backslashes
Just as you said, you want to double all backslashes. This is fine. If the input was:
\\\\
Then your C/C++ string is going to be:
"\\\\\\\\"
(a mouthful, I know...)
Use the Hex/Octal Character
The other way, if you don't like the double backslash too much (if it scares you, somehow), is to use the character sequence in octal or hex (or Unicode in newer versions):
\ becomes "\134" or "\x5C"
As you may notice, though, this means 4 characters per backslash. So most people will generally just double the backslash (only 2 characters). Plus, the double backslash is well understood, whereas the code point may not be as well known to programmers coming after you.
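As a sketch of the doubling approach, the hypothetical helper below prepares arbitrary text for pasting between the quotes of a C or C++ string literal. It also escapes double quotes, which the side note that follows recommends. The checks confirm that the octal and hex spellings really are the same characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: make text safe to paste inside "..." in source code
// by prefixing every backslash and double quote with a backslash.
inline std::string escape_for_literal(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '\\' || c == '"') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

inline void check_escape_for_literal() {
    // Four backslashes of input become eight in the source text.
    assert(escape_for_literal(R"(\\\\)") == R"(\\\\\\\\)");
    assert(escape_for_literal(R"x(say "hi")x") == R"x(say \"hi\")x");
    // Octal and hex escapes name the same characters as the short forms.
    assert(std::string("\134") == "\\");
    assert(std::string("\x5C") == "\\");
    assert(std::string("\042") == "\"");
    assert(std::string("\x22") == "\"");
}
```

One caveat with the hex form: a \x escape consumes every hex digit that follows it, so "\x5Cab" is not a backslash followed by "ab". The doubled backslash has no such trap.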
As a side note, if your user can enter any character, then they can also enter the double quote (") character. It is important that you also escape those. You can similarly use the backslash and the double quote character or its code point:
\" or \042 or \x22

string Regex using lex [duplicate]

I am learning to make a compiler and it's got some rules like single string:
char ch[] ="abcd";
and multi string:
printf("This is\
a multi\
string");
I wrote the regular expression
STRING \"([^\"\n]|\\{NEWLINE})*\"
It works fine with single line string but it doesn't work with multi line string where one line ends with a '\' character.
What should I change?
A common string pattern is
\"([^"\\\n]|\\(.|\n))*\"
This will match strings which include escaped double quotes (\") and backslashes (\\). It uses \\(.|\n) to allow any character after a backslash. Although some backslash sequences are longer than one character (\x40), none of them include non-alphanumerics after the first character.
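For experimenting with the pattern outside of lex, here is a rough equivalent using std::regex with its default ECMAScript grammar. This is an assumption-laden stand-in, not flex syntax: quotes need no escaping here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: does `text` consist of exactly one string token?
// The pattern mirrors the lex rule: any character except quote, backslash
// and newline, or a backslash followed by any character including newline.
inline bool is_string_token(const std::string& text) {
    static const std::regex pattern(R"re("([^"\\\n]|\\(.|\n))*")re");
    return std::regex_match(text, pattern);
}

inline void check_is_string_token() {
    assert(is_string_token(R"s("abcd")s"));
    assert(is_string_token(R"s("a\"b")s"));                          // escaped quote
    assert(is_string_token("\"This is\\\na multi\\\nstring\""));     // backslash-newline
    assert(!is_string_token("\"abc\ndef\""));                        // bare newline
    assert(!is_string_token("\"unterminated"));
}
```

std::regex is a backtracking engine, so this is only meant for checking short samples, not for scanning real source files.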
It is possible that your input includes Windows line endings (CR-LF), in which case the backslash will not be directly followed by a newline; it will be followed by a carriage return. If you want to accept that input rather than throwing an error (which might be more appropriate), you need to do so explicitly:
\"([^"\\\n]|\\(.|\r?\n))*\"
But recognising a string and understanding what the string represents are two different things. Normally a compiler will need to turn the representation of a string into a byte sequence and that requires, for example, turning \n into the byte 10 and removing backslashed newlines altogether.
That task can easily be done in a (f)lex scanner using start conditions. (Or, of course, you can rescan the string using a different lexical scanner.)
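The conversion step itself can be sketched in plain C++, independent of how the scanner is written. The function below is a hypothetical, deliberately incomplete decoder: it handles \n, \t and backslash-newline, keeps any other escaped character as-is, and ignores hex and octal escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: turn the text between the quotes of a string token
// into the characters it represents.
inline std::string decode_literal_body(const std::string& body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\' || i + 1 == body.size()) {
            out += body[i];          // ordinary character (or a trailing backslash)
            continue;
        }
        const char next = body[++i]; // the character after the backslash
        switch (next) {
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case '\n': break;        // backslash-newline: emit nothing
            default:   out += next; break;  // \\ and \" keep the second character
        }
    }
    return out;
}

inline void check_decode_literal_body() {
    assert(decode_literal_body("This is\\\na multi\\\nstring") == "This isa multistring");
    assert(decode_literal_body(R"(a\nb)") == "a\nb");
    assert(decode_literal_body(R"x(say \"hi\")x") == "say \"hi\"");
}
```

In a (f)lex scanner the same case analysis would typically live in the rules of a start condition, with each rule appending to a buffer.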
Additionally, you need to think about error-handling. Once you ban strings with unescaped newlines (as C does), you open the door to the possibility of an unterminated string, where a newline is encountered before the closing quote. The same could happen at the end of the file if a string is not correctly closed.
If you have a single-character fallback rule, it will recognise the opening quote of an unterminated string. This is not desirable because it will then scan the contents of the string as program text leading to a cascade of errors. If you are not attempting error recovery it doesn't matter, but if you are it is usually better to at least recognize the unterminated string as such up to the newline, using a different pattern.

C++ - Escaping or disabling backslash on string

I am writing a C++ program to solve a common problem of message decoding. Part of the problem requires me to get a bunch of random characters, including '\', and map them to a key, one by one.
My program works fine in most cases, except that when I read characters such as '\' from a string, I obviously get a completely different character representation (e.g. '\0' yields a null character, or '\' simply escapes itself when it needs to be treated as a character).
Since I am not supposed to have any control on what character keys are included, I have been desperately trying to find a way to treat special control characters such as the backslash as the character itself.
My questions are basically these:
Is there a way to turn all special characters off within the scope of my program?
Is there a way to override current digraphs definitions of special characters and define them as something else (like digraphs using very rare keys)?
Is there some obscure method on the String class that I missed which can force the actual character on the string to be read instead of the pre-defined constant?
I have been trying to look for a solution for hours now but all possible fixes I've found are for other languages.
Any help is greatly appreciated.
If you read in a string like "\0" from stdin or a file, it will be treated as two separate characters: '\\' and '0'. There is no additional processing that you have to do.
Escaping characters is only used for string/character literals. That is to say, when you want to hard-code something into your source code.
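A minimal sketch of the decoding scenario, assuming the key is a std::map<char, char> (the question does not say how the key is stored). The backslash only needs escaping where it is written in source code; the message read from the stream needs no special treatment.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: replace each character that has an entry in `key`,
// leaving all other characters untouched.
inline std::string apply_key(const std::string& message,
                             const std::map<char, char>& key) {
    std::string out;
    for (char c : message) {
        const auto it = key.find(c);
        out += (it != key.end()) ? it->second : c;
    }
    return out;
}

inline void check_apply_key() {
    // The stream stands in for stdin or a file containing the two
    // characters backslash and zero.
    std::istringstream source(R"(\0)");
    std::string message;
    source >> message;
    assert(message.size() == 2);  // two characters, not one null character

    // '\\' in source code is the single backslash character.
    const std::map<char, char> key{{'\\', 'h'}, {'0', 'i'}};
    assert(apply_key(message, key) == "hi");
}
```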

Is it necessary to escape slashes in strings in Windows Registry?

This is a question mostly concerning WinAPI RegSetValueEx. If you look at its description on MSDN you'd find:
lpData [in] The data to be stored.
REG_SZ, the string must be null-terminated. With the REG_MULTI_SZ data
type, the string must be terminated with two null characters. A
backslash must be preceded by another backslash as an escape
character. For example, specify "C:\\mydir\\myfile" to store the
string "C:\mydir\myfile".
The question I have, do I really need to escape slashes? Because I've never done that before and it worked perfectly fine.
This is indeed a documentation error. You do not need to escape backslashes here. The exact string that you send to this API is what will be stored in the registry. No processing of backslashes will be performed.
Now, it's true that in C and C++ you need to escape certain characters in string literals, but that's not pertinent to a Win32 API documentation. That's an issue for source code to object code translation for specific languages and quite beyond the remit of this documentation.
Yes, because \ has a meaning in C++, whereas \\ means an ordinary backslash.
When \ appears in a string, C++ compiler will look at the next character and convert the combination into something (for example \n will be converted into a "newline" character). \\ will be converted into a regular backslash. This is called "escaping" (historically, on old terminals, the ESC+key combination was used for many keys that were not on the keyboard).
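The two answers can be reconciled with a portable sketch that deliberately does not call RegSetValueEx. It only inspects the bytes that would be handed to it. The helper names are hypothetical, and a narrow (char) string is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// What is typed in source code: backslashes doubled for the compiler's benefit.
inline const char* path_literal() { return "C:\\mydir\\myfile"; }

// Hypothetical helper: byte count of a narrow REG_SZ payload, including
// the terminating null character.
inline std::size_t reg_sz_byte_count(const char* value) {
    return std::strlen(value) + 1;
}

inline void check_registry_payload() {
    // At run time the string already contains single backslashes.
    assert(std::string(path_literal()) == R"(C:\mydir\myfile)");
    // 15 characters plus the terminating null.
    assert(reg_sz_byte_count(path_literal()) == 16);
}
```

So the doubling is a property of the C or C++ source text, while the API receives, and stores, the string with single backslashes.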