read regular expression from .ini file [duplicate] - regex

When I create a string containing backslashes, they get duplicated:
>>> my_string = "why\does\it\happen?"
>>> my_string
'why\\does\\it\\happen?'
Why?

What you are seeing is the representation of my_string created by its __repr__() method. If you print it, you can see that you've actually got single backslashes, just as you intended:
>>> print(my_string)
why\does\it\happen?
The string below has three characters in it, not four:
>>> 'a\\b'
'a\\b'
>>> len('a\\b')
3
You can get the standard representation of a string (or any other object) with the repr() built-in function:
>>> print(repr(my_string))
'why\\does\\it\\happen?'
Python represents backslashes in strings as \\ because the backslash is an escape character - for instance, \n represents a newline, and \t represents a tab.
This can sometimes get you into trouble:
>>> print("this\text\is\not\what\it\seems")
this ext\is
ot\what\it\seems
Because of this, there needs to be a way to tell Python you really want the two characters \n rather than a newline, and you do that by escaping the backslash itself, with another one:
>>> print("this\\text\is\what\you\\need")
this\text\is\what\you\need
When Python returns the representation of a string, it plays safe, escaping all backslashes (even if they wouldn't otherwise be part of an escape sequence), and that's what you're seeing. However, the string itself contains only single backslashes.
More information about Python's string literals can be found at: String and Bytes literals in the Python documentation.
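The difference between a string's contents and its representation can be checked directly. The following is a minimal sketch that reuses my_string from the question; the names raw_string, same and backslash_count are only illustrative.

```python
# The literal is written with escaped backslashes in source code,
# but the resulting string holds single backslashes.
my_string = "why\\does\\it\\happen?"

print(my_string)        # the actual characters
print(repr(my_string))  # a literal you could paste back into source code
print(len("a\\b"))      # counts one backslash between 'a' and 'b'

# A raw string literal leaves backslashes alone, so both spellings are equal.
raw_string = r"why\does\it\happen?"
same = (my_string == raw_string)
backslash_count = my_string.count("\\")
print(same, backslash_count)
```

Writing the literal as a raw string is usually the least error-prone spelling when the text is full of backslashes.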

As Zero Piraeus's answer explains, using single backslashes like this (outside of raw string literals) is a bad idea.
But there's an additional problem: in the future, it will be an error to use an undefined escape sequence like \d, instead of meaning a literal backslash followed by a d. So, instead of just getting lucky that your string happened to use \d instead of \t so it did what you probably wanted, it will definitely not do what you want.
As of 3.6, it already raises a DeprecationWarning, although most people don't see those; since 3.12 it is a more visible SyntaxWarning. It will become a SyntaxError in some future version.
In many other languages, including C, using a backslash that doesn't start an escape sequence means the backslash is ignored.
In a few languages, including Python, a backslash that doesn't start an escape sequence is a literal backslash.
In some languages, to avoid confusion about whether the language is C-like or Python-like, and to avoid the problem with \Foo working but \foo not working, a backslash that doesn't start an escape sequence is illegal.
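To see how a particular Python interpreter treats an undefined escape sequence such as \d, the source text can be compiled with warnings recorded. This is only a sketch: classify_invalid_escape is a hypothetical helper, and the category it reports depends on the Python version in use.

```python
import warnings

def classify_invalid_escape(source):
    """Compile `source` and report how this interpreter reacts.

    Returns the name of the first warning category raised, "SyntaxError"
    if compilation fails, or "no warning" otherwise.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make normally hidden warnings visible
        try:
            compile(source, "<example>", "exec")
        except SyntaxError:
            return "SyntaxError"
    return caught[0].category.__name__ if caught else "no warning"

# The source text below contains a single backslash followed by 'd'.
invalid = classify_invalid_escape('s = "\\d"')
# Doubling the backslash, or using a raw literal, avoids the problem.
escaped = classify_invalid_escape('s = "\\\\d"')
raw = classify_invalid_escape('s = r"\\d"')
print(invalid, escaped, raw)
```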

Related

Why does it give me an error when opening a txt file? [duplicate]

I'm really confused about the escape character " \ " and its relation to the windows file system. In the following example:
char* fwdslash = "c:/myfolder/myfile.txt";
char* backslash = "c:\myfolder\myfile.txt";
char* dblbackslash = "c:\\myfolder\\myfile.txt";
std::ifstream file(fwdslash); // Works
std::ifstream file(dblbackslash); // Works
std::ifstream file(backslash); // Doesn't work
I get what you are doing here is escaping a special character so you can use it in this string. In no way by placing a backslash in a string literal or std::string do you actually change the string ---
---Edit: This is completely wrong, and the source of my confusion---
So it seems that the escape character is only treated by certain classes or things to mean something other than a backslash, like outputting on the console, ie., std::cout << "\hello"; will not print the backslash. In the case of ifstream (or I'm not sure if the same applies with the C fopen() version), it must be that this class or function treats backslashes as escape characters. I'm wondering, since the Windows file system uses backslashes wouldn't it make sense for it to accept the simple string with backslashes, ie., "c:\myfolder\myfile.txt" ? Trying it this way fails.
Also, in my compiler (Visual Studio) when I include headers I can use .\ and ..\ to mean either the current folder, or the parent folder. I'm pretty sure the \ in this isn't related to the escape character, but are these forms specific to Windows, part of the C preprocessor, or part of the C or C++ language? I know that backslashes are a Windows thing, so I can't see any reason another system would expect backslashes even when using .\ and ..\
Thanks.
In no way by placing a backslash in a string literal[...] do you
actually change the string
You do. The compiler modifies the literal you wrote before embedding it into the compiled program. When a backslash appears in a string or character literal, the compiler drops it and treats the next character specially: \n becomes a newline, and so on. For escaped characters without a special meaning, the treatment is implementation-defined; usually the character is simply left unchanged.
You cannot just pass "c:\myfolder\myfile.txt" because that is not the string your program will see. Your program will typically see "c:myfoldermyfile.txt" instead (a name such as \file.txt would be worse, because \f is a form feed). This is why the escaped backslash \\ has a special meaning: it lets you embed a backslash in the actual string your program will see.
The solution is to either escape your backslashes, or use raw string literals (C++11 onwards):
const char* path = R"(c:\myfolder\file.txt)";
Filenames given to an #include directive are not string literals, even when written in the form "path\to\header", so these substitution rules are not applied to them.
A single backslash escapes the next character. To get a literal backslash, you need to double it. As for the forward slash, accepting it is probably a compatibility feature that follows the Unix tradition.
Java behaves similarly: a single forward slash works as a path separator on both Windows and Unix, and so does a double backslash on Windows.
To make it more clear why single backslash doesn't work, just remember that the following String practically produces a newline, a backslash and a tab:
"\n\\\t"
i.e. an example like:
"c:\my\next\file.txt"
would actually produce:
"c:my
ext
ile.txt"
(the break before "ile.txt" is a form feed character, \f)
This happens because, in a C string literal, a backslash escapes the next character. That is how you write special characters such as newlines (\n), nulls (\0) and carriage returns (\r).
char* backslash = "c:\myfolder \myfile.txt";
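The same pitfall can be reproduced outside C++. The sketch below is in Python rather than C++, and is only an analogy: Python's rules for unknown escapes differ, so the path uses only \n and \f, which are valid escapes in both languages. The path itself is made up for illustration.

```python
from pathlib import PureWindowsPath

# \n and \f are valid escapes, so this literal contains no backslashes at all.
broken = "c:\next\file.txt"

escaped = "c:\\next\\file.txt"   # doubled backslashes
raw = r"c:\next\file.txt"        # raw string literal
forward = "c:/next/file.txt"     # forward slashes

print(repr(broken))
print(escaped == raw)
# PureWindowsPath prints forward slashes as backslashes.
print(str(PureWindowsPath(forward)) == raw)
```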

Why do regexes and string literals use different escape sequences?

The handling of escape sequences varies across languages and between string literals and regular expressions. For example, in Python the \s escape sequence can be used in regular expressions but not in string literals, whereas in PHP the \f form feed escape sequence can be used in regular expressions but not in string literals.
In PHP, there is a dedicated page for PCRE escape sequences (http://php.net/manual/en/regexp.reference.escape.php) but it does not have an official list of escape sequences that are exclusive to string literals.
As a beginner in programming, I am concerned that I may not have a full understanding of the background and context of this topic. Are these concerns valid? Is this an issue that others are aware of?
Why do different programming languages handle escape sequences differently between regular expressions and string literals?
The escape sequences found in string literals are there to stop the programming language from getting confused. For example, in many languages a string literal is denoted as characters between quotes, like so
my_string = 'x string'
But if your string contains a quote character then you need a way to tell the programming language that this should be interpreted as a literal character
my_string = 'x's string' # this is a syntax error: the string ends at the second quote
my_string = 'x\'s string' # tells the programming language that the inner quote is literal, not the end of the string
I think that most programming languages have a similar set of escape sequences for string literals.
Regexes are a different story: you can think of them as their own separate language that is written as a string literal. In a regex, some characters, like the period (.), have a special meaning and must be escaped to match their literal counterpart. Other characters work the opposite way: they are ordinary on their own and gain a special meaning when preceded by a backslash.
For example
regex_string = 'A.C' # match an A, followed by any character, followed by C
regex_string = 'A\.C' # match an A, followed by a period, followed by C
regex_string = 'AsC' # match an A, followed by s, followed by C
regex_string = 'A\sC' # match an A, followed by any whitespace character, followed by C
Because regexes are their own mini-language it doesn't make sense that all of the escape sequences in regexes are available to normal string literals.
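The four patterns above can be tried out with Python's re module. This is a small sketch, not part of the original answer; the subject strings are invented for illustration, and raw string literals are used so that the backslashes reach the regex engine unchanged.

```python
import re

# Raw string literals (r"...") pass the backslashes through to the regex engine.
patterns = {
    "any character": r"A.C",
    "literal period": r"A\.C",
    "letter s": r"AsC",
    "whitespace": r"A\sC",
}
subjects = ["AxC", "A.C", "AsC", "A C", "A\tC"]

# For each pattern, collect the subjects it matches in full.
matches = {
    name: [s for s in subjects if re.fullmatch(pattern, s)]
    for name, pattern in patterns.items()
}
for name, matched in matches.items():
    print(name, matched)
```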
Regular expressions are best thought of as a language in themselves, which have their own syntax. Some programming languages offer a literal syntax specifically for describing a regex, but usually a regex will be compiled from an existing string. If you create that string from literal syntax, that uses a different set of escape sequences because it is a different kind of thing, created with a different syntax, for a different context, in a different language. That's the simple and direct answer to the question.
There are different needs and requirements. Regexes have to be able to describe things that aren't a single, specific sequence of text. String literals obviously don't have that problem, but they do need a way to, say, include quotation marks in the text. That usually isn't a problem for regex syntax, because the content of the string is already determined by that point. (Some languages have a "regex literal" syntax, typically enclosing the regex in forward slashes. In these languages, forward slashes that are supposed to be part of the regex need to be escaped.)
Although I understand the obvious (\s represents multiple characters and would introduce ambiguity)
Ambiguity isn't actually a concern for most languages that support regex. It often happens that the string literal syntax and the regex syntax use the same sequence to mean different things. For example: \b represents a word boundary in regex syntax, but many languages' string literal syntax also uses it to represent a backspace character, Unicode code point 8. (Unless you meant that \s to mean "any whitespace character" doesn't make sense in the string literal context but only in the regex context - then yes, of course.)
But keep in mind - if the regex is being compiled from a string literal, then first the string literal is interpreted to figure out what the string actually contains, and then that string is used to create the regex. These are separate steps that can and do apply separate rules, so there is no conflict.
This sometimes means that code has to use a double escaping mechanism: first for the string literal, and then for the regex syntax. If you want a regex that matches a literal backslash, you might end up typing four backslashes in a string literal - since that code will create a string that actually contains only two backslashes, which in turn is what the regex syntax requires. (Some languages offer some kind of "raw" string literal facility to work around this.)
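The two stages can be made visible in Python. This is a sketch under the assumption that Python's re module is the regex engine; the variable names are illustrative only.

```python
import re

# Stage 1: the string literal. Four backslashes in source become two characters.
pattern_text = "\\\\"
# A raw literal reaches the same two characters with less typing.
same_as_raw = (pattern_text == r"\\")

# Stage 2: the regex engine reads those two characters as one literal backslash.
literal_backslash = re.compile(pattern_text)
found = literal_backslash.findall("a\\b\\c")  # the subject holds two backslashes

# The same two-stage reading explains \b: it is a backspace in a plain literal,
# and a word boundary once the backslash survives to the regex engine.
backspace = "\b"
word_boundary = re.compile(r"\bcat\b")
print(len(pattern_text), same_as_raw, found, ord(backspace))
```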

Is it necessary to escape slashes in strings in Windows Registry?

This is a question mostly concerning the WinAPI function RegSetValueEx. If you look at its description in MSDN, you'll find:
lpData [in] The data to be stored.
REG_SZ, the string must be null-terminated. With the REG_MULTI_SZ data
type, the string must be terminated with two null characters. A
backslash must be preceded by another backslash as an escape
character. For example, specify "C:\\mydir\\myfile" to store the
string "C:\mydir\myfile".
My question: do I really need to escape backslashes? I've never done that before, and it has worked perfectly fine.
This is indeed a documentation error. You do not need to escape backslashes here. The exact string that you send to this API is what will be stored in the registry. No processing of backslashes will be performed.
Now, it's true that in C and C++ you need to escape certain characters in string literals, but that's not pertinent to a Win32 API documentation. That's an issue for source code to object code translation for specific languages and quite beyond the remit of this documentation.
Yes, because \ has a meaning in C++, whereas \\ means an ordinary backslash.
When \ appears in a string literal, the C++ compiler looks at the next character and converts the combination into something else (for example, \n is converted into a newline character). \\ is converted into a regular backslash. This is called "escaping" (historically, on old terminals, the ESC+key combination was used for many keys that were not on the keyboard).

How to do string concatenation in gdb/ada

According to the manual, string concatenation isn't implemented in gdb. I need it however, so is there a way to achieve this, perhaps using array functions?
I don't have a copy of gdb around to try this on, but perhaps this line from later in the Ada section of the document will help you?
Rather than use catenation and
symbolic character names to introduce
special characters into strings, one
may instead use a special bracket
notation, which is also used to print
strings. A sequence of characters of
the form '["XX"]' within a string or
character literal denotes the (single)
character whose numeric encoding is XX
in hexadecimal. The sequence of
characters '["""]' also denotes a
single quotation mark in strings. For
example, "One line.["0a"]Next
line.["0a"]"
contains an ASCII newline character
(Ada.Characters.Latin_1.LF) after each
period.
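To make the bracket notation concrete, here is a small Python sketch that expands it the way the quoted passage describes. It is not gdb code: decode_brackets is a hypothetical helper written only to illustrate what the notation means.

```python
import re

def decode_brackets(text):
    """Expand Ada-style bracket notation in `text` (hypothetical helper)."""
    def expand(match):
        body = match.group(1)
        if body == '"':
            return '"'               # ["""] stands for one quotation mark
        return chr(int(body, 16))    # ["XX"] stands for the character with hex code XX
    return re.sub(r'\["("|[0-9A-Fa-f]+)"\]', expand, text)

decoded = decode_brackets('One line.["0a"]Next line.["0a"]')
quoted = decode_brackets('say ["""]hi["""]')
print(repr(decoded), quoted)
```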
For Objective-C:
[@"asd" stringByAppendingString:@"zxc"]
[@"ID: " stringByAppendingString:(NSString*) [aTaskDict valueForKey:@"ID"]]