This question may be silly, but it would be great if I could understand the behavior.
I am trying to print
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
using a simple program
char testme [] ="\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\0";
cout<<"testme:"<<testme<<endl;
The output in this case is
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
I intend to print 64 "\" characters, but the output is 32 "\" characters.
There seems to be something I am missing, since the output is exactly half.
Edit: The reason I am asking is that I have to XOR (^) "\" with another char for HMAC, and I see some weird things.
In C++11 you can do it like this:
char testme [] =R"(\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\0)";
cout<<"testme:"<<testme<<endl;
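R"(...)" is the syntax for raw string literals.
To represent a backslash (\) in an ordinary string literal, we have to precede it with another backslash. To avoid the errors that come from piling up backslashes, C++11 provides raw string literals, in which a backslash is just a backslash. Note that this also applies to the trailing \0 above: inside a raw literal it is not an escape, so it is printed as a backslash followed by 0 and should be dropped.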
This is called escaping and is a mechanism to insert certain characters into a string. For example, if you want to insert a citation mark into a string, you need to escape it.
char testme [] ="I am a so called \"programmer\".";
There are also \n, \t and other codes. The same rule applies to \ itself, since you might want a string that says \n without it being converted into a newline character.
char testme [] ="This is a backslash followed by the letter n: \\n";
\\ is used to denote a single backslash: \. This is because \ is used in string literals to denote other symbols like \t for a tab, \n for a newline and \" for a quotation character.
So \\ gives you one backslash, \\\\ gives you two and so on.
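The counting rule can be checked directly. The following is a minimal sketch, assuming a C++11 compiler; check_backslash_counts is a made-up name used only for illustration.

```cpp
#include <cassert>
#include <cstring>

// sizeof on a string literal counts the stored characters plus the
// terminating NUL, so it shows how many characters each spelling produces.
static_assert(sizeof("\\") == 2, "one backslash + NUL");
static_assert(sizeof("\\\\") == 3, "two backslashes + NUL");
static_assert(sizeof("\\n") == 3, "backslash, 'n', NUL");
static_assert(sizeof("\n") == 2, "one newline character + NUL");

// Hypothetical helper: run-time check that an escaped literal and a raw
// literal spell the same two-backslash string.
void check_backslash_counts() {
    assert(std::strlen("\\\\") == 2);
    assert(std::strcmp("\\\\", R"(\\)") == 0);
}
```

Calling check_backslash_counts() from main should pass silently; the static_assert lines are verified at compile time.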
To print a \, the C standard says:
C11; 6.4.4.4 Character constants:
The double-quote " and question-mark ? are representable either by themselves or by the
escape sequences \" and \?, respectively, but the single-quote ' and the backslash \
shall be represented, respectively, by the escape sequences \' and \\. [1]
That means that to print one \ you need an extra backslash. To print two (\\) you need four (\\\\), and hence for 64 backslashes you need 128.
[1] Emphasis is mine.
\ is a special character known as the escape character. For example, \n means the newline character. So, if you want to print a single \, you have to write \\. The first \ tells the compiler not to treat the next \ as an escape character.
If it is C++, why not use string:
string testme(64, '\\');
cout << testme << endl;
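Since the question's edit mentions XOR-ing "\" with other bytes, here is a rough sketch of doing that on a 64-byte block without typing any long literal. The helper name xor_with_backslash and the constant kBlockSize are assumptions made for illustration, and the code assumes an ASCII-compatible character set in which '\\' is 0x5C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static_assert('\\' == 0x5C, "this sketch assumes an ASCII-compatible character set");

// Assumed block size, taken from the 64 characters in the question.
constexpr std::size_t kBlockSize = 64;

// Hypothetical helper: XOR every byte of a 64-byte block with '\\'.
std::vector<std::uint8_t> xor_with_backslash(const std::vector<std::uint8_t>& block) {
    assert(block.size() == kBlockSize);
    std::vector<std::uint8_t> out(block.size());
    for (std::size_t i = 0; i < block.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(block[i] ^ '\\');
    }
    return out;
}
```

Working on bytes rather than on a C string also avoids surprises when an XOR result happens to be 0, which would end a NUL-terminated string early.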
The backslash \ is a very widespread escape character, and C++ also uses it like that. This means it's used to express special meaning (usually nonprintable characters). For example, to encode a line-feed character (ASCII 10) into a string, you express it as \n in the string literal. Another example, putting a single backslash at the end of a line (that is, before the line's terminating newline character) escapes the newline - so this way, you can continue a macro definition or //-style comment across several source file lines, and they will still count as one logical line.
This of course means that to get a literal backslash character, you have to escape the backslash itself to remove its "escape character" status. So typing \\ into a string literal yields a literal \ character.
That's why you get only half the amount of backslashes output - the C++ source code parser consumes two to produce one.
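To make the "consumes two to produce one" step concrete, here is a simplified sketch of what the compiler does with the text between the quotes. unescape is a hypothetical helper, not a real compiler routine, and it handles only a handful of escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: turns the text written between the quotes into the
// characters that end up in the string.
// Only \n, \t and \0 are translated; any other escaped character
// (including \\, \" and \') is copied through as itself.
std::string unescape(const std::string& source_text) {
    std::string out;
    for (std::size_t i = 0; i < source_text.size(); ++i) {
        if (source_text[i] == '\\' && i + 1 < source_text.size()) {
            const char next = source_text[++i];  // consume the escaped character
            switch (next) {
                case 'n': out += '\n'; break;
                case 't': out += '\t'; break;
                case '0': out += '\0'; break;
                default:  out += next; break;
            }
        } else {
            out += source_text[i];
        }
    }
    return out;
}

// 64 backslashes typed in the source collapse to 32 in the result.
void check_halving() {
    assert(unescape(std::string(64, '\\')) == std::string(32, '\\'));
}
```

A real compiler rejects or warns about unknown escapes instead of copying them through; that part of the sketch is a deliberate simplification.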
Didn't you notice one thing?
You typed 64 '\' but it printed only 32 of them.
Did you try 60, or 54, or an odd number, say 33?
In C, '\' is the escape character. You have surely used '\n' for a newline; didn't you notice then that the '\' is not printed?
To print '\' you must use '\\'.
A question for you:
Try printing 64 '%'. See what you get. Try understanding the reason for the output.
I am trying to read file path from user as command line in simple format
ex:
path="c:\files\sample.txt"
but when I try to open it using file.open("c:\files\sample.txt"); the file is not found.
So I used path.replace(path.begin(),path.end(),"\\","\\\\") to change \ to \\, but it's not working.
Help me!
As you seem to know
path="c:\files\sample.txt";
is incorrect, it should be
path="c:\\files\\sample.txt";
(or forward slashes work as well)
path="c:/files/sample.txt";
But this is also incorrect
path="c:\files\sample.txt";
path.replace(path.begin(),path.end(),"\\","\\\\");
The second line replaces all the single backslashes with double backslashes, but there are no single backslashes in your original string.
\f is an escape sequence for the form feed character, and \s is not even a valid escape sequence.
Escape sequences are used in string literals to denote characters which would otherwise be hard to write. For instance you cannot easily put " in a string literal because it would end the string, so the escape sequence \" exists to let you do this. Similarly since the backslash character is used to start an escape sequence, the escape sequence \\ stands for the backslash character itself.
These rules apply to string literals only. If you are reading a string from the user, there is no need to replace the backslashes with double backslashes; that makes no sense, because single backslashes are what you want in a file path. It's just that the way to get single backslashes in a string literal is to write double backslashes.
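A small sketch of the point, assuming C++11: the escaped literal and the raw literal hold the same 19 characters, each with single backslashes. to_forward_slashes and path_literal_check are hypothetical helper names; the first is only needed if you prefer forward slashes in a path obtained at run time.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: replaces every backslash in a run-time path with '/'.
std::string to_forward_slashes(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}

// The doubled backslashes exist only in the source code, not in the string.
void path_literal_check() {
    const std::string escaped = "c:\\files\\sample.txt";
    const std::string raw = R"(c:\files\sample.txt)";
    assert(escaped == raw);
    assert(escaped.size() == 19);
}
```

Note that this uses the std::replace algorithm, which swaps single characters, rather than the std::string::replace member from the question, which replaces a whole range with a new string.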
Here is the example that confuses me:
select ' w' ~ '^\s\w$';
This results in "false", but seems like it should be true.
select ' w' ~ '^\\s\w*$';
This results in "true", but:
Why does \s need the extra backslash?
If it truly does, why does \w not need the extra backslash?
Thanks for any help!
I think you have tested it the wrong way because I'm getting the opposite results that you got.
select ' w' ~ '^\s\w$';
Is returning 1 in my case. Which actually makes sense because it is matching the space at the beginning of the text, followed by the letter at the end.
select ' w' ~ '^\\s\w*$';
Is returning 0 and it makes sense too. Here you're trying to match a backslash at the beginning of the text followed by an s and then, by any number of letters, numbers or underscores.
A piece of text that would match your second regex would be: '\sw'
Check the fiddle here.
The string constants are first parsed and interpreted as strings, including escaped characters. Escaping of unrecognized sequences is handled differently by different parsers, but generally, besides errors, the most common behavior is to ignore the backslash.
In the first example, the right-hand string constant is first being interpreted as '^sw$', where both \s and \w are not recognized string escape sequences.
In the second example the right hand constant is interpreted as '^\sw*$' where \\s escapes the \
After the strings are interpreted they are then applied as a regular expression, '^\sw*$' matching ' w' where '^sw$' does not.
Some languages use backslash as an escape character. Regexes do that, C-like languages do that, and some rare and odd dialects of SQL do that. PostgreSQL does it. PostgreSQL translates the backslash escaping to arrive at a string value, and then feeds that string value to the regex parser, which AGAIN translates whatever backslashes survived the first translation, if any. In your first regex, none did.
For example, in a string literal or a regex, \n doesn't mean a backslash followed by a lowercase n. It means a newline. Depending on the language, a backslash followed by a lowercase s will mean either just a lowercase s, or nothing. In PostgreSQL, an invalid escape sequence in a string literal translates as the escaped character: '\w' translates to 'w'. All the regex parser sees there is the w. By chance, you used the letter w in the string you're matching against. It's not matching that w in the left-hand string because it's a word character; it's matching it because it's a lowercase w. Change it to lowercase x and it'll stop matching.
If you want to put a backslash in a string literal, you need to escape it with another backslash: '\\'. This is why \\s in your second regex worked. Add a second backslash to \w if you want to match any word character with that one.
This is a horrible pain. It's why JavaScript, Perl, and other languages have special conventions for regex literals like /\s\w/, and why C# programmers use the @"string literal" (verbatim string) feature to disable backslash escaping in strings they intend to use as regexes.
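C++11 raw string literals play the same role. The sketch below uses std::regex with its default ECMAScript grammar to show the same pattern written both ways; the function names are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Escaped literal: every regex backslash is doubled for the C++ compiler.
bool matches_escaped(const std::string& text) {
    static const std::regex pattern("^\\s\\w$");
    return std::regex_search(text, pattern);
}

// Raw literal: the regex is written exactly as the regex engine sees it.
bool matches_raw(const std::string& text) {
    static const std::regex pattern(R"(^\s\w$)");
    return std::regex_search(text, pattern);
}

// Both spellings should agree on the example string from the question.
void check_regex_literals() {
    assert(matches_escaped(" w"));
    assert(matches_raw(" w"));
}
```

Unlike the single-backslash SQL example, both versions here match " x" as well, because the regex engine really receives \w rather than a bare w.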
I can only find negative lookbehind for this, something like (?<!\\).
But this won't compile in C++ or flex. It seems that neither regex.h nor flex supports this?
I am trying to implement a shell which has to treat special characters like >, < or | as normal argument strings if they are preceded by a backslash. In other words, a special character is only special if it is preceded by zero or an even number of '\'.
So echo \\>a or echo abc>a should direct output to a
but echo \>a should print >a
What regular expression should I use?
I'm using flex and yacc to parse the input.
In a Flex rule file, you'd use \\ to match a single backslash '\' character. This is because the \ is used as an escape character in Flex.
BACKSLASH \\
LITERAL_BACKSLASH \\\\
LITERAL_LESSTHAN \\<
LITERAL_GREATERTHAN \\>
LITERAL_VERTICALBAR \\\|
If I follow you correctly, in your case you want "\>" to be treated as literal '>' but "\\>" to be treated as literal '\' followed by special redirect. You don't need negative look behind or anything particularly special to accomplish this as you can build one rule that would accept both your regular argument characters and also the literal versions of your special characters.
For purposes of discussion, let's assume that your argument/parameter can contain any character but ' ', '\t', and the special forms of '>', '<', '|'. The rule for the argument would then be something like:
ARGUMENT ([^ \t\\><|]|\\\\|\\>|\\<|\\\|)+
Where:
[^ \t\\><|] matches any single character but ' ', '\t', and your special characters
\\\\ matches any instance of "\\" (i.e. a literal backslash)
\\> matches any instance of "\>" (i.e. a literal greater than)
\\< matches any instance of "\<" (i.e. a literal less than)
\\\| matches any instance of "\|" (i.e. a literal vertical bar/pipe)
Actually... You can probably just shorten that rule to:
ARGUMENT ([^ \t\\><|]|\\[^ \t\r\n])+
Where:
[^ \t\\><|] matches any single character but ' ', '\t', and your special characters
\\[^ \t\r\n] matches any character preceded by a '\' in your input except for whitespace (which will handle all of your special characters and allow for literal forms of all other characters)
If you want to allow for literal whitespace in your arguments/parameters then you could shorten the rule even further but be careful with using \\. for the second half of the rule alternation as it may or may not match " \n" (i.e. eat your trailing command terminator character!).
Hope that helps!
You cannot easily extract single escaped characters from a command-line, since you will not know the context of the character. In the simplest case, consider the following:
LessThan:\<
BackslashFrom:\\<
In the first one, < is an escaped character; in the second one, it is not. If your language includes quotes (as most shells do), things become even more complicated. It's a lot better to parse the string left to right, one entity at a time. (I'd use flex myself, because I've stopped wasting my time writing and testing lexers, but you might have some pedagogical reason to do so.)
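As a rough illustration of the left-to-right approach (in plain C++ rather than flex, and ignoring quotes entirely), the scanner below reports which <, > and | characters are still special. unescaped_specials is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the positions of <, > and | that are NOT
// escaped by a preceding backslash, scanning left to right.
std::vector<std::size_t> unescaped_specials(const std::string& line) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '\\') {
            ++i;  // skip whatever the backslash escapes (another backslash included)
        } else if (c == '<' || c == '>' || c == '|') {
            positions.push_back(i);
        }
    }
    return positions;
}

// LessThan:\< has no special character; BackslashFrom:\\< has one at the end.
void check_scanner() {
    assert(unescaped_specials("LessThan:\\<").empty());
    assert(unescaped_specials("BackslashFrom:\\\\<").size() == 1);
}
```

Because each backslash consumes the character after it, the even/odd counting falls out of the scan and no lookbehind is needed.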
If you really need to find a special character which shouldn't be special, just search for it (in C++98, where you don't have raw literals, you'll have to escape all of the backslashes):
regex: (\\\\)*\\[<>|]
(An even number -- possibly 0 -- of \, then a \ and a <, > or |)
as a C string => "(\\\\\\\\)*\\\\[<>|]"
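With C++11 the same pattern can be written as a raw literal, which avoids the second round of doubling. The sketch below uses std::regex and applies the pattern to a whole token with std::regex_match; a plain search would also find a match starting at the second backslash of \\>, so it seems safer to anchor it. The names are assumptions for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The same pattern, spelled once with doubled backslashes and once raw.
const char* const kEscapedPattern = "(\\\\\\\\)*\\\\[<>|]";
const char* const kRawPattern = R"((\\\\)*\\[<>|])";

// True when the whole token is an even run of backslashes, then one more
// backslash, then one of < > |.
bool is_escaped_special(const std::string& token) {
    static const std::regex pattern(kRawPattern);
    return std::regex_match(token, pattern);
}

// The two spellings produce identical pattern text.
void check_pattern_spellings() {
    assert(std::string(kEscapedPattern) == kRawPattern);
}
```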
I'm using vim to do a search and replace with this command:
%s/lambda\s*{\([\n\s\S]\)*//gc
I'm trying to match for all word, endline and whitespace characters after a {. For instance, the entirety of this line should match:
lambda {
FactoryGirl.create ...
Instead, it only matches up to the newline and no spaces before FactoryGirl. I've tried manually replacing all the spaces before, just in case there were tab characters instead, but no dice. Can anyone explain why this doesn't work?
The \s is an atom for whitespace; \n, though it looks similar, syntactically is an escape sequence for a newline character. Inside the collection atom [...], you cannot include other atoms, only characters (including some special ones like \n). From :help /[]:
The following translations are accepted when the 'l' flag is not
included in 'cpoptions' {not in Vi}:
\e <Esc>
\t <Tab>
\r <CR> (NOT end-of-line!)
\b <BS>
\n line break, see above |/[\n]|
\d123 decimal number of character
\o40 octal number of character up to 0377
\x20 hexadecimal number of character up to 0xff
\u20AC hex. number of multibyte character up to 0xffff
\U1234 hex. number of multibyte character up to 0xffffffff
NOTE: The other backslash codes mentioned above do not work inside
[]!
So, either specify the whitespace characters literally [ \t\n...], use the corresponding character class expression [[:space:]...], or combine the atom with the collection via logical or \%(\s\|[...]\).
Vim interprets characters inside of the [ ... ] character classes differently. It's not literally, since that regex wouldn't fully match lambda {sss or lambda {\\\. What \s and \S are interpreted as...I still can't explain.
However, I was able to achieve nearly what I wanted with:
%s/lambda\s*{\([\n a-zA-Z]\)*//gc
That ignores punctuation, which I wanted. This works, but is dangerous:
%s/lambda\s*{\([\n a-zA-Z]\|.\)*//gc
Because adding on a character after the last character like } causes vim to hang while globbing. So my solution was to add the punctuation I needed into the character class.
What is \\n called in code like this?
cout<<"Hello\\n \'world\'!";
What's the basic rule about such characters?
\n is an escape sequence that prints a new line. If you want to print a literal \n (a backslash followed by an n) on the screen, you need to escape the \ as \\. So \\n will print \n on the screen.
I suppose your question is about escape characters? They are a part of string literal declarations, not stream operations. See documentation for more details on escape sequences.
In particular: \n signifies new line, \t signifies TAB character, \" signifies a quote character.
In computing, we call those escape sequences.
\n is a newline character; it signals the end of a line of text.
\\ is an escaped backslash, so it will print \. So \\n will just print a literal "\n" to the console.
For more information about C escape sequences, see Escape Sequences (MSDN).
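To see what the string from the question actually contains, here is a small check, assuming C++11 for the raw literal. greeting and check_greeting are made-up names.

```cpp
#include <cassert>
#include <string>

// The string from the question: \\ is one backslash, \' is one single quote.
std::string greeting() {
    return "Hello\\n \'world\'!";
}

// The result contains a backslash and an 'n', but no newline character.
void check_greeting() {
    assert(greeting() == R"(Hello\n 'world'!)");
    assert(greeting().find('\n') == std::string::npos);
}
```

Sending greeting() to cout would therefore show Hello\n 'world'! on a single line.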