ANTLR4 - Need an explanation on this String Literals - regex

On my assignment, I have this description for the String Lexer:
"String literals consist zero or more characters enclosed by double
quotes ("). Use escape sequences (listed below) to represent special
characters within a string. It is a compile-time error for a new line
or EOF character to appear inside a string literal.
All the supported escape sequences are as follows:
\b backspace
\f formfeed
\r carriage return
\n newline
\t horizontal tab
\" double quote
\\ backslash
The following are valid examples of string literals:
"This is a string containing tab \t"
"He asked me: \"Where is John?\""
A string literal has a type of string."
And this is my String lexer:
STRINGLIT: '"'(('\\'('b'|'t'|'n'|'f'|'r'|'\"'|'\\'))|~('\n'))*'"';
Can anybody check whether my lexer rule meets the requirement? If it does not, please tell me how to correct it; I don't really understand the requirement or ANTLR4.

With ANTLR4, instead of writing \\ ('b' | 't' | 'n'), you can write \\ [btn]. Also, as J Earls mentioned in a comment, you'll want to include the quote in your negated set, as well as the \r and the literal \.
This ought to do the trick:
STRINGLIT
: '"' ( '\\' [btnfr"'\\] | ~[\r\n\\"] )* '"'
;
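To make the rule's logic concrete outside of ANTLR, here is a minimal hand-written sketch in C++ of the same check. The function name `isStringLiteral` is made up for illustration, and the escape list follows the assignment text (so, unlike the grammar above, it does not accept `\'`).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is exactly one string literal
// under the assignment's rules. Mirrors the grammar shape
//   '"' ( '\\' [bfrnt"\\] | ~[\r\n\\"] )* '"'
bool isStringLiteral(const std::string& s) {
    if (s.size() < 2 || s.front() != '"') return false;
    const std::string escapes = "bfrnt\"\\";  // characters allowed after a backslash
    std::size_t i = 1;
    while (i < s.size()) {
        const char c = s[i];
        if (c == '"') return i + 1 == s.size();    // closing quote must be the last char
        if (c == '\r' || c == '\n') return false;  // raw line break inside the literal
        if (c == '\\') {
            // a backslash must be followed by one of the supported escape letters
            if (i + 1 >= s.size()) return false;
            if (escapes.find(s[i + 1]) == std::string::npos) return false;
            i += 2;
        } else {
            ++i;  // any other ordinary character
        }
    }
    return false;  // input ended before a closing quote (EOF inside the literal)
}
```

Note how a raw quote ends the literal and a raw backslash must start an escape; that is why both have to be excluded from the "any other character" alternative.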

Try this:
QUOTE: '"';
STRINGLIT: QUOTE ( '\\' [bfrnt"\\] | ~[\b\f\r\n\t"\\] )* QUOTE
{self.text = self.text[1:-1]};

Ignore escaped double quote characters swift

I am trying to validate a phone number using NSPredicate and regex. The only problem is when setting the regex Swift thinks that I am trying to escape part of it due to the backslashes. How can I get around this?
My code is as follows:
let phoneRegEx = "^((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?$"
In regular Swift string literals, you need to double each backslash so that the string contains a literal backslash for the regex engine:
let phoneRegEx = "^((\\(?0\\d{4}\\)?\\s?\\d{3}\\s?\\d{3})|(\\(?0\\d{3}\\)?\\s?\\d{3}\\s?\\d{4})|(\\(?0\\d{2}\\)?\\s?\\d{4}\\s?\\d{4}))(\\s?#(\\d{4}|\\d{3}))?$"
Starting from Swift 5, you can use raw string literals and write regex escapes with a single backslash:
let phoneRegEx = #"^((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?#(\d{4}|\d{3}))?$"#
Please refer to the Regular Expression Metacharacters table on the ICU Regular Expressions page to see which regex escapes need to be written this way.
Please mind the difference between regex escapes (in the above table) and the string literal escape sequences used in regular string literals, which are listed, for example, under Special Characters in String Literals:
String literals can include the following special characters:
The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark) and \' (single quotation mark)
An arbitrary Unicode scalar value, written as \u{n}, where n is a 1–8 digit hexadecimal number (Unicode is discussed in Unicode below)
So, in regular string literals, "\"" is the one-character string " written as a string literal. A double quotation mark does not need escaping for the regex engine, so the string literal "\"" is already a sufficient regex pattern to match a " char. The literal "\\\"", which represents the two-character string \", will also match a " char, but the extra regex escape is redundant. Likewise, "\n" (an LF char) matches a newline just as "\\n" does: "\n" is the newline char itself, while "\\n" is a regex escape defined in the ICU regex escape table.
In raw string literals, \ is just a literal backslash.
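The two layers (string literal escapes versus regex escapes) are not specific to Swift. As a rough illustration only, here is the same "\n" versus "\\n" comparison sketched in C++ with `std::regex`, which uses ECMAScript syntax rather than ICU; the function names are invented for this example.

```cpp
#include <regex>
#include <string>

// Pattern string holds a real LF character (string literal escape).
bool matchesWithLiteralNewline(const std::string& text) {
    return std::regex_search(text, std::regex("\n"));
}

// Pattern string holds backslash + 'n', which the regex engine
// itself interprets as "newline" (regex escape).
bool matchesWithRegexEscape(const std::string& text) {
    return std::regex_search(text, std::regex("\\n"));
}

// Same pattern as above, written as a raw string literal so the
// backslash is typed only once.
bool matchesWithRawLiteral(const std::string& text) {
    return std::regex_search(text, std::regex(R"(\n)"));
}
```

All three find the line break in a two-line string, and none of them matches text that merely contains a backslash followed by the letter n.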

Recognize special characters

I've got a little question (I've used Google before):
Is there a way to match all special Unicode characters except quotes?
I have this code:
STRING: '"' (NUMBER|LETTER|' '|'!'|'?'|':'|'.'|'/'|'*')* '"';
fragment LETTER: ('a'..'z'|'A'..'Z');
fragment DIGIT: ('0'..'9');
Is there a more efficient way?
Thanks for feedback!
~["], or the old v3 style ~'"', matches any character except a quote.
If you also want to exclude line breaks, do something like this:
STRING : '"' ~["\r\n]* '"';
From the official docs:
~x
Match any single character not in the set described by x. Set x can be a single character literal, a range, or a subrule set like ~('x'|'y'|'z') or ~[xyz]. Here is a rule that uses ~ to match any character other than characters using ~[\r\n]*:
COMMENT : '#' ~[\r\n]* '\r'? '\n' -> skip ;

Printing "\" character in C++

This question may be silly, but it would be great if I understood the behavior.
I try to print
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
using a simple program
char testme [] ="\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\0";
cout<<"testme:"<<testme<<endl;
The output in this case is
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
I intend to print 64 "\" characters, but the output is 32 "\" characters.
There seems to be something that I am missing, since the output is exactly half.
Edit: The reason I was asking is that I have to ^ (XOR) "\" with another char for an HMAC computation, and I was seeing some weird things.
In C++11 you can do it like this:
char testme [] =R"(\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\0)";
cout<<"testme:"<<testme<<endl;
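R"(...)" is a raw string literal.
To represent a backslash (\) in an ordinary string literal, we have to precede it with another backslash. To prevent errors caused by too many backslashes, C++ provides raw string literals. Nothing inside a raw string literal is treated as an escape, so every backslash is kept as written (including the trailing \0 above, which is then printed as the two characters \ and 0).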
This is called escaping and is a mechanism to insert certain characters into a string. For example, if you want to insert a citation mark into a string, you need to escape it.
char testme [] ="I am a so called \"programmer\".";
There's also \n, \t and other codes. The same applies to \ itself, since you might want a string that says \n without it being converted into a newline character.
char testme [] ="This is a backslash followed by the letter n: \\n";
\\ is used to denote a single backslash: \. This is because \ is used in string literals to denote other symbols like \t for a tab, \n for a newline and \" for a quotation character.
So \\ gives you one backslash, \\\\ gives you two and so on.
On printing a \, the standard states:
C11; 6.4.4.4 Character constants:
The double-quote " and question-mark ? are representable either by themselves or by the
escape sequences \" and \?, respectively, but the single-quote ' and the backslash \
shall be represented, respectively, by the escape sequences \' and \\ [1].
That means that to print one \ you need an extra backslash. To print two backslashes you need four (\\\\), and hence for 64 backslashes you need 128.
[1] Emphasis is mine.
\ is a special character known as the escape character. For example, \n means the newline character. So, if you want to print a single \, you have to write \\. The first \ tells the compiler not to treat the next \ as an escape character.
If it is C++, why not use string:
string testme(64, '\\');
cout << testme << endl;
The backslash \ is a very widespread escape character, and C++ also uses it like that. This means it's used to express special meaning (usually nonprintable characters). For example, to encode a line-feed character (ASCII 10) into a string, you express it as \n in the string literal. Another example, putting a single backslash at the end of a line (that is, before the line's terminating newline character) escapes the newline - so this way, you can continue a macro definition or //-style comment across several source file lines, and they will still count as one logical line.
This of course means that to get a literal backslash character, you have to escape the backslash itself to remove its "escape character" status. So typing \\ into a string literal yields a literal \ character.
That's why you get only half the amount of backslashes output - the C++ source code parser consumes two to produce one.
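A small sketch that makes the halving visible by comparing how many backslashes are typed with how many end up in the string. The function names are invented for this illustration.

```cpp
#include <string>

// Four backslashes typed in an ordinary literal: the compiler turns
// each "\\" pair into one character, so two are stored.
std::string ordinaryLiteral() { return "\\\\"; }

// Four backslashes typed in a raw literal: no escape processing,
// so all four are stored.
std::string rawLiteral() { return R"(\\\\)"; }

// Building the string from a count avoids typing backslashes at all;
// '\\' is the character literal for a single backslash.
std::string sixtyFourBackslashes() { return std::string(64, '\\'); }
```

Checking `.size()` on each result gives 2, 4 and 64 respectively, which matches the "two typed, one stored" rule described above.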
Didn't you notice one thing?
You wrote 64 '\' but only 32 of them were printed.
Did you try 60, or 54, or an odd count, say 33?
In C, '\' is the escape character. You have presumably used '\n' for a newline; didn't you notice then that the '\' itself is not printed?
To print '\' you must use '\\'.
A question for you:
Try printing 64 '%'. See what you get. Try understanding the reason for the output.

How to write regex express string literal in scala

String literals consist of zero or more characters enclosed by double quotes (").
Use escape sequences (listed below) to represent special characters within a string.
It is a compile-time error for a newline or EOF character to appear inside a string literal.
All the supported escape sequences are as follow:
\b backspace
\f formfeed
\r carriage return
\n newline
\t tab
\" double quote
\\ backslash
The following are valid examples of string literal:
" This is a string contain tab \t"
" Hello stackoverflow \"\b"
Can you help me write a regex that matches a string literal?
Thanks so much.
The most general way is to use Pattern.quote() method which returns a regular expression that matches the literal string passed as its argument. You can use it in Scala as well as in Java.
If you want to match e.g. the string represented by the literal "contain tab \t", you would use the regexp "contain tab \t".r—so, there is no need for any special handling of TAB inside the regexp.
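If the goal is instead a pattern that recognizes a whole string literal as described in the question, one possible shape is sketched below. It is written in C++ with `std::regex` (ECMAScript syntax) rather than Scala, the function name is made up, and whether the same pattern text behaves identically in another regex engine is not verified here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` is exactly one string literal.
// Pattern, read piece by piece:
//   "                opening quote
//   ( \\[bfrnt"\\]   a backslash followed by one supported escape char
//   | [^"\\\r\n]     or any char that is not a quote, backslash, CR or LF
//   )*               repeated zero or more times
//   "                closing quote
bool matchesStringLiteral(const std::string& text) {
    static const std::regex pattern(R"re("(\\[bfrnt"\\]|[^"\\\r\n])*")re");
    return std::regex_match(text, pattern);  // whole input must match
}
```

The raw literal (with the custom delimiter `re`) keeps the regex escapes readable; with an ordinary literal every backslash in the pattern would have to be doubled again.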

one question about iostream cout in C++

In code like the following, what is something like \\n called?
cout<<"Hello\\n \'world\'!";
What's the basic rule about such characters?
\n is an escape sequence that prints a new line. If you want to print a literal \n (that is, a backslash followed by an n) on the screen, you need to escape the \ by writing \\. So \\n will make \n appear on the screen.
I suppose your question is about escape characters? They are a part of string literal declarations, not stream operations. See documentation for more details on escape sequences.
In particular: \n signifies new line, \t signifies TAB character, \" signifies a quote character.
In computing, we call those escape characters.
\n is a newline character; it signals the end of a line of text.
\\ is an escaped backslash, so it will print \. So \\n will just print a literal "\n" to the console.
For more information about C escape sequences, see Escape Sequences (MSDN).
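To check what the statement from the question actually writes, the stream can be captured in a string. This is a small sketch; `render` is a name chosen for this example, and `std::ostringstream` stands in for `cout`.

```cpp
#include <sstream>
#include <string>

// Writes the same literal as in the question into a string stream
// and returns the characters that were produced.
std::string render() {
    std::ostringstream out;
    out << "Hello\\n \'world\'!";  // \\ -> one backslash, \' -> one single quote
    return out.str();
}
```

The result is the 16 characters Hello\n 'world'! with a real backslash followed by the letter n, and no line break anywhere in it.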