I'm trying to use the following regex
(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?:(\/[^\s?#]+)([?][^\s#]+)?)?([#]\S*)?
in C++ like this:
#include <iostream>
#include <string>
#include <regex>
int main() {
std::string str("rtsp://3333:1232#hellowebsite.com:2222");
std::regex r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?:(\/[^\s?#]+)([?][^\s#]+)?)?([#]\S*)?");
std::smatch m;
std::regex_search(str, m, r);
std::cout << str << std::endl;
for(auto v: m) std::cout << v << std::endl;
}
to match RTSP or HTTP URLs, but this is the output of compiling and running it:
main.cpp:7:33: warning: unknown escape sequence '\/' [-Wunknown-escape-sequence]
std::regex r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?...
^~
main.cpp:7:35: warning: unknown escape sequence '\/' [-Wunknown-escape-sequence]
std::regex r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?...
^~
main.cpp:7:43: warning: unknown escape sequence '\s' [-Wunknown-escape-sequence]
std::regex r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?...
^~
main.cpp:7:46: warning: unknown escape sequence '\/' [-Wunknown-escape-sequence]
std::regex r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?...
^~
main.cpp:7:60: warning: unknown escape sequence '\s' [-Wunknown-escape-sequence]
std::regex r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?...
^~
main.cpp:7:62: warning: unknown escape sequence '\/' [-Wunknown-escape-sequence]
std::regex r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?...
^~
main.cpp:7:88: warning: unknown escape sequence '\/' [-Wunknown-escape-sequence]
...r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?:(\/[^\s?#]+)([...
^~
main.cpp:7:92: warning: unknown escape sequence '\s' [-Wunknown-escape-sequence]
...r("(https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?:(\/[^\s?#]+)([...
^~
main.cpp:7:105: warning: unknown escape sequence '\s' [-Wunknown-escape-sequence]
...\s#]+)?)?([#]\S*)?");
^~
main.cpp:7:118: warning: unknown escape sequence '\S' [-Wunknown-escape-sequence]
...\S*)?");
^~
10 warnings generated.
./main
rtsp://3333:1232#hellowebsite.com:2222
rtsp://3333:1232#helloweb
rtsp
3333:1232
helloweb
check here..
First of all, why am I getting unknown escape sequences? \\, \s, etc. are pretty well known.
Most importantly, why do I get these unfinished groups? It works fine in online regex testers.
Especially when you're doing regexes, raw string literals are your friend. So, as a starting point, I'd do something like this:
std::regex r(R"--((https?|rtsp):\/\/(?:([^\s#\/]+?)[#])?([^\s\/:]+)(?:[:]([0-9]+))?(?:(\/[^\s?#]+)([?][^\s#]+)?)?([#]\S*)?)--");
If you really don't want to use raw string literals, the starting point is to note that a backslash in a C++ string literal introduces an escape sequence. When you want the string to actually contain a backslash, you need to write two backslashes in a row, so (at a bare minimum) you need to convert those. It starts something like this:
std::regex r("(https?|rtsp):\\/\\/(?:
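...continuing for all the other backslashes it contains. There might be a bit more to do after that, but that's the minimum that it's immediately obvious you need to do. The unknown escapes also account for the truncated groups: judging by the output, the compiler dropped the backslash, so [^\s\/:] became [^s/:], which excludes the letter s and stops the host match at "helloweb".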
I built a C parser from Lex/Flex & YACC/Bison grammars (1, 2) as:
$ flex c.l && yacc -d c.y && gcc lex.yy.c y.tab.c -o c
and then tested on this C code:
char* s = "xxx;
which is expected to produce a "missing terminating " character" (or syntax error) diagnostic.
However, it doesn't:
$ ./c t1.c
char* s = xxx;
Why? How to fix it?
Note: The STRING_LITERAL is defined in lex specification as:
L?\"(\\.|[^\\"])*\" { count(); return(STRING_LITERAL); }
Here we see the [^\\"] part, which represents "any member of the source character set except the double-quote ", backslash \, or new-line character" (C11, 6.4.5 String literals, 1) and the \\. part, which (incorrectly?) represents the escape-sequence (C11, 6.4.4.4 Character constants, 1). -- end note
UPD: Fix: The STRING_LITERAL is defined in lex specification as:
L?\"(\\.|[^\\"\n])*\" { count(); return(STRING_LITERAL); }
The lexer you link has a rule:
. { /* Add code to complain about unmatched characters */ }
so when it sees an unmatched ", it will silently ignore it. If you add code here to complain about the character, you'll see that.
If you want a syntax error, you could have this action just return *yytext;
Note that your STRING_LITERAL pattern will match strings that contain embedded newlines, so if you have a mismatched " in a larger program with another string later, it will be recognized as one long string with embedded newlines. This will likely lead to poor error reporting, since the error would be reported after the bogus string rather than where it starts, making it hard for a user to debug.
I want to pass a raw string literal to the [[deprecated(message)]] attribute as the message. The same message is used again and again, so I want to avoid repeating it.
First, I tried to use a static constexpr variable.
static constexpr auto str = R"(
Use this_func()
Description: ...
Parameter: ...
)";
[[deprecated(str)]]
void test1() {
}
I got the error "deprecated message is not a string". It seems that a static constexpr variable isn't accepted by [[deprecated(message)]].
Next, I tried to define the raw string literal as a preprocessor macro.
#define STR R"(
Use this_func()
Description: ...
Parameter: ...
)"
[[deprecated(STR)]]
void test2() {
}
It works as I expected with clang++ 8.0.0, as follows:
prog.cc:38:5: warning: 'test2' is deprecated:
Use this_func()
Description: ...
Parameter: ...
[-Wdeprecated-declarations]
test2();
^
Demo: https://wandbox.org/permlink/gN4iOrul8Y0F76TZ
But g++ 9.2.0 outputs the following compile errors:
prog.cc:19:13: error: unterminated raw string
19 | #define STR R"(
| ^
prog.cc:23:2: warning: missing terminating " character
23 | )"
| ^
https://wandbox.org/permlink/e62pQ2Dq9vTuG6Or
#define STR R"( \
Use this_func() \
Description: ... \
Parameter: ... \
)"
If I add a backslash at the end of each line, no compile error occurs, but the output message is different from what I expected:
prog.cc:38:11: warning: 'void test2()' is deprecated: \\nUse this_func() \\nDescription: ... \\nParameter: ... \\n [-Wdeprecated-declarations]
I'm not sure which compiler is behaving correctly.
Is there any way to pass a raw string literal variable/macro to the [[deprecated]] attribute?
There is no such thing as a "raw string literal variable". There may be a variable which points to a string literal, but it is a variable, not the literal itself. The deprecated attribute does not take a C++ constant expression evaluating to a string. It takes a string literal: an actual token sequence.
So the most you can do is use a macro to contain your string literal. Of course, macros and raw string literals don't play nice together, since the raw string is supposed to consume the entire text. So the \ characters will act as both continuations for the macro and be part of the string.
I am working on some Qt app, whose main window consists of QPlainTextEdit subclassed log window for outputting events. I have three types of messages:
Information message, which represents a QString that begins with [INFO] substring
Warning message, which represents a QString that begins with [WARNING] substring
Error message, which represents a QString that begins with [ERROR] substring
Now, what I want to achieve with QSyntaxHighlighter class is to change color of these messages according to their type (INFO type - Qt::DarkBlue color, WARNING type - Qt::DarkYellow color, ERROR type - Qt::DarkRed color) and here is code chunk, which should have done the task:
void UeLogWindowTextHighlighter::ueSetupRules()
{
UeHighlightRule* ueRuleInfo=new UeHighlightRule(this);
UeHighlightRule* ueRuleWarning=new UeHighlightRule(this);
UeHighlightRule* ueRuleError=new UeHighlightRule(this);
this->ueInfoStartExpression()->setPattern("^[INFO].\*"); // FIRST WARNING
this->ueWarningStartExpression()->setPattern("^[WARNING].\*"); // SECOND WARNING
this->ueErrorStartExpression()->setPattern("^[ERROR].\*"); // THIRD WARNING
this->ueInfoExpressionCharFormat()->setForeground(Qt::darkGray);
this->ueWarningExpressionCharFormat()->setForeground(Qt::darkYellow);
this->ueErrorExpressionCharFormat()->setForeground(Qt::darkRed);
ueRuleInfo->ueSetPattern(this->ueInfoStartExpression());
ueRuleInfo->ueSetTextCharFormat(this->ueInfoExpressionCharFormat());
this->ueHighlightRules()->append(ueRuleInfo);
ueRuleWarning->ueSetPattern(this->ueWarningStartExpression());
ueRuleWarning->ueSetTextCharFormat(this->ueWarningExpressionCharFormat());
this->ueHighlightRules()->append(ueRuleWarning);
ueRuleError->ueSetPattern(this->ueErrorStartExpression());
ueRuleError->ueSetTextCharFormat(this->ueErrorExpressionCharFormat());
this->ueHighlightRules()->append(ueRuleError);
} // ueSetupRules
However, when I compile the project, I get following warnings:
../../../gui/uelogwindowtexthighlighter.cpp: In member function 'void UeLogWindowTextHighlighter::ueSetupRules()':
../../../gui/uelogwindowtexthighlighter.cpp:58:47: warning: unknown escape sequence: '\*' [enabled by default]
this->ueInfoStartExpression()->setPattern("^[INFO].\*");
^
../../../gui/uelogwindowtexthighlighter.cpp:59:50: warning: unknown escape sequence: '\*' [enabled by default]
this->ueWarningStartExpression()->setPattern("^[WARNING].\*");
^
../../../gui/uelogwindowtexthighlighter.cpp:60:48: warning: unknown escape sequence: '\*' [enabled by default]
this->ueErrorStartExpression()->setPattern("^[ERROR].\*");
^
and consequently the messages are not colored (that is my suspicion). What is wrong with my regular expressions? I was following this question and answer on SO.
The star (*) does not need to be escaped. Remove the \, or, if you really need a backslash in the pattern, escape it by writing it twice (\\).
I heard that it is possible to use unicode variable names using the -fextended-identifiers flag in gcc. So I made a test program in C++ but it does not compile.
#include <iostream>
#include <string>
#define ¬ !
#define ≠ !=
#define « <<
#define » >>
/* uniq: remove duplicate lines from stdin */
int main() {
std::string s;
std::string t = "";
while (cin » s) {
if (s ≠ t)
cout « s;
t = s;
}
return 0;
}
I get these errors:
g++ -fextended-identifiers -g3 -o a main.cpp
main.cpp:10:3: error: stray ‘\342’ in program
if (s ≠ t)
^
main.cpp:10:3: error: stray ‘\211’ in program
main.cpp:10:3: error: stray ‘\240’ in program
main.cpp:11:4: error: stray ‘\302’ in program
cout « s;
^
main.cpp:11:4: error: stray ‘\253’ in program
What is going on? Aren't these macro names supposed to work with -fextended-identifiers?
G++ doesn't support Unicode characters in the source yet:
What is the status of adding the UTF-8 support for identifier names in GCC?
Notably, the errors generated by your program are for the individual octets of the UTF-8 encoding, not for the Unicode character they represent. ≠ is being seen as three bytes: \342\211\240 and « as two: \302\253.
The C++ Standard requires (section 2.10):
An identifier is an arbitrarily long sequence of letters and digits. Each universal-character-name in an identifier shall designate a character whose encoding in ISO 10646 falls into one of the ranges specified in E.1. The initial element shall not be a universal-character-name designating a character whose encoding falls into one of the ranges specified in E.2. Upper- and lower-case letters are different. All characters are significant.
And E.1:
Ranges of characters allowed [charname.allowed]
00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF
0100-167F, 1681-180D, 180F-1FFF
200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F
2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF
3004-3007, 3021-302F, 3031-303F
3040-D7FF
F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD,
60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD,
B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD
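And E.2:
Ranges of characters disallowed initially [charname.disallowed]
0300-036F, 1DC0-1DFF, 20D0-20FF, FE20-FE2F
Your angle quotes « and » are U+00AB and U+00BB, which are not included. The not sign ¬ (U+00AC) and not-equal ≠ (U+2260) are also disallowed.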
#include <iostream>
#include <fstream>
#include <cstring>
#define MAX_CHARS_PER_LINE 512
#define MAX_TOKENS_PER_LINE 20
#define DELIMITER " "
using namespace std;
int main ()
{
//char buf[MAX_CHARS_PER_LINE];
string buf; // string to store a line in file
fstream fin;
ofstream fout;
fin.open("PiCalculator.h", ios::in);
fout.open("op1.txt", ios::out);
fout<< "#include \"PiCalculator.h\"\n";
static int flag=0; //this variable counts the no of curly brackets
while (!fin.eof())
{
// read an entire line into memory
getline(fin, buf);
//fout<<buf.back()<<endl;
if(buf.back()== "{" && buf.front() !='c'){
flag++;
fout<<buf<<endl;
}
if(flag > 0)
fout<<buf<<endl;
if(buf.back()== "}"){
flag--;
fout<<buf<<endl;
}
}
cout<<buf.back()<<endl;
return 0;
}
Here I'm getting the error in the if condition:
if(buf.back()== "{" && buf.front() !='c')
The error states that: ISO C++ forbids comparison between pointer and integer [-fpermissive]
Can anyone help me in sorting out the problem ??
The comparison is invalid because you compare buf.back() against "{" but buf.front() against 'c'. Both functions return a char, so both should be compared with a character literal. One is in single quotes; the other, "{", is in double quotes. That's the issue.
Change "{" to '{'
Hope this helps.
As noted by Samuel, std::string::back returns a char& (e.g., see http://en.cppreference.com/w/cpp/string/basic_string/back) and you're comparing it to a string literal (double quotes signify a string, single quotes a char).
It's not clear why it isn't necessary to have a #include <string> directive in this case, but it would be best practice to have it. It's not obvious that you need <cstring>, though.
I recommend compiling with warnings on; if I compile your code that way, I get reasonably helpful error messages:
g++ -Wall -Wextra -std=c++11 q3.cc -o q3
q3.cc: In function ‘int main()’:
q3.cc:24:25: warning: comparison with string literal results in unspecified behaviour [-Waddress]
if(buf.back()== "{" && buf.front() !='c'){
^
q3.cc:24:25: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
q3.cc:30:25: warning: comparison with string literal results in unspecified behaviour [-Waddress]
if(buf.back()== "}"){
^
q3.cc:30:25: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
This is gcc version 4.8.2.
buf.back() returns a char& (or a const char& for a const string).
You are comparing that char with the string literal "{", which won't work.
Change this line, and all others where you compare with a string literal, to:
if(buf.back()== '{' && buf.front() !='c'){
Note the '{'
buf.back()== "{"
The string literal "{" decays into a pointer, while buf.back() returns a char, which is promoted to an integer for the comparison; hence you get the error.
Please change that to
buf.back() == '{'