How to build a Raw string for regex from string variable - c++

How can I build a regex from a string variable and have it interpreted as a raw string?
std::regex re{R"(pattern)"};
For the above code, is there a way to replace the fixed string "pattern" with a std::string pattern; variable that is built either at compile time or at run time?
I tried this, but it didn't work:
std::string pattern = "key";
std::string pattern = std::string("R(\"") + pattern + ")\"";
std::regex re(pattern); // does not behave like re(R"(key)")
Specifically, when using re(R"(key)") the result is found as expected. But when building it with re(pattern), where pattern should hold exactly the same value ("key"), it does not find the result.
This is probably what I need, but it was for Java, not sure if there is anything similar in C++:
How do you use a variable in a regular expression?

std::string pattern = std::string("R(\"") + pattern + ")\"";
should be built from raw string literals as follows:
pattern = std::string(R"(\")") + pattern + std::string(R"(\")");
This results in a string value like
\"key\"
See a working live example.
In case you want escaped parentheses, you can write
pattern = std::string(R"(\(")") + pattern + std::string(R"("\))");
This results in a string value like
\("key"\)
Live example
Side note: You can't define the pattern variable twice. Omit the std::string type in follow-up uses.
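A raw string literal is only a way of spelling a string in source code; the R"(...)" wrapper never becomes part of the value. The minimal sketch below assumes the default ECMAScript grammar of std::regex and shows a pattern held in a runtime std::string being passed straight to the constructor. The helper names matchesAnywhere and quotedPattern are made up for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the regex held in `pattern` matches anywhere in `text`.
bool matchesAnywhere(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    return std::regex_search(text, re);
}

// Hypothetical helper: builds the pattern \"key\" around a runtime value.
// Note: `key` is inserted verbatim, so regex metacharacters in it keep their meaning.
std::string quotedPattern(const std::string& key) {
    return std::string(R"(\")") + key + std::string(R"(\")");
}
```

With these helpers, matchesAnywhere("some key here", "key") and matchesAnywhere("some key here", R"(key)") receive the same three-character string, so they should behave identically.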

Related

The regex in string.format of LUA

I use string.format(str, regex) in Lua to fetch some keywords.
local RICH_TAGS = {
    "texture",
    "img",
}
--\[((img)|(texture))=
local START_OF_PATTER = "\\[("
for index = 1, #RICH_TAGS - 1 do
    START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index] .. ")|"
end
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[#RICH_TAGS] .. "))"
function RichTextDecoder.decodeRich(str)
    local result = {}
    print(str, START_OF_PATTER)
    dump({string.find(str, START_OF_PATTER)})
end
Output:
hello[img=123] \[((texture)|(img))
dump from: [string "utils/RichTextDecoder.lua"]:21: in function 'decodeRich'
"<var>" = {
}
The output means:
str = hello[img=123]
START_OF_PATTER = \[((texture)|(img))
This regex works well with some online regex tools, but it finds nothing in Lua.
Is there anything wrong in my code?
You cannot use regular expressions in Lua. Use Lua's string patterns to match strings.
See How to write this regular expression in Lua?
Try dump({str:find("\\%[%(")})
Also note that this loop:
for index = 1, #RICH_TAGS - 1 do
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index]..")|"
end
will leave out the last element of RICH_TAGS. I assume that was not your intention.
Edit:
But what I want is to fetch several specific words. For example, the
pattern can fetch "[img=" "[texture=" "[font=", any one of them. With
the regex string I wrote in my question, regex can do the work. But
with Lua, the way to do the job is to write code like string.find(str,
"[img=") and string.find(str, "[texture=") and string.find(str,
"[font="). I wonder whether there is a way to do the job with a single
pattern string. I tried a pattern string like "%[%a*=", but obviously it
will fetch a lot more strings than I need.
You cannot match several specific words with a single pattern unless they are in that string in a specific order. The only thing you could do is to put all the characters that make up those words into a class, but then you risk finding any word you can build from those letters.
Usually you would match each word with a separate pattern, or you match any word and check whether the match is one of your words, using a lookup table for example.
So basically you do what a regex library would do in a few lines of Lua.
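The "match any word, then check a lookup table" approach might look like the following. This is only a sketch, written in C++ with std::regex rather than Lua; findRichTag and the allowed set are names invented for the illustration. A Lua version would follow the same two steps with a pattern such as "%[(%a+)=" and a table keyed by tag name.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns the first tag written as "[name=" whose name
// is in `allowed`, or an empty string if no such tag occurs in `text`.
std::string findRichTag(const std::string& text,
                        const std::unordered_set<std::string>& allowed) {
    // Step 1: match ANY alphabetic word between '[' and '='.
    static const std::regex anyTag(R"(\[([A-Za-z]+)=)");
    for (std::sregex_iterator it(text.begin(), text.end(), anyTag), end;
         it != end; ++it) {
        const std::string name = (*it)[1].str();
        // Step 2: keep the match only if the word is in the lookup table.
        if (allowed.count(name) != 0) {
            return name;
        }
    }
    return "";
}
```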

Can't get an Array of matches using Regular Expression

const stringWithDate: string = "4/7/20 This is a date!";
const reg: RegExp = new RegExp("^(\d{1,2}\/\d{1,2}\/\d{1,2})").compile();
const exist: boolean = reg.test(stringWithDate)
const matches: RegExpExecArray | null = reg.exec(stringWithDate);
console.log(exist);
console.log(matches);
I am trying to get the date (4/7/20) extracted from stringWithDate. When I log the value of 'exist' it says true, but the matches array says [""]. I'm not sure what I'm doing wrong here. I know the regex isn't that good, but I know it works because I tried the same in Python and here. As far as I can tell it should give me "4/7/20" from stringWithDate, but that isn't happening.
There are two problems:
1. You're not allowing for the fact that your backslashes are in a string literal.
2. You're not passing anything into compile.
1. Backslashes
Remember that in a string literal, a backslash is an escape character, so the \d in your string is an unnecessary escape of d, which results in just d. So your actual regular expression is:
^(d{1,2}/d{1,2}/d{1,2})
Use the literal form instead:
const reg: RegExp = /^(\d{1,2}\/\d{1,2}\/\d{1,2})/; // No `compile`, see next point
Live Example:
const stringWithDate/*: string*/ = "4/7/20 This is a date!";
const reg/*: RegExp*/ = /^(\d{1,2}\/\d{1,2}\/\d{1,2})/; // No `compile`, see next point
const exist/*: boolean*/ = reg.test(stringWithDate)
const matches/*: RegExpExecArray | null*/ = reg.exec(stringWithDate);
console.log(exist);
console.log(matches);
2. compile
compile accepts a new expression to compile, replacing the existing expression. By not passing an expression in as an argument, you're getting the expression (?:), which matches the blank at the beginning of your string.
You don't need compile (spec | MDN). It's an Annex B feature (supposedly only in JavaScript engines in web browsers). Here's what the spec has to say in a note about it:
The compile method completely reinitializes the this object RegExp with a new pattern and flags. An implementation may interpret use of this method as an assertion that the resulting RegExp object will be used multiple times and hence is a candidate for extra optimization.
...but JavaScript engines can figure out whether a regular expression needs optimization without your telling them.
If you wanted to use compile, you'd do it like this:
const reg: RegExp = /x/.compile(/^(\d{1,2}\/\d{1,2}\/\d{1,2})/);
The contents of the initial regular expression are completely replaced with the pattern and flags from the one passed into compile.
Side note: There's no reason for the type annotations on any of those consts. TypeScript will correctly infer them.

QRegExp in C++ to capture part of string

I am attempting to use Qt to execute a regex in my C++ application.
I have done similar regular expressions with Qt in C++ before, but this one is proving difficult.
Given a string with optional _# at the end of the string, I want to extract the part of the string before that.
Examples:
"blue_dog" should result in "blue_dog"
"blue_dog_1" should result in "blue_dog"
"blue_dog_23" should result in "blue_dog"
This is the code I have so far, but it does not work yet:
QString name = "blue_dog_23";
QRegExp rx("(.*?)(_\\d+)?");
rx.indexIn(name);
QString result = rx.cap(1);
I have even tried the following additional options in many variations, without luck. My code above always results in "":
rx.setMinimal(TRUE);
rx.setPatternSyntax(QRegExp::RegExp2);
Sometimes it's easier not to pack everything in a single regexp. In your case, you can restrict manipulation to the case of an existing _# suffix. Otherwise the result is name:
QString name = "blue_dog_23";
QRegExp rx("^(.*)(_\\d+)$");
QString result = name;
if (rx.indexIn(name) == 0)
    result = rx.cap(1);
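The same idea can be sketched without Qt, using std::regex from the standard library in place of QRegExp. That substitution is made here only so the snippet compiles on its own, and stripNumericSuffix is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: removes one trailing "_<digits>" suffix if present,
// otherwise returns the name unchanged.
std::string stripNumericSuffix(const std::string& name) {
    // Greedy (.*) backtracks until the rest of the string is exactly "_<digits>".
    static const std::regex suffixed(R"(^(.*)_\d+$)");
    std::smatch m;
    if (std::regex_match(name, m, suffixed)) {
        return m[1].str();
    }
    return name;  // no numeric suffix: keep the whole name
}
```

As in the QRegExp version, only the last suffix is removed, so a name such as "a_1_2" would become "a_1".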
Alternatively, you can split the last bit and check if it is a number. A compact (but maybe not the most readable) solution:
QString name = "blue_dog_23";
int i = name.lastIndexOf('_');
bool isInt = false;
QString result = (i >= 0 && (name.mid(i+1).toInt(&isInt) || isInt)) ? name.left(i) : name;
The following solution should work as you want it to!
^[^\s](?:(?!_\d*\n).)*/gm
Basically, that says: match everything up to, but not including, _\d*\n. Here, _\d*\n means match the _ character, then any number of digits (\d*), until a newline (\n) is reached. (?!...) is a negative lookahead and (?:...) is a non-capturing group. In combination, the group consumes one character at a time, but only while the lookahead confirms that the _\d*\n suffix does not start at that position, so the match stops just before the suffix.
The ^[^\s] tells the expression to start matching at the beginning of a line, as long as the first character isn't whitespace.
The /gm sets the global flag (allowing more than one match to be returned) and the multi-line flag (which makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole string).

add datetime as string to a string after matching a pattern in vb.net

I have this string for example: "Example_string.xml"
and I would like to add _DateTime (the current date and time) before the ".", so that it becomes:
"Example_string_20151808185631.xml"
How can I achieve this? With regex?
Yes, you can achieve that through the use of a lookahead. For instance:
Dim result As String = Regex.Replace("Example_string.xml", "(?=\.)", "_20151808185631")
Since the pattern only matches a position in the string (the position just before the period), rather than matching a portion of the text, the replace method doesn't actually replace any of the input text. It effectively just inserts the replacement text into that position in the string.
Alternatively, if you find that confusing, you could just match the period and then just include the period in the replacement text:
Dim result As String = Regex.Replace("Example_string.xml", "\.", "_20151808185631.")
If you don't want to just look for any period, and you want to be safer about it (such as handling file names that contain multiple periods), then instead of \., you could use something like \.\w+$. However, if you need to make it that resilient, and it doesn't have to be done with RegEx, it would be better to use the Path.GetFileNameWithoutExtension and Path.GetExtension methods, as recommended by Crowcoder. For instance, you may also need to make it handle file names that have no extension, which complicates it even further.
or...
Path.GetFileNameWithoutExtension("Example_string.xml") + "_20151808185631" + Path.GetExtension("Example_string.xml")
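As a rough C++ sketch of the \.\w+$ idea mentioned above, with std::regex standing in for .NET's Regex: the function name addStamp and the fallback for names without an extension are assumptions made for illustration, and the timestamp is passed in as a plain string so the example stays deterministic.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: inserts "_<stamp>" before the final extension.
// Assumes `stamp` contains no '$', which has special meaning in the
// replacement format string.
std::string addStamp(const std::string& fileName, const std::string& stamp) {
    // Only the last ".ext" can match, because $ anchors at the end of the name.
    static const std::regex extension(R"(\.\w+$)");
    if (!std::regex_search(fileName, extension)) {
        return fileName + "_" + stamp;  // no extension: append the stamp
    }
    // "$&" re-inserts the matched extension after the stamp.
    return std::regex_replace(fileName, extension, "_" + stamp + "$&");
}
```

Because the pattern is anchored at the end, a name with several periods such as "a.b.xml" only has the stamp inserted before ".xml".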
How about:
Dim sFile As String = "Example_string.xml"
Dim sResult As String = sFile.ToLower.Replace(".xml", "_" & Format(Now(), "yyyyMMddHHmmss") & ".xml")
MsgBox(sresult, , sFile)

String searching with Regex

I am trying to use regex (which I am just starting with) to find sequences with up to 1 mismatched character. For example, with the pattern “nan” and the text “banana” I would want to find “ban” and “nan”, the former being acceptable because it has a single mismatch (‘b’ instead of ‘n’). The problem I am having is building a regex pattern without resorting to inserting individual wildcards wherever I want to allow a mismatch.
final String[] patterns = {"[a-z]an", "n[a-z]n", "na[a-z]"};
final String text = "banana";
for (String pattern : patterns)
{
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(text);
    while (m.find())
    {
        System.out.println(m.start() + " " + m.group());
    }
}
This is what I have as a test, which is kind of a clunky way of getting what I want (albeit with some duplicates). For this kind of string searching with a single mismatch, is regex an effective means, or should I try modifying traditional algorithms like Horspool or KMP?
Unfortunately, regular expressions are designed to let you be very precise about the pattern you intend to match; conversely, there is little support for approximate matching. However, the TRE library was designed exactly for this purpose (http://laurikari.net/tre/).
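If pulling in a library is not an option, a plain sliding-window comparison is one simple way to allow a fixed number of mismatches without generating wildcard patterns. The sketch below is written in C++ rather than Java, is neither Horspool nor KMP (it is the naive O(n*m) scan), and uses the made-up name findApprox.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the start offset of every window of `text`
// that differs from `pattern` in at most `maxMismatches` positions.
std::vector<std::size_t> findApprox(const std::string& text,
                                    const std::string& pattern,
                                    std::size_t maxMismatches) {
    std::vector<std::size_t> hits;
    if (pattern.empty() || pattern.size() > text.size()) {
        return hits;
    }
    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        std::size_t mismatches = 0;
        // Stop comparing this window as soon as the budget is exceeded.
        for (std::size_t i = 0;
             i < pattern.size() && mismatches <= maxMismatches; ++i) {
            if (text[start + i] != pattern[i]) {
                ++mismatches;
            }
        }
        if (mismatches <= maxMismatches) {
            hits.push_back(start);
        }
    }
    return hits;
}
```

For the example in the question, findApprox("banana", "nan", 1) reports offsets 0 ("ban") and 2 ("nan"), each exactly once, so the duplicates produced by the three separate wildcard patterns do not arise.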