Can't get an Array of matches using Regular Expression - regex

const stringWithDate: string = "4/7/20 This is a date!";
const reg: RegExp = new RegExp("^(\d{1,2}\/\d{1,2}\/\d{1,2})").compile();
const exist: boolean = reg.test(stringWithDate)
const matches: RegExpExecArray | null = reg.exec(stringWithDate);
console.log(exist);
console.log(matches);
I am trying to extract the date (4/7/20) from stringWithDate. When I log the value of exist it says true, but the matches array is [""]. I'm not sure what I'm doing wrong here. I know the regex isn't that good, but I know it works because I tried the same in Python and here. As far as I can tell it should give me "4/7/20" from stringWithDate, but that isn't happening.

There are two problems:
You're not allowing for the fact your backslashes are in a string literal.
You're not passing anything into compile.
1. Backslashes
Remember that in a string literal, a backslash is an escape character, so the \d in your string is an unnecessary escape of d, which results in just d. So your actual regular expression is:
^(d{1,2}/d{1,2}/d{1,2})
Use the literal form instead:
const reg: RegExp = /^(\d{1,2}\/\d{1,2}\/\d{1,2})/; // No `compile`, see next point
Live Example:
const stringWithDate/*: string*/ = "4/7/20 This is a date!";
const reg/*: RegExp*/ = /^(\d{1,2}\/\d{1,2}\/\d{1,2})/; // No `compile`, see next point
const exist/*: boolean*/ = reg.test(stringWithDate)
const matches/*: RegExpExecArray | null*/ = reg.exec(stringWithDate);
console.log(exist);
console.log(matches);
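To see the first problem in isolation, the following sketch prints what the string literal actually hands to the RegExp constructor. The pattern and input come from the question; the variable names are made up for illustration, and compile is deliberately left out.

```typescript
// What the string literal from the question really contains: "\d" and "\/"
// are unnecessary escapes, so the backslashes are dropped.
const unescaped: string = "^(\d{1,2}\/\d{1,2}\/\d{1,2})";
console.log(unescaped); // ^(d{1,2}/d{1,2}/d{1,2})

// Doubling the backslashes keeps \d intact inside the string.
const escaped: string = "^(\\d{1,2}/\\d{1,2}/\\d{1,2})";

const input: string = "4/7/20 This is a date!";
const broken: RegExp = new RegExp(unescaped); // looks for literal "d" characters
const fixed: RegExp = new RegExp(escaped);

console.log(broken.test(input)); // false
const match: RegExpExecArray | null = fixed.exec(input);
console.log(match !== null ? match[1] : null); // 4/7/20
```

If the pattern has to come from a string (for example, because it is built at run time), doubling the backslashes as in escaped is the usual fix; otherwise the literal form is simpler.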
2. compile
compile accepts a new expression to compile, replacing the existing expression. By not passing an expression in as an argument, you're getting the expression (?:), which matches the blank at the beginning of your string.
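The effect can be reproduced without calling compile at all. The sketch below assumes, as described above, that the regex ends up with an empty pattern, and builds one directly with new RegExp(""); emptyPattern, dateText, and result are illustrative names.

```typescript
// An empty pattern is reported as "(?:)" and matches the empty string
// at position 0 of any input.
const emptyPattern: RegExp = new RegExp("");
console.log(emptyPattern.source); // (?:)

const dateText: string = "4/7/20 This is a date!";
console.log(emptyPattern.test(dateText)); // true

// The match array has a single element: the empty string found at index 0.
const result: RegExpExecArray | null = emptyPattern.exec(dateText);
console.log(result !== null ? result[0] === "" : null); // true
```

That is why exist is true while the matches array holds only an empty string.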
You don't need compile (spec | MDN). It's an Annex B feature (supposedly only in JavaScript engines in web browsers). Here's what the spec has to say in a note about it:
The compile method completely reinitializes the this object RegExp with a new pattern and flags. An implementation may interpret use of this method as an assertion that the resulting RegExp object will be used multiple times and hence is a candidate for extra optimization.
...but JavaScript engines can figure out whether a regular expression needs optimization without your telling them.
If you wanted to use compile, you'd do it like this:
const reg: RegExp = /x/.compile(/^(\d{1,2}\/\d{1,2}\/\d{1,2})/);
The contents of the initial regular expression are completely replaced with the pattern and flags from the one passed into compile.
Side note: There's no reason for the type annotations on any of those consts. TypeScript will correctly infer them.

Related

C++ regex capture group confusion

I'm implementing the nand2tetris Assembler in C++ (I'm pretty new to C++), and I'm having a lot of trouble parsing a C-instruction using regex. Mainly I really don't understand the return value of regex_search and how to use it.
Setting aside the various permutations of a C instruction, the current example I'm having trouble with is D=D-M. The result should have dest = "D"; comp = "D-M".
With the current code below, the regex appears to find the results correctly (confirmed by regex101.com), but either it is not quite right or I don't know how to get at the results. See the debugger screenshot. matches[n].second (which appears to contain the correct comp value) is not a string but an iterator.
Note that the 3rd capture group is correctly empty for this example.
auto regex_str = regex("([AMD]{1,3}=)?([01\-AMD!|+&><]{1,3})?(;[A-Z]{3})?");
regex_search(assemblyCode, matches, regex_str);
string dest = matches[1]; // this automatically casts some object (submatch) into a string?
string comp = matches[2];
string jump = matches[3];
I will note, though, that D=D+M works, but not D=D-M!
gcc warns about an unknown escape sequence: \-
You have to escape the backslash,
std::regex("([AMD]{1,3}=)?([01\\-AMD!|+&><]{1,3})?(;[A-Z]{3})?");
or use raw string
std::regex(R"(([AMD]{1,3}=)?([01\-AMD!|+&><]{1,3})?(;[A-Z]{3})?)");
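This also explains why D=D+M works while D=D-M does not. Assuming the compiler drops the backslash from the unknown escape, the class contains 1-A, which is read as a range from 1 to A and no longer includes a literal hyphen, while + is still listed explicitly. JavaScript string literals drop the backslash from \- in the same way, so the effect can be sketched in TypeScript (used here only for illustration; parseComp is a hypothetical helper, not part of the asker's code).

```typescript
// Returns the second capture group (the "comp" part) for a C-instruction,
// or undefined when there is no match or the group did not participate.
function parseComp(pattern: string, instruction: string): string | undefined {
  const m: RegExpExecArray | null = new RegExp(pattern).exec(instruction);
  return m !== null ? m[2] : undefined;
}

// "\-" in a plain string literal is just "-", so the class holds the range 1-A.
const droppedBackslash: string = "([AMD]{1,3}=)?([01\-AMD!|+&><]{1,3})?(;[A-Z]{3})?";
// "\\-" keeps the backslash, so the regex engine sees an escaped hyphen.
const keptBackslash: string = "([AMD]{1,3}=)?([01\\-AMD!|+&><]{1,3})?(;[A-Z]{3})?";

console.log(parseComp(droppedBackslash, "D=D+M")); // D+M
console.log(parseComp(droppedBackslash, "D=D-M")); // D
console.log(parseComp(keptBackslash, "D=D-M"));    // D-M
```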

Pattern Validator in Angular Reactive Forms using Regex [duplicate]

I'm writing a small JavaScript method which receives a list of points, and I have to read those points to create a Polygon in a Google map.
I receive those points in the form:
(lat, long), (lat, long),(lat, long)
So I've done the following regex:
\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)
I've tested it with RegexPal and the exact data I receive:
(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)
and it works, so why do I get null as the result when I use this code in my JavaScript?
var polygons="(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)";
var reg = new RegExp("/\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g");
var result = polygons.match(reg);
I get no JavaScript error when executing (in Google Chrome's debug mode). This code is in a JavaScript function in an included JS file. The function is called from the OnLoad handler.
I've searched a lot, but I can't find why this isn't working. Thank you very much!
Use a regex literal [MDN]:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
You are making two errors when you use RegExp [MDN]:
The "delimiters" / should not be part of the expression
If you define an expression as a string, you have to escape the backslash, because it is the escape character in strings
Furthermore, modifiers are passed as second argument to the function.
So if you wanted to use RegExp (which you don't have to in this case), the equivalent would be:
var reg = new RegExp("\\(\\s*([0-9.-]+)\\s*,\\s([0-9.-]+)\\s*\\)", "g");
(and I think now you see why regex literals are more convenient)
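A quick way to check the points above is to compare the literal with the constructor form. The following sketch uses the expressions and input from this thread; the variable names are illustrative.

```typescript
const polygons: string =
  "(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)";

const fromLiteral: RegExp = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
const fromString: RegExp = new RegExp("\\(\\s*([0-9.-]+)\\s*,\\s([0-9.-]+)\\s*\\)", "g");

// Same pattern, same flag.
console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.global); // true

// The original attempt: the slashes and "g" become part of the pattern,
// the backslashes are lost, and no flag is set.
const original: RegExp = new RegExp("/\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g");
console.log(original.global); // false
console.log(polygons.match(original)); // null
```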
I always find it helpful to copy and paste a RegExp expression into the console and see its output. Taking your original expression, we get:
/(s*([0-9.-]+)s*,s([0-9.-]+)s*)/g
which means that the expression tries to match /, s and g literally, and the parens () are still treated as special characters.
Update: .match() returns an array:
["(25.774252, -80.190262)", "(18.466465, -66.118292)", ... ]
which does not seem to be very useful.
You have to use .exec() [MDN] to extract the numbers:
["(25.774252, -80.190262)", "25.774252", "-80.190262"]
This has to be called repeatedly until the whole string has been processed.
Example:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
var result, points = [];
while ((result = reg.exec(polygons)) !== null) {
    points.push([+result[1], +result[2]]);
}
This creates an array of arrays and the unary plus (+) will convert the strings into numbers:
[
[25.774252, -80.190262],
[18.466465, -66.118292],
...
]
Of course if you want the values as strings and not as numbers, you can just omit the +.

How to build a Raw string for regex from string variable

How can I build a regex from a string variable and have it interpreted as a raw string?
std::regex re{R"pattern"};
For the above code, is there a way to replace the fixed string "pattern" with a std::string pattern; variable that is built either at compile time or at run time?
I tried this, but it didn't work:
std::string pattern = "key";
std::string pattern = std::string("R(\"") + pattern + ")\"";
std::regex re(pattern); // does not behave as it does when I write re(R"(key)")
Specifically, when I use re(R"(key)") the result is found as expected. But when I build it with re(pattern), where pattern has exactly the same value ("key"), it does not find the result.
This is probably what I need, but it was for Java, not sure if there is anything similar in C++:
How do you use a variable in a regular expression?
std::string pattern = std::string("R(\"") + pattern + ")\"";
should be built from raw string literals as follows
pattern = std::string(R"(\")") + pattern + std::string(R"(\")");
This results in a string value like
\"key\"
In case you want to have escaped parenthesis, you can write
pattern = std::string(R"(\(")") + pattern + std::string(R"("\))");
This results in a string value like
\("key"\)
Side note: You can't define the pattern variable twice. Omit the std::string type in follow up uses.

How to access the results of .match as string value in Crystal lang

In many other programming languages, there is a function which takes as a parameter a regular expression and returns an array of string values. This is true of Javascript and Ruby. The .match in crystal, however, does 1) not seem to accept the global flag and 2) it does not return an array but rather a struct of type Regex::MatchData. (https://crystal-lang.org/api/0.25.1/Regex/MatchData.html)
As an example the following code:
str = "Happy days"
re = /[a-z]+/i
matches = str.match(re)
puts matches
returns Regex::MatchData("Happy")
I am unsure how to convert this result into a string or why this is not the default as it is in the inspiration language (Ruby). I understand this question probably results from my inexperience dealing with structs and compiled languages but I would appreciate an answer in hopes that it might also help someone else coming from a JS/Ruby background.
What if I want to convert to a string merely the first match?
puts "Happy days"[/[a-z]+/i]?
puts "Happy days".match(/[a-z]+/i).try &.[0]
It will try to match a string against /[a-z]+/i regex and if there is a match, Group 0, i.e. the whole match, will be output. Note that the ? after [...] will make it fail gracefully if there is no match found. If you just use puts "??!!"[/[a-z]+/i], an exception will be thrown.
If you want functionality similar to String#scan that returns all matches found in the input, you may use (only the shortened version is left, as per @Amadan's remark):
matches = str.scan(re).map(&.[0])
Output of the code above:
["Happy", "days"]
Note that:
String#scan returns an array of Regex::MatchData, one for each match.
Calling [0] on a match returns the matched text, whereas .string returns the whole original string.
Actually the posted example returns a #<MatchData "Happy"> in Ruby, which also has no "global" flag. That's what String#scan(Regex) is for, as mentioned by others.
If you want only a single match without going through Regex::MatchData, you can use String#[](Regex):
str = "Happy days"
p str[/[a-z]+/i] # => "Happy"

How to replace parts of a string in lua "in a single pass"?

I have the following string of anchors (where I want to change the contents of the href) and a lua table of replacements, which tells which word should be replaced for:
s1 = '<a href="word7">'
replacementTable = {}
replacementTable["word1"] = "potato1"
replacementTable["word2"] = "potato2"
replacementTable["word3"] = "potato3"
replacementTable["word4"] = "potato4"
replacementTable["word5"] = "potato5"
The expected result should be:
<a href="word7">
I know I could do this by iterating over each element in the replacementTable and processing the string each time, but my gut feeling tells me that if the string is very big and/or the replacement table becomes big, this approach is going to perform poorly.
So I thought it would be best if I could do the following: apply the regular expression to find all the matches, get an iterator over the matches, and replace each match with its value in the replacementTable.
Something like this would be great (writing it in JavaScript because I don't yet know how to write lambdas in Lua):
var newString = patternReplacement(s1, '<a[^>]* href="([^"]*)"', function(match) { return replacementTable[match] })
Where the first parameter is the string, the second one the regular expression and the third one a function that is executed for each match to get the replacement. This way I think s1 gets parsed once, being more efficient.
Is there any way to do this in Lua?
In your example, this simple code works:
print((s1:gsub("%w+",replacementTable)))
The point is that gsub already accepts a table of replacements.
In the end, the solution that worked for me was the following one:
local updatedBody = string.gsub(body, '(<a[^>]* href=")(/[^"%?]*)([^"]*")', function(leftSide, url, rightSide)
    local replacedUrl = url
    if (urlsToReplace[url]) then replacedUrl = urlsToReplace[url] end
    return leftSide .. replacedUrl .. rightSide
end)
It keeps any query-string parameters out of the captured URL, giving me just the URI. I know it's a bad idea to parse HTML bodies with regular expressions, but in my case, where I needed a lot of performance, this was much faster and did the job.
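For comparison, the callback idea sketched in JavaScript in the question maps onto String.prototype.replace with a replacer function. The TypeScript sketch below loosely mirrors the Lua solution, with a simplified pattern that ignores query strings; replaceHrefs, the table contents, and the sample input are illustrative assumptions.

```typescript
// Replaces the href value of every anchor in a single pass over the input.
// URLs without an entry in the table are left unchanged.
function replaceHrefs(html: string, table: { [url: string]: string }): string {
  const anchorHref: RegExp = /(<a[^>]* href=")([^"]*)(")/g;
  return html.replace(
    anchorHref,
    (_whole: string, leftSide: string, url: string, rightSide: string): string => {
      const known: boolean = Object.prototype.hasOwnProperty.call(table, url);
      return leftSide + (known ? table[url] : url) + rightSide;
    }
  );
}

const urlsToReplace: { [url: string]: string } = { word1: "potato1", word2: "potato2" };
const body: string = '<a href="word1">one</a> <a class="x" href="word7">seven</a>';
console.log(replaceHrefs(body, urlsToReplace));
// <a href="potato1">one</a> <a class="x" href="word7">seven</a>
```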