What does s/(\W)/\\$1/g do in perl? [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I came across a piece of code in which a subroutine takes a video filename as an argument and then prints its duration. Here I'm only showing the relevant snippet.
sub videoInfo {
    my $file = shift;
    # Put a backslash in front of every non-word character in the filename
    $file =~ s/(\W)/\\$1/g;
}
So far, what I understand is that it deals with whitespace, but I can't work out what the code means. In particular, what is $1 and how does it work?

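It puts a backslash in front of every non-word character (anything other than a letter, digit, or underscore). For example, "untitled file" becomes "untitled\ file".
As in most regular expression operations, $1 holds the text captured by the first (...) group, which in this case is (\W): a single non-word character. The replacement \\$1 is therefore a literal backslash followed by that captured character.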
I think this is an unnecessary home-rolled version of quotemeta.
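For comparison, here is a minimal JavaScript sketch of the same substitution; the function name escapeNonWord is hypothetical. One caveat: JavaScript's \W is ASCII-based (\w is [A-Za-z0-9_]), while Perl's \W can be Unicode-aware, so non-ASCII filenames may be escaped differently.

```javascript
// Sketch: JavaScript equivalent of Perl's s/(\W)/\\$1/g.
// escapeNonWord is a hypothetical name used only for illustration.
function escapeNonWord(s) {
  // (\W) captures one non-word character; "\\$1" in a JS string is
  // a backslash followed by $1, i.e. the captured character.
  return s.replace(/(\W)/g, "\\$1");
}

console.log(escapeNonWord("untitled file"));    // untitled\ file
console.log(escapeNonWord("my-video (1).mp4")); // my\-video\ \(1\)\.mp4
```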

Related

Trying to understand regex snippet (/[-/\\^$*+?.()|[\]{}]/g, '\\$&') [duplicate]

This question already has answers here:
Difference between $1 and $& in regular expressions
(3 answers)
Closed 2 years ago.
I am most interested in learning more about the piece at the end, '\\$&'.
I'm not really sure what it's doing or how it works, but it gets the job done.
The code that I have:
function escapeRegExp(s) {
  return s.replace(/[-/\\^$*+?.()|[\]{}]/g, '\\$&')
}
const searchRegex = new RegExp(
  searchQuery
    .split(/\s+/g)
    .map(s => s.trim())
    .filter(s => !!s)
    .map(word => `(?=.*\\b${escapeRegExp(word)})`).join('') + '.+',
  'i'
)
$& is defined in MDN's String#replace reference, in the section "Specifying a string as a parameter", as:
$&: Inserts the matched substring.
Essentially, this gives you back the entire matched substring. In your example, '\\' in the string literal is a single backslash character, so each matched special character is replaced by a backslash followed by itself, which escapes it for use inside a RegExp.
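A small sketch of the effect, using the question's escapeRegExp unchanged; the sample strings are made up for illustration. It also shows how $& (the whole match) differs from $1 (only the first capture group).

```javascript
// The question's escapeRegExp, unchanged: every character in the class
// is replaced by a backslash plus the match itself ($&).
function escapeRegExp(s) {
  return s.replace(/[-/\\^$*+?.()|[\]{}]/g, '\\$&');
}

const escaped = escapeRegExp("1+1=2?");
console.log(escaped); // 1\+1=2\?

// The escaped text now matches literally when compiled into a RegExp...
console.log(new RegExp(escaped).test("is 1+1=2?")); // true
// ...while the raw text is read as "1+" (one or more 1s), "1=", "2?".
console.log(new RegExp("1+1=2?").test("1+1=2?"));   // false

// $& is the whole match ("ab"); $1 is only the first group ("b").
console.log("abc".replace(/a(b)/, "<$1|$&>"));      // <b|ab>c
```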
Here are some examples:
// replace every character with itself doubled
console.log("abc".replace(/./g, "$&$&"));
// replace entire string with itself doubled
console.log("abc".replace(/.*/g, "$&$&"));
// prepend a backslash to every match
console.log("abc".replace(/./g, "\\$&"));
// behavior with capture groups that can be accessed with $1, $2...
console.log("abc".replace(/(.)(.)/g, "[ $$1: $1 ; $$2: $2 ; $$&: $& ]"));
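To see what the question's searchRegex does with the escaped words, here is a sketch that wraps the same construction in a function; buildSearchRegex is a hypothetical name. Each word becomes a lookahead (?=.*\bword), so on a single-line string the regex matches only if every word appears somewhere, in any order, starting at a word boundary, ignoring case.

```javascript
function escapeRegExp(s) {
  return s.replace(/[-/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical wrapper around the question's searchRegex construction.
function buildSearchRegex(searchQuery) {
  return new RegExp(
    searchQuery
      .split(/\s+/g)
      .map(s => s.trim())
      .filter(s => !!s) // drop empty strings from leading/trailing spaces
      // one lookahead per word: "somewhere ahead, at a word boundary"
      .map(word => `(?=.*\\b${escapeRegExp(word)})`)
      .join('') + '.+',
    'i'
  );
}

const re = buildSearchRegex("  foo bar ");
console.log(re.source);                // (?=.*\bfoo)(?=.*\bbar).+
console.log(re.test("Bar, then FOO")); // true: any order, case-insensitive
console.log(re.test("foobar"));        // false: "bar" is not at a word boundary
```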

RegEx for Dutch ING bank statement [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
Can anyone help me extract the marked pieces from this file (see image below) with a regular expression? As you can see, it's difficult because the length is not always the same, and the part before my target is sometimes wrapped onto the next line and sometimes not.
Thank you in advance.
Text:
:61:200106D48,66NDDTEREF//00060100142533
/TRCD/01028/
:86:/EREF/SLDD-0705870-5658387529//MARF/11514814-001//CSID/NL59ZZZ390
373820000//CNTP/NL96ABNA0123456789/ABCANL2A/XXXXXXX123///REMI/UST
D//N00814760/
:61:200106D1840,55NDDTEREF//00060100142534
/TRCD/01028/
:86:/EREF/SLDD-0705869-5658387528//MARF/11514814-001//CSID/NL59ZZZ390
373820000//CNTP/NL96ABNA0123456789/ABCANL2A/XXX123XXXX///REMI/UST
D//N00814759/
:61:200106C236,31NTRFEREF//00060100142535
/TRCD/00100/
:86:/EREF/05881000010520//CNTP/NL19INGB0123456789/ABCBNL2A/XX123XXXX//
/REMI/USTD//KLM REF 1000000022/
The length is not always the same, but that does not really matter in your case: you can check for a particular pattern at the end of each line.
(?<=\/\/)([\u2022a-zA-Z0-9]+)(?=\/$)
This regex looks for a run of characters made up of bullets (•), digits, and letters (uppercase and lowercase) that is preceded by two forward slashes (//) and followed by a slash (/) and the end of the line ($). For $ to match at the end of every line rather than only at the end of the whole text, the multiline (m) flag must be enabled.
You can test more cases here
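As a quick check, here is a JavaScript sketch that runs the answer's regex over the sample text from the question with the g and m flags; the variable names are illustrative.

```javascript
// Sample text copied from the question, one array entry per line.
const statement = [
  ":61:200106D48,66NDDTEREF//00060100142533",
  "/TRCD/01028/",
  ":86:/EREF/SLDD-0705870-5658387529//MARF/11514814-001//CSID/NL59ZZZ390",
  "373820000//CNTP/NL96ABNA0123456789/ABCANL2A/XXXXXXX123///REMI/UST",
  "D//N00814760/",
  ":61:200106D1840,55NDDTEREF//00060100142534",
  "/TRCD/01028/",
  ":86:/EREF/SLDD-0705869-5658387528//MARF/11514814-001//CSID/NL59ZZZ390",
  "373820000//CNTP/NL96ABNA0123456789/ABCANL2A/XXX123XXXX///REMI/UST",
  "D//N00814759/",
  ":61:200106C236,31NTRFEREF//00060100142535",
  "/TRCD/00100/",
  ":86:/EREF/05881000010520//CNTP/NL19INGB0123456789/ABCBNL2A/XX123XXXX//",
  "/REMI/USTD//KLM REF 1000000022/",
].join("\n");

// The answer's pattern: m makes $ match at every line end, g finds all matches.
const target = /(?<=\/\/)([\u2022a-zA-Z0-9]+)(?=\/$)/gm;

console.log(statement.match(target)); // [ 'N00814760', 'N00814759' ]
```

The last record's reference, KLM REF 1000000022, contains spaces, so this character class does not pick it up; if that value is also one of the marked pieces, the class would need to allow spaces.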

How to filter out c-type comments with regex? [duplicate]

This question already has answers here:
Regex to match a C-style multiline comment
(8 answers)
Improving/Fixing a Regex for C style block comments
(5 answers)
Strip out C Style Multi-line Comments
(4 answers)
Closed 3 years ago.
I'm trying to filter out C-style comments in a line so that I'm left with only the words (or actual code).
This is what I have so far: demo
regex:
\/\*[^\/]*[^\*]*\*\/
text:
/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/
My guess is that this expression might work:
\/\*(\/\*\*\/)?\s*([^\/*]+?)\s*(?:\/?\*?\*?\/|\*)
If the inputs were different, we would modify the left and right boundaries.
The expression is explained in this demo, if you are interested.
We can try doing a regex replacement on the following pattern:
/\*.*?\*/
This matches any C-style /* ... */ comment that fits on one line (by default the dot does not match newlines). It uses a lazy dot .*? so that each match stops at the first */ and covers only a single comment, instead of running on to the end of a later comment. We can then replace the matches with an empty string to remove the comments from the input.
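Code:
' Regex lives in the System.Text.RegularExpressions namespace
Imports System.Text.RegularExpressions
' The statements below go inside a method such as Sub Main()
Dim input As String = "/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/"
Dim output As String = Regex.Replace(input, "/\*.*?\*/", "")
Console.WriteLine(input)
Console.WriteLine(output)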
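After echoing the input, this prints the following (apart from the leftover spaces where the comments used to be):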
one two three four five
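The lazy-versus-greedy difference can be sketched in JavaScript with the question's input. Note that the replacement leaves behind the spaces that surrounded each comment, so the raw result has extra spaces; collapsing them is an extra step not in the original answer.

```javascript
const input = "/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/";

// Lazy .*? stops at the first "*/" after each "/*".
const lazy = input.replace(/\/\*.*?\*\//g, "");
// Greedy .* runs to the LAST "*/", swallowing the code in between.
const greedy = input.replace(/\/\*.*\*\//g, "");

console.log(JSON.stringify(lazy));   // " one two  three four  five "
console.log(JSON.stringify(greedy)); // ""
// Collapsing the leftover whitespace gives the clean word list.
console.log(lazy.trim().split(/\s+/).join(" ")); // one two three four five
```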

What does \. mean when matching in Perl? (backslash dot) [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
I am wondering what the characters \. mean in Perl, specifically in a matching expression. I know the \ can be an escape character. Is it simply escaping the dot? Or does it have an additional meaning together?
In the context below, I am assuming that when the condition ($ARGV[$i] =~ /\./) is satisfied, the argument $ARGV[$i] is not appended to the variable $Chain. I tried looking up information on Perl regular expressions and matching, but I am having trouble applying it to this context.
for (my $i = 0; $i <= $#ARGV; $i++) {
    if ($ARGV[$i] && ! ($ARGV[$i] =~ /\./)) {
        $Chain .= " " . $ARGV[$i];
    }
}
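It's escaping the period so that it matches a literal period instead of having the period's usual special meaning (any single character except a newline). So your reading is right: any argument that contains a dot is skipped, and every other true argument is appended to $Chain with a leading space.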
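The difference between . and \. looks the same in JavaScript regex syntax, so here is a sketch that also mirrors the question's loop. The function name chainArgs and the argument values are made up, and args stands in for @ARGV.

```javascript
// Unescaped . matches any single character (except a line terminator).
console.log(/./.test("abc"));       // true
// Escaped \. matches only a literal period.
console.log(/\./.test("abc"));      // false
console.log(/\./.test("file.txt")); // true

// Hypothetical mirror of the Perl loop: append each argument that
// contains no literal dot, prefixed with a space.
function chainArgs(args) {
  let chain = "";
  for (const arg of args) {
    // Perl treats "" and "0" as false, so skip those like $ARGV[$i] && ...
    const perlTrue = arg !== "" && arg !== "0";
    if (perlTrue && !/\./.test(arg)) {
      chain += " " + arg;
    }
  }
  return chain;
}

console.log(JSON.stringify(chainArgs(["A", "input.pdb", "B"]))); // " A B"
```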

Weird behaviour of the global g regex flag [duplicate]

This question already has answers here:
Help understanding global flag in perl
(2 answers)
Closed 9 years ago.
my $test = "There was once an\n ugly ducking";
if ($test =~ m/ugly/g) {
    if ($test =~ m/here/g) {
        print 'Match';
    }
}
Results in no output, but
my $test = "There was once an\n ugly ducking";
if ($test =~ m/here/g) {
    if ($test =~ m/ugly/g) {
        print 'Match';
    }
}
results in Match!
If I remove the g flag from the regexes, the inner test matches regardless of the order in which the words appear in $test. I can't find any reference explaining why.
Yes. That behaviour is documented in the perlop man page: in scalar context, m/.../g remembers where the last match ended, and the next m//g on the same string starts searching from that position.
In scalar context, each execution of "m//g" finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the "pos()" function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "/c" modifier (e.g. "m//gc"). Modifying the target string also resets the search position.
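So in the first case there is no "here" substring after "ugly", and the inner match fails. In the second case, the first match finds "here" inside "There", and the inner search then continues from that point and finds "ugly" later in the string.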
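JavaScript keeps the search position on the RegExp object (lastIndex) rather than on the string, so two separate regex literals would not reproduce this. The sketch below emulates Perl's per-string pos() by passing the position along explicitly; matchG and nestedMatch are hypothetical helpers, not built-ins.

```javascript
// Sketch: emulate Perl's per-string pos() for scalar-context m//g.
function matchG(str, pattern, pos) {
  const re = new RegExp(pattern.source, "g"); // g makes exec() honour lastIndex
  re.lastIndex = pos;                         // continue where the last match ended
  const m = re.exec(str);
  // On failure Perl resets pos() to the start of the string (unless /c is used).
  return m ? { matched: true, pos: re.lastIndex } : { matched: false, pos: 0 };
}

// Mirrors the nested ifs: the inner match starts where the outer one ended.
function nestedMatch(str, outer, inner) {
  const first = matchG(str, outer, 0);
  return first.matched && matchG(str, inner, first.pos).matched;
}

const text = "There was once an\n ugly ducking";
console.log(nestedMatch(text, /ugly/, /here/)); // false: no "here" after "ugly"
console.log(nestedMatch(text, /here/, /ugly/)); // true: "here" in "There", then "ugly"
```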