Regex replace between 2 words and modify result - regex

I have this string example:
var s = 'type=audio&hls=&mp3=foo';
I would like to find everything between = and & and replace with quotes + matched value so I get this:
type="audio" hls="" mp3="foo"
(the match is in quotes even if it's empty, and & gets replaced with a space)
This is my regex, but it's not working:
s = s.replace(/=.+?\\&/g,function(a,inside){
return '="'+inside+'" ';
})

If we consider your regex one token at a time, here's what it means:
= matches a literal equal sign
.+? matches one or more characters, as few as possible (a lazy match)
\\ matches a literal backslash
& matches a literal ampersand
Among other problems, this requires a literal backslash in the input string and requires every match to end with an ampersand, so the final mp3=foo is never matched. Also, .+? needs at least one character, so it cannot match the empty value after hls=, and it can run across ampersands and equal signs while looking for the rest of the pattern.
Also, there is no need to replace with a function, as JavaScript can do what you are doing with just a string replacement.
A better regex might have these tokens:
= matches a literal equal sign
[^&]* matches a string (possibly empty) that does not contain ampersands
&? matches an optional ampersand
As Wiktor pointed out above, this could all be combined together like this:
s = s.replace(/=([^&]*)&?/g, '="$1" ').trim();
Here the parentheses capture the portion of the match that should be kept, $1 refers to that captured portion in the replacement string, and .trim() removes the trailing space.
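For comparison, here is a minimal sketch of the same replacement in Python, assuming the same sample string. The function name quote_values is made up for illustration; \1 plays the role of $1 and .strip() the role of .trim().

```python
import re

def quote_values(query: str) -> str:
    # '=' followed by a captured run of non-'&' characters (possibly empty),
    # then an optional '&'. \1 re-inserts the captured value inside quotes.
    return re.sub(r'=([^&]*)&?', r'="\1" ', query).strip()

result = quote_values('type=audio&hls=&mp3=foo')
print(result)
```

Because [^&]* may match nothing, the empty hls= value still produces a pair of quotes.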

Related

Powershell regex for string between two special characters

A file name as below
$inpFiledev = "abc_XYZ.bak"
I need only XYZ in a variable to compare it with another file name.
I tried the following:
[String]$findev = [regex]::match($inpFiledev ,'_*.').Value
Write-Host $findev
Asterisks in regex don't behave in the same way as they do in filesystem listing commands. As it stands your regex is looking for underscore, repeated zero or more times, followed by any character (represented in regex by a period). So the regex finds zero underscores right at the start of the string, then it finds 'a', and that's the match it returns.
First, correct that bit:
'_*.'
Becomes "underscore, followed by any number of characters, followed by a literal period". The 'literal period' means we need to escape the period in the regex, by using \., remembering that period means any character:
'_.*\.'
_ underscore
.* any number of characters
\. a literal period
That returns:
_XYZ.
So, not far off.
If you're looking to return something from between characters, you'll need to use capturing groups. Put parentheses around the bit you want to keep:
'_(.*)\.'
Then you'll need to use PowerShell regex groups to get the value:
[regex]::match($inpFiledev ,'_(.*)\.').Groups[1].Value
Which returns: XYZ
The number 1 in Groups[1] just means the first capturing group. You can add as many groups as you like by using more parentheses, but you only need one in this case.
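The three stages of that regex can also be sketched in Python, whose re module treats these patterns the same way. The variable names are illustrative only.

```python
import re

name = "abc_XYZ.bak"

# Zero underscores, then any one character: matches at the very start.
first_try = re.search(r'_*.', name).group(0)

# Underscore, any characters, then a literal period: the whole match.
whole_match = re.search(r'_.*\.', name).group(0)

# Same pattern, but only the text inside the parentheses is returned.
captured = re.search(r'_(.*)\.', name).group(1)

print(first_try, whole_match, captured)
```

Here group(0) is the whole match and group(1) is the first capturing group, mirroring .Value and .Groups[1].Value.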
To complement mjsqu's helpful answer with two PowerShell-idiomatic alternatives:
For an overview of how regexes (regular expressions) are used in PowerShell, see Get-Help about_regular_expressions.
Using -split to split by _ and ., extracting the resulting 3-element array's middle element:
PS> ("abc_XYZ.bak" -split '[_.]')[1]
XYZ
-split's (first) RHS operand is a regex; regex [_.] is a character set ([...]) that matches a single char. that is either a literal _ or a literal . Therefore, input abc_XYZ.bak is broken into an array containing the strings abc, XYZ, and bak. Applying index [1] therefore extracts the middle token, XYZ.
Using -replace to extract the token of interest via a capture group ((...), referred to in the replacement operand as $1):
PS> "abc_XYZ.bak" -replace '^.+_([^.]+).+$', '$1'
XYZ
-replace too operates on a regex as the first RHS operand - what to replace - whereas the second operand specifies what to replace the matched (sub)string with.
Regex ^.+_([^.]+).+$:
^.+_ matches one or more (+) characters (.) at the start of the input (^) - note how . - used outside of a character set ([...]) - is a regex metacharacter that represents any character (in a single-line input string).
([^.]+) is a capture group ((...)) that matches a negated character set ([^...]): [^.] matches any literal char. that isn't a literal ., one or more times (+).
Whatever matched the sub-expression inside (...) can be referenced in the replacement operand as $<n>, where <n> represents the 1-based index of the capture group in the regex; in this case, $1 can be used to refer to this first (and only) capture group.
.+$ matches one or more (+) remaining characters (.) until the end of the input is reached ($).
Replacement operand $1 simply refers to what the first capture group matched; in this case: XYZ.
For a comprehensive overview of the syntax of -replace replacement operands, see this answer.
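As a rough cross-check outside PowerShell, the same two regexes can be tried in Python, where re.split and re.sub stand in for -split and -replace and \1 stands in for $1. This is a sketch for the sample name only.

```python
import re

name = "abc_XYZ.bak"

# Split on either a literal '_' or a literal '.' and take the middle element.
parts = re.split(r'[_.]', name)
middle = parts[1]

# Match the whole string and replace it with the first capture group.
replaced = re.sub(r'^.+_([^.]+).+$', r'\1', name)

print(parts, middle, replaced)
```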
Because you're using the [regex] accelerator, you need the backslash to escape your end . (if you want to match it), and you need a dot before your asterisk to match any characters after your underscore. If the characters in between are all letters, then use \w+.
$findev = [regex]::match($inpFiledev ,'_.*\.')
$findev
_XYZ.
this demos two other ways to get the desired info from the sample string. the 1st uses the basic .Split() string method on the raw string. the 2nd presumes you are dealing with file objects and starts off by getting the .BaseName for the file. that already removes the extension, so you need not bother doing it yourself.
if you are dealing with a large number of strings, and not file objects, then the previous regex answers will likely be faster. [grin]
$inpFiledev = 'abc_XYZ.bak'
$findev = $inpFiledev.Split('.')[0].Split('_')[-1]
# fake reading in a file with Get-Item or Get-ChildItem
$File = [System.IO.FileInfo]'c:\temp\testing\abc_XYZ.bak'
$WantedPart = $File.BaseName.Split('_')[-1]
'split on a string = {0}' -f $findev
'split on BaseName of file = {0}' -f $WantedPart
output ...
split on a string = XYZ
split on BaseName of file = XYZ
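The same two non-regex approaches can be sketched in Python, assuming pathlib's PureWindowsPath as a stand-in for the FileInfo object. Its .stem attribute drops the extension much like .BaseName does.

```python
from pathlib import PureWindowsPath

inp_file = "abc_XYZ.bak"

# Split on the string itself: drop the extension, then take the last '_' part.
from_string = inp_file.split('.')[0].split('_')[-1]

# Work from a path object: .stem is the file name without its extension.
file_path = PureWindowsPath(r'c:\temp\testing\abc_XYZ.bak')
from_stem = file_path.stem.split('_')[-1]

print(f"split on a string = {from_string}")
print(f"split on stem of file = {from_stem}")
```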

How to exclude part of a string using regex and add this part at the end of the string?

I've got a little problem with regex.
I got few strings in one file looking like this:
TEST.SYSCOP01.D%%ODATE
TEST.SYSCOP02.D%%ODATE
TEST.SYSCOP03.D%%ODATE
...
What I need is to define correct regex and change those string name for:
TEST.D%%ODATE.SYSCOP.#01
TEST.D%%ODATE.SYSCOP.#02
TEST.D%%ODATE.SYSCOP.#03
Actually, I got my regex:
r".SYSCOP[0-9]{2}.D%%ODATE" - for finding this in file
But what should the replacement pattern look like? I need to have the numbers from the string at the end of the new string name.
.D%%ODATE.SYSCOP.# - this is just a string, not a regex, and it didn't work
Any idea?
Find: (SYSCOP)(\d+)\.(D%%ODATE)
Replace: $3.$1.#$2 or \3.\1.#\2 for Python
You may use capturing groups with backreferences in the replacement part:
s = re.sub(r'(\.SYSCOP)([0-9]{2})(\.D%%ODATE)', r'\3\1.#\2', s)
Each \N in the replacement pattern refers to the Nth pair of parentheses (capturing group) in the pattern, so you may rearrange the matched value as needed.
Note that . must be escaped to match a literal dot.
Please mind the raw string literal: the r prefix before the string literals helps you avoid excessive backslashes. '\3\1.#\2' is not the same as r'\3\1.#\2'; you may print the string literals and see for yourself. In short, inside raw string literals, string escape sequences like \a, \f, \n or \r are not recognized, and the backslash is treated as a literal backslash, the very one used to build regex escape sequences. (Note that r'\n' and '\n' both match a newline: the first is a regex escape sequence matching a newline and the second is a literal LF character.)
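A short sketch that applies the substitution to the three sample names and shows how the raw and non-raw replacement strings differ. The list and variable names are illustrative.

```python
import re

lines = [
    "TEST.SYSCOP01.D%%ODATE",
    "TEST.SYSCOP02.D%%ODATE",
    "TEST.SYSCOP03.D%%ODATE",
]

pattern = re.compile(r'(\.SYSCOP)([0-9]{2})(\.D%%ODATE)')

# Reorder the groups: date part first, then SYSCOP, then '.#' plus the digits.
renamed = [pattern.sub(r'\3\1.#\2', line) for line in lines]
print(renamed)

# Without the r prefix, \3, \1 and \2 become octal character escapes,
# so the string holds control characters instead of backreferences.
raw_len = len(r'\3\1.#\2')
non_raw_len = len('\3\1.#\2')
print(raw_len, non_raw_len)
```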

kotlin String::replace removing escape sequences?

I'm trying some string manipulation using regexes, but I'm not getting the expected output:
var myString = "/api/<user_id:int>/"
myString.replace(Regex("<user_id:int>"), "(\\d+)")
This should give me something like /api/(\d+)/ but instead I get /api/(d+)/
However, if I create an escaped string directly like var a = "\d+"
I get the correct output \d+ (that I can further use to create a regex Pattern).
Is this due to the way String::replace works?
If so, isn't this a bug? Why is it removing my escape sequences?
To make the replace a literal string, use:
myString.replace(Regex("<user_id:int>"), Regex.escapeReplacement("(\\d+)"))
For details, this is what kotlin Regex.replace is doing:
Pattern nativePattern = Pattern.compile("<user_id:int>");
String m = nativePattern.matcher("/api/<user_id:int>/").replaceAll("(\\d+)");
-> m = /api/(d+)/
From Matcher.replaceAll() javadoc:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string. Dollar signs may be treated
as references to captured subsequences as described above, and
backslashes are used to escape literal characters in the replacement
string.
The call to Regex.escapeReplacement above does exactly that, turning (\\d+) to (\\\\d+)
You are using a .replace overload that takes a regex as the first argument, thus, the second argument is parsed as a regex replacement pattern. Inside a regex replacement pattern, a \ char is special, it may escape a dollar symbol to be treated as a literal dollar sign. So, the literal backslash inside regex replacement patterns should be doubled.
You might use
myString.replace(Regex("<user_id:int>"), """(\\d+)""")
Whenever you have to search and replace with a regex and your replacement pattern is a dynamic value, you should use Regex.escapeReplacement (see GUIDO's answer).
However, you are replacing a literal value with another literal value, you do not have to use a regex here:
myString.replace("<user_id:int>", """(\d+)""")
See this Kotlin demo yielding /api/(\d+)/.
Note the use of raw string literals where a backslash is parsed as a literal backslash.
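The same distinction between a literal replacement and a regex replacement template exists in other languages. As an analogy only (this is Python's re module, not Kotlin's Regex API), a sketch with three ways to end up with a literal \d in the result:

```python
import re

template = "/api/<user_id:int>/"

# Plain string replacement: no regex involved, so the backslash is kept as-is.
plain = template.replace("<user_id:int>", r"(\d+)")

# Regex replacement: a backslash in the replacement template must be doubled
# to produce one literal backslash in the output.
doubled = re.sub(r"<user_id:int>", r"(\\d+)", template)

# A function replacement is inserted literally, without template processing.
literal = re.sub(r"<user_id:int>", lambda m: r"(\d+)", template)

print(plain, doubled, literal)
```

When both the search text and the replacement are fixed literals, the plain string replacement is the simplest choice.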
The replacement, as the regex engine sees it, is interpolated like a double-quoted string.
This is true of many regex engines.
This is to distinguish control codes, like tab newline or carriage return.
Nothing special here.
So the replacement as the engine wants to see it is (\\d+).
The language interpolates the same.
Final result repl_str = "(\\\\d+)"

What is the meaning of this line in perl?

$line =~ s/^<(\w+)=\"(.*?)\">//;
What is the meaning of this line in perl?
The s/.../.../ is the substitution operator. It matches its first operand, which is a regular expression and replaces it with its second operand.
By default, the substitution operator works on a string stored in $_. But your code uses the binding operator (=~) to make it work on $line instead.
The two operands to the substitution operator are the bits delimited by the / characters (there are more advanced versions of these delimiters, but we'll ignore them for now). So the first operand is ^<(\w+)=\"(.*?)\"> and the second operand is an empty string (because there is nothing between the second and third / characters).
So your code says:
Examine the variable $line
Look for a section of the string which matches ^<(\w+)=\"(.*?)\">
Replace that part of the string with an empty string
All that is left now is for us to untangle the regular expression and see what it matches.
^ - matches the start of the string
< - matches a literal < character
(...) - means capture this bit of the match and store it in $1
\w+ - matches one or more "word characters" (where a word character is a letter, a digit or an underscore)
= - matches a literal = character
\" - matches a literal " character (the \ is unnecessary here)
(...) - means capture this bit of the match and store it in $2
.*? - matches zero or more instances of any character (except a newline), as few as possible (non-greedy)
\" - matches a literal " character (once again, the \ is unnecessary here)
> - matches a literal >
So, all in all, this looks like a slightly broken attempt to match XML or HTML. It matches tags of the form <foo="bar"> (which isn't valid XML or HTML) and replaces them with an empty string.
It's searching for an XML tag at the start of a string, and substituting it with nothing (i.e. removing it).
For example, in the input:
<hello="world">example
The regex will match <hello="world">, and substitute it with nothing - so the final result is just:
example
In general, this is something that you shouldn't do with regex. There are a dozen different ways a tag could slip through as a false negative and not get stripped from the string.
But if this is a "quick and dirty" script, where you don't need to worry about all possible edge cases, then it may be OK to use.
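To see the two capture groups and the removal in action, here is a rough Python equivalent of the Perl substitution, run on the sample input above. re.sub with count=1 is assumed as the counterpart of s/// without the /g flag.

```python
import re

line = '<hello="world">example'

# Same pattern as the Perl line; the quotes need no backslash.
pattern = re.compile(r'^<(\w+)="(.*?)">')

match = pattern.search(line)
name, value = match.group(1), match.group(2)  # what Perl stores in $1 and $2

# Replace the matched tag with an empty string, at most once.
stripped = pattern.sub('', line, count=1)

print(name, value, stripped)
```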

Perl regex: string literal containing metacharacters as variable for match expression

I need to define a string literal as a variable which will be later used as a match expression.
I want my variable $regex_op to match the string alt_id: ID: as well as the string id: ID:.
my $regex_op = "(id|alt_id):\sID:";
my $searchword = "4";
Later on, I'm joining the variables in a regular expression:
/^($regex_op)($searchword)/m
Unfortunately, the whitespace wildcard \s is an "Unrecognized escape \s passed through".
The problem apparently consists in the string literal containing backslashes (which are needed as part of the regex later on!).
Any ideas how to solve this?
For regexes, use regex quotes qr//. This ensures the correct parsing rules for regexes are used, not those for double quoted strings:
my $regex_op = qr/(?:id|alt_id):\sID:/; # I think that group should be non-capturing
my $searchword = 4;
/^$regex_op($searchword)/m; # no need to group $regex_op; unless you want to capture
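For comparison, a sketch of the same idea in Python, where a raw string plays a role similar to qr// by keeping \s intact for the regex engine. The sample text is invented, and re.escape is an extra precaution (not part of the Perl code) in case the search word ever contains metacharacters.

```python
import re

# The raw string keeps the backslash, so \s reaches the regex engine unchanged.
regex_op = r'(?:id|alt_id):\sID:'
searchword = '4'

# Join the pieces; re.MULTILINE makes ^ match at the start of every line (like /m).
pattern = re.compile(rf'^{regex_op}({re.escape(searchword)})', re.MULTILINE)

text = "name: x\nalt_id: ID:4\nid: ID:4\n"
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```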
In a double-quoted string, if a backslash is followed by a character that is not a known escape, then that character is left as is, but the backslash is removed:
"\s" eq "s"