How do I write a regular expression to get that returns only the letters and numbers without the asterisks in between ?
You could use a regex replacement here:
my $var = 'RMRIV43069411**2115.82';
$var =~ s/^.*?\D(\d+(?:\.\d+)*)$/$1/g;
print "$var"; // 2115.82
The idea is to capture the final number in the string, and then replace with only that captured quantity.
Here is an explanation of the pattern:
^ from the start of the input
.*? consume all content up until
\D the first non digit character, which is followed by
(\d+(?:\.\d+)*) match AND capture: a number, with optional decimal component,
occurring before
$ the end of the input
Then, we place with just this captured number, which is available in $1.
A file name as below
$inpFiledev = "abc_XYZ.bak"
I need only XYZ in a variable to do a compare with other file name.
i tried below:
[String]$findev = [regex]::match($inpFiledev ,'_*.').Value
Write-Host $findev
Asterisks in regex don't behave in the same way as they do in filesystem listing commands. As it stands your regex is looking for underscore, repeated zero or more times, followed by any character (represented in regex by a period). So the regex finds zero underscores right at the start of the string, then it finds 'a', and that's the match it returns.
First, correct that bit:
'_*.'
Becomes "underscore, followed by any number of characters, followed by a literal period". The 'literal period' means we need to escape the period in the regex, by using \., remembering that period means any character:
'_.*\.'
_ underscore
.* any number of characters
\. a literal period
That returns:
_XYZ.
So, not far off.
If you're looking to return something from between characters, you'll need to use capturing groups. Put parentheses around the bit you want to keep:
'_(.*)\.'
Then you'll need to use PowerShell regex groups to get the value:
[regex]::match($inpFiledev ,'_(.*)\.').Groups[1].Value
Which returns: XYZ
The number 1 in the Groups[1] just means the first capturing group, you can add as many as you like to the expression by using more parentheses, but you only need one in this case.
To complement mjsqu's helpful answer with two PowerShell-idiomatic alternatives:
For an overview of how regexes (regular expressions) are used in PowerShell, see Get-Help about_regular_expressions.
Using -split to split by _ and ., extracting the resulting 3-element array's middle element:
PS> ("abc_XYZ.bak" -split '[_.]')[1]
XYZ
-split's (first) RHS operand is a regex; regex [_.] is a character set ([...]) that matches a single char. that is either a literal _ or a literal . Therefore, input abc_XYZ.bak is broken into an array containing the strings abc, XYZ, and bak. Applying index [1] therefore extracts the middle token, XYZ.
Using -replace to extract the token of interest via a capture group ((...), referred to in the replacement operand as $1):
PS> "abc_XYZ.bak" -replace '^.+_([^.]+).+$', '$1'
XYZ
-replace too operates on a regex as the first RHS operand - what to replace - whereas the second operand specifies what to replace the matched (sub)string with.
Regex ^.+_([^.]+).+$:
^.+_ matches one or more (+) characters (.) at the start of the input (^) - note how . - used outside of a character set ([...]) - is a regex metacharacter that represents any character (in a single-line input string).
([^.]+) is a capture group ((...)) that matches a negated character set ([^...]): [^.] matches any literal char. that isn't a literal ., one or more times (+).
Whatever matched the sub-expression inside (...) can be referenced in the replacement operand as $<n>, where <n> represents the 1-based index of the capture group in the regex; in this case, $1 can be used to refer to this first (and only) capture group.
.+$ matches one or more (+) remaining characters (.) until the end of the input is reached ($).
Replacement operand $1 simply refers to what the first capture group matched; in this case: XYZ.
For a comprehensive overview of the syntax of -replace replacement operands, see this answer.
Because you're using the [regex] accelerator, you need the backslash to escape your end . (if you want to match it), and you need a dot before your asterix to match any characters after your underscore. If the characters in between are all letters, then use \w+
$findev = [regex]::match($inpFiledev ,'_.*\.')
$findev
_XYZ.
this demos two other ways to get the desired info from the sample string. the 1st uses the basic .Split() string method on the raw string. the 2nd presumes you are dealing with file objects and starts off by getting the .BaseName for the file. that already removes the extension, so you need not bother doing it yourself.
if you are dealing with a large number of strings, and not file objects, then the previous regex answers will likely be faster. [grin]
$inpFiledev = 'abc_XYZ.bak'
$findev = $inpFiledev.Split('.')[0].Split('_')[-1]
# fake reading in a file with Get-Item or Get-ChildItem
$File = [System.IO.FileInfo]'c:\temp\testing\abc_XYZ.bak'
$WantedPart = $File.BaseName.Split('_')[-1]
'split on a string = {0}' -f $findev
'split on BaseName of file = {0}' -f $WantedPart
output ...
split on a string = XYZ
split on BaseName of file = XYZ
I am using regexp replacing in emacs to replace the first term and third term's double quote to single quote. Input:
"term1" "term2" "term3" "term4"
(the term seperator is tab)
wanted output:
'term1' "term2" 'term3' "term4"
I used below regexp search and replacement strings:
search: "\(.+?\)" "\(.+?\)" "\(.+?\)"
replacement: '\1' "\2" '\3'
However, the actual output replaces first and fourth term instead:
'term1' "term2" "term3" 'term4'
Is there any mistake in my regexp?
Elisp regexps are greedy, so I expect your first group is actually matching the whole line, not just the "term". Try this instead:
"\([^"]+?\)" "\([^"]+?\)" "\([^"]+?\)"
The regex I'm searching has the following constraints:
it starts with "//"
then "[" a non number sequence (called delimiter in this list) and "]"
next line "\n"
"[" 0 or more number separated by the delimiter previously found "]".
For example the following text matches the regex:
//[*#*]
[1*#*34*#*64]
and the following text doesn't match the regex:
//[*#*]
[1#34#64]
because the delimiter is not the same matched in the first row
The regex I currently create is
^//\[(\D)+\]\n\[[(\d)+(\D)+]*(\d)+\]$|^//\[(\D)+\]\n\[\]$|^//\[(\D)+\]\n\[(\d)+\]$
but obviously this regex match with both previous examples.
Is there a way to "recall" a char sequence already matched in the regex itself?
You need something called back-reference (a very good tutorial here).
Use this regex in Python:
r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]'
Sample run:
>>> string = """//[*#*]
... [1*#*34*#*64]"""
>>> print re.search(r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]',string).group(0)
//[*#*]
[1*#*34*#*64]
will match your string in Python.
Debuggex Demo
You need to use a back-reference, in most languages you can reference a matching group using \n where n is the group number.
This pattern will work:
//\[([^]]++)]\n\[(?>\d++\1?)+]
To break it down:
// just matches the literal
\[([^]]++)] matches some characters in square brackets
\n matches the newline
\[(?:\d++\1?)++] matches one or more digits followed by the match captured in the first pattern section - optionally. This is an atomic group.
Is there a Powershell regex command I could use to replace the last consecutive zero in a text string with a "M". For Example:
$Pattern = #("000123456", "012345678", "000000001", "000120000")
Final result:
00M123456
M12345678
0000000M1
00M120000
Thanks.
Search for the following regex:
"^(0*)0"
The regex searches for a consecutive string of 0 at the beginning ^ of the string. It captures all the 0 except the one for replacement. "^0(0*)" also works, since we only need to take note of the number of 0 which we don't touch.
With the replacement string:
'$1M'
Note that $1 is denotes the text captured by the first capturing group, which is (0*) in the regex.
Example by #SegFault:
"000120000" -replace "^(0*)0", '$1M'