I'm getting a CR between the regex match and the ','. What's going on?
$r_date ='ExposeDateTime=([\w /:]{18,23})'
$v2 = (Select-String -InputObject $_ -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) + ',';
Example of output:
9/25/2018 8:45:19 AM[CR],
Original String:
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0
Try this:
$original = @"
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0
"@
$r_date ='ExposeDateTime=([\d\s/:]+(?:(?:A|P)M)?)'
$v2 = (Select-String -InputObject $original -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) -join ','
Regex details:
ExposeDateTime= Match the characters “ExposeDateTime=” literally
( Match the regular expression below and capture its match into backreference number 1
[\d\s/:] Match a single character present in the list below
A single digit 0..9
A whitespace character (spaces, tabs, line breaks, etc.)
One of the characters “/:”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
(?: Match the regular expression below
Match either the regular expression below (attempting the next alternative only if this one fails)
A Match the character “A” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
P Match the character “P” literally
)
M Match the character “M” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
If your input is a multiline string stored in $Original, then this rather simpler regex seems to do the job. [grin] It uses a named capture group and the multiline regex flag to capture the string after ExposeDateTime= and before the next line ending.
$Original -match '(?m)ExposeDateTime=(?<Date>.+)$'
$Matches.Date
output ...
9/25/2018 8:45:19 AM
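One caveat with the pattern above: in .NET regexes `.` matches a carriage return, so with CRLF line endings the captured value can end in a CR, which may be the symptom described in the question. A minimal sketch, assuming CRLF input (the sample string is built inline here purely for illustration), that keeps both line-ending characters out of the capture:

```powershell
# Assumed sample input with Windows (CRLF) line endings
$Original = "ExposeDateTime=9/25/2018 8:45:19 AM`r`nError=Dap`r`nPostKvp=106"

# [^\r\n]+ stops before either line-ending character, so no CR is captured
if ($Original -match 'ExposeDateTime=(?<Date>[^\r\n]+)') {
    $v2 = $Matches.Date + ','
}
$v2
```

Calling .Trim() on the captured value would be another way to drop a stray CR.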
Related
I have a file in which I need to modify a URL, not knowing what that URL contains, just like in the following example:
In file file.txt I have to replace the URL, so that whether it is "https://SomeDomain/Release/SomeText" or "https://SomeDomain/Staging/SomeText", it becomes "https://SomeDomain/Deploy/SomeText". In other words, whatever is written between SomeDomain and SomeText should be replaced with a known string. Is there a regex that can help me achieve this?
I used to do it with the following command:
((Get-Content -path "file.txt" -Raw) -replace '"https://SomeDomain/Release/SomeText");','"https://SomeDomain/Staging/SomeText");') | Set-Content -Path "file.txt"
This works fine, but I have to know if in file.txt the URL contains Release or Staging before executing the command.
Thanks!
You can do this with a regex -replace, where you capture the parts you wish to keep and use the backreferences to recreate the new string
$fileName = 'Path\To\The\File.txt'
$newText = 'BLAHBLAH'
# read the file as single multilined string
(Get-Content -Path $fileName -Raw) -replace '(https?://\w+/)[^/]+(/.*)', "`$1$newText`$2" | Set-Content -Path $fileName
Regex details:
( Match the regular expression below and capture its match into backreference number 1
http Match the characters “http” literally
s Match the character “s” literally
? Between zero and one times, as many times as possible, giving back as needed (greedy)
:// Match the characters “://” literally
\w Match a single character that is a “word character” (letters, digits, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
/ Match the character “/” literally
)
[^/] Match any character that is NOT a “/”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 2
/ Match the character “/” literally
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
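To check the pattern without touching a file, here is a small sketch that runs it against the two URLs from the question. The $samples and $results names are made up for this illustration, and the replacement uses the ${1} form of the backreference, which avoids any ambiguity if $newText happens to start with a digit.

```powershell
$newText = 'Deploy'

# The two URL variants mentioned in the question
$samples = 'https://SomeDomain/Release/SomeText',
           'https://SomeDomain/Staging/SomeText'

# -replace is applied to each element of the array in turn
$results = $samples -replace '(https?://\w+/)[^/]+(/.*)', "`${1}$newText`${2}"
$results
```

Both inputs should come out with the same Deploy path, without knowing beforehand which variant was present.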
I'm searching a ton of logs using a foreach loop for a string ($text) and currently outputting the entire line to an output file ($logfile)
Get-ChildItem "\\$server\$Path" -Filter "*.log" |select-string -pattern $text |select -expandproperty line |out-file $logfile -append
A sample line of one of the log files might look like this
May 25 04:08:36.640 2016 AUDITOF GUID 1312.2657.11075.54819.13021094807.198 opened by USER
where $text = "opened by USER"
All of this works fine and it spits out every line of every log file that includes $text which is great.
But.. what I think I'd like to do is get an output of the date time and the GUID. The Guid can change formats, lengths, etc., but it will always have dots and will always follow GUID (space) and precede (space) opened
In short, I'm trying to regex using a lookbehind (or lookforward) or match that would return something like this to the $logfile
May 25 04:08:36.640 2016,1312.2657.11075.54819.13021094807.198
Any help appreciated. I'm lousy with Regex.
One way would be to do this
$result = Get-ChildItem "\\$server\$Path" -Filter "*.log" -File |
Select-String -Pattern $text -SimpleMatch |
Select-Object -ExpandProperty Line |
ForEach-Object {
if ($_ -match '([a-z]{3,}\s*\d{2}\s*\d{2}:\d{2}:\d{2}\.\d{3}\s*\d{4}).*GUID ([\d.]+)') {
'{0},{1}' -f $matches[1], $matches[2]
}
}
$result | Out-File $logfile -Append
Explanation:
I added switch -SimpleMatch to the Select-String cmdlet, because it seems you want to match $text exactly and since it doesn't use regex there, this would be the best option.
Select-Object -ExpandProperty Line could return an array of matching lines, so I pipe this to ForEach-Object to loop through them.
The if (..) uses the regex -match and if that condition is $true, we do whatever is inside the curly brackets.
Also, a successful -match automatically populates the $matches hashtable, and we use its entries to output a comma-separated line, which is then collected in the variable $result.
Finally we simply output that $result to a file
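As a small illustration of what -match leaves behind, the sketch below inspects $Matches after a successful match. The $sample string is a shortened, made-up stand-in for a log line.

```powershell
# Hypothetical shortened log line, for illustration only
$sample = 'GUID 1312.2657 opened by USER'

if ($sample -match 'GUID ([\d.]+)') {
    $kind  = $Matches.GetType().Name   # $Matches is a hashtable
    $whole = $Matches[0]               # key 0 holds the entire matched text
    $guid  = $Matches[1]               # key 1 holds the first capture group
}
'{0}: {1} / {2}' -f $kind, $whole, $guid
```

$Matches is only updated on a successful match, which is why reading it inside the if block is the safe place to do so.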
Regex details:
( Match the regular expression below and capture its match into backreference number 1
[a-z] Match a single character in the range between “a” and “z”
{3,} Between 3 and unlimited times, as many times as possible, giving back as needed (greedy)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d Match a single digit 0..9
{2} Exactly 2 times
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d Match a single digit 0..9
{2} Exactly 2 times
: Match the character “:” literally
\d Match a single digit 0..9
{2} Exactly 2 times
: Match the character “:” literally
\d Match a single digit 0..9
{2} Exactly 2 times
\. Match the character “.” literally
\d Match a single digit 0..9
{3} Exactly 3 times
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d Match a single digit 0..9
{4} Exactly 4 times
)
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
GUID\ Match the characters “GUID ” literally
( Match the regular expression below and capture its match into backreference number 2
[\d.] Match a single character present in the list below
A single digit 0..9
The character “.”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
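Since the question says the GUID can change format but always sits between "GUID " and " opened", a variant using named groups and \S+ for the GUID might be a little more forgiving. This is only a sketch against the single sample line from the question; the group names Stamp and Guid are invented here.

```powershell
$line = 'May 25 04:08:36.640 2016 AUDITOF GUID 1312.2657.11075.54819.13021094807.198 opened by USER'

# Stamp: month name, day, time with milliseconds, year
# Guid : any run of non-whitespace between "GUID " and " opened"
$pattern = '^(?<Stamp>[a-z]{3,}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d{3}\s+\d{4}).*?GUID (?<Guid>\S+) opened'

if ($line -match $pattern) {
    $csv = '{0},{1}' -f $Matches.Stamp, $Matches.Guid
}
$csv
```

The -match operator is case-insensitive by default, which is why [a-z] also matches the capital M in "May".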
Can you please help me get the desired output? SIT is the environment and properties is the file type; I need to remove the environment and the extension from the string.
#$string="<ENV>.<can have multiple period>.properties"
$string ="SIT.com.local.test.stack.properties"
$b=$string.split('.')
$b[0].Substring(1)
Required output : com.local.test.stack //can have multiple period
This should do.
$string = "SIT.com.local.test.stack.properties"
# capture anything up to the first period, and in between first and last period
if($string -match '^(.+?)\.(.+)\.properties$') {
$environment = $Matches[1]
$properties = $Matches[2]
# ...
}
You may use
$string -replace '^[^.]+\.|\.[^.]+$'
This will remove the first 1+ chars other than a dot and then a dot, and the last dot followed with any 1+ non-dot chars.
Details
^ - start of string
[^.]+ - 1+ chars other than .
\. - a dot
| - or
\. - a dot
[^.]+ - 1+ chars other than .
$ - end of string.
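Applied to the sample string, the alternation above can be tried like this (a quick sketch; $result is just an illustrative variable name). When -replace is given no replacement operand, each match is replaced with an empty string.

```powershell
$string = 'SIT.com.local.test.stack.properties'

# First alternative strips the leading "SIT.", second strips the trailing ".properties"
$result = $string -replace '^[^.]+\.|\.[^.]+$'
$result
```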
You can use -match to capture your desired output using regex
$string ="SIT.com.local.test.stack.properties"
$string -match "^.*?\.(.+)\.[^.]+$"
$Matches.1
You can do this with the Split operator also.
($string -split "\.",2)[1]
Explanation:
You split on the literal . character with the regex \.. The ,2 syntax tells PowerShell to return at most 2 substrings from the split. The [1] index selects the second element of the returned array; [0] is the first substring (SIT in this case). Note that [1] is everything after the first dot, so it still ends in .properties.
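To drop both the environment and the extension with a split, one option is to split on every dot and rejoin only the middle parts. This is a sketch that assumes the string always has at least three dot-separated parts; $parts and $middle are illustrative names.

```powershell
$string = 'SIT.com.local.test.stack.properties'

# Split on every literal dot
$parts = $string -split '\.'

# Keep everything except the first (environment) and last (extension) element
$middle = $parts[1..($parts.Count - 2)] -join '.'
$middle
```

With fewer than three parts the index range no longer describes a "middle", so a length check would be needed for such input.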
Given
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
I can use
$line -match '^\{(?<type>[a-z]+)(-\[(?<target>(C|F|CF))\])?(\[(?<tab>\d+)\])?\}_(?<string>.*)'
And $matches['tab'] will correctly have a value of 3. However, if I then want to increment that value without also affecting the [3] in the string section, things get more complicated. I can use $tabIndex = $line.indexOf("[$tab]") to get the index of the first occurrence, and I can also use $newLine = ([regex]"\[$tab\]").Replace($line, '[4]', 1) to replace only the first occurrence. But I wonder, is there a way to get at this more directly? It's not strictly necessary, as I will only ever want to replace things within the initial {}_, which has a very consistent form, so replacing the first instance works. I'm just wondering if I am missing out on a more elegant solution, which might also be needed in a different situation.
I would change the regex a bit, because mixing Named captures with Numbered captures is not recommended, so it becomes this:
'^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)'
You could then use it like below to replace the tab value:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$newTabValue = 12345
$line -replace '^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)', "{`${type}-[`${target}][$newTabValue]}_`${string}"
The result of this will be:
{initError-[cf][12345]}_Invalid nodes(s): [3]
Regex details:
^ Assert position at the beginning of the string
\{ Match the character “{” literally
(?<type> Match the regular expression below and capture its match into backreference with name “type”
[a-z] Match a single character in the range between “a” and “z”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?: Match the regular expression below
- Match the character “-” literally
\[ Match the character “[” literally
(?<target> Match the regular expression below and capture its match into backreference with name “target”
[CF] Match a single character present in the list “CF”
{1,2} Between one and 2 times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
\[ Match the character “[” literally
(?<tab> Match the regular expression below and capture its match into backreference with name “tab”
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
\} Match the character “}” literally
_ Match the character “_” literally
(?<string> Match the regular expression below and capture its match into backreference with name “string”
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
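If the goal is to increment the existing tab value rather than substitute a known one, a possible sketch is to target only the digits inside the leading {...}_ block and compute the replacement in a match evaluator. The lookaround pattern and the $evaluator name are assumptions made for this example, not part of the answer above.

```powershell
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'

# Digits preceded by "{...[" with no "}" in between, and followed by "]}_"
$pattern = '(?<=^\{[^}]*\[)\d+(?=\]\}_)'

# The script block receives each Match object and returns the replacement text
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    [string]([int]$m.Value + 1)
}

$newLine = [regex]::Replace($line, $pattern, $evaluator)
$newLine
```

Because the lookbehind is anchored to the start of the string and refuses to cross a closing brace, the [3] in the message part is left alone.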
An alternative way of increasing the first number in the brackets is using the -Split operator to access the number you want to change:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$NewLine = $line -split "(\d+)"
$NewLine[1] = [int]$newLine[1] + 1
-join $NewLine
Output:
{initError-[cf][4]}_Invalid nodes(s): [3]
I am editing a Perl file, but I don't understand this regexp comparison. Can someone please explain it to me?
if ($lines =~ m/(.*?):(.*?)$/g) { } ..
What happens here? $lines is a line from a text file.
Break it up into parts:
$lines =~ m/ (.*?) # Match any character (except newlines)
# zero or more times, not greedily, and
# stick the results in $1.
: # Match a colon.
(.*?) # Match any character (except newlines)
# zero or more times, not greedily, and
# stick the results in $2.
$ # Match the end of the line.
/gx;
So, this will match strings like ":" (it matches zero characters, then a colon, then zero characters before the end of the line, $1 and $2 are empty strings), or "abc:" ($1 = "abc", $2 is an empty string), or "abc:def:ghi" ($1 = "abc" and $2 = "def:ghi").
And if you pass in a line that doesn't match (it looks like this would be if the string does not contain a colon), then it won't run the code that's within the braces. But if it does match, then the code within the braces can use and process the special $1 and $2 variables (at least, until the next regular expression shows up, if there is one within the braces).
There is a tool to help understand regexes: YAPE::Regex::Explain.
Ignoring the g modifier, which is not needed here:
use strict;
use warnings;
use YAPE::Regex::Explain;
my $re = qr/(.*?):(.*?)$/;
print YAPE::Regex::Explain->new($re)->explain();
__END__
The regular expression:
(?-imsx:(.*?):(.*?)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
See also perldoc perlre.
It was written by someone who either knows too much about regular expressions or not enough about the $' and $` variables.
This could have been written as
if ($lines =~ /:/) {
... # use $` ($PREMATCH) instead of $1
... # use $' ($POSTMATCH) instead of $2
}
or
if ( ($var1,$var2) = split /:/, $lines, 2 and defined($var2) ) {
... # use $var1, $var2 instead of $1,$2
}
(.*?) captures any characters, but as few of them as possible.
So it looks for patterns like <something>:<somethingelse><end of line>, and if there are multiple : in the string, the first one will be used as the divider between <something> and <somethingelse>.
That line says to perform a regular expression match on $lines with the regex m/(.*?):(.*?)$/g. It will effectively return true if a match can be found in $lines and false if one cannot be found.
An explanation of the =~ operator:
Binary "=~" binds a scalar expression
to a pattern match. Certain operations
search or modify the string $_ by
default. This operator makes that kind
of operation work on some other
string. The right argument is a search
pattern, substitution, or
transliteration. The left argument is
what is supposed to be searched,
substituted, or transliterated instead
of the default $_. When used in scalar
context, the return value generally
indicates the success of the
operation.
The regex itself is:
m/ #Perform a "match" operation
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
: #Match a literal colon character
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
$ #Match the end of string
/g #Global flag: in the scalar context of an if, it only makes the next match on $lines resume where this one ended, so it is not needed here
So if $lines matches against that regex, it will go into the conditional portion, otherwise it will be false and will skip it.