RegEx command to get date - regex

I'm using the following regex line in my PowerShell script to pull dates from .txt files. The script reads the dates and writes them to a .csv file in the format Year,Month,Day,Hour,Min,Sec (2020,06,20,00,50,56). I'm looking for some guidance on how to get just the date, without the commas, in the format 2020-06-20.
This is how the date appears in the .txt files; see the line that starts with Generated:
Node 001 Status Report - Report Version 20200505;
Generated 2020-06-20 00:50:56;
Below is the portion of the script that reads and pulls the date:
If($_ -imatch 'Generated'){
$Date = ([regex]::Matches($_,'\b\d+') | select value).value -join ','
}

You can use Select-String to read each file line by line and pattern match against each line:
Select-String -Path a.txt,b.txt -Pattern '^Generated (\d{4}-\d{2}-\d{2})' |
Foreach-Object { $_.Matches.Groups[1].Value }
Select-String also adds other benefits. Each pattern match is a MatchInfo object that contains the file name, the number of the line that matched, and the line itself. The -AllMatches switch will match as many times as possible per input line. The -Path parameter accepts an array of files and/or wildcards in the path. The [1] index selects the first unnamed capture group, which holds whatever matched within the first set of ().
As an aside, I would verify that the ####-##-## is actually a valid date unless you know that will always be so within your data. You can do this easily if your system culture settings allow for the date format:
Select-String -Path a.txt,b.txt -Pattern '^Generated (\d{4}-\d{2}-\d{2})' | Foreach-Object {
$_.Matches.Groups[1].Value | Where { $_ -as [datetime] }
}
If the culture settings do not allow the format, you will need to use ParseExact or TryParseExact to test the date.
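A minimal sketch of the TryParseExact route, assuming the captured text is always meant to be in yyyy-MM-dd form. The sample values in $candidates are made up for illustration; in the real script they would come from Groups[1].Value.

```powershell
# Hypothetical sample values standing in for the captured group text.
$candidates = '2020-06-20', '2020-13-45'

# The invariant culture keeps the check independent of system culture settings.
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$styles  = [System.Globalization.DateTimeStyles]::None

$validDates = @(foreach ($text in $candidates) {
    $parsed = [datetime]::MinValue
    # TryParseExact returns $true only when the text matches the format
    # exactly and represents a real calendar date.
    if ([datetime]::TryParseExact($text, 'yyyy-MM-dd', $culture, $styles, [ref]$parsed)) {
        $text
    }
})

$validDates
```

Here only 2020-06-20 survives, because month 13 is not a valid date.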
If you must work within your current data format, then you can do the following to extract the date from the comma-delimited string in the required format:
If($_ -imatch 'Generated'){
$Numbers = ([regex]::Matches($_,'\b\d+') | select value).value -join ','
$Date = ($Numbers -split ',')[0..2] -join '-'
}

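You are joining the matches with -join ',', which produces the commas. If you want dashes instead, change the separator to a dash. Note that this joins all six numbers (2020-06-20-00-50-56), so it changes the separator but does not drop the time.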
If($_ -imatch 'Generated'){
$Date = ([regex]::Matches($_,'\b\d+') | select value).value -join '-'
}

Related

Export data to CSV different row using regex

I used a regular expression to extract a string from a file and export it to CSV. I could not figure out how to extract each match value to a different row. The result ends up in a single cell:
{ 69630e4574ec6798, 78630e4574ec6798, 68630e4574ec6798}
I need it to be in different rows in CSV as below:
69630e4574ec6798
78630e4574ec6798
68630e4574ec6798
$Regex = [regex]"\s[a-f0-9]{16}"
Select-Object @{Name="Identity";Expression={$Regex.Matches($_.Textbody)}} |
Format-Table -Wrap |
Export-Csv -Path c:\temp\Inbox.csv -NoTypeInformation -Append
Edit:
I have been trying to split the data I have in my CSV but I am having difficulty in splitting the output data "id" to next line as they all come in one cell "{56415465456489944,564544654564654,46565465}".
In the screenshot below the first couple lines are the source input and the highlighted lines in the second group is the output that I am trying to get.
Change your regular expression so that it has the hexadecimal substrings in a capturing group (to exclude the leading whitespace):
$Regex = [regex]"\s([a-f0-9]{16})"
then extract the first group from each match:
$Regex.Matches($_.Textbody) | ForEach-Object {
$_.Groups[1].Value
} | Set-Content 'C:\temp\a.txt'
Use Set-Content rather than Out-File: in Windows PowerShell the latter creates the output file in Unicode (UTF-16 LE) by default, whereas the former defaults to a single-byte (ANSI) encoding. Both cmdlets allow overriding the default via the -Encoding parameter.
Edit:
To split the data from your id column and create individual rows for each ID you could do something like this:
Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
$row = $_
$row.id -replace '[{}]' -split ',' | ForEach-Object {
$row | Select-Object -Property *,@{n='id';e={$_}} -ExcludeProperty id
}
} | Export-Csv 'C:\path\to\output.csv' -NoType

Retain carriage returns in text filtered through a regular expression

I need to search though a folder of logs and retrieve the most recent logs. Then I need to filter each log, pull out the relevant information and save to another file.
The problem is the regular expression I use to filter the log is dropping the carriage return and the line feed so the new file just contains a jumble of text.
$Reg = "(?ms)\*{6}\sBEGIN(.|\n){98}13.06.2015(.|\n){104}00000003.*(?!\*\*)+"
get-childitem "logfolder" -filter *.log |
where-object {$_.LastAccessTime -gt [datetime]$Test.StartTime} |
foreach {
$a=get-content $_;
[regex]::matches($a,$reg) | foreach {$_.groups[0].value > "MyOutFile"}
}
Log structure:
******* BEGIN MESSAGE *******
<Info line 1>
Date 18.03.2010 15:07:37 18.03.2010
<Info line 2>
File Number: 00000003
<Info line 3>
*Variable number of lines*
******* END MESSAGE *******
Basically capture everything between the BEGIN and END where the dates and file numbers are a certain value. Does anyone know how I can do this without losing the line feeds? I also tried using Out-File | Select-String -Pattern $reg, but I've never had success with using Select-String on a multiline record.
As @Matt pointed out, you need to read the entire file as a single string if you want to do multiline matches. Otherwise your (multiline) regular expression would be applied to single lines one after the other. There are several ways to get the content of a file as a single string:
(Get-Content 'C:\path\to\file.txt') -join "`r`n"
Get-Content 'C:\path\to\file.txt' | Out-String
Get-Content 'C:\path\to\file.txt' -Raw (requires PowerShell v3 or newer)
[IO.File]::ReadAllText('C:\path\to\file.txt')
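The difference can be seen in a small sketch. The temp file name and its three lines are made up for illustration, and -Raw assumes PowerShell v3 or newer.

```powershell
# Hypothetical temp file with three lines, used only for illustration.
$path = Join-Path ([IO.Path]::GetTempPath()) 'multiline-demo.txt'
Set-Content -Path $path -Value 'line 1', 'line 2', 'line 3'

$asLines  = Get-Content $path        # array of strings, one per line
$asString = Get-Content $path -Raw   # one string, line breaks preserved

# A pattern that spans a line break can only match the single string.
$pattern  = 'line 1\r?\nline 2'
$lineHits = @($asLines | Where-Object { $_ -match $pattern }).Count
$rawHit   = $asString -match $pattern

Remove-Item $path

"Matches against individual lines: $lineHits"
"Match against the raw string: $rawHit"
```

The per-line test finds nothing because no single line contains a line break, while the raw string matches.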
Also, I'd modify the regular expression a little. Most of the time log messages may vary in length, so matching fixed lengths may fail if the log message changes. It's better to match on invariant parts of the string and leave the rest as variable length matches. And personally I find it a lot easier to do this kind of content extraction in several steps (makes for simpler regular expressions). In your case I would first separate the log entries from each other, and then filter the content:
$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1 = "(?ms)\*{7} BEGIN MESSAGE \*{7}\s*([\s\S]*?)\*{7} END MESSAGE \*{7}"
$re2 = "(?ms)[\s\S]*?Date\s+$date[\s\S]*?File Number:\s+$fnum[\s\S]*"
Get-ChildItem 'C:\log\folder' -Filter '*.log' | ? {
$_.LastAccessTime -gt [DateTime]$Test.StartTime
} | % {
Get-Content $_.FullName -Raw |
Select-String -Pattern $re1 -AllMatches |
select -Expand Matches |
% {
$_.Groups[1].Value |
Select-String -Pattern $re2 |
select -Expand Matches |
select -Expand Groups |
select -Expand Value
}
} | Set-Content 'C:\path\to\output.txt'
BTW, don't use the redirection operator (>) inside a loop. It would overwrite the output file's content with each iteration. If you must write to a file inside a loop use the append redirection operator instead (>>). However, performance-wise it's usually better to put writing to output files at the end of the pipeline (see above).
I wanted to see if I could improve that regex, but for now: if you are using those regex modes, you should read your text file in as a single string, which helps a lot.
$a=get-content $_ -Raw
or if you don't have PowerShell 3.0
$a=(get-content $_) -join "`r`n"
I had to solve the problem of disappearing newlines in a completely different context. What you get when you do a get-content of a text file is an array of records, where each record is a line of text.
The only way I found to put the newline back in after some transformation was to use the automatic variable $OFS (output field separator). The default value is space, but if you set it to carriage return line feed, then you get separate records on separate lines.
So try this (it might work):
$OFS = "`r`n"
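As a rough sketch of what that setting changes: $OFS is the separator PowerShell uses when it converts an array to a string. The $lines array below is a made-up stand-in for what Get-Content returns.

```powershell
# Hypothetical sample: an array of lines, like the one Get-Content returns.
$lines = 'first line', 'second line', 'third line'

# With the default separator (a space) the array collapses onto one line
# when it is expanded inside a string.
$joinedDefault = "$lines"

# After setting $OFS, the same expansion keeps each record on its own line.
$OFS = "`r`n"
$joinedCrLf = "$lines"

# Remove the variable so later string conversions use the default again.
Remove-Variable OFS

$joinedDefault
$joinedCrLf
```

Removing the variable afterwards is a design choice here, so that the changed separator does not leak into unrelated string conversions later in the script.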

Regex to match only words without _ or -

I am trying to extract words from a text file that contains exactly one word per line. But I only want to match a word if it contains no "_" (underscore) or "-" (dash):
File might look like :
< someword
< SomeOtherword
< wordwith-dash-anotherd
< wordwith_under_anotheru
I only want to extract line 1 & 2 and ignore line 3 & 4
(i.e. result when regex match each line should be: someword SomeOtherword without "<" and space for each line)
I have been trying with "[\w-]+" which matches words with both _ & -
I am using PowerShell regex engine.
I am processing a file with close to 100000 lines. I don't want to loop through each line, as I need the processing to be very quick. The code I am using:
$rx = '[\w-]+'
Get-Content $filename | Select-String -Pattern $rx -AllMatches | select -ExpandProperty Matches | select -ExpandProperty Value | out-file $outputfile
If you are performance sensitive, this approach is measurably faster (2.6 secs vs. 80 millisecs):
(Select-String '^[a-zA-Z]+$' file.txt -AllMatches).Matches.Value
This does require a feature that is new to PowerShell v3. You don't say which version you are using.
To do a regex match in PowerShell you can use either the -match operator or Select-String. There is also a -notmatch operator and a -NotMatch switch for Select-String. Both filter for the absence of a match.
So one option is
gc 'file.txt' | where { $_ -notmatch '-|_' } | foreach { $_.Trim('<', ' ') }
and another is
gc 'file.txt' | select-string -NotMatch '-|_' | foreach { $_.Line.Trim('<', ' ') }
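If the lines really do begin with "< " as in the sample, a capture group can strip the prefix and reject the unwanted words in one pass. This is only a sketch under that assumption; the $lines array is copied from the question's sample and stands in for the file.

```powershell
# Sample lines copied from the question.
$lines = '< someword', '< SomeOtherword', '< wordwith-dash-anotherd', '< wordwith_under_anotheru'

# Only letters are allowed after the "< " prefix, so words containing
# "-" or "_" never match. Group 1 holds the word without the prefix.
$words = $lines |
    Select-String -Pattern '^< ([a-zA-Z]+)$' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$words
```

For a real file, the same pattern can be passed to Select-String together with -Path instead of piping an array.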

PowerShell Select-String from file with Regex

I am trying to get a string of text from a .sln (Visual Studio solution file) to show what project files are contained within the solution.
An example line of text is
Project("{xxxx-xxxxx-xxxx-xxxx-xx}") = "partofname.Genesis.Printing", "Production\partofname.Genesis.Printing\partofname.Genesis.Printing.csproj", "{xxx-xxx-xxx-xxx-xxxx}"
EndProject
The part of the string that I am interested in is the last part between the \ and the ".
\partofname.Genesis.Printing.csproj"
The regular expression I am using is:
$r = [regex] "^[\\]{1}([A-Za-z.]*)[\""]{1}$"
I am reading the file content with:
$sln = gci .\Product\solutionName.sln
I don't know what to put in my string-select statement.
I am very new to PowerShell and would appreciate any and all help...
I did have a very very long-hand way of doing this earlier, but I have lost the work... Essentially it was doing this for each line in a file:
Select-String $sln -pattern 'proj"' | ? {$_.split()}
But a regular expression would be a lot easier (I hope).
The following gets everything between " and proj":
Select-String -Path $PathToSolutionFile ', "([^\\]*?\\)*([^\.]*\..*proj)"' -AllMatches | Foreach-Object {$_.Matches} |
Foreach-Object {$_.Groups[2].Value}
The first group gets the folder that the proj file is in. The second group gets just what you requested (the project file name). AllMatches returns every match, not just the first. After that it's just a matter of looping through each collection of matches on the match objects and getting the value of the second group in the match.
Your script works great. To make it a one-liner, add -Path to Select-String:
Select-String -Path $pathtoSolutionFile ', "([^\\]*?\\)?([^\.]*\..*proj)"' -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[2].Value}
To build from this you can use Groups[0]
(((Select-String -path $pathtoSolutionFile ', "([^\\]*?\\)?([^\.]*\..*proj)"' -AllMatches | Foreach-Object {$_.Matches} |
Foreach-Object {$_.Groups[0].Value})-replace ', "','.\').trim('"'))
For me this pattern was the best:
[^"]+\.csproj
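A quick sketch of what that pattern returns for the sample line from the question. Note that it yields the whole quoted relative path, so a final split is needed if only the file name is wanted. The variable names are illustrative.

```powershell
# Sample line taken from the question's .sln excerpt.
$line = 'Project("{xxxx-xxxxx-xxxx-xxxx-xx}") = "partofname.Genesis.Printing", "Production\partofname.Genesis.Printing\partofname.Genesis.Printing.csproj", "{xxx-xxx-xxx-xxx-xxxx}"'

# [^"]+ cannot cross a double quote, so the match is confined to the
# quoted string that ends in .csproj.
$relativePath = [regex]::Match($line, '[^"]+\.csproj').Value

# Keep only the part after the last backslash.
$fileName = ($relativePath -split '\\')[-1]

$relativePath
$fileName
```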

Search multiple words using regular expression in PowerShell

I am new to PowerShell and would highly appreciate any help with the following. I have a PowerShell script, but I have not been able to complete it so that it gets all the data fields from the text file.
I have a file 1.txt as below.
I am trying to extract the values for "pid" and "ctl00_lblOurPrice" from the file in the table format below so that I can open it in Excel. Column headings are not important:
pid ctl00_lblOurPrice
0070362408 $6.70
008854787666 $50.70
Currently I am only able to get the pid, as below. I would also like to get the price for each pid:
0070362408
008854787666
c:\scan\1.txt:
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=0070362408'>
This is sentence B1.. This is sentence B2... This is sentence B3...
GFGFGHHGH
HHGHGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$6.70
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=008854787666'>
This is sentence B1.. This is sentence B2... This is sentence B3...
6GBNGH;L
887656HGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$50.70
...
...
Current PowerShell script:
$files = Get-ChildItem c:\scan -recurse
$output_file = 'c:\output\outdata.txt'
foreach ($file in $files) {
    $input_path = $file
    $regex = 'num=\d{1,13}'
    select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % {
        ($_.Value) -replace "num=","" } | Out-File $output_file -Append
}
Thanks in advance for your help
I'm going to assume that you either mean pid=\d{1,13} in your code, or that your sample text should have read num= instead of pid=. We will go with the assumption that it is in fact supposed to be pid.
In that case we will turn the entire file into one long string with -Join "", and then split it on "href" to create records for each site to parse against. Then we match for pid= and ending when it comes across a non-numeric character, and then we look for a dollar amount (a $ followed by numbers, followed by a period, and then two more numbers).
When we have a pair of PID/Price matches we can create an object with two properties, PID and Price, and output that. For this I will assign it to an array, to be used later. If you do not have PSv3 or higher you will have to change [PSCustomObject][ordered] into New-Object PSObject -Property but that loses the order of properties, so I like the former better and use it in my example here.
$files=Get-ChildItem C:\scan -recurse
$output_file = 'c:\output\outdata.csv'
$Results = @()
foreach ($file in $files) {
$Results += ((gc $File) -join "") -split "href" |?{$_ -match "pid=(\d+?)[^\d].*?(\$\d*?\.\d{2})"}|%{[PSCustomObject][ordered]@{"PID"=$Matches[1];"Price"=$Matches[2]}}
}
$Results | Select PID,Price | Export-Csv $output_file -NoTypeInformation
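For the pre-v3 fallback mentioned above, a minimal sketch might look like the following. The $record string is a made-up stand-in for one of the chunks produced by splitting on "href", and the trailing Select-Object is what restores a predictable column order, since a plain hashtable does not guarantee one.

```powershell
# Hypothetical record, shaped like one of the "href"-separated chunks.
$record = '=''http://example.com/viewdetails.aspx?pid=0070362408''> filler text ctl00_lblOurPrice=$6.70'

$row = $null
if ($record -match 'pid=(\d+?)[^\d].*?(\$\d*?\.\d{2})') {
    # New-Object takes a plain hashtable, so property order is not guaranteed.
    $item = New-Object PSObject -Property @{ PID = $Matches[1]; Price = $Matches[2] }

    # Select-Object fixes the order of the properties for the output.
    $row = $item | Select-Object PID, Price
}

$row
```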