Export data to CSV different row using regex

I used a regular expression to extract strings from a file and export them to CSV. I could not figure out how to put each matched value in a different row. The result ends up in a single cell:
{ 69630e4574ec6798, 78630e4574ec6798, 68630e4574ec6798}
I need it to be in different rows in CSV as below:
69630e4574ec6798
78630e4574ec6798
68630e4574ec6798
$Regex = [regex]"\s[a-f0-9]{16}"
Select-Object @{Name="Identity";Expression={$Regex.Matches($_.Textbody)}} |
Format-Table -Wrap |
Export-Csv -Path c:\temp\Inbox.csv -NoTypeInformation -Append
Edit:
I have been trying to split the data I have in my CSV but I am having difficulty in splitting the output data "id" to next line as they all come in one cell "{56415465456489944,564544654564654,46565465}".
In the screenshot below the first couple lines are the source input and the highlighted lines in the second group is the output that I am trying to get.

Change your regular expression so that it has the hexadecimal substrings in a capturing group (to exclude the leading whitespace):
$Regex = [regex]"\s([a-f0-9]{16})"
then extract the first group from each match:
$Regex.Matches($_.Textbody) | ForEach-Object {
$_.Groups[1].Value
} | Set-Content 'C:\temp\a.txt'
Use Set-Content rather than Out-File: in Windows PowerShell, Out-File writes the output file in Unicode (UTF-16LE) by default, whereas Set-Content defaults to the system's ANSI code page. In PowerShell 6 and later both default to UTF-8 without a BOM. Both cmdlets allow overriding the default via the -Encoding parameter.
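Since the question asks for CSV rows rather than a text file, the same capture-group idea can be sketched with one object per match. This is only an illustration: `$sampleText` is made-up input standing in for `$_.Textbody`, and `ConvertTo-Csv` is used in place of `Export-Csv` so that nothing is written to disk.

```powershell
# Hypothetical sample input; in the question this would be $_.Textbody.
$sampleText = 'ids: 69630e4574ec6798 78630e4574ec6798 68630e4574ec6798'

$Regex = [regex]'\s([a-f0-9]{16})'

# Emit one object per match so that each value becomes its own CSV row.
$rows = foreach ($m in $Regex.Matches($sampleText)) {
    [PSCustomObject]@{ Identity = $m.Groups[1].Value }
}

# ConvertTo-Csv yields the header line followed by one line per object.
$csv = $rows | ConvertTo-Csv -NoTypeInformation
$csv
```

Swapping `ConvertTo-Csv` for `Export-Csv -Path ... -NoTypeInformation` would write the same rows to a file. Note that `Format-Table` must not appear in such a pipeline, because it emits formatting objects rather than the data itself.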
Edit:
To split the data from your id column and create individual rows for each ID you could do something like this:
Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
    $row = $_
    $row.id -replace '[{}]' -split ',' | ForEach-Object {
        # Store the ID first: inside the calculated property, $_ is the row, not the ID
        $id = $_.Trim()
        $row | Select-Object -Property *,@{n='id';e={$id}} -ExcludeProperty id
    }
} | Export-Csv 'C:\path\to\output.csv' -NoType
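To try the row-splitting idea without touching any files, here is a minimal in-memory sketch. The column names `name` and `id` and the sample values are assumptions for illustration only, and the output objects are rebuilt explicitly instead of going through `Select-Object`.

```powershell
# Hypothetical input: one row per record, with all IDs packed into a single cell.
$inputLines = @(
    'name,id'
    'alpha,"{56415465456489944, 564544654564654, 46565465}"'
    'beta,"{123}"'
)

$rows = $inputLines | ConvertFrom-Csv | ForEach-Object {
    $row = $_
    # Strip the braces, split on commas, and emit one object per ID.
    foreach ($id in ($row.id -replace '[{}]' -split ',')) {
        [PSCustomObject]@{
            name = $row.name
            id   = $id.Trim()
        }
    }
}

$rows | ConvertTo-Csv -NoTypeInformation
```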

Related

RegEx command to get date

Using the following RegEx line in my PowerShell script to pull dates from .txt files. The script is reading and pulling the dates to a .csv file in this format Year,Month,Day,Hour,Min,Sec (2020,06,20,00,50,56). I'm looking for some guidance on how I can get the date just to show without the commas in this format 2020-06-20
This is how date is listed in .txt files see line that starts with Generated:
Node 001 Status Report - Report Version 20200505;
Generated 2020-06-20 00:50:56;
Below is portion of the script that's reading and pulling the date:
If($_ -imatch 'Generated'){
$Date = ([regex]::Matches($_,'\b\d+') | select value).value -join ','
}

You can use Select-String to read each file line by line and pattern match against each line:
Select-String -Path a.txt,b.txt -Pattern '^Generated (\d{4}-\d{2}-\d{2})' |
Foreach-Object { $_.Matches.Groups[1].Value }
Select-String also adds other benefits. Each pattern match is a MatchInfo object that contains the file name, line number that matched, and the line that contains the match. The -AllMatches switch will match as many times as possible per input line. The -Path parameter accepts an array of files and/or wildcards in the path. The [1] index is the first unnamed capture group results, which will be what matches within the first set of ().
As an aside, I would verify that the ####-##-## is actually a valid date unless you know that will always be so within your data. You can do this easily if your system culture settings allow for the date format:
Select-String -Path a.txt,b.txt -Pattern '^Generated (\d{4}-\d{2}-\d{2})' | Foreach-Object {
$_.Matches.Groups[1].Value | Where { $_ -as [datetime] }
}
If the culture settings do not allow the format, you will need to use ParseExact or TryParseExact to test the date.
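A culture-independent check could look roughly like the sketch below. The helper name `Test-IsoDate` is hypothetical; the `TryParseExact` call itself is the standard .NET `DateTime` method.

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture
$style   = [Globalization.DateTimeStyles]::None

# Hypothetical helper: returns $true only if the text is a real yyyy-MM-dd date.
function Test-IsoDate([string]$Text) {
    $parsed = [datetime]::MinValue
    [datetime]::TryParseExact($Text, 'yyyy-MM-dd', $culture, $style, [ref]$parsed)
}

$valid   = Test-IsoDate '2020-06-20'
$invalid = Test-IsoDate '2020-13-45'
```

In the pipeline above, `Where { Test-IsoDate $_ }` would then replace the `-as [datetime]` test.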
If you must work within your current data format, then you can do the following to extract the date from the comma-delimited string in the required format:
If($_ -imatch 'Generated'){
$Numbers = ([regex]::Matches($_,'\b\d+') | select value).value -join ','
$Date = ($Numbers -split ',')[0..2] -join '-'
}

You are joining the expression with -join ',' for commas, if you want dashes instead, just change that to a dash.
If($_ -imatch 'Generated'){
$Date = ([regex]::Matches($_,'\b\d+') | select value).value -join '-'
}

Exporting Hash Table Using Property Dictionary to CSV

I can't seem to figure out how to simply export formatted information to a CSV unless I iterate through each item in the object and write to the CSV line by line, which takes forever. I can export values instantly to the CSV, it's just when using the properties dictionary I run into issues.
The TestCSV file is formatted with a column that has IP addresses.
Here's what I have:
$CSV = "C:\TEMP\OutputFile.csv"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$TestCSV = "C:\TEMP\FileWithIPs.csv"
$spreadsheetDataobject = import-csv $TestCSV
$Finding = $spreadsheetDataObject | Select-String $RX
$Props = @{ #create a properties dictionary
LineNumber = $finding.LineNumber
Matches = $finding.Matches.Value
}
$OBJ = New-Object -TypeName psobject -Property $Props
$OBJ | Select-Object Matches,LineNumber | Export-Csv -Path $CSV -Append -NoTypeInformation

This isn't going to work as written. You are using Import-CSV which creates an array of objects with properties. The Select-String command expects strings as input, not objects. If you want to use Select-String you would want to simply specify the file name, or use Get-Content on the file, and pass that to Select-String. If what you want is the line number, and the IP I think this would probably work just as well if not better for you:
$CSV = "C:\TEMP\OutputFile.csv"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$TestCSV = "C:\TEMP\FileWithIPs.csv"
$spreadsheetDataobject = import-csv $TestCSV
$spreadsheetDataobject |
Where{$_.IP -match $RX} |
Select-Object @{l='Matches';e={$_.IP}},@{l='LineNumber';e={[array]::IndexOf($spreadsheetDataobject,$_)+1}} |
Export-Csv -Path $CSV -Append -NoTypeInformation
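For comparison, the other option mentioned above (pointing Select-String at the file itself) might be sketched as follows. The temporary file, its contents, and the simplified pattern are assumptions for the demo. Line numbers here count the header as line 1, unlike the index-based numbering in the code above.

```powershell
# Create a small throwaway file standing in for FileWithIPs.csv.
$tmp = [IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value 'Host,IP', 'a,10.0.0.1', 'b,none', 'c,192.168.1.20'

# Simplified IP pattern, used only to keep the demo short.
$RX = '(?:\d{1,3}\.){3}\d{1,3}'

# Select-String reads the file as text and reports line numbers and matches.
$found = Select-String -Path $tmp -Pattern $RX -AllMatches | ForEach-Object {
    $info = $_
    foreach ($m in $info.Matches) {
        [PSCustomObject]@{ LineNumber = $info.LineNumber; Matches = $m.Value }
    }
}

Remove-Item $tmp
$found
```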
Edit: wOxxOm is quite right, this answer has considerably more overhead than parsing the text directly like he does. Though, for somebody who's new to PowerShell it's probably easier to understand.
In regards to $_.IP, since you use Import-CSV you create an array of objects. Each object has properties associated with it based on the header of the CSV file. IP was listed in the header as one of your columns, so each object has a property of IP, and the value of that property is whatever was in the IP column for that record.
Let me explain the Select line for you, and then you'll see that it's easy to add your source path as another column.
What I'm doing is defining properties with a hashtable. For my examples I will refer to the first one shown above. Since it is a hashtable it starts with @{ and ends with }. Inside there are two key/value pairs:
l='Matches'
e={$_.IP}
Essentially 'l' is short for Label, and 'e' is short for Expression. The label determines the name of the property being defined (which equates to the column header when you export). The expression defines the value assigned to the property. In this case I am really just renaming the IP column to Matches, since the value that I assign for each row is whatever is in the IP field. If you open the CSV in Excel, copy the entire IP column, paste it in at the end, and change the header to Matches, that is basically all I'm doing. So to add the file path as a column we can add one more hashtable to the Select line with this:
@{
l='FilePath'
e={$CSV}
}
That adds a third property, where the name is FilePath, and the value is whatever is stored in $CSV. That updated Select line would look like this:
Select-Object @{l='Matches';e={$_.IP}},@{l='LineNumber';e={[array]::IndexOf($spreadsheetDataobject,$_)+1}},@{l='FilePath';e={$CSV}} |
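The label/expression mechanics can be tried on a few in-memory rows. In this sketch the sample IP values are made up, and `$CSV` simply reuses the path from the code above.

```powershell
$CSV = 'C:\TEMP\OutputFile.csv'

# Hypothetical rows standing in for the imported spreadsheet (header 'IP').
$data = 'IP', '10.0.0.1', '10.0.0.2' | ConvertFrom-Csv

# Each hashtable defines one output column: l = label (name), e = expression (value).
$result = $data | Select-Object @{l='Matches';e={$_.IP}}, @{l='FilePath';e={$CSV}}
$result
```

Within the `e` script block, `$_` is the object currently flowing through Select-Object, which is why `$_.IP` returns the value of that row's IP column.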

Any code based on the built-in CSV cmdlets is extremely slow because objects are created for each field on each line, and it's noticeable on large files (for example, code from the other answer takes 900 seconds to process a 9MB file with 100k lines).
If your input CSV file is simple, you can process it as text in less than a second for a 100k lines file:
$CSV = .......
$RX = .......
$TestCSV = .......
$line = 0 # header line doesn't count
$lastMatchPos = 0
$text = [IO.File]::ReadAllText($TestCSV) -replace '"http.+?",', ','
$out = New-Object Text.StringBuilder
ForEach ($m in ([regex]"(?<=,""?)$RX(?=""?,)").Matches($text)) {
$line += $m.index - $lastMatchPos -
$text.substring($lastMatchPos, $m.index-$lastMatchPos).Replace("`n",'').length
$lastMatchPos = $m.Index + $m.length
$out.AppendLine('' + $line + ',' + $m.value) >$null
}
if (!(Test-Path $CSV)) {
'LineNumber,IP' | Out-File $CSV -Encoding ascii
}
$out.ToString() | Out-File $CSV -Encoding ascii -Append
The code zaps quoted URLs fields just in the unlikely but possible case those contain a matching IP.

Retain carriage returns in text filtered through a regular expression

I need to search though a folder of logs and retrieve the most recent logs. Then I need to filter each log, pull out the relevant information and save to another file.
The problem is the regular expression I use to filter the log is dropping the carriage return and the line feed so the new file just contains a jumble of text.
$Reg = "(?ms)\*{6}\sBEGIN(.|\n){98}13.06.2015(.|\n){104}00000003.*(?!\*\*)+"
get-childitem "logfolder" -filter *.log |
where-object {$_.LastAccessTime -gt [datetime]$Test.StartTime} |
foreach {
$a=get-content $_;
[regex]::matches($a,$reg) | foreach {$_.groups[0].value > "MyOutFile"}
}
Log structure:
******* BEGIN MESSAGE *******
<Info line 1>
Date 18.03.2010 15:07:37 18.03.2010
<Info line 2>
File Number: 00000003
<Info line 3>
*Variable number of lines*
******* END MESSAGE *******
Basically capture everything between the BEGIN and END where the dates and file numbers are a certain value. Does anyone know how I can do this without losing the line feeds? I also tried using Out-File | Select-String -Pattern $reg, but I've never had success with using Select-String on a multiline record.

As @Matt pointed out, you need to read the entire file as a single string if you want to do multiline matches. Otherwise your (multiline) regular expression would be applied to single lines one after the other. There are several ways to get the content of a file as a single string:
(Get-Content 'C:\path\to\file.txt') -join "`r`n"
Get-Content 'C:\path\to\file.txt' | Out-String
Get-Content 'C:\path\to\file.txt' -Raw (requires PowerShell v3 or newer)
[IO.File]::ReadAllText('C:\path\to\file.txt')
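The difference between line-by-line and whole-string matching can be seen in a small sketch. The sample lines and the simplified pattern are assumptions for illustration.

```powershell
# Hypothetical log content, as Get-Content would return it: one string per line.
$lines = @(
    '******* BEGIN MESSAGE *******'
    'Date 13.06.2015'
    'File Number: 00000003'
    '******* END MESSAGE *******'
)

$re = '(?s)BEGIN MESSAGE.*?END MESSAGE'

# Applied per line, the pattern never sees BEGIN and END together.
$perLineHits = @($lines | Where-Object { $_ -match $re }).Count

# Joined into a single string, the whole record matches and keeps its line breaks.
$whole = $lines -join "`r`n"
$m = [regex]::Match($whole, $re)
```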
Also, I'd modify the regular expression a little. Most of the time log messages may vary in length, so matching fixed lengths may fail if the log message changes. It's better to match on invariant parts of the string and leave the rest as variable length matches. And personally I find it a lot easier to do this kind of content extraction in several steps (makes for simpler regular expressions). In your case I would first separate the log entries from each other, and then filter the content:
$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1 = "(?ms)\*{7} BEGIN MESSAGE \*{7}\s*([\s\S]*?)\*{7} END MESSAGE \*{7}"
$re2 = "(?ms)[\s\S]*?Date\s+$date[\s\S]*?File Number:\s+$fnum[\s\S]*"
Get-ChildItem 'C:\log\folder' -Filter '*.log' | ? {
$_.LastAccessTime -gt [DateTime]$Test.StartTime
} | % {
Get-Content $_.FullName -Raw |
Select-String -Pattern $re1 -AllMatches |
select -Expand Matches |
% {
$_.Groups[1].Value |
Select-String -Pattern $re2 |
select -Expand Matches |
select -Expand Groups |
select -Expand Value
}
} | Set-Content 'C:\path\to\output.txt'
BTW, don't use the redirection operator (>) inside a loop. It would overwrite the output file's content with each iteration. If you must write to a file inside a loop use the append redirection operator instead (>>). However, performance-wise it's usually better to put writing to output files at the end of the pipeline (see above).
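The overwrite behaviour is easy to reproduce with a throwaway file. The file name below is generated for the demo and is not part of the original scripts.

```powershell
$out = Join-Path ([IO.Path]::GetTempPath()) ('redirect-demo-' + [guid]::NewGuid() + '.txt')

# Using > inside the loop: every iteration replaces the previous content.
foreach ($item in 'one', 'two', 'three') { $item > $out }
$overwritten = @(Get-Content $out)

# Writing once at the end of the pipeline keeps everything.
'one', 'two', 'three' | Set-Content $out
$complete = @(Get-Content $out)

Remove-Item $out
```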

I wanted to see if I could make that regex better, but for now: if you are using those regex modes, you should read your text file in as a single string, which helps a lot.
$a=get-content $_ -Raw
or if you don't have PowerShell 3.0
$a=(get-content $_) -join "`r`n"

I had to solve the problem of disappearing newlines in a completely different context. What you get when you do a get-content of a text file is an array of records, where each record is a line of text.
The only way I found to put the newline back in after some transformation was to use the automatic variable $OFS (output field separator). The default value is space, but if you set it to carriage return line feed, then you get separate records on separate lines.
So try this (it might work):
$OFS = "`r`n"
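What `$OFS` changes is how an array is turned into a string, which is also what happens when the array returned by Get-Content is passed to a method expecting a single string. A minimal sketch with made-up sample lines:

```powershell
$lines = 'first', 'second', 'third'

# Default behaviour: array elements are joined with a single space.
$withSpaces = "$lines"

# With $OFS set (here only inside a child scope), elements are joined with CRLF.
$withNewlines = & { $OFS = "`r`n"; "$lines" }
```

Setting `$OFS` inside `& { ... }` keeps the change local, so other string conversions in the script are not affected.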

Is it possible to replace Get-Content, ForEach-Object string -match with Select-String cmdlet?

I have a fixed width file with records in a format as follows
DDEDM2018890 19960730015000010000
DDETPL015000 20150515015005010000
DDETPL015010 20150515015003010000
DDETPL015020 20150515015002010000
DDETPL015030 20150515015005010000
DDETPL015040 20150515015000010000
The first 3 characters identify the record type. In the above example all records are of type DDE, but there are also lines of a different type in the file.
The following regular expression with named capture groups parses the relevant information from each record for my purpose (notice it also filters down to DDE record types):
DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})
I have written a script that uses the Get-Content, ForEach-Object and Select-Object cmdlets to convert the fixed width file into a csv file.
I wonder if I could replace the Get-Content and ForEach-Object cmdlets by a single Select-String cmdlet?
#this powershell script reads fixed width file and generates a csv file of the relevant & converted values
#Prepare a hashtable (calculated property) for Select-Object to convert CategoryCode and append CategoryId
$Category = @{
Name = "Category"
Expression = {
$cat = switch($_.CategoryCode)
{
"50"{"A"}
"54"{"C"}
"60"{"F"}
"66"{"I"}
"74"{"M"}
"88"{"T"}
}
$cat+$_.CategoryId
}
}
gc "C:\Path\To\File.txt" | % {
if($_ -match "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3}).*$")
{
#$matches is a hashtable of named capture groups; convert it to an object so Select-Object can treat its entries as object properties
[PSCustomObject]$matches
}
} | select Database, $Category, Length #| export-csv "AnalysisLengths.csv" -NoTypeInformation
Before I finalized the script, I was trying to use the Select-String cmdlet but could not figure out how to use it. I believe it can achieve the same result in a more elegant way. This is what I had:
##Could this be completed with just the Select-String commandlet instead of Get-Content+ForEach+Select-Object?
Select-String -Path "C:\Path\To\File.txt" `
-Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" `
| Select-Object -ExpandProperty Matches
Using -ExpandProperty should convert the Microsoft.PowerShell.Commands.MatchInfo Matches property into the actual System.Text.RegularExpressions.Match objects for each line...
see also Powershell Select-Object vs ForEach on Select-String results

Here is one way (I'm not so proud of it)
Select-String -Path "C:\Path\To\File.txt" -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" | %{New-Object -TypeName PSObject -Property @{Database=$_.matches.groups[1];CategoryCode=$_.matches.groups[2];CategoryId=$_.matches.groups[3];Length=$_.matches.groups[4]}} | export-csv "C:\Path\To\File.csv"

I don't know why you have limited your question to Select-String cmdlet. If you had included the switch statement, then, I'd answer to you: YES! It's possible!
And I'd present to you this simple and short PowerShell code:
$(switch -Regex -File $fileIN{$patt{[pscustomobject]$matches|select * -ExcludeProperty 0}})|epcsv $fileCSV
where $fileIN is the input file, $fileCSV is CSV file you wanna create, and $patt is the pattern you have in your OP:
$patt='DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'
The switch statement is very powerful.
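An expanded, more readable version of the same switch-based idea might look like the sketch below. The temporary file and the non-DDE line are assumptions for the demo, and the output object is built explicitly from `$Matches` instead of casting the whole hashtable.

```powershell
$patt = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# Throwaway input file with two DDE records and one line of another type.
$fileIN = [IO.Path]::GetTempFileName()
Set-Content -Path $fileIN -Value @(
    'DDEDM2018890 19960730015000010000'
    'XYZ this line has another record type'
    'DDETPL015000 20150515015005010000'
)

# switch -Regex -File reads the file line by line; $Matches holds the named groups.
$records = @(switch -Regex -File $fileIN {
    $patt {
        [PSCustomObject]@{
            Database     = $Matches.Database
            CategoryCode = $Matches.CategoryCode
            CategoryId   = $Matches.CategoryId
            Length       = $Matches.Length
        }
    }
})

Remove-Item $fileIN
$records
```

Piping `$records` to `Export-Csv` would then produce the CSV file.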

While Select-String can combine Get-Content and pattern matching, you still need a loop for constructing your custom objects. You could stick with what you have, although I'd suggest a couple modifications. Replace the switch statement with a hashtable and make the nested if a Where-Object filter:
$categories = @{
'50' = 'A'
'54' = 'C'
'60' = 'F'
'66' = 'I'
'74' = 'M'
'88' = 'T'
}
$category = @{
Name = 'Category'
Expression = { $categories[$_.CategoryCode] + $_.CategoryId }
}
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'
Get-Content 'C:\path\to\file.txt' |
? { $_ -match $pattern } |
% { [PSCustomObject]$matches } |
select Database, $category, Length |
Export-Csv 'C:\path\to\output.csv' -NoType
Or you could go with @JPBlanc's suggestion (again with some slight modifications):
$category = @{
'50' = 'A'
'54' = 'C'
'60' = 'F'
'66' = 'I'
'74' = 'M'
'88' = 'T'
}
$pattern = "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})"
Select-String -Path 'C:\path\to\file.txt' -Pattern $pattern | % {
New-Object -TypeName PSObject -Property @{
Database = $_.Matches.Groups[1].Value
Category = $category[$_.Matches.Groups[2].Value] + $_.Matches.Groups[3].Value
Length = $_.Matches.Groups[4].Value
}
} | Export-Csv 'C:\path\to\output.csv' -NoType
The latter will give you slightly better performance, although not too much (execution times were 2:35 vs 2:50 for a 120 MB input file on my test box).

RegEx required for search-replace using PowerShell

I'm trying to load up a file from a PS script and need to search replace on the basis of given pattern and new values. I need to know what the pattern would be. Here is an excerpt from the file:
USER_IDEN;SYSTEM1;USERNAME1;
30;WINDOWS;Wanner.Siegfried;
63;WINDOWS;Ott.Rudolf;
68;WINDOWS;Waldera.Alicja;
94;WINDOWS;Lanzl.Dieter;
98;WINDOWS;Hofmeier.Erhard;
ReplacerValue: "@dummy.domain.com"
What to be replaced: USERNAME1 column
Expected result:
USER_IDEN;SYSTEM1;USERNAME1;
30;WINDOWS;Wanner.Siegfried@dummy.domain.com;
63;WINDOWS;Ott.Rudolf@dummy.domain.com;
68;WINDOWS;Waldera.Alicja@dummy.domain.com;
94;WINDOWS;Lanzl.Dieter@dummy.domain.com;
98;WINDOWS;Hofmeier.Erhard@dummy.domain.com;
Also, the file can be like this as well:
USER_IDEN;SYSTEM1;USERNAME1;SYSTEM2;USERNAME2;SYSTEM3;USERNAME3;
30;WINDOWS;Wanner.Siegfried;WINDOWS2;Wanner.Siegfried;LINUX;Dev-1;LINUX2;QA1
63;WINDOWS;Ott.Rudolf;WINDOWS2;Ott.Rudolf;LINUX;Dev-2
68;WINDOWS;Waldera.Alicja;
94;WINDOWS;Lanzl.Dieter;WINDOWS4;Lanzl.Dieter;WINDOWS3;Lead1
98;WINDOWS;Hofmeier.Erhard;
In the above examples, I want to target the values under the USERNAMEn columns. The header row may not be present, but the semicolon-delimited layout and the system/username pairs stay the same, and the first value is the identifier, so it is always there.
I have found the way to start but need to get the pattern:
(Get-Content C:\script\test.txt) |
Foreach-Object {$_ -replace "^([0-9]+;WINDOWS;[^;]+);$", '$1@dummy.domain.com;'} |
Set-Content C:\script\test.txt
Edit
I came up with this pattern: ^([0-9]+;WINDOWS;[^;]+);$
It is very much fixed to this particular file only with no more than one Domain-Username pair and doesn't depend on the columns.

I think that using a regex to do this is going about it the hard way. Instead of using Get-Content, use Import-Csv, which will split your columns for you. You can then use Get-Member to identify the USERNAME columns. Something like this:
$x = Import-Csv YourFile.csv -Delimiter ';'
$f = @($x[0] | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name | ? {$_ -match 'USERNAME'})
$f | % {
$n = $_
$x | % {$_."$n" = $_."$n" + '@dummy.domain.com'}
}
$x | Export-Csv .\YourFile.csv -Delimiter ';' -NoTypeInformation
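The same column-based idea can be tried in memory. This sketch makes some assumptions: the sample lines are written without the trailing semicolon, every row carries the full header, and empty username cells are skipped so they do not end up containing only the domain suffix.

```powershell
# Hypothetical input lines (header plus two records, no trailing delimiter).
$lines = @(
    'USER_IDEN;SYSTEM1;USERNAME1;SYSTEM2;USERNAME2'
    '30;WINDOWS;Wanner.Siegfried;WINDOWS2;Wanner.Siegfried'
    '68;WINDOWS;Waldera.Alicja;;'
)

$rows = $lines | ConvertFrom-Csv -Delimiter ';'

# Find every column whose name starts with USERNAME.
$userColumns = $rows[0].PSObject.Properties.Name | Where-Object { $_ -like 'USERNAME*' }

foreach ($row in $rows) {
    foreach ($col in $userColumns) {
        # Only append the suffix where a username is actually present.
        if ($row.$col) { $row.$col += '@dummy.domain.com' }
    }
}

$rows | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```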
