RegEx required for search-replace using PowerShell

I'm loading a file from a PowerShell script and need to do a search-and-replace based on a given pattern and new values. I need to know what the pattern would be. Here is an excerpt from the file:
USER_IDEN;SYSTEM1;USERNAME1;
30;WINDOWS;Wanner.Siegfried;
63;WINDOWS;Ott.Rudolf;
68;WINDOWS;Waldera.Alicja;
94;WINDOWS;Lanzl.Dieter;
98;WINDOWS;Hofmeier.Erhard;
ReplacerValue: "#dummy.domain.com"
What to be replaced: USERNAME1 column
Expected result:
USER_IDEN;SYSTEM1;USERNAME1;
30;WINDOWS;Wanner.Siegfried#dummy.domain.com;
63;WINDOWS;Ott.Rudolf#dummy.domain.com;
68;WINDOWS;Waldera.Alicja#dummy.domain.com;
94;WINDOWS;Lanzl.Dieter#dummy.domain.com;
98;WINDOWS;Hofmeier.Erhard#dummy.domain.com;
Also, the file can be like this as well:
USER_IDEN;SYSTEM1;USERNAME1;SYSTEM2;USERNAME2;SYSTEM3;USERNAME3;
30;WINDOWS;Wanner.Siegfried;WINDOWS2;Wanner.Siegfried;LINUX;Dev-1;LINUX2;QA1
63;WINDOWS;Ott.Rudolf;WINDOWS2;Ott.Rudolf;LINUX;Dev-2
68;WINDOWS;Waldera.Alicja;
94;WINDOWS;Lanzl.Dieter;WINDOWS4;Lanzl.Dieter;WINDOWS3;Lead1
98;WINDOWS;Hofmeier.Erhard;
In the above examples, I want to target the values under the USERNAMEn columns. The header row may not be present, but the delimiter (;) and the system/username pairs stay the same, and the first value is the identifier, so it is always there.
I have found the way to start but need to get the pattern:
(Get-Content C:\script\test.txt) |
Foreach-Object {$_ -replace "^([0-9]+;WINDOWS;[^;]+);$", '$1#dummy.domain.com;'} |
Set-Content C:\script\test.txt
Edit
I came up with this pattern: ^([0-9]+;WINDOWS;[^;]+);$
It only fits this particular file: it handles no more than one domain/username pair and does not look at the column names.

I think that using a regex to do this is going about it the hard way. Instead of using Get-Content, use Import-Csv, which will split your columns for you. You can then use Get-Member to identify the USERNAME columns. Something like this:
$x = Import-Csv YourFile.csv -Delimiter ';'
$f = @($x[0] | Get-Member -MemberType NoteProperty | Select -ExpandProperty name | ? {$_ -match 'USERNAME'})
$f | % {
$n = $_
$x | % {$_."$n" = $_."$n" + '#dummy.domain.com'}
}
$x | Export-Csv .\YourFile.csv -Delimiter ';' -NoTypeInformation
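If the header row is missing, the column names are not available, but the fixed layout described in the question (identifier first, then system/username pairs) can be used instead. The following is only a sketch under that assumption; the function name Add-UserSuffix and its parameters are hypothetical, not part of the original answer:

```powershell
# Hypothetical helper: appends a suffix to every username field of one line.
# Assumes the layout ID;SYSTEM;USERNAME;SYSTEM;USERNAME;... from the question.
function Add-UserSuffix {
    param(
        [string]$Line,
        [string]$Suffix = '#dummy.domain.com'
    )
    # Leave the header (or any line that does not start with a numeric ID) untouched.
    if ($Line -notmatch '^\d+;') { return $Line }

    $fields = $Line -split ';'
    # Index 0 is the ID, odd indices are systems, even indices from 2 on are usernames.
    for ($i = 2; $i -lt $fields.Count; $i += 2) {
        # A trailing ';' produces an empty last field; skip empty fields.
        if ($fields[$i] -ne '') { $fields[$i] += $Suffix }
    }
    $fields -join ';'
}

$sample  = 'USER_IDEN;SYSTEM1;USERNAME1;', '30;WINDOWS;Wanner.Siegfried;', '63;WINDOWS;Ott.Rudolf;WINDOWS2;Ott.Rudolf;LINUX;Dev-2'
$updated = $sample | ForEach-Object { Add-UserSuffix -Line $_ }
$updated
```

For a real file, the same function could be called inside the Get-Content / Set-Content pipeline from the question in place of the -replace expression.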

Related

Using "notin" with matching groups

Using powershell, I am trying to determine which perl scripts in a directory are not called from any other script. In my Select-String I am grouping the matches because there is some other logic I use to filter out results where the line is commented, and a bunch of other scenarios I want to exclude(for simplicity I excluded that from the code posted below). My main problem is in the "-notin" part.
I can get this to work if I remove the grouping from Select-string and only match the filename itself. So this works.
$searchlocation = "C:\Temp\"
$allresults = Select-String -Path "$searchlocation*.pl" -Pattern '\w+\.pl'
$allperlfiles = Get-Childitem -Path "$searchlocation*.pl"
$allperlfiles | foreach-object -process{
$_ | where {$_.name -notin $allresults.matches.value} | Select -expandproperty name | Write-Host
}
However I cannot get the following to work. The only difference between this and above is the value for the "-Pattern" and the value after "-notin". I'm not sure how to use "notin" along with matching groups.
$searchlocation = "C:\Temp\"
$allresults = Select-String -Path "$searchlocation*.pl" -Pattern '(.*?)(\w+\.pl)'
$allperlfiles = Get-Childitem -Path "$searchlocation*.pl"
$allperlfiles | foreach-object -process{
$_ | where {$_.name -notin $allresults.matches.groups[2].value} | Select -expandproperty name | Write-Host}
At a high level the code should search all perl scripts in a directory for any lines that execute any other perl script. With that I now have $allresults which basically gives me a list of all perl scripts called from other files. To get the inverse of that(files that are NOT called from any other file) I get a list of all perl scripts in the directory, cycle through those and list out the ones that DONT show up in $allresults.
When you select a grouping you need to do so using a Select statement, or iteratively in a loop, otherwise you are only going to select the value from the Nth match.
For example, if your $allresults object contains matches for
File1.pl, File2.pl, File3.pl
then $allresults.Matches.Groups[2].Value returns only File1.pl (capture group 2 of the first match).
Instead, you need to select those values!
$allresults | select @{N="Match";E={ $($_.Matches.Groups[2].value) } }
Which will return:
Match
-----
File1.pl
File2.pl
File3.pl
In your specific example each match contributes three items (the whole match plus two capture groups) and the flattened results are completely sequential, so the whole match of match 1 is Groups[0], while the whole match of match 2 is Groups[3].
This means the values you care about (capture group 2 of each match) sit at the indices {2, 5, 8, 11, ...}, which can be described as (N*3)-1, where N is the number of the match. So for match 1 the index is (1*3)-1 = 2, while for match 13 it is (13*3)-1 = 38.
You can iterate through them using a loop to check:
for($i=0; $i -le ($allresults.Matches.groups.count-1); $i++){
"Group[$i] = ""$($allresults.Matches.Groups[$i].value)"""
}
I noticed that you took the time to avoid loops in collecting your data, but then accidentally seem to have fallen prey to using one in matching your data.
-notin and the other comparison operators, when used in Select and Where clauses, don't need a loop structure and are faster without one, so you can forgo the ForEach-Object loop and get a better process just by using a simple Where (?).
$SearchLocation = "C:\Temp\"
$FileGlob = "*.pl"
$allresults = Select-String -Path "$SearchLocation$FileGlob" -Pattern '(.*?)([\w\.]+\.pl)'
$allperlfiles = Get-Childitem -Path "$SearchLocation$FileGlob"
$allperlfiles | ? {
    $_.name -notin $(
        # -ExpandProperty turns the objects into plain strings that -notin can compare against
        $allresults | select @{N="Match";E={ $($_.Matches.Groups[2].value) } } | select -ExpandProperty Match
    )
} | Select -expandproperty name | Write-Host
Now, that should be faster and simpler code to maintain, but, as you may have noticed, it still has some redundancies now that you are not looping.
Since you are piping everything into a Select, which can do the work of the Where, and since you only want to match the Name property here, you can either forgo the last Select by piping only the file names in the first place, or forgo the Where and select exactly what you want.
I think the former is far simpler, and the latter is useful if you are going to do something with the other property values that we don't know about yet.
Finally, Write-Host is likely redundant, as any object output will echo to the console.
Here is that version which incorporates the removal of the unnecessary loops and removes redundancies related to the output of the info you wanted, all together.
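$SearchLocation = "C:\Temp\"
$FileGlob = "*.pl"
# Build the pattern from the glob's extension: '*.pl' becomes the escaped regex '\.pl'
$allresults = Select-String -Path "$SearchLocation$FileGlob" -Pattern ('(.*?)([\w\.]+' + [regex]::Escape($FileGlob.TrimStart('*')) + ')')
$allperlfiles = Get-Childitem -Path "$SearchLocation$FileGlob"
$allperlfiles.name | ? {
    $_ -notin $(
        $allresults | select @{
            N="Match";E={
                $($_.Matches.Groups[2].value)
            }
        } | select -ExpandProperty Match   # plain strings, so -notin can compare them
    )
}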
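The difference between indexing the flattened group list and extracting group 2 per match can be tried without any files. This is a minimal sketch that assumes three made-up lines piped to Select-String in place of the real .pl files; the names $lines, $called and $uncalled are illustrative only:

```powershell
# Hypothetical sample lines standing in for the contents of the .pl files.
$lines = 'system("perl File1.pl");', 'do File2.pl', 'require File3.pl'
$allresults = $lines | Select-String -Pattern '(.*?)(\w+\.pl)'

# Indexing the flattened group list yields a single string, not one per match.
$single = $allresults.Matches.Groups[2].Value

# Extracting capture group 2 from each match yields one string per matching line.
$called = $allresults | ForEach-Object { $_.Matches[0].Groups[2].Value }

# -notin now compares each candidate name against the full list of called scripts.
$uncalled = 'File1.pl', 'File4.pl' | Where-Object { $_ -notin $called }
$uncalled
```

Because $called is a plain array of strings, no calculated property or extra Select is needed before the -notin comparison.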

Powershell replace lines starting with given pattern

Background:
I am trying to read a config file and place it in a string variable for further use. For now I've managed to remove the newlines, which was fairly simple. However, I'm having a bit of trouble with removing comments (lines starting with a #).
What I have so far
$var = (Get-Content $HOME/path/to/config.txt -Raw).Replace("`r`n","")
What I've tried
A lot of it is something along the lines of
$var = (Get-Content $HOME/path/to/config.txt -Raw).Replace("#.*?`r`n","").Replace("`r`n","")
( .Replace("(?<=#).*?(?=`r`n)","") , .Replace('^[^#].*?`r`n','') etc)
A lot of the resources I've found have treated how to iteratively read from a file and write back to it or a new one, but what I need is for the result to stay in a variable and for the original file not to be altered in any way (also I'd rather avoid using temp files or even variables if possible). I think there is something fundamental I'm missing about the input to Replace. (Also found this semi-relevant piece when you're using the ConvertFrom-Csv Using Import-CSV in Powershell, ignoring commented lines .)
Sample input:
text;
weird text;
other-sort-of-text;
#commented out possibility;
more-input/with-comment;#comment
Sample output:
text;weird text;other-sort-of-text;more-input/with-comment;
Additional info:
I am going to run this on current builds of Windows 10; locally I currently seem to have PowerShell version 5.1.14393.693.
Split the string at semicolons, remove element starting with a #, then join the result back to a string.
((Get-Content $HOME/path/to/config.txt -Raw).Replace("`r`n","") -split ';' |
Where-Object { $_.Trim() -notlike '#*' }) -join ';'
This might be identical to some of the other responses, but here is how I would do it:
$Data = Get-Content $HOME/path/to/config.txt -Raw
(($Data -split(';')).replace("`r`n",'') | Where-Object { $_ -notlike '#*' }) -join(';')
Any way you do it, remember that `r`n needs to be expanded, so it has to be enclosed in double quotes, unlike the rest of your characters.
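As for why the .Replace() attempts in the question have no effect: the .NET String.Replace method substitutes literal text, whereas the -replace operator takes a regular expression. A minimal sketch using -replace, with the sample input inlined as a hypothetical $raw string instead of being read from the file:

```powershell
# Stand-in for (Get-Content $HOME/path/to/config.txt -Raw); the sample lines are from the question.
$raw = @(
    'text;'
    'weird text;'
    'other-sort-of-text;'
    '#commented out possibility;'
    'more-input/with-comment;#comment'
) -join "`r`n"

# First drop everything from a '#' to the end of its line, then drop the line breaks.
$var = $raw -replace '#[^\r\n]*', '' -replace '\r?\n', ''
$var
```

The result stays in $var and the file is never written to. Note that this sketch treats every # as the start of a comment, which is an assumption that holds for the sample input only.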
#############solution 1 with convertfrom-string####################
#short version
(gc "$HOME/path/to/config.txt" | ? {$_ -notlike "#*"} | cfs -D ";").P1 -join ";"
#verbose version
(Get-Content "$HOME/path/to/config.txt" | where {$_ -notlike "#*"} | ConvertFrom-String -Delimiter ";").P1 -join ";"
#############Solution 2 with convertfrom-csv#######################
(Get-Content "C:\temp\test\config.txt" | where {$_ -notlike "#*"} | ConvertFrom-csv -Delimiter ";" -Header "P1").P1 -join ";"
#############Solution 3 with split #######################
(Get-Content "C:\temp\test\config.txt" | where {$_ -notlike "#*"} | %{$_.Split(';')[0]}) -join ";"

Export data to CSV different row using regex

I used a regular expression to extract strings from a file and export them to CSV, but I could not figure out how to put each match value on a separate row. The result ends up in a single cell:
{ 69630e4574ec6798, 78630e4574ec6798, 68630e4574ec6798}
I need it to be in different rows in CSV as below:
69630e4574ec6798
78630e4574ec6798
68630e4574ec6798
$Regex = [regex]"\s[a-f0-9]{16}"
Select-Object @{Name="Identity";Expression={$Regex.Matches($_.Textbody)}} |
Format-Table -Wrap |
Export-Csv -Path c:\temp\Inbox.csv -NoTypeInformation -Append
Details screenshot:
Edit:
I have been trying to split the data I have in my CSV but I am having difficulty in splitting the output data "id" to next line as they all come in one cell "{56415465456489944,564544654564654,46565465}".
In the screenshot below the first couple lines are the source input and the highlighted lines in the second group is the output that I am trying to get.
Change your regular expression so that it has the hexadecimal substrings in a capturing group (to exclude the leading whitespace):
$Regex = [regex]"\s([a-f0-9]{16})"
then extract the first group from each match:
$Regex.Matches($_.Textbody) | ForEach-Object {
$_.Groups[1].Value
} | Set-Content 'C:\temp\a.txt'
Use Set-Content rather than Out-File, because in Windows PowerShell the latter creates the output file in Unicode (UTF-16 LE) format by default, whereas the former defaults to the system's ANSI code page (both cmdlets allow overriding the default via the -Encoding parameter).
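If the goal is a CSV with one row per match rather than a plain text file, each match can be wrapped in an object first. This is a minimal sketch, assuming a made-up $text value in place of $_.Textbody and reusing the Identity column name from the question:

```powershell
$Regex = [regex]'\s([a-f0-9]{16})'

# Hypothetical sample text standing in for $_.Textbody.
$text = 'ids: 69630e4574ec6798 78630e4574ec6798 68630e4574ec6798'

# One object per match, so that each match becomes its own CSV row.
$rows = $Regex.Matches($text) | ForEach-Object {
    [PSCustomObject]@{ Identity = $_.Groups[1].Value }
}

# ConvertTo-Csv shows what Export-Csv would write to the file.
$rows | ConvertTo-Csv -NoTypeInformation
```

Replacing ConvertTo-Csv with Export-Csv and a -Path argument writes the same rows to a file. Format-Table should not appear in that pipeline, since it emits formatting objects rather than the data.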
Edit:
To split the data from your id column and create individual rows for each ID you could do something like this:
Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
    $row = $_
    $row.id -replace '[{}]' -split ',' | ForEach-Object {
        # Capture the ID first: inside the calculated property, $_ refers to $row
        $id = $_.Trim()
        $row | Select-Object -Property *,@{n='id';e={$id}} -ExcludeProperty id
    }
} | Export-Csv 'C:\path\to\output.csv' -NoType

Is it possible to replace Get-Content, ForEach-Object string -match with Select-String cmdlet?

I have a fixed width file with records in a format as follows
DDEDM2018890 19960730015000010000
DDETPL015000 20150515015005010000
DDETPL015010 20150515015003010000
DDETPL015020 20150515015002010000
DDETPL015030 20150515015005010000
DDETPL015040 20150515015000010000
The first 3 characters identify the record type. In the above example all records are of type DDE, but there are also lines of a different type in the file.
The following regular expression with named capture groups parses the relevant information from each record for my purpose (notice it also filters down to DDE record types):
DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})
play with this regex on this excellent online parser
I have written a script that uses the Get-Content, ForEach-Object and Select-Object cmdlets to convert the fixed width file into a csv file.
I wonder if I could replace the Get-Content and ForEach-Object cmdlets by a single Select-String cmdlet?
#this powershell script reads fixed width file and generates a csv file of the relevant & converted values
#Prepare a hashtable (calculated property) for Select-Object to convert CategoryCode and append CategoryId
$Category = @{
Name = "Category"
Expression = {
$cat = switch($_.CategoryCode)
{
"50"{"A"}
"54"{"C"}
"60"{"F"}
"66"{"I"}
"74"{"M"}
"88"{"T"}
}
$cat+$_.CategoryId
}
}
gc "C:\Path\To\File.txt" | % {
if($_ -match "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3}).*$")
{
#$matches is a hashtable of named capture groups; convert it to an object so Select-Object can handle its entries as object properties
[PSCustomObject]$matches
}
} | select Database, $Category, Length #| export-csv "AnalysisLengths.csv" -NoTypeInformation
Before I finalized the script, I was trying to use the Select-String cmdlet but could not figure out how to use it. I believe it can achieve the same result in a more elegant way. This is what I had:
##Could this be completed with just the Select-String commandlet instead of Get-Content+ForEach+Select-Object?
Select-String -Path "C:\Path\To\File.txt" `
-Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" `
| Select-Object -ExpandProperty Matches
Using -ExpandProperty should convert the Microsoft.PowerShell.Commands.MatchInfo Matches property into the actual System.Text.RegularExpressions.Match objects for each line...
see also Powershell Select-Object vs ForEach on Select-String results
Here is one way (I'm not so proud of it)
Select-String -Path "C:\Path\To\File.txt" -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" | %{New-Object -TypeName PSObject -Property @{Database=$_.matches.groups[1];CategoryCode=$_.matches.groups[2];CategoryId=$_.matches.groups[3];Length=$_.matches.groups[4]}} | export-csv "C:\Path\To\File.csv"
I don't know why you have limited your question to Select-String cmdlet. If you had included the switch statement, then, I'd answer to you: YES! It's possible!
And I'd present to you this simple and short PowerShell code:
$(switch -Regex -File $fileIN{$patt{[pscustomobject]$matches|select * -ExcludeProperty 0}})|epcsv $fileCSV
where $fileIN is the input file, $fileCSV is the CSV file you want to create, and $patt is the pattern you have in your OP:
$patt='DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'
The switch statement is very powerful.
While Select-String can combine Get-Content and pattern matching, you still need a loop for constructing your custom objects. You could stick with what you have, although I'd suggest a couple modifications. Replace the switch statement with a hashtable and make the nested if a Where-Object filter:
$categories = @{
'50' = 'A'
'54' = 'C'
'60' = 'F'
'66' = 'I'
'74' = 'M'
'88' = 'T'
}
$category = @{
Name = 'Category'
Expression = { $categories[$_.CategoryCode] + $_.CategoryId }
}
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'
Get-Content 'C:\path\to\file.txt' |
? { $_ -match $pattern } |
% { [PSCustomObject]$matches } |
select Database, $category, Length |
Export-Csv 'C:\path\to\output.csv' -NoType
Or you could go with @JPBlanc's suggestion (again with some slight modifications):
$category = @{
'50' = 'A'
'54' = 'C'
'60' = 'F'
'66' = 'I'
'74' = 'M'
'88' = 'T'
}
$pattern = "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})"
Select-String -Path 'C:\path\to\file.txt' -Pattern $pattern | % {
New-Object -TypeName PSObject -Property @{
Database = $_.Matches.Groups[1].Value
Category = $category[$_.Matches.Groups[2].Value] + $_.Matches.Groups[3].Value
Length = $_.Matches.Groups[4].Value
}
} | Export-Csv 'C:\path\to\output.csv' -NoType
The latter will give you slightly better performance, although not too much (execution times were 2:35 vs 2:50 for a 120 MB input file on my test box).
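Since the pattern uses named groups, the Select-String variant can also read them by name instead of by position, which keeps working if a group is added or reordered. A minimal sketch, assuming two of the sample records piped in as strings rather than read with -Path; the variable names $lines and $records are illustrative only:

```powershell
$categories = @{ '50' = 'A'; '54' = 'C'; '60' = 'F'; '66' = 'I'; '74' = 'M'; '88' = 'T' }
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# Two records from the question, used here instead of a file.
$lines = 'DDEDM2018890 19960730015000010000', 'DDETPL015000 20150515015005010000'

$records = $lines | Select-String -Pattern $pattern | ForEach-Object {
    # Without -AllMatches there is one match per line; read its groups by name.
    $g = $_.Matches[0].Groups
    [PSCustomObject]@{
        Database = $g['Database'].Value
        Category = $categories[$g['CategoryCode'].Value] + $g['CategoryId'].Value
        Length   = $g['Length'].Value
    }
}
$records
```

Piping $records to Export-Csv with -NoTypeInformation would produce the same kind of file as the variants above.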

Using powershell, in a csv doc, need to iterate and insert a character

So my csv file looks something like:
J|T|W
J|T|W
J|T|W
I'd like to iterate through, most likely using a regex, so that after the two pipes and their content (something like \|.+{2}) a tab character (`t) is inserted.
I'm assuming I'd use get-content to loop through, but I'm unsure of where to go from there.
Also...just thought of this, it is possible that the line will overrun to the next line, and therefore the two pipes will be on different lines, which I'm pretty sure makes a difference.
-Thanks
Ok, I'll move the comment discussion to an answer since it seems like it is a potentially valid solution:
Import-csv .\test.csv -Delimiter '|' -Header 'One', 'two', 'three' | %{$_.Three = "`t$($_.Three)"; $_} | Export-CSV .\test_result.cs
This works for a file that is known to have 3 fields. For a more generic solution, if you have the ability to determine the number of fields initially being exported to CSV, then:
Import-csv .\test.csv -Delimiter '|' -Header (1..$fieldCount) | %{$_.$fieldCount = "`t$($_.$fieldCount)"; $_} | Export-CSV .\test_result.cs
In PowerShell you can use the -replace operator with a regex e.g.:
$c = Get-Content foo.csv | Foreach {$_ -replace '<regex_here>','new_string'}
$c | Out-File foo.csv -encoding ascii
Note that in new_string you can refer to capture groups using $1 but you'll want to put that string in single quotes so PowerShell won't try to interpret $1 as a variable reference.
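For the three-field sample, the -replace approach could look like the sketch below. The pattern is an assumption about what is wanted (a tab in front of the third field), and it does not handle records that wrap onto the next line:

```powershell
$line = 'J|T|W'

# Capture the first two fields together with their trailing pipes.
$pattern = '^((?:[^|]*\|){2})'

# '$1' stays in single quotes so the regex engine sees it as a group reference;
# the tab needs double quotes so that PowerShell expands `t.
$result = $line -replace $pattern, ('$1' + "`t")
$result
```

The same expression can be dropped into the Foreach block above in place of the '<regex_here>' and 'new_string' placeholders.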