I looked into doing this for one file, and I really like the solution PowerShell offers:
Get-Content test.txt | ForEach-Object { $_ -replace "foo", "bar" } | Set-Content test2.txt
Is there a way I can do this to get the content of a list of files, perform the same search and replace, and produce a second set of files?
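This comes close to what I want, but with wildcards on both ends the combined content of every file is written into each matching file. Processing the files one at a time avoids that:
Get-ChildItem *.txt | ForEach-Object { (Get-Content $_.FullName) -replace "foo", "bar" | Set-Content $_.FullName }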
Give this a try --
dir *.txt | % {[IO.File]::ReadAllText($_.FullName).Replace('bar', 'test') | sc $_.FullName}
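For the "second set of files" part of the question, here is a minimal sketch. The function name Copy-WithReplace and its parameters are hypothetical (they are not part of PowerShell), and it assumes the search text should be treated literally rather than as a regex.

```powershell
# Hypothetical helper: read every *.txt in $SourceDir, replace text literally,
# and write a same-named copy into $OutDir (the originals are left untouched).
function Copy-WithReplace {
    param(
        [string]$SourceDir,
        [string]$OutDir,
        [string]$Find,
        [string]$ReplaceWith
    )
    if (-not (Test-Path $OutDir)) {
        New-Item -ItemType Directory -Path $OutDir | Out-Null
    }
    Get-ChildItem -Path $SourceDir -Filter *.txt | ForEach-Object {
        $target = Join-Path $OutDir $_.Name
        # String.Replace is a literal (non-regex) replacement.
        Get-Content $_.FullName |
            ForEach-Object { $_.Replace($Find, $ReplaceWith) } |
            Set-Content $target
    }
}
```

It could be called as Copy-WithReplace -SourceDir . -OutDir .\out -Find 'foo' -ReplaceWith 'bar'. Writing to a separate directory keeps the input files intact, which the in-place one-liners above do not.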
I've searched all over, including here on Stack Overflow, and I cannot find a solution to my problem. Here is my issue.
Let's say File1.txt contains the following (no blank lines between the lines):
\\Serv02\LOC6\Client\726C30\032383\2200018023.pdf
\\Serv02\LOC6\Client\726C30\032383\2200718091.pdf
\\Serv02\LOC6\Client\726C30\030684\2300309040.pdf
\\Serv02\LOC6\Client\726C30\031274\2300429971.pdf
File2.txt will have the same information; however, I need to add a 1 right before the .pdf on each line (within file2.txt).
Example:
\\Serv02\LOC6\Client\726C30\032383\22000180231.pdf
I can easily update file2.txt using a regex; however, that updates the contents based only on the regex.
File2.txt will have a lot more data in it than file1.txt (more of the exact same type of information). I only need to update file2.txt, adding the 1 right before .pdf, BASED on what is in file1.txt.
Here is the code I am using. As you can see, it does not read file1.txt at all; it just uses a regex to update file2.txt, adding the 1 before .pdf (the code below works for that, but I'm not iterating through file1.txt):
clear-host
set-location c:\temp
$File = "C:\Temp\file1.txt"
$FileZ = "C:\Temp\file2.txt"
$File2 = (Get-ChildItem $fileZ) | Select -ExpandProperty BaseName
$regex01 = '(\\\\Serv02\\LOC6\\Client\\726C30\\\d{1,6}\\\d{1,10})(\.pdf)$'
get-content $fileZ | % { $_ -replace $regex01, '${1}1${2}' -join "`r`n" } | out-file -Encoding default "c:\Temp\$File2.txt"
start-sleep -Seconds 2
$NewMRC = Get-ChildItem "$file2.txt" | Select -ExpandProperty Name
Get-ChildItem $NewMRC | rename-item -NewName {$_.Name -replace ".txt",".MRC2"}
If file1.txt had another line that didn't match up to the RegEx as shown above, file2.txt would not be updated with that line
\\Serv03\LOC7\Client\780D30\031456\8675309123.pdf
I hope I have explained this well enough. I'm not new to PowerShell but I am far from an expert. Any assistance is greatly appreciated.
I've modified your code as follows. The approach is to read the content of File1.txt and store it in a variable, then iterate over each line of File2.txt and check it against the regex and against the File1.txt content. If both match, the line is replaced with whatever you want. The output goes to a .tmp file in append mode. Once all the lines in File2.txt are processed, the .tmp file replaces File2.txt.
clear-host
set-location c:\temp
$File = "file1.txt"
$FileZ = "file2.txt"
# PS2
$File1 = get-content $File | Out-String
# PS3
# $File1 = get-content $File -Raw
$File2 = (Get-ChildItem $fileZ) | Select -ExpandProperty BaseName
if( test-path "$File2.tmp" ) { remove-item "$File2.tmp" }
$regex01 = '(\\\\Serv02\\LOC6\\Client\\726C30\\\d{1,6}\\\d{1,10})(\.pdf)$'
get-content $fileZ |% {
    $line = $_
    $find = $line -replace '\\','\\'
    if ( ($line -match $regex01) -AND ( $File1 -match $find ) ) {
        $line -replace $regex01,'${1}1${2}' -join "`r`n"
    } else {
        $line
    }
} | out-file "$File2.tmp" -append
remove-item "$File2.txt"
rename-item "$File2.tmp" "$File2.txt"
#start-sleep -Seconds 2
#$NewMRC = Get-ChildItem "$file2.txt" | Select -ExpandProperty Name
#Get-ChildItem $NewMRC | rename-item -NewName {$_.Name -replace ".txt",".MRC2"}
Notes:
The last 3 lines of your code don't seem to be related to your problem statement, so I've commented them out.
$find = $line -replace '\\','\\': This replaces each single backslash \ with a double backslash \\. The first operand of -replace is a regular expression, so the backslash must be escaped there; the second operand is a replacement string, where a backslash is literal. So even though the two arguments look the same, they are interpreted differently.
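To make the escaping note concrete, here is a small sketch that compares the manual backslash doubling with [regex]::Escape, which also escapes the dot. The sample path is taken from the question; the variable names $manual, $escaped and $altered are only for illustration.

```powershell
# A UNC path contains regex metacharacters (backslashes and a dot).
$line = '\\Serv02\LOC6\Client\726C30\032383\2200018023.pdf'

# Manual approach from the answer: the pattern '\\' matches ONE backslash,
# while the replacement '\\' is inserted literally as TWO backslashes.
$manual = $line -replace '\\', '\\'

# [regex]::Escape escapes the backslashes and the dot as well.
$escaped = [regex]::Escape($line)

$manual    # \\\\Serv02\\LOC6\\Client\\726C30\\032383\\2200018023.pdf
$escaped   # \\\\Serv02\\LOC6\\Client\\726C30\\032383\\2200018023\.pdf

# Swap the dot for another character to see the difference.
$altered = $line -replace '\.pdf$', 'Xpdf'
$line    -match $manual    # True
$altered -match $manual    # True  (the unescaped dot matches any character)
$altered -match $escaped   # False (the escaped dot must be a literal dot)
```

For this particular data the manual version is enough, but [regex]::Escape is the safer default when the lines may contain other metacharacters.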
One way to do this: Retrieve file content of first file into an array, then retrieve content of second file. For each line in second file: If first file's content has a line matching the current line, output modified line; otherwise, just output the current line.
$pattern = '(\\{2}(?:[^\\]+\\)+)([^\\\.]+)(\.pdf)'
$file1Content = Get-Content "file1.txt"
Get-Content "file2.txt" | ForEach-Object {
    if ( $file1Content -contains $_ ) {
        $_ | Select-String $pattern | ForEach-Object {
            "{0}{1}1{2}" -f
                $_.Matches[0].Groups[1].Value,
                $_.Matches[0].Groups[2].Value,
                $_.Matches[0].Groups[3].Value
        }
    }
    else {
        $_
    }
}
First match group ($_.Matches[0].Groups[1].Value) is \\servername\sharename\path, second match group is filename without extension, and third match group is the file extension.
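As a quick check of the three groups, the pattern can be run against one of the sample lines from the question. This sketch uses the -match operator and the automatic $Matches variable instead of Select-String; $sample and $result are illustrative names.

```powershell
$pattern = '(\\{2}(?:[^\\]+\\)+)([^\\\.]+)(\.pdf)'
$sample  = '\\Serv02\LOC6\Client\726C30\032383\2200018023.pdf'

if ($sample -match $pattern) {
    $Matches[1]   # \\Serv02\LOC6\Client\726C30\032383\
    $Matches[2]   # 2200018023
    $Matches[3]   # .pdf
    # Reassemble the line with a 1 inserted before the extension.
    $result = '{0}{1}1{2}' -f $Matches[1], $Matches[2], $Matches[3]
    $result       # \\Serv02\LOC6\Client\726C30\032383\22000180231.pdf
}
```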
I have a file, input.txt, containing text like this:
GRP123456789
123456789012
GRP234567890
234567890123
GRP456789012
"A lot of text. More text. Blah blah blah: Foobar." (Source Error) (Blah blah blah)
GRP567890123
Source Error
GRP678901234
Source Error
GRP789012345
345678901234
456789012345
I'm attempting to capture all occurrences of "GRP#########" on the condition that at least one number is on the next line.
So GRP123456789 is valid, but GRP456789012 and GRP678901234 are not.
The RegEx pattern I came up with on http://regexstorm.net/tester is: (GRP[0-9]{9})\s\n\s+[0-9]
The PowerShell script I have so far, based off this site http://techtalk.gfi.com/windows-powershell-extracting-strings-using-regular-expressions/, is:
$input_path = 'C:\Users\rtaite\Desktop\input.txt'
$output_file = 'C:\Users\rtaite\Desktop\output.txt'
$regex = '(GRP[0-9]{9})\s\n\s+[0-9]'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Values } > $output_file
I'm not getting any output, and I'm not sure why.
Any help with this would be appreciated as I'm just trying to understand this better.
You need to turn the text input into a single string before passing it to Select-String, otherwise the cmdlet will operate on each line individually and thus never find a match.
Get-Content $input_path | Out-String |
Select-String $regex -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Groups[1].Value } |
Set-Content $output_file
If you're using PowerShell v3 or newer you can replace Get-Content | Out-String with Get-Content -Raw.
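The difference between per-line and whole-text matching can be seen without a file. In this sketch the sample lines and the simplified pattern are assumptions made for the demonstration; they stand in for the question's input and regex.

```powershell
# Sample lines standing in for the file contents (an assumption for the demo).
$lines = 'GRP123456789', '  123456789012', 'GRP456789012', 'Source Error'
$regex = '(GRP[0-9]{9})\s+[0-9]'   # simplified variant of the question's pattern

# Piping an array feeds Select-String one line at a time: no single line holds
# both the GRP id and the number that follows, so nothing matches.
$perLine = $lines | Select-String $regex -AllMatches

# Joining the lines first gives the regex one multi-line string to search.
$joined = ($lines -join "`n") | Select-String $regex -AllMatches |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { $_.Groups[1].Value }

$null -eq $perLine   # True
$joined              # GRP123456789
```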
To extract strings from a text file using a pattern, the best tool for the job is Select-String. It also has a parameter called -Context, which lets you capture lines before or after the matched line, ideal for just this problem.
So my solution would be something like this:
Select-String 'input.txt' -Pattern '^GRP[0-9]{9}' -Context 0, 1 | ? {
$_.Context.PostContext -match '\d'
} | Select -ExpandProperty line | Set-Content 'output_file.txt'
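To see what -Context provides before filtering, the sketch below writes four sample lines to a temporary file and lists each match with the line that follows it. The sample data and the $rows variable are assumptions for the demonstration, and [pscustomobject] requires PowerShell v3 or newer.

```powershell
# Write a small sample to a temporary file (illustrative data only).
$tmp = [IO.Path]::GetTempFileName()
Set-Content $tmp 'GRP123456789', '123456789012', 'GRP456789012', 'Source Error'

# -Context 0, 1 keeps zero lines before and one line after each match.
$rows = Select-String -Path $tmp -Pattern '^GRP[0-9]{9}' -Context 0, 1 |
    ForEach-Object {
        [pscustomobject]@{
            Line     = $_.Line
            NextLine = $_.Context.PostContext[0]
            Keep     = [bool]($_.Context.PostContext -match '\d')
        }
    }
Remove-Item $tmp

$rows | Format-Table -AutoSize
```

Only the rows where Keep is True would pass the Where-Object filter used in the answer.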
Using
[regex]::Matches($(Get-Content '.\Desktop\new 1.txt'), "GRP\d+(?=\s+\d)") |
% { $_.value | Out-File .\Desktop\new-1-matches.txt -Append }
I achieved the following output from your sample file:
GRP123456789
GRP234567890
GRP789012345
I want to search through the files in a folder, find the following strings in each file, and output the results to a file. I would like to find a combination of two strings no matter how it is written in the file, even if a carriage return sits between the two strings.
Here's the code I have so far:
$Path = "C:\Promotion\Scripts"
$txt_string1 = "CREATE"
$txt_string2 = "PROC"
$PathArray = @()
$Results = "C:\Promotion\Errors\Deployment_Errors.txt"
# This code snippet gets all the files in $Path that end in ".sql".
Get-ChildItem $Path -Filter "*.sql" |
    Where-Object { $_.Attributes -ne "Directory"} |
    ForEach-Object {
        If (Get-Content $_.FullName | Select-String -Pattern $txt_string2) {
            $PathArray += $_.FullName
        }
    }
$PathArray | ForEach-Object {$_} | Out-File $Results
To find more than one string in a text file, you can use a pattern like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)'
The result:
hello guy
Once you have found the strings you want, pipe them to Out-File like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)' | Out-File -FilePath c:\test.txt
The file test.txt now contains:
PS C:\> Get-Content test.txt
hello guy
You can do this without loops. Define the combinations of your two search terms as alternatives in a regular expression with the multiline and single-line options enabled ((?ms)); the s option is what lets . match across line breaks.
$basepath = 'C:\Promotion\Scripts'
$results = 'C:\Promotion\Errors\Deployment_Errors.txt'
$term1 = 'CREATE'
$term2 = 'PROC'
$pattern = "(?ms)($term1.*$term2|$term2.*$term1)"
Get-ChildItem "$basepath\*.sql" |
? { Get-Content $_.FullName -Raw | Select-String -Pattern $pattern } |
select -Unique -Expand FullName |
Out-File $results
Note that this will report any file that contains both terms anywhere in it, no matter what other text is between them. If you want to find only files that contain combinations of the two terms either not separated (CREATEPROC or PROCCREATE) or separated by nothing but whitespace, change the pattern to this:
$pattern = "(?ms)($term1\s*$term2|$term2\s*$term1)"
Depending on your search terms it may also be a good idea to escape them before building the regular expression, so that you don't get unwanted meta characters (not likely with the two string literals you have, but just to be on the safe side):
$term1 = [regex]::Escape('CREATE')
$term2 = [regex]::Escape('PROC')
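The effect of the options and of the whitespace-only variant can be checked on a small in-memory string. The sample text below is an assumption for the demonstration; the patterns are the ones from this answer, built from escaped terms.

```powershell
# The two terms are split across a line break (illustrative sample text).
$text  = "CREATE`r`n   PROC dbo.Example"
$term1 = [regex]::Escape('CREATE')
$term2 = [regex]::Escape('PROC')

# Without (?s) the dot stops at the newline, so the terms are not connected.
$plain      = $text -match "($term1.*$term2|$term2.*$term1)"
# With (?s) the dot also matches newlines.
$singleLine = $text -match "(?s)($term1.*$term2|$term2.*$term1)"
# \s already matches line breaks, so the whitespace-only variant needs no option.
$whitespace = $text -match "($term1\s*$term2|$term2\s*$term1)"

$plain        # False
$singleLine   # True
$whitespace   # True
```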
I have a list of regular expressions (about 2000) and over a million HTML files. I want to check whether each regular expression matches each file. How can I do this in PowerShell?
Performance is important, so I don't want to loop through the regular expressions.
I tried
$text | Select-String -Pattern pattern1, pattern2,...
It returns all matches, but I also want to find out which patterns matched and which did not. I need to build a list of the matching regular expressions for each file.
You could try something like this:
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
Get-ChildItem -Filter *.txt | Select-String -Pattern $regex | ForEach-Object {
    $ht[$_.Path] += @($_ | Select-Object -ExpandProperty Pattern)
}
Test-output:
$ht | Format-Table -AutoSize
Name                                               Value
----                                               -----
C:\Users\graimer\Desktop\New Text Document (2).txt {e2$}
C:\Users\graimer\Desktop\New Text Document.txt     {^test, e2$}
You didn't specify how you wanted the output.
UPDATE: To match multiple patterns on a single line, try this (mjolinor's answer is probably faster than this).
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
$regex | ForEach-Object {
    $pattern = $_
    Get-ChildItem -Filter *.txt | Select-String -Pattern $pattern | ForEach-Object {
        $ht[$_.Path] += @($_ | Select-Object -ExpandProperty Pattern)
    }
}
UPDATE2: I don't have enough samples to try it, but since you have such a huge number of files, you might want to try reading each file into memory before looping through the patterns. It may be faster.
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
Get-ChildItem -Filter *.txt | ForEach-Object {
    $text = $_ | Get-Content
    $filename = $_.FullName
    $regex | ForEach-Object {
        $text | Select-String -Pattern $_ | ForEach-Object {
            $ht[$filename] += @($_ | Select-Object -ExpandProperty Pattern)
        }
    }
}
I don't see any way around doing a foreach through the regex collection.
This is the best I could come up with performance-wise:
$regexes = 'pattern1','pattern2'
$files = get-childitem -Path <file path> |
    select -ExpandProperty fullname
$ht = @{}
foreach ($file in $files)
{
    $ht[$file] = New-Object collections.arraylist
    foreach ($regex in $regexes)
    {
        if (select-string $regex $file -Quiet)
        {
            [void]$ht[$file].add($regex)
        }
    }
}
$ht
You could speed up the process by using background jobs and dividing up the file collection among the jobs.
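A different variation, not taken from either answer, is to build the [regex] objects once and test each file's text in memory, so that every file is read only once. This is a sketch under assumptions: the helper name Get-MatchingPatterns is hypothetical, and whether it is actually faster for 2000 patterns and a million files has not been measured here.

```powershell
# Hypothetical helper: returns the patterns from $Patterns that match $Text.
function Get-MatchingPatterns {
    param([string]$Text, [regex[]]$Patterns)
    foreach ($re in $Patterns) {
        if ($re.IsMatch($Text)) { $re.ToString() }
    }
}

# Build the regex objects once, outside any file loop.
$regexes = [regex[]]('^test', 'e2$')

$hits   = @(Get-MatchingPatterns -Text 'test line e2' -Patterns $regexes)
$misses = @(Get-MatchingPatterns -Text 'nothing here' -Patterns $regexes)

$hits           # outputs ^test and e2$
$misses.Count   # 0
```

For files, the text could come from Get-Content $file -Raw (PowerShell v3 or newer). Note that ^ and $ then anchor to the start and end of the whole file unless the pattern enables the (?m) option.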
I have a text file containing lines of data. I can use the following powershell script to extract the lines I'm interested in:
select-string -path *.txt -pattern "subject=([A-Z\.]+),"
Some example data would be:
blah blah subject=THIS.IS.TEST.DATA, blah blah blah
What I want is to be able to extract just the actual contents of the subject (i.e. the "THIS.IS.TEST.DATA" string). I tried this:
select-string -path *.txt -pattern "subject=([A-Z\.]+)," | %{ $_.Matches[0] }
But the "Matches" property is always null. What am I doing wrong?
I don't know why your version doesn't work. It should work. Here is an uglier version that works.
$p = "subject=([A-Z\.]+),"
select-string -path *.txt -pattern $p | % {$_ -match $p > $null; $matches[1]}
Explanation:
-match is a regular expression matching operator:
>"foobar" -match "oo.ar"
True
The > $null just suppresses the True being written to the output. (Try removing it.) Piping to the Out-Null cmdlet does the same thing.
$matches is a magic variable that holds the result of the last -match operation.
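A short sketch of the explanation above, using the sample line from the question: $Matches[0] holds the whole match and $Matches[1] the first capture group.

```powershell
$sample = 'blah blah subject=THIS.IS.TEST.DATA, blah blah blah'

# -match returns a Boolean; Out-Null (or "> $null") discards it.
$sample -match 'subject=([A-Z\.]+),' | Out-Null

$Matches[0]   # subject=THIS.IS.TEST.DATA,   (the whole match)
$Matches[1]   # THIS.IS.TEST.DATA            (first capture group)
```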
In PowerShell V2 CTP3, the Matches property is implemented. So the following will work:
select-string -path *.txt -pattern "subject=([A-Z\.]+)," | %{ $_.Matches[0].Groups[1].Value }
Yet another option
gci *.txt | gc | foreach { [regex]::match($_,'(?<=subject=)([^,]+)').value }
There is a much simpler alternative to select-string that will work better.
In powershell,
$sample="blah blah subject=THIS.IS.TEST.DATA, blah blah blah"
$sample -match "subject=([A-Z\.]+),"
$matches[1] will have the substring you are looking for.
This works on Windows 10.0.16299 version
Having learnt a lot from all the other answers I was able to get what I want using the following line:
gci *.txt | gc | %{ [regex]::matches($_, "subject=([A-Z\.]+),") } | %{ $_.Groups[1].Value }
This felt nice as I was only running the regex once per line and as I was entering this at the command prompt it was nice not to have multiple lines of code.
The problem with the code you are typing is that, in PowerShell v1, Select-String does not pass down the actual Regex object. Instead it passes a different class called MatchInfo, which does not have the actual regex match information.
If you only want to run the regex once, you will have to roll your own function, which isn't too difficult.
function Select-Match() {
    param ($pattern = $(throw "Need a pattern"),
           $filePath = $(throw "Need a file path") )
    foreach ( $cur in (gc $filePath)) {
        if ( $cur -match $pattern ) {
            write-output $matches[0];
        }
    }
}
gci *.txt | %{ Select-Match "subject=([A-Z\.]+)," $_.FullName }
The Select-String command seems to return a MatchInfo variable and not a "string" variable.
I spent several hours finding this out on forums and official website with no luck.
I'm still gathering info.
A way around this is to declare explicitly a string variable to hold the result returned from the Select-String, from your example:
[string] $foo = select-string -path *.txt -pattern "subject=([A-Z.]+),"
The $foo variable is now a string and not a MatchInfo object.
Hope this helps.
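As an alternative to casting, the MatchInfo object can be inspected directly. The sketch below pipes a sample string instead of reading *.txt files (an assumption made to keep it self-contained); the Matches property requires PowerShell v2 or later.

```powershell
$info = 'blah subject=THIS.IS.TEST.DATA, blah' |
    Select-String -Pattern 'subject=([A-Z.]+),'

$info.GetType().Name               # MatchInfo
$info.Line                         # blah subject=THIS.IS.TEST.DATA, blah
$info.Matches[0].Groups[1].Value   # THIS.IS.TEST.DATA
```

The Line property gives the matched text as a plain string, so the capture group can be read without giving up the rest of the match information.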
Another variation, matching 7 digits in a string
echo "123456789 hello test" | % {$_ -match "\d{7}" > $null; $matches[0]}
returns: 1234567