I am trying to extract only my JIRA issue numbers from a text file, eliminating duplicates. This works in a shell script:
cat /tmp/jira.txt | grep -oE '^[A-Z]+-[0-9]+' | sort -u
But I want to use PowerShell, and tried this:
$Jira_Num=Get-Content /tmp/jira.txt | Select-String -Pattern '^[A-Z]+-[0-9]+' > "$outputDir\numbers.txt"
But this returns the entire line and doesn't eliminate the duplicates. I tried regex, but I am new to PowerShell and don't know exactly how to use it. Can someone help, please?
Sample Jira.txt file
PRJ-2303 Modified the artifactName
PRJ-2303 Modified comment
JIRA-1034 changed url to tag the prj projects
JIRA-1000 for release 1.1
JIRA-1000 Content modification
Expected output
PRJ-2303
JIRA-1034
JIRA-1000
Something like this should work:
$Jira_Num = Get-Content /tmp/jira.txt | ForEach-Object {
if ($_ -match '^([A-Z]+-[0-9]+)') {
$Matches[1]
}
} | Select-Object -Unique
Get-Content reads a file line by line, so we can pipe it to other cmdlets to process each line.
ForEach-Object runs a command block for each item in the pipeline. So here we're using the -match operator to perform a regex match against the line, with a capturing group. If the match succeeds, we send the matched group (the JIRA issue key) down the pipeline.
Select-Object -Unique will compare the objects and return only the unique ones.
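A minimal sketch of the same pipeline, run against the sample lines from the question held in an array instead of read from /tmp/jira.txt (the in-memory $lines variable is only there for the demo):

```powershell
# Sample lines from the question, held in memory instead of read from /tmp/jira.txt
$lines = @(
    'PRJ-2303 Modified the artifactName'
    'PRJ-2303 Modified comment'
    'JIRA-1034 changed url to tag the prj projects'
    'JIRA-1000 for release 1.1'
    'JIRA-1000 Content modification'
)

# For each line that starts with a key, emit capture group 1, then drop repeats
$keys = $lines | ForEach-Object {
    if ($_ -match '^([A-Z]+-[0-9]+)') {
        $Matches[1]
    }
} | Select-Object -Unique

$keys   # PRJ-2303, JIRA-1034, JIRA-1000 (first-seen order is kept)
```

Keep in mind that -match is case-insensitive by default, so [A-Z] will also accept lowercase keys; -cmatch is the case-sensitive variant if that matters.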
Select-String can still work! The problem comes from a misconception about the returned object. It returns a [Microsoft.PowerShell.Commands.MatchInfo], and it would appear its ToString() equivalent is the whole matching line. I don't know what version of PowerShell you have, but this should do the trick:
$Jira_Num = Get-Content /tmp/jira.txt |
Select-String -Pattern '^[A-Z]+-[0-9]+' |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value -Unique
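To see the difference between the whole line and the matched text, here is a small sketch that inspects a single MatchInfo object produced from one of the sample lines:

```powershell
# Select-String emits MatchInfo objects, not plain strings
$info = 'PRJ-2303 Modified comment' | Select-String -Pattern '^[A-Z]+-[0-9]+'

$info.GetType().FullName   # Microsoft.PowerShell.Commands.MatchInfo
$info.Line                 # PRJ-2303 Modified comment  (the whole line)
$info.Matches[0].Value     # PRJ-2303                   (only the matched text)
```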
Also, you can get odd results when you write to an output stream and a variable at the same time: in your original command, the > redirection sends the output to the file, so $Jira_Num is left empty. It is generally better to use Tee-Object in cases like that:
Select-String /tmp/jira.txt -Pattern '^[A-Z]+-[0-9]+' |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value -Unique |
Tee-Object -Variable Jira_Num |
Set-Content "$outputDir\numbers.txt"
Now both the file $outputDir\numbers.txt and the variable $Jira_Num contain the unique list. Leaving the $ off the name passed to Tee-Object -Variable is deliberate: the parameter takes the variable's name, not its value.
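A self-contained sketch of the Tee-Object pattern. To keep it runnable anywhere, it reads a few sample lines from memory and writes to a file in the temp folder rather than to $outputDir (both are assumptions for the demo), and it drops duplicates in a separate Select-Object -Unique step:

```powershell
# In-memory input instead of /tmp/jira.txt; output goes to the temp folder
$lines   = 'PRJ-2303 Modified comment', 'JIRA-1000 for release 1.1', 'JIRA-1000 Content modification'
$outFile = Join-Path ([IO.Path]::GetTempPath()) 'numbers.txt'

# Keep only the matched key, drop repeats, then send the result to
# both a variable (named without the $) and the file
$lines |
    Select-String -Pattern '^[A-Z]+-[0-9]+' |
    ForEach-Object { $_.Matches[0].Value } |
    Select-Object -Unique |
    Tee-Object -Variable Jira_Num |
    Set-Content $outFile

$Jira_Num               # PRJ-2303, JIRA-1000
Get-Content $outFile    # the same two keys, one per line
```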
Related
I have a file, input.txt, containing text like this:
GRP123456789
123456789012
GRP234567890
234567890123
GRP456789012
"A lot of text. More text. Blah blah blah: Foobar." (Source Error) (Blah blah blah)
GRP567890123
Source Error
GRP678901234
Source Error
GRP789012345
345678901234
456789012345
I'm attempting to capture all occurrences of "GRP#########" on the condition that at least one number is on the next line.
So GRP123456789 is valid, but GRP456789012 and GRP678901234 are not.
The RegEx pattern I came up with on http://regexstorm.net/tester is: (GRP[0-9]{9})\s\n\s+[0-9]
The PowerShell script I have so far, based off this site http://techtalk.gfi.com/windows-powershell-extracting-strings-using-regular-expressions/, is:
$input_path = 'C:\Users\rtaite\Desktop\input.txt'
$output_file = 'C:\Users\rtaite\Desktop\output.txt'
$regex = '(GRP[0-9]{9})\s\n\s+[0-9]'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Values } > $output_file
I'm not getting any output, and I'm not sure why.
Any help with this would be appreciated as I'm just trying to understand this better.
You need to turn the text input into a single string before passing it to Select-String, otherwise the cmdlet will operate on each line individually and thus never find a match.
Get-Content $input_path | Out-String |
Select-String $regex -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Groups[1].Value } |
Set-Content $output_file
If you're using PowerShell v3 or newer you can replace Get-Content | Out-String with Get-Content -Raw.
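A minimal sketch of why joining matters, using some of the sample lines in memory. One assumption on my part: the pattern is loosened to (GRP[0-9]{9})\s*\r?\n\s*[0-9], because the question's \s\n\s+ needs at least one whitespace character after the line break, which the sample as shown here doesn't have.

```powershell
$lines = 'GRP123456789', '123456789012', 'GRP456789012', 'Source Error', 'GRP789012345', '345678901234'

# Assumed pattern: GRP + 9 digits, a line break (CRLF or LF), then a digit on the next line
$regex = '(GRP[0-9]{9})\s*\r?\n\s*[0-9]'

# Line by line: no single line contains a line break, so nothing can match
$perLine = $lines | Select-String $regex

# As one string: the pattern can now see across line boundaries
$found = ($lines -join "`r`n") |
    Select-String $regex -AllMatches |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { $_.Groups[1].Value }

$found   # GRP123456789, GRP789012345
```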
To extract strings from a text file using a pattern, the best tool for the job is Select-String. It also has a -Context parameter that captures lines before or after the matched line, which is ideal for exactly this problem.
So my solution would be something like this:
Select-String 'input.txt' -Pattern '^GRP[0-9]{9}' -Context 0, 1 | ? {
$_.Context.PostContext -match '\d'
} | Select -ExpandProperty line | Set-Content 'output_file.txt'
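The -Context approach can also be tried without a file by piping the sample lines straight into Select-String; PostContext holds the line that follows each match. The in-memory $lines array is only for the demo:

```powershell
$lines = 'GRP123456789', '123456789012', 'GRP456789012', 'Source Error', 'GRP789012345', '345678901234'

# -Context 0, 1 keeps zero lines before and one line after each match
$valid = $lines |
    Select-String -Pattern '^GRP[0-9]{9}' -Context 0, 1 |
    Where-Object { $_.Context.PostContext -match '\d' } |
    Select-Object -ExpandProperty Line

$valid   # GRP123456789, GRP789012345
```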
Using [regex]::Matches() with a lookahead:
[regex]::Matches($(Get-Content '.\Desktop\new 1.txt'), "GRP\d+(?=\s+\d)") |
% { $_.value | Out-File .\Desktop\new-1-matches.txt -Append }
I achieved the following output from your sample file:
GRP123456789
GRP234567890
GRP789012345
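Why this works: passing the Get-Content array to a string parameter joins the lines with spaces, and the lookahead (?=\s+\d) only checks that whitespace and a digit follow the GRP number, without consuming them. A sketch with the lines in memory, where the explicit "$lines" join stands in for that implicit conversion:

```powershell
$lines = 'GRP123456789', '123456789012', 'GRP456789012', 'Source Error', 'GRP789012345', '345678901234'

# "$lines" joins the array with spaces, like the implicit string conversion
$text = "$lines"

# The lookahead requires whitespace + a digit after the number but does not consume them
$found = [regex]::Matches($text, 'GRP\d+(?=\s+\d)') | ForEach-Object { $_.Value }

$found   # GRP123456789, GRP789012345
```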
I am using a PowerShell command to find all *.vue files (it's a simple text format) in a directory, where I need to match this:
7,Id
6,Default
So these are two consecutive lines. In Notepad++ I see CRLF at the end of each line. Based on Google searches, this should be close:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
Out-File C:\test.txt
But it does not find the files. I checked that I can find the first part (7,Id) correctly, and also the second part (6,Default), but the combination with the newline is not working.
Any ideas please? Maybe an alternative?
I have a workaround, but it's inefficient and requires a lot of code. For example, I could use PowerShell to list only the files matching the first line, then process those files to see whether they match the second line as well. I want to avoid that.
You need to pass the content of the file as a single string, otherwise Select-String will apply the pattern to each line separately.
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
Get-Content $_.FullName | Out-String |
Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
Select-Object -Expand Matches |
Select-Object -Expand Groups |
Select-Object -Expand Value
} | Out-File C:\test.txt
On PowerShell v3 and newer you can use Get-Content -Raw instead of Get-Content | Out-String.
As an alternative to Select-String you could use the -cmatch operator in a Where-Object filter:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
Get-Content $_.FullName | Out-String | Where-Object {
$_ -cmatch "7,Id\r\n6,Default"
} | ForEach-Object {
$matches[0]
}
} | Out-File C:\test.txt
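The core issue can be reproduced with a string instead of .vue files: the same \r\n pattern fails against individual lines but succeeds against the whole content. The sample content below is made up, and the \r?\n variant is an optional tweak (an assumption on my part) for files that may use LF-only line endings:

```powershell
# Made-up file content with CRLF line endings
$content = "1,Name`r`n7,Id`r`n6,Default`r`n"
$lines   = $content -split "`r`n"

# Line by line: no element contains CR/LF, so the two-line pattern never matches
$perLine = @($lines | Where-Object { $_ -cmatch '7,Id\r\n6,Default' })

# Whole content: the pattern can span the line break
$whole = $content -cmatch '7,Id\r\n6,Default'

# Optional: also accept LF-only line endings
$lfOnly = "7,Id`n6,Default" -cmatch '7,Id\r?\n6,Default'
```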
With Select-String, the -Pattern parameter is regex capable, so try this:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id|6,Default" -CaseSensitive |
Out-File C:\test.txt
The vertical bar (|) acts as an alternation separator, in other words an "or" operator. With this pattern, Select-String matches any line that contains either string.
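A quick sketch that makes the "or" behavior visible: with alternation, each line is tested on its own, so both strings are reported even when they are not on consecutive lines. The sample lines are made up:

```powershell
# 7,Id and 6,Default are deliberately NOT on consecutive lines here
$lines = '7,Id', '5,Other', '6,Default'

$hits = $lines |
    Select-String -Pattern '7,Id|6,Default' -CaseSensitive |
    Select-Object -ExpandProperty Line

$hits   # 7,Id, 6,Default
```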
I'm trying to extract the titles of events in one of my logs, which is just a text file with lots of data. The filename is eventlog-1-5-2016.txt (date is always the current date). Each line in the file is one event like this:
1-1-16(Commodore Rally)Address|Time
1-2-16(Open House)Address|Time
I just want to go through the file and extract the title in parentheses, excluding the parentheses themselves, and output the list to the console, or a text file.
I've also tried outputting to a .txt file, but I'm missing something. Can you tell me why this doesn't work?
Console:
Select-String -Path c:\log\eventlog-1-5-2016.txt -Pattern '\(([^\)]+)\)' -AllMatches |
% { $_.Matches }
To File:
Select-string -Path c:\log\eventlog-1-5-2016.txt -Pattern '\(([^\)]+)\)' -AllMatches |
% { $_.Matches } | { $_.Value > C:\log\results.txt
Or is there a better way to do this, if this is wrong?
Bonus question: could the path automatically calculate the current date and build the file name, for easy future pasting? (Not major!)
The current date can be determined like this:
(Get-Date).ToString('d-M-yyyy')
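The format string decides whether the day or the month comes first, and the example file name eventlog-1-5-2016.txt doesn't make that unambiguous. A sketch comparing both, using a fixed date (5 January 2016) instead of today's date so the result is predictable; the month-first $path at the end is a hypothetical choice:

```powershell
# Fixed date for a repeatable result; use (Get-Date) for today's date
$date = Get-Date -Year 2016 -Month 1 -Day 5

$dayFirst   = $date.ToString('d-M-yyyy')   # 5-1-2016
$monthFirst = $date.ToString('M-d-yyyy')   # 1-5-2016

# Hypothetical: if the log file names put the month first
$path = "C:\log\eventlog-$monthFirst.txt"
```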
Also, your regular expression could be simplified a little by using a non-greedy match:
\((.+?)\)
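To see why the ? matters, here is a sketch on a made-up line with two pairs of parentheses: the greedy .+ runs to the last closing parenthesis, while the lazy .+? stops at the first one.

```powershell
# Made-up line with two parenthesized parts
$line = '1-1-16(Commodore Rally)Address (rear)|Time'

$greedy = [regex]::Match($line, '\((.+)\)').Groups[1].Value
$lazy   = [regex]::Match($line, '\((.+?)\)').Groups[1].Value

$greedy   # Commodore Rally)Address (rear
$lazy     # Commodore Rally
```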
If you want just the text between the parentheses you need the value of the captured group instead of the complete match:
$date = (Get-Date).ToString('d-M-yyyy')
Select-String -Path "C:\log\eventlog-$date.txt" -Pattern '\((.+?)\)' -AllMatches |
ForEach-Object { $_.Matches } |
ForEach-Object { $_.Groups[1] } |
ForEach-Object { $_.Value } |
Out-File 'C:\log\results.txt'
If you have PowerShell v3 or newer you could collapse the ForEach-Object statements:
Select-String -Path "C:\log\eventlog-$date.txt" -Pattern '\((.+?)\)' -AllMatches |
ForEach-Object { $_.Matches.Groups[1].Value } |
Out-File 'C:\log\results.txt'
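One caveat with the collapsed form, as I understand PowerShell's member enumeration: $_.Matches.Groups flattens the groups of all matches into a single list, so [1] only returns the first title on a line. That is fine for the one-event-per-line log in the question; if a line could hold several titles, looping over the matches is safer. A sketch with a made-up line:

```powershell
# Made-up line containing two parenthesized titles
$line = '1-1-16(Commodore Rally)Address|Time (Open House)'

$titles = $line |
    Select-String -Pattern '\((.+?)\)' -AllMatches |
    ForEach-Object {
        # Take capture group 1 from every match, not just the first
        foreach ($m in $_.Matches) { $m.Groups[1].Value }
    }

$titles   # Commodore Rally, Open House
```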
Or you could use the -match operator:
Get-Content "C:\log\eventlog-$date.txt" |
Where-Object { $_ -match '\((.+?)\)' } |
ForEach-Object { $matches[1] } |
Set-Content 'C:\log\results.txt'
What happens is that Select-String doesn't quite do what you think it does. It matches patterns, but instead of returning the matched part, it returns the whole matching line (wrapped in a MatchInfo object). Thus your statement returns the whole matched row instead of just the substring in parentheses. This is a common cause of confusion.
Here is a simple example:
[regex]$rx = '\(([^\)]+)\)'
cat C:\Temp\logfile.txt | % { $rx.Matches( $_ ).value }
(Commodore Rally)
(Open House)
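That output still includes the parentheses because .Value is the whole match. Since the question asks for the titles without them, here is a small variation (a sketch, with the two sample lines in memory instead of a log file) that takes capture group 1 instead:

```powershell
[regex]$rx = '\(([^\)]+)\)'
$lines = '1-1-16(Commodore Rally)Address|Time', '1-2-16(Open House)Address|Time'

# Groups[1] is the text inside the parentheses; Groups[0] would be the full match
$titles = $lines | ForEach-Object {
    foreach ($m in $rx.Matches($_)) { $m.Groups[1].Value }
}

$titles   # Commodore Rally, Open House
```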
I am learning regex and am trying to get a better understanding by using a text file with the value $100,000 in it. What I am trying to do is search the text file for the string "$100,000" and, if it is there, export the value into a new CSV. This is what I'm using so far:
[io.file]::readalltext("c:\utilities\notes_$datetime.txt") -match("[$][0-9][0-9][0-9],[0-9][0-9][0-9]") | Out-File C:\utilities\amount.txt -Encoding ascii -Force
Which returns true. Can someone point me in the right direction as to grabbing the string value that it finds into a new CSV?
Many thanks!
You're reading the file into a single string, not an array of lines, so the -match operator only tells you whether the pattern occurs ($true or $false). Use Select-String -AllMatches instead of -match:
[IO.File]::ReadAllText("c:\utilities\notes_$datetime.txt") |
Select-String '\$\d{3},\d{3}' -AllMatches |
% { $_.Matches.Groups.Value } |
Out-File C:\utilities\amount.txt -Encoding ascii -Force
As a side note, using Get-Content -Raw would be slightly more PoSh (idiomatic PowerShell) than using .NET methods, although the .NET methods provide better performance:
Get-Content "c:\utilities\notes_$datetime.txt" -Raw |
Select-String '\$\d{3},\d{3}' -AllMatches |
% { $_.Matches.Groups.Value } |
Out-File C:\utilities\amount.txt -Encoding ascii -Force
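A sketch of the difference on a single string, plus one way to get the CSV the question asks for. The sample text, the Amount column name and the temp file path are all made up for the demo:

```powershell
# Made-up note text containing two amounts on separate lines
$text = ('Budget: $100,000', 'Revised: $250,000') -join "`n"

# On a single string, -match only answers yes/no; $Matches holds the first hit
$isMatch = $text -match '\$\d{3},\d{3}'
$first   = $Matches[0]

# Select-String -AllMatches returns every hit
$all = $text |
    Select-String '\$\d{3},\d{3}' -AllMatches |
    ForEach-Object { $_.Matches.Value }

# Wrap each value in an object so Export-Csv writes an "Amount" column
$csv = Join-Path ([IO.Path]::GetTempPath()) 'amount.csv'
$all | ForEach-Object { [pscustomobject]@{ Amount = $_ } } |
    Export-Csv -Path $csv -NoTypeInformation
```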
I prefer to use [regex]::match for that:
$x = 'text bla $100,000 text text'
[regex]::Match($x,"\$[\d]{3},[\d]{3}").Groups[0].Value
I also changed the expression a little bit ($ followed by 3 numbers, followed by a "," and another 3 numbers).
So your script could look like this:
$fileContent = Get-Content "c:\utilities\notes_$datetime.txt"
[regex]::Match($fileContent,"\$[\d]{3},[\d]{3}").Groups[0].Value | Out-File C:\utilities\amount.txt -Encoding ascii -Force
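Note that [regex]::Match returns only the first occurrence. If the file might contain several amounts, [regex]::Matches returns all of them; a sketch with a made-up string:

```powershell
$x = 'text bla $100,000 text $250,000'
$pattern = '\$\d{3},\d{3}'

# First occurrence only
$one = [regex]::Match($x, $pattern).Value

# Every occurrence
$every = [regex]::Matches($x, $pattern) | ForEach-Object { $_.Value }

$one     # $100,000
$every   # $100,000, $250,000
```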
Why not use the Select-String cmdlet? It's far easier:
Select-String .\infile.csv -pattern '\$[\d]{3},[\d]{3}' | Select Line | Out-File outfile.txt
You can then process multiple files like so:
Get-Childitem *.csv | Select-String -pattern '\$[\d]{3},[\d]{3}' | Select Line | Out-File outfile.txt
Each MatchInfo object returned by Select-String has (among others) the following properties:
Line - the line where the regex found a match
LineNumber - the line number in the file where the match was found
Filename - the name of the file the match was found in
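A sketch showing those properties on a real file; the temp file name and its contents are made up for the demo:

```powershell
# Write a small, made-up two-line file to the temp folder
$file = Join-Path ([IO.Path]::GetTempPath()) 'amounts-demo.txt'
Set-Content -Path $file -Value 'first line', 'cost $100,000 here'

$hit = Select-String -Path $file -Pattern '\$\d{3},\d{3}'

$hit.Line              # cost $100,000 here
$hit.LineNumber        # 2
$hit.Filename          # amounts-demo.txt
$hit.Matches[0].Value  # $100,000  (just the matched text)
```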
I want to search through the files in a folder, find the following strings in each file, and output the results to a file. I would like to find a combination of two strings in the files no matter how they are written in the file. I should be able to find this combination even if a carriage return separates the two strings.
Here's the code I have so far:
$Path = "C:\Promotion\Scripts"
$txt_string1 = "CREATE"
$txt_string2 = "PROC"
$PathArray = @()
$Results = "C:\Promotion\Errors\Deployment_Errors.txt"
# This code snippet gets all the files in $Path that end in ".sql".
Get-ChildItem $Path -Filter "*.sql" |
Where-Object { $_.Attributes -ne "Directory"} |
ForEach-Object {
If (Get-Content $_.FullName | Select-String -Pattern $txt_string2) {
$PathArray += $_.FullName
}
}
$PathArray | ForEach-Object {$_} | Out-File $Results
To find more than one string in a text file, you can use a pattern like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)'
The result:
hello guy
After you find the strings, pipe them to Out-File, like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)' | Out-File -FilePath c:\test.txt
Now we can check test.txt:
PS C:\> Get-Content test.txt
hello guy
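You can do this without loops. Define the combinations of your two search terms as alternatives in a regular expression with the inline options (?ms) enabled. The s (single-line) option is the one that matters here: it lets . match newline characters, so the terms are found even when a line break separates them.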
$basepath = 'C:\Promotion\Scripts'
$results = 'C:\Promotion\Errors\Deployment_Errors.txt'
$term1 = 'CREATE'
$term2 = 'PROC'
$pattern = "(?ms)($term1.*$term2|$term2.*$term1)"
Get-ChildItem "$basepath\*.sql" |
? { Get-Content $_.FullName -Raw | Select-String -Pattern $pattern } |
select -Unique -Expand FullName |
Out-File $results
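The s in (?ms) is what lets . cross line breaks. A quick sketch with a made-up SQL string where the two terms sit on separate lines:

```powershell
# Made-up script text with CREATE and PROC on separate lines
$sql   = "CREATE`r`nPROC dbo.Demo AS SELECT 1"
$term1 = 'CREATE'
$term2 = 'PROC'

# Without (?s), . does not match \n, so the terms cannot be joined across lines
$withoutS = $sql -match "($term1.*$term2|$term2.*$term1)"

# With (?ms), . also matches \n and the text is reported
$withS = $sql -match "(?ms)($term1.*$term2|$term2.*$term1)"
```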
Note that this will report any file that contains both terms anywhere in it, no matter what other text is between them. If you want to find only files where the two terms are either not separated at all (CREATEPROC or PROCCREATE) or separated by nothing but whitespace, change the pattern to this:
$pattern = "(?ms)($term1\s*$term2|$term2\s*$term1)"
Depending on your search terms, it may also be a good idea to escape them before building the regular expression, so that you don't get unwanted metacharacters (not likely with the two string literals you have, but just to be on the safe side):
$term1 = [regex]::Escape('CREATE')
$term2 = [regex]::Escape('PROC')
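Escaping matters as soon as a term contains a metacharacter. A sketch with a hypothetical term containing a dot:

```powershell
# Hypothetical search term with a regex metacharacter (.)
$term = 'dbo.Proc'

$escapedTerm = [regex]::Escape($term)         # dbo\.Proc

# Unescaped, . matches any character, so 'dboXProc' is a false positive
$unescaped = 'dboXProc' -match $term          # True
# Escaped, only a literal dot matches
$escaped   = 'dboXProc' -match $escapedTerm   # False
```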