Extracting match from text file if subsequent lines contain specific strings - regex

I'm trying to pull certain lines of data from multiple text files using a certain match of data. I have that part working (it matches on the strings that I have and pulls back the entire line). That's what I want, but I also need a certain line of data that occurs before the match (only when it matches). I also have that working, but it's not 100% right.
I have tried to accomplish pulling the line above my match by using the -Context parameter. It seems to work, but in some cases it is merging data together from multiple matches and not pulling the line above my matches. Below is a sample of one of the files that I'm searching in:
TRN*2*0000012016120500397~
STC*A3:0x9210019*20170103*U*18535********String of data here
STC*A3:0x810049*20170103*U*0********String of Data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x810048*20170103*U*0********String of data here
STC*A3:0x3938edc*20170103*U*0********String of data here
STC*A3:0x3938edd*20170103*U*0********String of data here
STC*A3:0x9210019*20170103*U*0********String of data here
TRN*2*0000012016120500874~
STC*A3:0x9210019*20170103*U*18535********String of data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x3938edc*20170103*U*0********String of data here
STC*A3:0x3938edd*20170103*U*0********String of data here
STC*A3:0x9210019*20170103*U*0********String of data here
TRN*2*0000012016120500128~
STC*A3:0x810049*20170103*U*0********String of Data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x810024*20170103*U*0********String of data here
STC*A3:0x9210019*20170103*U*0********String of data here
TRN*2*0000012016120500345~
STC*A3:0x9210019*20170103*U*18535********String of data here
STC*A3:0x810049*20170103*U*0********String of Data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x3938edc*20170103*U*0********String of data here
TRN*2*0000012016120500500~
STC*A3:0x810048*20170103*U*18535********String of data here
TRN*2*0000012016120500345~
STC*A3:0x810049*20170103*U*18535********String of data here
I'm trying to pull the TRN*2 line only when the lines below each TRN*2 have STC*A3:0x810024 and STC*A3:0x810048 in them, but again getting inconsistent results.
Is there a way that I could search for the TRN*2 line and pull the TRN*2 and the lines below it that contain STC*A3:0x810024 and STC*A3:0x810048? If the lines below the TRN*2 line do not contain STC*A3:0x810024 and STC*A3:0x810048, then don't pull anything.
Here is my code so far:
$FilePath = "C:\Data\2017"
$files = Get-ChildItem -Path $FilePath -Recurse -Include *.277CA_UNWRAPPED
foreach ($file in $files) {
    (Get-Content $file) |
        Select-String -Pattern "STC*A3:0x810024","STC*A3:0x810048" -SimpleMatch -Context 1,0 |
        Out-File -Append -Width 512 $FilePath\Output\test_results.txt
}

Your approach won't work, because you're selecting lines that contain STC*A3:0x810024 or STC*A3:0x810048 and the line before them. However, the preceding lines don't necessarily start with TRN. Even if they did, the statement would still produce TRN lines that are followed by any of the STC strings, not just lines that are followed by both STC strings.
What you actually want is to split each file before the lines starting with TRN, and then check whether each fragment contains both STC strings.
(Get-Content $file | Out-String) -split '(?m)^(?=TRN\*2)' | Where-Object {
    $_.Contains('STC*A3:0x810024') -and
    $_.Contains('STC*A3:0x810048')
} | ForEach-Object {
    ($_ -split '\r?\n')[0]   # get just the 1st line from each fragment
} | Out-File -Append "$FilePath\Output\test_results.txt"
(?m)^(?=TRN\*2) is a regular expression matching the beginning of a line followed by the string "TRN*2". The (?=...) is a so-called positive lookahead assertion. It ensures that the "TRN*2" is not removed when splitting the string. (?m) is a modifier that makes ^ match the beginning of a line inside a multiline string rather than just the beginning of the string.
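If you also want the matching STC lines underneath each TRN line (as the question asks), the same split can be extended. The following is only a sketch on made-up sample data: the `$sample` contents, the `$result` variable, and the `^STC\*A3:0x8100(24|48)\*` filter are my assumptions, not part of the original code.

```powershell
# Hypothetical sample standing in for the contents of one file.
$sample = @(
    'TRN*2*0001~'
    'STC*A3:0x810024*20170103*U*0'
    'STC*A3:0x810048*20170103*U*0'
    'TRN*2*0002~'
    'STC*A3:0x810024*20170103*U*0'
    'TRN*2*0003~'
    'STC*A3:0x810048*20170103*U*0'
    'STC*A3:0x9210019*20170103*U*0'
    'STC*A3:0x810024*20170103*U*0'
) -join "`n"

$result = $sample -split '(?m)^(?=TRN\*2)' |
    Where-Object {
        # Keep only fragments that contain BOTH strings.
        $_.Contains('STC*A3:0x810024') -and $_.Contains('STC*A3:0x810048')
    } |
    ForEach-Object {
        # Break the fragment into its non-empty lines.
        $lines = @($_ -split '\r?\n' | Where-Object { $_ })
        $lines[0]                                   # the TRN*2 line
        $lines | Select-Object -Skip 1 |
            Where-Object { $_ -match '^STC\*A3:0x8100(24|48)\*' }   # only the STC lines of interest
    }

$result
```

With this sample, the second fragment (TRN*2*0002~) is dropped because it contains only one of the two strings, and the unrelated 0x9210019 line is left out of the third fragment.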

Related

How to replace lines depending on the remaining text in file using PowerShell

I need to edit txt file using PowerShell. The problem is that I need to apply changes for the string only if the remaining part of the string matches some pattern. For example, I need to change 'specific_text' to 'other_text' only if the line ends with 'pattern':
'specific_text and pattern' -> changes to 'other_text and pattern'
But if the line doesn't end with pattern, I don't need to change it:
'specific_text and something else' -> no changes
I know about Replace function in PowerShell, but as far as I know it makes simple change for all matches of the regex. There is also Select-String function, but I couldn't combine them properly. My idea was to make it this way:
((get-content myfile.txt | select-string -pattern "pattern") -Replace "specific_text", "other_text") | Out-File myfile.txt
But this call rewrites the whole file and leaves only changed lines.
You may use
(get-content myfile.txt) -replace 'specific_text(?=.*pattern$)', "other_text" | Out-File myfile.txt
The specific_text(?=.*pattern$) pattern matches
specific_text - some specific_text...
(?=.*pattern$) - immediately followed by any 0 or more chars other than a newline, as many as possible, and then pattern at the end of the string ($).
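A quick way to convince yourself is to run the replacement against the two example lines from the question, held in an in-memory array (the `$lines` and `$replaced` names are just for illustration):

```powershell
# The two example lines from the question.
$lines = @(
    'specific_text and pattern'
    'specific_text and something else'
)

# -replace is applied to each element; lines without a match pass through unchanged.
$replaced = $lines -replace 'specific_text(?=.*pattern$)', 'other_text'
$replaced
# other_text and pattern
# specific_text and something else
```

Because non-matching lines are returned as-is, the whole file is preserved when the result is written back.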

Match Multi line Events Using Only Starting Value

I'm attempting to match events where the only way to tell when an event starts and ends is with the header or first value in the multi line event (e.g. START--). Basically, using the header as an ending anchor to get the whole event. Also, the last event will end at the end of the file, so there's no anchor for that one. I'm not quite sure how to make this work.
Event Example (There's no spaces between the lines)
START--random stuff here
more random stuff on this new line
more stuff and things
START--some random things
additional random things
blah blah
START--data data more data
START--things
blah data
$FileContent | select-string '^START--(.*?)^START--' -AllMatches | Foreach {$_.Matches} | Foreach {$_.Value}
You may read in the file into a single variable (it can be done by passing -Raw option to Get-Content, for example) and split it at the start of lines starting with START-- but the first line:
$contents = Get-Content 'your_file_path' -Raw
$contents -split '(?m)^(?!\A)(?=START--)'
It will yield one string per event, each beginning with START--.
Regex details
(?m) - the multiline option is ON
^ - now, it matches start of lines due to (?m)
(?!\A) - not the start of the whole string/text
(?=START--) - the location that is immediately followed with START-- substring.
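Here is a small sketch of that split on inline sample text, so it can be tried without a file. The sample lines and the `$events` variable are assumptions for illustration, and the trailing `TrimEnd()` is my addition to drop the newline left at the end of each event.

```powershell
# Hypothetical sample: three events, the last one ending at the end of the text.
$contents = @(
    'START--random stuff here'
    'more random stuff on this new line'
    'START--some random things'
    'blah blah'
    'START--data data more data'
) -join "`n"

# Split before every START-- except the one at the very beginning of the text.
$events = @($contents -split '(?m)^(?!\A)(?=START--)' | ForEach-Object { $_.TrimEnd() })

$events.Count   # 3
$events[1]      # START--some random things
                # blah blah
```

Without the (?!\A) part, the split would also fire at position 0 and produce an empty first element.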

Split specific string from lines via regex

I have been trying to extract a certain word from each line (the hostname field, index 6 in my split below) for lines that contain a value equal to 40, from multiple lines inside a .txt file with PowerShell.
I have code so far :
$file = Get-Content 'c:\temp\file.txt'
$Array = @()
foreach ($line in $file)
{
$Array += $line.split(",")[6]
}
$Array
$Array | sc "c:\temp\export2.txt"
Txt file : (may be duplicate lines such as hostname01)
4626898,0,3,0,POL,INCR,hostname01,xx,1549429809,0000000507,1549430316,xxx,0,40,1,xxxx,51870834,5040,100
4626898,0,3,0,POL,INCR,hostname02,xx,1549429809,0000000507,1549430316,xxx,0,15,1,xxxx,51870834,5040,100
4626898,0,3,0,POL,INCR,hostname03 developer host,xx,1549429809,0000000507,1549430316,xxx,0,40,1,xxxx,51870834,5040,100
4626898,0,3,0,POL,INCR,hostname01,xx,1549429809,0000000507,1549430316,xxx,0,40,1,xxxx,51870834,5040,100
This is what I want :
hostname01
hostname02
hostname03 developer host
This is not a fast solution, but a convenient and flexible one:
Since your text file is effectively a CSV file, you can use Import-Csv.
Your data is missing a header row (column names), which we can supply to Import-Csv via its -Header parameter.
Since you're interested in columns number 7 (hostnames) and 14 (the number whose value should be 40), we need to supply column names (of our choice) for columns 1 through 14.
Import-Csv conveniently converts the CSV rows to (custom) objects, whose properties you can query with Where-Object and selectively extract with Select-Object; adding -Unique suppresses duplicate values.
To put it all together:
Import-Csv c:\temp\file.txt -Header (1..14) |
Where-Object 14 -eq 40 |
Select-Object -ExpandProperty 7 -Unique
For convenience we've named the columns 1, 2, ... using a range expression (1..14), but you're free to use descriptive names.
Assuming that c:\temp\file.txt contains your sample data, the above yields:
hostname01
hostname03 developer host
To output to a file, pipe the above to Set-Content, as in your question:
... | Set-Content c:\temp\export2.txt
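As a variation, here is a sketch that uses descriptive column names and ConvertFrom-Csv on in-memory lines, so it runs without the file. The names `HostName` and `Status` are hypothetical labels of my choosing; the data does not say what column 14 actually means.

```powershell
# The sample rows from the question, held in memory instead of c:\temp\file.txt.
$rows = @(
    '4626898,0,3,0,POL,INCR,hostname01,xx,1549429809,0000000507,1549430316,xxx,0,40,1,xxxx,51870834,5040,100'
    '4626898,0,3,0,POL,INCR,hostname02,xx,1549429809,0000000507,1549430316,xxx,0,15,1,xxxx,51870834,5040,100'
    '4626898,0,3,0,POL,INCR,hostname03 developer host,xx,1549429809,0000000507,1549430316,xxx,0,40,1,xxxx,51870834,5040,100'
    '4626898,0,3,0,POL,INCR,hostname01,xx,1549429809,0000000507,1549430316,xxx,0,40,1,xxxx,51870834,5040,100'
)

# Generic names for columns 1..14, then rename the two we care about (hypothetical labels).
$header = 1..14 | ForEach-Object { "Col$_" }
$header[6]  = 'HostName'   # column 7
$header[13] = 'Status'     # column 14

$hosts = $rows | ConvertFrom-Csv -Header $header |
    Where-Object Status -eq '40' |
    Select-Object -ExpandProperty HostName -Unique

$hosts
# hostname01
# hostname03 developer host
```

Columns beyond the 14 named ones are simply discarded, which is why the header does not need to cover the whole row.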
If the desired field is always the 7th in the line (index 6 when counting from zero), it is easier to split each line and fetch that member:
Get-Content 'c:\temp\file.txt' | Foreach-Object {($_ -split ',')[6]} | Select-Object -Unique
You could use a non-capturing group to match the leading fields and reference the value of the 7th field through the 1st capture group, $1:
(?:\d+,\d,\d,\d,[A-Z]+,[A-Z]+,)([a-zA-Z 0-9]+)
(?: ) - Specifies a non-capturing group (meaning it's not referenced via $1 or $2, as it would be with a capture group).
\d+, (I won't repeat all of these, but) looks for one or more digits followed by a literal ,.
[A-Z]+, - Finds an all capital letter string, followed by a literal , (this occurs twice).
([a-zA-Z 0-9]+) - The capture group you're looking for, $1, that will capture all characters a-z, A-Z, spaces, and digits up until a character not in this set (in this case, a comma). Giving you the text you're looking for.
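In PowerShell, one way to apply this pattern is the -match operator, which fills the automatic $Matches variable with the capture groups. This is a sketch only: the leading ^ anchor, the shortened sample rows, and the `$names` variable are my additions.

```powershell
# Shortened sample rows in the same layout as the question's file.
$rows = @(
    '4626898,0,3,0,POL,INCR,hostname01,xx,1549429809'
    '4626898,0,3,0,POL,INCR,hostname02,xx,1549429809'
    '4626898,0,3,0,POL,INCR,hostname03 developer host,xx,1549429809'
    '4626898,0,3,0,POL,INCR,hostname01,xx,1549429809'
)

$pattern = '^(?:\d+,\d,\d,\d,[A-Z]+,[A-Z]+,)([a-zA-Z 0-9]+)'

$names = $rows |
    ForEach-Object {
        # $Matches[1] holds the first capture group after a successful -match.
        if ($_ -match $pattern) { $Matches[1] }
    } |
    Select-Object -Unique

$names
# hostname01
# hostname02
# hostname03 developer host
```

Note that this version does not filter on the value 40; it only extracts the 7th field.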
Below should work with what you are trying to do
Get-Content 'c:\temp\file.txt' | %{
$_.Split(',')[6]
}| select -Unique

How to get a value from Select-String

I have several files in a folder, those are .xml files.
I want to get a value from those files.
A line in the file, could look like this:
<drives name="Virtual HD ATA Device" deviceid="\\.\PHYSICALDRIVE0" interface="IDE" totaldisksize="49,99">
What I'm trying to do is get the value 49,99 in this case.
I am able to get the line out of the file with:
$Strings = Select-String -Path "XML\*.xml" -Pattern totaldisksize
foreach ($String in $Strings) {
Write-Host "Line is" $String
}
But I don't understand how to get just the value inside the quotes. I've also played around with
$Strings.totaldisksize
But no dice.
Thanks in advance.
You can do this in one line as follows:
$(select-string totaldisksize .\XML\*.xml).line -replace '.*totaldisksize="(\d+,\d+)".*','$1'
The Select-String will give you a collection of objects that contains information about the match. The line property is the one you're interested in, so you can pull that directly.
Using the -replace operator, every time the .line property is a match of totaldisksize, you can run the regex on it. The $1 replacement will grab the group in the regex, the group being the part in parentheses (\d+,\d+) which will match one or more digits, followed by a comma, followed by one or more digits.
This will print to screen because by default powershell will print an object to the screen. Because you're only accessing the .line property, that's the only bit that's printed and also only after the replacement has been run.
If you wanted to explicitly use a Write-Host to see the results, or do anything else with them, you could store to a variable as follows:
$sizes = $(select-string totaldisksize .\XML\*.xml).line -replace '.*totaldisksize="(\d+,\d+)".*','$1'
$sizes | % { Write-Host $_ }
The above stores the results to an array, $sizes, and you iterate over it by piping it to the Foreach-Object or %. You can then access the array elements with $_ inside the block.
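An alternative that skips the -replace step is to read the capture group straight from the match objects that Select-String returns. The sketch below pipes in-memory strings instead of files so it is self-contained; the sample lines and the looser `[^"]+` group are my assumptions.

```powershell
# Hypothetical lines standing in for the contents of the .xml files.
$lines = @(
    '<drives name="Virtual HD ATA Device" interface="IDE" totaldisksize="49,99">'
    '<drives name="Other Device" interface="SCSI" totaldisksize="120,5">'
    '<cpu name="no size here">'
)

$sizes = $lines |
    Select-String -Pattern 'totaldisksize="([^"]+)"' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }   # group 1 = text between the quotes

$sizes
# 49,99
# 120,5
```

The same pipeline should work with -Path "XML\*.xml" in place of the piped strings.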
But.. but.. PowerShell knows XML.
$XMLfile = '<drives name="Virtual HD ATA Device" deviceid="\\.\PHYSICALDRIVE0" interface="IDE" totaldisksize="49,99"></drives>'
$XMLobject = [xml]$XMLfile
$XMLobject.drives.totaldisksize
Output
49,99
Or walk the tree and return the content of "drives":
$XMLfile = @"
<some>
<nested>
<tags>
<drives someOther="stuff" totaldisksize="49,99" freespace="22,33">
</drives>
</tags>
</nested>
</some>
"@
$drives = [xml]$XMLfile | Select-Xml -XPath "//drives" | select -ExpandProperty node
Output
PS> $drives
someOther totaldisksize freespace
--------- ------------- ---------
stuff 49,99 22,33
PS> $drives.freespace
22,33
XPath query of "//drives" = Find all nodes named "drives" anywhere in the XML tree.
Reference: Windows PowerShell Cookbook 3rd Edition (Lee Holmes). Page 930.
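If a document holds several drives elements, the same idea extends naturally. This sketch uses the .NET SelectNodes and GetAttribute methods on the parsed document; the `<machine>` wrapper and the second drive are invented sample data.

```powershell
# Hypothetical document with two <drives> elements.
$xml = [xml]'<machine><drives name="A" totaldisksize="49,99" /><drives name="B" totaldisksize="120,5" /></machine>'

# Collect the attribute value from every <drives> node, wherever it sits in the tree.
$sizes = $xml.SelectNodes('//drives') |
    ForEach-Object { $_.GetAttribute('totaldisksize') }

$sizes
# 49,99
# 120,5
```

For real files, something like [xml](Get-Content $path -Raw) can replace the inline string.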
I am not sure about powershell but if you prefer using python below is the way of doing it.
import re
data = open('file').read()
item = re.findall(r'.*totaldisksize="([\d,]+)">', data)
print(item[0])
Output
49,99

Text manipulation problem - how to replace text after a known value

I have a large text file containing filenames ending in .txt
Some of the rows of the file have unwanted text after the filename extension.
I am trying to find a way to search+replace or trim the whole file so that if a row is found with .txt, anything after this is simply removed. Example
C:\Test1.txt
C:\Test2.txtHelloWorld this is my
problem
C:\Test3.txt_____Annoying
stuff1234 .r
Desired result
C:\Test1.txt
C:\Test2.txt
C:\Test3.txt
I have tried with notepad++, or using batch/powershell, but got close, no cigar.
(Get-Content "D:\checkthese.txt") |
Foreach-Object {$_ -replace '.txt*', ".txt"} |
Set-Content "D:\CLEAN.txt"
My thinking here is that if I replace anything (wildcard *) after .txt, I would trim off what I don't need, but this doesn't work. I think I need to use a regular expression, but have the syntax wrong.
Simply change the * to a .*, like so:
(Get-Content "D:\checkthese.txt") |
Foreach-Object {$_ -replace '\.txt.*', ".txt"} |
Set-Content "D:\CLEAN.txt"
In regular expressions, * means "0 or more times", and in this case it'd act on the final t of .txt, so .txt* would only match .tx, .txt, .txtt, .txttt, etc...
., however, matches any character. This means, .* matches 0 or more of anything, which is what you want. Because of this, I also escaped the . in .txt, as it otherwise could break on filenames like: alovelytxtfile.txt, which would be trimmed to alovel.txt.
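The sample in the question also has continuation lines ("problem", "stuff1234 .r") that contain no .txt at all, and the replacement alone leaves those in place. Assuming such lines should simply be dropped, which is my reading of the desired result, a filter can be added in front. The `$lines` and `$clean` names and the extra last sample line are for illustration only.

```powershell
# The sample rows from the question, plus one name that contains "txt" without a dot.
$lines = @(
    'C:\Test1.txt'
    'C:\Test2.txtHelloWorld this is my'
    'problem'
    'C:\Test3.txt_____Annoying'
    'stuff1234 .r'
    'C:\alovelytxtfile.txt trailing'
)

$clean = $lines |
    Where-Object { $_ -match '\.txt' } |            # assumption: drop lines without a file name
    ForEach-Object { $_ -replace '\.txt.*', '.txt' }  # trim everything after the first literal .txt

$clean
# C:\Test1.txt
# C:\Test2.txt
# C:\Test3.txt
# C:\alovelytxtfile.txt
```

The last line shows why the dot is escaped: the "txt" inside the file name is not touched.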
For more information, see:
Regex Tutorial - .
Regex Tutorial - *