Extract data from .log file with Regex

I'm trying to extract data using Regex positive lookbehind. I have created a .ps1 file with the following content:
$input_path = ‘input.log’
$output_file = ‘Output.txt’
$regex = ‘(?<= "name": ")(.*)(?=",)|(?<= "fullname": ")(.*)(?=",)|(?<=Start identity token validation\r\n)(.*)(?=ids: Token validation success)|(?<= "ClientName": ")(.*)(?=",\r\n "ValidateLifetime": false,)’
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } >$output_file
The input file looks like this:
08:15.27.47-922: T= 11 ids: Start end session request
08:15.27.47-922: T= 11 ids: Start end session request validation
08:15.27.47-922: T= 11 ids: Start identity token validation
08:15.27.47-922: T= 11 ids: Token validation success
{
"ClientId": "te_triouser",
"ClientName": "TE Trio User",
"ValidateLifetime": false,
"Claims": {
"iss": "http://sv-trio17.adm.linkoping.se:34000/core/",
"aud": "te_triouser",
"exp": "1552054900",
"nbf": "1552054600",
"nonce": "f1ae9044-25f9-4e7f-b39f-bd7bdcb9dc8d",
"iat": "1552054600",
"at_hash": "Wv_7nNe42gUP945FO4p0Wg",
"sid": "9870230d92cb741a8674313dd11ae325",
"sub": "23223",
"auth_time": "1551960154",
"idp": "tecs",
"name": "tele2",
"canLaunchAdmin": "1",
"isLockedToCustomerGroup": "0",
"customerGroupId": "1",
"fullname": "Tele2 Servicekonto Test",
"tokenIdentifier": "2Ljta5ZEovccNlab9QXb8MPXOqaBfR6eyKst/Dc4bF4=",
"tokenSequence": "bMKEXP9urPigRDUguJjvug==",
"tokenChecksum": "NINN0DDZpx7zTlxHqCb/8fLTrsyB131mWoA+7IFjGhAV303///kKRGQDuAE6irEYiCCesje2a4z47qvhEX22og==",
"idpsrv_lang": "sv-SE",
"CD_UserInfo": "23223 U2 C1",
"amr": "optional"
}
}
If I run the regex through http://regexstorm.net/tester I get the right matches. But when I run my script with PowerShell on my computer, I don't get the matches for the alternatives that contain \r\n. I only get the matches from the first two alternatives of the regex.

I agree with @AdminOfThings: use Get-Content with the -Raw parameter.
Also, don't use typographic quotes in scripts.
If the number of leading spaces isn't really fixed, replace them with one space and a + or * quantifier.
Make the \r optional => \r?.
A minimal, complete, verifiable example should also include your expected output.
EDIT: changed the regex to be more readable.
The following script
## Q:\Test\2019\03\22\SO_55298614.ps1
$input_path = 'input.log'
$output_file = 'Output.txt'
$regexes = ('(?<= *"(full)?name": ")(.*)(?=",)',
'(?<=Start identity token validation\r?\n)(.*)(?=ids: Token validation success)',
'(?<= *"ClientName": ")(.*)(?=",\r?\n *"ValidateLifetime": false,)')
$regex = [RegEx]($regexes -join'|')
Get-Content $input_path -Raw | Select-String -pattern $regex -AllMatches |
ForEach-Object { $_.Matches.Value }
yields this sample output:
> Q:\Test\2019\03\22\SO_55298614.ps1
08:15.27.47-922: T= 11
TE Trio User
tele2
Tele2 Servicekonto Test
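The reason -Raw matters: with -Path (or plain Get-Content), Select-String tests the input one line at a time, so a pattern containing \r?\n never sees two lines together. Here is a minimal sketch of that difference; the two-line string is a made-up stand-in for input.log, not the real file.

```powershell
# Hypothetical two-line sample standing in for input.log
$text = "T= 11 ids: Start identity token validation`r`n08:15.27.47-922: T= 11 ids: Token validation success"
$pattern = '(?<=Start identity token validation\r?\n)(.*)(?=ids: Token validation success)'

# Fed line by line (as -Path or plain Get-Content would): the lookbehind never sees the line break
$perLine = @($text -split '\r?\n' | Select-String -Pattern $pattern -AllMatches)

# Fed as one string (as Get-Content -Raw would): the pattern can span both lines
$whole = @($text | Select-String -Pattern $pattern -AllMatches | ForEach-Object { $_.Matches.Value })

"line by line: $($perLine.Count) match(es)"
"single string: '$($whole[0])'"
```

The first line reports 0 matches, while the second shows the text captured between the two log messages.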

Related

How to use powerShell and regular expressions to parse a text file

I am new to PowerShell and need to read a text file, parse it to extract the data we need, and write the result to a .csv file. However, at this point I am still unable to parse the file and am totally confused about which PowerShell commands to use and how to incorporate regular expressions. While I could write out all of the ways I've tried to get this to work, I think it would be more beneficial to just ask for help and then ask questions about anything I don't fully understand. FYI: we're running Win10 and my only two scripting options are batch or PowerShell.
We have a JSON file that was formatted by notepad++ and looks like this:
"issue": [{
"field": [{
"name": "someName",
"value": [],
"values": []
}],
"field": [{
"name": "numberinproject",
"value": ["81"],
"values": ["81"]
}],
"field": [{
"name": "summary",
"value": ["This is a summary for 81."],
"values": ["This is a summary for 81."]
}],
"comment":[{
"text": "someText for 81 - 01",
"markdown":false,
"created":0123456789101,
"updated":null,
"Author":"first.last01",
"permitted group":null
},{
"text": "someText for 81 - 02",
"markdown":false,
"created":0123456789102,
"updated":null,
"Author":"first.last02",
"permitted group":null
},{
"text": "someText for 81 - 03",
"markdown":false,
"created":0123456789103,
"updated":null,
"Author":"first.last03",
"permitted group":null
}],
"field": [{
"name": "someNameTwo",
"value": [],
"values": []
}],
"field": [{
"name": "numberinproject",
"value": ["83"],
"values": ["83"]
}],
"field": [{
"name": "summary",
"value": ["This is a summary for 83."],
"values": ["This is a summary for 83."]
}],
"comment":[]
}
]
What I am attempting to do is extract the numberinproject, summary and Comment text, created and Author.
Notice that there could be Zero to multiple comments per project number. The comment.created field is a 13 digit epoch number that has to be converted into mm/dd/yyyy hh:mm:ss AM/PM
I had hoped to export this data into a .csv file but at this time would be happy just getting the data parsed out of the file.
Thanks for whatever feedback you can give.
===================================================
By request: Here are some of the things I tried and I apologise for this being such a mess. Since the "json" file was not in a format that convertfrom-json could use I assumed the file was actually text and that is where this starts.
What I've picked up has been from Searching on the web. If anyone can suggest a good article, please let me know and I will read it.
Set-Variable -Name "inputFile" -Value "inputFile.txt"
Set-Variable -Name "outputTXTFile" -Value "outputTXTFile.txt"
Set-Variable -Name "outputFile" -Value "outputFile.csv"
numberinProject = \"value\"\:\s\[\"\d+
summary = \"value\"\:\s\[\".+\"\],
comment - text = \"text\"\:\s\".+\",
comment - created = \"created\"\:\d{13}
comment - author = \"Author\"\:\"\w+\.\w+
## This actually worked. Though it grabbed the whole line, my plan was to then parse it a for a substring.
$results = Get-Content -Path $inputFile | Select-String -Pattern '"values": ' -CaseSensitive -SimpleMatch
# ------------------------------------------------
# However, If I tried using regex, the parse failed
$results = Get-Content -Path $inputFile | Select-String -Pattern \"values\"\:\s\[\"\d+ -CaseSensitive -SimpleMatch
# I also tried this
#$A = Get-ChildItem $inputFile | Select-String -Pattern '(<ID>\"value\"\:\s\[\"\d'
# $results | Export-CSV $outputFile -NoTypeInformation
$results | Out-File $outputTXTFile
# ---------------------------------------------------------
#I tried to output the file as a single string for manipulation - it didn't work
(Get-Content -Path $inputFile) -join "`r`n" | Out-File $outputTXTFile
# I tried to use "patterns" to find the data but that didn't work
$issueIDPattern = "(<ID>\"value\"\:\s\[\"\d+)"
$summaryPattern = "\"value\"\:\s\[\".+\"\],"
$commentTextPattern = "\"text\"\:\s\".+\","
$commentDatePattern = "\"created\"\:\d{13}"
$commentAuthorPattern = "\"Author\"\:\"\w+\.\w+
Get-ChildItem $inputFile|
Select-String -Pattern $issueIDPattern |
Foreach-Object {
$ID = $_.Matches[0].Groups['ID'].Value
[PSCustomObject] @{
issueNum = $ID
}
}
### Also tried a variation of this
(Get-Content C:\Path\To\File.txt) -join "`r`n" -Split "(?m)^(?=\S)" |
Where{$_} |
ForEach{
Clear-Variable commentauthor,commentcreated,commenttext,summary,numberinProject
$commentcreated = @()
$numberinProject = ($_ -split "`r`n")[0].trim()
Switch -regex ($_ -split "`r`n"){
"^\s+summary:" {$summary = ($_ -split ':',2)[-1].trim();Continue}
"^\s+.:\\" {$commentcreated += $_.trim();continue}
"^\s+commenttext" {$commenttext = [RegEx]::Matches($_,"(?<=commenttext installed from )(.+?)(?= \[)").value;continue}
}
[PSCustomObject]@{'numberinProject' = $numberinProject;'summary' = $summary; 'commenttext' = $commenttext; 'commentcreated' = $commentcreated}
}
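For the epoch conversion mentioned in the question, here is a minimal sketch. It assumes the 13 digits are milliseconds since 1970-01-01 UTC and that .NET 4.6 or later is available for [DateTimeOffset]::FromUnixTimeMilliseconds. The sample text is a made-up fragment in the shape shown above, and the result is printed in UTC.

```powershell
# Hypothetical fragment in the same shape as the comment entries in the question
$text = @'
"created":0123456789101,
"Author":"first.last01",
'@

# [regex]::Matches scans the whole string; group 1 holds the 13 digits
$created = [regex]::Matches($text, '"created":(\d{13})') | ForEach-Object { $_.Groups[1].Value }

# Assumption: the value is milliseconds since the Unix epoch
$format = 'MM/dd/yyyy hh:mm:ss tt'
$formatted = $created | ForEach-Object {
    $utc = [DateTimeOffset]::FromUnixTimeMilliseconds([long]$_).UtcDateTime
    $utc.ToString($format, [cultureinfo]::InvariantCulture)
}
$formatted   # 11/29/1973 09:33:09 PM
```

Use .LocalDateTime instead of .UtcDateTime if the timestamps should be shown in the machine's time zone.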

PowerShell: filter out invalid AD users using -Filter {SamAccountName -eq $_} but with regex

I am trying to filter AD for user names based on computer names which contain the user name, like XXXXXX01BLOGGSJ (BLOGGSJ is the user name in this example).
In order to extract the user name, I use this method:
"XXXXXX01BLOGGSJ" | %{($_ -split '\d+')[-1]}
The output is BLOGGSJ
However, I need to filter many computer names like this, a small percentage of which have invalid usernames in the machine name like "XXXXXX01RUBBISH"
In order to stop the inevitable errors from appearing, I am trying to use the -Filter {SamAccountName -eq $_} method, which works like this:
"BLOGGSJ", "RUBBISH" | % {Get-ADUser -Server domain.com -Filter{SamAccountName -eq $_ }} | select Name
But not when I attempt to do this, which is what I want to do:
“XXXXXX01BLOGGSJ”, “XXXXXX01BLOGGSJ” | % {Get-ADUser -Server domain.com -Filter{SamAccountName -eq "'($_ -split '\d+')[-1]'"}} | select Name
……or various permutations of that. So I am struggling with the syntax I think.
I know I can do this instead:
"XXXXXX01BLOGGSJ","XXXXXX01RUBBISH" | %{($_ -split '\d+')[-1]} | %{Get-ADUser -Server domain.com -Filter {SamAccountName -eq $_ }} | Select Name
but there is something else happening further down the pipe that requires me to do it in the way shown above.
Any help please.
Especially because you say something else is happening further down, I would suggest not trying to do it all in one line of code.
This should get you on your way:
"XXXXXX01BLOGGSJ","XXXXXX01RUBBISH" | ForEach-Object {
    $name = ($_ -split '\d+')[-1]
    $user = Get-ADUser -Server domain.com -Filter "SamAccountName -eq '$name'" -ErrorAction SilentlyContinue
    if ($user) {
        # a user with that SamAccountName was found
        [PsCustomObject]@{
            ComputerName   = $_
            SamAccountName = $user.SamAccountName
            UserName       = $user.Name
        }
    }
    else {
        # user not found
        [PsCustomObject]@{
            ComputerName   = $_
            SamAccountName = $name
            UserName       = "User Not found in AD"
        }
    }
}
Output:
ComputerName    SamAccountName UserName
------------    -------------- --------
XXXXXX01BLOGGSJ bloggsj        Joe Bloggs
XXXXXX01RUBBISH RUBBISH        User Not found in AD
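If the split really has to stay inline, as the question asks, one option is to write the filter as a double-quoted string and put the split inside a $( ) subexpression. This sketch only builds and prints the filter strings, so it runs without the ActiveDirectory module; the Get-ADUser call is shown as a comment only.

```powershell
$filters = 'XXXXXX01BLOGGSJ', 'XXXXXX01RUBBISH' | ForEach-Object {
    # $( ) is evaluated while the string is built, so the extracted user name
    # is already part of the filter text before any cmdlet sees it
    "SamAccountName -eq '$(($_ -split '\d+')[-1])'"
    # e.g. Get-ADUser -Server domain.com -Filter "SamAccountName -eq '$(($_ -split '\d+')[-1])'"
}
$filters
```

This prints SamAccountName -eq 'BLOGGSJ' and SamAccountName -eq 'RUBBISH'. The same expression inside a script-block filter ({ ... }) is not evaluated this way, which is probably why the attempt in the question fails.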

Parsing Rest query output

I am trying to parse the output of a rest api query of the form
$response = Invoke-RestMethod -Uri $uri -Headers $headers
$response.name | Select-String -Pattern ^role
returns an output similar to this below (elements separated by ::)
role::servicing2
role::collaboration::lei
role::commercial_lines::npds
role::nvp::windows::ucce_gold
role::oracle::linux::oracle_oid
role::splunk::splunk_enterprise::add_on
I need to read this output line by line and parse.
If there are just 2 elements eg. role::servicing2 ignore the line
If there are 3 elements, ignore the first element "role", prepend puppet_ to the second element and it becomes the project, the third element is the role (OS is unknown)
If there are 4 or more elements, ignore the first element "role", prepend puppet_ to the second element and it becomes the project, if the third element is "windows" or "linux" that is the OS, else OS is "unknown", and the last element \:\:'(\w+)'$ is the role.
Need an output in the form of an array or table or list in this format
(Don't necessarily need header)
Project                 OS      Role
puppet_collaboration    unknown lei
puppet_commercial_lines unknown npds
puppet_nvp              windows ucce_gold
puppet_oracle           linux   oracle_oid
puppet_splunk           unknown add_on
I have tried various regex expressions. Couldn't figure out the logic of walking this line by line and parsing appropriately into a list or array.
I think the code below should do what you want:
$roles = @'
role::servicing2
role::collaboration::lei
role::commercial_lines::npds
role::nvp::windows::ucce_gold
role::oracle::linux::oracle_oid
role::splunk::splunk_enterprise::add_on
'@ -split '\r?\n'
$result = $roles | ForEach-Object {
    $parts = $_ -split '::'
    switch ($parts.Count) {
        2 { continue } # ignore this line
        3 {
            [PsCustomObject]@{
                'Project' = 'puppet_{0}' -f $parts[1]
                'OS'      = 'unknown'
                'Role'    = $parts[2]
            }
        }
        default {
            [PsCustomObject]@{
                'Project' = 'puppet_{0}' -f $parts[1]
                'OS'      = if ('windows', 'linux' -contains $parts[2]) {$parts[2]} else {'unknown'}
                'Role'    = $parts[-1]
            }
        }
    }
}
# output on screen
$result
# output to CSV file
$result | Export-Csv -Path 'D:\roles.csv' -NoTypeInformation
For testing I have put the result of your $response.name | Select-String -Pattern ^role in a here-string.
Output:
Project                 OS      Role
-------                 --      ----
puppet_collaboration    unknown lei
puppet_commercial_lines unknown npds
puppet_nvp              windows ucce_gold
puppet_oracle           linux   oracle_oid
puppet_splunk           unknown add_on

Parse email body paragraph in PowerShell

I am creating a script to parse the body of Outlook emails so that I can get, say, the ID number, date and name that follow the strings ID: xxxxxx Date: xxxxxx Name:xxxxx. I was looking around and could not find anything that lets me take the string after a match.
What I have managed so far is to query for the emails that were sent by specific users from Outlook.
Add-Type -Assembly "Microsoft.Office.Interop.Outlook"
$Outlook = New-Object -ComObject Outlook.Application
$namespace = $Outlook.GetNameSpace("MAPI")
$inbox = $namespace.GetDefaultFolder([Microsoft.Office.Interop.Outlook.OlDefaultFolders]::olFolderInbox)
foreach ($items in $inbox.items){if (($items.to -like "*email*") -or ($items.cc -like "*email.add*")){$FindID = $items.body}}
Now that I have the email body in the for loop I am wondering how I can parse the content?
In between the paragraphs will be a text something like this
ID: xxxxxxxx
Name: xxxxxxxxx
Date Of Birth : xxxxxxxx
I did some testing with the line below to see if I can add it to the for loop, but it seems I cannot break the body into paragraphs.
$FindID| ForEach-Object {if (($_ -match 'ID:') -and ($_ -match ' ')){$testID = ($_ -split 'ID: ')[1]}}
I get the following result, from which I cannot get just the ID.
Sample result when I output $testID:
xxxxxxxx
Name: xxxxxxxxx
Date Of Birth : xxxxxxxx
Regards,
xxxxx xxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
How do I get just the results I want? I am just struggling at that portion.
You'll need a regular expression with (named) capture groups to grab the values. See the example on regex101.com.
Provided $item.body is not HTML and is a single string, this could work:
## Q:\Test\2018\07\24\SO_51492907.ps1
Add-Type -Assembly "Microsoft.Office.Interop.Outlook"
$Outlook = New-Object -ComObject Outlook.Application
$namespace = $Outlook.GetNameSpace("MAPI")
$inbox = $namespace.GetDefaultFolder(
[Microsoft.Office.Interop.Outlook.OlDefaultFolders]::olFolderInbox)
## see $RE on https://regex101.com/r/1B2rD1/1
$RE = [RegEx]'(?sm)ID:\s+(?<ID>.*?)$.*?Name:\s+(?<Name>.*?)$.*?Date Of Birth\s*:\s*(?<DOB>.*?)$.*'
$Data = ForEach ($item in $inbox.items){
    if (($item.to -like "*email*") -or
        ($item.cc -like "*email.add*")){
        if (($item.body -match $RE )){
            [PSCustomObject]@{
                ID   = $Matches.ID
                Name = $Matches.Name
                DOB  = $Matches.DOB
            }
        }
    }
}
$Data
$Data | Export-Csv '.\data.csv' -NoTypeInformation
Sample output with the above anonymized mail:
> Q:\Test\2018\07\24\SO_51492907.ps1
ID        Name       DOB
--        ----       ---
xxxxxx... xxxxxxx... xxxxxx...
I don't have Outlook available at the moment, but I think this will work:
Add-Type -Assembly "Microsoft.Office.Interop.Outlook"
$Outlook = New-Object -ComObject Outlook.Application
$namespace = $Outlook.GetNameSpace("MAPI")
$inbox = $namespace.GetDefaultFolder([Microsoft.Office.Interop.Outlook.OlDefaultFolders]::olFolderInbox)
$inbox.items | Where-Object { $_.To -like "*email*" -or $_.CC -like "*email.add*" } | ForEach-Object {
    $body = $_.body
    # lazy quantifiers keep each capture from running on into the next label
    if ($body -match '(?s)ID\s*:\s*(?<id>.+?)\s*Name\s*:\s*(?<name>.+?)\s*Date Of Birth\s*:\s*(?<dob>\w+)') {
        New-Object -TypeName PSObject -Property @{
            'Subject'       = $_.Subject
            'Date Received' = ([datetime]$_.ReceivedTime).ToString()
            'ID'            = $matches['id']
            'Name'          = $matches['name']
            'Date of Birth' = $matches['dob']
        }
    }
}
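To try the named-capture-group idea without Outlook, here is a minimal sketch against a made-up plain-text body. The sample values and the exact pattern are assumptions for illustration, not taken from either answer above.

```powershell
# Hypothetical plain-text body; the values are invented for the example
$body = @'
Hello,

ID: 12345678
Name: Jane Doe
Date Of Birth : 01/02/1990

Regards,
'@

# (?m) lets ^ and $ match at line boundaries; [\s\S]*? skips lazily to the next label.
# The \s* before each $ absorbs a trailing carriage return if the text uses CRLF.
$re = '(?m)^ID:\s*(?<ID>.+?)\s*$[\s\S]*?^Name:\s*(?<Name>.+?)\s*$[\s\S]*?^Date Of Birth\s*:\s*(?<DOB>.+?)\s*$'

if ($body -match $re) {
    # -match fills the automatic $Matches table, keyed by group name
    $record = [PSCustomObject]@{
        ID   = $Matches.ID
        Name = $Matches.Name
        DOB  = $Matches.DOB
    }
    $record
}
```

Each property holds only the text on its own line, which is the part the -split attempt in the question could not isolate.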

Select a particular type of word from a text file and load it in a Variable

I am trying to use a PowerShell script to read the contents of a file and pick a specific type of word from it. I need to load the word that is found into a variable, which I intend to use further downstream.
This is what my input file looks like:
{
"AvailabilityZone": "ap-northeast-1b",
"VolumeType": "gp2",
"VolumeId": "vol-087238f9",
"State": "creating",
"Iops": 100,
"SnapshotId": "",
"CreateTime": "2016-09-15T12:17:27.952Z",
"Size": 10
}
The specific word I would like to pick is vol-xxxxxxxx.
I used this link to write my script
How to pass a variable in the select-string of powershell
This is how I am doing it:
$Filename = "c:\reports\volume.jason"
$regex = "^[vol-][a-z0-9]{8}$"
$newvolumeid=select-string -Pattern $regex -Path $filename > C:\Reports\newVolumeid.txt
$newVolumeid
When I run this script it runs but does not give any response. Seems somehow the output of select string is not loaded into the variable $newvolumeid.
Any idea how to resolve this? Or what I am missing?
PS: The post mentioned above is about 3 years old and doesn't work hence I am reposting.
You are trying to read a property of a JSON object. Instead of using regex, you can parse the JSON and select the property using:
Get-Content 'c:\reports\volume.jason' | ConvertFrom-Json | select -ExpandProperty VolumeId
Try this
$Inpath = "E:\tests\test.txt"
$INFile = Get-Content $Inpath
$NeedsTrimming = $INFile.Split(" ") | ForEach-Object {if ($_ -like '*vol-*'){$_}}
$FirstQuote = $NeedsTrimming.IndexOf('"')
$LastQuote = $NeedsTrimming.LastIndexOf('"')
$vol = $NeedsTrimming.Substring(($FirstQuote + 1),($LastQuote - $FirstQuote - 1))
$vol
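If a regex is still preferred, the pattern in the question needs two changes. [vol-] is a character class that matches a single character, and the ^ and $ anchors require the whole line to be the volume id. Here is a minimal sketch with a made-up string standing in for the file content:

```powershell
# Hypothetical stand-in for the file content shown in the question
$text = @'
{
    "VolumeType": "gp2",
    "VolumeId": "vol-087238f9",
    "State": "creating"
}
'@

# 'vol-' as a literal prefix, then exactly 8 lowercase letters or digits; no ^ or $ anchors
$newVolumeId = [regex]::Match($text, 'vol-[a-z0-9]{8}').Value
$newVolumeId   # vol-087238f9
```

Against the real file, the equivalent should be (Select-String -Path $Filename -Pattern 'vol-[a-z0-9]{8}').Matches[0].Value. Note also that the > redirection in the original script sends the output to the text file, so nothing is left to assign to the variable.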