How to parse through a string in perl to extract certain value? - regex

I have the following string:
> show box detail
2 boxes:
1) Box ID: 1
IP: 127.0.0.1
Interface: 1/1
Priority: 31
2) Box ID: 2
IP: 192.68.1.1
Interface: 1/2
Priority: 31
How can I get the Box IDs from the above string in Perl?
The number of boxes can vary. Given n boxes, how can I extract every Box ID if the show box detail output can list up to n boxes in the same format?

my @ids = $string =~ /Box ID: ([0-9]+)/g;
More restrictive:
my @ids = $string =~ /^[0-9]+\) Box ID: ([0-9]+)$/mg;
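For comparison, here is a minimal Python sketch of the same two patterns, assuming the command output has already been captured into a string (the variable name text is just for illustration). re.findall returns every captured group, much like the list-context /g match in Perl.

```python
import re

# Sample text copied from the question.
text = """> show box detail
2 boxes:
1) Box ID: 1
IP: 127.0.0.1
Interface: 1/1
Priority: 31
2) Box ID: 2
IP: 192.68.1.1
Interface: 1/2
Priority: 31
"""

# Loose form: every number that follows "Box ID: ".
ids = re.findall(r"Box ID: ([0-9]+)", text)

# Stricter form: the whole line must look like "N) Box ID: M".
strict_ids = re.findall(r"^[0-9]+\) Box ID: ([0-9]+)$", text, flags=re.MULTILINE)

print(ids)
print(strict_ids)
```

Both lists grow with the number of boxes, so nothing needs to change when the output contains n entries.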

Related

PowerShell script to concatenate regex and variable

I want to write a PowerShell script with a regex to audit the configuration files of several network devices for compliance. Some devices are configured with one management VLAN while others have several different management VLANs. Examples below:
Config1:
VLAN Name Status Ports
1 default active
100 12_NET_MGMT_VLAN active Gi1/2
Config2:
VLAN Name Status Ports
1 default active
88 100_MGMT-VLLAN active Gi8/1
100 12_Net_MGMT_VLAN active
If I hard-code the regex pattern like this, $regex_pattern = "^\d{1,3}\s+.*MGMT.*", I get the correct output, as expected:
Config1 12_NET_MGMT_VLAN
Config2 100_MGMT-VLLAN
Config2 12_Net_MGMT_VLAN
Instead of hard-coding the regex pattern, I want to use the Read-Host cmdlet to ask the user to enter the word "MGMT", store it in a variable $Mgmt, and then concatenate it with a regex pattern to create a dynamic pattern, like this:
$Mgmt = Read-Host "Enter a word pattern to find a management vlan: "
For example, the user types in MGMT, and I then build the dynamic regex pattern in one of these two ways:
$regex_pattern = "^\d{1,3}\s+.*"+$Mgmt+"_.*"
$regex_pattern = "^\d{1,3}\s+.*"+[regex]::escape($Mgmt)+".*"
Neither attempt produced the correct results.
If anyone has a solution, please help. Thanks.
If we are to assume that VLAN names cannot contain spaces, you can use \S (non-space) as an anchor character. Using the subexpression operator $(), you can evaluate an expression within a string.
# Simulating a vlan config output
$Config = @'
VLAN Name Status Ports
1 default active
88 100_MGMT-VLLAN active Gi8/1
100 12_Net_MGMT_VLAN active
'@ -split '\r?\n'
# Using value MGMT here when prompted
$Mgmt = Read-Host "Enter a word pattern to find a management vlan"
$regex = "\S*$([regex]::Escape($Mgmt))\S*"
[regex]::Matches($Config,$regex).Value
Output:
100_MGMT-VLLAN
12_Net_MGMT_VLAN
Note that simple variable references like $Mgmt will expand properly within surrounding double quotes, e.g. "My VLAN is $Mgmt".
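The same idea (escape the user's input, then wrap it in \S* on both sides) can be sketched in Python. This is only an illustration under the same no-spaces assumption; find_vlan_names is a hypothetical helper name and the word is passed in directly rather than prompted for.

```python
import re

# Simulated VLAN config output, copied from the example above.
config = """VLAN Name Status Ports
1 default active
88 100_MGMT-VLLAN active Gi8/1
100 12_Net_MGMT_VLAN active
"""

def find_vlan_names(text, word):
    """Return every whitespace-delimited token that contains `word`."""
    # re.escape keeps characters such as "." or "+" in the input literal.
    pattern = r"\S*" + re.escape(word) + r"\S*"
    return re.findall(pattern, text)

names = find_vlan_names(config, "MGMT")
print(names)
```

Escaping matters as soon as the search word can contain regex metacharacters: without it, an input such as a.b would also match axb.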
You could take this in a different direction and create a custom object from your output. This would enable you to use filtering via Where-Object and member access (.Property) to retrieve target data. This again assumes values don't contain spaces.
# Simulating a vlan config output
$Config = @'
VLAN Name Status Ports
1 default active
88 100_MGMT-VLLAN active Gi8/1
100 12_Net_MGMT_VLAN active
'@ -split '\r?\n'
$Mgmt = Read-Host "Enter VLAN Name"
# Replacing consecutive spaces with , first
$ConfigObjs = $Config -replace '\s+',',' | ConvertFrom-Csv
$ConfigObjs
Output:
VLAN Name Status Ports
---- ---- ------ -----
1 default active
88 100_MGMT-VLLAN active Gi8/1
100 12_Net_MGMT_VLAN active
Now you have properties that can be referenced and access to other comparison operators so you don't always need to use regex.
($ConfigObjs | Where Name -like "*$Mgmt*").Name
Output:
100_MGMT-VLLAN
12_Net_MGMT_VLAN
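As a rough Python counterpart of the object-based approach, the sketch below turns each row into a dictionary keyed by the header names and then filters on the Name field. It assumes, as above, that no value contains spaces; parse_rows is a hypothetical helper. Note that the in test here is case-sensitive, whereas PowerShell's -like is not.

```python
# Simulated VLAN config output, one string per line.
config_lines = [
    "VLAN Name Status Ports",
    "1 default active",
    "88 100_MGMT-VLLAN active Gi8/1",
    "100 12_Net_MGMT_VLAN active",
]

def parse_rows(lines):
    """Convert whitespace-separated rows into dicts keyed by the header."""
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        # Pad short rows (no Ports value) so every record has all four keys.
        fields += [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return rows

rows = parse_rows(config_lines)
mgmt_names = [row["Name"] for row in rows if "MGMT" in row["Name"]]
print(mgmt_names)
```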

Columns not aligning when using sed \t

I have an input file named rectangle.txt and my aim is to re-format the contents and output it to rectangle_f.txt using only sed commands.
I have managed to reformat the content from:
Name,Length,Width,Area,Owner
Rec1,9,9,81,Em
Rec2,2,2,4,Soph
(etc...)
To look like this:
Name: Rec1 Length: 9 Width: 9 Area: 81 Owner: Em
Name: Rec2 Length: 2 Width: 2 Area: 4 Owner: Soph
My issue is that for names ending in a two-digit number (example: Rec10), the tab I inserted between Name and Length leaves the line one space out of alignment. For example:
Name: Rec9 Length: 6 Width: 6 Area: 36 Owner: Jay
Name: Rec10 Length: 7 Width: 7 Area: 49 Owner: Chris
Does anyone know the best way to make all columns line up?
The code I currently have is:
sed -e "1d" \
-e 's/^/Name: /;s/,/ \t\tHeight: /;s/,/ \t\tWidth: /;s/,/\t\tArea: /;s/,/ \t\tColour: /' rectangle.txt > rectangle_f.txt
I need the fix to apply only to Rec10 through Rec20 (inclusive), which are also lines 10-20.
Any help would be great!
Have you tried:
sed 1d file.txt | sed -E 's/([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)/Name: \1\tLength: \2\tWidth: \3\tArea: \4\tOwner: \5\t/'
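Tabs only line columns up while the values before them stay within the same tab stop, so an alternative is to pad every value to the width of the longest one in its column. The question asks for sed only, so the following Python sketch is just an illustration of that padding idea; format_rows is a hypothetical helper and the two-space column gap is an arbitrary choice.

```python
import csv
import io

# Two sample rows from the question, including the header line.
data = """Name,Length,Width,Area,Owner
Rec9,6,6,36,Jay
Rec10,7,7,49,Chris
"""

def format_rows(text):
    """Return 'Header: value' lines with each column padded to equal width."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]
    # Width of each column = longest value found in that column.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = []
    for row in rows:
        cells = [
            f"{name}: {value:<{width}}"
            for name, value, width in zip(header, row, widths)
        ]
        lines.append("  ".join(cells).rstrip())
    return lines

for line in format_rows(data):
    print(line)
```

Because the widths are computed from the data, Rec9 and Rec10 end up with their Length labels in the same column without treating lines 10-20 as a special case.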

How do I retrieve values from successive lines in perl?

I have the data below, in a file called data.txt, and I want to retrieve four columns from it: first the degradome category, then the p-value, then the text before and after Query:. The result should look like this (showing the first row only):
Degardome Category: 3 Degradome p-value: 0.0195958324320822 3' UGACGUUUCAGUUCCCAGUAU 5' Seq_3694_200
data.txt:
5' CCGGUAAGGUUAUGGGUCAUG 3' Transcript: Supercontig_2.8_1446328:1451-1471 Slice Site:1462
|o||o||o| |||||||o
3' UGACGUUUCAGUUCCCAGUAU 5' Query: Seq_3694_200
SiteID: Supercontig_2.8_1446328:1462
MFE of perfect match: -36.10
MFE of this site: -23.60
MFEratio: 0.653739612188366
Allen et al. score: 7.5
Paired Regions (query5'-query3',transcript3'-transcript5')
1-8,1471-1464
10-18,1462-1454
Unpaired Regions (query5'-query3',transcript3'-transcript5')
9-9,1463-1463 SIL: Symmetric internal loop
19-21,1453-1451 UP3: Unpaired region at 3' of query
Degradome data file: /media/owner/newdrive/phasing/degradome/_degradome.20171210/bbduk_trimmed/merged_HV2.fasta_dd.txt
Degardome Category: 3
Degradome p-value: 0.0195958324320822
T-Plot file: T-plots-IGR/Seq_3694_200_Supercontig_2.8_1446328_1462_TPlot.pdf
Position Reads Category
1462 4 3 <<<<<<<<<<
2949 7 3
4179 517 0
---------------------------------------------------
---------------------------------------------------
5' GGUGAGGAGGGGGGUUUG-GUC 3' Transcript: Supercontig_2.8_1511075:1311-1331 Slice Site:1323
| |||||oo||| |||o |||
3' AC-CUCCUUUCCCGAAAUACAG 5' Query: Seq_2299_664
SiteID: Supercontig_2.8_1511075:1323
MFE of perfect match: -37.90
MFE of this site: -25.30
MFEratio: 0.66754617414248
Allen et al. score: 8
Paired Regions (query5'-query3',transcript3'-transcript5')
1-3,1331-1329
5-8,1328-1325
10-19,1323-1314
20-20,1312-1312
Unpaired Regions (query5'-query3',transcript3'-transcript5')
4-4,x-x BULq: Bulge on query side
9-9,1324-1324 SIL: Symmetric internal loop
x-x,1313-1313 BULt: Bulge on transcript side
21-21,1311-1311 UP3: Unpaired region at 3' of query
Degradome data file: /media/owner/newdrive/phasing/degradome/_degradome.20171210/bbduk_trimmed/merged_HV2.fasta_dd.txt
Degardome Category: 4
Degradome p-value: 0.013385336399181
I tried to capture the values before and after Query:, but I keep getting errors. Sorry, I am new to Perl and would really appreciate your help. Here is some of the code I tried:
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use Modern::Perl;
my word = "Query:";
my $filename = $ARGV[0];
open(INPUT_FILE, $filename);
while (<<>>) {
chomp;
my ($before, $after) = m/(.+)(?:\t\Q$word\E:\t)(.+)/i;
say "word: $word\tbefore: $before\tafter: $after";
}
Since you need a few specific pieces of data from each section, and both the sections and the data are clearly demarcated, the only question is what data structure to use. Given that you just want one line of values collected from each section, a simple array should be fine.
This assumes that the phrases of interest (Query:, then Degardome Category: N, then Degradome p-value:) appear only in the places shown in the sample.
use warnings;
use strict;
use feature 'say';
my $file = shift || die "Usage $0 file\n";
open my $fh, '<', $file or die "Can't open $file: $!";
my (@res, @query, $category, $pvalue);
while (<$fh>) {
    next if not /\S/;
    if (/(.*?)\s+Query:\s+(.*)/) {
        @query = ($1, $2);
        next;
    }
    if (/^\s*(Degardome Category:\s+[0-9]+)/) {
        $category = $1;
    }
    elsif (/^\s*(Degradome p-value:\s+[0-9.]+)/) {
        $pvalue = $1;
        push @res, [$category, $pvalue, @query];
    }
}
say "@$_" for @res;
The end of a section is detected with the p-value: line, at which point we add to @res an arrayref with all the values captured up to that point.
The regex throughout depends on properties of data seen in the sample. Please review and adjust if some of my assumptions aren't right.
Details can also be pried from data more precisely, even by simply adding capture groups to the regexes above (and saving those captures into additional data structures).
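For readers more comfortable with Python, the same line-by-line collection can be sketched as follows. It makes the same assumptions about the data as the Perl code and keeps the file's own spelling Degardome. collect_records is a hypothetical helper, the sample is trimmed to the relevant lines, and unlike the Perl version it skips a p-value line that arrives before a Query: or category line has been seen.

```python
import re

# Trimmed sample: only the lines the parser cares about, plus one extra.
sample = """\
3' UGACGUUUCAGUUCCCAGUAU 5' Query: Seq_3694_200
SiteID: Supercontig_2.8_1446328:1462
Degardome Category: 3
Degradome p-value: 0.0195958324320822
3' AC-CUCCUUUCCCGAAAUACAG 5' Query: Seq_2299_664
Degardome Category: 4
Degradome p-value: 0.013385336399181
"""

def collect_records(lines):
    """Collect (category, p-value, before-Query, after-Query) per section."""
    records = []
    query = category = None
    for line in lines:
        m = re.search(r"(.*?)\s+Query:\s+(.*)", line)
        if m:
            query = (m.group(1).strip(), m.group(2).strip())
            continue
        m = re.match(r"\s*(Degardome Category:\s+[0-9]+)", line)
        if m:
            category = m.group(1)
            continue
        m = re.match(r"\s*(Degradome p-value:\s+[0-9.]+)", line)
        if m and query and category:
            # The p-value line closes a section, so emit the record here.
            records.append((category, m.group(1), *query))
            query = category = None
    return records

records = collect_records(sample.splitlines())
for record in records:
    print(" ".join(record))
```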

Graylog regex search with numbers in text

I use graylog 2.0 (http://docs.graylog.org/en/2.0/pages/queries.html) and it's super useful.
I want to refine my full_message search.
Currently I'm:
- searching graylog for all full_message occurrences of the start of the string
- I then export this to excel
- Split the text (text to columns)
- Apply an autofilter
- Filter for any times > 20
search pattern:
full_message: "Running queue with*"
search text:
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 1 items
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 5 items
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 25 items
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 200 items
I'm wondering if a better regex search could list only the records with items > 20.
e.g. the search string would be
full_message: "Running queue with [insert better regex here]"
Thanks
You can use the pattern
Running queue with id: \S+ and (?:\d{3,}|[3-9]\d|2[1-9])
The final group there allows for either:
\d{3,} Any number with three or more digits, or
[3-9]\d Any number 30-99, or
2[1-9] Any number 21-29
https://regex101.com/r/ctLvQD/1
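A quick way to check the three alternatives is to run the pattern against the sample messages. The sketch below does this with Python's re module. It only confirms the regex logic and says nothing about whether Graylog's own query syntax accepts the same constructs.

```python
import re

pattern = re.compile(
    r"Running queue with id: \S+ and (?:\d{3,}|[3-9]\d|2[1-9])"
)

# Sample messages from the question.
queue_id = "dd82c225-fab7-44ce-9618-67d1ef332a03"
messages = [
    f"Network Queue: Running queue with id: {queue_id} and {n} items"
    for n in (1, 5, 25, 200)
]

# Keep only the messages whose item count is greater than 20.
matched = [msg for msg in messages if pattern.search(msg)]
for msg in matched:
    print(msg)
```

Only the 25-item and 200-item messages are printed. A count of exactly 20 is rejected because none of the three branches accepts it.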

Sorting logs using regex?

I'm trying to figure out how to sort logs for example...
User: test
Level: user
Domain: localhost
Time: 12pm
Blah: INFO
Date: 07-12-2016
Ip: 127.0.0.1
I would like the output text to look like this (there are also tab spaces):
User:Level:Domain:Time:Blah:Date:IP
If I understand your question correctly, you're talking not about sorting but about parsing. You have log strings which you want to convert to another format. The regex to match your log string would be
(?P<User>[^:]+):(?P<Level>[^:]+):(?P<Domain>[^:]+):(?P<Time>[^:]+):(?P<Blah>[^:]+):(?P<Date>[^:]+):(?P<IP>[^:]+)
However, since you have so many groups, the pattern can be built much more compactly. Here's an example in Python:
import re
logString = "User:Level:Domain:Time:Blah:Date:IP"
logGroups = ["User", "Level", "Domain", "Time", "Blah", "Date", "IP"]
reLogGroups = "(?P<"+">[^:]+):(?P<".join(logGroups)+">[^:]+)"
matchLogGroups = re.search(reLogGroups,logString)
if matchLogGroups:
    counter = 1
    for logGroup in logGroups:
        print(str(counter)+". " + logGroup + ": " + matchLogGroups.group(logGroup) + "\n")
        counter += 1
The output is
1. User: User
2. Level: Level
3. Domain: Domain
4. Time: Time
5. Blah: Blah
6. Date: Date
7. IP: IP
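If the goal is instead to turn the multi-line "Key: value" block from the question into a single colon-separated line, no regex is needed. The sketch below assumes that reading of the question. block_to_record is a hypothetical helper, and it keeps only the values, in the order they appear.

```python
# Log block copied from the question.
log_block = """User: test
Level: user
Domain: localhost
Time: 12pm
Blah: INFO
Date: 07-12-2016
Ip: 127.0.0.1
"""

def block_to_record(block, sep=":"):
    """Join the values of a 'Key: value' block into one separated line."""
    values = []
    for line in block.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so a value may itself contain colons.
        _key, _, value = line.partition(":")
        values.append(value.strip())
    return sep.join(values)

record = block_to_record(log_block)
print(record)
```

Passing sep="\t" would produce a tab-separated line instead.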