AWK change field separator multiple times - regex
I have the following sample code below; for ease of testing I have combined the text of a few files into one. Usually this script would use the find command to filter through each subdirectory looking for versions.tf and run AWK on each one.
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = "> 2.0.0"
    }
  }
  required_version = ">= 0.13"
}
terraform {
  required_providers {
    luminate = {
      source = "terraform.example.com/nbs/luminate"
      version = "1.0.8"
    }
    azurerm = {
      source = "hashicorp/azurerm"
      version = "2.40.0"
    }
    random = {
      source = "hashicorp/random"
    }
    template = {
      source = "hashicorp/template"
    }
  }
  required_version = ">= 0.13"
}
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = ">=2.38.0, < 3.0.0"
    }
    luminate = {
      source = "terraform.example.com/nbs/luminate"
      version = "1.0.8"
    }
    random = {
      source = "hashicorp/random"
      version = "3.0.0"
    }
    null = {
      source = "hashicorp/null"
      version = "3.0.0"
    }
  }
  required_version = ">= 0.13"
}
My original AWK script looked like this:
/^[[:space:]]{2,2}required_providers/,/^[[:space:]]{2,2}}$/ {
  gsub("\"", "")
  if ($0 ~ /[[:alpha:]][[:space:]]=[[:space:]]\{/) {
    pr = $1
  }
  if ($0 ~ /version[[:space:]]=[[:space:]]/) {
    printf("%s %s\n", pr, $3)
  }
}
Which would print out the following:
azurerm > # note this
luminate 1.0.8
azurerm 2.40.0
azurerm >=2.38.0, # note this
luminate 1.0.8
random 3.0.0
null 3.0.0
When people submitted code to the repo, the version string normally contained no spaces between the quotes, so this worked fine. Lately that is often not the case, and my script breaks on the two lines noted above.
I noted in a book I've been reading that the field separator can be changed multiple times within a script (https://www.packtpub.com/product/learning-awk-programming/9781788391030):
Now, to switch between two different FS, we can perform the following:
$ vi fs1.awk
{
  if ($1 == "#entry")
    { FS=":"; }
  else if ($1 == "#exit")
    { FS=" "; }
  else
    { print $2 }
}
I have tried this in my script, but it doesn't work. I can only assume that it's because I'm trying to perform the switch in nested functions?
/^[[:space:]]{2,2}required_providers/,/^[[:space:]]{2,2}}$/ {
  if ($0 ~ /[[:alpha:]][[:space:]]=[[:space:]]\{/) {
    FS = " "
    pr = $1
  }
  if ($0 ~ /version[[:space:]]=[[:space:]]/) {
    FS = "\""
    printf("%s %s\n", pr, $2)
  }
}
Which outputs like:
azurerm =
luminate =
azurerm =
azurerm =
luminate =
random =
null =
Can anyone suggest a fix/workaround for capturing/printing the output so it looks like:
azurerm > 2.0.0
luminate 1.0.8
azurerm 2.40.0
azurerm >=2.38.0, < 3.0.0
luminate 1.0.8
random 3.0.0
null 3.0.0
A much simpler solution is to just normalize the value you are pulling out. You are using a regex already; just stretch it a little bit further.
/^[[:space:]]{2}required_providers/,/^[[:space:]]{2}}$/ {
  gsub("\"", "")
  if ($0 ~ /[[:alpha:]][[:space:]]=[[:space:]]\{/) {
    pr = $1
  }
  if ($0 ~ /version[[:space:]]*[<>=]+[[:space:]]*/) {
    ver = $0;
    sub(/^[[:space:]]*version[[:space:]]*(=[[:space:]]*)?/, "", ver);
    print pr, ver
  }
}
Tangentially, notice how I relaxed the whitespace requirements, and replaced {2,2} with the equivalent but more succinct {2}.
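If the script has to run on an awk that lacks interval expressions or POSIX character classes (some older mawk builds, and gawk before 4.0 without --re-interval, are the usual suspects), the same idea can be written with literal spaces and bracket lists. This is only a sketch: the file name versions_sample.tf, its contents, and the function name extract_versions are made up for illustration, and it assumes the two-space indentation implied by the patterns above.

```shell
# Hypothetical sample file; name and contents are assumptions for illustration.
cat > versions_sample.tf <<'EOF'
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">=2.38.0, < 3.0.0"
    }
    random = {
      source = "hashicorp/random"
    }
  }
  required_version = ">= 0.13"
}
EOF

extract_versions() {
  awk '
    # Two literal spaces replace [[:space:]]{2}; [ \t] replaces [[:space:]].
    /^  required_providers/, /^  }$/ {
      gsub(/"/, "")
      # Remember the provider name from lines such as "    azurerm = {".
      if ($0 ~ /[A-Za-z][ \t]*=[ \t]*\{/) pr = $1
      # On a version line, strip everything up to and including the "=".
      if ($0 ~ /^[ \t]*version[ \t]*=/) {
        ver = $0
        sub(/^[ \t]*version[ \t]*=[ \t]*/, "", ver)
        print pr, ver
      }
    }
  ' "$1"
}

extract_versions versions_sample.tf
```

On this sample the only line printed is azurerm >=2.38.0, < 3.0.0; the random block has no version line, so nothing is printed for it.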
Changing FS does not take effect immediately: the current record has already been split into fields by the time your code runs. Suppose file.txt contains
1-2-3
4-5-6
then
awk '(NR==1){FS="-"}{print NF}' file.txt
output
1
3
As you can see, the new FS was applied starting from the next line. If you need to split the current line the way FS would, consider using the split function. For example, with the same input file as above:
awk '{split($0,arr,"-");print arr[1],arr[2],arr[3]}' file.txt
output
1 2 3
4 5 6
(tested in gawk 4.2.1)
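Applied to the versions.tf case from the question, split() on the double quote sidesteps FS entirely. A minimal sketch, assuming the two-space indentation the question's patterns expect; parse_versions is a hypothetical wrapper name and the piped input is a cut-down sample, not the real repository files:

```shell
parse_versions() {
  awk '
    /^  required_providers/, /^  }$/ {
      # Provider name: first field of a line such as "    azurerm = {".
      if ($0 ~ /=[[:space:]]*\{/) pr = $1
      # split() cuts the current record on double quotes regardless of FS;
      # q[2] then holds the text between the first pair of quotes.
      if ($0 ~ /^[[:space:]]*version[[:space:]]*=/ && split($0, q, "\"") > 1)
        print pr, q[2]
    }
  '
}

printf '%s\n' \
  'terraform {' \
  '  required_providers {' \
  '    azurerm = {' \
  '      source  = "hashicorp/azurerm"' \
  '      version = "> 2.0.0"' \
  '    }' \
  '  }' \
  '}' | parse_versions
```

Because the version is taken from between the quotes, spaces inside the constraint (as in "> 2.0.0") are kept as they are.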
With your shown samples only, could you please try the following. Written and tested in GNU awk.
awk '
!NF{
  found1=found2=0
  val=""
}
/required_providers/{
  found1=1
  next
}
found1 && /^[[:space:]]+[[:alpha:]]+ = \{/{
  sub(/^ +/,"",$1)
  val=$1
  found2=1
  next
}
found2 && /^[[:space:]]*version[[:space:]]*=/{
  match($0,/".*"/)
  print val,substr($0,RSTART+1,RLENGTH-2)
  found2=0
}
' Input_file
Explanation: Adding detailed explanation for above.
awk '                                          ##Starting awk program from here.
!NF{                                           ##Checking if NF is 0 (empty line), then do following.
  found1=found2=0                              ##Setting found1 and found2 to 0 here.
  val=""                                       ##Nullifying val here.
}
/required_providers/{                          ##Checking if line has required_providers then do following.
  found1=1                                     ##Setting found1 to 1 here.
  next                                         ##next will skip all further statements from here.
}
found1 && /^[[:space:]]+[[:alpha:]]+ = \{/{    ##Checking if found1 is set and line has spaces and letters followed by = { then do following.
  sub(/^ +/,"",$1)                             ##Substituting initial spaces with NULL here in first field.
  val=$1                                       ##Setting val to $1 here.
  found2=1                                     ##Setting found2 here.
  next                                         ##next will skip all further statements from here.
}
found2 && /^[[:space:]]*version[[:space:]]*=/{ ##Checking if found2 is set and line starts with version = (so required_version is not picked up).
  match($0,/".*"/)                             ##Using match to match regex from first " to last " here.
  print val,substr($0,RSTART+1,RLENGTH-2)      ##Printing val and the matched text without its quotes.
  found2=0                                     ##Setting found2 to 0 here.
}
' Input_file                                   ##Mentioning Input_file name here.
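The question runs the script under find for every versions.tf; the same match()/substr() extraction can be wired up that way. This is a sketch under assumptions, not the asker's actual setup: scan_versions and the throwaway directory layout are hypothetical, and the range pattern assumes two-space indentation.

```shell
scan_versions() {
  # $1 is the directory tree to search (hypothetical argument).
  find "$1" -type f -name versions.tf -exec awk '
    FNR == 1 { pr = "" }                 # reset the provider name for each file
    /^  required_providers/, /^  }$/ {
      if ($0 ~ /=[[:space:]]*\{/) pr = $1
      # Only real "version = ..." lines; take the text between the quotes.
      if ($0 ~ /^[[:space:]]*version[[:space:]]*=/ && match($0, /"[^"]*"/))
        print FILENAME ": " pr, substr($0, RSTART + 1, RLENGTH - 2)
    }
  ' {} +
}

# Throwaway demo tree; the module name mod_a is made up.
demo=$(mktemp -d)
mkdir -p "$demo/mod_a"
printf '%s\n' \
  'terraform {' \
  '  required_providers {' \
  '    azurerm = {' \
  '      version = "2.40.0"' \
  '    }' \
  '  }' \
  '  required_version = ">= 0.13"' \
  '}' > "$demo/mod_a/versions.tf"

scan_versions "$demo"
```

Prefixing each line with FILENAME keeps the output attributable when find hands several files to a single awk invocation.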
Related
conditional search and replace differently in sed
This content is in text.txt for testing; I am going to rewrite the bundle identifier in a pipeline.
Current behavior: every PRODUCT_BUNDLE_IDENTIFIER is replaced with "abc" when I execute this command:
sed -i -e "s/PRODUCT_BUNDLE_IDENTIFIER =.*/PRODUCT_BUNDLE_IDENTIFIER = abc;/g" text.txt
Expectation: I would like to change each PRODUCT_BUNDLE_IDENTIFIER separately, e.g. from BundleA to BundleZ and from BundleB to BundleX. How can I use sed to match a regex and replace the matches with different values?
Content of text.txt:
{
  2D02E40000B4A5E006451C7 /* Debug */ = {
    isa = XCBuildConfiguration;
    buildSettings = {
      PRODUCT_BUNDLE_IDENTIFIER = "BundleA";
    };
    name = Debug;
  };
  2D02E4981E0000006451C7 /* Release */ = {
    isa = XCBuildConfiguration;
    buildSettings = {
      PRODUCT_BUNDLE_IDENTIFIER = "BundleB";
    };
    name = Release;
  };
}
regex on tnsnames.ora
My tnsnames.ora file has 2 formats:
db_cl =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = a55)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = cl)
    )
  )
dbcd =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = a66 )(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = cd)
    )
  )
myx5=
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = v55)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = x5)
    )
  )
I want to get the hostname for a specific service_name or sid. In some cases it is sid and in some cases it is service_name. What should I search for with grep in order to get the hostname? In this example I want to get the string "host_name".
UPDATE: I also need the db_name2, if someone can help.
If perl is available try this command:
perl -nle 'BEGIN{$service = shift} $host = $1 if /HOST\s*=\s*([^\s\)]+)/i; print $host if /\((SID|SERVICE_NAME)\s*=\s*$service\)/; ' blabla tnsnames.ora
It stores the last host value found in HOST = ... and prints it when it encounters (SID/SERVICE_NAME = blabla).
@drf: try:
awk '/HOST/{sub(/\).*/,"",$(NF-2));print $(NF-2)}' Input_file
This simply looks for the string HOST on each line, then strips everything from the first ) onward in field $(NF-2) of the matching line, which should leave the host name.
EDIT: To look for a specific SID or service name, try:
awk '/HOST/{sub(/\).*/,"",$(NF-2));HOST=$(NF-2);next} /SID/{print HOST}'
Replace SID with the SID or service name you want to search for and it should work.
EDIT2:
awk '/db_name/{sub(/=/,"",$1);DB=$1}/HOST/{sub(/\).*/,"",$(NF-2));HOST=$(NF-2);next} /chumma/{print DB ORS HOST}'
EDIT3:
awk '{gsub(/\)|\(/,"");;for(i=1;i<=NF;i++){if($i=="HOST"){host=$(i+1)};if($NF=="cd"){val="DB_NAME= "$1", HOST_NAME= "host",SID/SERVICE_NAME= "$NF}};if(val){print val;val=""}}' Input_file
Or, in non-one-liner form:
awk '{gsub(/\)|\(/,"");
  for(i=1;i<=NF;i++){
    if($i=="HOST"){
      host=$(i+1)
    };
    if($NF=="cd") {
      val="DB_NAME= "$1", HOST_NAME= "host",SID/SERVICE_NAME= "$NF
    }
  };
  if(val) {
    print val;
    val=""
  }
}
' Input_file
You can replace "cd" in the code above with whichever service name or SID you want to search for.
Parsing Microsoft Office 2013 MRU Lists in Registry using Perl
I am currently trying to parse the keys in a Windows 7 registry containing the MRU lists for Microsoft Office 2013. However, when I attempt to run the Perl script in RegRipper, it says the plugin was not successfully run. I'm not sure if there is a syntax error in my code or if it is unable to parse the registry as I have it written. The biggest problem is that one of the keys is named after the user's LiveId (it appears as LiveId_XXXXXXX), and this changes from user to user, so I would like this plugin to work no matter what the user's LiveId is. Thanks!
my $reg = Parse::Win32Registry->new($ntuser);
my $root_key = $reg->get_root_key;
# ::rptMsg("officedocs2013_File_MRU v.".$VERSION); # 20110830 [fpi] - redundant
my $tag = 0;
my $key_path = "Software\\Microsoft\\Office\\15.0";
if (defined($root_key->get_subkey($key_path))) {
    $tag = 1;
}
if ($tag) {
    ::rptMsg("MSOffice version 2013 located.");
    my $key_path = "Software\\Microsoft\\Office\\15.0";
    my $of_key = $root_key->get_subkey($key_path);
    if ($of_key) {
        # Attempt to retrieve Word docs
        my $word_mru_key_path = 'Software\\Microsoft\\Office\\15.0\\Word\\User MRU';
        my $word_mru_key = $of_key->get_subkey($word_mru_key_path);
        foreach ($word_mru_key->get_list_of_subkeys()) {
            if ($key->as_string() =~ /LiveId_\w+/) {
                $word = join($key->as_string(),'\\File MRU');
                ::rptMsg($key_path."\\".$word);
                ::rptMsg("LastWrite Time ".gmtime($word_key->get_timestamp())." (UTC)");
                my @vals = $word_key->get_list_of_values();
                if (scalar(@vals) > 0) {
                    my %files
                    # Retrieve values and load into a hash for sorting
                    foreach my $v (@vals) {
                        my $val = $v->get_name();
                        if ($val eq "Max Display") { next; }
                        my $data = getWinTS($v->get_data());
                        my $tag = (split(/Item/,$val))[1];
                        $files{$tag} = $val.":".$data;
                    }
                    # Print sorted content to report file
                    foreach my $u (sort {$a <=> $b} keys %files) {
                        my ($val,$data) = split(/:/,$files{$u},2);
                        ::rptMsg(" ".$val." -> ".$data);
                    }
                }
                else {
                    ::rptMsg($key_path.$word." has no values.");
                }
                else {
                    ::rptMsg($key_path.$word." not found.");
                }
                ::rptMsg("");
            }
        }
The regex LiveId_(\w+) will grab the string after LiveId_ and you can reference it with a \1 like this
Regex extract multiple line from file
How can I extract two fields from a given file "named.conf"? I want the fields 'zone' and 'file'.
zone "example.com" IN {
    type master;
    file "db.example.com";
    allow-query { any; };
    allow-update { none; };
    allow-transfer { 10.101.100.2; };
};
This might work for you:
sed '/^\s*\(zone\|file\) "\([^"]*\)".*/,//!d;//!d;s//\2/' named.conf
example.com
db.example.com
Try this quick & dirty (GNU) AWK program (save it as zone-file.awk):
/^zone/, /^}/ {
    if (NF == 4) {
        zone = $2
        next
    }
    if (NF == 2 && $1 == "file") {
        sub(";$", "", $2)
        print zone, $2
    }
}
It works for me as follows:
$ awk -f zone-file.awk /etc/named.conf
"." "named.ca"
"localhost" "localhost.zone"
[...]
Text Pattern Processing in paragraph with unix linux utilities
I have a file with the following pattern (please note this is a file generated using sed, awk, grep etc. processing). Part of the input file is as follows.
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3
I want output like
filename1,a/b/c/d/e.properties,
filename2,f/g/h/i/j.properties,
filename3, k/l/m/n/o.properties,
I could not find a solution with sed, awk or grep, so I am stuck. Please let me know if you know a solution with these unix utilities or any other language or platform.
Regards, Suhaas
Assuming you generated the original file, and therefore it is safe to execute it as a script:
sed -e 's/^.*,/FILE=&/' \
    -e 's/^.*=\$CONFIG/PROPFILE=$CONFIG/' \
    -e 's/^EndOfFile.*/echo $FILE $PROPFILE/' < yourInputFile | sh
This converts each section of your file into the form:
FILE=filename1,
BASE=a/b/c
CONFIG=$BASE/d
PROPFILE=$CONFIG/e.properties
echo $FILE $PROPFILE
... and then sends it into a shell for processing.
Line-by-line explanation:
Line 1: Searches for the lines ending in a comma (the filenames), and sets FILE to the name.
Line 2: Searches for lines that set the properties file, and renames the variable to PROPFILE.
Line 3: Replaces the EndOfFile lines with a command to echo the file name and the properties file, then pipes it into a shell.
This is an excellent use case for structural regular expressions, which have been implemented as a Python library, amongst other places. Here's an article which describes how to emulate SREs in Perl.
And here is an awk script to process that input and generate what you want:
BEGIN {
    FS="="
    state = 0;
    base = "";
    config = "";
    prop = "";
    filename = "";
    dbg = 0;
}
/^BASE=/ {
    if (dbg) { print "BASE"; print $0; }
    if (state != 1) { print "Error base!"; exit 1; }
    state++;
    base = $2;
    if (dbg > 1) printf ("BASE = %s\n", base);
}
/^CONFIG=/ {
    if (dbg) { print "CONFIG"; print $0; }
    if (state != 2) { print "Error config!"; exit 1; }
    state++;
    config = $2;
    sub (/\$BASE/, base, config);
    if (dbg > 1) printf ("CONFIG = %s\n", config);
}
/^propertiesfile1=/ {
    if (dbg) { print "PROP"; print $0; }
    if (state != 3) { print "Error pF!"; exit 1; }
    state++;
    prop = $2;
    sub (/\$CONFIG/, config, prop);
}
/^EndOfFile/ {
    if (dbg) { print "EOF"; print $0; }
    if (state != 4) { print "Error EOF!"; print state; exit 1; }
    state = 0;
    printf ("%s%s,\n", filename, prop);
}
/,$/ {
    if (dbg) { print "FILENAME"; print $0; }
    if (state != 0) { print "Error filename!"; print state; exit 1; }
    state++;
    filename = $1;
}
gawk
gawk -vRS= 'BEGIN{FS="BASE[=]?|CONFIG|\n"}
{
  s=$1
  for(i=1;i<=NF;i++){
    if($i~/\// ){ s=s $i }
  }
  print s
  s=""
}' file
output
$ more file
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3
$ ./shell.sh
filename1,a/b/c/d/e.properties
filename2,f/g/h/i/j.properties
filename3,k/l/m/n/o.properties
A perl script that does what you want would be something like this (note this is untested):
while (<>) {
    $base = $1 if (m/BASE=(.+)/);
    $config = $1 if (m/CONFIG=(.+)/);
    if (m/propertiesfile1=(.+)/) {
        $props = $1;
        # s/// performs the substitution; m/// only matches
        $props =~ s/\$CONFIG/$config/;
        $props =~ s/\$BASE/$base/;
        print $ARGV . ", " . $props . "\n";
    }
}
You give the script the filenames as arguments.
Multi-steps but it works!
cat yourInputFile | egrep ',|\/' | \
  sed -e "s/^.*=//g" -e "s/\$.*\(\/.*\)/\1/g" | \
  awk '{if($0 ~ "properties") print $0; else printf $0}'
The egrep grabs the lines containing a "," or a "/" and so eliminates the last line:
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
The sed reduces the output to:
filename1,
a/b/c
/d
/e.properties
The awk portion reassembles the line to:
filename1,a/b/c/d/e.properties