awk, skip current rule upon sanity check - if-statement

How do I skip the rest of the current awk rule when its sanity check fails?
{
if (not_applicable) skip;
if (not_sanity_check2) skip;
if (not_sanity_check3) skip;
# the rest of the actions
}
IMHO, it's much cleaner to write code this way than:
{
if (!not_applicable) {
if (!not_sanity_check2) {
if (!not_sanity_check3) {
# the rest of the actions
}
}
}
}
1;
I need to skip only the current rule because I have a catch-all rule at the end.
UPDATE: the case I'm trying to solve.
There are multiple match points in a file that I want to match and alter; however, there's no other obvious sign I can use to match only the ones I want.
To simplify: I want to match and alter the first match, skip the rest of the matches, and print them as-is.

As far as I understand your requirement, you are looking for if / else if here. You could also use the switch statement available in newer versions of gawk.
Let's take an example Input_file:
cat Input_file
9
29
Here is the awk code:
awk -v var="10" '{if($0<var){print "Line " FNR " is less than var"} else if($0>var){print "Line " FNR " is greater than var"}}' Input_file
This prints:
Line 1 is less than var
Line 2 is greater than var
If you look at the code carefully, it checks:
The first condition, in the if block: if the current line is less than var, print that.
The second condition, in the else if block: if the current line is greater than var, print that.
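For the layout asked about in the question, note that awk has no statement that skips only the rest of the current rule: `next` abandons the whole record, so the catch-all `1;` would not run either. One common workaround, not taken from either answer, is to move the rule body into a user-defined function, where `return` plays the role of `skip`. A minimal sketch; the helper names `skip_demo` and `process` and the two checks (numeric line, non-zero value) are hypothetical stand-ins for `not_applicable` and the sanity checks:

```shell
# Hypothetical helper: skip_demo filters stdin through awk.
skip_demo() {
  awk '
    # Rule body moved into a function: each return skips only the rest
    # of this rule, like the skip statement in the question.
    function process() {
      if ($0 !~ /^[0-9]+$/) return     # stand-in for not_applicable
      if ($0 + 0 == 0) return          # stand-in for a further sanity check
      $0 = $0 " doubled=" ($0 * 2)     # "the rest of the actions"
    }
    { process() }
    1                                  # catch-all rule: runs for every line
  '
}
```

For example, `printf '9\nabc\n0\n' | skip_demo` prints `9 doubled=18`, then `abc` and `0` unchanged, because the catch-all rule still sees the lines that failed a check.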

I'm not really sure what you're trying to do, but if I focus on just the last sentence of your question, "I want to match & alter the first match and skip the rest of the matches and print them as-is", is this what you're trying to do?
{ s=1 }
s && /abc/ { $0="uvw"; s=0 }
s && /def/ { $0="xyz"; s=0 }
{ print }
e.g., to borrow @Ravinder's example:
$ cat Input_file
9
29
$ awk -v var='10' '
{ s=1 }
s && ($0<var) { $0="Line " FNR " is less than var"; s=0 }
s && ($0>var) { $0="Line " FNR " is greater than var"; s=0 }
{ print }
' Input_file
Line 1 is less than var
Line 2 is greater than var
I named the boolean flag variable s for "sane", since your question describes the conditions being tested as sanity checks; each condition can then be read as "is the input sane so far, and is this next condition true?".
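If "the first match" means only the first matching line in the whole file, rather than the first matching rule on each line, the flag has to survive across records instead of being reset by `{ s=1 }` on every line. A minimal sketch of that reading, reusing the placeholder pattern `abc` and replacement `uvw` from above; the helper name `alter_first_only` is hypothetical:

```shell
# Hypothetical helper: alter_first_only filters stdin through awk.
alter_first_only() {
  awk '
    !done && /abc/ { $0 = "uvw"; done = 1 }   # alter only the first matching line
    { print }                                 # print every line, altered or not
  '
}
```

With input `abc`, `xyz`, `abc`, only the first line becomes `uvw`; the later `abc` is printed as-is.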


In awk, divide values into arrays, count them, then compare

I have a CSV file in which column 2 has values separated by "," and column 3 has values separated by "|". I need to count the values in both columns and compare them. If both counts are equal, column 4 should print passed; if not, it should print failed. I have written the awk script below, but it doesn't give me what I expected:
cat /tmp/test.csv
awk -F '' 'BEGIN{ OFS=";"; print "sep=;\nresource;Required_packages;Installed_packages;Validation;"};
{
column=split($2,aray,",")
columns=split($3,aray,"|")
Count=${#column[@]}
Counts=${#column[@]}
if( Counts == Count)
print $1,$2,$3,"passed"
else
print $1,$2,$3,"failed";}'/tmp/test.csv
My CSV file looks like this:
resource Required_Packages Installed_packages
--------------------------------------------------
Vm1 a,b,c,d a|b|c
vm2 a,b,c,d b|a
vm3 a,b,c,d c|b|a
My expected output:
resource Required_packages Installed_packages Validation
------------------------------------------------------------------
Vm1 a,b,c,d a|b|c Failed
vm2 a,b,c,d b|a Failed
vm3 a,b,c,d c|b|a|d Passed
Your code doesn't match the input/output data (where are the dashes printed, etc.?), but
this code segment
column=split($2,aray,",")
columns=split($3,aray,"|")
Count=${#column[@]}
Counts=${#column[@]}
if( Counts == Count)
print $1,$2,$3,"passed"
else
print $1,$2,$3,"failed";
can be replaced with
print $1,$2,$3,(split($2,a,",")==split($3,a,"|")?"Passed":"Failed")
Also, just checking the counts may not be enough; I think you should be checking the matches as well.
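A minimal sketch of checking the matches rather than only the counts, assuming whitespace-separated columns and the two header lines shown above: a row passes only when every required package appears in the installed list, in any order. Extra installed packages do not cause a failure in this version. The helper name `validate_pkgs` is hypothetical.

```shell
# Hypothetical helper: validate_pkgs reads the table on stdin.
validate_pkgs() {
  awk '
    FNR <= 2 { print; next }               # pass the two header lines through
    {
      n = split($2, req, ",")              # required packages, comma-separated
      m = split($3, inst, "|")             # installed packages, pipe-separated
      split("", have)                      # portable way to empty the array
      for (k = 1; k <= m; k++) have[inst[k]] = 1
      ok = 1
      for (i = 1; i <= n; i++) if (!(req[i] in have)) { ok = 0; break }
      print $0, (ok ? "Passed" : "Failed")
    }
  '
}
```

Piping the output through `column -t`, as in the answer below, aligns the columns.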
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
FNR<=2{
print
next
}
{
num=split($2,array1,",")
num1=split($3,array2,"|")
for(i=1;i<=num;i++){
value[array1[i]]
}
for(k=1;k<=num1;k++){
if(array2[k] in value){ count++ }
}
if(count==num){ $(NF+1)="Passed" }
else { $(NF+1)="Failed" }
count=num=num1=""
delete value
}
1
' Input_file | column -t
Explanation: Adding detailed explanation for above solution.
awk ' ##Starting awk program from here.
FNR<=2{ ##Checking condition if line number is lesser or equal to 2 then do following.
print ##Printing current line here.
next ##next will skip all further statements from here.
}
{
num=split($2,array1,",") ##Splitting 2nd field into array named array1 with field separator of comma and num will have total number of elements of array1 in it.
num1=split($3,array2,"|") ##Splitting 3rd field into array named array2 with field separator of pipe and num1 will have total number of elements of array2 in it.
for(i=1;i<=num;i++){ ##Starting a for loop from 1 to till value of num here.
value[array1[i]] ##Creating array value with each element of array1 as a key (no value assigned).
}
for(k=1;k<=num1;k++){ ##Starting a for loop from from 1 to till value of num1 here.
if(array2[k] in value){ count++ } ##Checking condition if array2 with index k is present in value then increase variable of count here.
}
if(count==num){ $(NF+1)="Passed" } ##Checking condition if count equal to num then adding Passed to new last column of current line.
else { $(NF+1)="Failed" } ##Else adding Failed into new last field of current line.
count=num=num1="" ##Nullify variables count, num and num1 here.
delete value ##Deleting array value so it starts empty for the next line.
}
1 ##1 will print current line.
' Input_file | column -t ##Mentioning Input_file and passing its output to column command here.

AWK: dynamically change FS or RS

I can't figure out the trick to change the FS/RS variables dynamically so that I get the following output from the input below:
Input_file
header 1
header 2
{
something should not be removed
}
50
(
auto1
{
type good;
remove not useful;
}
auto2
{
type good;
keep useful;
}
auto3
{
type moderate;
remove not useful;
}
)
Output_file
header 1
header 2
{
something that should not be removed
}
50
(
auto1//good
{
type good;//good
}
auto2//good
{
type good;//good
keep useful;
}
auto3//moderate
{
type moderate;//moderate
}
)
The key things are:
No change should happen when a code block {...} is not preceded by an autoX (X can be 1, 2, 3, etc.).
The changes should happen when autoX is followed by a code block {...}.
The autoX line and the type line inside the code block get //good or //moderate appended; that value needs to be read from the {...} block itself.
Any line inside {...} that contains the word remove should be deleted entirely.
HINT: It might be something that can use regex and the idea explained here, with this particular example.
So far, I have only been able to meet the last requirement, with the following code:
awk ' {$1=="{"; FS=="}";} {$1!="}"; gsub("remove",""); print NR"\t\t"$0}' Input_file
Thanks in advance for your skill and time in tackling this problem with awk.
Here is my attempt to solve this problem:
awk '
FNR==NR{
if($0~/auto[0-9]+/){
found1=1
val=$0
next
}
if(found1 && $0 ~ /{/){
found2=1
next
}
if(found1 && found2 && $0 ~ /type/){
sub(/;/,"",$NF)
a[val]=$NF
next
}
if($0 ~ /}/){
found1=found2=val=""
}
next
}
found3 && /not useful/{
next
}
/}/{
found3=val1=""
}
found3 && /type/{
sub($NF,$NF"//"a[val1])
}
/auto[0-9]+/ && $0 in a{
print $0"//"a[$0]
found3=1
val1=$0
next
}
1
' Input_file Input_file
Explanation: Adding detailed explanation for above code here.
awk ' ##Starting awk program from here.
FNR==NR{ ##FNR==NR will be TRUE when first time Input_file is being read.
if($0~/auto[0-9]+/){ ##Check condition if a line is having auto string followed by digits then do following.
found1=1 ##Setting found1 to 1 which makes sure that the line with auto is FOUND to later logic.
val=$0 ##Storing current line value to variable val here.
next ##next will skip all further statements from here.
}
if(found1 && $0 ~ /{/){ ##Checking condition if found1 is SET and line has { in it then do following.
found2=1 ##Setting found2 value as 1 which tells program further that after auto { is also found now.
next ##next will skip all further statements from here.
}
if(found1 && found2 && $0 ~ /type/){ ##Checking condition if found1 and found2 are SET AND line has type in it then do following.
sub(/;/,"",$NF) ##Substituting semi colon in last field with NULL.
a[val]=$NF ##Creating array a with index val and its value is last column of current line.
next ##next will skip all further statements from here.
}
if($0 ~ /}/){ ##Checking if line has } in it then do following, which basically means previous block is getting closed here.
found1=found2=val="" ##Nullify all variables value found1, found2 and val here.
}
next ##next will skip all further statements from here.
}
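found3 && /not useful/{ ##Statements from here will be executed when 2nd time Input_file is being read; checking if found3 is SET and line has not useful in it.
next ##next will skip this line, so it is not printed.
}
/}/{ ##Checking if line has } here.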
found3=val1="" ##Nullifying found3 and val1 variables here.
}
found3 && /type/{ ##Checking if found3 is SET and line has type keyword in it then do following.
sub($NF,$NF"//"a[val1]) ##Substituting last field value with last field and array a value with index val1 here.
}
/auto[0-9]+/ && $0 in a{ ##Searching string auto with digits and checking if current line is present in array a then do following.
print $0"//"a[$0] ##Printing current line // and value of array a with index $0.
found3=1 ##Setting found3 value to 1 here.
val1=$0 ##Setting current line value to val1 here.
next ##next will skip all further statements from here.
}
1 ##1 will print all edited/non-edited lines here.
' Input_file Input_file ##Mentioning Input_file names here.
You can use two newlines as the record separator and process each record, which may contain one
autoX
{
...
...
}
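block. This assumes the blocks in Input_file are separated by empty lines; also, POSIX leaves a multi-character RS unspecified, so use GNU awk (or another awk that supports it).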
awk '
BEGIN{
RS="\n\n" # set record separator RS to two newlines
a["good"]; a["moderate"] # create array a with indices "good" and "moderate"
}
{
sub(/\n[ \t]+remove[^;]+;/, "") # remove line containing "remove xxx;"
for (i in a){ # loop array indices "good" and "moderate"
if (index($0, i)){ # if value exists in record
sub(i";", i";//"i) # add "//good" to "good;" or "//moderate" to "moderate;"
match($0, /(auto[0-9]+)/) # get pos. RSTART and length RLENGTH of "autoX"
if (RSTART){ # RSTART > 0 ?
# set prefix including "autox", "//value" and suffix
$0=substr($0, 1, RSTART+RLENGTH-1) "//"i substr($0, RSTART+RLENGTH)
}
break # stop looping (we already replaced "autoX")
}
}
printf "%s", (FNR==1 ? "" : RS)$0 # print modified line prefixed by RS if not the first line
}
' Input_file
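If the blocks are separated by empty lines, POSIX awk's paragraph mode (`RS=""`) gives the same one-block-per-record view without relying on a multi-character RS; with `FS="\n"`, each line of the block becomes a field. A minimal sketch that only shows the mechanics, printing each block's first line with its type appended; it does not produce the full expected output, and the helper name `tag_blocks` is hypothetical:

```shell
# Hypothetical helper: tag_blocks reads blank-line-separated blocks on stdin.
tag_blocks() {
  awk '
    BEGIN { RS = ""; FS = "\n" }               # paragraph mode: one record per block
    match($0, /type [a-z]+;/) {                # find "type xxx;" inside the block
      t = substr($0, RSTART + 5, RLENGTH - 6)  # strip "type " and the ";"
      print $1 "//" t                          # $1 is the first line, e.g. auto1
    }
  '
}
```

For two blocks headed `auto1` (type good) and `auto3` (type moderate), this prints `auto1//good` and `auto3//moderate`.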

Regex match scss function / mixin

I am trying to match a function or mixin used in an SCSS string so I may remove it but I am having a bit of trouble.
For those unfamiliar with SCSS, this is an example of the things I am trying to match (from Bootstrap 4).
@mixin _assert-ascending($map, $map-name) {
$prev-key: null;
$prev-num: null;
@each $key, $num in $map {
@if $prev-num == null {
// Do nothing
} @else if not comparable($prev-num, $num) {
@warn "Potentially invalid value for #{$map-name}: This map must be in ascending order, but key '#{$key}' has value #{$num} whose unit makes it incomparable to #{$prev-num}, the value of the previous key '#{$prev-key}' !";
} @else if $prev-num >= $num {
@warn "Invalid value for #{$map-name}: This map must be in ascending order, but key '#{$key}' has value #{$num} which isn't greater than #{$prev-num}, the value of the previous key '#{$prev-key}' !";
}
$prev-key: $key;
$prev-num: $num;
}
}
And a small function:
@function str-replace($string, $search, $replace: "") {
$index: str-index($string, $search);
@if $index {
@return str-slice($string, 1, $index - 1) + $replace + str-replace(str-slice($string, $index + str-length($search)), $search, $replace);
}
@return $string;
}
So far I have the following regex:
@(function|mixin)\s?[[:print:]]+\n?([^\}]+)
However, it only matches up to the first } that it finds, which makes it fail, because it needs to find the last occurrence of the closing curly brace.
My thought is that a regex capable of matching a function definition could be adapted, but I can't find a good one with my Google-fu!
Thanks in advance!
I would not recommend using a regex for that, since a regex cannot handle recursion, which is what you might need in this case.
For instance:
@mixin test {
body {
}
}
This includes two »levels« of scope ({{ }}), so your regex would have to count braces as they open and close to find the end of the mixin or function. That is not possible with a standard regular expression.
This regex
/@mixin(.|\s)*\}/gm
will match the whole mixin, but if the input is like that:
@mixin foo { … }
body { … }
It will match everything up to the last }, which includes the style definition for body. That is because the regex cannot know which } closes the mixin.
Have a look at this answer; it explains more or less the same thing, but for matching HTML elements.
Instead, you should use a parser to parse the whole stylesheet into a syntax tree, then remove the unneeded functions, and then write it back to a string.
In fact, as @philipp said, a regex can't replace the syntax analysis that compilers do.
But here is a sed command that is a little ugly but could do the trick:
sed -r -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' -e 's/}\s*@(function|mixin)/}\n@\1/g' -e 's/^@(function|mixin)\s*str-replace(\s|\()+.*}$//gm' <your file>
-e ':a' -e 'N' -e '$!ba' -e 's/\n//g' : Read the whole file in a loop and remove the newlines (see https://stackoverflow.com/a/1252191/7990687 for more information)
-e 's/}\s*@(function|mixin)/}\n@\1/g' : Make each @mixin or @function statement the start of a new line, and the preceding } the last character of the previous line
-e 's/^@(function|mixin)\s*str-replace(\s|\()+.*}$//gm' : Remove the line corresponding to the @function str-replace or @mixin str-replace declaration
But the output will lose its indentation, so you will have to reindent it afterwards.
I tried it on a file where I copied and pasted the sample code you provided multiple times, so you will have to try it on your own file, because there could be cases where the regex matches more than wanted. If that is the case, provide us with a test file so we can try to resolve these issues.
After much headache, here is the answer to my question!
The source needs to be split into lines and read line by line, maintaining a count of the open / closed braces to determine when the count returns to 0.
$pattern = '/(?<remove>@(function|mixin)\s?[\w-]+[$,:"\'()\s\w\d]+)/';
$subject = file_get_contents('vendor/twbs/bootstrap/scss/_variables.scss'); // just a regular SCSS file containing what I've already posted.
$lines = explode("\n",$subject);
$total_lines = count($lines);
foreach($lines as $line_no=>$line) {
if(preg_match($pattern,$line,$matches)) {
$match = $matches['remove'];
$counter = 0;
$open_braces = $closed_braces = 0;
for($i=$line_no;$i<$total_lines;$i++) {
$current = $lines[$i];
$open_braces = substr_count($current,"{");
$closed_braces = substr_count($current,"}");
$counter += ($open_braces - $closed_braces);
if($counter==0) {
$start = $line_no;
$end = $i;
foreach(range($start,$end) as $a) {
unset($lines[$a]);
} // end foreach(range)
break; // block is closed: stop scanning forward
} // end if($counter==0)
} // end for loop
} // end preg_match
} // end foreach($lines)
And we end up with a $lines array without any functions or mixins.
There is probably a more elegant way to do this, but I don't have the time or the will to write an AST parser for SCSS.
This could quite easily be adapted into a hacked-together one, however!
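The same brace-counting idea can be sketched in awk, assuming braces only appear as real block delimiters or in balanced pairs on one line (such as the `#{...}` interpolations above); a lone brace inside a string or comment would throw the count off. The helper name `strip_scss_defs` is hypothetical.

```shell
# Hypothetical helper: strip_scss_defs removes @function/@mixin blocks from stdin.
strip_scss_defs() {
  awk '
    !skipping && /^[ \t]*@(function|mixin)[ \t]/ { skipping = 1; depth = 0 }
    skipping {
      opens = gsub(/[{]/, "{")     # gsub returns how many "{" this line has
      closes = gsub(/[}]/, "}")    # ...and how many "}"
      depth += opens - closes
      # Stop once braces have been seen and every opened block is closed again.
      if (depth <= 0 && opens + closes > 0) skipping = 0
      next                         # drop every line of the definition
    }
    { print }
  '
}
```

Lines such as `} @else if ... {` open and close one block each, so the depth is unchanged and skipping continues until the outermost `}`.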

Replace several occurrences of the same character in different ways in AWK

I want to replace several characters in a csv file depending on the characters around them using AWK.
For example in this line:
"Example One; example one; EXAMPLE ONE; E. EXAMPLE One"
I would like to replace every capital "E" with "EE" if it is within a word that uses only capitals, and with "Ee" if it is in a word with upper and lower case letters or in an abbreviation (like the "E."; it's an address file, so there are no cases where this could also be the end of a sentence). So it should look like this:
"Eexample One; example one; EEXAMPLEE ONEE; Ee. EEXAMPLEE One"
Now what I have tried is this:
{if ($0 ~/E[A-Z]+/)
$0 = gensub(/E/,"EE","g",$0)
else if ($0 ~/[A-Z]E/)
$0 = gensub(/E/,"EE","g",$0)
else
$0 = gensub(/E/,"Ee","g",$0)
}
This works fine in most cases, but for lines (or fields, for that matter) that contain several "E"s where I'd want one to be replaced with "Ee" and another with "EE", like in "E. EXAMPLE One", it matches the E in "EXAMPLE" and replaces all "E"s in that line with "EE".
Is there a better way to do this? Can I maybe somehow use if within gensub?
PS: I hope this makes sense; I just started learning the basics of programming!
$ cat tst.awk
{
head = ""
tail = $0
while ( match(tail,/[[:alpha:]]+\.?/) ) {
tgt = substr(tail,RSTART,RLENGTH)
add = (tgt ~ /^[[:upper:]]+$/ ? "E" : "e")
gsub(/E/,"&"add,tgt)
head = head substr(tail,1,RSTART-1) tgt
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}
$ awk -f tst.awk file
Eexample One; example one; EEXAMPLEE ONEE; Ee. EEXAMPLEE One
It's not clear though how you distinguish a string of letters followed by a period as an abbreviation or just the end of a sentence.

Regex performance: validating alphanumeric characters

When trying to validate that a string is made up of alphabetic characters only, two possible regex solutions come to my mind.
The first one checks that every character in the string is alphabetic:
/^[a-z]+$/
The second one tries to find a character somewhere in the string that is not alphabetic:
/[^a-z]/
(Yes, I could use character classes here.)
Is there any significant performance difference for long strings?
(If anything, I'd guess the second variant is faster.)
Just by looking at it, I'd say the second method is faster.
However, I made a quick non-scientific test, and the results seem to be inconclusive:
Regex Match vs. Negation.
P.S. I removed the group capture from the first method. It's superfluous, and would only slow it down.
I wrote this quick Perl code:
@testStrings = qw(asdfasdf asdf as aa asdf as8up98;n;kjh8y puh89uasdf ;lkjoij44lj 'aks;nasf na ;aoij08u4 43[40tj340ij3 ;salkjaf; a;lkjaf0d8fua ;alsf;alkj
a a;lkf;alkfa as;ldnfa;ofn08h[ijo ok;ln n ;lasdfa9j34otj3;oijt 04j3ojr3;o4j ;oijr;o3n4f;o23n a;jfo;ie;o ;oaijfoia ;aosijf;oaij ;oijf;oiwj;
qoeij;qwj;ofqjf08jf0 ;jfqo;j;3oj4;oijt3ojtq;o4ijq;onnq;ou4f ;ojfoqn;aonfaoneo ;oef;oiaj;j a;oefij iiiii iiiiiiiii iiiiiiiiiii);
print "test 1: \n";
foreach my $i (1..1000000) {
foreach (@testStrings) {
if ($_ =~ /^([a-z])+$/) {
#print "match"
} else {
#print "not"
}
}
}
print `date` . "\n";
print "test 2: \n";
foreach my $j (1..1000000) {
foreach (@testStrings) {
if ($_ =~ /[^a-z]/) {
#print "match"
} else {
#print "not"
}
}
}
then ran it with:
date; <perl_file>; date
It isn't 100% scientific, but it gives us a good idea: the first regex took 10 or 11 seconds to execute, and the second regex took 8 seconds.
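Apart from speed, the two checks are not exact complements: `/^[a-z]+$/` requires at least one letter, so it rejects the empty string, while `/[^a-z]/` finds nothing to complain about in an empty string. A small awk sketch (the helper name `compare_checks` is hypothetical) prints both verdicts side by side:

```shell
# Hypothetical helper: compare_checks prints both verdicts for each input line.
compare_checks() {
  awk '{
    a = ($0 ~ /^[a-z]+$/) ? "valid" : "invalid"   # check 1: one or more a-z, nothing else
    b = ($0 ~ /[^a-z]/) ? "invalid" : "valid"     # check 2: no character outside a-z
    print "[" $0 "] " a " " b
  }'
}
```

For the lines `abc`, `ab8` and an empty line, the verdicts agree on the first two and differ on the empty line (`invalid` vs `valid`). Using `*` instead of `+` in the first pattern makes the two checks agree.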