sed: replacing text if the next line does not start with }

I would like to replace ; with , if the next line is not empty and does not start with }.
If the next line starts with } or is empty, then the ; must be removed instead.
For example:
struct Point {
float x;
float y;
};
should be changed as shown below:
type something{
type record Point{
c_float x,
c_float y
};
}
All my other changes via sed are working except this ';' to ',' one, and I really do not have any more ideas how to continue with this.. :(

With awk instead of sed:
echo 'struct Point {
float x;
float y;
};' |
awk '
$1 == "};" {
print prev
print
print "}"
in_struct = 0
}
in_struct {
if (prev) {print prev ","}
prev = $0
sub(/; *$/, "", prev)
sub(/float/, "c_&", prev)
}
$1 == "struct" {
print "type something {"
$1 = "type record"
print
prev = ""
in_struct = 1
}
'
outputs
type something {
type record Point {
c_float x,
c_float y
};
}

[DoD#MBP-13~/temp] cat file
struct Point {
float x;
float y;
};
[DoD#MBP-13~/temp] sed ' /^}\|^$/ s/;// ' file | sed 'N ; /}/! s/;/,/ ' | sed 'N ; /}/ s/;//'
struct Point {
float x,
float y
}
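The first sed pass looks for lines that start with "}" or are blank and removes the ";" from them. The second pass appends the next line to the pattern space (N), checks that the pair does not contain "}", and then substitutes the first ";" with ",". The third pass pairs the lines the same way and, if the pair contains "}", removes the first ";". Because N consumes the input two lines at a time, this relies on the line layout of the sample; a struct with a different number of fields may not be converted correctly.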
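A variation that examines every line together with the line after it, instead of reading fixed pairs, is sed's sliding-window idiom ($!N ... P; D). The following is only a sketch under assumptions: it expects GNU sed (for -E and for \n in the replacement), the function name fix_separators is made up for illustration, and it handles only the ";" to "," part, not the "struct" to "type record" rewriting.

```shell
# Hypothetical helper (the name is illustrative); assumes GNU sed.
fix_separators() {
  # $!N    append the next line, so the pattern space holds "current\nnext"
  # 1st s  drop the ";" when the next line starts with "}" or is empty
  # 2nd s  otherwise turn the ";" at the end of the current line into ","
  # P; D   print the current line, then restart with "next" as the current line
  sed -E '$!N; s/;\n(\}|$)/\n\1/; s/;\n/,\n/; P; D'
}

printf '%s\n' 'struct Point {' 'float x;' 'float y;' 'float z;' '};' | fix_separators
```

With this input the three fields come out as "float x,", "float y," and "float z", and the closing "};" on the last line is left alone. A "};" that is itself followed by an empty line would lose its ";", so the sketch would need an extra guard for that case.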

This might work:
a="SOMETHING"
sed -r '/^struct\s*\S*\s*\{/,/^\};/s/^struct\s*(\S*)\s*\{/type '"$a"'{\ntype record \1\{/;s/^\};/&\n}/' input_file |
sed -r '/^type '"$a"'\{/,/^\}/{//b;/^type/b;/^\};/!N;/.\n/{s/\s*/&c_/;/\n(\};|$)/{s/;//;ta};{s/;/,/}};:a;P;D};P;D'
Owing to the nature of the problem I found it easiest to use two passes; it can probably be condensed.
BTW, this was done with GNU sed; other seds might not work!

Related

How to use sed to extract numbers from a comma separated string?

I managed to extract the following response and comma-separate it. It's a comma-separated string and I'm only interested in the comma-separated values of the account IDs. How do you pattern match this using sed?
Input: ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev
Expected Output: 711111111119, 111111111115
My $input variable stores the input.
I tried the command below, but it joins all the numbers together and I would like to preserve the comma ',':
echo $input | sed -e "s/[^0-9]//g"
I think you're better served with awk:
awk -v FS=, '{for(i=1;i<=NF;i++)if($i~/[0-9]/){printf "%s%s",sep,$i;sep=","}}'
If you really want sed, you can go for
sed -e "s/[^0-9]/,/g" -e "s/,,*/,/g" -e "s/^,\|,$//g"
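To see why the sed approach keeps the commas, it may help to read it as separate steps. This is a minimal sketch of the same idea, written with portable expressions (two plain anchored substitutions instead of the \| alternation); the function name digit_runs_csv is an assumption made for illustration.

```shell
# Hypothetical helper; the function name is illustrative.
digit_runs_csv() {
  # 1st -e         replace every non-digit character with a comma
  # 2nd -e         squeeze each run of commas into a single comma
  # 3rd and 4th -e strip the comma left at the start and at the end of the line
  sed -e 's/[^0-9]/,/g' -e 's/,,*/,/g' -e 's/^,//' -e 's/,$//'
}

input='ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev'
printf '%s\n' "$input" | digit_runs_csv
```

Note that this keeps every run of digits, so a value such as dev2 in another field would also contribute a number; matching on the ACCOUNT_ID key avoids that.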
$ awk '
BEGIN {
FS = OFS = ","
}
{
c = 0
for (i = 1; i <= NF; i++) {
if ($i == "ACCOUNT_ID") {
printf "%s%s", (c++ ? OFS : ""), $(i + 1)
}
}
print ""
}' file
711111111119,111111111115

awk regex for nested big brackets

I have a structure like this:
label1 {
label1_1 {
item1_1_1: "value1_1_1";
label1_1_2:{ item1_1_2_1: "value1_1_2_1";};
item1_1_3: "value1_1_3";
};
label1_2 {...};
...
};
label2 {
item2_1: "value2_1";
label2_1:{
item2_1_1: "value2_1_1";
...
};
};
A section can be on one line or span multiple lines, and it may contain empty lines. I'm trying to use awk to get any section with a given label name:
section=$(awk -v RS='' -v ORS='\n\n' "/($2)\s(\{([^{}]|(?R)|\n)*\})/" $1)
where $1 is the file name and $2 is the label name. It works if there is no empty line in the section, for example "label2", but fails for the others.
What's the correct regex I should use?
Here's one way to do what you want, assuming neither { nor } can occur within quoted strings and using GNU awk 4.* for a couple of extensions:
$ cat tst.awk
BEGIN { RS="^$" }
{
tmp = $0
while ( match(tmp,/(\<([[:alnum:]_]+):?\s*{[^{}]+};)/,a) ) {
start[a[2]] = RSTART
lgth[a[2]] = RLENGTH
tmp = substr(tmp,1,RSTART-1) sprintf("%*s",length(a[1]),"") substr(tmp,RSTART+RLENGTH)
}
}
label in start { print substr($0,start[label],lgth[label]) }
$ awk -v label='label2' -f tst.awk file
label2 {
item2_1: "value2_1";
label2_1:{
item2_1_1: "value2_1_1";
...
};
};
$ awk -v label='label1_1' -f tst.awk file
label1_1 {
item1_1_1: "value1_1_1";
label1_1_2:{ item1_1_2_1: "value1_1_2_1";};
item1_1_3: "value1_1_3";
};
$ awk -v label='label1_1_2' -f tst.awk file
label1_1_2:{ item1_1_2_1: "value1_1_2_1";};
You can call awk as either awk -f scriptfile inputfile or awk 'script' inputfile, so using the above awk script inline instead of storing it in a file is just:
awk '
BEGIN { RS="^$" }
{
tmp = $0
while ( match(tmp,/(\<([[:alnum:]_]+):?\s*{[^{}]+};)/,a) ) {
start[a[2]] = RSTART
lgth[a[2]] = RLENGTH
tmp = substr(tmp,1,RSTART-1) sprintf("%*s",length(a[1]),"") substr(tmp,RSTART+RLENGTH)
}
}
label in start { print substr($0,start[label],lgth[label]) }
' file
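If GNU awk is not available, a different technique (not the one used above) is to slurp the input and count brace depth from the label onwards. The following is only a sketch under the same assumption that neither { nor } occurs inside quoted strings; the function name extract_section is hypothetical, and it prints only the first section whose label matches.

```shell
# Hypothetical helper: print the first section that starts with the given label.
# Reads the structure on standard input; assumes no braces inside quoted strings.
extract_section() {
  awk -v label="$1" '
    { text = text $0 "\n" }                      # slurp the whole input
    END {
      pos = 1
      while ((i = index(substr(text, pos), label)) > 0) {
        start = pos + i - 1
        prev = (start > 1) ? substr(text, start - 1, 1) : ""
        tail = substr(text, start + length(label))
        # accept only a whole-word label followed by an optional ":" and a "{"
        if (prev !~ /[A-Za-z0-9_]/ && match(tail, /^:?[ \t\n]*[{]/)) {
          depth = 0
          for (j = start + length(label); j <= length(text); j++) {
            c = substr(text, j, 1)
            if (c == "{") depth++
            else if (c == "}" && --depth == 0) break   # matching close found
          }
          out = substr(text, start, j - start + 1)
          if (substr(text, j + 1, 1) == ";") out = out ";"
          print out
          exit
        }
        pos = start + 1                          # keep searching after this hit
      }
    }
  '
}

printf '%s\n' 'outer {' 'inner:{ a: "1";};' '' 'b: "2";' '};' | extract_section inner
```

Because the whole input is treated as one string, empty lines inside a section need no special handling.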

awk remove unwanted records and consolidate multiline fields to one line record in specific order

I have an output file that I am trying to process into a formatted csv for our audit team.
I thought I had this mastered until I stumbled across bad data within the output. As such, I want to be able to handle this using awk.
MY OUTPUT FILE EXAMPLE
Enter password ==>
o=hoster
ou=people,o=hoster
ou=components,o=hoster
ou=websphere,ou=components,o=hoster
cn=joe-bloggs,ou=appserver,ou=components,o=hoster
cn=joe
sn=bloggs
cn=S01234565
uid=bloggsj
cn=john-blain,ou=appserver,ou=components,o=hoster
cn=john
uid=blainj
sn=blain
cn=andy-peters,ou=appserver,ou=components,o=hoster
cn=andy
sn=peters
uid=petersa
cn=E09876543
THE OUTPUT I WANT AFTER PROCESSING
joe,bloggs,s01234565;uid=bloggsj,cn=joe-bloggs,ou=appserver,ou=components,o=hoster
john,blain;uid=blainj;cn=john-blain,ou=appserver,ou=components,o=hoster
andy,peters,E09876543;uid=E09876543;cn=andy-peters,ou=appserver,ou=components,o=hoster
As you can see:
we always have a cn= variable that contains o=hoster
uid can have any value
we may have multiple cn= variables without o=hoster
I have achieved the following:
cat output | awk '!/^o.*/ && !/^Enter.*/{print}' | awk '{getline a; getline b; getline c; getline d; print $0,a,b,c,d}' | awk -v srch1="cn=" -v repl1="" -v srch2="sn=" -v repl2="" '{ sub(srch1,repl1,$2); sub(srch2,repl2,$3); print $4";"$2" "$3";"$1 }'
Any pointers or guidance on using awk are greatly appreciated. Or should I give up and just use the age-old, long-winded method of a large looping script to process the file?
You may try the following awk code. Note that its !NF tests assume each record is followed by an empty line.
$ cat file
Enter password ==>
o=hoster
ou=people,o=hoster
ou=components,o=hoster
ou=websphere,ou=components,o=hoster
cn=joe-bloggs,ou=appserver,ou=components,o=hoster
cn=joe
sn=bloggs
cn=S01234565
uid=bloggsj
cn=john-blain,ou=appserver,ou=components,o=hoster
cn=john
uid=blainj
sn=blain
cn=andy-peters,ou=appserver,ou=components,o=hoster
cn=andy
sn=peters
uid=petersa
cn=E09876543
Awk Code :
awk '
function out(){
print s,u,last
i=0; s=""
}
/^cn/,!NF{
++i
last = i == 1 ? $0 : last
s = i>1 && !/uid/ && NF ? s ? s "," $NF : $NF : s
u = /uid/ ? $0 : u
}
i && !NF{
out()
}
END{
out()
}
' FS="=" OFS=";" file
Resulting
joe,bloggs,S01234565;uid=bloggsj;cn=joe-bloggs,ou=appserver,ou=components,o=hoster
john,blain;uid=blainj;cn=john-blain,ou=appserver,ou=components,o=hoster
andy,peters,E09876543;uid=petersa;cn=andy-peters,ou=appserver,ou=components,o=hoster
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
This awk script works for your sample and produces the sample output:
BEGIN { delete cn[0]; OFS = ";" }
function print_info() {
if (length(cn)) {
names = cn[1] "," sn
for (i=2; i <= length(cn); ++i) names = names "," cn[i]
print names, uid, dn
delete cn
}
}
/^cn=/ {
if ($0 ~ /o=hoster/) dn = $0
else {
cn[length(cn)+1] = substr($0, index($0, "=") + 1)
uid = $0; sub("cn", "uid", uid)
}
}
/^sn=/ { sn = substr($0, index($0, "=") + 1) }
/^uid=/ { uid = $0 }
/^$/ { print_info() }
END { print_info() }
This should help you get started.
awk '$1 ~ /^cn/ {
for (i = 2; i <= NF; i++) {
if ($i ~ /^uid/) {
u = $i
continue
}
sub(/^[^=]*=/, x, $i)
r = length(r) ? r OFS $i : $i
}
print r, u, $1
r = u = x
}' OFS=, RS= infile
I assume that there is an error in your sample output: in the third record the uid should be petersa and not E09876543.
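The RS= setting above puts awk into paragraph mode, where each block of lines separated by empty lines is one record. A minimal sketch of the same idea with one field per line follows; it assumes the records really are separated by empty lines and that the entry line ends in o=hoster, and the function name ldap_records_to_csv is made up for illustration.

```shell
# Hypothetical helper; assumes records are separated by empty lines.
ldap_records_to_csv() {
  awk '
    BEGIN { RS = ""; FS = "\n"; OFS = ";" }   # one record per paragraph, one field per line
    $1 ~ /^cn=.*o=hoster$/ {
      names = uid = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^uid=/) { uid = $i; continue }
        value = $i
        sub(/^[^=]*=/, "", value)             # drop the "cn=" or "sn=" prefix
        names = (names == "") ? value : names "," value
      }
      print names, uid, $1
    }
  '
}

printf '%s\n' 'ou=people,o=hoster' '' \
  'cn=joe-bloggs,ou=appserver,ou=components,o=hoster' 'cn=joe' 'sn=bloggs' 'cn=S01234565' 'uid=bloggsj' '' \
  'cn=john-blain,ou=appserver,ou=components,o=hoster' 'cn=john' 'uid=blainj' 'sn=blain' |
  ldap_records_to_csv
```

Records whose first line does not start with cn= (the ou= entries) are skipped, and the order of the sn and uid lines inside a record does not matter.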
You might want to look at some of the "already been there and done that" solutions to accomplish the task.
Apache Directory Studio for example, will do the LDAP query and save the file in CSV or XLS format.
-jim

What does this awk sentence mean?

I have the following command in awk:
$ gawk '$2 == "-" { print $1 }' file
I was wondering what exactly this instruction does, because I can't work out how to pick out exactly the words I need.
Edit: How can I skip the lines before the following asterisks?
Let's say I have the following lines:
text
text
text
* * * * * * *
line1 -
line2 -
And then I want to filter just
line1
line2
with the sentence I posted above...
$ gawk '$2 == "-" { print $1 }' file
Thanks for your time and response!
This will find all lines on which the second column (separated by whitespace) is a -, and will then print the first column.
The first part ($2 == "-") checks whether the second column is a -; if it is, awk runs the attached {} block, which prints the first column ($0 is the whole line, and $1, $2, etc. are the first, second, ... columns).
Whitespace is the separator here simply because it is the default field separator in awk.
Edit: To do what you want to do now, try the following (Not the most elegant, but it should work.)
gawk 'BEGIN { p = 0 } { if (p != 0 && $2 == "-") { print $1 } else { p = ($0 == "* * * * * * *")? 1 : 0 } }'
Spread over more lines for clarity on what's happening:
gawk 'BEGIN { p = 0 }
{ if (p != 0 && $2 == "-")
{ print $1 }
else
{ p = ($0 == "* * * * * * *")? 1 : 0 }
}'
Answer to the original question:
If the second column of a line in the file is exactly the string "-", awk prints the first column of that line. Columns are separated by whitespace by default.
This would match and print out one:
one - two three
This would not:
one two three four
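Those two cases can be tried directly. This is a small sketch that feeds both lines to the command from the question; the wrapper function first_column_if_dash is only there for illustration.

```shell
# Hypothetical wrapper around the command from the question.
first_column_if_dash() {
  awk '$2 == "-" { print $1 }'   # pattern: second field is "-"; action: print first field
}

printf '%s\n' 'one - two three' 'one two three four' | first_column_if_dash
```

Only the first line has "-" as its second field, so only its first field is printed.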
Answer to the revised question:
This code should do what you need after the match of the given string:
awk '/\* \* \* \* \* \* \*/{i++}i && $2 == "-" { print $1 }' data2.txt
Testing on this data gives the following output:
2two
2two
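The same flag idea can be checked against the sample from the question. This is a sketch, assuming the marker line is exactly "* * * * * * *"; the function name after_marker and the extra "skip -" line are made up to show that matches before the marker are ignored.

```shell
# Hypothetical helper; assumes the marker line is exactly "* * * * * * *".
after_marker() {
  awk '
    $0 == "* * * * * * *" { seen = 1; next }   # marker line: start filtering from here
    seen && $2 == "-"     { print $1 }         # only lines after the marker are considered
  '
}

printf '%s\n' 'text' 'skip -' '* * * * * * *' 'line1 -' 'line2 -' | after_marker
```

The "skip -" line appears before the marker, so it is not printed even though its second field is "-".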

Awk find and replace for exact match only

If I'd like to replace a character field, say {, with awk I can use:
awk '{ gsub(/{/, "<"); print }' file
...but this will also replace the { inside a field such as "{" (which I don't want). Is there an awk function which will find and replace only an exact match of an entire field, for all fields?
For example, the following:
$ echo "foo bar zod \"{\" {" | awk '{ gsub(/{/, "<"); print }'
will output:
foo bar zod "<" <
but I'd like it to output:
foo bar zod "{" <
I could also explicitly iterate over the fields and use == to check for an exact match, but I wonder if there's an alternative.
I would do what you said: loop through all fields, checking each with either == or /^{$/.
However, with a small trick it can be done without a loop (GNU awk):
awk '$0=gensub(/(\s|^){(\s|$)/, "\\1<\\2","g")'
check this example:
kent$ echo '{ foo "{" and this: { bar {'|awk '$0=gensub(/(\s|^){(\s|$)/, "\\1<\\2","g")'
< foo "{" and this: < bar <
In the example above, 3 of the 4 { characters were substituted.
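For comparison, the explicit loop over fields mentioned in the question works in any POSIX awk. A minimal sketch follows; the function name replace_exact_field is an assumption for illustration.

```shell
# Hypothetical helper: replace fields that are exactly "{" with "<".
replace_exact_field() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "{") $i = "<"    # whole-field comparison, so "{" in quotes is untouched
    print
  }'
}

printf '%s\n' 'foo bar zod "{" {' | replace_exact_field
```

One design point to be aware of: assigning to a field makes awk rebuild the line with OFS, so runs of spaces or tabs between fields collapse to single spaces on the lines that are changed.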