loop inside sed script doesn't terminate - regex

Say I have a file like this:
...
{
...
cout<< "---at time:\t\t"<< sc_core::sc_time_stamp()<<"\n\n" << endl;
...
cout<< "---at time:\t\t"<< sc_core::sc_time_stamp()<<"\n\n" << endl;
...
}
...
{
...
if(strcmp(argv[1], "--io_freq1_Mhz") != 0)
if(strcmp(argv[2], "--io_freq2_Mhz") != 0)
...
if(strcmp(argv[11], "--bytes_per_word") != 0)
...
if(strcmp(argv[23], "--mem_size_bytes") != 0)
...
}
What I want to do first is, using sed, to load all lines containing the pattern
"--foo"
into the pattern space and print just the part inside the parentheses, so I use the command:
sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1/p' file
which does exactly what I want, so I get as an output:
--io_freq1_Mhz
--io_freq2_Mhz
...
--bytes_per_word
...
--mem_size_bytes
Next I want to merge all the lines into one, separating the entries with a blank. I can solve this using command substitution:
echo `sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1/p' file`
This gives me:
--io_freq1_Mhz --io_freq2_Mhz --... --bytes_per_word --... --mem_size_bytes
Next I want to insert a number between the parameters, for example 1, so the final result should look like:
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --... 1 --bytes_per_word 1 --... 1 --mem_size_bytes 1
I can almost solve this. I'm using the command:
echo `sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1/p' file` | sed -n -e ':start { s/\(--[^\ ]*\) -/\1 1 -/p; b start }' | sed -n -e 's/\(--.*[^\ ]\)/\1 1/p'
but I run into two minor problems. First of all, before the script jumps back to my start label, the intermediate output gets piped into the last sed statement, which means I get as output:
--io_freq1_Mhz 1 --io_freq2_Mhz --... --bytes_per_word --... --mem_size_bytes 1
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --... --bytes_per_word --... --mem_size_bytes 1
and so on. So my first question is: how can I avoid piping the intermediate output into my last sed statement on every iteration? Can I achieve this using different sed options/flags?
The second problem is that the command doesn't terminate. The iteration ends with
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --... --bytes_per_word 1 --... --third_last_item 1 --second_last_item mem_size_bytes 1
As can be seen, no '1' is appended after the second-to-last item, and on top of that the whole command never terminates; I have to kill it with Ctrl-C.

Use a minor modification of your first command:
sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1 1/p' file | tr '\n' ' '
Having extracted the name, append the digit after it. The tr command converts newlines into blanks. You could do it all in sed; it would be fiddly, that's all.
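With the sample file above, that should produce something like the following, on a single line ending with a blank rather than a newline:
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --bytes_per_word 1 --mem_size_bytes 1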
Actually, it isn't all that much more fiddly, but it requires a different way of looking at the process. Specifically, you need to save the matching patterns in the hold space, and then process them all at the end of the input:
sed -n \
-e '/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/{ s//\1 1/; H; }' \
-e '$ { x; s/\n/ /g; p; }' file
The semicolons before the } characters are necessary with BSD (macOS) sed, but not with GNU sed. The first -e option finds lines that match your pattern, applies a substitute command to keep just the --name part plus a digit 1, and appends that to the hold space after a newline. The second -e option works on the last line: it exchanges the pattern and hold spaces, replaces every newline with a blank, and prints the result with a trailing newline (whereas the tr version ends with a trailing blank instead).
Output (note the leading blank):
 --io_freq1_Mhz 1 --io_freq2_Mhz 1 --bytes_per_word 1 --mem_size_bytes 1
If you don't want the leading blank, remove it before printing (add s/^ //; before the p).
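For reference, the complete script with that extra substitution would be:
sed -n \
-e '/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/{ s//\1 1/; H; }' \
-e '$ { x; s/\n/ /g; s/^ //; p; }' file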

Related

Can not replace multiple empty lines with one

Why does the following not replace multiple empty lines with one?
$ cat some_random_text.txt
foo



bar



test
and this does not work:
$ cat some_random_text.txt | perl -pe "s/\n+/\n/g"
foo



bar



test
I am trying to replace the multiple newlines (i.e. empty lines) with a single empty line, but the regex I use does not work, as you can see in the example snippet.
What am I messing up?
Expected outcome is:
foo

bar

test
The reason it doesn't work is that -p tells perl to process the input line by line, and there's never more than one \n in a single line.
Better idea:
perl -00 -lpe 1
-00: Enable paragraph mode (input records are terminated by any sequence of 2+ newlines).
-l: Enable autochomp mode (the input record separators are trimmed automatically, so since we're in paragraph mode, all trailing newlines are removed, and output records get "\n\n" added).
-p: Enable automatic input/output (the main code is executed for each input record; anything left in $_ is printed automatically).
-e 1: Use a dummy main program that does nothing.
Taken all together this does nothing except normalize paragraph terminators to exactly two newlines.
You are executing the following program:
LINE: while (<>) {
s/\n+/\n/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
Since you are reading one line at a time, and since a line is a sequence of characters that aren't line feeds terminated by a line feed, your pattern will never match more than one newline.
The simple fix is to tell Perl to treat the entire file as one line. Also, you don't want to replace every line feed, but just those found in sequences of two or more, and you want to replace each such sequence with two line feeds.
perl -0777pe's/\n\n\K\n+//g; s/^\n+//; s/\n\K\n\z//' some_random_text.txt
The second and third substitutions ensure there are no blank lines at the start and end of the file.
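A quick check with a throwaway input (the printf string here is just an illustration):
$ printf 'foo\n\n\n\nbar\n\n\n' | perl -0777pe's/\n\n\K\n+//g; s/^\n+//; s/\n\K\n\z//'
foo

bar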
While reading the entire file into memory is easy, it's not necessary. The desired output can also be achieved by maintaining a flag that indicates whether the previous line was blank or not.
perl -ne'if (/\S/) { print "\n" if $f; print; $f=0; $s=1 } else { $f=$s }' some_random_text.txt
This solution also removes blank lines from the start and end of the file.
Given:
$ echo "$txt"
foo



bar



test
You can use sed to reduce the runs of blank lines to a single \n:
$ echo "$txt" | sed '/^$/N;/^\n$/D'
foo

bar

test
Even easier, you can use cat -s:
$ echo "$txt" | cat -s # same output
In perl the idiomatic 1 liner is to use -00 for paragraph mode:
$ echo "$txt" | perl -00pe0 # same output
And in awk you have the flexibility of using paragraph mode by setting RS= and then set ORS= to what you want the replacement for runs of \n to be:
$ echo "$txt" | awk '1' RS= ORS="\n\n" # same output
ikegami correctly states that printf 'a\n\n' | ... will produce two trailing newlines with these solutions. That may or may not be an issue.

Replace a block of text

I have a file in this pattern:
Some text
---
## [Unreleased]
More text here
I need to replace the text between '---' and '## [Unreleased]' with something else in a shell script.
How can it be achieved using sed or awk?
Perl to the rescue!
perl -lne 'my @replacement = ("First line", "Second line");
if ($p = (/^---$/ .. /^## \[Unreleased\]/)) {
print $replacement[$p-1];
} else { print }'
The flip-flop operator .. tells you whether you're between the two markers; moreover, it returns the line number relative to the start of the range.
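Fed the sample file above (where the two marker lines are adjacent), this should print:
Some text
First line
Second line
More text here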
This might work for you (GNU sed):
sed '/^---/,/^## \[Unreleased\]/c\something else' file
This changes the lines from the first regexp through the second (inclusive) to the required string.
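On the sample file this should give:
Some text
something else
More text here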
This example may help you.
$ cat f
Some text
---
## [Unreleased]
More text here
$ seq 1 5 >mydata.txt
$ cat mydata.txt
1
2
3
4
5
$ awk '/^---/{f=1; while(getline < c)print;close(c);next}/^## \[Unreleased\]/{f=0;next}!f' c="mydata.txt" f
Some text
1
2
3
4
5
More text here
awk -v RS="\0" 'gsub(/---\n\n## \[Unreleased\]\n/,"something")+1' file
give this line a try.
An awk solution that:
is portable (POSIX-compliant).
can deal with any number of lines between the start line and the end line of the block, and potentially with multiple blocks (although they'd all be replaced with the same text).
reads the file line by line (as opposed to reading the entire file at once).
awk -v new='something else' '
/^---$/ { f=1; next } # Block start: set flag, skip line
f && /^## \[Unreleased\]$/ { f=0; print new; next } # Block end: unset flag, print new txt
! f # Print line, if before or after block
' file
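Run against the sample file, this should also print:
Some text
something else
More text here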

print line with match and next line, but only first match, from a file of strings

I have two files, one with a newline separated list of number IDs
>cat list.txt
3342
232
...
and one with those IDs and some sequence data in the line after
>cat Sequence.txt
>600
ATCGCGG
>3342
ACTCGGTC
>232
TGTGCT
>3342
ACGCGGTC
I would like to print every line with an ID match, plus the next line, but only the first time each ID is found.
So, the output would be:
> ...some code... list.txt Sequence.txt
>3342
ACTCGGTC
>232
TGTGCT
Note that only the line with the first occurrence of ID 3342, plus the next line, is printed.
I tried using grep,
grep -f list.txt -A 1 -m 1 Sequence.txt
But it wasn't working. Just running grep -A 1 and -m 1 with the actual ID produced what I want, but I have thousands of IDs and can't run each by hand.
awk 'NR==FNR{tgts[">"$0]; next} $0 in tgts{c=2; delete tgts[$0]} c&&c--' list.txt sequence.txt
>3342
ACTCGGTC
>232
TGTGCT
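The same script spread over lines with comments (identical logic, just easier to read):
awk '
NR==FNR    { tgts[">"$0]; next }       # first file: remember each ID as ">ID"
$0 in tgts { c=2; delete tgts[$0] }    # header seen for the first time: arm a 2-line counter
c && c--                               # true while the counter is non-zero, so 2 lines get printed
' list.txt sequence.txt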
You can use this awk command:
awk -F'>' 'NR==FNR{a[$1];next} $2 in a{p=1; print; delete a[$2]; next};
p; {p=0}' list.txt Sequence.txt
>3342
ACTCGGTC
>232
TGTGCT
You are so close. Give this a try:
for id in `cat list.txt`; do grep -A 1 -m 1 -x ">$id" Sequence.txt; done

AWK end of line sign in regular expressions

I have a simple awk script named "script.awk" that contains:
/\/some_simple_string/ { print $0;}
I'm using it to parse some file that contains:
(by using: cat file | awk -f script.awk)
14 catcat one_two/some_thing
15 catcat one_three/one_more_some_simple_string
16 dogdog one_two/some_simple_string_again
17 dogdog one_four/some_simple_string
18 qweqwe firefire/ppp
I want the script to print only the line that ends exactly with "/some_simple_string" (i.e. "/some_simple_string[END_OF_LINE]"), not the second or third lines above.
Is there any simple way to do it?
I think the most appropriate way is to add an end-of-line sign to the regular expression,
so that it only matches lines that contain "/some..." with nothing after "..string" before the end of the line.
Desired output:
17 dogdog one_four/some_simple_string
Sorry for the confusion; I was asking about the end-of-line sign in regular expressions.
The correct answer is:
/\/some_simple_string$/ { print $0;}
You can always use:
/\/some_simple_string$/ { print $0 }
I.e. match not only "some_simple_string" but match "/some_simple_string" followed by the end of the line ($ is end of line in regex)
grep '/some_simple_string$' file | tail -n 1 should do the trick.
Or if you really want to use awk do awk '/\/some_simple_string/{x = $0}END{print x}'
To return just the last of a group of matches, store the line in a variable and print it in the END block.
/some_simple_string/ { x = $0 }
END{ print x }
To print all the matches that end with the string /some_simple_string using a regular expression, you need to anchor to the end of the line using $. The most suitable tool for this job is grep:
$ grep '/some_simple_string$' file
In awk the command is much the same:
$ awk '/[/]some_simple_string$/' file
To print the line that follows each match you would do:
$ awk 'print_flag{print; print_flag=0} /[/]some_simple_string$/{print_flag=1}' file
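With the sample input above, that prints the line that follows the match:
18 qweqwe firefire/ppp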
Or just combine grep and tail if that is clearer, using the context option -A to print the following line:
$ grep -A1 '/some_simple_string$' file | tail -n 1
I sometimes find that the input records can have a trailing carriage return (\r).
Yes, I deal with both Windows and Linux text files.
So I add the following 'pre-processor' to my awk scripts:
1 == 1 { # preprocess all records
res = gsub("\r", "") # remove unwanted trailing char
if(res>0 && NR<100) { print "(removed stuff)" > "/dev/stderr" } # optional
}
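A more compact sketch of the same idea, stripping only a trailing CR and then applying the anchored match from the question:
awk '
{ sub(/\r$/, "") }               # drop a trailing carriage return, if present
/\/some_simple_string$/          # the end-of-line anchor now works for Unix and DOS files alike
' file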
More optimally, let FS do the work instead of having awk perform unnecessary and unrelated field splitting (adding the \r bit for Windows/DOS completeness): when a line ends with the pattern, FS splits it into more than one field, so NF > 1 and the bare condition prints the line.
mawk '!_<NF' FS='[/]some_simple_string[\r]?$'
17 dogdog one_four/some_simple_string

sed join lines together

What would be the sed (or other tool) command to join together the lines in a file that do not end with the character '0'?
I'll have lines like this
412|n|Leader Building Material||||||||||d|d|20||0
which need to be left alone, and then I'll have lines like this for example (which is 3 lines, but only one ends w/ 0)
107|n|Knot Tying Tools|||||Knot Tying Tools
|||||d|d|0||0
which need to be joined/combined into one line
107|n|Knot Tying Tools|||||Knot Tying Tools|||||d|d|0||0
sed ':a;/0$/{N;s/\n//;ba}'
In a loop (branch ba to label :a), if the current line ends in 0 (/0$/) append next line (N) and remove inner newline (s/\n//).
awk:
awk '{while(/0$/) { getline a; $0=$0 a; sub(/\n/,_) }; print}'
Perl:
perl -pe '$_.=<>,s/\n// while /0$/'
bash:
while read line; do
if [ ${line: -1:1} != "0" ] ; then
echo $line
else echo -n $line
fi
done
awk could be short too: lines not ending in 0 are printed without a trailing newline (printf), while lines ending in 0 are printed normally, so broken lines get glued onto the line that completes them:
awk '!/0$/{printf $0}/0$/'
test:
kent$ cat t
#aasdfasdf
#asbbb0
#asf
#asdf0
#xxxxxx
#bar
kent$ awk '!/0$/{printf $0}/0$/' t
#aasdfasdf#asbbb0
#asf#asdf0
#xxxxxx#bar
The rating of this answer is surprising ;s (this surprised-wink emoticon, punning on sed substitution, is intentional), given the OP's specification: sed join lines together.
This submission's last comment,
"if that's the case check what @ninjalj submitted",
also suggests checking the same answer,
i.e. checking sed ':a;/0$/{N;s/\n//;ba}' verbatim:
sed ':a;/0$/{N;s/\n//;ba}'
does
no one
ie. 0
people,
try
nothing,
ie. 0
things,
any more,
ie. 0
tests?
(^D, aka EOT, 004, Ctrl-D ␄ ... in bash, generate it via: echo ^V^D)
which will not give the following (do the test ;):
does no one ie. 0
people, try nothing, ie. 0
things, any more, ie. 0
tests? (^D, aka EOT, 004, Ctrl-D ␄ ... in bash, generate it via: echo ^V^D)
To get this use:
sed 'H;${z;x;s/\n//g;p;};/0$/!d;z;x;s/\n//g;'
or:
sed ':a;/0$/!{N;s/\n//;ba}'
not:
sed ':a;/0$/{N;s/\n//;ba}'
Notes:
sed 'H;${x;s/\n//g;p;};/0$/!d;z;x;s/\n//g;'
does not use branching and is identical to:
sed '${H;z;x;s/\n//g;p;};/0$/!{H;d;};/0$/{H;z;x;s/\n//g;}'
H commences every sequence: each line is first appended to the hold space.
d short-circuits the rest of the script for the current line and starts the next cycle, so the commands after /0$/!d can only ever be reached by lines that do match /0$/; the address selector on
/0$/{H;z;x;s/\n//g;} is therefore redundant and not needed.
If a line does not end with 0, just save it in the hold space:
/0$/!{H;d;}
If a line does end with 0, save it too and then print flush (a double entendre: the hold space is purged and the lines are aligned into one):
/0$/{H;z;x;s/\n//g;}
NB ${H;z;x;s/\n//g;p;} uses the /0$/ ... commands with an extra p to coerce the final print and with a now-unnecessary z (which empties and resets the pattern space, like s/.*//).
A typically cryptic Perl one-liner:
perl -pe 'BEGIN{ $/ = "0\n" } chomp; s/\n//g; $_ .= $/'
This uses the sequence "0\n" as the record separator (by your question, I'm assuming that every complete line should end with a zero). chomp strips that terminator from each record, any internal newlines are then removed, and the "0\n" that was removed is appended back before printing.
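As a quick sanity check with a tiny made-up input (three physical lines forming two logical records):
$ printf 'a|b|0\nc|d\n|e|0\n' | perl -pe 'BEGIN{ $/ = "0\n" } chomp; s/\n//g; $_ .= $/'
a|b|0
c|d|e|0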
Another take on your question would be to ensure each output line has 17 pipe-separated fields. This does not assume that the 17th field value must be zero.
awk -F \| '
NF == 17 {print; next}                    # complete record: print it as-is
prev     {print prev $0; prev = ""; next} # second half of a split record: join and print
         {prev = $0}                      # first half of a split record: save it for later
'
If a line does not end with 0, append the next line and remove the newline between them (note this joins at most one continuation line per record, since there is no branch back):
sed '/0$/!N;s/\n//'