How to add a prefix to all lines that don't start with one of multiple words - regex

I am trying to add a prefix to all the lines in a file that don't start with one of multiple words using sed.
Example :
someText
sleep 1
anotherString
sleep 1
for i in {1..50}
do
command
sleep 1
secondCommand
sleep 1
done
Should become
PREFIX_someText
sleep 1
PREFIX_anotherString
sleep 1
for i in {1..50}
do
PREFIX_command
sleep 1
PREFIX_secondCommand
sleep 1
done
I am able to exclude any line starting with a single pattern word (ie: sleep, for, do, done), but I don't know how to exclude all lines starting with one of multiple patterns.
Currently I use the following command :
sed -i '/^sleep/! s/^/PREFIX_/'
Which works fine on all the lines starting with sleep.
I imagine there is some way to combine pattern words, but I can't seem to find a solution.
Something like this (which obviously doesn't work) :
sed -i '/[^sleep;^for;^do]/! s/^/PREFIX_/'
Any help would be greatly appreciated.

Use alternation with multiple words for negation:
sed -i -E '/^(sleep|for|do)/! s/^/PREFIX_/' file
PREFIX_someText
sleep 1
PREFIX_anotherString
sleep 1
for i in {1..50}
do
PREFIX_command
sleep 1
PREFIX_secondCommand
sleep 1
done
/^(sleep|for|do)/! will match all lines except those that start with sleep, or for or do words.

I like awk.
awk '/sleep|for|do/ { print; next; } { print "PREFIX_" $0 }' filename

Related

Unix regex get only the first match

I have the following text:
NodeMetaData MapNodeId="105141" PageFormat="OsXml" UniqueIdentifier="fd0f9ade-88e1-4b04-b338-0a8884f66423" RelativePath="Test_03/AddressMap_MyAddressMap.os.xml" LastPulledRevision="-9223372036854775808" LastPulledMd5="" LastSyncedMd5="7D0C294B9A7C09F17FD5AC0414179DD414649455297B8F73125D7FB5E39D647D" HasMergeConflicts="false"
NodeMetaData MapNodeId="105142" Pag
eFormat="OsXml" UniqueIdentifier="85f55c40-f95c-47f2-9c97-d35881e8f762" RelativePath="Test_03/Struct_MyStruct.os.xml" LastPulledRevision="-922337203685477580
8" LastPulledMd5="" LastSyncedMd5="32364BCCBCD8AA9C47D8E09A3EB06667DD9476EB155F9411FA359EFA5C1A4F4F" HasMergeConflicts="false"
There are two MapNodeId (see bold) and I need to get only the first one and insert it to a file.
I used the following:
set WorkingCopyRI=`( sed -n 's/.*MapNodeId=\"// ; s/\" .*//p' Result.log)`
but the var contains the the id of both MapNodeId, what do I need to add in order to get only the first one?
You can append ;T;q to your script to make it quit after the second s instruction prints for the first time.
Here's a cleaner and more robust way to do the whole thing:
sed -n '/MapNodeId=/ { s/^.*\sMapNodeId="\([^"]*\)"\s .*$/\1/p; q }'
I'm assuming your ID-s won't contain double quotes -- if they can, you will have to modify the expression in group #1.
(Also, your formatting gives no clue as to whether your text occurs in multiple lines or not, but I'm assuming that the MapNodeId="..." parts appear on separate lines, otherwise you wouldn't have this problem.)
perl approach:
perl -ne 'print "$1\n" if /MapNodeId="([^"]+)"/' Result.log
The output:
105141
print "$1\n" - print the first captured group value
Or if you have grep PCRE support:
grep -Po '.*MapNodeId="\K([^"]+)' Result.log | head -n 1

get number value between two strings using regex

I have a string with multiple value outputs that looks like this:
SD performance read=1450kB/s write=872kB/s no error (0 0), ManufactorerID 27 Date 2014/2 CardType 2 Blocksize 512 Erase 0 MaxtransferRate 25000000 RWfactor 2 ReadSpeed 22222222Hz WriteSpeed 22222222Hz MaxReadCurrentVDDmin 3 MaxReadCurrentVDDmax 5 MaxWriteCurrentVDDmin 3 MaxWriteCurrentVDDmax 1
I would like to output only the read value (1450kB/s) using bash and sed.
I tried
sed 's/read=\(.*\)kB/\1/'
but that outputs read=1450kB but I only want the number.
Thanks for any help.
Sample input shortened for demo:
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/read=\(.*\)kB/\1/'
SD performance 1450kB/s write=872/s no error
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/.*read=\(.*\)kB.*/\1/'
1450kB/s write=872
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/.*read=\([0-9]*\)kB.*/\1/'
1450
Since entire line has to be replaced, add .* before and after search pattern
* is greedy, will try to match as much as possible, so in 2nd example it can be seen that it matched even the values of write
Since only numbers after read= is needed, use [0-9] instead of .
Running
sed 's/read=\(.*\)kB/\1/'
will replace read=[digits]kB with [digit]. If you want to replace the whole string, use
sed 's/.*read=\([0-9]*\)kB.*/\1/'
instead.
As Sundeep noticed, sed doesn't support non-greedy pattern, updated for [0-9]* instead

Can adding a particular number to a bunch of "time" strings, be done in Regex

I have a "srt" file(like standard movie-subtitle format) like shown in below link:http://pastebin.com/3k8a53SC
Excerpt:
1
00:00:53,000 --> 00:00:57,000
<any text that may span multiple lines>
2
00:01:28,000 --> 00:01:35,000
<any text that may span multiple lines>
But right now the subtitles timing is all wrong, as it lags behind by 9 seconds.
Is it possible to add 9 seconds(+9) to every time entry with regex ?
Even if the milliseconds is set to 000 then it's fine, but the addition of 9 seconds should adhere to "60 seconds = 1 minute & 60 minutes = 1 hour" rules.
Also the subtitle text after timing entry must not get altered by regex.
By the way the time format for each time string is "Hours:Minutes:Seconds.Milliseconds".
Quick answer is "no", that's not an application for regex. A regular expression lets you MATCH text, but not change it. Changing things is outside the scope of the regex itself, and falls to the language you're using -- perl, awk, bash, etc.
For the task of adjusting the time within an SRT file, you could do this easily enough in bash, using the date command to adjust times.
#!/usr/bin/env bash
offset="${1:-0}"
datematch="^(([0-9]{2}:){2}[0-9]{2}),[0-9]{3} --> (([0-9]{2}:){2}[0-9]{2}),[0-9]{3}"
os=$(uname -s)
while read line; do
if [[ "$line" =~ $datematch ]]; then
# Gather the start and end times from the regex
start=${BASH_REMATCH[1]}
end=${BASH_REMATCH[3]}
# Replace the time in this line with a printf pattern
linefmt="${line//[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/%s}\n"
# Calculate new times
case "$os" in
Darwin|*BSD)
newstart=$(date -v${offset}S -j -f "%H:%M:%S" "$start" '+%H:%M:%S')
newend=$(date -v${offset}S -j -f "%H:%M:%S" "$end" '+%H:%M:%S')
;;
Linux)
newstart=$(date -d "$start today ${offset} seconds" '+%H:%M:%S')
newend=$(date -d "$end today ${offset} seconds" '+%H:%M:%S')
;;
esac
# And print the result
printf "$linefmt" "$newstart" "$newend"
else
# No adjustments required, print the line verbatim.
echo "$line"
fi
done
Note the case statement. This script should auto-adjust for Linux, OSX, FreeBSD, etc.
You'd use this script like this:
$ ./srtadj -9 < input.srt > output.srt
Assuming you named it that, of course. Or more likely, you'd adapt its logic for use in your own script.
No, sorry, you can’t. Regex are a context free language (see Chomsky e.g. https://en.wikipedia.org/wiki/Chomsky_hierarchy) and you cannot calculate.
But with a context sensitive language like perl it will work.
It could be a one liner like this ;-)))
perl -n -e 'if(/^(\d\d:\d\d:\d\d)([-,\d\s\>]*)(\d\d:\d\d:\d\d)(.*)/) {print plus9($1).$2.plus9($3).$4."\n";}else{print $_} sub plus9{ ($h,$m,$s)=split(/:/,shift); $t=(($h*60+$m)*60+$s+9); $h=int($t/3600);$r=$t-($h*3600);$m=int($r/60);$s=$r-($m*60);return sprintf "%02d:%02d:%02d", $h, $m, $s;}‘ movie.srt
with move.srt like
1
00:00:53,000 --> 00:00:57,000
hello
2
00:01:28,000 --> 00:01:35,000
I like perl
3
00:02:09,000 --> 00:02:14,000
and regex
you will get
1
00:01:02,000 --> 00:01:06,000
hello
2
00:01:37,000 --> 00:01:44,000
I like perl
3
00:02:18,000 --> 00:02:23,000
and regex
You can change the +9 in the "sub plus9{...}", if you want another delta.
How does it work?
We are looking for lines that matches
dd:dd:dd something dd:dd:dd something
and then we call a sub, which add 9 seconds to the matched group one ($1) and group three ($3). All other lines are printed unchanged.
added
If you want to put the perl oneliner in a file, say plus9.pl, you can add newlines ;-)
if(/^(\d\d:\d\d:\d\d)([-,\d\s\>]*)(\d\d:\d\d:\d\d)(.*)/) {
print plus9($1).$2.plus9($3).$4."\n";
} else {
print $_
}
sub plus9{
($h,$m,$s)=split(/:/,shift);
$t=(($h*60+$m)*60+$s+9);
$h=int($t/3600);
$r=$t-($h*3600);
$m=int($r/60);
$s=$r-($m*60);
return sprintf "%02d:%02d:%02d", $h, $m, $s;
}
Regular expressions strictly do matching and cannot add/substract. You can match each datetime string using python, for example, add 9 seconds to that, and then rewrite the string in the appropriate spot. The regular expression I would use to match it would be the following:
(?<hour>\d+):(?<minute>\d+):(?<second>\d+),(?<msecond>\d+)
It has labeled capture groups so it's really easy to get each section (you won't need msecond but it's there for visualization, I guess)
Regex101

bash: Batch reformatting using sed + date?

I have a bunch of data that looks like this:
"2004-03-23 20:11:55" 3 3 1
"2004-03-23 20:12:20" 1 1 1
"2004-03-31 02:20:04" 15 15 1
"2004-04-07 14:33:48" 141 141 1
"2004-04-15 02:08:31" 2 2 1
"2004-04-15 07:56:01" 1 2 1
"2004-04-16 12:41:22" 4 4 1
and I need to feed this data to a program which only accepts time in UNIX (Epoch) format. Is there a way I can change all the dates in bash? My first instinct tells me to do something like this:
sed 's/"(.*)"/`date -jf "%Y-%m-%d %T" "\1" "+%s"`'
But I am not entirely sure that the \1 inside the date call will properly backreference the regex matched by sed. In fact, when I run this, I get the following response:
sed: 1: "s/(".*")/`date -jf "% ...": unterminated substitute in regular expression
Can anyone guide me in the right direction on this? Thank you.
Nothing is going to be expanded between single quotes. Also, no, the shell expansions are going to happen before the sed \1 expansion, so your code isn't going to work. How about something like this (untested):
while IFS= read -r date time a b c
do
date --date "${date:1} ${time::-1}" # Cut the variables to remove the literal quotes
printf " %s %s %s\n" "$a" "$b" "$c"
done < file

sed: remove strings between two patterns leaving the 2nd pattern intact (half inclusive)

I am trying to filter out text between two patterns, I've seen a dozen examples but didn't manage to get exactly what I want:
Sample input:
START LEAVEMEBE text
data
START DELETEME text
data
more data
even more
START LEAVEMEBE text
data
more data
START DELETEME text
data
more
SOMETHING that doesn't start with START
# sometimes it starts with characters that needs to be escaped...
I want to stay with:
START LEAVEMEBE text
data
START LEAVEMEBE text
data
more data
SOMETHING that doesn't start with START
# sometimes it starts with characters that needs to be escaped...
I tried running sed with:
sed 's/^START DELETEME/,/^[^ ]/d'
And got an inclusive removal, I tried adding "exclusions" (not sure if I really understand this syntax well):
sed 's/^START DELETEME/,/^[^ ]/{/^[^ ]/!d}'
But my "START DELETEME" line is still there (yes, I can grep it out, but that's ugly :) and besides - it DOES remove the empty line in this sample as well and I'd like to leave empty lines if they are my end pattern intact )
I am wondering if there is a way to do it with a single sed command.
I have an awk script that does this well:
BEGIN { flag = 0 }
{
if ($0 ~ "^START DELETEME")
flag=1
else if ($0 !~ "^ ")
flag=0
if (flag != 1)
print $0
}
But as you know "A is for awk which runs like a snail". It takes forever.
Thanks in advance.
Dave.
Using a loop in sed:
sed -n '/^START DELETEME/{:l n; /^[ ]/bl};p' input
GNU sed
sed '/LEAVEMEBE/,/DELETEME/!d;{/DELETEME/d}' file
I would stick with awk:
awk '
/LEAVE|SOMETHING/{flag=1}
/DELETE/{flag=0}
flag' file
But if you still prefer sed, here's another way:
sed -n '
/LEAVE/,/DELETE/{
/DELETE/b
p
}
' file