using sed in bash script - regex

Using an array of line numbers acquired through a grep command, I'm trying to increase each line number and retrieve what is on the new line with a sed command, but I'm assuming something is wrong with my syntax (specifically the sed part, because everything else works).
The script reads:
#!/bin/bash
#getting array of initial line numbers
temp=(`egrep -o -n '\<a class\=\"feed\-video\-title title yt\-uix\-contextlink yt\-uix\-sessionlink secondary"' index.html |cut -f1 -d:`)
new=( )
#looping through array, increasing the line number, and attempting to add the
#sed result to a new array
for x in ${temp[@]}; do
((x=x+5))
z=sed '"${x}"q;d' index.html
new=( ${new[@]} $z )
done
#comparing the two arrays
echo ${temp[@]}
echo ${new[@]}

This might work for you:
#!/bin/bash
#getting array of initial line numbers
temp=(`egrep -o -n '\<a class\=\"feed\-video\-title title yt\-uix\-contextlink yt\-uix\-sessionlink secondary"' index.html |cut -f1 -d:`)
new=( )
#looping through array, increasing the line number, and attempting to add the
#sed result to a new array
for x in ${temp[@]}; do
((x=x+5))
z=$(sed ${x}'q;d' index.html) # surrounded sed command by $(...)
new=( "${new[@]}" "$z" ) # quoted variables
done
#comparing the two arrays
echo "${temp[@]}" # quoted variables
echo "${new[@]}" # quoted variables
Your sed command was fine; it just needed to be surrounded by $(...), and its quoting needed fixing: the inner double quotes were unnecessary, and ${x} has to sit outside the single quotes so that the shell expands it.
BTW
To get the line five lines after a pattern (GNU sed):
sed '/pattern/,+5!d;//,+4d' file
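As a hedged sketch (GNU sed only, since the addr,+N range is a GNU extension), the one-liner above can be parameterised so the offset is not hard-coded. The helper name line_n_after and the demo file are made up for illustration; it assumes the pattern contains no "/" and that matches are more than N lines apart:

```shell
#!/usr/bin/env bash
# Sketch only: "addr,+N" ranges are a GNU sed extension.
# line_n_after is a hypothetical helper name. Assumes N >= 1, no "/" in
# PATTERN, and matches that are more than N lines apart.
line_n_after() {
    local pattern=$1 n=$2 file=$3
    # Keep each match plus the N lines after it (!d deletes everything else),
    # then delete the match plus the N-1 lines after it ("//" reuses PATTERN),
    # so only the Nth line after each match survives.
    # The single quotes around !d keep interactive history expansion away.
    sed "/${pattern}/,+${n}"'!d;//,+'"$((n - 1))d" "$file"
}

# Demo data: each "title" line is followed, 6 lines later, by a "video" line.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' title1 a b c d e video1 title2 f g h i j video2 > "$tmp"

line_n_after '^title' 6 "$tmp"
```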

Your sed line should probably be:
z=$(sed -n "${x} { p; q; }" index.html)
Notice that we use the "-n" flag to tell sed to only print the lines we tell it to. When we reach the line number stored in the "x" variable, it will print it ("p") and then quit ("q"). To allow the x variable to be expanded, the command we send to sed must be placed between double quotes, and not single quotes.
And you should probably place the z variable between double quotes when using it afterwards.
Hope this helps =)
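A minimal sketch tying this back to the question, assuming bash 4 or later (for mapfile): the grep step collects the line numbers, and the -n "N{p;q;}" form prints each target line. The function names nth_line and lines_after_matches, and the demo file, are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: grep collects line numbers, sed prints the line OFFSET further on.
# Function names are hypothetical; requires bash 4+ for mapfile.

# Print line N of FILE and stop reading as soon as it has been printed.
nth_line() {
    local n=$1 file=$2
    # Double quotes so the shell expands $n before sed sees the script.
    sed -n "${n}{p;q;}" "$file"
}

# For every line matching PATTERN, print the line OFFSET lines further on.
lines_after_matches() {
    local pattern=$1 offset=$2 file=$3 n
    local -a nums
    # grep -n prints "LINENO:text"; cut keeps only LINENO.
    mapfile -t nums < <(grep -n -e "$pattern" "$file" | cut -d: -f1)
    for n in "${nums[@]}"; do
        nth_line "$((n + offset))" "$file"
    done
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' title1 a b c d video1 title2 e f g h video2 > "$tmp"

lines_after_matches '^title' 5 "$tmp"
```

Because of q, each sed call stops reading the file as soon as the wanted line has been printed, which can matter for large inputs.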

Related

SED: How to search for word "tokens" on consecutive lines (Windows)?

I have EDI files I need to find, by using SED to search for some anomalies.
The anomaly is when I search for a "token" called SGP, and where they are on multiple consecutive lines — so one SGP on one line and another SGP on another line — regardless of what's after the token:
SGP+SEGU1037087'
SGP+DFSU1143210'
SGP+SEGU1166926'
SGP+TGHU1203545'
But I don't want to find files where there are other segment lines between each SGP line:
SGP+TGHU1643436'
GID+2+3:BAG'
FTX+AAA+++sdfjkhsdfjkhsdfjkh'
MEA+AAE+AAB+KGM:20000.0000'
MEA+AAE+AAW+MTQ:.0000'
SGP+HCIU2090577'
So I've tried this:
sed 'SGP.*\n.*SGP' < *.txt
And as probably expected, I get nothing.
Any ideas on how to feed into SED a list of files in DOS, and get a list of files that meet the above criteria?
UPDATE
I think I have the "feed the files" bit here. But I am still stuck on how to use SED properly.
for i in *.txt; do
sed -i '<<WHAT DO I PLACE HERE?>>' $i
done
UPDATE 2
Please, no Unix/Bash/etc. solutions. I am on Windows only! Thank you
UPDATE 3
Tried a DOS equivalent of @tshiono's answer, but I get nothing:
for %%f in (*.txt) do (
sed -ne ':l;N;$!b l;/SGP[^\n]\+\nSGP/p' %%f
)
UPDATE 4
@tshiono - I want the script to find files that have this pattern...
SGP+SEGU1037087'
SGP+DFSU1143210'
SGP+SEGU1166926'
SGP+TGHU1203545'
Not this pattern ...
SGP+SEGU1037087'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+DFSU1143210'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+SEGU1166926'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+TGHU1203545'
Again - only lines with SGP as a token on every NEWLINE
Could you please try the following?
awk '
FNR==1{
if(count){
if(fnr==count){
print prev_file " has all lines of SGP."
}
}
prev_file=FILENAME
count=fnr=""
}
/^SGP/{
++count
}
{
fnr++
}
END{
if(fnr==count){
print prev_file " has all lines of SGP."
}
}
' *.txt
The requirement is to detect which files contain consecutive lines both starting SGP.
Using standard (POSIX) sed, there's no way to get sed to print the file name. You can use this combination of shell script and sed, though, to detect which files contain consecutive lines starting with SGP:
for file in *.txt;
do
if [ -n "$(sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{p;q;}}' "$file")" ]
then echo "$file"
fi
done
The shell test [ … ] checks whether the output of $(sed …) is a non-empty string, and reports the name of the file if it is. Note that the script is more flexible if, instead of using the glob *.txt, it uses "$@" (the list of arguments, preserving spaces etc.). You can then write:
sh find-consecutive-SGP.sh *.txt
or use other more fanciful ways of specifying the file names as arguments.
The sed command doesn't print by default (-n). It looks for a line starting SGP and appends the next line into the 'pattern space'. It then looks to see if the result has two lots of SGP in it; one at the start (we know that will be there) and one after a newline. If that's found, it prints both lines (the pattern space) and quits because its job is done; it has found two consecutive lines both starting SGP. If the pattern space doesn't match, it is not printed (because of the -n) and more data is read. Any lines that don't start SGP are ignored and not printed.
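Following the "$@" suggestion, the loop can be written as a function over its arguments. This is a sketch assuming bash and sed are available (which, as the question notes, is not the case on plain Windows); sgp_files_pairs and the sample files are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: report each file (given as arguments) that has two consecutive
# lines starting with SGP. sgp_files_pairs is a hypothetical name.
sgp_files_pairs() {
    local f
    for f in "$@"; do
        # On an SGP line, append the next line (N); if that line also starts
        # with SGP, print the pair and quit (q). Otherwise nothing is printed.
        if [ -n "$(sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{p;q;};}' "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "SGP+SEGU1037087'" "SGP+DFSU1143210'" > "$dir/a.txt"
printf '%s\n' "SGP+TGHU1643436'" "GID+2+3:BAG'" "SGP+HCIU2090577'" > "$dir/b.txt"
printf '%s\n' "FTX+AAA+++SGP'" "SGP+SEGU1166926'" > "$dir/c.txt"

sgp_files_pairs "$dir"/*.txt
```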
With GNU sed, the F command prints the file name and a newline, so you could use:
for file in *.txt;
do
sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{F;q;}}' "$file"
done
AFAICT from the GNU sed manual, there's no way to 'skip to the start of the next file' so you have to test each file separately as shown, rather than trying sed -n -e '…' *.txt — that will only report the first file that breaches the condition, not all the files.
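If a single process over all files is preferred, one option is to swap sed for awk, which restarts FNR at 1 for every input file and exposes the current name as FILENAME. This is an alternative technique rather than something sed itself offers, sketched with a hypothetical function name and sample files:

```shell
#!/usr/bin/env bash
# Sketch: one awk process scans every file and prints the name of each file
# containing two consecutive lines that start with SGP.
# sgp_files_awk is a hypothetical name.
sgp_files_awk() {
    awk '
        FNR == 1 { prev = 0; hit = 0 }   # new file: reset the state
        hit      { next }                # already reported this file
        /^SGP/   { if (prev) { print FILENAME; hit = 1 }
                   prev = 1; next }
        { prev = 0 }                     # any other line breaks the run
    ' "$@"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "SGP+SEGU1037087'" "SGP+DFSU1143210'" > "$dir/a.txt"
printf '%s\n' "SGP+TGHU1643436'" "GID+2+3:BAG'" "SGP+HCIU2090577'" > "$dir/b.txt"
printf '%s\n' "FTX+AAA+++SGP'" "SGP+SEGU1166926'" > "$dir/c.txt"

sgp_files_awk "$dir"/*.txt
```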
If your objective is to get the list of filenames which meet the criteria,
how about:
for i in *.txt; do
[[ -n $(sed -ne ':l;N;$!b l;/SGP[^\n]\+\nSGP/p' "$i") ]] && echo "$i"
done
The sed commands :l;N;$!b l form a loop that slurps all the lines
into the pattern space, joined by "\n".
Then it matches the lines with the pattern of two consecutive lines
which both contain SGP.
If the sed output is non-empty, it prints the current filename.
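One caveat with the pattern above: SGP[^\n]\+\nSGP does not require the first SGP to start a line, so a segment such as FTX+AAA+++SGP' followed by an SGP line also counts. A sketch of a variant that anchors both tokens to line starts, relying on GNU sed extensions (\| alternation, \n inside a bracket expression); the function name and test files are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of the "slurp" approach with SGP anchored to the start of a line.
# GNU sed only. sgp_files_slurp is a hypothetical name.
sgp_files_slurp() {
    local f
    for f in "$@"; do
        # 1st -e: :l;N;$!b l loops, appending every line to the pattern space.
        # 2nd -e: \(^\|\n\)SGP matches SGP at the start of the buffer or right
        #         after a newline, i.e. at the start of a line.
        if [[ -n $(sed -n -e ':l;N;$!b l' -e '/\(^\|\n\)SGP[^\n]*\nSGP/p' "$f") ]]; then
            printf '%s\n' "$f"
        fi
    done
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "SGP+SEGU1037087'" "SGP+DFSU1143210'" > "$dir/a.txt"
printf '%s\n' "SGP+TGHU1643436'" "GID+2+3:BAG'" "SGP+HCIU2090577'" > "$dir/b.txt"
printf '%s\n' "FTX+AAA+++SGP'" "SGP+SEGU1166926'" > "$dir/c.txt"

sgp_files_slurp "$dir"/*.txt
```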
[Update]
If you need this to run on DOS/Windows, please try this instead:
setlocal EnableDelayedExpansion
for %%f in (text*.txt) do (
set result=
for /f "usebackq tokens=*" %%a in (`sed.exe -ne ":l;N;$!b l;/SGP.\+\nSGP.\+/p" %%f`) do set result=!result!%%a
if "!result!" neq "" (
echo %%f
)
)
I've tested this with Windows 10 and sed-4.2.1.

Replace in result of other regex outcome

I am trying to automate some things for my CentOS server. For this automation I need to add, replace or modify file lines. This works fine by doing a global replace or replacing unique lines.
sed -i "s/user \\= apache/user \\= nginx/" file.conf
This replaces user = apache with user = nginx which works fine.
But sometimes I need to replace lines like these only within the result of another regex. For example, the file /etc/php-fpm.d/www.conf contains the same configuration lines several times.
With a regex I can easily filter out the part I want to do my replacements in, but I don't know how to apply the replacement to only that part.
This is a part of the file:
...
[base]
name=CentOS-$releasever - Base
baseurl=http://centos.mirrors.ovh.net/ftp.centos.org/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
exclude=postfix
[updates]
name=CentOS-$releasever - Updates
baseurl=http://centos.mirrors.ovh.net/ftp.centos.org/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
exclude=postfix
[extras]
name=CentOS-$releasever - Extras
...
With this regex:
sed -n "/\[base\]/,/\[updates\]/p" /etc/yum.repos.d/CentOS-Base.repo
I can get the lines from [base] until [updates]. Then I can modify for example exclude line. But how can I get these lines back into the main file?
This is what I have so far:
# $BASE contains the lines [base] until [updates]
BASE=`sed -n "/\[base\]/,/\[updates\]/p" /etc/yum.repos.d/CentOS-Base.repo`
# Get the line that holds 'exclude' into the $EXCLUDE parameter
EXCLUDE=`echo "$BASE" | grep "exclude"`
# Modify EXCLUDE
EXCLUDE="$EXCLUDE, dovecot"
# Replace old exclude line in $BASE variable with new one
BASE=`echo "$BASE" | sed "s/^exclude.*/$EXCLUDE/g"`
# Put $BASE back into /etc/yum.repos.d/CentOS-Base.repo
# ???
In this script I am trying to achieve this with multiple commands, which is fine. It would also be nice if there were a single replace expression for this, but that is of course not a must.
sed -i "/\[base\]/,/\[updates\]/s/^exclude.*/&, dovecot/" /etc/yum.repos.d/CentOS-Base.repo
sed commands consist of an optional address, the command, then any arguments to the command. The address, if present, is either a single expression indicating the line number or single-line pattern identifying the line to which the command is to be applied, or a pair of comma-separated expressions indicating the starting and ending line numbers or patterns of the range to which the command is to be applied.
Your previous sed script ("/\[base\]/,/\[updates\]/p") used the command p (print) with the address range of /\[base\]/ to /\[updates\]/.
You can just apply your s (substitute) command to that same range, to substitute exclude.* (exclude followed by anything - but gobbling up the rest of the line) with &, dovecot (where & in the replacement string means everything that was matched).
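To check the range-restricted substitution before running it with -i, one can run it without -i on a copy of the excerpt and look at the output. A small sketch; the temporary file and its trimmed contents are stand-ins for the real repo file:

```shell
#!/usr/bin/env bash
# Sketch: run the range-restricted substitution on a trimmed copy of the
# excerpt. Without -i, sed prints the result instead of editing the file.
repo=$(mktemp)
trap 'rm -f "$repo"' EXIT
cat > "$repo" <<'EOF'
[base]
name=CentOS-$releasever - Base
exclude=postfix
[updates]
name=CentOS-$releasever - Updates
exclude=postfix
EOF

# Only lines from [base] through [updates] are candidates for the s command,
# so the exclude= line after [updates] (outside the range) is left alone.
result=$(sed '/\[base\]/,/\[updates\]/s/^exclude.*/&, dovecot/' "$repo")
printf '%s\n' "$result"
```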
If you need to do something more complicated within that address range, you can use the { sed command - which opens a block of commands that terminates with the } command. This lets you do things like if/else blocks, so you could do:
sed -i -e "/\[base\]/,/\[updates\]/{" \
-e "/\[base\]/h" \
-e "s/^exclude=.*/&, dovecot/" \
-e "/^exclude=/h" \
-e "/\[updates\]/{" \
-e "x" \
-e "/^exclude=/b e" \
-e "i \
exclude=dovecot" \
-e ": e" \
-e "x" \
-e "}" \
-e "}" /etc/yum.repos.d/...
This big hairy sed script does the following:
everything in the script operates between the [base] and [updates] lines.
The [base] line and any line that begins exclude= are placed into the "hold" buffer (via the h command), each overwriting what was there before
Any line that begins exclude= gets , dovecot appended to it
When [updates] is seen:
the hold buffer is temporarily recovered (via the x command which swaps the current line with the hold buffer)
the buffer is checked to see if it begins with exclude=
if it does, skip the insert by branching (command b) to label e
if it doesn't (because we weren't skipped), insert the line exclude=dovecot
after label e (via the command :)...
swap the current line and hold buffer again (via x again)
In other words: Any exclude= line gets , dovecot appended; if no exclude= line was seen, a new exclude=dovecot line is added at the end.
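The same logic can also be written as one multi-line sed script inside single quotes, which avoids stacking -e options. This is a sketch assuming GNU sed (as with the original); the helper name add_dovecot_exclude and the two sample files are hypothetical, and it prints to stdout rather than editing in place:

```shell
#!/usr/bin/env bash
# Sketch only: prints to stdout; add -i to edit the file in place.
# add_dovecot_exclude is a hypothetical helper name.
add_dovecot_exclude() {
    # Script lines are not indented so that the inserted text (after the
    # i command) is output exactly as written.
    sed '/\[base\]/,/\[updates\]/{
/\[base\]/h
s/^exclude=.*/&, dovecot/
/^exclude=/h
/\[updates\]/{
x
/^exclude=/b e
i\
exclude=dovecot
:e
x
}
}' "$1"
}

with=$(mktemp) without=$(mktemp)
trap 'rm -f "$with" "$without"' EXIT
printf '%s\n' '[base]' 'name=Base' 'exclude=postfix' '[updates]' 'name=Updates' > "$with"
printf '%s\n' '[base]' 'name=Base' '[updates]' 'name=Updates' > "$without"

add_dovecot_exclude "$with"      # exclude=postfix gets ", dovecot" appended
add_dovecot_exclude "$without"   # exclude=dovecot is inserted before [updates]
```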
Note that it might be more useful to go from /\[base\]/ to /^$/, if you can guarantee a blank line at the end of each block (which is normally the case), in which case you would update the first -e expression as well as changing -e "/\[updates\]/{" to -e '/^$/{' (note also the change in quotes so that the $ isn't parsed by the shell).

Swap Strings within a line in Bash

I'm parsing a document with a bash script and outputting different parts of it. At one point I need to find and reformat text in the form of:
(foo)[X]
[Y]
(bar)[Z]
to something like:
X->foo
Y
Z->bar
Now, I'm able to grep the parts I want with RegEx, but I'm having trouble swapping the two elements in one line and handling the fact that the text in parentheses is optional. Is this even possible with a combination of sed and grep?
Thank you for your time.
You can use sed:
sed -e 's/(\([^)]*\))\[\([^]]*\)]/\2->\1/' -e 's/\[\([^]]*\)]/\1/' file
This works for your given input example:
X->foo
Y
Z->bar
You might need to make the patterns more strict if you have more kinds of input to handle.
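A quick way to exercise the two expressions on the sample input is to wrap them in a function that reads standard input. swap_brackets is a made-up name for this sketch:

```shell
#!/usr/bin/env bash
# Sketch: the two sed expressions from above, reading standard input.
# swap_brackets is a hypothetical name.
swap_brackets() {
    # 1st expression: "(text)[key]" -> "key->text" (\1 = text, \2 = key)
    # 2nd expression: a remaining "[key]" -> "key"
    sed -e 's/(\([^)]*\))\[\([^]]*\)]/\2->\1/' -e 's/\[\([^]]*\)]/\1/'
}

printf '%s\n' '(foo)[X]' '[Y]' '(bar)[Z]' | swap_brackets
```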
You can use awk:
awk -F '[][()]+' '{print (NF>3 ? $3 "->" $2 : $2)}' file
X->foo
Y
Z->bar
You can even do it in bash itself, although it's not pretty.
# Three capture groups:
# 1. The optional parenthesized text (including the parentheses)
# 2. The contents of the parentheses
# 3. The contents of the square brackets
regex="(\((.*)\))?\[(.*)\]"
while IFS= read -r str; do
[[ "$str" =~ $regex ]]
# If the 2nd array element is not empty, print -> followed by the
# non-empty value.
echo "${BASH_REMATCH[3]}${BASH_REMATCH[2]:+->${BASH_REMATCH[2]}}"
done < file.txt

Using multiple sed commands

Hi, I'm looking to search through a file and output the values of the lines that match the following regexes, with the matching text removed; I don't need the output written to a file. This is what I am currently using; it outputs the required text, but multiple times:
#!/bin/sh
for file in *; do
sed -e 's/^owner //g;p;!d ; s/^admin //g;p;!d ; s/^loc //g;p;!d ; s/^ser //g;p;!d' $file
done
The preferred format would be something like this, so I could have control over what happens in between:
for file in *; do
sed 's/^owner //g;p' $file | head -1
sed 's/^admin //g;p' $file | head -1
sed 's/^loc //g;p' $file | head -1
sed 's/^ser //g;p' $file | head -1
done
An example input file would be the following:
owner sys group
admin guy
loc Q-30934
ser 18r9723
comment noisy fan is something
and the required output is the following:
sys group
guy
Q-30934
18r9723
You're giving sed the p (for Print) command several times. It prints the entire line each time. And unless you tell it not to with the -n option, sed will print the line at the end anyway.
You also give the !d command multiple times.
Edited after you added the multiple-sed version: instead of using head -1, just use -n to avoid printing lines you don't want. Or even use q (Quit) to stop processing after printing the bit you do want.
For instance:
sed -n '/^owner / { s///gp; q; }' $file
The {} group the substitution and quit commands together, so that they are both executed if and only if the pattern is matched. Having used the pattern in the address at the beginning, you can leave it out of the s command. So that command is short for:
sed -n '/^owner / { s/^owner //gp; q; }' $file
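Since the question wanted control over what happens between lookups, the address + s/// + q idea can be packaged as a function and called in a loop. A sketch, assuming the keys contain no regex metacharacters or "/"; first_value and the demo file are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the value of the first "KEY value" line, then stop reading.
# first_value is a hypothetical name; KEY must not contain regex
# metacharacters or "/".
first_value() {
    local key=$1 file=$2
    # -n: print nothing by default. On the first line starting with "KEY ",
    # s/// (empty regex = the address regex) strips the key, p prints the
    # rest, and q stops reading the file.
    sed -n "/^${key} /{s///p;q;}" "$file"
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'owner sys group' 'admin guy' 'loc Q-30934' 'ser 18r9723' \
    'comment noisy fan is something' > "$tmp"

for key in owner admin loc ser; do
    first_value "$key" "$tmp"   # anything else can happen between lookups
done
```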
I'd suggest:
sed -n -e '/^owner / { s///; p; }' \
-e '/^admin / { s///; p; }' \
-e '/^loc / { s///; p; }' \
-e '/^ser / { s///; p; }' \
*
sed is perfectly capable of reading many files, so the loop control is unnecessary (you aren't doing per-file I/O redirection, for example) and it's reasonable to list the files after the rest of the sed command (that's the * on its own). If you've got a more modern version of sed (e.g. GNU sed), you can combine the patterns into a single line:
sed -r -n -e '/^(owner|admin|loc|ser) / { s///; p; }' *
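A sketch of the combined form run against the sample input. It uses -E, which GNU sed accepts as a synonym for -r (and which BSD sed also understands); extract_fields is a hypothetical name, and with no file arguments it reads standard input:

```shell
#!/usr/bin/env bash
# Sketch: one sed pass that strips the key from owner/admin/loc/ser lines.
# extract_fields is a hypothetical name.
extract_fields() {
    # One address matches any of the four keys; s/// reuses that regex to
    # strip "key " and p prints the rest. Other lines are never printed (-n).
    sed -E -n -e '/^(owner|admin|loc|ser) / { s///; p; }' "$@"
}

printf '%s\n' 'owner sys group' 'admin guy' 'loc Q-30934' 'ser 18r9723' \
    'comment noisy fan is something' | extract_fields
```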
This might work for you (GNU sed):
sed -n '0,/^owner /{//s///p};0,/^admin /{//s///p};0,/^loc /{//s///p};0,/^ser /{//s///p}' file
Creates a series of toggle switches, one for each of the desired strings. The switches apply only once throughout the file for each string, i.e. only the first occurrence of each string is printed.
An alternative, and depending on file size perhaps quicker, method:
sed -rn '1{x;s/^/owner admin loc ser /;x};/^(owner |admin |loc |ser )/{G;/^(owner |admin |loc |ser )(.*\n.*)\1/!b;s//\2/;P;/\n$/q;s/.*\n//;h}' file
This preps the hold space with the desired strings. For only those lines that contain the desired strings, append the hold space and check if the current line needs to be amended. Match the desired string with the same string in the hold space. If the line has already appeared the match will fail and the line can be disregarded. If the line is yet to be amended, the desired string is removed from the current line and then the first half of the line is printed. If no strings appear in the remaining half of the line the process is over and can be quit. Otherwise remove the first half of the string and replace the hold space with the desired string removed.

sed/awk replace in all matches

I want to invert all the color values in a bunch of files. The colors are all in the hex format #ff3300, so the inversion could be done character-wise with the sed command
y/0123456789abcdef/fedcba9876543210/
How can I loop through all the color matches and do the char translation in sed or awk?
EDIT:
sample input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
asdfghj
desired output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
asdfghj
EDIT: I changed my response as per your edit.
OK, with sed this processing gets difficult. awk could do the trick more or less easily, but I find perl much easier for this task:
$ perl -pe 's/#[0-9a-f]+/$&=~tr%0123456789abcdef%fedcba9876543210%r/ge' <infile >outfile
Basically you find the pattern, then execute the right-hand side, which executes the tr on the match, and substitutes the value there.
The inversion is really a subtraction. To invert a hex, you just subtract it from ffffff.
With this in mind, you can build a simple script to process each line, extract hexes, invert them, and inject them back to the line.
This is using Bash (see arrays, printf -v, += etc) only (no external tools there):
#!/usr/bin/env bash
[[ -f $1 ]] || { printf "error: cannot find file: %s\n" "$1" >&2; exit 1; }
while read -r; do
# split line with '#' as separator
IFS='#' read -r -a toks <<< "$REPLY"
for tok in "${toks[@]}"; do
# extract hex
read -n6 hex <<< "$tok"
# is it really a hex ?
if [[ $hex =~ [0-9a-fA-F]{6} ]]; then
# compute inversion
inv="$((16#ffffff - 16#$hex))"
# zero pad the result
printf -v inv "%06x" "$inv"
# replace hex with inv
tok="${tok/$hex/$inv}"
fi
# build the modified line
line+="#$tok"
done
# print the modified line and clean it for reuse
printf "%s\n" "${line#\#}"
unset line
done < "$1"
use it like:
$ ./invhex infile > outfile
test case input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
bdf#cvb_foo
asdfghj
#bdfg
processed output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
bdf#cvb_foo
asdfghj
#bdfg
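The subtraction view and the character-wise y/// view agree: every digit of ffffff is f (15), so subtracting never borrows, and each digit d simply becomes 15 - d, which is exactly the 0123456789abcdef to fedcba9876543210 mapping. A sketch of just that step in bash (invert_hex is a hypothetical helper that expects six hex digits without the leading #):

```shell
#!/usr/bin/env bash
# Sketch: invert one six-digit hex colour by subtracting it from ffffff.
# invert_hex is a hypothetical name; pass the digits without "#".
invert_hex() {
    # 16#... is bash base-16 arithmetic; %06x pads back to six digits.
    printf '%06x\n' "$((16#ffffff - 16#$1))"
}

invert_hex ff3300   # the colour from the question
invert_hex 00ff00
```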
This might work for you (GNU sed):
sed '/#[a-f0-9]\{6\}\>/!b
s//\n&/g
h
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g
y/0123456789abcdef/fedcba9876543210/
H
g
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta
s/\n//' file
Explanation:
/#[a-f0-9]\{6\}\>/!b bail out on lines not containing the required pattern
s//\n&/g prepend every pattern with a newline
h copy this to the hold space
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g delete everything but the required pattern(s)
y/0123456789abcdef/fedcba9876543210/ transform the pattern(s)
H append the new pattern(s) to the hold space
g overwrite the pattern space with the contents of the hold space
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta replace the old pattern(s) with the new.
s/\n// remove the newline artifact from the H command.
This works...
cat test.txt |sed -e 's/\#\([0123456789abcdef]\{6\}\)/\n\#\1\n/g' |sed -e ' /^#.*/ y/0123456789abcdef/fedcba9876543210/' | awk '{lastType=type;type= substr($0,1,1)=="#";} type==lastType && length(line)>0 {print line;line=$0} type!=lastType {line=line$0} length(line)==0 {line=$0} END {print line}'
The first sed command inserts line breaks around the hex codes, which makes it possible to apply the substitution to every line starting with a hash. There is probably a more elegant way to merge the lines back together, but the awk command does the job. The only assumption is that two hex codes never follow each other directly; if they can, this step has to be revised.