How to parse every match of sed command - regex

I have a string [u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3'] and I would like to extract every element matched by my sed command. The elements I want are the ones inside single quotes. Here is my script:
#!/bin/bash
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for id in $(sed -n "s/^.*'\(.*\)'.*$/\1/ p" <<< ${ARR});
do
echo "$id"
done
I have only the first value returned.

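The wildcard .* matches the longest possible string, starting as far left as it can, so your pattern consumes the whole line in a single match and only one value comes out. If your intention is to match the individual substrings which are in single quotes, try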
grep -o "'[^']*'" <<<"$ARR"
To remove the single quotes around the values, pipe the output to sed "s/'//g". To loop over the lines printed by a pipeline, do
... commands ... |
while read -r id; do
: things with "$id"
done
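The pieces above can be combined into one small function. This is only a sketch: the name extract_quoted is made up for illustration, and mapfile assumes bash 4 or newer.

```shell
#!/bin/bash
# extract_quoted is a hypothetical helper name, not part of any tool.
# Prints each single-quoted substring of $1 on its own line, quotes removed.
extract_quoted() {
    grep -o "'[^']*'" <<<"$1" | sed "s/'//g"
}

ARR="[u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3']"

# Process substitution keeps the array in the current shell; a plain
# "... | while read" loop would run in a subshell and lose its variables.
mapfile -t ids < <(extract_quoted "$ARR")

for id in "${ids[@]}"; do
    echo "id=$id"
done
```

Reading into an array first is a design choice: it lets you use the values after the loop, which a loop at the end of a pipeline does not allow.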

Bash can match regular expressions with the help of =~ (see man bash). Matching more than once is a bit painful, but in your case we can split the input on whitespace and match once per item:
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for A in $ARR
do
[[ $A =~ u\'(.+)\' ]] && echo ${BASH_REMATCH[1]}
done
results in
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1

Is this what you're trying to do?
$ ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
$ awk -v RS="'" '!(NR%2)' <<< "$ARR"
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1
$ awk -v RS="'" '!(NR%2)' <<< "$ARR" |
while IFS= read -r id; do echo "id=$id"; done
id=SOMEVALUE1
id=SOMEVALUE1
id=SOMEVALUE1

Related

Replace Random Characters in a Variable

I want to replace value of a variable (can contain a number, a character, a string of characters).
$ echo $VAR
http://some-random-string.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352........
So far, I've tried this command, however it's not working, so I'm thinking some of these might need a regex.
$ echo $VAR | sed -e "s/\(http[^^]*\).*\(.watch\)/\1"mystring"\2/g"
$ echo $VAR | sed -e "s/\(https\?:\/\/\).*\(.watch\)/\1"mystring"\2/g"
$ echo $VAR | sed -e "s/\(http[s]\?:\/\/\).*\(.watch\)/\1"mystring"\2/g"
I'm aware that there are questions that answer similar queries, but they have not been of help.
echo "$VAR" | sed 's|\(http[s]*://\)[^.]*\(.*\)|\1mystring\2|'
Explanation:
s| # substitute
\(http[s]*://\) # save the scheme (http:// or https://) in group 1 (\1)
[^.]* # everything up to the first '.'
\(.*\) # save the rest in group 2 (\2)
|\1 # print arg1
mystring # print your replacement
\2 # print arg2
|
In sed, the delimiter of the s command (here the forward slash) has to be escaped with a backslash wherever it appears inside the pattern or the replacement:
echo $VAR | sed 's/http.\/\/.*\.watch\.film\.tv/http:\/\/mystring.watch.film.tv/'
You don't need sed. This task can be done in just shell:
$ var='http://some-random-string.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352'
$ echo "${var%%//*}//mystring.watch${var#*.watch}"
http://mystring.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352
How it works:
${var%%//*} returns the value of $var with the first // and everything after it removed.
//mystring.watch adds the string that we want.
${var#*.watch} returns the value of $var with everything up to and including the first occurrence of .watch removed.
Because this approach does not require pipelines or subshells, it will be more efficient.
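If you would rather not hard-code .watch, the same two expansions can be wrapped in a function that swaps whatever the first host label happens to be. This is a sketch under assumptions: replace_first_label is a made-up name, and the URL is assumed to contain // and at least one dot in the host.

```shell
# replace_first_label is a hypothetical helper used only for this sketch.
# $1 = URL, $2 = new first host label.
# Assumes the URL contains "//" and that the host has at least one dot.
replace_first_label() {
    local url=$1 new=$2 scheme rest
    scheme=${url%%//*}   # everything before the first "//", e.g. "http:"
    rest=${url#*//}      # everything after the first "//"
    # ${rest#*.} drops the old first label together with its dot
    printf '%s//%s.%s\n' "$scheme" "$new" "${rest#*.}"
}

replace_first_label 'http://some-random-string.watch.film.tv/nfvsere/watch' mystring
```

Like the one-liner above, this stays inside the shell and starts no external process.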
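With GNU sed (the double quotes are dropped from the replacement, otherwise they end up literally in the output):
$ echo "$VAR" | sed -E 's/^(http:\/\/)\S+(\.watch\.film\.tv\/)/\1mystring\2/i'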

sed: struggling with substitution and regex for ^*=

I am running a Linux bash script. From stdout lines like /gpx/trk/name=MyTrack1, I want to keep only the part of the line after the =.
I am struggling to understand why the following sed command is not working as I expect:
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
(I also tried)
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*\=//"
The return is always /gpx/trk/name=MyTrack1 and not MyTrack1
An even simpler way if this is the only structure you are concerned about:
echo "/gpx/trk/name=MyTrack1" | cut -d = -f 2
Simply try:
echo "/gpx/trk/name=MyTrack1" | sed 's/.*=//'
Second solution, with another sed command:
echo "/gpx/trk/name=MyTrack1" | sed 's/\(.*=\)\(.*\)/\2/'
Explanation, added at the OP's request:
s: tells sed to perform a substitution.
\(.*=\): the first capture group. It holds everything from the start of the line up to and including the last =, so here it contains /gpx/trk/name=.
\(.*\): the second capture group. It holds everything after the first group's match, so here it contains MyTrack1.
/\2/: replaces the whole matched line with the contents of the second group, which is MyTrack1.
Third solution, with awk, assuming your input looks like the sample shown:
echo "/gpx/trk/name=MyTrack1" | awk -F'=' '{print $2}'
Fourth solution, with awk's match function:
echo "/gpx/trk/name=MyTrack1" | awk 'match($0,/=.*$/){print substr($0,RSTART+1,RLENGTH-1)}'
$ echo "/gpx/trk/name=MyTrack1" | sed -e "s/^.*=//"
MyTrack1
The regular expression ^.*= matches anything up to and including the last = in the string.
Your regular expression ^*= would match the literal string *= at the start of a string, e.g.
$ echo "*=/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
/gpx/trk/name=MyTrack1
The * character in a regular expression usually modifies the immediately previous expression so that zero or more of it may be matched. When * occurs at the start of an expression on the other hand, it matches the character *.
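To see the difference side by side, here is a small sketch you can paste into a shell. The a=b=c input is an extra example of mine, added to show how to stop at the first = instead of the last.

```shell
line='/gpx/trk/name=MyTrack1'

# "^*=" : in a basic regex a leading * is literal, so this only matches
# lines that begin with the two characters "*=". The line is unchanged.
printf '%s\n' "$line" | sed 's/^*=//'

# "^.*=" : any characters up to and including the last "=".
printf '%s\n' "$line" | sed 's/^.*=//'

# "^[^=]*=" : only non-"=" characters before the "=", so it stops at the first one.
printf '%s\n' 'a=b=c' | sed 's/^[^=]*=//'
```

The three commands print /gpx/trk/name=MyTrack1, MyTrack1 and b=c respectively.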
Not to take you off the sed track, but this is easy with Bash alone:
$ echo "$s"
/gpx/trk/name=MyTrack1
$ echo "${s##*=}"
MyTrack1
The ##*= pattern removes the maximal pattern from the beginning of the string to the last =:
$ s="1=2=3=the rest"
$ echo "${s##*=}"
the rest
The equivalent in sed would be:
$ echo "$s" | sed -E 's/^.*=(.*)/\1/'
the rest
Where #*= would remove the minimal pattern:
$ echo "${s#*=}"
2=3=the rest
And in sed:
$ echo "$s" | sed -E 's/^[^=]*=(.*)/\1/'
2=3=the rest
Note the difference between * in Bash parameter expansion patterns and * in a sed regex:
The * in Bash (in this context) is a glob: by itself it matches any string of characters, including the empty string.
The * in a regex repeats the previous pattern zero or more times; for 'any string of characters' you need .*
Bash has extensive string manipulation functions. You can read about Bash string patterns in BashFAQ.
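For completeness, the suffix-removal operators % and %% mirror # and ##. A short sketch with the same test string (the pair, key and value names are just for illustration):

```shell
s='1=2=3=the rest'

echo "${s#*=}"    # shortest prefix matching *= removed: 2=3=the rest
echo "${s##*=}"   # longest prefix matching *= removed:  the rest
echo "${s%=*}"    # shortest suffix matching =* removed: 1=2=3
echo "${s%%=*}"   # longest suffix matching =* removed:  1

# A common use: split a key=value pair at the first "=".
pair='name=MyTrack1'
key=${pair%%=*}
value=${pair#*=}
echo "$key -> $value"
```

These expansions are part of POSIX sh, so they are not limited to Bash.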

Find all text between $...$ delimiters using bash script

I have a text file, and I'm trying to get an array of strings containing between $..$ delimiters (LaTeX formulas) using bash script. My current code doesn't work, result is empty:
#!/bin/bash
array=($(grep -o '\$([^\$]*)\$' test.txt))
echo ${array[@]}
I tested this regex here, it finds the matches. I use the following test string:
b5f1e7$bfc2439c621353$d1ce0$629f$b8b5
Expected result is
bfc2439c621353 629f
But echo returns empty. Although if I use '[0-9]\+' it works:
5 1 7 2439 621353 1 0 629 8 5
What do I do wrong?
How about:
grep -o '\$[^$]*\$' test.txt | tr -d '$'
This is basically performing your original grep (but without the parentheses, which grep's default basic regex syntax treats as literal characters, so they were preventing a match), then removing the $ characters from each match.
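Since the question asks for an array, here is a sketch that feeds that pipeline into mapfile. It assumes bash 4 or newer, and it reads the sample string from a here-string instead of test.txt so that it is self-contained.

```shell
#!/bin/bash
s='b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'

# One array element per line of output, so matches that contain
# spaces would not be split the way array=($(...)) splits them.
mapfile -t array < <(grep -o '\$[^$]*\$' <<<"$s" | tr -d '$')

echo "${array[@]}"    # bfc2439c621353 629f
echo "${#array[@]}"   # 2
```

To read from the file instead, replace <<<"$s" with test.txt.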
You may use awk with input field separator as $:
s='b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
awk -F '$' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$s"
Note that this awk command doesn't validate input. If you want awk to allow for only valid inputs then you may use this gnu awk command with FPAT:
awk -v FPAT='\\$[^$]*\\$' '{for (i=1; i<=NF; i++) {gsub(/\$/, "", $i); print $i}}' <<< "$s"
bfc2439c621353
629f
What about this?
grep -Eo '\$[^$]+\$' a.txt | sed 's/\$//g'
I'm using sed to replace the $.
Try escaping your parentheses:
tst> grep -o '\$\([^\$]*\)\$' test.txt
$bfc2439c621353$
$629f$
Of course, you then have to strip out the $ signs (-o prints the entire match). You can try sed instead:
tst> sed 's/[^\$]*\$\([^\$]*\)\$[^\$]*/\1\n/g' test.txt
bfc2439c621353
629f
Why is your expected output given b5f1e7$bfc2439c621353$d1ce0$629f$b8b5 the two elements bfc2439c621353 629f rather than the three elements bfc2439c621353 d1ce0 629f?
Here's a single grep command to extract those:
$ grep -Po '\$\K[^\$]*(?=\$)' <<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
(This requires GNU grep as compiled with libpcre for -P)
This uses \$\K (roughly equivalent to (?<=\$)) to look behind at the first $ and (?=\$) to look ahead to the next $. \K drops the opening $ from the reported match, and because the closing $ is only looked ahead at rather than consumed, it can serve as the opening $ of the next match. That is why d1ce0 is found as well.
Here's a single POSIX sed command to extract those:
$ sed 's/^[^$]*\$//; s/\$[^$]*$//; s/\$/\n/g' \
<<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
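This does not use any GNU-only options. It removes the leading and trailing portions that aren't wanted, then replaces each $ with a newline. One caveat: POSIX leaves the meaning of \n in the replacement text unspecified, so on some sed implementations you may need a backslash followed by a literal newline there instead.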
Using bash regex:
var="b5f1e7\$bfc2439c621353\$d1ce0\$629f\$b8b5" # string to var
while [[ $var =~ ([^$]*\$)([^$]*)\$(.*) ]] # matching
do
echo -n "${BASH_REMATCH[2]} " # 2nd element has the match
var="${BASH_REMATCH[3]}" # 3rd is the rest of the string
done
echo # trailing newline
bfc2439c621353 629f

Get all strings after the 4th occurrence of the pattern is found in bash

Starting with a string like:
String=1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS.
I need a regular expression that matches everything after the 4th colon ":" so that I can assign it to a variable in a shell script, like:
var_result="Searching done for the string:SUCCESS."
Using shell (bash or POSIX)
$ string="1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS."
$ echo "${string#*:*:*:*:}"
Searching done for the string:SUCCESS.
${string#*:*:*:*:} is an example of prefix removal. It removes a prefix consisting of four colon-separated strings.
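If the number of fields to drop is not fixed, the same prefix removal can be applied in a loop. A minimal sketch, assuming : as the separator; skip_fields is a made-up name, and returning an empty string when there are too few fields is my own choice.

```shell
# skip_fields is a hypothetical helper used only for this sketch.
# $1 = string, $2 = number of leading colon-separated fields to drop.
skip_fields() {
    local str=$1 n=$2
    while [ "$n" -gt 0 ]; do
        case $str in
            *:*) str=${str#*:} ;;          # drop one field and its colon
            *)   str=''; break ;;          # fewer fields than requested
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$str"
}

string='1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS.'
skip_fields "$string" 4
```

With the sample string this prints Searching done for the string:SUCCESS.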
The output can be saved in a shell variable:
$ var_result=${string#*:*:*:*:}
$ echo "$var_result"
Searching done for the string:SUCCESS.
Using cut
cut works for this:
$ string="1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS."
$ cut -d: -f 5- <<<"$string"
Searching done for the string:SUCCESS.
The above selects the fifth field and all succeeding fields where fields are separated by colons. More specifically, -d: tells cut to use : as the field separator and -f 5- tells it to select field 5 and everything after.
To save the output in a variable, we use command substitution:
$ var_result=$(cut -d: -f 5- <<<"$string")
$ echo "$var_result"
Searching done for the string:SUCCESS.
If you just have a POSIX shell, not bash, then we need to use echo:
$ var_result=$(echo "$string" | cut -d: -f 5-)
$ echo "$var_result"
Searching done for the string:SUCCESS.
Or, safer still, printf:
$ var_result=$(printf "%s" "$string" | cut -d: -f 5-)
$ echo "$var_result"
Searching done for the string:SUCCESS.
Using sed
The following uses sed to remove the first four fields defined by colons:
$ sed -E 's/([^:]*:){4}//' <<<"$string"
Searching done for the string:SUCCESS.
More specifically:
[^:] matches any character except :.
[^:]*: matches any number of non-colons followed by a colon.
([^:]*:){4} matches exactly four colon separated fields.
s/([^:]*:){4}// is a substitute command which looks for the first four colon-separated columns and replaces them with an empty string.
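The same expression can also be written as a basic regular expression, for a sed without -E. The sketch below shows both spellings side by side; the leading ^ is my addition and only makes explicit that the match starts at the beginning of the line.

```shell
string='1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS.'

# Extended regex (-E): plain ( ) and { } are operators.
printf '%s\n' "$string" | sed -E 's/^([^:]*:){4}//'

# Basic regex (default): the same operators are written \( \) and \{ \}.
printf '%s\n' "$string" | sed 's/^\([^:]*:\)\{4\}//'
```

Both commands print Searching done for the string:SUCCESS.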
The following is the same but saves the result in a variable:
$ var_result=$(sed -E 's/([^:]*:){4}//' <<<"$string")
$ echo "$var_result"
Searching done for the string:SUCCESS.
The following is the same but good also for POSIX shells:
$ var_result=$(printf '%s' "$string" | sed -E 's/([^:]*:){4}//')
$ echo "$var_result"
Searching done for the string:SUCCESS.
The following solution may help as well.
Let's say the variable has this value:
var="1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS."
echo "$var"
1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS.
echo "$var" | awk -F":" '{$1=$2=$3=$4="";sub(/^:+/,"");print $0}' OFS=":"
Searching done for the string:SUCCESS.
With bash regex you can say:
String="1973251922:197325192278:abcdefgh:0xfff689990:Searching done for the string:SUCCESS."
if [[ $String =~ ^([^:]*:){4}(.+)$ ]]; then
echo "${BASH_REMATCH[2]}"
fi

Find regular expression in a file matching a given value

I have some basic knowledge on using regular expressions with grep (bash).
But I want to use regular expressions the other way around.
For example I have a file containing the following entries:
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
Now I want to use bash to figure out to which line a particular number matches.
For example:
grep 8 file
should return:
line_three=[7-9]
Note: I am aware that the example of "grep 8 file" doesn't make sense, but I hope it helps to understand what I am trying to achieve.
Thanks for your help,
Marcel
As others have pointed out, awk is the right tool for this:
awk -F'=' '8~$2{print $0;}' file
... and if you want this tool to feel more like grep, a quick bash wrapper:
#!/bin/bash
awk -F'=' -v seek_value="$1" 'seek_value~$2{print $0;}' "$2"
Which would run like:
./not_exactly_grep.sh 8 file
line_three=[7-9]
My first impression is that this is not a task for grep, maybe for awk.
Trying to do things with grep I only see this:
for line in $(cat file); do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done
Using while for file reading (following comments):
while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
This can be done in native bash using the syntax [[ $value =~ $regex ]] to test:
find_regex_matching() {
local value=$1
while IFS= read -r line; do # read from input line-by-line
[[ $line = *=* ]] || continue # skip lines not containing an =
regex=${line#*=} # prune everything before the = for the regex
if [[ $value =~ $regex ]]; then # test whether we match...
printf '%s\n' "$line" # ...and print if we do.
fi
done
}
...used as:
find_regex_matching 8 <file
...or, to test it with your sample input inline:
find_regex_matching 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
...which properly emits:
line_three=[7-9]
You could replace printf '%s\n' "$line" with printf '%s\n' "${line%%=*}" to print only the key (contents before the =), if so inclined. See the bash-hackers page on parameter expansion for a rundown on the syntax involved.
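One thing to be aware of: =~ succeeds if the regex matches anywhere in the value, so 18 would also match [7-9]. If that is not what you want, you can anchor the pattern. The sketch below is a variant of the function above, not a replacement for it; the name find_regex_matching_exact is made up, and it prints only the key.

```shell
#!/bin/bash
# find_regex_matching_exact is a hypothetical variant used only for this sketch.
# It reads key=regex lines on stdin and prints the keys whose regex
# matches the WHOLE value given as $1.
find_regex_matching_exact() {
    local value=$1 line regex
    while IFS= read -r line; do
        [[ $line = *=* ]] || continue          # skip lines without an =
        regex=${line#*=}                       # text after the first =
        # ^( ... )$ anchors the stored regex to the full value
        if [[ $value =~ ^($regex)$ ]]; then
            printf '%s\n' "${line%%=*}"        # print only the key
        fi
    done
}

find_regex_matching_exact 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
```

With the sample input this prints line_three, while a value of 18 prints nothing.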
This is not built-in functionality of grep, but it's easy to do with awk, with a change in syntax:
/[0-3]/ { print "line one" }
/[4-6]/ { print "line two" }
/[7-9]/ { print "line three" }
If you really need to, you could programmatically change your input file to this syntax, if it doesn't contain any characters that need escaping (mainly / in the regex or " in the string):
sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#'
As I understand it, you are looking for a range that includes some value.
You can do this in gawk (the three-argument form of match() is a GNU awk extension):
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
$ awk -v n=8 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[7-9]
Since the digits are being treated as numbers (vs a regex) it supports larger ranges:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[75-95]
line_four=[55-105]
$ awk -v n=92 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[75-95]
line_four=[55-105]
If you are just looking to interpret the right hand side of the = as a regex, you can do:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
You would like to do something like
grep -Ef <(cut -d= -f2 file) <(echo 8)
This will grep what you want but will not display where it matched.
With sed you can show a message:
echo "8" | sed -n '/[7-9]/ s/.*/Found it in line_three/p'
Now you would like to transfer your regexp file into such commands:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file
Store these commands in a virtual command file and you will have
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)
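Here is the whole idea as one self-contained sketch, so you can see the generated script before it is run. The wrapper name lookup_with_sed is made up, the sample file is created with mktemp, and keys and regexes are assumed to contain no / characters.

```shell
#!/bin/bash
# lookup_with_sed is a hypothetical wrapper name used only for this sketch.
# $1 = value to test, $2 = file with key=regex lines.
# Assumes neither keys nor regexes contain "/".
lookup_with_sed() {
    local value=$1 rules=$2
    printf '%s\n' "$value" |
        sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' "$rules")
}

rules=$(mktemp)
printf '%s\n' 'line_one=[0-3]' 'line_two=[4-6]' 'line_three=[7-9]' >"$rules"

# The generated sed script, one command per line of the rules file:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' "$rules"

# Run the generated script against the value:
lookup_with_sed 8 "$rules"

rm -f "$rules"
```

The last command prints Found at line_three. Because each generated command rewrites the pattern space, later rules are tested against the "Found at ..." text rather than the original value, so a key containing characters that a later regex matches could produce extra output.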