Replace Random Characters in a Variable - regex

I want to replace the random part of a variable's value (it can contain digits, letters, or any string of characters) with my own string.
$ echo $VAR
http://some-random-string.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352........
So far I've tried these commands, but none of them works, so I think the regex needs fixing:
$ echo $VAR | sed -e "s/\(http[^^]*\).*\(.watch\)/\1"mystring"\2/g"
$ echo $VAR | sed -e "s/\(https\?:\/\/\).*\(.watch\)/\1"mystring"\2/g"
$ echo $VAR | sed -e "s/\(http[s]\?:\/\/\).*\(.watch\)/\1"mystring"\2/g"
I'm aware that there are questions that answer similar queries, but they have not been of help.

echo "$VAR" | sed 's|\(http[s]*://\)[^.]*\(.*\)|\1mystring\2|'
explanation
s| # substitute
\(http[s]*://\) # save first part in arg1 (\1)
[^.]* # everything up to the first '.'
\(.*\) # save the rest in arg2 (\2)
|\1 # print arg1
mystring # print your replacement
\2 # print arg2
|
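In sed's basic regular expressions, \? and \+ are GNU extensions. If the same command has to cope with both http:// and https:// on a non-GNU sed, the POSIX interval \{0,1\} can make the s optional. A minimal sketch, assuming the random part is always the first label of the host name (everything from :// up to the first dot); replace_label is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: replace the first host label of an http(s) URL.
# Assumes the random part runs from after "://" up to the first ".".
replace_label() {
  # $1 = URL, $2 = replacement text (must not contain "|", "&" or "\")
  # \{0,1\} is the POSIX way to say "optional", so this also works
  # with seds that lack GNU's \? extension.
  printf '%s\n' "$1" | sed 's|^\(https\{0,1\}://\)[^.]*|\1'"$2"'|'
}
```

For example, replace_label "$VAR" mystring. The replacement is spliced into the sed script, so it should not contain |, & or \.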

In sed you have to escape any character that is also used as the delimiter (here, the forward slash) in the pattern and the replacement:
echo $VAR | sed 's/http.\/\/.*\.watch\.film\.tv/http:\/\/mystring.watch.film.tv/'

You don't need sed. This task can be done in just shell:
$ var='http://some-random-string.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352'
$ echo "${var%%//*}//mystring.watch${var#*.watch}"
http://mystring.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352
How it works:
${var%%//*} returns the value of $var with the first // and everything after it removed.
//mystring.watch adds the string that we want.
${var#*.watch} returns the value of $var with everything up to and including the first occurrence of .watch removed.
Because this approach does not require pipelines or subshells, it will be more efficient.
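The expansion above relies on the string .watch being present. If the domain after the random part is not known in advance, the same idea works by cutting at the first . after the //. A sketch using only POSIX parameter expansion, assuming the URL contains // and the host contains at least one dot; swap_first_label is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: swap the first host label using only parameter expansion.
# Assumes the URL contains "//" and at least one "." after it.
swap_first_label() {
  url=$1 new=$2
  scheme=${url%%//*}   # text before the first "//", e.g. "http:"
  rest=${url#*//}      # text after the first "//"
  after=${rest#*.}     # text after the first "." of the host
  printf '%s//%s.%s\n' "$scheme" "$new" "$after"
}
```

Because it never mentions http or https explicitly, the scheme is carried over unchanged.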

With GNU sed:
$ echo "$VAR" | sed -E 's/^(http:\/\/)\S+(\.watch\.film\.tv\/)/\1mystring\2/i'


Get substring using either perl or sed

I can't seem to get a substring correctly.
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g')
That still returns bugfix/US3280841-something-duh.
If I try to use Perl instead:
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9]|[A-Z0-9])+/; print $1');
That outputs nothing.
What am I doing wrong?
Using bash parameter expansion only:
$: # don't use caps; see below.
$: declare branch="bugfix/US3280841-something-duh"
$: tmp="${branch##*/}"
$: echo "$tmp"
US3280841-something-duh
$: trimmed="${tmp%%-*}"
$: echo "$trimmed"
US3280841
Which means:
$: tmp="${branch##*/}"
$: trimmed="${tmp%%-*}"
does the job in two steps without spawning extra processes.
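The two steps can be wrapped in a small function so they can be reused for other branch names. A sketch assuming every branch looks like <type>/<id>-<description>; ticket_id is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: extract the ticket id from a branch name such as
# "bugfix/US3280841-something-duh" (assumed layout: <type>/<id>-<text>).
ticket_id() {
  tmp=${1##*/}                 # drop everything up to the last "/"
  printf '%s\n' "${tmp%%-*}"   # drop everything from the first "-"
}
```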
In sed,
$: sed -E 's#^.*/([^/-]+)-.*$#\1#' <<< "$branch"
This says "after any or no characters followed by a slash, remember one or more that are not slashes or dashes, followed by a not-remembered dash and then any or no characters, then replace the whole input with the remembered part."
Your original pattern was
's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g'
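This says "remember any number of anything followed by a slash, then a lowercase letter or a digit, then a literal pipe character, then a capital letter or digit, then a literal plus sign, and then replace it all with what you remembered." In POSIX basic regular expressions, alternation and one-or-more need -E (as | and +); GNU sed also accepts \| and \+ as extensions, but other seds may treat them literally.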
GNU's manual is your friend. I look stuff up all the time to make sure I'm doing it right. Sometimes it still takes me a few tries, lol.
An aside: try not to use all-capital variable names. By convention they are used for environment variables and for variables that are special to the shell, like RANDOM or IFS, so your own names could clash with them.
You may use this sed:
sed -E 's~^.*/|-.*$~~g' <<< "$BRANCH_NAME"
US3280841
Or this awk:
awk -F '[/-]' '{print $2}' <<< "$BRANCH_NAME"
US3280841
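The awk version splits on either / or -. Bash's read can do the same split with IFS, without starting another process. A sketch assuming bash (for the here-string); kind, id and rest are illustrative variable names:

```shell
#!/usr/bin/env bash
# Split on "/" and "-" like awk -F '[/-]', using the shell's own field splitting.
branch="bugfix/US3280841-something-duh"
# IFS applies only to this read; the last variable receives the remainder.
IFS='/-' read -r kind id rest <<< "$branch"
printf '%s\n' "$id"
```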
sed 's:[^/]*/\([^-]*\)-.*:\1:'<<<"bugfix/US3280841-something-duh"
The Perl version just has the + in the wrong place; it should be inside the capture group:
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9A-Z]+)/; print $1');
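In your sed command, just add a ^ before A-Z0-9, so that bracket expression matches everything except capital letters and digits (this relies on GNU sed's \| and \+):
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[^A-Z0-9]\+/\1/g')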
Alternatively and briefly, you can use
TRIMMED=$(echo $BRANCH_NAME | sed "s/[a-z\/\-]//g" )
too.
Type in a shell terminal:
$ BRANCH_NAME="bugfix/US3280841-something-duh"
$ echo $BRANCH_NAME| perl -pe 's/.*\/(\w\w[0-9]+).+/\1/'
Use the s (substitute) command instead of m (match).
The same expression also works with GNU sed -E in place of perl -pe, since GNU sed supports \w as an extension.
Another variant using Perl Regular Expression Character Classes (see perldoc perlrecharclass).
echo $BRANCH_NAME | perl -nE 'say m/^.*\/([[:alnum:]]+)/;'
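The same capture can also be done with Bash's own =~ operator, which stores the groups in BASH_REMATCH. A sketch assuming bash; keeping the regex in a variable avoids quoting surprises inside [[ ]]:

```shell
#!/usr/bin/env bash
branch="bugfix/US3280841-something-duh"
# Leftmost match: the first "/" followed by one or more letters/digits.
re='/([[:alnum:]]+)'
if [[ $branch =~ $re ]]; then
  trimmed=${BASH_REMATCH[1]}   # contents of the first capture group
  printf '%s\n' "$trimmed"
fi
```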

How to parse every match of sed command

I have a string [u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3'] and I would like to process every element matched by my sed command. The elements to match are the ones in single quotes. Here is my script:
#!/bin/bash
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for id in $(sed -n "s/^.*'\(.*\)'.*$/\1/ p" <<< ${ARR});
do
echo "$id"
done
I have only the first value returned.
The wildcard .* will match the longest leftmost possible string. If your intention is to match the individual substrings which are in single quotes, try
grep -o "'[^']*'" <<<"$ARR"
To remove the single quotes around the values, simply pipe the output to sed "s/'//g". To loop over the lines printed by a pipe, do
... commands ... |
while read -r id; do
: things with "$id"
done
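Put together, the grep, sed and while-read pieces described above might look like this. A sketch assuming bash (for the here-string); list_quoted is a hypothetical helper name, and grep -o is supported by GNU and BSD grep but is not required by POSIX:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each single-quoted value on its own line.
list_quoted() {
  grep -o "'[^']*'" <<< "$1" |   # one quoted value per line
    sed "s/'//g"                 # strip the quotes
}

arr="[u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3']"
list_quoted "$arr" |
while IFS= read -r id; do
  printf 'id=%s\n' "$id"
done
```

The while loop is part of a pipeline, so in bash it runs in a subshell and variables set inside it are not visible afterwards.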
Bash can match regular expressions with =~ (see man bash). Matching more than once is a bit painful, but in your case we can split the input on whitespace and match once per item:
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for A in $ARR
do
[[ $A =~ u\'(.+)\' ]] && echo ${BASH_REMATCH[1]}
done
results in
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1
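Matching more than once without splitting on whitespace is possible too: match, drop everything up to and including the matched text, and repeat. A sketch assuming bash; quoted_values is a hypothetical name. Unlike the whitespace split, it also handles values that contain spaces:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every '...' value in $1, one per line.
quoted_values() {
  local s=$1 re="'([^']*)'"
  while [[ $s =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    # Remove everything up to and including the text just matched;
    # the quotes make the matched text a literal pattern.
    s=${s#*"${BASH_REMATCH[0]}"}
  done
}
```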
Is this what you're trying to do?
$ ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
$ awk -v RS="'" '!(NR%2)' <<< "$ARR"
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1
$ awk -v RS="'" '!(NR%2)' <<< "$ARR" |
while IFS= read -r id; do echo "id=$id"; done
id=SOMEVALUE1
id=SOMEVALUE1
id=SOMEVALUE1
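If the values are needed after the loop, reading them into an array avoids the pipeline subshell. A sketch assuming bash 4 or later (for mapfile):

```shell
#!/usr/bin/env bash
arr="[u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3']"
# With RS="'", every even-numbered record is a quoted value.
mapfile -t ids < <(awk -v RS="'" '!(NR%2)' <<< "$arr")
for id in "${ids[@]}"; do
  printf 'id=%s\n' "$id"
done
```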

sed: struggling with substitution and regex for ^*=

I am running a Linux bash script. From stdout lines like /gpx/trk/name=MyTrack1, I want to keep only the end of the line after the =.
I am struggling to understand why the following sed command is not working as I expect:
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
(I also tried)
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*\=//"
The return is always /gpx/trk/name=MyTrack1 and not MyTrack1
An even simpler way if this is the only structure you are concerned about:
echo "/gpx/trk/name=MyTrack1" | cut -d = -f 2
Simply try:
echo "/gpx/trk/name=MyTrack1" | sed 's/.*=//'
Second solution: with another sed.
echo "/gpx/trk/name=MyTrack1" | sed 's/\(.*=\)\(.*\)/\2/'
Explanation (added at the OP's request):
s: tells sed to perform a substitution.
\(.*=\): the first capture group. Because .* is greedy, it holds everything from the start of the line up to and including the last =, so /gpx/trk/name= goes into group 1.
\(.*\): the second capture group, which holds everything after that =, i.e. MyTrack1.
/\2/: replaces the whole line with only the second group, MyTrack1.
Third solution: with awk, assuming your input looks like the samples shown.
echo "/gpx/trk/name=MyTrack1" | awk -F'=' '{print $2}'
Fourth solution: with awk's match().
echo "/gpx/trk/name=MyTrack1" | awk 'match($0,/=.*$/){print substr($0,RSTART+1,RLENGTH-1)}'
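These solutions agree on the sample line but behave differently once a line contains more than one =. A quick comparison on a made-up input line (the expected output of each command is in its comment):

```shell
#!/bin/sh
line='/gpx/trk/name=My=Track'   # hypothetical input with two "="
printf '%s\n' "$line" | cut -d = -f 2            # My        (second field)
printf '%s\n' "$line" | sed 's/.*=//'            # Track     (after the last "=")
printf '%s\n' "$line" | awk -F'=' '{print $2}'   # My        (second field)
printf '%s\n' "$line" | awk 'match($0,/=.*$/){print substr($0,RSTART+1,RLENGTH-1)}'   # My=Track (after the first "=")
```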
$ echo "/gpx/trk/name=MyTrack1" | sed -e "s/^.*=//"
MyTrack1
The regular expression ^.*= matches anything up to and including the last = in the string.
Your regular expression ^*= would match the literal string *= at the start of a string, e.g.
$ echo "*=/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
/gpx/trk/name=MyTrack1
In a regular expression, * usually applies to the immediately preceding expression so that zero or more of it may be matched. When * occurs at the start of an expression, or right after a leading ^ as here, it matches a literal * instead.
Not to take you off the sed track, but this is easy with Bash alone:
$ s="/gpx/trk/name=MyTrack1"
$ echo "$s"
/gpx/trk/name=MyTrack1
$ echo "${s##*=}"
MyTrack1
The ##*= expansion removes the longest match of the pattern *= from the beginning of the string, i.e. everything up to and including the last =:
$ s="1=2=3=the rest"
$ echo "${s##*=}"
the rest
The equivalent in sed would be:
$ echo "$s" | sed -E 's/^.*=(.*)/\1/'
the rest
Whereas #*= removes the shortest match, i.e. everything up to and including the first =:
$ echo "${s#*=}"
2=3=the rest
And in sed:
$ echo "$s" | sed -E 's/^[^=]*=(.*)/\1/'
2=3=the rest
Note the difference in * in Bash string functions vs a sed regex:
The * in Bash (in this context) is glob-like: on its own it means 'any string of characters'.
The * in a regex applies to the previous pattern; for 'any string of characters' you need .*
Bash has extensive string manipulation functions. You can read about Bash string patterns in BashFAQ.
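The suffix-removal counterparts work the same way from the end of the string: % removes the shortest match and %% the longest. A small sketch on the same kind of input:

```shell
#!/bin/sh
s="1=2=3=the rest"
shortest=${s%=*}    # "1=2=3"  (cut at the last "=")
longest=${s%%=*}    # "1"      (cut at the first "=")
printf '%s\n' "$shortest" "$longest"
```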

Replace a string in bash script using regex for both linux and AIX

I have a bash script that I copy and run on both linux and AIX servers.
This script gets a "name" parameter which represents a file name, and I need to manipulate this name via regex (the purpose is irrelevant and very hard to explain).
From the name parameter I need to take the beginning until the first "-" character that is followed by a digit, and then concat it with the last "." character until the end of the string.
For example:
name: abcd-efg-1.23.4567-8.jar will become: abcd-efg.jar
name: abc123-abc3.jar will remain: abc123-abc3.jar
name: abc-890.jar will become: abc.jar
I've tried several variations of:
name=$1
regExpr="^(.*?)-\d.*\.(.*?)$/g"
echo $name
echo $(printf ${name} | sed -e $regExpr)
Also I can't use sed -r (seen in some examples) because AIX sed does not support the -r flag.
The last line is the problem of course; I think I need to somehow use $1 + $2 placeholders, but I can't seem to get it right.
How can I change my regex so that it does what I want?
Given the file:
abcd-efg-1.23.4567-8.jar
abc123-abc3.jar
abc-890.jar
This is a way to change the names you give:
$ sed 's/\(.\?\)-[0-9].*\(\.[a-z]*\)$/\1\2/' file
abcd-efg.jar
abc123-abc3.jar
abc.jar
Which is equivalent to (if you could use -r):
$ sed -r 's/(.?)-[0-9].*(\.[a-z]*)$/\1\2/' file
abcd-efg.jar
abc123-abc3.jar
abc.jar
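It matches a "-" followed by a digit, plus at most one character before it, which it stores in \1.
It stores the last "." and the letters after it in \2.
Finally it replaces the match with those two groups; the text before the match is never touched, so it is kept as-is. Note that \? is a GNU extension, so the first command may not work with AIX sed.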
Note that the extension could also be fetched with a parameter expansion such as "${line##*.}", and the basename command can strip a known suffix (for example basename "$line" .jar).
You could use this:
perl -F'(-(?:\d)|\.)' -ane 'print "$F[0].$F[$#F]"'
It splits the input on any "-" followed by a digit, or on any ".". Then it prints the first field, followed by a dot, followed by the last field.
Testing it out:
$ cat file
abcd-efg-1.23.4567-8.jar
abc123-abc3.jar
abc-890.jar
$ perl -F'(-(?:\d)|\.)' -ane 'print "$F[0].$F[$#F]"' file
abcd-efg.jar
abc123-abc3.jar
abc.jar
In sed, you could simply use the following.
#!/bin/sh
STRING=$( cat <<EOF
abcd-efg-1.23.4567-8.jar
abc123-abc3.jar
abc-890.jar
EOF
)
echo "$STRING" | sed 's/-[0-9].*\(\.[^.][^.]*\)$/\1/'
# abcd-efg.jar
# abc123-abc3.jar
# abc.jar
This matches a hyphen followed by a digit and everything after it, and replaces all of that with the file extension.
Or you may consider using a Perl one-liner:
echo "$STRING" | perl -pe 's/-\d.*(?=\.[^.]+$)//'
# abcd-efg.jar
# abc123-abc3.jar
# abc.jar
When a regex match succeeds, Perl captures whatever is matched inside each pair of parentheses ( .. ) as $1, $2, etc.
$ perl -e 'my $arg = $ARGV[0]; $arg =~ /^(.*?)-\d.*\.(.*?)$/; print "$1.$2\n"; ' abc-890.jar
abc.jar
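Since the script has to run on AIX as well, the transformation can also be done without sed, using only case and POSIX parameter expansion. A sketch assuming a POSIX-style shell (such as ksh on AIX) and that every name containing "-" followed by a digit also has an extension after its last "."; strip_version is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: "abcd-efg-1.23.4567-8.jar" -> "abcd-efg.jar".
strip_version() {
  name=$1
  case $name in
    *-[0-9]*)
      base=${name%%-[0-9]*}   # keep text before the first "-" + digit
      ext=${name##*.}         # keep text after the last "."
      printf '%s.%s\n' "$base" "$ext" ;;
    *)
      printf '%s\n' "$name" ;;   # no "-" + digit: leave the name alone
  esac
}
```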

How do I form the correct regular expression to capture everything before parentheses?

Currently I have a set of strings in the format
customName(path/to/the/relevant/directory|file.ext#FileRefrence_12345)
From these I would like to extract customName, i.e. the characters before the first parenthesis, using sed.
My best guesses so far are:
echo $s | sed 's/([^(])+\(.*\)/\1/g'
echo $s | sed 's/([^\(])+\(.*\)/\1/g'
However, using these I get the error:
sed: -e expression #1, char 21: Unmatched ( or \(
So how do I form the correct regular expression? And why does it matter that I don't have a matching \(, if it is just an escaped character in my expression rather than a grouping character?
You could delete everything from the opening parenthesis onward, like this (note that in sed's default basic regular expressions a plain ( is a literal character, while \( starts a group):
echo 'customName(path/to/the/relevant/directory|file.ext#FileRefrence_12345)' |
sed -e 's/(.*//'
grep
kent$ echo "customName(blah)"|grep -o '^[^(]*'
customName
sed
kent$ echo "customName(blah)"|sed 's/(.*//'
customName
Note: I shortened the text between the parentheses.
Different options:
$ echo $s | sed 's/(.*//' #sed (everything before "(")
customName
$ echo $s | cut -d"(" -f1 #cut (delimiter is "(", print 1st block)
customName
$ echo $s | awk -F"(" '{print $1}' #awk (field separator is "(", print 1st)
customName
$ echo ${s%(*} #bash parameter expansion (removes the last "(" and everything after it)
customName
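${s%(*} removes the shortest suffix that starts with (, so it cuts at the last (. That is fine for these strings, but if one may contain more than one (, %% cuts at the first one instead. A sketch with a made-up string, assuming bash:

```shell
#!/usr/bin/env bash
s='customName(path/to(x)/file.ext#FileRefrence_12345)'   # hypothetical input
up_to_last=${s%\(*}     # shortest suffix from a "(": "customName(path/to"
up_to_first=${s%%\(*}   # longest suffix from a "(":  "customName"
printf '%s\n' "$up_to_last" "$up_to_first"
```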