Deleting everything after a pattern in Unix - regex

I have a string, replenishment_category string,Date string, and I want to delete everything starting with Date (including it), along with the comma before it if one is present.
I have the string to be replaced stored in a variable:
PARTITION_COLUMN='Date'
I tried sed to replace everything after the variable PARTITION_COLUMN
echo "replenishment_category string,Date string" | sed "s/"$PARTITION_COLUMN".* //g"
but the output still has the string that follows the date:
replenishment_category string,string
How do I remove the string part and also the comma preceding the Date?

Try this:
echo "replenishment_category string,Date string" | sed "s/$PARTITION_COLUMN.*//"
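Notice that the space after .* is removed and that the whole sed expression sits inside a single pair of double quotes. This still leaves the comma before Date in the output; to drop it too, the pattern has to include the comma.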

You could do this with shell parameter expansion alone, assuming you have extended globs enabled (shopt -s extglob):
$ var='replenishment_category string,Date string'
$ part_column=Date
$ echo "${var%%?(,)"$part_column"*}"
replenishment_category string
The ${word%%pattern} expansion works without extended globs and removes the longest match of pattern from the end of $word.
The ?(pattern) extended pattern matches zero or one occurrences of pattern and is used to remove the comma if present.
"$part_column"* matches any string that begins with the expansion of $part_column. Quoting it is not required in the example, but good practice to prevent glob characters from expanding.

Since you want to remove the variable's value as well as the comma before it, the following sed command may help:
echo "replenishment_category string,Date string" | sed "s/,$PARTITION_COLUMN.*//g"

sed would obviously work here:
echo "replenishment_category string,Date string" | sed "s/\b,$PARTITION_COLUMN.*//"
Output:
replenishment_category string
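The sed answers above either leave the comma or require it. A minimal sketch of two variants that treat the comma as optional, assuming the column name contains no regex metacharacters and no slash (the names input, with_sed and with_expansion are made up for illustration):

```shell
input='replenishment_category string,Date string'
PARTITION_COLUMN='Date'

# Variant 1: sed with extended regex (-E); ",?" matches zero or one comma.
# Assumes PARTITION_COLUMN holds no regex metacharacters and no "/".
with_sed=$(printf '%s\n' "$input" | sed -E "s/,?${PARTITION_COLUMN}.*//")

# Variant 2: plain parameter expansion, no extglob needed.
with_expansion=${input%%"$PARTITION_COLUMN"*}   # drop the longest suffix starting at Date
with_expansion=${with_expansion%,}              # drop one trailing comma, if any

printf '%s\n' "$with_sed" "$with_expansion"
```

Both variants should print replenishment_category string for this input. The second one avoids starting a sed process.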

Related

How to use sed to add double quotes around every word, excluding colons and commas

I want to alter a string so that I have double quotes around every "word," excluding colons and commas ':,'.
For example, my input may look like:
[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World,
JOB_NAME:Hello_Jenkins]
but I want it to appear as
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World",
"JOB_NAME":"Hello_Jenkins"]
I've been using something like (using '_' as the delimiter)
'echo ${params} | sed -i "s_\'/\\([^:]*\\):/i\'_\'"$1" :\'_g" '
based off of what I've found online, yet it makes no changes to my string.
> sed -r 's/[^], :[]+/"&"/g' file
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World", "JOB_NAME":"Hello_Jenkins"]
The sed command above excludes colons, commas, brackets and spaces, as in your example. If your example does not fully represent your input, you can modify the excluded characters, but the order of the brackets matters: a ] is only taken literally when it comes first in the bracket expression, right after the ^.
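The attempt in the question pipes echo into sed -i, but -i is meant for editing files in place. A rough sketch of the same substitution applied to a shell variable instead, assuming the string is held in params as in the question (quoted is a made-up name):

```shell
params='[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World, JOB_NAME:Hello_Jenkins]'

# Bracket expression: "]" comes first after "^" so it is literal,
# followed by "[", ",", ":" and a space. Every run of other characters is a "word".
quoted=$(printf '%s\n' "$params" | sed -E 's/[^][,: ]+/"&"/g')

printf '%s\n' "$quoted"
```

Here & in the replacement stands for the matched word, so each word is wrapped in double quotes.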
$ echo '[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World, JOB_NAME:Hello_Jenkins]' |
sed 's/[[:alnum:]_]\+/"&"/g'
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World", "JOB_NAME":"Hello_Jenkins"]
or if you have to exclude instead of include chars in the regexp:
$ echo '[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World, JOB_NAME:Hello_Jenkins]' |
sed 's/[^][,: ]\+/"&"/g'
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World", "JOB_NAME":"Hello_Jenkins"]

REGEX - How to get rid of quotation marks at the start and end of a string

I have a bunch of strings:
"pipe 1/4" square"
"3" bar"
"3/16" spanner
2" nozzle
spare tyre
I want to get rid of " marks from the start of the string and the end of the string with RegEx.
I've been trying on a simulator with the aid of some references but cannot seem to do it right.
Q: What is the RegEx that will do this with BASH?
Use the regex ^"|"$ to match a double quote at the start or at the end of a line, then replace the match with an empty string.
Using sed (\| is a GNU extension; with sed -E, write the alternation as a plain |):
sed 's/^"\|"$//g' <<<"$var"
Try the following command:
echo "$var" | sed 's/^\(.*\)"$/\1/'
This passes the value of $var to sed via the pipe | operator. Sed then substitutes the input line with the group matched inside the escaped parentheses, which is available as \1. So the output of sed is your input string minus the final quotation mark. A quotation mark at the start of the string is left in place.
Using Bash parameter expansion:
a="\"pipe 1/4\" square\""
a="${a/#\"/}" && a="${a/%\"/}"
echo "$a"
Output:
pipe 1/4" square
Explanation:
${var/old/new} replaces the first match of old with new in $var.
A # before old anchors the match at the beginning of $var.
A % before old anchors the match at the end of $var.
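As a variation on the expansion above, the # and % trimming operators do the same job. This is only a sketch; strip_outer_quotes is a hypothetical helper name, and the input lines are the samples from the question:

```shell
# Hypothetical helper: removes at most one leading and one trailing double quote.
strip_outer_quotes() {
  local s=$1
  s=${s#\"}   # drop one leading double quote, if present
  s=${s%\"}   # drop one trailing double quote, if present
  printf '%s\n' "$s"
}

# Run it over the sample strings; inner quotes are left alone.
while IFS= read -r line; do
  strip_outer_quotes "$line"
done <<'EOF'
"pipe 1/4" square"
"3" bar"
"3/16" spanner
2" nozzle
spare tyre
EOF
```

Strings without a quote at either end, such as spare tyre, pass through unchanged.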

How to match until the last occurrence of a character in bash shell

I am using curl and cut on an output like below.
var=$(curl https://avc.com/actuator/info | tr '"' '\n' | grep - | head -n1 | cut -d'-' -f -1, -3)
The variable var gets one of two kinds of values (one at a time).
HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b
HIX-R1-1-3b5126629f67892110165c524gbc5d5g1808c9b5
I am actually trying to get everything until the last '-', i.e. HIX_MAIN or HIX-R1-1.
The command shown works fine to get HIX-R1-1.
But I figured this is the wrong way to do it when there is only one - in the variable; it gets me the entire variable value (e.g. HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b).
How do I go about getting everything up to the last '-' into the variable var?
This removes everything from the last - to the end:
sed 's/\(.*\)-.*/\1/'
As examples:
$ echo HIX_MAIN-7ae52 | sed 's/\(.*\)-.*/\1/'
HIX_MAIN
$ echo HIX-R1-1-3b5126629f67 | sed 's/\(.*\)-.*/\1/'
HIX-R1-1
How it works
The sed substitute command has the form s/old/new/ where old is a regular expression. In this case, the regex is \(.*\)-.*. This works because the .* in \(.*\)- is greedy: it will match everything up to the last -. Because of the escaped parens, \(...\), everything before the last - will be saved in group 1, which we can refer to as \1. The final .* matches everything after the last -. Thus, as long as the line contains a -, this regex matches the whole line and the substitute command replaces the whole line with \1.
You can use bash string manipulation:
$ foo=a-b-c-def-ghi
$ echo "${foo%-*}"
a-b-c-def
The operators, # and % are on either side of $ on a QWERTY keyboard, which helps to remember how they modify the variable:
#pattern trims off the shortest prefix matching "pattern".
##pattern trims off the longest prefix matching "pattern".
%pattern trims off the shortest suffix matching "pattern".
%%pattern trims off the longest suffix matching "pattern".
where pattern follows the bash pattern matching rules, including ? (one character) and * (zero or more characters).
Here, we're trimming off the shortest suffix matching the pattern -*, so ${foo%-*} will get you what you want.
Of course, there are many ways to do this using awk or sed, possibly reusing the sed command you're already running. Variable manipulation, however, can be done natively in bash without launching another process.
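To see the four operators side by side, here is a small sketch using the foo value from above and the two sample values from the question (first and second are made-up variable names):

```shell
foo=a-b-c-def-ghi

printf '%s\n' "${foo#*-}"    # shortest prefix matching "*-" removed: b-c-def-ghi
printf '%s\n' "${foo##*-}"   # longest prefix matching "*-" removed:  ghi
printf '%s\n' "${foo%-*}"    # shortest suffix matching "-*" removed: a-b-c-def
printf '%s\n' "${foo%%-*}"   # longest suffix matching "-*" removed:  a

# The two kinds of values from the question.
first='HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b'
second='HIX-R1-1-3b5126629f67892110165c524gbc5d5g1808c9b5'
printf '%s\n' "${first%-*}" "${second%-*}"   # HIX_MAIN, then HIX-R1-1
```

If the value contains no - at all, ${value%-*} leaves it unchanged, since there is no matching suffix to remove.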
You can reverse the string with rev, cut from the second field and then rev again:
rev <<< "$VARIABLE" | cut -d"-" -f2- | rev
For HIX-R1-1----3b5126629f67892110165c524gbc5d5g1808c9b5, prints:
HIX-R1-1---
I think you should be using sed, at least after the tr:
var=$(curl https://avc.com/actuator/info | tr '"' '\n' | sed -n '/-/{s/-[^-]*$//;p;q}')
The -n means "don't print by default". The /-/ looks for a line containing a dash; it then executes s/-[^-]*$// to delete the last dash and everything after it, followed by p to print and q to quit (so it only prints the first such line).
I'm assuming that the output from curl intrinsically contains multiple lines, some of them with unwanted double quotes in them, and that you need to match only the first line that contains a dash at all (which might very well not be the first line). Once you've whittled the input down to the sole interesting line, you could use pure shell techniques to get the result that's desired, but getting the sole interesting line is not as trivial as some of the answers seem to be assuming.
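A way to try this pipeline without hitting the network is to feed it a stand-in for the curl output. The sample string below is invented for illustration (the real response is not shown in the question), and the extra ; before } is there only so the command also parses with non-GNU sed:

```shell
# Made-up sample standing in for the curl response.
sample='{"git":{"branch":"HIX-R1-1-3b5126629f67","dirty":"false"}}'

# tr puts every quoted token on its own line; sed takes the first line that
# contains a dash, deletes the last dash and what follows, prints it and quits.
var=$(printf '%s\n' "$sample" | tr '"' '\n' | sed -n '/-/{s/-[^-]*$//;p;q;}')

printf '%s\n' "$var"
```

With this sample the variable ends up holding HIX-R1-1.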

Extract string located after or between matched pattern(s)

Given a string "pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words"
Is it possible to extract string between "crop=" and the following space using bash and grep?
So if I match "crop=" how can I extract anything after it and before the following white space?
Basically, I need "720:568:0:4" to be printed.
I'd do it this way:
grep -o -E 'crop=[^ ]+' | sed 's/crop=//'
It uses sed which is also a standard command. You can, of course, replace it with another sequence of greps, but only if it's really needed.
I would use sed as follows:
echo "pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words" | sed 's/.*crop=\([0-9.:]*\)\(.*\)/\1/'
Explanation:
s/ : substitute
.*crop= : everything up to and including "crop="
\([0-9.:]*\) : match only numbers and '.' and ':' - I call this the backslash-bracketed expression
\(.*\) : match 'everything else' (probably not needed)
/\1/ : and replace with the first backslash-bracketed expression you found
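One caveat with the substitution above is that a line without crop= is printed unchanged. A possible variation, sketched here under the assumption that an empty result is preferable in that case, adds -n and the p flag so only substituted lines are printed (line, crop and none are made-up names):

```shell
line='pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words'

# -n suppresses automatic printing; the trailing p prints only when the
# substitution actually happened.
crop=$(printf '%s\n' "$line" | sed -n 's/.*crop=\([0-9.:]*\).*/\1/p')
printf '%s\n' "$crop"

# A line without crop= yields an empty result instead of the whole line.
none=$(printf '%s\n' 'pos:1 pts:2' | sed -n 's/.*crop=\([0-9.:]*\).*/\1/p')
```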
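awk patterns have no \1 backreference, but match() together with RSTART and RLENGTH does the job (5 is the length of "crop="):
awk 'match($0, /crop=[0-9:]+/) { print substr($0, RSTART + 5, RLENGTH - 5) }'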
Yet another way, with bash parameter expansion:
PAT="pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words"
RES=${PAT#*crop=}
echo "${RES%% *}"
First, remove everything up to and including the first crop= (#, shortest match from the left).
Then, remove everything from the first space onward (%%, longest match from the right).
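If crop= is missing, the two expansions above would return the first word of the line rather than nothing. A small sketch that guards against this, with extract_crop as a hypothetical helper name:

```shell
# Hypothetical helper: prints the value after crop=, or fails if there is none.
extract_crop() {
  case $1 in
    *crop=*) ;;        # marker present, carry on
    *) return 1 ;;     # marker missing: fail instead of printing a wrong word
  esac
  rest=${1#*crop=}             # drop everything up to and including the first crop=
  printf '%s\n' "${rest%% *}"  # keep what precedes the first space
}

extract_crop 'pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words'
```

The non-zero return status lets a caller distinguish "no crop field" from an empty value.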

Find and replace date/time in string

I have already made a regex which should work, but it doesn't work.
echo "FileName.17:09:2010 4.16.PM.720p.mp4" | sed -E 's/\d{2}:\d{2}:\d{4}\ (\d|\d{2})\.(\d{2}|\d)\.((AM)|(PM))//g'
Should output: FileName..720p.mp4
But instead outputs the same "FileName.17:09:2010 4.16.PM.720p.mp4".
Is \d a valid character class in sed? It is not part of POSIX regular expressions, which sed uses even with -E. Try replacing it with [0-9] or [[:digit:]]. See re_format.
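For reference, a sketch of the command from the question with \d swapped for [0-9], and the (\d|\d{2}) alternations written as {1,2}; the variable names are made up:

```shell
name='FileName.17:09:2010 4.16.PM.720p.mp4'

# Same structure as the regex in the question, using POSIX-compatible classes.
cleaned=$(printf '%s\n' "$name" |
  sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{4} [0-9]{1,2}\.[0-9]{1,2}\.(AM|PM)//g')

printf '%s\n' "$cleaned"
```

This gives FileName..720p.mp4, the output the question asked for, with both dots around the removed timestamp still present.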
#! /bin/sh
f="FileName.17:09:2010 4.16.PM.720p.mp4"
echo "${f%%.*}${f##*[AP]M}"
This works for any variable containing a string matching the pattern. See Parameter Expansion for shell variables.
With sed, just delete from the first dot to the AM or PM. Or, if the filename could have extraneous dots, delete from the first number followed by ':' up to [AP]M: 's/\.[0-9]\+:.*[AP]M//'
I think the sed way might be better, because if the pattern does not match, sed returns the original string. The shell expression fails for files that do not have AM or PM in the name: a mismatch would return some of the name twice, but error checking can be added easily.
echo "FileName.17:09:2010 4.16.PM.720p.mp4" | sed -e 's/\..*[AP]M\././'
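The error checking mentioned for the shell expression could look roughly like this. It is only a sketch: strip_timestamp is a hypothetical helper name, and it assumes the timestamp always ends in an uppercase AM or PM:

```shell
# Hypothetical helper: removes ".<timestamp>AM/PM" from a file name,
# or returns the name unchanged when no AM/PM marker follows a dot.
strip_timestamp() {
  case $1 in
    *.*[AP]M*)
      # Text before the first dot, plus text after the last AM/PM.
      printf '%s\n' "${1%%.*}${1##*[AP]M}"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

strip_timestamp 'FileName.17:09:2010 4.16.PM.720p.mp4'
strip_timestamp 'FileName.720p.mp4'
```

The first call should print FileName.720p.mp4; the second has no AM or PM, so the name comes back as it was instead of being partly duplicated.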
bash$ echo "FileName.17:09:2010 4.16.PM.720p.mp4"|ruby -e 's=gets.chomp.split(".");puts [s[0],s[-2,2]].join(".")'
FileName.720p.mp4