Extract parent url from header with bash script - regex

I have some time trying to get some part of a text (a header) on bash script but I couldn't. This is the string I have:
link: <https://api.some.com/v1/monitor/zzsomeLongIdzz?access_token=xxSomeLongTokenxx==>; rel="monitor",<https://api.some.com/v1/services/xx/something-more/accounts/2345?access_token=xxSomeLongTokenxx==>; rel="parent"
Here is with some format so you could see it better:
link:
<https://api.some.com/v1/monitor/zzsomeLongIdzz?access_token=xxSomeLongTokenxx==>; rel="monitor",
<https://api.some.com/v1/services/xx/something-more/accounts/2345?access_token=xxSomeLongTokenxx==>; rel="parent"
I need the second part, just the url, basically the values between ,< and >; rel="parent" and assign that to a variable, like:
my_url = $(echo $complete_header) <== some way to filter that
I have no idea how to apply some regex or pattern to extract the data I need. On the past I had use jq for filtering json responses, like this:
error_message=$(echo $response | jq '.["errors"]|.[0]|.["message"]')
But unfortunately for me, this is not a json. Could somebody point me on the right direction with that?

Use the following code:
#!/bin/bash
link='<https://api.some.com/v1/monitor/zzsomeLongIdzz?access_token=xxSomeLongTokenxx==>; rel="monitor",<https://api.some.com/v1/services/xx/something-more/accounts/2345?access_token=xxSomeLongTokenxx==>; rel="parent"'
re=",<([^>]+)>"
# look for ,< literally
# capture everything that is not a > one or more times ([^>]+)
# look for the closing > literally
my_url="test"
if [[ $link =~ $re ]]; then my_url=${BASH_REMATCH[1]}; fi
echo $my_url
See a demo on ideone.com.

Related

Replace unknown sub-string in an URL

I have an URL in the format like https://foo.bar.whoo.dum.io, for which I like to replace the foo string with something else. Of course, the foo part is unknown and can be anything.
I tried with a simple regex like (.+?)\.(.+), but it seems that regex in Bash is always greedy (or?).
My best attempt is to split the string by . and then join it back with the first part left out, but I was wondering, whether there is a more intuitive, different solution.
Thank you
There are a lot of ways of getting the desired output.
If you're sure the url will always start with https://, we can use parameter expansion to remove everything before the first . and then add the replacement you need:
input="https://foo.bar.whoo.dum.io"
echo "https://new.${input#*.}"
Will output
https://new.bar.whoo.dum.io
Try it online!
You can use sed:
url='https://foo.bar.whoo.dum.io'
url=$(sed 's,\(.*://\)[^/.]*,\1new_value,' <<< "$url")
Here, the sed command means:
\(.*://\) - Capturing group 1: any text and then ://
[^/.]* - zero or more chars other than / and .
\1new_value - replaces the match with the Group 1 and new_value is appended to this group value.
See the online demo:
url='https://foo.bar.whoo.dum.io'
sed 's,\(.*://\)[^/.]*,\1new_value,' <<< "$url"
# => https://new_value.bar.whoo.dum.io
1st solution: Using Parameter expansion capability of bash here, adding this solution. Where newValue is variable with new value which you want to have in your url.
url='https://foo.bar.whoo.dum.io'
newValue="newValue"
echo "${url%//*}//$newValue.${url#*.}"
2nd solution: With your shown samples, please try following sed code here. Where variable url has your shown sample url value in it.
echo "$url" | sed 's/:\/\/[^.]*/:\/\/new_value/'
Explanation: Simple explanation would be, printing shell variable named url value by echo command and sending it as a standard input to sed command. Then in sed command using its capability of substitution here. Where substituting :// just before1st occurrence of . with ://new_value as per requirement.

How do I perform a regex test in bash that starts with spaces and includes quotation marks?

I'm trying to write a bash script that will change the fill color of certain elements within SVG files. I'm inexperienced with shell scripting, but I'm good with regexes (...in JS).
Here's the SVG tag I want to modify:
<!-- is the target because its ID is exactly "the.target" -->
<path id="the.target" d="..." style="fill:#000000" />
Here's the bash code I've got so far:
local newSvg="" # will hold newly-written SVG file content
while IFS="<$IFS" read tag
do
if [[ "${tag}" =~ +id *= *"the\.target" ]]; then
tag=$(echo "${tag}" | sed 's/fill:[^;];/fill:${color};/')
fi
newSvg="${newSvg}${tag}"
done < ${iconSvgPath} # is an argument to the script
Explained: I'm using read (splitting the file on < via custom IFS) to read the SVG content tag by tag. For each tag, I test to see if it includes an id property with the exact value I want. If it doesn't, I add this tag as-is to a newSvg string that I will later write to a file. If the tag does have the desired ID, I'll used sed to replace fill:STUFF; with fill:${myColor};. (Note that my sed is also failing, but that's not what I'm asking about here.)
It fails to find the right line with the test [[ "${tag}" =~ +id *= *"the\.target" ]].
It succeeds if I change the test to [[ "${tag}" =~ \"the\.target\" ]].
I'm not happy with the working version because it's too brittle. While I don't intend to support all the flexibility of XML, I would like to be tolerant of semantically irrelevant whitespace, as well as the id property being anywhere within the tag. Ideally, the regex I'd like to write would express:
id (preceded by at least one whitespace)
followed by zero or more whitespaces
followed by =
followed by zero or more whitespaces
followed by "the.target"
I think I'm not delimiting the regex properly inside the [[ ... =~ REGEX ]] construction, but none of the answers I've seen online use any delimiters whatsoever. In javascript, regex literals are bounded (e.g. / +id *= *"the\.target"/), so it's straightforward beginning a regex with a whitespace character that you care about. Also, JS doesn't have any magic re: *, whereas bash is 50% magic-handling-of-asterisks.
Any help is appreciated. My backup plan is maybe to try to use awk instead (which I'm no better at).
EDIT: My sed was really close. I forgot to add + after the [^;] set. Oof.
It would be much easier if you define regular expression pattern in a variable :
tag=' id = "the.target"'
pattern=' +id *= *"the\.target"'
if [[ $tag =~ $pattern ]]; then
echo matched.
fi
Thank you for giving us such a clear example that regex is not the way to solve this problem.
A SVG file is an XML file, and a possible tool to modify these is xmlstarlet.
Try this script I called modifycolor:
#!/bin/bash
# invoke as: modifycolor <svg.file> <target_id> <new_color>
xmlstarlet edit \
--update "//path[#id = '$2']/#style" --value "fill:#$3" \
"$1"
Assuming the svg file is test.svg, invoke it as:
./modifycolor test.svg the.target ff0000
You will be astonished by the result.
If you want to paste a piece of code inside your bash script, try this:
target="the.target"
newSvg=$(xmlstarlet edit \
--update "//path[#id = '${target}']/#style" --value "fill:#${myColor}" \
"${iconSvgPath}")
Thanks to folks for pointing out the mistakes in my bash-fu, I came up with this code which does what I said I wanted. I will not be marking this as the accepted answer because, as folks have observed, regex is a bad way to operate on XML. Sharing this for posterity.
local newSvg="" # will hold newly-written SVG code
while IFS="<$IFS" read tag
do
if [[ "${tag}" =~ \ +id\ *=\ *\"the\.target\" ]]; then
tag=$(echo "${tag}" | sed -E 's/fill:[^;]+;/fill:'"${color}"';/')
fi
newSvg="${newSvg}${tag}"
done < ${iconSvgPath}
Fixes:
escape the whitespace in the regex: =~ \ +id\ *=\ *
for sed, switch to double-quotes for the variable in the pattern
also for sed, I added the -E extended regex flag in order to support the negated set [^;]
Re: XML, I'll be comparing the list of available CLI-friendly XML parsers to the set of tools commonly available on my users' machines.

Extract value of get parameter in shell

I have an input that could either be dn3321 or
https://domaincom/file?tag=dn3321 and I'm trying to parse the value of tag using shell.
Looks like a regex could do the trick, how would I write a one liner to detect if it's a URL if it is apply the regex to extract the value and if its not just use the value directly.
It's unclear from the question what the full space of possible inputs looks like, but, for the simple cases you gave, you can use parameter expansion:
#!/usr/bin/env bash
in1='dn3321'
in2='https://domaincom/file?tag=dn3321'
echo "${in1#*=}"
echo "${in2#*=}"
# prints "dn3321" twice
This works by removing the first = and any text preceding it.
If you just need to print out a very specific part of the string that is a url you can do it like this:
#!/bin/bash
url="https://domaincom/file?tag=dn3321"
if [[ "$url" =~ "${http,,}" ]] ; then
tag=$(echo $url | cut -d'=' -f2)
fi
if you need something more elaborate I can post an example.

Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)

I have a bunch of daily printer logs in CSV format and I'm writing a script to keep track of how much paper is being used and save the info to a database, but I've come across a small problem
Essentially, some of the document names in the logs include commas in them (which are all enclosed within double quotes), and since it's in comma separated format, my code is messing up and pushing everything one column to the right for certain records.
From what I've been reading, it seems like the best way to go about fixing this would be using awk or sed, but I'm unsure which is the best option for my situation, and how exactly I'm supposed to implement it.
Here's a sample of my input data:
2015-03-23 08:50:22,Jogn.Doe,1,1,Ineo 4000p,"MicrosoftWordDocument1",COMSYRWS14,A4,PCL6,,,NOT DUPLEX,GRAYSCALE,35kb,
And here's what I have so far:
#!/bin/bash
#Get today's file name
yearprefix="20"
currentdate=$(date +"%m-%d-%y");
year=${currentdate:6};
year="$yearprefix$year"
month=${currentdate:0:2};
day=${currentdate:3:2};
filename="papercut-print-log-$year-$month-$day.csv"
echo "The filename is: $filename"
# Remove commas in between quotes.
#Loop through CSV file
OLDIFS=$IFS
IFS=,
[ ! -f $filename ] && { echo "$Input file not found"; exit 99; }
while read time user pages copies printer document client size pcl blank1 blank2 duplex greyscale filesize blank3
do
#Remove headers
if [ "$user" != "" ] && [ "$user" != "User" ]
then
#Remove any file name with an apostrophe
if [[ "$document" =~ "'" ]];
then
document="REDACTED"; # Lazy. Need to figure out a proper solution later.
fi
echo "$time"
#Save results to database
mysql -u username -p -h localhost -e "USE printerusage; INSERT INTO printerlogs (time, username, pages, copies, printer, document, client, size, pcl, duplex, greyscale, filesize) VALUES ('$time', '$user', '$pages', '$copies', '$printer', '$document', '$client', '$size', '$pcl', '$duplex', '$greyscale', '$filesize');"
fi
done < $filename
IFS=$OLDIFS
Which option is more suitable for this task? Will I have to create a second temporary file to get this done?
Thanks in advance!
As I wrote in another answer:
Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas (with say |) instead:
s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g
And then splitting on | (assuming none of your data has | in it).
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern
There is probably an easier way using sed alone, but this should work. Loop on the file, for each line match the parentheses with grep -o then replace the commas in the line with spaces (or whatever it is you would like to use to get rid of the commas - if you want to preserve the data you can use a non printable and explode it back to commas afterward).
i=1 && IFS=$(echo -en "\n\b") && for a in $(< test.txt); do
var="${a}"
for b in $(sed -n ${i}p test.txt | grep -o '"[^"]*"'); do
repl="$(sed "s/,/ /g" <<< "${b}")"
var="$(sed "s#${b}#${repl}#" <<< "${var}")"
done
let i+=1
echo "${var}"
done

Using sed to remove all console.log from javascript file

I'm trying to remove all my console.log, console.dir etc. from my JS file before minifying it with YUI (on osx).
The regex I got for the console statements looks like this:
console.(log|debug|info|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*)\);?
and it works if I test it with the RegExr.
But it won't work with sed.
What do I have to change to get this working?
sed 's/___???___//g' <$RESULT >$RESULT_STRIPPED
update
After getting the first answer I tried
sed 's/console.log(.*)\;//g' <test.js >result.js
and this works, but when I add an OR
sed 's/console.\(log\|dir\)(.*)\;//g' <test.js >result.js
it doesn't replace the "logs":
Your original expression looks fine. You just need to pass the -E flag to sed, for extended regular expressions:
sed -E 's/console.(log|debug|info|...|count)\((.*)\);?//g'
The difference between these types of regular expressions is explained in man re_format.
To be honest I have never read that page, but instead simply tack on an -E when things don't work as expected. =)
You must escape ( (for grouping) and | (for oring) in sed's regex syntax. E.g.:
sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
UPDATE example:
$ sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
console.log # <- input line, not matches, no replacement printed on next line
console.log
console.log() # <- input line, matches, no printing
console.log(blabla); # <- input line, matches, no printing
console.log(blabla) # <- input line, matches, no printing
console.debug(); # <- input line, matches, no printing
console.debug(BAZINGA) # <- input line, matches, no printing
DATA console.info(ditto); DATA2 # <- input line, matches, printing of expected data
DATA DATA2
HTH
I also find the way to remove all the console.log ,
and i am trying to use python to do this,
but i find the Regex is not work for.
my writing like this:
var re=/^console.log(.*);?$/;
but it will match the following string:
'console.log(23);alert(234dsf);'
does it work? with the
"s/console.(log|debug|info|...|count)((.*));?//g"
I try this:
sed -E 's/console.(log|debug|info)( ?| +)\([^;]*\);//g'
See the test:
Regex Tester
Here's my implementation
for i in $(find ./dir -name "*.js")
do
sed -E 's/console\.(log|warn|error|assert..timeEnd)\((.*)\);?//g' $i > ${i}.copy && mv ${i}.copy $i
done
took the sed thing from github
I was feeling lazy and hoping to find a script to copy & paste. Alas there wasn't one, so for the lazy like me, here is mine. It goes in a file named something like 'minify.sh' in the same directory as the files to minify. It will overwrite the original file and it needs to be executable.
#!/bin/bash
for f in *.js
do
sed -Ei 's/console.(log|debug|info)\((.*)\);?//g' $f
yui-compressor $f -o $f
done
I'd just like to add here that I was running into issues with namespaced console.logs such as window.console.log. Also Tweenmax.js has some interesting uses of console.log in some parts such as
window.console&&console.log(t)
So I used this
sed -i.bak s/[^\&a-zA-Z0-9\.]console.log\(/\\/\\//g js/combined.js
The regex effectively says replace all console.logs that don't start with &, alphanumerics, and . with a '//' comment, which uglify later takes out.
Rodrigocorsi's works with nested parentheses. I added a ? after the ; because yuicompressor was omitting some semicolons.
It is probable that the reason this is not working is that you are not 'limiting'
the regex to not include a closing parenthesises ()) in the method parameters.
Try this regular expression:
console\.(log|trace|error)\(([^)]+)\);
Remember to include the rest of your method names in the capture group.