change ampersands in href - regex

I know s/&amp;/\&/g replaces all escaped ampersands with plain ampersands. I want to be pickier: I only want to replace escaped ampersands that are inside an href. I can't figure it out.
I was trying the following but it wasn't working:
echo "Link" | sed -E 's/^href="(.*)&amp;/\1&/g'
I also see another problem: it would only handle the first escaped ampersand, not all of them. Does anyone know what the solution might be?

Not sure how to do it with sed, but here's Ruby:
echo 'Link' | ruby -pe '$_.gsub!(/href="([^"]*)"/) { |h| h.gsub("&amp;", "&") }'
However, I fully support @muistooshort's comment: unless you're doing something weird, you should want the &amp; in there.
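For the sed side of the question, here is a minimal sketch rather than a definitive answer. It assumes the escaped ampersand is the entity &amp;, that href values are double-quoted, and that each href sits on a single line. The function name unescape_href_ampersands is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: turn &amp; into & only inside double-quoted href values.
unescape_href_ampersands() {
  # Each pass rewrites one &amp; inside an href (the last one, because
  # [^"]* is greedy); "ta" jumps back to label "a" while a substitution
  # still succeeds, so every occurrence is eventually handled.
  sed -E -e ':a' -e 's/(href="[^"]*)&amp;/\1\&/' -e 'ta'
}

# Usage: the &amp; in the link text is left untouched.
printf '%s\n' '<a href="page?a=1&amp;b=2&amp;c=3">A &amp; B</a>' | unescape_href_ampersands
```

The loop is what addresses the "only the first instance" problem: the g flag alone would not help, because each match consumes the href prefix.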

perl -e '$url=$ARGV[0]; while ( $url =~ s/(Link'
Easily amended to run through a file

Related

How do I perform a regex test in bash that starts with spaces and includes quotation marks?

I'm trying to write a bash script that will change the fill color of certain elements within SVG files. I'm inexperienced with shell scripting, but I'm good with regexes (...in JS).
Here's the SVG tag I want to modify:
<!-- is the target because its ID is exactly "the.target" -->
<path id="the.target" d="..." style="fill:#000000" />
Here's the bash code I've got so far:
local newSvg="" # will hold newly-written SVG file content
while IFS="<$IFS" read tag
do
if [[ "${tag}" =~ +id *= *"the\.target" ]]; then
tag=$(echo "${tag}" | sed 's/fill:[^;];/fill:${color};/')
fi
newSvg="${newSvg}${tag}"
done < ${iconSvgPath} # is an argument to the script
Explained: I'm using read (splitting the file on < via custom IFS) to read the SVG content tag by tag. For each tag, I test to see if it includes an id property with the exact value I want. If it doesn't, I add this tag as-is to a newSvg string that I will later write to a file. If the tag does have the desired ID, I'll use sed to replace fill:STUFF; with fill:${myColor};. (Note that my sed is also failing, but that's not what I'm asking about here.)
It fails to find the right line with the test [[ "${tag}" =~ +id *= *"the\.target" ]].
It succeeds if I change the test to [[ "${tag}" =~ \"the\.target\" ]].
I'm not happy with the working version because it's too brittle. While I don't intend to support all the flexibility of XML, I would like to be tolerant of semantically irrelevant whitespace, as well as the id property being anywhere within the tag. Ideally, the regex I'd like to write would express:
id (preceded by at least one whitespace)
followed by zero or more whitespaces
followed by =
followed by zero or more whitespaces
followed by "the.target"
I think I'm not delimiting the regex properly inside the [[ ... =~ REGEX ]] construction, but none of the answers I've seen online use any delimiters whatsoever. In javascript, regex literals are bounded (e.g. / +id *= *"the\.target"/), so it's straightforward beginning a regex with a whitespace character that you care about. Also, JS doesn't have any magic re: *, whereas bash is 50% magic-handling-of-asterisks.
Any help is appreciated. My backup plan is maybe to try to use awk instead (which I'm no better at).
EDIT: My sed was really close. I forgot to add + after the [^;] set. Oof.
It is much easier if you define the regular expression pattern in a variable:
tag=' id = "the.target"'
pattern=' +id *= *"the\.target"'
if [[ $tag =~ $pattern ]]; then
    echo matched.
fi
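The same idea can be wrapped in a small function. This is only a sketch: tag_has_id and its parameters are hypothetical names, and [[:space:]] is used instead of a literal space so that tabs and newlines between attributes are tolerated too.

```shell
#!/bin/bash
# Hypothetical helper: succeed when a tag carries id="<wanted>",
# tolerating whitespace around "=".
tag_has_id() {
  local tag=$1 wanted=$2
  # Escape dots so that "the.target" cannot match "theXtarget".
  local pattern="[[:space:]]+id[[:space:]]*=[[:space:]]*\"${wanted//./\\.}\""
  # The pattern variable must stay unquoted to be treated as a regex.
  [[ $tag =~ $pattern ]]
}

tag_has_id 'path  id = "the.target" d="..."' 'the.target' && echo matched
tag_has_id 'path id="theXtarget"' 'the.target' || echo "no match"
```

Keeping the pattern in a variable sidesteps the quoting rules of [[ ... =~ ... ]], where quoted parts of the right-hand side are matched literally.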
Thank you for giving us such a clear example that regex is not the way to solve this problem.
An SVG file is an XML file, and one possible tool for modifying XML is xmlstarlet.
Try this script I called modifycolor:
#!/bin/bash
# invoke as: modifycolor <svg.file> <target_id> <new_color>
xmlstarlet edit \
    --update "//path[@id = '$2']/@style" --value "fill:#$3" \
    "$1"
Assuming the svg file is test.svg, invoke it as:
./modifycolor test.svg the.target ff0000
You will be astonished by the result.
If you want to paste a piece of code inside your bash script, try this:
target="the.target"
newSvg=$(xmlstarlet edit \
    --update "//path[@id = '${target}']/@style" --value "fill:#${myColor}" \
    "${iconSvgPath}")
Thanks to folks for pointing out the mistakes in my bash-fu, I came up with this code which does what I said I wanted. I will not be marking this as the accepted answer because, as folks have observed, regex is a bad way to operate on XML. Sharing this for posterity.
local newSvg="" # will hold newly-written SVG code
while IFS="<$IFS" read tag
do
    if [[ "${tag}" =~ \ +id\ *=\ *\"the\.target\" ]]; then
        tag=$(echo "${tag}" | sed -E 's/fill:[^;]+;/fill:'"${color}"';/')
    fi
    newSvg="${newSvg}${tag}"
done < "${iconSvgPath}"
Fixes:
escape the whitespace in the regex: =~ \ +id\ *=\ *
for sed, switch to double-quotes for the variable in the pattern
also for sed, I added the -E extended regex flag so that the + quantifier after the negated set [^;] is supported
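The sed fix on its own can be sketched like this. set_fill is a hypothetical name, and the sketch assumes the colour contains no "/" or "&", since both are special in a sed replacement. Unlike the pattern above, it also accepts a fill declaration that is not followed by a semicolon.

```shell
#!/bin/bash
# Hypothetical helper: replace the fill value in a style attribute with $1.
# Assumes $1 contains no "/" or "&" (both are special in a sed replacement).
set_fill() {
  # [^;"]+ stops at the next ";" or at the closing quote, so a final
  # declaration without a trailing ";" is handled as well.
  # The single-quoted script is closed around "$1" so the shell expands it.
  sed -E 's/fill:[^;"]+/fill:'"$1"'/'
}

printf '%s\n' '<path id="the.target" d="..." style="fill:#000000" />' | set_fill '#ff0000'
```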
Re: XML, I'll be comparing the list of available CLI-friendly XML parsers to the set of tools commonly available on my users' machines.

Using ampersand in sed

I have a csv file full of lines like the following:
Aity Chel Jenni,Hendaland 229,2591 TE Amsterdam
I want to create a sed pattern for an automated batch script that changes lines in this format into the following format:
Aity Chel Jenni,Hendaland 229,2591 TE, Amsterdam
With a bit of research, I found out that I had to create a regex and then use an ampersand (&) in the replacement to refer to the text the regex matched.
I have tried the following:
sed 's/([1-9] [A-Z]{2}/&,/' file1 >file2
I have been trying variants of that to get the regex right, but it doesn't seem to change anything.
Am I making a mistake in the usage of the ampersand or is my regex wrong?
Reading through the internet I can't seem to wrap my head around this function, can someone give me any examples/explain to me how to properly do this?
You are saying
sed 's/([1-9] [A-Z]{2}/&,/' file1 >file2
       ^
But you don't have to capture with () to use &. Instead, just say:
sed 's/[1-9] [A-Z]\{2\}/&,/' file
Note that you need to escape the braces of the { } quantifier, unless you use -r:
sed -r 's/[1-9] [A-Z]{2}/&,/' file
Try the following:
sed -r 's:[0-9] [A-Z]{2}\b:&,:' file > out
About your own pattern: you're missing the closing parenthesis. And, iirc, in sed's default (basic) syntax an unescaped ( matches a literal parenthesis; you have to write \( to open a group.
The -r option enables extended regex in sed, which supports the {2} quantifier without backslashes.
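Put together as a runnable sketch: the function name is hypothetical, and the assumption that a postcode is four digits, a space and two capitals is mine, chosen to make the match a little stricter than a single digit.

```shell
#!/bin/bash
# Hypothetical helper: append a comma after a postcode of the assumed form
# "four digits, space, two capitals". "&" in the replacement stands for the
# whole matched text, so no capture group is needed.
add_comma_after_postcode() {
  sed -E 's/[0-9]{4} [A-Z]{2}/&,/'
}

printf '%s\n' 'Aity Chel Jenni,Hendaland 229,2591 TE Amsterdam' | add_comma_after_postcode
```

-E is used here instead of -r because both GNU and BSD sed accept it.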

Regular expression help - what's wrong?

I would like to ask for help with my regex. I need to extract the very last part from each URL. I marked it as 'to_extract' within the example below.
I want to know what's wrong with the following regex when used with sed:
sed 's/^[ht|f]tp.*\///' file.txt
Sample content of file.txt:
http://a/b/c/to_extract
ftp://a/b/c/to_extract
...
I am only getting correct results for the ftp links, not for the http ones.
Thanks in advance for your explanation on this.
Change [ht|f] to (ht|f), that would give better results.
[abc] means "one character which is a, b or c".
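[ht|f] means "one character which is h, t, | or f", not at all what you want. That is also why only the ftp lines worked: f followed by tp matches ftp, but h followed by tp cannot match http.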
On some versions of sed, you'll have to call it with the -r option so that extended regex can be used:
sed -r 's/^(ht|f)tp.*\///' file.txt
If you just want to extract the last part of the url and don't want anything else, you probably want
sed -rn 's/^(ht|f)tp.*\///p' file.txt
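As a self-contained sketch of the grouped version (url_last_segment is a hypothetical name; -E is used in place of -r since both GNU and BSD sed accept it):

```shell
#!/bin/bash
# Hypothetical helper: print the last path segment of each http/ftp URL on stdin.
url_last_segment() {
  # (ht|f)tp is a group with two alternatives, "ht" or "f", followed by "tp".
  # -n together with the p flag prints only lines where the substitution happened.
  sed -E -n 's/^(ht|f)tp.*\///p'
}

printf '%s\n' 'http://a/b/c/to_extract' 'ftp://a/b/c/to_extract' 'not a url' | url_last_segment
```

Lines that are not http or ftp URLs are dropped rather than passed through, which is the effect of -n with p.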
How about using basename:
basename http://a/b/c/to_extract
to_extract
You can simply achieve what you want with a for loop.
#!/bin/bash
# read the whitespace-separated URLs from the file "ooo" into an array
myarr=( $(cat ooo) )
for i in "${myarr[@]}"; do
    basename "$i"
done
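A variant of that loop, offered only as a sketch, uses shell parameter expansion instead of calling basename once per line. last_segments is a hypothetical name, and the input is assumed to be one URL per line.

```shell
#!/bin/bash
# Hypothetical helper: print the text after the last "/" of every line on stdin.
last_segments() {
  while IFS= read -r url; do
    # ${url##*/} removes the longest prefix that ends in "/".
    printf '%s\n' "${url##*/}"
  done
}

printf '%s\n' 'http://a/b/c/to_extract' 'ftp://a/b/c/to_extract' | last_segments
```

One difference to keep in mind: for a URL ending in "/", the expansion yields an empty string, whereas basename returns the last non-empty component.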

Using sed to remove all console.log from javascript file

I'm trying to remove all my console.log, console.dir etc. from my JS file before minifying it with YUI (on osx).
The regex I got for the console statements looks like this:
console.(log|debug|info|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*)\);?
and it works if I test it with the RegExr.
But it won't work with sed.
What do I have to change to get this working?
sed 's/___???___//g' <$RESULT >$RESULT_STRIPPED
update
After getting the first answer I tried
sed 's/console.log(.*)\;//g' <test.js >result.js
and this works, but when I add an OR
sed 's/console.\(log\|dir\)(.*)\;//g' <test.js >result.js
it doesn't replace the "logs":
Your original expression looks fine. You just need to pass the -E flag to sed, for extended regular expressions:
sed -E 's/console.(log|debug|info|...|count)\((.*)\);?//g'
The difference between these types of regular expressions is explained in man re_format.
To be honest I have never read that page, but instead simply tack on an -E when things don't work as expected. =)
In sed's default (basic) regex syntax you must escape ( (for grouping) and | (for alternation). E.g.:
sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
UPDATE example:
$ sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
console.log # <- input line, no match, printed unchanged on next line
console.log
console.log() # <- input line, matches, no printing
console.log(blabla); # <- input line, matches, no printing
console.log(blabla) # <- input line, matches, no printing
console.debug(); # <- input line, matches, no printing
console.debug(BAZINGA) # <- input line, matches, no printing
DATA console.info(ditto); DATA2 # <- input line, matches, printing of expected data
DATA DATA2
HTH
I was also looking for a way to remove all the console.log calls, and I am trying to do this in Python, but my regex does not work.
I wrote it like this:
var re=/^console.log(.*);?$/;
but it also matches the whole of the following string:
'console.log(23);alert(234dsf);'
Does it work with the following?
"s/console.(log|debug|info|...|count)((.*));?//g"
I tried this:
sed -E 's/console.(log|debug|info)( ?| +)\([^;]*\);//g'
See the test:
Regex Tester
Here's my implementation
for i in $(find ./dir -name "*.js")
do
    sed -E 's/console\.(log|warn|error|assert..timeEnd)\((.*)\);?//g' "$i" > "${i}.copy" && mv "${i}.copy" "$i"
done
I took the sed expression from GitHub.
I was feeling lazy and hoping to find a script to copy & paste. Alas there wasn't one, so for the lazy like me, here is mine. It goes in a file named something like 'minify.sh' in the same directory as the files to minify. It will overwrite the original file and it needs to be executable.
#!/bin/bash
for f in *.js
do
    sed -Ei 's/console.(log|debug|info)\((.*)\);?//g' "$f"
    yui-compressor "$f" -o "$f"
done
I'd just like to add here that I was running into issues with namespaced console.logs such as window.console.log. Also Tweenmax.js has some interesting uses of console.log in some parts such as
window.console&&console.log(t)
So I used this
sed -i.bak s/[^\&a-zA-Z0-9\.]console.log\(/\\/\\//g js/combined.js
The regex effectively says replace all console.logs that don't start with &, alphanumerics, and . with a '//' comment, which uglify later takes out.
Rodrigocorsi's works with nested parentheses. I added a ? after the ; because yuicompressor was omitting some semicolons.
The likely reason this is not working is that your regex does not exclude
a closing parenthesis ()) from the method parameters.
Try this regular expression:
console\.(log|trace|error)\(([^)]+)\);
Remember to include the rest of your method names in the capture group.
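A runnable sketch of that suggestion follows. strip_console_calls is a hypothetical name, only a handful of method names are listed, and [^)]* is used instead of [^)]+ so that empty calls such as console.log() are removed too.

```shell
#!/bin/bash
# Hypothetical helper: delete simple console.* calls from JavaScript on stdin.
# Only a few method names are listed; extend the alternation as needed.
strip_console_calls() {
  # [^)]* cannot run past the first ")", so code after the call survives.
  # The flip side: arguments that themselves contain ")" are not handled.
  sed -E 's/console\.(log|debug|info|warn|error)\([^)]*\);?//g'
}

printf '%s\n' 'console.log(23);alert(234);' 'DATA console.info(ditto); DATA2' | strip_console_calls
```

With a greedy (.*) the first line would lose its alert call as well, because .* extends to the last ")" on the line.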

Regular Expression to strip comments from Bash script

This is deceptively complex. I need a regular expression to strip comments from Bash shell scripts.
Bear in mind that $#, ${#foo}, string="this # string", string='that # string', ${foo#bar}, ${foo##baar}, and
string="really complex args=$# ${applejack##"jack"} $(echo "$#, again")"; `echo this is a ${#nasty[*]} example`
are all valid shell expressions that should not be stripped.
Edit:
Note that:
# This is a comment in bash
# But so is this
echo "foo bar" # This is also a comment
Edit:
Note that lines that might be misconstrued as comments may be tucked inside HEREDOCs but since it is multi-line I can live without handling/accounting for it:
cat<<EOF>>out.txt
This is just a heredoc
# This line looks like a comment, but it isn't
EOF
You cannot do that with regular expressions.
echo ${baz/${foo/${foo/#bar/foo}/bar}/qux}
You need to match nested braces. Regular expressions can't do that, unless you're willing to consider PCREs "regular expressions", in which case it would be simpler to just write the parser in Perl.
Just for fun ...
I don't believe you can do this without using/implementing a parser but it's fun seeing how far you can get without doing that.
The closest I have gotten is to use a simple regex with sed. It preserves the hash bang, which is a definite must, but can't cope with the HEREDOC. You could go further but then it might not be fun anymore.
Sample bash script (called doit)
#!/bin/bash
#This
# is a
echo $1 #comment
Running that ...
cat doit | sed -e 's/#[^!].*$//'
#!/bin/bash
echo $1
But obviously there are blank lines produced which you don't want AND it doesn't handle HERE docs.
Again, not a serious suggestion but please play around with it.
EDITED: I admit it! sed won't work for the reasons given in comments - sed doesn't handle lookaheads/lookbehinds. Thanks for pointing that out!
I thought a comment in bash was a line that started with a #. If so, here's your regex:
^#
And here's the sed command that will strip them:
sed -i '' -e 's/^\s*#(?!!).*$//' myfile.sh
EDITED to factor in downvoter's comments: ie
allow whitespace before the # using \s*
exclude lines that have a ! following the # using negative lookahead (?!!)
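Since sed has no lookahead, the intent of (?!!) can be approximated with ([^!]|$). The following is a deliberately limited sketch, not a comment parser: strip_full_line_comments is a hypothetical name, it removes only lines whose first non-blank character is #, and it leaves trailing comments alone because they cannot be told apart from expressions such as ${#foo} with a regex.

```shell
#!/bin/bash
# Hypothetical helper: drop whole-line comments from a script on stdin but
# keep a "#!" line. Trailing comments are left untouched; heredoc lines that
# start with "#" are removed too, which is a known limitation.
strip_full_line_comments() {
  # Delete lines whose first non-blank character is "#", unless "!" follows it.
  sed -E '/^[[:space:]]*#([^!]|$)/d'
}

printf '%s\n' '#!/bin/bash' '#This' '  # is a' 'echo $1 #comment' | strip_full_line_comments
```

Deleting the line with d, rather than substituting an empty string, also avoids the leftover blank lines mentioned earlier.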