shell script to recognise jira key

shell script to recognise jira key - regex

Below lines in my jenkins job configuration Execute shell retrieves jira key
JIRA_KEY=$(curl --request GET "http://jenkins-server/job/myProject/job/mySubProject/job/myComponent/${BUILD_NUMBER}/api/xml?xpath=/*/changeSet/item/comment" | sed -e "s/<comment>\(.*\)<\/comment>/\1/")
JIRA_KEY=$(echo $JIRA_KEY | cut -c10-17)
But in case if text doesn't start with jira key then as per the current logic it will assign any text in the range of 10-17. I need to store empty string "" in the variable JIRA_KEY when jira key is not present in the <comment>, how can we do that?
Here is the xml
<freeStyleBuild _class="hudson.model.FreeStyleBuild">
<changeSet _class="hudson.plugins.git.GitChangeSetList">
<item _class="hudson.plugins.git.GitChangeSet">
<comment>
JRA-1011 This is commit
message.
</comment>
</item>
</changeSet>
</freeStyleBuild>

As I mentioned in comment section it is not clear which output you need, so based on some assumptions, could you please try following and let me know on same.
I- If you need all the strings between to then you could run following.
awk '/<\/comment>/{a="";next}/<comment>/{a=1;next}a' Input_file
II- If you need to find only JRA string between to then you could do following.
awk '/<\/comment>/{a="";next}/<comment>/{a=1;next} a && /JRA/{match($0,/[a-zA-Z]+[^ ]*/);print substr($0,RSTART,RLENGTH)}' Input_file

Related

bash script - fetch only unique domains from email list to variable

I am new to bash and having problem understanding how to get this done.
Check all "To:" field email address domains and list all unique domains to a variable to compare it to from domain.
I get the "from address" domain by using
grep -m 1 "From: " filename | cut -f 2 -d '#' | cut -d ">" -f 1
when reading a mail stored in file filename.
For "to address" domain there can be multiple To: addresses and having multiple domains. I am not sure how to get unique domains from "to address field".
Example to address line will be like this:
To: user#domain.com, user2#domain.com,
User Name <sample#domaintest.com>, test#domainname.com
grep -m 1 "^To: " filename | cut -f 2 -d '#' | cut -d ">" -f 1
but there are different format of email. So I am not sure if grep is right or if I should search for awk or something.
I need to get the unique domain list from the "To:" field email address/addresses to a variable in bash script.
Desired output for above example:
domain.com,domaintest.com,domainname.com

If you are hellbent on doing this with line-oriented utilities, there is a utility formail in the Procmail distribution which can normalize things for you somewhat.
bash$ formail -czxTo: <<\==test==
> From: me <sender#example.com>
> To: you <first#example.org>,
> them <other#example.net>
> Subject: quick demo
>
> Very quick, innit.
> ==test==
first#example.org, other#example.net
So with that you have input which you can actually pass to grep or Awk ... or sed.
fromdom=$(formail -czxTo: <message | tr ',' '\n' | sed 's/.*#//')
The From: address will not be normalized by formail -czxFrom: but you can use a neat trick: make formail generate a reply back to the From: address, and then extract the To: header from that.
todoms=$(formail -rtzcxTo: <message | sed 's/.*#//')
In some more detail, -r says to create a new reply to whoever sent you message, and then we do -zcxTo: on that.
(The -t option may or may not do what you want. In this case, I would perhaps omit it. http://www.iki.fi/era/procmail/formail.html has (vague) documentation for what it does; see also the section just before http://www.iki.fi/era/procmail/mini-faq.html#group-writable and sorry for the clumsy link -- there doesn't seem to be a good page-internal anchor to link to.)

Email address normalization is tricky because there are so many variants to choose from.
From: Elvis Parsley <king#graceland.example.com>
From: king#graceland.example.com
From: "Parsley, Elvis" <king#graceland.example.com> (kill me, I have to use Outlook)
From: "quoted#string" <king#graceland.example.com> (wait, he is already dead)
To: This could fold <recipient#example.net>,
over multiple lines <another#example.org>
I would turn to a more capable language with proper support for parsing all of these formats. My choice would be Python, though you could probably also pull this off in a few lines of Ruby or Perl.
The email library was revamped in Python 3.6 so this assumes you have at least that version. The email.Headerregistry class which is new in 3.6 is particularly convenient here.
#!/usr/bin/env python3
from email.policy import default
from email import message_from_binary_file
import sys
if len(sys.argv) == 1:
sys.argv.append('-')
for arg in sys.argv[1:]:
if arg == '-':
handle = sys.stdin
else:
handle = open(arg, 'rb')
message = message_from_binary_file(handle, policy=default)
from_dom = message.get('From').address.domain
to_doms = set()
for addr in message.get('To').addresses:
dom = addr.domain
if dom == from_dom:
continue
to_doms.add(dom)
print(','.join([from_dom] + list(to_doms)))
if arg != '-':
handle.close()
This simply produces a comma-separated list of domain names; you might want to do the rest of the processing in Python too instead, or change this so that it prints something in a slightly different format.
You'd save this in a convenient place (say, /usr/local/bin/fromto) and mark it as executable (chmod 755 /usr/local/bin/fromto). Now you can call this from the shell like any other utility like grep.

sed regex to clean input domain list of www. www1. ftp. etc

I have tried so many combinations and suggestions now to get this to work but I am just not succeeding.
UPDATE: to Provide more info
I have an input list like this
0------------0-------------0.0n-line.info
0-0--0-000.com
0-3.us
aw.dermalmask.com
idolstudio.free.fr
idolstudio2.free.fr
something.blogspot.com
anything.blogspot.com
xxx.blogspot.ca
www.hola.org
www10.a8.net
www11.alsto.com
www148.myquicksearch.com
ftp.thaitattoo.nl
ftp01.pornocrawler.ws
ftp04.pornocrawler.ws
g.blogads.com
wvw.tielecreidito-pe.com
And I was given a sed by someone which almost get's this right but is missing escaping some characters and stripping off some periods.
I am using
sed -r 's:(^.?(aw|www|ftp|ww|wvw)[[:alnum:]]?.|^..?)::g' input.txt > output.txt
But it gives me this output
-----------0-------------0.0n-line.info
.a8.net
.alsto.com
.pornocrawler.ws
0--0-000.com
3.us
8.myquicksearch.com
blogads.com
dermalmask.com
hola.org
mething.blogspot.com
olstudio.free.fr
olstudio2.free.fr
thaitattoo.nl
tielecreidito-pe.com
x.blogspot.ca
ything.blogspot.com
Instead of this
0-----------0-------------0.0n-line.info
0-0--0-000.com
0-3.us
dermalmask.com
idolstudio.free.fr
idolstudio2.free.fr
something.blogspot.com
anything.blogspot.com
xxx.blogspot.ca
hola.org
a8.net
alsto.com
myquicksearch.com
thaitattoo.nl
pornocrawler.ws
pornocrawler.ws
blogads.com
tielecreidito-pe.com
And Ultimately I would actually like this kind of output.
0n-line.info
0-0--0-000.com
0-3.us
dermalmask.com
idolstudio.free.fr
idolstudio2.free.fr
something.blogspot.com
anything.blogspot.com
xxx.blogspot.ca
hola.org
a8.net
alsto.com
myquicksearch.com
thaitattoo.nl
pornocrawler.ws
pornocrawler.ws
blogads.com
tielecreidito-pe.com

I think your last line is www.tielecreidito-pe.com and not wvw...
So you can try this one
sed '1s/[^.]*.//;s/^[w\|a]w[^\.]*.\|^[f\|g][^\.]*.//' infile

Git Bash regex to match latest tag

My VCS has these tags
0.0.3.156-alpha+2
0.0.3.154
0.0.3.153
build-.139
build-.140
build-.142
build-0.0.1.28
build-0.0.1.29
build-0.0.1.30
build-0.0.1.32
I want to git describe --match "<regex>" to get the latest tag of the form number.number.number.number (so it's 0.0.3.154 in this case)
I have tried with git describe --match "[0-9]*.[0-9]*.[0-9]*.[0-9]*$" but it doesn't result in anything, and neither do these pattern:
"[0-9]*.[0-9]*.[0-9]*.[0-9]+"
"[0-9]*.[0-9]*.[0-9]*.[0-9]{1,}"
I need to get the latest tag in other to bump version for the next release. So i'm thinking of doing this automatically. Please let me know if I miss anything
Thanks
UPDATE:
In my build.gradle file I have a function to get tag like this (follow #Marc reply):
version getVersionFromTag()
def getVersionFromTag() {
def stdout = new ByteArrayOutputStream()
exec {
commandLine 'git', 'tag', '|' , 'grep', '^\([0-9]\+\.\?\)\+$', '|', 'sort' , '-nr', '|', 'head', '-1'
standardOutput = stdout
}
return stdout.toString().trim()
}
Here it gives errors Unexpected Char '\' in the regex above. Hence I removed them to becomes '^([0-9]+.?)+$', then it runs fine but in my final artifact, it does not have the version appended to the name (i.e helloword.jar instead of helloword-0.0.3.154.jar
=> My question is how should I put #Marc's suggested command to the gradle function correctly?

For testing I've put the output of your git describe in a file. This will do:
cat file | grep '^\([0-9]\+\.\?\)\+$' | sort -nr | head -1
0.0.3.154
Suppose you've created some irregular formatted tags and you want to use those as well (like your build--tags) for finding the highest tag:
sed -E 's/^[^0-9.]*//' | grep '^\([0-9]\+\.\?\)\+$' | sort -nr | head -1

Removing specific tags in a KML file

I have a KML file which is a list of places around the world with coordinates and some other attributes. It looks like this for one place:
<Placemark>
<name>Albania - Durrës</name>
<open>0</open>
<visibility>1</visibility>
<description>(Spot ID: 275801) show <![CDATA[forecast]]></description>
<styleUrl>#wgStyle001</styleUrl><Point>
<coordinates>19.489747,41.277806,0</coordinates>
</Point>
<LookAt><range>200000</range><longitude>19.489747</longitude><latitude>41.277806</latitude></LookAt>
</Placemark>
I would like to remove everything except the name of the place. So in this case that would mean I would like to remove everything except
<name>Albania - Durrës</name>
The problem is, this KML file includes more than 1000 of these places. Doing this manually obviously isn't an option, so then how can I remove all tags except for the name tags for all of the items in the list? Can I use some kind of program for that?

Use a specialized command line tool that understands XML documents.
One such tool is xmlstarlet, which is available here for Linux, Windows and Solaris.
To address your particular problem, I used the xmlstarlet executable xml.exe like this (on Windows):
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v /ns:kml/ns:Document/ns:Placemark/ns:name places.kml
This produces this output:
Albania - Durrës
Second Name
Third Name
...
Final Name
If you can guarantee that <name> occurs only as a child of <Placemark>, then this abbreviated version will produce the same result:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v //ns:name places.kml
(This is because this shorter version finds all <name> elements no matter where they occur in the document.)
If you really want an XML document, you'll need to do a little post-processing. Here's an example of a complete XML document:
<?xml version='1.0' encoding='utf-8'?>
<items>
<item>Albania - Durrës</item>
<item>Second Name</item>
<item>Third Name</item>
<!-- ... -->
<item>Final Name</item>
</items>
This first line is the XML declaration. It declares the Unicode encoding utf-8. You'll need to include this line so that XML processors recognize that your document includes Unicode characters. (As in Durrës.)
More: Here's an enhanced 'xmlstarlet' command that will produce the XML document above:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -T -t -o "<?xml version='1.0' encoding='utf-8'?>" -n -t -v "'<items>'" -n -t -m //ns:Placemark -v "concat('<item>',ns:name,'</item>')" -n -t -o "</items>" -n places.kml

If you are on linux or similar:
grep "<name>" your_file.kml > file_with_only_name_tags
On windows, see What are good grep tools for Windows?

regular expression to extract data from html page

I want to extract all anchor tags from html pages. I am using this in Linux.
lynx --source http://www.imdb.com | egrep "<a[^>]*>"
but that is not working as expected, since result contains unwanted results
<a class="amazon-affiliate-site-name" href="http://www.fabric.com">Fabric</a><br>
I want just
<a href >...</a>
any good way ?

If you have a -P option in your grep so that it accepts PCRE patterns, you should be able to use better regexes. Sometimes a minimal quantifier like *? helps. Also, you’re getting the whole input line, not just the match itself; if you have a -o option to grep, it will list only the part that matches.
egrep -Po '<a[^<>]*>'
If your grep doesn’t have those options, try
perl -00 -nle 'print $1 while /(<a[^<>]*>)/gi'
Which now crosses line boundaries.
To do a real parse of HTML requires regexes subtantially more more complex than you are apt to wish to enter on the command line. Here’s one example, and here’s another. Those may not convince you to try a non-regex approach, but they should at least show you how much harder it is in the general case than in specific ones.
This answer shows why all things are possible, but not all are expedient.

why can't you use options like --dump ?
lynx --dump --listonly http://www.imdb.com

Try grep -Eo:
$ echo '<a class="amazon-affiliate-site-name" href="http://www.fabric.com">Fabric</a><br>' | grep -Eo '<a[^>]*>'
<a class="amazon-affiliate-site-name" href="http://www.fabric.com">
But please read the answer that MAK linked to.

Here's some examples of why you should not use regex to parse html.
To extract values of 'href' attribute of anchor tags, run:
$ python -c'import sys, lxml.html as h
> root = h.parse(sys.argv[1]).getroot()
> root.make_links_absolute(base_url=sys.argv[1])
> print "\n".join(root.xpath("//a/#href"))' http://imdb.com | sort -u
Install lxml module if needed: $ sudo apt-get install python-lxml.
Output
http://askville.amazon.com
http://idfilm.blogspot.com/2011/02/another-class.html
http://imdb.com
http://imdb.com/
http://imdb.com/a2z
http://imdb.com/a2z/
http://imdb.com/advertising/
http://imdb.com/boards/
http://imdb.com/chart/
http://imdb.com/chart/top
http://imdb.com/czone/
http://imdb.com/features/hdgallery
http://imdb.com/features/oscars/2011/
http://imdb.com/features/sundance/2011/
http://imdb.com/features/video/
http://imdb.com/features/video/browse/
http://imdb.com/features/video/trailers/
http://imdb.com/features/video/tv/
http://imdb.com/features/yearinreview/2010/
http://imdb.com/genre
http://imdb.com/help/
http://imdb.com/helpdesk/contact
http://imdb.com/help/show_article?conditions
http://imdb.com/help/show_article?rssavailable
http://imdb.com/jobs
http://imdb.com/lists
http://imdb.com/media/index/rg2392693248
http://imdb.com/media/rm3467688448/rg2392693248
http://imdb.com/media/rm3484465664/rg2392693248
http://imdb.com/media/rm3719346688/rg2392693248
http://imdb.com/mymovies/list
http://imdb.com/name/nm0000207/
http://imdb.com/name/nm0000234/
http://imdb.com/name/nm0000631/
http://imdb.com/name/nm0000982/
http://imdb.com/name/nm0001392/
http://imdb.com/name/nm0004716/
http://imdb.com/name/nm0531546/
http://imdb.com/name/nm0626362/
http://imdb.com/name/nm0742146/
http://imdb.com/name/nm0817980/
http://imdb.com/name/nm2059117/
http://imdb.com/news/
http://imdb.com/news/celebrity
http://imdb.com/news/movie
http://imdb.com/news/ni7650335/
http://imdb.com/news/ni7653135/
http://imdb.com/news/ni7654375/
http://imdb.com/news/ni7654598/
http://imdb.com/news/ni7654810/
http://imdb.com/news/ni7655320/
http://imdb.com/news/ni7656816/
http://imdb.com/news/ni7660987/
http://imdb.com/news/ni7662397/
http://imdb.com/news/ni7665028/
http://imdb.com/news/ni7668639/
http://imdb.com/news/ni7669396/
http://imdb.com/news/ni7676733/
http://imdb.com/news/ni7677253/
http://imdb.com/news/ni7677366/
http://imdb.com/news/ni7677639/
http://imdb.com/news/ni7677944/
http://imdb.com/news/ni7678014/
http://imdb.com/news/ni7678103/
http://imdb.com/news/ni7678225/
http://imdb.com/news/ns0000003/
http://imdb.com/news/ns0000018/
http://imdb.com/news/ns0000023/
http://imdb.com/news/ns0000031/
http://imdb.com/news/ns0000128/
http://imdb.com/news/ns0000136/
http://imdb.com/news/ns0000141/
http://imdb.com/news/ns0000195/
http://imdb.com/news/ns0000236/
http://imdb.com/news/ns0000344/
http://imdb.com/news/ns0000345/
http://imdb.com/news/ns0004913/
http://imdb.com/news/top
http://imdb.com/news/tv
http://imdb.com/nowplaying/
http://imdb.com/photo_galleries/new_photos/2010/
http://imdb.com/poll
http://imdb.com/privacy
http://imdb.com/register/login
http://imdb.com/register/?why=footer
http://imdb.com/register/?why=mymovies_footer
http://imdb.com/register/?why=personalize
http://imdb.com/rg/NAV_TWITTER/NAV_EXTRA/http://www.twitter.com/imdb
http://imdb.com/ri/TRAILERS_HPPIRATESVID/TOP_BUCKET/102785/video/imdb/vi161323033/
http://imdb.com/search
http://imdb.com/search/
http://imdb.com/search/name?birth_monthday=02-12
http://imdb.com/search/title?sort=num_votes,desc&title_type=feature&my_ratings=exclude
http://imdb.com/sections/dvd/
http://imdb.com/sections/horror/
http://imdb.com/sections/indie/
http://imdb.com/sections/tv/
http://imdb.com/showtimes/
http://imdb.com/tiger_redirect?FT_LIC&licensing/
http://imdb.com/title/tt0078748/
http://imdb.com/title/tt0279600/
http://imdb.com/title/tt0377981/
http://imdb.com/title/tt0881320/
http://imdb.com/title/tt0990407/
http://imdb.com/title/tt1034389/
http://imdb.com/title/tt1265990/
http://imdb.com/title/tt1401152/
http://imdb.com/title/tt1411238/
http://imdb.com/title/tt1411238/trivia
http://imdb.com/title/tt1446714/
http://imdb.com/title/tt1452628/
http://imdb.com/title/tt1464174/
http://imdb.com/title/tt1464540/
http://imdb.com/title/tt1477837/
http://imdb.com/title/tt1502404/
http://imdb.com/title/tt1504320/
http://imdb.com/title/tt1563069/
http://imdb.com/title/tt1564367/
http://imdb.com/title/tt1702443/
http://imdb.com/tvgrid/
http://m.imdb.com
http://pro.imdb.com/r/IMDbTabNB/
http://resume.imdb.com
http://resume.imdb.com/
https://secure.imdb.com/register/subscribe?c=a394d4442664f6f6475627
http://twitter.com/imdb
http://wireless.amazon.com
http://www.3news.co.nz/The-Hobbit-media-conference--full-video/tabid/312/articleID/198020/Default.aspx
http://www.amazon.com/exec/obidos/redirect-home/internetmoviedat
http://www.audible.com
http://www.boxofficemojo.com
http://www.dpreview.com
http://www.endless.com
http://www.fabric.com
http://www.imdb.com/board/bd0000089/threads/
http://www.imdb.com/licensing/
http://www.imdb.com/media/rm1037220352/rg261921280
http://www.imdb.com/media/rm2695346688/tt1449283
http://www.imdb.com/media/rm3987585536/tt1092026
http://www.imdb.com/name/nm0000092/
http://www.imdb.com/photo_galleries/new_photos/2010/index
http://www.imdb.com/search/title?sort=num_votes,desc&title_type=tv_series&my_ratings=exclude
http://www.imdb.com/sections/indie/
http://www.imdb.com/title/tt0079470/
http://www.imdb.com/title/tt0079470/quotes?qt0471997
http://www.imdb.com/title/tt1542852/
http://www.imdb.com/title/tt1606392/
http://www.imdb.de
http://www.imdb.es
http://www.imdb.fr
http://www.imdb.it
http://www.imdb.pt
http://www.movieline.com/2011/02/watch-jon-hamm-talk-butthole-surfers-paul-rudd-impersonate-jay-leno-at-book-reading-1.php
http://www.movingimagesource.us/articles/un-tv-20110210
http://www.npr.org/blogs/monkeysee/2011/02/10/133629395/james-franco-recites-byron-to-the-worlds-luckiest-middle-school-journalist
http://www.nytimes.com/2011/02/06/books/review/Brubach-t.html
http://www.shopbop.com/welcome
http://www.smallparts.com
http://www.twinpeaks20.com/details/
http://www.twitter.com/imdb
http://www.vanityfair.com/hollywood/features/2011/03/lauren-bacall-201103
http://www.warehousedeals.com
http://www.withoutabox.com
http://www.zappos.com

To extract values of 'href' attribute of anchor tags you may also use xmlstarlet after converting HTML to XHTML using HTML Tidy (Mac OS X version released on 25 March 2009):
curl -s www.imdb.com |
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a/#href" -v '.' -n |
grep '^[[:space:]]*http://' | sort -u | nl

On Mac OS X you may also use the command line tool linkscraper:
linkscraper http://www.imdb.com
see: http://codesnippets.joyent.com/posts/show/10772

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

shell script to recognise jira key - regex

Related

bash script - fetch only unique domains from email list to variable

sed regex to clean input domain list of www. www1. ftp. etc

Git Bash regex to match latest tag

Removing specific tags in a KML file

regular expression to extract data from html page

Categories

Resources