xmlstarlet extract values from Office 365 XML Files - xslt

I am attempting to extract the IPv4 Addresses and URLs for each product in the Office 365 portfolio from http://go.microsoft.com/fwlink/?LinkId=533185. I would like to create an input file for updating Firewall and Proxy server rules.
Snippet from XML File
<?xml version="1.0" encoding="utf-8"?>
<products updated="8/31/2016">
<product name="WAC">
<addresslist type="IPv6">
<address>2a01:111:f406:8800::/64</address>
<address>2a01:111:f406:400::/64</address>
<address>2a01:111:f406:1c01::/64</address>
<address>2a01:111:f406:9400::/64</address>
<address>2a01:111:f406:2402::/64</address>
<address>2a01:111:f406:a804::/64</address>
<address>2a01:111:f406:b401::/64</address>
<address>2620:1ec:c11::204</address>
<address>2a01:111:202c::204</address>
<address>2620:1ec:c11::205</address>
<address>2a01:111:202c::205</address>
</addresslist>
<addresslist type="IPv4">
<address>13.69.187.20/32</address>
<address>13.70.184.242/32</address>
<address>13.71.155.176/32</address>
<address>13.75.153.216/32</address>
<address>13.76.140.48/32</address>
<address>13.78.114.39/32</address>
<address>13.85.84.102/32</address>
<address>13.88.248.161/32</address>
<address>13.88.254.212/32</address>
<address>13.94.209.165/32</address>
<address>23.103.183.15/32</address>
<address>40.68.166.51/32</address>
<address>40.74.130.243/32</address>
<address>40.74.138.42/32</address>
<address>40.76.54.124/32</address>
<address>40.86.230.88/32</address>
<address>40.114.192.209/32</address>
<address>40.117.226.146/32</address>
<address>40.126.236.216/32</address>
<address>40.127.79.139/32</address>
<address>52.169.109.48/32</address>
<address>52.172.13.171/32</address>
<address>52.172.153.104/32</address>
<address>52.175.25.142/32</address>
<address>52.232.128.169/32</address>
<address>104.40.225.204/32</address>
<address>104.41.62.54/32</address>
<address>104.211.103.207/32</address>
<address>104.211.229.230/32</address>
<address>104.214.38.136/32</address>
<address>104.215.194.17/32</address>
<address>134.170.27.86/32</address>
<address>134.170.48.20/32</address>
<address>134.170.48.22/32</address>
<address>134.170.65.86/32</address>
<address>134.170.170.86/32</address>
<address>137.116.172.39/32</address>
<address>137.135.65.72/32</address>
<address>191.235.87.181/32</address>
<address>191.237.40.220/32</address>
</addresslist>
<addresslist type="URL">
<address>*.officeapps.live.com</address>
<address>*.cdn.office.net</address>
</addresslist>
</product>
<product name="Sway">
I have figured out how to list the products, which I can use to loop over them to build the end result, but I can't seem to get past filtering on a particular product and its IPv4 addresses.
./xmlstarlet sel -t -m '/products/product' -v @name -n Downloads/O365IPAddresses.xml
WAC
Sway
Planner
Yammer
OneNote
OfficeiPad
OfficeMobile
ProPlus
RCA
LYO
SPO
Office365Video
identity
EXO
CRLs
o365
EOP
I have tried many variations of the following, but none of them produce any output.
./xmlstarlet sel -t -m '//root/products/product[name="WAC"]/addresslist[type="IPv4"]' Downloads/O365IPAddresses.xml
Ultimately I would like to get to a text file that has something like the following.
WAC,IPv4,13.69.187.20/32
WAC,IPv4,13.70.184.242/32
.....
.....
WAC,URL,*.officeapps.live.com
I can then use this csv file to update the collection of Firewalls and Proxy servers in my company.
Many thanks in advance for the help.
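One likely reason the filter returns nothing: in XPath an attribute test needs the @ prefix (product[@name="WAC"]), and the snippet's top-level element is products, not root. As a rough alternative to xmlstarlet, here is a minimal Python sketch that produces the product,type,address lines described above. It assumes the file has the shape shown in the snippet; SAMPLE, rows and to_csv are illustrative names, not part of any Microsoft tooling.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed stand-in for the downloaded file (assumption: same structure as the snippet).
SAMPLE = """<products updated="8/31/2016">
  <product name="WAC">
    <addresslist type="IPv6">
      <address>2a01:111:f406:8800::/64</address>
    </addresslist>
    <addresslist type="IPv4">
      <address>13.69.187.20/32</address>
      <address>13.70.184.242/32</address>
    </addresslist>
    <addresslist type="URL">
      <address>*.officeapps.live.com</address>
    </addresslist>
  </product>
</products>
"""


def rows(xml_text, wanted_types=("IPv4", "URL")):
    """Return (product, type, address) tuples for the wanted address list types."""
    root = ET.fromstring(xml_text)
    out = []
    for product in root.findall("product"):
        for addresslist in product.findall("addresslist"):
            kind = addresslist.get("type")
            if kind not in wanted_types:
                continue  # e.g. skip IPv6 lists
            for address in addresslist.findall("address"):
                out.append((product.get("name"), kind, (address.text or "").strip()))
    return out


def to_csv(records):
    """Render the tuples as CSV text, one record per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(rows(SAMPLE)), end="")
```

For the real file, the text could be read from Downloads/O365IPAddresses.xml and passed to rows() in place of SAMPLE.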

Related

How to extract the data from a consecutive xml tag attribute based on the previous tag value

I have trouble getting my regex right for the below use case.
<LOB>
<LOBStatusInfo>
<LOB>Mobile</LOB>
<Status>Active</Status>
</LOBStatusInfo>
<LOBStatusInfo>
<LOB>Voice</LOB>
<Status>Active</Status>
</LOBStatusInfo>
<LOBStatusInfo>
<LOB>Internet</LOB>
<Status>Disconnect</Status>
</LOBStatusInfo>
</LOBStatus>
In the above XML, I'm looking to extract only the status corresponding to Voice (which is active).
So far, I was able to get the LOB itself, but not the corresponding status.
ps: I'm a newbie, please pardon if the details weren't enough.
We don't parse XML with regex; see: Using regular expressions with HTML tags
Instead, you can use XPath and a proper XML parser. What are your environment and language?
Test :
Input file
<LOB>
<LOBStatus>
<LOBStatusInfo>
<LOB>Mobile</LOB>
<Status>Active</Status>
</LOBStatusInfo>
<LOBStatusInfo>
<LOB>Voice</LOB>
<Status>Active</Status>
</LOBStatusInfo>
<LOBStatusInfo>
<LOB>Internet</LOB>
<Status>Disconnect</Status>
</LOBStatusInfo>
</LOBStatus>
</LOB>
Command
(just an example, now in shell, but the query can be used in any language of your choice)
xmllint --xpath '//LOB[text()="Voice"]/../Status/text()' file.xml
or
xmllint --xpath '//LOB[text()="Voice"]/following-sibling::Status/text()' file.xml
Output:
Active
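Since the query is meant to be portable to any language, here is a minimal Python sketch of the same lookup using the standard library. It assumes the input file shown above; status_for is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Same structure as the input file above.
SAMPLE = """<LOB>
<LOBStatus>
<LOBStatusInfo><LOB>Mobile</LOB><Status>Active</Status></LOBStatusInfo>
<LOBStatusInfo><LOB>Voice</LOB><Status>Active</Status></LOBStatusInfo>
<LOBStatusInfo><LOB>Internet</LOB><Status>Disconnect</Status></LOBStatusInfo>
</LOBStatus>
</LOB>
"""


def status_for(xml_text, lob_name):
    """Return the <Status> text of the LOBStatusInfo whose <LOB> equals lob_name."""
    root = ET.fromstring(xml_text)
    for info in root.iter("LOBStatusInfo"):
        if info.findtext("LOB") == lob_name:
            return info.findtext("Status")
    return None  # no matching line of business


if __name__ == "__main__":
    print(status_for(SAMPLE, "Voice"))
```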

shell script to recognise jira key

The lines below, in the Execute shell step of my Jenkins job configuration, retrieve the Jira key:
JIRA_KEY=$(curl --request GET "http://jenkins-server/job/myProject/job/mySubProject/job/myComponent/${BUILD_NUMBER}/api/xml?xpath=/*/changeSet/item/comment" | sed -e "s/<comment>\(.*\)<\/comment>/\1/")
JIRA_KEY=$(echo $JIRA_KEY | cut -c10-17)
But if the comment doesn't start with a Jira key, the current logic assigns whatever text falls in character range 10-17. I need to store an empty string "" in the variable JIRA_KEY when no Jira key is present in the <comment>. How can we do that?
Here is the xml
<freeStyleBuild _class="hudson.model.FreeStyleBuild">
<changeSet _class="hudson.plugins.git.GitChangeSetList">
<item _class="hudson.plugins.git.GitChangeSet">
<comment>
JRA-1011 This is commit
message.
</comment>
</item>
</changeSet>
</freeStyleBuild>
As I mentioned in the comments, it is not clear which output you need, so based on some assumptions, could you please try the following and let me know how it goes.
I- If you need all the text between <comment> and </comment>, you could run the following.
awk '/<\/comment>/{a="";next}/<comment>/{a=1;next}a' Input_file
II- If you only need the JRA key between <comment> and </comment>, you could do the following.
awk '/<\/comment>/{a="";next}/<comment>/{a=1;next} a && /JRA/{match($0,/[a-zA-Z]+[^ ]*/);print substr($0,RSTART,RLENGTH)}' Input_file
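The "empty string when no key is present" requirement can also be sketched with a pattern match instead of fixed character positions. The Python below is only an illustration: it assumes a key looks like an uppercase project prefix, a hyphen and digits (as in JRA-1011) and must be the first token of the comment; jira_key is a hypothetical helper name.

```python
import re

# Assumed key shape: uppercase project prefix, hyphen, digits (e.g. JRA-1011),
# appearing as the first non-whitespace token of the comment text.
JIRA_KEY = re.compile(r"^\s*([A-Z][A-Z0-9]*-\d+)\b")


def jira_key(comment):
    """Return the leading Jira key of a commit comment, or "" if there is none."""
    match = JIRA_KEY.match(comment)
    return match.group(1) if match else ""


if __name__ == "__main__":
    print(repr(jira_key("\nJRA-1011 This is commit\nmessage.\n")))
    print(repr(jira_key("Fix typo in README")))
```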

DataPower file transfer returns base64

I am using the cURL command below to copy DataPower files from the appliance to a remote Solaris server.
/usr/local/bin/curl -s --insecure --data-binary @getFile.xml -u username:password https://ip:port/service/mgmt/current
Content of getFile.xml is as below.
<?xml version="1.0"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Body>
<dp:request xmlns:dp="http://www.datapower.com/schemas/management">
<dp:get-file name="config:///unicenter.cfg"/>
</dp:request>
</env:Body>
</env:Envelope>
When I run the cURL command mentioned above on Solaris, I get a long Base64-encoded string. But I want the complete file copied to Solaris.
The long Base64-encoded string is your file. You need to do a little work to extract it.
This curl command is using the DataPower XML Management interface, and they call it that because all requests and responses are XML-formatted. You may not have seen it as the long string flew by, but it was wrapped in XML. Here's a sample response with a small payload:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Body>
<dp:response xmlns:dp="http://www.datapower.com/schemas/management">
<dp:timestamp>2014-10-23T17:12:39-04:00</dp:timestamp>
<dp:file name="local:///testfile.txt">VGhpcyBpcyBub3QgYW4gYWN0dWFsIGVtZXJnZW5jeS4K</dp:file>
</dp:response>
</env:Body>
</env:Envelope>
So, you have two jobs to do. First, get the Base64 string out of its XML wrapper, and second, decode it. There are a million ways to do this -- I'll give you one of them. Get a copy of XmlStarlet to do the extraction, and OpenSSL to do the Base64 decoding.
Then, pipe the curl output like so:
/usr/local/bin/curl -s --insecure --data-binary @getFile.xml -u username:password https://ip:port/service/mgmt/current \
| (xmlstarlet sel -T -t -v "//*[local-name()='file']" && echo) \
| fold -w 64 \
| openssl enc -d -base64 >this-is-the-real-file
Two quick notes -- the "&& echo" is to add a trailing newline, and the "fold" is to split the Base64 string into lines. A less finicky Base64 decoder wouldn't need these. I just picked "openssl" because most people already have it.
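If Python happens to be available, the same two jobs (pull the Base64 text out of the XML wrapper, then decode it) can be sketched with the standard library. This assumes the response has the shape of the sample above; extract_file is a hypothetical helper name.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace taken from the sample response above.
DP_NS = {"dp": "http://www.datapower.com/schemas/management"}

SAMPLE_RESPONSE = """<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Body>
<dp:response xmlns:dp="http://www.datapower.com/schemas/management">
<dp:timestamp>2014-10-23T17:12:39-04:00</dp:timestamp>
<dp:file name="local:///testfile.txt">VGhpcyBpcyBub3QgYW4gYWN0dWFsIGVtZXJnZW5jeS4K</dp:file>
</dp:response>
</env:Body>
</env:Envelope>"""


def extract_file(response_xml):
    """Return the decoded bytes of the first dp:file element in the response."""
    root = ET.fromstring(response_xml)
    node = root.find(".//dp:file", DP_NS)
    if node is None or node.text is None:
        raise ValueError("no dp:file element in response")
    # b64decode ignores embedded newlines by default, so no folding is needed.
    return base64.b64decode(node.text)


if __name__ == "__main__":
    print(extract_file(SAMPLE_RESPONSE).decode("utf-8"), end="")
```

To save a real file, the returned bytes would be written to disk in binary mode.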

Need help finding KML data for specific ZIP code

I'm making a heat map using New Jersey ZIP codes, but the geometry information I'm using is incorrect for two ZIP codes. Does anyone know where I can go to get the KML information for these two specific ZIP codes? I've seen a lot of posts here on what resources to use for KML data, but they are very advanced and I have no idea how to mine the databases that other people on the forum have linked to.
Specifically, I need the KML ZIP code data for ZIP codes 08559 and 08757. The following is the flawed KML data I have for these ZIP codes:
08559:
<Polygon><outerBoundaryIs><LinearRing><coordinates>-74.892528,40.414294,0.0 -74.967386,40.39857,0.0 -75.060361,40.420788,0.0 -75.064463,40.500774,0.0 -75.06583,40.517523,0.0 -75.068223,40.457705,0.0 -74.991314,40.481632,0.0 -74.88911,40.47001,0.0 -74.892528,40.414294,0.0</coordinates></LinearRing></outerBoundaryIs></Polygon>
08757:
<Polygon><outerBoundaryIs><LinearRing><coordinates>-74.190432,39.946002,0.0 -74.221537,39.932329,0.0 -74.322374,39.946002,0.0 -74.234185,40.008896,0.0 -74.26905,39.986678,0.0 -74.206839,39.952154,0.0 -74.19761,39.949762,0.0 -74.190432,39.946002,0.0</coordinates></LinearRing></outerBoundaryIs></Polygon>
How to Create a Region Map with Multiple Zip Codes
Download KML file containing all US zip codes from census.gov. The most current file on this site is: http://www2.census.gov/geo/tiger/GENZ2015/kml/cb_2015_us_zcta510_500k.zip. If this link is broken search Google for site:census.gov KML ZIP. Another option: https://www.filosophy.org/post/17/zipcodes_in_kml/
Open this huge (175MB) text file in a plaintext editor and search for the zip code you want and copy the <Polygon> section. Here is the result when searching for >94117:
<Placemark id="cb_2015_us_zcta510_500k.kml">
<name><at><openparen><closeparen></name>
<visibility>1</visibility>
<description><![CDATA[<center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3">
<th>ZCTA5CE10</th>
<td>94117</td>
</tr><tr bgcolor="">
<th>AFFGEOID10</th>
<td>8600000US94117</td>
</tr><tr bgcolor="#E3E3F3">
<th>GEOID10</th>
<td>94117</td>
</tr><tr bgcolor="">
<th>ALAND10</th>
<td>4373059</td>
</tr><tr bgcolor="#E3E3F3">
<th>AWATER10</th>
<td>1625</td>
</tr></table></center>]]></description>
<LookAt>
<longitude>-102</longitude>
<latitude>38.5</latitude>
<range>7000000</range>
<tilt>10</tilt>
<heading>0</heading>
</LookAt>
<styleUrl>#KMLStyler</styleUrl>
<ExtendedData>
<SchemaData schemaUrl="#kml_schema_ft_cb_2015_us_zcta510_500k">
<SimpleData name="ZCTA5CE10">94117</SimpleData>
<SimpleData name="AFFGEOID10">8600000US94117</SimpleData>
<SimpleData name="GEOID10">94117</SimpleData>
<SimpleData name="ALAND10">4373059</SimpleData>
<SimpleData name="AWATER10">1625</SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<extrude>0</extrude>
<tessellate>1</tessellate>
<altitudeMode>clampToGround</altitudeMode>
<outerBoundaryIs>
<LinearRing>
<coordinates>-122.477297,37.766069,0 -122.477379,37.765482,0 -122.458405,37.76616,0 -122.45779,37.766015,0 -122.457536,37.763566,0 -122.455999,37.763904,0 -122.456994,37.761842,0 -122.459173,37.761912,0 -122.455944,37.760239,0 -122.456603,37.759235,0 -122.454002,37.758785,0 -122.451817,37.759453,0 -122.447682,37.75919,0 -122.446783,37.761781,0 -122.445309,37.76188,0 -122.442915,37.763648,0 -122.443347,37.765333,0 -122.441242,37.765271,0 -122.4382,37.767159,0 -122.435624,37.767328,0 -122.435794,37.769058,0 -122.429128,37.769456,0 -122.428426,37.770452,0 -122.429178,37.774181,0 -122.429929,37.777909,0 -122.430115,37.778842,0 -122.444967,37.776958,0 -122.44478,37.776017,0 -122.446471,37.775802,0 -122.446846,37.777669,0 -122.453188,37.776853,0 -122.45281,37.774995,0 -122.463749,37.773624,0 -122.464611,37.77244,0 -122.459162,37.771314,0 -122.459901,37.770442,0 -122.464402,37.769669,0 -122.467004,37.768013,0 -122.469758,37.769209,0 -122.472245,37.76861,0 -122.473124,37.767116,0 -122.477297,37.766069,0 </coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
Create a clean KML file and move the <Polygon> to this file. Below is an example of a KML file (service-delivery-area.kml) with two zip code polygons. Google has a great KML Reference.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Service Delivery Area</name>
<open>0</open>
<Placemark>
<name>94117</name>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-122.477297,37.766069,0 -122.477379,37.765482,0 -122.458405,37.76616,0 -122.45779,37.766015,0 -122.457536,37.763566,0 -122.455999,37.763904,0 -122.456994,37.761842,0 -122.459173,37.761912,0 -122.455944,37.760239,0 -122.456603,37.759235,0 -122.454002,37.758785,0 -122.451817,37.759453,0 -122.447682,37.75919,0 -122.446783,37.761781,0 -122.445309,37.76188,0 -122.442915,37.763648,0 -122.443347,37.765333,0 -122.441242,37.765271,0 -122.4382,37.767159,0 -122.435624,37.767328,0 -122.435794,37.769058,0 -122.429128,37.769456,0 -122.428426,37.770452,0 -122.429178,37.774181,0 -122.429929,37.777909,0 -122.430115,37.778842,0 -122.444967,37.776958,0 -122.44478,37.776017,0 -122.446471,37.775802,0 -122.446846,37.777669,0 -122.453188,37.776853,0 -122.45281,37.774995,0 -122.463749,37.773624,0 -122.464611,37.77244,0 -122.459162,37.771314,0 -122.459901,37.770442,0 -122.464402,37.769669,0 -122.467004,37.768013,0 -122.469758,37.769209,0 -122.472245,37.76861,0 -122.473124,37.767116,0 -122.477297,37.766069,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark>
<name>94102</name>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-122.429929,37.777909,0 -122.429178,37.774181,0 -122.428426,37.770452,0 -122.42822,37.769441,0 -122.426402,37.769596,0 -122.419334,37.77521,0 -122.419219,37.775316,0 -122.418704,37.775645,0 -122.404743,37.786778,0 -122.406399,37.786615,0 -122.406771,37.788499,0 -122.408595,37.789226,0 -122.411886,37.788808,0 -122.414807,37.78652,0 -122.414242,37.783724,0 -122.419182,37.783101,0 -122.420689,37.781955,0 -122.420906,37.782883,0 -122.422287,37.781752,0 -122.424108,37.782477,0 -122.427396,37.782057,0 -122.426829,37.779258,0 -122.429929,37.777909,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
</Document>
</kml>
Go to https://www.google.com/mymaps/ and create a new map. Click "Add new layer", then click "Import", and upload your KML file. If the polygons are incorrect, you can edit the polygons on the map.
Your map should have a result like this: https://www.google.com/maps/d/u/0/embed?mid=1zop5GMD1b2afmOvObQIi7YvF1d4
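The search-and-copy step in a text editor can also be scripted. The sketch below streams a KML file and returns the coordinate strings for one ZIP code; it assumes the Census file declares the KML 2.2 namespace and stores the code in a SimpleData element named ZCTA5CE10, as in the excerpt above. coordinates_for_zip and SAMPLE are illustrative names, and the sample rings are shortened.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the Census file uses the standard KML 2.2 namespace.
KML = "{http://www.opengis.net/kml/2.2}"

SAMPLE = b"""<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark>
<ExtendedData><SchemaData schemaUrl="#s">
<SimpleData name="ZCTA5CE10">94102</SimpleData>
</SchemaData></ExtendedData>
<Polygon><outerBoundaryIs><LinearRing>
<coordinates>-122.429929,37.777909,0 -122.429178,37.774181,0 -122.429929,37.777909,0</coordinates>
</LinearRing></outerBoundaryIs></Polygon>
</Placemark>
<Placemark>
<ExtendedData><SchemaData schemaUrl="#s">
<SimpleData name="ZCTA5CE10">94117</SimpleData>
</SchemaData></ExtendedData>
<Polygon><outerBoundaryIs><LinearRing>
<coordinates>-122.477297,37.766069,0 -122.477379,37.765482,0 -122.477297,37.766069,0</coordinates>
</LinearRing></outerBoundaryIs></Polygon>
</Placemark>
</Document></kml>"""


def coordinates_for_zip(source, zip_code):
    """Return the <coordinates> strings of the Placemark tagged with zip_code."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != KML + "Placemark":
            continue
        codes = [sd.text for sd in elem.iter(KML + "SimpleData")
                 if sd.get("name") == "ZCTA5CE10"]
        if zip_code in codes:
            return [c.text.strip() for c in elem.iter(KML + "coordinates") if c.text]
        elem.clear()  # drop the children of placemarks we skip to limit memory use
    return []


if __name__ == "__main__":
    print(coordinates_for_zip(io.BytesIO(SAMPLE), "94117"))
```

With the real download, source would be the path to the unzipped .kml file instead of the in-memory sample.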
There's a directory where you can easily search and download Census KML boundary files for any zip code, county or state.
Individual zip code KML
Here are links to KML files for the 2 zip codes you mentioned:
https://census.simplecrew.com/us/zip-codes/08559
https://census.simplecrew.com/us/zip-codes/08757
All zip codes by county KML
You could even download a single KML file for all zip codes by county. For example, your 2 zip codes are in Hunterdon and Ocean counties:
https://census.simplecrew.com/us/states/nj/counties/hunterdon/zip-codes
https://census.simplecrew.com/us/states/nj/counties/ocean/zip-codes
All zip codes by state KML
Also, instead of creating a map of all zip codes in NJ yourself, you could just download a single KML file here:
https://census.simplecrew.com/us/states/nj/zip-codes
For just US zip codes there is data available from the US Census:
http://www.census.gov/geo/www/cob/z52000.html#ascii
Download the ASCII dataset for New Jersey or any of the other states.
You can look up the index for your ZIP codes in the small file (e.g. zt34_d00a.dat):
317
"08559"
"08559"
"Z5"
"5-Digit ZCTA"
Then find the matching lon/lat coordinate list in the large file.
317 -0.749719256389896E+02 0.404391641604938E+02
-0.750611840000000E+02 0.404766960000000E+02
-0.750607720000000E+02 0.404766880000000E+02
-0.750620530000000E+02 0.404692640000000E+02
-0.750637010000000E+02 0.404652440000000E+02
...
-0.750622670000000E+02 0.404709200000000E+02
-0.750611840000000E+02 0.404766960000000E+02
END
The first value is longitude and second value is latitude both in decimal degrees.
NOTE: some of the points may be out of order (or erroneous), so you may have to clean them up after you convert to KML. One handy tool for debugging and removing bad points is a KML Number-the-points tool, which takes some KML as input and generates a KML Placemark for each point in a line or polygon, labeled with that point's number. It is useful if you have a long list of points and need to identify them easily.
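The conversion from the ASCII coordinate list to a KML coordinates string can be sketched as follows. One assumption is made loudly here: the first line of each block (the index followed by a lon/lat pair) is treated as a header carrying an interior point rather than a boundary vertex, so it is skipped. ascii_to_kml_coordinates is a hypothetical helper name, and the sample is a truncated ring built only from the lines visible above.

```python
def ascii_to_kml_coordinates(block):
    """Convert one polygon block of 'lon lat' lines into a KML coordinates string."""
    points = []
    for index, line in enumerate(block.strip().splitlines()):
        fields = line.split()
        if fields == ["END"]:
            break
        if index == 0:
            continue  # assumed header line: index plus an interior point
        if len(fields) != 2:
            continue  # skip anything that is not a lon/lat pair
        lon, lat = (float(f) for f in fields)
        # KML expects lon,lat,alt tuples separated by whitespace.
        points.append(f"{lon:.6f},{lat:.6f},0")
    return " ".join(points)


if __name__ == "__main__":
    sample = """317 -0.749719256389896E+02 0.404391641604938E+02
-0.750611840000000E+02 0.404766960000000E+02
-0.750607720000000E+02 0.404766880000000E+02
-0.750611840000000E+02 0.404766960000000E+02
END"""
    print(ascii_to_kml_coordinates(sample))
```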

Removing specific tags in a KML file

I have a KML file which is a list of places around the world with coordinates and some other attributes. It looks like this for one place:
<Placemark>
<name>Albania - Durrës</name>
<open>0</open>
<visibility>1</visibility>
<description>(Spot ID: 275801) show <![CDATA[forecast]]></description>
<styleUrl>#wgStyle001</styleUrl><Point>
<coordinates>19.489747,41.277806,0</coordinates>
</Point>
<LookAt><range>200000</range><longitude>19.489747</longitude><latitude>41.277806</latitude></LookAt>
</Placemark>
I would like to remove everything except the name of the place. So in this case that would mean I would like to remove everything except
<name>Albania - Durrës</name>
The problem is, this KML file includes more than 1000 of these places. Doing this manually obviously isn't an option, so then how can I remove all tags except for the name tags for all of the items in the list? Can I use some kind of program for that?
Use a specialized command line tool that understands XML documents.
One such tool is xmlstarlet, which is available here for Linux, Windows and Solaris.
To address your particular problem, I used the xmlstarlet executable xml.exe like this (on Windows):
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v /ns:kml/ns:Document/ns:Placemark/ns:name places.kml
This produces this output:
Albania - Durrës
Second Name
Third Name
...
Final Name
If you can guarantee that <name> occurs only as a child of <Placemark>, then this abbreviated version will produce the same result:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v //ns:name places.kml
(This is because this shorter version finds all <name> elements no matter where they occur in the document.)
If you really want an XML document, you'll need to do a little post-processing. Here's an example of a complete XML document:
<?xml version='1.0' encoding='utf-8'?>
<items>
<item>Albania - Durrës</item>
<item>Second Name</item>
<item>Third Name</item>
<!-- ... -->
<item>Final Name</item>
</items>
The first line is the XML declaration. It declares the Unicode encoding utf-8. You'll need to include this line so that XML processors recognize that your document includes Unicode characters. (As in Durrës.)
More: Here's an enhanced 'xmlstarlet' command that will produce the XML document above:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -T -t -o "<?xml version='1.0' encoding='utf-8'?>" -n -t -v "'<items>'" -n -t -m //ns:Placemark -v "concat('<item>',ns:name,'</item>')" -n -t -o "</items>" -n places.kml
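Where xmlstarlet is not installed, roughly the same result can be sketched in Python. This assumes, as the commands above do, that the file uses the http://www.opengis.net/kml/2.2 namespace; placemark_names and names_document are illustrative names, and the xml_declaration argument needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Same namespace the xmlstarlet commands bind to the ns prefix.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark>
<name>Albania - Durrës</name>
<open>0</open>
<Point><coordinates>19.489747,41.277806,0</coordinates></Point>
</Placemark>
<Placemark>
<name>Second Name</name>
<open>0</open>
</Placemark>
</Document></kml>"""


def placemark_names(kml_text):
    """Return the <name> text of every Placemark in document order."""
    root = ET.fromstring(kml_text)
    return [n.text for n in root.findall(".//kml:Placemark/kml:name", KML_NS)]


def names_document(names):
    """Build an <items> document (UTF-8 bytes, with XML declaration) from the names."""
    items = ET.Element("items")
    for name in names:
        ET.SubElement(items, "item").text = name
    return ET.tostring(items, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(names_document(placemark_names(SAMPLE)).decode("utf-8"))
```

Building the output with ElementTree rather than string concatenation also escapes characters such as & or < if they appear in a place name.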
If you are on linux or similar:
grep "<name>" your_file.kml > file_with_only_name_tags
On windows, see What are good grep tools for Windows?