How to transform an RDF file into a Hive table (MapReduce)

I have to transform an RDF file into a Hive or HBase table and run a MapReduce job on it. I know how to work with Hive, but I don't know how to map triples to rows and columns. The following RDF file (from DBpedia) contains multiple triples:
<rdf:RDF
xmlns = "http://dbpedia.org/ontology/"
xml:base="http://dbpedia.org/ontology/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<!-- Ontology Information -->
<owl:Ontology rdf:about="">
<owl:versionInfo xml:lang="de">Version 3.2 2008-11-17</owl:versionInfo>
</owl:Ontology>
<owl:Class rdf:about="http://dbpedia.org/ontology/PopulatedPlace">
<rdfs:label xml:lang="en">Populated Place</rdfs:label>
<rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/Place"/>
</owl:Class>
<owl:Class rdf:about="http://dbpedia.org/ontology/Place">
<rdfs:label xml:lang="en">Place</rdfs:label>
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
<owl:Class rdf:about="http://dbpedia.org/ontology/Country">
<rdfs:label xml:lang="en">Country</rdfs:label>
<rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/PopulatedPlace"/>
</owl:Class>
<owl:Class rdf:about="http://dbpedia.org/ontology/Area">
<rdfs:label xml:lang="en">Area</rdfs:label>
<rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/PopulatedPlace"/>
</owl:Class>
<owl:Class rdf:about="http://dbpedia.org/ontology/Municipality">
<rdfs:label xml:lang="en">Municipality</rdfs:label>
<rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/PopulatedPlace"/>
</owl:Class>
<owl:Class rdf:about="http://dbpedia.org/ontology/City">
<rdfs:label xml:lang="en">City</rdfs:label>
<rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/PopulatedPlace"/>
</owl:Class>
<owl:Class rdf:about="http://dbpedia.org/ontology/River">
<rdfs:label xml:lang="en">River</rdfs:label>
<rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/Place"/>
</owl:Class>
<owl:Class rdf:about="http://dbpedia.org/ontology/HistoricPlace">
<rdfs:label xml:lang="en">Historic Place</rdfs:label>
<rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/Place"/>
</owl:Class>
</rdf:RDF>
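One possible way to get the triples into rows and columns is sketched below in Python. It rests on several assumptions: the file is flat RDF/XML like the DBpedia sample (one level of node elements carrying rdf:about, whose children carry either rdf:resource or literal text), and the target is a hypothetical three-column table (subject, predicate, object) loaded from tab-separated text. The names rdfxml_to_triples and to_tsv are illustrative and not part of the question. Language tags and datatypes are dropped, and nested or abbreviated RDF/XML would need a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Shortened copy of the DBpedia sample, used only for illustration.
SAMPLE = """<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Ontology rdf:about="">
    <owl:versionInfo xml:lang="de">Version 3.2 2008-11-17</owl:versionInfo>
  </owl:Ontology>
  <owl:Class rdf:about="http://dbpedia.org/ontology/City">
    <rdfs:label xml:lang="en">City</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/PopulatedPlace"/>
  </owl:Class>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a full IRI."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag


def rdfxml_to_triples(xml_text):
    """Yield (subject, predicate, object) tuples for a flat RDF/XML document."""
    root = ET.fromstring(xml_text)
    base = root.get(f"{{{XML_NS}}}base", "")
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        if subject is None:
            continue  # assumption: every node element is identified by rdf:about
        subject = subject or base  # rdf:about="" refers to the base IRI
        # A typed node element such as owl:Class implies an rdf:type triple.
        yield (subject, RDF + "type", expand(node.tag))
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            yield (subject, expand(prop.tag), obj)


def to_tsv(triples):
    """Render triples as tab-separated lines, one row per triple."""
    return "\n".join("\t".join(triple) for triple in triples)


if __name__ == "__main__":
    print(to_tsv(rdfxml_to_triples(SAMPLE)))
```

Each output line is one row. A Hive table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' and three STRING columns could then load such a file, and a MapReduce job could read the same lines.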

Related

XSLT - How to get the first value in a repeating Tag - So simple but not that simple

I have an XML document that contains various phone types and their values. It can hold more than one value for the same phone type (for example, a person can have two mobile phone numbers). I have to get the first value using XSLT, no matter how many numbers there are. It's a really simple ask, but I am breaking my head over it. Here is the example XML; I want the mobile number with all 2s:
<Work_Phone>
<Work_Phones_group>
<WorkphoneType wd:Descriptor="Fax">
<ID wd:type="Device">Fax</wd:ID>
</WorkphoneType>
<WorkphoneNumber>111-111-1111</wd:WorkphoneNumber>
</Work_Phones_group>
<Work_Phones_group>
<WorkphoneType wd:Descriptor="Mobile">
<ID wd:type="Device">Mobile</wd:ID>
</WorkphoneType>
<WorkphoneNumber>222-222-2222</wd:WorkphoneNumber>
</Work_Phones_group>
<Work_Phones_group>
<WorkphoneType wd:Descriptor="Mobile">
<ID wd:type="Device">Mobile</wd:ID>
</WorkphoneType>
<WorkphoneNumber>333-333-3333</wd:WorkphoneNumber>
</Work_Phones_group>
</Work_Phone>
Output value required - 222-222-2222
Unfortunately, there is no key or any other field that differentiates two mobile number entries. Do you have any suggestions or solutions for this? Any pointer is appreciated.

Your XML is not well-formed. I changed some namespace prefixes to make it well-formed which removes some inconsistencies:
<Work_Phone xmlns:wd="http://some.wd.ns">
<Work_Phones_group>
<WorkphoneType wd:Descriptor="Fax">
<wd:ID wd:type="Device">Fax</wd:ID>
</WorkphoneType>
<wd:WorkphoneNumber>111-111-1111</wd:WorkphoneNumber>
</Work_Phones_group>
<Work_Phones_group>
<WorkphoneType wd:Descriptor="Mobile">
<wd:ID wd:type="Device">Mobile</wd:ID>
</WorkphoneType>
<wd:WorkphoneNumber>222-222-2222</wd:WorkphoneNumber>
</Work_Phones_group>
<Work_Phones_group>
<WorkphoneType wd:Descriptor="Mobile">
<wd:ID wd:type="Device">Mobile</wd:ID>
</WorkphoneType>
<wd:WorkphoneNumber>333-333-3333</wd:WorkphoneNumber>
</Work_Phones_group>
</Work_Phone>
Now, to get the desired value of 222-222-2222, use this XPath 1.0 expression:
/Work_Phone/Work_Phones_group[WorkphoneType/@wd:Descriptor='Mobile'][1]/wd:WorkphoneNumber
or customize this template to get the values you really want:
<xsl:template match="/">
<xsl:for-each select="Work_Phone/Work_Phones_group[WorkphoneType/@wd:Descriptor='Mobile'][1]">
<xsl:value-of select="wd:WorkphoneNumber" />
</xsl:for-each>
</xsl:template>
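As a quick cross-check of the selection logic outside XSLT, here is a minimal Python sketch. It assumes the corrected, well-formed XML from this answer and the placeholder namespace URI http://some.wd.ns; the function name first_number is made up for illustration. Python's ElementTree does not support a nested predicate such as WorkphoneType/@wd:Descriptor, so the filter is written as a loop that returns the first matching group.

```python
import xml.etree.ElementTree as ET

WD = "http://some.wd.ns"  # placeholder namespace URI from the answer above

DOC = """<Work_Phone xmlns:wd="http://some.wd.ns">
  <Work_Phones_group>
    <WorkphoneType wd:Descriptor="Fax"><wd:ID wd:type="Device">Fax</wd:ID></WorkphoneType>
    <wd:WorkphoneNumber>111-111-1111</wd:WorkphoneNumber>
  </Work_Phones_group>
  <Work_Phones_group>
    <WorkphoneType wd:Descriptor="Mobile"><wd:ID wd:type="Device">Mobile</wd:ID></WorkphoneType>
    <wd:WorkphoneNumber>222-222-2222</wd:WorkphoneNumber>
  </Work_Phones_group>
  <Work_Phones_group>
    <WorkphoneType wd:Descriptor="Mobile"><wd:ID wd:type="Device">Mobile</wd:ID></WorkphoneType>
    <wd:WorkphoneNumber>333-333-3333</wd:WorkphoneNumber>
  </Work_Phones_group>
</Work_Phone>"""


def first_number(xml_text, phone_type):
    """Return the number of the first group whose descriptor equals phone_type."""
    root = ET.fromstring(xml_text)
    for group in root.findall("Work_Phones_group"):
        kind = group.find("WorkphoneType")
        # Namespaced attributes are addressed as '{uri}local' in ElementTree.
        if kind is not None and kind.get(f"{{{WD}}}Descriptor") == phone_type:
            return group.findtext(f"{{{WD}}}WorkphoneNumber")
    return None  # no group of that type


if __name__ == "__main__":
    print(first_number(DOC, "Mobile"))
```

Returning on the first match mirrors the [1] that follows the filter predicate in the XPath expression: the position is counted among the filtered groups, not among all groups.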

Freemarker: getting a Template with include (prefix)

I'm trying the following:
MainTemplate.ftl
<root>
<#list items as item>
<#include "custom_item.ftl"> [Option 1]
</#list>
<#include "custom_item.ftl"> [Option 2]
</root>
custom_item.ftl
<root>
<name>${name}</name>
</root>
In some files the include is used as in [Option 1], in others as in [Option 2].
To access the name variable I have to use two different expressions:
- Option 1: ${item.name}
- Option 2: ${name}
Totally understandable, but that is also my issue. How can I make sure it always works? For instance, by supplying a prefix to the include so that the expression is always the same.
For example:
MainTemplate.ftl
<root>
<#list items as item>
<#include "custom_item.ftl" prefix='item'> [Option 1]
</#list>
<#include "custom_item.ftl"> [Option 2]
</root>
custom_item.ftl
<root>
<# assign prefix = prefix?root>
<name>${prefix.name}</name>
</root>
That would then always work. My approach clearly doesn't work; does someone have a solution that does?
Edit: Answer included
MainTemplate.ftl
<root>
<#list listItems as listItem>
<#assign item = listItem>
<#include "custom_item.ftl">
</#list>
<#assign item = .data_model>
<#include "custom_item.ftl">
</root>
custom_item.ftl
<root>
<name>${item.name}</name>
</root>

Always use ${item.name}. In the case when the data-model root itself is the item (is it?), you can do something like <#assign item = .data_model> before the #include.

How to use XSLT 1.0 or XQuery 1.0 to simulate sql (group by and working with multiple xml nodes)

I have four XML documents and I am trying to use XSLT 1.0 or XQuery 1.0 to query them and generate a single XML document as output. I am not sure how to use keys and group-by functionality in XSLT 1.0, or how to efficiently query multiple XML documents to form one output document. (Each XML file contains multiple rows.)
tableA
<table>
<row>
<id></id>
<group></group>
<version></version>
<status></status>
</row>
</table>
tableB
<table>
<row>
<id></id>
<group></group>
<version></version>
</row>
</table>
tableC
<table>
<row>
<id></id>
<code></code>
<version></version>
<code_version></code_version>
<version></version>
</row>
</table>
tableD
<table>
<row>
<id></id>
<code></code>
<code_version></code_version>
<date></date>
</row>
</table>
My SQL Query:
SELECT a.[id]
,c.[code]
,d.[date]
,a.[group]
FROM [source].[dbo].[tableA] a, [source].[dbo].[tableB] b,
[source].[dbo].[tableC] c, [source].[dbo].[tableD] d
WHERE a.[id] = b.[id]
AND a.[version] = b.[version]
AND a.[id] = c.[id]
AND a.[version] = c.[version]
AND c.[code] = d.[code]
AND c.[code_version] = d.[code_version]
GROUP BY a.[id]
,c.[code]
,d.[date]
,a.[status]
ORDER BY a.[id], c.[code]
I also tried creating partial XQuery 1.0, but it is not working.
My XQuery 1.0:
xquery version "1.0";
declare namespace ms = "http://www.ms.com/extensions";
let $A := fn:doc('tableA.xml')
let $B := fn:doc('tableB.xml')
let $C := fn:doc('tableC.xml')
let $D := fn:doc('tableD.xml')
for $a in $A,
$b in $B,
$c in $C,
$d in $D
where $a/table/row/id = $b/table/row/id
and $a/table/row/version = $b/table/row/version
and $a/table/row/id = $c/table/row/id
and $a/table/row/version = $c/table/row/version
and $c/table/row/code = $d/table/row/code
and $c/table/row/code_version = $d/table/row/code_version
order by $a
return $a
This is just a partial query. It works if I don't apply the where clause conditions on $c and $d. I am not sure whether this is the right way to perform this operation, though.
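To make the intended row-level join concrete, here is a hedged Python sketch of what the SQL query asks for. It is not XQuery or XSLT, and it assumes that GROUP BY without aggregates is only meant to remove duplicate result rows; the helper names rows and join_tables are invented for illustration. One point it highlights: in the XQuery above, $a is bound to a whole document, so a comparison like $a/table/row/id = $b/table/row/id is true as soon as any pair of rows matches. The join needs to iterate over individual row elements instead.

```python
import xml.etree.ElementTree as ET


def rows(xml_text):
    """Return every <row> of a table document as a dict of child name -> text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: (child.text or "") for child in row} for row in root.iter("row")]


def join_tables(a_rows, b_rows, c_rows, d_rows):
    """Row-by-row equivalent of the SQL join; returns sorted distinct tuples."""
    b_keys = {(b["id"], b["version"]) for b in b_rows}

    # Index C and D by their join keys so each lookup is a dict access.
    c_index = {}
    for c in c_rows:
        c_index.setdefault((c["id"], c["version"]), []).append(c)
    d_index = {}
    for d in d_rows:
        d_index.setdefault((d["code"], d["code_version"]), []).append(d)

    result = set()  # the set drops duplicates (assumed purpose of GROUP BY)
    for a in a_rows:
        key = (a["id"], a["version"])
        if key not in b_keys:
            continue
        for c in c_index.get(key, []):
            for d in d_index.get((c["code"], c["code_version"]), []):
                result.add((a["id"], c["code"], d["date"], a["group"]))
    # Sorting the tuples orders by id, then code (values compared as strings).
    return sorted(result)
```

In XQuery 1.0 the same shape would be a for clause over $A/table/row, $B/table/row and so on, with the where conditions comparing fields of those individual rows.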

Structured XML to Map or Flat XML

I need to map the line items in my XML to a Map or a flat XML in MuleSoft. I am planning to use XSLT, but I am getting only a single value instead of multiple line items. I'm not sure how the for-each instruction works for this. Any help would be appreciated.
Input
<?xml version="1.0" encoding="utf-8"?><XmlInterchange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1" xmlns="http://www.edi.com.au/EnterpriseService/">
<InterchangeInfo>
<Date>2016-02-29T05:56:10.272+05:00</Date>
<XmlType>LightWeight</XmlType>
<Source></Source>
<Target></Target>
</InterchangeInfo>
<Payload>
<WhsDockets>
<WhsDocket>
<Identifier>
<Reference>2370519</Reference>
</Identifier>
<DocketDetail>
<WarehouseCode>ROC</WarehouseCode>
<CustomerReference>3340527</CustomerReference>
<Units>41</Units>
<Packages>0</Packages>
<Pallets>0</Pallets>
<Weight DimensionType="KG">720</Weight>
<Cubic DimensionType="M3">5.922</Cubic>
<TransportInsurance>0.0000</TransportInsurance>
<ShipperCODAmount>0.0000</ShipperCODAmount>
<CustomerOrderDetail>
<OrderType>ORD</OrderType>
<DateRequired>2015-09-02T00:00:00</DateRequired>
<Consignee AddressType="CEA">
<AddressLine1>Cnr Maroochydore and BroadmeadowRds</AddressLine1>
<CityOrSuburb>MAROOCHYDORE</CityOrSuburb>
<StateOrProvince>QLD</StateOrProvince>
<PostCode>4558</PostCode>
<CompanyName>Bunnings Maroochydore OLD Warehouse</CompanyName>
<CountryCode>AU</CountryCode>
<ContactName>The Import Manager</ContactName>
</Consignee>
</CustomerOrderDetail>
<CustomAttributes />
</DocketDetail>
<DocketLines>
<DocketLine>
<Product>E4342</Product>
<Description>R 3 5/3 6 175mm x 430mm x 1160mm</Description>
<QuantityFromClientOrder>5</QuantityFromClientOrder>
<QuantityActuallyOrdered>5</QuantityActuallyOrdered>
<ProductUQ>MST</ProductUQ>
<LineAttributes />
<LineNumber>1</LineNumber>
<Confirmation>
<Lines>
<Line>
<Quantity>25</Quantity>
<QuantityUQ>PAC</QuantityUQ>
</Line>
</Lines>
<Quantity>25</Quantity>
</Confirmation>
</DocketLine>
<DocketLine>
<Product>E2281</Product>
<Description>R 3 5 175mm x 580mm x 1160mm</Description>
<QuantityFromClientOrder>4</QuantityFromClientOrder>
<QuantityActuallyOrdered>4</QuantityActuallyOrdered>
<ProductUQ>MST</ProductUQ>
<LineAttributes />
<LineNumber>2</LineNumber>
<Confirmation>
<Lines>
<Line>
<Quantity>16</Quantity>
<QuantityUQ>PAC</QuantityUQ>
</Line>
</Lines>
<Quantity>16</Quantity>
</Confirmation>
</DocketLine>
</DocketLines>
</WhsDocket>
</WhsDockets>
</Payload></XmlInterchange>
I need to flatten the XML, combining the line item details with the reference number for each item.
Output
<?xml version="1.0" encoding="utf-8"?><Items>
<LineItem>
<Date/>
<Order>2370519</Order>
<Client>Bunnings Maroochydore OLD Warehouse</Client>
<Product>E2281</Product>
<Description>R 3 5 175mm x 580mm x 1160mm</Description>
<Quantity>4</Quantity>
<UOM>MST</UOM>
<Warehouse>ROC</Warehouse>
<Carrier>Deluxe</Carrier>
</LineItem>
</Items>

Have you looked at DataWeave to transform it from current xml to new xml?
https://docs.mulesoft.com/mule-user-guide/v/3.7/dataweave-examples#xml-basic
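Independently of the tool used, the key point is to loop over every DocketLine and copy the docket-level values into each output item. The following Python sketch illustrates that loop; it is neither DataWeave nor XSLT. The field mapping is an assumption read off the expected output (Quantity taken from QuantityActuallyOrdered, UOM from ProductUQ), and Date and Carrier are left out because their source in the input is not clear. The names flatten and text are hypothetical.

```python
import xml.etree.ElementTree as ET

# The input uses a default namespace, so every path step needs a prefix.
NS = {"e": "http://www.edi.com.au/EnterpriseService/"}

# Trimmed copy of the question's input, used only for illustration.
SAMPLE = """<XmlInterchange xmlns="http://www.edi.com.au/EnterpriseService/">
<Payload><WhsDockets><WhsDocket>
<Identifier><Reference>2370519</Reference></Identifier>
<DocketDetail>
<WarehouseCode>ROC</WarehouseCode>
<CustomerOrderDetail><Consignee AddressType="CEA">
<CompanyName>Bunnings Maroochydore OLD Warehouse</CompanyName>
</Consignee></CustomerOrderDetail>
</DocketDetail>
<DocketLines>
<DocketLine><Product>E4342</Product>
<Description>R 3 5/3 6 175mm x 430mm x 1160mm</Description>
<QuantityActuallyOrdered>5</QuantityActuallyOrdered><ProductUQ>MST</ProductUQ></DocketLine>
<DocketLine><Product>E2281</Product>
<Description>R 3 5 175mm x 580mm x 1160mm</Description>
<QuantityActuallyOrdered>4</QuantityActuallyOrdered><ProductUQ>MST</ProductUQ></DocketLine>
</DocketLines>
</WhsDocket></WhsDockets></Payload>
</XmlInterchange>"""


def text(node, path):
    """Text of the first element matching path, or '' if it is missing."""
    return node.findtext(path, default="", namespaces=NS)


def flatten(xml_text):
    """Build an <Items> element with one <LineItem> per DocketLine."""
    root = ET.fromstring(xml_text)
    items = ET.Element("Items")
    for docket in root.iterfind(".//e:WhsDocket", NS):
        # Docket-level values are repeated on every line item.
        header = {
            "Order": text(docket, "e:Identifier/e:Reference"),
            "Client": text(docket, "e:DocketDetail/e:CustomerOrderDetail/e:Consignee/e:CompanyName"),
            "Warehouse": text(docket, "e:DocketDetail/e:WarehouseCode"),
        }
        for line in docket.iterfind("e:DocketLines/e:DocketLine", NS):
            fields = dict(
                header,
                Product=text(line, "e:Product"),
                Description=text(line, "e:Description"),
                Quantity=text(line, "e:QuantityActuallyOrdered"),
                UOM=text(line, "e:ProductUQ"),
            )
            item = ET.SubElement(items, "LineItem")
            for name, value in fields.items():
                ET.SubElement(item, name).text = value
    return items


if __name__ == "__main__":
    print(ET.tostring(flatten(SAMPLE), encoding="unicode"))
```

In XSLT the same structure would be an xsl:for-each (or a template) matching each DocketLine, reaching back up to the enclosing WhsDocket for the reference number. Forgetting the namespace prefix on the path steps is a common reason for getting no lines or only a single value.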

XPath expression fails. (libxml2 in C) [closed]

I'm using libxml2 in a C program to do some stuff within XML documents.
If I evaluate the following XPath, I get an empty result:
/scheda_conservatore[1]/patrimonio_archivistico[1]/lower_list[@type='risorsa_informativa']/risorsa_informativa_nested[@id='037006-001-2012-ri002']
But if I evaluate the following XPath, I get a non-empty result containing elements that the first expression should also have matched:
/scheda_conservatore[1]/patrimonio_archivistico[1]/lower_list/risorsa_informativa_nested[@id='037006-001-2012-ri002']
If I check my XPath step by step, I get:
/scheda_conservatore[1] -> Non-empty node set
/scheda_conservatore[1]/patrimonio_archivistico[1] -> Non-empty node set
/scheda_conservatore[1]/patrimonio_archivistico[1]/lower_list[@type='risorsa_informativa'] -> Empty node set
As I said, the XML document does contain a valid path, but this expression does not match it.
Moreover, if I ask jEdit or another editor with XPath support to evaluate the expression, the result is a non-empty node set.
I'm going mad. I have read the XPath expression thousands of times; there must be something very wrong that is hidden from my eyes, even if it will surely be obvious to somebody else.
One more thing:
The following expression, which only tests for the presence of the 'type' attribute without checking its value, gives a valid result, even though the value is also correct:
/scheda_conservatore[1]/patrimonio_archivistico[1]/lower_list[@type]/risorsa_informativa_nested[@id='037006-001-2012-ri002']
Here's a "director's cut" of the larger XML document
<?xml version="1.0" encoding="iso-8859-1"?>
<scheda_conservatore anno_rilevazione="2012" stato="non-storicizzata">
<!-- scheda 2012 per Bologna -->
<patrimonio_archivistico>
<lower_list type="complesso_archivistico">
<complesso_archivistico_nested id="037006-001-2012-ca001" inventariazione="n">
<lower_list type="altro_luogo_collocazione">
<altro_luogo_collocazione_nested id="037006-001-2012-alc001">
<!-- altro luogo 1 per bologna 2012 -->
<upper_list type="complesso_archivistico">
<upper ref="ca002"/>
<upper ref="ca003"/>
</upper_list>
<ubicazione>sotterraneo da botola segreta</ubicazione>
<bridge_list type="sede">
<bridge ref="s001"/>
</bridge_list>
</altro_luogo_collocazione_nested>
</lower_list>
<!-- complesso 1 per bologna 2012 -->
<identificazione>
<denominazione>Archivi dei Comprensori della provincia di Bologna</denominazione>
<lista_altre_denominazioni>
<!-- Modificato -->
<altra_denominazione>Archivi dei Comprensori bolognesi</altra_denominazione>
<altra_denominazione>Archivi dei Comprensori felsinei</altra_denominazione>
</lista_altre_denominazioni>
<livello>Complesso di fondi, Superfondo</livello>
</identificazione>
<dati_giuridici>
<tipologia>Pubblico</tipologia>
<notificato_dichiarato presente="y">
<data>20100304T000000</data>
</notificato_dichiarato>
</dati_giuridici>
<lower_list type="titolare">
<titolare_nested id="037006-001-2012-t001">
<!-- titolare 1 per bologna 2012 -->
<upper_list type="complesso_archivistico">
<upper ref="ca001"/>
<upper ref="ca002"/>
</upper_list>
</titolare_nested>
</lower_list>
</complesso_archivistico_nested>
</lower_list>
<lower_list type="risorsa_informativa">
<risorsa_informativa_nested id="037006-001-2012-ri001">
<bridge_list type="complesso_archivistico">
<bridge ref="ca001"/>
<bridge ref="ca002"/>
</bridge_list>
<!-- risorsa 1 per bologna 2012 -->
<descrizione>
<autore>CSR - Centro studi e ricerche</autore>
<titolo>Atti degli uffici: inventario-mappa topografica del...</titolo>
<anno indicativo="y">1986</anno>
<qualifica>
<opz pubbl="y">Strumenti di ricerca archivistici</opz>
</qualifica>
<scelta_multipla nome="standard">
<opz valore="AACR2"/>
<opz valore="Altro">EAD</opz>
</scelta_multipla>
<descr_estrinseca>Dattiloscritto (relativo a: documentazione post 1945 conservata in Viale Martiri della Libert&#x2026;)</descr_estrinseca>
</descrizione>
<lista_pubblicazioni>
<pubblicazione>
<edita presente="y">stampa</edita>
<edita_stampa>
<curatore/>
<edito_in/>
<luogo/>
<data/>
<pagine/>
<sbn/>
<note/>
</edita_stampa>
<url/>
<ultima_consultazione>20120611T165400</ultima_consultazione>
<nota>Nessuna nota</nota>
</pubblicazione>
<pubblicazione>
<edita presente="y">web</edita>
<edita_stampa/>
<url>www.risorsainformativa.gov</url>
<ultima_consultazione/>
<nota>Nessuna nota web</nota>
</pubblicazione>
</lista_pubblicazioni>
<informatizzazione presente="y">
<scelta_multipla nome="applicativi_utilizzati">
<!-- MODIFICATO!! -->
<opz valore="Access (database)"/>
<opz valore="Altro">eXtraWay</opz>
</scelta_multipla>
<partecipazione_sistemi_informativi presente="y">
<descrizione>x.dams</descrizione>
</partecipazione_sistemi_informativi>
</informatizzazione>
</risorsa_informativa_nested>
</lower_list>
<lower_list type="intervento">
<intervento_nested autor_sovraintendenza="y" id="037006-001-2012-i001" in_corso="y">
<!-- intervento 1 per bologna 2012 -->
<descrizione>Restauro archivi dei comprensori della provincia di Bologna</descrizione>
<scelta_multipla nome="tipologia">
<opz valore="Riordino"/>
<opz valore="Altro">Pulizia</opz>
</scelta_multipla>
<avvio>20111101T000000</avvio>
<conclusione_prevista>20120701T000000</conclusione_prevista>
<conclusione_effettiva/>
<autore/>
<promotore/>
<scelta_multipla nome="standard_descrittivi">
<opz valore="ISAD"/>
<opz valore="Altro">Descrizione altro standard descrittivo</opz>
</scelta_multipla>
<informatizzazione presente="y">
<scelta_multipla nome="applicativo_utilizzato">
<opz valore="Access (database)"/>
<opz valore="Altro">eXtraWay</opz>
</scelta_multipla>
<partecipazione_sistemi_informativi presente="y">
<descrizione>x.dams</descrizione>
</partecipazione_sistemi_informativi>
</informatizzazione>
<bridge_list type="complesso_archivistico">
<bridge ref="ca001"/>
</bridge_list>
</intervento_nested>
</lower_list>
<note/>
</patrimonio_archivistico>
<note/>
<?xw-meta Dbms="ExtraWay" DbmsVer="24.3.1" OrgNam="3D Informatica" OrgVer="1.0" Classif="1.0" ManGest="3.1" ManTec="0.0.4" DocType="" InsUser="admin" InsTime="20120910175739" ModUser="rtirabassi" ModTime="20120925145347"?>
<?xw-crc key32=e324f581-406521b5?>
</scheda_conservatore>
OK, now the problem has taken on another aspect. I should probably close this and move on to another question.
The XPath, on the original (larger) XML document, is correct, and I now see where the problem is, but I have no idea how to solve it.
If I execute the XPath expression just once on the XML document, I get the expected result.
If I execute a fairly long sequence of XPath expressions on the same XML document, the complex ones (those containing conditions on attribute values) fail, and only those.
So I looked at how we implemented the XPath evaluation and found that the XPathContext was never freed. I changed the code to free the context after each XPath evaluation and create a new one every time, but nothing changed.
Any idea?

XPath works correctly. You are looking for @id='037006-001-2012-ri002', but the attribute value in the posted XML is 037006-001-2012-ri001. After changing the XML to ri002 it matches, and libxml returns the correct nodesetval.
In case that does not really resolve the problem: maybe the id attribute is treated in a special way? Try changing it to idx. See Java XML DOM: how are id Attributes special?
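The mismatch can be reproduced outside libxml2 with a small Python sketch. It assumes a heavily trimmed copy of the posted document and uses the standard library's ElementTree, which supports attribute predicates of this form; the function name match is illustrative only. Paths are relative to the root element scheda_conservatore.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted document: only the elements the XPath touches.
DOC = """<scheda_conservatore anno_rilevazione="2012" stato="non-storicizzata">
  <patrimonio_archivistico>
    <lower_list type="complesso_archivistico">
      <complesso_archivistico_nested id="037006-001-2012-ca001"/>
    </lower_list>
    <lower_list type="risorsa_informativa">
      <risorsa_informativa_nested id="037006-001-2012-ri001"/>
    </lower_list>
  </patrimonio_archivistico>
</scheda_conservatore>"""


def match(xml_text, list_type, nested_id):
    """Return the risorsa_informativa_nested elements selected by both predicates."""
    root = ET.fromstring(xml_text)
    path = (
        f"patrimonio_archivistico/lower_list[@type='{list_type}']"
        f"/risorsa_informativa_nested[@id='{nested_id}']"
    )
    return root.findall(path)


if __name__ == "__main__":
    # The id present in the document matches; the id from the question does not.
    print(len(match(DOC, "risorsa_informativa", "037006-001-2012-ri001")))
    print(len(match(DOC, "risorsa_informativa", "037006-001-2012-ri002")))
```

With the posted excerpt, the predicate on @type selects the list as expected, and only the id value decides whether anything is returned.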

OK, found the problem.
Sorry for the false alarm. libxml2 works properly, but the XML document was being changed during a cycle of XPath evaluations, which changed the scenario and led me to believe the XPath processor was failing.
A deep debugging session showed us that something else was wrong: only the XPath expressions with a condition on an attribute (other than in the last step) failed. This led us to the solution.
My fault. Sorry.
