REGEX XML question - Finding value of a name value pair

I have always tried to run quick regex queries on XML tags. Yes, I have been told it is not a good idea and that you should load the XML into an object, but sometimes, as in this case, the XML is in an Oracle DB BLOB. I need to get the VALUE for a specific NAME in name/value pair XML like this:
<entry>
<string>NAME</string>
<string>VALUE</string>
</entry>
Is there a way to do this with a regex?

You should parse XML using an XML Parser and in Oracle you can use XMLQUERY:
SELECT XMLQUERY(
'/root/entry/string[1][text()="NAME2"]/../string[2]/text()'
PASSING XMLTYPE( xml, NLS_CHARSET_ID('UTF8') )
RETURNING CONTENT
) AS value
FROM table_name;
Or XMLTABLE:
SELECT value
FROM table_name
CROSS APPLY XMLTABLE(
'/root/entry'
PASSING XMLTYPE( xml, NLS_CHARSET_ID('UTF8') )
COLUMNS
name VARCHAR2(20) PATH './string[1]',
value VARCHAR2(20) PATH './string[2]'
)
WHERE name = 'NAME2';
Which for the sample data:
CREATE TABLE table_name ( xml BLOB );
DECLARE
value CLOB := '<root>
<entry>
<string>NAME1</string>
<string>VALUE1</string>
</entry>
<entry>
<string>NAME2</string>
<string>VALUE2</string>
</entry>
<entry>
<string>NAME3</string>
<string>VALUE3</string>
</entry>
<entry>
<string>NAME4</string>
<string>VALUE4</string>
</entry>
</root>';
dest_offset INTEGER := 1;
src_offset INTEGER := 1;
lang_context INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
result BLOB;
warning INTEGER;
warning_msg VARCHAR2(50);
BEGIN
DBMS_LOB.CreateTemporary(
lob_loc => result,
cache => TRUE
);
DBMS_LOB.CONVERTTOBLOB(
dest_lob => result,
src_clob => value,
amount => LENGTH( value ),
dest_offset => dest_offset,
src_offset => src_offset,
blob_csid => DBMS_LOB.DEFAULT_CSID,
lang_context => lang_context,
warning => warning
);
INSERT INTO table_name ( xml ) VALUES ( result );
END;
/
Both queries output:
| VALUE |
| :----- |
| VALUE2 |
Can you do it with a regular expression? Yes:
SELECT REGEXP_SUBSTR(
TO_CLOB( xml ),
'<entry>\s*<string>NAME2</string>\s*<string>([^<]*)</string>\s*</entry>',
1,
1,
'c',
1
) AS value
FROM table_name;
Which outputs:
| VALUE |
| :----- |
| VALUE2 |
However, you shouldn't, as the XML parsing functions take an XPath expression that specifies where to look for the data. The regular expression just treats the value as a string and looks for the first match, even if it is not in the expected place in the XML hierarchy.
For example, if your data is:
<root>
<entry>
<string>NAME1</string>
<string>VALUE1</string>
<other><entry><string>NAME2</string><string>NOT THIS</string></entry></other>
</entry>
<entry>
<string>NAME2</string>
<string>VALUE2</string>
</entry>
</root>
Then XMLQUERY and XMLTABLE will find the correct value but the regular expression outputs:
| VALUE |
| :------- |
| NOT THIS |
Or, if your data suddenly has an attribute:
<root>
<entry>
<string>NAME1</string>
<string>VALUE1</string>
</entry>
<entry>
<string>NAME2</string>
<string attr="attr value">VALUE2</string>
</entry>
</root>
Then parsing with the regular expression will fail and return NULL.
So, don't use a regular expression, use a proper XML parser.
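The same contrast can be reproduced outside the database. The following is a minimal Python sketch, not part of the original answer: the function names `value_by_parser` and `value_by_regex` are made up for illustration, and it uses the standard-library `xml.etree.ElementTree` parser on the attribute example from above.

```python
import re
import xml.etree.ElementTree as ET

# Sample data from the answer: the second <string> of NAME2 carries an attribute.
XML_WITH_ATTR = """<root>
  <entry>
    <string>NAME1</string>
    <string>VALUE1</string>
  </entry>
  <entry>
    <string>NAME2</string>
    <string attr="attr value">VALUE2</string>
  </entry>
</root>"""


def value_by_parser(xml_text, name):
    """Look up the value with a real XML parser (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # findall("entry") only visits direct children of <root>,
    # so nested <entry> elements elsewhere in the tree are ignored.
    for entry in root.findall("entry"):
        strings = entry.findall("string")
        if len(strings) >= 2 and (strings[0].text or "").strip() == name:
            return strings[1].text
    return None


def value_by_regex(xml_text, name):
    """Look up the value with the same pattern shape as the REGEXP_SUBSTR call."""
    pattern = (
        r"<entry>\s*<string>" + re.escape(name) + r"</string>"
        r"\s*<string>([^<]*)</string>\s*</entry>"
    )
    match = re.search(pattern, xml_text)
    return match.group(1) if match else None


print(value_by_parser(XML_WITH_ATTR, "NAME2"))
print(value_by_regex(XML_WITH_ATTR, "NAME2"))
```

The parser-based lookup still finds the value when an attribute appears, while the regex stops matching because it expects the literal text `<string>`.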

Related

how to get document that contain more than collection in xslt?

let $stylesheet := "abc.xsl"
let $params := map:map()
let $_ := map:put ($params,"col1","abc")
return
xdmp:xslt-invoke(
$stylesheet, (), $params,
<options xmlns="xdmp:eval">
<template>a:schema</template>
</options>)
abc.xsl
<xsl:template name="a:schema">
<xsl:param name="collection-uri" as="xs:string" select="$col1"/>
<xsl:apply-templates select="collection($collection-uri)"/>
</xsl:template>
Currently, this takes all the documents that are in the collection "abc".
But I want to add more than one collection to the $params map, so that only the documents that are in both collection "abc" and collection "def" are returned.
For example:
| Document | Collection |
|:---------|:----------:|
| Doc1 | abc, def |
| Doc2 | abc |
| Doc3 | abc, def |
It should pick Doc1 and Doc3.

collection() accepts a sequence of xs:string, but it returns the documents that are in any of the specified collections.
If you want only the docs that are in all of the collections specified, you could use cts:search() with a sequence of cts:collection-query() inside of a cts:and-query().
<xsl:template match="/">
<xsl:param name="collection-uri" as="xs:string*" select="$col1"/>
<xsl:apply-templates select="cts:search(doc(), cts:and-query(( $collection-uri ! cts:collection-query(.) )))"/>
</xsl:template>
Enable the 1.0-ml dialect, so that you can use the cts built-in functions by adding the following attribute to your xsl:stylesheet element:
xdmp:dialect="1.0-ml"
The $collection-uri param is declared as xs:string, so it will only have one string value. You could change that to be a sequence of strings with either * or + quantifier:
<xsl:param name="collection-uri" as="xs:string*" select="$col1"/>
and then set the collections on the $col1 param:
let $_ := map:put ($params,"col1", ('abc', 'def'))
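The difference between "any of the collections" and "all of the collections" is the difference between a set union and a set intersection. The following Python sketch only models that idea with the sample table from the question; it does not call MarkLogic, and the `collections` dictionary and helper names are hypothetical.

```python
# Hypothetical in-memory model of the question's table:
# collection name -> set of documents that belong to it.
collections = {
    "abc": {"Doc1", "Doc2", "Doc3"},
    "def": {"Doc1", "Doc3"},
}


def docs_in_any(names):
    """Union: roughly what passing several URIs to collection() returns."""
    return set().union(*(collections[n] for n in names))


def docs_in_all(names):
    """Intersection: roughly what an and-query of collection queries returns.

    Assumes `names` is non-empty.
    """
    return set.intersection(*(collections[n] for n in names))


print(sorted(docs_in_any(["abc", "def"])))
print(sorted(docs_in_all(["abc", "def"])))
```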

else clause in pmml derivedfield

In a DerivedField in PMML, I used the code below to make a local transformation:
<MapValues outputColumn="longForm">
<FieldColumnPair field="gender" column="shortForm"/>
<InlineTable>
<row><shortForm>m</shortForm><longForm>male</longForm>
</row>
<row><shortForm>f</shortForm><longForm>female</longForm>
</row>
</InlineTable>
</MapValues>
In that code, if shortForm is m it returns "male", and if shortForm is f it returns "female". I also want to add an else clause to that code: if shortForm is neither m nor f, it should return "unknown". How can I do that?

You should specify the defaultValue attribute of MapValues:
<MapValues outputColumn="longForm" defaultValue="unknown">
...
</MapValues>
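As an analogy only (this is not PMML, and the names `LONG_FORM` and `map_gender` are made up), the lookup-with-fallback behaviour that defaultValue provides resembles a dictionary lookup with a default in Python:

```python
# Hypothetical stand-in for the InlineTable: shortForm -> longForm.
LONG_FORM = {"m": "male", "f": "female"}


def map_gender(short_form, default="unknown"):
    """Return the mapped value, or the default when no row matches."""
    return LONG_FORM.get(short_form, default)


print(map_gender("m"))
print(map_gender("x"))
```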

How to use XSLT 1.0 or XQuery 1.0 to simulate sql (group by and working with multiple xml nodes)

I have 4 XML documents and I am trying to use XSLT 1.0 or XQuery 1.0 to query these XML files and generate one XML document as output. I am not sure how to use the key and group-by functionality in XSLT 1.0, or how to efficiently query multiple XML documents to form one output XML. (There are multiple rows in each XML file.)
tableA
<table>
<row>
<id></id>
<group></group>
<version></version>
<status></status>
</row>
</table>
tableB
<table>
<row>
<id></id>
<group></group>
<version></version>
</row>
</table>
tableC
<table>
<row>
<id></id>
<code></code>
<version></version>
<code_version></code_version>
<version></version>
</row>
</table>
tableD
<table>
<row>
<id></id>
<code></code>
<code_version></code_version>
<date></date>
</row>
</table>
My SQL Query:
SELECT a.[id]
,c.[code]
,d.[date]
,a.[group ]
FROM [source].[dbo].[tableA] a, [source].[dbo].[tableB] b,
[source].[dbo].[tableC] c, [source].[dbo].[tableD] d
WHERE a.[id] = b.[id]
AND a.[version] = b.[version]
AND a.[id] = c.[id]
AND a.[version] = c.[version]
AND c.[code] = d.[code]
AND c.[code_version] = d.[code_version]
GROUP BY a.[id]
,c.[code]
,d.[date]
,a.[status]
ORDER BY a.[id], c.[code]
I also tried writing a partial XQuery 1.0 query, but it is not working.
My XQuery 1.0:
xquery version "1.0";
declare namespace ms = "http://www.ms.com/extensions";
let $A := fn:doc('tableA.xml')
let $B := fn:doc('tableB.xml')
let $C := fn:doc('tableC.xml')
let $D := fn:doc('tableD.xml')
for $a in $A,
$b in $B,
$c in $C,
$d in $D
where $a/table/row/id = $b/table/row/id
and $a/table/row/version = $b/table/row/version
and $a/table/row/id = $c/table/row/id
and $a/table/row/version = $c/table/row/version
and $c/table/row/code = $d/table/row/code
and $c/table/row/code_version = $d/table/row/code_version
order by $a
return $a
This is just a partial query. It works well if I don't apply the where clause on $c and $d. I am not sure if this is the right way to perform this operation, though.

Notepad++ incrementally replace

Let's say I want to have 10 rows of data, but I want a value to increment for each row or piece of data. How do I increment that value?
For example, if I have these rows, is there a regex way of replacing the id values so that they increment?
<row id="1" />
<row id="1" />
<row id="1" />
<row id="1" />
<row id="1" />
--- Here is what I would like it to look like (if the first row's id goes up by one, that's OK):
<row id="1" />
<row id="2" />
<row id="3" />
<row id="4" />
<row id="5" />

Not sure about regex, but there is a way for you to do this in Notepad++, although it isn't very flexible.
In the example that you gave, hold Alt and select the column of numbers that you wish to change. Then go to Edit->Column Editor and select the Number to Insert radio button in the window that appears. Then specify your initial number and increment, and hit OK. It should write out the incremented numbers.
Note: this also works with the Multi-editing feature (selecting several locations while maintaining Ctrl key pressed).
This is, however, not anywhere near the flexibility that most people would find useful. Notepad++ is great, but if you want a truly powerful editor that can do things like this with ease, I'd say use Vim.

I was looking for the same feature today but couldn't do this in Notepad++. However, we have TextPad to our rescue. It worked for me.
In TextPad's replace dialog, turn on regex; then you could try replacing
<row id="1"/>
by
<row id="\i"/>
Have a look at this link for further amazing replace features of TextPad -
http://sublimetext.userecho.com/topic/106519-generate-a-sequence-of-numbers-increment-replace/

I had the same problem with more than 250 lines, and here is how I did it.
For example:
<row id="1" />
<row id="1" />
<row id="1" />
<row id="1" />
<row id="1" />
Put the cursor just after the "1", hold Alt + Shift, and press the down arrow until you reach the bottom line. You now have a column of selections. Delete the number 1, which erases it on every line simultaneously. Then go to Edit -> Column Editor, select Number to Insert, put 1 in the initial number field and 1 in the increase by field, check the leading zeros option, and click OK.
Congratulations, you did it :)

Since there are few real answers, I'll share this workaround. For really simple cases like your example, you can do it backwards.
From this
1
2
3
4
5
Replace \r\n with " />\r\n<row id=" and you'll get 90% of the way there
1" />
<row id="2" />
<row id="3" />
<row id="4" />
<row id="5
Or, in a similar fashion, you can hack the data about with Excel or another spreadsheet. Just split your original data into columns and manipulate the values as you require.
| <row id=" | 1 | " /> |
| <row id=" | 1 | " /> |
| <row id=" | 1 | " /> |
| <row id=" | 1 | " /> |
| <row id=" | 1 | " /> |
Obvious stuff but it may help someone doing the odd one-off hack job to save a few key strokes.

http://docs.notepad-plus-plus.org/index.php/Inserting_Variable_Text
Notepad++ comes equipped with an Edit -> Column Editor (Alt+C), which can work on a rectangular selection in two different ways.
It can insert some fixed text on every line including and following the current line, at the column of the insertion point (aka caret). Initially selected text is left untouched.
It can also insert a linear series of numbers in the same manner. The starting value and increment are to be provided. Left padding with zeroes is an option, and the number may be entered in base 2, 8, 10 or 16. This is how the computed values will be displayed too, with padding based on the largest.

The solutions suggested above will work only if the data is aligned.
See the solution in the link below, which uses the PythonScript plugin for Notepad++. It works great!
Stack Overflow: Find/Replace but Increment Value

You can do it in PowerShell with a regex and a foreach loop, if you store your values in the file input.txt:
$current = 1; $increment = 1
# Rewrite the id attribute on each matching line with a running counter;
# lines without an id attribute pass through unchanged.
$tmp = Get-Content input.txt | ForEach-Object {
if ($_ -match 'id="\d+"') { $_ -replace 'id="\d+"', ('id="' + $current + '"'); $current += $increment } else { $_ }
}
After that, you can store $tmp in a file using $tmp > result.txt. This doesn't need the data to be in columns.
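The same idea can be sketched in Python with `re.sub` and a replacement callback. This is a minimal sketch rather than a definitive tool: the function name `renumber_ids` and its `start` and `step` parameters are assumptions for illustration.

```python
import re


def renumber_ids(text, start=1, step=1):
    """Replace each id="<digits>" with a running counter, in document order."""
    counter = start

    def next_id(match):
        # Called once per match; hands out the next number each time.
        nonlocal counter
        replacement = f'id="{counter}"'
        counter += step
        return replacement

    return re.sub(r'id="\d+"', next_id, text)


rows = "\n".join(['<row id="1" />'] * 5)
print(renumber_ids(rows))
```

Because the callback is invoked per match rather than per line, this also does not depend on the data being aligned in columns.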

(Posting in case someone might have a use for it.)
I was looking for a solution to a problem a bit more sophisticated than the OP's: replacing EVERY occurrence of something that contains a number with the same thing containing the incremented number.
E.g. replacing something like this:
<row id="1" />
<row id="2" />
<row id="1" />
<row id="3" />
<row id="1" />
By this:
<row id="2" />
<row id="3" />
<row id="2" />
<row id="4" />
<row id="2" />
I couldn't find a solution online, so I wrote my own script in Groovy (a bit ugly, but it does the job):
/**
* <p> Finds strings that match the template and increases their number by 1.
* '_' in the template represents the number.
*
* <p> E.g. if the template equals 'Row="_"', then:
* all Row="0" will be replaced by Row="1"
* all Row="1" will be replaced by Row="2"
* etc.
* <p> Warning - your find template might not work properly if '_' is its last character.
* <p> Requirements:
* - The original text does not contain the template string
* - Only numbers in an unbroken sequence are incremented and replaced
* (i.e. if Row="4" exists in the original text but Row="3" does not, then Row="4" will NOT be
* replaced by Row="5")
*/
def replace_inc(String text, int startingIndex, String findTemplate) {
assert findTemplate.contains('_') : 'search template needs to contain "_" placeholder'
assert !(findTemplate.replaceFirst('_','').contains('_')) : 'only one "_" placeholder is allowed'
assert !text.contains('_____') : 'input text should not contain "_____" (5 underscores)'
while (true) {
findString = findTemplate.replace("_",(startingIndex).toString())
if (!text.contains(findString)) break;
replaceString = findTemplate.replace("_", "_____"+(++startingIndex).toString())
text = text.replaceAll(findString, replaceString)
}
return text.replaceAll("_____","") // remove the temporary '_____' marker
}
// input
findTemplate = 'Row="_"'
path = /C:\TEMP\working_copy.txt/
startingIndex = 0
// do stuff
f = new File(path)
outText = replace_inc(f.text,startingIndex,findTemplate)
println "Results \n: " + outText
f.withWriter { out -> out.println outText }
println "Results written to $f"
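For comparison, a single-pass variant of the same idea can be sketched in Python with a regex callback. This is not a port of the Groovy script: it increments every matching number regardless of whether the sequence is unbroken, and the function name `increment_all` and its default template are assumptions for illustration.

```python
import re


def increment_all(text, template='id="_"'):
    """Add 1 to every number that appears where '_' sits in the template."""
    # Split the template around its single '_' placeholder.
    prefix, placeholder, suffix = template.partition("_")
    if not placeholder or "_" in suffix:
        raise ValueError("template must contain exactly one '_' placeholder")
    pattern = re.escape(prefix) + r"(\d+)" + re.escape(suffix)
    # Each match is rewritten on its own, so no temporary marker is needed.
    return re.sub(
        pattern,
        lambda m: f"{prefix}{int(m.group(1)) + 1}{suffix}",
        text,
    )


sample = "\n".join(f'<row id="{n}" />' for n in (1, 2, 1, 3, 1))
print(increment_all(sample))
```

Because each match is replaced exactly once in one pass, an already incremented value is never picked up and incremented again.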

You can use this very similar script in Python. Just replace the $item_id = abc; pattern with your <row id="abc" /> (in your case).
import os
import re

def read_text_from_file(file_path):
    """
    Return the contents of a file.
    file_path: path of the file you want to read from
    """
    with open(file_path, encoding='utf8') as f:
        text = f.read()
    return text

def write_to_file(text, file_path):
    """
    Write a text to a file.
    text: the text you want to write
    file_path: path of the file you want to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

def incrementare_fisiere_html(cale_folder_html):
    """
    Iterate through a folder of HTML files and write the corresponding index into each file.
    """
    count = 0
    current_id = 1
    for f in os.listdir(cale_folder_html):
        if f.endswith('.html'):
            cale_fisier_html = os.path.join(cale_folder_html, f)
            html_text = read_text_from_file(cale_fisier_html)
            item_id_pattern = re.compile(r'\$item_id = (.*?);')
            item_id = re.findall(item_id_pattern, html_text)
            if len(item_id) > 0:
                print("{} was modified. ".format(f))
                item_id = item_id[0]
                # Note: this replaces every occurrence of the captured value in the file
                html_text = html_text.replace(item_id, str(current_id))
                current_id += 1
                count += 1
                write_to_file(html_text, cale_fisier_html)
            else:
                # print("{} has no $item_id.".format(f))
                continue
        else:
            continue
    print("Number of modified files: ", count)

if __name__ == '__main__':
    # put the path of the folder that contains the files here
    incrementare_fisiere_html('e:\\Folder1')

libxml2 XPATH - Selecting subset of data from XML

I am fairly new to XML development. I have a few questions regarding XML parsing with XPath and libxml.
I have an XML structured as :
<resultset>
<result count=1>
<row>
<name> He-Man! </name>
<home> Greyskull </home>
<row>
</result>
<result count=2>
<row>
<name> Spider-Man</name>
<home> Some downtown apartment </home>
<row>
<row>
<name> Disco-Man!</name>
<home> The 70's dance floor </home>
<row>
</result>
<resultset>
I need to pick out the names from this XML, but where the count is 2, I need the name only from the first record. I ran through a few tutorials, but I am unable to come up with an XPath query that would serve this purpose.
//name will select all name elements.
/result[@count > 1 ]/row[1]/name | /result[@count =1 ]/row/name
Is this possible with XPath? Is it better done via XPath or by walking the XML tree?
Can someone point me to some complex searches through XML documents?
Edit: The actual scenario requires selecting a subset of the XML rows, which are nested at 2 levels at times. This sounds like I need to OR ('|') many paths to select the nodes I require... I am not sure if that would be efficient as opposed to walking a tree... The above is typed to replicate the problem :)
Thanks!

Try this XPath -
/resultset/result[@count=2]/row/name
This will give a list of all nodes falling under this XPath. From this just take the first element (as you needed only the first record).

I'd probably keep my xpath simpler and just extract both cases, then loop over both node sets.
If you do need to go down the single xpath route, you should try out your xpath expressions in something that lets you enter them live, rather than having to recompile C/C++ code. You should be able to do that by loading your XML into firefox and using firebug - for example typing $x('//name') in the firebug console gives three nodes.
NOTE however that your XML is invalid... You have a bunch of "<row>"s that should be "</row>" and the same for "<resultset>" and your counts need to be
<result count="1">
i.e. with quote marks around the value.
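Once the XML is made well-formed as described, a positional predicate such as row[1] selects the first row of every result in one expression. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree`, which supports a limited XPath subset; it is not libxml2 code, and the corrected sample document is an assumption based on the fixes listed above.

```python
import xml.etree.ElementTree as ET

# The question's sample with the closing tags and attribute quotes corrected.
XML = """<resultset>
  <result count="1">
    <row><name> He-Man! </name><home> Greyskull </home></row>
  </result>
  <result count="2">
    <row><name> Spider-Man</name><home> Some downtown apartment </home></row>
    <row><name> Disco-Man!</name><home> The 70's dance floor </home></row>
  </result>
</resultset>"""

root = ET.fromstring(XML)  # root is the <resultset> element

# row[1] means "the first <row> child of each <result>", so results with one
# row and results with several rows are handled by the same path.
first_names = [n.text.strip() for n in root.findall("./result/row[1]/name")]
print(first_names)  # ['He-Man!', 'Spider-Man']
```

In full XPath 1.0, the equivalent absolute path would be /resultset/result/row[1]/name, which avoids combining two paths with '|'.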
