Replace control char in xmlelement

Replace control char in xmlelement - regex

Context
Some PL/SQL packages that generate XML are throwing this error:
ORA-31061: XDB error: special char to escaped char conversion failed.
This error happens because some of the text selected in the xmlelement contains control characters, which are not allowed.
Solution
Replace all control chars of each xmlelement with a regex:
xmlelement("foo", REGEXP_REPLACE (bar, '[[:cntrl:]]', ''))
Problem with solution
I have 8 packages of about 5k lines each, where almost every line is an xmlelement.
Other potential solution
I thought I could write a regex to replace each xmlelement's value automatically, but it fails when I have an xmlelement nested in another xmlelement with a subquery, a sub-subquery, etc.
My Question
Is there a smarter way than replacing each xmlelement's value one by one? I was asked to do this for every xmlelement in each package to prevent further bugs, but I'm sure there is a better way of doing this.
Edit
For example, you can reproduce the bug with this query :
select xmlelement("foo", unistr('\0013b')) from dual;
And I would fix it using this query :
select xmlelement("foo", regexp_replace(unistr('\0013b'), '[[:cntrl:]]', '')) from dual;
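To make the effect of the replacement concrete, here is a minimal sketch in Python rather than PL/SQL. It assumes that [[:cntrl:]] corresponds to the ASCII control range (U+0000 to U+001F, plus U+007F); the function names are hypothetical. It also shows a narrower variant that keeps tab, line feed and carriage return, which XML 1.0 allows in character data. Whether Oracle's XML functions reject exactly that narrower set is not confirmed here.

```python
import re

# Assumption: [[:cntrl:]] is approximated as U+0000-U+001F plus U+007F.
ALL_CONTROLS = re.compile(r"[\x00-\x1f\x7f]")

# Control characters outside the XML 1.0 Char production:
# everything below U+0020 except tab (\x09), LF (\x0a) and CR (\x0d).
XML_FORBIDDEN_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_all_controls(text: str) -> str:
    """Remove every ASCII control character, like the regex in the question."""
    return ALL_CONTROLS.sub("", text)


def strip_xml_forbidden(text: str) -> str:
    """Remove only the control characters that XML 1.0 does not permit."""
    return XML_FORBIDDEN_CONTROLS.sub("", text)


if __name__ == "__main__":
    sample = "a\tb\n\x13c"  # contains a tab, a line feed and U+0013
    print(repr(strip_all_controls(sample)))
    print(repr(strip_xml_forbidden(sample)))
```

The first function also deletes tabs and line breaks, which may matter if the element values contain multi-line text.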

I don't think this is exactly what you want, but it is possible to generate xml for your query without error using dbms_xmlgen. Here is an example:
declare
xml_output CLOB;
my_context dbms_xmlgen.ctxHandle;
begin
my_context := dbms_xmlgen.newcontext('select unistr (''\0013b'') from dual');
xml_output := dbms_xmlgen.getxml(my_context);
dbms_xmlgen.closecontext(my_context);
dbms_output.put_line(xml_output);
end;

Related

replace expression format xx-xx-xxxx_12345678

IDENTIFIER
31-03-2022_13636075
01-04-2022_13650262
04-04-2022_13663174
05-04-2022_13672025
20220099001
11614491_R
10781198
00000000000
11283627_P
11614491_R
-1
How can I remove only the "XX-XX-XXXX_" part from certain values of a column in SSIS, WITHOUT affecting values that don't have this format? For example, "21-05-2022_12345678" should become "12345678", but the other values should be left unchanged. These are just examples from the many rows in this column, so I want only the values that have this format to be affected.
SELECT REVERSE(substring(REVERSE('09-03-2022_13481330'),0,CHARINDEX('_',REVERSE('09-03-2022_13481330'),0)))
result
13481330
But this also affects other values. Also, this is in SSMS, not SSIS, because I am not sure how to translate this expression into SSIS code.
Update: the corrected code in SSIS is as follows:
(FINDSTRING(IDENTIFIER,"__-__-____[_]",1) == 1) ? SUBSTRING(IDENTIFIER,12,LEN(IDENTIFIER) - 11) : IDENTIFIER

Do you have access to the SQL source? If so, you can do this in SQL by using LIKE with a match pattern built from the single-character wildcard _ (the [_] in the pattern matches a literal underscore). See the example below:
DECLARE @Value VARCHAR(50) = '09-03-2022_13481330'
SELECT CASE WHEN @Value LIKE '__-__-____[_]%' THEN
SUBSTRING(@Value,12,LEN(@Value)-11) ELSE @Value END
Please see the Microsoft documentation on LIKE and single-character wildcards.
If you don't have access to the source SQL, it gets a bit trickier: you might need to use a regex in a Script Task, or there may be an expression you can apply.
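If the regex route is taken, the matching logic could look roughly like the sketch below. It is written in Python purely to illustrate the pattern, not as SSIS Script Task code, and the function name is hypothetical. Note that it is stricter than the LIKE pattern above: it requires digits, whereas _ in LIKE matches any single character.

```python
import re

# Two digits, dash, two digits, dash, four digits, underscore, anchored at the start.
DATE_PREFIX = re.compile(r"^\d{2}-\d{2}-\d{4}_")


def strip_date_prefix(identifier: str) -> str:
    """Drop a leading 'dd-mm-yyyy_' prefix; return other values unchanged."""
    return DATE_PREFIX.sub("", identifier, count=1)


if __name__ == "__main__":
    for value in ["31-03-2022_13636075", "20220099001", "11614491_R", "-1"]:
        print(value, "->", strip_date_prefix(value))
```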

What is wrong with my Power BI query (using a parameter)?

I'm brand new to using PBI, but as far as I can tell, I should be able to substitute a parameter into a Direct Query in place of a hard-coded value, i.e.
let
Source = Sql.Database("NAMEOFDB", "CMUtility", [Query="sp_get_residentsinfo "& home_name]),.....
instead of
let
Source = Sql.Database("NAMEOFDB", "CMUtility", [Query="sp_get_residentsinfo 'NAME OF HOME'"]),...
However, the parameter-included version just says
DataSource.Error: Microsoft SQL: Incorrect syntax near 'House'.
Details:
DataSourceKind=SQL
DataSourcePath=NAMEOFDB;CMUtility
Message=Incorrect syntax near 'House'.
Number=102
Class=15
"House" is the last word of the value currently assigned to the home_name parameter. What have I done wrong?
PS - I have surmised that I shouldn't need the extra & at the end of the parameter, as I'm not adding anything else to the query, but even with both &s it still doesn't work.

Your parameter is of type text. In SQL, text literals must be quoted, i.e. sp_get_residentsinfo 'NAME OF HOME', but the statement you built is sp_get_residentsinfo NAME OF HOME.
You should use Text.Replace to escape single quotes in the parameter's value (by doubling them) and append a quote before and after it.
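The two steps (double any embedded single quote, then wrap the value in quotes) can be sketched as follows. This is Python used only to show the resulting statement text; in Power Query the same steps would be expressed with Text.Replace and the & operator. The function name and the sample home name are made up for illustration.

```python
def build_query(home_name: str) -> str:
    """Build the statement text with home_name as a quoted SQL text literal."""
    # Double embedded single quotes so they stay inside the literal.
    escaped = home_name.replace("'", "''")
    # Wrap the escaped value in single quotes.
    return "sp_get_residentsinfo '" + escaped + "'"


if __name__ == "__main__":
    print(build_query("NAME OF HOME"))
    print(build_query("St Mary's House"))  # hypothetical value containing a quote
```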

display the content of a file split by a delimiter character

I am trying to display the content of a file, split by a delimiter character.
More exactly, starting from this topic, I am trying to display the result as:
bbb
aaa
qqq
ccc
but with the data taken from a file.
So far, I have tried:
DECLARE
l_bfile bfile;
BEGIN
l_bfile := bfilename(my_dir, my_file);
dbms_lob.fileopen(l_bfile);
FOR i IN
(SELECT TRIM(regexp_substr(TO_CHAR(l_bfile),'[^;]+',1,level) ) AS q
FROM dual
CONNECT BY regexp_substr(TO_CHAR(l_bfile),'[^;]+',1,level) IS NOT NULL
ORDER BY level
)
LOOP
dbms_output.put_line(i.q);
END LOOP;
EXCEPTION
WHEN No_Data_Found THEN
NULL;
END;
As a result, I got
PL/SQL: ORA-00932: inconsistent datatypes: expected NUMBER got FILE
Can anyone give me a hint, please?

I have to write this as a new answer since it is too big for a comment to @SmartDumb:
Be advised that a regex of the form '[^;]+' (commonly used for parsing delimited lists) fails when the list contains NULL elements. Please see this post for more information: https://stackoverflow.com/a/31464699/2543416
Instead, please use this form of the call to regexp_substr (note that I removed the second element from the example list):
SELECT TRIM(regexp_substr('bbb;;qqq;ccc','(.*?)(;|$)',1,level, null, 1) ) AS q
FROM dual
CONNECT BY regexp_substr('bbb;;qqq;ccc','(.*?)(;|$)',1,level) IS NOT NULL
ORDER BY level
This may or may not matter in this example; it depends on whether the position of each element in the string matters to you, or whether you need to preserve the NULL. For example, if you need to know that the second element is NULL, then this form will work.
P.S. Do a search for external tables and see if that is a solution you could use. That would let you query a file as if it were a table.
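The difference between the two patterns can be seen outside the database as well. The sketch below uses Python's re module on the same sample string, which is an approximation: Oracle's regex engine and its treatment of empty strings as NULL differ in detail. The function names are hypothetical, and the loop stops at the first zero-width match to mimic the IS NOT NULL condition in the CONNECT BY clause.

```python
import re


def split_skipping_empty(text: str) -> list:
    """'[^;]+' only matches non-empty runs, so empty elements vanish."""
    return re.findall(r"[^;]+", text)


def split_keeping_empty(text: str) -> list:
    """'(.*?)(;|$)' matches each element up to its delimiter, empty or not."""
    elements = []
    for match in re.finditer(r"(.*?)(;|$)", text):
        if match.group(0) == "":
            # Zero-width match at the end of the input: nothing left to read.
            break
        elements.append(match.group(1))
    return elements


if __name__ == "__main__":
    sample = "bbb;;qqq;ccc"
    print(split_skipping_empty(sample))  # 'qqq' ends up in second position
    print(split_keeping_empty(sample))   # the empty second element is kept
```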

You could possibly try this if your file contains a single line (hence the question about file structure):
DECLARE
utlFileHandle UTL_FILE.FILE_TYPE;
vLine varchar2(100);
BEGIN
-- Open the file for reading and fetch its first (and only) line
utlFileHandle := UTL_FILE.FOPEN(my_dir, my_file, 'r');
utl_file.get_line(utlFileHandle, vLine);
-- Split the line on ';' and print one element per row
FOR i IN
(SELECT TRIM(regexp_substr(vLine,'[^;]+',1,level) ) AS q
FROM dual
CONNECT BY regexp_substr(vLine,'[^;]+',1,level) IS NOT NULL
ORDER BY level
)
LOOP
dbms_output.put_line(i.q);
END LOOP;
utl_file.fclose(utlFileHandle);
EXCEPTION
WHEN No_Data_Found THEN
-- Raised by get_line on an empty file: just close the handle
utl_file.fclose(utlFileHandle);
null;
END;

cts:value-match on xs:dateTime() type in Marklogic

I have a variable $yearMonth := "2015-02"
I have to search for this date in an element Date of type xs:dateTime.
I want to use a regular expression to find all documents having a date matching "2015-02-??".
I have a path range index enabled on ModifiedInfo/Date.
I am using the following code but getting an invalid cast error:
let $result := cts:value-match(cts:path-reference("ModifiedInfo/Date"), xs:dateTime("2015-02-??T??:??:??.????"))
I have also tried the following code and get the same error:
let $result := cts:value-match(cts:path-reference("ModifiedInfo/Date"), xs:dateTime(xs:date("2015-02-??"),xs:time("??:??:??.????")))
Kindly help :)

It seems you are trying to run a wildcard search on a path range index of type xs:dateTime.
MarkLogic does not currently support this. There are several ways to handle the scenario:
1. Create a field index.
2. Change it to a string index, which supports wildcard search.
3. Use this workaround with your existing index:
for $x in cts:values(cts:path-reference("ModifiedInfo/Date"))
return if(starts-with(xs:string($x), '2015-02')) then $x else ()
This query fetches the values from the lexicon, and you can then filter for the dates you want.

You can solve this by combining a couple of cts:element-range-query constructors inside an and-query (replace "country" with the name of your own indexed element):
let $target := "2015-02"
let $low := xs:date($target || "-01")
let $high := $low + xs:yearMonthDuration("P1M")
return
cts:search(
fn:doc(),
cts:and-query((
cts:element-range-query("country", ">=", $low),
cts:element-range-query("country", "<", $high)
))
)
From the cts:element-range-query documentation:
If you want to constrain on a range of values, you can combine multiple cts:element-range-query constructors together with cts:and-query or any of the other composable cts:query constructors, as in the last part of the example below.
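The bounds used in the query above form a half-open interval: the first day of the target month (inclusive) up to the first day of the following month (exclusive). As a rough illustration of that arithmetic, here is a sketch in Python rather than XQuery; the function name is hypothetical.

```python
from datetime import datetime


def month_bounds(year_month: str):
    """Return (low, high) so that low <= t < high covers the whole month."""
    low = datetime.strptime(year_month + "-01", "%Y-%m-%d")
    if low.month == 12:
        # December rolls over into January of the next year.
        high = low.replace(year=low.year + 1, month=1)
    else:
        high = low.replace(month=low.month + 1)
    return low, high


if __name__ == "__main__":
    low, high = month_bounds("2015-02")
    stamp = datetime(2015, 2, 28, 23, 59, 59)
    print(low, high, low <= stamp < high)
```

Using an exclusive upper bound avoids having to know the last day of the month or the precision of the time component.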

You could also consider calling cts:values with a cts:query parameter that searches for values between, for instance, 2015-02-01 and 2015-03-01. Mind, though, that if multiple dates occur within one document, you will still need to post-filter manually (as in option 3 of Navin's answer), but it could potentially speed up the post-filtering a lot.
HTH!

How to find all the source lines containing desired table names from user_source by using 'regexp'

For example, we have a large database containing many Oracle packages, and we want to see where a specific table is referenced in the source code. The source code is stored in the user_source table, and the table we are looking for is called 'company'.
Normally, I would use:
select * from user_source
where upper(text) like '%COMPANY%'
This returns every line containing 'company', such as:
121 company cmy
14 company_id, idx_name %% end of coding
453 ;companyname
1253 from db.company.company_id where
989 using company, idx, db_name,
So how can I make this result smarter, using a regular expression to return only the source lines where 'company' appears as a real table name (that is, a table as far as the compiler is concerned)?
The matched word may be adjacent to characters such as . ; , '' "" but not _
Can anyone make this work?

To find company as a "whole word" with a regular expression:
SELECT * FROM user_source
WHERE REGEXP_LIKE(text, '(^|\s)company(\s|$)', 'i');
The third argument, 'i', makes the REGEXP_LIKE search case-insensitive.
As for ignoring the characters . ; , '' "", you can use REGEXP_REPLACE to strip them out of the string before doing the comparison:
SELECT * FROM user_source
WHERE REGEXP_LIKE(REGEXP_REPLACE(text, '[.;,''"]'), '(^|\s)company(\s|$)', 'i');
Addendum: The following query will also help locate table references. It won't give the source line, but it's a start:
SELECT *
FROM user_dependencies
WHERE referenced_name = 'COMPANY'
AND referenced_type = 'TABLE';

If you want to identify the objects that refer to your table, you can get that information from the data dictionary:
select *
from all_dependencies
where referenced_owner = 'DB'
and referenced_name = 'COMPANY'
and referenced_type = 'TABLE';
You can't get the individual line numbers from that, but you can then either look at user_source or use a regexp on the specific source code, which would at least reduce false positives.

SELECT * FROM user_source
WHERE REGEXP_LIKE(text,'([^_a-z0-9])company([^_a-z0-9])','i')
Thanks @Ed Gibbs; with a small tweak, this modified answer gives smarter results.
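One thing to watch with the pattern above is that it requires a character on both sides of the name, so a line that starts or ends with company is not matched. The sketch below checks a variant with start and end anchors against the sample lines from the question. It uses Python's re module as a stand-in for REGEXP_LIKE, which is an assumption about equivalent behaviour for this simple pattern, and the names are hypothetical.

```python
import re

# 'company' not preceded or followed by an identifier character,
# also allowing it at the very start or end of the line.
TABLE_REF = re.compile(r"(^|[^_a-z0-9])company($|[^_a-z0-9])", re.IGNORECASE)


def references_table(line: str) -> bool:
    """True when the line mentions company as a standalone identifier."""
    return TABLE_REF.search(line) is not None


if __name__ == "__main__":
    samples = [
        "company cmy",
        "company_id, idx_name %% end of coding",
        ";companyname",
        "from db.company.company_id where",
        "using company, idx, db_name,",
    ]
    for line in samples:
        print(references_table(line), line)
```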
