Handle special characters during textfile import into OracleDB via Apex - regex

I'm working on a tool that imports textfiles into a BLOB column (OracleDB). This is handled via an Apex page with a File Browse button and connected import procedure.
For more details about the import to BLOB procedure: http://ittichaicham.com/2011/03/file-browser-in-apex-4-with-blob-column-specified-in-item-source-attribute/
The textfiles that I'm using contain special characters, null values, decimal separators etc. For example:
(...) 111888|Overflakkée, Blabla|streetname with Rhône||12-13|UXC
Placename (...)
Since it's all character data, I'm converting the BLOB to CLOB with this procedure:
FUNCTION blob_to_clob (blob_in IN BLOB)
RETURN CLOB
AS
    v_clob    CLOB;
    v_varchar VARCHAR2(32767);
    v_start   PLS_INTEGER := 1;
    v_buffer  PLS_INTEGER := 32767;
BEGIN
    DBMS_LOB.CREATETEMPORARY(v_clob, TRUE);
    FOR i IN 1..CEIL(DBMS_LOB.GETLENGTH(blob_in) / v_buffer)
    LOOP
        v_varchar := UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(blob_in, v_buffer, v_start));
        DBMS_LOB.WRITEAPPEND(v_clob, LENGTH(v_varchar), v_varchar);
        v_start := v_start + v_buffer;
    END LOOP;
    RETURN v_clob;
END blob_to_clob;
For more info, see: http://www.dba-oracle.com/t_convert_blob_to_clob_script.htm
The problem:
While converting the blob to clob, some of the special characters are lost/altered.
For example, this row:
(...) 111888|Overflakkée, Blabla|streetname with Rhône||12-13|UXC
Placename (...)
will become this row:
(...) 111888|Overflakk� Blabla|streetname with Rh�|12-13|UXC
Placename (...)
Row length, characters and even separators (in this case a '|') are altered or missing.
Is there a way to recover the lost characters and keep separators/null values in place? (If it's necessary to change 'é' to 'e', that's fine.)
Is there a more efficient way to import textfiles into a BLOB/CLOB column?
Regards

You need to convert from the source character set to the character set of the database.
Here is an example I made (mainly so that big JSON objects, which JavaScript sends as UTF-8, can be used in an 8859P1 database). It's pretty simple, so I won't explain it too much.
Example usage with conversion:
l_clob := blob_to_clob (l_blob, '1');
Function:
function blob_to_clob (blob_in in blob, p_convertutf8 in char default '0')
  return clob as
  /* Ólafur Tryggvason */
  l_clob         clob;
  l_varchar      varchar2 (32767);
  l_start        pls_integer := 1;
  l_buffer       pls_integer := 32767;
  l_characterset nls_database_parameters.value%type;
begin
  -- look up the database character set (the conversion target)
  select value
    into l_characterset
    from nls_database_parameters
   where parameter = 'NLS_CHARACTERSET';
  dbms_lob.createtemporary (l_clob, true);
  -- copy the blob in chunks of l_buffer bytes
  -- caveat: a multi-byte UTF-8 character that straddles a chunk boundary is split before conversion
  for i in 1 .. ceil (dbms_lob.getlength (blob_in) / l_buffer) loop
    l_varchar := utl_raw.cast_to_varchar2 (dbms_lob.substr (blob_in, l_buffer, l_start));
    if p_convertutf8 = '1' then
      l_varchar := convert (l_varchar, l_characterset, 'UTF8'); -- e.g. to WE8ISO8859P1
    end if;
    dbms_lob.writeappend (l_clob, length (l_varchar), l_varchar);
    l_start := l_start + l_buffer;
  end loop;
  return l_clob;
end blob_to_clob;
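To illustrate the idea outside the database, here is a minimal Python sketch (not part of the original answer; `decode_in_chunks` and its parameters are hypothetical names). It decodes the file bytes with the character set they were actually written in, and it uses an incremental decoder so that a multi-byte character split across two chunks is still decoded correctly.

```python
import codecs


def decode_in_chunks(data: bytes, source_charset: str, chunk_size: int = 32767) -> str:
    """Decode bytes chunk by chunk without breaking multi-byte characters."""
    # The incremental decoder keeps an incomplete trailing byte sequence
    # in its internal state until the next chunk arrives.
    decoder = codecs.getincrementaldecoder(source_charset)(errors="strict")
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[start:start + chunk_size]))
    # final=True raises an error if the data ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


line = "111888|Overflakkée, Blabla|streetname with Rhône||12-13|UXC"
utf8_bytes = line.encode("utf-8")
latin1_bytes = line.encode("iso-8859-1")

# A tiny chunk size forces the two-byte characters onto chunk boundaries.
print(decode_in_chunks(utf8_bytes, "utf-8", chunk_size=3))
print(decode_in_chunks(latin1_bytes, "iso-8859-1"))
```

Decoding `latin1_bytes` as UTF-8 instead would fail or produce replacement characters, which is the kind of damage shown in the question.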

Related

Error in Process code: ORA-06502: PL/SQL: numeric or value error: character to number conversion error

Here is the PL/SQL code I use inside the process. I am getting a character to number conversion error, and it is caused by the l_receipt_date assignment.
I need to send the date in 'YYYY-MM-DD' format, so I used this line inside the process: l_receipt_date:=to_char(:P4_RECEIPT_DATE,'YYYY-MM-DD'); and that line raises the error.
DECLARE
    l_clob            CLOB;
    l_emp_no          NUMBER;
    l_status          VARCHAR2(100);
    l_employee_name   VARCHAR2(100);
    l_id              NUMBER;
    l_employee_salary NUMBER;
    l_employee_age    NUMBER;
    l_request_url     VARCHAR2(200);
    l_body_clob       clob;
    x_err             varchar2(2000);
    l_receipt_date    varchar2(1000);
begin
    l_request_url := 'https://fa-eoxd-test-saasfaprod1.fa.ocs.oraclecloud.com/fscmRestApi/resources/latest/standardReceipts/';
    apex_web_service.g_request_headers(1).name := 'Content-Type';
    apex_web_service.g_request_headers(1).value := 'application/json';
    l_receipt_date:=to_char(:P4_RECEIPT_DATE,'YYYY-MM-DD');
    l_body_clob:='{
"ReceiptNumber":"'||:P4_RECEIPT_NUMBER||'"
,"BusinessUnit":"'||:P4_OPERATING_UNIT_NAME||'"
,"ReceiptMethod":"'||:P4_RECEIPT_METHOD||'"
,"ReceiptDate":"'||l_receipt_date||'"
}';
    l_clob :=
        APEX_WEB_SERVICE.MAKE_REST_REQUEST(
            p_url => l_request_url,
            p_http_method => 'POST',
            p_username => 'fin.user',
            p_password => 'Fusion#123',
            p_body => l_body_clob) ;
    htp.p(l_clob);
exception when others then
    x_err:=sqlerrm;
    htp.p(x_err);
END;
P4_RECEIPT_DATE is a page item, and all page items are character strings. So before you can TO_CHAR it to a different formatted string, you first need to TO_DATE it to a date. Suppose it is currently in the format DD-MON-YYYY. Then you will need to do this:
l_receipt_date:=to_char(to_date(:P4_RECEIPT_DATE, 'DD-MON-YYYY'),'YYYY-MM-DD');
i.e.
Take the string in P4_RECEIPT_DATE and convert it to a DATE using format DD-MON-YYYY.
Take that date and convert it to a string again in format YYYY-MM-DD.
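The same parse-then-format pattern can be sketched in Python (an illustration only; `reformat_date` is a hypothetical helper, and `%d-%b-%Y` is assumed to correspond to DD-MON-YYYY with English month abbreviations):

```python
from datetime import datetime


def reformat_date(text: str, source_format: str = "%d-%b-%Y") -> str:
    """Parse a date string in source_format and return it as YYYY-MM-DD."""
    # Step 1: string -> date value (the TO_DATE step).
    parsed = datetime.strptime(text, source_format)
    # Step 2: date value -> string in the target format (the TO_CHAR step).
    return parsed.strftime("%Y-%m-%d")


print(reformat_date("08-Mar-2018"))
```

Passing a string that is not in the source format raises an error at the first step, which is why the format of the page item has to be known.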

Calculation in one field

I'm new to SQL and PL/SQL. To practice, I was given an assignment to make a calculator. That part works. But they also want it to be possible to type the calculation into the text field and have it evaluated, for example 4+4 (then the = button or Enter on the keyboard) or 4+6-3=.
My calculator with buttons works, but not if I type a calculation in the text field. Can anyone help me with this?
This is the code I have in my total:
declare
    l_operator varchar2(1) := :P3_OPERATOR;
    l_value1   number := :P3_VALUE1;
    l_value2   number := :P3_VALUE2;
    l_result   number := nvl(:P3_VALUE1,0);
begin
    case l_operator
        when '+' then
            l_result := l_value1 + l_value2;
        when '-' then
            l_result := l_value1 - l_value2;
        when '*' then
            l_result := l_value1 * l_value2;
        when '/' then
            l_result := l_value1 / l_value2;
        else
            null;
    end case;
    :P3_OPERATOR := null;
    :P3_VALUE2 := null;
    :P3_VALUE1 := l_result;
    :P3_NUMBERFIELD := l_result;
end;
with this extra code for each of the +, -, * and / buttons:
:P12_OPERATOR := '*';
:P12_NUMBERFIELD := :P12_OPERATOR;
and this is the code for all my number buttons:
begin
    if :P12_OPERATOR is null then
        :P12_VALUE1 := :P12_VALUE1 || 4;
        :P12_NUMBERFIELD := :P12_VALUE1;
    elsif :P12_OPERATOR is not null then
        :P12_VALUE2 := :P12_VALUE2 || 4;
        :P12_NUMBERFIELD := :P12_VALUE2;
    end if;
end;
This is not a typical way to use SQL or PL/SQL (or APEX which it looks like you are also using)!
You could evaluate any expression typed in with code like this:
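declare
    l_result varchar2(4000); -- text, so it can hold either the result or the error message
begin
    -- caution: this executes whatever the user typed as part of a SQL statement
    execute immediate 'select ' || :P3_NUMBERFIELD || ' from dual' into l_result;
exception
    when others then
        l_result := 'Invalid input';
end;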
The exception part is to stop the calculator going wrong if the user types in nonsense like "hello world" instead of an arithmetic expression. The user would need to type in an expression like 4+4 without typing the equals sign, and then press a button to invoke the process to calculate the result.
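Because the dynamic statement runs whatever text was typed, a more restrictive alternative is to parse the expression and allow only numbers and the four operators. The following Python sketch shows that idea (it is an illustration, not APEX code; `evaluate` is a hypothetical name, and a trailing = is simply dropped):

```python
import ast
import operator

# Only these binary operators are accepted.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str):
    """Evaluate +, -, *, / arithmetic; raise ValueError for anything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("Invalid input")

    try:
        # Drop surrounding whitespace and a trailing "=" such as in "4+6-3=".
        tree = ast.parse(expression.strip().rstrip("="), mode="eval")
    except SyntaxError:
        raise ValueError("Invalid input") from None
    return walk(tree)


print(evaluate("4+4"))
print(evaluate("4+6-3="))
```

Input such as "hello world" ends in a `ValueError` instead of being executed, which plays the role of the exception handler above.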

PostgreSQL return an Array or Record as a Row

I'm trying to return a variable with a PostgreSQL function that returns row/rows so I can use libpqxx on the client side to iterate over it for example using:
for (pqxx::result::const_iterator row = result.begin(); row != result.end(); row++)
{
    for (pqxx::const_row_iterator field = row.begin(); field != row.end(); field++)
    {
        cout << field << '\n';
    }
}
This is my PostgreSQL function:
CREATE OR REPLACE FUNCTION seal_diff_benchmark_pgsql(sealparams CHARACTER VARYING) RETURNS RECORD AS $outputVar$
DECLARE
    tempVar1 CHARACTER VARYING;
    tempVar2 CHARACTER VARYING;
    outputVar1 TEXT[];
    outputVar record;
    sealArray TEXT[];
    execTime NUMERIC[];
BEGIN
    FOR i IN 1..2 LOOP
        SELECT "Pickup_longitude", "Dropoff_longitude" INTO tempVar1, tempVar2 FROM public.nyc2015_09_enc WHERE id=i;
        sealArray := (SELECT public.seal_diff_benchmark(tempVar1, tempVar2, sealparams));
        outputVar1[i] := sealArray[1];
        execTime[i] := sealArray[2];
    END LOOP;
    SELECT UNNEST(outputVar1) INTO outputVAR;
    RETURN outputVar;
END;
$outputVar$ LANGUAGE plpgsql;
I also tried returning outputVar1 as TEXT[]. My field variable on the client side holds {foo, bar} if I use returns TEXT[] or (foo) if I use returns RECORD. But this is not what I need, which is a row like return from a TEXT[] array or a RECORD variable without any (), [], {} chars at the beginning and at the end of the output.
How can I change my PostgreSQL function to make it work? I think I'm missing something but I can't see what.
There are many approaches to do what you want.
If it really is just one column that you want, then you can simply do:
CREATE OR REPLACE FUNCTION seal_diff_benchmark_pgsql(sealparams CHARACTER VARYING)
RETURNS SETOF TEXT AS $outputVar$
DECLARE
    tempVar1 CHARACTER VARYING;
    tempVar2 CHARACTER VARYING;
    sealArray TEXT[];
    execTime NUMERIC[];
BEGIN
    FOR i IN 1..2 LOOP
        SELECT "Pickup_longitude", "Dropoff_longitude" INTO tempVar1, tempVar2
        FROM public.nyc2015_09_enc WHERE id=i;
        sealArray := (SELECT public.seal_diff_benchmark(tempVar1, tempVar2, sealparams));
        execTime[i] := sealArray[2];
        -- sealArray[1] is a single text value (FOREACH ... IN ARRAY needs an array), so emit it as one row
        RETURN NEXT sealArray[1];
    END LOOP;
END;
$outputVar$ LANGUAGE plpgsql;
The returned column will be named just like the function.
SELECT seal_diff_benchmark_pgsql FROM seal_diff_benchmark_pgsql('stuff');
-- alternative
SELECT seal_diff_benchmark_pgsql('stuff');
You can also specify columns in function parameters:
CREATE OR REPLACE FUNCTION seal_diff_benchmark_pgsql(sealparams CHARACTER VARYING, OUT outputVar text)
Then the returned column will be named outputVar. When returning just one column, Postgres requires RETURNS to match that column's type, so in this case SETOF TEXT, or just TEXT if one row is expected. If you return more than one column, you need to use RETURNS SETOF RECORD.
When you use named columns in function parameters, then you need to assign values to them just like you would to variables from DECLARE section:
LOOP
outputVar := 'some value';
outputVar2 := 'some value';
outputVar3 := 'some value';
RETURN NEXT;
END LOOP;
There are a few other examples on how to return sets from functions in my old answer here: How to return rows of query result in PostgreSQL's function?

PL/SQL. Parse clob UTF8 chars with regexp_like regular expressions

I want to check whether any line of my CLOB has strange characters like (ñ§). These characters come from a CSV file that was read with an unexpected encoding (UTF-8), which converts some of them.
I tried to filter each line using a regular expression but it's not working as intended. Is there a way to know the encoding of a csv-file when read?
How could I fix the regular expression to allow lines with only these characters? a-zA-Z 0-9 .,;:"'()-_& space tab.
Clob example read from the csv:
l_clob clob :='
"exp","objc","objc","OBR","031110-5","S","EXAMPLE","NAME","08/03/2018",,"122","3","12,45"
"xp","objc","obj","OBR","031300-5","S","EXAMPLE","NAME","08/03/2018",,"0","0","0"
';
Another clob:
DECLARE
    l_clob CLOB
        := '"exp","objc","objc","OBR","031110-5","S","EXAMPLE","NAME","08/03/2018",,"122","3","12,45"
"xp","objc","obj","OBR","031300-5","S","EXAMPLE","NAME","08/03/2018",,"0","0","0"';
    l_offset PLS_INTEGER := 1;
    l_line VARCHAR2 (32767);
    csvregexp CONSTANT VARCHAR2 (1000)
        := '^([''"]+[-&\s(a-z0-9)]*[''"]+[,:;\t\s]?)?[''"]+[-&\s(a-z0-9)]*[''"]+' ;
    l_total_length PLS_INTEGER := LENGTH (l_clob);
    l_line_length PLS_INTEGER;
BEGIN
    WHILE l_offset <= l_total_length
    LOOP
        l_line_length := INSTR (l_clob, CHR (10), l_offset) - l_offset;
        IF l_line_length < 0
        THEN
            l_line_length := l_total_length + 1 - l_offset;
        END IF;
        l_line := SUBSTR (l_clob, l_offset, l_line_length);
        IF REGEXP_LIKE (l_line, csvregexp, 'i')
        THEN -- i (case insensitive matches)
            DBMS_OUTPUT.put_line ('Ok');
            DBMS_OUTPUT.put_line (l_line);
        ELSE
            DBMS_OUTPUT.put_line ('Error');
            DBMS_OUTPUT.put_line (l_line);
        END IF;
        l_offset := l_offset + l_line_length + 1;
    END LOOP;
END;
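If you only want to allow those specific characters, you can use this regex:
Your Regex
csvregexp CONSTANT VARCHAR2 (1000) := '^[a-zA-Z0-9 .,;:"''()_&' || CHR(9) || '-]+$' ;
The hyphen is placed last in the set so that it is taken literally (written as )-_ it would define a range from ) to _), and CHR(9) adds the tab. Note that the dates in your sample rows contain /, which is not in your list; add it to the set if such rows should pass.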
Regex-Details
^ Start of your string - no chars before this - prevents partial match
[] a set of allowed chars
[]+ a set of allowed chars. Has to be one char minimum up to inf. (* instead of + would mean 0-inf.)
[a-zA-Z]+ 1 to inf. letters
[a-zA-Z0-9]+ 1 to inf. letters and numbers
$ end of your string - no chars behind this - prevents partial match
I think you can work it out with this ;-)
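One detail worth checking in any such character class is where the hyphen sits: between two characters it defines a range. A small Python sketch (Python's `re` module treats this the same way; the pattern and function names are made up for illustration) shows the difference:

```python
import re

# Hyphen between ')' and '_' forms a range covering ASCII codes 41..95,
# which lets through characters such as * / < > that were never listed.
RANGE_BY_ACCIDENT = re.compile(r"""^[a-zA-Z 0-9 .,;:"'()-_&]+$""")

# Hyphen placed last is a literal hyphen; \t adds the tab character.
STRICT = re.compile(r"""^[a-zA-Z0-9 .,;:"'()_&\t-]+$""")


def is_clean(line: str, pattern: re.Pattern = STRICT) -> bool:
    """Return True if the whole line consists of allowed characters only."""
    return pattern.match(line) is not None


for sample in ('"exp","objc","031110-5","12,45"', '"ñ§"', 'a<b>*c'):
    print(sample, is_clean(sample), is_clean(sample, RANGE_BY_ACCIDENT))
```

With the strict pattern only the first sample passes; the accidental range also accepts the third one.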
If you know the input could be in another encoding, you could try to convert it and check it against the regex again.
Example-convert
select convert('täst','us7ascii', 'utf8') from dual;

PL/SQL key-value String using Regex

I have a String stored in a table in the following key-value format: "Key1☺Value1☺Key2☺Value2☺KeyN☺ValueN☺".
Given a Key how can I extract the Value? Is regex the easiest way to handle this? I am new to PL/SQL as well as Regex.
In this case, I would use just a regular split and iterate through the resulting array.
public string GetValue(string keyValuePairedInput, string key, char separator)
{
    var split = keyValuePairedInput.Split(separator);
    // an odd number of elements means some key has no value
    if(split.Length % 2 == 1)
        throw new KeyWithoutValueException();
    // keys sit at even indexes, their values right after them
    for(int i = 0; i < split.Length; i += 2)
    {
        if(split[i] == key)
            return split[i + 1];
    }
    throw new KeyNotFoundException();
}
(this was not compiled and is not pl/sql anyway, treat it as pseudocode ☺)
OK I hear your comment...
Making use of pl/sql functions, you might be able to use something like this:
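select keyValue,
       substr(keyValueStringField, valueIndex,
              instr(keyValueStringField, '\1', valueIndex) - valueIndex) as value
from (select t.keyValueStringField,
             'key' as keyValue,
             -- position just past the key and the separator after it; '\1' stands for the separator
             instr(t.keyValueStringField, 'key') + length('key') + 1 as valueIndex
      from Table t)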
For this kind of string slicing and dicing in PL/SQL you will probably have to use regular expressions. Oracle has a number of regular expression functions you can use. The most commonly used one is REGEXP_LIKE which is very similar to the LIKE operator but does RegEx matching.
However you probably need to use REGEXP_INSTR to find the positions where the separators are then use the SUBSTR function to slice up the string at the matched positions. You could also consider using REGEXP_SUBSTR which does the RegEx matching and slicing in one step.
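As a rough illustration of the "match and slice in one step" idea, here is a Python sketch using the `re` module (not PL/SQL; `get_value` is a hypothetical name, and the separator is assumed to be a single character). The pattern skips whole key/value pairs from the start of the string, so a value that happens to equal the key is never mistaken for it.

```python
import re
from typing import Optional


def get_value(pairs: str, key: str, sep: str = "\u263a") -> Optional[str]:
    """Return the value stored for key in 'K1<sep>V1<sep>K2<sep>V2<sep>...', or None."""
    s = re.escape(sep)
    # ^(?:key sep value sep)*?  skips complete pairs lazily,
    # then the wanted key, a separator, and the captured value.
    pattern = rf"^(?:[^{s}]*{s}[^{s}]*{s})*?{re.escape(key)}{s}([^{s}]*){s}"
    match = re.match(pattern, pairs)
    return match.group(1) if match else None


data = "Key1☺Value1☺Key2☺Value2☺KeyN☺ValueN☺"
print(get_value(data, "Key2"))
print(get_value(data, "Value1"))
```

The second call returns `None` because Value1 only ever appears in a value position.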
As an alternative to regular expressions...
Assuming you have an input such as this:
Key1,Value1|Key2,Value2|Key3,Value3
You could use some PL/SQL as shown below:
FUNCTION get_value_by_key
(
    p_str VARCHAR2
  , p_key VARCHAR2
  , p_kvp_separator VARCHAR2
  , p_kv_separator VARCHAR2
) RETURN VARCHAR2
AS
    v_key VARCHAR2(32767);
    v_value VARCHAR2(32767);
    v_which NUMBER;            -- 0 = reading a key, 1 = reading a value
    v_cur VARCHAR2(1 CHAR);    -- one character, even if it is multi-byte
BEGIN
    v_which := 0;
    FOR i IN 1..length(p_str)
    LOOP
        v_cur := substr(p_str,i,1);
        IF v_cur = p_kvp_separator
        THEN
            -- end of a pair: stop if it was the one we are looking for
            IF v_key = p_key
            THEN
                EXIT;
            END IF;
            v_key := '';
            v_value := '';
            v_which := 0;
        ELSIF v_cur = p_kv_separator
        THEN
            v_which := 1;
        ELSE
            IF v_which = 0
            THEN
                v_key := v_key || v_cur;
            ELSE
                v_value := v_value || v_cur;
            END IF;
        END IF;
    END LOOP;
    IF v_key = p_key
    THEN
        RETURN v_value;
    END IF;
    raise_application_error(-20001, 'key not found!');
END;
To get the value for 'Key2' you could do this (assuming your function was in a package called test_pkg):
SELECT test_pkg.get_value_by_key('Key1,Value1|Key2,Value2|Key3,Value3','Key2','|',',') FROM dual