Untranslatable character when extracting dates from strings

Untranslatable character when extracting dates from strings - regex

I am attempting to extract dates from a free-text field (because our process is awesome like that :\ ) and keep hitting Teradata error 6706. The regex I'm using is: REGEXP_SUBSTR(original_field,'(\d{2})\/(\d{2})\/(\d{4})',1) AS new_field. I'm unsure of the field's type HELP TABLE has a blank in the Type column for the field.
I've already tried converting using TRANSLATE(col USING LATIN_TO_UNICODE), as well as UNICODE_TO_LATIN, those both actually cause the error by themselves. A straight CAST(original_field AS VARCHAR(255)) doesn't fix the issue, though that cast does work. I've also tried stripping various special characters (new-line, carriage return, etc.) from the field before letting the REGEXP_SUBSTR take a crack at it, both by itself and with the CAST & TRANSLATEs I already mentioned.
At this point I'm not sure what the issue could be, and could use some guidance on additional options to try.

The final version that worked ended up being
, CASE
WHEN TRANSLATE_CHK(field USING LATIN_TO_UNICODE) = 0 THEN
REGEXP_SUBSTR(TRANSLATE(field USING LATIN_TO_UNICODE),'(\d{2})\/(\d{2})\/(\d{4})',1)
ELSE NULL
END AS Ref_Date
For whatever reason, using a TRIM inside the TRANSLATE seems to cause an issue. Only once I striped any and all functions from inside the TRANSLATE did the TRANSLATE, and thus the REGEXP_SUBSTR, work.

Related

After using libdb2o.so ( instead of libdb2.so ), Program throws an DB2 / LUW SQL Error out of range ( -302 ) by the same size/length of std::string

I have a strange question about the using of libdb2.so and libdb2o.so. My C++ Program have an string with length/size 33. I'm trying to write that string inside a database column of character which have a size of 35 character. Important: That string have a german umlauts (ü) and several special characters. It looks like that:
Xüxxxxx / Xxxxxxxxxxxx-Xxxxxxxxxxx
If my program ran with the hard-linked libdb2.so it ran perfectly and I have no SQL Error for many month. After a view month I thought would be a great idea to use the libdb2o.so which I did not hard-linked inside my C++ Program. All other SQL Statements went well, but my INSERT-SQLs got an error like that:
[IBM][CLI Driver][DB2/AIX64] SQL0302N The value of a host variable in the EXECUTE or OPEN statement is out of range for its corresponding use. SQLSTATE=22001
After some analysis I regonized that I could have an encoding problem, but not inside my std::string. The size of std::string did not changed. It is still 33 long. If I replaced my umlauts to normal character (ü => u) it worked fine, but it is not what I want.
I thought if I using libdb2o.so I have the standard encoding UTF8, but it looks like it could not be, or? If I tried to set the UTF8 inside my connection string like that above, it did not worked and I got an error of "unkown paramter inside connection string"
CONNECTION_STRING=DRIVER=libdb2o.so;Database=XXXX;Protocol=tcpip;Hostname=XXXX;Servicename=XXXX;UID=XXXX;PWD=XXXX;
Well, I did not found a simpel solution (okay I found NO solutions or explanations), therefore I would be greatful if someone know how I could fix that problem. Maybe any ideas to simpel use UTF8 inside my INSERT without changing the content of my std::string?
Am I on the wrong track that the problem could be UTF8 encoding? Any other ideas?

Matlab: What's the most efficient approach to parse a large table or cell array with regexp when sometimes there is no match?

I am working with a messy manually maintained "database" that has a column containing a string with name,value pairs. I am trying to parse the entire column with regexp to pull out the values. The column is huge (>100,000 entries). As a proxy for my actual data, let's use this code:
line1={'''thing1'': ''-583'', ''thing2'': ''245'', ''thing3'': ''246'', ''morestuff'':, '''''};
line2={'''thing1'': ''617'', ''thing2'': ''239'', ''morestuff'':, '''''};
line3={'''thing1'': ''unexpected_string(with)parens5'', ''thing2'': 245, ''thing3'':''246'', ''morestuff'':, '''''};
mycell=vertcat(line1,line2,line3);
This captures the general issues encountered in the database. I want to extract what thing1, thing2, and thing3 are in each line using cellfun to output a scalar cell array. They should normally be 3 digit numbers, but sometimes they have an unexpected form. Sometimes thing3 is completely missing, without the name even showing up in the line. Sometimes there are minor formatting inconsistencies, like single quotes missing around the value, spaces missing, or dashes showing up in front of the three digit value. I have managed to handle all of these, except for the case where thing3 is completely missing.
My general approach has been to use expressions like this:
expr1='(?<=thing1''):\s?''?-?([\w\d().]*?)''?,';
expr2='(?<=thing2''):\s?''?-?([\w\d().]*?)''?,';
expr3='(?<=thing3''):\s?''?-?([\w\d().]*?)''?,';
This looks behind for thingX' and then tries to match : followed by zero or one spaces, followed by 0 or 1 single quote, followed by zero or one dash, followed by any combination of letters, numbers, parentheses, or periods (this is defined as the token), using a lazy match, until zero or one single quote is encountered, followed by a comma. I call regexp as regexp(___,'tokens','once') to return the matching token.
The problem is that when there is no match, regexp returns an empty array. This prevents me from using, say,
out=cellfun(#(x) regexp(x,expr3,'tokens','once'),mycell);
unless I call it with 'UniformOutput',false. The problem with that is twofold. First, I need to then manually find the rows where there was no match. For example, I can do this:
emptyout=cellfun(#(x) isempty(x),out);
emptyID=find(emptyout);
backfill=cell(length(emptyID),1);
[backfill{:}]=deal('Unknown');
out(emptyID)=backfill;
In this example, emptyID has a length of 1 so this code is overkill. But I believe this is the correct way to generalize for when it is longer. This code will change every empty cell array in out with the string Unknown. But this leads to the second problem. I've now got a 'messy' cell array of non-scalar values. I cannot, for example, check unique(out) as a result.
Pardon the long-windedness but I wanted to give a clear example of the problem. Now my actual question is in a few parts:
Is there a way to accomplish what I'm trying to do without using 'UniformOutput',false? For example, is there a way to have regexp pass a custom string if there is no match (e.g. pass 'Unknown' if there is no match)? I can think of one 'cheat', which would be to use the | operator in the expression, and if the first token is not matched, look for something that is ALWAYS found. I would then still need to double back through the output and change every instance of that result to 'Unknown'.
If I take the 'UniformOutput',false approach, how can I recover a scalar cell array at the end to easily manipulate it (e.g. pass it through unique)? I will admit I'm not 100% clear on scalar vs nonscalar cell arrays.
If there is some overall different approach that I'm not thinking of, I'm also open to it.
Tangential to the main question, I also tried using a single expression to run regexp using 3 tokens to pull out the values of thing1, thing2, and thing3 in one pass. This seems to require 'UniformOutput',false even when there are no empty results from regexp. I'm not sure how to get a scalar cell array using this approach (e.g. an Nx1 cell array where each cell is a 3x1 cell).
At the end of the day, I want to build a table using these results:
mytable=table(out1,out2,out3);
Edit: Using celldisp sheds some light on the problem:
celldisp(out)
out{1}{1} =
246
out{2} =
Unknown
out{3}{1} =
246
I assume that I need to change the structure of out so that the contents of out{1}{1} and out{3}{1} are instead just out{1} and out{3}. But I'm not sure how to accomplish this if I need 'UniformOutput',false.

Note: I've not used MATLAB and this doesn't answer the "efficient" aspect, but...
How about forcing there to always be a match?
Just thinking about you really wanting a match to skip this problem, how about an empty match?
Looking on the MATLAB help page here I can see a 'emptymatch' option, perhaps this is something to try.
E.g.
the_thing_i_want_to_find|
Match "the_thing_i_want_to_find" or an empty match, note the | character.
In capture group it might look like this:
(the_thing_i_want_to_find|)

As a workaround, I have found that using regexprep can be used to find entries where thing3 is missing. For example:
replace='$1 ''thing3'': ''Unknown'', ''morestuff''';
missingexpr='(?<=thing2'':\s?)(''?-?[\w\d().]*?''?,) ''morestuff''';
regexprep(mycell{2},missingexpr,replace)
ans =
''thing1': '617', 'thing2': '239', 'thing3': 'Unknown', 'morestuff':, '''
Applying it to the entire array:
fixedcell=cellfun(#(x) regexprep(x,missingexpr,replace),mycell);
out=cellfun(#(x) regexp(x,expr3,'tokens','once'),fixedcell,'UniformOutput',false);
This feels a little roundabout, but it works.

cellfun can be replaced with a plain old for loop. Your code will either be equally fast, or maybe even faster. cellfun is implemented with a loop anyway, there is no advantage of using it other than fewer lines of code. In your explicit loop, you can then check the output of regexp, and build your output array any way you like.

How can I replace text in a Siebel data mapping?

I have an outgoing web service to send data from Siebel 7.8 to an external system. In order for the integration to work, before I send the data, I must change one of the field values, replacing every occurence of "old" with "new". How can I do this with EAI data mappings?
In an ideal world I would just use an integration source expression like Replace([Description], "old", "new"). However Siebel is far from ideal, and doesn't have a replace function (or if it does, it's not documented). I can use all the Siebel query language functions which don't need an execution context. I can also use the functions available for calculated fields (sane people could expect both lists to be the same, but Siebel documentation is also far from ideal).
My first attempt was to use the InvokeServiceMethod function and replace the text myself in eScript. So, this is my field map source expression:
InvokeServiceMethod('MyBS', 'MyReplace', 'In="' + [Description] + '"', 'Out')
After some configuration steps it works fine... except if my description field contains the " character: Error parsing expression 'In="This is a "test" with quotes"' for field '3' (SBL-DAT-00481)
I know why this happens. My double quotes are breaking the expression and I have to escape them by doubling the character, as in This is a ""test"" with quotes. However, how can I replace each " with "" in order to call my business service... if I don't have a replace function? :)
Oracle's support web has only one result for the SBL-DAT-00481 error, which as a workaround, suggests to place the whole parameter inside double quotes (which I already had). There's a linked document in which they acknowledge that the workaround is valid for a few characters such as commas or single quotes, but due to a bug in Siebel 7.7-7.8 (not present in 8.0+), it doesn't work with double quotes. They suggest to pass instead the row id as argument to the business service, and then retrieve the data directly from the BC.
Before I do that and end up with a performance-affecting workaround (pass only the ID) for the workaround (use double quotes) for the workaround (use InvokeServiceMethod) for not having a replace function... Am I going crazy here? Isn't there a simple way to do a simple text replacement in a Siebel data mapping?

first thing (quite possibly - far from optimal one) which is coming to my mind - is to create at source BC calculated field, aka (NEW_VALUE), which becomes "NEW" for every record, where origin field has a value "OLD". and simply use this field in integration map.

How do I prohibit double quotes in an inputText in XPages?

I've been trying to prohibit users from entering double-quotes (") into some fields that are used in JSON strings, as they cause unexpected termination of values in the strings. Unfortunately, while the regex isn't hard to write, I can't get it to work within XPages.
I tried using both double-quotes alone and using the escape character. Both ways fail any string, not just ones including the double-quotes.
<xp:validateConstraint message="Please do not use double quotes in organization/vendor names">
<xp:this.regex><![CDATA['^[^\"]*$]]></xp:this.regex>
</xp:validateConstraint>
There must be a simple way around this issue.

I think you're running into issues with your regex property for your xp:validateConstraint validator. You seem to be attempting to strip the characters in the xp:this.regex as opposed to specifying what characters are allowed, as I believe the docs read. I might recommend checking out the xp:customConverter (bias: I'm more familiar with the customConverter) which gives you the ability to alter the getValueAsObject and getValueAsString methods; then you can escape the undesired characters.
Here's what I'm thinking of, to strip them out. If you plug this into an XPage, you'll find that when the value is pulled (e.g.- by the partial refresh), it converts the input content accordingly by stripping out quotes (both single and double, in my case).
<?xml version="1.0" encoding="UTF-8"?>
<xp:view xmlns:xp="http://www.ibm.com/xsp/core">
<xp:inputTextarea
id="inputTextarea1"
value="#{viewScope.myStuff}"
disableClientSideValidation="true">
<xp:this.converter>
<xp:customConverter>
<xp:this.getAsString><![CDATA[#{javascript:return value.replace(/["']/g, "");}]]></xp:this.getAsString>
<xp:this.getAsObject><![CDATA[#{javascript:return value.replace(/["']/g, "");}]]></xp:this.getAsObject>
</xp:customConverter>
</xp:this.converter>
</xp:inputTextarea>
<xp:button
value="Do Something"
id="button1">
<xp:eventHandler
event="onclick"
submit="true"
refreshMode="partial"
refreshId="computedField1" />
</xp:button>
<xp:text
escape="true"
id="computedField1"
value="#{viewScope.myStuff}" />
</xp:view>
My interaction with the above code yields:
Notice that for it to reflect in the refresh, I'm modifying both the getAsString and the getAsObject, since it's updating the viewScope'd object during the refresh (a fact I had to remind myself of), but saving to a text field in XPages will get the value by the getAsString (provided your data source knows its a String related field, e.g.- NotesXspDocument as document1, with known Form, where the field is a Text field).
As the above comments alluded to, this performs an act of filtering the input values as opposed to escaping or validating those values. You could also change my replace methods to replacing with a text escape character, return value.replace(/"/g,"\"").replace(/'/g,"\'");.

Is the simple answer just add a JavaScript function call on the submit button to remove the quote?
A more elegant solution would be to not allow typing of the quote by checking the keydown event and preventing for that character code. The user should not be able to type one thing and then have it changed on them in processing

#Eric McCormick recommends a customConverter which in my opinion is a neat solution I probably would be going for in many cases. Sometimes however we need to teach users to adhere to the rules so we have to show them where they did wrong. That's when we may need a validator.
Playing around a bit the simplest solution I came up with is a xp:validateExpression simply looking for the first occurrence of a double quote within the String entered:
<xp:inputText
id="inputText1"
value="#{viewScope.testvalue}">
<xp:this.validators>
<xp:validateExpression
message="Hey, wait! Didn't I tell you not to use double quotes in here?">
<xp:this.expression><![CDATA[#{javascript:value.indexOf("\"")==-1}]]></xp:this.expression>
</xp:validateExpression>
</xp:this.validators>
</xp:inputText>
If that's a single occurrence in your application that's it, really. If you need this and similar solutions all over the place you might want to take a look into writing a small validator bean (java), register it via faces-config.xml and then use it everywhere in your application e.g by using an xp:validator instead

As suggested by #Tomalik and #sidyll, this is attempt to solve the wrong problem. While each of the answers supplied do solve the problem of preventing the user from entering undesirable characters, it is better to encode those characters to preserve the user's input. In this particular case, the intermediate step in providing the data to the user via a JSON string is to pull the value from a view.
So, all I had to do was change the column formula to encode the string using the UTF-8 character set and it displays the values with the "undesirable characters". The unencoded value is stored on the document so that Old Notes access won't create confusion.
#URLEncode ("UTF-8"; vendorName )
In one case, the JSON is computed as part of the form design, but the same solution works.

ColdFusion -- Do I need URLDecode with form POSTs? / URLDecode randomly removes one character

I'm using a WYSIWYG to allow users to format text. This is the error-causing text:
<p><span style="line-height: 115%">This text starts with a 'T'</span></p>
The error is that the 'T' in "This", or whatever the first letter happens to be, is randomly removed when using URLDecode and saving to the DB. Removing URLDecode on the server side seems to fix it without any negative side-effects (the DB contains the same information).
The documentation says that
Query strings in HTTP are always URL-encoded.
Is this really the case? If so, why doesn't removing URLDecode seem to mess everything up?
So two questions:
Why is URLDecode causing the first text character to be removed like this (it seems to only happen when the line-height property is present)?
Do I really need (or would I even want) to use URLDecode before putting POSTed data into the database?
Edit: I made a test page to echo back the decoded text, and URLDecode is definitely removing that character, but I have no idea why.

I believe decoding is done automatically when form scope is populated. That's why characters after % (this char is used for encoding) are removed -- you are trying to decode the string second time.
For security reasons you might be interested in stripping script tags, or even cleaning up HTML using white-list. Try to search in CFLib.org for applicable functions.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Untranslatable character when extracting dates from strings - regex

Related

After using libdb2o.so ( instead of libdb2.so ), Program throws an DB2 / LUW SQL Error out of range ( -302 ) by the same size/length of std::string

Matlab: What's the most efficient approach to parse a large table or cell array with regexp when sometimes there is no match?

How can I replace text in a Siebel data mapping?

How do I prohibit double quotes in an inputText in XPages?

ColdFusion -- Do I need URLDecode with form POSTs? / URLDecode randomly removes one character

Categories

Resources