Remove or replace the '�' character in Informatica

We have a requirement to replace or remove the '�' character (an unrecognizable, undefined character) present in our source. The workflow runs successfully, but when I check the target, the records have not been committed. Informatica reports the following error:
Error executing query for record 37: 6706: The string contains an untranslatable character.
I tried functions such as REPLACECHR, REG_REPLACE and REPLACESTR, but none of them seems to work. Kindly advise on how to get rid of this character. Any reply is greatly appreciated.

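In your schema definitions you should use the utf8 character set with the utf8_unicode_ci collation.
For the data that is already loaded, you can convert the column to ASCII, which turns every character that has no ASCII equivalent into '?', and then strip the question marks (note that this also removes any genuine '?' in the column):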
UPDATE tablename
SET columnToCheck = REPLACE(CONVERT(columnToCheck USING ascii), '?', '')
WHERE ...
or
UPDATE tablename
SET columnToCheck = REPLACE(columnToCheck, CHAR(146), '');
See also: Replace non-ASCII characters in MySQL
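The behaviour the CONVERT ... USING ascii trick relies on can be reproduced outside MySQL. This Python sketch is only an illustration (it assumes the value is already a decoded string and does not talk to MySQL): encoding with errors="replace" turns each non-ASCII character into '?', so stripping '?' afterwards also deletes genuine question marks, whereas errors="ignore" drops the non-ASCII characters and keeps real '?' intact.

```python
def strip_via_question_marks(value: str) -> str:
    """Mimic REPLACE(CONVERT(col USING ascii), '?', ''): non-ASCII -> '?', then drop every '?'."""
    ascii_text = value.encode("ascii", errors="replace").decode("ascii")
    return ascii_text.replace("?", "")


def strip_non_ascii(value: str) -> str:
    """Drop non-ASCII characters directly, so genuine '?' characters survive."""
    return value.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    sample = "Why\ufffd? caf\u00e9"
    print(strip_via_question_marks(sample))  # Why caf
    print(strip_non_ascii(sample))           # Why? caf
```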

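You can replace the special characters in an Expression transformation:
REPLACESTR(1, Column_Name, '?', NULL)
The arguments of the REPLACESTR function are:
1 - CaseFlag; any non-zero value makes the match case-sensitive (0 or NULL makes it case-insensitive)
Column_Name - the column that contains the special character
'?' - the special character to replace (substitute the actual character found in your data)
NULL - the replacement string; NULL or an empty string removes every occurrence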
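To make the argument semantics concrete, here is a hypothetical Python emulation of a single-OldString REPLACESTR call, based on the CaseFlag and NULL-replacement behaviour listed above. It is not Informatica code and ignores the form that takes several OldString arguments.

```python
import re
from typing import Optional


def replacestr_like(case_flag: Optional[int], value: str, old: str, new: Optional[str]) -> str:
    """Hypothetical emulation of REPLACESTR(CaseFlag, InputString, OldString, NewString).

    case_flag: non-zero -> case-sensitive; 0 or None -> case-insensitive.
    new: None (NULL) or "" removes every occurrence of `old`.
    """
    replacement = "" if new is None else new
    flags = 0 if case_flag else re.IGNORECASE
    # re.escape makes `old` a literal pattern; the lambda avoids backslash handling in `replacement`.
    return re.sub(re.escape(old), lambda _m: replacement, value, flags=flags)


if __name__ == "__main__":
    print(replacestr_like(1, "Andr\ufffd", "\ufffd", None))  # Andr
    print(replacestr_like(0, "aXbxc", "x", "-"))             # a-b-c
```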

You need to fetch the rows with the appropriate character set defined on your connection. Which connection type are you using, ODBC or native? Which database is it?
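The '�' seen in the question is typically U+FFFD, the replacement character a decoder substitutes for bytes that are not valid in the charset it was told to use. A small Python illustration of why the connection charset matters, assuming (purely for the example) that the source bytes are cp1252; the question does not say which charset the data really uses.

```python
raw = "Andr\u00e9".encode("cp1252")  # b'Andr\xe9': é stored as a single cp1252 byte

wrong = raw.decode("utf-8", errors="replace")  # read with a mismatched charset
right = raw.decode("cp1252")                   # read with the matching charset

print(wrong)  # Andr� (U+FFFD REPLACEMENT CHARACTER)
print(right)  # André
```

Once the bytes have been decoded as U+FFFD, the original character is lost, which is why fixing the charset on the connection is preferable to replacing '�' afterwards.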

Special characters are a challenge. Having checked the Informatica Network, I can see there is a kludge that uses two replace calls: first set a variable to the input string with all the allowed (non-special) characters removed, which leaves only the special characters, and then use that variable in a second replace so that the final value contains only the allowed characters: https://network.informatica.com/thread/20642 (an awesome workaround by nico, as long as you can positively identify every character that should be allowed).
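The two-pass allow-list idea can be sketched in Python with str.translate standing in for the Informatica replace calls. The ALLOWED set is a hypothetical example; the workaround only works if it lists every character your target should accept.

```python
import string

# Hypothetical allow-list: adjust it to every character your target should accept.
ALLOWED = string.ascii_letters + string.digits + " .,-_"


def keep_allowed(value: str, allowed: str = ALLOWED) -> str:
    # Pass 1: delete all allowed characters; what remains are the "special" ones.
    unwanted = value.translate(str.maketrans("", "", allowed))
    # Pass 2: delete those special characters from the original value.
    return value.translate(str.maketrans("", "", unwanted))


if __name__ == "__main__":
    print(keep_allowed("Andr\ufffd 42!"))  # Andr 42
```

The appeal of the approach is that you never have to enumerate the bad characters, only the good ones.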
As an alternative kludge, I would also try an XML transformation somewhere within the mapping, as Informatica conveniently converts special characters to encoded values (decimal or hex, I can't remember which). As long as you can live with these encoded values appearing in your target text, you should be fine; build some extra length into your strings to accommodate the bloat from the extra characters.
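To get a feel for how much longer the strings become, here is a Python sketch that replaces each non-ASCII character with a decimal XML character reference. It only illustrates the size growth; it is an assumption that Informatica's XML transformation produces the same form, and unlike a real XML serializer it does not escape & or <.

```python
def xml_encode_non_ascii(value: str) -> str:
    """Replace every non-ASCII character with a decimal XML character reference."""
    return value.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    encoded = xml_encode_non_ascii("Andr\u00e9")
    print(encoded)                                # Andr&#233;
    print(len("Andr\u00e9"), "->", len(encoded))  # 5 -> 10
```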

Related

How to replace or ignore the Accented characters in SSIS

I have an SSIS package that reads the input file, validates it and then processes it. The validation is carried out in a Script Task.
When the file is processed, I get the error "invalid character in the given encoding". On investigation, I found that this is caused by an accented character in a first name in the file: André.
I tried replacing these characters in the XSLT file using the replace(normalize-unicode()) function, but it doesn't help because the Script Task runs first.
Can anyone help me ignore or replace these special characters while processing the file?
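In a Data Flow Task you can replace values using their Unicode hex value. The following expression removes three common standalone accent marks (U+0060 grave accent, U+00B4 acute accent, U+02CB modifier letter grave accent) by replacing them with an empty string, and then casts the result to a non-Unicode string in code page 1252. It does not touch precomposed letters such as the é (U+00E9) in André: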
(DT_STR,500,1252)TRIM(REPLACE(REPLACE(REPLACE([YOUR_FIELD],"\x0060",""),"\x00B4",""),"\x02CB",""))
Find more here: http://www.utf8-chartable.de/
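To reduce precomposed letters such as é to their base letter, a different technique can be used: Unicode compatibility decomposition (NFKD) followed by dropping the combining marks. This is a Python sketch of that technique, not SSIS code; doing the same inside the package would need an equivalent in a Script Task or Script Component, which is an assumption not tested here.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("Andr\u00e9"))  # Andre
```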

SAS Date (numeric) to Character when missing (.)

Most likely a silly question, but I must be overlooking something.
I have a date field in which the date is sometimes missing (.). I have to create a file from this data set, and the requirements for loading it into a DB2 environment state that instead of the native SAS missing numeric value (.), the field must be a blank string.
This should be a simple task: convert the variable to character using the appropriate format:
LAST_ATTEMPT = PUT(ATTMPT1,YYMMDDS10.);
When I run PROC CONTENTS on the data set, it confirms that the new variable is character.
The issue is that when I look at the data set, it still shows (.) for the missing values. When I try to convert the missing date (.) to a blank string, every value of the variable is blanked out...
What am I missing here?
Options MISSING=' ';
With this option in effect, PUT returns a blank for a missing value when your assignment executes.
One way is to use Options MISSING=' ';, but this can have unwanted side effects on other parts of your program.
A safer way is simply to add a test to the original program:
IF ATTMPT1~=. THEN LAST_ATTEMPT = PUT(ATTMPT1,YYMMDDS10.);
ELSE LAST_ATTEMPT = "";
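The same test-then-format logic, sketched in Python for comparison outside SAS. It assumes the missing value arrives as None; the strftime pattern reproduces the yyyy/mm/dd layout that YYMMDDS10. produces.

```python
from datetime import date
from typing import Optional


def format_last_attempt(attempt: Optional[date]) -> str:
    """Render a date as yyyy/mm/dd, or a blank string when the date is missing."""
    if attempt is None:
        return ""
    return attempt.strftime("%Y/%m/%d")


if __name__ == "__main__":
    print(repr(format_last_attempt(date(2015, 3, 7))))  # '2015/03/07'
    print(repr(format_last_attempt(None)))              # ''
```

As in the SAS version, the missing case is handled before formatting, so no global setting has to change.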

How to prevent illegal characters error in DB2 SQL query?

I'm working with a huge DB2 table (hundreds of millions of rows), trying to select only the rows that are matched by this regular expression:
\b\d([- \/\\]?\d){12,15}(\D|$)
(That is, a word boundary, followed by 13 to 16 digits separated by nothing or a single dash, space, slash or backslash, followed by either a non-digit or the end of the line.)
After much Googling, I've managed to create the following SQL:
SELECT idx, comment FROM tblComment
WHERE xmlcast(xmlquery('fn:matches($c,"\b\d([- \/\\]?\d){12,15}(\D|$)")' PASSING comment AS "c") AS INTEGER)=1
Which works perfectly, as far as I can tell... unless it finds a row with an illegal character:
An illegal XML character "#x3" was found in an SQL/XML expression or function argument that begins with string [...]
The data contains many illegal XML characters, and changing the data is not an option (I have limited read-only access, and there are far too many rows that would need to be fixed). Is there a way to strip out or ignore illegal characters, without first modifying the database? Or, is there a different way for me to write my query that has the same effect?
You will have to identify all the illegal XML characters that occur in your data. Once you know them, you can use the TRANSLATE() function to eliminate them during the pattern matching.
Say you determine that any of the ASCII control characters (0x00 through 0x1F, plus 0x7F) may be present in the COMMENT column. Your query might then look like:
SELECT idx, comment FROM tblComment
WHERE xmlcast(xmlquery(
'fn:matches($c,"\b\d([- \/\\]?\d){12,15}(\D|$)")'
PASSING TRANSLATE(comment, ' ', x'0001020304050607080B0C0E0F101112131415161718191A1B1C1D1E1F7F') AS "c")
AS INTEGER)=1
All legal XML characters are listed in the manual. 0x09, 0x0A and 0x0D are legal, so you don't need to TRANSLATE() them, for example.
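To check a candidate translation list before running it against hundreds of millions of rows, the same clean-then-match idea can be prototyped locally. A Python sketch, assuming XML 1.0 rules (C0 control characters other than tab, line feed and carriage return are illegal) and Python's re engine, whose regex dialect is close to but not identical with XQuery's fn:matches:

```python
import re

# XML 1.0 forbids C0 control characters other than tab (0x09), LF (0x0A) and CR (0x0D).
ILLEGAL_XML_TO_BLANK = {cp: " " for cp in range(0x20) if cp not in (0x09, 0x0A, 0x0D)}

# Same pattern as the question; '/' needs no escaping inside a Python character class.
CARD_LIKE = re.compile(r"\b\d([- /\\]?\d){12,15}(\D|$)")


def matches_card_like(comment: str) -> bool:
    """Map illegal control characters to blanks (like TRANSLATE), then search for the pattern."""
    cleaned = comment.translate(ILLEGAL_XML_TO_BLANK)
    return CARD_LIKE.search(cleaned) is not None


if __name__ == "__main__":
    print(matches_card_like("ref 1234-5678-9012-3456\x03x"))  # True
    print(matches_card_like("id 12345\x03"))                  # False
```

Like TRANSLATE in the query, mapping to a blank rather than deleting keeps the string length unchanged; note that a blank is also one of the allowed separators in the pattern.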

Add a '~' symbol in the HL7 message

I am exporting an HL7 message.
One field has a tilde (~) in the input.
HL7 converts it to "\R\".
I also tried exporting the value using the ASCII code (126) for the '~' character in VBScript, but HL7 converted that to "\R\" as well.
How can I get the '~' exported?
Any help would be appreciated.
HL7 escapes the repetition character "~" as "\R\" when transferring a message. The receiver should change it back to your tilde when working with that field.
But there is a second way to deal with the issue: HL7 allows you to change the encoding characters (they are declared in MSH-2). Unfortunately, not all HL7 engines support that.
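The escaping described in the first answer can be sketched as follows, assuming the default delimiters (field |, component ^, repetition ~, escape \, subcomponent &); if MSH-2 declares other encoding characters, the table changes accordingly. The escape sequences \F\, \S\, \T\, \R\ and \E\ are the standard HL7 v2 ones; the function names are hypothetical.

```python
import re

# Default HL7 v2 delimiters and their escape sequences.
ESCAPES = {"|": "\\F\\", "^": "\\S\\", "&": "\\T\\", "~": "\\R\\", "\\": "\\E\\"}
UNESCAPES = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}
_ESCAPE_SEQ = re.compile(r"\\([FSTRE])\\")


def hl7_escape(text: str) -> str:
    """Sender side: replace each delimiter character with its escape sequence."""
    # Character by character, so an inserted backslash is never escaped twice.
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def hl7_unescape(text: str) -> str:
    """Receiver side: turn escape sequences back into the delimiter characters."""
    return _ESCAPE_SEQ.sub(lambda m: UNESCAPES[m.group(1)], text)


if __name__ == "__main__":
    sent = hl7_escape("A~B")
    print(sent)                # A\R\B
    print(hl7_unescape(sent))  # A~B
```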
The ~ character indicates that a field repeats, i.e. it can hold multiple values.
Consider this PID.3 field from a given HL7 message:
12345^^^XYZ~6789^^^PQR
It means that the patient has two patient IDs coming from different sources, namely XYZ and PQR. This is what the (~) character means functionally.
Going by the question body, I believe you want to achieve the functionality of (~), that is, to add another repetition.
To do this, try the process below. I don't know VBScript, so I can't give you that code, but I have some JavaScript code that does the same thing, and I think you can port it to VBScript. I'll leave that task to you.
//Count the current repetitions of PID.3
var pidfieldlen = msg['PID']['PID.3'].length();
//Store the last PID.3 repetition
var lastpidnode = msg['PID']['PID.3'][pidfieldlen - 1]; //If length is 5, the last index is 4
//Create a new, separate PID.3 element
var newpidfield = <PID.3/>;
newpidfield['PID.3.1'] = "567832"; //Set the component values
newpidfield['PID.3.4'] = "NEW SOURCE";
//Insert it as a sibling after the last repetition (appendChild on lastpidnode would nest it inside that node)
msg['PID'].insertChildAfter(lastpidnode, newpidfield);
This will transform the PID.3 into
12345^^^XYZ~6789^^^PQR~567832^^^NEW SOURCE
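If you work on the raw pipe-delimited text rather than on an XML representation of the message, the same append can be sketched as plain string handling. This is a hypothetical Python helper that assumes the default ^ and ~ delimiters and values that contain no delimiter characters (escape them first otherwise):

```python
from typing import Dict


def append_repetition(field: str, components: Dict[int, str],
                      component_sep: str = "^", repetition_sep: str = "~") -> str:
    """Append one repetition to an HL7 field; `components` maps 1-based component numbers to values."""
    width = max(components)
    parts = [components.get(i, "") for i in range(1, width + 1)]
    new_rep = component_sep.join(parts)
    return f"{field}{repetition_sep}{new_rep}" if field else new_rep


if __name__ == "__main__":
    pid3 = "12345^^^XYZ~6789^^^PQR"
    print(append_repetition(pid3, {1: "567832", 4: "NEW SOURCE"}))
    # 12345^^^XYZ~6789^^^PQR~567832^^^NEW SOURCE
```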
Try replacing the tilde characters with &#x7E; (hexadecimal) or &#126; (decimal).
See the Unicode reference for this character.
If you have already done so, this is not the source of the error. I suspect that HL7 attaches a special meaning to this character. According to this webpage, it denotes a "Field Repeat Separator".

c++ - escape special characters

I need to escape all special characters and replace national (accented) characters to get "plain text" for a table name.
string getTableName(string name)
My string could be "šárka65_%&." and I want to get a string I can use in my database as a table name.
Which DBMS?
In standard SQL, a name enclosed in double quotes is a delimited identifier and may contain any characters.
In MS SQL Server, a name enclosed in square brackets is a delimited identifier.
In MySQL, a name enclosed in back-ticks is a delimited identifier.
You could simply choose to enclose the name in the appropriate markers.
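Quoting a name as a delimited identifier also means escaping the closing delimiter if it occurs inside the name; in each of the three dialects this is done by doubling it. A Python sketch; the dbms labels are hypothetical:

```python
def quote_identifier(name: str, dbms: str) -> str:
    """Wrap `name` as a delimited identifier, doubling any embedded closing delimiter."""
    if dbms == "standard":   # standard SQL double quotes
        return '"' + name.replace('"', '""') + '"'
    if dbms == "sqlserver":  # MS SQL Server square brackets
        return "[" + name.replace("]", "]]") + "]"
    if dbms == "mysql":      # MySQL back-ticks
        return "`" + name.replace("`", "``") + "`"
    raise ValueError(f"unknown dbms: {dbms}")


if __name__ == "__main__":
    print(quote_identifier("\u0161\u00e1rka65_%&.", "mysql"))  # `šárka65_%&.`
```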
I had a feeling that wasn't what you wanted...
What codeset is your string in? It seems to be UTF-8 by the time it gets to my browser. Do you need to be able to invert the mapping unambiguously? That is harder.
You can use many schemes to map the information:
One simple minded one is simply to hex-encode everything, using a marker (X) to protect against leading digits:
XC5A1C3A1726B6136355F25262E
One slightly less simple minded one is hex-encode anything that is not already an ASCII alphanumeric or underscore.
XC5A1C3A1rka65_25262E
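Both hex schemes can be reproduced in a few lines, assuming the name is UTF-8, as the answer observes. A Python sketch; the function names are hypothetical:

```python
def hex_all(name: str) -> str:
    """Scheme 1: hex-encode every UTF-8 byte, prefixed with X so the name never starts with a digit."""
    return "X" + name.encode("utf-8").hex().upper()


def hex_non_word(name: str) -> str:
    """Scheme 2: keep ASCII letters, digits and '_'; hex-encode the UTF-8 bytes of everything else."""
    out = []
    for ch in name:
        # isascii() is needed because isalnum() is also true for letters such as 'š'.
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            out.append(ch)
        else:
            out.append(ch.encode("utf-8").hex().upper())
    return "X" + "".join(out)


if __name__ == "__main__":
    print(hex_all("\u0161\u00e1rka65_%&."))       # XC5A1C3A1726B6136355F25262E
    print(hex_non_word("\u0161\u00e1rka65_%&."))  # XC5A1C3A1rka65_25262E
```

Scheme 2 is not unambiguously invertible: a literal "C5" in the input looks the same as an encoded byte.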
Or, as a comment suggests, you can devise a mapping table for accented Latin letters; indeed, an appropriately initialized mapping table will be the fastest approach. The input is the character in the source string; the output is the desired mapped character or characters. If you use an 8-bit character set, this is entirely manageable. If you use full Unicode, it is a lot less manageable (not least, how do you map all the Han characters to ASCII?).
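A mapping-table approach sketched with Python's str.maketrans. The table here is a deliberately small, hypothetical sample of Czech lowercase letters plus German ß (to show a one-to-many mapping); anything still outside [A-Za-z0-9_] is replaced with an underscore.

```python
import re

# Hypothetical, deliberately small mapping table; extend it for the letters you expect.
LATIN_TO_ASCII = str.maketrans({
    "á": "a", "č": "c", "ď": "d", "é": "e", "ě": "e", "í": "i", "ň": "n",
    "ó": "o", "ř": "r", "š": "s", "ť": "t", "ú": "u", "ů": "u", "ý": "y",
    "ž": "z", "ß": "ss",
})


def to_table_name(name: str) -> str:
    mapped = name.translate(LATIN_TO_ASCII)
    # Anything still outside [A-Za-z0-9_] becomes an underscore.
    return re.sub(r"[^A-Za-z0-9_]", "_", mapped)


if __name__ == "__main__":
    print(to_table_name("\u0161\u00e1rka65_%&."))  # sarka65____
```

Unlike the hex schemes, this mapping is lossy: different inputs can produce the same table name.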
Or ...