How to replace or ignore the Accented characters in SSIS - xslt

I have an SSIS package that reads an input file, validates it, and then processes it. The validation is done in a Script Task.
When the file is processed I get the error "invalid character in the given encoding". I traced this to an accented character in the file, in the first name André.
I tried replacing these characters in the XSLT file using replace(normalize-unicode()), but it doesn't help because the Script Task runs first.
Can anyone help me ignore or replace these special characters while processing the file?

In a Data Flow task you can replace values using the applicable Unicode hex value. The following expression removes three common accent marks by replacing each with an empty string:
(DT_STR,500,1252)TRIM(REPLACE(REPLACE(REPLACE([YOUR_FIELD],"\x0060",""),"\x00B4",""),"\x02CB",""))
Find more here: http://www.utf8-chartable.de/
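Note that the three code points in the expression above (U+0060, U+00B4, U+02CB) are stand-alone accent characters, whereas a letter such as é is a single code point of its own. A minimal sketch of stripping the accent from such letters, written in Python purely for illustration (the function name strip_accents is hypothetical, and the same idea would need to be ported to the language of the Script Task):

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Return text with combining diacritical marks removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") and keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("André"))
```

This only handles letters that decompose into a base letter plus marks; characters without such a decomposition pass through unchanged.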

Related

Dataprep - accents and special characters

How do I solve this problem with accents / special characters in Dataprep? I need this information to appear.
Thank you very much for your attention.
DataPrep has built-in recipes that let you remove or change special characters. For example, you can change accented letters to unaccented ones with Remove accents in text, or replace unrecognised characters with another character using Replace text or patterns.
Below are the steps to change a special character or accented letter.
Create your flow.
Add/import your data
Click Add a recipe, as per documentation. In your case you can do one or both of the following:
First, if you have accented words, go to Search Transformations > Remove accents in text. Then select the column that contains the accented words. The accented letters will be replaced with their unaccented equivalents. A preview of your data is shown so you can check the transformation.
Second, if you have an unrecognised character, go to Search Transformations > Replace text or patterns > select the column you want to transform > in Find, enter the letter/symbol between single quotes > in Replace with, enter the letter that should take its place. Finally, preview your data to see the transformation.
UPDATE: I was able to load a .csv file with the mentioned characters to DataPrep. Below are my steps and sample data:
The .csv file I used had the following content:
Test
Non rec. char É
Non rec. char ç
Accented word não
On the DataPrep UI home page, click Import Data (top right corner) and then Google Cloud Storage (left part of the screen). Then find and select your file (try importing a single file rather than parametrizing) and click the add (+) symbol. At this step you can already see the characters; in my case they displayed normally. Finally, click Import & Wrangle and view your data. Using the data above, I was able to see the characters properly without any issues.

Flat file schema validation using regular expression - not allow new line and delimiter char

I know this must be a basic question, but I still cannot find a solution to my simple problem.
In a BizTalk solution, I want to validate an inbound flat file against a flat file schema (the delimiter character is the pipe '|'). The rule is that every record (every line) must have exactly the same number of fields. So after disassembling, no field may contain a newline (CR LF or \r\n) or a pipe '|' character.
Every line in the flat file is a single record and there are 10 fields in every record, so there must be exactly 9 '|' pipe characters in every line.
I tried to solve it using XSD regular expression validation, but since regex is not my area of expertise, I have not been able to write a final regex. Currently I am testing with .*(?!([^\r\n\|])).* but it doesn't work when there are more than 9 '|' characters; it does work when there are fewer than 9.
In the end I want an XSD regex that does not allow a newline character or '|' in the string but does allow an empty '' value.
I have referred below links to create my regex,
XML Schema Regular Expressions
XML Schema - Regular Expressions
I think you're trying to solve the wrong problem.
First, do you really need to do this? I don't recall ever needing or even considering what you're describing.
Second, you can just Validate the parsed Xml. If the field count is wrong, it will fail there. If you really need to check for extra '|', you can put that in the Schema to test for it in a Map.
IBM Integration Bus solves this problem by allowing you to describe the non-XML data format using an XSD. The technology is called Data Format Description Language (DFDL).
https://en.wikipedia.org/wiki/Data_Format_Description_Language
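For the rule stated in the question (a field may be empty but must not contain a pipe or a newline), a character-class pattern such as [^|\r\n]* is one candidate; XSD patterns are implicitly anchored to the whole value. The sketch below checks the same idea in Python, using fullmatch to mimic that anchoring. The names FIELD, is_valid_field and is_valid_record are hypothetical, and this has not been tested inside BizTalk:

```python
import re

# Assumed rule from the question: a field may be empty but must not contain
# a pipe, a carriage return, or a line feed.
FIELD = r"[^|\r\n]*"
FIELD_RE = re.compile(FIELD)

# A record is exactly 10 such fields separated by exactly 9 pipes.
RECORD_RE = re.compile(rf"{FIELD}(?:\|{FIELD}){{9}}")

def is_valid_field(value: str) -> bool:
    # fullmatch requires the whole value to match, like an XSD pattern facet.
    return FIELD_RE.fullmatch(value) is not None

def is_valid_record(line: str) -> bool:
    return RECORD_RE.fullmatch(line) is not None

print(is_valid_record("a|b|c|d|e|f|g|h|i|j"))
print(is_valid_record("a|b|c|d|e|f|g|h|i|j|k"))
```

Because a field can never contain a pipe, a line with more than 9 pipes cannot be split into exactly 10 fields, so both too few and too many delimiters are rejected.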

Remove or replace '�' character in Informatica

We have a requirement to replace or remove the '�' character (an unrecognizable, undefined character) present in our source. My workflow runs successfully, but when I check the records in the target they are not committed. I get the following error in Informatica:
Error executing query for record 37: 6706: The string contains an untranslatable character.
I tried functions like replace_chr, reg_replace, replace_str etc., but none seems to work. Kindly advise on how to get rid of this. Any reply is greatly appreciated.
In your schema definitions you need to use a UTF-8 character set with the utf8_unicode_ci collation,
but for now you can do:
UPDATE tablename
SET columnToCheck = REPLACE(CONVERT(columnToCheck USING ascii), '?', '')
WHERE ...
or
update tablename
set columnToCheck = replace(columnToCheck , char(146), '');
Replace NonASCII Characters in MYSQL
You can replace the special characters in an expression transformation.
REPLACESTR(1,Column_Name,'?',NULL)
REPLACESTR - Function
1 - CaseFlag (a nonzero value makes the match case sensitive)
Column_Name - Column name which has a special character
? - Special character
NULL - Replacing character
You need to fetch rows with the appropriate character set defined on your connection. What is the connection you're using, ODBC or native? What's the DB?
Special characters are a challenge. Having checked the Informatica Network, I can see there is a kludge involving replace_str: first set a variable to the string with all non-special characters, then use the resulting variable in a replace_str so that the final value has only the allowed characters. See https://network.informatica.com/thread/20642 (awesome workaround by nico, so long as you can positively identify every character that should be allowed).
As an alternative kludge, I would also try an XML transformation somewhere within the mapping, since Informatica conveniently converts special characters to encoded values (decimal or hex, I can't remember). So long as you can live with these encoded values appearing in your target text you should be fine (and build some extra space into your strings to accommodate any bloat from the extra characters).
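The "keep only the allowed characters" idea can be illustrated outside Informatica as follows. This is a minimal Python sketch, not Informatica expression syntax; the whitelist (printable ASCII) and the name keep_allowed are assumptions chosen for the example, and the real set of allowed characters depends on what the target database accepts:

```python
import re

# Hypothetical whitelist: printable ASCII only (space through tilde).
# Anything outside this range, including U+FFFD, counts as disallowed.
DISALLOWED = re.compile(r"[^\x20-\x7E]")

def keep_allowed(value: str, replacement: str = "") -> str:
    """Replace every character outside the whitelist with `replacement`."""
    return DISALLOWED.sub(replacement, value)

print(keep_allowed("Caf\ufffd au lait"))
```

A whitelist is safer than listing bad characters one by one, but it also drops legitimate accented letters unless the range is widened to include them.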

Add a '~' symbol in the HL7 message

I am exporting an HL7 message.
One field has a tilde symbol (~) in the input.
HL7 converts it into the sequence "\R\".
I also tried exporting this value by using the ASCII value (126) for the '~' character using VBScript as I am .
But that was also converted by HL7 to "\R\".
How can I get the '~' exported?
Any help would be appreciated.
HL7 escapes the repetition character "~" to "\R\" when transferring a message. The receiver should then change it back to your tilde when working with that field.
But there is a second way to deal with this issue. HL7 allows you to change the encoding characters. Unfortunately, not all HL7 engines support that.
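The escaping and unescaping described above can be sketched as follows. This is a minimal Python illustration assuming the default HL7 v2 delimiters (| ^ ~ \ &); the names ESCAPES, hl7_escape and hl7_unescape are hypothetical, and real HL7 engines handle further escape sequences that are not covered here:

```python
# Default HL7 v2 delimiters and their escape sequences
# (assumes the message declares the default encoding characters).
ESCAPES = {
    "\\": "\\E\\",  # escape character
    "|": "\\F\\",   # field separator
    "^": "\\S\\",   # component separator
    "&": "\\T\\",   # subcomponent separator
    "~": "\\R\\",   # repetition separator
}
UNESCAPES = {seq: ch for ch, seq in ESCAPES.items()}

def hl7_escape(text: str) -> str:
    """What the sender does: replace each delimiter with its escape sequence."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)

def hl7_unescape(text: str) -> str:
    """What the receiver does: turn escape sequences back into literal characters."""
    out = []
    i = 0
    while i < len(text):
        chunk = text[i:i + 3]
        if chunk in UNESCAPES:
            out.append(UNESCAPES[chunk])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(hl7_escape("approx ~5 mg"))
print(hl7_unescape(hl7_escape("approx ~5 mg")))
```

Seen this way, "\R\" on the wire is the expected representation of a literal tilde, and the original value is recovered once the receiver unescapes the field.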
This character (~) represents that this field can have multiple values.
Consider this PID.3 field from a given HL7 message
12345^^^XYZ~6789^^^PQR
This means that the patient has 2 patient IDs coming from different sources, namely XYZ and PQR. This is what the (~) character means functionally.
Going by the statement in the question body, I believe you want to achieve the functionality of (~).
To do this, try the process below. I don't know VBScript so I can't give you the code; however, I have some JavaScript code for the same task, and I think you can mimic it in VBScript. I'll leave that task to you.
// Calculate the number of current repetitions by counting the length
var pidfieldlen = msg.PID['PID.3'].length();
// Store the last field node (if the length is 5, the node index is 4)
var lastpidnode = msg['PID']['PID.3'][pidfieldlen - 1];
// Create a new PID.3 field and append it to the last PID.3 node
var newpidfield = <PID.3/>; // Creating a new separate element for PID.3
newpidfield['PID.3.1'] = "567832"; // Adding field values
newpidfield['PID.3.4'] = "NEW SOURCE";
lastpidnode.appendChild(newpidfield); // Adding the element created above to the last node
This will transform the PID.3 into
12345^^^XYZ~6789^^^PQR~567832^^^NEW SOURCE
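At the level of the raw field text, the transformation above amounts to joining the new components with ^ and appending them after a ~. A minimal Python sketch of that string operation (the function name append_repetition is hypothetical, and the values are assumed to contain no unescaped delimiters):

```python
def append_repetition(field, components, rep_sep="~", comp_sep="^"):
    """Append one repetition, built from its components, to an HL7 field string."""
    # Empty strings in `components` stand for empty component positions.
    new_rep = comp_sep.join(components)
    # An empty field gets the repetition without a leading separator.
    return f"{field}{rep_sep}{new_rep}" if field else new_rep

pid3 = "12345^^^XYZ~6789^^^PQR"
print(append_repetition(pid3, ["567832", "", "", "NEW SOURCE"]))
```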
Try replacing the tilde characters with &#x7E; (hex) or &#126; (decimal).
See the unicode reference for this character.
If you have already done so, this is not the source of error. I suspect that HL7 attaches a special meaning to this character. According to this webpage it denotes a "Field Repeat Separator".

Writing an interpreter in C++

I'm working on a C++ project which should do the following:
Open a .txt file which contains list of strings
(for example String1: "Hi,name_1_is,;Ondrej,age24;year,,88;") with optional values determined by empty commas ",,".
After this check each string using regular expressions for valid input
(like "Hi" shouldn't be a number or "1" must be a number and everything with ",," is optional and can be skipped or user can enter this value as well).
Then evaluate the result and save it to variable or new .txt generated file.
This result shows if whole string is correct with an "ok" message attached to it or it will attach "not ok" message right to the parameter with wrong input.
I have already finished the part with opening a .txt file, checking the whole string and saving the right strings to the new file (using Qt and Visual Studio 2010 Express).
I need to do the part where each parameter is checked, but I don't know exactly how, as I should not build a parser; the whole program must be built like an interpreter.
I'm stuck at this point because I have no idea how to start building this as an interpreter.
All my attempts have ended up with a structure similar to a parser
(that is: I split the string, then checked each token or char using regex, then put the string back together again, etc.)
Could you please provide me with some useful links or tips on how to achieve that, or at least where to start?