How to strip invisible 'hex c' character from us-ascii document that I cannot process in xslt - xslt

I'm reading in a us-ascii document and trying to parse it into XML:
<xsl:analyze-string select="unparsed-text($filename,'us-ascii')" regex="{$regex_clp}">
However, I'm getting the error:
XTDE1190: The unparsed-text file contains a character that is illegal in XML (line=51 column=2 value=hex c)
To identify this character, I did a find and replace on all the visible characters and newlines. I was left with a blank character in the document, which causes the same error at a different position when I run the XSLT script:
XTDE1190: The unparsed-text file contains a character that is illegal in XML (line=1 column=2 value=hex c)
When I copy and paste this 'hex c' into a Java application that attempts to strip it, and then try to delete it with the backspace key, it is not deleted. I can press backspace multiple times and the cursor stays in the same position next to the 'hex c' that I pasted.
I've uploaded the file containing the 'hex c' character here:
https://drive.google.com/file/d/1e0hkfraiSz39QEPV_zWn0ujyYcQknSCD/view?usp=sharing
Any idea what this character is and how to strip this character out of the file?
Regards
Conteh

Okay, so after I uploaded the text file to Google while writing this question, I downloaded out.txt again just to make sure it still had the same problem after the round trip through Google.
However, this time I could see a bunch of symbols in the text file. I copied them into OxygenXML's find and replace and saw that they were \f (form feed) characters.
I ran a find and replace using \f as the search pattern to remove them, and the problem was solved.
\f denotes the form feed character (0x0C), which is the 'hex c' value reported in the error message.
Regards
Conteh
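
For anyone who would rather clean the file outside an editor, here is a minimal sketch in Python (not XSLT) that removes the control characters XML 1.0 rejects, form feed included, before the file is handed to unparsed-text(). The function names and the file paths passed to clean_file are hypothetical, and it assumes the input really is us-ascii as described in the question.

```python
import re

# XML 1.0 forbids the C0 control characters except tab (0x09),
# line feed (0x0A) and carriage return (0x0D). Form feed (0x0C) is forbidden.
XML_ILLEGAL = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_xml_illegal(text: str) -> str:
    """Return text with XML 1.0-illegal control characters removed."""
    return XML_ILLEGAL.sub("", text)


def clean_file(src_path: str, dst_path: str) -> None:
    """Read a us-ascii file, drop illegal control characters, write a cleaned copy."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="ascii", newline="") as src:
        cleaned = strip_xml_illegal(src.read())
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(cleaned)
```

If the form feeds act as page separators that matter to the regex, replacing them with a newline instead of deleting them may be the safer choice.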

Related

Dataprep - accents and special characters

How do I solve this problem with accents / special characters in the dataprep? I need this information to appear.
Thank you very much for your attention.
DataPrep has built-in recipes that let you remove or change special characters. For example, you can change accented letters to unaccented ones with Remove accents in text, or replace unrecognised characters with another character using Replace text or patterns.
Below are the steps to change a special character or accented letter.
Create your flow.
Add/import your data
Click Add a recipe, as per documentation. In your case you can do one or both of the following:
First, if you have accented words, go to Search Transformations > select Remove accents in text. Then select the column that contains the accented words. The accented letters will be replaced with unaccented ones. Your data will be shown to you so you can check the transformation.
Second, if you have an unrecognised character, go to Search Transformations > Replace text or patterns > select the column you want to transform > in Find, write the letter/symbol between single quotes > in Replace with, write the character that should take its place. Finally, preview your data to see the transformation.
UPDATE: I was able to load a .csv file with the mentioned characters to DataPrep. Below are my steps and sample data:
The .csv file I used had the following content:
Test
Non rec. char É
Non rec. char ç
Accented word não
In the DataPrep UI home page, click Import Data (top right corner) > Google Cloud Storage (left part of the screen). Then find and select your file (test by importing a single file instead of parametrizing) and click the add (+) symbol. At this step you can already preview the characters; in my case they displayed normally. Finally, click Import & Wrangle and view your data. Using the data above, I was able to see the characters properly without any issues.

Google Apps Script - ReplaceText vertical tab

Whenever I paste text into a Google Docs document, all the newline characters get converted into vertical tab characters (\013 or \v). This happens regardless of the source of the clipboard text (webpage, Word, Notepad++).
Usually this means I have to work my way through the document clearing all the vertical tabs and replacing them with proper newlines by backspacing the character and hitting return. However, I want to write a script to replace all the characters in the doc at once. The Replace ui feature doesn't support newline characters but I'm hoping the scripting api does.
I have written the code below, but though it runs, the vertical tab characters are not replaced. I can still see hundreds in the document with the find/replace ui feature. What am I doing wrong?
function myFunction() {
  var body = DocumentApp.getActiveDocument().getBody();
  body.replaceText("\\v", "\n");
}

How to replace or ignore the Accented characters in SSIS

I have an SSIS package which first reads the input file, then validates it, and then processes it. The validation is carried out by a Script Task.
When the file is processed I get the error "invalid character in the given encoding". On checking, I found that this is caused by an accented character in the file, in the first name André.
I tried replacing these characters in the XSLT file using the replace(normalize-unicode()) function, but it's not working because the Script Task is called first.
Can anyone help me ignore or replace these special characters while processing the file?
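In a data flow task you can replace values using the applicable Unicode hex value. The following expression removes three common standalone accent marks (grave accent U+0060, acute accent U+00B4, and modifier letter grave accent U+02CB) by replacing each with an empty string: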
(DT_STR,500,1252)TRIM(REPLACE(REPLACE(REPLACE([YOUR_FIELD],"\x0060",""),"\x00B4",""),"\x02CB",""))
Find more here: http://www.utf8-chartable.de/
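
The expression above targets standalone accent characters. For accented letters such as the é in André, a common alternative is Unicode decomposition followed by dropping the combining marks. The sketch below shows that idea in Python rather than SSIS; strip_accents is a hypothetical helper, and running it as a pre-processing step before the package reads the file is an assumption, not something described in the question.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented letters with their unaccented base letters."""
    # NFD splits a letter like "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("André"))  # prints "Andre"
```

The input file still has to be decoded with the encoding it was actually written in; otherwise the "invalid character in the given encoding" error will occur before any replacement can run.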

Escape a line, which includes a special character while reading a text file C++

Using C++, I want to read a text file that contains sequence characters together with detail lines, which are marked by the special character ">". After reading the file, I want to add all the characters to an array. I have no idea how to skip the detail lines and the newline character "\n". Please help me get the characters into an array without the detail lines and newline characters.
Here is my example text file.
>ENA|JH373222|JH373222.1 Canis lupus familiaris chromosome 32 genomic scaffold chr32
GAATTCGTAGGTTTTCAGGATGATTTGAAAGTTATTTAGGGGGATCCCTGGGTGGCGCAG
CGGTTTGGTGCCTGCCTTTGGCCCGGGGCGCGATCCTGGAGGCCTGGGATCGAATCCCAC
GTCGGGCTCCCTGCATGGAGCCTGCTTCTCCCTCTGCCTGTGTCTCTGCCTCTCTGTCTC
TCTCTGTGTAACTATCATGAATAAATAAATAAAATCTTAAAAAAAAAAGAAAGTTATTTA
GGTAATTTGGTGGGGACAGGTGACTTGGGGACCCTACTCTTCGGCCATCTTGCAGCCTCC
TACTCTGTTTTCCGATTAAAATTGTTTCTAGGCAATGGCATCTGGAGGGTCAATGAGAAA
Just Googling "read fasta file c++" you can get answers like this (from Rosetta), this (from cprogramming), and more. All ready-made code that you can just copy and paste. Maybe a bit more time could have been spent researching this online.
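
The logic being asked for is small: skip every line that starts with ">", strip the line ending from the rest, and append the remaining characters to one array. Here is a minimal sketch of that logic, written in Python rather than C++; read_sequence_chars and the file name in the usage note are hypothetical. The same loop maps directly onto std::getline and a std::vector<char> in C++.

```python
from typing import Iterable, List


def read_sequence_chars(lines: Iterable[str]) -> List[str]:
    """Collect sequence characters, skipping '>' detail lines and newlines."""
    chars: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop the trailing newline
        if not line or line.startswith(">"):
            continue  # skip blank lines and detail (header) lines
        chars.extend(line)  # append each character of the sequence line
    return chars
```

Because an open file object yields its lines one at a time, a call such as `read_sequence_chars(open("sequence.txt"))` would work on a file like the example above.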

SAS - parse portion of rtf to another rtf

I am struggling hard on this one.
I need to copy a portion of the rtf1 document into the rtf2 document. Here is the basic approach:
1) Open rtf1 in Notepad and find the unique start point (line) and end point (line).
2) Copy the portion from the start point to the end point and insert it into the rtf2 document.
I know how to insert the portion into the rtf2 document but couldn't figure out how to extract it. The portion that needs to be copied is lengthy, so I need a way to specify the start point and end point and use those two reference points to extract everything that falls in between.
Thank you in advance for your valuable input.
Zora
Find startPoint.*?endPoint using Regular expression search mode with ". matches newline" enabled. Then copy the selected text to the clipboard.
.*? is a lazy match: it matches as few characters as possible, so the match stops at the first occurrence of the end point.
Remember to escape any special characters when defining the start- and endpoints.
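
The same search can be scripted. Below is a minimal sketch in Python (not SAS) of the startPoint.*?endPoint pattern; extract_between is a hypothetical helper, re.DOTALL plays the role of ". matches newline", and re.escape takes care of escaping special characters in the two markers.

```python
import re
from typing import Optional


def extract_between(text: str, start_point: str, end_point: str) -> Optional[str]:
    """Return the text from start_point through the first following end_point."""
    # Escape both markers so RTF characters such as "\" and "{" are literal.
    pattern = re.escape(start_point) + r".*?" + re.escape(end_point)
    # DOTALL lets "." match newlines, so the portion may span many lines.
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(0) if match else None
```

The function returns None when either marker is missing, which makes it easy to detect a start or end point that is not unique or not present.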
|.......before.......|....portion-to-copy....|.....after..........|
When I need to perform such a task, I use a simple method (in Notepad or Notepad++) that you can try, too:
With text cursor at start point, you can press Ctrl+Shift+Home and then Delete. This deletes all content before your portion. (Do not save the file.)
With text cursor at end point, you can press Ctrl+Shift+End and then Delete. This deletes all content after your portion. (Do not save the file.)
Now you have only your portion. Press Ctrl+A (select all), then Ctrl+C to copy it into the clipboard. In SAS: Press Ctrl+Home, then press Ctrl+Shift+End. Then press Ctrl+C.
Paste your portion where you need.
Close your original document (used in steps 1-3) without saving.