Error converting client characters into server's character set? - coldfusion

I have a textarea field into which users copy and paste text from a Word document. When a user tried to save the form, the server returned this error message:
Error converting characters into server's character set. Some character(s) could not be converted.
After looking carefully at the text the user entered in the field, I found the character causing the issue. When the user copies and pastes text from the Word document, the dash is different from (longer than) a standard hyphen: it is an en dash (U+2013). The text looks like this:
Red – All devices muted/Do not disturb is enabled
I replaced the dash with a standard hyphen, so the text looked like this:
Red - All devices muted/Do not disturb is enabled
and I was able to save the form. How can I correct or prevent this kind of issue? I use ColdFusion 2018 on the back end with a Sybase database. If anyone has suggestions on how to fix this error, please let me know.
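One way to confirm the diagnosis, sketched in Python rather than ColdFusion: list the characters of the pasted text that the target character set cannot represent and, if losing Word's typographic characters is acceptable, map its "smart" punctuation to ASCII before saving. The charset name `iso-8859-1` is an assumption standing in for the Sybase server's character set (check the actual server configuration), and `SMART_PUNCTUATION`, `unencodable_chars` and `normalize_punctuation` are hypothetical helpers, not part of any ColdFusion or Sybase API.

```python
# Hypothetical mapping of common Word "smart" punctuation to plain ASCII.
SMART_PUNCTUATION = {
    "\u2013": "-",    # en dash, as in "Red – All devices muted"
    "\u2014": "-",    # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
}


def unencodable_chars(text: str, charset: str) -> list:
    """Return the distinct characters in text that charset cannot encode, in order of appearance."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def normalize_punctuation(text: str) -> str:
    """Replace Word smart punctuation with ASCII equivalents (lossy)."""
    return text.translate(str.maketrans(SMART_PUNCTUATION))


if __name__ == "__main__":
    pasted = "Red \u2013 All devices muted/Do not disturb is enabled"
    # Assumed server charset: ISO-8859-1 has no code point for the en dash.
    print(unencodable_chars(pasted, "iso-8859-1"))
    print(normalize_punctuation(pasted))
```

With `cp1252` instead of `iso-8859-1` the en dash is encodable (as byte 0x96), so whether the error appears depends on the server's actual character set. Storing the text in a Unicode-capable character set end to end would avoid the lossy mapping altogether.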

Related

Handling UTF-8 characters in APEX

I have created a textarea page item. In the textarea, I paste a SQL query or type plain text containing spaces and line breaks. On submit, the same text needs to be displayed on the next page.
Because the SQL query or other entered text was displayed on a single line without line breaks, I used the code below, which solved the problem:
:P2_NEW := replace(replace(:P2_NEW, CHR(32), '&nbsp;'), CHR(10), '<br>');
htp.p(:P2_NEW);
But now, when I copy the SQL and try to run it in SQL Developer, it doesn't run. When I paste the SQL into Notepad++ and select Encoding > UTF-8, I see that wherever there was a space, it has been replaced by an 'xA0' character. Similarly, in text containing ü, the ü is replaced by 'xFC'.
What needs to be done so that spaces are not replaced by xA0 and ü is not replaced by xFC? (Please note that many similar characters, e.g. ä, are replaced in the same way.)
Please advise.
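A Python sketch of what Notepad++ is probably showing: in ISO-8859-1 (and Windows-1252) a non-breaking space (U+00A0) is the single byte A0 and ü is the single byte FC, whereas UTF-8 encodes them as C2 A0 and C3 BC. Seeing xA0 and xFC while viewing the text as UTF-8 therefore suggests the copied text is single-byte encoded, and that the "spaces" are really non-breaking spaces, which is what the HTML entity `&nbsp;` renders as. `show_encodings` and `restore_plain_spaces` are hypothetical helpers for illustration only.

```python
NBSP = "\u00a0"  # non-breaking space, what the HTML entity &nbsp; renders as


def show_encodings(ch: str) -> dict:
    """Return the bytes of ch in Latin-1 and in UTF-8 for comparison."""
    return {"latin-1": ch.encode("latin-1"), "utf-8": ch.encode("utf-8")}


def restore_plain_spaces(text: str) -> str:
    """Turn non-breaking spaces back into ordinary spaces (U+0020)."""
    return text.replace(NBSP, " ")


if __name__ == "__main__":
    print(show_encodings(NBSP))  # Latin-1 b'\xa0' vs UTF-8 b'\xc2\xa0'
    print(show_encodings("ü"))   # Latin-1 b'\xfc' vs UTF-8 b'\xc3\xbc'
    # A lone 0xFC byte is not valid UTF-8, so a UTF-8 view cannot show it as ü.
    try:
        b"\xfc".decode("utf-8")
    except UnicodeDecodeError as err:
        print("not UTF-8:", err.reason)
    print(restore_plain_spaces("SELECT" + NBSP + "*" + NBSP + "FROM dual"))
```

The cleanup step only matters for text that has already been converted; the underlying choice is whether spaces are turned into `&nbsp;` in the stored value or only at display time.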

Queries on "Get Regex matches" in Robocorp

I have a form in MS Word that the user fills in and emails to me. I have to open the form, capture all the details the user entered, and use them to submit a form in my portal.
I am trying to create a robot using Robocorp to automate this process. Using the "Get All Texts" keyword from the RPA Word library, I log the contents of the Word document in Robocorp and then try to extract the required data with regex, but I need some help writing the regex.
Please find the raw text logged in Robocorp below,
Source Text
Query 1:
I need to extract the manager's name.
In Regex101, I get the name returned as expected when using [^Manager\n].*
In Robocorp, when I use 'Get Regex Matches' with [^Manager\n].*, I get all the content of the text file.
Please help me with the regex to use in Robocorp to extract the Manager name.
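A likely explanation, illustrated in Python and assuming the Robocorp keyword hands the pattern to Python's `re` module: `[^Manager\n]` is a negated character class. It matches one character that is not M, a, n, g, e, r or a newline, and `.*` then takes the rest of the line, so the pattern does not mean "the line after Manager". If the text pasted into Regex101 uses `\n` line endings but the text Robocorp receives uses `\r` (as the `\r` in the Query 2 output below suggests), `.*` runs past every `\r` and one match swallows the whole text. The sample text and `extract_manager` are hypothetical, since the actual form text is not shown.

```python
import re

# Hypothetical form text; Word output may separate lines with \r.
SAMPLE = "Employee Name\rJane Doe\rManager\rJohn Smith\rStart Date\r01/02/2024"


def extract_manager(text: str):
    """Return the text on the line after the 'Manager' label, or None if absent."""
    # Skip line breaks, Word cell markers (\x07) and spaces, then capture up to the next break.
    match = re.search(r"Manager[\r\n\x07 ]+([^\r\n\x07]+)", text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # '.' matches \r in Python, so the negated-class pattern returns the entire text.
    print(re.findall(r"[^Manager\n].*", SAMPLE))
    print(extract_manager(SAMPLE))
```

Excluding `\r`, `\n` and `\x07` explicitly in the capture group makes the pattern behave the same whichever line ending the document uses.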
Query 2:
I need to extract the answers the user gave to the questions in the form above. (Note: the answers change with every form submitted.)
I tried the following.
For example, I first pulled one question from the form using Get Regex matches (?s)(Lunch).*?(No).
Robocorp returned this value:
['Lunch account required? \x07☒ Yes\r☐ No']
Then, using this returned value as the string, I tried to get the answer selected by the user with:
Get Regex matches (?<=☒)\s\w+
But I get the error "TypeError: expected string or bytes-like object".
I'm not sure whether this flow is right, or whether I can get the user's selected answers for all questions in a different way.
Sorry if my questions are simple. I am totally new to regex and still learning.
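The TypeError is consistent with passing a list where a string is expected: the matches call returns a list (the output above is `['Lunch account required? ...']`), and Python's `re` functions raise a TypeError of this kind when given a list instead of a string. The Python sketch below, assuming the same `\x07`/`\r` layout as that output, first takes element 0 of the match list as the string, then shows an alternative that pulls every question together with its checked (☒) answer in one pass. `SAMPLE_FORM`, `QA_PATTERN` and `checked_answers` are hypothetical, and the second question is invented to show more than one match.

```python
import re

# Hypothetical form text modelled on the output in the question.
SAMPLE_FORM = (
    "Lunch account required? \x07\u2612 Yes\r\u2610 No\r"
    "Parking needed? \x07\u2610 Yes\r\u2612 No\r"
)

# A question ending in '?', a cell marker, any unchecked options, then the checked option.
QA_PATTERN = re.compile(r"([^\r\x07]+\?)\s*\x07(?:\u2610 \w+\r)*\u2612\s*(\w+)")


def checked_answers(text: str) -> dict:
    """Map each question to the option marked with a checked box."""
    return {question.strip(): answer for question, answer in QA_PATTERN.findall(text)}


if __name__ == "__main__":
    matches = re.findall(r"(?s)Lunch.*?No", SAMPLE_FORM)  # a list of strings
    first = matches[0]  # pass one string, not the whole list, to the next search
    print(re.findall(r"(?<=\u2612)\s\w+", first))
    print(checked_answers(SAMPLE_FORM))
```

The single-pattern approach relies on every question ending with "?" and each option sitting on its own `\r`-separated line; if the real form differs, the pattern needs adjusting.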

How to find and replace box character in text file?

I have a large text file that I'll be processing programmatically, but I've run into problems with a special character scattered throughout the file. The file is far too large to scan manually for specific characters. I've been able to remove most of the other unwanted special characters with regex patterns, but there is a box character similar to "□". When I copy the character from the actual text file and paste it here, I get "�", so the box shown above comes from the Windows Character Map, which lists its code as 'U+25A1'. I'm not sure how to interpret that code or whether I can use it in a regex search.
Would anyone know how I could search for the box symbol similar to "□" in a UTF-8 encoded file?
EDIT:
Here is an example from the text file:
"� Prune palms when flower spathes show, or delay pruning until after the palm has finished flowering, to prevent infestation of palm flower caterpillars. Leave the top five rows."
The only problem is that, as mentioned in the original post, the square gets converted into a diamond question mark.
It's unclear where and how you are searching, although you could use the hex equivalent:
\x{25A1}
Example:
https://regex101.com/r/b84oBs/1
The black diamond with a question mark is not a character, per se. It is what a browser spits out at you when you give it unrecognizable bytes.
Find out where that data is coming from.
Determine its encoding. (Usually UTF-8, but might be something else.)
Be sure the browser is configured to display that encoding. Adding <meta charset=UTF-8> to the head of the page is likely to suffice.
I found a workaround using Notepad++ and this website. It's still not clear which encoding the square originally comes from, but when I paste it into the query field of the website above or into the Notepad++ Conversion Table (Plugins > Converter > Conversion Table), both give the hex code for the "Replacement Character", which is the diamond with the question mark.
Using this code in a regex, \x{FFFD}, in Notepad++ search found all the squares, although it identifies them as the Replacement Character.
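The same cleanup can be scripted outside Notepad++. A minimal Python sketch, assuming the file is meant to be UTF-8: reading with `errors="replace"` also turns any undecodable bytes into U+FFFD, so one pattern covering U+FFFD (plus U+25A1, in case genuine white squares are present) handles both cases. Python writes these code points as `\uFFFD` and `\u25A1` rather than the `\x{...}` syntax used in Notepad++ and on Regex101. The file is processed line by line because it is large; `strip_boxes` and `clean_file` are hypothetical names.

```python
import re
import sys

# U+FFFD REPLACEMENT CHARACTER or U+25A1 WHITE SQUARE, plus following spaces/tabs
# (not \s, so a box at the end of a line does not take the newline with it).
BOX_PATTERN = re.compile(r"[\uFFFD\u25A1][ \t]*")


def strip_boxes(text: str) -> str:
    """Remove replacement characters and white squares from text."""
    return BOX_PATTERN.sub("", text)


def clean_file(src: str, dst: str) -> int:
    """Copy src to dst line by line without box characters; return how many were removed."""
    removed = 0
    with open(src, encoding="utf-8", errors="replace") as infile, \
            open(dst, "w", encoding="utf-8") as outfile:
        for line in infile:
            cleaned, count = BOX_PATTERN.subn("", line)
            removed += count
            outfile.write(cleaned)
    return removed


if __name__ == "__main__" and len(sys.argv) == 3:
    print(clean_file(sys.argv[1], sys.argv[2]), "box characters removed")
```

Usage would be something like `python strip_boxes.py original.txt cleaned.txt` (the script and file names are placeholders).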

Regex validation for keyboard special characters - Grails

I am very new to Grails.
I know it provides ways to block input containing specific characters, using constraints and matches on the field.
I am using this to stop the user from entering any special keyboard character.
I have used
matches:/^[^$##*^%~]*$/
It checks that the field does not contain *^%$##~, and it works fine for this set of characters, but I also want to stop the user from entering +-(}/\|{[?]!<>~;',=&_.:" (in short, all the special symbols on the keyboard), using only these constraints. I have tried putting them in this regular expression pattern, but either it still allows them, or, when it does reject them, the error message does not show which characters were entered in the field.
For example, if I enter (+)&^, the error message shows only "Please do not enter ^.", but I want "Please do not enter (+)&^."
Please let me know if anyone knows how to do this.
Please also note that I am required to use only Grails/Groovy, no JS/jQuery.
Thanks
The regex below allows only alphanumeric characters and requires at least one character. If you don't want a minimum of one character, replace + with *.
/^[a-zA-Z0-9]+$/
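A sketch of both points, written in Python only to illustrate the logic (the question requires Groovy, so a custom validator would need to follow the same idea there). Note that `^[^a-zA-Z0-9]+$`, with the negated class, accepts only strings made entirely of special characters, the opposite of the intended check, while `^[a-zA-Z0-9]+$` accepts only alphanumerics. To name every offending character in the message, collect all matches of a "disallowed" class instead of relying on a single pass/fail match. Treating spaces as allowed is an assumption, and `disallowed_chars` and `error_message` are hypothetical names.

```python
import re

ALPHANUMERIC_ONLY = re.compile(r"^[a-zA-Z0-9]+$")
ONLY_SPECIALS = re.compile(r"^[^a-zA-Z0-9]+$")  # negated class: the opposite check
# Assumption: letters, digits and spaces are allowed; everything else is reported.
DISALLOWED = re.compile(r"[^a-zA-Z0-9 ]")


def disallowed_chars(value: str) -> str:
    """Return each disallowed character once, in order of first appearance."""
    return "".join(dict.fromkeys(DISALLOWED.findall(value)))


def error_message(value: str):
    """Build a message listing all offending characters, or None if the value is valid."""
    bad = disallowed_chars(value)
    return f"Please do not enter {bad}." if bad else None


if __name__ == "__main__":
    print(error_message("(+)&^"))
    print(error_message("Hello World 42"))
```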

ColdFusion -- Do I need URLDecode with form POSTs? / URLDecode randomly removes one character

I'm using a WYSIWYG to allow users to format text. This is the error-causing text:
<p><span style="line-height: 115%">This text starts with a 'T'</span></p>
The error is that the 'T' in "This", or whatever the first letter happens to be, is randomly removed when using URLDecode and saving to the DB. Removing URLDecode on the server side seems to fix it without any negative side-effects (the DB contains the same information).
The documentation says that
Query strings in HTTP are always URL-encoded.
Is this really the case? If so, why doesn't removing URLDecode seem to mess everything up?
So two questions:
Why is URLDecode causing the first text character to be removed like this (it seems to only happen when the line-height property is present)?
Do I really need (or would I even want) to use URLDecode before putting POSTed data into the database?
Edit: I made a test page to echo back the decoded text, and URLDecode is definitely removing that character, but I have no idea why.
I believe decoding is done automatically when the form scope is populated. That's why characters after % (the character used for encoding) are removed: you are decoding the string a second time.
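Python's `urllib.parse` is used below only to illustrate the general point of the answer, not ColdFusion's exact behavior: once a form field has already been decoded, decoding it again treats literal characters in the user's text as encoding syntax. Python's decoder leaves an invalid sequence such as `%"` untouched, so this sketch shows the damage with `+` signs instead, which form decoding turns into spaces. The sample string is hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus

original = "C++ & 50% off"

# What a browser sends for a form field (application/x-www-form-urlencoded).
encoded = quote_plus(original)

# First decode: done by the server when it populates the form values.
once = unquote_plus(encoded)

# Second decode: the extra URLDecode-style step in application code.
# The '+' signs the user typed are now read as encoded spaces.
twice = unquote_plus(once)

print(encoded)
print(once == original)
print(repr(twice))
```

The first decode already restores the user's text exactly, which matches the observation that removing URLDecode leaves the database contents unchanged.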
For security reasons, you might also want to strip script tags, or even clean up the HTML using a whitelist. Try searching CFLib.org for applicable functions.