Using ColdFusion to show Chinese Characters from AS/400 server - coldfusion

I am writing a ColdFusion program that uses cfquery to get data from an AS/400 iSeries table and then output that data to a web page. Some times the Data is in Chinese, but it does not output the Chinese characters correctly.
I built the query below for testing,
<cfprocessingdirective pageEncoding="UTF-8" />
<cfquery name="Test" Datasource = "AS400">
select dsc1 from sales where ref = '123456'
</cfquery>
<cfoutput>#test.dsc1#</cfoutput>
The result should be "M5方头螺栓" but I only get "M5". I did another test running just:
<cfset x = "M5方头螺栓"/>
<cfoutput>#x#</cfoutput>
and it displays the Chinese no problem.
Since ColdFusion can display the characters when they are written out in the code, but not when it goes to get the data through SQL, it seems like the issue is with either my ODBC settings or my ColdFusion Server Data Source Settings but I'm not familiar enough with these settings to know what needs to be changed to get this working.

A workaround was found and discussed within the comments. Adding some details here as an answer for future visitors to this page.
There are a couple of considerations when dealing with Unicode (Chinese) characters:
The data type for the database table must be set to nvarchar
The form processing script (CFML) must be set to utf-8
I believe ColdFusion defaults to this but you can specify the setting to be sure.For example: <cfprocessingDirective pageEncoding=”utf-8″>
Enable "String Format" within the ColdFusion datasource settings
Under the ColdFusion administrator datasource settings select the appropriate datasource you are using. Then click on the "show advanced settings" button. That will show an option for "String Format" Enable High ASCII characters and Unicode for data sources configured for non-Latin characters. Select this option and save the datasource.
The issue for the OP was that they were using an ODBC datasource and the "String Format" option was not available. After some research and the lack of finding any way to configure an ODBC datasource for that setting I recommended trying to use the builtin JDBC driver for "DB2 Universal Database" that comes with ColdFusion. Switching to that driver resolved this issue for the OP.
From the comments
Good info. Though is "Enable String Format..." necessary with the added support for cf_sql_nvarchar in CF10+? – #Leigh
I do believe Leigh is correct that the newer versions of ColdFusion (10 and later) have much better support for nvarchar fields.
Also to note, it looks like some older versions of ColdFusion don't always work with the installed DB2 Universal Driver, and it doesn't look like the older standard versions even have it, I'm not sure if the newer ones have it either, but using the "other" option with jt400.jar, should also work. - #MHall

You've already proven that CF can output UTF-8 characters correctly. Have you tried running that query in the DB console or UI? Do you get the correct charaters?
If the characters were stored as VARCHAR and not NVARCHAR, then there's nothing you can do. The data has to have been properly stored in the first place.
If the characters are stored correctly in the DB, try adding <cfprocessingdirective pageEncoding="utf-8"> at the top of the request. CF should be using UTF-8 by defualt, but this will force the correct character set if, for some reason, it isn't.

Related

Referencing code-created datasources in Lucee

I have created a number of datasources in Lucee using code. This is for a legacy ColdFusion application that we are migrating to Azure, and per the powers-that-be, they want the DSNs created in code so we can store the DSN passwords in a keystore. I have that part already working.
The datasources look something like this: this.datasources["myDSN"]
If, in the code (Application.cfm), I do this:
<cfset myDSN = this.datasources["myDSN"]>
This will then fail:
<cfquery name="whatever" datasource="#myDSN#">
It fails with "datasource myDSN not found."
BUT, if I do this instead:
<cfquery name="whatever" datasource="#this.datasources['myDSN']#">
... it works fine.
Is there a workaround for this? At last check in this one application alone, there are 368 occurrences of datasource= in 115 files. I'd rather not have to do a bulk search/replace. It makes no sense to me that the variable "myDSN" would fail.
As there are multiple datasources being used, I can't just set the default datasource and remove the datasource= attribute entirely; even then, it'd still require a mass search/replace.
I must be missing something. I've read the Lucee docs on datasources but it hasn't helped. Thanks!
Turns out that Scott Stroz was correct. I switched over to Application.cfc and now it works fine.

How to solve the cfhtmltopdf encoding bug in coldfusion 11 update 8?

We have a system in coldfusion using cfhtmltopdf to generate reports in pdf.
The html of all reports are generated with the
< meta charset="iso-8859-1">
It was working well until we install the update 8 on it.
Now all the variables come from database has the acentuation bugged.
A "É" is showed like a "Ã%°"
A "Â" is showed like a "Ã,"
To fix the accentuation in the pdf just using encodeForHTML in each variable, and it's an impossible job, some reports have 200 variables from database.
Did you try cfprocessingdirective?
<cfprocessingdirective pageEncoding = "iso-8859-1">
</cfprocessingdirective>

Django character latin1 mysql

I create a django application with existing MySQL database, the problem is the encoding to database is latin1_general_c and the utf8 characters is save like this ñ => ñ , ó => ó, i need present the information in that page in correctly form but django show the information of database like this
recepción, 4 oficinas, 2 baños
i need show like this
recepcíon, 4 oficinas, 2 baños
For many reasons I can't change the database to utf8
what do I do for show information the correctly way?
Django seems to explicitly require that your database use UTF-8:
Django assumes that all databases use UTF-8 encoding. Using other encodings may result in unexpected behavior.
That said, it should be possible to use the custom OPTIONS setting to pass the desired character set to the database driver. See this answer, in which setting charset solved the problem. But you should check the documentation for the specific MySQL driver and database that you're using, since these options are not used by Django itself.
As suggested by the quote above, even if this works to correctly translate strings between the database and unicode, parts of Django still won't work correctly. For example, to correctly validate the length of a CharField Django has to know the encoding, and it will always assume UTF-8.

encoding of query string parameters in IE10

I got a request from a customer that he wants to be able to type the query string of my web service with parameters in the IE10 address bar and get the service results. The parameters include string in Hebrew, like:
http://mywebsite.com/service.asmx/foo?param1=123&param2=מחרוזתבעברית
It seems to me that that IE10 won't encode the query string parameters - every non-ASCII character that goes after the ? mark would be turned to '3f' byte, though it does encode what goes before the ? mark - the url itself.
For example, if i try to reach the url (the parameter is fictional, url is not, and I have no connection with the site)
http://www.shlomo.co.il/pageshe/sales/רכב-למכירה.asp?param=פאראם
and look in wireshark for the bytes I send to the server, it shows me
You can see it does substitute the hebrew part of the URL with urlencoded string, but substitutes the hebrew parameters with ?????, which are '3f's.
The same string in chrome would be encoded in it's entirety:
GET http://www.shlomo.co.il/pageshe/sales/%D7%A8%D7%9B%D7%91-%D7%9C%D7%9E%D7%9B%D7%99%D7%A8%D7%94.asp?param=%D7%A4%D7%90%D7%A8%D7%90%D7%9D HTTP/1.1
I tried it on machines with win7/IE10 and winXPheb/IE8.
My IE settings are (especially checked the "Always show encoded addresses option" to see if it helps and restarted, but made no difference):
I tried to search around for any info about the issue, but didn't find much of it.
My questions are:
Is it indeed like this, or am I missing something?
Is this behavior documented anywhere?
Are there any settings in IE/Win which enable the parameters encoding.
p.s. Sure if I was developing the client/web ui, I would simply urlencode my query, but my request from customer was exactly to paste the query to IE address bar, that's why I'm interested in this specific behavior.
Thanks.
Yes, your observation of the behavior is accurate. Internet Explorer 10 and below follow a complicated algorithm for encoding the URL. This was allegedly updated in Internet Explorer 11, but I've found that the new option doesn't seem to work.
The "Always show encoded addresses option" concerns whether PunyCode is shown for IDN hostnames, and does not impact the query string. Send UTF-8 URLs mostly applies to the encoding of the path, although it can also affect other codepaths
The behavior isn't fully documented anywhere. I'd meant to write a full post on my IEInternals blog about it but ended up moving on from Microsoft before doing so. There's a partial explanation in this blog post.
Yes, there are settings that impact the behavior. The Send UTF-8 URLs checkbox inside Tools > Internet Options > Advanced is one of the variables that determines how URLs are sent, but the option does not blindly do what it implies (it only UTF-8 encodes the path, not the query string). Other variables involved include:
Where the URL was typed (e.g. address bar vs. Start > Run, etc)
What the system's ANSI codepage is (e.g. what locale the OS uses as default)
The charset of the currently loaded page in the browser
As a consequence of these variables, you cannot reliably use URLs which are not properly encoded (e.g. %-escaped UTF8) in Internet Explorer.
Unfortunately this is still true for Internet Explorer 11 (build 11.0.9600.17358, win7-x64)
I saw that you can not unfortunately change the web server. However those who are developing new services may consider changing request parameters into path variables, e.g. from http://myserver.com/page?τεστ into http://myserver.com/τεστ/
If the client is calling the web-service from javascript,
encodeuricomponent can be used. In your case encodeuricomponent("מחרוזתבעברית");
http://www.w3schools.com/jsref/jsref_encodeURIComponent.asp

Data type support in ColdFusion querynew()

Does anyone know of a way to store values as NVARCHAR in a manually created query in ColdFusion using the querynew() function? I have multiple parts of a largish program relying on using a query as an input point to construct an excel worksheet (using Ben's POI) so it's somewhat important I can continue to use it as a query to avoid a relatively large rewrite.
The problem came up when a user tried storing something that is outside of the VARCHAR range, some Japanese characters and such.
Edit: If this is not possible, and you are 100% sure, I'd like to know that too :)
When creating a ColdFusion query with queryNew(), you can pass a list of datatypes as a second argument. For example:
<cfset x = queryNew("foo,bar","integer,varchar") />
Alternatively, you can use cf_sql_varchar (which you would use in queryparam tags). According to the livedocs, nvarchar is accepted for the CF varchar data type.
QueryParam livedoc (referenced for nvarchar data type)
QueryNew livedoc (referenced for data type definition)
Managing Data Types livedoc (referenced for using cf_sql_datatype)
The only thing I've been able to come up with so far is this:
<cfset x = QueryNew("foobar")/>
<cfset queryAddRow(x) />
<cfset querySetCell(x, "foobar", chr(163)) />
<cfdump var="#x#">
When dumped, this query does contain the British Pound symbol.
I haven't tried this with Ben's POI utility, but hopefully it helps you some.
You might try using JavaCast() to set the values, as shown here:
Kinky Solutions (Ben Nadel) on JavaCast()
Make sure you're using Unicode end-to-end.
This is pretty much all you need:
<cfprocessingdirective pageEncoding="utf-8">
ColdFusion (& java) stores string in UTF-8 by default. All you need is to tell CF that the encoding of the page is UTF8. The alternative way is to save the Byte-order mark (BOM), but Eclipse/CFEclipse doesn't do it.