encoding of query string parameters in IE10 - web-services

A customer has asked to be able to type my web service's URL, including its query string parameters, into the IE10 address bar and get the service results. The parameters include strings in Hebrew, like:
http://mywebsite.com/service.asmx/foo?param1=123&param2=מחרוזתבעברית
It seems to me that IE10 won't encode the query string parameters: every non-ASCII character after the ? is turned into a '3f' byte, though it does encode what comes before the ?, i.e. the path.
For example, if I try to reach this URL (the parameter is fictional, the URL is not, and I have no connection with the site)
http://www.shlomo.co.il/pageshe/sales/רכב-למכירה.asp?param=פאראם
and look in Wireshark at the bytes sent to the server, I can see that IE substitutes the Hebrew part of the path with a URL-encoded string, but substitutes the Hebrew parameter value with ?????, which are '3f' bytes.
The same string in Chrome is encoded in its entirety:
GET http://www.shlomo.co.il/pageshe/sales/%D7%A8%D7%9B%D7%91-%D7%9C%D7%9E%D7%9B%D7%99%D7%A8%D7%94.asp?param=%D7%A4%D7%90%D7%A8%D7%90%D7%9D HTTP/1.1
I tried it on machines with win7/IE10 and winXPheb/IE8.
In my IE settings I specifically checked the "Always show encoded addresses" option and restarted to see if it helps, but it made no difference.
I searched around for information about the issue, but didn't find much.
My questions are:
Is it indeed like this, or am I missing something?
Is this behavior documented anywhere?
Are there any settings in IE/Windows which enable encoding of the parameters?
p.s. Sure if I was developing the client/web ui, I would simply urlencode my query, but my request from customer was exactly to paste the query to IE address bar, that's why I'm interested in this specific behavior.
Thanks.

Yes, your observation of the behavior is accurate. Internet Explorer 10 and below follow a complicated algorithm for encoding the URL. This was allegedly updated in Internet Explorer 11, but I've found that the new option doesn't seem to work.
The "Always show encoded addresses" option concerns whether Punycode is shown for IDN hostnames, and does not impact the query string. "Send UTF-8 URLs" mostly applies to the encoding of the path, although it can also affect other codepaths.
The behavior isn't fully documented anywhere. I'd meant to write a full post on my IEInternals blog about it but ended up moving on from Microsoft before doing so. There's a partial explanation in this blog post.
Yes, there are settings that impact the behavior. The Send UTF-8 URLs checkbox inside Tools > Internet Options > Advanced is one of the variables that determines how URLs are sent, but the option does not blindly do what it implies (it only UTF-8 encodes the path, not the query string). Other variables involved include:
Where the URL was typed (e.g. address bar vs. Start > Run, etc)
What the system's ANSI codepage is (e.g. what locale the OS uses as default)
The charset of the currently loaded page in the browser
As a consequence of these variables, you cannot reliably use URLs which are not properly encoded (e.g. %-escaped UTF8) in Internet Explorer.
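For anyone who can hand the customer a ready-made link instead of raw text, here is a minimal Python sketch of what "properly encoded" means in practice: every non-ASCII value is %-escaped as UTF-8 before it ever reaches the address bar. The helper name build_service_url is hypothetical; the base URL and parameter names are the ones from the question.

```python
from urllib.parse import quote, urlencode


def build_service_url(base, params):
    """Return base + '?' + query, with every value %-escaped as UTF-8."""
    # quote_via=quote escapes spaces as %20 instead of '+'.
    return base + "?" + urlencode(params, quote_via=quote, encoding="utf-8")


url = build_service_url(
    "http://mywebsite.com/service.asmx/foo",
    {"param1": "123", "param2": "מחרוזתבעברית"},
)
print(url)  # pure ASCII, so the browser has nothing left to re-encode
```

A URL produced this way contains only ASCII characters, so it should not depend on the address bar, the system codepage, or the charset of the current page.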

Unfortunately this is still true for Internet Explorer 11 (build 11.0.9600.17358, win7-x64)
I see that, unfortunately, you cannot change the web server. However, those who are developing new services may consider changing request parameters into path variables, e.g. from http://myserver.com/page?τεστ into http://myserver.com/τεστ/

If the client is calling the web service from JavaScript,
encodeURIComponent can be used. In your case: encodeURIComponent("מחרוזתבעברית");
http://www.w3schools.com/jsref/jsref_encodeURIComponent.asp

Related

Using ColdFusion to show Chinese Characters from AS/400 server

I am writing a ColdFusion program that uses cfquery to get data from an AS/400 iSeries table and then output that data to a web page. Sometimes the data is in Chinese, but the Chinese characters are not output correctly.
I built the query below for testing,
<cfprocessingdirective pageEncoding="UTF-8" />
<cfquery name="Test" Datasource = "AS400">
select dsc1 from sales where ref = '123456'
</cfquery>
<cfoutput>#test.dsc1#</cfoutput>
The result should be "M5方头螺栓" but I only get "M5". I did another test running just:
<cfset x = "M5方头螺栓"/>
<cfoutput>#x#</cfoutput>
and it displays the Chinese no problem.
Since ColdFusion can display the characters when they are written out in the code, but not when it goes to get the data through SQL, it seems like the issue is with either my ODBC settings or my ColdFusion Server Data Source Settings but I'm not familiar enough with these settings to know what needs to be changed to get this working.
A workaround was found and discussed within the comments. Adding some details here as an answer for future visitors to this page.
There are a couple of considerations when dealing with Unicode (Chinese) characters:
The data type for the database table must be set to nvarchar
The form processing script (CFML) must be set to utf-8
I believe ColdFusion defaults to this, but you can specify the setting to be sure. For example: <cfprocessingdirective pageEncoding="utf-8">
Enable "String Format" within the ColdFusion datasource settings
In the ColdFusion Administrator, select the datasource you are using under the datasource settings, then click the "Show Advanced Settings" button. That will show an option labeled "String Format: Enable High ASCII characters and Unicode for data sources configured for non-Latin characters". Select this option and save the datasource.
The issue for the OP was that they were using an ODBC datasource and the "String Format" option was not available. After some research and the lack of finding any way to configure an ODBC datasource for that setting I recommended trying to use the builtin JDBC driver for "DB2 Universal Database" that comes with ColdFusion. Switching to that driver resolved this issue for the OP.
From the comments
Good info. Though is "Enable String Format..." necessary with the added support for cf_sql_nvarchar in CF10+? – #Leigh
I do believe Leigh is correct that the newer versions of ColdFusion (10 and later) have much better support for nvarchar fields.
Also to note: it looks like some older versions of ColdFusion don't always work with the installed DB2 Universal Driver, and the older Standard editions don't appear to include it at all. I'm not sure whether the newer ones do either, but using the "other" option with jt400.jar should also work. - #MHall
You've already proven that CF can output UTF-8 characters correctly. Have you tried running that query in the DB console or UI? Do you get the correct characters?
If the characters were stored as VARCHAR and not NVARCHAR, then there's nothing you can do. The data has to have been properly stored in the first place.
If the characters are stored correctly in the DB, try adding <cfprocessingdirective pageEncoding="utf-8"> at the top of the request. CF should be using UTF-8 by default, but this will force the correct character set if, for some reason, it isn't.

How to disable Sitecore's embedded language parser

I have a site that has many URL rewrites and a good portion of them contain old links that are prefixed with a country code (e.g. /fr, /de, etc). Rewrites without the prefixes work just fine but those with trigger Sitecore's embedded language URL parser which bypasses the rewrite module entirely.
Example
/fr/old-link tries to parse 'fr' as a language and fails as 'fr-FR' is the name of the French language.
Solution: I need to disable Sitecore's ability to detect a language prefix in the URL so the URL rewrite module can proceed unhindered.
I can't find where in the pipeline this occurs. I've gone through a number of them with Reflector and come up short. I need help, please.
Another pipeline to look at is the preprocessRequest pipeline. It has a StripLanguage processor that detects if the first part of the URL is a language and acts on it.
More info on how to get Sitecore to ignore the language part of the url can be found in this post http://sitecoreblog.patelyogesh.in/2013/11/sitecore-item-with-language-name.html
You will need to create a new LanguageResolver to replace the standard Sitecore one (Sitecore.Pipelines.HttpRequest.LanguageResolver). This is referenced in the <httpRequestBegin> pipeline section in web.config. Here you can handle requests beginning with fr as opposed to fr-FR etc. In the past I have done a similar thing for when we wanted to use non-ISO language codes.
EDIT
The LanguageResolver resolves language based on query string first, but will also resolve based on file path (i.e. having fr-FR in the start of your path). I think you would need to inherit from the Sitecore LanguageResolver and override the GetLanguageFromRequest method changing the else statement to use something different to Context.Data.FilePathLanguage - possibly just using regex/string manipulation to get the first folder from the URL then use that to set the context language. This should prevent the failure to resolve language which I understand is killing your URL rewrite module.

cfhttp how to not encode plus sign

Situation: I am trying to call the LinkedIn API from a ColdFusion CFC to get the user's profile and network (connections). The LinkedIn API states that to do this you must call a URL with scope=r_fullprofile+r_network.
Issue: ColdFusion is automatically encoding the URL, so the plus sign is getting encoded, and LinkedIn is rejecting my call. Is there any way around this? I've posted a link below to some code snippets on github which I believe illustrate the issue.
https://gist.github.com/4535364
Any help would be appreciated!
I have searched around on this for a bit and I am seeing lots of examples where ColdFusion is not playing nicely with the LinkedIn API. So I'm afraid that if you do get past this issue (although I have not come up with an alternative yet), another will crop up. While searching I found several suggestions from people to use linkedin-j, a Java wrapper for the LinkedIn APIs, instead. Here are some of the references that I found:
Working example Coldfusion and Linkedin API
LinkedIn-J does not return educations
401 Unauthorized response. API people/~ and people/id=; ColdFusion, cfhttp
Problem updating status - 401 unauthorized - ColdFusion
linkedin-j Getting Started
Side note: your GitHub code example is making a cfhttp call to 'receiver.cfm' but you called the file 'cfhttp_receiver.cfm'. In this line:
<cfhttp url="http://#cgi.http_host#/sandbox/receiver.cfm?scope=#url.scope#" method="post" resolveurl="no">
The scope field is a space delimited list.
The + character is commonly used as a shortcut for space, since it's more readable than %20 (which is what space encodes to).
If using a plus character results in an encoded plus (%2B) being sent, then you are left with two other ways of putting the space into the URL:
using a literal space character, or
using an encoded space %20
Try both of those options, ideally using a network sniffer (e.g. Wireshark) so that you can see exactly what is being sent.
Update: As per comments below, %20 is correct, but the signature based string needs to be encoded again, so for that the % becomes %25, giving a result of %2520.
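The double encoding described in the update can be checked with a few lines of Python. This only illustrates the escaping steps, not cfhttp itself or LinkedIn's signing code; the variable names are made up for the example.

```python
from urllib.parse import quote

# The scope is a space-delimited list, so start from a literal space.
scope = "r_fullprofile r_network"

# First pass: the space becomes %20 (this is the value sent in the URL).
encoded_once = quote(scope, safe="")

# Second pass, for the signature base string: the '%' itself becomes %25.
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)   # r_fullprofile%20r_network
print(encoded_twice)  # r_fullprofile%2520r_network
```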

How to retrieve codepage from cURL HTTP response?

I'm using libcurl as an HTTP client to retrieve various pages (it can be any URL, for that matter).
Usually the data comes as a UTF-8 string and then I just call "MultiByteToWideChar" and it works well.
However, some web pages still use a code-page encoding, and I see gibberish if I try to convert those pages as if they were UTF-8.
Is there an easy way to retrieve the code page from the data? Or will I have to scan it manually (for "encoding=") and then translate it accordingly?
If so, how do i get the code-page id from name (Code Page Identifiers)?
Thanks,
Omer
There are several locations where a document can state its encoding:
the Content-Type HTTP header
the (optional) XML declaration
the Content-Type meta tag inside the document header
for HTML5 documents the charset meta tag.
There are probably even more I've forgotten.
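As a rough illustration of checking the HTTP header and the two meta tag forms from the list, here is a minimal Python sketch. It only shows the lookup order, not a robust detector: the function name declared_charset and the table CODE_PAGE_IDS are hypothetical, and the table is a tiny hand-picked subset of the Windows code page identifiers, not a complete mapping.

```python
import re
from email.message import Message

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

# Illustrative subset of Windows code page identifiers (assumption: extend as needed).
CODE_PAGE_IDS = {
    "utf-8": 65001,
    "windows-1251": 1251,
    "windows-1252": 1252,
    "windows-1255": 1255,
    "iso-8859-1": 28591,
    "shift_jis": 932,
}


def declared_charset(content_type_header, body):
    """Return the lowercased charset a response declares, or None."""
    msg = Message()
    if content_type_header:
        msg["Content-Type"] = content_type_header
    charset = msg.get_content_charset()  # parses "...; charset=xyz"
    if charset:
        return charset
    # Fall back to a meta tag near the top of the document.
    match = META_CHARSET.search(body[:4096])
    return match.group(1).decode("ascii").lower() if match else None


name = declared_charset("text/html; charset=Windows-1255", b"")
print(name, CODE_PAGE_IDS.get(name))  # windows-1255 1255
```

The numeric identifier is what a Win32 caller would pass as the CodePage argument of MultiByteToWideChar.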
In the end, detecting the actual encoding is rather hard. You really shouldn't do this yourself but use high-level libraries for retrieving and parsing HTML content. I'm sure they are available even for C++, even if they have to be borrowed from a browser environment. :)
I used DetectInputCodepage in the IMultiLanguage2 interface and it worked great!

How to set a website as homepage in IE, Firefox, Chrome and Safari with C++?

Is there a way to set a website like google.com as homepage through C++ or C? How?
Not sure what your motive is, but I don't think of this as something I want any code on my system to be setting out from under me. It sounds like the kind of thing adware/malware would do to your grandparents (who wouldn't know how to fix it once it's set). Note the negative comments when the question was asked of how to do it from JavaScript:
How can I set default homepage in FF and Chrome via javascript?
It's better to point people at instructions for doing it themselves. Remind with a banner which says "Make us your homepage!", and link to something along these lines:
http://www.makeuseof.com/tag/how-to-change-your-homepage-in-5-browsers/
If not for the aesthetic reasons, there are technical reasons not to try and write code for it. Each browser stores this information in its own place. In IE's case, there appears to be a registry setting:
HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\Start Page
So you'd use calls to the Windows Registry API to query it and set it. But Firefox doesn't save this in the registry, it saves it in something called prefs.js and you'll be looking for:
user_pref("browser.startup.homepage", .... );
Then there's Opera, Safari, Chrome, etc. All told, better to just give people directions and put them in control of their experience!
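To show what that Firefox preference line looks like to a program, here is a read-only Python sketch that extracts the current homepage from the text of a prefs.js file. It deliberately does not write anything, in keeping with the advice above; the function name read_firefox_homepage and the sample text are made up for the example, and locating the profile directory is left out.

```python
import re

# Matches: user_pref("browser.startup.homepage", "<value>");
HOMEPAGE_PREF = re.compile(
    r'user_pref\("browser\.startup\.homepage",\s*"((?:[^"\\]|\\.)*)"\s*\);'
)


def read_firefox_homepage(prefs_text):
    """Return the homepage value found in prefs.js text, or None if unset."""
    match = HOMEPAGE_PREF.search(prefs_text)
    return match.group(1) if match else None


sample = (
    'user_pref("browser.startup.page", 1);\n'
    'user_pref("browser.startup.homepage", "https://www.google.com/");\n'
)
print(read_firefox_homepage(sample))  # https://www.google.com/
```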
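Imports Microsoft.Win32
...
Module Util
    ' Sets the Internet Explorer start page for the current user.
    Sub SetHomePage(ByVal theUrl As String)
        ' Registry.SetValue expects the full root name, not the "HKCU" abbreviation.
        Registry.SetValue("HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main", "Start Page", theUrl)
    End Sub
End Module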
Yes.
Find the way each browser saves its configuration to disk and edit that (*). It may be a file, or records in a database, or some data in a central registry, or some other scheme --- the browser documentation should tell you.
To open/read/write/save/close a file, the C functions declared in the header <stdio.h> may be helpful.
(*) For Firefox it's a file named "prefs.js" in a directory somewhere under the user's home path; there may be more than one such file if the user has more than one profile.