How are Unicode values encoded in Google Cloud Load Balancer custom headers? - google-cloud-platform

When using Google Cloud Load Balancer, you can add certain fields to custom headers, such as {client_city}, which are presumably Unicode strings.
I cannot find in the documentation how these are encoded in the header (bearing in mind that HTTP headers are ASCII only, or possibly Latin-1 according to the "obsolete" productions in the HTTP RFC).
Since I'm in a city with an ASCII name, I've not been able to run a test from my local box.

Google has a document for this:
client_city: Name of the city from which the request originated, for example, “Mountain View” for Mountain View, California. There is no canonical list of valid values for this variable. The city names may contain US-ASCII letters, numbers, spaces, and the following characters: !#$%&'*+-.^_`|~.
Note that this means {client_city} itself is restricted to US-ASCII. Some headers can be Unicode, whereas the well-known headers are ASCII.
https://cloud.google.com/load-balancing/docs/backend-service
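For context, these fields are configured as custom request headers on the backend service. A minimal sketch of the setup (the flag is from the gcloud reference; the service name and the X-Client-Geo header name here are arbitrary examples):

gcloud compute backend-services update my-backend-service \
    --global \
    --custom-request-header='X-Client-Geo:{client_region},{client_city}'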

Related

Is HTML supported in the subject line of an email?

I'm sending emails using the Django framework.
I wonder whether it's possible to make the subject line red, using <span style="color:red">Some subject line</span>.
No, HTML tags will not be rendered in the subject field of an email by RFC 2822-compliant clients.
The RFC defines lexical tokens (HTML tags etc.) for use in the body, not in the header fields; the subject is part of the header fields.
Note that this is not a limitation of Django.
If you want fanciness, you might want to look into including Unicode characters, which is becoming more and more popular these days.
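For reference, Unicode in a subject line travels as an RFC 2047 encoded-word rather than raw bytes; a subject such as "Hello ☀" would be sent roughly like the line below (Django's email machinery performs this encoding for you):

Subject: =?utf-8?b?SGVsbG8g4piA?=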

DCM4CHE cannot display Japanese characters

I am using dcm4che as my PACS, and I am inserting a DICOM file which contains the patient name in Japanese characters.
But the web-based UI of dcm4chee does not support the Japanese characters and shows the patient name garbled (question marks and squares).
For DCM4CHE I am using PostgreSQL as the database. In the DB properties it shows 'Encoding as UTF8', 'Collation as English_India.1252' and 'Character Type as English_India.1252'. Does my DB support Japanese characters?
I am new to databases and any help will be appreciated.
EDIT:
This issue was not related to the PACS. I obtained a valid DICOM file with Japanese characters (it uses the specific character set \ISO 2022 IR 87) and sent it to the PACS. It shows correctly in the PACS, so the issue is with my DICOM file. I also inserted the specific character set '\ISO 2022 IR 87', but I still get garbled Japanese characters.
I am using the MergeCom DICOM utility and the 'MC_Set_Value_From_String' API for inserting the Japanese string. Am I missing anything? Is it not possible to insert Japanese characters using 'MC_Set_Value_From_String'? I am thinking of using the API MC_Set_Value_From_UnicodeString instead.
UTF-8 supports all Unicode code points, which includes Japanese, so it is unlikely the database is the issue.
What is the content of the Specific Character Set (0008,0005) tag? The default character encoding for DICOM is ASCII. There is a section in the DICOM spec providing examples of its use with Japanese; a sketch of the idea follows.
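To illustrate, the following sketch uses DCMTK (a different toolkit from the MergeCom library in the question, so treat it as an illustration rather than the fix) to write a dataset that declares ISO 2022 IR 87 and stores a patient name whose ideographic group uses JIS X 0208 escape sequences; the bytes are taken from the Japanese example in DICOM PS3.5 Annex H:

// Compile against DCMTK's dcmdata module.
#include "dcmtk/dcmdata/dctk.h"

int main()
{
    DcmFileFormat file;
    DcmDataset *dataset = file.getDataset();

    // Declare the encoding first: without (0008,0005) the default is
    // ASCII and viewers will render the Japanese bytes as garbage.
    dataset->putAndInsertString(DCM_SpecificCharacterSet, "\\ISO 2022 IR 87");

    // "Yamada^Tarou" plus the ideographic group; the \x1B$B ... \x1B(B
    // escape sequences switch JIS X 0208 on and off.
    dataset->putAndInsertString(DCM_PatientName,
        "Yamada^Tarou=\x1B$B;3ED\x1B(B^\x1B$BB@O:\x1B(B");

    file.saveFile("japanese.dcm", EXS_LittleEndianExplicit);
    return 0;
}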
I was able to solve the issue.
It was related to the encoding. For the Unicode conversion I was using the Windows API "WideCharToMultiByte" with the UTF-8 code page, so the bytes written did not match the declared \ISO 2022 IR 87 character set; this was fixed by using code page 50222 (ISO-2022-JP) instead.
You can find the full list of code page identifiers at the link below.
https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
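A minimal sketch of that conversion (standard Win32 API; note that for the 50xxx code pages the dwFlags argument must be 0):

#include <windows.h>
#include <string>

// Convert UTF-16 text to ISO-2022-JP bytes via code page 50222.
std::string ToIso2022Jp(const std::wstring &wide)
{
    // First call computes the required buffer size.
    int needed = WideCharToMultiByte(50222, 0, wide.c_str(), (int)wide.size(),
                                     NULL, 0, NULL, NULL);
    std::string out(needed, '\0');
    // Second call performs the actual conversion.
    WideCharToMultiByte(50222, 0, wide.c_str(), (int)wide.size(),
                        &out[0], needed, NULL, NULL);
    return out;
}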

encoding of query string parameters in IE10

I got a request from a customer who wants to be able to type the query string of my web service, parameters included, into the IE10 address bar and get the service results. The parameters include a string in Hebrew, like:
http://mywebsite.com/service.asmx/foo?param1=123&param2=מחרוזתבעברית
It seems to me that IE10 won't encode the query string parameters: every non-ASCII character after the ? mark is turned into a '3f' byte (a literal question mark), although it does encode what goes before the ? mark, i.e. the URL path itself.
For example, if I try to reach the URL (the parameter is fictional, the URL is not, and I have no connection with the site)
http://www.shlomo.co.il/pageshe/sales/רכב-למכירה.asp?param=פאראם
and look in Wireshark at the bytes I send to the server, I can see it does substitute the Hebrew part of the URL path with a URL-encoded string, but substitutes the Hebrew parameters with ?????, which are '3f' bytes.
The same string in Chrome is encoded in its entirety:
GET http://www.shlomo.co.il/pageshe/sales/%D7%A8%D7%9B%D7%91-%D7%9C%D7%9E%D7%9B%D7%99%D7%A8%D7%94.asp?param=%D7%A4%D7%90%D7%A8%D7%90%D7%9D HTTP/1.1
I tried it on machines with Win7/IE10 and WinXP (Hebrew)/IE8.
In my IE settings I specifically checked the "Always show encoded addresses" option to see if it would help, and restarted, but it made no difference.
I tried to search around for any info about the issue, but didn't find much.
My questions are:
Is it indeed like this, or am I missing something?
Is this behavior documented anywhere?
Are there any settings in IE/Windows which enable encoding of the parameters?
P.S. Of course, if I were developing the client/web UI, I would simply URL-encode my query; but the customer's request was precisely to paste the query into the IE address bar, which is why I'm interested in this specific behavior.
Thanks.
Yes, your observation of the behavior is accurate. Internet Explorer 10 and below follow a complicated algorithm for encoding the URL. This was allegedly updated in Internet Explorer 11, but I've found that the new option doesn't seem to work.
The "Always show encoded addresses option" concerns whether PunyCode is shown for IDN hostnames, and does not impact the query string. Send UTF-8 URLs mostly applies to the encoding of the path, although it can also affect other codepaths
The behavior isn't fully documented anywhere. I'd meant to write a full post on my IEInternals blog about it but ended up moving on from Microsoft before doing so. There's a partial explanation in this blog post.
Yes, there are settings that impact the behavior. The Send UTF-8 URLs checkbox inside Tools > Internet Options > Advanced is one of the variables that determines how URLs are sent, but the option does not blindly do what it implies (it only UTF-8 encodes the path, not the query string). Other variables involved include:
Where the URL was typed (e.g. address bar vs. Start > Run, etc)
What the system's ANSI codepage is (e.g. what locale the OS uses as default)
The charset of the currently loaded page in the browser
As a consequence of these variables, you cannot reliably use URLs which are not properly encoded (e.g. %-escaped UTF8) in Internet Explorer.
Unfortunately, this is still true for Internet Explorer 11 (build 11.0.9600.17358, Win7 x64).
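Given that, the practical workaround is to hand IE an already-escaped URL. A minimal sketch of %-escaping the UTF-8 bytes of a query value (plain C++, no library assumptions); the example string below produces exactly the %D7... sequence Chrome sends:

#include <cstdio>
#include <string>

// Percent-encode raw UTF-8 bytes per RFC 3986; unreserved characters
// pass through, everything else becomes %XX.
std::string PercentEncode(const std::string &utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~') {
            out += (char)c;
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

int main()
{
    // UTF-8 bytes of the Hebrew parameter from the example above.
    std::printf("%s\n",
        PercentEncode("\xD7\xA4\xD7\x90\xD7\xA8\xD7\x90\xD7\x9D").c_str());
    // Prints: %D7%A4%D7%90%D7%A8%D7%90%D7%9D
}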
I see that you unfortunately cannot change the web server. However, those who are developing new services may consider turning request parameters into path variables, e.g. from http://myserver.com/page?τεστ into http://myserver.com/τεστ/, since IE encodes the path even when it leaves the query string alone.
If the client is calling the web service from JavaScript, encodeURIComponent can be used. In your case: encodeURIComponent("מחרוזתבעברית");
http://www.w3schools.com/jsref/jsref_encodeURIComponent.asp

Django Mezzanine Unicode Error

I get a Unicode error when trying a URL like www.mysite.com/blog/category/πρακτικα/ or www.mysite.com/blog/πρακτικα/,
but I don't get the error when trying www.mysite.com/blog/tag/πρακτικα/.
UnicodeEncodeError at /blog/category/πρακτικα/: 'latin-1' codec can't encode characters in position 58-65: ordinal not in range(256)
Exception location: /home/vagrant/sullogos-venv/local/lib/python2.7/site-packages/django/template/loaders/filesystem.py in load_template_source, line 37
It seems to behave differently for categories and for tags.
The difference is that categories can have a custom template and tags can't. So in the category case, a template name is searched for using the category slug; the error you're getting is due to an incorrectly configured locale which doesn't support UTF-8.
This is not a problem with Mezzanine or Django, but with the environment used to deploy them. See this issue and this documentation for more details. It's not enough for Python to support a specific locale, but it's also necessary for the webserver to be able to handle Unicode files correctly.
How to fix it will depend on the webserver used. If you're using Apache, for instance, you need to set LANG and LC_ALL to Unicode-compatible values (on *NIX systems at least you should find them at /etc/apache2/envvars). An example would be:
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
Feel free to replace the language/country codes with ones more suitable for your needs (I used pt_BR instead of en_US and things worked fine for me). From the error message you're seeing, these settings on your system are probably using ISO Latin (ISO-8859-1) instead of UTF-8, and ISO-8859-1 cannot represent the Greek characters in your slugs.
If you're using a different webserver, check its documentation on localization/internationalization to see what needs to be changed. The important thing, as I understand it, is to support Unicode file names.

How to retrieve codepage from cURL HTTP response?

I'm using libcurl as an HTTP client to retrieve various pages (it can be any URL, for that matter).
Usually the data comes as a UTF-8 string, and then I just call "MultiByteToWideChar" and it works well.
However, some web pages still use code-page encodings, and I see gibberish if I try to convert those pages as UTF-8.
Is there an easy way to retrieve the code page from the data, or will I have to scan it manually (for "encoding=") and then translate accordingly?
If so, how do I get the code page ID from its name (Code Page Identifiers)?
Thanks,
Omer
There are several locations where a document can state its encoding:
the Content-Type HTTP header (libcurl can hand you this one directly; see the sketch below)
the (optional) XML declaration
the Content-Type meta tag inside the document header
for HTML5 documents the charset meta tag.
There are probably even more I've forgotten.
In the end, detecting the actual encoding is rather hard. You really shouldn't do this yourself, but use high-level libraries for retrieving and parsing HTML content; I'm sure they are available even for C++, even if they have to be borrowed from a browser environment. :)
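For the first location in the list, you don't even need to parse the response yourself: libcurl exposes the Content-Type header after a transfer. A minimal sketch (error handling trimmed; the URL is a placeholder, and note that some servers don't answer HEAD requests properly):

#include <curl/curl.h>
#include <cstdio>
#include <cstring>

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L); // HEAD request: headers are enough
    if (curl_easy_perform(curl) == CURLE_OK) {
        char *contentType = NULL;
        curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &contentType);
        if (contentType) {
            // Typically something like "text/html; charset=ISO-8859-1".
            const char *cs = strstr(contentType, "charset=");
            std::printf("charset: %s\n", cs ? cs + 8 : "(not declared)");
        }
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}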
I used DetectInputCodepage from the IMultiLanguage2 interface and it worked great!
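For completeness, a minimal sketch of that approach (MLang's IMultiLanguage2::DetectInputCodepage from mlang.h; COM must be initialized, and detection is a best-effort guess):

#include <windows.h>
#include <mlang.h>
#pragma comment(lib, "ole32.lib")

// Guess the code page of a raw byte buffer; falls back to UTF-8.
UINT GuessCodePage(char *data, int size)
{
    UINT cp = CP_UTF8;
    IMultiLanguage2 *ml = NULL;
    CoInitialize(NULL);
    if (SUCCEEDED(CoCreateInstance(CLSID_CMultiLanguage, NULL,
            CLSCTX_INPROC_SERVER, IID_IMultiLanguage2, (void **)&ml))) {
        DetectEncodingInfo info[4] = {};
        INT nInfo = 4;
        if (SUCCEEDED(ml->DetectInputCodepage(0, 0, data, &size, info, &nInfo))
                && nInfo > 0) {
            // Pick the candidate MLang is most confident about.
            INT best = 0;
            for (INT i = 1; i < nInfo; ++i)
                if (info[i].nConfidence > info[best].nConfidence)
                    best = i;
            cp = info[best].nCodePage;
        }
        ml->Release();
    }
    CoUninitialize();
    return cp;
}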