GATE ontology editor with Arabic lang - gate

Hello,
I use GATE Developer 8.5 with the Ontology plugin's Ontology Editor.
I load my initial file "test.owl", whose class names are written in Arabic. But when I try to update this file, for example by adding a subclass and writing its name in Arabic,
I get an error: "invalid class name". So, how can I enable Arabic characters in the Ontology Editor interface?
Adding a subclass with an English name works fine.

I'm quite surprised that class names can use Arabic characters directly.
According to this answer:
https://stackoverflow.com/a/1547940/1857897
non-ASCII characters in URIs need to be percent-encoded.
You should be able to use Arabic characters in class labels without problems, but I'm not sure about class names...
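The encoding rule that answer refers to can be sketched in a few lines of Python (illustrative only; GATE itself is Java, but the URI rule is language-independent):

```python
from urllib.parse import quote, unquote

# Arabic text cannot appear raw in a strict URI; each UTF-8 byte of
# a non-ASCII character is percent-encoded instead.
arabic = "عربي"  # "Arabic"
encoded = quote(arabic)
print(encoded)  # %D8%B9%D8%B1%D8%A8%D9%8A

# The round trip is lossless, so a label shown to users can still be
# the plain Arabic text while the URI carries the encoded form.
assert unquote(encoded) == arabic
```

This is why labels (plain literal text) accept Arabic freely while names that become part of a URI may be rejected unless they are encoded.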

Related

Special Characters in Sitecore Item Names (EXM)

I'd like to create a new Sitecore item using a name that contains German characters. At the moment, every time the name contains "ä", "ö" or "ü", Sitecore complains about the name and prevents me from creating it.
I can see the same problem using the EXM module (E-mail Experience Manager). I cannot create any newsletters that contain special characters.
Is there any way I could change it?
The problem is caused by the regular expression used for the "Name" field validation.
You can resolve the issue by replacing the ItemNameValidation setting value with the following value, or (recommended) by adding a patch config file to your project:
^[\w\*\$]*[a-zA-ZäöüßÄÖÜẞ\][\w\s\-\$]*(\(\d{1,}\)){0,1}$
<setting name="ItemNameValidation" set:value="^[\w\*\$]*[a-zA-ZäöüßÄÖÜẞ\][\w\s\-\$]*(\(\d{1,}\)){0,1}$" />
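A patch file for the recommended approach might look like the following (a sketch; the `set` namespace URI and the usual App_Config/Include location are assumed from standard Sitecore conventions):

```xml
<configuration xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <settings>
      <setting name="ItemNameValidation"
               set:value="^[\w\*\$]*[a-zA-ZäöüßÄÖÜẞ\][\w\s\-\$]*(\(\d{1,}\)){0,1}$" />
    </settings>
  </sitecore>
</configuration>
```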

DCM4CHE cannot display Japanese characters

I am using dcm4che as my PACS, and I am inserting a DICOM file which contains the patient name in Japanese characters.
But the web-based UI of dcm4chee does not support Japanese characters and shows the patient name as garbled characters (like question marks and squares).
For DCM4CHE I am using PostgreSQL as the database. In the DB properties it shows 'Encoding: UTF8', 'Collation: English_India.1252' and 'Character Type: English_India.1252'. Does my DB support Japanese characters?
I am new to databases, and any help will be appreciated.
EDIT:
This issue was not related to the PACS. I obtained a valid DICOM file with Japanese characters (it uses Specific Character Set \ISO 2022 IR 87) and sent it to the PACS. It displays correctly in the PACS, so the issue is with my DICOM file. I also inserted the Specific Character Set '\ISO 2022 IR 87', but I am still getting garbled Japanese characters.
I am using the MergeCom DICOM utility and the 'MC_Set_Value_From_String' API for inserting the Japanese string. Am I missing anything? Is it not possible to insert Japanese characters using 'MC_Set_Value_From_String'? I am thinking of using the API MC_Set_Value_From_UnicodeString.
UTF-8 supports all Unicode code points, which includes Japanese, so it is unlikely that the database is the issue.
What is the content of the Specific Character Set (0008,0005) tag? The default character encoding for DICOM is ASCII. There is a section in the DICOM spec providing examples of use with Japanese.
I was able to solve the issue.
The issue was related to the encoding. For Unicode conversion, I was using the Windows API "WideCharToMultiByte" with the UTF-8 code page. This was not properly converting the Japanese characters; it was fixed by using code page 50222.
You can find the full list of code pages at the link below:
https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
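Windows code page 50222 is an ISO-2022-JP variant, which matches the DICOM character set "ISO 2022 IR 87" mentioned above. Outside the Windows API the same conversion can be sketched with Python's codec of that name (illustrative only; the original fix used WideCharToMultiByte):

```python
# Encode Japanese text as ISO-2022-JP, the escape-sequence-based
# encoding DICOM calls "ISO 2022 IR 87" for its JIS X 0208 part.
text = "ヤマダ"  # sample patient-name fragment
encoded = text.encode("iso2022_jp")

# The byte stream switches character sets with escape sequences:
# ESC $ B shifts into JIS X 0208, ESC ( B shifts back to ASCII.
assert encoded.startswith(b"\x1b$B")
assert encoded.endswith(b"\x1b(B")

# The round trip restores the original text.
assert encoded.decode("iso2022_jp") == text
```

If the converter used plain UTF-8 instead (as the original code did), the receiver interpreting the bytes as ISO 2022 IR 87 would see exactly the kind of garbled output described above.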

HTML labels with ocamlgraph

Is it possible to make a graph like this one with ocamlgraph? HTML labels have to be delimited with <> instead of "" and I don't see any mention of this functionality in the documentation.
OCamlgraph can parse this kind of dot node: the documentation for its Dot_ast module has an Html of string case in the id type for this. It seems it cannot print this kind of dot file, however, as the `Label constructor of the Dot attributes only handles plain strings.
If you need this feature, you could consider implementing it yourself (just change the files graphviz.ml and graphviz.mli); I'm sure the authors would be glad to have some contribution.
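For reference, this is what an HTML-like label looks like in dot syntax — the label value is delimited with < > rather than quotes (a minimal hand-written example, not produced by OCamlgraph):

```dot
digraph g {
  n1 [shape=plaintext, label=<
    <TABLE BORDER="0" CELLBORDER="1">
      <TR><TD>key</TD><TD>value</TD></TR>
    </TABLE>
  >];
}
```

A printer extension in graphviz.ml would need to emit the label between < and > without the usual string quoting and escaping.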

How to not transform special characters to html entities with owasp antisamy

I use OWASP AntiSamy with the eBay policy file to prevent XSS attacks on my website.
I also use Hibernate Search to index my objects.
When I use this code:
String html = "special word: été";
// use the eBay configuration file
Policy policy = Policy.getInstance(xssPolicyFile.getInputStream());
AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(html, policy);
result = cr.getCleanHTML();
// result is now: "special word: &eacute;t&eacute;"
As you can see, every "é" has been transformed into its HTML entity equivalent "&eacute;".
My page is UTF-8, so I don't need this transformation. Moreover, when I index this text with Hibernate Search, it indexes the word with HTML entities, so I can't find the word "été" in my index.
How can I force AntiSamy not to transform special characters into their HTML entity equivalents?
Thanks.
PS: an issue has been opened : http://code.google.com/p/owaspantisamy/issues/detail?id=99
I ran into the same problem this morning.
I have encapsulated AntiSamy in a class, and I use Apache's StringEscapeUtils from commons-lang to restore the special characters.
CleanResults cleanResults = antiSamy.scan(taintedHtml);
cleanedHtml = cleanResults.getCleanHTML();
return StringEscapeUtils.unescapeHtml(cleanedHtml);
The result is cleaned-up HTML without the HTML escaping of special characters.
Hope this helps.
As Mohamad said in a comment, AntiSamy has just released a new directive named entityEncodeIntlChars.
Here are the details: http://code.google.com/p/owaspantisamy/source/detail?r=240
It seems that this directive solves the problem.
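Based on the linked changeset, the directive would be declared in the policy file's directives section; presumably something like the following (the directive name comes from the changeset, but the value shown and its default are assumptions, so check your AntiSamy version):

```xml
<directive name="entityEncodeIntlChars" value="false"/>
```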
After scouring the AntiSamy source code, I found no way of changing this behavior apart from modifying AntiSamy.
Check out this one: http://code.google.com/p/owaspantisamy/source/browse/#svn/trunk/dotNet/current/source/owaspantisamy/html/scan
Grab the source and notice that the key classes (AntiSamyDOMScanner, CleanResults) use standard framework objects (like XmlDocument). Compile and run with the binary you compiled, so that you can see everything in a debugger and find which of the major classes actually corrupts your data. With that in hand you'll be able either to change a few properties on the major objects to make it stop, or to inject your own post-processing to revert the wrongdoing (say with a regexp). Later you can expose that as an additional top-level property, say one named NoMess :-)
Chances are that the behavior differs between languages (there are 3 in that trunk), but the same tactics will work no matter which one you have to deal with.

How can I access django admin's transliterator for slugs when using fixtures?

In the admin, when you enter a slug, two things are applied through JS:
The string is made slug-friendly.
The string is transliterated if the language is not English; for example, Cyrillic Russian text gets converted into transliterated Russian (typed out in English).
I'm basically inserting a couple thousand rows and I need to access this. Does django provide a non-js server-side version of this transliterator which I can access to somehow do the insertion?
Looks like I have to port over the usr/lib/pymodules/python2.5/django/contrib/admin/media/js/urlify.js code, unless I can figure out a way to programmatically load all articles on the client side and let them slugify themselves automatically.
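A server-side port of urlify.js's mapping-table approach can be sketched as follows (a minimal sketch: the TRANSLIT table here is a small illustrative subset of Russian Cyrillic, not the full tables from urlify.js, and slugify_translit is a hypothetical helper name):

```python
import re

# Partial Cyrillic-to-Latin map, mirroring the kind of lookup table
# urlify.js uses client-side (illustrative subset only).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "ch",
    "ш": "sh", "щ": "shch", "ъ": "", "ы": "y", "ь": "", "э": "e",
    "ю": "yu", "я": "ya",
}

def slugify_translit(text):
    # Transliterate character by character, then apply the usual slug
    # rules: lowercase, drop non-word characters, hyphenate whitespace.
    text = "".join(TRANSLIT.get(ch, ch) for ch in text.lower())
    text = re.sub(r"[^\w\s-]", "", text).strip()
    return re.sub(r"[\s_-]+", "-", text)

print(slugify_translit("Привет мир"))  # privet-mir
```

With something like this in place, the few thousand rows can be slugified in the same insertion script, with no client-side JS involved.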