Weka shows numbers in Persian and saves data output with question marks

I am about to use Weka to process a dataset that contains numeric and nominal values. Persian and Arabic are secondary languages on my Windows 7 operating system, so I assumed that this might be the reason that, when I try to save a loaded CSV data file in Weka in the ARFF file format, the numbers are all saved as question marks (?). However, even after removing these languages in the Control Panel settings, nothing changed.
Moreover, I recently upgraded my Java version from 8 to 9. I am not sure whether this could be the cause.
I searched the internet for a probable reason but could not find any solution. Thanks in advance, everybody.

I found the solution, and it was easy!
In the Control Panel, I selected the Region and Language option. Then in the Formats tab, I changed the Format option from Persian to English (United States), and everything turned out to be working alright in Weka!
Moreover, in the Administrative tab, it is better to set the Language for non-Unicode programs option to English (United States) as well.
Cheers ...
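
For anyone who would rather not change system-wide settings: a plausible mechanism here is the JVM's default locale, which controls how Java formats numbers. A minimal sketch of the effect (assuming a CLDR-based runtime such as Java 9, where the Persian locale uses Eastern Arabic digits):

    import java.text.NumberFormat;
    import java.util.Locale;

    public class LocaleDigitsDemo {
        public static void main(String[] args) {
            double value = 1234.56;

            // Persian locale: digits come out as Eastern Arabic numerals,
            // which do not round-trip as the ASCII digits ARFF expects.
            NumberFormat fa = NumberFormat.getInstance(new Locale("fa", "IR"));
            System.out.println(fa.format(value)); // e.g. ۱٬۲۳۴٫۵۶

            // US English locale: plain ASCII digits.
            NumberFormat us = NumberFormat.getInstance(Locale.US);
            System.out.println(us.format(value)); // 1,234.56
        }
    }

Launching Weka with explicit locale properties, for example java -Duser.language=en -Duser.country=US -jar weka.jar, should therefore have the same effect as changing the Format setting, without touching the Control Panel.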

Related

Using Arabic text with custom Font in Cocos2DX

I have a use case involving Arabic text in a game, with a custom font. I am currently using the createWithTTF API call and selecting the font file that I need.
However, since Arabic is a right-to-left (RTL) language rather than a left-to-right (LTR) one, the texts are getting printed incorrectly. Apparently, the best solution for this is to use the createWithSystemFont API call. However, with this call I would not be able to use a custom font and would have to resort to a system font.
Is there any way in Cocos2DX to use a custom font with Arabic text? I did look into this GitHub issue. I tried the Arabic Writer out, but it gives glitchy output in certain cases. I know that editing the source JSON/Plist file is an option, and I have tried using reversed Arabic strings in the source. However, since Arabic is a language with combined characters, the result I get on my UI is not 1:1 with the expected result, and some characters are disjointed (characters that are supposed to merge into a single form).
Looking for suggestions on how to tackle this. I have looked into almost all open threads related to this, and could not find anything conclusive. Thanks!
I wrote a fix for the Persian language. It works for Arabic as well, but you may need to add some Arabic-only characters to it. (It might need some editing.)
https://github.com/MohammadFakhreddin/cocos2dx-persian-arabic-support
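
For context on why reversed strings fall short: rendering Arabic involves two separate steps, bidirectional reordering and contextual glyph shaping. A minimal sketch of the distinction, written in Java only because the JDK ships a bidi API (cocos2d-x itself is C++ and provides neither step for TTF labels):

    import java.text.Bidi;

    public class RtlDemo {
        public static void main(String[] args) {
            String arabic = "\u0645\u0631\u062D\u0628\u0627"; // "marhaba"
            Bidi bidi = new Bidi(arabic, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
            // Bidi analysis only reorders runs of text for display.
            System.out.println("isRightToLeft: " + bidi.isRightToLeft());
            // Contextual shaping (choosing initial/medial/final joining
            // forms) is a separate step that naive string reversal skips,
            // which is why reversed strings render disjointed.
        }
    }

Libraries like the one linked above essentially pre-shape the string into presentation-form code points so that a plain LTR renderer draws it correctly.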

While writing records to a flat file using an Informatica ETL job, Greek characters come out as boxes

While writing records to a flat file using an Informatica ETL job, Greek characters come out as boxes. We can see the original characters in the database. At the session level, we are using UTF-8 encoding. We have a multi-language application and need to process Chinese, Russian, Greek, Polish, Japanese, etc. characters. Please suggest.
Try changing your code page encoding. I also faced this kind of issue. We were using ANSI encoding, so we created a separate integration service with a different encoding, and the file ran successfully.
There is an easier option. In the session properties, select the target flat file, then click Set File Properties. There you can change the code page and choose UTF-8. By default it is ANSI, which is why you are facing this issue.
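
The mechanism behind the boxes can be shown outside Informatica. A minimal Java sketch (plain JDK, nothing Informatica-specific) of what happens when UTF-8 bytes are interpreted under a single-byte code page:

    import java.nio.charset.StandardCharsets;

    public class CodePageDemo {
        public static void main(String[] args) {
            String greek = "\u0395\u03BB\u03BB\u03AC\u03B4\u03B1"; // "Ellada"

            byte[] utf8 = greek.getBytes(StandardCharsets.UTF_8);

            // Reading UTF-8 bytes as a single-byte code page garbles the
            // text, the flat-file analogue of the boxes in the question:
            System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));

            // Reading them with the matching code page restores it:
            System.out.println(new String(utf8, StandardCharsets.UTF_8));
        }
    }
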

Visual C++/MFC: getting Japanese characters to work without UNICODE

I have software originally developed 20 years ago in Visual C++ using MFC without UNICODE. Currently strings are held either in char[] or CString, and it works on English and Japanese Windows PCs until Japanese characters are used, as these tend to get converted to strange characters or empty boxes.
Setting UNICODE is presumably the way forward but will require a massive code change, whereas quite a lot seems to work simply by setting the System Locale to Japan (in Windows' “Language for non-Unicode programs” setting). I have no idea how Windows does this, but some Japanese character handling now works on my English Windows PC, e.g. I can open and save Japanese filenames with no code changes. And in Japan they set the System Locale to English, and again much works, but not everything.
I get the impression the problems are due to using a font that doesn't include Japanese characters. Currently I am using Arial / MS Sans Serif with the charset set to ANSI_CHARSET or DEFAULT_CHARSET. Is there a different font I should be using, or can I extend these fonts to include Japanese characters? Or am I barking up the wrong tree, in which case what do I do next? I am very new to all this, unfortunately …
That's a common question (OK, I guess not so common any more in 2015, as MBCS programs are luckily a dying breed - I still maintain several though...).
Either way, I'm afraid that, depending on your definition of 'working', you'll have to bite the bullet and convert to a Unicode build. If you can't make a business case for that, then you'll have to set the right locale (well, worse, have the user set the 'right' one), test what works and what doesn't, and ask more specific questions about what doesn't.
If your goal is to make one application that correctly displays strings in various encodings in the 'right' way regardless of the locale settings on the computer, and compatible with every input data set / database content without the user having to be aware of encoding issues, then you're out of luck with an MBCS build.
The font missing characters is most likely not the problem. Before you go any further and/or ask further questions, you should read http://www.joelonsoftware.com/articles/Unicode.html, read it again, sleep on it, read it again, and explain to somebody else what the relationship is between 'encoding', 'locale', 'character set', 'font', and 'Unicode code point', because only once you can do that can you decide how to progress with your application. Sorry, it's not what you want to hear, but it's the reality if you've been tasked with handling internationalization.
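
To make the locale dependence concrete, here is a small illustration, sketched in Java rather than MFC because the mechanism is identical: an MBCS build stores text in the system ANSI code page, so bytes written under a Japanese locale (Shift_JIS / CP932) turn into mojibake when interpreted under an English locale (windows-1252):

    import java.nio.charset.Charset;

    public class MbcsDemo {
        public static void main(String[] args) {
            String japanese = "\u65E5\u672C\u8A9E"; // "nihongo"

            // Bytes as a non-Unicode program on a Japanese-locale
            // machine would store them (ANSI code page = CP932):
            byte[] sjis = japanese.getBytes(Charset.forName("windows-31j"));

            // The same bytes interpreted under an English locale
            // (ANSI code page = windows-1252) become garbage:
            System.out.println(new String(sjis, Charset.forName("windows-1252")));
        }
    }

This is why switching the System Locale 'fixes' some things: it changes which code page every non-Unicode program uses to interpret its char[] data, independent of fonts.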

Localization Font

I am very new to localization. I am trying to localize a small piece of software which has 19 folders named 'en', 'jp', 'tw', and so on. Inside each one is a text file saved as UTF-8 with the language data.
The problem is that when I try to copy and paste from a Chinese site, I get strange glyphs like this: [][][][]. I presume it's because my system font is not Chinese and does not support that.
As a developer, should I somehow change my entire system font to have all of these languages supported? Is there such a font? I am unsure how software companies handle these things.
"As a developer, should I somehow change my entire system font to have all of these languages supported?"
No, you should not. Consider localization strings as data.
"The problem is that when I try to copy and paste from a Chinese site, I get strange glyphs like this: [][][][]. I presume it's because my system font is not Chinese and does not support that."
But you should be provided with such data, and you should know its encoding.
Also, I'd suggest you check out internationalization libraries (like gettext) to avoid reinventing the wheel.
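
As a sketch of the 'strings as data' idea, here is a minimal Java ResourceBundle example; gettext catalogs work on the same principle. The bundle base name 'messages' and the key are hypothetical, and note that before Java 9 .properties files are read as ISO-8859-1, so non-Latin text must be escaped (or the files loaded explicitly as UTF-8):

    import java.util.Locale;
    import java.util.ResourceBundle;

    public class I18nDemo {
        public static void main(String[] args) {
            // Hypothetical bundles on the classpath:
            //   messages_en.properties: greeting=Hello
            //   messages_zh.properties: greeting=\u4F60\u597D  ("ni hao")
            ResourceBundle bundle =
                    ResourceBundle.getBundle("messages", Locale.SIMPLIFIED_CHINESE);
            System.out.println(bundle.getString("greeting"));
        }
    }
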

Microsoft SQL Server 2000 records not in standard Unicode, created with Borland C++

I have recently been trying to replace our company's old program. One of the huge rocks in my way is that the old program was made with Borland C++, and it had its own way of connecting to the SQL Server 2000 database.
After 8 years, I'm trying to retire this program. But when I looked at the database, I freaked out!
The whole database was in a vague language that was supposed to be Persian.
I'll give you a portion of the database converted to SQL Server 2005 so you can see it for yourself. I've spent many days trying to figure out how to decode this data, but so far nothing has come of it.
Link to the sample Database File
So, if you can tell me how to use this data in Microsoft C#.NET, it will be much appreciated.
These are the datatypes used for them:
And this is how it looks:
Thanks a lot.
1) Analyse the existing program and the original database
Try to figure out how the C++ program stores Persian text in the database. What collations are defined on the original server, the database, and at column level?
Does the C++ program convert the data to be stored and retrieved from the database? If so, find out how.
It may well be that the program displays data in Persian but does not store it in a compatible way. Or it uses a custom font that supports a custom encoding. All this needs to be analysed.
2) The screenshots look as if everything Persian is encoded as single-byte values above CHAR(128).
Is this a standardised encoding or a custom one?
3) To migrate the database, you will most likely need to convert the data, mapping the original characters to Unicode characters.
First, recreate the tables using Unicode-enabled column types (NVARCHAR, NVARCHAR(MAX)) rather than CHAR and VARCHAR, which only support Latin or extended Latin.
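
To illustrate the conversion in step 3, here is a minimal sketch in Java of decoding legacy bytes to Unicode (the byte values and the windows-1256 guess are hypothetical; the real encoding has to be confirmed in step 1 first, and .NET offers the equivalent System.Text.Encoding.GetEncoding for the C# side):

    import java.nio.charset.Charset;

    public class PersianDecodeDemo {
        public static void main(String[] args) {
            // Hypothetical raw bytes read from a legacy VARCHAR column;
            // values above 128 carry the Persian text.
            byte[] raw = { (byte) 0xD3, (byte) 0xE1, (byte) 0xC7, (byte) 0xE3 };

            // If the legacy encoding turns out to be a standard code page
            // (windows-1256 covers Arabic script), a charset decode suffices:
            String decoded = new String(raw, Charset.forName("windows-1256"));
            System.out.println(decoded); // "salam", only if the guess is right

            // A custom encoding instead requires a hand-built
            // byte-to-character mapping table before inserting
            // into the new NVARCHAR columns.
        }
    }
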
4) Even if you successfully migrate your data, SSMS may not correctly display the stored data, due to font settings or OS support.
I summarized the difficulties of displaying Unicode in SSMS on my blog.
But first, you need to investigate the original database and application.