Pepper - QiChat: pronunciation of a word in a different language

I'm writing an Italian dialog on Pepper with QiChat, and I have to change the pronunciation of a single word (e.g. "Engineering") to English.
What is the qichat instruction that I have to write in the dialog box?
Thanks,
Debora

I think you have two options there:
Try different spellings (e.g. "Ingeniireng", ...) until Pepper's pronunciation satisfies your expectations.
Insert phonetic text (see the documentation).

Related

Translation of Accelerators into Chinese

When translating an app (MFC in this case) into Chinese, what do I do with Accelerators?
Is F1 still used for Help?
What about things like Ctrl-A? Will the translator know what to do with those?
Any advice or links appreciated.
From Wikipedia:
Chinese keyboards are usually in US layout with/without Chinese input
method labels printed on keys.
They also have F-Keys, so F1 for help is fine.
Don't translate accelerators; keep them in the Latin alphabet. Ampersand accelerators within text are usually moved to the right of the text, changed to uppercase, and wrapped in parentheses. For example, "E&nter the text:" becomes "输入文字(&N):". There is no whitespace between the Chinese text and the opening parenthesis.
This is how the Windows "Run" dialog looks in Simplified Chinese, and Notepad's menu follows the same convention (screenshots omitted here).
You can see for yourself by installing a Chinese language pack and changing the primary display language to Chinese via the Windows settings app.

NLTK synset with other languages

Right now I'm trying to compare words from two different files, one English, one Chinese. I have to identify whether any of the English words are related to the Chinese words and, if they are, whether they are equal or one is a hypernym of the other. I can use synsets for English, but what can I do about the Chinese words?
It looks like there is a Chinese (cmn) WordNet available from a university in Taiwan: http://casta-net.jp/~kuribayashi/multi/ . If this WordNet has the same format as the English WordNet, then you can probably use the WordNetCorpusReader (http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#WordNetCorpusReader) in NLTK to import the Mandarin data. I don't know how you're doing your alignments or translations between the two datasets, but assuming you can map English to Chinese, this should help you figure out how the relation between two English words compares to the relation between two Mandarin words. Note that if your data uses the simplified script, you may also need to convert to the traditional script before using this cmn WordNet.
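Once you have a mapping between the English and Chinese words, the comparison logic itself is straightforward. A toy Python sketch of that step (the taxonomy and translations below are stand-ins for real WordNet lookups, not actual data from the cmn WordNet):

```python
# Sketch: compare the relation between an English word pair with the
# relation between their Chinese counterparts. The toy taxonomy maps
# each word to its direct hypernym, standing in for WordNet lookups.

toy_hypernyms = {
    "dog": "animal",   # English: dog IS-A animal
    "cat": "animal",
    "狗": "动物",       # Chinese: same relation (dog IS-A animal)
    "猫": "动物",
}

def is_hypernym(parent, child, taxonomy):
    """True if `parent` is an ancestor of `child` in the toy taxonomy."""
    node = child
    while node in taxonomy:
        node = taxonomy[node]
        if node == parent:
            return True
    return False

def relations_agree(en_pair, zh_pair, taxonomy):
    """Check that the English pair and the Chinese pair stand in the
    same hypernym relation (in either direction)."""
    en = (is_hypernym(en_pair[0], en_pair[1], taxonomy),
          is_hypernym(en_pair[1], en_pair[0], taxonomy))
    zh = (is_hypernym(zh_pair[0], zh_pair[1], taxonomy),
          is_hypernym(zh_pair[1], zh_pair[0], taxonomy))
    return en == zh

print(relations_agree(("animal", "dog"), ("动物", "狗"), toy_hypernyms))  # True
```

With the real WordNets, `is_hypernym` would be replaced by a walk over each synset's hypernym closure, but the comparison step stays the same.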

How to get sentence number from input?

It seems hard to detect sentence boundaries in text. Punctuation marks like . ! ? may be used to delimit sentences, but not very accurately, since there are ambiguous abbreviations such as U.S.A. or Prof. or Dr. I am studying the TPerlRegEx library and Regular Expression Cookbook by Jan Goyvaerts, but I do not know how to write an expression that detects sentence boundaries.
What would be a comparatively accurate expression using TPerlRegEx in Delphi?
Thanks
First, you probably need to arrive at your own definition of what a "sentence" is, then implement that definition. For example, how about:
He said: "It's OK!"
Is it one sentence or two? There is no universally correct answer: decide whether you want to interpret it as one or two sentences, and proceed accordingly.
Second, I don't think I'd be using regular expressions for this. Instead, I would scan each character and try to detect sequences. A period by itself may not be enough to delimit a sentence, but a period followed by whitespace or carriage return (or end of string) probably does. This immediately lets you weed out U.S.A (periods not followed by whitespace).
For common abbreviations like Prof. and Dr., it may be a good idea to create a dictionary, perhaps editable by your users, since each language will have its own set of common abbreviations.
Each language will have its own set of punctuation rules too, which may affect how you interpret punctuation characters. For example, English tends to put a period inside the parentheses (like this.) while Polish does the opposite (like this). The same difference will apply to double quotes, single quotes (some languages don't use them at all, sometimes they are indistinguishable from apostrophes etc.). Your rules may well have to be language-specific, at least in part.
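The scanning approach described above can be sketched in Python (the question asks about Delphi/TPerlRegEx, so treat this purely as an illustration of the logic; the abbreviation list is a toy example):

```python
import re

# Illustrative abbreviation list; in a real application this would be
# user-editable and language-specific.
ABBREVIATIONS = {"Prof.", "Dr.", "Mr.", "Mrs.", "e.g.", "i.e."}

def split_sentences(text):
    """Naive scanner: a sentence ends at . ! or ? followed by whitespace
    (or end of string), unless the word carrying the terminator is a
    known abbreviation or an initialism such as U.S.A."""
    sentences, start = [], 0
    for m in re.finditer(r"[.!?](?=\s|$)", text):
        end = m.end()
        token = text[start:end].split()[-1]  # word carrying the terminator
        if token in ABBREVIATIONS:
            continue  # known abbreviation: not a boundary
        if re.fullmatch(r"(?:[A-Z]\.)+", token):
            continue  # initialism like U.S.A.: not a boundary
        sentences.append(text[start:end].strip())
        start = end
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences

print(split_sentences("Dr. Smith arrived. He sat down."))
# → ['Dr. Smith arrived.', 'He sat down.']
```

As the answer notes, this still fails on sentences that genuinely end in an abbreviation ("...he was simply The Prof."), so the dictionary only reduces, not eliminates, the ambiguity.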
In the end, you may approximate the human way of delimiting sentences, but there will always be cases that can throw the analysis off. For example, assuming that you have a dictionary that recognizes "Prof." as an abbreviation, what are you going to do about
Most people called him Professor Jones, but to me he was simply The Prof.
Even if you have another sentence that follows and starts with a capital letter, that still won't help you know where the sentence ends, because it might as well be
Most people called him Professor Jones, but to me he was simply Prof. Bill.
Check my tutorial here: http://code.google.com/p/graph-expression/wiki/SentenceSplitting. This concrete example can easily be rewritten as regular expressions plus some imperative code.
It would be wise to use an NLP processor with a pre-trained model. EnglishSD.nbin is one such model, available for OpenNLP, and it can be used in Visual Studio with SharpNLP.
The advantages of using this method are numerous. For example, consider the input
Prof. Jessica is a wonderful woman. She is a native of U.S.A. She is married to Mr. Jacob Jr.
If you are using a regex split, for example
string[] sentences = Regex.Split(text, @"(?<=['""A-Za-z0-9][\.\!\?])\s+(?=[A-Z])");
Then the above input will be split as
Prof.
Jessica is a wonderful woman.
She is a native of U.
S.
A.
She is married to Mr.
Jacob Jr.
However the desired output is
Prof. Jessica is a wonderful woman.
She is a native of U.S.A. She is married to Mr. Jacob Jr.
This kind of logical sentence split can be achieved only by using trained models such as those from the OpenNLP project. The method is as simple as this:
private string mModelPath = @"C:\Users\ATS\Documents\Visual Studio 2012\Projects\Google_page_speed_json\Google_page_speed_json\bin\Release\";
private OpenNLP.Tools.SentenceDetect.MaximumEntropySentenceDetector mSentenceDetector;
private string[] SplitSentences(string paragraph)
{
    if (mSentenceDetector == null)
    {
        mSentenceDetector = new OpenNLP.Tools.SentenceDetect.EnglishMaximumEntropySentenceDetector(mModelPath + "EnglishSD.nbin");
    }
    return mSentenceDetector.SentenceDetect(paragraph);
}
where mModelPath is the path of the directory containing the nbin file.
The mSentenceDetector is derived from the OpenNLP dll.
You can get the desired output by
string[] sentences = SplitSentences(text);
Kindly read through the article I have written on integrating SharpNLP with your application in Visual Studio to make use of the NLP tools.

How to translate gendered pronouns in Qt's QTranslator

I'm looking at an open source Qt 4 game (http://cockatrice.de), and it uses QTranslator for internationalization. However, every phrase that refers to the player uses a masculine pronoun ("his hand", "his deck", "he does such-and-such", etc.).
My first thought for fixing this would be to replace every instance of "his" or "he" with a variable that is set to the correct gendered pronoun, but I don't know how this would affect the translations. It might well break, especially if the pronoun's gender affects the rest of the phrase.
Has anyone else worked with this sort of problem before? Is it possible to separate the pronoun from the phrase in at least the easy languages (like English, in this case)? Or will the translation file have to include a copy of each phrase for each gendered pronoun, at least doubling the size of the translation file?
Some sample code showing how it's currently set up:
calling source:
case CaseNominative: return hisOwn ? tr("his hand", "nominative") : tr("%1's hand", "nominative").arg(ownerName);
case CaseGenitive: return hisOwn ? tr("of his hand", "genitive") : tr("of %1's hand", "genitive").arg(ownerName);
case CaseAccusative: return hisOwn ? tr("his hand", "accusative") : tr("%1's hand", "accusative").arg(ownerName);
english translation file:
<message>
<location filename="../src/cardzone.cpp" line="51"/>
<source>his hand</source>
<comment>nominative</comment>
<translation type="unfinished"></translation>
</message>
maybe split:
tr("his hand")
into:
tr("his").append(" hand")
and then translate "his", "her", "its", ... separately.
But there will be some problems in languages like Italian, where there are personal suffixes...
alternative:
tr("his hand");
tr("her hand");
//...
and translate them all.
EDIT: the second alternative shown is also used in many other games (like Oolite, ...), because it's the only way to be sure there will be no problems (like suffixes instead of prefixes...) in other languages.
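The per-phrase approach can be sketched with a plain dict standing in for a translation catalog (the German strings are real translations; the catalog structure here is only an illustration, not Qt's actual .ts format):

```python
# Sketch of the "one message per gendered pronoun" approach. Separate
# catalog entries let each language translate the whole phrase,
# including any gender-dependent suffixes elsewhere in the sentence.
CATALOG = {
    ("de", "his hand"): "seine Hand",
    ("de", "her hand"): "ihre Hand",
}

def tr(lang, source):
    # Fall back to the source string for missing entries, as Qt does.
    return CATALOG.get((lang, source), source)

print(tr("de", "her hand"))  # ihre Hand
print(tr("fr", "her hand"))  # her hand  (no French entry yet)
```

The cost is the duplication the question worries about, but translators see complete, natural phrases instead of fragments.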
By the way, what would Konami or Nintendo say when they realize you are creating an open source Yu-Gi-Oh!/Pokémon playing field? Nobody will buy their cards :-P

How can I detect Russian spam posts with Perl?

I have an English-language forum site written in Perl that is continually bombarded with spam in Russian. Is there a way, using Perl and a regex, to detect Russian text so I can block it?
You can use the following character class to detect Cyrillic characters (used in Russian):
[\u0400-\u04FF]+
In Perl syntax, the same range is written as /[\x{0400}-\x{04FF}]+/.
If you really just want Russian characters, take a look at the Unicode Cyrillic code chart, which gives the exact range used for the basic Russian alphabet: [\u0410-\u044F]. Of course, you'd also need to consider the extension Cyrillic characters used exclusively in Russian, which are covered in the same chart.
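A minimal sketch of this character-class check in Python (the site itself is Perl, so treat this as an illustration of the same idea; the 50% threshold is an arbitrary choice, not a recommendation):

```python
# Sketch: flag a post as likely Russian when the share of Cyrillic
# letters among all letters exceeds a threshold.
import re

CYRILLIC = re.compile(r"[\u0400-\u04FF]")

def cyrillic_ratio(text):
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for c in letters if CYRILLIC.match(c)) / len(letters)

def looks_russian(text, threshold=0.5):
    return cyrillic_ratio(text) >= threshold

print(looks_russian("Привет, купите наши товары!"))    # True
print(looks_russian("Hello, this is a normal post."))  # False
```

Using a ratio rather than a single-character match avoids blocking English posts that merely quote a Russian word.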
Using the Unicode Cyrillic charset as suggested by JG is fine if everything is encoded as such. However, this is spam, and for the most part things are not. Additionally, spammers will very often use a mix of charsets in their spam, which further undermines this approach.
I find that the best way (or at least the preliminary step in the process) of detecting Russian spam is to grep for the most commonly used charsets:
koi8-r
windows-1251
iso-8859-5
The next step after that would be to try some language detection algorithms on what remains. If it's a big enough problem, use a paid service such as Google Translate (which also "detects") or Xerox. These services provide, IMO, the best language detection around.
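The "grep for charsets" preliminary step can be sketched like this in Python (illustration only; the charset list is the one given above, matched case-insensitively against the raw message bytes):

```python
# Sketch: preliminary filter that flags messages declaring a charset
# commonly used for Russian text, before any real language detection.
import re

RUSSIAN_CHARSETS = re.compile(rb"koi8-r|windows-1251|iso-8859-5", re.IGNORECASE)

def declares_russian_charset(raw_message: bytes) -> bool:
    """Scan the raw (undecoded) message for a Russian charset label."""
    return RUSSIAN_CHARSETS.search(raw_message) is not None

raw = b'Content-Type: text/plain; charset="koi8-r"\r\n\r\n...'
print(declares_russian_charset(raw))  # True
```

Working on raw bytes is deliberate: the whole point of this step is that the message may not be valid Unicode, so it is filtered before any decoding is attempted.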