QSqlDatabase accent words - c++

How can I search with QSqlDatabase in C++ for words with accents, matching both uppercase and lowercase? For example "atún", "camión", ...
db.exec("PRAGMA encoding = UTF-16"); doesn't work.
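One common approach is to fold both the stored text and the search term to an accent-free, case-free form before comparing. Below is a minimal sketch of that idea, written in Python with the standard sqlite3 module rather than Qt, and assuming the database is SQLite (which the PRAGMA suggests). The table name items and the helper names fold and search are hypothetical, chosen only for illustration.

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip combining accents and case from text (hypothetical helper)."""
    if text is None:
        return None
    # NFD splits "ú" into "u" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search(conn, term):
    """Return names containing term, ignoring accents and case."""
    rows = conn.execute(
        "SELECT name FROM items WHERE instr(fold(name), ?) > 0 ORDER BY id",
        (fold(term),),
    )
    return [row[0] for row in rows]


# Assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.create_function("fold", 1, fold)  # expose fold() to SQL
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("Atún",), ("CAMIÓN",), ("queso",)],
)

print(search(conn, "atun"))
print(search(conn, "camión"))
```

Folding on every query scans the whole table; storing a pre-folded copy of the column is a possible alternative when the table is large.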

Related

I want to remove symbols from a string in dart

I want to remove all symbols, keeping only the characters (Japanese hiragana, kanji, and Roman alphabet) that match this regex.
var reg = RegExp(
r'([\u3040-\u309F]|\u3000|[\u30A1-\u30FC]|[\u4E00-\u9FFF]|[a-zA-Z]|[々〇〻])');
I don't know what to put in place of the "?" here:
text=text.replaceAll(?,"");
a="「私は、アメリカに行きました。」、'I went to the United States.'"
b="私はアメリカに行きましたI went to the United States"
I want to make a into b.
You can use
String a = "「私は、アメリカに行きました。」、'I went to the United States.'";
a = a.replaceAll(RegExp(r'[^\p{L}\p{M}\p{N}\s]+', unicode: true), '');
Also, if you just want to remove any punctuation or math symbols, you can use
.replaceAll(RegExp(r'[\p{P}\p{S}]+', unicode: true), '')
Output:
私はアメリカに行きましたI went to the United States
The [^\p{L}\p{M}\p{N}\s]+ regex matches one or more chars other than letters (\p{L}), diacritics (\p{M}), digits (\p{N}) and whitespace chars (\s).
The [\p{P}\p{S}]+ regex matches one or more punctuation (\p{P}) or symbol (\p{S}) chars; the symbol class covers math, currency and similar symbols.
The unicode: true argument enables Unicode property class support in the regex.
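The same filter can be sketched in Python for comparison. Python's built-in re module has no \p{...} classes, so this sketch checks each character's Unicode general category with unicodedata instead. The function name strip_symbols is hypothetical, and str.isspace() is used as an approximation of \s.

```python
import unicodedata


def strip_symbols(text):
    """Keep letters (L*), marks (M*), numbers (N*) and whitespace; drop the rest."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)  # e.g. "Lo", "Po", "Sm"
        if category[0] in "LMN" or ch.isspace():
            kept.append(ch)
    return "".join(kept)


a = "「私は、アメリカに行きました。」、'I went to the United States.'"
print(strip_symbols(a))
```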
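You need to pass replaceAll a pattern that matches the characters you want to remove. Your regex matches the characters you want to keep, so negate it by merging the alternatives into a single negated character class:
// Matches any character that is NOT hiragana, an ideographic space, katakana, kanji,
// a Roman letter, one of 々〇〻, or an ASCII space (the space is added here so the English words stay separated).
var reg = RegExp(r'[^\u3040-\u309F\u3000\u30A1-\u30FC\u4E00-\u9FFFa-zA-Z々〇〻 ]');
// Removing every match leaves only the characters you want to keep.
text = text.replaceAll(reg, "");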
You can learn more about it here:
https://api.flutter.dev/flutter/dart-core/String/replaceAll.html

QString convert camel case to space separated words

I am trying to convert a camel cased QString into lowercased words separated by spaces. I currently have:
QString camelCase = "thisIsACamelCaseWord";
QString unCamelCase = camelCase.replace(QRegularExpression("([A-Z])"), " $1").toLower();
Which seems to work here,
"this Is A Camel Case Word"
but it is returning with:
"this $1s $1 $1amel $1ase $1ord"
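Since QRegularExpression uses PCRE, the back-reference syntax is '\0', '\1' and so on, as explained in the documentation. In a C++ string literal the backslash must itself be escaped, so the replacement string is written " \\1".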
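For comparison, here is the same substitution sketched in Python, whose re module also uses \1 (not $1) for back-references in the replacement string. The function name un_camel_case is hypothetical.

```python
import re


def un_camel_case(word):
    """Insert a space before each ASCII capital, then lowercase everything."""
    # r" \1" is a space followed by the text captured by group 1.
    return re.sub(r"([A-Z])", r" \1", word).lower()


print(un_camel_case("thisIsACamelCaseWord"))
```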

Regular Expression Arabic characters and numbers only

I want Regular Expression to accept only Arabic characters, Spaces and Numbers.
Numbers are not required to be in Arabic.
I found the following expression:
^[\u0621-\u064A]+$
which accepts only Arabic characters, while I need Arabic characters, spaces and numbers.
Just add 0-9 and a space to your character class:
^[\u0621-\u064A0-9 ]+$
OR add \u0660-\u0669, the range of Arabic-Indic digits, to the character class:
^[\u0621-\u064A\u0660-\u0669 ]+$
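A minimal sketch of how the two variants above could be combined and tested, here in Python. The names ARABIC_TEXT and is_arabic_text are hypothetical. re.fullmatch plays the role of the ^ and $ anchors.

```python
import re

# Arabic letters, Arabic-Indic digits, ASCII digits and the space character.
ARABIC_TEXT = re.compile(r"[\u0621-\u064A\u0660-\u06690-9 ]+")


def is_arabic_text(value):
    """True if value is non-empty and contains only the allowed characters."""
    return ARABIC_TEXT.fullmatch(value) is not None


print(is_arabic_text("مرحبا 123"))
print(is_arabic_text("hello"))
```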
You can use:
^[\u0621-\u064A\s\p{N}]+$
\p{N} will match any Unicode numeric character.
To match only ASCII digits, use:
^[\u0621-\u064A\s0-9]+$
EDIT: Better to use this regex:
^[\p{Arabic}\s\p{N}]+$
You can use
[ء-ي]
It worked for me in JavaScript, in jQuery form.validate rules.
In my example I want to force the user to enter 3 characters:
[a-zA-Zء-ي]
Use this:
[\u0600-\u06FF]
It worked for me in Visual Studio.
After a lot of trial and error I got this for Persian names:
[گچپژیلفقهمو ء-ي]+$
^[\u0621-\u064Aa-zA-Z\d\-_\s]+$
This regex accepts Arabic letters, English letters, spaces and numbers, as well as hyphens and underscores.
Simple, use this code:
^[؀-ۿ]+$
This works for Arabic and Persian, including digits.
function HasArabicCharacters(text)
{
var regex = new RegExp(
"[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]");
return regex.test(text);
}
To allow Arabic + English Letters with min&max allowed number of characters in a field, try this, tested 100%:
^[\u0621-\u064A\u0660-\u0669a-zA-Z\-_\s]{4,35}$
A- Arabic and English letters are allowed.
B- ASCII digits are not allowed (the Arabic-Indic digits \u0660-\u0669 are).
C- {4,35} sets the minimum and maximum number of characters allowed.
Update: On submit, English words with spaces were accepted, but Arabic words with spaces could not be submitted!
All cases tested
Regex for English and Arabic Numbers only
function HasArabicEnglishNumbers(text)
{
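// A single anchored class: Arabic letters, Arabic-Indic digits and ASCII digits.
// (With "^A|B+$" the alternation splits the anchors, so almost any input passes.)
var regex = new RegExp(
"^[\u0621-\u064A\u0660-\u06690-9]+$");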
return regex.test(text);
}
@Pattern(regexp = "^[\\p{InArabic}\\s]+$")
Accepts Arabic digits and characters.
This one allows Arabic letters, Arabic numbers, English numbers and spaces:
var arabic = RegExp("^[\u0621-\u064A\u0660-\u0669 0-9]+\$");
In PHP, use this:
preg_replace("/\p{Arabic}/u", 'x', 'abc123ابت'); // replaces each Arabic letter with "x"
Note: For \p{Arabic} to match Arabic letters, you need to pass the u modifier (for Unicode) at the end.
The patterns above match much more than Arabic (MSA) characters: they also include Persian, Urdu, Quranic symbols, and some other symbols. The Arabic MSA characters are only (see Arabic Unicode)
[\u0621-\u063A\u0641-\u0652]
I always use these to control user input in my apps:
public static Regex IntegerString => new(@"^[\s\da-zA-Zء-ي]+[^\.]*$");
public static Regex String => new(@"^[\sa-zA-Zء-ي]*$");
public static Regex Email => new(@"^[\d\@\.a-z]*$");
public static Regex Phone => new(@"^[\d\s\(\)\-\+]+[^\.]*$");
public static Regex Address => new(@"^[\s\d\.\,\،\-a-zA-Zء-ي]*$");
public static Regex Integer => new(@"^[\d]+[^\.]*$");
public static Regex Double => new(@"^[\d\.]*$");
This is a useful example:
public class Test {
public static void main(String[] args) {
String thai = "1ประเทศไทย1ประเทศไทย";
String arabic = "1عربي1عربي";
//correct inputs
System.out.println(thai.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.THAI.toString() + "}*]*"));
System.out.println(arabic.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.ARABIC.toString() + "}*]*"));
//incorrect inputs
System.out.println(arabic.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.THAI.toString() + "}*]*"));
System.out.println(thai.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.ARABIC.toString() + "}*]*"));
}
}
[\p{IsArabic}-[\D]]
An Arabic character that is not a non-digit, i.e. an Arabic digit (this uses .NET character class subtraction).

Strip Chinese Characters from a string (vba)

I am using Microsoft Project VBA to translate my activity names from English to Chinese.
My problem is I have some Chinese translations embedded in some of the English activity names. I want to strip out the Chinese characters before passing the string to Microsoft Translator.
Any ideas as to how I can do that?
You can use a Regexp to strip the Chinese unicode characters
Wikipedia lists the relevant characters below
Sub Test()
Dim myString as String
myString = "This is my string with a " & ChrW$(&H6C49) & " in it."
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[\u4E00-\u9FFF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF]+"
MsgBox .Replace(myString, vbNullString)
End With
End Sub
So this regexp will strip out these ranges. I have used aldo.roman.nurena's string example
You have to use ChrW$() like this:
MyString = "This is my string with a " & ChrW$(&H6C49) & " in it."
The code point H6C49 is available (thank God for that) in Unicode among the CJK (Chinese, Japanese and Korean) codes. See this for a look at the character ranges.
So, you have to check each character's Unicode code point and test whether it falls in the CJK range to decide whether to translate it or not.
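The per-character check described above can be sketched as follows, in Python rather than VBA. The sketch assumes only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) matters; the names is_cjk and strip_cjk are hypothetical.

```python
CJK_FIRST = 0x4E00  # start of the CJK Unified Ideographs block
CJK_LAST = 0x9FFF   # end of the block


def is_cjk(ch):
    """True if the character's code point lies in the assumed CJK range."""
    return CJK_FIRST <= ord(ch) <= CJK_LAST


def strip_cjk(text):
    """Drop every character that falls in the CJK range."""
    return "".join(ch for ch in text if not is_cjk(ch))


my_string = "This is my string with a " + chr(0x6C49) + " in it."
print(strip_cjk(my_string))
```

Other CJK blocks (extensions, compatibility ideographs, kana, hangul) would need additional ranges if they can appear in the input.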
There is also a good explanation and even a program to translate strings here

regex to count english words as single char inside char count of asian words

I need some help from a regex Jedi master:
If I have a string of multibyte chars (specifically Japanese, Korean or Chinese) with English words sprinkled throughout, I would like to count:
Asian characters as 1 per single char
English "words" (no dictionary check needed, just a string of consecutive English letters) as a single char.
English only is fine; don't worry about special Spanish, Swedish, etc. chars.
I am searching for a regex pattern I can use to count these strings that will work in PHP and JS.
Example:
これは猫です、けどKittyも大丈夫。
should count as 13 chars.
thanks for your help!
jeff
Whatever you are trying to achieve, this will help you:
To count only Hiragana+Katakana+Kanji (Japanese) Chars (excluding punctuation marks):
var x = "これは猫です、けどKittyも大丈夫。";
x.match(/[ぁ-ゖァ-ヺー一-龯々]/g).length; //Result: 12 : これは猫ですけども大丈夫
Updated:
To count only words in Alphabet:
x.match(/\w+/g).length; //Result: 1 : "Kitty"
All in one line (as function):
function myCount(str) {
return str.match(/[ぁ-ゖァ-ヺー一-龯々]|\w+/g).length;
}
alert(myCount("これは猫です、けどKittyも大丈夫。")); //13
alert(myCount("これは犬です。DogとPuppyもOKですね!")); //14
These are the arrays resulting from match:
["こ", "れ", "は", "猫", "で", "す", "け", "ど", "Kitty", "も", "大", "丈", "夫"]
["こ", "れ", "は", "犬", "で", "す", "Dog", "と", "Puppy", "も", "OK", "で", "す", "ね"]
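A port of the Japanese-only counter to Python, as a rough sketch. One difference matters: in Python 3, \w also matches kana and kanji, so the word branch is spelled out as [A-Za-z0-9_]+ to mirror the ASCII-only \w of JS. The character ranges are the same ones as in the JS class, written as code points; the name my_count is kept from the JS version.

```python
import re

# ぁ-ゖ, ァ-ヺ, ー, 一-龯, 々 as single tokens, or a run of ASCII word chars.
TOKEN = re.compile(
    r"[\u3041-\u3096\u30A1-\u30FA\u30FC\u4E00-\u9FAF\u3005]|[A-Za-z0-9_]+"
)


def my_count(text):
    """Count each Japanese char as 1 and each ASCII word as 1."""
    return len(TOKEN.findall(text))


print(my_count("これは猫です、けどKittyも大丈夫。"))
print(my_count("これは犬です。DogとPuppyもOKですね!"))
```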
Updated (JAP, KOR, CH):
function myCount(str) {
return str.match(/[ぁ-ㆌㇰ-䶵一-鿃々가-힣-豈ヲ-ン]|\w+/g).length;
}
These will cover around 99% of the Japanese, Chinese and Korean. You may need to manually add extra characters that are not included such as "〶".
A very good reference is:
http://www.tamasoft.co.jp/en/general-info/unicode.html
This should solve your question.
OK, so I would do two runs: First count the occurrences of the English words and then of the Asian ones. This is a JS example, it might be different in PHP. In JS, only ASCII chars match \w.
string = "これは猫です、けどKittyも大丈夫";
var m = string.match(/\w+/gm);
var e_count = m.length; // is 1
Next count the Asian chars.
m = string.match(/([^\w\s\d])/gm); // any non-whitespace, non-word, non-digit chars
var a_count = m.length; // is 13
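You might have to tweak it a bit: as written, a_count also counts the punctuation mark 、, so exclude punctuation from the class if it should not be counted. But in JS, you can add up e_count and a_count, and you should be good to go.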
Also check out Rubular: http://www.rubular.com
Johannes
Something like /[[:ascii:]]+|./ will match one non-ASCII character or one or more ASCII characters. The problem is that it will give 15, so it seems that you want to ignore punctuation. So possibly: /[A-Za-z]+|[^[:punct:]]/
$ perl -E 'use utf8; $f = "これは猫です、けどKittyも大丈夫。"; ++$c while $f =~ /[A-Za-z]+|[^[:punct:]]/g; say $c'
13
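So, that works in Perl at least. Probably in PHP as well, provided its [[:punct:]] understands Unicode. JavaScript regexes have no POSIX classes, so there you would need something like \p{P} with the u flag instead.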
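The same idea (take a run of English letters or any single character, then ignore punctuation) can be sketched in Python. Python's re has no [[:punct:]], so this sketch skips tokens whose Unicode general category starts with "P". It also skips whitespace, which is an assumption on my part that the Perl pattern does not make. The name count_units is hypothetical.

```python
import re
import unicodedata

# A run of ASCII letters, or else any single character.
TOKEN = re.compile(r"[A-Za-z]+|.", re.DOTALL)


def count_units(text):
    """Count letter runs and single chars, ignoring punctuation and whitespace."""
    total = 0
    for token in TOKEN.findall(text):
        first = token[0]
        if unicodedata.category(first).startswith("P") or first.isspace():
            continue  # punctuation such as 、 。 ! is not counted
        total += 1
    return total


print(count_units("これは猫です、けどKittyも大丈夫。"))
```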
The alternative approach is to filter out stuff instead.