What is the preferred way in C++ to do this: split the letters of the alphabet into 7 groups so I can later ask whether a char is in group 1, 3 or 4, etc.? I can of course think of several ways of doing this myself, but I want to know the standard approach and stick with it when doing this kind of thing.
0
AEIOUHWY
1
BFPV
2
CGJKQSXZ
3
DT
4
MN
5
L

6
R
best way in C++ to do this: Split the letters of the alphabet into 7 groups so I can later ask if a char is in group 1, 3 or 4 etc... ?
The most efficient way to do the "split" itself is to have an array from letter/char to number.
// A B C D E F G H...
const char lookup[] = { 0, 1, 2, 3, 0, 1, 2, 0...
A switch/case statement's another reasonable choice - the compiler can decide itself whether to create an array implementation or some other approach.
It's unclear what use of those 0-6 values you plan to make, but an enum appears a reasonable encoding choice. That has the advantage of still supporting any use you might have for those specific numeric values (e.g. in < comparisons, streaming...) while being more human-readable and compiler-checked than "magic" numeric constants scattered throughout the code. Constant ints of any width are also likely to work fine, but won't have a unifying type.
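To make the array idea concrete, here is a minimal sketch that builds the full 26-entry table from the seven groups listed in the question. The names kGroupLetters, makeGroupTable, kGroupTable and groupOf are made up for illustration, and the code assumes an ASCII-compatible character set where 'A'..'Z' are contiguous.

```cpp
#include <array>
#include <cctype>

// The seven groups exactly as listed in the question (index = group number).
const char* const kGroupLetters[7] = {
    "AEIOUHWY", "BFPV", "CGJKQSXZ", "DT", "MN", "L", "R"};

// Build the letter -> group table once. Assumes 'A'..'Z' are contiguous (ASCII).
inline std::array<int, 26> makeGroupTable() {
    std::array<int, 26> table{};
    for (int g = 0; g < 7; ++g)
        for (const char* p = kGroupLetters[g]; *p != '\0'; ++p)
            table[*p - 'A'] = g;
    return table;
}

const std::array<int, 26> kGroupTable = makeGroupTable();

// Returns the group 0..6 for an ASCII letter (either case), or -1 otherwise.
inline int groupOf(char c) {
    const int upper = std::toupper(static_cast<unsigned char>(c));
    if (upper < 'A' || upper > 'Z') return -1;
    return kGroupTable[upper - 'A'];
}
```

Asking "is this char in group 1, 3 or 4" then becomes a comparison on the result of groupOf(c); replacing the int with an enum, as suggested above, would only change the element type of the table.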
Create a lookup table.
int lookup[26] = { 0, 1, 2, 3, 0, 1, 2, 0 .... whatever };
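inline int getgroup(char c)
{
    // Cast to unsigned char: passing a negative char to tolower is undefined behaviour.
    return lookup[tolower(static_cast<unsigned char>(c)) - 'a'];
}
Call it this way:
char myc = 'M';
int grp = getgroup(myc); // call the function, not the array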
Error checks omitted for brevity.
Of course, depending on what the 7 groups represent, you can make enums instead of using 0, 1, 2 etc.
Given the small amount of data involved, I'd probably do it as a bit-wise lookup -- i.e., set up values:
cat1 = 1;
cat2 = 2;
cat3 = 4;
cat4 = 8;
cat5 = 16;
cat6 = 32;
cat7 = 64;
Then just create an array of 26 values, one for each letter in the alphabet, with each containing the value of the category for that letter. When you want to classify a letter, you just index categories[ch-'A'] to find it.
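A rough sketch of that bit-wise lookup, using the groups from the question. The helper names cat, makeCategories, kCategories and inAnyGroup are hypothetical; group 0 is mapped to bit 0 (value 1), group 1 to bit 1 (value 2), and so on, and the input is assumed to be an uppercase ASCII letter.

```cpp
#include <array>
#include <cstdint>

// Bit for a group: group 0 -> 1, group 1 -> 2, ... group 6 -> 64.
constexpr std::uint8_t cat(int group) {
    return static_cast<std::uint8_t>(1u << group);
}

// One category bit per letter, built from the groups listed in the question.
inline std::array<std::uint8_t, 26> makeCategories() {
    const char* const groups[7] = {
        "AEIOUHWY", "BFPV", "CGJKQSXZ", "DT", "MN", "L", "R"};
    std::array<std::uint8_t, 26> table{};
    for (int g = 0; g < 7; ++g)
        for (const char* p = groups[g]; *p != '\0'; ++p)
            table[*p - 'A'] = cat(g);
    return table;
}

const std::array<std::uint8_t, 26> kCategories = makeCategories();

// True if uppercase ASCII letter ch is in any group whose bit is set in mask.
inline bool inAnyGroup(char ch, std::uint8_t mask) {
    if (ch < 'A' || ch > 'Z') return false;
    return (kCategories[ch - 'A'] & mask) != 0;
}
```

The payoff is that "is it in group 1, 3 or 4" is a single AND: inAnyGroup(ch, cat(1) | cat(3) | cat(4)).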
I asked a similar question earlier. I'm attempting to fill in missing values such that observations 0-458 are 0, 445-832 are 1, and 832-850 are 0.
The following code allowed me to replace missing values in observations 1-160 with 1, with the rest of the observations set to 0.
replace myvar = cond(_n <= 160, 1, 0) if missing(myvar)
How can I adapt this command for my current purpose?
There is no observation 0. I assume you meant observation 1. Your rules are ambiguous otherwise as you give two rules for 445-458 and two rules for 832.
I will give code for a minimal data example.
clear
set obs 6
gen myvar = .
Assume you want myvar in observations 1/2 to be 0, 3/4 to be 1, 5/6 to be 0.
Method 1
replace myvar = inrange(_n, 3, 4) if missing(myvar)
Method 2
replace myvar = cond(_n <= 2, 0, cond(_n <= 4, 1, 0))
Method 3
replace myvar = 0 if missing(myvar) in 1/2
replace myvar = 1 if missing(myvar) in 3/4
replace myvar = 0 if missing(myvar) in 5/6
In general, however, replacing in terms of observation numbers is not the best technique. It is utterly dependent on sort order. Also, if there are criteria in terms of other variables, they are preferable, as they make more and better sense in records of reproducible research: to yourself in the future, to colleagues, to reviewers and to others too.
I often use enums for bitflags like the following
enum EventType {
NODE_ADDED = 1 << 0,
NODE_DELETED = 1 << 1,
LINK_ADDED = 1 << 2,
LINK_DELETED = 1 << 3,
IN_PIN_ADDED = 1 << 4,
IN_PIN_DELETED = 1 << 5,
IN_PIN_CHANGE = 1 << 6,
OUT_PIN_ADDED = 1 << 7,
OUT_PIN_DELETED = 1 << 8,
OUT_PIN_CHANGE = 1 << 9,
ALL = NODE_ADDED | NODE_DELETED | ...,
};
Is there a clean, less repetitive way to define an ALL flag that combines all other flags in an enum? For small enums the above works well, but let's say there are 30 flags in an enum; it gets tedious to do it this way. Does something like this work (in general)?
ALL = -1
?
Use something that'll always cover every other option, like:
ALL = 0xFFFFFFFF
Or as Swordfish commented, you can flip the bits of an unsigned integer literal:
ALL = ~0u
To answer your comment, you can explicitly tell the compiler what type you want your enum to have:
enum EventType : unsigned int
The root problem here is how many one-bits you need. That depends on the number of enumerators defined before it. Trying to define ALL inside the enum makes that a case of circular logic.
Instead, you have to define it outside the enum:
const auto ALL = (EventType) ~EventType{};
EventType{} has sufficient zeroes, and ~ turns it into an integral type with enough ones, so you need another cast back to EventType.
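Putting the two suggestions together (a fixed underlying type plus an ALL defined outside the enum) might look like the sketch below. The hasFlag helper is hypothetical and only there to show usage. With the underlying type fixed to unsigned int, every unsigned int value is a valid EventType, so the cast is well defined.

```cpp
enum EventType : unsigned int {
    NODE_ADDED      = 1u << 0,
    NODE_DELETED    = 1u << 1,
    LINK_ADDED      = 1u << 2,
    LINK_DELETED    = 1u << 3,
    IN_PIN_ADDED    = 1u << 4,
    IN_PIN_DELETED  = 1u << 5,
    IN_PIN_CHANGE   = 1u << 6,
    OUT_PIN_ADDED   = 1u << 7,
    OUT_PIN_DELETED = 1u << 8,
    OUT_PIN_CHANGE  = 1u << 9
};

// Defined outside the enum: all bits of the underlying type set, so it
// covers every flag above and any flag added later.
constexpr EventType ALL = static_cast<EventType>(~0u);

// Hypothetical helper: true if any bit of 'flag' is present in 'set'.
constexpr bool hasFlag(EventType set, EventType flag) {
    return (set & flag) != 0;
}
```

Note that this ALL also has bits set that correspond to no enumerator; that is harmless for "does the mask contain flag X" checks, but it matters if you ever compare masks for equality.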
I have an arbitrary Unicode string that represents a number, such as "2", "٢" (U+0662, ARABIC-INDIC DIGIT TWO) or "Ⅱ" (U+2161, ROMAN NUMERAL TWO). I want to convert that string into an int. I don't care about specific locales (the input might not be in the current locale); if it's a valid number then it should get converted.
I tried QString.toInt and QLocale.toInt, but they don't seem to get the job done. Example:
bool ok;
int n;
QString s = QChar(0x0662); // ARABIC-INDIC DIGIT TWO
n = s.toInt(&ok); // n == 0; ok == false
QLocale anyLocale(QLocale::AnyLanguage, QLocale::AnyScript, QLocale::AnyCountry);
n = anyLocale.toInt(s, &ok); // n == 0; ok == false
QLocale cLocale = QLocale::C;
n = cLocale.toInt(s, &ok); // n == 0; ok == false
QLocale arabicLocale = QLocale::Arabic; // Specific locale. I don't want that.
n = arabicLocale.toInt(s, &ok); // n == 2; ok == true
Is there a function I am missing?
I could try all locales:
QList<QLocale> allLocales = QLocale::matchingLocales(QLocale::AnyLanguage, QLocale::AnyScript, QLocale::AnyCountry);
for(int i = 0; i < allLocales.size(); i++)
{
n = allLocales[i].toInt(s, &ok);
if(ok)
break;
}
But that feels slightly hackish. Also, it does not work for all strings (e.g. Roman numerals, but that's an acceptable limitation). Are there any pitfalls when doing it that way, such as conflicting rules in different locales (cf. Turkish vs. non-Turkish letter case rules)?
I'm not aware of any ready to use package which does this (but
maybe ICU supports it), but it isn't hard to do if you really
want to. First, you should download the UnicodeData.txt file
from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
This is an easy to parse ASCII file; the exact syntax is
described in http://www.unicode.org/reports/tr44/tr44-10.html,
but for your purposes, all you need to know is that each line in
the file consists of semi-colon separated fields. The first
field contains the character code in hex, the third field the
"general category", and if the third field is "Nd" (numeric,
decimal), the seventh field contains the decimal value.
This file can easily be parsed using Python or a number of other
scripting languages, to build a mapping table. You'll want some
sort of sparse representation, since there are over a million
Unicode characters, of which very few (a couple of hundred) are
decimal digits. The following Python script will give you a C++
table which can be used to initialize an
std::map<int, int>. If the character is
in the map, the mapped element is its value.
Whether this is sufficient or not depends on your application.
It has several weaknesses:
It requires extra logic to recognize when two successive
digits are in different alphabets. Presumably a sequence "1١"
should be treated as two numbers (1 and 1), rather than as one
(11). (Because all of the sets of decimal digits are in 10
successive codes, it would be fairly easy, once you know the
digit, to check whether the preceding digit character was in the
same set.)
It ignores non-decimal digits, like ௰ or ൱ (Tamil ten and
Malayalam one hundred). There aren't that many of them, and they are
also in the UnicodeData.txt file, so it might be possible to
find them manually and add them to the table. I don't know
myself, however, how they combine with other digits when numbers
have been composed.
If you're converting numbers, you might have to worry about
the direction. I'm not sure how this is handled (but there is
documentation at the Unicode site); in general, text will appear
in its natural order. In the case of Arabic and related
languages, when reading in the natural order, the low order
digits appear first: something like "١٢" (literally "12",
but because the writing is from right to left, the digits will
appear in the order "21") should be interpreted as 12, and not 21. Except that I'm not sure whether a change direction mark is
present or not. (The exact rules are described in the
documentation at the Unicode site; in the UnicodeData.txt file,
the fifth field—index 4—gives this information. I
think if it's anything but "AN", you can assume the big-endian
standard used in Europe, but I'm not sure.)
Just to show how simple this is, here's the Python script to
parse the UnicodeData.txt file for the digit values:
print('std::pair<int, int> initUnicodeMap[] = {')
for line in open("UnicodeData.txt"):
    fields = line.split(';')
    if fields[2] == 'Nd':
        print('    {{{:d}, {:d}}},'.format(int(fields[0], 16), int(fields[7])))
print('};')
If you're doing any work with Unicode, this file is a gold mine
for generating all sorts of useful tables.
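On the C++ side, using such a map could look roughly like this. It is only a sketch: makeDigitMap is a hand-written stand-in for the generated table and covers just ASCII digits (U+0030..U+0039) and Arabic-Indic digits (U+0660..U+0669); parseDigits is a hypothetical name. It assumes the string is in logical (storage) order with the most significant digit first, and it does not check for overflow or for mixed digit sets.

```cpp
#include <map>
#include <string>

// Stand-in for the generated table: code point -> decimal digit value.
// Only ASCII and Arabic-Indic digits are listed here.
inline std::map<int, int> makeDigitMap() {
    std::map<int, int> digits;
    for (int d = 0; d < 10; ++d) {
        digits[0x0030 + d] = d;  // '0'..'9'
        digits[0x0660 + d] = d;  // ARABIC-INDIC DIGIT ZERO..NINE
    }
    return digits;
}

// Converts a UTF-32 string of decimal digits to an int.
// Returns false for an empty string or any code point missing from the map.
inline bool parseDigits(const std::u32string& text, int& out) {
    static const std::map<int, int> digits = makeDigitMap();
    if (text.empty()) return false;
    int value = 0;
    for (char32_t cp : text) {
        const auto it = digits.find(static_cast<int>(cp));
        if (it == digits.end()) return false;
        value = value * 10 + it->second;
    }
    out = value;
    return true;
}
```

With the full generated table in place of makeDigitMap, the same loop would handle every Nd character; detecting a switch between digit sets inside one string would need the extra check described in the first weakness above.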
You can get the numeric equivalent of a Unicode character with the method QChar::digitValue:
int value = QChar::digitValue((uint)0x0662);
It will return -1 if the character does not have numeric value.
See the documentation if you need more help; I don't really know much about C++/Qt.
The Chinese numerals mentioned in that Wikipedia article belong to the range 0x4E00-0x9FCC. UnicodeData.txt has no useful metadata about individual characters in this range:
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FCC;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
So if you wish to map Chinese numerals to integers, you must do that mapping yourself, simple as that.
Here's a simple mapping of the symbols in the Wikipedia article, where each single symbol maps to a single number:
0x96f6,0x3007 = 0
0x58f9,0x4e00,0x5f0c = 1
0x8cb3,0x8d30,0x4e8c,0x5f0d,0x5169,0x4e24 = 2
0x53c3,0x53c1,0x4e09,0x5f0e,0x53c3,0x53c2,0x53c4,0x53c1 = 3
0x8086,0x56db,0x4989 = 4
0x4f0d,0x4e94 = 5
0x9678,0x9646,0x516d = 6
0x67d2,0x4e03 = 7
0x634c,0x516b = 8
0x7396,0x4e5d = 9
0x62fe,0x5341,0x4ec0 = 10
0x4f70,0x767e = 100
0x4edf,0x5343 = 1000
0x842c,0x842c,0x4e07 = 10000
0x5104,0x5104,0x4ebf = 100000000
0x5e7a = 1
0x5169,0x4e24 = 2
0x5440 = 10
0x5ff5,0x5eff = 20
0x5345 = 30
0x534c = 40
0x7695 = 200
0x6d1e = 0
0x5e7a = 1
0x4e24 = 2
0x5200 = 4
0x62d0 = 7
0x52fe = 9
I have to implement a small multi-image graphic control, which in essence is an array of 9 images, shown one at a time. The final goal is for it to act as a mini slider.
Now, this graphic control is going to receive various integer ranges: from 5 to 25, or from 0 to 7, or from -9 to 9.
If I use a proportion (the "rule of three"), I am afraid it is not technically sustainable because it can be a source of errors. My guess is to use some lookup tables, but does anyone have good advice on an approach?
Thanks
I'm not sure lookup tables are required. You can get from your input value to an image index between 0 and 8 proportionally:
int ConvertToImageArrayIndex(int inputValue)
{
    int maxInputFromOtherModule = 25;
    int minInputFromOtherModule = 5;
    // +1 is required to include both the min and max input values in the possible range.
    // +0.5 is required so that we round to the nearest image instead of always rounding down.
    // 8.0 is required to get an output range of 9 possible indexes [0..8].
    int imageIndex = static_cast<int>( (float)((inputValue - minInputFromOtherModule) * 8.0) / (float)(maxInputFromOtherModule - minInputFromOtherModule + 1) + 0.5 );
    return imageIndex;
}
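Since the control has to cope with several ranges (5 to 25, 0 to 7, -9 to 9), the same proportional idea can take the range as parameters. The following is a sketch under my own assumptions, not the code above: toImageIndex is a hypothetical name, out-of-range inputs are clamped, and the denominator is max - min (without the +1) so that the minimum maps exactly to index 0 and the maximum exactly to the last index.

```cpp
#include <algorithm>
#include <cmath>

// Maps value in [minValue, maxValue] to the nearest index in [0, imageCount - 1].
// Inputs outside the range are clamped; a degenerate range yields index 0.
inline int toImageIndex(int value, int minValue, int maxValue, int imageCount = 9) {
    if (maxValue <= minValue || imageCount < 2) return 0;
    const int clamped = std::min(std::max(value, minValue), maxValue);
    const double t = static_cast<double>(clamped - minValue) / (maxValue - minValue);
    return static_cast<int>(std::lround(t * (imageCount - 1)));
}
```

For example, toImageIndex(15, 5, 25) and toImageIndex(0, -9, 9) both land on the middle image, index 4.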
Yes, a lookup table is a good solution:
int lookup[9] = {5, 25, ... the other values };
int id1 = floor(slider);
int id2 = id1+1;
int texId1 = lookup[id1];
int texId2 = lookup[id2];
interpolate(texId1, texId2, slider - float(id1));
I've found that functions which follow the pattern 1 / b^c produce nice curves which can be coupled with interpolation functions really nicely.
The way I use the function is by treating 'c' as the changing value, i.e. the interpolation value between 0 and 1, while varying b for 'sharpness'. I use it to work out an interpolation value between 0 and 1, so generally the function I use is as such:
float interpolationvalue = 1 - 1/pow(100,c);
linearinterpolate( val1, val2, interpolationvalue);
Up to this point I've been using a hacked approach to make it 'work', since when the interpolation value should be 1 the 1 / b^c term is very close to, but not quite, 0.
So I was wondering: is there a function of this form, or one which can reproduce curves similar to the ones produced by 1 / b^c, where at c = 0 result = 1 and at c = 1 result = 0?
Or even c = 0, result = 0 and c = 1, result = 1.
Thanks for any help!
For interpolation, the approach offering the most flexibility is using splines; in your case quadratic splines would seem sufficient. The Wikipedia page is math heavy, but you can find adapted descriptions on Google.
1 - c ^ b with small values for b? Another option would be to use a cubic polynomial and specifying the slope at 0 and 1.
You could use a similar curve of the form A - 1 / b^(c + a), choosing values of A and a to match your constraints. So, for c = 0, result = 1:
1 = A - 1/b^a => A = 1 + 1/b^a
and for c = 1, result = 0:
0 = A - 1/b^(1+a) => A = 1/b^(1+a)
Combining these, we can find a in terms of b:
1 + 1/b^a = 1/b^(1+a)
b^(1+a) + b = 1
b * (b^a + 1) = 1
b^a = 1/b - 1
So:
a = log_b(1/b - 1) = log(1/b - 1) / log(b)
A = 1 + 1/b^a = 1 / (1-b)
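As a quick check of the result, here is the curve in code. This is a sketch; fittedCurve is a made-up name, and it requires 0 < b < 1 so that 1/b - 1 is positive and the logarithm is defined.

```cpp
#include <cmath>

// f(c) = A - 1 / b^(c + a), with a and A chosen as derived above so that
// f(0) = 1 and f(1) = 0. Requires 0 < b < 1.
inline double fittedCurve(double c, double b) {
    const double a = std::log(1.0 / b - 1.0) / std::log(b);  // a = log_b(1/b - 1)
    const double A = 1.0 / (1.0 - b);
    return A - 1.0 / std::pow(b, c + a);
}
```

For the other orientation (0 at c = 0, 1 at c = 1), use 1 - fittedCurve(c, b). Values of b close to 0 give a sharper curve.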
In real numbers, the ones that mathematicians use, no function of the form you specify is ever going to return 0; division can't do that. (1/x)==0 has no real solutions. In floating point arithmetic, the poor relation of real arithmetic that computers use, you could write 1/(MAX_FP_VALUE^1), which will give you as close to 0 as you are ever going to get (actually, it might give you a NaN or one of the other odd returns that IEEE 754 allows).
And, as I'm sure you've noticed, 1/(b^0) always returns 1 since b^0 is, by definition of 0-th power, always 1.
So, no function with c = 0 will produce a result of 0.
For c = 1, result = 1, set b = 1
But I guess this is only a partial answer, I'm not terribly sure I understand what you are trying to do.
Regards
Mark