C++ MFC RegEx issue

I am using a regex to restrict the characters that can be entered into a text box. I am using the pattern below for the allowed characters:
CAtlRegExp<> regex;
CString csText2 = "Some Test £";
CString m_szRegex = "([a-zA-Z0-9\\.\\,\";\\:'##$£?\\+\\*\\-\\/\\%! ()])";
REParseError status = regex.Parse(m_szRegex, true);
CAtlREMatchContext<> mc;
if (!regex.Match(csText2, &mc))
{
    AfxMessageBox("Invalid Char");
}
This works fine, except for the £ symbol, which it doesn't seem to pick up. Can anyone advise on what I am missing? Thanks.

This seems to be a bug that affects all extended ASCII characters (those above 0x7F).
The character value is converted to an integer and used as an index into some kind of attribute array. Since plain char is signed, the value undergoes sign extension, so any character above 0x7F becomes negative and indexes outside the array.
size_t u = static_cast<size_t>(static_cast<_TUCHAR>(* ((RECHAR *) sz)));
if (pBits[u >> 3] & 1 << (u & 0x7))
You can find more discussions on this topic here: CAtlRegExp crashes with pound sign!
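As an illustration only (this is not the ATL source; the array here is a made-up stand-in for the attribute table), a sketch of the sign-extension problem: indexing with a plain char above 0x7F produces a negative index unless the value goes through unsigned char first.
#include <iostream>

int main() {
    unsigned char attributes[256] = {};   // hypothetical stand-in for the parser's attribute table
    char pound = static_cast<char>(0xA3); // '£' in Latin-1; plain char is signed on most platforms

    int bad_index = pound;                              // sign-extended: -93
    int good_index = static_cast<unsigned char>(pound); // 163

    std::cout << bad_index << " vs " << good_index << '\n';        // prints: -93 vs 163
    std::cout << static_cast<int>(attributes[good_index]) << '\n'; // in range; attributes[bad_index] would not be
    return 0;
}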


When should I use single quotes and double quotes in C or C++ programming?
In C and in C++, single quotes identify a single character, while double quotes create a string literal. 'a' is a character literal, while "a" is a string literal containing an 'a' and a null terminator (that is, a two-char array).
In C++ the type of a character literal is char, but note that in C the type of a character literal is int; that is, sizeof 'a' is 4 on an architecture where int is 32 bits (and CHAR_BIT is 8), while sizeof(char) is 1 everywhere.
Some compilers also implement an extension that allows multi-character constants. The C99 standard says:
6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
This could look like this, for instance:
const uint32_t png_ihdr = 'IHDR';
The resulting constant (in GCC, which implements this) has the value you get by taking each character and shifting it up, so that 'I' ends up in the most significant bits of the 32-bit value. Obviously, you shouldn't rely on this if you are writing platform independent code.
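For the curious, a quick check of that packing (the exact value is implementation-defined, as the quoted paragraph says; GCC and Clang happen to produce 0x49484452 here, and will warn with -Wmultichar):
#include <cstdint>
#include <cstdio>

int main() {
    const std::uint32_t png_ihdr = 'IHDR'; // multi-character constant
    // 'I' = 0x49, 'H' = 0x48, 'D' = 0x44, 'R' = 0x52, with 'I' in the most significant byte
    std::printf("0x%08X\n", static_cast<unsigned int>(png_ihdr)); // prints 0x49484452 on GCC/Clang
    return 0;
}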
Single quotes are characters (char), double quotes are null-terminated strings (arrays of char).
char c = 'x';
char *s = "Hello World";
'x' is an integer, representing the numerical value of the letter x in the machine's character set.
"x" is an array of characters, two characters long, consisting of 'x' followed by '\0'.
I was poking around with things like int cc = 'cc';. It turns out to be essentially a byte-wise copy into an integer, so the two 'c' characters end up in the lower two bytes of the integer cc. As a bit of trivia,
printf("%d %d", 'c', 'cc'); would give:
99 25443
that's because 25443 = 99 + 256*99
So 'cc' is a multi-character constant and not a string.
Single quotes are for a single character. Double quotes are for a string (array of characters). You can use single quotes to build up a string one character at a time, if you like.
char myChar = 'A';
char myString[] = "Hello Mum";
char myOtherString[] = { 'H','e','l','l','o','\0' };
single quote is for character;
double quote is for string.
In C, single-quotes such as 'a' indicate character constants whereas "a" is an array of characters, always terminated with the \0 character
Double quotes are for string literals, e.g.:
char str[] = "Hello world";
Single quotes are for single character literals, e.g.:
char c = 'x';
EDIT: As David stated in another answer, in C the type of a character literal is int.
A single quote is used for character, while double quotes are used for strings.
For example...
printf("%c \n",'a');
printf("%s","Hello World");
Output
a
Hello World
If you use them the other way around (a single quote for a string and double quotes for a character), this will be the result:
printf("%c \n","a");
printf("%s",'Hello World');
output :
For the first line, you will get a garbage or otherwise unexpected value; you may see output like this:
�
For the second statement, you will see nothing at all. Moreover, any statements after it will produce no output either.
Note: the PHP language gives you the flexibility to use both single and double quotes for strings.
Use single quote with single char as:
char ch = 'a';
here 'a' is a char constant and is equal to the ASCII value of char a.
Use double quote with strings as:
char str[] = "foo";
here "foo" is a string literal.
It's okay to use "a", but it's not okay to use 'foo'.
Single quotes denote a char, double quotes denote a string.
It is the same in Java.
While I'm sure this doesn't answer what the original asker asked, in case you end up here looking for single quotes inside integer literals, as I did...
C++14 added the ability to add single quotes (') in the middle of number literals to add some visual grouping to the numbers.
constexpr int oneBillion = 1'000'000'000;
constexpr int binary = 0b1010'0101;
constexpr int hex = 0x12'34'5678;
constexpr double pi = 3.1415926535'8979323846'2643383279'5028841971'6939937510;
In C and C++ single quotes denote a character ('a'), whereas double quotes denote a string ("Hello"). The difference is that a character holds exactly one letter, digit, or symbol, while a string can hold a whole sequence of them.
But also remember that there is a difference between '1' and 1.
If you type
cout << '1' << endl << 1;
The output would be the same, but not in this case:
cout << int('1') << endl << int(1);
This time the first line would be 49: converting a character to an int gives its ASCII code, and the ASCII code for '1' is 49.
Similarly, if you do:
string s = "Hi";
s += 49;  // this appends '1' to the string (49 is the ASCII code for '1')
s += "1"; // this also appends "1" to the string
Different ways to declare a char / string:
#include <cstdio>

int main() {
    // single quotes are for a single char
    char char_simple = 'a';        // 1 byte: -128 to 127 or 0 to 255 (signedness is implementation-defined)
    signed char char_signed = 'a'; // 1 byte: -128 to 127
    unsigned char char_u = 'a';    // 1 byte: 0 to 255
    // double quotes are for strings
    char string_simple[] = "myString";
    char string_simple_2[] = {'m', 'y', 'S', 't', 'r', 'i', 'n', 'g', '\0'};
    char string_fixed_size[9] = "myString";
    const char *string_pointer = "myString";
    char string_pointer_2 = *"myString"; // just the first character, 'm'

    printf("char = %zu\n", sizeof(char_simple));
    printf("char_signed = %zu\n", sizeof(char_signed));
    printf("char_u = %zu\n", sizeof(char_u));
    printf("string_simple[] = %zu\n", sizeof(string_simple));
    printf("string_simple_2[] = %zu\n", sizeof(string_simple_2));
    printf("string_fixed_size[9] = %zu\n", sizeof(string_fixed_size));
    printf("*string_pointer = %zu\n", sizeof(string_pointer));
    printf("string_pointer_2 = %zu\n", sizeof(string_pointer_2));
    return 0;
}

Conversion from int to char failed and console prints weird symbol

I am having a weird issue and I don't know how to explain it. When I run this code it prints a strange, unreadable symbol.
This is my code:
#include <iostream>
#include <cstdlib>

int main() {
    int num = 1;
    char number = num;
    std::cout << number << std::endl;
    system("PAUSE");
    return 0;
}
I don't understand why. Normally it should convert the integer to a char. I am using Dev-C++ and my language standard is ISO C++11. I have been programming for 4 years now, and this is the first time I have seen something like this. I hope I explained my issue; if someone can help me, I will be grateful.
Conversion from int to char failed
Actually, int was successfully converted to char.
Normally it should convert the integer to char.
That's what it did. The result of the conversion is char with the value 1.
Computers use a "character encoding". Each symbol that you see on the screen is encoded as a number. For example (assuming ASCII or a compatible encoding), the value of the character 'a' is 97.
A char with value of 1 is not the same as char with the value that encodes the character '1'. As such, when you print a character with value 1, you don't see the number 1, but the character that the value 1 encodes. In the ASCII and compatible encodings, 1 encodes a non-visible symbol "start of heading".
I wanted to print 1 as a char.
You can do it like this:
std::cout << '1' << '\n';
Also, it seems that for four years you have misunderstood what char is. It is not directly a character, but a number; it is the character encoding that turns that number into a readable character.
Essentially, char i = 1 is not the same as char i = '1' (see an ASCII table).
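A minimal sketch of that difference (the printed values assume an ASCII-compatible encoding):
#include <iostream>

int main() {
    char a = 1;   // the unprintable SOH (start of heading) control character
    char b = '1'; // the printable digit, value 49 in ASCII
    std::cout << static_cast<int>(a) << ' ' << static_cast<int>(b) << '\n'; // prints: 1 49
    std::cout << b << '\n';                                                 // prints: 1
    return 0;
}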

VS Intellisense shows escaped characters for some (not all) byte constants

In Visual Studio C++ I have defined a series of channelID constants with decimal values from 0 to 15. I have made them of type uint8_t for reasons having to do with the way they are used in the embedded context in which this code runs.
When hovering over one of these constants, I would like intellisense to show me the numeric value of the constant. Instead, it shows me the character representation. For some non-printing values it shows an escaped character representing some ASCII value, and for others it shows an escaped octal value for that character.
const uint8_t channelID_Observations = 1; // '\001'
const uint8_t channelID_Events = 2; // '\002'
const uint8_t channelID_Wav = 3; // '\003'
const uint8_t channelID_FFT = 4; // '\004'
const uint8_t channelID_Details = 5; // '\005'
const uint8_t channelID_DebugData = 6; // '\006'
const uint8_t channelID_Plethysmography = 7; // '\a'
const uint8_t channelID_Oximetry = 8; // '\b'
const uint8_t channelID_Position = 9; // ' ' ** this is displayed as a space between single quotes
const uint8_t channelID_Excursion = 10; // '\n'
const uint8_t channelID_Motion = 11; // '\v'
const uint8_t channelID_Env = 12; // '\f'
const uint8_t channelID_Cmd = 13; // '\r'
const uint8_t channelID_AudioSnore = 14; // '\016'
const uint8_t channelID_AccelSnore = 15; // '\017'
Some of the escaped codes are easily recognized and the hex or decimal equivalents easily remembered (\n == newline == 0x0A) but others are more obscure. For example decimal 7 is shown as '\a', which in some systems represents the ASCII BEL character.
Some of the representations are mystifying to me -- for example decimal 9 would be an ASCII tab, which today often appears as '\t', but intellisense shows it as a space character.
Why is an 8-bit unsigned integer always treated as a character, no matter how I try to define it as a numeric value?
Why are only some, but not all of these characters shown as escaped symbols for their ASCII equivalents, while others get their octal representation?
What is the origin of the obscure symbols used? For example, '\a' for decimal 7 matches the ISO-defined Control0 set, which has a unicode representation -- but then '\t' should be shown for decimal 9. Wikipedia C0 control codes
Is there any way to make intellisense hover tips show me the numeric value of such constants rather than a character representation? Decoration? VS settings? Typedefs? #defines?
You are misreading IntelliSense: 0x7 is '\a', not the literal character 'a'. '\a' is the bell/alarm character.
See the following article on escape sequences - https://en.wikipedia.org/wiki/Escape_sequences_in_C
'\a' does indeed have the value 0x7. If you assign 0x07 to a uint8_t, you can be pretty sure that the compiler will not change that assignment to something else. IntelliSense just represents the value in another way, it doesn't change your values.
Also, 'a' has the value 0x61, that's what probably tripped you up.
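A tiny sketch of those values (assuming an ASCII-compatible execution character set):
#include <cstdio>

int main() {
    // '\a' is the alert (bell) control character, value 7; 'a' is the letter, value 97 (0x61)
    std::printf("'\\a' = %d, 'a' = %d\n", '\a', 'a');        // prints: '\a' = 7, 'a' = 97
    std::printf("'\\016' = %d, '\\t' = %d\n", '\016', '\t'); // prints: '\016' = 14, '\t' = 9
    return 0;
}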
After more than a year, I've decided to document what I found when I pursued this further. The correct answer was implied in djgandy's answer, which cited Wikipedia, but I want to make it explicit.
With the exception of one value (0x09), Intellisense does appear to treat these values consistently, and that treatment is rooted in an authoritative source: my constants are unsigned 8-bit constants, thus they are "character-constants" per the C11 language standard (section 6.4.4).
For character constants that do not map to a displayable character, Section 6.4.4.4 defines their syntax as
6.4.4.4 Character constants
Syntax
. . .
simple-escape-sequence: one of
    \'  \"  \?  \\
    \a  \b  \f  \n  \r  \t  \v
octal-escape-sequence:
    \ octal-digit
    \ octal-digit octal-digit
    \ octal-digit octal-digit octal-digit
"Escape Sequences" are further defined in the C language definition section 5.2.2:
§5.2.2 Character display semantics
2) Alphabetic escape sequences representing nongraphic characters in the execution character set are intended to produce actions on display devices as follows:
\a (alert) Produces an audible or visible alert without changing the active position.
\b (backspace) Moves the active position to the previous position on the current line. If the active position is at the initial position of a line, the behavior of the display device is unspecified.
\f (form feed) Moves the active position to the initial position at the start of the next logical page.
\n (new line) Moves the active position to the initial position of the next line.
\r (carriage return) Moves the active position to the initial position of the current line.
\t (horizontal tab) Moves the active position to the next horizontal tabulation position on the current line. If the active position is at or past the last defined horizontal tabulation position, the behavior of the display device is unspecified.
\v (vertical tab) Moves the active position to the initial position of the next vertical tabulation position. If the active position is at or past the last defined vertical tabulation position, the behavior of the display device is unspecified.
3) Each of these escape sequences shall produce a unique implementation-defined value which can be stored in a single char object. The external representations in a text file need not be identical to the internal representations, and are outside the scope of this International Standard.
Thus the only place where IntelliSense falls down is in its handling of 0x09, which should be displayed as '\t' but is actually displayed as ' ' (a literal whitespace character between the quotes).
So what's that all about? I suspect Intellisense considers a tab to be a printable character, but suppresses the tab action in its formatting. This seems to me inconsistent with the C and C++ standards and is also inconsistent with its treatment of other escape characters, but perhaps there's some justification for it that "escapes" me :)

Why does the size of this std::string change, when characters are changed?

I have an issue in which the size of the string is affected by the presence of a '\0' character. I searched all over SO and still could not find the answer.
Here is the snippet.
#include <iostream>
#include <string>

int main()
{
    std::string a = "123123\0shai\0";
    std::cout << a.length();
}
http://ideone.com/W6Bhfl
The output in this case is
6
Whereas the same program with a different string, containing digits instead of letters,
#include <iostream>
#include <string>

int main()
{
    std::string a = "123123\0123\0";
    std::cout << a.length();
}
http://ideone.com/mtfS50
gives an output of
8
What exactly is happening under the hood? How does presence of a '\0' character change the behavior?
The sequence \012 when used in a string (or character) literal is an octal escape sequence. It's the octal number 12 which corresponds to the ASCII linefeed ('\n') character.
That means your second string is actually equal to "123123\n3\0" (plus the actual string literal terminator).
It would have been very clear if you tried to print the contents of the string.
Octal sequences are one to three digits long, and the compiler will use as many digits as possible.
If you check the coloring at ideone you will see that \012 has a different color. That is because this is a single character written in octal.
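If the goal is to keep the embedded null characters, one option (a sketch, not the only way) is to construct the std::string with an explicit length, since the const char* constructor stops at the first '\0':
#include <iostream>
#include <string>

int main() {
    std::string a = "123123\0shai\0";    // the const char* constructor stops at the first '\0'
    std::string b("123123\0shai\0", 12); // an explicit length keeps both embedded nulls
    std::cout << a.length() << ' ' << b.length() << '\n'; // prints: 6 12
    return 0;
}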

Regarding conversion of text to hex via ASCII in C++

So, I've looked up how to do conversion from text to hexadecimal according to ASCII, and I have a working solution (proposed on here). My problem is that I don't understand why it works. Here's my code:
#include <string>
#include <iostream>
int main()
{
    std::string str1 = "0123456789ABCDEF";
    std::string output[2];
    std::string input;
    std::getline(std::cin, input);

    output[0] = str1[input[0] & 15];
    output[1] = str1[input[0] >> 4];
    std::cout << output[1] << output[0] << std::endl;
}
Which is all well and good - it returns the hexadecimal value for single characters, however, what I don't understand is this:
input[0] & 15
input[0] >> 4
How can you perform bitwise operations on a character from a string? And why does it oh-so-nicely return the exact values we're after?
Thanks for any help! :)
In C++ a char is typically 8 bits long (CHAR_BIT is 8 on virtually every platform).
If you AND it with 15 (binary 1111), only the least significant 4 bits are kept, which gives you the low hex digit.
Right-shifting by 4 is equivalent to dividing the character's value by 16; this gives you the most significant 4 bits, i.e. the high hex digit.
Once the two nibble values are calculated, the corresponding hex characters are looked up in the constant string str1, which holds the sixteen hex digits at their respective index positions.
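As a sketch only (not part of the original answer), the same nibble trick applied to a whole string; going through unsigned char keeps the mask and shift well-defined for byte values above 0x7F:
#include <iostream>
#include <string>

int main() {
    const std::string digits = "0123456789ABCDEF";
    std::string input = "Hi!";
    std::string hex;

    for (unsigned char c : input) { // treat each character as an 8-bit number
        hex += digits[c >> 4];      // high nibble first
        hex += digits[c & 15];      // then the low nibble
    }
    std::cout << hex << '\n';       // prints: 486921
    return 0;
}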
"Characters in a string" are not characters (individual strings of one character only). In some programming languages they are. In Javascript, for example,
var string = "testing 1,2,3";
var character = string[0];
returns "t".
In C and C++, however, 'strings' are arrays of characters; each element of the array is just a small integer (0..255 when viewed as an unsigned 8-bit value).
Characters are just integers. In ASCII the character '0' is the integer 48. C++ makes this conversion implicitly in many contexts, including the one in your code.