How can I include a unit separator (value 31 in the ASCII table) in a string without using snprintf()? I want to do it the way a string is normally initialized,
e.g.
char a[100] = "abc";
31 in dec = 0x1f in hex. Therefore,
char x[] = "blah\x1f" "blah";
// ^^^^ unit separator.
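The string is split into two literals so that the compiler does not read the escape sequence as \x1fb: a hex escape consumes every hex digit that follows it, and 'b' is one. The intended value is 0x1f, which is 31 in decimal. Alternatively, you could use an octal escape sequence, which is at most three digits long: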
char x[] = "blah\037blah";
// ^^^^ unit separator.
You can do:
char str[] = {'a',31,'b','c',0};
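As a quick check that the hex, octal, and array spellings produce the same bytes, here is a minimal sketch in C. The helper same_bytes and the array names are made up for illustration; the text "blah" is taken from the answers above.

```c
#include <stddef.h>
#include <string.h>

/* Three spellings of "blah", a unit separator (31), then "blah". */
static const char us_hex[]   = "blah\x1f" "blah";  /* split so 'b' is not read as a hex digit */
static const char us_octal[] = "blah\037blah";     /* an octal escape stops after three digits */
static const char us_array[] = {'b', 'l', 'a', 'h', 31, 'b', 'l', 'a', 'h', 0};

/* Hypothetical helper: returns 1 if two buffers have the same size and bytes. */
int same_bytes(const void *a, size_t a_size, const void *b, size_t b_size)
{
    return a_size == b_size && memcmp(a, b, a_size) == 0;
}
```

Calling same_bytes(us_hex, sizeof us_hex, us_octal, sizeof us_octal) should return 1, and the same holds for us_array.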
When should I use single quotes and double quotes in C or C++ programming?
In C and in C++ single quotes identify a single character, while double quotes create a string literal. 'a' is a single a character literal, while "a" is a string literal containing an 'a' and a null terminator (that is a 2 char array).
In C++ the type of a character literal is char, but note that in C, the type of a character literal is int, that is sizeof 'a' is 4 in an architecture where ints are 32bit (and CHAR_BIT is 8), while sizeof(char) is 1 everywhere.
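The size difference is easy to observe when the code is compiled as C (not C++). This is a small sketch; the function names are illustrative only.

```c
#include <stddef.h>

/* In C, a character constant such as 'a' has type int,
   so its size is sizeof(int). */
size_t char_constant_size(void)
{
    return sizeof 'a';
}

/* A char object is always exactly one byte, whatever it was initialised with. */
size_t char_object_size(void)
{
    char c = 'a';
    return sizeof c;
}
```

Compiled as C++, char_constant_size() would return 1 instead, because there the literal has type char.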
C also allows multi-character constants, although their value is left to the implementation. The C99 standard says:
6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
This could look like this, for instance:
const uint32_t png_ihdr = 'IHDR';
The resulting constant (in GCC, which implements this) has the value you get by taking each character and shifting it up, so that 'I' ends up in the most significant bits of the 32-bit value. Obviously, you shouldn't rely on this if you are writing platform independent code.
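If the packed value is needed in portable code, one option is to build it explicitly rather than rely on a multi-character constant. The sketch below assumes the "first character in the most significant byte" layout described above; make_tag is a hypothetical helper, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: packs four characters into a 32-bit value,
   with the first character in the most significant byte. */
uint32_t make_tag(char a, char b, char c, char d)
{
    return ((uint32_t)(unsigned char)a << 24) |
           ((uint32_t)(unsigned char)b << 16) |
           ((uint32_t)(unsigned char)c << 8)  |
            (uint32_t)(unsigned char)d;
}
```

On an ASCII system, make_tag('I', 'H', 'D', 'R') yields 0x49484452 regardless of how the compiler treats 'IHDR'.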
Single quotes are characters (char), double quotes are null-terminated strings (arrays of char, usually accessed through a char *).
char c = 'x';
char *s = "Hello World";
'x' is an integer, representing the numerical value of the letter x in the machine's character set.
"x" is an array of characters, two characters long, consisting of 'x' followed by '\0'.
I was poking around with things like int cc = 'cc'; and found that, with my compiler, it is basically a byte-wise copy into an integer: the two c's of 'cc' end up in the lower 2 bytes of the integer cc. If you are looking for trivia, then
printf("%d %d", 'c', 'cc'); would give:
99 25443
that's because 25443 = 99 + 256*99
So 'cc' is a multi-character constant and not a string.
Cheers
Single quotes are for a single character. Double quotes are for a string (array of characters). You can use single quotes to build up a string one character at a time, if you like.
char myChar = 'A';
char myString[] = "Hello Mum";
char myOtherString[] = { 'H','e','l','l','o','\0' };
single quote is for character;
double quote is for string.
In C, single quotes such as 'a' indicate character constants, whereas "a" is an array of characters, always terminated with the \0 character.
Double quotes are for string literals, e.g.:
char str[] = "Hello world";
Single quotes are for single character literals, e.g.:
char c = 'x';
EDIT: As David stated in another answer, in C the type of a character literal is int.
A single quote is used for character, while double quotes are used for strings.
For example...
printf("%c \n",'a');
printf("%s","Hello World");
Output
a
Hello World
If you swap them, using single quotes for the string and double quotes for the character, this is the result:
printf("%c \n","a");
printf("%s",'Hello World');
Output:
For the first line you will get a garbage or unexpected value, because a pointer is passed where %c expects a character. The output may look like this:
�
The second statement will most likely print nothing or crash the program, because 'Hello World' is a multi-character constant (an int), not a string, and passing it for %s is undefined behavior. Any statements after it may not produce output either.
Note: PHP language gives you the flexibility to use single and double-quotes easily.
Use single quote with single char as:
char ch = 'a';
here 'a' is a char constant and is equal to the ASCII value of char a.
Use double quote with strings as:
char str[] = "foo";
here "foo" is a string literal.
It's okay to use "a", but it's not okay to use 'foo'.
Single quotes denote a char, double quotes denote a string.
In Java, it is also the same.
While I'm sure this doesn't answer what the original asker asked, in case you end up here looking for single quote in literal integers like I have...
C++14 added the ability to add single quotes (') in the middle of number literals to add some visual grouping to the numbers.
constexpr int oneBillion = 1'000'000'000;
constexpr int binary = 0b1010'0101;
constexpr int hex = 0x12'34'5678;
constexpr double pi = 3.1415926535'8979323846'2643383279'5028841971'6939937510;
In C and C++, single quotes denote a character ('a'), whereas double quotes denote a string ("Hello"). The difference is that a character holds exactly one letter, digit, or other symbol, while a string can hold any number of them.
But also remember that there is a difference between '1' and 1.
If you type
cout<<'1'<<endl<<1;
The output would be the same, but not in this case:
cout<<int('1')<<endl<<int(1);
This time the first line would be 49, because converting a character to an int yields its character code, and the ASCII code for '1' is 49.
Similarly, if you do:
string s="Hi";
s+=49; // This will add "1" to the string
s+="1"; // This will also add "1" to the string
Different ways to declare a char or a string:
char char_simple = 'a'; // 1 byte: -128 to 127 or 0 to 255, depending on the implementation
signed char char_signed = 'a'; // 1 byte: -128 to 127
unsigned char char_u = 'a'; // 1 byte: 0 to 255
// double quote is for string.
char string_simple[] = "myString"; // 9 bytes: 8 characters plus '\0'
char string_simple_2[] = {'m', 'y', 'S', 't', 'r', 'i', 'n', 'g', '\0'}; // same contents, built character by character
char string_fixed_size[9] = "myString"; // 9, not 8, to leave room for '\0'
const char *string_pointer = "myString"; // pointer to a string literal
char string_pointer_2 = *"myString"; // dereferencing the literal gives its first character, 'm'
// sizeof yields a size_t, which is printed with %zu
printf("char = %zu\n", sizeof(char_simple));
printf("char_signed = %zu\n", sizeof(char_signed));
printf("char_u = %zu\n", sizeof(char_u));
printf("string_simple[] = %zu\n", sizeof(string_simple));
printf("string_simple_2[] = %zu\n", sizeof(string_simple_2));
printf("string_fixed_size[9] = %zu\n", sizeof(string_fixed_size));
printf("*string_pointer = %zu\n", sizeof(string_pointer));
printf("string_pointer_2 = %zu\n", sizeof(string_pointer_2));
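The sizes printed for arrays and pointers are often confused with the string length, so here is a short sketch of the difference. The function names are illustrative; the literal "myString" is reused from the listing above.

```c
#include <stddef.h>
#include <string.h>

/* sizeof on an array counts every byte, including the terminating '\0'. */
size_t array_bytes(void)
{
    char s[] = "myString";
    return sizeof s;    /* 8 characters + 1 terminator */
}

/* strlen counts characters up to, but not including, the terminator. */
size_t string_length(void)
{
    char s[] = "myString";
    return strlen(s);
}

/* sizeof on a pointer gives the size of the pointer itself, not of the text. */
size_t pointer_bytes(void)
{
    const char *p = "myString";
    return sizeof p;
}
```

The pointer size depends on the platform (commonly 4 or 8), which is why it should not be used to measure a string.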
I would like a pattern like ".c" to match any single UTF-8 character followed by 'c', using std::regex.
I've tried with Microsoft C++ and g++ and get the same result: the "." only ever matches a single byte.
Here's my test case:
#include <stdio.h>
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main(int argc, char** argv)
{
// make a string with 3 UTF8 characters
const unsigned char p[] = { 'a', 0xC2, 0x80, 'c', 0 };
string tobesearched((char*)p);
// want to match the UTF8 character before c
string pattern(".c");
regex re(pattern);
std::smatch match;
bool r = std::regex_search(tobesearched, match, re);
if (r)
{
// m.size() will be bytes, and we expect 3
// expect 0xC2, 0x80, 'c'
string m = match[0];
cout << "match length " << m.size() << endl;
// but we only get 2, we get the 0x80 and the 'c'.
// so it's matching on single bytes and not utf8
// code here is just to dump out the byte values.
for (int i = 0; i < m.size(); ++i)
{
int c = m[i] & 0xff;
printf("%02X ", c);
}
printf("\n");
}
else
cout << "not matched\n";
return 0;
}
I wanted the pattern ".c" to match 3 bytes of my tobesearched string, where the first two are a 2-byte utf8 character followed by 'c'.
Some regex flavours support \X which will match a single unicode character, which may consist of a number of bytes depending on the encoding. It is common practice for regex engines to get the bytes of the subject string in an encoding the engine is designed to work with, so you shouldn't have to worry about the actual encoding (whether it is US-ASCII, UTF-8, UTF-16 or UTF-32).
Another option is the \uFFFF where FFFF refers to the unicode character at that index in the unicode charset. With that, you could create a ranged match inside a character class i.e. [\u0000-\uFFFF]. Again, it depends on what the regex flavour supports. There is another variant of \u in \x{...} which does the same thing, except the unicode character index must be supplied inside curly braces, and need not be padded e.g. \x{65}.
Edit: This website is amazing for learning more about regex across various flavours https://www.regular-expressions.info
Edit 2: To match any non-ASCII character, i.e. excluding the 1-byte characters of the ASCII table, you can try "[\x{80}-\x{10FFFF}]", i.e. any character with a value from 128 (the first code point outside the ASCII range) to 1,114,111 (U+10FFFF, the last Unicode code point). UTF-8 encodes these in 2 to 4 bytes (the original design allowed up to 6).
A loop through the individual bytes would be more efficient, though:
If the lead bit is 0, i.e. if its signed value is > -1, it is a 1 byte char representation. Skip to the next byte and start again.
Else if the lead bits are 11110 i.e. if its signed value is > -17, n=4.
Else if the lead bits are 1110 i.e. if its signed value is > -33, n=3.
Else if the lead bits are 110 i.e. if its signed value is > -65, n=2.
Optionally, check that the following n-1 bytes each start with 10, i.e. each has a signed value < -64; if one of them does not, the input is invalid UTF-8.
You now know that the previous n bytes constitute a unicode-exclusive character. So, if the NEXT character is 'c' i.e. == 99, you can say it matched - return true.
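The byte loop described above can be sketched in C roughly as follows. Both function names are hypothetical, the code works on unsigned bytes rather than signed comparisons, and it is only a sketch: it checks continuation bytes but does not reject overlong or out-of-range encodings.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
   or 0 if b cannot start a sequence (continuation or invalid byte). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* Hypothetical helper: finds the first multi-byte UTF-8 character that is
   immediately followed by `next`. On success returns 1 and stores the byte
   offset of the match and its length (character bytes plus one for `next`). */
int find_multibyte_then(const char *s, char next, size_t *offset, size_t *length)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (p[i] != 0) {
        size_t n = utf8_seq_len(p[i]);
        if (n == 0) {               /* stray continuation byte: skip it */
            i++;
            continue;
        }
        /* Verify the n-1 continuation bytes (10xxxxxx). The terminating
           NUL fails this test, so the loop never reads past the string. */
        size_t k = 1;
        while (k < n && (p[i + k] & 0xC0) == 0x80)
            k++;
        if (k < n) {                /* truncated sequence: resume at the offending byte */
            i += k;
            continue;
        }
        if (n > 1 && p[i + n] == (unsigned char)next) {
            *offset = i;
            *length = n + 1;
            return 1;
        }
        i += n;
    }
    return 0;
}
```

For the test string from the question (the bytes 'a', 0xC2, 0x80, 'c'), this reports a match at offset 1 with length 3, which is the result the asker expected from ".c".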
In other words, based on the ASCII table, from the range of '0' to '9',
how may I convert them into integers 0 to 9?
A solution such as:
char a = '6';
int b = a-48;
has already been floating around these parts, but I was wondering if there are other ways to go about this without the use of magic numbers?
Since '0' is not guaranteed to be 48, but the numbers are guaranteed to be consecutive, you can use a-'0'.
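A minimal sketch of that idea in C, with a range check added; digit_value is a made-up name and the -1 error value is just one possible convention.

```c
#include <ctype.h>

/* Hypothetical helper: returns 0..9 for '0'..'9', or -1 for any other character. */
int digit_value(char c)
{
    if (!isdigit((unsigned char)c))
        return -1;
    return c - '0';   /* the digits are guaranteed to be consecutive */
}
```

For example, digit_value('6') gives 6 and digit_value('x') gives -1.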
If you really want to, you could use a stringstream like this:
#include <string>
#include <sstream>
int charToInt(char c) {
// initialize a buffered stream with a 1-character string
std::stringstream ss(std::string(1,c));
// read an int from the stream
int v;
ss >> v;
return v;
}
Not the simplest way to do the conversion, but this way you don't see any of the implementation details involving a "magic" number or character. You can also detect errors: if the character is not a digit, the extraction fails and the stream's failbit is set (no exception is thrown unless you enable one with ss.exceptions()).
On the other hand, if you're absolutely certain that the character c is in the '0'..'9' range, I don't see why not use c - '0'.
Another solution is to replace c - 48 with c & 0xf, but that still involves magic numbers and is less readable than c - '0'
The ASCII table is laid out so that this kind of conversion is easy: the digit characters sit in one contiguous block, as do the uppercase and the lowercase letters.
The digits begin at 0x30, so 0x30 is '0', 0x31 is '1', 0x32 is '2', and so on; subtracting 0x30 gives the numeric value.
char number='2';
int numberValue = (int)number - 0x30; /* you can subtract '0' instead */
Converting an int back to a char works the same way: just add 0x30.
int numberValue=5;
char number = (char)(numberValue + 0x30); /* or add '0' to your variable */
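The reverse direction can be wrapped the same way, using '0' instead of the literal 0x30. This is a sketch; digit_char is a hypothetical name, and the assert only documents the assumption that the input is a single decimal digit.

```c
#include <assert.h>

/* Hypothetical helper: maps 0..9 to '0'..'9'. */
char digit_char(int value)
{
    assert(value >= 0 && value <= 9);   /* only single decimal digits make sense here */
    return (char)('0' + value);
}
```

With this, digit_char(5) returns '5' on any character set, not just ASCII.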
Subtract ASCII zero from the number:
char a = '2';
int b = a-'0';
If you can't use '0', how about that kind of cheating?
(int)(a + 2) % 10;
If it's a char, not a char pointer, you can do this:
int convert (char x)
{
return (int(x) - int('0'));
}
How do you initialize special ASCII chars, for example EOT (0x04), ENQ(0x05)?
char CHAR1 = '\EOT';
char CHAR2 = '\ENQ';
Is this correct?
You can put character code into the variable:
char CHAR1 = 4;
char CHAR2 = 5;
You can also use escape sequences.
Use a hex or octal escape, e.g. '\x04'; there is no support for the characters' names.
You can simply assign char to its hexadecimal value:
char CHAR1 = 0x04;
Is this correct? No. '\EOT' is not a valid escape sequence: after '\' the compiler expects one of the defined escape characters or a numeric (octal or hex) code.
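Since the language has no names for control characters, a common workaround is to define them yourself. The sketch below assumes ASCII values; the identifiers ASCII_EOT, ASCII_ENQ, and fill_control_frame are illustrative, not standard.

```c
/* Names for two ASCII control codes; the identifiers are illustrative. */
enum {
    ASCII_EOT = 0x04,   /* end of transmission */
    ASCII_ENQ = 0x05    /* enquiry */
};

/* Hypothetical helper: writes ENQ, EOT, and a terminator into buf,
   which must have room for at least 3 bytes. */
void fill_control_frame(char *buf)
{
    buf[0] = ASCII_ENQ;
    buf[1] = '\x04';    /* the same value as ASCII_EOT, written as a hex escape */
    buf[2] = '\0';
}
```

The named constants and the numeric escapes are interchangeable; the names mainly make the intent readable.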
I have constant binary data that I need to insert into a buffer,
for example
char buf[] = "1232\0x1";
but how can I do it when the binary data comes first, like below?
char buf[] = "\0x11232";
The compiler sees it as one big hex number,
but my purpose is
char buf[] = {0x1,'1','2','3','2'};
You can use compile-time string concatenation:
char buf[] = "\x01" "1232";
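Note that the split is necessary: a hex escape has no length limit, so even with a 2-digit number after \x the digits that follow would be absorbed into it:
char buf[] = "\x011232"; // parsed as the single, out-of-range escape \x011232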
You can create a single string literal by composing it of adjacent strings - the compiler will concatenate them:
char buf[] = "\x1" "1232";
is equivalent to:
char buf[] = {0x1,'1','2','3','2', 0}; // note the terminating null, which may or may not be important to you
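To confirm the equivalence, the different spellings can be compared byte for byte. This is a sketch with made-up names; the octal variant is an addition that relies on octal escapes being limited to three digits.

```c
#include <string.h>

/* Binary byte 0x01 followed by the text "1232". */
static const char buf_split[] = "\x01" "1232";   /* hex escape ended by closing the literal */
static const char buf_octal[] = "\0011232";      /* an octal escape reads at most three digits */
static const char buf_array[] = {0x1, '1', '2', '3', '2', 0};

/* Hypothetical helper: returns 1 when all three spellings hold the same 6 bytes. */
int binary_prefix_forms_match(void)
{
    return sizeof buf_split == sizeof buf_array
        && sizeof buf_octal == sizeof buf_array
        && memcmp(buf_split, buf_array, sizeof buf_array) == 0
        && memcmp(buf_octal, buf_array, sizeof buf_array) == 0;
}
```

All three arrays are 6 bytes long: the 0x01 byte, the four digits, and the terminating null.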
Hex escapes do not have a fixed two-byte or four-byte format:
\xhh... = character in hexadecimal notation; the escape takes every hex digit that follows it
In a wide-character constant or a Unicode string literal the same escape can hold a larger value.
So in your case you cannot write "\x0112345"; end the escape by splitting the literal ("\x01" "12345") or use a three-digit octal escape ("\00112345").