Is there a way to decrement a character value alphabetically in C++?
For example, changing a variable containing 'b' to the value 'a', or a variable containing 'd' to the value 'c'?
I tried looking at character sequence but couldn't find anything useful.
Characters are essentially one byte integers (although the representation may vary between compilers). While there are many encodings which map integer values to characters, almost all of them map 'a' to 'z' characters in successive numerical order. So, if you wanted to change the string "aaab" to "aaaa" you could do something like the following:
char letters [4] = {'a','a','a','b'};
letters[3]--;
Alphabet characters are part of the ASCII character table. 65 is the uppercase letter A, and 32 positions later, at 97, is the lowercase letter a. (Letters B through Z and b through z are 66 through 90 and 98 through 122, respectively.) The designers of ASCII placed the two cases 32 positions apart rather than 26 (the number of letters in the alphabet) so that simple bit manipulation can switch between lowercase and uppercase, or ignore case altogether, by changing or ignoring the bit with value 32 (0010 0000).
This way, for example, the 84th character on the ASCII chart, which represents the letter T, is represented with the bits 0101 0100. Lowercase t is 116 which is 0111 0100. When ignoring the case, the 1 in the 32 bit (6th position from the right) is ignored. You can see all the other bits are exactly the same for uppercase and lowercase. This makes it more convenient for everyone and more optimal for the computer.
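The bit trick described above can be sketched in C++ as follows. The helper names are made up for illustration, and the code assumes an ASCII execution character set and that the input really is a letter (nothing is checked).

```cpp
#include <cassert>

// The bit with value 32 (0010 0000) is the only bit that differs between
// 'A'..'Z' and 'a'..'z' in ASCII.
constexpr unsigned char kCaseBit = 0x20;

// Clear the case bit: 't' (0111 0100) becomes 'T' (0101 0100).
constexpr char ascii_to_upper(char c) { return static_cast<char>(c & ~kCaseBit); }

// Set the case bit: 'T' becomes 't'.
constexpr char ascii_to_lower(char c) { return static_cast<char>(c | kCaseBit); }

// Flip the case bit: upper becomes lower and vice versa.
constexpr char ascii_toggle_case(char c) { return static_cast<char>(c ^ kCaseBit); }

// Compare two letters while ignoring the case bit.
constexpr bool ascii_equal_ignore_case(char a, char b) {
    return (a | kCaseBit) == (b | kCaseBit);
}
```

These helpers are only meaningful for letters; applied to digits or punctuation they would produce unrelated characters, which is why real code usually goes through toupper/tolower instead.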
To decrement, convert the character to its ASCII value, subtract 1, then convert that integer back into a character. Be careful when you have an 'A' (or 'a'), though, as that's a special case: there is no letter before it.
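A minimal sketch of that, including the special case. The function name previous_letter is hypothetical, and wrapping 'a' around to 'z' (and 'A' to 'Z') is just one possible choice for the special case; the code also assumes the letters have consecutive values, as in ASCII.

```cpp
#include <cassert>

// Hypothetical helper: returns the letter before c in the alphabet.
// Assumption: 'a'..'z' and 'A'..'Z' are contiguous (true for ASCII).
char previous_letter(char c) {
    if (c == 'a') return 'z';  // special case: wrap around (a design choice)
    if (c == 'A') return 'Z';
    if ((c > 'a' && c <= 'z') || (c > 'A' && c <= 'Z')) {
        return static_cast<char>(c - 1);  // char -> int, minus 1, back to char
    }
    return c;  // not a letter: leave unchanged
}
```

If wrapping is not what you want, the two special-case lines could instead return the character unchanged or signal an error.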
BCH regex was recently updated (in the API) to: "address_regex": "^([13][a-km-zA-HJ-NP-Z1-9]{25,34})|^((bitcoincash:)?(q|p)[a-z0-9]{41})|^((BITCOINCASH:)?(Q|P)[A-Z0-9]{41})$"
Is this a Segwit thing?
I understand it's now saying addresses may start with "bitcoincash:" or "BITCOINCASH:", but that's a thing, or is it some internal Coinbase designation?
Breaking down this regex, there are three possible alternatives that constitute a valid BCH address:
1st Alternative ^([13][a-km-zA-HJ-NP-Z1-9]{25,34}):
Starts with either a 1 or a 3
Follows this with between 25 and 34 alphanumeric characters excluding l, I, O and 0
2nd Alternative ^((bitcoincash:)?(q|p)[a-z0-9]{41}):
Optionally starts with the literal string bitcoincash: (the ? means the prefix appears zero or one times)
Follows this with either a q or a p
Follows this with exactly 41 alphanumeric characters (only in lowercase)
3rd Alternative ^((BITCOINCASH:)?(Q|P)[A-Z0-9]{41})$:
Optionally starts with the literal string BITCOINCASH: (the ? means the prefix appears zero or one times)
Follows this with either a Q or a P
Follows this with exactly 41 alphanumeric characters (only in uppercase)
Essentially, Coinbase is now simply accepting the three above regexes as valid BCH addresses, adding bitcoincash as a recognised protocol used by BCH.
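To try the three alternatives out, here is a rough C++ sketch using std::regex. The enum and the function name classify_bch_address are invented for this example. Each alternative is matched against the whole string with std::regex_match, which is a bit stricter than the pattern as published (there, only the last alternative ends in $). This only checks the shape of the string, not any checksum.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical labels for the three alternatives in the regex.
enum class BchAddressKind { Invalid, Legacy, CashAddrLower, CashAddrUpper };

// Sketch: report which alternative (if any) matches the whole string.
BchAddressKind classify_bch_address(const std::string& s) {
    static const std::regex legacy("[13][a-km-zA-HJ-NP-Z1-9]{25,34}");
    static const std::regex lower("(bitcoincash:)?(q|p)[a-z0-9]{41}");
    static const std::regex upper("(BITCOINCASH:)?(Q|P)[A-Z0-9]{41}");

    if (std::regex_match(s, legacy)) return BchAddressKind::Legacy;
    if (std::regex_match(s, lower)) return BchAddressKind::CashAddrLower;
    if (std::regex_match(s, upper)) return BchAddressKind::CashAddrUpper;
    return BchAddressKind::Invalid;
}
```

With this, a string carrying the bitcoincash: prefix twice is rejected, because the ? quantifier allows the prefix at most once.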
Let me break it down for you.
There are three regexes in it; after the new additions, all three are accepted as valid BCH addresses.
/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{33}|(bitcoincash:)?(q|p)[a-z0-9]{41}|(BITCOINCASH:)?(Q|P)[A-Z0-9]{41})$/
Breaking it down
First type of addresses
[13]{1}
the address will start with 1 or 3; {1} means that exactly one character from the square brackets is matched
/[13]{1}[a-km-zA-HJ-NP-Z1-9]/
cannot have l (small el), I (capital eye), O (capital O) and 0 (zero)
/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/
can be 27 to 34 characters long, remember we already checked the first character to be 1 or 3, so remaining address will be 26 to 33 characters long
second type of address
bitcoincash:
may start with bitcoincash: (the prefix is optional because of the ? that follows the group)
(bitcoincash:)?(q|p)
followed by q or p
(bitcoincash:)?(q|p)[a-z0-9]
can only have lower case letters and numbers
(bitcoincash:)?(q|p)[a-z0-9]{41}
will be 54 characters long with the prefix: we already checked the first 12 characters to be bitcoincash:, followed by one more character that is q or p, so the remaining address will be 41 characters long (42 characters in total without the prefix)
third type of address
BITCOINCASH:
may start with BITCOINCASH: (the prefix is optional because of the ? that follows the group)
(BITCOINCASH:)?(Q|P)
followed by Q or P
(BITCOINCASH:)?(Q|P)[A-Z0-9]
can only have upper case letters and numbers
(BITCOINCASH:)?(Q|P)[A-Z0-9]{41}
will be 54 characters long with the prefix: we already checked the first 12 characters to be BITCOINCASH:, followed by one more character that is Q or P, so the remaining address will be 41 characters long (42 characters in total without the prefix)
I have always wondered why I couldn't replace an unknown whitespace character. An hour ago I decided to loop through the string and, using PHP's ord function, found out that it is actually ASCII character number 13. I tried the following to remove it, but it didn't work:
preg_replace('/\x13/','',$string)
any help?
13 in hexadecimal is 19 in decimal, which is the ASCII control character DC3, which isn't properly whitespace.
You probably mean decimal 13, which is a carriage return. In hexadecimal, that's D, so you'd use \x0D instead.
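The same hex-versus-decimal distinction can be checked in C++ (used here instead of PHP; the \x escape reads the same way in C++ character literals). The function name strip_carriage_returns is made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// \x introduces a hexadecimal escape, so \x13 is decimal 19 (DC3),
// while the carriage return (decimal 13) is \x0D.
static_assert('\x13' == 19, "hex 13 is decimal 19");
static_assert('\x0D' == 13, "hex 0D is decimal 13");

// Hypothetical helper: remove every carriage return from a string.
std::string strip_carriage_returns(std::string s) {
    s.erase(std::remove(s.begin(), s.end(), '\x0D'), s.end());
    return s;
}
```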
I'm using libconfig to create a configuration file, and one of the fields holds the content of an encrypted file. The problem is that the content contains some escape characters, which cause only part of it to be stored. What is the best way to store this data to avoid accidental escape characters? Convert to Unicode?
Any suggestion?
You can use either URL encoding, where each byte that is not a safe printable ASCII character is encoded as a % character followed by two hex digits, or you can use base64 encoding, where each set of 3 bytes is encoded as 4 ASCII characters (3x8 bits -> 4x6 bits).
For example, if you have the following bytes:
00 01 41 31 80 FE
You can URL encode it as follows:
%00%01A1%80%FE
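A small sketch of that encoding in C++. The function name percent_encode is invented here, and the choice to leave only ASCII letters and digits unescaped is a simplifying assumption (real URL encoding also leaves a few punctuation characters alone).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: percent-encode arbitrary bytes.
// Assumption: only ASCII letters and digits are copied through unchanged.
std::string percent_encode(const std::vector<unsigned char>& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80 && std::isalnum(b)) {
            out += static_cast<char>(b);  // safe character: keep as-is
        } else {
            out += '%';
            out += hex[b >> 4];    // high nibble
            out += hex[b & 0x0F];  // low nibble
        }
    }
    return out;
}
```

The result contains only printable ASCII, so it can be stored as an ordinary string value in the configuration file.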
Or you can base64 encode it like this, with 0-25 = A-Z, 26-51 = a-z, 52-61 = 0-9, 62 = +, 63 = /:
(00000000 00000001 01000001) (00110001 10000000 11111110) -->
(000000 000000 000101 000001) (001100 011000 000011 111110)
AAFBMYD+
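For reference, a compact base64 encoder sketch in C++ using the standard alphabet (A-Z, a-z, 0-9, +, /) with = padding. The function name base64_encode is chosen for this example and is not part of any standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a base64 encoder (standard alphabet, '=' padding).
std::string base64_encode(const std::vector<unsigned char>& in) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Each full group of 3 bytes (24 bits) becomes 4 characters of 6 bits.
    for (; i + 2 < in.size(); i += 3) {
        unsigned v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out += alphabet[(v >> 18) & 0x3F];
        out += alphabet[(v >> 12) & 0x3F];
        out += alphabet[(v >> 6) & 0x3F];
        out += alphabet[v & 0x3F];
    }
    std::size_t rest = in.size() - i;
    if (rest == 1) {  // one byte left: two characters plus "=="
        unsigned v = in[i] << 16;
        out += alphabet[(v >> 18) & 0x3F];
        out += alphabet[(v >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {  // two bytes left: three characters plus "="
        unsigned v = (in[i] << 16) | (in[i + 1] << 8);
        out += alphabet[(v >> 18) & 0x3F];
        out += alphabet[(v >> 12) & 0x3F];
        out += alphabet[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

Feeding it the six example bytes 00 01 41 31 80 FE gives AAFBMYD+ with this alphabet.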
The standard for encoding binary data in text used to be uuencode and is now base64. Both use the same paradigm: a byte uses 8 bits, so 3 bytes use 24 bits, or four 6-bit characters.
uuencode just used the 6 bits with an offset of 32 (the ASCII code for space), so characters are in range 32-96 => all in the printable ASCII range, but including space and possibly other characters that could have special meanings
base64 chose these 64 characters to represent values from 0 to 63 (no =:;,'"\*(){}[] that could have special meaning...):
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
and the equals sign (=) is a placeholder for empty positions at the end of an encoded string, ensuring that the encoded string length is a multiple of 4.
Unfortunately, neither the C nor the C++ standard library offers functions for uuencode or base64 conversions, but you can find nice implementations around, with many pointers in this other SO answer: How do I base64 encode (decode) in C?
When using base64 decode, doesn't it remove the extra padding bytes added during base64 encoding?
Consider the scenario where I pass data of size 50 (not a multiple of 3) to the encode function; it returns encoded data of size 68.
When I then decode the encoded data (68 bytes of input), the decode function returns 51 bytes of data, which is not what I was expecting.
How should base64 encode/decode be handled properly when the data size is not a multiple of 3?
I have used an open source base64 encode/decode library which is compliant with RFC 4648.
Base64 encoding uses a special marker at the end to indicate that padding was added.
It always generates a multiple of four output characters, each corresponding to three octets of input data except possibly the last one.
For that last one, if there are only two octets left, it encodes those into three characters (each taking six bits = eighteen, sixteen bits real data and two bits of junk) then adds a special padding = character to give four characters.
If there is only one octet left, it encodes that into two characters (twelve bits: eight bits of real data and four bits of junk) then adds two padding = characters to give four characters.
Hence, during decoding, it's the number of = characters at the end that tells you how to handle the last section so as to end up with exactly the same data you encoded.
In other words, the input data AAAA (each A holding bits abcdef) gives:
decoding input: abcdef abcdef abcdef abcdef
|
V
output: abcdefab cdefabcd efabcdef
For a slightly short block AAA= (irrelevant bits being + and padding bits being =):
decoding input: abcdef abcdef abcd++ ======
|
V
output: abcdefab cdefabcd
And a very short block AA==:
decoding input: abcdef ab++++ ====== ======
|
V
output: abcdefab
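The three cases above can be reproduced with a small decoder sketch in C++. The names base64_value and base64_decode are made up for this example. It assumes well-formed input (length a multiple of four, = only at the end) and simply stops at the first = or invalid character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Map a base64 character to its 6-bit value, or -1 if it is not in the alphabet.
int base64_value(char c) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::size_t pos = alphabet.find(c);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// Sketch of a decoder: collects 6 bits per character and emits a byte
// whenever at least 8 bits are available.
std::vector<unsigned char> base64_decode(const std::string& in) {
    std::vector<unsigned char> out;
    unsigned buffer = 0;  // recently read bits (older bits shift out harmlessly)
    int bits = 0;         // number of not-yet-emitted bits in buffer
    for (char c : in) {
        if (c == '=') break;  // padding: no more real data follows
        int v = base64_value(c);
        if (v < 0) break;     // assumption: stop on invalid input
        buffer = (buffer << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
        }
    }
    return out;  // the 2 or 4 leftover junk bits are simply dropped
}
```

Because the junk bits never add up to a full byte, a block ending in = yields two bytes and a block ending in == yields one, so no extra byte appears in the output.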
So here '=' is used as a padding character for base64 encoding.
By counting the '=' characters at the end of the encoded data I can figure out whether the length of the input data was a multiple of 3 or not.
In other words, if there is a single '=' character then two octets were left in the last group, and if there is '==' then a single octet was left in the last group of three octets.
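That rule can be turned into a small size calculation. This is only a sketch: the name base64_decoded_size is hypothetical, and it assumes padded input whose length is a multiple of four with = appearing only at the very end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: how many bytes a padded base64 string decodes to.
std::size_t base64_decoded_size(const std::string& encoded) {
    if (encoded.empty() || encoded.size() % 4 != 0) return 0;  // assumption: reject malformed input
    std::size_t padding = 0;
    if (encoded.back() == '=') {
        padding = 1;
        if (encoded[encoded.size() - 2] == '=') padding = 2;
    }
    // Every 4 characters carry 3 bytes; each '=' removes one byte from the last group.
    return encoded.size() / 4 * 3 - padding;
}
```

For the 68-character case from the question, a single trailing = gives 17 * 3 - 1 = 50 bytes, which is the original size.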
So I've been searching for a solution to a problem in which one step involves counting the frequency of each unique letter. Everywhere I look uses the same array incrementer. I haven't seen this form before and don't fully understand it. I have tried to find documentation for the format but can't figure out what it actually does. I can get it to work; however, I'm not sure what each piece represents.
The piece I'm having trouble understanding is what's going on inside the brackets here.
frequency[toupper(new_letter) - 'A']++;
Where frequency is an array
an example from: count number of times a character appears in an array?
Algorithm:
Open file / read a letter.
Search for the letters array for the new letter.
If the new letter exists: increment the frequency slot for that letter: frequency[toupper(new_letter) - 'A']++;
If the new letter is missing, add to array and set frequency to 1.
After all letters are processed, print out the frequency array:
cout << 'A' + index << ": " << frequency[index] << endl;
Any help understanding would be much appreciated.
This is simply an array. Maybe the part that is confusing you is toupper(new_letter) - 'A'. What we do here is convert the letter to uppercase and then subtract the ASCII code of 'A' from the ASCII code of the result. Thus the result is a number in the range [0, 25]. Afterwards, adding this number to 'A' gives back the original uppercase character. As for the rest of the algorithm, it is simply something like counting sort.
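Put together, the counting step might look like the sketch below. The function name count_letters is made up, and the code assumes that 'A' through 'Z' have consecutive values (true for ASCII).

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>

// Sketch: count how often each letter occurs, ignoring case.
// Assumption: 'A'..'Z' are contiguous, so toupper(c) - 'A' is in 0..25.
std::array<int, 26> count_letters(const std::string& text) {
    std::array<int, 26> frequency{};  // all 26 counters start at zero
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            int index = std::toupper(c) - 'A';  // 'A' -> 0, 'B' -> 1, ...
            if (index >= 0 && index < 26) {
                frequency[index]++;
            }
        }
    }
    return frequency;
}
```

When printing, note that 'A' + index is an int, so writing it to cout shows a number; static_cast<char>('A' + index) prints the letter itself.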
Unfortunately, this solution is not completely portable. It assumes that in the execution character set, the capital letters A-Z have consecutive values. That is, it assumes 'A' + 1 is equal to 'B', 'B' + 1 is equal to 'C', and so on. This is not necessarily true, but it usually is.
toupper simply converts whatever character is passed to it to uppercase. Subtracting 'A' from this, given the above assumption, will work out the "distance" from 'A' to the given letter. That is, if new_letter is 'A', the result will be 0. If it is 'b', the result will be 1. As you can see, the reason for using toupper was to make it independent as to whether new_letter was uppercase or lowercase.
This result (essentially the position of the letter in the alphabet) is then used to access the array. If frequency is an array of 26 ints (one for each letter), you will access the corresponding int. That int is then incremented.
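If the contiguity assumption bothers you, one way around it is to look the letter up in an explicit alphabet string instead of subtracting 'A'. This is a swapped-in technique, not what the original snippet does, and the name count_letters_portable is invented for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Sketch: same counting idea, but the index comes from searching an
// explicit alphabet, so no assumption about character values is needed.
std::array<int, 26> count_letters_portable(const std::string& text) {
    static const std::string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::array<int, 26> frequency{};  // one zeroed counter per letter
    for (unsigned char c : text) {
        std::size_t index = alphabet.find(static_cast<char>(std::toupper(c)));
        if (index != std::string::npos) {
            frequency[index]++;  // position in the alphabet, 0..25
        }
    }
    return frequency;
}
```

The lookup is slower than a subtraction, but it also filters out non-letters for free, since they are simply not found in the alphabet string.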
If it's an array (e.g. int frequency[26];) then we don't add to array - it is already there, but with a value of zero.
The ++ operator is short hand for add one to the thing, so
frequency[toupper(new_letter) - 'A']++;
is the same as:
frequency[toupper(new_letter) - 'A'] = frequency[toupper(new_letter) - 'A'] + 1;
Obviously, the short hand version is much easier to read, as there is much less repetition that has to be carefully checked that it's the same on both sides, etc.
The index is toupper(new_letter) - 'A'. This works by first making any letter into an uppercase one, so we don't care if it's 'a' or 'A', 'c' or 'C', etc., and then subtracting the value of the first letter in the alphabet, 'A'. This means that if new_letter is 'A' the index is zero. If new_letter is 'G' we use index 6, etc. [This assumes that all the letters are sequential, which isn't absolutely certain, and for sure, if we talk about languages other than English that have for example ä, ǹ, Ë or ê, etc. as part of the language, then those would definitely not be following A-Z]
If you were to count the number of letters in a piece of text by hand, you could just list all the letters A-Z along the edge of the paper, and then put a dot next to each letter as you read them in the text, and then count the number of dots. This does the same sort of thing, except it keeps each count running as you go along.