what does variable != 0xFF mean in C++?

I have the following if statement whose condition tests an element of a data buffer that stores the contents of a WAV file:
bool BFoundEnd = FALSE;
if (UCBuffer[ICount] != 0xFF){
    BFoundEnd = TRUE;
    break;
}
I was just confused about how 0xFF defines the condition inside the if statement.

what does variable != 0xFF mean in C++?
variable is presumably an identifier that names a variable.
!= is the inequality operator. It results in false when left and right hand operands are equal and true otherwise.
0xFF is an integer literal. The 0x prefix means that the literal uses the hexadecimal system (base 16). The value is 255 in the decimal system (base 10) and 1111'1111 in the binary system (base 2). For more information about the base, i.e. the radix, of numeral systems, see Wikipedia: Radix
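As a quick sanity check, the same value can be written in all three bases and compared directly (the binary literal and digit separator need C++14):
#include <cassert>

int main() {
    // The same value written in hexadecimal, decimal and binary notation.
    assert(0xFF == 255);
    assert(0xFF == 0b1111'1111);
}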

Reread with comments:
// Remembers if buffer did end with 0xFF (255) or not.
bool BFoundEnd = FALSE;
// ... Later in loop.
// Actual check for above variable
// (where "if ... != ..." means if not equal).
if (UCBuffer[ICount] != 0xFF) {
    BFoundEnd = TRUE;
    // Cancels looping as reached buffer-end.
    break;
}
// ... Outside the loop.
// Handles something based on variable.
In short, someone decided to make 255 a special value which marks the end of the buffer and/or array (instead of providing array length).

Related

How to walk along UTF-16 codepoints?

I have the following definition of the ranges which correspond to code points and surrogate pairs:
https://en.wikipedia.org/wiki/UTF-16#Description
My code is based on ConvertUTF.c from the Clang implementation.
I'm currently struggling with wrapping my head around how to do this.
The code which is most relevant from LLVM's implementation that I'm trying to understand is:
unsigned short bytesToWrite = 0;
const u32char_t byteMask = 0xBF;
const u32char_t byteMark = 0x80;
u8char_t* target = *targetStart;
utf_result result = kConversionOk;
const u16char_t* source = *sourceStart;
while (source < sourceEnd) {
    u32char_t ch;
    const u16char_t* oldSource = source; /* In case we have to back up because of target overflow. */
    ch = *source++;
    /* If we have a surrogate pair, convert to UTF32 first. */
    if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END) {
        /* If the 16 bits following the high surrogate are in the source buffer... */
        if (source < sourceEnd) {
            u32char_t ch2 = *source;
            /* If it's a low surrogate, convert to UTF32. */
            if (ch2 >= UNI_SUR_LOW_START && ch2 <= UNI_SUR_LOW_END) {
                ch = ((ch - UNI_SUR_HIGH_START) << halfShift)
                     + (ch2 - UNI_SUR_LOW_START) + halfBase;
                ++source;
            } else if (flags == kStrictConversion) { /* it's an unpaired high surrogate */
                --source; /* return to the illegal value itself */
                result = kSourceIllegal;
                break;
            }
        } else { /* We don't have the 16 bits following the high surrogate. */
            --source; /* return to the high surrogate */
            result = kSourceExhausted;
            break;
        }
    } else if (flags == kStrictConversion) {
        /* UTF-16 surrogate values are illegal in UTF-32 */
        if (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END) {
            --source; /* return to the illegal value itself */
            result = kSourceIllegal;
            break;
        }
    }
    ...
Specifically they say in the comments:
If we have a surrogate pair, convert to UTF32 first.
and then:
If it's a low surrogate, convert to UTF32.
I'm getting lost at the lines "if we have..." and "if it's...": my reaction while reading the comments is "what do we have?" and "what is it?"
I believe ch and ch2 are the first char16 and the next char16 (if one exists), checking to see if the second is part of a surrogate pair, and then walking along each char16 (or do you walk along pairs of chars?) until the end.
I'm also getting lost on how they are using UNI_SUR_HIGH_START, UNI_SUR_HIGH_END, UNI_SUR_LOW_START, UNI_SUR_LOW_END, and on their use of halfShift and halfBase.
Wikipedia also notes:
There was an attempt to rename "high" and "low" surrogates to "leading" and "trailing" due to their numerical values not matching their names. This appears to have been abandoned in recent Unicode standards.
Making note of "leading" and "trailing" in any responses may help clarify things as well.
ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END checks if ch is in the range where high surrogates are, that is, [D800-DBFF]. That's it. Then the same is done for checking if ch2 is in the range where low surrogates are, meaning [DC00-DFFF].
halfShift and halfBase are just used as prescribed by the UTF-16 decoding algorithm, which turns a pair of surrogates into the scalar value they represent. There's nothing special being done here; it's the textbook implementation of that algorithm, without any tricks.
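As an illustration of that algorithm, here is a minimal sketch of decoding a single surrogate pair. The constant names mirror the ones in the code above, with the standard values used by ConvertUTF.c (halfShift = 10, halfBase = 0x10000, surrogate ranges starting at 0xD800 and 0xDC00):
#include <cassert>
#include <cstdint>

constexpr uint32_t UNI_SUR_HIGH_START = 0xD800;  // leading (high) surrogates: D800-DBFF
constexpr uint32_t UNI_SUR_LOW_START  = 0xDC00;  // trailing (low) surrogates: DC00-DFFF
constexpr int      halfShift          = 10;      // each surrogate carries 10 payload bits
constexpr uint32_t halfBase           = 0x10000; // pairs encode code points >= U+10000

// Combines a leading and a trailing surrogate into the scalar value they represent.
uint32_t decodePair(uint16_t high, uint16_t low) {
    return ((uint32_t(high) - UNI_SUR_HIGH_START) << halfShift)
         + (uint32_t(low) - UNI_SUR_LOW_START) + halfBase;
}

int main() {
    // U+1F600 is encoded in UTF-16 as the pair D83D DE00.
    assert(decodePair(0xD83D, 0xDE00) == 0x1F600);
}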

How to compare to numeric_limits<int64_t>::min()

Consider that the sign (+1 or -1) is known and there is code that parses an unsigned integer. That unsigned integer can be as large as -numeric_limits<int64_t>::min() (the magnitude of the most negative int64_t). How do I compare correctly without triggering undefined behavior?
int8_t sign = /* +1 or -1 */;
uint64_t result = /* parse the remaining string as unsigned integer */;

if( result > uint64_t(numeric_limits<int64_t>::max()))
{
    if(sign == 1) return false; // error: out of range for int64_t

    // Is the code below correct, or how do I implement its intent correctly?
    if(result == uint64_t(-numeric_limits<int64_t>::min()))
    {
        return true;
    }
    return false;
}
As noted by Holt, you're effectively assuming 2's complement arithmetic. Therefore, you can replace -min by max+1:
if(result == uint64_t(numeric_limits<int64_t>::max()) + 1)
This avoids the undefined behavior (signed integer overflow) that results when negating the minimal value.
It might be a good idea to verify your system really uses 2's complement (depends on how strictly you want to comply with the C++ standard). This can be achieved by comparing -max with min:
if (numeric_limits<int64_t>::max() + numeric_limits<int64_t>::min() == 0)
{
    // If not two's complement:
    // Too large absolute value == error, regardless of sign
    return false;
    // on all sane (2's complement) systems this will be optimized out
}
There are no possibilities for other relations between min and max; this is explained here.
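Putting it together, here is a minimal sketch of the intent in the question (the helper name applySign is mine, not from the original code; it assumes the magnitude has already been parsed):
#include <cstdint>
#include <limits>
#include <optional>

// Combines a parsed magnitude with a sign without ever negating int64_t's minimum.
std::optional<int64_t> applySign(uint64_t magnitude, int sign) {
    const uint64_t maxPos = uint64_t(std::numeric_limits<int64_t>::max());
    if (sign > 0) {
        if (magnitude > maxPos) return std::nullopt;   // out of range for int64_t
        return int64_t(magnitude);
    }
    if (magnitude > maxPos + 1) return std::nullopt;   // below numeric_limits<int64_t>::min()
    if (magnitude == maxPos + 1)                       // exactly the magnitude of min()
        return std::numeric_limits<int64_t>::min();
    return -int64_t(magnitude);                        // safe: magnitude <= max()
}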

How can I know if the memory address I'm reading from is empty or not in C++?

So on an embedded system I'm reading and writing some integers into flash memory. I can read it with this function:
void read(uint32_t *buffer, uint32_t num_words){
    uint32_t startAddress = FLASH_SECTOR_7;
    for(uint32_t i = 0; i < num_words; i++){
        buffer[i] = *(uint32_t *)(startAddress + (i*4));
    }
}
then
uint32_t buf[10];
read(buf,10);
How can I know if buf[5] is empty (whether anything has been written to it) or not?
Right now, for the items that are empty I get something like 165 '¥' or 255 'ÿ'.
Is there a way to find that out?
You need first to define "empty", since you are using uint32_t. A good idea is to use the value 0xFFFFFFFF (4294967295 decimal) as the empty value, but you need to be sure that this value isn't used for other things. Then you can test with if (buf[5] == 0xFFFFFFFF).
But if you're using the whole range of uint32_t, then there is no way to detect whether it's empty.
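A minimal sketch of that check, assuming (as is typical for NOR flash, but verify for your part) that erased words read back as all ones:
#include <cstdint>

constexpr uint32_t kEmptyWord = 0xFFFFFFFFu;  // assumed erased-flash pattern

bool isEmpty(uint32_t word) {
    return word == kEmptyWord;
}

// Usage with the buffer from the question:
//   uint32_t buf[10];
//   read(buf, 10);
//   if (isEmpty(buf[5])) { /* nothing has been written there */ }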
Another way is to use structures, and define an empty bit.
struct uint31_t
{
    uint32_t empty : 0x01; // If set, then uint31_t.value is empty
    uint32_t value : 0x1F;
};
Then you can check if the empty bit is set, but the downside is that you lose a whole bit.
If your array is an array of pointers, you can check an element by comparing it to nullptr; otherwise, you cannot tell unless you initialize all the elements to the same value up front and then check whether the value is still the same.

C++ function convertCtoD

I'm new to C++. As part of an assignment we have to write two functions, but I don't know what the teacher means by what he is requesting. Has anyone seen this before, or can someone at least point me in the right direction? I don't want you to write the functions, I just don't know what the output should be or what he is asking. I'm actually clueless right now.
Thank you
convertCtoD( )
This function is sent a null terminated character array
where each character represents a Decimal (base 10) digit.
The function returns an integer which is the base 10 representation of the characters.
convertBtoD( )
This function is sent a null terminated character array
where each character represents a Binary (base 2) digit.
The function returns an integer which is the base 10 representation of the character.
This function is sent a null terminated character array where each character represents a Decimal (base 10) digit. The function returns an integer which is the base 10 representation of the characters.
I'll briefly mention the fact that "an integer which is the base 10 representation of the characters" is useless here, the integer will represent the value whereas "base 10 representation" is the presentation of said value.
However, the description given simply means you take in a (C-style) string of digits and put out an integer. So you would start with:
int convertCtoD(char *decimalString) {
    int retVal = 0;
    // TBD: needs actual implementation.
    return retVal;
}
This function is sent a null terminated character array where each character represents a Binary (base 2) digit. The function returns an integer which is the base 10 representation of the character.
This will be very similar:
int convertBtoD(char *binaryString) {
    int retVal = 0;
    // TBD: needs actual implementation.
    return retVal;
}
You'll notice I've left the return type as signed even though there's no need to handle signed values at all. You'll see why in the example implementation I provide below as I'm using it to return an error condition. The reason I'm providing code even though you didn't ask for it is that I think five-odd years is enough of a gap to ensure you can't cheat by passing off my code as your own :-)
Perhaps the simplest example would be:
int convertCtoD(char *str) {
    // Initialise accumulator to zero.
    int retVal = 0;

    // Process each character.
    while (*str != '\0') {
        // Check character for validity, add to accumulator (after
        // converting char to int) then go to next character.
        if ((*str < '0') || (*str > '9')) return -1;
        retVal *= 10;
        retVal += *str++ - '0';
    }

    return retVal;
}
The binary version would basically be identical except that it would use '1' as the upper limit and 2 as the multiplier (as opposed to '9' and 10).
That's the simplest form but there's plenty of scope for improvement to make your code more robust and readable:
Since the two functions are very similar, you could refactor out the common bits so as to reduce duplication.
You may want to consider an empty string as invalid rather than just returning zero as it currently does.
You probably want to detect overflow as an error.
With those in mind, it may be that the following is a more robust solution:
#include <stdbool.h>
#include <limits.h>
int convertBorCtoD(char *str, bool isBinary) {
    // Configure stuff that depends on binary/decimal choice.
    int minDigit = '0';
    int maxDigit = isBinary ? '1' : '9';
    int multiplier = maxDigit - minDigit + 1;

    // Initialise accumulator to zero.
    int retVal = 0;

    // Optional check for empty string as error.
    if (*str == '\0') return -1;

    // Process each character.
    while (*str != '\0') {
        // Check character for validity.
        if ((*str < '0') || (*str > maxDigit)) return -1;

        // Add to accumulator, checking for overflow.
        if (INT_MAX / multiplier < retVal) return -1;
        retVal *= multiplier;
        if (INT_MAX - (*str - '0') < retVal) return -1;
        retVal += *str++ - '0';
    }

    return retVal;
}
int convertCtoD(char *str) { return convertBorCtoD(str, false); }
int convertBtoD(char *str) { return convertBorCtoD(str, true); }
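A small usage sketch, assuming the functions above are in scope (expected results are in the comments):
#include <cstdio>

int main() {
    // String literals are const, so use writable arrays to match the char* signatures above.
    char decimal[] = "4095";
    char binary[]  = "1011";
    char invalid[] = "12a4";

    printf("%d\n", convertCtoD(decimal));  // 4095
    printf("%d\n", convertBtoD(binary));   // 11 (binary 1011)
    printf("%d\n", convertCtoD(invalid));  // -1 (invalid character 'a')
    return 0;
}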

How do I represent binary numbers in C++ (used for Huffman encoder)?

I am writing my own Huffman encoder, and so far I have created the Huffman tree by using a min-heap to pop off the two lowest-frequency nodes, make a node that links to them, and then push the new node back on (lather, rinse, repeat until only one node remains).
So now I have created the tree, but I need to use this tree to assign codes to each character. My problem is I don't know how to store the binary representation of a number in C++. I remember reading that unsigned char is the standard for a byte, but I am unsure.
I know I have to recursively traverse the tree, and whenever I hit a leaf node I must assign the corresponding character whatever code currently represents the path.
Here is what I have so far:
void traverseFullTree(huffmanNode* root, unsigned char curCode, unsigned char &codeBook){
    if(root->leftChild == 0 && root->rightChild == 0){ //you are at a leaf node, assign curCode to root's character
        codeBook[(int)root->character] = curCode;
    }else{ //root has children, recurse into them with the currentCodes updated for right and left branch
        traverseFullTree(root->leftChild, **CURRENT CODE SHIFTED WITH A 0**, codeBook );
        traverseFullTree(root->rightChild, **CURRENT CODE SHIFTED WITH A 1**, codeBook);
    }
    return 0;
}
CodeBook is my array that has a place for the codes of up to 256 characters (for each possible character in ASCII), but I am only going to actually assign codes to values that appear in the tree.
I am not sure if this is the correct way to traverse my Huffman tree, but this is what immediately seems to work (though I haven't tested it). Also, how do I call the traverse function on the root of the whole tree with no zeros or ones (the very top of the tree)?
Should I be using a string instead and appending to the string either a zero or a 1?
Since computers are binary ... ALL numbers in C/C++ are already in binary format.
int a = 10;
The variable a is a binary number.
What you want to look at is bit manipulation, operators such as & | << >>.
With the Huffman encoding, you would pack the data down into an array of bytes.
It's been a long time since I've written C, so this is an "off-the-cuff" pseudo-code...
Totally untested -- but should give you the right idea.
unsigned char buffer[1000]; // This is the buffer we are writing to -- calculate the size ahead of time, or build it dynamically as you go with malloc/realloc.

void set_bit(int bit_position) {
    int byte = bit_position / 8;
    int bit = bit_position % 8;
    // From http://stackoverflow.com/questions/47981/how-do-you-set-clear-and-toggle-a-single-bit-in-c
    buffer[byte] |= 1 << bit;
}

void clear_bit(int bit_position) {
    int byte = bit_position / 8;
    int bit = bit_position % 8;
    // From http://stackoverflow.com/questions/47981/how-do-you-set-clear-and-toggle-a-single-bit-in-c
    buffer[byte] &= ~(1 << bit);
}
// and in your loop, you'd just call these functions to set the bit number.
set_bit(0);
clear_bit(1);
Since curCode only takes zero and one as values, std::bitset might suit your need. It is convenient and memory-saving. Reference this: http://www.sgi.com/tech/stl/bitset.html
Only a little change to your code:
void traverseFullTree(huffmanNode* root, unsigned char curCode, std::bitset<N> &codeBook){
    if(root->leftChild == 0 && root->rightChild == 0){ //you are at a leaf node, assign curCode to root's character
        codeBook[(int)root->character] = curCode;
    }else{ //root has children, recurse into them with the currentCodes updated for right and left branch
        traverseFullTree(root->leftChild, **CURRENT CODE SHIFTED WITH A 0**, codeBook );
        traverseFullTree(root->rightChild, **CURRENT CODE SHIFTED WITH A 1**, codeBook);
    }
    return 0;
}
how to store the binary representation of a number in C++
You can simply use bitsets
#include <iostream>
#include <bitset>

int main() {
    int a = 42;
    std::bitset<(sizeof(int) * 8)> bs(a);
    std::cout << bs.to_string() << "\n";
    std::cout << bs.to_ulong() << "\n";
    return (0);
}
as you can see they also provide methods for conversions to other types, and the handy [] operator.
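For example, a short sketch of those conversions and the [] operator in action:
#include <bitset>
#include <iostream>

int main() {
    std::bitset<8> bs(42);                 // 00101010
    bs[0] = true;                          // set the least significant bit via operator[]
    std::cout << bs.to_string() << "\n";   // prints 00101011
    std::cout << bs.to_ulong() << "\n";    // prints 43
    return 0;
}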
Please don't use a string.
You can represent the codebook as two arrays of integers, one with the bit-lengths of the codes, one with the codes themselves. There is one issue with that: what if a code is longer than an integer? The solution is to just not make that happen. Having a short-ish maximum codelength (say 15) is a trick used in most practical uses of Huffman coding, for various reasons.
I recommend using canonical Huffman codes, and that slightly simplifies your tree traversal: you'd only need the lengths, so you don't have to keep track of the current code. With canonical Huffman codes, you can generate the codes easily from the lengths.
If you are using canonical codes, you can let the codes be wider than integers, because the high bits would be zero anyway. However, it is still a good idea to limit the lengths. Having a short maximum length (well not too short, that would limit compression, but say around 16) enables you to use the simplest table-based decoding method, a simple single-level table.
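As a sketch of the "generate the codes from the lengths" step, here is the standard canonical-code construction (the same scheme RFC 1951 uses; the helper name and signature are mine, not from your code):
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds canonical Huffman codes from code lengths alone.
// lengths[s] is the code length of symbol s (0 means the symbol is unused);
// the returned values[s] holds that symbol's code in its low lengths[s] bits.
std::vector<uint32_t> canonicalCodes(const std::vector<int>& lengths, int maxLen) {
    // Count how many symbols have each length.
    std::vector<uint32_t> counts(maxLen + 1, 0);
    for (int l : lengths)
        if (l > 0) counts[l]++;

    // Compute the first code of each length.
    std::vector<uint32_t> nextCode(maxLen + 1, 0);
    uint32_t code = 0;
    for (int l = 1; l <= maxLen; l++) {
        code = (code + counts[l - 1]) << 1;
        nextCode[l] = code;
    }

    // Hand out codes to symbols in increasing symbol order within each length.
    std::vector<uint32_t> values(lengths.size(), 0);
    for (std::size_t s = 0; s < lengths.size(); s++)
        if (lengths[s] > 0) values[s] = nextCode[lengths[s]]++;
    return values;
}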
Limiting code lengths to 25 or less also slightly simplifies encoding: it lets you use a 32-bit integer as a "buffer" and empty it byte by byte, without any special handling of the case where the buffer holds fewer than 8 bits but encoding the current symbol would overflow it (that case is entirely avoided; in the worst case there are 7 bits in the buffer and you try to encode a 25-bit symbol, which works just fine).
Something like this (not tested in any way)
uint32_t buffer = 0;
int bufbits = 0;
for (int i = 0; i < symbolCount; i++)
{
    int s = symbols[i];
    buffer <<= lengths[s];   // make room for the bits
    bufbits += lengths[s];   // buffer got longer
    buffer |= values[s];     // put in the bits corresponding to the symbol

    while (bufbits >= 8)     // as long as there is at least a byte in the buffer
    {
        bufbits -= 8;                           // forget it's there
        writeByte((buffer >> bufbits) & 0xFF);  // and save it
    }
}
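One detail the loop above leaves out: after the last symbol, up to 7 bits can still be sitting in buffer, so the stream needs a final flush that pads the last byte with zero bits (using the same untested writeByte as above):
if (bufbits > 0)                                  // anything left over?
    writeByte((buffer << (8 - bufbits)) & 0xFF);  // pad with zeros on the right and save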