input three integers, n, tag and flag.
if flag = 0 then
return ((n & tag) == tag);
if flag != 0 then
return ((n & tag) != tag);
Ideally, I want something simple without if statement.
If flag can be converted to a bool, it can be simplified to:
return !(flag ^ ((n & tag) == tag))
You could convert the flag to a bool. In C++:
bool b_flag = flag;
return !b_flag * ((n & tag) == tag) + b_flag * ((n & tag) != tag);
Or you could use a ternary operator.
Related
This is a variable template used to verify string template arguments in a library I'm working on.
template <fixed_string S>
inline constexpr bool verify_long_option = S.size() == 2 ||
(std::find_if_not(S.begin(), S.end(), [](char c) {
return ascii_isalnum(c) ||
c == '-';
}) == S.end());
clang-format seems to align the lambda to the edge of the bracket. What I want instead is this:
template <fixed_string S>
inline constexpr bool verify_long_option = S.size() == 2 ||
(std::find_if_not(S.begin(), S.end(), [](char c) {
return ascii_isalnum(c) || c == '-';
}) == S.end());
I've tried setting LambdaBodyIndentation: OuterScope, but it doesn't seem to work here.
I have also tried removing the brackets, however, this results:
template <fixed_string S>
inline constexpr bool verify_long_option =
S.size() == 2 || std::find_if_not(S.begin(), S.end(), [](char c) {
return ascii_isalnum(c) || c == '-';
}) == S.end();
Is there an option that changes lambda alignment so that it lines up with the indent of the whole expression?
This other SO question is related, but is the opposite of what I'm after.
I need to extract Unicode strings from a PE file. While extracting I need to detect it first. For UTF-8 characters, I used the following link - How to easily detect utf8 encoding in the string?. Is there any similar way to detect UTF-16 characters. I have tried the following code. Is this right? Please do help or provide suggestions. Thanks in advance!!!
BYTE temp1 = buf[offset];
BYTE temp2 = buf[offset+1];
while (!(temp1 == 0x00 && temp2 == 0x00) && offset <= bufSize)
{
if ((temp1 >= 0x00 && temp1 <= 0xFF) && (temp2 >= 0x00 && temp2 <= 0xFF))
{
tmp += 2;
}
else
{
break;
}
offset += 2;
temp1 = buf[offset];
temp2 = buf[offset+1];
if (temp1 == 0x00 && temp2 == 0x00)
{
break;
}
}
I just implemented right now a function for you, DecodeUtf16Char(), basically it is able to do two things - either just check if it is a valid utf-16 (when check_only = true) or check and return valid decoded Unicode code-point (32-bit). Also it supports either big endian (default, when big_endian = true) or little endian (big_endian = false) order of bytes within two-byte utf-16 word. bad_skip equals to number of bytes to be skipped if failed to decode a character (invalid utf-16), bad_value is a value that is used to signify that utf-16 wasn't decoded (was invalid) by default it is -1.
Example of usage/tests are included after this function definition. Basically you just pass starting (ptr) and ending pointer to this function and when returned check return value, if it is -1 then at pointer begin was invalid utf-16 sequence, if it is not -1 then this returned value contains valid 32-bit unicode code-point. Also my function increments ptr, by amount of decoded bytes in case of valid utf-16 or by bad_skip number of bytes if it is invalid.
My functions should be very fast, because it contains only few ifs (plus a bit of arithmetics in case when you ask to actually decode chars), always place my function into headers so that it is inlined into calling function to produce very fast code! Also pass in only compile-time-constants check_only and big_endian, this will remove extra decoding code through C++ optimizations.
If for example you just want to detect long runs of utf-16 bytes then you do next thing, iterate in a loop calling this function and whenever it first returned not -1 then it will be possible beginning, then iterate further and catch last not-equal-to -1 value, this will be the last point of text. Also important to pass in bad_skip = 1 when searching for utf-16 bytes because valid char may start at any byte.
I used for testing different characters - English ASCII, Russian chars (two-byte utf-16) plus two 4-byte chars (two utf-16 words). My tests append converted line to test.txt file, this file is UTF-8 encoded to be easily viewable e.g. by notepad. All of the code after my decoding function is not needed for it to work, the rest is just testing code.
My function to work needs two functions - _DecodeUtf16Char_ReadWord() (helper) plus DecodeUtf16Char() (main decoder). I only include one standard header <cstdint>, if you're not allowed to include anything then just define uint8_t and uint16_t and uint32_t, I use only these types definition from this header.
Also, for reference, see my other post which implements both from scratch (and using standard C++ library) all types of conversions between UTF-8<-->UTF-16<-->UTF-32!
Try it online!
#include <cstdint>
static inline bool _DecodeUtf16Char_ReadWord(
uint8_t const * & ptrc, uint8_t const * end,
uint16_t & r, bool const big_endian
) {
if (ptrc + 1 >= end) {
// No data left.
if (ptrc < end)
++ptrc;
return false;
}
if (big_endian) {
r = uint16_t(*ptrc) << 8; ++ptrc;
r |= uint16_t(*ptrc) ; ++ptrc;
} else {
r = uint16_t(*ptrc) ; ++ptrc;
r |= uint16_t(*ptrc) << 8; ++ptrc;
}
return true;
}
static inline uint32_t DecodeUtf16Char(
uint8_t const * & ptr, uint8_t const * end,
bool const check_only = true, bool const big_endian = true,
uint32_t const bad_skip = 1, uint32_t const bad_value = -1
) {
auto ptrs = ptr, ptrc = ptr;
uint32_t c = 0;
uint16_t v = 0;
if (!_DecodeUtf16Char_ReadWord(ptrc, end, v, big_endian)) {
// No data left.
c = bad_value;
} else if (v < 0xD800 || v > 0xDFFF) {
// Correct single-word symbol.
if (!check_only)
c = v;
} else if (v >= 0xDC00) {
// Unallowed UTF-16 sequence!
c = bad_value;
} else { // Possibly double-word sequence.
if (!check_only)
c = (v & 0x3FF) << 10;
if (!_DecodeUtf16Char_ReadWord(ptrc, end, v, big_endian)) {
// No data left.
c = bad_value;
} else if ((v < 0xDC00) || (v > 0xDFFF)) {
// Unallowed UTF-16 sequence!
c = bad_value;
} else {
// Correct double-word symbol
if (!check_only) {
c |= v & 0x3FF;
c += 0x10000;
}
}
}
if (c == bad_value)
ptr = ptrs + bad_skip; // Skip bytes.
else
ptr = ptrc; // Skip all eaten bytes.
return c;
}
// --------- Next code only for testing only and is not needed for decoding ------------
#include <iostream>
#include <string>
#include <codecvt>
#include <fstream>
#include <locale>
static std::u32string DecodeUtf16Bytes(uint8_t const * ptr, uint8_t const * end) {
std::u32string res;
while (true) {
if (ptr >= end)
break;
uint32_t c = DecodeUtf16Char(ptr, end, false, false, 2);
if (c != -1)
res.append(1, c);
}
return res;
}
#if (!_DLL) && (_MSC_VER >= 1900 /* VS 2015*/) && (_MSC_VER <= 1914 /* VS 2017 */)
std::locale::id std::codecvt<char16_t, char, _Mbstatet>::id;
std::locale::id std::codecvt<char32_t, char, _Mbstatet>::id;
#endif
template <typename CharT = char>
static std::basic_string<CharT> U32ToU8(std::u32string const & s) {
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> utf_8_32_conv;
auto res = utf_8_32_conv.to_bytes(s.c_str(), s.c_str() + s.length());
return res;
}
template <typename WCharT = wchar_t>
static std::basic_string<WCharT> U32ToU16(std::u32string const & s) {
std::wstring_convert<std::codecvt_utf16<char32_t, 0x10ffffUL, std::little_endian>, char32_t> utf_16_32_conv;
auto res = utf_16_32_conv.to_bytes(s.c_str(), s.c_str() + s.length());
return std::basic_string<WCharT>((WCharT*)(res.c_str()), (WCharT*)(res.c_str() + res.length()));
}
template <typename StrT>
void OutputString(StrT const & s) {
std::ofstream f("test.txt", std::ios::binary | std::ios::app);
f.write((char*)s.c_str(), size_t((uint8_t*)(s.c_str() + s.length()) - (uint8_t*)s.c_str()));
f.write("\n\x00", sizeof(s.c_str()[0]));
}
int main() {
std::u16string a = u"привет|мир|hello|𐐷|world|𤭢|again|русский|english";
*((uint8_t*)(a.data() + 12) + 1) = 0xDD; // Introduce bad utf-16 byte.
// Also truncate by 1 byte ("... - 1" in next line).
OutputString(U32ToU8(DecodeUtf16Bytes((uint8_t*)a.c_str(), (uint8_t*)(a.c_str() + a.length()) - 1)));
return 0;
}
Output:
привет|мир|hllo|𐐷|world|𤭢|again|русский|englis
1) I have a big buffer
2) I have a lot of variables of almost every types,
I use this buffer to send to multiple destinations, with different byte orders.
when I send to a network byte order, I usually used htons, or htonl and a customized function for specific data types,
so my issue,
every time I am constructing the buffer, I change byte order for each variable then use memcpy.
however, does anyone know a better way, like I was wishing for an efficient memcpy with specific intended byte order
an example,
UINT32 dwordData = 0x01234567;
UINT32 dwordTmp = htonl(dwordData);
memcpy(&buffer[loc], &dwordTmp, sizeof(UNIT32));
loc += sizeof(UNIT32);
this is just an example I just randomly wrote btw
I hope for a function that look like
memcpyToNetwork(&buffer[loc], &dwordTmp, sizeof(UNIT32));
if you know what I mean, naming is just a descriptive, and depending on the data type it does the byte order for the specific data type so I dont have to keep changing orders manually and have a temp variable to copy to, saving copying twice.
There is no standard solution, but it is fairly easy to write yourself.
Off the top of my head, an outline could look like this:
// Macro to be able to switch easily between encodings. Just for convenience
#define WriteBuffer WriteBufferBE
// Generic template as interface specification. Not implemented itself
// Takes buffer (of sufficient size) and value, returns number of bytes written
template <typename T>
size_t WriteBufferBE(char* buffer, const T& value);
template <typename T>
size_t WriteBufferLE(char* buffer, const T& value);
// Specializations for specific types
template <>
size_t WriteBufferBE(char* buffer, const UINT32& value)
{
buffer[0] = (value >> 24) & 0xFF;
buffer[1] = (value >> 16) & 0xFF;
buffer[2] = (value >> 8) & 0xFF;
buffer[3] = (value) & 0xFF;
return 4;
}
template <>
size_t WriteBufferBE(char* buffer, const UINT16& value)
{
buffer[0] = (value >> 8) & 0xFF;
buffer[1] = (value) & 0xFF;
return 2;
}
template <>
size_t WriteBufferLE(char* buffer, const UINT32& value)
{
buffer[0] = (value) & 0xFF;
buffer[1] = (value >> 8) & 0xFF;
buffer[2] = (value >> 16) & 0xFF;
buffer[3] = (value >> 24) & 0xFF;
return 4;
}
template <>
size_t WriteBufferLE(char* buffer, const UINT16& value)
{
buffer[0] = (value) & 0xFF;
buffer[1] = (value >> 8) & 0xFF;
return 2;
}
// Other types left as an exercise. Can use the existing functions!
// Usage:
loc += writeBuffer(&buffer[loc], dwordData);
I want to create a kind of parser of the form:
#include <iostream>
#include <string>
#include <sstream>
#include <cctype>
using namespace std;
bool isValid(istringstream& is)
{
char ch;
is.get(ch); //I know get(ch) is a good start but this is as for as I got :)
.......
....
}
int main()
{
string s;
while(getline(cin,s))
{
istringstream is(s);
cout<<(isValid(is)? "Expression OK" : "Not OK")<<endl;
}
}
A boolean function that returns TRUE if the sequence of char is of the form "5" or "(5+3)" or "((5+3)+6)" or "(((4+2)+1)+6)" ...etc and FALSE for any other case
Basically, an expression will be considered as valid if it is either a single digit or of the form "open parenthesis-single digit-plus sign-single digit-close parenthesis"
Valid Expression = single digit
and
Valid Expression = (Valid Expression + Valid Expression)
Given that there is no limit to the size of the above form (number of opening and closing parenthesis..etc.) I'd like to do that using recursion
Being the newbie that I am.. Thank you for any helpful input!
To do a recursive solution you're gonna want to read the string into a buffer first, then do something like this:
int expression(char* str) {
if (*str == '(') {
int e1 = expression(str + 1);
if (e1 == -1 || *(str + 1 + e) != '+') {
return -1;
}
int e2 = expression(str + 1 + e + 1);
if (e2 == -1 || *(str + 1 + e + 1 + e2) != ')') {
return -1;
}
return 1 + e1 + 1 + e2 + 1;
}
if (*str >= '0' || *str <= '9') {
return 1;
}
return -1;
}
bool isvalid(char* str) {
int e1 = expression(str);
if (e1 < 0) {
return false;
}
if (e1 == strlen(str)) {
return true;
}
if (*(str + e1) != '+') {
return false;
}
int e2 = expression(str + e1 + 1);
if (e2 < 0) {
return false;
}
return (e1 + 1 + e2 == strlen(str));
}
Basically, the expression function returns the length of the valid expression at it's argument. If it's argument begins with a parenthesis, it gets the length of the expression after that, verifies the plus after that, then verifies the closing parenthesis after the next expression. If the argument begins with a number, return 1. If something is messed up, return -1. Then using that function we can figure out whether or not the string is valid by some sums and the length of the string.
I haven't tested the function at all, but the only case this might fail in that I can think of would be excessive parenthesis: ((5)) for example.
An alternative to recursion could be some sort of lexical parsing such as this:
enum {
ExpectingLeftExpression,
ExpectingRightExpression,
ExpectingPlus,
ExpectingEnd,
} ParseState;
// returns true if str is valid
bool check(char* str) {
ParseState state = ExpectingLeftExpression;
do {
switch (state) {
case ExpectingLeftExpression:
if (*str == '(') {
} else if (*str >= '0' && *str <= '9') {
state = ExpectingPlus;
} else {
printf("Error: Expected left hand expression.");
return false;
}
break;
case ExpectingPlus:
if (*str == '+') {
state = ExpectingRightExpression;
} else {
printf("Error: Expected plus.");
return false;
}
break;
case ExpectingRightExpression:
if (*str == '(') {
state = ExpectingLeftExpression;
} else if (*str >= '0' && *str <= '9') {
state = ExpectingEnd;
} else {
printf("Error: Expected right hand expression.");
return false;
}
break;
}
} while (*(++str));
return true;
}
That function's not complete at all, but you should be able to see where it's going. I think the recursion works better in this case anyways.
I use the boost lexical_cast library for parsing text data into numeric values quite often. In several situations however, I only need to check if values are numeric; I don't actually need or use the conversion.
So, I was thinking about writing a simple function to test if a string is a double:
template<typename T>
bool is_double(const T& s)
{
try
{
boost::lexical_cast<double>(s);
return true;
}
catch (...)
{
return false;
}
}
My question is, are there any optimizing compilers that would drop out the lexical_cast here since I never actually use the value?
Is there a better technique to use the lexical_cast library to perform input checking?
You can now use boost::conversion::try_lexical_convert now defined in the header boost/lexical_cast/try_lexical_convert.hpp (if you only want try_lexical_convert). Like so:
double temp;
std::string convert{"abcdef"};
assert(boost::conversion::try_lexical_convert<double>(convert, temp) != false);
Since the cast might throw an an exception, a compiler that would just drop that cast would be seriously broken. You can assume that all major compilers will handle this correctly.
Trying to to do the lexical_cast might not be optimal from a performance point of view, but unless you check millions of values this way it won't be anything to worry about.
I think you want to re-write that function slightly:
template<typename T>
bool tryConvert(std::string const& s)
{
try { boost::lexical_cast<T>(s);}
catch (...) { return false; }
return true;
}
You could try something like this.
#include <sstream>
//Try to convert arg to result in a similar way to boost::lexical_cast
//but return true/false rather than throwing an exception.
template<typename T1, typename T2>
bool convert( const T1 & arg, T2 & result )
{
std::stringstream interpreter;
return interpreter<<arg &&
interpreter>>result &&
interpreter.get() == std::stringstream::traits_type::eof();
}
template<typename T>
double to_double( const T & t )
{
double retval=0;
if( ! convert(t,retval) ) { /* Do something about failure */ }
return retval;
}
template<typename T>
double is_double( const T & t )
{
double retval=0;
return convert(t,retval) );
}
The convert function does basically the same things as boost::lexical_cast, except lexical cast is more careful about avoiding allocating dynamic storage by using fixed buffers etc.
It would be possible to refactor the boost::lexical_cast code into this form, but that code is pretty dense and tough going - IMHO its a pity that lexical_cast wasn't implemented using somethign like this under the hood... then it could look like this:
template<typename T1, typename T2>
T1 lexical_cast( const T2 & t )
{
T1 retval;
if( ! try_cast<T1,T2>(t,retval) ) throw bad_lexical_cast();
return retval;
}
Better use regexes first and lexical_cast just to convert to the real type.
The compiler is pretty unlikely to manage to throw out the conversion no matter what. Exceptions are just the icing on the cake. If you want to optimize this, you'll have to write your own parser to recognize the format for a float. Use regexps or manually parse, since the pattern is simple:
if ( s.empty() ) return false;
string::const_iterator si = s.begin();
if ( *si == '+' || * si == '-' ) ++ si;
if ( si == s.end() ) return false;
while ( '0' <= *si && *si <= '9' && si != s.end() ) ++ si;
if ( si == s.end() ) return true;
if ( * si == '.' ) ++ si;
if ( ( * si == 'e' || * si == 'E' )
&& si - s.begin() <= 1 + (s[0] == '+') + (s[0] == '-') ) return false;
if ( si == s.end() ) return si - s.begin() > 1 + (s[0] == '+') + (s[0] == '-');
while ( '0' <= *si && *si <= '9' && si != s.end() ) ++ si;
if ( si == s.end() ) return true;
if ( * si == 'e' || * si == 'E' ) {
++ si;
if ( si == s.end() ) return false;
if ( * si == '-' || * si == '+' ) ++ si;
if ( si == s.end() ) return false;
while ( '0' <= *si && *si <= '9' && si != s.end() ) ++ si;
}
return si == s.end();
Not tested… I'll let you run through all the possible format combinations ;v)
Edit: Also, note that this is totally incompatible with localization. You have absolutely no hope of internationally checking without converting.
Edit 2: Oops, I thought someone else already suggested this. boost::lexical_cast is actually deceptively simple. To at least avoid throwing+catching the exception, you can reimplement it somewhat:
istringstream ss( s );
double d;
ss >> d >> ws; // ws discards whitespace
return ss && ss.eof(); // omit ws and eof if you know no trailing spaces
This code, on the other hand, has been tested ;v)
As the type T is a templated typename, I believe your answer is the right one, as it will be able to handle all cases already handled by boost::lexical_cast.
Still, don't forget to specialize the function for known types, like char *, wchar_t *, std::string, wstring, etc.
For example, you could add the following code :
template<>
bool is_double<int>(const int & s)
{
return true ;
}
template<>
bool is_double<double>(const double & s)
{
return true ;
}
template<>
bool is_double<std::string>(const std::string & s)
{
char * p ;
strtod(s.c_str(), &p) ; // include <cstdlib> for strtod
return (*p == 0) ;
}
This way, you can "optimize" the processing for the types you know, and delegate the remaining cases to boost::lexical_cast.