convert emoji string to icu::UnicodeString - c++

I have a method that reads a JSON file and returns a const char* that can contain any text, including emojis. I don't have access to the source of this method.
For example, I created a JSON file with the England flag, 🏴󠁧󠁢󠁥󠁮󠁧󠁿 ({message: "\uD83C\uDFF4\uDB40\uDC67\uDB40\uDC62\uDB40\uDC65\uDB40\uDC6E\uDB40\uDC67\uDB40\uDC7F"}).
When I call that method, it returns something like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 (as raw UTF-8 bytes), but in order to use it properly, I need to convert it to an icu::UnicodeString, because I use another method (closed source again) that expects one.
The only way I found to make it work was something like:
icu::UnicodeString unicode;
unicode.setTo((UChar*)convertMessage().data());
std::string messageAsString;
unicode.toUTF8String(messageAsString);
After doing that, messageAsString is usable and everything works.
convertMessage() is a method that uses std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>::from_bytes(str).
My question is, is there a way to create an icu::UnicodeString without using that extra convertMessage() call?

This is a sample usage of the ucnv_toUChars function. I took these functions from the PostgreSQL source code and used them in my project.
UConverter *icu_converter;

static int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, int32_t nbytes)
{
    UErrorCode status;
    int32_t len_uchar;

    /* First pass: query the required UChar length (expect U_BUFFER_OVERFLOW_ERROR). */
    status = U_ZERO_ERROR;
    len_uchar = ucnv_toUChars(icu_converter, NULL, 0, buff, nbytes, &status);
    if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
        return -1;

    *buff_uchar = (UChar *) malloc((len_uchar + 1) * sizeof(**buff_uchar));

    /* Second pass: do the actual conversion into the allocated buffer. */
    status = U_ZERO_ERROR;
    len_uchar = ucnv_toUChars(icu_converter, *buff_uchar, len_uchar + 1, buff, nbytes, &status);
    if (U_FAILURE(status))
        assert(0); /* (errmsg("ucnv_toUChars failed: %s", u_errorName(status)))) */
    return len_uchar;
}
static int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
{
    UErrorCode status;
    int32_t len_result;

    /* First pass: query the required byte length. */
    status = U_ZERO_ERROR;
    len_result = ucnv_fromUChars(icu_converter, NULL, 0, buff_uchar, len_uchar, &status);
    if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
        assert(0); /* (errmsg("ucnv_fromUChars failed: %s", u_errorName(status)))) */

    *result = (char *) malloc(len_result + 1);

    /* Second pass: convert into the allocated buffer. */
    status = U_ZERO_ERROR;
    len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1, buff_uchar, len_uchar, &status);
    if (U_FAILURE(status))
        assert(0); /* (errmsg("ucnv_fromUChars failed: %s", u_errorName(status)))) */
    return len_result;
}
int main() {
    const char *utf8String = "Hello";
    int len = 5;
    UErrorCode status = U_ZERO_ERROR;

    icu_converter = ucnv_open("utf8", &status);
    assert(status <= U_ZERO_ERROR);

    UChar *buff_uchar;
    int32_t len_uchar = icu_to_uchar(&buff_uchar, utf8String, len);
    // use buff_uchar
    free(buff_uchar);
    ucnv_close(icu_converter);
    return 0;
}
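As a side note (my addition, not part of the PostgreSQL-derived helpers above): if the const char* coming out of the closed-source method is known to be UTF-8, ICU can also build the string in a single call with icu::UnicodeString::fromUTF8, which avoids the extra convertMessage() step entirely. A minimal sketch, assuming valid, NUL-terminated UTF-8 input:

#include <unicode/unistr.h>

// Sketch: construct an icu::UnicodeString directly from UTF-8 bytes.
icu::UnicodeString makeUnicodeString(const char *utf8Bytes)
{
    return icu::UnicodeString::fromUTF8(utf8Bytes);
}

Alternatively, the buff_uchar/len_uchar pair produced by icu_to_uchar above can be handed straight to the icu::UnicodeString(const UChar*, int32_t) constructor.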

sending i2c command from C++ application

I want to send a command to an I2C device from a C++ application. I have tried using the system() function, but it takes about 7-10 ms to return.
So I found this library, but it doesn't allow me to specify the port number.
This is the command that I want to send:
i2cset -f -y 0 0x74 2 0x00
where 2 is the port number and 0x00 is the command that I need to set in the destination device.
So my question is: is there any way to communicate directly with the I2C device, the same way the i2cset application does?
Yes, there is a way. You can read some documentation here: https://www.kernel.org/doc/Documentation/i2c/dev-interface
Basically you first have to open the I2C device for reading and writing; on Raspberry Pi (which is where I have used this) it is:
int m_busFD = open("/dev/i2c-0", O_RDWR);
Then there are two ways:
Either use ioctl to set the address and then read() or write() to read or write to the line. This can look like so:
bool receiveBytes(const int addr, uint8_t *buf, const int len)
{
    if (ioctl(m_busFD, I2C_SLAVE, addr) < 0)
        return false;
    int res = read(m_busFD, buf, len);
    return res == len;
}
Or use the i2c_msg/i2c_rdwr_ioctl_data struct interface with ioctl. This looks more complicated, but allows you to do more complex operations such as a write-restart-read operation. Here is the same read as before, but using this interface:
bool receiveBytes(const int addr, uint8_t *buf, const int len)
{
    i2c_msg msgs[1] = {
        {.addr = static_cast<uint16_t>(addr),
         .flags = I2C_M_RD,
         .len = static_cast<uint16_t>(len),
         .buf = buf}};
    i2c_rdwr_ioctl_data wrapper = {
        .msgs = msgs,
        .nmsgs = 1};
    if (ioctl(m_busFD, I2C_RDWR, &wrapper) < 0)
        return false;
    return (msgs[0].len == len);
}
And here is an example of a write-restart-read:
bool sendRecBytes(
    const int addr,
    uint8_t *sbuf, const int slen,
    uint8_t *rbuf, const int rlen)
{
    i2c_msg msgs[2] = {
        {.addr = static_cast<uint16_t>(addr),
         .flags = {},
         .len = static_cast<uint16_t>(slen),
         .buf = sbuf},
        {.addr = static_cast<uint16_t>(addr),
         .flags = I2C_M_RD,
         .len = static_cast<uint16_t>(rlen),
         .buf = rbuf}};
    i2c_rdwr_ioctl_data wrapper = {
        .msgs = msgs,
        .nmsgs = 2};
    if (ioctl(m_busFD, I2C_RDWR, &wrapper) < 0)
        return false;
    return (msgs[0].len == slen) && (msgs[1].len == rlen);
}
Edit: Forgot to mention that this all requires:
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>
#include <linux/i2c.h>
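For the write direction (which is what the i2cset command in the question actually does), a minimal sketch using the first (ioctl + I2C_SLAVE + write()) approach could look like the following. The bus number, the 0x74 address, port 2 and the 0x00 value are taken from the example command; whether a plain two-byte write is acceptable depends on the device (most accept a raw write of the register/port byte followed by the data byte, but strict SMBus devices may need the i2c_smbus_* helpers instead):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>
#include <cstdint>

// Sketch: rough equivalent of "i2cset -f -y 0 0x74 2 0x00" using ioctl + write().
bool sendBytes(int busFD, int addr, const uint8_t *buf, int len)
{
    if (ioctl(busFD, I2C_SLAVE, addr) < 0)
        return false;
    return write(busFD, buf, len) == len;
}

int main()
{
    int busFD = open("/dev/i2c-0", O_RDWR);   // bus 0, as in the i2cset example
    if (busFD < 0)
        return 1;

    uint8_t cmd[2] = { 0x02, 0x00 };          // port/register 2, command value 0x00
    bool ok = sendBytes(busFD, 0x74, cmd, 2); // device address 0x74
    close(busFD);
    return ok ? 0 : 1;
}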

OpenSSL EVP_PKEY_verify() returns -1

I wrote a function that should verify a signature by opening a public key file and checking the signature against unsigned char buff[] = "data";.
This function returns -1, which
indicates an error other than signature verification failure
as per the EVP_PKEY_verify documentation.
What kind of error is this? Why is there no further documentation there? I find it pretty useless if a function returns values that are not described in the function description.
bool verify_sig_of_buff(const string &pub_key_file_path, const unsigned char *buff, size_t buff_len, const string &sig) {
    FILE *f = fopen(pub_key_file_path.c_str(), "r");
    EC_KEY *ec_key = PEM_read_EC_PUBKEY(f, NULL, NULL, NULL);
    fclose(f);

    EVP_PKEY *key = EVP_PKEY_new();
    assert(1 == EVP_PKEY_assign_EC_KEY(key, ec_key));

    EVP_PKEY_CTX *key_ctx = EVP_PKEY_CTX_new(key, NULL);
    assert(1 == EVP_PKEY_verify_init(key_ctx));
    assert(1 == EVP_PKEY_CTX_set_signature_md(key_ctx, EVP_sha256()));

    size_t sig_len = 0;
    const int ret = EVP_PKEY_verify(key_ctx, (unsigned char *)&sig[0], sig.size(), buff, buff_len);

    EVP_PKEY_CTX_free(key_ctx);
    EVP_PKEY_free(key);
    cout << ret << endl;
    return ret;
}
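One way to find out what kind of error it is (my addition, not from the original post) is to dump OpenSSL's error queue right after the failing call; a minimal sketch:

#include <openssl/err.h>
#include <cstdio>

// Sketch: print whatever OpenSSL has put on its error queue after EVP_PKEY_verify() returned -1.
// On OpenSSL 1.0.x you may need to call ERR_load_crypto_strings() once at startup
// to get readable messages instead of bare error codes.
void printOpensslErrors()
{
    unsigned long err;
    while ((err = ERR_get_error()) != 0)
        fprintf(stderr, "OpenSSL error: %s\n", ERR_error_string(err, nullptr));
}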

ICU: ucnv_convertEx – detect encoding error on the fly

Is it possible to detect encoding errors with ICU at conversion time, or is it necessary to pre- or post-check the conversion?
Given the initialization where a conversion from UTF8 to UTF32 is set up:
#include <stdio.h>
#include "unicode/ucnv.h" /* C Converter API */

static void eval(UConverter* from, UConverter* to);

int main(int argc, char** argv)
{
    UConverter* from;
    UConverter* to;
    UErrorCode status;

    /* Initialize converter from UTF8 to Unicode ___________________________*/
    status = U_ZERO_ERROR;
    from = ucnv_open("UTF-8", &status);
    if( ! from || ! U_SUCCESS(status) ) return 1;

    status = U_ZERO_ERROR;
    to = ucnv_open("UTF32", &status);
    if( ! to || ! U_SUCCESS(status) ) return 1;
    /*______________________________________________________________________*/

    eval(from, to);
    return 0;
}
Then, applying the conversion using ucnv_convertEx via
static void eval(UConverter* from, UConverter* to)
{
    UErrorCode status = U_ZERO_ERROR;
    uint32_t  drain[1024];
    uint32_t* drain_p = &drain[0];
    uint32_t* p       = &drain[0];

    /* UTF8 sequence with error in third byte ______________________________*/
    const char  source[] = { "\xED\x8A\x0A\x0A" };
    const char* source_p = &source[0];

    ucnv_convertEx(to, from, (char**)&drain_p, (char*)&drain[1024],
                   &source_p, &source[5],
                   NULL, NULL, NULL, NULL, /* reset = */TRUE, /* flush = */TRUE,
                   &status);

    /* Print conversion result _____________________________________________*/
    printf("source_p: source + %i;\n", (int)(source_p - &source[0]));
    printf("status: %s;\n", u_errorName(status));
    printf("drain: (n=%i)[", (int)(drain_p - &drain[0]));
    for(p=&drain[0]; p != drain_p ; ++p) { printf("%06X ", (int)*p); }
    printf("]\n");
}
where source contains an inadmissible UTF8 code unit sequence, the function should somehow report an error. Storing the above fragments in "test.c" and compiling the above code with
$ gcc test.c $(icu-config --ldflags) -o test
The output of ./test is (surprisingly):
source_p: source + 5;
status: U_ZERO_ERROR;
drain: (n=5)[00FEFF 00FFFD 00000A 00000A 000000 ]
So, no obvious sign of a detected error. Can error detection be done more elegantly than manually checking the content?
As @Eljay suggests in the comments, you can use an error callback. You don't even need to write your own, since the built-in UCNV_TO_U_CALLBACK_STOP will do what you want (i.e., return a failure for any bad characters).
int TestIt()
{
    UConverter* utf8conv{};
    UConverter* utf32conv{};
    UErrorCode status{ U_ZERO_ERROR };

    utf8conv = ucnv_open("UTF8", &status);
    if (!U_SUCCESS(status))
    {
        return 1;
    }

    utf32conv = ucnv_open("UTF32", &status);
    if (!U_SUCCESS(status))
    {
        return 2;
    }

    const char source[] = { "\xED\x8A\x0A\x0A" };
    uint32_t target[10]{ 0 };

    ucnv_setToUCallBack(utf8conv, UCNV_TO_U_CALLBACK_STOP, nullptr,
                        nullptr, nullptr, &status);
    if (!U_SUCCESS(status))
    {
        return 3;
    }

    auto sourcePtr = source;
    auto sourceEnd = source + ARRAYSIZE(source);
    auto targetPtr = target;
    auto targetEnd = reinterpret_cast<const char*>(target + ARRAYSIZE(target));

    ucnv_convertEx(utf32conv, utf8conv, reinterpret_cast<char**>(&targetPtr),
                   targetEnd, &sourcePtr, sourceEnd, nullptr, nullptr, nullptr, nullptr,
                   TRUE, TRUE, &status);
    if (!U_SUCCESS(status))
    {
        return 4;
    }

    printf("Converted '%s' to '", source);
    for (auto start = target; start != targetPtr; start++)
    {
        printf("\\x%x", *start);
    }
    printf("'\r\n");
    return 0;
}
This should return 4 for invalid Unicode codepoints, and print out the UTF-32 values if it was successful. It's unlikely we'd get an error from ucnv_setToUCallBack, but we check just in case. In the example above, we pass nullptr for the previous action since we don't care what it was and don't need to reset it.

What build environment differences are causing the mesa GBM library to behave differently

I am working on a basic demo application (kmscube) to do rendering via the DRM and GBM APIs. My application uses the TI variation of libgbm (mesa generic buffer management). TI provides the source, GNU autotools files, and a build environment (Yocto) to compile it. I compiled GBM in their environment and it works perfectly.
I did not want to use Yocto, so I moved the source into my own build system (buildroot) and compiled it. Everything compiles correctly (using the same autotools files), but when it runs, I get a segmentation fault when I call gbm_surface_create. I debugged as far as I could and found that the program steps into the gbm library but fails on return gbm->surface_create(gbm, width, height, format, flags);
What would cause a library to run differently when compiled in different environments? Are there some really important compiler or linker flags that I could be missing?
This is the code from the graphics application (in kmscube.c)
gbm.dev = gbm_create_device(drm.fd);

gbm.surface = gbm_surface_create(gbm.dev,
        drm.mode[DISP_ID]->hdisplay, drm.mode[DISP_ID]->vdisplay,
        drm_fmt_to_gbm_fmt(drm.format[DISP_ID]),
        GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);

if (!gbm.surface) {
    printf("failed to create gbm surface\n");
    return -1;
}

return 0;
This is the call stack that creates the device (in gbm.c)
GBM_EXPORT struct gbm_device *
gbm_create_device(int fd)
{
    struct gbm_device *gbm = NULL;
    struct stat buf;

    if (fd < 0 || fstat(fd, &buf) < 0 || !S_ISCHR(buf.st_mode)) {
        fprintf(stderr, "gbm_create_device: invalid fd: %d\n", fd);
        return NULL;
    }

    if (device_num == 0)
        memset(devices, 0, sizeof devices);

    gbm = _gbm_create_device(fd);
    if (gbm == NULL)
        return NULL;

    gbm->dummy = gbm_create_device;
    gbm->stat = buf;
    gbm->refcount = 1;

    if (device_num < ARRAY_SIZE(devices)-1)
        devices[device_num++] = gbm;

    return gbm;
}
(continued in backend.c)
struct gbm_device *
_gbm_create_device(int fd)
{
    const struct gbm_backend *backend = NULL;
    struct gbm_device *dev = NULL;
    int i;
    const char *b;

    b = getenv("GBM_BACKEND");
    if (b)
        backend = load_backend(b);

    if (backend)
        dev = backend->create_device(fd);

    for (i = 0; i < ARRAY_SIZE(backends) && dev == NULL; ++i) {
        backend = load_backend(backends[i]);
        if (backend == NULL)
            continue;

        fprintf(stderr, "found valid GBM backend : %s\n", backends[i]);
        dev = backend->create_device(fd);
    }

    return dev;
}
static const void *
load_backend(const char *name)
{
    char path[PATH_MAX];
    void *module;
    const char *entrypoint = "gbm_backend";

    if (name[0] != '/')
        snprintf(path, sizeof path, MODULEDIR "/%s", name);
    else
        snprintf(path, sizeof path, "%s", name);

    module = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (!module) {
        fprintf(stderr, "failed to load module: %s\n", dlerror());
        return NULL;
    }
    else {
        fprintf(stderr, "loaded module : %s\n", name);
    }

    return dlsym(module, entrypoint);
}
And here is the code that throws a segmentation fault (in gbm.c)
GBM_EXPORT struct gbm_surface *
gbm_surface_create(struct gbm_device *gbm,
                   uint32_t width, uint32_t height,
                   uint32_t format, uint32_t flags)
{
    return gbm->surface_create(gbm, width, height, format, flags);
}
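One thing that can help narrow this down (my addition, building only on the load_backend code above): _gbm_create_device honors a GBM_BACKEND environment variable, and load_backend accepts an absolute path, so the backend can be pinned explicitly while debugging and the existing stderr messages will show which module actually gets dlopen'ed in each build environment. The module path below is purely illustrative:

#include <stdlib.h>
#include <gbm.h>

/* Sketch: force a specific backend module before creating the device.
 * Adjust the path to wherever the TI/mesa GBM backend .so is installed in your image. */
struct gbm_device *create_device_with_forced_backend(int drm_fd)
{
    setenv("GBM_BACKEND", "/usr/lib/gbm/gbm_dri.so", 1);
    return gbm_create_device(drm_fd);
}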

OpenSSL: AES CCM 256 bit encryption of large file by blocks: is it possible?

I am working on a task to encrypt large files with AES CCM mode (256-bit key length). Other parameters for encryption are:
tag size: 8 bytes
iv size: 12 bytes
Since we already use OpenSSL 1.0.1c I wanted to use it for this task as well.
The size of the files is not known in advance and they can be very large. That's why I wanted to read them in blocks and encrypt each block individually with EVP_EncryptUpdate, up to the file size.
Unfortunately the encryption works for me only if the whole file is encrypted at once. I get errors from EVP_EncryptUpdate or strange crashes if I attempt to call it multiple times. I tested the encryption on Windows 7 and on Ubuntu Linux with gcc 4.7.2.
I was not able to find any information on the OpenSSL site about whether encrypting the data block by block is possible or not.
Additional references:
http://www.fredriks.se/?p=23
http://incog-izick.blogspot.in/2011/08/using-openssl-aes-gcm.html
Please see the code below that demonstrates what I attempted to achieve. Unfortunately it is failing where indicated in the for loop.
#include <QByteArray>
#include <openssl/evp.h>

// Key in HEX representation
static const char keyHex[] = "d896d105b05aaec8305d5442166d5232e672f8d5c6dfef6f5bf67f056c4cf420";
static const char ivHex[] = "71d90ebb12037f90062d4fdb";

// Test patterns
static const char orig1[] = "Very secret message.";

const int c_tagBytes = 8;
const int c_keyBytes = 256 / 8;
const int c_ivBytes = 12;

bool Encrypt()
{
    EVP_CIPHER_CTX *ctx;
    ctx = EVP_CIPHER_CTX_new();
    EVP_CIPHER_CTX_init(ctx);

    QByteArray keyArr = QByteArray::fromHex(keyHex);
    QByteArray ivArr = QByteArray::fromHex(ivHex);
    auto key = reinterpret_cast<const unsigned char*>(keyArr.constData());
    auto iv = reinterpret_cast<const unsigned char*>(ivArr.constData());

    // Initialize the context with the alg only
    bool success = EVP_EncryptInit(ctx, EVP_aes_256_ccm(), nullptr, nullptr);
    if (!success) {
        printf("EVP_EncryptInit failed.\n");
        return success;
    }

    success = EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_CCM_SET_IVLEN, c_ivBytes, nullptr);
    if (!success) {
        printf("EVP_CIPHER_CTX_ctrl(EVP_CTRL_CCM_SET_IVLEN) failed.\n");
        return success;
    }

    success = EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_CCM_SET_TAG, c_tagBytes, nullptr);
    if (!success) {
        printf("EVP_CIPHER_CTX_ctrl(EVP_CTRL_CCM_SET_TAG) failed.\n");
        return success;
    }

    success = EVP_EncryptInit(ctx, nullptr, key, iv);
    if (!success) {
        printf("EVP_EncryptInit failed.\n");
        return success;
    }

    const int bsize = 16;
    const int loops = 5;
    const int finsize = sizeof(orig1)-1; // Don't encrypt '\0'

    // Tell the alg we will encrypt size bytes
    // http://www.fredriks.se/?p=23
    int outl = 0;
    success = EVP_EncryptUpdate(ctx, nullptr, &outl, nullptr, loops*bsize + finsize);
    if (!success) {
        printf("EVP_EncryptUpdate for size failed.\n");
        return success;
    }
    printf("Set input size. outl: %d\n", outl);

    // Additional authentication data (AAD) is not used, but 0 must still be
    // passed to the function call:
    // http://incog-izick.blogspot.in/2011/08/using-openssl-aes-gcm.html
    static const unsigned char aadDummy[] = "dummyaad";
    success = EVP_EncryptUpdate(ctx, nullptr, &outl, aadDummy, 0);
    if (!success) {
        printf("EVP_EncryptUpdate for AAD failed.\n");
        return success;
    }
    printf("Set dummy AAD. outl: %d\n", outl);

    const unsigned char *in = reinterpret_cast<const unsigned char*>(orig1);
    unsigned char out[1000];
    int len;

    // Simulate multiple input data blocks (for example reading from file)
    for (int i = 0; i < loops; ++i) {
        // ** This function fails ***
        if (!EVP_EncryptUpdate(ctx, out+outl, &len, in, bsize)) {
            printf("DHAesDevice: EVP_EncryptUpdate failed.\n");
            return false;
        }
        outl += len;
    }

    if (!EVP_EncryptUpdate(ctx, out+outl, &len, in, finsize)) {
        printf("DHAesDevice: EVP_EncryptUpdate failed.\n");
        return false;
    }
    outl += len;

    int finlen;
    // Finish with encryption
    if (!EVP_EncryptFinal(ctx, out + outl, &finlen)) {
        printf("DHAesDevice: EVP_EncryptFinal failed.\n");
        return false;
    }
    outl += finlen;

    // Append the tag to the end of the encrypted output
    if (!EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_CCM_GET_TAG, c_tagBytes, out + outl)) {
        printf("DHAesDevice: EVP_CIPHER_CTX_ctrl failed.\n");
        return false;
    };
    outl += c_tagBytes;
    out[outl] = '\0';

    EVP_CIPHER_CTX_cleanup(ctx);
    EVP_CIPHER_CTX_free(ctx);

    QByteArray enc(reinterpret_cast<const char*>(out));

    printf("Plain text size: %d\n", loops*bsize + finsize);
    printf("Encrypted data size: %d\n", outl);
    printf("Encrypted data: %s\n", enc.toBase64().data());
    return true;
}
EDIT (Wrong Solution)
The feedback that I received made me think in a different direction, and I discovered that EVP_EncryptUpdate for the size must be called for each block that is being encrypted, not for the total size of the file. I moved it to just before the block is encrypted, like this:
for (int i = 0; i < loops; ++i) {
    int buflen;
    (void)EVP_EncryptUpdate(m_ctx, nullptr, &buflen, nullptr, bsize);
    // Resize the output buffer to buflen here
    // ...
    // Encrypt into target buffer
    (void)EVP_EncryptUpdate(m_ctx, out, &len, in, buflen);
    outl += len;
}
AES CCM encryption block by block works this way, but not correctly, because each block is treated as an independent message.
EDIT 2
OpenSSL's implementation works properly only if the complete message is encrypted at once.
http://marc.info/?t=136256200100001&r=1&w=1
I decided to use Crypto++ instead.
For AEAD-CCM mode you cannot encrypt data after the associated data has been fed to the context.
Encrypt all the data first, and only after that pass the associated data.
I found some misconceptions here.
First of all, calling
EVP_EncryptUpdate(ctx, nullptr, &outl, ...)
this way is meant to find out how much output buffer is needed, so that you can allocate a buffer and make the call a second time with a valid, big enough buffer as the second argument to hold the data.
You are also passing wrong values (overwritten by the previous call) when you actually add the encrypted output.
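For completeness, here is a minimal sketch (my own, not from the thread) of the single-shot pattern that EDIT 2 above concludes is required with the EVP CCM interface: announce the total plaintext length first, pass any AAD next, then hand over the whole plaintext in one EVP_EncryptUpdate call. The 12-byte IV and 8-byte tag match the question's parameters; AAD is omitted here:

#include <openssl/evp.h>

// Sketch: one-shot AES-256-CCM encryption (12-byte IV, 8-byte tag), no AAD.
// out must have room for plainLen ciphertext bytes plus the 8-byte tag.
bool encryptOneShotCcm(const unsigned char *key, const unsigned char *iv,
                       const unsigned char *plain, int plainLen,
                       unsigned char *out, int &outLen)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    if (!ctx)
        return false;

    bool ok = EVP_EncryptInit_ex(ctx, EVP_aes_256_ccm(), nullptr, nullptr, nullptr) == 1
           && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_CCM_SET_IVLEN, 12, nullptr) == 1
           && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_CCM_SET_TAG, 8, nullptr) == 1
           && EVP_EncryptInit_ex(ctx, nullptr, nullptr, key, iv) == 1;

    int len = 0;
    // Announce the total plaintext length (required for CCM), then encrypt everything at once.
    ok = ok && EVP_EncryptUpdate(ctx, nullptr, &len, nullptr, plainLen) == 1
            && EVP_EncryptUpdate(ctx, out, &len, plain, plainLen) == 1;
    outLen = len;

    int finLen = 0;
    ok = ok && EVP_EncryptFinal_ex(ctx, out + outLen, &finLen) == 1;
    outLen += finLen;

    // Append the 8-byte tag after the ciphertext.
    ok = ok && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_CCM_GET_TAG, 8, out + outLen) == 1;
    if (ok)
        outLen += 8;

    EVP_CIPHER_CTX_free(ctx);
    return ok;
}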