I have the code below:
std::string myName = "BLABLABLA";
//check if there are illegal characters
for (unsigned int i = 0; i < myName.length(); i++)
{
const char& c = myName[i];
if (!(isalnum(c) || (c == '_') || (c == '-')))
{
return 0;
}
}
This is the output of valgrind at line "const char& c = myName[i];"
==17249== 51 bytes in 1 blocks are possibly lost in loss record 116 of 224
==17249== at 0x4C2714E: operator new(unsigned long) (vg_replace_malloc.c:261)
==17249== by 0x602A498: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.16)
==17249== by 0x602A689: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib64/libstdc++.so.6.0.16)
==17249== by 0x602AFB5: std::string::_M_leak_hard() (in /usr/lib64/libstdc++.so.6.0.16)
==17249== by 0x602B0A4: std::string::operator[](unsigned long) (in /usr/lib64/libstdc++.so.6.0.16)
I do not see anything wrong with this...
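Yes, it's the horrible COW (copy-on-write) implementation! Your code is fine. The non-const operator[] hands out a writable reference, so a COW string first has to stop sharing its buffer, which can allocate a private copy (the _M_leak_hard and _M_mutate frames in the trace).
You can force use of the const (and therefore non-mutating) overloads like so: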
std::string const myName = "BLABLABLA";
//check if there are illegal characters
for (unsigned int i = 0; i < myName.length(); i++)
{
const char& c = myName[i];
if (!(isalnum(c) || (c == '_') || (c == '-')))
{
return 0;
}
}
or (if you don't want to modify the original string type):
std::string myName = "BLABLABLA";
std::string const &cref = myName;
//check if there are illegal characters
for (unsigned int i = 0; i < myName.length(); i++)
{
const char& c = cref[i];
if (!(isalnum(c) || (c == '_') || (c == '-')))
{
return 0;
}
}
etc.
COW reference, because I knew I'd written something about it somewhere.
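If the check is needed in more than one place, the same idea can be wrapped in a helper that takes the string by const reference, so only const accessors can ever be selected. This is a minimal sketch: the name isLegalName is hypothetical, and the cast to unsigned char is an addition of mine (passing a negative char value to isalnum is undefined behaviour).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the original post): returns true when every
// character is alphanumeric, '_' or '-'.
// The parameter is a const reference, so only const member functions are
// called and a copy-on-write std::string never has to unshare its buffer.
bool isLegalName(const std::string& name)
{
    return std::all_of(name.begin(), name.end(), [](char c) {
        // Cast first: isalnum needs a value representable as unsigned char.
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '-';
    });
}
```

Calling isLegalName(myName) then replaces the hand-written loop, and an empty string counts as legal because there is no character to reject.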
#include<string>
#include<cstring>
class Solution {
void shift_left(char* c, const short unsigned int bits) {
const unsigned short int size = sizeof(c);
memmove(c, c+bits, size - bits);
memset(c+size-bits, 0, bits);
}
public:
string longestPalindrome(string s) {
char* output = new char[s.length()];
output[0] = s[0];
string res = "";
char* n = output;
auto e = s.begin() + 1;
while(e != s.end()) {
char letter = *e;
char* c = n;
(*++n) = letter;
if((letter != *c) && (c == &output[0] || letter != (*--c)) ) {
++e;
continue;
}
while((++e) != s.end() && c != &output[0]) {
if((letter = *e) != (*--c)) {
const unsigned short int bits = c - output + 1;
shift_left(output, bits);
n -= bits;
break;
}
(*++n) = letter;
}
string temp(output);
res = temp.length() > res.length()? temp : res;
shift_left(output, 1);
--n;
}
return res;
}
};
Input: longestPalindrome("babad");
The program works fine and prints "bab" as the longest palindrome, but there is a heap overflow somewhere. An error like this appears:
Read of size 6 at ...memory address... thread T0
"babad" has size 5, and after going over this for an hour I don't see where the iteration ever exceeds 5.
There are 3 pointers here that iterate:
e, the current element of string s;
n, the pointer to the next char of output;
and c, a copy of n that decrements until it reaches the address &output[0].
Maybe it's something with memmove or memset, since I've never used them before.
I'm completely lost.
TL;DR: mixing char* and std::string is not a good idea if you don't understand exactly how it works.
If you want the length of the string, you can't do this: const unsigned short int size = sizeof(c); (sizeof returns the size of the pointer, which is commonly 4 on a 32-bit machine and 8 on a 64-bit machine). You must do this instead: const size_t size = strlen(c); (which in turn requires the buffer to be null-terminated).
The address sanitizer is right: you are (indirectly) reading memory that does not belong to you.
How does the constructor of std::string from char* work?
Answer: a char* is treated as a C-style string, which means it must be terminated by a null character '\0'.
More details: the constructor of std::string from char* calls a strlen-like function, which looks roughly like this:
https://en.cppreference.com/w/cpp/string/byte/strlen
int strlen(char *begin){
int k = 0;
while (*begin != '\0'){
++k;
++begin;
}
return k;
}
If a C-style char* string does not contain '\0', this loop keeps reading memory that doesn't belong to you.
How to fix?
Answer (two options):
Do not mix char* and std::string.
Replace char* output = new char[s.length()]; with char* output = new char[s.length() + 1]; memset(output, 0, s.length() + 1);
Also, you must delete all memory that you allocated with new. So add delete[] output; before return res;
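For illustration, here is a rough sketch of the two pieces that go wrong in the question. It assumes the caller knows the buffer size and how many characters are in use. The size and used parameters, and the names shiftLeft and toString, are my additions and do not appear in the original code.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Variant of shift_left that receives the buffer size from the caller.
// The `size` parameter is an addition: sizeof on a char* cannot provide it.
void shiftLeft(char* buffer, std::size_t size, std::size_t count)
{
    if (count > size) {
        count = size; // never move or clear more than the buffer holds
    }
    std::memmove(buffer, buffer + count, size - count);
    std::memset(buffer + size - count, 0, count);
}

// Hypothetical helper: build a std::string from the first `used` characters.
// The (pointer, length) constructor does not search for a '\0' terminator,
// so it cannot run past the end of the buffer.
std::string toString(const char* buffer, std::size_t used)
{
    return std::string(buffer, used);
}
```

With these, the buffer no longer needs the extra terminating byte, because nothing depends on finding a '\0'.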
Referring to the ISO-8859-1 (Latin-1) encoding:
The capital E acute (É) has a hex value of C9.
I am trying to write a function that takes a std::string and then converts it to hex according to the ISO-8859-1 encoding above.
Currently, I am only able to write a function that converts an ASCII string to hex:
std::string Helper::ToHex(std::string input) {
std::stringstream strstream;
std::string output;
for (int i=0; i<input.length(); i++) {
strstream << std::hex << unsigned(input[i]);
}
strstream >> output;
return output;
}
However, this function can't do the job when the input has accented characters. It will convert É to a hex value of ffffffc3ffffff89.
std::string has no encoding of its own. It can easily hold characters encoded in ASCII, UTF-8, ISO-8859-x, Windows-125x, etc. They are just raw bytes, as far as std::string is concerned. So, before you can print your output in ISO-8859-1 specifically, you need to first know what the std::string is already holding so it can be converted to ISO-8859-1 if needed.
FYI, ffffffc3ffffff89 is simply the two char values 0xc3 0x89 (the UTF-8 encoded form of É) being sign-extended to 32 bits, which means your compiler implements char as a signed type rather than an unsigned type. To eliminate the leading fs, you need to cast each char to unsigned char before casting it to unsigned. You will also need to pad values < 0x10 with a leading zero so that every char produces exactly 2 hex digits, e.g.:
strstream << std::hex << std::setw(2) << std::setfill('0') << static_cast<unsigned>(static_cast<unsigned char>(input[i]));
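Put together as a complete function, that cast could look like the sketch below. It only dumps the bytes the string already holds and does no encoding conversion. The free function name toHexBytes is hypothetical (the question uses a member, Helper::ToHex).

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical free function: prints every byte of `input` as exactly two
// lowercase hex digits, whatever encoding the string happens to hold.
std::string toHexBytes(const std::string& input)
{
    std::ostringstream oss;
    oss << std::hex << std::setfill('0');
    for (char c : input) {
        // unsigned char first (no sign extension), then unsigned (so the
        // stream prints a number instead of a character).
        // setw applies to the next output only, so it is repeated each time.
        oss << std::setw(2) << static_cast<unsigned>(static_cast<unsigned char>(c));
    }
    return oss.str();
}
```

For a UTF-8 encoded É (bytes 0xC3 0x89) this yields c389 instead of ffffffc3ffffff89.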
So, it appears that your std::string is encoded in UTF-8. There are plenty of libraries available that can convert text from one encoding to another, such as ICU or iconv, as well as platform-specific APIs like WideCharToMultiByte()/MultiByteToWideChar() on Windows, or std::mbstowcs()/std::wcstombs() (provided suitable locales are installed in the OS). But there is nothing really built into C++ for this exact UTF-8 to ISO-8859-1 conversion. You could, however, use the (deprecated) std::wstring_convert to decode the UTF-8 std::string to a UTF-16/32 encoded std::wstring, or a UTF-16 encoded std::u16string, at least, and then convert that to ISO-8859-1 using whatever library you want.
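As a rough sketch of the std::wstring_convert route, assuming a standard library that still ships <codecvt> (it is deprecated since C++17, so expect deprecation warnings): I decode to char32_t code points here rather than to std::wstring, and the name utf8ToLatin1 and the '?' replacement policy are my own choices.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decodes UTF-8 into code points with the deprecated
// std::wstring_convert, then keeps the code points that ISO-8859-1 can hold.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::string utf8ToLatin1(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    const std::u32string codepoints = converter.from_bytes(utf8);

    std::string latin1;
    latin1.reserve(codepoints.size());
    for (char32_t cp : codepoints) {
        // ISO-8859-1 maps U+0000..U+00FF directly to the byte values 0x00..0xFF.
        latin1.push_back(cp <= 0xFF ? static_cast<char>(static_cast<unsigned char>(cp)) : '?');
    }
    return latin1;
}
```

The result can then be passed through a byte-wise hex dump, so É would come out as the single byte 0xC9.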
Or, knowing that the input is UTF-8 and the output is ISO-8859-1, it is really not that hard to just convert the data manually, decoding the UTF-8 into codepoints, and then encoding those codepoints to bytes. Both encodings are well-documented and fairly easy to write code for without too much effort, eg:
size_t nextUtf8CodepointLen(const char* data)
{
unsigned char ch = static_cast<unsigned char>(*data);
if ((ch & 0x80) == 0) {
return 1;
}
if ((ch & 0xE0) == 0xC0) {
return 2;
}
if ((ch & 0xF0) == 0xE0) {
return 3;
}
if ((ch & 0xF8) == 0xF0) {
return 4;
}
return 0;
}
unsigned nextUtf8Codepoint(const char* &data, size_t &data_size)
{
if (data_size == 0) return -1;
unsigned char ch = static_cast<unsigned char>(*data);
size_t len = nextUtf8CodepointLen(data);
++data;
--data_size;
if (len < 2) {
return (len == 1) ? static_cast<unsigned>(ch) : 0xFFFD;
}
--len;
unsigned cp;
if (len == 1) {
cp = ch & 0x1F;
}
else if (len == 2) {
cp = ch & 0x0F;
}
else {
cp = ch & 0x07;
}
if (len > data_size) {
data += data_size;
data_size = 0;
return 0xFFFD;
}
for(size_t j = 0; j < len; ++j) {
ch = static_cast<unsigned char>(data[j]);
if ((ch & 0xC0) != 0x80) {
cp = 0xFFFD;
break;
}
cp = (cp << 6) | (ch & 0x3F);
}
data += len;
data_size -= len;
return cp;
}
std::string Helper::ToHex(const std::string &input) {
const char *data = input.c_str();
size_t data_size = input.size();
std::ostringstream oss;
unsigned cp;
while ((cp = nextUtf8Codepoint(data, data_size)) != static_cast<unsigned>(-1)) {
if (cp > 0xFF) {
cp = static_cast<unsigned>('?');
}
oss << std::hex << std::setw(2) << std::setfill('0') << cp;
}
return oss.str();
}
I have a wstring, what's the best way to convert it to string in escaped form like \u043d\u043e\u043c\u0430 ?
The one below works but does not seem to be the best:
string output;
for (wchar_t chr : wtst) {
char code[7];
sprintf(code,"\\u%0.4X",chr);
output += code;
}
A less compact but faster version that a) allocates ahead of time, b) avoids the cost of printf re-interpreting the format string on every iteration, and c) avoids the function call overhead of printf:
std::wstring wstr(L"\x043d\x043e\x043c\x0430");
std::string sstr;
// Reserve memory in 1 hit to avoid lots of copying for long strings.
static size_t const nchars_per_code = 6;
sstr.reserve(wstr.size() * nchars_per_code);
char code[nchars_per_code];
code[0] = '\\';
code[1] = 'u';
static char const* const hexlut = "0123456789abcdef";
std::wstring::const_iterator i = wstr.begin();
std::wstring::const_iterator e = wstr.end();
for (; i != e; ++i) {
unsigned wc = *i;
code[2] = (hexlut[(wc >> 12) & 0xF]);
code[3] = (hexlut[(wc >> 8) & 0xF]);
code[4] = (hexlut[(wc >> 4) & 0xF]);
code[5] = (hexlut[(wc) & 0xF]);
sstr.append(code, code + nchars_per_code);
}
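One caveat: the loop above keeps only the low 16 bits of each wchar_t. Where wchar_t is 32 bits wide, a code point above U+FFFF would be truncated. The sketch below assumes the target format wants UTF-16 surrogate pairs for such code points (as JSON-style \u escapes do), which the question does not state. It also assumes the input holds valid code points. The name escapeWide is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: escapes every element of `in` as \uXXXX, splitting
// code points above U+FFFF into a UTF-16 surrogate pair.
std::string escapeWide(const std::wstring& in)
{
    static const char hexlut[] = "0123456789abcdef";
    std::string out;
    out.reserve(in.size() * 6); // at least 6 chars per input element

    // Appends one 16-bit unit as a backslash, 'u' and four hex digits.
    auto appendUnit = [&out](std::uint32_t u) {
        const char code[6] = {'\\', 'u',
                              hexlut[(u >> 12) & 0xF], hexlut[(u >> 8) & 0xF],
                              hexlut[(u >> 4) & 0xF], hexlut[u & 0xF]};
        out.append(code, 6);
    };

    for (wchar_t wc : in) {
        std::uint32_t cp = static_cast<std::uint32_t>(wc);
        if (cp > 0xFFFF) {
            cp -= 0x10000;
            appendUnit(0xD800 + (cp >> 10));   // high surrogate
            appendUnit(0xDC00 + (cp & 0x3FF)); // low surrogate
        } else {
            appendUnit(cp);
        }
    }
    return out;
}
```

Where wchar_t is 16 bits wide, the string already contains surrogate pairs, so the else branch handles every element and the output is the same.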
I'm reading a string from a file so it's in the form of a char array. I need to tokenize the string and save each char array token as a uint8_t hex value in an array.
const char* starting = "001122AABBCC";
// ...
uint8_t ending[] = {0x00,0x11,0x22,0xAA,0xBB,0xCC};
How can I convert from starting to ending? Thanks.
Here is a complete working program. It is based on Rob I's solution, but fixes several problems and has been tested to work.
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
const char* starting = "001122AABBCC";
int main()
{
std::string starting_str = starting;
std::vector<unsigned char> ending;
ending.reserve( starting_str.size());
for (int i = 0 ; i < starting_str.length() ; i+=2) {
std::string pair = starting_str.substr( i, 2 );
ending.push_back(::strtol( pair.c_str(), 0, 16 ));
}
for(int i=0; i<ending.size(); ++i) {
printf("0x%X\n", ending[i]);
}
}
strtoul will convert text in any base you choose into an integer. You have to do a little work to chop the input string into individual pairs of digits, or you can convert 32 or 64 bits at a time.
P.S. uint8_t[] ending = {0x00,0x11,0x22,0xAA,0xBB,0xCC}
doesn't mean anything special: you aren't storing the data in a uint8_t as 'hex', you are storing bytes. It's up to you (or your debugger) how to interpret the binary data.
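To illustrate the "64 bits at a time" idea, here is a minimal sketch that parses up to 16 hex digits with a single strtoull call and then peels off the bytes. The function name hexToBytes64 and the error handling are assumptions of mine. Longer inputs would have to be processed in chunks.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts at most 16 hex digits (8 bytes) in one call.
std::vector<std::uint8_t> hexToBytes64(const char* hex)
{
    const std::size_t digits = std::strlen(hex);
    if (digits % 2 != 0 || digits > 16) {
        throw std::invalid_argument("expected an even number of at most 16 hex digits");
    }
    for (std::size_t i = 0; i < digits; ++i) {
        // strtoull would also accept whitespace, a sign or "0x", so check first.
        if (!std::isxdigit(static_cast<unsigned char>(hex[i]))) {
            throw std::invalid_argument("non-hex character in input");
        }
    }

    unsigned long long value = std::strtoull(hex, nullptr, 16);

    // The last pair of digits is the lowest byte, so fill from the back.
    std::vector<std::uint8_t> bytes(digits / 2);
    for (std::size_t i = bytes.size(); i-- > 0;) {
        bytes[i] = static_cast<std::uint8_t>(value & 0xFF);
        value >>= 8;
    }
    return bytes;
}
```

Printing the resulting bytes with %X or %d only changes how they are displayed. The stored values are the same either way.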
With C++11, you may use std::stoi for that:
std::vector<uint8_t> convert(const std::string& s)
{
if (s.size() % 2 != 0) {
throw std::runtime_error("Bad size argument");
}
std::vector<uint8_t> res;
res.reserve(s.size() / 2);
for (std::size_t i = 0, size = s.size(); i != size; i += 2) {
std::size_t pos = 0;
res.push_back(std::stoi(s.substr(i, 2), &pos, 16));
if (pos != 2) {
throw std::runtime_error("bad character in argument");
}
}
return res;
}
I think any canonical answer (w.r.t. the bounty notes) would involve some distinct phases in the solution:
1. Error checking for valid input
   a. Length check
   b. Data content check
2. Element conversion
3. Output creation
Given the usefulness of such conversions, the solution should probably include some flexibility w.r.t. the types being used and the locale required.
From the outset, given the date of the request for a "more canonical answer" (circa August 2014) liberal use of C++11 will be applied.
An annotated version of the code, with types corresponding to the OP:
std::vector<std::uint8_t> convert(std::string const& src)
{
// error check on the length
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
auto ishex = [] (decltype(*src.begin()) c) {
return std::isxdigit(c, std::locale()); };
// error check on the data contents
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
// allocate the result, initialised to 0 and size it to the correct length
std::vector<std::uint8_t> result(src.length() / 2, 0);
// run the actual conversion
auto str = src.begin(); // track the location in the string
std::for_each(result.begin(), result.end(), [&str](decltype(*result.begin())& element) {
element = static_cast<std::uint8_t>(std::stoul(std::string(str, str + 2), nullptr, 16));
std::advance(str, 2); // next two elements
});
return result;
}
The template version of the code adds flexibility:
template <typename Int /*= std::uint8_t*/,
typename Char = char,
typename Traits = std::char_traits<Char>,
typename Allocate = std::allocator<Char>,
typename Locale = std::locale>
std::vector<Int> basic_convert(std::basic_string<Char, Traits, Allocate> const& src, Locale locale = Locale())
{
using string_type = std::basic_string<Char, Traits, Allocate>;
auto ishex = [&locale] (decltype(*src.begin()) c) {
return std::isxdigit(c, locale); };
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
std::vector<Int> result(src.length() / 2, 0);
auto str = std::begin(src);
std::for_each(std::begin(result), std::end(result), [&str](decltype(*std::begin(result))& element) {
element = static_cast<Int>(std::stoul(string_type(str, str + 2), nullptr, 16));
std::advance(str, 2);
});
return result;
}
The convert() function can then be based on the basic_convert() as follows:
std::vector<std::uint8_t> convert(std::string const& src)
{
return basic_convert<std::uint8_t>(src, std::locale());
}
uint8_t is typically no more than a typedef of an unsigned char. If you're reading characters from a file, you should be able to read them into an unsigned char array just as easily as a signed char array, and an unsigned char array is a uint8_t array.
I'd try something like this:
std::string starting_str = starting;
uint8_t* ending = new uint8_t[starting_str.length()/2];
for (size_t i = 0 ; i + 1 < starting_str.length() ; i+=2) {
std::string pair = starting_str.substr( i, 2 ); // substr takes (position, length)
ending[i/2] = ::strtol( pair.c_str(), 0, 16 );
}
Didn't test it but it looks good to me...
You may add your own conversion from the set of chars { '0', '1', ..., 'E', 'F' } to uint8_t:
uint8_t ctoa(char c)
{
if( c >= '0' && c <= '9' ) return c - '0';
else if( c >= 'a' && c <= 'f' ) return 0xA + c - 'a';
else if( c >= 'A' && c <= 'F' ) return 0xA + c - 'A';
else return 0;
}
Then it will be easy to convert a string into an array:
uint32_t endingSize = strlen(starting)/2;
uint8_t* ending = new uint8_t[endingSize];
for( uint32_t i=0; i<endingSize; i++ )
{
ending[i] = ( ctoa( starting[i*2] ) << 4 ) + ctoa( starting[i*2+1] );
}
This simple solution should work for your problem:
const char* starting = "001122AABBCC";
uint8_t ending[6];
// This algorithm works for any even-length starting string.
// However, you have to make sure that ending has enough space
// (strlen(starting) / 2 bytes).
size_t i = 0;
while (i + 1 < strlen(starting))
{
// copy one pair of hex digits into a null-terminated string
char str[3] = { starting[i], starting[i + 1], '\0' };
// convert the pair to an integer, base 16
ending[i / 2] = (uint8_t)strtol(str, NULL, 16);
i += 2;
}
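// Note: this only reinterprets the ASCII characters as bytes ('0' becomes 0x30); it does not parse the hex digits.
const uint8_t* ending = reinterpret_cast<const uint8_t*>(starting);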
Recently I decided to debug my application with valgrind.
I've resolved a lot of errors, but I can't resolve this one.
==12205== Invalid read of size 8
==12205== at 0x37E1864C40: std::_Rb_tree_increment(std::_Rb_tree_node_base*) (in /usr/lib64/libstdc++.so.6.0.8)
==12205== by 0x40393C: readConfig(std::string) (stl_tree.h:257)
==12205== by 0x4058BE: main (application.cpp:42)
==12205== Address 0x5589b88 is 24 bytes inside a block of size 48 free'd
==12205== at 0x4A05A33: operator delete(void*) (vg_replace_malloc.c:346)
==12205== by 0x4067AD: std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::erase(std::_Rb_tree_iterator<std::pair<std::string const, std::string> >, std::_Rb_tree_iterator<std::pair<std::string const, std::string> >) (new_allocator.h:94)
==12205== by 0x406841: std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::erase(std::string const&) (stl_tree.h:1215)
==12205== by 0x403934: readConfig(std::string) (stl_map.h:461)
==12205== by 0x4058BE: main (application.cpp:42)
Part of my code:
string config_file;
if (strlen(getParam("config") . c_str()) > 0)
{
config_file = getParam("config");
}
else
config_file = string("default.conf");
if (access(config_file . c_str(), 0) == -1)
{
printf("Config file \"%s\" not exists\n", config_file . c_str());
exit(1);
}
if (!readConfig(config_file))
{
printf("Application error: read config file\n");
exit(1);
}
Line #42:
if (!readConfig(config_file))
Please try to help me.
Thanks in advance!
Update #1:
I apologize for such a large function :(
bool readConfig(string filename)
{
time_t rawtime;
struct tm * timeinfo;
time ( &rawtime );
timeinfo = localtime ( &rawtime );
map<string,string> tmp_buff;
ifstream ifs( filename.c_str() );
string temp,tmp;
int i;
unsigned int MAX_MATCH = 40;
regex_t re;
char *pattern = "([^=]+)=(.+)";
const char* s;
map<int,string> matches_tmp;
map<string,string>::const_iterator it;
char s1[1024];
size_t rm;
regmatch_t pmatch[MAX_MATCH];
regcomp(&re, pattern, REG_ICASE|REG_EXTENDED|REG_NOSUB);
if ((rm = regcomp (&re, pattern, REG_EXTENDED)) != 0)
{
printf("Invalid expression:'%s'\n",pattern);
return false;
}
int start[2]={0},end[2]={0},current[2]={0};
char * b;
string substr;
bool start_time=false,inside_time=false,error=false;
while( getline( ifs, temp ) )
{
tmp=trim(temp);
tmp=temp;
if(strlen(tmp.c_str())==0) continue;
s=tmp.c_str();
if(!regexec(&re, s, MAX_MATCH, pmatch, 0))
{
for(i=1;i<=2;i++)
{
strncpy (s1, s + pmatch[i].rm_so, pmatch[i].rm_eo - pmatch[i].rm_so);
s1[pmatch[i].rm_eo - pmatch[i].rm_so] = '\0';
matches_tmp[i]=trim((string)s1);
}
if(matches_tmp[1]==string("start-time"))
{
substr=matches_tmp[2].substr(0,2);
b=new char[substr.length()+1];
strcpy(b, substr.c_str() );
if(strlen(b)!=2) continue;
start[0]=atoi(b);
//free(b);
substr=matches_tmp[2].substr(3,2);
b=new char[substr.length()+1];
strcpy(b, substr.c_str() );
if(strlen(b)!=2) continue;
start[1]=atoi(b);
start_time=true;
continue;
}
if(matches_tmp[1]==string("end-time"))
{
start_time=false;
substr=matches_tmp[2].substr(0,2);
b=new char[substr.length()+1];
strcpy(b, substr.c_str() );
if(strlen(b)!=2) error=true;
end[0]=atoi(b);
substr=matches_tmp[2].substr(3,2);
b=new char[substr.length()+1];
strcpy(b, substr.c_str() );
if(strlen(b)!=2) error=true;
end[1]=atoi(b);
if(error)
{
printf("ERROR1\n");
error=false;
continue;
}
current[0]=timeinfo->tm_hour;
current[1]=timeinfo->tm_min;
if(end[0]<start[0])
{
if(
(current[0]<start[0] && current[0]>end[0]) ||
(current[0]==start[0] && current[1]<start[1]) ||
(current[0]==end[0] && current[1]>end[1])
)
{
error=true;
}
}else
{
if(
(current[0]<start[0]) ||
(current[0]>start[0] && current[0]>end[0]) ||
(current[0]==start[0] && current[1]<start[1]) ||
(current[0]==end[0] && current[1]>end[1])
)
{
error=true;
}
}
if(error)
{
error=false;
continue;
}
for (it = tmp_buff.begin(); it != tmp_buff.end(); ++it)
{
if(config.find( it->first ) != config.end()) config.erase(it->first);
config[it->first]=it->second;
tmp_buff.erase(it->first);
}
}
if(strlen(matches_tmp[1].c_str())==0) continue;
if(start_time)
{
tmp_buff[matches_tmp[1]]=matches_tmp[2];
}
else
config[matches_tmp[1]]=matches_tmp[2];
}
}
}
I suppose you are incrementing an invalid std::set or std::map iterator. This incorrect program produces a similar valgrind error:
#include <set>
int main () {
std::set<int> s;
s.insert(1);
s.insert(2);
s.insert(3);
for(std::set<int>::iterator it = s.begin(); it != s.end(); ++it) {
if(*it == 2) s.erase(it);
}
}
EDIT: Yep, you are doing exactly what I said:
for (it = tmp_buff.begin(); it != tmp_buff.end(); ++it)
{
if(config.find( it->first ) != config.end()) config.erase(it->first);
config[it->first]=it->second;
tmp_buff.erase(it->first);
}
The call to tmp_buff.erase(it->first) invalidates it. But, you subsequently increment it: ++it. This is not allowed.
Also, there is no reason to call config.erase. The entry in config will be implicitly destroyed when it is overwritten in the next line. Try:
for (it = tmp_buff.begin(); it != tmp_buff.end(); ++it)
{
config[it->first]=it->second;
}
tmp_buff.clear();
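If you do need to erase inside the loop (for example, because only some entries should be removed), the usual idiom is to continue from the iterator that erase returns instead of incrementing the erased one. A minimal sketch, assuming C++11 or later; the function name drainInto is hypothetical:

```cpp
#include <map>
#include <string>

// Hypothetical helper: moves every entry of `pending` into `config`,
// erasing entries one at a time without touching an invalidated iterator.
void drainInto(std::map<std::string, std::string>& pending,
               std::map<std::string, std::string>& config)
{
    for (auto it = pending.begin(); it != pending.end(); /* advanced below */) {
        config[it->first] = it->second;
        // Since C++11, erase returns the iterator following the removed element.
        it = pending.erase(it);
    }
}
```

Before C++11, map::erase(iterator) returns void, and the equivalent idiom is pending.erase(it++), which advances the iterator before the element is destroyed.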