I have a string containing a small JSON document that holds only strings. I used std::stringstream and boost::property_tree::read_json to read it. I found that this is not very fast; moreover, the Boost JSON parser is not thread safe (because of the streams). So I tried to do it another way:
std::vector< std::string > fields;
std::vector< std::string > values;
int separator = -1;
int prevSeparator = 0;
int fieldBegin = 0;
int fieldEnd = 0;
int valueBegin = 0;
int valueEnd = 0;
int64 t0 = cv::getTickCount();
do
{
prevSeparator = separator + 1;
separator = jsonStream.substr(prevSeparator, jsonStream.size() - prevSeparator - 1).find_first_of(',') + prevSeparator;
std::string element = jsonStream.substr(prevSeparator, separator - prevSeparator);
int fvSeparator = element.find_first_of(':');
std::string field = element.substr(0, fvSeparator);
std::string value = element.substr(fvSeparator + 1, element.size() - fvSeparator - 1);
fieldBegin = field.find_first_of('\"') + 1;
fieldEnd = field.find_last_of('\"');
fields.push_back(field.substr(fieldBegin, fieldEnd - fieldBegin));
valueBegin = value.find_first_of('\"') + 1;
valueEnd = value.find_last_of('\"');
values.push_back(value.substr(valueBegin, valueEnd - valueBegin));
} while (prevSeparator - separator <= 0);
Do you think it is good enough or what shall I improve?
If I understand your description of the input correctly, you have a JSON array containing strings. That means it starts with [", followed by a sequence of strings separated by ",", and ends with "].
Here is a high level algorithm for you:
Split input by ", watching for escaped quotes.
Remove strings [ and ] from the ends (there can be whitespace in there, too).
Remove strings , that appear in between the desired strings (there can be whitespace in there, too).
Unescape based on Json escaping rules, in case there are any escapes.
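The steps above can be sketched in a single pass, as follows. This is only a minimal illustration: the function name extract_json_strings is made up for this example, it assumes the input really contains nothing but strings, and it leaves \uXXXX escapes undecoded.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): collects every
// double-quoted string found in `json`, honouring backslash escapes.
// Everything outside of quotes ([ ] , : and whitespace) is skipped.
std::vector<std::string> extract_json_strings(const std::string& json) {
    std::vector<std::string> out;
    std::string current;
    bool in_string = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (!in_string) {
            if (c == '"') {          // opening quote
                in_string = true;
                current.clear();
            }
            continue;
        }
        if (c == '\\' && i + 1 < json.size()) {
            const char next = json[++i];
            switch (next) {
                case 'n': current += '\n'; break;
                case 't': current += '\t'; break;
                case 'r': current += '\r'; break;
                case 'b': current += '\b'; break;
                case 'f': current += '\f'; break;
                case 'u': current += "\\u"; break; // assumption: left undecoded
                default:  current += next; break;  // covers \" \\ and \/
            }
        } else if (c == '"') {       // unescaped closing quote
            in_string = false;
            out.push_back(current);
        } else {
            current += c;
        }
    }
    return out;
}
```

For a flat object such as {"a":"b"} the same function would return field names and values alternately, since it ignores the characters between strings.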
I have an XML file with many values and a working C++ function that can retrieve these values
Two of these values are:
A file path such as: "C:\foo1\foo2" and
A file name: "foo3.txt"
Combining these together, they would become "C:\foo1\foo2\foo3.txt"
However, when I try to put the file path into a CString, I get an error, because the \ character cannot be used on its own in a string: string notation gives it a special meaning.
I am using MFC, and I know WIN32 allows you to create a file path with / instead of \, so: "C:/foo1/foo2/foo3.txt" would work. I tested this in Windows Explorer and it worked.
I would like to collect the file path from the XML file, but when it comes in, it will have \ instead of / in its file path, meaning it will not be possible to replace the character (the string coming in will already have an error, since XML has no problem with the \ character).
How do I safely retrieve the path as a CString, ideally while converting any \ character to a / character?
Now, I'm not familiar with the "CString" class you are referring to. Googling the API documentation only turns up the standard C-style char array functions, so I'm going to assume, rightly or wrongly, that CString is a char array.
The fact that we need to use an object that is not resizable means we either:
Need to use the heap, which will be slow and can leak memory if the memory isn't deleted later, or
Allow a maximum string length and accept that the result will be truncated if it goes above this.
Heap example (NOTE: I'm not using smart pointers as I assume they don't have access to them, else you'd just use std::string and not do this.)
#include <cstring>
#include <iostream>

char* escapeString(const char* data, unsigned int length){
    //multiplying by 1.5 means this could still truncate,
    //but I'm making an educated guess it's not all bad characters.
    const unsigned int newLen = static_cast<unsigned int>((length + 1) * 1.5);
    char* escaped = new char[newLen + 1];
    unsigned int index = 0;
    //stop as soon as there is no room left for an escape character plus the character itself
    for(unsigned int i = 0; i < length && index + 2 <= newLen; i++){
        if(data[i] == '\\' || data[i] == '\"'){
            escaped[index++] = '\\';
        }
        else if(data[i] == '%'){
            escaped[index++] = '%';
        }
        //else anything else you want to escape
        escaped[index++] = data[i];
    }
    //Make sure the string is null terminated
    escaped[index] = '\0';
    return escaped;
}
int main() {
    const char* stringWithBadChars = "I\"m not a %%good \\string";
    char* escapedString = escapeString(stringWithBadChars, strlen(stringWithBadChars));
    std::cout << escapedString;
    delete [] escapedString;
    return 0;
}
If we do this on the stack instead, it will be a lot faster, but we are limited both by the size of the buffer we pass in and by the size of the buffer inside the function. The function returns false if either limit is exceeded.
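//length is the current string length; capacity (an added parameter) is the full size of the data buffer
bool escapeString(char* data, unsigned int length, unsigned int capacity){
    const unsigned int newLen = 1000;
    char escaped[newLen + 1];
    unsigned int index = 0;
    unsigned int i = 0;
    //stop as soon as there is no room left for an escape character plus the character itself
    for(; i < length && index + 2 <= newLen; i++){
        if(data[i] == '\\' || data[i] == '\"'){
            escaped[index++] = '\\';
        }
        else if(data[i] == '%'){
            escaped[index++] = '%';
        }
        escaped[index++] = data[i];
    }
    //fail if the input was cut short or the result does not fit back into data
    if(i < length || index + 1 > capacity){
        return false;
    }
    memcpy(data, escaped, index);
    //Make sure the string is null terminated
    data[index] = '\0';
    return true;
}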
You could probably get even more efficiency by using memmove rather than copying character by character. Doing it this way, you also wouldn't need the second char array.
CString reserves some special characters. Have a look at the Format command as an example. The linked documentation refers you to: Format specification syntax: printf and wprintf functions.
The \ is used as mentioned in the comments to indicate a special character. For example:
\t will insert a tab character.
\" will insert a double quote character.
So when it hits the \ it expects the next character to be one of the special ones. Therefore, when you actually need a backslash, you use \\.
The linked article does explain about % but not the slash. However, it is exactly the same with % because it too has special meaning. So you would use %% when you want the percent sign.
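Note that the \\ rule applies to string literals written in source code; a path read from a file at run time already holds single backslashes and can be modified like any other text. Here is a minimal sketch of joining the two values and switching to forward slashes. It uses std::string as a stand-in because MFC is not available in a portable example, and the function name join_with_forward_slashes is made up for this illustration. As far as I know, CString offers a Replace member that could play the role of std::replace here.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: joins a directory and a file name, then turns
// every backslash in the result into a forward slash.
std::string join_with_forward_slashes(std::string dir, const std::string& file) {
    // add a separator only if the directory does not already end in one
    if (!dir.empty() && dir.back() != '\\' && dir.back() != '/') {
        dir += '/';
    }
    dir += file;
    std::replace(dir.begin(), dir.end(), '\\', '/');
    return dir;
}
```

Calling join_with_forward_slashes("C:\\foo1\\foo2", "foo3.txt") yields C:/foo1/foo2/foo3.txt; the doubled backslashes appear only because that argument is written as a literal.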
I'm trying to take a string of emojis and split it into a vector holding each emoji. Given the string:
std::string emojis = "😀🔍🦑😁🔍🎉😂🤣";
I'm trying to get:
std::vector<std::string> splitted_emojis = {"😀", "🔍", "🦑", "😁", "🔍", "🎉", "😂", "🤣"};
Edit
I've tried to do:
std::string emojis = "😀🔍🦑😁🔍🎉😂🤣";
std::vector<std::string> splitted_emojis;
size_t pos = 0;
std::string token;
while ((pos = emojis.find("")) != std::string::npos)
{
token = emojis.substr(0, pos);
splitted_emojis.push_back(token);
emojis.erase(0, pos);
}
But after a couple of seconds it terminates with: terminate called after throwing an instance of 'std::bad_alloc'.
When trying to check how many emojis are in a string using:
std::string emojis = "😀🔍🦑😁🔍🎉😂🤣";
std::cout << emojis.size() << std::endl; // returns 32
it returns a bigger number, which I assume is due to the Unicode data. I don't know much about Unicode, but I'm trying to figure out how to detect where the data of an emoji begins and ends, so that I can split the string into individual emojis.
I would definitely recommend that you use a library with better Unicode support (all large frameworks have one), but in a pinch you can get by with knowing that the UTF-8 encoding spreads Unicode characters over multiple bytes, and that the first bits of the first byte determine how many bytes a character is made up of.
I stole a function from boost. The split_by_codepoint function uses an iterator over the input string and constructs a new string using the first N bytes (where N is determined by the byte count function) and pushes it to the ret vector.
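#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Taken from boost internals
inline unsigned utf8_byte_count(uint8_t c)
{
    // if the most significant bit with a zero in it is in position
    // 8-N then there are N bytes in this UTF-8 sequence:
    uint8_t mask = 0x80u;
    unsigned result = 0;
    while(c & mask)
    {
        ++result;
        mask >>= 1;
    }
    return (result == 0) ? 1 : ((result > 4) ? 4 : result);
}
std::vector<std::string> split_by_codepoint(const std::string& input) {
    std::vector<std::string> ret;
    auto it = input.cbegin();
    while (it != input.cend()) {
        std::size_t count = utf8_byte_count(static_cast<uint8_t>(*it));
        // never read past the end if the last sequence is truncated
        const std::size_t remaining = static_cast<std::size_t>(input.cend() - it);
        if (count > remaining) count = remaining;
        ret.emplace_back(it, it + count);
        it += count;
    }
    return ret;
}
int main() {
    // assumes this source file is saved as UTF-8; the u8 prefix is dropped
    // because it produces char8_t in C++20 and would no longer initialise a std::string
    std::string emojis = "😀🔍🦑😁🔍🎉😂🤣";
    auto split = split_by_codepoint(emojis);
    std::cout << split.size() << std::endl; // prints 8
}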
Note that this function simply splits a string into UTF-8 strings containing one code point each. Determining if the character is an emoji is left as an exercise: UTF-8-decode any 4-byte characters and see if they are in the proper range.
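As a starting point for that exercise, here is a minimal sketch of decoding a single 4-byte sequence into its code point. The names decode_utf8_4 and looks_like_emoji are made up for this example, and the range used in looks_like_emoji is only a rough assumption covering some common emoji blocks; the authoritative list lives in the Unicode data files.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: decodes one 4-byte UTF-8 sequence
// (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx) into its code point.
// Returns 0 if `s` is not exactly such a sequence.
std::uint32_t decode_utf8_4(const std::string& s) {
    if (s.size() != 4) return 0;
    const auto lead = static_cast<std::uint8_t>(s[0]);
    if ((lead & 0xF8u) != 0xF0u) return 0;       // not a 4-byte lead byte
    std::uint32_t cp = lead & 0x07u;             // 3 payload bits
    for (int i = 1; i < 4; ++i) {
        const auto b = static_cast<std::uint8_t>(s[i]);
        if ((b & 0xC0u) != 0x80u) return 0;      // not a continuation byte
        cp = (cp << 6) | (b & 0x3Fu);            // 6 payload bits each
    }
    return cp;
}

// Assumption: a deliberately rough range check, not a complete emoji test.
bool looks_like_emoji(std::uint32_t cp) {
    return cp >= 0x1F300u && cp <= 0x1FAFFu;
}
```

Each element returned by split_by_codepoint could be passed to decode_utf8_4, and the result to whatever range test you settle on.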
I am fairly new to C++, so to some people the answer to my question might seem quite obvious.
What I want to achieve is to create a method which returns the given char array padded with spaces before and after it in order to reach a certain length. The effect would be as if the given char array sat in the middle of another, bigger char array.
Let's say we have a char array with HelloWorld!
I want the method to return a new char array with the length specified beforehand and the given char array positioned in the middle of the returned char array.
char ch[] = "HelloWorld";
char ret[20]; // lets say we want to have the resulting char array the length of 20 chars
char ret[20] = "    HelloWorld     "; // this is the result to be expected as return of the method
When the text cannot be centred exactly, I would like it to be offset by one space to the left of the middle.
I would also like to avoid any memory-consuming strings or any methods that are not in the standard library; keep it as plain as possible.
What would be the best way to tackle this issue? Thanks!
There are mainly two ways of doing this: either using C-style strings (char arrays), as you would in C, or using the std::string type (or similar types), which is the usual choice when programming in C++, although there are exceptions.
I'm providing you one example for each.
First, using arrays, you will need to include the cstring header to use the built-in C-string manipulation functions. Keep in mind that a C string always ends with the null terminator character '\0' (ASCII code 0), which counts towards the array's length, so a DIM-sized array can hold your characters in DIM - 1 positions. Here is the code with comments.
constexpr int DIM = 20;
char ch[] = "HelloWorld";
char ret[DIM] = "";
auto len_ch = std::strlen(ch); // length of ch without '\0' char
auto n_blanks = DIM - len_ch - 1; // number of blank chars needed
auto half_n_blanks = n_blanks / 2; // half of that
// fill in from begin and end of ret with blanks
for (auto i = 0u; i < half_n_blanks; i++)
ret[i] = ret[DIM - i - 2] = ' ';
// copy ch content into ret starting from half_n_blanks position
// (std::memcpy from <cstring> is used because memcpy_s is not available on every compiler)
std::memcpy(
ret + half_n_blanks, // start inserting from here
ch, // string we need to copy
len_ch); // length of ch
// if odd, after ch copied chars
// there will be a space left to insert a blank in
if (n_blanks % 2 == 1)
*(ret + half_n_blanks + len_ch) = ' ';
I chose first to insert blank spaces both to the begin and to the end of the string and then to copy the content of ch.
The second approach is far easier (to code and to understand). The maximum number of characters a std::string (defined in the header string) can hold is reported by its max_size() member, which is bounded by std::string::npos, the largest value of std::size_t (an unsigned integer type, usually 64 bits wide on current desktop platforms). Basically, you don't have to worry about a std::string max length.
std::string ch = "HelloWorld", ret;
auto ret_max_length = 20;
auto n_blanks = ret_max_length - ch.size();
// insert blanks at the beginning
ret.append(n_blanks / 2, ' ');
// append ch
ret += ch;
// insert blanks after ch
// if odd, simply add 1 to the number of blanks
ret.append(n_blanks / 2 + n_blanks % 2, ' ');
The approach I took here is different, as you can see.
Notice that, because of '\0', the results of these two methods are NOT the same. If you want to obtain the same behaviour, you may either add 1 to DIM or subtract 1 from ret_max_length.
Assuming that we know the size, size, of the array ret, and knowing that the last character of any C string is '\0', we find the length, l, of the input char array, ch.
int l = 0;
int i;
for(i=0; ch[i]!='\0'; i++){
l++;
}
Then we compute how many spaces we need on either side. If total_spaces is even, there are equal spaces on either side. Otherwise, we can choose which side gets the extra space; in this case, the right side, so that the text sits one position left of the middle.
int total_spaces = size-l-1; // subtract by 1 to adjust for '\0' character
int spaces_right = 0, spaces_left = 0;
if((total_spaces%2) == 0){
spaces_left = total_spaces/2;
spaces_right = total_spaces/2;
}
else{
spaces_left = total_spaces/2;
spaces_right = (total_spaces/2)+1;
}
Then first add the spaces_left spaces, then the input array, ch, and then the spaces_right spaces to ret.
i=0;
while(spaces_left > 0){
ret[i] = ' ';
spaces_left--;
i++;
} // add spaces
ret[i] = '\0';
strcat(ret, ch); // concatenate ch to ret
i += l; // move past the copied characters so the right-hand spaces follow them
while(spaces_right){
ret[i] = ' ';
spaces_right--;
i++;
}
ret[i] = '\0';
Make sure to include <cstring> to use strcat().
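The three steps can also be folded into one small function. The following is only a sketch under the same assumptions as above (the caller knows the full size of the destination, including the terminating '\0'); the name centre_text is made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: writes `src` centred into `dst`, padded with spaces.
// `size` is the full capacity of `dst`, including the terminating '\0'.
// Returns false and leaves `dst` untouched if `src` does not fit.
bool centre_text(const char* src, char* dst, std::size_t size) {
    const std::size_t len = std::strlen(src);
    if (size == 0 || len > size - 1) return false;
    const std::size_t total_spaces = size - 1 - len;
    const std::size_t left = total_spaces / 2;  // any odd space ends up on the right
    std::memset(dst, ' ', size - 1);            // fill everything with blanks first
    std::memcpy(dst + left, src, len);          // then drop the text into place
    dst[size - 1] = '\0';
    return true;
}
```

With a 20-element destination and "HelloWorld" this leaves 4 spaces on the left and 5 on the right, matching the layout asked for in the question.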
I want to list the logical drives with:
const size_t BUFSIZE = 100;
char buffer[ BUFSIZE ];
memset(buffer,0,BUFSIZE);
//get available drives
DWORD drives = GetLogicalDriveStringsA(BUFSIZE,static_cast<LPSTR>(buffer));
The buffer then contains: 'C',':','\\','\0'
Now I want to have a List filled with "C:\","D:\" and so on. Therefore I tried something like this:
std::string tmp(buffer,BUFSIZE);//to split this string then
QStringList drivesList = QString::fromStdString(tmp).split("\0");
But it didn't work. Is it even possible to split with the delimiter \0? Or is there a way to split by length?
The problem is that anything handled as a zero-terminated C string stops at the first zero: the "\0" literal passed to split is read as an empty separator, and any conversion that treats the buffer as a C string sees only its first entry. Splitting is certainly possible, but you have to do it yourself manually instead.
You can do it by finding the first zero, extracting the substring, and then doing the same in a loop until you find two consecutive zeroes.
Pseudoish-code:
current_position = buffer;
while (*current_position != '\0')
{
end_position = current_position + strlen(current_position);
// The text between current_position and end_position is the sub-string
// Extract it and add to list
current_position = end_position + 1;
}
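The pseudo-code above could be turned into real C++ roughly as follows. This is a minimal sketch that returns a std::vector of std::string rather than a QStringList so that it has no Qt or Win32 dependency; the function name split_multi_string is made up for this example. Each element could then be converted with QString::fromStdString and appended to the list.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a buffer made of consecutive zero-terminated
// strings that ends with one extra '\0' (the layout that
// GetLogicalDriveStringsA writes into its buffer).
std::vector<std::string> split_multi_string(const char* buffer) {
    std::vector<std::string> parts;
    const char* current = buffer;
    while (*current != '\0') {                  // an empty entry marks the end
        const std::size_t len = std::strlen(current);
        parts.emplace_back(current, len);       // text between current and the next zero
        current += len + 1;                     // step over the terminating zero
    }
    return parts;
}
```

The buffer must really contain the final extra zero; the memset in the question guarantees that as long as the drive list fits into BUFSIZE.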
What is the best way to split a string in two? I have this, but I can't get 'name' right, as substr doesn't let me set where to start and where to finish, only where to start and how many characters to take (which is unknown to me):
string query = "key=value";
string key;
string value;
int positionOfEquals = query.find("=");
key = query.substr(0, positionOfEquals );
value = query.substr(positionOfEquals + 1);
Yours is a fine approach, but you still have one bug. What if there is no '='?
string query = "key=value";
string key;
string value;
string::size_type positionOfEquals = query.find("="); // unsigned type, so the comparison with npos below is exact
key = query.substr(0, positionOfEquals );
if(positionOfEquals != string::npos)
value = query.substr(positionOfEquals + 1);
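Wrapped up as a reusable function, the same idea might look like this. It is only a sketch: the name split_key_value is made up for this example, and treating a query without '=' as a key with an empty value is an assumption about what you want in that case.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits "key=value" at the first '='.
// Without any '=' the whole input becomes the key and the value stays empty.
std::pair<std::string, std::string> split_key_value(const std::string& query) {
    const std::string::size_type pos = query.find('=');
    if (pos == std::string::npos) {
        return {query, std::string()};
    }
    // substr(pos + 1) with no count takes everything up to the end
    return {query.substr(0, pos), query.substr(pos + 1)};
}
```

Because only the first '=' is used as the separator, an input such as a=b=c gives the key a and the value b=c.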