Parse (replace) in C++ std::string - c++

I have a "custom" string that has the following format. Example:
std::string MyString = "RndOrder%5d - RndCustomer%8s - RndHex%8x";
I would like to replace/parse the string:
the %5d (%NUM_d) would be replaced with a random 5-digit decimal
the %8s (%NUM_s) would be replaced with a random 8-chars
the %8x (%NUM_x) would be replaced with a random 8-digit hexadecimal
Is there any function that helps me parse those "special marks"? Not sure if I would have to parse the string char by char and check for every possible combination.

If the format can vary (not always the fixed 3 arguments: %5d, %8s and %8x) and you want to be flexible in that manner, you should write your own implementation for that.
Assuming that the count after % can be any number (not only 5 or 8), you could use std::regex_search or std::regex_match to find the placeholders you are looking for. For example, your expression could look like %\d+[dsx]
Then you should parse it to find the COUNT and type and substitute with a random number acquired with the desired generator.
To parse you could try updating the above expression to %(\d+)([dsx]) and capturing groups.
A sample parse implementation for your case could look like this:
std::string text = "RndOrder%5d - RndCustomer%8s - RndHex%8x";
auto reg = std::regex("%(\\d+)([sdx])");
std::smatch match;
while (std::regex_search(text, match, reg))
{
const auto& full = match.str(); // in 1st iter contains "%5d"
const auto& count = match.str(1); // in 1st iter contains "5"
const auto& type = match.str(2); // in 1st iter contains "d"
// further processing: type conversion, number generation, string replacement
text = match.suffix().str();
}
For an implementation example with search and group capturing, you can also check out another question: Retrieving a regex search in C++
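The loop above stops at "further processing". One way to fill that in is sketched below, under a few assumptions that are not in the original answer: the helper names random_chars and expand_placeholders are made up, %Ns is taken to mean N lowercase letters, and a "5-digit decimal" is allowed to start with 0. It walks the matches with std::sregex_iterator rather than shrinking the input string, so the literal text between placeholders can be copied into the result.

```cpp
#include <cstddef>
#include <random>
#include <regex>
#include <string>

// Hypothetical helper: returns `count` characters drawn uniformly from `alphabet`.
std::string random_chars(std::size_t count, const std::string& alphabet, std::mt19937& rng)
{
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out += alphabet[pick(rng)];
    return out;
}

// Hypothetical helper: replaces every %<N>d, %<N>s and %<N>x in `format` with
// N random characters of the matching kind. Other text is copied unchanged.
// Note: std::stoul throws if the count does not fit into an unsigned long.
std::string expand_placeholders(const std::string& format, std::mt19937& rng)
{
    static const std::regex placeholder("%(\\d+)([dsx])");
    std::string result;
    auto last = format.cbegin();  // end of the previously handled match
    for (std::sregex_iterator it(format.cbegin(), format.cend(), placeholder), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        result.append(last, m[0].first);  // literal text before this match
        const std::size_t count = std::stoul(m.str(1));
        const char type = m.str(2)[0];
        if (type == 'd')
            result += random_chars(count, "0123456789", rng);
        else if (type == 'x')
            result += random_chars(count, "0123456789abcdef", rng);
        else  // 's': assumed here to mean lowercase letters
            result += random_chars(count, "abcdefghijklmnopqrstuvwxyz", rng);
        last = m[0].second;
    }
    result.append(last, format.cend());  // trailing literal text
    return result;
}
```

A call such as expand_placeholders("RndOrder%5d - RndCustomer%8s - RndHex%8x", rng) with a seeded std::mt19937 rng then returns the text with all three placeholders filled in.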

Ok, assuming that you're actually asking about string parsing here (and not random number/data generation)... have a look at this:
int iRandom1 = 12345; // 5-digit decimal
int iRandom3 = 0x12345678; // 8-digit hexadecimal
char cRandom2[9] = "RandomXY"; // 8 chars + the terminating '\0' that the literal already provides
std::string sFormat = "RndOrder%5d - RndCustomer%8s - RndHex%8x";
char cResultBuffer[500]; // Make sure this buffer is big enough!
std::snprintf( cResultBuffer, sizeof cResultBuffer, sFormat.c_str(), iRandom1, cRandom2, iRandom3 ); // snprintf never writes past the buffer
std::string MyString = cResultBuffer; // MyString = "RndOrder12345 - RndCustomerRandomXY - RndHex12345678";

It's a candidate for std::snprintf (C++11). Call it once with a null buffer to learn the required size, allocate a buffer of that size, and then format the string into it:
#include <iostream>
#include <cstdio>
#include <string>
template<class...Args>
std::string replace(const char* format, Args const&... args)
{
// determine number of characters in output
auto len = std::snprintf(nullptr, 0, format, args...);
// snprintf returns a negative value on an encoding error
if (len < 0) return std::string();
// allocate buffer space
auto result = std::string(std::size_t(len), ' ');
// write string into buffer. Note the +1 is allowing for the implicit trailing
// zero in a std::string
std::snprintf(&result[0], len + 1, format, args...);
return result;
}
int main() {
auto s = replace("RndOrder%5d - RndCustomer%8s - RndHex%8x", 5, "foo", 257);
std::cout << s << std::endl;
}
expected output:
RndOrder    5 - RndCustomer     foo - RndHex     101

Related

Copy a part of an std::string in a char* pointer

Let's suppose I have this code snippet in C++
char* str;
std::string data = "This is a string.";
I need to copy the string data (except the first and the last characters) in str.
My solution that seems to work is creating a substring and then performing the std::copy operation like this
std::string substring = data.substr(1, size - 2);
str = new char[size - 1];
std::copy(substring.begin(), substring.end(), str);
str[size - 2] = '\0';
But maybe this is a bit overkill because I create a new string. Is there a simpler way to achieve this goal? Maybe working with offsets in the std::copy calls?
Thanks
As mentioned above, you should consider keeping the sub-string as a std::string and use c_str() method when you need to access the underlying chars.
However-
If you must create the new string as a dynamic char array via new you can use the code below.
It checks whether data is long enough, and if so allocates memory for str and uses std::copy similarly to your code, but with adapted iterators.
Note: there is no need to allocate a temporary std::string for the sub-string.
The Code:
#include <string>
#include <iostream>
int main()
{
std::string data = "This is a string.";
auto len = data.length();
char* str = nullptr;
if (len > 2)
{
auto new_len = len - 2;
str = new char[new_len+1]; // add 1 for zero termination
std::copy(data.begin() + 1, data.end() - 1, str); // copy from 2nd char till one before the last
str[new_len] = '\0'; // add zero termination
std::cout << str << std::endl;
// ... use str
delete[] str; // must be released eventually
}
}
Output:
his is a string
There is also memcpy:
int length = data.length() - 1;
memcpy(str, data.c_str() + 1, length);
str[length - 1] = 0;
This will copy the string in data, starting at position [1] (instead of [0]) and keep copying until length() - 1 bytes have been copied. (-1 because you want to omit the first character). str must already point to at least that many bytes.
The final character then gets overwritten with the terminating \0, finalizing the string and disposing of the final character.
Of course this approach will cause problems if the string does not have at least 2 characters, so you should check for that beforehand.
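Both answers rely on the caller checking the length first and releasing the array afterwards. A minimal sketch that folds these steps into one function could look like the following; the name copy_inner and the use of std::unique_ptr<char[]> instead of a raw new[]/delete[] pair are choices made for this illustration, not something the question asked for.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical helper: copies `data` without its first and last character into
// a freshly allocated, zero-terminated char array. Returns nullptr when `data`
// has fewer than two characters, because there is nothing in between to copy.
std::unique_ptr<char[]> copy_inner(const std::string& data)
{
    if (data.size() < 2)
        return nullptr;
    const std::size_t inner_len = data.size() - 2;
    std::unique_ptr<char[]> out(new char[inner_len + 1]);     // +1 for '\0'
    std::copy(data.begin() + 1, data.end() - 1, out.get());   // skip both ends
    out[inner_len] = '\0';
    return out;  // the array is released automatically when the owner goes away
}
```

If an API needs a plain char*, calling get() on the returned pointer provides it without giving up ownership.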

How to split a string by emojis in C++

I'm trying to take a string of emojis and split it into a vector with one entry per emoji. Given the string:
std::string emojis = "😀🔍🦑😁🔍🎉😂🤣";
I'm trying to get:
std::vector<std::string> splitted_emojis = {"😀", "🔍", "🦑", "😁", "🔍", "🎉", "😂", "🤣"};
Edit
I've tried to do:
std::string emojis = "😀🔍🦑😁🔍🎉😂🤣";
std::vector<std::string> splitted_emojis;
size_t pos = 0;
std::string token;
while ((pos = emojis.find("")) != std::string::npos)
{
token = emojis.substr(0, pos);
splitted_emojis.push_back(token);
emojis.erase(0, pos);
}
But it seems like it throws terminate called after throwing an instance of 'std::bad_alloc' after a couple of seconds.
When trying to check how many emojis are in a string using:
std::string emojis = "😀🔍🦑😁🔍🎉😂🤣";
std::cout << emojis.size() << std::endl; // returns 32
it returns a bigger number, which I assume is the size of the Unicode data in bytes. I don't know too much about Unicode, but I'm trying to figure out how to tell where the data of an emoji begins and ends, so that I can split the string into individual emojis.
I would definitely recommend that you use a library with better unicode support (all large frameworks do), but in a pinch you can get by with knowing that the UTF-8 encoding spreads Unicode characters over multiple bytes, and that the first bits of the first byte determine how many bytes a character is made up of.
I stole a function from boost. The split_by_codepoint function uses an iterator over the input string and constructs a new string using the first N bytes (where N is determined by the byte count function) and pushes it to the ret vector.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
// Taken from boost internals
inline unsigned utf8_byte_count(uint8_t c)
{
// if the most significant bit with a zero in it is in position
// 8-N then there are N bytes in this UTF-8 sequence:
uint8_t mask = 0x80u;
unsigned result = 0;
while(c & mask)
{
++result;
mask >>= 1;
}
return (result == 0) ? 1 : ((result > 4) ? 4 : result);
}
std::vector<std::string> split_by_codepoint(std::string input) {
std::vector<std::string> ret;
auto it = input.cbegin();
while (it != input.cend()) {
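std::size_t count = utf8_byte_count(*it);
const std::size_t remaining = static_cast<std::size_t>(input.cend() - it);
if (count > remaining) count = remaining; // guard against a truncated final sequence
ret.emplace_back(it, it + count);
it += count;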
}
return ret;
}
int main() {
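std::string emojis = "😀🔍🦑😁🔍🎉😂🤣"; // assumes a UTF-8 source and execution charset; since C++20 a u8"" literal is char8_t and no longer converts to std::string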
auto split = split_by_codepoint(emojis);
std::cout << split.size() << std::endl;
}
Note that this function simply splits a string into UTF-8 strings containing one code point each. Determining if the character is an emoji is left as an exercise: UTF-8-decode any 4-byte characters and see if they are in the proper range.
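The exercise mentioned above can be sketched roughly as follows. The function names are made up for this illustration, and the range U+1F300..U+1FAFF is only an assumed approximation of "emoji": it covers the faces and symbols in the example string, but real emoji detection also involves other blocks, variation selectors and joined sequences. The decoder checks only the byte structure and does not reject overlong encodings.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: decodes a string holding exactly one 4-byte UTF-8
// sequence into its code point. Returns 0 if the bytes do not have the shape
// of a 4-byte sequence.
std::uint32_t decode_utf8_4(const std::string& s)
{
    if (s.size() != 4)
        return 0;
    const auto b = [&s](int i) { return static_cast<std::uint8_t>(s[i]); };
    if ((b(0) & 0xF8) != 0xF0)          // lead byte must look like 11110xxx
        return 0;
    for (int i = 1; i < 4; ++i)
        if ((b(i) & 0xC0) != 0x80)      // continuation bytes look like 10xxxxxx
            return 0;
    // 3 payload bits from the lead byte, 6 from each continuation byte
    return (std::uint32_t(b(0) & 0x07) << 18) |
           (std::uint32_t(b(1) & 0x3F) << 12) |
           (std::uint32_t(b(2) & 0x3F) << 6) |
            std::uint32_t(b(3) & 0x3F);
}

// Assumption: "emoji" is approximated by the range U+1F300..U+1FAFF.
bool looks_like_emoji(const std::string& s)
{
    const std::uint32_t cp = decode_utf8_4(s);
    return cp >= 0x1F300 && cp <= 0x1FAFF;
}
```

Each element returned by split_by_codepoint can be passed to looks_like_emoji to filter out plain text.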

Converting Zero-Terminated String To D String

Is there a function in Phobos for converting a zero-terminated string into a D-string?
So far I've only found the reverse case toStringz.
I need this in the following snippet
// Lookup user name from user id
passwd pw;
passwd* pw_ret;
immutable size_t bufsize = 16384;
char* buf = cast(char*)core.stdc.stdlib.malloc(bufsize);
getpwuid_r(stat.st_uid, &pw, buf, bufsize, &pw_ret);
if (pw_ret != null) {
// TODO: The following loop maybe can be replace by some Phobos function?
size_t n = 0;
string name;
while (pw.pw_name[n] != 0) {
name ~= pw.pw_name[n];
n++;
}
writeln(name);
}
core.stdc.stdlib.free(buf);
which I use to lookup the username from a user id.
I assume UTF-8 compatibility for now.
There are two easy ways to do it: slice or std.conv.to:
const(char)* foo = c_function();
string s = to!string(foo); // done!
Or you can slice it if you are going to use it temporarily or otherwise know it won't be written to or freed elsewhere:
immutable(char)* foo = c_function();
string s = foo[0 .. strlen(foo)]; // make sure foo doesn't get freed while you're still using it
If you think it can be freed, you can also copy it by slicing and then duplicating: foo[0 .. strlen(foo)].idup (idup yields an immutable copy, i.e. a string; dup would give a mutable char[]).
Slicing pointers works the same way in all array cases, not just strings:
int* foo = get_c_array(&c_array_length); // assume this returns the length in a param
int[] foo_a = foo[0 .. c_array_length]; // because you need length to slice
Just slice the original string (no copying). The $ inside [] is translated to str.length. If the zero is not at the end, just replace the "$ - 1" expression with its position.
void main() {
auto str = "abc\0";
str.trimLastZero();
write(str);
}
void trimLastZero (ref string str) {
if (str.length && str[$ - 1] == 0) // guard against an empty string
str = str[0 .. $ - 1];
}
You can do the following to strip away the trailing zeros and convert it to a string:
char[256] name;
getNameFromCFunction(name.ptr, 256);
string s = to!string(cast(char*)name); //<-- this is the important bit
If you just pass in name you will convert it to a string but the trailing zeroes will still be there. So you cast it to a char pointer and voila std.conv.to will convert whatever it meets until a '\0' is encountered.

Constructing a string of a specific length starting with a specific prefix

I need to construct a string of a specific length starting with a specific prefix. Is there any faster way (in terms of performance) to achieve the objective of the following piece of code? Would it be of any help to use char* here?
int strLen = 15;
string prefix = "1234"; // could be a number of any length less than strLen
int prefixLen = prefix.length();
string str = prefix;
for(int i=0;i<strLen-prefixLen;i++)
{
str.append("9"); // use character '9' as filler
}
printf("str: %s \n", str.c_str());
Sample prefix and output:
prefix: 123, str: 123999999999999
prefix: 1234, str: 123499999999999
The only thing I do not want changed in this code is the type of 'prefix' which should remain string.
try this:
std::string content(15, '9'); // start off with all 9s
content.replace(0, 4, "1234"); // replace the first four characters etc.
int StrLength = 15;
string PreFix = "1234";
string RestOfStr(StrLength - PreFix.length(), '9');
cout << PreFix << RestOfStr << endl;
The string class has an overloaded constructor that takes a size and a char.
It creates a string object filled with that char repeated the given number of times.
Hope This Helps
Try this:
unsigned strLen(15);
std::string prefix("1234");
prefix += std::string(strLen - prefix.length(), '9');
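The answers above all compute strLen - prefix.length(), which wraps around to a huge unsigned value if the prefix is ever longer than the target length. A small sketch that avoids the subtraction uses std::string::resize, which appends the filler character when the string grows; the helper name pad_with is made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns `prefix` extended with `filler` up to
// `total_len` characters. A prefix that is already long enough is returned
// unchanged instead of being truncated.
std::string pad_with(std::string prefix, std::size_t total_len, char filler = '9')
{
    if (prefix.size() < total_len)
        prefix.resize(total_len, filler);  // appended characters are copies of filler
    return prefix;
}
```

With the sample values from the question, pad_with("1234", 15) yields 123499999999999 and pad_with("123", 15) yields 123999999999999.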

How to capture length of sscanf'd string?

I'm parsing a string that follows a predictable pattern:
1 character
an integer (one or more digits)
1 colon
a string, whose length came from #2
For example:
s5:stuff
I can see easily how to parse this with PCRE or the like, but I'd rather stick to plain string ops for the sake of speed.
I know I'll need to do it in 2 steps because I can't allocate the destination string until I know its length. My problem is gracefully getting the offset for the start of said string. Some code:
unsigned start = 0;
char type = serialized[start++]; // get the type tag
int len = 0;
char* dest = NULL;
char format[20];
//...
switch (type) {
//...
case 's':
// Figure out the length of the target string...
sscanf(serialized + start, "%d", &len);
// <code type='graceful'>
// increment start by the STRING LENGTH of whatever %d was
// </code>
// Don't forget to skip over the colon...
++start;
// Build a format string which accounts for length...
sprintf(format, "%%%ds", len);
// Finally, grab the target string...
sscanf(serialized + start, format, string);
break;
//...
}
That code is roughly taken from what I have (which isn't complete because of the issue at hand) but it should get the point across. Maybe I'm taking the wrong approach entirely. What's the most graceful way to do this? The solution can be either C or C++ (and I'd actually like to see the competing methods if there are enough responses).
You can use the %n conversion specifier, which doesn't consume any input - instead, it expects an int * parameter, and writes the number of characters consumed from the input into it:
int consumed;
sscanf(serialized + start, "%d%n", &len, &consumed);
start += consumed;
(But don't forget to check that sscanf() returned > 0!)
Use the %n format specifier to write the number of characters read so far to an integer argument.
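Put together, the %n approach could look like the sketch below. The function name parse_record and the choice to return the payload as a std::string are assumptions made for this illustration; the format string reads the tag and the length in one call and uses %n to learn where the colon should be. Note that the return value of sscanf does not count %n, so two successful conversions are expected.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: parses one "<tag><len>:<payload>" record such as
// "s5:stuff". On success stores the tag and payload and returns true.
bool parse_record(const char* serialized, char& type, std::string& value)
{
    int len = 0;
    int consumed = 0;
    // %c reads the tag, %d the length, %n stores the characters consumed so far
    if (std::sscanf(serialized, "%c%d%n", &type, &len, &consumed) != 2 || len < 0)
        return false;
    if (serialized[consumed] != ':')   // the colon must follow the digits
        return false;
    const char* payload = serialized + consumed + 1;
    if (std::strlen(payload) < static_cast<std::size_t>(len))
        return false;                  // input is shorter than announced
    value.assign(payload, static_cast<std::size_t>(len));
    return true;
}
```

Because the payload is copied by length rather than with a second sscanf, it may contain spaces, which a %s conversion would stop at.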
Here's a C++ solution, it could be better, and is hard-coded specifically to deal with your example input, but shouldn't require much modification to get working.
std::stringstream ss;
char type;
unsigned length;
char dummy;
std::string value;
ss << "s5:Helloxxxxxxxxxxx";
ss >> type;
ss >> length;
ss >> dummy;
ss.width(length);
ss >> value;
std::cout << value << std::endl;
Disclaimer:
I'm a noob at C++.
You can probably just use atoi which will ignore the colon.
e.g. len = atoi(serialized + start);
The only thing with atoi is that if it returns zero it could mean either the conversion failed, or that the length was truly zero. So it's not always the most appropriate function.
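The ambiguity described above goes away with std::strtol, which reports where parsing stopped: if the end pointer did not move, no digits were read. The sketch below is an illustration with a made-up function name, read_length; the end pointer also gives the offset of the colon, which is what the question was after.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: reads a decimal length from `text`. On success stores
// it in `len`, points `rest` at the first unparsed character and returns true.
bool read_length(const char* text, long& len, const char*& rest)
{
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text || errno == ERANGE || value < 0)
        return false;  // no digits, out of range, or negative length
    len = value;
    rest = end;        // for "5:stuff" this points at the ':'
    return true;
}
```

A genuine length of zero, as in "0:", is reported as success with len set to 0, while ":stuff" is reported as a failure.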
If you replace your colon with a space, scanf will stop at it. You can then read the size, malloc that many bytes, and run another scanf to get the rest of the string:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, const char * argv[]) {
char foo[20];
char *test;
scanf("%19s",foo); //"hello world"
printf("foo = %s\n", foo);//prints hello
//get size
test = malloc(sizeof(char)* 10);//replace 10 with your string size
scanf("%9s", test);//the width must stay below the allocated size
printf("test = %s\n", test);//prints world
free(test);
return 0;
}
Seems like the format is overspecified... (using a variable length field to specify the length of a variable length field).
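If you're using GCC with glibc, I'd suggest the allocating string conversion. Older glibc versions spelled it %as, which clashes with C99's %a floating-point conversion; POSIX.1-2008 standardized it as %ms:
if (sscanf(serialized,"%c%d:%ms",&type,&len,&dest)<3) return -1;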
/* use type, dest; ignore len */
free(dest);
return 0;