std::string substr method problems - c++

Hello, I'm writing this method. I want it to extract the portion of a given buffer that sits at a known place. I have a string like something=one;something=two and I want to get "one".
This is my idea:
static std::string Utils::getHeader( unsigned char * buffer)
{
std::string *str = new std::string(buffer);
std::size_t b_pos = str->find("=");
std::size_t a_pos = str->find(";");
return str->substr((a_pos + 1) ,(b_pos + 1));
}
But in Eclipse I get this error for the std::string substr call:
Invalid arguments ...
Candidates are:
std::basic_string<char,std::char_traits<char>,std::allocator<char>> substr(?, ?)
Can someone explain to me why I get this error and how I can fix it?

The code should probably look like:
static std::string Utils::getHeader(unsigned char * buffer, size_t size)
{
if(!buffer || !size)
return "";
const std::string str(reinterpret_cast<char*>(buffer), size);
std::size_t b_pos = str.find("=");
if(b_pos == std::string::npos)
throw ...;
std::size_t a_pos = str.find(";");
if(a_pos == std::string::npos)
throw ...;
if(b_pos > a_pos)
throw ...;
return str.substr((a_pos + 1), (b_pos + 1));
}
substr takes a starting position and a length, not two positions. Maybe something like:
const size_t start = b_pos + 1;
const size_t length = a_pos - start;
And then, return str.substr(start, length);.
With start one past the '=' and the length running up to (but not including) the ';', this yields "one" for the example input. Be certain that's what you want.

Ok, assuming you know that the input string is formatted as you mentioned you probably want something like this:
static std::string Utils::getHeader(const std::string & params) {
size_t start = params.find('=') +1; // Don't include =
size_t length = params.find(';') - start; // Already not including ';'
return params.substr(start, length);
}
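If the input format can't be guaranteed, the same idea can be written with the npos checks from the first answer. This is only a sketch: first_value is a made-up name, and returning an empty string on malformed input is my assumption (throwing would work just as well).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text between the first '=' and the
// first ';' that follows it, or an empty string if either is missing.
std::string first_value(const std::string& params)
{
    const std::size_t eq = params.find('=');
    if (eq == std::string::npos)
        return "";
    const std::size_t start = eq + 1;                  // skip the '='
    const std::size_t semi = params.find(';', start);  // search after the '='
    if (semi == std::string::npos)
        return "";
    return params.substr(start, semi - start);         // (position, length)
}
```

Searching for ';' starting at start also avoids picking up a ';' that happens to come before the '='.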


How can I speed up parsing of large strings?

So I've made a program that reads in various config files. Some of these config files can be small, some can be semi-large (largest one is 3,844 KB).
The read in file is stored in a string (in the program below it's called sample).
I then have the program extract information from the string based on various formatting rules. This works well, the only issue is that when reading larger files it is very slow....
I was wondering if there was anything I could do to speed up the parsing, or if there was an existing library that does what I need (extract a string up until a delimiter, and extract a string in between 2 delimiters on the same level). Any assistance would be great.
Here's my code & a sample of how it should work...
#include "stdafx.h"
#include <string>
#include <vector>
std::string ExtractStringUntilDelimiter(
std::string& original_string,
const std::string& delimiter,
const int delimiters_to_skip = 1)
{
std::string needle = "";
if (original_string.find(delimiter) != std::string::npos)
{
int total_found = 0;
auto occurance_index = static_cast<size_t>(-1);
while (total_found != delimiters_to_skip)
{
occurance_index = original_string.find(delimiter);
if (occurance_index != std::string::npos)
{
needle = original_string.substr(0, occurance_index);
total_found++;
}
else
{
break;
}
}
// Remove the found string from the original string...
original_string.erase(0, occurance_index + 1);
}
else
{
needle = original_string;
original_string.clear();
}
if (!needle.empty() && needle[0] == '\"')
{
needle = needle.substr(1);
}
if (!needle.empty() && needle[needle.length() - 1] == '\"')
{
needle.pop_back();
}
return needle;
}
void ExtractInitialDelimiter(
std::string& original_string,
const char delimiter)
{
// Remove extra new line characters
while (!original_string.empty() && original_string[0] == delimiter)
{
original_string.erase(0, 1);
}
}
void ExtractInitialAndFinalDelimiters(
std::string& original_string,
const char delimiter)
{
ExtractInitialDelimiter(original_string, delimiter);
while (!original_string.empty() && original_string[original_string.size() - 1] == delimiter)
{
original_string.erase(original_string.size() - 1, 1);
}
}
std::string ExtractStringBetweenDelimiters(
std::string& original_string,
const std::string& opening_delimiter,
const std::string& closing_delimiter)
{
const size_t first_delimiter = original_string.find(opening_delimiter);
if (first_delimiter != std::string::npos)
{
int total_open = 1;
const size_t opening_index = first_delimiter + opening_delimiter.size();
for (size_t i = opening_index; i < original_string.size(); i++)
{
// Check if we have room for opening_delimiter...
if (i + opening_delimiter.size() <= original_string.size())
{
for (size_t j = 0; j < opening_delimiter.size(); j++)
{
if (original_string[i + j] != opening_delimiter[j])
{
break;
}
else if (j == opening_delimiter.size() - 1)
{
total_open++;
}
}
}
// Check if we have room for closing_delimiter...
if (i + closing_delimiter.size() <= original_string.size())
{
for (size_t j = 0; j < closing_delimiter.size(); j++)
{
if (original_string[i + j] != closing_delimiter[j])
{
break;
}
else if (j == closing_delimiter.size() - 1)
{
total_open--;
}
}
}
if (total_open == 0)
{
// Extract result, and return it...
std::string needle = original_string.substr(opening_index, i - opening_index);
original_string.erase(first_delimiter, i + closing_delimiter.size());
// Remove new line symbols
ExtractInitialAndFinalDelimiters(needle, '\n');
ExtractInitialAndFinalDelimiters(original_string, '\n');
return needle;
}
}
}
return "";
}
int main()
{
std::string sample = "{\n"
"Line1\n"
"Line2\n"
"{\n"
"SubLine1\n"
"SubLine2\n"
"}\n"
"}";
std::string result = ExtractStringBetweenDelimiters(sample, "{", "}");
std::string LineOne = ExtractStringUntilDelimiter(result, "\n");
std::string LineTwo = ExtractStringUntilDelimiter(result, "\n");
std::string SerializedVector = ExtractStringBetweenDelimiters(result, "{", "}");
std::string SubLineOne = ExtractStringUntilDelimiter(SerializedVector, "\n");
std::string SubLineTwo = ExtractStringUntilDelimiter(SerializedVector, "\n");
// Just for testing...
printf("LineOne: %s\n", LineOne.c_str());
printf("LineTwo: %s\n", LineTwo.c_str());
printf("\tSubLineOne: %s\n", SubLineOne.c_str());
printf("\tSubLineTwo: %s\n", SubLineTwo.c_str());
system("pause");
}
Use std::string_view or a hand-rolled equivalent.
Don't modify the loaded string.
original_string.erase(0, occurance_index + 1);
is a code smell and is going to be expensive with a large original string.
If you are going to modify something, do it in one pass. Don't repeatedly delete from the front of it -- that is O(n^2). Instead, proceed along it and shove "finished" stuff into an output accumulator.
This will involve changing how your code works.
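As a rough illustration of the "proceed along it" idea, here is a minimal sketch that walks a std::string_view instead of erasing from a std::string. It assumes C++17 and a single-character delimiter; next_token is a hypothetical name, not something from the question's code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the text before the next `delim` and advances
// `rest` past it. The underlying buffer is never modified or copied.
std::string_view next_token(std::string_view& rest, char delim)
{
    const std::size_t pos = rest.find(delim);
    if (pos == std::string_view::npos) {
        const std::string_view token = rest;  // no delimiter left: take everything
        rest = std::string_view{};
        return token;
    }
    const std::string_view token = rest.substr(0, pos);
    rest.remove_prefix(pos + 1);              // O(1): only the view moves
    return token;
}
```

Each call costs time proportional to the token it returns, so reading every line of a large file stays linear overall. The views are only valid while the original string is alive and unmodified.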
You're reading your data into a string. "Length of string" should not be a problem. So far, so good...
You're using "string.find()". That's not necessarily a bad choice.
You're using "string.erase()". That's probably the main source of your problem.
SUGGESTIONS:
Treat the original string as "read-only". Don't call erase(), don't modify it.
Personally, I'd consider reading your text into a C string (a text buffer), then parsing the text buffer, using strstr().
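A sketch of what the strstr() approach could look like, assuming the text is already in a NUL-terminated buffer. split_with_strstr is a hypothetical helper; it copies each piece into a std::string for convenience, which you could avoid by storing pointer pairs instead.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a NUL-terminated buffer on `delim` using strstr,
// moving a read pointer forward instead of erasing from the buffer.
std::vector<std::string> split_with_strstr(const char* buffer, const char* delim)
{
    std::vector<std::string> parts;
    const std::size_t delim_len = std::strlen(delim);
    if (delim_len == 0) {                  // strstr would match at every position
        parts.emplace_back(buffer);
        return parts;
    }
    const char* cursor = buffer;
    while (const char* hit = std::strstr(cursor, delim)) {
        parts.emplace_back(cursor, hit);   // copy the range [cursor, hit)
        cursor = hit + delim_len;          // step past the delimiter
    }
    parts.emplace_back(cursor);            // whatever follows the last delimiter
    return parts;
}
```

The buffer is treated as read-only throughout; only the cursor moves.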
Here is a more efficient version of ExtractStringBetweenDelimiters. Note that this version does not mutate the original buffer. You would perform subsequent queries on the returned string.
std::string trim(std::string buffer, char what)
{
auto not_what = [&what](char ch)
{
return ch != what;
};
auto first = std::find_if(buffer.begin(), buffer.end(), not_what);
auto last = std::find_if(buffer.rbegin(), std::make_reverse_iterator(first), not_what).base();
return std::string(first, last);
}
std::string ExtractStringBetweenDelimiters(
std::string const& buffer,
const char opening_delimiter,
const char closing_delimiter)
{
std::string result;
auto first = std::find(buffer.begin(), buffer.end(), opening_delimiter);
if (first != buffer.end())
{
auto last = std::find(buffer.rbegin(), std::make_reverse_iterator(first),
closing_delimiter).base();
if(last > first)
{
result.assign(first + 1, last);
result = trim(std::move(result), '\n');
}
}
return result;
}
If you have access to string_view (c++17 for std::string_view or boost::string_view) you could return one of these from both functions for extra efficiency.
It's worth mentioning that this method of parsing a structured file is going to cause you problems down the line if any of the serialised strings contains a delimiter, such as a '{'.
In the end you'll want to write or use someone else's parser.
The boost::spirit library is a little complicated to learn, but creates very efficient parsers for this kind of thing.
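For what it's worth, here is a sketch of a string_view-returning variant that matches the first opening delimiter with its own closing delimiter by counting nesting depth, so sibling blocks aren't swallowed. It assumes C++17 and that open and close are different characters; between_balanced is a made-up name. It does not solve the problem of delimiters appearing inside serialised strings.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns a view of the text between the first `open`
// and its matching `close`, tracking nesting depth. Returns an empty view
// if there is no opening delimiter or the delimiters are unbalanced.
std::string_view between_balanced(std::string_view buffer, char open, char close)
{
    const std::size_t first = buffer.find(open);
    if (first == std::string_view::npos)
        return {};
    int depth = 0;
    for (std::size_t i = first; i < buffer.size(); ++i) {
        if (buffer[i] == open) {
            ++depth;
        } else if (buffer[i] == close && --depth == 0) {
            return buffer.substr(first + 1, i - first - 1);
        }
    }
    return {};  // ran out of input before the delimiters balanced
}
```

Because the result is a view into the caller's buffer, no characters are copied; the caller has to keep that buffer alive while using the result.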

C++ get string between two delimiters and replace it

I want to replace a substring in a string with something that depends on the substring between two delimiters. Little example:
I got the string
The result is __--__3__--__.
and a function
int square(int x) { return x*x; }
Now I want to output just the string with the result without delimiters, so:
The result is 9.
I already tried several algorithms but none of them worked yet.
Best regards
My best attempt so far:
const std::string emptyString = "";
std::string ExtractString(std::string source, std::string start, std::string end)
{
std::size_t startIndex = source.find(start);
// If the starting delimiter is not found on the string
// stop the process, you're done!
//
if (startIndex == std::string::npos)
{
return emptyString;
}
// Adding the length of the delimiter to our starting index
// this will move us to the beginning of our sub-string.
//
startIndex += start.length();
// Looking for the end delimiter
//
std::string::size_type endIndex = source.find(end, startIndex);
// Returning the substring between the start index and
// the end index. If the endindex is invalid then the
// returned value is empty string.
return source.substr(startIndex, endIndex - startIndex);
}
int square(int x) { return x*x; }
int main() {
std::string str = "The result is __--__3__--__.";
std::string foundNum = ExtractString(str, "__--__", "__--__");
int foundNumInt = atoi(foundNum.c_str());
int result = square(foundNumInt);
std::string toReplace = "__--__";
toReplace.append(foundNumInt);
toReplace.append("__--__");
str.replace(str.begin(), str.end(), toReplace, result);
}
The question is: how to take the first string given (The result is __--__<number>__--__.), get the number from it, perform a function on that number, and then end with a string that looks like this: The result is <number squared>.
Here is a way to take the first string and find the number. I then just squared the number, but you could plug that into your own function if you wanted to.
std::string s = "The result is __--__3__--__.";
std::regex r( "[0-9]+");
std::smatch m;
//
std::sregex_iterator iter(s.begin(), s.end(), r);
std::sregex_iterator end;
std::string value;
//
int index = 0;
while (iter != end)
{
for (unsigned i = 0; i < iter->size(); ++i)
{
value = (*iter)[i];
}
++iter;
index++;
}
int num = std::stoi(value);
int answer = num*num;
s = s.substr(0, s.find('_')); // "The result is " (keeps the trailing space)
s = s + std::to_string(answer) + ".";
std::cout << s << std::endl;
Have you tried std::string::find?
const std::string example_data = "The result is __--__3__--__.";
static const char text_to_find[] = "__--__";
const std::string::size_type start_position = example_data.find(text_to_find);
if (start_position != std::string::npos)
{
const std::string::size_type replacement_start_position = start_position + sizeof(text_to_find) - 1;
if (replacement_start_position < example_data.length())
{
// Perform replacement
}
}
The "sizeof(text_to_find) - 1" returns the length of the text, without counting the terminating nul character.
To skip past the number, you could do something like:
const std::string::size_type after_number_position = example_data.find('_', replacement_start_position);
The substring between replacement_start_position and after_number_position will contain your number. You can use a variety of functions to convert the substring to a number.
See also std::ostringstream for converting numbers to text.
Edit 1:
Corrected declaration of replacement_start_position.
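Putting the find-based steps together, a minimal sketch might look like this. replace_delimited is a hypothetical name, square is the function from the question, and the sketch assumes a non-empty delimiter and a well-formed integer between the two markers.

```cpp
#include <cstddef>
#include <string>

int square(int x) { return x * x; }

// Hypothetical helper: finds the first `delim`<number>`delim` span in `text`
// and replaces the whole span with square(number). Returns `text` unchanged
// if the delimiters are not both present.
std::string replace_delimited(std::string text, const std::string& delim)
{
    const std::size_t open = text.find(delim);
    if (open == std::string::npos)
        return text;
    const std::size_t value_start = open + delim.size();
    const std::size_t close = text.find(delim, value_start);
    if (close == std::string::npos)
        return text;
    // std::stoi throws std::invalid_argument if the span is not a number.
    const int value = std::stoi(text.substr(value_start, close - value_start));
    // Replace from the opening delimiter through the end of the closing one.
    text.replace(open, close + delim.size() - open, std::to_string(square(value)));
    return text;
}
```

Calling replace_delimited("The result is __--__3__--__.", "__--__") should give "The result is 9.".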
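You may want these functions (C++17 version; std::string_view avoids copying the search and replacement text):
auto replace_all
(std::string str, std::string_view from, std::string_view to) noexcept -> decltype(str) {
if (from.empty()) return str; // an empty pattern would match forever
std::size_t start_pos{ 0 }; // must be as wide as npos for the comparison below
while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
str.replace(start_pos, from.length(), to);
start_pos += to.length();
}
return str;
}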
auto remove_all
(std::string str, std::string_view from) noexcept -> decltype(str) {
return replace_all(str, from, "");
}
and for earlier versions (before C++17):
std::string replace_all
(std::string str, std::string from, std::string to) noexcept {
if (from.empty()) return str; // an empty pattern would match forever
std::size_t start_pos{ 0 }; // must be as wide as npos for the comparison below
while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
str.replace(start_pos, from.length(), to);
start_pos += to.length();
}
return str;
}
std::string remove_all
(std::string str, std::string from) noexcept {
return replace_all(str, from, "");
}
I tested:
int main() {
std::string str = "__+__hello__+__";
std::cout << remove_all(str, "__+__");
std::cin.get();
return 0;
}
and my output was:
hello

How do I get part of a char*?

I have the following code that solves a small image using Tesseract.
char *answer = tess_api.GetUTF8Text();
I know beforehand that the result will always start with the character '+' and it's one word so I want to get rid of any junk it finds.
I get the result as "G+ABC S\n\n" and I need only +ABC. So basically I need to ignore anything before + and everything after the first space. I was thinking I should use rindex to find the position of + and spaces.
std::string ParseString(const std::string& s)
{
size_t plus = s.find_first_of('+');
size_t space = s.find_first_of(" \n", plus);
return s.substr(plus, space-plus);
}
int main()
{
std::cout << ParseString("G+ABC S\n\n").c_str() << std::endl;
std::cout << ParseString("G +ABC\ne\n").c_str() << std::endl;
return 0;
}
Gives
+ABC
+ABC
If you really can't use strings then something like this might do
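char *ParseString2(char *s)
{
int plus,end;
// Also stop at the terminating NUL, so input without '+' or whitespace cannot overrun the buffer.
for (plus = 0 ; s[plus] != '+' && s[plus] != '\0' ; ++plus){}
for (end = plus ; s[end] != ' ' && s[end] != '\n' && s[end] != '\0' ; ++end){}
char *result = new char[end - plus + 1];
memcpy(result, s + plus, end - plus);
result[end - plus] = 0;
return result; // the caller must delete[] the result
}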
You can use:
// Needs <cstring> and <cstdlib>. Scan "answer" to find where to start and where to end.
const char *start = strchr(answer, '+');           // first '+', or NULL if there is none
size_t length = start ? strcspn(start, " \n") : 0; // characters before the first space or newline
char *dataYouWant = (char *) malloc(length + 1);   // result will be stored here
if (start) memcpy(dataYouWant, start, length);
// for example answer = "G+ABC S\n\n"
dataYouWant[length] = '\0'; // dataYouWant will be "+ABC"
You can check out Strings in c, how to get subString for other alternatives.
P.S. suggestion: use std::string instead in C++, it will be much easier (check out @DavidSykes's answer).
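If the OCR output might not contain a '+' at all, a std::string-based version with explicit checks could look like the sketch below. extract_plus_word is a hypothetical name, and returning an empty string for bad input is my assumption about what the caller wants.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the word that starts at the first '+' and
// ends before the next space or newline. Returns an empty string if the
// input is null or contains no '+'.
std::string extract_plus_word(const char* text)
{
    if (text == nullptr)
        return "";
    const std::string s(text);
    const std::size_t plus = s.find('+');
    if (plus == std::string::npos)
        return "";
    const std::size_t end = s.find_first_of(" \n", plus);
    // If there is no trailing whitespace, npos makes substr take the rest.
    const std::size_t length =
        (end == std::string::npos) ? std::string::npos : end - plus;
    return s.substr(plus, length);
}
```

It takes the char* directly, so the pointer returned by the OCR call can be passed in as-is; the copy into std::string means the result does not depend on that buffer afterwards.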

Converting from char string to an array of uint8_t?

I'm reading a string from a file so it's in the form of a char array. I need to tokenize the string and save each char array token as a uint8_t hex value in an array.
char* starting = "001122AABBCC";
// ...
uint8_t[] ending = {0x00,0x11,0x22,0xAA,0xBB,0xCC}
How can I convert from starting to ending? Thanks.
Here is a complete working program. It is based on Rob I's solution, but fixes several problems and has been tested to work.
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
const char* starting = "001122AABBCC";
int main()
{
std::string starting_str = starting;
std::vector<unsigned char> ending;
ending.reserve( starting_str.size());
for (int i = 0 ; i < starting_str.length() ; i+=2) {
std::string pair = starting_str.substr( i, 2 );
ending.push_back(::strtol( pair.c_str(), 0, 16 ));
}
for(int i=0; i<ending.size(); ++i) {
printf("0x%X\n", ending[i]);
}
}
strtoul will convert text in any base you choose into a number. You have to do a little work to chop the input string into individual digits, or you can convert 32 or 64 bits at a time.
P.S. uint8_t[] ending = {0x00,0x11,0x22,0xAA,0xBB,0xCC}
doesn't mean anything special: you aren't storing the data in a uint8_t as 'hex', you are storing bytes. It's up to you (or your debugger) how to interpret the binary data.
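A minimal sketch of the strtoul approach, converting one byte (two hex digits) at a time. hex_to_bytes is a hypothetical name, and throwing std::invalid_argument on bad input is my choice, not something the question asked for.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decodes pairs of hex digits into bytes with strtoul.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hex string must have even length");
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const std::string pair = hex.substr(i, 2);
        char* end = nullptr;
        const unsigned long value = std::strtoul(pair.c_str(), &end, 16);
        // strtoul stops at the first character it cannot parse. Note that it
        // also accepts leading whitespace or a sign, so stricter validation
        // would need an isxdigit check on each character.
        if (end != pair.c_str() + 2)
            throw std::invalid_argument("non-hex character in input");
        bytes.push_back(static_cast<std::uint8_t>(value));
    }
    return bytes;
}
```

For the input "001122AABBCC" this produces the six bytes 0x00, 0x11, 0x22, 0xAA, 0xBB, 0xCC.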
With C++11, you may use std::stoi for that :
std::vector<uint8_t> convert(const std::string& s)
{
if (s.size() % 2 != 0) {
throw std::runtime_error("Bad size argument");
}
std::vector<uint8_t> res;
res.reserve(s.size() / 2);
for (std::size_t i = 0, size = s.size(); i != size; i += 2) {
std::size_t pos = 0;
res.push_back(std::stoi(s.substr(i, 2), &pos, 16));
if (pos != 2) {
throw std::runtime_error("bad character in argument");
}
}
return res;
}
I think any canonical answer (w.r.t. the bounty notes) would involve some distinct phases in the solution:
Error checking for valid input
Length check and
Data content check
Element conversion
Output creation
Given the usefulness of such conversions, the solution should probably include some flexibility w.r.t. the types being used and the locale required.
From the outset, given the date of the request for a "more canonical answer" (circa August 2014) liberal use of C++11 will be applied.
An annotated version of the code, with types corresponding to the OP:
std::vector<std::uint8_t> convert(std::string const& src)
{
// error check on the length
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
auto ishex = [] (decltype(*src.begin()) c) {
return std::isxdigit(c, std::locale()); };
// error check on the data contents
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
// allocate the result, initialised to 0 and size it to the correct length
std::vector<std::uint8_t> result(src.length() / 2, 0);
// run the actual conversion
auto str = src.begin(); // track the location in the string
std::for_each(result.begin(), result.end(), [&str](decltype(*result.begin())& element) {
element = static_cast<std::uint8_t>(std::stoul(std::string(str, str + 2), nullptr, 16));
std::advance(str, 2); // next two elements
});
return result;
}
The template version of the code adds flexibility:
template <typename Int /*= std::uint8_t*/,
typename Char = char,
typename Traits = std::char_traits<Char>,
typename Allocate = std::allocator<Char>,
typename Locale = std::locale>
std::vector<Int> basic_convert(std::basic_string<Char, Traits, Allocate> const& src, Locale locale = Locale())
{
using string_type = std::basic_string<Char, Traits, Allocate>;
auto ishex = [&locale] (decltype(*src.begin()) c) {
return std::isxdigit(c, locale); };
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
std::vector<Int> result(src.length() / 2, 0);
auto str = std::begin(src);
std::for_each(std::begin(result), std::end(result), [&str](decltype(*std::begin(result))& element) {
element = static_cast<Int>(std::stoul(string_type(str, str + 2), nullptr, 16));
std::advance(str, 2);
});
return result;
}
The convert() function can then be based on the basic_convert() as follows:
std::vector<std::uint8_t> convert(std::string const& src)
{
return basic_convert<std::uint8_t>(src, std::locale());
}
uint8_t is typically no more than a typedef of an unsigned char. If you're reading characters from a file, you should be able to read them into an unsigned char array just as easily as a signed char array, and an unsigned char array is a uint8_t array.
I'd try something like this:
std::string starting_str = starting;
uint8_t* ending = new uint8_t[starting_str.length()/2];
for (int i = 0 ; i < starting_str.length() ; i+=2) {
std::string pair = starting_str.substr( i, 2 ); // substr takes (position, length)
ending[i/2] = ::strtol( pair.c_str(), 0, 16 );
}
Didn't test it but it looks good to me...
You may add your own conversion from set of char { '0','1',...'E','F' } to uint8_t:
uint8_t ctoa(char c)
{
if( c >= '0' && c <= '9' ) return c - '0';
else if( c >= 'a' && c <= 'f' ) return 0xA + c - 'a';
else if( c >= 'A' && c <= 'F' ) return 0xA + c - 'A';
else return 0;
}
Then it will be easy to convert a string in to array:
uint32_t endingSize = strlen(starting)/2;
uint8_t* ending = new uint8_t[endingSize];
for( uint32_t i=0; i<endingSize; i++ )
{
ending[i] = ( ctoa( starting[i*2] ) << 4 ) + ctoa( starting[i*2+1] );
}
This simple solution should work for your problem
const char* starting = "001122AABBCC";
uint8_t ending[6];
// This algorithm works for any even-length starting string.
// However, you have to make sure that ending has enough space (strlen(starting) / 2 bytes).
size_t i = 0;
while (i + 1 < strlen(starting))
{
// copy one pair of hex digits into a NUL-terminated string
char str[3] = { starting[i], starting[i + 1], '\0' };
// convert the pair from base 16 (atoi has no base parameter, so use strtol)
ending[i / 2] = (uint8_t)strtol(str, NULL, 16);
i += 2;
}
uint8_t* ending = reinterpret_cast<uint8_t*>(starting); // reinterprets the ASCII characters as bytes; it does not decode the hex pairs

Reverse string find_first_not_of

I have a std::string and I want to find the position of the first character that:
Is different from all the following characters: ' ', '\n' and '\t'.
Has lower position from that indicated by me.
So, for example if I have the following string and position:
string str("AAA BBB=CCC DDD");
size_t pos = 7;
I want to have the possibility to use a method like this:
size_t res = find_first_of_not_reverse(str, pos, " \n\t");
// now res = 4, because 4 is the position of the space character + 1
How can I do this?
As Bo commented, templatetypedef's answer was 99% of the way there; we just need std::string::find_last_of rather than std::string::find_last_not_of:
#include <cassert>
#include <string>
std::string::size_type find_first_of_not_reverse(
std::string const& str,
std::string::size_type const pos,
std::string const& chars)
{
assert(pos > 1);
assert(pos < str.size());
std::string::size_type const res = str.find_last_of(chars, pos - 1) + 1;
return res == pos ? find_first_of_not_reverse(str, pos - 1, chars)
: res ? res
: std::string::npos;
}
int main()
{
std::string const str = "AAA BBB=CCC DDD";
std::string const chars = " \n\t";
std::string::size_type res = find_first_of_not_reverse(str, 7, chars); // res == 4
res = find_first_of_not_reverse(str, 2, chars); // res == npos
}
I was curious why basic_string does not define rfind_first_of and friends myself. I think it should. Regardless here is a non-recursive (see ildjarn's answer) implementation that should fulfill the requirements of this question. It compiles but I've not tested it.
std::string delims = " \n\t";
reverse_iterator start = rend()-pos-1, found =
std::find_first_of(start,rend(),delims.begin(),delims.end());
return found==rend()?npos:pos-(found-start);
To be like rfind pos needs to be set to size() if it's npos or greater than size().
PS: I think this question could benefit from some editing. For one "find_first_of_not_reverse" is pretty misleading. It should be rfind_first_of I think (and then add 1 to the result.)
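For completeness, here is a non-recursive sketch built only from std::string::find_last_not_of and find_last_of. As far as I can tell it returns the same results as the recursive answer for the examples in the question, but word_start_before is a made-up name and the handling of pos == 0 is my assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: looking backwards from `pos`, skips any delimiters,
// then returns the index just after the previous delimiter. Returns npos if
// there is no delimiter before that point.
std::size_t word_start_before(const std::string& str,
                              std::size_t pos,
                              const std::string& chars)
{
    if (pos == 0 || str.empty())
        return std::string::npos;
    // Step 1: skip delimiters that sit immediately before pos.
    const std::size_t last_word_char = str.find_last_not_of(chars, pos - 1);
    if (last_word_char == std::string::npos)
        return std::string::npos;  // only delimiters before pos
    // Step 2: find the delimiter that precedes that word.
    const std::size_t delim = str.find_last_of(chars, last_word_char);
    return delim == std::string::npos ? std::string::npos : delim + 1;
}
```

Both member functions clamp a start position past the end of the string, so a pos greater than size() simply searches from the last character.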