(String)Iterator based conversion to int - c++

There are the atox, strtox and stox families that I know of, but I can't seem to find any iterator based string to int conversions in the Standard Library or Boost.
The reason I need them is that my parser's match result is a range referencing the input string. I might very well have an input string like
...8973893488349798923475...
^begin ^end
so I need 738934883 as an integer.
Of course, I could first use begin and end to construct a std::string to use with any of the above families, but I would very much like to avoid that overhead.
So my question: is there anything in the Standard Library or Boost that accepts iterators as input, or do I have to write my own?

Boost does actually support this, using the Lexical Cast library. The following code uses a substring range to parse the number without performing any dynamic allocation:
#include <boost/lexical_cast.hpp>
#include <string>
#include <iostream>
int convert_strings_part(const std::string& s, std::size_t pos, std::size_t n)
{
return boost::lexical_cast<int>(s.data() + pos, n);
}
int main(int argc, char* argv[])
{
std::string s = "8973893488349798923475";
// Expect: 738934883
std::cout << convert_strings_part(s, 2, 9) << std::endl;
return 0;
}
The output (tested on OS X with Boost 1.60):
738934883
The lexical cast library has some great features for conversion to and from strings, though it isn't as well known as some of the others for some reason.

Until gavinb's answer, I was not aware of any such library function. My attempt would have been the following, using atox or strtox (which avoids a dependency on Boost, if that matters):
::std::string::iterator b; // begin of section
::std::string::iterator e; // end of section, pointing at first char NOT to be evaluated
char tmp = *e;
*e = 0;
int n = atoi(&*b);
*e = tmp;
If you only have const_iterators available, you have to apply a const_cast to *e before modifying it.
Be aware that this solution is not thread safe, though, since it temporarily writes into the input string.

You could do it with strstream, but it is deprecated. Below are two examples, one with strstream and one with Boost.Iostreams arrays:
http://coliru.stacked-crooked.com/a/04d4bde6973a1972
#include <iostream>
#include <string>
#include <strstream>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/copy.hpp>
int main()
{
std::string in = "8973893488349798923475";
// ^^^^^
auto beg = in.begin()+2;
auto end = in.begin()+6;
// strstream example - DEPRECATED
std::istrstream os(&*beg, end-beg);
int n;
std::string ss;
os >> n;
std::cout << n << "\n";
// Boost example
namespace io = boost::iostreams;
int n2;
io::array_source src(&*beg, end-beg);
io::stream<io::array_source> os2(src);
os2 >> n2;
std::cout << n2 << "\n";
return 0;
}

With modern standard library implementations, std::string(begin, end) is not that bad: SSO (small string optimization) avoids any allocation for strings shorter than about 15 chars (22 on some 64-bit implementations).

Related

Join a container of `std::string_view`

How can you concisely combine a container of std::string_views?
For instance, boost::algorithm::join is great, but it only works for std::string.
An ideal implementation would be
static std::string_view unwords(const std::vector<std::string_view>& svVec) {
std::string_view joined;
boost::algorithm::join(svVec," ");
return joined;
}
ITNOA
Short C++20 version:
using namespace std::literals;
const auto bits = { "https:"sv, "//"sv, "cppreference"sv, "."sv, "com"sv };
for (char const c : bits | std::views::join) std::cout << c;
std::cout << '\n';
Since C++23, if you want to insert a separator string or character between the parts, you can use join_with, as in the code below (from the cppreference example):
#include <iostream>
#include <ranges>
#include <vector>
#include <string_view>
int main() {
using namespace std::literals;
std::vector v{"This"sv, "is"sv, "a"sv, "test."sv};
auto joined = v | std::views::join_with(' ');
for (auto c : joined) std::cout << c;
std::cout << '\n';
}
Note 1: if you would rather not rely on a language version that is not yet stable, you can use the range-v3 library for join_with views.
Note 2: as Nicol Bolas points out, you cannot join the pieces into a single string_view without copying (you can copy into a std::string, of course). For more detail, see the SO question and answer "Why can't I construct a string_view from range iterators?".

Constructing a vector with istream_iterators is slower than a while loop with string / std::atof, why?

I'm reading a stream of values in from std::cin, and constructing a std::vector from it. Originally, I used a while loop with a temporary std::string object, and used std::atof with the c_str() from the temporary string. There are a few calls in there, and generally a lot going on. I replaced it with the range constructor, using std::istream_iterator with std::cin, thinking it would look simpler and be quicker. To my surprise, it was a bit slower, though it does look cleaner.
My question is this: Why, in the code below, is the construction of std::vector using std::istream_iterator slower than the alternative method, using a mashup of function calls? Also, is there a way to modify the range construction using, say, std::istreambuf_iterator, such that the performance of the two methods is equivalent? I've seen answers stating that I should add std::ios_base::sync_with_stdio(false); to the code. While this increases the performance, it does so in both cases, and a difference between the two methods still exists.
Minimal Working Example:
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;
int main()
{
/* Faster Method */
string temporary_line{};
vector<double> data{};
while(cin>> temporary_line)
data.push_back(atof(temporary_line.c_str()));
/* Slower Method */
//vector<double> data{ istream_iterator<double>{cin},
// istream_iterator<double>{} };
cout<< data.back() << '\n';
}
I ran the code through 5 different compilers, g++-{7,8}, and clang++-{6,7,8}. The code was compiled under -O2 for all runs, with each time representing the average of 5 runs. The times were tight enough that adding more trials wouldn't have mattered. The results show the same behavior across all compilers, with g++ edging out clang++ by just a small amount of time on both methods.
To test, create a file of ~1,000,000 random integers:
$ for i in {0..999999}; do echo $RANDOM >> datafile; done
To compile:
$ g++ -o ds descriptive_statistics.cpp -O2
To run with the generated example data:
$ time cat datafile | ./ds
The full code:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <numeric>
#include <string>
#include <vector>
class DS {
public:
DS() = default;
DS(const DS& ) = default;
DS(DS&& ) = default;
DS(const double*, std::size_t length);
DS(const double*, const double*);
virtual ~DS() = default;
DS& operator=(const DS& ) = default;
DS& operator=(DS&& ) = default;
friend std::ostream& operator<<(std::ostream& , const DS& );
bool operator<(const DS& ) = delete;
bool operator>(const DS& ) = delete;
bool operator==(const DS& ) = delete;
private:
double min;
double first_quartile;
double mean;
double median;
double third_quartile;
double max;
double sum;
double variance;
double standard_deviation;
};
DS::DS(const double* begin, const double* end) {
const std::size_t size = std::distance(begin, end) ;
min = *begin;
first_quartile = begin[ size/4 ] ;
sum = std::accumulate(begin, end, double{});
mean = sum / size ;
const std::size_t idx{ size / 2 };
median = begin[ idx ] ;
if( ! (size & 1) ) {
median += begin[ idx - 1 ];
median *= 0.5;
}
third_quartile = begin[ 3*size/4 ] ;
variance = std::accumulate(begin, end, double{},
[&] (double a, double b) {
return a + std::pow(b - mean, 2.0);
}) / size ;
standard_deviation = std::sqrt(variance);
max = *std::prev(end);
}
DS::DS(const double* begin, std::size_t length) {
const double* end = begin + length;
*this = DS(begin,end);
}
std::ostream& operator<<(std::ostream& os, const DS& ds) {
os << ds.min << '\n'
<< ds.first_quartile << '\n'
<< ds.mean << '\n'
<< ds.median << '\n'
<< ds.third_quartile << '\n'
<< ds.max << '\n'
<< ds.sum << '\n'
<< ds.variance << '\n'
<< ds.standard_deviation << '\n';
return os;
}
int main(int argc, char** argv)
{
// This section is faster than the section below
std::string temporary_line{};
std::vector<double> data{};
while(std::cin>> temporary_line) {
data.push_back(std::atof(temporary_line.c_str()));
}
// This section is slower than the section above
// std::vector<double> data{
// std::istream_iterator<double>{std::cin},
// std::istream_iterator<double>{}
// };
if(! std::is_sorted(data.cbegin(), data.cend()) ) {
std::sort(data.begin(), data.end());
}
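// Use data() rather than &*data.cend(): dereferencing the end iterator is undefined behaviour
DS ds(data.data(), data.data() + data.size());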
std::cout<< ds << std::endl;
return(EXIT_SUCCESS);
}
Taking a look at the implementation of std::istream_iterator<double> you can notice that doing
std::vector<double> data{ std::istream_iterator<double>{file},
std::istream_iterator<double>{} };
is really an equivalent of doing
double temporary_line;
std::vector<double> data{};
while (file>>temporary_line) {
data.push_back(temporary_line);
}
See the difference in assembly code on godbolt
So your whole question boils down to why std::atof is faster than operator>>.
As you can see, at -O2 gcc emits a call to strtod in one case and a call to std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&) in the other (https://www.godbolt.org/z/Od-FIk), but the structure of the code is basically the same.
I believe the reason for the time difference is the locale. std::atof is mostly oblivious to the locale (it only sees the C locale), whereas operator>> parses under the constraints of the stream's C++ locale, possibly including a Unicode conversion.
A more complex operation takes more time. But a 50% time penalty for taking Unicode and every locale into account isn't that bad, don't you think?
The two programs do similar, but really different things.
The atof program parses strings that look like C floating point numbers. Their format is fixed, except for the decimal-point character, which is determined by the current locale.
I have profiled the two versions using gcc and libstdc++. According to my profiling results (which should really be your profiling results) the program spends about half of its time in the strtod function. Other significant contributors are dynamic_cast (which is necessary for locale handling, which is necessary for string reading) and the std::istream::sentry constructor.
The iterator program parses strings according to arbitrary, possibly user-supplied locale facets. It spends most of its time in std::num_get::do_get, which is a virtual function; it in turn divides its time between the internal functions std::num_get::_M_extract_float and std::__convert_to_v<double>. The former parses the number in order to find out which range of characters to pass to strtod. The latter just calls strtod, which parses and converts the number. So it appears that the iterator version parses each number twice.
I don't really know if it's feasible to squash the two parsing passes into one. Perhaps nobody cares to do this optimisation. Everybody knows that iostreams formatting is bad and should only be used in non-performance-critical applications, and it's due for an overhaul and replacement anyway.

Efficiently store array of up to 2048 characters?

Getting input from another source; which populates a string of up to 2048 characters.
What is the most efficient way of populating and comparing this string? - I want to be able to easily append to the string also.
Here are three attempts of mine:
C-style version
#include <cstdio>
#include <cstring>
int main(void) {
char foo[2048];
foo[0]='a', foo[1]='b', foo[2]='c', foo[3]='\0'; // E.g.: taken from u-input
puts(strcmp(foo, "bar")? "false": "true");
}
C++-style version 0
#include <iostream>
#include <string>
int main() {
std::string foo;
foo.reserve(2048);
foo += "abc"; // E.g.: taken from user-input
std::cout << std::boolalpha << (foo=="bar");
}
C++-style version 1
#include <iostream>
#include <string>
int main() {
std::string foo;
foo += "abc"; // E.g.: taken from user-input
std::cout << std::boolalpha << (foo=="bar");
}
What is most efficient depends on what you optimize for.
Some common criteria:
Program Speed
Program Size
Working Set Size
Code Size
Programmer Time
Safety
The undoubted king for 1 and 2, and in your example probably also 3, is the C style.
For 4 and 5, it is C++ style 1.
Point 6 probably goes to the C++ style.
Still, a proper balance between these goals is called for, which IMHO favors C++ option 0.

Wrong code about c++ char*. Any body view?

#include<stdlib.h>
#include<stdio.h>
#include<string.h>
#include<iostream>
using namespace std;
#define attr_size 3
int main(){
const char* attr[attr_size];
int i=0;
for(i=0;i<attr_size;i++){
char* t=(char*)malloc(sizeof(int));
sprintf(t,"%d",i);
string temp="attr";
temp+=t;
attr[i]=temp.c_str();
cout<<attr[i]<<endl;
free(t);
}
for(i=0;i<attr_size;i++){
cout<<attr[i]<<endl;
}
}
And the result is:
attr0
attr1
attr2
attr2
attr
attr2
Actually, I want to get the result that:
attr0
attr1
attr2
attr0
attr1
attr2
Maybe something is wrong with the loop. Can anybody help me?
The problem is that c_str returns a pointer into the string object's internal buffer. At the end of each loop iteration temp is destroyed, so the pointer you stored is no longer valid, and dereferencing it later leads to undefined behavior.
If you want an array of strings, why not declare it as an array of strings?
There are also other problems with your code: you only allocate sizeof(int) bytes (typically four) for a string that can need 12 characters (including the sign and the string terminator).
I would suggest you remake your program like this:
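#include <array>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
const std::size_t ATTR_SIZE = 3;
int main()
{
std::array<std::string, ATTR_SIZE> attr;
for (std::size_t i = 0; i < ATTR_SIZE; ++i)
{
// An output stream is needed here: std::istringstream has no operator<<
std::ostringstream os;
os << "attr" << i;
attr[i] = os.str();
}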
for (const std::string& s : attr)
std::cout << s << '\n';
}
The above uses some C++11 features like std::array (you can use std::vector instead) and the range-based for loop (you can use normal iteration instead).

String reverse error

Can anyone explain to me why I'm getting a ".exe has encountered a problem and needs to close" error? It compiles and works sometimes when I fiddle with the char array, but when it does work I sometimes get strange characters at the end of the string.
#include <iostream>
using namespace std;
char* StrReverse3(char*);
char* StrReverse3(char* str)
{
char *p;
int length=0,start=0,end=0;
length=strlen(str);
for(start=0,end=length-1;end>= 0,start<=length-1;end--,start++)
{
p[start]=str[end];
}
return p;
}
int main()
{
char str[100]="Saw my reflection in snow covered hills";
StrReverse3(str);
cin.get();
return 0;
}
You are not initializing p. It's an uninitialized pointer that you are writing to.
Since you are writing this in C++, not C, I'd suggest using std::string and std::reverse:
#include <string>
#include <algorithm>
#include <iostream>
int main()
{
std::string str = "Saw my reflection in snow covered hills";
std::reverse(str.begin(), str.end());
std::cout << str;
return 0;
}
Output:
sllih derevoc wons ni noitcelfer ym waS
See it working online at ideone
char *p; is never initialized, yet p[start] is used as the destination of an assignment. Don't you get compiler warnings from this? I'm amazed it even "works sometimes".
You are accessing memory that wasn't allocated by your program (p can point anywhere!). This is the reason for the problems you have.
I strongly encourage you to
read up on dynamically allocating memory with new and delete, which is a very important topic
read up on the C++ standard library, especially std::string. You should avoid raw pointers like char* and use standard types whenever possible.
#include <iostream>
#include <cstring>
using namespace std;
char* StrReverse3(char* str){
int length=0,start=0,end=0;
length=strlen(str);
for(start=0,end=length-1;end > start;end--,start++){
char temp;
temp = str[start];
str[start]=str[end];
str[end]=temp;
}
return str;
}
int main(){
char str[100]="Saw my reflection in snow covered hills";
cout << StrReverse3(str);
cin.get();
return 0;
}