Efficiently store an array of up to 2048 characters? - C++

I'm getting input from another source, which populates a string of up to 2048 characters.
What is the most efficient way of populating and comparing this string? I also want to be able to easily append to it.
Here are three attempts of mine:
C-style version
#include <cstdio>
#include <cstring>

int main(void) {
    char foo[2048];
    foo[0]='a', foo[1]='b', foo[2]='c', foo[3]='\0'; // E.g.: taken from user-input
    puts(strcmp(foo, "bar") ? "false" : "true");
}
C++-style version 0
#include <iostream>
#include <string>

int main() {
    std::string foo;
    foo.reserve(2048);
    foo += "abc"; // E.g.: taken from user-input
    std::cout << std::boolalpha << (foo == "bar");
}
C++-style version 1
#include <iostream>
#include <string>

int main() {
    std::string foo;
    foo += "abc"; // E.g.: taken from user-input
    std::cout << std::boolalpha << (foo == "bar");
}

What is most efficient depends on what you optimize for.
Some common criteria:

1. Program speed
2. Program size
3. Working set size
4. Code size
5. Programmer time
6. Safety

The undoubted king for 1 and 2, and in your example probably also 3, is the C style.
For 4 and 5, C++ style 1 wins.
Point 6 probably also goes to the C++ styles.
Still, the proper mix of emphasizing these goals is called for, which imho favors C++ option 0.
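A minimal sketch of option 0 in practice (assuming, for illustration only, that the input arrives line by line on stdin):
#include <iostream>
#include <string>

int main() {
    std::string foo;
    foo.reserve(2048); // one up-front allocation; the appends below stay within it
    std::string chunk;
    while (std::getline(std::cin, chunk) && foo.size() + chunk.size() <= 2048)
        foo += chunk; // append without reallocating
    std::cout << std::boolalpha << (foo == "bar") << '\n';
}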

Join a container of `std::string_view`

How can you concisely combine a container of std::string_views?
For instance, boost::algorithm::join is great, but it only works for std::string.
An ideal implementation would be
// Desired interface (pseudocode; this does not compile, since join only works with std::string):
static std::string_view unwords(const std::vector<std::string_view>& svVec) {
    std::string_view joined = boost::algorithm::join(svVec, " ");
    return joined;
}
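A compiling fallback (a hypothetical sketch) copies each view into an owning std::string first, which is exactly the overhead in question:
#include <boost/algorithm/string/join.hpp>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: copies every view into a std::string so that
// boost::algorithm::join can consume the container.
static std::string unwords_copying(const std::vector<std::string_view>& svVec) {
    std::vector<std::string> copies(svVec.begin(), svVec.end());
    return boost::algorithm::join(copies, " ");
}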
A short C++20 version (as a snippet; it needs the <ranges>, <string_view> and <iostream> headers and a surrounding main):
using namespace std::literals;
const auto bits = { "https:"sv, "//"sv, "cppreference"sv, "."sv, "com"sv };
for (char const c : bits | std::views::join) std::cout << c;
std::cout << '\n';
Since C++23, if you want to add a separator string or character between the parts, you can simply use std::views::join_with; the code below is from the official cppreference example:
#include <iostream>
#include <ranges>
#include <vector>
#include <string_view>

int main() {
    using namespace std::literals;
    std::vector v{"This"sv, "is"sv, "a"sv, "test."sv};
    auto joined = v | std::views::join_with(' ');
    for (auto c : joined) std::cout << c;
    std::cout << '\n';
}
Note 1: if you would rather not rely on an unstable release of the language, you can simply use the range-v3 library for the join_with view.
Note 2: as Nicol Bolas points out, you cannot join to exactly one string_view without any copy (you can copy into a std::string, though). If you want more detail on why, see the Why can't I construct a string_view from range iterators? question and answer on SO.
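Since a single owning string is usually what you want in the end anyway, here is a minimal C++23 sketch that materializes the joined view into a std::string (this is where the unavoidable copy happens):
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>
#include <vector>

int main() {
    using namespace std::literals;
    std::vector v{"This"sv, "is"sv, "a"sv, "test."sv};
    // std::ranges::to copies the characters of the lazy view into an owning string
    auto joined = std::ranges::to<std::string>(v | std::views::join_with(' '));
    std::cout << joined << '\n'; // "This is a test."
}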

Node size for unordered_map buckets

I have a program where I want to store kmers (substrings of size k) and the number of times they appear. For this particular application, I'm reading in a file with these values, and if a count is greater than 255 it is OK to round it down to 255. I thought that storing the key-value pairs as (string, unsigned char) might save space compared to storing them as (string, int), but this did not seem to be the case when I checked the max resident size by running /usr/bin/time.
To confirm, I also tried running the following test program where I alternated the type of the value in the unordered_map:
#include <iostream>
#include <unordered_map>
#include <utility>
#include <string>
#include <fstream>

int main() {
    std::unordered_map<std::string, unsigned char> kmap;
    std::ifstream infile("kmers_from_reads");
    std::string kmer;
    int abun;
    while (infile >> kmer >> abun) {
        unsigned char abundance = (abun > 255) ? 255 : abun;
        kmap[kmer] = abundance;
    }
    std::cout << sizeof(*kmap.begin(0)) << std::endl;
}
This did not seem to impact the size of the nodes in the bucket (on my machine it returned 40 for both unsigned char and int values).
I was wondering how the size of the nodes in each bucket is determined.
My understanding of unordered maps is that the C++ standard more or less requires separate chaining, and each node in a bucket must hold at least one pointer so that the elements are iterable and can be erased (http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html). However, I don't understand how the amount of space for a value is determined; it seems like it must also be flexible to accommodate larger values. I also tried looking at the gcc libstdc++ unordered_map header (https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h) but had a hard time understanding what was going on.
Compile and execute this code:
#include <iostream>
#include <string>

class foo
{
    std::string kmer;
    unsigned char abun;
};

class bar
{
    std::string kmer;
    int abun;
};

int main() {
    std::cout << sizeof(foo) << " " << sizeof(bar) << std::endl;
}
I get, and you probably will too, 40 40. This is because of alignment requirements. If, for example, std::string contains at least one pointer (which it almost certainly does), the whole object has to be aligned on at least a pointer-sized boundary (8 bytes on a typical 64-bit system).
Imagine if sizeof(foo) were 39 and you had code that did foo foos[2]. If the pointer in foos[0].kmer were properly aligned, the pointer in foos[1].kmer wouldn't be. That would be a disaster.
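To see why shrinking the mapped type doesn't shrink the map nodes either, you can compare the value_type sizes directly. A rough sketch, assuming a typical 64-bit libstdc++ layout:
#include <iostream>
#include <string>
#include <utility>

int main() {
    // Each node stores a std::pair<const Key, T>; padding after the 1-byte
    // unsigned char brings both variants up to the same aligned size.
    std::cout << sizeof(std::pair<const std::string, unsigned char>) << '\n'; // typically 40
    std::cout << sizeof(std::pair<const std::string, int>) << '\n';           // typically 40
    // On top of that, each node carries at least a 'next' pointer, and often a
    // cached hash value, before counting the string's own heap allocation.
}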

(String)Iterator based conversion to int

There are the atox, strtox and stox families that I know of, but I can't seem to find any iterator-based string-to-int conversions in the Standard Library or Boost.
The reason I need one is that I have a parser whose match result is a range referencing the input string. I might very well have an input string like
...8973893488349798923475...
     ^begin   ^end
so I need 738934883 as an integer.
Of course, I could first take begin and end to construct a std::string to use with any of the above families, but I would very much like to avoid that overhead.
So my question: is there anything in the Standard Library or Boost that accepts iterators as input, or do I have to write my own?
Boost does actually support this, using the Lexical Cast library. The following code uses a substring range to parse the number without performing any dynamic allocation:
#include <boost/lexical_cast.hpp>
#include <string>
#include <iostream>

int convert_strings_part(const std::string& s, std::size_t pos, std::size_t n)
{
    return boost::lexical_cast<int>(s.data() + pos, n);
}

int main(int argc, char* argv[])
{
    std::string s = "8973893488349798923475";
    // Expect: 738934883
    std::cout << convert_strings_part(s, 2, 9) << std::endl;
    return 0;
}
The output (tested on OS X with Boost 1.60):
738934883
The Lexical Cast library has some great features for conversion to and from strings, though for some reason it isn't as well known as some of the other Boost libraries.
Until gavinb's answer, I was not aware of any such library function. My own approach would have been the following, using any of the atox or strtox functions (this avoids the dependency on Boost, if you want that):
::std::string::iterator b; // begin of section
::std::string::iterator e; // end of section, pointing at the first char NOT to be evaluated

char tmp = *e;    // save the character after the section
*e = 0;           // temporarily NUL-terminate the section in place
int n = atoi(&*b);
*e = tmp;         // restore the original character
Note that this requires e to point at a real character, so it must not be the string's end() iterator. If you only have const_iterators available, you would have to apply a const_cast to *e before modifying it.
Be aware that this solution modifies the buffer, so it is not thread safe.
You could do it with strstream, but that is deprecated. Below are two examples, one with strstream and one with Boost's array device:
http://coliru.stacked-crooked.com/a/04d4bde6973a1972
#include <iostream>
#include <string>
#include <strstream>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>

int main()
{
    std::string in = "8973893488349798923475";
    //                  ^^^^
    auto beg = in.begin() + 2;
    auto end = in.begin() + 6;

    // strstream example - DEPRECATED
    std::istrstream is(&*beg, end - beg);
    int n;
    is >> n;
    std::cout << n << "\n";

    // Boost example
    namespace io = boost::iostreams;
    int n2;
    io::array_source src(&*beg, end - beg);
    io::stream<io::array_source> is2(src);
    is2 >> n2;
    std::cout << n2 << "\n";

    return 0;
}
With modern STL implementations std::string(begin, end) is not that bad: the small-string optimization eliminates any allocation for short strings (up to 15 chars in libstdc++, 22 in libc++, on 64-bit).
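A minimal sketch of that approach; the nine-character substring below fits in the SSO buffer, so no allocation occurs:
#include <iostream>
#include <string>

int main() {
    std::string in = "8973893488349798923475";
    auto beg = in.begin() + 2;
    auto end = in.begin() + 11;
    int n = std::stoi(std::string(beg, end)); // short temporary lives in the SSO buffer
    std::cout << n << '\n';                   // 738934883
}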

Issue with vector<bool> and printf

#include <vector>
#include <iostream>
#include <stdio.h>

using namespace std;

int main(int argc, const char *argv[])
{
    vector<bool> a;
    a.push_back(false);
    int t = a[0];
    printf("%d %d\n", a[0], t);
    return 0;
}
This code gives the output "5511088 1". I thought it would be "0 0".
Does anyone know why?
The %d format specifier is for arguments the size of an int, so printf expects two arguments that are both the size of an int. However, you're providing it with one argument that isn't an int, but rather a special proxy object returned by vector<bool>::operator[] that is merely convertible to bool.
This causes printf to treat random bytes from the stack as part of the values, while in fact they aren't.
The solution is to cast the first argument to an int:
printf("%d %d\n", static_cast<int>(a[0]), t);
An even better solution would be to prefer streams over printf if at all possible, because unlike printf they are type-safe, which makes this kind of situation impossible:
cout << a[0] << " " << t << endl;
And if you're looking for a type-safe alternative for printf-like formatting, consider using the Boost Format library.
The %d format specifier is for the int type. So try:
cout << a[0] << "\t" << t << endl;
The key to the answer is that vector<bool> isn't really a vector of bools. It's really a vector of proxy objects, which are convertible to bools and ints. This allows each bool to be stored as a single bit, for greater space efficiency (at the cost of speed), but it causes a number of problems like the one seen here. This requirement was voted into the C++ Standard in a rash moment, and I believe most committee members now consider it a mistake, but it's in the Standard and we're kind of stuck with it.
The problem is triggered by the std::vector specialization for bool.
The Standard Library defines a specialization of the vector template for bool. Its description indicates that the implementation should pack the elements so that every bool uses only one bit of memory. This is widely considered a mistake.
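A small demonstration of the proxy behaviour described above:
#include <iostream>
#include <type_traits>
#include <vector>

int main() {
    std::vector<bool> a{false};
    // a[0] is not a bool; it is a proxy of type std::vector<bool>::reference
    std::cout << std::boolalpha
              << std::is_same_v<decltype(a[0]), bool> << '\n'; // false
    // converting the proxy explicitly makes it safe to pass through printf as well
    std::cout << static_cast<int>(a[0]) << '\n';               // 0
}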
Basically vector<bool> stores each element in one bit instead of one byte, so passing an element straight to printf is undefined behavior.
If you really want to use printf, you can work around this by redefining bool as char (a crude hack that sidesteps the specialization); the element is then implicitly promoted to int when printed with %d (1 for true, 0 for false).
#include <vector>
#include <iostream>
#include <stdio.h>

#define bool char // crude hack: vector<bool> now instantiates the unspecialized vector<char>

using namespace std;

int main(int argc, const char *argv[])
{
    vector<bool> a;
    a.push_back(false);
    int t = a[0];
    printf("%d %d\n", a[0], t);
    return 0;
}
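A variant of the same idea that avoids redefining a keyword: use std::vector<char> directly.
#include <cstdio>
#include <vector>

int main() {
    std::vector<char> a;              // one byte per element, no proxy objects
    a.push_back(false);
    int t = a[0];
    std::printf("%d %d\n", a[0], t);  // char promotes to int through the varargs call
    return 0;
}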

What is wrong with this program?

#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main() {
    string x;
    getline(cin, x);
    ofstream o("f:/demo.txt");
    o.write((char*)&x, sizeof(x));
}
I get unexpected output: I don't get what I wrote into the string.
Why is this? Please explain.
For example, when I write steve pro I get the output 8/ steve pro ÌÌÌÌÌÌ ÌÌÌÌ in the file.
I expected the output to be steve pro.
You are treating an std::string like something that it is not. It's a complex object that, somewhere in its internals, stores characters for you.
There is no reason to assume that a character array sits at the start of the object (&x), and the sizeof of the object has no relation to how many characters it may indirectly hold or represent.
You're probably looking for:
o.write(x.c_str(), x.length());
Or just use the built-in formatted I/O mechanism:
o << x;
You seem to have an incorrect model of sizeof, so let me try to get it right.
For any given object x of type T, the expression sizeof(x) is a compile-time constant. C++ will never actually inspect the object x at runtime. The compiler knows that x is of type T, so you can imagine it silently transforming sizeof(x) to sizeof(T), if you will.
#include <iostream>
#include <string>

int main()
{
    std::string a = "hello";
    std::string b = "Stack Overflow is for professional and enthusiast programmers, people who write code because they love it.";
    std::cout << sizeof(a) << std::endl; // this prints 4 on my system
    std::cout << sizeof(b) << std::endl; // this also prints 4 on my system
}
All C++ objects of the same type take up the exact same amount of memory. Of course, since strings have vastly different lengths, they internally store a pointer to a heap-allocated block of memory. But this does not concern sizeof. It couldn't, because, as I said, sizeof operates at compile time.
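A minimal sketch contrasting sizeof with the character count the questioner actually wants:
#include <iostream>
#include <string>

int main() {
    std::string x = "steve pro";
    std::cout << sizeof(x) << '\n'; // size of the string object itself (implementation-defined)
    std::cout << x.size() << '\n';  // 9: the number of characters it holds
}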
You get exactly what you write: the raw bytes of the string object itself, its internal pointer(s) and bookkeeping fields, rather than the characters it refers to...
#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main()
{
    string x;
    getline(cin, x);
    ofstream o("tester.txt");
    o << x;
    o.close();
}
If you insist on writing a buffer directly, you can use
o.write(x.c_str(), x.size());
PS A little attention to code formatting unclouds the mind
You're passing the object's address to write into the file, whereas the original content lies somewhere else, pointed to by one of its internal pointers.
Try this:
string x;
getline(cin, x);

ofstream o("D:/tester.txt");
o << x;
// or
// o.write(x.c_str(), x.length());