Is there a standard library class for substrings? - c++

I have a big read-only string that I scan for syntax and based on that simple syntax I extract a bunch of smaller strings that I use later for further processing. Based on testing, creating and copying most of the big string into the small strings is kind of a performance bottleneck (there are thousands of them per each big string).
I figured that I don't actually need to allocate memory for the data and copy it, though. What I really need is a sort of string snippet type that only stores a pointer to the start of the relevant data and the length, but at the same time it should be a drop-in replacement for std::string and all the standard library interactions it has.
That would be the easiest to implement anyways, I could roll my own class for that and implement the functions I need but if there is already something like it in the standard library then why bother.
So basically, is there a substring sort of class in STL?

Yes, since C++17 you have std::string_view.
Example:
#include <iostream>
#include <string>
#include <string_view>
int main() {
    std::string foo = "Hello world";
    std::string_view a(foo.c_str(), 5);
    std::string_view b(foo.c_str() + 6, 5);
    std::cout << a << '\n'   // prints Hello
              << b << '\n';  // prints world
}

This is a case where std::string_view pays off over std::string: it avoids copying the original string, and std::string_view::substr returns another view rather than a newly allocated string.
Instead of copying the strings you are operating on, a string view provides a view to the underlying string - pretty much just the pointer to the start of the string and the size of it.
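For the use case in the question (thousands of small pieces cut out of one big read-only string), the scan can hand back views instead of copies. A minimal sketch, assuming C++17 and a single-character delimiter; split_views is a hypothetical helper name, not a standard function:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `text` on `delim` and returns views into `text`; no characters are copied.
// The views are only valid while the buffer behind `text` is alive and unmodified.
std::vector<std::string_view> split_views(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last piece, may be empty
            return parts;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;
    }
}
```

Each returned view's data() points into the original buffer, so the only allocation is the vector itself. Two caveats keep std::string_view from being a full drop-in replacement for std::string: a view is not guaranteed to be null-terminated, and it dangles if the underlying string is destroyed or modified.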

Related

What's the necessity of string in c++ while we already have char[]?

Many topics have discussed the difference between string and char[]. However, they are not clear to me to understand why we need to bring string in c++? Any insight is welcome, thanks!
char[] is C style. It is not object oriented, it forces you as the programmer to deal with implementation details (such as '\0' terminator) and rewrite standard code for handling strings every time over and over.
char[] is just an array of bytes, which can be used to store a string, but it is not a string in any meaningful way.
std::string is a class that properly represents a string and handles all string operations.
It lets you create objects and keep your code fully OOP (if that is what you want).
More importantly, it takes care of memory management for you.
Consider this simple piece of code:
// extract to string
#include <iostream>
#include <string>
int main()
{
    std::string name;
    std::cout << "Please, enter your name: ";
    std::cin >> name;
    std::cout << "Hello, " << name << "!\n";
    return 0;
}
How would you write the same thing using char[]?
Assume you can not know in advance how long the name would be!
Same goes for string concatenation and other operations.
With real string represented as std::string you combine two strings with a simple += operator. One line.
If you are using char[] however, you need to do the following:
Calculate the size of the combined string + terminator character.
Allocate memory for the new combined string.
Use strncpy to copy first string to new array.
Use strncat to append second string to first string in new array.
Plus, you need to remember not to use the unsafe strcpy and strcat and to free the memory once you are done with the new string.
std::string saves you all that hassle and the many bugs you can introduce while writing it.
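The steps listed above can be put side by side with the std::string version. This is a sketch rather than the answer author's code: concat_c_style and concat_string are hypothetical names, memcpy stands in for strncpy/strncat because both lengths are already known, and std::unique_ptr takes over the final "free the memory" step.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// C-style: compute the size, allocate, copy the first string, append the second.
std::unique_ptr<char[]> concat_c_style(const char* a, const char* b) {
    const std::size_t len_a = std::strlen(a);
    const std::size_t len_b = std::strlen(b);
    std::unique_ptr<char[]> out(new char[len_a + len_b + 1]);  // +1 for '\0'
    std::memcpy(out.get(), a, len_a);
    std::memcpy(out.get() + len_a, b, len_b + 1);  // also copies b's terminator
    return out;
}

// std::string: the class does the same bookkeeping internally.
std::string concat_string(const std::string& a, const std::string& b) {
    std::string out = a;
    out += b;
    return out;
}
```

Both functions produce the same characters; the difference is how many details (sizes, terminator, ownership) the caller has to get right by hand.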
As noted by MSalters in a comment, strings can grow. This is, in my opinion, the strongest reason to have them in C++.
For example, the following code has a bug which may cause it to crash, or worse, to appear to work correctly:
char message[] = "Hello";
strcat(message, "World");
The same idea with std::string behaves correctly:
std::string message{"Hello"};
message += "World";
Additional benefits of std::string:
You can send it to functions by value, while char[] can only be sent by reference; this point looks rather insignificant, but it enables powerful code like std::vector<std::string> (a list of strings which you can add to)
std::string stores its length, so any operation which needs the length is more efficient
std::string works similarly to all other C++ containers (vector, etc) so if you are already familiar with containers, std::string is easy to use
std::string has overloaded comparison operators, so it's easy to use with std::map, std::sort, etc.
The string class is essentially an improvement on a char[] variable.
With strings you can achieve the same goals as with a char[] variable, but you won't have to worry about the pitfalls of char[], like pointers and segmentation faults.
It is a more convenient way to build strings, but you don't really see the "underground" of the language, like how concatenation or length functions are implemented.
Here is the documentation of the std::string class in C++ : C++ string documentation

How to use inplace const char* as std::string content

I am working on an embedded SW project. A lot of strings are stored in flash memory. I would like to use these strings (usually const char* or const wchar_t*) as the data of a std::string. That means I want to avoid creating a copy of the original data because of memory restrictions.
An extended use might be to read the flash data via stringstream directly out of the flash memory.
Example which unfortunately is not working in place:
const char* flash_adr = reinterpret_cast<const char*>(0x00300000); // an integer needs an explicit cast to become a pointer
size_t length = 3000;
std::string str(flash_adr, length);
Any ideas will be appreciated!
If you are willing to go with compiler and library specific implementations, here is an example that works in MSVC 2013.
#include <iostream>
#include <string>
int main() {
std::string str("A std::string with a larger length than yours");
char *flash_adr = "Your source from the flash";
char *reset_adr = str._Bx._Ptr; // Keep the old address around
// Change the inner buffer
(char*)str._Bx._Ptr = flash_adr;
std::cout << str << std::endl;
// Reset the pointer or the program will crash
(char*)str._Bx._Ptr = reset_adr;
return 0;
}
It will print Your source from the flash.
The idea is to reserve a std::string capable of fitting the strings in your flash and keep on changing its inner buffer pointer.
You need to customize this for your compiler and as always, you need to be very very careful.
I have now used string_span described in CPP Core Guidelines (https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md). GSL provides a complete implementation (GSL: Guidelines Support Library https://github.com/Microsoft/GSL).
If you know the address of your string inside flash memory you can just use the address directly with the following constructor to create a string_span.
constexpr basic_string_span(pointer ptr, size_type length) noexcept
: span_(ptr, length)
{}
std::string_view might have done the same job, as Captain Obvlious (https://stackoverflow.com/users/845568/captain-obvlious) pointed out in my favourite comment.
I am quite happy with the solution. It performs well and keeps the code readable.
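The std::string_view route mentioned in that comment could look roughly like this. It is a sketch under stated assumptions: C++17 is available on the target, the flash_image array is only a stand-in for real flash contents (on hardware the pointer would come from a linker symbol or a reinterpret_cast of the known address), and view_of and value_of are hypothetical helper names.

```cpp
#include <cstddef>
#include <string_view>

// Stand-in for data living in flash: a constant array with static storage (assumption).
constexpr char flash_image[] = "ID=42;NAME=sensor;";

// Wraps `length` bytes starting at `addr` without copying them.
constexpr std::string_view view_of(const char* addr, std::size_t length) noexcept {
    return std::string_view(addr, length);
}

// Returns the text following `key` up to the next ';', or an empty view if `key` is absent.
std::string_view value_of(std::string_view data, std::string_view key) {
    const std::size_t pos = data.find(key);
    if (pos == std::string_view::npos) return {};
    const std::size_t begin = pos + key.size();
    const std::size_t end = data.find(';', begin);
    return data.substr(begin, end == std::string_view::npos ? end : end - begin);
}
```

Searching and slicing then work directly on the flash bytes; a copy is made only if a std::string is explicitly constructed from a view.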

An effective way to concatenate multiple buffers

I need to design an efficient and readable class with 2 main functions:
add_buffer(char* buffer) - add a buffer.
char* read_all() - get one big buffer that contains all the buffers that the user added until now (by order).
for example:
char first_buffer[] = {1,2,3};
char second_buffer[] = {4,5,6};
MyClass instance;
instance.add_buffer(first_buffer);
instance.add_buffer(second_buffer);
char* big_buffer = instance.read_all(); // big_buffer = [1,2,3,4,5,6]
NOTE: There are a lot of solutions for this problem, but I'm looking for an efficient one because in real life the buffers will be many and big, and I want to avoid a lot of copying and reallocs (like what std::vector does). I also want readable C++ code.
NOTE: The real-life problem is: I'm reading data from an HTTP request that arrives in separate chunks. After all chunks have arrived, I want to return the whole data to the user.
Use a std::vector<char> with enough memory reserved. Since C++11, you can access the internal buffer with std::vector::data() (before C++11, you have to use &*std::vector::begin()).
If you can use Boost, boost::algorithm::join will do:
#include <boost/algorithm/string/join.hpp>
#include <iostream>
#include <string>
#include <vector>
int main(int, char **)
{
    std::vector<std::string> list;
    list.push_back("Hello");
    list.push_back("World!");
    std::string joined = boost::algorithm::join(list, ", ");
    std::cout << joined << std::endl;
}
Output:
Hello, World!
Original answer by Tristram Gräbener
Use a standard growth strategy:
start with some initial capacity, say 256 bytes;
whenever it gets full, reallocate and double the size.
If you don't want to do it yourself, use a standard container like
std::vector<char>
It automatically reallocates memory for you when the buffer is full.
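A minimal sketch of the std::vector<char> approach applied to the class from the question. The class name ChunkBuffer is hypothetical, and add_buffer takes an explicit size here, which the question's signature lacks, because a raw char* to binary data carries no length. Reserving from a known total (for example an HTTP Content-Length header) is an assumption about the calling code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accumulator; class and member names are illustrative.
class ChunkBuffer {
public:
    // Optionally reserve up front if the total size is known in advance (assumption).
    explicit ChunkBuffer(std::size_t expected_size = 0) { data_.reserve(expected_size); }

    // Appends `size` bytes; the vector reallocates on its own when capacity runs out.
    void add_buffer(const char* buffer, std::size_t size) {
        data_.insert(data_.end(), buffer, buffer + size);
    }

    // Pointer to all bytes added so far, in order. Invalidated by the next add_buffer call.
    const char* read_all() const noexcept { return data_.data(); }
    std::size_t size() const noexcept { return data_.size(); }

private:
    std::vector<char> data_;
};
```

Each byte is copied once when it is added, and read_all() returns the contiguous storage without a further copy. If the reserved size was sufficient, no reallocation happens at all.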

quick access of hash function (without using string object)

The following code snippet computes the hash value of a string object. I would like to get the hash value of a binary string (a pointer and a length). I know I can form a string object from the pointer and length, but there is extra overhead in forming a string only for that. I wonder if it's possible to use the std hash function with two parameters: pointer and length.
Thanks.
#include <iostream>
#include <functional>
#include <string>
int main()
{
    std::string str = "Meet the new boss...";
    std::hash<std::string> hash_fn;
    std::size_t str_hash = hash_fn(str);
    std::cout << str_hash << '\n';
}
I found this article in stack overflow which shows that the underlying hash function is actually a function of the bytes in the string's internal buffer:
What is the default hash function used in C++ std::unordered_map?
But rather than risk undefined behaviour by calling into internal functions within the standard library, why not ask the question, "how much performance will I lose by creating a std::string"? Given that you can always create such a string as a static const (zero overhead) I wonder what you're actually going to save?
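If C++17 is available, there is a way to hash a pointer and a length without constructing a std::string and without touching library internals: the standard provides a std::hash specialization for std::string_view. A minimal sketch; hash_bytes is a hypothetical helper name:

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Hashes `length` bytes starting at `data` without building a std::string.
// Embedded '\0' bytes are included, because the view carries an explicit length.
std::size_t hash_bytes(const char* data, std::size_t length) {
    return std::hash<std::string_view>{}(std::string_view(data, length));
}
```

Constructing the view only stores the pointer and the length, so nothing is allocated or copied. Since C++17 the result is also specified to equal std::hash<std::string> applied to a string with the same contents, so the two can be mixed for the same keys.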

Dynamically allocated strings in C

I was doing a relatively simple string problem in UVa's online judge to practice with strings since I've been having a hard time with them in C. The problem basically asks to check if a string B contains another string A if you remove the 'clutter' and concatenate the remaining characters, for example if "ABC" is contained in "AjdhfmajBsjhfhC" which in this case is true.
So, my question is: how can I efficiently allocate memory for a string whose length I don't know? What I did was to make a really big string, char Mstring[100000], read from input, and then use strlen(Mstring) to copy the string to a properly sized char array. Something like:
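char Mstring[100000];
scanf("%99999s", Mstring); /* the width limit keeps scanf inside the buffer */
size_t length = strlen(Mstring);
char input[length + 1]; /* variable-length array: C does not allow an initializer here */
for (size_t i = 0; i < length; i++) {
    input[i] = Mstring[i];
}
input[length] = '\0';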
Is there a better/standard way to do this in C? I know that C does not have great support for strings; if there is no better way to do it in C, maybe in C++?
If you have the option of using C++ (as you mentioned), that is going to make your life a lot easier. You can then use a STL string (std::string) which manages dynamically sized strings for you. You can also drop the old scanf() beast and use std::cin.
Example:
#include <iostream>
#include <string>
int main()
{
    std::string sInput;
    std::getline(std::cin, sInput);
    // alternatively, you could execute this line instead:
    // std::cin >> sInput;
    // but that will tokenize input based on whitespace, so you
    // will only get one word at a time rather than an entire line
}
Describing how to manage strings that can grow dynamically in C takes considerably more explanation and care, and it sounds like you really don't need that. If you do, however, here is a starting point: http://www.strchr.com/dynamic_arrays.
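To give an idea of what the manual approach involves, here is a rough sketch of a growable character buffer that doubles its allocation when full. It is not taken from the linked page. It sticks to the malloc family, as C code would, but is written as C++ like the example above; in plain C the static_cast and the default member initializers would have to go. DynString, dyn_push and dyn_free are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical growable character buffer; names are illustrative.
struct DynString {
    char* data = nullptr;      // null-terminated once non-null
    std::size_t length = 0;    // characters stored, excluding '\0'
    std::size_t capacity = 0;  // bytes allocated
};

// Appends one character, doubling the allocation when it is full.
// Returns false and leaves the buffer unchanged if the allocation fails.
bool dyn_push(DynString& s, char c) {
    if (s.length + 2 > s.capacity) {  // need room for c and the terminator
        const std::size_t new_cap = s.capacity ? s.capacity * 2 : 16;
        char* p = static_cast<char*>(std::realloc(s.data, new_cap));
        if (p == nullptr) return false;
        s.data = p;
        s.capacity = new_cap;
    }
    s.data[s.length++] = c;
    s.data[s.length] = '\0';
    return true;
}

// Releases the storage and resets the buffer to its empty state.
void dyn_free(DynString& s) {
    std::free(s.data);
    s = DynString{};
}
```

Reading input of unknown length then becomes a loop that calls dyn_push for each character until a newline or end of input. This is the bookkeeping that std::string performs internally.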