Do C/C++ have a standard 'slice' container? - c++

I often use a slice structure in my projects:
struct SSlice {
void *pData;
size_t length;
};
I also see other projects use similar containers to work with data without copying it (like RocksDB, MDB, etc.). Does anybody know whether there is a standard (or OS-related) header with such a container?
It could be in the STL, in Linux headers, it does not matter.
UPD: The main purpose of such a container, in my projects and in the ones mentioned, is to work with data without copying it. E.g., I use it to parse a URI path or an LDAP DN and represent it as a vector of slices.

It is more typical in the C++ standard library to use a range of iterators (begin, end) rather than an iterator and a length (begin, length). Pointers are one kind of iterator, which is the more general concept.
There is no standard structure for ranges, nor for the slices† that you describe. The standard interfaces treat the begin and end as separate objects.
However, the addition of ranges has been proposed, and there exists a technical specification that may already be supported by some standard library implementations.
† At least, not in general, but there is the special std::valarray container which provides a slice interface to its contents.
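As a rough illustration of the std::valarray slice interface mentioned in the footnote, here is a minimal sketch. The function names (every_nth, zero_even_positions, demo_valarray_slice) are made up for this example; only std::valarray and std::slice come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Copies out `count` elements of `data`, starting at `start` and stepping by `stride`.
std::valarray<int> every_nth(const std::valarray<int>& data,
                             std::size_t start, std::size_t count, std::size_t stride)
{
    return data[std::slice(start, count, stride)];
}

// Writes through a slice: on a non-const valarray, operator[] with a std::slice
// yields a proxy that refers to the selected elements, so nothing is copied.
void zero_even_positions(std::valarray<int>& v)
{
    const std::size_t count = (v.size() + 1) / 2;  // indices 0, 2, 4, ...
    v[std::slice(0, count, 2)] = 0;
}

void demo_valarray_slice()
{
    std::valarray<int> v = {1, 2, 3, 4, 5, 6};
    std::valarray<int> odd_positions = every_nth(v, 1, 3, 2);  // elements 2, 4, 6
    assert(odd_positions.size() == 3 && odd_positions[2] == 6);
    zero_even_positions(v);                                    // v is now 0 2 0 4 0 6
    assert(v[0] == 0 && v[1] == 2 && v[4] == 0);
}
```

Note that this only works on data owned by a valarray, so it is not a general replacement for the (pointer, length) struct in the question.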

std::string_view is a C++17 non-owning view of a range of characters with std::string-like functionality. It is intended to speed up parsing, among other things.
span and array_view are names for various standardization and proto-standardization efforts that also match your concept, but are not string-presuming.
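For the URI-path use case from the question, a sketch with std::string_view might look like the following. The function split_path and its behaviour (skipping empty segments) are assumptions made for this example, not something taken from the question's code base.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `path` on `sep` and returns views into the original buffer.
// No characters are copied; the views dangle once that buffer is destroyed.
std::vector<std::string_view> split_path(std::string_view path, char sep = '/')
{
    std::vector<std::string_view> parts;
    std::size_t pos = 0;
    while (pos <= path.size()) {
        std::size_t next = path.find(sep, pos);
        if (next == std::string_view::npos)
            next = path.size();
        if (next > pos)                       // skip empty segments such as "//"
            parts.push_back(path.substr(pos, next - pos));
        pos = next + 1;
    }
    return parts;
}

void demo_split_path()
{
    const char* raw = "/usr/local/lib";
    std::vector<std::string_view> parts = split_path(raw);
    assert(parts.size() == 3);
    assert(parts[1] == "local");
    assert(parts[0].data() == raw + 1);       // points into the original buffer
}
```

For data that is not character-like, std::span (standardized in C++20) plays a similar non-owning role.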

Related

Unit testing custom STL-compatible containers and iterators

I've just implemented a custom iterator type for a custom container. The container models the C++17 ReversibleContainer, and its iterators (both const and non-const) model LegacyRandomAccessIterator.
My question is: is there some sort of built-in thing in std that I can use to test whether both the container and its iterators adhere to the specified named requirements, or do I have to write all the tests myself (which is mostly doable for me, but I'd rather not reinvent the wheel, and I'm also not sure I'm enough of a template wizard to really thoroughly prove that e.g. types and such are correct)?
Things like (I know this is one of many), e.g. this from the operational semantics of operator <:
custom_container::iterator a = ...;
custom_container::iterator b = ...;
assert((a < b) == (b - a > 0));
And that return types are correct, etc., and such.
I've managed to find some capabilities already, for example <type_traits> has some useful utilities like:
if (!std::is_copy_constructible<custom_container::iterator>::value)
/* fail test */ ;
Which is good for some of the fundamental named requirements at least.
No, there is not.
In fact proving a type satisfies all requirements of being an iterator cannot be done statically in C++.
You can test statically if the types are correct, and that operators and the like exist. But the semantics cannot be proven in the general case (I think both practically and theoretically, due to Rice's theorem).
I find most of the requirements are easy to check in practice (if not as easy to automate). The most common gotcha I find is that "legacy" iterators stronger than input iterators must have actual backing persistent data they return references and pointers to; that data cannot live within the iterator, or be otherwise temporary/generated.
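The split between what can be checked statically and what can only be spot-checked at run time could be sketched like this. The helper names are invented for the example, and std::vector<int>::iterator stands in for the custom iterator; the checks are far from a complete test of LegacyRandomAccessIterator.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Compile-time part: properties that type traits can express.
// Instantiating this with a type that lacks operator- is a compile error.
template <typename It>
constexpr bool looks_like_random_access()
{
    using traits = std::iterator_traits<It>;
    return std::is_copy_constructible<It>::value
        && std::is_copy_assignable<It>::value
        && std::is_destructible<It>::value
        && std::is_base_of<std::random_access_iterator_tag,
                           typename traits::iterator_category>::value
        && std::is_same<decltype(std::declval<It>() - std::declval<It>()),
                        typename traits::difference_type>::value;
}

// Run-time part: spot-checks (a < b) == (b - a > 0) on one concrete valid range.
// Passing proves nothing in general; failing proves a bug.
template <typename It>
bool ordering_is_consistent(It first, It last)
{
    for (It a = first; a != last; ++a)
        for (It b = first; b != last; ++b)
            if ((a < b) != (b - a > 0))
                return false;
    return true;
}

static_assert(looks_like_random_access<std::vector<int>::iterator>(),
              "iterator fails the static checks");

void demo_iterator_checks()
{
    std::vector<int> v = {10, 20, 30, 40};
    assert(ordering_is_consistent(v.begin(), v.end()));
}
```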

`std::back()` like function in C++

I can create begin and end iterators using std::begin() and std::end().
e.g:
int arr[4][4] = <something here>;
auto begin_it = std::begin(arr);
auto end_it = std::end(arr);
However, why do we not have std::front() and std::back()? Is there any specific reason for them to be omitted?
Are there any similar functions I can use (apart from begin and end of course)?
Not all containers have constant-time access to the last element of the list.
std::forward_list for example.
Just like Marshall Clow said, inconsistent access time across containers should be considered.
My opinion is that front would frequently cause UB on an empty container, whereas begin does not, as long as its result is not dereferenced.
std::vector<int> v;
auto item = std::front(v); // This line immediately causes undefined behavior.
If std::front existed, it might cause more problems than the convenience it brings.
Every call to front would have to come after a call to empty, so the convenience costs two function calls.
std::vector<int> v;
// ...
if( !v.empty() ){
auto item = std::front(v);
// ...tasks
}
Hmm... is that convenient?
Meanwhile, a programmer can already get the front element deliberately with *begin(container), after checking that the container is not empty. All we would save is an asterisk.
std::vector<int> v;
// ...
if( !v.empty() ){
auto item = *std::begin(v);
// ...tasks
}
Is there any specific reason for them to be omitted?
Is there any specific reason for them to be included?
Things are not thrown into the standard library just because they can be. The standard library is intended to consist of useful functions, templates, and the like. The "burden of proof" lies with the person who wants something included. It's only after a use-case has been established, presented to the standards committee, and dismissed by said committee that it would be accurate to call the something "omitted".
That being said, there are several potential reasons for something not to be included in the standard library, and "not useful" is only one of these. Other potential reasons include "implementation not good enough across all platforms" and "oh, we didn't think of that". While I do not know which reason applies here, I can provide some food for thought by comparing std::end() to a hypothetical std::back().
One can use std::end() with a C-style array (of known size) or anything meeting the requirements of "Container", such as all containers in the standard library. It has great utility in looping over containers, which is a reasonably common operation. A result of adding std::end() to the standard library is that many of the algorithms of the standard library no longer need template specialization to handle C-style arrays. (Admittedly, range-based for loops had a similar effect.)
One would be able to use std::back() with a C-style array (of known size), a std::array, or anything meeting the requirements of "SequenceContainer", such as strings, vectors, deques, lists, and forward lists (and nothing else from the standard library). It has utility in... um... I'm drawing a blank, but I'll grant the possibility of a general use for back(), although probably nowhere close to being as common as looping.
Since std::back() would be applicable in fewer situations than std::end(), it might have been overlooked. At the same time, since std::back() would be applicable in fewer situations than std::end(), it might not meet the "useful" criteria for being included in the standard library.
Are there any similar functions I can use (apart from begin and end of course)?
You could switch from C-style arrays to std::array, then use std::array::back(). After all, std::array is just a C-style array wrapped up to look like a contiguous sequence container. The type declaration is longer, but the functions you are looking for become readily available.
The standard library has a goal of supporting C-style arrays because it is a library, destined to be used by code the library's authors never dreamt of. If you're not writing a library, you probably don't really need to support C-style arrays.
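If you do want one spelling that covers both C-style arrays and containers, a hypothetical free function is short to write yourself. This is only a sketch: the namespace util and the function back below are invented here, and, like the member back(), they have undefined behaviour on an empty container.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

namespace util {  // hypothetical namespace, not part of any library

// C-style array of known size: the last element is arr[N - 1].
template <typename T, std::size_t N>
T& back(T (&arr)[N])
{
    return arr[N - 1];
}

// Anything with a back() member (vector, deque, list, std::array, string, ...).
// The trailing return type removes this overload for types without back().
template <typename Container>
auto back(Container& c) -> decltype(c.back())
{
    return c.back();
}

}  // namespace util

void demo_back()
{
    int arr[4][4] = {};
    arr[3][0] = 7;
    assert(util::back(arr)[0] == 7);   // last row of the 2-D array

    std::vector<int> v = {1, 2, 3};
    util::back(v) = 9;                 // returns a reference, so it is assignable
    assert(v[2] == 9);
}
```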

Safely use containers in C++ library interface

When designing a C++ library, I read it is bad practice to include standard library containers like std::vector in the public interface (see e.g. Implications of using std::vector in a dll exported function).
What if I want to expose a function that takes or returns a list of objects? I could use a simple array, but then I would have to add a count parameter, which makes the interface more cumbersome and less safe. Also it wouldn't help much if I wanted to use a map, for example. I guess libraries like Qt define their own containers which are safe to export, but I'd rather not add Qt as a dependency, and I don't want to roll my own containers.
What's the best practice to deal with containers in the library interface? Is there maybe a tiny container implementation (preferably just one or two files I can drop in, with a permissive license) that I can use as "glue"? Or is there even a way to make std::vector etc. safe across .DLL/.so boundaries and with different compilers?
You can implement a template function. This has two advantages:
It lets your users decide what sorts of containers they want to use with your interface.
It frees you from having to worry about ABI compatibility, because there is no compiled container code in your library; the template is instantiated when the user invokes the function.
For example, put this in your header file:
template <typename Iterator>
void foo(Iterator begin, Iterator end)
{
for (Iterator it = begin; it != end; ++it)
bar(*it); // a function in your library, whose ABI doesn't depend on any container
}
Then your users can invoke foo with any container type, even ones they invented that you don't know about.
One downside is that you'll need to expose the implementation code, at least for foo.
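To make the "any container type" point concrete, here is a small self-contained sketch. The definition of bar and the counter barTotal are stand-ins invented for this example; in a real library bar would be declared in the header and defined in a compiled source file.

```cpp
#include <cassert>
#include <list>
#include <vector>

static int barTotal = 0;                     // stand-in for the library's internal state
void bar(int value) { barTotal += value; }   // stand-in for a compiled library function

template <typename Iterator>
void foo(Iterator begin, Iterator end)
{
    for (Iterator it = begin; it != end; ++it)
        bar(*it);
}

void demo_foo()
{
    barTotal = 0;
    std::vector<int> v = {1, 2, 3};
    std::list<int> l = {10, 20};
    int raw[2] = {100, 200};

    foo(v.begin(), v.end());   // vector iterators
    foo(l.begin(), l.end());   // list iterators
    foo(raw, raw + 2);         // plain pointers work as iterators too
    assert(barTotal == 336);
}
```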
Edit: you also said you might want to return a container. Consider alternatives like a callback function, as in the good old days in C:
typedef bool(*Callback)(int value, void* userData);
void getElements(Callback cb, void* userData) // implementation in .cpp file, not header
{
for (int value : internalContainer)
if (!cb(value, userData))
break;
}
That's a pretty old school "C" way, but it gives you a stable interface and is pretty usable by basically any caller (even actual C code with minor changes). The two quirks are the void* userData to let the user jam some context in there (say if they want to invoke a member function) and the bool return type to let the callback tell you to stop. You can make the callback a lot fancier with std::function or whatever, but that might defeat some of your other goals.
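On the caller's side, the userData pointer is how results get back into whatever container the caller prefers. A minimal sketch, assuming internalContainer is a std::vector<int> inside the library (the answer does not say what it is) and using an invented callback name:

```cpp
#include <cassert>
#include <vector>

typedef bool (*Callback)(int value, void* userData);

// Stand-in for the library's private data; the real type is not specified above.
static const std::vector<int> internalContainer = {1, 2, 3, 4, 5};

void getElements(Callback cb, void* userData)
{
    for (int value : internalContainer)
        if (!cb(value, userData))
            break;
}

// Caller side: userData carries a pointer to the caller's own container.
bool collectFirstThree(int value, void* userData)
{
    std::vector<int>* out = static_cast<std::vector<int>*>(userData);
    out->push_back(value);
    return out->size() < 3;   // returning false asks the library to stop
}

void demo_callback()
{
    std::vector<int> got;
    getElements(&collectFirstThree, &got);
    assert(got.size() == 3 && got[2] == 3);
}
```

The library never sees std::vector in its interface; only the caller's translation unit knows what userData points to.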
Actually this is not only true for STL containers but applies to pretty much any C++ type (in particular also all other standard library types).
Since the ABI is not standardized you can run into all kinds of trouble. Usually you have to provide separate binaries for each supported compiler version to make it work. The only way to get a truly portable DLL is to stick with a plain C interface. This usually leads to something like COM, since you have to ensure that all allocations and matching deallocations happen in the same module and that no details of the actual object layout are exposed to the user.
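A plain C interface of that kind usually hides the container behind an opaque handle. The following is a sketch under assumed names (IntList and the intlist_* functions are invented here); a production version would also add `#ifdef __cplusplus` guards to the header and catch exceptions before they cross the C boundary.

```cpp
#include <cassert>
#include <stddef.h>
#include <vector>

// Public header part: plain C declarations, no C++ types in any signature.
extern "C" {
typedef struct IntList IntList;  // opaque: callers never see the layout
IntList* intlist_create(void);
void intlist_destroy(IntList* list);
void intlist_push(IntList* list, int value);
size_t intlist_size(const IntList* list);
int intlist_get(const IntList* list, size_t index);  // precondition: index < size
}

// Implementation part, compiled into the library only.
struct IntList {
    std::vector<int> items;
};

// Allocation and deallocation both happen inside the library module.
extern "C" IntList* intlist_create(void) { return new IntList(); }
extern "C" void intlist_destroy(IntList* list) { delete list; }
extern "C" void intlist_push(IntList* list, int value) { list->items.push_back(value); }
extern "C" size_t intlist_size(const IntList* list) { return list->items.size(); }
extern "C" int intlist_get(const IntList* list, size_t index) { return list->items[index]; }

void demo_c_interface()
{
    IntList* list = intlist_create();
    intlist_push(list, 3);
    intlist_push(list, 4);
    assert(intlist_size(list) == 2 && intlist_get(list, 1) == 4);
    intlist_destroy(list);
}
```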
TL;DR There is no issue if you distribute either the source code or compiled binaries for the various supported sets of (ABI + Standard Library implementation).
In general, the latter is seen as cumbersome (with reasons), thus the guideline.
I trust hand-waving guidelines about as far as I can throw them... and I encourage you to do the same.
This guideline originates from an issue with ABI compatibility: the ABI is a complex set of specifications that defines the exact interface of a compiled library. It notably includes:
the memory layout of structures
the name mangling of functions
the calling conventions of functions
the handling of exceptions, runtime type information, ...
...
For more details, check for example the Itanium ABI. Contrary to C which has a very simple ABI, C++ has a much more complicated surface area... and therefore many different ABIs were created for it.
On top of ABI compatibility, there is also an issue with the Standard Library implementation. Most compilers come with their own implementation of the Standard Library, and these implementations are incompatible with each other (they do not, for example, represent a std::vector the same way, even though all implement the same interface and guarantees).
As a result, a compiled binary (executable or library) may only be mixed and matched with another compiled binary if both were compiled against the same ABI and with compatible versions of a Standard Library implementation.
Cheers: no issue if you distribute source code and let the client compile.
If you are using C++11, you can use cppcomponents. https://github.com/jbandela/cppcomponents
This will allow you to use, among other things, std::vector as a parameter or return value across DLL or .so files created using different compilers or standard libraries. For an example, take a look at my answer to a similar question: Passing reference to STL vector over dll boundary
Note for the example, you need to add a CPPCOMPONENTS_REGISTER(ImplementFiles) after the CPPCOMPONENTS_DEFINE_FACTORY() statement

STL container requirements

Does the standard require that some_container<T>::value_type be T?
I am asking because I am considering different approaches to implementing an STL-compliant 2d dynamic array. One of them is to have 2Darray<T>::value_type be 2Darray_row<T> or something like that, where the array would be iterated as a collection of rows (a little simplified. My actual implementation allows iteration in 3 directions)
The container requirements are a bit funky in the sense that they are actually not used by any generic algorithm. In that sense, it doesn't really matter much.
That said, the requirements are on the interface for containers not on how the container is actually instantiated. Even non-template classes can conform to the various requirements and, in fact, do. The requirement is that value_type is present; what it is defined to depends entirely on the container implementation.
Table 96 in §23.2.1 in the standard (c++11) requires a container class X containing objects of type T to return T for X::value_type.
So, if your some_container stores objects of type T, then value_type has to be T.
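What value_type is for the standard containers can be checked directly with static_assert. The helper stores below is invented for this sketch; the three assertions reflect how the standard library defines these containers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// True when Container::value_type is exactly Expected.
template <typename Container, typename Expected>
constexpr bool stores()
{
    return std::is_same<typename Container::value_type, Expected>::value;
}

// A vector of T holds T objects, so value_type is T.
static_assert(stores<std::vector<int>, int>(), "vector<int> stores int");

// A map holds key/value pairs, so value_type is the pair, not the mapped type.
static_assert(stores<std::map<std::string, int>,
                     std::pair<const std::string, int>>(),
              "map stores pair<const Key, T>");

// std::string is used without template arguments and still provides value_type.
static_assert(stores<std::string, char>(), "string stores char");

void demo_value_type()
{
    // The parentheses keep the comma away from the assert macro.
    assert((stores<std::vector<double>, double>()));
    assert((!stores<std::map<int, int>, int>()));
}
```

By the same logic, a 2-D array iterated as a collection of rows would name its row type as value_type, provided rows are what the container actually presents as its elements.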
Either have a nested container (so colArray<rowArray<T> >) or have a single wrapping (2dArray<T>), but don't try to mix them. The nested approach allows you to use STL all the way down (vector<vector<T> >), but can be confusing and doesn't allow you column iterators etc, which you seem to want.
This SO answer addresses using ublas, and another suggests using Boost multi-arrays.
Generally, go for the STL or Boost option if you can. You are unlikely to write something as good by yourself.

Why aren't there convenience functions for set_union, etc, which take container types instead of iterators?

std::set_union and its kin take two pairs of iterators for the sets to be operated on. That's great in that it's the most flexible thing to do. However, they could very easily have added convenience functions that would be more elegant for 80% of typical uses.
For instance:
template<typename ContainerType, typename OutputIterator>
OutputIterator set_union( const ContainerType & container1,
const ContainerType & container2,
OutputIterator result ) // by value, so a temporary such as students.begin() can be passed
{
return std::set_union( container1.begin(), container1.end(),
container2.begin(), container2.end(),
result );
}
would turn:
std::set_union( mathStudents.begin(), mathStudents.end(),
physicsStudents.begin(), physicsStudents.end(),
students.begin() );
into:
std::set_union( mathStudents, physicsStudents, students.begin() );
So:
Are there convenience functions like this hiding somewhere that I just haven't found?
If not, can anyone think of a reason why it would be left out of STL?
Is there perhaps a more full featured set library in boost? (I can't find one)
I can of course always put my implementations in a utility library somewhere, but it's hard to keep such things organized so that they're used across all projects, but not conglomerated improperly.
Are there convenience functions like this hiding somewhere that I just haven't found?
Not in the standard library.
If not, can anyone think of a reason why it would be left out of STL?
The general idea with algorithms is that they work with iterators, not containers. Containers can be modified, altered, and poked at; iterators cannot. Therefore, you know that, after executing an algorithm, it has not altered the container itself, only potentially the container's contents.
Is there perhaps a more full featured set library in boost?
Boost.Range does this. Granted, Boost.Range does more than this. Its algorithms don't take "containers"; they take iterator ranges, which STL containers happen to satisfy the conditions for. They also have lazy evaluation, which can be nice for performance.
One reason for working with iterators is of course that it is more general and works on ranges that are not containers, or just a part of a container.
Another reason is that the signatures would be mixed up. Many algorithms, like std::sort have more than one signature already:
sort(Begin, End);
sort(Begin, End, Compare);
Where the second one is for using a custom Compare when sorting on other than standard less-than.
If we now add a set of sort for containers, we get these new functions
sort(Container);
sort(Container, Compare);
Now we have the two signatures sort(Begin, End) and sort(Container, Compare) which both take two template parameters, and the compiler will have problems resolving the call.
If we change the name of one of the functions to resolve this (sort_range, sort_container?) it will not be as convenient anymore.
I agree, the STL should take containers instead of iterator pairs, for the following reasons:
Simpler code
Algorithms could be overloaded for specific containers, i.e., could use the map::find algorithm instead of std::find -> More general code
A subrange could easily be wrapped into a container, as is done in boost::range
@Bo Persson has pointed to a problem with ambiguity, and I think that's quite valid.
I think there's a historical reason that probably prevented that from ever really even being considered though.
The STL was introduced into C++ relatively late in the standardization process. Shortly after it was accepted, the committee voted against even considering any more new features for addition into C++98 (maybe even at the same meeting). By the time most people had wrapped their head around the existing STL to the point of recognizing how much convenience you could get from something like ranges instead of individual iterators, it was too late to even be considered.
Even if the committee had still been considering new features, and somebody had written a proposal to allow passing containers instead of discrete iterators, and had dealt acceptably with the potential for ambiguity, I suspect the proposal would have been rejected. Many (especially the C-oriented people) saw the STL as a huge addition to the standard library anyway. I'm reasonably certain quite a few people would have considered it completely unacceptable to add (lots) more functions/overloads/specializations just to allow passing one parameter in place of two.
Using the begin & end elements for iteration allows one to use non-container types as inputs. For example:
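ContainerType students[10];
vector<ContainerType> physicsStudents;
vector<ContainerType> allStudents; // separate output: the result must not overlap either input range
std::set_union(physicsStudents.begin(), physicsStudents.end(),
students, students + 10,
std::back_inserter(allStudents));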
Since they are such simple implementations, I think it makes sense not to add them to the std library and to let authors add their own. That is especially so because they are templates: adding convenience overloads throughout std could increase the size of the compiled code and lead to bloat.
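An author-written wrapper of this kind might look like the sketch below. The namespace util is invented for the example; placing the wrapper outside std and taking the output iterator by value are choices made here, not something the standard prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

namespace util {  // hypothetical project-local namespace

// Forwards two whole ranges to std::set_union. Both inputs must already be
// sorted, and the output must not overlap either input.
template <typename Range1, typename Range2, typename OutputIterator>
OutputIterator set_union(const Range1& a, const Range2& b, OutputIterator result)
{
    return std::set_union(std::begin(a), std::end(a),
                          std::begin(b), std::end(b),
                          result);
}

}  // namespace util

void demo_set_union()
{
    std::set<int> mathStudents = {1, 2, 3};
    std::set<int> physicsStudents = {3, 4};
    std::vector<int> students;

    util::set_union(mathStudents, physicsStudents, std::back_inserter(students));
    assert(students.size() == 4 && students.front() == 1 && students.back() == 4);
}
```

Because the wrapper uses std::begin and std::end, a sorted C-style array can be passed as either input as well.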