Can std::string overload "substr" for rvalue *this and steal resources? - c++

It just occurred to me that std::string's substr operation could be much more efficient for rvalues if it could steal the allocated memory from *this.
The Standard library of N3225 contains the following member function declaration of std::string
basic_string substr(size_type pos = 0, size_type n = npos) const;
Could an implementation overload this and provide two versions, one of which reuses the buffer for rvalue strings?
basic_string substr(size_type pos = 0) &&;
basic_string substr(size_type pos, size_type n) const;
I imagine the rvalue version could be implemented as follows, reusing the memory of *this and setting *this to a moved-from state.
basic_string substr(size_type pos = 0) && {
basic_string __r;
__r.__internal_share_buf(pos, __start + pos, __size - pos);
__start = 0; // or whatever the 'empty' state is
return __r;
}
Does this work in an efficient fashion on common string implementations or would this take too much housekeeping?

Firstly, an implementation cannot add an overload that steals the source, since that would be detectable:
std::string s="some random string";
std::string s2=std::move(s).substr(5,5);
assert(s=="some random string");
assert(s2=="rando");
The first assert would fail if the implementation stole the data from s, and the C++0x wording essentially outlaws copy on write.
Secondly, this wouldn't necessarily be an optimization anyway: you'd have to add housekeeping to std::string to handle the case where it is a substring of a larger string, and it would mean keeping large blocks alive when no string references the large string any more, only some substring of it.
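The overload mechanics from the question can be tried out on a user-defined type without touching std::string. The following is only a sketch: Text and tail() are hypothetical names, and the rvalue overload reuses the buffer by erasing the prefix in place rather than by sharing it.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper, not part of the standard library.
class Text {
public:
    explicit Text(std::string s) : data_(std::move(s)) {}

    // Lvalue overload: copies the tail and leaves *this untouched.
    std::string tail(std::size_t pos) const& { return data_.substr(pos); }

    // Rvalue overload: erases the prefix in place, then hands the buffer over.
    std::string tail(std::size_t pos) && {
        data_.erase(0, pos);       // throws std::out_of_range if pos > size()
        return std::move(data_);   // *this is left in a moved-from state
    }

    const std::string& str() const { return data_; }

private:
    std::string data_;
};
```

Calling std::move(t).tail(5) selects the second overload, which is exactly the kind of observable change to the source object that the answer above says a conforming std::string::substr may not make.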

Yes, and maybe it should be proposed to the standards committee, or maybe implemented in a library. I don't really know how valuable the optimization would be. And that would be an interesting study all on its own.
When gcc grows support for r-value this, someone ought to try it and report how useful it is.

There are a few string classes out there implementing copy-on-write. But I wouldn't recommend adding yet another string type to your project unless really justified.
Check out the discussion in Memory-efficient C++ strings (interning, ropes, copy-on-write, etc)

is a += a safe for std::string in c++ [duplicate]

Considering a code like this:
std::string str = "abcdef";
const size_t num = 50;
const size_t baselen = str.length();
while (str.length() < num)
str.append(str, 0, baselen);
Is it safe to call std::basic_string<T>::append() on itself like this? Cannot the source memory get invalidated by enlarging before the copy operation?
I could not find anything in the standard specific to that method. It says the above is equivalent to str.append(str.data(), baselen), which I think might not be entirely safe unless there is another detection of such cases inside append(const char*, size_t).
I checked a few implementations and they seemed safe one way or another, but my question is if this behavior is guaranteed. E.g. "Appending std::vector to itself, undefined behavior?" says it's not for std::vector.
According to §21.4.6.2/§21.4.6.3:
The function [basic_string& append(const charT* s, size_type n);] replaces the string controlled by *this with a string of length size() + n whose first size() elements are a copy of the original string controlled by *this and whose remaining elements are a copy of the initial n elements of s.
Note: This applies to every append call, as every append can be implemented in terms of append(const charT*, size_type), as defined by the standard (§21.4.6.2/§21.4.6.3).
So basically, append makes a copy of str (let's call the copy strtemp), appends n characters of str2 to strtemp, and then replaces str with strtemp.
For the case that str2 is str, nothing changes, as the string is enlarged when the temporary copy is assigned, not before.
Even though it is not explicitly stated in the standard, it is guaranteed (if the implementation is exactly as stated in the standard) by the definition of std::basic_string<T>::append.
Thus, this is not undefined behavior.
This is complicated.
One thing that can be said for certain. If you use iterators:
std::string str = "abcdef";
str.append(str.begin(), str.end());
then you are guaranteed to be safe. Yes, really. Why? Because the specification states that the behavior of the iterator functions is equivalent to calling append(basic_string(first, last)). That obviously creates a temporary copy of the string. So if you need to insert a string into itself, you're guaranteed to be able to do it with the iterator form.
Granted, implementations don't have to actually copy it. But they do need to respect the standard specified behavior. An implementation could choose to make a copy only if the iterator range is inside of itself, but the implementation would still have to check.
All of the other forms of append are defined to be equivalent to calling append(const charT *s, size_t len). That is, your call to append above is equivalent to you doing append(str.data(), str.size()). So what does the standard say about what happens if s is inside of *this?
Nothing at all.
The only requirement of s is:
s points to an array of at least n elements of charT.
Since it does not expressly forbid s pointing into *this, then it must be allowed. It would also be exceedingly strange if the iterator version allows self-assignment, but the pointer&size version did not.
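If you would rather not depend on this reading of the standard, the aliasing can be avoided altogether by keeping the source and the destination in separate objects. A minimal sketch, where repeat_to_at_least is a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: repeats `base` until the result has at least `min_len` characters.
std::string repeat_to_at_least(const std::string& base, std::size_t min_len) {
    std::string out = base;
    if (base.empty()) return out;          // nothing to repeat; avoids an infinite loop
    out.reserve(min_len + base.size());    // one allocation up front
    while (out.size() < min_len)
        out.append(base);                  // source and destination never alias
    return out;
}
```

This costs one extra copy of the base string, which is usually negligible next to the appends themselves.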

Should I compare a std::string to "string" or "string"s?

Consider this code snippet:
bool foo(const std::string& s) {
return s == "hello"; // comparing against a const char* literal
}
bool bar(const std::string& s) {
return s == "hello"s; // comparing against a std::string literal
}
At first sight, it looks like comparing against a const char* needs fewer assembly instructions (see note 1), as using a string literal will lead to an in-place construction of the std::string.
(EDIT: As pointed out in the answers, I forgot about the fact that effectively s.compare(const char*) will be called in foo(), so of course no in-place construction takes place in this case. Therefore striking out some lines below.)
However, looking at the operator==(const char*, const std::string&) reference:
All comparisons are done via the compare() member function.
From my understanding, this means that we will need to construct a std::string anyway in order to perform the comparison, so I suspect the overhead will be the same in the end (although hidden by the call to operator==).
Which of the comparisons should I prefer?
Does one version have advantages over the other (may be in specific situations)?
1 I'm aware that fewer assembly instructions doesn't necessarily mean faster code, but I don't want to go into micro benchmarking here.
Neither.
If you want to be clever, compare to "string"sv, which returns a std::string_view.
While comparing against a literal like "string" does not result in any allocation overhead, it's treated as a null-terminated string, with all the concomitant disadvantages: no tolerance for embedded nulls, and users must heed the null terminator.
"string"s does an allocation, barring small-string-optimisation or allocation elision. Also, the operator gets passed the length of the literal, no need to count, and it allows for embedded nulls.
And finally using "string"sv combines the advantages of both other approaches, avoiding their individual disadvantages. Also, a std::string_view is a far simpler beast than a std::string, especially if the latter uses SSO as all modern ones do.
At least since C++14 (which generally allowed eliding allocations), compilers could in theory optimise all options to the last one, given sufficient information (generally available for the example) and effort, under the as-if rule. We aren't there yet though.
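The three spellings can be put side by side, together with the embedded-null difference mentioned above. This is a small sketch that assumes C++17 or later; the function names are made up for illustration.

```cpp
#include <string>
#include <string_view>

using namespace std::string_literals;       // enables "..."s
using namespace std::string_view_literals;  // enables "..."sv

// Compares against a null-terminated literal; the length is found by scanning.
bool eq_cstr(const std::string& s) { return s == "hello"; }

// Builds a temporary std::string first (may allocate if it does not fit SSO).
bool eq_str(const std::string& s) { return s == "hello"s; }

// Builds a std::string_view: pointer plus length, no allocation.
bool eq_sv(const std::string& s) { return s == "hello"sv; }

// With an embedded null, the plain literal is cut off at the first '\0' ...
bool eq_embedded_cstr(const std::string& s) { return s == "a\0b"; }

// ... while the string_view literal keeps all three characters.
bool eq_embedded_sv(const std::string& s) { return s == "a\0b"sv; }
```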
No, compare() does not require construction of a std::string for const char* operands.
You're using overload #4 here.
The comparison to string literal is the "free" version you're looking for. Instantiating a std::string here is completely unnecessary.
From my understanding, this means that we will need to construct a std::string anyway in order to perform the comparison, so I suspect the overhead will be the same in the end (although hidden by the call to operator==).
This is where that reasoning goes wrong. std::string::compare does not need to construct a std::string from a C-style null-terminated operand in order to function. According to one of the overloads:
int compare( const CharT* s ) const; // (4)
4) Compares this string to the null-terminated character sequence beginning at the character pointed to by s with length Traits::length(s).
Although whether to allocate or not is an implementation detail, it does not seem reasonable that a sequence comparison would do so.

Are there downsides to using std::string as a buffer?

I have recently seen a colleague of mine using std::string as a buffer:
std::string receive_data(const Receiver& receiver) {
std::string buff;
int size = receiver.size();
if (size > 0) {
buff.resize(size);
const char* dst_ptr = buff.data();
const char* src_ptr = receiver.data();
memcpy((char*) dst_ptr, src_ptr, size);
}
return buff;
}
I guess this guy wants to take advantage of auto destruction of the returned string so he needs not worry about freeing of the allocated buffer.
This looks a bit strange to me since according to cplusplus.com the data() method returns a const char* pointing to a buffer internally managed by the string:
const char* data() const noexcept;
Memcpy-ing to a const char pointer? AFAIK this does no harm as long as we know what we do, but have I missed something? Is this dangerous?
Don't use std::string as a buffer.
It is bad practice to use std::string as a buffer, for several reasons (listed in no particular order):
std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
As a concrete example: before C++17, you can't even write through the pointer you get from data(); it's a const charT*, so your code would cause undefined behavior. (But &(str[0]), &(str.front()), or &(*(str.begin())) would work.)
Using std::strings for buffers is confusing to readers of your function's definition, who assume you would be using std::string for, well, strings. In other words, doing so breaks the Principle of Least Astonishment.
Worse yet, it's confusing for whoever might use your function - they too may think what you're returning is a string, i.e. valid human-readable text.
std::unique_ptr would be fine for your case, or even std::vector. In C++17, you can use std::byte for the element type, too. A more sophisticated option is a class with an SSO-like feature, e.g. Boost's small_vector (thank you, @gast128, for mentioning it).
(Minor point:) libstdc++ had to change its ABI for std::string to conform to the C++11 standard, so in some cases (which by now are rather unlikely), you might run into some linkage or runtime issues that you wouldn't with a different type for your buffer.
Also, your code may make two heap allocations instead of one (implementation dependent): once upon string construction and again when resize()ing. But that in itself is not really a reason to avoid std::string, since you can avoid the double allocation using the construction in @Jarod42's answer.
You can completely avoid a manual memcpy by calling the appropriate constructor:
std::string receive_data(const Receiver& receiver) {
return {receiver.data(), receiver.size()};
}
That even handles \0 in a string.
BTW, unless content is actually text, I would prefer std::vector<std::byte> (or equivalent).
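For non-text payloads, the std::vector<std::byte> variant might look like the sketch below. It assumes C++17 and takes a raw pointer and size, because the Receiver type is not shown in the question; copy_bytes is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `size` raw bytes from `src` into an owning byte buffer.
std::vector<std::byte> copy_bytes(const void* src, std::size_t size) {
    std::vector<std::byte> buf(size);           // elements are zero-initialized
    if (size > 0)
        std::memcpy(buf.data(), src, size);     // data() is non-const on a non-const vector
    return buf;                                 // moved or elided, never deep-copied
}
```

The return type now tells callers that the content is binary data rather than text.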
Memcpy-ing to a const char pointer? AFAIK this does no harm as long as we know what we do, but is this good behavior and why?
The current code may have undefined behavior, depending on the C++ version. To avoid undefined behavior in C++14 and below take the address of the first element. It yields a non-const pointer:
buff.resize(size);
memcpy(&buff[0], &receiver[0], size);
I have recently seen a colleague of mine using std::string as a buffer...
That was somewhat common in older code, especially circa C++03. There are several benefits and downsides to using a string like that. Depending on what you are doing with the code, std::vector can be a bit anemic, and you sometimes used a string instead and accepted the extra overhead of char_traits.
For example, std::string was often a faster container than std::vector on append, and returning a std::vector from a function was impractical. (C++98 had no move semantics, so unless the compiler elided the copy, the vector was constructed in the function and copied out.) Additionally, std::string allowed you to search with a richer assortment of member functions, like find_first_of and find_first_not_of. That was convenient when searching through arrays of bytes.
I think what you really want/need is SGI's Rope class, but it never made it into the STL. It looks like GCC's libstdc++ may provide it.
There is a lengthy discussion about whether this is legal in C++14 and below:
const char* dst_ptr = buff.data();
const char* src_ptr = receiver.data();
memcpy((char*) dst_ptr, src_ptr, size);
I know for certain it is not safe in GCC. I once did something like this in some self tests and it resulted in a segfault:
std::string buff("A");
...
char* ptr = (char*)buff.data();
size_t len = buff.size();
ptr[0] ^= 1; // tamper with byte
bool tampered = HMAC(key, ptr, len, mac);
GCC put the single byte 'A' in register AL. The high 3 bytes were garbage, so the 32-bit register was 0xXXXXXX41. When I dereferenced ptr[0], GCC dereferenced the garbage address 0xXXXXXX41.
The two take-aways for me were, don't write half-ass self tests, and don't try to make data() a non-const pointer.
From C++17, data can return a non const char *.
Draft n4659 declares at [string.accessors]:
const charT* c_str() const noexcept;
const charT* data() const noexcept;
....
charT* data() noexcept;
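With the non-const overload, the cast from the question is no longer needed. A minimal sketch assuming C++17, again taking a pointer and size in place of the unspecified Receiver; copy_into_string is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: fills a std::string through the C++17 non-const data().
std::string copy_into_string(const char* src, std::size_t size) {
    std::string buff(size, '\0');                // sized up front, zero-filled
    if (size > 0)
        std::memcpy(buff.data(), src, size);     // data() returns char* here, no cast
    return buff;
}
```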
The code is unnecessary, considering that
std::string receive_data(const Receiver& receiver) {
std::string buff;
int size = receiver.size();
if (size > 0) {
buff.assign(receiver.data(), size);
}
return buff;
}
will do exactly the same.
The big optimization opportunity I would investigate here is: Receiver appears to be some kind of container that supports .data() and .size(). If you can consume it, and pass it in as a rvalue reference Receiver&&, you might be able to use move semantics without making any copies at all! If it’s got an iterator interface, you could use those for range-based constructors or std::move() from <algorithm>.
In C++17 (as Serge Ballesta and others have mentioned), std::string::data() returns a pointer to non-const data. A std::string has been guaranteed to store all its data contiguously for years.
The code as written smells a bit, although it’s not really the programmer’s fault: those hacks were necessary at the time. Today, you should at least change the type of dst_ptr from const char* to char* and remove the cast in the first argument to memcpy(). You could also reserve() a number of bytes for the buffer and then use an STL function to move the data.
As others have mentioned, a std::vector or std::unique_ptr would be a more natural data structure to use here.
One downside is performance.
The .resize method will value-initialize all the new byte locations, i.e. set them to 0.
That initialization is unnecessary if you're then going to overwrite the 0s with other data.
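When the data is already in memory, one way to skip the zero fill is to reserve capacity and append. The two hypothetical helpers below produce the same string; only the first one writes every byte twice.

```cpp
#include <cstddef>
#include <string>

// resize() then overwrite: each new byte is zeroed first and then assigned.
std::string fill_via_resize(const char* src, std::size_t n) {
    std::string buf;
    buf.resize(n);                        // new characters become '\0'
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = src[i];
    return buf;
}

// reserve() then append(): capacity is allocated, bytes are written once.
std::string fill_via_append(const char* src, std::size_t n) {
    std::string buf;
    buf.reserve(n);                       // changes capacity only, not size
    buf.append(src, n);
    return buf;
}
```

This does not help when an external API needs a pointer to write into, because writing past size() into reserved capacity is not allowed.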
I do feel std::string is a legitimate contender for managing a "buffer"; whether or not it's the best choice depends on a few things...
Is your buffer content textual or binary in nature?
One major input to your decision should be whether the buffer content is textual in nature. It will be less potentially confusing to readers of your code if std::string is used for textual content.
char is not a good type for storing bytes. Keep in mind that the C++ Standard leaves it up to each implementation to decide whether char will be signed or unsigned, but for generic black-box handling of binary data (and sometimes even when passing characters to functions like std::toupper(int) that have undefined behaviour unless the argument is in range for unsigned char or is equal to EOF) you probably want unsigned data: why would you assume or imply that the first bit of each byte is a sign bit if it's opaque binary data?
Because of that, it's undeniably somewhat hackish to use std::string for "binary" data. You could use std::basic_string<std::byte>, but that's not what the question asks about, and you'd lose some interoperability benefits from using the ubiquitous std::string type.
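The std::toupper pitfall mentioned above is usually handled by converting each char to unsigned char before the call. A short sketch, with to_upper_ascii as a hypothetical name:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases a string without passing negative values to std::toupper.
std::string to_upper_ascii(std::string s) {
    for (char& c : s) {
        // Where char is signed, bytes >= 0x80 are negative; converting to
        // unsigned char first keeps the argument in the range toupper accepts.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```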
Some potential benefits of using std::string
Firstly a few benefits:
it sports the RAII semantics we all know and love
most implementations feature short-string optimisation (SSO), which ensures that if the number of bytes is small enough to fit directly inside the string object, dynamic allocation/deallocation can be avoided (but there may be an extra branch each time the data is accessed)
this is perhaps more useful for passing around copies of data read or to be written, rather than for a buffer which should be pre-sized to accept a decent chunk of data if available (to improve throughput by handling more I/O at a time)
there's a wealth of std::string member functions, and non-member functions designed to work well with std::strings (including e.g. cout << my_string): if your client code would find them useful to parse/manipulate/process the buffer content, then you're off to a flying start
the API is very familiar to most C++ programmers
Mixed blessings
being a familiar, ubiquitous type, the code you interact with may have specialisations for std::string that better suit your use for buffered data, or those specialisations might be worse: do evaluate that
Concerned
As Waxrat observed, what is lacking API-wise is the ability to grow the buffer efficiently, as resize() writes NULs/'\0's into the characters added, which is pointless if you're about to "receive" values into that memory. This isn't relevant in the OP's code, where a copy of received data is being made and the size is already known.
Discussion
Addressing einpoklum's concern:
std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
While it's true that std::string wasn't originally intended for this, the rest is mainly FUD. The Standard has made concessions to this kind of usage with C++17's non-const member function char* data(), and string has always supported embedded zero bytes. Most advanced programmers know what's safe to do.
Alternatives
static buffers (C char[N] array or std::array<char, N>) sized to some maximum message size, or ferrying slices of the data per call
a manually allocated buffer with std::unique_ptr to automate destruction: leaves you to do fiddly resizing, and track the allocated vs in-use sizes yourself; more error-prone overall
std::vector (possibly with std::byte as the element type): widely understood to imply binary data, but the API is more restrictive and (for better or worse) it can't be expected to have anything equivalent to Short-String Optimisation.
Boost's small_vector: perhaps, if SSO was the only thing holding you back from std::vector, and you're happy using boost.
returning a functor that allows lazy access to the received data (providing you know it won't be deallocated or overwritten), deferring the choice of how it's stored by client code
use C++23's string::resize_and_overwrite
https://en.cppreference.com/w/cpp/string/basic_string/resize_and_overwrite
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1072r10.html
[[nodiscard]] static inline string formaterr (DWORD errcode) {
string strerr;
strerr.resize_and_overwrite(2048, [errcode](char* buf, size_t buflen) {
// https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessage
return FormatMessageA(
FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
nullptr,
errcode,
0,
buf,
static_cast<DWORD>(buflen),
nullptr
);
});
return strerr;
}

Possibility of COW std::string implementation in C++11

Today I passed by this SO question: Legality of COW std::string implementation in C++11
The most voted answer (35 upvotes) for that question says:
It's not allowed, because as per the standard 21.4.1 p6, invalidation
of iterators/references is only allowed for
— as an argument to any standard library function taking a reference
to non-const basic_string as an argument.
— Calling non-const member functions, except operator[], at, front,
back, begin, rbegin, end, and rend.
For a COW string, calling non-const operator[] would require making a
copy (and invalidating references), which is disallowed by the
paragraph above. Hence, it's no longer legal to have a COW string in
C++11.
I wonder whether that justification is valid or not because it seems C++03 has similar requirements for string iterator invalidation:
References, pointers, and iterators referring to the elements of a
basic_string sequence may be invalidated by the following uses of that
basic_string object:
As an argument to non-member functions swap() (21.3.7.8), operator>>() (21.3.7.9), and getline() (21.3.7.9).
As an argument to basic_string::swap().
Calling data() and c_str() member functions.
Calling non-const member functions, except operator[](), at(), begin(), rbegin(), end(), and rend().
Subsequent to any of the above uses except the forms of insert() and erase() which return iterators, the first call to non-const member
functions operator[](), at(), begin(), rbegin(), end(), or rend().
These are not exactly the same as those of C++11's, but at least the same for the part of operator[](), which the original answer took as the major justification. So I guess, in order to justify the illegality of COW std::string implementation in C++11, some other standard requirements need to be cited. Help needed here.
That SO question has been inactive for over a year, so I've decided to raise this as a separate question. Please let me know if this is inappropriate and I will find some other way to clear my doubt.
The key point is the last point in the C++03 standard. The
wording could be a lot clearer, but the intent is that the first
call to [], at, etc. (but only the first call) after
something which established new iterators (and thus invalidated
old ones) could invalidate iterators, but only the first. The
wording in C++03 was, in fact, a quick hack, inserted in
response to comments by the French national body on the CD2 of
C++98. The original problem is simple: consider:
std::string a( "some text" );
std::string b( a );
char& rc = a[2];
At this point, modifications through rc must affect a, but
not b. If COW is being used, however, when a[2] is called,
a and b share a representation; in order for writes through
the returned reference not to affect b, a[2] must be
considered a "write", and be allowed to invalidate the
reference. Which is what CD2 said: any call to a non-const
[], at, or one of the begin or end functions could
invalidate iterators and references. The French national body
comments pointed out that this rendered a[i] == a[j] invalid,
since the reference returned by one of the [] would be
invalidated by the other. The last point you cite of C++03 was
added to circumvent this—only the first call to [] et
al. could invalidate the iterators.
I don't think anyone was totally happy with the results. The
wording was done quickly, and while the intent was clear to
those who were aware of the history, and the original problem,
I don't think it was fully clear from standard. In addition,
some experts began to question the value of COW to begin with,
given the relative impossibility of the string class itself to
reliably detect all writes. (If a[i] == a[j] is the complete
expression, there is no write. But the string class itself must
assume that the return value of a[i] may result in a write.)
And in a multi-threaded environment, the cost of managing the
reference count needed for copy on write was deemed a relatively
high cost for something you usually don't need. The result is
that most implementations (which supported threading long before
C++11) have been moving away from COW anyway; as far as I know,
the only major implementation still using COW was g++ (but there
was a known bug in their multithreaded implementation) and
(maybe) Sun CC (which the last time I looked at it, was
inordinately slow, because of the cost of managing the counter).
I think the committee simply took what seemed to them the
simplest way of cleaning things up, by forbidding COW.
EDIT:
Some more clarification with regards to why a COW implementation
has to invalidate iterators on the first call to []. Consider
a naïve implementation of COW. (I will just call it String, and
ignore all of the issues involving traits and allocators, which
aren't really relevant here. I'll also ignore exception and
thread safety, just to make things as simple as possible.)
class String
{
struct StringRep
{
int useCount;
size_t size;
char* data;
StringRep( char const* text, size_t size )
: useCount( 1 )
, size( size )
, data( static_cast<char*>( ::operator new( size + 1 ) ) ) // operator new returns void*
{
std::memcpy( data, text, size );
data[size] = '\0';
}
~StringRep()
{
::operator delete( data );
}
};
StringRep* myRep;
public:
String( char const* initial_text )
: myRep( new StringRep( initial_text, strlen( initial_text ) ) )
{
}
String( String const& other )
: myRep( other.myRep )
{
++ myRep->useCount;
}
~String()
{
-- myRep->useCount;
if ( myRep->useCount == 0 ) {
delete myRep;
}
}
char& operator[]( size_t index )
{
return myRep->data[index];
}
};
Now imagine what happens if I write:
String a( "some text" );
String b( a );
a[4] = '-';
What is the value of b after this? (Run through the code by
hand, if you're not sure.)
Obviously, this doesn't work. The solution is to add a flag,
bool uncopyable; to StringRep, which is initialized to
false, and to modify the following functions:
String::String( String const& other )
{
if ( other.myRep->uncopyable ) {
myRep = new StringRep( other.myRep->data, other.myRep->size );
} else {
myRep = other.myRep;
++ myRep->useCount;
}
}
char& String::operator[]( size_t index )
{
if ( myRep->useCount > 1 ) {
-- myRep->useCount;
myRep = new StringRep( myRep->data, myRep->size );
}
myRep->uncopyable = true;
return myRep->data[index];
}
This means, of course, that [] will invalidate iterators and
references, but only the first time it is called on an object.
The next time, the useCount will be one (and the image will be
uncopyable). So a[i] == a[j] works; regardless of which the
compiler actually evaluates first (a[i] or a[j]), the second
one will find a useCount of 1, and will not have to duplicate.
And because of the uncopyable flag,
String a( "some text" );
char& c = a[4];
String b( a );
c = '-';
will also work, and not modify b.
Of course, the above is enormously simplified. Getting it to
work in a multithreaded environment is extremely difficult,
unless you simply grab a mutex for the entire function for any
function which might modify anything (in which case, the
resulting class is extremely slow). G++ tried, and
failed: there is one particular use case where it breaks.
(Getting it to handle the other issues I've ignored is not
particularly difficult, but does represent a lot of lines of
code.)
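For experimenting with the behaviour described above, the scheme can be condensed into a compilable sketch. This is not the code from the answer: it swaps the hand-written useCount for std::shared_ptr, uses std::string as storage, and is single-threaded only. CowString, at_const and shares_with are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Minimal copy-on-write sketch; std::shared_ptr does the reference counting.
class CowString {
    struct Rep {
        std::string text;
        bool unshareable = false;   // set once a reference into text has escaped
        explicit Rep(std::string t) : text(std::move(t)) {}
    };
    std::shared_ptr<Rep> rep_;

public:
    CowString(const char* s) : rep_(std::make_shared<Rep>(s)) {}

    CowString(const CowString& other) {
        if (other.rep_->unshareable)
            rep_ = std::make_shared<Rep>(other.rep_->text);   // deep copy
        else
            rep_ = other.rep_;                                // share the buffer
    }
    CowString& operator=(const CowString&) = delete;          // omitted for brevity

    // Non-const access counts as a potential write: detach first if shared.
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<Rep>(rep_->text);
        rep_->unshareable = true;
        return rep_->text[i];
    }

    char at_const(std::size_t i) const { return rep_->text[i]; }
    bool shares_with(const CowString& o) const { return rep_ == o.rep_; }
};
```

Only the first operator[] on a shared object detaches it, so only that call invalidates earlier references, which matches the "first call" wording in C++03.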