I am working on a embedded SW project. A lot of strings are stored inside flash memory. I would use these strings (usually const char* or const wchar*) as std::string's data. That means I want to avoid creating a copy of the original data because of memory restrictions.
An extended use might be to read the flash data via stringstream directly out of the flash memory.
Example which unfortunately is not working in place:
const char* flash_adr = 0x00300000;
size_t length = 3000;
std::string str(flash_adr, length);
Any ideas will be appreciated!
If you are willing to go with compiler and library specific implementations, here is an example that works in MSVC 2013.
#include <iostream>
#include <string>
int main() {
std::string str("A std::string with a larger length than yours");
char *flash_adr = "Your source from the flash";
char *reset_adr = str._Bx._Ptr; // Keep the old address around
// Change the inner buffer
(char*)str._Bx._Ptr = flash_adr;
std::cout << str << std::endl;
// Reset the pointer or the program will crash
(char*)str._Bx._Ptr = reset_adr;
return 0;
}
It will print Your source from the flash.
The idea is to reserve a std::string capable of fitting the strings in your flash and keep on changing its inner buffer pointer.
You need to customize this for your compiler and as always, you need to be very very careful.
I have now used string_span described in CPP Core Guidelines (https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md). GSL provides a complete implementation (GSL: Guidelines Support Library https://github.com/Microsoft/GSL).
If you know the address of your string inside flash memory you can just use the address directly with the following constructor to create a string_span.
constexpr basic_string_span(pointer ptr, size_type length) noexcept
: span_(ptr, length)
{}
std::string_view might have done the same job as Captain Obvlious (https://stackoverflow.com/users/845568/captain-obvlious) commented as my favourite comment.
I am quite happy with the solution. It works good from performance side including providing a good readability.
Related
Consider a scenario, where std::string is used to store a secret. Once it is consumed and is no longer needed, it would be good to cleanse it, i.e overwrite the memory that contained it, thus hiding the secret.
std::string provides a function const char* data() returning a pointer to (since C++11) continous memory.
Now, since the memory is continous and the variable will be destroyed right after the cleanse due to scope end, would it be safe to:
char* modifiable = const_cast<char*>(secretString.data());
OpenSSL_cleanse(modifiable, secretString.size());
According to standard quoted here:
$5.2.11/7 - Note: Depending on the type of the object, a write operation through the pointer, lvalue or pointer to data member resulting from a const_cast that casts away a const-qualifier68 may produce undefined behavior (7.1.5.1).
That would advise otherwise, but do the conditions above (continuous, to-be-just-removed) make it safe?
The standard explicitly says you must not write to the const char* returned by data(), so don't do that.
There are perfectly safe ways to get a modifiable pointer instead:
if (secretString.size())
OpenSSL_cleanse(&secretString.front(), secretString.size());
Or if the string might have been shrunk already and you want to ensure its entire capacity is wiped:
if (secretString.capacity()) {
secretString.resize(secretString.capacity());
OpenSSL_cleanse(&secretString.front(), secretString.size());
}
It is probably safe. But not guaranteed.
However, since C++11, a std::string must be implemented as contiguous data so you can safely access its internal array using the address of its first element &secretString[0].
if(!secretString.empty()) // avoid UB
{
char* modifiable = &secretString[0];
OpenSSL_cleanse(modifiable, secretString.size());
}
std::string is a poor choice to store secrets. Since strings are copyable and sometimes copies go unnoticed, your secret may "get legs". Furthermore, string expansion techniques may cause multiple copies of fragments (or all of) your secrets.
Experience dictates a movable, non-copyable, wiped clean on destroy, unintelligent (no tricky copies under-the-hood) class.
You can use std::fill to fill the string with trash:
std::fill(str.begin(),str.end(), 0);
Do note that simply clearing or shrinking the string (with methods such clear or shrink_to_fit) does not guarantee that the string data will be deleted from the process memory. Malicious processes may dump the process memory and can extract the secret if the string is not overwritten correctly.
Bonus: Interestingly, the ability to trash the string data for security reasons forces some programming languages like Java to return passwords as char[] and not String. In Java, String is immutable, so "trashing" it will make a new copy of the string. Hence, you need a modifiable object like char[] which does not use copy-on-write.
Edit: if your compiler does optimize this call out, you can use specific compiler flags to make sure a trashing function will not be optimized out:
#ifdef WIN32
#pragma optimize("",off)
void trashString(std::string& str){
std::fill(str.begin(),str.end(),0);
}
#pragma optimize("",on)
#endif
#ifdef __GCC__
void __attribute__((optimize("O0"))) trashString(std::string& str) {
std::fill(str.begin(),str.end(),0);
}
#endif
#ifdef __clang__
void __attribute__ ((optnone)) trashString(std::string& str) {
std::fill(str.begin(),str.end(),0);
}
#endif
There's a better answer: don't!
std::string is a class which is designed to be userfriendly and efficient. It was not designed with cryptography in mind, so there are few guarantees written into it to help you out. For example, there's no guarantees that your data hasn't been copied elsewhere. At best, you could hope that a particular compiler's implementation offers you the behavior you want.
If you actually want to treat a secret as a secret, you should handle it using tools which are designed for handling secrets. In fact, you should develop a threat model for what capabilities your attacker has, and choose your tools accordingly.
Tested solution on CentOS 6, Debian 8 and Ubuntu 16.04 (g++/clang++, O0, O1, O2, O3):
secretString.resize(secretString.capacity(), '\0');
OPENSSL_cleanse(&secretString[0], secretString.size());
secretString.clear();
If you were really paranoid you could randomise the data in the cleansed string, so as not to give away the length of the string or a location that contained sensitive data:
#include <string>
#include <stdlib.h>
#include <string.h>
typedef void* (*memset_t)(void*, int, size_t);
static volatile memset_t memset_func = memset;
void cleanse(std::string& to_cleanse) {
to_cleanse.resize(to_cleanse.capacity(), '\0');
for (int i = 0; i < to_cleanse.size(); ++i) {
memset_func(&to_cleanse[i], rand(), 1);
}
to_cleanse.clear();
}
You could seed the rand() if you wanted also.
You could also do similar string cleansing without openssl dependency, by using explicit_bzero to null the contents:
#include <string>
#include <string.h>
int main() {
std::string secretString = "ajaja";
secretString.resize(secretString.capacity(), '\0');
explicit_bzero(&secretString[0], secretString.size());
secretString.clear();
return 0;
}
I have following code in C++ (wrote in Visual Studio 2010).
void TEST(BYTE data[], int size)
{
wstring aData = L"Here is my string";
//something code to append aData string to data array
WinHttpClient client(url);
// Send HTTP post request.
client.SendHttpRequest(L"POST");
}
How can i append aData string to data BYTE array.
Short answer: You can't.
So you get passed the BYTE[] and the size, but you don't know if the data is on the stack or heap, so you can't just use realloc or something to make it bigger.
One approach is to make sure the array is bigger than you need, and pass the current and max size.
Another approach would be to change the API to allow you to use realloc or similar to resize the array.
The only way I can think of without changing the API is to use some sort of marker in data to delimit used and unused space (doesn't work for binary data). e.g. 0 means unused, so you can just look for the first 0 and start appending from there.
edit
Perhaps I misread the intent. If you don't want to make a permanent change, you could take a local copy, append to that (and make sure you clean it up). I thought you wanted a permanent, "in place" append.
#include "stdafx.h"
#include <iostream>
using namespace std;
#include <boost/locale/encoding_utf.hpp>
std::wstring utf8_to_wstring(const std::string& str)
{
return utf_to_utf<wchar_t>(str.c_str(), str.c_str() + str.size());
}
int main()
{
unsigned char test[10];
std::string tesStr((char*)test);
wstring temp = utf8_to_wstring(tesStr);
}
I need to design an efficient and readable class with 2 main functions:
add_buffer(char* buffer) - add a buffer.
char* read_all() - get one big buffer that contains all the buffers that the user added until now (by order).
for example:
char first_buffer[] = {1,2,3};
char second_buffer[] = {4,5,6};
MyClass instance;
instance.add_buffer(first_buffer);
instance.add_buffer(second_buffer);
char* big_buffer = instance.read_all(); // big_buffer = [1,2,3,4,5,6]
NOTE: There are a lot of solutions for this problem but I'm looking for an efficient one because in real life the buffers will be many and big, and I want to save a lot of copying and reallocs (like what std::vector does). I'm also want a readble c++ code.
NOTE: The real life problem is: I'm reading data from an HTTP request that came to me at separated chunks. After all chunks arrived I want to return the whole data to the user.
Use an std::vector<char> with enough memory reserved. Since C++11, you can access the internal buffer with std::vector::data() (until C++11, you have to use &*std::vector::begin()).
If you can use Boost, boost::algorithm::join will do:
#include <boost/algorithm/string/join.hpp>
#include <vector>
#include <iostream>
int main(int, char **)
{
std::vector<std::string> list;
list.push_back("Hello");
list.push_back("World!");
std::string joined = boost::algorithm::join(list, ", ");
std::cout << joined << std::endl;
}
Output:
Hello, World!
Original answer by Tristram Gräbener
Use some standard approach like,
start with some initial memory, say 256
whenever it gets full use reallocate and double the size.
If you don't want to do it yourself, use STL containers like
std::vector<char>
It automatically reallocates memory for you when buffer is full.
Im currently working on a project which uses different language settings. To solve this a table is used to store all the texts in different languages that are used in the program. So whenever a text is about to be written on the screen this table is called and depending on what the current language setting is a text string is returned. I recently joined this project and I noticed that the way of storing this wasn't very optimized and for every new language that was added the time it would take to look up the correct string would increase. I therefore came up with a (in my mind) better solution. However, when I tried to implement it I ran into the problem of getting an error that too much memory is used and I don't understand why. I am using IAR enbedded workbench.
The original solution in pseudo/c++ code:
typedef struct
{
enum textId;
enum language;
string textString;
} Text;
static const Text s_TextMap[] =
{
{ TextId::RESTORE_DATA_Q ,Language::ENGLISH ,"Restore Data?" },
{ TextId::RESTORE_DATA_Q ,Language::SWEDISH ,"Återställa data?" },
{ TextId::RESTORE_DATA_Q ,Language::GERMAN ,"Wiederherstellen von Daten?" },
{ TextId::CHANGE_LANGUAGE ,Language::ENGLISH ,"Change Language" },
{ TextId::CHANGE_LANGUAGE ,Language::SWEDISH ,"Välj språk" },
{ TextId::CHANGE_LANGUAGE ,Language::GERMAN ,"Sprache wählen" },
};
My solution in pseudo/c++ code:
typedef struct
{
const char* pEngText;
const char* pSweText;
const char* pGerText;
} Texts;
static Texts addTexts(const char* pEngText, const char* pSweText, const char* pGerText)
{
Texts t;
t.pEngText = pEngText;
t.pSweText = pSweText;
t.pGerText = pGerText;
return t;
}
typedef struct
{
enum textId;
Texts texts;
} Text;
static const TextTest s_TextMapTest[] =
{
{TextId::RESTORE_DATA_Q, addTexts("Restore Data?","Återställa data?","Wiederherstellen von Daten?")},
{TextId::CHANGE_LANGUAGE, addTexts("Change Language","Välj språk","Sprache wählen")},
};
My solution is obviously faster to lookup in the average case and based on my calculations it should also use less memory. When the full tables are used I've calculated that the original solution requires 7668 bytes and that my solution requires 4248 bytes. The way I did this was to implement the full tables in a small testprogram and use sizeof(s_TextMap). However, when I try to compile the code I get linking errors saying:
Error[Lp011]: section placement failed
unable to allocate space for sections/blocks with a total estimated minimum size of 0x130301 bytes (max align 0x1000) in <[0x0000a000-0x0007ffff]> (total uncommitted space 0x757eb).
Error[Lp011]: section placement failed
unable to allocate space for sections/blocks with a total estimated minimum size of 0x47de4 bytes (max align 0x20) in <[0x1fff0000-0x2000fff0]> (total uncommitted space 0x1fff1).
Error[Lp021]: the destination for compressed initializer batch "USER_DEFAULT_MEMORY-1" is placed at an address that is dependent on the size of the batch, which is not allowed when using lz77 compression. Consider using "initialize by copy with packing = zeros" (or none) instead.
Error[Lp021]: the destination for compressed initializer batch "USER_DEFAULT_MEMORY-1" is placed at an address that is dependent on the size of the batch, which is not allowed when using lz77 compression. Consider using "initialize by copy with packing = zeros" (or none) instead.
The error I'm most confused about is the first one which states that my code is estimated to take 0x130301 bytes of memory and I see no way this is possible. Could this be some bug in IAR or am I missing something?
In your solution s_TextMapTest[] will necessarily be located in RAM rather than ROM because the pointers are set at run-time - though it is not clear how you managed to use a function call as an array element initialiser. On most microcontrollers RAM is a far more limited resource. You have provided no information about the target or its memory map.
Either way you should check the memory map output from your linker to verify that data is located appropriately and where you expect it to be.
The original code and the solution you propose in your own answer is ROMable.
A solution that maintains the form of your original proposal while remaining ROMable is to write addTexts() as a macro rather than a function:
#define addTexts( eng, swe, ger, fin ) {eng, swe, ger, fin}
though I cannot see what the advantage is, and you are not really "adding" text - the text was always there by initialisation.
Ok, so I managed to get it to work. I remowed the extra struct and the functions and just simplified it to:
typedef struct
{
TextId::e textId;
const char* pEngText;
const char* pSweText;
const char* pGerText;
const char* pFinText;
} Text;
It doesn't look as good to me but at least it works
Is it possible to somehow adapt a c-style string/buffer (char* or wchar_t*) to work with the Boost String Algorithms Library?
That is, for example, it's trimalgorithm has the following declaration:
template<typename SequenceT>
void trim(SequenceT &, const std::locale & = std::locale());
and the implementation (look for trim_left_if) requires that the sequence type has a member function erase.
How could I use that with a raw character pointer / c string buffer?
char* pStr = getSomeCString(); // example, could also be something like wchar_t buf[256];
...
boost::trim(pStr); // HOW?
Ideally, the algorithms would work directly on the supplied buffer. (As far as possible. it obviously can't work if an algorithm needs to allocate additional space in the "string".)
#Vitaly asks: why can't you create a std::string from char buffer and then use it in algorithms?
The reason I have char* at all is that I'd like to use a few algorthims on our existing codebase. Refactoring all the char buffers to string would be more work than it's worth, and when changing or adapting something it would be nice to just be able to apply a given algorithm to any c-style string that happens to live in the current code.
Using a string would mean to (a) copy char* to string, (b) apply algorithm to string and (c) copy string back into char buffer.
For the SequenceT-type operations, you probably have to use std::string. If you wanted to implement that by yourself, you'd have to fulfill many more requirements for creation, destruction, value semantics etc. You'd basically end up with your implementation of std::string.
The RangeT-type operations might be, however, usable on char*s using the iterator_range from Boost.Range library. I didn't try it, though.
There exist some code which implements a std::string like string with a fixed buffer. With some tinkering you can modify this code to create a string type which uses an external buffer:
char buffer[100];
strcpy(buffer, " HELLO ");
xstr::xstring<xstr::fixed_char_buf<char> >
str(buffer, strlen(buffer), sizeof(buffer));
boost::algorithm::trim(str);
buffer[str.size()] = 0;
std::cout << buffer << std::endl; // prints "HELLO"
For this I added an constructor to xstr::xstring and xstr::fixed_char_buf to take the buffer, the size of the buffer which is in use and the maximum size of the buffer. Further I replaced the SIZE template argument with a member variable and changed the internal char array into a char pointer.
The xstr code is a bit old and will not compile without trouble on newer compilers but it needs some minor changes. Further I only added the things needed in this case. If you want to use this for real, you need to make some more changes to make sure it can not use uninitialized memory.
Anyway, it might be a good start for writing you own string adapter.
I don't know what platform you're targeting, but on most modern computers (including mobile ones like ARM) memory copy is so fast you shouldn't even waste your time optimizing memory copies. I say - wrap char* in std::string and check whether the performance suits your needs. Don't waste time on premature optimization.