parsing a string to a structure of c-style character arrays - c++

I have a Visual Studio 2008 C++ project where I need to parse a string to a structure of c-style character arrays. What is the most elegant/efficient way of doing this?
Here is my current (functioning) solution:
struct Foo {
char a[ MAX_A ];
char b[ MAX_B ];
char c[ MAX_C ];
char d[ MAX_D ];
};
void Func( const Foo& foo );
std::string input = "abcd#efgh#ijkl#mnop";
std::vector< std::string > parsed;
boost::split( parsed, input, boost::is_any_of( "#" ) );
Foo foo = { 0 };
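// split() yields parsed[ 0 ] .. parsed[ 3 ]; copying at most MAX_x - 1 characters
// keeps the zero terminator that Foo foo = { 0 } put in place
parsed[ 0 ].copy( foo.a, MAX_A - 1 );
parsed[ 1 ].copy( foo.b, MAX_B - 1 );
parsed[ 2 ].copy( foo.c, MAX_C - 1 );
parsed[ 3 ].copy( foo.d, MAX_D - 1 );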
Func( foo );

Here is my (now tested) idea:
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
#define MAX_A 40
#define MAX_B 3
#define MAX_C 40
#define MAX_D 4
struct Foo {
char a[ MAX_A ];
char b[ MAX_B ];
char c[ MAX_C ];
char d[ MAX_D ];
};
template <std::ptrdiff_t N>
const char* extractToken(const char* inIt, char (&buf)[N])
{
if (!inIt || !*inIt)
return NULL;
const char* end = strchr(inIt, '#');
if (end)
{
strncpy(buf, inIt, std::min(N, end-inIt));
return end + 1;
}
strncpy(buf, inIt, N);
return NULL;
}
int main(int argc, const char *argv[])
{
std::string input = "abcd#efgh#ijkl#mnop";
Foo foo = { 0 };
const char* cursor = input.c_str();
cursor = extractToken(cursor, foo.a);
cursor = extractToken(cursor, foo.b);
cursor = extractToken(cursor, foo.c);
cursor = extractToken(cursor, foo.d);
}
[Edit] Tests
Adding a little test code
template <std::ptrdiff_t N>
std::string display(const char (&buf)[N])
{
std::string result;
for(size_t i=0; i<N && buf[i]; ++i)
result += buf[i];
return result;
}
int main(int argc, const char *argv[])
{
std::string input = "abcd#efgh#ijkl#mnop";
Foo foo = { 0 };
const char* cursor = input.c_str();
cursor = extractToken(cursor, foo.a);
cursor = extractToken(cursor, foo.b);
cursor = extractToken(cursor, foo.c);
cursor = extractToken(cursor, foo.d);
std::cout << "foo.a: '" << display(foo.a) << "'\n";
std::cout << "foo.b: '" << display(foo.b) << "'\n";
std::cout << "foo.c: '" << display(foo.c) << "'\n";
std::cout << "foo.d: '" << display(foo.d) << "'\n";
}
Outputs
foo.a: 'abcd'
foo.b: 'efg'
foo.c: 'ijkl'
foo.d: 'mnop'
See it Live on http://ideone.com/KdAhO

What about redesigning Foo?
struct Foo {
std::array<std::string, 4> abcd;
std::string a() const { return abcd[0]; }
std::string b() const { return abcd[1]; }
std::string c() const { return abcd[2]; }
std::string d() const { return abcd[3]; }
};
boost::algorithm::split_iterator<std::string::iterator> end,
it = boost::make_split_iterator(input, boost::algorithm::first_finder("#"));
std::transform(it, end, foo.abcd.begin(),
boost::copy_range<std::string, decltype(*it)>);

Using a regex would look like this (in C++11; you can translate this to Boost or TR1 for VS2008):
// Assuming MAX_A...MAX_D are all 10 in our regex
std::cmatch res;
if(std::regex_match(input.data(),input.data()+input.size(),
res,
std::regex("([^#]{0,10})#([^#]{0,10})#([^#]{0,10})#([^#]{0,10})"))) // the '#' separators must be matched too
{
Foo foo = {};
std::copy(res[1].first,res[1].second,foo.a);
std::copy(res[2].first,res[2].second,foo.b);
std::copy(res[3].first,res[3].second,foo.c);
std::copy(res[4].first,res[4].second,foo.d);
}
You should probably create the pattern using a format string and the actual MAX_* values rather than hard-coding the values in the regex like I did here, and you might also want to compile the regex once and save it instead of recreating it every time.
But otherwise, this method avoids making any extra copies of the string data. The char* values held in each submatch of res point directly into the input string's buffer, so the only copy is directly from the input string to the final foo object.

Related

Replace a set of characters in C++ with the least number of code lines

I am using the following function in c++ to replace a set of ASCII characters.
std::string debug::convertStringToEdiFormat(const char *ediBuffer) {
std::string local(ediBuffer);
std::replace(local.begin(), local.end(), '\037', ':');
std::replace(local.begin(), local.end(), '\031', '*');
std::replace(local.begin(), local.end(), '\035', '+');
std::replace(local.begin(), local.end(), '\034', '\'');
return std::string(local);
}
The problem is that it is too long. If I want to replace, say, 100 characters, it will take 100 lines of code. Is there another approach that takes less code and does the same thing?
This is what you are looking for:
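array< char, 256 > m;
// fill m with the identity mapping first, so unlisted characters stay unchanged
for (size_t i = 0; i < m.size(); ++i)
    m[i] = static_cast<char>(i);
// then override the characters to be replaced
m['\037'] = ':';
m['\031'] = '*';
m['\035'] = '+';
m['\034'] = '\'';
//...
string s{ "Hello world!" };
for (auto& c : s)
    c = m[static_cast<unsigned char>(c)]; // char may be signed, so never index with it directly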
If all you need is to change just a couple of characters, you may use std::transform:
auto my_transform = [](const char c) -> char
{
switch (c)
{
case '\037': return ':';
case '\031': return '*';
case '\035': return '+';
case '\034': return '\'';
default: return c;
}
};
std::string s{ "\037\031\035\034" };
std::transform(s.begin(), s.end(), s.begin(), my_transform);
See the live example.
The best approach is to extract a function or functor that does the character conversion.
Here is a fully functional functor that can perform the conversion in both directions.
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
class ReplaceEncoder {
public:
    ReplaceEncoder() {
        initArray(m_encode);
        initArray(m_decode);
    }
    // Maps from[i] to to[i] when encoding, and to[i] back to from[i] when decoding.
    void updateEncoding(const std::string &from, const std::string &to) {
        assert(from.size() == to.size());
        for (std::size_t i = 0; i < from.size(); ++i) {
            m_encode[static_cast<unsigned char>(from[i])] = to[i];
            m_decode[static_cast<unsigned char>(to[i])] = from[i];
        }
    }
    char encode(char ch) const {
        return m_encode[static_cast<unsigned char>(ch)];
    }
    char decode(char ch) const {
        return m_decode[static_cast<unsigned char>(ch)];
    }
    char operator()(char ch) const {
        return encode(ch);
    }
private:
    // Start from the identity mapping so that unlisted characters are left unchanged.
    void initArray(std::array<char, 0x100> &arr) {
        for (std::size_t i = 0; i < arr.size(); ++i) {
            arr[i] = static_cast<char>(i);
        }
    }
private:
    std::array<char, 0x100> m_encode;
    std::array<char, 0x100> m_decode;
};
ReplaceEncoder encrypt;
encrypt.updateEncoding("absdet", "gmbstp");
string s{ "Hello world!" };
std::transform(s.begin(), s.end(), s.begin(), encrypt);
Why not use an unordered_map and one loop?
static const std::unordered_map<char, char> a = {{'\037', ':'}, {'\031', '*'}, {'\035', '+'}};
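void convertStringToEdiFormat(std::string &ediBuffer) {
    for (auto& c : ediBuffer)
    {
        // at() would throw std::out_of_range for every character that is not in the map,
        // so look the character up and leave unmapped characters unchanged
        auto it = a.find(c);
        if (it != a.end())
            c = it->second;
    }
}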

load a file of 1's and 0's into a char** line by line

I have a file, at the end of each line there is possibly a newline:
111\n
100\n
101
In C++ you can load the lines of a file into an array of byte strings like this:
auto lines_from( istream& is )
-> vector<string>
{
string line;
vector<string> result;
while( getline( is, line ) )
{
result.push_back( line );
}
return result;
}
auto main() -> int
{
vector<string> const lines = lines_from( cin );
// Use it.
}
Here string is std::string from the <string> header, getline is std::getline from the same header, and vector is std::vector from the <vector> header. I chose to use a descriptive name for the function, lines_from. However, it's commonly named readall.
Where you absolutely need a char**, presumably with an assumption of some given buffer size for each string, then you can use a vector of pointers, pointing to buffers that e.g. are managed by a class like this:
class C_strings
{
private:
vector<string> buffers_;
vector<char*> pointers_;
int bufsize_;
C_strings( C_strings const& ) = delete;
auto operator=( C_strings const& ) -> C_strings& = delete;
public:
auto pointer() -> char** { return pointers_.data(); }
auto bufsize() const -> int { return bufsize_; }
C_strings( vector<string> const& strings, int const bufsize )
: buffers_( strings )
, bufsize_( bufsize )
{
pointers_.reserve( buffers_.size() + 1 );
for( string& s : buffers_ )
{
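// Pad with '\0' up to bufsize: reserve() only changes the capacity, and
// writing past size() (as strcat in a callee would) is undefined behavior.
if( static_cast<int>( s.size() ) < bufsize ) { s.resize( bufsize, '\0' ); }
if( s.empty() || s.back() != '\0' ) { s += '\0'; }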
pointers_.push_back( &s[0] );
}
pointers_.push_back( nullptr );
}
C_strings( C_strings&& other )
: buffers_( move( other.buffers_ ) )
, pointers_( move( other.pointers_ ) )
, bufsize_( other.bufsize_ ) // must be copied too, otherwise it stays uninitialized
{}
};
Then let's say you want to call a double-star function like this:
void doublestarfunc( char** const lines )
{
using std::cout;
for( char** pps = lines; *pps != nullptr; ++pps )
{
if( strlen( *pps ) < 40 ) { strcat( *pps, " < Oh la la!" ); }
cout << *pps << '\n';
}
cout << '\n';
}
It can be done very simply:
using namespace std; // cin, cout
int const columns = 80;
int const cstring_bufsize = columns + 1;
auto c_strings = C_strings( lines_from( cin ), cstring_bufsize );
doublestarfunc( c_strings.pointer() );
But is it a good idea? No, except when you have to interface with an existing C-style API. For C++ code, it is better to restructure it to use std::string throughout.

C++: read dataset and check if vector<Class> is subset of vector<Class>

I have the following piece of code. The code creates a vector Dataset, each element of which is a vector. It also creates a vector S.
I want to check which vectors of Dataset contain the vector from S. Apparently I am doing something wrong, because for the following example,
Dataset is:
a b c
a d
a b d
and S:
a b
it should print: 0 2
and for me it prints: 0 1 2
#include <iostream>
#include <fstream>
#include <sstream>
#include <string.h>
#include <string>
#include <time.h>
#include <vector>
#include <algorithm>
using namespace std;
class StringRef
{
private:
char const* begin_;
int size_;
public:
int size() const { return size_; }
char const* begin() const { return begin_; }
char const* end() const { return begin_ + size_; }
StringRef( char const* const begin, int const size )
: begin_( begin )
, size_( size )
{}
bool operator<(const StringRef& obj) const
{
return (strcmp(begin(),obj.begin()) > 0 );
}
};
/************************************************
* Checks if vector B is subset of vector A *
************************************************/
bool isSubset(std::vector<StringRef> A, std::vector<StringRef> B)
{
std::sort(A.begin(), A.end());
std::sort(B.begin(), B.end());
return std::includes(A.begin(), A.end(), B.begin(), B.end());
}
vector<StringRef> split3( string const& str, char delimiter = ' ' )
{
vector<StringRef> result;
enum State { inSpace, inToken };
State state = inSpace;
char const* pTokenBegin = 0; // Init to satisfy compiler.
for(auto it = str.begin(); it != str.end(); ++it )
{
State const newState = (*it == delimiter? inSpace : inToken);
if( newState != state )
{
switch( newState )
{
case inSpace:
result.push_back( StringRef( pTokenBegin, &*it - pTokenBegin ) );
break;
case inToken:
pTokenBegin = &*it;
}
}
state = newState;
}
if( state == inToken )
{
result.push_back( StringRef( pTokenBegin, &str.back() - pTokenBegin ) );
}
return result;
}
int main() {
vector<vector<StringRef> > Dataset;
vector<vector<StringRef> > S;
ifstream input("test.dat");
long count = 0;
int sec, lps;
time_t start = time(NULL);
cin.sync_with_stdio(false); //disable synchronous IO
for( string line; getline( input, line ); )
{
Dataset.push_back(split3( line ));
count++;
};
input.close();
input.clear();
input.open("subs.dat");
for( string line; getline( input, line ); )
{
S.push_back(split3( line ));
};
for ( std::vector<std::vector<StringRef> >::size_type i = 0; i < S.size(); i++ )
{
for(std::vector<std::vector<StringRef> >::size_type j=0; j<Dataset.size();j++)
{
if (isSubset(Dataset[j], S[i]))
{
cout << j << " ";
}
}
}
sec = (int) time(NULL) - start;
cerr << "C++ : Saw " << count << " lines in " << sec << " seconds." ;
if (sec > 0) {
lps = count / sec;
cerr << " Crunch speed: " << lps << endl;
} else
cerr << endl;
return 0;
}
Your StringRef type is dangerous because it contains a const char * pointer but has no concept of ownership, so the pointer can be invalidated at some point after the object is constructed.
And indeed this is what happens here: you have a single string (line) and create StringRefs with pointers into its internal data. On every loop iteration getline overwrites that string, and it is destroyed when the loop ends, so the stored pointers are invalidated.
You should create a vector<std::string> instead to prevent this problem. That also gives you a correct operator<, whereas the strcmp call in StringRef::operator< ignores size_ and reads on until the next '\0'.

how to copy std::vector<const char*> into void*?

I have a std::vector<const char*> v vector fully initialized with strings. How do I copy it into an allocated void* buffer?
int main() {
std::vector<const char*> v;
v.push_back("abc");
v.push_back("def");
void* b = malloc(6);
// how to copy v in to b?
}
Thank you
You can use the range based for statement. For example
void *p = b;
for ( const char *s : v )
{
size_t n = std::strlen( s );
std::memcpy( p, s, n );
p = ( char * )p + n;
}
The same can be done with the standard algorithm std::accumulate, declared in the header <numeric>.
For example:
#include <iostream>
#include <vector>
#include <numeric>
#include <cstring>
int main()
{
std::vector<const char*> v;
v.push_back( "abc" );
v.push_back( "def" );
void *b = new char[6];
auto p = std::accumulate( v.begin(), v.end(), b,
[]( void *acc, const char *s ) -> void *
{
size_t n = std::strlen( s );
return ( ( char * )std::memcpy( acc, s, n ) + n );
} );
std::cout.write( (const char * )b , ( const char * )p - ( const char * )b );
std::cout << std::endl;
return 0;
}
The output is
abcdef
Take into account that in C++ it would be better to allocate the buffer with new rather than malloc, that is, to write
void* b = new char[6];
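Iterate through the vector, moving your pointer along by the string length each time.
size_t currentIndex = 0;
for(size_t i = 0; i < v.size(); i++) {
    size_t len = strlen(v[i]);
    // b is a void*, so convert it to char* before doing pointer arithmetic
    memcpy(static_cast<char*>(b) + currentIndex, v[i], len);
    currentIndex += len;
}
Something like that; I didn't test it, so it may need adjusting.
And if you want to use b as a string, you need to put a 0 on the end, which also means allocating one byte more than the combined length.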
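Try this:
std::vector<const char*> v;
v.push_back("abc");
v.push_back("def");
void* b = malloc(6);
size_t l0 = strlen(v[0]);
memcpy(b, v[0], l0);
size_t l1 = strlen(v[1]);
// offset the second copy by the length of the first string; void* needs a cast for arithmetic
memcpy(static_cast<char*>(b) + l0, v[1], l1);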
This is the most C++-like way I can think of. It doesn't use any C-style string manipulation or memory management functions and it is as safe as can be. Downside is that a temporary buffer is allocated and released without actual need.
The following headers are used in this example.
#include <cstddef>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
Now, the first version (which I prefer) allocates a buffer of the correct size and returns it wrapped in a std::unique_ptr<char[]>, from which the caller can then obtain the raw pointer with get() and let it convert implicitly to void * if needed.
std::unique_ptr<char[]>
concatenate(const std::vector<const char *>& words)
{
std::string buffer {};
std::unique_ptr<char[]> dup_uptr {};
for (const auto word : words)
buffer += word;
dup_uptr.reset(new char[buffer.size() + 1]);
buffer.copy(dup_uptr.get(), buffer.size());
dup_uptr.get()[buffer.size()] = '\0';
return dup_uptr;
}
The second version takes the destination buffer and its size as parameters and copies the concatenated string there. This can obviously fail if the buffer happens to be too small and it is also less safe since we might make a mistake and pass the wrong buffer size. For convenience, it returns a char * pointer to the string or a nullptr if the buffer was too small.
char *
concatenate(const std::vector<const char *>& words,
void *const dest,
const std::size_t size)
{
char * dest_chars = static_cast<char *>(dest);
std::string buffer {};
for (const auto word : words)
buffer += word;
// If we cannot copy the whole thing, copy nothing at all.
if (buffer.size() >= size)
return nullptr;
buffer.copy(dest_chars, buffer.size());
dest_chars[buffer.size()] = '\0';
return dest_chars;
}
The two functions can be used like this:
int
main()
{
const std::vector<const char*> vec {"abc", "def"};
// first version
{
auto concat = concatenate(vec);
std::cout << concat.get() << std::endl;
}
// second version
{
char tempbuff[100]; // caller-provided destination buffer
if (auto concat = concatenate(vec, tempbuff, sizeof(tempbuff)))
std::cout << concat << std::endl;
}
return 0;
}

How to access c style string variables sequentially in for loop

I have a long list of some good old fashioned c style strings:
const char * p1key = PROPERTY_MAX_THREADS;
const char * p1value = "12";
const char * p2key = PROPERTY_MAX_FRAMES;
const char * p2value = "400";
const char * p3key = PROPERTY_MAX_FRAMEMEMORY;
const char * p3value = "140";
...
Then I do some stuff with them:
// write p1, p2, p3, pn to disk in fancy format
At the end I want to be able to write a loop and compare the written values to the original values.
int numProperties = 20;
for (int i = 0; i < numProperties; ++i) {
// on the first iteration, access p1 key/value
// on the second, access p2 key/value
// ...
}
How can I access p1 on the first iteration, p2 on the second, etc? Would an array of pointers help? I'm struggling to come up with the syntax to make this work. Any help would be very much appreciated.
Edit:
I would consider the best answer to show both the C and C++ way
INTRODUCTION
You'd have to store the pointers in some sort of container to be able to iterate over them in the manner you propose.
Since you are dealing with pairs, std::pair from <utility> seems like a perfect match. Wrapping these std::pairs in a container such as std::vector will make it very easy to iterate over them in a clean manner.
SAMPLE IMPLEMENTATION
#include <iostream>
#include <utility>
#include <vector>
#define PROPERTY_MAX_THREADS "max_threads"
#define PROPERTY_MAX_FRAMES "max_frames"
#define PROPERTY_MAX_FRAMEMEMORY "max_fmemory"
const char * p1key = PROPERTY_MAX_THREADS;
const char * p1value = "12";
const char * p2key = PROPERTY_MAX_FRAMES;
const char * p2value = "400";
const char * p3key = PROPERTY_MAX_FRAMEMEMORY;
const char * p3value = "140";
int
main (int argc, char *argv[])
{
std::vector<std::pair<char const *, char const *>> properties {
{ p1key, p1value }, { p2key, p2value }, { p3key, p3value }
};
std::cout << "properties:\n";
for (auto& it : properties) {
std::cout << " " << it.first << " = " << it.second << "\n";
}
}
properties:
max_threads = 12
max_frames = 400
max_fmemory = 140
I TRIED THE ABOVE BUT IT DOESN'T COMPILE, WHY?
The snippet above uses features introduced in C++11; if your compiler cannot build it, you will need to fall back on functionality that it does provide.
Below is a modified implementation that can be compiled by any compiler that supports C++03:
int const PROPERTIES_LEN = 3;
std::pair<char const *, char const*> properties[PROPERTIES_LEN] = {
std::make_pair (p1key, p1value),
std::make_pair (p2key, p2value),
std::make_pair (p3key, p3value)
};
for (int i = 0; i < PROPERTIES_LEN; ++i) {
std::cout << properties[i].first << " = " << properties[i].second << "\n";
}
You tagged it C++, so I'm going to give the C++ suggestion.
#include <iostream>
#include <vector>
#include <utility>
#define PROPERTY_MAX_THREADS "1"
#define PROPERTY_MAX_FRAMES "2"
#define PROPERTY_MAX_FRAMEMEMORY "3"
const char * p1key = PROPERTY_MAX_THREADS;
const char * p1value = "12";
const char * p2key = PROPERTY_MAX_FRAMES;
const char * p2value = "400";
const char * p3key = PROPERTY_MAX_FRAMEMEMORY;
const char * p3value = "140";
int main() {
using namespace std;
vector<pair<const char*,const char *>> collection =
{{p1key,p1value},{p2key,p2value},{p3key,p3value}};
for(auto &ele : collection){
cout << "key:" << ele.first
<< " value:" << ele.second << endl;
}
return 0;
}
Alternatively, just declare it as a collection from the beginning:
#include <iostream>
#include <string>
#include <vector>
#include <utility>
#define PROPERTY_MAX_THREADS "1"
#define PROPERTY_MAX_FRAMES "2"
#define PROPERTY_MAX_FRAMEMEMORY "3"
int main() {
using namespace std;
vector<pair<const string,const string>> collection =
{
{PROPERTY_MAX_THREADS, "12" },
{PROPERTY_MAX_FRAMES, "400"},
{PROPERTY_MAX_FRAMEMEMORY, "140"}
};
for(auto &ele : collection){
cout << "key:" << ele.first
<< " value:" << ele.second << endl;
}
return 0;
}
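In C you can do it this way:
#include <stdio.h>
#define STRA_END 0
const char* keyArray[] = {
    "string1",
    "string2",
    "string3",
    STRA_END
};
const char* valueArray[] = {
    "string1",
    "string2",
    "string3",
    STRA_END
};
/* Placeholder: replace the body with whatever should happen to each key/value pair. */
void doSomethingToString(const char* key, const char* value)
{
    printf("%s = %s\n", key, value);
}
int main(void)
{
    int i;
    /* The STRA_END sentinel marks the end, so no separate element count is needed. */
    for( i=0; keyArray[i]!=0; ++i )
        doSomethingToString(keyArray[i], valueArray[i]);
    return 0;
}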