std::vectors to arrow::table

I've looked through the Apache Arrow docs but I can't find a clean way of converting equal-length std::vectors into arrow::Arrays and then into an arrow::Table. Here's the code in question.
#include <vector>
#include <arrow/array.h>
#include <arrow/table.h>
const std::vector<double> a = {1,2,3,4,5};
const std::vector<bool> b = {true, false, false, true, true};
auto schema = arrow::schema({arrow::field("a", arrow::float64()), arrow::field("b", arrow::boolean())});
std::shared_ptr<arrow::Array> array_a(N, arrow::float64());
std::shared_ptr<arrow::Array> array_b(N, arrow::boolean());
// how to store the contents of the vectors a and b into array_a and array_b, resp.
// ...?
std::shared_ptr<arrow::Table> table = arrow::Table::Make(schema, {array_a, array_b});

As mentioned in the documentation, you can use a builder for this:
// Requires #include <arrow/api.h>. ARROW_RETURN_NOT_OK and ARROW_ASSIGN_OR_RAISE
// return early on failure, so this code must sit inside a function that
// returns arrow::Status or arrow::Result<T>.
const std::vector<double> a = {1,2,3,4,5};
const std::vector<bool> b = {true, false, false, true, true};
auto schema = arrow::schema({arrow::field("a", arrow::float64()), arrow::field("b", arrow::boolean())});
arrow::DoubleBuilder aBuilder;
ARROW_RETURN_NOT_OK(aBuilder.AppendValues(a));
arrow::BooleanBuilder bBuilder;
ARROW_RETURN_NOT_OK(bBuilder.AppendValues(b));
std::shared_ptr<arrow::Array> array_a, array_b;
ARROW_ASSIGN_OR_RAISE(array_a, aBuilder.Finish());
ARROW_ASSIGN_OR_RAISE(array_b, bBuilder.Finish());
std::shared_ptr<arrow::Table> table = arrow::Table::Make(schema, {array_a, array_b});

Related

What is the most resource efficient way to test if two std::lists contain the same unique elements?

In my code, I have to compare keys from structures that are returned in a random order as a list. I need to check if the two structures have the same key elements, ignoring the order and only comparing unique elements.
At the moment, I use code like the one shown in the next example:
#include <cassert>
#include <list>
#include <set>
#include <string>
template<typename T>
auto areListsAsSetsEqual(const std::list<T> &a, const std::list<T> &b) -> bool {
auto aSet = std::set<T>{a.begin(), a.end()};
auto bSet = std::set<T>{b.begin(), b.end()};
return aSet == bSet;
}
auto main() -> int {
auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
auto z = std::list<std::string>{"green", "red", "yellow"};
auto xyEqual = areListsAsSetsEqual(x, y);
assert(xyEqual == true);
auto xzEqual = areListsAsSetsEqual(x, z);
assert(xzEqual == false);
return 0;
}
It works and is short and reliable code, but for every comparison, two new sets must be created, and all elements from the two lists must be copied.
Is there a more efficient and elegant way to compare the two lists for the same unique elements, using less CPU and/or memory?

Here's a different take. It only requires one new container, and only one list is copied.
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
template <typename T>
auto areListsAsSetsEqual(const std::list<T>& a, const std::list<T>& b) -> bool {
std::unordered_map<T, bool> firstListKeys;
for (const auto& i : a) {
firstListKeys[i] = true;
}
for (const auto& i : b) {
if (firstListKeys.find(i) != firstListKeys.end()) {
firstListKeys[i] = false;
} else {
return false;
}
}
for (const auto& p : firstListKeys) {
if (p.second == true) {
return false;
}
}
return true;
}
auto main() -> int {
auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
auto z = std::list<std::string>{"green", "red", "yellow"};
auto xyEqual = areListsAsSetsEqual(x, y);
assert(xyEqual == true);
auto xzEqual = areListsAsSetsEqual(x, z);
assert(xzEqual == false);
return 0;
}
The first list is copied into the std::unordered_map and each key is set to true. The second list is then iterated, looking up each element in the map (O(1) on average). If an element is not found, we can return immediately. If it is found, its key is set to false. Finally, we scan the unordered_map to see if any keys were left in the true state.
I haven't run any benchmarks, so testing would be required to see if this solution is actually more efficient. On paper, this version makes three linear passes with average O(1) hash operations, so it is O(N) on average, while building two std::sets costs O(N log N); real run time would still need to be measured.
While the runtime gains are unproven, this method does offer a clear space advantage: your current algorithm requires ~2N space, where mine is closer to N.
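A possible refinement of the approach above, offered as an unmeasured sketch: store false for every key of the first list, flip it to true the first time that key shows up in the second list, and count the flips. Comparing the count with the map size then replaces the final scan. The name areListsAsSetsEqualCounted is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical variant of areListsAsSetsEqual: one map, no final scan.
template <typename T>
auto areListsAsSetsEqualCounted(const std::list<T>& a, const std::list<T>& b) -> bool {
    std::unordered_map<T, bool> seen;  // key from a -> already matched in b?
    for (const auto& item : a) {
        seen.emplace(item, false);     // duplicates in a are ignored
    }
    std::size_t matched = 0;           // distinct keys of a found in b so far
    for (const auto& item : b) {
        auto it = seen.find(item);
        if (it == seen.end()) {
            return false;              // b contains a key that a lacks
        }
        if (!it->second) {             // first time this key appears in b
            it->second = true;
            ++matched;
        }
    }
    return matched == seen.size();     // true only if every key of a was in b
}
```

Each element of the second list costs a single lookup here, because the iterator returned by find is reused to update the flag.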

For completeness' sake: you can also try the standard algorithms. Assuming you don't want to modify your inputs, you'll need copies. (For a solution with only one copy, see @sweenish's answer.)
#include <list>
#include <string>
#include <cassert>
#include <algorithm>
#include <iterator>
#include <vector>
template<typename T>
auto areListsEqual(std::list<T> const &a, std::list<T> const &b) -> bool {
std::vector<T> aVector(a.size());
std::partial_sort_copy(a.cbegin(), a.cend(), aVector.begin(), aVector.end());
auto aEnd = std::unique(aVector.begin(), aVector.end());
std::vector<T> bVector(b.size());
std::partial_sort_copy(b.cbegin(), b.cend(), bVector.begin(), bVector.end());
auto bEnd = std::unique(bVector.begin(), bVector.end());
return std::distance(aVector.begin(),aEnd) == std::distance(bVector.begin(),bEnd)
? std::equal(aVector.begin(), aEnd, bVector.begin())
: false;
}
auto main() -> int {
auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
auto z = std::list<std::string>{"green", "red", "yellow"};
auto w = std::list<std::string>{"green", "red", "yellow", "black"};
auto xyEqual = areListsEqual(x, y);
assert(xyEqual == true);
auto xzEqual = areListsEqual(x, z);
assert(xzEqual == false);
auto xwEqual = areListsEqual(x, w);
assert(xwEqual == false);
return 0;
}
This solution would not be faster in terms of "big-O". But it uses sequential containers as intermediate storage, which might be more cache-efficient on modern hardware. As always with optimisations nowadays, you have to measure with representative data.

For completeness of this question: the fastest algorithm I found is roughly based on the answer from sweenish but works without copying the original data. It is superior under the following two conditions:
Creating a copy or hash of the elements in the list is expensive (e.g. for std::string or strings in general).
Searching for elements sequentially in the container is fast (short lists, fast comparisons).
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>
template<typename T>
auto areListsAsSetsEqual(const std::list<T> &a, const std::list<T> &b) -> bool {
auto keyMap = std::vector<bool>(a.size(), false);
const auto srcEndA = a.end();
const auto srcEndB = b.end();
std::size_t keyIndex = 0;
for (auto it = a.begin(); it != srcEndA; ++it, ++keyIndex) {
if (std::find(a.begin(), it, *it) == it) {
keyMap[keyIndex] = true;
}
}
for (const auto &element : b) {
auto foundIt = std::find(a.begin(), a.end(), element);
if (foundIt == a.end()) {
return false;
}
keyMap[std::distance(a.begin(), foundIt)] = false;
}
return std::all_of(keyMap.begin(), keyMap.end(), [](bool flag) -> bool { return !flag; });
}
auto main() -> int {
auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
auto z = std::list<std::string>{"green", "red", "yellow"};
auto xyEqual = areListsAsSetsEqual(x, y);
assert(xyEqual == true);
auto xzEqual = areListsAsSetsEqual(x, z);
assert(xzEqual == false);
return 0;
}
I tested with an extensive data set of std::string keys, 4 to 128 characters long, in key sets of 0 to 32 elements. Over 200'000 comparisons of sets with small mutations, I get these results:
Implementation   Avg Time per Call   Avg Time if Equal   Speed Gain
original         0.005232 ms         0.005444 ms         1×
kaba             0.004337 ms         0.004275 ms         1.21×
sweenish         0.002796 ms         0.003919 ms         1.87×
this solution    0.000566 ms         0.001305 ms         9.24×
The algorithm also has the lowest memory usage.

Conditional boost::range::join

I want to write a function like this:
template<class IterableType>
void CheckAndProcessIterables(IterableType& a, IterableType& b, IterableType& c) {
IteratorRangeType range{}; // empty range
if (Check(a)) {
range = boost::range::join(range, a);
}
if (Check(b)) {
range = boost::range::join(range, b);
}
if (Check(c)) {
range = boost::range::join(range, c);
}
Process(range);
}
Is it possible? Which type should I use instead of IteratorRangeType?
As far as I understand it, the return type of boost::range::join depends on its arguments. Is there some wrapper class which can be assigned any type of range as long as its underlying value type is the same?

You can use type erased iterator ranges, which Boost has in the form of any_range.
Beware of the performance cost of these, which can quickly become very noticeable. I'd rethink the approach unless you're very sure that this is not on any hot path and that readability is much more of a concern than performance.
Live On Compiler Explorer
#include <boost/range/join.hpp>
#include <boost/range/any_range.hpp>
// for demo only:
#include <boost/range/algorithm/for_each.hpp>
#include <boost/lambda/lambda.hpp>
#include <fmt/ranges.h>
#include <vector>
template<class Range>
bool Check(Range const& r) {
bool odd_len = boost::size(r) % 2;
fmt::print("Check {}, odd_len? {}\n", r, odd_len);
return odd_len;
}
template<class Range>
void Process(Range const& r) {
fmt::print("Processing {}\n", r);
using namespace boost::lambda;
for_each(r, _1 *= _1);
}
template<class IterableType>
void CheckAndProcessIterables(IterableType& a, IterableType& b, IterableType& c) {
using V = typename boost::range_value<IterableType>::type;
using ErasedRange= boost::any_range<V, boost::forward_traversal_tag>;
ErasedRange range{}; // empty range
if (Check(a)) {
range = boost::range::join(range, a);
}
if (Check(b)) {
range = boost::range::join(range, b);
}
if (Check(c)) {
range = boost::range::join(range, c);
}
Process(range);
}
int main() {
std::vector a{1, 2, 3}, b{4, 5}, c{6, 7, 8};
CheckAndProcessIterables(a, b, c);
fmt::print("After process: a:{} b:{} c:{}\n", a, b, c);
}
Prints
Check {1, 2, 3}, odd_len? true
Check {4, 5}, odd_len? false
Check {6, 7, 8}, odd_len? true
Processing {1, 2, 3, 6, 7, 8}
After process: a:{1, 4, 9} b:{4, 5} c:{36, 49, 64}
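If the type-erasure cost matters, one way to rethink the approach is to skip the joined range altogether: collect pointers to the containers that pass the check and process them one after another. The sketch below is an assumption-laden alternative, not the any_range technique. Its Check and Process are stand-ins that mirror the demo (odd length, squaring in place), and it only fits when Process can work on one container at a time.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Stand-in for the demo's Check: true when the container has odd length.
template <class Range>
bool Check(const Range& r) { return r.size() % 2 == 1; }

// Stand-in for the demo's Process, applied to a single container:
// squares every element in place.
template <class Range>
void Process(Range& r) {
    for (auto& v : r) v *= v;
}

template <class Range>
void CheckAndProcessIterables(Range& a, Range& b, Range& c) {
    std::vector<Range*> selected;  // containers that passed the check
    for (Range* r : {&a, &b, &c}) {
        if (Check(*r)) selected.push_back(r);
    }
    // Same visiting order as the joined range, without type erasure.
    for (Range* r : selected) Process(*r);
}
```

With the inputs from the demo, a and c are squared and b is left untouched, which matches the output shown above.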

Looking for an easy way to reinitialize a struct

I have a struct called CoolStruct:
struct CoolStruct
{
int id;
uint32 type;
uint32 subtype;
String name;
};
I have a vector of these structs as well:
std::vector<CoolStruct> coolVector;
I want to create a bunch of structs which have predefined values to push_back into this coolVector. I'd like to keep the code from getting kludgy and ugly. I would really like to keep this notation:
CoolStruct t = {1, EQData::EQ_EFFECT_TYPE_PARAMETRIC, 0, T("Parametric")};
coolVector.push_back(t);
CoolStruct t = {2, EQData::EQ_EFFECT_TYPE_FILTER_LOW_PASS,EQData::EQ_FILTER_TYPE_FILTER_BUTTERWORTH_12DB, T("Low Pass")};
coolVector.push_back(t);
But of course this doesn't work... Reinitialization is not allowed. Is there any other solution to make this as readable as possible? The only alternative I can think of is to manually set each parameter of the struct:
t.id = whatever; t.type = somethingelse; t.subtype = thisisalotofcode; t.name = justtosetupthisvector;
coolVector.push_back(t);

How about:
CoolStruct t1 = {1, EQData::EQ_EFFECT_TYPE_PARAMETRIC, 0, T("Parametric")};
coolVector.push_back(t1);
CoolStruct t2 = {2, EQData::EQ_EFFECT_TYPE_FILTER_LOW_PASS,EQData::EQ_FILTER_TYPE_FILTER_BUTTERWORTH_12DB, T("Low Pass")};
coolVector.push_back(t2);
In C++0x (since standardised as C++11), you should be able to do:
CoolStruct t;
t = {1, EQData::EQ_EFFECT_TYPE_PARAMETRIC, 0, T("Parametric")};
coolVector.push_back(t);
t = {2, EQData::EQ_EFFECT_TYPE_FILTER_LOW_PASS,EQData::EQ_FILTER_TYPE_FILTER_BUTTERWORTH_12DB, T("Low Pass")};
coolVector.push_back(t);
or even:
coolVector.push_back({1, EQData::EQ_EFFECT_TYPE_PARAMETRIC, 0, T("Parametric")});
coolVector.push_back({2, EQData::EQ_EFFECT_TYPE_FILTER_LOW_PASS,EQData::EQ_FILTER_TYPE_FILTER_BUTTERWORTH_12DB, T("Low Pass")});
In fact, if you really want to get creative (and you don't have any previous elements that you want to keep), you can replace the whole vector with this syntax:
coolVector = {
{1, EQData::EQ_EFFECT_TYPE_PARAMETRIC, 0, T("Parametric")},
{2, EQData::EQ_EFFECT_TYPE_FILTER_LOW_PASS,EQData::EQ_FILTER_TYPE_FILTER_BUTTERWORTH_12DB, T("Low Pass")}
};
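The brace-based forms above can be tried out in a self-contained sketch. Everything project-specific is replaced by an assumption here: std::uint32_t stands in for uint32, std::string for String, and the three constants are made-up placeholders for the EQData enumerators. It needs a C++11 compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Made-up placeholders for the EQData enumerators used in the question.
constexpr std::uint32_t kParametric = 1;
constexpr std::uint32_t kLowPass = 2;
constexpr std::uint32_t kButterworth12dB = 3;

// Same layout as CoolStruct, with standard types as stand-ins.
struct CoolStruct {
    int id;
    std::uint32_t type;
    std::uint32_t subtype;
    std::string name;
};

std::vector<CoolStruct> makeCoolVector() {
    std::vector<CoolStruct> coolVector;
    CoolStruct t{};                                // declared once
    t = {1, kParametric, 0, "Parametric"};         // reassigned from a braced list
    coolVector.push_back(t);
    // The temporary variable can be dropped entirely:
    coolVector.push_back({2, kLowPass, kButterworth12dB, "Low Pass"});
    return coolVector;
}
```

This works because CoolStruct is an aggregate; once a user-declared constructor is added, the braces call that constructor instead.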

If you add a simple constructor:
struct CoolStruct
{
CoolStruct(int id, uint32 type, uint32 subtype, String name) : id(id), type(type), subtype(subtype), name(name) {}
int id;
uint32 type;
uint32 subtype;
String name;
};
you can then do this:
coolVector.push_back(CoolStruct(1, EQData::EQ_EFFECT_TYPE_PARAMETRIC, 0, T("Parametric")));
coolVector.push_back(CoolStruct(2, EQData::EQ_EFFECT_TYPE_FILTER_LOW_PASS,EQData::EQ_FILTER_TYPE_FILTER_BUTTERWORTH_12DB, T("Low Pass")));

How to put initialized structs in a struct?

I have a struct:
typedef struct
{
int nNum;
string str;
}KeyPair;
Then I initialize my struct into something like this:
KeyPair keys[] =
{
{0, "tester"},
{2, "yadah"},
{0, "tester"}
};
And, let's say, a number of other initializations:
KeyPair keysA[] =
{
{0, "tester"},
{2, "yadah"},
{0, "tester"}
};
KeyPair keysB[] =
{
{0, "testeras"},
{2, "yadahsdf"},
{3, "testerasss"}
};
KeyPair OtherkeysA[] =
{
{1, "tester"},
{2, "yadah"},
{3, "tester"}
};
and like 20 more of 'em.
Now, how do I create another struct and initialize it such that it contains these initialized KeyPairs?
The reason for this is that I will repeatedly call a function whose parameters come from these structs. And I DO NOT want to do it this way:
pressKeyPairs( keys, sizeof( keys) / sizeof( keys[0] ) );
pressKeyPairs( keysA, sizeof( keysA) / sizeof( keysA[0] ) );
pressKeyPairs( keysB, sizeof( keysB) / sizeof( keysB[0] ) );
pressKeyPairs( OtherkeysA, sizeof( OtherkeysA) / sizeof( OtherkeysA[0] ) );
and so on...
So I would like to just loop through a struct containing these initialized instances of KeyPairs...
OR I would like to put these initialized instances of KeyPairs into a vector and just loop through the vector... How do I do that?

Assuming that you have a fixed number of key pairs, you could use a structure member function:
typedef struct KeyPairs {
KeyPair keysA[3];
KeyPair keysB[3];
KeyPair otherKeysA[3];
void init() {
keysA[0].nNum = 0;
keysA[0].str = "tester";
keysA[1].nNum = 2;
keysA[1].str = "yadah";
keysA[2].nNum = 0;
keysA[2].str = "tester";
// and so on for other keys
}
} KeyPairs;
Then use it like so:
KeyPairs pairs;
pairs.init();

How about doing real C++ and using constructors?
(Note that the typedef is implicit for structs in C++.)
struct KeyPair
{
int nNum;
string str;
public:
KeyPair() {}
KeyPair(int n, string s) : nNum(n), str(s) {}
};
And then use another struct:
struct TripleKeyPair
{
KeyPair keys[3];
TripleKeyPair()
{
// Your initialisation code goes here
}
};
And finally, I wouldn't advise using names such as:
KeysA, KeysB, KeysC ...
Arrays are exactly for this. Why not use std::vector?
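The std::vector suggestion could look like the sketch below, assuming a C++11 compiler. The pressKeyPairs shown here is a hypothetical stand-in with the (pointer, count) shape used in the question; it only sums nNum so that the loop has an observable result. makeKeyGroups and pressAll are likewise made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct KeyPair {
    int nNum;
    std::string str;
};

// Hypothetical stand-in for the question's pressKeyPairs.
int pressKeyPairs(const KeyPair* keys, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += keys[i].nNum;
    return sum;
}

// Each inner vector replaces one of the named arrays (keysA, keysB, ...).
std::vector<std::vector<KeyPair>> makeKeyGroups() {
    return {
        {{0, "tester"}, {2, "yadah"}, {0, "tester"}},
        {{0, "testeras"}, {2, "yadahsdf"}, {3, "testerasss"}},
        {{1, "tester"}, {2, "yadah"}, {3, "tester"}},
    };
}

// One loop replaces the repeated sizeof(x) / sizeof(x[0]) calls.
int pressAll(const std::vector<std::vector<KeyPair>>& groups) {
    int total = 0;
    for (const auto& group : groups) {
        total += pressKeyPairs(group.data(), group.size());
    }
    return total;
}
```

Adding another group is then a one-line change in makeKeyGroups, and the groups no longer need to be the same length.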

How about using "null" objects as delimiters in the array? You would have to use constructors though:
struct KeyPair
{
KeyPair() : fIsEmpty(true) {}
KeyPair(int nNum_, const char *szStr) : nNum(nNum_), str(szStr), fIsEmpty(false) {}
int nNum;
string str;
bool fIsEmpty;
};
Then you can initialize it like this:
KeyPair allKeys[] =
{
KeyPair(0, "testeras"),
KeyPair(2, "yadahsdf"),
KeyPair(3, "testerasss"),
KeyPair(),
KeyPair(0, "tester"),
KeyPair(2, "yadah"),
KeyPair(3, "tester"),
KeyPair(1, "moreyadah"),
KeyPair()
};
And the iteration is trivial if you implement a kind of strlen() analog for an array of KeyPair objects.
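A minimal sketch of such a strlen() analog follows. The helper names keyPairLen and countGroups are invented for this example, and the code assumes the array always ends with an empty KeyPair(), as allKeys does above. The default constructor also zeroes nNum here, which the original leaves uninitialized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct KeyPair {
    KeyPair() : nNum(0), fIsEmpty(true) {}
    KeyPair(int nNum_, const char* szStr) : nNum(nNum_), str(szStr), fIsEmpty(false) {}
    int nNum;
    std::string str;
    bool fIsEmpty;
};

// strlen()-style helper: number of entries before the next empty delimiter.
// The array must contain a delimiter at or after start.
std::size_t keyPairLen(const KeyPair* start) {
    std::size_t n = 0;
    while (!start[n].fIsEmpty) ++n;
    return n;
}

// Walks the delimiter-terminated groups among the first `total` entries
// and returns how many groups there are.
std::size_t countGroups(const KeyPair* all, std::size_t total) {
    std::size_t groups = 0;
    for (std::size_t pos = 0; pos < total; ) {
        std::size_t len = keyPairLen(all + pos);
        // A real caller would invoke pressKeyPairs(all + pos, len) here.
        ++groups;
        pos += len + 1;  // skip the group and its delimiter
    }
    return groups;
}
```

As with strlen(), a missing terminator makes the helper read past the end of the array, so the trailing KeyPair() is not optional.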

C++: How to implement (something like) JSON

Not sure how to explain it - I'm pretty new to C++, but... let me try:
Let's say I have 300+ names (Jeff, Jack...) with 300+ int values (0 or 1). In JS I would use JSON. Something like this:
var people = {"person": [
{"name": "Jeff","val": 0},
{"name": "Jill","val": 1},
{"name": "Jack","val": 0},
{"name": "Jim","val": 1},
{"name": "John","val": 0}
]}
What's the best way to do this in C++?
Thanks.

If you can have duplicate names you can't use a map, so you could use something like this:
struct Person
{
Person( const std::string & n, int v ) : name(n), val(v) {}
std::string name;
int val;
};
int main()
{
std::vector<Person> people;
people.push_back( Person( "Jeff", 0 ) );
people.push_back( Person( "Jill", 1 ) );
...
}
If you wanted uniqueness of names you could do something like this:
std::map<std::string, int> people;
people["Jeff"] = 0;
people["Jill"] = 1;
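or, to store whole Person objects (insert is needed here because operator[] requires a default constructor, which Person lacks):
std::map<std::string, Person> people;
people.insert( std::make_pair( std::string("Jeff"), Person("Jeff",0) ) );
people.insert( std::make_pair( std::string("Jill"), Person("Jill",1) ) );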
If you're using this code a lot you can clean up the repeated cruft.
template<typename K, typename V>
struct BuildMap
{
BuildMap() : map_() {}
BuildMap<K,V>& operator()( const K & key, const V & value )
{
map_[key]=value;
return *this;
}
std::map<K,V> operator()() { return map_; }
std::map<K,V> map_;
};
std::map<std::string,int> people = BuildMap<std::string,int>()
( "Jeff", 0 )
( "Jill", 1 )
( "John", 1 )
();
Hope this gives you some ideas.
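With a C++11 compiler, both containers can also be filled directly from a braced list, which reads much like the JSON in the question. This is a sketch under that assumption. Person is reduced to a plain aggregate here, and makePeopleList and makePeopleMap are made-up names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Plain aggregate, so each {"name", value} pair initializes one Person.
struct Person {
    std::string name;
    int val;
};

// Keeps duplicates and insertion order, like the JSON array in the question.
std::vector<Person> makePeopleList() {
    return {{"Jeff", 0}, {"Jill", 1}, {"Jack", 0}, {"Jim", 1}, {"John", 0}};
}

// Unique names with lookup by key.
std::map<std::string, int> makePeopleMap() {
    return {{"Jeff", 0}, {"Jill", 1}, {"Jack", 0}, {"Jim", 1}, {"John", 0}};
}
```

The vector suits data that is only iterated, while the map suits lookups by name, for example makePeopleMap().at("Jim").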

Take a look at jsoncpp. It is a lightweight JSON parser that makes it very easy to use JSON in your C++ project.
http://sourceforge.net/projects/jsoncpp/
Then you can create a text file, write some entries in JSON format there, and open this file in your C++ program. There are plenty of tutorials on how to do it with jsoncpp.

Try looking at std::map:
http://www.cplusplus.com/reference/stl/map/
It's an associative container that is similar to a dictionary. Something like this?
#include <map>
#include <string>
std::map<std::string,int> person;
void initPeople(){
person["Jeff"] = 0;
person["Jill"] = 1;
person["Jack"] = 0;
person["Jim"] = 1;
person["John"] = 0;
}
