Sorting based on associative arrays in D

I am trying to follow examples given in various places for D apps. Generally when learning a language I start on example apps and change them myself, purely to test stuff out.
One app that caught my eye counts the frequency of words in a block of text passed in. As the dictionary was built up in an associative array (with the values storing the frequencies and the keys being the words themselves), the output was not in any particular order. So I attempted to sort the array based on examples given on the site.
Anyway, the example used a lambda, 'sort!(...)(array);', but when I try the code, dmd won't compile it.
Here's the boiled down code:
import std.stdio;
import std.string;
void main() {
    uint[string] freqs;
    freqs["the"] = 51;
    freqs["programming"] = 3;
    freqs["hello"] = 10;
    freqs["world"] = 10;
    /*...You get the point...*/
    //This is the actual example given, but it doesn't
    //seem to work, old D version???
    //string[] words = array(freqs.keys);
    //This seemed to work
    string[] words = freqs.keys;
    //Example given for how to sort the 'words' array based on
    //external criteria (i.e. the frequency of the words from
    //another array). This is the line where the compiler craps out!
    sort!((a,b) {return freqs[a] < freqs[b];})(words);
    //Should output in frequency order now!
    foreach(word; words) {
        writefln("%s -> %s", word, freqs[word]);
    }
}
When I try to compile this code, I get the following:
s1.d(24): Error: undefined identifier sort
s1.d(24): Error: function expected before (), not sort of type int
Can anyone tell me what I need to do here?
I use DMD v2.031. I've tried installing GDC, but it only seems to support the D1 language spec. I've only just started looking at Dil, so I can't comment on whether it supports the code above.

sort is defined in std.algorithm, which the code above never imports. Try adding this near the top of the file:
import std.algorithm;

Here's an even simpler way to read an input file (named on the command line), split it into lines and words, and print a table of word frequencies in descending order:
import std.algorithm;
import std.file;
import std.stdio;
import std.string;
void main(string[] args)
{
    auto contents = cast(string)read(args[1]);
    uint[string] freqs;
    foreach(i,line; splitLines(contents))
        foreach(word; split(strip(line)))
            ++freqs[word];
    string[] words = freqs.keys;
    sort!((a,b)=> freqs[a]>freqs[b])(words);
    foreach(s;words)
        writefln("%s\t\t%s",s,freqs[s]);
}
Well, almost 4 years later... :-)
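For comparison, here is a minimal C++ sketch of the same technique: copy the keys out of the hash map, then sort them with a comparator that looks each key up in the map. The function name keys_by_frequency is made up for illustration, and the alphabetical tie-break is an addition not present in the D code. It makes the order of words with equal counts deterministic, which matters because D's sort (like std::sort) is unstable by default.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Return the keys of `freqs` ordered by descending count.
// Words with equal counts are ordered alphabetically so the result is deterministic.
std::vector<std::string> keys_by_frequency(
        const std::unordered_map<std::string, unsigned>& freqs) {
    // Equivalent of D's freqs.keys: copy the keys into an array.
    std::vector<std::string> words;
    words.reserve(freqs.size());
    for (const auto& entry : freqs) {
        words.push_back(entry.first);
    }
    // Sort the keys, using the values stored in the map as the criterion.
    std::sort(words.begin(), words.end(),
              [&freqs](const std::string& a, const std::string& b) {
                  const unsigned fa = freqs.at(a);
                  const unsigned fb = freqs.at(b);
                  if (fa != fb) {
                      return fa > fb;  // higher counts first
                  }
                  return a < b;        // tie-break: alphabetical
              });
    return words;
}
```

With the counts from the question ("the" 51, "hello" 10, "world" 10, "programming" 3), this yields the, hello, world, programming.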

Related

In which order does yaml-cpp return data?

I wrote the following yaml file:
linear: [0.0,1.0,10.0,0.05]
linear: [1.0,0.5,5.0,0.05]
rotational: [0.0,6.28,20,0.5]
rotational: [6.28,0.0,20,0.5]
and I use yaml-cpp to parse it with the following code:
YAML::Node sequence = YAML::LoadFile(filename_);
int count = 1;
for (YAML::const_iterator it = sequence.begin(); it != sequence.end(); ++it)
{
const std::string& name = it->first.as<std::string>();
const std::vector<double>& parameters = it->second.as<std::vector<double> >();
...
If I print name and parameters (in the order I get them), the output is:
linear: [0,1,10,0.05]
rotational: [6.28,0,20,0.5]
linear: [1,0.5,5,0.05]
rotational: [0,6.28,20,0.5]
Can someone please explain what is happening and suggest how to fix this issue?
Thanks.
YAML maps are not allowed to have duplicate keys, so that YAML file is actually illegal. yaml-cpp is simply lenient here and doesn't report an error.
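What's more, YAML maps do not specify a key order, and so yaml-cpp simply chooses whatever order is most convenient internally to iterate over. It's probably best to assume that unspecified order means random order, i.e., you can't rely on it. If you need both the repeated names and a fixed order, make the top level a sequence instead (for example, write each line as - linear: [0.0,1.0,10.0,0.05]) and iterate over its items, because YAML sequences do preserve their order.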

Best approach to query a database to create a collection of a custom class in C++

I am new to interfacing with databases through C++ and was wondering what the best approach is to do the following:
I have an object with member variables that I define ahead of time, and member variables that I need to pull from a database given the known variables. For example:
class DataObject
{
public:
    int input1;
    string input2;
    double output1;
    DataObject(int Input1, string Input2) :
        input1(Input1), input2(Input2)
    {
        output1 = Initializer(input1, input2);
    }
private:
    double Initializer(int, string);
    static RecordSet rs; //I am just guessing the object would be called RecordSet
};
Now, I can do something like:
std::vector<DataObject> v;
for (int n = 0; n <= 10; ++n)
    for (char w = 'a'; w <= 'z'; ++w)
        v.push_back(DataObject{n, string(1, w)});
And get an initialized vector of DataObjects. Behind the scenes, Initializer will check whether rs already has data. If not, it will connect to the database and run a query like select input1, input2, output1 from ... where input1 between 1 and 10 and input2 between 'a' and 'z', and then initialize each DataObject with the output1 that matches its input1/input2 pair.
This would be utterly simple in C#, but from code samples I have found online it looks utterly ugly in C++. I am stuck on two things. First, as stated earlier, I am completely new to database interfaces in C++, and there are so many methods to choose from; I would like to home in on one that truly fits my purpose. Second, and this is the main point, I am trying to use a static data set to pull all the data in a single query, rather than run a new query for each input1/input2 combination. Even better: is there a way to have the database results written directly into the newly created DataObjects, without a pit stop in some temporary RecordSet object?
To summarize: I have data in a relational database, and I am trying to pull it and store it in a collection of objects. How do I do this? Any tips or direction would be much appreciated.
EDIT 8/16/17: After some research and trials I have come up with the following.
I've made progress by using an ADORecordset with put_CursorLocation set to adUseServer:
rs->put_CursorLocation(adUseServer);
My understanding is that by using this setting the query result is stored on the server, and the client side only gets the current row pointed to by rs.
So I get my data from the row and create the DataObject on the spot, emplace_back it into the vector, and finally call rs->MoveNext() to get the next row and repeat until I reach the end. Partial example as follows:
std::vector<DataObject> v;
DataObject::rs.Open(connString, Sql); // Connection for wrapper class
for (int n = 0; n <= 10; ++n)
    for (char w = 'a'; w <= 'z'; ++w)
        v.emplace_back(n, string(1, w)); // construct each DataObject in place
// Somewhere else...
double DataObject::Initializer(int a, string b) {
    int ra; string rb; double rc = 0.0;
    // For simplicity's sake, let's assume the result set is ordered
    // in the same way as the for-loop, and that no data is missing.
    // So the below sanity-check would be unnecessary, but included.
    while (!rs.IsEOF())
    {
        // Let's assume I defined these 'Get' functions
        ra = rs.Get<int>("Input1");
        rb = rs.Get<string>("Input2");
        rc = rs.Get<double>("Output1");
        rs.MoveNext();
        if (ra == a && rb == b) break;
    }
    return rc;
}
// Constructor for RecordSet:
RecordSet::RecordSet()
{
HRESULT hr = rs_.CoCreateInstance(CLSID_CADORecordset);
ATLENSURE_SUCCEEDED(hr);
rs_->put_CursorLocation(adUseServer);
}
Now I'm hoping that I interpreted how this works correctly; otherwise, this would be a whole lot of fuss over nothing. I am clearly not an ADO or .NET expert, but I'm hoping someone can confirm that this is indeed how it works, and perhaps shed some more light on the topic. On my end, I tested the memory usage with VS2015's diagnostic tool, and the heap seems to be significantly larger when using adUseClient. If my conjecture is correct, why would anyone choose adUseClient, or any of the other options, over adUseServer?
I can think of two options: storing each member in a column of a matching type, or using a BLOB.
For classes, I recommend one row per class instance with one column per member. Check which data types your database supports; most support common types such as integers, floating-point numbers and strings.
Another method is to use the BLOB (Binary Large OBject) data type. This is a "binary" data type used for storing data-as-is.
You can use the BLOB type for members that are of unsupported data types.
You can get more complicated by researching "Database Normalization" or "Database normal forms".
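A minimal sketch of the single-query idea from the question, independent of ADO: load every row once into a map keyed by (input1, input2), then construct each object with a lookup instead of a query. Row, build_index and make_objects are hypothetical names, the rows vector stands in for whatever the real query returns, and DataObject here is a simplified aggregate rather than the question's class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one row of the result set returned by the single query.
struct Row {
    int input1;
    std::string input2;
    double output1;
};

// Simplified stand-in for the question's DataObject (a plain aggregate).
struct DataObject {
    int input1;
    std::string input2;
    double output1;
};

using Key = std::pair<int, std::string>;

// Build the lookup table once from all rows returned by one query.
std::map<Key, double> build_index(const std::vector<Row>& rows) {
    std::map<Key, double> index;
    for (const Row& r : rows) {
        index[Key(r.input1, r.input2)] = r.output1;
    }
    return index;
}

// Create the objects with map lookups instead of one query per object.
// Combinations with no matching row are skipped.
std::vector<DataObject> make_objects(const std::map<Key, double>& index,
                                     int max_n, char last_letter) {
    std::vector<DataObject> objects;
    for (int n = 0; n <= max_n; ++n) {
        for (char w = 'a'; w <= last_letter; ++w) {
            std::string s(1, w);
            auto it = index.find(Key(n, s));
            if (it != index.end()) {
                objects.push_back(DataObject{n, s, it->second});
            }
        }
    }
    return objects;
}
```

Whether missing combinations should be skipped, defaulted, or reported is a design choice the question leaves open; skipping is just the simplest option for the sketch.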

Removing all occurrences of a given value from an array in D

Suppose that I have an array. I want to remove all the elements within the array that have a given value. Does anyone know how to do this? The value I am trying to remove may occur more than once and the array is not necessarily sorted. I would prefer to filter the array in-place instead of creating a new array. For example, removing the value 2 from the array [1, 2, 3, 2, 4] should produce the result [1, 3, 4].
This is the best thing I could come up with:
T[] without(T)(T[] stuff, T thingToExclude) {
    T[] result;
    foreach (thing; stuff) {
        if (thing != thingToExclude) {
            result ~= thing;
        }
    }
    return result;
}
stuff = stuff.without(thingToExclude);
writeln(stuff);
This seems unnecessarily complex and inefficient. Is there a simpler way? I looked at the std.algorithm module in the standard library hoping to find something helpful but everything that looked like it would do what I wanted was problematic. Here are some examples of things I tried that didn't work:
import std.stdio, std.algorithm, std.conv;
auto stuff = [1, 2, 3, 2, 4];
auto thingToExclude = 2;
/* Works fine with a hard-coded constant but compiler throws an error when
given a value unknowable by the compiler:
variable thingToExclude cannot be read at compile time */
stuff = filter!("a != " ~ to!string(thingToExclude))(stuff);
writeln(stuff);
/* Works fine if I pass the result directly to writeln but compiler throws
an error if I try assigning it to a variable such as stuff:
cannot implicitly convert expression (filter(stuff)) of type FilterResult!(__lambda2,int[]) to int[] */
stuff = filter!((a) { return a != thingToExclude; })(stuff);
writeln(stuff);
/* Mysterious error from compiler:
template to(A...) if (!isRawStaticArray!(A)) cannot be sliced with [] */
stuff = to!int[](filter!((a) { return a != thingToExclude; })(stuff));
writeln(stuff);
So, how can I remove all occurrences of a value from an array without knowing the indexes where they appear?
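std.algorithm.filter is pretty close to what you want, and your second try is nearly right. filter returns a lazy range (the FilterResult in the error message), not an array, so you'll want to either assign it to a new variable or convert it back to an array with the array() function: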
auto stuffWithoutThing = filter!((a) { return a != thingToExclude; })(stuff);
// use stuffWithoutThing
or
stuff = array(filter!((a) { return a != thingToExclude; })(stuff));
The first one does NOT create a new array. It returns a lazy range that iterates over stuff, skipping every element equal to thingToExclude.
The second one will allocate memory for a new array to hold the content. You must import the std.array module for it to work.
Look up the function remove in http://dlang.org/phobos/std_algorithm.html. There are two strategies, stable and unstable, depending on whether you want the remaining elements to keep their relative positions. Both strategies operate in place and have O(n) complexity. The unstable version does fewer writes.
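The two strategies can be illustrated outside D as well. The C++ sketch below is an analogue, not Phobos's implementation: remove_stable uses the standard erase-remove idiom, and remove_unstable (a hypothetical helper) fills each gap with the last element, which moves fewer elements but does not preserve order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stable: the remaining elements keep their relative order.
// std::remove shifts the kept elements forward; erase drops the leftover tail.
template <typename T>
void remove_stable(std::vector<T>& v, const T& value) {
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// Unstable (hypothetical helper): each match is overwritten by the last
// element, so fewer elements move, but the order is not preserved.
template <typename T>
void remove_unstable(std::vector<T>& v, const T& value) {
    std::size_t i = 0;
    while (i < v.size()) {
        if (v[i] == value) {
            v[i] = v.back();  // fill the gap with the last element
            v.pop_back();     // don't advance i: the moved element must be checked too
        } else {
            ++i;
        }
    }
}
```

Applied to [1, 2, 3, 2, 4] with value 2, remove_stable leaves [1, 3, 4], while remove_unstable leaves the same three values in a different order. In C++20, std::erase(v, value) performs the stable version in a single call.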
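If you want to remove the values in place, you can use remove:
auto stuffWithoutThing = remove!((a) { return a == thingToExclude; })(stuff);
This does not allocate a new array; it works in place. Note that the stuff range needs to be mutable, and that remove returns the shortened slice while stuff itself keeps its old length, so assign the result back (stuff = remove!(...)(stuff);) if you want stuff to hold only the remaining elements.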

Fast conversion of C/C++ vector to Numpy array

I'm using SWIG to glue together some C++ code to Python (2.6), and part of that glue includes a piece of code that converts large fields of data (millions of values) from the C++ side to a Numpy array. The best method I can come up with implements an iterator for the class and then provides a Python method:
def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())
The problem is that each iterator next call is very costly, since it has to go through about three or four SWIG wrappers. It takes far too long. I can guarantee that the C++ data are stored contiguously (since they live in a std::vector), and it just feels like Numpy should be able to take a pointer to the beginning of that data alongside the number of values it contains, and read it directly.
Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?
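You will want to define __array_interface__ instead. Note that it is an attribute holding a dict, not a method: its 'data' entry is a (pointer, read-only flag) tuple, its 'shape' and 'typestr' entries describe the layout, and 'version' is 3. This lets you pass back the pointer and the shape information directly, so NumPy can read the C++ memory without any per-element calls.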
Maybe it would be possible to use f2py instead of swig. Despite its name, it is capable of interfacing python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy
The advantage is that it handles the conversion to numpy arrays automatically.
Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.
If you wrap your vector in an object that implements Python's buffer interface, you can pass that to numpy.ndarray for initialization (see the docs; buffer is the third argument). I would bet that this initialization is much faster, since NumPy can use that memory directly instead of calling back into Python for every element.
So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:
%insert("python") %{
import numpy as np
%}
/*! Templated function to copy contents of a container to an allocated memory
* buffer
*/
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>
template < typename Container_T >
void copy_to_buffer(
const Container_T& field,
typename Container_T::value_type* buffer,
typename Container_T::size_type length
)
{
// ValidateUserInput( length == field.size(),
// "Destination buffer is the wrong size" );
// put your own assertion here or BAD THINGS CAN HAPPEN
if (length == field.size()) {
std::copy( field.begin(), field.end(), buffer );
}
}
//====
%}
%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
(int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {
res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
if ( res < 0 ) {
PyErr_Clear();
%argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
$symname, $argnum);
}
$1 = ($1_ltype) buffer_;
$2 = ($2_ltype) (size_/sizeof($*1_type));
}
%enddef
%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)
TYPEMAP_COPY_TO_BUFFER(CLASS)
%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;
%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}
%enddef
Then you can make a container "Numpy"-able with
%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);
Then in Python, just do:
# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )
This has only the overhead of a single Python <--> C++ translation call, not the N that would result from a typical length-N array.
A slightly more complete version of this code is part of my PyTRT project at github.
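The C++ half of this approach can be checked on its own, without SWIG or Python. The variant below is a sketch, not the answer's code: copy_to_buffer_checked is a hypothetical name, and unlike copy_to_buffer it reports a size mismatch by returning false instead of silently skipping the copy, so a wrapper could turn that into a Python exception.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Copy a container's contents into a caller-allocated buffer (for example
// the memory of a freshly created NumPy array). Returns false, without
// writing anything, if the buffer is missing or its length does not match.
template <typename Container_T>
bool copy_to_buffer_checked(const Container_T& field,
                            typename Container_T::value_type* buffer,
                            typename Container_T::size_type length) {
    if (buffer == nullptr || length != field.size()) {
        return false;
    }
    // One bulk copy of contiguous data, no per-element Python calls.
    std::copy(field.begin(), field.end(), buffer);
    return true;
}
```

Returning a status keeps the helper independent of the Python C API; the typemap or the generated __array__ method then decides how to report the error.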

Simplest lua function that returns a vector of strings

I need a very simple C++ function that calls a Lua function that returns an array of strings, and stores them in a C++ vector. The function can look something like this:
std::vector<string> call_lua_func(string lua_source_code);
(where lua source code contains a lua function that returns an array of strings).
Any ideas?
Thanks!
Here is some source that may work for you. It may need some more polish and testing. It expects that the Lua chunk is returning the array of strings, but with slight modification could call a named function in the chunk. So, as-is, it works with "return {'a'}" as a parameter, but not "function a() return {'a'} end" as a parameter.
extern "C" {
#include "../src/lua.h"
#include "../src/lauxlib.h"
}
#include <string>
#include <vector>
std::vector<std::string> call_lua_func(std::string lua_source_code)
{
    std::vector<std::string> list_strings;
    // create a Lua state (standard libraries are not opened here;
    // call luaL_openlibs from lualib.h if the chunk needs them)
    lua_State *L = luaL_newstate();
    if (!L)
        return list_strings;
    lua_settop(L, 0);
    // execute the string chunk; a non-zero result means a syntax or runtime error
    if (luaL_dostring(L, lua_source_code.c_str()) != 0)
    {
        lua_close(L);
        return list_strings;
    }
    // if only one return value, and value is a table
    if (lua_gettop(L) == 1 && lua_istable(L, 1))
    {
        // for each entry in the table (lua_objlen is Lua 5.1; use lua_rawlen in 5.2+)
        int len = (int)lua_objlen(L, 1);
        for (int i = 1; i <= len; i++)
        {
            // push table[i] onto the stack
            lua_pushinteger(L, i);
            lua_gettable(L, 1);
            // get table entry as string (numbers are converted, other types give NULL)
            const char *s = lua_tostring(L, -1);
            if (s)
                list_strings.push_back(s);
            // remove entry from stack
            lua_pop(L, 1);
        }
    }
    // destroy the Lua state
    lua_close(L);
    return list_strings;
}
First of all, remember that Lua tables can have keys of other types (strings, for example), not only integers; only the entries with integer keys 1 to n form the array part.
Then, you can load the Lua source code using luaL_loadstring. This only compiles the chunk; call lua_pcall to run it and get its return values.
At this point, all that is left is converting the returned table into a vector.
Now, you can use lua_istable to check whether a value is a table (array) and use lua_gettable to extract the fields (see http://www.lua.org/pil/25.1.html), adding them to the vector one by one.
If you cannot figure out how to deal with the stack, there are tutorials that can help. To find the number of elements, I found this mailing list post, which might be helpful.
Right now, I don't have Lua installed, so I can't test this information. But I hope it helps anyway.
Not really an answer to your question:
I've had a lot of trouble writing C++ <=> Lua interface code with the plain Lua C API. Then I tested many different Lua wrappers, and I really suggest luabind if you are trying to achieve anything more or less complex. It's possible to make types available to Lua in seconds, the support for smart pointers works great and (compared to other projects) the documentation is reasonably good.