In a recent project I am trying to write a simple wrapper around a C++ library (OpenCV), to be used in Julia by means of CxxWrap.
The basic case (where the arguments and return types are my own, rather simple, structs) is working.
The problem I have is with more complex data structures (defined in, let's say, OpenCV). In our case (I want to keep it simple to understand) I want to get information about the frame, so I execute:
using PyCall
const cv2 = pyimport("cv2")

module CppHello
using CxxWrap
@wrapmodule(joinpath(@__DIR__, "libhello"))
function __init__()
    @initcxx
end
end
cap = CppHello.openVideo()  # (*)
To the above I have two questions:
1. Do I have to explicitly define the type returned by openVideo()? Suppose for the moment that I want to use only my C++ library to start any of the OpenCV functions.
2. If "no" to the above, can I do something like this:
cap = cv2.VideoCapture(0)  # from the Python library
cap.isOpened() && (frm = cap.read())
The point is that I am interested in only a few operations on the frame, along with passing the returned value to other procedures (showing the frame on screen using C++, or saving it to a file).
The motivation is the low performance of imshow() executed at the Julia level through PyCall (in contrast to goodFeaturesToTrack or calcOpticalFlowPyrLK) and the drastically low FPS compared with C++.
Maybe there is another solution I have not noticed.
As I have a problem with (*), I thought that maybe I could simply write a struct (of known elements) of pointers to hold the data returned by the C++ functions?
As this is my first version of the question, I will be grateful for any feedback about its correctness and completeness.
I'm trying to implement a new optimizer that consists in large part of the Gradient Descent method (which means I want to perform a few Gradient Descent steps, then do different operations on the output, and then again). Unfortunately, I found two pieces of information:
1. You can't perform a given number of steps with the optimizers. Am I wrong about that? Because it would seem a logical option to add.
2. Given that 1 is true, you need to code the optimizer in C++ as a kernel, thus losing the powerful possibilities of TensorFlow (like computing gradients).
If both of them are true, then 2 makes no sense to me, and I'm trying to figure out what the correct way to build a new optimizer is (the algorithm and everything else are crystal clear).
Thanks a lot
I am not 100% sure about that, but I think you are right. However, I don't see the benefit of adding such an option to TensorFlow. The GD-based optimizers I know usually work like this:
for i in range(num_of_epochs):
    g = gradient_of_loss()
    some_storage = f(previous_storage, func(g))
    params = func2(previous_params, some_storage)
If you need to perform a couple of optimization steps, you can simply do it in a loop:
train_op = optimizer.minimize(loss)
for i in range(10):
    sess.run(train_op)
I don't think a parameter like multitrain_op = optimizer.minimize(loss, steps) was needed in the implementation of the current optimizers, and the end user can easily simulate it with a loop as above, so that was probably the reason it was not added.
Let's take a look at the TF implementation of an example optimizer, Adam: python code, c++ code.
The "gradient handling" part is processed entirely by inheriting optimizer.Optimizer in the Python code. The Python code only defines the types of storage to hold the moving window averages, squares of gradients, etc., and executes the C++ code, passing it the already calculated gradient.
The C++ code has 4 lines, updating the stored averages and parameters.
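For reference, those four lines implement the textbook Adam rule; here is a scalar Python sketch of the standard update (an illustration, not a copy of TF's kernel):
from math import sqrt

def adam_update(var, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam step for a single scalar parameter.
    m = beta1 * m + (1 - beta1) * grad             # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad      # moving average of its square
    lr_t = lr * sqrt(1 - beta2 ** t) / (1 - beta1 ** t)  # bias correction
    var = var - lr_t * m / (sqrt(v) + eps)
    return var, m, v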
So, to your question "how to build an optimizer" (a minimal sketch follows the list):
1. Define what you need to store between calculations of the gradient.
2. Inherit optimizer.Optimizer.
3. Implement updating the variables in C++.
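A minimal sketch of these three steps, assuming the TF 1.x tf.train.Optimizer API; MyOptimizer and its momentum-style update are purely illustrative, and step 3 is done here with ordinary Python ops rather than a C++ kernel (simpler, at some cost in speed):
import tensorflow as tf  # TF 1.x

class MyOptimizer(tf.train.Optimizer):
    # Toy momentum-style rule; purely illustrative.

    def __init__(self, learning_rate=0.01, use_locking=False, name="MyOptimizer"):
        super(MyOptimizer, self).__init__(use_locking, name)
        self._lr = learning_rate

    def _create_slots(self, var_list):
        # Step 1: declare the storage kept between gradient computations.
        for v in var_list:
            self._zeros_slot(v, "accum", self._name)

    def _apply_dense(self, grad, var):
        # Step 3, written with Python ops instead of a dedicated C++ kernel.
        accum = self.get_slot(var, "accum")
        accum_t = tf.assign(accum, 0.9 * accum + grad,
                            use_locking=self._use_locking)
        var_t = tf.assign_sub(var, self._lr * accum_t,
                              use_locking=self._use_locking)
        return tf.group(accum_t, var_t)
An instance then plugs in like any built-in optimizer, e.g. train_op = MyOptimizer(0.01).minimize(loss).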
Consider the following goal:
Create a program that solves: minimize f(x) for an arbitrary f and x supplied as input.
How could one design a C++ program that could receive a description of f and x and process it efficiently?
If the program were actually a C++ library, then one could explicitly write the code for f and x (probably inheriting from some base function class for f and a state class for x).
However, what should one do if the program is for example a service, and the user is sending the description of f and x in some high level representation, e.g. a JSON object?
Ideas that come to mind
1- Convert f into an internal function representation (e.g. a list of basic operations). Apply those whenever f is evaluated.
Problems: inefficient unless each operation is a batch operation (e.g. if we are doing vector or matrix operations with large vectors / matrices).
2- Somehow generate and compile C++ code for representing x and computing f. Is there a way to restrict compilation so that only that code needs to be compiled, while the rest of the program is already 'pre-compiled'?
The usual approach, used by the mp library and others, is to create an expression tree (or DAG) and apply some kind of nonlinear optimization method, which normally relies on derivative information that can be computed using automatic or numeric differentiation.
An expression tree can be efficiently traversed for evaluation using a generic visitor pattern. Using a JIT might be overkill unless evaluating the function takes a substantial fraction of the optimization time.
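To illustrate the idea, here is a minimal sketch of an expression tree evaluated through a visitor, in Python for brevity (the node and visitor names are made up, not taken from the mp library):
class Num:
    def __init__(self, value): self.value = value
    def accept(self, visitor): return visitor.visit_num(self)

class Var:
    def __init__(self, name): self.name = name
    def accept(self, visitor): return visitor.visit_var(self)

class Add:
    def __init__(self, left, right): self.left, self.right = left, right
    def accept(self, visitor): return visitor.visit_add(self)

class Mul:
    def __init__(self, left, right): self.left, self.right = left, right
    def accept(self, visitor): return visitor.visit_mul(self)

class Evaluator:
    # One visitor evaluates; another could differentiate or pretty-print.
    def __init__(self, env): self.env = env  # variable name -> value
    def visit_num(self, n): return n.value
    def visit_var(self, v): return self.env[v.name]
    def visit_add(self, a): return a.left.accept(self) + a.right.accept(self)
    def visit_mul(self, m): return m.left.accept(self) * m.right.accept(self)

# f(x) = x*x + 3, as it might be built from a JSON description
f = Add(Mul(Var("x"), Var("x")), Num(3.0))
print(f.accept(Evaluator({"x": 2.0})))  # 7.0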
I'm using the LPSolve IDE to solve an LP problem. I have to test the model against about 10 or 20 sets of different parameters and compare them.
Is there any way for me to keep the general model, but to specify the constants as I wish? For example, if I have the following constraint:
A >= [c]*B
I want to test how the model behaves when [c] = 10, [c] = 20, and so on. For now, I'm simply preparing different .lp files via search&replace, but:
a) it doesn't seem too efficient
b) at some point, I need to consider a constraint of the form A >= B/[c] (= (1/[c])*B). It seems, however, that LPSolve doesn't recognize the division operator. Is specifying 1/[c] directly each time the only option?
It is not completely clear which format you use with lp_solve. With the CPLEX LP format, for example, there is no better way: you cannot use division for a coefficient (or even multiplication, for that matter), and there is no function to 'include' another file or to introduce a symbolic name for a parameter. It is a very simple language, and not suitable for any complex task.
There are several solutions to your problem; it depends on whether you are interested in something fast to implement, or in something 'clean', reusable, and with a short runtime (of course, this is a compromise).
You have the possibility to generate your lp files from another language, e.g. Python, bash, etc. This is a 'quick and dirty' solution: very slow at runtime, but probably the fastest to implement; a minimal sketch follows.
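For example, in Python (the file names and the [c]/[1/c] placeholders follow the question's search-and-replace idea; they are not anything lp_solve itself prescribes):
# Generate one .lp file per parameter value by plain text substitution.
with open("model_template.lp") as f:
    template = f.read()  # contains placeholders such as [c] and [1/c]

for c in (10, 20, 30):
    model = template.replace("[1/c]", repr(1.0 / c))  # division, precomputed
    model = model.replace("[c]", str(c))              # plain coefficient
    with open("model_c%d.lp" % c, "w") as out:
        out.write(model)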
Like every LP solver I know, lp_solve comes with several modelling interfaces: you can for example use the GNU MathProg format instead of the current one. It recognizes multiplication, division, conditionals, etc. (everything you are looking for; see section 3.1, 'Numeric expressions').
Finally, you have the possibility to use the lp_solve API directly from another programming language (e.g. C), which is the most flexible option, but it may require a little more work.
See the lp_solve documentation for more details on the supported input formats and the API reference.
I'm not sure how to pose this question, but here it goes:
When programming my Atmel MCUs in C++, I tend to mix the 'program' variables and the 'user' variables in the same data memory. Over time this becomes a hassle, because I want to make a few presets that can be loaded or saved, and I do not want the 'program' variables saved, because the program will generate the correct values based on the 'user' values. Is it common practice to split these between memory areas? E.g. timercounter in PGM memory, thresholdByUser in DATA memory?
In my program I've made several different functions which each have their own set of user variables.
E.g.: settings has 5 user variables, generator has 6 user variables, etc.
Would you make one big array and then use #define generatorSpeed 1, #define settingsBacklight 2 as indices, so you could access them as Array[generatorSpeed], Array[settingsBacklight], or would you still split it up and collect them in a struct or so?
Working in Atmel Studio 4.0 with an ATmega644 on an STK500.
Thanks for all the help you can give!
Assuming you mean AT(X)mega when referring to Atmel MCUs: IIRC it depends which compiler suite you are using. With gcc, if you have something like a static int, it will go to PGM memory and be copied to RAM when the program starts. Hence, if you want your variables not to be in PGM memory, you must make them stack or heap variables. Constants and statics will always reside in both. If you want PGM-only constants, you can specify that, but this requires special read operations.
For question 2, I'd use const int& settingX = array[Xoffset] instead of a define. But that assumes there's some need to iterate through the array; otherwise I'd just define separate variables.
I'm working on a program that will update a list of objects every 0.1 seconds. After the program finishes updating the list, it will be aware whether any object is within a certain distance of any other object. Every object has an X,Y position on a graph and a value known as 'Range'. Every tick (0.1 s), the program will use the distance formula to check whether any other object is within the range of the object being processed.
For instance, if point A has a range of 4 and is at (1,1), and point B is at (1,2), the distance formula will return 1, meaning point B is within range of point A. The calculation will look similar to this:
objects = {
    A = {X = 1, Y = 1, Range = 4},
    B = {X = 1, Y = 2, Range = 3},
    C = {X = 4, Y = 7, Range = 9},
}

while true do
    for i, v in pairs(objects) do
        v:CheckDistance()
    end
    wait()
end

-- Point:CheckDistance() calculates the distance of all other points from Point "self".
-- Returns true if a point is within range of the Point "self", otherwise false.
The Problem:
The graph may contain over 200 points, and each point would have the math applied to it for every other point that exists. This will occur for every point every 0.1 s. I imagine this may slow things down or create lag in the 3D environment I am using.
Question:
Does this sound like the optimal way to do this?
What are your ideas on how this should be done more efficiently/quickly?
As Alex Feinamn said: it seems you are making your own collision detector, albeit a primitive one.
I'm not sure whether your points are on a 2D or a 3D plane, however: you say every object "has an X,Y position on a graph" and further on talk about "lag in the 3D environment I am using."
Well, both 2D and 3D physics, as well as Lua, are well-developed fields, so there is no shortage of optimisations.
Spatial Trees
A quadtree (or octree for 3D) is a data structure that represents your entire 2D world as a square divided into four squares, which are each divided into four squares, and so on.
You can experiment with an interactive example yourself at this handy site.
Spatial trees in general provide very fast access for localised points.
[Image: a quadtree in which circles represent the interaction radius of a particular particle; it is easy to see exactly which branches need to be traversed.]
When dealing with point clouds, you need to ensure that two points do not share the same location, or that there is a maximum division depth to your tree; otherwise, it will attempt to divide branches infinitely.
I don't know of any octree implementations in Lua, but it would be pretty easy to make one. If you need examples, look for a Python or C implementation; do not look for one in C++, unless you can handle the template-madness.
Alternatively, you can use a C or C++ implementation via Lua API bindings or a FFI library (recommended, see binding section).
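For example, here is a minimal quadtree sketch in Python (a capacity limit, a depth cap to avoid the infinite-division problem mentioned above, and a circle query; all names and constants are illustrative):
class Quadtree:
    MAX_POINTS = 4   # split a node once it holds more than this
    MAX_DEPTH = 8    # guard against infinite subdivision of coincident points

    def __init__(self, x, y, size, depth=0):
        self.x, self.y, self.size, self.depth = x, y, size, depth
        self.points = []
        self.children = None  # four sub-squares once subdivided

    def insert(self, px, py):
        if self.children is not None:
            self._child_for(px, py).insert(px, py)
            return
        self.points.append((px, py))
        if len(self.points) > self.MAX_POINTS and self.depth < self.MAX_DEPTH:
            self._subdivide()

    def _subdivide(self):
        half = self.size / 2
        self.children = [
            Quadtree(self.x + dx * half, self.y + dy * half, half, self.depth + 1)
            for dx in (0, 1) for dy in (0, 1)
        ]
        points, self.points = self.points, []
        for px, py in points:
            self._child_for(px, py).insert(px, py)

    def _child_for(self, px, py):
        half = self.size / 2
        i = (2 if px >= self.x + half else 0) + (1 if py >= self.y + half else 0)
        return self.children[i]

    def query(self, px, py, radius, found=None):
        # Collect points within `radius` of (px, py), skipping whole
        # branches whose squares do not touch the query circle.
        if found is None:
            found = []
        if not self._touches(px, py, radius):
            return found
        found.extend(p for p in self.points
                     if (p[0] - px) ** 2 + (p[1] - py) ** 2 <= radius ** 2)
        if self.children is not None:
            for child in self.children:
                child.query(px, py, radius, found)
        return found

    def _touches(self, px, py, radius):
        # Distance from the circle centre to this square, clamped to its bounds.
        nx = min(max(px, self.x), self.x + self.size)
        ny = min(max(py, self.y), self.y + self.size)
        return (nx - px) ** 2 + (ny - py) ** 2 <= radius ** 2
Usage mirrors the question's data: build a tree covering the world, insert every point, then query(A.x, A.y, A.range) instead of testing A against all 200 points.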
LuaJIT
LuaJIT is a custom Lua 5.1 interpreter and just-in-time compiler that provides significant speed and storage optimisations as well as an FFI library that allows for easy and efficient use of C functions and types, such as integers.
Using C types to represent your points and spatial tree will significantly improve performance.
local ffi = require"ffi"
ffi.cdef[[
// gp = graphing project
struct gp_point_s {
double x, y;
double range;
};
struct gp_quadtree_root_s {
// This would be extensive
};
struct gp_quadtree_node_s {
//
};
]]
gp_point_mt = {
    __add = function(a, b)
        return gp_point(a.x + b.x, a.y + b.y)
    end,
    __tostring = function(self)
        return self.x..", "..self.y
    end,
    __index = {
        -- I couldn't think of anything you might need here!
        something = function(self) return self.range^27 end,
    },
}
gp_point = ffi.metatype("struct gp_point_s", gp_point_mt)
-- Now use gp_point at will
local p = gp_point(22.5, 5.4, 6)
print(p)
print(p+gp_point(1, 1, 0))
print(p:something())
LuaJIT will compile any runtime usage of gp_point to native assembly, meaning C-like speeds in some cases.
Lua API vs FFI
This is a tricky one...
Calls via the Lua API cannot be properly optimised, as they have authority over the Lua state.
Raw calls to C functions via LuaJIT's FFI, on the other hand, can be fully optimised.
It's up to you to decide how your code should interoperate:
Directly within the scripts (Lua, limiting factor: dynamic languages can only be optimised to a certain extent)
Scripts -> Application bindings (Lua -> C/C++, limiting factor: Lua API)
Scripts -> External libraries (Lua -> C, limiting factor: none, FFI calls are JIT compiled)
Delta time
Not really an optimisation, but it's important.
If you're making an application designed for user interaction, then you should not fix your time step; that is, you cannot assume that every iteration takes exactly 0.1 seconds. Instead, you must multiply all time-dependent operations by the elapsed time (delta):
pos = pos+vel*delta
vel = vel+accel*delta
accel = accel+jerk*delta
-- and so on!
However, this is a physics simulation; there are distinct issues with both fixed and variable time steps for physics, as discussed by Glenn Fiedler:
Fix your timestep or explode
... If you have a series of really stiff spring constraints for shock absorbers in a car simulation then tiny changes in dt can actually make the simulation explode. ...
If you use a fixed time step, the simulation should theoretically run identically every time. If you use a variable time step, it will be very smooth but unpredictable. I'd suggest asking your professor. (This is a university project, right?)
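For reference, here is a minimal Python sketch of the fixed-step-with-accumulator pattern from Fiedler's article (step_simulation and render are hypothetical stubs standing in for your own code):
import time

DT = 0.1  # fixed simulation step, as in the question

def step_simulation(dt):
    pass  # stub: advance the physics by exactly dt

def render():
    pass  # stub: draw the current state

accumulator = 0.0
previous = time.monotonic()
while True:
    now = time.monotonic()
    accumulator += now - previous
    previous = now
    # Consume elapsed real time in fixed-size simulation steps.
    while accumulator >= DT:
        step_simulation(DT)
        accumulator -= DT
    render()  # rendering can run at a different rate from the simulation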
I don't know whether it's possible within your given circumstances, but I'd definitely use events rather than looping: track when a point changes its position and react to that. This is much more efficient, as it needs less processing and reacts to position changes faster than polling every 0.1 seconds. You should probably put in some cap on calls per unit of time if your points float around continuously, because then these events would fire very often.
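A minimal sketch of that idea, in Python for brevity (the class names and the 0.1 s cap are illustrative): points notify the world when they move, and a per-point rate cap keeps frequently moving points from triggering a check on every tiny change.
import time

class World:
    def __init__(self):
        self.points = []

    def check_distance(self, moved):
        # Only the point that actually moved is re-checked.
        for other in self.points:
            if other is not moved:
                d2 = (other.x - moved.x) ** 2 + (other.y - moved.y) ** 2
                if d2 <= moved.range ** 2:
                    pass  # "other" is within range of "moved": react here

class Point:
    def __init__(self, world, x, y, range_):
        self.world, self.x, self.y, self.range = world, x, y, range_
        self._last_check = 0.0
        world.points.append(self)

    def move_to(self, x, y):
        self.x, self.y = x, y
        # Rate cap: react to movement at most every 0.1 s per point.
        now = time.monotonic()
        if now - self._last_check >= 0.1:
            self._last_check = now
            self.world.check_distance(self)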