ANN training progress resets with every new training session using FANN (C++)

I have a standard neural network that I have trained for some time, but not to perfection. After the training session is complete, I save the network to disk.
After some time, I want to resume training the network from where it left off. The problem is that every time I start a new training session, the weights and biases seem to be completely reset, which means I'm training the network from scratch all over again:
Previous session:
New session:
Here is the excerpt from my training function:
void trainNet(fann *net) {
    const unsigned int
        max_epochs = 1000,
        epochs_between_reports = 10;
    const float desired_error = 0.01f;
    // Public setter instead of writing the struct field directly
    fann_set_learning_momentum(net, 0.1f);
    fann_train_on_file(net, "", max_epochs, epochs_between_reports, desired_error);
    fann_save(net, "");
}
What am I missing? It seems so intuitive to me that you could train a network over a span of multiple sessions. Am I wrong? Is it a limitation of the library?
The training data has remained constant between sessions. This isn't limited to this specific network, either; networks of any topology show the same issue.

What am I missing?
As per the documentation (FANN Training > Training Data Manipulation > fann_set_training_algorithm):
Set the training algorithm.
Example:
fann_set_training_algorithm(net, FANN_TRAIN_INCREMENTAL);
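Whatever training algorithm is chosen, continuing from where a previous session left off also requires that the next session start from the saved weights: the net passed to trainNet() should come from fann_create_from_file() on the saved file, not from a fresh fann_create_standard() call, which initialises new random weights. The self-contained toy below is not FANN code; it is a hypothetical one-weight linear model (ToyModel, trainSession, saveModel and loadModel are illustrative names) that sketches the save/load/resume cycle. The error measured right after loading equals the error at the end of the previous session, and further training keeps lowering it.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Toy stand-in for a network (NOT FANN): a single weight w, model y = w * x.
struct ToyModel {
    double w = 0.0;
};

// Mean squared error of the model over the (x, y) training pairs.
double mse(const ToyModel& m, const std::vector<double>& xs, const std::vector<double>& ys) {
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double d = m.w * xs[i] - ys[i];
        sum += d * d;
    }
    return sum / static_cast<double>(xs.size());
}

// One training "session": a fixed number of full-batch gradient-descent epochs.
void trainSession(ToyModel& m, const std::vector<double>& xs, const std::vector<double>& ys,
                  int epochs, double learningRate) {
    for (int e = 0; e < epochs; ++e) {
        double grad = 0.0;
        for (std::size_t i = 0; i < xs.size(); ++i)
            grad += 2.0 * (m.w * xs[i] - ys[i]) * xs[i];
        m.w -= learningRate * grad / static_cast<double>(xs.size());
    }
}

// Checkpoint: write the learned state to disk with full double precision.
bool saveModel(const ToyModel& m, const char* path) {
    std::ofstream out(path);
    out.precision(17);
    out << m.w;
    return static_cast<bool>(out);
}

// Restore: a resumed session must start from this, not from a fresh model.
bool loadModel(ToyModel& m, const char* path) {
    std::ifstream in(path);
    return static_cast<bool>(in >> m.w);
}
```

In FANN terms, loadModel plays the role of fann_create_from_file() and saveModel that of fann_save(); a fresh ToyModel{} corresponds to creating a new network, which is exactly what makes training appear to start from scratch.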


Kernel automatically restarts after loading data when using AI Platform on Google Cloud Platform

I'm trying to load 600 MB of data in an AI Platform notebook.
The data loading was fine at first, but right after loading completes, the kernel restarts automatically. I had successfully loaded the data before; the issue appeared after I added some preprocessing of the images while loading the data.
I'm wondering if I have done something wrong to cause this, since I'm new to GCP. I have tried setting up more RAM, but it still doesn't work. Here is the code that triggers the problem:
import random
import cv2

for i in random.sample(items, 5600):
    j += 1
    img = cv2.imread(PATH + "C1-P1_Train/" + labels[i][0])
    img = cv2.resize(img, size)
    img = cv2.resize(preprocess(img), size)
print("Import successfully!")
Thanks for your help
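A back-of-envelope memory check may help here. The 600 MB figure is the size of the compressed image files on disk; once cv2.imread decodes them, each image occupies height x width x 3 bytes as uint8, or 8 bytes per value if the preprocessing converts it to float64 (NumPy's default floating-point type). If the decoded images are kept in a list (that step is not shown in the excerpt), memory grows accordingly. The helper below (decodedBytes and toGiB are illustrative names) computes that footprint; the 512 x 512 value used for `size` in the usage note is a hypothetical assumption, since the question does not state it.

```cpp
#include <cstdint>

// Bytes needed to hold `count` decoded images of width x height x channels,
// with `bytesPerValue` bytes per channel value (1 = uint8, 4 = float32, 8 = float64).
std::uint64_t decodedBytes(std::uint64_t count, std::uint64_t width, std::uint64_t height,
                           std::uint64_t channels, std::uint64_t bytesPerValue) {
    return count * width * height * channels * bytesPerValue;
}

// Convert a byte count to GiB (2^30 bytes).
double toGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For 5600 images at a hypothetical 512 x 512 x 3, decodedBytes(5600, 512, 512, 3, 1) is 4,404,019,200 bytes (about 4.1 GiB) as uint8, and eight times that (about 32.8 GiB) as float64, well above the on-disk size.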

OpenVINO GPU performance optimization

I'm trying to speed up inference in a people-counter application. To use the GPU, I've set the Inference Engine configuration as follows:
std::string device_name = "GPU";
ie.SetConfig({ {PluginConfigParams::KEY_CONFIG_FILE, "./cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml"} }, device_name);
and when loading the network into the Inference Engine, I've set the target device as follows:
CNNNetwork network = netReader.getNetwork();
TargetDevice t_device = InferenceEngine::TargetDevice::eGPU;
const std::map<std::string, std::string> dyn_config = { { PluginConfigParams::KEY_DYN_BATCH_ENABLED, PluginConfigParams::YES } };
ie_.LoadNetwork(network,device_name, dyn_config);
but the Inference Engine still uses the CPU, which slows down inference. Is there a way to use the Intel GPU at full capacity for inference on a particular network? I'm using the person-detection-retail-0013 model.
Did you mean person-detection-retail-0013? I couldn't find pedestrian-detection-retail-013 in the open_model_zoo repo.
A slowdown when using the GPU may be expected. The network you tested has the following layers in its topology: PriorBox and DetectionOutput. Those layers are executed on the CPU, as the documentation says.
I suspect this may be the reason for the slowdown.
But to be 100% sure, I would suggest running the benchmark_app tool to benchmark the model. This tool can print detailed performance information about each layer, which should help shed light on the real root cause of the slowdown. More information about benchmark_app can be found here:
PS: A piece of advice regarding use of the IE API: network.setTargetDevice(t_device); is a deprecated method. It is enough to set the device in LoadNetwork, as in your example: ie_.LoadNetwork(network, device_name, dyn_config);
Hope it will help.

Significant RAM cost while running TensorFlow in C++

I'm currently writing some inference code that uses a trained TensorFlow graph on a GPU machine with the C++ API.
Here are my settings:
Platform: CentOS 7
TensorFlow Version: TensorFlow 1.5
CUDA Version: CUDA 9.0
C++ Version: C++11
There are a couple of questions that I'm struggling with.
1) First, I followed this tutorial to learn the basic template for loading a graph in C++. The example in this tutorial is quite simple, but the program takes almost 0.9 GB of RAM when I run it (on a GPU machine).
2) My graph is far more complicated than the one in that tutorial. There are approximately 20 layers, and the number of nodes per layer varies from 300 to 5000.
My (pseudo) code snippet is here. For simplicity, I only keep the code that causes (potential) memory issue:
tensorflow::Tensor input = getDataFromSomewhere(...);
int length = static_cast<int>(input.dim_size(0));  // number of rows in the input
int g_batch_size = 50;

// 1) Create session...
// 2) Load graph...

// 3) Inference
for (int x = 0; x < length; x += g_batch_size) {
    tensorflow::Tensor out;
    // Rows [x, min(x + batch, length)); Slice() shares the input's buffer
    auto cur_slice = input.Slice(x, std::min(x + g_batch_size, length));
    inference(cur_slice, out);
    // doSomethingWithOutput(out);
}

// 4) Close session and free session memory

// Inference helper function
tensorflow::Status inference(tensorflow::Tensor& input_tensors, tensorflow::Tensor& out) {
    // This line increases memory usage a lot
    TensorDict feed_dict = {{"IteratorGetNext:0", input_tensors}};
    std::vector<tensorflow::Tensor> outputs;
    tensorflow::Status status = session->Run(feed_dict, {"final_dense:0"}, {}, &outputs);
    if (!status.ok()) return status;  // propagate errors instead of always reporting OK
    out = outputs[0];                 // hand the fetched "final_dense:0" tensor back
    return tensorflow::Status::OK();
}
After I create the session and load the graph, memory usage is around 1.2 GB.
Then, as noted in my code, when the program reaches session->Run(...), memory usage goes up to more than 2 GB.
I'm not sure whether this is normal behavior for TensorFlow. I've checked these two threads, but I still don't know whether I created redundant ops in my code.
Any comments or suggestions are appreciated! Thanks in advance!
The issue I found was that the TensorFlow dynamic libraries take about 200 MB and the CUDA dynamic libraries more than 500 MB of memory. So just loading those libraries already takes a large amount of memory.
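To put the jump of roughly 0.8 GB during session->Run into perspective, the activations of one batch can be bounded from the numbers in the question: 50 rows per batch, about 20 layers, at most 5000 nodes per layer. The sketch below assumes dense layers with float32 outputs and treats every layer as being as wide as the widest one, so it overestimates; activationBytesUpperBound is an illustrative name.

```cpp
#include <cstdint>

// Upper bound on float32 activation memory for one forward pass through a
// stack of dense layers, assuming every layer is as wide as the widest one.
std::uint64_t activationBytesUpperBound(std::uint64_t batch, std::uint64_t layers,
                                        std::uint64_t maxNodesPerLayer) {
    const std::uint64_t bytesPerFloat = 4;
    return batch * layers * maxNodesPerLayer * bytesPerFloat;
}
```

activationBytesUpperBound(50, 20, 5000) is 20,000,000 bytes (about 19 MiB), far below the growth from 1.2 GB to over 2 GB. That is consistent with most of the increase coming from the loaded TensorFlow and CUDA libraries rather than from the batch data itself.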

Inserting rows on BigQuery: InsertAllRequest Vs BigQueryIO.writeTableRows()

When I insert rows into BigQuery using writeTableRows, performance is really bad compared to InsertAllRequest. Clearly, something is not set up correctly.
Use case 1: I wrote a Java program that processes the Twitter 'sample' stream using Twitter4j. When a tweet comes in, I write it to BigQuery using this:
When I run this program from my Mac, it inserts about 1000 rows per minute directly into a BigQuery table. I thought I could do better by running a Dataflow job on the cluster.
Use case 2: When a tweet comes in, I write it to a Google Cloud Pub/Sub topic. I run this from my Mac, which sends about 1000 messages every minute.
I wrote a Dataflow job that reads this topic and writes to BigQuery using BigQueryIO.writeTableRows(). I have an 8-machine Dataproc cluster. I started this job on the master node of this cluster with DataflowRunner. It's unbelievably slow! Around 100 rows every 5 minutes or so. Here's a snippet of the relevant code:
statuses.apply("ToBQRow", ParDo.of(new DoFn<Status, TableRow>() {
    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
        TableRow row = new TableRow();
        Status status = c.element();
        row.set("Id", status.getId());
        row.set("Text", status.getText());
        row.set("RetweetCount", status.getRetweetCount());
        row.set("FavoriteCount", status.getFavoriteCount());
        row.set("Language", status.getLang());
        row.set("ReceivedAt", null);
        row.set("UserId", status.getUser().getId());
        // getPlace() returns null for tweets without location data
        if (status.getPlace() != null) {
            row.set("CountryCode", status.getPlace().getCountryCode());
            row.set("Country", status.getPlace().getCountry());
        }
        c.output(row);  // emit the row so the write step receives it
    }
}))
.apply("WriteTableRows", BigQueryIO.writeTableRows().to(tweetsTable)//
What am I doing wrong? Should I use a 'SparkRunner'? How do I confirm that it's running on all nodes of my cluster?
With BigQuery you can either:
Stream data in: low latency, up to 100k rows per second, has a cost.
Batch data in: much higher latency, incredible throughput, totally free.
That's the difference you are experiencing. If you only want to ingest 1000 rows, batching will be noticeably slower. With 10 billion rows, batching will be much faster, and at no cost.
Dataflow/Beam's BigQueryIO.writeTableRows can either stream or batch data in.
By using BigQueryIO.Write.Method.FILE_LOADS, the pasted code is choosing batch loading.

Effective way of reducing data for real-time plot

I am developing a scientific application in Windows Forms (VC++ 2010) that controls a relatively new electronic device. I control it through an additional wrapped library written in C. After the initial setup of all parameters, the application triggers a measurement in the device. The device then sends my app large datasets of over 200k int samples at a high rate; let's assume 50 datasets per second.
Now I need to plot the data in real time using a Windows Forms chart. Ideally, 750 samples would be plotted in the chart at about 30 FPS. The problem is finding a fast algorithm to reduce the data without losing the fidelity of the plot.
My ideas (the data oscillates around a value of 127):
Choose 750 points by simply selecting every (200 000 / 750)th point
Group the data and calculate the mean of each group
Group the data and select the maximum or minimum of each group (based on the group's overall placement: if most samples are above 127, select the minimum, else the maximum).
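Option 3 keeps only one extreme per group, which can drop the opposite peak of an oscillation inside that group. A common variant, min/max decimation (a swapped-in technique, not one of the three ideas above), keeps both the minimum and the maximum of each group in time order; with 375 groups this yields the 750 points targeted above. The sketch assumes the samples of one dataset are available as a std::vector<int>; minMaxDecimate is an illustrative name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Min/max decimation: split `samples` into `buckets` contiguous groups and keep
// both the minimum and the maximum of each group, in the order they occur, so
// short peaks in either direction survive. Returns 2 * buckets values (fewer
// buckets are used if there are fewer samples than buckets).
std::vector<int> minMaxDecimate(const std::vector<int>& samples, std::size_t buckets) {
    std::vector<int> out;
    const std::size_t n = samples.size();
    if (n == 0 || buckets == 0) return out;
    if (buckets > n) buckets = n;
    out.reserve(2 * buckets);
    for (std::size_t b = 0; b < buckets; ++b) {
        // b * n / buckets spreads the remainder evenly when n is not a multiple
        // of buckets (200 000 / 375 is not a whole number).
        const std::size_t first = b * n / buckets;
        const std::size_t last = (b + 1) * n / buckets;  // exclusive, always > first
        const auto mm = std::minmax_element(samples.begin() + static_cast<std::ptrdiff_t>(first),
                                            samples.begin() + static_cast<std::ptrdiff_t>(last));
        // Emit the two extremes in time order so the plotted line stays faithful.
        if (mm.first <= mm.second) {
            out.push_back(*mm.first);
            out.push_back(*mm.second);
        } else {
            out.push_back(*mm.second);
            out.push_back(*mm.first);
        }
    }
    return out;
}
```

For one 200 000-sample dataset, minMaxDecimate(samples, 375) returns 750 values, and an isolated spike is never lost because it is the extreme of its own group. The cost is a single linear pass over the data.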
Which of these solutions (if any) is best, considering that I have to plot data at real-time speed and the plot should not miss spots where there is any significant signal (which looks like a kind of narrowed, modulated sine wave)? Is there a better approach?
And the last question: should I use an array of pointers into my huge data buffer, or copies of the data, as the plot's data source, considering that the collected data always lives in the same buffer (the device constantly overwrites it with new data)?
This is my first post, so please let me know if there is anything wrong with its style.
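On the pointer-versus-copy question: because the device keeps overwriting the same buffer, pointers into it can show a mix of old and new samples while the chart is being drawn. One common pattern, sketched below under the assumption that the acquisition callback and the plotting code run on different threads, is to copy each completed dataset out of the device buffer under a mutex and let the plot work on its own snapshot. SnapshotBuffer, publish and takeIfNewer are hypothetical names, and whether the copy in publish happens before the device overwrites its buffer depends on the wrapped library's callback semantics.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper: the acquisition side publishes a complete dataset,
// the plotting side takes a private copy of the most recent one.
class SnapshotBuffer {
public:
    // Acquisition thread: copy out of the device buffer before it is overwritten.
    void publish(const int* data, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_.assign(data, data + count);
        ++version_;
    }

    // Plotting thread: copies the latest dataset into `out` and returns true,
    // or returns false if nothing new was published since `seenVersion`.
    bool takeIfNewer(std::vector<int>& out, unsigned long long& seenVersion) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (version_ == seenVersion) return false;
        out = latest_;  // stable snapshot, unaffected by later device writes
        seenVersion = version_;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<int> latest_;
    unsigned long long version_ = 0;
};
```

Copying 200k ints is about 800 KB per dataset, so at 50 datasets per second the copies amount to roughly 40 MB/s; the snapshot can then be decimated and plotted at the chart's own rate.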
I developed an application that reads data at 256 Hz (256 samples per second) from 16 channels and displays it in 16 different charts. The best way I found to plot all the data in real time was to use a separate thread to update the plots. Here is my solution (in C#), which might be useful for you too.
When new data is read, it is stored in a list or array. Since it is real-time data, the timestamps are also generated here from the sample rate of the acquired data: timeStamp = timeStamp + sampleIdx/sampleRate;
public void OnDataRead(object source, EEGEventArgs e) {
    if ((e.rawData.Length > 0) && (!_shouldStop)) {
        lock (_bufferRawData) {
            for (int sampleIdx = 0; sampleIdx < e.rawData.Length; sampleIdx++) {
                // Append data
                _bufferRawData.Add(e.rawData[sampleIdx]);
                // Calculate corresponding timestamp
                float secondsToAdd = (float) sampleIdx / e.sampleRate;
                // Append corresponding timestamp
                _bufferXValues.Add(e.timeStamp.AddSeconds(secondsToAdd));
            }
        }
    }
}
Then, create a thread that sleeps for N ms between updates (100 ms works for me when displaying 2 seconds of data, but to display 10 seconds I need to increase the thread's sleep time to 500 ms):
// Create thread
// Define a thread that adds values to the chart
ThreadStart addDataThreadObj = new ThreadStart(AddDataThreadLoop);
_addDataRunner = new Thread(addDataThreadObj);
addDataDel += new AddDataDelegate(AddData);
// Start thread
_addDataRunner.Start();
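Finally, update the charts and make the thread sleep for N ms on each iteration:
private void AddDataThreadLoop() {
    while (!_shouldStop) {
        this.Invoke(addDataDel);  // update the charts on the UI thread (assumes this code is in the Form)
        Thread.Sleep(100);        // sleep thread for 100 ms
    }
}
Data is added to the chart every 100 ms: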
private void AddData() {
    // Copy data stored in lists to arrays
    float[] rawData;
    DateTime[] xValues;
    if (_bufferRawData.Count > 0) {
        // Copy buffered data in a thread-safe manner
        lock (_bufferRawData) {
            rawData = _bufferRawData.ToArray();
            xValues = _bufferXValues.ToArray();
            // Empty the buffers so the same samples are not plotted again next time
            _bufferRawData.Clear();
            _bufferXValues.Clear();
        }
        for (int sampleIdx = 0; sampleIdx < rawData.Length; sampleIdx++) {
            foreach (Series ptSeries in chChannels[channelIdx].Series) {
                // Add new data point to the corresponding chart series (x, y, series)
                AddNewPoint(xValues[sampleIdx], rawData[sampleIdx], ptSeries);
            }
        }
    }
}