Read a text file with regexp and store into structure


Hi everybody,
I'm trying to parse a text file in MATLAB. It consists of several blocks (START_BLOCK/END_BLOCK), each containing strings (variable names) and the values associated with those variables.
Here is an example:
START_BLOCK_EXTREMEWIND
velocity_v1 29.7
velocity_v50 44.8
velocity_vred1 32.67
velocity_vred50 49.28
velocity_ve1 37.9
velocity_ve50 57
velocity_vref 50
END_BLOCK_EXTREMEWIND
Currently, my code is:
fid = fopen('test_struct.txt','rt');
C = textscan(fid,'%s %f32 %*[^\n]','CollectOutput',true);
C{1} = reshape(C{1},1,numel(C{1}));
C{2} = reshape(C{2},1,numel(C{2}));
startIdx = find(~cellfun(@isempty, regexp(C{1}, 'START_BLOCK_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(C{1}, 'END_BLOCK_', 'match')));
assert(all(size(startIdx) == size(endIdx)))
extract_parameters = @(n)({C{1}{startIdx(n)+1:endIdx(n) - 1}});
parameters = arrayfun(extract_parameters, 1:numel(startIdx), 'UniformOutput', false);
s = cell2struct(cell(size(parameters{1})),parameters{1}(1:numel(parameters{1})),2);
s.velocity_v1 = C{2}(2);
s.velocity_v50 = C{2}(3);
s.velocity_vred1 = C{2}(4);
s.velocity_vred50 = C{2}(5);
s.velocity_ve1 = C{2}(6);
s.velocity_ve50 = C{2}(7);
s.velocity_vref = C{2}(8);
It works, but it's absolutely static. I would rather have a code able to:
1. check the existence of blocks --> as already implemented;
2. the strings are to be taken as fields of the structure;
3. the numbers are meant to be the attributes of each field.
Finally, if there is more than one block, there should be an iteration over those blocks to build the whole structure.
This is the first time I have worked with structures at all, so please be patient.
I thank you all in advance.
Kindest regards.

It sounds like you will want to make use of dynamic field names. If you have a struct s, a string fieldName that stores the name of a field, and fieldVal which holds the value that you'd like to set for this field, then you can use the following syntax to perform the assignment:
s.(fieldName) = fieldVal;
This MATLAB doc provides further info.
With this in mind, I took a slightly different approach to parsing the text: I iterate through it with a for loop. Although for loops are sometimes frowned upon in MATLAB (since MATLAB is optimized for vectorized operations), I think in this case a loop makes the code cleaner. Furthermore, my understanding is that arrayfun is essentially a loop itself, so replacing it with a for loop is unlikely to cost much performance.
The following code will convert each block in the text to a struct with the specified fields and values. These resulting "block" structs are then added to a higher-level "result" struct.
fid = fopen('test_struct.txt','rt');
C = textscan(fid,'%s %f32 %*[^\n]','CollectOutput',true);
fclose(fid);
paramNames = C{1};
paramVals = C{2};
curBlockName = [];
inBlock = 0;
blockCount = 0;
%// Iterate through all of the entries in "paramNames". Each block will be a
%// new struct that is then added to a high-level "result" struct.
for i=1:length(paramNames)
    curParamName = paramNames{i};
    isStart = ~isempty(regexp(curParamName, 'START_BLOCK_', 'match'));
    isEnd = ~isempty(regexp(curParamName, 'END_BLOCK_', 'match'));
    %// If at the start of a new block, create a new struct with a single
    %// field - the BlockName (as specified by the text after "START_BLOCK_")
    if(isStart)
        assert(inBlock == 0);
        curBlockName = curParamName(length('START_BLOCK_') + 1:end);
        inBlock = 1;
        blockCount = blockCount + 1;
        s = struct('BlockName', curBlockName);
    %// If at the end of a block, add the struct that we've just populated to
    %// our high-level "result" struct.
    elseif(isEnd)
        assert(inBlock == 1);
        inBlock = 0;
        %// EDIT - storing result in "structure of structures"
        %// rather than array of structs
        %// s_array(blockCount) = s;
        result.(curBlockName) = s;
    %// Otherwise, assume that we are inside of a block, so add the current
    %// parameter to the struct.
    else
        assert(inBlock == 1);
        s.(curParamName) = paramVals(i);
    end
end
%// Results stored in "result" structure
Hopefully this answers your question... or at least provides some helpful hints.
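For readers who want to see the same state machine outside MATLAB, here is a minimal C++ sketch of the loop above. It is not part of the original answer: the names parseBlocks and Block are made up for illustration, and it assumes every non-tag line has the form "name value". A map keyed by block name plays the role of the "result" struct with dynamic field names.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// One block: parameter name -> numeric value (hypothetical type name).
using Block = std::map<std::string, double>;

// Parses "START_BLOCK_<name>" ... "END_BLOCK_<name>" sections into a map
// keyed by block name, mirroring the inBlock logic of the MATLAB loop.
std::map<std::string, Block> parseBlocks(std::istream& in) {
    const std::string startTag = "START_BLOCK_";
    const std::string endTag = "END_BLOCK_";
    std::map<std::string, Block> result;
    std::string line, currentName;
    Block current;
    bool inBlock = false;

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        if (!(fields >> key)) continue;  // skip blank lines
        if (key.compare(0, startTag.size(), startTag) == 0) {
            if (inBlock) throw std::runtime_error("nested START_BLOCK");
            currentName = key.substr(startTag.size());
            current.clear();
            inBlock = true;
        } else if (key.compare(0, endTag.size(), endTag) == 0) {
            if (!inBlock) throw std::runtime_error("END_BLOCK without START_BLOCK");
            result[currentName] = current;  // store the finished block
            inBlock = false;
        } else {
            double value;
            if (!inBlock || !(fields >> value))
                throw std::runtime_error("bad line: " + line);
            current[key] = value;  // the "dynamic field" assignment
        }
    }
    return result;
}
```

With the sample file from the question, parseBlocks would return one entry named EXTREMEWIND whose inner map holds the seven velocity values.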

I edited my code today and now it almost works as intended:
clc, clear all, close all
%Find all row headers
fid = fopen('test_struct.txt','r');
row_headers = textscan(fid,'%s %*[^\n]','CommentStyle','%','CollectOutput',1);
row_headers = row_headers{1};
fclose(fid);
%Find all attributes
fid1 = fopen('test_struct.txt','r');
attributes = textscan(fid1,'%*s %s','CommentStyle','%','CollectOutput',1);
attributes = attributes{1};
fclose(fid1);
%Collect row headers and attributes in a single cell
parameters = [row_headers,attributes];
%Find all the blocks
startIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_START_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_END_', 'match')));
assert(all(size(startIdx) == size(endIdx)))
%Extract fields between BLOCK_START_ and BLOCK_END_
extract_fields = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,1));
struct_fields = arrayfun(extract_fields, 1:numel(startIdx), 'UniformOutput', false);
%Extract attributes between BLOCK_START_ and BLOCK_END_
extract_attributes = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,2));
struct_attributes = arrayfun(extract_attributes, 1:numel(startIdx), 'UniformOutput', false);
for i = 1:numel(struct_attributes)
    s{i} = cell2struct(struct_attributes{i},struct_fields{i},1);
end
Now, in the end, I get a cell array of structures that more or less fulfills my requirements. The only point that I would like to improve is:
- Give each structure the name of its respective block.
Does anybody have any useful hints?
Thank you all for supporting me.
Regards,
Francesco

Related

TFLite c++ determine the classification on output

I'm trying to get an output from a trained model which has a classification, the input node count is 1, and the output node count is 2. However, I'm not quite sure where the classification lands and how exactly do I handle it.
for(size_t idx = 0; idx < input_node_count; idx++)
{
    float* data_ptr = interpreter->typed_input_tensor<float>(idx);
    memcpy(data_ptr, my_input.data(), input_elem_size[idx]);
}
if (kTfLiteOk != interpreter->Invoke())
{
    return false;
}
for(size_t idx = 0; idx < output_node_count; idx++)
{
    float* output = interpreter->typed_output_tensor<float>(idx);
    output_buffer[idx] = std::vector<float> (output,
                                             output + output_elem_size[idx]);
}
result = output_buffer[1];
classification_result = output_buffer[0]; // Best way to approach this
As of now, I can print out the sizes and see that result has 196,608 elements and classification_result has 2, as it should. My problem is that I hard-coded the indices 1 and 0, but this will not always hold in my program, which runs all sorts of models. Sometimes the classification might be at index 1, which makes the above code fall apart.
I've tried checking the sizes of the buffers, but that is not reliable either, since the classification size and the result size differ for each input. Is there a way for me to know for certain which index is which? Am I approaching this the right way?

Use TensorFlow Lite signatures for this. Signature defs let you access inputs and outputs by the names defined in the original model.
See the conversion and inference example in the TensorFlow Lite signatures guide.
Python example:
# Load the TFLite model in TFLite Interpreter
interpreter = tf.lite.Interpreter(TFLITE_FILE_PATH)
# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()
# my_signature is callable with input as arguments.
output = my_signature(x=tf.constant([1.0], shape=(1,10), dtype=tf.float32))
# 'output' is dictionary with all outputs from the inference.
# In this case we have single output 'result'.
print(output['result'])
For C++:
// To run
auto my_signature = interpreter_->GetSignatureRunner("my_signature");
// Set your inputs and allocate tensors
auto* input_tensor_a = my_signature->input_tensor("input_a");
...
// Execute
my_signature->Invoke();
// Output
auto* output_tensor_x = my_signature->output_tensor("output_x");

How to write multiple nodes with OPC-UA at once using open62541?

I am attempting to write multiple nodes in a single request, however I have not found any documentation or examples on how to do that, every time I find anything regarding the issue, a single node is written. Based on my understanding of the open62541 library (which is not much), I've attempted to do this like so:
void Write_from_3_to_5_piece_queue() {
    char NodeID[128];
    char NodeID_backup[128];
    char aux[3];
    bool bool_to_write = false;
    strcpy(NodeID_backup, _BaseNodeID);
    strcat(NodeID_backup, "POU.AT2.piece_queue["); // this is where I want to write, I need only to append the array index in which to write
    UA_WriteRequest wReq;
    UA_WriteValue my_nodes[3]; // this is where I start to make things up, I'm not sure this is the correct way to do it
    my_nodes[0] = *UA_WriteValue_new();
    my_nodes[1] = *UA_WriteValue_new();
    my_nodes[2] = *UA_WriteValue_new();
    strcpy(NodeID, NodeID_backup);
    strcat(NodeID, "3]"); //append third index of array (will write to piece_queue[3])
    my_nodes[0].nodeId = UA_NODEID_STRING_ALLOC(_nodeIndex, NodeID);
    my_nodes[0].attributeId = UA_ATTRIBUTEID_VALUE;
    my_nodes[0].value.hasValue = true;
    my_nodes[0].value.value.type = &UA_TYPES[UA_TYPES_BOOLEAN];
    my_nodes[0].value.value.storageType = UA_VARIANT_DATA_NODELETE;
    my_nodes[0].value.value.data = &bool_to_write;
    strcpy(NodeID, NodeID_backup);
    strcat(NodeID, "4]");
    my_nodes[1].nodeId = UA_NODEID_STRING_ALLOC(_nodeIndex, NodeID);
    my_nodes[1].attributeId = UA_ATTRIBUTEID_VALUE;
    my_nodes[1].value.hasValue = true;
    my_nodes[1].value.value.type = &UA_TYPES[UA_TYPES_BOOLEAN];
    my_nodes[1].value.value.storageType = UA_VARIANT_DATA_NODELETE;
    my_nodes[1].value.value.data = &bool_to_write;
    strcpy(NodeID, NodeID_backup);
    strcat(NodeID, "5]");
    my_nodes[2].nodeId = UA_NODEID_STRING_ALLOC(_nodeIndex, NodeID);
    my_nodes[2].attributeId = UA_ATTRIBUTEID_VALUE;
    my_nodes[2].value.hasValue = true;
    my_nodes[2].value.value.type = &UA_TYPES[UA_TYPES_BOOLEAN];
    my_nodes[2].value.value.storageType = UA_VARIANT_DATA_NODELETE;
    my_nodes[2].value.value.data = &bool_to_write;
    UA_WriteRequest_init(&wReq);
    wReq.nodesToWrite = my_nodes;
    wReq.nodesToWriteSize = 3;
    UA_WriteResponse wResp = UA_Client_Service_write(_client, wReq);
    UA_WriteResponse_clear(&wResp);
    UA_WriteRequest_clear(&wReq);
    return;
}
At first I didn't have much hope that this would work, but it turns out that it actually writes the values I want. The problem is that UA_WriteRequest_clear(&wReq); triggers an exception inside the open62541 library.
Also, I know I can write multiple values to an array specifically. Even though that would fix my issue in this particular example, it is not what I mean to do; this example is just a simplification of my problem. Suppose I have a multi-type structure and I want to write to it, all in a single request. I appreciate any help!

First of all, this is bad:
UA_WriteValue my_nodes[3];
my_nodes[0] = *UA_WriteValue_new();
my_nodes[1] = *UA_WriteValue_new();
my_nodes[2] = *UA_WriteValue_new();
my_nodes is already created on the stack, and then you copy the content of a newly allocated object into it by dereferencing. The pointer returned by UA_WriteValue_new() is lost, so this definitely leaks memory. You probably want to use UA_WriteValue_init() instead.
Never ever dereference the return value of a new() function.
Let's go bottom up:
UA_WriteRequest_clear(&wReq) recursively frees all content of the wReq structure.
This means that it will also call:
UA_Array_delete(wReq.nodesToWrite, wReq.nodesToWriteSize, ...)
which in turn calls UA_free(wReq.nodesToWrite)
And you have:
wReq.nodesToWrite = my_nodes;
with
UA_WriteValue my_nodes[3];
This means that you are assigning a variable that lives on the stack to a pointer, and later this pointer is freed. free can only release memory that was allocated on the heap, not on the stack, and therefore it fails.
You have two options now:
1. If you still want to use the stack, trick the clear function into thinking that the array is empty:
wReq.nodesToWrite = NULL;
wReq.nodesToWriteSize = 0;
UA_WriteRequest_clear(&wReq);
Note that the node IDs allocated with UA_NODEID_STRING_ALLOC are then no longer released by the request and have to be cleared separately.
2. Put the nodes on the heap. Instead of
UA_WriteValue my_nodes[3];
use something like
UA_WriteValue *my_nodes = (UA_WriteValue*)UA_malloc(sizeof(UA_WriteValue)*3);
and initialize each element with UA_WriteValue_init().
Also, I strongly recommend that you use either valgrind or a Clang sanitizer such as AddressSanitizer to catch these memory issues.
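The two options can be illustrated without open62541 at all. The sketch below is an assumption-laden stand-in, not library code: Request and clearRequest are hypothetical names that only mimic how a clear function frees the array it points to.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a request type whose clear function frees its
// array, in the same spirit as UA_WriteRequest_clear() and nodesToWrite.
struct Request {
    int* items = nullptr;
    std::size_t size = 0;
};

void clearRequest(Request& req) {
    std::free(req.items);  // only valid for malloc'ed memory or nullptr
    req.items = nullptr;
    req.size = 0;
}

// Option 1: the array lives on the stack, so detach it before clearing.
int sumWithStackArray() {
    int local[3] = {1, 2, 3};
    Request req;
    req.items = local;
    req.size = 3;
    int sum = 0;
    for (std::size_t i = 0; i < req.size; ++i) sum += req.items[i];
    req.items = nullptr;  // detach: clearRequest must not free stack memory
    req.size = 0;
    clearRequest(req);
    return sum;
}

// Option 2: the array lives on the heap, so clearRequest may free it.
int sumWithHeapArray() {
    Request req;
    req.items = static_cast<int*>(std::malloc(3 * sizeof(int)));
    if (req.items == nullptr) return -1;  // allocation failed
    req.size = 3;
    for (std::size_t i = 0; i < req.size; ++i) req.items[i] = static_cast<int>(i) + 1;
    int sum = 0;
    for (std::size_t i = 0; i < req.size; ++i) sum += req.items[i];
    clearRequest(req);  // frees the heap array
    return sum;
}
```

Removing the two "detach" lines in sumWithStackArray reproduces the class of bug described above: free is handed a stack address.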

How to make a readable return of data

I read this article: http://www.slideshare.net/redigon/refactoring-1658371
On page 53 it states: "You have a method that returns a value but also changes the state of the object. Create two methods, one for the query and one for the modification."
But what if the query needs the values of more than one field?
For example:
QSqlQuery query(QSqlDatabase::database("MAIN"));
QString command = "SELECT FIELD1, FIELD2, FIELD3, FIELD4, FIELD5 FROM TABLE";
query.exec( command );
This is the method I know, but I really feel that it is not very readable:
QString values;
columnDelimiter = "[!@#]";
rowDelimiter = "[$%^]";
while( query.next() )
{
    values += query.value(0).toString() + columnDelimiter;
    values += query.value(1).toString() + columnDelimiter;
    values += query.value(2).toString() + columnDelimiter;
    values += query.value(3).toString() + columnDelimiter;
    values += rowDelimiter;
}
And I retrieve it like this:
QStringList rowValues, columnValues;
rowValues = values.split(rowDelimiter);
int rowCtr = 0;
while( rowCtr < rowValues.count() )
{
    columnValues.clear();
    // Here I get the fields I need
    columnValues = rowValues.at( rowCtr ).split( columnDelimiter );
    // I will put the modification on variables here
    rowCtr++;
}
EDIT: Is there a more readable way of doing this?

"Is there a more readable way of doing this?" is a subjective question. I'm not sure whether your question will last long on SO, as SO prefers factual problems and solutions.
What I personally think will make your code more readable, would be:
Use a custom made data structure for your data set. Strings are not the right data structures for tabulated data. Lists of custom made structs are better.
Example:
// data structure for a single row
struct MyRow {
    QString a, b, c;
};
...
QList<MyRow> myDataSet;
while( query.next() )
{
    MyRow currentRow;
    // fill with data
    currentRow.a = query.value(0).toString();
    currentRow.b = query.value(1).toString();
    ...
    myDataSet.append(currentRow);
}
I doubt all your data is text. Some is probably numbers. Never store numbers as strings. That's inefficient.
You first read all the data into a data structure, and then read the data structure to process it. Why don't you combine the two, i.e. process the data while reading it, in the same while(...) loop?
In your comment, you're confused by the difference between an enum and a struct. I suggest you stop doing complex database and Qt stuff, grab a basic C++ book and try to understand C++ first.
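To make the points about typed rows concrete, here is a small sketch using only the standard library. std::vector stands in for QList, and the raw text columns stand in for what the query would return; the field names (name, quantity, price) and the helper names are invented for illustration.

```cpp
#include <string>
#include <vector>

// One row of the result set; the field names and types are illustrative.
struct Row {
    std::string name;
    int quantity;
    double price;
};

// Converts raw text columns into typed rows once, at the point of reading,
// so later code never has to split or re-parse delimiter-joined strings.
std::vector<Row> rowsFromRaw(const std::vector<std::vector<std::string>>& raw) {
    std::vector<Row> rows;
    for (const auto& columns : raw) {
        if (columns.size() < 3) continue;  // skip malformed rows
        rows.push_back(Row{columns[0], std::stoi(columns[1]), std::stod(columns[2])});
    }
    return rows;
}

// Example of processing the typed rows: numbers are used directly.
double totalValue(const std::vector<Row>& rows) {
    double total = 0.0;
    for (const Row& row : rows) total += row.quantity * row.price;
    return total;
}
```

The modification step can then take a Row (or the whole vector) as its argument, which keeps the query and the modification in two separate methods as the refactoring slide suggests.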

Byte length of a MySQL column in C++

I am using C++ to write a program for a MySQL database. I am trying to check a condition by comparing the length of a column (in bytes) to pass/fail. Here is the code:
while (row = mysql_fetch_row(result))
{
    lengths = mysql_fetch_lengths(result);
    num_rows = mysql_num_rows(result);
    for (i = 0; i < num_fields; i++)
    {
        if (strstr(fields[i].name, "RSSI") != NULL)
        {
            if (lengths[*row[i]] == ??)
                printf("current value is %s \t", row[i]);
        }
    }
}
So basically what I am trying to do is look for the string "RSSI" in the column names and, if it is present, print that column's value. The values in each column are 3 bytes long if present. So how do I check whether lengths[*row[i]] is 3 bytes long? Thanks

According to the official MySQL documentation mysql_fetch_lengths returns an array of unsigned long with the lengths of the columns of the current row. Although the description isn't clear whether it's in bytes or something else, the example shown clarifies it.
So you should be checking directly to 3.
Also, there are some syntactic and semantic errors, and a possible refactoring in your code, among them the following:
Given that the lengths variable is an array with the current row's lengths, the expression lengths[*row[i]] should just be lengths[i], because i is the index of the current column.
The two ifs inside the for could be merged with the && operator for better readability.
Some variables are not defined or used correctly.
The code would look like this:
// Properly assign a value to fields variable.
fields = mysql_fetch_fields(result);
// Getting the number of fields outside the loop is better.
num_fields = mysql_num_fields(result);
while (row = mysql_fetch_row(result))
{
    lengths = mysql_fetch_lengths(row);
    for (i = 0; i < num_fields; i++)
        if (strstr(fields[i].name, "RSSI") != NULL && lengths[i] == 3)
            printf("current value is %s \t", row[i]);
    printf("\n"); // For better output print each row in a new line.
}
You should really read the documentation carefully in order to avoid compilation or logic errors for using the wrong function.

I think there is a typo in the code above.
The dev docs (http://dev.mysql.com/doc/refman/5.0/en/mysql-fetch-lengths.html) state:
...
num_fields = mysql_num_fields(result);
lengths = mysql_fetch_lengths(result);
for(i = 0; i < num_fields; i++)
and NOT
lengths = mysql_fetch_lengths(row);

ORM Entity - Remove a record, and add a property

I am putting together a store finder which works on a radius from a postal code. I have done this many times using standard Queries and QoQs, but now trying to put one together using cf9 ORM... but seemed to have reached the limit of my capabilities with the last bit.
I am pulling out an entity and processing it. At the end, I need to:
a. Remove a record if it doesn't meet a certain criterion (its distance is greater than the one specified by the user)
OR b. Add a new property to the record to store the distance.
So at the end, all I want in my entity are those stores that are within the range specified by the user, with each record containing the calculated distance.
The best way to see what I am trying to do is to view the full function below.
Any suggestions are greatly appreciated!
public function getByPostcodeRadius(required postcode="",
                                    required radius=""){
    //set some initial vals
    rs = {};
    geo = New _com.util.geo().init();
    local.postcodeGeo = geo.getGeoCode("#arguments.postcode#, Australia");
    local.nearbyStores = "";
    local.returnStores = {};
    //load stores
    local.stores = entityload("stores");
    //loop over all stores and return list of stores inside radius
    for(i=1; i <= ArrayLen(local.stores); i++){
        store = {};
        store.id = local.stores[i].getID();
        store.geoCode = local.stores[i].getGeoCode();
        store.Lat = ListgetAt(store.geoCode,1);
        store.Lng = ListgetAt(store.geoCode,2);
        distance = geo.getDistanceByGeocode(local.postcodeGeo.Lat,local.postcodeGeo.Lng,store.Lat,store.Lng);
        //************************
        //HERE IS WHERE I AM STUCK.
        if (distance LT arguments.radius){
            //here I need to add a property 'distance' and set its value
            local.stores[i].distance = distance; // this adds it to the object, but not with the PROPERTIES
        } else {
            // here I need to remove the store from the object as its distance was greater than the one passed in
            arrayDeleteAt(local.stores,i); //this clearly isn't working, as positions are changing with each loop over
        }
    }
    return local.stores;
}

If you delete an object from an array while looping over it forwards, the remaining elements shift position and it will mess up your loop.
Try either looping backwards:
for ( var i = arrayLen( local.stores ); i >= 1; i-- )
Or looping like this:
for ( var store in local.stores )
(That's rough code and may need some tweaks)
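The reason the backward loop works is easier to see in a small, self-contained sketch. This one is in C++ rather than CFML, with a vector of distances standing in for the array of store entities; removeOutsideRadius is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// Removes every distance that is not below the radius by walking the vector
// from the back, so erasing an element never shifts the elements that are
// still waiting to be visited.
void removeOutsideRadius(std::vector<double>& distances, double radius) {
    for (std::size_t i = distances.size(); i-- > 0; ) {
        if (distances[i] >= radius) {
            distances.erase(distances.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }
}
```

A forward loop over the same data would skip the element that slides into position i after each erase, which is the behaviour described in the question.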

I'd approach this from a different angle:
1) Instead of deleting from the array of all stores those that don't match, I'd build an array of those that do and return that.
2) If the distance is specific to each query and not a property of the store object, then I wouldn't try adding it to the store, but just "associate" it with the specific data I'm returning for this search.
Putting the two together, I'd return an array of structs, each containing the store object and its distance from the requested postcode. (You could instead return a single struct of store objects and distances, with the store ID as key, but I prefer working with arrays.)
Here's how I'd code it (not tested because I don't have your geo class or entity code):
public array function getByPostcodeRadius(required postcode="", required radius="")
    hint="I return an array of structs each containing a store object within the requested radius and its distance from the requested post code"
{
    // Geo settings
    local.geo = New _com.util.geo().init();
    local.postcodeGeo = local.geo.getGeoCode("#arguments.postcode#, Australia");
    // initialise the array of structs to return, which will contain stores within the requested radius and their distance from the postcode
    local.nearbyStores = [];
    //load all stores
    local.stores = entityload("stores");
    //loop over all stores and add those inside the radius to the return array with their distance
    for( var storeObject in local.stores ){
        // determine the lat-lng for this store
        local.storeLat = ListgetAt(storeObject.getGeoCode(),1);
        local.storeLng = ListgetAt(storeObject.getGeoCode(),2);
        // get the distance from the requested postcode
        local.distance = local.geo.getDistanceByGeocode(local.postcodeGeo.Lat,local.postcodeGeo.Lng,local.storeLat,local.storeLng);
        if (local.distance LT arguments.radius){
            // create a struct of the store object and its distance and add to the nearby stores array
            local.thisStore = {
                store = storeObject
                ,distance = local.distance
            };
            ArrayAppend( local.nearbyStores,local.thisStore );
        }
    }
    return local.nearbyStores;
}
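The same "build a new list instead of deleting" idea, sketched in C++ for comparison. Everything here is illustrative: Store, NearbyStore and storesWithinRadius are invented names, and distanceBetween is a plain Euclidean placeholder standing in for the geo class's distance function, not a real geographic calculation.

```cpp
#include <cmath>
#include <string>
#include <vector>

struct Store {
    std::string id;
    double lat;
    double lng;
};

// Pairs a store with the distance computed for this particular search,
// so the distance never has to become a property of the store itself.
struct NearbyStore {
    Store store;
    double distance;
};

// Placeholder distance (plain Euclidean), an assumption for this sketch.
double distanceBetween(double lat1, double lng1, double lat2, double lng2) {
    return std::hypot(lat2 - lat1, lng2 - lng1);
}

// Returns only the stores inside the radius, each with its distance.
std::vector<NearbyStore> storesWithinRadius(const std::vector<Store>& stores,
                                            double lat, double lng, double radius) {
    std::vector<NearbyStore> nearby;
    for (const Store& s : stores) {
        const double d = distanceBetween(lat, lng, s.lat, s.lng);
        if (d < radius) nearby.push_back(NearbyStore{s, d});
    }
    return nearby;
}
```

Because the input list is never modified, there is no index bookkeeping to get wrong, and the original entities stay untouched.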
