Yaml-cpp parsing doesn't work when the space after a colon is missing - c++

I have encountered a problem with the yaml-cpp parser. When I try to load the following definition:
DsUniversity:
  university_typ: {type: enum, values:[Fachhochschule, Universitat, Berufsakademie]}
  students_at_university: {type: string(50)}
I get the following error:
Error: yaml-cpp: error at line 2, column 39: end of map flow not found
I tried to verify the YAML's validity on http://yaml-online-parser.appspot.com/ and http://yamllint.com/, and both services report the YAML as valid.
The problem is caused by the missing space after "values:". When the YAML is updated to the following format:
DsUniversity:
  university_typ: {type: enum, values: [Fachhochschule, Universitat, Berufsakademie]}
  students_at_university: {type: string(50)}
everything works as expected.
Is there any way to configure, update, or fix the yaml-cpp parser so that it also processes YAML with a missing space after the colon?
Added:
It seems that the problem is caused by the requirement for a whitespace character as a separator. When I simplified the test snippet to
DsUniversity:[Fachhochschule, Universitat, Berufsakademie]
the yaml-cpp parser reads it as the single scalar value "DsUniversity:[Fachhochschule, Universitat, Berufsakademie]". When a space is added after the colon, yaml-cpp correctly loads the element as a map whose value is a sequence.

yaml-cpp is correct here, and those online validators are incorrect. From the YAML 1.2 spec:
7.4.2. Flow Mappings
Normally, YAML insists the “:” mapping value indicator be separated from the value by white space. A benefit of this restriction is that the “:” character can be used inside plain scalars, as long as it is not followed by white space. This allows for unquoted URLs and timestamps. It is also a potential source for confusion as “a:1” is a plain scalar and not a key: value pair.
...
To ensure JSON compatibility, if a key inside a flow mapping is JSON-like, YAML allows the following value to be specified adjacent to the “:”. This causes no ambiguity, as all JSON-like keys are surrounded by indicators. However, as this greatly reduces readability, YAML processors should separate the value from the “:” on output, even in this case.
In your example, you're in a flow mapping (meaning a map surrounded by {}), but your key is not JSON-like: you just have a plain scalar (values is unquoted). To be JSON-like, the key needs to be either single- or double-quoted, or it can be a nested flow sequence or map itself.
In your simplified example,
DsUniversity:[Fachhochschule, Universitat, Berufsakademie]
both yaml-cpp and the online validators correctly parse this as a single scalar; for it to be a map, as you intend, a space is required after the colon.
Why does YAML require that space?
In the simple plain scalar case:
a:b
could be ambiguous: it could be read as either a scalar a:b, or a map {a: b}. YAML chooses to read this as a scalar so that URLs can be easily embedded in YAML without quoting:
http://stackoverflow.com
is a scalar (like you'd expect), not a map {http: //stackoverflow.com}!
In a flow context, there's one case where this isn't ambiguous: when the key is quoted, e.g.:
{"a":b}
This is called JSON-like because it's similar to JSON, which requires quotes around all scalars. In this case, YAML knows that the key ends at the end-quote, and so it can be sure that the value starts immediately.
This behavior is explicitly allowed because JSON itself allows things like
{"a":"b"}
Since YAML 1.2 is a strict superset of JSON, this must be legal in YAML.
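The rule quoted above can be illustrated with a deliberately simplified C++ model. This is a sketch, not part of yaml-cpp and not a YAML parser: the function name isValueIndicator is hypothetical, and it looks only at the character after the colon and, in flow context, at the character just before it to decide whether the key is JSON-like. Other parts of the YAML grammar (for example, a plain key that happens to end in a quote character) are ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of yaml-cpp: a simplified model of the YAML 1.2
// rule for when ':' acts as the mapping value indicator.
//   - ':' followed by white space (or the end of the input) is an indicator.
//   - Inside a flow collection, ':' may be directly followed by the value only if
//     the key is JSON-like, i.e. it ends with a closing quote, ']' or '}'.
// Everything else (e.g. "a:b" or "http://...") stays part of a plain scalar.
bool isValueIndicator(const std::string& text, std::size_t colon, bool inFlow) {
    if (colon >= text.size() || text[colon] != ':') {
        return false;  // not pointing at a colon at all
    }
    // Case 1: the colon is separated from the value by white space.
    if (colon + 1 == text.size() ||
        std::isspace(static_cast<unsigned char>(text[colon + 1]))) {
        return true;
    }
    // Case 2: adjacent value, allowed only after a JSON-like key in flow context.
    if (inFlow && colon > 0) {
        const char prev = text[colon - 1];
        return prev == '"' || prev == '\'' || prev == ']' || prev == '}';
    }
    return false;
}
```

Under this model, the question's values:[...] inside {...} is not a value indicator (plain key, no space), values: [...] is one, {"a":b} is one because the quoted key is JSON-like, and http://stackoverflow.com stays a plain scalar.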

I think it would be beneficial to parse scalars/keys differently immediately inside a flow map ({); if you agree, please vote here:
https://github.com/yaml/yaml-spec/issues/267

Related

OpenModelica SimulationOptions 'variableFilter' not working with '^' exceptions

To reduce the size of my simulation output files, I want to give variable name exceptions to the simulationsOptions/outputFilter (cf. OpenModelica Users Guide / Output) of my model instead of a list of many specific variables. I found the regexp operator "^" to fulfill my needs, but it didn't work as expected. So I think something is wrong with the interpretation of connected character strings when they are negated.
Example:
When I have any derivatives der(...) in my model and use variableFilter=der.*, the output file contains all the filtered derivatives. Since there are no other variables beginning with the character d, the same happens with variableFilter=d.*. For testing, I also tried variableFilter=rde.* to confirm that every variable is then filtered out.
When I now try to exclude them with variableFilter=^der.*, =^rde.* or =^d.*, I get exactly the same result as without the ^. So the operator seems to be ignored in this notation.
When I instead use variableFilter=[^der].*, =[^rde].* or even =[^d].*, all of the derivative variables I want to exclude are filtered out of the output, but there is no difference between those three expressions. To me it seems that every character is interpreted on its own and not as a connected string.
Did I understand and use the regexp syntax correctly, or could this be a bug in the code?
Side/follow-up question: Where can I officially report this for software revision?
OpenModelica v.1.19.2 (64-bit)
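The observations above match standard regex semantics: a leading ^ is an anchor (start of the string), not a negation, so ^der.* behaves like der.*; and [^der] is a bracket expression matching ONE character other than d, e or r, which is why [^der].*, [^rde].* and [^d].* behave the same when only der(...) names start with d. The C++ sketch below checks this with std::regex in POSIX extended mode. It assumes, without confirming it for OpenModelica, that variableFilter uses POSIX-extended-style patterns that must match the whole variable name. The helper matchesFilter and the pattern kNotDerPrefix are hypothetical; the latter shows one way to say "does not start with der" without lookahead, which POSIX extended regexes lack.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: does the WHOLE variable name match the filter pattern?
// Assumption: the filter behaves like a POSIX extended regex that must match the
// full name, which is what std::regex_match with regex::extended models here.
bool matchesFilter(const std::string& name, const std::string& pattern) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(name, re);
}

// Names that do NOT start with "der", written without lookahead:
//   first char is not 'd', OR "d" followed by nothing or a non-'e' char,
//   OR "de" followed by nothing or a non-'r' char.
const std::string kNotDerPrefix = "([^d].*|d([^e].*)?|de([^r].*)?)";
```

If OpenModelica's filter really follows these semantics, a pattern built like kNotDerPrefix might be what excludes the der(...) variables; this has not been tried in OpenModelica here.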

Matlab: What's the most efficient approach to parse a large table or cell array with regexp when sometimes there is no match?

I am working with a messy, manually maintained "database" that has a column containing a string with name,value pairs. I am trying to parse the entire column with regexp to pull out the values. The column is huge (>100,000 entries). As a proxy for my actual data, let's use this code:
line1={'''thing1'': ''-583'', ''thing2'': ''245'', ''thing3'': ''246'', ''morestuff'':, '''''};
line2={'''thing1'': ''617'', ''thing2'': ''239'', ''morestuff'':, '''''};
line3={'''thing1'': ''unexpected_string(with)parens5'', ''thing2'': 245, ''thing3'':''246'', ''morestuff'':, '''''};
mycell=vertcat(line1,line2,line3);
This captures the general issues encountered in the database. I want to extract what thing1, thing2, and thing3 are in each line using cellfun to output a scalar cell array. They should normally be three-digit numbers, but sometimes they have an unexpected form. Sometimes thing3 is completely missing, without the name even showing up in the line. Sometimes there are minor formatting inconsistencies, like single quotes missing around the value, spaces missing, or dashes showing up in front of the three-digit value. I have managed to handle all of these, except for the case where thing3 is completely missing.
My general approach has been to use expressions like this:
expr1='(?<=thing1''):\s?''?-?([\w\d().]*?)''?,';
expr2='(?<=thing2''):\s?''?-?([\w\d().]*?)''?,';
expr3='(?<=thing3''):\s?''?-?([\w\d().]*?)''?,';
This looks behind for thingX' and then tries to match : followed by zero or one spaces, followed by 0 or 1 single quote, followed by zero or one dash, followed by any combination of letters, numbers, parentheses, or periods (this is defined as the token), using a lazy match, until zero or one single quote is encountered, followed by a comma. I call regexp as regexp(___,'tokens','once') to return the matching token.
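To make the behavior of these expressions concrete, here is a rough C++ translation using std::regex (ECMAScript grammar). std::regex has no lookbehind, so this sketch matches the thingX' prefix literally and reads the value from capture group 1; returning std::optional makes the "no match" case explicit instead of an empty array. The function name extractValue is hypothetical, and the test strings are the three proxy lines from the question with MATLAB's doubled quotes undone.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: extract the value that follows key' in a line such as
//   'thing1': '-583', 'thing2': '245', ...
// Mirrors the MATLAB pattern  (?<=thing1'):\s?'?-?([\w\d().]*?)'?,
// but without lookbehind: the key and its closing quote are matched literally.
// Note: key is pasted into the regex as-is, so it must not contain regex
// metacharacters (true for thing1/thing2/thing3).
std::optional<std::string> extractValue(const std::string& line, const std::string& key) {
    const std::regex re(key + R"(':\s?'?-?([\w().]*?)'?,)");
    std::smatch m;
    if (std::regex_search(line, m, re)) {
        return m[1].str();  // the lazily matched token, without quotes or dash
    }
    return std::nullopt;  // key absent: the analogue of MATLAB's empty result
}
```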
The problem is that when there is no match, regexp returns an empty array. This prevents me from using, say,
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),mycell);
unless I call it with 'UniformOutput',false. The problem with that is twofold. First, I need to then manually find the rows where there was no match. For example, I can do this:
emptyout=cellfun(@(x) isempty(x),out);
emptyID=find(emptyout);
backfill=cell(length(emptyID),1);
[backfill{:}]=deal('Unknown');
out(emptyID)=backfill;
In this example, emptyID has a length of 1 so this code is overkill. But I believe this is the correct way to generalize for when it is longer. This code will change every empty cell array in out with the string Unknown. But this leads to the second problem. I've now got a 'messy' cell array of non-scalar values. I cannot, for example, check unique(out) as a result.
Pardon the long-windedness but I wanted to give a clear example of the problem. Now my actual question is in a few parts:
Is there a way to accomplish what I'm trying to do without using 'UniformOutput',false? For example, is there a way to have regexp pass a custom string if there is no match (e.g. pass 'Unknown' if there is no match)? I can think of one 'cheat', which would be to use the | operator in the expression, and if the first token is not matched, look for something that is ALWAYS found. I would then still need to double back through the output and change every instance of that result to 'Unknown'.
If I take the 'UniformOutput',false approach, how can I recover a scalar cell array at the end to easily manipulate it (e.g. pass it through unique)? I will admit I'm not 100% clear on scalar vs nonscalar cell arrays.
If there is some overall different approach that I'm not thinking of, I'm also open to it.
Tangential to the main question, I also tried using a single expression to run regexp using 3 tokens to pull out the values of thing1, thing2, and thing3 in one pass. This seems to require 'UniformOutput',false even when there are no empty results from regexp. I'm not sure how to get a scalar cell array using this approach (e.g. an Nx1 cell array where each cell is a 3x1 cell).
At the end of the day, I want to build a table using these results:
mytable=table(out1,out2,out3);
Edit: Using celldisp sheds some light on the problem:
celldisp(out)
out{1}{1} =
246
out{2} =
Unknown
out{3}{1} =
246
I assume that I need to change the structure of out so that the contents of out{1}{1} and out{3}{1} are instead just out{1} and out{3}. But I'm not sure how to accomplish this if I need 'UniformOutput',false.
Note: I've not used MATLAB and this doesn't answer the "efficient" aspect, but...
How about forcing there to always be a match?
Since you really just want a match so that you can sidestep this problem, how about allowing an empty match?
Looking at the MATLAB regexp help page, I can see an 'emptymatch' option; perhaps this is something to try.
E.g.
the_thing_i_want_to_find|
This matches "the_thing_i_want_to_find" or an empty match; note the | character.
In a capture group it might look like this:
(the_thing_i_want_to_find|)
As a workaround, I have found that regexprep can be used to find entries where thing3 is missing and insert a placeholder for it. For example:
replace='$1 ''thing3'': ''Unknown'', ''morestuff''';
missingexpr='(?<=thing2'':\s?)(''?-?[\w\d().]*?''?,) ''morestuff''';
regexprep(mycell{2},missingexpr,replace)
ans =
''thing1': '617', 'thing2': '239', 'thing3': 'Unknown', 'morestuff':, '''
Applying it to the entire array:
fixedcell=cellfun(@(x) regexprep(x,missingexpr,replace),mycell,'UniformOutput',false);
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),fixedcell,'UniformOutput',false);
This feels a little roundabout, but it works.
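The same backfill idea can be sketched in C++ with std::regex_replace. Because std::regex has no lookbehind, the sketch captures the thing2 entry itself instead of using (?<=thing2':\s?) and writes it back with $1. The function name backfillThing3 is hypothetical, and, like the MATLAB version, it assumes that 'morestuff' directly follows the last thing entry, so the pattern only matches when thing3 is absent.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: insert a placeholder thing3 entry when it is missing.
// Group 1 captures the whole thing2 entry (key, value, trailing comma), which is
// written back unchanged before the inserted 'thing3': 'Unknown' entry.
// Assumption: 'morestuff' immediately follows thing2 only when thing3 is absent.
std::string backfillThing3(const std::string& line) {
    static const std::regex missing(R"((thing2':\s?'?-?[\w().]*?'?,) 'morestuff')");
    return std::regex_replace(line, missing, "$1 'thing3': 'Unknown', 'morestuff'");
}
```

Lines that already contain thing3 do not match the pattern, so regex_replace returns them unchanged.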
cellfun can be replaced with a plain old for loop. Your code will be either equally fast or maybe even faster. cellfun is implemented with a loop anyway; there is no advantage to using it other than fewer lines of code. In your explicit loop, you can then check the output of regexp and build your output array any way you like.
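The explicit-loop idea can be sketched in C++ to show the control flow (the thread itself is about MATLAB, so this is only an illustration): each search result is checked as soon as it is produced, and a missing key is replaced by "Unknown" on the spot, so every row ends up with exactly three plain strings. The names parseColumn and Row are hypothetical, and the pattern is a lookbehind-free variant of the question's expressions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

using Row = std::array<std::string, 3>;  // values of thing1, thing2, thing3

// Hypothetical helper: parse every line with a plain loop, filling in "Unknown"
// whenever a key is missing, so every row holds exactly three string values.
std::vector<Row> parseColumn(const std::vector<std::string>& lines) {
    const std::array<std::string, 3> keys = {"thing1", "thing2", "thing3"};
    std::vector<std::regex> patterns;
    for (const auto& key : keys) {
        // Compile each pattern once, outside the per-line loop.
        patterns.emplace_back(key + R"(':\s?'?-?([\w().]*?)'?,)");
    }
    std::vector<Row> rows;
    rows.reserve(lines.size());
    for (const auto& line : lines) {
        Row row;
        for (std::size_t k = 0; k < keys.size(); ++k) {
            std::smatch m;
            // Check the result immediately and substitute the fallback value.
            row[k] = std::regex_search(line, m, patterns[k]) ? m[1].str() : "Unknown";
        }
        rows.push_back(row);
    }
    return rows;
}
```

Building the regex objects once keeps the per-row work to the searches themselves, which matters when the column has more than 100,000 entries.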

YAML parsing error. Expected <block end>, but found '-'

I have the following config.yml:
- persist_to_workspace:
    root: ~/project
    paths: *build_cache_paths
    # for integration tests:
    - /home/circleci/cache/Cypress
I'm trying to persist_to_workspace /home/circleci/cache/Cypress. What's wrong with my syntax?
Your paths key has the value *build_cache_paths which is an alias. That means the value of paths is a reference to the node with the anchor &build_cache_paths (assuming it exists).
Two lines below, you start a sequence with -. Generally, a sequence at this level would be the value of a previous implicit key. But in this case it can't be, since the key paths already has a value. Hence the error.
If your goal is to merge the sequence behind *build_cache_paths with the sequence you give below, that is not possible with YAML. YAML is a serialization language; it doesn't implement operations on data (apart from the non-standard merge key <<, which is supported by some implementations but only works on mappings, not on sequences).

NiFi RouteText processor usage issue

I am having trouble configuring the RouteText processor correctly. I have to filter the lines that have particular string values at a particular index. Let's say I want all the lines that have 'BT', 'PV7' or 'PV30' at index 19. My file is a CSV.
I tried using the configuration below, but all of my lines are routed to the unmatched relationship, even though the data does contain lines that should match.
You need to change the Matching Strategy to "Satisfies Expression" since you are not using regular expressions here.
The docs for Satisfies Expression says:
"Match lines based on whether or not the the text satisfies the given Expression Language expression. I.e., the line will match if the property value, evaluated as an Expression, returns true. The expression is able to reference FlowFile Attributes, as well as the variables 'line' (which is the text of the line to evaluate) and 'lineNo' (which is the line number being evaluated. This will be 1 for the first line, 2 for the second and so on)."

How can I replace text in a Siebel data mapping?

I have an outgoing web service to send data from Siebel 7.8 to an external system. In order for the integration to work, before I send the data, I must change one of the field values, replacing every occurrence of "old" with "new". How can I do this with EAI data mappings?
In an ideal world I would just use an integration source expression like Replace([Description], "old", "new"). However Siebel is far from ideal, and doesn't have a replace function (or if it does, it's not documented). I can use all the Siebel query language functions which don't need an execution context. I can also use the functions available for calculated fields (sane people could expect both lists to be the same, but Siebel documentation is also far from ideal).
My first attempt was to use the InvokeServiceMethod function and replace the text myself in eScript. So, this is my field map source expression:
InvokeServiceMethod('MyBS', 'MyReplace', 'In="' + [Description] + '"', 'Out')
After some configuration steps it works fine... except if my description field contains the " character: Error parsing expression 'In="This is a "test" with quotes"' for field '3' (SBL-DAT-00481)
I know why this happens. My double quotes are breaking the expression and I have to escape them by doubling the character, as in This is a ""test"" with quotes. However, how can I replace each " with "" in order to call my business service... if I don't have a replace function? :)
Oracle's support website has only one result for the SBL-DAT-00481 error, which, as a workaround, suggests placing the whole parameter inside double quotes (which I already had). There's a linked document in which they acknowledge that the workaround is valid for a few characters such as commas or single quotes, but due to a bug in Siebel 7.7-7.8 (not present in 8.0+), it doesn't work with double quotes. They suggest instead passing the row ID as an argument to the business service and then retrieving the data directly from the BC.
Before I do that and end up with a performance-affecting workaround (pass only the ID) for the workaround (use double quotes) for the workaround (use InvokeServiceMethod) for not having a replace function... Am I going crazy here? Isn't there a simple way to do a simple text replacement in a Siebel data mapping?
The first thing that comes to my mind (quite possibly not the optimal one) is to create a calculated field in the source BC, say NEW_VALUE, which evaluates to "NEW" for every record where the original field has the value "OLD", and simply use this field in the integration map.