How to get the first occurrence of value with the matching number from multiple sets - xslt

I have a request that comes with multiple transactionLine elements, and I need the first occurrence of the Ref value where data_type="3". The data_type values can be 0, 2, 3, or 4, in any order.
When I tried the XPath expression below, it returned all the values where data_type='3':
<xsl:value-of select="/process/TransactionType/data_xml/transaction/sub_documents/transactionLine[@data_type='3']/Ref"/>
How can I get just the one value instead of all the values?
Please help me out here.

Well, with XPath, if an expression exp gives you a sequence of values and you want the first, use (exp)[1], i.e. (/process/TransactionType/data_xml/transaction/sub_documents/transactionLine[@data_type='3']/Ref)[1].
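To illustrate the difference between "all matches" and "the first match", here is a minimal Python sketch using the standard library's ElementTree. The sample XML is hypothetical (the original request was not included in the question), and ElementTree only supports a subset of XPath, so `find()` stands in for `(exp)[1]` here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real request was not shown in the question.
xml = """
<process><TransactionType><data_xml><transaction><sub_documents>
  <transactionLine data_type="0"><Ref>A</Ref></transactionLine>
  <transactionLine data_type="3"><Ref>B</Ref></transactionLine>
  <transactionLine data_type="2"><Ref>C</Ref></transactionLine>
  <transactionLine data_type="3"><Ref>D</Ref></transactionLine>
</sub_documents></transaction></data_xml></TransactionType></process>
"""

root = ET.fromstring(xml)  # root is the <process> element
path = ("TransactionType/data_xml/transaction/sub_documents/"
        "transactionLine[@data_type='3']/Ref")

# findall() returns every match, like the original select expression.
all_refs = [ref.text for ref in root.findall(path)]

# find() returns only the first match in document order, like (exp)[1].
first_ref = root.find(path).text

print(all_refs)   # ['B', 'D']
print(first_ref)  # B
```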

perform mathematical operations on a number without changing the attached text

I need a formula that can multiply or divide all the numbers in a string without changing the text attached to the numbers.
I need the numbers in the next column to automatically change according to the given mathematical operation, but the text from the original line must remain unchanged.
I've tried using a combination of REGEXMATCH and REGEXEXTRACT and by doing this I just get the result of multiplying/dividing all the numbers in the string (no text whatsoever).
I also had no success using REGEXREPLACE. I'm not even sure we can actually use it in this case, and maybe I need a different formula instead. Maybe you first need to extract the numbers, multiply them and use something like TEXTJOIN or CONCATENATE to put them together in a string with the values already changed, and is this even possible in this specific example? It's totally fine to perform the operation in several steps if needed (for example, adding SPLIT function or something like that), but the format of the raw data we need to enter and recalculate, unfortunately, cannot be modified.
A sample table for better visualisation can be seen below. Any help would be greatly appreciated!
| Raw data | Operation | Desired outcome |
| --- | --- | --- |
| 25STR/40DEX/70FRES | *0.25 | 6.25STR/10DEX/17.5FRES |
| 80VIT/30INT/50CRES | *0.75 | 60STR/22.5INT/37.5CRES |
| 60VIT/20STR/45LRES | *1.25 | 75VIT/25STR/56.25LRES |
You may try:
=byrow(index(bycol(split(A2:A,"/"),lambda(z,ifna(ifs(left(B2:B,1)="*",regexextract(z,"\d+")*mid(B2:B,2,99),left(B2:B,1)="/",round(regexextract(z,"\d+")/mid(B2:B,2,99),2))&regexextract(z,"\d+(.*)"))))),lambda(y,if(y="",,join("/",y))))
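For readers who want to see the underlying idea outside of Sheets, the same "find each number, apply the operation, keep the text" logic can be sketched in Python. The function name `scale_numbers` is hypothetical, and rounding to two decimals mirrors the `round(..., 2)` in the formula above; this is only an illustration, not a replacement for the formula.

```python
import re

def scale_numbers(text, operation):
    """Apply an operation such as '*0.25' or '/4' to every number in text."""
    operator, factor = operation[0], float(operation[1:])
    if operator not in "*/":
        raise ValueError("operation must start with '*' or '/'")

    def replace(match):
        number = float(match.group())
        result = number * factor if operator == "*" else number / factor
        # Round to 2 decimals and drop trailing zeros (6.25, 10, 17.5).
        return f"{round(result, 2):g}"

    # Only the digits are replaced; the attached letters and slashes stay put.
    return re.sub(r"\d+(?:\.\d+)?", replace, text)

print(scale_numbers("25STR/40DEX/70FRES", "*0.25"))  # 6.25STR/10DEX/17.5FRES
```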

Splitting based on a condition and "Array arguments to IF are of different size"

I'm working on a sheet that is extracted from a system, and all the data is in one cell, so I need to split it. That is straightforward for all cells except the results. As you can see, the test results should follow the same pattern as the result status. So I split one column (test status) the regular way, and I tried to split the test results based on an IF condition.
It worked perfectly; however, for some statuses the test results were not splitting because of the error "Array arguments to IF are of different size."
How do I fix this? Please help.
Thank you
That is because the IF() function is checking only the first cell of the C2:G2 range. Concatenate the cells into a single string, then use the SEARCH function to detect the word Final. Try:
=IF(ISNUMBER(SEARCH("Final",JOIN(",",C2:G2))),SPLIT(H2,","),"")
You may try this:
=BYCOL(C2:G2,LAMBDA(Σ,(IF(Σ<>"Final",,INDEX(SPLIT(H2,","),, COUNTIF(C2:Σ,Σ))))))
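The logic of the second formula (the n-th "Final" status receives the n-th piece of the split results, other columns stay blank) can be sketched in Python as follows. The function name and the sample values are hypothetical, and returning an empty string when there are more "Final" statuses than results is an assumption made for this sketch.

```python
def split_results_by_status(statuses, results, keyword="Final"):
    """Place the n-th result under the n-th status equal to keyword."""
    pieces = results.split(",")
    output = []
    seen = 0  # how many keyword statuses have been filled so far
    for status in statuses:
        if status == keyword:
            # Assumption: leave the cell blank if the results run out.
            output.append(pieces[seen] if seen < len(pieces) else "")
            seen += 1
        else:
            output.append("")
    return output

print(split_results_by_status(["Final", "Pending", "Final", "", ""], "1.2,3.4"))
```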

Matlab: What's the most efficient approach to parse a large table or cell array with regexp when sometimes there is no match?

I am working with a messy manually maintained "database" that has a column containing a string with name,value pairs. I am trying to parse the entire column with regexp to pull out the values. The column is huge (>100,000 entries). As a proxy for my actual data, let's use this code:
line1={'''thing1'': ''-583'', ''thing2'': ''245'', ''thing3'': ''246'', ''morestuff'':, '''''};
line2={'''thing1'': ''617'', ''thing2'': ''239'', ''morestuff'':, '''''};
line3={'''thing1'': ''unexpected_string(with)parens5'', ''thing2'': 245, ''thing3'':''246'', ''morestuff'':, '''''};
mycell=vertcat(line1,line2,line3);
This captures the general issues encountered in the database. I want to extract what thing1, thing2, and thing3 are in each line using cellfun to output a scalar cell array. They should normally be 3 digit numbers, but sometimes they have an unexpected form. Sometimes thing3 is completely missing, without the name even showing up in the line. Sometimes there are minor formatting inconsistencies, like single quotes missing around the value, spaces missing, or dashes showing up in front of the three digit value. I have managed to handle all of these, except for the case where thing3 is completely missing.
My general approach has been to use expressions like this:
expr1='(?<=thing1''):\s?''?-?([\w\d().]*?)''?,';
expr2='(?<=thing2''):\s?''?-?([\w\d().]*?)''?,';
expr3='(?<=thing3''):\s?''?-?([\w\d().]*?)''?,';
This looks behind for thingX' and then tries to match : followed by zero or one spaces, followed by 0 or 1 single quote, followed by zero or one dash, followed by any combination of letters, numbers, parentheses, or periods (this is defined as the token), using a lazy match, until zero or one single quote is encountered, followed by a comma. I call regexp as regexp(___,'tokens','once') to return the matching token.
The problem is that when there is no match, regexp returns an empty array. This prevents me from using, say,
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),mycell);
unless I call it with 'UniformOutput',false. The problem with that is twofold. First, I need to then manually find the rows where there was no match. For example, I can do this:
emptyout=cellfun(@(x) isempty(x),out);
emptyID=find(emptyout);
backfill=cell(length(emptyID),1);
[backfill{:}]=deal('Unknown');
out(emptyID)=backfill;
In this example, emptyID has a length of 1, so this code is overkill. But I believe this is the correct way to generalize for when it is longer. This code will replace every empty cell array in out with the string Unknown. But this leads to the second problem. I've now got a 'messy' cell array of non-scalar values. I cannot, for example, check unique(out) as a result.
Pardon the long-windedness but I wanted to give a clear example of the problem. Now my actual question is in a few parts:
1. Is there a way to accomplish what I'm trying to do without using 'UniformOutput',false? For example, is there a way to have regexp pass a custom string if there is no match (e.g. pass 'Unknown' if there is no match)? I can think of one 'cheat', which would be to use the | operator in the expression, and if the first token is not matched, look for something that is ALWAYS found. I would then still need to double back through the output and change every instance of that result to 'Unknown'.
2. If I take the 'UniformOutput',false approach, how can I recover a scalar cell array at the end to easily manipulate it (e.g. pass it through unique)? I will admit I'm not 100% clear on scalar vs nonscalar cell arrays.
3. If there is some overall different approach that I'm not thinking of, I'm also open to it.
Tangential to the main question, I also tried using a single expression to run regexp using 3 tokens to pull out the values of thing1, thing2, and thing3 in one pass. This seems to require 'UniformOutput',false even when there are no empty results from regexp. I'm not sure how to get a scalar cell array using this approach (e.g. an Nx1 cell array where each cell is a 3x1 cell).
At the end of the day, I want to build a table using these results:
mytable=table(out1,out2,out3);
Edit: Using celldisp sheds some light on the problem:
celldisp(out)
out{1}{1} =
246
out{2} =
Unknown
out{3}{1} =
246
I assume that I need to change the structure of out so that the contents of out{1}{1} and out{3}{1} are instead just out{1} and out{3}. But I'm not sure how to accomplish this if I need 'UniformOutput',false.
Note: I've not used MATLAB and this doesn't answer the "efficient" aspect, but...
How about forcing there to always be a match?
Since what you really want is a match every time, how about allowing an empty match?
On the MATLAB help page for regexp I can see an 'emptymatch' option; perhaps this is something to try.
E.g.
the_thing_i_want_to_find|
Match "the_thing_i_want_to_find" or an empty match, note the | character.
In capture group it might look like this:
(the_thing_i_want_to_find|)
As a workaround, I have found that regexprep can be used to find and fill in entries where thing3 is missing. For example:
replace='$1 ''thing3'': ''Unknown'', ''morestuff''';
missingexpr='(?<=thing2'':\s?)(''?-?[\w\d().]*?''?,) ''morestuff''';
regexprep(mycell{2},missingexpr,replace)
ans =
''thing1': '617', 'thing2': '239', 'thing3': 'Unknown', 'morestuff':, '''
Applying it to the entire array:
fixedcell=cellfun(@(x) regexprep(x,missingexpr,replace),mycell);
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),fixedcell,'UniformOutput',false);
This feels a little roundabout, but it works.
cellfun can be replaced with a plain old for loop. Your code will either be equally fast, or maybe even faster. cellfun is implemented with a loop anyway, there is no advantage of using it other than fewer lines of code. In your explicit loop, you can then check the output of regexp, and build your output array any way you like.
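The explicit-loop idea can be sketched as follows. This is written in Python rather than MATLAB, purely as an illustration: the helper name `extract` is hypothetical, the sample lines are the question's proxy data with the MATLAB quote-doubling removed, and the pattern is a direct port of the question's expressions. The point is that checking each result inside the loop lets you substitute 'Unknown' immediately and end up with a flat list of strings.

```python
import re

# The question's proxy data, with MATLAB's doubled quotes undone.
lines = [
    "'thing1': '-583', 'thing2': '245', 'thing3': '246', 'morestuff':, ''",
    "'thing1': '617', 'thing2': '239', 'morestuff':, ''",
    "'thing1': 'unexpected_string(with)parens5', 'thing2': 245, "
    "'thing3':'246', 'morestuff':, ''",
]

def extract(lines, name, default="Unknown"):
    """Return one string per line: the value of name, or default if absent."""
    pattern = re.compile(
        "(?<=" + re.escape(name) + r"'):\s?'?-?([\w().]*?)'?,"
    )
    values = []
    for line in lines:
        match = pattern.search(line)
        # Handle the no-match case right here instead of post-processing.
        values.append(match.group(1) if match else default)
    return values

out3 = extract(lines, "thing3")
print(out3)               # ['246', 'Unknown', '246']
print(sorted(set(out3)))  # ['246', 'Unknown']
```

Because every entry is a plain string, operations like taking the unique values work directly on the result.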

Nested Regexmatch Not Working on Range of Zeros and Ones

I have a sum filter formula and have nested a REGEXMATCH function within it as a condition to filter the range to be summed.
The full formula looks like:
=sum(filter(data,
region1=$AF$4,
industry=$A11,
quarter=AG$9,
REGEXMATCH(consent,"1")))
The range "consent" is just 0 or 1 for each value in the range.
When I run this function 0 is returned whereas I expect about 1,000.
The documentation for REGEXMATCH says
"This function only works with text (not numbers) as input and returns
text as output. If a number is desired as the output, try using the
VALUE function in conjunction with this function. If numbers are used
as input, convert them to text using the TEXT function."
I'm not sure what to do with that. I tried the following:
REGEXMATCH(consent,1) // no luck
REGEXMATCH(TEXT(consent),"1") // no luck
REGEXMATCH(TEXT(consent),TEXT(1)) // no luck
But, if I do this:
REGEXMATCH(consent,".*") // does work for all data in consent
How can I tell GSheets to REGEXMATCH on the range consent where it equals 1?
I think the documentation is a bit misleading, because while you can convert to text using the TEXT function (which requires a second argument that prescribes the format of the output, which is why your attempt was not working), it is probably not the easiest way to do it. Probably better would be TO_TEXT, or simply appending &"":
REGEXMATCH(TO_TEXT(consent),"1")
REGEXMATCH(consent&"","1")
That being said, is there a reason you can't just use consent=1 (in which case, you could just use consent by itself as an argument in FILTER)?
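The same "regex functions need text, not numbers" issue exists in other languages, which may make the two fixes easier to see. Below is a minimal Python sketch with made-up sample values: converting each flag to text before matching is the analogue of TO_TEXT, and comparing to 1 directly is the analogue of using consent=1 in FILTER.

```python
import re

# Hypothetical sample values standing in for the named ranges.
data = [100, 250, 400, 50]
consent = [1, 0, 1, 1]  # numeric flags, not text

# Passing a number straight to the regex engine fails, e.g.
# re.search("1", 1) raises TypeError, so convert to text first.
text_total = sum(v for v, c in zip(data, consent) if re.search("1", str(c)))

# Simpler: skip the regex and compare the number directly.
plain_total = sum(v for v, c in zip(data, consent) if c == 1)

print(text_total, plain_total)  # 550 550
```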

Element-in-List testing

For a stylesheet I'm writing (actually for a set of them, each generating a different output format), I have the need to evaluate whether a certain value is present in a list of values. In this case, the value being tested is taken from an element's attribute. The list it is to be tested against comes from the invocation of the stylesheet, and is taken as a top-level <xsl:param> (to be provided on the command-line when I call xsltproc or a Saxon equivalent invocation). For example, the input value may be:
v0_01,v0_10,v0_99
while the attribute values will each look very much like one such value. (Whether a comma is used to separate values, or a space, is not important-- I chose a comma for now because I plan on passing the value via command-line switch to xsltproc, and using a space would require quoting the argument, and I'm lazy-enough to not want to type the extra two characters.)
What I am looking for is something akin to Perl's grep, wherein I can see if the value I currently have is contained in the list. It can be done with sub-string tests, but this would have to be clever so as not to get a false-positive (v0_01 should not match a string that contains v0_011). It seems that the only non-scalar data-type that XSL/XSLT supports is a node-set. I suppose it's possible to convert the list into a set of text nodes, but that seems like over-kill, even compared to making a sub-string test with extra boundaries-checking to prevent false matches.
Actually, using XPath string functions is the right way to do it. All you have to make sure is that you test for the delimiters as well:
contains(concat(',', $list, ','), concat(',', $value, ','))
would return a Boolean value. Or you might use one of these:
substring-before(concat('|,', $list, ',|'), concat(',', $value, ','))
or
substring-after(concat('|,', $list, ',|'), concat(',', $value, ','))
If you get an empty string as the result, $value is not in the list.
EDIT:
@Dimitre's comment is correct: substring-before() (or substring-after()) would also return the empty string if the found string is the first (or the last) in the list. To avoid that, I added something at the start and the end of the list. Still, contains() is the recommended way of doing this.
In addition to the XPath 1.0 solution provided by Tomalak, using XPath 2.0 one can tokenize the list of values:
exists(tokenize($list, ',')[. = $value])
evaluates to true() if and only if $value is contained in the list of values $list
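Both approaches, the delimiter-wrapped substring test and tokenizing, can be sketched in Python to show why the delimiters matter. The function names are hypothetical and the list values are taken from the question's example.

```python
def in_list_wrapped(value, values, sep=","):
    """Substring test with delimiters on both sides, like the contains() form."""
    return f"{sep}{value}{sep}" in f"{sep}{values}{sep}"

def in_list_tokenized(value, values, sep=","):
    """Split the list and compare whole items, like the tokenize() form."""
    return value in values.split(sep)

allowed = "v0_011,v0_10,v0_99"

# A naive substring test gives a false positive for v0_01 ...
print("v0_01" in allowed)                   # True
# ... while both delimiter-aware tests reject it.
print(in_list_wrapped("v0_01", allowed))    # False
print(in_list_tokenized("v0_01", allowed))  # False
print(in_list_wrapped("v0_10", allowed))    # True
```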