Nested REGEXMATCH Not Working on Range of Zeros and Ones

I have a sum filter formula and have nested a REGEXMATCH function within it as a condition to filter the range to be summed.
The full formula looks like:
=sum(filter(data,
region1=$AF$4,
industry=$A11,
quarter=AG$9,
REGEXMATCH(consent,"1")))
The range "consent" is just 0 or 1 for each value in the range.
When I run this formula, it returns 0, whereas I expect about 1,000.
The documentation for REGEXMATCH says:
"This function only works with text (not numbers) as input and returns
text as output. If a number is desired as the output, try using the
VALUE function in conjunction with this function. If numbers are used
as input, convert them to text using the TEXT function."
I'm not sure what to do with that. I tried the following:
REGEXMATCH(consent,1) // no luck
REGEXMATCH(TEXT(consent),"1") // no luck
REGEXMATCH(TEXT(consent),TEXT(1)) // no luck
But, if I do this:
REGEXMATCH(consent,".*") // does work for all data in consent
How can I tell GSheets to REGEXMATCH on the range consent where it equals 1?

I think the documentation is a bit misleading. You can convert to text with the TEXT function, but it requires a second argument that specifies the output format (which is why your attempt did not work), and it is probably not the easiest approach anyway. TO_TEXT, or simply appending &"", is simpler:
REGEXMATCH(TO_TEXT(consent),"1")
REGEXMATCH(consent&"","1")
That being said, is there a reason you can't just use consent=1 (in which case, you could just use consent by itself as an argument in FILTER)?
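To make the two options concrete, here is a rough Python analogue (not Sheets code) of SUM(FILTER(...)) with a regex condition on text-converted values versus a plain numeric comparison. The helper names sum_if_regex and sum_if_equal are hypothetical. In Python, re.search("1", 1) raises TypeError, which loosely parallels REGEXMATCH rejecting numeric input; converting with str() plays the role of &"" or TO_TEXT. Note that the unanchored pattern "1" would also match "10", which is harmless here only because consent holds just 0 or 1.

```python
import re

def sum_if_regex(data, consent, pattern="1"):
    """Rough analogue of SUM(FILTER(data, REGEXMATCH(consent&"", pattern)))."""
    # str(c) plays the role of &"" / TO_TEXT: the regex only ever sees text.
    return sum(d for d, c in zip(data, consent) if re.search(pattern, str(c)))

def sum_if_equal(data, consent, target=1):
    """Rough analogue of SUM(FILTER(data, consent=1)): a plain numeric comparison."""
    return sum(d for d, c in zip(data, consent) if c == target)

data = [10, 20, 30, 40]
consent = [1, 0, 1, 0]
print(sum_if_regex(data, consent), sum_if_equal(data, consent))  # 40 40
```

Both helpers sum the same rows; the comparison version simply skips the text round trip.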

Related

perform mathematical operations on a number without changing the attached text

I need a formula that can multiply or divide all the numbers in a string without changing the text attached to the numbers.
I need the numbers in the next column to automatically change according to the given mathematical operation, but the text from the original line must remain unchanged.
I've tried combining REGEXMATCH and REGEXEXTRACT, but that only gives me the multiplied/divided numbers with no text at all.
I also had no success with REGEXREPLACE, and I'm not sure it can be used in this case at all; maybe I need a different formula. Perhaps the numbers first need to be extracted and multiplied, and then joined back into a string with something like TEXTJOIN or CONCATENATE. Is that even possible here? It's fine to do the operation in several steps if needed (for example, with SPLIT), but unfortunately the format of the raw data that we enter and recalculate cannot be changed.
A sample table for better visualisation can be seen below. Any help would be greatly appreciated!
Raw data | Operation | Desired outcome
25STR/40DEX/70FRES | *0.25 | 6.25STR/10DEX/17.5FRES
80VIT/30INT/50CRES | *0.75 | 60VIT/22.5INT/37.5CRES
60VIT/20STR/45LRES | *1.25 | 75VIT/25STR/56.25LRES
You may try (with the raw strings in A2:A and operations such as *0.25 or /2 in B2:B):
=byrow(index(bycol(split(A2:A,"/"),lambda(z,ifna(ifs(left(B2:B,1)="*",regexextract(z,"\d+")*mid(B2:B,2,99),left(B2:B,1)="/",round(regexextract(z,"\d+")/mid(B2:B,2,99),2))&regexextract(z,"\d+(.*)"))))),lambda(y,if(y="",,join("/",y))))
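The formula splits each string on "/", applies the operation to the leading number of each piece, re-attaches the text suffix, and joins the pieces again. The same steps are sketched below in Python for readability; apply_operation is a hypothetical name, and unlike the formula's \d+ it also accepts decimal inputs. Division is rounded to 2 places to mirror the ROUND(..., 2) in the formula.

```python
import re

def apply_operation(raw: str, operation: str) -> str:
    """Apply '*k' or '/k' to the leading number of each '/'-separated segment,
    keeping the text suffix (STR, DEX, ...) unchanged."""
    op, factor = operation[0], float(operation[1:])
    out = []
    for segment in raw.split("/"):
        m = re.match(r"(\d+(?:\.\d+)?)(.*)", segment)
        if m is None:
            out.append(segment)  # no leading number: keep the segment as-is
            continue
        number, suffix = float(m.group(1)), m.group(2)
        if op == "*":
            value = number * factor
        elif op == "/":
            value = round(number / factor, 2)  # mirrors ROUND(..., 2)
        else:
            raise ValueError(f"unsupported operation: {operation!r}")
        # :g drops a trailing ".0" (10.0 -> "10"); it switches to exponent
        # notation beyond about 6 significant digits, fine for this data.
        out.append(f"{value:g}{suffix}")
    return "/".join(out)
```

For example, apply_operation("25STR/40DEX/70FRES", "*0.25") gives "6.25STR/10DEX/17.5FRES".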

How to get the first occurrence of value with the matching number from multiple sets

My request contains multiple elements, and I need the first occurrence of the element where data_type="3". The data_type values (0, 2, 3, 4) appear in random order, so several elements can have data_type="3".
When I use the XPath expression below, it returns all the values where data_type='3':
<xsl:value-of select="/process/TransactionType/data_xml/transaction/sub_documents/transactionLine[@data_type='3']/Ref"/>
How can I get only one value instead of all of them?
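In XPath, if an expression exp returns a sequence of nodes and you want only the first one, wrap it in parentheses and add a positional predicate: (exp)[1]. Here that gives (/process/TransactionType/data_xml/transaction/sub_documents/transactionLine[@data_type='3']/Ref)[1]. The parentheses matter: without them, a trailing [1] would select the first Ref child of each matching transactionLine rather than the first match overall.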
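The same "first match only" idea can be tried outside XSLT with Python's standard xml.etree.ElementTree. ElementTree implements only a limited XPath subset and does not accept the parenthesized (exp)[1] form, so this sketch uses find(), which returns the first matching element in document order, and contrasts it with findall(). The SAMPLE document and the helper names are hypothetical stand-ins, since the real input is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal input; the real document from the question is not shown.
SAMPLE = """<process><TransactionType><data_xml><transaction><sub_documents>
<transactionLine data_type="2"><Ref>A</Ref></transactionLine>
<transactionLine data_type="3"><Ref>B</Ref></transactionLine>
<transactionLine data_type="3"><Ref>C</Ref></transactionLine>
</sub_documents></transaction></data_xml></TransactionType></process>"""

# Same path as the question, relative to the <process> root element.
PATH = ("TransactionType/data_xml/transaction/sub_documents/"
        "transactionLine[@data_type='3']/Ref")

def first_ref(xml_text):
    """Text of the first matching Ref in document order, or None."""
    node = ET.fromstring(xml_text).find(PATH)  # find() stops at the first match
    return None if node is None else node.text

def all_refs(xml_text):
    """Texts of every matching Ref (everything the original path selects)."""
    return [node.text for node in ET.fromstring(xml_text).findall(PATH)]

print(first_ref(SAMPLE))  # B
print(all_refs(SAMPLE))   # ['B', 'C']
```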

Matlab: What's the most efficient approach to parse a large table or cell array with regexp when sometimes there is no match?

I am working with a messy manually maintained "database" that has a column containing a string with name,value pairs. I am trying to parse the entire column with regexp to pull out the values. The column is huge (>100,000 entries). As a proxy for my actual data, let's use this code:
line1={'''thing1'': ''-583'', ''thing2'': ''245'', ''thing3'': ''246'', ''morestuff'':, '''''};
line2={'''thing1'': ''617'', ''thing2'': ''239'', ''morestuff'':, '''''};
line3={'''thing1'': ''unexpected_string(with)parens5'', ''thing2'': 245, ''thing3'':''246'', ''morestuff'':, '''''};
mycell=vertcat(line1,line2,line3);
This captures the general issues encountered in the database. I want to extract the values of thing1, thing2, and thing3 in each line using cellfun to output a scalar cell array. They should normally be three-digit numbers, but sometimes they have an unexpected form. Sometimes thing3 is completely missing, without the name even showing up in the line. Sometimes there are minor formatting inconsistencies, like single quotes missing around the value, spaces missing, or dashes showing up in front of the three-digit value. I have managed to handle all of these, except for the case where thing3 is completely missing.
My general approach has been to use expressions like this:
expr1='(?<=thing1''):\s?''?-?([\w\d().]*?)''?,';
expr2='(?<=thing2''):\s?''?-?([\w\d().]*?)''?,';
expr3='(?<=thing3''):\s?''?-?([\w\d().]*?)''?,';
This looks behind for thingX' and then tries to match a colon followed by zero or one space, zero or one single quote, and zero or one dash, then any combination of letters, digits, parentheses, or periods (captured as the token) using a lazy match, until it reaches zero or one single quote followed by a comma. I call regexp as regexp(___,'tokens','once') to return the matching token.
The problem is that when there is no match, regexp returns an empty array. This prevents me from using, say,
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),mycell);
unless I call it with 'UniformOutput',false. The problem with that is twofold. First, I need to then manually find the rows where there was no match. For example, I can do this:
emptyout=cellfun(@(x) isempty(x),out);
emptyID=find(emptyout);
backfill=cell(length(emptyID),1);
[backfill{:}]=deal('Unknown');
out(emptyID)=backfill;
In this example emptyID has length 1, so this code is overkill, but I believe this is the correct way to generalize when it is longer. This code replaces every empty cell in out with the string 'Unknown'. But this leads to the second problem: I now have a 'messy' cell array of non-scalar values, so I cannot, for example, call unique(out) on the result.
Pardon the long-windedness but I wanted to give a clear example of the problem. Now my actual question is in a few parts:
Is there a way to accomplish what I'm trying to do without using 'UniformOutput',false? For example, is there a way to have regexp pass a custom string if there is no match (e.g. pass 'Unknown' if there is no match)? I can think of one 'cheat', which would be to use the | operator in the expression, and if the first token is not matched, look for something that is ALWAYS found. I would then still need to double back through the output and change every instance of that result to 'Unknown'.
If I take the 'UniformOutput',false approach, how can I recover a scalar cell array at the end to easily manipulate it (e.g. pass it through unique)? I will admit I'm not 100% clear on scalar vs nonscalar cell arrays.
If there is some overall different approach that I'm not thinking of, I'm also open to it.
Tangential to the main question, I also tried using a single expression to run regexp using 3 tokens to pull out the values of thing1, thing2, and thing3 in one pass. This seems to require 'UniformOutput',false even when there are no empty results from regexp. I'm not sure how to get a scalar cell array using this approach (e.g. an Nx1 cell array where each cell is a 3x1 cell).
At the end of the day, I want to build a table using these results:
mytable=table(out1,out2,out3);
Edit: Using celldisp sheds some light on the problem:
celldisp(out)
out{1}{1} =
246
out{2} =
Unknown
out{3}{1} =
246
I assume that I need to change the structure of out so that the contents of out{1}{1} and out{3}{1} are instead just out{1} and out{3}. But I'm not sure how to accomplish this if I need 'UniformOutput',false.
Note: I've not used MATLAB and this doesn't answer the "efficient" aspect, but...
How about forcing there to always be a match?
If you really want a match in order to sidestep this problem, how about an empty match?
The MATLAB regexp help page lists an 'emptymatch' option; perhaps that is worth trying.
E.g.
the_thing_i_want_to_find|
This matches "the_thing_i_want_to_find" or an empty string; note the trailing | character.
In capture group it might look like this:
(the_thing_i_want_to_find|)
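Here is a Python sketch (not MATLAB, and not modelling MATLAB's 'emptymatch' option) of the "always match" idea applied to thing3. One caveat: in a leftmost-match engine, an unanchored (thing|) can succeed with an empty match at the very start of the string before it ever reaches thing, so this sketch anchors at ^ and makes the whole scan optional. The pattern and the helper name thing3_or_unknown are assumptions for illustration.

```python
import re

# Anchored at ^, with the whole search wrapped in an optional group: the engine
# first tries the lazy .*? scan for thing3; if thing3 never appears, the group
# is skipped and the pattern still matches (an empty match), so there is
# always a match object and group(1) is simply None.
FORCED = re.compile(r"^(?:.*?thing3':\s?'?-?([\w().]*?)'?,)?")

def thing3_or_unknown(line, default="Unknown"):
    token = FORCED.match(line).group(1)  # match() never returns None here
    return default if token is None else token
```

With the question's second sample line, "'thing1': '617', 'thing2': '239', 'morestuff':, ''", this returns "Unknown" without any post-processing.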
As a workaround, I have found that regexprep can be used to fix entries where thing3 is missing by inserting a placeholder. For example:
replace='$1 ''thing3'': ''Unknown'', ''morestuff''';
missingexpr='(?<=thing2'':\s?)(''?-?[\w\d().]*?''?,) ''morestuff''';
regexprep(mycell{2},missingexpr,replace)
ans =
''thing1': '617', 'thing2': '239', 'thing3': 'Unknown', 'morestuff':, '''
Applying it to the entire array:
fixedcell=cellfun(@(x) regexprep(x,missingexpr,replace),mycell,'UniformOutput',false);
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),fixedcell,'UniformOutput',false);
This feels a little roundabout, but it works.
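For comparison, the same placeholder insertion can be sketched with Python's re.sub. Python's re only allows fixed-width look-behind, so the (?<=thing2'':\s?) part of missingexpr cannot be used as-is; this sketch folds it into capture group 1 instead. The names MISSING, REPLACE and add_missing_thing3 are hypothetical.

```python
import re

# Python's re rejects variable-width look-behind such as (?<=thing2':\s?),
# so that prefix is captured in group 1 and written back by \1.
MISSING = re.compile(r"(thing2':\s?'?-?[\w().]*?'?,) 'morestuff'")
REPLACE = r"\1 'thing3': 'Unknown', 'morestuff'"

def add_missing_thing3(line):
    """Insert a placeholder thing3 when thing2 is followed directly by morestuff."""
    return MISSING.sub(REPLACE, line)
```

Lines that already contain thing3 do not match, because the token class cannot cross the quote, comma and space before 'thing3', so they pass through unchanged. Applying it to every line is a list comprehension: [add_missing_thing3(s) for s in lines].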
cellfun can be replaced with a plain old for loop. Your code will be equally fast, or possibly faster: cellfun is implemented with a loop anyway, so it offers no advantage other than fewer lines of code. In an explicit loop you can check the output of regexp on each iteration and build your output array any way you like.
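A sketch of that explicit loop, written in Python rather than MATLAB: the three sample lines are the question's mycell with MATLAB's doubled quotes undone, and the patterns are Python translations of expr1 to expr3. Each result is checked and replaced by a default on no match, so every column ends up as a flat list of strings (the analogue of a cell array of char vectors that unique can handle). parse_column and the default "Unknown" are illustrative assumptions.

```python
import re

# The question's three sample lines, with MATLAB's doubled quotes undone.
LINES = [
    "'thing1': '-583', 'thing2': '245', 'thing3': '246', 'morestuff':, ''",
    "'thing1': '617', 'thing2': '239', 'morestuff':, ''",
    "'thing1': 'unexpected_string(with)parens5', 'thing2': 245, 'thing3':'246', 'morestuff':, ''",
]

NAMES = ("thing1", "thing2", "thing3")
# Python translations of expr1..expr3 (same look-behind and lazy token).
PATTERNS = {name: re.compile(rf"(?<={name}'):\s?'?-?([\w().]*?)'?,") for name in NAMES}

def parse_column(lines, default="Unknown"):
    columns = {name: [] for name in NAMES}
    for line in lines:  # plain loop instead of cellfun
        for name, pattern in PATTERNS.items():
            m = pattern.search(line)
            # Check each result explicitly; a miss becomes the default string,
            # so every entry is a plain string, never an empty container.
            columns[name].append(m.group(1) if m else default)
    return columns

cols = parse_column(LINES)
print(cols["thing3"])               # ['246', 'Unknown', '246']
print(sorted(set(cols["thing3"])))  # ['246', 'Unknown']
```

In MATLAB the same shape would be a preallocated cell(n,1) filled inside the loop, which can then go straight into table(out1,out2,out3).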

Airtable If-statement outputting NaN

I'm using an IF statement to assign integers based on strings from another field. This seems to work, but when I reference these columns I get NaN. My formula is below. I tried wrapping the output values in INT(), but that seemed to break everything. Am I missing something?
IF(FIND('1',{Functional response}),-4,
IF(FIND('2',{Functional response}),-2,
IF(FIND('3',{Functional response}),0,
IF(FIND('4',{Functional response}),2,
IF(FIND('5',{Functional response}),4,"")))))
Assuming Functional response can only store a number from 1 to 5 as a string, a simple option in Excel would be to first convert the string to a number and then use the CHOOSE function to assign a value. This works because the numbers are sequential integers. Assuming cell K2 holds the value of Functional response, your formula could be any of:
=CHOOSE(--K2,-4,-2,0,2,4)
=CHOOSE(K2+0,-4,-2,0,2,4)
=CHOOSE(K2-0,-4,-2,0,2,4)
=CHOOSE(K2*1,-4,-2,0,2,4)
=CHOOSE(K2/1,-4,-2,0,2,4)
Sending a string that contains a pure number through a math operation makes Excel convert it to a number. If the operation does not change the value (negating twice, adding or subtracting 0, multiplying or dividing by 1), you simply get the string back as a number.
CHOOSE is like a sequential IF function: supply an integer as the first argument and it returns the value at that position in the subsequent list. If the number you supply is greater than the number of options, you will get an error.
Alternatively, you could do a straight math conversion on the number stored as a string in K2 with the following formula:
=(K2-3)*2
And as my final option, you could build a table and use VLOOKUP or INDEX/MATCH.
NOTE: If the lookup values in B2:B6 were stored as strings instead of numbers, you would need to use K2 instead of --K2 as the lookup value.
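To check that CHOOSE and the (K2-3)*2 shortcut agree, here is a small Python sketch; choose is a hypothetical stand-in for the spreadsheet function (1-based, error when out of range) and int() stands in for the --K2 coercion from text to number.

```python
def choose(index, *options):
    """Spreadsheet-style CHOOSE: 1-based index; out-of-range index is an error."""
    if not 1 <= index <= len(options):
        raise ValueError(f"index {index} out of range 1..{len(options)}")
    return options[index - 1]

def score(functional_response):
    # int(...) stands in for --K2: turn the digit stored as text into a number.
    return choose(int(functional_response), -4, -2, 0, 2, 4)

def score_linear(functional_response):
    # The (K2-3)*2 alternative: same results for "1".."5".
    return (int(functional_response) - 3) * 2

print([score(str(i)) for i in range(1, 6)])  # [-4, -2, 0, 2, 4]
```

The linear formula works only because the mapped values are evenly spaced; CHOOSE or a lookup table handles arbitrary values.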

Best way to show blank cell if value is zero

=COUNTIFS(Orders!$T:$T,$B4)
returns either 0 or a positive count.
I use it across 1,500 cells, so the sheet fills up with 0s.
I'd like to hide the zeros with the following formula:
if(COUNTIFS(Orders!$T:$T,$B3,Orders!$F:$F,""&P$1&"*")=0,
"",
COUNTIFS(Orders!$T:$T,$B3,Orders!$F:$F,""&P$1&"*"))
This evaluates every COUNTIFS twice and increases the calculation time.
How can I do this in a single formula that leaves the cell empty when the value is 0 and otherwise displays the result?
I suggest this cell-function:
=IFERROR(1/(1/COUNTIFS(Orders!$T:$T,$B4)))
EDIT:
I'm not sure what to add as explanation. Basically to replace the result of a complex calculation with blank cells if it results in 0, you can wrap the complex function in
IFERROR(1/(1/ ComplexFunction() ))
It works by taking the reciprocal (1/X) of the result twice, which returns the original result in every case except 0, where a #DIV/0! error is generated. IFERROR then catches that error and returns a blank cell.
The advantage of this method is that it does not calculate the complex function twice, which can give a significant gain in speed and readability. It also does not merely mask the displayed value the way a custom number format does, which can matter if this cell is used in further formulas.
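The double-reciprocal trick is sketched below in Python, with ZeroDivisionError playing the part of #DIV/0! and the except branch playing IFERROR. blank_if_zero is a hypothetical name; it takes a callable so the sketch can show the expensive calculation runs only once. With floating point, 1/(1/x) may differ from x in the last bit for some values, which a spreadsheet's displayed precision normally hides.

```python
def blank_if_zero(compute):
    """Evaluate compute() once; return "" if the result is 0, else the result."""
    value = compute()  # the "complex function" runs exactly once
    try:
        # Two reciprocals give the value back; only 0 raises (the #DIV/0! case).
        return 1 / (1 / value)
    except ZeroDivisionError:
        return ""  # what IFERROR turns the error into: a blank
```

For example, blank_if_zero(lambda: 0) returns "" and blank_if_zero(lambda: 4) returns 4.0.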
You only need to set the number format for your range of cells.
Go to the menu Format-->Number-->More Formats-->Custom Number Format...
In the entry area at the top, enter the following: #;-#;""
The "format" of the format string is
(positive value format) ; (negative value format) ; (zero value format)
You can also apply colors, commas, or other formatting in each section.
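To illustrate how the three sections of #;-#;"" are picked, here is a Python sketch of the resulting display text for whole numbers (which is what COUNTIFS returns). display_with_sections is a hypothetical helper; in the sheet itself only the display changes, and the stored value stays 0.

```python
def display_with_sections(value):
    """Mimic the display of the custom format #;-#;"" for whole numbers."""
    if value > 0:
        return str(value)         # section 1 "#": positive numbers as digits
    if value < 0:
        return "-" + str(-value)  # section 2 "-#": minus sign, then digits
    return ""                     # section 3 "": zero displays as nothing
```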
Instead of your =COUNTIFS(Orders!$T:$T,$B4), use:
=REGEXREPLACE(""&COUNTIFS(Orders!$T:$T,$B4), "^0$", )
Also, to speed things up, avoid per-row formulas and use ARRAYFORMULA instead.
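The REGEXREPLACE approach converts the count to text and blanks it only when the whole text is exactly "0". A Python sketch of the same idea follows; blank_zero_text is a hypothetical name. As in this sketch, the result is text rather than a number, so later arithmetic on it may need a conversion (VALUE in Sheets).

```python
import re

def blank_zero_text(count):
    # ""&COUNTIFS(...) first turns the count into text; the anchored "^0$"
    # matches only the exact text "0", so "10" or "105" are left alone.
    return re.sub(r"^0$", "", str(count))
```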