Using regextract and importrange together make a "duplicate" formula - regex

I have a spreadsheet where I look if the data (website) already exist on the master sheet.
=if(countif(importrange("Spreadsheet Key","Leads!N:N"),K2)>0,"COMPANY EXISTS!","")
But the above formula is not dynamic enough. If there are companies with co.uk and on the master sheet if it's registered under .com, it won't show "COMPANY EXISTS!"
So I changed to the formula to look for works after before and after "." on a website.
=ARRAYFORMULA(REGEXEXTRACT(UNIQUE(SUBSTITUTE(importrange("Spreadsheet Key","Leads!N:N"),"www.","")), "([0-9A-Za-z-]+)\."))
But it's not working if I try to incorporate with if and countif.
=if(COUNTIF(ARRAYFORMULA(REGEXEXTRACT(SUBSTITUTE(importrange("Spreadsheet Key","Leads!N:N"),"www.",""), "([0-9A-Za-z-]+)\."),L2:L)>0,"Company Exist!",""))
It shows 'Wrong number of arguments to IF. Expected between 2 and 3 arguments, but got 1 arguments'
Can anyone help me out on where I am making the mistake?
Spreadsheet link- https://docs.google.com/spreadsheets/d/1La3oOWiM5KpzRY0MLLEUQC25LzDuQlqTjgFp-VlS8Bo/edit#gid=0
Edited: Made a mistake beforehand, didn't specify on that cell its looking against

Your formula had a typo. You are not closing the Arrayformula and the Countif correctly (the array formula closing parenthesis should go before the , of the count if). So change this:
=if(COUNTIF(ARRAYFORMULA(REGEXEXTRACT(SUBSTITUTE(importrange("Spreadsheet Key","Leads!N:N"),"www.",""), "([0-9A-Za-z-]+)\."),L2:L)>0,"Company Exist!",""))
To this:
=if(COUNTIF(ARRAYFORMULA(REGEXEXTRACT(SUBSTITUTE(IMPORTRANGE("Spreadsheet Key","Leads!N:N"),"www.",""), "([0-9A-Za-z-]+)\.")),L3:L)>0,"Company Exist!","")
I hope this has helped you. Let me know if you need anything else or if you did not understood something. :)

try:
=ARRAYFORMULA(IFNA(IF(IFNA(REGEXEXTRACT(SUBSTITUTE(IMPORTRANGE(
"1bnz7Y_xVN9Jo80aCBBeMBMJBnMDHkbZQUWnmL20CRi8", "Leads!N:N"),
"www.", ), "([0-9A-Za-z-]+)\."))>0, "Company Exist!", )))

Related

IMPORTDATA in Google Sheets and XPath

On GoogleSheet I need to crawl the Amazon's share's price (following xPath) on this page : https://www.boursorama.com/cours/AMZN/
//*[#id="main-content"]/div/section[1]/header/div/div/div[1]/div[1]/div/div[1]/span[1]
I can do it using this formula and it works :
IMPORTXML("https://www.boursorama.com/cours/AMZN/", "//*[#id=""main-content""]/div/section[1]/header/div/div/div[1]/div[1]/div/div[1]/span[1]")
BUT there is a daily limitation of IMPORTXML function.
So to avoid this Google's daily limitation I need to use this elegant method so it should be something like this :
=REGEXEXTRACT(QUERY(TRANSPOSE(IMPORTDATA(
"https://www.boursorama.com/cours/AMZN/"));
"where Col1 contains 'basp:""'");"(\d+.*)""") <-- here is the line where something is wrong
I'm not used to work with REGEX, does someone can help me on it ?
try:
=REGEXREPLACE(QUERY(FLATTEN(IMPORTDATA(
"https://www.boursorama.com/cours/AMZN")),
"where Col1 contains 'data-ist-bid-price>'", 0),
"</?\S+[^<>]*>", )*1

Google Sheets ARRAYFORMULA to skip blank rows

How to make my ARRAYFORMULA(A1 + something else) to stop producing results after there are no more values in A1 column, eg. to skip blank values. By default it gives endlessly "something else".
Here is my demo sheet:
https://docs.google.com/spreadsheets/d/1AikL5xRMB94BKwG34Z_tEEiI07aUAmlbNzxGZF2VeYs/edit?usp=sharing
Actual data in column A1 is regularly changing, rows are being added.
I tried the others and they didn't work. This does though:
=ARRAYFORMULA(filter(A1:B;A1:A<>"";B1:B<>""))
use:
=ARRAYFORMULA(IF(A1:A="";;A1:A+1000))
You can try this formula =ARRAYFORMULA(IF(ISBLANK(A1:A),"",(A1:A + B1:B))) if this works out for you.
Reference:
https://support.google.com/docs/answer/3093290?hl=en

Google Sheets: IFS statement wont return date

I'm currently using the following formula:
=IFS(regexmatch(A2,"Malaysia"),
B2-dataset!B3,REGEXMATCH(A2,"Saudi Arabia"),
B2-dataset!B7,REGEXMATCH(A2,"Taiwan"),
B2-dataset!B11,REGEXMATCH(A2,"Russia"),
B2-dataset!B15,REGEXMATCH(A2,"Greece"),
B2-dataset!B19,REGEXMATCH(A2,"South Africa"),
B2-dataset!B23,REGEXMATCH(A2,"UAE"),
B2-dataset!B27,REGEXMATCH(A2,"Albania"),
B2-dataset!B31,REGEXMATCH(A2,"India"),
B2-dataset!B35,REGEXMATCH(A2,"South Korea"),
B2-dataset!B39,REGEXMATCH(A2,"Turkey"),
B2-dataset!B43)
The idea is that B2 (currently as =date(dd/mm/yyyy) has a deadline date. C2 (in which the formula houses) should show the date when everything should be delivered.
Currently the outcome is a number, not a date. I've tried IF statement, which delivers a date but I can only add 3 arguments. Can someone help me?
Kind regards
if your current output is number like 40000+ it's ok and it's just formatting issue. either you will format it internally or use a formula.
try:
=TEXT(IFS(regexmatch(A2,"Malaysia"),
B2-dataset!B3,REGEXMATCH(A2,"Saudi Arabia"),
B2-dataset!B7,REGEXMATCH(A2,"Taiwan"),
B2-dataset!B11,REGEXMATCH(A2,"Russia"),
B2-dataset!B15,REGEXMATCH(A2,"Greece"),
B2-dataset!B19,REGEXMATCH(A2,"South Africa"),
B2-dataset!B23,REGEXMATCH(A2,"UAE"),
B2-dataset!B27,REGEXMATCH(A2,"Albania"),
B2-dataset!B31,REGEXMATCH(A2,"India"),
B2-dataset!B35,REGEXMATCH(A2,"South Korea"),
B2-dataset!B39,REGEXMATCH(A2,"Turkey"),
B2-dataset!B43), "dd/mm/yyyy")

Regex (re2 googlesheets) multiple values in multiline cell

Getting stuck on how to read and pretty up these values from a multiline cell via arrayformula.
Im using regex as preceding line can vary.
just formulas please, no custom code
The first column looks like a set of these:
```
[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1
<br/>
Where to be useful elsewhere, at minimum it needs each [tag] set ([effect\d], [feature\d], ect) to be in one column each, for example the 'effects' column would look like:
ATTR_A:1000,ATTR_B:8
and so on.
Desired output can also be seen in the included spreadsheet
<br/>
<b>Here is the example spreadsheet:</b>
https://docs.google.com/spreadsheets/d/1arMaaT56S_STTvRr2OxCINTyF-VvZ95Pm3mljju8Cxw/edit?usp=sharing
**Current REGEXREPLACE**
Kinda works, finds each 'type' and 'value' great, just cant figure out how to extract just that from the rest, tried capture (and non-capturing) groups before and after but didnt work
=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.][effect\d][\n.](.)\n(.)","1:$1 2:$2"))
**Current SUBSTITUTE + REGEXEXTRACT + REGEXREPLACE**
A different approach entirely, also kinda works, longer form though and left with having to parse the values out of that string, where got stuck again. Idea was to use this to simplify, then regexreplace like above. Getting stuck removing content around the final matches though, and if can do that then above approach is fine too.
// First ran a substitute
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE($A3:$A,char(10),";"),";;",char(10)))
// Then variation of this (gave up on single line 'effect/d' so broke it up to try and get it working)
=ARRAYFORMULA(IF(A3:A<>"",IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect0]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect1]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect2]);(.)$")&";;"),""))
// Then use regexreplace like above
=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.);type = (.);;","1:$1 2:$2"))
**--EDIT--**
Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of matching 'type's to those extra columns (see spreadsheet).
All good if you cant though, just realized would need that too for lookups.
**--END OF EDIT--**
<br/>
Ive tried dozens of things, discarding each in turn, had a quick look in version history to grab out two promising attempts and shared them in separate sheets.
One of these also used SUBSTITUTE to simplify input column, im happy for a solution using either RAW or the SUBSTITUTE results.
<br/>
**Potentially Useful links:**
https://github.com/google/re2/wiki/Syntax
<br/>
<b>Just some more words:</b>
I also have looked at dozens of stackoverflow and google support pages, so tried both REGEXEXTRACT and REGEXREPLACE, both promising but missing that final tweak. And i tried dozens of tweaks already on both.
Any help would be great, and hopefully help others in future since examples with spreadsheets are great since every new REGEX seems to be a new adventure ;)
<br/>
P.S. if we can think of better title for OP, please say in comment or your answer :)
paste in B3:
=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
paste in C3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))
paste in D3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))
paste in E3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
paste in F3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))
paste in G3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))
paste in H3:
=ARRAYFORMULA(IFNA(REGEXREPLACE(INDEX(SPLIT(REGEXEXTRACT(
REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
spreadsheet demo
This was a fun exercise. :-)
Caveat first: I have added some "input data". Examples:
[feature1]
name = feature_active_spoiler2
[components]
0 = spoiler,1
1 = spoilerA, 2
So the output has "extra" output.
See the tab ADW's Solution.

Stata: Efficient way to replace numerical values with string values

I have code that currently looks like this:
replace fname = "JACK" if id==103
replace lname = "MARTIN" if id==103
replace fname = "MICHAEL" if id==104
replace lname = "JOHNSON" if id==104
And it goes on for multiple pages like this, replacing an ID name with a first and last name string. I was wondering if there is a more efficient way to do this en masse, perhaps by using the recode command?
I will echo the other answers that suggest a merge is the best way to do this.
But if you absolutely must code the lines item-wise (again, messy) you can generate a long list ("pages") of replace commands by using MS Excel to "help" you write the code. Here is a picture of your Excel sheet with one example, showing the MS Excel formula:
columns:
A B C D
row: 1 last first id code
2 MARTIN JACK 103 ="replace fname=^"&B2&"^ if id=="&C2
You type that in, make sure it looks like Stata code when the formula calculates (aside from the carets), and copy the formula in column D down to the end of your list. Then copy the whole block of Stata code in column D generated by the formulas into your do-file, and do a find and replace (be careful here if you are using the caret elsewhere for mathematical uses!!) for all ^ to be replaced with ", which will end up generating proper Stata syntax.
(This is truly a brute force way of doing this, and is less dynamic in the case that there are subsequent changes to your generation list. All--apologies in advance for answering a question here advocating use of Excel :) )
You don't explain where the strings you want to add come from, but what is generally the best technique is explained at
http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/index.html
Create an associative array of ids vs Fname,Lname
103 => JACK,MARTIN
104 => MICHAEL,JOHNSON
...
Replace
id => hash{id} ( fname & lname )
The efficiency of doing this will be taken care by the programming language used