Regexmatch in Google Sheet to find 2 sections of my website - regex

I'm trying to find the number of users coming to 2 different parts of my website:
/blog/
resources.company
To do this I use REGEXMATCH with | separator:
=SUMPRODUCT(('raw data'!$B$2:$B),REGEXMATCH('RAW - New Users'!$A$2:$A,"/blog/|resources.company"))
However, when I check with this formula:
=SUMPRODUCT(('raw data'!$B$2:$B),REGEXMATCH('RAW - New Users'!$A$2:$A,"/blog/") + REGEXMATCH('RAW - New Users'!$A$2:$A,"resources.company"))
I find more users.
I'm pretty sure the . messes up with the 1st formula. I've tried /blog/|resources\.company but it didn't help. How can I change my first formula so the REGEXMATCH finds everything that contains "resources.company" as well as "/blog"?

Related

Using regextract and importrange together make a "duplicate" formula

I have a spreadsheet where I look if the data (website) already exist on the master sheet.
=if(countif(importrange("Spreadsheet Key","Leads!N:N"),K2)>0,"COMPANY EXISTS!","")
But the above formula is not dynamic enough. If there are companies with co.uk and on the master sheet if it's registered under .com, it won't show "COMPANY EXISTS!"
So I changed to the formula to look for works after before and after "." on a website.
=ARRAYFORMULA(REGEXEXTRACT(UNIQUE(SUBSTITUTE(importrange("Spreadsheet Key","Leads!N:N"),"www.","")), "([0-9A-Za-z-]+)\."))
But it's not working if I try to incorporate with if and countif.
=if(COUNTIF(ARRAYFORMULA(REGEXEXTRACT(SUBSTITUTE(importrange("Spreadsheet Key","Leads!N:N"),"www.",""), "([0-9A-Za-z-]+)\."),L2:L)>0,"Company Exist!",""))
It shows 'Wrong number of arguments to IF. Expected between 2 and 3 arguments, but got 1 arguments'
Can anyone help me out on where I am making the mistake?
Spreadsheet link- https://docs.google.com/spreadsheets/d/1La3oOWiM5KpzRY0MLLEUQC25LzDuQlqTjgFp-VlS8Bo/edit#gid=0
Edited: Made a mistake beforehand, didn't specify on that cell its looking against
Your formula had a typo. You are not closing the Arrayformula and the Countif correctly (the array formula closing parenthesis should go before the , of the count if). So change this:
=if(COUNTIF(ARRAYFORMULA(REGEXEXTRACT(SUBSTITUTE(importrange("Spreadsheet Key","Leads!N:N"),"www.",""), "([0-9A-Za-z-]+)\."),L2:L)>0,"Company Exist!",""))
To this:
=if(COUNTIF(ARRAYFORMULA(REGEXEXTRACT(SUBSTITUTE(IMPORTRANGE("Spreadsheet Key","Leads!N:N"),"www.",""), "([0-9A-Za-z-]+)\.")),L3:L)>0,"Company Exist!","")
I hope this has helped you. Let me know if you need anything else or if you did not understood something. :)
try:
=ARRAYFORMULA(IFNA(IF(IFNA(REGEXEXTRACT(SUBSTITUTE(IMPORTRANGE(
"1bnz7Y_xVN9Jo80aCBBeMBMJBnMDHkbZQUWnmL20CRi8", "Leads!N:N"),
"www.", ), "([0-9A-Za-z-]+)\."))>0, "Company Exist!", )))

Regex (re2 googlesheets) multiple values in multiline cell

Getting stuck on how to read and pretty up these values from a multiline cell via arrayformula.
Im using regex as preceding line can vary.
just formulas please, no custom code
The first column looks like a set of these:
```
[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1
<br/>
Where to be useful elsewhere, at minimum it needs each [tag] set ([effect\d], [feature\d], ect) to be in one column each, for example the 'effects' column would look like:
ATTR_A:1000,ATTR_B:8
and so on.
Desired output can also be seen in the included spreadsheet
<br/>
<b>Here is the example spreadsheet:</b>
https://docs.google.com/spreadsheets/d/1arMaaT56S_STTvRr2OxCINTyF-VvZ95Pm3mljju8Cxw/edit?usp=sharing
**Current REGEXREPLACE**
Kinda works, finds each 'type' and 'value' great, just cant figure out how to extract just that from the rest, tried capture (and non-capturing) groups before and after but didnt work
=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.][effect\d][\n.](.)\n(.)","1:$1 2:$2"))
**Current SUBSTITUTE + REGEXEXTRACT + REGEXREPLACE**
A different approach entirely, also kinda works, longer form though and left with having to parse the values out of that string, where got stuck again. Idea was to use this to simplify, then regexreplace like above. Getting stuck removing content around the final matches though, and if can do that then above approach is fine too.
// First ran a substitute
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE($A3:$A,char(10),";"),";;",char(10)))
// Then variation of this (gave up on single line 'effect/d' so broke it up to try and get it working)
=ARRAYFORMULA(IF(A3:A<>"",IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect0]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect1]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect2]);(.)$")&";;"),""))
// Then use regexreplace like above
=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.);type = (.);;","1:$1 2:$2"))
**--EDIT--**
Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of matching 'type's to those extra columns (see spreadsheet).
All good if you cant though, just realized would need that too for lookups.
**--END OF EDIT--**
<br/>
Ive tried dozens of things, discarding each in turn, had a quick look in version history to grab out two promising attempts and shared them in separate sheets.
One of these also used SUBSTITUTE to simplify input column, im happy for a solution using either RAW or the SUBSTITUTE results.
<br/>
**Potentially Useful links:**
https://github.com/google/re2/wiki/Syntax
<br/>
<b>Just some more words:</b>
I also have looked at dozens of stackoverflow and google support pages, so tried both REGEXEXTRACT and REGEXREPLACE, both promising but missing that final tweak. And i tried dozens of tweaks already on both.
Any help would be great, and hopefully help others in future since examples with spreadsheets are great since every new REGEX seems to be a new adventure ;)
<br/>
P.S. if we can think of better title for OP, please say in comment or your answer :)
paste in B3:
=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
paste in C3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))
paste in D3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))
paste in E3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
paste in F3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))
paste in G3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))
paste in H3:
=ARRAYFORMULA(IFNA(REGEXREPLACE(INDEX(SPLIT(REGEXEXTRACT(
REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
spreadsheet demo
This was a fun exercise. :-)
Caveat first: I have added some "input data". Examples:
[feature1]
name = feature_active_spoiler2
[components]
0 = spoiler,1
1 = spoilerA, 2
So the output has "extra" output.
See the tab ADW's Solution.

Parsing a name from a complex string in Tableau

I have a series of values in Tableau that are long strings intermixed with letters and numbers. I am unable to control the data output, but would like to parse the names from these strings. They follow the following format:
Potato 1TByte 4.5 NFA
Board 256GByte 553 NCA
Launch 4 512GByte 4.5 NFA
Launch 4S 512GByte 4.5 NCA
From each of these, I am attempting to capture the following:
"Potato"
"Board"
"Launch 4"
"Launch 4S"
Each string follows the same format: the name, followed by size, followed by some extra information we don't really care about.
I've tried to put together some text parsing strings, but am coming up short, and am still trying to learn regular expressions.
The Tableau calculated field I was trying to work with was something like the following:
LEFT([String], FIND([String], "Byte") - 2)
The issue is that the text and numbers preceding Byte can be anywhere from 4 to 2 characters and I need a way to identify the length of that.
Any help would be greatly appreciated!
One option which uses a regex replacement:
REGEXP_REPLACE('Launch 4 512GByte 4.5 NFA', ' \d+[A-Z]Byte .*$', '')
This strips off everything from the Byte term to the right, leaving us with only the product name.
You could try the following - this seems to work - Screenshot of Tableau output. Find below the formulas for the various derived columns you see in the screenshot (Your source column is called [Name])
Step1 = LEFT([Name],FIND([Name],"Byte")-1)
Step2 = LEN([Step1])-LEN(REPLACE([Step1]," ",""))
Step3 = FINDNTH([Step1]," ",[Step2])
Step4 = LEFT([Step1],[Step3]-1)
And of course you can nest all these in a single calculated field - kept them as separate columns for easier understanding

HiveQL: Parse strings and count

I am using HiveQL to work with millions of rows of domain name text data stored in HDFS. The following is a hand-selected subset to illustrate lexical diversity. There are duplicate entries.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
mgmtsubnet.mgmtvcn.oraclevcn.com.
asdf.mgmtvcn.oraclevcn.com.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
localhost.
a.localhost.
img.pulsemgr.com.
36.136.154.156.in-addr.arpa.
accounts.spotify.com.
_dmarc.ixia-devops.com.
&eventtype=close&reason=4&duration=35.
&eventtype=close&reason=3&duration=10336.
I am trying to get a count of # of rows based on the last two levels of the domain, where sometimes the 2nd level is absent (i.e. localhost.). For example:
domain_root count
oraclevcn.com. 4
localhost. 1
a.localhost. 1
pulsemgr.com. 1
in-addr.arpa. 1
spotify.com. 1
ixia-devops.com 1
It would be nice to also see how to filter out domains 2nd level is absent.
I am not sure where to start. I have seen use of the SPLIT() function, but that may not be robust since there could be many levels to a domain name, for example: a.b.c.d.e.f.g.h.i etc.
Any ideas are implementations are appreciated.
Below would be the query with regexp_extract.
select domain_root, count(*) from (select regexp_extract('dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.', '[A-Za-z0-9-]+\.[A-Za-z0-9-]+\.$', 0) as domain_root from table) A group by A.domain_root -- replace first argument with column name
regex will extract for domain root with Alphanumeric and special character '-'
hope this helps.

Regex / htaccess : Multiple Pages

What I want
Imagine 3 feeds in a single page what have unique contents and I would like to have a pagination for each feed, so I can paginate 3 feeds at the same time, and when I change a feed's page that doesn't affects the other feeds currently selected pages.
So when I change feed3 to page 12, feed1 remains on page 2 and feed2 remains on page 3
Example
change this : test.com/pages/feed1=2/feed2=3/feed3=12
into this : test.com/index.php?pages=on&feed1=2&feed2=3&feed3=12
Note: there can be more or less than 3 feeds
The Problem
The problem is that I can't convert multiple variables if I want to change the whole link.
This is the regex what I have tried
(\/)+pages+(\/)+([a-zA-Z0-9.-]+)+(\:)+([0-9])
but this gets only /pages/feed1=2
But I would like to convert all the feeds with their variables so I can work with them in PHP.
Take a look at http://php.net/manual/en/function.preg-replace.php , and just do 2 string replacements.
one: replace pages with index.php?pages=on, second replace / with &