Driving Licenses siebel Issue - regex

I have florida driving licenses like A123-123-12-123-1 and A123456789321.Now I am using below expression to show my data like XXXX-XXX-XX-XX1231.
([\s.,:])([a-zA-Z)\d{12}|([a-zA-Z)\d{3}[\s{1}-]\d{2}[\s{1}-]\d{3}[\s{1}-]\d{1}([\s.,:]).
Please let me know how can i use above expression to remove all spaces form the expresson and display the format as i mentioned above.
Thanks

It seems there is a mismatch in the input and output, e.g.
A123-123-12-123-1
XXXX-XXX-XX-XX1231
there are two extra characters (ignoring dashes) in the desired output.
So assuming you want to make the output longer by repeating "12", e.g.
A123-123-12-123-1
A123-123-12-121231
Here is the code:
regex = /(?:[\s.,:])([a-zA-Z)(\d{3})[\s-]?(\d{3})[\s-]?(\d{2})[\s-]?(\d{2})(\d{1})[\s-]?(\d{1})(?:[\s.,:])/
fixed = licence.replace(regex, "$1$2-$3-$4-$5$5$6$7")

Related

Extract digits in between 2 different parameters

I have all data being imported into one cell as:
"<blank space><email address><blank space><CustomerId><blank space><(email address)><line break for next entry>"
Example:
email1#provider.com 12345678 (email1#provider.com)
email224#provider.com 23902490 (email224#provider.com)
I need to extract only the customer ID's, while separating them with a comma, so I tried the following: regexreplace(A2,"([^[:digit:]])",","), however, this also extracts the numbers associated with the emails, so it returns me:
,,,,,1,,,,,,,,,,,,,,12345678,,,,,,,1,,,,,,,,,,,,,,
,,,,,224,,,,,,,,,,,,,,23902490,,,,,,,224,,,,,,,,,,,,,,
Since the email address is set by the user, I don't have control how many digits or if only digits are used in it. I can't seem to understand how to isolate the CustomerIds alone.
Please help!
Edit1:
CustomerID: 64-bit int field, randomly assigned to a client, therefore checking by the length of the string would not work.
Edit2:
For now, I am using the formula below, but I would still be interested in a solution using Regex.
filter(transpose(split($B$4," ")),isnumber(transpose(split($B$4," "))))
If they are separated by a space you should be able to set the space to be your delimiter and extract from there.
https://zapier.com/blog/split-text-excel-zapier/
use:
=ARRAYFORMULA(TEXTJOIN(", ", 1, IFERROR(REGEXEXTRACT(A1:A2, "(?s)(\d{8})"))))

Extract a list of unique text characters/ emojis from a cell

I have a text in cell (A1) like this:
✌😋👅👅☝️😉🍌🍪💧💧
I want to extract the unique emojis from this cell into separate cells:
✌😋👅☝️😉🍌🍪💧
Is this possible?
You want to put each character of ✌😋👅👅☝️😉🍌🍪💧💧 to each cell by splitting using the built-in function of Google Spreadsheet.
Sample formula:
=SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#")
✌😋👅👅☝️😉🍌🍪💧💧 is put in a cell "A1".
Using REGEXREPLACE, # is put to between each character like ✌#😋#👅#👅#☝#️#😉#🍌#🍪#💧#💧#.
Using SPLIT, the value is splitted with #.
Result:
Note:
In your question, the value of ️ which cannot be displayed is included. It's \ufe0f. So "G1" can be seen like no value. But the value is existing. So please be careful this. If you want to remove the value, you can use ✌😋👅👅☝😉🍌🍪💧💧.
References:
REGEXREPLACE
SPLIT
Added:
From marikamitsos's comment, I could notice that my understanding was not correct. So the final result is as follows. This is from marikamitsos.
=TRANSPOSE(UNIQUE(TRANSPOSE(SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#"))))
or try:
=TRANSPOSE(UNIQUE(TRANSPOSE(REGEXEXTRACT(A1, REPT("(.)", LEN(A1))))))
Formula
Appears, one of the best formula solutions would be:
=SPLIT(REGEXREPLACE(A1,"(.)","$1#"),"#")
You may also add some additional checks like skin tones & intermediate chars:
=TRANSPOSE(SPLIT(REGEXREPLACE(A2,"(.[🏻🏼🏽🏾🏿"&CHAR(8205)&CHAR(65039)&"]*)","#$1"),"#"))
It will help to join some emojis as a single emoji.
Script
More precise way is to use the script:
https://github.com/orling/grapheme-splitter/blob/master/index.js
↑
Add the code to Script editor
Add code for sample usage:
function splitEmojis(string) {
var splitter = new GraphemeSplitter();
// split the string to an array of grapheme clusters (one string each)
var graphemes = splitter.splitGraphemes(string);
return graphemes;
}
Tests
Not 100% precise
1
Please note: some emojis are not correctly shown in sheets
🏴󠁧󠁢󠁷󠁬󠁳󠁿🏴󠁧󠁢󠁳󠁣󠁴󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴
↑ emojis:
flag: England
flag: Scotland
flag: Wales
black flag
are the same for Google Sheets.
2
Vlookup function in #GoogleSheets and in #Excel thinks chars
#️⃣ and
*️⃣
are the same!

Simple regex doesn't work in Chrome extension (Popup.js)

I have an issue that is driving me nuts. All other regex' are working, but they "stop" working if it includes a letter.
The string is as following:
مسلسل The Outsider 2020 الموسم الاول الحلقة 8 مترجمة
I want to get the number "8". The text can be understood as "Series x Season y Episode NUMBER". I'm trying to get the number using the following regex:
/(?<=.*الحلقة.*)[0-9]+(?=.*)/g
It's working fine when applying it here https://regexr.com/ OR on the console. I'm implementing it in the Popup.js file.
The following regex is working fine without issues
/(?<=title=")(.*?)(?=")/
My JS code to get the regex:
let episodeNumber = item.match(/(?<=title=")(.*?)(?=")/)[0];
console.log("t: " + episodeNumber);
episodeNumber = episodeNumber.match(/(?<=.*الحلقة.*)[0-9]+(?=.*)/g);
console.log(episodeNumber);
The first log gives me the result (the text above)
The second log gives null in the console. It's really weird.
If the regex isn't formatted well, please replace "الحلقة" with "Episode". It's in Arabic, that's why it would look weird in the formatting.

Need to use regex to extract a part of a string

I'm a regex noob that's trying to use the regexp_extract() function in data studio to extract part of a string. Could you help me out?
I need to extract the part of the string that comes after 'May'. Everything before 'May' is exactly the same across all campaigns.
I've tried googling the solution and killed a lot of time on regexer.com but i can't figure it out
Current Campaign Name:
Xxxxx_xxxxx_PKN_Trueview_24th MayComedy Movie Fans18-24
Xxxxx_xxxxx_PKN_Trueview_24th MaySouth Asian Film Fans18-24
Xxxxx_xxxxx_PKN_Trueview_24th MayCricket Enthusiasts18-24
Xxxxx_xxxxx_PKN_Trueview_24th MayMotorcycle Enthusiasts18-24
Expected Campaign Names:
Comedy Movie Fans18-24
South Asian Film Fans18-24
Cricket Enthusiasts18-24
Motorcycle Enthusiasts18-24
EDIT: I'm trying to use this in data studio in the REGEXP_EXTRACT(Campaign,"regex_code_here") function. I think the acceptable syntax is re2.
You may actually use REGEXP_REPLACE here to remove all before and including May:
REGEXP_REPLACE(Campaign, '.*May', '')
See the regex demo:
The regex you need is this:
(?<=May).*$
Test it here.
You can use replace
^.*?May - Match everything up-to first occurrence of May
"$`" - replace with portion that follows substring Ref
let arr = ["Xxxxx_xxxxx_PKN_Trueview_24th MayComedy Movie Fans18-24","Xxxxx_xxxxx_PKN_Trueview_24th MaySouth Asian Film Fans18-24","Xxxxx_xxxxx_PKN_Trueview_24th MayCricket Enthusiasts18-24","Xxxxx_xxxxx_PKN_Trueview_24th MayMotorcycle Enthusiasts18-24"]
let op = arr.map(str=> str.replace(/^.*?May/g, "$`"))
console.log(op)

How can I use regexextract function in Google Docs spreadsheets to get "all" occurrences of a string?

My text string is in cell D2:
Decision, ERC Case No. 2009-094 MC, In the Matter of the Application for Authority to Secure Loan from the National Electrification Administration (NEA), with Prayer for Issuance of Provisional Authority, Dinagat Island Electric Cooperative, Inc. (DIELCO) applicant(12/29/2011)
This function:
=regexextract(D2,"\([A-Z]*\)")
will grab (NEA) but not (DIELCO)
I would like it to extract both (NEA) and (DIELCO)
You can use capture groups, which will cause regexextract() to return an array. You can use this as the cell result, in which case you will get a range of results, or you can feed the array to another function to reformat it to your purpose. For example:
regexextract( "abracadabra" ; "(bra).*(bra)" )
will return the array:
{bra,bra}
Another approach would be to use regexreplace(). This has the advantage that the replace is global (like s/pattern/replacement/g), so you do not need to know the number of results in advance. For example:
regexreplace( "aBRAcadaBRA" ; "[a-z]+" ; "..." )
will return the string:
...BRA...BRA
Here are two solutions, one using the specific terms in the author's example, the other one expanding on the author's sample regex pattern which appears to match all ALLCAPS terms. I'm not sure which is wanted, so I gave both.
(Put the block of text in A1)
Generic solution for all words in ALLCAPS
=regexreplace(regexreplace(REGEXREPLACE(A1,"\b\w[^A-Z]*\b","|"),"\W+","|"),"^\||\|$","")
Result:
ERC|MC|NEA|DIELCO
NB: The brunt of the work is in the CAPITALIZED formula, the lowercase functions are just for cleanup.
If you want space separation, the formula is a little simpler:
=trim(regexreplace(REGEXREPLACE(A1,"\b\w[^A-Z]*\b"," "),"\W+"," "))
Result:
ERC MC NEA DIELCO
(One way I like playing with regex in google spreadsheets is to read the regex pattern from another cell so I can change it without having to edit or re-paste into all the cells using that pattern. This looks so:
Cell A1:
Block of text
Cell B1 (no quote marks):
\b\w[^A-Z]*\b
Formula, in any cell:
=trim(regexreplace(REGEXREPLACE(A1,B$1," "),"\W+"," "))
By anchoring it to B$1, I can fill all my rows at once and the reference won't increment.)
Previous answer:
Specific solution for selected terms (ERC, DIELCO)
=regexreplace(join("|",IF(REGEXMATCH(A1,"ERC"),"ERC",""),IF(REGEXMATCH(A1,"DIELCO"),"DIELCO","")),"(^\||\|$)","")
Result:
ERC|DIELCO
As before, the brunt of the work is in the CAPITALIZED formula, the lowercase functions are just for cleanup.
This formula will find any ERC or DIELCO, or both in the block of text. The initial order doesn't matter, but the output will always be ERC followed by DIELCO (the order of appearance is lost). This fixes the shortcoming with the previous answer using "(bra).*(bra)" in that isolated ERC or DIELCO can still be matched.
This also has a simpler form with space separation:
=trim(join(" ",IF(REGEXMATCH(A1,"ERC"),"ERC",""),IF(REGEXMATCH(A1,"DIELCO"),"DIELCO","")))
Result:
ERC DIELCO
Please try:
=SPLIT(regexreplace(A1 ; "(?s)(.)?\(([A-Z]+)\)|(.)" ; "🧸$2");"🧸")
or
=REGEXEXTRACT(A1;"\Q"&REGEXREPLACE(A1;"\([A-Z]+\)";"\\E(.*)\\Q")&"\E")