Regular Expressions - Snowflake [closed] - regex

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 months ago.
I am trying to get the text up to the fourth "\n \n" in the text below. Can you please help me write the Snowflake expression for this?
Hello Jeffrey,\n \nWe have not heard from you yet. I hope all is well with you.\n \nChecking in to gather your Goosehead approved office location address, so we can add you to our database here at ERGOS. Once added here, we can schedule your laptop setup.\n \nGoosehead requires all agents to be onboarded by ERGOS so that we can provide IT support as well as get your laptop in our database. \n \nDo you have a laptop ready for setup?

Everything up to the first \n \n can be fetched with regexp_substr:
select
regexp_substr(column1, '.*\n \n') as match
from values
('Hello Jeffrey,\n \nWe have not heard from you yet. I hope all is well with you.\n \nChecking in to gather your Goosehead approved office location address, so we can add you to our database here at ERGOS. Once added here, we can schedule your laptop setup.\n \nGoosehead requires all agents to be onboarded by ERGOS so that we can provide IT support as well as get your laptop in our database. \n \nDo you have a laptop ready for setup?');
MATCH
Hello Jeffrey,
Now, if we put a group ( ) around that, ask for four repetitions with {4}, and swap to a smaller sample text to keep the output readable:
select
regexp_substr(column1, '(.*\n \n){4}') as match
from values
('1111\n \n222222222222222\n \n3333333333333333\n \n44444444444444444\n \n55555555555555555555555');
gives:
MATCH
1111 222222222222222 3333333333333333 44444444444444444
If you are expecting a literal \n (a backslash followed by n) in the output, then:
select
column1,
regexp_substr(column1, '[^\\\\]+\\\\n \\\\n') as match
from values
('1111\\n \\n22222\\n \\n33333333\\n \\n4444444\\n \\n55555\\n \\66666\\n \\n7777');
This shows how the backslashes need to be escaped in the SQL literal to appear in the output, and therefore how to escape them in the pattern.
The match is greedy and gives:
COLUMN1
MATCH
1111\n \n22222\n \n33333333\n \n4444444\n \n55555\n \66666\n \n7777
1111\n \n
Putting the grouping back in:
select
column1,
regexp_substr(column1, '([^\\\\]+\\\\n \\\\n){4}') as match
from values
('1111\\n \\n22222\\n \\n33333333\\n \\n4444444\\n \\n55555\\n \\66666\\n \\n7777');
COLUMN1
MATCH
1111\n \n22222\n \n33333333\n \n4444444\n \n55555\n \66666\n \n7777
1111\n \n22222\n \n33333333\n \n4444444\n \n
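To check the {4} repetition idea outside Snowflake, here is a minimal sketch in Python. It uses Python's re module rather than Snowflake's regexp_substr, and the function name up_to_nth_separator is made up for illustration. It uses a lazy .*? with DOTALL instead of the answer's [^\\]+, so treat it as an analogue of the technique, not a translation of the SQL.

```python
import re

def up_to_nth_separator(text, n=4, sep="\n \n"):
    """Return the prefix of text that ends at the n-th occurrence of sep, or None."""
    # (?:.*?SEP){n}: lazily consume up to the next separator, n times in a row.
    pattern = re.compile(f"(?:.*?{re.escape(sep)}){{{n}}}", re.DOTALL)
    match = pattern.match(text)
    return match.group(0) if match else None

# Real newlines, as in the first half of the answer.
sample = "1111\n \n2222\n \n3333\n \n4444\n \n5555"
print(repr(up_to_nth_separator(sample)))

# Literal backslash-n text, as in the second half of the answer.
escaped = r"1111\n \n2222\n \n3333\n \n4444\n \n5555"
print(up_to_nth_separator(escaped, sep=r"\n \n"))
```

If the text contains fewer than n separators, the function returns None instead of a partial match.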


Update series of numeric values in long string [closed]

Closed 2 years ago.
I have a text column with the following example data:
5,5,0.1;6,6,0.15;7,7,0.2;8,8,0.25;9,9,0.3;10,10,0.35;11,11,0.4;12,12,0.45;13,13,0.5;14,14,0.55;15,15,0.6;16,16,0.65;17,17,0.7;18,18,0.75;19,19,0.8;20,20,0.85;
I need to add a fixed value to each numeric value that comes right before a semicolon.
For example, given:
5,5,0.1;6,6,0.15;
I want to add 0.15, so the result would be:
5,5,0.25;6,6,0.3;
I guess I should try something with regexp_replace, but I have no idea how to start.
The correct solution would be to fix your broken data model and not store multiple, delimited values in a single column.
I wouldn't do this with a regex. Instead, unnest the elements of the string, add the value to the third element, then aggregate everything back into the broken design:
update badly_designed_table
  set denormalized_column =
    (select string_agg(concat_ws(',', a, b, round(c + 0.15, 2)), ';' order by idx)
     from (
       select split_part(val, ',', 1) as a,
              split_part(val, ',', 2) as b,
              split_part(val, ',', 3)::numeric as c,
              idx
       -- split the same column that is being updated into its ';'-separated elements
       from unnest(string_to_array(denormalized_column, ';')) with ordinality as x(val, idx)
       -- skip the "empty" element generated by the trailing ;
       where nullif(val, '') is not null
     ) t)
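The same split, add, and re-join steps can be sketched outside the database. This is a minimal Python illustration, not part of the answer: the function name add_to_third is hypothetical, it uses Decimal to avoid binary floating-point noise, and it assumes every non-empty element has exactly three comma-separated fields. Unlike the SQL above, it keeps the trailing semicolon and drops trailing zeros so the output matches the format shown in the question.

```python
from decimal import Decimal

def add_to_third(values, delta):
    """Add delta to the third field of every ';'-separated 'a,b,c' element."""
    delta = Decimal(delta)
    out = []
    for item in values.split(";"):
        if not item:
            # Skip the empty element produced by the trailing ';'.
            continue
        a, b, c = item.split(",")
        total = (Decimal(c) + delta).normalize()  # normalize() drops trailing zeros
        out.append(f"{a},{b},{total:f}")          # ':f' avoids exponent notation
    # Re-join, keeping a ';' after every element as in the original data.
    return "".join(f"{element};" for element in out)

print(add_to_third("5,5,0.1;6,6,0.15;", "0.15"))
```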

Remove specific characters from string to tidy up URLs [duplicate]

This question already has answers here:
Extracting rootdomains from URL string in Google Sheets
(3 answers)
Closed 2 years ago.
Hi, I have a column of messy URLs in Google Sheets that I'm trying to clean up. I want all website links in the same format so that I can run a duplicate check on them.
For example, I have a list of URLs with various prefixes such as http://, https://, etc. I am trying to use REGEXREPLACE to remove every such prefix from the column entries, but I cannot get it to work. This is what I have:
Before:
http://www.website1.com/
https://website2.com/
https://www.website3.com/
And I want - After:
website1.com
website2.com
website3.com
It is OK if this takes several formulas, and thus several columns, to reach the end result.
Try:
=ARRAYFORMULA(IFERROR(REGEXEXTRACT(INDEX(SPLIT(
REGEXREPLACE(A1:A, "https?://www\.|https?://|www\.", ), "/"),,1),
"\.(.+\..+)"), INDEX(IFERROR(SPLIT(
REGEXREPLACE(A1:A, "https?://www\.|https?://|www\.", ), "/")),,1)))
or shorter:
=INDEX(IFERROR(REGEXEXTRACT(A1:A, "^(?:https?:\/\/)?(?:www\.)?([^\/]+)")))
You can try the following formula
=ArrayFormula(regexreplace(LEFT(P1:P3,LEN(P1:P3)-1),"(.*//www.)|(.*//)",""))
Please do adjust ranges as needed.
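For anyone who wants to test the shorter pattern above outside Google Sheets, here is a minimal Python sketch. It reuses the pattern ^(?:https?://)?(?:www\.)?([^/]+) with Python's re module. The function name tidy_url is hypothetical, and lowercasing the result is an added assumption to help with the duplicate check.

```python
import re

# Optional scheme, optional "www.", then everything up to the first "/".
_HOST = re.compile(r"^(?:https?://)?(?:www\.)?([^/]+)", re.IGNORECASE)

def tidy_url(url):
    """Return the host part of url without scheme or 'www.', or '' if nothing matches."""
    match = _HOST.match(url.strip())
    return match.group(1).lower() if match else ""

for url in ("http://www.website1.com/", "https://website2.com/", "https://www.website3.com/"):
    print(tidy_url(url))
```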

Access 2010 Query add text to end of existing text if condition is met

I have a column of data, diagnosis codes to be exact. The problem is that when the data is imported, it turns 111.0 into 111 (or any whole number). I am wondering if there is an update query I can run that will add ".0" to the end of any value that is 3 characters long. I also had a problem with it stripping a value such as 008.45 down to 8.45, but I figured that part out using:
UPDATE Master SET DIAGNOSIS01 = LEFT("00", 3-LEN(DIAGNOSIS01)) + DIAGNOSIS01
WHERE LEN(DIAGNOSIS01)<3 AND Len(DIAGNOSIS01)>0;
I got that from here on Stack Overflow. Is there a variation of this update query I can use to append to the right if the value is only 3 digits?
Additional info: the values in this column have the format xxx.x or xxx.xx, with x being a digit.
When it comes to SQL I am very new, so please treat me like I'm 3... ;)
UPDATE Master
SET Master.DIAGNOSIS01 = IIf(Len([Master].[DIAGNOSIS01])=3,[Master].[DIAGNOSIS01] & ".0",[Master].[DIAGNOSIS01]);
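The rule the query applies is small enough to state as a function. The following Python sketch only illustrates the IIf logic (append ".0" when the value is exactly 3 characters long, otherwise leave it alone). The function name with_decimal_suffix is hypothetical, and this is not Access code.

```python
def with_decimal_suffix(code):
    """Append '.0' to a 3-character code; return any other value unchanged."""
    return code + ".0" if len(code) == 3 else code

for code in ("111", "008.45", "111.0"):
    print(code, "->", with_decimal_suffix(code))
```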

How to do ANDing of conditions in a regular expression?

I want to match and modify part of a string if the following conditions are true:
I want to capture information about a project, such as project duration, client, technologies used, etc.
So, I want to select a string that starts with the word "project", or with similar wording such as "details of project", "project details" or "project #1".
The regex should first look for the word "project", and it should select the string only when some or all of the following words are found after the word "project":
1) client
2) duration
3) environment
4) technologies
5) role
I want to select a string if it matches at least 2 of the above words. Words can appear in any order and if the string contains ANY two or three of these words, then the string should get selected.
I have sample text given below.
Details of Projects :
*Project #1: CVC – Customer Value Creation (Sep 2007 – till now) Time
Warner Cable is the world's leading
media and entertainment company, Time
Warner Cable (TWC) makes coaxial
quiver.
Client : Time Warner Cable,US. ETL
Tool : Informatica 7.1.4
Database : Oracle 9i.
Role : ETL Developer/Team Lead.
O/S : UNIX.
Responsibilities: Created Test Plan and Test Case Book. Peer reviewed team members > Mappings. Documented Mappings. Leading the Development Team. Sending Reports to onsite. Bug >fixing for Defects, Data and Performance related.
Details of Project #2: MYER – Sales
Analysis system (Nov 2005 – till now)
Coles Myer is one of Australia's largest retailers with more than 2,000 > stores throughout Australia,
Client : Coles Myer
Retail, Australia. ETL Tool :
Informatica 7.1.3 Database : Oracle
8i. Role : ETL Developer. O/S :
UNIX. Responsibilities: Extraction,
Transformation and Loading of the data
using Informatica. Understanding the
entire source system.
Created and Run Sessions and
Workflows. Created Sort files using
Syncsort Application.*
Does anyone know how to achieve this using regular expressions?
Any clues or regular expressions are welcome!
Many thanks!
(client|duration|environment|technologies|role).+(?!\1)(client|duration|environment|technologies|role)
I would break it down into a few simpler regexes to get these results. The first would select only the chunk of text between projects: (?=Project #).*(?<=Project #)
With the match that this produces, I would run a separate regex to ask whether it contains any of those words: client|duration|environment|technologies|role
If this comes back with two or more distinct matches, you know to select the original string!
Edit:
// Requires: using System.Linq; using System.Text.RegularExpressions;
string originalText = ""; // assign the text to search here
// Each match runs from one "Project #" up to, but not including, the next one.
MatchCollection projectDescriptions = Regex.Matches(originalText, "(?=Project #).(?:(?!Project #).)*", RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match projectDescription in projectDescriptions)
{
    MatchCollection keyWordMatches = Regex.Matches(projectDescription.Value, "client|duration|environment|technologies|role", RegexOptions.IgnoreCase);
    // Count distinct keywords, ignoring case.
    int distinctCount = keyWordMatches.Cast<Match>().Select(m => m.Value.ToLowerInvariant()).Distinct().Count();
    if (distinctCount >= 2)
    {
        // At this point, do whatever you need to with the original projectDescription match; the Match object will give you the index etc. of the match inside the original string.
    }
}
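The two-step idea in this answer (cut the text into per-project chunks, then count distinct keywords in each chunk) can also be sketched in Python. This is an illustrative sketch only: the names KEYWORDS, PROJECT_CHUNK and projects_with_keywords are made up, and it assumes each project starts with the literal text "Project #", as in the sample.

```python
import re

KEYWORDS = ("client", "duration", "environment", "technologies", "role")

# One chunk per project: from "Project #" up to (not including) the next "Project #".
PROJECT_CHUNK = re.compile(r"Project #(?:(?!Project #).)*", re.IGNORECASE | re.DOTALL)
KEYWORD = re.compile(r"\b(?:%s)\b" % "|".join(KEYWORDS), re.IGNORECASE)

def projects_with_keywords(text, minimum=2):
    """Return the project chunks that mention at least `minimum` distinct keywords."""
    selected = []
    for chunk in PROJECT_CHUNK.finditer(text):
        # A set collapses repeated keywords, so only distinct ones are counted.
        found = {m.group(0).lower() for m in KEYWORD.finditer(chunk.group(0))}
        if len(found) >= minimum:
            selected.append(chunk.group(0))
    return selected

text = "Project #1: CVC Client : TWC Role : Lead Project #2: MYER Role : Dev, same role"
for chunk in projects_with_keywords(text):
    print(chunk)
```

Counting with a set keeps the "distinct" requirement out of the regex itself, which is the main advantage of splitting the work into two steps.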
Maybe you need to break that requirement into two steps: first, extract the key/value pairs from your string, then apply your filter.
string input = @"Project #...";
Regex projects = new Regex(@"(?<key>\S+).:.(?<value>.*?\.)");
foreach (Match project in projects.Matches(input))
{
    Console.WriteLine("{0} : {1}",
        project.Groups["key"].Value,
        project.Groups["value"].Value);
}
Try
^(details of )?project.*?((client|duration|environment|technologies|role).*?){2}.*$
One note: This will also match if only one of the terms appears twice.
In C#:
foundMatch = Regex.IsMatch(subjectString, @"\A(?:(details of )?project.*?((client|duration|environment|technologies|role).*?){2}.*)\Z", RegexOptions.Singleline | RegexOptions.IgnoreCase);
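The note above says the pattern also matches when a single term appears twice. One possible way to require two different terms in a single expression is a negative lookahead on a backreference. The Python sketch below is an assumption-laden illustration, not the answerer's pattern: the names KW, TWO_DISTINCT and has_two_distinct_keywords are hypothetical, and word boundaries are added so that, for example, "roles" is not counted as "role".

```python
import re

KW = r"(?:client|duration|environment|technologies|role)"

# Group 1 captures the first keyword; (?!\1\b) forbids the second keyword
# from being the same word (compared case-insensitively because of IGNORECASE).
TWO_DISTINCT = re.compile(
    rf"\A(?:details of )?project.*?\b({KW})\b.*?\b(?!\1\b)({KW})\b",
    re.IGNORECASE | re.DOTALL,
)

def has_two_distinct_keywords(text):
    """True if text starts with '[details of ]project' and names two different keywords."""
    return TWO_DISTINCT.search(text) is not None

print(has_two_distinct_keywords("Project #1 Client: X Role: Y"))
print(has_two_distinct_keywords("Project #1 Role: X, role: Y"))
```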

Different spellings of Chanukah Regex [closed]

Closed 2 years ago.
Hannuka, Chanukah, Hanukkah...Due to transliteration from another language and character set, there are many ways to spell the name of this holiday. How many legitimate spellings can you come up with?
Now, write a regular expression that will recognise all of them.
According to http://www.holidays.net/chanukah/spelling.htm, it can be spelled any of the following ways:
Chanuka
Chanukah
Chanukkah
Channukah
Hanukah
Hannukah
Hanukkah
Hanuka
Hanukka
Hanaka
Haneka
Hanika
Khanukkah
Here is my regex that matches all of them:
/(Ch|H|Kh)ann?[aeiu]kk?ah?/
Edit: Or this, without branches:
/[CHK]h?ann?[aeiu]kk?ah?/
Call me a sucker for readability.
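Both patterns can be checked against the list of spellings from the linked page with a short Python sketch. The helper name unmatched is hypothetical, and the check uses fullmatch, which is stricter than the unanchored /.../ patterns shown above.

```python
import re

SPELLINGS = [
    "Chanuka", "Chanukah", "Chanukkah", "Channukah", "Hanukah", "Hannukah",
    "Hanukkah", "Hanuka", "Hanukka", "Hanaka", "Haneka", "Hanika", "Khanukkah",
]

WITH_BRANCHES = re.compile(r"(Ch|H|Kh)ann?[aeiu]kk?ah?")
WITHOUT_BRANCHES = re.compile(r"[CHK]h?ann?[aeiu]kk?ah?")

def unmatched(pattern, words):
    """Return the words that the pattern does not match in full."""
    return [word for word in words if not pattern.fullmatch(word)]

print(unmatched(WITH_BRANCHES, SPELLINGS))
print(unmatched(WITHOUT_BRANCHES, SPELLINGS))
# The branch-free version is looser: it also accepts strings such as "Kanuka".
print(bool(WITHOUT_BRANCHES.fullmatch("Kanuka")), bool(WITH_BRANCHES.fullmatch("Kanuka")))
```

The branch-free pattern trades precision for brevity, since [CHK]h? admits prefixes like "K" or "Hh" that the alternation rejects.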
In Python:
import re

def find_hanukkah(s):
    spellings = ['hannukah', 'channukah', 'hanukkah']  # etc...
    # Print every case-insensitive occurrence of a known spelling.
    for m in re.finditer('|'.join(spellings), s, re.I):
        print(m.group())

find_hanukkah("Hannukah Channukah, Hanukkah")
Something like C?hann?uk?kah? matches most of the common cases. There are also a bunch of weirder spellings; C?hann?uk?kah?|Han[aei]ka|Khanukkah matches almost every spelling I could think of (that had at least half a million hits on Google).
((Ch|H|X|Х|Kh|J)[aа](н|n{1,2})(у|ou|[auei])(к|k|q){1,2}[aа]h?)|(חנו?כה)
This regex is much more inclusive and covers all of the following options:
Channuka
Channukah
Channukka
Channukkah
Chanuka
Chanukah
Chanukah
Chanukka
Chanukkah
Chanuqa
Hanaka
Haneka
Hanika
Hannuka
Hannukah
Hannukka
Hannukkah
Hanoukka
Hanuka
Hanukah
Hanukka
Hanukkah
Januka
Khanukkah
Xanuka
Ханука
Ханука
חנוכה
חנכה
Try this:
/^[ck]?hann?ukk?ah?$/i
I think the only approved spellings in English are Hanukkah and Chanukah, so it's something like
/(Ch|H)anuk?kah/
Or maybe even better
/(Chanukah|Hanukkah)/
I like Triptych's answer, but I would take it one step further, also in Python:
import re

def valid(spelling):
    # Optional leading c/k, one or two n's, one or two k's, and a final "ah".
    regex_spelling = re.compile(r'^[cCkK]{0,1}han{1,2}uk{1,2}ah$')
    if regex_spelling.match(spelling):
        print('Valid spelling')
    else:
        print(spelling, "is not a spelling for the word")
to use it:
valid("hanukkah")