regex to match some string - regex

I am working on a project that needs to match certain strings in its output.
Here is a sample:
user code timestamp Action Name S#TPLC Field Name User code group profile
SNGLASK 2012-05-30-20.33.53.003000 Insert User I TEST5 DISPLAY
SNGLASK 2012-05-23-22.06.44.422000 Change Password RSO part U LERAPR SNGCHIS FULL_AUTH
SNGLASK 2012-05-30-20.34.39.066000 Insert User Group Profil I *NONE
Basically, I have an application that needs to understand that, in each row, whatever follows a space belongs to the next column.
After the action name, everything else can be treated as "other".
Hence, I have come up with a regex format like the one below:
REGEX = ^([^\s]+)\s+([^\s]+)\s+([^\s]+)\s(.*)$
FORMAT = userCode::"$1" TimeStamp::"$2" ActionName::"$3" Others::"$4"
The strategy is to recognize a string and then skip the whitespace after it. However, this only works up to the action name, because the action name itself may contain spaces.
Hence, my problem is: how can I use regex to recognize the whole action name, so that "Insert User" is captured as one value and "Change Password RSO part" as another?

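Match multi-word values like this:
((\S+\s)+)
which says: one or more words, each followed by a single whitespace character. Note that this relies on the columns themselves being separated by more than one space.
So the regex would be:
^((\S+\s)+)\s+(\S+)\s+((\S+\s)+)\s+(.*)$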
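As a rough illustration, here is a minimal Python sketch of the pattern above. It assumes the raw output pads columns with two or more spaces (the sample line in the code is made up with that padding) and that words inside a column are separated by a single space; `parse_row` is a hypothetical helper name. Because of the nested groups, the timestamp lands in group 3, the action name in group 4 and the remainder in group 6.

```python
import re

# The pattern from the answer, unchanged. Groups 2 and 5 are the inner
# repeated groups, so the useful values sit in groups 1, 3, 4 and 6.
PATTERN = re.compile(r"^((\S+\s)+)\s+(\S+)\s+((\S+\s)+)\s+(.*)$")


def parse_row(line):
    """Split one output row into its columns, or return None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return {
        "userCode": m.group(1).strip(),    # group 1 keeps one trailing space
        "TimeStamp": m.group(3),
        "ActionName": m.group(4).strip(),  # group 4 keeps one trailing space
        "Others": m.group(6),
    }


# Assumed input: columns separated by three spaces (hypothetical padding).
padded = "SNGLASK   2012-05-30-20.33.53.003000   Insert User   I  TEST5  DISPLAY"
row = parse_row(padded)
print(row)

# A line whose columns are separated by a single space does not match at all.
single_spaced = "SNGLASK 2012-05-30-20.33.53.003000 Insert User I TEST5 DISPLAY"
print(parse_row(single_spaced))
```

If the real output is single-spaced, as in the second call, the pattern cannot tell where the action name ends, so some other signal (for example fixed column widths) would be needed.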

Related

Extract digits in between 2 different parameters

I have all data being imported into one cell as:
"<blank space><email address><blank space><CustomerId><blank space><(email address)><line break for next entry>"
Example:
email1#provider.com 12345678 (email1#provider.com)
email224#provider.com 23902490 (email224#provider.com)
I need to extract only the customer IDs, separated by commas, so I tried the following: regexreplace(A2,"([^[:digit:]])",","). However, this also keeps the digits that appear in the email addresses, so it returns:
,,,,,1,,,,,,,,,,,,,,12345678,,,,,,,1,,,,,,,,,,,,,,
,,,,,224,,,,,,,,,,,,,,23902490,,,,,,,224,,,,,,,,,,,,,,
Since the email address is set by the user, I have no control over how many digits it contains, or whether it consists only of digits. I can't work out how to isolate the CustomerIds alone.
Please help!
Edit1:
CustomerID: 64-bit int field, randomly assigned to a client, therefore checking by the length of the string would not work.
Edit2:
For now, I am using the formula below, but I would still be interested in a solution using Regex.
filter(transpose(split($B$4," ")),isnumber(transpose(split($B$4," "))))
If they are separated by a space you should be able to set the space to be your delimiter and extract from there.
https://zapier.com/blog/split-text-excel-zapier/
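Use the following, which picks the first run of 8 digits in each cell and so assumes 8-digit CustomerIds as in the example: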
=ARRAYFORMULA(TEXTJOIN(", ", 1, IFERROR(REGEXEXTRACT(A1:A2, "(?s)(\d{8})"))))
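For comparison, here is a minimal Python sketch of a pattern that does not depend on the length of the ID. It is not one of the spreadsheet formulas above: it assumes the CustomerId is the only run of digits that has whitespace (or the start or end of the text) on both sides, as in the example rows, and `customer_ids` is a hypothetical helper name.

```python
import re

# A run of digits with no non-whitespace character directly before or after it.
STANDALONE_DIGITS = re.compile(r"(?<!\S)\d+(?!\S)")


def customer_ids(cell):
    """Return the standalone numbers in the cell, joined with commas."""
    return ", ".join(STANDALONE_DIGITS.findall(cell))


# The two example rows from the question, as they would sit in one cell.
cell = (
    " email1#provider.com 12345678 (email1#provider.com)\n"
    " email224#provider.com 23902490 (email224#provider.com)"
)
print(customer_ids(cell))  # 12345678, 23902490
```

Digits inside an address are skipped because they are always attached to other non-space characters.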

Filter an Airtable database for a person with two given family names

I'm receiving the values name and surname captured from a form. Then I'm querying an Airtable base containing a list of wedding guests using Airtable's airtable.js. The guest list contains a number of guests with double-barreled family names, i.e. two names joined by a hyphen, e.g. name-secondname. The complication arises when a guest enters only the first part of their family name, e.g. name, which Airtable does not recognise. I thought I would try Airtable's built-in REGEX functions inside a filterByFormula, alongside AND, to match the name and the surname string up to the hyphen. Something like this:
`(AND({name} = "${name}",FIND(REGEX_REPLACE("${surname}",'[^-]*','' ),{surname})>0))`
No joy though. Any Ideas? Thanks.
The answer, almost invariably, will have something to do with escaped characters. Try one before the hyphen:
`(AND({name} = "${name}",FIND(REGEX_REPLACE("${surname}",'[^\\-]*','' ),{surname})>0))`
Syntax Reference
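To see what the pattern passed to REGEX_REPLACE does to a surname, the same character class can be tried in Python. This is only a sketch: it assumes Airtable's regex engine treats these patterns the same way Python's re module does, and the surname value is made up. The last substitution is a guess at the intent described in the question (keeping the part before the hyphen), not something taken from the answer.

```python
import re

surname = "smith-jones"  # made-up double-barreled surname

# '[^-]*' matches every run of non-hyphen characters, so removing those
# runs leaves only the hyphen itself.
only_hyphens = re.sub(r"[^-]*", "", surname)

# Escaping the hyphen inside the class does not change what it matches.
escaped = re.sub(r"[^\-]*", "", surname)

# Removing the hyphen and everything after it keeps the first part.
first_part = re.sub(r"-.*", "", surname)

print(repr(only_hyphens), repr(escaped), repr(first_part))
```

A surname without a hyphen passes through the last substitution unchanged, so it behaves the same for single and double-barreled names.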

Can I make my Alteryx RegEx parse conditional?

I receive messages with the fields below. I want to group and extract the user inputs. The majority of submissions contain all fields, and the regex works great. The problem comes when someone removes the lines they don't need, for example if they only need to fill in down to Amount 1.
Name:
Number:
Amount:
Old Code:
Code 1:
Amount 1:
Code 2:
Amount 2:
Code 3:
Amount 3:
Code 4:
Amount 4:
I'm using Alteryx to parse the message contents and have had success with my current regex, but I want to be ready for the unavoidable inconsistency in user submissions:
Name:(.+)\sNumber:(.+)\sAmount:(.+)\sOld Code:(.+)\sCode 1:(.+)\sAmount 1:(.+)\sCode 2:(.*?)\sAmount 2:(.*?)\sCode 3:(.*?)\sAmount 3:(.*?)\sCode 4:(.*?)\sAmount 4:(.*?[^-]*)
Is it possible to have Alteryx return parsed results from a message even if a listed field is deleted?
Alteryx issue with new cascading regex
Anyway, you can always do a cascading nested optional grouping around the
lines to just match what's valid up to a point.
This expects the form lines to be in order. If they are not, a different type
of regex is needed: an out-of-order regex (see the bottom regex).
Both of these regexes are for Perl 5.10.
(?-ms)Name:(.*)(?:\s+Number:(.*)(?:\s+Amount:(.*)(?:\s+Old[ ]+Code:(.*)(?:\s+Code[ ]+1:(.*)(?:\s+Amount[ ]+1:(.*)(?:\s+Code[ ]+2:(.*)(?:\s+Amount[ ]+2:(.*)(?:\s+Code[ ]+3:(.*)(?:\s+Amount[ ]+3:(.*)(?:\s+Code[ ]+4:(.*)(?:\s+Amount[ ]+4:(.*?[^-]*))?)?)?)?)?)?)?)?)?)?)?
https://regex101.com/r/9oKXEE/1
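The nesting idea can be sketched in Python with a shortened form. This is an illustration only: it covers just the first three fields to keep the nesting readable, the sample messages are made up, and `parse_in_order` is a hypothetical helper name. Each later field sits inside an optional group wrapped around everything that follows it, so a message that stops early still matches up to that point.

```python
import re

# Each field after Name is wrapped in an optional non-capturing group that
# also contains all later fields, mirroring the cascading regex above.
PATTERN = re.compile(r"Name:(.*)(?:\s+Number:(.*)(?:\s+Amount:(.*))?)?")


def parse_in_order(message):
    """Return the captured values in order; missing trailing fields are None."""
    m = PATTERN.search(message)
    if m is None:
        return None
    return [g.strip() if g is not None else None for g in m.groups()]


full = "Name: Ann\nNumber: 42\nAmount: 10"   # made-up complete message
partial = "Name: Ann\nNumber: 42"            # Amount line deleted

print(parse_in_order(full))     # ['Ann', '42', '10']
print(parse_in_order(partial))  # ['Ann', '42', None]
```

The same structure extends to all twelve fields by continuing the nesting, at the cost of a long run of closing `)?` at the end.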
For out-of-order matching, use this
(?m-s)\A(?:[\S\s]*?(?:(?(1)(?!))^\h*Name\h*:\h*(.*)|(?(2)(?!))^\h*Number\h*:\h*(.*)|(?(3)(?!))^\h*Amount\h*:\h*(.*)|(?(4)(?!))^\h*Old\h*Code\h*:\h*(.*)|(?(5)(?!))^\h*Code\h*1\h*:\h*(.*)|(?(6)(?!))^\h*Amount\h*1\h*:\h*(.*)|(?(7)(?!))^\h*Code\h*2\h*:\h*(.*)|(?(8)(?!))^\h*Amount\h*2\h*:\h*(.*)|(?(9)(?!))^\h*Code\h*3\h*:\h*(.*)|(?(10)(?!))^\h*Amount\h*3\h*:\h*(.*)|(?(11)(?!))^\h*Code\h*4\h*:\h*(.*)|(?(12)(?!))^\h*Amount\h*4\h*:\h*(.*?))){1,12}
https://regex101.com/r/f2rG1v/1
In this situation, you don't need to use Regex straight off the bat and given the inconsistent data it could take a while to perfect one regex term...
You can do it this way instead:
- RecordID first,
- Then you can use a Text to Columns with a new-line (\n) delimiter. Configure this to "Split to Rows".
- You can then use another Text to Columns to split on the delimiter ":".
That will handle additional rows entered etc. At that stage, you can figure out how to clean up the results (filter to remove null lines, multi-row to tag records, cross-tab to create a table etc...). If you want to flag any unknown rows, you can have a Text Input with the required rows and use Find/Replace or Join to separate the data.
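Outside Alteryx, the same split-by-line, then split-by-colon idea can be sketched in a few lines of Python. This is only an analogy to the tool steps above, not Alteryx configuration: `parse_message` is a hypothetical helper name and the sample message is made up.

```python
def parse_message(text):
    """Split a message into rows, then split each row on the first ':'."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrecognised rows
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


# Made-up submission where the user deleted everything after "Code 1".
message = "Name: Ann\nNumber: 42\n\nCode 1: X"
parsed = parse_message(message)
print(parsed)

# Flag any required field that is absent from the submission.
required = ["Name", "Number", "Amount"]
missing = [name for name in required if name not in parsed]
print(missing)
```

Because each row is handled on its own, deleted, extra or reordered rows do not break the parse; they only change which keys end up in the result.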

Email extraction from csv using regex

I have the following regex:
/(.+?)((?:(?:[^<>()\[\]\\.,;:\s#"]+(?:\.[^<>()\[\]\\.,;:\s#"]+)*)|(?:".+"))#(?:(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(?:(?:[a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,})))/gi
It is used to extract the email address and name from the following formats, avoiding duplicates:
"FName LName" <fname.lname#gmail.com>, "Eg Name" <egname#gmail.com>,
Closed Call<close_call#gmail.co.um>
toys#urs.com
serima<serima#google.com>
One <one#one.com>;Two <two#two.com>; "New <new#new.com>"
I have a couple of problems with it:
First, in test case #2 the leading t gets trimmed, so I get only oys#urs.com; this happens only on the first email address.
Second, the name capturing group returns the name (if present) together with a < (if present), so I have to strip out the < separately.
Is there a way to extract the above as follows, in a much more elegant/efficient way?
[{'name':'FName LName', 'email':'fname.lname#gmail.com'},
{'name':'Eg Name', 'email':'egname#gmail.com'},
{'name':'Closed Call', 'email':'close_call#gmail.co.um'}]
[{'name':'', 'email':'toys#urs.com'}]
[{'name':'serima', 'email':'serima#google.com'}]
[{'name':'One', 'email':'one#one.com'},
{'name':'Two', 'email':'two#two.com'},
{'name':'New', 'email':'new#new.com'}]
Note: the name may or may not be enclosed in double quotes, and there may or may not be a space between the name and the <.
Problem #1 is solved by letting the first capturing group match an empty string (.+? changed to .*?):
/(.*?)((?:(?:[^<>()\[\]\\.,;:\s#"]+(?:\.[^<>()\[\]\\.,;:\s#"]+)*)|(?:".+"))#(?:(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(?:(?:[a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,})))/gi
Problem#2 will save it for tonight's dream-time ;-)
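For problem #2, one possible direction is to match only the address and treat whatever sits between two addresses as the name. The Python sketch below is an assumption-laden illustration, not the regex from the question: it uses a much simpler address pattern, follows the samples in writing the separator as #, and `extract_contacts` is a hypothetical helper name.

```python
import re

# Simplified address: two runs of characters that are not whitespace,
# quotes, angle brackets, commas or semicolons, joined by the separator.
ADDRESS = re.compile(r'[^\s<>",;]+#[^\s<>",;]+')


def extract_contacts(text):
    """Return a list of {'name', 'email'} dicts found in the text."""
    contacts = []
    previous_end = 0
    for match in ADDRESS.finditer(text):
        # Everything between the previous address and this one is the
        # name plus punctuation; strip the punctuation off both ends.
        raw_name = text[previous_end:match.start()]
        name = raw_name.strip(' \t\r\n"<>,;')
        contacts.append({"name": name, "email": match.group()})
        previous_end = match.end()
    return contacts


line = 'One <one#one.com>;Two <two#two.com>; "New <new#new.com>"'
print(extract_contacts(line))
```

Stripping the punctuation from the text between matches avoids the separate pass to remove the <, and an address with no name in front of it simply yields an empty name.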

variable number of capturing groups

I have an XPath expression which I want to use to extract the city and date from a td which contains a string of this kind:
City(may contain spaces and may be missing, but the following space is always present) on 2013/07/20
So far, I got to the following solution for extracting the date, which works partially:
//path/to/my/td/text()/replace(.,'(.*) on (.*)','$3')
This works when City is present, but when City is missing I get "on 2013/07/20" as a result.
I think this is because the first capturing group fails and so the number of groups is different.
How can I get this expression to work?
I did not fully check your regex, but it looks fine at first sight. Anyway, you can also go an easier way if you only want to get the date by extracting the text after "on ":
//path/to/my/td/text()/substring-after(.,'on ')
edit: or you may go the substring-way and select the last 10 characters of the content:
//path/to/my/td/text()/substring(., string-length(.) - 9)
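To illustrate getting both values with a fixed number of groups, here is a minimal sketch in Python rather than XPath. It assumes the date is always in yyyy/mm/dd form at the end of the text, as in the sample, and `split_city_date` is a hypothetical helper name. The city group is allowed to be empty, so the group count stays the same whether or not a city is present.

```python
import re

# Group 1: the city (possibly empty). Group 2: the trailing date.
# \b keeps the "on" at the end of a city name from being taken as the
# separator word.
PATTERN = re.compile(r"^(.*?)\s*\bon (\d{4}/\d{2}/\d{2})$")


def split_city_date(text):
    """Return (city, date), or None if the text does not end in a date."""
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(split_city_date("New York on 2013/07/20"))  # ('New York', '2013/07/20')
print(split_city_date(" on 2013/07/20"))          # ('', '2013/07/20')
print(split_city_date("Boston on 2013/07/20"))    # ('Boston', '2013/07/20')
```

The last call shows why the date is anchored to the end of the string: a city name that itself ends in "on" would otherwise split in the wrong place.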