Aweber Regex Puzzle - regex

I'm using Aweber's automatic email parsing for Unbounce form submissions and it seems that their default regex is a bit wonky:
They match email with \nemail:\s+(.+?)\n and name with \nname:\s+(.+?)\n
The problem is that because I'm not asking users for their name, their regex automatically grabs the next line, which is ===== FORM DATA =====, so it emails users with "Hi ===== FORM DATA =====!"
Here's what a sample Unbounce email looks like:
page_name:
page_id: 2b78ddde-e7bb-11e1-9fde-12313e00ec56
page_url: http://www1.sample.com
variant: C
email: sample#gmail.com
name:
===== FORM DATA =====
email: ["sample#gmail.com"]
ip_address: 88.253.**.**
--
The Unbounce Team
Toll Free 1-888-515-9161
support#unbounce.com
http://unbounce.com
How do I modify their regex so that it stops at the end of the line if there's no value present?

Change the name regex to the following:
\nname:[ \t]+(.+?)\n
The change here is to replace \s with [ \t], because \s will match newlines.
This causes the match to fail if no name is provided. If you would rather have it still match and put an empty string into the group, use the following:
\nname:[ \t]*(.*?)\n
As noted by Evandro Silva, you can make this regex more efficient by replacing the .+? or .*? with [^\n]+ or [^\n]*, respectively.
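To see the difference concretely, here is a minimal Python sketch run against a trimmed copy of the sample email from the question. The variable names are only illustrative, and Aweber's own parser may behave differently from Python's re module, so treat this as a check of the patterns rather than of Aweber itself.

```python
import re

# Trimmed copy of the sample notification, with the name field left empty.
body = (
    "variant: C\n"
    "email: sample#gmail.com\n"
    "name:\n"
    "===== FORM DATA =====\n"
    'email: ["sample#gmail.com"]\n'
)

default_name = re.compile(r"\nname:\s+(.+?)\n")     # \s+ can cross the line break
strict_name = re.compile(r"\nname:[ \t]+(.+?)\n")   # fails when no name is given
lenient_name = re.compile(r"\nname:[ \t]*(.*?)\n")  # still matches, capturing ""

print(default_name.search(body).group(1))        # ===== FORM DATA =====
print(strict_name.search(body))                  # None
print(repr(lenient_name.search(body).group(1)))  # ''
```

When a name is present (for example "name: Alice"), both replacement patterns capture it as expected.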

Try regex pattern [\n\r]name:[^\S\n\r]*([^\n\r]*)

Related

How to match username which is enclosed in special chars

I'm trying to match the username of users on YouNow from a specific field.
From the HTML I extracted the string below, and I want to pull out the username _You Won:
"\n\t\t\t\t\t\t14\n\t\t\t\t\t\t_You Won\n\t\t\t\t\t"
This is my regex attempt:
(\d+)[\\n\\t]+([\W\w]+[^\\n\\t"$])
This mostly works: first I match a number, which is the level, then I match the username. However, if the username ends with either t or n, the last letter is dropped, so the user game 1n gets cut down to game 1.
Does someone know how I can fetch the username correctly?
Play it:
https://regex101.com/r/j8rufa/2
You could use Positive Lookahead at the end instead of [^\\n\\t"$].
Your code will be:
(\d+)[\\nt]+([\W\w]+(?=\\n\\t))
Demo: https://regex101.com/r/j8rufa/4
You can also use Positive Lookbehind to further enhance the code to ensure that the whole name is matched. For example, if the name is something like t_You Won, it will be matched without any issues:
(\d+)[\\nt]+(?<=\\t)([\W\w]+(?=\\n\\t))
Demo: https://regex101.com/r/j8rufa/6
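The three patterns can be compared with a short Python sketch. It assumes the extracted text contains literal backslash sequences (a backslash followed by n or t) rather than real newlines and tabs, which is what the \\n and \\t in the patterns imply; the wrap helper is hypothetical and only rebuilds the sample string around a given username.

```python
import re

def wrap(name: str) -> str:
    # Raw strings keep the backslashes literal, as in the extracted markup.
    return r'"\n\t\t\t\t\t\t14\n\t\t\t\t\t\t' + name + r'\n\t\t\t\t\t"'

original = re.compile(r'(\d+)[\\n\\t]+([\W\w]+[^\\n\\t"$])')
lookahead = re.compile(r'(\d+)[\\nt]+([\W\w]+(?=\\n\\t))')
lookaround = re.compile(r'(\d+)[\\nt]+(?<=\\t)([\W\w]+(?=\\n\\t))')

def username(pattern: re.Pattern, name: str) -> str:
    # Group 1 is the level, group 2 is the username.
    return pattern.search(wrap(name)).group(2)

print(username(original, "game 1n"))      # game 1   (trailing n is lost)
print(username(lookahead, "game 1n"))     # game 1n
print(username(lookahead, "t_You Won"))   # _You Won (leading t is lost)
print(username(lookaround, "t_You Won"))  # t_You Won
```

The lookbehind forces the separator run to end right after a \t sequence, so a username that starts with t or n is no longer swallowed by the character class.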

How to build this regex?

I want to match the emails in the following text:
uma#cs.stanford.edu - match
uma at cs.Stanford.edu - match
http://infolab.stanford.edu/~widom/yearoff.h
we
genale.stanford.edu
n <A href="mailto:cheriton#cs.stanford.edu - match
hola # kirti.edu - match
Now I want to capture 2 parts of email address only like (uma) and (cs.stanford) in the email uma#cs.stanford.edu.
My current pattern is :
(\w+)[(\s+at\s+)|(\s*#\s*)]+(\w+|\w+\.\w+).edu
But it matches the string - infolab.stanford.edu - which I don't want.
Can anybody suggest any modification on this?
As long as you understand that this regex doesn't verify the correctness of your email address, but merely acts as a quick first line of defense against malformed addresses, an easy fix to your regex is as follows:
([\w.]+)(?:\s+at\s+|\s*#\s*)(\w+|\w+\.\w+).edu
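In particular, your regex missed addresses whose usernames contain a . (which my main email address does, for example). The middle part was also broken: wrapping it in [...] turns it into a character class that matches any single one of the characters ( ) | + * # a t or whitespace, and the trailing + lets that single character repeat, which is why a lone a inside infolab could act as the separator. You can see the results here: http://refiddle.com/2js1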
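As a rough check, the sketch below runs both patterns over the sample lines in Python. The # separator is kept exactly as it appears in the text here, and the extract helper is a made-up name for illustration.

```python
import re

lines = [
    "uma#cs.stanford.edu",
    "uma at cs.Stanford.edu",
    "http://infolab.stanford.edu/~widom/yearoff.h",
    '<A href="mailto:cheriton#cs.stanford.edu',
    "hola # kirti.edu",
]

# [...] in the original is a character class, so a single "a" or "t"
# can act as the separator between the two captured parts.
original = re.compile(r"(\w+)[(\s+at\s+)|(\s*#\s*)]+(\w+|\w+\.\w+).edu")
# (?:...|...) is a real alternation between " at " and "#".
fixed = re.compile(r"([\w.]+)(?:\s+at\s+|\s*#\s*)(\w+|\w+\.\w+).edu")

def extract(pattern, text):
    """Return the two captured parts, or None when the line does not match."""
    m = pattern.search(text)
    return m.groups() if m else None

for text in lines:
    print(extract(fixed, text))
```

With the fixed pattern the infolab URL yields None, while the original pattern still finds a match inside it.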

How to define stash issue regex to match only from beginning?

The default regex in Stash to match JIRA ID is
JVM_SUPPORT_RECOMMENDED_ARGS="-Dintegration.jira.key.pattern=\"((?<!([a-z]{1,10})-?)[a-z]+-\d+)\""
But it matches the JIRA ID regardless of where it appears.
I want it to match only at the beginning:
JIRA-1 what ever: match!
something JIRA-1 else: not match
How to edit the regex?
The following don't work:
\"^((?<!([a-z]{1,10})-?)[a-z]+-\d+)\"
and
\"(^(?<!([a-z]{1,10})-?)[a-z]+-\d+)\"
Solution:
^[a-z]+-\d+ will do.
If you want to match the JIRA-<id> only from the beginning you should try:
\"^JIRA-(\d+)\"

Regex to parse certain fields of a log file

I have this log line:
blabla#gmail.com, Portal, qtp724408050-38, com.blabla.search.lib.SearchServiceImpl .logRequest, [Input request is lookupRequestDTO]
I need to find a regex that grabs that email, then matches lookupRequestDTO ignoring everything in between.
Currently my regex grabs the whole line:
([\w-\.]+)#gmail.com,(.+)lookupRequestDTO
How do I not match anything in between the email and lookupRequestDTO ?
What about this?
([^,]+).*?lookupRequestDTO
[^,]+ matches everything up to the first comma, so it should get you the email.
This assumes lookupRequestDTO is a fixed criterion for your search. If it is a value you want to retrieve, you could use this:
([^,]+).*?\[Input request is ([^\]]+)
Assuming you're using PCRE (php, perl, etc., and this should work in javascript):
([\w-\.]+?#gmail\.com),(?:.+)(lookupRequestDTO)
Out of capture groups 1 and 2, you'll get:
MATCH 1
blabla#gmail.com
lookupRequestDTO
Working example: http://regex101.com/r/yW9eU3
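For reference, here is a minimal Python sketch of the [^,]+ approach applied to the log line from the question. It uses the variant that captures the request name, and the variable names are only illustrative.

```python
import re

line = (
    "blabla#gmail.com, Portal, qtp724408050-38, "
    "com.blabla.search.lib.SearchServiceImpl .logRequest, "
    "[Input request is lookupRequestDTO]"
)

# Group 1: everything before the first comma.
# Group 2: the text after "[Input request is " up to the closing bracket.
pattern = re.compile(r"([^,]+).*?\[Input request is ([^\]]+)")

m = pattern.match(line)
email, request = m.groups()
print(email)    # blabla#gmail.com
print(request)  # lookupRequestDTO
```

Nothing between the first comma and the bracketed part ends up in a capture group, because .*? sits outside both groups.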

Regex empty string or email

I found a lot of Regex email validation in SO but I did not find any that will accept an empty string. Is this possible through Regex only? Accepting either empty string or email only? I want to have this on Regex only.
This regex pattern will match an empty string:
^$
And this will match (crudely) an email or an empty string:
(^$|^.*#.*\..*$)
Matching an empty string or an email:
(^$|^[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+\.(?:[a-zA-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$)
Matching an empty string or an email, and also accepting any amount of whitespace:
(^\s*$|^[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+\.(?:[a-zA-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$)
see more about the email matching regex itself:
http://www.regular-expressions.info/email.html
The answers above work ($ for empty), but I just tried this and it also works to just leave empty like so:
/\A(INTENSE_EMAIL_REGEX|)\z/i
Same thing in reverse order
/\A(|INTENSE_EMAIL_REGEX)\z/i
This will solve it; it accepts either an empty string or exactly one email address:
"^$|^([\w\.\-]+)#([\w\-]+)((\.(\w){2,3})+)$"
I prefer /^\s+$|^$/gi to match empty strings and whitespace-only strings.
console.log(" ".match(/^\s+$|^$/gi));
console.log("".match(/^\s+$|^$/gi));
If you need to cover any length of empty spaces then you may want to use following regex:
"^\s*$"
If you are using it within a Rails ActiveRecord validation, you can set
allow_blank: true
As:
validates :email, allow_blank: true, format: { with: EMAIL_REGEX }
Don't match an email with a regex. It's extremely ugly and long and complicated and your regex parser probably can't handle it anyway. Try to find a library routine for matching them. If you only want to solve the practical problem of matching an email address (that is, if you want wrong code that happens to (usually) work), use the regular-expressions.info link someone else submitted.
As for the empty string, ^$ is mentioned by multiple people and will work fine.
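The "empty string or pattern" idea from the answers above can be sketched in Python by making the whole pattern optional and requiring a full match. The helper name is hypothetical, the crude address pattern is the one quoted earlier in this thread (with # as the separator, exactly as written in this text), and it is not a real address validator.

```python
import re

# Crude address pattern from the answer above; "#" is the separator as written here.
CRUDE_EMAIL = r".*#.*\..*"

def empty_or_match(pattern: str, value: str) -> bool:
    """Return True when value is empty or matches pattern in full."""
    # (?:...)? makes the whole pattern optional; fullmatch anchors both ends.
    return re.fullmatch(rf"(?:{pattern})?", value) is not None

print(empty_or_match(CRUDE_EMAIL, ""))                  # True
print(empty_or_match(CRUDE_EMAIL, "sample#gmail.com"))  # True
print(empty_or_match(CRUDE_EMAIL, "not an email"))      # False
```

Using fullmatch instead of ^...$ sidesteps the Python detail that $ also matches just before a trailing newline.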