Simple Regex String Manipulation - regex

Just a simple question. I am wondering how to write a regex for the phrase 'Ste 5800 Bldg 10 Ste A'.
I would like it to handle both numeric and letter values after each word, and I am having trouble doing that. I have '\w+\s\d+', but I do not know how to include the letter values.
The end result should be 'Ste 5800','Bldg 10','Ste A'.

Try this:
(\w+) (\d+|[a-zA-Z])\b

This should do the job and produce 'Ste 5800','Bldg 10','Ste A':
\w+\s[\d\w]+
If you want to match 'Ste A' but not 'Ste Aspect', use:
\b\w+\s\d+|\b\w+\s\w\b
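The difference between the two patterns above can be checked with a short sketch. It assumes Python's re module (the question does not name a regex flavor), and the variable names are only illustrative:

```python
import re

text = "Ste 5800 Bldg 10 Ste A"
longer = "Ste 5800 Bldg 10 Ste Aspect"

# A word, one whitespace character, then a run of word characters.
loose = re.compile(r"\w+\s[\d\w]+")
# A word followed by digits, or a word followed by exactly one word character.
strict = re.compile(r"\b\w+\s\d+|\b\w+\s\w\b")

print(loose.findall(text))     # ['Ste 5800', 'Bldg 10', 'Ste A']
print(strict.findall(text))    # ['Ste 5800', 'Bldg 10', 'Ste A']
print(loose.findall(longer))   # ['Ste 5800', 'Bldg 10', 'Ste Aspect']
print(strict.findall(longer))  # ['Ste 5800', 'Bldg 10']
```

The trailing \b in the second alternative is what stops 'Ste Aspect' from being picked up.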


Regex: extract multiple URL strings from a cell of arrays

What is a clean regex pattern for matching URL strings that stops at the first comma? Trying to extract values from an array of arrays in Google Sheets.
Cell A1
{https://www.myshop.com/shop/the_first_shop,Marcus. White's. Shop.,ACTIVE,US};{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};{EMPTY,ClosedShop,CLOSED,IN}
Desired Output (Cell B1)
https://www.myshop.com/shop/the_first_shop,https://www.myshop.com/shop/a-second-shop
I have figured out how to get a clean array of matching values in my desired output cell using:
=trim(regexreplace(regexreplace(regexreplace(REGEXREPLACE(A2,"/(https?:\/\/[^ ]*)/"," "),";"," "),"}"," "),"{"," "))
But I can't find a regex pattern that stops at a comma. For example, this solution:
"/(https?:\/\/[^ ]*)/"
matches the first URL, but gives me back:
https://www.myshop.com/shop/the_first_shop,Marcus. White's. Shop.,ACTIVE,US https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK EMPTY,ClosedShop,CLOSED,IN
I'd go with REGEXREPLACE and use:
=REGEXREPLACE(A1,".*?(?:(https.*?)|$)","$1")
Just a trailing comma to deal with...
=REGEXREPLACE(REGEXREPLACE(A1,".*?(?:(https.*?(,))|$)","$1"),",$","")
A much longer alternative to REGEXREPLACE could be:
=TEXTJOIN(",",,QUERY(TRANSPOSE(SPLIT(SUBSTITUTE(SUBSTITUTE(A1,"{","}"),"}",","),",")),"Select Col1 where Col1 like 'http%'"))
For a regex pattern that stops at a comma, use a negated character class ([^,]*). Note that REGEXEXTRACT returns only the first match:
=REGEXEXTRACT(A1, "(https?:\/\/[^,]*)")
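To see the "stop at the first comma" idea outside of Sheets, here is a minimal sketch using Python's re module. This is an assumption for illustration only: Google Sheets uses a different engine, and the variable names are hypothetical.

```python
import re

# The contents of cell A1 from the question.
cell = (
    "{https://www.myshop.com/shop/the_first_shop,Marcus. White's. Shop.,ACTIVE,US};"
    "{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};"
    "{EMPTY,ClosedShop,CLOSED,IN}"
)

# [^,]* consumes everything up to (but not including) the next comma.
urls = re.findall(r"https?://[^,]*", cell)
result = ",".join(urls)
print(result)
```

Unlike a single REGEXEXTRACT call, findall collects every URL in the string, which is why the joined result corresponds to the desired output in cell B1.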

Can you restrict two characters based on their ASCII order in regex?

Let's say I have a string of 2 characters. Using regex (as a thought exercise), I want to accept it only if the first character has an ascii value bigger than that of the second character.
ae should not match because a is before e in the ASCII table.
ea, za and aA should match for the opposite reason.
f$ should match because $ is before letters in the ASCII table.
It doesn't matter whether aa or a matches; I'm only interested in the base case. Any flavor of regex is allowed.
Can it be done? What if we restrict the problem to lowercase letters only? What if we restrict it to [abc] only? What if we invert the condition (accept when the characters are ordered from smallest to biggest)? What if I want it to work for N characters instead of 2?
I would find that almost impossible to do myself; however, bobble bubble impressively solved the problem with:
^~*\}*\|*\{*z*y*x*w*v*u*t*s*r*q*p*o*n*m*l*k*j*i*h*g*f*e*d*c*b*a*`*_*\^*\]*\\*\[*Z*Y*X*W*V*U*T*S*R*Q*P*O*N*M*L*K*J*I*H*G*F*E*D*C*B*A*@*\?*\>*\=*\<*;*\:*9*8*7*6*5*4*3*2*1*0*\/*\.*\-*,*\+*\**\)*\(*'*&*%*\$*\#*"*\!*$(?!^)
For [abc] only, or for other short sequences, we could approach the problem by enumerating the allowed strings with an expression similar to:
^(abc|ab|ac|bc|a|b|c)$
^(?:abc|ab|ac|bc|a|b|c)$
which might help you see how to go about it.
You can simplify that to:
^(a?b?c?)$
^(?:a?b?c?)$
but I'm not so sure about it.
The number of characters you want to allow is a separate concern from the ordering, because you can simply add an independent negative lookahead for it, such as:
(?!.{n})
where n-1 is the maximum number of characters allowed, which in this case would be
(?!.{3})^(?:a?b?c?)$
(?!.{3})^(a?b?c?)$
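A quick way to see what these [abc] patterns accept is to run them. The sketch below assumes Python's re module. The descending pattern is a hypothetical mirror image added for comparison, and the accepts helper is also just illustrative:

```python
import re

# From the answer: optional a, then b, then c (smallest to biggest).
ascending = re.compile(r"^(?:a?b?c?)$")
# Hypothetical mirror image (not from the answer): biggest to smallest.
descending = re.compile(r"^(?:c?b?a?)$")
# From the answer: same ordering, but no more than 2 characters.
at_most_two = re.compile(r"(?!.{3})^(?:a?b?c?)$")

def accepts(pattern, s):
    """Return True if the pattern matches the string."""
    return pattern.search(s) is not None

print(accepts(ascending, "ab"), accepts(ascending, "ba"))      # True False
print(accepts(descending, "ba"), accepts(descending, "ab"))    # True False
print(accepts(at_most_two, "ab"), accepts(at_most_two, "abc")) # True False
print(accepts(ascending, ""))                                  # True
```

As written, a?b?c? accepts characters ordered from smallest to biggest, which is the inverted condition from the question; the original condition (first character bigger than the second) corresponds to the c?b?a? ordering. Note also that a?b?c? accepts the empty string, which the explicit alternation abc|ab|ac|bc|a|b|c does not.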
A regex is not the best tool for the job.
But it's doable. A naive approach is to enumerate all the printable ascii characters and their corresponding lower range:
\x21[ -\x20]|\x22[ -\x21]|\x23[ -\x22]|\x24[ -\x23]|\x25[ -\x24]|\x26[ -\x25]|\x27[ -\x26]|\x28[ -\x27]|\x29[ -\x28]|\x2a[ -\x29]|\x2b[ -\x2a]|\x2c[ -\x2b]|\x2d[ -\x2c]|\x2e[ -\x2d]|\x2f[ -\x2e]|\x30[ -\x2f]|\x31[ -\x30]|\x32[ -\x31]|\x33[ -\x32]|\x34[ -\x33]|\x35[ -\x34]|\x36[ -\x35]|\x37[ -\x36]|\x38[ -\x37]|\x39[ -\x38]|\x3a[ -\x39]|\x3b[ -\x3a]|\x3c[ -\x3b]|\x3d[ -\x3c]|\x3e[ -\x3d]|\x3f[ -\x3e]|\x40[ -\x3f]|\x41[ -\x40]|\x42[ -\x41]|\x43[ -\x42]|\x44[ -\x43]|\x45[ -\x44]|\x46[ -\x45]|\x47[ -\x46]|\x48[ -\x47]|\x49[ -\x48]|\x4a[ -\x49]|\x4b[ -\x4a]|\x4c[ -\x4b]|\x4d[ -\x4c]|\x4e[ -\x4d]|\x4f[ -\x4e]|\x50[ -\x4f]|\x51[ -\x50]|\x52[ -\x51]|\x53[ -\x52]|\x54[ -\x53]|\x55[ -\x54]|\x56[ -\x55]|\x57[ -\x56]|\x58[ -\x57]|\x59[ -\x58]|\x5a[ -\x59]|\x5b[ -\x5a]|\x5c[ -\x5b]|\x5d[ -\x5c]|\x5e[ -\x5d]|\x5f[ -\x5e]|\x60[ -\x5f]|\x61[ -\x60]|\x62[ -\x61]|\x63[ -\x62]|\x64[ -\x63]|\x65[ -\x64]|\x66[ -\x65]|\x67[ -\x66]|\x68[ -\x67]|\x69[ -\x68]|\x6a[ -\x69]|\x6b[ -\x6a]|\x6c[ -\x6b]|\x6d[ -\x6c]|\x6e[ -\x6d]|\x6f[ -\x6e]|\x70[ -\x6f]|\x71[ -\x70]|\x72[ -\x71]|\x73[ -\x72]|\x74[ -\x73]|\x75[ -\x74]|\x76[ -\x75]|\x77[ -\x76]|\x78[ -\x77]|\x79[ -\x78]|\x7a[ -\x79]|\x7b[ -\x7a]|\x7c[ -\x7b]|\x7d[ -\x7c]|\x7e[ -\x7d]|\x7f[ -\x7e]
A (better) alternative is to enumerate the ascii characters in reverse order and use the ^ and $ anchors to assert there is nothing else unmatched. This should work for any string length:
^\x7f?\x7e?\x7d?\x7c?\x7b?z?y?x?w?v?u?t?s?r?q?p?o?n?m?l?k?j?i?h?g?f?e?d?c?b?a?`?\x5f?\x5e?\x5d?\x5c?\x5b?Z?Y?X?W?V?U?T?S?R?Q?P?O?N?M?L?K?J?I?H?G?F?E?D?C?B?A?@?\x3f?\x3e?\x3d?\x3c?\x3b?\x3a?9?8?7?6?5?4?3?2?1?0?\x2f?\x2e?\x2d?\x2c?\x2b?\x2a?\x29?\x28?\x27?\x26?\x25?\x24?\x23?\x22?\x21?\x20?$
You may replace ? with * if you want to allow duplicate characters.
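Rather than typing the reverse enumeration by hand, it can be generated. This is a minimal sketch assuming Python's re module; the function name descending_pattern is hypothetical, and the sketch covers only 0x20 through 0x7e (it leaves out \x7f):

```python
import re

def descending_pattern(quantifier="?"):
    """Build ^...$ with each printable ASCII character optional, highest code first."""
    # range(0x7E, 0x1F, -1) walks from '~' (0x7e) down to space (0x20).
    body = "".join(re.escape(chr(c)) + quantifier for c in range(0x7E, 0x1F, -1))
    return re.compile("^" + body + "$")

strict = descending_pattern("?")        # each character at most once
with_repeats = descending_pattern("*")  # duplicates allowed

for s in ["ea", "za", "aA", "f$", "ae", "aa"]:
    print(s, strict.search(s) is not None, with_repeats.search(s) is not None)
```

With "?" the string aa is rejected; with "*" it is accepted. Both variants also accept the empty string unless an extra check is added.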
PS: Some people come up with absurdly long regexes when a regex isn't the right tool for the job, for example to parse email addresses, HTML, or the present question.

RegEx to verify: abc123(30x2) and variations thereof

I'm trying to develop a regex to verify a pattern that will match the following:
abc123
Ab3TF56G
BD356-2
abc123(3x4)
Ab3TF56G(24x37)
BD356-2(105x04)
abc123 (3x4)
Ab3TF56G (24x37)
BD356-2 (105x04)
abc123(3x4x10)
Ab3TF56G(24x37x3)
BD356-2(105x04x14)
abc123 (3x4x10)
Ab3TF56G (24x37x3)
BD356-2 (105x04x14)
I'm admittedly terrible at RegEx, but am following the guide at: www.regexr.com, and have come up with this so far:
([A-Za-z0-9])\((\d[x^)]\d+)\)+
Unfortunately, it stops working when I start trying to account for the possible dash and parentheses.
• The alphanumeric set can be any length
• That sequence can be, but does not have to be, followed by a dash and an integer
• Either form can also be followed by a pair of parentheses containing integers separated by the "x" character (basically dimensions)
Any help would be much appreciated.
EDIT
In addition, the following should fail:
abc123 (3x4x10)shs
sdlk234(3x)
sdlk234(3x0)
sdlk234-2 (3x)333
Ab3T F56G
Try this:
([a-zA-Z0-9-]+)\s?(\([\dx]+\))?
See it working here: https://regex101.com/r/pU9oR4/1
Here is a graphical representation: https://www.debuggex.com/r/uVGo8mrIUYhXHxjP
EDIT
With your should-not-match examples it turns out to be a bit harder, so the new pattern should be:
^([a-zA-Z0-9-]+\b)([\s\d-])?(\((?:(?!0)[\d]+)((x(?:(?!0\b)[\d]+))(x(?:(?!0\b)[\d]+))?)\))?$
edited again
See it working here: https://www.debuggex.com/r/dxPPbPw0mUKQPRWg
I also added validation so that the following do not match:
sdlk234(3x0x0)
sdlk234(3x1x0)
sdlk234(0x1x1)
This follows your logic for dimensions.
^[\w-]+\s*(\((?!0\b)\d+(x(?!0\b)\d+)+\))?$
(?!0\b): negative lookahead; makes sure that what follows is not 0\b (a 0 immediately followed by a word boundary)
\b: asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
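The pattern above can be checked against the examples from the question with a short sketch. It assumes Python's re module, which supports the lookahead used here; the list names are illustrative:

```python
import re

pattern = re.compile(r"^[\w-]+\s*(\((?!0\b)\d+(x(?!0\b)\d+)+\))?$")

should_match = [
    "abc123", "Ab3TF56G", "BD356-2",
    "abc123(3x4)", "Ab3TF56G(24x37)", "BD356-2(105x04)",
    "abc123 (3x4)", "Ab3TF56G (24x37)", "BD356-2 (105x04)",
    "abc123(3x4x10)", "Ab3TF56G(24x37x3)", "BD356-2(105x04x14)",
    "abc123 (3x4x10)", "Ab3TF56G (24x37x3)", "BD356-2 (105x04x14)",
]
should_fail = [
    "abc123 (3x4x10)shs", "sdlk234(3x)", "sdlk234(3x0)",
    "sdlk234-2 (3x)333", "Ab3T F56G",
]

matched = [s for s in should_match if pattern.search(s)]
rejected = [s for s in should_fail if not pattern.search(s)]
print(len(matched), len(rejected))  # 15 5

# A zero followed by "x" is not rejected, because "x" is a word character
# and so there is no word boundary after that 0.
print(pattern.search("sdlk234(0x1x1)") is not None)  # True
```

All of the question's examples behave as requested. The last line shows one limitation: (?!0\b) only rejects a lone 0 that is directly followed by a non-word character such as ")", so a leading 0 dimension still gets through.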

REGEX how to extract the part after XYZ

I'm quite new to regex and am trying to extract the 662050,89 from X130503XYZ662050,89 (the part after XYZ) using a regex. I tried:
[a-zA-Z](\d+|,\d)
I can only get 662050. How do I get 662050,89 with a regex? Thank you in advance.
Please note that the XYZ can be any letters, repeated any number of times, like XYZ, XXYYZ, etc.
The regex can be simpler:
XYZ(.*)
If XYZ can be anything but digits, use the \D token:
\D{3}(.*)
You may try the regex (\d+,\d+). It will work fine as long as no decimal numbers appear before the Z. Hope this helps.
EDIT:
Just keep in mind that the float number must be after the Z. Otherwise you may need to use [\d+,]*($|\z)
You may try matching from the end of the input / word (JavaScript):
[0-9,]*(?:$|\z)
If you use XYZ(.*), take group 1 of the match to get 662050,89,
or if you use ([a-zA-Z]+)(\d+[|,]\d+), take group 2 of the match.
In the first case you only care about the numbers after 'XYZ'; in the second case you capture both 'XYZ' and the numbers after it.
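Reading the groups out of a match might look like the following sketch. It assumes Python's re module, since the question does not say which language is in use, and the variable names are illustrative:

```python
import re

text = "X130503XYZ662050,89"

m1 = re.search(r"XYZ(.*)", text)
m2 = re.search(r"([a-zA-Z]+)(\d+[|,]\d+)", text)
m3 = re.search(r"\D{3}(.*)", text)

print(m1.group(1))               # 662050,89
print(m2.group(1), m2.group(2))  # XYZ 662050,89
print(m3.group(1))               # 662050,89

# The second pattern does not depend on how many letters precede the number.
longer = re.search(r"([a-zA-Z]+)(\d+[|,]\d+)", "X130503XXYYZ662050,89")
print(longer.group(2))           # 662050,89
```

Of the three, only the pattern with [a-zA-Z]+ copes with a letter run of arbitrary length such as XXYYZ; XYZ(.*) and \D{3}(.*) assume a fixed prefix.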
Try this regex ...
/\w([\d\,]+)/

How to negate regex validation string?

I want to remove everything in the string except the #[anyword] tokens.
I have string like this:
yng nnti dkasih tau :)"#mazayalinda: Yg klo ada cenel busana muslim aku mau ikutan dong "#noviwahyu10: Model ! Pasti gk blh klo k
and the #mazayalinda and #noviwahyu10 matches my regex #\w*.
However, I need to get rid of the rest of the string and keep only those 2 words. We need some kind of negation, but I am confused about how to combine the 2 regexes: the one that matches #[anyword] and the one that gets rid of everything except those 2 words.
Any ideas?
It's not completely clear from the context if this is a viable solution, but when you want to replace everything except a certain pattern it sounds more like you want a regex search rather than a replacement. For example, in python it might look like:
>>> import re
>>> s = 'yng nnti dkasih tau :)"#mazayalinda: Yg klo ada cenel busana muslim aku mau ikutan dong "#noviwahyu10: Model ! Pasti gk blh klo k'
>>> re.findall(r'#\w+', s)
['#mazayalinda', '#noviwahyu10']
Edit: in JS, something like this would be more appropriate:
var s = 'yng nnti dkasih tau :)"#mazayalinda: Yg klo ada cenel busana muslim aku mau ikutan dong "#noviwahyu10: Model ! Pasti gk blh klo k';
// code from http://www.activestate.com/blog/2008/04/javascript-refindall-workalike
var rx = new RegExp("#\\w+", "g");
var matches = [];
var match;
// with the "g" flag, each exec() call resumes after the previous match
while ((match = rx.exec(s)) !== null) {
  matches.push(match[0]); // match[0] is the matched text
}
After this, matches contains all the matched strings. You can always join them back together into a single string if needed.
It seems to me that you want to use capturing groups rather than negate the rest of the string. A regex like:
[^#]*(#\w+)[^#]*
will capture those entries in capturing groups, and then, depending on your language, you can access each of the captured strings: http://rubular.com/r/qHKb35OK3g
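If a replacement is really what is wanted, the capturing-group pattern can be used to replace each chunk with just its captured tag. A minimal sketch, assuming Python's re module; the variable names are illustrative:

```python
import re

s = 'yng nnti dkasih tau :)"#mazayalinda: Yg klo ada cenel busana muslim aku mau ikutan dong "#noviwahyu10: Model ! Pasti gk blh klo k'

# Each match swallows the text around one tag; keep only the captured tag.
kept = re.sub(r"[^#]*(#\w+)[^#]*", r"\1 ", s).strip()
print(kept)  # #mazayalinda #noviwahyu10

# Collecting the matches and joining them gives the same result.
joined = " ".join(re.findall(r"#\w+", s))
print(joined == kept)  # True
```

Both routes end at the same string, so the choice is mostly a matter of whether the surrounding code is built around substitution or around searching.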
Use this regex, (?<=^|#\w+\b)[^#]+, and union the matches.
Well, I'm not entirely sure I understand the question, but if you want to keep just the names and use the rest just "to get rid of it", why don't you simply save the names and ignore the rest ?
If you're keen on matching the "non-name pattern" - this seems to be an excerpt from some kind of conversation, where every message starts with ':'. If so, then using this should simply do the trick.
:[^#]*
You can use a zero-width negative lookahead assertion, for example #(?!mazayalinda|noviwahyu10)\w.
This requires a regular expression engine that supports lookahead, such as Perl, Ruby, or Java.
If you have a classic engine, the approach @pcalcao describes is best.