How would I translate this to RegEx?

I'm having trouble translating this to RegEx:
Actual file format (for an Excel spreadsheet):
[demo-_File.xls]'SheEt_nAme'!CA
[samPle file 2.xls]'demo Sheet'!D
Inside the brackets and single quotes:
Accepts any letter from a to z (regardless of case)
Accepts the special characters -, _, . and space
After the exclamation mark, it should accept up to 4 capital letters.

Here is my suggestion:
\[[\w\s&.-]*\]'[\w\s&.-]+'![A-Z]{1,4}
In JS:
var re = /\[[\w\s&.-]*\]'[\w\s&.-]+'![A-Z]{1,4}/g;
[\w\s&.-]* matches any run of letters, digits, underscores, whitespace, &, . and -. Because \w already covers both uppercase and lowercase letters, no i flag is needed; with i, the [A-Z]{1,4} part would also accept lowercase letters after the !. As written, [A-Z]{1,4} matches 1 to 4 uppercase English letters. If you want to allow digits in the last part, change it to [A-Z0-9]{1,4}.
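A quick way to check the pattern is to run it against the two sample references from the question plus two inputs that should fail. This is a sketch that assumes you are validating one whole reference per string, so it adds ^ and $ anchors; it also leaves out the i flag so that only uppercase column letters pass. The helper name isSheetRef is hypothetical.

```javascript
// Anchored: the whole string must be one workbook/sheet/column reference.
// No `i` flag, so [A-Z]{1,4} accepts only uppercase column letters.
const sheetRef = /^\[[\w\s&.-]*\]'[\w\s&.-]+'![A-Z]{1,4}$/;

// Hypothetical helper used only for this illustration.
function isSheetRef(s) {
  return sheetRef.test(s);
}

const samples = [
  "[demo-_File.xls]'SheEt_nAme'!CA", // valid sample from the question
  "[samPle file 2.xls]'demo Sheet'!D", // valid sample from the question
  "[demo.xls]'Sheet1'!ca", // lowercase column letters: rejected
  "[demo.xls]'Sheet1'!ABCDE", // more than 4 letters: rejected
];

for (const s of samples) {
  console.log(s, isSheetRef(s));
}
```

The g flag is also left out here: on a global regex, RegExp.prototype.test advances lastIndex between calls, which can make repeated checks with the same regex object give surprising results.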

Negating a complex regex containing three parts

I need a regex which is matched when the string doesn't have both lowercase and uppercase letters.
If the string has only lowercase letters -> should be matched
If the string has only uppercase letters -> should be matched
If the string has only digits or special characters -> should be matched
For example
abc, ABC, 123, abc123, ABC123&^ - should match
AbC, A12b, AB^%12c - should not match
Basically I need an inverse/negation of the following regex:
^(?=.*[a-z])(?=.*[A-Z]).+$
It doesn't sound like any lookarounds are needed.
Either match only characters that are not a-z, or only characters that are not A-Z.
^(?:[^a-z]+|[^A-Z]+)$
The pattern uses + (one or more), so an empty string does not match.
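To see the alternation at work, here is a small sketch that runs it over the example strings listed in the question (the variable names are only for illustration):

```javascript
// Either every char is a non-lowercase char, or every char is a non-uppercase char.
const noMixedCase = /^(?:[^a-z]+|[^A-Z]+)$/;

const shouldMatch = ["abc", "ABC", "123", "abc123", "ABC123&^"];
const shouldNotMatch = ["AbC", "A12b", "AB^%12c"];

for (const s of shouldMatch) console.log(s, noMixedCase.test(s)); // each logs true
for (const s of shouldNotMatch) console.log(s, noMixedCase.test(s)); // each logs false
```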
You may use
^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$
Or
^(?=(?:[^a-z]+|[^A-Z]+)$).*$
A lookaround solution like this can be used in more complex scenarios, when you need to apply more restrictions on the pattern. Otherwise, consider a non-lookaround solution.
Details
^ - start of string
(?!.*[A-Z].*[a-z]) - no uppercase letter followed (anywhere later in the string) by a lowercase letter
(?!.*[a-z].*[A-Z]) - no lowercase letter followed (anywhere later in the string) by an uppercase one
\S+ - (pattern #1) 1 or more non-whitespace chars
(?=(?:[^a-z]+|[^A-Z]+)$) - (pattern #2) a positive lookahead that requires 1 or more characters other than lowercase ASCII letters ([^a-z]+) to the end of the string, or 1 or more characters other than uppercase ASCII letters ([^A-Z]+) to the end of the string
.* - (pattern #2) 0 or more chars other than line break chars; the lookahead already guarantees at least one char
$ - end of string.
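Both lookaround patterns can be checked side by side against the question's examples. This is a sketch; the variable names are made up for the illustration.

```javascript
// Pattern #1: two negative lookaheads forbid "upper ... lower" and "lower ... upper".
const viaNegativeLookaheads = /^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$/;
// Pattern #2: a positive lookahead wraps the non-lookaround alternation.
const viaPositiveLookahead = /^(?=(?:[^a-z]+|[^A-Z]+)$).*$/;

// [input, should it match?]
const cases = [
  ["abc", true], ["ABC", true], ["123", true], ["abc123", true], ["ABC123&^", true],
  ["AbC", false], ["A12b", false], ["AB^%12c", false],
];

for (const [input, expected] of cases) {
  console.log(input, expected, viaNegativeLookaheads.test(input), viaPositiveLookahead.test(input));
}
```

Note that pattern #1 uses \S+, so unlike pattern #2 it rejects strings containing spaces.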
You can use this regex
^(([A-Z0-9?&%^](?![a-z]))+|([a-z0-9?&%^](?![A-Z]))+)$
I've only added the characters ?&%^ as allowed special characters, but you can add whichever you like.
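A sketch of how the per-character lookahead behaves, including one input with a special character that is not in the ?&%^ list:

```javascript
// Branch 1: chars from [A-Z0-9?&%^], none followed by a lowercase letter.
// Branch 2: chars from [a-z0-9?&%^], none followed by an uppercase letter.
const perCharLookahead = /^(([A-Z0-9?&%^](?![a-z]))+|([a-z0-9?&%^](?![A-Z]))+)$/;

const cases = [
  ["abc", true], ["ABC", true], ["123", true], ["abc123", true], ["ABC123&^", true],
  ["AbC", false], ["A12b", false], ["AB^%12c", false],
  ["abc!", false], // "!" is not one of the allowed special characters
];

for (const [input, expected] of cases) {
  console.log(input, expected, perCharLookahead.test(input));
}
```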
I would go with:
^(?:[^a-z]+?|[^A-Z]+?)$
It translates to "If the entire string is composed of non-lowercase letters or non-uppercase letters then match the string."
Lazy quantifiers +? are used so that, when the multiline flag is enabled, each match stops at the first $ (end of line) instead of running across line breaks; the negated classes [^a-z] and [^A-Z] also match newline characters. If you're only validating a single-line string, then you can simply use + without the question mark.
If you have a whitelist of specific allowed special chars, then change [^a-z] into [A-Z0-9()_+=-] (and [^A-Z] into [a-z0-9()_+=-]) and list the allowed special chars.
https://regex101.com/r/Wg6tLn/1
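The effect of the lazy quantifier in multiline mode can be seen on a small multi-line input. This sketch compares the greedy and lazy versions with the g and m flags; the sample text is made up for the illustration.

```javascript
const text = "ABC\nDEF\nabc";

// Greedy: [^a-z]+ also matches "\n", so the first match runs across the line break.
const greedy = text.match(/^(?:[^a-z]+|[^A-Z]+)$/gm);
// Lazy: +? stops at the first position where $ (end of line in multiline mode) holds.
const lazy = text.match(/^(?:[^a-z]+?|[^A-Z]+?)$/gm);

console.log(JSON.stringify(greedy)); // ["ABC\nDEF","abc"]
console.log(JSON.stringify(lazy)); // ["ABC","DEF","abc"]
```

Another way to keep matches on one line is to exclude the newline from the classes, e.g. [^a-z\n], which works with greedy quantifiers too.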

Ruby - split on nonalphanumeric characters excluding international characters?

This is my regex so far, which splits on non-alphanumeric characters, including international characters (e.g. Korean, Japanese and Chinese characters).
title = '[MV] SUNMI(선미) _ 누아르(Noir)'
title.split(/[^a-zA-Z0-9 ']/)
This is the regex to match any international character:
[^\x00-\x7F]+
I got it from: Regular expression to match non-English characters? Let's assume this is 100% correct (no debating!)
How do I combine these 2 so I can split on non-alphanumeric characters, excluding international characters? The easy part is done. I just need to combine these regexes somehow.
My expected output would be something like this
["MV", "SUNMI", "선미", "누아르", "Noir"]
TL;DR: I want to split on non-alphanumeric characters only (English letters and foreign characters should not be split on).
(?:[^a-zA-Z0-9](?<![^\x00-\x7F]))+
https://regex101.com/r/EDyluc/1
What is not matched (remains from split) is what you want to keep.
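Explained (written in free-spacing mode, so the spaces are not part of the pattern):
(?:
[^a-zA-Z0-9] # any char that is not an ASCII letter or digit
(?<! [^\x00-\x7F] ) # lookbehind: that char must not be non-ASCII, i.e. it is in the ASCII range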
)+
Let me know if you need a more detailed explanation.
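The answer targets Ruby, but JavaScript has supported lookbehind since ES2018, so the same pattern can be tried there. This is a sketch, not the answerer's code: since the title starts with [ and ends with ), split() returns empty strings at both ends, which the sketch filters out.

```javascript
const title = "[MV] SUNMI(선미) _ 누아르(Noir)";

// Runs of ASCII non-alphanumerics: [^a-zA-Z0-9] consumes one char, and the
// lookbehind rejects it if that char lies outside the ASCII range (keeps Hangul etc.).
const asciiNonAlnum = /(?:[^a-zA-Z0-9](?<![^\x00-\x7F]))+/;

// split() leaves "" where the string starts or ends with a separator; drop those.
const parts = title.split(asciiNonAlnum).filter((p) => p !== "");

console.log(JSON.stringify(parts)); // ["MV","SUNMI","선미","누아르","Noir"]
```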
So basically you want to split on all ASCII characters that are not letters or digits. You can use this regex, which selects exactly those characters from the printable ASCII range:
[ -\/:-@\[-`{-~]+
Reading the ASCII table in order, this character class picks everything from space to /, skips the digits, picks : to @, skips the uppercase letters, picks [ to backtick, skips the lowercase letters, and picks { to ~. The ranges : to @ and space to / matter here: the parentheses around 선미 and Noir lie between space and /.
If you also want to split on extended ASCII characters, you can replace ~ with ÿ and use the [ -\/:-@\[-`{-ÿ]+ regex instead.
Check out this Ruby code (reject(&:empty?) drops the empty string that split returns for the leading [):
s = '[MV] SUNMI(선미) _ 누아르(Noir)'
puts s.split(/[ -\/:-@\[-`{-~]+/).reject(&:empty?)
Prints:
MV
SUNMI
선미
누아르
Noir
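The same range-based class can be sketched in JavaScript. It covers only printable ASCII (space through ~), so control characters such as tab are assumed not to occur as separators.

```javascript
const s = "[MV] SUNMI(선미) _ 누아르(Noir)";

// Printable ASCII minus letters and digits: " "-"/", ":"-"@", "["-"`", "{"-"~".
const printableAsciiNonAlnum = /[ -\/:-@\[-`{-~]+/;

// Drop the empty strings produced by the leading "[" and trailing ")".
const words = s.split(printableAsciiNonAlnum).filter((w) => w.length > 0);

console.log(JSON.stringify(words)); // ["MV","SUNMI","선미","누아르","Noir"]
```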

Regex for a string with alpha numeric containing a '.' character

I have not been able to find a proper regex to match a string that must not start or end with a certain character.
This matches
AS.E
23.5
3.45
This doesn't match
.263
321.
.ASD
The string can contain alphanumeric characters with an optional '.' character, and its length has to be within the range 2-4 (minimum 2 chars and maximum 4 chars).
I was able to create one ->
^[^\.][A-Z|0-9|\.]{2,4}$
but with this I couldn't prevent a '.' character at the end of the string.
Thanks.
Maybe not the most optimized, but a working one, built step by step:
The first character should be alphanumeric
^[a-zA-Z0-9]
0, 1 or 2 characters that are alphanumeric or ., not at the end of the string
[a-zA-Z0-9\.]{0,2}
an alphanumeric character matching end of string
[a-zA-Z0-9]$
Concatenate all of this to obtain your regex
^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$
Edit: This regex allows multiple dots (up to 2)
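A short sketch that runs the concatenated regex over the question's examples, plus two extra inputs (made up for illustration) that show the "up to 2 dots" behaviour and the length limit:

```javascript
// First and last char alphanumeric; 0-2 alphanumerics or dots in between (2-4 chars total).
const dottedCode = /^[a-zA-Z0-9][a-zA-Z0-9.]{0,2}[a-zA-Z0-9]$/;

const valid = ["AS.E", "23.5", "3.45", "A..B"]; // "A..B": two dots are still allowed
const invalid = [".263", "321.", ".ASD", "ABCDE"]; // "ABCDE": too long

for (const s of valid) console.log(s, dottedCode.test(s)); // each logs true
for (const s of invalid) console.log(s, dottedCode.test(s)); // each logs false
```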
If I guessed correctly, you want to match all words that are
Between 2 and 4 characters long ...
... and start and end with a character from [A-Z0-9] ...
... and have characters from [A-Z0-9.] in the middle ...
... and are not preceded or followed by a dot.
Try this regex to match all these substrings in a text:
(?<=^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9](?=$|[^.])
However, note that in the string ".AAAA." this will still match the middle AA. If you don't want this match, then please give more details on your requirements.
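Here is a sketch of the lookaround version scanning a made-up text with the g flag; the last match is the inner AA mentioned above.

```javascript
// Lookbehind/lookahead keep a match from touching a "." on either side.
const inText = /(?<=^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9](?=$|[^.])/g;

const text = "AS.E 3.45 .AAAA.";
const found = text.match(inText);

console.log(JSON.stringify(found)); // ["AS.E","3.45","AA"]
```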
When you are only interested in the number of matches, but not the matched strings, then you could use
(^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]($|[^.])
If you have one string, and want to know whether that string completely matches or not, then use
^[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]$
If there may be at most one . inside the match, replace the part [A-Z0-9.]{0,2} with ([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?).
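For illustration, here is the whole-string version with that "at most one dot" middle part substituted in; the test inputs are made up.

```javascript
// Anchored check: alphanumeric ends, at most one dot in the (0-2 char) middle.
const atMostOneDot = /^[A-Z0-9]([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?)[A-Z0-9]$/;

for (const s of ["AS.E", "A.BC", "AB", "A..B", "A.B."]) {
  console.log(s, atMostOneDot.test(s));
}
```

Here AS.E, A.BC and AB pass, while A..B (two dots) and A.B. (trailing dot) are rejected.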
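You can use this pattern to match what you describe (the middle part takes 0 to 2 characters, so the total length is 2 to 4; note that [^\.] at either end accepts any character other than a dot, not only alphanumerics):
^[^\.][a-zA-Z0-9\.]{0,2}[^\.]$
Check the result here: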
https://regex101.com/r/8BNdDg/3

Regex: Need to validate barcode

I have the following barcode that I need to validate via regex:
TE1310 2000183B 804F58000020183B 20120509 0013.0002.0000 20161201
We're having an issue with our barcode scanners occasionally cutting off some characters from barcodes, so I need to validate them with a regex using the following rules:
1. Starts with "TE1310"
2. Space
3. 2nd set of characters is exactly 8 characters long. Can contain numbers or letters
4. Space
5. 3rd set contains exactly 16 characters. Can be numbers or letters
6. Space
7. 4th set must be exactly "0013.0002.0000"
8. Space
9. 5th and final set contains 8 characters. Numeric only
I have the following regex & I'm pretty close but not sure how to do #7 above (0013.0002.0000). I placed "????" into my regex below where I'm unsure of how to do this part:
TE1310\s[A-Za-z0-9]{8}\s[A-Za-z0-9]{16}\s????\s\d{8}
Any idea how to do this?
Thanks
I'm assuming a regular expression syntax similar to JavaScript's; the basic ideas carry over to any other regex flavor I know of.
1: Starts with TE1310
^TE1310
^ is used to match only at the beginning of a string, the characters that follow are matched literally.
2: Space
/^TE1310 /
I'm adding the / regex delimiters to show that there is in fact a space character contained within the regex. If your regex syntax supports alternative delimiters, you might see something along the lines of ~^TE1310 ~ instead.
3: 2nd set of characters is exactly 8 length. Can contain numbers or letters
/^TE1310 [a-zA-Z0-9]{8}/
[abc] matches a single character from the provided set; a-zA-Z0-9 inside it matches any letter (upper or lower case) or digit.
{n} is used to repeat the previous selector n times.
4: Space
/^TE1310 [a-zA-Z0-9]{8} /
5: 3rd set contains exactly 16 characters. Can be numbers or letters
/^TE1310 [a-zA-Z0-9]{8} [a-zA-Z0-9]{16}/
6: Space
/^TE1310 [a-zA-Z0-9]{8} [a-zA-Z0-9]{16} /
7: 4th set must be exactly 0013.0002.0000
/^TE1310 [a-zA-Z0-9]{8} [a-zA-Z0-9]{16} 0013\.0002\.0000/
\. escapes the ., which on its own matches any non-newline character. If you're building the regex from a string, you may need to double escape the \ character, so it may be \\. instead of \.
8: Space
/^TE1310 [a-zA-Z0-9]{8} [a-zA-Z0-9]{16} 0013\.0002\.0000 /
9: 5th and final set contains 8 characters. Numeric only
/^TE1310 [a-zA-Z0-9]{8} [a-zA-Z0-9]{16} 0013\.0002\.0000 \d{8}/
\d matches a digit; in JavaScript it's equivalent to [0-9]. As with \., you may need to double escape the \ character, which would be \\d instead.
10: End of string
You didn't mention it explicitly, but I assume the match should only match lines that exactly match this pattern, and aren't followed by trailing numbers/letters:
/^TE1310 [a-zA-Z0-9]{8} [a-zA-Z0-9]{16} 0013\.0002\.0000 \d{8}$/
$ is used to match the very end of the string.
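A sketch of the final pattern used as a validator. Note that the sample barcode in the question also contains a 20120509 group between the 16-character set and 0013.0002.0000, which the listed rules (and therefore this pattern) do not cover; the sketch uses a hypothetical sample with that group removed.

```javascript
// Full anchored pattern from the steps above.
const barcode = /^TE1310 [a-zA-Z0-9]{8} [a-zA-Z0-9]{16} 0013\.0002\.0000 \d{8}$/;

// Hypothetical sample that follows the listed rules (no 20120509 group).
const good = "TE1310 2000183B 804F58000020183B 0013.0002.0000 20161201";
// The same sample with its last digit cut off, as a misbehaving scanner might produce.
const truncated = good.slice(0, -1);

console.log(barcode.test(good)); // true
console.log(barcode.test(truncated)); // false
```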
#7 is trivial: it should simply be 0013\.0002\.0000. You have to make sure to escape your periods, and to escape your escape characters if your language requires that.
So, try
TE1310\s[A-Za-z0-9]{8}\s[A-Za-z0-9]{16}\s0013\.0002\.0000\s\d{8}
assuming the rest of the points are correct, of course.
Also, as Sednus said, you might want to match the beginning and end of the string. The conventional symbols are ^ for the beginning and $ for the end, but I'd check a reference for your particular language just in case.
If you don't do that, the regex will find a barcode such as TE1310 2000183B 804F58000020183B 0013.0002.0000 20161201 inside a larger string, such as
asgsdaTE1310 2000183B 804F58000020183B 0013.0002.0000 20161201qeasdfa
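To see what the anchors change, this sketch runs the unanchored and anchored versions of the pattern on a hypothetical rule-conforming barcode (without the 20120509 group) wrapped in junk characters:

```javascript
const unanchored = /TE1310\s[A-Za-z0-9]{8}\s[A-Za-z0-9]{16}\s0013\.0002\.0000\s\d{8}/;
const anchored = /^TE1310\s[A-Za-z0-9]{8}\s[A-Za-z0-9]{16}\s0013\.0002\.0000\s\d{8}$/;

// Hypothetical rule-conforming barcode embedded in surrounding junk.
const noisy = "asgsdaTE1310 2000183B 804F58000020183B 0013.0002.0000 20161201qeasdfa";

console.log(unanchored.test(noisy)); // true: the barcode is found inside the string
console.log(anchored.test(noisy)); // false: ^ and $ require the whole string to be the barcode
```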

How to write regular expression to match only numbers, letters and dashes?

I need an expression that will only accept:
numbers
normal letters (no special characters)
-
Spaces are not allowed either.
Example:
The regular expression should match:
this-is-quite-alright
It should not match
this -is/not,soålright
You can use:
^[A-Za-z0-9-]*$
This matches strings, possibly empty, that are wholly composed of uppercase/lowercase letters (ASCII A-Z), digits (ASCII 0-9), and dashes.
This matches (as seen on rubular.com):
this-is-quite-alright
and-a-1-and-a-2-and-3-4-5
yep---------this-is-also-okay
And rejects:
this -is/not,soålright
hello world
Explanation:
^ and $ are beginning and end of string anchors respectively
If you're looking for matches within a string, then you don't need the anchors
[...] is a character class
a-z, A-Z, 0-9 in a character class define ranges
- as a last character in a class is a literal dash
* is zero-or-more repetition
regular-expressions.info
Anchors, Character Class, Repetition
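A quick sketch of the pattern in JavaScript, run over the examples listed above (including the empty string, which * allows):

```javascript
const lettersDigitsDashes = /^[A-Za-z0-9-]*$/;

const accepted = ["this-is-quite-alright", "and-a-1-and-a-2-and-3-4-5", "yep---------this-is-also-okay", ""];
const rejected = ["this -is/not,soålright", "hello world"];

for (const s of accepted) console.log(JSON.stringify(s), lettersDigitsDashes.test(s)); // each logs true
for (const s of rejected) console.log(JSON.stringify(s), lettersDigitsDashes.test(s)); // each logs false
```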
Variation
The specification was not clear, but if - is only to be used to separate "words", i.e. no double dash, no trailing dash, no preceding dash, then the pattern is more complex (only slightly!)
  _"alpha"_     separating dash
 /         \   /
^[A-Za-z0-9]+(-[A-Za-z0-9]+)*$
 \__________/| \__________/|\
    "word"   |    "word"   | zero-or-more
             \_____________/
              group together
This matches strings that consist of at least one "word", where a "word" consists of one or more "alpha" characters, and an "alpha" is a letter or digit. More "words" can follow, and they're always separated by a single dash.
This matches (as seen on rubular.com):
this-is-quite-alright
and-a-1-and-a-2-and-3-4-5
And rejects:
--no-way
no-way--
no--way
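A sketch of the stricter pattern, checked against the examples above and the allow/reject examples from the related question further down:

```javascript
// One or more "words" of letters/digits, separated by single dashes.
const dashSeparatedWords = /^[A-Za-z0-9]+(-[A-Za-z0-9]+)*$/;

const accepted = ["this-is-quite-alright", "and-a-1-and-a-2-and-3-4-5", "spam123-spam-eggs-eggs1"];
const rejected = ["--no-way", "no-way--", "no--way", "eggs1-", "-spam123", "spam--spam", ""];

for (const s of accepted) console.log(JSON.stringify(s), dashSeparatedWords.test(s)); // each logs true
for (const s of rejected) console.log(JSON.stringify(s), dashSeparatedWords.test(s)); // each logs false
```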
[A-Za-z0-9-]+
But your question is confusing as it asks for letters and numbers and has an example containing a dash.
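Writing the letter range as A-z is not the same as A-Za-z: in ASCII, the six characters [ \ ] ^ _ ` sit between Z and a, so A-z lets them through. A small sketch of the difference (the inputs are made up):

```javascript
const loose = /^[A-z0-9-]+$/; // A-z also covers [ \ ] ^ _ `
const strict = /^[A-Za-z0-9-]+$/;

for (const s of ["this-is-quite-alright", "under_score", "back\\slash", "caret^"]) {
  console.log(s, loose.test(s), strict.test(s));
}
```

Only this-is-quite-alright passes the strict version; the other three pass the loose one.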
This is a community wiki, an attempt to compile links to related questions about this "URL/SEO slugging" topic. Community is invited to contribute.
Related questions
regex/php: how can I convert 2+ dashes to singles and remove all dashes at the beginning and end of a string?
-this--is---a-test-- becomes this-is-a-test
Regex for [a-zA-Z0-9-] with dashes allowed in between but not at the start or end
allow spam123-spam-eggs-eggs1 reject eggs1-, -spam123, spam--spam
Translate “Lorem 3 ipsum dolor sit amet” into SEO friendly “Lorem-3-ipsum-dolor-sit-amet” in Java?
Related tags
[slug]
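The first and third related questions describe string transformations rather than validation. As a rough sketch in JavaScript (not the PHP or Java those questions ask about), with hypothetical function names:

```javascript
// Collapse runs of 2+ dashes into one, then strip one leading and one trailing dash.
function tidyDashes(s) {
  return s.replace(/-{2,}/g, "-").replace(/^-|-$/g, "");
}

// Trim the ends, then replace each run of whitespace with a single dash.
function dashify(s) {
  return s.trim().replace(/\s+/g, "-");
}

console.log(tidyDashes("-this--is---a-test--")); // this-is-a-test
console.log(dashify("Lorem 3 ipsum dolor sit amet")); // Lorem-3-ipsum-dolor-sit-amet
```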