Regex to match *.sublime-settings files - regex

Packages/Material Theme/widgets/Widget - Material-Theme.sublime-settings
Packages/DA UI/Widget - DA.sublime-settings
Packages/DA UI/Widget - DA (Windows).sublime-settings
Packages/TextMate/TextMate Syntax Definition (JSON).sublime-settings
Packages/DA UI/Widget - DA (Linux).sublime-settings
Packages/DA UI/Widget - DA (OSX).sublime-settings
Packages/User/YAML.sublime-settings
Could anyone take the time to write a regex that matches the first 4 paths and not the last 3?
Rules:
After Packages there must be at least one folder.
The folder right after Packages must not be "User".
The name (after the final slash) can optionally contain a pair of parentheses right before the . and these may not contain the name of another platform: if the current platform is Windows, then OSX and Linux are not accepted (assume the current platform is held in a variable named platform).
This is my try:
(?i)Packages/(?:[^/]+?/([^()]+?(?: \((?!OSX|Linux)\))?)\.sublime-settings)
It doesn't match paths 3 and 4, and it does match the last one.
Thanks in advance :)

You can use
(?i)^Packages/(?!User)(?:(?![^/]+$).)+/[^(.]+(?:\((?!OSX|Linux).+\))?\.sublime-settings
https://regex101.com/r/kgJMq4/2
(?!User) - Negative lookahead for User right after the first slash
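(?:(?![^/]+$).)+ - Repeat any character, as long as the rest of the string from that position is not just non-slash characters up to the end (so the repetition cannot run into the file name after the final slash)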
/[^(.]+ - Match the final slash, then repeat non-parentheses, non-dot characters, so as to check for an optional group:
(?:\((?!OSX|Linux).+\))? - Parentheses surrounding a phrase that does not start with OSX or Linux
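A minimal Python sketch of how this pattern could be built from the platform variable mentioned in the question and checked against the seven sample paths. The names PLATFORMS and build_pattern are made up for illustration; for platform "Windows" the resulting pattern is the one shown above.

```python
import re

# Assumption: these are the only three platform names that can appear.
PLATFORMS = ("Windows", "OSX", "Linux")

def build_pattern(platform):
    """Compile the pattern above, rejecting every platform except `platform`."""
    others = "|".join(re.escape(p) for p in PLATFORMS if p != platform)
    return re.compile(
        r"(?i)^Packages/(?!User)(?:(?![^/]+$).)+/[^(.]+"
        r"(?:\((?!" + others + r").+\))?\.sublime-settings"
    )

paths = [
    "Packages/Material Theme/widgets/Widget - Material-Theme.sublime-settings",
    "Packages/DA UI/Widget - DA.sublime-settings",
    "Packages/DA UI/Widget - DA (Windows).sublime-settings",
    "Packages/TextMate/TextMate Syntax Definition (JSON).sublime-settings",
    "Packages/DA UI/Widget - DA (Linux).sublime-settings",
    "Packages/DA UI/Widget - DA (OSX).sublime-settings",
    "Packages/User/YAML.sublime-settings",
]

pattern = build_pattern("Windows")
matched = [p for p in paths if pattern.search(p)]
for p in paths:
    print("match   " if p in matched else "no match", p)
```

Note that (?!User) also rejects any folder whose name merely starts with "User"; if that matters, (?!User/) is a stricter variant.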

Related

matching numbers after nth occurrence of a certain symbol in a line

I'm not sure if using regex is the correct way to go about this here, but I wanted to try solving this with regex first (if it's possible)
I have an EDIFACT file in which the data (in bold) in certain fields of some segments needs to be substituted (with different dates, same format):
UNA:+,? '
UNB+UNOC:3+000000000+000000000+20190801:1115+00001+DDMP190001'
UNH+00001+BRKE:01+00+0'
INV+ED Format 1+Brustkrebs+19880117+E000000001+**20080702**+++1+0'
FAL+087897044+0000000++name+000000000+0+**20080702**++1+++J+N+N+N+N+N+++0'
INL+181095200+385762115+++0'
BEE+20080702++++0'
BAA+++J+J++++++J+++++++J++0'
BBA++++++++J++++++J+J++++++J+++++J+++J+J++++++++J+0'
BHP+J+++++J+++++J+++++0'
BLA+++J+++++++++0'
BFA++++++++++++J++0'
BSA++J+++J+J+++0'
BAT+20190801+0'
DAT+**20080702**++++0'
UNT+000014+00001'
UNZ+00001+00001'
At first I was able to match those fields using a positive lookahead and a lookbehind (I had a different expression for matching each date).
Here, for example, is the expression I initially used to match the date in the "FAL" segment: (?<=\+[\d]{1}\+)\d{8}(?=\+\+). Then I saw that this date is sometimes preceded by 9 digits and sometimes by 1 (depending on the version), and is followed by either ++ or a + and a date, so I added a logical OR like this: (?<=\+[\d]{9}\+|\+[\d]{1}\+)\d{8}(?=\+[\d]{8}\+|\+\+). I quickly realized this is not sustainable, because these EDIFACT files vary far beyond just 9 or 1 digits.
(I have 6 versions for each type, and I have 6 types in total.)
Because I have a scheme/map indicating how each version should be built, and I know at which position (counted by the + separator) the date is written in each version, I thought about matching the date based on the +: after the 7th occurrence of + in a certain line (say, in the FAL segment), match the next 8 digits.
Is this possible to achieve with regex? And if yes, could someone please tell me how?
I suggest using a pattern like
^((?:[^+\n]*\+){7})\d{8}(?=\+(?:\d{8})?\+)
where {7} can be adjusted to the value you need for each type of segments, and replace with the backreference to Group 1. In Python, it is \g<1>20200101 (where 20200101 is your new date), in PHP/.NET, it is ${1}20200101. In JS, it will be just $1.
To run on a multiline text, use m flag. In Python regex, you may embed it like (?m)^((?:[^+\n]*\+){7})\d{8}(?=\+(?:\d{8})?\+).
See the Python regex demo
Details
^ - start of string/line
((?:[^+\n]*\+){7}) - Group 1: 7 repetitions of any chars other than + and newline, and then a +
\d{8} - 8 digits
(?=\+(?:\d{8})?\+) - that are followed by a +, an optional chunk of 8 digits, and a +.
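As a rough Python sketch, the pattern can be applied line by line with a per-segment separator count. The DATE_POSITION map and the replace_dates helper are hypothetical; the counts were read off the sample message in the question and would have to come from your own scheme for each version.

```python
import re

# Hypothetical map: segment tag -> number of '+' separators before the date.
DATE_POSITION = {"INV": 5, "FAL": 7, "DAT": 1}

def replace_dates(message, new_date):
    """Replace the 8-digit date in every segment listed in DATE_POSITION."""
    out = []
    for line in message.splitlines():
        count = DATE_POSITION.get(line[:3])
        if count is not None:
            pattern = r"^((?:[^+\n]*\+){%d})\d{8}(?=\+(?:\d{8})?\+)" % count
            # \g<1> keeps everything up to the date; the date itself is swapped.
            line = re.sub(pattern, r"\g<1>" + new_date, line)
        out.append(line)
    return "\n".join(out)

sample = "\n".join([
    "FAL+087897044+0000000++name+000000000+0+20080702++1+++J+N+N+N+N+N+++0'",
    "BEE+20080702++++0'",
    "DAT+20080702++++0'",
])
result = replace_dates(sample, "20200101")
print(result)
```

Segments that are not in the map (BEE here) are left untouched, even though they contain a date in the same format.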

Regex pattern to match valid version numbers

I'm looking for a regex pattern that would match a version number.
The solutions I found here don't really match what I need.
I need the pattern to accept single numbers as well as numbers separated by dots.
The valid numbers are
1
1.23
1.2.53.4
Invalid numbers are
01
1.02.3
.1.2
1.2.
-1
Consider:
^[1-9]\d*(\.[1-9]\d*)*$
Breaking that down:
^ - Start at the beginning of the string.
[1-9] - Exactly one of the characters 1 thru 9.
\d* - Zero or more further digits.
( - Beginning of some optional extra stuff
\. - A literal dot.
[1-9] - Exactly one of the characters 1 thru 9.
\d* - Zero or more further digits.
) - End of the optional extra stuff.
* - There can be any number of those optional extra stuffs.
$ - And end at the end of the string.
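A quick way to check the breakdown is to run the pattern against the examples from the question. This is a small Python sketch; is_version is a made-up helper name, and fullmatch is used so that a trailing newline cannot slip through.

```python
import re

VERSION = re.compile(r"^[1-9]\d*(\.[1-9]\d*)*$")

def is_version(text):
    """Return True if `text` is a dotted version with no leading zeros."""
    return VERSION.fullmatch(text) is not None

valid = ["1", "1.23", "1.2.53.4"]
invalid = ["01", "1.02.3", ".1.2", "1.2.", "-1"]

for text in valid + invalid:
    print(repr(text), is_version(text))
```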
Beware
Some of this syntax differs depending what regex engine you are using. For example, are you using the one from Perl, PHP, Javascript, C#, MySQL...?
In my experience, version numbers do not fit the neat format you described.
Specifically, you get values like 0.3RC5, 12.0-beta6, 2019.04.15-alpha4.5, 3.1stable, V6.8pl7 and more.
If you are validating existing data, make sure that the data actually fits the criteria you've described. In particular, if you are following "Semantic Versioning", be aware that components which are zero are legal (so 1.0.1 is valid), that "Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.", and that "1" on its own is not a legal version number.
Be warned that the above will also match stupidly long version numbers like 1.2.3.4.5.6.7.8.9.10.11.12.13.14. To prevent this, you can restrict it, like so:
^[1-9]\d*(\.[1-9]\d*){0,3}$
This changes the * for "any number of optional extra dots and numbers" to a range from zero to three. So it'd accept 1, 1.2, 1.2.3, and 1.2.3.4, but not 1.2.3.4.5.
Also, if you want a zero to be legal as a whole component but still reject leading zeros (so 0.3 and 1.0.1 are accepted, 01 is not), then it gets a little more complex:
^(0|[1-9]\d*)(\.(0|[1-9]\d*)){0,3}$
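The difference between the capped pattern and the zero-tolerant one can be seen side by side in a short Python sketch (the names CAPPED and WITH_ZERO are only for illustration):

```python
import re

# At most four components, none of which may start with 0.
CAPPED = re.compile(r"^[1-9]\d*(\.[1-9]\d*){0,3}$")
# At most four components; a component may be exactly 0, but not 01 or 00.
WITH_ZERO = re.compile(r"^(0|[1-9]\d*)(\.(0|[1-9]\d*)){0,3}$")

samples = ["1", "1.2.3.4", "1.2.3.4.5", "0.3", "1.0.1", "01", "1.00"]
for text in samples:
    capped = CAPPED.fullmatch(text) is not None
    with_zero = WITH_ZERO.fullmatch(text) is not None
    print(f"{text:<10} capped={capped} with_zero={with_zero}")
```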
This question may also be a duplicate: A regex for version number parsing
Major.Minor.Patch - npm version like 0.1.2:
^([1-9]\d*|0)(\.(([1-9]\d*)|0)){2}$
More or optional minor groups like 1.1.5.0 or just 1.2:
^([1-9]\d*|0)(\.(([1-9]\d*)|0)){0,3}$
Avoid leading zero - no |0 in first group:
^([1-9]\d*)(\.(([1-9]\d*)|0)){0,3}$
Semantic Version String like 1.0.0-beta
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
Break down:
^: match the line start
$: match the line end
( and ): make a group
([1-9]\d*|0): match version number
[1-9]\d*: a digit from 1 to 9, followed by any number of digits
|: logical or
0: literal zero
\.: literal (escaped) dot
{2}: exactly 2 repetitions
{0,3}: 0 to 3 repetitions
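The Semantic Version pattern above has five capturing groups: major, minor, patch, pre-release and build metadata. A small Python sketch of how they could be read out follows; parse_semver and the dictionary keys are names chosen here for illustration, not part of any library.

```python
import re

SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

def parse_semver(text):
    """Return the five parts as a dict, or None if `text` does not match."""
    match = SEMVER.fullmatch(text)
    if match is None:
        return None
    keys = ("major", "minor", "patch", "prerelease", "build")
    return dict(zip(keys, match.groups()))

print(parse_semver("1.0.0-beta"))
print(parse_semver("1.0.0-alpha.1+001"))
print(parse_semver("1.0"))
```

Groups that did not take part in the match (for example the build metadata of 1.0.0-beta) come back as None.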
Test cases (regex101 JavaScript):
Match:
0.0.0
0.0.1
0.1.0
1.0.0
1.0.1
1.1.0
1.1.1
0.0.10
0.10.0
10.0.0
0.1.10
1.0.10
1.0.100
0.100.1
100.0.0
1.20.0
Not match:
0.0.00
0.00.0
00.00.0
0.0.01
0.01.0
01.0.0
00.0.01
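The two lists can also be checked outside regex101. The following Python sketch runs them against the Major.Minor.Patch pattern from this answer; the variable names are arbitrary.

```python
import re

MAJOR_MINOR_PATCH = re.compile(r"^([1-9]\d*|0)(\.(([1-9]\d*)|0)){2}$")

should_match = [
    "0.0.0", "0.0.1", "0.1.0", "1.0.0", "1.0.1", "1.1.0", "1.1.1", "0.0.10",
    "0.10.0", "10.0.0", "0.1.10", "1.0.10", "1.0.100", "0.100.1", "100.0.0",
    "1.20.0",
]
should_not_match = [
    "0.0.00", "0.00.0", "00.00.0", "0.0.01", "0.01.0", "01.0.0", "00.0.01",
]

# Collect every case where the pattern disagrees with the expected list.
failures = [v for v in should_match if not MAJOR_MINOR_PATCH.fullmatch(v)]
failures += [v for v in should_not_match if MAJOR_MINOR_PATCH.fullmatch(v)]
print("unexpected results:", failures)
```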
This regex should help:
^((([1-9]+\d*\.)+[1-9]+\d*)|[1-9]+\d*)$
Below is the explanation.
[1-9]+\d* means a sequence which begins with a non-zero digit, followed by zero or more digits.
The first part (([1-9]+\d*\.)+[1-9]+\d*) catches all of your correct examples, except for 1. So, we have a | (or), followed by a [1-9]+\d* sequence. The outer parentheses make ^ and $ apply to both alternatives; without them, the second alternative is only anchored at the end and would accept 01.
([\*,\^])([\-,\*,\w]+[\.])+(\w)*
for an npm package, for example
"cross-env": "^5.2.0",

Regex for Phone Numbers allowing for only 6 to 20 characters

Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (ie. 111111)
I managed to build the following two pieces but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all candidate phone numbers (thanks @CAustin for the suggested improvements):
lst_phone_numbers = re.findall(r'[0-9+()-]{6,20}', your_text)
And then filtering out the ones that do not comply with rule 5 (all digits identical), using whatever programming language you're most comfortable with.
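A minimal Python sketch of this two-step idea, written as a validator for a single string rather than a search through a longer text. It assumes the dot is one of the allowed symbols (the question's list is ambiguous on that point); is_valid_phone is a made-up name.

```python
import re

# Assumption: digits plus the symbols ( ) + - . are allowed, 6 to 20 characters.
ALLOWED = re.compile(r"[0-9()+.\-]{6,20}")

def is_valid_phone(text):
    """Check length and character set first, then the 'not all the same digit' rule."""
    if ALLOWED.fullmatch(text) is None:
        return False
    digits = re.sub(r"\D", "", text)
    # Requires at least two different digits, which also rules out
    # strings that contain no digits at all.
    return len(set(digits)) > 1

for candidate in ["+1(555)123-4567", "111111", "11-11-11", "12345", "abc12345"]:
    print(candidate, is_valid_phone(candidate))
```

Splitting the check in two keeps the regex simple and moves the "all digits identical" rule into ordinary code, where it is easier to read and change.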
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
1. (?: creates a non-capturing group
2. ([\d()+-]) creates a group to match a digit, parenthesis, +, or -
3. (?!\1+$) fails the match if the character captured in #2 is repeated one or more times up to the end of the string
4. {6,20} requires 6-20 matches from the non-capturing group in #1
Try this :
((?:([0-9()+\-])(?!\2{5})){6,20})
The part (?!\2{5}) limits how often a character may repeat consecutively: it rejects a character that is followed by five more copies of itself (for example 222222). I used 5 as an example; you can change it as you want.

Regex up to a special character and group of letters

Using Regex, I'm attempting to get back the following (stars denote what I'd like to extract) from each string using a single Regex command:
FO4H56FD-BTU (Follow Home 56): PLTD8
\***********
FO4H56FD-SYH-BI (Follow Home 56 SYH): PLTD8
\***********
FO4H52FD-SZH-AG4R-BI (Follow Home 52 SAH): QQTD8
\****************
FO4H58FD-SGH: (Follow Home 58 TGT): PLTS8
\***********
For some reason I'm having a lot of difficulties. I've been using various methods and currently have =REGEXEXTRACT(A43,"(FO.+)\-BI") which isn't working. Mine also isn't looking for the : currently. I was using a | for multiple rules which didn't seem to work out.
You may use
=REGEXEXTRACT(A43,"^(.*?)(?:-BI)?(?:[ :]|$)")
Details:
^ - start of string
(.*?) - capturing group #1 matching any 0+ chars as few as possible
(?:-BI)? - an optional non-capturing group matching 1 or 0 occurrences of -BI substring
(?:[ :]|$) - either a space, : or end of string
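For checking the pattern outside Google Sheets, here is a small Python sketch. Python's re module is a different engine from the one behind REGEXEXTRACT, but this particular pattern uses only syntax common to both. The extract_code name is made up for illustration.

```python
import re

PATTERN = re.compile(r"^(.*?)(?:-BI)?(?:[ :]|$)")

def extract_code(text):
    """Return capturing group 1, mirroring what REGEXEXTRACT returns."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

rows = [
    "FO4H56FD-BTU (Follow Home 56): PLTD8",
    "FO4H56FD-SYH-BI (Follow Home 56 SYH): PLTD8",
    "FO4H52FD-SZH-AG4R-BI (Follow Home 52 SAH): QQTD8",
    "FO4H58FD-SGH: (Follow Home 58 TGT): PLTS8",
]
codes = [extract_code(row) for row in rows]
print(codes)
```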

best approach for my pattern match

So, I've built a regex which follows this:
4!a2!a2!c[3!c]
which is translated to
4 alpha characters followed by
2 alpha characters followed by
2 characters followed by
3 optional characters
This is the standard format for a SWIFT BIC code, e.g. HSBCGB2LXXX.
my regex to pull this out of string is:
(?<=:32[^:]:)(([a-zA-Z]{4}[a-zA-Z]{2})[0-9][a-zA-Z]{1}[X]{3})
Now this is targeting a specific tag (32) and works, however, I'm not sure if it's the cleanest, plus if there are any characters before H then it fails.
the string being matched against is:
:32B:HsBfGB4LXXXHELLO
the following returns HSBCGB4LXXX, but this:
:32B:2HsBfGB4LXXXHELLO
returns nothing.
EDIT
For clarity: I have a string which contains multiple lines, each starting with a colon, a two-digit number, an optional letter and another colon (e.g. :58A:). I want to specify a line to start matching in and return a BIC from anywhere in the line.
EDIT
Some more example data to help:
:20:ABCDERF Z
:23B:CRED
:32A:140310AUD2120,
:33B:AUD2120,
:50K:/111222333
Mr Bank of Dad
Dads house
England
:52D:/DBEL02010987654321
address 1
address 2
:53B:/HSBCGB2LXXX
:57A://AU124040
AREFERENCE
:59:/44556677
A line which HSBCGB2LXXX contains a BIC
:70:Another line of data
:71A:Even more
Ok, so I need to pass in as a variable the tag 53 or 59 and return the BIC HSBCGB2LXXX only!
Your regex can be simplified, and corrected to allow a character before the H, to:
:32[^:]:.?([a-zA-Z]{6}\d[a-zA-Z]XXX)
The changes made were:
Lost the look behind - just make it part of the match
Inserting .? meaning "optional character"
([a-zA-Z]{4}[a-zA-Z]{2}) ==> [a-zA-Z]{6} (4+2=6)
[0-9] ==> \d (\d means "any digit")
[X]{3} ==> XXX (just easier to read and fewer characters)
Group 1 of the match contains your target
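Since the question asks for the tag to be passed in as a variable, here is a rough Python sketch that builds the simplified pattern around a tag number. The find_bic helper is hypothetical, and the pattern is otherwise the one shown above.

```python
import re

def find_bic(tag, text):
    """Search `text` for the given tag and return group 1 (the BIC), or None."""
    pattern = re.compile(
        rf":{re.escape(tag)}[^:]:.?([a-zA-Z]{{6}}\d[a-zA-Z]XXX)"
    )
    match = pattern.search(text)
    return match.group(1) if match else None

print(find_bic("32", ":32B:HsBfGB4LXXXHELLO"))
print(find_bic("32", ":32B:2HsBfGB4LXXXHELLO"))
print(find_bic("53", ":53B:/HSBCGB2LXXX"))
```

Two limits follow directly from the pattern: [^:] requires exactly one option letter after the tag number, so a tag written as :59: is not matched, and .? allows at most one character before the BIC, so a BIC further along the line or on a continuation line is not found.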
I'm not quite sure if I understand your question completely, as your regular expression does not completely match what you have described above it. For example, you mentioned 3 optional characters, but in the regexp you use 3 mandatory X-es.
However, the actual regular expression can be further cleaned:
instead of [a-zA-Z]{4}[a-zA-Z]{2}, you can simply use [a-zA-Z]{6}, and the grouping parentheses around this might be unnecessary;
the {1} can be left out without any change in the result;
the X does not need surrounding brackets.
All in all
(?<=:32[^:]:)([a-zA-Z]{6}[0-9][a-zA-Z]X{3})
is shorter and matches in the very same cases.
If you give a better description of the domain, probably further improvements are also possible.