Regex to get the string between two strings

I have a query and want to get the table names between from and where.
If it's a single line with a single table and no alias, I can do this:
(?<=from )([^#]\w*)(?=.*where)
I need to get each table except the prefixed one, i.e. course c, marks s.
But I can't figure out a regex for the following query.
(The where clause could be on the same line or on a new line, at the start of the line or preceded by a space or tab.)
from #prefix#student, course c, marks m
where ....
There are also subqueries in some places; it would help if that case could be handled too.
select ... from course c
where id = (select ... from student where ...)
I'm trying to find and replace in the Sublime Text 3 editor.
Test case queries:
//output [course]
select ... from course
where ...
//output [course c] [marks s]
select ... from course c, marks s
where ....
//output [marks m]
select ... from #prefix#course c, marks m
where ...
//output [student s]
select ... from #prefix#course c
where id = (select ... from student s where ...)

You can use the following regex:
\bfrom\b(?!\s*#)([^w]*(?:\bw(?!here\b)[^w]*)*)\bwhere\b
Enable the Case sensitive option if you need it.
If you just need to highlight everything between from and where, use lookarounds:
(?<=\bfrom\b)(?!\s*#)([^w]*(?:\bw(?!here\b)[^w]*)*)(?=\bwhere\b)
Regex breakdown:
(?<=\bfrom\b) - check if there is a whole word from before the next...
(?!\s*#) - make sure the text that follows is not optional whitespace and then #
([^w]*(?:\bw(?!here\b)[^w]*)*) - match any text that is not where up to...
(?=\bwhere\b) - a whole word where.
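For anyone who wants to try the first pattern outside the editor, here is a minimal Python sketch. It assumes Python's built-in re module and lowercase keywords, neither of which comes from the question, and the helper name between_from_where is made up for illustration.

```python
import re

# First pattern from the answer: capture the text between a whole-word "from"
# (not followed by a #-prefixed name) and the next whole-word "where".
FROM_WHERE = re.compile(r"\bfrom\b(?!\s*#)([^w]*(?:\bw(?!here\b)[^w]*)*)\bwhere\b")


def between_from_where(sql):
    """Return the stripped text of every from ... where span in sql."""
    return [m.group(1).strip() for m in FROM_WHERE.finditer(sql)]


print(between_from_where("select a from course c, marks s\nwhere x = 1"))
print(between_from_where("select a from #prefix#course c\nwhere x = 1"))
```

The second call prints an empty list, because the negative lookahead rejects a from that is followed directly by a # prefix.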
UPDATE
Since you need to get comma-separated values excluding prefixed names with their aliases, you need a boundary-constrained regex. It can be achieved with the \G operator:
(?:\bfrom\b(?:\s*#\w+(?:\s*\w+))*+|(?!^)\G),?\s*\K(?!(?:\w+ )?\bwhere\b)([\w ]+)(?=[^w]*(?:\bw(?!here\b)[^w]*)*\bwhere\b)
Here,
(?:\bfrom\b(?:\s*#\w+(?:\s*\w+))*+|(?!^)\G) - matches either from (as a whole word) followed by optional whitespace followed by # and 1 or more alphanumerics that are followed by whitespaces+alphanumerics (alias), or the end of the previous successful match (\G, with (?!^) ruling out the start of the string)
,?\s*\K - optional (1 or 0) commas followed by 0 or more whitespaces that are followed by \K, which forces the engine to omit the whole chunk of text matched so far
(?!(?:\w+ )?\bwhere\b) - a restrictive lookahead with which we forbid the next or the word following the next word to be equal to where
([\w ]+) - our match, 1 or more alphanumerics or space (may be replaced with [\w\h]+)
(?=[^w]*(?:\bw(?!here\b)[^w]*)*\bwhere\b) - a trailing boundary: there must be text other than where up to the first where.
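Python's built-in re module does not support \G or \K, so the pattern above cannot be used there as written. As a rough alternative, the sketch below swaps in a different technique: capture everything between from and where lazily, split it on commas, and drop the entries that start with #. The function name table_refs is hypothetical, and the sketch assumes table references never contain commas themselves.

```python
import re

# Lazy capture between whole-word "from" and the next whole-word "where".
# DOTALL lets the span cross line breaks; IGNORECASE accepts FROM/WHERE too.
FROM_WHERE = re.compile(r"\bfrom\b(.*?)\bwhere\b", re.IGNORECASE | re.DOTALL)


def table_refs(sql):
    """Return non-prefixed table references (with aliases) from every from ... where span."""
    refs = []
    for match in FROM_WHERE.finditer(sql):
        for item in match.group(1).split(","):
            item = " ".join(item.split())  # collapse spaces, tabs and newlines
            if item and not item.startswith("#"):
                refs.append(item)
    return refs


print(table_refs("select a from #prefix#course c, marks m\nwhere x = 1"))
print(table_refs("select a from #prefix#course c\nwhere id = (select b from student s where y = 2)"))
```

Because finditer resumes after each where, the subquery's from ... where span is picked up as a separate match.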


Using a regex to identify EQUIPMENTID numbers - VBA

I'm struggling to construct a RegExp to identify equipment numbers. It needs to identify equipment numbers in multiple formats, including pooled equipment numbers, e.g. AFD21101 or AFD21101-02-03 or AFD21101-2-3, with various prefixes as per the test data.
Any tips or feedback welcome. It may be easier with a separate RegExp for each scenario, but I had hoped to have a master pattern that would identify any of these formats and extract them from a string for further processing in a more detailed order, possibly converting to long format etc.
Any assistance is greatly appreciated. Hopefully I can return the favour.
What I've tried so far:
^[abcpfsmschafddfcpdcdplldt][glvmdugmrxftiichlewsnuabn][mmrprbdpucdsxtvuwcrslbubk][0-9][0-9xX][0-9xX][0-9xX][0-9xX]|[0-9xX-][0-9]|[0-9]
^[abcpfsmschafddfcpdcdplldt][glvmdugmrxftiichlewsnuabn][mmrprbdpucdsxtvuwcrslbubk][0-9][0-9xX][0-9xX][0-9xX][0-9xX]
^(BLM)|(SUB)|
(CVR)|FDR|SMP|CRU|HXC|ATS|AFD|FTS|DIX|DIT|FIT|FCV|KV|FV|CHU|PLW|BCR|DEC|CTR|CWR|V|DSS|PNL|MTR|LUB|LAU|CCL|DBB|TNK|THK|PIT|[0-9][0-9xX][0-9xX][0-9xX][0-9xX]
Test data: it will have to handle multiple IDs separated by commas or spread over multiple lines, as in the examples below
// Example test data 1: (CSV+)
CRN21003 (CB-3), CRN21004 (CB-4)
// Example test data 2: (CSV)
CVR21404, CHU21437, AFD21401
// Example test data 3: (Multi-line)
MGD22401 - 16
DEC22401 - 16
// Example test data 4: (In string)
AFD11122 SOME OTHER RANDOM DATA WDC11121_22 SOME OTHER RANDOM DATA
//Additional matches
AFD21101-03
AFD21101_03
AFD21101-02-03
AFD21101_02_03
AFD21101-2-3
AFD21101_2_3
FDR21407-08
BLM21401
SUB21601
CVR21601
Fdr21601
SMP21501
CRU21501
HXC21501
AFD21501
FTS21X01
DIX21301
DIT22501
FIT21X0X
FCV21501
Pattern:
Base is max 8 characters
1-3 letters (A-Z)
5 digits (0-9), with X allowed as a wildcard
Followed by pooled equipment IDs
e.g. AFD21101-2-3, AFD21101-02-03 or AFD21101_02_03
_ or - are delimiters indicating abbreviated subsequent equipment IDs or ranges.
AFD21101-02-03 is equivalent to AFD21101, AFD21102, AFD21103 in full form
Possible prefixes, continued
KV
CHU
PLW
BCR
DEC
CTR
CWR
V
DSS
PNL
MTR
LUB
LAU
CCL
DBB
TNK
THK
PIT
AGM2XXXX - valid
Some Invalid matches would be something like
AGM211011 or AGMXXXXX or 21101 or 2110 or AGM21101-094-034 or AGM (prefix only without a trailing 5 digit number/ X wildcard)
If I understand your issue, you need to get the strings which start with one of the given prefixes and are followed by numbers.
You could try the following regex.
^(?:BLM|SUB|CVR|FDR|SMP|CRU|HXC|ATS|AFD|FTS|DIX|DIT|FIT|FCV|KV|FV|CHU|PLW|BCR|DEC|CTR|CWR|V|DSS|PNL|MTR|LUB|LAU|CCL|DBB|TNK|THK|PIT)[0-9_-]+
Details:
^: start of string
(?:...): non-capturing group
(?:BLM|SUB|CVR|FDR|SMP|CRU|HXC|ATS|AFD|FTS|DIX|DIT|FIT|FCV|KV|FV|CHU|PLW|BCR|DEC|CTR|CWR|V|DSS|PNL|MTR|LUB|LAU|CCL|DBB|TNK|THK|PIT): list of prefixes.
[0-9_-]+: one or more digits, underscores, or hyphens
It isn't 100% clear what you're intending to do because:
1. The test data you've supplied consists wholly of expected matches
2. The expected output is unclear, although this largely relates back to point 1!
However, there are many ways of getting the information you require. They all depend on how your source data is organised though...
// Example test data 1:
AFD11122 SOME OTHER RANDOM DATA
WDC11121_22 SOME OTHER RANDOM DATA
// Example test Data 2:
SOME RANDOM DATA AFD11122 AND SOME MORE RANDOM DATA WDC11121_22 WITH SOME MORE
Assuming that the data is at the start of the string AND that you want to capture each string as a whole:
// Option 1
/^(.*?)\s/
^ : Start of string
(.*?) : Non-greedy capture group
\s : First space (first because the capture group was non-greedy)
// Option 2
/^([ABCDEFHIKLMNPRSTUVWX][ABCDEFHILMNRSTUVWX]?[BCDKLMPRSTUVWX]?[x\d]{5}[_\-\d]*)/i
^ : Start of string
( : Start of capture group
[ABCDEFHIKLMNPRSTUVWX] : Capture any letter in character set
[ABCDEFHILMNRSTUVWX]? : OPTIONALLY [?] capture any letter in character set
[BCDKLMPRSTUVWX]? : OPTIONALLY [?] capture any letter in character set
[x\d]{5} : Capture any number or x 5 times
[_\-\d]* : Capture any number, hyphen, or underscore until you reach a character not in the set
) : End of capture group
i : FLAG - case insensitive
// Option 3
/^((?:AFD|BCR|BLM....TNK|V)[\d_\-]*)/i
^ : Start of string
( : Start of capture group
(?: : Start of non-capturing group
AFD|BCR|BLM....TNK|V : List of prefixes separated with "|"
) : End of non-capturing group
[\d_\-]* : Capture any number, hyphen, or underscore until you reach a character not in the set
) : End of capture group
i : FLAG - case insensitive
// Option 4
/^([a-z]{1,3}[x\d]{5}[_\-\d]*)/i
^ : Start of string
( : Start of capture group
[a-z]{1,3} : Capture any letter [range: a-z] 1 to 3 times {1,3}
[x\d]{5} : Capture any number [\d] or x [x] 5 times {5}
[_\-\d]* : Capture any number, hyphen, or underscore until you reach a character not in the set
) : End of capture group
i : FLAG - case insensitive
Based on your updates to the main question I would stick with option 4 unless you specifically need to make sure that only the set prefixes are matched.
In the event that your data looks more like Example Data 2 then the above expressions will need to be altered accordingly; some examples below:
/([a-z]{1,3}[x\d]{5}[_\-\d]*)/i : Remove the ^
/\b([a-z]{1,3}[x\d]{5}[_\-\d]*)/i : Add a word boundary to the start of the expression
/[^a-z]([a-z]{1,3}[x\d]{5}[_\-\d]*)/i : Start the expression with anything BUT a letter
How you alter it will depend on the data that you're searching through.
Updated RegEx based on latest question edits
/([a-z]{1,3}(?!xxxxx)[x\d]{5}(?!\d)[_\-\d]*)/ig
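As a quick way to try the updated expression on free text, here is a small Python sketch. The question is about VBA, so using Python's re module is an assumption made purely for illustration, and the helper name find_ids is hypothetical. The /.../ig delimiters and flags are replaced by re.IGNORECASE and findall.

```python
import re

# Updated expression from the answer, without the /.../ig delimiters.
ID_RE = re.compile(r"([a-z]{1,3}(?!xxxxx)[x\d]{5}(?!\d)[_\-\d]*)", re.IGNORECASE)


def find_ids(text):
    """Return every equipment-ID-like token found anywhere in text."""
    return ID_RE.findall(text)


print(find_ids("AFD11122 SOME OTHER RANDOM DATA WDC11121_22 SOME OTHER RANDOM DATA"))
print(find_ids("AGM211011 AGMXXXXX 21101"))
```

The second call returns an empty list: the (?!\d) lookahead rejects the six-digit number, (?!xxxxx) rejects the all-wildcard ID, and a bare number has no letter prefix.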
Try this:
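[A-Z]{1,3}[\dX]{5}([_-])0?\d(\1(?:0?\d))?
This requires the separator to be consistent, i.e. either both - or both _, by capturing the separator and using a back reference to it, \1. The second “pooled ID” is optional. (The part after the back reference is wrapped in a group so that \1 and the following 0 cannot be read together as \10.)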
As far as I can tell, this matches all of your examples.
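Since the question also mentions converting pooled IDs to their full form, here is a hedged Python sketch of that step. It is not taken from any of the answers: the function name expand_pooled is hypothetical, and it assumes that each one- or two-digit suffix replaces the last two characters of the base number, which is how AFD21101-02-03 maps to AFD21101, AFD21102, AFD21103 in the question.

```python
import re

# 1-3 letters, 5 digits (X allowed as wildcard, but not all five),
# then any number of one- or two-digit suffixes separated by - or _.
POOLED_RE = re.compile(r"([A-Z]{1,3})(?!X{5})([0-9X]{5})((?:[-_][0-9]{1,2})*)", re.IGNORECASE)


def expand_pooled(equipment_id):
    """Expand a pooled ID into full IDs; return [] if the input is not a valid ID."""
    match = POOLED_RE.fullmatch(equipment_id.strip())
    if match is None:
        return []
    prefix = match.group(1).upper()
    base = match.group(2).upper()
    ids = [prefix + base]
    for part in re.findall(r"[0-9]{1,2}", match.group(3)):
        # Assumption: the suffix replaces the last two characters of the base.
        ids.append(prefix + base[:-2] + part.zfill(2))
    return ids


print(expand_pooled("AFD21101-02-03"))
print(expand_pooled("AFD21101_2_3"))
print(expand_pooled("AGM21101-094-034"))
```

Using fullmatch on one candidate at a time keeps the validation strict; the invalid examples from the question (AGM211011, AGMXXXXX, 21101, AGM21101-094-034, AGM) all come back as empty lists.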

Regex for SQL Query

Hello everyone, I have the following problem:
I have a long list of SQL queries which I would like to adapt to one of my changes. In the end it's a renaming problem, and I'm afraid I'm trying to solve it in a more complicated way than necessary.
The query looks like this:
INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a@b.com', N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);
In the INSERT there was another column which I removed. I did this simply in Notepad++ by searching for the term "example, " and replacing it with nothing. Only the corresponding entry in VALUES I can't remove with this method, because the text varies there. So far I have only worked with the text file in which I adjusted the list of queries.
So as you can see, there is one more entry in VALUES than in the column list (there was another column here, but it was removed by my change).
It is the entry after the email address. I would like to remove it, including the comma (N'This is the text i want to delete',).
My idea was to form a group and say that the 14th value in the comma-separated list should be removed. However, even after some research I do not know how to do this.
I thought it could look like this (tried in https://regex101.com/)
VALUES\s?\((,) something here
Is this even the right approach or is there another method? Regex was the only way I could think of to solve this, because of course the values differ from query to query.
And how can I then use the regex to adapt the queries (the queries are local to my computer and not yet included in the code)?
Short summary:
Change the query from
VALUES (... test5, test6, test7 ...)
To
VALUES (... test5, test7 ...)
As per my comment, you could use find/replace, where you search for:
(\bVALUES +\((?:[^,]+,){13})[^,]+,
And replace with $1
( - Open 1st capture group.
\bVALUES +\( - Match a word boundary, the literal 'VALUES', followed by at least one space and a literal opening parenthesis.
(?: - Open non-capturing group.
[^,]+, - Match anything but a comma at least once followed by a comma.
){13} - Close non-capture group and repeat it 13 times.
) - Close 1st capture group.
[^,]+, - Match anything but a comma at least once followed by a comma.
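If the queries live in a plain text file, the same find/replace can be scripted. The following is a minimal Python sketch of the pattern broken down above, with the repeat count made a parameter; the function name drop_value is hypothetical. Like the pattern itself, it assumes that no value contains a comma inside its quotes.

```python
import re


def drop_value(sql, position):
    """Remove the value at 1-based position from the VALUES (...) list of sql."""
    # Group 1 keeps "VALUES (" plus the first position-1 values; the trailing
    # "[^,]+," is the value to drop together with its comma.
    pattern = re.compile(r"(\bVALUES +\((?:[^,]+,){%d})[^,]+," % (position - 1))
    return pattern.sub(r"\1", sql, count=1)


query = "INSERT t (a, b, c) VALUES (1, N'x', N'drop me', N'y');"
print(drop_value(query, 3))
```

For the query in the question the call would be drop_value(query, 14), which corresponds to the {13} in the pattern above.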
You may use the following to remove / replace the value you need:
Find What: \bVALUES\s*\((\s*(?:N'[^']*'|\w+))(?:,(?1)){12}\K,(?1)
Replace With: (empty string, or whatever value you need)
Details
\bVALUES - whole word VALUES
\s* - 0+ whitespaces
\( - a (
(\s*(?:N'[^']*'|\w+)) - Group 1: 0+ whitespaces and then either N' followed with any 0 or more chars other than ' and then a ', or 1+ word chars
(?:,(?1)){12} - twelve repetitions of , followed with the Group 1 pattern
\K - match reset operator that discards the text matched so far from the match memory buffer
, - a comma
(?1) - Group 1 pattern.

How can I search and replace guids in Sublime 3

I have a text file in which I would like to replace all GUIDs with a space.
I want:
92094, "970d6c9e-c199-40e3-80ea-14daf1141904"
91995, "970d6c9e-c199-40e3-80ea-14daf1141904"
87445, "f17e66ef-b1df-4270-8285-b3c15da366f7"
87298, "f17e66ef-b1df-4270-8285-b3c15da366f7"
96713, "3c28e493-015b-4b48-957f-fe3e7acc8412"
96759, "3c28e493-015b-4b48-957f-fe3e7acc8412"
94665, "87ac12a3-62ed-4e1d-a1a6-51ae05e01b1a"
94405, "87ac12a3-62ed-4e1d-a1a6-51ae05e01b1a"
To become:
92094,
91995,
87445,
87298,
96713,
96759,
94665,
94405,
How can I accomplish this in Sublime Text 3?
Ctrl+H
Find: "[\da-f-]{36}"
Replace: LEAVE EMPTY
Enable regex mode
Replace all
Explanation:
" : double quote
[ : start of character class
\d : any digit
a-f : or letter from a to f
- : or a dash
]{36} : end of class; exactly 36 such characters must be present
" : double quote
Result for given example:
92094,
91995,
87445,
87298,
96713,
96759,
94665,
94405,
Try doing a search for this pattern in regex search mode:
"[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}"
And then just replace with empty string. This should strip off the GUID, leaving you with the output you want.
Another regex solution involving a slightly different search-replace strategy, where we don't care about the GUID format and simply keep the first column:
Search for ([^,]*,).* (again don't forget to activate the regex mode .*).
Replace with $1.
Details about the regular expression
The idea here is to capture all first columns. A column here is defined by a sequence of
"some non-comma character": [^,]*
followed by a comma: [^,]*,
The first column can then be followed by anything .* (the GUID format doesn't matter): [^,]*,.*
Finally we need to capture the 1st column using group capturing: ([^,]*,).*
In the replace field we use a backreference $x which refers to the x-th capturing group.
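For files too large to open comfortably in the editor, the same replacement can be scripted. Here is a minimal Python sketch, assuming the 8-4-4-4-12 hexadecimal layout seen in the sample data; the function name strip_guids is made up for illustration. Unlike the editor patterns above, it also removes the spaces or tabs in front of the quoted GUID, so each line ends right after the comma.

```python
import re

# Optional spaces/tabs, then a double-quoted GUID in 8-4-4-4-12 hex layout.
GUID_RE = re.compile(
    r'[ \t]*"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"',
    re.IGNORECASE,
)


def strip_guids(text):
    """Remove every quoted GUID (and the blanks before it) from text."""
    return GUID_RE.sub("", text)


sample = (
    '92094, "970d6c9e-c199-40e3-80ea-14daf1141904"\n'
    '91995, "970d6c9e-c199-40e3-80ea-14daf1141904"'
)
print(strip_guids(sample))
```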

Regex to grab formulas

I am trying to parse a file that contains parameter attributes. The attributes are set up like this:
w=(nf*40e-9)*ng
but also like this:
par_nf=(1) * (ng)
The issue is, all of these parameter definitions are on a single line in the source file, and they are separated by spaces. So you might have a situation like this:
pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0
The current algorithm just splits the line on spaces and then for each token, the name is extracted from the LHS of the = and the value from the RHS. My thought is if I can create a Regex match based on spaces within parameter declarations, I can then remove just those spaces before feeding the line to the splitter/parser. I am having a tough time coming up with the appropriate Regex, however. Is it possible to create a regex that matches only spaces within parameter declarations, but ignores the spaces between parameter declarations?
Try this RegEx:
(?<=^|\s) # Start of each formula (start of line OR [space])
(?:.*?) # Attribute Name
= # =
(?: # Formula
(?!\s\w+=) # DO NOT Match [space] Word Characters = (Attr. Name)
[^=] # Any Character except =
)* # Formula Characters repeated any number of times
When checking formula characters, it uses a negative lookahead to check for a Space, followed by Word Characters (Attribute Name) and an =. If this is found, it will stop the match. The fact that the negative lookahead checks for a space means that it will stop without a trailing space at the end of the formula.
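The same negative-lookahead idea can be sketched in Python. This is an adaptation, not the exact pattern above: Python's re module rejects the variable-width lookbehind (?<=^|\s), so the sketch matches the attribute name with \w+ instead. The function name parse_params is hypothetical, and removing the inner spaces from each value is an assumption based on what the question asks for.

```python
import re

# name = value, where the value stops before " name=" of the next parameter.
PARAM_RE = re.compile(r"(\w+)=((?:(?!\s\w+=)[^=])*)")


def parse_params(line):
    """Split a parameter line into a dict of name -> formula (inner spaces removed)."""
    return {name: value.replace(" ", "") for name, value in PARAM_RE.findall(line)}


line = "pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0"
for name, value in parse_params(line).items():
    print(name, "=", value)
```

Spaces inside a formula such as (1) * (ng) are kept by the match because they are not followed by a name and an = sign; only the space before the next parameter name ends the value.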
Thanks to @Andy for the tip:
In this case I'll probably just match on the parameter name and equals, but replace the preceding whitespace with some other "parse-able" character to split on, like so:
(\s*)\w+[a-zA-Z_]=
Now my first capturing group can be used to insert something like a colon, semicolon, or line-break.
You need to add Perl tag. :-( Maybe this will help:
I ended up using this in C#. The idea was to break the line into name/value pairs, using a negative lookahead for the next key to stop one match and start a new one. Hope this helps.
using System.Linq;
using System.Text.RegularExpressions;

var data = @"pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0";
var pattern = @"
(?<Key>[a-zA-Z_\s\d]+) # Key is any alpha, digit and _
= # = is a hard anchor
(?<Value>[.*+\-\\\/()\w\s]+) # Value is any combinations of text with space(s)
(\s|$) # Soft anchor of either a \s or EOB
((?!\s[a-zA-Z_\d\s]+\=)|$) # Negative lookahead to stop matching if a space then key then equal found or EOB
";
Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture)
    .OfType<Match>()
    .Select(mt => new
    {
        LHS = mt.Groups["Key"].Value,
        RHS = mt.Groups["Value"].Value
    });

How to better this regex?

I have a list of strings like this:
/soccer/poland/ekstraklasa-2008-2009/results/
/soccer/poland/orange-ekstraklasa-2007-2008/results/
/soccer/poland/orange-ekstraklasa-youth-2010-2011/results/
From each string I want to take a middle part resulting in respectively:
ekstraklasa
orange ekstraklasa
orange ekstraklasa youth
My code here does the job but it feels like it can be done in fewer steps and probably with regex alone.
import re

name = re.search(r'/([-a-z\d]+)/results/', string).group(1)  # take the middle part
name = re.search(r'[-a-z]+', name).group()  # trim numbers
if name.endswith('-'):
    name = name[:-1]  # trim trailing `-` if needed
name = name.replace('-', ' ')
Can anyone see how to make it better?
This regex should do the work:
/(?:\/\w+){2}\/([\w\-]+)(?:-\d+){2}/
Explanation:
(?:\/\w+){2} - eat the first two words delimited by /
\/ - eat the next /
([\w\-]+) - match the word characters or hyphens (this is what we're looking for)
(?:-\d+){2} - eat the hyphens and the numbers after the part we're looking for
The result is in the first match group
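In Python the surrounding / delimiters are dropped and the slashes need no escaping. A minimal sketch of how this pattern could replace the multi-step code from the question follows; the function name league_name is made up for illustration.

```python
import re

# Two leading path segments, then the league name, then the two season years.
LEAGUE_RE = re.compile(r"(?:/\w+){2}/([\w-]+)(?:-\d+){2}")


def league_name(path):
    """Return the league name with hyphens turned into spaces, or None if no match."""
    match = LEAGUE_RE.search(path)
    return match.group(1).replace("-", " ") if match else None


for path in (
    "/soccer/poland/ekstraklasa-2008-2009/results/",
    "/soccer/poland/orange-ekstraklasa-2007-2008/results/",
    "/soccer/poland/orange-ekstraklasa-youth-2010-2011/results/",
):
    print(league_name(path))
```

The greedy ([\w-]+) first swallows the years as well and then backtracks until (?:-\d+){2} can match, so the trailing hyphen never ends up in the group.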
I can't test it because I'm not using Python, but I would use an expression like
^(/soccer/poland/)([a-z\-]*)(.*)$
or
^(/[a-z]*/[a-z]*/)([a-z\-]*)(.*)$
This expression matches "/soccer/poland/" at the beginning, then any run of lowercase a to z or -, and then the rest of the string.
Then take the 2nd group!
The groups should hold these strings:
/soccer/poland/
orange-ekstraklasa-youth-
2010-2011/results/
Then simply replace "-" with " " and trim the spaces afterwards.
PS: If you're using regex101.com, for example, you need to escape / and test with just one line of input!
Expression
^(\/soccer\/poland\/)([a-z\-]*)(.*)$
And one line of your input:
/soccer/poland/orange-ekstraklasa-youth-2010-2011/results/
If you'd prefer to use the expression for more than just soccer and poland, use
^(\/[a-z]*\/[a-z]*\/)([a-z\-]*)(.*)$