Regex to generate dynamic sql - regex

I want to generate dynamic sql on Notepad++ based on some rules. These rules include everything, so no sql knowledge is needed, and are the following:
Dynamic sql must have each single quote escaped by another single quote ( 'hello' becomes ''hello'')
Each line should begin with "+#lin"
If a line has only whitespace, nothing should be following the "+#lin", despite following rules
Replace each \t directly following "+#lin" with "+#tab"
Add " +' " after the #lin/#tab sequence
Add a single quote at the end of line
So, as an example, this input:
select 1,'hello'
from --two tabs exist after from
table1
should become:
+#lin+'select 1,''hello'''
+#lin+'from --two tabs exist after from'
+#lin
+#lin+#tab+'table1'
What I have for now is the following 4 steps:
Replace each single quote with two single quotes to cover rule 1
Replace ^(\t*)(.*)$ with \+#lin\1\+'\2' to cover rules 2,5,6
Replace \t with \+#tab to cover rule 4
Replace (\+#tab)*\+''$ with nothing to cover rule 3
Notice that this mostly works, except for the third replacement, which replaces all tabs and not only the ones at the beginning of the line. I tried (?<=^\t*)\t with no success: it matches nothing.
I'm looking for a solution which satisfies the rules in as few replacement steps as possible.

After replacing each single quote with two single quotes, you can do the rest in a single step.
It is not very elegant, because each leading TAB needs its own capture group (up to five here), but it works.
Ctrl+H
Find what: ^(?:(\t)(\t)?(\t)?(\t)?(\t)?(\S.*)|\h*|(.+))$
Replace with: +#lin(?1+#tab+(?2#tab+)(?3#tab+)(?4#tab+)(?5#tab+)'$6')(?7+'$7')
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(?: # non capture group
(\t) # group 1, tabulation
(\t)? # group 2, tabulation, optional
(\t)? # group 3, tabulation, optional
(\t)? # group 4, tabulation, optional
(\t)? # group 5, tabulation, optional
(\S.*) # group 6, a non-space character followed by 0 or more any character but newline
| # OR
\h* # 0 or more horizontal spaces
| # OR
(.+) # group 7, 1 or more any character but newline
) # end group
$ # end of line
Replacement:
+#lin # literally
(?1 # if group 1 exists
+#tab+ # add this
(?2#tab+) # if group 2 exists, add a second #tab+
(?3#tab+) # same for group 3
(?4#tab+) # same for group 4
(?5#tab+) # same for group 5
'$6' # content of group 6 with single quotes
) # endif
(?7 # if group 7 exists
+ # plus sign
'$7' # content of group 7 with single quotes
) # endif

You can use three substitutions here. It is not quite possible (without additional assumptions) to reduce the number of steps, since several replacements have to happen at the same positions.
Step 1: Replace each single quote with two single quotes: ' with ''. No regex is needed for this step, but you can leave the regex checkbox on.
Step 2: Add +#lin+ at the start of the line and only wrap its contents with ' if there is any non-whitespace char on the line (while keeping all TABs before the first '):
Find What: ^(\t*+)(\h*\S)?+(.*)
Replace With: +#lin+$1(?2'$2$3':)
Details:
^ - start of a line
(\t*+) - Group 1 ($1): zero or more TABs
(\h*\S)?+ - Group 2 ($2): an optional sequence of any zero or more horizontal whitespace chars and then a non-whitespace char
(.*) - Group 3 ($3): the rest of the line
+#lin+$1(?2'$2$3':) - replaces the match with +#lin+ + Group 1 value (i.e. tabs found), and then - only if Group 2 matches - ' + Group 2 + Group 3 values + '
Step 3: Replace each TAB after +#lin+ with #tab+:
Find What: (\G(?!^)|^\+#lin\+)\t
Replace With: $1#tab+
Details:
(\G(?!^)|^\+#lin\+) - Group 1: either
\G(?!^) - end of the previous match
| - or
^\+#lin\+ - start of a line and +#lin+ string
\t - a TAB char.
The replacement is the concatenation of Group 1 value and #tab+ string.
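For comparison, the rules from the question can also be applied outside Notepad++. The following is a minimal Python sketch and not part of either answer: the function name to_dynamic_sql is made up for illustration, and it assumes \n line endings and that only TABs at the very start of a line count as indentation.

```python
import re

def to_dynamic_sql(text: str) -> str:
    """Apply the six rules from the question, one input line at a time."""
    out = []
    for line in text.split("\n"):
        # Rule 1: escape every single quote with a second single quote.
        escaped = line.replace("'", "''")
        # Separate the leading TABs from the rest of the line.
        match = re.fullmatch(r"(\t*)(.*)", escaped)
        tabs, rest = match.group(1), match.group(2)
        if not rest.strip():
            # Rule 3: a whitespace-only line yields just the marker.
            out.append("+#lin")
            continue
        # Rules 2, 4, 5, 6: marker, one +#tab per leading TAB, quoted text.
        out.append("+#lin" + "+#tab" * len(tabs) + "+'" + rest + "'")
    return "\n".join(out)
```

Because the leading TABs are counted rather than captured one group at a time, this sketch has no upper limit on the indentation depth.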

Related

RegEx string to find two strings and delete the rest of the text in the file including lines that don't contain the strings [duplicate]

I need to find certain words in a text file and delete everything else, using Notepad++.
I want to use a regex to find variations of thban..... The variable always has at most 5 characters after it (see the dots).
My search string matches the last line, but it matches the whole line. I just want the word preserved.
Once this works, I also want to keep the words containing C3.....
The rest of the text file can be deleted.
The search should also be case-insensitive.
(?!thban\w+).*\r?\n?
THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
This should result in
THBANES900 C3950
THBAN
THBANES901 C3850
THBANMP900
thbanes900
Maybe just capture those words of interest instead of replacing everything else? In Notepad++ search for pattern:
^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+
^ - Start string anchor.
.*\b - Any character other than newline zero or more times, up to a word boundary.
( - Open 1st capture group.
thban\S{0,5} - Match "thban" and zero to 5 non-whitespace chars.
) - Close 1st capture group.
(?: - Open non-capturing group.
.* - Any character other than newline zero or more times.
( - Open 2nd capture group.
\sC3\w+ - A whitespace character, then "C3" and one or more word characters.
) - Close 2nd capture group.
)? - Close non-capturing group and make it optional.
.* - Any character other than newline zero or more times.
$ - End string anchor.
| - Alternation (OR).
.+ - Any character other than newline once or more.
Replace with:
$1$2
After this, you may end up with empty lines, which you can swiftly remove using the built-in option (in the English UI: Edit > Line Operations > Remove Empty Lines).
The search has to be case-insensitive, so make sure the "Match case" checkbox is not ticked.
You may use
Find What: (?|\b(thban\S{0,5})|\s(C3\w+))|(?s:.)
Replace With: (?1$1\n:)
Details
(?| - start of a branch reset group:
\b(thban\S{0,5}) - Group 1: a word boundary, then thban and any 0 to 5 non-whitespace chars
| - or
\s(C3\w+) - a whitespace char, and then Group 1: C3 and one or more word chars
) - end of the branch reset group
| - or
(?s:.) - any one char (including line break chars)
The replacement is
(?1 - if Group 1 matched,
$1\n - Group 1 value with a newline
: - else, replace with empty string
) - end of the conditional replacement pattern
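The same extraction can be sketched in Python, which may help when checking the patterns against sample data. This is only an illustration under stated assumptions: the name keep_keywords is hypothetical, the kept words of each line are joined with a single space (as in the desired result of the question), and because Python's re module has no branch reset group, the two alternatives get separate group numbers.

```python
import re

# The two alternatives from the patterns above. re.IGNORECASE replaces the
# unticked "Match case" checkbox in Notepad++.
WORDS = re.compile(r"\b(thban\S{0,5})|\s(C3\w+)", re.IGNORECASE)

def keep_keywords(text: str) -> str:
    """Keep only the thban... and C3... words; drop lines without any."""
    kept_lines = []
    for line in text.splitlines():
        # Exactly one of the two groups is set for each match.
        words = [m.group(1) or m.group(2) for m in WORDS.finditer(line)]
        if words:
            kept_lines.append(" ".join(words))
    return "\n".join(kept_lines)
```

Note that, like the patterns it is based on, \s(C3\w+) only finds a C3 word that is preceded by whitespace, so a C3 word at the very start of a line would be missed.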

Modify raw input to look like a bus timetable in specific format using regex?

I'm trying to figure this out for quite some time already, but can't seem to find the solution that would work at once or in the way I prefer it.
I have an input that looks like this:
0430
0500 25 50
0615 34 51
0708 26 43
And I need to turn it into this:
04:30
05:00,05:25,05:50
06:15,06:34,06:51
07:08,07:26,07:43
Since this is only part of the input and manually replacing everything isn't an option, my guess is that the best option is to go with regex.
What needs to be done:
Insert a colon after the first two digits (something like (^\d{2}) and then doing replace/substitution with $1:)
Replace each space with comma + first two digits + colon.
My idea was to capture group (^\d{2}:) and then replace all spaces with ,$1 (per each line), but I can't seem to find the way to do it.
I use regex101.com for doing it, so if you have any advice on how to do it, or where to do it (or even if regex isn't the way to do it, what other way would you recommend) any help would be appreciated.
Thanks in advance!
Here is a way to do the job with Notepad++:
Ctrl+H
Find what: ^(\d\d)(\d\d)(?:\h+(\d\d)\h+(\d\d))?
Replace with: $1:$2(?3,$1\:$3,$1\:$4:)
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
^ # beginning of line
(\d\d) # group 1, 2 digits
(\d\d) # group 2, 2 digits
(?: # non capture group
\h+ # 1 or more horizontal spaces
(\d\d) # group 3, 2 digits
\h+ # 1 or more horizontal spaces
(\d\d) # group 4, 2 digits
)? # end group, optional
Replacement:
$1 # content of group 1
: # a colon
$2 # content of group 2
(?3 # if group 3 exists
,$1\:$3 # a comma, then content of group 1, a colon (escaped inside the conditional) and content of group 3
,$1\:$4 # a comma, then content of group 1, a colon and content of group 4
: # else nothing
) # end conditional
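If a scripting language is an option, the idea from the question (capture the hour once, then prefix every minute value with it) can be sketched in Python. This is an illustrative sketch only: the name format_timetable_line is made up, and it assumes every line is a two-digit hour immediately followed by one or more two-digit minute values separated by spaces or tabs.

```python
import re

# Two digits for the hour, then one or more two-digit minute values.
LINE = re.compile(r"(\d{2})(\d{2}(?:[ \t]+\d{2})*)")

def format_timetable_line(line: str) -> str:
    """Turn '0500 25 50' into '05:00,05:25,05:50'."""
    match = LINE.fullmatch(line.strip())
    if match is None:
        # Leave lines that do not look like timetable rows untouched.
        return line
    hour = match.group(1)
    minutes = match.group(2).split()
    return ",".join(f"{hour}:{minute}" for minute in minutes)
```

Unlike a replacement with a fixed number of capture groups, this handles any number of minute values per line.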

Split comma separated list on separate line (notepad++ / regex)

I have a few files where each file has some text which has a description and list of tags.
I would like to manipulate the tags in the text with notepad++ and regular expressions in each file.
I could easily replace the commas with \r\n, but that would also affect the description part, which contains commas too and which I want to keep intact. I only need to manipulate the tag part.
Plus, there is not always the same amount of tags (sometimes there are 4, sometimes more, it varies).
Original input text:
Description: blah, blah, blah, slsls,
tag:
- hello, bye, Thanks, etc, Notepad
Desired output text:
Description: blah, blah, blah, slsls,
tag:
- hello
- bye
- thanks
- etc
- notepad
Any idea how I could achieve this? thanks much
Ctrl+H
Find what: (^tag:\s+|\G)[,-]\h*(\w+)
Replace with: $1\t- $2\n
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # start group 1
^ # beginning of line
tag: # literally
\s+ # 1 or more spaces
| # OR
\G # restart from last match position
) # end group
[,-] # comma or hyphen
\h* # 0 or more horizontal spaces
(\w+) # group 2, 1 or more word character (you can use [^\s,]+ instead)
Replacement:
$1 # content of group 1
\t # a tabulation
- # a hyphen followed by a space
$2 # content of group 2
\n # linefeed
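Since the same change has to be made in several files, a script may be more convenient than repeating the replacement by hand. The following Python sketch is an assumption-laden illustration, not the answer's method: the name split_tags is hypothetical, it expects \n line endings and the tag list on the single line right after "tag:", and it keeps the original letter case of each tag.

```python
import re

# The "tag:" line plus the comma separated list on the following line.
TAG_BLOCK = re.compile(r"^(tag:[ \t]*\n)-[ \t]*(.*)$", re.MULTILINE)

def _expand(match: re.Match) -> str:
    # Split only the tag list, so commas in the description stay untouched.
    tags = [tag.strip() for tag in match.group(2).split(",") if tag.strip()]
    return match.group(1) + "\n".join(f"- {tag}" for tag in tags)

def split_tags(text: str) -> str:
    """Put every tag of the list after 'tag:' on its own '- ' line."""
    return TAG_BLOCK.sub(_expand, text)
```

Appending .lower() to tag.strip() would give the lower-case tags shown in the desired output.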

Regex for text file

I have a text file with the following text:
andal-4.1.0.jar
besc_2.1.0-beta
prov-3.0.jar
add4lib-1.0.jar
com_lab_2.0.jar
astrix
lis-2_0_1.jar
Is there any way I can split the name and the version using regex? I want to use the results to make two columns, 'Name' and 'Version', in Excel.
So I want the results from regex to look like
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
So far I have used ^(?:.*-(?=\d)|\D+) to get the Version and -\d.*$ to get the Name separately. The problem with this is that when I do it for a large text file, the results from the two regexes are not in the same order. So is there any way to get the results in the way I have mentioned above?
Ctrl+H
Find what: ^(.+?)[-_](\d.*)$
Replace with: $1\t$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(.+?) # group 1, 1 or more any character but newline, not greedy
[-_] # a dash or underscore
(\d.*) # group 2, a digit then 0 or more any character but newline
$ # end of line
Replacement:
$1 # content of group 1
\t # a tabulation, you may replace with what you want
$2 # content of group 2
Result for given example:
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
I'm not quite sure what you meant by the problem with large files, and I believe the two regexes you showed do the opposite of what you said: the first one should get you the name and the second one should give you the version.
Anyway, here are the assumptions I had to make to guess what may make sense to you:
"Name" may be followed by - or _, followed by the version string.
"Version" string is something preceded by - or _, starting with some digits, followed by a dot or underscore, followed by some digits, and then any string.
If these assumptions make sense, you may use
^(.+?)(?:[-_](\d+[._]\d+.*))?$
as your regex. Group 1 will be the name, Group 2 will be the version.
Demo in regex101: https://regex101.com/r/RnwMaw/3
Explanation of regex
^ start of line
(.+?) "Name" part, using reluctant match of at least 1 character
(?: )? Optional group of "Version String", which consists of:
[-_] - or _
( ) Followed by the "Version", which is
\d+ at least 1 digit,
[._] then 1 dot or underscore,
\d+ then at least 1 digit,
.* then any string
$ end of line
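To get name and version side by side in one pass (which avoids the ordering problem of running two separate regexes), the first pattern above can be applied per line. The following is a hedged Python sketch; split_name_version is a made-up name, and returning an empty version for entries without one is an assumption chosen here, not something the question specifies.

```python
import re

# Lazy name, then a dash or underscore that is directly followed by a digit.
NAME_VERSION = re.compile(r"^(.+?)[-_](\d.*)$")

def split_name_version(entry: str) -> tuple[str, str]:
    """Return (name, version); version is '' when none is found."""
    match = NAME_VERSION.match(entry)
    if match is None:
        return entry, ""
    return match.group(1), match.group(2)

def to_tsv(lines: list[str]) -> str:
    # One "name<TAB>version" row per entry, ready to paste into a spreadsheet.
    return "\n".join("\t".join(split_name_version(line)) for line in lines)
```

Because both columns come from the same match, each name always stays on the same row as its version.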

Regular Expression to parse whitespace-delimited data

I have written code to pull some data into a data table and do some data re-formatting. I need some help splitting some text into appropriate columns.
CASE 1
I have data formatted like this that I need to split into 2 columns.
ABCDEFGS 0298 MSD
SDFKLJSDDSFWW 0298 RFD
I need the text before the numbers in column 1 and the numbers and text after the spaces in column 2. The number of spaces between the text and the numbers will vary.
CASE 2
I have data like this that I need to split into 3 columns.
00006011731 TAB FC 10MG 30UOU
00006011754 TAB FC 10MG 90UOU
00006027531 TAB CHEW 5MG 30UOU
00006071131 TAB CHEW 4MG 30UOU
00006027554 TAB CHEW 5MG 90UO
00006384130 GRAN PKT 4MG 30UOU
Column 1 is the first 11 characters. That is easy.
Column 2 should contain all the text after the first 11 characters up to but not including the first number.
The last column is all the text after column 2.
I would do it with these expressions:
(?-s)(\S+) +(.+)
and
(?-s)(.{11})(\D+)(.+)
And broken down in regex comment mode, those are:
(?x-s) # Flags: x enables comment mode, -s disables dotall mode.
( # start first capturing group
\S+ # any non-space character, greedily matched at least once.
) # end first capturing group
[ ]+ # a space character, greedily matched at least once. (brackets required in comment mode)
( # start second capturing group
.+ # any character (excluding newlines), greedily matched at least once.
) # end second capturing group
and
(?x-s) # Flags: x enables comment mode, -s disables dotall mode.
( # start first capturing group
.{11} # any character (excluding newlines), exactly 11 times.
) # end first capturing group
( # start second capturing group
\D+ # any non-digit character, greedily matched at least once.
) # end second capturing group
( # start third capturing group
.+ # any character (excluding newlines), greedily matched at least once.
) # end third capturing group
(The 'dotall' mode (flag s) means that . matches all characters, including newlines, so we have to disable it to prevent too much matching in the last group.)
Supposing you know how to handle the VB.NET code to get the groupings (matches) and that you are willing to strip the extra spaces from the groupings yourself:
The Regex for case 1 is
(.*?\s+)(\d+.*)
.*? => grabs everything non greedily, so it will stop at the first space
\s+ => one or more whitespace characters
These two form the first group.
\d+ => one or more digits
.* => rest of the line
These two form the second group.
The Regex for case 2 is
(.{11})(.*?)(\d.*)
.{11} => matches 11 characters (you could restrict it to be just letters and numbers with [a-zA-Z] or \d instead of .)
That's the first group.
.*? => Match everything non greedily, stop before the first digit found (because that's the next regex)
That's the second group.
\d.* => a digit (used to stop the previous .*?) and the rest of the line
That's the third group.
I would use Peter Boughton's regexes, but ensure you have . matches newline turned off. If that is on, ensure you add a $ on the end :)
The greedy regexes will perform better.
The simplest way for the kind of data you are presenting is to split the line into fields at the spaces, then reunite what you want to have together. Regex.Split(line, "\\s+") should return an array of strings (that is C# string syntax; in VB.NET, where backslashes are not escaped in string literals, the pattern is written "\s+"). This is also more robust against changing strings in the fields, for example if in the second case a line reads "00006011731 TAB 3FC 10MG 30UOU".
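To try the two greedy expressions on the sample rows without setting up a .NET project, they can be checked with a short Python sketch. The helper name split_columns is hypothetical, and stripping the surrounding spaces from every group is a choice made here for illustration. Python's dot does not match newlines by default, so the (?-s) flag is left out.

```python
import re

CASE1 = re.compile(r"(\S+) +(.+)")        # text, spaces, rest of the line
CASE2 = re.compile(r"(.{11})(\D+)(.+)")   # 11 chars, non-digits, rest

def split_columns(pattern: re.Pattern, line: str) -> list[str]:
    """Split one line into columns using the capture groups of pattern."""
    match = pattern.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match: {line!r}")
    # Remove the padding spaces that the groups may carry at either end.
    return [group.strip() for group in match.groups()]
```

For example, split_columns(CASE2, "00006011731 TAB FC 10MG 30UOU") yields the three columns "00006011731", "TAB FC" and "10MG 30UOU".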