How can I find words in Notepad++? - regex

I have a lot of queries like this:
select categorych0_.category_id as category3_2_0_, categorych0_.id as
id1_2_0_, categorych0_.id as id1_2_1_, categorych0_.category_id as
category3_2_1_, categorych0_.check_id as check_id4_2_1_,
categorych0_.tenantid as tenantid2_2_1_, check1_.id as id1_5_2_,
check1_.check_group as check_gr2_5_2_,
check1_.check_group_description_label as check_gr3_5_2_,
check1_.check_group_label as check_gr4_5_2_, check1_.check_name_label
as check_na5_5_2_, check1_.check_number as check_nu6_5_2_,
check1_.check_scope as check_sc7_5_2_, check1_.display_order as
display_8_5_2_, check1_.tenantid as tenantid9_5_2_ from
category_checks categorych0_ left outer join checks check1_ on
categorych0_.check_id=check1_.id where categorych0_.category_id=?
I need to remove the 'as' phrases; that is, all alias phrases need to be removed.

Try this regex:
as[^,]*?(?=,|from)
Replace each match with a blank string
Explanation:
as - matches as literally
[^,]*? - matches 0+ occurrences of any character that is not a comma, as few as possible
(?=,|from) - positive lookahead to validate that the above match must be followed by a , or the text from
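As a rough illustration outside Notepad++, the same pattern can be tried with Python's re module. This is only a sketch: the query is shortened from the one in the question, and the variable names are made up for the example.

```python
import re

# Shortened version of the query from the question (trimmed for brevity).
query = (
    "select categorych0_.category_id as category3_2_0_, categorych0_.id as\n"
    "id1_2_0_, check1_.tenantid as tenantid9_5_2_ from\n"
    "category_checks categorych0_"
)

# "as", then as few non-comma characters as possible,
# stopping right before a "," or the text "from".
alias = re.compile(r"as[^,]*?(?=,|from)")

# Replace every match with an empty string.
cleaned = alias.sub("", query)
print(cleaned)
```

Note that the pattern matches the letters "as" anywhere, so an identifier that happens to contain "as" would also be affected; anchoring it as \bas\b is one possible tightening, offered here as an untested suggestion rather than part of the answer.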

Remove duplicate lines containing same starting text

So I have a massive list of numbers where all lines contain the same format.
#976B4B|B|0|0
#970000|B|0|1
#974B00|B|0|2
#979700|B|0|3
#4B9700|B|0|4
#009700|B|0|5
#00974B|B|0|6
#009797|B|0|7
#004B97|B|0|8
#000097|B|0|9
#4B0097|B|0|10
#970097|B|0|11
#97004B|B|0|12
#970000|B|0|13
#974B00|B|0|14
#979700|B|0|15
#4B9700|B|0|16
#009700|B|0|17
#00974B|B|0|18
#009797|B|0|19
#004B97|B|0|20
#000097|B|0|21
#4B0097|B|0|22
#970097|B|0|23
#97004B|B|0|24
#2C2C2C|B|0|25
#979797|B|0|26
#676767|B|0|27
#97694A|B|0|28
#020202|B|0|29
#6894B4|B|0|30
#976B4B|B|0|31
#808080|B|1|0
#800000|B|1|1
#803F00|B|1|2
#808000|B|1|3
What I am trying to do is remove all duplicate lines that contain the same hex code, regardless of the text after it.
For example, the hex #976B4B in the first line, #976B4B|B|0|0, shows up again in line 32 as #976B4B|B|0|31. I want all lines EXCEPT the first occurrence to be removed.
I have been attempting to use regex to solve this, and found that replacing ^(.*)(\r?\n\1)+$ with $1 can remove duplicate lines, but that is obviously not what I need. I am looking for some guidance and maybe a chance to learn from this.
You can use the following regex replacement. Make sure you click Replace All as many times as necessary, until no match is found:
Find What: ^((#[[:xdigit:]]+)\|.*(?:\R.+)*?)\R\2\|.*
Replace With: $1
Details:
^ - start of a line
((#[[:xdigit:]]+)\|.*(?:\R.+)*?) - Group 1 ($1, it will be kept):
(#[[:xdigit:]]+) - Group 2: # and one or more hex chars
\| - a | char
.* - the rest of the line
(?:\R.+)*? - any zero or more non-empty lines (if they can be empty, replace .+ with .*)
\R\2\|.* - a line break, Group 2 value, | and the rest of the line.
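The "Replace All until no match is found" loop can be sketched in Python as well. Python's re module does not support \R or [[:xdigit:]], so this adaptation assumes \n line endings and spells the hex class out as [0-9A-Fa-f]; the function name is invented for the example.

```python
import re

# Adapted pattern: \R becomes \n and [[:xdigit:]] becomes [0-9A-Fa-f].
# Group 1 is everything to keep, group 2 is the hex code of the first line.
DUPLICATE = re.compile(
    r"^((#[0-9A-Fa-f]+)\|.*(?:\n.+)*?)\n\2\|.*",
    re.MULTILINE,
)

def drop_repeated_hex(text):
    """Repeat the replacement until a pass removes nothing."""
    while True:
        text, count = DUPLICATE.subn(r"\1", text)
        if count == 0:
            return text

sample = "\n".join([
    "#976B4B|B|0|0",
    "#970000|B|0|1",
    "#974B00|B|0|2",
    "#970000|B|0|13",
    "#976B4B|B|0|31",
    "#808080|B|1|0",
])
deduplicated = drop_repeated_hex(sample)
print(deduplicated)
```

Each pass removes at most one later duplicate per matched region, which is why the loop is needed, just like the repeated Replace All clicks.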

Negative lookbehind in RegEx: Matching multiple POS-tags at once

I am still fairly new to regex, so I would appreciate any help.
I am trying to use regular expressions to find specific grammatical patterns in a text corpus that was part-of-speech-tagged using the CLAWS7 tagset.
Here is a sample:
Ya_UH and_CC then_RT uhm_NN1 we_PPIS2 wrote_VVD in_RP but_CCB already_RR taken_VVN up_RP that_DD1 day_NNT1 that_CST we_PPIS2 wanted_VVD actually_RR they_PPHS2 said_VVD still_RR available_JJ you_PPY know_VV0 so_RR by_II that_DD1 time_NNT1 we_PPIS2 we_PPIS2 write_VV0 in_II our_APPGE letter_NN1 two_MC weeks_NNT2 later_RRR already_RR taken_VVN up_RP Quite_RG good_RR uh_UH P ICE-SIN:S1A-001#74:1:B Ask_VV0 her_PPHO1 I_PPIS1 left_VVD my_APPGE house_NN1 at_II one_MC1 met_VVD
PRO_NN1 in_II school_NN1 at_II two_MC Ya_PPY so_RR waited_VVD you_PPY know_VV0 they_PPHS2 say_VV0 half_DB hour_NNT1 later_RRR And_CC and_CC it_PPH1 was_VBDZ
still_RR drizzling_JJ and_CC raining_VVG
The pattern I am looking for is every instance of \w*\_V.*? (= every verb) that is not preceded by a pronoun. Pronouns can have these tags:
_PN _PN1 _PNQO _PNQS _PNQV _PNX1 _PPGE _PPH1 _PPHO1 _PPHO2 _PPHS2 _PPIO1 _PPIO2 _PPIS1 _PPIS2 _PPX1 _PPX2 _PPY
In the sample, the desired regex should ideally match:
taken_VVN
met_VVD
Ask_VV0
waited_VVD
raining_VVG
Using the negative lookbehind, I managed to create the following expression, which only matches verbs that are not preceded by a _PPIS2 tag:
(?<!\_PPIS2)\s\w*\_V.*?
What could I do to extend it to all the other pronoun tags? I've tried the expressions below, but they either do not match anything at all or match the wrong instances.
(?<!\_P.*)\s\w*\_V.*? (no match)
(?<![\_P.*])\s\w*\_V.*? (wrong results)
Any ideas or explanations would be greatly appreciated.
You may use this PCRE regex in Sublime Text:
\b\w*_P\w*\h+\w*_V\w*(*SKIP)(*F)|\b\w*_V\w*
RegEx Details:
\b\w*_P\w*: Match a word with _P in it
\h+: Match 1+ horizontal whitespace characters
\w*_V\w*: Match a word with _V anywhere
(*SKIP)(*F): skip and fail the matched substrings
|: OR
\b\w*_V\w*: Match a word with _V anywhere (these are our matches)
There may be a smarter pattern, but with Sublime Text 3 you could use (*SKIP)(*F) to first match what you don't want, discard those matches, and then match what you do want:
_P(?:N(?:X?1|Q[OSV]|)|P(?:GE|H1|(?:[HI]O|IS|X)[12]|HS2|Y))\s\w+_V[A-Z0-9]*\b(*SKIP)(*F)|\w+_V[A-Z0-9]*\b
Since all your words end in an underscore followed by the appropriate grammatical tag, I think it should fit your needs.
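Python's built-in re module has no (*SKIP)(*F), but the same "match what you don't want first, then keep the rest" idea can be approximated with an alternation where only the wanted branch has a capture group. The sketch below is an emulation under that assumption, not the PCRE pattern itself; the sample is a shortened excerpt of the corpus from the question.

```python
import re

# Shortened excerpt of the tagged sample from the question.
sample = (
    "we_PPIS2 wrote_VVD in_RP but_CCB already_RR taken_VVN up_RP "
    "Ask_VV0 her_PPHO1 I_PPIS1 left_VVD my_APPGE house_NN1 "
    "at_II one_MC1 met_VVD"
)

# Left branch: a _P... word followed by a verb (consumed, nothing captured).
# Right branch: any remaining verb, captured in group 1.
token = re.compile(r"\b\w*_P\w*\s+\w*_V\w*|\b(\w*_V\w*)")

# Keep only the matches that came from the capturing branch.
verbs = [m.group(1) for m in token.finditer(sample) if m.group(1)]
print(verbs)
```

Because the unwanted pronoun-plus-verb pairs are consumed by the first branch, the second branch never sees those verbs.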
You can use
\b(?:[^\W_]+_[^\W_]+ )?(?<!_PN |_PN1 |_PNQ[OVS] |_PNX1 |_PPGE |_PPH1 |_PPHO[12] |_PPHS2 |_PPIO[12] |_PPIS[12] |_PPX[12] |_PPY )[^\W_]*_V\w*
Details
\b - a word boundary
(?:[^\W_]+_[^\W_]+ )? - an optional sequence of
[^\W_]+ - one or more letters/digits
_ - an underscore
[^\W_]+ - one or more letters/digits and a space
(?<!_PN |_PN1 |_PNQ[OVS] |_PNX1 |_PPGE |_PPH1 |_PPHO[12] |_PPHS2 |_PPIO[12] |_PPIS[12] |_PPX[12] |_PPY ) - a negative lookbehind that fails the match if any of the patterns above appear immediately to the left of the current location
[^\W_]* - zero or more digits/letters
_V - a _V string
\w* - any zero or more word chars.
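For anyone trying the lookbehind approach in Python: the re module only accepts fixed-width lookbehinds, so the single lookbehind with alternatives of different lengths is rejected there. One possible workaround, sketched below, is one lookbehind per tag. This variant is a simplification that drops the optional leading token, and the names are invented for the example.

```python
import re

# Pronoun tags from the question; character classes keep each entry fixed-width.
PRONOUN_TAGS = [
    "_PN", "_PN1", "_PNQ[OVS]", "_PNX1", "_PPGE", "_PPH1", "_PPHO[12]",
    "_PPHS2", "_PPIO[12]", "_PPIS[12]", "_PPX[12]", "_PPY",
]

# One negative lookbehind per tag (each followed by a space).
lookbehinds = "".join(f"(?<!{tag} )" for tag in PRONOUN_TAGS)

# Simplified variant: word boundary, lookbehinds, then the verb token.
verb = re.compile(r"\b" + lookbehinds + r"[^\W_]*_V\w*")

# Shortened excerpt of the tagged sample from the question.
sample = (
    "we_PPIS2 wrote_VVD in_RP but_CCB already_RR taken_VVN up_RP "
    "Ask_VV0 her_PPHO1 I_PPIS1 left_VVD my_APPGE house_NN1 "
    "at_II one_MC1 met_VVD"
)
found = verb.findall(sample)
print(found)
```

This assumes tokens are separated by exactly one space, as in the lookbehind alternatives above.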

openrefine extracting values between symbols

I am trying to extract a string of text from a whole field with OpenRefine.
This is an extract of my dataset:
172. D3B: 23Y1-Up, 27Y1-Up (36 LK) 6-S/F Rollers, 4-D/F Rollers, 2-Carrier Rollers
179. D3C: 23Y2508-UP (37LK) 6-S/F, 4-D/F, 2-T/C
180. 27Y5050-UP (37LK) 6-S/F, 4-D/F, 2-T/C
181. 2XF622-UP (37LK) 6-S/F, 4-D/F, 2-T/C
182. 3RF0147-UP (36LK) 6-S/F, 4-D/F, 2-T/C
200. D4D:67A1-UP, 78A1-UP, 85A1-UP, 86A1-UP, 59J1-644, 58J1-UP, 49J1-473, 22C1-UP, 91A1-UP, 88A1-UP
I want to extract 23Y1-Up, 27Y1-Up from record 172,
23Y2508-UP from record 179, 27Y5050-UP from record 180 and the whole 67A1-UP, 78A1-UP, 85A1-UP, 86A1-UP, 59J1-644, 58J1-UP, 49J1-473, 22C1-UP, 91A1-UP, 88A1-UP from record 200
So basically the rule would be to extract everything between : (if present) and ( (if present), maybe restricting it to cases where there are one or more occurrences of the string UP.
So I am adding a new column based on existing column using value.match.
I tried to adapt some queries to my needs, but I am very far from succeeding despite multiple attempts.
I started with the expression value.match(/\:?\s*(\w+\.?)+?.*/)[0], which I thought would isolate any word AFTER the colon (and the space), but it only works with words BEFORE it...
Yesterday I successfully extracted the numbers before the LK that is also relevant information for my dataset, but I can't grasp this.
Any help is much appreciated!
Thanks
With match, the pattern has to match the whole string.
You can use a single capture group with a negated character class to exclude matching ( :
^[^:]*:\s*([^(]+).*$
^[^:]*:\s* Match until the first : followed by optional whitespace chars
( Capture group 1
[^(]+ Match 1+ occurrence of any char except (
) Close group 1
.*$ Match the rest of the line
Or capture in a group matching only word characters separated by a hyphen
^[^:]*:\s*(\w+-\w+(?:,\s+\w+-\w+)*).*$
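To experiment with the two patterns outside OpenRefine, a small Python sketch could look like the following. It uses re.fullmatch to mimic the whole-string requirement; the function name and the strip() call are additions for the example, not part of GREL.

```python
import re

# The two patterns from the answer above.
LOOSE = re.compile(r"^[^:]*:\s*([^(]+).*$")
STRICT = re.compile(r"^[^:]*:\s*(\w+-\w+(?:,\s+\w+-\w+)*).*$")

def extract(record, pattern=STRICT):
    """Return capture group 1, or None when the record has no match."""
    m = pattern.fullmatch(record)
    return m.group(1).strip() if m else None

rec_172 = "172. D3B: 23Y1-Up, 27Y1-Up (36 LK) 6-S/F Rollers, 4-D/F Rollers, 2-Carrier Rollers"
rec_179 = "179. D3C: 23Y2508-UP (37LK) 6-S/F, 4-D/F, 2-T/C"
rec_180 = "180. 27Y5050-UP (37LK) 6-S/F, 4-D/F, 2-T/C"

print(extract(rec_172))         # serial ranges after the colon
print(extract(rec_172, LOOSE))  # same text via the negated-class pattern
print(extract(rec_180))         # no colon in this record, so no match
```

Records without a colon, such as 180, are not matched by either pattern, so they would need separate handling.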

Regex for SQL Query

Hello everyone, I have the following problem:
I have a long list of SQL queries which I would like to adapt to one of my changes. In the end it is a renaming problem, and I'm afraid I am making it more complicated than necessary.
The query looks like this:
INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);
In the "Insert" part there was another column, which I removed simply by searching for the term "example, " in Notepad++ and replacing it with nothing. However, I can't remove the corresponding entry in Values with this method, because the text varies there. So far I have only worked with the text file in which I adjusted the list of queries.
So as you can see there is one more entry in Values than in the insertions (there was another column here, but it was removed by my change).
It is the entry after the email address. I would like to remove this including the comma (N'This is the text i want to delete',).
My idea was to form a group and say that the 14th value (counting by commas) should be removed. However, even after some research I do not know how to do this.
I thought it could look like this (tried in https://regex101.com/)
VALUES\s?\((,) something here
Is this even the right approach or is there another method? I only knew Regex to solve this problem, because of course the values look different here.
And how can I finally use the regex to get the queries adapted (because the queries are local to my computer and not yet included in the code).
Short summary:
Change the query from
VALUES (... test5, test6, test7 ...)
To
VALUES (... test5, test7 ...)
As per my comment, you could use find/replace, where you search for:
(\bVALUES +\((?:[^,]+,){13})[^,]+,
And replace with $1
( - Open 1st capture group.
\bVALUES +\( - Match a word-boundary, literally 'VALUES', followed by at least a single space and a literal open parenthesis.
(?: - Open non-capturing group.
[^,]+, - Match anything but a comma at least once followed by a comma.
){13} - Close non-capture group and repeat it 13 times.
) - Close 1st capture group.
[^,]+, - Match anything but a comma at least once followed by a comma.
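If the queries should be adapted by a script rather than in an editor, the same find/replace can be sketched with Python's re.sub. The query is the one from the question; the variable names are made up for the example.

```python
import re

# The query from the question, split over several string literals.
query = (
    "INSERT member (member, prename, name, street, postalcode, town, tel1, "
    "tel2, fax, bem, anrede, salutation, email, name2, name3, association, "
    "project) VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', "
    "N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, "
    "N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', "
    "N'Name2', N'Name3', NULL, NULL);"
)

# Group 1 keeps "VALUES (" plus the first 13 comma-terminated values;
# the 14th value and its trailing comma are matched but not kept.
fourteenth = re.compile(r"(\bVALUES +\((?:[^,]+,){13})[^,]+,")

fixed = fourteenth.sub(r"\1", query)
print(fixed)
```

This counts commas only, so it assumes none of the first 14 values contains a comma inside its quotes.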
You may use the following to remove / replace the value you need:
Find What: \bVALUES\s*\((\s*(?:N'[^']*'|\w+))(?:,(?1)){12}\K,(?1)
Replace With: (empty string, or whatever value you need)
Details
\bVALUES - whole word VALUES
\s* - 0+ whitespaces
\( - a (
(\s*(?:N'[^']*'|\w+)) - Group 1: 0+ whitespaces and then either N' followed with any 0 or more chars other than ' and then a ', or 1+ word chars
(?:,(?1)){12} - twelve repetitions of , followed with the Group 1 pattern
\K - match reset operator that discards the text matched so far from the match memory buffer
, - a comma
(?1) - Group 1 pattern.
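Python's re module supports neither \K nor the (?1) subroutine call, but the quote-aware idea can be approximated by building the pattern from a reusable string and keeping the prefix in a capture group. This is a sketch of that adaptation; the comma inside the street value is hypothetical data added to show why quote-awareness matters.

```python
import re

# One value: optional whitespace, then a quoted N'...' string or a bare word.
VALUE = r"\s*(?:N'[^']*'|\w+)"

# Group 1 keeps "VALUES (" and the first 13 values; the 14th is dropped.
fourteenth = re.compile(
    r"(\bVALUES\s*\(" + VALUE + r"(?:," + VALUE + r"){12})," + VALUE
)

def drop_14th_value(sql):
    return fourteenth.sub(r"\1", sql)

# Same shape as the query in the question, but with a hypothetical comma
# inside the quoted street value.
query = (
    "INSERT member (member, prename, name, street, postalcode, town, tel1, "
    "tel2, fax, bem, anrede, salutation, email, name2, name3, association, "
    "project) VALUES (2005, N'John', N'Doe', N'Street 4711, Apt 2', N'1234', "
    "N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, "
    "N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', "
    "N'Name2', N'Name3', NULL, NULL);"
)
fixed = drop_14th_value(query)
print(fixed)
```

Like the pattern it adapts, this does not handle escaped quotes ('') inside a value.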

How can I search and replace guids in Sublime 3

I have a textfile where I would like to replace all GUIDs with space.
I want:
92094, "970d6c9e-c199-40e3-80ea-14daf1141904"
91995, "970d6c9e-c199-40e3-80ea-14daf1141904"
87445, "f17e66ef-b1df-4270-8285-b3c15da366f7"
87298, "f17e66ef-b1df-4270-8285-b3c15da366f7"
96713, "3c28e493-015b-4b48-957f-fe3e7acc8412"
96759, "3c28e493-015b-4b48-957f-fe3e7acc8412"
94665, "87ac12a3-62ed-4e1d-a1a6-51ae05e01b1a"
94405, "87ac12a3-62ed-4e1d-a1a6-51ae05e01b1a"
To become:
92094,
91995,
87445,
87298,
96713,
96759,
94665,
94405,
How can I accomplish this in Sublime 3?
Ctrl+H
Find: "[\da-f-]{36}"
Replace: LEAVE EMPTY
Enable regex mode
Replace all
Explanation:
" : double quote
[ : start of character class
\d : any digit
a-f : or letter from a to f
- : or a dash
]{36} : end class, 36 characters must be present
" : double quote
Result for given example:
92094,
91995,
87445,
87298,
96713,
96759,
94665,
94405,
Try doing a search for this pattern in regex search mode:
"[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}"
And then just replace with empty string. This should strip off the GUID, leaving you with the output you want.
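Both GUID patterns above can also be checked quickly in Python. This is just a sketch with two lines taken from the question; the variable names are made up.

```python
import re

text = (
    '92094, "970d6c9e-c199-40e3-80ea-14daf1141904"\n'
    '87445, "f17e66ef-b1df-4270-8285-b3c15da366f7"'
)

# Loose form: 36 characters that are digits, a-f, or dashes, inside quotes.
loose = re.compile(r'"[\da-f-]{36}"')

# Stricter form: the 8-4-4-4-12 layout spelled out.
strict = re.compile(
    r'"[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}"'
)

without_guids = loose.sub("", text)
without_guids_strict = strict.sub("", text)
print(without_guids)
```

Only the quoted GUID is removed, so the space after each comma stays in place; a separate trim would be needed if trailing spaces matter.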
Another regex solution involving a slightly different search-replace strategy, where we don't care about the GUID format and simply get the first column:
Search for ([^,]*,).* (again don't forget to activate the regex mode .*).
Replace with $1.
Details about the regular expression
The idea here is to capture all first columns. A column here is defined by a sequence of
"some non-comma character": [^,]*
followed by a comma: [^,]*,
The first column can then be followed by anything .* (the GUID format doesn't matter): [^,]*,.*
Finally we need to capture the 1st column using group capturing: ([^,]*,).*
In the replace field we use a backreference $x, which refers to the x-th capturing group.
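The first-column strategy translates to Python almost unchanged; the main difference is that the backreference is written \1 instead of $1. A small sketch with two lines from the question:

```python
import re

text = (
    '92094, "970d6c9e-c199-40e3-80ea-14daf1141904"\n'
    '91995, "970d6c9e-c199-40e3-80ea-14daf1141904"'
)

# Capture everything up to and including the first comma,
# then match (and discard) the rest of the line.
first_column = re.compile(r"([^,]*,).*")

only_first = first_column.sub(r"\1", text)
print(only_first)
```

Since . does not match a newline by default, the rest-of-line part stops at each line end and every line is handled separately.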