I have a large list of URLs, each containing a unique numeric string that falls between a / and a ?. I would like to remove all other text in Notepad++ so that only these strings remain. For example:
www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3 would end up as only 123456789012
I have figured out that the regex \b\d{12}\b will match the 12 digits; now I just need to remove everything on either side of them. I have found some posts that suggest replacing with \t$1, $1\n, $1, and /1; however, all of these do the exact opposite of what I want and just remove the 12-digit string.
You can use this regex and replace it with an empty string:
^[^ ]*\/|\?[^ ]*$
Explanation:
^[^ ]*\/ --> Matches anything except a space, from the start of the line up to the last /
\?[^ ]*$ --> Similarly, this matches anything except a space, from the ? to the end of the line.
Ctrl+H
Find what: ^.*/([^?\r\n]+).*$
Replace with: $1
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.* # 0 or more any character but newline
/ # a slash
([^?\r\n]+) # group 1, 1 or more any character that is not ? or line break
.* # 0 or more any character but newline
$ # end of line
Result for given example:
123456789012
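For readers who want to check the pattern outside Notepad++, here is a minimal sketch using Python's re module. This is an assumption for illustration only: Notepad++ uses a different regex engine, and the function name extract_ids is hypothetical. It applies the pattern from the explanation above to the sample URL.

```python
import re

# Pattern from the explanation above; MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"^.*/([^?\r\n]+).*$", re.MULTILINE)

def extract_ids(text):
    """Replace each line with the segment between its last '/' and the '?'."""
    return PATTERN.sub(r"\1", text)

sample = "www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3"
print(extract_ids(sample))
```

Because .* is greedy, the slash that is matched is the last one on the line that is followed by at least one character other than ?.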
I want to find words starting with stop. and extract the string that follows that prefix. Each string should be on a new line.
The results file should also not contain any duplicates.
Example file:
example regex stop.variant1
stop stop.variant_2 examplestop
stopstopvariant
stop.variant_#_3
Result:
variant1
variant_2
variant_#_3
Ctrl+H
Find what: .*?(\bstop\.variant\S*)
Replace with: $1\n
CHECK Match case
CHECK Wrap around
CHECK Regular expression
CHECK . matches newline
Replace all
Explanation:
.*? # 0 or more any character
( # group 1
\b # word boundary
stop\.variant # literally
\S* # 0 or more non spaces
) # end group
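The replacement above keeps the stop. prefix and does not remove duplicates. As a hedged alternative outside Notepad++, the following Python sketch captures only the text after stop. and drops repeats while keeping first-seen order. The pattern \bstop\.(\S+) and the function name extract_unique are my own assumptions, not part of the answer.

```python
import re

# Capture the run of non-space characters after a whole-word "stop." prefix.
PATTERN = re.compile(r"\bstop\.(\S+)")

def extract_unique(text):
    """Return the captured strings in first-seen order, without duplicates."""
    # dict.fromkeys preserves insertion order while discarding repeats.
    return list(dict.fromkeys(PATTERN.findall(text)))

sample = (
    "example regex stop.variant1\n"
    "stop stop.variant_2 examplestop\n"
    "stopstopvariant\n"
    "stop.variant_#_3\n"
)
print("\n".join(extract_unique(sample)))
```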
I have a text file with the following text:
andal-4.1.0.jar
besc_2.1.0-beta
prov-3.0.jar
add4lib-1.0.jar
com_lab_2.0.jar
astrix
lis-2_0_1.jar
Is there any way I can split the name and the version using a regex? I want to use the results to make two columns, 'Name' and 'Version', in Excel.
So I want the results from the regex to look like this:
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
So far I have used ^(?:.*-(?=\d)|\D+) to get the Version and -\d.*$ to get the Name separately. The problem is that when I do this on a large text file, the results from the two regexes are not in the same order. Is there any way to get the results in the form shown above?
Ctrl+H
Find what: ^(.+?)[-_](\d.*)$
Replace with: $1\t$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(.+?) # group 1, 1 or more any character but newline, not greedy
[-_] # a dash or underscore
(\d.*) # group 2, a digit then 0 or more any character but newline
$ # end of line
Replacement:
$1 # content of group 1
\t # a tabulation, you may replace with what you want
$2 # content of group 2
Result for given example:
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
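To produce the two columns in one pass outside Notepad++, a minimal Python sketch of the same find-and-replace could look like this. It assumes Python's re module behaves like Notepad++ for this pattern; split_name_version is a hypothetical helper name.

```python
import re

# Same pattern as above; MULTILINE anchors ^ and $ at every line.
PATTERN = re.compile(r"^(.+?)[-_](\d.*)$", re.MULTILINE)

def split_name_version(text):
    """Insert a tab between the name and the version on each matching line."""
    return PATTERN.sub(r"\1\t\2", text)

sample = "andal-4.1.0.jar\ncom_lab_2.0.jar\nastrix\nlis-2_0_1.jar"
print(split_name_version(sample))
```

Lines without a dash or underscore followed by a digit, such as astrix, are left unchanged, so the row order is preserved when the output is pasted into a spreadsheet.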
I am not quite sure what problem you meant with large files, and I believe the two regexes you showed do the opposite of what you said: the first one should get you the name and the second one should give you the version.
Anyway, here are the assumptions I am making about what you want:
"Name" may be followed by - or _, and then by the version string.
"Version" is preceded by - or _ and consists of some digits, followed by a dot or underscore, followed by some digits, and then any string.
If these assumptions make sense, you may use
^(.+?)(?:[-_](\d+[._]\d+.*))?$
as your regex. Group 1 will be the name, and Group 2 will be the version.
Demo in regex101: https://regex101.com/r/RnwMaw/3
Explanation of regex
^ start of line
(.+?) "Name" part, using reluctant match of
at least 1 character
(?: )? Optional group of "Version String", which
consists of:
[-_] - or _
( ) Followed by the "Version" , which is
\d+ at least 1 digit,
[._] then 1 dot or underscore,
\d+ then at least 1 digit,
.* then any string
$ end of line
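As a rough check of the two groups, the regex above can be applied line by line in Python. This is only a sketch: the function name parse_line is hypothetical, and an absent version is returned as an empty string by my own choice.

```python
import re

# The regex from this answer; the version group is optional.
PATTERN = re.compile(r"^(.+?)(?:[-_](\d+[._]\d+.*))?$")

def parse_line(line):
    """Return (name, version); version is '' when the line has none."""
    match = PATTERN.match(line)
    if match is None:  # only happens for an empty line
        return None
    return match.group(1), match.group(2) or ""

for line in ["andal-4.1.0.jar", "com_lab_2.0.jar", "astrix", "lis-2_0_1.jar"]:
    print(parse_line(line))
```

Since the name group is reluctant, it grows only until the optional version group (or the end of the line) can match, which is why com_lab is kept whole.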
I have a text file with the following lines:
asm-java-2.0.0-lib
cib-slides-3.1.0
lib-hibernate-common-4.0.0-beta
astp
act4lib-4.0.0
I want to remove everything up to and including the '-' before the numbers begin, so the results look like:
2.0.0-lib
3.1.0
4.0.0-beta
act4lib
Does anyone know the correct regex for this? So far I have come up with -\D.*(a-z)* but it produces too many errors.
Ctrl+H
Find what: ^.*?(?=\d|$)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.*? # 0 or more any character but newline, not greedy
(?= # start lookahead, zero-length assertion that makes sure we have after
\d # a digit
| # OR
$ # end of line
) # end lookahead
Result for given example:
2.0.0-lib
3.1.0
4.0.0-beta
Another solution that deals with act4lib-4.0.0:
Ctrl+H
Find what: ^(?:.*-(?=\d)|\D+)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(?: # start non capture group
.* # 0 or more any character but newline
- # a dash
(?=\d) # lookahead, zero-length assertion that makes sure we have a digit after
| # OR
\D+ # 1 or more non digit
) # end group
Replacement:
(empty) # the matched prefix is deleted
Given:
asm-java-2.0.0-lib
cib-slides-3.1.0
lib-hibernate-common-4.0.0-beta
astp
act4lib-4.0.0
Result for given example:
2.0.0-lib
3.1.0
4.0.0-beta
4.0.0
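The difference between the two patterns in this answer shows up on act4lib-4.0.0. The following Python sketch (an illustration under the assumption that Python's re module is used instead of Notepad++; strip_prefix is a hypothetical name) applies each pattern one line at a time.

```python
import re

LAZY = re.compile(r"^.*?(?=\d|$)")         # first solution
DASH = re.compile(r"^(?:.*-(?=\d)|\D+)")   # second solution

def strip_prefix(pattern, text):
    """Delete the matched prefix from every line and return the lines."""
    # Lines are handled separately because \D also matches line breaks
    # in Python, which would otherwise let a match run into the next line.
    return [pattern.sub("", line, count=1) for line in text.splitlines()]

sample = (
    "asm-java-2.0.0-lib\ncib-slides-3.1.0\n"
    "lib-hibernate-common-4.0.0-beta\nastp\nact4lib-4.0.0"
)
print(strip_prefix(LAZY, sample))
print(strip_prefix(DASH, sample))
```

The first pattern stops at the first digit and so leaves 4lib-4.0.0, while the second one looks for a dash followed by a digit and leaves 4.0.0. In both cases the astp line becomes empty.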
Use
^\D+\-
If you want to completely remove lines without numbers then use this
^\D+(\-|$)
In case the packages contain numbers in their names like act4lib-4.0.0 then a longer variant is needed
^[\w-]+(\-(?=\d+\.\d+)|$)
It can be shortened to ^.+?(\-(?=\d+\.)|$), but I prefer to be sure, so I also check the minor version number.
The ^ anchors the match at the start of the line.
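A small Python sketch can show why the longer variant is needed for names that contain digits. It assumes Python's re module rather than Notepad++, and the names SHORT, LONG and strip_with are hypothetical.

```python
import re

SHORT = re.compile(r"^\D+\-")                     # first suggestion
LONG = re.compile(r"^[\w-]+(\-(?=\d+\.\d+)|$)")   # digit-safe variant

def strip_with(pattern, line):
    """Remove the part of a single line matched by the pattern."""
    return pattern.sub("", line, count=1)

for line in ["asm-java-2.0.0-lib", "act4lib-4.0.0", "astp"]:
    print(line, "->", repr(strip_with(SHORT, line)), repr(strip_with(LONG, line)))
```

With the short pattern, act4lib-4.0.0 is left untouched because \D+ cannot get past the 4 in the name; the longer variant reduces it to 4.0.0 and empties lines that contain no version.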
Let me show an example. My file looks like this:
AaaAab
AacAaa
AacAap
AaaBbb
I would like to delete all lines that contain the same character three times in a row in either the first three or the second three characters. That means only AacAap should remain from the example above.
You can use something like:
^(?:(.)\1\1.*|.{3}(.)\2\2.*)$
Put that in the "Find what" field, and put an empty string in the "Replace with" field.
Ctrl+H
Find what: ^(?:(.)\1\1|...(.)\2\2).*\R
Replace with: LEAVE EMPTY
UNcheck Match case
check Wrap around
check Regular expression
DO NOT CHECK . matches newline
Replace all
Explanation:
^ : beginning of line
(?: : start non capture group
(.) : group 1, any character but newline
\1\1 : same as group 1, twice
| : OR
... : 3 any character
(.) : group 2, any character but newline
\2\2 : same as group 2, twice
) : end group
.* : 0 or more any character
\R : any kind of linebreak
Result for given example:
AacAap
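The same filter can be sketched in Python by testing each line instead of deleting matches in place. This is an assumption-laden illustration: Python's re module has no \R, so the line break is handled by splitlines, and IGNORECASE stands in for the unchecked Match case option. The name drop_triples is hypothetical.

```python
import re

# Three identical characters at positions 1-3 or 4-6, ignoring case.
PATTERN = re.compile(r"^(?:(.)\1\1|...(.)\2\2)", re.IGNORECASE)

def drop_triples(text):
    """Keep only the lines that do not start with such a triple."""
    kept = [line for line in text.splitlines() if not PATTERN.match(line)]
    return "\n".join(kept)

sample = "AaaAab\nAacAaa\nAacAap\nAaaBbb"
print(drop_triples(sample))
```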
You can use this pattern:
^(?:...)?(.)\1\1.*\r?\n?
The part (.)\1\1 matches three consecutive same characters with a capture and two backreferences. (?:...)? makes the three first characters optional, this way the consecutive characters can be at the beginning of the line or at the 4th position.
.*\r?\n? is only here to match all remaining characters of the line including the line break (you can preserve line breaks if you want, you only have to remove \r?\n?).
Try the following regex: (?im)^(?:...)?(.)\1\1.*(?:\R|\z)
I want to remove the * from all words that start with *.
For example:
*finish :- finish (* removed)
a*finish :- a*finish (not removed)
What regular expression would work in Notepad++?
I tried ^* but it says invalid regular expression.
Similarly, ^[\\*] doesn't work either.
For normal characters it works.
You can use ^(\s+)?\*.+
Replace each match of this regex:
(?:(?<=\s)|(?<=^))\*
with a blank string.
Explanation:
(?<=\s) - positive lookbehind to make sure that the current position is preceded by a whitespace
| - OR
(?<=^) - positive lookbehind to make sure that the current position is preceded by start of the line
\* - If any of the above conditions satisfy, match the *
I think you can use this:
Press Ctrl+H
Fill in Find what: (^|\s)\*(.+?)(\s|$)
Fill in Replace with: \1\2\3
Explanation:
(^|\s) => Group 1: start of line -^- or any white-space character -\s-
\* => * character
(.+?) => Group 2: one or more characters, as few as possible, until the next part can match
(\s|$) => Group 3: any white-space character -\s- or end of line -$-
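To try the idea outside Notepad++, here is a minimal Python sketch. It swaps in (?<!\S)\* (a * that is not preceded by a non-whitespace character), which I am assuming is an equivalent way to say "at the start of a line or after whitespace"; it is not the exact pattern from any of the answers, and strip_leading_star is a hypothetical name.

```python
import re

# A '*' at the start of the text or right after whitespace (incl. newlines).
PATTERN = re.compile(r"(?<!\S)\*")

def strip_leading_star(text):
    """Remove '*' only where it begins a word."""
    return PATTERN.sub("", text)

print(strip_leading_star("*finish"))   # the star starts the word
print(strip_leading_star("a*finish"))  # the star is inside the word
```

The negative lookbehind avoids capturing and re-inserting the preceding whitespace, so the replacement string can simply be empty.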