I have a text in which I want to get only the hexadecimal codes.
Like: "thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
It's possible to get the hex codes with \x..
But it doesn't seem I can do something like (^\x..) to select everything but the hex codes.
Any workarounds?
You may use a (?s)((?:\\x[a-fA-F0-9]{2})+)|. regex (that will match and capture into Group 1 any 1+ sequences of hex values OR will just match any other char including a line break char) and replace with a conditional replacement pattern (?{1}$1\n:) (that will reinsert the hex value chain or will replace the match with an empty string):
Find What: (?s)((?:\\x[a-fA-F0-9]{2})+)|.
Replace With: (?{1}$1\n:)
Regex Details:
(?s) - same as turning the ". matches newline" option ON
((?:\\x[a-fA-F0-9]{2})+) - Group 1 capturing one or more sequences of:
\\x - a literal \x
[a-fA-F0-9]{2} - two hex digits (0-9, a-f or A-F)
| - or
. - any single char.
Replacement pattern:
(?{1} - if Group 1 matches:
$1\n - replace with its contents + a newline
: - else replace with an empty string
) - end of the replacement pattern.
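Python's re.sub has no equivalent of the Notepad++ conditional replacement syntax, so this sketch (Python chosen only for illustration) emulates it with a replacement function. It assumes the input holds the escapes as literal text (a backslash, x and two hex digits), as in the example string, rather than already-decoded characters; keep_hex_runs is a hypothetical helper name.

```python
import re

# Same alternation as the Notepad++ pattern: group 1 captures a run of literal
# \xhh escapes, the other branch consumes any single character (re.S lets it
# match newlines too, like (?s)).
HEX_RUN_OR_CHAR = re.compile(r"((?:\\x[a-fA-F0-9]{2})+)|.", re.S)


def keep_hex_runs(text: str) -> str:
    # Emulates (?{1}$1\n:): when group 1 took part in the match, keep the run
    # followed by a newline; otherwise replace the single character with "".
    return HEX_RUN_OR_CHAR.sub(lambda m: m.group(1) + "\n" if m.group(1) else "", text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(keep_hex_runs(sample))
```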
Try ^.*?((\\x[a-f0-9]{2})+).*$ and replace with $1.
It should leave just the hex codes. Note that only the first run of hex codes on each line is kept, and lines without any hex code are left unchanged.
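The ^.*?((\\x[a-f0-9]{2})+).*$ pattern can be tried out in Python (used only for illustration; re.M makes ^ and $ match at every line, as they do in Notepad++). The sketch shows its behaviour: only the first run of hex codes on a line survives, and lines without hex codes are untouched. first_hex_run_per_line is a hypothetical helper name.

```python
import re

# ^.*? lazily skips text up to the first run of literal \xhh escapes, group 1
# captures that run, and .*$ swallows the rest of the line.
FIRST_HEX_RUN = re.compile(r"^.*?((\\x[a-f0-9]{2})+).*$", re.M)


def first_hex_run_per_line(text: str) -> str:
    # Each matching line is replaced by the contents of group 1.
    return FIRST_HEX_RUN.sub(r"\1", text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(first_hex_run_per_line(sample))
print(first_hex_run_per_line(r"a\x41b\x42c"))  # only the first run, \x41, is kept
```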
If you can already find the hex codes with your regex, couldn't you just use that to delete all of them from the string (or from a copy of the string, if you need to preserve the original)? You would be left with all the text except the hex codes.
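A minimal Python sketch of that suggestion (Python chosen only for illustration), assuming the escapes appear as literal backslash-x-hex text; strip_hex_codes is a hypothetical name. Because re.sub returns a new string, the original stays intact without an explicit copy.

```python
import re

# One literal \xhh escape: a backslash, an x, then two hex digits.
HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")


def strip_hex_codes(text: str) -> str:
    # Delete every escape; everything else is returned unchanged.
    return HEX_ESCAPE.sub("", text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(strip_hex_codes(sample))  # thisissometextthisistextsomemoretextoverhere
```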
^ acts as a negation token only at the beginning of a character class; you can't use it to negate substrings of several characters.
To select all that isn't \xhh you can use this pattern:
\G(?:\\x[a-f0-9]{2})*+\K(?=.|\n)[^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*
It matches the \xhh sequences first and removes them from the match result using the \K feature (which discards everything matched before it). The other part of the pattern, [^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*, matches everything that isn't a \xhh. Since this subpattern can match the empty string at the end of the string, I added the lookahead (?=.|\n) to ensure there's at least one character.
\G forces all matches to be contiguous: it matches the position where the previous match ended (or the start of the string for the first attempt).
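Python's built-in re module supports neither \G nor \K, so the following is only an approximation of the idea in Python (chosen for illustration). Each match first consumes a run of \xhh escapes (the part \K would drop) and captures the following text in group 1 with the same unrolled-loop subpattern. Since every match ends right before the next escape run or at the end of the string, the matches stay contiguous without \G, and empty captures are filtered out instead of using (?=.|\n). non_hex_segments is a hypothetical name.

```python
import re

# (?:\\x[a-f0-9]{2})* eats the escapes; group 1 is the unrolled loop that
# matches anything that is not the start of another \xhh escape.
NON_HEX = re.compile(r"(?:\\x[a-f0-9]{2})*([^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*)")


def non_hex_segments(text: str) -> list[str]:
    # Keep only non-empty captures (the final match at the end is empty).
    return [m.group(1) for m in NON_HEX.finditer(text) if m.group(1)]


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(non_hex_segments(sample))
```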
So I am trying to make a regex match for strings of the form:
"catalog.schema.'tablename'" .
The output I am looking for is just catalog.schema.'tablename', leaving out the double quotes at the start and end.
Can anyone help me out?
I tried to do it with the expression
/(?!^|.$)+[^\s]/ which leaves out the end quotes but matches each character.
So I modified it to /(?!^|.$)+[^\s]+/g. This matches the whole string but doesn't leave out the end quote.
It depends on the data around your string, and quotation marks may also appear within the string.
Why not just this: "(.*?)"
https://regex101.com/r/oaS8o0/1
To answer the question in the title you might simply use:
^.(.*)?.$
https://regex101.com/r/FxJgtW/1
You can just use
(?<=.).+(?=.)
Or, if you cannot use lookbehind:
(?!^).+(?!$)
See the regex demo #1 and regex demo #2.
Since . matches any char other than line break chars, the patterns just match any strings without their start and end chars.
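Both lookaround patterns can be checked with Python's re module (used here only for illustration; it supports fixed-width lookbehind such as (?<=.)). The sample string is the one from the question.

```python
import re

s = "\"catalog.schema.'tablename'\""

# (?<=.) needs one character before the match and (?=.) one after it.
with_lookbehind = re.search(r"(?<=.).+(?=.)", s).group()
# (?!^) forbids starting at index 0, and (?!$) makes .+ give back the last char.
without_lookbehind = re.search(r"(?!^).+(?!$)", s).group()

print(with_lookbehind)     # catalog.schema.'tablename'
print(without_lookbehind)  # catalog.schema.'tablename'
```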
If you don't want to match the first and last characters, you can use a capture group instead of lookarounds and take the Group 1 value.
The first . matches the first character (whatever it is), (.+) is a capture group that matches one or more characters, and the . at the end matches the last character of the string.
.(.+).
Regex demo
Or, to get the text between the double quotes at the start and end of the string, use a negated character class and a capture group:
^"([^"]+)"$
Regex demo
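The capture-group patterns suggested in this thread can be compared side by side in Python (used only for illustration); with the question's sample string, each one yields the same Group 1 value.

```python
import re

s = "\"catalog.schema.'tablename'\""

patterns = [
    r'"(.*?)"',      # lazy match between the first pair of double quotes
    r"^.(.*)?.$",    # drop the first and last character of the line
    r".(.+).",       # same idea without anchors
    r'^"([^"]+)"$',  # whole string must be "...", with no inner double quote
]

for pattern in patterns:
    print(pattern, "->", re.search(pattern, s).group(1))
```

The last pattern is the strictest: it fails outright if the string does not start and end with a double quote.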
Here are my potential inputs:
brian#muck.co, brian#gmail.com
brian#gmail.com, brian#muck.co
What I want to do is extract the #muck.co email address.
What I have tried is:
\s.*#muck.co
The problem is that this only grabs an email address if it is preceded by a space (so it would only match the second example input above). How would I write a regex to match either input?
\s matches a whitespace character, so you probably want something like [^\s]*#muck.co, which means any number of non-whitespace characters ([] defines a set of characters, and a leading ^ negates it).
That does not work for me, because in my regex flavour \s does not seem to include a regular space, but this works: [^[:space:]]\+#muck\.co. Note also \+ instead of *, for one or more non-space characters instead of any number, and the escaped dot \., since an unescaped . stands for any single character.
You can use a negated character class so the match does not cross the #, and a word boundary at the end to prevent a partial word match:
[^\s#]+#muck\.co\b
Regex demo
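A quick check of [^\s#]+#muck\.co\b against both inputs from the question, in Python (used only for illustration; # is kept as the separator exactly as in the question's data).

```python
import re

# [^\s#]+ takes the part before # without crossing whitespace or another #;
# \b rejects partial matches such as "#muck.com".
MUCK_ADDRESS = re.compile(r"[^\s#]+#muck\.co\b")

inputs = [
    "brian#muck.co, brian#gmail.com",
    "brian#gmail.com, brian#muck.co",
]

for line in inputs:
    print(MUCK_ADDRESS.search(line).group())  # brian#muck.co
```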
I was trying to write a regex to fetch the value of banana from the list of lines below.
For each line, I would like to get whatever comes after banana= and have it stop at | if it exists.
apple=1|banana=2.5|oranges=1
banana=2.5|apple=1|oranges=1
apple=1|oranges=1|banana=2.5
apple=1|oranges=1|banana=-2.5
banana=2.5
I got as far as writing (?i)banana=(.*) but of course it gets everything after the exact match.
Do you guys have any solutions?
Thanks!
I would like to be able to get whatever comes after banana= and have it stop at | if it exists.
You may use a negated character class instead of a greedy dot pattern:
(?i)banana=([^|]*)
See the regex demo
The greedy dot, .*, matches any 0+ chars other than line break chars (in NFA engines) as many as possible (usually, up to the end of the line).
If you use [^|], a negated character class, it will match any char but |.
Pattern details
(?i) - case insensitive modifier
banana= - a literal substring (prepend with \b to match it as a whole word)
([^|]*) - Capturing group 1: any 0+ chars other than | (to avoid empty matches, replace * with + quantifier).
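Applied to the sample lines from the question in Python (used only for illustration), the negated-class pattern stops at the next | or at the end of the line.

```python
import re

# [^|]* grabs everything after banana= up to, but not including, the next |.
BANANA = re.compile(r"(?i)banana=([^|]*)")

lines = [
    "apple=1|banana=2.5|oranges=1",
    "banana=2.5|apple=1|oranges=1",
    "apple=1|oranges=1|banana=2.5",
    "apple=1|oranges=1|banana=-2.5",
    "banana=2.5",
]

values = [BANANA.search(line).group(1) for line in lines]
print(values)  # ['2.5', '2.5', '2.5', '-2.5', '2.5']
```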
I'm trying to convert a list of filenames into a CSV file by replacing the first two - characters on each line with |. The problem is that the filenames themselves also contain the character I'm searching for.
My raw data looks something like this:
12055371-1-Florence - BW Letter of Intent HB Comments 9-4-14-2.DOCX
12057668-2-EB-DUE-M- SBuxbaum FHA Benefit Plans-2.DOCX
12058210-1-Redline Letter of Intent-2.PDF
12058029-3-Florence Hospital--Order Establishing Bid Procedures-HB 9-23-14-2.DOCX
12058020-10-Florence - BW Letter of Intent 10,10,14 Revisions-2.DOCX
I'm using Notepad++ to replace on the fly, but I'm not sure which regex will identify and replace these items.
Don't match every -; match from the beginning of each line up to the second -:
match ^(.*?)-(.*?)-
replace by \1|\2|
Explanation:
^ matches the beginning of the line (a zero-width match).
(.*?) matches any characters in a non-greedy way: as soon as the next token of the regex can match, it lets it do so. The result is grouped so it can be referenced later.
\1 and \2 are back-references to the two (.*?) groups.
Note: for efficiency, you could replace the non-greedy matches with the negated class [^\-], which means any character but -. The - is escaped because it normally denotes a range inside a character class. The groups would then become ([^\-]*). Of course, it really does not matter if this is a one-time operation.
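The same find/replace can be reproduced in Python (used only for illustration; re.M makes ^ match at every line start, as in Notepad++). One assumption is flagged in the code: in the negated-class variant, \n is added to the class so a line with fewer than two dashes cannot make the match run into the next line, a tweak that is not part of the note above.

```python
import re

# Lazy groups stop at the first and second "-" of each line; later dashes in
# the file name are left alone.
FIRST_TWO_DASHES = re.compile(r"^(.*?)-(.*?)-", re.M)
# Negated-class variant from the note; the added \n (an assumption, not in the
# original note) keeps each match on its own line.
FIRST_TWO_DASHES_NEGATED = re.compile(r"^([^\-\n]*)-([^\-\n]*)-", re.M)

raw = """12055371-1-Florence - BW Letter of Intent HB Comments 9-4-14-2.DOCX
12058210-1-Redline Letter of Intent-2.PDF
12058029-3-Florence Hospital--Order Establishing Bid Procedures-HB 9-23-14-2.DOCX"""

converted = FIRST_TWO_DASHES.sub(r"\1|\2|", raw)
print(converted)
```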