Regex to select only specific characters between two strings - regex

I have the following HTML:
<i>This is my first sentence.
This is my second sentence.</i>
Using regex (in Sublime Text, FYI), how can I select only the whitespace (including line breaks) between the <i> and </i> tags?
I have got as far as selecting all the characters, but how do I limit it to whitespace and newlines only? This is what I have so far:
(?<=<i.).*?(?=</i>)
https://regex101.com/r/eZ1gT7/1986

You cannot do it with a single regex, but you can combine two. First, match the element:
<\s*i[^>]*>([\s\S]+?)<\s*\/\s*i\s*>
This matches each <i> element and captures the text between the tags in group 1. (The attribute part is [^>]* rather than [^>]+ so that a bare <i> with no attributes also matches.) You can then loop through the captured values and find the whitespace in each one with:
\s+
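As a rough illustration of the two-step idea, here is a minimal Python sketch. It assumes the sample HTML from the question and uses [^>]* in the tag pattern so that a bare <i> matches; the variable names are only for illustration.

```python
import re

html = "<i>This is my first sentence.\nThis is my second sentence.</i>"

# Step 1: capture everything between <i ...> and </i> in group 1.
# [^>]* allows a bare <i> with no attributes.
tag_body = re.compile(r"<\s*i[^>]*>([\s\S]+?)<\s*/\s*i\s*>")

# Step 2: collect every whitespace run inside each captured body.
whitespace_runs = []
for body in tag_body.findall(html):
    whitespace_runs.extend(re.findall(r"\s+", body))

print(len(whitespace_runs))  # 9 (eight single spaces and one newline)
```

In an editor you cannot chain two searches like this, which is why this answer suggests post-processing the captured group in code.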

I'm guessing that maybe this expression,
(?=\s*[\n\r])(\s*)(?=\S)
replaced with a single space, might be close to what you have in mind.
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
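To see what that replacement does, here is a small Python sketch using the sample HTML from the question (an assumption about the intended input). Note that the expression is not limited to text inside the <i> tags; it collapses any whitespace run that contains a line break and is followed by a non-whitespace character.

```python
import re

html = "<i>This is my first sentence.\nThis is my second sentence.</i>"

# (?=\s*[\n\r]) requires a line break somewhere in the upcoming whitespace run,
# (\s*) consumes the run, and (?=\S) requires a non-whitespace character after it.
collapsed = re.sub(r"(?=\s*[\n\r])(\s*)(?=\S)", " ", html)

print(collapsed)
# <i>This is my first sentence. This is my second sentence.</i>
```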

You'll have to loop through the captured group 2:
(<i[^>]+?>)?([ \n]*)(<\/i>)?
https://regex101.com/r/wcPwkU/1

Related

How would I match all data between 2 symbols with Regex?

I'm trying to match all data from a dash (-), including the dash itself, up to the first delimiter after it, which is a colon.
Example data:
Input:
bart23-testaccount#test.test:Test:Test:Test
Desired output:
bart23:Test:Test:Test
I've done some research and found this regex, but it's not fit for purpose: -(.*):
This needs to work on thousands of lines whose contents vary, but the goal is always the same: highlight all text between the - and the first : (which I will then delete). I will be using Notepad++.
I can answer any questions or make my post more specific if need be, it's kind of hard to explain.
In Notepad++ you can use regex find/replace. Look for:
^([^-]+)-[^:]+(:.*)$
which captures everything up to the first - in group 1, and everything after (and including) the first : in group 2, and replace with
\1\2
Using Notepad++, without any capture group:
Ctrl+H
Find what: -[^:]+
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
- # a hyphen (by default, the first one in a line)
[^:]+ # 1 or more characters that are not a colon
Result for given example:
bart23:Test:Test:Test
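For anyone who wants to check both patterns outside Notepad++, here is a minimal Python sketch using the example line from the question. The count=1 argument is an assumption added here to limit the second variant to the first hyphen on the line.

```python
import re

line = "bart23-testaccount#test.test:Test:Test:Test"

# Variant 1: keep group 1 (text before the first '-') and group 2 (from the first ':').
with_groups = re.sub(r"^([^-]+)-[^:]+(:.*)$", r"\1\2", line)

# Variant 2: delete the '-' and everything after it up to, but not including, the next ':'.
without_groups = re.sub(r"-[^:]+", "", line, count=1)

print(with_groups)     # bart23:Test:Test:Test
print(without_groups)  # bart23:Test:Test:Test
```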

RegEx help for NotePad++

I need help with a regex that I just can't figure out. I need to search for broken hashtags that contain a space.
For example, the strings look like this:
#ThisIsaHashtagWith Space
However, the words "With Space" can also appear on their own in the text, and I don't want to replace those.
So the important part is that the string starts with "#", followed by any characters and then the words "With Space", which I want to replace with "WithSpace" to repair the hashtag.
I have a document with 10k of these broken hashtags, and I have been trying all day without success.
I have tried on regex101.com
with following RegEx:
^#+(?:.*?)+(With Space)
Although it seems to work on regex101.com, it doesn't work in Notepad++.
Any help is appreciated.
Thanks a lot.
BR
In your current regex you match a # and then any character and in a capturing group match (With Space).
You could change the capturing group to capture the first part of the match.
(#+.*?)With Space
Then you could use that group in the replacement:
$1WithSpace
As an alternative, you could first match one or more # characters followed by any characters, non-greedily (.*?), and then use \K to reset the starting point of the reported match.
Then match With Space.
#+(?:.*?)\KWith Space
In the replacement use WithSpace
The + quantifier after # matches one or more # characters. If the match should start at the beginning of the line, add the anchor ^ at the start of the regex.
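Here is a small Python sketch of the capture-group variant. Python's built-in re module does not support \K, so only the $1-style replacement is shown; the second line of the sample text is made up to show that a plain "With Space" outside a hashtag is left alone.

```python
import re

text = (
    "#ThisIsaHashtagWith Space\n"
    "The words With Space in plain text stay as they are."
)

# Group 1 holds the hashtag prefix; it is written back followed by "WithSpace".
# re.MULTILINE makes ^ match at the start of every line.
fixed = re.sub(r"^(#+.*?)With Space", r"\1WithSpace", text, flags=re.MULTILINE)

print(fixed)
```

Only lines that begin with # are changed, because the pattern is anchored with ^ and . does not cross line breaks.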
Try using ^(#.+?)(With\s+Space) for your regex, as it also matches multiple spaces and tab characters. If you are testing against multiple lines, set the g, m, and i flags (for example on regex101). I just tried it with the following two strings, each on a separate line in Notepad++:
#blablaWith Space
#hello###$aWith Space
The replace with value is set to $1WithSpace and I've tried both replaceAll and replace one by one - seems to result in the following.
#blablaWithSpace
#hello###$aWithSpace
Feel free to comment with other strings you want replaced. Also be sure that you have selected the Regular expression search mode in Notepad++.
Try this: (#.*)( )
I tried this in Notepad++, and you should be able to just replace all with $1. Make sure you set the search mode to Regular expression first.
const str = "#ThisIsAHashtagWith Space";
console.log(str.replace(/(#.*)( )/g, "$1"));

Regexp, that ignores only first capture group

We have a tab-separated list of "key=value" pairs.
How can we split it using a regexp?
Case key=value must be transformed into value. Case key=value=value2 must be transformed into value=value2.
https://regex101.com/r/dR5dT0/1 - I've started solution like this, but can't find beautiful way to remove only "key=" part from text.
UPD BTW, do you know cool crash courses on regular expressions?
You can just use
=(\S*)
Since the list is already formatted, the = in the pattern will always be the name/value delimiter.
The \S matches any non-whitespace character.
The * is a quantifier meaning that the \S should occur zero or more times (\S* matches zero or more non-whitespace characters).
You can use this regex for matching:
/\w+=(\S+)/
and grab captured group #1
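A quick way to check both patterns is a short Python sketch. The sample row below is hypothetical, since the original data is only available through the linked demo.

```python
import re

# Hypothetical tab-separated row; the second field contains an extra '='.
row = "a=1\tb=x=y\tc=hello"

# \w+= consumes the key and the first '='; (\S+) captures the rest of the field.
values = re.findall(r"\w+=(\S+)", row)

# The shorter pattern gives the same result here, because \S* is greedy
# and runs to the end of each field.
values_short = re.findall(r"=(\S*)", row)

print(values)        # ['1', 'x=y', 'hello']
print(values_short)  # ['1', 'x=y', 'hello']
```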

Regex expressions to match text between first comma and the comma before the first number

I have a csv file with all UK areas (43000 rows).
However, even though the fields are separated with commas, they are not enclosed with anything, hence if the field has commas within its contents, import to a database fails.
Fortunately, there is only one field that has commas within its content.
I need a regular expression that I could use to select this field on all rows.
Here is an example of data:
Aberaman,Rhondda, Cynon, Taf (Rhondda, Cynon, Taff),51.69N,03.43W,SO0101
Aberangell,Powys,52.67N,03.71W,SH8410
This should look like:
Aberaman,"Rhondda, Cynon, Taf (Rhondda, Cynon, Taff)",51.69N,03.43W,SO0101
Aberangell,"Powys",52.67N,03.71W,SH8410
So I need to basically select the second field, which is between the first comma and the comma just before the first number.
I will use sublime text 2 to perform this regex search.
Sublime Text 2 supports \K, so you can use the following.
Regex:
^[^,]*,\K(.*?)(?=,\d)
Replacement string:
"\1"
Explanation:
^ Asserts that we are at the start of a line.
[^,]* Matches any character other than a comma, zero or more times.
, Matches a literal comma.
\K Discards everything matched so far from the reported match.
(.*?)(?=,\d) Matches any character zero or more times, up to the point that is followed by a comma and a digit. The ? after * makes the match reluctant (lazy).
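The same idea can be tried in Python as a rough sketch. Python's re module has no \K, so the first field is kept in an extra capture group instead; this is a substitute for the pattern above, not the same expression.

```python
import re

rows = [
    "Aberaman,Rhondda, Cynon, Taf (Rhondda, Cynon, Taff),51.69N,03.43W,SO0101",
    "Aberangell,Powys,52.67N,03.71W,SH8410",
]

# Group 1 stands in for what \K would discard: the first field and its comma.
# Group 2 is the lazily matched second field, which must be followed by ",<digit>".
pattern = re.compile(r"^([^,]*,)(.*?)(?=,\d)")

quoted = [pattern.sub(r'\1"\2"', row, count=1) for row in rows]
for row in quoted:
    print(row)
# Aberaman,"Rhondda, Cynon, Taf (Rhondda, Cynon, Taff)",51.69N,03.43W,SO0101
# Aberangell,"Powys",52.67N,03.71W,SH8410
```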
You can try with capturing groups. Simply substitute it with $1"$2"$3 or \1"\2"\3
^(\w+,)([^\d]*)(,.*)$
You can do it in Notepad++ as well.
Find what: ^(\w+,)([^\d]*)(,.*)$
Replace with: $1"$2"$3
A regex which should be able to solve your problem is:
^.*?,(.*?),\d+
This matches:
anything (non-greedy) up to the first comma (which is not included in the group)
then anything (non-greedy) up to the next comma that is followed by a number (this part is captured in the group)
the comma and the digits after it, which are required for the match to succeed
So your group is in $1

regular expression pattern handling brackets

I am looking for a regular expression pattern that is able to handle the following problem:
(to make someone) happy (adj.)
I only want to get the word "happy" and the regular expression pattern should also handle lines if only one part is in brackets e.g.:
(to make someone) happy
happy (adj.)
I've tried the following: "\s*\(.*\)"
But I am somehow wrong with my idea!
This one will get you the right word in the first capturing group in all three options:
(?:\([^)]*\)\s*)?(\w+)(?:\s*\([^)]*\))?
You can adjust and be more permissive in case you'd like to get a couple of words or to allow special characters:
(?:\([^)]*\)\s*)?([^()\n]+)(?:\s*\([^)]*\))?
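As a quick check, this Python sketch runs the first pattern against the three lines from the question and reads the word from group 1.

```python
import re

lines = [
    "(to make someone) happy (adj.)",
    "(to make someone) happy",
    "happy (adj.)",
]

# Optional leading "(...)" block, the word in group 1, optional trailing "(...)" block.
pattern = re.compile(r"(?:\([^)]*\)\s*)?(\w+)(?:\s*\([^)]*\))?")

words = [pattern.search(line).group(1) for line in lines]
print(words)  # ['happy', 'happy', 'happy']
```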
A regex for finding the text between two parenthesized groups is
/(?:\([^)]*\)|^)([^(]*)(?:\([^)]*\)|$)/m
The breakdown is as follows:
Start with some text in parentheses or the beginning of a line: (?:\([^)]*\)|^). The parenthesized alternative matches from an open paren to the first closing paren. It is listed first because ^ alone would also succeed at the start of a line that begins with a paren, which would leave the group empty.
Then match the text outside of the parentheses and put it in a group: ([^(]*). This matches up to the next open paren and includes any surrounding spaces, so trim the result.
Then match more text in parentheses or the end of a line: (?:\([^)]*\)|$)
I used multiline mode (m) so that ^ and $ match at line breaks as well as at the start and end of the string
Try regex (?:^|\))\s*([^\(\)]+?)\s*(?:\(|$)