Regex should not be recognized for special characters - regex

I want the regex not to match when there is a special character before, between, or after the letters of the word.
My Regex:
\b([t][\W_]*?)+([e][\W_]*?)+([s][\W_]*?)+([t][\W_]*?)*?\b
https://regex101.com/r/zKg2eR/1
Examples that should not match:
#test, te+st, t'est, =test, etc.
I hope that is reasonably clear.

If you want to match a word character excluding an underscore, you can write it as [^\W_] using a negated character class.
You don't need a character class for a single character such as [t]. You also don't need to repeat the groups with + when you only want to match a form of test.
If each word is on its own line, you can add the anchors ^ and $:
^(t[^\W_]*)(e[^\W_]*)(s[^\W_]*)(t[^\W_]*)$
Regex demo
Because you selected Golang in the regex tester, you cannot use lookarounds. Instead, you can use an alternation that matches either a whitespace character or the start/end of the string.
Then capture the whole word in an additional capture group:
(?:^|\s)((t[^\W_]*)(e[^\W_]*)(s[^\W_]*)(t[^\W_]*))(?:$|\s)
Regex demo
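As a quick check, here is a minimal JavaScript sketch of the whitespace-boundary pattern above. JavaScript is used only for illustration (the pattern avoids lookarounds, so it should behave the same way in Go); the helper name matchTestWord is hypothetical.

```javascript
// Pattern from the answer: the word must be delimited by whitespace
// or by the start/end of the string. Group 1 holds the whole word.
const testWord = /(?:^|\s)((t[^\W_]*)(e[^\W_]*)(s[^\W_]*)(t[^\W_]*))(?:$|\s)/;

// Hypothetical helper: returns the matched word, or null if there is none.
function matchTestWord(input) {
  const m = testWord.exec(input);
  return m ? m[1] : null;
}

// Plain words are found, with or without surrounding text.
console.log(matchTestWord("test"));
console.log(matchTestWord("a test here"));

// The inputs from the question contain special characters and are rejected.
for (const s of ["#test", "te+st", "t'est", "=test"]) {
  console.log(s, matchTestWord(s));
}
```

Because [^\W_]* also allows extra letters and digits after each letter, inputs such as tests or teest would match as well.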

Related

How to match a word based on slash in regular expression

I am trying to match part of a path with a regex. For example, I want to match only the first 2 folders in the string below:
/folder1/folder2/filder3/folder4/folder5
I wrote the regex below to match the first two folders, but it matches everything up to /folder5, whereas I want it to stop at /folder2:
/(\w.+){2}
I guess .+ matches everything. Any idea how to handle this?
You can use
^/[^/]+/[^/]+
^(?:/[^/]+){2}
Or, if you need to escape slashes:
^\/[^\/]+\/[^\/]+
^(?:\/[^\/]+){2}
See the regex demo. [^/] is a negated character class that matches any char other than a / char.
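A minimal JavaScript sketch of the grouped variant, using the path from the question. The helper name firstTwoFolders is hypothetical.

```javascript
// Anchored at the start: exactly two "/segment" units, where a segment
// is one or more characters other than a slash.
const firstTwo = /^(?:\/[^\/]+){2}/;

// Hypothetical helper: returns the first two folders, or null when the
// path has fewer than two segments.
function firstTwoFolders(path) {
  const m = path.match(firstTwo);
  return m ? m[0] : null;
}

console.log(firstTwoFolders("/folder1/folder2/filder3/folder4/folder5"));
// → "/folder1/folder2"
```

Unlike .+, the negated class [^/]+ cannot run past the next slash, so the match stops at the end of the second folder.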

Match a part of a string using regex

I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace after PAI: can also be a word, so I would like to use .* instead, but when I do, it matches most of the string.
This pattern shows what I match when I use the whitespace instead of .*:
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
This pattern shows what I am trying to achieve with .*, but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds, but without success.
Any idea?
Thanks
Instead of matching a single space, you can use a character class that matches either a space or a word character, and repeat it 1 or more times so that at least one occurrence is required.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only want the match, you can omit the 2 capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
Regex demo
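To illustrate, here is a minimal JavaScript sketch that runs this pattern against the string from the question. The helper name extractPai and the shape of its return value are assumptions made for the example.

```javascript
const header =
  "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>" +
  "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152" +
  "To: <sip:4168755400#1.1.1.238>";

// [ \w]+ replaces the single space: one or more spaces or word characters.
const paiPattern =
  /(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#/;

// Hypothetical helper: returns the full match and both groups, or null.
function extractPai(input) {
  const m = input.match(paiPattern);
  return m ? { full: m[0], prefix: m[1], number: m[2] } : null;
}

console.log(extractPai(header));
// full: "PAI: <sip:4168755400#", prefix: "PAI: <sip:", number: "4168755400"
```

Because [ \w] cannot match < or :, the first group cannot run on into the later From: and To: headers the way .* does.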
The regex could look like this:
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? matches the literal PAI: followed by any characters (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the pattern matches.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# matches <sip: followed by as few characters as possible, up to the first #.
Example
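The difference between the lazy and greedy quantifiers can be seen in a minimal JavaScript sketch on the string from the question. The greedy variant is not part of this answer; it is included only for comparison.

```javascript
const header =
  "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>" +
  "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152" +
  "To: <sip:4168755400#1.1.1.238>";

const lazy = /PAI:.*?(<sip:.*?#)/;  // pattern from this answer
const greedy = /PAI:.*(<sip:.*#)/;  // same pattern without the lazy ?

const lazyMatch = header.match(lazy);
const greedyMatch = header.match(greedy);

// Lazy: stops at the first <sip: and the first # after PAI:
console.log(lazyMatch[0]); // "PAI: <sip:4168755400#"
console.log(lazyMatch[1]); // "<sip:4168755400#"

// Greedy: runs on to the last <sip:...# in the string (the To: header)
console.log(greedyMatch[0].length > lazyMatch[0].length); // true
```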

Modifying regex to match beginning and end characters

I am new to regex and playing around with writing regex to match markdown syntaxes, particularly italic text like:
this is markdown with some *italic text*
After writing some naive implementations I found this regex which seems to do the job quite nicely (dealing with edge-cases) and matches the entire string:
(?<!\*)\*([^ ][^*\n]*?)\*(?!\*)
However, I don't want to match the entire string - I only want to match the beginning and end * characters (so that I can do some special formatting to those characters). How might I go about doing that?
The tricky thing is that I only want to match the * characters when the rest of the string matches the correct format of a string in italics (i.e. meets the requirements of the regex above). So a simple regex like (\*|\*) isn't going to cut it.
Apart from using a capturing group for the asterisk at the start and at the end, you can add an asterisk to the first negated character class to prevent matching a double **.
Note that, as pointed out by #toto, you don't really need the capturing groups around the asterisks (\*). You can also just match them and put the replacement characters before and after the single capturing group for the content in the middle.
It also means that the content should match at least a single character other than an asterisk.
You don't have to make the quantifier of the second character class non-greedy (*?), as it cannot cross the * boundary that follows.
(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)
Regex demo
If there also cannot be a space before the closing asterisk, you can repeatedly match a space followed by one or more non-whitespace characters other than an asterisk: (?: [^*\s]+)*
The \r\n in the negated character class of the first pattern keeps the match from crossing line boundaries. Note that \s also matches newlines; if that should not be the case, you can replace \s with a space, or with a tab and a space.
(?<!\*)(\*)([^*\s]+(?: [^*\s]+)*)(\*)(?!\*)
Regex demo
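A minimal JavaScript sketch of how the three groups from the first pattern in this answer could be used in a replacement. The square brackets stand in for whatever special formatting is wanted, and the helper name markItalicDelimiters is hypothetical. Lookbehind requires a reasonably recent JavaScript engine.

```javascript
// Group 1: opening *, group 2: italic content, group 3: closing *.
const italic = /(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)/g;

// Hypothetical helper: wraps only the delimiters, leaving the content as is.
function markItalicDelimiters(text) {
  return text.replace(italic, "[$1]$2[$3]");
}

console.log(markItalicDelimiters("this is markdown with some *italic text*"));
// → "this is markdown with some [*]italic text[*]"

// Double asterisks and a space after the opening * are left untouched.
console.log(markItalicDelimiters("**bold**"));
console.log(markItalicDelimiters("* not italic*"));
```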
Just turn the first and second \* into capturing groups, and you can then change them at will:
(?<!\*)(\*)([^ ][^*\n]*?)(\*)(?!\*)
Demo

Regex: scrub punctuation except if inside a word?

I'm not great at regex but I have this for removing punctuation from a string.
let text = 'a user provided string'
let pattern = /(-?\d+(?:[.,]\d+)*)|[-.,()&$#![\]{}"']+/g;
text.replace(pattern, "$1");
I am looking for a way to modify this so that it keeps punctuation if inside a word e.g.
some-hypenated-words
a_snake_case
or.even.a.dot.word
should all keep the punctuation. How would I modify it for that?
One option could be changing the \d to \w to extend the match to word characters, and adding a hyphen to the character class in the capturing group.
In the replacement use group 1.
(\w+(?:[.,-]\w+)*)|[-.,()&$#![\]{}"']+
Regex demo
If you want to match multiple hyphens, commas or dots you could repeat the character class [.,-]+
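A minimal sketch of the modified pattern in use, following the question's own replace call. The helper name scrubPunctuation and the sample sentence are made up for the example.

```javascript
// Group 1 keeps words, including punctuation that sits between word
// characters; the second alternative matches punctuation to be removed.
const pattern = /(\w+(?:[.,-]\w+)*)|[-.,()&$#![\]{}"']+/g;

// Hypothetical helper: when the second alternative matches, group 1 did
// not participate, so "$1" expands to an empty string.
function scrubPunctuation(text) {
  return text.replace(pattern, "$1");
}

console.log(
  scrubPunctuation("Hello, world! (some-hypenated-words) a_snake_case or.even.a.dot.word.")
);
// → "Hello world some-hypenated-words a_snake_case or.even.a.dot.word"
```

The underscore needs no special handling because \w already includes it.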

REGEX - Get all groups of characters with their delimiter

I'm not very good with regex, so this is my problem.
I have a String that contains c#m#fc#fm# and I want to get all groups of characters with their # at the end.
Like this :
c#
m#
fc#
fm#
I have tried some regexes, but I never get what I want.
Thanks a lot for your help.
You can use [^#]+# and find all matches. Each match consists of one or more characters other than #, matched by the negated character class [^#]+, followed by a single #.
Regex Demo
Also, if your string contains whitespace that you don't want to include in the matched text, you can add \s to the negated character class and use this regex:
[^#\s]+#
Regex Demo excluding space from matched tokens
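Both patterns can be tried with a minimal JavaScript sketch. The helper name splitTokens and its excludeWhitespace flag are hypothetical; the question does not say which language is being used.

```javascript
// Hypothetical helper: returns every run of characters ending in #.
// With excludeWhitespace set, whitespace is kept out of the tokens.
function splitTokens(input, excludeWhitespace = false) {
  const pattern = excludeWhitespace ? /[^#\s]+#/g : /[^#]+#/g;
  return input.match(pattern) || [];
}

console.log(splitTokens("c#m#fc#fm#"));      // [ 'c#', 'm#', 'fc#', 'fm#' ]
console.log(splitTokens("c# m# fc#"));       // [ 'c#', ' m#', ' fc#' ]
console.log(splitTokens("c# m# fc#", true)); // [ 'c#', 'm#', 'fc#' ]
```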