Create regular expression remove word


Hello good afternoon!!
I'm new to the world of regular expressions and would like some help creating the following expression!
I have a query that returns the following values:
caixa-pod
config-pod
consultas-pod
entregas-pod
monitoramento-pod
vendas-pod
I would like the results to be presented as follows:
caixa
config
consultas
entregas
monitoramento
vendas
In this case, it would exclude the word "-pod" from each value.

I would try (.*)-pod. It is not clear where you want to use that regexp (so the right regexp may differ). I guess it is a dashboard variable.

You can try
\b[a-z]*(?=-pod)\b
This regex basically tells the regex engine to match
\b a word boundary
[a-z]* any number of lowercase characters in range a-z (feel free to extend to whatever is needed e.g. [a-zA-Z0-9] matches all alphanumeric characters)
(?=-pod) followed by -pod but exclude that from the result (positive lookahead)
\b another word boundary
\b matches at a word boundary: a position between a word character and a non-word character, or between a word character and the start or end of the string.
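Both suggestions can be compared quickly in Python. This is only a sketch: the values are copied from the question, the variable names are made up for illustration, and the tool the query actually runs in (for example a dashboard variable) may use a different regex flavor.

```python
import re

# Values copied from the question.
values = [
    "caixa-pod", "config-pod", "consultas-pod",
    "entregas-pod", "monitoramento-pod", "vendas-pod",
]

# Option 1: capture everything before "-pod" and keep group 1.
capture = re.compile(r"(.*)-pod")
by_capture = [capture.match(v).group(1) for v in values]

# Option 2: match the lowercase run that is followed by "-pod".
# The lookahead checks for the suffix without including it in the match.
lookahead = re.compile(r"\b[a-z]*(?=-pod)\b")
by_lookahead = [lookahead.search(v).group(0) for v in values]

print(by_capture)
print(by_lookahead)
```

With the capture-group version you have to pick group 1; with the lookahead version the whole match is already the shortened name.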

Related

Why . is getting excluded in word boundary in regex

I have the following regex:
\b[_\.][0-9]{1,}[a-zA-Z]{0,}[_]{0,}\b
My input string is:
_49791626567342fYbYzeRESzHsQUgwjimkIfW
.49791626567342fYbYzeRESzHsQUgwjimkIfW
I would expect it to match both lines, but it only matches the first one. Can you help me find the mistake in the regex?

A word boundary is a border between a word character (letters, digits, underscore) and either a non-word-character or the start or end of the string. So there simply is no word boundary between dot (non-word-character) and the start of the string.
You can use an anchor in this case, to signal the start of the string, like
^[_\.][0-9]{1,}[a-zA-Z]{0,}[_]{0,}$
You can also shorten your regex a bit by using * and + quantifiers and avoiding unnecessary escape sequences, as suggested by Toto
^[_.][0-9]+[a-zA-Z]*_*$
You can also use lookahead and lookbehind (if available) to build yourself a custom boundary.
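A small Python check of the three variants, as a sketch. The first two patterns are taken from this thread; the third is only one assumed way to build a "custom boundary" with lookarounds (no word character directly before or after the match) and is not from the original answer.

```python
import re

inputs = [
    "_49791626567342fYbYzeRESzHsQUgwjimkIfW",
    ".49791626567342fYbYzeRESzHsQUgwjimkIfW",
]

# Pattern from the question: \b cannot match before a leading dot.
original = re.compile(r"\b[_\.][0-9]{1,}[a-zA-Z]{0,}[_]{0,}\b")
# Anchored and shortened pattern from the answer.
anchored = re.compile(r"^[_.][0-9]+[a-zA-Z]*_*$")
# Assumed custom boundary: no word character on either side of the match.
custom = re.compile(r"(?<!\w)[_.][0-9]+[a-zA-Z]*_*(?!\w)")

original_hits = [bool(original.search(s)) for s in inputs]
anchored_hits = [bool(anchored.search(s)) for s in inputs]
custom_hits = [bool(custom.search(s)) for s in inputs]

print(original_hits, anchored_hits, custom_hits)
```

Only the \b version rejects the input that starts with a dot, because there is no word character on either side of position 0 in that string.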

Regex finding words which contains a given sequence and excluding a specific word

I want to find words which contains a given sequence of letters. However the word should be different than a given banned word.
For instance in
"modal dalaman odal Modal ODAL amodal modalex amodale"
If the sequence is "dal" and the banned word is modal, I want to get the dalaman, odal, ODAL, amodal, modalex, amodale.
How can I do that in regex? BTW, there is no specific programming language for this question.

You can use the pattern below to match all words that contain "dal" but are not equal to "modal" as a full word.
Pattern:
\w*dal(?<!\bmodal\b)\w*
Explanation:
\w* matches any number of word characters (alphanumeric and underscore "_"), including zero
dal matches the sequence "dal" literally
(?<!\bmodal\b) is a negative lookbehind which assures that the sequence "modal" could not be matched immediately on the left of this token.
The \b matches only at word boundaries, but does not consume any characters.
\w* matches any number of word characters (alphanumeric and underscore "_"), including zero
Check this regex out on regex101.com
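Here is a rough way to try the pattern in Python with re.findall(). The re.IGNORECASE flag is an assumption on my side: the question expects "ODAL" to be found and "Modal" to be skipped, which the pattern only does when case is ignored.

```python
import re

text = "modal dalaman odal Modal ODAL amodal modalex amodale"

# The lookbehind rejects a "dal" that completes the standalone word "modal";
# longer words such as "amodal" or "modalex" fail one of the two \b checks.
pattern = re.compile(r"\w*dal(?<!\bmodal\b)\w*", re.IGNORECASE)

matches = pattern.findall(text)
print(matches)
```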
This is the old version of my answer that was valid before the question update:
You could use the pattern below together with the i (case insensitivity) flag.
Depending on what programming language or environment you use to process the regex, you might either also have to set the g (global) flag to match all separate occurrences of the pattern, or use a method of your environment that searches all matches, like e.g. in Python re.findall().
Pattern:
\S*(?<!mo)dal\S*
Explanation:
\S* matches any number of non-whitespace characters, including zero
(?<!mo) is a negative lookbehind which assures that the sequence "mo" could not be matched immediately on the left of this token
dal matches the sequence "dal" literally
\S* matches any number of non-whitespace characters, including zero
Check this regex out on regex101.com
More generally, you can use this pattern:
\S*(?<!%%FORBIDDEN_LEFT%%)%%REQUIRED%%(?!%%FORBIDDEN_RIGHT%%)\S*
after replacing the placeholders %%REQUIRED%%, %%FORBIDDEN_LEFT%% and %%FORBIDDEN_RIGHT%% with whatever strings you need.
For example, if you want to match "cd" but not "abcdef", you have to use the pattern \S*(?<!ab)cd(?!ef)\S*.
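If you fill in the placeholders programmatically, it could look like the Python sketch below. build_pattern is a hypothetical helper name, not part of any library, and re.escape is used on the assumption that the three strings are meant literally.

```python
import re

def build_pattern(required, forbidden_left, forbidden_right):
    # re.escape keeps the inserted strings literal, so characters such as
    # "." or "+" in them are not treated as regex syntax.
    return re.compile(
        r"\S*(?<!" + re.escape(forbidden_left) + ")"
        + re.escape(required)
        + "(?!" + re.escape(forbidden_right) + r")\S*"
    )

# Same values as in the example: match "cd", but not after "ab" or before "ef".
pattern = build_pattern("cd", "ab", "ef")
matches = pattern.findall("abcdef xcdy cd abcd cdef")
print(matches)
```

Note that the template rejects a word as soon as either the forbidden left or the forbidden right string touches the required sequence, so "abcd" and "cdef" are skipped as well as "abcdef".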

How to find words that contain string with a limited size

I need to find all the words in an inputted text that have (?i:val) in them and are no longer than 5 characters.
So far I got: \b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b
If we take this sample text to look in: In computer science, a value is an expression which cannot be evaluated any further (a normal form). Val is also a match
I get 3 matches (value, evaluated and Val), however evaluated should not match the pattern, as it is too long. What is the right way to get this straight?

Your pattern does not account for the length of the words matched.
Use word boundaries and a lookahead like this:
(?i)\b(?=\w*val)\w{1,5}\b
See regex demo
The regex matches:
\b - a leading word boundary since the next pattern is \w
(?=\w*val) - a lookahead making sure there is a val substring after zero or more word characters
\w{1,5} - matches 1 to 5 word characters
\b - trailing word boundary that stops words of more than 5 characters long from matching
You may use an ASCII JS version of the regex:
/\b(?=[a-z]*val)[a-z]{1,5}\b/i
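To see the difference on the sample text, here is a short Python sketch that runs the pattern from the question next to the lookahead version. The variable names are only for illustration.

```python
import re

text = ("In computer science, a value is an expression which cannot be "
        "evaluated any further (a normal form). Val is also a match")

# Pattern from the question: {1,4} repeats the whole group, it does not
# limit the number of characters.
original = re.compile(r"\b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b")
# Lookahead version: the lookahead requires "val", \w{1,5} caps the length.
fixed = re.compile(r"(?i)\b(?=\w*val)\w{1,5}\b")

original_matches = [m.group(0) for m in original.finditer(text)]
fixed_matches = fixed.findall(text)

print(original_matches)
print(fixed_matches)
```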

It's important to understand why the "evaluated" was matched. Note:
[a-zA-Z]* matches the "e"
(?i:val) matches "val"
[a-zA-Z]* matches "uated"
Note that there is actually no repetition here: the pattern matched in a single iteration.
You can achieve what you want using lookarounds, but I think regex is not the best tool for this task. I highly recommend using other functions, depending on what you have available.
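As one possible reading of "other functions", here is a plain Python sketch without any regex. Splitting on whitespace and trimming punctuation is a simplifying assumption about what counts as a word; max_len and needle are illustrative names.

```python
import string

text = ("In computer science, a value is an expression which cannot be "
        "evaluated any further (a normal form). Val is also a match")

max_len = 5
needle = "val"

# Split on whitespace, then trim surrounding punctuation such as "(" or ")."
words = [w.strip(string.punctuation) for w in text.split()]

# Keep words that are short enough and contain the needle, ignoring case.
short_hits = [w for w in words if len(w) <= max_len and needle in w.lower()]
print(short_hits)
```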

regex word boundary excluding the hyphen

I need a regex that matches an expression ending with a word boundary, but which does not consider the hyphen as a boundary.
That is, get all expressions matched by
type ([a-z])\b
but do not match e.g.
type a-1
To rephrase: I want an equivalent of the word boundary operator \b which, instead of using the word character class [A-Za-z0-9_], uses the extended class [A-Za-z0-9_-].

You can use a lookahead for this, the shortest would be to use a negative lookahead:
type ([a-z])(?![\w-])
(?![\w-]) would mean "fail the match if the next character is in \w or is a -".
Here is an option that uses a normal lookahead:
type ([a-z])(?=[^\w-]|$)
You can read (?=[^\w-]|$) as "only match if the next character is not in the character class [\w-], or this is the end of the string".
See it working: http://www.rubular.com/r/NHYhv72znm
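The same comparison can be sketched in Python. The sample strings and the hits helper are made up for illustration; the three patterns are the ones from this thread.

```python
import re

samples = ["type a", "type a-1", "type a, b", "type ab"]

plain_boundary = re.compile(r"type ([a-z])\b")      # treats "-" as a boundary
negative = re.compile(r"type ([a-z])(?![\w-])")     # negative lookahead
positive = re.compile(r"type ([a-z])(?=[^\w-]|$)")  # positive lookahead

def hits(pattern):
    # Return the samples in which the pattern is found somewhere.
    return [s for s in samples if pattern.search(s)]

print(hits(plain_boundary))
print(hits(negative))
print(hits(positive))
```

Only the plain \b version accepts "type a-1"; both lookahead versions skip it and otherwise agree on these samples.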

I had a pretty similar problem except I didn't want to consider the '*' as a boundary character. Here's what I did:
\b(?<!\*)([^\s\*]+)\b(?!\*)
Basically, if you're at a word boundary, look back one character and don't match if the previous character was an '*'. If you're in the middle, don't match on a space or asterisk. If you're at the end, make sure the end isn't an asterisk. In your case, I think you could use \w instead of \s. For me, this worked in these situations:
*word
wo*rd
word*
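A quick Python check of this idea, as a sketch. It assumes the final lookahead is meant to be (?!\*) with the asterisk escaped, since an unescaped * there is not valid regex syntax; the sample sentence is made up.

```python
import re

# A word is only matched when no "*" touches it on either side or inside it.
pattern = re.compile(r"\b(?<!\*)([^\s*]+)\b(?!\*)")

starred = ["*word", "wo*rd", "word*"]
starred_hits = [s for s in starred if pattern.search(s)]

plain_hits = pattern.findall("a word here")

print(starred_hits)
print(plain_hits)
```

None of the three starred inputs produces a match, while ordinary words are still found.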

Regular Expression Word Boundary and Special Characters

I have a regular expression to escape all special characters in a search string. This works great, however I can't seem to get it to work with word boundaries. For example, with the haystack
add +
or
add (+)
and the needle
+
the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?
Using
add (plus)
as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.

\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
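The positions can be checked with a few lines of Python. This is a sketch using Python's re module rather than the JavaScript literals from the question; the behavior of \b is the same here, and the "a+b" input is an extra example I added.

```python
import re

# Every position where \b matches in "add +": before "a" and after the second "d".
boundaries = [m.start() for m in re.finditer(r"\b", "add +")]

plain = re.search(r"\+", "add +")          # finds the "+"
bounded = re.search(r"\b\+", "add +")      # no boundary between " " and "+"
word = re.search(r"\bplus", "add (plus)")  # boundary between "(" and "p"
adjacent = re.search(r"\b\+", "a+b")       # boundary between "a" and "+"

print(boundaries, plain, bounded, word, adjacent)
```

So \b\+ only matches when the + directly follows a word character, as in "a+b".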

Try changing it to:
/\b\s?\+/gi
Edit:
Extend this concept as far as you want. If you want the first + after any word boundary:
/\b[^+]*\+/gi
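Here is how the two suggestions behave on the haystacks from the question, sketched in Python. It assumes the + in both patterns is meant as an escaped literal (\+); the variable names are illustrative.

```python
import re

# Word boundary, an optional whitespace character, then a literal "+".
space_then_plus = re.compile(r"\b\s?\+")
# Word boundary, any run of non-"+" characters, then a literal "+".
first_plus = re.compile(r"\b[^+]*\+")

haystacks = ["add +", "add (+)"]

def found(pattern, s):
    # Return the matched text, or None when there is no match.
    m = pattern.search(s)
    return m.group(0) if m else None

space_results = [found(space_then_plus, s) for s in haystacks]
first_results = [found(first_plus, s) for s in haystacks]

print(space_results)
print(first_results)
```

The first pattern handles "add +" but not "add (+)", because the "(" sits between the boundary and the +. The second pattern matches in both cases, but the match includes everything from the word boundary up to the +.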

Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.
