Reverse replace all using character and word - regex

I found this pattern in another post:
Pattern p = Pattern.compile("[^xyz]"); s.replaceAll(p.pattern(), "-");
It replaces every character except x, y and z.
How can we adapt it to also keep a whole word in this reverse replace-all? For example, I'd like to keep x, y, z and the word dog.
Example:
"abcxyzabcxyzdfrdogdzx" --> "xyzxyzdogzx"
Thanks

For this, you'll need to capture words that you want globally, and join the matches together.
In JS:
/([xyz]|dog)/g
Breakdown:
[xyz] - any character from the list x, y, z
dog - match dog literally
([xyz]|dog) - capture the list, or dog
/g - the global modifier
let string = "abcxyzabcxyzdfrdogdzx",
regex = /([xyz]|dog)/g,
whatWeAreLookingFor = string.match(regex).join("");
console.log(whatWeAreLookingFor);
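The same match-and-join idea carries over to other regex flavors. Here is a minimal sketch in Python; the helper name keep_only and its parameters are made up for illustration, and it assumes whole words should be tried before single characters:

```python
import re

def keep_only(text, chars="xyz", words=("dog",)):
    """Keep only the listed words and single characters, dropping the rest."""
    # Try whole words first, then fall back to the single-character class.
    alternatives = [re.escape(w) for w in words] + ["[" + re.escape(chars) + "]"]
    pattern = "|".join(alternatives)
    # With no capture groups, findall returns the full text of each match.
    return "".join(re.findall(pattern, text))

print(keep_only("abcxyzabcxyzdfrdogdzx"))  # xyzxyzdogzx
```

Putting the words ahead of the character class matters only when a word starts with one of the kept characters, but it is a safe default.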

Related

Match everything but a given string and do not match single characters from that string

Let's start with the following input.
Input = 'blue, blueblue, b l u e'
I want to match everything that is not the string 'blue'. Note that blueblue should not match, but single characters should (even if they appear in the match string).
If I then replace the matches with an empty string, it should return:
Result = 'blueblueblue'
I have tried with [^\bblue\b]+
but this matches the last four single characters 'b', 'l','u','e'
Another solution:
(?<=blue)(?:(?!blue).)+(?=blue|$)|^(?:(?!blue).)+(?=blue|$)
If your regex engine supports the \K escape (which resets the start of the reported match), then we can try:
/blue\K|.*?(?=blue|$)/gm
This pattern says to match:
blue match "blue"
\K but then forget that match
| OR
.*? match anything else until reaching
(?=blue|$) the next "blue" or the end of the string
Edit:
In JavaScript, which does not support \K, we can try the following replacement:
var input = "blue, blueblue, b l u e";
var output = input.replace(/blue|.*?(?=blue|$)/g, (x) => x != "blue" ? "" : "blue");
console.log(output);
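Python's standard re module does not support \K either, so a callback-based replacement is one way to get the same effect there. The sketch below uses a slightly different pattern from the one above: it matches the word in a capture group or else any single character, and keeps only the group. The function name keep_word is hypothetical.

```python
import re

def keep_word(text, word="blue"):
    """Remove everything except whole occurrences of `word`."""
    # Group 1 matches the word; the fallback `.` consumes one other character.
    pattern = re.compile("(" + re.escape(word) + ")|.", re.DOTALL)
    # Keep the match when group 1 took part in it, otherwise drop it.
    return pattern.sub(lambda m: m.group(1) or "", text)

print(keep_word("blue, blueblue, b l u e"))  # blueblueblue
```

Because the word alternative is tried first at every position, the stray characters b, l, u, e are removed only when they do not spell out the full word.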

Remove given string from both start and end of a word

Data :
col 1
AL GHAITHA
AL ASEEL
EMARAT AL
LOREAL
ISLAND CORAL
SUNRISE SUPERMARKET
My code :
import re

def remove_words(df, col, letters):
    regular_expression = '^' + '|'.join(letters)
    df[col] = df[col].apply(lambda x: re.sub(regular_expression, "", x))
Desired output :
col 1
GHAITHA
ASEEL
EMARAT
LOREAL
ISLAND CORAL
SUNRISE
Function call :
letters = ['AL','SUPERMARKET']
remove_words(df=df, col='col 1', letters=letters)
Basically, I want to remove the given words when they appear at either the start or the end (note: each must be a separate word).
For example, "EMARAT AL" should become "EMARAT".
Note that "LOREAL" should not become "LORE".
Code to build the df:
import pandas as pd

raw_data = {'col 1': ['AL GHAITHA', 'AL ASEEL', 'EMARAT AL', 'LOREAL UAE',
                      'ISLAND CORAL', 'SUNRISE SUPERMARKET']}
df = pd.DataFrame(raw_data)
You may use
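pattern = r'(?s)^(?:{0})\b|(.*)\b(?:{0})$'.format("|".join(map(re.escape, letters)))
df['col 1'] = df['col 1'].str.replace(pattern, r'\1', regex=True).str.strip()
The r'(?s)^(?:{0})\b|(.*)\b(?:{0})$'.format("|".join(map(re.escape, letters))) expression creates a pattern like (?s)^(?:word1|word2)\b|(.*)\b(?:word1|word2)$, which matches any of the words as a whole word at the start or at the end of the string. The non-capturing group (?:...) keeps the anchors and word boundaries applied to every alternative, and regex=True is needed in pandas versions where str.replace treats the pattern as a literal string by default.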
When checking the word at the end of the string, the whole text before it will be captured into Group 1, hence the replacement pattern contains the \1 placeholder to restore that text in the resulting string.
If your letters list contains items only composed with word chars you may omit map with re.escape, replace map(re.escape, letters) with letters.
The .str.strip() will remove any resulting leading/trailing whitespaces.
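To try the idea outside pandas, here is a minimal sketch using only the standard re module. It wraps the alternation in a non-capturing group so that the anchors and word boundaries apply to every word in the list. The names strip_words and names are illustrative, not part of the answer above.

```python
import re

letters = ["AL", "SUPERMARKET"]
alternation = "|".join(map(re.escape, letters))
# (?s) lets . match newlines; (?:...) scopes the alternation to the words only.
pattern = re.compile(r"(?s)^(?:{0})\b|(.*)\b(?:{0})$".format(alternation))

def strip_words(value):
    # \1 restores the text captured before a trailing word. For a leading
    # word the group is unset and expands to an empty string (Python 3.5+).
    return pattern.sub(r"\1", value).strip()

names = ["AL GHAITHA", "AL ASEEL", "EMARAT AL", "LOREAL UAE",
         "ISLAND CORAL", "SUNRISE SUPERMARKET"]
cleaned = [strip_words(n) for n in names]
print(cleaned)
# ['GHAITHA', 'ASEEL', 'EMARAT', 'LOREAL UAE', 'ISLAND CORAL', 'SUNRISE']
```

The same compiled pattern can be passed to a pandas string method if a DataFrame column is involved; that step is left out here to keep the sketch dependency-free.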

Grab first 4 characters of two words RegEx

I would like to grab the first 4 characters of two words using RegEx. I have some RegEx experience; however, a search did not yield any results.
So if I have Awesome Sauce I would like the end result to be AwesSauc
Use the Replace Text action with the following parameters:
Pattern: \W*\b(\p{L}{1,4})\w*\W*
Replacement text: $1
Pattern details:
\W* - 0+ non-word chars (trim from the left)
\b - a leading word boundary
(\p{L}{1,4}) - Group 1 (later referred to via $1 backreference) matching any 1 to 4 letters (incl. Unicode ones)
\w* - any 0+ word chars (to match the rest of the word)
\W* - 0+ non-word chars (trim from the right)
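The pattern details above can be tried out in a few lines of Python. This is only a sketch: the standard re module has no \p{L}, so the class [^\W\d_] is used here as a stand-in for "any letter", and the function name first_four is made up for the example.

```python
import re

def first_four(text):
    """Keep up to the first four letters of each word and drop the rest."""
    # [^\W\d_] = word characters minus digits and underscore, i.e. letters.
    return re.sub(r"\W*\b([^\W\d_]{1,4})\w*\W*", r"\1", text)

print(first_four("Awesome Sauce"))  # AwesSauc
```

Each match swallows one whole word plus the surrounding non-word characters, and the replacement puts back only Group 1.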
I think this RegEx should do the job
// requires: using System; using System.Text.RegularExpressions;
string pattern = @"\b\w{4}";
var text = "The quick brown fox jumps over the lazy dog";
Regex regex = new Regex(pattern);
var match = regex.Match(text);
// a failed match has no captures, which ends the loop
while (match.Captures.Count != 0)
{
    foreach (var capture in match.Captures)
    {
        Console.WriteLine(capture);
    }
    match = match.NextMatch();
}
// outputs:
// quic
// brow
// jump
// over
// lazy
Alternatively you could use patterns like:
\b\w{1,4} => The, quic, brow, fox, jump, over, the, lazy, dog
\b[\w|\d]{1,4} => same matches here; note that \w already includes digits, and | inside a character class is a literal pipe, not alternation
Update:
added a full example for C# and modified the pattern slightly. Also added some alternative patterns.
One approach with LINQ:
var res = new string(input.Split().SelectMany((x => x.Where((y, i) => i < 4))).ToArray());
Try this expression
\b[a-zA-Z0-9]{1,4}
Using regex would in fact be more complex and totally unnecessary for this case. Just use either of the approaches below.
var sentence = "Awesome Sau";
// With LINQ
var linqWay = string.Join("", sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries).Select(x => x.Substring(0, Math.Min(4,x.Length))).ToArray());
// Without LINQ
var oldWay = new StringBuilder();
string[] words = sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries);
foreach (var word in words) {
    oldWay.Append(word.Substring(0, Math.Min(4, word.Length)));
}
Edit:
Updated code based on @Dai's comment. Math.Min check borrowed as is from his suggestion.
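For comparison, the split-and-truncate approach is just as short in Python. This is a sketch of the same idea, not a translation of the C# API; the function name first_four_no_regex is hypothetical.

```python
def first_four_no_regex(sentence, n=4):
    """Join the first n characters of every whitespace-separated word."""
    # split() with no arguments splits on runs of whitespace and drops empties.
    # Slicing never raises on short words, so no length check is needed.
    return "".join(word[:n] for word in sentence.split())

print(first_four_no_regex("Awesome Sau"))  # AwesSau
```

Slicing plays the role of the Math.Min check: a word shorter than four characters is kept whole.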

Regex to match repeated pattern after a string

I need a regex that extracts a pattern after a specific word (here, Limits::).
I have a test string; the text of interest is always between the delimiters !Limits:: and !:
*ksjfl kfj sdfasdfaf dfasf asd sdf a dfasd fdaf ad f afdfaf dfad bla bla ksfajs ldsfskj !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2,.hier repeated pattern..,WLon/WHin/CLon/CHin!
fasdfakl skdfkas sflas fasf sdf afasf
I want only the words:
WLo1
WHi1
WHi1
WHi1
WLo2
WHi2
WHi
WHi2
.
.
.
WLon
WHin
CLon
CHin
I have tried (?:!\w+::(?:(\w+)/(\w+)/(\w+)/(\w+)))|(?:,(\w+)/(\w+)/(\w+)/(\w+))+.*!, without success.
Regular expressions:
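/(W.*|C.*)(?=\/|!|,)/g : greedily match from the first W or C up to the last position that is followed by /, ! or ,
/\/|,.*(?=,)|,/ : split the string returned by the first RegExp on /, on a comma plus everything up to the last comma, or on a single comma. Note that the middle alternative is greedy, so it also drops any groups that sit between the first and the last comma.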
var str = "*ksjfl kfj sdfasdfaf dfasf asd sdf a dfasd fdaf ad f afdfaf dfad bla bla ksfajs ldsfskj !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2,.hier repeated pattern..,WLon/WHin/CLon/CHin! fasdfakl skdfkas sflas fasf sdf afasf";
var res = str.match(/(W.*|C.*)(?=\/|!|,)/g)[0].split(/\/|,.*(?=,)|,/);
document.body.textContent = res.join(" ")
I don't know what the ending delimiter is, so if it matters, update your question and I'll amend this expression:
/(?<=Limits::)(?:(.+?)\/)+/i
Searches for Limits::, then repeated strings each ending with /. Note that a repeated capturing group retains only its last iteration, so group 1 holds only the last word matched this way.
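An alternative that avoids repeated groups is to work in two steps: first isolate the text between !Limits:: and the closing !, then split it on the separators. The sketch below assumes the closing delimiter is a single ! and that only tokens made of word characters are wanted; the function name extract_limits and the sample string are illustrative.

```python
import re

def extract_limits(text):
    """Return the words between '!Limits::' and the next '!'."""
    section = re.search(r"!Limits::(.*?)!", text, re.DOTALL)
    if section is None:
        return []
    # Split on both separators, then keep only pure word tokens,
    # which discards any filler text between the groups.
    tokens = re.split(r"[,/]", section.group(1))
    return [t for t in tokens if re.fullmatch(r"\w+", t)]

sample = ("bla bla !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2,"
          "WLon/WHin/CLon/CHin! more text")
print(extract_limits(sample))
```

Splitting after extraction keeps every group, however many times the four-part pattern repeats.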

Regex to remove bracket but not its contents

I would like to remove constant text and brackets from a String using regex. e.g.
INPUT                     Expected OUTPUT
var x = CONST_STR(ABC)    var x = ABC
var y = CONST_STR(DEF)    var y = DEF
How to achieve this?
Try with regex:
(?<==\s)\w+\(([^)]+)\)
which means:
(?<==\s) - lookbehind for = and a whitespace character (\s); the match must come right after these characters,
\w+ - one or more word characters (A-Za-z_0-9); this could be tightened to the identifier rules of the language you are processing,
\( - opening bracket,
([^)]+) - capturing group (\1 or $1) for one or more characters, not closing brackets,
\) - closing bracket,
Just remember that in a Java string literal you need to double the escape character \ (write \\),
like in:
public class Test {
    public static void main(String[] args) {
        System.out.println("var y = CONST_STR(DEF)".replaceAll("(?<==\\s)\\w+\\(([^)]+)\\)", "$1"));
    }
}
with output:
var y = DEF
The $1 in replaceAll() refers to capture group 1, that is, the text captured by this fragment of the regex: ([^)]+), the contents of the brackets.
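The same pattern works unchanged in Python, where a raw string removes the need for doubled backslashes and the group is referenced as \1. This is a minimal sketch; the function name unwrap_call is made up for the example.

```python
import re

def unwrap_call(line):
    """Replace NAME(contents) after '= ' with just the contents."""
    # The lookbehind is fixed-width (two characters), which re requires.
    return re.sub(r"(?<==\s)\w+\(([^)]+)\)", r"\1", line)

print(unwrap_call("var x = CONST_STR(ABC)"))  # var x = ABC
print(unwrap_call("var y = CONST_STR(DEF)"))  # var y = DEF
```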