Extract Names from Field (Regex) - regex

I'm trying to extract First and Last name from a string that looks like this:
CN=First\, Last,OU=Standard users,OU=Users,OU=Place,OU=DOMAIN,DC=dfe,DC=stuff,DC=asdf
([^CN=,\\])([a-zA-Z]*)?(?!OU)
My attempt is above, but it obviously doesn't work.
Can anyone point me in the correct direction?
Thanks

You can use this expression:
^CN=(.*?)\\, (.*?),
Live demo. It uses two capturing groups for the first and last names; the static text around them is matched literally.

CN=(.*)\\, ([^,]*),
This should capture First as everything up to the \, and Last as everything up to the next comma.

You could come up with:
CN=(?P<first>[^,]+),\s*(?P<last>[^,]+)
See a demo on regex101.com.

In the regex you tried, the [^CN=,\\] part does exactly the reverse of what you need: it matches any single character except C, N, =, , and \.
You can use:
^CN=[^,]+,\s+[^,]+

^CN=(.+)\\, (.+?),
This will get you First in the first capture group and Last in the second capture group, assuming every line follows the same pattern: it starts with CN=, then the first name, then \,, then the last name, followed by ,. Note that First and Last will not be limited to letters.
When you put the ^ inside brackets as you attempted, [^CN=,\\], you are telling the regex engine to match any single character except C, N, =, , and \.
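To make the capture groups concrete, here is a minimal Python sketch using the ^CN=(.*?)\\, (.*?), pattern from the answers above. The function name extract_name is just an illustration, and the sketch assumes the field always has the CN=First\, Last,... shape shown in the question.

```python
import re

# Pattern taken from the answers above: "CN=", the first name, an escaped
# comma ("\,"), a space, then the last name up to the next comma.
DN_PATTERN = re.compile(r"^CN=(.*?)\\, (.*?),")


def extract_name(dn):
    """Return (first, last) from the field, or None if it does not match."""
    match = DN_PATTERN.match(dn)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = r"CN=First\, Last,OU=Standard users,OU=Users,OU=Place,OU=DOMAIN,DC=dfe,DC=stuff,DC=asdf"
print(extract_name(sample))  # ('First', 'Last')
```

The raw string (r"...") keeps the backslash in the sample literal, and \\ in the pattern matches that single backslash.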

Related

Regex for last point or comma in string

I'm looking for a regex which gives me the last occurrence of a point or a comma in a string (whichever one is further back).
From what I googled myself I got this one, ([,\.])(?!.*\1), but it gives me two results per string, which isn't what I wanted.
I hope somebody can help me, as I'm struggling to find the correct keyword to google for.
Cheers and thank you very much in advance
The pattern ([,\.])(?!.*\1) matches a comma or dot and asserts that the same character (the backreference \1) does not occur again to its right.
That condition can hold once for a dot and once for a comma, so you can get 2 matches.
If you want a single match, you can match one of them and assert no more occurrences of either one of them to the right using a negated character class [^,.\r\n]* matching any char except the listed characters.
Note that you don't need the capture group for a match only.
[,.](?=[^,.\r\n]*$)
See a regex demo.
In this regex pattern the first capturing group is the last comma or point in each line.
We achieve this by looking for a comma or point and not allowing a point, comma or line ending between it and the end of the line.
([.,])[^.,\n\r]*$
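As a quick check of the difference, here is a small Python sketch comparing the pattern from the question with the lookahead version from the first answer. The helper name last_separator and the sample strings are made up for illustration.

```python
import re

# Pattern from the question: each character only checks for a later copy of
# ITSELF, so the last comma and the last dot can both match.
ORIGINAL = re.compile(r"([,\.])(?!.*\1)")

# Lookahead version from the answer: a comma or dot followed only by
# characters that are neither comma, dot, nor line break, up to the end.
LAST_SEPARATOR = re.compile(r"[,.](?=[^,.\r\n]*$)")


def last_separator(text):
    """Return (index, character) of the last comma or dot, or None."""
    match = LAST_SEPARATOR.search(text)
    if match is None:
        return None
    return match.start(), match.group()


print(ORIGINAL.findall("1,234.56"))     # [',', '.']
print(last_separator("1,234.56"))       # (5, '.')
print(last_separator("1.234,56"))       # (5, ',')
print(last_separator("no separators"))  # None
```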

Regex to find if all the characters in a word are the same specific character

I have a set of words coming in one by one like aa, ##, ???, ~~~, ?~ etc
I need a regex to find out whether any of these words contains only ? or only ~.
Of the above input examples, ??? and ~~~ should match but not the others.
I tried ^[\s?]*$ and ^[\s~]*$ separately and it works, I am trying to combine them.
^[\s?||~]*$ doesn't work as it also recognizes ?~ as valid.
Any help?
You can use this regex, which looks for a string starting with a ~ or a ?, and then asserts that every other character in the string is the same as the first one using a backreference (\1):
^([~?])\1+$
Demo on regex101
You need to use a backreference to achieve your desired result.
If you want only ~ or ? use
^([~?])\1+$
If you want any repetitive pattern, use
^(.)\1+$
Explanation: (.) or ([~?]) captures the first character.
Then \1+ requires that same character one or more times (a backreference).
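You want to match lines that consist entirely of tildes or entirely of question marks. In flavors where the grouping parentheses and the vertical bar for 'or' need to be backslash escaped (GNU basic regular expressions, for example), ^\(~\|?\)*$ is close, but because the * applies to the whole group it also accepts mixed words such as ?~. Moving the repetition inside each alternative, ^\(~~*\|??*\)$, avoids that.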
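Here is a short Python sketch of the backreference approach, run against the sample words from the question. One thing to be aware of: \1+ requires the character at least twice, so a single ? or ~ is rejected. Whether a one-character word should count is not stated in the question; the \1* variant below is an assumption for the case where it should.

```python
import re

# From the answers above: capture one ~ or ?, then require the same
# character to repeat (at least once more) until the end of the word.
SAME_CHAR = re.compile(r"^([~?])\1+$")

# Assumed variant: \1* also accepts a word that is a single ~ or ?.
SAME_CHAR_OR_SINGLE = re.compile(r"^([~?])\1*$")

words = ["aa", "##", "???", "~~~", "?~", "?"]

print([w for w in words if SAME_CHAR.match(w)])            # ['???', '~~~']
print([w for w in words if SAME_CHAR_OR_SINGLE.match(w)])  # ['???', '~~~', '?']
```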

regex: literal followed by one or more digit

I would like to search for the literal US followed by a digit, that is repeated one or more times, followed by anything except a dash. For example, these should match:
US3.
US22?
US134!
while these don't
US5-
US66-
US789-
I have tried
r'US[0-9]+(?=[^-])'
but it also matches
'US6', 'US78'
How do I modify this?
List the characters that are allowed after the digits in a character class.
Regex: US\d+[?!&%.]
Regex101 Demo
I'm not sure if this is the absolute best way to go about doing this, but if you know that there is only one of these per line and that it sits at the end of the line, you could just add a $ to the positive lookahead like so:
US[0-9]+(?=[^-]$)
Regexr.com example
You have to list the allowed characters in a character class:
US[0-9]+[?!&%.]
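The reason the original attempt matches US6 and US78 is backtracking: [0-9]+ gives back one digit, and the lookahead is then satisfied by that digit instead of the dash. The Python sketch below shows this next to the character-class fix from the answers. The third pattern, a negative lookahead that forbids a further digit or a dash, is an alternative that does not come from the answers; note that it would also accept US followed by digits at the very end of the string.

```python
import re

samples = ["US3.", "US22?", "US134!", "US5-", "US66-", "US789-"]

# Attempt from the question: [0-9]+ can backtrack and give up a digit,
# so the lookahead sees that digit rather than the dash.
attempt = re.compile(r"US[0-9]+(?=[^-])")

# Fix from the answers: the allowed trailing character is consumed.
with_class = re.compile(r"US[0-9]+[?!&%.]")

# Alternative sketch (not from the answers): after the digits, neither
# another digit nor a dash may follow.
with_negative_lookahead = re.compile(r"US[0-9]+(?![0-9-])")


def matches(pattern):
    """Return the matched text for every sample the pattern finds a match in."""
    found = []
    for sample in samples:
        m = pattern.search(sample)
        if m:
            found.append(m.group())
    return found


print(matches(attempt))                  # ['US3', 'US22', 'US134', 'US6', 'US78']
print(matches(with_class))               # ['US3.', 'US22?', 'US134!']
print(matches(with_negative_lookahead))  # ['US3', 'US22', 'US134']
```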

Replace duplicates Items from a string using Regex

I have a string which looks something like this
xyz 123;abc;xyz 123;efg;
I want to remove the duplicates and keep only one occurrence in the string. I want the output to be like this
xyz 123;abc;efg;
I tried using (?<=;|^)([^;]*);(\1)+(?=;|$) but couldn't figure out how to remove one of the duplicates. Any suggestions ?
Brief
Since you didn't specify a language, I'll assume the tokens in your original regex are all working in whatever language you're using.
Code
See regex in use here
(([^;]*;).*)\2
Replace with \1
Explanation
(([^;]*;).*) Capture the following into capture group 1
  ([^;]*;) Capture the following into capture group 2
    [^;]* Match any character except the semi-colon character ; any number of times
    ; Match the semi-colon character literally
  .* Match any character any number of times
\2 Matches the same text as most recently matched by the second capture group
Thanks, all, for your suggestions. I finally got this working with this regex:
(?<=,|^)([^,]*)(?=.*\\b\\1\\b)(?=,|$)
The below is for Java.
For duplicate words (consecutive or scattered) you can use the regex string
\b(\w+)\b(?=.*?\b\1\b)
For duplicate characters (consecutive or scattered) in a string you can use
(.)(?=.*?\1)

Regex to match url with or without 'folder'

I'm struggling to get the right regex to match the following;
content/foo/B6128/8918/foo+bar+foo
OR
content/foo/B6128/8918/foo+bar+foo/randomstringnumsletters
I'm sure this isn't that complicated and I'm nearly there, just can't get it perfected. Here's what I've tried;
content\/(\w+)\/(\w+)\/(\d+)\/([^\/]+[\w]+)\/?(\w*)$
using this online tester: http://regex101.com/r/sB8rR5/2
It still matches a 5th item with this string: content/foo/B6128/8918/foo+bar+foo
Technically this pattern does match both URL structures, but I don't want it to match the 5th item when there's no randomstringnumsletters present.
After playing around with it for a bit, I do realise some elements are redundant with what I've tried, but I'm not getting anywhere with it...
Just make the last capturing group optional, and change \w* to \w+ inside it in order to prevent an empty string from being captured by the 5th group.
content\/(\w+)\/(\w+)\/(\d+)\/([^\/]+[\w]+)\/?(\w+)?$
DEMO
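To see the effect of the optional group, here is a small Python sketch that runs the pattern from this answer against both URLs from the question. In Python the forward slashes do not need escaping, so they are written plainly here; the name URL_PATTERN is just for illustration.

```python
import re

# Pattern from the answer above. The fifth group is optional, so it is
# None (rather than an empty string) when the trailing segment is absent.
URL_PATTERN = re.compile(r"content/(\w+)/(\w+)/(\d+)/([^/]+[\w]+)/?(\w+)?$")

urls = (
    "content/foo/B6128/8918/foo+bar+foo",
    "content/foo/B6128/8918/foo+bar+foo/randomstringnumsletters",
)

for url in urls:
    match = URL_PATTERN.search(url)
    print(match.groups() if match else None)
# ('foo', 'B6128', '8918', 'foo+bar+foo', None)
# ('foo', 'B6128', '8918', 'foo+bar+foo', 'randomstringnumsletters')
```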
Looks like your REAL pattern should be:
content\/((?:\w+\/?)+)
DEMO
or am I wrong? This will match the whole string (after content/) and return it all / delimited. You can parse each variable from there.
You can take each part as an array, then take the part that you need...
DEMO