I need to find all words in a sentence that sit between a $ and a space, like this:
this is $abc $cde any $ety.
The result should be abc, cde and ety.
I tried this:
'(?<=$$)(.*)(?=)'
but it produces an error. What is wrong with it, or is there a better approach?
You can try this:
\$(\w+)
Each word will be available in a capturing group.
\w matches a-z, A-Z, 0-9 and _. If you want to match only letters, for instance, you can change it to: \$([a-zA-Z]+)
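A minimal Python sketch of the capturing-group approach, run against the sample sentence from the question. The function name extract_dollar_words is made up for illustration.

```python
import re

def extract_dollar_words(sentence: str) -> list[str]:
    """Return every run of word characters that directly follows a '$'."""
    # With exactly one capturing group, re.findall returns the group contents
    # rather than the whole match, so the '$' itself is dropped.
    return re.findall(r"\$(\w+)", sentence)

print(extract_dollar_words("this is $abc $cde any $ety."))
# prints ['abc', 'cde', 'ety']
```

Because \w stops at the first non-word character, the trailing period after ety is not included.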
Try this RegEx:
(?<=\$)([^\s]*)(?=\s)
Assuming from the question, each word (word contains only chars A-Za-z) must begin with $ and have a space at the end.
The following regex will match such words: \$([A-Za-z]+) followed by a single space (the trailing space is hard to see due to the formatting here). If there can be multiple spaces, append a space followed by + to the end of the regex. Note that the + belongs inside the group; written as ([A-Za-z])+ the group would capture only the last letter of each word.
Then you can extract the first capturing group (i.e. $1) as your matching word, and you need to do this in a loop until there are no more matches. Something like:
while ($x =~ /\$([A-Za-z]+) /g) {
    # $1 is your match
}
If your word contains more than just letters, then you can use \w as mentioned by pcalcao, which also includes 0-9 and _.
I want to replace everything other than letters, spaces, and trailing digits with an empty string. In other words, any digits or spaces that come at the start or in the middle of the string should be replaced with an empty string.
Example
| Input     | Output  |
|-----------|---------|
| Ndd12     | Ndd12   |
| 12Ndd12   | Ndd12   |
| Ndd 12    | Ndd 12  |
| Nav G45up | Nav Gup |
Attempted Code
regexp_replace(df1[col_name], "(^[A-Za-z]+[0-9 ])", "")
You may use:
\d+(?!\d*$)|[^\w\n]+(?!([A-Z]|$))
Explanation:
\d+(?!\d*$): Match 1+ digits that are not followed by 0+ digits and end of line
|: OR
[^\w\n]+(?!([A-Z]|$)): Match 1+ characters that are neither word characters nor newlines and that are not followed by an uppercase letter or the end of line
If you use Python, you can use regular expressions through the re module.
import re
new_string = re.sub(r"[^a-zA-Z0-9]","",s)
Here the leading ^ inside the brackets negates the character class, so the pattern matches every character that is not a letter or digit.
Regular expressions are available in most other languages too, so the same pattern should carry over.
I came up with this regex to capture all characters that you want to remove from the string.
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+
Do
regexp_replace(df1[col_name], "^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+", "")
Explanation:
^\d+ - matches a run of digits at the start of the string.
(?<=\w)\d+(?![\d\s]) - positive lookbehind for a word character and negative lookahead for a digit or whitespace character; matches a run of digits in the middle of a word (for example, the 45 in G45up).
(?<=\s)\s+ - positive lookbehind for a whitespace character, followed by one or more whitespace characters; matches all additional spaces after the first.
Note: This regex could be inefficient when matching large strings, as it uses expensive look-arounds.
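A hedged Python sketch of this idea, checked against the four sample rows from the question. One adjustment is an assumption on my part and not part of the answer: `|$` is added inside the negative lookahead. In Python's re, `(?![\d\s])` also succeeds at the very end of a string, so without it a standalone value such as Ndd12 would lose its trailing digits. The names CLEANUP_RE and clean are made up for illustration.

```python
import re

# Adjusted variant of the pattern above. The "|$" inside the negative
# lookahead is an assumption added here so that digits at the very end of
# the string are kept; it is not part of the original answer.
CLEANUP_RE = re.compile(r"^\d+|(?<=\w)\d+(?![\d\s]|$)|(?<=\s)\s+")

def clean(value: str) -> str:
    """Drop leading digits, digits inside a word, and repeated whitespace."""
    return CLEANUP_RE.sub("", value)

for sample in ["Ndd12", "12Ndd12", "Ndd 12", "Nav G45up"]:
    print(repr(sample), "->", repr(clean(sample)))
```

The same pattern string could be tried with regexp_replace, but regex dialects differ slightly between engines, so it is worth testing there separately.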
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+|(?<=\w)\W|\W(?=\w)|(?<!\w)\W|\W(?!\w)
I have a regex to find URLs in text:
^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$
However, it fails when the URL is surrounded by other text:
https://regex101.com/r/0vZy6h/1
I can't seem to grasp why it's not working.
Possible reasons why the pattern does not work:
^ and $ anchor the pattern, so it only matches when the URL is the entire string.
(?!:\/\/) is a negative lookahead that fails the match if there is a :// substring immediately to the right of the current location. But [a-zA-Z0-9-_]+ already means there can't be any :// at that position, so you most probably wanted to fail the match if :// is present to the left of the current location, i.e. you want a negative lookbehind, (?<!:\/\/).
[a-zA-Z]{2,11}? - once $ is removed, this matches only 2 chars, since {2,11}? is a lazy quantifier, and a lazy pattern at the very end of a regex always matches the minimum number of chars (here, 2).
Use
(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}
See the regex demo. Add \b word boundaries if you need to match the substrings as whole words.
Note that in Python regex there is no need to escape /, so you may replace (?<!:\/\/) with (?<!://).
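A small Python sketch of the corrected pattern applied to a host name embedded in running text. The sample sentence and the names HOST_RE and find_hosts are invented for illustration.

```python
import re

# Pattern from the answer above, with the lookbehind written as (?<!://)
# because "/" needs no escaping in Python.
HOST_RE = re.compile(
    r"(?<!://)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}"
)

def find_hosts(text: str) -> list[str]:
    """Return every substring of text that the pattern matches."""
    # finditer + group(0) yields whole matches; findall would return only
    # the contents of the single capturing group.
    return [m.group(0) for m in HOST_RE.finditer(text)]

print(find_hosts("visit example.com today and sub.example.org later"))
# prints ['example.com', 'sub.example.org']
```

Without the ^ and $ anchors the pattern is free to match inside a longer string, and the greedy {2,11} takes the whole top-level domain rather than only two characters.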
The spaces are not being matched. Try adding space to the character sets checking for leading or trailing text.
Given the following string
span.a.b this.is.really.confusing
I need to return the matches a and b. I've been able to get close with the following regex:
(?<=\.)[\w]+
But it's also matching is, really, and confusing. When I include a positive lookahead I get even closer, but I'm still not there.
(?<=\.)[\w]+(?=\s) # matches b, confusing
How can I match words after a dot until a whitespace occurs?
NB: this is language agnostic pseudo-code, but should work.
regex = "^[^\s.]+\.(\S+).*"
targets = <extracted_group>.split(".")
Regex explanation:
"^": begins with
"[^\s.]+\.": 1 or more non-whitespace, non-period characters, followed by a literal period
"(\S+)": group and capture all of the following non-whitespace characters
".*": matches 0 or more of any non-newline character
If the split function takes a regex instead of a string, you'll need to escape the '.' or use a character class.
NB: You can do it without the split, but I think that the split is more transparent.
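The pseudo-code above could look roughly like this in Python. This is a sketch under the assumption that only the first whitespace-delimited token matters; the function name words_after_first_dot is made up.

```python
import re

def words_after_first_dot(text: str) -> list[str]:
    """Return the dot-separated parts that follow the first segment."""
    # [^\s.]+ consumes the leading segment ("span"), \. the first period,
    # and (\S+) captures everything up to the next whitespace ("a.b").
    match = re.match(r"^[^\s.]+\.(\S+)", text)
    if match is None:
        return []
    # str.split takes a plain string, so "." needs no escaping here.
    return match.group(1).split(".")

print(words_after_first_dot("span.a.b this.is.really.confusing"))
# prints ['a', 'b']
```

The trailing .* from the pseudo-code is dropped because re.match does not need to consume the rest of the string.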
I am not sure if this is good enough for all your possible cases, but it should work with the provided example:
\.([\w]+)\.([\w]+)\s
$1 = a, $2 = b
Assume I have a big paragraph containing words like found, field, failed, fired, killed (so many negative words, I know!).
Now I want to fetch the lines that contain words starting with fi, hi or k and ending with eld or ed.
How would I go about searching for this word pattern in a string?
Keep in mind that I am asking about a word pattern within a string, not a pattern for the whole string.
These two definitely didn't work:
egrep "^(f[ai]|k)+(eld|ed)$"
and
egrep "\<(f|k)+(eld|ed)$\>"
I'll admit I am no regex hulk and am working from a basic understanding, so anyone willing to suggest a better way (with some description) is most welcome too! :)
The regex you are probably looking for would be
"\b([fh]i|k)\w*(eld|ed)\b"
The \w* should be equivalent to [a-zA-Z0-9_]*, so it allows any word characters to appear between the requested prefix and suffix.
The \b ensures that the word really starts and ends with the letters you want. Otherwise you might, for example, match part of a longer word such as Unfired.
You also need to remove $ and ^ from the regex, because $ means end of line and ^ means beginning of line.
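To see the effect of the word boundaries, here is a minimal Python sketch. Python's re supports \b with the same meaning; the groups are made non-capturing so that findall returns whole words. The sample text and the name find_words are invented.

```python
import re

# Same pattern as above, with (?:...) instead of (...) so that
# re.findall returns the full match rather than the group contents.
WORD_RE = re.compile(r"\b(?:[fh]i|k)\w*(?:eld|ed)\b")

def find_words(text: str) -> list[str]:
    """Return words starting with fi, hi or k and ending with eld or ed."""
    return WORD_RE.findall(text)

print(find_words("found field failed fired killed unfired hired"))
# prints ['field', 'fired', 'killed', 'hired']
```

unfired is skipped because there is no word boundary in front of its f, and failed is skipped because it starts with fa rather than fi.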
I'd use
\<(fi|hi|k)[a-zA-Z]*?(eld|ed)\>
to match the words you want.
Demo at regex101 (in the demo, \b plays the same role as \< and \>).
Explanation:
\< #beginning of word
(fi|hi|k) #either fi or hi or k
[a-zA-Z]*? #zero or more of a-z and A-Z, as few as possible (lazy)
(eld|ed) #either eld or ed
\> #end of word
If you want to allow numbers, dashes, underscores, ... within your words, simply add them to the character-class, for example: [a-zA-Z$_] if you want to allow $ and _, too.
You can use word boundary \b.
^.*\b(fi|hi|k)\w*(eld|ed)\b.*$
------------------------
This pattern selects whole lines that contain those words.
NOTE: You need to use the multiline modifier m and the global modifier g.
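In Python the two modifiers map onto re.MULTILINE and a findall call, roughly as in the sketch below. The sample text and the name matching_lines are invented.

```python
import re

# re.MULTILINE makes ^ and $ match at every line start and end (the "m"
# modifier); findall plays the role of the global "g" modifier.
LINE_RE = re.compile(r"^.*\b(?:fi|hi|k)\w*(?:eld|ed)\b.*$", re.MULTILINE)

def matching_lines(text: str) -> list[str]:
    """Return each line that contains at least one matching word."""
    return LINE_RE.findall(text)

sample = "the field was green\nnothing to see\nhe got killed\n"
print(matching_lines(sample))
# prints ['the field was green', 'he got killed']
```

Since . does not match a newline by default, each match stays within a single line.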
I've been working for many hours trying to do a "simple thing": use a regex to validate a text field.
I need to make sure of:
1- Only use (a-z), (A-Z) and (0-9) values
2- Add a SINGLE wildcard only at the end.
Ex.
Match
MICHE*
Match
JAMES
No match
MICHE**
No match
MIC_HEAL*
I have this regex till now:
[a-zA-Z0-9\s-]+.\z*?
The problem is that it still matches when I introduce an invalid character, as long as the string contains a matching substring.
What can I do to force a match on the whole string? What am I missing?
Thx!
Use ^ (start of line) and $ (end of line) to only match the whole string:
^[a-zA-Z0-9\s-]+.\z*?$
(If you have a multiline input you can also use \A and \z - start and end of string)
On a second look, I don't understand the end of your regex: . (any character) followed by \z*? (end of string, zero or more times, matched lazily). This regex will match something like:
Ikdflfdf&
Is that correct? If you only want the character *, you should use:
^[a-zA-Z0-9\s-]+\*?$
Also, as Robbie pointed out, you're including spaces and the - in your list of accepted characters. If you only want letters and digits, a tempting shortcut is \w (word characters):
^\w+\*?$
However, \w also matches the underscore, so this would accept MIC_HEAL*. Depending on whether the matcher is Unicode-aware, \w may also match non-ASCII letters and digits, which may or may not be what you want.
Try this one :
^[a-zA-Z0-9]+\*?$
^ string start
$ string end
* is a metacharacter, so it must be escaped as \* to match a literal asterisk
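A short Python sketch that checks this pattern against the four examples from the question. It uses re.fullmatch, which anchors at both ends, so ^ and $ are left out of the pattern; the name is_valid is made up.

```python
import re

# Letters and digits only, optionally followed by one literal asterisk.
FIELD_RE = re.compile(r"[a-zA-Z0-9]+\*?")

def is_valid(value: str) -> bool:
    """True if the whole value is alphanumeric with at most one trailing '*'."""
    # fullmatch succeeds only if the entire string matches the pattern.
    return FIELD_RE.fullmatch(value) is not None

for sample in ["MICHE*", "JAMES", "MICHE**", "MIC_HEAL*"]:
    print(sample, is_valid(sample))
```

The first two samples are accepted and the last two are rejected, which lines up with the Match / No match list in the question.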
I think you just need ^ at the beginning and $ at the end, and no . before the \* (otherwise any single character is accepted there, including a second *):
^[a-zA-Z0-9\s-]+\*?$
Also, you don't need the \z
Also, you haven't mentioned that you want to allow spaces and dashes - but you have included them in your allowed character set.