regex for match inside a word - regex

Say I have the following similar strings:
_startOneEnd
_startTwoEnd
_startThreeEnd
I want to match on:
begins with _start
ends with End
and I want to capture the bit in between, e.g., One, Two, Three in the strings above.
Can anyone suggest a regex to capture this?

If each line of input contains only the text similar to your examples, something like this should work:
/^_start(.*)End$/
The ^ anchors the pattern to the start of the string. The $ anchors it to the end of the string. The parentheses capture the middle part.
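As a rough illustration in Python (the answer does not name a language, so the re module and the helper name middle are assumptions made here), the anchored pattern could be applied like this:

```python
import re

# Anchored pattern from the answer above; group 1 is the text between the markers.
PATTERN = re.compile(r"^_start(.*)End$")

def middle(text):
    """Return the captured middle part, or None if the text does not match."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

for sample in ["_startOneEnd", "_startTwoEnd", "_startThreeEnd", "x_startOneEnd"]:
    print(sample, "->", middle(sample))
```

The last sample is rejected because ^ requires _start to be at the very beginning of the string.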

In C#, you may use this:
(?<=_start).*(?=End)
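Python's re module also accepts this lookaround form, because the lookbehind (?<=_start) has a fixed width. A minimal sketch, assuming Python rather than C# (the helper name between_markers is made up for illustration); here the whole match is the middle part, so no capture group is needed:

```python
import re

# The lookbehind and lookahead assert the markers without including them in the match.
LOOKAROUND = re.compile(r"(?<=_start).*(?=End)")

def between_markers(text):
    """Return the text between _start and the last End, or None if absent."""
    m = LOOKAROUND.search(text)
    return m.group(0) if m else None

print(between_markers("_startOneEnd"))    # One
print(between_markers("_startThreeEnd"))  # Three
```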

It isn't clear if the part in the middle may only be the examples given.
If so, use this:
_start((One)|(Two)|(Three))End
If not, and it can be anything, try this:
_start(.*?)End
Note that the match is non-greedy, so it stops at the first End after _start rather than the last one.
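The difference only shows up when one string contains several candidates. A small Python sketch (the sample text is invented for illustration) comparing the greedy and non-greedy forms:

```python
import re

# Hypothetical input with two marked words on one line.
text = "_startOneEnd and _startTwoEnd"

# Greedy: .* runs to the last End, so everything in between is one capture.
greedy = re.findall(r"_start(.*)End", text)

# Non-greedy: .*? stops at the first End after each _start.
lazy = re.findall(r"_start(.*?)End", text)

print(greedy)  # ['OneEnd and _startTwo']
print(lazy)    # ['One', 'Two']
```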

Related

RegEx substract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
But unfortunately without success. Do you have any idea why it's not working?
The regex you tried, ^[^#][^\(]+, would match:
At the beginning of the string, one character that is not a #: ^[^#]
Then one or more characters up to the first opening parenthesis (you don't have to escape the parenthesis inside a character class): [^\(]+
So this would match *DataFromAdHoc, including the *, because * is not a #.
What you could do is capture this part, [^\(]+, in a group: ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
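The question does not say which language is in use, so the following is only a sketch assuming Python's re module. It shows the original attempt next to the version with a capturing group:

```python
import re

text = "*DataFromAdHoc(cbgv)"

# Original attempt: the leading * is consumed by [^#] and is part of the match.
attempt = re.match(r"^[^#][^\(]+", text)

# Same idea, but the part after the first character is captured in group 1.
fixed = re.match(r"^[^#]([^(]+)", text)

print(attempt.group(0))  # *DataFromAdHoc
print(fixed.group(1))    # DataFromAdHoc
```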
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in, please include that in your question.

How to write a regex that starts and ends with a particular string?

I have a string, like this:
{"content":(uint32)123", "id":(uint64)111, "test":{"hi":"(uint32)456"}}
I want to get result:
(uint32)123
(uint64)111
so I write regex like this:
[^(?!\")](\(uint32\)|\(uint64\))(\d)+[^(?!\")$]
but the result is:
:(uint32)123
:(uint64)111,
here the result adds : and ,
I want the match not to begin with " and not to end with ". How should I change my regex?
(\(uint(?:32|64)\)\d+) works for me. It captures the entire string (uint[32/64])<any number of digits> without bothering about the characters that come before or after.
Tested the following one in python
(?<!\")(\(uint32\)|\(uint64\))\d+(?!(\"|\d))
It looks like you were trying to use negative lookahead and negative lookbehind checks, but you made a couple of mistakes:
You put them inside a character class, like this: [^(?!\")]. What that regexp really means is: any single character that is not one of the symbols inside the square brackets (^ stands for "not"). What it should be instead: (?!\"), which means the symbol after the current position must not be a quote (note: this also succeeds if there is no symbol after the current position).
To check the symbol before the current position, you need a negative lookbehind, which has the syntax (?<!some_regexp). So it would be (?<!\")
You don't need checks for the start or end of the line. If you do, you can put them into separate negative lookahead/lookbehind assertions.
Here is corrected example without line start/end checks:
(?<!\")(\(uint32\)|\(uint64\))(\d)+(?!\")(?!\d)
Note: you need to add (?!\d) at the end, because otherwise, when the number is followed by a quote, the regex would backtrack and match everything except the last digit.
Here is example with start/end of line checks:
(?<!^)(?<!\")(\(uint32\)|\(uint64\))(\d)+(?!\")(?!\d)(?!$)
P.S.: depending on the language you are using, you might not need to escape the quote. You only need to escape it when the backslash is a string-literal escape sequence, not a regexp escape sequence.
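To see the effect of the lookarounds, here is a short Python sketch. It assumes a cleaned-up variant of the sample string in which the stray quote after 123 is treated as a typo and removed (with the quote left in, the lookahead would reject that number too). It compares the plain pattern with the lookaround version:

```python
import re

# Assumption: the sample string without the stray quote after 123.
text = '{"content":(uint32)123, "id":(uint64)111, "test":{"hi":"(uint32)456"}}'

# Plain pattern: ignores the surrounding characters, so the quoted value matches too.
plain = re.findall(r"\(uint(?:32|64)\)\d+", text)

# Lookaround pattern from the answer: not preceded by a quote,
# not followed by a quote or by a further digit.
guarded = re.compile(r'(?<!\")(\(uint32\)|\(uint64\))(\d)+(?!\")(?!\d)')
unquoted = [m.group(0) for m in guarded.finditer(text)]

print(plain)     # ['(uint32)123', '(uint64)111', '(uint32)456']
print(unquoted)  # ['(uint32)123', '(uint64)111']
```

finditer with group(0) is used for the second pattern because findall would return the contents of the capture groups instead of the whole match.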

Wrap each matching word with quotation marks

I have several lines in Notepad++ which look similar to this
A8s KQo QTs A9s A9s AJo AJo 99 KQo A5s
What I would like to do is to wrap each word in quotation marks, followed by a comma if possible.
I've tried matching against [A-Za-z\d]{2-3}, but I don't get any matches.
Desired result:
"A8s", "KQo", "QTs", //etc...
What nickb said is true, but you might want to consider adding word boundaries:
\b[A-Za-z0-9]{2,3}\b
Otherwise, if your input also had longer words, like
A8s KQo ABCD 1234
You would get results like
"A8s" "KQo" "ABC"D "123"4
The word boundary makes sure that you can only match entire words.
Because in quantifiers, you need a comma, not a dash:
[A-Za-z\d]{2,3}
            ^
Otherwise, you were literally matching the characters {2-3}, so your current regex would match things like:
A{2-3}
You probably want to wrap this in a capturing group, like this:
([A-Za-z\d]{2,3})
And then replace it with a reference to what was captured, but surrounded by quotes, similar to this:
"$1",
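The same search-and-replace can be tried outside Notepad++. A minimal Python sketch (the function name quote_words is made up for illustration); note that Python's re.sub writes the group reference as \1 rather than $1, and that the word boundaries are included so longer words are left alone:

```python
import re

def quote_words(line):
    """Wrap every 2-3 character alphanumeric word in quotes, followed by a comma."""
    return re.sub(r"\b([A-Za-z\d]{2,3})\b", r'"\1",', line)

print(quote_words("A8s KQo QTs A9s A9s AJo AJo 99 KQo A5s"))
# "A8s", "KQo", "QTs", "A9s", "A9s", "AJo", "AJo", "99", "KQo", "A5s",

print(quote_words("A8s KQo ABCD 1234"))
# "A8s", "KQo", ABCD 1234
```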

Regex — only zero or one 's'

I have a name, "foo bar", and in any string, foo, foos, bar and bars should be matched.
I thought this should work like this: (foo|bar)s?. I tried some other regexes as well, but they all were like this. How can I do this?
(foo|bar)s? is correct...
You should use a boundary like \b(foo|bar)s?\b. Else it would also match hihellofoos.
Your question seems to reflect perplexity over why you found a match in foosss. Note the difference between finding a match in a string, and matching the whole string.
You have several ways of dealing with this, and the right choice depends on your application.
Anchor the regex to the whole input line or input: ^(foo|bar)s?$
Anchor the regex to one word: \b(foo|bar)s?\b
Some APIs (but not preg_match) have a separate function to match the whole string.
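Python is one example of an API with a separate whole-string function. The sketch below assumes Python's re module (the question itself appears to be about preg_match) and contrasts finding a match inside a string with matching the whole string:

```python
import re

pattern = r"(foo|bar)s?"

# search finds a match anywhere, so "foosss" still contains the match "foos".
found = re.search(pattern, "foosss")
print(found.group(0))  # foos

# fullmatch requires the whole string to match, so "foosss" is rejected.
print(re.fullmatch(pattern, "foosss"))  # None
print(re.fullmatch(pattern, "bars").group(0))  # bars

# Word boundaries restrict the match to whole words inside longer text.
bounded = r"\b(foo|bar)s?\b"
print(re.search(bounded, "hihellofoos"))  # None
print(re.search(bounded, "two bars here").group(0))  # bars
```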

Regular Expression Troubles

Given the following type of string:
"#First Thing# #Another One##No Space# Main String #After Main# #EndString#"
I would like to come up with a regular expression that can return all the text surrounded by the # symbols as matches. One of the things giving me grief is the fact that the # symbol is both the opening and closing delimiter. All of my attempts at a regex have just returned the entire string. The other issue is that it is possible for part of the string to not be surrounded by # symbols, as shown by the substring "Main String" above. Does anyone have any ideas? I have toyed around with Negative Look-behind assertion a bit, but haven't been able to get it to work. There may or may not be a space in between the groups of #'s but I want to ignore them (not match against them) if there are. The other option would be to just write a string parser routine, which would be fairly easy, but I would prefer to use a regex if possible.
/((#[^#]+#)|([^#]+))/
Perhaps something like the above will match what you want.
This will match the space in between two hashes. Hmm.
/((#[^#]+#)|([^#]*[^#\s]+[^#]*))/
That will get rid of the nasty space, I think.
[Edit]
I think that this is what you need:
(?<=#)[^#]+?(?=#)
With input #First Thing# #Another One##No Space# Main String #After Main# matches:
First Thing
Another One
No Space
Main String
After Main
The second match (a lone space, so it is not visible in the list above) is the space between Thing# and #Another.
[EDIT] To ignore space:
(?<=)(?!\s+)[^#]+?(?=#)
If you want to ignore trailing spaces:
(?<=)(?!\s+)[^#]+?(?=\s*#)
Try this. The first and last groups should not be captured and the .*? should be lazy
(?:#)(.*?)(?:#)
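A quick way to compare the approaches is to run them on the sample string. This is a sketch assuming Python's re module; the pattern #(.*?)# is equivalent to the one above because a non-capturing group around a single # changes nothing:

```python
import re

text = "#First Thing# #Another One##No Space# Main String #After Main# #EndString#"

# Consuming both delimiters pairs the hashes up, so text between
# two groups (the spaces and "Main String") is skipped.
consumed = re.findall(r"#(.*?)#", text)

# Lookarounds do not consume the hashes, so every run of text that
# sits between any two hashes is reported, including lone spaces.
lookaround = re.findall(r"(?<=#)[^#]+?(?=#)", text)

print(consumed)
# ['First Thing', 'Another One', 'No Space', 'After Main', 'EndString']
print(lookaround)
# ['First Thing', ' ', 'Another One', 'No Space', ' Main String ', 'After Main', ' ', 'EndString']
```

Because the closing # of one group is consumed and cannot double as the opening # of the next, the first form does not need any extra whitespace handling for this input.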
I think this is what you really need:
((#[^#]+#)|([^#]*[^#\s]+[^#]*))
but it will not capture the #'s around Main String