match chars, numeric and special chars within a pattern - regex

Within a string I could have the following:
this is a string ::foo:bar:: ::baz:123abc:: ::bäz:üéü:: ::#$%%:4/4::
How can I get all parts that start with :: and end with ::, and match what is in between?
Within those colons there are key, value pairs I need to extract from the string.
If there weren't any special chars, the regex would look like this:
r'::([a-z0-9]+):([a-z0-9]+)::'
I could list those special chars manually, but I don't think that's the right way to do this.
Thx

With not-colon:
r'::([^:]+):([^:]+)::'
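To illustrate the not-colon pattern, here is a minimal Python sketch. Python is an assumption based on the r'...' syntax in the question, and the variable names are made up for the example. Each [^:]+ matches one or more characters that are not a colon, so umlauts and symbols need no special treatment.

```python
import re

text = "this is a string ::foo:bar:: ::baz:123abc:: ::bäz:üéü:: ::#$%%:4/4::"

# [^:]+ means "one or more characters that are not a colon"
pair_pattern = re.compile(r"::([^:]+):([^:]+)::")

# findall returns one (key, value) tuple per match
pairs = pair_pattern.findall(text)
for key, value in pairs:
    print(key, "=", value)
```

Note that keys or values that themselves contain a colon would not be matched by this pattern.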

First you should mention the regex flavor/tool you'd like to use, but generally:
r'::([^:]+)::'
Should capture the special chars as well.
HTH

Related

Regexp - Get everything before two different strings. One can contain both

I have to use regexp.
Current state:
.+?((/=\.czxy)|(?=\.zzzz))
It's working for the first two cases (that's obvious)
So I have decided to do something like this:
.+?((/=\.czxy)|(?=\.zzzz)|(?=\-\-[0-9]))
But this still doesn't work. (There is OR).
I want to have everything before the extension. (Examples 1 and 2)
When the name ends with --1, --2, --3 and so on, I need everything before that. (Examples 3 and 4)
Note: I cannot use an if construct.
Examples:
123_abc_cb1.czxy -> 123_abc_cb1
123_23c_cb1.zzzz -> 123_23c_cb1
123_abc_cb1--1.czxy -> 123_abc_cb1
123_23c_cb1--1.zzzz -> 123_23c_cb1
EDIT:
123_abc_cb1 is a random combination of letters, numbers and special characters, there can be everything.
Your attempt has these issues:
A typo: (/= should be (?=
The regex does not require that the --[0-9] part is still followed by the extension. That part should actually be an optional part that precedes the pattern for the extension.
So change to this:
^.+?(?=(?:--\d)?\.(?:czxy|zzzz))
Or, if matches do not necessarily start at the start of the input/line:
(?<!\S).+?(?=(?:--\d)?\.(?:czxy|zzzz))
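A quick way to check the anchored variant against the four examples from the question, sketched in Python (the choice of language and the variable names are assumptions; the question does not name a regex flavor):

```python
import re

# Lazy .+? stops at the first position where the lookahead succeeds:
# an optional --<digit> followed by one of the two extensions.
stem_pattern = re.compile(r"^.+?(?=(?:--\d)?\.(?:czxy|zzzz))")

names = [
    "123_abc_cb1.czxy",
    "123_23c_cb1.zzzz",
    "123_abc_cb1--1.czxy",
    "123_23c_cb1--1.zzzz",
]

# group(0) is the matched text; the lookahead itself consumes nothing
stems = [stem_pattern.match(name).group(0) for name in names]
print(stems)
```

Since --\d accepts a single digit only, a suffix such as --12 would stay in the match; --\d+ would be needed if multi-digit suffixes can occur.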
You don't need any lookarounds if you can use a capture group. To match letters, digits and underscores you can use, for example, \w to match word characters:
(\w+)(?:--\d+)?\.(?:czxy|zzzz)\b
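A small Python sketch of the capture-group variant (the helper name stem_of is hypothetical). Keep in mind that \w covers only letters, digits and underscore, so it is a narrower fit than the question's EDIT, which allows arbitrary characters in the name.

```python
import re

capture_pattern = re.compile(r"(\w+)(?:--\d+)?\.(?:czxy|zzzz)\b")

def stem_of(name):
    """Return the part before the optional --N suffix and the extension, or None."""
    found = capture_pattern.search(name)
    return found.group(1) if found else None

print(stem_of("123_abc_cb1--1.czxy"))
print(stem_of("123_23c_cb1.zzzz"))
# A name containing a non-word character is only partially captured
print(stem_of("a-b.czxy"))
```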
Why not use the recurring "_cb1"?
/.*_cb1/

Regex to extract multiple substrings

Example string containing one or more comma-separated variables: TR.ASDASD, TU.IOHOUFHAF, XP.FWEFRWE .....
I need to use Regex to extract the characters before the . and end up with a string like this: TR, TU, XP
thanks in advance!
This regex works for what you need:
\..+?(?=,|$)
You need to substitute with nothing (an empty string).
This matches a ., then anything up to a comma or string end.
Example of it working: https://regex101.com/r/cV5hS2/1
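The same substitution as a minimal Python sketch (the language is an assumption, since the question does not say which tool is used):

```python
import re

raw = "TR.ASDASD, TU.IOHOUFHAF, XP.FWEFRWE"

# Remove each dot plus everything after it, up to (but not including)
# the next comma or the end of the string.
prefixes = re.sub(r"\..+?(?=,|$)", "", raw)
print(prefixes)
```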

Remove ending of string with gsub

I have two possible endings for my string. The first with no numbers:
http://www.something.com/test.html
the second with numbers (up to two digits)
http://www.something.com/test-1.html
http://www.something.com/test-2.html
http://www.something.com/test-3.html
http://www.something.com/test-4.html
http://www.something.com/test-15.html
I need to strip the .html from the first case and -1.html (or whatever number) from the second. The idea is to make the two strings comparable to find duplicates.
I think the following should manage the second case
gsub("-[0-9]|[1-9][0-9].html", "", string)
but is it possible to have a function to manage both cases?
You can perhaps use something like this:
(-[0-9]+)?\\.html
Note that it's safer to escape the dot because an unescaped dot will match any character.
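For comparison, the same idea sketched in Python rather than R (the function name comparable is made up for the example). In a Python raw string a single backslash is enough, whereas R string literals need the doubled \\ shown above.

```python
import re

def comparable(url):
    # Optional -<digits> suffix, then a literal ".html"
    return re.sub(r"(-[0-9]+)?\.html", "", url)

urls = [
    "http://www.something.com/test.html",
    "http://www.something.com/test-1.html",
    "http://www.something.com/test-15.html",
]

# All three collapse to the same string, so duplicates become detectable
unique_urls = {comparable(url) for url in urls}
print(unique_urls)
```

Appending $ to the pattern would restrict the replacement to the end of the string, if .html could also appear elsewhere in the URL.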

Replacing part of delimited string with R's regex

I have the following list of strings:
name <- c("hsa-miR-555p","hsa-miR-519b-3p","hsa-let-7a")
For each of the above strings, I want to replace the text after the second delimiter (-) with "zzz".
Yielding:
hsa-miR-zzz
hsa-miR-zzz
hsa-let-zzz
What's the way to do it?
Might as well use something like:
gsub("^((?:[^-]*-){2}).*", "\\1zzz", name)
(?:[^-]*-) is a non-capturing group which consists of zero or more non-dash characters followed by a single dash character, and the {2} just after means this group occurs exactly twice. The outer parentheses capture both repetitions as \1. Then, .* matches everything else, which is what gets replaced. Note I used an anchor just in case to avoid unintended substitutions.
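The same pattern works outside R as well. A minimal Python sketch, with the language and variable names chosen only for illustration:

```python
import re

names = ["hsa-miR-555p", "hsa-miR-519b-3p", "hsa-let-7a"]

# Group 1 captures the first two dash-terminated chunks; .* matches the rest
prefix_pattern = re.compile(r"^((?:[^-]*-){2}).*")

# Keep group 1 and put "zzz" in place of everything after the second dash
renamed = [prefix_pattern.sub(r"\1zzz", name) for name in names]
print(renamed)
```

A string with fewer than two dashes does not match at all and is returned unchanged.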
Perhaps something like this:
> gsub("([A-Za-z]+-)([A-Za-z]+-)(.*)", "\\1\\2zzz", name)
[1] "hsa-miR-zzz" "hsa-miR-zzz" "hsa-let-zzz"
There are actually several ways to approach this, depending on how "regular" your expressions actually are. For example, do they all start with "hsa-"? What are the options for the "middle" group? Might there be more than three dashes?

How to match a string that does not end in a certain substring?

How can I write a regular expression that matches a string that does not end with a certain substring?
In my project, all classes whose names don't end with a string such as "controller" or "map" should inherit from a base class. How can I do this using a regular expression?
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isn't any match!
Do a search for all filenames matching this:
(?<!controller|map|anythingelse)$
(Remove the |anythingelse if no other keywords, or append other keywords similarly.)
If you can't use negative lookbehinds (the (?<!..) bit), do a search for filenames that do not match this:
(?:controller|map)$
And if that still doesn't work (might not in some IDEs), remove the ?: part and it probably will - that just makes it a non-capturing group, but the difference here is fairly insignificant.
If you're using something where the full string must match, then you can just prefix either of the above with ^.* to do that.
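The "search for names that do not match" variant is easy to try in code. A minimal Python sketch, assuming lowercase names as in the question (the sample names are invented):

```python
import re

# Matches names that DO end with one of the keywords
ending = re.compile(r"(?:controller|map)$")

names = ["usercontroller", "sitemap", "invoice", "mapper"]

# Keep only the names for which the search finds nothing
needs_base_class = [name for name in names if not ending.search(name)]
print(needs_base_class)
```

If the real names use mixed case, passing re.IGNORECASE to re.compile would be one way to handle that.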
Update:
In response to this:
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isn't any match!
Not quite sure what you're attempting with the public/class stuff there, so try this:
public.*class.*(?<!controller|map)$
The . is a regex char that means "anything except newline", and the * means zero or more times.
If this isn't what you're after, edit the question with more details.
Depending on your regex implementation, you might be able to use a lookbehind for this task. This would look like
(?<!SomeText)$
This matches any lines NOT having "SomeText" at their end. If you cannot use that, the expression
^(?!.*SomeText$).*$
also matches any line, including an empty one, that does not end with "SomeText".
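Both forms can be compared side by side. A rough Python sketch with invented sample lines; note that Python's re module only accepts fixed-width lookbehinds, so a single literal like SomeText is fine, while alternatives of different lengths such as (?<!controller|map) are rejected there and have to be chained as (?<!controller)(?<!map).

```python
import re

lookbehind = re.compile(r"(?<!SomeText)$")
lookahead = re.compile(r"^(?!.*SomeText$).*$")

lines = ["ends with SomeText", "SomeText at the start", ""]

# search() for the lookbehind form, since it only constrains the end of the line
kept_by_lookbehind = [line for line in lines if lookbehind.search(line)]
kept_by_lookahead = [line for line in lines if lookahead.match(line)]

print(kept_by_lookbehind)
print(kept_by_lookahead)
```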
You could write a regex that contains two groups: the first consists of one or more characters before controller or map, and the second contains controller or map and is optional. The first group has to be lazy (.+?); a greedy .+ would swallow the whole string and leave the second group empty every time.
^(.+?)(controller|map)?$
With that you may match your string, and if the regex API you use has a group() method and group(2) is empty, the string does not end with controller or map.
Check if the name does not match [a-zA-Z]*controller or [a-zA-Z]*map.
Finally, I did it this way:
public.*class.*[^(controller|map|spec)]$
It worked.
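One caution about that last pattern: square brackets form a character class, so [^(controller|map|spec)] only requires that the final character is not one of the individual characters listed, and it does not test for the whole words. The Python sketch below compares it with chained lookbehinds; the sample lines are invented, and the chained form is an assumption needed because Python's re accepts only fixed-width lookbehinds.

```python
import re

# Negated character class: looks at the last character only
char_class = re.compile(r"public.*class.*[^(controller|map|spec)]$")

# Chained negative lookbehinds: test for the whole words at the end
lookbehinds = re.compile(r"public.*class.*(?<!controller)(?<!map)(?<!spec)$")

samples = [
    "public class usercontroller",
    "public class Order",
    "public class Bank",
]

for line in samples:
    print(line, bool(char_class.search(line)), bool(lookbehinds.search(line)))
```

"public class Order" is rejected by the character class simply because it ends in r, even though it does not end with any of the three keywords.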