Regex string for IRC - regex

I am trying to write a regex string to match a string revived from an IRC channel.
The message will be in the format "!COMMAND parameters"; the only command that is built by the system so far is repeat.
The regex I am using looks like this:
/![repeat] (.*?)/
When other commands are added it will look like:
/![cmd1|cmd2|cmd3] (.*?)/
It does not seem to be matching the right things in the string. Can anyone offer any input on this?
It appears that I need to add some basic regex stuff.
() brackets return data, [] matches but does not return.
Swapping to () does not work either.
The IRC program I am writing has a dynamic number of commands, so far I have only added "repeat" so the command pattern is "[repeat]". If I added "say", it would be "[repeat|say]".

Use the parentheses for grouping:
/!(cmd1|cmd2|cmd3) (.*)/
The brackets […] denote a character class describing just one character out of a set of characters.
You should also not use a non-greedy .* as the minimal match of .*? is an empty string.

You used bad brackets
/!(cmd1|cmd2|cmd3) (.*)/
I don't understand what did you mean with ? in your regex

[repeat] is a character class and will match r or e or p etc..., you should just use
/!repeat (.*?)/
and
/!(cmd1|cmd2|cmd3) (.*?)/
I don't understand exactly what you are hoping to match, but the lazy operator seems wrong for example
/!COMMAND (.*?)/ applied to !COMMAND paramater will match !COMMAND only, (.*?) at the end of a regex is guaranteed to match nothing.

You're doing one thing wrong.
If you replace your [] brackets with () everything should work. Between [] you put some letters to match. [abc] would match a, b, or c, not "abc", while (abc) would match "abc" and (abc|bca) would match "abc" or "bca".
Check out the Perl regular expressions tutorial and reference for more information.

Related

How do i select only the files that starts with either CH or OTC [duplicate]

I'm creating a javascript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the dot (.) matches any character except newlines... to match a literal dot, you have to escape it with a backslash (like \.)
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?

RegEx substract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
But Unfortunately without positive result. Do you have maybe any idea why it's not working?
The regex you tried ^[^#][^\(]+ would match:
From the beginning of the string, it should not be a # ^[^#]
Then match until you encounter a parenthesis (I think you don't have to escape the parenthesis in a character class) [^\(]+
So this would match *DataFromAdHoc, including the *, because it is not a #.
What you could do, it capture this part [^\(]+ in a group like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in, please include that in your question.

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use which are in a particular pattern
For example
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example I have a simple query
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches(I get only a the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
\$\$\w+/g
See Demo
Just Some clarifications why your regex is doing what is doing:
\$\$[^\W+]\w+$
Unescaped $ char means end of string, so, your pattern is matching something that must be on the end of the string, that's why its getting only the last match.
This group [^\W+] doesn't really makes sense, groups starting with [^..] means negate the chars inside here, and \W is the negation of words, and + inside the group means literally the char +, so you are saying match everything that is Not a Not word and that is not a + sign, i guess that was not what you wanted.
To match the next word just \w+ will do it. And the global modifier /g ensures that you will not stop on the first match.
This should work - Based on what you said you wanted to match this should work . Also it won't match $$lower_case_strings if that's what you wanted. If not, add the "i" flag also.
\${2}[A-Z_]+/g

The regex characters ?, $, |

I have the following data:
abc def; ghi.
This regex will match:
([a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)( (\w\.))?
This regex will also match
([a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)$
I'm still quite new to regex's, but I thought | stood for OR, () grouped and ? stood for 0 or one occurence. So i thought when combining above queries it would still match. However the following will not match:
([a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)( (\w\.))|$
What am I doing wrong?
ps.
I am using the following for testing my regex.
http://regexpal.com/
EDIT:
I didn't use the code tag, so a character disappeared
EDIT2:
What I am trying to match is the following, the data will be a name.
So "abc def" is the surname. ghi the salutation (english is not my native language, is that the correct term for words like sir. ?). It's however possible that the first letter of the first name. That's why it should either be the end of the line, or that letter.
The data when there is a first name involved would be:
abc; def. G.
Operator precedence for the | operator is a little tricky. It's usually a good idea to explicitly wrap its two operands in parentheses.
Also, be careful about inserting spaces into your regexes. It looks like you want to match a literal period in the \w. fragment, to match "G."
So I think what you want for the combined expressions is something like
((.*)( (\w\.))?)|(.*)$
But since ? means 0 or more, as you have learned, this can be rewritten as
(.*)( (\w\.))?)$
And, to add the rest of the expression back in, we have
^[a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)( (\w\.))?)$
And, yes, "salutation" or "greeting" is a good word for "Mr.", "Ms.", "Dr.", etc.

Need regexp to find substring between two tokens

I suspect this has already been answered somewhere, but I can't find it, so...
I need to extract a string from between two tokens in a larger string, in which the second token will probably appear again meaning... (pseudo code...)
myString = "A=abc;B=def_3%^123+-;C=123;" ;
myB = getInnerString(myString, "B=", ";" ) ;
method getInnerString(inStr, startToken, endToken){
return inStr.replace( EXPRESSION, "$1");
}
so, when I run this using expression ".+B=(.+);.+"
I get "def_3%^123+-;C=123;" presumably because it just looks for the LAST instance of ';' in the string, rather than stopping at the first one it comes to.
I've tried using (?=) in search of that first ';' but it gives me the same result.
I can't seem to find a regExp reference that explains how one can specify the "NEXT" token rather than the one at the end.
any and all help greatly appreciated.
Similar question on SO:
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Replace patterns that are inside delimiters using a regular expression call
RegEx matching HTML tags and extracting text
You're using a greedy pattern by not specifying the ? in it. Try this:
".+B=(.+?);.+"
Try this:
B=([^;]+);
This matches everything between B= and ; unless it is a ;. So it matches everything between B= and the first ; thereafter.
(This is a continuation of the conversation from the comments to Evan's answer.)
Here's what happens when your (corrected) regex is applied: First, the .+ matches the whole string. Then it backtracks, giving up most of the characters it just matched until it gets to the point where the B= can match. Then the (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
All you're really interested in is the "B=" and the ";" and whatever's between them, so why match the rest of the string? The only reason you have to do that is so you can replace the whole string with the contents of the capturing group. But why bother doing that if you can access contents of the group directly? Here's a demonstration (in Java, because I can't tell what language you're using):
String s = "A=abc;B=def_3%^123+-;C=123;";
Pattern p = Pattern.compile("B=(.*?);");
Matcher m = p.matcher(s);
if (m.find())
{
System.out.println(m.group(1));
}
Why do a 'replace' when a 'find' is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true iff the regex matches the whole string), but not find(). And there's no convenience method for accessing capturing groups, either. We can't match the elegance of Perl one-liners like this:
print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
System.out.println("A=abc;B=def_3%^123+-;C=123;"
.replaceFirst(".+B=(.*?);.+", "$1"));
Just to be clear, I'm not saying not to use these hacks, or that there's anything wrong with Evan's answer--there isn't. I just think we should understand why we use them, and what trade-offs we're making when we do.