RegEx that matches characters after semicolon in the same line - regex

I need some help with regular expressions. I need a RegEx that matches characters when they come after a semicolon AND on the same line as a preceding word.
Let me explain:
I have to write a function that does not allow any characters to be entered after a semicolon on the same line, and I think I could do it with this sort of RegEx.
Thank you.

I am not sure I understood your question, but would something like this help?
This regular expression

Well, you've got two ways to do it:
A: Create a regular expression to validate correct input.
B: Create a regular expression to find incorrect input.
I would use option A, but it depends on what you need to do.
A: Regex to validate correct lines
In this case, we'll use the m modifier to set the regex engine to search by line (m = multiline). This means that ^ matches the beginning of a line and $ matches the end of a line.
Then we want to match some characters which are not the semicolon itself. To do this we use a negated character class, [^...], meaning "any character that is not in the provided list". So to say any char except the semicolon we'll use [^;].
Now, there will probably be many of these characters rather than just one. For that we can use either the * or the + quantifier, which mean "0 or more times" and "1 or more times" respectively. If the data before the semicolon is mandatory then we'll use +. This leads to [^;]+, meaning any char which is not a semicolon, 1 or more times.
Then we'll capture this with parentheses (). This gives us direct access to the value, so we don't have to take the whole line and strip the semicolon ourselves.
After this capture group comes the semicolon, then possibly some whitespace, and then the end of the line. Whether to allow trailing whitespace is up to you. It would be \s* to say any kind of space, tab or blank char, 0 or more times.
At the end we get this regex: ^([^;]+);\s*$ with the m and g flags
m for multiline and g for global, which means don't stop at the first match but look for all of them.
Test it here: https://regex101.com/r/sT59eu/1/
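A minimal Python sketch of option A, assuming the input is checked one line at a time (the loop over lines then plays the role of the m and g flags). The function name and the sample text are made up for illustration.

```python
import re

# Same pattern as above; applied per line, so no multiline flag is needed.
LINE_OK = re.compile(r"^([^;]+);\s*$")

def split_lines(text):
    """Return (captured values of valid lines, list of invalid lines)."""
    valid, invalid = [], []
    for line in text.splitlines():
        m = LINE_OK.match(line)
        if m:
            valid.append(m.group(1))  # the text before the semicolon
        else:
            invalid.append(line)
    return valid, invalid

sample = "foo = 1;\nbar = 2; extra\nbaz = 3;   \nno semicolon"
valid, invalid = split_lines(sample)
print(valid)    # ['foo = 1', 'baz = 3']
print(invalid)  # ['bar = 2; extra', 'no semicolon']
```

Checking line by line also means that [^;]+ can never run across a line break, which it could do when the whole text is searched at once.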
B: Regex to find invalid lines
Well, this could be rather easy too: ;.+$
. means any char (except a newline). So here we'll find the lines with something after the semicolon.
Test it here: https://regex101.com/r/ocDofm/1/
But you will NOT find lines with missing semicolons!
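As a rough illustration of option B in Python (the sample text is invented), re.MULTILINE stands in for the m flag and findall for the g flag:

```python
import re

text = "a = 1;\nb = 2; oops\nc = 3"

# "." does not match a newline, so each match stays within a single line.
offending = re.findall(r";.+$", text, flags=re.MULTILINE)
print(offending)  # ['; oops']
```

The last line, which has no semicolon at all, is not reported.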

If I understand it correctly,
(?<=;)[A-Za-z]+
might do the job.
The python documentation is helpful: https://docs.python.org/3/library/re.html
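A quick check of the lookbehind pattern with Python's re module, on an invented sample. It only reports letters that follow the semicolon directly, so text separated from the semicolon by a space is not found:

```python
import re

sample = "x = 1;abc\ny = 2;\nz = 3; def"

# (?<=;) asserts that the previous character is a semicolon without consuming it.
found = re.findall(r"(?<=;)[A-Za-z]+", sample)
print(found)  # ['abc']
```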

Related

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use which are in a particular pattern
For example
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example I have a simple query
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches (I only get the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
\$\$\w+/g
Some clarification on why your regex behaves the way it does:
\$\$[^\W+]\w+$
An unescaped $ means end of string, so your pattern only matches something at the very end of the string. That's why you only get the last match.
The class [^\W+] doesn't really make sense. A class starting with [^ negates the characters listed inside it, \W is the negation of a word character, and a + inside a class is a literal + sign. So you are saying: match anything that is not a non-word character and is not a + sign. I guess that was not what you wanted.
To match the next word, \w+ alone will do. The global modifier /g ensures that matching does not stop at the first match.
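The difference can be seen in a short Python sketch, where re.findall plays the role of the /g flag. The replacement values at the end are hypothetical and only show how the parameters might be filled in at runtime.

```python
import re

query = ("select * from asdf where asdf.starttime = $$DATA_START_TIME "
         "and asdf.endtime = $$DATA_END_TIME")

# Original pattern: the trailing $ anchors it to the end of the string.
last_only = re.findall(r"\$\$[^\W+]\w+$", query)
print(last_only)  # ['$$DATA_END_TIME']

# Corrected pattern: every $$NAME token is found.
params = re.findall(r"\$\$\w+", query)
print(params)  # ['$$DATA_START_TIME', '$$DATA_END_TIME']

# Hypothetical runtime values, for illustration only.
values = {"DATA_START_TIME": "100", "DATA_END_TIME": "200"}
filled = re.sub(r"\$\$(\w+)", lambda m: values[m.group(1)], query)
print(filled)
```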
This should work, based on what you said you wanted to match. Note that it won't match $$lower_case_strings, if that's what you wanted. If you do want those, add the "i" flag as well.
\${2}[A-Z_]+/g

Interesting easy looking Regex

I am re-phrasing my question to clear up the confusion!
I want to match if a string contains certain letters. For this I use the character class:
[ACD]
and it works perfectly!
But I want to match only if the string contains those letter(s) 2 or more times, either the same letter repeated or 2 different letters from the set.
For example:
[AKL] should match:
ABCVL
AAGHF
KKUI
AKL
But the above should not match the following:
ABCD
KHID
LOVE
because those letters are there, but only once!
That's why I was trying to use:
[ACD]{2,}
But it's not working; it's probably not the right regex. Can a regex guru help me solve this puzzle?
Thanks
PS: I will use it on MySQL. A different approach is also welcome, but I'd like to use regex for a smarter and shorter query!
To ensure that a string contains at least two occurrences of letters from a set (let's say A, K, L as in your example), you can write something like this:
[AKL].*[AKL]
Since the MySQL regex engine is a DFA, there is no need to use a negated character class like [^AKL] in place of the dot to avoid backtracking, or a lazy quantifier that is not supported at all.
example:
SELECT 'KKUI' REGEXP '[AKL].*[AKL]';
will return 1
You can follow this link that speaks on the particular subject of the LIKE and the REGEXP features in MySQL.
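The pattern can be checked against the examples from the question. This sketch uses Python's re module rather than MySQL, on the assumption that this simple pattern behaves the same in both engines:

```python
import re

pattern = re.compile(r"[AKL].*[AKL]")

should_match = ["ABCVL", "AAGHF", "KKUI", "AKL"]
should_not_match = ["ABCD", "KHID", "LOVE"]

# search() looks for the pattern anywhere in the string, like REGEXP does.
matched = [s for s in should_match + should_not_match if pattern.search(s)]
print(matched)  # ['ABCVL', 'AAGHF', 'KKUI', 'AKL']
```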
If I understood you correctly, this is quite simple:
[A-Z].*?[A-Z]
This looks for a character in your set, [A-Z], and then lazily matches characters until it (potentially) comes across the set, [A-Z], again.
As @Enigmadan pointed out, a lazy match is not necessary here: [A-Z].*[A-Z]
The expression you are using searches for runs of 2 or more consecutive characters from the set ACDFGHIJKMNOPQRSTUVWXZ.
However, your expression excludes Y (UVWXZ]), so Z cannot be found, since it has no neighbouring character from the set. The same applies to A, because B ([ACD) is also excluded from your expression. For example, Z and A would match in a string like ZABCDEFGHIJKLMNOPQRSTUVWXYZA
If those were not excluded on purpose, it is probably better to use a range like [A-Z].
If you want 2 or more matches of [AKL], then you can use just [AKL] and check that the number of matches is >= 2.
I am not good at SQL regex, but maybe something like this?
check (dbo.RegexMatch( ['ABCVL'], '[AKL]' ) >= 2)
To put it in plain English: use [AKL] as your regex, and check that the number of matches in the string is at least 2. Here's how I would do it in Java:
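import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Returns true if the string contains at least two characters from [ACD].
private boolean search2orMore(String string) {
    Matcher matcher = Pattern.compile("[ACD]").matcher(string);
    int counter = 0;
    while (matcher.find()) {
        counter++; // count every matching character
    }
    return counter >= 2;
}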
You can't use [ACD]{2,} because it requires 2 or more consecutive characters from the set, so it fails when the matching characters are separated by other characters.
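The same counting idea as a Python sketch. The helper name is made up, and letters is assumed to contain plain letters only (no regex metacharacters):

```python
import re

def has_two_or_more(s, letters="AKL"):
    """True if s contains at least two characters from `letters`."""
    return len(re.findall(f"[{letters}]", s)) >= 2

# {2,} needs the characters to be adjacent, so it misses "ABCVL".
adjacent_only = bool(re.search(r"[AKL]{2,}", "ABCVL"))
print(adjacent_only)             # False
print(has_two_or_more("ABCVL"))  # True
print(has_two_or_more("LOVE"))   # False
```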
Your question is not very clear, but here is my attempt at a pattern:
\b(\S*[AKL]\S*[AKL]\S*)\b
I'm pretty sure this should work in any case:
(?<l>[^AKL\n]*[AKL]+[^AKL\n]*[AKL]+[^AKL\n]*)[\n\r]
Replacing AKL with the letters you need can easily be done dynamically; tell me if you need that.
Is this what you are looking for?
".*(.*[AKL].*){2,}.*" (without quotes)
It matches if there are at least two occurrences of your characters, surrounded by anything.
It is a .NET regex, but it should be the same for anything else.
Edit
Overall, MySQL regular expression support is pretty weak.
If you only need to match your capture group a minimum of two times, then you can simply use:
select * from ... where ... regexp('([ACD].*){2,}') #could be `2,` or just `2`
If you need to match your capture group more than two times, then just change the number:
select * from ... where ... regexp('([ACD].*){3}')
#This number should match the number of matches you need
If you needed a minimum of 7 matches and you were using your previous capture group [ACDF-KM-XZ]
e.g.
select * from ... where ... regexp('([ACDF-KM-XZ].*){7,}')
Response before edit:
Your regex is trying to find at least two characters from the set[ACDFGHIJKMNOPQRSTUVWXZ].
([ACDFGHIJKMNOPQRSTUVWXZ]){2,}
The reason A and Z are not being matched in your example string (ABCDEFGHIJKLMNOPQRSTUVWXYZ) is because you are looking for two or more characters that are together that match your set. A is a single character followed by a character that does not match your set. Thus, A is not matched.
Similarly, Z is a single character preceded by a character that does not match your set. Thus, Z is not matched.
In the string below, the characters B, E, L, and Y do not match your set:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
If you were to do a global search in the string, only the runs CD, FGHIJK, and MNOPQRSTUVWX would be matched.
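This behaviour is easy to reproduce. The sketch below uses Python's re module, with findall acting as the global search:

```python
import re
import string

# Only runs of two or more adjacent characters from the set are matched,
# so the isolated A and Z are skipped.
runs = re.findall(r"[ACDFGHIJKMNOPQRSTUVWXZ]{2,}", string.ascii_uppercase)
print(runs)  # ['CD', 'FGHIJK', 'MNOPQRSTUVWX']
```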

Little vim regex

I have a bunch of strings that look like this: '../DisplayPhotod6f6.jpg?t=before&tn=1&id=130', and I'd like to take out everything after the question mark, to look like '../DisplayPhotod6f6.jpg'.
s/\(.\.\.\/DisplayPhoto.\{4,}\.jpg\)*'/\1'/g
This regex is capturing some but not all occurrences; can you see why?
\.\{4,} is trying to match 4 or more . characters. What it looks like you wanted is "match 4 or more of any character" (.\{4,}) but "match 4 or more non-. characters" ([^.]\{4,}) might be more accurate. You'll also need to change the lone * at the end of the pattern to .* since the * is currently applying to the entire \(\) group.
I think the easiest way to go for this is:
s/?.*$/'/g
This says: replace the question mark and everything after it with a single quote.
I would use a macro, which is sometimes simpler than a regexp (and interactive):
qa
/DisplayPhoto<Enter>
f?dt'
n
q
And then run @a a few times, or 20000@a to go through all lines.
The following regexp: /(\.\./DisplayPhoto.*\.jpg)/gi
tested against following examples:
../DisplayPhotocef3.jpg?t=before&tn=1&id=54
../DisplayPhotod6f6.jpg?t=before&tn=1&id=130
will result:
../DisplayPhotocef3.jpg
../DisplayPhotod6f6.jpg
%s/\('\.\.\/DisplayPhoto\w\{4,}\.jpg\).*'/\1'/g
Some notes:
% will cause the swap to work on all lines.
\w instead of '.', in case there are some malformed file names.
Replace '.' at the start of your matching regex with ' which is exactly what it should be matching.
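For comparison outside Vim, here is a rough Python equivalent of that substitution. It is a sketch, not a translation of Vim's syntax: [^']* is used in place of .* so that a match cannot run past the closing quote.

```python
import re

s = "'../DisplayPhotod6f6.jpg?t=before&tn=1&id=130'"

# Keep the quoted path up to ".jpg" and drop everything up to the closing quote.
cleaned = re.sub(r"('\.\./DisplayPhoto\w{4,}\.jpg)[^']*'", r"\1'", s)
print(cleaned)  # '../DisplayPhotod6f6.jpg'
```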

Regex for all strings not containing a string? [duplicate]

Ok, so this is something completely stupid, but it is something I simply never learned to do and it's a hassle.
How do I specify a string that does not contain a given sequence of characters? For example, I want to match all lines that do NOT end in '.config'
I would think that I could just do
.*[^(\.config)]$
but this doesn't work (why not?)
I know I can do
.*[^\.][^c][^o][^n][^f][^i][^g]$
but please please please tell me that there is a better way
You can use negative lookbehind, e.g.:
.*(?<!\.config)$
This matches all strings except those that end with ".config"
Your question contains two questions, so here are a few answers.
Match lines that don't contain a certain string (say .config) at all:
^(?:(?!\.config).)*$\r?\n?
Match lines that don't end in a certain string:
^.*(?<!\.config)$\r?\n?
and, as a bonus: Match lines that don't start with a certain string:
^(?!\.config).*$\r?\n?
(each time including newline characters, if present).
Oh, and to answer why your version doesn't work: [^abc] means "any one (1) character except a, b, or c". Your other solution would also fail on test.hg, because it also ends in the letter g: your regex looks at each character individually instead of at the entire .config string. That's why you need lookaround to handle this.
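The three patterns, and the attempt from the question, can be compared side by side. This is a sketch with Python's re module on invented file names; the trailing \r?\n? is left out because each string is a single line.

```python
import re

lines = ["app.config", "readme.txt", ".config.bak", "test.hg"]

patterns = {
    "not containing": r"^(?:(?!\.config).)*$",
    "not ending":     r"^.*(?<!\.config)$",
    "not starting":   r"^(?!\.config).*$",
    "question's try": r".*[^(\.config)]$",
}

# For each pattern, keep the strings it matches.
results = {name: [s for s in lines if re.match(p, s)]
           for name, p in patterns.items()}
for name, kept in results.items():
    print(name, kept)
```

The attempt from the question wrongly rejects test.hg, because the negated class only looks at the final character.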
(?<!\.config)$
:)
By using the [^...] construct, you have created a negated character class, which matches any single character except those you have listed. The order of the characters inside the class does not matter, so your pattern fails on any string that ends in one of the characters in [(\.config)] (equivalently [)gi.\onc(]).
Use negative lookahead (with Perl regexes), like so: (?!\.config$). The lookahead succeeds at any position that is not immediately followed by ".config" at the end of the string.
Unless you are "grepping" ... since you are not using the result of a match, why not search for the strings that do end in .config and skip them? In Python:
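import re
# Raw string, so the backslash reaches the regex engine unchanged
isConfig = re.compile(r'\.config$')
# List lst is given
# search() is needed here: match() only matches at the start of the string
filteredList = [f.strip() for f in lst if not isConfig.search(f.strip())]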
I suspect that this will run faster than a more complex re.
As you have asked for a "better way": I would try a "filtering" approach. I think it is quite easy to read and to understand:
#!/usr/bin/perl
while(<>) {
    next if /\.config$/; # ignore the line if it ends with ".config"
    print;
}
As you can see I have used perl code as an example. But I think you get the idea?
added:
This approach can also be used to chain more filter patterns while remaining readable and easy to understand:
next if /\.config$/; # ignore the line if it ends with ".config"
next if /\.ini$/; # ignore the line if it ends with ".ini"
next if /\.reg$/; # ignore the line if it ends with ".reg"
# now we have filtered out all the lines we want to skip
... process only the lines we want to use ...
I used Regexpal before finding this page and came up with the following solution when I wanted to check that a string doesn't contain a file extension:
^(.(?!\.[a-zA-Z0-9]{3,}))*$
I used the m checkbox option so that I could enter many lines and see which of them did or did not match.
So, to find a string that doesn't contain another: "^(.(?!" + expression you don't want + "))*$"

Regular expression - what is my mistake?

I would like to match either any sequence of digits, or the literal: na .
I am using:
"^\d*|na$"
Numbers are being matched, but not na.
What's my mistake?
More info: I'm using this in a regular expression validator for a textbox in ASP.NET (C#).
A blank entry is ok.
It's because the expression is being read as (assuming PCRE):
"^\d*" OR "na$"
Some parentheses would take care of that in a jiffy. Choose from (depending on your needs):
"^(\d+|na)$" // this will capture the number or na
"^(?:\d+|na)$" // this one won't capture
Cheers!
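The | operator has lower precedence than everything else in the pattern, including the anchors ^ and $, so each anchor belongs to only one of the alternatives. The expression ^\d*|na$ therefore means: match ^\d* or na$. So try this: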
^(\d*|na)$
Or:
^\d*$|^na$
Perhaps ^(?:\d*|na)$ would be better. What language/engine? Also, please show the input and, if possible, the snippet of the code.
Also, it is possible that you aren't matching "na" because there is a new line after it. The digits wouldn't be affected because you did not specify a $ anchor for them.
So, depending on the language and how the input is acquired, there might be a newline between "na" and the end of the string, and $ won't match there unless you turn on multi-line matching (or strip the newline from the string).
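The precedence issue is easy to see in a short sketch. Python's re module stands in for the .NET engine here, which is an assumption, although alternation has the lowest precedence in both.

```python
import re

loose = re.compile(r"^\d*|na$")        # read as (^\d*) OR (na$)
strict = re.compile(r"^(?:\d*|na)$")   # both anchors apply to both alternatives

# For "na", the first alternative matches the empty string at position 0.
print(repr(loose.search("na").group()))  # ''
# Trailing junk is not rejected, because only the digits branch is anchored at the start.
print(loose.search("12abc").group())     # 12

print(strict.search("na").group())       # na
print(strict.search("12abc"))            # None
print(strict.search("") is not None)     # True (a blank entry is accepted)
```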
This may not be the best or most elegant way to fix it, but try this:
"^\d*|[n][a]$"