I am trying to validate string data using a regular expression.
The input data takes the forms below.
#1X2Y3Z#4A5B6C (valid)
<--nothing (valid)
#1X2Y3Z (valid)
##4A (valid)
#4A# (invalid)
# must be followed by at least one component matching ([0-9]+)A, ([0-9]+)B or ([0-9]+)C
And # must be the first character if input is not an empty string.
I wrote this regex:
#(([0-9]+)X)?(([0-9]+)Y)?(([0-9]+)Z)?#(([0-9]+)A)?(([0-9]+)B)?(([0-9]+)C)?
but it regards #1X2Y3Z# as valid.
# must be represented with at least one component {A,B,C} or more and empty string is also valid.
^(?:#[ABC]+)?$
+ repeats the previous token one or more times, so [ABC]+ matches one or more of A, B, or C. ^ is the start anchor and $ is the end-of-line anchor.
Update:
^(?:#(?:#?[0-9]+[ABCXYZ])+)?$
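One way to check the updated pattern against the inputs from the question is a small Python sketch. The helper name `is_valid` and the choice of Python are assumptions made for illustration; the pattern itself is copied unchanged.

```python
import re

# Pattern from the update above, copied unchanged.
PATTERN = re.compile(r"^(?:#(?:#?[0-9]+[ABCXYZ])+)?$")


def is_valid(s: str) -> bool:
    """Hypothetical helper: True if s is empty or matches the pattern."""
    return PATTERN.match(s) is not None


# Inputs from the question, plus the string the asker's regex wrongly accepted.
for sample in ["#1X2Y3Z#4A5B6C", "", "#1X2Y3Z", "##4A", "#4A#", "#1X2Y3Z#"]:
    print(repr(sample), is_valid(sample))
```

Note that this pattern does not enforce that X/Y/Z components come before A/B/C ones, so it is looser than the regex in the question.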
Use this one:
^(#(?:[0-9][A-Z])*#?(?:[0-9]|[A-Z])+)?$
I tested it with your requirements as per description:
# must be followed by at least one component: ([0-9]+)A, ([0-9]+)B, or ([0-9]+)C.
And # must be the first character of the string if the input is not an empty string.
I need to modify position 10 of every line that contains the word 'Example' (I can't use the actual data here) by inserting the string '(ID) ' there. The line doesn't necessarily begin with 9 digits; the string just needs to be inserted at position 10.
For example, this line should be modified like this:
ORIGINAL: 123456789This line is being used as an Example
SOLUTION: 123456789(ID) This line is being used as an Example
So far I have this, to find the Example and copy the rest of the line as to not lose the text:
Find: (.*)Example
Bonus points if it works for two different words, 'Example1' and 'Example2', in the same line. The 'and also' part of this example would be different in every line.
ORIGINAL: 123456789This line is being used as an Example1 and also Example2
SOLUTION: 123456789(ID) This line is being used as an Example1 and also Example2
This would have this search:
Find: (.*)Example1(.*)Example2
Thank you
You could try:
Find: (\d{9})(?=.*\bExample1\b.*\bExample2\b)
Replace: $1(ID) 
(note the single trailing space after (ID))
The regex pattern used matches and captures a 9 digit number (you may adjust to any width, or range of widths, which you want). It also uses a positive lookahead to assert that Example1 and Example2 in fact occur later in the same line:
(?=.*\bExample1\b.*\bExample2\b)
This is how you add characters at a certain position. I accepted Tim's answer because it's very similar and it helped me figure this out:
^(\S{9})(?=.*\bExample1\b.*\bExample2\b)
As you can see, I only added '^' so that the position is counted from the start of the line, and used '\S' instead of '\d' so that it counts any non-whitespace characters instead of digits. This should work for any line whose first nine characters contain no whitespace.
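For readers who want to try this outside an editor's find/replace dialog, here is a minimal Python sketch of the same idea. The names `TAG` and `add_tag` are hypothetical, and Python is an assumption; the original thread does not say which tool is used.

```python
import re

# Hypothetical constant: the text to insert after the ninth character.
TAG = "(ID) "

# Same pattern as above: nine non-whitespace characters at the start of the
# line, followed (somewhere later) by Example1 and then Example2.
PATTERN = re.compile(r"^(\S{9})(?=.*\bExample1\b.*\bExample2\b)")


def add_tag(line: str) -> str:
    """Insert TAG at position 10 if the line contains both words."""
    # The lookahead consumes nothing, so only the nine captured
    # characters are replaced (by themselves plus TAG).
    return PATTERN.sub(lambda m: m.group(1) + TAG, line)


print(add_tag("123456789This line is being used as an Example1 and also Example2"))
print(add_tag("123456789This line only mentions Example1"))
```

Lines that do not contain both words are returned unchanged.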
I have the following regex:
^([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?)(\d{2}|\d{3}|\d{6})(\d{2}|\d{3})$
I use this regex to match different, yet similar strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
# MF001317-077944-01
MF00131707794401 # string provided
These strings need to be grouped as shown in the comment above each one; my problem is that the regex does not group them correctly.
The first string: MOR644004007001 is grouped: (MOR644004) (007) (001) which should be (MOR644) (004) (007) (001)
The second string: VUF001010500801 is grouped (VUF001010) (500) (801) which should be (VUF00101) (050) (08) (01)
How can I change ([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?) so that it would group the provided string correctly?
I am not sure that you can do what you want to.
Let's consider the first two strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
Now, both strings are composed of 3 letters followed by 12 digits. Thus, given a regex R, if R does not depend on particular (sequences of) letters or particular (sequences of) digits (i.e., it contains [A-Za-z] and \d but not, say, MO or 0070), then it will match both strings in the same way.
So, if you want the two strings to be grouped differently, you need to look at the occurrence of specific characters or digits. We need more data from you in order to give you an answer.
Finally, I suggest you take a look at this tool:
http://regex.inginf.units.it/ (demo: http://regex.inginf.units.it/demo.html). It is a research project that automatically generates a regex from (many) examples of extraction. I warmly suggest you try it, especially if you know for sure that an underlying pattern is present in your case (e.g. strings beginning with VUF must be matched differently from strings beginning with MOR) but you are unable to find it. Again, you will need to provide many examples to the engine. Needless to say, if a generic pattern does not exist, then the tool won't find it ;)
Considering your comment to Serv I'd say the (only?) solution is to have one regex for each possibility, like -
MOR(\d{3})(\d{3})(\d{3})(\d{3})|VUF(\d{5})(\d{3})(\d{2})(\d{2})|MF(\d{6})(\d{6})(\d{2})
and then use the execution environment (JS/php/python - you haven't provided which one) to piece the parts together.
See example on regex101 here. Note that substitution, only as an example, matches only the second string.
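As a rough illustration of the "one regex per prefix" idea, here is a Python sketch. The execution environment is not stated in the question, so Python is an assumption, as are the names `PATTERNS` and `split_code`; the group widths are taken from the regex above, with the prefix folded into the first group so that the pieces can be joined directly.

```python
import re

# One pattern per known prefix; digit widths follow the regex above.
PATTERNS = [
    re.compile(r"(MOR\d{3})(\d{3})(\d{3})(\d{3})"),
    re.compile(r"(VUF\d{5})(\d{3})(\d{2})(\d{2})"),
    re.compile(r"(MF\d{6})(\d{6})(\d{2})"),
]


def split_code(code: str):
    """Hypothetical helper: return the dash-separated form, or None."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(code)
        if match:
            return "-".join(match.groups())
    return None


for code in ["MOR644004007001", "VUF001010500801", "MF00131707794401"]:
    print(code, "->", split_code(code))
```

Strings with an unknown prefix return None, which makes it obvious when a new format needs its own pattern.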
Regards
Take a look at this. I have used what is called a named group. As others pointed out earlier, it's better to have one regex for each string. I have shown it here for the first string, MOR644004007001. You can easily extend it to the other two strings:
import re
# MOR644-004-007-001
MOR = "MOR644004007001" # string provided
# VUF00101-050-08-01
VUF = "VUF001010500801" # string provided
# MF001317-077944-01
MF = "MF00131707794401" # string provided
# Named groups: up to 6 word characters, then three runs of up to 3 digits.
MORcompile = re.compile(r'(?P<first>\w{,6})(?P<second>\d{,3})(?P<third>\d{,3})(?P<fourth>\d{,3})')
MORsearch = MORcompile.search(MOR.strip())
print(MORsearch.group('first'))
print(MORsearch.group('second'))
print(MORsearch.group('third'))
print(MORsearch.group('fourth'))
MOR644
004
007
001
My regex is (?<![\u0410-\u042F])[.!?](?=(\s)?(\s)?[\u0410-\u042F]|[\u04E8]|["]|[\u201C]|![0-9])
I want to split a paragraph into sentences.
I do the regex with re.split() and I print the array
This is a sample input I did:
Мамлекеттик айыптоочу Биринчи май райондук сотуна берген бул сунушун диний кастыкты ырбатпоо аракети менен негиздеди. Мусулмандарга акаарат келтирип жатат деген кайрылуу каттын негизинде УКМК Тезекбаевге каршы кылмыш ишин козгоп, сотко өткөргөн. Бул ишти бүгүн Биринчи май райондук соту карап бүттү жана өкүм эртең чыгарыларын маалымдады. Тараптар мунаса тапты;
Ишти карап жаткан мамлекеттик айыптоочу Кудайберди Чаргынов Кубанычбек Тезекбаевдин диний кастыкты козутууга болгон аракети толугу менен далилденгенин билдирүүдө. Бирок мамлекеттик айыптоочу диний кастыкты ырбатпоо максатында Кыргызстандын Кылмыш кодексинин 65-беренесине ылайык, иш өз маанисин жоготконуна байланыштуу кылмыш ишин Т.У. кыскартып салууну сунуштады.
It prints out fine except that the last character (in this case a period, ?, or !) gets removed!
I searched online and it says to surround the punctuation with lookahead tags, but it doesn't work.
I'm using Python 3.
Put a capturing group around the character(s) you want to preserve in the split:
(?<![\u0410-\u042F])([.!?])(?=(\s)?(\s)?[\u0410-\u042F]|[\u04E8]|["]|[\u201C]|![0-9])
The periods will be added as new elements in the resulting list. From the documentation:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
If you don't want this, you'll have to split on the space itself, ensuring that the space is preceded by a period or another punctuation mark (using a look-behind assertion).
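Both options can be sketched in a few lines of Python 3. This uses a deliberately simplified pattern and an English sample text rather than the regex from the question, so treat it as an illustration of the mechanism only; the variable names are hypothetical.

```python
import re

text = "One. Two! Three?"

# Option 1: capture the punctuation so re.split() returns it as its own
# list element, then glue each sentence back to its terminator.
parts = re.split(r"([.!?])\s*", text)
sentences_from_groups = [s + p for s, p in zip(parts[0::2], parts[1::2])]

# Option 2: split on the whitespace itself, using a look-behind so the
# punctuation is never consumed by the split.
sentences_from_lookbehind = re.split(r"(?<=[.!?])\s+", text)

print(parts)
print(sentences_from_groups)
print(sentences_from_lookbehind)
```

The second form keeps the punctuation attached without any post-processing, at the cost of requiring whitespace between sentences.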
How do you create a regular expression for a certain string? And can you use it in an assertion (the precondition part of the code)?
I've been googling around but couldn't find anything convincing.
The question is like this:
Add a precondition to the DEPARTMENT (the class that we're working on) creation procedure that ensures that the phone number is valid. There are three possible valid phone number formats. A valid phone number consists of one of:
eight digits, the first of which is non-zero
a leading zero, a single non-zero digit area code, and then eight digits, the first of which is non-zero
a leading ‘+’, followed by a two digit country code, then a single non-zero digit area code, and then eight digits, the first of which is non-zero
Any embedded spaces are to be ignored when validating a phone number.
It is acceptable, but not required, to add a PHONE_NUMBER class to the system as part of solving this problem.
There are several different questions to be answered:
How to check if a given string matches a specified regular expression in Eiffel? One can use a class RX_PCRE_MATCHER from the Gobo library. The feature compile allows setting the required regular expression and the feature recognizes allows testing if the string matches it.
How to write a regular expression for the given phone number specification? Something like "(|0[1-9]|\+[0-9]{2}[1-9])[1-9][0-9]{7}" should do, though I have not checked it. It's possible to take intermediate white space into account in the regular expression itself, but it's much easier to get rid of it before passing the string to the regular expression matcher, by applying prune_all (' ') to the input string.
How to add a precondition to a creation procedure to verify that the argument satisfies it? Let's assume that from the previous items we constructed a function is_phone_number that takes a STRING and returns a BOOLEAN that indicates if the specified string represents a valid phone number. A straightforward solution would be to write
make (tel: STRING)
require
is_phone_number (tel)
...
and have a feature is_phone_number in the class DEPARTMENT itself. But this prevents us from checking if the specified string represents a phone number before calling this creation procedure. So it makes sense to move is_phone_number to the class PHONE_NUMBER_VALIDATOR that class DEPARTMENT will inherit. Similarly, if PHONE_NUMBER needs to validate the string against specified rules, it can inherit PHONE_NUMBER_VALIDATOR and reuse the feature is_phone_number.
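To sanity-check the regular expression itself independently of Eiffel, here is a small Python sketch of the same validation logic. It is not Gobo or Eiffel code; the function name mirrors `is_phone_number` from the answer, spaces are stripped before matching as suggested above, and the final quantifier is written as `[0-9]{7}` so that any digit is accepted after the first non-zero one.

```python
import re

# Optional prefix (nothing, 0 + area code, or + country code + area code),
# then eight digits, the first of which is non-zero.
PHONE_RE = re.compile(r"(|0[1-9]|\+[0-9]{2}[1-9])[1-9][0-9]{7}")


def is_phone_number(tel: str) -> bool:
    """Return True if tel, ignoring embedded spaces, is a valid number."""
    compact = tel.replace(" ", "")  # same role as prune_all (' ')
    return PHONE_RE.fullmatch(compact) is not None


for tel in ["1234 5678", "0 3 1234 5678", "+61 3 1234 5678", "0123 4567"]:
    print(repr(tel), is_phone_number(tel))
```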
Halikal actually worked this one out, but didn't share until now ...
This works in EiffelStudio 6.2 (note: this uses Gobo)
http://se.inf.ethz.ch/old/people/leitner/gobo_guidelines/naming_conventions.html
A valid phone number consists of one of:
eight digits, the first of which is non-zero
a leading zero, a single non-zero digit area code,
and then eight digits, the first of which is non-zero
a leading + followed by a two digit country code,
then a single non-zero digit area code, and then eight digits,
the first of which is non-zero
Any embedded spaces are to be ignored when validating a phone number.
require -- \040 is the octal ASCII code for a space
    valid_phone:
        match(phone, "^\040*[1-9]\040*([0-9]\040*){7}$") = TRUE or
        match(phone, "^\040*0\040*([1-9]\040*){2}([0-9]\040*){7}$") = TRUE or
        match(phone, "^\040*\+\040*([0-9]\040*){2}([1-9]\040*){2}([0-9]\040*){7}$") = TRUE
feature --Regular Expression check
    match(text: STRING; pattern: STRING): BOOLEAN is
            -- checks whether 'text' matches a regular expression 'pattern'
        require
            text /= Void
            pattern /= Void
        local
            dfa: LX_DFA_REGULAR_EXPRESSION --There's the Trick!
        do
            create dfa.make
            dfa.compile(pattern, True) --There's the Trick!
            check -- regex must be compiled before we can use it
                dfa.is_compiled;
            end
            Result := dfa.matches(text)
            -- debug: make sure of which pattern
            if dfa.matches (text) then
                io.putstring(text + " matches " + pattern + "%N")
            end
        end
end
Is it possible to write a regular expression that matches all strings that do not consist only of digits? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the first four, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and so on, but I can't seem to figure it out.
(?!^\d+$)^.+$
The negative lookahead (?!^\d+$) rejects lines that consist only of digits; ^.+$ then matches the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
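Both suggestions so far can be compared on the sample strings with a short Python sketch. Python is an assumption here (the question mentions RegexBuddy, not a language), and the variable names are hypothetical.

```python
import re

samples = ["abc", "a4c", "4bc", "ab4", "123"]

# Lookahead version: reject all-digit lines, then match the whole line.
lookahead = re.compile(r"(?!^\d+$)^.+$")
# Minimal version: succeed as soon as one non-digit is found anywhere.
non_digit = re.compile(r"\D")

results = {
    s: (bool(lookahead.search(s)), bool(non_digit.search(s))) for s in samples
}

for s, (by_lookahead, by_non_digit) in results.items():
    print(s, by_lookahead, by_non_digit)
```

The two agree on these inputs; the practical difference is that the lookahead version returns the whole line as the match, while `\D` returns only the first non-digit character.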
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
@Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non-letters (e.g., punctuation) even though Svish's examples didn't include any. Svish's primary requirement was that the string must not be all digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
That is: optional digits at the beginning, then at least one letter, then any number of letters or digits.
Try this:
/^.*\D+.*$/
It returns true if there is any symbol that is not a digit. Works fine with all languages.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
If you are trying to match words that contain at least one letter and consist of digits and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
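A brief Python sketch of the first, explicit-character-class variant shows how the lookahead and the restricted character set combine. The helper name `is_acceptable` and the sample inputs are made up for illustration.

```python
import re

# At least three characters from the allowed set, but not digits only.
PATTERN = re.compile(r"(?!^\d+$)^[a-zA-Z0-9_-]{3,}$")


def is_acceptable(s: str) -> bool:
    """Hypothetical helper wrapping the pattern above."""
    return PATTERN.search(s) is not None


for s in ["abc", "a4c", "user_1-x", "123", "ab", "a b c"]:
    print(repr(s), is_acceptable(s))
```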
/\w+/:
Matches one or more word characters, i.e. any letter, digit, or underscore.
.*[^0-9]{1,}.*
Works fine for us.
We wanted to use the accepted answer, but it does not work within a YANG model.
The one I provided here is easy to understand and clear:
the start and end can be any characters, but there must be at least one non-numeric character, which is the key requirement.
I am using /^[0-9]*$/gm in my JavaScript code to see if string is only numbers. If yes then it should fail otherwise it will return the string.
Below is working code snippet with test cases:
function isValidURL(string) {
  var res = string.match(/^[0-9]*$/gm);
  if (res == null)
    return string;
  else
    return "fail";
}
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL and the following whilst over simplified seems to have worked for me:
where fieldname REGEXP '^[a-zA-Z0-9]+$'
and fieldname NOT REGEXP '^[0-9]+$'
This shows all fields that are alphabetical and alphanumeric but any fields that are just numeric are hidden. This seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed