I want to validate input records read one by one from a file. The file can contain 10,000 to 20,000 records.
A record may contain only capital letters, hyphens, dots, spaces, and digits. Each record ends with a newline, which can be either \n or \r\n.
I want a regex that matches a record only if it consists of those five kinds of characters followed by either type of newline (\n or \r\n). A record containing any other character should not be matched.
I've tried this regex:
[A-Z\d\- ]{120}\s+$
Let's take an example with 10 characters.
1) Input
AAAA12.0 A followed by \n or \r\n
The regex should match input (1) because it has exactly ten characters plus a newline (only one newline style occurs at a time).
2) Input
AA-A13.0 AAA followed by \n or \r\n
The regex should not match input (2) because it has more than 10 characters.
But this regex sometimes fails. Any suggestions for improving it and making it stricter about my five requirements?
This expression:
^[-.A-Z \d]*\r?$
Matches if the entire line consists only of hyphens, dots, capital letters A-Z, spaces, and digits, and ends with either \n or \r\n (the \r is optional).
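As a rough check of the expression above, here is a minimal Python sketch. The helper name is_valid_record and the sample records are made up for illustration, and re.ASCII is an added assumption so that \d only accepts 0-9. Note that the expression as written puts no limit on record length.

```python
import re

# re.ASCII restricts \d to 0-9; without it Python's \d also accepts other Unicode digits.
RECORD = re.compile(r'^[-.A-Z \d]*\r?$', re.ASCII)

def is_valid_record(line: str) -> bool:
    # In Python, $ matches at the very end of the string or just before
    # a single trailing "\n", so both "\n" and "\r\n" endings are accepted.
    return RECORD.match(line) is not None

samples = [
    "AAAA12.0 A\n",        # allowed characters, \n ending
    "AA-A13.0 AAA\r\n",    # allowed characters, \r\n ending
    "aaaa12.0 A\n",        # lowercase letters are not allowed
    "AAAA_12\n",           # underscore is not allowed
]
results = [is_valid_record(s) for s in samples]
print(results)
```

If a fixed record length is required, the * quantifier could be swapped for an explicit count such as {10}.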
/[\w|A-Z]{1,3}[a-z]/g
but I want to match only the first 3 characters of each word.
For example:
I WANt THE FIRst 3 CHAr OF WORds ONLy.
It's for a speed reader: uppercase only the beginning of each word.
The best would be: (first 3 characters)(rest of the word or space)
https://regex101.com/r/PCi8Dn/2
Thank you !
Original answer
Use a positive lookahead, (?=pattern), to require that something follows without including it in the match.
[A-Z]{1,3}(?=[a-z])
appears to do what you want (if I've understood your spec correctly).
You can see it in action here.
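To illustrate how the lookahead behaves, here is a small Python sketch run against the sample sentence from the question. The variable names are only for illustration.

```python
import re

text = "I WANt THE FIRst 3 CHAr OF WORds ONLy."

# [A-Z]{1,3} consumes up to three capitals; (?=[a-z]) only checks that a
# lowercase letter follows and does not become part of the match.
prefixes = re.findall(r'[A-Z]{1,3}(?=[a-z])', text)
print(prefixes)  # ['WAN', 'FIR', 'CHA', 'WOR', 'ONL']
```

Fully uppercase words such as THE and OF are skipped because no lowercase letter follows their capitals.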
New answer following clarification on spec
I think this does what you want:
(\S{1,3})(\S*[\s\.]+)
The breakdown is:
1st capturing group: (\S{1,3})
Matches a maximum of 3 non-space characters (\S is used instead of \w because I think you want to match characters with diacritics like à and punctuation in the middle of words like ').
2nd capturing group: (\S*[\s\.]+)
Matches zero or more non-space characters (the remaining characters in each word) followed by one or more delimiter characters (space or period). I included period as a delimiter to match the last word. You might want to adjust that part depending on your exact needs.
See it in action here.
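One way the two capturing groups could be used, sketched in Python under the assumption that the goal is to uppercase group 1 and leave group 2 untouched. The function name emphasize is hypothetical.

```python
import re

WORD = re.compile(r'(\S{1,3})(\S*[\s\.]+)')

def emphasize(text: str) -> str:
    # Uppercase the first capturing group (up to 3 characters) and keep
    # the second group (rest of the word plus delimiters) unchanged.
    return WORD.sub(lambda m: m.group(1).upper() + m.group(2), text)

print(emphasize("i want the first 3 char of words only."))
# I WANt THE FIRst 3 CHAr OF WORds ONLy.

# A final word with no trailing space or period is not matched at all:
print(emphasize("hello world"))
# HELlo world
```

The second call shows why the delimiter part may need adjusting: the pattern requires at least one space or period after every word.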
I have trouble understanding why my regex matches one extra character besides the symbols I told it to include. This is my regex:
([\-:, ]{1,})[^0-9]
This is my test text:
Test- Product-: 1 --- 3 hour ,--kayak:--rental
It always includes the first character of the following word, like the P in Product or the h in hour. How can I prevent the regex from including those first characters?
I am trying to get all dashes, colons, commas, and spaces, excluding digits and any other characters.
The [^0-9] part of your regex matches any character that is not a digit, which is why the first letter of the following word is included. Remove it from your pattern.
There is no need to wrap the character class in a capturing group, and {1,} is equivalent to +, so the whole regex can be shortened to
[-:, ]+
Note that - in the initial and end positions inside a character class does not have to be escaped.
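A quick Python comparison of the original and shortened patterns on the test text from the question. This is only a sketch, and the variable names are illustrative.

```python
import re

text = "Test- Product-: 1 --- 3 hour ,--kayak:--rental"

# Original pattern: the trailing [^0-9] consumes one extra non-digit character.
first_original = re.search(r'([\-:, ]{1,})[^0-9]', text).group(0)
print(repr(first_original))  # '- P'

# Shortened pattern: only runs of hyphen, colon, comma, and space are matched.
separators = re.findall(r'[-:, ]+', text)
print(separators)  # ['- ', '-: ', ' --- ', ' ', ' ,--', ':--']
```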
I want to match specific strings that occur between the beginning and the 5th word of an article title.
Input string:
The 14 best US colleges in the West are dominated by California — here's who makes the cut.
regex:
/^.*(\bbest\b|\btop\b|\bhot\b).*$/
Currently it matches the whole article title, but I want to search only up to "colleges".
It also needs to ignore strings like laptop, hot-spot, etc.
You can use this expression
^((?:\w+\s?){1,5}).*
Explanation:
^ assert position at start of the string
\w+ matches one or more word characters
\s? matches an optional whitespace character
{1,5} Quantifier: repeats the group between 1 and 5 times, as many times as possible
.* matches any character (except newline)
This matches the first 5 words (and spaces).
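A minimal Python sketch of this expression, using a shortened version of the title from the question. Note that the optional \s? means the captured group keeps the space after the fifth word.

```python
import re

title = "The 14 best US colleges in the West are dominated by California"

m = re.match(r'^((?:\w+\s?){1,5}).*', title)
first_five = m.group(1)
print(repr(first_five))  # 'The 14 best US colleges '

# Strip the trailing space before searching the five words for a keyword.
print(first_five.strip().split())
```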
^(\w+\s){0,4}\b(best|top|hot)(\s|$)
You want to match a keyword within the first five words of the input sentence. Counting from the start of the sentence, there must then be 0 to 4 words before the keyword. So you need ^(\w+\s){0,4} before the specific words you want to match. See https://regex101.com/r/nS0dU6/4
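The behaviour of this pattern can be checked with a short Python sketch. The helper name find_keyword and the test sentences other than the article title are made up for illustration.

```python
import re

KEYWORD = re.compile(r'^(\w+\s){0,4}\b(best|top|hot)(\s|$)')

def find_keyword(title: str):
    # Returns the keyword if it is one of the first five words, else None.
    m = KEYWORD.search(title)
    return m.group(2) if m else None

print(find_keyword("The 14 best US colleges in the West"))   # best
print(find_keyword("top picks for today"))                   # top
print(find_keyword("My new laptop is great"))                # None
print(find_keyword("a hot-spot nearby"))                     # None
print(find_keyword("one two three four five best"))          # None
```

The last case returns None because the keyword is the sixth word, one past the {0,4} limit.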
regex101 comes to help again.
^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))(\w+(?:\s\w+){0,4})
(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-)) checks that the keyword is within the first 5 words (note that (?!-) is added to cater for words such as hot-spot)
(\w+(?:\s\w+){0,4}) then matches the first maximum 5 words
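Here is a small Python sketch of the lookahead version, assuming the goal is to return the first five words only when a keyword occurs among them. The function name first_five_if_keyword is hypothetical.

```python
import re

PATTERN = re.compile(
    r'^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))'  # keyword within first 5 words
    r'(\w+(?:\s\w+){0,4})'                          # capture up to 5 words
)

def first_five_if_keyword(title: str):
    m = PATTERN.match(title)
    return m.group(1) if m else None

print(first_five_if_keyword("The 14 best US colleges in the West"))
# The 14 best US colleges
print(first_five_if_keyword("a hot-spot nearby"))        # None
print(first_five_if_keyword("My new laptop is great"))   # None
```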
I want to match a string if it begins with either a letter or a number. From there I want to count the characters in the string (excluding whitespace), and if there are 5 or more, match it.
I believe I'm pretty close, my current regex is:
\s*(?:\S[\t ]*){5,}
What I need to add, is making sure the string starts with either a letter or number (or if it begins with a whitespace, make sure the following character is a letter or number.)
http://regex101.com/r/lD7mZ2/1
How about the regex
^\s*[a-zA-Z0-9]\s*(?:\S[\t ]*){4,}
Example: http://regex101.com/r/lD7mZ2/4
Changes made
^ anchors the regex at the start of the string.
[a-zA-Z0-9] matches letter or digit
{4,} repeats the group at least 4 times. Together with the preceding [a-zA-Z0-9], this gives a minimum of 5 non-whitespace characters.
OR
a shorter version would be
^\s*[a-zA-Z0-9]\s*(?:\S\s*){4,}
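The shorter version can be tried out with a few made-up inputs in Python. The helper name starts_alnum_min5 is only for illustration.

```python
import re

PATTERN = re.compile(r'^\s*[a-zA-Z0-9]\s*(?:\S\s*){4,}')

def starts_alnum_min5(s: str) -> bool:
    # True if the first non-whitespace character is a letter or digit and
    # the string has at least 5 non-whitespace characters in total.
    return PATTERN.match(s) is not None

print(starts_alnum_min5("abcde"))        # True
print(starts_alnum_min5(" a b c d e"))   # True
print(starts_alnum_min5("  9 lives"))    # True
print(starts_alnum_min5("abcd"))         # False, only 4 characters
print(starts_alnum_min5("-abcde"))       # False, starts with a hyphen
```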
I have a large file with data in this format:
regabc123456_user_domain_application_env_id
regdef789101_user_domain_application_env_id
In Vim I want to do a search and replace ("_" with ", ") and match the machine name (regabc123456).
I am trying this:
:%s/^reg.*\{6}_/^reg.*\{6},\ /g
^ for the beginning of the line, 'reg' because all lines start with this, then '.*' for anything after that but before the six-digit code, which I am trying to catch with {6}.
This doesn't seem to be doing what I want. I can match the machine name, but I can't replace it with what I want. Is there an easier way to identify the machine name with regular expressions? example:
'reg' followed by three lower case letter followed by six numbers followed by an underscore, then replace?
Thanks.
The regex below replaces regabc123456_ with regabc123456,
:%s/^\(reg.*[0-9]\{6\}\)_/\1,/g
OR
:%s/^\(reg[a-z]\{3\}[0-9]\{6\}\)_/\1,/g
If you want a space after the comma, add a space after the comma in the replacement part.
%s/^\(reg[a-z]\{3\}[0-9]\{6\}\)_/\1, /g
To match a 6-digit number, you need to use [0-9]\{6\}. It repeats the previous token exactly 6 times.
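For comparison outside Vim, the same substitution can be sketched in Python. The syntax differs: Python does not need backslashes before the group parentheses or the {n} braces. The function name fix_line is hypothetical.

```python
import re

# Python equivalent of :%s/^\(reg[a-z]\{3\}[0-9]\{6\}\)_/\1, /
MACHINE = re.compile(r'^(reg[a-z]{3}[0-9]{6})_')

def fix_line(line: str) -> str:
    # \1 re-inserts the captured machine name, followed by ", "
    return MACHINE.sub(r'\1, ', line)

print(fix_line("regabc123456_user_domain_application_env_id"))
# regabc123456, user_domain_application_env_id
```

Only the underscore directly after the machine name is replaced; the remaining underscores are left alone, as in the Vim commands above.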