Regex match the characters with same character in the given string


I am working on validating PAN card numbers. While validating, I need to check that the first character and the fifth character are the same: whatever the first character of the string below is, the fifth character must be that same character. Can anyone help me apply this condition?
Regex I have tried: [A-Za-z]{4}\d{4}[A-Za-z]{1}
Here is my PAN card example: ABCDA9999K

If you want to match the full example string, where the first A should match the fifth A, the pattern has to match 5 letters at the start, [A-Za-z]{5}, instead of [A-Za-z]{4}.
You could use a capturing group with a backreference, ([A-Za-z])[A-Za-z]{3}\1, to account for the first 5 characters.
You might add word boundaries \b at the start and end to prevent a partial match, or add the anchors ^ and $ to assert the start and the end of the string.
This part of the pattern {1} can be omitted.
([A-Za-z])[A-Za-z]{3}\1\d{4}[A-Za-z]
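A quick way to see the backreference at work is Python's re module. This is a sketch, not part of the answer: it uses re.fullmatch, which behaves like wrapping the pattern in ^ and $, and the function name is_valid_pan is only illustrative. Note that \1 compares the characters case-sensitively, so aBCDA9999K is rejected.

```python
import re

# Group 1 captures the first letter; \1 requires the 5th character to be that same letter.
PAN_RE = re.compile(r"([A-Za-z])[A-Za-z]{3}\1\d{4}[A-Za-z]")


def is_valid_pan(value: str) -> bool:
    """Return True only if the whole string matches (fullmatch acts like ^...$)."""
    return PAN_RE.fullmatch(value) is not None


for sample in ["ABCDA9999K", "ABCDE9999K", "aBCDA9999K", "XABCDA9999K"]:
    print(sample, is_valid_pan(sample))
```

The last sample shows why anchoring matters: PAN_RE.search would still find ABCDA9999K inside XABCDA9999K, while fullmatch rejects it.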

Related

RegExp: Match first 3 char words

I tried /[\w|A-Z]{1,3}[a-z]/g
but I want to match only the first 3 characters of each word.
For example:
I WANt THE FIRst 3 CHAr OF WORds ONLy.
It's for a speed-reading tool: only the beginning of each word should be uppercase.
Ideally the match would be split as: (first 3 characters)(rest of the word or space)
https://regex101.com/r/PCi8Dn/2
Thank you!

Original answer
Use a positive lookahead, (?=[pattern]), to require a pattern without including it in the match.
[A-Z]{1,3}(?=[a-z])
appears to do what you want (if I've understood your spec correctly).
You can see it in action here.
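A small Python check of the lookahead pattern on the example sentence from the question (using re.findall; this is a sketch, not the answerer's demo). It shows that the pattern only picks up prefixes that are directly followed by a lowercase letter, so all-uppercase words such as I, THE and OF produce no match.

```python
import re

sentence = "I WANt THE FIRst 3 CHAr OF WORds ONLy."

# 1 to 3 uppercase letters, but only where a lowercase letter follows;
# the lookahead checks that letter without including it in the match.
matches = re.findall(r"[A-Z]{1,3}(?=[a-z])", sentence)
print(matches)  # ['WAN', 'FIR', 'CHA', 'WOR', 'ONL']
```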
New answer following clarification of the spec
I think this does what you want:
(\S{1,3})(\S*[\s\.]+)
The breakdown is:
1st capturing group: (\S{1,3})
Matches between 1 and 3 non-space characters. \S is used instead of \w because I think you want to match characters with diacritics such as à, and punctuation inside words such as the apostrophe (').
2nd capturing group: (\S*[\s\.]+)
Matches zero or more non-space characters (the remaining characters in each word) followed by one or more delimiter characters (whitespace or a period). I included the period as a delimiter so that the last word is matched too. You might want to adjust that part depending on your exact needs.
See it in action here.
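For the speed-reading use case, the two groups can be fed to a substitution that uppercases group 1 and lowercases group 2. This is a minimal Python sketch under that assumption; emphasize_prefixes is a hypothetical helper name, and the callable replacement is a Python re.sub feature rather than something from the answer.

```python
import re

WORD_RE = re.compile(r"(\S{1,3})(\S*[\s\.]+)")


def emphasize_prefixes(text: str) -> str:
    # Group 1 (first 1-3 chars) is uppercased; group 2 (rest of word + delimiters) is lowercased.
    return WORD_RE.sub(lambda m: m.group(1).upper() + m.group(2).lower(), text)


print(emphasize_prefixes("I want the first 3 char of words only."))
# I WANt THE FIRst 3 CHAr OF WORds ONLy.
```

As the breakdown notes, a final word with no trailing space or period has no delimiter to match, so emphasize_prefixes("foo bar") leaves bar unchanged and returns "FOO bar".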

How to extract a word that could possibly be followed with another word

I want to extract [games, games, things, things] from the following array:
Today_games
Today_games_freq
Today_things
Today_things_freq
I have tried Today_(\w+)(?=_freq)?
which also captures the extra "_freq".
I tried some other combinations too, but I couldn't figure out how to get just the part after the first underscore.

You can use
Today_(\w+?)(?:_freq)?$
See the regex demo. This matches Today_, then captures any one or more word chars (as few as possible) into Group 1 (with (\w+?)), and then (?:_freq)?$ matches an optional occurrence of a _freq substring and asserts the position at the end of string.
Or,
Today_([^\W_]+)
See this regex demo.
Here, Today_ is matched and the ([^\W_]+) pattern captures one or more alphanumeric chars into Group 1 (same as \w+ with _ subtracted from \w).
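Both patterns can be checked side by side in Python's re module; this sketch simply applies each one with re.search to the four sample strings and collects group 1.

```python
import re

items = ["Today_games", "Today_games_freq", "Today_things", "Today_things_freq"]

# Lazy capture, then an optional _freq suffix, anchored at the end of the string.
lazy_re = re.compile(r"Today_(\w+?)(?:_freq)?$")
# [^\W_] is \w without the underscore, so the capture stops at the first "_".
no_underscore_re = re.compile(r"Today_([^\W_]+)")

lazy_results = [lazy_re.search(s).group(1) for s in items]
no_underscore_results = [no_underscore_re.search(s).group(1) for s in items]
print(lazy_results)           # ['games', 'games', 'things', 'things']
print(no_underscore_results)  # ['games', 'games', 'things', 'things']
```

The second pattern is simpler, but it also stops at an underscore inside the wanted word (for example Today_board_games would give board), whereas the first one only strips a trailing _freq.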

regex match two words based on a matching substring

There are 4 strings, shown below:
ABC_FIXED_20220720_VALUEABC.csv
ABC_FIXED_20220720_VALUEABCQUERY_answer.csv
ABC_FIXED_20220720_VALUEDEF.csv
ABC_FIXED_20220720_VALUEDEFQUERY_answer.csv
Two strings are considered matched when they share the same substring value (VALUEABC or VALUEDEF in the strings above). So I am looking to match the first 2 (having VALUEABC) and then the next 2 (having VALUEDEF). Matched strings are identified by the same value being returned for one regex group.
What I tried so far:
ABC.*[0-9]{8}_(.*[^QUERY_answer])(?:QUERY_answer)?.csv
This returns regex group-1 (from (.*[^QUERY_answer])) value "VALUEABC" for first 2 strings and "VALUEDEF" for next 2 strings and thus desired matching achieved.
But the problem with the above regex is that as soon as the value ends with any of the characters of "QUERY_answer", the regex doesn't match at all, so the group returns nothing. For instance, the 2 strings below don't match, because VALUESTU ends with "U":
ABC_FIXED_20220720_VALUESTU.csv
ABC_FIXED_20220720_VALUESTUQUERY_answer.csv
I tried to use a negative lookahead:
ABC.*[0-9]{8}_(.*(?!QUERY_answer))(?:QUERY_answer)?.csv
but in this case the group 1 value is returned as "VALUESTU" for the first string and "VALUESTUQUERY_answer" for the second string, so the 2 strings no longer match each other.
Any way to achieve the desired matching?

With your shown samples, please try the following regex.
^ABC_[^_]*_[0-9]+_(.*?)(?:QUERY_answer)?\.csv$
OR, to match exactly 8 digits, try:
^ABC_[^_]*_[0-9]{8}_(.*?)(?:QUERY_answer)?\.csv$
Here is the online demo for the above regex.
Explanation: a detailed breakdown of the above regex.
^ABC_[^_]*_ ##Match ABC at the start of the value, followed by _, up to the next occurrence of _.
[0-9]+_ ##Match one or more digits followed by _.
(.*?) ##The one and only capturing group, using a lazy match (the opposite of a greedy match).
(?:QUERY_answer)? ##Match QUERY_answer in a non-capturing group and make it optional.
\.csv$ ##Match a literal dot followed by csv at the end of the value.
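To confirm that the 8-digit version pairs the files up as the question wants, here is a Python sketch that groups the sample names (including the VALUESTU pair that broke the original attempt) by the value captured in group 1. The grouping with collections.defaultdict is an illustrative addition, not part of the answer.

```python
import re
from collections import defaultdict

FILE_RE = re.compile(r"^ABC_[^_]*_[0-9]{8}_(.*?)(?:QUERY_answer)?\.csv$")

names = [
    "ABC_FIXED_20220720_VALUEABC.csv",
    "ABC_FIXED_20220720_VALUEABCQUERY_answer.csv",
    "ABC_FIXED_20220720_VALUEDEF.csv",
    "ABC_FIXED_20220720_VALUEDEFQUERY_answer.csv",
    "ABC_FIXED_20220720_VALUESTU.csv",
    "ABC_FIXED_20220720_VALUESTUQUERY_answer.csv",
]

# Files whose group 1 value is identical end up in the same list.
pairs = defaultdict(list)
for name in names:
    match = FILE_RE.match(name)
    if match:
        pairs[match.group(1)].append(name)

for value, files in pairs.items():
    print(value, files)
```

Because (.*?) is lazy, it stops as soon as the optional QUERY_answer and \.csv$ can match, so both members of each pair yield the same key.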

You need
ABC.*[0-9]{8}_(.*?)(?:QUERY_answer)?\.csv
See the regex demo.
Note
.*[^QUERY_answer] matches any zero or more chars other than line break chars, as many as possible, and then any one char other than Q, U, E, etc., i.e. any char that is not listed in the negated character class. This is replaced with .*? to match any zero or more chars other than line break chars, as few as possible.
(?:QUERY_answer)? - the group is made non-capturing to reduce grouping complexity.
\.csv - the . is escaped to match a literal dot.
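The note about the negated character class can be reproduced directly. This Python sketch runs the question's original pattern and the answer's pattern on the same file names: the original fails whenever the value's last character is one of Q, U, E, R, Y, _, a, n, s, w, e, r, while the lazy version captures VALUESTU for both files.

```python
import re

# Question's pattern: [^QUERY_answer] excludes each of those letters (and _) individually.
old_re = re.compile(r"ABC.*[0-9]{8}_(.*[^QUERY_answer])(?:QUERY_answer)?.csv")
# Answer's pattern: lazy .*? plus an escaped dot before csv.
new_re = re.compile(r"ABC.*[0-9]{8}_(.*?)(?:QUERY_answer)?\.csv")

for name in [
    "ABC_FIXED_20220720_VALUEABC.csv",
    "ABC_FIXED_20220720_VALUESTU.csv",
    "ABC_FIXED_20220720_VALUESTUQUERY_answer.csv",
]:
    old = old_re.search(name)
    new = new_re.search(name)
    # Print None when a pattern does not match at all.
    print(name, old and old.group(1), new and new.group(1))
```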

Regex for replacing everything after a keyword with colon up to any other keyword with colon

I have the following kinds of strings:
This is a test: 1, two again,three test2: what is, this
test: acid, kool-aid word: some more info
Another test: face, 3, & yes
What I'd like to do is remove test: and everything after until it hits another word that has a colon.
The result set from above would look like:
This is a test2: what is, this
word: some more info
Another
Here's what I've attempted, but it fails when there is NO other word with a colon (so example 3 fails):
test:.+?(?=\w+:)

You can use this regex for matching:
[ ]*\btest:.*?\b(?=\w+:|$)
And replace with empty string.
RegEx Details:
[ ]*: Match 0 or more spaces
\btest: Match full word test:
.*?\b: Match 0 or more of any characters (lazy match) followed by a word boundary
(?=\w+:|$): Positive lookahead to assert that we have a word + : or end of line ahead.

With your shown samples, please try the following regex. It splits each line into 3 parts: 1st, from the start up to just before the first word followed by a colon (captured in group 1); 2nd, from that word followed by a colon up to the next word followed by a colon (not captured); 3rd, the rest of the value (captured in group 2). If there is nothing in the value after the 2nd word followed by a colon, group 2 is simply empty. Perform the substitution accordingly.
^(.*?)\s*\w+:.*?(?:\w+:|$)\s*(.*)$

You were on the right track. For the last case, where there is no second word with a colon, the lookahead must also accept the end-of-line anchor $. So you can use:
test:.*?(?=$|\b\w+:)
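The removal can be tried in Python with re.sub and an empty replacement. This sketch applies the first answer's pattern and the last answer's pattern to the three sample lines. One assumption: the first answer's leading "0 or more spaces" is written here as [ ]*.

```python
import re

lines = [
    "This is a test: 1, two again,three test2: what is, this",
    "test: acid, kool-aid word: some more info",
    "Another test: face, 3, & yes",
]

# First answer's pattern, with its leading "0 or more spaces" written as [ ]*.
first_re = re.compile(r"[ ]*\btest:.*?\b(?=\w+:|$)")
# Last answer's pattern: lazily match up to end of line or the next word followed by a colon.
last_re = re.compile(r"test:.*?(?=$|\b\w+:)")

first_results = [first_re.sub("", line) for line in lines]
last_results = [last_re.sub("", line) for line in lines]
print(first_results)
print(last_results)
```

The results differ only in whitespace: the first pattern also removes the space before test2:, giving "This is atest2: what is, this", while the last one leaves a trailing space after "Another". Calling .rstrip() on the last pattern's results gives exactly the result set from the question.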

Regular expression for match string within first five words of input sentence

I want to match specific words only if they appear between the beginning and the 5th word of an article title.
Input string:
The 14 best US colleges in the West are dominated by California — here's who makes the cut.
regex:
/^.*(\bbest\b|\btop\b|\bhot\b).*$/
Currently it matches the whole article title, but I want to search only up to "colleges" (the 5th word).
I also need to ignore words that merely contain a keyword, such as laptop, hot-spot, etc.

You can use this expression
^((?:\w+\s?){1,5}).*
Explanation:
^ assert position at start of the string
\w+ match one or more word characters
\s? optionally match one whitespace character
{1,5} Quantifier - Between 1 and 5 times, as many times as possible
.* matches any character (except newline)
This matches the first 5 words (and spaces).
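A Python sketch of this expression on the question's title (cut off before the dash for brevity). It shows that group 1 holds the first five words including the trailing space, and that the expression by itself does not check for best, top or hot.

```python
import re

title = "The 14 best US colleges in the West are dominated by California"

# Group 1: up to 5 repetitions of "word characters + optional whitespace".
m = re.match(r"^((?:\w+\s?){1,5}).*", title)
print(repr(m.group(1)))  # 'The 14 best US colleges '
```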

^(\w+\s){0,4}\b(best|top|hot)(\s|$)
You want to match a string within the first five words of the input sentence. Counting from the start of the sentence, there must be 0-4 words before the word you want to match, so you need ^(\w+\s){0,4} before the specific words you want to match. See https://regex101.com/r/nS0dU6/4
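Here is a quick Python check of this pattern; the second and third titles are made-up examples, one where the keyword only appears inside laptop and one where best is the 6th word.

```python
import re

KEYWORD_RE = re.compile(r"^(\w+\s){0,4}\b(best|top|hot)(\s|$)")

titles = [
    "The 14 best US colleges in the West are dominated by California",
    "Deals on a new laptop today",
    "One two three four five best ideas",
]

for t in titles:
    m = KEYWORD_RE.search(t)
    # Group 2 holds the keyword when it starts one of the first five words.
    print(t, "->", m.group(2) if m else None)
```

Because each repetition of (\w+\s) must be a whole word followed by whitespace, a hyphenated word such as hot-spot placed before the keyword also stops the match.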

regex101 comes to help again.
^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))(\w+(?:\s\w+){0,4})
(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-)) checks that the keyword is within the first 5 words (note that (?!-) is added to cater for words such as hot-spot)
(\w+(?:\s\w+){0,4}) then matches at most the first 5 words
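The same lookahead-plus-capture approach can be tried in Python with re.match. In this sketch the helper name first_five_if_keyword is hypothetical, and the last two titles are invented to exercise the hot-spot and laptop cases from the question.

```python
import re

FIRST_FIVE_RE = re.compile(
    r"^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))(\w+(?:\s\w+){0,4})"
)


def first_five_if_keyword(title: str):
    """Return up to the first five words if a keyword is among them, else None."""
    m = FIRST_FIVE_RE.match(title)
    return m.group(1) if m else None


for t in [
    "The 14 best US colleges in the West are dominated by California",
    "My hot-spot setup for travel",
    "Why my laptop keeps crashing",
]:
    print(t, "->", first_five_if_keyword(t))
```

The keywords are matched case-sensitively here, so a title starting with "Best" would need re.IGNORECASE or capitalised alternatives.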
