Regular expression in Hive - Get number after specific text - regex

How can I use regular expressions in order to get the number 4968 from the following text?
"category_path":["XXX1430","XXX109026","XXX3120","XXX4968","XXX377357"]
Many thanks!

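Try this regex (PCRE syntax): category_path":\[(?:"[X]+\d+",){3}"[X]+\K\d+
Note that \K is not supported by Java's regex engine, which Hive's regex functions use, so this pattern is only suitable for PCRE-style engines.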

If you want to use a regex you could capture 4968 in a group (\d+).
"category_path":\["XXX\d+"(?:,"XXX\d+"){2},"XXX(\d+)"(?:,"XXX\d+")*\]
Explanation
"category_path":\[ Match literally
"XXX\d+" Match the first XXX-and-digits item, which has no leading comma
(?:,"XXX\d+"){2} Repeat the XXX-and-digits pattern with a leading comma 2 times
,"XXX(\d+)" Match ,"XXX, capture one or more digits in group 1, then match the closing "
(?:,"XXX\d+")*\] Match the comma-prefixed pattern zero or more times, then the closing ]
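As a rough sketch of how the capture-group pattern behaves, the snippet below runs it through java.util.regex, on the assumption that Hive follows Java regex syntax. The class and method names (CategoryPathDemo, fourthId) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CategoryPathDemo {
    // Same pattern as in the answer, escaped for a Java string literal.
    private static final Pattern FOURTH_ID = Pattern.compile(
        "\"category_path\":\\[\"XXX\\d+\"(?:,\"XXX\\d+\"){2},\"XXX(\\d+)\"(?:,\"XXX\\d+\")*\\]");

    // Returns the digits of the fourth array entry, or null if the text does not match.
    static String fourthId(String text) {
        Matcher m = FOURTH_ID.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text =
            "\"category_path\":[\"XXX1430\",\"XXX109026\",\"XXX3120\",\"XXX4968\",\"XXX377357\"]";
        System.out.println(fourthId(text)); // prints 4968
    }
}
```

In Hive itself the same pattern would presumably be passed to regexp_extract with group index 1; the column holding the text is whatever your table uses.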


replaceAll regex to remove last - from the output

I get close to the desired output, but not exactly. I am using replaceAll with a regex; the sample code is below.
final String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
System.out.println(label.replaceAll(
"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", "$3"));
I want this output:
abc-nyd-request-xyxpt
but I am getting:
abc-nyd-request-xyxpt-
Here is the code: https://ideone.com/UKnepg
You may use this .replaceFirst solution:
String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
label.replaceFirst("(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", "$1");
//=> "abc-nyd-request-xyxpt"
RegEx Details:
(?:[^-]*-){2}: Match 2 repetitions of zero or more non-hyphen characters followed by a hyphen
(.+?): Match 1+ of any characters and capture in group #1
(?:--1)?: Match optional --1
-: Match a -
[^-]+: Match a non-hyphenated string
$: End
The following works for your example case
([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)
https://regex101.com/r/VNtryN/1
The capture group must not end in a hyphen, and -+ allows one or more hyphens after it, which is what lets the pattern match the double --.
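A minimal sketch of that pattern plugged into the original replaceAll call; the class and method names are hypothetical.

```java
class TrailingDashDemo {
    // Group 3 is forced to end in a non-hyphen character;
    // "-+" then absorbs one or more separator hyphens after it.
    static String middle(String label) {
        return label.replaceAll("([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)", "$3");
    }

    public static void main(String[] args) {
        System.out.println(middle("abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"));
        // prints abc-nyd-request-xyxpt
    }
}
```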
With your shown samples and attempts, please try the following regex. It creates one capturing group that can be used in the replacement; use $1 as the replacement in your function.
^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*
Explanation:
^(?:.*?-){2} ## From the start of the value, a non-capturing group lazily matches up to the nearest - and repeats this 2 times.
([^-]*(?:-[^-]*){3}) ## The first and only capturing group: a run of non-hyphen characters, followed by 3 repetitions of a - and another run of non-hyphen characters, which gives the required output.
--.* ## Matches -- and everything after it to the end of the value.
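For illustration, this is how the anchored pattern might be used from Java with replaceFirst. The names are hypothetical, and the sketch assumes the wanted part always has exactly four hyphen-separated pieces before the --, as in the sample.

```java
class AnchoredGroupDemo {
    // Skips the first two hyphen-terminated parts, captures the next four,
    // and discards everything from "--" onward.
    static String middle(String label) {
        return label.replaceFirst("^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(middle("abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"));
        // prints abc-nyd-request-xyxpt
    }
}
```

If the input does not match the pattern, replaceFirst returns it unchanged.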

Regular Expression to Validate Monaco Number Plates

I would like an expression to validate Monaco number plates.
They are written as follows:
A123
123A
1234
I started by doing:
^[a-zA-Z0-9]{1}?[0-9]{2}?[a-zA-Z0-9]{1}$
But that also accepts A12A, which is not a valid plate.
You can use
^(?!(?:\d*[a-zA-Z]){2})[a-zA-Z\d]{4}$
Details:
^ - start of string
(?!(?:\d*[a-zA-Z]){2}) - a negative lookahead that fails the match if, immediately to the right of the current location, there are two occurrences of zero or more digits followed by an ASCII letter
[a-zA-Z\d]{4} - four alphanumeric chars
$ - end of string.
Note that this pattern allows the single letter in any position, so it also accepts strings such as 1A23.
You can write the pattern using 3 alternatives specifying all the allowed variations for the example data:
^(?:[a-zA-Z][0-9]{3}|[0-9]{3}[a-zA-Z]|[0-9]{4})$
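Note that you can omit {1}, because a token matches exactly once by default.
To avoid matching two letters A-Z, you can write the alternation as:
^(?:[a-zA-Z]\d{3}|\d{3}[a-zA-Z\d]|\d[a-zA-Z\d][a-zA-Z\d]\d)$
Be aware that the last alternative still accepts two letters in the middle positions, for example 1AB2.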
In other words, a plate needs 3 consecutive digits plus 1 letter or digit.
You can then use this pattern:
^(?=.?[0-9]{3})[A-Za-z0-9]{4}$
The lookahead (?=.?[0-9]{3}) asserts that 3 consecutive digits start at the first or second position.
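To see where the suggestions in this thread agree and differ, here is a small Java sketch that runs three of them (the negative lookahead, the three-alternative pattern, and the positive lookahead) over a few sample plates. The class, field, and method names are invented for the example, and 1A23 is an extra test input not taken from the question.

```java
import java.util.regex.Pattern;

class PlatePatternsDemo {
    // At most one letter anywhere in four alphanumeric characters.
    static final Pattern NEG_LOOKAHEAD =
        Pattern.compile("^(?!(?:\\d*[a-zA-Z]){2})[a-zA-Z\\d]{4}$");
    // Exactly the three listed shapes: A123, 123A, 1234.
    static final Pattern ALTERNATION =
        Pattern.compile("^(?:[a-zA-Z][0-9]{3}|[0-9]{3}[a-zA-Z]|[0-9]{4})$");
    // Three consecutive digits starting at the first or second position.
    static final Pattern POS_LOOKAHEAD =
        Pattern.compile("^(?=.?[0-9]{3})[A-Za-z0-9]{4}$");

    static boolean matches(Pattern p, String plate) {
        return p.matcher(plate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"A123", "123A", "1234", "A12A", "1A23"};
        for (String s : samples) {
            System.out.printf("%-5s negLookahead=%b alternation=%b posLookahead=%b%n",
                s, matches(NEG_LOOKAHEAD, s), matches(ALTERNATION, s), matches(POS_LOOKAHEAD, s));
        }
    }
}
```

All three accept A123, 123A and 1234 and reject A12A; only the negative-lookahead pattern accepts 1A23, because it limits the number of letters without fixing their position.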

How do I create a regex expression for a 10 digit phone number with the same separator?

I am trying to create a basic regular expression to match a phone number which can either use dots [.] or hyphens [-] as the separator.
The format is 123.456.7890 or 123-456-7890.
The expression I am currently using is:
\d\d\d[-.]\d\d\d[-.]\d\d\d\d
The issue here is that it also matches the phone numbers that have both separators in them which I want to be termed as invalid/not a match. For example, with my expression, 123.456-7890 and 123-456.7890 show up as a match, something I do not want happening.
Is there a way to do that?
Use a backreference:
^\d{3}([.-])\d{3}\1\d{4}$
Here is an explanation of the regex:
^ from the start of the number
\d{3} match any 3 digits
([.-]) then match AND capture either a dot or a dash separator
\d{3} match any 3 digits
\1 match the SAME separator seen earlier
\d{4} match any 4 digits
$ end of the number
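A short Java sketch of the backreference approach, checked against the examples from the question. The question does not name a language, so Java and the names PhoneSeparatorDemo and isValid are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class PhoneSeparatorDemo {
    // \1 must repeat whatever separator group 1 captured.
    private static final Pattern PHONE =
        Pattern.compile("^\\d{3}([.-])\\d{3}\\1\\d{4}$");

    static boolean isValid(String number) {
        return PHONE.matcher(number).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"123.456.7890", "123-456-7890", "123.456-7890", "123-456.7890"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
        // The first two print true, the mixed-separator ones print false.
    }
}
```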
You can use this regex:
^\d{3}([-.])\d{3}\1\d{4}$
The key point is that you capture the separator with a capturing group, ([-.]), and then reuse it with the backreference \1.

How to replace a letter only in letter-and-number combinations between double quotes in Notepad++?

I'm trying to replace 「g」 with 「k」 in an article:
this is "g1"..., and "g2" is......, last "g1034" shows that...
the end result will be
this is "k1"..., and "k2" is......, last "k1034" shows that...
I'm using the following regex to find the pattern :
"([^"]*)"
but I cannot get the replacement to work.
How can I do the replacement? Thanks in advance.
One option is to use
"\Kg(?=\d+")
" Match "
\Kg Forget what is currently matched using \K, then match g
(?=\d+") Positive lookahead, assert what is on the right is 1+ digits and "
And replace with k
Another option could be using capturing groups
(")g(\d+")
In the replacement use the 2 groups
$1k$2
Note that if you do not want to match only digits, you could use [^"]+ instead of \d+ to match at least 1 char after g or use * to match 0 or more chars after g
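The capturing-group variant can be tried outside Notepad++ as well. The sketch below uses Java purely as a stand-in engine (an assumption; Notepad++ has its own regex engine, and Java does not support \K, which is why the two-group form is used). The names are hypothetical.

```java
class QuotedPrefixDemo {
    // Group 1 keeps the opening quote, group 2 keeps the digits and closing quote;
    // only the g between them is replaced by k.
    static String swapPrefix(String text) {
        return text.replaceAll("(\")g(\\d+\")", "$1k$2");
    }

    public static void main(String[] args) {
        String text = "this is \"g1\"..., and \"g2\" is......, last \"g1034\" shows that...";
        System.out.println(swapPrefix(text));
        // prints: this is "k1"..., and "k2" is......, last "k1034" shows that...
    }
}
```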

Regex to extract values from look behind groups along with subsequent repetitions

In a Java program, I need to match a text input against a regular expression pattern. In simplified form, the text input looks like this: "100|200|123,124,125".
The pattern should produce three matches, each returning the two fixed subgroups (100 and 200) plus one value of the repeating subgroup (123, 124, or 125).
Match 1 - 123
Match 2 - 124
Match 3 - 125.
Each of these matches should also include 100 and 200 in two separate groups.
So basically, matches will target extracting patterns such as '100|200|123', '100|200|124', '100|200|125'.
I have used this regex: (?<=(?:(?<first>\d+)\|(?<second>\d+)\|)|,)(?<vardata>\d+)(?=,|$).
But I get this error: + A quantifier inside a look-behind makes it non-fixed width
Java regex does not allow a lookbehind whose length is unbounded, as it is here with \d+.
However you can use this regex based on \G:
(?:(\d+)\|(\d+)\||(?<!^)\G,)(\d+)
RegEx Details:
\G asserts position at the end of the previous match or the start of the string for the first match.
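Looping over the matches gives you each comma-separated number in group(3). group(1) and group(2) hold the first 2 numbers of the input string, but only on the first match; on later matches they are null, so store them when they are first seen.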
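A minimal sketch of such a loop in Java. The class and method names are hypothetical, and the output format with | is just one way to present the three groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContinuedMatchDemo {
    private static final Pattern P =
        Pattern.compile("(?:(\\d+)\\|(\\d+)\\||(?<!^)\\G,)(\\d+)");

    // Builds one "first|second|value" entry per value in the comma-separated tail.
    static List<String> expand(String input) {
        List<String> out = new ArrayList<>();
        String first = null;
        String second = null;
        Matcher m = P.matcher(input);
        while (m.find()) {
            // Groups 1 and 2 only take part in the first match; remember them.
            if (m.group(1) != null) {
                first = m.group(1);
                second = m.group(2);
            }
            out.add(first + "|" + second + "|" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("100|200|123,124,125"));
        // prints [100|200|123, 100|200|124, 100|200|125]
    }
}
```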