Scala split and line start in the regex

I am trying to split the string into four parts: P, Q, R, S.
The string starts with P, as in the following example:
"P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK".split("[(^?P\\|)][(Q?\\|)]?[(R?\\|)]?[(S?\\|)]") foreach println
gives
VAL1|VAL2|VAL3|BLANK
VAL4|BLANK|BLANK
VAL5|BLANK|VAL6|HEL
BLANK|VAL7
|EDIT|BLANK|VAL8
DK 1.8
BLANK
whereas my expectation is:
VAL1|VAL2|VAL3|BLANK
VAL4|BLANK|BLANK
VAL5|BLANK|VAL6|HELP|BLANK|VAL7
EDIT|BLANK|VAL8|(SDK 1.8)|BLANK
However, evaluating
"P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK".split("[(^P\\|)][(Q?\\|)]?[(R?\\|)]?[(S?\\|)]") (0)
to check the first element of the split gives
res9: String = ""
It seems that the start of the string is not honored here. I tried this on regex101 as well, and it correctly matches P| at the start. However, it also matches P| in |HELP|, so my regex seems to be flawed. My question is: how does the empty string above come into play?

You can use the following regex if having an empty first element of your list is not important:
\\|[QRS]\\||^P\\|
You can replace this regex with \\|[PQRS]\\||^P\\| if you expect other P separators inside the string.
OUTPUT:
"P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK".split("\\|[QRS]\\||^P\\|");
[, VAL1|VAL2|VAL3|BLANK, VAL4|BLANK|BLANK, VAL5|BLANK|VAL6|HELP|BLANK|VAL7, EDIT|BLANK|VAL8|(SDK 1.8)|BLANK]
Otherwise you need to do it in 2 steps:
1. Match and remove the P| at the beginning of your string using ^P\\| and replacing it with nothing.
2. Split the string using the regex \\|[QRS]\\|. You can replace this regex with \\|[PQRS]\\| if you expect other P separators inside the string.
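As a rough illustration of both options, and of where the empty first element comes from, here is a minimal Java sketch (in Scala, split with a String argument is the same java.lang.String.split). The class and method names are made up for this example. A delimiter match of non-zero width at index 0 leaves an empty string in front of it, which is why the one-step version starts with "".

```java
import java.util.Arrays;

// Hypothetical helper class, not part of any library.
final class SectionSplitter {
    // One step: the "P|" at index 0 counts as a delimiter match,
    // so the resulting array starts with an empty string.
    static String[] splitOneStep(String input) {
        return input.split("\\|[QRS]\\||^P\\|");
    }

    // Two steps: strip the leading "P|" first, then split on |Q|, |R| and |S|.
    static String[] splitTwoSteps(String input) {
        return input.replaceFirst("^P\\|", "").split("\\|[QRS]\\|");
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed for the example.
        String s = "P|VAL1|BLANK|Q|VAL4|HELP|R|VAL5|S|EDIT|(SDK 1.8)";
        System.out.println(Arrays.toString(splitOneStep(s)));
        System.out.println(Arrays.toString(splitTwoSteps(s)));
    }
}
```

Neither version treats the P inside |HELP| or the S inside (SDK 1.8) as a separator, because the delimiter must be a single letter enclosed by pipes.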

Here's one approach that defines the delimiter as one of P, Q, R, S enclosed by word boundary \b and optional |:
val s = "P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK"
s.split("""\|?\b[PQRS]\b\|?""").filter(_ != "")
// res1: Array[String] = Array(VAL1|VAL2|VAL3|BLANK, VAL4|BLANK|BLANK, VAL5|BLANK|VAL6|HELP|BLANK|VAL7, EDIT|BLANK|VAL8|(SDK 1.8)|BLANK)
Skip the filter in case you want to include extracted empty strings.

Related

Regex for fetching the substring after nth _

I'm trying to write a regex for getting the substring before the last '_'.
I have written a regex, but it is giving me the last '_' as well.
Regex written :
^((?:[^_]*\_){2})
Input string: harmeet_kaur_abc
Regex output: harmeet_kaur_
Required output: harmeet_kaur
You are including the second _ in the repetition.
You could either match only the first part:
SELECT REGEXP_MATCHES('harmeet_kaur_abc', '^[^_]*_[^_]*')
Or remove starting from the last underscore:
SELECT REGEXP_REPLACE('harmeet_kaur_abc', '_[^_]*$', '')
Both will output
harmeet_kaur
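For readers working outside SQL, the same two ideas can be sketched with Java's regex classes. This is only an illustration; the class and method names are invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class UnderscoreFields {
    private static final Pattern FIRST_TWO = Pattern.compile("^[^_]*_[^_]*");

    // Matches only the first two underscore-separated fields.
    static String firstTwoFields(String s) {
        Matcher m = FIRST_TWO.matcher(s);
        return m.find() ? m.group() : s;
    }

    // Removes everything from the last underscore to the end of the string.
    static String dropLastField(String s) {
        return s.replaceFirst("_[^_]*$", "");
    }

    public static void main(String[] args) {
        System.out.println(firstTwoFields("harmeet_kaur_abc"));
        System.out.println(dropLastField("harmeet_kaur_abc"));
    }
}
```

The two methods agree on three-field input such as harmeet_kaur_abc but differ on longer input: for a_b_c_d the first returns a_b and the second returns a_b_c.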

Regex to find first occurrence of a character in lines matching a character

I am trying to do a find and replace across all properties files in my Eclipse workspace. The condition is that I have to find the lines which contain the character '<' and then match the first "=" character on those lines.
For example
app.searchform.height.label=Height <b>(in cm)</b>
I want to find the = character in the above line and replace it with =<has_html>, so that I get the output below:
app.searchform.height.label=<has_html>Height <b>(in cm)</b>
You may use this regex,
(?=.*<)=
And replace it with,
=<has_html>
Explanation: The positive lookahead ensures the replacement only occurs if a < character follows later in the string. The pattern then matches = and replaces it with =<has_html>.
Demo,
https://regex101.com/r/yLt9j4/1
Edit1:
Here is how you can do it in Java code, replacing only the first occurrence of =:
public class Main {
    public static void main(String[] args) {
        String s = "app.searchform.height.label=Height <b>(in =cm)</b>";
        System.out.println("Before: " + s);
        // replaceFirst stops after the first = that has a < somewhere after it
        s = s.replaceFirst("(?=.*<)=", "=<has_html>");
        System.out.println("After: " + s);
    }
}
This gives the following output:
Before: app.searchform.height.label=Height <b>(in =cm)</b>
After: app.searchform.height.label=<has_html>Height <b>(in =cm)</b>
You can try this one:
^(.*?)=(?=.*<)
With this as the replacement string:
$1=<has_html>
Explanation:
In order to limit matches to one per line, I start the match at the beginning of the line with ^.
Then I use a lazy quantifier to expand over the text, stuffing everything into a capture group, (.*?), to paste back in later.
Then I terminate the expansion on a = character and use a lookahead, (?=.*<), to check for the < character.
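Outside an editor, the anchored pattern can be applied to a whole properties file at once. The sketch below assumes the file content is already in a String and uses Pattern.MULTILINE so that ^ matches at the start of every line; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class HasHtmlMarker {
    // ^ anchors at each line start; (.*?)= lazily finds the first '=' on the line;
    // the lookahead requires a '<' later on the same line (dot does not cross newlines).
    private static final Pattern FIRST_EQUALS =
            Pattern.compile("^(.*?)=(?=.*<)", Pattern.MULTILINE);

    static String mark(String properties) {
        return FIRST_EQUALS.matcher(properties).replaceAll("$1=<has_html>");
    }

    public static void main(String[] args) {
        String content = "app.searchform.height.label=Height <b>(in =cm)</b>\n"
                + "app.title=Plain text\n";
        System.out.print(mark(content));
    }
}
```

Lines without a < are left untouched, and on matching lines only the first = is rewritten because each match has to begin at a line start.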

Regex to match everything from nth occurrence of character onwards [duplicate]

I am trying to build a regex for the sample text below, in which I need to replace the bold text. So far I have come up with
((\|)).*(\|) which selects the whole string between the first and last pipe characters. I am bound to use Apache or Java regex.
Sample string (the text length between pipes may vary):
1.1|ProvCM|111111111111|**10.15.194.25**|10.100.10.3|10.100.10.1|docsis3.0
To match the part after the nth occurrence of a pipe, you can use this regex:
/^(?:[^|]*\|){3}([^|]*)/
Here n=3
It will match 10.15.194.25 in matched group #1
^((?:[^|]*\\|){3})[^|]+
You can use this. Replace by $1<anything>. See demo.
https://regex101.com/r/tP7qE7/4
This captures everything from the start of the string through the third |, that is, three repetitions of "non-pipe characters followed by |", and stores it in $1. The next run of characters up to the following | is the part you want. You can then replace it with anything using $1<textyouwant>.
Here's how you can do the replacement:
String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
int n = 3;
String newValue = "new value";
String output = input.replaceFirst("^((?:[^|]+\\|){"+n+"})[^|]+", "$1"+newValue);
This builds:
"1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0"

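One caveat with the replaceFirst approach above: in a Java replacement string, $ and \ are special, so a new value containing them would be misinterpreted. A minimal sketch of a reusable helper, assuming pipe-delimited input and a zero-based field index n (the class and method names are invented here), could guard against that with Matcher.quoteReplacement:

```java
import java.util.regex.Matcher;

// Hypothetical helper class for illustration only.
final class PipeFields {
    // Replaces the field that follows the nth pipe (n = 0 replaces the first field).
    static String replaceField(String input, int n, String newValue) {
        String regex = "^((?:[^|]*\\|){" + n + "})[^|]*";
        // quoteReplacement makes '$' and '\' in newValue literal.
        return input.replaceFirst(regex, "$1" + Matcher.quoteReplacement(newValue));
    }

    public static void main(String[] args) {
        String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
        System.out.println(replaceField(input, 3, "new value"));
    }
}
```

Using [^|]* rather than [^|]+ in this sketch lets empty fields be skipped over and replaced as well.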
Incorrect use of regex wildcards

Is this not a correct use of wildcards? I'm attempting to match a String that contains a date. I don't want to include the date in the returned String, nor the text that precedes the matched String.
object FindText extends App {
  val toFind = "find1"
  val line = "this is find1 the line 1 \n 21/03/2015"
  val find = (toFind+".*\\d{2}/\\d{2}/\\d{4}").r
  println(find.findFirstIn(line))
}
The output should be: "find1 the line 1 \n "
but the String is not found.
Dot does not match newline characters by default. You can set the DOTALL flag to make it happen. I have also added a positive look-ahead (the (?=...) thingy), since you did not want the date to be included in the match:
val find = (toFind+"""(?s).*(?=\d{2}/\d{2}/\d{4})""").r
(Note also that in Scala you do not need to escape special characters in strings enclosed in triple quotes ... pretty neat.)
The problem lies with the newline in the test string: .* does not match newlines by default. Replacing it with .*\\n?.* should fix it. One could also use the DOTALL flag (?s) in the regex, such as:
val find = ("(?s)"+toFind+".*\\d{2}/\\d{2}/\\d{4}").r
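Since Scala's .r builds on java.util.regex, the DOTALL-plus-lookahead idea can also be sketched in plain Java. The class and method names below are invented, and Pattern.quote is an extra precaution (not in the answers above) so that regex metacharacters in the search term are taken literally.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class FindBeforeDate {
    // Returns the text from toFind up to (but excluding) a dd/mm/yyyy date, if any.
    static Optional<String> textBeforeDate(String toFind, String line) {
        Pattern p = Pattern.compile(
                Pattern.quote(toFind) + ".*(?=\\d{2}/\\d{2}/\\d{4})",
                Pattern.DOTALL); // DOTALL lets '.' match the newline
        Matcher m = p.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(textBeforeDate("find1", "this is find1 the line 1 \n 21/03/2015"));
    }
}
```

Because .* is greedy, the match extends to the last date in the input; switching to .*? would stop at the first one instead.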

Extract number not in brackets from this string using regular expressions [70-(90)]

[15-]
[41-(32)]
[48-(45)]
[70-15]
[40-(64)]
[(128)-42]
[(128)-56]
I have these values, from which I want to extract the value that is not in parentheses. If there is more than one, add them together.
What is the regular expression to do this?
So the solution would look like this:
[15-] -> 15
[41-(32)] -> 41
[48-(45)] -> 48
[70-15] -> 85
[40-(64)] -> 40
[(128)-42] -> 42
[(128)-56] -> 56
You would be overcomplicating things if you went for a regex approach (in this case, at least). Also, regular expressions do not support mathematical operations, as pointed out by @richardtallent.
You can extract the substring without the initial and final square brackets, then use the Split function to split the string in two on the dash. Lastly, use the InStr function to see whether any of the substrings that the split yielded contains a bracket.
Any substring that contains a bracket is omitted from the addition; the others are added up.
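The strip, split and check steps described above can be sketched as follows. The question is about VBA, so this Java version is only an illustration of the logic; it assumes well-formed input of the form [a-b], and the class and method names are invented.

```java
// Hypothetical helper class for illustration only.
final class BracketSum {
    // Sums the dash-separated numbers that are not wrapped in parentheses.
    static int sumUnparenthesized(String token) {
        String trimmed = token.trim();
        // Drop the surrounding square brackets.
        String inner = trimmed.substring(1, trimmed.length() - 1);
        int sum = 0;
        for (String part : inner.split("-")) {
            // Skip empty parts (as in "[15-]") and parenthesized parts.
            if (part.isEmpty() || part.contains("(")) {
                continue;
            }
            sum += Integer.parseInt(part);
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] samples = {"[15-]", "[41-(32)]", "[70-15]", "[(128)-42]"};
        for (String s : samples) {
            System.out.println(s + " -> " + sumUnparenthesized(s));
        }
    }
}
```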
Regular expressions do not support performing math on the terms. You can loop through the groups that are matched and perform the math outside of the regex.
Here's the pattern to extract any number within the square brackets that is not in parentheses:
\[
(?:(?:\d+|\([^\)]*\))-)*
(\d+)
(?:-[^\]]*)*
\]
Each number will be returned in $1.
This works by looking for a number that is prefixed by any number of "words" separated by dashes, where the "words" are either numbers themselves or parenthesized strings, and that is optionally followed by a dash and some other text before the closing bracket.
If VBA's RegEx doesn't support non-capturing groups (?:), remove all of the ?:'s and your captured numbers will be in $3 instead.
A simpler pattern also works:
\[
(?:[^\]]*-)*
(\d+)
(?:-[^\]]*)*
\]
This simply looks for numbers delimited by dashes and allowing for the number to be at the beginning or end.
Private Sub regEx()
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
    Dim RegexObj As New VBScript_RegExp_55.RegExp
    RegexObj.Pattern = "\[(\(?[0-9]*?\)?)-(\(?[0-9]*?\)?)\]"
    Dim str As String
    str = "[15-]"
    Dim Match As Object
    Set Match = RegexObj.Execute(str)
    Dim result As Integer
    Dim value1 As Integer
    Dim value2 As Integer
    Dim part1 As String
    Dim part2 As String
    part1 = Match.Item(0).SubMatches.Item(0)
    part2 = Match.Item(0).SubMatches.Item(1)
    ' InStr returns 0 when "(" is absent. "Not InStr(...)" is a bitwise Not
    ' and is never False here, so compare with 0 explicitly.
    If InStr(1, part1, "(", 1) = 0 And part1 <> "" Then
        value1 = CInt(part1)
    End If
    If InStr(1, part2, "(", 1) = 0 And part2 <> "" Then
        value2 = CInt(part2)
    End If
    result = value1 + value2
    MsgBox result
End Sub
Replace [15-] with each of the other strings in turn.
Ok! It's been 6 years and 6 months since the question was posted. Still, for anyone looking for something like that maybe now or in the future...
Step 1:
Trim Leading and Trailing Spaces, if any
Step 2:
Find/Search:
\]|\[|\(.*\)
Replace With:
<Leave this field Empty>
Step 3:
Trim Leading and Trailing Spaces, if any
Step 4:
Find/Search:
^-|-$
Replace With:
<Leave this field Empty>
Step 5:
Find/Search:
-
Replace With:
\+