Regex on nested arrays? - regex

I've got a string of text that looks like this:
...],null,null,
],
["Tuesday",["8AM–5:30PM"]
,null,null,"2018-09-25",1,[[8,0,17,30]
]
,0]
,["Wednesday",["8AM–5:30PM"]
,null,null,"2018-09-26",1,[[8,0,17,30]
]
,0]
,["Thursday",["8AM–5:30PM"]
,null,null,"2018-09-27",1,[[8,0,17,30]
],x,y,[[[.....
I know this ends with three consecutive left brackets.
I'm writing a regex to grab all the arrays starting from the first day to the end of the array of the last day, but it returns too much.
val regEx = """[a-zA-Z]*(day)(?s)(.*)(\[\[\[\")""".r
I'm using the (?s)(.*) to capture the fact that there can be newlines between day arrays.
This is essentially grabbing everything from the text following the first day rather than stopping at the [[[.
How can I resolve this issue?

Scala regex pattern matching is anchored by default, but your text string doesn't end with the target [[[. There's more after that, so you want the pattern unanchored.
You put the text day in a capture group, which seems rather pointless in that you're losing the part that identifies which day you're starting with.
Why put the closing [[[ in a capture group? I don't see its purpose.
Your regex pattern ends with a double-quote character, ", but that's not in the sample string after the [[[, so this pattern won't match at all, even though you claim it's "grabbing everything ... rather than stopping at the [[[". You should make sure that the code you post fails in the way you describe.
The title of your question mentions "nested arrays" but there are no arrays, nested or otherwise. You have a String that you are trying to parse. Perhaps something like this:
val str = """Tuesday",["8AM–5:30PM"]
,null,null,"2018-09-25",1,[[8,0,17,30]
]
,0]
,["Wednesday",["8AM–5:30PM"]
,null,null,"2018-09-26",1,[[8,0,17,30]
]
,0]
,["Thursday",["8AM–5:30PM"]
,null,null,"2018-09-27",1,[[8,0,17,30]
],x,y,[[[....."""
val regEx = """([a-zA-Z]*day)(?s)(.*)\[\[\[""".r.unanchored
str match {
case regEx(a,b) => s"-->>$a$b<<--"
case _ => "nope"
}
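As a rough illustration of why too much can be returned: a greedy .* runs to the last [[[ in the input, while a lazy .*? stops at the first one. The sketch below is in JavaScript rather than Scala (JavaScript matches are unanchored by default), and the shortened sample text is made up for the example.

```javascript
// Hypothetical, shortened sample: note that "[[[" occurs twice.
const text =
  'null,\n["Tuesday",["8AM"]\n,0]\n,["Thursday",["9AM"]\n],x,y,[[[1,2],[[[3,4';

// The "s" (dotAll) flag lets "." match newlines, like (?s) in the Scala pattern.
const greedy = /([a-zA-Z]*day)(.*)\[\[\[/s;  // .* backtracks from the end of the input
const lazy = /([a-zA-Z]*day)(.*?)\[\[\[/s;   // .*? expands only as far as needed

const greedyMatch = text.match(greedy);
const lazyMatch = text.match(lazy);

console.log(lazyMatch[1]);                                   // Tuesday
console.log(greedyMatch[0].length > lazyMatch[0].length);    // true
```

If the input is guaranteed to contain only one [[[, both forms return the same text, so the lazy form only matters when more [[[ runs can follow.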

I know this ends with three consecutive left brackets.
I'm writing a regex to grab this, but having trouble getting too much
returned
If you just need to grab that [[[, it can be done as below:
val str = """Tuesday",["8AM–5:30PM"]
,null,null,"2018-09-25",1,[[8,0,17,30]
]
,0]
,["Wednesday",["8AM–5:30PM"]
,null,null,"2018-09-26",1,[[8,0,17,30]
]
,0]
,["Thursday",["8AM–5:30PM"]
,null,null,"2018-09-27",1,[[8,0,17,30]
],x,y,[[[....."""
scala> val regEx = """\[\[\[""".r
regEx: scala.util.matching.Regex = \[\[\[
scala> regEx.findFirstIn(str).get
res20: String = [[[
If str contains more than one [[[, you can use regEx.findAllIn(str).toArray, which returns
an Array("[[[", ...):
scala> regEx.findAllIn(str).toArray
res22: Array[String] = Array([[[)

How do I do regex substitutions with multiple capture groups?

I'm trying to allow users to filter strings of text using a glob pattern whose only control character is *. Under the hood, I figured the easiest way to filter the list of strings would be to use Js.Re.test_ (https://rescript-lang.org/docs/manual/latest/api/js/re#test_), and it is (easy).
Ignoring the * on the user filter string for now, what I'm having difficulty with is escaping all the RegEx control characters. Specifically, I don't know how to replace the capture groups within the input text to create a new string.
So far, I've got this, but it's not quite right:
let input = "test^ing?123[foo";
let escapeRegExCtrl = searchStr => {
  let re = [%re("/([\\^\\[\\]\\.\\|\\\\\\?\\{\\}\\+][^\\^\\[\\]\\.\\|\\\\\\?\\{\\}\\+]*)/g")];
  let break = ref(false);
  while (!break.contents) {
    switch (Js.Re.exec_ (re, searchStr)) {
    | Some(result) => {
        let match = Js.Re.captures(result)[0];
        Js.log2("Matching: ", match)
      }
    | None => {
        break := true;
      }
    }
  }
};
input -> escapeRegExCtrl
If I disregard the "test" portion of the string being skipped, the above output will produce:
Matching: ^ing
Matching: ?123
Matching: [foo
With the above example, at the end of the day, what I'm trying to produce is this (with a leading and trailing .*):
.*test\^ing\?123\[foo.*
But I'm unsure how to achieve creating a contiguous string from the matched capture groups.
(echo "test^ing?123[foo" | sed -r 's_([\^\?\[])_\\\1_g' would get the work done on the command line)
EDIT
Based on Chris Maurer's answer, there is a method in the JS library that does what I was looking for. A little digging exposed the ReasonML proxy for that method:
https://rescript-lang.org/docs/manual/latest/api/js/string#replacebyre
Let me see if I have this right; you want to implement a character matcher where everything is literal except *. Presumably the * is supposed to work like that in Windows dir commands, matching zero or more characters.
Furthermore, you want to implement it by passing a user-entered character string directly to a Regexp match function after suitably sanitizing it to only deal with the *.
If I have this right, then it sounds like you need to do two things to get the string ready for Js.Re.test_:
Quote all the special regex characters, and
Turn all instances of * into .* or maybe .*?
Let's keep this simple and process the string in two steps, each one using a regex replace (Js.String.replaceByRe in ReasonML, String.prototype.replace in JavaScript). The special characters in a regex are \ ^ $ . | ? * + ( ) [ ] { }. Suitably quoting all of these except * for replace:
str.replace(/[\[\]\\^$.|?+(){}]/g, '\\$&')
This is just a character class of all those special characters. The $& in the replacement says to insert whatever matched, and the \\ in front of it puts a literal backslash before it.
Then pass that result to a second replace for the * to .*? transformation.
str.replace(/\*+/g, '.*?')
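The two steps described above (quote the special characters, then turn each run of * into a wildcard) can be sketched in JavaScript as follows. The function name globToSource is made up for this example, and the leading and trailing .* follow the output the question asks for.

```javascript
// Hypothetical helper: converts a glob whose only control character is "*"
// into regex source text.
function globToSource(glob) {
  // Step 1: put a backslash before every regex metacharacter except "*".
  const escaped = glob.replace(/[\\^$.|?+()[\]{}]/g, "\\$&");
  // Step 2: turn each run of "*" into "match anything".
  const body = escaped.replace(/\*+/g, ".*");
  // Leading and trailing ".*" as requested in the question.
  return ".*" + body + ".*";
}

console.log(globToSource("test^ing?123[foo")); // .*test\^ing\?123\[foo.*

// Usage: build a RegExp from the source and filter a list of strings.
const filter = new RegExp(globToSource("a*c"));
console.log(["xxabbbcxx", "nothing"].filter(s => filter.test(s)));
```

Because RegExp.prototype.test is unanchored, the surrounding .* is not strictly needed for filtering; it is kept here only to mirror the requested output.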

Building a Regex String - Any assistance provided

I'm very new to regex. I understand its purpose, but I'm still struggling to fully comprehend how to use it. I'm trying to build a regex to pull A8OP2B out of the following (or whatever ends up in that 5th group).
{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}}
The other items in above line, will change in character length, so I cannot say the 51st to the 56th character. It will always be the 5th group in quotation marks though that I want to pull out.
I've tried building up various regex strings, but it's still mostly a foreign language to me and I still have much reading to do on it.
Could anyone provide me a working example with the above, so I can reverse engineer and understand better?
Thanks
Demo 1: Assign the JSON to a variable, then use either dot or bracket notation.
Demo 2: Using RegEx is not recommended, but here's one in JavaScript:
/\b(\w{6})(?=","RfKey":)/g
First Match
non-consuming match: \b
word boundary: \b matches the position between the non-word character " and the word character A
consuming match: A8OP2B
begin capture: (, any word character = \w, 6 times = {6}
end capture: )
non-consuming match: ","RfKey":
lookahead: (?= ... ) asserts that ","RfKey": follows, without consuming it
Demo 1
var obj = {"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}};
var dataDot = obj.RfReceived.Data;
var dataBracket = obj['RfReceived']['Data'];
console.log(dataDot);
console.log(dataBracket)
Demo 2
Note: The string below contains 3 consecutive objects, so 3 matches are expected.
var rgx = /\b(\w{6})(?=","RfKey":)/g;
var str = `{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}},{"RfReceived":{"Sync":8080,"Low":102,"High":1200,"Data":"PFN07U","RfKey":"None"}},{"RfReceived":{"Sync":7580,"Low":471,"High":360,"Data":"XU89OM","RfKey":"None"}}`;
var res = str.match(rgx);
console.log(res);
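If the line arrives as text rather than as an object literal, a minimal sketch of both routes looks like this. It assumes the input is valid JSON; the regex variant keys on the "Data" name instead of assuming a 6-character value, which is a change from the pattern above.

```javascript
const line =
  '{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}}';

// Preferred route: parse the text, then read the property.
const viaJson = JSON.parse(line).RfReceived.Data;

// Regex route: capture whatever sits between the quotes after "Data":
// [^"]* means "any run of characters that are not a double quote".
const found = line.match(/"Data":"([^"]*)"/);
const viaRegex = found ? found[1] : null;

console.log(viaJson);  // A8OP2B
console.log(viaRegex); // A8OP2B
```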

What is the syntax for the pattern in a VBScript RegExp object?

I'm not too familiar with VBScript and am having trouble with the syntax for the Pattern property of my RegExp object.
I have some data that looks like this:
Front SHD Trip CLEAR OBSTRUCTION BEFORE CONTINUING [Table Position =
0mmFwd] Front SHD Trip CLEAR OBSTRUCTION BEFORE CONTINUING [Table
Position = 563mmFwd]
I want to strip off the [Table Position = x] part from these records so I have created a little function. Although there are no errors, it's not stripping off the end of the string as expected and I'm fairly sure that the issue is my syntax in this line:
objRegExp.Pattern = "[*]"
Here's the whole function:
Function RemoveTablePosition(AlarmText)
'Initialise a new RegExp object
Dim objRegExp, strNewAlarmText
Set objRegExp = New Regexp
'Set the RegExp object's parameters
objRegExp.IgnoreCase = True
objRegExp.Global = True
'Look for [Table Position = xx] at the end of the code text (they always follow the same format)
objRegExp.Pattern = "[*]"
'Replace all [Table Position = xx] with the empty string to effectively remove them
strNewAlarmText = objRegExp.Replace(AlarmText, "")
'Return the new alarm text value
RemoveTablePosition = strNewAlarmText
Set objRegExp = Nothing
End Function
Can someone point me in the right direction? Thanks in advance!
"[*]" is a character class matching a * literal character.
You can use
\[[^\]]*]$
or
\[.*?]$
If you need to also match optional whitespace(s) before the [...], add \s* at the pattern start.
Explanation
\[ - literal [ symbol
[^\]]* - zero or more symbols other than ] (if there can be no [ and ], replace this one with [^\][]*)
OR
.*? - 0+ any characters other than a newline as few as possible up to the first...
] - a literal ] symbol that is at the...
$ - end of string
The difference between \[[^\]]*]$ and \[.*?]$ is that the former will also match newlines in between [ and ] (if any) and \[.*?]$ won't.
Assuming you also wish to get rid of the space after "CONTINUING", you would want to search for:
" \[.*\]$"
This looks for the space, left bracket, any number of characters, a right bracket and end of string.
If the right bracket is not end of string, for example:
Test String [REMOVE [THIS] AND CONTINUE]
it will keep matching with .* until it finds \]$ ($ being end of string.)
Wiktor's examples are good, but
"\[[^\]]*]$" leaves the closing ] unescaped. That is legal outside a character class, though "\[[^\]]*\]$" is easier to read. The problem with this pattern is that [^\]]* cannot cross a right bracket, so the example I gave above would fail to match.
His second example, "\[.*?]$", uses the lazy quantifier *? (as few characters as possible). Because the pattern is anchored with $, it matches the same text as "\[.*\]$" on these strings. Use "\[.+\]$" if you wish to ensure there is something within the brackets.
(On its own, ? looks for 0 or 1, + looks for 1 or more, * looks for 0 or more; a ? placed directly after another quantifier makes that quantifier lazy.)
("\[.*\]$" is the more general choice, as it does not assume the brackets are non-empty. ".*" matches 0 or more of any character. Surrounded by \[ and \], it will find anything in a pair of brackets, and ending with the $ ensures it is at the end of the string.)
Hopefully this helps someone - I noted the date before replying. The way the site uses * and hides single \ marks, it could be that Wiktor's examples lost a backslash when he posted them... (More than likely, actually.)
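To compare the two approaches side by side, here is a small sketch in JavaScript, whose regex syntax is close to the VBScript RegExp object's. The function names are made up for the example, and \s* is added so the space before the bracket is removed as well.

```javascript
// Negated-class variant: the bracketed part may not contain "]".
function stripWithNegatedClass(text) {
  return text.replace(/\s*\[[^\]]*\]$/, "");
}

// Greedy variant: everything from the first "[" to a "]" at the end of the string.
function stripGreedy(text) {
  return text.replace(/\s*\[.*\]$/, "");
}

const alarm = "Front SHD Trip CLEAR OBSTRUCTION BEFORE CONTINUING [Table Position = 563mmFwd]";
const nested = "Test String [REMOVE [THIS] AND CONTINUE]";

console.log(stripWithNegatedClass(alarm));  // Front SHD Trip CLEAR OBSTRUCTION BEFORE CONTINUING
console.log(stripGreedy(alarm));            // Front SHD Trip CLEAR OBSTRUCTION BEFORE CONTINUING
console.log(stripWithNegatedClass(nested)); // unchanged: no match
console.log(stripGreedy(nested));           // Test String
```

For the alarm texts in the question both variants give the same result; they only differ when brackets are nested.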

Trying to match a string in the format of domain\username using Lua and then mask the pattern with '#'

I am trying to match a string in the format of domain\username using Lua and then mask the pattern with #.
So if the input is sample.com\admin; the output should be ######.###\#####;. The string can end with either a ;, ,, . or whitespace.
More examples:
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
I tried ([a-zA-Z][a-zA-Z0-9.-]+)\.?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b which works perfectly with http://regexr.com/. But with Lua demo it doesn't. What is wrong with the pattern?
Below is the code I used to check in Lua:
test_text="I have the 123 name as domain.com\admin as 172.19.202.52 the credentials"
pattern="([a-zA-Z][a-zA-Z0-9.-]+).?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b"
res=string.match(test_text,pattern)
print (res)
It is printing nil.
Lua patterns aren't regular expressions, which is why your regex doesn't work.
\b isn't supported; you can use the more powerful %f frontier pattern if needed.
In the string test_text, \ isn't escaped, so \a is interpreted as an escape sequence (the bell character).
. is a magic character in patterns, so it needs to be escaped.
This code isn't exactly equivalent to your pattern; you can tweak it if needed:
test_text = "I have the 123 name as domain.com\\admin as 172.19.202.52 the credentials"
pattern = "(%a%w+)%.?(%w+)\\([%w]+)"
print(string.match(test_text,pattern))
Output: domain com admin
After fixing the pattern, the task of replacing them with # is easy, you might need string.sub or string.gsub.
As already mentioned, pure Lua does not have regex, only patterns.
Your regex however can be matched with the following code and pattern:
--[[
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
]]
s1 = [[sample.net\user1,hello]]
s2 = [[test.org\testuser. Next]]
s3 = [[abc.domain.org\user1]]
function mask_domain(s)
s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
function(a,b,c,d)
return ('#'):rep(#a)..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
end)
return s
end
print(s1,'=>',mask_domain(s1))
print(s2,'=>',mask_domain(s2))
print(s3,'=>',mask_domain(s3))
The last example does not end with ; , . or whitespace. If a terminator must be present, simply remove the final ? from the pattern.
UPDATE: If you also need to reveal any dots in the domain before the last one (e.g. abc.domain.org), you can replace the above function with this one:
function mask_domain(s)
s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
function(a,b,c,d)
a = a:gsub('[^%.]','#')
return a..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
end)
return s
end
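For comparison, since the original regex was tested on a JavaScript-flavoured tool, the same masking can be sketched in JavaScript with a replace callback. The function name maskDomainUser is made up, and the pattern is a simplification that assumes the domain starts with a letter and contains only letters, digits, dots and hyphens.

```javascript
// Hypothetical helper: masks every domain\username occurrence with '#',
// keeping the dots in the domain and the backslash visible.
function maskDomainUser(text) {
  const pattern = /([a-zA-Z][a-zA-Z0-9.-]*)\\([a-zA-Z0-9]+)/g;
  return text.replace(pattern, (whole, domain, user) =>
    // Every non-dot character of the domain (hyphens included) becomes '#'.
    domain.replace(/[^.]/g, "#") + "\\" + "#".repeat(user.length));
}

// "\\" in a JavaScript string literal is a single backslash.
console.log(maskDomainUser("sample.net\\user1,hello"));  // ######.###\#####,hello
console.log(maskDomainUser("test.org\\testuser. Next")); // ####.###\########. Next
```

The trailing ; , . or whitespace needs no special handling here, because the username group simply stops at the first character that is not a letter or digit.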

Parsing Excel reference with regular expression?

Excel returns a reference of the form
=Sheet1!R14C1R22C71junk
("junk" won't normally be there, but I want to be sure that there's no extraneous text.)
I would like to 'split' this into a VB array, where
a(0)="Sheet1"
a(1)="14"
a(2)="1"
a(3)="22"
a(4)="71"
a(5)="junk"
I'm sure it can be done easily with a regular expression, but I just can't get the hang of it.
Is there a kind soul who could help me?
Thanks
=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)
should work.
[^!]+ matches a sequence of non-exclamation-point characters.
\d+ matches a sequence of digits.
.* matches anything.
So, in VB.NET:
Dim a As Match
a = Regex.Match(SubjectString, "=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)")
If a.Success Then
' matched text: a.Value
' backreference n text: a.Groups(n).Value
Else
' Match attempt failed
End If
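The same pattern can be tried out quickly in JavaScript. This is only a sketch: the function name parseR1C1 is made up, and the ^ and $ anchors are an addition so that text before the = makes the match fail.

```javascript
// Hypothetical helper: splits "=Sheet1!R14C1R22C71junk" into its six parts,
// or returns null when the text does not have that shape.
function parseR1C1(reference) {
  const m = reference.match(/^=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)$/);
  // m[0] is the whole match; the capture groups start at index 1.
  return m ? m.slice(1) : null;
}

console.log(parseR1C1("=Sheet1!R14C1R22C71junk"));
// [ 'Sheet1', '14', '1', '22', '71', 'junk' ]
```

When there is no trailing text, the last element is an empty string, which gives a simple way to check for extraneous characters.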
A straightforward String.Split would work, provided the "junk" text wasn't there:
Dim input As String = "=Sheet1!R14C1R22C71"
Dim result = input.Split(New Char() { "="c, "!"c, "R"c, "C"c }, StringSplitOptions.RemoveEmptyEntries)
For Each item As String In result
Console.WriteLine(item)
Next
The regex gets a little tricky since you will need to go through the Groups and Captures of the nested portions to get the proper order.
EDIT: here's my regex solution. It accepts multiple occurrences of R's and C's.
Dim input As String = "=Sheet1!R14C1R22C71junk"
Dim pattern As String = "=(?<Sheet>Sheet\d+)!(?:R(?<R>\d+)C(?<C>\d+))+"
Dim m As Match = Regex.Match(input, pattern)
If m.Success Then
Console.WriteLine(m.Groups("Sheet").Value)
For i = 0 To m.Groups("R").Captures.Count - 1
Console.WriteLine(m.Groups("R").Captures(i).Value)
Console.WriteLine(m.Groups("C").Captures(i).Value)
Next
End If
Pattern explanation:
"=(?<Sheet>Sheet\d+)" : matches an = sign followed by "Sheet" and digits. Uses named group of "Sheet"
"!(?:R(?<R>\d+)C(?<C>\d+))+" : matches the exclamation mark followed by at least one occurrence of the RxxCxx portion of the text. Named groups of "R" and "C" are used.
"(?:...)+" : this portion from the above portion matches but does not capture the inner pattern (i.e., the R/C part). This is to avoid unnecessarily capturing them while we are actually capturing them with the named groups.
More general regexes for R1C1 style:
^=(?:(?<Sheet>[^!]+)!)?(?:R((?<RAbs>\d+)|(?<RRel>\[-?\d+\]))C((?<CAbs>\d+)|(?<CRel>\[-?\d+\]))){1,2}$
And A1 style (match with RegexOptions.IgnoreCase, since the column letters are written as [a-z]):
^=(?:(?<Sheet>[^!]+)!)?(?:(?<Col1>\$?[a-z]+)(?<Row1>\$?\d+))(?:\:(?<Col2>\$?[a-z]+)(?<Row2>\$?\d+))?$
It doesn't match external references like =[Book1]Sheet1!A1 though.