RegEx to remove multiline property

I'm porting my system to another data access library. As part of that, I'm using regex to replace or remove some code in my source files (an example is below).
I need to remove everything from IBOQ_OrderingItems.Strings up to the closing ') with a regex, but I can't write a pattern that expresses this condition. My attempts fail to recognize values such as #180'asdf', 'adsf (asdf) asdf', or ' adf '. And when the pattern does match, it deletes the entire content of the file.
object SQLCalcula_umaLinha: TFDQuery
IBOQ_OrderingItems.Strings = (
'sf')
end
object SQLCalcula_VariasLinhas: TFDQuery
IBOQ_OrderingItems.Strings = (
'sfdf'
'sdffs'
'sf')
end
object SQLCalcula_parentesesNoMeio: TFDQuery
IBOQ_OrderingItems.Strings = (
'sfdf'
'sdffs ('' asdf '')'
'sf')
end

I found a solution:
IBOQ_.*.Strings.=.\((\s.[\w|\s|('|')|#|!|$|#||&|*|<|>|=|*|~]*.)+'\)
I hope this helps :)

Or you could try something like
IBOQ_.*\.Strings\s*=\s*\((?:'[^']*'|[^)])*\)
which does it in 288 steps, whereas yours takes 48067 steps ;)
Check it out here at regex101.
Edit: Changed to handle parentheses inside quotes.
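As a rough check, the pattern above can be tried from Python's re module. This is only a sketch: the indentation in the sample, the `[ \t]*` and `\r?\n?` additions that swallow the whole line, and the name `strip_ordering_items` are assumptions for illustration, not part of the original answer.

```python
import re

# The answer's pattern, wrapped so the leading indentation and the trailing
# line break are removed together with the property (an assumption).
PROPERTY_RE = re.compile(
    r"^[ \t]*IBOQ_.*\.Strings\s*=\s*\((?:'[^']*'|[^)])*\)[ \t]*\r?\n?",
    re.MULTILINE,
)

SAMPLE = """\
object SQLCalcula_umaLinha: TFDQuery
  IBOQ_OrderingItems.Strings = (
    'sf')
end
object SQLCalcula_parentesesNoMeio: TFDQuery
  IBOQ_OrderingItems.Strings = (
    'sfdf'
    'sdffs ('' asdf '')'
    #180'asdf')
end
"""


def strip_ordering_items(source: str) -> str:
    """Remove every IBOQ_*.Strings = ( ... ) property from the source text."""
    return PROPERTY_RE.sub("", source)


cleaned = strip_ordering_items(SAMPLE)
print(cleaned)
```

Quoted strings are consumed as a unit by `'[^']*'`, so a `)` inside quotes does not end the match early.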

Splunk rex expression to remove comma if present in json file

I am stuck on a small issue: I need to remove the last character "," (if present) from a JSON log file. I am using it in Splunk.
It seems simple and I was hoping my regex would work, but it is not working.
My attempts:
1. s/\(,$\)?//g
2. s/,$//g
3. s/\(.*\),/\1/
FYI: My JSON file is nested. Along with removing the last character, I am removing some headers and footers from this file and breaking one event into multiple events. Because of the event break, there is a , at the end of each event.
For better understanding, see this link, which I posted on the Splunk Community forum:
https://community.splunk.com/t5/Getting-Data-In/Updated-Help-in-event-break-for-json-file/td-p/569676
Actually, there was an extra space at the end, so the regex below works, but it causes another issue.
Working regex: s/\(,\s$\)//g
The issue arises because I am using it together with other regexes and an event break. Now the event break is not working.
Other regexes:
SEDCMD-removefooter = s/(\]\,).*//g
SEDCMD-removeheader = s/\{\"data\": \[//g
LINE_BREAKER = ([\r\n,]*(?:{[^[{]+\[)?){"links"
I resolved the issue.
Working regex:
SEDCMD-replacequotes = s/'/"/g
SEDCMD-removecomma = s/,\s$//g
SEDCMD-removefooter = s/(\]\,).*//g
SEDCMD-removeheader = s/\{.data.: \[//g
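To illustrate why the trailing space mattered, here is a small Python sketch. Python's re module is not Splunk's SEDCMD engine, and the sample event is made up, so treat it only as a demonstration of the anchoring behaviour.

```python
import json
import re

# Hypothetical event as it might look after line breaking:
# a trailing comma followed by a space.
event = '{"links": {"self": "/a"}, "id": 1}, '

# ",$" requires the comma to be the very last character, so nothing is removed.
unchanged = re.sub(r",$", "", event)

# Allowing optional whitespace between the comma and the end of the string works.
fixed = re.sub(r",\s*$", "", event)

print(unchanged == event)
print(json.loads(fixed))
```

Using `\s*` rather than a single `\s` also covers events with no trailing space or with several.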

Regex to match everything between multiple set of brackets

I am trying to match everything between multiple sets of brackets.
Example data:
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
I need to match everything within the inner brackets as below
42.30722,-83.181125,
42.30722,-83.18112667,
42.30722167,-83.18112667,
42.30721667,-83.181125,
+42.30721667,-83.181125
How do I do that? I tried \[([^\[\]]|)*\] but it gives me values with brackets. Can anybody please help me with this? Thanks in advance.
It seems one of them might be missing a bracket. If not, an expression similar to:
\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?
might be OK to start with.
Test
import re
expression = r"\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?"
string = """
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
"""
print([list(i) for i in re.findall(expression, string)])
print(re.findall(expression, string))
Output
[['42.30722', '-83.181125'], ['42.30722', '-83.18112667'], ['42.30722167', '-83.18112667'], ['42.30721667', '-83.181125'], ['+42.30721667', '-83.181125']]
[('42.30722', '-83.181125'), ('42.30722', '-83.18112667'), ('42.30722167', '-83.18112667'), ('42.30721667', '-83.181125'), ('+42.30721667', '-83.181125')]
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.
A little late, but I figured I would include it anyhow.
Your 3rd set is missing a ']'.
If that is added, then in Alteryx you can just use the Text to Columns tool, splitting to rows and ignoring delimiters in brackets.
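If the missing ']' is restored, a simpler pattern with a capture group also returns the contents without the brackets. The sketch below assumes the corrected data; the names `data`, `INNER`, and `pairs` are illustrative only.

```python
import re

# Assumption: the third pair has been given its missing closing bracket.
data = (
    "[[42.30722,-83.181125],[42.30722,-83.18112667],"
    "[42.30722167,-83.18112667],[42.30721667,-83.181125],"
    "[+42.30721667,-83.181125]]"
)

# The brackets sit outside the capture group, so findall returns
# only what is between them.
INNER = re.compile(r"\[([^\[\]]+)\]")

pairs = INNER.findall(data)
for pair in pairs:
    print(pair)
```

On the original, uncorrected data the third pair would be skipped, because its opening bracket is never closed before the next one starts.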

Regex specific Param from Uri

Simply put, I pull the href prop of a link and need to replace it with a new link when clicked. The new link needs one parameter from the original link (a claim link that opens a new window and claims a task for a user).
Thus far I have a working solution. What I want is for someone to help me refine my RegEx a little.
For links like:
/crm/v2/claimTask?email=example@gmail.com&id=1372365392-1UsIvb-0002qr-Sz
I use:
$(this).prop("href").match(/(email|order|phone|num)=\s*?(.+)&/)[0].replace(/&/, '')
And get:
email=example@gmail.com
What I'd like to do is remove the .replace(/&/, '') and have the regex stop at the & symbol to begin with, but I'm unsure how to do this. Any ideas?
Further examples:
/crm/v2/claimTask?order=123456&id=137236456452-1UweRRwvb-00456jr-Sz
/crm/v2/claimTask?phone=6665554444&id=175655392-4WERTe4-097qt-Da
/crm/v2/claimTask?num=6665554444&id=1372234392-9sfaWa-12374ip-eW
/crm/v2/claimTask?email=email@test.net&id=133453465392-k0wS24S-36735qr-rt
Using:
$(this).prop("href").match(/(email|order|phone|num)=\s*?(.+)&/)
Would yield:
order=123456&
phone=6665554444&
num=6665554444&
email=email@test.net&
Try this:
$(this).prop("href").match(/((email|order|phone|num)=\s*?(.+))&/)[1] //"email=email@test.net"
$(this).prop("href").match(/((email|order|phone|num)=\s*?(.+))&/)[3] //"email@test.net"
The above just puts the part without the & into a capture group. You could also use a positive lookahead:
$(this).prop("href").match(/(email|order|phone|num)=\s*?(.+)(?=&)/) //["email=email@test.net", "email", "email@test.net"]
Just use a lookahead:
(email|order|phone|num)=\s*?(.+)(?=&)
It will not "eat" the ampersand.
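Another option, different from the lookahead shown above, is a negated character class that simply cannot cross an &. The sketch below is in Python rather than the question's JavaScript, and `extract_param` is a hypothetical helper name.

```python
import re

# [^&]* stops at the first ampersand (or at the end of the string),
# so no lookahead or trailing .replace() is needed.
PARAM_RE = re.compile(r"(email|order|phone|num)=([^&]*)")


def extract_param(href):
    """Return 'name=value' for the first known parameter, or None."""
    match = PARAM_RE.search(href)
    return match.group(0) if match else None


print(extract_param("/crm/v2/claimTask?order=123456&id=137236456452-1UweRRwvb-00456jr-Sz"))
print(extract_param("/crm/v2/claimTask?phone=6665554444&id=175655392-4WERTe4-097qt-Da"))
```

Unlike `.+`, this also behaves when the wanted parameter is the last one in the query string or when more than one & follows it.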

Why is it selecting this file?

I have the following statement:
Directory.GetFiles(filePath, "A*.pdf")
.Where(file => Regex.IsMatch(Path.GetFileName(file), "[Aa][i-lI-L].*"))
.Skip((pageNum - 1) * pageSize)
.Take(pageSize)
.Select(path => new FileInfo(path))
.ToArray()
My problem is that the above statement also finds the file "Adali.pdf", which it should not, but I cannot figure out why.
The above statement should only select files starting with a, where the second letter is in the range i-l.
Because it matches Adali by taking the 3rd and 4th characters (al):
Adali
  --
Try using ^ in your regex, which anchors the match to the start of the string (regex cheatsheet):
Regex.IsMatch(..., "^[Aa][i-lI-L].*")
Also, I doubt you need the asterisk at all.
PS: As a side note, this question is not written very well. You should try debugging the code yourself and, in particular, checking your regex against your cases without LINQ. I'm sure this has nothing to do with LINQ (the tag on your question); the issue is about regular expressions (which you didn't mention in the tags at all).
You are not anchoring the regex to the start of the string. This lets it match the al in Adali.pdf.
Change the regex to ^[Aa][i-lI-L].* or, if you only need to test for a match, just ^[Aa][i-lI-L].
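The effect of the anchor can be reproduced outside LINQ. The sketch below uses Python's re.search, which, like Regex.IsMatch, looks for a match anywhere in the string; the file names are made up for illustration.

```python
import re

# Hypothetical file names, all of which would pass the "A*.pdf" glob
# on a case-insensitive file system.
names = ["Adali.pdf", "Aida.pdf", "akira.pdf", "Alpha.pdf", "Amber.pdf"]

unanchored = re.compile(r"[Aa][i-lI-L]")
anchored = re.compile(r"^[Aa][i-lI-L]")

# Without ^ the pattern may match in the middle of the name ("al" in "Adali").
loose = [n for n in names if unanchored.search(n)]

# With ^ the two character classes must be the first two characters.
strict = [n for n in names if anchored.search(n)]

print(loose)
print(strict)
```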
You should do this:
var f = Directory.GetFiles(tb_Path.Text, "A*.pdf").Where(file => Regex.IsMatch(Path.GetFileName(file), "[Aa][i-lI-L].pdf")).ToArray();
With ".*" in the pattern, the regex accepts Adali.

Article spinner with 2 tiers

I made an article spinner that used regex to find words in this syntax:
{word1|word2}
And then split them up at the "|", but I need a way to make it support tier 2 brackets, such as:
{{word1|word2}|{word3|word4}}
What my code does when presented with such a line, is take "{{word1|word2}" and "{word3|word4}", and this is not as intended.
What I want is when presented with such a line, my code breaks it up as "{word1|word2}|{word3|word4}", so that I can use this with the original function and break it into the actual words.
I am using c#.
Here is pseudocode of how it might look:
Check string for regex match to "{{word1|word2}|{word3|word4}}" pattern
If found, store each one as "{word1|word2}|{word3|word4}" in MatchCollection (mc1)
Split the word at the "|" but not the one inside the brackets, and select a random one (aka, "{word1|word2}" or "{word3|word4}")
Store the new results aka "{word1|word2}" and "{word3|word4}" in a new MatchCollection (mc2)
Now search the string again, this time looking for "{word1|word2}" only and ignore the double "{{" "}}"
Store these in mc2.
I cannot split these up normally.
Here is the regex I use to search for "{word1|word2}":
Regex regexObj = new Regex(@"\{.*?\}", RegexOptions.Singleline);
MatchCollection m = regexObj.Matches(originalText); //How I store them
Hopefully someone can help, thanks!
Edit: I solved this using a recursive method. I was building an article spinner btw.
That is not parsable using a regular expression; instead, you have to use a recursive descent parser. Map it to JSON by replacing:
{ with [
} with ]
| with ,
wordX with "wordX" (regex \w+)
Then your input
{{word1|word2}|{word3|word4}}
becomes valid JSON
[["word1","word2"],["word3","word4"]]
and will map directly to PHP arrays when you call json_decode.
In C#, the same should be possible with JavaScriptSerializer.
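The mapping can be sketched in Python with the standard json module (not the PHP or C# calls named above). It assumes every option is a single \w+ word, as in the example; `spin_to_nested_list` is a hypothetical name.

```python
import json
import re


def spin_to_nested_list(spintax):
    """Turn {a|b} syntax into nested Python lists by way of JSON."""
    # Quote every word first, while the braces and bars are still untouched.
    quoted = re.sub(r"\w+", lambda m: json.dumps(m.group(0)), spintax)
    # Then swap the delimiters for their JSON equivalents.
    as_json = quoted.replace("{", "[").replace("}", "]").replace("|", ",")
    return json.loads(as_json)


nested = spin_to_nested_list("{{word1|word2}|{word3|word4}}")
print(nested)
```

Options containing spaces or punctuation would need a different quoting step, so this only fits inputs shaped like the example.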
I'm really not completely sure WHAT you're asking for, but I'll give it a go:
If you want to get {word1|word2}|{word3|word4} out of any occurrence of {{word1|word2}|{word3|word4}} but not {word1|word2} or {word3|word4}, then use this:
@"\{(\{[^}]*\}\|\{[^}]*\})\}"
...which will match {{word1|word2}|{word3|word4}}, but with {word1|word2}|{word3|word4} in the first matching group.
I'm not sure if this will be helpful or even if it's along the right track, but I'll try to check back every once in a while for more questions or clarifications.
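For a quick check of that pattern, here is the same expression in Python's re module instead of C#. The sample text is made up, and the pattern only handles exactly two inner groups, as written.

```python
import re

# Same pattern as above: the outer braces stay outside the capture group.
TWO_TIER = re.compile(r"\{(\{[^}]*\}\|\{[^}]*\})\}")

text = "a {x|y} b {{word1|word2}|{word3|word4}} c"

match = TWO_TIER.search(text)
inner = match.group(1) if match else None
print(inner)

# A plain single-tier group on its own is not matched.
print(TWO_TIER.search("{word1|word2}"))
```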
s = "{Spinning|Re-writing|Rotating|Content spinning|Rewriting|SEO Content Machine} is {fun|enjoyable|entertaining|exciting|enjoyment}! try it {for yourself|on your own|yourself|by yourself|for you} and {see how|observe how|observe} it {works|functions|operates|performs|is effective}."
print spin(s)
If you want to use the [square|brackets|syntax] use this line in the process function:
'/\[(((?>[^\[\]]+)|(?R))*)\]/x',
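For nested groups of any depth, one simple approach is to resolve the innermost {...} groups repeatedly until none are left. This is a minimal Python sketch of that idea, not the recursive method mentioned in the question's edit nor the PHP function referred to above; the `rng` parameter is an assumption added to make the choice testable.

```python
import random
import re

# Matches a group that contains no further braces, i.e. an innermost group.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin(text, rng=random):
    """Resolve nested {a|b} groups from the inside out."""
    while True:
        # Replace each innermost group with one of its options.
        text, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), text
        )
        if count == 0:  # no groups left, so the text is fully spun
            return text


print(spin("{{word1|word2}|{word3|word4}} is {fun|enjoyable}!"))
```

Each pass peels off one level of nesting, so `{{word1|word2}|{word3|word4}}` first becomes something like `{word2|word3}` and then a single word.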