I want to create a rule to remove array( and ) from this text:
"price"=> array(129),
to get:
"price"=> 129,
I tried this expression without success:
(?<="price"=>\s*)array\((?=\d*)\)(?=,)
Then I decided to do the replacement in two steps. First, I removed array(:
(?<="price"=>\s\s\s\s\s)array\(
And got:
"price"=> 129),
That left only the closing parenthesis ) to remove. I tried this, without success:
(?<="price"=>\s*\d*)\)(?=,)
This works, but only for a known number of whitespaces and digits:
(?<="price"=>\s\s\s\s\s\d\d\d)\)(?=,)
Try this for the find:
("price"=>\s+)array\((\d+)\)
and this for the replace:
\1\2
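A minimal JavaScript sketch of this find/replace, assuming it runs through String.prototype.replace. In a JavaScript replacement string the two groups are written $1 and $2 rather than \1 and \2:

```javascript
// Group 1 keeps the "price"=> prefix plus its whitespace, group 2 keeps the digits;
// "array(" and ")" are matched but not captured, so they drop out of the result.
const input = '"price"=> array(129),';
const output = input.replace(/("price"=>\s+)array\((\d+)\)/g, "$1$2");

console.log(output); // "price"=> 129,
```

Because the prefix is captured instead of being checked with a look-behind, this works the same way no matter how many spaces follow =>.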
You can match the whole line with this:
\"price"[^a)]+(array\()\d+(\),)
It contains one group for "array(" and another for "),".
Try this:
(?:(?<=\"price\"=>\s*)array\((?=\d+\)))|(?<=\"price\"=>\s*array\(\d+)\)
The regex consists mainly of two parts (the pipe in the middle is the alternation operator: if the first part doesn't match, the engine tries the second part).
The first part checks that array( is preceded by "price"=> plus optional whitespace, and is followed by digits and a ), using a look-behind (?<= ... ) and a look-ahead (?= ... ) respectively:
(?:(?<=\"price\"=>\s*)array\((?=\d+\)))
Then comes the pipe (explained above):
|
The second part checks whether ) is preceded by the text matched so far, e.g. "price"=> array(129, again using a look-behind (?<= ... ):
(?<=\"price\"=>\s*array\(\d+)\)
Thus for the string "price"=> array(129), the result should be two matches: array( and ).
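A quick way to check the two-match claim, sketched in JavaScript (V8/Node), which accepts the variable-length look-behinds this pattern relies on; not every regex engine does. The \" escapes and the outer (?: ) group are dropped because a JavaScript regex literal doesn't need them:

```javascript
// Alternative 1: "array(" preceded by "price"=> and followed by digits and ")".
// Alternative 2: ")" preceded by "price"=> array(<digits>.
const pattern = /(?<="price"=>\s*)array\((?=\d+\))|(?<="price"=>\s*array\(\d+)\)/g;
const text = '"price"=> array(129),';

console.log(text.match(pattern));       // [ 'array(', ')' ]
console.log(text.replace(pattern, "")); // "price"=> 129,
```

Replacing both matches with an empty string performs the whole edit in a single pass.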
Please let me know if this works for you.
I am trying to implement a RegEx that will get all the occurrences of markdown links in this format [link_description](link_destination).
However, I have some requirements:
link destination MUST HAVE a space
link destination MUST NOT start with <
I got to this RegEx:
REGEX = /
(?<description>\[.*?\])
\(
(?<destination>
(?!<) # Do not start with a less-than symbol
.*\s.* # Have at least one whitespace character
)
\)
/x.freeze
It works great when there is only one occurrence, such as:
'[Contact us](mailto:foo#foo space)'.scan(REGEX)
=> [["[Contact us]", "mailto:foo#foo space"]]
However, this is the current output for multiple occurrences:
"[Contact us](mailto:foo#foo space>) [Contact us](mailto:foo#foo space>)"
=> [["[Contact us]", "mailto:foo#foo space>) [Contact us](mailto:foo#foo space>"]]
Expected output:
"[Contact us](mailto:foo#foo space>) [Contact us](mailto:foo#foo space>)"
=> [["[Contact us]", "mailto:foo#foo space>"], ["[Contact us]", "mailto:foo#foo space>"]]
I tried changing it by adding [^)] to the end of the second capture group, but it still fails:
REGEX = /
(?<description>\[.*?\])
\(
(?<destination>
(?!<) # Do not start with a less-than symbol
.*\s.*
[^)]
)
\)
/x.freeze
What am I doing wrong?
The issue is that the quantifiers in the second capture group (?<destination>.*\s.*[^)]) are greedy, so the group matches everything up to the last ) in the input string, which is not what you want. To fix this, use non-greedy quantifiers (.*?\s.*?) so they match as few characters as possible and stop at the first closing parenthesis ) after the space.
This should give you the expected output for multiple occurrences.
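The question's code is Ruby; the sketch below uses JavaScript instead, with the same pattern written on one line because JavaScript has no x flag. Named groups, lazy quantifiers and the (?!<) look-ahead should behave the same way in Ruby's engine for this input, but that equivalence is an assumption:

```javascript
// .*? is lazy: the destination grows only until a whitespace character and then
// the next ")" can match, instead of running on to the last ")" in the string.
const LINK = /(?<description>\[.*?\])\((?<destination>(?!<).*?\s.*?)\)/g;

const input = "[Contact us](mailto:foo#foo space>) [Contact us](mailto:foo#foo space>)";
const links = [...input.matchAll(LINK)].map((m) => [m.groups.description, m.groups.destination]);

console.log(JSON.stringify(links));
// [["[Contact us]","mailto:foo#foo space>"],["[Contact us]","mailto:foo#foo space>"]]
```

Lazy quantifiers are not a complete guard: if one link's destination contains no whitespace, the engine can still extend .*? past that link's ) into the next link. Negated classes such as [^)] are a stricter option in that case.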
I am trying to capture some "blocks" of my text file that are terminated by a line of several "=" symbols.
I want to capture all these blocks without that final line, but "=" is also allowed in one of my capture groups, so when I select them, the "=" line always ends up in the last match.
Do you know a way to exclude it?
An extract of my regex:
(\d{2}-\d{2}-\d{4} \d{2}:\d{2}) (.*)(Statut)([,:. aA-zZ0-9À-ÖØ-öø-ÿ=><\n\r]*)\n
And the block to analyse:
01-10-2021 16:02 utilisateur1Statut A réaliser =>
Ouverte
01-10-2021 16:03 utilisateur1Statut MyFile.txt
01-10-2021 16:04 utilisateur1Statut
utilisateur1 => utilisateur2
======================================================================
Note: a block can span one or more lines.
Link to the regex101 sample: https://regex101.com/r/hXu3QO/1
The last part of the pattern contains a character class [,:. aA-zZ0-9À-ÖØ-öø-ÿ=><\n\r] that also matches = and newlines, so there is no rule to stop matching.
Note that aA-zZ is not the same as a-zA-Z: the range A-z also includes the characters [ \ ] ^ _ and ` that sit between Z and a in ASCII.
You can exclude the newlines from the character class, and then repeat the match: a newline followed by a line that does not start with, for example, === or \d{2}-
You can make the rule as specific as you want of course.
(\d{2}-\d{2}-\d{4} \d{2}:\d{2}) (.*?)(Statut)\s*([,:. a-zA-Z0-9À-ÖØ-öø-ÿ=><]*(?:\n(?!===|\d{2}-)[,:. a-zA-Z0-9À-ÖØ-öø-ÿ=><]+)*)
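A small JavaScript check of the suggested pattern against the sample block, assuming the file uses \n line endings. If it uses \r\n, the \n in the pattern would likely need to become \r?\n, since \r is no longer in the character class:

```javascript
// Sample from the question, joined with "\n" line endings (assumption).
const log = [
  "01-10-2021 16:02 utilisateur1Statut A réaliser =>",
  "Ouverte",
  "01-10-2021 16:03 utilisateur1Statut MyFile.txt",
  "01-10-2021 16:04 utilisateur1Statut",
  "utilisateur1 => utilisateur2",
  "======================================================================",
].join("\n");

// Group 4 takes the rest of the "Statut" line, then keeps taking whole lines
// as long as the next line does not start with "===" or "dd-".
const BLOCK =
  /(\d{2}-\d{2}-\d{4} \d{2}:\d{2}) (.*?)(Statut)\s*([,:. a-zA-Z0-9À-ÖØ-öø-ÿ=><]*(?:\n(?!===|\d{2}-)[,:. a-zA-Z0-9À-ÖØ-öø-ÿ=><]+)*)/g;

const blocks = [...log.matchAll(BLOCK)].map((m) => m[4]);
console.log(JSON.stringify(blocks));
// ["A réaliser =>\nOuverte","MyFile.txt","utilisateur1 => utilisateur2"]
```

The "=" line never lands in a match because the (?!===...) look-ahead stops the repetition just before it.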
I'm using the following regular expression pattern:
.*(?<line>^\s*Extends\s+#(?<extends>[_A-Za-z0-9]+)\s*$)?.*
And the following text:
Name #asdf
Extends #extendedClass
Origin #id
What I don't understand is that both captured groups (line and extends) are empty, but when I remove the last question mark from the expression, the groups are captured.
The line group must be optional since the Extends line is not always present.
I created a fiddle using this expression, which can be accessed at https://regexr.com/4rekk
EDIT
I forgot to mention that I'm using the multiline and dotall flags along with the expression.
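It's already been mentioned that the leading .* is capturing everything when you make your (?<line>) group optional: with the s flag, .* also matches newlines, so it runs to the end of the text, and the optional group is then satisfied by matching nothing. The following is not directly related to your question, but it may be useful information (if not, just ignore it):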
You need to be careful elsewhere. You are using ^ and $ to match the start and end of lines as well as the start and end of the string. But the $ character will not consume the newline character that marks the end of a line. So:
'Line 1\nLine 2'.match(/^Line 1$^Line 2/m) returns null
while
'Line 1\nLine 2'.match(/^Line 1\n^Line 2/m) returns a match
So in your case if you were trying to capture all three lines, any of which were optional, you would write the regex for one of the lines as follows to make sure you consume the newline:
/(?<line>^\s*Extends\s+#(?<extends>[_A-Za-z0-9]+)[^\S\n]*\n)?/ms
Where you had specified \s*$, I have [^\S\n]*\n. [^\S\n]* is a double negative: it matches zero or more characters that are neither non-whitespace nor a newline, i.e. any whitespace character except the newline. So it consumes trailing spaces and tabs and leaves the newline for the explicit \n. If you wanted to look for any of the three lines in your example (any or all of which are optional), the following code snippet should do it. I have used the RegExp constructor to create the regex so that it can be split across multiple lines. Unfortunately, it takes a string as its argument, so backslashes have to be doubled:
let s = ` Name #asdf
Extends #extendedClass
Origin #id
`;
let regex = new RegExp(
"(?<line0>^\\s*Name\\s+#(?<name>[_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(?<line>^\\s*Extends\\s+#(?<extends>[_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(?<line2>^\\s*Origin\\s+#(?<id>[_A-Za-z0-9]+)[^\\S\\n]*\\n)?",
'm'
);
let m = s.match(regex);
console.log(m.groups);
The above code snippet seems to have a problem under Firefox (an invalid regex flag, 's', is flagged on a line that doesn't exist in the above snippet).
And without named capture groups:
let s = ` Name #asdf
Extends #extendedClass
Origin #id
`;
let regex = new RegExp(
"(^\\s*Name\\s+#([_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(^\\s*Extends\\s+#([_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(^\\s*Origin\\s+#([_A-Za-z0-9]+)[^\\S\\n]*\\n)?",
'm'
);
let m = s.match(regex);
console.log(m);
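The behavior from the original question can be reproduced directly in JavaScript (assuming that is the flavor selected on regexr). The second pattern is a hypothetical rewrite, not taken from the answers above: it wraps a lazy .*? and the line group in one optional group, so the engine searches forward for the Extends line before giving up on it:

```javascript
const text = "Name #asdf\nExtends #extendedClass\nOrigin #id";

// Original pattern (m + s flags): the greedy leading .* runs to the end of the text,
// the optional group then matches nothing, and the trailing .* matches the empty rest.
const original = /.*(?<line>^\s*Extends\s+#(?<extends>[_A-Za-z0-9]+)\s*$)?.*/ms;
console.log(text.match(original).groups.extends); // undefined

// Hypothetical fix: the outer ? is greedy, so the group is tried first and .*? advances
// only until the Extends line matches; with no such line, the whole group is skipped.
const fixed = /(?:.*?(?<line>^\s*Extends\s+#(?<extends>[_A-Za-z0-9]+)\s*$))?.*/ms;
console.log(text.match(fixed).groups.extends); // extendedClass
console.log("Name #asdf\nOrigin #id".match(fixed).groups.extends); // undefined
```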
I have multiple square-bracketed values in a Splunk log file. I am trying to find a particular field named UserDataGuid and then capture the data in the brackets after it. My only option seems to be regular expressions, in a flavor that looks similar to Perl to me. Yet it does not work; what am I doing wrong here?
| rex "\]\s(?<UserDataGuid>.*?)\s*$"
// This attempt looks more promising, but it grabs everything up to the last bracket :( and doesn't name the field, which I need for a subsearch.
| rex "(?i)UserDataGuid\s*\[([^\}]*)\]"
The data looks like this:
[21] INFO UserDataGuid [fas08f0da-faf6-4308-aad6-hfld5643gs] [(null)] [(null)] [(null)]
and I want only the GUID:
fas08f0da-faf6-4308-aad6-hfld5643gs
I would also like it to be a field I can reuse, the way fields are normally used in Splunk.
It looks like you want
(?<=UserDataGuid\s\[)([^\]]*)
I'd try the following regex:
(?<=UserDataGuid \[).*?(?=\])/g
This will capture fas08f0da-faf6-4308-aad6-hfld5643gs.
With
\]\s(?<UserDataGuid>.*?)\s*$
you are saying: match a ] (\]), followed by exactly one whitespace character (\s), followed by a group named UserDataGuid ((?<UserDataGuid> ... )) that contains any character except newline, zero or more times, lazily (.*?), followed by zero or more whitespace characters (\s*), followed by the end of the string ($).
I don't think that is what you want. You want to match the literal text UserDataGuid somewhere, not to give the name UserDataGuid to a group that matches "any character except newline, zero or more times, lazily" (.*?).
In
(?i)UserDataGuid\s*\[([^\}]*)\]
change the } to a ], and your GUID will be captured in group #1.
But you don't need to consume "UserDataGuid\s*\[" at all; you could use a look-behind:
(?<=UserDataGuid \[)([^\]]*)
Then you match only the GUID, and you find it in group #1.
You can even remove the parentheses of group #1, because the GUID is the whole match:
(?<=UserDataGuid \[)[^\]]*
https://regex101.com/r/sI3kW4/1
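Splunk can't be run here, so the two final patterns are checked below in JavaScript, which handles them the same way for this line. In Splunk itself, rex creates a field from each named capture group, so a named-group version along the lines of | rex "UserDataGuid\s*\[(?<UserDataGuid>[^\]]*)\]" is what should give a reusable UserDataGuid field. That command is a suggestion and has not been tested against a Splunk instance:

```javascript
const line =
  "[21] INFO UserDataGuid [fas08f0da-faf6-4308-aad6-hfld5643gs] [(null)] [(null)] [(null)]";

// Look-behind version: the whole match is the GUID.
const guid = line.match(/(?<=UserDataGuid \[)[^\]]*/)[0];

// Named-group version: [^\]]* cannot run past the first "]",
// so the later "[(null)]" brackets are never reached.
const named = line.match(/UserDataGuid\s*\[(?<UserDataGuid>[^\]]*)\]/).groups.UserDataGuid;

console.log(guid);  // fas08f0da-faf6-4308-aad6-hfld5643gs
console.log(named); // fas08f0da-faf6-4308-aad6-hfld5643gs
```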
I can't seem to make this regex work.
The input is as follows. It's really on one line, but I have inserted line breaks after each \r\n to make it easier to read, so no check for whitespace characters is needed.
01-03\r\n
01-04\r\n
TEXTONE\r\n
STOCKHOLM\r\n
350,00\r\n ---- 350,00 should be the last value in the first match
12-29\r\n
01-03\r\n
TEXTTWO\r\n
COPENHAGEN\r\n
10,80\r\n
This could go on with another 01-31 and 02-01, marking another new match (these are dates).
I would like to have a total of 2 matches for this input.
My problem is that I can't figure out how to look ahead and match the start of a new match (two consecutive dates) without including those dates in the first match. They should belong to the second match.
It's hard to explain, but I hope someone will understand.
This is what I have so far, but it's not even close:
(.*?)((?<=\\d{2}-\\d{2}))
The matches I want are:
1: 01-03\r\n01-04\r\nTEXTONE\r\nSTOCKHOLM\r\n350,00\r\n
2: 12-29\r\n01-03\r\nTEXTTWO\r\nCOPENHAGEN\r\n10,80\r\n
After that I can easily separate the columns with \r\n.
Would this more explicit pattern work for you?
(\d{2}-\d{2})\r\n(\d{2}-\d{2})\r\n(.*)\r\n(.*)\r\n(\d+(?:,?\d+))
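A JavaScript check of this explicit pattern, assuming the \r\n in the question are real CR LF characters rather than the literal text backslash-r-backslash-n. In JavaScript, . does not match \r or \n, so each (.*) stays on its own line:

```javascript
// Assumption: the input contains real CR LF characters.
const input =
  "01-03\r\n01-04\r\nTEXTONE\r\nSTOCKHOLM\r\n350,00\r\n" +
  "12-29\r\n01-03\r\nTEXTTWO\r\nCOPENHAGEN\r\n10,80\r\n";

const RECORD = /(\d{2}-\d{2})\r\n(\d{2}-\d{2})\r\n(.*)\r\n(.*)\r\n(\d+(?:,?\d+))/g;

for (const m of input.matchAll(RECORD)) {
  // m[1]..m[5]: first date, second date, text, city, amount
  console.log(m.slice(1).join(" | "));
}
// 01-03 | 01-04 | TEXTONE | STOCKHOLM | 350,00
// 12-29 | 01-03 | TEXTTWO | COPENHAGEN | 10,80
```

The trade-off is rigidity: the pattern assumes exactly two text lines between the dates and the amount.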
Here's another option for you to try:
(.+?)(?=\d{2}-\d{2}\\r\\n\d{2}-\d{2}|$)
/
\G
(
(?:
[0-9]{2}-[0-9]{2}\r\n
){2}
(?:
(?! [0-9]{2}-[0-9]{2}\r\n ) [^\n]*\n
)*
)
/xg
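JavaScript has no \G anchor; the closest equivalent is the sticky flag y, which forces each match to start exactly where the previous one ended. A sketch of the same pattern under that substitution, assuming real CR LF characters in the input. The outer capturing group is dropped because the whole match is the record, and the pattern sits on one line because JavaScript has no x flag:

```javascript
// Assumption: the input contains real CR LF characters.
const input =
  "01-03\r\n01-04\r\nTEXTONE\r\nSTOCKHOLM\r\n350,00\r\n" +
  "12-29\r\n01-03\r\nTEXTTWO\r\nCOPENHAGEN\r\n10,80\r\n";

// Two date lines, then any number of lines that do not start another date pair.
// g + y: collect all matches, each anchored where the previous one ended (like \G).
const RECORD = /(?:[0-9]{2}-[0-9]{2}\r\n){2}(?:(?![0-9]{2}-[0-9]{2}\r\n)[^\n]*\n)*/gy;

const records = input.match(RECORD);
const columns = records.map((r) => r.split("\r\n").filter(Boolean));

console.log(records.length); // 2
console.log(columns[1].join(" ")); // 12-29 01-03 TEXTTWO COPENHAGEN 10,80
```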
Why do so much work?
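use strict;
use warnings;
# Split before every pair of dates, then split each record on the literal \r\n text.
my $string = q(01-03\r\n01-04\r\nTEXTONE\r\nSTOCKHOLM\r\n350,00\r\n12-29\r\n01-03\r\nTEXTTWO\r\nCOPENHAGEN\r\n10,80\r\n);
for (split /(?=(?:\d{2}-\d{2}\\r\\n){2})/, $string) {
    print join("\t", split /\\r\\n/), "\n";
}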
Output:
01-03 01-04 TEXTONE STOCKHOLM 350,00
12-29 01-03 TEXTTWO COPENHAGEN 10,80