wildcard dot-star to ignore delimiter - regex

I am trying to write a splunk query that basically extracts fields between delimiter "," and puts the value in a variable.
I want to extract everything between 2 commas.
,extract this##$,
,extract this-=##$, etc
There is no option to skip the regular expression and just get everything between , and ,.
So the closest I can think of is ,.*, but .* is greedy and would match everything else.
Is there a regular expression I can use to stop wildcard matching after a , is encountered?
I tried
.*$,
.*,
which didn't work.

Not sure about splunk but in most languages you could try
/\,([^,]*)\,/
-- match all characters that aren't a comma. extraction will be in \1. Not sure if you'll need the escapes in front of the commas.
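A minimal Python sketch of the same idea, in case it helps to see the negated character class at work. The sample line is made up for illustration (only the field values come from the question), and this is Python's re module, not Splunk's rex. A lazy quantifier is shown as an alternative.

```python
import re

# Hypothetical sample line; only the field values come from the question.
line = "a,extract this##$,b,extract this-=##$,c"

# [^,]* matches any run of non-comma characters, so it cannot run past a delimiter.
between = re.search(r",([^,]*),", line).group(1)

# Alternative: a lazy quantifier stops at the first comma that lets the match succeed.
lazy = re.search(r",(.*?),", line).group(1)

print(between)
print(lazy)
```

Note that no escaping is needed in front of the commas here; a comma has no special meaning in this regex flavor.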

Use the following regex in your Splunk search query; it will extract the required field as "FIELDNAME"
| rex "(?i)\,(?P<FIELDNAME>[^ ][a-z0-9#-#$ ]+)\,\s"
I added "\s" on the assumption that a space follows the "," after the field; it will also work without "\s".

Related

Regex: How to find a string, then get characters on either side up to a delimiter?

I have a string like so:
foobar_something_alt=\"Brownfields1.png#asset:919\" /><p>MSG participat
And wish to find all occurrences via the substring #asset: then select the characters around the match up to the quote marks.
Trying to extract specific ALT tags from a SQL dump. Is this possible with a regular expression?
Put [^"]* before and after the string you want to match. This will match any sequence of characters that aren't ".
[^"]*#asset:[^"]*
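As a rough illustration in Python, assuming the quote marks in the dump are plain " characters (the backslash-escaped form from the question would leave a trailing backslash in the match):

```python
import re

# Sample adapted from the question, with the quotes unescaped (an assumption).
text = 'foobar_something_alt="Brownfields1.png#asset:919" /><p>MSG participat'

# [^"]* on both sides grows the match outward until it hits a quote mark.
matches = re.findall(r'[^"]*#asset:[^"]*', text)

print(matches)
```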

Regex Search and replace, if between ," and ", is a comma

How can I replace a comma with a dot in Visual Basic?
If there is a comma between ," and ", it should be replaced with a dot.
For example, in the first row, replace "482,5" with "482.5"
"Peter",1,1,1,1,500,"500",631,"631",19,"482,5",1
"Peter",1,1,1,1,500,"500",631,"631",19,"482,5",2
"Peter",1,1,1,1,1984,"1984",635,"635",4,"101,5",3
"Peter",1,1,1,1,500,"500",2000,"2000",19,"482,5",4
"Peter",1,1,1,1,500,"500",1962,"1962",18,"457",5
"Peter",1,1,1,1,486,"486",613,"613",18,"457",6
"Peter",1,1,1,1,1016,"1016",322,"322",19,"482,5",7
"Peter",1,1,1,1,933,"933",444,"444",16,"406,5",8
"Peter",1,1,1,1,250,"250",476,"476",16,"406,5",9
"Peter",1,1,1,1,250,"250",476,"476",16,"406,5",10
"Peter",1,1,1,1,234,"234",933,"933",16,"406,5",11
"Peter",1,1,1,1,250,"250",965,"965",16,"406,5",12
In general I suggest to parse the csv with a CSV parser because the CSV format is way more complicated than it seems to be. Just see RFC 4180 for details. The ideal solution would identify the problematic columns, and then replace the text in those columns only.
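For comparison, here is a sketch of the parser-based approach in Python rather than Visual Basic, using the standard csv module. The column index 10 is an assumption read off the sample rows; a real file would need its own check of which columns hold decimal commas.

```python
import csv
import io

# One sample row from the question.
raw = '"Peter",1,1,1,1,500,"500",631,"631",19,"482,5",1\n'

rows = []
for row in csv.reader(io.StringIO(raw)):
    # Assumption: index 10 is the column holding the decimal-comma value.
    row[10] = row[10].replace(",", ".")
    rows.append(row)

print(rows[0][10])
```

The parser deals with the quoting, so the replacement only ever touches the one field.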
The regex approach must make some assumptions. I.e. the regex approach will work in some cases, and will not work in others.
Probably some people can write a really advanced regex that handles csvs correctly. But they are hard to understand and difficult to maintain. Let's just make assumptions here:
The only text delimiter that we care about is ". I.e. no ' -s.
There are no quotes within fields. They would look like this: "asd""ghi". Here is a more confusing example: "asd"",".
So the regex is:
((?:^|,)"[^",]*),
And the replacement is: $1.
Explanation:
(?:...) is a non-capturing group
(?:^|,) matches either start of line, or a comma
then comes the " to match the starting quote
[^",]* matches everything that's neither a quote nor a comma. So it prevents matching through several fields.
finally, it matches a comma: ,
the parentheses (...) capture the stuff inside. I.e. everything before the comma.
In the replacement $1 refers to the captured group. I.e. the replacement is the matched stuff, and then a dot. The closing comma was not in the group, so this is how the replacement goes.
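A quick way to check this behaviour is a small Python sketch. It uses the pattern with an outer capturing group around everything before the comma, as the explanation describes, and writes the backreference as \1 instead of $1 because that is Python's syntax. Only two of the sample rows are used.

```python
import re

# Two sample rows from the question: one with a decimal comma, one without.
rows = (
    '"Peter",1,1,1,1,500,"500",631,"631",19,"482,5",1\n'
    '"Peter",1,1,1,1,500,"500",1962,"1962",18,"457",5'
)

# Group 1 is everything before the comma inside the quoted field.
# re.MULTILINE lets ^ match at the start of every line.
pattern = re.compile(r'((?:^|,)"[^",]*),', re.MULTILINE)

# Put back group 1 and replace the matched comma with a dot.
fixed = pattern.sub(r"\1.", rows)

print(fixed)
```

Under the stated assumptions this only handles one comma per quoted field, which is all the sample data needs.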
RegexR demo.
VB.Net fiddle demo.

remove all commas between quotes with a vim regex

I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g
Replace all spaces with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g
My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this you need to prepend your replacement with \= to get the replacement treated as a vim expression. From here you can extract just the parts between quotes and then manipulate the matched part separately. This requires having two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
So ".\{-}" matches anything in quotes (non greedy) and substitute(submatch(0), ',', '' , 'g') takes what was matched and removes all of the commas and its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
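The same two-step idea (match the quoted region, then edit only what was matched) carries over to other tools. Here is a sketch in Python, not vim, where the replacement is a function instead of a \= expression. The name strip_commas is made up for this example.

```python
import re

line = '57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200'

def strip_commas(match):
    """Return the matched quoted region with its commas removed."""
    return match.group(0).replace(",", "")

# ".*?" is the non-greedy counterpart of vim's ".\{-}".
cleaned = re.sub(r'".*?"', strip_commas, line)

print(cleaned)
```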
As for the other parts of your question. Step 1 is essentially trying to remove all carriage returns since the file format is actually the dos file format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. Since all you need to do is match quotes and remove them
:%s/"//g
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
example: "this is , an, example"
\("\w*\) matches the opening " and the word characters right after it; group \1 for back reference
\(,\) captures the comma; group \2 for back reference
\(.*"\) matches every other character up to the closing quote; group \3 for back reference
:\1\3: only include groups without comma, discard group 2 from returned string which is \2
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g removes commas

Notepad++ Replace all with an exception

I am attempting to edit a csv file, below is a sample line from this file.
|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
The beginning of the line |MIGRATE| needs to be modified without changing the second MIGRATE so the line would read
|MIGRATE|;|MIG_IN|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
There are 7700 or so lines so if I am forced to do this manually I will probably cry a little.
Thanks in advance!
Just replace all the ones you want not changed with another word temporarily, then replace the rest with what you want. I'm not sure what you're asking here, but from what I can guess this might help.
It seems like you could just search for:
^\|MIGRATE\|
And replace with:
|MIGRATE|;|MIG_IN|
Make sure you've checked 'Regular expression' in the 'Search Mode' options.
Explanation: The ^ is a begin anchor; it will match the beginning of the line, ensuring that it does not match the second |MIGRATE|. The \ characters are required to escape the | characters since they normally have special meaning in regular expressions, and you want to match a literal |.
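To see the anchor doing its job outside Notepad++, here is a small Python sketch. The second line of the sample text is invented for illustration. re.MULTILINE is needed so that ^ matches at the start of every line and not only at the start of the whole string.

```python
import re

# First line is from the question; the second line is a made-up extra row.
text = (
    "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U\n"
    "|MIGRATE|;|10001|;|2ACC0004|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95005U"
)

# ^ anchors the match to the start of a line, so the later |MIGRATE| is untouched.
result = re.sub(r"^\|MIGRATE\|", "|MIGRATE|;|MIG_IN|", text, flags=re.MULTILINE)

print(result)
```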
You can use beginning of line anchors:
Find:
^(\|MIGRATE\|)
Replace with:
$1;|MIG_IN|
regex101 demo
Just make sure that you are using the regular expression mode of the Search&Replace.
If you want to be a bit fancier, you can use a positive lookbehind:
Find:
(?<=^\|MIGRATE\|)
Replace with:
;|MIG_IN|
^ Will match only at the beginning of a line.
( ... ) is called a capture group, and will save the contents of the match in a variable you can use (in the first regex, I accessed the variable using $1 in the replace. The first capture gets stored to $1, the second to $2, etc.)
| is a special character meaning 'or' in regex (to match one character or group of characters or another, e.g. a|b matches a or b). As such, you need to escape it with a backslash to make a regex match a literal |.
In my second regex, I used (?<= ... ) which is called a positive lookbehind. It makes sure that the part to be matched has what's inside before it. For instance, (?<=a)b matches a b only if it has an a before it. So that the b in ab matches but not in bb.
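Both variants can be tried side by side in Python, whose re module is a different engine from the one in Notepad++. Python writes the backreference as \1 and only accepts fixed-width lookbehinds, which this pattern is. The sample line is shortened from the question.

```python
import re

line = "|MIGRATE|;|10000|;|MIGRATE|;|95004U"

# Capture group: match the leading |MIGRATE| and put it back via \1.
with_group = re.sub(r"^(\|MIGRATE\|)", r"\1;|MIG_IN|", line)

# Lookbehind: match the empty position right after the leading |MIGRATE|.
with_lookbehind = re.sub(r"(?<=^\|MIGRATE\|)", ";|MIG_IN|", line)

# The small example from the explanation: b only counts when an a precedes it.
only_after_a = re.findall(r"(?<=a)b", "ab bb")

print(with_group)
print(with_lookbehind)
print(only_after_a)
```

The lookbehind version inserts text without consuming anything, so there is nothing to put back in the replacement.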
The website I linked also explains the details of the regex and you can try out some regex yourself!

What REGEX pattern should I use to look for a specific string pattern and remove anything else that doesn't match?

I'm parsing through code using a Perl-REGEX parsing engine in my IDE and I want to grab any variables that look like
$hash->{ hash_key04}
and nuke the rest of the code..
So far my very basic REGEX doesn't do what I expected
(.*)(\$hash\-\>\{[\w\s]+\})(.*)
(
\$
hash
\-\>
\{
[\w\s]+
\}
)
I know to use replace for this ($1, $2, etc.), but matching (.*) before and after the target string doesn't seem to capture all the rest of the code!
UPDATED:
Tried matching null, but of course that's too greedy.
([^\0]*)
What expression in regex should I use to look only for the string pattern and remove the rest?
The problem is I want to be left with the list of $hash->{} strings after the replace runs in the IDE.
This is better approached from the other direction. Instead of trying to delete everything you don't want, what about extracting everything you do want?
my @vars = $src_text =~ /(\$hash->\{[\w\s]+\})/g;
Breaking down the regex:
/( # start of capture group
\$hash-> # prefix string with $ escaped
\{ # opening escaped delimiter
[\w\s]+ # any word characters or space
\} # closing escaped delimiter
)/g; # match repeatedly returning a list of captures
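For readers outside Perl, here is a rough Python equivalent of the extraction. The src_text value is a made-up snippet, and re.VERBOSE is used so the pattern can carry the same kind of inline comments as the breakdown above.

```python
import re

# Hypothetical source text to scan; only the $hash->{...} shape comes from the question.
src_text = 'my $x = $hash->{ hash_key04} + $hash->{other_key}; print "done";'

pattern = re.compile(
    r"""
    \$hash->    # literal prefix, with $ escaped
    \{          # opening brace, escaped
    [\w\s]+     # word characters or whitespace
    \}          # closing brace, escaped
    """,
    re.VERBOSE,
)

# findall returns every non-overlapping match as a list of strings.
found = pattern.findall(src_text)

print(found)
```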
Here is another way that might fit within your IDE better:
s/(\$hash->\{[\w\s]+\})|./$1/gs;
This regex tries to match one of your hash variables at each location, and if it fails, it deletes the next character and then tries again, which after running over the whole file will have deleted everything you don't want.
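The keep-or-delete alternation can be sketched in Python too. The src_text snippet is invented, and a function is used as the replacement so that the "no capture" case returns an empty string explicitly.

```python
import re

# Hypothetical two-line source; only the $hash->{...} shape comes from the question.
src_text = 'my $x = $hash->{ hash_key04};\nprint $hash->{b};\n'

def keep_or_drop(match):
    # Group 1 is set only when the $hash->{...} branch matched.
    # Otherwise the lone character matched by "." is dropped.
    return match.group(1) or ""

# re.DOTALL makes "." match newlines as well, like the /s flag in the Perl version.
kept = re.sub(r"(\$hash->\{[\w\s]+\})|.", keep_or_drop, src_text, flags=re.DOTALL)

print(kept)
```

The kept variables end up run together with nothing between them, so in practice one might append a newline to each kept match.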
Depends on your coding language. What you want is group 2 (the second set of characters in parentheses). In Perl that would be $2, in Vim it would be \2, etc.
It depends on the platform, but generally, replace the pattern with an empty string.
In javascript,
// prints "the la in ing"
console.log('the latest in testing'.replace(/test/g, ''));
In bash
$ echo 'the latest in testing' | sed 's/test//g'
the la in ing
In C#
Console.WriteLine(Regex.Replace("the latest in testing", "test", ""));
etc
By default the wildcard . won't match newlines. You can enable newlines in its matching set using a flag depending on what regex standard you're using and under what language/api. Or you can add them explicitly yourself by defining a character set:
[\s\S]* <- Matches any character, including newline and carriage return.
Combine this with capture groups to grab desired variables from your code and skip over lines which contain no capture group.
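A small Python sketch of the difference, using a made-up two-line string. It shows the default behaviour, the flag (re.DOTALL in Python), and an explicit character class. Inside square brackets a dot is a literal dot, so the class used here is [\s\S], which covers every character.

```python
import re

code = "first line\nsecond line"

# By default "." stops at the newline.
default = re.match(r".*", code).group(0)

# With the DOTALL flag "." also matches newlines.
dotall = re.match(r".*", code, re.DOTALL).group(0)

# Without any flag: whitespace plus non-whitespace covers every character.
explicit = re.match(r"[\s\S]*", code).group(0)

print(repr(default))
print(repr(dotall))
print(repr(explicit))
```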
If you want help constructing the proper regex for your context you'll need to paste some input text and specify what the output should be.
I think you want to add a ^ to the beginning of the regex, s/^.*(PATTERN)(.*)$/$1/, so that it starts at the beginning of the line and goes to the end, removing anything except that pattern.