regex code: does or does not contain a character - regex

I can't figure this out. I want to capture the string inside the square brackets, whether or not there are characters in it:
[5123512], [412351, 1235123, 5125123], [12312-AA] and []
I want to convert the square brackets into double quotes:
[5123512] ==> "5123512"
[412351, 1235123, 5125123] ==> "412351, 1235123, 5125123"
[12312-AA] ==> "12312-AA"
[] ==> ""
I tried \[\d+\] but it's not working.
This is my sample data; it's in JSON format.
Square brackets inside the description must not be changed, only those in the attributes.
{"results":
[{"listing": 4613456431,"sku": [5123512],"category":[412351, 1235123,
5125123],"subcategory": "12312-AA", "description":"This is [123sample]"}
{"listing": 121251,"sku":[],"category": [412351],"subcategory": "12312-AA",
"description": "product sample"}]}
TIA

Your regex doesn't work for three reasons:
[ is a meta-character that opens a character class. To match a literal [, you need to escape it with a backslash. ] is also a meta-character when it follows the [ meta-character, but if you escape the [ you shouldn't need to escape the ] (not that it hurts to do so).
\d only matches decimal digits, but your sample contains the letter A. If that's a hexadecimal digit, you will probably want to use [\dA-F] instead of \d, or [\dA-Fa-f] if the digits can also be lowercase. If it can be any letter, you could use [\dA-Z] or [\dA-Za-z], depending on whether you need to match lowercase letters. Note that your other samples also contain commas, spaces and a hyphen, which would have to be added to the class as well.
+ means "one or more occurrences", so it wouldn't match an empty []. Use the * "zero or more occurrences" quantifier instead.
Additionally, you probably need to capture the sequence of digits in a (capturing group) in order to be able to reference it in your replacement pattern.
However, as Andrew Morton suggests, it looks like you should be able to use a plain text search/replace.

First off, regex is a poor tool for parsing JSON-formatted data. There are plenty of ways to read your JSON in VB.NET and transform it more simply than by treating it as raw text. For example: How to parse json and read in vb.net
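To illustrate the parse-instead-of-regex suggestion, here is a minimal sketch in Python rather than VB.NET (the idea carries over to any JSON library). It assumes a comma between the two objects in "results"; as posted, the sample is not valid JSON without it. flatten_lists is a hypothetical helper name. Because the description is a string value rather than a list, its brackets are never touched.

```python
import json

# The question's sample, with one assumption: a comma between the two objects
# in "results" (as posted, the data is not valid JSON without it).
raw = """{"results": [
  {"listing": 4613456431, "sku": [5123512], "category": [412351, 1235123, 5125123],
   "subcategory": "12312-AA", "description": "This is [123sample]"},
  {"listing": 121251, "sku": [], "category": [412351],
   "subcategory": "12312-AA", "description": "product sample"}
]}"""


def flatten_lists(record):
    """Return a copy of record with every list value joined into one string."""
    return {
        key: ", ".join(str(item) for item in value) if isinstance(value, list) else value
        for key, value in record.items()
    }


data = json.loads(raw)
data["results"] = [flatten_lists(r) for r in data["results"]]
print(json.dumps(data, indent=2))
```

An empty list becomes an empty string, which matches the [] ==> "" case from the question.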
Original answer (edited slightly):
You're almost there, but here are a few things you need to change:
in your regex pattern, escape the square brackets: \[ and \]
if you want to capture all characters inside the brackets, . (any character) is a good way to go
the plus sign + means "at least one"; if you want to match empty brackets too, use *? instead
the question mark means "lazy": it tells the regex to match the shortest sequence of characters possible (instead of running on to a later closing bracket...)
wrap the .*? in parentheses so that you can refer to that part later when substituting
finally, the output value / pattern to substitute with is \1 or $1, depending on the context
or "\1" or "$1" if you really need the double quotes in the output (or maybe you just need a string variable?)
All in all this becomes:
Find this: \[(.*?)\]
Replace with: \1
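A minimal Python sketch of the find/replace on the question's sample (Python's re.sub writes the backreference as \1). It also shows two limits of \[(.*?)\] on this data: . does not cross the line break inside "category", and the description's [123sample] gets rewritten too. The colon-anchored variant is an assumed heuristic, not part of either answer: it only rewrites a bracket pair that directly follows a colon and contains no other brackets.

```python
import re

# The sample from the question, line breaks as posted.
text = (
    '{"results":\n'
    '[{"listing": 4613456431,"sku": [5123512],"category":[412351, 1235123,\n'
    '5125123],"subcategory": "12312-AA", "description":"This is [123sample]"}\n'
    '{"listing": 121251,"sku":[],"category": [412351],"subcategory": "12312-AA",\n'
    '"description": "product sample"}]}'
)

# The answer's pattern: . does not match "\n", the lazy match that starts at the
# "[" of the results array runs on to the first "]", and the description changes.
naive = re.sub(r'\[(.*?)\]', r'"\1"', text)

# Assumed heuristic: only a bracket pair right after ":" (an attribute value)
# with no nested brackets. [^\[\]]* also crosses line breaks.
fixed = re.sub(r'(:\s*)\[([^\[\]]*)\]', r'\1"\2"', text)
print(fixed)
```

The heuristic would break if a description itself contained ": [", so parsing the JSON is still the more robust route.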

Related

Check array syntax with Regex

I'm trying to create a regex that checks whether a string is a valid path for a Firestore document.
I want a regex that tests whether a string:
starts with a lowercase letter: ^([a-z]{1})
after the first character, contains only letters/digits and/or dots: \w*(.?\w+){0,}
may end with an array index: (\[{1}\d+\]{1})?$
The first and second parts work, but the last group doesn't: when I test a string like data.images[11, the regex returns true.
First of all, you can shorten some quantifiers in your regex:
{1} -> can be ignored completely
{0,} -> *
Your second part could be expressed like this, which also improves readability:
[\w.]* means: match any of the characters inside the brackets, 0 to n times. A bracket expression also supports predefined classes, so we can use \w here. The dot INSIDE the brackets doesn't need to be escaped; it simply means a literal dot.
So your parts would be:
^([a-z])
[\w.]*
(\[\d+\])?$
I hope this helps. According to regexpal, it matches data.images[11] but not data.images[11. It also seems to meet all your requirements.
EDIT:
Your second part doesn't work because, as Asocia stated in their answer, you need to escape the dot. An unescaped dot matches "any character" (depending on the regex engine and settings, sometimes even line breaks), so in data.images[11 the .? simply consumes the [. Since you mean a literal dot, you need to escape it: \.
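A quick check in Python's re (assumed here as a stand-in for whatever engine validates the path; Firestore itself is not involved) comparing the question's pattern with the suggested one:

```python
import re

# The question's pattern: the unescaped . in (.?\w+) can consume "[".
original = r'^([a-z]{1})\w*(.?\w+){0,}(\[{1}\d+\]{1})?$'
# The suggested pattern: the dot inside [\w.] is a literal dot.
suggested = r'^([a-z])[\w.]*(\[\d+\])?$'

for path in ["data.images[11]", "data.images[11"]:
    print(path, bool(re.match(original, path)), bool(re.match(suggested, path)))
```

The original accepts both strings; the suggested pattern accepts only the one with a closing bracket.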

Using regex for HTML-parsed text

Given the following text returned from the server, an HTML-parsed string used for tagging users, how do I use a regex to extract the name "Dave Park"?
[u=8367|Dave Park]
I tried the following regex, but to no avail:
|(\\w*)]
For some reason you seem to have escaped exactly what you shouldn't escape, and have not escaped several special symbols in regex that do need escaping.
Taking the full pattern, escaping the correct parts and adding the capture group, you should end up with this:
\[u=\d+\|([^\]]+)\]
This matches a literal [ bracket, the u= string followed by one or more digits, then the literal |, then the group containing any characters that are not a closing ] bracket, and finally the literal closing ] bracket.
Test it out yourself
I'm also wondering why you're not capturing the obvious ID in the first part; you can do that simply by putting round brackets around the \d+ in my posted pattern.
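A small Python sketch of the pattern above with the extra group around \d+, so both the ID and the name are captured:

```python
import re

tag = "[u=8367|Dave Park]"

# The answer's pattern, plus a capturing group around \d+ for the user ID.
m = re.search(r'\[u=(\d+)\|([^\]]+)\]', tag)
if m:
    user_id, name = m.groups()
    print(user_id, name)  # 8367 Dave Park
```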
You were very close. You needed to escape the | character and include spaces as a legal character in your capture group. So something like this:
\|([\w ]*)]

regex not working as it should

I'm trying to get to grips with regex and have written the following:
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ ,\d]{1};(\d){8};(\d){1};(\d){1}; $
and the sample is:
ä;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;
As far as I've read:
. matches any character, \d matches digits, and {n} (and its variations) means the preceding token repeats n times or, depending on the variation, a range of times.
What could be the problem?
A few suggestions/observations:
You can remove all {1}s, they don't do anything.
[A,K] means "A, a comma, or K". If you want to match any letter between A and K, use [A-K].
You should place the capturing group around the repetitions: (\d{7,8}) captures a 7-8 digit number; (\d){7,8} will only capture the last digit.
[ ,\d]{1} fails on your string because there are two characters (a space and 0) at that point.
You might need to remove the space before the final $, unless there actually is a space in your string after the last semicolon.
Here's a version that matches (and captures each element in a separate group):
^(.);(\d{4});(\d{8});([A-K]);(\d{7,8});(\d{8});([A-Z ]+);([ ,\d]+);(\d{8});(\d);(\d); *$
See it in action on regex101.com.
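The same check can be reproduced locally. A Python sketch, assuming the line is a Python str so that ä counts as one character for (.); on raw UTF-8 bytes it would be two:

```python
import re

# The corrected pattern from the answer above.
fixed = re.compile(
    r'^(.);(\d{4});(\d{8});([A-K]);(\d{7,8});(\d{8});([A-Z ]+);([ ,\d]+);(\d{8});(\d);(\d); *$'
)
# The question's original pattern, for comparison.
original = re.compile(
    r'^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ ,\d]{1};(\d){8};(\d){1};(\d){1}; $'
)

sample = "ä;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;"

print(original.match(sample))  # None: [ ,\d]{1} cannot consume " 0"
m = fixed.match(sample)
print(m.groups())
```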
Please don't abuse regexps for everything.
Your format is a CSV format: just split at ; and then validate the individual parts properly. This is perfectly valid, usually similarly efficient, and easier to debug.
When using regexps, make sure you escape properly (i.e. double-escape!). In most programming languages, \ is an escape character in string literals, so you need to write \\ to get a single backslash into the pattern.
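The split-then-validate idea, sketched in Python. The per-field rules are an assumed reading of the question's regex (using [A-K] for the letter field), and validate, CHECKS and is_digits are hypothetical names. A failing field is reported by index, which is the debugging advantage over a single regex.

```python
import string

DIGITS = set(string.digits)
UPPER_OR_SPACE = set(string.ascii_uppercase + " ")


def is_digits(field, *lengths):
    """True if field is all ASCII digits and its length is one of lengths."""
    return len(field) in lengths and set(field) <= DIGITS


# One check per field, mirroring the question's regex (assumed rules).
CHECKS = [
    lambda f: len(f) == 1,                                # any single character
    lambda f: is_digits(f, 4),
    lambda f: is_digits(f, 8),
    lambda f: len(f) == 1 and "A" <= f <= "K",            # [A-K]
    lambda f: is_digits(f, 7, 8),
    lambda f: is_digits(f, 8),
    lambda f: f != "" and set(f) <= UPPER_OR_SPACE,       # [A-Z ]+
    lambda f: f != "" and set(f) <= DIGITS | {" ", ","},  # [ ,\d]+
    lambda f: is_digits(f, 8),
    lambda f: is_digits(f, 1),
    lambda f: is_digits(f, 1),
]


def validate(line):
    """Split a ;-terminated record, check each field, and return the fields."""
    fields = line.rstrip().split(";")
    if fields[-1] != "":
        raise ValueError("record must end with ';'")
    fields = fields[:-1]
    if len(fields) != len(CHECKS):
        raise ValueError(f"expected {len(CHECKS)} fields, got {len(fields)}")
    for i, (field, check) in enumerate(zip(fields, CHECKS)):
        if not check(field):
            raise ValueError(f"field {i} is invalid: {field!r}")
    return fields


print(validate("ä;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;"))
```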
Try this:
^(.){1};(\d){4};(\d){8};[A-K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ \d]{2};(\d){8};(\d){1};(\d){1};$
Here's what was happening in your regex:
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ ,\d]{1};(\d){8};(\d){1};(\d){1}; $
You have an extra space before $ at the end.
To specify a range, use - and not a comma: your range should be [A-K].
In the [ ,\d] class you have restricted it to one character with {1}; it should be {2}, one for the space and one for the digit.
Additionally, you don't need to specify {1}, as a token matches exactly once by default.
If yours still doesn't work, you can try this one:
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};( \d){1};(\d){8};(\d){1};(\d){1};$

C# Regex match + next n characters

I'm new to regex and I need to parse source code from a website. Can anyone tell me the syntax to match a word followed by the next n characters in the string?
Let's say I want to match the word "country" followed by the next 15 characters in the string.
If the string is "...<tr class="hover"><td>country</td><td>RO</td></t......", I need to get "country</td><td>RO". I can work with the string like that; ideally it would be only "country RO", but I don't want to ask for too much.
Something like: (country)<\/td><td>(..)
Using $1 $2 as your output should give you what you need.
Explanation:
Putting parentheses () around something lets you back-reference it with $1, etc.
Everything outside the groups matches the exact characters.
Note: special regex characters must be escaped with a backslash; / only needs it in flavors that use / as the pattern delimiter (e.g. JavaScript regex literals).
The second group just matches the next two characters, whatever they are. If you know the subset they can come from (e.g. [A-Za-z]), it is better to use that.
With that assumption I would use something like: (country)<\/td><td>([A-Za-z]{2})
It also helps to find a good reference: http://www.regular-expressions.info/reference.html
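The same approach in Python, as a sketch: Python's re needs no backslash before /, and Match.expand(r"\1 \2") plays the role of the "$1 $2" output. [A-Za-z]{2} assumes a two-letter code, as in the answer.

```python
import re

html = '...<tr class="hover"><td>country</td><td>RO</td></t......'

# Python needs no backslash before /; [A-Za-z]{2} assumes a two-letter code.
m = re.search(r'(country)</td><td>([A-Za-z]{2})', html)
if m:
    # expand() fills in \1 and \2, like "$1 $2" in .NET-style replacements.
    print(m.expand(r'\1 \2'))  # country RO
```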
Depending on your flavor of Regex engine:
"country.{15}"
Should match "country" exactly, followed by 15 characters of any kind.
It's worth noting that this requires exactly 15 characters after the word. If fewer than 15 characters follow "country" (or a line break comes first, since by default . doesn't match newlines), this match will fail. That could be problematic for you.
"country.{1,15}"
This will match "country" exactly, followed by 1 to 15 characters of any kind (as many as are available, up to 15). Again, this could also be problematic depending on your use case.
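A small Python illustration of the two quantifiers. The short string is an invented example of text where fewer than 15 characters follow "country":

```python
import re

html = '...<tr class="hover"><td>country</td><td>RO</td></t......'

# "country" plus exactly the next 15 characters.
exact = re.search(r'country.{15}', html)
print(exact.group())

# Near the end of a string, .{15} fails but .{1,15} takes whatever is left.
short = "the country RO"
print(re.search(r'country.{15}', short))            # None
print(re.search(r'country.{1,15}', short).group())  # country RO
```

If the text can contain line breaks, Python's re.DOTALL flag (or the equivalent single-line option in other flavors) lets . match them.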

Perform substitution on regex results, but only on a given condition

First of all, let me clarify that I know absolutely nothing about regular expressions, but I need to write a "Tagger Script" for MusicBrainz Picard so that it doesn't mess with the way I format certain aspects of my tracks' titles.
Here's what I need to do:
- Find all substrings inside parentheses
- Then, for those matches that meet a given criteria and those matches only, change the parentheses to brackets
For example, consider this string:
DJ Fresh - Louder (Sian Evans) (Flux Pavilion & Doctor P Remix)
It needs to be changed like so:
DJ Fresh - Louder (Sian Evans) [Flux Pavilion & Doctor P Remix]
The condition is that if the string within the parentheses contains the sub-string "dj" or "mix" or "version" or "inch", etc... then the parentheses surrounding it need to be changed to brackets.
So, the question is:
Is it possible to create a single regular expression that can perform this operation?
Thank you very much in advance.
Assuming there are no nested brackets, you can use the following regex to search for the text:
(?i)\((?=[^()]*(?:dj|mix|version|inch))([^()]+)\)
Note that the regex is case-insensitive due to the (?i) in front; make it case-sensitive by removing it.
Check your language's syntax to see whether you can use a raw-string prefix, e.g. r'literal_string', so that the backslashes in the pattern don't have to be doubled.
And use the following as replacement:
[$1]
You can include more keywords by adding them to the (?:dj|mix|version|inch) part, each separated by |. If a keyword contains (, ), [, ], |, ., +, ?, *, ^, $, \, {, } you need to escape those characters (I'm 99% sure the list is exhaustive). An easier way to think about it: if a keyword only contains spaces and alphanumeric characters (but note that the number of spaces must match exactly), you can add it to the regex without side effects.
Dissecting the regex:
(?i): Case-insensitive mode
\(: ( is special character in regex, need to escape it by prepending \.
(?=[^()]*(?:dj|mix|version|inch)): Positive look-ahead (?=pattern):
[^()]*: I need to check that the keyword is within the current parentheses, not outside them or in some other pair, so I use a negated character class [^characters] to avoid matching ( or ) and spilling outside the current pair. The no-nesting assumption also comes into play a bit here.
(?:dj|mix|version|inch): A list of keywords, in a non-capturing group (?:pattern). | means alternation.
([^()]+): The assumption that there are no nested brackets makes it easy to match all the characters inside them. The text is captured for later replacement, since (pattern) is a capturing group, as opposed to (?:pattern).
\): ) is special character in regex, need to escape it by prepending \.
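A Python sketch of the same search and replace, outside Picard. It builds the keyword alternation with re.escape, which takes care of the escaping described above, and uses re.IGNORECASE in place of the inline (?i); Python's re.sub writes the backreference as \1. The function name bracket_matching_groups is hypothetical.

```python
import re

# Keywords from the question; re.escape handles any regex metacharacters.
KEYWORDS = ["dj", "mix", "version", "inch"]
alternation = "|".join(re.escape(k) for k in KEYWORDS)

# Same structure as the answer's pattern; assumes no nested parentheses.
pattern = re.compile(r"\((?=[^()]*(?:" + alternation + r"))([^()]+)\)", re.IGNORECASE)


def bracket_matching_groups(title):
    """Turn (...) into [...] only when the text inside contains a keyword."""
    return pattern.sub(r"[\1]", title)


print(bracket_matching_groups("DJ Fresh - Louder (Sian Evans) (Flux Pavilion & Doctor P Remix)"))
```

"DJ" outside the parentheses does not trigger a change, because the look-ahead cannot cross a ( or ).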