Regex: Replace all occurrences of an attribute from an object/dictionary/json? - regex

I have an attribute key and I want to remove all its occurrences from a JSON/dictionary/object. Here's an example:
{
"$type":"NewRunner.SingleValueExpression",
"name":"ABC",
"age":23,
"nestedJSON": {
"$type":"NewRunner.SingleValueExpression003",
"field3":"edvrvbte"
}
}
I want to remove the "$type" attribute everywhere in the given string, so the output should be:
{
"name":"ABC",
"age":23,
"nestedJSON": {
"field3":"edvrvbte"
}
}
How can I write a regex for this? Can someone help me?
Ideally it would be used like: string.replace("regexValue",replacement)
I only need help writing the regex itself.
I tried this:
\"\$type\":\".+?(?=abc)\",
and this as well:
\"\$type\":\"(?<=\[)(.*?)(?=\])\",
But I'm not sure what to put in the middle part, \".+?(?=abc)\", so that it matches any value.

Try this:
"\$type":[^,{}]*,[\r\n]*|,\s*"\$type":[^{},\r\n]*
"\$type":[^,{}]*,[\r\n]*
"\$type": match the string "$type":.
[^,{}]* matches zero or more characters other than a comma or a curly brace. This is important: the match must not run past the comma that ends the value, and it must not cross a curly brace into another object.
,[\r\n]* matches a literal comma followed by zero or more newline characters.
|,\s*"\$type":[^{},\r\n]*
| is the alternation operator; it works like a Boolean OR.
,\s* matches a comma followed by zero or more whitespace characters.
"\$type": this part is the same as the previous part.
[^{},\r\n]* is the same as in the first alternative, with \r and \n added to the excluded characters, and no trailing comma follows it. This alternative handles the case where "$type":"NewRunner.SingleValueExpression" is the last value in the object: there is no comma after it, only an optional newline and then the closing curly brace }. Excluding the braces stops the match before the closing brace. Excluding \r and \n leaves the newline after the value in place; this is not essential, but it keeps the output tidy, with the closing curly brace on its own line.
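A minimal sketch of how the pattern above could be applied, written in Python only for illustration (the question does not name a language). The helper name strip_type is hypothetical, and the sample adds the comma after "age":23 so that the result can be checked with a JSON parser. The pattern assumes that values contain no commas or curly braces.

```python
import json
import re

# The two-alternative pattern from the answer, unchanged.
TYPE_ATTR = re.compile(r'"\$type":[^,{}]*,[\r\n]*|,\s*"\$type":[^{},\r\n]*')


def strip_type(text: str) -> str:
    """Remove every "$type" member from a JSON-like string (hypothetical helper)."""
    return TYPE_ATTR.sub("", text)


sample = """{
"$type":"NewRunner.SingleValueExpression",
"name":"ABC",
"age":23,
"nestedJSON": {
"$type":"NewRunner.SingleValueExpression003",
"field3":"edvrvbte"
}
}"""

cleaned = strip_type(sample)
parsed = json.loads(cleaned)  # raises if the result is not valid JSON
print(cleaned)
```

The second alternative is what handles a "$type" member in last position, where the comma to remove comes before it rather than after it.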

You might use
\s*"\$type":[^{},\r\n]*,?
Explanation
\s* Match optional whitespace chars
"\$type": Match "$type":
[^{},\r\n]* Optionally match any char except { } , or a newline
,? Match an optional comma
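A small sketch of this shorter pattern in Python (the language and the sample strings are assumptions made for illustration). It behaves well when "$type" comes first, but because the optional comma is only matched after the value, a "$type" in last position leaves the preceding comma behind.

```python
import json
import re

# The single-alternative pattern from the answer, unchanged.
TYPE_ATTR = re.compile(r'\s*"\$type":[^{},\r\n]*,?')


def is_valid_json(text: str) -> bool:
    """Return True when the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


type_first = '{\n"$type":"A",\n"name":"ABC"\n}'
type_last = '{\n"name":"ABC",\n"$type":"A"\n}'

first_cleaned = TYPE_ATTR.sub("", type_first)
last_cleaned = TYPE_ATTR.sub("", type_last)

print(repr(first_cleaned), is_valid_json(first_cleaned))
print(repr(last_cleaned), is_valid_json(last_cleaned))
```

If the input can contain "$type" as the last member, the two-alternative pattern from the other answer is the safer choice.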

Related

Multiline Regex not catching the correct mask

I'm trying to catch a tag with a special syntax in a file with this regex :
([a-z0-9 >}\/])(\{(var)\:([a-z0-9\_\/\-\.]+)([\?0-9]+)*\})([a-z0-9 {<\/])
The tag looks like :
{var:contactText}
But as you can see in my regex, I want to catch what's before and after the {var:something}. My expression works fine except when the tag is alone on a line.
I've added the m flag to work around the problem, but it's still not working.
Live example: https://regex101.com/r/6T6OJm/1/
Am I missing something? It seems to be the last part, ([a-z0-9 {<\/]), which doesn't accept a line break. What's the solution?
Like the cat, the "multiline" modifier is a false friend. The m modifier doesn't mean the pattern will magically run over multiple lines; it only changes the meaning of the ^ and $ anchors (from start/end of the string to start/end of the line).
All you need to do is account for possible whitespace characters using the \s class (which also includes the carriage return \r and the newline \n).
~
([a-z0-9 >}/])
\s* ( { (var) : ([a-z0-9_/.-]+) ([?0-9]+)? } ) \s*
([a-z0-9 {</])
~xi
Note that many characters in the pattern don't need to be escaped.
Since the pattern is a bit long, I used the x modifier so that spaces in the pattern are ignored, which makes it more readable.
Not sure that all the capture groups are useful.
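To illustrate the point about the m modifier, here is a sketch in Python (an assumption; the patterns above use PCRE-style delimiters). re.MULTILINE stands in for m and re.VERBOSE for x. The sample text is made up, and the braces are escaped in the port to keep them unambiguous literals.

```python
import re

# The question's pattern (redundant escapes dropped): no whitespace allowed around the tag.
ORIGINAL = re.compile(
    r"([a-z0-9 >}/])(\{(var):([a-z0-9_/.-]+)([?0-9]+)*\})([a-z0-9 {</])",
    re.IGNORECASE | re.MULTILINE,
)

# Port of the answer's pattern: \s* absorbs the line breaks around the tag.
WITH_WHITESPACE = re.compile(
    r"""
    ([a-z0-9 >}/])
    \s* ( \{ (var) : ([a-z0-9_/.-]+) ([?0-9]+)? \} ) \s*
    ([a-z0-9 {</])
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical input where the tag sits alone on its line.
text = "<p>\n{var:contactText}\n</p>"

no_match = ORIGINAL.search(text)      # the multiline flag does not help here
match = WITH_WHITESPACE.search(text)  # the \s* version finds the tag

print(no_match)
print(match.group(1), match.group(2), match.group(6))
```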

Regex for matching every line enclosed in curly brackets

I'm trying to match every single line within curly brackets, and I'm struggling to capture what I want. To give an example, if I have this text:
{
this is a line,
this = another line,
this is the third line!
this is, indeed, another line
},
round two: {
we're now on the second pair of brackets,
and this is the final line.
}
Then I want to match and capture a total of six lines:
this is a line,
this = another line,
this is the third line!
this is, indeed, another line
we're now on the second pair of brackets,
and this is the final line.
So far my current idea is trying to match "curly bracket" -> "anything" -> "line" -> "anything" -> "curly bracket", i.e. something like this:
{(?s)[^}]*(^([^}^\n]+)$)(?s)[^}]*}
But that only matches one line per pair of curly brackets, rather than every line.
How would I go about doing this? Thanks.
EDIT: Updated the example to include preceding text before one of the opening curly braces and varying whitespace.
Just match lines that don't contain a brace:
^[^{}\r\n]+$
The multiline flag (/m) must be set. Alternatively, insert (?m) at the beginning of the regex.
The regex reads, "match the beginning of the line followed by one or more characters other than {, }, \r and \n, followed by the end of the line".
To exclude leading spaces in each matched line you can modify the regex slightly:
^\s*\K[^{}\r\n]+$
\K resets the starting point of the match, excluding any previously-consumed characters. \K is not available with all regex engines.
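As a rough illustration in Python (an assumption; the question does not name an engine), the first pattern can be run with re.MULTILINE. Python's built-in re module does not support \K, so the sketch swaps in a capture group after the leading blanks to get the same trimmed result. The indentation in the sample is made up. Note that the pattern matches any brace-free line, whether or not it actually sits between braces.

```python
import re

text = """{
    this is a line,
  this = another line,
this is the third line!
    this is, indeed, another line
},
round two: {
    we're now on the second pair of brackets,
    and this is the final line.
}"""

# First pattern from the answer; re.MULTILINE corresponds to the /m flag.
whole_lines = re.findall(r"^[^{}\r\n]+$", text, re.MULTILINE)

# Stand-in for the \K variant: capture what follows the leading spaces and tabs.
trimmed_lines = re.findall(r"^[ \t]*([^{}\r\n]+)$", text, re.MULTILINE)

for line in trimmed_lines:
    print(line)
```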
Assuming input is well formed:
([^{\n](?=[^{]+}))+

Remove everything after second colon

I need to remove everything after the colon following orange
Example:
apple:orange:banana:grapes:
Becomes:
apple:orange
I've looked up a million different references for this and cannot find a solution.
Currently doing this in Notepad++ using the Find/Replace function.
Find what : (^[a-z]+:[a-z]+).*$
(^[a-z]+:[a-z]+) First capturing group. Match alphabetic characters at start of string, a colon, alphabetic characters.
.*$ Match anything up to the end of the string.
Replace with : \1
\1 Replace with captured group one.
You could of course make the expression more general:
Find what : (^[^:]+:[^:]+).*$
(^[^:]+:[^:]+) Capturing group. Match anything other than a colon at start of string, a colon, anything other than a colon.
.*$ Match anything up to end of string.
Replace with : \1
\1 Replace with captured group one.
As pointed out by revo in the comment below, you should disable the matches newline option when using the patterns above in Notepad++.
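For anyone scripting this instead of using Notepad++, a minimal Python sketch of the same find/replace follows. The function name is hypothetical, and it is meant to be applied to one line at a time, since [^:] would otherwise also match line breaks.

```python
import re

# Same idea as the general Notepad++ pattern: keep the first two colon-separated fields.
FIRST_TWO_FIELDS = re.compile(r"^([^:]+:[^:]+).*$")


def keep_first_two_fields(line: str) -> str:
    """Drop everything after the second field of a single line (hypothetical helper)."""
    # \1 in the replacement refers to the first capturing group.
    return FIRST_TWO_FIELDS.sub(r"\1", line)


print(keep_first_two_fields("apple:orange:banana:grapes:"))
```

Lines with fewer than two fields do not match and are returned unchanged.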
If I understand you correctly, you can use the plugin ConyEdit to do this. You can use its command cc.dac <nth>/<regex>/[<mode>] [-options]. cc.dac means: delete after column.
For Example:
With ConyEdit running in the background, copy the text and the command line below, then paste:
apple:orange:banana:grapes:
cc.dac 2/:/ -d
How big is the file?
If it's a small file, you could probably write some simple code like the following snippet in Java. Most programming languages support such operations.
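import java.util.Arrays;
import java.util.List;

String input = "apple:orange:banana:grapes:";
// Split into fields and wrap them in a List so indexOf is available
List<String> fields = Arrays.asList(input.split(":"));
int index = fields.indexOf("orange");
// Keep everything up to and including "orange"
String output = String.join(":", fields.subList(0, index + 1)); // "apple:orange"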

RegEx to detect if a line doesn't end in a semi colon

I'm trying to run through some code files and find lines that don't end in a semicolon.
I currently have this: ^(?:(?!;).)*$ from a bunch of Googling, and it works just fine. But now I want to expand on it so it ignores all the whitespace at the start or specific keywords like package or opening and closing braces.
The end goal is to take something like this:
package example
{
public class Example
{
var i = 0
var j = 1;
// other functions and stuff
}
}
And for the pattern to show me var i = 0 is missing a semi colon. That's just an example, the missing semi colon could be anywhere in class.
Any ideas? I've been fiddling for over an hour but no luck.
Thanks.
If you want a line that doesn't end in a semicolon, you can ask for any amount of anything .* followed by one character that isn't a semicolon [^;], possibly followed by some whitespace \s*, up to the end of the line $. So you have:
.*[^;]\s*$
Now if you don't want whitespace at the beginning you need to ask for the beginning of the line ^ followed by any character that isn't whitespace [^\s] followed by the regex from earlier:
^[^\s].*[^;]\s*$
If you don't want it to start with whitespace or with a keyword like package or, say, class, you can assert that the line does not begin with any of those three things. The group that matches any of them is (?:\s|package|class), and the negative lookahead that succeeds only when none of them comes next is (?!\s|package|class). Note the !. So you now have:
^(?!\s|package|class).*[^;]\s*$
Try this:
^\s*(?!package|public|class|//|[{}]).*(?<!;\s*)$
When tested in PowerShell:
PS> (gc file.txt) -match '^\s*(?!package|public|class|//|[{}]).*(?<!;\s*)$'
var i = 0
PS>
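The variable-width lookbehind (?<!;\s*) works in .NET, but Python's built-in re module only accepts fixed-width lookbehind. The sketch below is an adaptation for Python, not the answer's exact regex: it checks the last non-blank character instead, and it adds \s to the lookahead so that indented keyword lines cannot slip through by backtracking. The indentation in the sample is an assumption.

```python
import re

# Adapted pattern (an assumption, see the lead-in):
# - (.*[^;\s])[ \t]*$ requires the last non-blank character not to be a semicolon
# - \s inside the lookahead keeps [ \t]* from backtracking past the keyword check
MISSING_SEMICOLON = re.compile(
    r"^[ \t]*(?!\s|package|public|class|//|[{}])(.*[^;\s])[ \t]*$",
    re.MULTILINE,
)

source = """package example
{
    public class Example
    {
        var i = 0
        var j = 1;
        // other functions and stuff
    }
}"""

# findall returns the captured statement text of each offending line.
suspects = MISSING_SEMICOLON.findall(source)
print(suspects)
```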
The key to capturing this complicated concept in a regex is to first understand how your regular expression engine/interpreter handles the following concepts:
positive lookahead
negative lookahead
positive lookbehind
negative lookbehind
Then you can begin to understand how to capture what you want, but only in such cases where what's ahead and what's behind is exactly as you specify.
str.scan(/^\s*(?=\S)(?!package.+\n|public.+\n|\/\/|\{|\})(.+)(?<!;)\s*$/)
This is the regular expression I use with vim's regular expression engine to highlight lines of Java code that don't end in a semicolon and aren't among the lines that are not supposed to end in one:
\(.\+[^; ]$\)\(^.*public.*\|.*//.*\|.*interface.*\|.*for.*\|.*class.*\|.*try.*\|^\s*if\s\+.*\|.*private.*\|.*new.*\|.*else.*\|.*while.*\|.*protected.*$\)\#<!
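Reading the expression from left to right:
1. The first \( ... \) group, \(.\+[^; ]$\), matches at least one character followed by a missing semicolon.
2. The second \( ... \) group lists the keywords: such matches are excluded when they are preceded by any of them.
3. The trailing \#<! is the negative lookbehind feature that applies this exclusion.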
Mnemonics for deciphering glyphs:
^ beginning of line
.* Any amount of any char
+ at least one
[^ ... ] everything but
$ end of line
\( ... \) group
\| delimiter
\#<! negative lookbehind
Which roughly translates to:
Find me all lines that don't end in a semicolon and don't have any of the above keywords/expressions to the left of it. It's not perfect and probably doesn't hold up to obfuscated java, but for simple java programs it highlights the lines that should have semicolons at the end, but don't.
Helpful link that helped me get the concepts I needed:
https://jbodah.github.io/blog/2016/11/01/positivenegative-lookaheadlookbehind-vim/
For just lines that don't end in a semicolon, this is simpler:
.*[^;]$
If you also want to skip lines that start with a space:
^[^ ].*[^;]$
You are trying to match lines that possibly begin with whitespace ^\s*, then don't have a particular set of words, for example (?!package|class), then have anything .* but then don't end in a semicolon (or a semicolon with whitespace after it) [^;]\s*.
^\s*(?!package|class).*?[^;]\s*$
Note that I added parentheses around a section of the regex.

How to check if a line is blank using regex

I am trying to make simple regex that will check if a line is blank or not.
Cases:
" some" // not blank
" " //blank
"" // blank
The pattern you want is something like this in multiline mode:
^\s*$
Explanation:
^ is the beginning of string anchor.
$ is the end of string anchor.
\s is the whitespace character class.
* is zero-or-more repetition of.
In multiline mode, ^ and $ also match the beginning and end of the line.
References:
regular-expressions.info/Anchors, Character Classes, and Repetition.
A non-regex alternative:
You can also check if a given string line is "blank" (i.e. containing only whitespaces) by trim()-ing it, then checking if the resulting string isEmpty().
In Java, this would be something like this:
if (line.trim().isEmpty()) {
// line is "blank"
}
The regex solution can also be simplified without anchors (because of how matches is defined in Java) as follows:
if (line.matches("\\s*")) {
// line is "blank"
}
API references
String String.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.
boolean String.isEmpty()
Returns true if, and only if, length() is 0.
boolean String.matches(String regex)
Tells whether or not this (entire) string matches the given regular expression.
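The same two checks can be sketched in Python (chosen here only for illustration; the function names are hypothetical). re.fullmatch plays the role of Java's String.matches, since both require the whole string to match, and str.strip() is the rough counterpart of trim().

```python
import re


def is_blank_regex(line: str) -> bool:
    """Blank means empty or whitespace only; fullmatch must cover the whole string."""
    return re.fullmatch(r"\s*", line) is not None


def is_blank_strip(line: str) -> bool:
    """Non-regex alternative: strip the whitespace and test for emptiness."""
    return line.strip() == ""


# The three cases from the question.
for case in [" some", " ", ""]:
    print(repr(case), is_blank_regex(case), is_blank_strip(case))
```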
Actually in multiline mode a more correct answer is this:
/((\r\n|\n|\r)$)|(^(\r\n|\n|\r))|^\s*$/gm
The accepted answer: ^\s*$ does not match a scenario when the last line is blank (in multiline mode).
Try this:
^\s*$
Full credit to bchr02 for this answer. However, I had to modify it a bit to handle lines that end a comment with */ and are followed by an empty line; the regex was matching the non-empty line containing */.
New: /(^(\r\n|\n|\r)$)|(^(\r\n|\n|\r))|^\s*$/gm
All I did was add ^ as the second character to anchor the first alternative to the start of the line.
The most portable regex would be ^[ \t\n]*$ to match an empty string (note that you would need to replace \t and \n with tab and newline accordingly) and [^ \n\t] to match a non-whitespace string.
It depends on what you mean by blank:
a line full of whitespace, or a line that contains nothing at all.
If you want to match a line that contains nothing, use '/^$/'.
Somehow none of the answers from here worked for me when I had strings which were filled just with spaces and occasionally strings having no content (just the line terminator), so I used this instead:
if (str.trim().isEmpty()) {
doSomethingWhenWhiteSpace();
}
Well... I tinkered around (using Notepad++) and this is the solution I found:
\n\s
\n for end of line (where you start matching) -- the caret would not be of help in my case as the beginning of the row is a string
\s takes any space till the next string
hope it helps
This regex will delete all blank spaces, empty lines, and stray tabs that follow a line break in a file:
\n\s*
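The answer does not say what to replace the match with. The Python sketch below assumes a single newline as the replacement, so that non-blank lines stay separate; the function name and the sample string are also assumptions. Note that this pattern removes the indentation of the following line as well.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Replace each newline plus the whitespace after it with one newline."""
    # The whitespace run covers blank lines and the next line's indentation.
    return re.sub(r"\n\s*", "\n", text)


print(collapse_blank_lines("first\n\n   \n\t\n  second\nthird"))
```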