I'm trying to use a regex to match a block of text and then, with Replace All, replace it with nothing so as to delete it.
But because the blocks sometimes (but not always) appear one after another, Replace All only removes every second block in such a run.
I wrote this regex:
http.*\n.*\K\n\{\n "code"(.*\n)+?\}\nhttp.*\n
It matches every isolated block, but only every second block when several appear consecutively.
I think I'm supposed to use "assertions", as described here, but I couldn't get them to work.
Also, how do I replace with nothing (i.e. delete)? Do I just leave the "Replace with" field empty, or do I need some special character? Or, as I'm starting to suspect, is Notepad++ the wrong tool for this sort of thing? If so, what should I use instead?
Sample data:
"teamAbbr" : "Foo",
"teamName" : "Bar",
"teamNickname" : "FBar"
}
} ]
}
http://www.link_I_want_to_keep_belonging_to_above_data.com
{
"code" : "XXXXXXXXXXXXXXXXXXXXXXX",
"techMessage" : "XXXXXXXXXXXXXXXXXXXXXX",
"userMessage" : "XXXXXXXXXXXXXXXXXXX",
"host" : "XXXXXXXXXXXX",
"date" : "XXXXXXXXXXX",
"version" : "XXX"
}
http://www.url_that_belong_to_block_Iwant_to_be_rid_off.com
{
"code" : "XXXXXXXXXXXXXXXXXXXXXXX",
"techMessage" : "XXXXXXXXXXXXXXXXXXXXXX",
"userMessage" : "XXXXXXXXXXXXXXXXXXX",
"host" : "XXXXXXXXXXXX",
"date" : "XXXXXXXXXXX",
"version" : "XXX"
}
http://www.url_that_belong_to_block_Iwant_to_be_rid_off.com
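The problem is that your pattern requires a URL line before each block, but it also consumes the URL line after it. Matches cannot overlap, so when one block immediately follows another, the URL between them is used up by the first match and the second block can't match. A block at the very start of the file has no URL before it, so it never matches either.
A lookbehind assertion would take care of this, but lookbehinds must be fixed-length, and http.*\n is not.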
Do you actually need to match the URL before the block? That is, does
\{\n "code"(.*\n)+?\}\nhttp.*\n
work for you?
To delete a whole match you replace with an empty string. No special characters needed.
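A minimal sketch in Ruby (not Notepad++) of the suggestion above, run on a shortened copy of the sample data. It assumes each "code" line is indented by one space, because the suggested regex expects that space. Replacing with an empty string deletes both consecutive blocks, each together with the URL line that follows it, while the URL before the first block is kept.

```ruby
# Shortened sample: each error block "{ "code" ... }" is followed by the URL
# that belongs to it. One leading space before "code" is assumed, because
# the suggested regex expects it.
text = <<~'DATA'
  }
  http://www.link_I_want_to_keep_belonging_to_above_data.com
  {
   "code" : "XXXX",
   "version" : "XXX"
  }
  http://www.url_that_belong_to_block_Iwant_to_be_rid_off.com
  {
   "code" : "XXXX",
   "version" : "XXX"
  }
  http://www.url_that_belong_to_block_Iwant_to_be_rid_off.com
DATA

# The suggested pattern: the block, its closing brace and the URL line after it.
# Nothing before the block is consumed, so back-to-back blocks don't compete
# for the same URL line.
BLOCK = /\{\n "code"(.*\n)+?\}\nhttp.*\n/

# Replacing with an empty string deletes every match.
cleaned = text.gsub(BLOCK, "")
puts cleaned
```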
I'm having problems with a regexp query in Elasticsearch. My index has a text field containing comma-separated numeric values (IDs), e.g.
2,140,3,2495
And I have the following query term:
"regexp" : {
"myIds" : {
"value" : "^2495,|,2495,|,2495$|^2495$",
"boost" : 1
}
}
But my result list is empty.
I know regexp queries are fairly slow, but the index already exists and holds millions of documents, so restructuring it is unfortunately not an option. I need a regex solution.
In Elasticsearch regexp queries, patterns are anchored by default, and ^ and $ are treated as literal characters.
What you mean to use is "2495,.*|.*,2495,.*|.*,2495|2495", which matches 2495, at the start of the string, ,2495, in the middle, ,2495 at the end, or a whole string equal to 2495.
Or, you may use a simpler
"(.*,)?2495(,.*)?"
This means:
(.*,)? - optional text (not including line breaks) ending with ,
2495 - your value
(,.*)? - optional text (not including line breaks) starting with ,
Here is an online demo showing how this expression works (not a proof though).
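To sanity-check the logic outside Elasticsearch, here is a rough Ruby emulation. Ruby regexps search for a substring rather than matching the whole value, so the hypothetical helper matches_whole_value? wraps the pattern in \A(?: ... )\z to imitate the whole-value anchoring described above. This checks the pattern logic only, not Elasticsearch itself; it compares the long alternation with the shorter form on a few sample values.

```ruby
# Elasticsearch regexps must match the whole field value. Ruby regexps are
# unanchored, so \A ... \z is added here to mimic that behavior (assumption:
# the values contain no line breaks, so Ruby's . behaves as needed).
def matches_whole_value?(pattern, value)
  Regexp.new("\\A(?:#{pattern})\\z").match?(value)
end

LONG_FORM  = "2495,.*|.*,2495,.*|.*,2495|2495"
SHORT_FORM = "(.*,)?2495(,.*)?"

samples = ["2,140,3,2495", "2495", "2495,7", "1,2495,3", "2,140,3,24950", "12495"]
samples.each do |value|
  long  = matches_whole_value?(LONG_FORM, value)
  short = matches_whole_value?(SHORT_FORM, value)
  puts format("%-15s long=%-5s short=%s", value, long, short)
end
```

Both forms should agree on every sample: the first four values contain 2495 as a complete ID, while 24950 and 12495 do not.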
OK, I got it to work, but now I've run into another problem. I built the pattern as follows:
(.*,)?2495(,.*)?|(.*,)?10(,.*)?|(.*,)?898(,.*)?
It works well for a few IDs, but with, say, 50 IDs, Elasticsearch throws an exception saying the regexp is too complex to process.
Is there a way to simplify the regexp or restructure the query itself?
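One idea, not from the original thread and not tested against Elasticsearch: every alternative shares the same (.*,)? prefix and (,.*)? suffix, so the pattern can be built with only the IDs inside a single group. The Ruby sketch below builds both strings. Whether the factored form stays under Elasticsearch's complexity limit is an assumption you would need to verify; the regexp query also has a max_determinized_states parameter that controls that limit.

```ruby
ids = [2495, 10, 898]

# The question's construction: one complete alternative per ID.
per_id = ids.map { |id| "(.*,)?#{id}(,.*)?" }.join("|")

# Factored construction (untested idea for Elasticsearch): the shared
# prefix and suffix appear once, and only the IDs are alternated.
factored = "(.*,)?(#{ids.join('|')})(,.*)?"

puts per_id    # (.*,)?2495(,.*)?|(.*,)?10(,.*)?|(.*,)?898(,.*)?
puts factored  # (.*,)?(2495|10|898)(,.*)?
```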
I have a text like this:
"entity"
{
"id" "5040044"
"classname" "weapon_defibrillator_spawn"
"angles" "0 0 0"
"body" "0"
"disableshadows" "0"
"skin" "0"
"solid" "6"
"spawnflags" "3"
"origin" "449.47 5797.25 2856"
editor
{
"color" "0 0 200"
"visgroupshown" "1"
"visgroupautoshown" "1"
"logicalpos" "[-13268 14500]"
}
}
What regex would select only this part in Notepad++?
editor
{
"color" "0 0 200"
"visgroupshown" "1"
"visgroupautoshown" "1"
"logicalpos" "[-13268 14500]"
}
The first word is always "editor", but the number of lines and the content inside the curly brackets may vary.
editor\s*{\s*(?:\"[a-z]*\"\s*\".*\"\s*)*\}
I also tested it in Notepad++, and it works fine.
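A quick Ruby check of the pattern above on a trimmed copy of the sample (some entity properties are omitted). Two small changes, assumed to be equivalent: the opening { is escaped, and the \" escapes are dropped because " needs no escaping in a Ruby regexp.

```ruby
text = <<~'VMF'
  "entity"
  {
  "id" "5040044"
  "classname" "weapon_defibrillator_spawn"
  "origin" "449.47 5797.25 2856"
  editor
  {
  "color" "0 0 200"
  "visgroupshown" "1"
  "visgroupautoshown" "1"
  "logicalpos" "[-13268 14500]"
  }
  }
VMF

# "editor", an opening brace, then any number of "key" "value" pairs
# (lowercase keys, value on the same line), then the closing brace.
EDITOR_BLOCK = /editor\s*\{\s*(?:"[a-z]*"\s*".*"\s*)*\}/

puts text[EDITOR_BLOCK]
```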
The simplest way to find everything between curly brackets would be \{[^{}]*\} (example 1).
You can prepend editor\s* on it so it limits the search to only that specific entry: editor\s*\{[^{}]*\} (example 2).
However... if any of the keys or value strings within editor {...} contain a { or }, you're going to have edge cases.
You'll need to find double-quoted values and essentially ignore them. This example shows how you would stop before the first double quote within the group, and this example shows how to match up through the first key-value pair.
You essentially want to repeatedly match those key-value pairs until no more remain.
If your keys or values can contain \" within them, such as "help" "this is \"quoted\" text", you need to look for that \ character as well.
If there are nested groups within this group, you'll need to handle those recursively. Most regex engines (Notepad++ included) don't support recursion, though, so to get around this you paste a copy of the pattern you have so far into itself, at the point where it may encounter another nested { and }. This does not handle more than one level of nesting, though.
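A Ruby sketch of the quote-aware idea described above, using a made-up editor block whose value contains escaped quotes and a }. The simple editor\s*\{[^{}]*\} version stops at the brace inside the string. The quote-aware version treats the body as a sequence of double-quoted strings, each made of characters other than " and \, or backslash-escaped pairs such as \". It does not handle nested groups.

```ruby
# Made-up example: the "help" value contains escaped quotes and a brace.
text = <<~'VMF'
  editor
  {
  "color" "0 0 200"
  "help" "this is \"quoted\" text with a } inside"
  }
VMF

# Simplest version: everything up to the first closing brace.
NAIVE = /editor\s*\{[^{}]*\}/

# Quote-aware version: the body is a sequence of double-quoted strings,
# where a string may contain any character except " and \, or an escaped
# pair such as \" . Braces inside strings are therefore skipped.
# Nested groups are NOT handled.
QUOTE_AWARE = /editor\s*\{(?:\s*"(?:[^"\\]|\\.)*")*\s*\}/

puts text[NAIVE]        # stops early, at the } inside the "help" value
puts "---"
puts text[QUOTE_AWARE]  # the whole editor block
```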
TL;DR
For Notepad++, this is a single line regex you could use.
I am trying to write a search-and-replace regex (in Ruby) that replaces all instances of a character in a string, but only in a given context.
The regex needs to replace every "." in a JSON key, and I'm struggling with backreferences. I have a feeling I need a lookaround of some kind, but I can't get any of the variations I've tried to work.
Some example strings:
, "key1.name" : " value.something "
, "key2.complex.name" : "value.else"
, "this.is.the.most.complex.name" : "value"
I initially had this regex to replace a single occurrence (replacing it with "FULLSTOP"):
s/, "([^.]+)\.([^"]+)" :/, "\1FULLSTOP\2" :/g
Desired output:
, "key1FULLSTOPname" : " value.something "
, "key2FULLSTOPcomplexFULLSTOPname" : "value.else"
, "thisFULLSTOPisFULLSTOPtheFULLSTOPmostFULLSTOPcomplexFULLSTOPname" : "value"
I'm guessing I need to use (?=\.) somehow in the search, but I'm not sure how to combine it correctly with backreferences. I'm using the leading , and the trailing : to define the context of a JSON key.
Thanks in advance.
(?=.*?\:)\.
Use this. See the demo:
http://regex101.com/r/cH8vN2/5
Edit:
(?=.*?\"\s*\:)\.
Use this stricter version to be sure: it only matches when the colon is preceded by a closing quote.
See the demo:
http://regex101.com/r/cH8vN2/6
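A Ruby check of the lookahead approach above on the three example strings (the unnecessary \" and \: escapes are dropped). Because . does not match a newline in Ruby, the lookahead only scans the rest of the current line; if several key/value lines sat in one multi-line string, that is what would stop a dot in a value from matching a key on the next line.

```ruby
lines = [
  ', "key1.name" : " value.something "',
  ', "key2.complex.name" : "value.else"',
  ', "this.is.the.most.complex.name" : "value"'
]

# A dot that is followed, later on the same line, by a closing quote and a
# colon. Only dots inside the key satisfy this; dots in the value don't.
KEY_DOT = /(?=.*?"\s*:)\./

lines.each { |line| puts line.gsub(KEY_DOT, "FULLSTOP") }
```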
You can use the following as a starting point:
str = ', "this.is.the.most.complex.name" : "value"'
str = str.gsub(/\./, 'FULLSTOP')  # replaces every dot, including any in the value
puts str
I have not handled the value part (dots there get replaced too).
That should be easy to add.
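One way to handle the value part, sketched in Ruby: a hypothetical helper, replace_key_dots, replaces dots only inside the first double-quoted string that is followed by a colon, using the block form of String#sub. It assumes keys never contain an escaped quote.

```ruby
# Hypothetical helper: rewrite dots only inside the key, i.e. the first
# double-quoted string that is followed by optional whitespace and a colon.
def replace_key_dots(line, replacement = "FULLSTOP")
  line.sub(/"[^"]*"(?=\s*:)/) { |key| key.gsub(".", replacement) }
end

puts replace_key_dots(', "this.is.the.most.complex.name" : "value"')
puts replace_key_dots(', "key1.name" : " value.something "')
```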
I've got a large collection of text data stored in MongoDB that users can query by keyword or phrase, and some of the data contains the Unicode character U+00A0 (no-break space) instead of a regular space.
Fixing the data isn't an option (those no-break spaces are there intentionally), but I still want users to be able to find that data. So I updated our Mongo query-building code to search for any whitespace [\s] wherever the user entered a space, resulting in a query like this:
{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[\s]performance" , "$options" : "i"} }}}
(there's more to the query, that's just the relevant bit).
Unfortunately, this doesn't return the expected results. So I played around with a number of other approaches and eventually discovered that I get the correct results when I search for "not non-whitespace", [^\S], like this:
{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[^\S]performance" , "$options" : "i"} }}}
Which leads to my question: why does "any whitespace" ([\s]) fail to find this text, while "not non-whitespace" ([^\S]) finds it? Does Mongo have different rules for what counts as whitespace and non-whitespace?
All data is UTF-8 throughout; the MongoDB version is 2.2.2.
I suspect the problem here is the backslash, not the spaces. Can you please write \\ instead of \ to test my conjecture?
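To illustrate the conjecture, assuming the query string passes through a language's string-literal escaping before it reaches MongoDB (the question doesn't say which language builds the query): in a double-quoted Ruby string, \s is an escape for a plain space, and \S is an unknown escape that yields the letter S. That alone would reproduce the observed behavior, because the regex that arrives would be [ ] and [^S]. Ruby's own Regexp is used below only to show the effect.

```ruby
nbsp_text = "high\u00A0performance"   # U+00A0 between the words

# In a double-quoted Ruby string, \s is an escape for a plain space, and an
# unknown escape such as \S just yields the letter S.
with_s     = "high[\s]performance"
with_not_S = "high[^\S]performance"

p with_s       # => "high[ ]performance"
p with_not_S   # => "high[^S]performance"

p Regexp.new(with_s).match?(nbsp_text)      # => false: only a plain space matches
p Regexp.new(with_not_S).match?(nbsp_text)  # => true: anything except "S" matches

# Doubling the backslash keeps \s intact in the string sent onwards.
p "high[\\s]performance"  # => "high[\\s]performance"
```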
I have a big JSON file, formatted over multiple lines. I want to find objects that don't have a given property. The objects are guaranteed not to contain any further nested objects. Say the given property is "bad": then I would want to locate the value of "foo" in the second element below (but not in the first).
{
result: [
{
"foo" : {
"good" : 1,
"bad" : 0
},
"bar" : 123
},
{
"foo" : {
"good" : 1
},
"bar" : 123
}
]
}
I know about multi-line regexes in Vim but I can't get anything that does what I want. Any pointers?
Try the following:
/\v"foo"\_s*:\_s*\{%(%(\_[\t ,]"bad"\_s*:)@!\_.){-}\}
When you need to exclude something, look at negative lookaheads or lookbehinds. Lookbehinds are slower, and, unlike Vim, Perl/PCRE regular expressions only support fixed-width lookbehinds (or an alternation of fixed-width ones).
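A rough Ruby translation of the Vim pattern above, to show the negative-lookahead technique: before the lazy loop consumes each character, (?![\t ,\n]"bad"\s*:) checks that a "bad" key doesn't start there, so the match can only reach the closing brace if no such key exists. This translation is an assumption and has not been checked against Vim; the /m flag makes . match newlines, like Vim's \_.

```ruby
json = <<~'JSON'
  {
    result: [
      {
        "foo" : {
          "good" : 1,
          "bad" : 0
        },
        "bar" : 123
      },
      {
        "foo" : {
          "good" : 1
        },
        "bar" : 123
      }
    ]
  }
JSON

# "foo" : { ... } where no "bad" key appears inside the braces. The negative
# lookahead is tested before every character the lazy loop consumes.
FOO_WITHOUT_BAD = /"foo"\s*:\s*\{(?:(?![\t ,\n]"bad"\s*:).)*?\}/m

matches = json.scan(FOO_WITHOUT_BAD)
matches.each { |m| puts m, "---" }
```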
JSON is described by a context-free grammar, and as such it is not a regular language. Unless you can give a much stricter set of rules to go on, no regex will be able to do what you want.