I have a big JSON file, formatted over multiple lines. I want to find objects that don't have a given property. The objects are guaranteed not to contain any further nested objects. Say the given property is "bad"; then I would want to locate the value of "foo" in the second element in the following (but not in the first element).
{
  result: [
    {
      "foo" : {
        "good" : 1,
        "bad" : 0
      },
      "bar" : 123
    },
    {
      "foo" : {
        "good" : 1
      },
      "bar" : 123
    }
  ]
}
I know about multi-line regexes in Vim but I can't get anything that does what I want. Any pointers?
Try the following:
/\v"foo"\_s*:\_s*\{%(%(\_[\t ,]"bad"\_s*:)#!\_.){-}\}
When you need to exclude something, look at negative look-aheads or look-behinds (the latter are slower, and unlike Vim, Perl/PCRE regular expressions only support fixed-width look-behinds, or alternations of fixed-width ones).
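For example, abc\(def\)\@! matches abc only when it is not immediately followed by def; see :help \@!.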
JSON is described by a context-free grammar and is not a regular language. Unless you can give a much stricter set of rules to go on, no regex will be able to do what you want.
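If running the file through a script is an option, a parser handles this trivially. A minimal Ruby sketch, assuming the real file is valid JSON with quoted keys (the sample above has an unquoted result key) and is named big.json:
require 'json'

data = JSON.parse(File.read("big.json"))

# Keep the objects whose "foo" lacks the "bad" property,
# then print each remaining "foo" value.
data["result"].reject { |obj| obj["foo"].key?("bad") }
              .each   { |obj| p obj["foo"] }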
Related
I have got the below string and I need to get all the values between Pizzahut: and |.
ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|
I have got the regular expression .scan(/(?<=Pizzahut:)([.*\s\S]+)(?=\|)/) but it fetches
"j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|"
The result should be: j34532jdhgj,3242237,67688873rg
You can use
s='ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|'
p s.scan(/Pizzahut:([^|]+)/).flatten
# => ["j34532jdhgj", "3242237", "67688873rg"]
See this Ruby demo and the Rubular demo.
It does not look like Pizzahut can appear as part of another word here, but if it can, use a version with a word boundary, /\bPizzahut:([^|]+)/.
The Pizzahut:([^|]+) matches Pizzahut: and then captures into Group 1 any one or more chars other than a pipe (with ([^|]+)).
Note that String#scan returns the captures only if a pattern contains a capturing group, so you do not need to use lookarounds.
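For instance, the word-boundary version behaves the same on the sample string:
p s.scan(/\bPizzahut:([^|]+)/).flatten
# => ["j34532jdhgj", "3242237", "67688873rg"]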
I'm not sure why you're jumping to a regex solution here; that input string clearly looks structured to me, and you would probably do better by splitting it on the delimiters to convert it into a more convenient data structure.
Something like this:
input = "ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg"
converted_input = input
  .split('|')                     #=> ["ABC:2fg45rdvsg", "Pizzahut:j34532jdhgj", ... ]
  .map { |pair| pair.split(':') } #=> [["ABC", "2fg45rdvsg"], ["Pizzahut", "j34532jdhgj"], ... ]
  .group_by(&:first)              #=> {"ABC"=>[["ABC", "2fg45rdvsg"]], "Pizzahut"=>[["Pizzahut", "j34532jdhgj"], ... ], "Dominos"=>[["Dominos", "3424232"]], ... }
  .transform_values { |v| v.flat_map(&:last) }
(The above series of transformations is just one possible way; you could probably come up with a dozen similar alternatives to convert this input into the same hash shown below! For example, by using reduce (a sketch of which appears at the end of this answer) or even the CSV library.)
Which gives you the final result:
converted_input = {
  "ABC" => ["2fg45rdvsg"],
  "Pizzahut" => ["j34532jdhgj", "3242237", "67688873rg"],
  "Dominos" => ["3424232"],
  "Wendys" => ["3462783"]
}
Now that the data is formatted conveniently, obtaining data like your original request becomes trivial:
converted_input["Pizzahut"].join(',') #=> "j34532jdhgj,3242237,67688873rg"
(Although quite likely it would be more suitable to leave it as an Array, not a comma-separated String!!)
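As a rough illustration of the reduce alternative mentioned above (same input string, same resulting hash; just a sketch):
converted_input = input.split('|').reduce(Hash.new { |h, k| h[k] = [] }) do |acc, pair|
  key, value = pair.split(':', 2)
  acc[key] << value
  acc
end
#=> {"ABC"=>["2fg45rdvsg"], "Pizzahut"=>["j34532jdhgj", "3242237", "67688873rg"], ...}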
I'm new to ruby so please excuse my ignorance :)
I just learned about eval and I read about its dark sides.
what I've read so far:
When is eval in Ruby justified?
Is 'eval' supposed to be nasty?
Ruby Eval and the Execution of Ruby Code
What I have to do is read a file containing lines such as /e/ 3, which, after evaluation, should replace each e with a 3.
Here is what I did so far (it works, but...):
def evaluate_lines
  result = "elt"
  IO.foreach("test.txt") do |reg|
    reg = reg.chomp.delete(' ')
    puts reg
    result = result.gsub(eval(reg[0..2]), "#{reg[3..reg.length]}")
    p result
  end
end
Contents of the test.txt file:
/e/ 3
/l/ 1
/t/ 7
/$/ !
/$/ !!
This only works because I know the length of the lines in the file. So if my file contained something like /a-z/ 3, my program would not be able to do what is expected of it.
Note
I tried using Regexp.new reg and this resulted in /\/e\/3/, which isn't very helpful in this case.
A simple example of the Regexp approach:
str="/e/3"
result="elt"
result=result.gsub(Regexp.new str)
p result #outputs: #<Enumerator: "elt":gsub(/\/e\/3/)>
I already tried stripping off the slashes, but even that won't deliver the desired result, since gsub() takes two parameters, as in gsub(/e/, "3").
For the usage of Regexp, I have already read Convert a string to regular expression ruby.
While you can write something to parse that file, it rapidly gets complicated because you have to parse regular expressions. Consider /\/foo\\/.
There are a number of incomplete solutions. You can split on whitespace, but this will fail on /foo bar/.
re, replace = line.split(/\s+/, 2)
You can use a regex. Here's a first stab.
match = "/3/ 4".match(%r{^/(.*)/\s+(.+)})
This fails on an escaped /; we need something more complex.
match = '/3\// 4'.match(%r{\A / ((?:[^/]|\\/)*) / \s+ (.+)}x)
I'm going to guess it was not your teacher's intent to have you parsing regexes. For the purposes of the assignment, splitting on whitespace is probably fine. You should clarify with your teacher.
This is a poor data format. It is non-standard, difficult to parse, and has limitations on the replacement. Even a tab-delimited file would be better.
There's little reason to use a non-standard format these days. The simplest thing is to use a standard data format for the file. YAML or JSON are the most obvious choices. For such simple data, I'd suggest JSON.
[
  { "re": "e", "replace": "3" },
  { "re": "l", "replace": "1" }
]
Parsing the file is trivial; use the built-in JSON library.
require 'json'
specs = JSON.parse(File.read("test.json"))
And then you can use them as a list of hashes.
specs.each do |spec|
  # No eval necessary.
  re = Regexp.new(spec["re"])
  # `gsub!` replaces in place.
  result.gsub!(re, spec["replace"])
end
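Putting it together with the "elt" starting string from the question (a sketch; the file name test.json is assumed):
require 'json'

result = "elt"
specs = JSON.parse(File.read("test.json"))

specs.each do |spec|
  result.gsub!(Regexp.new(spec["re"]), spec["replace"])
end

p result #=> "31t" with the two rules shown above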
The data file is extensible. For example, if later you want to add regex options.
[
  { "re": "e", "replace": "3" },
  { "re": "l", "replace": "1", "options": ["IGNORECASE"] }
]
While the teacher may have specified a poor format, pushing back on bad requirements is good practice for being a developer.
Here's a really simple example that uses vi notation like s/.../.../ and s/.../.../g:
def rsub(text, spec)
  _, mode, repl, with, flags = spec.match(%r[\A(.)\/((?:[^/]|\\/)*)/((?:[^/]|\\/)*)/(\w*)\z]).to_a

  case (mode)
  when 's'
    if (flags.include?('g'))
      text.gsub(Regexp.new(repl), with)
    else
      text.sub(Regexp.new(repl), with)
    end
  end
end
Note the matcher looks for non-slash characters ([^/]) or a literal-slash combination (\\/) and splits out the two parts accordingly.
Where you can get results like this:
rsub('sandwich', 's/and/or/')
# => "sorwich"
rsub('and/or', 's/\//,/')
# => "and,or"
rsub('stack overflow', 's/o/O/')
# => "stack Overflow"
rsub('stack overflow', 's/o/O/g')
# => "stack OverflOw"
The principle here is you can use a very simple regular expression to parse out your input regular expression and feed that cleaned up data into Regexp.new. There is absolutely no need for eval here, and if anything that severely limits what you can do.
With a little work you could alter that regular expression to parse what's in your existing file and make it do what you want.
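For example, a rule parser along these lines (a sketch, not battle-tested against every escaping corner case) handles the question's /pattern/ replacement lines without eval:
def parse_rule(line)
  # "/e/ 3"  =>  [/e/, "3"]
  m = line.match(%r{\A/((?:[^/\\]|\\.)*)/\s*(.*)\z})
  m && [Regexp.new(m[1]), m[2]]
end

result = "elt"
IO.foreach("test.txt") do |line|
  re, replacement = parse_rule(line.chomp)
  result = result.gsub(re, replacement) if re
end
p result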
I have a text like this:
"entity"
{
"id" "5040044"
"classname" "weapon_defibrillator_spawn"
"angles" "0 0 0"
"body" "0"
"disableshadows" "0"
"skin" "0"
"solid" "6"
"spawnflags" "3"
"origin" "449.47 5797.25 2856"
editor
{
"color" "0 0 200"
"visgroupshown" "1"
"visgroupautoshown" "1"
"logicalpos" "[-13268 14500]"
}
}
What would the regex expression be to select only that part in Notepad++:
editor
{
"color" "0 0 200"
"visgroupshown" "1"
"visgroupautoshown" "1"
"logicalpos" "[-13268 14500]"
}
First word is always "editor", but the number of lines and content in curly brackets may vary.
editor\s*{\s*(?:\"[a-z]*\"\s*\".*\"\s*)*\}
Demo
I also tested it in Notepad++ and it works fine.
The simplest way to find everything between curly brackets would be \{[^{}]*\} (example 1).
You can prepend editor\s* on it so it limits the search to only that specific entry: editor\s*\{[^{}]*\} (example 2).
However... if any of the keys or value strings within editor {...} contain a { or }, you're going to have edge cases.
You'll need to find double-quoted values and essentially ignore them. This example shows how you would stop before the first double quote within the group, and this example shows how to match up through the first key-value pair.
You essentially want to repeatedly match those key-value pairs until no more remain.
If your keys or values can contain \" within them, such as "help" "this is \"quoted\" text", you need to look for that \ character as well.
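For example, a quoted string can be treated as a run of non-quote, non-backslash characters or backslash escapes; roughly (an untested sketch): editor\s*\{\s*(?:"(?:[^"\\]|\\.)*"\s*"(?:[^"\\]|\\.)*"\s*)*\}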
If there are nested groups within this group, you'll need to handle those recursively. Most regex flavors (Notepad++ included) don't handle recursion, though, so to get around this you can paste a copy of what you have so far into the pattern wherever it might come across more nested { and }. This does not handle more than one level of nesting, though.
TL;DR
For Notepad++, this is a single line regex you could use.
wish to extract a scalar value from json.
know JSON uses double quotes.
know datatype of scalar: string, number, date, boolean.
know scalar will be on first level, ie, not an attribute of an embedded object
{ "want": "string" } => "string"
{ "want": 123 } => 123
{ "not": { "want": "wrong" }, "want": "right" } => "right
{ "nothing": 0 } => null / not found
do not know how to handle the opening/closing quotes, nor do I know how to handle embedded objects.
is this possible?
this is the best I have come up with so far:
// match `want` attribute
(?:"want"\s*:\s*)
// string, number, boolean or null
(((?:")([^"]*)(?:"))|([-0-9][.eE0-9]*)|true|false|null)
// followed by comma or right bracket
(?:\s*(,|}))
it's good because it
can be run in postgres
grabs strings
grabs numbers
grabs boolean and null
it's bad because it
does not ensure want is a first level attribute
string value cannot have quote (") inside
This expression will get you 50% of the way there:
(?<=:\s*)(".*?"(?<!\\")|\-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?)(?=\s*})
Or, when written as a multi-line regex:
(?x:
(?<=:\s*) # After : + space
(
".*?"(?<!\\") # String in double quotes
| # -or-
\-? # Optional leading -ve
(0|[1-9]\d*) # Number
(\.\d+)? # Optional fraction
([eE][+-]?\d+)? # Optional exponent
)
(?=\s*}) # space + }
)
This will not match your nested object example ({ "not": { "want" ...), or rather, it will match, but on the wrong thing. Also, your final example ({ "nothing": 0 } => null / not found) is difficult because 0 is a valid number. To work around this problem, I would just check the result in procedural code and replace a result of 0 with null.
The nested objects problem is a whole different ball game though. It's getting into the realm of lexical analysis rather than simple tokenizing. At that point, you might as well just use a JSON library because you'd be writing a full JSON parser anyway. Fortunately, JSON is a simple enough grammar that it wouldn't be that expensive to use a third party library - certainly no more than doing it yourself.
I think the short answer is: from a simple { "name" : <value> } object, yes, but from anything more complicated, no.
For info on the JSON syntax, see http://www.json.org/.
Hello guys, I need to find a regular expression that accepts only strings containing exactly two sets of 11
over the set {0,1,2}
0011110000 matches (it only has two sets)
0010001001100 does not match (it only has one set)
0000011000110011 does not match (it has three sets)
00 does not match (it has no sets)
0001100000110001 matches (it only has two sets)
This is what I've done so far
([^1]|1(0|2|3)(0|2|3)*)*11([^1]|1(0|2|3)(0|2|3)*)*11([^1]|1(0|2|3)(0|2|3)*|1$)*
--------------------------
I think what I'm missing is that I need to make sure the underlined section of the above regular expression has to make sure there is no more "11" left in the string, and I don't think that section is working correctly.
You could use a regular expression, but you've got much simpler options available to you. Here's an example in C#:
public bool IsValidString(string input)
{
    return input.Split(new string[] { "11" }, StringSplitOptions.None).Length == 3;
}
Although regular expressions can be a very useful tool, their usage is not always warranted. As jwz put it:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
If this is not homework, then I would suggest avoiding a regex and going with a regular function (shown here is JavaScript):
function hasTwoElevensOnly(s) {
  var first = s.indexOf("11");
  if (first < 0) return false;
  var second = s.indexOf("11", first + 2);
  if (second < 0) return false;
  return s.indexOf("11", second + 2) < 0;
}
Code here: http://jsfiddle.net/8FMRH/
If a regex is required:
function hasTwoElevensOnly(s) {
  return /^((0|1(?!1)|2)*?11(0|1(?!1)|2)*?){2}$/.test(s);
}
Code here: http://jsfiddle.net/PAARn/1/
Most regex flavors support a repetition count, usually written in {}. For example, in JavaScript, you could do something like:
/^((10|0)*11(01|0)*){2}$/
which matches 2 sets of 11, each prefixed and suffixed with zero or more 0s in the string.
There may be a simpler way, but starting with your approach, this seems to work on the sample data provided:
/^([^1]|1[023])*11([^1]|1[023])*11((?<!11)|1[023]|[023]|(?<=[023])1)*$/
Using lookbehind.