Grep match a certain key/value set json - regex

IF THIS IS THE FIRST TIME YOU'RE READING THIS QUESTION, SKIP RIGHT TO THE EDIT
So what I'm trying to do is match everything until a certain word
What I'm working with is similar to this:
{"selling":"0"morestuffhere"notes":"otherthingshere"}unwantedthingshere
The regex I got so far is:
grep -o "\{\"selling\":\"0\""
which will match up to {"selling":"0".
I want it to match {"selling":"0"morestuffhere"notes":"otherthingshere"} but NOT unwantedthingshere.
I don't know beforehand what "morestuffhere", "otherthingshere" and "unwantedthingshere" are going to be. So what I want to do is match everything from what I already have until "notes":"otherthingshere"}.
How do I do this?
EDIT: forgot to mention some key points. Sorry, had to hurry because dinner was ready.
My input consists of a series of key:value sets, as such:
{"key":"value", "otherkey":"othervalue","morekeys":"morevalues"},{"othersetkey":"othersetvalue","otherothersetkey":"otherothersetvalue","othersetmorekeys":"othersetmorevalues"}
and so on.
The first key/value set is different from the rest of them, and I don't want to match that set.
The first key of all sets other than the first is "selling", and I want to match all sets that have a "selling" value of 1. The last key of the set is "notes".
The input is JSON, so I added that to the tags.

Through sed,
sed -r 's/^[^{]*([^}]*).*$/\1}/g' file
Example:
$ echo 'dSDGAadb{"selling":"0"morestuffhere"notes":"otherthingshere"}unwantedthingshere' | sed -r 's/^[^{]*([^}]*).*$/\1}/g'
{"selling":"0"morestuffhere"notes":"otherthingshere"}
I think you want something like this,
$ cat aa
dSDGAadb{"selling":"0"morestuffhere"notes":"otherthingshere"}{"selling":"1"morestuffhere"notes":"otherthingshere"}bgj
$ sed -r 's/.*(\{"selling":"1"[^}]*)}.*/\1}/g' aa
{"selling":"1"morestuffhere"notes":"otherthingshere"}
OR
something like this,
$ cat aa
dSDGAadb{"selling":"0"morestuffhere"notes":"otherthingshere"}{"selling":"1"morestuffhere"notes":"otherthingshere"}bgj{"selling":"1"morestuffhere"notes":"otherthingshere"}
$ grep -oP '{\"selling\":\"1\"[^}]*}' aa
{"selling":"1"morestuffhere"notes":"otherthingshere"}
{"selling":"1"morestuffhere"notes":"otherthingshere"}

You could do this with grep:
grep -o '{[^}]*}' file
This matches an opening curly brace, followed by anything that isn't a closing curly brace, followed by a closing curly brace.
Testing it out on your input:
$ grep -o '{[^}]*}' <<<'{"selling":"0"morestuffhere"notes":"otherthingshere"}unwantedthingshere'
{"selling":"0"morestuffhere"notes":"otherthingshere"}

What's wrong with
$ grep -o ".*}" file.txt
{"selling":"0"morestuffhere"notes":"otherthingshere"}
where file.txt contains your example string?
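One catch shows up once a line holds more than one set, as the EDIT in the question describes. A small sketch, assuming GNU grep and a made-up sample line (the keys and values below are illustrative, not taken from the real input): the greedy .* runs to the last closing brace on the line, while [^}]* stops at the first one.

```shell
# Hypothetical sample line: two key/value sets followed by trailing junk.
line='{"selling":"0","notes":"a"}{"selling":"1","notes":"b"}junk'

# Greedy: .* extends to the LAST closing brace, so both sets come back as one match.
greedy=$(printf '%s\n' "$line" | grep -o '.*}')

# Negated class: [^}]* cannot cross a closing brace, so only the selling=1 set matches.
selling_one=$(printf '%s\n' "$line" | grep -o '{"selling":"1"[^}]*}')

printf '%s\n' "$greedy" "$selling_one"
```

This only holds as long as no value contains a } character; for input like that a real JSON parser is the safer route.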

I've never found a good way to do this kind of thing with json in the shell with basic unix tools like grep, sed, etc. A quick and dirty ruby or python script is your friend,
#!/usr/bin/env ruby
# h.rb
require 'json'
key=ARGV.shift
json=ARGF.read
h=JSON.parse(json)
puts h.key?(key) ? h[key] : "not found"
And then pipe your json into the script specifying the key as a parameter,
$ echo '{"key":"value", "otherkey":"othervalue","morekeys":"morevalues"}' | /tmp/h.rb otherkey
othervalue
or from a file,
$ cat /tmp/h.json | /tmp/h.rb otherkey
othervalue

Related

Sed replace hyphen with underscore

new to regex and have a problem. I want to replace hyphens with underscores in certain places in a file. To simplify things, let's say I want to replace the first hyphen. Here's an example "file":
dont-touch-these-hyphens
leaf replace-these-hyphens
I want to replace hyphens in all lines found by
grep -P "leaf \w+-" file
I tried
sed -i 's/leaf \(\w+\)-/leaf \1_/g' file
but nothing happens (wrong replacement would have been better than nothing). I've tried a few tweaks but still nothing. Again, I'm new to this so I figure the above "should basically work". What's wrong with it, and how do I get what I want? Thanks.
You can simplify things by using two distinct regexes: one for matching the lines that need processing, and one for matching what must be modified.
You can try something like this:
$ sed '/^leaf/ s/-/_/' file
dont-touch-these-hyphens
leaf replace_these-hyphens
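As for why the original attempt did nothing: sed uses basic regular expressions by default, where a bare + is a literal plus sign rather than a quantifier. The sketch below assumes GNU sed (\w and \+ are GNU extensions) and feeds the question's sample lines through a pipe instead of editing a file in place.

```shell
# Sample lines from the question.
input='dont-touch-these-hyphens
leaf replace-these-hyphens'

# BRE with a bare +: the + is literal, so nothing matches and the text is unchanged.
bre_plain=$(printf '%s\n' "$input" | sed 's/leaf \(\w+\)-/leaf \1_/')

# GNU BRE: \+ is the "one or more" quantifier.
bre_escaped=$(printf '%s\n' "$input" | sed 's/leaf \(\w\+\)-/leaf \1_/')

# ERE (-E): + is a quantifier and groups use plain parentheses.
ere=$(printf '%s\n' "$input" | sed -E 's/leaf (\w+)-/leaf \1_/')

printf '%s\n' "$ere"
```

Either of the last two forms replaces the first hyphen after "leaf " and leaves the other line alone.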
Just use awk:
$ awk '$1=="leaf"{ sub(/-/,"_",$2) } 1' file
dont-touch-these-hyphens
leaf replace_these-hyphens
It gives you much more precise control over what you're matching (e.g. the above is doing a string instead of regexp comparison on "leaf" and so would work even if that string contained regexp metacharacters like . or *) and what you're replacing (e.g. the above only does the replacement in the text AFTER leaf and so would continue to work even if leaf itself contained -s):
$ cat file
dont-touch-these-hyphens
leaf-foo.*bar replace-these-hyphens
leaf-foobar dont-replace-these-hyphens
Correct output:
$ awk '$1=="leaf-foo.*bar"{ sub(/-/,"_",$2) } 1' file
dont-touch-these-hyphens
leaf-foo.*bar replace_these-hyphens
leaf-foobar dont-replace-these-hyphens
Wrong output:
$ sed '/^leaf-foo.*bar/ s/-/_/' file
dont-touch-these-hyphens
leaf_foo.*bar replace-these-hyphens
leaf_foobar dont-replace-these-hyphens
(note the "-" in leaf-foo being replaced by "_" in each of the last 2 lines, including the one that does not start with the string "leaf-foo.*bar").
That awk script will work as-is using any awk on any UNIX box.
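If the real goal is to replace every hyphen in the second field rather than just the first, one possible variation is to swap sub for gsub. This is a sketch on the question's sample lines; note that assigning to $2 makes awk rebuild the line with single spaces between fields.

```shell
# Sample lines from the question.
input='dont-touch-these-hyphens
leaf replace-these-hyphens'

# gsub replaces ALL hyphens in field 2, but only on lines whose first field is exactly "leaf".
all_hyphens=$(printf '%s\n' "$input" | awk '$1=="leaf"{ gsub(/-/,"_",$2) } 1')

printf '%s\n' "$all_hyphens"
```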

Grepping for a pattern followed by another pattern and excluding what lies in between as output

I want to do something like
egrep -o '(mon|tues)[1-3]?[0-9].*(mon|tues)[1-3]?[0-9]'
and keep only the parts matched by (mon|tues)[1-3]?[0-9], dropping whatever lies in between.
With this as input
mon19hellotues20
mon19world
hellomon19
tues8worldtues22
I want
mon19tues20
tues8tues22
As output
sed is a better tool for printing only selected parts of a match:
sed -nE 's/(mon|tues)([1-3]{0,1}[0-9]).*(mon|tues)([1-3]{0,1}[0-9])/\1\2\3\4/p' file
mon19tues20
tues8tues22
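The same idea can be written with two outer groups instead of four, which may be easier to read. This is only a sketch: the shell variable day is a name introduced here for illustration, and -E assumes a sed that supports extended regular expressions (GNU and BSD sed both do).

```shell
# Sample input from the question.
input='mon19hellotues20
mon19world
hellomon19
tues8worldtues22'

# Reusable sub-pattern for one "day token" (hypothetical helper variable).
day='(mon|tues)[1-3]?[0-9]'

# Groups are numbered by opening parenthesis: \1 is the first token, \3 the second
# (\2 and \4 are the inner mon|tues groups). -n plus the p flag prints only
# lines where the substitution succeeded.
result=$(printf '%s\n' "$input" | sed -nE "s/($day).*($day)/\1\3/p")

printf '%s\n' "$result"
```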

Script to generate code based on pattern

I am trying to generate code to re-initialize an object declared in a file xyz. For each variable declared as follows(x's denote any character, could be repeated any number of times):-
Private lst_xxxxxxxxx As xxxxxxxxxxx
or
Private _lst_xxxxxx As xxxxxxxxxxx
I want to generate something like:-
lstxxxxxxxx.Clear()
for each such occurrence, followed by a newline.
I tried using something like [^*[_ ]lst*] to match the lines in awk but it ended up capturing unwanted expressions
I can use any of the scripting tools from among the tags for this task, just need to get the job done.
You can use the following sed:
sed -nr 's/Private _?(lst)_(\w*) As \w*/\1\2.Clear()/p' file
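-n suppresses the default printing, so only lines where the substitution succeeds are printed (via the p flag). -r enables extended regular expressions, so groups are written with plain () and referenced with \1, \2...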
Example
$ cat a
how are
Private lst_hey As you
Private _lst_helloooo As blabla
Private _lst_hello
you
i am ok
$ sed -nr 's/Private _?(lst)_(\w*) As \w*/\1\2.Clear()/p' a
lsthey.Clear()
lsthelloooo.Clear()
The question is tagged awk, so it looks like you prefer awk:
awk '/Private.*As/{gsub(/_/,X,$2);print $2 ".Clear()"}' file
lsthey.Clear()
lsthelloooo.Clear()
If you want the lst prefix stripped from the output as well:
awk '/Private.*As/{gsub(/_?lst_/,X,$2);print $2 ".Clear()"}' file
hey.Clear()
helloooo.Clear()

Regular expressions with grep

So I have a bunch of data that all looks like this:
janitor#1/2 of dorm#1/1
president#4/1 of class#2/2
hunting#1/1 hat#1/2
side#1/2 of hotel#1/1
side#1/2 of hotel#1/1
king#1/2 of hotel#1/1
address#2/2 of girl#1/1
one#2/1 in family#2/2
dance#3/1 floor#1/2
movie#1/2 stars#5/1
movie#1/2 stars#5/1
insurance#1/1 office#1/2
side#1/1 of floor#1/2
middle#4/1 of December#1/2
movie#1/2 stars#5/1
one#2/1 of tables#2/2
people#1/2 at table#2/1
Some lines have prepositions, others don't so I thought I could use regular expressions to clean it up. What I need is each noun, the # sign and the following number on its own line. So for example, the first lines of output should look like this in the final file:
janitor#1
dorm#1
president#4
etc...
The list is stored in a file called NPs. My code to do this is:
cat NPs | grep -E '\b(\w*[#][1-9]).' >> test
When I open test, however, it's the exact same as the input file. Any input as to what I'm missing? It doesn't seem like it should be a hard operation, so maybe I'm missing something about syntax? I'm using this command from a shell script that is called in bash.
Thanks in advance!
This should do what you need.
The -o option will show only the part of a matching line that matches the PATTERN.
grep -Eo '[A-Za-z#]+[1-9]' NPs > test
or even the -P option, which interprets the PATTERN as a Perl regular expression
grep -Po '[\w#]*(?=/)' NPs > test
Using grep:
$ grep -o "\w*[#]\w*" inputfile
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2
grep and its variants print entire lines that match the pattern. If you need to modify lines, you should use sed, like
cat NPs | sed 's/^\(\b\w*[#][1-9]\).*$/\1/g'
You need sed, not grep. (Or awk, or perl.) It looks like this would do what you want:
cat NPs | sed 's?/.*??'
or simply
sed 's?/.*??' NPs
s means "substitute". The next character is the delimiter between regular expressions. Usually it's "/", but since you need to search for "/", I used "?" instead. "." refers to any character, and "*" says "zero or more of what preceded me". Whatever is between the last two delimiters is the replacement string. In this case it's empty, so you're replacing "/" followed by zero or more of any character, with the empty string.
EDIT: Oh, I see now that you wanted to extract the last item on the line, too. Well, I'm sure that others' suggested regexps would work. If it were my problem, I'd probably filter the file in two steps, perhaps piping the results from one step to the next, or using multiple substitutions with sed: First delete the "of"s and middle spaces, and add newlines, and then run sed as above. It's not as cool as doing it all in one regexp, but each step is easier to understand. For even more simplicity and uncoolness, use three steps, replacing " of " with space in the first step. Since others have provided complete solutions, I won't work out the details.
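For what it's worth, the step-by-step approach described in the EDIT could be sketched roughly as follows. It assumes the optional word between the two items is always lowercase letters (as in the sample data) and runs on two lines copied from the question rather than on the real NPs file.

```shell
# Two sample lines from the question: one with a preposition, one without.
input='janitor#1/2 of dorm#1/1
hunting#1/1 hat#1/2'

# Step 1: collapse " of " (or any lowercase word between the items) to a single space.
# Step 2: turn the remaining space into a newline so each item is on its own line.
# Step 3: delete the first "/" and everything after it, as in the command above.
result=$(printf '%s\n' "$input" |
  sed -E 's/ ([a-z]+ )?/ /' |
  tr ' ' '\n' |
  sed 's?/.*??')

printf '%s\n' "$result"
```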
Grep by default just searches for the text, so in your case it is printing the lines that match. I think you want to investigate sed instead to perform the replacement. (And you don't need to cat the file, just grep PATTERN filename)
To get your output on separate lines, this worked for me:
sed 's|/.||g' NPs | sed 's/ .. / /' | tr " " "\n"
This uses two seds in a row to do different substitutions, and tr to insert line feeds.
The -o option in grep, which causes it to print out only the matching text, as described in another answer, is probably even simpler!
An awk version:
awk '/#/ {print $NF}' RS="/" NPs
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2
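A short note on why this works: RS="/" makes awk split the input into records at every slash instead of at every newline, so each record ends right after a word#number item. With the default field separator, spaces and newlines both separate fields, so $NF (the last field) is exactly that item, and the /#/ test skips the trailing record that holds only a digit. A reduced sketch on two of the sample lines:

```shell
# Two sample lines from the question.
input='janitor#1/2 of dorm#1/1
hunting#1/1 hat#1/2'

# Records are split at "/": "janitor#1", "2 of dorm#1", "1<newline>hunting#1", "1 hat#1", "2<newline>".
# Only records containing "#" are printed, and only their last field.
result=$(printf '%s\n' "$input" | awk '/#/ { print $NF }' RS="/")

printf '%s\n' "$result"
```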

unix sed command regular expression

Can anyone explain to me how the regular expression works in the sed substitute command?
$ cat path.txt
/usr/kbos/bin:/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/local/sbin:/sbin:/bin/:/usr/sbin:/usr/bin:/opt/omni/bin:
/opt/omni/lbin:/opt/omni/sbin:/root/bin
$ sed 's/\(\/[^:]*\).**/\1/g' path.txt
/usr/kbos/bin
/usr/local/sbin
/opt/omni/lbin
The sed command above uses the back reference and save operator concepts.
Can anyone explain to me how the regular expression, especially /[^:]*, works in the substitute command to get only the first path in each line?
I think you wrote an extra asterisk * in your sed code, so it should be like this:
$ sed 's/\(\/[^:]*\).*/\1/g' file
/usr/kbos/bin
/usr/local/sbin
/opt/omni/lbin
Changing the delimiter helps make it a little easier to read:
sed 's#\(/[^:]*\).*#\1#g'
The s#something#otherthing#g is a basic sed command that looks for something and replaces it with otherthing, for every match on each line.
If you do s#\(something\)#\1#g then you "save" that something and can print it back with \1.
Hence, what it does is match a pattern like /[^:]* and print it back. /[^:]* means a / followed by any number of characters other than :. So it matches from the / up to, but not including, the first colon :. sed stores that piece of the string and prints it back.
Small examples:
# keep only the leading lowercase letters
$ echo "hello123bye" | sed 's#\([a-z]*\).*#\1#g'
hello
# get everything until it finds the number 3
$ echo "hello123bye" | sed 's#\([^3]*\).*#\1#g'
hello12
[^:]*
in regex would match all characters except for :, so it would match until this:
/usr/kbos/bin
also it would match these,
/usr/local/bin
/usr/jbin
/usr/bin
/usr/sas/bin
since all of these consist only of characters that are not :
.* matches any character, zero or more times.
Thus the regex [^:]*.* would match all of these expressions:
/usr/kbos/bin:/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/jbin:/usr/bin:/usr/sas/bin
/usr/bin:/usr/sas/bin
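However, the substitution leaves only the first field (i.e. /usr/kbos/bin). The match starts at the first / on the line, the group \(/[^:]*\) captures everything up to the first :, .* then consumes the rest of the line, and the whole match is replaced by the captured group \1.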
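To check that reading, here is a small sketch on a shortened version of the first line of path.txt. The cut command is added only as a cross-check and is not part of the original question; it gives the same result here because the line starts with the first path.

```shell
# Shortened first line of path.txt from the question.
line='/usr/kbos/bin:/usr/local/bin:/usr/jbin'

# The group captures from the first "/" up to the first ":"; .* swallows the rest of the line.
first=$(printf '%s\n' "$line" | sed 's#\(/[^:]*\).*#\1#')

# Cross-check: take the first colon-separated field.
same=$(printf '%s\n' "$line" | cut -d: -f1)

printf '%s\n' "$first" "$same"
```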