Using JQ and regex to extract just matching regex string

I have the following input json which I obtain from a curl command and I'm feeding it to jq.
{
"version": "14.10.0-ee",
"revision": "ad109bc62af"
}
I'm trying to use jq to extract just '14.10.0'.
I have the following jq command but it's just returning "14.10.0-ee"
jq '. | select(.version|capture("^[0-9]{1,}.[0-9]{1,}.[0-9]{1,}")).version'
I've looked at the jq documentation and I'm not able to figure out the correct syntax.
I've tried scan, capture, and match without success.
I am able to achieve what I want if I pipe the result to grep but I would prefer to do it all in one command.
Any help would be greatly appreciated.

You can still use capture, with the pattern [[:digit:].]+ wrapped in a named group:
jq -r '.version| capture("(?<v>[[:digit:].]+)").v'

You can split on the hyphen and then take the first element:
.version|split("-")|first
or
.version/"-"|first
(/ on strings is equivalent to calling split)
Call jq with flag -r for raw output, that is without quotation marks around strings.

.version|scan("^[^-]+")
or
.version|scan("^[0-9.]+")
Don't overcomplicate it ;)
Call jq with flag -r for raw output, that is without quotation marks around strings.
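To put the approaches above side by side, here is a minimal sketch run against the sample document from the question (the shell variable names are just for illustration). It also hints at why the original attempt printed "14.10.0-ee": select only decides whether to pass its input through, it never changes it, so the trailing .version still sees the full string.

```shell
# Sample input taken from the question.
json='{"version": "14.10.0-ee", "revision": "ad109bc62af"}'

# split on the hyphen and keep the first piece
v_split=$(printf '%s\n' "$json" | jq -r '.version | split("-") | first')

# scan emits each match of the regex as a string
v_scan=$(printf '%s\n' "$json" | jq -r '.version | scan("^[0-9.]+")')

# capture needs a named group; the dots are escaped (\\. inside the jq string)
# so they only match a literal "."
v_capture=$(printf '%s\n' "$json" | jq -r '.version | capture("^(?<v>[0-9]+\\.[0-9]+\\.[0-9]+)").v')

echo "$v_split"
echo "$v_scan"
echo "$v_capture"
```

All three should print 14.10.0 for this input; the capture variant is the strictest, since it insists on exactly three dot-separated numbers.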

Related

Regex search in JQ

I want to use a regular expression in my jq query. It's a pretty simple one: I want to match entries starting with a number \d (or [0-9]) and ending in linux. What I've tried so far:
versions=`echo $allversions| jq '.tags[] | select(startswith("\d")) | select(endswith("linux"))'`
but I don't think startswith supports regular expressions. I've read that match supports regular expressions, but I cannot find proper documentation or examples for it. A simple 'jq '.tags[]| match("\d.*linux")' doesn't work and gives a syntax error message:
syntax error near unexpected token `"\d*linux"'
How can I accomplish this? Or should I combine jq with sed instead?
FYI:
$ jq --version
jq-1.6
Ok I found how to do it!
jq -r '.tags[] | select(test("^[0-9].*linux"))'
The arguments to regex functions must be JSON strings, so any regex backslash must be escaped. Thus, instead of match("\d.*linux")
you'd write:
match("\\d.*linux")
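As a small sketch of the escaping rule, here is the \d form used with test inside select. The tag list is made up for illustration; only the filter itself comes from the answers above. The whole jq program sits in single quotes so the shell passes \\d through untouched, and jq then reads it as the regex \d.

```shell
# Hypothetical input: a few tags, two of which start with a digit and end in "linux".
json='{"tags": ["1.2-linux", "latest", "2.0-windows", "3.1-alpine-linux"]}'

# \\d inside the jq string literal becomes the regex \d (a digit).
matches=$(printf '%s\n' "$json" | jq -r '.tags[] | select(test("^\\d.*linux$"))')

echo "$matches"
```

With this input it should print 1.2-linux and 3.1-alpine-linux, one per line.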

Filtering a variable in bash script using regex tr or awk

row1=$('+00 00:30:07.880000')
rowX=$('row1 | tr -dc '0-9')
I basically want to filter out all the special characters and space.
I wish to have a output as follows.
echo $'row1' = 003007.880000
You don't need regular expressions or external commands like tr for this. Bash's built-in parameter expansion can do it:
row1='+00 00:30:07.880000'
row1=${row1//[^0-9.]/}
echo "row1=$row1"
outputs row1=00003007.880000.
The output has two leading zeros that are not in the output suggested in the question. Maybe there's an unstated requirement to remove prefixes delimited by spaces. If that is the case, possible code is:
row1='+00 00:30:07.880000'
row1=${row1##* }
row1=${row1//[^0-9.]/}
echo "row1=$row1"
That outputs row1=003007.880000.
See How do I do string manipulations in bash? for explanations of ${row1//[^0-9.]/} and ${row1##* }.
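For a quick look at what each expansion does on its own, here is a minimal sketch (bash-specific, since ${var//pattern/} is not POSIX sh; the intermediate variable names are only for illustration):

```shell
#!/usr/bin/env bash
row1='+00 00:30:07.880000'

# ${var//pattern/} deletes every match of the glob pattern;
# [^0-9.] matches any single character that is not a digit or a dot.
all_digits=${row1//[^0-9.]/}      # 00003007.880000

# ${var##* } removes the longest prefix that ends in a space,
# leaving only the text after the last space.
last_field=${row1##* }            # 00:30:07.880000

# Combining both gives the output asked for in the question.
digits=${last_field//[^0-9.]/}    # 003007.880000

echo "$all_digits"
echo "$last_field"
echo "$digits"
```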
This is the easiest way to do that:
$ echo '+00 00:30:07.880000' | tr -dc '0-9.'
00003007.880000
Regards!

Extracting substring in bash script

I am not that good at bash scripting. I have a requirement to extract a substring between two words of a string. I tried different ways. Could someone help me, please?
This is my text "RegionName": "eu-west-1", "LatestAmiId": "ami-0ebfeadd9ccacfbb2",
Remember that the quotes and comma are part of the string. I need to extract the AMI ID alone, meaning the text between "LatestAmiId": " and ",
Any help, please?
Assuming you have this string stored in a variable name input_text you can get the AmiId using sed like this
ami_id=$(echo "$input_text" | sed -e 's/.*LatestAmiId": "//' -e 's/",$//')
this uses two different sed scripts:
s/.*LatestAmiId": "// replaces all text up to and including LatestAmiId": " with nothing
s/",$// replaces the ", at the end of the line with nothing
As I mentioned in comments, jq is a tool that I have found really helpful when working with JSON objects in bash scripts. Since your input string looks like a section out of a json response from an AWS api, I highly recommend using a json tool rather than a regex to extract this information.
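A rough sketch of the jq route, assuming the fragment in the question is part of a complete, valid JSON object (the json value below is a hypothetical reconstruction, not the actual API response):

```shell
# Hypothetical complete object built around the fragment from the question.
json='{"RegionName": "eu-west-1", "LatestAmiId": "ami-0ebfeadd9ccacfbb2"}'

# -r prints the raw string, without the surrounding double quotes.
ami_id=$(printf '%s\n' "$json" | jq -r '.LatestAmiId')

echo "$ami_id"
```

If the real response nests this object inside others, the path passed to jq would need to be adjusted accordingly.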

Using sed (or any other tool) to remove the quotes in a json file

I have a json file
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
I want to change it to
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
i.e. convert the requestId from String to Integer. I have tried some ways, but none seem to work :
sed -e 's/"requestId":"[0-9]"/"requestId":$1/g' test.json
sed -e 's/"requestId":"\([0-9]\)"/"requestId":444/g' test.json
Could someone help me out please?
Try
sed -e 's/\("requestId":\)"\([0-9]*\)"/\1\2/g' test.json
or
sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g' test.json
The main differences with your attempts are:
Your regular expressions were looking for [0-9] between double quotes, and that's a single digit. By using [0-9]* instead you are looking for any number of digits (zero or more digits).
If you want to copy a sequence of characters from your search in your replacing string, you need to define a group with a starting \( and a final \) in the regexp, and then use \1 in the replacing string to insert the string there. If there are multiple groups, you use \1 for the first group, \2 for the second group, and so on.
Also note that the final g after the last / is used to apply this substitution in all matches, in every processed line. Without that g, the substitution would only be applied to the first match in every processed line. Therefore, if you are only expecting one such replacement per line, you can drop that g.
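To see the group and back-reference in action without touching a file, here is a small sketch that feeds the sample line through sed. It uses [0-9][0-9]* so that at least one digit is required, and shows the same substitution with -E (extended syntax, where the parentheses need no backslashes); the variable names are only for illustration.

```shell
line='{"doc_type":"user","requestId":"1000778","clientId":"42114"}'

# Basic regex: \( \) defines the group, \1 inserts what it matched.
bre=$(printf '%s\n' "$line" | sed 's/"requestId":"\([0-9][0-9]*\)"/"requestId":\1/')

# Extended regex (-E): plain ( ) for the group and + for "one or more".
ere=$(printf '%s\n' "$line" | sed -E 's/"requestId":"([0-9]+)"/"requestId":\1/')

echo "$bre"
echo "$ere"
```

Both commands should print the line with requestId unquoted and everything else untouched.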
Since you said "or any other tool", I'd recommend jq! While sed is great for line-based text, JSON is not line-based, and newlines are sometimes added just to pretty-print the output and make developers' lives easier. Its rules also get trickier when handling Unicode or double quotes in string content. jq is specifically designed to understand the JSON format and can dissect it appropriately.
For your case, this should do the job:
jq '.requestId = (.requestId | tonumber)'
Note, this will throw an error if requestId is missing and not output the JSON object. If that's a concern, you might need something a little more sophisticated like this example:
jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end'
Also, jq pretty-prints and colorizes its output if sent to a terminal. To avoid that and just see a compact, one-line-per-object format, add -Mc to the command. jq will also work if provided multiple objects back-to-back without a newline in the input. Here's a full demo to show this filter:
$ (echo '{"doc_type":"bare"}{}'
echo '{"doc_type":"user","requestId":"0092","clientId":"11"}'
echo '{"doc_type":"user","requestId":"1000778","clientId":"42114"}'
) | jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end' -Mc
Which produced this output:
{"doc_type":"bare"}
{}
{"doc_type":"user","requestId":92,"clientId":"11"}
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
sed -e 's/"requestId":"\([0-9]\+\)"/"requestId":\1/g' test.json
You were close. The "new" regex terms I had to add: \1 means "whatever is contained in the first \( \) on the 'search' side", and \+ means "1 or more of the previous thing".
Thus, we search for the string "requestId":" followed by a group of 1 or more digits, followed by ", and replace it with "requestId": followed by that group we found earlier.
Perhaps the jq (json query) tool would help you out?
$ cat test
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
$ cat test |jq '.doc_type' --raw-output
user
$

How does bash expand escaped characters when dereferencing variables

If I quit using variables and just write the regexes directly into the last sed command,
everything works. But as it is here, no substitutions are done. Why?
#!/bin/bash
#html substitutions
ampP="\&"
ampR="&"
ltP="\<"
ltR="<"
gtP="\>"
gtR=">"
quotP="\""
quotP2='\“'
quotP3="\”"
quotR="\""
tripDotP="\&#8230"
tripDotR="..."
tickP="\’"
tickR="\´"
#get a random page, and filter out the quotes
#pick a random quote
#translate weird html symbols
curl "www.yodaquotes.net/page/$((RANDOM % 9 +1))/" -Ls | sed -nr 's/.*data-text=\"([^\"]+)\".*/\1/p' \
| sort -R | head -n1 \
| sed 's/"$ampP"/"$ampR"/g; s/$ltP/$ltR/g; s/$gtP/$gtR/g; s/$quotP/$quotR/g; s/"$quotP2"/"$quotR"/g; s/$quotP3/$quotR/g; s/$tripDotP/$tripDotR/g; s/$stickP/$stickR/g'
This sed isn't going to work:
sed 's/"$ampP"/"$ampR"/g'
because of wrong shell quoting. Your shell variables won't be expanded at all in single quotes. Try using this form:
sed "s~$ampP~$ampR~g"
Debugging 101: Let's just echo what sed receives:
echo 's/"$ampP"/"$ampR"/g; s/$ltP/$ltR/g; s/$gtP/$gtR/g; s/$quotP/$quotR/g; s/"$quotP2"/"$quotR"/g; s/$quotP3/$quotR/g; s/$tripDotP/$tripDotR/g; s/$stickP/$stickR/g'
s/"$ampP"/"$ampR"/g; s/$ltP/$ltR/g; s/$gtP/$gtR/g; s/$quotP/$quotR/g; s/"$quotP2"/"$quotR"/g; s/$quotP3/$quotR/g; s/$tripDotP/$tripDotR/g; s/$stickP/$stickR/g
That doesn't look right now, does it?
There's no variable substitution within single quotes in bash. That's why we have two different quotes, so you can decide which one is more appropriate for the task.
For readability I would suggest putting each sed command within double quotes.
Like this: "s/$ampP/$ampR/g"
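A minimal sketch of the difference, using a made-up pattern and replacement rather than the variables from the question. One extra assumption worth flagging: a bare & in a sed replacement means "the whole match", so the replacement here is written as \& to get a literal ampersand.

```shell
# Hypothetical pattern/replacement pair, for illustration only.
ampP='&amp;'
ampR='\&'        # escaped: a bare & would re-insert the matched text
text='Luke &amp; Leia'

# Single quotes: sed receives the literal text $ampP and $ampR, so nothing matches.
single=$(printf '%s\n' "$text" | sed 's/$ampP/$ampR/g')

# Double quotes: the shell expands the variables before sed sees the script.
double=$(printf '%s\n' "$text" | sed "s/$ampP/$ampR/g")

echo "$single"   # Luke &amp; Leia
echo "$double"   # Luke & Leia
```

If a pattern or replacement can contain a slash, a different delimiter such as ~ (as in the answer above) avoids having to escape it.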