Regex search in JQ

I am looking for a regular expression in my jq query. It's a pretty simple one: I want to match entries starting with a number \d (or [0-9]) and ending in linux. What I've tried so far:
versions=`echo $allversions| jq '.tags[] | select(startswith("\d")) | select(endswith("linux"))'`
but I don't think startswith supports regular expressions. I'm reading that match supports regular expressions, but I cannot find proper documentation or examples for it. A simple jq '.tags[]| match("\d.*linux")' doesn't work and gives a syntax error message:
syntax error near unexpected token `"\d*linux"'
How can I accomplish this? Or should I combine jq with sed instead?
FYI:
$ jq --version
jq-1.6

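Ok, I found how to do it! test returns true or false, so it fits directly inside select (the trailing $ makes sure the entry actually ends in linux):
jq -r '.tags[] | select(test("^[0-9].*linux$"))'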

The arguments to regex functions must be JSON strings, so any regex backslash must be escaped. Thus, instead of match("\d.*linux")
you'd write:
match("\\d.*linux")
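A quick way to see both points outside jq is the Python 3 sketch below. It assumes Python's json module is a fair stand-in for how jq parses string literals, and that Python's re treats \d, ^, .* and $ the way jq's regex engine does for these simple patterns; select_tags is a hypothetical helper and the tag list is made-up sample data.

```python
import json
import re

# A single backslash before d is not a legal JSON escape, so parsing fails.
try:
    json.loads(r'"\d.*linux"')
    single_backslash_ok = True
except json.JSONDecodeError:
    single_backslash_ok = False

# Doubling it is legal JSON; the decoded string is the regex \d.*linux
pattern = json.loads(r'"\\d.*linux"')

def select_tags(tags, regex):
    """Hypothetical helper: keep tags where the regex matches, like select(test(regex))."""
    return [tag for tag in tags if re.search(regex, tag)]

# Made-up sample tags
tags = ["1.2-linux", "latest", "2.0-windows", "3.1-alpine-linux", "v4-linux", "5.0-linux-arm64"]

print(single_backslash_ok)                  # False
print(select_tags(tags, pattern))           # unanchored: also keeps v4-linux and 5.0-linux-arm64
print(select_tags(tags, "^[0-9].*linux$"))  # starts with a digit and ends in linux
```

Without anchors, the regex may match anywhere in the string; ^ and $ pin it to the start and the end.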

Related

Using JQ and regex to extract just matching regex string

I have the following input json which I obtain from a curl command and I'm feeding it to jq.
{
"version": "14.10.0-ee",
"revision": "ad109bc62af"
}
I'm trying to use jq to extract just '14.10.0'.
I have the following jq command but it's just returning "14.10.0-ee"
jq '. | select(.version|capture("^[0-9]{1,}.[0-9]{1,}.[0-9]{1,}")).version'
I've looked at the jq documentation and I'm not able to figure out the correct syntax.
I've tried scan, capture, and match without success.
I am able to achieve what I want if I pipe the result to grep but I would prefer to do it all in one command.
Any help would be greatly appreciated.

You can still use capture, as long as the pattern has a named group (here v) and uses the [[:digit:].]+ pattern:
jq -r '.version| capture("(?<v>[[:digit:].]+)").v'
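This also explains the question's output. capture returns an object built from the named groups only, so a pattern with no named groups yields {}, which select treats as true (in jq only false and null are falsy), and the trailing .version then returns the whole string. The Python sketch below mimics that with groupdict(); capture here is a hypothetical helper, and [0-9.] stands in for [[:digit:].] because Python's re has no POSIX bracket classes.

```python
import re

def capture(s, regex):
    """Hypothetical analogue of jq's capture: a dict of the named groups of the
    first match, or None when nothing matches (jq would emit no output then)."""
    m = re.search(regex, s)
    return m.groupdict() if m else None

# Named group v, as in the answer above
print(capture("14.10.0-ee", r"(?P<v>[0-9.]+)"))                  # {'v': '14.10.0'}
# The question's pattern has no named groups, so the result is an empty dict
print(capture("14.10.0-ee", r"^[0-9]{1,}.[0-9]{1,}.[0-9]{1,}"))  # {}
```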

You can split on the hyphen and then take the first element:
.version|split("-")|first
or
.version/"-"|first
(/ on strings is equivalent to calling split)
Call jq with the -r flag for raw output, that is, without quotation marks around strings.
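The same split-then-first idea as a tiny Python sketch (version_prefix is a hypothetical name). One design point worth knowing: if the string has no hyphen, split returns a single element in both jq and this sketch, so you get the whole string back rather than an error.

```python
def version_prefix(version):
    # Same idea as jq's .version | split("-") | first
    return version.split("-")[0]

print(version_prefix("14.10.0-ee"))  # 14.10.0
print(version_prefix("14.10.0"))     # 14.10.0 (no hyphen: whole string)
```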

.version|scan("^[^-]+")
or
.version|scan("^[0-9.]+")
Don't overcomplicate it ;)
Call jq with the -r flag for raw output, that is, without quotation marks around strings.
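scan emits every match of the regex, one per output. A rough Python counterpart is re.finditer; the sketch below (scan is a hypothetical helper, and patterns without capture groups are assumed) shows that the ^ anchor limits each input to at most one match, and how the two suggested patterns differ on inputs that do not look like 14.10.0-ee.

```python
import re

def scan(s, regex):
    """Hypothetical analogue of jq's scan: every non-overlapping match, in order."""
    return [m.group(0) for m in re.finditer(regex, s)]

print(scan("14.10.0~rc1", "^[^-]+"))    # ['14.10.0~rc1']
print(scan("14.10.0~rc1", "^[0-9.]+"))  # ['14.10.0']
print(scan("v14.10.0-ee", "^[0-9.]+"))  # []
print(scan("14.10.0-ee", "[0-9]+"))     # ['14', '10', '0'] (no anchor: every match)
```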

Using sed (or any other tool) to remove the quotes in a json file

I have a json file
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
I want to change it to
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
I.e., convert requestId from a string to an integer. I have tried a few approaches, but none seem to work:
sed -e 's/"requestId":"[0-9]"/"requestId":$1/g' test.json
sed -e 's/"requestId":"\([0-9]\)"/"requestId":444/g' test.json
Could someone help me out please?

Try
sed -e 's/\("requestId":\)"\([0-9]*\)"/\1\2/g' test.json
or
sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g' test.json
The main differences with your attempts are:
Your regular expressions were looking for [0-9] between double quotes, and that's a single digit. By using [0-9]* instead you are looking for any number of digits (zero or more digits).
If you want to copy a sequence of characters from your search in your replacing string, you need to define a group with a starting \( and a final \) in the regexp, and then use \1 in the replacing string to insert the string there. If there are multiple groups, you use \1 for the first group, \2 for the second group, and so on.
Also note that the final g after the last / is used to apply this substitution in all matches, in every processed line. Without that g, the substitution would only be applied to the first match in every processed line. Therefore, if you are only expecting one such replacement per line, you can drop that g.
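The two fixes can be checked with Python's re.sub, which behaves like sed's s command here apart from spelling: groups are written ( ) instead of \( \), and all matches are replaced unless count=1 is passed (roughly what leaving off g does for a single line). The first sample line is the one from the question; the two-object line is made up.

```python
import re

question_line = '{"doc_type":"user","requestId":"1000778","clientId":"42114"}'

# [0-9] is exactly one digit, so the question's pattern finds nothing to replace
unchanged = re.sub(r'"requestId":"[0-9]"', '"requestId":444', question_line)

# A group around [0-9]* plus \1 in the replacement copies the digits without the quotes
fixed = re.sub(r'"requestId":"([0-9]*)"', r'"requestId":\1', question_line)

# Made-up line with two matches: count=1 acts like sed without the g flag
two_objects = '{"requestId":"1"}{"requestId":"2"}'
first_only = re.sub(r'"requestId":"([0-9]*)"', r'"requestId":\1', two_objects, count=1)
all_matches = re.sub(r'"requestId":"([0-9]*)"', r'"requestId":\1', two_objects)

print(fixed)  # {"doc_type":"user","requestId":1000778,"clientId":"42114"}
```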

Since you said "or any other tool", I'd recommend jq! While sed is great for line-based text, JSON is not line-based: newlines are sometimes added just for pretty-printing, to make developers' lives easier. Its rules also get even trickier when handling Unicode or double quotes in string content. jq is specifically designed to understand the JSON format and can dissect it appropriately.
For your case, this should do the job:
jq '.requestId = (.requestId | tonumber)'
Note that this throws an error, and outputs nothing for that object, if requestId is missing. If that's a concern, you might need something a little more sophisticated, like this:
jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end'
Also, jq pretty-prints and colorizes its output when it is sent to a terminal. To avoid that and get a compact, one-line-per-object format, add -Mc to the command (-M disables color, -c gives compact output). jq also works when given multiple objects back-to-back without a newline in the input. Here's a full demo of this filter:
$ (echo '{"doc_type":"bare"}{}'
echo '{"doc_type":"user","requestId":"0092","clientId":"11"}'
echo '{"doc_type":"user","requestId":"1000778","clientId":"42114"}'
) | jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end' -Mc
Which produced this output:
{"doc_type":"bare"}
{}
{"doc_type":"user","requestId":92,"clientId":"11"}
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
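For comparison, here is a Python sketch of the same filter; it is not a description of how jq works internally. convert_request_ids is a hypothetical helper; json.JSONDecoder.raw_decode is used so that objects written back-to-back, as in the demo input, are read one at a time, and int() assumes integer ids (jq's tonumber would also accept something like "1.5").

```python
import json

def convert_request_ids(text):
    """Hypothetical Python counterpart of
    jq -c 'if has("requestId") then .requestId = (.requestId | tonumber) else . end'
    """
    decoder = json.JSONDecoder()
    out = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)  # reads one value, returns where it ended
        if "requestId" in obj:
            obj["requestId"] = int(obj["requestId"])  # assumes integer ids
        out.append(json.dumps(obj, separators=(",", ":")))  # compact, like -c
    return out

demo_input = (
    '{"doc_type":"bare"}{}\n'
    '{"doc_type":"user","requestId":"0092","clientId":"11"}\n'
    '{"doc_type":"user","requestId":"1000778","clientId":"42114"}\n'
)
for converted in convert_request_ids(demo_input):
    print(converted)
```

The printed lines match the jq output shown above, including "0092" becoming 92.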

sed -e 's/"requestId":"\([0-9]\+\)"/"requestId":\1/g' test.json
You were close. The "new" regex terms I had to add: \1 means "whatever is contained in the first \( \) on the search side", and \+ means "one or more of the previous thing" (\+ is a GNU sed extension to basic regular expressions; a portable spelling is [0-9][0-9]*).
Thus, we search for the string "requestId":" followed by a group of 1 or more digits, followed by ", and replace it with "requestId": followed by that group we found earlier.
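The difference between \+ and * only shows up when the quoted id is empty. This Python sketch (Python's + and * mean the same as GNU sed's \+ and *) runs both variants on a made-up line with "requestId":"" and checks which result is still valid JSON; is_valid_json is a hypothetical helper.

```python
import json
import re

# Made-up line where the id is an empty string
empty_id_line = '{"doc_type":"user","requestId":"","clientId":"42114"}'

# [0-9]* also matches zero digits, so the quotes are removed around nothing
with_star = re.sub(r'"requestId":"([0-9]*)"', r'"requestId":\1', empty_id_line)
# [0-9]+ (GNU sed: [0-9]\+) needs at least one digit, so this line is left alone
with_plus = re.sub(r'"requestId":"([0-9]+)"', r'"requestId":\1', empty_id_line)

def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(with_star)                 # {"doc_type":"user","requestId":,"clientId":"42114"}
print(is_valid_json(with_star))  # False
print(is_valid_json(with_plus))  # True
```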

Perhaps the jq (json query) tool would help you out?
$ cat test
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
$ cat test |jq '.doc_type' --raw-output
user
$

regex that works with sed not honored with ${var//search/replace}

I am trying to simply do a regular expression replace in bash but cannot figure it out. In my test, I would like the following string transformed:
test_data(123)
to
test_xyz
I've tried the following:
echo "test_data(123)" | sed -e 's/.*\(data(.*)\).*/xyz/g'
And that gets me: xyz
Then I tried:
var=${"test_data(123)"//.*\(data(.*)\).*/xyz}
But I get an error - bad substitution
How do I get my desired results on the regex replace in bash?

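${foo//$match/$replace} uses fnmatch (glob-style) patterns, not any form compatible with BRE/ERE/PCRE or other conventional regex syntax. The expansion also has to name a variable: putting a quoted literal inside ${ }, as in the question, is what produces the bad substitution error.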
input="test_data(123)"
match='data(*)'
replace='xyz'
result=${input//$match/$replace}
echo "$result"
...properly emits test_xyz.
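To make the glob semantics concrete, here is a deliberately small Python emulation. glob_to_regex and replace_all are hypothetical helpers that understand only *, ? and backslash quoting (no [...] bracket expressions, no extglob); * becomes a greedy .*, in line with bash's rule that the longest match is replaced.

```python
import re

def glob_to_regex(pattern):
    """Translate a small subset of bash glob syntax into a Python regex.
    Supported: * (any string), ? (any one character), backslash quoting.
    Everything else, including ( and ), is matched literally."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            i += 1
            parts.append(re.escape(pattern[i]))  # quoted character is literal
        elif ch == "*":
            parts.append(".*")  # greedy, so the longest match is replaced
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)

def replace_all(value, pattern, replacement):
    """Hypothetical stand-in for bash's ${value//pattern/replacement}."""
    regex = glob_to_regex(pattern)
    # A function replacement stops re from interpreting backslashes in it
    return re.sub(regex, lambda m: replacement, value, flags=re.DOTALL)

print(replace_all("test_data(123)", "data(*)", "xyz"))            # test_xyz
# Read as a glob, the sed-style regex needs a literal "." and matches nothing
print(replace_all("test_data(123)", r".*\(data(.*)\).*", "xyz"))  # test_data(123)
```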

search multiple strings in a single line with multiple spaces in between

I want to search for the whole line below from my /etc/pam.d/su file to use it in a script:
auth required pam_wheel.so use_uid
There might be multiple spaces in between, and it might also be commented out, with multiple #'s.
This is what I'm using :
grep "^#*auth +required +pam_wheel\.so +use_uid$"
but it doesn't match anything.
I'm certainly doing something wrong, but what? Sorry, I have always been bad with regular expressions.

egrep (the same as grep -E) is the way to go, but the question said "multiple" spaces. That can be done like this:
egrep "^([[:space:]]*#)*[[:space:]]*auth[[:space:]]+required[[:space:]]+pam_wheel\.so[[:space:]]+use_uid[[:space:]]*$"
A backslashed space "\ " is not listed among the special escapes in regex(7). Instead, the POSIX character class [[:space:]] can be used. You could also use [[:blank:]] rather than [[:space:]] to match only spaces and tabs.
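Here is a quick way to sanity-check that pattern against a few made-up lines, in Python. Python's re has no POSIX classes, so [[:space:]] is written \s (slightly broader, since it also matches other Unicode whitespace); otherwise the expression is the one above, and matches_pam_line is a hypothetical helper.

```python
import re

# The grep -E pattern above, with [[:space:]] written as \s
PAM_WHEEL = re.compile(
    r"^(\s*#)*\s*auth\s+required\s+pam_wheel\.so\s+use_uid\s*$"
)

def matches_pam_line(line):
    return PAM_WHEEL.search(line) is not None

# Made-up sample lines
for sample in [
    "auth required pam_wheel.so use_uid",
    "#auth   required\tpam_wheel.so  use_uid",
    "## # auth required pam_wheel.so use_uid",
    "auth required pam_wheel.so use_uid group=wheel",
    "auth sufficient pam_wheel.so use_uid",
]:
    print(matches_pam_line(sample), repr(sample))
```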

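You can use grep -E (extended regular expressions). In grep's default basic syntax, + is an ordinary character, so the pattern in the question was looking for a literal + after each word: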
grep -E "^\ +auth\ +required\ +pam_wheel\.so +use_uid$"
This works (note that ^\ + requires at least one leading space, which this test line has):
echo " auth required pam_wheel.so use_uid" | grep -E "^\ +auth\ +required\ +pam_wheel\.so +use_uid$"
gives
auth required pam_wheel.so use_uid

Well this finally works for me:
[root#server4 ~]# egrep "^#*auth.*required.*pam_wheel\.so.*use_uid" /etc/pam.d/su
#auth required pam_wheel.so use_uid
I think the issue was in how the spaces were specified.
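That works for the line in the file, but .* accepts zero or more of any character, so the pattern is looser than it looks. The Python sketch below uses the same expression (Python's re reads it the same way as egrep here) on some made-up lines: it also accepts a line with extra options after use_uid and words run together with no spaces, while a # preceded by whitespace is missed.

```python
import re

# The egrep pattern that "finally works", unchanged
LOOSE_PATTERN = re.compile(r"^#*auth.*required.*pam_wheel\.so.*use_uid")

candidate_lines = [
    "#auth required pam_wheel.so use_uid",             # intended match
    "auth    required    pam_wheel.so   use_uid",      # intended match
    "auth required pam_wheel.so use_uid group=wheel",  # extra options: still matches
    "authrequiredpam_wheel.souse_uid",                 # no spaces at all: still matches
    "  # auth required pam_wheel.so use_uid",          # indented comment: missed
]
loose_hits = [line for line in candidate_lines if LOOSE_PATTERN.search(line)]
print(len(loose_hits))  # 4
```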

Regular Expression to parse Common Name from Distinguished Name

I am attempting to parse (with sed) just First Last from the following DN(s), returned by the DSCL command in the OS X Terminal bash environment:
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexes from this site and others, from questions very close to what I wanted (mainly this question), and followed the advice to the best of my ability. I don't necessarily consider myself a newbie, but I am definitely a newbie to regex.
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!

I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle escaped ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python3 -c 'import sys, ldap.dn; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=True)[0])'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.
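If python-ldap is not available, the Python 3 sketch below shows the idea this answer is getting at: walk the string and treat a backslash as "the next character is part of the value", so only an unescaped comma ends the common name. first_rdn_value is a hypothetical helper and deliberately incomplete (see its docstring); for real directory data, a proper DN parser is still the safer choice.

```python
def first_rdn_value(dn):
    """Hypothetical helper: value of the first RDN, honouring backslash escapes.

    Only the simple form (a backslash followed by the literal character, as in
    "Doe\\, John") is handled; hex escapes such as \\2C, quoted values and
    multi-valued RDNs joined with '+' are not.
    """
    _, _, rest = dn.partition("=")  # drop the attribute type, e.g. "CN"
    value = []
    i = 0
    while i < len(rest):
        ch = rest[i]
        if ch == "\\" and i + 1 < len(rest):
            value.append(rest[i + 1])  # escaped character belongs to the value
            i += 2
            continue
        if ch == ",":
            break  # an unescaped comma ends the first RDN
        value.append(ch)
        i += 1
    return "".join(value)

print(first_rdn_value(r"CN=Doe\, John,CN=Users,DC=example,DC=local"))      # Doe, John
print(first_rdn_value("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # First Last
```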

Two cut commands are probably the simplest approach (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.
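Here are the two cut stages emulated in Python, to show what each one returns. cut_field is a hypothetical helper following cut's rules as we understand them (1-based fields, and a line without the delimiter is passed through whole unless -s is given). The second call illustrates the limitation raised in the python-ldap answer: an escaped comma in the CN is still treated as a separator.

```python
def cut_field(line, delim, field):
    """Hypothetical helper mimicking `cut -d<delim> -f<field>` on one line."""
    parts = line.split(delim)
    if len(parts) == 1:
        return line  # no delimiter: cut prints the line unchanged (unless -s)
    return parts[field - 1] if field <= len(parts) else ""  # fields are 1-based

def cut_cn(line):
    # DSCL | cut -d, -f1 | cut -d= -f2
    return cut_field(cut_field(line, ",", 1), "=", 2)

print(cut_cn("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # First Last
print(cut_cn(r"CN=Doe\, John,CN=Users,DC=example,DC=local"))       # Doe\
```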

Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) everything until a comma
.* the rest of the line, which is dropped because the replacement \1 keeps only the captured group
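Because sed prints every input line, a line that does not start with CN= comes out unchanged. The Python sketch below applies the same expression line by line to made-up input to show that (sed_cn is a hypothetical helper; Python writes the group as ( ) where sed's basic syntax uses \( \)). If you want only the names, sed -n 's/^CN=\([^,]*\).*/\1/p' prints just the lines where a substitution happened.

```python
import re

def sed_cn(text):
    """Applies sed 's/^CN=\\([^,]*\\).*/\\1/' to each line."""
    return [re.sub(r"^CN=([^,]*).*", r"\1", line) for line in text.splitlines()]

sed_sample = (
    "CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu\n"
    "CN=Jane Roe,OU=staff,DC=domain,DC=edu\n"
    "some other line\n"
)
print(sed_cn(sed_sample))  # ['First Last', 'Jane Roe', 'some other line']
```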

http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt
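As we read the POSIX awk rules, with RS=',' a newline is an ordinary character inside a record, and with FS='=' it is not a field separator either. If that holds, a record such as "DC=edu\nCN=Jane Roe" starts with DC, so only the first DN of multi-line input is printed. The Python emulation below (awk_cn is a hypothetical helper, the input is made up) encodes that reading; with gawk, a regex record separator such as RS='[,\n]' should avoid the problem.

```python
def awk_cn(text):
    """Hypothetical emulation of awk -v RS=',' -v FS='=' '$1=="CN"{print $2}'.
    Assumption: with RS=',' the newline is an ordinary character in a record,
    and with FS='=' it is not a field separator either."""
    printed = []
    for record in text.split(","):
        fields = record.split("=")
        if fields[0] == "CN":
            printed.append(fields[1] if len(fields) > 1 else "")
    return printed

one_dn = "CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu\n"
two_dns = "CN=First Last,OU=PCS,DC=edu\nCN=Jane Roe,OU=staff,DC=edu\n"
print(awk_cn(one_dn))   # ['First Last']
print(awk_cn(two_dns))  # ['First Last']: the second CN sits inside the record "DC=edu\nCN=Jane Roe"
```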

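I like awk too, so I split on commas and print the first field from its fourth character onward (skipping CN=). Set FS in a BEGIN block (or use -F,), because an assignment in the main block only takes effect from the next line:
DSCL | awk 'BEGIN{FS=","} {print substr($1,4)}' > filterednames.txt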
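The reason to set FS in BEGIN (or with -F,) is that, in POSIX awk and gawk, an FS assignment inside the main block takes effect only from the next record; the current line has already been split with the default FS. The Python emulation below (hypothetical helper names, made-up input) shows the effect of '{FS=","}; {print substr($1,4)}': the first line is split on whitespace, so substr($1,4) yields only First.

```python
def split_fields(line, fs):
    # awk's default FS " " splits on runs of blanks and ignores leading/trailing
    # blanks; any other single character splits on each occurrence
    return line.split() if fs == " " else line.split(fs)

def substr_from_4(field):
    # awk's substr($1,4) is 1-based, so it drops the first three characters ("CN=")
    return field[3:]

def fs_in_main_block(lines):
    """Emulates awk '{FS=","}; {print substr($1,4)}'."""
    fs = " "  # default FS when the first record is read
    out = []
    for line in lines:
        fields = split_fields(line, fs)  # the record is split before the actions run
        fs = ","                         # the assignment only affects later records
        out.append(substr_from_4(fields[0] if fields else ""))
    return out

def fs_in_begin(lines):
    """Emulates awk 'BEGIN{FS=","} {print substr($1,4)}'."""
    return [substr_from_4(line.split(",")[0]) for line in lines]

dscl_lines = [
    "CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu",
    "CN=Jane Roe,OU=staff,DC=domain,DC=edu",
]
print(fs_in_main_block(dscl_lines))  # ['First', 'Jane Roe']
print(fs_in_begin(dscl_lines))       # ['First Last', 'Jane Roe']
```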

This regex will parse a distinguished name, giving name and val capture groups for each match.
When DN strings contain commas, they are meant to be quoted. This regex correctly handles both quoted and unquoted strings, and also handles escaped (doubled "") quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here it is, nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))
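To try the expressions without the regex101 page, here is a Python sketch. Python's re is given named groups in the (?P<name>...) spelling, so that is the only change to the two regexes; dn_pairs and cn_value are hypothetical helpers and the DNs are made-up examples. Note that a quoted value comes back with its quotes.

```python
import re

# The answer's two expressions, with (?<name>...) spelled (?P<name>...)
DN_PART = re.compile(r'(?:^|,\s?)(?:(?P<name>[A-Z]+)=(?P<val>"(?:[^"]|"")+"|[^,]+))+')
CN_ONLY = re.compile(r'(?:^|,\s?)(?:CN=(?P<val>"(?:[^"]|"")+"|[^,]+))')

def dn_pairs(dn):
    """Hypothetical helper: (name, val) for every component the regex finds."""
    return [(m.group("name"), m.group("val")) for m in DN_PART.finditer(dn)]

def cn_value(dn):
    """Hypothetical helper: the first CN value, or None."""
    m = CN_ONLY.search(dn)
    return m.group("val") if m else None

print(dn_pairs("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))
print(cn_value('OU=PCS, CN="Roe, Jane"'))  # "Roe, Jane" (quotes kept)
```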

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js
