jq remove spaces after first

It seemed simple, but so far no luck. I've tried lots of things; the best I've got is:
echo "low quality not gonna apologize" | jq -r 'gsub("[\\s+]"; " "; "g")'
parse error: Invalid numeric literal at line 1, column 4
The goal is to replace any run of multiple whitespace characters of any kind with a single space. Note that I have already removed tabs and newlines from this stream. This is the bash shell. I don't get this error in the context of the larger application I'm building, either; there, the code simply and quietly fails to turn the multiple spaces into a single space, and I don't know why.

The right way with jq:
echo "low quality not gonna apologize" | jq -Rr 'gsub("\\s+";" ";"g")'
-R: raw input; each line of text is passed to the filter as a string instead of being parsed as JSON (-r then prints the result without JSON quotes).
The output:
low quality not gonna apologize

Two of many alternatives:
$ echo '"low quality not gonna apologize"' | jq -r 'gsub("\\s+"; " ")'
low quality not gonna apologize
$ jq -n --arg in "low quality not gonna apologize" '$in | gsub("\\s+"; " ")'
"low quality not gonna apologize"
Notice that:
Not every shell string is a JSON string.
The --arg command-line option has the effect of coercing the shell string to a JSON string.
If you use gsub, there is no need to also specify "g".
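A minimal sketch of the raw-input approach, assuming bash and jq 1.5 or later (the first release with regex functions such as gsub). Because runs of whitespace in the strings above may have been collapsed in transit, the sample here deliberately contains several spaces and tabs in a row; the function name collapse_ws is hypothetical.

```shell
# Hypothetical helper: collapse every run of whitespace into one space.
# -R: read each input line as a raw string instead of parsing it as JSON.
# -r: print the result without the surrounding JSON quotes.
collapse_ws() {
  printf '%s\n' "$1" | jq -Rr 'gsub("\\s+"; " ")'
}

# Without -R, jq tries to parse the bare text as JSON and fails with the
# same kind of parse error shown in the question (wording varies by version).
if ! printf '%s\n' 'low   quality' | jq '.' >/dev/null 2>&1; then
  echo "plain text is not valid JSON input"
fi

collapse_ws "$(printf 'low   quality\t\tnot gonna    apologize')"
```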

Related

Using sed (or any other tool) to remove the quotes in a json file

I have a json file
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
I want to change it to
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
i.e. convert requestId from a string to an integer. I have tried a few approaches, but none seem to work:
sed -e 's/"requestId":"[0-9]"/"requestId":$1/g' test.json
sed -e 's/"requestId":"\([0-9]\)"/"requestId":444/g' test.json
Could someone help me out please?

Try
sed -e 's/\("requestId":\)"\([0-9]*\)"/\1\2/g' test.json
or
sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g' test.json
The main differences from your attempts are:
Your regular expressions were looking for [0-9] between double quotes, and that matches a single digit only. By using [0-9]* instead, you are looking for any number of digits (zero or more).
If you want to copy a sequence of characters from the matched text into the replacement string, you need to define a group with an opening \( and a closing \) in the regexp, and then use \1 in the replacement string to insert the captured text there. If there are multiple groups, use \1 for the first group, \2 for the second, and so on. (sed does not understand $1, as used in your first attempt; it would be inserted literally.)
Also note that the final g after the last / is used to apply this substitution in all matches, in every processed line. Without that g, the substitution would only be applied to the first match in every processed line. Therefore, if you are only expecting one such replacement per line, you can drop that g.
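A quick way to check both commands on the sample record from the question; the json variable is only for illustration, and any POSIX sed should behave the same.

```shell
# Sample record from the question.
json='{"doc_type":"user","requestId":"1000778","clientId":"42114"}'

# Group 1 captures the key, group 2 the digits; both are re-emitted
# without the quotes that surrounded the digits.
printf '%s\n' "$json" | sed -e 's/\("requestId":\)"\([0-9]*\)"/\1\2/g'

# One group only: the key is written out literally in the replacement.
printf '%s\n' "$json" | sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g'
```

Because [0-9]* also matches zero digits, a record containing "requestId":"" would become "requestId": with no value, which is not valid JSON; requiring at least one digit avoids that.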

Since you said "or any other tool", I'd recommend jq! While sed is great for line-based data, JSON is not line-based, and newlines are sometimes added just to pretty-print the output and make developers' lives easier. Its rules also get even trickier when handling Unicode or double quotes in string content. jq is specifically designed to understand the JSON format and can dissect it appropriately.
For your case, this should do the job:
jq '.requestId = (.requestId | tonumber)'
Note that this will throw an error, and not output the JSON object, if requestId is missing. If that's a concern, you might need something a little more sophisticated, like this:
jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end'
Also, jq pretty-prints its output by default and colorizes it when writing to a terminal. To avoid that and get a compact, monochrome, one-line-per-object format, add -Mc to the command (-M disables color, -c enables compact output). jq will also work if multiple objects are provided back to back without a newline in the input. Here's a full demo of this filter:
$ (echo '{"doc_type":"bare"}{}'
echo '{"doc_type":"user","requestId":"0092","clientId":"11"}'
echo '{"doc_type":"user","requestId":"1000778","clientId":"42114"}'
) | jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end' -Mc
Which produced this output:
{"doc_type":"bare"}
{}
{"doc_type":"user","requestId":92,"clientId":"11"}
{"doc_type":"user","requestId":1000778,"clientId":"42114"}

sed -e 's/"requestId":"\([0-9]\+\)"/"requestId":\1/g' test.json
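You were close. The "new" regex terms I had to add: \1 means "whatever is matched by the first \( \) group on the search side", and \+ means "one or more of the previous thing". Note that \+ is a GNU sed extension to basic regular expressions; other seds may need [0-9][0-9]* instead.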
Thus, we search for the string "requestId":" followed by a group of 1 or more digits, followed by ", and replace it with "requestId": followed by that group we found earlier.
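If your sed lacks \+, the same substitution can be written with extended regular expressions (-E, accepted by both GNU and BSD sed), where + is standard and groups take no backslashes. A sketch, reusing the sample record from the question:

```shell
json='{"doc_type":"user","requestId":"1000778","clientId":"42114"}'

# Extended regex (-E): + is standard and ( ) groups need no backslashes.
printf '%s\n' "$json" | sed -E 's/"requestId":"([0-9]+)"/"requestId":\1/g'

# Basic regex without \+: [0-9][0-9]* also means "one or more digits".
printf '%s\n' "$json" | sed 's/"requestId":"\([0-9][0-9]*\)"/"requestId":\1/g'
```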

Perhaps the jq (json query) tool would help you out?
$ cat test
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
$ cat test |jq '.doc_type' --raw-output
user
$

Get procmail to reply to larger messages

I am trying to reply to messages larger than a certain size and then forward them to another user. I've got this, but nothing happens. It seems I am only able to add text to the end of the message.
:0
* > 1000
{
:0 fhw
| cat - ; echo "Insert this text at the top of the body"
:0
| formail -rk
| $SENDMAIL -t
}

Using sed helped a lot.
SEDSCRIPT='0,/^$/ s//\nLarge message rejected [Max=4MB]\n/'
MAILADDR=me#nowhere
:0
* > 4000000
* !^FROM_DAEMON
* !^X-Loop: $MAILADDR
| formail -rk -A "X-Loop: $MAILADDR" \
| sed "$SEDSCRIPT" \
| $SENDMAIL -t
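The SEDSCRIPT trick can be tried outside procmail by feeding it a small made-up message (headers, a blank separator line, a body). It relies on two GNU sed features: the 0,/regex/ address and \n in the replacement. The empty regex in s// reuses the last regex, ^$, so only the first blank line (the header/body separator) is rewritten. The message text below is hypothetical.

```shell
SEDSCRIPT='0,/^$/ s//\nLarge message rejected [Max=4MB]\n/'

# Hypothetical reply as formail might hand it over: headers, blank line, body.
printf 'Subject: Re: big file\nTo: someone@example.com\n\nOriginal body\n' |
  sed "$SEDSCRIPT"
```

Later blank lines inside the body are left alone, because the 0,/^$/ range ends at the first blank line.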

It's not clear what exactly is wrong, but if you want to add text at the beginning, you obviously need to echo before cat, and work on the body (b), not the headers (h).
:0 fbw
| echo "Insert this"; cat -
I suppose you could technically break the headers by appending something at their end, but if you want the text to appear in the body, it needs a neck (an empty line) before it.
:0 fhw
| cat -; echo; echo "Insert this"
There is also a sed syntax which allows for somewhat more flexible manipulation (sed addressing lets you say things like "before the first line which starts with >", for example), but getting newlines into sed command lines inside Procmail is hairy. As a workaround, I often put the script in a string variable and then just interpolate that. (How hairy exactly depends on details of sed syntax which are not standardized. Some implementations seem to require newlines in the a and i commands.)
sedscript='1i\
insert this\
'
:0 fbw
| sed "$sedscript"
(If you are lucky, your sed will accept something simpler like sed '1i insert this'. The variant above seems to be the only one I can get to work on macOS, and thus probably generally *BSD.)
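To experiment with the sedscript approach outside procmail, you can pipe a stand-in body through the same variable. This is a sketch assuming a sed that accepts the POSIX multi-line form of i; the body text is invented for the demo.

```shell
# The variable from the answer: "1i\" with the text on the following
# line(s); a text line ending in a backslash continues onto the next line.
sedscript='1i\
insert this\
'

# Made-up message body standing in for what procmail would pipe through.
printf 'first body line\nsecond body line\n' | sed "$sedscript"
```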
As an aside, a message which is 1000 bytes long isn't by any means large. I recall calculating an average message length of about 4k in my own inbox, but this was before people started to use HTML email clients. Depending on your inbound topology, just the headers could easily be more than 1000 bytes.

Grep rsync output?

Running an rsync command produces output similar to this:
66256896 92% 4.51MB/s 0:00:01
How can I grep this output for just the percentage value?
That is, anything from 0% to 100%, so that instead of the full output I only see the percentage?
The command would be:
rsyncd -Pav server.com::files/remotefile.tar.gz localfile.tar.gz | grep xxx
Thanks

If you really want to use sed, this ugly thing works!
rsyncd -Pav server.com::files/remotefile.tar.gz localfile.tar.gz | sed -e 's/%.*/%/; s/.* //'
It replaces % and the rest of the line with just % (thereby deleting everything after the percent sign), then deletes everything up to and including the last remaining space, i.e. the space just before the percentage.
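The two substitutions can be checked on a single progress line. The padding in the sample below is an assumption (rsync aligns its fields with spaces); the grep -o variant is an alternative for the grep the question asked about, supported by GNU and BSD grep but not part of POSIX.

```shell
# A progress line like the one in the question (padding is assumed).
line='     66256896  92%    4.51MB/s    0:00:01'

# 1) s/%.*/%/  drops everything after the "%".
# 2) s/.* //   drops everything up to and including the last space.
printf '%s\n' "$line" | sed -e 's/%.*/%/; s/.* //'

# Alternative: print only the matching part of the line.
printf '%s\n' "$line" | grep -o '[0-9]*%'
```

rsync redraws its progress line using carriage returns, so in a live pipeline you may need to turn them into newlines first (for example with tr '\r' '\n') before sed or grep sees separate lines.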

sed: return last occurrence match until end of file

Using sed, how do I return everything from the last occurrence of a match until the end of the file?
(FYI this has been simplified)
So far I've tried:
sed -n '/ Statistics |/,$p' logfile.log
Which returns all lines from the first match onwards (almost the entire file)
I've also tried:
linenum=`tail -400 logfile.log | grep -n " Statistics |" | tail -1 | cut -d: -f1`
sed "$linenum,\$!d" logfile.log
This works, but it won't work over an ssh connection as a single command; I really need it all to be in one pipeline.
Format of the log file is as follows:
(Statistics headers with sub-data are written to the log file every minute. The purpose of this command is to return the most recent Statistics header together with any associated errors that occur after it.)
Statistics |
Stuff
More Stuff
Even more Stuff
Statistics |
Stuff
More Stuff
Error: incorrect value
Statistics |
Stuff
More Stuff
Even more Stuff
Statistics |
Stuff
Error: error type one
Error: error type two
EOF
Return needs to be:
Statistics |
Stuff
Error: error type one
Error: error type two

Your example script has a space before Statistics, but your sample data doesn't seem to. This uses a regex which assumes Statistics is at the beginning of the line; tweak it if that's incorrect.
sed -n '/^Statistics |/h;/^Statistics |/!H;$!b;x;p'
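When you see Statistics, replace the hold space with the current line (h). Otherwise, append the current line to the hold space (H). If we are not at the end of the file, stop processing this line here (b branches to the end of the script; nothing is printed because of -n). At the end of the file, print out the hold space (x swaps the hold space into the pattern space; p prints it).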
In a sed script, commands are optionally prefixed by an "address". Most commonly this is a regex, but it can also be a line number. The address /^Statistics |/ selects all lines matching the regular expression; /^Statistics |/! selects lines not matching the regular expression; and $! matches all lines except the last line in the file. Commands with no explicit address are executed for all input lines.
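To see the hold-space script in action, here it is run against the sample log from the question, written to a temporary file (the file exists only for the demo).

```shell
log=$(mktemp)
trap 'rm -f "$log"' EXIT

# Sample log from the question, without the EOF marker line.
printf '%s\n' \
  'Statistics |' 'Stuff' 'More Stuff' 'Even more Stuff' \
  'Statistics |' 'Stuff' 'More Stuff' 'Error: incorrect value' \
  'Statistics |' 'Stuff' 'More Stuff' 'Even more Stuff' \
  'Statistics |' 'Stuff' 'Error: error type one' 'Error: error type two' \
  > "$log"

# h    : a header line replaces the hold space (a new block starts).
# !H   : any other line is appended to the hold space.
# $!b  : before the last line, end this cycle (nothing printed, due to -n).
# x;p  : on the last line, swap the hold space in and print it.
sed -n '/^Statistics |/h;/^Statistics |/!H;$!b;x;p' "$log"
```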
Note that if you need to pass this to a remote host using ssh, you will need additional levels of quoting. One possible workaround if it gets too complex is to store this script on the remote host, and just ssh remotehost path/to/script. Another possible workaround is to change the addressing expressions so that they don't contain any exclamation marks (these are problematic on the command line e.g. in Bash).
sed -n '/^Statistics |/{h;b};H;${x;p}'
This is somewhat simpler, too!
A third possible workaround, if your ssh pipeline's stdin is not tied up for other things, is to pipe in the script from your local host.
echo '/^Statistics |/h;/^Statistics |/!H;$!b;x;p' |
ssh remotehost sed -n -f - file

If you have tac available:
tac INPUTFILE | sed '/^Statistics |/q' | tac

This might work for you:
sed '/Statistics/h;//!H;$!d;x' file
Statistics |
Stuff
Error: error type one
Error: error type two
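The one-liner above is terse, so here is the same command with each step commented, run on a shortened copy of the sample log (a temporary file created just for the demo).

```shell
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' 'Statistics |' 'Stuff' 'Error: incorrect value' \
  'Statistics |' 'Stuff' 'Error: error type one' 'Error: error type two' > "$log"

# /Statistics/h : a header line overwrites the hold space.
# //!H          : the empty regex // reuses the last regex (/Statistics/),
#                 so every non-header line is appended to the hold space.
# $!d           : on every line but the last, delete the pattern space and
#                 start the next cycle, so nothing is printed yet.
# x             : on the last line, swap in the hold space; sed's automatic
#                 print then outputs the collected block.
sed '/Statistics/h;//!H;$!d;x' "$log"
```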

If you're happy with an awk solution, this kinda works (apart from getting an extra blank line):
awk '/^Statistics/ { buf = "" } { buf = buf "\n" $0 } END { print buf }' input.txt
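The extra blank line comes from prepending "\n" before every line, including the first. A sketch of one way around it: append the newline after each line instead, and print the buffer with printf so no further newline is added. The function name last_block is hypothetical.

```shell
# Hypothetical helper: print the last block that starts with "Statistics".
last_block() {
  awk '/^Statistics/ { buf = "" } { buf = buf $0 "\n" } END { printf "%s", buf }' "$@"
}

printf '%s\n' 'Statistics |' 'Stuff' 'Error: old' \
  'Statistics |' 'Stuff' 'Error: error type one' | last_block
```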

sed ':a;N;$!ba;s/.*Statistics/Statistics/g' INPUTFILE
should work (GNU sed 4.2.1).
It reads the whole file into one string, then replaces everything from the start up to and including the last Statistics with Statistics, and prints what remains.
HTH

This might also work; it is a slightly simpler version of the sed solutions given above:
sed -n 'H; /^Statistics |/h; ${g;p;}' logfile.log
Output:
Statistics |
Stuff
Error: error type one
Error: error type two

how to replace part of a string using sed

echo "/home/repository/tags/1.9.1/1.9.1.8/core" | sed "s/HELP/XXX/g"
I would like some HELP in replacing what is in between tags and core with let's say XXX. So my desired output would be /home/repository/tags/XXX/core.
The string is a directory path, where /home/repository/tags are the only constant parts. The path is always six levels deep. So it may not always be between tags and core.

echo "/home/repository/whatever/1.9.1/1.9.1.8/core/and/more/junk" \
| sed 's#\(/[^/]*/[^/]*/[^/]*\)/[^/]*/[^/]*#\1/XXX#'
yields ...
/home/repository/whatever/XXX/core/and/more/junk

By using repetition quantifiers, you can easily adjust where your replacement is made:
echo "/home/repository/tags/1.9.1/1.9.1.8/core" | \
sed -r 's|(/([^/]+/){3})([^/]+/){2}(.*)|\1XXX/\4|'
3 represents how many components to keep at the beginning
2 represents how many to replace
You could even use variables:
$ dirs='/one/two/three/four/five/six/seven/eight'
$ for keep in {0..3}; do for replace in {0..3}; do echo "$dirs" | \
sed -r "s|(/([^/]+/){$keep})([^/]+/){$replace}(.*)|\1XXX/\4|"; done; done
/XXX/one/two/three/four/five/six/seven/eight
/XXX/two/three/four/five/six/seven/eight
/XXX/three/four/five/six/seven/eight
/XXX/four/five/six/seven/eight
/one/XXX/two/three/four/five/six/seven/eight
/one/XXX/three/four/five/six/seven/eight
/one/XXX/four/five/six/seven/eight
/one/XXX/five/six/seven/eight
/one/two/XXX/three/four/five/six/seven/eight
/one/two/XXX/four/five/six/seven/eight
/one/two/XXX/five/six/seven/eight
/one/two/XXX/six/seven/eight
/one/two/three/XXX/four/five/six/seven/eight
/one/two/three/XXX/five/six/seven/eight
/one/two/three/XXX/six/seven/eight
/one/two/three/XXX/seven/eight

If your directory is always 6 levels deep, this works (remember to escape the round brackets):
echo "/home/repository/tags/1.9.1/1.9.1.8/core" |
sed 's/\(\/home\/repository\/tags\/\).*\/.*\(\/.*\)/\1XXX\2/'
produces:
/home/repository/tags/XXX/core

Here, spare yourself some regex agony:
echo "/home/repository/tags/1.9.1/1.9.1.8/core" | sed 's#/home/repository/tags/.*/\(.\+\)$#/home/repository/tags/XXX/\1#'
No need to explicitly match the components if all you're really trying to do is strip out everything between tags/ and the last component. Note that I used \+, not *, so the last component must be nonempty; that guards against a trailing slash. (\+ is a GNU sed extension.)
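A small sketch comparing the \+ version with the same expression using *, on a path with and without a trailing slash; it assumes GNU sed because of \+. With *, the greedy .* can run up to the final slash and leave an empty last component.

```shell
# \+ (one or more) version from the answer; GNU sed extension.
plus='s#/home/repository/tags/.*/\(.\+\)$#/home/repository/tags/XXX/\1#'
# The same expression with * (zero or more), for comparison.
star='s#/home/repository/tags/.*/\(.*\)$#/home/repository/tags/XXX/\1#'

for p in /home/repository/tags/1.9.1/1.9.1.8/core \
         /home/repository/tags/1.9.1/1.9.1.8/core/; do
  printf '%s\n' "$p" | sed "$plus"
  printf '%s\n' "$p" | sed "$star"
done
```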
