I have json lines that contain multiple parts per line that look like this:
"SomeDate":"Date(-2156284800000)",
I would like to convert each occurrence in all lines into something more human readable:
"SomeDate":"1901-09-03 00:19:32",
I tried using sed to put the matched block (in this case the timestamp) into the argumentlist of the date command. This fails.
$ echo '"SomeDate":"Date(-2156284800000)",' | \
sed "s/Date(\([0-9\-]*\)[0-9][0-9][0-9])/$(date -d#\\1 \"+%F %T\")/g"
date: invalid date `#\\1'
"SomeDate":"",
In an attempt to debug this all I added an 'echo' to the date to validate the command it should be running
$ echo '"SomeDate":"Date(-2156284800000)",' | \
sed "s/Date(\([0-9\-]*\)[0-9][0-9][0-9])/$(echo date -d#\\1 \"+%F %T\")/g"
"SomeDate":"date -d#-2156284800 "+%F %T"",
$ date -d#-2156284800 "+%F %T"
1901-09-03-00:19:32
Why isn't the first command running as expected?
The best guess I have right now is that the subshell is executed first WITHOUT the \1 substitution and then the resulting output is actually used by sed.
How do I achieve what I'm trying to do?
P.S. I'm using CentOS 6.6
How about using awk:
echo '"SomeDate":"Date(-2156284800000)",' | awk '{ print gensub(/Date\(([0-9\-]+)\)/, ("date -d#\\1 \"+%F %T\"" |& getline t) ? t : "\\1", "g"); }'
Disclaimer: There's probably a better way to do this, but briefly:
gensub is like gsub but gives you access to the matched groups
Capture the Date(XXX) bit with:
/Date\(([0-9\-]+)\)/
(which gets the actual epoch in the match group \1)
The second argument is:
("date -d#\\1 \"+%F %T\"" |& getline t) ? t : "\\1"
which forms the date command, runs it (with getline) and assigns the result to the variable t. getline, bizarely returns 1 on success, so we check for that with the ternary (?:) statement and return the first line of output from that command.
Finally, we tell gensub to be global.
The workaround I use right now is via perl:
$ echo fee 4321432431 fie 1429882795 fum | \
perl -MPOSIX -pe 's/(\d+)/strftime "%F", localtime $1/eg'
fee 2106-12-10 fie 2015-04-24 fum
Related
How can I use sed to add a dynamic prefix to each number in an integer list?
For example:
I have a string "A-1,2,3,4,5", I want to transform it to string "A-1,A-2,A-3,A-4,A-5" - which means I want to add prefix of first integer i.e. "A-" to each number of the list.
If I have string like "B-1,20,300" then I want to transform it to string "B-1,B-20,B-300".
I am not able to use RegEx Capturing Groups because for global match they do not retain their value in subsequent matches.
When it comes to looping constructs in sed, I like to use newlines as markers for the places I have yet to process. This makes matching much simpler, and I know they're not in the input because my input is a text line.
For example:
$ echo A-1,2,3,4,5 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
A-1,A-2,A-3,A-4,A-5
This works as follows:
s/,/\n/g # replace all commas with newlines (insert markers)
:a # label for looping
s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/ # replace the next marker with a comma followed
# by the prefix
ta # loop unless there's nothing more to do.
The approach is similar to #potong's, but I find the regex much more readable -- \([^0-9]*\) captures the prefix, \([^\n]*\) captures everything up to the next marker (i.e. everything that's already been processed), and then it's just a matter of reassembling it in the substitution.
Don't use sed, just use the other standard UNIX text manipulation tool, awk:
$ echo 'A-1,2,3,4,5' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
A-1,A-2,A-3,A-4,A-5
$ echo 'B-1,20,300' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
B-1,B-20,B-300
This might work for you (GNU sed):
sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta' file
Uses pattern matching and a loop to replace a number following a comma by the first column prefix and that number.
Assuming this is for shell scripting, you can do so with 2 seds:
set string = "A1,2,3,4,5"
set prefix = `echo $string | sed 's/^\([A-Z]\).*/\1/'`
echo $string | sed 's/,\([0-9]\)/,'$prefix'-\1/g'
Output is
A1,A-2,A-3,A-4,A-5
With
set string = "B-1,20,300"
Output is
B-1,B-20,B-300
Could you please try following(if ok with awk).
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i !~ /^A/&&$i !~ /\"A/){
$i="A-"$i
}
}
}
1' Input_file
if your data in 'd' file, tried on gnu sed:
sed -E 'h;s/^(\w-).+/\1/;x;G;:s s/,([0-9]+)(.*\n(.+))/,\3\1\2/;ts; s/\n.+//' d
I have a json file
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
I want to change it to
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
i.e. convert the requestId from String to Integer. I have tried some ways, but none seem to work :
sed -e 's/"requestId":"[0-9]"/"requestId":$1/g' test.json
sed -e 's/"requestId":"\([0-9]\)"/"requestId":444/g' test.json
Could someone help me out please?
Try
sed -e 's/\("requestId":\)"\([0-9]*\)"/\1\2/g' test.json
or
sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g' test.json
The main differences with your attempts are:
Your regular expressions were looking for [0-9] between double quotes, and that's a single digit. By using [0-9]* instead you are looking for any number of digits (zero or more digits).
If you want to copy a sequence of characters from your search in your replacing string, you need to define a group with a starting \( and a final \) in the regexp, and then use \1 in the replacing string to insert the string there. If there are multiple groups, you use \1 for the first group, \2 for the second group, and so on.
Also note that the final g after the last / is used to apply this substitution in all matches, in every processed line. Without that g, the substitution would only be applied to the first match in every processed line. Therefore, if you are only expecting one such replacement per line, you can drop that g.
Since you said "or any other tool", I'd recommend jq! While sed is great for line-based, JSON is not and sometimes newlines are added in just for pretty printing the output to make developers' lives easier. It's rules also get even more tricky when handling Unicode or double-quotes in string content. jq is specifically designed to understand the JSON format and can dissect it appropriately.
For your case, this should do the job:
jq '.requestId = (.requestId | tonumber)'
Note, this will throw an error if requestId is missing and not output the JSON object. If that's a concern, you might need something a little more sophisticated like this example:
jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end'
Also, jq does pretty-print and colorize it's output if sent to a terminal. To avoid that and just see a compact, one-line-per-object format, add -Mc to the command. jq will also work if provided multiple objects back-to-back without a newline in the input. Here's a full-demo to show this filter:
$ (echo '{"doc_type":"bare"}{}'
echo '{"doc_type":"user","requestId":"0092","clientId":"11"}'
echo '{"doc_type":"user","requestId":"1000778","clientId":"42114"}'
) | jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end' -Mc
Which produced this output:
{"doc_type":"bare"}
{}
{"doc_type":"user","requestId":92,"clientId":"11"}
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
sed -e 's/"requestId":"\([0-9]\+\)"/"requestId":\1/g' test.json
You were close. The "new" regex terms I had to add: \1 means "whatever is contained in the first \( \) on the "search" side, and \+ means "1 or more of the previous thing".
Thus, we search for the string "requestId":" followed by a group of 1 or more digits, followed by ", and replace it with "requestId": followed by that group we found earlier.
Perhaps the jq (json query) tool would help you out?
$ cat test
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
$ cat test |jq '.doc_type' --raw-output
user
$
I am trying to extract a part of the filename - everything before the date and suffix. I am not sure the best way to do it in bashscript. Regex?
The names are part of the filename. I am trying to store it in a shellscript variable. The prefixes will not contain strange characters. The suffix will be the same. The files are stored in a directory - I will use loop to extract the portion of the filename for each file.
Expected input files:
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Expected Extract:
EXAMPLE_FILE
EXAMPLE_FILE_2
Attempt:
filename=$(basename "$file")
folder=sed '^s/_[^_]*$//)' $filename
echo 'Filename:' $filename
echo 'Foldername:' $folder
$ cat file.txt
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
$
$ cat file.txt | sed 's/_[0-9]*-[0-9]*-[0-9]*\.out$//'
EXAMPLE_FILE
EXAMPLE_FILE_2
$
No need for useless use of cat, expensive forks and pipes. The shell can cut strings just fine:
$ file=EXAMPLE_FILE_2_2017-10-12.out
$ echo ${file%%_????-??-??.out}
EXAMPLE_FILE_2
Read all about how to use the %%, %, ## and # operators in your friendly shell manual.
Bash itself has regex capability so you do not need to run a utility. Example:
for fn in *.out; do
[[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]
cap="${BASH_REMATCH[1]}"
printf "%s => %s\n" "$fn" "$cap"
done
With the example files, output is:
EXAMPLE_FILE_2017-09-12.out => EXAMPLE_FILE
EXAMPLE_FILE_2_2017-10-12.out => EXAMPLE_FILE_2
Using Bash itself will be faster, more efficient than spawning sed, awk, etc for each file name.
Of course in use, you would want to test for a successful match:
for fn in *.out; do
if [[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]; then
cap="${BASH_REMATCH[1]}"
printf "%s => %s\n" "$fn" "$cap"
else
echo "$fn no match"
fi
done
As a side note, you can use Bash parameter expansion rather than a regex if you only need to trim the string after the last _ in the file name:
for fn in *.out; do
cap="${fn%_*}"
printf "%s => %s\n" "$fn" "$cap"
done
And then test $cap against $fn. If they are equal, the parameter expansion did not trim the file name after _ because it was not present.
The regex allows a test that a date-like string \d\d\d\d-\d\d-\d\d is after the _. Up to you which you need.
Code
See this code in use here
^\w+(?=_)
Results
Input
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Output
EXAMPLE_FILE
EXAMPLE_FILE_2
Explanation
^ Assert position at start of line
\w+ Match any word character (a-zA-Z0-9_) between 1 and unlimited times
(?=_) Positive lookahead ensuring what follows is an underscore _ character
Simply with sed:
sed 's/_[^_]*$//' file
The output:
EXAMPLE_FILE
EXAMPLE_FILE_2
----------
In case of iterating through the list of files with extension .out - bash solution:
for f in *.out; do echo "${f%_*}"; done
awk -F_ 'NF-=1' OFS=_ file
EXAMPLE_FILE
EXAMPLE_FILE_2
Could you please try awk solution too, which will take care of all the .out files, note this has ben written and tested in GNU awk.
awk --re-interval 'FNR==1{if(val){close(val)};split(FILENAME, array,"_[0-9]{4}-[0-9]{2}-[0-9]{2}");print array[1];val=FILENAME;nextfile}' *.out
Also my awk version is old so I am using --re-interval, if you have latest version of awk you may need not to use it then.
Explanation and Non-one liner fom of solution: Adding a non-one liner form of solution too here with explanation.
awk --re-interval '##Using --re-interval for supporting ERE in my OLD awk version, if OP has new version of awk it could be removed.
FNR==1{ ##Checking here condition that when very first line of any Input_file is being read then do following actions.
if(val){ ##Checking here if variable named val value is NOT NULL then do following.
close(val) ##close the Input_file named which is stored in variable val, so that we will NOT face problem of TOO MANY FILES OPENED, so it will be like one file read close it in background then.
};
split(FILENAME, array,"_[0-9]{4}-[0-9]{2}-[0-9]{2}");##Splitting FILENAME(which will have Input_file name in it) into array named array only, whose separator is a 4 digits-2 digits- then 2 digits, actually this will take care of YYYY-MM-DD format in Input_file(s) and it will be easier for us to get the file name part.
print array[1]; ##Printing array 1st element here.
val=FILENAME; ##Storing FILENAME variable value which will have current Input_file name in it to variable named val, so that we could close it in background.
nextfile ##nextfile as it name suggests it will skip all the lines in current line and jump onto the next file to save some cpu cycles of our system.
}
' *.out ##Mentioning all *.out Input_file(s) here.
I want to extract MIB-Objects from snmpwalk output. The output FILE looks like:
RFC1213-MIB::sysDescr.0.0.0.0.192.168.1.2 = STRING: "Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u1 (2017-06-18) x86_64"
RFC1213-MIB::sysObjectID.0 = OID: RFC1155-SMI::enterprises.8072.3.2.10
..
First, I read the output file, split at character = and remove everything between RFC1213-MIB:: and .0 till the end of the string.
while read -r; do echo "${REPLY%%=*}" | sed -e 's/RFC1213-MIB::\(.*\)\.0/\1/'; done <$FILE
My current output:
sysDescr.0.0.0.192.168.1.2
sysObjectID
How can I remove the other values? Is there a better solution of extracting sysDescr, sysObjectID?
With awk:
awk -F[:.] '{print $3}'
(define : and . as field delimiters and display the 3rd field)
with sed (Gnu):
sed 's/^[^:]*::\|\.0.*//g'
(replace with the empty string all that isn't a : followed by :: at the start of the line or the first .0 and following characters until the end of the line)
Maybe you can try with:
sed 's/RFC1213-MIB::\([^\.]*\).*/\1/' $FILE
This will get everything that is not a dot (.) following the RFC1213-MIB:: string.
If you don't want to use sed, you can just use parameter substitution. sed is an external process so it won't be as fast as parameter substitution since it's a bash built in.
while IFS= read -r line; do line=${line#*::}; line=${line%%.*}; echo $line; done < file
line=${line#*::} assumes RFC1213-MIB does not have two colons and will be split from sysDescr with two colons.
line=${line%%.*} assumes sysDescr will have a . after it.
If you have more examples, that you think won't work, I can update my answer.
I want to extract from < to the next from my log-files.
$>cat messages.log
2013-03-24 19:32:37.231 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test1]
2013-03-24 19:32:37.547 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test2
Test3
Test4]
2013-03-24 19:32:38.833 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test5]
2013-03-24 19:32:42.222 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test6]
$>sed 's/.*\<\(.*\) \[.*/\1|/g' messages.log
F280|
F281|
Test3
Test4]
F280|
F281|
I almost got what I wanted except for the output with the newlines. So I'd like to have the following result:
F280|F281|F280|F281
How has the regular expression look like?
I wouldn't create a unreadable regexp to do this I'd use awk here:
$ awk -F'[< ]' '/^[0-9]+/{s?s=s"|"$4:s=s$4}END{print s}' file
F280|F281|F280|F281
Try this:
sed -n '/</{s/^.*<\([^ ]\+\) .*$/\1|/g;H;${x;s/\n//g;s/|$//;p}}' messages.log
Try something like that (you'll have nested groups), or turn on multiline option in regex:
(^.+<(\w+) .+$)+
Is it compulsory to only use grep or are also other commands available?
I'd say that
grep "<.* " messages.log | sed 's/.*\<\(.*\) \[.*/\1|/g' | tr -d '\n' | sed 's/.$//'
The first grep is to remove data not following your desired pattern, followed by your sed command.
On the output, who should look like
F280|
F281|
F280|
F281|
The last tr command just removes the newline character at the end of each line (i.e it concatenates the result) while the last sed is just to remove the final pipe delimiter