I have a large file that contains timestamps in the following format:
2018-08-22T13:06:04.442774Z
I would like to add double quotes around all the occurrences that match this specific expression. I am trying to use sed, but I don't seem to be able to find the right command. I am trying something around these lines:
sed -e s/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}.[0-9]\{6\}Z/"$0"/g my_file.json
and I am pretty sure that the problem is around my "replace" expression.
How should I correct the command?
Thank you in advance.
You should wrap the sed replacement command with single quotes and use & instead of $0 in the RHS to replace with the whole match:
sed 's/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{6\}Z/"&"/g' file > outfile
See the online demo
Also, do not forget to escape the . char if you want to match a dot, and not any character.
You may also remove excessive escapes if you use ERE syntax:
sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6}Z/"&"/g'
If you want to change the file in place, use the -i option:
sed -i 's/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{6\}Z/"&"/g' file
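To see why the escaped dot matters, here is a small sketch that runs the escaped and the unescaped pattern against a value with some other character where the dot should be. The variable names and the malformed sample value are made up for illustration, and the ERE (-E) form is used only to keep the patterns short.

```shell
# ERE timestamp patterns; they differ only in whether the dot is escaped.
strict='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6}Z'
loose='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{6}Z'

good='2018-08-22T13:06:04.442774Z'
bad='2018-08-22T13:06:04x442774Z'   # hypothetical malformed value: "x" instead of "."

# & in the replacement stands for the whole match.
strict_good=$(printf '%s\n' "$good" | sed -E "s/$strict/\"&\"/g")
strict_bad=$(printf '%s\n' "$bad" | sed -E "s/$strict/\"&\"/g")
loose_bad=$(printf '%s\n' "$bad" | sed -E "s/$loose/\"&\"/g")

printf '%s\n' "$strict_good" "$strict_bad" "$loose_bad"
```

The strict pattern should leave the malformed value untouched, while the loose one quotes it as if it were a valid timestamp.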
The following works:
sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{6\}Z\)/"\1"/g' my_file.json
Modifications made:
wrap the command in single quotes
use \( and \) to create a group (referenced by \1 in the replacement section)
escape the '.' and '{' and '}' characters
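As a quick check, the group-based form and the & form should give the same result, since the group spans the entire match. A minimal sketch, assuming a made-up sample line; the variable names are only for illustration:

```shell
line='{"time": 2018-08-22T13:06:04.442774Z}'   # hypothetical sample line
pat='[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{6\}Z'

# Inside the shell's double quotes, \\( reaches sed as \( and \\1 as \1.
with_group=$(printf '%s\n' "$line" | sed "s/\\($pat\\)/\"\\1\"/g")
# & needs no group at all.
with_amp=$(printf '%s\n' "$line" | sed "s/$pat/\"&\"/g")

printf '%s\n' "$with_group" "$with_amp"
```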
I need to replace only single instances of a backslash.
Input: \\apple\\\orange\banana\\\\grape\\\\\
Output: \\apple\\\orangebanana\\\\grape\\\\\
Tried using sed 's/\\//g' which is replacing all backslashes
Note: The previous character to single backslash can be anything including alphanumeric or special characters. And it's a multiline text file.
Appreciate your help on this.
If you want to consider perl, then lookbehind and lookahead are exactly what you need here:
perl -pe 's~(?<!\\)\\(?!\\)~~g' file
\\apple\\\orangebanana\\\\grape\\\\\
Details
(?<!\\): Negative lookbehind to make sure that previous char is not \
\\: Match a \
(?!\\): Negative lookahead to make sure that next char is not \
If you want to use sed only then I suggest:
sed -E -e ':a' -e 's~(^|[^\\])\\([^\\]|$)~\1\2~g; ta' file
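The loop (:a ... ta) is there because the pattern also consumes the character after the backslash, so a second single backslash right behind that character cannot be matched in the same pass. A small sketch to illustrate; the short input 'a\b\c' is a made-up example, and the second input is the sample from the question:

```shell
pattern='s~(^|[^\\])\\([^\\]|$)~\1\2~g'
input='a\b\c'   # hypothetical: two single backslashes one character apart

# One pass: "a\b" is matched and consumed, so "\c" has no left neighbour to match.
one_pass=$(printf '%s\n' "$input" | sed -E -e "$pattern")
# With the loop, sed repeats the substitution until nothing changes.
looped=$(printf '%s\n' "$input" | sed -E -e ':a' -e "$pattern; ta")

sample='\\apple\\\orange\banana\\\\grape\\\\\'
sample_out=$(printf '%s\n' "$sample" | sed -E -e ':a' -e "$pattern; ta")

printf '%s\n' "$one_pass" "$looped" "$sample_out"
```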
When you want to replace at most one single backslash, you can use
sed -r 's/(.*[^\]|^)\\([^\].*|$)/\1\2/g'
The command is ugly due to the possibility of a line starting or ending with a backslash (need to include the possibility ^ and $).
When you want to get rid of all the single backslashes in '\al\l \\sin\gle\slas\hes \\\on \\\\a \\\\\l\i\n\e\', you can remove one backslash from every sequence of backslashes and afterwards put one back wherever at least one backslash is left:
sed -r 's/\\([\]*)/\1/g;s/([\]+)/\\\1/g'
or, as suggested by @potong,
sed -E 's/\\(\\*)/\1/g;s/(\\+)/\\&/g'
I like the solution, as it mimics someone who removes one backslash from every sequence of backslashes and then tries to undo that operation. The "bug" in the undo is that the resulting output is missing the single backslashes, which is exactly what we want.
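To make the two steps visible, here is a sketch that prints the intermediate result after the first substitution and the final result after both, using the sample input from the question (the variable names are just for illustration):

```shell
input='\\apple\\\orange\banana\\\\grape\\\\\'

# Step 1 shortens every run of backslashes by one, so single ones vanish.
step1=$(printf '%s\n' "$input" | sed -E 's/\\(\\*)/\1/g')
# Step 2 then lengthens every surviving run by one again.
both=$(printf '%s\n' "$input" | sed -E 's/\\(\\*)/\1/g;s/(\\+)/\\&/g')

printf '%s\n' "$step1" "$both"
```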
With your shown samples, please try following sed code. Written and tested with GNU sed.
sed -E 's/^(\\\\[^\]*\\\\\\)([^\]*)\\(.*)/\1\2\3/' Input_file
Explanation: The -E option enables ERE (extended regular expressions). The command relies on sed's back references, which save each matched part in a buffer that can be reused in the substitution. The 1st capturing group holds \\apple\\\, the 2nd holds orange, and the 3rd holds the rest of the line. The single \ between orange and banana is matched outside any group, so it is left out of the replacement, as the OP's required output demands.
This might work for you (GNU sed):
sed 's/\>\\\<//g' file
Delete a single \ between word boundaries.
Having the following string inside of a text file.
{"_job":"delete","query":{"query":{"bool":{"must":[{"term":{"_id":"28381"}}],"should":[]}}},"script":{"inline":"ctx._source.meta='This is a ' test string Peedr'"},"timestamp":1518165383,"host":"","port":"9200","index":"","docType":"","customIndexer":""}
I would like to replace every ' inside the ctx._source.meta='' part with \' using sed.
In the example above I have This is a ' test string Peedr, which I would like to convert to This is a \' test string Peedr, so the desired output would be:
{"_job":"delete","query":{"query":{"bool":{"must":[{"term":{"_id":"28381"}}],"should":[]}}},"script":{"inline":"ctx._source.meta='This is a \' test string Peedr'"},"timestamp":1518165383,"host":"","port":"9200","index":"","docType":"","customIndexer":""}
I'm using the following regex to get the ' that is inside the ctx._source.meta string (3rd capture group).
(meta=')(.*?)(')(.*?)(')
I have the regex, but I don't know how to use the sed command to replace the 3rd capture group with \'.
Can someone give me a hand and tell me the sed command I have to use?
Thanks in advance
sed generally does not support the Perl regex extensions, so the non-greedy .*? will probably not do what you hope. If you want to use Perl regex, use Perl!
perl -pe "s/(meta='.*?)(')(.*?')/\$1\\\\\$2\$3/"
This will still not necessarily work if the input is malformed; a better approach would be to specifically exclude single quotes from the match, and then you don't need the non-greedy matching.
sed "s/\\(meta='[^']*\\)'\\([^']*'\\)/\\1\\\\'\\2/"
In both cases, the number of backslashes required to escape the backslashes inside the shell's double quotes is staggering.
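One way to keep track of the backslashes is to store the script in a variable and print it, which shows what sed actually receives once the shell has processed the double quotes. A small sketch using the sed command above; the shortened input line and the variable names are assumptions made for illustration:

```shell
# Same script as above, written inside the shell's double quotes.
script="s/\\(meta='[^']*\\)'\\([^']*'\\)/\\1\\\\'\\2/"

# Show the script as sed will see it, with each \\ collapsed to a single \.
printf '%s\n' "$script"

line="ctx._source.meta='This is a ' test string Peedr'"   # shortened sample
result=$(printf '%s\n' "$line" | sed "$script")
printf '%s\n' "$result"
```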
Reference every group in the replacement except the one you want to replace. A simpler way to accomplish the same task:
sed -E "s/(ctx\._source\.meta=')([^']*)(')([^']*')/\1\2\\\\'\4/"
You may use:
sed "s/ ' / \\\' /g" sample.txt
The first part will instruct sed to only look for a single quote between 2 spaces, as such ctx._source.meta='This and string Peedr'"} will not match, hence will not be changed.
Edit:
At the poster's request, I edited my sed command to apply to extra use cases:
sed "s/\(ctx._source.meta='.*\)'\(.*Peedr'\"\)/\1\\\'\2/g"
The following Regex works as expected in Notepad++:
^.*[^a-z\r\n].*$
However, when I try to use it with sed, it won't work.
sed -r 's/\(^.*[^a-z\r\n].*$\)//g' wordlist.txt
You could use:
sed -i '/[^a-z]/d' wordlist.txt
This will delete every line that contains a character other than a lowercase letter (no need to specify line feeds).
EDIT:
Your regex doesn't work because you are trying to match
( bracket
^ beginning of line
...
$ end of line
) bracket
As you won't have a bracket and then the beginning of the line, your regex simply doesn't match anything.
Note also that an expression of
s/\(^.*[^a-z\r\n].*$\)//g
wouldn't delete a line but would replace it with a blank line.
EDIT2:
Note that in sed the -r flag changes the behaviour of \( and \): without the -r flag they are group indicators, but with the -r flag they're just literal brackets.
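The difference can be checked with a tiny made-up input. This is only a sketch; it uses -E, which GNU sed accepts as a synonym for -r:

```shell
input='(ab)'   # hypothetical input containing real brackets

# Basic syntax: \( \) form a group, so only "ab" is matched and wrapped.
bre_group=$(printf '%s\n' "$input" | sed 's/\(ab\)/[\1]/')
# Extended syntax: plain ( ) form the group.
ere_group=$(printf '%s\n' "$input" | sed -E 's/(ab)/[\1]/')
# Extended syntax: \( \) match the literal brackets in the input.
ere_literal=$(printf '%s\n' "$input" | sed -E 's/\(ab\)/[ab]/')

printf '%s\n' "$bre_group" "$ere_group" "$ere_literal"
```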
Two things:
Sed is a stream editor. It processes one line of the input at a time. That means the search and replace commands, etc, can only see the current line. By contrast, Notepad++ has the whole file in memory and so its search expressions can span two or more lines.
Your command sed -r 's/\(^.*[^a-z\r\n].*$\)//g' wordlist.txt includes \( and \). With -r, these mean literal round brackets. So the command says: find a line that starts with a ( and ends with a ), with some other characters in between, and replace it with nothing. Rewriting the command as sed -r 's/^.*[^a-z\r\n].*$//g' wordlist.txt should have the desired effect. You could also remove the \r\n to give sed -r 's/^.*[^a-z].*$//g' wordlist.txt. But neither of these will be exactly the same as the Notepad++ command, as they will leave empty lines. So you may find the command sed -r '/^.*[^a-z].*$/d' wordlist.txt is closer to what you really want.
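The difference between substituting and deleting is easy to see on a small word list. A minimal sketch; the four words are made up for illustration:

```shell
words=$(printf 'abc\nab1\nx-y\nxyz\n')   # hypothetical word list

# s///: offending lines are emptied, but the lines themselves remain.
blanked=$(printf '%s\n' "$words" | sed -E 's/^.*[^a-z].*$//')
# d: offending lines are removed from the output entirely.
deleted=$(printf '%s\n' "$words" | sed '/[^a-z]/d')

printf '%s\n' "$blanked"
printf '%s\n' "$deleted"
```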
I need to use sed to look for all lines in a file with pattern "[whatever]|[whatever]" so I'm using the following regex:
sed '/\"[a-zA-Z0-9]+\|[a-zA-Z0-9]+\"/p' test2.txt
But it's not working: for the following line it returns something when it shouldn't:
RTV0031605951US|3160595|20/03/2013|0|"Laurie Graham"|"401"
Does anybody know which regex I should use? Thanks in advance
I see three problems with your regular expression:
+ is not a metacharacter in sed's default (basic) syntax, so you need to escape it to get its special meaning (GNU sed).
A similar issue applies to the pipe. It is not a metacharacter either, so don't escape it if you want to match it literally.
Sed by default prints every line, so add -n to suppress that when you already use /p to print the matches. Otherwise you will have those lines twice in the output.
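The three points can be tried out like this. It is a sketch under some assumptions: the second input line ("abc|def" in quotes) is made up, and \{1,\} is used as the portable POSIX spelling of the GNU shorthand \+.

```shell
lines=$(printf '%s\n' 'RTV0031605951US|3160595|20/03/2013|0|"Laurie Graham"|"401"' '"abc|def"')

# -n suppresses the automatic print; p prints only the matching lines.
# In basic syntax the pipe is literal and \{1,\} means "one or more".
matches=$(printf '%s\n' "$lines" | sed -n '/"[a-zA-Z0-9]\{1,\}|[a-zA-Z0-9]\{1,\}"/p')

# Without -n, a matching line is printed twice.
doubled=$(printf '%s\n' 'a' 'b' | sed '/a/p')

printf '%s\n' "$matches"
printf '%s\n' "$doubled"
```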
sed will match any line that contains a partial match.
To match only whole lines that match your regex, add ^ and $ to the start/end:
sed -n '/^"[a-zA-Z0-9]\+|[a-zA-Z0-9]\+"$/p' test2.txt
sed '/\B\"[ [:alnum:]]\+\"|\"[ [:alnum:]]\+\"\B/!d' file
If you use this in a sed script, do not escape double quotes.
I am trying to replace an expression using sed. The regex works in vim but not in sed. I'm replacing the last dash before the number with a slash so
/www/file-name-1
should return
/www/file-name/1
I am using the following command but it keeps outputting /www/file-name/0 instead
sed 's/-[0-9]/\/\0/g' input.txt
What am I doing wrong?
You must put parentheses around the data you want to reference later, and sed starts counting groups at 1. To get all the matched characters without using parentheses, use the & symbol.
sed 's/-\([0-9]\)/\/\1/g' input.txt
That yields:
/www/file-name/1
You need to capture using parentheses before you can back reference (back references start at \1). Try sed -r 's|(.*)-|\1/|':
$ sed -r 's|(.*)-|\1/|' <<< "/www/file-name-1"
/www/file-name/1
You can use any delimiter with sed, so / isn't the best choice when the substitution contains /. The -r option is for extended regexp, so the parentheses don't need to be escaped.
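For comparison, here is a sketch that runs both approaches with | as the delimiter. The second path is a made-up example with several dashes, added to show that the greedy .* only touches the last one; -E is used as the equivalent of -r.

```shell
path='/www/file-name-1'
multi='/www/a-1-b-2'   # hypothetical path with several dashes

# Group the digit and put it back with \1; | avoids escaping the slash.
by_group=$(printf '%s\n' "$path" | sed 's|-\([0-9]\)|/\1|')
# Greedy .* runs up to the last dash on the line, so only that dash changes.
by_greedy=$(printf '%s\n' "$path" | sed -E 's|(.*)-|\1/|')
multi_greedy=$(printf '%s\n' "$multi" | sed -E 's|(.*)-|\1/|')

printf '%s\n' "$by_group" "$by_greedy" "$multi_greedy"
```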
It seems sed under OS X starts counting backreferences at 1. Try \1 instead of \0