Using sed to fix format of date string

The question specifically involves modifying a string of form
abc_MM-DD-YY_XX.jpg
(where XX can be comprised of two or three digits) to
xyz_YYYY-MM-DD_XXX.jpg
I was able to do this using:
sed 's/\(.*_\)\(.\{5\}\)-\([0-9][0-9]\)_\([0-9][0-9]\.\)/xyz_20\3-\2_0\4/'
I was wondering, though, if there are any better, perhaps more concise alternatives. Also, is using TRE (tagged regular expression) the only way sed can accomplish such a task? Thanks!
EDIT: Sorry, to clarify, the original string can either be in the format "abc_MM-DD-YY_XX.jpg" or "abc_MM-DD-YY_XXX.jpg", but the output format must be "xyz_YYYY-MM-DD_XXX.jpg". So in the first case I would want to pad "XX" with a 0 and in the second case I would want to leave it as is. I also realized that my expression doesn't work for the second case...

This will work only in the current century!
Using awk
I would use awk for that. It is simpler to use:
awk -F'[-_.]' '$0="xyz_20"$4"-"$2"-"$3"_"sprintf("%03d",$5)"."$6' <<<'abc_03-24-15_11.jpg'
will give you:
xyz_2015-03-24_011.jpg
while:
awk -F'[-_.]' '$0="xyz_20"$4"-"$2"-"$3"_"sprintf("%03d",$5)"."$6' <<<'abc_03-24-15_111.jpg'
will give you:
xyz_2015-03-24_111.jpg
which should be what you want.
Explanation:
I'm using -, _ or . as the field delimiter and simply reorganize the fields, re-attaching the extension from the last field (without the . delimiter, sprintf("%03d") would convert "11.jpg" to 011 and drop the extension). To pad an XX value to XXX I'm using sprintf(). (Thanks Amadan)
Using sed
Btw, you can simplify the sed command a lot if you use the -r option and simply match sequences of characters other than the delimiters:
sed -r 's/([^_]+)_([^-]+)-([^-]+)-([^_]+)_([^.]+)/xyz_20\4-\2-\3_0\5/;' <<<'abc_03-24-15_12.jpg'
(On its own this does not solve the XX to XXX problem properly, because it always prepends a 0, even to a three-digit value.)
To solve that you can simply append another s command:
s/0([0-9]{3})\./\1./
which will replace the sequence 0123 by 123. The final command looks like this:
sed -r 's/([^_]+)_([^-]+)-([^-]+)-([^_]+)_([^.]+)/xyz_20\4-\2-\3_0\5/;s/0([0-9]{3})\./\1./' <<<'abc_03-24-15_12.jpg'
Doesn't it look simpler using -r ;) (hihi)
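As a quick check of the final sed command against both input forms, here is a minimal sketch. The function name fix_name is hypothetical, and -E is used on the assumption that it behaves like -r in GNU sed (BSD sed accepts -E as well).

```shell
#!/bin/sh
# fix_name is a hypothetical helper name, not part of the original answer.
# First s command: reorder the date and always prepend a 0 to the counter.
# Second s command: drop that 0 again if the counter already had three digits.
fix_name() {
    printf '%s\n' "$1" |
        sed -E 's/([^_]+)_([^-]+)-([^-]+)-([^_]+)_([^.]+)/xyz_20\4-\2-\3_0\5/;s/0([0-9]{3})\./\1./'
}

for name in abc_03-24-15_12.jpg abc_03-24-15_123.jpg; do
    fix_name "$name"
done
# prints:
# xyz_2015-03-24_012.jpg
# xyz_2015-03-24_123.jpg
```

Like the commands above, this hard-codes the "20" prefix, so it only holds for years in the current century.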

Related

Deleting the un-matched portion using sed

I'm having a text file containing data in the following format:
2020-01-01 00:00:00 #gibberish - key1:{value1}, unwanted key2:{value2}, unwanted key3:{value3}
I wanted to collect the timestamp in the beginning and key-value pairs alone. Like the following
2020-01-01 00:00:00,key1:{value1},key2:{value2},key3:{value3}
I'm able to write a regex script that can select the required values (works in visual studio code)
^([0-9 :-]+)|([0-9A-z,_-]+):\{(.*?)\}
(first pattern selects the timestamp and second part selects the key-value pattern)
Now, how can I select the un-matched part and delete it using sed ?
Note: I tried using egrep to match the required pattern and writing it to a new file. But every matched string is written on a new line instead of maintaining on the same line. That is not useful to me.
egrep -o '^([0-9 :-]+)|([0-9A-z,_-]+):\{(.*?)\}' source.txt > target.txt

Going from last to first, I can comment that:
egrep: yes, that is the designed behavior - egrep is probably not what you want to use.
sed: it is important to note that sed uses POSIX regular expressions, which are simpler and much more limited than what people expect from regular expressions these days. Most of the new-style (enhanced, Perl-compatible, etc.) regular expression work of the last few decades was done in Perl, which is readily available on UNIX systems and is probably what you want to use (but also note that on macOS, as with all Apple-distributed UNIX programs, the perl binary is pretty outdated. It will probably still do what you want, but be warned).
Your regular expression uses a range [A-z], which is weird and doesn't work in my egrep or sed. I understand what you want to do, but it shouldn't work in systems that actually use character sets (I'm not sure what Visual Studio is doing with this range, but it seems bonkers to me). You probably meant to use [A-Za-z].
I would have written this thing, using Perl, like so:
perl -nle '@res = (); while(m/^([0-9 :-]+\d)|([0-9A-Za-z,_-]+:\{[^}]+\})/g) {
push @res, "$1$2";
};
print join ",",@res' < source.txt > target.txt
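How the [A-z] range mentioned above behaves depends on the locale and the tool. As a small sketch, assuming the C locale (where ranges are compared by byte value), the range also covers the six punctuation characters that sit between Z and a in ASCII:

```shell
#!/bin/sh
# In the C locale, A-z spans byte values 65..122, which includes [ \ ] ^ _ `
printf '%s\n' '_' '^' 'a' 'Z' | LC_ALL=C grep -c '[A-z]'      # prints 4
printf '%s\n' '_' '^' 'a' 'Z' | LC_ALL=C grep -c '[A-Za-z]'   # prints 2
```

In other locales the same range may be rejected or may match a different set, which is one more reason to spell out [A-Za-z].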

With your shown samples, could you please try the following. Written and tested in GNU awk, in case you are OK with it.
awk '
match($0,/[0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]+([0-9]{2}:){2}[0-9]{2}/){
  val=""
  printf("%s ",substr($0,RSTART,RLENGTH))
  while(match($0,/key[0-9]+:{value[0-9]+}(,|$)/)){
    val=(val?val OFS:"")substr($0,RSTART,RLENGTH)
    $0=substr($0,RSTART+RLENGTH)
  }
  print val
}
' Input_file

This might work for you (GNU sed):
sed -E 's/\S+/\n&/3g;s#.*#echo "&"|sed "1b;/:{.*}/!d;s/, *$//"#e;s/ *\n/,/g' file
Split each line into lines of tokens (keeping the date and time as the first of these lines).
Remove any line (apart from the first) that does not contain the pattern :{...}.
Flatten the lines by replacing the introduced newlines by , separator.

sed -rn 's/([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}[[:space:]]([[:digit:]]{2}:){2}[[:digit:]]{2})(.*)(key1.*,)(.*)(key2.*,)(.*)(key3.*$)/\1,\4\6\8/p' <<< "2020-01-01 00:00:00 #gibberish - key1:{value1}, unwanted key2:{value2}, unwanted key3:{value3}"
Enable extended regular expressions with sed -r or -E and then split the string into 8 sections using parentheses. Replace the line with the 1st, 4th, 6th and 8th sections and print.
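Returning to the egrep -o attempt from the question: its matches can be kept on one line by running grep per input line and joining the results with paste. This is only a sketch. The function name extract_pairs is hypothetical, and the pattern assumes timestamps of the form shown in the sample and keys made of letters, digits, _ and -.

```shell
#!/bin/sh
# extract_pairs is a hypothetical helper name; it reads lines on stdin.
extract_pairs() {
    while IFS= read -r line; do
        # grep -o prints every match on its own line; paste -s joins them with commas.
        printf '%s\n' "$line" |
            grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}|[0-9A-Za-z_-]+:\{[^}]*\}' |
            paste -s -d, -
    done
}

printf '%s\n' '2020-01-01 00:00:00 #gibberish - key1:{value1}, unwanted key2:{value2}, unwanted key3:{value3}' |
    extract_pairs
# prints: 2020-01-01 00:00:00,key1:{value1},key2:{value2},key3:{value3}
```

This starts two processes per input line, so for large files one of the single-process awk, perl or sed answers is likely the better choice.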

sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. Same thing if I use X[a-z09] (but there may be "_" in my strings, I want to include those as well). I had a look at several previous similar topics, but I do not seem able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman's solution did not work at first because I needed to convert the text file from DOS to Unix. I did it with dos2unix, but for some reason that did not work (maybe I forgot to overwrite the old file with the output?). I later did it using sed -i 's/\r$//' test.txt; that solved the issue, and Glenn's solution now works. Having a DOS-formatted text file has been the source of much trouble, for me at least.
2) I probably did not make clear that I only wanted to target specific lines in the input files; my example only has target strings, but the actual file has strings that I do not want to edit. That was probably the misunderstanding that occurred with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)

What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
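A minimal sketch of the & replacement with the sample data from the question. The function name append_z is hypothetical, and the pattern is assumed not to contain a / (which would clash with the s command's delimiter).

```shell
#!/bin/sh
# append_z is a hypothetical helper: $1 is an extended regex, input comes from stdin.
append_z() {
    # & expands to whatever text the pattern matched, so Z lands right after the match.
    sed -E "s/$1/&Z/g"
}

printf 'Xabc\nXdef\n' | append_z 'X...'
# prints:
# XabcZ
# XdefZ

printf 'X_a1\nother\n' | append_z 'X[a-z0-9_]+'
# prints:
# X_a1Z
# other
```

Lines that do not match the pattern pass through unchanged, which also covers the case where only some lines should be edited.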

EDIT: If you need to pass a regex to the script, the following may help; my previous solution only appended a character to the end of each line.
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script:
./script.ksh Input_file 'X.*'
XabcZ
XdefZ
After seeing the OP's comment about matching everything that starts with either a lowercase or an uppercase letter, run the script in the following style:
./script.ksh Input_file '[A-Za-z].*'
Could you please try the following and let me know if this helps you.
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running the script I get the following output on the terminal.
./script.ksh Input_file Z
XabcZ
XdefZ
You could use sed's -i option in the above code if you want to save the output into Input_file itself.
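The two points from the question's EDIT (DOS line endings, and editing only selected lines) can be combined with the end-of-line append shown above. This is a sketch under assumptions: append_to_matching is a hypothetical name, the pattern is a basic regex without /, and the input contains carriage returns only as part of line endings.

```shell
#!/bin/sh
# append_to_matching is a hypothetical helper: $1 is a BRE selecting lines, $2 the suffix.
append_to_matching() {
    # tr removes every carriage return; the sed address /$1/ limits the
    # substitution to matching lines, and s/$/.../ appends at end of line.
    tr -d '\r' | sed "/$1/s/\$/$2/"
}

printf 'Xabc\r\nkeep\r\nXdef\r\n' | append_to_matching '^X' Z
# prints:
# XabcZ
# keep
# XdefZ
```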

Find a string after a certain character

An example will explain it better:
structure_1/structure_2/<I NEED WHAT'S HERE>/structure_3
Structure_1 is always the same value
Structure_2 is a string that can be of any size, sometimes with _ or -
What I need is behind the second forward slash
I don't care what comes after
Other example:
order/shirt/blue_stripes/America
order/pants_ripped/green/Europe
order/skirts/yellow-folded/Asia
order/socks/orange/Africa
Results that I want to get after the regex
blue_stripes
pants_ripped
yellow-folded
orange
I'm writing a BASH script for my Unix machine
UPDATE
I first used a regex in order to do this but I was informed by Flying that it would be better to use the command 'awk' and this did the trick with ease!

This one will do the trick: ^(?:[^\/]+\/){2}([^\/]+). You basically need to skip the first 2 groups of chars.
UPDATE: Since, as clarified in the comments, the actual task is not about finding the correct regular expression but about extracting information from a file on Unix, it is much better to use awk instead:
awk -F"/" '{print $3}' orders.txt
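To see what the awk command prints for the sample lines from the question, here is a small sketch that feeds them in through a here-document instead of orders.txt. The function name third_field is hypothetical.

```shell
#!/bin/sh
# third_field is a hypothetical wrapper around the awk command from the answer.
third_field() {
    awk -F/ '{ print $3 }'
}

# Sample data copied from the question; a real script would read orders.txt.
third_field <<'EOF'
order/shirt/blue_stripes/America
order/pants_ripped/green/Europe
order/skirts/yellow-folded/Asia
order/socks/orange/Africa
EOF
# prints:
# blue_stripes
# green
# yellow-folded
# orange
```

Note that the second line yields green, since that is the text after the second slash; the pants_ripped listed in the question is the second field. For a single fixed field, cut -d/ -f3 should give the same result.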

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
For this I would like to keep using sed. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get that counter to increase. Any ideas?
From a more general perspective, my question would be the following: can I use a backreference in sed to create a replacement that is different for each of the matches? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.

This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present, it increments or primes the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further cases of the pattern and the procedure is repeated if necessary.

You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
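exec 0<input_file
while IFS= read -r line; do
# Rebuild the suffix on every iteration so it picks up the current counter value
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
__images_counter=$(expr $__images_counter + 1)
done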
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
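Since the question mentions awk as a fallback, here is a sketch of a per-match counter in awk that also handles several images on one line. It is an assumption-laden simplification: number_images is a hypothetical name, and the replacement is the short \includegraphics{...}\label{img-N} form from the question rather than the full figure environment.

```shell
#!/bin/sh
# number_images is a hypothetical helper; it reads wikitext on stdin.
number_images() {
    awk '
    {
        out = ""
        # Consume the line one [[Image(...)]] match at a time.
        while (match($0, /\[\[Image\([^)]*\)\]\]/)) {
            # "[[Image(" is 8 characters and ")]]" is 3, so this is the file name.
            name = substr($0, RSTART + 8, RLENGTH - 11)
            n++   # the counter persists across matches and across lines
            out = out substr($0, 1, RSTART - 1) "\\includegraphics{" name "}\\label{img-" n "}"
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

printf '%s\n' 'a [[Image(x.png)]] b [[Image(y.png)]]' '[[Image(z.png)]]' | number_images
# prints:
# a \includegraphics{x.png}\label{img-1} b \includegraphics{y.png}\label{img-2}
# \includegraphics{z.png}\label{img-3}
```

Unlike the sed command in the question, this keeps any text that follows an image on the same line.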

What is the best way to do string manipulation in a shell script?

I have a path as a string in a shell-script, could be absolute or relative:
/usr/userName/config.cfg
or
../config.cfg
I want to extract the file name (part after the last /, so in this case: "config.cfg")
I figure the best way to do this is with some simple regex?
Is this correct? Or should I use sed or awk instead?
Shell-scripting's string manipulation features seem pretty primitive by themselves, and appear very esoteric.
Any example solutions are also appreciated.

If you're okay with using bash, you can use bash string expansions:
FILE="/path/to/file.example"
FILE_BASENAME="${FILE##*/}"
It's a little cryptic, but the braces start the variable expansion, and the double hash does a greedy removal of the specified glob pattern from the beginning of the string.
Double %% does the same thing from the end of a string, and a single percent or hash does a non-greedy removal.
Also, a simple replace construct is available too:
FILE=${FILE// /_}
would replace all spaces with underscores for instance.
A single slash replaces only the first match.
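The four prefix and suffix removal forms described above can be compared side by side. This is a small sketch; the path /path/to/archive.tar.gz is a made-up example chosen because it contains several slashes and dots. These four forms are also part of POSIX sh, whereas the ${FILE// /_} replacement is a bash extension.

```shell
#!/bin/sh
FILE="/path/to/archive.tar.gz"

echo "${FILE##*/}"   # longest prefix matching */ removed:  archive.tar.gz
echo "${FILE#*/}"    # shortest prefix matching */ removed: path/to/archive.tar.gz
echo "${FILE%%.*}"   # longest suffix matching .* removed:  /path/to/archive
echo "${FILE%.*}"    # shortest suffix matching .* removed: /path/to/archive.tar
```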

Instead of string manipulation I'd just use
file=`basename "$filename"`
Edit:
Thanks to unwind for some newer syntax for this (which assumes your filename is held in $filename):
file=$(basename "$filename")
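A short sketch of basename together with its optional suffix argument and the companion dirname command. The path with a space in it is a made-up example, used to show why the variable should be quoted.

```shell
#!/bin/sh
# Hypothetical path containing a space; unquoted, it would split into two arguments.
filename="/usr/userName/my config.cfg"

basename "$filename"          # prints: my config.cfg
basename "$filename" .cfg     # prints: my config
dirname "$filename"           # prints: /usr/userName
```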

Most environments have access to perl and I'm more comfortable with that for most string manipulation.
But as mentioned, for something this simple, you can use basename.

I typically use sed with a simple regex, like this:
echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
result:
>echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
config.cfg
