Using sed command with delimiter character - regex

I have a text file as below containing multiple lines with = in the middle of each line.
User name = user1
Date expire = Oct 20, 2019
I want to find Date expire and replace the right side of = which is the date with something else via sed. For example, Oct 25, 2019.
I know basic usage of sed 's/foo/bar/g' but that is used for fixed strings. I want to change part of the sentence by detecting a special character.
How can I do that?

Could you please try the following?
sed '/Date expire/s/\(.*= \).*/\1your_new_text_here/' Input_file
This uses sed's mechanism for storing matched regex values in a temporary buffer (a capture group). Everything up to and including "= " goes into the first buffer, while the rest of the line is matched without being stored. Finally, the whole line is replaced with the first buffer's value followed by the new text.
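As a minimal sketch of that substitution, the snippet below runs it on the two sample lines from the question. The variable names input and result are placeholders for illustration only. The captured group already ends with the space after =, so the new date follows \1 directly.

```shell
# Sample lines taken from the question (held in a variable instead of a file).
input='User name = user1
Date expire = Oct 20, 2019'

# Only on lines containing "Date expire": keep everything up to "= " (group 1)
# and replace the remainder of the line with the new date.
result=$(printf '%s\n' "$input" | sed '/Date expire/s/\(.*= \).*/\1Oct 25, 2019/')
printf '%s\n' "$result"
```

The "User name" line passes through untouched because the /Date expire/ address restricts where the substitution runs.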


Highlight line for specific commit in git log graph

I am trying to highlight the whole line for a specific commit in my git log graph. I had previously created a git log alias to format the output of my logs, and I have attempted to highlight the specific line containing the commit ID using that alias.
Alias in ~/.gitconfig
# Base command for log formatting
lg-base = "log --graph --decorate=short --decorate-refs-exclude='refs/tags/*' --color=always"
# Version 1 log format
lg1 = !"git lg-base --format=format:'%C(#f0890c)%h%C(reset) - %C(bold green)(%ar)%C(reset) %C(white)%s%C(reset) %C(dim white)- %an%C(reset)%C(#d10000)%d%C(reset)'"
As a test, I search for 6 months instead of a commit ID, since it should behave the same way and may showcase my issue a bit better.
git lg1 | grep --color=always -E '(6 months).*|$'
This matches the correct lines, but it doesn't highlight the whole line to the right of the match. When I try to highlight the left part of the line as well, it doesn't work as expected, probably because of my limited regex skills.
git lg1 | grep --color=always -E '.*(6 months).*|$'
Instead it marks the * in the beginning.
If you have a completely different approach, that is fine with me as long as I can use my formatted git log alias.
Thomas' comment is the key to the issue here: although grep is adding its own color (or colour) changing escape sequences to highlight the line, Git has already put in color changing directives. Each such directive, for one of the named colors, looks like this:
ESC[numberm
where the number part is 30 through 37 for a foreground color and 40 through 47 for a background color (plus some extra codes for bold or dim, which I won't include here). (%C(reset) sends ESC [ m and your orange selector uses a 24-bit color directive, which is less widely supported than the eight base colors, which go back to the 1990s). Hence the original output reads:
* <sp> <orange> <hash-ID> <reset> <sp> - <sp> <blue> (n months ago) <reset> ...
The grep adds red, which is ESC [ 31 m, and a reset, around the matched expression—but the existing escapes within the expression remain.
The easiest way by far to avoid all this is to stop using color escape sequences at all, so that grep's added ones stick out like a sore red thumb. Of course that defeats your goal, which is to keep the color-changing escapes in lines that aren't highlighted. But you haven't explained what you'd like done with the color-changing escapes in lines, or parts of lines, that are highlighted. Answering that will determine what to do next.
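To illustrate the "no color escapes" route, here is a rough sketch that strips ANSI color sequences from a stream before it reaches grep. The helper name strip_ansi is made up for this example, and it assumes every escape has the ESC [ ... m shape described above.

```shell
# Hypothetical helper: remove ESC[...m color sequences from standard input.
strip_ansi() {
  esc=$(printf '\033')                 # the literal ESC character
  sed "s/${esc}\[[0-9;]*m//g"
}

# Possible use with the alias from the question (not run here):
#   git lg1 | strip_ansi | grep --color=always -E '.*(6 months).*|$'
printf '\033[31mred\033[0m text\n' | strip_ansi
```

This throws away the colors on every line, not only the matched ones, which is exactly the trade-off mentioned above.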
There are any number of ways you could handle this. For instance, instead of %C(color)%<directive>%C(reset) you could use %x1b(name-of-color)%<directive>%x1b(reset) to insert the literal sequences ESC ( name of color or reset ), or assume that the terminal in question will use ANSI style escapes that end with the lowercase m character, and try to write something up in sed or awk (I'd use awk for something this complex, just because it's less like writing line noise) that does the match—awk supports regex matching—and if found, strips out the color sequences from the matched part and adds its own. Post-process this with something that inserts the appropriate terminal-dependent color-change sequences, or keep the original ESC [ ... m sequences on the assumption that you're in a window that uses that form, and you'll have the output you want (which you can now pipe through less -R if desired).
A skeleton awk program that does what you want is:
/<desired regex>/ { handle matched line; next; }
{ print }
The hard part is the "handle matched line". awk's match() function sets RSTART and RLENGTH (these are part of standard awk, not only GNU awk), which helps a lot; see, e.g., this answer. The substring of the line from the beginning to RSTART-1 wasn't matched (this may be empty), and the substring from RSTART+RLENGTH to the end of the line (which may also be empty) was not matched either. The substring of $0 at RSTART for length RLENGTH was matched, and this is where you would strip out any color-changing sequences if you want your basic red (or whatever) applied throughout.
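One way the "handle matched line" step could look is sketched below. The function name highlight_match and the choice of plain red (ESC [ 31 m) are assumptions for illustration. It splits the line at RSTART and RLENGTH, strips color sequences from the matched part only, and wraps that part in its own color.

```shell
# Hypothetical helper: color the first match of an ERE on each line red.
highlight_match() {
  awk -v pat="$1" '
    BEGIN {
      esc = sprintf("%c", 27)          # the ESC character
      color_re = esc "\\[[0-9;]*m"     # matches sequences like ESC[31m or ESC[m
    }
    match($0, pat) {
      before  = substr($0, 1, RSTART - 1)
      matched = substr($0, RSTART, RLENGTH)
      after   = substr($0, RSTART + RLENGTH)
      gsub(color_re, "", matched)      # drop colors inside the matched part only
      printf "%s%s[31m%s%s[0m%s\n", before, esc, matched, esc, after
      next
    }
    { print }
  '
}

# Possible use (not run here): git lg1 | highlight_match '.*6 months.*' | less -R
```

Passing '.*6 months.*' makes the match span the whole line, so the entire line loses its original colors and turns red. A narrower pattern leaves the colors before and after the match in place.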
Sample script (by Robin Hellmers)
Creating a script and placing it where you please, e.g.
~/.local/bin/highlight-commit.awk
with the contents
#!/usr/bin/nawk -f
BEGIN {
    n = split(commits,arrayCommits," ");
    background="145;0;0"
    foreground="255;255;255"
}
{
    # Compare with every given input e.g. commit id
    for (i=1; i <= n; i++) {
        if(match($0,arrayCommits[i])) {
            # Remove any ANSI color escape sequence for matching row
            gsub("\x1b\\[[0-9;]*m","",$0)
            # Create ANSI color escape sequence for whole row
            $0 = sprintf("\x1b[48;2;%sm\x1b[38;2;%sm%s\x1b[0m\x1b[0m",
                         background,
                         foreground,
                         $0);
            break;
        }
    }
    printf("%s\n", $0);
}
In ~/.gitconfig, add the following alias:
[alias]
highlight-commit = "!f() { git lg | awk -v commits=\"$*\" -f ~/.local/bin/highlight-commit.awk | less -XR; }; f"
By calling with e.g. two commits:
git highlight-commit 82451f8 310fca4

Regex to match and split every third occurrence of a string

In a Korn shell script I have a large amount of data in a string variable contents that matches the following syntax:
account_id_0:group_id_0:name_0
account_id_1:group_id_1:name_1
...
account_id_N:group_id_N:name_N
I want to split each line on the : character so I can generate three other strings, accounts, groups, and names, that have the format:
accounts = account_id_0,account_id_1,...,account_id_N
groups = group_id_0,group_id_1,...,group_id_N
names = name_0,name_1,...,name_N
The reason I would like to store these in a string rather than an array is for portability across environments.
Am I able to achieve this using something like the sed, cut, or awk command?
The current regex I'm using to capture the accounts is:
[a-zA-Z][0-9]+(?:([a-zA-z]*[0-9]*)*)(?:([a-zA-Z]*[0-9]*)*)
But I feel there is a more efficient alternative.
I have attempted to achieve the desired output using a combination of this solution and this solution. However, the first one lacks the repetition I require, and the latter is for file manipulation, not strings.
I would use arrays, and process the contents variable like reading lines from a file:
contents='account_id_0:group_id_0:name_0
account_id_1:group_id_1:name_1
...:...:...
account_id_N:group_id_N:name_N'
as=()
gs=()
ns=()
while IFS=: read -r a g n; do
as+=("$a")
gs+=("$g")
ns+=("$n")
done <<< "$contents"
accounts=$(IFS=,; echo "${as[*]}")
groups=$(IFS=,; echo "${gs[*]}")
names=$(IFS=,; echo "${ns[*]}")
printf "%s\n" "$accounts" "$groups" "$names"
account_id_0,account_id_1,...,account_id_N
group_id_0,group_id_1,...,group_id_N
name_0,name_1,...,name_N
If you're getting the contents value from a file, you can skip the step of storing it in a variable and just read the file directly.
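Since the question prefers plain strings over arrays for portability, a possible array-free variant is sketched below using cut and paste. The helper name join_field is hypothetical and the sample data is shortened to two rows.

```shell
contents='account_id_0:group_id_0:name_0
account_id_1:group_id_1:name_1'

# Hypothetical helper: print field number $1 of every line, joined with commas.
join_field() {
  printf '%s\n' "$contents" | cut -d: -f"$1" | paste -sd, -
}

accounts=$(join_field 1)
groups=$(join_field 2)
names=$(join_field 3)
printf '%s\n' "$accounts" "$groups" "$names"
```

This reads the data three times, which is usually fine for a string held in a variable. The array version above does it in a single pass.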

How to parse csv output requiring multiple matches using one-liner?

I have a scenario where I have to post-process / filter values taken from a DB. I'm using perl -ple for the task. All works well until I come across extracted output (CSV) that contains multiple text tags. See sample here. The code works correctly (the regex extracts as expected) if there is just one text tag. In my DB there are instances where there is more than one text tag (i.e. rule conditions).
The code is
echo "COPY (SELECT rule_data FROM custom_rule) TO STDOUT with CSV HEADER" | psql -U qradar -o /tmp/Rules.csv qradar;
perl -ple '
($enabled) = /(?<=enabled="").*?(?="")/g;
($group) = /(?<=group="").*?(?="")/g;
($name) = /(?<=<name>).*?(?=<\/name>)/g;
($text) = /(?<=<text>).*?(?=<\/text>)/g;
$_= "$enabled;$group;$name;$text";
s/<.*?>//g;
' Rules.csv > rules_revised.csv
Just running the code on the sample output, I get the following content in the rules_revised.csv file.
true;Flow Property Tests;DoS: Local Flood (Other);when the flow bias
is any of the following outbound
The line is truncated after outbound, when in fact it should carry information similar to this:
when at least 3 flows are seen with the same Source IP,
Destination IP in 5 minutes and when the IP protocol is one of the
following IPSec, Uncommon and when the source packets is greater than
60000
I have tried to correct this by making the regex greedy, removing the ? in $text, but then it swallows all the text in between, up to the last text tag. The final s/<.*?>//g then messes up the rest, because the match now includes all the tag (i.e. HTML) elements that I originally intended to exclude before making the regex greedy.
The reason you are getting a truncated result with multiple matches is that you only store the first one.
($text) = /(?<=<text>).*?(?=<\/text>)/g;
This only stores the first match. If you change that scalar to an array, you will capture all matches:
(@text) = /(?<=<text>).*?(?=<\/text>)/g;
When you interpolate the array, it will insert spaces (the value of $") between the elements. If you do not want that, you can change the value of $" to an acceptable delimiter. To be clear, you would change two characters to get the following lines:
(@text) = /(?<=<text>).*?(?=<\/text>)/g;
...
$_= "$enabled;$group;$name;@text";
If I run your code on your sample with these changes the output looks like this:
false;Flow Property Tests;DoS: Local Flood (Other);when the flow bias is any of the following outbound when at least 3 flows are seen with the same Source IP, Destination IP in 5 minutes when the IP protocol is one of the following IPSec, Uncommon when the source packets is greater than 60000
Have you tried using the s modifier? It makes the dot match newlines:
perl -ple '
($enabled) = /(?<=enabled="").*?(?="")/g;
($group) = /(?<=group="").*?(?="")/g;
($name) = /(?<=<name>).*?(?=<\/name>)/g;
($text) = /(?<=<text>).*?(?=<\/text>)/gs;
# here ___^
$_= "$enabled;$group;$name;$text";
s/<.*?>//g;
' Rules.csv > rules_revised.csv

Using sed to erase field in bibtex entry

I'm faced with a text file containing multiple bibtex instances like this one
@article{Lindgren1989Resonant,
abstract = {Using a simple model potential, a truncated image barrier, for the
Al(111) surface, one obtains a resonant bound surface state at an energy
that agrees surprisingly well with recent observations by inverse
photoemission.},
author = {Lindgren and Walld\'{e}n, L.},
citeulike-article-id = {9286612},
citeulike-linkout-0 = {http://dx.doi.org/10.1103/PhysRevB.40.11546},
citeulike-linkout-1 = {http://adsabs.harvard.edu/cgi-bin/nph-bib\_query?bibcode=1989PhRvB..4011546L},
doi = {10.1103/PhysRevB.40.11546},
journal = {Phys. Rev. B},
keywords = {image-potential, surface-states},
month = dec,
pages = {11546--11548},
posted-at = {2011-05-12 11:42:49},
priority = {0},
title = {Resonant bound states for simple metal surfaces},
url = {http://dx.doi.org/10.1103/PhysRevB.40.11546},
volume = {40},
year = {1989}
}
I want to erase the abstract field, which can span one or multiple lines (as in the case above). I tried using sed in the following manner:
sed "/^\s*${field}.*=/,/},?$/{
d
}" file
where file is a text file containing the above bibtex code. However, the output of this command is just
@article{Lindgren1989Resonant,
Obviously sed is matching for the final }, but how do I get it to match the closing bracket of the abstract value?
This might work for you:
sed '1{h;d};H;${x;s/\s*abstract\s*=\s*{[^}]*}\+,//g;p};d' file
This slurps the whole file into the hold space, then deletes the abstract fields.
Explanation:
On the first line replace the hold space (HS) with the current line, append all subsequent lines to the HS. Upon encountering the last line, swap to the HS and substitute all occurrences of the abstract field then print the file out. N.B. all lines that would normally be printed out are deleted.
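The slurp-then-substitute steps can be tried on a cut-down entry, as in the sketch below. It assumes GNU sed (for \s and \+) and an abstract that contains no nested braces, since [^}]* stops at the first closing brace. The entry and the variable names bib and cleaned are invented for the example.

```shell
# A shortened, made-up BibTeX entry with a two-line abstract.
bib='@article{key,
  abstract = {first line of the abstract
    second line of the abstract},
  author = {Some Author},
  year = {1989}
}'

# Collect all lines in the hold space, then on the last line swap it in,
# delete the abstract field (including the newline before it), and print.
cleaned=$(printf '%s\n' "$bib" |
  sed '1{h;d};H;${x;s/\s*abstract\s*=\s*{[^}]*}\+,//g;p};d')
printf '%s\n' "$cleaned"
```

Because the leading \s* also consumes the newline and indentation in front of abstract, no empty line is left behind where the field used to be.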
Does this awk line work for you?
awk '/abstract *= *{/{a=1} (a && /} *,$/){a=0;next;}!a' yourInput
Addresses in sed match in a weird way:
addr2 can match BEFORE addr1 which is what you are experiencing with your expression! Use multiple blocks.

How would I parse in a bash script date_value _space_ date_value

I am trying to import a tsv file into a mysql db but I am having trouble since the file has no unique delimiters to identify where a new row starts. The only unique identifier is a date followed by a space followed by time. Example: 6/19/2010 16:04:43
Could someone please point me in the right direction or help me make a bash script that puts a semicolon ";" in front of that string. So the end result will be ;6/19/2010 16:04:43
The tricky part is that in this file there will be other date fields and other time fields but this is the only string that will have a space in between the two.
sed 's#[0-9]\{1,2\}/[0-9]\{1,2\}/[0-9]\{4\} #;&#g' file > resultfile
Test before using.
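If other date fields in the file can also be followed by a space, a stricter pattern that requires the time right after the date may be safer. The sketch below is one such variant. The helper name mark_rows and the sample line are invented, and it assumes times always look like H:MM:SS or HH:MM:SS.

```shell
# Hypothetical helper: put ";" in front of every "M/D/YYYY H:MM:SS" timestamp.
mark_rows() {
  sed 's#[0-9]\{1,2\}/[0-9]\{1,2\}/[0-9]\{4\} [0-9]\{1,2\}:[0-9]\{2\}:[0-9]\{2\}#;&#g'
}

# Only the date that is directly followed by a time gets the semicolon.
printf '%s\n' 'x 6/19/2010 16:04:43 y 7/1/2010 z 12:00:00' | mark_rows
# prints: x ;6/19/2010 16:04:43 y 7/1/2010 z 12:00:00
```

The & in the replacement stands for the whole matched text, so the timestamp itself is left unchanged.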