I have the following task:
match all lines which end with a number and then reverse these numbers
example:
romarana:qwerty12543
asdewfpwk:asdqwe312
asdj:asbd
asdewfpwk:strwtwe129
ooasodo:asbdjahj
should be:
romarana:qwerty34521
asdewfpwk:asdqwe213
asdj:asbd
asdewfpwk:strwtwe921
ooasodo:asbdjahj
What I tried:
sed -r "/[0-9]$/s/[0-9]{1,}$/$(rev <<< &)/" test.txt
NOTE: you can ignore lines that don't end with the number for now.
NOTE: You can use awk,grep or any other tool
With perl
$ perl -pe 's/\d+$/reverse $&/e' ip.txt
romarana:qwerty34521
asdewfpwk:asdqwe213
asdj:asbd
asdewfpwk:strwtwe921
ooasodo:asbdjahj
The e modifier allows Perl code to be used in the replacement section. $& contains the matched portion.
This can also be done with sed alone, by inserting a separator character (let's take the number sign) before the number and then repeatedly moving the line's last digit before the separator:
sed 's/\([0-9]*\)$/#\1/;:b;s/#\([0-9]*\)\([0-9]\)$/\2#\1/;tb;s/#$//'
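To see the separator technique at work, here is a minimal sketch that wraps the same substitutions in a shell function and traces one line in the comments. The function name reverse_digits_sed is hypothetical, and the script is split into -e pieces on the assumption that some sed implementations dislike a label followed by ; on the same line.

```sh
#!/bin/sh
# reverse_digits_sed: hypothetical wrapper name; reads lines from stdin.
# Trace for "qwerty12543":
#   qwerty#12543 -> qwerty3#1254 -> qwerty34#125 -> qwerty345#12
#   -> qwerty3452#1 -> qwerty34521# -> qwerty34521
reverse_digits_sed() {
    sed -e 's/\([0-9]*\)$/#\1/' \
        -e ':b' \
        -e 's/#\([0-9]*\)\([0-9]\)$/\2#\1/' \
        -e 'tb' \
        -e 's/#$//'
}

printf '%s\n' 'romarana:qwerty12543' 'asdj:asbd' | reverse_digits_sed
```

Lines without trailing digits briefly get a # appended, which the final substitution removes again, so they pass through unchanged.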
You can do this with an awk command, as in the following shell script:
#!/usr/bin/env sh
( echo romarana:qwerty12543
echo asdewfpwk:asdqwe312
echo asdj:asbd
echo asdewfpwk:strwtwe129
echo ooasodo:asbdjahj ) | awk '
/[0-9]+$/ { # Lines ending in digits.
num = txt = $0 # Divide into text and num.
gsub("[0-9]+$", "", txt)
num = substr(num, length(txt)+1)
revnum = "" # Build reversed number bit.
while (num != "") {
revnum = substr(num, 1, 1)""revnum
num = substr(num, 2)
}
print txt""revnum" (from "$0")" # Output text, reversed num.
next
}
{ print } # Not digits at end.
'
It's pretty verbose and could probably be reduced, but it does the job (you can drop the "(from ...)" output; it's only there so you can see it working):
pax:~> ./testprog.sh
romarana:qwerty34521 (from romarana:qwerty12543)
asdewfpwk:asdqwe213 (from asdewfpwk:asdqwe312)
asdj:asbd
asdewfpwk:strwtwe921 (from asdewfpwk:strwtwe129)
ooasodo:asbdjahj
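As for reducing it, one possible shape is sketched below for any POSIX awk: match() locates the trailing digits through RSTART and RLENGTH, and a loop walks them backwards. The function name reverse_digits_awk is made up for this illustration.

```sh
#!/bin/sh
# reverse_digits_awk: hypothetical helper name; reads lines from stdin.
reverse_digits_awk() {
    awk '
    match($0, /[0-9]+$/) {
        rev = ""
        # Walk the matched digits from last to first.
        for (i = RSTART + RLENGTH - 1; i >= RSTART; i--)
            rev = rev substr($0, i, 1)
        $0 = substr($0, 1, RSTART - 1) rev
    }
    { print }   # Print every line, changed or not.
    '
}

printf '%s\n' 'romarana:qwerty12543' 'asdj:asbd' | reverse_digits_awk
```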
With GNU awk, you could try the following.
awk '
match($0,/[0-9]+$/,a){
num=split(a[0],arr,"")
for(i=num;i>0;i--){
val=val arr[i]
}
print substr($0,1,RSTART-1) val
val=""
next
}
1
' Input_file
Output will be as follows.
romarana:qwerty34521
asdewfpwk:asdqwe213
asdj:asbd
asdewfpwk:strwtwe921
ooasodo:asbdjahj
With GNU awk for the 3rd arg to match() and null FS splitting $0 into chars:
$ awk -v FS= 'match($0,/(.*[^0-9])([0-9]+)$/,a) {
$0=a[2]; for (i=NF;i>=1;i--) a[1]=a[1] $i; $0=a[1]
} 1' file
romarana:qwerty34521
asdewfpwk:asdqwe213
asdj:asbd
asdewfpwk:strwtwe921
ooasodo:asbdjahj
I'm trying to emulate GNU grep -Eo with a standard awk call.
What the man says about the -o option is:
-o --only-matching
Print only the matched (non-empty) parts of matching lines, with each such part on a separate output line.
For now I have this code:
#!/bin/sh
regextract() {
[ "$#" -ge 2 ] || return 1
__regextract_ere=$1
shift
awk -v FS='^$' -v ERE="$__regextract_ere" '
{
while ( match($0,ERE) && RLENGTH > 0 ) {
print substr($0,RSTART,RLENGTH)
$0 = substr($0,RSTART+1)
}
}
' "$@"
}
My question is: In the case that the matching part is 0-length, do I need to continue trying to match the rest of the line or should I move to the next line (like I already do)? I can't find a sample of input+regex that would need the former but I feel like it might exist. Any idea?
Here's a POSIX awk version, which works with a* (or any POSIX awk regex):
echo abcaaaca |
awk -v regex='a*' '
{
while (match($0, regex)) {
if (RLENGTH) print substr($0, RSTART, RLENGTH)
$0 = substr($0, RSTART + (RLENGTH > 0 ? RLENGTH : 1))
if ($0 == "") break
}
}'
Prints:
a
aaa
a
POSIX awk and grep -E use POSIX extended regular expressions, except that awk allows C escapes (like \t) but grep -E does not. If you wanted strict compatibility you'd have to deal with that.
If you can consider a GNU awk solution, then using RS and RT may give behavior identical to grep -Eo.
# input data
cat file
FOO:TEST3:11
BAR:TEST2:39
BAZ:TEST0:20
Using grep -Eo:
grep -Eo '[[:alnum:]]+' file
FOO
TEST3
11
BAR
TEST2
39
BAZ
TEST0
20
Using gnu-awk with RS and RT using same regex:
awk -v RS='[[:alnum:]]+' 'RT != "" {print RT}' file
FOO
TEST3
11
BAR
TEST2
39
BAZ
TEST0
20
More examples:
grep -Eo '\<[[:digit:]]+' file
11
39
20
awk -v RS='\\<[[:digit:]]+' 'RT != "" {print RT}' file
11
39
20
Thanks to the various comments and answers I think that I have a working, robust, and (maybe) efficient code now:
tested on AIX/Solaris/FreeBSD/macOS/Linux
#!/bin/sh
regextract() {
[ "$#" -ge 1 ] || return 1
[ "$#" -eq 1 ] && set -- "$1" -
awk -v FS='^$' '
BEGIN {
ere = ARGV[1]
delete ARGV[1]
}
{
tail = $0
while ( tail != "" && match(tail,ere) ) {
if (RLENGTH) {
print substr(tail,RSTART,RLENGTH)
tail = substr(tail,RSTART+RLENGTH)
} else
tail = substr(tail,RSTART+1)
}
}
' "$@"
}
regextract "$@"
notes:
I pass the ERE string along with the file arguments so that awk doesn't pre-process it (thanks @anubhava for pointing that out); C-style escape sequences will still be translated by the regex engine of awk though (thanks @dan for pointing that out).
Because assigning $0 resets the values of all fields, I chose FS = '^$' to limit the overhead.
Copying $0 into a separate variable avoids the overhead induced by assigning $0 in the while loop (thanks @EdMorton for pointing that out).
a few examples:
# Multiple matches in a single line:
echo XfooXXbarXXX | regextract 'X*'
X
XX
XXX
# Passing the regex string to awk as a parameter versus a file argument:
echo '[a]' | regextract_as_awk_param '\[a]'
a
echo '[a]' | regextract '\[a]'
[a]
# The regex engine of awk translates C-style escape sequences:
printf '%s\n' '\t' | regextract '\t'
printf '%s\n' '\t' | regextract '\\t'
\t
Your code will malfunction for a regex that can match zero or more characters. Consider the following simple example. Let the content of file.txt be
1A2A3
then
grep -Eo A* file.txt
gives output
A
A
Your while condition is match($0,ERE) && RLENGTH > 0. In this case the former part is true, but the latter is false, because the match found is the zero-length one before the first character (RSTART is set to 1), so the body of the while loop runs zero times.
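The failure described above can be avoided by stepping one character past an empty match instead of leaving the loop. The following is a minimal sketch for POSIX awk; the helper name extract_matches is hypothetical, and the regex is passed with -v, so it assumes a pattern without backslash escapes.

```sh
#!/bin/sh
# extract_matches: hypothetical helper; $1 is an ERE, input comes from stdin.
extract_matches() {
    awk -v ere="$1" '
    {
        tail = $0
        while (tail != "" && match(tail, ere)) {
            if (RLENGTH > 0) {
                # Non-empty match: print it and continue after it.
                print substr(tail, RSTART, RLENGTH)
                tail = substr(tail, RSTART + RLENGTH)
            } else {
                # Empty match: skip one character and try again.
                tail = substr(tail, RSTART + 1)
            }
        }
    }'
}

printf '%s\n' '1A2A3' | extract_matches 'A*'
```

For the 1A2A3 input this prints A twice, the same as the grep -Eo output shown above.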
Using awk or sed in a bash script, I need to remove the comma delimiters located between the first and the last comma of each line. The problem is that values end up in the wrong columns, where only 3 columns are desired.
For example, I want to turn this:
2020/11/04,Test Account,569.00
2020/11/05,Test,Account,250.00
2020/11/05,More,Test,Accounts,225.00
Into this:
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
I've tried a few things, testing regexes, but I cannot find a way to select only the commas that need removing.
awk -F, '{ printf "%s,",$1;for (i=2;i<=NF-2;i++) { printf "%s ",$i };printf "%s,%s\n",$(NF-1),$NF }' file
Using awk, print the first comma-delimited field, then loop through the remaining fields up to the third-from-last, printing each followed by a space. Finally, print the second-to-last field, a comma, and the last field.
With GNU awk for the 3rd arg to match():
$ awk -v OFS=, '{
match($0,/([^,]*),(.*),([^,]*)/,a)
gsub(/,/," ",a[2])
print a[1], a[2], a[3]
}' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
or with any awk:
$ awk '
BEGIN { FS=OFS="," }
{
n = split($0,a)
gsub(/^[^,]*,|,[^,]*$/,"")
gsub(/,/," ")
print a[1], $0, a[n]
}
' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
Use this Perl one-liner:
perl -F',' -lane 'print join ",", $F[0], "@F[1 .. ($#F-1)]", $F[-1];' in.csv
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
$F[0] : first element of the array @F (= first comma-delimited value).
$F[-1] : last element of @F.
@F[1 .. ($#F-1)] : elements of @F between the second from the start and the second from the end, inclusive.
"@F[1 .. ($#F-1)]" : the above elements, joined on blanks into a string.
join ",", ... : join the LIST "..." on a comma, and return the resulting string.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perl -pe 's{,\K.*(?=,)}{$& =~ y/,/ /r}e' file
sed -e ':a' -e 's/\(,[^,]*\),\([^,]*,\)/\1 \2/; t a' file
awk '{$1=$1","; $NF=","$NF; gsub(/ *, */,","); print}' FS=, file
awk '{for (i=2; i<=NF; ++i) $i=(i>2 && i<NF ? " " : ",") $i} 1' FS=, OFS= file
awk doesn't support lookarounds, but the match function can stand in for them. You could try the following, written and tested with the shown samples in GNU awk.
awk '
match($0,/,.*,/){
val=substr($0,RSTART+1,RLENGTH-2)
gsub(/,/," ",val)
print substr($0,1,RSTART) val substr($0,RSTART+RLENGTH-1)
}
' Input_file
Yet another perl
$ perl -pe 's/(?:^[^,]*,|,[^,]*$)(*SKIP)(*F)|,/ /g' ip.txt
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
(?:^[^,]*,|,[^,]*$) matches first/last field along with the comma character
(*SKIP)(*F) this would prevent modification of preceding regexp
|, provide , as alternate regexp to be matched for modification
With sed (assuming \n is supported by the implementation, otherwise, you'll have to find a character that cannot be present in the input)
sed -E 's/,/\n/; s/,([^,]*)$/\n\1/; y/,/ /; y/\n/,/'
s/,/\n/; s/,([^,]*)$/\n\1/ replace first and last comma with newline character
y/,/ / replace all comma with space
y/\n/,/ change newlines back to comma
A similar answer to Timur's, in awk
awk '
BEGIN { FS = OFS = "," }
function join(start, stop, sep, str, i) {
str = $start
for (i = start + 1; i <= stop; i++) {
str = str sep $i
}
return str
}
{ print $1, join(2, NF-1, " "), $NF }
' file.csv
It's a shame awk doesn't ship with a join function builtin
How do I remove the second and later dots within each comma-separated value?
This is the closest I could get:
echo '0.592922148,0.821504176,1.174.129.731' | xargs -d ',' -n1 echo | sed 's/\([^\.]*\.[^\.]*\)\./\1/' | sed 's/\([^\.]*\.[^\.]*\)\./\1/'
Output :
0.592922148
0.821504176
1.174129731
Expected output :
0.592922148,0.821504176,1.174129731
You may use
sed -e ':a' -e 's/\(\.[^.,]*\)\./\1/' -e 't a'
See online sed demo:
s='0.592922148,0.821504176,1.174.129.731'
sed -e ':a' -e 's/\(\.[^.,]*\)\./\1/' -e 't a' <<< "$s"
Details
:a - label a
s/\(\.[^.,]*\)\./\1/ - finds and captures into Group 1 a dot, then any 0+ chars other than dot and comma, and then just matches a dot, and replaces this match with the value in Group 1 (thus, removing the second matched dot)
t a - if there was a successful replacement, jumps back to the a label in the script.
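The loop described in these details can be traced pass by pass. The sketch below wraps the same sed command in a function; the name strip_extra_dots is hypothetical, and the input is piped in with printf rather than a here-string so that it also runs under a plain sh.

```sh
#!/bin/sh
# strip_extra_dots: hypothetical wrapper name; reads lines from stdin.
# Passes for "1.174.129.731":
#   pass 1: ".174" followed by "." -> 1.174129.731
#   pass 2: ".174129" followed by "." -> 1.174129731
#   pass 3: no match, so "t a" does not branch and the line is printed.
strip_extra_dots() {
    sed -e ':a' -e 's/\(\.[^.,]*\)\./\1/' -e 't a'
}

printf '%s\n' '0.592922148,0.821504176,1.174.129.731' | strip_extra_dots
```

Because [^.,] excludes the comma, a match never spans two values, so the first dot of each value is left alone.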
While I think the sed solution is your best choice, since you have tagged your question with both sed and awk, an awk solution is fairly straightforward as well, using split() and basic string concatenation (just not nearly as short). For example, you could do:
awk -v OFS=, -F, '{
for (i=1; i<=NF; i++) {
n=split ($i, a,".")
if (n > 2) {
s=a[1] "." a[2]
for (j=3; j<=n; j++)
s = s a[j]
$i=s
}
}
}1'
Here the field separator and output field separator are both set to ','. Looping over each field, split() divides the field on '.' into array a and returns the number of elements. If that number is greater than 2, the first two elements are put back together, restoring the first '.' in the number, and the remaining elements are simply concatenated. The 1 at the end is the default "print record" action, which prints the updated record.
Example Use/Output
$ echo '0.592922148,0.821504176,1.174.129.731' |
> awk -v OFS=, -F, '{
> for (i=1; i<=NF; i++) {
> n=split ($i, a,".")
> if (n > 2) {
> s=a[1] "." a[2]
> for(j=3;j<=n;j++)
> s = s a[j]
> $i=s
> }
> }
> }1'
0.592922148,0.821504176,1.174129731
You could try the following.
echo '0.592922148,0.821504176,1.174.129.731' |
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
ind=index($i,".")
if(ind){
val1=substr($i,1,ind)
val2=substr($i,ind+1)
gsub(/\./,"",val2)
$i=val1 val2
}
}
val1=val2=""
}
1'
Explanation: a commented version of the above code.
echo '0.592922148,0.821504176,1.174.129.731' | ##Printing values as per OP mentioned and using pipe to send its output as standard input for awk command.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program here.
FS=OFS="," ##Setting FS and OFS as comma for each line of Input_file here.
} ##Closing BEGIN BLOCK here.
{
for(i=1;i<=NF;i++){ ##Starting a for loop to traverse through fields of line..
ind=index($i,".") ##Checking index of DOT in current field and saving it into ind variable.
if(ind){ ##Checking condition if variable ind is NOT NULL.
val1=substr($i,1,ind) ##Creating variable val1 from sub-string in current field from 1 to ind value.
val2=substr($i,ind+1) ##Creating variable val2 from sub-string in current field from ind+1 value to till complete length of current field.
gsub(/\./,"",val2) ##Globally substituting DOTs with NULL in val2 variable.
$i=val1 val2 ##Re-creating current field with value of val1 val2.
} ##Closing BLOCK for if condition.
} ##Closing BLOCK for for loop.
val1=val2="" ##Nullifying val1 and val2 variables here.
} ##Closing main code BLOCK here.
1' ##Mentioning 1 will print edited/non-edited line.
An awk version:
echo '0.592922148,0.821504176,1.174.129.731' | awk -F, '{for (i=1;i<=NF;i++) {sub(/\./,"#",$i);gsub(/\./,"",$i);sub(/#/,".",$i);print $i}}'
0.592922148
0.821504176
1.174129731
It splits the line into multiple fields on ,. Then it replaces the first . with #, removes the remaining dots, and finally turns # back into . and prints the field.
Edit
awk -F, '{for (i=1;i<=NF;i++) {sub(/\./,"#",$i);gsub(/\./,"",$i);sub(/#/,".",$i);a=(i==1?"":a ",")$i}print a}' file
0.592922148,0.821504176,1.174129731
I would like to reverse the complete text from the file.
Say if the file contains:
com.e.h/float
I want to get output as:
float/h.e.com
I have tried the command:
rev file.txt
but that reverses every character: taolf/h.e.moc
Is there a way I can get the desired output? Do let me know. Thank you.
Here is the link to the sample file: Sample Text
You can use sed and tac:
str=$(echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g' | tac | tr -d '\n')
echo "$str"
float/h.e.com
Using sed we insert \n before and after all non-word characters.
Using tac we reverse the output lines.
Using tr we strip all new lines.
If you have gnu-awk then you can do all this in a single awk command using 4 argument split function call that populates split strings and delimiters separately:
awk '{
s = ""
split($0, arr, /\W+/, seps)
for (i=length(arr); i>=1; i--)
s = s seps[i] arr[i]
print s
}' file
For non-gnu awk, you can use:
awk '{
r = $0
i = 0
while (match(r, /[^a-zA-Z0-9_]+/)) {
a[++i] = substr(r, RSTART, RLENGTH) substr(r, 1, RSTART-1)
r = substr(r, RSTART+RLENGTH)
}
s = r
for (j=i; j>=1; j--)
s = s a[j]
print s
}' file
Is it possible to use Perl?
perl -nlE 'say reverse(split("([/.])",$_))' f
This one-liner reverses all the lines of f, according to the OP's criteria.
If you prefer a version with fewer parentheses:
perl -nlE 'say reverse split "([/.])"' f
For portability, this can be done using any awk (not just GNU) using substrings:
$ awk '{
while (match($0,/[[:alnum:]]+/)) {
s=substr($0,RLENGTH+1,1) substr($0,1,RLENGTH) s;
$0=substr($0,RLENGTH+2)
} print s; s=""
}' <<<"com.e.h/float"
This steps through the string grabbing alphanumeric strings plus the following character, reversing the order of those two captured pieces, and prepending them to an output string.
Using GNU awk's four-argument split, splitting on the separators . and / (define more if you wish).
$ cat program.awk
{
for(n=split($0,a,"[./]",s); n>=1; n--) # split to a and s, use n from split
printf "%s%s", a[n], (n==1?ORS:s[(n-1)]) # printf it pretty
}
Run it:
$ echo com.e.h/float | awk -f program.awk
float/h.e.com
EDIT:
If you want to run it as one-liner:
awk '{for(n=split($0,a,"[./]",s); n>=1; n--) printf "%s%s", a[n], (n==1?ORS:s[(n-1)])}' foo.txt
I have an index HTML file with file/dir listing. It is just a usual filebrowser like :
...content here...
<td>20120011/</td>
<td>20120111/</td>
<td>20120211/</td>
<td>20120411/</td>
...content here...
I don't understand how to extract the 2nd line from the bottom.
1) I downloaded HTML with curl
content=$(curl -sL "http://path-to-html")
2) then used
dir=$(echo $content | sed '/.*href="\([0-9]*\/\)".*/!d;s//\1/;q')
which gives me the last match : 20120411.
But how to get the previous one ?
I don't know the total count of items.
This awk program will print the penultimate line:
echo "${content}" | awk '{ pen = ult; ult = $0 } END { print pen }'
This will print the penultimate matching line:
echo "${content}" | awk '/href="([0-9]{8}\/)"/ { pen = ult; ult = $0 } END { print pen }'
If you just want to extract the first capture group:
echo "${content}" | awk 'match($0, /href="([0-9]{8}\/)"/, a) { pen = ult; ult = a[1] } END { print pen }'
Putting it all together:
bash-4.2$ dir=$(curl -sL http://www.arteetmarte.no/tmp/index.html |
awk 'match($0, /href="([0-9]{8}\/)"/, a) {
pen = ult
ult = a[1]
}
END {
print pen
}
')
bash-4.2$ echo ${dir}
20130918/
Tested with: GNU Awk 4.1.0, API: 1.0
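The capture-group variants above depend on GNU awk's third argument to match(). A rough equivalent for any POSIX awk is sketched below, using RSTART and RLENGTH instead. The function name penultimate_dir and the sample markup are assumptions for illustration; it expects each link on its own line in the form href="20120411/".

```sh
#!/bin/sh
# penultimate_dir: hypothetical helper; reads HTML lines from stdin.
penultimate_dir() {
    awk '
    match($0, /href="[0-9]+\/"/) {
        pen = ult
        # Skip the 6 characters of href=" and drop the closing quote.
        ult = substr($0, RSTART + 6, RLENGTH - 7)
    }
    END { print pen }
    '
}

# Assumed sample input, one link per line.
printf '%s\n' \
    '<td><a href="20120211/">20120211/</a></td>' \
    '<td><a href="20120411/">20120411/</a></td>' | penultimate_dir
```

As with the original, the input has to keep its line breaks, so a variable holding the page must be quoted when echoed.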
Maybe a bit easier with awk:
dir=$(echo "$content"|awk '/href=/{x=p;p=$0}END{sub(/.*">/,"",x);sub(/<.*/, "",x); print x}')
dir=$(echo "$content" | sed -n '/href="\([0-9]\{1,\}\/\)"/ {s|.*href="\([0-9]\{1,\}/\)".*|-\1-|;H;}
$ {x;l;s|.*-\([0-9]\{1,\}/\)-\(\n-[0-9]\{1,\}/-\)\{1\}$|\1|p;}')
The 1 in \{1\}$ specifies how many lines must be skipped from the end.