How to sort textfile text in this way? [duplicate] - regex

This question already has answers here:
How to merge every two lines into one from the command line?
(21 answers)
Closed 3 years ago.
I have a text file containing the following:
#_.5_sh
#handa12247
#lshydymhmwd
#ahmr0784
#f7j.i
#carameljeddah
#lnqm_iii2
#raghad.ayman.524
#asfhfdfgt4355
#kuw871
#nouralhuda_muhammad
#gogo56817gma
#kaald10000
#sal_0221
#kaled_24009165
#km_kn124
#princess.hana89
#fulefulemm
#norah.0._
#ommajed965
#lam3aastar
#alimarar265
#klthmlmdy
#anas.sasan55
#s.m_b.b
#asnosy_almgrhe_
#norh7132
#880ali7
#tv.creativity
#ksakking3
I'd like to rearrange them so that there are 5 users on each line:
#_.5_sh #handa12247 #lshydymhmwd #ahmr0784 #f7j.i
#carameljeddah #lnqm_iii2 #raghad.ayman.524 #asfhfdfgt4355 #kuw871
#nouralhuda_muhammad #gogo56817gma #kaald10000 #sal_0221 #kaled_24009165
#km_kn124 #princess.hana89 #fulefulemm #norah.0._ #ommajed965
#lam3aastar #alimarar265 #klthmlmdy #anas.sasan55 #s.m_b.b
#asnosy_almgrhe_ #norh7132 #880ali7 #tv.creativity #ksakking3
I've played around with seq and awk without success. I'd appreciate help from anyone who can show me how to arrange the text this way.

Using awk
awk 'ORS=NR%5?FS:RS' file
#_.5_sh #handa12247 #lshydymhmwd #ahmr0784 #f7j.i
#carameljeddah #lnqm_iii2 #raghad.ayman.524 #asfhfdfgt4355 #kuw871
#nouralhuda_muhammad #gogo56817gma #kaald10000 #sal_0221 #kaled_24009165
#km_kn124 #princess.hana89 #fulefulemm #norah.0._ #ommajed965
#lam3aastar #alimarar265 #klthmlmdy #anas.sasan55 #s.m_b.b
#asnosy_almgrhe_ #norh7132 #880ali7 #tv.creativity #ksakking3
It changes the Output Record Separator (ORS) on every line: within a group it becomes the field separator (a space), and on every fifth line it becomes the record separator (a newline).
Edit: This does not work on a DOS-format file, so run dos2unix yourfile before awk.
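If awk is not an option, here are two alternative sketches with common coreutils (my suggestions, not part of the answer above), assuming the file is named file and contains exactly one user per line with no blank lines:
paste -d' ' - - - - - < file    # paste joins every 5 consecutive input lines with spaces
xargs -n5 < file                # xargs prints 5 whitespace-separated tokens per output line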


how to pick up to 2 or 3 decimals with lists using python2? [duplicate]

This question already has answers here:
How to round to 2 decimals with Python? [duplicate]
(21 answers)
Closed 3 years ago.
I am reading data from a robot's sensor. It gives this data as a list:
data:[1.0014142543, 0.4142543254, 4.5432544179]
I'd like to convert it like this format:
data:[1.00, 0.41, 4.54]
I am running it with Python 2.
I think this post covers it for a single number.
Any ideas or suggestions would be highly appreciated.
Considering data as a list,
Using round:
>>> data = [1.0014142543, 0.4142543254, 4.5432544179]
>>> [round(item, 2) for item in data]
Using float and format:
>>> [float(format(item, ".2f")) for item in data]
Output:
[1.0, 0.41, 4.54]

How to format first 7 rows in this txt file using Regex

I have a text file with data formatted as below. I figured out how to format the second part of the file for upload into a db table, but I'm hitting a wall trying to get just the first 7 lines formatted the same way.
If it wasn't obvious, I'm trying to get it pipe-delimited with the exact same number of columns, so I can easily upload it to the db.
Year: 2019 Period: 03
Office: NY
Dept: Sales
Acct: 111222333
SubAcct: 11122234-8
blahblahblahblahblahblahblah
Status: Pending
1000
AAAAAAAAAA
100,000.00
2000
BBBBBBBBBB
200,000.00
3000
CCCCCCCCCC
300,000.00
4000
DDDDDDDDDD
400,000.00
Some kind folks answered my question about the bottom part; using the following regex I can format it to look like this:
(.*)\r?\n(.*)\r?\n(.*)(?:\r?\n|$)
substitute with |||||||$1|$2|$3\n
|||||||1000|AAAAAAAAAA|100,000.00
|||||||2000|BBBBBBBBBB|200,000.00
|||||||3000|CCCCCCCCCC|300,000.00
|||||||4000|DDDDDDDDDD|400,000.00
I just need help formatting the top part to look like the line below, so the entire file ends up with the exact same number of columns.
Year: 2019|Period: 03|Office: NY|Dept: Sales|Acct: 111222333|SubAcct: 11122234-8|blahblahblahblahblahblahblah|Status: Pending|||
I'm ok with having multiple passes on the file to get the desired end result.
I've helped you on your previous question, so I will focus now on the first part of your file.
You can use this regex:
\n|\b(?=Period)
Working demo
And use | as the replacement string
If you don't want to keep the space before Period, then you can use:
\n|\s(?=Period)
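For what it's worth, here is a rough one-pass sketch of the same idea in awk (my illustration, not part of the answer above), assuming the header is always exactly the first 7 lines, the rest of the file is 3-line records as shown, and the file is named file.txt:
awk '
  NR <= 7 {                                   # header: first 7 lines
    sub(/ Period/, "|Period")                 # split "Year: 2019 Period: 03" into two fields
    hdr = hdr $0 ((NR < 7) ? "|" : "|||")     # join with pipes, pad the end with |||
    if (NR == 7) print hdr
    next
  }
  {                                           # body: turn each 3-line record into one row
    rec = rec "|" $0
    if ((NR - 7) % 3 == 0) { print "||||||" rec; rec = "" }
  }
' file.txt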

Extracting part of lines with specific pattern and sum the digits using bash

I am just learning bash scripting and commands and I need some help with this assignment.
I have a txt file that contains the following text and I need to:
Extract the guest name (1.1.1, ...)
Sum each guest's results and output the guest name with the total.
I used sed with a simple regex to extract the name and the digits, but I have no idea how to sum the numbers, because a guest can have multiple line records, as you can see in the txt file. Note: I can't use awk for the processing.
Here is my code:
cat file.txt | sed -E 's/.*([0-9]{1}.[0-9]{1}.[0-9]{1}).*([0-9]{1})/\1 \2/'
And result is:
1.1.1 4
2.2.2 2
1.1.1 1
3.3.3 1
2.2.2 1
Here is the .txt file:
Guest 1.1.1 have "4
Guest 2.2.2 have "2
Guest 1.1.1 have "1
Guest 3.3.3 have "1
Guest 2.2.2 have "1
and the output should be:
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
Thank you in advance
I know your teacher won't let you use awk but, since beyond this one exercise you're trying to learn how to write shell scripts, FYI here's how you'd really do this job in a shell script:
$ awk -F'[ "]' -v OFS=' = ' '{sum[$2]+=$NF} END{for (id in sum) print id, sum[id]}' file
3.3.3 = 1
2.2.2 = 3
1.1.1 = 5
and here's a bash builtins equivalent which may or may not be what you've covered in class and so may or may not be what your teacher is expecting:
$ cat tst.sh
#!/bin/env bash
declare -A sum
while read -r _ id _ cnt; do
    (( sum[$id] += ${cnt#\"} ))
done < "$1"
for id in "${!sum[@]}"; do
    printf '%s = %d\n' "$id" "${sum[$id]}"
done
$ ./tst.sh file
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
See https://www.artificialworlds.net/blog/2012/10/17/bash-associative-array-examples/ for how I'm using the associative array. It'll be orders of magnitude slower than the awk script, and I'm not 100% sure it's bullet-proof (since shell isn't designed to process text there are a LOT of caveats and pitfalls), but it'll work for the input you provided.
OK -- since this is a class assignment, I will tell you how I did it, and let you write the code.
First, I sorted the file. Then, I read the file one line at a time. If the name changed, I printed out the previous name and count, and set the count to be the value on that line. If the name did not change, I added the value to the count.
The second solution used an associative array to hold the counts, using the guest name as the index. Then you just add the new value to the count in the array element indexed on the guest name.
At the end, loop through the array, print out the indexes and values.
It's a lot shorter.
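Since the associative-array variant is already shown above, here is a minimal sketch of the first (sort, then compare adjacent names) approach; this is my illustration, not the answerer's code, and it assumes the input lines look exactly like the sample (Guest 1.1.1 have "4) in a file named file.txt:
#!/usr/bin/env bash
prev="" total=0
while read -r _ id _ cnt; do
    cnt=${cnt#\"}                        # strip the leading double quote
    if [[ -n $prev && $id != "$prev" ]]; then
        echo "$prev = $total"            # name changed: print the finished guest
        total=0
    fi
    prev=$id
    (( total += cnt ))
done < <(sort -k2,2 file.txt)            # sort by the guest-name field first
[[ -n $prev ]] && echo "$prev = $total"  # flush the last guest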

CLI method for vlookup like search [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I have a huge csv file, demo.csv (a few GBs in size), which has 3 columns like the following:
$ cat demo.csv
call_start_time,called_no,calling_no
43284.85326,1111111111,2222222222
43284.83192,3333333333,1111111111
43284.83205,2222222222,1111111111
43284.81304,4444444444,3333333333
I am trying to find the rows that have the same pair of values in columns 2 and 3, in either order. For example, this should be the output for the data shown above:
call_start_time,called_no,calling_no
43284.85326,1111111111,2222222222
43284.83205,2222222222,1111111111
I tried to use csvkit:
csvsql --query "select called_no, calling_no, call_start_time, count(1) from file123 group by called_no,calling_no having count(1)>1" file123.csv > new.csv
With awk you can build an associative array a with records as values, and keys k built from the fields $2 and $3 sorted and joined with a pipe.
awk -F, 'NR==1; { k=($3<$2) ? $3"|"$2 : $2"|"$3; if (a[k]) { if (a[k]!="#") {print a[k];a[k]="#"} print} else a[k]=$0}' file
If the current record has a key that already exists, the stored record is printed (only the first time) and the current record is printed too. For the sample data, both 43284.85326,1111111111,2222222222 and 43284.83205,2222222222,1111111111 produce the key 1111111111|2222222222, which is why they are reported as a pair regardless of which column each number is in.
$ awk -F, '
NR==1 { print; next }
{ key = ($2>$3 ? $2 FS $3 : $3 FS $2) }
seen[key]++ { print orig[key] $0; delete orig[key]; next }
{ orig[key] = $0 ORS }
' file
call_start_time,called_no,calling_no
43284.85326,1111111111,2222222222
43284.83205,2222222222,1111111111

how to use grep to parse out columns in csv

I have a log with millions of lines that look like this:
1482364800 bunch of stuff 172.169.49.138 252377 + many other things
1482364808 bunch of stuff 128.169.49.111 131177 + many other things
1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things
1482364900 bunch of stuff 128.169.49.112 849231 + many other things
1482364940 bunch of stuff 128.169.49.218 623423 + many other things
It's so big that I can't really read it into memory for Python to parse, so I want to zgrep out only the items I need into another, smaller file, but I'm not very good with grep. In Python I would normally gzip.open(log.gz) and then pull out data[0], data[4], data[5] into a new file, so my new file only has the epoch and IP and date (the IP can be IPv6 or IPv4).
expected result of the new file:
1482364800 172.169.49.138 252377
1482364808 128.169.49.111 131177
1482364810 2001:db8:0:0:0:0:2:1 124322
1482364900 128.169.49.112 849231
1482364940 128.169.49.218 623423
How do I do this zgrep?
Thanks
To select columns you have to use the cut command; zgrep/grep select lines.
So you can use the cut command like this:
cut -d' ' -f1,2,4
In this example I get columns 1, 2 and 4, with a space ' ' as the column delimiter.
You should know that the -f option is used to specify the column numbers and -d the delimiter.
I hope that I have answered your question.
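Putting that together for the sample log above, a sketch like this should work (assuming the file is named log.gz and the fields really are separated by single spaces, so the epoch, IP and count are fields 1, 4 and 5):
gzip -dc log.gz | cut -d' ' -f1,4,5 > smaller.log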
I'm on OS X and maybe that is the issue, but I couldn't get zgrep to filter out columns, and zcat kept adding a .Z at the end of the .gz. Here's what I ended up doing:
awk '{print $1,$3,$4}' <(gzip -dc /path/to/source/Largefile.log.gz) | gzip > /path/to/output/Smallfile.log.gz
This let me filter out the 3 columns I needed from the Largefile to a Smallfile while keeping both the source and destination in compressed format.
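For what it's worth, the .Z complaint is a known quirk: the stock zcat on OS X comes from compress(1) and appends .Z to the filename it's given, while gzcat (or gzip -dc, as used above) handles .gz files directly. A hypothetical equivalent of the same pipeline using gzcat:
gzcat /path/to/source/Largefile.log.gz | awk '{print $1,$3,$4}' | gzip > /path/to/output/Smallfile.log.gz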