Remove duplicates based on specific column - regex

I have a big text file in the following format on my Linux CentOS 7 machine.
430004, 331108, 075, 11, 19, Chunsuttiwat Nattika
431272, 331108, 075, 11, 19, Chunsuttiwat Nattika
435979, 335086, 803, 6, 19, ANNI BRENDA
436143, 335151, 545, 4, 23, Agrawal Abhishek
436723, 335387, 386, 2, 19, Bhati Naintara
438141, 325426, 145, 11, 19, Teh Joshua
I would like to remove every line whose second column appears more than once, including the original occurrence.
Expected Output:
435979, 335086, 803, 6, 19, ANNI BRENDA
436143, 335151, 545, 4, 23, Agrawal Abhishek
436723, 335387, 386, 2, 19, Bhati Naintara
438141, 325426, 145, 11, 19, Teh Joshua

Update:
sort + uniq + awk pipeline:
sort -k2,2 file | uniq -f1 -c -w7 | awk '$1==1{ sub(/[[:space:]]*[0-9]+[[:space:]]*/,"",$0); print}'
sort -k2,2 file - sort the file by the 2nd field
uniq -f1 -c -w7 - prefix each line with its occurrence count (-f1 skips the 1st field when comparing, -w7 compares no more than the first 7 characters after that, i.e. just the 2nd field)
awk '$1==1{ sub(/[[:space:]]*[0-9]+[[:space:]]*/,"",$0); print}' - keep only the lines that occur once ($1 is the count prepended by uniq) and strip that count prefix

Using awk
# Command
awk '{R[NR]=$0; $1=""; N[$0]++}                  # save each line; count keys formed by blanking the 1st field
END{for(i=1;i<=NR;i++){
  key=R[i]; sub(/^[[:digit:]]*, /,"",key);       # strip the 1st field from the saved line
  if(N[" "key]==1) print R[i]}}' filename
# Output
435979, 335086, 803, 6, 19, ANNI BRENDA
436143, 335151, 545, 4, 23, Agrawal Abhishek
436723, 335387, 386, 2, 19, Bhati Naintara
438141, 325426, 145, 11, 19, Teh Joshua

This two-pass awk is all you need - on the first pass (NR==FNR, i.e. while reading the file the first time) it counts each 2nd-column value, and on the second pass it prints only the lines whose value was seen exactly once:
$ awk 'NR==FNR{c[$2]++;next} c[$2]==1' file file
435979, 335086, 803, 6, 19, ANNI BRENDA
436143, 335151, 545, 4, 23, Agrawal Abhishek
436723, 335387, 386, 2, 19, Bhati Naintara
438141, 325426, 145, 11, 19, Teh Joshua

AWS CloudWatch Logs Insights, sum the key/value pairs in JSON

I need some help with a CloudWatch Logs Insights query that can sum up the key/value pairs in the JSON payloads in my logs.
fields @timestamp, @message | filter @message like /CTS/
After this, how do I parse the JSON and sum the key/values?
Logs look like this
2022-10-28 16:58:14,685 :INFO: CTS {'aa': 135, 'bb': 187, 'cc': 14, 'dd': 8, 'ee': 3, 'ff': 1} CTE
2022-10-28 16:49:11,397 :INFO: CTS {'aa': 101, 'bb': 153, 'gg': 11, 'ii': 17, 'jj': 2, 'pp': 1, 'zz': 5} CTE
....
...
..
I need to sum up the pairs across all the log lines to make a pie chart,
like aa: 236, bb: 340, .....
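Logs Insights cannot iterate over arbitrary JSON keys, but when the key names are known up front, one approach is to extract each key with a glob parse and aggregate with stats. A sketch, assuming only the keys aa and bb from the sample lines above (the patterns assume each key is followed by a comma, as in the samples; untested against your log group):
fields @timestamp, @message
| filter @message like /CTS/
| parse @message "'aa': *," as aa
| parse @message "'bb': *," as bb
| stats sum(aa) as total_aa, sum(bb) as total_bb
Lines that do not contain a given key leave that field blank, and blank values should simply drop out of sum().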

"how to fix MathJax linebreaking?"

I'm using a double backslash (\\) for line-breaking; the cursor does move to the next line, but a single backslash (\) is appended to my data.
This is the input I am giving:
Find the median of the given data: \\ 13, 16, 12, 14, 19, 12, 14, 13, 14
The output is:
Find the median of the given data: \13, 16, 12, 14, 19, 12, 14, 13, 14
A single backslash is appended to the data.
Try using \\\\. Your content management system may be treating \ as a special character, and that may turn \\ into \ in the resulting HTML. For example, Markdown usually does that.
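For example, if the text goes through a Markdown processor, the input
Find the median of the given data: \\\\ 13, 16, 12, 14, 19, 12, 14, 13, 14
comes out of the processor with \\ in the generated HTML, which MathJax then reads as the intended line break.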

Is there a way to automate the creation of a new text file that contains a dictionary from a different file?

I'm using Python 2.7
Here I create a set of nested lists (my calendar structure):
day0 = 0
day1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25]
month0 = 0
december = [day0, day1]
calendar = [month0, december]
Then what I want to do is this:
file = open("calendarScript.py", "w")
file.write(calendar) ## Trying to create the calendar in a new doc
file.close()
But I get this error:
TypeError: expected a string or other character buffer object
Is there a way to recreate a dictionary in a new document?
Thank you for your help :)
P.s., I just tried this:
import shutil
shutil.copy(calendar, newFolder)
And got back this error:
TypeError: coercing to Unicode: need string or buffer, list found
Trying to find a way to copy a dict to a new file.
The answer to my problem was "dump": what I was trying to do was dump the structure to a text file. Thanks to @KFL for this response (link below):
Writing a dict to txt file and reading it back?
>>> import json
>>> d = {"one":1, "two":2}
>>> json.dump(d, open("text.txt",'w'))
He also answered what was going to be my next problem:
>>> d2 = json.load(open("text.txt"))
>>> print d2
{u'two': 2, u'one': 1}
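A minimal sketch putting this together for the calendar structure above (Python 2.7; the file name calendar.txt is just an example):
import json

day0 = 0
day1 = range(26)             # [0, 1, ..., 25] - range() returns a plain list in Python 2
december = [day0, day1]
calendar = [0, december]     # month0 = 0

# json.dump serializes lists (and dicts) to text; plain file.write() only
# accepts strings, which is what raised the TypeError above
with open("calendar.txt", "w") as f:
    json.dump(calendar, f)

with open("calendar.txt") as f:
    print json.load(f)       # [0, [0, [0, 1, ..., 25]]]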

checking if two lists are equal in Maple

I've got the following lists :
list1:=[1, 5, 14, 30, 55, 91, 140, 204, 285, 385, 506, 650, 819, 1015,
1240, 1496, 1785, 2109, 2470, 2870]
list2:=[1, 5, 14, 30, 55, 91, 140, 204, 285, 385, 506, 650, 819, 1015,
1240, 1496, 1785, 2109, 2470, 2870]
each generated by a procedure I defined. I need to verify that they are equal, which is in fact the case. However, when I tried to use the evalb function, as well as a flag that I was updating during a loop, I got 'false' as the answer in both cases, along with the error message:
"error, final value in a for loop must be numeric or a character"
What am I doing wrong?
Maple will automatically resolve multiple copies of lists with identical entries to the same object. So to test equality, you don't even need to traverse the lists programmatically. You can just do:
evalb(list1=list2);
If however you'd like to do a more sophisticated comparison, you can use the verify command. For example, this will verify that the first list has the second list as a sublist:
verify([1, 2, 3, 4, 5], [2, 3, 4], superlist);
Calling verify with no verification option (i.e. no third argument) is equivalent to the evalb test above, e.g.:
verify(list1, list2);
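As for the error message: a likely cause is a non-numeric final value in the loop, e.g. for i to list1 do instead of for i to nops(list1) do. If you do want the explicit traversal, a sketch:
same := evalb(nops(list1) = nops(list2)):
for i from 1 to nops(list1) while same do
    if list1[i] <> list2[i] then same := false; end if;
end do:
same;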

input output in prolog

I want to read characters from a file in Prolog and place them in a list.
Could someone help me out with it?
Thanks
SWI-Prolog offers read_file_to_codes/3. Usage example:
?- read_file_to_codes('/etc/passwd', Codes, []).
Codes = [114, 111, 111, 116, 58, 120, 58, 48, 58|...].
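If you want a list of one-character atoms rather than character codes, one possible conversion (a sketch, assuming SWI-Prolog) goes via atom_codes/2 and atom_chars/2:
?- read_file_to_codes('/etc/passwd', Codes, []),
   atom_codes(Atom, Codes),      % build an atom from the code list
   atom_chars(Atom, Chars).      % split it into one-character atoms
Chars is then a list such as [r, o, o, t|...]. SWI-Prolog also provides read_file_to_string/3 if a string is more convenient.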