I have three files, each containing a list of entries, that I need to use later to apply a function.
I have managed to create a loop going through the 3 different files, but it does not produce exactly the output I need.
My input:
rw_sorted_test.txt
1
2
3
fwd_sorted_test.txt
A
B
C
run_list.txt
1st
2nd
3rd
I am running it like this:
for f in `cat rw_sorted_test.txt`; do for l in `cat fwd_sorted_test.txt`; do for r in `cat run_list.txt`; do echo ${f} ${l} ${r}; done; done; done;
What I obtain now is something like:
1 A 1st
1 A 2nd
1 A 3rd
2 A 1st
2 A 2nd
2 A 3rd
3 A 1st
(...)
What I am looking for is something like:
1 A 1st
2 B 2nd
3 C 3rd
I am sure it will be something simple, but I am really a beginner and all the workarounds I have tried have not worked.
Also, how can I then run a command with those values instead of just echoing them?
Thank you
Quick try, if this is what you need:
exec 4< run_list.txt          # open run_list.txt on file descriptor 4
exec 5< rw_sorted_test.txt    # open rw_sorted_test.txt on file descriptor 5
for a in $(cat fwd_sorted_test.txt); do
    read run <&4              # next line from run_list.txt
    read sort <&5             # next line from rw_sorted_test.txt
    echo "$sort $a $run"
done
...output is:
1 A 1st
2 B 2nd
3 C 3rd
Files should also be closed:
exec 4<&-
exec 5<&-
The whole point is to run a single loop and read one line at a time from the 3 different files. The files opened for input (exec ... < ...) should contain at least as many lines as the main file, which controls the loop.
Some reference can be found here: How do file descriptors work?
or by doing some study on bash file descriptors. Hope it helps.
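As for running something instead of just echoing: replace the echo line with the actual command and pass the three variables as arguments. A minimal sketch, where my_command is only a placeholder for whatever you actually want to run:
# instead of: echo "$sort $a $run"
my_command "$sort" "$a" "$run"
The quoting keeps each value as a single argument even if a line should ever contain spaces.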
You can use this as a test case for writing a program in awk:
3 input files, store the lines in an array and print everything in an END block.
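For example, a minimal awk sketch under the assumption that all three files have the same number of lines (the array name is just illustrative):
# remember every line keyed by file name and line number,
# then print the lines side by side in the END block
awk '{ line[FILENAME, FNR] = $0; if (FNR > max) max = FNR }
     END { for (i = 1; i <= max; i++)
               print line[ARGV[1], i], line[ARGV[2], i], line[ARGV[3], i] }' \
    rw_sorted_test.txt fwd_sorted_test.txt run_list.txt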
In this case you can also use another program that makes it even easier:
paste -d" " rw_sorted_test.txt fwd_sorted_test.txt run_list.txt
Having some trouble figuring out the command line for the following issue and hoping you can help!
Basically, I have a folder which contains ~1000 PDFs. I need to search through every PDF and return the file names of the PDFs that match certain words at least X times.
For example, I have 10 PDFs which all contain the word "Fragile". I would like to return a list of all files that contain "Fragile" a minimum of 3 times throughout the PDF.
I am currently using pdfgrep and giving it a regex to look for, but it returns all files that match at least once. I have seen a few recommendations out there for piping the command into "awk", but I'm not sure what that really does...
I don't know much about pdfgrep, but if the output is like the examples on https://pdfgrep.org/ it should be fairly easy to count the lines of output, doing something like:
for f in *.pdf; do if [ "$(pdfgrep -nHm 10 "Fragile" "$f" | wc -l)" -gt 2 ]; then echo "$f"; fi; done
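If your pdfgrep supports the -c option (printing only the number of matches per file), the wc -l step can be dropped; a sketch:
# -c should make pdfgrep print just the match count for the file
for f in *.pdf; do
    if [ "$(pdfgrep -c "Fragile" "$f")" -ge 3 ]; then echo "$f"; fi
done
Note that counting output lines, as in the first command, counts matching lines rather than individual occurrences, so it is worth checking both variants on a known file first.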
I have two files
crackedHashes.txt formatted as Hash:Password
C3B9FE4E0751FC204C29183910DB9EB4:fretful
CA022093C4BAFA397FAC5FB2E407FCA9:remarkable
36E13152AA93A7631608CD9DD753BD2A:please
hashList.txt formatted as Username:Hash
Frank:C3B9FE4E0751FC204C29183910DB9EB4
Jane:A67BC194586C11FD2F6672DE631A28E0
Lisa:CA022093C4BAFA397FAC5FB2E407FCA9
John:36E13152AA93A7631608CD9DD753BD2A
Dave:6606866DB8B0232B371C2C4C35B37D01
I want a new file that combines the two lists based on the same matching hash.
output.txt
Frank:C3B9FE4E0751FC204C29183910DB9EB4:fretful
Lisa:CA022093C4BAFA397FAC5FB2E407FCA9:remarkable
John:36E13152AA93A7631608CD9DD753BD2A:please
I've been scouring the forums here and can only find things that return one string or that don't use regex (matching the whole line). I've tried to do it in parts: I first broke up crackedHashes.txt with sed 's/:.*//' crackedHashes.txt, and then was going to do the same for the other file and compare by basically writing a bunch of outfiles and comparing them. I also tried comparing with a variation of grep -f crackedHashes.txt hashList.txt > outfile.txt, but that was yielding many more "results" than it was supposed to.
I could manually do grep <hash> hashList.txt, but when it comes to whole files and many lines I'm a bit lost.
With GNU join, bash and GNU sort:
join -1 1 -2 2 -t : <(sort crackedHashes.txt) <(sort -t : -k 2 hashList.txt) -o 2.1,1.1,1.2
Output:
John:36E13152AA93A7631608CD9DD753BD2A:please
Frank:C3B9FE4E0751FC204C29183910DB9EB4:fretful
Lisa:CA022093C4BAFA397FAC5FB2E407FCA9:remarkable
See: man join
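An alternative sketch with awk, which avoids the sorting and keeps the order of hashList.txt (which here happens to match the order shown in output.txt; the array name pw is just illustrative):
# first pass: remember the password for each cracked hash
# second pass: print hashList.txt lines whose hash has a password
awk -F: 'NR == FNR { pw[$1] = $2; next } $2 in pw { print $0 ":" pw[$2] }' \
    crackedHashes.txt hashList.txt > output.txt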
I am trying to create a script that dynamically finds line numbers in a .groovy config file and then uses the head/tail commands to insert multiple lines of code into the .groovy config file. I cannot hardcode line numbers into the script because the vendor may alter the config and the order of line numbers in the future. Does anybody have suggestions for the best way to accomplish this?
EX.)
1: This is line one
2: This is line two
Problem: I need to insert:
test {
test{
authenticationProvider =/random/path
}
}
I cannot hard-code the line numbers in sed because they may change in the future. How can I make sed dynamically find the appropriate line number and insert multiple lines of code in the proper format?
This should do it:
$ line_num=2; seq 5 | sed "${line_num}r insert"
1
2
test {
test{
authenticationProvider =/random/path
}
}
3
4
5
The text to be inserted is placed in the file named insert. Since there is no sample input file, I generated a sequence of 5 lines with seq as the input source.
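If the line number has to be found by content rather than hardcoded, sed also accepts a pattern address in place of the number. A minimal sketch, where the marker pattern and the file name are only placeholders for your actual config:
# /.../r reads the file "insert" after every line matching the pattern,
# so the marker should occur exactly once in the config
sed '/environments/r insert' config.groovy > config.groovy.new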
Assuming you can find the line number, you can do this fairly easily with a bash script:
file insert-lines.sh:
#!/bin/bash
MYLINE=$1    # number of the line after which the block is inserted
FILE=$2      # file to operate on
head -"$MYLINE" < "$FILE"
cat <<__END__
test {
test{
authenticationProvider =/random/path
}
}
__END__
tail +$((MYLINE+1)) "$FILE"
Then you can run this:
chmod 755 insert-lines.sh
./insert-lines.sh 3 .groovy > .groovy.new
mv .groovy.new .groovy
and the script will insert the block between lines 3 and 4 of the .groovy file.
Note that I'm assuming your tail supports the tail +n syntax, which outputs the rest of the file starting at line n. If yours does not, replace that line with the equivalent standard form tail -n +$((MYLINE+1)) "$FILE".
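To find the line number dynamically, grep -n can locate a marker line first. A sketch, where the marker pattern is only a placeholder for whatever line in the vendor config should precede the inserted block:
# grep -n prints "linenumber:line"; cut keeps just the number of the first match
line=$(grep -n 'environments' .groovy | head -n 1 | cut -d: -f1)
./insert-lines.sh "$line" .groovy > .groovy.new
mv .groovy.new .groovy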
I have a lot of relatively small files with about 350.000 lines of text.
For example:
File 1:
asdf
wetwert
ddghr
vbnd
...
sdfre
File 2:
erye
yren
asdf
jkdt
...
uory
As you can see line 3 of file 2 is a duplicate of line 1 in file 1.
I want a program / Notepad++ Plugin that can check and remove these duplicates in multiple files.
The next problem I have is that I want all lists to be combined into large 1.000.000 line files.
So, for example, I have these files:
648563 lines
375924 lines
487036 lines
I want them to result in these files:
1.000.000 lines
511.523 lines
And the last 2 files must consist of only unique lines.
How can I possibly do this? Can I use some programs for this? Or a combination of multiple Notepad++ Plugins?
I know GSplit can split files of 1.536.243 into files of 1.000.000 and 536.243 lines, but that is not enough, and it doesn't remove duplicates.
I do want to create my own Notepad++ plugin or program if needed, but I have no idea how and where to start.
Thanks in advance.
You have asked about Notepad++ and are thus using Windows. On the other hand, you said you want to create a program if needed, so I guess the main goal is to get the job done.
This answer uses Unix tools - on Windows, you can get those with Cygwin.
To run the commands, you have to type (or paste) them in the terminal / console.
cat file1 file2 file3 | sort -u | split -l1000000 - outfile_
cat reads the files and echoes them, normally to the screen; but the pipe | takes the output of the command on its left and passes it on to the command on its right.
sort obviously sorts them, and the switch -u tells it to remove duplicate lines.
The output is then piped to split, which is told to split after 1000000 lines by the switch -l1000000. The - (with spaces around it) tells it to read its input not from a file but from "standard input"; the output of sort -u in this case. The last word, outfile_, can be changed by you, if you want.
Written like it is, this will result in files like outfile_aa, outfile_ab and so on - you can modify this with the last word in this command.
If you have all the files in one directory, and nothing else is in there, you can use * instead of listing all the files:
cat * | sort -u | split -l1000000 - outfile_
If the files might contain empty lines, you might want to remove them. Otherwise, they'll be sorted to the top and your first file will not have the full 1.000.000 values:
cat file1 file2 file3 | grep -v '^\s*$' | sort -u | split -l1000000 - outfile_
This will also remove lines that consist only of whitespace.
grep filters input using regular expressions. -v inverts the filter; normally, grep keeps only lines that match, but now it keeps only lines that don't match. ^\s*$ matches all lines that consist of nothing but 0 or more whitespace characters (like spaces or tabs).
If you need to do this regularly, you can write a script so you don't have to remember the details:
#!/bin/sh
cat * | sort -u | split -l1000000 - outfile_
Save this as a file (for example combine.sh) and run it with
./combine.sh
My first post here, and beginner level. Is there a way I can solve this problem with sed (or any other means)? I want to manipulate a newly created file daily and replace some IP and port occurrences.
1) I want to replace the first occurrence of "5027,5028" with A3 and the second with A4.
2) I want to replace the first occurrence of "5026" with A1 and the second with A2.
PS. I have tried to simplify the example and left the preceding lines with version="y" or version="x", which could help to distinguish the occurrences from each other. (The first x and y version pair is a primary connection and the other two are the secondary connection.)
Input file:
version="x"
commaSeparatedList="5027,5028"`
version="y"
commaSeparatedList="5026"
version="x"
commaSeparatedList="5027,5028"
version="y"
commaSeparatedList="5026"
Edited file:
version="1.4.1-12"
commaSeparatedList="A3"
version="1.3.0"
commaSeparatedList="A1"
version="1.4.1-12"
commaSeparatedList="A4"
version="1.3.0"
commaSeparatedList="A2"
Sorry, I had some editing horror for a few minutes. Hope it looks easier to understand now. I am basically receiving this file on a system that is deployed nightly and I want to edit this file using a cron job before it starts to make sure a connection works.
Do not bother trying to use sed for this. It can be done, but sed is the wrong tool.
Use awk instead. To replace the first occurrence of "5027,5028" with A3 and the second with A4:
awk '/5027,5028/ && count < 2 { if( count ++ ) repl="A4"; else repl="A3";
sub( "5027,5028", repl)} 1' input
The second replacement is left as an exercise. It is basically the same thing; you can either run awk twice or just add additional clauses to the above (a sketch of the combined command is shown below).
To overwrite the original file, use shell redirections:
awk ... input > tmpfile && mv tmpfile input
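Putting both replacements together, a sketch of what a single pass might look like (same idea as above, just with two counters, and the file overwritten the same way):
awk '/5027,5028/ && c1 < 2 { sub("5027,5028", c1++ ? "A4" : "A3") }
     /5026/     && c2 < 2 { sub("5026",      c2++ ? "A2" : "A1") }
     1' input > tmpfile && mv tmpfile input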
This might work for you (GNU Sed):
sed '1,/5027,5028/s/5027,5028/A3/;s/5027,5028/A4/;1,/5026/s/5026/A1/;s/5026/A2/' file