How to use 'sed' to automate changes to a config file? - regex

I am trying to create a script that dynamically finds line numbers in a .groovy config file and then uses the head/tail commands to insert multiple lines of code into the .groovy config file. I cannot hardcode line numbers into the script because the vendor may alter the config and the order of line numbers in the future. Does anybody have suggestions for the best way to accomplish this?
EX.)
1: This is line one
2: This is line two
Problem: I need to insert:
test {
test{
authenticationProvider =/random/path
}
}
I cannot hard code the line numbers in sed because they may change in the future. How can I dynamically make sed find the appropriate line number and insert multiple lines of code in the proper format?

This should do it:
$ line_num=2; seq 5 | sed "${line_num}r insert"
1
2
test {
test{
authenticationProvider =/random/path
}
}
3
4
5
The text to be inserted is placed in a file named insert. Since there is no sample input file, I generated a sequence of 5 lines as the input source.
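To find the line number dynamically rather than hardcoding it, you could derive it from a pattern. A minimal sketch, assuming 'anchor-pattern' is whatever text marks the insertion point in your config and config.groovy is the file name (both names are placeholders, not from the question):
# grep -n prints "N:line"; cut keeps just the line number N
line_num=$(grep -n 'anchor-pattern' config.groovy | head -n1 | cut -d: -f1)
sed "${line_num}r insert" config.groovy > config.groovy.new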

Assuming you can find the line number, you could do this fairly easily with a bash script:
file insert-lines.sh:
#!/bin/bash
MYLINE=$1
FILE=$2
# Print everything up to and including the insertion line.
head -n "$MYLINE" "$FILE"
cat <<__END__
test {
test{
authenticationProvider =/random/path
}
}
__END__
# Print the rest of the file, starting at the line after the insertion point.
tail -n +$((MYLINE+1)) "$FILE"
Then you can run this:
chmod 755 insert-lines.sh
./insert-lines.sh 3 .groovy > .groovy.new
mv .groovy.new .groovy
and the script will insert the block between lines 3 and 4 of the .groovy file.
Note that I'm assuming a tail which supports the tail -n +N syntax, which outputs the end of the file starting at line N (GNU coreutils and most modern systems do). You'll have to replace that part of the code if your version of tail does not support it.
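For the "find the line number" part, one possibility is to let awk locate a known marker; a sketch under the assumption that some pattern such as 'anchor-pattern' (a placeholder) reliably marks the insertion point:
# Print the number of the first line matching the marker, then stop.
line=$(awk '/anchor-pattern/{print NR; exit}' .groovy)
./insert-lines.sh "$line" .groovy > .groovy.new && mv .groovy.new .groovy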

Related

How can I combine multiple text files, remove duplicate lines and split the remaining lines into several files of certain length?

I have a lot of relatively small files with about 350.000 lines of text.
For example:
File 1:
asdf
wetwert
ddghr
vbnd
...
sdfre
File 2:
erye
yren
asdf
jkdt
...
uory
As you can see line 3 of file 2 is a duplicate of line 1 in file 1.
I want a program / Notepad++ Plugin that can check and remove these duplicates in multiple files.
The next problem I have is that I want all lists combined into large files of 1.000.000 lines each.
So, for example, I have these files:
648563 lines
375924 lines
487036 lines
I want them to result in these files:
1.000.000 lines
511.523 lines
And the last 2 files must consist of only unique lines.
How can I possibly do this? Can I use some programs for this? Or a combination of multiple Notepad++ Plugins?
I know GSplit can split files of 1.536.243 into files of 1.000.000 and 536.243 lines, but that is not enough, and it doesn't remove duplicates.
I do want to create my own Notepad++ plugin or program if needed, but I have no idea how and where to start.
Thanks in advance.
You have asked about Notepad++ and are thus using Windows. On the other hand, you said you want to create a program if needed, so I guess the main goal is to get the job done.
This answer uses Unix tools - on Windows, you can get those with Cygwin.
To run the commands, you have to type (or paste) them in the terminal / console.
cat file1 file2 file3 | sort -u | split -l1000000 - outfile_
cat reads the files and echoes them, normally to the screen; but the pipe | takes the output of the command to its left and feeds it to the command on its right.
sort obviously sorts them, and the switch -u tells it to remove duplicate lines.
The output is then piped to split, which is told to split after 1000000 lines by the switch -l1000000. The - (with spaces around it) tells it to read its input not from a file but from "standard input"; the output of sort -u in this case. The last word, outfile_, can be changed by you, if you want.
Written as it is, this will result in files like outfile_aa, outfile_ab and so on - you can modify this with the last word in this command.
If you have all the files in one directory, and nothing else is in there, you can use * instead of listing all the files:
cat * | sort -u | split -l1000000 - outfile_
If the files might contain empty lines, you might want to remove them. Otherwise, they'll be sorted to the top and your first file will not have the full 1.000.000 values:
cat file1 file2 file3 | grep -v '^\s*$' | sort -u | split -l1000000 - outfile_
This will also remove lines that consist only of whitespace.
grep filters input using regular expressions. -v inverts the filter; normally, grep keeps only lines that match. Now, it keeps only lines that don't match. ^\s*$ matches all lines that consist of nothing else than 0 or more characters of whitespace (like spaces or tabs).
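To sanity-check the result, you can count the lines of each generated file; the combined total should equal the number of unique lines across the inputs:
wc -l outfile_*
sort -u file1 file2 file3 | wc -l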
If you need to do this regularly, you can write a script so you don't have to remember the details:
#!/bin/sh
cat * | sort -u | split -l1000000 - outfile_
Save this as a file (for example combine.sh) and run it with
./combine.sh

Remove lines from a file which has a matching regex from another file [duplicate]

This question already has answers here:
How to remove the lines which appear on file B from another file A?
(12 answers)
Closed 7 years ago.
I have this shell script:
AVAIL_REMOVAL=$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt) | sed -i "/$AVAIL_REMOVAL/d" $HOME/dcheck/files/domains.txt
$HOME/dcheck/files/available.txt
unregistereddomain1.com available 15/12/28_14:05:27
unregistereddomain3.com available 15/12/28_14:05:28
$HOME/dcheck/files/domains.txt
unregistereddomain1
registereddomain2
unregistereddomain3
I want to remove unregistereddomain1 and unregistereddomain3 lines from domains.txt. How is it possible?
Also, is there a faster solution than grep? This benchmark showed that grep needed the most time to execute: Deleting lines from one file which are in another file
EDIT:
This works with one line files, but not multiline:
sed -i "/$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt)/d" $HOME/dcheck/files/domains.txt
EDIT 2:
Just copying it here to have a backup. This solution is needed for a domain-checker bash script: if it terminates for some reason, then at the next restart it will remove the already-checked lines from the input file:
grep -oPa --no-filename '^.*(?=(\.com))' $AVAILABLE $REGISTERED > $GREPINPUT \
&& awk 'FNR==NR { a[$0]; next } !($0 in a)' $GREPINPUT $DOMAINS > $DOMAINSDIFF \
&& cat $DOMAINSDIFF > $DOMAINS \
&& rm -rf $GREPINPUT $DOMAINSDIFF
Most of the domain-checker scripts here try to solve this removal at the end of the script. But what they do not think about is what happens when the script stops running and there is no graceful shutdown. Then it will check every single line from the input file again, including the ones that were already checked... This one solves that problem. This way the script (with proper service management, like docker-compose, systemd, supervisord) can run for years on list files millions of lines in size, until it has completely eaten up the input file!
from man grep:
-f file
--file=file
Obtain patterns from file, one per line. The empty file contains
zero patterns, and therefore matches nothing. (-f is specified by POSIX.)
Regarding the speed: depending on the regexp, the performance may differ drastically. The one you use seems /suspicious/. Fixed-string matches are almost always the fastest.
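Putting that together with the question's extraction step, a minimal sketch (file paths taken from the question; /tmp/patterns.txt is a scratch file of my own naming):
grep -oPa '^.*(?=\.com)' "$HOME/dcheck/files/available.txt" > /tmp/patterns.txt
# -f reads patterns from a file; -F (fixed strings) keeps it fast,
# -x matches whole lines only, -v inverts the match.
grep -vxFf /tmp/patterns.txt "$HOME/dcheck/files/domains.txt" > /tmp/domains.new \
  && mv /tmp/domains.new "$HOME/dcheck/files/domains.txt"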

How to replace using sed command in shell scripting to replace a string from a txt file present in one directory by another?

I am very new to shell scripting and trying to learn the "sed" command functionality.
I have a file called configurations.txt with some variables defined in it and string values assigned to each of them.
I am trying to replace strings in a file (values.txt) present in some other directory with the values of the variables defined in configurations.txt.
Data present in configurations.txt:-
mem="cpu.memory=4G"
proc="cpu.processor=Intel"
Data present in the values.txt (present in /home/cpu/script):-
cpu.memory=1G
cpu.processor=Dell
I am trying to make a shell script called repl.sh; I don't have a lot of code in it for now, but here is what I've got:
#!/bin/bash
source /home/configurations.txt
sed <need some help here>
Expected output: after an appropriate regex is applied and I run the script with sh repl.sh, values.txt must contain the following data:
cpu.memory=4G
cpu.processor=Intel
(These were originally 1G and Dell.)
Would highly appreciate some quick help. Thanks
This question lacks any attempt at an abstract routine and reads like "help me do something concrete, please". Thus it's very unlikely that anyone will provide a full solution to the problem.
What you should do is try to split this task into a number of small pieces.
1) Iterate over configurations.txt and get the values from each line. To do that you need to get X and Y from a value="X=Y" string.
This regex could be helpful here - ([^=]+)=\"([^=]+)=([^=]+)\". It contains 3 matching groups: the variable name, the key, and the value. For example,
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\1/' configurations.txt
mem
proc
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\2/' configurations.txt
cpu.memory
cpu.processor
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\3/' configurations.txt
4G
Intel
2) For each X and Y, find X=Z in values.txt and substitute it with X=Y.
For example, let's change cpu.memory value in values.txt with 4G:
>> X=cpu.memory; Y=4G; sed -r "s/(${X}=).*/\1${Y}/" values.txt
cpu.memory=4G
cpu.processor=Dell
Use the -i flag to make the changes in place.
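Putting the two steps together, here is a minimal sketch; it assumes the exact paths from the question and the name="key=value" format shown in configurations.txt above:
#!/bin/bash
# Read each variable definition and update values.txt in place.
while IFS= read -r line; do
  key=$(sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\2/' <<< "$line")
  val=$(sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\3/' <<< "$line")
  # The key is used verbatim in the regex; its dots match any
  # character, which is harmless for keys like cpu.memory.
  sed -i -r "s/(${key}=).*/\1${val}/" /home/cpu/script/values.txt
done < /home/configurations.txt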
Here is an awk based answer:
$ cat config.txt
cpu.memory=4G
cpu.processor=Intel
$ cat values.txt
cpu.memory=1G
cpu.processor=Dell
cpu.speed=4GHz
$ awk -F= 'FNR==NR{a[$1]=$2; next;}; {if($1 in a){$2=a[$1]}}1' OFS== config.txt values.txt
cpu.memory=4G
cpu.processor=Intel
cpu.speed=4GHz
Explanation: first read config.txt and save it in memory (the array a). Then read values.txt; if a key was defined in config.txt, use the saved value from memory.
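awk has no portable in-place flag, so to update values.txt you can redirect to a temporary file and move it back (GNU awk 4.1+ also offers -i inplace):
awk -F= 'FNR==NR{a[$1]=$2; next;}; {if($1 in a){$2=a[$1]}}1' OFS== config.txt values.txt > values.new \
  && mv values.new values.txt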

how to remove lines from file that don't match regex?

I have a big file that looks like this:
7f0c41d6-f9c6-47aa-a034-d40bc629c973.csv
159890
159891
24faaed6-62ee-4175-8430-5d73b09911c8.csv
159907
5bad221f-25ef-44fa-9086-fd152e697928.csv
642e4ac3-3d46-4b4c-b5c8-aa2fa54d0b04.csv
d0e145a5-ceb8-4d4b-ae47-11e0c9a6548d.csv
159929
ba678cbd-af57-493b-a69e-e7504b4bc328.csv
7750840f-9bf9-4a68-9f25-a2ba0968d481.csv
159955
159959
And I'm only interested in the *.csv files; can someone point out how to remove the lines that do not end with .csv?
Thank you.
grep "\.csv$" file
will pull out only those lines ending in .csv
Then, if you want to put them in a different file:
grep "\.csv$" file > newfile
sed is your friend:
sed -i.bak '/\.csv$/!d' file
-i.bak : in-place edit. creates backup file with .bak extension
([0-9a-zA-Z-]*\.csv$)
This is the regex that selects only the filenames ending with the .csv extension (note the escaped dot, so it matches a literal period).
Hope this will help you.
If you are familiar with the vim text editor (vim or vi is typically installed on many linux boxes), use the following vim Ex mode command to remove lines that don't match a particular pattern:
:v/<pattern>/d
For example, if I wanted to delete all lines that didn't contain "column" I would run:
:v/"column"/d
Hope this helps.
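If you prefer to script this instead of running it interactively, the same Ex command can be applied from the shell; a sketch, assuming a vim build that supports silent Ex mode (-es):
vim -es -c 'v/\.csv$/d' -c 'wq' file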
If it is the case that you do not want to have to save the matching lines in another file just to remove unwanted files, then this may also be a useful solution for your needs (understanding that this is an old question).
This single-line for loop uses the grep "\.csv" approach from above, so you don't need to manage multiple file names being saved here or there.
for f in *; do if [ ! "$(echo "${f}" | grep -Eo '\.csv$')" == ".csv" ]; then rm "${f}"; fi; done
And here is a slightly shorter version of the single line command:
for f in *; do if [ ! "$(echo "${f}" | grep -o '\.csv$')" ]; then rm "${f}"; fi; done
The purpose of using such a loop with a conditional is to guarantee that you only rid yourself of the files you want gone (the non-csv files), and only in the current working directory, without parsing the output of the ls command.
Hopefully this helps you and anyone else that is looking for a similar solution.

sed: display lines selected for deleting

How do I use a verbose flag in sed? E.g., if I'm deleting some lines using a sed command, then I want the lines that are being deleted to be displayed on the screen. Also, let me know if this can be done through a script.
Thanks in advance
sed doesn't have a verbose flag.
You can write a sed script that separates deleted lines from other lines, though. You can look at the deleted lines later, and decide whether deleting them was a good idea.
Here's an example. I want to delete from test.dat every line that starts with a number.
$ cat test.dat
1 First line
2 Second line
3 Third line
A Keep this one
Here's the sed script that will "do" the deleting. It looks for lines that start with a number, writes them to the file "deleted.dat", and then deletes them from the pattern space.
$ cat code/sed/delete-verbose.sed
/^[0-9]/{
w /home/myusername/deleted.dat
d
}
Here's what happens when you run it.
$ sed -f code/sed/delete-verbose.sed test.dat
A Keep this one
And here's what it wrote to "deleted.dat".
$ cat deleted.dat
1 First line
2 Second line
3 Third line
When you're confident the script is going to do the right thing, redirect output to another file, or edit the file in-place (-i option).
This might work for you (GNU sed):
sed -e '/pattern_to_delete/{w /dev/stderr' -e ';d}' input_file > output_file
There is no verbose flag, but by sending the lines to be deleted to stderr, the effect you require can be achieved.
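For example, applying the same idea to the test.dat file from the answer above (GNU sed provides the /dev/stderr target), the deleted lines, those starting with a digit, appear on the terminal via stderr while the kept lines go to kept.dat:
sed -e '/^[0-9]/{w /dev/stderr' -e ';d}' test.dat > kept.dat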