How to change the current line in BASH using regex - regex

My task is to rename all functions in C source code (definitions, declarations and comments) whose names contain exactly one underscore and only lowercase letters. For example, my_func(void) ----> myFunc(void). How should I do it? In my code I check whether a line of the file contains a function name that has to be changed, but I have no idea how to change it. Or maybe this task has a more efficient and better solution?
while read -r line; do
if [[ "$line" =~ ^(int|char|float|long|short|void|double)?[[:space:]]?[^_]([a-z0-9]+[_]?)+[(]([a-z[:space:]])*[)][[:space:]]*[{]? ]]; then
# here should be the code
fi
done < "${FILENAMES[i]}"
I understand that sed should be used here, something like
sed -i 's/_//' ${FILENAMES[i]}
but this command removes the first underscore on every line of my file, not only on the lines I want to change. Thanks.
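One possible approach, sketched under assumptions rather than offered as a definitive answer: the rename can be done in bash itself with =~ and BASH_REMATCH instead of sed. The function name camelize_line is made up for illustration. The sketch only touches identifiers that are directly followed by an opening parenthesis, so names mentioned in comments without parentheses are left alone, and ${tail^} needs bash 4 or newer.

```shell
#!/bin/bash
# camelize_line (hypothetical helper): rewrite name_part( as namePart( in one line.
# Assumption: only identifiers with exactly one underscore, all lowercase letters,
# and followed by "(" are treated as function names.
camelize_line() {
    local LC_ALL=C          # make [a-z] mean ASCII lowercase only
    local line=$1 out=""
    local re='(^|[^A-Za-z0-9_])([a-z]+)_([a-z]+)([[:space:]]*\()'
    local match pre head tail paren
    while [[ $line =~ $re ]]; do
        match=${BASH_REMATCH[0]}
        pre=${BASH_REMATCH[1]}     # character before the name (may be empty)
        head=${BASH_REMATCH[2]}    # part before the underscore
        tail=${BASH_REMATCH[3]}    # part after the underscore
        paren=${BASH_REMATCH[4]}   # optional whitespace plus "("
        # copy everything before the match, then the rewritten name
        out+=${line%%"$match"*}$pre$head${tail^}$paren
        # continue with the text after the match
        line=${line#*"$match"}
    done
    printf '%s\n' "$out$line"
}
```

It could then be called from the read loop, for example camelize_line "$line" >> "$tmpfile", where tmpfile is a placeholder for wherever the rewritten file is collected.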

Related

How to run list of perl regex from file in terminal

I'm fairly new to the whole coding game, and am very grateful for every answer!
I am working in a directory with many .txt files and have a file with a looong list of regex commands like "perl -p -i -e 's/\n\n/\n/g' *.xml". They all work if I copy them into the terminal. But is there a way to run them straight from the file?
I tried ./unicode.sh but that resulted in:
No such file or directory.
Any ideas?
Thank you so much!
Here's a (mostly) equivalent Perl script to the oneliner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended), which you could expand upon by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;
if (!@ARGV) { # if no files on command line
@ARGV = glob('*.xml'); # get a default list of files
}
local $^I = ''; # enable inplace editing (like perl -i)
while (<>) { # read each line of each file into $_
s/\n\n/\n/g; # modify $_ with a regex
# more regexes here...
print; # write the line $_ back out
}
You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions, there are lots of Perl modules to do XML processing - I wrote about that some more here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect, since when reading files line-by-line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).
Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as what I've shown above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.
It's likely if you got no such file or directory, your problem was you forgot to make unicode.sh executable, as in chmod +x unicode.sh, assuming that's a script that you wrote.
Of course, the normal way to run multiple Perl commands is to write them into a Perl script, something like runme.pl.
That said, yes, everything will work from the terminal; you just need to be careful about the escaping that bash performs.
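As a small illustration of running commands straight from a file, here is a sketch that writes a few commands into a temporary file and hands that file to bash explicitly, which needs neither the execute bit nor a working shebang line. The two echo lines are stand-ins for the real perl one-liners, and the temporary file is only there to keep the example self-contained.

```shell
#!/bin/bash
# Create a throwaway command file; the echo lines stand in for lines such as
#   perl -p -i -e 's/foo/bar/g' *.xml
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
echo "first command"
echo "second command"
EOF

# Passing the file to bash runs each line in order, no chmod required.
output=$(bash "$script")
printf '%s\n' "$output"

rm -f "$script"
```

With a real file this would be bash unicode.sh, run from the directory that contains both the script and the files it should edit.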

Extracting group from regex in shell script using grep

I want to extract the output of a command run through shell script in a variable but I am not able to do it. I am using grep command for the same. Please help me in getting the desired output in a variable.
x=$(pwd)
pw=$(grep '\(.*\)/bin' $x)
echo "extracted is:"
echo $pw
The output of the pwd command is /opt/abc/bin/ and I want only /root/abc part of it. Thanks in advance.
Use dirname to get the path and not the last segment of the path.
You can use:
x=$(pwd)
pw=$(dirname "$x")
echo "$pw"
Or simply:
pw=$(dirname "$(pwd)")
echo "$pw"
All of what you're doing can be done in a single echo:
echo "${PWD%/*}"
The $PWD variable holds the current directory, and %/* removes the last / and everything after it.
For your case it will output: /root/abc
The second (and any subsequent) argument to grep is the name of a file to search, not a string to perform matching against.
Furthermore, grep prints the matching line or (with -o) the matching string, not whatever the parentheses captured. For that, you want a different tool.
Minimally fixing your code would be
x=$(pwd)
pw=$(printf '%s\n' "$x" | sed 's%\(.*\)/bin.*%\1%')
(If you only care about Bash, not other shells, you could do sed ... <<<"$x" without the explicit pipe; the syntax is also somewhat more satisfying.)
But of course, the shell has basic string manipulation functions built in.
pw=${x%/bin*}
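To make the difference between these expansions concrete, here is a short sketch using /opt/abc/bin/ (the pwd output quoted in the question) as the example value. The variable names no_slash, parent and before_bin are only illustrative.

```shell
#!/bin/bash
x=/opt/abc/bin/            # example value; in a real script this would be $(pwd) or $PWD

no_slash=${x%/}            # remove one trailing slash, if there is one
parent=${no_slash%/*}      # remove the last path component
before_bin=${x%/bin*}      # remove the last "/bin" and everything after it
naive=${x%/*}              # with a trailing slash, this only removes that slash

printf '%s\n' "$no_slash" "$parent" "$before_bin" "$naive"
```

The trailing slash matters: %/* only strips a whole path component when the value does not already end in /.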

batch renaming of files with perl expressions

This should be a basic question for a lot of people, but I am a biologist with no programming background, so please excuse my question.
What I am trying to do is rename about 100,000 gzipped data files that have existing name of a code (example: XG453834.fasta.gz). I'd like to name them to something easily readable and parseable by me (example: Xanthomonas_galactus_str_453.fasta.gz).
I've tried to use sed, rename, and mmv, to no avail. If I run any of those commands as a one-off, they work fine; it's only when I try to incorporate variables into a shell script that I run into problems. I'm not getting any errors, just no names are changed, so I suspect it's an I/O error.
Here's what my files look like:
#! /bin/bash
# change a bunch of file names
file=names.txt
while IFS=' ' read -r r1 r2;
do
mmv ''$r1'.fasta.gz' ''$r2'.fasta.gz'
# or I tried many versions of: sed -i 's/"$r1"/"$r2"/' *.gz
# and I tried many versions of: rename -i 's/$r1/$r2/' *.gz
done < "$file"
...and here's the first lines of my txt file with single space delimiter:
cat names.txt
#find #replace
code1 name1
code2 name2
code3 name3
I know I can do this with python or perl, but since I'm stuck here working on this particular script I want to find a simple solution to fixing this bash script and figure out what I am doing wrong. Thanks so much for any help possible.
Also, I tried to cat the names file (see comment from Ashoka Lella below) and then use awk to move/rename. Some of the files have variable names (but will always start with the code), so I am looking for a find & replace option to just replace the "code" with the "name" and preserve the file name structure.
I suspect I am not escaping the variable within the single tick of the perl expression, but I have pored over a lot of manuals and I can't find the way to do this.
If you're absolutely sure that the filenames don't contain spaces or tabs, you can try the following:
xargs -n2 < names.txt echo mv
This is a dry run (it only prints what would be done). If you're satisfied with the result, remove the echo.
If you want to check for the existence of the target, use
xargs -n2 < names.txt echo mv -i
If you NEVER want to allow overwriting of the target, use
xargs -n2 < names.txt echo mv -n
Again, remove the echo if you're satisfied.
I don't think that you need to be using mmv, a simple mv will do. Also, there's no need to specify the IFS, the default will work for you:
while read -r src dest; do mv "$src" "$dest"; done < names.txt
I have double quoted the variable names as it is generally considered good practice but in this case, a space in either of the filenames will result in read not working as you expect.
You can put an echo before the mv inside the loop to ensure that the correct command will be executed.
Note that in your file names.txt, the .fasta.gz suffix is already included, so you shouldn't be adding it inside the loop as well. Perhaps that was your problem?
This should rename all files in column 1 to the names in column 2 of names.txt, provided they are in the same folder as names.txt:
cat names.txt| awk '{print "mv "$1" "$2}'|sh
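For the "find and replace the code, keep the rest of the file name" variant mentioned in the question, a sketch along the following lines might work. The function name rename_by_map and its two arguments are made up for illustration, it assumes codes and names contain no whitespace, and it only prints the mv commands (a dry run) until the echo is removed. Note that a code such as code1 would also match files starting with code10.

```shell
#!/bin/bash
# rename_by_map MAPFILE DIR (hypothetical helper)
# MAPFILE has "code name" pairs separated by whitespace; lines starting with # are skipped.
rename_by_map() {
    local map=$1 dir=$2 code name f base
    while read -r code name; do
        # skip blank lines and the "#find #replace" header
        [[ -z $code || $code == \#* ]] && continue
        for f in "$dir"/"$code"*; do
            [[ -e $f ]] || continue          # glob matched nothing
            base=${f##*/}                    # file name without the directory
            # swap the leading code for the name, keep the rest of the file name
            echo mv -n -- "$f" "$dir/$name${base#"$code"}"
        done
    done < "$map"
}
```

Calling rename_by_map names.txt . prints one mv line per matching file, which can be checked before the echo is dropped.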

using awk to match a column in log file and print the entire line

I'm trying to write a script which will analyse a log file.
I want to give the user the option to enter a pattern and then print any line which matches this pattern in a specific column (the fifth one).
The following works from the terminal:
awk ' $5=="acpid:" {print$0}' *filename*
OK, so above I'm trying to match "acpid:" and this works fine, but in the script I want to allow multiple entries and search for them all. The problem is that I'm messing up the variable in the script. This is what I have:
echo "enter any services you want details on, separated by spaces"
read -a details
for i in ${details[@]}
do
echo $i
awk '$5 == "${i}" {print $0}' ${FILE}
done
Again, if I directly put in a matching expression instead of the variable it works, so I guess my problem is here. Any tips would be great.
UPDATE
So I'm using the second option suggested by @ghoti (shown below), as it matches my log file slightly better.
However, I'm not having any luck with multiple entries. I've added two lines to illustrate the results I'm getting: echo $i and echo "finish loop". As placed, they should tell me which input the loop is currently on and that I'm leaving the loop.
read -a details
re=""
for i in "${details[@]}"; do
re="$re${re:+|}$i"
echo $i
echo "finish loop"
done
awk -v re="$re" '$5 ~ re' "$FILE"
When I give read an input of either "acpid" or "init" separately, the result is a perfect match. However, when the input is "acpid init" the output is:
acpid init
finish loop
What I'm seeing from this is that read is taking both words as one entry, and then awk is searching for them but not matching (as would be expected). So why is the input not being taken as two separate entries? I had thought the -a option of read meant that words separated by a space would be placed into separate elements of the array. Perhaps I have not declared the array correctly?
Update update
OK, cancel the above update. Like a fool, I'd forgotten that I'd changed IFS to \n earlier in the script. Changed it back and bingo!!!
Many thanks again to @ghoti for his help!!
There are a few ways that you could do what you want.
One option might be to run through a for loop for each word, then apply a different call to awk, and show the results sequentially. For example, if you entered foo bar into the $details variable, you might get a list of foo matches, followed by a list of bar matches:
read -a details
for i in "${details[@]}"; do
awk -v s="$i" '$5 == s' "$FILE"
done
The idea here is that we use awk's -v option to get each word into the script, rather than expanding the variable inside the quoted script. You should read about how bash deals with different kinds of quotes. (There are also a few Stackoverflow questions on the topic, here and here and elsewhere.)
Another option might be to construct a regular expression that searches for all the words you're interested in, all at once. This has the benefit of using a single run of awk to search through $FILE:
read -a details
re=""
for i in "${details[@]}"; do
re="$re${re:+|}$i"
done
awk -v re="$re" '$5 ~ re' "$FILE"
The result will contain all the interesting lines from $FILE in the order in which they appear in $FILE, rather than ordered by the words you provided.
Note that this is a fairly rudimentary search, without word boundaries, so if you search for "foo bar babar", you may get results you don't want. You can play with the regex yourself, though. :)
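One way to tighten the match, sketched here under assumptions: anchor the alternation so that the fifth field has to be exactly one of the words, optionally followed by a colon as in "acpid:". The function name match_services is hypothetical, and the sketch assumes the words contain no regex metacharacters.

```shell
#!/bin/bash
# match_services FILE WORD... (hypothetical helper)
# Prints lines of FILE whose fifth field is exactly one of the WORDs,
# with an optional trailing colon.
match_services() {
    local file=$1; shift
    local re="" word
    for word in "$@"; do
        re="$re${re:+|}$word"        # build word1|word2|...
    done
    # ^( ... ):?$ forces the whole field to match, not just a part of it
    awk -v re="^($re):?\$" '$5 ~ re' "$file"
}
```

With the array from read -a details this would be called as match_services "$FILE" "${details[@]}", so a search for init no longer also matches a field such as reinit:.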
Does that answer your question?

How can I remove text at beginning of a file using a regex?

I have a bunch of files that contain a semi-standard header. That is, the look of it is very similar but the text changes somewhat.
I want to remove this header from all of the files.
From looking at the files, I know that what I want to remove is encapsulated between similar words.
So, for instance, I have:
Foo bar...some text here...
more text
Foo bar...I want to keep everything after this point
I tried this command in perl:
perl -pi -e "s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
But it doesn't work. I'm not a regex expert but hoping someone knows how to basically remove a chunk of text from the beginning of a file based on a text match and not the number of characters...
By default, ARGV (aka <> which is used behind-the-scenes by -p) only reads a single line at a time.
Workarounds:
Unset $/, which tells Perl to read a whole file at a time.
perl -pi -e "BEGIN{undef$/}s/\A.*?Foo.bar.*?Foo.bar//simxg" 00ws110.txt
BEGIN is necessary to have that code run before the first read is done.
Use -0, which sets $/ = "\0".
perl -pi -0 -e "s/\A.*?Foo.bar.*?Foo.bar//simxg" 00ws110.txt
Take advantage of the flip-flop operator.
perl -ni -e "print unless 1 ... /^Foo.bar/"
This skips printing from line 1 through the next line that matches /^Foo.bar/.
If your header stretches across more than one line you must tell perl how much to read. If the files are small in comparison to memory you may want to just slurp the whole file into memory:
perl -0777pi.orig -e 's/your regex/your replace/s' file1 file2 file3
The -0777 option sets perl to slurp mode, so $_ will hold each whole file in turn as the loop runs. Also, always remember to set the backup extension. If you don't, you may find that you have wiped out your data accidentally and have no way to get it back. See perldoc perlrun for more information.
Given information from the comments, it looks like you are trying to strip all of the annoying stuff from the front of a Project Gutenberg ebook. If you understand all of the copyright issues involved, you should be able to get rid of the front matter like this:
perl -ni.orig -e 'print unless 1 .. /^\*END/' 00ws110.txt
The Project Gutenberg header ends with
*END*THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.29.93*END*
A safer regex would take into account the *END* at the end of the line as well, but I am lazy.
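For comparison, the same range idea can be written with sed instead of Perl's flip-flop operator. This is a sketch, not the answer's method: the function name strip_header is made up, the pattern requires *END* at both the start and the end of the line, and the result goes to standard output so the original file stays untouched. If no line matches, sed deletes everything, so the output is worth checking before overwriting anything.

```shell
#!/bin/bash
# strip_header FILE (hypothetical helper)
# Deletes from line 1 through the first later line that starts with *END*
# and ends with *END* (optionally followed by whitespace).
strip_header() {
    sed '1,/^\*END\*.*\*END\*[[:space:]]*$/d' "$1"
}
```

Usage would be something like strip_header 00ws110.txt > 00ws110.stripped.txt, where the output file name is just an example.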
I might be misinterpreting what you're asking for, but it looks to me to be as simple as:
perl -ni -e 'print unless 1..($. > 1 && /^Foo bar/)'
Here you go! This replaces the first line of the file:
use Tie::File;
tie my @array,"Tie::File","path_to_file" or die("can't tie the file");
$array[0] =~s/text_i_want_to_replace/replacement_text/gi;
untie @array;
You can operate on the array and the modifications will show up in the file. Deleting an element from the array erases that line from the file, and applying a substitution to an element substitutes text in the corresponding line.
If you want to delete the first two lines, and keep something from the third, you can do something like this :
# tie the @array before this
shift @array;
shift @array;
$array[0]=~s/foo bar\.\.\.//gi;
# untie the @array
and this will do exactly what you need!