Does the syntax of bash variables change when they are passed to bc? - bc

I noticed a solution on Codewars which had the following syntax:
#!/bin/bash
seven () {
bc <<< "
scale=0
counter=0
m=$1
while( m > 99 ) {
counter = counter + 1
x = m / 10
y = m % 10
m = x - 2 * y
}
print m, \", \", counter
"
}
seven "$1"
My question is regarding the variables used (m, x, counter). How is it that bash allows these variables to be used without the $variable_name syntax?
Are there special cases (such as wrapping code in double quotes) that allow for this?

These are not bash variables but bc variables.
The <<< operator introduces a "Here String" (see man bash): the word that follows undergoes expansions (except pathname expansion and word splitting) and is sent to the standard input of the command, bc in this case.
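For instance (a quick sketch with a hypothetical seven_debug helper), you can swap bc for cat to see exactly what bash hands over after expansion: only $1 is replaced, while names like m and counter pass through untouched.
seven_debug () {
  cat <<< "
m=$1
counter = counter + 1
"
}
seven_debug 371   # prints "m=371" followed by the untouched line "counter = counter + 1"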
You can include a program for any other interpreter this way, e.g.
python3 <<< 'x="Hello world!"
print(x)'
or
dc <<< '
100 3 /
p'
Also note that bash uses $x or ${x}, not $(x) (that would run the command x and return its output). $(x) for the variable x is used in Makefiles, though.
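A quick illustration of that distinction (hypothetical variable x):
x="hello"
echo "$x"          # variable expansion: hello
echo "${x}"        # same thing: hello
echo "$(echo hi)"  # command substitution: runs the command and prints hi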

There's virtually no bash code in that answer. It's a shell function that does nothing but run bc. Everything in the here string is a bc script, which has nothing to do with bash.
As far as bash is concerned, the here string doesn't contain any variables or any discernible structure: it's just opaque text that it will feed to bc's standard input.

Related

awk unix - match regex - regex string size limit | ideas?

The following code works as a minimal example. It searches for a regular expression with one mismatch inside a text (later a large DNA file).
awk 'BEGIN{print match("CTGGGTCATTAAATCGTTAGC...", /.ATC|A.TC|AA.C|AAT./)}'
Later I am interested in the position where the regular expression is found, so the actual awk command is more complex, like it is solved here.
If I want to search with more mismatches and a longer string, I will end up with very long regex expressions:
example: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" with 3 mismatches "." allowed:
/
...AAAAAAAAAAAAAAAAAAAAAAAAAAA|
..A.AAAAAAAAAAAAAAAAAAAAAAAAAA|
..AA.AAAAAAAAAAAAAAAAAAAAAAAAA|
... and so on (actually 4060 possibilities)
/
The problem with my solution is:
a very long regex will not be accepted by awk! (the limit seems to be roughly 80,000 characters)
Error: "bash: /usr/bin/awk: Argument list too long"
possible solution: SO-Link, but I don't find the solution there...
My question is:
Can I somehow still use the long regex expression?
Splitting the string and running the command multiple times could be a solution, but then I will get duplicated results.
Is there another way to approach this?
("agrep" will work, but not to find the positions)
As Jonathan Leffler points out in the comments, your issue in the first case (bash: /usr/bin/awk: Argument list too long) comes from the shell, and you can solve it by putting your awk script in a file.
As he also points out, your fundamental approach is not optimal. Below are two alternatives.
Perl has many features that will aid you with this.
You can use the ^ (XOR) operator on two strings; it returns \x00 wherever the strings match and another character wherever they don't. March through the longer string, XORing against the shorter one with a maximum substitution count, and there you are:
use strict;
use warnings;
use 5.014;

my $seq = "CGCCCGAATCCAGAACGCATTCCCATATTTCGGGACCACTGGCCTCCACGGTACGGACGTCAATCAAAT";
my $pat = "AAAAAA";
my $max_subs = 3;

my $len_in  = length $seq;
my $len_pat = length $pat;
my %posn;

sub strDiffMaxDelta {
    my ( $s1, $s2, $maxDelta ) = @_;
    # XOR the strings to find the count of differences
    my $diffCount = () = ( $s1 ^ $s2 ) =~ /[^\x00]/g;
    return $diffCount <= $maxDelta;
}

for my $i ( 0 .. $len_in - $len_pat ) {
    my $substr = substr $seq, $i, $len_pat;
    # save position if there is a match up to $max_subs substitutions
    $posn{$i} = $substr if strDiffMaxDelta( $pat, $substr, $max_subs );
}

say "$_ => $posn{$_}" for sort { $a <=> $b } keys %posn;
Running this prints:
6 => AATCCA
9 => CCAGAA
10 => CAGAAC
11 => AGAACG
13 => AACGCA
60 => CAATCA
61 => AATCAA
62 => ATCAAA
63 => TCAAAT
Substituting:
$seq = "AAATCGAAAAGCDFAAAACGT";
$pat = "AATC";
$max_subs = 1;
Prints:
1 => AATC
8 => AAGC
15 => AAAC
It is also easy (in the same style as awk) to convert this to 'magic input' from either stdin or a file.
You can also write a similar approach in awk:
echo "AAATCGAAAAGCDFAAAACGT" | awk -v mc=1 -v seq="AATC" '
{
for(i=1; i<=length($1)-length(seq)+1; i++) {
cnt=0
for(j=1;j<=length(seq); j++)
if(substr($1,i+j-1,1)!=substr(seq,j,1)) cnt++
if (cnt<=mc) print i-1 " => " substr($1,i, length(seq))
}
}'
Prints:
1 => AATC
8 => AAGC
15 => AAAC
And the same result with the longer example above. Since the input is moved to STDIN (or a file) and the regex does not need to be HUGE, this should get you started either with Perl or Awk.
(Be aware that the first character of a string is offset 1 in awk and offset 0 in Perl...)
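A one-line sanity check of that offset difference (plain shell one-liners, not tied to the code above):
awk 'BEGIN { print index("ABC", "B") }'   # 2 -- awk string positions start at 1
perl -E 'say index("ABC", "B")'           # 1 -- Perl offsets start at 0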
The "Argument list too long" problem is not from Awk. You're running into the operating system's memory size limit on the argument material that can be passed to a child process. You're passing the Awk program to Awk as a very large command line argument.
Don't do that; put the code into a file, and run it with awk -f file, or make the file executable and put a #!/usr/bin/awk -f or similar hash-bang line at the top.
That said, it's probably not such a great idea to include your data in the program source code as a giant literal.
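A minimal sketch of that advice (the file name find.awk is just an example); the program, regex and all, lives in its own file, so nothing huge travels on the command line:
# find.awk -- the (possibly very long) regex lives here, not in a shell argument
BEGIN { print match("CTGGGTCATTAAATCGTTAGC", /.ATC|A.TC|AA.C|AAT./) }
Run it with awk -f find.awk, which should print 12 here; reading the sequence from a data file rather than embedding it would address the second point as well.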
Is there another way to approach this?
Looking for fuzzy matches is easy with Python. You just need to install the PyPI regex module by running the following in the terminal:
pip install regex # or pip3 install regex
and then create the Python script (named, say, script.py) like
#!/usr/bin/env python3
import regex

filepath = r'myfile.txt'
with open(filepath, 'r') as file:
    for line in file:
        for x in regex.finditer(r"(?:AATC){s<=1}", line):
            print(f'{x.start()}:{x.group()}')
Use the pattern you want; here, (?:AATC){s<=1} means you want to match the AATC char sequence allowing at most one substitution in the match (adding (?e), as in (?e)(?:AATC){s<=1}, makes the engine attempt to find a better fit).
Run the script using python3 script.py.
If myfile.txt contains just one AAATCGAAAAGCDFAAAACGT line, the output is
1:AATC
8:AAGC
15:AAAC
meaning that there are three matches at positions 1 (AATC), 8 (AAGC) and 15 (AAAC).
The script prints both the position (x.start()) and the matched text (x.group()); keep whichever part you need in the print call.
See an online Python demo:
import regex

line = 'AAATCGAAAAGCDFAAAACGT'
for x in regex.finditer(r"(?:AATC){s<=1}", line):
    print(f'{x.start()}:{x.group()}')

BASH: Search a string and display the exact number of times a substring occurs inside it

I've searched all over and still can't find this simple answer. I'm sure it's so easy. Please help if you know how to accomplish this.
sample.txt is:
AAAAA
I want to find the exact number of times the combination "AAA" occurs. If you just use, for example,
grep -o 'AAA' sample.txt | wc -l
we receive 1. This is the same as counting how many times AAA occurs with a standard text-editor search-box type of search. However, I want the complete number of matches, starting from each individual character, which is exactly 3. We get this when we search from each character individually instead of treating each AAA hit as a block.
I am looking for the most squeezed-in / most-possibilities / literal exact number of occurrences of "AAA" starting from every individual character of sample.txt, not just the blocks a normal text-editor search-box search finds.
How do we accomplish this, preferably in AWK? SED, GREP and anything else I can include in a Bash script is fine as well.
This might work for you (GNU sed & wc):
sed -r 's/^[^A]*(AA?[^A]+)*AAA/AAA\nAA/;/^AAA/P;D' | wc -l
Lose any characters other than A's, and any single or double A's. Then print a triple A, lose the first A, and repeat. Finally, count the number of lines printed.
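As a quick check, with the question's sample.txt (a single line of AAAAA) this should report 3:
printf 'AAAAA\n' > sample.txt
sed -r 's/^[^A]*(AA?[^A]+)*AAA/AAA\nAA/;/^AAA/P;D' sample.txt | wc -l
# 3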
This isn't a trivial problem in bash. As far as I know, standard utils don't support this kind of searching. You can however use standard bash features to implement this behavior in a function. Here's how I would attack the problem, but there are other ways:
#!/bin/bash

search_term="AAA"
text=$(cat sample.txt)
term_len=${#search_term}
occurrences=0

# While the text is greater than or equal to the search term length
while [ "${#text}" -ge "$term_len" ]; do
    # Look at just the length of the search term
    text_substr=${text:0:${term_len}}
    # If we see the search term, increment occurrences
    if [ "$text_substr" = "$search_term" ]; then
        ((occurrences++))
    fi
    # Remove the first character from the main text
    # (e.g. "AAAAA" becomes "AAAA")
    text=${text:1}
done

printf "%d occurrences of %s\n" "$occurrences" "$search_term"
This is the awk version:
echo "AAAAA AAA AAAABBAAA" \
| gawk -v pat="AAA" '{
    for (i = 1; i <= NF; i++) {
        # current field length
        m = length($i)
        # search pattern length
        n = length(pat)
        for (l = 1; l < m; l++) {
            sstr = substr($i, l, n)
            #print i " " $i " sub:" sstr
            # substring matches pattern
            if (sstr ~ pat) {
                count++
            } else {
                print "contiguous count on field " i " = " count
                # uncomment next line if non-contiguous matches are not needed
                #break
            }
        }
        print "total count on field " i " = " count
        count = 0
    }
}'
I posted this on another of the OP's posts, but it was ignored, maybe because I did not add notes and an explanation. It is just a different approach, and any discussion is welcome.
$ awk -v sample="$(<sample.txt)" '{ x=sample; n=0 } $0 != "" {
    while (t = index(x, $0)) { n++; x = substr(x, t+1) }
    print $0, n
}' combinations
Explanation:
The variables:
sample: the raw sample text, slurped in from the file sample.txt with the -v argument
x: the targeting string; before each test, its value is reset to sample
$0: the testing string from the file combinations; each line feeds one testing string
n: the counter, i.e. the number of occurrences of the testing string ($0)
t: the position of the first character of the matched testing string ($0) in the targeting string (x)
Update: Added $0 != "" before the main while loop to skip EMPTY strings, which lead to an infinite loop.
The code:
awk -v sample="$(<sample.txt)" '
# reset the targeting string (with the sample text) and the counter "n"
{ x = sample; n = 0 }
# below is the main block, where $0 != "" skips EMPTY testing strings
($0 != "") {
    # the function index(x, $0) returns the position (assigned to "t") of the first character
    # of the matched testing string ($0) in the targeting string (x).
    # when no match is found, it returns zero and we step out of the while loop.
    while (t = index(x, $0)) {
        n++                 # increment the number of matches
        x = substr(x, t+1)  # drop everything up to and including position t from the targeting string
    }
    print $0, n             # print the testing string and the count
}
' combinations
awk's index() function is much faster than regex matching, and it avoids expensive brute-force string comparisons. Attached are the tested sample.txt and combinations files:
$ more sample.txt
AAAAAHHHAAHH
HAAAAHHHAAHH
AAHH
$ more combinations
AA
HH
AAA
HHH
AAH
HHA
ZK
Tested environment: GNU Awk 4.0.2, CentOS 7.3

sed - replace a variable of power N by the product of N variables

From sed replace a variable of power by the product of two variables, I would like to generalize the "power 2" case to the "power N" case.
The command line in the "power 2" case is:
sed 's/\([^(*+\/^-]*\(([^)]*)\)\?\)\^2/\1\*\1/g'
So that
cos(2*a)^2+sin(3*b)^2+m1^2*m2^2*cos(4*c)
is replaced by:
cos(2*a)*cos(2*a)+sin(3*b)*sin(3*b)+m1*m1*m2*m2*cos(4*c)
Now, I want to transform:
cos(a)^3 +m1^4
to
cos(a)*cos(a)*cos(a)+m1*m1*m1*m1
Is there a way to store the exponent "N" and print the powered variable N times, joined with the star symbol?
It would be something like this (we store the exponent in pattern \3):
sed 's/\([^(*+\/^-]*\(([^)]*)\)\?\)\^\([0-9]*\)/
"print (pattern \3) times the factor \1"
\1\*\1*\1*\1 /g'
If someone has a solution with other tools (other Linux commands), I'll take it.
#!/usr/bin/awk -f
BEGIN {
    RS = "[ \n+]"
    FS = "^"
    OFS = "*"
}
{
    z = $2
    for (y = 2; y <= z; y++)
        $y = $1
    printf "%s%s", $0, RT
}
Input
sin(b)
cos(a)^3 +m1^4
tan(c)
Output
sin(b)
cos(a)*cos(a)*cos(a) +m1*m1*m1*m1
tan(c)
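A possible way to run it (file names are only examples; gawk is assumed, since the regex RS and the RT variable are GNU extensions):
printf 'sin(b)\ncos(a)^3 +m1^4\ntan(c)\n' > input.txt
gawk -f power.awk input.txt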
echo "cos(a)^3 +m1^4" | sed '
# encapsulate between +
s/.*/+&+/
:a
# For each power object
/\^/!b end
# isolate power object
h
s#\(.*[-+/*^]\)\([^-+/*^]*\)^\([0-9]\{1,2\}\)\(.*\)#\1\
\2\
\4#
# isolate power value to convert it in useable reproducing factor
x
s//00\3/
s/\(.\)\{0,1\}\(.\)\{0,1\}\(.\)\{0,1\}/\1C\2D\3/
s/0.//g;s/9/18/g;s/8/17/g;s/7/16/g;s/6/15/g;s/5/14/g;s/4/13/g;s/3/12/g;s/2/11/g;s/1/U/g
:cdu
s/1\(1*\)\([^1]\)/\2\1\2/g;t cdu
s/C/DDDDDDDDDD/g;s/D/UUUUUUUUUU/g
# dont replicate power 1
s/U//
# we got the replication number
# replicate
G
:repl
s/^U\(U*\n.*\n\)\(.*\)\(\n\)\(.*\)/\1\2\3*\2\4/;t repl
# reassemble
s/\n//g
b a
:end
# remove extra +
s/.\(.*\)./\1/
'
Let's be crazy in this mad world (I don't recommend this in production, especially for maintenance and modification).
Limited to powers that are positive integers smaller than 999.
Some comments in the code, but not exhaustive (it's a bit long).
Tested on GNU sed but POSIX compliant.
Recursive process, so nearly any simple powered argument between + - * / ^ should work, up to the limits of sed itself.

Regex Matching for Bash

I have potential inputs that will come in from a read -e -p command in a bash script. For example, the user would type L50CA. Some other possibilities that the user could type in are: K117CB, K46CE2, or V9CE1.
I need to break up what was read in. I read in like this:
read -e -p "What first atom? " sel1
then I would like to make an array like this (but this will not separate):
arr1=($sel1)
But I need to separate the array so that
${arr1[0]} is equal to L
${arr1[1]} is equal to 50
and ${arr1[2]} is equal to CA
This separation has to work with the other possible user input formats like the ones listed above. Regex seems to be the way to do this. I can isolate the first two matches of the input with the following regular expressions: ^\D and \d*(?=\w)
I need help matching the third component and implementing it into an array. Alternatively, it is fine to break up the user input into three new variables. Or we can place a space between each of the matches, so L50CA is converted to L 50 CA, because then arr1=($sel1) will work.
Thanks for your help.
A Bash-only solution:
for sel in L50CA K117CB K46CE2 V9CE1; do
    [[ "$sel" =~ ^(\w)([0-9]+)(.*) ]]
    printf '%s - ' "${BASH_REMATCH[@]}"
    printf '\n'
done
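With the pattern left unquoted (a quoted pattern is matched literally by [[ =~ ]]), each iteration prints the full match followed by the three captured groups, so the loop should produce something like:
L50CA - L - 50 - CA -
K117CB - K - 117 - CB -
K46CE2 - K - 46 - CE2 -
V9CE1 - V - 9 - CE1 -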
The following uses sed to put spaces around each run of digits, so that word splitting does the separation when the array is built:
for sel in L50CA K117CB K46CE2 V9CE1
do
    arr=($(sed 's/\([0-9][0-9]*\)/ \1 /g' <<< "$sel"))
    echo "${arr[@]}"
done
prints
L 50 CA
K 117 CB
K 46 CE 2
V 9 CE 1
In bash using string manipulation:
~$ sel1=L50CA
~$ part1=$(expr match $sel1 "\([A-Z]\+\).*")
~$ part2=$(expr match $sel1 "[A-Z]*\([0-9]\+\).*")
~$ part3=$(expr match $sel1 "[A-Z]*[0-9]*\([A-Z]*\)")
~$ echo $part{1,2,3}
L 50 CA
~$ arr=($part{1,2,3})
~$ echo ${arr[@]}
L 50 CA

unix regex for adding contents in a file

I have contents in a file, like:
asdfb ... 1
adfsdf ... 2
sdfdf .. 3
I want to write a unix command that adds 1 + 2 + 3 and gives the result 6.
From what I am aware, grep and awk would be handy; any pointers would help.
I believe the following is what you're looking for. It will sum up the last field in each record for the data that is read from stdin.
awk '{ sum += $NF } END { print sum }' < file.txt
Some things to note:
With awk you don't need to declare variables; they are willed into existence by assigning values to them.
The variable NF is the number of fields in the current record. Prefixing it with $ gives the field at that index, so $NF is the last field of the record.
The END { } block is run only once, after all records have been processed by the other blocks.
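As a quick check with the sample data from the question (assuming it is saved as file.txt):
printf 'asdfb ... 1\nadfsdf ... 2\nsdfdf .. 3\n' > file.txt
awk '{ sum += $NF } END { print sum }' < file.txt
# 6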
An awk script is all you need for that, since it has grep facilities built in as part of the language.
Let's say your actual file consists of:
asdfb zz 1
adfsdf yyy 2
sdfdf xx 3
and you want to sum the third column. You can use:
echo 'asdfb zz 1
adfsdf yyy 2
sdfdf xx 3' | awk '
BEGIN {s=0;}
{s = s + $3;}
END {print s;}'
The BEGIN clause is run before processing any lines, the END clause after processing all lines.
The other clause runs for every line, but you can add more clauses to change the behavior based on all sorts of things (grep-py things), as sketched below.
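For example (a sketch of such an extra clause), summing only the lines whose first field starts with "a":
echo 'asdfb zz 1
adfsdf yyy 2
sdfdf xx 3' | awk '
/^a/ {s = s + $3;}
END {print s;}'
# prints 3 (only the first two lines are summed)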
This might not exactly be what you're looking for, but I wrote a quick Ruby script to accomplish your goal:
#!/usr/bin/env ruby
total = 0
while gets
  total += $1.to_i if $_ =~ /([0-9]+)$/
end
puts total
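To run it (the script name is hypothetical), feed the file on standard input, or pass it as an argument since gets reads via ARGF:
ruby sum_last.rb < file.txt
# 6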
Here's one in Perl.
$ cat foo.txt
asdfb ... 1
adfsdf ... 2
sdfdf .. 3
$ perl -a -n -E '$total += $F[2]; END { say $total }' foo.txt
6
Golfed version:
perl -anE'END{say$n}$n+=$F[2]' foo.txt
6