Difference between 'bc' and 'bc -l' in Bash when I try to find modulus of a number [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 years ago.
The community reviewed whether to reopen this question 2 months ago and left it closed:
Original close reason(s) were not resolved
Why does Unix give the result 0 when I execute the following command?
echo "7%2" | bc -l
And why does it give the result 1 when I execute the following command?
echo "7%2" | bc

Why does Unix give the result 0 when I execute the command: echo "7%2" | bc -l
From the bc manual:
If bc is invoked with the -l option, a math library is preloaded and the default scale is set to 20.
And
expr % expr
The result of the expression is the "remainder" and it is computed in the following way. To compute a%b, first a/b is computed to scale digits. That result is used to compute a-(a/b)*b to the scale of the maximum of scale+scale(b) and scale(a). If scale is set to zero and both expressions are integers this expression is the integer remainder function.
So:
a=7 b=2
a/b = 7 / 2 = 3.50000000000000000000 # scale is 20
7%2 = a-(a/b)*b =
= 7 - (7/2)*2 =
= 7 - (3.50000000000000000000) * 2 =
= 7 - 7 =
= 0
And why does it give the result 1 when I execute the command: echo "7%2" | bc
From the bc manual:
scale defines how some operations use digits after the decimal point. The default value of scale is 0.
In that case:
a=7 b=2
a/b = 7 / 2 = 3 # scale is 0, so the quotient is truncated
a%b = a-(a/b)*b =
= 7 - (7/2)*2 =
= 7 - 3 * 2 =
= 7 - 6 =
= 1
Because 7/2 is computed with a different scale in each case, the result of the expression differs.
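The manual's definition can be checked with a small emulation using Python's standard decimal module. This is a sketch, not bc itself: `bc_mod` is a hypothetical helper that truncates a/b to `scale` digits (bc truncates rather than rounds) and then computes a-(a/b)*b exactly, which for these inputs equals bc's result at the stated scale.

```python
from decimal import Decimal, ROUND_DOWN, localcontext


def bc_mod(a, b, scale):
    """Emulate bc's a%b: truncate a/b to `scale` digits, then compute a-(a/b)*b."""
    with localcontext() as ctx:
        ctx.prec = 100 + scale   # generous working precision (assumption: small inputs)
        ctx.rounding = ROUND_DOWN  # truncate, like bc
        x, y = Decimal(a), Decimal(b)
        # Step 1: a/b to `scale` digits after the decimal point
        q = (x / y).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_DOWN)
        # Step 2: a - (a/b)*b
        return x - q * y


print(bc_mod("7", "2", 0))   # 1      (like: echo "7%2" | bc)
print(bc_mod("7", "2", 20))  # 0E-20  (zero, like: echo "7%2" | bc -l)
print(bc_mod("7", "3", 20))  # 1E-20  (bc prints .00000000000000000001)
```

The last line shows why a non-zero scale makes % behave as a "remainder of the truncated division" rather than an integer remainder: the leftover is whatever the 20-digit quotient could not cover.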

Related

combining multiple items to create one dummy variable

I have 7 items/variables in Stata that address the same survey question. These 7 items are each different weight control behaviors (diet, exercise, pills, etc.). I am trying to combine these variables to create a single weight control behavior dummy variable that is coded as yes (did engage in weight control) and no (did not engage in weight control).
The response options for each variable look something like this for a given weight control behavior:
dieted
  Freq.  Value  Label
  11438  0      not marked
   2771  1      marked
     16  6      refused
   6508  7      legitimate skip
     13  8      don’t know
Here is my code. I recoded 6, 7 and 8 as missing for all 7 variables:
tab1 h1gh30a-h1gh30g, m
foreach X of varlist h1gh30a-h1gh30g {
replace `X'=. if `X' > 1
}
egen wgt_control= rowmax(h1gh30a-h1gh30g)
ta wgt_control
gen wgt_control_new=wgt_control
replace wgt_control_new = 1 if wgt_control>0 & wgt_control!=.
replace wgt_control_new= 0 if wgt_control <1
ta wgt_control_new
I used rowmax() to combine all 7 items, but my issue is that the response 0 (No) doesn't appear when I tabulate the result. I only get those who responded yes (1).
Here is a suggestion, with a reproducible example, for what I think is the cleanest approach. I have also included some unsolicited advice about survey data best practices.
* Example generated by -dataex-. For more info, type help dataex
clear
input double(h1gh30a h1gh30b h1gh30c)
1 1 1
1 0 1
6 1 8
0 0 0
7 6 8
end
* Explicit coding is better, so if possible (which it is with 7 vars),
* create a local in which the vars are explicitly listed
local wgt_controls h1gh30a h1gh30b h1gh30c
* Recode is a better command to use here. And do not destroy information:
* for survey data quality assurance, there is a difference between a respondent
* refusing to answer, not knowing, or the question being skipped. You can replace
* these survey codes with extended missing values, which behave like missing values
* but retain the differences between the survey codes
recode `wgt_controls' (6=.a) (7=.b) (8=.c)
* While rowmax() could be used, I think anymatch() fits
* what you are trying to do better
egen wgt_control = anymatch(`wgt_controls'), values(1)
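To make the difference between the two egen functions concrete, here is a Python sketch (not Stata) over the -dataex- rows above. It assumes the documented behaviour that anymatch() returns 0 whenever no variable matches, including when every item is missing, while rowmax() returns missing when every item is missing; `any_match` and `row_max` are hypothetical names.

```python
# Survey codes treated as missing (refused, legitimate skip, don't know)
MISSING_CODES = {6, 7, 8}

# Rows from the -dataex- example: (h1gh30a, h1gh30b, h1gh30c)
ROWS = [(1, 1, 1), (1, 0, 1), (6, 1, 8), (0, 0, 0), (7, 6, 8)]


def any_match(row, value=1):
    """Like egen anymatch(), values(1): 1 if any item equals value, else 0 (even if all missing)."""
    return int(any(v == value for v in row))


def row_max(row):
    """Like egen rowmax() after recoding 6/7/8 to missing: None if every item is missing."""
    valid = [v for v in row if v not in MISSING_CODES]
    return max(valid) if valid else None


for row in ROWS:
    print(row, any_match(row), row_max(row))
```

Both approaches give 0 for the all-zero row; they differ only for the last row, where every answer is a survey code.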
There is no minimal reproducible example here, so we can't reproduce the problem independently.
From your code, it seems that h1gh30a-h1gh30g are recoded so that all are 0, 1 or missing, so their row maximum can only take one of those same values.
gen wgt_control_new = wgt_control
replace wgt_control_new = 1 if wgt_control>0 & wgt_control!=.
replace wgt_control_new= 0 if wgt_control <1
seems to boil down to cloning the variable:
gen wgt_control_new = wgt_control
In short, I can't see a reason in your code why you should never see 0 as a possible result.
EDIT
A minimal check on whether there are zeros that aren't showing up as they should might be:
egen max = rowmax(h1gh30a-h1gh30g)
list h1gh30a-h1gh30g if max == 0

How does Python calculate the % operator? Can someone please explain 3%5?

How does Python calculate the % operator? Can someone please explain why 3%5 gives 3 in Python? The answer for 5%3 is also showing 3. I use Python 2.7.
The Python % operator isn't percentage; it's modulo, which gives the remainder of a division. Remember when you were a kid and your math problems would be like 11 divided by 3 = 3 R 2 (remainder 2)? That's what % does: 5 % 3 = 2, and 3 % 5 = 3, because 5 goes into 3 zero times with 3 left over.
If you want to calculate a percentage, do it yourself, e.g. A * 100.0 / B.
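A short illustration of the "quotient R remainder" idea using the built-in divmod(), which returns a // b and a % b together. The helper names are hypothetical, and the code avoids f-strings so it runs on both Python 2.7 and Python 3.

```python
def explain_mod(a, b):
    """Describe a / b as 'quotient R remainder', the way % is explained above."""
    q, r = divmod(a, b)  # q == a // b, r == a % b
    assert a == q * b + r  # identity Python guarantees for integers
    return "{} divided by {} = {} R {}".format(a, b, q, r)


def percentage(part, whole):
    """Percentage is a separate calculation, not the % operator."""
    return part * 100.0 / whole


print(explain_mod(11, 3))  # 11 divided by 3 = 3 R 2
print(explain_mod(3, 5))   # 3 divided by 5 = 0 R 3
print(explain_mod(5, 3))   # 5 divided by 3 = 1 R 2
print(percentage(3, 5))    # 60.0
print(-7 % 3)              # 2 (the result takes the sign of the divisor)
```

The last line is worth knowing: because // rounds toward negative infinity, Python's % result always has the sign of the divisor.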

Replace zeros with missing values in certain cases

I was wondering if anyone knew an easier way of doing the following:
I have a dataset of health facility caseload by year, where each observation is one health facility. Facilities were 'brought online' in different years, so some have zeros before they have values for caseload. Also, some 'discontinue', as in they did provide services, but don't any more. I would like to replace the zeros with missing values for the years in which a facility discontinued. In the following example, the 3rd and 4th facilities discontinued, so I'd like missing for y2014 for the 3rd and y2013 & y2014 for the 4th.
y2011 y2012 y2013 y2014
0 0 76 82
0 0 29 13
0 0 25 0
5 10 0 0
0 0 17 24
I tried the following, which worked, but I'm going to have many years worth of data to work on (2000-2014), so was wondering if there was a more efficient way.
replace y2014=. if y2014==0 & (y2013>0 | y2012>0 | y2011>0)
replace y2013=. if y2013==0 & ( y2012>0 | y2011>0)
replace y2012=. if y2012==0 & ( y2011>0)
I messed around with egen rowlast to identify the facilities with a zero in the last year (meaning they discontinued), but then wasn't sure where to go with it.
Your problem would benefit from a loop over the variables.
We'll initialise started to 0, change our mind about started when we see a positive value, and change any subsequent 0s to missings if started is 1.
gen started = 0
forval y = 2000/2014 {
replace started = 1 if y`y' > 0
replace y`y' = . if started == 1 & y`y' == 0
}
Note that this scheme allows re-starts.
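For readers outside Stata, the same "started" flag logic can be sketched in Python, one facility (row) at a time. `blank_zeros_after_start` is a hypothetical name, and None stands in for Stata's missing value; columns are assumed to be in year order, as in the forval loop.

```python
def blank_zeros_after_start(row):
    """Replace zeros with None once a positive value has been seen."""
    started = False
    out = []
    for value in row:  # years in order, like forval y = 2000/2014
        if value > 0:
            started = True
        out.append(None if started and value == 0 else value)
    return out


# Example data from the question: y2011 y2012 y2013 y2014
FACILITIES = [
    [0, 0, 76, 82],
    [0, 0, 29, 13],
    [0, 0, 25, 0],
    [5, 10, 0, 0],
    [0, 0, 17, 24],
]

for row in FACILITIES:
    print(blank_zeros_after_start(row))
```

As with the Stata loop, a facility that restarts keeps its later positive values; only the zeros in the gap become missing.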
A more general comment is that this is not the best data structure for such panel or longitudinal data. This particular problem is not too challenging, but most problems with such data will be easier after reshape long.
See here for a survey of "rowwise" technique in Stata.

Calculating the distance between characters

Problem: I have a large number of scanned documents that are linked to the wrong records in a database. Each image has the correct ID on it somewhere that says where it belongs in the db.
For example, a DB row could be:
| user_id | img_id | img_loc |
| 1 | 1 | /img.jpg|
img.jpg would have the user_id (1) on the image somewhere.
Method/Solution: Loop through the database. Pull the image text into a variable with OCR and check whether user_id is found anywhere in the variable. If not, flag the record/image in a log; if so, do nothing and move on.
My example is simple; in the real world I have a guarantee that user_id wouldn't accidentally show up on the wrong form (it has a specific format with its own significance).
Right now it is working. However, it is incredibly strict. If you've worked with OCR, you understand how fickle it can be: sometimes a 7 is read as a 1, or a 9 as a 7, etc. The result is a large number of false positives, especially among low-quality scans.
I've addressed some of the image quality issues with preprocessing on my side (increasing the image size, adjusting the black/white threshold) and had satisfying results. I'd like to add the ability for the program to recognize, for example, that "81723103" is not very far from "81923103" (they differ only in the third digit).
The only way I know to do that is to check strings whose length is >= the length of what I'm looking for, calculate the distance between each pair of corresponding characters, compute an average, and set a limit on what counts as a good average.
Some examples:
Ex 1
81723103 - Looking for this
81923103 - Found this
--------
00200000 - distances between characters
0 + 0 + 2 + 0 + 0 + 0 + 0 + 0 = 2
2/8 = .25 (pretty good match. 0 = perfect)
Ex 2
81723103 - Looking
81158988 - Found
--------
00635885 - distances
0 + 0 + 6 + 3 + 5 + 8 + 8 + 5 = 35
35/8 = 4.375 (Not a very good match. 9 = worst)
This way I can tell it "Flag the bottom 30% only" and dump anything with an average distance > 6.
I suspect I'm reinventing the wheel and wanted to share this for feedback. I expect a large increase in run time and a performance hit from doing all these string operations, compared with what I'm currently doing.
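The scheme described above can be sketched in a few lines of Python 3. The function names are hypothetical, and the sliding window over the OCR text is one assumed way to "check strings >= the length" of the target. A mismatch count (Hamming distance) is included as an alternative, since it is a standard measure and is not the asker's method.

```python
DIGITS = set("0123456789")


def avg_digit_distance(target, candidate):
    """Average absolute difference between corresponding digits (the scheme in the question)."""
    if len(target) != len(candidate):
        raise ValueError("strings must have the same length")
    total = sum(abs(int(a) - int(b)) for a, b in zip(target, candidate))
    return total / len(target)


def mismatch_count(target, candidate):
    """Alternative: Hamming distance, i.e. how many positions differ at all."""
    return sum(a != b for a, b in zip(target, candidate))


def best_window(target, ocr_text):
    """Slide a window of len(target) over ocr_text and return (window, score) with the
    lowest average distance, or None if no all-digit window exists."""
    n = len(target)
    best = None
    for i in range(len(ocr_text) - n + 1):
        window = ocr_text[i:i + n]
        if not set(window) <= DIGITS:
            continue  # only compare windows made entirely of digits
        score = avg_digit_distance(target, window)
        if best is None or score < best[1]:
            best = (window, score)
    return best


print(avg_digit_distance("81723103", "81923103"))        # 0.25  (Ex 1)
print(avg_digit_distance("81723103", "81158988"))        # 4.375 (Ex 2)
print(best_window("81723103", "ID: 81923103 date 2014"))  # ('81923103', 0.25)
```

One design consideration: the numeric distance scores a 7 misread as a 1 (distance 6) as much worse than a 7 misread as a 9 (distance 2), although OCR confusions follow glyph shapes rather than numeric value. A mismatch count treats every substitution equally, and an edit distance such as Levenshtein would also cope with dropped or extra characters.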

Regex wheel size or digits after point [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
These are 2 samples:
Size: 15x6.5
Size: 15x7
I need a regex command to capture the digits before "x" and another regex command to capture digits after.
I want to obtain something like this:
Size: 15x6.5 --> 1) 15 2) 6.5
Size: 15x7 --> 1) 15 2) 7
Use regular expression: (\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)
You didn't specify which regular expression engine you are using.
Python
>>> import re
>>> matched = re.search(r'(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)', 'Size: 15x6.5')
>>> matched.groups()
('15', '6.5')
>>> matched = re.search(r'(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)', 'Size: 15x7')
>>> matched.groups()
('15', '7')
Ruby
>> 'Size: 15x6.5'.scan(/(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)/)
=> [["15", "6.5"]]
>> 'Size: 15x7'.scan(/(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)/)
=> [["15", "7"]]
Javascript
> 'Size: 15x6.5'.match(/(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)/)
["15x6.5", "15", "6.5"]
> 'Size: 15x7'.match(/(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)/)
["15x7", "15", "7"]
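UPDATE
If you need two separate regular expressions, use (\d+(?:\.\d+)?)(?=x) for the digits before "x" and (?<=x)(\d+(?:\.\d+)?) for the digits after it. Note that not every regex engine supports lookbehind (?<=...).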
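In Python, the two lookaround patterns can be used like this. `split_size` is a hypothetical helper, and this is a sketch for the two sample inputs rather than a general parser.

```python
import re

# Number immediately followed by "x" (lookahead) and number immediately preceded by "x" (lookbehind)
BEFORE_X = re.compile(r"(\d+(?:\.\d+)?)(?=x)")
AFTER_X = re.compile(r"(?<=x)(\d+(?:\.\d+)?)")


def split_size(text):
    """Return (before, after) as strings, or None if either number is missing."""
    before = BEFORE_X.search(text)
    after = AFTER_X.search(text)
    if before is None or after is None:
        return None
    return before.group(1), after.group(1)


print(split_size("Size: 15x6.5"))  # ('15', '6.5')
print(split_size("Size: 15x7"))    # ('15', '7')
```

Python's re supports fixed-width lookbehind, and (?<=x) is fixed width, so the second pattern compiles as written.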