Is the flowchart of the program correct?

Calculate one child's allowance, based on 75 cents per year of age if the child is under 10, and $1 per year of age if 10 or over.
I could have set the rate variables at the beginning as well, but is the use of them after the decision box correct?

Yes, it's functionally correct.
You could simplify it, if you wish, by using just one variable to store the rate (rather than rate1 / rate2) and using the decision block to set that variable differently. You then wouldn't need to duplicate the last two statements.
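For illustration, a minimal sketch of that single-variable version (the original names no language, so this is Python, and the identifiers are made up):

age = int(input("How old is the child? "))
if age < 10:
    rate = 0.75   # 75 cents per year of age under 10
else:
    rate = 1.00   # one dollar per year of age at 10 or over
allowance = rate * age   # the last two steps are now shared rather than duplicated
print(f"Allowance: ${allowance:.2f}")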

How to automatically feed a cell a value from a range of values, based on a matching condition with another cell's value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (a list I also have to extend from time to time), and I have about 8 predefined significance values that I have decided to relate to these 28 types of work.
As soon as I enter a type of work in one cell, I want the adjacent cell to be populated automatically with the significance value (from the range of 8 values) that I have predefined for it.
Every time I input a type of work, new or old, the adjacent cell should automatically be matched with its relevant significance value and populated in real time.
I know how to do it using IF, IFS, and IF with OR conditions, but given the ever-expanding types of work and significance values, such formulas will become very long, complicated, and repetitive in the future. I feel there's a more efficient way to achieve this. Also, I don't want the value to be selected from a drop-down list.
Please help me out with the most efficient way to handle this. Thanks in advance. :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try:
=XLOOKUP(D2,A2:A,B2:B)
Or the FILTER() function, like:
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual table so that the second argument includes all the potential values and their significances.
This is the formula that worked for me (for anybody's reference):
I created a separate reference sheet listing the types of work and their significance. Against that sheet, I can use either VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms to input my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

Stata: estimating monthly weighted mean for portfolio

I have been struggling to write efficient code to estimate the monthly weighted mean of portfolio returns.
I have the following variables:
firm stock returns (ret)
month1, year1 and date
portfolio (port1): this defines the portfolio of the firm's stock returns
market capitalisation (mcap): used to estimate weights (by month1 year1 port1)
I want to calculate weighted returns for each month and portfolio, weighted by the market capitalisation (mcap) of each firm.
I have written the following code, which works without fail but takes ages and is highly inefficient:
foreach x in 11 12 13 21 22 23 {
    display `x'
    forvalues y = 1980/2010 {
        display `y'
        forvalues m = 1/12 {
            display `m'
            tempvar tmp_wt tmp_tm tmp_p
            egen `tmp_tm' = total(mcap) if month1==`m' & year1==`y' & port1==`x'
            gen `tmp_wt' = mcap/`tmp_tm' if month1==`m' & year1==`y' & port1==`x'
            gen `tmp_p' = ret*`tmp_wt' if month1==`m' & year1==`y' & port1==`x'
            gen port_ret_`m'_`y'_`x' = `tmp_p'
        }
    }
}
The data look as shown in the image: [Data for value-weighted portfolio return]
This does appear to be a casebook example of how to do things as slowly as possible, except that naturally you are not doing that on purpose. All it lacks is a loop over observations to calculate totals. So, the good news is that you should indeed be able to speed this up.
It seems to boil down to
gen double wanted = .
bysort port1 year1 month1 : replace wanted = sum(mcap)
by port1 year1 month1 : replace wanted = (mcap * ret) / wanted[_N]
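If what you ultimately want is a single weighted mean per portfolio-month rather than per-observation contributions, one further pass under the same sort order would sum them (a sketch, on that assumption):

by port1 year1 month1 : replace wanted = sum(wanted)
by port1 year1 month1 : replace wanted = wanted[_N]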
Principle. To get a sum as a single scalar, use summarize, meanonly rather than egen, total(), which puts that scalar into a whole variable repeatedly. When you need group sums in a variable, as here, use sum() with by:. sum() returns cumulative sums, so you want the last value of the cumulative sum.
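For example, a minimal sketch of both idioms (variable names as above; tot is made up):

summarize mcap, meanonly
display r(sum)                                   // one scalar, no variable created
bysort port1 year1 month1 : gen double tot = sum(mcap)
by port1 year1 month1 : replace tot = tot[_N]    // group total in every observation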
Principle. Loops (here using foreach) are not needed when a groupwise calculation can be done under the aegis of by:. That is a powerful construct which Stata programmers need to learn.
Principle. Creating lots of temporary variables, here 6 * 31 * 12 * 3 = 6696 of them, is going to slow things down and use more memory than is needed. Each time you execute tempvar and follow with generate commands, there are three more temporary variables, each the size of a column in a dataset (that's what a variable is in Stata), but once they are used they are just left in memory and never looked at again. It's a subtlety with temporary variables that a tempvar assigns a new name every time, but it should be clear that generate creates a new variable every time; generate will never overwrite an existing variable. The temporary variables would all be dropped at the end of a program, but by the end of that program you are holding a lot of stuff unnecessarily, possibly the size of the dataset multiplied by about one thousand. If that temporarily expanded dataset could not all fit in memory, you will slow Stata to a crawl.
Principle. Using if obliges Stata to check each observation in turn; in this case most are irrelevant to the particular intersection of loops being executed and you make Stata check almost all of the data set (a fraction of 2231/2232, almost 1) irrelevantly while doing each particular calculation for 1/2232 of the dataset. If you have more years, or more portfolios, the fraction looked at irrelevantly is even higher.
In essence, Stata will obey your instructions (and not try any kind of optimization -- your code is interpreted utterly literally) but by: would give the cross-combinations much more rapidly.
Note. I don't know how big or how close to zero these numbers will get, so I gave you a double. For all I know, a float would work fine for you.
Comment. I guess you are being influenced by coding experience in other languages where creating variables means something akin to x = 42 to hold a constant. You could do that in Stata too, with scalars or local or global macros, not to mention Mata. Remember that a new variable in Stata is an entire new column in the dataset, regardless of whether it is holding a constant or different values in each observation. You will get what you ask for, but it is more like getting an array every time. Again, it seems that you want as an end result just one new variable, and you do not in fact need to create any others temporarily at all.

How do I "fill down" a dataset in Stata, but for only a certain number of rows?

I have a dataset with observations at specific timepoints, but those timepoints (and the length of time between them) vary by group. I'm trying to "fill down" the data so that existing observations are carried down into missing cells, but only for a certain number of rows after the original observation. So, for example, I could have a dataset with group, year, value, and cyclelength variables:
For group A, I'd want to fill in the value for 2002 with 2001's value, 2004 with 2003, etc. I wouldn't want to fill in 2000 at all, since I don't have the preceding value. And I ALSO wouldn't want to fill in the 2011 value, because the "cyclelength" variable tells me that group A's observations are supposed to take place every two years, so I don't want to carry data forward past that. 2011 is just a genuinely missing value.
Similarly, in group B, I'd want to carry 2000's value forward into years 2001, 2002, and 2003 (because the "cyclelength" here is 4 years). I'd want to carry 2004's value into 2005, 2006, and 2007, but not beyond that--the later years should stay missing.
I've tried setting this up with the "carryforward" command, but haven't figured out how to have it stop filling down after a specified number of years that varies by group. Is there a way to do this, either with carryforward or otherwise?
This is a variation on a problem documented since 2000 as an FAQ: see here
The variation lies in limiting how far non-missing values are copied. But it falls easily to the same idea.
First record the year in which the value was last known, and copy that down the dataset:
gen when_last_known = year if !missing(value)
bysort group (year) : replace when_last_known = when_last_known[_n-1] if missing(when_last_known)
Now the replacement wanted is
by group : replace value = value[_n-1] if missing(value) & (year - when_last_known) < cyclelength
That statement presupposes the sort order of the previous statement.
On Statalist (see here) you'd be expected to document that carryforward is a user-written command to be installed from SSC. That's a good convention here too.
In practice, it's good data management to keep the original data exactly as they arrive and do this on a clone of the variable. Sooner or later someone will ask to see the original values, and then you could be seriously embarrassed.
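A minimal sketch of that advice, assuming the variables are named value, group, year, and cyclelength as above (clonevar keeps labels and formats):

clonevar value_filled = value
gen when_last_known = year if !missing(value_filled)
bysort group (year) : replace when_last_known = when_last_known[_n-1] if missing(when_last_known)
by group : replace value_filled = value_filled[_n-1] if missing(value_filled) & (year - when_last_known) < cyclelength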

How to record total values with rrdtool

I'm pretty sure this question has been asked several times, but either I did not find the correct answer or I didn't understand the solution.
To my current problem:
I have a sensor which measures the time a motor is running.
The sensor is reset after reading.
I'm not interested in how long the motor was running during the last five minutes; I'm interested in how long it has been running since the very beginning (or since the last reset).
When storing the values in an rrd, the recorded values depend on the data source type.
With GAUGE, the value read is 3000 (tenths of a second) every five minutes.
With ABSOLUTE, the value is 10 every five minutes.
But what I would like to get is something like:
3000 after the first 5 minutes
6000 after the next 5 minutes (last value + 3000)
9000 after another 5 minutes (last value + 3000)
The accuracy of the older values (and slopes) is not so important, but the last value should reflect the time since the beginning as accurately as possible.
Is there a way to accomplish this?
I don't know whether it is useful for your need or not, but maybe the TREND/TRENDNAN CDEF function is what you want; have a look here:
TREND CDEF function
I have now created a small SQLite database with one table and one column in that table.
The table has one row. Every time my cron job runs, I update that row by adding the latest sensor reading to the stored value, so the value in that single row and column is the accumulated total from my sensor. This is then fed into the rrd.
Any other (better) ideas?
The way that I'd tackle this (in Linux) is to write the value to a plain-text file and then use the value from that file for the RRDTool graph. Using SQLite (or any other SQL database) would be unnecessarily heavy on the system just to keep track of something like this.
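For what it's worth, a minimal sketch of that file-based approach as a cron script (the paths and the read_sensor command are illustrative assumptions, not real tools):

#!/bin/sh
# keep a running total of motor runtime in a plain-text file
STATE=/var/lib/motor/total.txt
delta=$(read_sensor)   # hypothetical: prints tenths of seconds since the last read, then resets
total=$(( $(cat "$STATE" 2>/dev/null || echo 0) + delta ))
echo "$total" > "$STATE"
# store the running total as a GAUGE so rrdtool records it as-is: 3000, 6000, 9000, ...
rrdtool update /var/lib/motor/motor.rrd N:$total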

Most efficient way to process complex histogram data?

I'm currently implementing a histogram that will show very large-scale data using Qt, and I have some doubts about which data structure(s) I should use for my problem. I will be displaying the number of queries received from users of an application, all in a single application that shows a different histogram depending on which "show me this data" button is clicked. The displays are as follows:
1) Display the histogram of total queries per month. There are 4 months of data here; I kept four variables and incremented them as I encountered queries belonging to those months in the CSV file.
2) Display the histogram of total queries per day in a selected month. I was thinking of using 4 QVectors, one per month, incrementing an element of a vector each time I come across its day; e.g. if a vector represents August, then whenever I come across a record with 2011-08-XY, I increment the (XY + 1)th element of that vector by 1. My second alternative is to use 4 QLinkedLists for the sake of better complexity, but I'm not sure the approaches I've come up with are efficient enough, and I'm willing to listen to any other idea.
3) Here's where things get a bit complicated: display the histogram of total queries per hour on a selected day and month. The amount of data to represent grows vastly here, and I don't know which data structure, or combination of structures, I should use to implement this one. A list of lists perhaps?
Any ideas on problems 2) and 3) would be helpful. Thanks in advance.
Actually, it shouldn't be too unmanageable to always count queries per hour. Assuming that the number of queries per hour never exceeds the maximum int value, that's only 24 ints per day, at 32 or 64 bits each depending on your machine. Assuming 32 bits, you could store up to 28 years' worth of data per MB.
There's no need to store the month/year: your program can work that out. Just assign hour 0 to the earliest point in your data, which you keep as a constant, then work out the date based on the hours passed since then.
This avoids having to have a list of lists or anything fancy. Just use an array where each index is the number of hours since hour 0 and each element holds the number of queries for that hour.
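A minimal sketch of that idea in Qt terms (the class and member names are made up for illustration):

#include <QVector>
#include <QDateTime>

class HourlyCounts {
public:
    explicit HourlyCounts(const QDateTime &epoch) : m_epoch(epoch) {}

    void addQuery(const QDateTime &when) {
        // index = whole hours elapsed since hour 0
        const int hour = static_cast<int>(m_epoch.secsTo(when) / 3600);
        if (hour < 0)
            return;                     // ignore anything before the epoch
        if (hour >= m_counts.size())
            m_counts.resize(hour + 1);  // grow as new data arrives
        ++m_counts[hour];
    }

    // daily and monthly totals are just sums over slices of this array
    int queriesInHour(int hour) const { return m_counts.value(hour, 0); }

private:
    QDateTime m_epoch;      // earliest timestamp in the data: "hour 0"
    QVector<int> m_counts;  // one counter per hour since the epoch
};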
Why don't you simply use a classical database?
When you start asking these kinds of questions, it is a good time to consider a more robust structure. There are multiple data structures implemented inside any DB, each optimized for different access patterns. You should consider at least lookup, insertion, deletion, and range queries. No structure beats all others on every cost, so there is always a trade-off.
Qt has some database classes you can use. I have never used the Qt SQL library, but I think you should give it a shot. Fortunately, there is a Qt SQL programming guide at the end of the linked page.
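For reference, a hedged sketch of what that route could look like with the bundled SQLite driver (the table and column names are made up; it assumes a default connection was opened with QSqlDatabase::addDatabase("QSQLITE") at startup):

#include <QDateTime>
#include <QSqlDatabase>
#include <QSqlQuery>
#include <QVariant>

bool logQuery(const QDateTime &when) {
    QSqlQuery q(QSqlDatabase::database());   // default connection
    q.exec("CREATE TABLE IF NOT EXISTS queries (ts TEXT)");
    q.prepare("INSERT INTO queries (ts) VALUES (?)");
    q.addBindValue(when.toString(Qt::ISODate));
    return q.exec();
}

// A per-hour histogram for one day then becomes a GROUP BY query:
//   SELECT strftime('%H', ts), COUNT(*) FROM queries
//   WHERE date(ts) = '2011-08-15' GROUP BY 1;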