I have an API that returns the values for n rows in a csv file.
If the table is 100 "rows" (for lack of a better term) and I get the values in every row, it takes x seconds and the response has a size of y KB. If the table is 100 rows and I get the values for half the rows, it takes ~x/2 seconds and has a size of ~y/2 KB. This makes sense.
The problem comes when I get to the real size of the file -- millions of rows. It's not giving me the correct values. The sizes and times don't pass the smell test (they're too low), and they're identical. Whether I pull the entire table, half the table, whatever -- the time it takes to return the values is the same, and the size of what's being returned is the same.
I have a suspicion that it's the printed response. What... it's gonna print four million rows in that wee little panel at the bottom of the screen?
Is there a way to just get the metadata (status, time, size) and not print the response? A setting or something? That's all I need anyways.
(PS: Response is JSON if that matters)
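In case a script outside the GUI client is an acceptable workaround, here is a minimal sketch in Python using the requests library that reports only the metadata and never prints the body. The URL is a placeholder, and the caveats are in the comments:

import requests

url = "https://example.com/api/rows"  # hypothetical endpoint

# stream=True defers downloading the body until it is explicitly read,
# so nothing huge gets rendered or printed.
with requests.get(url, stream=True) as r:
    print("status:", r.status_code)
    # r.elapsed measures time until the headers arrived, not the full transfer.
    print("time:  ", r.elapsed.total_seconds(), "s")
    # Content-Length may be absent for chunked responses; if so, count the
    # bytes without keeping or printing them.
    size = r.headers.get("Content-Length")
    if size is None:
        size = sum(len(chunk) for chunk in r.iter_content(chunk_size=65536))
    print("size:  ", size, "bytes")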
I have a complex survey with numerous skip-logic rules that ends up returning over three dozen columns of mostly empty data, with only certain questions applicable to each respondent's submission. I tried adding a column at the end to grab every non-blank cell in the row and concatenate them all into one cell:
=ifna(textjoin("|",true,filter($A$2:$AO$2&"_"&A3:AO3,A3:AO3>0)))
This yielded me one cell per row with everything I needed - including the column headers so I could parse the data (without all the blanks) by looking only at that one column.
However, each time a new response comes in, it shifts all the data down, so I constantly need to go in and add the formula to the new rows. I tried moving the formula to another tab entirely:
=ifna(textjoin("|",true,filter(Eureka!$A$2:$AO$2&"_"&Eureka!A3:AO3,Eureka!A3:AO3>0)))
This formula also will not correct itself once new data appears on the Eureka tab. So I filled that formula down in one long column... it works perfectly on every response up to that point. But when a new response comes in (at row 274, as an example), all of the formulas below row 274 automatically add a row to their references. So if my formula in row 274 has ranges like A274:AO274, then once a response lands on row 275, the formula on row 275 has jumped up by one to A276:AO276 (or to 298 or 343, depending on the number of new responses).
So I want to make my formula act as an arrayformula:
=ifna(arrayformula(textjoin("|",true,filter(Eureka!$A$2:$AO$2&"_"&Eureka!A3:AO,Eureka!A3:AO>0))))
but TEXTJOIN only works on a single row or column at a time, so this keeps giving me an error.
I think I need to use MAP/LAMBDA possibly or some kind of REPT, but I just can't seem to crack it.
And in full disclosure, my ultimate goal would be to actually have each question returned on its own row so that the first two columns get repeated for every question vertically. But I think once I get the original question addressed, I can figure out how to do that.
TEXTJOIN in arrayformula?
The following formula should produce the result you desire:
=BYROW(
  BYCOL(
    FILTER(Eureka!A2:AO, Eureka!A2:A <> ""),
    LAMBDA(col,
      ARRAYFORMULA(
        CONCAT(
          ARRAYFORMULA(IF(
            ISBLANK(FILTER(col, {FALSE; TRANSPOSE(SPLIT(REPT(TRUE&CHAR(127), ROWS(col)-1), CHAR(127), TRUE, TRUE))})),
            ,
            ARRAY_CONSTRAIN(col, 1, 1)&"_"
          )),
          FILTER(col, {FALSE; TRANSPOSE(SPLIT(REPT(TRUE&CHAR(127), ROWS(col)-1), CHAR(127), TRUE, TRUE))})
        )
      )
    )
  ),
  LAMBDA(row, TEXTJOIN("|", TRUE, row))
)
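(The repeated FILTER(col, {FALSE; TRANSPOSE(SPLIT(REPT(...)))}) construct builds a boolean mask that drops the header cell from each column.) For reference, the same row-wise transformation sketched in Python, assuming the Eureka tab is exported to a CSV file (eureka.csv is a hypothetical name) with the headers in row 1:

import csv

with open("eureka.csv", newline="") as f:
    rows = list(csv.reader(f))

headers, data = rows[0], rows[1:]
for row in data:
    # Join "Header_value" pairs with "|", skipping blank cells --
    # the same effect as FILTER(...) fed into TEXTJOIN("|", TRUE, ...).
    print("|".join(h + "_" + v for h, v in zip(headers, row) if v != ""))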
I am trying to simplify a table that shows the amount of time people are working on certain jobs, and I want to present the dataset in a table that only shows the values greater than zero.
The image below shows how the table currently looks, where each person has a % of their time allocated to 1 of 5 jobs across columns.
I am trying to create a table that looks like the below, where it only shows the jobs that each person is working on, and excludes the ones where they have no % of their time allocated.
Wondering if I am going about this in the wrong fashion, any help greatly appreciated!
Thanks
I have been trying to use an INDEX/MATCH function with some IF logic for values greater than zero, but I have only been able to get the first value greater than zero to populate.
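For what it's worth, the reshape being described can be sketched in Python, assuming the table is exported to a CSV (allocations.csv is a hypothetical name) with a name column followed by one column per job:

import csv

with open("allocations.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)               # e.g. Name, Job 1, ..., Job 5
    for row in reader:
        name, values = row[0], row[1:]
        for job, pct in zip(header[1:], values):
            # Keep only non-zero allocations; assumes plain numeric
            # cells, e.g. 0.25 rather than "25%".
            if pct and float(pct) > 0:
                print(name, job, pct)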
I created a standard RRDTool database with a default step of 5min (300s).
I have different types of values in it: some gauges, which are easily processed, but also other values I would store as COUNTER, and here is my problem:
I read the data in a program, and taking the difference between values over two steps works, but the counter increments by less than the elapsed seconds (it can increment by less than 300 during a step), so my output value is wrong.
Is it possible to change COUNTER so that it is not per second but per step, or something like that? If not, I suppose I have to calculate the difference in my program?
Thank you for helping.
RRDTool is capable of handling fractional values, so there is no problem if the counter increments by less than the seconds interval since the last update.
RRDTool stores everything as a rate. If your DS is of type GAUGE, then RRDTool assumes the incoming value is already a rate, and only applies Data Normalisation (more on this later). If the type is COUNTER or DERIVE, then the value/timepoint you are updating with is compared to the previous value/timepoint to obtain a rate, thus: r = (x2 - x1)/(t2 - t1). The rate obtained is then normalised. The other DS type is ABSOLUTE, which assumes the counter was reset on the last read, giving r = x2/(t2 - t1).
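As a sketch of those per-type rules in Python, with (t1, x1) the previous update and (t2, x2) the current one (real RRDTool also corrects COUNTER overflow, which is omitted here):

def to_rate(ds_type, x1, x2, t1, t2):
    if ds_type == "GAUGE":
        return x2                      # already a rate; only normalisation applies
    if ds_type in ("COUNTER", "DERIVE"):
        return (x2 - x1) / (t2 - t1)   # r = (x2 - x1)/(t2 - t1)
    if ds_type == "ABSOLUTE":
        return x2 / (t2 - t1)          # counter assumed reset at the last read
    raise ValueError("unknown DS type: " + ds_type)

to_rate("COUNTER", 1000, 1150, 0, 300)   # an increase of 150 over 300s -> 0.5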
The Normalisation step adjusts the data point based on assuming a linear progression from the last data point so that it lies exactly on an interval boundary. For example, if your step is 5min, and you update at 12:06, the data point is adjusted back to what it would have been at 12:05, and stored against 12:05. However the last unadjusted DP is still preserved for use at the next update, so that overall rates are correct.
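The adjustment described above is essentially linear interpolation back to the step boundary. A simplified sketch of the idea (the 12:06 update in the example becomes a value stored against 12:05; real RRDTool works on the rates rather than raw values):

def normalise(t_prev, v_prev, t_now, v_now, step=300):
    # Most recent step boundary at or before the update time.
    boundary = (t_now // step) * step
    # Assume a linear progression between the two updates.
    frac = (boundary - t_prev) / float(t_now - t_prev)
    return boundary, v_prev + (v_now - v_prev) * frac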
So, if you have a 300s (5min) interval, and the value increased by 150, the rate stored will be 0.5.
If the value you are graphing is something small, e.g. 'number of pages printed', this might seem counterintuitive, but it works well for large rates such as network traffic counters (which is what RRDTool was designed for).
If you really do not want to display fractional values in the generated graphs or output, you can use a format string such as %.0f to enforce zero decimal places; the displayed number will then be rounded to the nearest integer.
I am trying to create a COMMAND JSON datasource to monitor some values, for example from a script such as this:
import json
from random import random

# 'values' maps a component id to its datapoints; the empty-string key
# means the value is not tied to a named component.
print json.dumps({
    'values': {
        '': {'random': random()},
    },
    'events': []
})
When I start zencommand, the appropriate RRD file is created, but the cur, avg and max values on the graph show me NaN. Those NaNs are replaced by actual numbers when I zoom in to the current point in time, which is not very far from the start of monitoring.
Why doesn't it show correct min, max and avg values before I zoom in? Is that somehow related to consolidation? I read http://www.vandenbogaerdt.nl/rrdtool/min-avg-max.php, but that page doesn't say anything about NaN values.
And is there any way to zoom in to the current timestamp more quickly, to see some data sooner?
When you are zoomed out, you'll be looking at the lower-granularity RRAs (Round Robin Archives). These do not get populated until enough data are in the higher-granularity ones; so, for example, if you have a 5min-granularity RRA, a 1hr-granularity RRA, and a 1day-granularity RRA, and have collected data for the last 45min, then you will see ~8 data points in your 'daily' graph (which uses the 5min RRA), but nothing in your 'monthly' (which will use the 1hr RRA) or your 'yearly' (which uses the 1day RRA).
This applies to any RRA; AVG, LAST, MAX, etc. Until the consolidated time window is complete, and the full complement of Primary Data Points has been collected for consolidation, the consolidated data point value is undefined.
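A toy sketch of that rule for an AVERAGE RRA built from 5min PDPs (12 per 1hr bucket; RRDTool's xff tolerance for partially unknown windows is ignored here):

def consolidate_avg(pdps, steps_per_bucket=12):
    # The bucket stays undefined until a full complement of PDPs exists.
    if len(pdps) < steps_per_bucket or any(p is None for p in pdps):
        return float("nan")
    return sum(pdps) / len(pdps)

consolidate_avg([0.5] * 9)    # 45min of 5min PDPs -> nan in the 1hr RRA
consolidate_avg([0.5] * 12)   # full hour collected -> 0.5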
RRDTool picks the RRA to use based on the requested graph data width and pixel width, as well as the requested consolidation functions. Although there are ways to force RRDtool to use a higher-granularity RRA than it needs and to consolidate on the fly, this is inefficient and slow. It also makes having the lower-granularity RRA pointless, and throws away one of the major benefits of RRDtool: that it performs consolidation at update time, making graphing faster.
I have been querying Geonames for parks per state. Mostly there are under 1000 parks per state, but I just queried Connecticut, and there are just under 1200 parks there.
I already got the 1-1000 with this query:
http://api.geonames.org/search?featureCode=PRK&username=demo&country=US&style=full&adminCode1=CT&maxRows=1000
But increasing maxRows to 1200 gives an error that I am requesting too many rows at once. Is there a way to query for rows 1000-1200?
I don't really see how to do it with their API.
Thanks!
You should use the startRow parameter in the query to page through results. The documentation notes that it takes an integer value (0-based indexing) and should be
Used for paging results. If you want to get results 30 to 40, use startRow=30 and maxRows=10. Default is 0.
So to get the next 1000 data points (1000-1999), you should change your query to
http://api.geonames.org/search?featureCode=PRK&username=demo&country=US&style=full&adminCode1=CT&maxRows=1000&startRow=1000
I'd suggest reducing the maxRows to something manageable as well - something that will put less of a load on their servers and make for quicker responses to your queries.
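Putting both suggestions together, a sketch of the paging loop in Python (requests assumed available; swap username=demo for your own account, and the parsing assumes the default XML layout with one geoname element per result):

import requests
import xml.etree.ElementTree as ET

BASE = "http://api.geonames.org/search"
params = {
    "featureCode": "PRK", "country": "US", "adminCode1": "CT",
    "style": "full", "username": "demo",  # use your own username
    "maxRows": 200,                       # smaller pages are kinder to the server
}

names, start = [], 0
while True:
    r = requests.get(BASE, params=dict(params, startRow=start))
    r.raise_for_status()
    batch = ET.fromstring(r.content).findall("geoname")
    if not batch:
        break                             # past the last page
    names.extend(g.findtext("name") for g in batch)
    start += params["maxRows"]

print(len(names), "parks")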