How to extract time series with NCO

I have a single netcdf file with monthly precipitation data from 1998-2016. The extent of the file is (14.75,-90.5,15.75,-88.75) with a cell size of 0.25 by 0.25. So 8 columns and 5 rows. How can I make time series graphs using NCO for specific cells or for the whole extent?

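A time series for a single grid point can be extracted with, e.g.,
ncks -d lat,15.0 -d lon,-90.3 in.nc out.nc
The whole extent can be averaged at each time step with
ncwa -a lat,lon in.nc out.nc
Without -a, ncwa averages over every dimension, including time, so no time series would remain. The NCO manual covers both commands in detail.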
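To make the two operations concrete, here is a minimal pure-Python sketch of what they are meant to compute: a nearest-cell time series and an unweighted spatial mean per time step. The toy grid, the value pattern, and the helper names are hypothetical; only the coordinates follow the extent and cell size given in the question.

```python
# Cell-centre coordinates taken from the extent in the question
# (5 rows, 8 columns, 0.25 degree spacing).
lats = [14.75 + 0.25 * i for i in range(5)]
lons = [-90.5 + 0.25 * j for j in range(8)]

# Hypothetical stand-in for the data: precip[t][i][j], three time steps.
n_steps = 3
precip = [
    [[float(t * 100 + i * 10 + j) for j in range(len(lons))]
     for i in range(len(lats))]
    for t in range(n_steps)
]


def nearest_index(coords, value):
    """Index of the coordinate closest to value."""
    return min(range(len(coords)), key=lambda k: abs(coords[k] - value))


def point_series(data, lat, lon):
    """Time series at the grid cell nearest to (lat, lon)."""
    i = nearest_index(lats, lat)
    j = nearest_index(lons, lon)
    return [step[i][j] for step in data]


def extent_mean_series(data):
    """Unweighted mean over all cells, one value per time step."""
    series = []
    for step in data:
        cells = [v for row in step for v in row]
        series.append(sum(cells) / len(cells))
    return series


point = point_series(precip, 15.0, -90.3)
extent = extent_mean_series(precip)
```

Either list can then be handed to any plotting tool. The mean here is not area-weighted, which is a simplification that matters little over an extent this small.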

Pivoting multiple files into one

I am facing a challenge right now.
I need to combine various CSV files into one database. The "catch" is:
The files have the columns as rows;
The files may have different number of rows (columns).
Here is an example below:
File1
Agent Name,xpto1
Agent Email,xpto1#abcd.com
Date,04/18/2019 14:58:25
Time Zone,Europe/Lisbon
Time Filter Begin,04/17/2019 00:00:00
Time Filter End,04/17/2019 23:00:00
Total Unavailable Time (hh:mm:ss),00:00:00
Total Offline Time (hh:mm:ss),00:47:50
File2
Agent Name,xpto2
Agent Email,xpto2#abcd.com
Date,04/18/2019 14:58:25
Time Zone,Europe/Lisbon
Time Filter Begin,04/17/2019 00:00:00
Time Filter End,04/17/2019 23:00:00
Total Auto Answered Calls Notifications Offered,27
Total Number of Calls Notification Offered,42
Total Auto Answered Calls Connected,15
Total Number of Calls Connected,15
Total Number of Multicast Call Notifications Offered,15
Total Unavailable Time (hh:mm:ss),00:02:32
Total Offline Time (hh:mm:ss),14:54:48
Total Wrap-up Time (hh:mm:ss),01:40:37
Total Email Time (hh:mm:ss),00:42:50
Total In-call Time (hh:mm:ss),03:57:42
Total Available Time (hh:mm:ss),01:04:56
Total Meal Time (hh:mm:ss),01:00:59
Total Break Time (hh:mm:ss),00:35:39
Basically, I need to combine about a hundred of these files into a database format:
Column 1, Column 2, etc., with the corresponding values below each column, and nulls/zeros where a file lacks a row that another file has.
Any clue how to do this?
I've tried to pivot all the files after merging but I am getting an error:
"Expression.Error: There were too many elements in the enumeration to complete the operation.
Details:
List"
Thanks in advance!
If you have all of these files in a folder, then you can connect to that folder.
If you click the double down arrow in the Content column, Power Query should automatically run a series of steps and give you a table with the field names in Column1 and their values in Column2.
From here, pivot Column1 using Column2 as the Values Column, with Don't Aggregate selected under Advanced options. This should give you one column per field, which is what you're looking for.
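Outside Power Query, the same reshaping can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the two reports are shortened versions of the examples in the question, and missing fields are filled with empty strings.

```python
import csv
import io


def read_key_value(text):
    """Parse one 'field,value' report into a dict."""
    record = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:
            record[row[0].strip()] = row[1].strip()
    return record


def combine(reports):
    """Turn several reports into one header plus one row per report."""
    records = [read_key_value(text) for text in reports]
    columns = []
    for record in records:
        for name in record:
            if name not in columns:  # keep first-seen column order
                columns.append(name)
    # Fields a report does not contain become empty strings.
    rows = [[record.get(name, "") for name in columns] for record in records]
    return columns, rows


# Shortened versions of File1 and File2 from the question.
file1 = "Agent Name,xpto1\nTotal Offline Time (hh:mm:ss),00:47:50\n"
file2 = (
    "Agent Name,xpto2\n"
    "Total Number of Calls Connected,15\n"
    "Total Offline Time (hh:mm:ss),14:54:48\n"
)

columns, rows = combine([file1, file2])
```

With real files, each report would be read from disk instead of from a string, and the result could be written out with csv.writer.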

Visualisation in power BI

I am having trouble visualizing my output in Power BI. I have imported more than 1500 CSV files, but every time I visualize them, the chart shows only 10 CSV files that seem to be randomly selected from the 1500.
How can I see all 1500 CSV files visualized at once rather than just 10?
If that's not possible, how are the 10 CSV files selected out of the 1500? Is there a calculation involved, or is it just a random selection that Power BI does on its own?
In Image4, I would like to know how the calculation is done for Average, Sum, Median, and Maximum.
I have attached screenshots for reference. I tried using various filters, but none gave me the desired output. In Image4 you can see that I can select Sum, Average, Minimum, Maximum, and other filters, but none of them worked.
Power BI limits the number of data points it can draw on a graph.
Since January you have more options with the high-density line charts.
You can read more about it in the Power BI documentation on high-density sampling.

Calculating median on an Excel file

I want to calculate the medians for a series of numbers from an Excel file.
My Excel spreadsheet looks like this:
CELLNOUN 9.32
CELLNOUN 10.62
CELLNOUN 8.42
CELLNOUN 10.64
CELLNOUN 11.51
CELLNOUN 12.01
CELLNOUN 8.83
CELLSNOUN/CELLNOUN 9.53
CELLSNOUN/CELLNOUN 9.21
CELLNOUN/CELLSNOUN 10.76
CELLNOUN/CELLSNOUN 7.01
CELLSNOUN/CELLNOUN 10.21
PLANTNOUN/PLANTSNOUN 3.62
PLANTNOUN/PLANTSNOUN 3.38
PLANTSNOUN/PLANTNOUN 3.92
PLANTSNOUN/PLANTNOUN 3.24
PLANTNOUN/PLANTSNOUN 3.83
PLANTNOUN/PLANTSNOUN 3.24
PLANTSNOUN/PLANTNOUN 3.00
PLANTSNOUN/PLANTNOUN 1.80
...
In the spreadsheet, each set of words is separated by a blank row, but the number of entries in each set varies: CELLNOUN/CELLSNOUN has 12 entries, for example, but PLANTNOUN/ has 8. The numbers after the words are the occurrences of these words. I want to find the median of the occurrences for CELLNOUN/CELLSNOUN, PLANTNOUN/PLANTSNOUN, etc., using Regex instead of the MEDIAN function in Excel, because I have thousands of sets like this and I can't do them one by one in Excel. But if you know a quicker way to do it in Excel, please advise.
Thank you very much.
First of all, remove the blank rows from your data set and then create an Excel Table with Insert > Table or Ctrl-T. With an Excel table object, all functions and commands that refer to the table will catch when more data is added to the table.
Now you can create a pivot table from your source data with Insert > PivotTable. If you drag the first column field into the Rows area, you will get a list of the unique values in that source data column. You can also drag the values column into the Values area of the Pivot Panel if you want to.
I'm not sure if you are aware of the different spellings of your categories, i.e. with or without an "S". The pivot table uncovers them all.
Out of the box, Excel PivotTables do not offer the median as an aggregation option, but you can calculate one with the method outlined at http://www.myonlinetraininghub.com/calculating-median-in-pivottables
The exact approach varies depending on whether or not you use Pivot tables or Power Pivot, so check out the article.
Use a formula like the one below and press Ctrl+Shift+Enter to enter it as an array formula:
=MEDIAN((IF($A$1:$A$20=A1,$B$1:$B$20)))
Copy the formula down the column so that each row shows the median for its own category, and adjust the ranges to cover your data.
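If the data can be exported from Excel, the grouping can also be sketched in Python with the standard library. This is a sketch under one loud assumption: labels such as CELLSNOUN/CELLNOUN and CELLNOUN/CELLSNOUN are treated as the same set by sorting their parts, which may or may not match what you intend. The function names are hypothetical; the rows are the ones shown in the question.

```python
from collections import defaultdict
from statistics import median

# The rows shown in the question, as (label, occurrences) pairs.
rows = [
    ("CELLNOUN", 9.32), ("CELLNOUN", 10.62), ("CELLNOUN", 8.42),
    ("CELLNOUN", 10.64), ("CELLNOUN", 11.51), ("CELLNOUN", 12.01),
    ("CELLNOUN", 8.83),
    ("CELLSNOUN/CELLNOUN", 9.53), ("CELLSNOUN/CELLNOUN", 9.21),
    ("CELLNOUN/CELLSNOUN", 10.76), ("CELLNOUN/CELLSNOUN", 7.01),
    ("CELLSNOUN/CELLNOUN", 10.21),
    ("PLANTNOUN/PLANTSNOUN", 3.62), ("PLANTNOUN/PLANTSNOUN", 3.38),
    ("PLANTSNOUN/PLANTNOUN", 3.92), ("PLANTSNOUN/PLANTNOUN", 3.24),
    ("PLANTNOUN/PLANTSNOUN", 3.83), ("PLANTNOUN/PLANTSNOUN", 3.24),
    ("PLANTSNOUN/PLANTNOUN", 3.00), ("PLANTSNOUN/PLANTNOUN", 1.80),
]


def canonical(label):
    """Assumption: 'A/B' and 'B/A' name the same set, so sort the parts."""
    return "/".join(sorted(label.split("/")))


def group_medians(pairs):
    """Median of the values for each canonical label."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[canonical(label)].append(value)
    return {label: median(values) for label, values in groups.items()}


medians = group_medians(rows)
```

Dropping the call to canonical() keeps the two spellings apart, which mirrors what the pivot table shows.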

How to do proportionate stratified sampling without replacement?

I want to select my sample in Stata 13 based on three stratum variables with 12 strata in total (size - two strata; sector - three strata; intangible intensity - two strata). The selection should be proportional without replacement.
However, I can only find disproportionate selection commands that select for instance x% of each stratum.
Can anyone help me out with this problem?
Thank you for this discussion. I think I know where my problem was.
The command "gsample" can select strata based on different variables. Therefore, I thought I had to define three different stratum variables. But the solution should be simpler.
There are 12 strata in total (the large firms with high intensity in sector 1, the small firms with high intensity in sector 1, and so on), with each firm in the sample falling into exactly one stratum.
All I have to do is create a variable "strataident" with values from 1 to 12 identifying the different strata. I do this for the population dataset, so the number of firms falling into each stratum is representative of the population. The following code will give me a stratified random sample that is representative of the population.
gsample 10, percent strata(strataident) wor
This command works as well and is much easier, see the example in 1:
gsample 10, percent wor strata(size sector intensity)
The problem is that strata may "overlap", so you will probably have to rebalance the sample after the initial draw.
Now the question is how this can be implemented. The final sample should match the proportions of the population as closely as possible.
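For readers who want to see the logic outside Stata, here is a minimal Python sketch of proportionate stratified sampling without replacement: the same fraction is drawn from every stratum, so each stratum's share of the sample matches its share of the population. The population, the function name, and the seed are all hypothetical, and this is not a description of how gsample works internally.

```python
import random
from collections import defaultdict


def proportionate_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction from each stratum, without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = round(fraction * len(members))     # allocation for this stratum
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample


# Hypothetical population: 12 strata (2 sizes x 3 sectors x 2 intensities)
# holding 10, 20, ..., 120 firms respectively.
population = []
firm_id = 0
cell = 0
for size in ("small", "large"):
    for sector in (1, 2, 3):
        for intensity in ("low", "high"):
            cell += 1
            for _ in range(10 * cell):
                population.append((firm_id, size, sector, intensity))
                firm_id += 1

# The stratum of a firm is its (size, sector, intensity) combination.
sample = proportionate_sample(population, lambda firm: firm[1:], 0.10, seed=1)
```

When stratum sizes are not multiples of ten, rounding makes the allocation only approximately proportional, which is one place where a rebalancing step could come in.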

Stata: Groupwise regressions and ranking

I am currently developing a sentiment index using Google search frequencies taken from Google Trends.
I am using Stata 12 on Windows.
My approach is as follows:
I downloaded approximately 150 business-related search queries from Google Trends, covering Jan 2004 to Dec 2013
I now want to construct an index using the 30 queries that are, at each point in time, most relevant to the market I observe
To achieve that I want to use monthly expanding backward rolling regressions of each query on the market
Thus I need to regress 150 items one-by-one on the market 120 times (12 months x 10 years), using different time windows and then extract the 30 queries with the most negative t-test.
To exemplify the procedure, if I would want to construct the sentiment for January 2010 I would regress the query terms on the market during the period from Jan 2004 to December 2009 and then extract the 30 queries with the most negative t-statistic.
Now I am looking for a way to automate this as much as possible. I guess I should be able to run the 150 items at once, and I can specify the time window using the time stamps. By using Excel to generate a do-file containing all the regression commands (which would be quite large), I could probably set up the regressions relatively efficiently, although that depends on how much Stata can handle. Does anyone have experience with that?
What I would need to make the data extraction much easier is a command which I can use to rank the results of the regression according to their t-statistics. Does someone have an efficient approach to this? Or has general advice?
If you are using Stata, once you run ttest, you can type return list to see the scalars that Stata stores. After regress, the estimates are listed by ereturn list instead, and the t-statistic for a regressor x is _b[x]/_se[x]. Inside a loop you can store these values in a number of different ways; check out the post command.
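The loop-and-rank logic described in the question can be sketched language-neutrally in Python. This is a sketch under assumptions, not a Stata replacement: it fits a simple one-regressor OLS of each query on the market over an expanding window, computes the slope's t-statistic by hand, and keeps the k most negative. The query names, the toy numbers, and the function names are all hypothetical.

```python
import math


def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS regression of y on x with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope / std_err


def most_negative(market, queries, end, k):
    """Names of the k queries with the most negative t-statistic,
    using only observations before index end (the expanding window)."""
    stats = {
        name: slope_t_stat(market[:end], series[:end])
        for name, series in queries.items()
    }
    return sorted(stats, key=stats.get)[:k]


# Hypothetical toy data: six periods, three queries.
market = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
queries = {
    "layoffs": [6.1, 4.9, 4.2, 2.8, 2.1, 0.9],
    "bonus": [1.2, 1.9, 3.1, 3.8, 5.2, 5.9],
    "loan": [3.0, 3.4, 2.6, 3.1, 2.5, 2.9],
}

# One ranking per window end; each window starts at the first observation.
rankings = {end: most_negative(market, queries, end, k=2) for end in range(4, 7)}
```

In Stata the same structure would be two nested loops, with the post command collecting one row per query and window for sorting afterwards.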