spotfire plot list of elements - list

I have a data table that has this format :
and I want to plot temperature to time, any idea how to do that ?

This can be done in a TERR data function. I don't know how comfortable you are integrating Spotfire with TERR, there is an intro video here for instance (demo starts from about minute 7):
https://www.youtube.com/watch?v=ZtVltmmKWQs
With that in mind, I wrote the script without loading any library, so it is quite verbose and explicit, but hopefully simpler to follow step by step. I am sure there is a more elegant way, and there are better ways of making it flexible with column names, but this is a start.
Your input will be a data table (dt, the original data) and the output a new data table (dt.out, the transformed data). All column names (and some values) are addressed explicitly in the script (so if you change them it won't work).
#remove the []
dt$Values=gsub('\\[|\\]','',dt$Values)
#separate into two different data frames, one for time and one for temperature
dt.time=dt[dt$Description=='time',]
dt.temperature=dt[dt$Description=='temperature',]
#split the columns we want to separate into a list of vectors
dt2.time=strsplit(as.character(dt.time$Values),',')
dt2.temperature=strsplit(as.character(dt.temperature$Values),',')
#rearrange times
names(dt2.time)=dt.time$object
dt2.time=stack(dt2.time) #stack vectors
dt2.time$id=c(1:nrow(dt2.time)) #assign running id for merging later
colnames(dt2.time)[colnames(dt2.time)=='values']='time'
#rearrange temperatures
names(dt2.temperature)=dt.temperature$object
dt2.temperature=stack(dt2.temperature) #stack vectors
dt2.temperature$id=c(1:nrow(dt2.temperature)) #assign running id for merging later
colnames(dt2.temperature)[colnames(dt2.temperature)=='values']='temperature'
#merge time and temperature
dt.out=merge(dt2.time,dt2.temperature,by=c('id','ind'))
colnames(dt.out)[colnames(dt.out)=='ind']='object'
dt.out$time=as.numeric(dt.out$time)
dt.out$temperature=as.numeric(dt.out$temperature)
Gaia

because all of the example rows you've shown here contain exactly four list items and you haven't specified otherwise, I'll assume that all of the data fits this format.
with this assumption, it becomes pretty trivial, albeit a little messy, to split the values out into columns using the RXReplace() expression function.
you can create four calculated columns, each with an expression like:
Int(RXReplace([values],"\\[([\\d\\-]+),([\\d\\-]+),([\\d\\-]+),([\\d\\-]+)]","\\1",""))
the third argument "\\1" determines which number in the list to extract. backslashes are doubled ("escaped") per the requirements of the RXReplace() function.
note that this example assumes the numbers are all whole numbers. if you have decimals, you'd need to adjust each "phrase" of the regular expression to ([\\d\\-\\.]+), and you'd need to wrap the expression in Real() rather than Int() (if you leave this part out, the result will be a String type which could cause confusion later on when working with the data).
once you have the four columns, you'll be able to unpivot to get the data easily.

Related

In Libreoffice Calc, which formula will check if a a keyword or part of it is contained in a cell in a row and copy the entire content of that cell?

I am learning how to use formulas in spreadsheets, I do use libre office.
I need to sort out data in a quite huge messy spreadsheet.
Each column contains mixed data, the sheet is huge, dozens of columns and thousands of rows, if the spreadsheet does not contain errors each cell in a row either contains a different keyword or is empty, there should not be two cells in the same row containing the same keyword.
The problem to solve is to sort out all the data so to reach to have a new spreadsheet in which each cell marked with a given specific keyword is kept in the same position but placed in one column dedicated to that same keyword.
the kind of spreadsheet with mixed up cells to be sorted out
the data in the spreadsheet has to be fixed so to appear in this way
A formula that can be used to extract sorted out data from a cell is the following:
=IF(SEARCH("Text1";B2;1);B2;0)
The formula can be dragged to each cell below to hit the proper cell next to it. The result is correct.
The results are correct, but I do not know why the expected 0 is not printed, there is #VALUE! instead
The logic is very simple, if the cell contains the keyword or any other text that contains that keyword the result is the full content of that cell, otherwise the result is 0.
Here comes the first question, why do I get #VALUE! as a result for those cells that do not contain the keyword? I expected to get 0 instead, just as indicated in the formula,
I tried to leave this filed empty and also to put the 0 result in quotes, the actual result is always the same, #VALUE!...
However, of course this formula extracts only the information contained in one column, so for each other column the process must be repeated.
In order to avoid to create a column with the formula for each column in the spreadsheet or anyway to process each column one by one and more importantly to have then to merge all the results to form one columns containing only cells with a given keyword I thought to use the same formula extending the parsing to each next cell in the row as follows:
=IF(SEARCH("text";B2;1);B2;IF(SEARCH("text";C2;1);C2;IF(SEARCH("text";D2;1);D2;0)))
The logic is very simple and should output in one go a column containing all the cells containing the keyword that are found in the row, check if the first cell in the row contains a word using the search function, if does then the result is the content of that cell, otherwise perform the next test, the next test is the same, check if the next cell contains a certain word using the search function, if does then the result is the content of that cell, otherwise proceed to the next test…. and so on until last test, if no test gave a true result then print 0 (but we get #VALUE!, OK I could live with that...).
In theory should work for a any number of cells, but in the practice does not at all, in fact does work only for the first IF test and cell indicated in the formula.
WHY?
The result using the extended version of the formula to parse N cells in sequence is the same obtained with the simple formula to parse only one cell
Finally, how do I resolve this problem using IF and Search?
Is there any other better approach and way to solve this kind of problems and sort out data in huge spreadsheets of this kind?
Thank you for any hint and help.

Searching within the result of a vlookup using a range of values and parsing text

MY GOAL:
parse a MM/DD date from the result of a vlookup so that it can be used in a project plan
BACKGROUND:
The vlookup result contains multiple values separated by a "•" (I don't need all of them)
The value I'm looking to parse is not always in the same location in the vlookup result (otherwise I could use the RIGHT formula)
There is a finite number of the values I'm looking to retrieve (and I know them already)
The value that I'm looking to retrieve contains some text with a date range; I only want the first four values in the date range (MM/DD)
I'd like to achieve all this with a single formula with the result in a single cell
CURRENT FORMULA
The formula that I've been working on that is not working is:
=ARRAYFORMULA(if(iserror(search(Iterations!D2:D7,(VLOOKUP(A2,'Results {2596503}'!$C$2:$L$183,3)))),,))
I've set up a sheet called "Erik Help" with the following formulas in B2 ad C2:
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))-4,5)))
and
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))+2,5)))
respectively.
They may be longer than actually needed, but you did not share realistic results in Column B or list which symbols may appear in Column B other than in the date; so I tried to account for either a hyphen or a forward slash possibly appearing in Column B in places other than within the date span.
Your analytics sheet also shows a formula that is sorting the results from data!A:A. So even though in your example the original data order happens to be the same as in analytics!A:A, that is not a given (again, based on your formula). Therefore, the VLOOKUP is also necessary.
You did not indicate whether you need to further use these returned date-snippets in calculations, or whether you just need to view them. So the results generated in "Erik Help" are text.
If you want usable numbers/dates, you add further issues that would need to be controlled for in the formula, because you'll only be extracting month and day, not year. That's fine right now. But what about when the date range to be extracted is "12/28-01/13"? If you simply make these values/dates, they will both be assigned to the current year. So the end date here will wind up being earlier than the start date.
Because of this, I've added a second sheet, "Erik Help 2," which contains extended formulas to account for these cases while still returning the date format you want as actual dates which can be used in calculations.
EDIT
(following your note on the sheet: "I would like to remove col b altogether and nest in the formulas in col c and d")
You can adjust the range B2:B by replacing it with your already existing formula in B2.
The new adjusted formula will become
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(VLOOKUP(ARRAYFORMULA(sort(unique(data!A2:A))),data!$A$1:$C,2),"\d+\/\d+-\d+\/\d+"),"-")))
Original answer
You can use the following formula:
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(B2:B,"\d{2}\/\d{2}-\d{2}\/\d{2}"),"-")))
Make sure you format the results as Date.
(Please adjust ranges to your needs)
Functions used:
ArrayFormula
IFNA
SPLIT
REGEXEXTRACT
try:
=ARRAYFORMULA(IF(A2:A="",,IFNA(TEXT(SPLIT(REGEXEXTRACT(
VLOOKUP(data!A2:A, data!A:C, 2), "\d+/\d+-\d+/\d+"), "-"), "mm/dd"))))

How to apply conditional formatting (if cell is in another range) to a range of cells

So I have searched through several different questions related to this. None of them seem to be asking exactly what I'm looking for and none of the solutions I've found have worked for me thus far.
I have several columns of data (Player names) where each column's values are generated from a formula in the 2nd row of that column. The 1st row is a header (Game name). This whole range is the collection of which players are willing to play which games. These are columns D-J(ish, the list is dynamically generated with another formula, based on form responses)
I have another range of data where the 1st column is the Player and the 2nd is the player's PREFERRED game. This data is also generated with a formula based on form responses. These are columns A-B.
Here's what I'm trying to do
Using conditional formatting in columns D-J, I want to highlight the player's name if this game (in row 1 of this column) is their preferred game (range A2:B).
I've tried several different variations of VLOOKUPS, MATCHES, and FILTERS in the conditional formatting, but so far nothing has worked. The problem I run into every time is that I can't figure out how to reference the cell that the formatting is applying to, but still have it reference each individual cell over the whole range.
I know I could do this if I applied an individual conditional formatting to each individual cell. However that is a very time consuming and inelegant solution to this issue considering I'm expecting my data range to be much larger in the future. I need a conditional formatting formula that will work across the whole range or , at the very least, for an entire column.
This is a mock of what I'm trying to accomplish:
This is a link to a mock of my sheet so that you can clearly see the data layout and specific formulas I'm using:
https://docs.google.com/spreadsheets/d/1wy1T6dWJwNC_EfdCAbkuxtkJH7y4Cg3x4IyEk6R567M/edit?usp=sharing
use:
=REGEXMATCH(D3, TEXTJOIN("|", 1, FILTER($A$3:$A, $B$3:$B=D$2)))

kettle sample rows for each type

I have a set of rows let's say "rowId","type","value". I need on output set of 10 sample rows for each "type". How can I do it? "type" has aprox. 100 different, and changing values, so switch is not good option.
Well I've figured a walkaround from this situation. I splited transformation in parts. First part collects all data to a temp table, finds unique types, and copies them to the result.
The second one runs for every input row (where we have types), and collects data of a given type from temp table. Then you need no grouping to do stratified sample.

How to create Google Chart with lines (series) of different lengths?

How do I create a Google Line Chart that displays two or more lines, with a different number of data points in each series?
For instance, I want to create a chart with 2 lines, one showing the expected values over time and another showing the actual values over time. The date range and expected values are known in advance so I can fully graph them, but the actual values may not be fully known yet (e.g. the date range covers some dates in the future).
I found the answer in this SO question. The solution is to use "_" (or "__", depending on the encoding of the data values) to indicate "no value".
For instance, one data series might be 10,7,3,1 and the other might be 10,6,_,_.