GNUPlot: select rows in a histogram

I have a data file which looks like the one below. I wanted to make a histogram chart using column 9, with column 10 as errorbars. That works out pretty well. But is there an option to plot only specific rows?
I tried the solution from another thread that uses a ternary operator:
plot 'Härte StS-123 bis 151.txt' using ( ( $0 == 4 || $0 == 6 ) ? $9 : 1/0 ):($9+$10):($9-$10):xticlabels(2)
This does plot rows 4 and 6, but it leaves an empty space between the datasets.
Is there any other way to achieve this?
Data File:
StS-123a "SBR / THF" 50.10 49.60 49.20 50.70 50.00 49.50 49.85 0.49 0.00974176
StS-123b "SBR / THF" 51.00 50.40 50.40 52.00 52.80 50.60 51.20 0.90 0.017614257
StS-124a "SBR+2phrGraphit" 49.60 49.40 49.30 48.90 49.40 49.10 49.28 0.23 0.004599753

What you may want is the index option to the plot command:
plot 'datafile' index 4 u 9:($9-$10):($9+$10):xticlabels(2), \
'' index 6 u 9:($9-$10):($9+$10):xticlabels(2)
This should plot just the data from the 4th and 6th datasets (rows), albeit with two different styles which you can adjust in the plot command.
Did you want to connect the values from the two datasets? That may be trickier.
If you want to plot only the 4th and 6th non-blank rows, you can use external commands in gnuplot, like:
plot "<sed '/^$/d' data.dat | sed -n '4p; 6p'" u 9:($9-$10):($9+$10):xticlabels(2)
(This may not be the most compact way to use sed in this case, but it deletes blank lines then returns the 4th and 6th rows.)

Replace blanks with a string with DAX coding in Power BI

I have data in Excel which I have uploaded into Power BI and used to create a chart visualisation that looks like this:
(Blank), CP, Jj10, etc. are the categories on my y axis and the dashes are the bars of the horizontal chart. I have tried to sketch how my chart looks because I don't have any other way to show it:
(Blank)-------------998
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
There is a column named performance (sheet name: Empl_data), and what I want is to replace the blanks with "Non-GT" in Power BI by creating a new column.
What my output should look like -
(Non-GT)-------------998
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
I have tried this -
Non-GT = IF(ISBLANK('Empl_data'[performance]),"Non-GT",'Empl_data'[performance])
What I get is:
Non-GT----------------964
(Blank)-------------34
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
I just want to replace the blanks with Non-GT completely, but it still shows (Blank). Please help me solve the problem, and please let me know if I haven't made my problem clear.
My data -
Empl_id   Empl_name           performance
99807     Somman paul
0076      Richards.M
8870      Maheen Josef.T
11209     Dojar Farah
6651      Macklegn Sagoe      Hi-po
551       Cada Farez          Jj10
12        Qwezy Goha          Hi-po
6567      Beheriop Produse    CP
2227      John semmers        0BN
656       Majeeio .f
80100     Drejju Yan
Is it actually blank where nothing is showing? First confirm whether the values are spaces or really null in the data, then apply conditions as below:
Non-GT =
IF(
    'Empl_data'[performance] = BLANK() || 'Empl_data'[performance] = "",
    "Non-GT",
    'Empl_data'[performance]
)
Q: Where are you transforming the data with that condition? In the Power Query Editor, or by creating a measure or a calculated column?

How do I animate an arrow using gnuplot? [duplicate]

I'm trying to animate a 2D vector with gnuplot. I want to show one line, i.e. one vector, at a time.
My data structure is as follows; the columns are x, y, u, v:
2.24448 0.270645 1.00 1.00
3.24448 0.270645 0.500 1.20
I'm able to create a static plot using the following command:
plot "datam.dat" using 1:2:3:4 with vectors filled head lw 3
Here is the output:
Here is my question: I would like to animate and show one row, i.e. one vector, at a time. How can I accomplish this in gnuplot using a GIF?
Thanks
Animated GIFs are created with set terminal gif animate. Check help gif for details.
Below is a simple example (tested with gnuplot 5.2). You have to make a new plot for each frame, so put your plot command into a do for loop. With every ::i::i you are plotting only the i-th line (check help every). If you don't know the total number of lines of your datafile, run stats "YourFile.dat" and the variable STATS_records will give you this number.
Code:
### animated graph with vectors
reset session
set term gif size 300,300 animate delay 12 loop 0 optimize
set output "AnimateVectors.gif"
# create some dummy data
set angle degrees
N = 60
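# As mentioned above, if the total number of data lines were unknown,
# one could instead get it beforehand with the stats command (sketch,
# not part of the original example):
#   stats "YourFile.dat" nooutput
#   N = STATS_records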
set samples N
set table $Data
plot [0:360] '+' u (cos($1)):(sin($1)):(sin($1)):(cos($1)) w table
unset table
set xrange[-2.5:2.5]
set yrange[-2.5:2.5]
do for [i=0:N-1] {
    plot $Data u 1:2:3:4 every ::i::i w vectors lw 2 lc rgb "red" notitle
}
set output
### end of code
Result:
Addition:
This would be the non-animated version, e.g. in a wxt-terminal.
Code:
### non-animated graph with vectors
reset session
set term wxt size 400,400
# create some dummy data
set angle degrees
N = 60
set samples N
set table $Data
plot [0:360] '+' u (cos($1)):(sin($1)):(sin($1)):(cos($1)) w table
unset table
set xrange[-2.5:2.5]
set yrange[-2.5:2.5]
plot $Data u 1:2:3:4 w vectors lw 1.5 lc rgb "red" notitle
### end of code
Result:
Addition 2:
Do you maybe mean something like this, a "semi"-animated arrow? By the way, as you can see, the arrows look quite different in the gif and wxt terminals.
Code:
### "semi"-animated graph with vectors
reset session
set term gif size 300,300 animate delay 12 loop 0 optimize
set output "AnimateVectorsSemi.gif"
# create some dummy data
set angle degrees
N = 60
set samples N
set table $Data
plot [0:360] '+' u (cos($1)):(sin($1)):(sin($1)):(cos($1)) w table
unset table
set xrange[-2.5:2.5]
set yrange[-2.5:2.5]
do for [i=0:N-1] {
    plot $Data u 1:2:3:4 every ::0::i w vectors lw 1.5 lc rgb "red" notitle
}
set output
### end of code
Result:

Split one column into two columns while retaining the separator

I have a very large data array:
'data.frame': 40525992 obs. of 14 variables:
$ INSTNM : Factor w/ 7050 levels "A W Healthcare Educators"
$ Total : Factor w/ 3212 levels "1","10","100",
$ Crime_Type : Factor w/ 72 levels "MURD11","NEG_M11",
$ Count : num 0 0 0 0 0 0 0 0 0 0 ...
The Crime_Type column contains the type of crime and the year, so "MURD11" is murder in 2011. These are college campus crime statistics my kid is analyzing for her school project; I am helping when she is stuck. I am currently stuck at creating a clean data file she can analyze.
Once I converted the wide file (all 9 crime types in columns) to a long file using gather, the file size went from 300 MB to 8 GB. The file I am working on is 8 GB; do you think that is the problem? How do I convert it to a data.table for faster processing?
What I want to do is split this 'Crime_Type' column into two columns, 'Crime_Type' and 'Year'. The data contains alphanumeric codes and numbers, and some codes contain special characters, like NEG_M, which is 'Negligent Manslaughter'.
We will replace the full names later, but can someone suggest how I can separate
MURD11 --> MURD and 11 (in two columns)
NEG_M10 --> NEG_M and 10 (in two columns)
etc...
I have tried using,
df <- separate(totallong, Crime_Type, into = c("Crime", "Year"), sep = "[:digit:]", extra = "merge")
df <- separate(totallong, Crime_Type, into = c("Year", "Temp"), sep = "[:alpha:]", extra = "merge")
The first one separates the Crime as it looks for numbers. The second one does not work at all.
I also tried
df$Crime_Type<- apply (strsplit(as.character(df$Crime_Type), split="[:digit:]"))
That does not work at all. I have gone through many posts on Stack Overflow, and that's where I got these commands, but I am now truly stuck and would appreciate your help.
Since you're using tidyr already (as evidenced by separate), try the extract function, which, given a regex, puts each captured group into a new column. The crime type is all the non-numeric stuff and the year is the numeric stuff; adjust the regex accordingly.
library(tidyr)
extract(df, 'Crime_Type', into=c('Crime', 'Year'), regex='^([^0-9]+)([0-9]+)$')
In base R, one option would be to create a unique delimiter between the non-numeric and numeric part. We can capture as a group the non-numeric ([^0-9]+) and numeric ([0-9]+) characters by wrapping it inside the parentheses ((..)) and in the replacement we use \\1 for the first capture group, followed by a , and the second group (\\2). This can be used as input vector to read.table with sep=',' to read as two columns.
df1 <- read.table(text=gsub('([^0-9]+)([0-9]+)', '\\1,\\2',
totallong$Crime_Type),sep=",", col.names=c('Crime', 'Year'))
df1
# Crime Year
#1 MURD 11
#2 NEG_M 11
If we need, we can cbind with the original dataset
cbind(totallong, df1)
Or in base R, we can use strsplit with split specifying the boundary between a non-number ((?<=[^0-9])) and a number ((?=[0-9])). Here we use lookarounds to match the boundary. The output will be a list; we can rbind the list elements with do.call(rbind) and convert the result to a data.frame.
as.data.frame(do.call(rbind, strsplit(as.character(totallong$Crime_Type),
split="(?<=[^0-9])(?=[0-9])", perl=TRUE)))
# V1 V2
#1 MURD 11
#2 NEG_M 11
Or another option is tstrsplit from the devel version of data.table, i.e. v1.9.5. Here also we use the same regex. In addition, there is an option to convert the output columns into different classes.
library(data.table)#v1.9.5+
setDT(totallong)[, c('Crime', 'Year') := tstrsplit(Crime_Type,
"(?<=[^0-9])(?=[0-9])", perl=TRUE, type.convert=TRUE)]
# Crime_Type Crime Year
#1: MURD11 MURD 11
#2: NEG_M11 NEG_M 11
If we don't need the 'Crime_Type' column in the output, it can be assigned to NULL
totallong[, Crime_Type:= NULL]
NOTE: Instructions to install the devel version are here
Or a faster option would be stri_extract_all from library(stringi) after collapsing the rows into a single string ('v2'). The alternating elements of 'v3' can then be extracted by indexing with seq to create a new data.frame.
library(stringi)
v2 <- paste(totallong$Crime_Type, collapse='')
v3 <- stri_extract_all(v2, regex='\\d+|\\D+')[[1]]
ind1 <- seq(1, length(v3), by=2)
ind2 <- seq(2, length(v3), by=2)
d1 <- data.frame(Crime=v3[ind1], Year= v3[ind2])
Benchmarks
v1 <- do.call(paste, c(expand.grid(c('MURD', 'NEG_M'), 11:15), sep=''))
set.seed(24)
test <- data.frame(v1= sample(v1, 40525992, replace=TRUE ))
system.time({
v2 <- paste(test$v1, collapse='')
v3 <- stri_extract_all(v2, regex='\\d+|\\D+')[[1]]
ind1 <- seq(1, length(v3), by=2)
ind2 <- seq(2, length(v3), by=2)
d1 <- data.frame(Crime=v3[ind1], Year= v3[ind2])
})
#user system elapsed
#56.019 1.709 57.838
data
totallong <- data.frame(Crime_Type= c('MURD11', 'NEG_M11'))

How to rename a column of a data frame with part of the data frame identifier in R?

I've got a number of files that contain gene expression data. In each file, the gene name is kept in a column "Gene_symbol" and the expression measure (a real number) is kept in a column "RPKM". The file name consists of an identifier followed by _ and the rest of the name (ends with "expression.txt"). I would like to load all of these files into R as data frames, for each data frame rename the column "RPKM" with the identifier of the original file and then join the data frames by "Gene_symbol" into one large data frame with one column "Gene_symbol" followed by all the columns with the expression measures from the individual files, each labeled with the original identifier.
I've managed to transfer the identifier of the original files to the names of the individual data frames as follows.
files <- list.files(pattern = "expression.txt$")
for (i in files) {
  var_name = paste("Data", strsplit(i, "_")[[1]][1], sep = "_")
  assign(var_name, read.table(i, header=TRUE)[,c("Gene_symbol", "RPKM")])
}
So now I'm at a stage where I have dataframes as follows:
Data_id0001 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"),RPKM=c(2.43,5.24,6.53))
Data_id0002 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"),RPKM=c(4.53,1.07,2.44))
But then I don't seem to be able to rename the RPKM column with the id000x bit. (That is in a fully automated way of course, looping through all the data frames I will generate in the real scenario.)
I've tried to store the identifier bit as a comment with the data frames but seem to be unable to assign the comment from within a loop.
Any help would be appreciated,
mce
You should never work this way in R. You should always try to keep all your data frames in a list and operate on them using functions such as lapply. Thus, instead of using assign, just create an empty list the length of your files list and fill it within the for loop.
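Just to illustrate, here is a minimal base-R sketch of that list-based workflow (it assumes, as in your code, that the identifier is the part of the file name before the first "_"):
files <- list.files(pattern = "expression.txt$")
ids <- sapply(strsplit(files, "_"), `[`, 1)      # identifier part of each file name
datalist <- lapply(files, function(f)
  read.table(f, header = TRUE)[, c("Gene_symbol", "RPKM")])
# rename the RPKM column of each data frame to its file identifier
datalist <- Map(function(d, id) { names(d)[2] <- id; d }, datalist, ids)
# merge everything on Gene_symbol into one wide data frame
combined <- Reduce(function(x, y) merge(x, y, by = "Gene_symbol", all = TRUE), datalist)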
For your current situation, we can fix it using an ls and mget combination in order to pull these data frames from the global environment into a list and then change the columns of interest.
temp <- mget(ls(pattern = "Data_id\\d+$"))
lapply(names(temp), function(x) names(temp[[x]])[2] <<- gsub("Data_", "", x))
temp
#$Data_id0001
# Gene_symbol id0001
# 1 geneA 2.43
# 2 geneB 5.24
# 3 geneC 6.53
#
# $Data_id0002
# Gene_symbol id0002
# 1 geneA 4.53
# 2 geneB 1.07
# 3 geneC 2.44
You could eventually use list2env in order to get them back into the global environment, but you should use it with caution.
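For example (a one-line sketch; this writes the renamed data frames back, overwriting the originals):
list2env(temp, envir = .GlobalEnv)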
Thanks a lot for your suggestions! I think I get the point. The way I'm doing it now (see below) is hopefully a lot more R-like and works fine!
Cheers,
Maik
library(plyr)
files <- list.files(pattern = "expression.txt$")
temp <- list()
for (i in 1:length(files)) {temp[[i]]=read.table(files[i], header=TRUE)[,c("Gene_symbol", "RPKM")]}
for (i in 1:length(temp)) {temp[[i]]=rename(temp[[i]], c("RPKM"=strsplit(files[i], "_")[[1]][1]))}
combined_expression <- join_all(temp, by="Gene_symbol", type="full")

read table with spaces in one column

I am attempting to extract tables from very large text files (computer logs). Dickoa provided very helpful advice to an earlier question on this topic here: extracting table from text file
I modified his suggestion to fit my specific problem and posted my code at the link above.
Unfortunately I have encountered a complication. One column in the table contains spaces. These spaces generate an error when I try to run the code at the link above. Is there a way to modify that code, or specifically the read.table call, so that it recognizes the second column below as a single column?
Here is a dummy table in a dummy log:
> collect.models(, adjust = FALSE)
model npar AICc DeltaAICc weight Deviance
5 AA(~region + state + county + city)BB(~region + state + county + city)CC(~1) 17 11111.11 0.0000000 5.621299e-01 22222.22
4 AA(~region + state + county)BB(~region + state + county)CC(~1) 14 22222.22 0.0000000 5.621299e-01 77777.77
12 AA(~region + state)BB(~region + state)CC(~1) 13 33333.33 0.0000000 5.621299e-01 44444.44
12 AA(~region)BB(~region)CC(~1) 6 44444.44 0.0000000 5.621299e-01 55555.55
>
> # the three lines below count the number of errors in the code above
Here is the R code I am trying to use. This code works if there are no spaces in the second column, the model column:
my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')
top <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'
my.data <- my.data[grep(top, my.data):grep(bottom, my.data)]
x <- read.table(text=my.data, comment.char = ">")
I believe I must use the variables top and bottom to locate the table in the log because the log is huge, variable and complex. Also, not every table contains the same number of models.
Perhaps a regex could be used somehow, taking advantage of the AA and the CC(~1) present in every model name, but I do not know how to begin. Thank you for any help, and sorry for the follow-up question; I should have used a more realistic example table in my initial question. I have a large number of logs, otherwise I could just extract and edit the tables by hand. The table itself is an odd object which I have only ever been able to export directly with capture.output, which would probably still leave me with the same problem as above.
EDIT:
All spaces seem to come right before and right after a plus sign. Perhaps that information can be used here to fill the spaces or remove them.
Try removing the spaces around the plus signs with gsub(" *\\+ *", "+", ...) before calling read.table. Note that my.data is a character vector at that point (it comes from readLines), so apply gsub to the whole vector rather than to a my.data$model column:
my.data <- my.data[grep(top, my.data):grep(bottom, my.data)]
my.data <- gsub(" *\\+ *", "+", my.data)
x <- read.table(text=my.data, comment.char = ">")