It is possible to move the content of 1 row to another row? - row

I have this data frame:
biopsia1
Name:Juan
Rut: 17006-9
Diagnostic:
Gallbladder Cancer
I split the data base in 2 columns
text
text_2
Name
Juan
Rut
17006-9
Diagnostic
NA
Gallbladder Cancer
NA
The original format have de Diagnostic results in a different row (i can not change it), so when i split the data base "Galbladder Cancer" is below diagnosis, but i want it in the same row.
text
text_2
Diagnostic
Gallbladder Cancer
It is possible to move "Gallbladder Cancer" to the same row?
Thank for your help!

Related

Replace blanks with a string with DAX coding in Power BI

I have a data in Excel and I have uploaded in power bi, created a visualisation using a chart which looks like -
blank, CP, Jj10 are basically my y axis and dashes are my bars of horizontal chart. I have tried to show how my chart looks like because I don't get any other option
(Blank)-------------998
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
There is a column named performance (sheet_name=Empl_data) and what I want is to replace the blanks with Non-GT in power bi with creating a new column.
What my output should look like -
(Non-GT)-------------998
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
I have tried this -
Non-GT = IF(ISBLANK('Empl_data'[performance]),"Non-GT",'Empl_data'[performance])
What i get is
Non-GT----------------964
(Blank)-------------34
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
I just want to replace blanks with Non-TSG completely but still it shows blank. Please help me out to solve the problem and please let me know if I have made clear what my prblm is.
My data -
Empl_id
Empl_name
performance
99807
Somman paul
0076
Richards.M
8870
Maheen Josef.T
11209
Dojar Farah
6651
Macklegn Sagoe
Hi-po
551
Cada Farez
Jj10
12
Qwezy Goha
Hi-po
6567
Beheriop Produse
CP
2227
John semmers
0BN
656
Majeeio .f
80100
Drejju Yan
Is it actually Blank where it is showing nothing? First confirm there are spaces or really null in the data, then apply conditions as bellow-
Non-GT =
IF(
'Empl_data'[performance] = BLANK() || 'Empl_data'[performance] = "",
"Non-GT",
'Empl_data'[performance]
)
Q: Where you are transforming data with that condition? Power Query Editor? Or creating Measure or Calculated column?

Caption numbering not in sequential order when citing the captions with captioner in Rmarkdown

I am using captioner (https://cran.r-project.org/web/packages/captioner/vignettes/using_captioner.html) to create table captions in Rmarkdown - the main reason is because I am using huxtable for conditional formatting and exporting to word. This is the only I have found to have numbered captions.
I was trying to reference the captions but the caption number is not in sequential order when citing the captions but only if the table_nums(..., display="cite") is before the tables. I was trying to give the range of table numbers and it changed the number of the last table. I The number isn't changed if the r table_nums('third_cars_table',display = "cite") is put after the captions. Is there a way to make sure that table numbers remain in sequential order? I'd also be happy with a better solution for numbered captions.
Reproducible example:
---
title: "Untitled"
output: bookdown::word_document2
---
```{r setup, include=FALSE}
library(captioner)
library(huxtable)
library(knitr)
library(pander)
table_nums <- captioner(prefix = "Table")
fig_nums <- captioner(prefix = "Figure")
knitr::opts_chunk$set(echo = TRUE)
```
## Description of tables
I am trying to put a description of tables
and say that these results are shown table numbers ranging
from the first table (`r table_nums('first_cars_table',display = "cite")`)
to the last table (`r table_nums('third_cars_table',display = "cite")`)
```{r, results='asis',echo=FALSE,eval.after=TRUE}
tablecap1=cat(table_nums(name="first_cars_table",caption='First car table'))
kable((cars[1:5,]))
tablecap2=cat(table_nums(name="second_cars_table",caption='second car table'))
kable(cars[6:10,])
tablecap3=cat(table_nums(name="third_cars_table",caption='third car table'))
kable(cars[10:15,])
```
The results:
A (terrible) workaround is to manually give the number ordering using display = FALSE. For example, inserting the following at the start of the document will ensure t1-t5 are sequentially numbered, no matter where the tables or first citations appear:
`r table_nums('t1', display = FALSE)`
`r table_nums('t2', display = FALSE)`
`r table_nums('t3', display = FALSE)`
`r table_nums('t4', display = FALSE)`
`r table_nums('t5', display = FALSE)`
I have not examined the captioner code but I expect that the document is read from top to bottom once and hence the numbering is stored in a first come, first served basis. Thus, I am not sure there are any other ways to get around this as it would involve some kind of pre-processing stage.

source file fixed width , need only Header and Footer to the target(oracle)

I have this scenario with source as fix width flat file, and I need to read to target only the Header and Footer not the details records.
I need to trim the first column (PA22109 ) and get only PA and next 2 columns to rows as two different dates.
For Footer get only the PT(PT000000000700000030620E00000055612I00000010277I) and the rest into a column of the target.
How can I achieve this logic, inputs are appreciated.
source file :
PA22109 00153252015110905408179 2015110820151108PO ---header
DE0E9D TESTGROUPEXCH TESTINSEXCH TESTLOCEXCH ID014 LNAME014 FNAME014 14 MAIN ST ANYWHERE NJ011110000 195001012Z 01000000014 LNAME014 PATFIRST014 14 MAIN ST ANYWHERE NJ011110000 1955010110106000220 TESTGROUPEXCH 8179 TESTBENEXCH TESTCNTE53 0000000000 0000002643005 011234567890 011234567890 1234 TEST PHARMACY TEST PHARMACY LANE PHARMACYTOWN NJ09876 5555555555 11Y5 019876543210 019876543210 NJPRESCLAST PRESCFIRST 5555555551 DRLAST DRFIRST 110110000009770990300406048410 2015092720150927154401000000000000120150929 0000100000000000000000000000000
PT000000000700000030620E00000055612I00000010277I --Footer
As this a fixed file you can perform following to meet your requirement.
In your Informatica mapping, Read row in a single column.
In Expression, Mark each record for filter out if It does not start with PA OR PT (Assumption your Detail records do not start with PA or PT). Filter detail record out using Filter transformation.
Now you have only Header and Footer Records.
Now you can apply respective condition in expression for PA and PT Records.

How to rename a column of a data frame with part of the data frame identifier in R?

I've got a number of files that contain gene expression data. In each file, the gene name is kept in a column "Gene_symbol" and the expression measure (a real number) is kept in a column "RPKM". The file name consists of an identifier followed by _ and the rest of the name (ends with "expression.txt"). I would like to load all of these files into R as data frames, for each data frame rename the column "RPKM" with the identifier of the original file and then join the data frames by "Gene_symbol" into one large data frame with one column "Gene_symbol" followed by all the columns with the expression measures from the individual files, each labeled with the original identifier.
I've managed to transfer the identifier of the original files to the names of the individual data frames as follows.
files <- list.files(pattern = "expression.txt$")
for (i in files) {var_name = paste("Data", strsplit(i, "_")[[1]][1], sep = "_"); assign(var_name, read.table(i, header=TRUE)[,c("Gene_symbol", "RPKM")])}
So now I'm at a stage where I have dataframes as follows:
Data_id0001 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"),RPKM=c(2.43,5.24,6.53))
Data_id0002 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"),RPKM=c(4.53,1.07,2.44))
But then I don't seem to be able to rename the RPKM column with the id000x bit. (That is in a fully automated way of course, looping through all the data frames I will generate in the real scenario.)
I've tried to store the identifier bit as a comment with the data frames but seem to be unable to assign the comment from within a loop.
Any help would be appreciated,
mce
You should never work this way in R. You should always try keeping all your data frames in a list and operate over them using function such as lapply etc. Thus, instead of using assign, just create an empty list of length of your files list and fill it with the for loop
For your current situation, we can fixed it using ls and mget combination in order to pull this data frames from the global environment into a list and then change the columns of interest.
temp <- mget(ls(pattern = "Data_id\\d+$"))
lapply(names(temp), function(x) names(temp[[x]])[2] <<- gsub("Data_", "", x))
temp
#$Data_id0001
# Gene_symbol id0001
# 1 geneA 2.43
# 2 geneB 5.24
# 3 geneC 6.53
#
# $Data_id0002
# Gene_symbol id0002
# 1 geneA 4.53
# 2 geneB 1.07
# 3 geneC 2.44
You could eventually use list2env in order to get them back to the global environment, but you should use with caution
thanks a lot for your suggestions! I think I get the point. The way I'm doing it now (see below) is hopefully a lot more R-like and works fine!!!
Cheers,
Maik
library(plyr)
files <- list.files(pattern = "expression.txt$")
temp <- list()
for (i in 1:length(files)) {temp[[i]]=read.table(files[i], header=TRUE)[,c("Gene_symbol", "RPKM")]}
for (i in 1:length(temp)) {temp[[i]]=rename(temp[[i]], c("RPKM"=strsplit(files[i], "_")[[1]][1]))}
combined_expression <- join_all(temp, by="Gene_symbol", type="full")

read table with spaces in one column

I am attempting to extract tables from very large text files (computer logs). Dickoa provided very helpful advice to an earlier question on this topic here: extracting table from text file
I modified his suggestion to fit my specific problem and posted my code at the link above.
Unfortunately I have encountered a complication. One column in the table contains spaces. These spaces are generating an error when I try to run the code at the link above. Is there a way to modify that code, or specifically the read.table function to recognize the second column below as a column?
Here is a dummy table in a dummy log:
> collect.models(, adjust = FALSE)
model npar AICc DeltaAICc weight Deviance
5 AA(~region + state + county + city)BB(~region + state + county + city)CC(~1) 17 11111.11 0.0000000 5.621299e-01 22222.22
4 AA(~region + state + county)BB(~region + state + county)CC(~1) 14 22222.22 0.0000000 5.621299e-01 77777.77
12 AA(~region + state)BB(~region + state)CC(~1) 13 33333.33 0.0000000 5.621299e-01 44444.44
12 AA(~region)BB(~region)CC(~1) 6 44444.44 0.0000000 5.621299e-01 55555.55
>
> # the three lines below count the number of errors in the code above
Here is the R code I am trying to use. This code works if there are no spaces in the second column, the model column:
my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')
top <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'
my.data <- my.data[grep(top, my.data):grep(bottom, my.data)]
x <- read.table(text=my.data, comment.char = ">")
I believe I must use the variables top and bottom to locate the table in the log because the log is huge, variable and complex. Also, not every table contains the same number of models.
Perhaps a regex expression could be used somehow taking advantage of the AA and the CC(~1) present in every model name, but I do not know how to begin. Thank you for any help and sorry for the follow-up question. I should have used a more realistic example table in my initial question. I have a large number of logs. Otherwise I could just extract and edit the tables by hand. The table itself is an odd object which I have only ever been able to export directly with capture.output, which would probably still leave me with the same problem as above.
EDIT:
All spaces seem to come right before and right after a plus sign. Perhaps that information can be used here to fill the spaces or remove them.
try inserting my.data$model <- gsub(" *\\+ *", "+", my.data$model) before read.table
my.data <- my.data[grep(top, my.data):grep(bottom, my.data)]
my.data$model <- gsub(" *\\+ *", "+", my.data$model)
x <- read.table(text=my.data, comment.char = ">")