rmarkdown - format numbers for inline code - r-markdown

When I use inline code in rmarkdown the result comes out in exponential form. It seems to be random (though I'm sure it must not be) as to which ones work, and which ones don't.
Can anyone tell me how to get this number to display with 2 decimal places?
Here is the data:
# AllStats
structure(list(Mean_CalcnetPd = 13919.45, Mean_CalcnetPd2 = 13911.91,
SD_CalcnetPd = 4458.63, SD_CalcnetPd2 = 4394.47, Outliers = 3L,
Outlier_Cutoff = 27295.34, n_Stats = 22675, n_Stats2 = 22672,
RegressionModel = "Predicted",
`as.numeric(n_Stats)` = 22675), class = "data.frame", row.names = c(NA,
-1L))
And here is the text with inline code:
The sample is further differentiated by including only contracted Alliance Providers. The sample size was `r AllStats$n_Stats`.
Here is the resulting knitted html:
The sample is further differentiated by including only contracted Alliance Providers. The sample size was 2.2675 × 10^{4}.
I have tried converting the n_Stats value to different types like integers, numbers, etc.
I have tried to find a way through dplyr or something to fix the number before it gets pulled into the inline.
I have checked the cheat sheet and this guide: https://bookdown.org/yihui/rmarkdown/

My problem was solved by adding options(scipen=999) and using the ?round function as Julian commented under the question.


Regex for XAML Formatting

I'm attempting to build a PowerShell CmdLet that can parse and cleanly reformat a chunk of XAML or any other markup language.
So far, I've had to build an assortment of cmdlets so that I can get the correct information to feed into it (indentation, counts, items, child items, and so forth).
What I'm attempting to do is to collect ALL of the properties and values in a set of XAML/HTML, etc, and then once I have the lengths of all those variables, I can then start to chunk them out and properly format them so that they all output down a straight line. It may not make a super amount of sense as I describe it? So, here's an example.
<Window xmlns = 'http://schemas.microsoft.com/winfx/2006/xaml/presentation'
xmlns:x = 'http://schemas.microsoft.com/winfx/2006/xaml'
Title = 'Window Title'
Height = '600'
MinHeight = '600'
Width = '800'
MinWidth = '800'
BorderBrush = 'Black'
ResizeMode = 'CanResize'
HorizontalAlignment = 'Center'
WindowStartupLocation = 'CenterScreen'>
The reason I am attempting to build this, is so that I can programmatically save the instructions to a smaller footprint. So, instead of... having fluctuating numbers for each line and item and the end result looking like this...
<Window xmlns='http://schemas.microsoft.com/winfx/2006/xaml/presentation'
xmlns:x = 'http://schemas.microsoft.com/winfx/2006/xaml' Title = 'Window Title' Height = '600'
MinHeight = '600' Width = '800' MinWidth = '800' BorderBrush = 'Black' ResizeMode = 'CanResize'
HorizontalAlignment = 'Center' WindowStartupLocation = 'CenterScreen'>
...I then have a set of instructions that can vectorize the content of the XAML, so that it has a pattern and less randomness. Sure, the line count might get expanded quite a bit, but there's no need to be concerned with that if all it is doing is expanding into RAM. Which is the point of it...
At any rate, the code that I am having trouble with is essentially a way to preserve the spacing between the quoted objects. I feel like I'm beating my head against a wall trying to get this to work correctly when I know it's a matter of Regex ...
I've posted the code I'm talking about via this link.
https://github.com/secure-digits-plus-llc/FightingEntropy/blob/master/Format-XAML.ps1
Lines 43-147
It is a script block, and testing with it requires a Xaml Here String.
Any suggestions would be appreciated. I'm not much of a Regex fan, I understand some basics to it but I'm not that great with it yet.
-MC
Found the answer I was looking for.
Not the most elegant way to solve the issue I was having, but it works.
"(?<=\').+?(?=\')"
When the lines are split, and you want to preserve the spacing within the quotes, then you need something like this.
I was attempting to iterate through a do loop until the array/string contained (2) single quotes, but what was happening was... 'oh. I thought you wanted to match 'adbhjikvgrfe' with '21345rfs'.
No regex. Wasn't looking to match that. sigh.
Then it was taking the spacing out between the quotes.
sigh
I gotta say... anyone who truly writes good programming...? Well, I tip my hat off to you good sir/ma'am... because... it's a frustrating job. For certain.
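As a rough illustration of what the lookaround pattern above does, here is a minimal Python sketch (the original script is PowerShell; Python's `re` module handles this particular pattern the same way). The helper name `split_attribute` and the sample lines are made up for the example, and it assumes one attribute per line, as in the split XAML shown in the question.

```python
import re

# The pattern from the post: a lazy match between two single quotes,
# with the quotes themselves excluded via lookbehind/lookahead.
QUOTED_VALUE = re.compile(r"(?<=').+?(?=')")

def split_attribute(line):
    """Return (name, value) for a line such as "Title = 'Window Title'".

    Hypothetical helper: assumes exactly one quoted value per line.
    """
    match = QUOTED_VALUE.search(line)
    if match is None:
        return None
    # Everything before the first '=' is treated as the attribute name.
    name = line[:match.start()].split("=")[0].strip()
    return name, match.group(0)

# Spacing inside the quotes is kept exactly as written.
print(split_attribute("Title = 'Window Title'"))
print(split_attribute("xmlns:x = 'http://schemas.microsoft.com/winfx/2006/xaml'"))
```

One caveat: on a line that still holds several quoted values, this pattern also matches the text between a closing quote and the next opening quote. A capturing pattern such as `'([^']*)'` avoids that, if the lines cannot be split first.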

Regex (re2 googlesheets) multiple values in multiline cell

I'm getting stuck on how to read and pretty up these values from a multiline cell via ARRAYFORMULA.
I'm using regex because the preceding line can vary.
Just formulas please, no custom code.
The first column looks like a set of these:
```
[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1
```
To be useful elsewhere, at minimum each [tag] set ([effect\d], [feature\d], etc.) needs to be in its own column. For example, the 'effects' column would look like:
ATTR_A:1000,ATTR_B:8
and so on.
Desired output can also be seen in the included spreadsheet
**Here is the example spreadsheet:**
https://docs.google.com/spreadsheets/d/1arMaaT56S_STTvRr2OxCINTyF-VvZ95Pm3mljju8Cxw/edit?usp=sharing
**Current REGEXREPLACE**
Kind of works: it finds each 'type' and 'value' fine, but I can't figure out how to extract just those from the rest. I tried capturing (and non-capturing) groups before and after, but that didn't work.
=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.][effect\d][\n.](.)\n(.)","1:$1 2:$2"))
**Current SUBSTITUTE + REGEXEXTRACT + REGEXREPLACE**
A different approach entirely, also kinda works, longer form though and left with having to parse the values out of that string, where got stuck again. Idea was to use this to simplify, then regexreplace like above. Getting stuck removing content around the final matches though, and if can do that then above approach is fine too.
// First ran a substitute
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE($A3:$A,char(10),";"),";;",char(10)))
// Then variation of this (gave up on single line 'effect/d' so broke it up to try and get it working)
=ARRAYFORMULA(IF(A3:A<>"",IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect0]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect1]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect2]);(.)$")&";;"),""))
// Then use regexreplace like above
=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.);type = (.);;","1:$1 2:$2"))
**--EDIT--**
Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of matching 'type's to those extra columns (see spreadsheet).
All good if you can't though; I just realized I would need that too for lookups.
**--END OF EDIT--**
I've tried dozens of things, discarding each in turn. I had a quick look in the version history to grab two promising attempts and shared them in separate sheets.
One of these also used SUBSTITUTE to simplify the input column; I'm happy with a solution that uses either the RAW or the SUBSTITUTE results.
**Potentially Useful links:**
https://github.com/google/re2/wiki/Syntax
**Just some more words:**
I have also looked at dozens of Stack Overflow and Google support pages, and tried both REGEXEXTRACT and REGEXREPLACE. Both are promising but missing that final tweak, and I have already tried dozens of tweaks on both.
Any help would be great, and will hopefully help others in future, since examples with spreadsheets are useful and every new REGEX seems to be a new adventure ;)
P.S. If we can think of a better title for the OP, please say so in a comment or your answer :)
paste in B3:
=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
paste in C3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))
paste in D3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))
paste in E3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
paste in F3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))
paste in G3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))
paste in H3:
=ARRAYFORMULA(IFNA(REGEXREPLACE(INDEX(SPLIT(REGEXEXTRACT(
REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
spreadsheet demo
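The question asks for formulas only, so the following is not a replacement for them. It is just a small Python sketch for checking what the REGEXEXTRACT patterns above capture, assuming Python's `re` behaves like RE2 for patterns this simple. The function names are made up for the illustration; the sample cell is the one from the question.

```python
import re

SAMPLE_CELL = """[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1"""

def value_for_type(cell, type_name):
    """Same idea as REGEXEXTRACT(A3, "(\\d+)\\ntype = " & C2)."""
    match = re.search(r"(\d+)\ntype = " + re.escape(type_name), cell)
    return match.group(1) if match else None

def effects_summary(cell, type_names):
    """Join the found values as TYPE:value pairs, skipping missing types."""
    pairs = []
    for type_name in type_names:
        value = value_for_type(cell, type_name)
        if value is not None:
            pairs.append(f"{type_name}:{value}")
    return ",".join(pairs)

def feature_name(cell):
    """Same idea as REGEXEXTRACT(A3, "\\[feature\\d+\\]\\nname = (.*)")."""
    match = re.search(r"\[feature\d+\]\nname = (.*)", cell)
    return match.group(1) if match else None

print(effects_summary(SAMPLE_CELL, ["ATTR_A", "ATTR_B", "ATTR_C"]))
print(feature_name(SAMPLE_CELL))
```

Note that `cost = 1000` in the `[config]` block is not picked up, because the digits must be followed directly by a `type = ...` line.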
This was a fun exercise. :-)
Caveat first: I have added some "input data". Examples:
[feature1]
name = feature_active_spoiler2
[components]
0 = spoiler,1
1 = spoilerA, 2
So the output has "extra" output.
See the tab ADW's Solution.

Difficulty Understanding TensorFlow Computations

I'm new to TensorFlow and have difficulty understanding how the computations work. I could not find the answer to my question on the web.
For the following piece of code, the last time I print "d" in the for loop of the "train_neural_net()" function, I'm expecting the values to be identical to when I print "test_distance.eval". But they are way different. Can anyone tell me why this is happening? Isn't TensorFlow supposed to cache the Variable results learned in the for loop and use them when I run "test_distance.eval"?
import tensorflow as tf

# The placeholders x1, x2, x3, x4, y, the sizes n_nodes_hl1, n_nodes_hl2, vector_size
# and the arrays train_x1, train_x2, train_y are defined elsewhere (not shown in the post).

def neural_network_model1(data):
    nn1_hidden_1_layer = {'weights': tf.Variable(tf.random_normal([5, n_nodes_hl1])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    nn1_hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    nn1_output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, vector_size])), 'biasses': tf.Variable(tf.random_normal([vector_size]))}
    nn1_l1 = tf.add(tf.matmul(data, nn1_hidden_1_layer["weights"]), nn1_hidden_1_layer["biasses"])
    nn1_l1 = tf.sigmoid(nn1_l1)
    nn1_l2 = tf.add(tf.matmul(nn1_l1, nn1_hidden_2_layer["weights"]), nn1_hidden_2_layer["biasses"])
    nn1_l2 = tf.sigmoid(nn1_l2)
    nn1_output = tf.add(tf.matmul(nn1_l2, nn1_output_layer["weights"]), nn1_output_layer["biasses"])
    return nn1_output

def neural_network_model2(data):
    nn2_hidden_1_layer = {'weights': tf.Variable(tf.random_normal([5, n_nodes_hl1])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    nn2_hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])), 'biasses': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    nn2_output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, vector_size])), 'biasses': tf.Variable(tf.random_normal([vector_size]))}
    nn2_l1 = tf.add(tf.matmul(data, nn2_hidden_1_layer["weights"]), nn2_hidden_1_layer["biasses"])
    nn2_l1 = tf.sigmoid(nn2_l1)
    nn2_l2 = tf.add(tf.matmul(nn2_l1, nn2_hidden_2_layer["weights"]), nn2_hidden_2_layer["biasses"])
    nn2_l2 = tf.sigmoid(nn2_l2)
    nn2_output = tf.add(tf.matmul(nn2_l2, nn2_output_layer["weights"]), nn2_output_layer["biasses"])
    return nn2_output

def train_neural_net():
    prediction1 = neural_network_model1(x1)
    prediction2 = neural_network_model2(x2)
    distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(prediction1, prediction2)), reduction_indices=1))
    cost = tf.reduce_mean(tf.multiply(y, distance))
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    hm_epochs = 500
    test_result1 = neural_network_model1(x3)
    test_result2 = neural_network_model2(x4)
    test_distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(test_result1, test_result2)), reduction_indices=1))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(hm_epochs):
            _, d = sess.run([optimizer, distance], feed_dict = {x1: train_x1, x2: train_x2, y: train_y})
            print("Epoch", epoch, "distance", d)
        print("test distance", test_distance.eval({x3: train_x1, x4: train_x2}))

train_neural_net()
Each time you call the functions neural_network_model1() or neural_network_model2(), you create a new set of variables, so there are four sets of variables in total.
The call to sess.run(tf.global_variables_initializer()) initializes all four sets of variables.
When you train in the for loop, you only update the first two sets of variables, created with these lines:
prediction1 = neural_network_model1(x1)
prediction2 = neural_network_model2(x2)
When you evaluate with test_distance.eval(), the tensor test_distance depends only on the variables that were created in the last two sets of variables, which were created with these lines:
test_result1 = neural_network_model1(x3)
test_result2 = neural_network_model2(x4)
These variables were never updated in the training loop, so the evaluation results will be based on the random initial values.
TensorFlow does include some code for sharing weights between multiple calls to the same function, using with tf.variable_scope(...): blocks. For more information on how to use these, see the tutorial on variables and sharing on the TensorFlow website.
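The effect described above can be mimicked without TensorFlow. The following is only an analogy in plain Python, not TensorFlow code: `make_layer` stands in for `tf.Variable(tf.random_normal(...))`, and all names are made up for the sketch. It shows that a function which creates its weights internally yields unrelated weights on every call, whereas weights created once and passed around are shared.

```python
import random

def make_layer(n_in, n_out):
    """Create a fresh, randomly initialised weight matrix (stand-in for tf.Variable)."""
    return [[random.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]

def apply_layer(weights, row):
    """Multiply one input row by the weight matrix."""
    return [sum(x * w for x, w in zip(row, column)) for column in zip(*weights)]

def build_and_apply(row):
    """Mimics the model functions in the question: new weights on every call."""
    return apply_layer(make_layer(len(row), 2), row)

random.seed(0)
row = [1.0, 2.0, 3.0]

train_out = build_and_apply(row)   # uses the first set of weights
test_out = build_and_apply(row)    # uses a second, unrelated set of weights

shared = make_layer(3, 2)          # create the weights once...
shared_train = apply_layer(shared, row)
shared_test = apply_layer(shared, row)   # ...and reuse them for both paths

print(train_out == test_out)       # the two calls do not share weights
print(shared_train == shared_test) # the shared weights give the same result
```

In the question's code, training only ever touches the first set, so the "test" path keeps its random initial weights.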
You don't need to define two functions to generate the models; you can use tf.name_scope and pass a model name to a single function, which uses it as a prefix for the variable declarations. Also, you defined two distance tensors: distance and test_distance. The model only learns from the training data by minimizing cost, which depends only on distance. test_distance is therefore never involved in training, and the model behind it never learns anything. There is no need for two distance tensors; one is enough. To calculate the training distance, feed it the training data, and to calculate the test distance, feed it the test data.
If you do want the second distance to work, you have to declare another optimizer for it and train it the same way as the first one. Keep in mind that models learn based on their initial values and the training data. Even if you feed both models exactly the same training batches, you can't expect them to end up with identical characteristics, because the initial weights differ and training can fall into different local minima of the error surface. Finally, note that every call to neural_network_model1 or neural_network_model2 generates new weights and biases, because tf.Variable creates new variables each time.

Get information from a TXT File

I have a txt file that has 7 columns and I am trying to extract data from it. Essentially there is a column that has a lot of minimum values, a column that consists solely of dashes, a column of only maximum values, and a few others that I would like to break into their own lists (I think that's the way to go). Any help would be much appreciated. Thanks!
Edit: Sorry, I should have been clearer. I am using Python 3.5, reading straight from the txt file and using split(). I guess I should ask where to go from there. I currently have it loading a file and using split(). End game, I would like to be able to put each column into its own list so I can calculate averages, percentages, etc. Thanks again, and sorry about the bad initial post; it's my first time posting here.
file = open("year2000.txt")
for line in file:
    z = line.strip()
    z = line.find(" ")
    min_sal1 = line[:z]
    min_sal2 = min_sal1.replace(',', '')
    min_sal3 = min_sal2.find('.')
    min_sal4 = min_sal1[:min_sal3]
    min_sal = int(min_sal4)
    print(min_sal4)
    y = z.find(' ', 2)
    x = z.find(' ', 3)
    max_sal = line[y:x]
    print(max_sal)
After running this, I get a list of all min salaries like I should, but for the max values I am getting just a bunch of blank lines. I also plan on putting each type of value into its own list. Thanks
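One possible way to get each column into its own list is to split every line on whitespace and then transpose the rows. This is a minimal sketch under assumptions the post does not confirm: that the columns are separated by whitespace, that the minimum is the first column, the dash the second and the maximum the third, and that numbers look like `12,500.00`. The sample lines and the helper names `to_number` and `read_columns` are made up for the illustration.

```python
def to_number(text):
    """Convert a string such as '12,500.00' to a float."""
    return float(text.replace(",", ""))

def read_columns(lines):
    """Split each non-empty line on whitespace and collect the fields column by column."""
    rows = [line.split() for line in lines if line.strip()]
    # zip(*rows) transposes rows into columns; it stops at the shortest row.
    return [list(column) for column in zip(*rows)]

# Hypothetical sample lines standing in for the contents of the txt file.
sample = [
    "12,500.00 - 48,000.00 a b c d\n",
    "15,000.50 - 52,250.00 e f g h\n",
]

columns = read_columns(sample)
min_sal = [to_number(value) for value in columns[0]]  # assumed: first column is the minimum
max_sal = [to_number(value) for value in columns[2]]  # assumed: third column is the maximum
average_min = sum(min_sal) / len(min_sal)

print(min_sal, max_sal, average_min)
```

With a real file, the same function can be fed the open file object, for example `with open("year2000.txt") as f: columns = read_columns(f)`, since iterating over a file yields its lines.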

Importing unfriendly formatted data in Excel and forcing messy values as column names

I'm trying to import some publicly available life outcomes data using the code below:
require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
sheet = "Quick Lookup", verbose = TRUE)
Naturally, the imported data frame doesn't look good:
I would like to amend my column names using the code below:
# Clean column names
names(simd.sg.xls) <- make.names(names = as.character(simd.sg.xls[1,]),
unique = TRUE,allow_ = TRUE)
But it produces rather unpleasant results:
> names(simd.sg.xls)
[1] "X1" "X1.1" "X771" "X354" "X229" "X74" "X67" "X33" "X19" "X1.2"
[11] "X6" "X1.3" "X8" "X7" "X7.1" "X6506" "X21" "X1.4" "X6158" "X6506.1"
[21] "X6506.2" "X6506.3" "X6263" "X6506.4" "X6468" "X1010" "X815" "X99" "X58" "X65"
[31] "X60" "X6506.5" "X21.1" "X1.5" "X6173" "X5842" "X6506.6" "X6506.7" "X6263.1" "X6506.8"
[41] "X6481" "X883" "X728" "X112" "X69" "X56" "X54" "X6506.9" "X21.2" "X1.6"
[51] "X6143" "X5651" "X6506.10" "X6506.11" "X6263.2" "X6506.12" "X6480" "X777" "X647" "X434"
[61] "X518" "X246" "X436" "X6506.13" "X21.3" "X1.7" "X6136" "X5677" "X6506.14" "X6506.15"
[71] "X6263.3" "X6506.16" "X660" "X567" "X480" "X557" "X261" "X456"
My question is whether there is a way to neatly force the values from the first row to become the column names. As I'm processing a lot of data, I'm looking for a solution that is easily reproducible. I can accept a lot of changes to the actual strings in order to get syntactically correct names, but ideally I would avoid faffing around with elaborate regular expressions, as I'm often reading files like the one linked here and don't want to be forced to adjust the rules for every single import.
It looks like the problem is that the header is on the second line, not the first. You could include a skip=1 argument but a more general way of dealing with this using read.xls seems to be to use the pattern and header arguments which force the first line which matches the pattern string to be treated as the header. Your code becomes:
require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
sheet = "Quick Lookup", verbose = TRUE,
pattern="DATAZONE", header=TRUE)
UPDATE
I don't get the warning messages you do when I execute the code. The messages refer to an issue with locale. The locale settings on my system are:
Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Yours are probably different. Locale data could be OS dependent. I'm using Windows 8.1. Also I'm using Strawberry Perl; you appear to be using something else. So some possible reasons for the discrepancy in warning messages but nothing more specific.
On the second question in your comment: to read the entire file and convert a particular row (in this case, row 2) to column names, you could use the following code:
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
sheet = "Quick Lookup", verbose = TRUE,
header=FALSE, stringsAsFactors=FALSE)
names(simd.sg.xls) <- make.names(names = simd.sg.xls[2,],
unique = TRUE,allow_ = TRUE)
simd.sg.xls <- simd.sg.xls[-(1:2),]
All data will be of character type so you'll need to convert to factor and numeric as necessary.