I have a section in an R Markdown file such as:
* Building your YAML spec
+ we can get all the keys each kind supports
```{bash, eval=F}
#apiVersion: v1
#kind: Service
kubectl explain services --recursive
```
+ spec section
```{bash, eval=F}
kubectl explain services.spec
```
+ spec type
```{bash, eval=F}
kubectl explain services.spec.type
```
And after I rendered it:
My question is: why doesn't it recognize the two pluses as items?
The solution: use four spaces to indent the code chunks between the bullet points.
* Building your YAML spec
    + we can get all the keys each kind supports
    ```{bash, eval=F}
    #apiVersion: v1
    #kind: Service
    kubectl explain services --recursive
    ```
    + spec section
    ```{bash, eval=F}
    kubectl explain services.spec
    ```
    + spec type
    ```{bash, eval=F}
    kubectl explain services.spec.type
    ```
This will output the expected items.
In the R Markdown flavour of Markdown, the plus sign + is reserved for sub-items of a list. However, to be a sub-item, there needs to be a parent item. Unfortunately, when you insert an unindented code chunk, you end the list, and therefore lose the connection between the original item "Building your YAML spec" and the following sub-items "spec section" and "spec type".
It may help to look at the rendered HTML. You should see that before the code chunk, there is a closing tag </ul>.
I am trying to extract text that exists inside root level brackets from a string in Spark-SQL. I have used the function regexp_extract() on both Spark-SQL and Athena on the same string with the same regex.
On Athena, it's working fine.
But on Spark-SQL, it is not returning the value as expected.
Query is:
SELECT regexp_extract('Russia (Federal Service of Healthcare)', '.+\s\((.+)\)', 1) AS cl
Output On Athena:
Federal Service of Healthcare
Output on Spark-SQL:
ia (Federal Service of Healthcare)
I have been banging my head against this but can't seem to find a solution.
This does the trick:
SELECT regexp_extract('Russia (Federal Service of Healthcare)', '.+\\\\s\\\\((.+)\\\\)', 1) AS cl
output:
+-----------------------------+
|cl |
+-----------------------------+
|Federal Service of Healthcare|
+-----------------------------+
The \s is not being escaped in your example; that's why it falls as part of the group. You can also use the regexp_extract API directly, which makes for a cleaner solution:
.withColumn("cl", regexp_extract(col("name"), ".+\\s\\((.+)\\)", 1))
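If you want to sanity-check the escaped pattern outside of a notebook, a rough sketch from the spark-sql CLI (assuming the default spark.sql.parser.escapedStringLiterals=false) could look like this; note that the shell's double quotes consume one layer of backslashes and the SQL string literal consumes another:
```bash
# Sketch only: bash turns \\\\ into \\, the SQL string literal turns \\ into \,
# so the regex engine finally sees .+\s\((.+)\)
spark-sql -e "SELECT regexp_extract('Russia (Federal Service of Healthcare)', '.+\\\\s\\\\((.+)\\\\)', 1) AS cl"
```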
Good luck!
Let's say I want to list pods, and show their name and the number of containers they're running. If I just want the image tags themselves, I could do something like
λ kubectl get pods --output custom-columns='NAME:.metadata.namespace,IMAGES:.spec.containers[*].image'
NAME IMAGES
prometheus-system quay.io/prometheus/prometheus:v2.21.0,quay.io/prometheus-operator/prometheus-config-reloader:v0.42.1,jimmidyson/configmap-reload:v0.4.0
prometheus-system quay.io/prometheus-operator/prometheus-operator:v0.42.1
But how do I make it display just the number of containers? In other words, what do I put for the selector to get the length of the slice, to give me output like this instead?
λ kubectl get pods --output custom-columns='NAME:.metadata.namespace,CONTAINERS:<what goes here?>'
NAME CONTAINERS
prometheus-system 3
prometheus-system 1
(Eventually, I want to put this on a CRD to display the length of a list in its default output, but I figure this use case is more reproducible, and therefore easier to relate to. IIUC - but please correct me if I'm wrong! - a solution that works for this question, will also work for the display-columns of a CRD...)
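For what it's worth, one workaround sketch that avoids custom-columns entirely (assuming go-template output is acceptable here) is to count the containers with Go's built-in len function:
```bash
# Sketch only: go-template output instead of custom-columns; len is a built-in
# Go template function, so it can count the .spec.containers slice directly.
kubectl get pods --output go-template='{{range .items}}{{.metadata.namespace}}{{"\t"}}{{len .spec.containers}}{{"\n"}}{{end}}'
```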
I have tried this command;
gcloud alpha scc assets list <ORGANISATION-ID> --filter "security_center_properties.resource.type="google.iam.ServiceAccount" AND resource_properties.name:\"Compute Engine default service account\""
but I am receiving the following error:
(gcloud.alpha.scc.assets.list) INVALID_ARGUMENT: Invalid filter.
When I remove the filter after AND, I don't get an error message, but I just see a >
Any ideas where I am going wrong?
I have reviewed this documentation to help me build the command, but I am not sure which is the right filter to use.
I wonder if I should be filtering on the email of the Compute Engine default service account, which ends with "-compute@developer.gserviceaccount.com", but I can't identify the right filter for this.
The problem is the use of unescaped " characters inside the filter.
You need to type --filter and put the filter like this: "FILTER_EXPRESSION".
One filter expression could be: security_center_properties.resource_type="google.compute.Instance"
But you cannot put a double quote inside a double-quoted block without escaping it with a backslash (\); otherwise the command interprets the first inner double quote as the end of the filter.
On the other hand, if you delete part of the command, the prompt shows '>' because there is an unterminated double-quoted block and it is waiting for you to close it.
So the filter that you want has to be like this, for example:
gcloud alpha scc assets list <ORGANIZATION ID> \
--filter "security_center_properties.resource_type=\"google.compute.Instance\" AND security_center_properties.resource_type=\"google.cloud.resourcemanager.Organization\""
I hope this explanation helps you!
I want to produce the output of my map function, filtering the data by dates.
In local tests, I simply call the application passing the dates as parameters as:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 | ./reducer.py
Then the parameters are taken in the map function
#!/usr/bin/python
import sys

# Date range passed as command-line arguments
date1 = sys.argv[1]
date2 = sys.argv[2]
The question is:
How do I pass the date parameters to the map calling on Amazon EMR?
I am a beginner at MapReduce and will appreciate any help.
First of all, when you run a local test (and you should, as often as possible), the correct format, in order to reproduce how MapReduce works, is:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 | sort | ./reducer.py | sort
That's the way the Hadoop framework works.
If you are working with a big file, you should do it in steps to verify the results of each stage, meaning:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 > map_result.txt
cat map_result.txt | sort > map_result_sorted.txt
cat map_result_sorted.txt | ./reducer.py > reduce_result.txt
cat reduce_result.txt | sort > map_reduce_result.txt
Regarding your main question:
It's the same thing.
If you are going to use the Amazon web console to create your cluster, in the Add Step window you just write the following:
name: learning amazon emr
Mapper: (here they ask for the S3 path to your mapper; we will ignore that and just write our script name and its parameters, no backslash...) mapper.py 20/12/2014 31/12/2014
Reducer: (the same as for the mapper) reducer.py (you can add params here too)
Input location: ...
Output location: ... (just remember to use a new output location every time, or your step will fail)
Arguments: -files s3://cod/mapper.py,s3://cod/reducer.py (use your own file paths here; even if you add only one file, use the -files argument)
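Under the hood this step is just Hadoop streaming, so a roughly equivalent command-line invocation might look like the sketch below (the streaming jar path and the input/output locations are placeholders, not taken from the question):
```bash
# Sketch only: a Hadoop streaming job that passes arguments to the mapper.
# The jar path and the input/output S3 locations are placeholders.
hadoop jar /path/to/hadoop-streaming.jar \
  -files s3://cod/mapper.py,s3://cod/reducer.py \
  -mapper "mapper.py 20/12/2014 31/12/2014" \
  -reducer "reducer.py" \
  -input s3://your-bucket/access_log \
  -output s3://your-bucket/output
```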
That's it.
If you are getting into the whole arguments thing, I suggest you look at this blog post on how to pass arguments so that you only need a single map/reduce file.
Hope it helped!
I am running a series of clustering analyses in Weka and I have realized that automating it is the way to go if I want to get anywhere. I'll explain a bit how I am working.
I do all the pre-processing manually in R and save it as a csv file, then import it into Weka and save it again as an arff file.
I use Weka's GUI, and in general I just open my data from the arff file, go directly to the clustering tab and play around. (My experience using the CLI is limited.)
I am trying to reproduce some results I've got by using the GUI, but now with commands in the CLI. The problem is that I usually ignore a list of attributes when clustering using the GUI. I cannot find a way of selecting a list of attributes to be ignored in the command line.
For example:
java weka.clusterers.XMeans \
-I 10 -M 1000 -J 1000 \
-L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10 \
-t "/home/pedrosaurio/bigtable.arff"
My experience with weka is limited so I don't know if I am missing some basic understanding of how it works.
Data Preprocessing functions are called filters.
You need to use filters together with the clustering algorithm.
See the example below.
java weka.clusterers.FilteredClusterer \
-F weka.filters.unsupervised.attribute.Remove -V -R 1,5 \
-W weka.clusterers.XMeans -I 10 -M 1000 -J 1000 -L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10 \
-t "/home/pedrosaurio/bigtable.arff"
Here the Remove filter runs before clustering: -R 1,5 names attributes 1 and 5, and -V inverts the selection, so only those two attributes are kept (drop -V to instead remove, i.e. ignore, exactly the attributes you list); XMeans is then run on the filtered data.
To ignore an attribute you have to do it from the distance function.
Ignore attributes from the command line (Matlab):
COLUMNS = '3-last'; % Indices start from 1; 'first' and 'last' are valid as well, e.g. first-3,5,6-last
Df = weka.core.EuclideanDistance(); % Set up the distance function.
Df.setAttributeIndices(COLUMNS); % Restrict the distance calculation to these attribute indices.
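The same idea can be written directly on the command line from the original question by restricting the attribute range of the distance function (a sketch; '3-last' is just an illustrative range meaning "ignore attributes 1 and 2"):
```bash
# Sketch: ignore attributes 1 and 2 by restricting the distance function
# to attributes 3 through last instead of first-last.
java weka.clusterers.XMeans \
    -I 10 -M 1000 -J 1000 \
    -L 2 -H 9 -B 1.0 -C 0.25 \
    -D "weka.core.MinkowskiDistance -R 3-last" -S 10 \
    -t "/home/pedrosaurio/bigtable.arff"
```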
Ignore attributes from the GUI
I do not understand why, when someone asks how to ignore attributes, all the answers explain how to modify the dataset using a filter in the preprocess section.