GCP metadata user-data larger than 256 KB - google-cloud-platform

I need to use user-data metadata larger than 256 KB for a few instances (Linux Fedora CoreOS), but I get:
is too large: maximum size 262144 character
https://cloud.google.com/compute/docs/metadata/setting-custom-metadata#limitations
Is it possible to:
somehow use user-data metadata larger than 256 KB?
or fetch the user-data for an instance from a remote server, e.g. could I upload this user-data to any HTTP server?

I found a solution via a remote Ignition file.
I tried to get the 256 KB per-metadata-value limit increased through Google support, but they can't raise it for me.
Workaround via a remote Ignition file:
In the Fedora CoreOS user-data, add the snippet below.
On Nginx, serve the big Ignition file (larger than 256 KB).
Start the instance; it will connect to the remote Nginx to fetch the big Ignition file.
{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "http://xxxx/file.ign"
        }
      ]
    },
    "version": "3.2.0"
  }
}
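To attach that small pointer config as user-data when creating the instance, a gcloud command along these lines should work (a hedged sketch; the instance name, image family/project and the pointer.ign file name are assumptions, not from the question):

gcloud compute instances create fcos-node \
  --image-family=fedora-coreos-stable \
  --image-project=fedora-coreos-cloud \
  --metadata-from-file=user-data=pointer.ign

The pointer config stays far below the 262144-character limit, and the large Ignition file is only fetched over HTTP at boot.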

Related

AWS Keyspaces DSBulk unload failed, "Token metadata not present"

Getting an error when trying to unload or count data from AWS Keyspaces using DSBulk.
Error:
Operation COUNT_20221021-192729-813222 failed: Token metadata not present.
Command line:
$ dsbulk count/unload -k my_best_storage -t book_awards -f ./dsbulk_keyspaces.conf
Config:
datastax-java-driver {
  basic.contact-points = [ "cassandra.us-east-2.amazonaws.com:9142" ]
  advanced.auth-provider {
    class = PlainTextAuthProvider
    username = "aw.keyspaces-at-XXX"
    password = "XXXX"
  }
  basic.load-balancing-policy {
    local-datacenter = "us-east-2"
  }
  basic.request {
    consistency = LOCAL_QUORUM
    default-idempotence = true
  }
  advanced {
    request {
      log-warnings = true
    }
    ssl-engine-factory {
      class = DefaultSslEngineFactory
      truststore-path = "./cassandra_truststore.jks"
      truststore-password = "XXX"
      hostname-validation = false
    }
    metadata {
      token-map.enabled = false
    }
  }
}
dsbulk load - the load operation works fine...
I suspect the problem here is that your cluster is using the proprietary com.amazonaws.cassandra.DefaultPartitioner partitioner which most open-source tools and drivers don't recognise.
The DataStax Bulk Loader (DSBulk) tool uses the Cassandra Java driver under the hood to connect to Cassandra clusters. The Java driver uses the partitioner to determine which nodes own which token ranges. Only the following Cassandra partitioners are supported:
Murmur3Partitioner
RandomPartitioner
ByteOrderedPartitioner
Since the Java driver doesn't know about DefaultPartitioner, it doesn't have a map of token range owners (token metadata) and so can't determine how to "split" the Cassandra ring to query the nodes.
As you already figured out, this doesn't affect the load command because it simply sends writes to coordinators and lets the coordinators figure out how the data is partitioned. But for unload and count commands which require reads, the Java driver can't determine which coordinators to pick for sub-range queries with an unsupported partitioner.
Maybe as a workaround you can try to disable token-awareness with:
$ dsbulk count [...]
--driver.advanced.metadata.token-map.enabled false
but I don't have an AWS Keyspaces cluster to test against and I'm doubtful it will work. In any case, you're welcome to try.
There is an outstanding DSBulk feature request to provide the ability to completely disable token-awareness (internal ticket ID DAT-622) but it is unassigned at the time of writing so I'm not in a position to provide any expectation on when it will be prioritised. Cheers!
Amazon Keyspaces now supports multiple partitioners, including Murmur3Partitioner. See the following to update your partitioner. You will also want to set token-map.enabled to true:
metadata {
  token-map.enabled = true
}
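For reference, Amazon Keyspaces switches the account's partitioner via a CQL update on the system.local table; per my reading of the AWS documentation it is roughly the following (a hedged sketch, run from cqlsh or the console's CQL editor):

SELECT partitioner FROM system.local;

UPDATE system.local SET partitioner = 'org.apache.cassandra.dht.Murmur3Partitioner' WHERE key = 'local';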
Additionally, if you are using VPC Endpoints you will need the following permissions to make sure that you will see available peers.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListVPCEndpoints",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeVpcEndpoints"
      ],
      "Resource": "*"
    }
  ]
}
I would also recommend increasing the connection pool size for the data load process.
advanced.connection.pool.local.size = 3
Finally, I would recommend using AWS Glue instead of DSBulk. DSBulk is a single-process tool and will not scale for larger data loads. Additionally, learning Glue will be helpful in managing other aspects of the data lifecycle. See my example of how to unload/export data using AWS Glue.
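The linked example isn't reproduced here, but the general shape of such a Glue job is a PySpark script that reads the Keyspaces table through the spark-cassandra-connector and writes it out to S3. A minimal, hedged sketch (the bucket name, output path and the application.conf driver profile are assumptions, not from the answer):

from pyspark.sql import SparkSession

# Driver settings (contact point, SSL, auth) are assumed to live in an
# application.conf shipped with the job, via the connector's profile mechanism.
spark = (
    SparkSession.builder
    .appName("keyspaces-unload")
    .config("spark.cassandra.connection.config.profile.path", "application.conf")
    .getOrCreate()
)

# Read the Keyspaces table via the spark-cassandra-connector data source.
df = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(table="book_awards", keyspace="my_best_storage")
    .load()
)

# Export to S3 as Parquet (bucket/path are placeholders).
df.write.mode("overwrite").parquet("s3://my-export-bucket/book_awards/")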

Bypassing the Cloud Run 32 MB error via an HTTP/2 end-to-end solution

I have an API query that runs during a POST request on one of my views to populate my dashboard page. I know the response size is ~35 MB (greater than the 32 MB limit set by Cloud Run). I was wondering how I could bypass this.
My configuration is a Hypercorn server serving my Django web app as an ASGI app. I have 2 minimum instances, 1 GB RAM, and 2 CPUs per instance. I have run this Docker container locally; I can't get around the amount of data required, and I also don't want to store the data due to costs. This seems to be the cheapest route. Any pointers or ideas would be helpful. I understand that I can bypass this via an HTTP/2 end-to-end solution, but I am unable to do so currently. I haven't created any additional Hypercorn configuration. Any help appreciated!
The Cloud Run HTTP response limit is 32 MB and cannot be increased.
One suggestion is to compress the response data. Django has compression middleware (GZipMiddleware), or you can just use Python's gzip/zlib modules:
import gzip

data = b"Lots of content to compress"
cdata = gzip.compress(data)
# return the compressed data in the response
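A hedged sketch of wiring that into a Django view, with build_dashboard_payload standing in for however the question's dashboard data is actually produced (the view and helper names are assumptions):

import gzip
import json

from django.http import HttpResponse

def dashboard_data(request):
    # build_dashboard_payload() is a placeholder for the existing query logic.
    payload = json.dumps(build_dashboard_payload()).encode("utf-8")
    body = gzip.compress(payload)

    # Tell the client the body is gzip-compressed so it decompresses transparently.
    response = HttpResponse(body, content_type="application/json")
    response["Content-Encoding"] = "gzip"
    return response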
Cloud Run supports HTTP/1.1 server-side streaming, which has no response size limit. All you need to do is use chunked transfer encoding.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
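In Django this maps to StreamingHttpResponse; a hedged sketch (the view name and the generate_rows helper are assumptions, not from the question):

import json

from django.http import StreamingHttpResponse

def dashboard_stream(request):
    def stream_json():
        # Yield the ~35 MB payload as a sequence of JSON chunks instead of one body.
        yield "["
        for i, row in enumerate(generate_rows()):  # placeholder data source
            if i:
                yield ","
            yield json.dumps(row)
        yield "]"

    # No Content-Length is set, so the response goes out with chunked
    # transfer encoding and is never buffered as a single 32 MB+ body.
    return StreamingHttpResponse(stream_json(), content_type="application/json")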

"Host header is specified and is not an IP address or localhost" message when using chromedp headless-shell

I'm trying to deploy chromedp/headless-shell to Cloud Run.
Here is my Dockerfile:
FROM chromedp/headless-shell
ENTRYPOINT [ "/headless-shell/headless-shell", "--remote-debugging-address=0.0.0.0", "--remote-debugging-port=9222", "--disable-gpu", "--headless", "--no-sandbox" ]
The command I used to deploy to Cloud Run is
gcloud run deploy chromedp-headless-shell --source . --port 9222
Problem
When I go to this path /json/list, I expect to see something like this
[{
  "description": "",
  "devtoolsFrontendUrl": "/devtools/inspector.html?ws=localhost:9222/devtools/page/B06F36A73E5F33A515E87C6AE4E2284E",
  "id": "B06F36A73E5F33A515E87C6AE4E2284E",
  "title": "about:blank",
  "type": "page",
  "url": "about:blank",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/B06F36A73E5F33A515E87C6AE4E2284E"
}]
but instead, I get this error:
Host header is specified and is not an IP address or localhost.
Is there something wrong with my configuration or is Cloud Run not the ideal choice for deploying this?
This specific issue is not unique to Cloud Run. It originates from a change in the Chrome DevTools Protocol, which generates this error when the protocol is accessed remotely. It can be attributed to security measures against certain types of attacks. You can see the related Chromium pull request here.
I deployed a chromedp/headless-shell container to Cloud Run using your configuration and received the same error. There is a useful comment in a GitHub issue showing a workaround for this problem: passing a Host: localhost header. While this does work when I tested it locally, it does not work on Cloud Run (it returns a 404 error). This 404 could be due to how Cloud Run also uses the Host header to route requests to the correct service.
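For local testing, that workaround amounts to overriding the Host header on the request, for example (the address assumes the container is running locally on the question's port 9222):

curl -H "Host: localhost" http://127.0.0.1:9222/json/list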
Unfortunately this answer is not a solution, but it sheds some light on what you are seeing and why. I would go with a different GCP service, such as GCE, which gives you plain virtual machines and is less managed.

How to increase _cluster/settings/cluster.max_shards_per_node for AWS Elasticsearch Service

I use AWS Elasticsearch Service version 7.1 and its built-in Kibana to manage application logs. New indexes are created daily by Logstash. From time to time Logstash gets an error about the maximum shards limit being reached, and I have to delete old indexes for it to start working again.
I found from this document (https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-handling-errors.html) that I have an option to increase _cluster/settings/cluster.max_shards_per_node.
So I tried that by putting the following command in Kibana Dev Tools:
PUT /_cluster/settings
{
  "defaults" : {
    "cluster.max_shards_per_node": "2000"
  }
}
But I got this error
{
  "Message": "Your request: '/_cluster/settings' payload is not allowed."
}
Someone suggested that this error occurs when trying to update settings that are not allowed by AWS, but this document (https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-supported-es-operations.html#es_version_7_1) tells me that cluster.max_shards_per_node is in the allowed list.
Please suggest how to update this setting.
You're almost there; you need to rename defaults to persistent:
PUT /_cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node": "2000"
  }
}
Beware, though: the more shards you allow per node, the more resources each node needs and the worse performance can get.
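To see how close the cluster already is to the limit, you can check the current shard count in Kibana Dev Tools, for example:

GET _cluster/health?filter_path=status,number_of_nodes,active_shards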

How to increase the vm.max_map_count in AWS ECS fargate?

I am trying to run a SonarQube app on AWS Fargate. When I run the raw Docker image, it works like a charm. But if I pass the JDBC properties to the container as an argument, I run into the following issue. Apparently, the embedded Elasticsearch needs a new config. If it were an ECS cluster on EC2, I would have SSHed into the EC2 instances and updated these properties. In the case of Fargate, how do I achieve this?
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
From the GitHub issue, it seems it is not possible, as there is no EC2 instance or host involved in Fargate:
the workarounds for the max_map_count error appear to be setting max_map_count directly on the host (which may result in undesirable side effects), or using the sysctl flag on the docker run command. Unfortunately, neither of these options is supported in Fargate, since they involve interacting with the container instance itself.
But the other way is to increase the file limit and disable the mmap check.
I had to properly configure ulimits on my ECS task definition, something like:
"ulimits": [
{
"name": "nofile",
"softLimit": 65535,
"hardLimit": 65535
}
]
I've disabled mmap in Elasticsearch, which gets rid of the max_map_count requirement. This can be done by configuring the sonar.search.javaAdditionalOpts SonarQube setting. I wasn't able to do it with an environment variable, since ECS seems to be eating them, but in the end I just passed it as a parameter to the container, which works since the entrypoint is set and consumes arguments properly. In my case:
"command": [
"-Dsonar.search.javaAdditionalOpts=-Dnode.store.allow_mmapfs=false"
]
See also: sonarqube disable mmap
Adding the variable discovery.type = single-node under the "Environment Variables" section (edit the ES container) in the task definition resolved the issue for me. If you're using the JSON task definition file, add the following section:
"environment": [
{
"name": "discovery.type",
"value": "single-node"
}
],
Unfortunately, I don't know why that worked.