I'm just learning service meshes with Istio and I found some strange behavior.
To understand maxRequestsPerConnection of the Istio DestinationRule CRD, I wrote the manifest below and applied it:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
Then I sent requests using fortio. The result is below:
yunoMacBook-Air:labo8 yu$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 5 -qps 0 -n 1000 -loglevel Error http://httpbin:8000/get
07:12:01 I logger.go:127> Log level is now 4 Error (was 2 Info)
Fortio 1.11.3 running at 0 queries per second, 2->2 procs, for 1000 calls: http://httpbin:8000/get
Aggregated Function Time : count 1000 avg 0.0036879818 +/- 0.004588 min 0.000379697 max 0.034176044 sum 3.68798183
# target 50% 0.00234783
# target 75% 0.0032551
# target 90% 0.008
# target 99% 0.025
# target 99.9% 0.032784
Sockets used: 876 (for perfect keepalive, would be 5)
Jitter: false
Code 200 : 126 (12.6 %)
Code 503 : 874 (87.4 %)
All done 1000 calls (plus 0 warmup) 3.688 ms avg, 1170.1 qps
yunoMacBook-Air:labo8 yu$
After that, I changed the maxRequestsPerConnection value to 10:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  (...)
  maxRequestsPerConnection: 10
and I sent requests again with the same settings.
yunoMacBook-Air:labo8 yu$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 5 -qps 0 -n 1000 -loglevel Error http://httpbin:8000/get
07:11:07 I logger.go:127> Log level is now 4 Error (was 2 Info)
Fortio 1.11.3 running at 0 queries per second, 2->2 procs, for 1000 calls: http://httpbin:8000/get
Aggregated Function Time : count 1000 avg 0.0039736575 +/- 0.004068 min 0.000404827 max 0.030141552 sum 3.97365754
# target 50% 0.00231923
# target 75% 0.00475
# target 90% 0.0104667
# target 99% 0.0192
# target 99.9% 0.025
Sockets used: 723 (for perfect keepalive, would be 5)
Jitter: false
Code 200 : 281 (28.1 %)
Code 503 : 719 (71.9 %)
All done 1000 calls (plus 0 warmup) 3.974 ms avg, 1098.3 qps
yunoMacBook-Air:labo8 yu$
The 200 rate increased, and I cannot understand why.
In my understanding, fortio uses HTTP/1.1, and with HTTP/1.1 only one HTTP request goes over one TCP connection, so I expected the same results.
Could you tell me why this happened?
First things first: HTTP/1.1 does allow multiple requests per connection via the Keep-Alive mechanism. Persistent connections are the default behavior (RFC 2616, Section 8.1).
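You can observe this from the client side as well. Fortio reuses connections by default; as a rough comparison (a sketch assuming fortio's -keepalive flag, which defaults to true), forcing a new connection per request should drive the "Sockets used" count toward the number of calls regardless of the DestinationRule:

# Hypothetical comparison run: disable client-side connection reuse entirely
kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -keepalive=false -c 5 -qps 0 -n 1000 -loglevel Error http://httpbin:8000/get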
The documentation is a bit unclear.
The maxRequestsPerConnection description states:
Maximum number of requests per connection to a backend. Setting this parameter to 1 disables keep alive. Default 0, meaning “unlimited”, up to 2^29.
Setting maxRequestsPerConnection to 1 disables keep-alive; setting it to any value greater than 1 switches keep-alive back on.
Setting this field to a proper value (not too high, not too low) is the hard part of configuring Istio, and it depends on your application's needs and traffic.
I'm defining a data series for testing a Prometheus alert using the container_last_seen metric from the cadvisor exporter.
How do I enter timestamp series values like those returned by the container_last_seen metric? I'm testing, on an Apple Mac, Prometheus alerts that run in production on Linux boxes.
Here's one thing I tried:
input_series:
  - series: |
      container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
    values: '1563968832+0x61'
It seems whatever I put in the values for the series is not accepted.
I've also tried durations: '0h+1mx60'
Since time() - container_last_seen{...} is a legal expression, container_last_seen is definitely a timestamp, and I would expect a timestamp to be represented as a Unix epoch number. Executing the query in Prometheus gives Unix epoch times, but putting those numbers in a series is rejected with the error below.
promtool is recognising the different types but giving much the same error:
➜ promtool test rules alertrules-service-oriented-test.yml
Unit Testing: alertrules-service-oriented-test.yml
FAILED:
1:1: parse error: unexpected number "0" in series values
If the values are '1h+0mx61', promtool correctly identifies the values as durations:
1:1: parse error: unexpected duration "1h" in series values
Note that when this test is commented out, there is no 1:1: parse error and the tests complete successfully, so this is not a problem with out-of-sight parts of the test file.
Thanks for any insights.
Here's the alert:
alertrules.yaml:
- name: containers
  interval: 15s
  rules:
    - alert: prod_container_crashing
      expr: |
        count by (instance, container_label_com_docker_swarm_service_name)
        (
          count_over_time(container_last_seen{container_label_com_docker_swarm_service_name!="",env="prod"}[15m])
        ) - 1 > 2
      for: 5m
      labels:
        service: prod
        type: container
        severity: critical
      annotations:
        summary: "pdce {{ $labels.container_label_com_docker_swarm_service_name }}"
        description: "{{ $labels.container_label_com_docker_swarm_service_name }} in prod cluster on {{ $labels.instance }} is crashing"
and here's the test file:
alertrules_test.yml:
rule_files:
  - alertrules.yml
evaluation_interval: 1m
tests:
  - name: container_tests
    interval: 15s
    input_series:
      - series: |
          container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
        values: '1563968832+0x61'
    alert_rule_test:
      - eval_time: 15m
        alertname: prod_container_crashing
        exp_alerts:
          - exp_labels:
              service: prod
              type: container
              severity: critical
            exp_annotations:
              summary: prod service1
              description: service1 in prod cluster on 10.0.0.1 is crashing
When the series: value is all on one line, without a > or | YAML block scalar indicator, e.g.
- series: container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
  values: '1563968832+0x61'
the error is not there, though I don't know why, so this doesn't appear to be a data typing issue.
It's a shame for readability reasons; either Prometheus or Go may have a squeaky wheel in its YAML implementation.
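For reference, a sketch of the test fragment in the form that parses, based on the one-line workaround above (same metric and values):

tests:
  - name: container_tests
    interval: 15s
    input_series:
      - series: container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
        values: '1563968832+0x61'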
I've provisioned a Keyspace on AWS and, in order to make sure it can achieve our desired performance, I'm trying to run the cassandra-stress tool against it and compare it to other architectures we're experimenting with.
I managed to connect to it using the following cqlshrc:
[connection]
port = 9142
factory = cqlshlib.ssl.ssl_transport_factory
[ssl]
validate = true
certfile = /root/.cassandra/AmazonRootCA1.pem
And the following command (hoping that soon enough there will be Python 3 support; according to their Jira ticket, development was completed this February):
cqlsh cassandra.eu-central-1.amazonaws.com 9142 -u "myuser-at-722222222222" -p "12/12ZmHmtD1klsDk9cgqt/XXXXXXXXxUz6Sy687z/U=" --ssl --cqlversion="3.4.4"
Surprisingly or not, when using the official AWS guides things tend to work.
So I went on and tried connecting the cassandra-stress tool (I have it inside a Docker container, I'd rather keep my OS Java free) to the same Keyspace.
First I converted the AWS AmazonRootCA1.pem into cassandra_truststore.jks using the following commands (explained here):
openssl x509 -outform der -in AmazonRootCA1.pem -out temp_file.der
keytool -import -alias cassandra -keystore cassandra_truststore.jks -file temp_file.der
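As a sanity check (a sketch; the alias and file name match the commands above), you can verify that the certificate actually landed in the truststore:

keytool -list -v -keystore cassandra_truststore.jks -alias cassandra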
Now when I'm trying to run the actual tool like this:
./cassandra-stress write -node cassandra.eu-central-1.amazonaws.com -port native=9142 thrift=9142 jmx=9142 -transport truststore=/root/.cassandra/cassandra_truststore.jks truststore-password=mypassword -mode native cql3 user="myuser-at-722222222222" password="12/12ZmHmtD1klsDk9cgqt/XXXXXXXXxUz6Sy687z/U="
I'm getting the following error:
******************** Stress Settings ********************
Command:
Type: write
Count: -1
No Warmup: false
Consistency Level: LOCAL_ONE
Target Uncertainty: 0.020
Minimum Uncertainty Measurements: 30
Maximum Uncertainty Measurements: 200
Key Size (bytes): 10
Counter Increment Distibution: add=fixed(1)
Rate:
Auto: true
Min Threads: 4
Max Threads: 1000
Population:
Sequence: 1..1000000
Order: ARBITRARY
Wrap: true
Insert:
Revisits: Uniform: min=1,max=1000000
Visits: Fixed: key=1
Row Population Ratio: Ratio: divisor=1.000000;delegate=Fixed: key=1
Batch Type: not batching
Columns:
Max Columns Per Key: 5
Column Names: [C0, C1, C2, C3, C4]
Comparator: AsciiType
Timestamp: null
Variable Column Count: false
Slice: false
Size Distribution: Fixed: key=34
Count Distribution: Fixed: key=5
Errors:
Ignore: false
Tries: 10
Log:
No Summary: false
No Settings: false
File: null
Interval Millis: 1000
Level: NORMAL
Mode:
API: JAVA_DRIVER_NATIVE
Connection Style: CQL_PREPARED
CQL Version: CQL3
Protocol Version: V4
Username: myuser-at-722222222222
Password: *suppressed*
Auth Provide Class: null
Max Pending Per Connection: 128
Connections Per Host: 8
Compression: NONE
Node:
Nodes: [cassandra.eu-central-1.amazonaws.com]
Is White List: false
Datacenter: null
Schema:
Keyspace: keyspace1
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Strategy Pptions: {replication_factor=1}
Table Compression: null
Table Compaction Strategy: null
Table Compaction Strategy Options: {}
Transport:
factory=org.apache.cassandra.thrift.TFramedTransportFactory; truststore=/root/.cassandra/cassandra_truststore.jks; truststore-password=mypassword; keystore=null; keystore-password=null; ssl-protocol=TLS; ssl-alg=SunX509; store-type=JKS; ssl-ciphers=TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA;
Port:
Native Port: 9142
Thrift Port: 9142
JMX Port: 9142
Send To Daemon:
*not set*
Graph:
File: null
Revision: unknown
Title: null
Operation: WRITE
TokenRange:
Wrap: false
Split Factor: 1
java.lang.RuntimeException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.eu-central-1.amazonaws.com/3.127.48.183:9142 (com.datastax.driver.core.exceptions.TransportException: [cassandra.eu-central-1.amazonaws.com/3.127.48.183] Channel has been closed))
at org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:220)
at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:79)
at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:69)
at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:228)
at org.apache.cassandra.stress.StressAction.run(StressAction.java:57)
at org.apache.cassandra.stress.Stress.run(Stress.java:143)
at org.apache.cassandra.stress.Stress.main(Stress.java:62)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.eu-central-1.amazonaws.com/3.127.48.183:9142 (com.datastax.driver.core.exceptions.TransportException: [cassandra.eu-central-1.amazonaws.com/3.127.48.183] Channel has been closed))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:403)
at org.apache.cassandra.stress.util.JavaDriverClient.connect(JavaDriverClient.java:160)
at org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:211)
... 6 more
I've tried changing some parameters, such as the JKS password (just in case I had it wrong), but I got a different error message, so that's probably not the cause.
Did I miss something?
Try using TLP Stress instead.
tlp-stress run RandomPartitionAccess -d 10m --host cassandra.us-east-1.amazonaws.com --port 9142 --username alice --password fLyWYFlTCD5J2gzGAZ --ssl --max-requests 4000 --dc us-east-2 --threads 10
https://thelastpickle.com/tlp-stress/
I'm trying to send a push notification to my app using Google Cloud Scheduler:
gcloud beta scheduler jobs create http PUSH --schedule="0 * * * *" --uri="https://fcm.googleapis.com/fcm/send" --description="desc" --headers="Authorization: key=<AUTHKEY> --http-method="POST" --message-body="{\"to\":\"/topics/allDevices\",\"priority\":\"low\",\"data\":{\"success\":\"ok\"}}"
The result is always 401 Unauthorized. After issuing the command:
gcloud beta scheduler jobs describe PUSH
I do not get the headers I set back:
description: desc
httpTarget:
  body: eyJ0byI6Ii90b3BpY3MvYWxsnByaW9yaXR5IjoRGV2aWNlcyIsIiaGlnaCIsImRhdGEiOnsic3VjY2VzcyI6Im9rIn19 <--- THIS IS WEIRD
  headers:
    Content-Type: application/octet-stream
    User-Agent: Google-Cloud-Scheduler
  httpMethod: POST
  uri: https://fcm.googleapis.com/fcm/send
lastAttemptTime: '2018-11-07T20:32:37.657408Z'
name: projects/..../locations/europe-west1/jobs/PUSH
retryConfig:
  maxBackoffDuration: 3600s
  maxDoublings: 16
  maxRetryDuration: 0s
  minBackoffDuration: 5s
schedule: 0 * * * *
scheduleTime: '2018-11-07T21:00:00.681498Z'
state: ENABLED
status:
  code: 16
timeZone: Etc/UTC
userUpdateTime: '2018-11-07T20:29:15Z'
The first question is about the body:
body: eyJ0byI6Ii90b3BpY3MvYWxsnByaW9yaXR5IjoRGV2aWNlcyIsIiaGlnaCIsImRhdGEiOnsic3VjY2VzcyI6Im9rIn19 <--- THIS IS WEIRD
This is the base64 encoding of
{\"to\":\"/topics/allDevices\",\"priority\":\"low\",\"data\":{\"success\":\"ok\"}}
Google is taking your --message-body and encoding it in base64.
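You can confirm this yourself (a sketch; BODY stands in for the body: value copied from the describe output):

echo "$BODY" | base64 --decode   # prints the original --message-body JSON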
Next, regarding the header issue: you have several errors in your --headers.
--headers="Authorization: key=AUTHKEY
You are missing a quote mark after AUTHKEY. I will assume this is just an editing mistake made while writing the question. (Note: I could not figure out how to include the less-than and greater-than characters in this response.)
However, the syntax for --headers is wrong: --headers expects KEY=VALUE, not KEY: VALUE. In this example the KEY is Authorization and the VALUE is key=AUTHKEY.
--headers="Authorization=key=AUTHKEY"
What should I set broadcast_rpc_address to in a 3-seed and 3-non-seed-node Cassandra cluster deployed on AWS?
rpc_address is set to the wildcard 0.0.0.0,
seed nodes are launched using static ENIs,
non-seed nodes are launched using an ASG,
and ALL the nodes are launched in a private subnet that reaches the internet through a NAT gateway.
The cassandra.yaml file I am using is below:
cluster_name: 'Cassandra Cluster'
num_tokens: 256
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
disk_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "seednode-A-IP,seednode-B-IP,seednode-C-IP"
concurrent_reads: 32
concurrent_writes: 32
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.8.9.83
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 0.0.0.0
broadcast_rpc_address: NAT-GATEWAY-IP
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
column_index_size_in_kb: 64
compaction_throughput_mb_per_sec: 16
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: Ec2Snitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
client_encryption_options:
    enabled: false
    keystore: conf/.keystore
    keystore_password: cassandra
internode_compression: all
inter_dc_tcp_nodelay: false
OK, found the issue.
In Cassandra versions 3.0 and above, CQL uses port 9042 for connections; after changing the LB listener port from 9160 to 9042, it worked.
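A quick way to confirm the port is reachable (a sketch; LB-DNS-NAME is a placeholder for your load balancer endpoint):

cqlsh LB-DNS-NAME 9042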
We have a Cassandra cluster deployed on AWS with 3 seed nodes (static ENIs attached) and 3 non-seed nodes in an Auto Scaling group.
I have rpc_address set to 0.0.0.0; can someone tell me what broadcast_rpc_address should be in the cassandra.yaml file?
I was using Cassandra 2.0.7 before and was able to connect to the cluster fine with just rpc_address set to 0.0.0.0 and broadcast_rpc_address unset, but when I upgraded to 3.11.1 it gives me this error:
CassandraDaemon.java:708 - Exception encountered during startup: If rpc_address is set to a wildcard address (0.0.0.0), then you must set broadcast_rpc_address to a value other than 0.0.0.0
If you are using Cassandra 2.1 or greater, you can configure broadcast_rpc_address; you can set it to the node's public IP.
Cassandra uses the "broadcast_address/listen_address" for internode connectivity and "broadcast_rpc_address/rpc_address" for the rpc interface (client -> coordinator (Cassandra node) requests).
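As a concrete sketch (the address is a placeholder; each node sets its own routable IP here, never the wildcard):

# Per-node settings in cassandra.yaml
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.8.9.83   # placeholder: this node's reachable IP, not 0.0.0.0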