Both my indices are on the same node. The source index has about 200k documents. I'm using AWS and the instance type is "t3.small.search", so 2 vCPUs. I tried slicing already, but it just gives me the same error. Any ideas on what I can do to make this process finish successfully?
{
"error" : {
"root_cause" : [
{
"type" : "es_rejected_execution_exception",
"reason" : "rejected execution of coordinating operation [shard_detail=[fulltext][0][C], shard_coordinating_and_primary_bytes=0, shard_operation_bytes=98296362, shard_max_coordinating_and_primary_bytes=105630] OR [node_coordinating_and_primary_bytes=0, node_replica_bytes=0, node_all_bytes=0, node_operation_bytes=98296362, node_max_coordinating_and_primary_bytes=105630924]"
}
],
"type" : "es_rejected_execution_exception",
"reason" : "rejected execution of coordinating operation [shard_detail=[fulltext][0][C], shard_coordinating_and_primary_bytes=0, shard_operation_bytes=98296362, shard_max_coordinating_and_primary_bytes=105630] OR [node_coordinating_and_primary_bytes=0, node_replica_bytes=0, node_all_bytes=0, node_operation_bytes=98296362, node_max_coordinating_and_primary_bytes=105630924]"
},
"status" : 429
}
I ran into a similar problem. I was trying to reindex a couple of indices that had a lot of documents in them. I raised the JVM heap size from 512 MB to 2 GB and that fixed the problem.
Check the current JVM heap size:
GET {ES_URL}/_cat/nodes?h=heap*&v
Here's how you can change the settings: https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html
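If you are running a self-managed node, a minimal sketch of pinning the heap via a jvm.options.d override (the file name and the 2 GB value are illustrative; on a managed AWS domain the heap is tied to the instance size instead):
# config/jvm.options.d/heap.options
-Xms2g
-Xmx2g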
Hope this helps.
I have an index which is search heavy. RPM varies from 15-20k. The issue is that for the first few days the response time of the search query is around 15ms, but it starts increasing gradually and touches ~70ms. Some of the requests start queuing (as per the Search thread pool graph in the AWS console), but there were no rejections. Queuing increases the latency of the search requests.
I got to know that queuing will happen if there is pressure on resources. I think I have sufficient CPU and memory; please look at the config below.
I enabled slow query logs but didn't find any anomaly. Even though the average response time is around 16ms, I see a few queries going above 50ms, but there was no issue with the search query itself. The number of searchable documents is around 8k.
I need your suggestions on how to improve performance here. The document mapping, search query, and ES config are given below. Is there any issue in the mapping or query?
Mapping:
{
"data":{
"mappings":{
"_doc":{
"properties":{
"a":{
"type":"keyword"
},
"b":{
"type":"keyword"
}
}
}
}
}
}
Search query:
{
"size":5000,
"query":{
"bool":{
"filter":[
{
"terms":{
"a":[
"all",
"abc"
],
"boost":1
}
},
{
"terms":{
"b":[
"all",
123
],
"boost":1
}
}
],
"adjust_pure_negative":true,
"boost":1
}
},
"stored_fields":[]
}
I'm using keyword in the mapping and terms in the search query as I want to search for exact values. Boost and adjust_pure_negative are added automatically; from what I read, they should not affect performance.
Index settings:
{
"data":{
"settings":{
"index":{
"number_of_shards":"1",
"provided_name":"data",
"creation_date":"12345678154072",
"number_of_replicas":"7",
"uuid":"3asd233Q9KkE-2ndu344",
"version":{
"created":"10499"
}
}
}
}
}
ES config:
Master node instance type: m5.large.search
Master nodes: 3
Data node instance type: m5.2xlarge.search
Data nodes: 8 (8 vcpu, 32 GB memory)
I have started trying out Google Cloud Data Fusion as a prospective ETL tool that I can finally decide to use. When building a data pipeline to fetch data from a REST API source and load it into a MySQL database, I am facing this error: 'Expected a string but was NULL at line 1 column 221'. Please check the system logs for more details. And yes, it's true, I have a field that is null in the JSON response I am seeing:
"systemanswertime": null
How do I deal with null values, since the string option available in the Cloud Data Fusion Studio dropdown is not working? Are there other optional data types that I can use?
Below are two screenshots showing my current data pipeline structure:
general view
view showing mapping and the output schema
Thank You!!
What you need to do is tell the HTTP plugin that you are expecting a null by checking the Null checkbox in front of the output field on the right side.
You might be getting this error because the JSON schema is defining the value properties; you should allow the systemanswertime parameter to be NULL. You could try to parse the JSON value as follows:
"systemanswertime": {
"type": [
"string",
"null"
]
}
In case you don't have access to the JSON file, you could try to use this plugin to let the HTTP source handle nullable values by dynamically substituting the configuration, which can be served by an HTTP server. You will need to construct an accessible HTTP endpoint that can serve content similar to:
{
"name" : "output.schema", "type" : "schema", "value" :
[
{ "name" : "id", "type" : "int", "nullable" : true},
{ "name" : "first_name", "type" : "string", "nullable" : true},
{ "name" : "last_name", "type" : "string", "nullable" : true},
{ "name" : "email", "type" : "string", "nullable" : true}
]
}
In case you are facing an error such as No matching schema found for union type: ["string","null"], you could try the following workaround. The root cause of this error is that some entries in the response from the API don't have all the fields they are expected to have. For example, some entries may have callerId, channel, last_channel, last data, etc., but other entries may not have last_channel or some other field from the JSON. This leads to a mismatch with the schema provided in the HTTP source, and the pipeline fails right away.
As per this, when nodes encounter null values, logical errors, or other sources of errors, you may use an error handler plugin to catch errors. The approach is as follows:
In the HTTP source plug-in, change the following:
Update the output schema to account for the custom field.
Update the JSON/XML field mapping to account for the custom field.
Change the Non-HTTP Error Handling field to Send to Error. This way it pushes the records through the error collector and the pipeline proceeds with subsequent records.
Add an Error Collector and a sink to capture the error records.
With this method you will be able to run the pipeline and have the problematic records detected.
Kind regards,
Manuel
My Elasticsearch cluster dropped from 2B documents to 900M records. On AWS it shows:
Relocating shards: 4
whilst showing
Active Shards: 35
and
Active primary shards: 34
(Might not be relevant, but here's the rest of the stats:)
Number of nodes: 9
Number of data nodes: 6
Unassigned shards: 17
When running
GET /_cluster/allocation/explain
it returns:
{
"index": "datauwu",
"shard": 6,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "NODE_LEFT",
"at": "2019-10-31T17:02:11.258Z",
"details": "node_left[removedforsecuritybecimparanoid1]",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
"node_allocation_decisions": [
{
"node_id": "removedforsecuritybecimparanoid2",
"node_name": "removedforsecuritybecimparanoid2",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "removedforsecuritybecimparanoid3",
"node_name": "removedforsecuritybecimparanoid3",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "removedforsecuritybecimparanoid4",
"node_name": "removedforsecuritybecimparanoid4",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "removedforsecuritybecimparanoid5",
"node_name": "removedforsecuritybecimparanoid5",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "removedforsecuritybecimparanoid6",
"node_name": "removedforsecuritybecimparanoid6",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "removedforsecuritybecimparanoid7",
"node_name": "removedforsecuritybecimparanoid7",
"node_decision": "no",
"store": {
"found": false
}
}
]
}
I'm a bit confused about what this exactly means. Does this mean my Elasticsearch cluster did not lose data but is instead relocating it into different shards, or can it not find the shards?
If it cannot find the shards, does this mean my data was lost? If so, what could be the reason, and how can I prevent this from happening in the future?
I haven't set up replicas, as I was indexing data and replicas slow it down whilst indexing.
Also, a side note: my record count dropped down to 400M at one point but then rose back up to 900M randomly. I don't know what this means, and any insight would be greatly appreciated.
"reason": "NODE_LEFT"
And:
I haven't set up replicas, as I was indexing data and replicas slow it down whilst indexing.
If the node holding the primary shards has gone away, then yes, your data is gone. After all, if there are no replicas, then where would the cluster retrieve the data from, if the primary (and only) shards are no longer part of the cluster? You will either need to bring the node holding those shards back up and add it into the cluster, or the data is gone.
The error message is saying "You want me to allocate a primary shard for this index that I know exists, but there used to be another version of that primary shard that can't be found anymore, I won't allocate it again in case the previous primary comes back."
You can force Elasticsearch to reallocate the primary shard (and explicitly accept that the data in the previous primary shard is gone) by performing a reroute with allocate_stale_primary (doc):
curl -H 'Content-Type: application/json' \
-XPOST '127.0.0.1:9200/_cluster/reroute?pretty' -d '{
"commands" : [ {
"allocate_stale_primary" :
{
"index" : "datauwu", "shard" : 6,
"node" : "target-data-node-id",
"accept_data_loss" : true
}
}
]
}'
Turning off replicas for anything but development with disposable data is usually a bad idea.
Also, a side note: my record count dropped down to 400M at one point but then rose back up to 900M randomly. I don't know what this means, and any insight would be greatly appreciated.
This happens because shards aren't visible in the cluster. This can happen if all copies of a shard are being allocated, relocated, or recovered. This corresponds with a RED cluster state. You can mitigate it by ensuring that you have at least 1 replica (though ideally you have a sufficient number of replicas set up to survive the loss of N data nodes in the cluster). This lets Elasticsearch keep one shard as the primary while it moves others around.
If you only have the primary and no replicas, then if a primary is being recovered or relocated, the data in that shard will not be visible in the cluster. Once the shard is active again, the documents in it become visible.
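For reference, a minimal sketch of turning a replica back on for an existing index once bulk indexing is done (reusing the index name from the question; the replica count of 1 is illustrative):
curl -H 'Content-Type: application/json' \
-XPUT '127.0.0.1:9200/datauwu/_settings?pretty' -d '{
"index" : {
"number_of_replicas" : 1
}
}'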
When attempting to recover an unallocated shard with a missing primary using allocate_stale_primary as described by Chris Heald you might get:
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "No data for shard [0] of index [xyz] found on any node"
}
This means the data is gone unless the missing node rejoins the cluster. Alternatively, you can empty the shard using an allocate_empty_primary command.
curl -H 'Content-Type: application/json' \
-XPOST '127.0.0.1:9200/_cluster/reroute?pretty' -d '{
"commands" : [ {
"allocate_empty_primary" :
{
"index" : "datauwu", "shard" : 6,
"node" : "target-data-node-id",
"accept_data_loss" : true
}
}
]
}'
This wipes the data and will overwrite the shard if the missing node rejoins.
I'm having some RingOut API trouble. Everything was working out just fine, but suddenly, about a week ago, one of the users claimed that the RingOut functionality was broken, and from what I see she is correct. For the life of me I can't figure out what the problem is. Below is what my request looks like. After polling the call, the API errors out claiming that one or two lines are busy when I know for a fact they are not. Any ideas or direction on this would be greatly appreciated.
Request URI:
https://platform.ringcentral.com/restapi/v1.0/account/279578017/extension/279580017/ring-out/Y3MxNjg2OTU1OTIyMDIwMzQ1NDI5QDEwLjE0LjIzLjQw
Post Variables:
{"from":{"phoneNumber":"+17606992007","forwardingNumberId":""},"to":{"phoneNumber":"+17602146463"},"callerId":{"phoneNumber":"+17604440557"},"playPrompt":false,"country":{"id":"1”}}
Result:
{
"uri" : "https://platform.ringcentral.com/restapi/v1.0/account/279578017/extension/279580017/ring-out/Y3MxNjg2OTU1NTIyNzE2MzY4NDQyQDEwLjE0LjIzLjA",
"id" : "Y3MxNjg2OTU1NTIyNzE2MzY4NDQyQDEwLjE0LjIzLjA",
"status" : {
"callStatus" : "InProgress",
"callerStatus" : "InProgress",
"calleeStatus" : "InProgress"
}
}
Result from polling the call:
{
"uri" : "https://platform.ringcentral.com/restapi/v1.0/account/279578017/extension/279580017/ring-out/Y3MxNjg2OTU1OTgxNjMyMzY1NjYwQDEwLjE0LjIzLjQ2",
"id" : "Y3MxNjg2OTU1OTgxNjMyMzY1NjYwQDEwLjE0LjIzLjQ2",
"status" : {
"callStatus" : "CannotReach",
"callerStatus" : "Busy",
"calleeStatus" : "InProgress"
}
}
I know it sounds strange, but I had the same error in my app. Removing the +1 from the number fixed the issue for me.
Hope that helps!
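For illustration, the POST body from the question with the +1 prefixes stripped would look something like this (purely a sketch of the suggestion above, reusing the same numbers):
{"from":{"phoneNumber":"7606992007","forwardingNumberId":""},"to":{"phoneNumber":"7602146463"},"callerId":{"phoneNumber":"7604440557"},"playPrompt":false,"country":{"id":"1"}}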
Using Amazon Data Pipeline, I'm trying to use a SqlActivity to execute some SQL on a non-Redshift data store (SnowflakeDB, for the curious). It seems like it should be possible to do that with a SqlActivity that uses a JdbcDatabase. My first warning was when the WYSIWYG editor on Amazon didn't even let me try to create a JdbcDatabase, but I plowed on anyway and just wrote and uploaded a JSON definition by hand (here's the relevant bit):
{
"id" : "ExportToSnowflake",
"name" : "ExportToSnowflake",
"type" : "SqlActivity",
"schedule" : { "ref" : "DefaultSchedule" },
"database" : { "ref" : "SnowflakeDatabase" },
"dependsOn" : { "ref" : "ImportTickets" },
"script" : "COPY INTO ZENDESK_TICKETS_INCREMENTAL_PLAYGROUND FROM #zendesk_incremental_stage"
},
{
"id" : "SnowflakeDatabase",
"name" : "SnowflakeDatabase",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "com.snowflake.client.jdbc.SnowflakeDriver",
"username" : "redacted",
"connectionString" : "jdbc:snowflake://redacted.snowflakecomputing.com:8080/?account=redacted&db=redacted&schema=PUBLIC&ssl=on",
"*password" : "redacted"
}
When I upload this into the designer, it refuses to activate, giving me this error message:
ERROR: 'database' values must be of type 'RedshiftDatabase'. Found values of type 'JdbcDatabase'
The rest of the pipeline definition works fine without any errors. I've confirmed that it activates and runs to success if I simply leave this step out.
I am unable to find a single mention on the entire Internet of someone actually using a JdbcDatabase from Data Pipeline. Does it just plain not work? Why is it even mentioned in the documentation if there's no way to actually use it? Or am I missing something? I'd love to know if this is a futile exercise before I blow more of the client's money trying to figure out what's going on.
In your JdbcDatabase you need to have the following property:
jdbcDriverJarUri: "[S3 path to the driver jar file]"
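With that in place, the SnowflakeDatabase object from the question would look roughly like this (the S3 path is a placeholder; the rest is unchanged):
{
"id" : "SnowflakeDatabase",
"name" : "SnowflakeDatabase",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "com.snowflake.client.jdbc.SnowflakeDriver",
"jdbcDriverJarUri" : "s3://your-bucket/path/to/snowflake-jdbc-driver.jar",
"username" : "redacted",
"connectionString" : "jdbc:snowflake://redacted.snowflakecomputing.com:8080/?account=redacted&db=redacted&schema=PUBLIC&ssl=on",
"*password" : "redacted"
}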