I tried to follow this example https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html to load data into Neptune:
curl -X POST -H 'Content-Type: application/json' https://endpoint:port/loader -d '
{
"source" : "s3://source.csv",
"format" : "csv",
"iamRoleArn" : "role",
"region" : "region",
"failOnError" : "FALSE",
"parallelism" : "MEDIUM",
"updateSingleCardinalityProperties" : "FALSE",
"queueRequest" : "TRUE"
}'
{
"status" : "200 OK",
"payload" : {
"loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb"
}
}
I get a 200 status, but when I then check whether the data was loaded I get this:
curl -G 'https://endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb'
{
"status" : "200 OK",
"payload" : {
"feedCount" : [
{
"LOAD_FAILED" : 1
}
],
"overallStatus" : {
"fullUri" : "s3://source.csv",
"runNumber" : 1,
"retryNumber" : 1,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 4,
"startTime" : 1617653964,
"totalRecords" : 10500,
"totalDuplicates" : 0,
"parsingErrors" : 0,
"datatypeMismatchErrors" : 0,
"insertErrors" : 10500
}
}
}
I have no idea why I get LOAD_FAILED, so I decided to use the Get-Status API to see what errors caused the load failure, and got this:
curl -X GET 'endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb?details=true&errors=true'
{
"status" : "200 OK",
"payload" : {
"feedCount" : [
{
"LOAD_FAILED" : 1
}
],
"overallStatus" : {
"fullUri" : "s3://source.csv",
"runNumber" : 1,
"retryNumber" : 1,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 4,
"startTime" : 1617653964,
"totalRecords" : 10500,
"totalDuplicates" : 0,
"parsingErrors" : 0,
"datatypeMismatchErrors" : 0,
"insertErrors" : 10500
},
"failedFeeds" : [
{
"fullUri" : "s3://source.csv",
"runNumber" : 1,
"retryNumber" : 1,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 1,
"startTime" : 1617653967,
"totalRecords" : 10500,
"totalDuplicates" : 0,
"parsingErrors" : 0,
"datatypeMismatchErrors" : 0,
"insertErrors" : 10500
}
],
"errors" : {
"startIndex" : 1,
"endIndex" : 10,
"loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb",
"errorLogs" : [
{
"errorCode" : "FROM_OR_TO_VERTEX_ARE_MISSING",
"errorMessage" : "Either from vertex, '1414', or to vertex, '70', is not present.",
"fileName" : "s3://source.csv",
"recordNum" : 0
},
What does this error even mean and what is the possible fix?
It looks as if you were trying to load some edges. When an edge is loaded, the two vertices that the edge will be connecting must already have been loaded/created. The message:
"errorMessage" : "Either from vertex, '1414', or to vertex, '70',is not present.",
is letting you know that one (or both) of the vertices with ID values of '1414' and '70' are missing. All vertices referenced by a CSV file containing edges must already exist (have been created or loaded) prior to loading edges that reference them. If the CSV files for vertices and edges are in the same S3 location then the bulk loader can figure out the order to load them in. If you just ask the loader to load a file containing edges but the vertices are not yet loaded, you will get an error like the one you shared.
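If your vertex and edge CSV files are in different S3 locations, one way to guarantee the order is to submit the vertex load first, poll the Get-Status endpoint until it finishes, and only then submit the edge load. Here is a rough sketch of that flow in Python with the requests library; the loader endpoint, the bucket/file names and the non-terminal status values are assumptions based on the payloads shown above and the loader documentation:
import time
import requests

LOADER_URL = "https://endpoint:port/loader"  # placeholder Neptune loader endpoint

def submit_load(source_uri):
    # Same payload shape as the curl call above; iamRoleArn and region are placeholders.
    payload = {
        "source": source_uri,
        "format": "csv",
        "iamRoleArn": "role",
        "region": "region",
        "failOnError": "FALSE",
        "parallelism": "MEDIUM",
        "updateSingleCardinalityProperties": "FALSE",
        "queueRequest": "TRUE",
    }
    resp = requests.post(LOADER_URL, json=payload)
    resp.raise_for_status()
    return resp.json()["payload"]["loadId"]

def wait_for_load(load_id, poll_seconds=10):
    # Poll the Get-Status endpoint until the load leaves the queued/running states
    # (status names assumed from the Neptune loader documentation).
    while True:
        status = requests.get(f"{LOADER_URL}/{load_id}").json()
        overall = status["payload"]["overallStatus"]["status"]
        if overall not in ("LOAD_NOT_STARTED", "LOAD_IN_QUEUE", "LOAD_IN_PROGRESS"):
            return overall
        time.sleep(poll_seconds)

# Hypothetical file names: load the vertex file first, then the edges that reference it.
vertex_load_id = submit_load("s3://bucket/vertices.csv")
assert wait_for_load(vertex_load_id) == "LOAD_COMPLETED"
edge_load_id = submit_load("s3://bucket/edges.csv")
print(wait_for_load(edge_load_id))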
Related
What is the correct curl format to view the TRC20 balance of an account? I tried the command below, but the output showed no balance.
curl -X POST http://127.0.0.1:8090/wallet/triggersmartcontract -d
'{
"contract_address":"TCFLL5dx5ZJdKnWuesXxi1VPwjLVmWZZy9",
"address":"TUT5SVvKmnxKpKdHi2tXMzPfffQNg7e3MU",
"function_selector":"balanceOf(address)",
"owner_address":"TUT5SVvKmnxKpKdHi2tXMzPfffQNg7e3MU",
"visible":true
}'
output:
{
"result" : {
"result" : true
},
"transaction" : {
"raw_data" : {
"ref_block_hash" : "0d9745f14e11d7fa",
"expiration" : 1605942390000,
"ref_block_bytes" : "9c45",
"contract" : [
{
"type" : "TriggerSmartContract",
"parameter" : {
"type_url" : "type.googleapis.com/protocol.TriggerSmartContract",
"value" : {
"contract_address" : "TCFLL5dx5ZJdKnWuesXxi1VPwjLVmWZZy9",
"owner_address" : "TUT5SVvKmnxKpKdHi2tXMzPfffQNg7e3MU",
"data" : "70a08231"
}
}
}
],
"timestamp" : 1605942331560
},
"txID" : "a3bdcb595a94f9805301fb74b33f2b536d3a6bb5050b7eb7b12808bb1e36fcd7",
"visible" : true,
"ret" : [
{}
],
"raw_data_hex" : "0a029c4522080d9745f14e11d7fa40f0d980cdde2e5a6d081f12690a31747970652e676f6f676c65617069732e636f6d2f70726f746f636f6c2e54726967676572536d617274436f6e747261637412340a1541cab799601a50938457902e1a31d3faa26ca1d76012154118fd0626daf3af02389aef3ed87db9c33f638ffa220470a0823170a891fdccde2e"
},
"constant_result" : [
"0000000000000000000000000000000000000000000000000000000000000000"
]
}
I use the default main_net_config with supportConstant = true. Do I have to enable anything else in my config file?
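For reference, here is how I read the constant_result field back (a minimal sketch; the decimals value is an assumption, check your token's decimals()). The string is the ABI-encoded uint256 returned by balanceOf, so the all-zero value above really does decode to a balance of 0:
# constant_result holds the ABI-encoded return value of balanceOf(address) as hex.
constant_result = "0000000000000000000000000000000000000000000000000000000000000000"

raw_balance = int(constant_result, 16)   # uint256 balance in the token's base units
decimals = 18                            # assumption: use the value reported by decimals()
print(raw_balance / 10 ** decimals)      # human-readable token amount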
We are transitioning to the new AWS DocumentDB service. We are currently on Mongo 3.2, where running db.collection.distinct("FIELD_NAME") returns results really quickly. I did a database dump to AWS DocumentDB (Mongo 3.6 compatible) and this simple query just gets stuck.
Here are the .explain() outputs and the indexes on the working instance versus AWS DocumentDB:
Explain function on working instance:
> db.collection.explain().distinct("FIELD_NAME")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"FIELD_NAME" : 1
},
"inputStage" : {
"stage" : "DISTINCT_SCAN",
"keyPattern" : {
"FIELD_NAME" : 1
},
"indexName" : "FIELD_INDEX_NAME",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"FIELD_NAME" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
Explain on AWS DocumentDB (not working):
rs0:PRIMARY> db.collection.explain().distinct("FIELD_NAME")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.collection",
"winningPlan" : {
"stage" : "AGGREGATE",
"inputStage" : {
"stage" : "HASH_AGGREGATE",
"inputStage" : {
"stage" : "COLLSCAN"
}
}
}
},
}
Index on both of these instances:
{
"v" : 1,
"key" : {
"FIELD_NAME" : 1
},
"name" : "FIELD_INDEX_NAME",
"ns" : "db.collection"
}
Also, this database has a couple of million documents, but there are only about 20 distinct values for that "FIELD_NAME". Any help would be appreciated.
I tried it with .hint("index_name") and that didn't work. I tried clearing the plan cache, but I get Feature not supported: planCacheClear.
COLLSCAN and IXSCAN don't differ much in this case; both need to scan all the documents or index entries.
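If you want to compare the two forms yourself, here is a rough sketch using pymongo (the connection string and the db/collection/field names are the placeholders from your question): run distinct() next to the equivalent $group aggregation and see whether either behaves better on your DocumentDB cluster.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
coll = client["db"]["collection"]                   # placeholder db/collection names

# distinct() as in the question.
values = coll.distinct("FIELD_NAME")

# The equivalent aggregation: group on the field to collect the unique values.
pipeline = [{"$group": {"_id": "$FIELD_NAME"}}]
values_via_group = [doc["_id"] for doc in coll.aggregate(pipeline)]

print(len(values), len(values_via_group))           # both should be around 20 per the question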
// Sample data
{
"_id" : "CUST1234",
"Phone Number" : "9585290750",
"First Name" : "jeff",
"Last Name" : "ayan",
"Email ID" : "",
"createddate" : 1462559400000.0,
"services" : [
{
"type" : "Enquiry",
"timeSpent" : "0:00",
"trxID" : "TRXE20160881",
"CustomerQuery" : "Enquiry about travell agent numbers in basaveshwara nagara",
"ServiceProvided" : "provided info through whatsapp",
"Category" : "Tours/Travels",
"callTime" : "2016-05-06T18:30:00.000Z",
"ActualAmount" : 0,
"FinalAmount" : 0,
"DiscountRuppes" : 0,
"DiscountPerctange" : 0
},
{
"type" : "Enquiry",
"timeSpent" : "0:00",
"trxID" : "TRXE20160882",
"CustomerQuery" : "Enquiry about Electric bill payment of house",
"ServiceProvided" : "Service provided",
"Category" : "Utility Services",
"callTime" : "2016-05-10T18:30:00.000Z",
"ActualAmount" : 0,
"FinalAmount" : 0,
"DiscountRuppes" : 0,
"DiscountPerctange" : 0
},
{
"type" : "Enquiry",
"timeSpent" : "0:00",
"trxID" : "TRXE20160883",
"CustomerQuery" : "Enquiry about KPSC office number",
"ServiceProvided" : "provided info through whatsapp",
"Category" : "Govt Offices/Enquiries",
"callTime" : "2016-05-13T18:30:00.000Z",
"ActualAmount" : 0,
"FinalAmount" : 0,
"DiscountRuppes" : 0,
"DiscountPerctange" : 0
},
{
"type" : "Enquiry",
"timeSpent" : "0:00",
"trxID" : "TRXE20160884",
"CustomerQuery" : "Enquiry about Sagara appolo hospital contact number",
"ServiceProvided" : "provided the information through call",
"Category" : "Hospitals/Equipments",
"callTime" : "2016-05-14T18:30:00.000Z",
"ActualAmount" : 0,
"FinalAmount" : 0,
"DiscountRuppes" : 0,
"DiscountPerctange" : 0
},
]
}
Expected output: all data from the "services" field that matches a particular string typed into the search box.
db.collection.aggregate([
{
$match: {
"Phone Number": "9585290750",
"services": { $regex: "/^t/", $options: "s i" }
}
},
{
$project: {
"Services": "services"
}
}
]);
I am facing an issue with the regex portion of the query above; services is an array field. Please help me filter the data.
Since I am new to MongoDB, it took me a day to find a proper solution to my task. Here is the solution to my issue. If you have a better query than this, please post it or modify mine.
db.collections.aggregate([
{"$match":{"Corporate_ID":"id"}},
{"$unwind":"$services"},
{"$match":{"$or":[
{"services.type":{$regex:'TRXF2016088142',"$options": "i"}},
{"services.timeSpent":{$regex:'TRXF2016088142',"$options": "i"}},
{"services.trxID":{$regex:'TRXF2016088142',"$options": "i"}},
{"services.CustomerQuery":{$regex:'F',"$options": "i"}},
{"services.ServiceProvided":{$regex:'F',"$options": "i"}},
{"services.Category":{$regex:'F',"$options": "i"}},
{"services.callTime":{$regex:'TRXF2016088142',"$options": "i"}},
{"services.ActualAmount":{$regex:'TRXF2016088142',"$options": "i"}},
{"services.FinalAmount":{$regex:'TRXF2016088142',"$options": "i"}},
{"services.DiscountRuppes":{$regex:'TRXF2016088142',"$options": "i"}},
{"services.DiscountPerctange":{$regex:'TRXF2016088142',"$options": "i"}}
]}},
{"$unwind":"$services"},
{"$project":{
"service":"$services"}
}
])
This is because you are passing the string representation of a JavaScript regular expression object to $regex. Change your regex to one of the following:
"service": { "$regex": /^t/, "$options": "si" }
or
"service": { "$regex": "^t", "$options": "si" }
I've got a query that's using a regex anchor and it seems to be slower when running an index scan rather than a collection scan.
A bit of background to the question:
I have an MSSQL database with approximately 2.8 million rows in a table. We were running the following query against the table, returning approximately 2.6 million results in 23 seconds:
select * from table where column like 'IL%'
Out of curiosity, I decided to see whether MongoDB could perform this any faster than my MSSQL database, so on a new test server I created a MongoDB database and filled one collection (test1) with just under 3 million objects. Here's the basic structure of a document in the collection:
> db.test1.findOne()
{
"_id" : 2,
"Other_REV" : "NULL",
"Holidex_Code" : "W8BP0",
"Segment_Name" : "NULL",
"Source" : "Forecast",
"Date_" : ISODate("2009-11-12T11:14:00Z"),
"Rooms_Sold" : 3,
"FB_REV" : "NULL",
"Rate_Code" : "ILM87",
"Export_Date" : ISODate("2010-12-12T11:14:00Z"),
"Rooms_Rev" : 51
}
All of my records have Rate_Code prefixed with IL. I ran the following query against the database, which took just over 3 seconds:
> db.test1.find({'Rate_Code':{$regex: /^IL/}}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2999999,
"nscannedObjects" : 2999999,
"nscanned" : 2999999,
"nscannedObjectsAllPlans" : 2999999,
"nscannedAllPlans" : 2999999,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3398,
"indexBounds" : {
},
"server" : "MONGODB:27017"
}
Out of curiosity I created an index to see if I could speed up the retrieval at all:
> db.test1.ensureIndex({'Rate_Code':1})
However, this actually appears to slow the query down to approximately 6 seconds on average:
> db.test1.find({'Rate_Code':{$regex: /^IL/}}).explain()
{
"cursor" : "BtreeCursor Rate_Code_1",
"isMultiKey" : false,
"n" : 2999999,
"nscannedObjects" : 2999999,
"nscanned" : 2999999,
"nscannedObjectsAllPlans" : 2999999,
"nscannedAllPlans" : 2999999,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 5895,
"indexBounds" : {
"Rate_Code" : [
[
"IL",
"IM"
]
]
},
"server" : "MONGODB:27017"
}
The OS has 2GB of memory and appears to hold both indexes quite comfortably in memory, with no disk usage recorded when the query is run:
> db.test1.stats()
{
"ns" : "purify.test1",
"count" : 2999999,
"size" : 623999808,
"avgObjSize" : 208.0000053333351,
"storageSize" : 790593536,
"numExtents" : 18,
"nindexes" : 2,
"lastExtentSize" : 207732736,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 153218240,
"indexSizes" : {
"_id_" : 83722240,
"Rate_Code_1" : 69496000
},
"ok" : 1
}
I'm thinking the slowdown is due to MongoDB performing a full scan of the index followed by a full collection scan, as it can't be sure that all my matches are in the index, but I'm not entirely sure if this is the case. Is there any way this could be improved for better performance?
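For what it's worth, if I only need the Rate_Code values, one thing I plan to try (a sketch in pymongo; the connection details are placeholders) is to express the prefix as the explicit range the index bounds above already show ("IL" up to "IM") and to project only the indexed field, so the query could in principle be answered from the index alone rather than fetching every document:
from pymongo import MongoClient

client = MongoClient("mongodb://MONGODB:27017")   # placeholder connection string
coll = client["purify"]["test1"]

# Case-sensitive prefix regex and the equivalent explicit range from the
# index bounds shown in the explain output above.
prefix_query = {"Rate_Code": {"$regex": "^IL"}}
range_query = {"Rate_Code": {"$gte": "IL", "$lt": "IM"}}

# Returning only the indexed field (and dropping _id) allows a covered query.
projection = {"Rate_Code": 1, "_id": 0}

print(coll.find(prefix_query, projection).explain())
print(coll.find(range_query, projection).explain())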
Thanks for any help.
I'm using django-haystack and ElasticSearch to index Stores.
Until now, each store had a single lat,long coordinate pair; we had to change this to reflect the fact that one store can deliver products to very different (disjoint) regions, so I've added up to ten locations (lat,long pairs) to each store.
With one location field everything worked fine and I got the right results. Now, with multiple location fields, I can't get any results, not even the previous ones, for the same user and store coordinates.
My index is as follows:
class StoreIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True,
                             template_name='search/indexes/store/store_text.txt')
    location0 = indexes.LocationField()
    location1 = indexes.LocationField()
    location2 = indexes.LocationField()
    location3 = indexes.LocationField()
    location4 = indexes.LocationField()
    location5 = indexes.LocationField()
    location6 = indexes.LocationField()
    location7 = indexes.LocationField()
    location8 = indexes.LocationField()
    location9 = indexes.LocationField()

    def get_model(self):
        return Store

    def prepare_location0(self, obj):
        # If you're just storing the floats...
        return "%s,%s" % (obj.latitude, obj.longitude)

    # ..... up to prepare_location9

    def prepare_location9(self, obj):
        # If you're just storing the floats...
        return "%s,%s" % (obj.latitude_9, obj.longitude_9)
Is this the correct way to build my index?
From Elasticsearch I get this mapping information:
curl -XGET http://localhost:9200/stores/_mapping?pretty=True
{
"stores" : {
"modelresult" : {
"properties" : {
"django_id" : {
"type" : "string"
},
"location0" : {
"type" : "geo_point",
"store" : "yes"
},
"location1" : {
"type" : "geo_point",
"store" : "yes"
},
"location2" : {
"type" : "geo_point",
"store" : "yes"
},
"location3" : {
"type" : "geo_point",
"store" : "yes"
},
"location4" : {
"type" : "geo_point",
"store" : "yes"
},
"location5" : {
"type" : "geo_point",
"store" : "yes"
},
"location6" : {
"type" : "geo_point",
"store" : "yes"
},
"location7" : {
"type" : "geo_point",
"store" : "yes"
},
"location8" : {
"type" : "geo_point",
"store" : "yes"
},
"location9" : {
"type" : "geo_point",
"store" : "yes"
},
"text" : {
"type" : "string",
"analyzer" : "snowball",
"store" : "yes",
"term_vector" : "with_positions_offsets"
}
}
}
}
}
Then, I try to query this way:
sqs0 = SearchQuerySet().dwithin('location0', usuario, max_dist).distance('location0',usuario).using('stores')
where:
usuario is a Point instance representing the user trying to find stores near his position and
max_dist is a D instance.
If I query directly using curl, I get no results either.
Here is the result of querying with curl with multiple location fields:
$ curl -XGET http://localhost:9200/stores/modelresult/_search?pretty=true -d '{ "query" : { "match_all": {} }, "filter" : {"geo_distance" : { "distance" : "6km", "location0" : { "lat" : -23.5, "lon" : -46.6 } } } } '
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
If I comment out the fields location1-9 in the StoreIndex class, everything works fine, but if I leave them in (to get multiple location points), I get no results for the same query (user position). This happens both in Django and directly via curl: with only one location (say location0), both queries return correct results; with more locations (location0-9), neither returns any results.
Here are the results of querying directly with curl with only one location field:
$ curl -XGET http://localhost:9200/stores/modelresult/_search?pretty=true -d '{ "query" : { "match_all": {} }, "filter" : {"geo_distance" : { "distance" : "6km", "location0" : { "lat" : -23.5, "lon" : -46.6 } } } } '
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 9,
"max_score" : 1.0,
"hits" : [ {
"_index" : "stores",
"_type" : "modelresult",
"_id" : "store.store.110",
"_score" : 1.0, "_source" : {"django_ct": "store.store", "text": "RESULT OF THE SEARCH \n\n", "django_id": "110", "id": "store.store.110", "location0": "-23.4487554,-46.58912"}
},
lots of results here
]
}
}
Of course, I run rebuild_index after any change to StoreIndex.
Any help on how to get multiple location fields working with Elasticsearch and Django?
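For reference, here is a small sketch of how I can fetch one indexed document straight from Elasticsearch (the host and the id store.store.110 are taken from the output above) to check what location1 through location9 actually contain; if a prepare_locationN method returns something like "None,None" for a store with fewer than ten locations, that document may fail to index as a geo_point at all, which could explain the empty hits:
import json
import requests

ES = "http://localhost:9200"   # host used in the curl examples above

# Fetch one indexed store document directly, bypassing haystack.
doc = requests.get(f"{ES}/stores/modelresult/store.store.110").json()
source = doc.get("_source", {})

for i in range(10):
    # Print whatever ended up in each location field; missing or malformed
    # values are worth ruling out before blaming the geo_distance filter.
    print(f"location{i}:", json.dumps(source.get(f"location{i}")))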
PS: I've cross-posted this question to the Django-Haystack and ElasticSearch Google Groups.
https://groups.google.com/d/topic/elasticsearch/85fg7vdCBBU/discussion
https://groups.google.com/d/topic/django-haystack/m2A3_SF8-ls/discussion
Thanks in advance
Mário