goamz/sqs/md5.go:57: undefined: md5.Sum issue

I am trying to use AWS SQS on a http://www.nitrous.io box with golang version go1.1.1 linux/amd64.
When I import the sqs module from this GitHub repository https://github.com/crowdmob/goamz/tree/master/sqs and run my code with
go run myCode.go
I face this issue:
# github.com/crowdmob/goamz/sqs
../src/github.com/crowdmob/goamz/sqs/md5.go:57: undefined: md5.Sum
My call for that module is like this:
import "github.com/crowdmob/goamz/sqs"
And I can use other modules from the same repo, for example the aws and s3 ones:
import "github.com/crowdmob/goamz/aws"
import "github.com/crowdmob/goamz/s3"
Looking at the error in /sqs/md5.go from the goamz repository, I can see the Sum function, and the imports seem to be done correctly:
package sqs
import (
    "crypto/md5"
    "encoding/binary"
    "sort"
)
So I am a bit clueless about what's happening. Any idea?

You're using an old version of Go -- md5.Sum didn't exist in go1.1.1; it was only added to crypto/md5 in Go 1.2.
Update to go1.3 (or any release from Go 1.2 onwards).

Did you run the test files? The problem could come from that, referring to the build state, as chendesheng said.

Related

Using Gatsby, the build fails

I get a new bug when trying to do yarn build:
2 | //# sourceMappingURL=react-table.production.min.js.map
3 |
WebpackError: ReferenceError: regeneratorRuntime is not defined
An image of the error is attached.
This is coming from a component that uses react-table; it's a new bug that "suddenly" happened.
Any tips?
As you can follow in this Gatsby discussion, which ends in this thread from react-table, the issue arises because of the useAsyncDebounce you're using, which internally uses async/await.
You have two solutions:
Importing regenerator-runtime/runtime at the top of your component:
import 'regenerator-runtime/runtime';
import React from 'react';
import {
  useAsyncDebounce,
  useFilters,
  useGlobalFilter,
  useTable,
} from "react-table"
Configuring your browserslist to include the polyfill automatically. In your package.json:
{
  "browserslist": [">0.25%", "not dead", "since 2017-06"]
}
You have to add a version that’s recent enough to support async/await, so Babel does not try to add a polyfill. Check them at https://github.com/browserslist/browserslist

Error when running Pytest with Delta Lake Tables

I am working in the VDI of a company and they use their own artifactory for security reasons.
Currently I am writing unit tests for a function that deletes entries from a Delta table. When I started, I received an error about unresolved dependencies, because my Spark session was configured in a way that it would load jars from Maven. I was able to solve this issue by loading these jars locally from /opt/spark/jars. Now my code looks like this:
import os
import unittest

from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.functions import col


class TestTransformation(unittest.TestCase):
    @classmethod
    def test_ksu_deletion(self):
        self.spark = SparkSession.builder\
            .appName('SPARK_DELETION')\
            .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")\
            .config("spark.jars", "/opt/spark/jars/delta-core_2.12-0.7.0.jar, /opt/spark/jars/hadoop-aws-3.2.0.jar")\
            .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")\
            .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")\
            .getOrCreate()
        os.environ["KSU_DELETION_OBJECT"] = "UNITTEST/"
        deltatable = DeltaTable.forPath(self.spark, "/projects/some/path/snappy.parquet")
        deltatable.delete(col("DATE") < get_current())
However, I am getting the error message:
E py4j.protocol.Py4JJavaError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath.
E : java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException.<init>(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;)V
Do you have any idea what is causing this? I am assuming it has to do with the way I am configuring spark.sql.extensions and/or spark.sql.catalog, but to be honest, I am quite a newb in Spark.
I would greatly appreciate any hint.
Thanks a lot in advance!
Edit:
We are using Spark 3.0.2 (Scala 2.12.10). According to https://docs.delta.io/latest/releases.html, this should be compatible. Apart from the SparkSession, I trimmed down the subsequent code to
df = spark.read.parquet("Path/to/file.snappy.parquet")
and now I am getting the error message
java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DeltaDelete has interface org.apache.spark.sql.catalyst.plans.logical.UnaryNode as super class
As I said, I am quite new to (Py)Spark, so please don't hesitate to mention things you consider completely obvious.
Edit 2: I checked the Python path I am exporting in the Shell before running the code and I can see the following:
Could this cause any problem? I don't understand why I do not get this error when running the code within pipenv (with spark-submit).
It looks like you're using an incompatible version of the Delta Lake library. 0.7.0 was for Spark 3.0, but you're using another version -- either lower or higher. Consult the Delta releases page to find the mapping between Delta versions and the required Spark versions.
If you're using Spark 3.1 or 3.2, consider using the delta-spark Python package that will install all necessary dependencies, so you just import the DeltaTable class.
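For the Spark 3.1/3.2 route, a minimal sketch of that setup, assuming delta-spark has been installed with pip (the table path is just the one from the question):

from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Delta still needs the SQL extension and catalog settings; the
# configure_spark_with_delta_pip helper only adds the matching
# delta-core artifact to spark.jars.packages for you.
builder = (
    SparkSession.builder.appName("SPARK_DELETION")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

deltatable = DeltaTable.forPath(spark, "/projects/some/path/snappy.parquet")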
Update: Yes, this happens because of the conflicting versions -- you need to remove the delta-spark and pyspark Python packages, and install pyspark==3.0.2 explicitly.
P.S. Also, look into the pytest-spark package, which can simplify the specification of the configuration for all tests. You can find examples of it + Delta here.
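If you stay on Spark 3.0.2 with the local delta-core 0.7.0 jar, a tiny sanity check in the test setup (just a sketch) makes the version mismatch fail fast instead of surfacing as a NoSuchMethodError from the JVM:

import pyspark

# delta-core_2.12-0.7.0 was built for Spark 3.0.x; with any other PySpark on
# the Python path, the call into the JVM fails with errors like the
# NoSuchMethodError / IncompatibleClassChangeError shown above.
assert pyspark.__version__.startswith("3.0"), (
    "delta-core 0.7.0 needs PySpark 3.0.x, found " + pyspark.__version__
)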

Pylint errors with @validator decorator from AWS Lambda Powertools Python

I'm getting some Pylint errors while using the Lambda Powertools for Python. If I download the three files from the Validator decorator example code, and run pylint --errors-only validator_decorator.py, I get three errors:
************* Module validator_decorator
validator_decorator.py:5:1: E1120: No value for argument 'handler' in function call (no-value-for-parameter)
validator_decorator.py:5:1: E1120: No value for argument 'event' in function call (no-value-for-parameter)
validator_decorator.py:5:1: E1120: No value for argument 'context' in function call (no-value-for-parameter)
Here is the code from validator_decorator.py:
from aws_lambda_powertools.utilities.validation import validator
import schemas

@validator(inbound_schema=schemas.INPUT, outbound_schema=schemas.OUTPUT)
def handler(event, context):
    return event
Despite the errors, the code works just fine, but I'd like to understand if I'm doing something wrong. I could always add a # pylint: disable=no-value-for-parameter but I'd rather not if there's a better way to handle this. I've tried digging through the source but there's a lot of wrapping going on there, and since I'm at the level of just having read Primer on Python Decorators I could use a hand understanding this. Thanks.
(venv) $ python --version
Python 3.9.10
(venv) $ pylint --version
pylint 2.12.2
astroid 2.9.3
Python 3.9.10 (main, Jan 15 2022, 11:40:53)
[Clang 13.0.0 (clang-1300.0.29.3)]
Looking at the function documentation here, if you click "expand source code", one of the suggested uses is this:
from aws_lambda_powertools.utilities.validation import validate

def handler(event, context):
    validate(event=event, schema=json_schema_dict)
    return event
There are other suggestions, but none where it's used as a decorator; maybe it's not supposed to be used as one.
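If you'd rather keep the decorator, the inline disable the asker already mentions is a workable fallback. A minimal sketch -- attach the comment to whichever line Pylint reports (5:1 in the output above), which may be the def line or the decorator line:

from aws_lambda_powertools.utilities.validation import validator
import schemas

# Pylint's static analysis thinks the decorated call is missing
# handler/event/context; they are supplied at runtime by the decorator machinery.
@validator(inbound_schema=schemas.INPUT, outbound_schema=schemas.OUTPUT)
def handler(event, context):  # pylint: disable=no-value-for-parameter
    return event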

Amazon Deequ (Spark + Scala) - java.lang.NoSuchMethodError: 'scala.Option org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAgg

Spark Version - 3.0.1
Amazon Deequ version - deequ-2.0.0-spark-3.1.jar
I'm running the below code in spark-shell on my local machine:
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import com.amazon.deequ.analyzers.runners.AnalyzerContext.successMetricsAsDataFrame
import com.amazon.deequ.analyzers.{Compliance, Correlation, Size, Completeness, Mean,
  ApproxCountDistinct, Maximum, Minimum, Entropy}
val analysisResult: AnalyzerContext = {
  AnalysisRunner
    .onData(datasourcedf)
    .addAnalyzer(Size())
    .addAnalyzer(Completeness("customerNumber"))
    .addAnalyzer(ApproxCountDistinct("customerNumber"))
    .addAnalyzer(Minimum("creditLimit"))
    .addAnalyzer(Mean("creditLimit"))
    .addAnalyzer(Maximum("creditLimit"))
    .addAnalyzer(Entropy("creditLimit"))
    .run()
}
ERROR:
java.lang.NoSuchMethodError: 'scala.Option
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression$default$2()'
at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
... 63 elided
Can someone please let me know how to resolve this issue?
You can't use Deequ version 2.0.0 with Spark 3.0 because it's binary incompatible due to changes in Spark's internals. With Spark 3.0 you need to use version 1.2.2-spark-3.0.

arcpy in Django does not run

I have an ArcGIS script using the arcpy module, and I want to combine it with Django. The script runs successfully in the console; however, when running with Django, I find the arcpy functions do not run. So I made a simple test for it and got the same result.
test.py
import arcpy
import os

def test_arcpy():
    tempFolder = arcpy.env.scratchFolder
    tempGDBPath = os.path.join(tempFolder, 'test.gdb')
    arcpy.env.overwriteOutput = True
    if not arcpy.Exists(tempGDBPath):
        arcpy.AddMessage('create..')
        arcpy.CreateFileGDB_management(tempFolder, 'test.gdb')
    return arcpy.Exists(tempGDBPath)
views.py
from django.http import HttpResponse
from . import test

def t(request):
    msg = str(test.test_arcpy())
    return HttpResponse(msg)
If I run test.py in the console, it returns True. But if I run it in Django, it always returns False. If I cannot solve this, I cannot write more complex scripts in Django. Can you help me?
I found a similar question, Flask app with ArcGIS, Arcpy does not run, but there is no solution in that question.
I think I can use the subprocess module to run the arcpy script in a console and then return the console output back to the Django function.
But I really want to know whether arcpy can run with Django or not.
I met the same problem as you, but in Flask.
I checked the arcpy source code and found there is a bug in the arcgisscripting.pyd file when arcpy.Exists() is triggered from Flask, and I had no further clue. After searching for similar cases on the ESRI online blog, I found that it is indeed a bug in arcpy. So I advise you to:
try a higher version of arcpy (my current version is 10.1)
try to run your code in a console
try to run a message queue, e.g. from multiprocessing import Queue, Process
try not to use the function -- in my case, I avoid using arcpy.Exists()
try to use 64-bit arcpy in ArcServer.
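As a stop-gap in the spirit of the asker's subprocess idea and of point 2 above, here is a minimal, hypothetical sketch of a Django view that shells out to the ArcGIS Python interpreter and returns whatever the script prints; the interpreter path and script path are assumptions you would have to adapt:

import subprocess

from django.http import HttpResponse

# Hypothetical paths: adjust to your ArcGIS installation and project layout.
ARCGIS_PYTHON = r"C:\Python27\ArcGIS10.1\python.exe"
ARCPY_SCRIPT = r"C:\path\to\test.py"


def t(request):
    # Run the arcpy script in its own interpreter process (as it would run in
    # a console) and hand its printed output back to Django, instead of
    # importing arcpy inside the Django worker process.
    output = subprocess.check_output([ARCGIS_PYTHON, ARCPY_SCRIPT])
    return HttpResponse(output)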