how to set the contextPath of jetty on xsbt-web-plugin? - jetty

I'm on sbt 0.11.1 and xsbt-web-plugin 0.2.10
here goes the build.sbt and plugins.sbt
build.sbt
organization := "org"
name := "demo"
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.9.1"
seq(webSettings :_*)
configurationXml :=
<configuration>
<webApp>
<contextPath>/foo</contextPath>
</webApp>
</configuration>
libraryDependencies ++= Seq(
"org.eclipse.jetty" % "jetty-webapp" % "7.4.5.v20110725" % "container",
"javax.servlet" % "servlet-api" % "2.5" % "provided"
)
resolvers += "Sonatype OSS Snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/"
project/plugins.sbt
libraryDependencies <+= sbtVersion(v => "com.github.siasia" %% "xsbt-web-plugin" % (v+"-0.2.10"))
It seems the configurationXml doesn't work, after running container:start in sbt console, the contextPath gets the default value "/"
how can I change the contextPath? any tips? thanks in advance!

Here's a solution from scalatra-user group
Add jetty-plus to dependencies:
"org.eclipse.jetty" % "jetty-plus" % "7.4.5.v20110725" % "container"
Add this to build.sbt:
env in Compile := Some(file(".") / "jetty-env.xml" asFile)
In the same directory as build.sbt, create the jetty-env.xml:
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
<Set name="contextPath">/foo</Set>
</Configure>

Related

hadoop-aws and aws-java-sdk version compatibility for Spark 3.1.2

I ran into version compatibility issues updating Spark project utilising both hadoop-aws and aws-java-sdk-s3 to Spark 3.1.2 with Scala 2.12.15 in order to run on EMR 6.5.0.
I checked EMR release notes stating these versions:
AWS SDK for Java v1.12.31
Spark v3.1.2
Hadoop v3.2.1
I am currently running spark locally to ensure compatibility of above versions and get the following error:
java.lang.NoSuchFieldError: SERVICE_ID
at com.amazonaws.services.s3.AmazonS3Client.createRequest(AmazonS3Client.java:4925)
at com.amazonaws.services.s3.AmazonS3Client.createRequest(AmazonS3Client.java:4911)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1441)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1381)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:381)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:236)
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:380)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:314)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
I also tried checking version of aws-java-sdk hadoop-aws is based on. Hadoop-aws 3.2.1 relies on aws-java-sdk 1.11.375 as it can be found here
However these versions result in a different error:
'org.apache.http.client.methods.HttpRequestBase com.amazonaws.http.HttpResponse.getHttpRequest()'
at com.amazonaws.services.s3.internal.S3ObjectResponseHandler.handle(S3ObjectResponseHandler.java:57)
at com.amazonaws.services.s3.internal.S3ObjectResponseHandler.handle(S3ObjectResponseHandler.java:29)
at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1555)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1416)
at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$reopen$0(S3AInputStream.java:196)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:195)
at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$lazySeek$1(S3AInputStream.java:346)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$2(Invoker.java:195)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:193)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:215)
at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:339)
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:451)
at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
build.sbt:
scalaVersion := "2.12.15"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.1.2",
"org.apache.spark" %% "spark-sql" % "3.1.2",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.12.2",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.12.2",
"org.apache.hadoop" % "hadoop-client" % "3.2.1",
"org.apache.hadoop" % "hadoop-aws" % "3.2.1",
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.375"
)
What should be correct versions for these libraries?
the EMR docs says "use our own s3: connector"...if you are running on EMR do exactly that.
you should use the s3a one on other installations, including local ones
And there
mvnrepository a good way to get a view of what dependencies are
* here is its summary for hadoop-aws though its 3.2.1 declaration misses out all the dependencies. it is 1.11.375
the stack traces you are seeing are from trying to get the aws s3 sdk, core sdk, jackson and httpclient in sync.
it's easiest to give up and just go with the full aws-java-sdk-bundle, which has a consistent set of aws artifacts and private versions of the dependencies. It is huge -but takes away all issues related to transitive dependencies
Turns out adding dependency to aws-java-sdk-core explicitely solved my problem, as mentioned here. That way I can avoid heavy aws sdk bundle.
build.sbt:
scalaVersion := "2.12.15"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.1.2",
"org.apache.spark" %% "spark-sql" % "3.1.2",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.12.2",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.12.2",
"org.apache.hadoop" % "hadoop-client" % "3.2.1",
"org.apache.hadoop" % "hadoop-aws" % "3.2.1",
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.375",
"com.amazonaws" % "aws-java-sdk-core" % "1.11.375"
)

Scala lambda only failing in AWS

Im writing my first scala lambda, and locally everything connects and works fine. However, when I try to test my lambda in AWS, I get the following error.
{
"errorMessage": "Error loading class FooBar.Main: scala/collection/Seq",
"errorType": "java.lang.NoClassDefFoundError"
}
From my googling, its seems this is cause I needed to add the scala library to my dependencies, which I did.
name := "FooBar"
version := "0.1"
scalaVersion := "2.12.12"
javacOptions ++= Seq("-source", "1.8", "-target", "1.8", "-Xlint")
lazy val root = (project in file(".")).
settings(
name := "FooBar",
version := "1.0",
scalaVersion := "2.12.12",
retrieveManaged := true
)
libraryDependencies += "software.amazon.awssdk" % "ec2" % "2.5.60"
libraryDependencies += "com.amazonaws" % "aws-lambda-java-core" % "1.2.0"
libraryDependencies += "com.amazonaws" % "aws-lambda-java-events" % "2.1.0"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-dynamodb" % "1.11.313"
libraryDependencies += "org.scalikejdbc" %% "scalikejdbc" % "3.4.0"
libraryDependencies += "org.apache.phoenix" % "phoenix-core" % "4.14.3-HBase-1.4"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.4.10"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.4.10"
libraryDependencies += "io.spray" %% "spray-json" % "1.3.2"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % "test"
libraryDependencies += "org.scala-lang" % "scala-library" % "2.12.12"
assemblyShadeRules in assembly := Seq(
ShadeRule.keep("x.**").inAll,
ShadeRule.keep("FooBar.**").inProject
)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs#_*) => MergeStrategy.discard
case x => MergeStrategy.first
}
again, everything works fine locally, just can never execute on AWS. Anyone have an idea?
The sbt-assembly plugins shade rule ShadeRule.keep documentation states
The ShadeRule.keep rule marks all matched classes as "roots". If any
keep rules are defined all classes which are not reachable from the
roots via dependency analysis are discarded when writing the output
jar.
https://github.com/sbt/sbt-assembly#shading
So in this case all the classes of the class path x.* and FooBar.* are retained while creating the fat jar. Rest all other classes including the classes in scala-library are discarded.
To fix this remove all the ShadeRule.keep rules and instead try ShadeRule.zap to selectively discard classes not required.
For example, the following shade rules removes all the HDFS classes from the far jar:
assemblyShadeRules in assembly := Seq(
ShadeRule.zap("org.apache.hadoop.hdfs.**").inAll
)
PS: AWS Lambda had a hard limit of 256MB of code size after unzipping the fat jar.

Scala AWS size limit

I have been trying to use the shadeRules in SBT to bring my jar file for my scala lambda down to size. Currently, AWS requires the jar to be no bigger than 50megs.
The problem really comes into play because I am trying to access a phoenix database, a LOT of classes come along for the ride, and I am constantly hitting up against the 50meg size limit, and then hunting for files to delete (zap).
I feel there has to be an more automated process to do this. Am i just missing something? right now I update my jar, upload to AWS, get the error for which files I am missing and add them (almost all of my shadeRules are keeps, and then zap to delete unneeded files inside those libraries). This is a slow, long, boring process.
Thanks
Thanks
EDIT:
As asked, here are my added libraries:
libraryDependencies += "software.amazon.awssdk" % "ec2" % "2.5.60"
libraryDependencies += "com.amazonaws" % "aws-lambda-java-core" % "1.2.0"
libraryDependencies += "com.amazonaws" % "aws-lambda-java-events" % "2.1.0"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-dynamodb" % "1.11.313"
libraryDependencies += "org.scalikejdbc" %% "scalikejdbc" % "3.4.0"
libraryDependencies += "org.apache.phoenix" % "phoenix-core" % "4.14.3-HBase-1.4"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.4.10"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.4.10"
libraryDependencies += "io.spray" %% "spray-json" % "1.3.2"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % "test"
libraryDependencies += "org.scala-lang" % "scala-library" % "2.12.12"
Try uploading it via S3, real limit is 250 MB uncompressed.
Create S3 deployment bucket.
sam package \
--profile ${PROFILE} \
--region ${REGION} \
--template-file template.yaml \
--s3-bucket ${S3_BUCKET} \
--output-template-file ./build/package.yaml
sam deploy \
--profile ${PROFILE} \
--region ${REGION} \
--template-file ./build/package.yaml \
--stack-name ${APPLICATION}-lambda \
--capabilities CAPABILITY_NAMED_IAM
Note: Try to minimise pacakge size it reflects during cold start.
Verify one more time your package if it has some extra dependencies included.
If you list you entire dependency tree ,we could give you better hints.
sbt "inspect tree clean"

Unresolved dependencies in Akka http

I keep getting unresolved dependencies with the code below. Any clue what I can do to clear the error?
name := "AkkaDemo"
version := "1.0"
scalaVersion := "2.11.8"
val scalaTestVersion = "3.0.1"
resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
lazy val akkademoService = project.settings (libraryDependencies ++= Seq(
"mysql" % "mysql-connector-java" % "5.1.25",
"com.typesafe.slick" %% "slick"% "3.1.0",
"com.typesafe.slick" %% "slick-hikaricp" % "3.1.0",
"com.typesafe.akka" %% "akka-actor" % "2.4.16",
"com.typesafe.akka" %% "akka-http" % "10.0.1",
"com.typesafe.akka" % "akka-slf4j" % "2.3.14"
)).
dependsOn(instanceConfig)
lazy val instanceConfig = project
lazy val AkkaDemo = project.in(file(".")).aggregate(instanceConfig, akkademoService)
Here is the sbt output for the sbt run:
Error:Error while importing SBT project:<br/>...<br/><pre>[info]
Resolving org.fusesource.jansi#jansi;1.4 ...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: com.typesafe.akka#akka-actor_2.10;2.4.16: not found
[warn] :: com.typesafe.akka#akka-slf4j;2.3.14: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
This was solved by adding build.sbt with the scalaversion to use to each module. Apparently each module, without a build.sbt specifying the version to use, default to 2.10.

Spark Unit Testing

My entire build.sbt is:
name := """sparktest"""
version := "1.0.0-SNAPSHOT"
scalaVersion := "2.11.8"
scalacOptions := Seq("-unchecked", "-deprecation", "-encoding", "utf8", "-Xexperimental")
parallelExecution in Test := false
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.0.2",
"org.apache.spark" %% "spark-sql" % "2.0.2",
"org.apache.avro" % "avro" % "1.8.1",
"org.scalatest" %% "scalatest" % "3.0.1" % "test",
"com.holdenkarau" %% "spark-testing-base" % "2.0.2_0.4.7" % "test"
)
I have a simple test. Obviously, this is just a starting point, I'd like to test more:
package sparktest
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite
class SampleSuite extends FunSuite with DataFrameSuiteBase {
test("simple test") {
assert(1 + 1 === 2)
}
}
I run sbt clean test and get a failure with:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf$ConfVars
For my dev environment, I'm using the spark-2.0.2-bin-hadoop2.7.tar.gz
Do I have to configure this environment in any way? Obviously HiveConf is a transitive Spark dependency
As #daniel-de-paula mentions in the comments you will need to add spark-hive as an explicit dependency (you can restrict this to the test scope though if you aren't using hive in your application its self). spark-hive is not a transitive dependency of spark-core which is why this error happened. spark-hive is excluded from spark-testing-base as a dependency so that people who are doing RDD only tests don't need to add it as a dependency.