Error: Could not find or load main class org.apache.cassandra.stress.Stress - elassandra

How can I run cassandra-stress on an Elassandra cluster?
cassandra-stress is not on the Elassandra file system.

The stress tool is located in $CASSANDRA_HOME/tools/bin.
Prior to v6.2.3.11, there was a packaging issue with the stress tool. If you use an older version, you have to download it from the official Cassandra distribution.
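As a rough sketch of what running the tool looks like once it is present: the snippet below assumes CASSANDRA_HOME is set on the node, and the node address (127.0.0.1) and row count are placeholders rather than values from the question.

```shell
# Path given in the answer; CASSANDRA_HOME must point at the Elassandra install.
STRESS="${CASSANDRA_HOME:-}/tools/bin/cassandra-stress"

if [ -x "$STRESS" ]; then
  # Write 100000 rows against one node (address and count are placeholders).
  "$STRESS" write n=100000 -node 127.0.0.1
else
  # Older packages may lack the tool because of the packaging issue.
  echo "cassandra-stress not found at $STRESS" >&2
fi
```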

Problems Integrating Hadoop 3.x on Flink cluster

I am facing some issues while trying to integrate Hadoop 3.x on a Flink cluster. My goal is to use HDFS as persistent storage and store checkpoints there. I am currently using Flink 1.13.1 and HDFS 3.3.1. The error I get when submitting a job is that HDFS is not supported as a file system. In the standalone version, this error was solved by specifying HADOOP_CLASSPATH on my local machine. As a next step, I applied the same solution on all the machines in my cluster, and in the standalone version I managed to submit my jobs on all of them without any issues. However, when I started modifying the configuration to set up my cluster (by specifying the IPs of my machines), the problem came up once again. What am I missing?
For Hadoop 2.x, the official Flink download page offers pre-bundled jar files that solved similar issues in the past, but that is not the case for Hadoop 3.x versions.
It should be enough to set HADOOP_CLASSPATH on every machine in the cluster.
For anyone still struggling with a similar issue, the answer proposed by David worked for me in the end. The detail that I was missing was in the definition of the environment variables.
In my initial attempts, I was using the .bashrc script to permanently define my environment variables. This works in the standalone setup but not in a distributed cluster, due to the scope of the script. What actually worked for me was defining my variables (including $HADOOP_CLASSPATH) in /etc/profile.
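For illustration, the /etc/profile entries could look roughly like the following. This is a sketch under assumptions: the Hadoop install path is a placeholder that does not come from the post, and the `hadoop classpath` command is used to generate the value, which is the approach the Flink documentation describes.

```shell
# Lines for /etc/profile on every node (JobManager and TaskManagers).
# /opt/hadoop-3.3.1 is a placeholder install path, not taken from the post.
export HADOOP_HOME=/opt/hadoop-3.3.1
export PATH="$PATH:$HADOOP_HOME/bin"

# "hadoop classpath" prints the jar and config paths that Flink needs.
if command -v hadoop >/dev/null 2>&1; then
  export HADOOP_CLASSPATH="$(hadoop classpath)"
fi
```

After editing the file, the Flink processes should be restarted from a fresh login so that they pick up the variable.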
I also found another solution while I was struggling with HADOOP_CLASSPATH. As I mentioned in my initial post, for Hadoop 2.x there are pre-bundled jar files on the official Flink download page to support HDFS integration, which is not the case for Hadoop 3.x. I found the following Maven repository page, and after testing all of the existing jars I found one that worked in my case. To be more precise, for Hadoop 3.3.1 the 3.1.1.7.2.8.0-224-9.0 jar (placed in $FLINK_HOME/lib) worked. While it is not an "official solution", it seems to solve the issue.
https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-3-uber?repo=cloudera-repos

Unable to import module 'lambda_function': No module named '_awscrt'

I'm working with this article Asynchronous Amazon Transcribe Streaming SDK for Python.
I'm trying to create a lambda layer for the required libraries.
I used the following command:
pip3 install amazon-transcribe aiofile -t .
But I get the following error when I use the layer in my lambda function:
Unable to import module 'lambda_function': No module named '_awscrt'
The same works fine in a virtual environment locally. I'm not sure what the exact issue is.
I even tried installing awscrt separately but it didn't work.
Any kind of help will be greatly appreciated. Thanks!
Lambda layer .zip files need to follow a specific directory structure. Look at this section of the documentation to see how it should be structured for Python. This might be your problem.
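For Python, the layer archive is expected to contain a top-level python/ directory. A minimal sketch of packaging the libraries that way follows. The helper name build_layer and the output file name are hypothetical, and it should be run on a Linux host compatible with the Lambda runtime so that compiled modules such as _awscrt match the target platform.

```shell
# build_layer OUT_ZIP PACKAGE...
# Hypothetical helper: installs the packages under python/ and zips that
# folder, so the archive root contains python/<package files>.
build_layer() {
  out_zip=$1
  shift
  work_dir=$(mktemp -d) || return 1
  mkdir -p "$work_dir/python" &&
    pip3 install "$@" -t "$work_dir/python" &&
    (cd "$work_dir" && zip -r layer.zip python) &&
    mv "$work_dir/layer.zip" "$out_zip"
  status=$?
  rm -rf "$work_dir"   # clean up the staging directory either way
  return "$status"
}
```

Usage would be along the lines of `build_layer layer.zip amazon-transcribe aiofile`, after which layer.zip can be uploaded as the layer.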
I built the layer on Amazon Linux and it worked fine!
The troubleshooting guide in the repo helped:
The caio linux implementation works normal for modern linux kernel versions and file systems. So you may have problems specific for your environment. It's not a bug and might be resolved some ways:
1. Upgrade the kernel
2. Use compatible file system
3. Use threads based or pure python implementation.

Node.JS native addons on LINUX [duplicate]

I'm using AWS Lambda, which involves creating an archive of my node.js script, including the node_modules folder and uploading that to their infrastructure to run.
This works fine, except when it comes to node modules with native bindings (using node-gyp). Because the binding was compiled and the project archived on my local computer (OS X), it is not compatible with AWS's (Amazon Linux) servers.
How can I cross-compile/install a node module (specifically, node-sqlite3) so that it runs when I upload it to a server with a different architecture?
While not really a solution to your problem, a very easy workaround could be to simply compile the native addons on a Linux machine.
For your particular situation, I would use Vagrant. Vagrant can create virtual machines and configure them within seconds.
1. Find an OS image that resembles Amazon's Linux distro (Fedora, CentOS, or others that use yum as the package manager; see Wiki).
2. Use a simple configuration script that, when run by Vagrant on machine startup, runs npm install (optionally, it can also remove the node_modules folder first to ensure a clean installation).
3. For extra comfort, the script can also create the zip file for deployment.
4. Once the installation finishes, the script shuts down the VM to avoid unnecessary consumption of system resources.
5. Deploy!
It might require some tuning if the linked libraries are not at the same place on the target machine but generally this seems to me like the best and quickest solution.
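The configuration script described in this answer could be sketched as below. The helper name build_bundle and the archive name lambda.zip are hypothetical, and the script is assumed to run inside the Linux VM with npm and zip already installed.

```shell
# build_bundle APP_DIR
# Hypothetical provisioning helper, meant to run inside the Linux VM.
build_bundle() {
  (
    cd "$1" || exit 1
    # Start from a clean installation and drop any previous archive.
    rm -rf node_modules lambda.zip
    # Native addons get compiled for Linux during this step.
    npm install --production || exit 1
    # Package the project, including node_modules, for deployment.
    zip -r lambda.zip .
  )
}
```

With Vagrant's default synced folder, the script body could be `build_bundle /vagrant && sudo shutdown -h now`, which also covers the shutdown step.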
While installing the app using Vagrant might be sufficient in some cases, I have found it necessary to build the app on a Linux system that is as close as possible to Lambda's Amazon Linux AMI.
You can read the original answer here: https://stackoverflow.com/a/34019739/303184
Steps to make it work:
1. Spawn a new EC2 instance. Make sure it is based on exactly the same image as your AWS Lambda runtime. You can review the Lambda environment details here: http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html. In our case, it was the Amazon Linux AMI called amzn-ami-hvm-2015.03.0.x86_64-gp2.
2. Install nvm and use it to install the same version of Node.js as on AWS Lambda. At the time of writing, it was v0.10.36. You can refer to http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html again to find out.
3. You will probably need to install git and the g++ compiler on the EC2 instance. You can do this by running:
sudo yum install git gcc-c++
4. Finally, clone your app to your new EC2 instance and install your app's dependencies:
nvm use 0.10.36
npm install --production
5. You can then easily download the node_modules folder using scp or similar.
Along the same lines as Robert's answer: when I have to work in a different OS on my Mac, I use virtualization software such as Oracle's free VirtualBox to run Linux on it, at no cost to me. Alternatively, sign up for a new AWS account; you get a micro instance free for a year. Use that as your Linux box and do whatever you need there.
AWS has a page describing how to deal with native NPM modules: https://aws.amazon.com/blogs/compute/nodejs-packages-in-lambda/

Does stopping a Google Cloud instance lose the programs installed on it?

I recently created a GPU instance on Google Cloud, then installed Anaconda and all the required dependencies before I stopped the instance. Now that I have started the instance again, Anaconda is no longer installed on it. I find this very strange. Please let me know if you know anything about this. I also looked through the Google documentation and did not find anything saying it should behave like this.
https://cloud.google.com/compute/docs/instances/stopping-or-deleting-an-instance
No, this should not happen if the programs were installed properly on the persistent/boot disk file system.
If the programs were installed in tmpfs or another memory-backed file system, the memory contents would be lost after the instance is rebooted, and with them the data and any links to it.
However, this is never done, as VM instance packages are installed on the persistent disk.
I guess your installation failed for some reason. Check if the packages are still installed. If you are using a Red Hat Linux variant, you can use 'yum list installed' to see all installed packages, or 'yum list installed | grep -i <package-to-search-for>' to filter for a particular package.
If the package shows up, then the issue could be related to a misconfiguration or other problem somewhere. Use dmesg and/or cat /var/log/messages to view the logs and try to find any problems there which may be related to Anaconda or GPU software.
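Note that Anaconda is normally installed by its own installer script rather than through yum, so it would not appear in the yum list. A quick check, assuming the installer's default location of ~/anaconda3 (adjust if another prefix was chosen), might look like this. The name find_conda is a hypothetical helper.

```shell
# find_conda: print where conda lives for the current user, or fail.
find_conda() {
  if command -v conda >/dev/null 2>&1; then
    # conda is already on PATH.
    command -v conda
  elif [ -x "$HOME/anaconda3/bin/conda" ]; then
    # Installed but not on PATH (for example, shell init was not updated).
    echo "$HOME/anaconda3/bin/conda"
  else
    # Report which user and machine were checked.
    echo "conda not found for user $(id -un) on $(uname -n)" >&2
    return 1
  fi
}
```

Because the default install lives in a home directory, the error message includes the user and host: logging in as a different user, or on a different machine than the VM itself, would show nothing.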
I just encountered the same problem. I know this question is dated, but this might help a complete beginner like myself. In my case, I needed to SSH into the instance instead of just being in the project-level virtual environment.
gcloud beta compute ssh --zone "europe-west2-c" "myinstancename" --project "fired-brimstone-234534"

Zeppelin: how to download or save the Zeppelin notebook?

I am using the Zeppelin sandbox with AWS EMR.
Is there a way to download or save a Zeppelin notebook so that it can be imported into another Zeppelin server?
As noted in the comments above, this feature is available starting in version 0.5.6. You can find more details in the release notes. Downloading and installing this version would solve that issue.
Given that you are using EMR, it looks like you will have to work with the version available. As Samuel mentioned above, you can back up the contents of the incubator-zeppelin/notebook folder and transfer them.
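The folder backup could be done with tar, roughly as follows. The helper names and archive file name are hypothetical, and the Zeppelin directory is passed in as an argument since its location differs between installations.

```shell
# backup_notebooks ZEPPELIN_DIR OUT_TARBALL
# Hypothetical helper: archives ZEPPELIN_DIR/notebook into a gzip tarball.
backup_notebooks() {
  tar -czf "$2" -C "$1" notebook
}

# restore_notebooks TARBALL ZEPPELIN_DIR
# Unpacks the archive so that ZEPPELIN_DIR/notebook is recreated.
restore_notebooks() {
  tar -xzf "$1" -C "$2"
}
```

For example, `backup_notebooks incubator-zeppelin notebooks.tar.gz` on the EMR node, then copy the archive over and run `restore_notebooks notebooks.tar.gz incubator-zeppelin` on the target server. A Zeppelin restart is likely needed before the notes show up.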