How to snappy compress a file using a python script - python-2.7

I am trying to compress a CSV file in snappy format using a Python script and the python-snappy module. This is my code so far:
import snappy
d = snappy.compress("C:\\Users\\my_user\\Desktop\\Test\\Test_file.csv")
with open("compressed_file.snappy", 'w') as snappy_data:
    snappy_data.write(d)
    snappy_data.close()
This code actually creates a snappy file, but the snappy file created only contains a string: "C:\Users\my_user\Desktop\Test\Test_file.csv"
So I am a bit lost about how to get my CSV compressed. I did get it working from the Windows cmd with this command:
python -m snappy -c Test_file.csv compressed_file.snappy
But I need it to be done as part of a Python script, so working from cmd is not an option for me.
Thank you very much,
Álvaro

You are compressing the path string itself, because snappy.compress takes raw data, not a filename.
There are two ways to compress data with snappy: as a single block, or as streaming (framed) data.
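For the one-block approach you read the file contents yourself and pass the bytes to snappy.compress. A minimal sketch (assuming the whole CSV fits comfortably in memory):
import snappy

# Block (one-shot) compression: read the file, compress the bytes, write them out.
with open("C:\\Users\\my_user\\Desktop\\Test\\Test_file.csv", "rb") as in_file:
    compressed = snappy.compress(in_file.read())
with open("compressed_file.snappy", "wb") as out_file:
    out_file.write(compressed)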
The following function will instead compress a file using the framed (streaming) method:
import snappy

def snappy_compress(path):
    path_to_store = path + '.snappy'
    with open(path, 'rb') as in_file:
        with open(path_to_store, 'wb') as out_file:
            # stream_compress reads from the input file object and writes
            # framed snappy data to the output file object.
            snappy.stream_compress(in_file, out_file)
    return path_to_store

snappy_compress('testfile.csv')
You can decompress from the command line using:
python -m snappy -d testfile.csv.snappy testfile_decompressed.csv
Note that the framing currently used by python-snappy is not compatible with the framing used by Hadoop.
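If you also want to decompress from inside a Python script instead of the command line, here is a mirror-image sketch using the streaming counterpart (assuming python-snappy's stream_decompress, which backs the -d command above):
import snappy

def snappy_decompress(path):
    # Reverses snappy_compress above: reads framed snappy data, writes the raw file back out.
    path_to_store = path + '.decompressed'
    with open(path, 'rb') as in_file:
        with open(path_to_store, 'wb') as out_file:
            snappy.stream_decompress(in_file, out_file)
    return path_to_store

snappy_decompress('testfile.csv.snappy')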

Lambda not connecting to ffmpeg

I have an issue with a Lambda function that tries to use ffmpeg as a third-party dependency on AWS. The function itself uses the ffmpeg.js library, which generates ffmpeg commands in its functions when they are called. I installed ffmpeg on my instance via SSH, and it is still giving me the same error:
Command failed: ffmpeg -i "....
ffmpeg: command not found
Any advice on this? Many thanks
You need to include a static build of ffmpeg inside your project directory.
Download the x86_64 version, as that is the one used by the Lambda environment.
Unzip the archive, take the file named ffmpeg (the static binary), and paste it into your project directory.
After that, add the following snippet at the top of your code:
process.env.PATH = process.env.PATH + ':/tmp/';
process.env['FFMPEG_PATH'] = '/tmp/ffmpeg';
const BIN_PATH = process.env['LAMBDA_TASK_ROOT'];
process.env['PATH'] = process.env['PATH'] + ':' + BIN_PATH;
Now, inside your exports.handler, add the following at the beginning of the function. It will look like this:
exports.handler = function (event, context, callback) {
    // Copy the bundled ffmpeg binary to /tmp and make it executable.
    require('child_process').exec(
        'cp /var/task/ffmpeg /tmp/.; chmod 755 /tmp/ffmpeg;',
        function (error, stdout, stderr) {
            if (error) {
                console.log('Error occurred', error);
            } else {
                var ffmpeg = require('ffmpeg');
                // Your task to be performed
            }
        }
    );
};
I hope this helps. Don't forget to leave a thumbs up :)
The solution above is for Node.js.
I can successfully use ffmpeg on AWS Lambda in Python:
Get a static build of ffmpeg from here.
Untar it with tar -xf ffmpeg-release-amd64-static.tar.xz (the archive is xz-compressed, so plain -xf with auto-detection works where the gzip flag -z would not).
Take the ffmpeg file (and optionally ffprobe) out of the extracted folder and delete the rest of the files.
Put the bare ffmpeg file (without the subfolder) in the same folder as your Lambda code.
cd into this folder and zip it with zip -r -X "../archive.zip" *
Upload the zipped file to AWS Lambda and save.
In your Python code you need to set the correct filepath to the ffmpeg static build like so:
FFMPEG_STATIC = "/var/task/ffmpeg"
# now call ffmpeg with subprocess
import subprocess
subprocess.call([FFMPEG_STATIC, '-i', input_file, output_file])
I didn't have to change any file permissions. That wouldn't have worked anyway, because /var/task/ doesn't seem to be writable.
input_file and output_file are local files in your spawned Lambda instance. I download my files from S3 to /tmp/ and do the processing with ffmpeg there, as sketched below. Also make sure to set sufficient memory and timeout for the Lambda (I use the maximum settings for my workflow).
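A rough end-to-end sketch of that workflow (the handler name and the S3 bucket/key names are hypothetical placeholders, not taken from the answer above):
import subprocess
import boto3

FFMPEG_STATIC = "/var/task/ffmpeg"  # the bundled static build, as described above
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Hypothetical S3 locations; replace with your own bucket and keys.
    input_file = '/tmp/input.mp4'
    output_file = '/tmp/output.mp4'
    s3.download_file('my-input-bucket', 'videos/input.mp4', input_file)
    subprocess.call([FFMPEG_STATIC, '-i', input_file, output_file])
    s3.upload_file(output_file, 'my-output-bucket', 'videos/output.mp4')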

Windows executable file generated by PyGame + PyInstaller gives me "No such file or directory" error

I first tried to create my original game sample using Pygame.
I followed the instructions on the website below:
https://irwinkwan.com/2013/04/29/python-executables-pyinstaller-and-a-48-hour-game-design-compo/
Specifically,
I created "myFile" class something like following and read every file (.txt, .png, .mp3, etc...) using this class
myFile Class
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys

class myFile():
    def resource_path(self, relative):
        # When running from a PyInstaller onefile bundle, data files are unpacked under sys._MEIPASS
        if hasattr(sys, "_MEIPASS"):
            return os.path.join(sys._MEIPASS, relative)
        return os.path.join(relative)

### code where I read something ###
myfile = myFile()
filename = myfile.resource_path(os.path.join("some dir", "somefile" + "extension"))
I typed the command below to create a .spec file (myRPG.py contains main):
pyinstaller --onefile myRPG.py
I modified the .spec file so the EXE object includes Trees (I store data files in separate directories such as data, image, and so on).
myRPG.spec file
# -*- mode: python -*-
block_cipher = None

a = Analysis(['myRPG.py'],
             pathex=['C:\\mygame\\mygame'],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher)
pyz = PYZ(a.pure, a.zipped_data,
          cipher=block_cipher)
exe = EXE(pyz,
          Tree('bgm', prefix='bgm'),
          Tree('charachip', prefix='charachip'),
          Tree('data', prefix='data'),
          Tree('image', prefix='image'),
          Tree('mapchip', prefix='mapchip'),
          Tree('se', prefix='se'),
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,
          name='myRPG',
          debug=False,
          strip=False,
          upx=True,
          console=True)
I rebuilt the package using the modified .spec file:
pyinstaller myRPG.spec
When I execute the myRPG.exe file, I get the error below:
C:\mygame\mygame\dist>myRPG.exe
[Errno 2] No such file or directory:
'C:\\Users\\bggfr\\AppData\\Local\\Temp\\_MEI64~1\\item.data' (An IO Error occurred.)
Traceback (most recent call last):
File "myRPG.py", line 606, in <module>
File "myRPG.py", line 37, in __init__
File "myItemList.py", line 11, in __init__
File "myItemList.py", line 33, in load
UnboundLocalError: local variable 'fp' referenced before assignment
Failed to execute script myRPG
I believe I am specifying the directories where the data is expanded correctly, since I use the function that checks _MEIPASS, but it does not work.
I have also tried to use "added_files" instead of Tree, but that did not help at all. "a.datas" may work for a small number of files, but I do not want to list every file I am going to use, because it would be over hundreds of thousands of files.
This is a very helpful page that I used when converting my pygame program into an executable.
In short:
Here is the place where you can download pyInstaller. Click on the download button under "Installation." Wait for it to download, then run it.
Type this into the popup where it says command: venv -c -i pyi-env-name. The reason for this is stated in the page mentioned at the top.
If you have pyInstaller installed already, type pyinstaller --version to make sure. If you don't, type pip install PyInstaller. If this doesn't work, look at step 7 in the guide up top.
Make sure that you put a copy of your program in the folder that it says on the command line.
If you want a zip with the exe and all necessary files in it, type pyinstaller followed by the name of your program, complete with its .py extension. However, if you want a standalone exe that doesn't need all of the extra files in the zip and can be run by itself, type pyinstaller --onefile --windowed followed by the name of your program, again with the .py extension (see the example commands after these steps).
Once you have done this, the exe will appear in a dist folder inside the folder where you placed your program earlier.
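For example, assuming the program is named myRPG.py as in the question above, the two invocations described in the steps would be:
pyinstaller myRPG.py
pyinstaller --onefile --windowed myRPG.py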
This is how it worked for me, and I apologize if I have misinformed you. Be sure to upvote/mark as answered if this works.

GNU Parallel to run Python script on huge file

I have a file which contains an XML element on every line, and it needs to be converted to JSON. I have written a Python script which does the conversion, but it runs serially. I have two options, Hadoop or GNU Parallel; I have tried Hadoop and now want to see how GNU Parallel could help, which should certainly be simpler.
My Python code is as follows:
import sys
import json
import xmltodict

with open('/path/sample.xml') as fd:
    for line in fd:
        o = xmltodict.parse(line)
        t = json.dumps(o)
        with open('sample.json', 'a') as out:
            out.write(t + "\n")
So can I use GNU Parallel to work directly on the huge file, or do I need to split it first?
Or is this right:
cat sample.xml | parallel python xmltojson.py >sample.json
Thanks
You need to change your Python code to a UNIX filter, i.e. a program that reads from standard input (stdin) and writes to standard output (stdout). Untested:
import fileinput
import json
import xmltodict

for line in fileinput.input():
    o = xmltodict.parse(line)
    t = json.dumps(o)
    print t  # print already appends the trailing newline
Then you use --pipepart in GNU Parallel:
parallel --pipepart -a sample.xml --block -1 python my_script.py
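As a usage sketch, assuming the filter above is saved as my_script.py, you can collect the converted records into a single file (as the original serial script did) by redirecting the combined output:
parallel --pipepart -a sample.xml --block -1 python my_script.py > sample.json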

Weka: Array Index out of bounds exception with CSV files

I get an unhelpful ArrayIndexOutOfBoundsException when I try to pass a CSV file to Weka on the command line, but it works fine when I use the same file in the GUI.
pvadrevu#MacPro~$ java -Xmx2048m -cp weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -t "some.csv" -d temp.model
Refreshing GOE props...
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...
java.lang.ArrayIndexOutOfBoundsException: 1
weka.classifiers.evaluation.Evaluation.setPriors(Evaluation.java:3843)
weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1503)
weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)
weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:359)
weka.classifiers.functions.Logistic.main(Logistic.java:1134)
at weka.classifiers.evaluation.Evaluation.setPriors(Evaluation.java:3843)
at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1503)
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)
at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:359)
at weka.classifiers.functions.Logistic.main(Logistic.java:1134)
It turns out that newer versions of Weka do not handle CSV files through the command line. There are two options:
Revert to an older version of Weka; 3.6.11 works fine for me, while 3.7.11 does not.
Convert the CSV files to ARFF. This can be done using the Weka GUI (or from the command line, as sketched below).
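If you prefer to stay on the command line, the conversion can also be done with Weka's CSVLoader converter; a sketch, assuming weka.jar is on the classpath as in the command above:
java -cp weka.jar weka.core.converters.CSVLoader some.csv > some.arff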

Create a session secret - Python - Google Drive SDK

Using: Windows 7, Python 2.7
Code:
Dr Edit - Google's example for creating a new file in Google Drive
Dr Edit Using Python with Google Drive SDK
The README instructions state:
Create a session secret, which should be at least 64 bytes of random
characters, for example with
python -c "import os; print os.urandom(64)" > session.secret
I have no idea how to run this command line. I've tried to run it in the Windows Command Prompt. I've tried to run it in the Python shell. I've tried to run it in the Python command line. SyntaxError: invalid syntax.
Is this README doc correct? Is it outdated? Am I doing something wrong? Is this a line of code that is supposed to be part of a python program? What is the -c for?
I just realized that the -c in python -c tells Python to execute the quoted code that follows, straight from the command line.
So I entered:
import os
print os.urandom(64)
And a string of special characters was printed:
タAËDkæ4ÃZ﾿ヌTツUルヒL゚é-ツؓルト0ᄚ'Ënᄚô#ߝUíèRKRüÏ*'ᄇᄂヤナ%ÿëᄄÄÓò&
There is probably supposed to be a session.secret object? file? variable? that the string of special characters gets written to.
I needed to learn how Python writes files, not just the print function.
You can read and write files in Python without importing a library.
The open() function in Python will open a file, or create a new one if it does not already exist (when writing).
There are four common modes for opening (creating) a file:
'r' read only
'w' write only
'a' append
'r+' both reading and writing
This will create a new text file named "daSecretCode.txt" if there wasn't one already. NOTE: The file is created in the directory that the .py module was run from.
daNewFile = open("daSecretCode.txt", "w")
Write content to the file:
daNewFile.write("oerjfwi456745oetr78u9f9oirhiwurhkwejr")
Close the file:
daNewFile.close()
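To read the secret back later, the 'r' mode listed above works the same way (a minimal sketch):
daReadFile = open("daSecretCode.txt", "r")
daSecretCode = daReadFile.read()
daReadFile.close()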
Write a new session secret to a new file:
import os

daSecretCode = repr(os.urandom(64))
daNewFile = open("daSecretCode.txt", "w")
daNewFile.write(daSecretCode)
daNewFile.close()
I'm not sure how I could format the content to look like the original output; the characters get formatted into something else when written this way.
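One way to avoid that formatting problem (a sketch, not part of the original answer): skip repr() and write the raw bytes in binary mode, which matches what the README's shell redirect into session.secret produces:
import os

# Write the 64 random bytes exactly as generated, with no repr() formatting applied.
daSecretFile = open("session.secret", "wb")
daSecretFile.write(os.urandom(64))
daSecretFile.close()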