I want to write a script that checks Bitcoin private keys from a CSV file for a balance.
Python 2.7.16 64-bit on Ubuntu 19.04
import requests
from pybitcoin import BitcoinPrivateKey
import pybitcoin
import time
keys = set()
with open('results.csv') as f:
    for line in f.read().split('\n'):
        if line:
            repo_name, path, pkey = line.split(",")
            keys.add(pkey)
for priv in keys:
    try:
        p = BitcoinPrivateKey(priv)
        pub = p.public_key().address()
        r = requests.get("https://blockchain.info/rawaddr/{}".format(pub))
        time.sleep(1)
        print '{} {} {:20} {:20} {:20} '.format(priv, pub,
                                                r.json()['final_balance'],
                                                r.json()['total_received'],
                                                r.json()['total_sent'])
    except (AssertionError, IndexError):
        pass
    except ValueError:
        print r
        print r.text
Exception has occurred: ValueError
too many values to unpack
  File "/home/misha/bitcoinmaker/validate.py", line 9, in <module>
    repo_name, path, pkey = line.split(",")
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/runpy.py", line 82, in _run_module_code
    mod_name, mod_fname, mod_loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 252, in run_path
    return _run_module_code(code, init_globals, run_name, path_name)
Some CSV data (a cut-down version of the file; the original has 14042 lines):
repo_name,path,pkey
tobias/tcrawley.org,src/presentations/ClojureWestJava9/assets/AB1E09C5-A5E3-4B1D-9E3B-C2E586ACFAC2/assets/AB1E09C5-A5E3-4B1D-9E3B-C2E586ACFAC2.pdfp,----
annavernidub/MDGServer,specs/resources/xml/results.xml,----
gyfei/bitcc,routes/home.js~,----
Nu3001/external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,----
cdawei/digbeta,dchen/music/format_results.ipynb,----
bitsuperlab/cpp-play,tests/regression_tests/issue_1229_public/alice.log,----
justin/carpeaqua-template,assets/fonts/562990/65B3CCFE671D2E128.css,----
dacsunlimited/dac_play,tests/regression_tests/issue_1218/alice.log,----
amsehili/audio-segmentation-by-classification-tutorial,multiclass_audio_segmentation.ipynb,----
biosustain/pyrcos,examples/.ipynb_checkpoints/RegulonDB network-checkpoint.ipynb,----
blockstack/blockstore,integration_tests/blockstack_integration_tests/scenarios/name_pre_reg_stacks_sendtokens_multi_multisig.py,----
gitcoinco/web,app/assets/v2/images/kudos/smart_contract.svg,----
Is the CSV file too large?
Or is it a syntax error?
What am I missing?
This is how I would do it, with a check on the number of fields and some logging:
import csv
keys = set()
unknown_list = []
with open('results.csv') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        if len(line) == 3:
            pkey = line[2]
            keys.add(pkey)
        else:
            # Keep rows with an unexpected number of fields for inspection.
            unknown_list.append(list(line))
After that you can view what lines gave you the issue by printing or logging the unknown list. You can also log at every step of the process to see where the script is breaking.
If you provide some sample data from your results.csv file it will be easier to give you an accurate answer.
In general your line:
repo_name, path, pkey = line.split(",")
is providing three variables for the values derived from the line being split by a comma delimiter, but the split is producing more than three values.
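If the extra commas sit inside the path, one option is to split once from the left for the repository name and once from the right for the key, so any remaining commas stay in the path. This is only a sketch under that assumption (first field is the repository, last field is the key); `parse_row` is a hypothetical helper name, not part of the original script:

```python
def parse_row(line):
    """Split a CSV row into (repo_name, path, pkey), tolerating commas in path.

    Assumes the first field is the repository name and the last field is the
    key. Returns None for rows with fewer than three fields.
    """
    parts = line.split(",", 1)           # repo_name, everything else
    if len(parts) != 2 or "," not in parts[1]:
        return None
    path, pkey = parts[1].rsplit(",", 1)  # everything else, last field
    return parts[0], path, pkey


print(parse_row("owner/repo,dir/file.txt,KEY"))
print(parse_row("owner/repo,dir/a file.ipynb,data/text/extra,KEY"))
print(parse_row("owner/repo,KEY"))
```

Rows for which `parse_row` returns None can be collected and inspected in the same way as `unknown_list` above.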
The sample data:
sobakasu/vanitygen,README,5JLUmjZiirgziDmWmNprPsNx8DYwfecUNk1FQXmDPaoKB36fX1o
lekanovic/pycoin,tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
wkitty42/fTelnet,rip-tests/16c/SA-MRC.RIP,5HHR5GHR5CHR5AHR5AHR59HR57HR57HR54HR53HR52HR51HR4ZH
NKhan121/Portfolio,SAT Scores/SAT Project.ipynb,5Jy8FAAAAwGK6PMwpEonoxRdfVCAQaLf8yJEjeuGFFzrdr7m5We
chengsoonong/digbeta,dchen/music/format_results.ipynb,5JKSoHtHSE2Fbj3UHR4A5v4fVHFV92jN5iC9HKJ4MvRZ7Ek4Z7j
the-metaverse/metaverse,test/test-explorer/commands/wif-to-ec.cpp,5JuBiWpsjfXNxsWuc39KntBAiAiAP2bHtrMGaYGKCppq4MuVcQL
hessammehr/ChemDoodle-2D,data/spectra/ir_ACD.jdx,5K626571K149659j856919j347351J858139j136932j515732
designsters/android-fork-bitcoinj,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
HashEngineering/groestlcoinj,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
dyslexic-charactersheets/assets,languages/german/pathfinder/Archetypes/Wizard/Wizard (Familiar).pdf,5KAD7sCUfirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdi
ElementsProject/elements,src/wallet/rpcwallet.cpp,5Kb8kLf9zgWQnogidDA76MzPL6TsZZY36hWXMssSzNydYXYB9KF
pavlovdog/bitcoin_in_a_nutshell,markdown_source/bitcoin_in_a_nutshell_cryptography.md,5HtqcFguVHA22E3bcjJR2p4HHMEGnEXxVL5hnxmPQvRedSQSuT4
ValyrianTech/BitcoinSpellbook-v0.3,unittests/test_keyhelpers.py,5Jy4SEb6nqWLMeg4L9QFsi23Z2q6fzT6WMq4pKLfFqkTC389CrG
jrovegno/fci,Derechos-Agua-Petorca.ipynb,5JNZvHgxAAsWLGDJkiUsW7YMePhNGvT5UYPSHvNNfX54eHig2mM
ionux/bitforge,tests/data/privkey.json,5Ke7or7mg3MFzFuPpiTf2tBCnFQk6dR9qsbTmoE74AYWcQ8FmJv
chengsoonong/mclass-sky,projects/alasdair/notebooks/09_thompson_sampling_vstatlas.ipynb,5Keno6jz32WJXtERERNVZoCXdDhgypMhe1VmnQ54mVY1wuV62r
surikov/webaudiofontdata,sound/12842_4_Chaos_sf2_file.js,5HGw2nKaKhUAtyFG6aLMpTRULLRiTyHBCMFnFRWg6BsaULCwta
malikoski/wabit,src/main/resources/ca/sqlpower/wabit/example_workspace.wabit,5HSGKWPp2yPSqBzMxKyUfKyeuRybCzdi5cxV6Nmur8q78gyRLc6
GemHQ/money-tree,spec/lib/money-tree/address_spec.rb,5JXz5ZyFk31oHVTQxqce7yitCmTAPxBqeGQ4b7H3Aj3L45wUhoa
wkitty42/fTelnet,rip-tests/16c/SS-TT1.RIP,5KA85HAB5GAH5CAH5CAC5EA55FA95AAG56AG54A559A55AA955A
norsween/data-science,springboard-answers-to-exercises/Springboard Data Story Exercise.ipynb,5HA4jkQHgtPh9AsAKhoqUpB7FJykpSW1tbYqLi1NdXV3QXnNzs
alisdev/dss,dss-xades/src/test/resources/plugtest/esig2014/ESIG-XAdES/ES/Signature-X-ES-31.xsig,5HbuDMDxVSaWBvVRPSoaL8S6Vdtv3yBkz5bSgjAo1YKQn4q435
AmericasWater/awash,docs/compare-simulate.ipynb,5JAYT9xf2vChY6fDgWPnohZnfsxhmqA2yBRs1HHskMikDkvKto
coincooler/coincooler,spec/helpers/data_helper_spec.rb,5J2PLz9ej2k7c1UEfQANfQgLsZnFVeY5HjZpnDe1n6QSKXy1zFQ
magehost/magento-malware-scanner,corpus/backend/8afb44c2bd8e4ed889ec8a935d3b3d38,5HcfPTZ4JTqg6zdvKSRv92dUzgDEZBuvWiucdiZ7KS8UJWDU3y
daily-wallpapers/daily-wallpapers.github.io,2018/12/22/bd05d9380ae4459588eef75e0e25fc2c.html,5KafK34DXHztjFfL51J48dSvXUxtWW5akij5RVYEAaR228Kq7j
Grant-Redmond/cryptwallet,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
ryanralph/DIY-Piper,keys.txt,5JYiNDZgZrH9sDR6FC9XSG175FoBDKPrrt6eyyKxPCdQ1AWJgDD
FranklinChen/IHaskell,notebooks/IHaskell.ipynb,5Ktqn7AjR45QoEABnJ2d5Sd6hRBCPBaLxUJCQgJxcXGEhoZKwD
djredhand/epistolae,sites/all/modules/panopoly_demo/panopoly_demo.features.content.inc,5HbHJ6fWvXPDfhKbxT4ein8RTyw2kgDC3A2sR2J9PpXrGjabpvh
SAP/cloud-portal-tutorial,Image Gallery.html,5KepoCFUEsCQYFGkU3AvgK2AxqEjAp3CjjpaTCeWkdjQ3MDCznq
yueou/nyaa,public/img/mafuyu.svg,5K4do2Fj2wgV9FPLkoWz3Sk9au3M3BQStA5TmvvfYaL798nfPnz
dwiel/tensorflow_hmm,notebooks/gradient_descent_example.ipynb,5JXNt3LQB9NAgYHU2naQLmdJwBPP1ML2R21QS4cAYAR2frp87Z
imrehg/electrum,lib/tests/test_account.py,5Khs7w6fBkogoj1v71Mdt4g8m5kaEyRaortmK56YckgTubgnrhz
thirdkey-solutions/pycoin,tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
asm-products/AssemblyCoins,API/Whitepaper.md,5JYVttUTzATan4zYSCRHHdN2nfJJHv6Nu1PB6VnhWSQzQRxnyLa
karlobermeyer/numeric-digit-classification,numeric_digit_classification.baseline.ipynb,5KaGgoQUFBvPTSS1y5cqVRTXV1NTNnziQ2NpZ27doRFhZGSkoK
richardkiss/pycoin,tests/cmds/test_cases/ku/bip32_keyphrase.txt,5JvNzA5vXDoKYJdw8SwwLHxUxaWvn9mDea6k1vRPCX7KLUVWa7W
paulgovan/AutoDeskR,vignettes/AutoDeskR.html,5Jb6vUZGXEC1AcK2vzQW5oApRuJseyHkerTHgp4pbMbU5t5kgV
trehansiddharth/mlsciencebowl,git/hs/archive/357.txt.mail,5KAAACAAgALgABAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAEEAAA
mermi/EFD2015,img/weblitmap-1.1.svg,5H3756AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPj78
breadwallet/breadwallet-core,Java/Core/src/test/java/com/breadwallet/core/BRWalletManager.java,5Kb8kLf9zgWQnogidDA76MzPL6TsZZY36hWXMssSzNydYXYB9KF
zsoltii/dss,dss-xades/src/test/resources/plugtest/esig2014/ESIG-XAdES/ES/Signature-X-ES-74.xsig,5HbuDMDxVSaWBvVRPSoaL8S6Vdtv3yBkz5bSgjAo1YKQn4q435
gyglim/Recipes,examples/ImageNet Pretrained Network (VGG_S).ipynb,data/text/MATRYOSHKA-CHALLENGE,5HW67XgqGZKakwVrpftp9bzQFBig1gfWCPUTUyxWaVCCcfLV17
dyslexic-charactersheets/assets,languages/spanish/pathfinder/Archetypes/Druid/Plains Druid (Animal Companion).pdf,5J5ut7i5STzG9rLppih1CFLy8kKSTK8sfwpHXeJa99hUnfChHRf
keith-epidev/VHDL-lib,top/stereo_radio/ip/xfft/c_addsub_v12_0/hdl/c_addsub_v12_0_legacy.vhd,5JW64dZ8AXjc3DEXpwxS1wcUvakyfxBHNyPgk9SseKsP4PojeGV
gitonio/pycoin,COMMAND-LINE-TOOLS.md,5KhoEavGNNH4GHKoy2Ptu4KfdNp4r56L5B5un8FP6RZnbsz5Nmb
techdude101/code,C Sharp/SNMP HDD Monitor v0.1/SNMP HDD Monitor/mainForm.resx,5KSkv8FBQXjAAAASQAAACsAAAAcAAAADwAAAAUAAAABAAAAAAAA
RemitaBit/Remitabit,tests/regression_tests/short_below_feed/alice.log,5HpUwrtzSztqQpJxVHLsrZkVzVjVv9nUXeauYeeSxguzcmpgRcK
SportingCRED/sportingcrdblog,desempenho/jonathan_rosa.html,5HRm4g8amexBY4KFhCyBVy5Db6UNkSKAgKo2ogXibQxfSDVmjh
bashrc/zeronet-debian,src/src/Test/TestSite.py,5JU2p5h3R7B1WrbaEdEDNZR7YHqRLGcjNcqwqVQzX2H4SuNe2ee
martindale/fullnode,test/privkey.js,5JxgQaFM1FMd38cd14e3mbdxsdSa9iM2BV6DHBYsvGzxkTNQ7Un
atsuyim/ZeroNet,src/Content/ContentManager.py,5JCGE6UUruhfmAfcZ2GYjvrswkaiq7uLo6Gmtf2ep2Jh2jtNzWR
greenfield-innovation/greenfield-innovation.github.io,_site/examples/visual/index.html,5JZUUxKS1NzVaFe8Rt21aD6ELyD1n8FCadw334UkZHAXb3NSyZ
JorgeDeLosSantos/master-thesis,src/ch3/parts_01.svg,5KWSJQ8D2d3T6SNwwbSs3Mwh7RyV3qRbSUbf13GNmML9zN1FvC
JaviMerino/lisa,ipynb/tutorial/00_LisaInANutshell.ipynb,5JB3VPWgYAAACALLBw4cKwQ6hm3rx5YYcQCZxRRVbiXnHwifyCT
nopdotcom/2GIVE,src/vanitygen-master/README,5JLUmjZiirgziDmWmNprPsNx8DYwfecUNk1FQXmDPaoKB36fX1o
dacsunlimited/dac_play,tests/regression_tests/collected_fees/alice.log,5J3SQvvxRK4RfzFDcWZR5sLRkjrMvTn1FKXnzNGvWLgWdctLDQm
bussiere/ZeroNet,src/Content/ContentManager.py,5JCGE6UUruhfmAfcZ2GYjvrswkaiq7uLo6Gmtf2ep2Jh2jtNzWR
loon3/Tokenly-Pockets,Chrome Extension/js/bitcoinsig.js,5JeWZ1z6sRcLTJXdQEDdB986E6XfLAkj9CgNE4EHzr5GmjrVFpf
openledger/graphene-ui,web/lib/common/trxHelper.js,5KikQ23YhcM7jdfHbFBQg1G7Do5y6SgD9sdBZq7BqQWXmNH7gqo
tensorflow/probability,tensorflow_probability/examples/jupyter_notebooks/Factorial_Mixture.ipynb,5JBZJUGinA9k9twEaTFKNRiorTP6NEoJYzAhoBzRqBrpCiu8Bt
mostafaizz/Face_Landmark_Localization,BioID-FaceDatabase/BioID_0380.pgm,5JdWRQPQTTYWPRPTUSUPSUUXRSTSTPSXUSQVXSSPQQPTSSUTXY
number7team/SevenCore,test/test.PrivateKey.js,5KS4jw2kT3VoEFUfzgSpX3GVi7qRYkTfwTBU7qxPKyvbGuiVj33
thephez/electrum,lib/tests/test_account.py,5Khs7w6fBkogoj1v71Mdt4g8m5kaEyRaortmK56YckgTubgnrhz
surikov/webaudiofontdata,sound/12842_7_Chaos_sf2_file.js,5HGw2nKaKhUAtyFG6aLMpTRULLRiTyHBCMFnFRWg6BsaULCwta
CMPUT301F14T07/Team7-301Project,UML/UML.class.violet.html,5HSUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAHNePP
kernoelpanic/tinker-with-usbarmory,README.md,5JrAdQp23Zkqi4NFwSGoh6kftHochz56ctxuFVemX1vy4KozLvV
benzwjian/bitcoin-fp,test/ecpair.js,5J3mBbAH58CpQ3Y5RNJpUKPE62SQ5tfcvU2JpbnkeyhfsYB1Jcn
zaki/slave,SLave/SLaveForm.resx,5H3v8s6YfyJQZApinCSUNL6T32hHUDnneGdaiwCa8ekodUyfF6Z
bokae/szamprob,notebooks/Package04/pi.ipynb,5J5VdLSedNJqWbYso6TUxoj52bfA1XwmZse2N1TdMGvdgscMP3
sciant/sciant.github.io,assets/desktop/Oval.svg,5HCxg2h2kWtAYr8RmVxHSgYzBTDQwiZrD25K75ZbazTeJpHN1rK
UCL-BLIC/legion-buildscripts,cytofpipe/v1.3/Rlibs/openCyto/doc/openCytoVignette.html,5KEPZdiwYXq9nkajPfT2oiCgPnqU5xT1zhyLSE1fM74n8e8Pgs
xJom/mbunit-v3,src/Extensions/Icarus/Gallio.Icarus/Reload/ReloadDialog.resx,5Hsyo5SiKUCgUtJRSSyk151wrpTRjTANQSRzBrGfWiJc1BMePHz
GroestlCoin/bitcoin,src/wallet/test/psbt_wallet_tests.cpp,5KSSJQ7UNfFGwVgpCZDSHm5rVNhMFcFtvWM3zQ8mW4qNDEN7LFd
gitcoinco/web,app/assets/v2/images/kudos/doge.svg,5JCWfzJMfuHtoYFVMBYxXDbGLbpib6af32A5tEX9eTfbiEDyKy9
moocowmoo/dashman,lib/pycoin/tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
reblws/tab-search,src/static/lib/fonts.css,5KgjXgnXgNVBJ3HpaugH8GWwEr4NNgL2D9ofN25TvnfjvYkXWsK
schildbach/bitcoinj,core/src/test/java/org/bitcoinj/core/Base58Test.java,5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsreAbuatmU
QuantEcon/QuantEcon.notebooks,ddp_ex_career_py.ipynb,5JkqSRsSBaCKrqjiSvAv4fsAg4q6quGnJYkiSNjAWREABU1T8D
moncho/warpwallet,bitcoin/bitcoin_test.go,5JfEekYcaAexqcigtFAy4h2ZAY95vjKCvS1khAkSG8ATo1veQAD
BigBrother1984/android_external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,5Js3pmbMmKEaNmyoRowYoTj2kHb2MBW74ap169bq7NmzrvFDhg
taishi107/K-means_clustering,K-means_clustering3.ipynb,5HLKFQZeSqRSCTPGTLyVCKRSJ4zpGKXSCSS5wyp2CUSieQ5Qyp2
keith-epidev/VHDL-lib,top/stereo_radio/ip/xfft/xfft_v9_0/hdl/shift_ram.vhd,5HAT5FAXSTzJUoP6GBwzVHhyeEqVpX4CC8pfA8TEjGyxPv2tkY
sawatani/bitcoin-hall,test/Fathens/Bitcoin/Wallet/KeysSpec.hs,5JHm1YEyRk4dV8KNN4vkaFeutqpnqAVf9AWYi4cvQrtT5o57dPR
ivansib/sib16,src/test/data/base58_keys_valid.json,5KEyMKs1jynRbTfpGFPveXyxMcfZb1X9SnR3TneYQwRtXdzkzhL
ledeprogram/algorithms,class4/homework/najmabadi_shannon_4_3.ipynb,5Jfp5kZKyPAjkSNQwje6pGVVFbXpuUV1teS9WoqgLVKNiYTdQw
desihub/desitarget,doc/nb/connecting-spectra-to-mocks.ipynb,5JjZv3ozVq1ebXJfqjGtZW4bCJkVFRcjMzHToGpMmTcKJEycE97
eneldoserrata/marcos_openerp,addons/fleet/fleet_cars.xml,5KJ9TQiWUeuAtrhEWjgCZwAbgg3VLU4Ecg8pQCRBRmUpFGPSLex
JCROM-Android/jcrom_external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,5Js3pmbMmKEaNmyoRowYoTj2kHb2MBW74ap169bq7NmzrvFDhg
alixaxel/dump.HN,data/items/2014/03/08/23-24.csv,5HpHagT65TZzG1PH3CSu63k8DbpvD9KsvQVUCsn2t55TVA1jxW7
qutip/qutip-notebooks,development/development-ssesolve-tests.ipynb,5KM85KMs5CXg9ZCzHbZ3DPJa2wDP7uzm6e1dDKYtykNe3r26iT
bitshares/devshares,tests/regression_tests/short_below_feed/alice.log,5HpUwrtzSztqQpJxVHLsrZkVzVjVv9nUXeauYeeSxguzcmpgRcK
denkhaus/bitsharesx,tests/regression_tests/titan_test/client1.log,5JMnSU8bfBcu67oA9KemNm5jbs9RTp2eBHqxoR53WWyB4CH2QJF
ivanfoong/practical-machine-learning-assessment,building_human_activity_recognition_model.html,5Jno7z6Py8fkQkwASbABJgAE2AC9U5g4cKFcmoy4jheeeWVHh1
vikashvverma/machine-learning,mlfoundation/istat/project/investigate-a-dataset-template.ipynb,5KMV5Jz3ytfN7q9KTC4AqBPiLPhRwjLowiY19xaGhmRJ93auJjF
partrita/partrita.github.io,posts/altair/index.html,5KWn2EcAX6ibXaCEEXvTSJeP6T445Sc9mPreCPPrDUmX4cNegw
OriolAbril/Statistics-Rocks-MasterCosmosUAB,Rico_Block3/Block3_HT.ipynb,5HxYmbvSepRw6a73P3VoM9dipzifb4xaztSfWptxuqcdhvRM7Pj
wildbillcat/MakerFarm,MakerFarm/Migrations/201401212104523_Formatted external to humanfriendly column name.resx,5K1zhDkW7JZWeJsS9brtA8fojjTko3p1edWzMukZHXw5GZRciJN
nccgroup/grepify,Win.Grepify/Win.Grepify/Form1.resx,5J6AMbRdASzmN4pnx622HrzrEm7attVmygUetm88aLpJkAKMRUy
voytekresearch/misshapen,demo_Shape dataframe.ipynb,5HRUdGxuLsLAwBAcH49WrV3j27Bnu3bsns56oPCP1GjRokKdrA
SirmaITT/conservation-space-1.7.0,cs-models/Partners definitions/SMK/template/treatmentreportmultipleobjecttemplate.xml,5KoJspjvTDGPkPbDHC2NtohN1MKypc69Fiy9LFepfg6tt4S3orU
4flyers/4flyers.github.io,img/portfolio/portfolio_projects.svg,5KYWXACSGEEHJxmxnZB2HBQed3MyWEEEJ6ncA3zhVoU2Wxy69J
frrp/bitshares,tests/regression_tests/issue_1229_titan/alice.log,5KasHemYTcbGtHXKHNx5sUMPrrz8r4GuU3ao157F6Wx95y7NnbN
wkitty42/fTelnet,rip-tests/16c/JE-APOC.RIP,5HF56H958H15CGW5DGS5DGL5EGG5EG75EG15DFT5CFN5CFJ5AF
iobond/bitcore-old,test/test.WalletKey.js,5KMpLZExnGzeU3oC9qZnKBt7yejLUS8boPiWag33TMX2XEK2Ayc
I have to monitor an XML file that is written by a tool running all day. The XML file is only completed and properly closed at the end of the day.
Same constraints as XML stream processing:
Parse an incomplete XML file on-the-fly and trigger actions
Keep track of the last position within the file to avoid processing it again from the beginning
In an answer to "Need to read XML files as a stream using BeautifulSoup in Python", slezica suggests xml.sax, xml.etree.ElementTree and cElementTree, but my attempts with xml.etree.ElementTree and cElementTree were unsuccessful. There are also xml.dom, xml.parsers.expat and lxml, but I do not see support for "on-the-fly parsing" in them.
I need more obvious examples...
I am currently using Python 2.7 on Linux, but I will migrate to Python 3.x => please also provide tips on new Python 3.x features. I also use watchdog to detect XML file modifications => Optionally, reuse the watchdog mechanism. Optionally support also Windows.
Please provide easy to understand/maintain solutions. If it is too complex, I may just use tell()/seek() to move within the file, use stupid text search in the raw XML and finally extract the values using basic regex.
XML sample:
<dfxml xmloutputversion='1.0'>
<creator version='1.0'>
<program>TCPFLOW</program>
<version>1.4.6</version>
</creator>
<configuration>
<fileobject>
<filename>file1</filename>
<filesize>288</filesize>
<tcpflow packets='12' srcport='1111' dstport='2222' family='2' />
</fileobject>
<fileobject>
<filename>file2</filename>
<filesize>352</filesize>
<tcpflow packets='12' srcport='3333' dstport='4444' family='2' />
</fileobject>
<fileobject>
<filename>file3</filename>
<filesize>456</filesize>
...
...
First test using SAX failed:
import xml.sax
class StreamHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print 'start: name=', name
    def endElement(self, name):
        print 'end: name=', name
        if name == 'root':
            raise StopIteration
if __name__ == '__main__':
    parser = xml.sax.make_parser()
    parser.setContentHandler(StreamHandler())
    with open('f.xml') as f:
        parser.parse(f)
Shell:
$ while read line; do echo $line; sleep 1; done <i.xml >f.xml &
...
$ ./test-using-sax.py
start: name= dfxml
start: name= creator
start: name= program
end: name= program
start: name= version
end: name= version
Traceback (most recent call last):
  File "./test-using-sax.py", line 17, in <module>
    parser.parse(f)
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.7/xml/sax/xmlreader.py", line 125, in parse
    self.close()
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 220, in close
    self.feed("", isFinal = 1)
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 214, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib64/python2.7/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: report.xml:15:0: no element found
Since yesterday, I have found Peter Gibson's answer about the undocumented xml.etree.ElementTree.XMLTreeBuilder._parser.EndElementHandler.
This example is similar to the other one, but uses xml.etree.ElementTree (and watchdog).
It does not work when ElementTree is replaced by cElementTree :-/
import time
import watchdog.events
import watchdog.observers
import xml.etree.ElementTree

class XmlFileEventHandler(watchdog.events.PatternMatchingEventHandler):
    def __init__(self):
        watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['*.xml'])
        self.xml_file = None
        self.parser = xml.etree.ElementTree.XMLTreeBuilder()
        def end_tag_event(tag):
            node = self.parser._end(tag)
            print 'tag=', tag, 'node=', node
        self.parser._parser.EndElementHandler = end_tag_event

    def on_modified(self, event):
        if not self.xml_file:
            self.xml_file = open(event.src_path)
        buffer = self.xml_file.read()
        if buffer:
            self.parser.feed(buffer)

if __name__ == '__main__':
    observer = watchdog.observers.Observer()
    event_handler = XmlFileEventHandler()
    observer.schedule(event_handler, path='.')
    try:
        observer.start()
        while True:
            time.sleep(10)
    finally:
        observer.stop()
        observer.join()
While the script is running, do not forget to touch one XML file, or simulate the on-the-fly writing using this one line script:
while read line; do echo $line; sleep 1; done <in.xml >out.xml &
For information, the xml.etree.ElementTree.iterparse does not seem to support a file being written. My test code:
from __future__ import print_function, division
import xml.etree.ElementTree
if __name__ == '__main__':
    context = xml.etree.ElementTree.iterparse('f.xml', events=('end',))
    for action, elem in context:
        print(action, elem.tag)
My output:
end program
end version
end creator
end filename
end filesize
end tcpflow
end fileobject
end filename
end filesize
end tcpflow
end fileobject
end filename
end filesize
Traceback (most recent call last):
  File "./iter.py", line 9, in <module>
    for action, elem in context:
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1281, in next
    self._root = self._parser.close()
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1654, in close
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 20, column 0
Three hours after posting my question, no answer received. But I have finally implemented the simple example I was looking for.
My inspiration is from saaj's answer and is based on xml.sax and watchdog.
from __future__ import print_function, division
import time
import watchdog.events
import watchdog.observers
import xml.sax

class XmlStreamHandler(xml.sax.handler.ContentHandler):
    def startElement(self, tag, attributes):
        print(tag, 'attributes=', attributes.items())
        self.tag = tag

    def characters(self, content):
        print(self.tag, 'content=', content)

class XmlFileEventHandler(watchdog.events.PatternMatchingEventHandler):
    def __init__(self):
        watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['*.xml'])
        self.file = None
        self.parser = xml.sax.make_parser()
        self.parser.setContentHandler(XmlStreamHandler())

    def on_modified(self, event):
        if not self.file:
            self.file = open(event.src_path)
        self.parser.feed(self.file.read())

if __name__ == '__main__':
    observer = watchdog.observers.Observer()
    event_handler = XmlFileEventHandler()
    observer.schedule(event_handler, path='.')
    try:
        observer.start()
        while True:
            time.sleep(10)
    finally:
        observer.stop()
        observer.join()
While the script is running, do not forget to touch one XML file, or simulate the on-the-fly writing using the following command:
while read line; do echo $line; sleep 1; done <in.xml >out.xml &
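For the planned move to Python 3, the standard library offers xml.etree.ElementTree.XMLPullParser (added in Python 3.4), which accepts data in chunks through feed() and hands back completed elements through read_events(). The following is a minimal sketch rather than a drop-in replacement for the watchdog handler above: `collect_fileobjects` is a hypothetical helper, and the two string chunks stand in for successive reads of the growing file.

```python
import xml.etree.ElementTree as ET


def collect_fileobjects(parser, chunk):
    """Feed one chunk of (possibly incomplete) XML to the pull parser and
    return a list of dicts, one per <fileobject> that is now complete."""
    parser.feed(chunk)
    records = []
    for event, elem in parser.read_events():
        if event == 'end' and elem.tag == 'fileobject':
            records.append({
                'filename': elem.findtext('filename'),
                'filesize': int(elem.findtext('filesize')),
                'tcpflow': dict(elem.find('tcpflow').attrib),
            })
            elem.clear()  # drop the children we no longer need
    return records


parser = ET.XMLPullParser(events=('end',))

# Two chunks, cut in the middle of a <fileobject>, as a file being written would be.
part1 = ("<dfxml xmloutputversion='1.0'><configuration>"
         "<fileobject><filename>file1</filename>")
part2 = ("<filesize>288</filesize>"
         "<tcpflow packets='12' srcport='1111' dstport='2222' family='2' />"
         "</fileobject><fileobject>")

first = collect_fileobjects(parser, part1)
second = collect_fileobjects(parser, part2)
print(first)
print(second)
```

In a watchdog handler, the chunk would be whatever self.file.read() returns in on_modified; because the file object stays open, each read continues from the previous position, so nothing is parsed twice.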
I am trying to connect to a remote MySQL server from my local machine.
I want the script below to run whenever the DEBUG constant is set to true.
Here's the script:
import select
import SocketServer
import sys
import threading
import paramiko

SSH_PORT = 22
DEFAULT_PORT = 4000
g_verbose = True

class ForwardServer (SocketServer.ThreadingTCPServer):
    daemon_threads = True
    allow_reuse_address = True

class Handler (SocketServer.BaseRequestHandler):
    def handle(self):
        try:
            chan = self.ssh_transport.open_channel('direct-tcpip',
                                                   (self.chain_host, self.chain_port),
                                                   self.request.getpeername())
        except Exception, e:
            verbose('Incoming request to %s:%d failed: %s' % (self.chain_host,
                                                              self.chain_port,
                                                              repr(e)))
            return
        if chan is None:
            verbose('Incoming request to %s:%d was rejected by the SSH server.' %
                    (self.chain_host, self.chain_port))
            return
        verbose('Connected! Tunnel open %r -> %r -> %r' % (self.request.getpeername(),
                chan.getpeername(), (self.chain_host, self.chain_port)))
        while True:
            r, w, x = select.select([self.request, chan], [], [])
            if self.request in r:
                data = self.request.recv(1024)
                if len(data) == 0:
                    break
                chan.send(data)
            if chan in r:
                data = chan.recv(1024)
                if len(data) == 0:
                    break
                self.request.send(data)
        chan.close()
        self.request.close()
        verbose('Tunnel closed from %r' % (self.request.getpeername(),))

def forward_tunnel(local_port, remote_host, remote_port, transport):
    # this is a little convoluted, but lets me configure things for the Handler
    # object. (SocketServer doesn't give Handlers any way to access the outer
    # server normally.)
    class SubHander (Handler):
        chain_host = remote_host
        chain_port = remote_port
        ssh_transport = transport
    ForwardServer(('', local_port), SubHander).serve_forever()

def verbose(s):
    if g_verbose:
        print s

HELP = """\
Set up a forward tunnel across an SSH server, using paramiko. A local port
(given with -p) is forwarded across an SSH session to an address:port from
the SSH server. This is similar to the openssh -L option.
"""

def forward():
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.WarningPolicy())
    try:
        print 'connecting'
        client.connect('*******', username='***', password='****!')
        print 'connected'
    except Exception, e:
        print '*** Failed to connect to %s:%d: %r' % ('*****', 22, e)
        sys.exit(1)
    try:
        forward_tunnel(3306, '127.0.0.1', 3306, client.get_transport())
    except SystemExit:
        print 'C-c: Port forwarding stopped.'
        sys.exit(0)
I have two problems here:
1) I don't know how and when to call my forward function when Django starts up.
2) When I access Django locally and run the script from the console, I get the following exception:
exception happened during processing of request from ('127.0.0.1', 41872)
Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 558, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 615, in __init__
    self.handle()
  File "/home/omer/Aptana Studio 3 Workspace/Website/src/ssh_tunnel/tunnel.py", line 51, in handle
    verbose('Tunnel closed from %r' % (self.request.getpeername(),))
  File "<string>", line 1, in getpeername
  File "/usr/lib/python2.6/socket.py", line 165, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
Was this a bad idea to begin with?
Should I do this manually every time?
I don't think it's a bad idea.
I don't think you need to do it manually.
The exception is a bug in paramiko's forward code sample. This has been addressed by jhalcrow in the pull request here:
https://github.com/paramiko/paramiko/pull/36
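The traceback points at getpeername() being called on self.request after self.request.close(). I have not reproduced the pull request's diff here; the sketch below only shows one way to avoid that error, which is to read the peer address while the socket is still open. It uses plain loopback sockets instead of paramiko, and `close_and_describe` and `demo` are hypothetical names for illustration:

```python
import socket


def close_and_describe(sock):
    """Close a connected socket and return the log message for it."""
    peer = sock.getpeername()  # read the address while the socket is still open
    sock.close()
    return 'Tunnel closed from %r' % (peer,)


def demo():
    # A throwaway listener on the loopback interface; port 0 lets the OS pick.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, addr = listener.accept()

    message = close_and_describe(conn)

    # Asking a closed socket for its peer fails, as in the traceback above.
    try:
        conn.getpeername()
        raised = False
    except (OSError, socket.error):
        raised = True

    client.close()
    listener.close()
    return message, addr, raised


message, addr, raised = demo()
print(message)
```

Applied to the handler in the question, this would mean storing self.request.getpeername() in a local variable before the two close() calls and using that variable in the final verbose() message.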
This post has some code to do it in a more event-driven way, i.e. if you wanted to call it via some web event hooks in your Django code or the like:
Paramiko SSH Tunnel Shutdown Issue
Hmm, I haven't tried this, but if you are on Linux, could you run
ssh -L 3306:localhost:3306 remote.host.ip
through a Python system call when DEBUG is set?
Also, if you are on Windows, try PuTTY with port forwarding.
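A minimal sketch of that idea, untested against a real server: it starts the ssh client as a child process only when a debug flag is true. `build_tunnel_command` and `start_tunnel_if_debug` are hypothetical helper names, and the `-N` flag (do not run a remote command) is my addition to the command above.

```python
import subprocess


def build_tunnel_command(remote_host, local_port=3306, remote_port=3306):
    """Return the ssh argument list for a local port forward."""
    forward_spec = '%d:localhost:%d' % (local_port, remote_port)
    # -N: forward ports only, do not start a remote shell (an assumption,
    # not part of the command suggested above).
    return ['ssh', '-N', '-L', forward_spec, remote_host]


def start_tunnel_if_debug(debug, remote_host):
    """Start the tunnel as a background process when debug is true.

    Returns the Popen object, or None when no tunnel was started.
    """
    if not debug:
        return None
    return subprocess.Popen(build_tunnel_command(remote_host))


print(build_tunnel_command('remote.host.ip'))
```

One place to call start_tunnel_if_debug(DEBUG, ...) would be wherever the project reads its settings at startup; the caller should keep the returned process object so the tunnel can be stopped with terminate() on shutdown.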