Python read variable with French characters from file - python-2.7

I have a set of text files in which variables are stored, which I am trying to read into Python. As long as the variables do not contain any French characters (e.g. é, ç), the following piece of code works well:
#!/usr/bin/python
import imp

def getVarFromFile(filename):
    f = open(filename, 'rt')
    global data
    data = imp.load_source('data', " ", f)
    f.close()
    return()

def main():
    getVarFromFile('test.txt')
    print data.Title
    print data.Language
    print data.Summary
    return()

if __name__ == "__main__":
    main()
Example output:
me#mypc:$ ./readVar.py
Monsieur Flaubert
French
A few lines of text.
However when the text file contains French characters, for instance:
Title = "Monsieur Flaubert"
Language = "Français"
Summary = "Quelques lignes de texte en Français. é à etc."
I am getting the following error for which I cannot find a solution:
Traceback (most recent call last):
  File "./tag.py", line 30, in <module>
    main()
  File "./tag.py", line 22, in main
    getVarFromFile('test.txt')
  File "./tag.py", line 15, in getVarFromFile
    data = imp.load_source('data', " ", f)
  File " ", line 2
SyntaxError: Non-ASCII character '\xc3' in file on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
How can French (utf-8) characters be handled?
Thanks for your consideration and help to this Python-learner.

You could use codecs.open:
import codecs

data = {}
with codecs.open('test.txt', encoding='utf-8') as f:
    for line in f.readlines():
        # Use some logic here to load each line into a dict, like:
        key, value = line.split(" = ")
        data[key] = value
This solution doesn't use imp; it requires that you implement your own logic to interpret the contents of the file.
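Alternatively, because imp.load_source compiles the file as Python source, the PEP 263 rules apply: declaring the encoding on the first line of test.txt should let the original imp-based approach work unchanged. Below is a minimal, self-contained sketch of the same idea with a hand-rolled parser instead of imp, so it runs on any Python version (the test.txt contents here are illustrative):

```python
# A data file whose first line declares its encoding (PEP 263),
# parsed back with a plain parser instead of imp.
import io

content = u'# -*- coding: utf-8 -*-\nTitle = "Monsieur Flaubert"\nLanguage = "Fran\u00e7ais"\n'

with io.open('test.txt', 'w', encoding='utf-8') as f:
    f.write(content)

data = {}
with io.open('test.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip the coding declaration and blank lines
        key, value = line.split(' = ', 1)
        data[key] = value.strip('"')

print(data['Language'])  # -> Français
```

With imp, only the `# -*- coding: utf-8 -*-` first line in test.txt should be needed; the parser above just keeps the example runnable without imp.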


Python 2.7, Too many values to unpack

I want to write a script that checks Bitcoin private addresses for money, reading them from a CSV file.
Python 2.7.16 64-bit on Ubuntu 19.04
import requests
from pybitcoin import BitcoinPrivateKey
import pybitcoin
import time

keys = set()
with open('results.csv') as f:
    for line in f.read().split('\n'):
        if line:
            repo_name, path, pkey = line.split(",")
            keys.add(pkey)

for priv in keys:
    try:
        p = BitcoinPrivateKey(priv)
        pub = p.public_key().address()
        r = requests.get("https://blockchain.info/rawaddr/{}".format(pub))
        time.sleep(1)
        print '{} {} {:20} {:20} {:20} '.format(priv, pub,
                                                r.json()['final_balance'],
                                                r.json()['total_received'],
                                                r.json()['total_sent'])
    except (AssertionError, IndexError):
        pass
    except ValueError:
        print r
        print r.text
Exception has occurred: ValueError
too many values to unpack
File "/home/misha/bitcoinmaker/validate.py", line 9, in <module>
repo_name, path, pkey = line.split(",")
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/runpy.py", line 82, in _run_module_code
mod_name, mod_fname, mod_loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 252, in run_path
return _run_module_code(code, init_globals, run_name, path_name)
Some CSV data (a cut-down version of the file; the original has 14042 lines):
repo_name,path,pkey
tobias/tcrawley.org,src/presentations/ClojureWestJava9/assets/AB1E09C5-A5E3-4B1D-9E3B-C2E586ACFAC2/assets/AB1E09C5-A5E3-4B1D-9E3B-C2E586ACFAC2.pdfp,----
annavernidub/MDGServer,specs/resources/xml/results.xml,----
gyfei/bitcc,routes/home.js~,----
Nu3001/external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,----
cdawei/digbeta,dchen/music/format_results.ipynb,----
bitsuperlab/cpp-play,tests/regression_tests/issue_1229_public/alice.log,----
justin/carpeaqua-template,assets/fonts/562990/65B3CCFE671D2E128.css,----
dacsunlimited/dac_play,tests/regression_tests/issue_1218/alice.log,----
amsehili/audio-segmentation-by-classification-tutorial,multiclass_audio_segmentation.ipynb,----
biosustain/pyrcos,examples/.ipynb_checkpoints/RegulonDB network-checkpoint.ipynb,----
blockstack/blockstore,integration_tests/blockstack_integration_tests/scenarios/name_pre_reg_stacks_sendtokens_multi_multisig.py,----
gitcoinco/web,app/assets/v2/images/kudos/smart_contract.svg,----
Is the CSV file too large? Or is it some syntax error? What am I missing?
This is the way I would do it, with a check and some logging added:
import csv

keys = set()
unknown_list = []
with open('results.csv') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        if len(line) == 3:
            pkey = line[2]
            keys.add(pkey)
        else:
            temp_list = []
            for i in range(len(line)):
                temp_list.append(line[i])
            unknown_list.append(temp_list)
After that you can view what lines gave you the issue by printing or logging the unknown list. You can also log at every step of the process to see where the script is breaking.
If you provide some sample data from your results.csv file it will be easier to give you an accurate answer.
In general, your line:
repo_name, path, pkey = line.split(",")
provides three variables for the values derived from splitting the line on commas, but the error means the split is producing more than three values. That happens when the path field itself contains a comma (your sample data has such rows, e.g. the gyglim/Recipes line contains four comma-separated fields).
The sample data:
sobakasu/vanitygen,README,5JLUmjZiirgziDmWmNprPsNx8DYwfecUNk1FQXmDPaoKB36fX1o
lekanovic/pycoin,tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
wkitty42/fTelnet,rip-tests/16c/SA-MRC.RIP,5HHR5GHR5CHR5AHR5AHR59HR57HR57HR54HR53HR52HR51HR4ZH
NKhan121/Portfolio,SAT Scores/SAT Project.ipynb,5Jy8FAAAAwGK6PMwpEonoxRdfVCAQaLf8yJEjeuGFFzrdr7m5We
chengsoonong/digbeta,dchen/music/format_results.ipynb,5JKSoHtHSE2Fbj3UHR4A5v4fVHFV92jN5iC9HKJ4MvRZ7Ek4Z7j
the-metaverse/metaverse,test/test-explorer/commands/wif-to-ec.cpp,5JuBiWpsjfXNxsWuc39KntBAiAiAP2bHtrMGaYGKCppq4MuVcQL
hessammehr/ChemDoodle-2D,data/spectra/ir_ACD.jdx,5K626571K149659j856919j347351J858139j136932j515732
designsters/android-fork-bitcoinj,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
HashEngineering/groestlcoinj,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
dyslexic-charactersheets/assets,languages/german/pathfinder/Archetypes/Wizard/Wizard (Familiar).pdf,5KAD7sCUfirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdi
ElementsProject/elements,src/wallet/rpcwallet.cpp,5Kb8kLf9zgWQnogidDA76MzPL6TsZZY36hWXMssSzNydYXYB9KF
pavlovdog/bitcoin_in_a_nutshell,markdown_source/bitcoin_in_a_nutshell_cryptography.md,5HtqcFguVHA22E3bcjJR2p4HHMEGnEXxVL5hnxmPQvRedSQSuT4
ValyrianTech/BitcoinSpellbook-v0.3,unittests/test_keyhelpers.py,5Jy4SEb6nqWLMeg4L9QFsi23Z2q6fzT6WMq4pKLfFqkTC389CrG
jrovegno/fci,Derechos-Agua-Petorca.ipynb,5JNZvHgxAAsWLGDJkiUsW7YMePhNGvT5UYPSHvNNfX54eHig2mM
ionux/bitforge,tests/data/privkey.json,5Ke7or7mg3MFzFuPpiTf2tBCnFQk6dR9qsbTmoE74AYWcQ8FmJv
chengsoonong/mclass-sky,projects/alasdair/notebooks/09_thompson_sampling_vstatlas.ipynb,5Keno6jz32WJXtERERNVZoCXdDhgypMhe1VmnQ54mVY1wuV62r
surikov/webaudiofontdata,sound/12842_4_Chaos_sf2_file.js,5HGw2nKaKhUAtyFG6aLMpTRULLRiTyHBCMFnFRWg6BsaULCwta
malikoski/wabit,src/main/resources/ca/sqlpower/wabit/example_workspace.wabit,5HSGKWPp2yPSqBzMxKyUfKyeuRybCzdi5cxV6Nmur8q78gyRLc6
GemHQ/money-tree,spec/lib/money-tree/address_spec.rb,5JXz5ZyFk31oHVTQxqce7yitCmTAPxBqeGQ4b7H3Aj3L45wUhoa
wkitty42/fTelnet,rip-tests/16c/SS-TT1.RIP,5KA85HAB5GAH5CAH5CAC5EA55FA95AAG56AG54A559A55AA955A
norsween/data-science,springboard-answers-to-exercises/Springboard Data Story Exercise.ipynb,5HA4jkQHgtPh9AsAKhoqUpB7FJykpSW1tbYqLi1NdXV3QXnNzs
alisdev/dss,dss-xades/src/test/resources/plugtest/esig2014/ESIG-XAdES/ES/Signature-X-ES-31.xsig,5HbuDMDxVSaWBvVRPSoaL8S6Vdtv3yBkz5bSgjAo1YKQn4q435
AmericasWater/awash,docs/compare-simulate.ipynb,5JAYT9xf2vChY6fDgWPnohZnfsxhmqA2yBRs1HHskMikDkvKto
coincooler/coincooler,spec/helpers/data_helper_spec.rb,5J2PLz9ej2k7c1UEfQANfQgLsZnFVeY5HjZpnDe1n6QSKXy1zFQ
magehost/magento-malware-scanner,corpus/backend/8afb44c2bd8e4ed889ec8a935d3b3d38,5HcfPTZ4JTqg6zdvKSRv92dUzgDEZBuvWiucdiZ7KS8UJWDU3y
daily-wallpapers/daily-wallpapers.github.io,2018/12/22/bd05d9380ae4459588eef75e0e25fc2c.html,5KafK34DXHztjFfL51J48dSvXUxtWW5akij5RVYEAaR228Kq7j
Grant-Redmond/cryptwallet,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
ryanralph/DIY-Piper,keys.txt,5JYiNDZgZrH9sDR6FC9XSG175FoBDKPrrt6eyyKxPCdQ1AWJgDD
FranklinChen/IHaskell,notebooks/IHaskell.ipynb,5Ktqn7AjR45QoEABnJ2d5Sd6hRBCPBaLxUJCQgJxcXGEhoZKwD
djredhand/epistolae,sites/all/modules/panopoly_demo/panopoly_demo.features.content.inc,5HbHJ6fWvXPDfhKbxT4ein8RTyw2kgDC3A2sR2J9PpXrGjabpvh
SAP/cloud-portal-tutorial,Image Gallery.html,5KepoCFUEsCQYFGkU3AvgK2AxqEjAp3CjjpaTCeWkdjQ3MDCznq
yueou/nyaa,public/img/mafuyu.svg,5K4do2Fj2wgV9FPLkoWz3Sk9au3M3BQStA5TmvvfYaL798nfPnz
dwiel/tensorflow_hmm,notebooks/gradient_descent_example.ipynb,5JXNt3LQB9NAgYHU2naQLmdJwBPP1ML2R21QS4cAYAR2frp87Z
imrehg/electrum,lib/tests/test_account.py,5Khs7w6fBkogoj1v71Mdt4g8m5kaEyRaortmK56YckgTubgnrhz
thirdkey-solutions/pycoin,tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
asm-products/AssemblyCoins,API/Whitepaper.md,5JYVttUTzATan4zYSCRHHdN2nfJJHv6Nu1PB6VnhWSQzQRxnyLa
karlobermeyer/numeric-digit-classification,numeric_digit_classification.baseline.ipynb,5KaGgoQUFBvPTSS1y5cqVRTXV1NTNnziQ2NpZ27doRFhZGSkoK
richardkiss/pycoin,tests/cmds/test_cases/ku/bip32_keyphrase.txt,5JvNzA5vXDoKYJdw8SwwLHxUxaWvn9mDea6k1vRPCX7KLUVWa7W
paulgovan/AutoDeskR,vignettes/AutoDeskR.html,5Jb6vUZGXEC1AcK2vzQW5oApRuJseyHkerTHgp4pbMbU5t5kgV
trehansiddharth/mlsciencebowl,git/hs/archive/357.txt.mail,5KAAACAAgALgABAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAEEAAA
mermi/EFD2015,img/weblitmap-1.1.svg,5H3756AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPj78
breadwallet/breadwallet-core,Java/Core/src/test/java/com/breadwallet/core/BRWalletManager.java,5Kb8kLf9zgWQnogidDA76MzPL6TsZZY36hWXMssSzNydYXYB9KF
zsoltii/dss,dss-xades/src/test/resources/plugtest/esig2014/ESIG-XAdES/ES/Signature-X-ES-74.xsig,5HbuDMDxVSaWBvVRPSoaL8S6Vdtv3yBkz5bSgjAo1YKQn4q435
gyglim/Recipes,examples/ImageNet Pretrained Network (VGG_S).ipynb,data/text/MATRYOSHKA-CHALLENGE,5HW67XgqGZKakwVrpftp9bzQFBig1gfWCPUTUyxWaVCCcfLV17
dyslexic-charactersheets/assets,languages/spanish/pathfinder/Archetypes/Druid/Plains Druid (Animal Companion).pdf,5J5ut7i5STzG9rLppih1CFLy8kKSTK8sfwpHXeJa99hUnfChHRf
keith-epidev/VHDL-lib,top/stereo_radio/ip/xfft/c_addsub_v12_0/hdl/c_addsub_v12_0_legacy.vhd,5JW64dZ8AXjc3DEXpwxS1wcUvakyfxBHNyPgk9SseKsP4PojeGV
gitonio/pycoin,COMMAND-LINE-TOOLS.md,5KhoEavGNNH4GHKoy2Ptu4KfdNp4r56L5B5un8FP6RZnbsz5Nmb
techdude101/code,C Sharp/SNMP HDD Monitor v0.1/SNMP HDD Monitor/mainForm.resx,5KSkv8FBQXjAAAASQAAACsAAAAcAAAADwAAAAUAAAABAAAAAAAA
RemitaBit/Remitabit,tests/regression_tests/short_below_feed/alice.log,5HpUwrtzSztqQpJxVHLsrZkVzVjVv9nUXeauYeeSxguzcmpgRcK
SportingCRED/sportingcrdblog,desempenho/jonathan_rosa.html,5HRm4g8amexBY4KFhCyBVy5Db6UNkSKAgKo2ogXibQxfSDVmjh
bashrc/zeronet-debian,src/src/Test/TestSite.py,5JU2p5h3R7B1WrbaEdEDNZR7YHqRLGcjNcqwqVQzX2H4SuNe2ee
martindale/fullnode,test/privkey.js,5JxgQaFM1FMd38cd14e3mbdxsdSa9iM2BV6DHBYsvGzxkTNQ7Un
atsuyim/ZeroNet,src/Content/ContentManager.py,5JCGE6UUruhfmAfcZ2GYjvrswkaiq7uLo6Gmtf2ep2Jh2jtNzWR
greenfield-innovation/greenfield-innovation.github.io,_site/examples/visual/index.html,5JZUUxKS1NzVaFe8Rt21aD6ELyD1n8FCadw334UkZHAXb3NSyZ
JorgeDeLosSantos/master-thesis,src/ch3/parts_01.svg,5KWSJQ8D2d3T6SNwwbSs3Mwh7RyV3qRbSUbf13GNmML9zN1FvC
JaviMerino/lisa,ipynb/tutorial/00_LisaInANutshell.ipynb,5JB3VPWgYAAACALLBw4cKwQ6hm3rx5YYcQCZxRRVbiXnHwifyCT
nopdotcom/2GIVE,src/vanitygen-master/README,5JLUmjZiirgziDmWmNprPsNx8DYwfecUNk1FQXmDPaoKB36fX1o
dacsunlimited/dac_play,tests/regression_tests/collected_fees/alice.log,5J3SQvvxRK4RfzFDcWZR5sLRkjrMvTn1FKXnzNGvWLgWdctLDQm
bussiere/ZeroNet,src/Content/ContentManager.py,5JCGE6UUruhfmAfcZ2GYjvrswkaiq7uLo6Gmtf2ep2Jh2jtNzWR
loon3/Tokenly-Pockets,Chrome Extension/js/bitcoinsig.js,5JeWZ1z6sRcLTJXdQEDdB986E6XfLAkj9CgNE4EHzr5GmjrVFpf
openledger/graphene-ui,web/lib/common/trxHelper.js,5KikQ23YhcM7jdfHbFBQg1G7Do5y6SgD9sdBZq7BqQWXmNH7gqo
tensorflow/probability,tensorflow_probability/examples/jupyter_notebooks/Factorial_Mixture.ipynb,5JBZJUGinA9k9twEaTFKNRiorTP6NEoJYzAhoBzRqBrpCiu8Bt
mostafaizz/Face_Landmark_Localization,BioID-FaceDatabase/BioID_0380.pgm,5JdWRQPQTTYWPRPTUSUPSUUXRSTSTPSXUSQVXSSPQQPTSSUTXY
number7team/SevenCore,test/test.PrivateKey.js,5KS4jw2kT3VoEFUfzgSpX3GVi7qRYkTfwTBU7qxPKyvbGuiVj33
thephez/electrum,lib/tests/test_account.py,5Khs7w6fBkogoj1v71Mdt4g8m5kaEyRaortmK56YckgTubgnrhz
surikov/webaudiofontdata,sound/12842_7_Chaos_sf2_file.js,5HGw2nKaKhUAtyFG6aLMpTRULLRiTyHBCMFnFRWg6BsaULCwta
CMPUT301F14T07/Team7-301Project,UML/UML.class.violet.html,5HSUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAHNePP
kernoelpanic/tinker-with-usbarmory,README.md,5JrAdQp23Zkqi4NFwSGoh6kftHochz56ctxuFVemX1vy4KozLvV
benzwjian/bitcoin-fp,test/ecpair.js,5J3mBbAH58CpQ3Y5RNJpUKPE62SQ5tfcvU2JpbnkeyhfsYB1Jcn
zaki/slave,SLave/SLaveForm.resx,5H3v8s6YfyJQZApinCSUNL6T32hHUDnneGdaiwCa8ekodUyfF6Z
bokae/szamprob,notebooks/Package04/pi.ipynb,5J5VdLSedNJqWbYso6TUxoj52bfA1XwmZse2N1TdMGvdgscMP3
sciant/sciant.github.io,assets/desktop/Oval.svg,5HCxg2h2kWtAYr8RmVxHSgYzBTDQwiZrD25K75ZbazTeJpHN1rK
UCL-BLIC/legion-buildscripts,cytofpipe/v1.3/Rlibs/openCyto/doc/openCytoVignette.html,5KEPZdiwYXq9nkajPfT2oiCgPnqU5xT1zhyLSE1fM74n8e8Pgs
xJom/mbunit-v3,src/Extensions/Icarus/Gallio.Icarus/Reload/ReloadDialog.resx,5Hsyo5SiKUCgUtJRSSyk151wrpTRjTANQSRzBrGfWiJc1BMePHz
GroestlCoin/bitcoin,src/wallet/test/psbt_wallet_tests.cpp,5KSSJQ7UNfFGwVgpCZDSHm5rVNhMFcFtvWM3zQ8mW4qNDEN7LFd
gitcoinco/web,app/assets/v2/images/kudos/doge.svg,5JCWfzJMfuHtoYFVMBYxXDbGLbpib6af32A5tEX9eTfbiEDyKy9
moocowmoo/dashman,lib/pycoin/tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
reblws/tab-search,src/static/lib/fonts.css,5KgjXgnXgNVBJ3HpaugH8GWwEr4NNgL2D9ofN25TvnfjvYkXWsK
schildbach/bitcoinj,core/src/test/java/org/bitcoinj/core/Base58Test.java,5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsreAbuatmU
QuantEcon/QuantEcon.notebooks,ddp_ex_career_py.ipynb,5JkqSRsSBaCKrqjiSvAv4fsAg4q6quGnJYkiSNjAWREABU1T8D
moncho/warpwallet,bitcoin/bitcoin_test.go,5JfEekYcaAexqcigtFAy4h2ZAY95vjKCvS1khAkSG8ATo1veQAD
BigBrother1984/android_external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,5Js3pmbMmKEaNmyoRowYoTj2kHb2MBW74ap169bq7NmzrvFDhg
taishi107/K-means_clustering,K-means_clustering3.ipynb,5HLKFQZeSqRSCTPGTLyVCKRSJ4zpGKXSCSS5wyp2CUSieQ5Qyp2
keith-epidev/VHDL-lib,top/stereo_radio/ip/xfft/xfft_v9_0/hdl/shift_ram.vhd,5HAT5FAXSTzJUoP6GBwzVHhyeEqVpX4CC8pfA8TEjGyxPv2tkY
sawatani/bitcoin-hall,test/Fathens/Bitcoin/Wallet/KeysSpec.hs,5JHm1YEyRk4dV8KNN4vkaFeutqpnqAVf9AWYi4cvQrtT5o57dPR
ivansib/sib16,src/test/data/base58_keys_valid.json,5KEyMKs1jynRbTfpGFPveXyxMcfZb1X9SnR3TneYQwRtXdzkzhL
ledeprogram/algorithms,class4/homework/najmabadi_shannon_4_3.ipynb,5Jfp5kZKyPAjkSNQwje6pGVVFbXpuUV1teS9WoqgLVKNiYTdQw
desihub/desitarget,doc/nb/connecting-spectra-to-mocks.ipynb,5JjZv3ozVq1ebXJfqjGtZW4bCJkVFRcjMzHToGpMmTcKJEycE97
eneldoserrata/marcos_openerp,addons/fleet/fleet_cars.xml,5KJ9TQiWUeuAtrhEWjgCZwAbgg3VLU4Ecg8pQCRBRmUpFGPSLex
JCROM-Android/jcrom_external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,5Js3pmbMmKEaNmyoRowYoTj2kHb2MBW74ap169bq7NmzrvFDhg
alixaxel/dump.HN,data/items/2014/03/08/23-24.csv,5HpHagT65TZzG1PH3CSu63k8DbpvD9KsvQVUCsn2t55TVA1jxW7
qutip/qutip-notebooks,development/development-ssesolve-tests.ipynb,5KM85KMs5CXg9ZCzHbZ3DPJa2wDP7uzm6e1dDKYtykNe3r26iT
bitshares/devshares,tests/regression_tests/short_below_feed/alice.log,5HpUwrtzSztqQpJxVHLsrZkVzVjVv9nUXeauYeeSxguzcmpgRcK
denkhaus/bitsharesx,tests/regression_tests/titan_test/client1.log,5JMnSU8bfBcu67oA9KemNm5jbs9RTp2eBHqxoR53WWyB4CH2QJF
ivanfoong/practical-machine-learning-assessment,building_human_activity_recognition_model.html,5Jno7z6Py8fkQkwASbABJgAE2AC9U5g4cKFcmoy4jheeeWVHh1
vikashvverma/machine-learning,mlfoundation/istat/project/investigate-a-dataset-template.ipynb,5KMV5Jz3ytfN7q9KTC4AqBPiLPhRwjLowiY19xaGhmRJ93auJjF
partrita/partrita.github.io,posts/altair/index.html,5KWn2EcAX6ibXaCEEXvTSJeP6T445Sc9mPreCPPrDUmX4cNegw
OriolAbril/Statistics-Rocks-MasterCosmosUAB,Rico_Block3/Block3_HT.ipynb,5HxYmbvSepRw6a73P3VoM9dipzifb4xaztSfWptxuqcdhvRM7Pj
wildbillcat/MakerFarm,MakerFarm/Migrations/201401212104523_Formatted external to humanfriendly column name.resx,5K1zhDkW7JZWeJsS9brtA8fojjTko3p1edWzMukZHXw5GZRciJN
nccgroup/grepify,Win.Grepify/Win.Grepify/Form1.resx,5J6AMbRdASzmN4pnx622HrzrEm7attVmygUetm88aLpJkAKMRUy
voytekresearch/misshapen,demo_Shape dataframe.ipynb,5HRUdGxuLsLAwBAcH49WrV3j27Bnu3bsns56oPCP1GjRokKdrA
SirmaITT/conservation-space-1.7.0,cs-models/Partners definitions/SMK/template/treatmentreportmultipleobjecttemplate.xml,5KoJspjvTDGPkPbDHC2NtohN1MKypc69Fiy9LFepfg6tt4S3orU
4flyers/4flyers.github.io,img/portfolio/portfolio_projects.svg,5KYWXACSGEEHJxmxnZB2HBQed3MyWEEEJ6ncA3zhVoU2Wxy69J
frrp/bitshares,tests/regression_tests/issue_1229_titan/alice.log,5KasHemYTcbGtHXKHNx5sUMPrrz8r4GuU3ao157F6Wx95y7NnbN
wkitty42/fTelnet,rip-tests/16c/JE-APOC.RIP,5HF56H958H15CGW5DGS5DGL5EGG5EG75EG15DFT5CFN5CFJ5AF
iobond/bitcore-old,test/test.WalletKey.js,5KMpLZExnGzeU3oC9qZnKBt7yejLUS8boPiWag33TMX2XEK2Ayc
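Since the private key is always the last field and the repo name is the first, one way to tolerate commas inside the path column is to split once from the left and once from the right. A minimal sketch (the rows below are shortened stand-ins for the real data):

```python
# Split each row from both ends so commas inside the path
# column don't break the three-field unpacking.
rows = [
    "tobias/tcrawley.org,some/path.pdf,5JLUmjZ",
    "gyglim/Recipes,path,with,commas.ipynb,5HW67X",  # path contains commas
]

keys = set()
for line in rows:
    repo_name, rest = line.split(",", 1)  # first comma: repo vs rest
    path, pkey = rest.rsplit(",", 1)      # last comma: path vs key
    keys.add(pkey)

print(sorted(keys))  # -> ['5HW67X', '5JLUmjZ']
```

The csv-module approach in the answer is still preferable for logging the odd rows, but split/rsplit is enough when only the first and last columns matter.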

Errno 22 when using shutil.copyfile on dictionary values in python

I am getting an error message that I can't seem to resolve. I have a CSV file that I am reading in order to generate PDF files based on the county each map falls in. If there is only one map in a county, I do not need to append files (that code is TBD once this hurdle is resolved, as I am sure I will run into the same issue when using PyPDF2) and simply want to copy the map to a new directory under a new name. shutil.copyfile does not seem to recognize the path as valid for County3, which meets the condition to execute this command.
Map.csv file
County Maps
County1 C:\maps\map1.pdf
County1 C:\maps\map2.pdf
County2 C:\maps\map1.pdf
County2 C:\maps\map3.pdf
County3 C:\maps\map3.pdf
County4 C:\maps\map2.pdf
County4 C:\maps\map3.pdf
County4 C:\maps\map4.pdf
My code:
import csv, os
import shutil
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter

merged_file = PdfFileMerger()
counties = {}

with open(r'C:\maps\Maps.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=",")
    for n, row in enumerate(reader):
        if not n:
            continue
        county, location = row
        if county not in counties:
            counties[county] = list()
        counties[county].append(location)

for k, v in counties.items():
    newPdfFile = 'C:\maps\Maps\JoinedMaps\County-' + k + '.pdf'
    if len(str(v).split(',')) > 1:
        print newPdfFile
    else:
        shutil.copyfile(str(v), newPdfFile)
        print 'v: ' + str(v)
Feedback message:
C:\maps\Maps\JoinedMaps\County-County4.pdf
C:\maps\Maps\JoinedMaps\County-County1.pdf
v: ['C:\\maps\\map3.pdf']
Traceback (most recent call last):
  File "<module2>", line 22, in <module>
  File "C:\Python27\ArcGIS10.5\lib\shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 22] invalid mode ('rb') or filename: "['C:\\\\maps\\\\map3.pdf']"
There are no blank lines in the csv file. In the csv file I tried changing the back slashes to forward slashes, double slashes, etc. I still get the error message. Is it because data is returned in brackets? If so, how do I strip these?
You are actually trying to open a file named ['C:\maps\map3.pdf']; you can tell because the error message shows the filename it's trying to open:
IOError: [Errno 22] invalid mode ('rb') or filename: "['C:\\\\maps\\\\map3.pdf']"
This value comes from the fact that you are converting the dictionary value, which is a list, to a string here:
shutil.copyfile(str(v),newPdfFile)
What you need to do is check if the list has more than one member or not, then step through each member of the list (the v) and copy the file.
for k, v in counties.items():
    newPdfFile = r'C:\maps\Maps\JoinedMaps\County-' + k + '.pdf'
    if len(v) > 1:
        print newPdfFile
    else:
        for filename in v:
            shutil.copyfile(filename, newPdfFile)
            print('v: {}'.format(filename))
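The root cause can be reproduced without any PDFs: str() on a one-element list yields the bracketed repr string, while indexing into the list yields the plain path. A minimal sketch using a temporary directory (the file contents are dummy bytes):

```python
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'map3.pdf')
with open(src, 'wb') as f:
    f.write(b'%PDF-1.4 dummy')

v = [src]                   # a dict value here is a *list* of paths
print(str(v))               # bracketed repr -- not a valid filename

dst = os.path.join(tmpdir, 'County-County3.pdf')
shutil.copyfile(v[0], dst)  # index into the list to get the real path
print(os.path.exists(dst))  # -> True
```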

python readline from big text file

When I run this:
import os.path
import pyproj

srcProj = pyproj.Proj(proj='longlat', ellps='GRS80', datum='NAD83')
dstProj = pyproj.Proj(proj='longlat', ellps='WGS84', datum='WGS84')

f = file(os.path.join("DISTAL-data", "countries.txt"), "r")
heading = f.readline()  # Ignore field names.

with open('C:\Python27\DISTAL-data\geonames_20160222\countries.txt', 'r') as f:
    for line in f.readlines():
        parts = line.rstrip().split("|")
        featureName = parts[1]
        featureClass = parts[2]
        lat = float(parts[9])
        long = float(parts[10])
        if featureClass == "Populated Place":
            long, lat = pyproj.transform(srcProj, dstProj, long, lat)
f.close()
I get this error:
File "C:\Python27\importing world datacountriesfromNAD83 toWGS84.py", line 13, in <module>
    for line in f.readlines():
MemoryError
I downloaded the countries file from http://geonames.nga.mil/gns/html/namefiles.html as the entire country file dataset.
Please help me get past this.
readlines() on a large file builds the whole list of lines in memory. Instead, iterate over the file object, which reads one line at a time:
f = open('somefilename', 'r')
for line in f:
    dosomething()
The answer given by Yael is helpful; I would like to improve it. A good way to read a (large) file:
with open(filename) as f:
    for line in f:
        print line
I like to use the 'with' statement, which ensures the file will be properly closed.
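Applied to the question's loop, the fix is simply to drop .readlines(); iterating over the file object is lazy. A self-contained sketch with a small stand-in file (the pyproj reprojection is omitted, and the sample rows are made up):

```python
import io

# Build a tiny stand-in for countries.txt ('|'-separated columns).
sample = u"ID|NAME|CLASS\n1|Paris|Populated Place\n2|Seine|Stream\n"
with io.open('countries_sample.txt', 'w', encoding='utf-8') as f:
    f.write(sample)

populated = []
with io.open('countries_sample.txt', encoding='utf-8') as f:
    next(f)            # skip the header line
    for line in f:     # lazy: one line in memory at a time
        parts = line.rstrip().split('|')
        if parts[2] == 'Populated Place':
            populated.append(parts[1])

print(populated)  # -> ['Paris']
```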

how to lookup the numbers next to character using python

This is just part of a long Python script. There is a file called aqfile with many parameters. I would like to extract the values next to "OWNER" and "NS".
Note:
OWNER = text
NS = numbers
I could extract what is next to OWNER, because those values were just text:
for line in aqfile.readlines():
    if string.find(line, "OWNER") > 0:
        print line
        m = re.search('<(.*)>', line)
        owner = incorp(m.group(1))
        break
but when i try to modify the script to extract the numbers
for line in aqfile.readlines():
    if string.find(line, "NS") > 0:
        print line
        m = re.search('<(.*)>', line)
        ns = incorp(m.group(1))
        break
it doesn't work any more.
Can anyone help me?
This is the whole script:
#Make a CSV file of dataset names, pulseprog and, if available, (part of) the title
#Note: the whole file tree is read into memory!!! Do not start too high in the tree!!!
import os
import os.path
import fnmatch
import re
import string

max = 20000
outfiledesc = 0

def incorp(c):
    #Replace " with """ and CR/LF with blanks
    c = c.replace('"', '"""')
    c = c.replace("\r", " ")
    c = c.replace("\n", " ")
    return "\"%s\"" % (c)

def process(arg, root, files):
    global max
    global outfiledesc
    #Get name, expno, procno from the root
    if "proc" in files:
        procno = incorp(os.path.basename(root))
        oneup = os.path.dirname(root)
        oneup = os.path.dirname(oneup)
        aqdir = oneup
        expno = incorp(os.path.basename(oneup))
        oneup = os.path.dirname(oneup)
        dsname = incorp(os.path.basename(oneup))
        #Read the title file, if any
        if os.path.isfile(root + "/title"):
            f = open(root + "/title", "r")
            title = incorp(f.read(max))
            f.close()
        else:
            title = ""
        #Grab the pulse program name from the acqus parameter
        aqfile = open(aqdir + "/acqus")
        for line in aqfile.readlines():
            if string.find(line, "PULPROG") > 0:
                print line
                m = re.search('<(.*)>', line)
                pulprog = incorp(m.group(1))
                break
        towrite = "%s;%s;%s;%s;%s\n" % (dsname, expno, procno, pulprog, title)
        outfiledesc.write(towrite)

#Main program
dialogline1 = "Starting point of the search"
dialogline2 = "Maximum length of the title"
dialogline3 = "output CSV file"
def1 = "/opt/topspin3.2/data/nmrafd/nmr"
def2 = "20000"
def3 = "/home/nmrafd/filelist.csv"
result = INPUT_DIALOG("CSV file creator", "Create a CSV list", [dialogline1, dialogline2, dialogline3], [def1, def2, def3])
start = result[0]
tlength = int(result[1])
outfile = result[2]

#Search for procs files. They should be in any dataset.
outfiledesc = open(outfile, "w")
print start
os.path.walk(start, process, "")
outfiledesc.close()
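A likely reason the `<(.*)>` regex works for PULPROG but not for NS: in the usual Bruker acqus (JCAMP-DX) layout, string parameters are wrapped in angle brackets while numeric ones are bare, e.g. `##$NS= 16`, so the angle-bracket search finds nothing and m is None. A minimal sketch of a numeric match, assuming that layout (the sample lines are illustrative):

```python
import re

# String parameters use <...>, numeric parameters are bare.
lines = [
    "##$PULPROG= <zg30>",
    "##$NS= 16",
]

ns = None
for line in lines:
    m = re.search(r'##\$?NS=\s*(\d+)', line)  # anchor on the parameter name
    if m:
        ns = int(m.group(1))
        break

print(ns)  # -> 16
```

Anchoring on `##$NS=` also avoids the string.find(line, "NS") problem, where "NS" matches inside unrelated parameter names.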

How to use list of strings as training data for svm using scikit.learn?

I am using scikit.learn to train an svm based on data where each observation (X) is a list of words. The tags for each observation (Y) are floating point values. I have tried following the example given in the scikit learn documentation (http://scikit-learn.org/stable/modules/svm.html) for Multi-class classification.
Here is my code:
from __future__ import division
from sklearn import svm
import os.path
import numpy
import re

'''
The stanford-postagger was included to see how it tags the words and to see if it would help in getting just the names
of the ingredients. Turns out it's pointless.
'''
#from nltk.tag.stanford import POSTagger
mainDirectory = './nyu/PROJECTS/Epicurious/DATA/ingredients'
#st = POSTagger('/usr/share/stanford-postagger/models/english-bidirectional-distsim.tagger','/usr/share/stanford-postagger/stanford-postagger.jar')

'''
This is where we read each line of the file and then run a regex match on it to get all the words before
the first tab. (These are the names of the ingredients. Some of them may have adjectives like fresh, peeled, cut etc.
Not sure what to do about them yet.)
'''
def getFileDetails(_filename, _fileDescriptor):
    rankingRegexMatch = re.match('([0-9](?:\_)[0-9]?)', _filename)
    if len(rankingRegexMatch.group(0)) == 2:
        ranking = float(rankingRegexMatch.group(0)[0])
    else:
        ranking = float(rankingRegexMatch.group(0)[0] + '.' + rankingRegexMatch.group(0)[2])
    _keywords = []
    for line in _fileDescriptor:
        m = re.match('(\w+\s*\w*)(?=\t[0-9])', line)
        if m:
            _keywords.append(m.group(0))
    return [_keywords, ranking]

'''
Open each file in the directory and pass the name and file descriptor to getFileDetails
'''
def this_is_it(files):
    _allKeywords = []
    _allRankings = []
    for eachFile in files:
        fullFilePath = mainDirectory + '/' + eachFile
        f = open(fullFilePath)
        XandYForThisFile = getFileDetails(eachFile, f)
        _allKeywords.append(XandYForThisFile[0])
        _allRankings.append(XandYForThisFile[1])
    #_allKeywords = numpy.array(_allKeywords, dtype=object)
    svm_learning(_allKeywords, _allRankings)

def svm_learning(x, y):
    clf = svm.SVC()
    clf.fit(x, y)

'''
This just prints the directory path and then calls the callback x on files
'''
def print_files(x, dir_path, files):
    print dir_path
    x(files)

'''
code starts here
'''
os.path.walk(mainDirectory, print_files, this_is_it)
When the svm_learning(x,y) method is called, it throws me an error:
Traceback (most recent call last):
  File "scan for files.py", line 72, in <module>
    os.path.walk(mainDirectory, print_files, this_is_it)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.py", line 238, in walk
    func(arg, top, names)
  File "scan for files.py", line 68, in print_files
    x(files)
  File "scan for files.py", line 56, in this_is_it
    svm_learning(_allKeywords, _allRankings)
  File "scan for files.py", line 62, in svm_learning
    clf.fit(x, y)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/svm/base.py", line 135, in fit
    X = atleast2d_or_csr(X, dtype=np.float64, order='C')
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 116, in atleast2d_or_csr
    "tocsr")
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 96, in _atleast2d_or_sparse
    X = array2d(X, dtype=dtype, order=order, copy=copy)
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_bbcfcf6_20130307-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 331, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
Can anyone help? I am new to scikit and could not find any help in the documentation.
You should take a look at Text feature extraction. You are going to want to use either a TfidfVectorizer, a CountVectorizer, or a HashingVectorizer (if your data is very large). These components take your text in and output feature matrices that are acceptable to classifiers. Be advised that they work on lists of strings, with one string per example, so if you have a list of lists of strings (i.e. you have already tokenized), you may need to either join() the tokens to get a list of strings, or skip tokenization.