Couchbase Python SDK ASCII exception - python-2.7

First of all, this is the exception:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\scrapy\middleware.py", line 62, in _process_chain
return process_chain(self.methods[methodname], obj, *args)
File "C:\Python27\lib\site-packages\scrapy\utils\defer.py", line 65, in process_chain
d.callback(input)
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 383, in callback
self._startRunCallbacks(result)
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 491, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 578, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "D:\ScrapyProjects\General_Spider_code_version_4\General_Spider_code_version_4\pipelines.py", line 14, in process_item
connection.set(fileName, dict(item)) #write the item to the couchbase database
File "C:\Python27\lib\site-packages\couchbase-1.2.5-py2.7-win-amd64.egg\couchbase\connection.py", line 331, in set
persist_to, replicate_to)
File "C:\Python27\lib\site-packages\couchbase-1.2.5-py2.7-win-amd64.egg\couchbase\_bootstrap.py", line 99, in _json_encode_wrapper
return json.dumps(*args, ensure_ascii=False, separators=(',', ':'))
File "C:\Python27\lib\json\__init__.py", line 250, in dumps
sort_keys=sort_keys, **kw).encode(obj)
File "C:\Python27\lib\json\encoder.py", line 210, in encode
return ''.join(chunks)
couchbase.exceptions.ValueFormatError: <Couldn't encode value, inner_cause='ascii' codec can't decode byte 0xe2 in position 5: ordinal not in range(128), C Source=(src\convert.c,131), OBJ={'bathrooms': 1.0, 'furnished': 'No', 'ad_title': 'Large Studio For Rent in IMPZ just 45K/4chqs(KK)', 'agent_fees': -1, 'size': 550.0, 'category': 'Apartment', 'company_rera_number': '12913', 'agent_company': 'AL ANAS REAL ESTATE BROKER', 'ded_licence_number': '700590', 'source': 'dubizzleproperty', 'location': 'UAE \xe2\x80\xaa>\xe2\x80\xaa Dubai \xe2\x80\xaa>\xe2\x80\xaa IMPZ International Media Production Zone ; 3.1 km from Meadows Town Centre \xc2\xa0', 'image_links': [u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/73ff34e2a38c7b104401c9e5c54b03628971053f/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/24ec831f6b4afb47fecc1c3e0991cf3090c90c24/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/77fee11394090aaea2d668cfe2754b92d6e36264/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/5d4113319ccbabcdd65b0ffe7302da59b374b5fe/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/8070689f309759d5860e97aa35d3f0eac425dc1d/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/8e86702847e69d485d147629dd2e48e1ad831e63/main.jpeg'], 'latitude': -1, 'description': 'Central A/C & Heating , Balcony , Shared Pool , Built in Wardrobes , Walk-in Closet , Shared Gym , Security , Built in Kitchen Appliances', 'bedrooms': 'Studio', 'rent_is_paid': 'Quarterly', 'action': 'Rent', 'link': 'http://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2015/2/18/large-studio-for-rent-in-impz-just-45k4chq-2/?back=ZHViYWkuZHViaXp6bGUuY29tL3Byb3BlcnR5LWZvci1yZW50L3Jlc2lkZW50aWFsL2FwYXJ0bWVudGZsYXQv&pos=1', 'longitude': -1, 'property_reference': '', 'yearly_cost': 45000.0, 'agent_mobile': -1, 'posting_date': '2015-02-19'}>
I am trying to store a dictionary in Couchbase. I am using this code:
connection.set(fileName, dict(item))
to convert the item to a dictionary and store it. As you can see from the error message, I have Unicode values, which according to the Couchbase Python SDK should be fine. Could you help me please?

Your values are not unicode. Keep in mind that a str object containing valid UTF-8 byte sequences is not automatically "Unicode" in Python parlance. You need to decode those strings into unicode objects to make them properly Unicode.
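For illustration, a minimal sketch of what "properly Unicode" means here, using the location value from the item above (assuming its bytes are UTF-8, as the \xe2\x80\xaa sequences suggest):
raw = 'UAE \xe2\x80\xaa>\xe2\x80\xaa Dubai'  # str: raw UTF-8 bytes, not unicode
text = raw.decode('utf-8')                   # unicode: u'UAE \u202a>\u202a Dubai'
Decoding every str value of the item like this (or collecting the values as unicode in the first place) is what makes the document encodable.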
This seems to work with the plain json.dumps() function (without any extra arguments), whereas the Python client passes ensure_ascii=False by default to decrease the data size (JSON itself can be in UTF-8 encoding and is not limited to ASCII).
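To see the difference in isolation, here is a minimal sketch for Python 2.7 with a hypothetical dictionary that, like the item above, mixes unicode and byte strings:
import json

d = {'a': u'plain unicode', 'b': 'caf\xc3\xa9'}  # 'b' holds UTF-8 bytes in a str
json.dumps(d)                      # works: ensure_ascii=True (the default) escapes everything to pure ASCII
json.dumps(d, ensure_ascii=False)  # UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 ...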
Thus, a workaround may be to set your own encoding function for JSON which does not pass the ensure_ascii parameter; like so:
import json
import couchbase

# Replace the SDK's default JSON encoder (which passes ensure_ascii=False)
# with the plain json.dumps/json.loads pair:
couchbase.set_json_converters(json.dumps, json.loads)
This workaround is not ideal, though, as the ASCII-escaped output may inflate your document size slightly.

Related

Why am I getting this error "google.api_core.exceptions.ResourceExhausted: 429 received trailing metadata size exceeds limit"?

I am new to Google Cloud Platform. I created an endpoint after uploading a model to Google Vertex AI, but when I run the prediction function (Python) suggested in the sample request, I get this error:
Traceback (most recent call last):
File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\grpc_helpers.py", line 67, in error_remapped_callable
return callable_(*args, **kwargs)
File "C:\Users\My\anaconda3\lib\site-packages\grpc\_channel.py", line 923, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "C:\Users\My\anaconda3\lib\site-packages\grpc\_channel.py", line 826, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "received trailing metadata size exceeds limit"
debug_error_string = "{"created":"@1622724354.768000000","description":"Error received from peer ipv4:***.***.***.**","file":"src/core/lib/surface/call.cc","file_line":1063,"grpc_message":"received trailing metadata size exceeds limit","grpc_status":8}">
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "b.py", line 39, in <module>
predict_custom_trained_model_sample(
File "b.py", line 28, in predict_custom_trained_model_sample
response = client.predict(
File "C:\Users\My\anaconda3\lib\site-packages\google\cloud\aiplatform_v1\services\prediction_service\client.py", line 445, in predict
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\gapic_v1\method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\grpc_helpers.py", line 69, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.ResourceExhausted: 429 received trailing metadata size exceeds limit
The code that I executed for prediction is:
from typing import Dict

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value


def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # The format of each instance should conform to the deployed model's prediction input schema.
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]
    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    # The predictions are a google.protobuf.Value representation of the model's predictions.
    predictions = response.predictions
    for prediction in predictions:
        print(" prediction:", dict(prediction))
After running this code I got the error above.
If anyone knows about this issue, please help.
A few things to consider:
Profile your custom container model and make sure its predict API function isn't unexpectedly slow
Allow your prediction service to serve using multiple workers
Increase the number of replicas in Vertex or switch to stronger machine types, as long as you keep seeing improvement
However, there is something worth doing first on the client side, assuming most of your prediction calls go through successfully and the service is only occasionally unavailable.
Configure your prediction client to use Retry (exponential backoff):
from google.api_core.retry import Retry, if_exception_type
import requests.exceptions
from google.auth import exceptions as auth_exceptions
from google.api_core import exceptions

if_error_retriable = if_exception_type(
    exceptions.GatewayTimeout,
    exceptions.TooManyRequests,
    exceptions.ResourceExhausted,
    exceptions.ServiceUnavailable,
    exceptions.DeadlineExceeded,
    requests.exceptions.ConnectionError,  # The last three might be overkill
    requests.exceptions.ChunkedEncodingError,
    auth_exceptions.TransportError,
)

def _get_retry_arg():
    return Retry(
        predicate=if_error_retriable,
        initial=1.0,     # Initial delay
        maximum=4.0,     # Maximum delay between attempts
        multiplier=2.0,  # Delay multiplier: roughly 1 s, 2 s, 4 s, 4 s, ...
        deadline=9.0,    # After 9 secs in total it won't try again and will raise
    )
def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    ...
    # No await here: this sample uses the synchronous PredictionServiceClient.
    response = client.predict(
        endpoint=endpoint,
        instances=instances,
        parameters=parameters,
        timeout=SOME_VALUE_IN_SEC,
        retry=_get_retry_arg(),
    )

Python 2.7, Too many values to unpack

I want to write a script that checks Bitcoin private addresses for money, reading them from a CSV file.
Python 2.7.16 64-bit on Ubuntu 19.04
import requests
from pybitcoin import BitcoinPrivateKey
import pybitcoin
import time

keys = set()
with open('results.csv') as f:
    for line in f.read().split('\n'):
        if line:
            repo_name, path, pkey = line.split(",")
            keys.add(pkey)

for priv in keys:
    try:
        p = BitcoinPrivateKey(priv)
        pub = p.public_key().address()
        r = requests.get("https://blockchain.info/rawaddr/{}".format(pub))
        time.sleep(1)
        print '{} {} {:20} {:20} {:20} '.format(priv, pub,
            r.json()['final_balance'],
            r.json()['total_received'],
            r.json()['total_sent'])
    except (AssertionError, IndexError):
        pass
    except ValueError:
        print r
        print r.text
Exception has occurred: ValueError
too many values to unpack
File "/home/misha/bitcoinmaker/validate.py", line 9, in <module>
repo_name, path, pkey = line.split(",")
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/runpy.py", line 82, in _run_module_code
mod_name, mod_fname, mod_loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 252, in run_path
return _run_module_code(code, init_globals, run_name, path_name)
Some CSV data (a cut-down version of the file; the original has 14,042 lines):
repo_name,path,pkey
tobias/tcrawley.org,src/presentations/ClojureWestJava9/assets/AB1E09C5-A5E3-4B1D-9E3B-C2E586ACFAC2/assets/AB1E09C5-A5E3-4B1D-9E3B-C2E586ACFAC2.pdfp,----
annavernidub/MDGServer,specs/resources/xml/results.xml,----
gyfei/bitcc,routes/home.js~,----
Nu3001/external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,----
cdawei/digbeta,dchen/music/format_results.ipynb,----
bitsuperlab/cpp-play,tests/regression_tests/issue_1229_public/alice.log,----
justin/carpeaqua-template,assets/fonts/562990/65B3CCFE671D2E128.css,----
dacsunlimited/dac_play,tests/regression_tests/issue_1218/alice.log,----
amsehili/audio-segmentation-by-classification-tutorial,multiclass_audio_segmentation.ipynb,----
biosustain/pyrcos,examples/.ipynb_checkpoints/RegulonDB network-checkpoint.ipynb,----
blockstack/blockstore,integration_tests/blockstack_integration_tests/scenarios/name_pre_reg_stacks_sendtokens_multi_multisig.py,----
gitcoinco/web,app/assets/v2/images/kudos/smart_contract.svg,----
Is the CSV file too large?
Or is it some syntax error?
What am I missing?
This is the way I would do it, with a length check and some logging:
import csv

keys = set()
unknown_list = []
with open('results.csv') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        if len(line) == 3:
            pkey = line[2]
            keys.add(pkey)
        else:
            # Collect malformed rows for later inspection
            temp_list = []
            for i in range(len(line)):
                temp_list.append(line[i])
            unknown_list.append(temp_list)
After that you can view what lines gave you the issue by printing or logging the unknown list. You can also log at every step of the process to see where the script is breaking.
If you provide some sample data from your results.csv file it will be easier to give you an accurate answer.
In general, your line:
repo_name, path, pkey = line.split(",")
provides three variables for the values produced by splitting the line on commas, but for some rows the split produces more than three values.
The sample data:
sobakasu/vanitygen,README,5JLUmjZiirgziDmWmNprPsNx8DYwfecUNk1FQXmDPaoKB36fX1o
lekanovic/pycoin,tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
wkitty42/fTelnet,rip-tests/16c/SA-MRC.RIP,5HHR5GHR5CHR5AHR5AHR59HR57HR57HR54HR53HR52HR51HR4ZH
NKhan121/Portfolio,SAT Scores/SAT Project.ipynb,5Jy8FAAAAwGK6PMwpEonoxRdfVCAQaLf8yJEjeuGFFzrdr7m5We
chengsoonong/digbeta,dchen/music/format_results.ipynb,5JKSoHtHSE2Fbj3UHR4A5v4fVHFV92jN5iC9HKJ4MvRZ7Ek4Z7j
the-metaverse/metaverse,test/test-explorer/commands/wif-to-ec.cpp,5JuBiWpsjfXNxsWuc39KntBAiAiAP2bHtrMGaYGKCppq4MuVcQL
hessammehr/ChemDoodle-2D,data/spectra/ir_ACD.jdx,5K626571K149659j856919j347351J858139j136932j515732
designsters/android-fork-bitcoinj,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
HashEngineering/groestlcoinj,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
dyslexic-charactersheets/assets,languages/german/pathfinder/Archetypes/Wizard/Wizard (Familiar).pdf,5KAD7sCUfirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdi
ElementsProject/elements,src/wallet/rpcwallet.cpp,5Kb8kLf9zgWQnogidDA76MzPL6TsZZY36hWXMssSzNydYXYB9KF
pavlovdog/bitcoin_in_a_nutshell,markdown_source/bitcoin_in_a_nutshell_cryptography.md,5HtqcFguVHA22E3bcjJR2p4HHMEGnEXxVL5hnxmPQvRedSQSuT4
ValyrianTech/BitcoinSpellbook-v0.3,unittests/test_keyhelpers.py,5Jy4SEb6nqWLMeg4L9QFsi23Z2q6fzT6WMq4pKLfFqkTC389CrG
jrovegno/fci,Derechos-Agua-Petorca.ipynb,5JNZvHgxAAsWLGDJkiUsW7YMePhNGvT5UYPSHvNNfX54eHig2mM
ionux/bitforge,tests/data/privkey.json,5Ke7or7mg3MFzFuPpiTf2tBCnFQk6dR9qsbTmoE74AYWcQ8FmJv
chengsoonong/mclass-sky,projects/alasdair/notebooks/09_thompson_sampling_vstatlas.ipynb,5Keno6jz32WJXtERERNVZoCXdDhgypMhe1VmnQ54mVY1wuV62r
surikov/webaudiofontdata,sound/12842_4_Chaos_sf2_file.js,5HGw2nKaKhUAtyFG6aLMpTRULLRiTyHBCMFnFRWg6BsaULCwta
malikoski/wabit,src/main/resources/ca/sqlpower/wabit/example_workspace.wabit,5HSGKWPp2yPSqBzMxKyUfKyeuRybCzdi5cxV6Nmur8q78gyRLc6
GemHQ/money-tree,spec/lib/money-tree/address_spec.rb,5JXz5ZyFk31oHVTQxqce7yitCmTAPxBqeGQ4b7H3Aj3L45wUhoa
wkitty42/fTelnet,rip-tests/16c/SS-TT1.RIP,5KA85HAB5GAH5CAH5CAC5EA55FA95AAG56AG54A559A55AA955A
norsween/data-science,springboard-answers-to-exercises/Springboard Data Story Exercise.ipynb,5HA4jkQHgtPh9AsAKhoqUpB7FJykpSW1tbYqLi1NdXV3QXnNzs
alisdev/dss,dss-xades/src/test/resources/plugtest/esig2014/ESIG-XAdES/ES/Signature-X-ES-31.xsig,5HbuDMDxVSaWBvVRPSoaL8S6Vdtv3yBkz5bSgjAo1YKQn4q435
AmericasWater/awash,docs/compare-simulate.ipynb,5JAYT9xf2vChY6fDgWPnohZnfsxhmqA2yBRs1HHskMikDkvKto
coincooler/coincooler,spec/helpers/data_helper_spec.rb,5J2PLz9ej2k7c1UEfQANfQgLsZnFVeY5HjZpnDe1n6QSKXy1zFQ
magehost/magento-malware-scanner,corpus/backend/8afb44c2bd8e4ed889ec8a935d3b3d38,5HcfPTZ4JTqg6zdvKSRv92dUzgDEZBuvWiucdiZ7KS8UJWDU3y
daily-wallpapers/daily-wallpapers.github.io,2018/12/22/bd05d9380ae4459588eef75e0e25fc2c.html,5KafK34DXHztjFfL51J48dSvXUxtWW5akij5RVYEAaR228Kq7j
Grant-Redmond/cryptwallet,core/src/test/java/org/bitcoinj/core/DumpedPrivateKeyTest.java,5HtUCLMFWNueqN9unpgX2DzjMg6SDNZyKRb8s3LJgpFg5ubuMrk
ryanralph/DIY-Piper,keys.txt,5JYiNDZgZrH9sDR6FC9XSG175FoBDKPrrt6eyyKxPCdQ1AWJgDD
FranklinChen/IHaskell,notebooks/IHaskell.ipynb,5Ktqn7AjR45QoEABnJ2d5Sd6hRBCPBaLxUJCQgJxcXGEhoZKwD
djredhand/epistolae,sites/all/modules/panopoly_demo/panopoly_demo.features.content.inc,5HbHJ6fWvXPDfhKbxT4ein8RTyw2kgDC3A2sR2J9PpXrGjabpvh
SAP/cloud-portal-tutorial,Image Gallery.html,5KepoCFUEsCQYFGkU3AvgK2AxqEjAp3CjjpaTCeWkdjQ3MDCznq
yueou/nyaa,public/img/mafuyu.svg,5K4do2Fj2wgV9FPLkoWz3Sk9au3M3BQStA5TmvvfYaL798nfPnz
dwiel/tensorflow_hmm,notebooks/gradient_descent_example.ipynb,5JXNt3LQB9NAgYHU2naQLmdJwBPP1ML2R21QS4cAYAR2frp87Z
imrehg/electrum,lib/tests/test_account.py,5Khs7w6fBkogoj1v71Mdt4g8m5kaEyRaortmK56YckgTubgnrhz
thirdkey-solutions/pycoin,tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
asm-products/AssemblyCoins,API/Whitepaper.md,5JYVttUTzATan4zYSCRHHdN2nfJJHv6Nu1PB6VnhWSQzQRxnyLa
karlobermeyer/numeric-digit-classification,numeric_digit_classification.baseline.ipynb,5KaGgoQUFBvPTSS1y5cqVRTXV1NTNnziQ2NpZ27doRFhZGSkoK
richardkiss/pycoin,tests/cmds/test_cases/ku/bip32_keyphrase.txt,5JvNzA5vXDoKYJdw8SwwLHxUxaWvn9mDea6k1vRPCX7KLUVWa7W
paulgovan/AutoDeskR,vignettes/AutoDeskR.html,5Jb6vUZGXEC1AcK2vzQW5oApRuJseyHkerTHgp4pbMbU5t5kgV
trehansiddharth/mlsciencebowl,git/hs/archive/357.txt.mail,5KAAACAAgALgABAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAEEAAA
mermi/EFD2015,img/weblitmap-1.1.svg,5H3756AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPj78
breadwallet/breadwallet-core,Java/Core/src/test/java/com/breadwallet/core/BRWalletManager.java,5Kb8kLf9zgWQnogidDA76MzPL6TsZZY36hWXMssSzNydYXYB9KF
zsoltii/dss,dss-xades/src/test/resources/plugtest/esig2014/ESIG-XAdES/ES/Signature-X-ES-74.xsig,5HbuDMDxVSaWBvVRPSoaL8S6Vdtv3yBkz5bSgjAo1YKQn4q435
gyglim/Recipes,examples/ImageNet Pretrained Network (VGG_S).ipynb,data/text/MATRYOSHKA-CHALLENGE,5HW67XgqGZKakwVrpftp9bzQFBig1gfWCPUTUyxWaVCCcfLV17
dyslexic-charactersheets/assets,languages/spanish/pathfinder/Archetypes/Druid/Plains Druid (Animal Companion).pdf,5J5ut7i5STzG9rLppih1CFLy8kKSTK8sfwpHXeJa99hUnfChHRf
keith-epidev/VHDL-lib,top/stereo_radio/ip/xfft/c_addsub_v12_0/hdl/c_addsub_v12_0_legacy.vhd,5JW64dZ8AXjc3DEXpwxS1wcUvakyfxBHNyPgk9SseKsP4PojeGV
gitonio/pycoin,COMMAND-LINE-TOOLS.md,5KhoEavGNNH4GHKoy2Ptu4KfdNp4r56L5B5un8FP6RZnbsz5Nmb
techdude101/code,C Sharp/SNMP HDD Monitor v0.1/SNMP HDD Monitor/mainForm.resx,5KSkv8FBQXjAAAASQAAACsAAAAcAAAADwAAAAUAAAABAAAAAAAA
RemitaBit/Remitabit,tests/regression_tests/short_below_feed/alice.log,5HpUwrtzSztqQpJxVHLsrZkVzVjVv9nUXeauYeeSxguzcmpgRcK
SportingCRED/sportingcrdblog,desempenho/jonathan_rosa.html,5HRm4g8amexBY4KFhCyBVy5Db6UNkSKAgKo2ogXibQxfSDVmjh
bashrc/zeronet-debian,src/src/Test/TestSite.py,5JU2p5h3R7B1WrbaEdEDNZR7YHqRLGcjNcqwqVQzX2H4SuNe2ee
martindale/fullnode,test/privkey.js,5JxgQaFM1FMd38cd14e3mbdxsdSa9iM2BV6DHBYsvGzxkTNQ7Un
atsuyim/ZeroNet,src/Content/ContentManager.py,5JCGE6UUruhfmAfcZ2GYjvrswkaiq7uLo6Gmtf2ep2Jh2jtNzWR
greenfield-innovation/greenfield-innovation.github.io,_site/examples/visual/index.html,5JZUUxKS1NzVaFe8Rt21aD6ELyD1n8FCadw334UkZHAXb3NSyZ
JorgeDeLosSantos/master-thesis,src/ch3/parts_01.svg,5KWSJQ8D2d3T6SNwwbSs3Mwh7RyV3qRbSUbf13GNmML9zN1FvC
JaviMerino/lisa,ipynb/tutorial/00_LisaInANutshell.ipynb,5JB3VPWgYAAACALLBw4cKwQ6hm3rx5YYcQCZxRRVbiXnHwifyCT
nopdotcom/2GIVE,src/vanitygen-master/README,5JLUmjZiirgziDmWmNprPsNx8DYwfecUNk1FQXmDPaoKB36fX1o
dacsunlimited/dac_play,tests/regression_tests/collected_fees/alice.log,5J3SQvvxRK4RfzFDcWZR5sLRkjrMvTn1FKXnzNGvWLgWdctLDQm
bussiere/ZeroNet,src/Content/ContentManager.py,5JCGE6UUruhfmAfcZ2GYjvrswkaiq7uLo6Gmtf2ep2Jh2jtNzWR
loon3/Tokenly-Pockets,Chrome Extension/js/bitcoinsig.js,5JeWZ1z6sRcLTJXdQEDdB986E6XfLAkj9CgNE4EHzr5GmjrVFpf
openledger/graphene-ui,web/lib/common/trxHelper.js,5KikQ23YhcM7jdfHbFBQg1G7Do5y6SgD9sdBZq7BqQWXmNH7gqo
tensorflow/probability,tensorflow_probability/examples/jupyter_notebooks/Factorial_Mixture.ipynb,5JBZJUGinA9k9twEaTFKNRiorTP6NEoJYzAhoBzRqBrpCiu8Bt
mostafaizz/Face_Landmark_Localization,BioID-FaceDatabase/BioID_0380.pgm,5JdWRQPQTTYWPRPTUSUPSUUXRSTSTPSXUSQVXSSPQQPTSSUTXY
number7team/SevenCore,test/test.PrivateKey.js,5KS4jw2kT3VoEFUfzgSpX3GVi7qRYkTfwTBU7qxPKyvbGuiVj33
thephez/electrum,lib/tests/test_account.py,5Khs7w6fBkogoj1v71Mdt4g8m5kaEyRaortmK56YckgTubgnrhz
surikov/webaudiofontdata,sound/12842_7_Chaos_sf2_file.js,5HGw2nKaKhUAtyFG6aLMpTRULLRiTyHBCMFnFRWg6BsaULCwta
CMPUT301F14T07/Team7-301Project,UML/UML.class.violet.html,5HSUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAHNePP
kernoelpanic/tinker-with-usbarmory,README.md,5JrAdQp23Zkqi4NFwSGoh6kftHochz56ctxuFVemX1vy4KozLvV
benzwjian/bitcoin-fp,test/ecpair.js,5J3mBbAH58CpQ3Y5RNJpUKPE62SQ5tfcvU2JpbnkeyhfsYB1Jcn
zaki/slave,SLave/SLaveForm.resx,5H3v8s6YfyJQZApinCSUNL6T32hHUDnneGdaiwCa8ekodUyfF6Z
bokae/szamprob,notebooks/Package04/pi.ipynb,5J5VdLSedNJqWbYso6TUxoj52bfA1XwmZse2N1TdMGvdgscMP3
sciant/sciant.github.io,assets/desktop/Oval.svg,5HCxg2h2kWtAYr8RmVxHSgYzBTDQwiZrD25K75ZbazTeJpHN1rK
UCL-BLIC/legion-buildscripts,cytofpipe/v1.3/Rlibs/openCyto/doc/openCytoVignette.html,5KEPZdiwYXq9nkajPfT2oiCgPnqU5xT1zhyLSE1fM74n8e8Pgs
xJom/mbunit-v3,src/Extensions/Icarus/Gallio.Icarus/Reload/ReloadDialog.resx,5Hsyo5SiKUCgUtJRSSyk151wrpTRjTANQSRzBrGfWiJc1BMePHz
GroestlCoin/bitcoin,src/wallet/test/psbt_wallet_tests.cpp,5KSSJQ7UNfFGwVgpCZDSHm5rVNhMFcFtvWM3zQ8mW4qNDEN7LFd
gitcoinco/web,app/assets/v2/images/kudos/doge.svg,5JCWfzJMfuHtoYFVMBYxXDbGLbpib6af32A5tEX9eTfbiEDyKy9
moocowmoo/dashman,lib/pycoin/tests/build_tx_test.py,5JMys7YfK72cRVTrbwkq5paxU7vgkMypB55KyXEtN5uSnjV7K8Y
reblws/tab-search,src/static/lib/fonts.css,5KgjXgnXgNVBJ3HpaugH8GWwEr4NNgL2D9ofN25TvnfjvYkXWsK
schildbach/bitcoinj,core/src/test/java/org/bitcoinj/core/Base58Test.java,5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsreAbuatmU
QuantEcon/QuantEcon.notebooks,ddp_ex_career_py.ipynb,5JkqSRsSBaCKrqjiSvAv4fsAg4q6quGnJYkiSNjAWREABU1T8D
moncho/warpwallet,bitcoin/bitcoin_test.go,5JfEekYcaAexqcigtFAy4h2ZAY95vjKCvS1khAkSG8ATo1veQAD
BigBrother1984/android_external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,5Js3pmbMmKEaNmyoRowYoTj2kHb2MBW74ap169bq7NmzrvFDhg
taishi107/K-means_clustering,K-means_clustering3.ipynb,5HLKFQZeSqRSCTPGTLyVCKRSJ4zpGKXSCSS5wyp2CUSieQ5Qyp2
keith-epidev/VHDL-lib,top/stereo_radio/ip/xfft/xfft_v9_0/hdl/shift_ram.vhd,5HAT5FAXSTzJUoP6GBwzVHhyeEqVpX4CC8pfA8TEjGyxPv2tkY
sawatani/bitcoin-hall,test/Fathens/Bitcoin/Wallet/KeysSpec.hs,5JHm1YEyRk4dV8KNN4vkaFeutqpnqAVf9AWYi4cvQrtT5o57dPR
ivansib/sib16,src/test/data/base58_keys_valid.json,5KEyMKs1jynRbTfpGFPveXyxMcfZb1X9SnR3TneYQwRtXdzkzhL
ledeprogram/algorithms,class4/homework/najmabadi_shannon_4_3.ipynb,5Jfp5kZKyPAjkSNQwje6pGVVFbXpuUV1teS9WoqgLVKNiYTdQw
desihub/desitarget,doc/nb/connecting-spectra-to-mocks.ipynb,5JjZv3ozVq1ebXJfqjGtZW4bCJkVFRcjMzHToGpMmTcKJEycE97
eneldoserrata/marcos_openerp,addons/fleet/fleet_cars.xml,5KJ9TQiWUeuAtrhEWjgCZwAbgg3VLU4Ecg8pQCRBRmUpFGPSLex
JCROM-Android/jcrom_external_chromium_org,chrome/browser/resources/ntp_android/mockdata.js,5Js3pmbMmKEaNmyoRowYoTj2kHb2MBW74ap169bq7NmzrvFDhg
alixaxel/dump.HN,data/items/2014/03/08/23-24.csv,5HpHagT65TZzG1PH3CSu63k8DbpvD9KsvQVUCsn2t55TVA1jxW7
qutip/qutip-notebooks,development/development-ssesolve-tests.ipynb,5KM85KMs5CXg9ZCzHbZ3DPJa2wDP7uzm6e1dDKYtykNe3r26iT
bitshares/devshares,tests/regression_tests/short_below_feed/alice.log,5HpUwrtzSztqQpJxVHLsrZkVzVjVv9nUXeauYeeSxguzcmpgRcK
denkhaus/bitsharesx,tests/regression_tests/titan_test/client1.log,5JMnSU8bfBcu67oA9KemNm5jbs9RTp2eBHqxoR53WWyB4CH2QJF
ivanfoong/practical-machine-learning-assessment,building_human_activity_recognition_model.html,5Jno7z6Py8fkQkwASbABJgAE2AC9U5g4cKFcmoy4jheeeWVHh1
vikashvverma/machine-learning,mlfoundation/istat/project/investigate-a-dataset-template.ipynb,5KMV5Jz3ytfN7q9KTC4AqBPiLPhRwjLowiY19xaGhmRJ93auJjF
partrita/partrita.github.io,posts/altair/index.html,5KWn2EcAX6ibXaCEEXvTSJeP6T445Sc9mPreCPPrDUmX4cNegw
OriolAbril/Statistics-Rocks-MasterCosmosUAB,Rico_Block3/Block3_HT.ipynb,5HxYmbvSepRw6a73P3VoM9dipzifb4xaztSfWptxuqcdhvRM7Pj
wildbillcat/MakerFarm,MakerFarm/Migrations/201401212104523_Formatted external to humanfriendly column name.resx,5K1zhDkW7JZWeJsS9brtA8fojjTko3p1edWzMukZHXw5GZRciJN
nccgroup/grepify,Win.Grepify/Win.Grepify/Form1.resx,5J6AMbRdASzmN4pnx622HrzrEm7attVmygUetm88aLpJkAKMRUy
voytekresearch/misshapen,demo_Shape dataframe.ipynb,5HRUdGxuLsLAwBAcH49WrV3j27Bnu3bsns56oPCP1GjRokKdrA
SirmaITT/conservation-space-1.7.0,cs-models/Partners definitions/SMK/template/treatmentreportmultipleobjecttemplate.xml,5KoJspjvTDGPkPbDHC2NtohN1MKypc69Fiy9LFepfg6tt4S3orU
4flyers/4flyers.github.io,img/portfolio/portfolio_projects.svg,5KYWXACSGEEHJxmxnZB2HBQed3MyWEEEJ6ncA3zhVoU2Wxy69J
frrp/bitshares,tests/regression_tests/issue_1229_titan/alice.log,5KasHemYTcbGtHXKHNx5sUMPrrz8r4GuU3ao157F6Wx95y7NnbN
wkitty42/fTelnet,rip-tests/16c/JE-APOC.RIP,5HF56H958H15CGW5DGS5DGL5EGG5EG75EG15DFT5CFN5CFJ5AF
iobond/bitcore-old,test/test.WalletKey.js,5KMpLZExnGzeU3oC9qZnKBt7yejLUS8boPiWag33TMX2XEK2Ayc
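Note that the sample data above contains at least one row with a comma inside the path itself (the gyglim/Recipes row has four comma-separated fields), which is exactly what breaks the three-way unpack. Since the key is always the last field, a tolerant sketch is to split from the right (assuming the keys themselves never contain commas and that the file starts with the repo_name,path,pkey header shown earlier):
keys = set()
with open('results.csv') as f:
    next(f)  # skip the header row: repo_name,path,pkey
    for line in f:
        line = line.strip()
        if line:
            # Take only the last comma-separated field; paths may contain commas.
            pkey = line.rsplit(',', 1)[-1]
            keys.add(pkey)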

Different behaviour for io in pickle with string content

When working with pickled data I encountered different behavior between io.open and __builtin__.open. Consider the following simple example:
import pickle
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, open(fn, 'w'))
a = pickle.load(open(fn, 'r'))
This works as expected. But running this code here:
import pickle
import io
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, io.open(fn, 'w'))
a = pickle.load(io.open(fn, 'r'))
gives the following Traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "D:/**.py", line 15, in <module>
pickle.dump(payload, io.open(fn, 'w'))
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 1370, in dump
Pickler(file, protocol).dump(obj)
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 224, in dump
self.save(obj)
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 488, in save_string
self.write(STRING + repr(obj) + '\n')
TypeError: must be unicode, not str
As I want to be future-compatible, how can I circumvent this misbehavior? Or, what else am I doing wrong here?
I stumbled over this when dumping dictionaries with keys of type string.
My python version is:
'2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]'
The difference is not surprising, because io.open() explicitly deals with Unicode strings when using text mode. The documentation is quite clear about this:
Note: Since this module has been designed primarily for Python 3.x, you have to be aware that all uses of “bytes” in this document refer to the str type (of which bytes is an alias), and all uses of “text” refer to the unicode type. Furthermore, those two types are not interchangeable in the io APIs.
and
Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as unicode strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
You need to open files in binary mode. The fact that it worked with the built-in open() at all is more luck than wisdom; if your pickles contain data with \n and/or \r bytes, loading them may well fail. The default pickle protocol in Python 2 happens to be text-based, but its output should still be treated as binary.
In all cases, when writing pickle data, use binary mode:
pickle.dump(payload, open(fn, 'wb'))
a = pickle.load(open(fn, 'rb'))
or
pickle.dump(payload, io.open(fn, 'wb'))
a = pickle.load(io.open(fn, 'rb'))
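As a side note, both variants above leave the file objects open until they are garbage-collected. A sketch of the same fix with explicit closing and an explicitly binary pickle protocol (works on Python 2.7):
import pickle

payload = 'foo'
fn = 'test.pickle'

with open(fn, 'wb') as f:
    pickle.dump(payload, f, pickle.HIGHEST_PROTOCOL)  # protocol 2 on Python 2.7, binary

with open(fn, 'rb') as f:
    a = pickle.load(f)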

Specifying select features to be categorical using OneHotEncoder in sklearn 0.14

I am using the sklearn 0.14 module in Python to create a decision tree. I was hoping to use OneHotEncoder to encode some of the features as categorical. According to the documentation, I should be able to provide an array of indices indicating which features should be converted. However, when I try the following code:
import numpy
from sklearn import preprocessing

xs = [[64, 15230], [3, 67673], [16, 43678]]
encoder = preprocessing.OneHotEncoder(n_values='auto', categorical_features=[1], dtype=numpy.integer)
encoder.fit(xs)
I receive the following error:
Traceback (most recent call last):
File "C:\Users\sara\Documents\Shipping Project\PythonSandbox\CarrierDecisionTree.py", line 35, in <module>
encoder.fit(xs)
File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 892, in fit
self.fit_transform(X)
File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 944, in fit_transform
self.categorical_features, copy=True)
File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 795, in _transform_selected
return sparse.hstack((X_sel, X_not_sel))
File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 417, in hstack
return bmat([blocks], format=format, dtype=dtype)
File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 532, in bmat
dtype = upcast( *tuple([A.dtype for A in blocks[block_mask]]) )
File "C:\Python27\lib\site-packages\scipy\sparse\sputils.py", line 53, in upcast
raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('int32'), dtype('S6'))
If instead I provide the array [0, 1] to categorical_features, it works correctly and converts both features properly. The same correct behavior occurs when passing 'all' to categorical_features. However, I only want the second feature converted, not the first. I understand I could do this manually by converting one feature at a time, but I was hoping to use all the beauty of OneHotEncoder, as I will be using many more features later on.
Posting as an answer, for the record:
TypeError: no supported conversion for types: (dtype('int32'), dtype('S6'))
means something in the true xs (not the one shown in the code snippet) is a string: dtype('S6') is NumPy's length-six string type.
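A quick way to confirm this is to check what dtype NumPy infers for the real xs. A sketch with hypothetical values that reproduce the symptom (Python 2.7):
import numpy

xs = [[64, 'A12345'], [3, 'B67673'], [16, 'C43678']]  # second column is strings
print numpy.asarray(xs).dtype  # |S6: the whole array was upcast to 6-char strings

Converting that column to numbers (or encoding it separately) before calling fit avoids the error.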

How to store the triples in 4store

File "<console>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/django_gstudio-0.3.dev-py2.7.egg/gstudio/testing1.py", line 129, in rdf_description
store.add(self,(subject, predicate, object),context)
File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.0-py2.7.egg/rdflib/plugins/memory.py", line 298, in add
Store.add(self, triple, context, quoted)
File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.0-py2.7.egg/rdflib/store.py", line 177, in add
def add(self, (subject, predicate, object), context, quoted=False):
called as:
store.add(self, (subject, predicate, object), context, quoted=False)
AFAIK, rdflib does not support 4store. But you can easily assert the triples using curl and Python via the 4store SPARQL server. Here is an example:
import subprocess

command = ["curl", "-s",
           "-T", "/some/file/with/triples",
           "-H", "Content-Type: application/x-turtle",
           "http://localhost:port/data/http://graph.to/save/triples"]
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate()
ret = p.poll()
if ret != 0:
    raise Exception("Error asserting triples")
In this example the content type is Turtle, but you can use any of the other RDF serializations (ntriples, rdfxml).
If you do not want to deal with subprocesses, you can also translate this call into a urllib/urllib2 function, as sketched below.
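For illustration, a rough urllib2 equivalent of the curl call above (an untested sketch: curl -T issues an HTTP PUT, and the host, port and graph URI are placeholders to adjust):
import urllib2

with open('/some/file/with/triples') as f:
    data = f.read()

url = 'http://localhost:port/data/http://graph.to/save/triples'
req = urllib2.Request(url, data, {'Content-Type': 'application/x-turtle'})
req.get_method = lambda: 'PUT'  # urllib2 defaults to POST when data is given
response = urllib2.urlopen(req)
print response.read()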
There are more examples in the 4store SparqlServer documentation. Optionally, you can use any of the Python 4store client libraries.