I've got a question regarding a mysterious Doctrine query error.
Long story short: I'm trying to store longblob data in my database (files can go up to a few hundred MB, for example), so I took the following steps:
I created my own longblob type and field and registered it according to:
https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/cookbook/advanced-field-value-conversion-using-custom-mapping-types.html
Doctrine custom data type
My MySQL table looks like this, so I think the mapping works:
mysql> describe DataBlocks;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int(11)      | NO   | PRI | NULL    | auto_increment |
| data_type_id_id | int(11)      | NO   | MUL | NULL    |                |
| project_id_id   | int(11)      | NO   | MUL | NULL    |                |
| data_block_name | varchar(100) | YES  |     | NULL    |                |
| content         | longblob     | YES  |     | NULL    |                |
| comment         | longtext     | YES  |     | NULL    |                |
| ts_added        | datetime     | NO   |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+
My Symfony 4.1 FormType's file field is as follows:
public function buildForm(FormBuilderInterface $builder, array $options)
{
    $builder
        ->add('dataBlockName', TextType::class)
        ->add('content', FileType::class)
I also adjusted my php.ini to remove the file size limits (I know this isn't really secure, but it's just for now):
post_max_size = 0M
upload_max_filesize = 0M
And I get this error when my entity manager flushes the entity:
An exception occurred while executing 'INSERT INTO DataBlocks
(data_block_name, content, comment, ts_added, data_type_id_id,
project_id_id) VALUES (?, ?, ?, ?, ?, ?)' with params
["BTC_DOGE_tradehistory", Resource id #66, "450mb", "2018-10-08
10:19:44", 1, 1]:
Warning: Error while sending QUERY packet. PID=6016
Your help would be greatly appreciated!
FYI: it works for small files, but when I try to upload something big I get that vague error.
The query described in the exception shows that the content column is being bound to a PHP resource, so I think it's a cast problem: blob data should be passed as a byte string. You may also have an issue with the server configuration; there is Apache/Nginx (or whatever you use), PHP, but also the SQL server itself.
Here is an example for MySQL: doc
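In practice, the "Error while sending QUERY packet" warning is what you typically see when a single INSERT exceeds MySQL's max_allowed_packet limit, which by default is far smaller than a 450 MB blob. A hedged sketch of the server-side change (the 512M value is an assumption; size it to your largest file and restart MySQL afterwards):

[mysqld]
max_allowed_packet = 512M

Keep PHP's memory_limit in mind as well, since the whole blob may be buffered in memory before it is sent.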
Related
I have a simple Django notification table with the following structure:
+-------------------------------+--------------+------+-----+---------+----------------+
| Field                         | Type         | Null | Key | Default | Extra          |
+-------------------------------+--------------+------+-----+---------+----------------+
| id                            | int(11)      | NO   | PRI | NULL    | auto_increment |
| level                         | varchar(20)  | NO   |     | NULL    |                |
| unread                        | tinyint(1)   | NO   |     | NULL    |                |
| actor_object_id               | varchar(255) | NO   |     | NULL    |                |
| verb                          | varchar(255) | NO   |     | NULL    |                |
| description                   | longtext     | YES  |     | NULL    |                |
| target_object_id              | varchar(255) | YES  |     | NULL    |                |
| action_object_object_id       | varchar(255) | YES  |     | NULL    |                |
| timestamp                     | datetime(6)  | NO   |     | NULL    |                |
| public                        | tinyint(1)   | NO   |     | NULL    |                |
| action_object_content_type_id | int(11)      | YES  | MUL | NULL    |                |
| actor_content_type_id         | int(11)      | NO   | MUL | NULL    |                |
| recipient_id                  | int(11)      | NO   | MUL | NULL    |                |
| target_content_type_id        | int(11)      | YES  | MUL | NULL    |                |
| deleted                       | tinyint(1)   | NO   |     | NULL    |                |
| emailed                       | tinyint(1)   | NO   |     | NULL    |                |
| data                          | longtext     | YES  |     | NULL    |                |
+-------------------------------+--------------+------+-----+---------+----------------+
And all I want is to fetch the content, so this is my view:
@api_view(['GET'])
@login_required()
def getnotifications(request, page):
    try:
        if page == None:
            page = 1
        userID = request.user
        unreadnum = Notification.objects.filter(recipient=request.user,
                                                unread=True).count()
        notifs = Notification.objects.filter(recipient=userID, unread=True).distinct().order_by(
            '-timestamp')
        print("got ntifs")
        paginator = Paginator(notifs, 10)
        paginatednotifs = paginator.page(page)
        return Response(
            {"notifications": NotificationSerializer(paginatednotifs, many=True, context={"user": request.user}).data,
             "unread": unreadnum, "has_next": paginatednotifs.has_next()})
    except Exception as e:
        print("========")
        print(str(e))
        return Response(
            {"notifications": str(e)})
and this view's serializer looks like this:
class NotificationSerializer(serializers.ModelSerializer):
    actor = serializers.SerializerMethodField()
    target = serializers.SerializerMethodField()

    class Meta:
        model = Notification
        fields = ("id", "actor", "target", "timestamp", "verb")

    def get_actor(self, obj):
        user = Useraccount.objects.get(user__id=obj.actor_object_id)
        return UserAccountSerializer(user, many=False, context={"user": self.context["user"]}).data

    def get_target(self, obj):
        if obj.target_content_type.model == "action":
            action = ActstreamAction.objects.get(id=obj.target_object_id)
            return ActionNotificationSerializer(action, many=False).data
        return {"targetType": obj.target_content_type.model, "action": obj.action_object_content_type.model}
I have tried many modifications to the serializer and the view, but I always get the same error:
from_db_value() takes 4 positional arguments but 5 were given
I couldn't find this from_db_value() function.
I'm really having a hard time with this problem, and I only know the basics of Django.
I'm using:
django : 1.11.18
djangorestframework : 3.6.4
mysql : 5.7.25
A traceback for the error:
Traceback (most recent call last):
  File "<homedir>/project/webServer/app/myNotifications/views.py", line 66, in getnotifications
    {"notifications": NotificationSerializer(paginatednotifs,many=True, context={"user": request.user}).data,
  File "<homedir>/virtualenv/lib/python3.6/site-packages/rest_framework/serializers.py", line 739, in data
    ret = super(ListSerializer, self).data
  File "<homedir>/virtualenv/lib/python3.6/site-packages/rest_framework/serializers.py", line 263, in data
    self._data = self.to_representation(self.instance)
  File "<homedir>/virtualenv/lib/python3.6/site-packages/rest_framework/serializers.py", line 657, in to_representation
    self.child.to_representation(item) for item in iterable
  File "<homedir>/virtualenv/lib/python3.6/site-packages/rest_framework/serializers.py", line 657, in <listcomp>
    self.child.to_representation(item) for item in iterable
  File "/usr/lib/python3.6/_collections_abc.py", line 883, in __iter__
    v = self[i]
  File "<homedir>/virtualenv/lib/python3.6/site-packages/django/core/paginator.py", line 145, in __getitem__
    self.object_list = list(self.object_list)
  File "<homedir>/virtualenv/lib/python3.6/site-packages/django/db/models/query.py", line 250, in __iter__
    self._fetch_all()
  File "<homedir>/virtualenv/lib/python3.6/site-packages/django/db/models/query.py", line 1121, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "<homedir>/virtualenv/lib/python3.6/site-packages/django/db/models/query.py", line 62, in __iter__
    for row in compiler.results_iter(results):
  File "<homedir>/virtualenv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 847, in results_iter
    row = self.apply_converters(row, converters)
  File "<homedir>/virtualenv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 832, in apply_converters
    value = converter(value, expression, self.connection, self.query.context)
TypeError: from_db_value() takes 4 positional arguments but 5 were given
TL;DR:
Most likely, the jsonfield package is not compatible with Django==1.11.18.
Details:
You are using Django version 1.11.18, which calls from_db_value() with 5 positional arguments and does not ship a built-in JSONField.
You are also using the django-notifications package, which internally depends on jsonfield>=1.0.3. Since no maximum version is set, django-notifications pulls in the newest jsonfield release.
The newest versions of jsonfield (3.0.0 and higher) don't support Django below 2.2. One of the reasons is that their from_db_value() accepts only 4 arguments, while older Django passes 5.
The highest version of jsonfield that supports Django 1.11 is jsonfield==2.1.1.
Please check the version of the installed jsonfield package (the grep part only works on a Unix-like system):
pip freeze | grep jsonfield
If it's 3.0.0 or more, you may try to downgrade it to 2.1.1. Be aware that it may (or may not) cause other compatibility issues with other packages.
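If you do need to downgrade, a hedged example of pinning the version (adapt it to your environment or requirements file):

pip install "jsonfield==2.1.1"
pip freeze | grep jsonfield    # should now report jsonfield==2.1.1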
I got the same error:
File "/home/django/lee3/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 833, in apply_converters
value = converter(value, expression, self.connection, self.query.context)
TypeError: from_db_value() takes 4 positional arguments but 5 were given
The 'from_db_value' it was complaining about was in /picklefield/fields.py.
Changed line 184:
def from_db_value(self, value, expression, connection):
to:
def from_db_value(self, value, expression, connection, context=None):
Everything works fine now.
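As a side note, and only as a sketch for fields you maintain yourself (not the packaged picklefield/jsonfield code): you can write from_db_value() so it tolerates both the Django 1.11 call (which passes an extra context argument) and the Django 2.0+ call (which does not):

def from_db_value(self, value, expression, connection, *args):
    # *args absorbs the `context` argument that Django < 2.0 still passes
    if value is None:
        return value
    return self.to_python(value)  # hypothetical conversion; adapt to your field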
According to my assignment, the admin must be able to create Polls with Questions (create, delete, update) and Choices related to these questions. All of this should be displayed and changeable on the same admin page.
Poll
|
|_question_1
| |
| |_choice_1(text)
| |
| |_choice_2
| |
| |_choice_3
|
|_question_2
| |
| |_choice_1
| |
| |_choice_2
| |
| |_choice_3
|
|_question_3
|
|_choice_1
|
|_choice_2
|
|_choice_3
OK, it's not a problem to display one level of nesting, like so:
class QuestionInline(admin.StackedInline):
    model = Question


class PollAdmin(ModelAdmin):
    inlines = [
        QuestionInline,
    ]
But what do I do to get the required poll structure?
Check out this library; it should provide the functionality.
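The answer doesn't name the library, but assuming it is something like django-nested-admin (my assumption, since the link is missing), a rough sketch of two levels of inlines could look like this, using the Poll/Question/Choice models from the question:

from django.contrib import admin
import nested_admin

from .models import Poll, Question, Choice  # hypothetical import path


class ChoiceInline(nested_admin.NestedStackedInline):
    model = Choice
    extra = 3  # three empty choice forms per question, as in the sketch above


class QuestionInline(nested_admin.NestedStackedInline):
    model = Question
    inlines = [ChoiceInline]  # choices nested under each question


class PollAdmin(nested_admin.NestedModelAdmin):
    inlines = [QuestionInline]


admin.site.register(Poll, PollAdmin)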
I'm new to database programming, so apologies if I'm asking something simple.
I recently added a few tables to my DB using Django models and migrations, and now I'm using a Python script to fetch and print the data.
Now to the point of my error:
DB is connected successfully
Failed to execute database program
relation "cgi_limit" does not exist
LINE 1: SELECT * FROM CGI_limit
^
connection of DB had close successfully
Now, I have double-checked the naming. I tried other tables such as auth_user and the script was able to print their contents, and I checked that the table exists in my DB, as shown below:
Farm=# SELECT * FROM pg_tables;
 schemaname | tablename                  | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity
------------+----------------------------+------------+------------+------------+----------+-------------+-------------
 public     | django_session             | FAT        |            | t          | f        | f           | f
 public     | auth_permission            | FAT        |            | t          | f        | t           | f
 public     | auth_user_user_permissions | FAT        |            | t          | f        | t           | f
 public     | auth_user                  | FAT        |            | t          | f        | t           | f
 public     | django_admin_log           | FAT        |            | t          | f        | t           | f
 public     | CGI_ambient                | FAT        |            | t          | f        | f           | f
 public     | CGI_tank_system            | FAT        |            | t          | f        | f           | f
 public     | CGI_limit                  | FAT        |            | t          | f        | f           | f
Here is my Python code that reads from the DB:
# import libraries
import psycopg2 as pg2
from datetime import timedelta, datetime, date

############################################
# Function codes

def getDbConnection():
    # Get database connection
    try:
        connection = pg2.connect(user='FAT',
                                 password='*******',
                                 host='',
                                 port='5432',
                                 database='Farm')
        print("DB is connected successfully")
        return connection
    except (Exception, pg2.DatabaseError) as error:
        print("Failed to connect to database")


def closeDbConnection(connection):
    # Close database connection
    try:
        connection.close()
        print("connection of DB had close successfully")
    except (Exception, pg2.DatabaseError) as error:
        print("Failed to close database connection")


def DisplayDBdata():
    try:
        connection = getDbConnection()
        cursor = connection.cursor()
        query = 'SELECT * FROM "CGI_limit"'
        cursor.execute(query,)
        records = cursor.fetchall()
        for row in records:
            print("date: = ", row[1])
    except (Exception, pg2.DatabaseError) as error:
        print("Failed to execute database program")
        print(error)
    finally:
        closeDbConnection(connection)

#############################################################
# code to be executed
# DeleteDBdata()
DisplayDBdata()  # for testing only
# end of code that's executed
I'm stumped about what to do. I did some Google searching, and the results only talk about naming.
I'd appreciate it if you could help me.
Postgres folds unquoted identifiers to lowercase, so it does not play well with capitalized table names. You will need to put the table name in double quotes to make it work. I would recommend sticking with lowercase names.
query = 'SELECT * FROM "CGI_limit"'
Documentation link
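If you'd rather not hand-write the double quotes, psycopg2 can also quote the identifier for you; a small sketch of the query construction (a drop-in for the DisplayDBdata() function above):

from psycopg2 import sql

# sql.Identifier() adds the double quotes, so the mixed-case name is preserved
query = sql.SQL("SELECT * FROM {}").format(sql.Identifier("CGI_limit"))
cursor.execute(query)
records = cursor.fetchall()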
I figured out how to read files into my pyspark shell (and script) from an S3 directory, e.g. by using:
rdd = sc.wholeTextFiles('s3n://bucketname/dir/*')
But, while that's great in letting me read all the files in ONE directory, I want to read every single file from all of the directories.
I don't want to flatten them or load everything at once, because I will have memory issues.
Instead, I need it to automatically go load all the files from each sub-directory in a batched manner. Is that possible?
Here's my directory structure:
S3_bucket_name -> year (2016 or 2017) -> month (max 12 folders) -> day (max 31 folders) -> sub-day folders (max 30; basically I just partitioned each day's collection).
Something like this, except it'll go for all 12 months and up to 31 days...
BucketName
|
|
|---Year(2016)
| |
| |---Month(11)
| | |
| | |---Day(01)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| | |---Day(02)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| |---Month(12)
|
|---Year(2017)
| |
| |---Month(1)
| | |
| | |---Day(01)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| | |---Day(02)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| |---Month(2)
Each arrow above represents a fork. e.g. I've been collecting data for 2 years, so there are 2 years in the "year" fork. Then for each year, up to 12 months max, and then for each month, up to 31 possible day folders. And in each day, there will be up to 30 folders just because I split it up that way...
I hope that makes sense...
I was looking at another post (read files recursively from sub directories with spark from s3 or local filesystem) where I believe they suggested using wildcards, so something like:
rdd = sc.wholeTextFiles('s3n://bucketname/*/data/*/*')
But the problem with that is it tries to find a common folder among the various subdirectories - in this case there are no guarantees and I would just need everything.
However, along that line of reasoning, I thought: what if I did this:
rdd = sc.wholeTextFiles('s3n://bucketname/*/*/*/*/*')
But the issue is that now I get OutOfMemory errors, probably because it's loading everything at once and freaking out.
Ideally, what I would be able to do is this:
Go to the sub-directory level of the day and read those in, so e.g.
First read in 2016/12/01, then 2016/12/02, up until 2012/12/31, and then 2017/01/01, then 2017/01/02, ... 2017/01/31 and so on.
That way, instead of using five wildcards (*) as I did above, I would somehow have it know to look through each sub-directory at the level of "day".
I thought of using a python dictionary to specify the file path to each of the days, but that seems like a rather cumbersome approach. What I mean by that is as follows:
file_dict = {
0:'2016/12/01/*/*',
1:'2016/12/02/*/*',
...
30:'2016/12/31/*/*',
}
basically for all the folders, and then iterating through them and loading them in using something like this:
sc.wholeTextFiles('s3n://bucketname/' + file_dict[i])
But I don't want to manually type out all those paths. I hope this made sense...
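One hedged way around typing all the paths by hand is to generate the day-level prefixes with datetime and load one day per iteration (the bucket name and date range below are placeholders):

from datetime import date, timedelta

def day_prefixes(start, end):
    # yields 'YYYY/MM/DD/*/*' for every day from start to end inclusive
    d = start
    while d <= end:
        yield d.strftime('%Y/%m/%d') + '/*/*'
        d += timedelta(days=1)

for prefix in day_prefixes(date(2016, 11, 1), date(2017, 2, 28)):
    day_rdd = sc.wholeTextFiles('s3n://bucketname/' + prefix)
    # process day_rdd here, one day per batch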
EDIT:
Another way of asking the question is, how do I read the files from a nested sub-directory structure in a batched way? How can I enumerate all the possible folder names in my s3 bucket in python? Maybe that would help...
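To sketch an answer to that last part: the folder names can be enumerated with boto3's list_objects_v2 and its Delimiter option, which returns the immediate sub-"folders" under a prefix (the bucket name is a placeholder):

import boto3

s3 = boto3.client('s3')

def list_prefixes(bucket, prefix=''):
    # returns the immediate sub-"folders" under prefix, e.g. ['2016/', '2017/']
    paginator = s3.get_paginator('list_objects_v2')
    prefixes = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        for cp in page.get('CommonPrefixes', []):
            prefixes.append(cp['Prefix'])
    return prefixes

print(list_prefixes('bucketname'))           # year folders
print(list_prefixes('bucketname', '2016/'))  # month folders under 2016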
EDIT2:
The structure of the data in each of my files is as follows:
{json object 1},
{json object 2},
{json object 3},
...
{json object n},
For it to be "true JSON", it either needed to be like the above but without the trailing comma at the end, or something like this (note the square brackets, and the lack of a final trailing comma):
[
{json object 1},
{json object 2},
{json object 3},
...
{json object n}
]
The reason I did it entirely in PySpark, as a script I submit, is that I had to handle this formatting quirk manually. If I use Hive/Athena, I am not sure how to deal with it.
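For what it's worth, the manual handling can stay fairly small; a sketch of parsing one day's files in that trailing-comma format (assuming each file is small enough to parse as a whole):

import json

def parse_lenient(whole_text):
    # strip whitespace and a possible trailing comma, then wrap in [] so json.loads accepts it
    body = whole_text.strip().rstrip(',')
    return json.loads('[' + body + ']')

rdd = sc.wholeTextFiles('s3n://bucketname/2016/12/01/*/*')
records = rdd.flatMap(lambda kv: parse_lenient(kv[1]))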
Why don't you use Hive, or even better, Athena? Both will deploy tables on top of the file system to give you access to all the data, and then you can pull that into Spark.
Alternatively, I believe you can also use HiveQL in Spark to set up a temp table on top of your file system location, and it will register it all as a Hive table that you can execute SQL against. It's been a while since I've done that, but it is definitely doable.
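A rough sketch of that idea in Spark 2.x (assuming the files were valid JSON; the path and view name are placeholders):

# read everything under the bucket as JSON and expose it to SQL
df = spark.read.json('s3n://bucketname/*/*/*/*/*')
df.createOrReplaceTempView('events')

spark.sql("SELECT COUNT(*) FROM events").show()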
This might be naive, but I just started with PySpark and Spark. Please help me understand the one-hot encoding technique in PySpark. I am trying to do one-hot encoding on one of the columns. After one-hot encoding, the dataframe schema gains a vector column. But to apply a machine learning algorithm, I thought each category should become an individual column added to the existing data frame, not a vector-type column. How can I validate the OneHotEncoding?
My Code:
from pyspark.ml.feature import OneHotEncoder, StringIndexer

stringIndexer = StringIndexer(inputCol="business_type", outputCol="business_type_Index")
model = stringIndexer.fit(df)
indexed = model.transform(df)

encoder = OneHotEncoder(dropLast=False, inputCol="business_type_Index", outputCol="business_type_Vec")
encoded = encoder.transform(indexed)
encoded.select("business_type_Vec").show()
This display:
+-----------------+
|business_type_Vec|
+-----------------+
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
| (2,[0],[1.0])|
+-----------------+
only showing top 20 rows
The newly added column is of vector type. How can I convert it into individual columns, one per category?
You probably already have an answer, but maybe it will be helpful for someone else. For vector split, you can use this answer (I've checked that it works):
How to split dense Vector into columns - using pyspark
However, I don't think you need to convert the vector back into columns (as mtoto already said), since all models in Spark actually require you to provide input features in vector format (please correct me if I am wrong).
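If you do still want one plain column per category (for inspection or export), here is a sketch along the lines of the linked answer, using a small UDF to turn the vector into an array and then indexing into it (column names are taken from the question; the output shown above has 2 categories):

from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, DoubleType

# convert the ML vector into a plain array of doubles
to_array = udf(lambda v: v.toArray().tolist(), ArrayType(DoubleType()))

with_array = encoded.withColumn("bt_arr", to_array(col("business_type_Vec")))

# one column per category position (2 categories in the output shown above)
split_cols = [with_array["bt_arr"][i].alias("business_type_" + str(i)) for i in range(2)]
with_array.select("*", *split_cols).show()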