Foreign Keys on Scrapy - django

I'm scraping a site with Scrapy, and my Django models are:
class Creative(models.Model):
    name = models.CharField(max_length=200)
    picture = models.CharField(max_length=200, null=True)
class Project(models.Model):
    title = models.CharField(max_length=200)
    description = models.CharField(max_length=500, null=True)
    creative = models.ForeignKey(Creative)
class Image(models.Model):
    url = models.CharField(max_length=500)
    project = models.ForeignKey(Project)
And my scrapy model:
from scrapy.contrib.djangoitem import DjangoItem
from app.models import Project, Creative
class ProjectItems(DjangoItem):
    django_model = Project
class CreativeItems(DjangoItem):
    django_model = Creative
So when I save:
creative["name"] = hxs.select('//*[@id="owner"]/text()').extract()[0]
picture = hxs.select('//*[@id="owner-icon"]/a/img/@src').extract()
if len(picture)>0:
    creative["picture"] = picture[0]
creative.save()
# Extract title and description of the project
project["title"] = hxs.select('//*[@id="project-title"]/text()').extract()[0]
description = hxs.select('//*[@class="project-description"]/text()').extract()
if len(description)>0:
    project["description"] = description[0]
project["creative"] = creative
project.save()
I get the error:
"Project.creative" must be a "Creative" instance.
So, how can I set a foreign key value in Scrapy?

This can be done by assigning the return value of creative.save() to project['creative']. In the following example, the djangoCreativeItem variable carries the saved instance over to the project:
creative["name"] = hxs.select('//*[@id="owner"]/text()').extract()[0]
picture = hxs.select('//*[@id="owner-icon"]/a/img/@src').extract()
if len(picture)>0:
    creative["picture"] = picture[0]
djangoCreativeItem = creative.save()
# Extract title and description of the project
project["title"] = hxs.select('//*[@id="project-title"]/text()').extract()[0]
description = hxs.select('//*[@class="project-description"]/text()').extract()
if len(description)>0:
    project["description"] = description[0]
project["creative"] = djangoCreativeItem
project.save()

As has been done here, put the ID of your creative directly in creative_id; I think it should work:
project["creative_id"] = creative.id
This sets the foreign key without needing the model object itself (in a Scrapy environment you don't touch the model objects directly).

Related

Django query based on another query results

I have 5 models in my simplified design:
class modelA(models.Model):
    name = models.CharField()
class modelUser(models.Model):
    username = models.CharField()
class bridge(models.Model):
    user = models.ForeignKey(modelUser, on_delete=models.CASCADE, related_name='bridges')
    modelA = models.ForeignKey(modelA, on_delete=models.CASCADE, related_name='bridges')
class subModelA(models.Model):
    modelA = models.ForeignKey(modelA, on_delete=models.CASCADE, related_name='subModelAs')
    value = models.IntegerField()
class subModelB(models.Model):
    modelA = models.ForeignKey(modelA, on_delete=models.CASCADE, related_name='subModelBs')
    text = models.TextField()
What I am trying to do is get all subModelBs and subModelAs that belong to the modelAs for which a given modelUser has a bridge.
I've started with this:
user = modelUser.objects.get(pk=1)
bridges = user.bridges.all()
What I've been thinking of is something like this:
subModelBs = subModelB.objects.filter(modelA__in=bridges__modelA)
but unfortunately it doesn't work; it fails with an error that __modelA is not defined.
Is there a proper way to do this?
First find the modelAs, then run two more queries:
modelAs = bridge.objects.filter(user__pk=1).values_list('modelA', flat=True)
subModelAs = subModelA.objects.filter(modelA__in=modelAs)
subModelBs = subModelB.objects.filter(modelA__in=modelAs)
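To picture what the two-step approach above asks the database to do: when a values_list queryset is passed to an __in lookup, Django embeds it as a subquery. Here is a rough sketch of that shape using only the standard-library sqlite3 module. The table and column names (model_a, bridge, sub_model_b, model_a_id) are made-up stand-ins, not the names Django would generate for these models.

```python
import sqlite3

# Hypothetical tables standing in for modelA, bridge and subModelB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bridge (id INTEGER PRIMARY KEY, user_id INTEGER, model_a_id INTEGER);
    CREATE TABLE sub_model_b (id INTEGER PRIMARY KEY, model_a_id INTEGER, text TEXT);
    INSERT INTO model_a VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO bridge VALUES (1, 1, 1), (2, 1, 3), (3, 2, 2);
    INSERT INTO sub_model_b VALUES
        (1, 1, 'b for first'), (2, 2, 'b for second'), (3, 3, 'b for third');
""")


def sub_model_b_texts_for_user(conn, user_id):
    """Return sub_model_b texts whose model_a is bridged to the given user."""
    rows = conn.execute(
        """
        SELECT text FROM sub_model_b
        WHERE model_a_id IN (SELECT model_a_id FROM bridge WHERE user_id = ?)
        ORDER BY id
        """,
        (user_id,),
    ).fetchall()
    return [text for (text,) in rows]


texts_for_user_1 = sub_model_b_texts_for_user(conn, 1)
print(texts_for_user_1)
```

User 1 is bridged to model_a rows 1 and 3, so only their sub_model_b rows come back. The same subquery would be reused for the subModelA lookup.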
A good question first of all!
Tried reproducing on my system, the following worked for me:
user = modelUser.objects.get(pk=1)
bridges = user.bridges.all()
subModelAs = subModelA.objects.filter(
    modelA_id__in=[x.modelA_id for x in bridges]
)
And similarly for subModelBs. Hope this helps you well.

Django Retrieve id and name based on a ForeignKey

I have the following models:
class Projects(models.Model):
    id = models.IntegerField(primary_key=True, blank=True)
    name = models.CharField(max_length=20, unique=True)
    company = models.CharField(max_length=20, blank=True, null=True)
    creation_date = models.DateField(auto_now_add=True, auto_now=False)
class Packages(models.Model):
    id = models.IntegerField(primary_key=True, blank=True)
    name = models.CharField(max_length=50)
    extension = models.CharField(max_length=3)
    gen_date = models.DateTimeField(auto_now_add=True, auto_now=False)
    project = models.ForeignKey(Projects)
In my views, for the Homepage function, I'm trying to display the last package AND the associated project. I don't understand how to retrieve the 'project' field (FK):
try:
    lastpackages = Packages.objects.reverse()[:1].get()
except Packages.DoesNotExist:
    lastpackages = None
projectid = lastpackages.select_related('project_id')
project = Projects.objects.get(id=lastpackages.project)
return render(request, 'homepage.html', {'lastpackages': lastpackages,
                                         'project': project})
In fact, I want to display the project name corresponding to the package retrieved by reverse(), but the projectid and project lines are not correct. I hope that's clear enough.
Sorry to say, but your code is a bit messy. You don't need to look up the project separately; the Django ORM does it for you:
package = Packages.objects.order_by('-id')[0]
project = package.project
package.project gives you the project associated with the package, so there is no need to query by id.
Some advice:
You don't need to define id; Django will do it for you.
Don't use the plural form in your model names; Django will pluralize them for you.
In a view it's usually good practice to use get_object_or_404 to fetch the object; it saves you the try/except block.
reverse() should be used together with order_by(). In your case it's easier to just order by id to find the last entry, because Django's default id is auto-incremented.
Try this:
lastpackage = Packages.objects.reverse()[0]
project = lastpackage.project
The first thing you need to keep in mind is that lastpackages is a Packages object, not a QuerySet, so this line is wrong:
projectid = lastpackages.select_related('project_id')
It should raise AttributeError: 'Packages' object has no attribute 'select_related'
About what you're asking, once you have a Packages object, you can get the corresponding project id like this:
lastpackages.project.pk
And the full Projects object, if needed:
lastpackages.project
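For intuition about what following the foreign key amounts to, here is a rough sketch with the standard-library sqlite3 module: take the newest package by id and join it to its project through the stored project_id. The table names (projects, packages) and the sample rows are invented for illustration and are not what Django would generate for these models.

```python
import sqlite3

# Hypothetical tables standing in for the Projects and Packages models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE packages (
        id INTEGER PRIMARY KEY,
        name TEXT,
        project_id INTEGER REFERENCES projects(id)
    );
    INSERT INTO projects VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO packages VALUES (1, 'pkg-old', 1), (2, 'pkg-new', 2);
""")


def latest_package_with_project(conn):
    """Return (package name, project id, project name) for the newest package.

    Returns None when the packages table is empty.
    """
    return conn.execute(
        """
        SELECT packages.name, projects.id, projects.name
        FROM packages
        JOIN projects ON projects.id = packages.project_id
        ORDER BY packages.id DESC
        LIMIT 1
        """
    ).fetchone()


latest = latest_package_with_project(conn)
print(latest)
```

The package row only stores project_id; the project's name comes from the joined row, which is the lookup the ORM performs when you access lastpackages.project.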

django change formfield data

I am working with django-taggit (https://github.com/alex/django-taggit). To let a user add tags, I use a form field whose value I convert into tags and add to the object.
However, when I load the template for editing, I get the taggit objects in my input field.
I want to convert those back into a normal readable string.
However, I can't seem to edit that field of the instance before passing it to my form.
The form:
class NewCampaignForm(forms.ModelForm):
    """ Create a new campaign and add the searchtags """
    queryset = Game.objects.all().order_by("name")
    game = forms.ModelChoiceField(queryset=queryset, required=True)
    tags = forms.CharField(required=False)
    focus = forms.ChoiceField(required=False, choices=Campaign.CHOICES)
    class Meta:
        model = Campaign
        fields = ["game", "tags", "focus"]
        exclude = ["owner"]
My model:
class Campaign(models.Model):
    """ campaign information """
    ROLEPLAY = "Roleplay"
    COMBAT = "Combat"
    BOTH = "Both"
    CHOICES = (
        (ROLEPLAY, "Roleplay"),
        (COMBAT, "Combat"),
        (BOTH, "Both"),
    )
    owner = models.ForeignKey(User)
    game = models.ForeignKey(Game)
    focus = models.CharField(max_length=15, choices=CHOICES)
    tags = TaggableManager()
The view:
def campaign_view(request, campaign_id):
    campaign = get_object_or_404(Campaign, pk=campaign_id)
    campaign.tags = "Some string"
    new_campaign_form = NewCampaignForm(instance=campaign)
But when I try this, I still get the taggit objects ([]) in my input field instead of "Some string".
How should I solve this?
I am not sure how I overlooked this, but this works:
new_campaign_form = NewCampaignForm(instance=campaign, initial={"tags": "Some String"})
Excuse me, I should have looked harder before asking.
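In practice the initial value would be built from the campaign's existing tags rather than a fixed string. A minimal sketch of that conversion, assuming each tag object exposes a name attribute (as django-taggit's Tag model does); the helper name tags_to_string and the SimpleNamespace stand-ins are made up for this example:

```python
from types import SimpleNamespace


def tags_to_string(tags):
    """Join the names of tag-like objects into a sorted, comma-separated string."""
    return ", ".join(sorted(tag.name for tag in tags))


# Stand-ins for tag objects; in the view this would be campaign.tags.all().
demo_tags = [SimpleNamespace(name="rpg"), SimpleNamespace(name="combat")]
initial_tags = tags_to_string(demo_tags)
print(initial_tags)  # combat, rpg
```

In the view this would be used as initial={"tags": tags_to_string(campaign.tags.all())}.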

Many to many. files is an invalid keyword argument for this function

I'm experimenting with relationships in Django.
models
class File(models.Model):
    name = models.CharField(max_length=255)
    src = models.FileField(upload_to="files")
class UserBuyFile(models.Model):
    user = models.ForeignKey(User)
    files = models.ManyToManyField(File)
views.py
def buy_file(request,id):
    f = File.objects.get(id=id)
    user_buy_file = UserBuyFile.objects.create(files=f,user=request.user)
I have this error:
'files' is an invalid keyword argument for this function
That's not how ManyToManyFields are populated. Create and save the model first, then use the manager on the field.
Try this:
def buy_file(request,id):
    f = File.objects.get(id=id)
    user_buy_file, dummy_created = UserBuyFile.objects.get_or_create(user=request.user)
    user_buy_file.files.add(f)
I also recommend you set unique=True for the field user in model UserBuyFile:
class UserBuyFile(models.Model):
    user = models.ForeignKey(User, unique=True)
    files = models.ManyToManyField(File)
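The reason the row has to exist before files can be added is that a many-to-many relation lives in a separate join table, which needs the primary keys of both sides. A rough sketch of that layout with the standard-library sqlite3 module; the table names (file, user_buy_file, user_buy_file_files) are illustrative stand-ins, not Django's generated names:

```python
import sqlite3

# Hypothetical tables: two entity tables plus the join table between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_buy_file (id INTEGER PRIMARY KEY, user_id INTEGER UNIQUE);
    CREATE TABLE user_buy_file_files (
        user_buy_file_id INTEGER REFERENCES user_buy_file(id),
        file_id INTEGER REFERENCES file(id),
        UNIQUE (user_buy_file_id, file_id)
    );
    INSERT INTO file VALUES (1, 'manual.pdf'), (2, 'song.mp3');
""")


def buy_file(conn, user_id, file_id):
    """Get or create the user's purchase row, then link the file to it."""
    row = conn.execute(
        "SELECT id FROM user_buy_file WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        cursor = conn.execute(
            "INSERT INTO user_buy_file (user_id) VALUES (?)", (user_id,)
        )
        purchase_id = cursor.lastrowid  # the id only exists after the insert
    else:
        purchase_id = row[0]
    # Linking the same file twice is ignored thanks to the UNIQUE constraint.
    conn.execute(
        "INSERT OR IGNORE INTO user_buy_file_files VALUES (?, ?)",
        (purchase_id, file_id),
    )
    return purchase_id


def files_for_user(conn, user_id):
    """Return the names of all files linked to the user's purchase row."""
    rows = conn.execute(
        """
        SELECT file.name FROM file
        JOIN user_buy_file_files AS link ON link.file_id = file.id
        JOIN user_buy_file AS purchase ON purchase.id = link.user_buy_file_id
        WHERE purchase.user_id = ?
        ORDER BY file.id
        """,
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


buy_file(conn, 7, 1)
buy_file(conn, 7, 2)
buy_file(conn, 7, 2)  # repeat purchase, no duplicate link
purchased = files_for_user(conn, 7)
print(purchased)
```

The join row cannot be written until user_buy_file has an id, which is why create(files=f, ...) is rejected and files.add(f) is called on an already saved object.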

Automatically update images

I'd like to implement a feature in an app of mine, but I don't know how to go about it. What I want is this: I have a model class that uses imagekit to save its images, and I'd like users to be able to update the images for the vehicles easily, without having to edit each vehicle record.
The way they'll do this is that there will be a folder named originals containing a folder for each vehicle in the format <stock_number>/PUBLIC. If a user moves images into the PUBLIC folder for a vehicle, then when the script is executed it will compare those images with the current ones and update them if those in the PUBLIC folder are newer. If the record has no images, they will be added. Also, if images have been deleted from the site_media directory, their links should be deleted from the database.
How can I go about this in an efficient way? My models are as below:
class Photo(ImageModel):
    name = models.CharField(max_length = 100)
    original_image = models.ImageField(upload_to = 'photos')
    num_views = models.PositiveIntegerField(editable = False, default=0)
    position = models.ForeignKey(PhotoPosition)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')
    class IKOptions:
        spec_module = 'vehicles.specs'
        cache_dir = 'photos'
        image_field = 'original_image'
        save_count_as = 'num_views'
class Vehicle(models.Model):
    objects = VehicleManager()
    stock_number = models.CharField(max_length=6, blank=False, unique=True)
    vin = models.CharField(max_length=17, blank=False)
    ....
    images = generic.GenericRelation('Photo', blank=True, null=True)
Progress Update
I've tried out the code, and while it works, I'm missing something: I can get the images, but after that they aren't transferred into the site_media/photos directory. Am I supposed to do this myself, or will imagekit do it automatically? I'm a bit confused.
I'm saving the photos like so:
Photo.objects.create(content_object = vehicle, object_id = vehicle.id,
                     original_image = file)
My advice is to run a Django script as a crontab job, say, every 5 minutes.
The script would dive into the image folders and compare the images with the records.
A simplified example:
# Set up the Django environment
from django.core.management import setup_environ
import settings
setup_environ(settings)
import os
from your_project.your_app.models import *
from datetime import datetime
vehicles_root = '/home/vehicles'
for stock_number in os.listdir(vehicles_root):
    cur_path = vehicles_root+'/'+stock_number
    if not os.path.isdir(cur_path):
        continue # skip non dirs
    for file in os.listdir(cur_path):
        if not os.path.isfile(cur_path+'/'+file):
            continue # skip non file
        ext = file.split('.')[-1]
        if ext.lower() not in ('png','gif','jpg',):
            continue # skip non image
        last_mod = os.stat(cur_path+'/'+file).st_mtime
        v = Vehicle.objects.get(stock_number=stock_number)
        # assumes Vehicle has a last_upd DateTimeField (not in the model shown above)
        if v.last_upd < datetime.fromtimestamp(last_mod):
            # do your magic here, move image, etc.
            v.last_upd = datetime.now()
            v.save()
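The scanning part can be tried out on its own, without Django. Below is a minimal standard-library sketch, assuming the <stock_number>/PUBLIC layout from the question. The function name find_updated_images and the last_updates mapping (stock number to the time its record was last updated) are hypothetical; in the real script that timestamp would come from the database.

```python
import os
import tempfile
from datetime import datetime

IMAGE_EXTENSIONS = {".png", ".gif", ".jpg"}


def find_updated_images(vehicles_root, last_updates):
    """Return {stock_number: [image paths]} for images newer than the record.

    Vehicles missing from last_updates are treated as never updated,
    so all of their images are reported.
    """
    updated = {}
    for stock_number in sorted(os.listdir(vehicles_root)):
        public_dir = os.path.join(vehicles_root, stock_number, "PUBLIC")
        if not os.path.isdir(public_dir):
            continue  # skip entries without a PUBLIC folder
        last_upd = last_updates.get(stock_number, datetime.min)
        for file_name in sorted(os.listdir(public_dir)):
            path = os.path.join(public_dir, file_name)
            ext = os.path.splitext(file_name)[1].lower()
            if not os.path.isfile(path) or ext not in IMAGE_EXTENSIONS:
                continue  # skip sub-folders and non-image files
            modified = datetime.fromtimestamp(os.path.getmtime(path))
            if modified > last_upd:
                updated.setdefault(stock_number, []).append(path)
    return updated


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for stock, name in [("A123", "front.jpg"), ("A123", "notes.txt"), ("B456", "side.png")]:
        folder = os.path.join(root, stock, "PUBLIC")
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("placeholder")
    # Pretend side.png was last modified in 2020, before B456's last update.
    old = datetime(2020, 1, 1).timestamp()
    os.utime(os.path.join(root, "B456", "PUBLIC", "side.png"), (old, old))
    found = find_updated_images(root, {"B456": datetime(2021, 1, 1)})
    demo_result = {
        stock: [os.path.basename(path) for path in paths]
        for stock, paths in found.items()
    }

print(demo_result)
```

Only A123's front.jpg is reported: notes.txt is not an image, and B456's side.png is older than that vehicle's last update. Keeping the scan separate from the database writes also makes it easier to test before wiring it into cron.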