Search & replace row by row, letter by letter in table using Python - python-2.7

I wrote a script for ArcMap in Python that takes a table with unsupported characters, romanizes (transliterates) the applicable fields, and creates a shapefile from any geographic information the table contains. I have determined that the rest of the code works; my main problem is searching letter by letter within each row of the input table, which I had working earlier but seem to have reverted to a broken version.
# Loop through each row in the copied table and replace each cell with one based on the reference table.
rows = access.SearchCursor(outtable, orfield)  # Creates a search cursor that looks in the outtable for what is in the native-language field.
row = rows.next()  # Establishes a variable that cycles row to row.
while row:  # Beginning of "while" loop.
    for r in row:  # Searches each letter within the searched row.
        for o in orfield:  # Searches each cell based on the searched field from the reference table.
            if r == o:  # Checks whether the letter being searched in the intable (r) is identical to the letter in the orthography field (o).
                r.replace(orfield, profield)  # Replaces any identical letters with the associated pronunciation.
            else:
                row.next()  # Cycles to the next row.
I feel like I'm missing something but am not quite sure what. If you need me to elaborate on what else is in my script, let me know. You don't necessarily need to write the script for me, but if there's a module or function I could use, let me know what it is and where I can read about it.

A little fuzzy on some of your details, but it appears the code is attempting to compare a Field data object (created with "for r in row") with the elements of some input set, which you seem to imply is a string. Aside from the type mismatch of Field vs. string, I believe a Row object is not iterable the way you've written it. You can get the fields with something like:
fldList = list()
for fld in arcpy.ListFields(r'C:\Workspace\somedata.shp'):
    fldList.append(fld.name)
and then iterate over fldList using something like:
for fld in fldList:
    fldValue = row.getValue(fld)
    # ...do some regex/string parsing
    # ...if the condition is met, use row.setValue(fld, newValue) and rows.updateRow(row)
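As a fuller illustration, here is a minimal sketch of that update-cursor approach using arcpy's classic cursors. The translit_map dict and the ORTH/PRON field names are hypothetical stand-ins for your reference table and fields. One key point: Python strings are immutable, so str.replace returns a new string that must be reassigned, which is one reason the r.replace(...) call in the question has no effect.

import arcpy

translit_map = {u'б': u'b', u'г': u'g'}  # hypothetical mapping: native character -> romanization

rows = arcpy.UpdateCursor(outtable)  # an update cursor, so edits can be written back
for row in rows:
    value = row.getValue('ORTH')  # read the native-language cell
    for orth, pron in translit_map.items():
        value = value.replace(orth, pron)  # replace() returns a new string; reassign it
    row.setValue('PRON', value)  # write the romanized text
    rows.updateRow(row)  # persist the change to this row
del row, rows  # release the cursor's locks on the data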

Related

Postgrest filter does not seem to work with fields from related table

I would like to retrieve records of kiscourse where the related joblist.job string value contains tech.
This returns the expected results:
/joblist?select=job,kiscourse_id(*)&limit=10&job=ilike.*tech*
This does not:
/kiscourse?select=*,joblist(*)&limit=10&joblist.job=ilike.*tech*
And according to https://postgrest.com/en/v4.1/api.html#embedded-filters-and-order, this seems to be the intended behavior:
> GET /films?select=*,roles(*)&roles.character=in.Chico,Harpo,Groucho HTTP/1.1
>
> Once again, this restricts the roles included to certain characters but does not filter the films in any way. Films without any of those characters would be included along with empty character lists.
Is there any way to accomplish the above (besides procedures)?
The joblist.job filter you have there affects only the entities on the second level; it does not apply to the first level.
The way to read this query
/kiscourse?select=*,joblist(*)&limit=10&joblist.job=ilike.*tech*
is this:
Give me 10 rows from kiscourse with all the columns, and for each row, I want the joblists for that row whose job matches *tech*.
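For what it's worth, later PostgREST releases (9.0 and up, so well past the v4.1 docs cited above) added inner-join embedding, which does filter the parent rows; a sketch of that syntax, assuming a version that supports it:

/kiscourse?select=*,joblist!inner(*)&joblist.job=ilike.*tech*

With !inner, kiscourse rows that have no matching joblist are dropped from the result instead of returned with an empty list.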

Can I iterate rows of a CSV to create keys paired with empty lists?

What I have:
A CSV which I populated with rows of text, one word per cell.
Micro level:
I am trying to create a dictionary where each row is a key and each key is assigned an empty list as its value (see below).
I can do this one row at a time by converting the list to a tuple -->
creating an empty list -->
adding the tuple to my dictionary as a key and assigning the empty list as its value.
However, I would like to do this in an automatic fashion as doing this individually is tedious.
Macro level:
I want to assign a list of keywords (tags) to each row in my CSV so I can call up the text later based on its tags.
My question:
Is there a way to do this the way I am describing?
Am I going about it wrong and should be doing this a different way?
*Edit: I am thinking that if I flip this around I could solve my overall issue.
For example, make x tags the keys of my tag dictionary and do a one-time pass to assign each key an empty value, then populate those with the text from my CSV.
This would not remove the one-by-one method; however, it would reduce the number of times I need to enter key/value pairs, as I am likely to have more text than tags.
See the code below:
#!python3
import csv
import os
import string

# Open the CSV and assign a variable to the list content
outputFile = open("output.csv", encoding="utf-8")
outputReader = csv.reader(outputFile)
data = list(outputReader)

# Get rid of empty cells
for row in data:
    while "" in row:
        row.remove("")

# Open a dictionary
tags = {}

# Turn the first row of the CSV into a tuple
article1 = tuple(data[1])

# Generate an empty list
article1_tags = []

# Assign the empty list as the value of the article1 key and put it in the tags dictionary
while True:
    if article1 in tags:
        break
    else:
        tags[article1] = article1_tags
Now that I have a bit more of an idea of what you are trying to achieve, I would suggest using a list of dictionaries, each dictionary containing the data about one article (or row from your CSV file). The key point here is that a CSV file is still a plain text file; there is nothing special about CSV. In fact I would avoid using Excel altogether and edit it with a text editor.
I would start by opening the file and reading each row (line) of the file into a key/value pair of a dictionary.
The nice thing about Python 3 is that you can do this very easily without extra modules.
csvfile = open('output.csv', encoding='utf-8')
articlelist = []
for line in csvfile:
    articlelist.append(dict(textkey=line, tagskey=[]))
Using the iterator 'line' in this context with a text-file stream object automatically goes row by row, taking all the text of each line as a single string. So line is a string object here.
Once you have a list of dictionaries like this, you can simply iterate through articlelist, printing it out, adding tags, or doing whatever you wish, even adding more key/value pairs to each dictionary. Doing it this way means that not all the dictionaries need to follow the same format (although that's desirable).
I added the tagskey key, whose value is an empty list that you can append to later.
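For example, tagging the first article later is then a one-liner (a small sketch using the key names from the snippet above; the tag text is arbitrary):

articlelist[0]['tagskey'].append('technology')  # attach a tag to the first article
print(articlelist[0]['textkey'], articlelist[0]['tagskey'])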
Do not use infinite while loops, or while loops at all, to go through lists and the like. Always use the
for iterator in theList:
pattern.
I would also look into using the JSON format for your little exercise here; I think it will lend itself much more nicely to what you are trying to achieve. With Python, JSON is very easily read and then output again, all as plain text. You could write a JSON text file, edit it manually, and then have Python read it back in and process it.
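For instance, the round trip might look like this (a sketch, reusing the articlelist built above and a hypothetical articles.json file):

import json

# write the list of article dicts out as plain-text JSON
with open('articles.json', 'w', encoding='utf-8') as f:
    json.dump(articlelist, f, ensure_ascii=False, indent=2)

# ...edit the file by hand if you like, then read it back in
with open('articles.json', encoding='utf-8') as f:
    articlelist = json.load(f)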
I hope this helps.

Filtering on the concatenation of two model fields in django

With the following Django model:
class Item(models.Model):
    name = models.CharField(max_length=256)
    description = models.TextField()
I need to formulate a filter method that takes a list of n words (word_list) and returns the queryset of Items where each word in word_list can be found, either in the name or the description.
To do this with a single field is straightforward enough. Using the reduce technique described here (this could also be done with a for loop), this looks like:
q = reduce(operator.and_, (Q(description__contains=word) for word in word_list))
Item.objects.filter(q)
I want to do the same thing but take into account that each word can appear either in the name or the description. I basically want to query the concatenation of the two fields, for each word. Can this be done?
I have read that there is a concatenation operator in Postgresql, || but I am not sure if this can be utilized somehow in django to achieve this end.
As a last resort, I can create a third column that contains the combination of the two fields and maintain it via post_save signal handlers and/or save method overrides, but I'm wondering whether I can do this on the fly without maintaining this type of "search index" type of column.
The most straightforward way would be to use Q to do an OR:
lookups = [Q(name__contains=word) | Q(description__contains=word)
           for word in word_list]
Item.objects.filter(*lookups)  # the same as AND'ing them together
I can't speak to the performance of this solution as compared to your other two options (raw SQL concatenation or denormalization), but it's definitely simpler.
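If you do want to query the concatenation itself, Django 1.8 and later can build it in the database with the Concat function; a sketch under that version assumption, with haystack as an arbitrary annotation name:

from functools import reduce
import operator

from django.db.models import Q, Value
from django.db.models.functions import Concat

# annotate each Item with a combined name-plus-description column, then search it
qs = Item.objects.annotate(haystack=Concat('name', Value(' '), 'description'))
q = reduce(operator.and_, (Q(haystack__contains=word) for word in word_list))
qs = qs.filter(q)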

How to create a filter from request.GET parameters? [duplicate]

This question already has an answer here:
Possible Duplicate:
filter using Q object with dynamic from user?
I am working on a filter feature in my app. I send a comma-separated string via jQuery to Django (within jQuery I replace each space with a +, so that it can be sent over the wire).
/?ajax&sales_item=t2,+t1
Now in the view when I retrieve the GET parameters, I can see Django has already replaced the + with a space, which is great. Then I split the keywords by comma and strip the whitespace.
sales_item_raw = request.GET['sales_item']
sales_item_keywords = sales_item_raw.split(',')
First I need to check whether the given names even exist as sales items. I have to use icontains, since there can be more than one matching sales item.
for item in sales_item_keywords:
    sales_items = profile.company.salesitem_set.filter(item_description__icontains=item.strip())
Last but not least the queryset is used to filter deals for the given sales_items:
deals_queryset = deals_queryset.filter(sales_item__in=sales_items)
If the user filters for only one keyword, that works fine; however, if there are two keywords, sales_items will obviously be overwritten on each loop iteration.
What is the most performant way to solve this? Should I just append the contents of sales_items to a list outside the loop on each iteration, and eventually pass the new list to the final deals_queryset.filter?
I am not sure that this is a good way to solve it...
Use Django's Q object to create "or" logic in your filter.
# create a chain of Qs, one for each item, and "or" them together
q_filters = Q(item_description__icontains=sales_item_keywords[0].strip())
for item in sales_item_keywords[1:]:
    q_filters = q_filters | Q(item_description__icontains=item.strip())

# do a single filter with the chained Qs
profile.company.salesitem_set.filter(q_filters)
This is ugly code, as I'm not sure how to handle the initial Q elegantly: I don't know what an "empty" Q would be, to which you could chain all the other Qs, including the first one. (I'm guessing you could use Q(pk=pk), but that's ugly in a different way.)
EDIT: Ignacio's link above shows the way, i.e.
q_filters = reduce(operator.or_, (Q(item_description__icontains=item.strip()) for item in sales_item_keywords))
profile.company.salesitem_set.filter(q_filters)
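Putting the pieces together, the whole view step might look like this (a sketch; filter_deals is a hypothetical wrapper, and the variable names follow the question):

import operator
from functools import reduce  # reduce is a builtin on Python 2, in functools on Python 3

from django.db.models import Q

def filter_deals(request, profile, deals_queryset):
    # split the comma-separated GET parameter into stripped keywords
    keywords = [kw.strip() for kw in request.GET['sales_item'].split(',')]
    # OR the keywords into a single Q, so one query hits the database
    q_filters = reduce(operator.or_,
                       (Q(item_description__icontains=kw) for kw in keywords))
    sales_items = profile.company.salesitem_set.filter(q_filters)
    return deals_queryset.filter(sales_item__in=sales_items)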

Word count query in Django

Given a model with both Boolean and TextField fields, I want to do a query that finds records that match some criteria AND have more than "n" words in the TextField. Is this possible? E.g.:
class Item(models.Model):
    ...
    notes = models.TextField(blank=True)
    has_media = models.BooleanField(default=False)
    completed = models.BooleanField(default=False)
    ...
This is easy:
items = Item.objects.filter(completed=True,has_media=True)
but how can I filter for a subset of those records where the "notes" field has more than, say, 25 words?
Try this:
Item.objects.extra(where=["LENGTH(notes) - LENGTH(REPLACE(notes, ' ', ''))+1 > %s"], params=[25])
This code uses Django's extra queryset method to add a custom WHERE clause. The calculation in the WHERE clause counts the occurrences of the space character, assuming that words are separated by exactly one space; adding one to the result accounts for the first word.
Of course, this calculation is only an approximation of the real word count, so if it has to be precise, I'd do the word count in Python.
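If precision matters, the Python-side fallback is short (a sketch over the queryset from the question, at the cost of pulling the matching rows into memory):

items = Item.objects.filter(completed=True, has_media=True)
# exact word count computed in Python rather than SQL
long_items = [item for item in items if len(item.notes.split()) > 25]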
I don't know what SQL would need to be run for the DB to do the work, which is really what we want, but you can monkey-patch around it.
Make an extra field named wordcount or something, then extend the save method so it counts all the words in notes before saving the model.
Then it is trivial to filter on, and there is no chance that this denormalization of the data will break, since the save method is always run on save.
There might be a better way, but if all else fails, this is what I would do.
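A sketch of that denormalization, with a hypothetical wordcount field kept in sync by the save method:

class Item(models.Model):
    notes = models.TextField(blank=True)
    has_media = models.BooleanField(default=False)
    completed = models.BooleanField(default=False)
    wordcount = models.IntegerField(default=0)  # hypothetical denormalized column

    def save(self, *args, **kwargs):
        # recount the words in notes on every save so the column never drifts
        self.wordcount = len(self.notes.split())
        super(Item, self).save(*args, **kwargs)

# the filter then stays entirely in the database:
Item.objects.filter(completed=True, has_media=True, wordcount__gt=25)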