How can I loop through the request data and post each item as one row in the database? The user can submit multiple descriptions, lengths and so on. The problem is that the DB ends up with a massive number of rows before it reaches the correctly formatted last one (A1), whereas the user could submit A1,1,1,1,1; A2,2,2,8,100 and so on, since it is a dynamic add form.
descriptions = request.POST.getlist('description')
lengths = request.POST.getlist('lengthx')
widths = request.POST.getlist('widthx')
depths = request.POST.getlist('depthx')
quantitys = request.POST.getlist('qtyx')
for description in descriptions:
    for lengt in lengths:
        for width in widths:
            for depth in depths:
                for quantity in quantitys:
                    newquoteitem = QuoteItem.objects.create(
bottom entry is correct
post system

First solution
Use formsets. That is exactly what they are meant to handle.
Second solution
descriptions = request.POST.getlist('description') returns a list of all descriptions. Say there are 5: the outer loop iterates 5 times. lengths = request.POST.getlist('lengthx') is a list of all lengths, again 5 of them, and since that loop is nested inside the descriptions loop, its body runs 5 x 5 = 25 times. With all five loops nested, the create call runs 5^5 = 3125 times.
So, although I still think formsets are the way to go, you can try the following:
descriptions = request.POST.getlist('description')
lengths = request.POST.getlist('lengthx')
widths = request.POST.getlist('widthx')
depths = request.POST.getlist('depthx')
quantitys = request.POST.getlist('qtyx')
for i in range(len(descriptions)):
    newquoteitem = QuoteItem.objects.create(
Here, if there are 5 descriptions, then len(descriptions) will be 5, and there is one loop, which will iterate 5 times in total.
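As a rough sketch of the same single-loop idea, zip() can walk the parallel lists in lockstep so that each submitted row is assembled exactly once. The function name build_rows and the plain dict standing in for request.POST are assumptions made for illustration; in a real view each resulting dict would be passed to the model's create call, with the keys mapped to the model's actual field names.

```python
def build_rows(post_lists):
    """Combine parallel form lists into one dict per submitted row."""
    fields = ["description", "lengthx", "widthx", "depthx", "qtyx"]
    columns = [post_lists[name] for name in fields]
    # zip() pairs the first values together, then the second values, and so on,
    # so N submitted rows produce N dicts rather than N**5.
    return [dict(zip(fields, values)) for values in zip(*columns)]


# Stand-in for the lists returned by request.POST.getlist(...)
submitted = {
    "description": ["A1", "A2"],
    "lengthx": ["1", "2"],
    "widthx": ["1", "2"],
    "depthx": ["1", "8"],
    "qtyx": ["1", "100"],
}
rows = build_rows(submitted)
```

Note that zip() stops at the shortest list, so a form that submits fewer values for one field would silently drop trailing rows.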


How to filter a Django queryset, after it is modified in a loop

I have recently been working with Django, and it has been confusing me a lot (although I also like it).
The problem I am facing right now: when I modify the queryset inside a loop, .filter stops working on the next iteration.
So let's take the following simplified example:
I have a dictionary that is made from the queryset like this
animal_dict = {chicken: 6,
cows: 7,
fish: 1,
sheep: 2}
The queryset is called self.animals
for key in dict:
    if dict[key] < 3:
        remove_animal = max(dict, key=dict.get)
        remove = self.animals.filter(animal = remove_animal)[-2:]
        self.animals = self.animals.difference(remove)
        dict[remove_animal] = dict[remove_animal] - 2
What I am trying to do is as follows: my goal is a balance among the animals. Since there are not enough fish, 2 of the animals with the highest n have to go (cows). Then in the second loop, since there are not enough sheep, 2 of the animals with the highest n have to go again (chicken).
The first time it loops (with fish), the .filter does exactly what it should. However, on the second loop (for sheep), remove = self.animals.filter(animal = remove_animal)[-2:] gives me output that does not match the animal = filter. When I print remove in the second loop, it returns a list of all different animals (instead of just one kind).
After the loops, the dict should look like this: {chicken: 4,
cows: 5,
fish: 1,
sheep: 2}
This is because cows go down by 2 first, as they are the max, and then chicken goes down by 2, as it is then the max.
I am definitely missing some Django logic here, but to me this seems very strange. I hope the question is well-understood, else happy to clarify further.
As others have pointed out, every time you call self.animals.filter you are making a request to the database, and this should not be done in a loop.
It isn't very clear what you are trying to achieve, but it seems like you want the number of each type of animal to be (almost?) the same number, and that the only operation you can perform on it is reducing the number of animals.
It's always better to avoid loops if you can.
If you want them all to be the same number, and you can only reduce the number of animals you have, then the best solution would be
fewest_number = min(self.animals.values())
self.animals = {animal: fewest_number for animal in self.animals.keys()}
If you want to allow a tolerance of, say, +1 above the fewest animal (capping each count rather than raising any of them):
max_allowed = min(self.animals.values()) + 1
self.animals = {animal: min(count, max_allowed) for animal, count in self.animals.items()}
If you can increase the number of each type of animal, then you could find the average:
average_number = sum(self.animals.values()) / len(self.animals)
self.animals = {animal: average_number for animal in self.animals.keys()}
Answering my own question here. Apparently it didn't work because only count(), order_by(), values(), values_list() and slicing are allowed on a union queryset. You can't filter a union queryset, and the same applies to .difference().
More about this here: Django: Filter a Queryset made of unions not working
Because the first loop turns self.animals into a combined queryset, the filter call in the second loop simply doesn't work. The weird thing is that this doesn't raise an error; it just doesn't filter, which makes it hard to detect.
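One way around this, sketched here in plain Python rather than against a real model, is to collect the primary keys to drop during the loop instead of calling .difference() each time. The function rebalance, its parameters, and the (pk, animal) tuples standing in for queryset rows are all assumptions for illustration. In Django the collected keys could then be applied once with something like self.animals.exclude(pk__in=removed_pks), which stays an ordinary queryset that can still be filtered.

```python
def rebalance(records, threshold=3, step=2):
    """records: list of (pk, animal) tuples standing in for queryset rows."""
    counts = {}
    for _, animal in records:
        counts[animal] = counts.get(animal, 0) + 1

    removed_pks = set()
    for animal in list(counts):
        if counts[animal] < threshold:
            largest = max(counts, key=counts.get)
            # Take the last `step` rows of the largest group that are still present.
            candidates = [pk for pk, a in records
                          if a == largest and pk not in removed_pks]
            removed_pks.update(candidates[-step:])
            counts[largest] -= step

    remaining = [row for row in records if row[0] not in removed_pks]
    return counts, remaining


# Build the example from the question: 6 chicken, 7 cows, 1 fish, 2 sheep.
names = ["chicken"] * 6 + ["cows"] * 7 + ["fish"] * 1 + ["sheep"] * 2
records = list(enumerate(names, start=1))
counts, remaining = rebalance(records)
```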

REST Api pagination Loop... Power Query M language

I am wondering if anyone can help me with API pagination. I am trying to get all records from an external API, but it returns a maximum of 10 per request. There are around 40k records.
The API response also does not include the number of pages (response below), so I can't get my head around a solution.
There is no "skip", "count" or "top" parameter supported either. I am stuck, and I don't know how to write a loop in M that runs until all records are fetched. Can someone help me write the code, or show what it could look like?
Below is my code.
Source = Json.Document(
RelativePath ="Search",
Headers =
#"Content-Type" = "application/json"
[key="status", operator="EqualTo", value="Active", resultType="Full"]
and below is output
"data": {
"totalCount": 6705,
"page": 1,
"pageSize": 10,
This might help you along your way. While I was looking into something similar for working with Jira, I found some helpful info from two individuals in the Atlassian Community site. Below is what I think might be a relevant snippet from a query I developed with the assistance of their posts. (To be clear this snippet is their code, which I used in my query.) While I'm providing more of the query (the segment of which is also comprised of their code) below, I think the key part that relates to your particular issue is this part.
yourJiraInstance = "",
Source = Json.Document(Web.Contents(yourJiraInstance, [Query=[maxResults="100",startAt="0"]])),
totalIssuesCount = Source[total],
// Now it is time to build a list of startAt values, starting on 0, incrementing 100 per item
startAtList = List.Generate(()=>0, each _ < totalIssuesCount, each _ +100),
urlList = List.Transform(startAtList, each Json.Document(Web.Contents(yourJiraInstance, [Query=[maxResults="100",startAt=Text.From(_)]]))),
// ===== Consolidate records into a single list ======
// so we have all the records in data, but it is in a bunch of lists each 100 records
// long. The issues will be more useful to us if they're consolidated into one long list
I'm thinking that maybe you could try substituting pageSize for maxResults and totalCount for totalIssuesCount. I don't know about startAt. There must be something similar available to you. Who knows? It could actually be startAt. I believe your pageSize would be 10 and you would increment your startAt by 10 instead of 100.
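The underlying idea (read the total from the first response, derive the number of pages, then request each page) does not depend on M. Below is a minimal sketch of it in Python, not a drop-in Power Query solution. The fetch_page callable, the "items" key, and the assumption that the API accepts a page number are all hypothetical, since the response shown in the question is truncated; only totalCount, page and pageSize come from it.

```python
import math


def fetch_all(fetch_page, page_size=10):
    """Collect every record by walking pages 1..N, where N comes from totalCount."""
    first = fetch_page(1)
    total = first["data"]["totalCount"]
    page_count = math.ceil(total / page_size)
    records = list(first["data"]["items"])  # "items" is an assumed key name
    for page in range(2, page_count + 1):
        records.extend(fetch_page(page)["data"]["items"])
    return records


def fake_fetch_page(page, _rows=tuple(range(25)), _size=10):
    """Stand-in for the real HTTP call, serving 25 fake records 10 at a time."""
    start = (page - 1) * _size
    return {"data": {"totalCount": len(_rows), "page": page, "pageSize": _size,
                     "items": list(_rows[start:start + _size])}}


all_rows = fetch_all(fake_fetch_page)
```

In M, the same page list could be produced with List.Generate, as in the startAtList line above, using page numbers instead of startAt offsets.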
This is from Nick's and Tiago's posts on this thread. I think the only real difference may be that I buffered a table. (It's been a while and I did not dig into their thread and compare it for this answer.)
// I must credit the first part of this code -- the part between the ********** lines -- as being from Nick Cerneaz (and Tiago Machado) from their posts on this thread:
// **********
yourJiraInstance = "",
Source = Json.Document(Web.Contents(yourJiraInstance, [Query=[maxResults="100",startAt="0"]])),
totalIssuesCount = Source[total],
// Now it is time to build a list of startAt values, starting on 0, incrementing 100 per item
startAtList = List.Generate(()=>0, each _ < totalIssuesCount, each _ +100),
urlList = List.Transform(startAtList, each Json.Document(Web.Contents(yourJiraInstance, [Query=[maxResults="100",startAt=Text.From(_)]]))),
// ===== Consolidate records into a single list ======
// so we have all the records in data, but it is in a bunch of lists each 100 records
// long. The issues will be more useful to us if they're consolidated into one long list
// In essence we need extract the separate lists of issues in each data{i}[issues] for 0<=i<#"total"
// and concatenate those into single list of issues .. from which then we can analyse
// to figure this out I found this post particularly helpful (thanks Vitaly!):
// so first create a single list that has as its members each sub-list of the issues,
// 100 in each except for the last one that will have just the residual list.
// So iLL is a List of Lists (of issues):
iLL = List.Generate(
    () => [i=-1, iL={} ],
    each [i] < List.Count(urlList),
    each [
        i = [i]+1,
        iL = urlList{i}[issues]
    ],
    each [iL]
),
// and finally, collapse that list of lists into just a single list (of issues)
issues = List.Combine(iLL),
// Convert the list of issues records into a table
#"Converted to table" = Table.Buffer(Table.FromList(issues, Splitter.SplitByNothing(), null, null, ExtraValues.Error)),
// **********

python performance issue while searching in a huge list

I need to dramatically speed up the search in a "huge" one-dimensional list of unsigned values. The list has 389,114 elements, and before adding an item I need to check that it doesn't already exist.
I do this check 15 million times...
Of course, it takes too much time.
The fastest way I found was :
if this_item in my_list:
    i = my_list.index(this_item)
else:
    i = len(my_list)
I am building a dataset from time series logs
One column of these (huge) logs is a text message, which is very redundant.
To dramatically speed up the process, I transform this text into an unsigned integer with Adler32() and get a unique numeric value, which is great.
Then I store the messages in a PostgreSQL database, with this value as the index.
For each line of my log files (15 million altogether), I need to update my database of unique messages (389,114 unique messages).
This means that for each line, I need to check whether the message ID belongs to my in-memory list.
I tried "... in list", the same with dictionaries, numpy arrays, converting the list to a string and searching it, and an SQL query against the database with a good index...
Nothing was better than "if item in list" when the list is loaded into memory (very fast).
if this_item in my_list:
    i = my_list.index(this_item)
else:
    i = len(my_list)
For 15 million iterations with some processing and NO search in the list:
- It takes 8 minutes to generate 2 tables of 15 million lines (features and targets)
- When I activate the code above to check whether a message ID already exists, it takes 1 hour 35 min...
How could I optimize this?
Thank you for your help
If your code is, roughly, this:
my_list = []
for this_item in collection:
    if this_item in my_list:
        i = my_list.index(this_item)
    else:
        i = len(my_list)
        my_list.append(this_item)
Then it will run in O(n^2) time since the in operator for lists is O(n).
You can achieve linear time if you use a dictionary (which is implemented with a hash table) instead:
my_list = []
table = {}
for this_item in collection:
    i = table.get(this_item)
    if i is None:
        i = len(my_list)
        my_list.append(this_item)  # keep the list in step with the table
        table[this_item] = i
Of course, if you don't care about processing the items in the original order, you can just do:
for i, this_item in enumerate(set(collection)):
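The dictionary approach above can be wrapped up as a small self-contained sketch. The function name index_items and the extra indices list (one index per input item) are additions made for illustration, not part of the original answer.

```python
def index_items(collection):
    """Assign each distinct item a stable index in first-seen order."""
    unique_items = []   # position i holds the item whose index is i
    position = {}       # item -> index, giving O(1) average lookups
    indices = []        # index assigned to each input item, in input order
    for item in collection:
        i = position.get(item)
        if i is None:
            i = len(unique_items)
            unique_items.append(item)
            position[item] = i
        indices.append(i)
    return unique_items, indices


unique_items, indices = index_items([7, 3, 7, 9, 3])
```

Each lookup is a hash-table probe rather than a scan, so the total work grows linearly with the number of log lines instead of with lines times unique messages.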

What's slowing down this piece of python code?

I have been trying to implement the Stupid Backoff language model (the description is available here, though I believe the details are not relevant to the question).
The thing is, the code's working and producing the result that is expected, but works slower than I expected. I figured out the part that was slowing down everything is here (and NOT in the training part):
def compute_score(self, sentence):
    length = len(sentence)
    assert length <= self.n
    if length == 1:
        word = tuple(sentence)
        return float(self.ngrams[length][word]) / self.total_words
    words = tuple(sentence[::-1])
    count = self.ngrams[length][words]
    if count == 0:
        return self.alpha * self.compute_score(sentence[1:])
    return float(count) / self.ngrams[length - 1][words[:-1]]

def score(self, sentence):
    """ Takes a list of strings as argument and returns the log-probability of the
    sentence using your language model. Use whatever data you computed in train() here.
    """
    output = 0.0
    length = len(sentence)
    for idx in range(length):
        if idx < self.n - 1:
            current_score = self.compute_score(sentence[:idx+1])
        else:
            current_score = self.compute_score(sentence[idx-self.n+1:idx+1])
        output += math.log(current_score)
    return output
self.ngrams is a nested dictionary that has n entries. Each of these entries is a dictionary of the form (word_i, word_i-1, word_i-2.... word_i-n) : the count of this combination.
self.alpha is a constant that defines the penalty for backing off from order n to order n-1.
self.n is the maximum length of the tuple that the program looks up in the dictionary self.ngrams. It is set to 3 (though setting it to 2 or even 1 doesn't change anything). It's weird, because the Unigram and Bigram models work just fine in fractions of a second.
The answer that I am looking for is not a refactored version of my own code, but rather a tip which part of it is the most computationally expensive (so that I could figure out myself how to rewrite it and get the most educational profit from solving this problem).
Please, be patient, I am but a beginner (two months into the world of programming). Thanks.
I timed the running time with the same data using time.time():
Unigram = 1.9
Bigram = 3.2
Stupid Backoff (n=2) = 15.3
Stupid Backoff (n=3) = 21.6
(It's on some bigger data than originally because of time.time's bad precision.)
If the sentence is very long, most of the code that's actually running is here:
def score(self, sentence):
for idx in range(len(sentence)): # should use xrange in Python 2!
def compute_score(self, sentence):
words = tuple(sentence[::-1])
count = self.ngrams[len(sentence)][words]
if count == 0:
self.ngrams[len(sentence) - 1][words[:-1]]
That's not meant to be working code--it just removes the unimportant parts.
The flow in the critical path is therefore:
For each word in the sentence:
- Call compute_score() on that word plus the preceding 2. This creates a new list of length 3. You could avoid that with itertools.islice().
- Construct a 3-tuple with the words reversed. This creates a new tuple. You could avoid that by passing the -1 step argument when making the slice outside this function.
- Look up in self.ngrams, a nested dict, with the first key being a number (might be faster if this level were a list; there are only three keys anyway?), and the second being the tuple just created.
- Either recurse with the first word removed, i.e. make a new tuple (sentence[2], sentence[1]), or
- do another lookup in self.ngrams, implicitly creating another new tuple (words[:-1]).
In summary, I think the biggest problem you have is the repeated and nested creation and destruction of lists and tuples.
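To illustrate the "-1 step" suggestion without rewriting the model, here is a small sketch that builds the reversed lookup key in a single slice of a tuple created once up front. The helper name reversed_window and the sample sentence are assumptions for illustration; whether this removes enough allocation to matter would need to be measured on the real data.

```python
def reversed_window(words, idx, n):
    """Return (words[idx], words[idx-1], ...) with at most n items, as a tuple.

    `words` must already be a tuple, so the slice is itself a tuple and no
    intermediate list is created.
    """
    stop = idx - n
    # A negative stop would count from the end of the tuple, so use None
    # ("run to the start") when the window reaches past index 0.
    return words[idx:(stop if stop >= 0 else None):-1]


sentence = ["the", "cat", "sat", "on", "the", "mat"]
words = tuple(sentence)  # convert once, outside the scoring loop
n = 3
keys = [reversed_window(words, idx, n) for idx in range(len(words))]
```

Each key equals what tuple(sentence[...][::-1]) produces in compute_score, but is created with one slice instead of a list slice, a reversed copy and a tuple conversion.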

How to get 3 unique values using random.randint() in python?

I am trying to populate a list in Python 3 with 3 random items read from a file using a regex, but I keep getting duplicate items in the list.
Here is an example.
import re
import random as rn
data = '/root/Desktop/Selenium[FILTERED].log'
with open(data, 'r') as inFile:
    index =
URLS = re.findall(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', index)
list_0 = []
for i in range(3):
    list_0.append(URLS[rn.randint(1, 30)])
for i in range(len(list_0)):
What would be the cleanest way to prevent duplicate items being appended to the list?
This is the code that I think has done the job quite well.
def random_sample(data):
    # raw strings so the regex backslashes are not treated as string escapes
    r_e = [r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', r'..']
    with open(data, 'r') as inFile:
        urls = re.findall(r'%s' % r_e[0],
    x = list(set(urls))
    return x
data = '/root/Desktop/[TEMP].log'
sample = random_sample(data)
for i in range(3):
Unordered collection with no duplicate entries.
Use the builtin random.sample.
random.sample(population, k)
Return a k length list of unique elements chosen from the population sequence or set.
Used for random sampling without replacement.
After seeing your edit, it looks like you've made things much harder than they have to be. I've hardwired a list of URLs in the following, but the source doesn't matter. Selecting the (guaranteed unique) subset is essentially a one-liner with random.sample:
import random
# the following two lines are easily replaced
URLS = ['url1', 'url2', 'url3', 'url4', 'url5', 'url6', 'url7', 'url8']
SUBSET_SIZE = 3
# the following one-liner yields the randomized subset as a list
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
print(urlList) # produces, e.g., => ['url7', 'url3', 'url4']
Note that by using len(URLS) and SUBSET_SIZE, the one-liner that does the work is not hardwired to the size of the set nor the desired subset size.
Addendum 2
If the original list of inputs contains duplicate values, the following slight modification will fix things for you:
URLS = list(set(URLS)) # this converts to a set for uniqueness, then back for indexing
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
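Or more directly, sampling the de-duplicated values themselves instead of their indices. random.sample needs a sequence: passing a set was deprecated in Python 3.9 and raises TypeError from 3.11, so convert to a list first:
URLS = list(set(URLS))
urlList = random.sample(URLS, SUBSET_SIZE)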
seen = set(list_0)
randValue = URLS[rn.randint(1, 30)]
# [...]
if randValue not in seen:
Now you just need to check whether the size of list_0 has reached 3 to stop the loop.
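Filled out as a minimal sketch, the seen-set loop might look like the following. The function name pick_unique and the guard against asking for more values than exist are assumptions added for illustration. It uses randrange(len(urls)) so that every index, including 0, can be drawn; randint(1, 30) in the question includes both endpoints, so it never picks index 0 and fails with IndexError when it draws 30 on a list of 30 or fewer items.

```python
import random


def pick_unique(urls, k):
    """Draw random items from urls until k distinct values have been collected."""
    if k > len(set(urls)):
        raise ValueError("not enough distinct values to choose from")
    chosen = []
    seen = set()
    while len(chosen) < k:
        candidate = urls[random.randrange(len(urls))]
        if candidate not in seen:
            seen.add(candidate)
            chosen.append(candidate)
    return chosen


urls = ["url1", "url2", "url2", "url3", "url3", "url4"]
picks = pick_unique(urls, 3)
```

For most cases random.sample on the de-duplicated list is shorter and avoids the retry loop; this version is mainly useful when the draw has to stay a loop for other reasons.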