Parse large file and paginate / load its parts with scrolling - django

I'm looking for suggestions on the most Django-like way to load a large variable (say, a massive 10,000-line list) part by part into the user's page, so that only some lines are displayed until the user asks for more.
This is a detailed scenario (I hope it makes sense to you. It is just a simple example to help dealing with large template variables and pagination):
User goes to website.com/searchfiles which is hosted on my Django backend and returned as a template
The searchfiles.html template contains one form with a Select drop-down menu that lets the user choose a file that already exists on the server (say there are 20 massive log files). Below the drop-down menu there is a text box where the user can enter a regular expression string. So there are only two items in the form.
P.S. Each file is usually pretty big e.g. 20-30MB
When user selects the file and enters regular expression in the text box and clicks on "Submit", HTTP POST is made
Django backend receives POST, reads the filename + regexp string and executes function dosearch(FILE, pattern)
dosearch function does something like this:
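import re

def dosearch(FILE, pattern):
    result = []
    # The with block closes the file even if matching raises an error
    with open(FILE, 'r') as fh:
        for line in fh:
            # re.match only matches at the start of each line
            if re.match(pattern, line):
                result.append(line)
    return result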
Now, result is a list that, depending on pattern, can be pretty large (e.g. 10-20MB). Processing of the file is completed now and I want to present user with "result" variable. After HTTP post, user is redirected to website.com/parsed.
As you can imagine, my goal in step 7 is to return this variable to the user after HTTP POST. But because "result" variable can be huge, I don't want to just dump, say, 10,000 lines of output directly to the page. What I want to achieve is the way that maybe first 200 lines are displayed, and as user is scrolling down, additional 200 lines are loaded once user reaches the bottom of the page.
To keep it simple, ignore the scroll part. User can be also presented with [NEXT] button to click and load additional 200 entries and so on.
What is the most Django way to achieve this? Do I need to save results variable to database and use Ajax?
Also assume that multiple users are going to use the very same page/website so I need to be able to distinguish between two users searching two different files at the same time.
When user navigates out, "result" variable that was generated should be destroyed from the memory.

I can think of two possibilities:
A. using a Model
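from django.contrib.auth.models import User
from django.db import models

class ResultLine(models.Model):
    # on_delete is a required argument since Django 2.0
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    sequence_number = models.IntegerField()
    line = models.CharField(max_length=1000)
    created_at = models.DateTimeField(auto_now_add=True)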
After parsing the file you would store each result line as an instance of this model, using sequence_number to specify the order of the lines.
In your result view you could use pagination or a generic ListView to show the first lines, or use AJAX to fetch more lines.
You would need to add a delete button to clear the user's data from this model, or run periodic jobs (maybe using crontab and a custom management command) to delete old result lines.
B. using session data
Another possibility would be to store the result in the user's session.
request.session['result_list'] = dosearch(FILE, pattern)
Depending on the session engine there could be size restrictions; this post states that the database-backed sessions are only limited by the database engine (which means you could store many MB or even GB of data in the session).
Also, your server needs sufficient RAM to hold the whole result list of multiple users.
Later, in your result view, you just read from the session instead of from a model.
Performance-wise there are differences: both approaches store the data in the database (with database-backed sessions), but option A allows you to do partial reads in your result view, while option B always reads the whole result list into memory on each request (because the whole session dict is stored in encoded format).
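The partial reads of option A can be sketched outside Django with the standard library's sqlite3 module. This is only an illustration under stated assumptions: the table name result_line, the helper names store_results and fetch_page, and the page size of 200 are all made up for the example, and in a real project the ResultLine model and Django's pagination would play these roles.

```python
import sqlite3

PAGE_SIZE = 200  # assumed page size, taken from the question's example


def store_results(conn, user_id, lines):
    """Store each matched line with its position (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result_line ("
        "user_id INTEGER, sequence_number INTEGER, line TEXT)"
    )
    conn.executemany(
        "INSERT INTO result_line VALUES (?, ?, ?)",
        [(user_id, i, line) for i, line in enumerate(lines)],
    )


def fetch_page(conn, user_id, page, page_size=PAGE_SIZE):
    """Read only one page of one user's results; pages start at 1."""
    rows = conn.execute(
        "SELECT line FROM result_line WHERE user_id = ? "
        "ORDER BY sequence_number LIMIT ? OFFSET ?",
        (user_id, page_size, (page - 1) * page_size),
    )
    return [row[0] for row in rows]


# Two users with separate result sets in an in-memory database
conn = sqlite3.connect(":memory:")
store_results(conn, 1, [f"line {i}" for i in range(450)])
store_results(conn, 2, ["other user"])

first_page = fetch_page(conn, 1, 1)  # only 200 rows leave the database
last_page = fetch_page(conn, 1, 3)   # the remaining 50 rows
```

Each request touches only the rows of the requested page, and filtering on the user ID keeps the results of simultaneous users apart.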

Related

Django: Correct way of passing data object from view to template to view?

From a template users can upload a csv file which gets parsed in
def parseCSV(request):
    # magic happens here (conforming date formats and all such fun things)
    # return column names to template
    ...
This view returns a list of columns and the user is asked to pick x columns to save.
The users choice is posted to
def saveCSV(request):
    # logic for saving
    ...
Now my question is, how do I most correctly handle the csv data object between view 1 and 2? Do I save it as a temporary file or do I send it back and forth view1->template->view2 as a data object? Or maybe some third option?
There is no "correct" way as it all depends on the concrete situation. In this case, it depends on the size of the data from the CSV file. Given that the data is rather large, the best approach is most likely to store the parsed data on the server, and then in the next request only send the user's selection of the full data set.
I would suggest parsing the data and storing it as a JSON blob in the database, so that you can easily retrieve it for the next request. This way you can send the user's selection of rows and columns (or "coordinates"), and save that as real data afterwards. The benefit of storing it right away is that the user can return to the process even after leaving the flow. The downside is that you save unused data if the user never completes the process, and you might need to clear this out later. If you store it in a table containing only temporary data, that should ease the cleaning process.
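A minimal sketch of that flow, using only the standard library: the helper names parse_csv_to_blob and select_columns and the sample data are invented for illustration, and where the blob is kept between the two requests (a model field, a temporary table) is left out.

```python
import csv
import io
import json


def parse_csv_to_blob(csv_text):
    """Parse CSV text and return (column names, JSON blob of all rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return reader.fieldnames, json.dumps(rows)


def select_columns(blob, chosen):
    """Keep only the columns the user picked from the stored blob."""
    return [{col: row[col] for col in chosen} for row in json.loads(blob)]


sample = "name,date,amount\nAnn,2020-01-01,10\nBob,2020-01-02,20\n"

# First request: parse, store the blob server-side, show the column names
columns, blob = parse_csv_to_blob(sample)

# Second request: only the user's column choice travels back to the server
picked = select_columns(blob, ["name", "amount"])
```

Only the list of chosen column names has to pass through the template; the parsed data itself stays on the server.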
I would like to parse the CSV file at the frontend and give an option to user to choose columns. After choosing columns, I would send these columns with value to Backend.

Mapping user spreadsheet columns to database fields

I’m not sure where to start on this project. I know how to read the contents of the excel spreadsheet, I know how to identify the header row, I know how to loop over the contents. I believe I have the UX portion worked out but I am not sure how to process the data.
I’ve googled and only found .Net solutions but I’m looking for a ColdFusion/Lucee solution.
I have a working form allowing me to map a user's spreadsheet column to my database values (this is being kept simple for this post; user does not have direct access to the database).
Now that I have my data, I'm not sure how to loop over the data results. I believe there will be several loops (an outer and an inner). Then of course I also need to loop over the file contents, but I think if I can get the headings mapped out, I can figure out the rest.
Any good links, tutorials, or guides would be greatly appreciated.
Some pseudo code might be enough to get me started.
User uploads form
System reads headers and content.
User is presented with a form listing the columns from their uploaded spreadsheet to match with available database fields (e.g. “column1” matches “customer name”).
User submits form.
Now what?
UPDATED
Here is what the data looks like AFTER the mapping has been done in my form. The column delimiter is the ::: and within the column the ||| indicates the ID associated with the selected column value. I've included the ID and the column value since I plan on displaying the mapping again as a confirmation. Having the ID saves a trip to the database.
If I understand correctly, your question is: how do you provide the user with a form that lets them map their spreadsheet columns to those of the database?
Since you have their spreadsheet column names, and you have the database column names, then this problem is essentially a UI/UX problem. You need to show both lists, and allow the user to map them. I can imagine several approaches to this. My first thought would be some sort of drag/drop operation, as follows:
Create a list of boxes, one for each field in your database table, and include the field name in (or above) the box. I'll call this the db field list. Then, create another list for each column from the spreadsheet, which I'll call the spreadsheet column list. The user would drag/drop items from the spreadsheet column list to the db field list.
When the user has completed a mapping, you would store the column/field names as data on the DOM element of the corresponding db field list box. Then, upon submission, you would collect the mapping data by visiting each box and adding its entry to an array. Finally, you would serialize that array into JSON and send it to your form submission handler.
This could be difficult or easy, depending on your knowledge of UI implementations using JavaScript. jQuery makes this easy (if you know jQuery). There's even a jquery UI plugin that does this: https://jqueryui.com/droppable/.
A quick search for javascript drag drop would help, and here are a few articles I found:
https://www.w3schools.com/html/html5_draganddrop.asp
https://medium.com/quick-code/simple-javascript-drag-drop-d044d8c5bed5
You would also need to submit the array of mappings using javascript. You could search for that as well, and here's an article I found:
https://codereview.stackexchange.com/questions/94493/submit-an-array-as-an-html-form-value-using-javascript
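Once the mapping arrives at the handler, the "now what?" step might look like the following. It is written in Python purely for illustration (the same two loops carry over to ColdFusion/Lucee), and the shape of the mapping JSON, the field names and the helper apply_mapping are assumptions rather than anything taken from the question.

```python
import json


def apply_mapping(mapping_json, header, rows):
    """Turn spreadsheet rows into dicts keyed by database field name."""
    # Assumed shape: [{"column": <spreadsheet column>, "field": <db field>}, ...]
    mapping = json.loads(mapping_json)
    # Outer step: find the position of each mapped spreadsheet column once
    positions = [(m["field"], header.index(m["column"])) for m in mapping]
    # Inner step: for every data row, pick out the mapped cells
    return [{field: row[pos] for field, pos in positions} for row in rows]


mapping_json = json.dumps([
    {"column": "column1", "field": "customer_name"},
    {"column": "column3", "field": "email"},
])
header = ["column1", "column2", "column3"]
rows = [
    ["Acme", "ignored", "info@acme.test"],
    ["Globex", "ignored", "hi@globex.test"],
]

records = apply_mapping(mapping_json, header, rows)
```

Each resulting record is keyed by database field name, so it can be handed to whatever insert routine the application already has; unmapped spreadsheet columns are simply dropped.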

Trying to minimize the number of trips to a database voting table

I use django 1.10.1, postgres 9.5 and redis.
I have a table that store users votes and looks like:
==========================
object | user | created_on
==========================
where object and user are foreign keys to the id column of their own tables respectively.
The problem is that in many situations, I have to list many objects in one page. If the user is logged in or authenticated, I have to check for every object whether it was voted or not (and act depending on the result, something like show vote or unvote button). So in my template I have to call such function for every object in the page.
def is_obj_voted(obj_id, usr_id):
return ObjVotes.objects.filter(object_id=obj_id, user_id=usr_id).exists()
Since I may have tens of objects on one page, I found, using django-debug-toolbar, that the database access alone could take more than one second, because each query fetches just one row and the queries run serially for all objects on the page. To make it worse, I use similar queries on that table in other pages (i.e. filtering by user only or by object only).
What I am trying to achieve, and what I think is the right thing to do, is to access the database just once to fetch all objects voted on by a given user (maybe when the user logs in, or at the first page hit requiring such database access), and then filter the result further depending on what the page needs. Since I use redis and the django-cacheops app, can they help me do that job?
In your case I'd rather get a list of object IDs and query all votes by the user's ID and this list, something like:
object_ids = [o.id for o in Object.objects.filter(YOUR CONDITIONS)]
# One query: the IDs of all listed objects this user has voted on
votes = set(v.object_id for v in ObjVotes.objects.filter(object_id__in=object_ids, user_id=usr_id))

def is_obj_voted(obj_id, votes):
    # Set membership test, no database access
    return obj_id in votes
This will make only one additional database query for getting votes by user per page.

Making a Row Read Only in a tabular form based on table value

I have a tabular form which is updated throughout the year, and I want to prevent users from editing certain rows. Currently the 'row type' is hard coded; however, I want the application admin to control which 'row types' are read-only or writable at certain times. This follows on from my earlier, answered question.
Currently a dynamic action is fired which prevents the rows that contain the type 'manager figure' and 'sales_target' from being edited.
I have created a table with the three row types against each customer. Each status is set by a number from 0 to 3 (I will decode these into something meaningful for users).
0 - Row with that row type is read only.
1 - Users can enter into the row with that row type.
2 - row is read only with that row type.
3 - row is complete and set to read only.
I have created a new form (new tab) for the admin user to maintain each status.
Currently for Customer 'Big Toy Store' rows should be set as follows:
Manager Figure row should be read only (since set to 2)
Sales should be read only (since set to 0)
Sales target should be writable (since set to 1)
Please can I be pointed in the right direction? I've looked into jQuery but am struggling to work out how to pass the output of an SQL query to it, so it can be used to determine which rows should be read only.
Link:apex.oracle.com
workspace: apps2
user: developer.user
password: DynamicAction
application name: Application 71656 Read only Rows for Tabular Form
I'm not sure that a tabular form is a good fit for this idea. As you can see, you need quite a bit of JavaScript to produce the results you want. Not only that, but this is all client side too, so there are security risks to take into account. After all, I could just open Firebug and disable or revert everything you did, and even change the numbers. Security matters especially with sales figures, which you most definitely do not want altered by just anybody, and which are at the heart of your question.
There are more elegant ways for you to control this, not least because they reduce the amount of highly customized JavaScript code. For example, you could do away with the tabular form and instead implement a modal popup from an interactive report. Since the modal popup would be an iframe and thus a different page, you can create a form page. On a form page you have a lot more control over what happens to certain elements. You can specify conditions, read-only conditions, or use authorization schemes: all things you cannot readily use in a tabular form.
I'd think you'd do yourself a service by thinking this over again, and explore a different option. How much of a dealbreaker is using a tabular form actually?
You need the user. You need to know what group he belongs to, and then this has to be checked against the different statuses, and rows have to be enabled or disabled accordingly. Do you really want this to happen on the client side?
I'm not saying it can't be done in a tabular form and javascript. It can, I'm just really doubting this is the correct approach!

Django create several records at a time

I want to be able to create as many records as a user wants for a database table in a single form.
For example, there will be some inputs for the data required for a record and at the end of the line a "+" button that would make a new line of inputs appear. There should be no limit to the number of lines and when the user clicks on the single submit button, all of the records would be inserted.
Thing is : I don't know how to make a new line appear dynamically, I suppose I have to use jquery for that but I'm kind of a newbie :)
And I don't know how I can iterate through all the lines dynamically added.
If someone can point to an example or something, it'd save me a lot of hair pulling !
EDIT :
By following this blog post I managed to do that. I have one last problem: when I try to insert several records at one time, it keeps the last one fine, but the previous ones are considered empty. It tells me that the fields are required; I fill them in and click on save, and only then does it save them all right.
Maybe I'll ask a new question for this!
Start here: https://docs.djangoproject.com/en/1.3/topics/forms/modelforms/#model-formsets
"I suppose I have to use jquery for that"
That can also work.
"And I don't know how I can iterate through all the lines dynamically added."
You'll get all the fields of the form (all of them) in your request.POST object. If you use a formset, it will largely be handled by the form's clean() and save() methods.
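To make the iteration concrete, here is a sketch of how the flat POST keys of a formset can be grouped back into rows. It relies on Django's default key naming (form-0-name, form-1-name, plus the form-TOTAL_FORMS counter from the management form); the helper rows_from_post, the plain dict standing in for request.POST, and the field names name and qty are made up for illustration. A real formset does this for you.

```python
import re


def rows_from_post(post, prefix="form"):
    """Group flat keys such as 'form-0-name' into one dict per row."""
    total = int(post[f"{prefix}-TOTAL_FORMS"])
    rows = [{} for _ in range(total)]
    key_re = re.compile(rf"^{re.escape(prefix)}-(\d+)-(.+)$")
    for key, value in post.items():
        match = key_re.match(key)
        # Rows whose index is not below TOTAL_FORMS are ignored
        if match and int(match.group(1)) < total:
            rows[int(match.group(1))][match.group(2)] = value
    return rows


# A plain dict standing in for request.POST after two lines were filled in
post = {
    "form-TOTAL_FORMS": "2",
    "form-INITIAL_FORMS": "0",
    "form-0-name": "first",
    "form-0-qty": "1",
    "form-1-name": "second",
    "form-1-qty": "5",
}

rows = rows_from_post(post)
```

If lines are added with JavaScript, each new line needs its own index in the input names, and the TOTAL_FORMS counter should be kept in step, since a formset only looks at that many rows.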