Django: Correct way of passing data object from view to template to view?

From a template, users can upload a CSV file, which gets parsed in
def parseCSV(request):
    # magic happens here (conforming date formats and all such fun things)
    # return column names to template
This view returns a list of columns and the user is asked to pick x columns to save.
The user's choice is posted to
def saveCSV(request):
    # logic for saving
Now my question is: how do I most correctly handle the CSV data object between view 1 and view 2? Do I save it as a temporary file, or do I send it back and forth view1 -> template -> view2 as a data object? Or maybe something else entirely?

There is no "correct" way as it all depends on the concrete situation. In this case, it depends on the size of the data from the CSV file. Given that the data is rather large, the best approach is most likely to store the parsed data on the server, and then in the next request only send the user's selection of the full data set.
I would suggest parsing the data and storing it as a JSON blob in the database, so that you can easily retrieve it on the next request. This way you can send just the user's selection of rows and columns (or "coordinates"), and save that as real data afterwards. The benefit of storing it right away is that the user can return to the process even after leaving the flow. The downside, though, is that you store unused data if the user never completes the process, and you may need to clean it up later. If you store it in a table containing only temporary data, that should ease the cleanup process.
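For illustration, here is a minimal sketch of that approach (my own assumption, not the only way to wire it up), using a hypothetical TemporaryUpload model with a JSONField (Django 3.1+) and made-up template and form field names:

import csv
import io

from django.conf import settings
from django.db import models
from django.shortcuts import redirect, render


class TemporaryUpload(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    data = models.JSONField()  # parsed rows stored as a list of dicts
    created_at = models.DateTimeField(auto_now_add=True)


def parseCSV(request):
    # 'file', 'pick_columns.html' and 'done' are illustrative names
    reader = csv.DictReader(io.TextIOWrapper(request.FILES['file'], encoding='utf-8'))
    rows = list(reader)  # conform date formats and such here
    upload = TemporaryUpload.objects.create(user=request.user, data=rows)
    return render(request, 'pick_columns.html',
                  {'upload_id': upload.pk, 'columns': reader.fieldnames})


def saveCSV(request):
    upload = TemporaryUpload.objects.get(pk=request.POST['upload_id'], user=request.user)
    chosen = request.POST.getlist('columns')
    for row in upload.data:
        selected = {col: row[col] for col in chosen}
        # ... create the real records from `selected` here ...
    upload.delete()  # clean up the temporary blob once the real data is saved
    return redirect('done')

Only the upload id and the chosen column names travel through the template and back, while the bulky parsed data stays on the server.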

I would parse the CSV file on the frontend and give the user the option to choose columns. After the columns are chosen, I would send only those columns and their values to the backend.

Related

Is there a way to pass data source connection string as a parameter to power bi embedded?

I have a pbix file that takes an Azure Storage account as a parameter and reads data from there accordingly. The next step is to be able to embed this Power BI dashboard on a webpage and let the end user specify the storage account. I see a lot of questions and answers about passing in filter query parameters, but this is different: we're trying to read from a completely different data source, not filtering on a static data source.
Another way to ask this question is: is there a way to embed Power BI template files? If not, is there a feature request somewhere we can upvote?
The short answer is no.
There is a reason to use filters in this case instead of parameters. Parameters are part of the report itself. Every user that looks at your report gets the same parameter values as the others, and if one of them changes a parameter, this affects all other users. Filters, on the other hand, are local to your session. You can filter the report the way you like, and this will not affect other users' experience in any way.
You can't embed templates, because a template is simply a saved state of a report on disk. When you open it, it's not a template anymore; it becomes a report.
You can either combine the data from all of your data sources in a single report, adding one more column to indicate where the data comes from, and then filter on this new column; or create/modify an ETL process (for example, dataflows can be used for this) to combine these data sources into a single one.

Alternatives to dynamically creating model fields

I'm trying to build a web application where users can upload a file (specifically in the MDF file format) and view the data in the form of various charts. The files can contain any number of time-based signals (of various numeric data types), and users may name the signals arbitrarily.
My thought on saving the data involves 2 steps:
Maintain a master table as an index, to save such meta information as file names, who uploaded it, when, etc. Records (rows) are added each time a new file is uploaded.
Create a new table (I'll refer to these as data tables) for each file uploaded; within the table, each column will be one signal (with the first column being timestamps).
This brings the problem that I can't pre-define the Model for the data tables because the number, name, and datatype of the fields will differ among virtually all uploaded files.
I'm aware of some libraries that help build runtime dynamic models, but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want.
I also came across this Postgres-specific model field which can take nested arrays (which I believe fits the 2-D time-based signal lists). In theory I could parse the raw uploaded file, construct such an array, and save all the data in one field. But not knowing the limit on the size of the data, this could also be a nightmare for queries later on, since creating the charts usually requires only a few columns of signals at a time, out of a total of up to hundreds of signals.
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!
If the name, number, and datatypes of the fields differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition such as Psycopg's. You will be programmatically creating a table for each combination of user and uploaded file (if they differ) and programmatically inserting the records.
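As a rough sketch of that idea (my own illustration; the table and column names would come from the uploaded MDF file, and the Postgres types should come from a whitelist you control), composing the CREATE TABLE statement with psycopg2's sql module could look like this:

import psycopg2
from psycopg2 import sql


def create_signal_table(conn, table_name, signal_columns):
    # signal_columns: list of (column_name, pg_type) tuples parsed from the file
    cols = [sql.SQL("{} {}").format(sql.Identifier(name), sql.SQL(pg_type))
            for name, pg_type in signal_columns]
    stmt = sql.SQL("CREATE TABLE {} (ts timestamptz, {})").format(
        sql.Identifier(table_name),
        sql.SQL(", ").join(cols),
    )
    with conn.cursor() as cur:
        cur.execute(stmt)
    conn.commit()

sql.Identifier takes care of quoting the user-supplied names, which is the main reason to prefer composition over plain string formatting here.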
PostgreSQL would be a good choice here; you might also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database such as InfluxDB or Prometheus makes more sense.
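If you go the time-series route, a minimal sketch with the official influxdb-client Python library (the URL, token, bucket, tag, and field names below are purely illustrative) might look like:

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One point per timestamped sample, tagged with the source file so that
# charts can later query only the signals they need.
point = (Point("signals")
         .tag("file", "upload_42")
         .field("engine_rpm", 2450.0)
         .time("2024-01-01T00:00:00Z"))
write_api.write(bucket="mdf_uploads", record=point)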

Mapping user spreadsheet columns to database fields

I’m not sure where to start on this project. I know how to read the contents of the Excel spreadsheet, I know how to identify the header row, and I know how to loop over the contents. I believe I have the UX portion worked out, but I am not sure how to process the data.
I’ve googled and only found .Net solutions but I’m looking for a ColdFusion/Lucee solution.
I have a working form allowing me to map a user's spreadsheet columns to my database values (this is being kept simple for this post; the user does not have direct access to the database).
Now that I have my data, I'm not sure how to loop over the data results. I believe there will be several loops (an outer and an inner). Then of course I also need to loop over the file contents, but I think if I can get the headings mapped out, I can figure out the rest.
Any good links, tutorials, or guides would be greatly appreciated.
Some pseudo code might be enough to get me started.
User uploads form
System reads headers and content.
User is presented a form with a list of columns from their uploaded spreadsheet to match with available database fields (e.g. “column1” matches “customer name”).
User submits form.
Now what?
UPDATED
Here is what the data looks like AFTER the mapping has been done in my form. The column delimiter is ::: and, within each column, ||| separates the ID from the selected column value. I've included both the ID and the column value since I plan on displaying the mapping again as a confirmation; having the ID saves a trip to the database.
If I understand correctly, your question is: how do you provide the user with a form allowing them to map their spreadsheet columns to those of the database?
Since you have their spreadsheet column names, and you have the database column names, then this problem is essentially a UI/UX problem. You need to show both lists, and allow the user to map them. I can imagine several approaches to this. My first thought would be some sort of drag/drop operation, as follows:
Create a list of boxes, one for each field in your database table, and include the field name in (or above) the box. I'll call this the db field list. Then, create another list for each column from the spreadsheet, which I'll call the spreadsheet column list. The user would drag/drop items from the spreadsheet column list to the db field list.
When the user has completed a mapping, you would store the column/field names as data on the DOM element of the db field list box. Then, upon submission, you would collect the mapping data by visiting each box and adding it to an array, serialize that array into JSON, and send it to your form submission handler.
This could be difficult or easy, depending on your knowledge of UI implementations using JavaScript. jQuery makes this easy (if you know jQuery). There's even a jQuery UI plugin that does this: https://jqueryui.com/droppable/.
A quick search for "javascript drag drop" would help; here are a few articles I found:
https://www.w3schools.com/html/html5_draganddrop.asp
https://medium.com/quick-code/simple-javascript-drag-drop-d044d8c5bed5
You would also need to submit the array of mappings using javascript. You could search for that as well, and here's an article I found:
https://codereview.stackexchange.com/questions/94493/submit-an-array-as-an-html-form-value-using-javascript

Parse large file and paginate / load its parts with scrolling

I'm looking for suggestions on the most Django-like way of loading large variable content (say, a massive 10,000-line list) to the user's page part by part, displaying only some lines until the user asks for more.
This is the detailed scenario (I hope it makes sense; it is just a simple example to help with handling large template variables and pagination):
User goes to website.com/searchfiles which is hosted on my Django backend and returned as a template
The searchfiles.html template contains one form with a Select drop-down menu letting the user choose a file that already exists on the server (say there are 20 massive log files). Below the drop-down menu there is a text box that allows the user to enter a regular expression string. So, only two items in the form.
P.S. Each file is usually pretty big, e.g. 20-30 MB.
When the user selects the file, enters a regular expression in the text box, and clicks "Submit", an HTTP POST is made
The Django backend receives the POST, reads the filename + regexp string, and executes the function dosearch(FILE, pattern)
The dosearch function does something like this:
import re

def dosearch(FILE, pattern):
    result = []
    with open(FILE, 'r') as fh:
        for line in fh:
            if re.match(pattern, line):
                result.append(line)
    return result
Now, result is a list that, depending on the pattern, can be pretty large (e.g. 10-20 MB). Processing of the file is now complete and I want to present the user with the "result" variable. After the HTTP POST, the user is redirected to website.com/parsed.
As you can imagine, my goal in step 7 is to return this variable to the user after the HTTP POST. But because the "result" variable can be huge, I don't want to just dump, say, 10,000 lines of output directly onto the page. What I want is for maybe the first 200 lines to be displayed, and as the user scrolls down, an additional 200 lines to be loaded once they reach the bottom of the page.
To keep it simple, ignore the scroll part. The user could also be presented with a [NEXT] button to click to load an additional 200 entries, and so on.
What is the most Django-like way to achieve this? Do I need to save the results variable to the database and use Ajax?
Also assume that multiple users are going to use the very same page/website so I need to be able to distinguish between two users searching two different files at the same time.
When the user navigates away, the "result" variable that was generated should be removed from memory.
I can think of two possibilities:
A. using a Model
from django.contrib.auth.models import User
from django.db import models

class ResultLine(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    sequence_number = models.IntegerField()
    line = models.CharField(max_length=1000)
    created_at = models.DateTimeField(auto_now_add=True)
After parsing the file you would store each result line as an instance of this model, using sequence_number to specify the order of the lines.
In your result view you could use pagination or generic ListView to show the first lines, or use AJAX to fetch more lines.
You would need to add a delete button to clear the user's data from this model, or run periodic jobs (maybe using crontab and a custom management command) to delete old result lines.
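For example (a minimal sketch, assuming the ResultLine model above and a hypothetical parsed.html template), the result view for option A could page through the stored lines like this:

from django.core.paginator import Paginator
from django.shortcuts import render


def parsed(request):
    lines = (ResultLine.objects
             .filter(user=request.user)
             .order_by('sequence_number'))
    paginator = Paginator(lines, 200)  # 200 lines per page
    page = paginator.get_page(request.GET.get('page'))
    return render(request, 'parsed.html', {'page': page})

Because the queryset is lazy, each request only pulls the 200 rows for the requested page out of the database.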
B. using session data
Another possibility would be to store the result in the user's session.
request.session['result_list'] = dosearch(FILE, pattern)
Depending on the session engine there could be size restrictions; this post states that database-backed sessions are limited only by the database engine (which means you could store many MB or even GB of data in the session).
Also, your server needs sufficient RAM to hold the whole result list of multiple users.
And later in your result view you just read from the session instead from a model.
Performance-wise there are differences: both approaches store the data in the database (with database-backed sessions), but option A allows you to do partial reads in your result view, while option B always reads the whole result list into memory on each request (because the whole session dict is stored in encoded format).
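A corresponding sketch for option B (again assuming a hypothetical parsed.html template) would simply slice the list held in the session:

from django.shortcuts import render


def parsed(request):
    page = int(request.GET.get('page', 1))
    per_page = 200
    result_list = request.session.get('result_list', [])
    start = (page - 1) * per_page
    return render(request, 'parsed.html',
                  {'lines': result_list[start:start + per_page],
                   'has_next': start + per_page < len(result_list)})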

Fetching data from large BigQuery table in python

What I have is a BigQuery table (>5 million rows).
I need to fetch this data in batches and process it inside AppEngine (Python).
The only way I know to fetch from a table is to run a SELECT query on it and then iterate over the result using the tokens that fetch_data returns.
It looks like this:
query = u"""\
SELECT url FROM %s
""" % (query_table)
query_job = client.run_async_query(str(uuid.uuid4()), query)
query_job.begin()
wait_for_job(query_job, 1)
query_results = query_job.results()
rows, total_rows, next_token = query_results.fetch_data(max_results=per_page, page_token=page_token)
This works on smaller tables, but on larger ones like mine it asks me to allow large requests and specify a target table. That makes no sense to me. Just to fetch data from a table, do I have to copy it to another table first?
What you are running into is described in this documentation. In summary, apart from the limit on how much data can be fetched at a time, there is a point where your results become "large results." This is when your results are more than 128 MB compressed, as described here. When your results are classified as large, you can only store the result of a query in a table in BigQuery.
Unfortunately, I'm not sure there's a nice way to do what you want without reducing how many rows you are retrieving at once. What you'll likely need to do is explore the exporting data documentation for BigQuery.
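For reference, with the current google-cloud-bigquery client the export step could be sketched roughly like this (the project, dataset, table, and bucket names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
# Assumes the large query results were already written to this destination table.
source_table = "my-project.my_dataset.query_results"
destination_uri = "gs://my-bucket/query_results-*.json"

extract_job = client.extract_table(
    source_table,
    destination_uri,
    job_config=bigquery.ExtractJobConfig(destination_format="NEWLINE_DELIMITED_JSON"),
)
extract_job.result()  # wait for the export to finish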
You should use the tabledata.list API to fetch data directly from the table.
Using the startIndex or pageToken parameters together with maxResults, you can control the size of each page you fetch.
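With the current Python client, Client.list_rows wraps tabledata.list, so a rough sketch (the table id and page size are placeholders) might be:

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")

# Pages through the table directly, without running a query at all.
rows = client.list_rows(table, start_index=0, max_results=1000)
for row in rows:
    handle(row["url"])  # `handle` stands in for whatever your AppEngine code does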
I think this is exactly what you need (link). As far as I understood, you can't get a large query result directly, but you can get the entire table data into your app no matter how big it is. That's why you need to put the large result into a table and then pull that table's data into your app and do whatever you want with it.
Good luck :)