I'm trying to build a whole sheet from scratch, and stay efficient while doing it.
For that purpose, I am trying to rely on bulk operations.
I can build a massive list of rows and add them easily using add_rows().
However, I need some rows to be children of other rows, and neither row.indent nor row.parent_id seems possible to set on new rows (since the fresh rows don't have an id yet).
I could possibly: create the parent row > add_rows() > get_sheet() > find the row id in the sheet > create the child row > add_rows(), but then I lose the benefits of bulk operations.
Is there any way at all to set child/parent relationships in Python before ever communicating with the Smartsheet server?
[Edit] Alternatively, a way to import an Excel file via the SDK (or otherwise) would also work, as at the moment I'm able to create my table with xlsxwriter and upload it manually to Smartsheet. (Manual upload is not an option in the long run: we're trying to generate dozens of sheets, multiple times a day, so it has to be automated.)
Thanks
You cannot create a sheet with hierarchy in a single call. All rows in a single POST or PUT must have the same location specifier.
You can either:
(1) Add all rows as a flat list, then indent each contiguous group of child rows. Repeat down the hierarchy.
(2) Add top level rows, then add each contiguous group of indented rows.
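Option (2) could be sketched roughly as follows. This is only a planning sketch, not the Smartsheet SDK itself: add_batch is a hypothetical callable that you would write around add_rows(), giving every row in the batch the same parent id and returning the new row ids in order. The node keys and payloads are also made up for illustration.

```python
from collections import defaultdict, deque


def add_hierarchy(nodes, add_batch):
    """Add rows level by level, one bulk call per group of siblings.

    nodes: list of (key, parent_key or None, payload) in display order.
    add_batch(parent_id, payloads): hypothetical wrapper around a bulk
        add call; must return the new row ids in the same order.
    Returns a dict mapping each key to the id assigned by add_batch.
    """
    # Group children under their parent key, keeping display order.
    children = defaultdict(list)
    for key, parent_key, payload in nodes:
        children[parent_key].append((key, payload))

    ids = {}
    pending = deque([None])  # None stands for the top level
    while pending:
        parent_key = pending.popleft()
        group = children.get(parent_key)
        if not group:
            continue  # leaf row, nothing to add below it
        parent_id = None if parent_key is None else ids[parent_key]
        # All rows in this call share one location (same parent).
        new_ids = add_batch(parent_id, [payload for _, payload in group])
        for (key, _), new_id in zip(group, new_ids):
            ids[key] = new_id
            pending.append(key)
    return ids
```

The number of calls is one for the top level plus one per parent that has children, so it stays far below one call per row as long as parents have several children each.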
Workflow
In a data import workflow, we create a staging table using a CREATE TABLE LIKE statement.
CREATE TABLE abc_staging (LIKE abc INCLUDING DEFAULTS);
Then, we run COPY to import CSV data from S3 into the staging table.
The data in CSV is incomplete. Namely, there are fields partition_0, partition_1, partition_2 which are missing in the CSV file; we fill them in like this:
UPDATE abc_staging
SET
    partition_0 = 'BUZINGA',
    partition_1 = '2018',
    partition_2 = '07';
Problem
This query seems expensive (it often takes ≈20 minutes), and I would like to avoid it. That would be possible if I could configure DEFAULT values on these columns when creating the abc_staging table. I did not find any method for doing that, nor any explicit indication that it is impossible. So perhaps it is possible and I am just missing how to do it?
Alternative solutions I considered
Drop these columns and add them again
That would be easy to do, but ALTER TABLE ADD COLUMN only adds columns to the end of the column list. In the abc table they are not at the end of the column list, which means the schemas of abc and abc_staging would no longer match. That breaks the ALTER TABLE APPEND operation that I use to move data from the staging table to the main table.
Note: reordering the columns in the abc table to work around this would require recreating the huge abc table, which I'd like to avoid.
Generate the staging table creation script programmatically with proper columns and get rid of CREATE TABLE LIKE
I will have to do that if I do not find any better solution.
Fill in the partition_* fields in the original CSV file
That is possible but will break backwards compatibility (I already have perhaps hundreds of thousands of files in there). Harder but manageable.
As you are finding, you are not creating a table exactly LIKE the original, and Redshift doesn't let you ALTER a column's default value. Your proposed path is likely the best (define the staging table explicitly).
Since I don't know your exact situation, other paths might be better, so let me explore a bit. First off, when you UPDATE the staging table you are in fact reading every row in the table, invalidating that row, and writing a new row (with the new information) at the end of the table. This leads to a lot of invalidated rows. When you then do ALTER TABLE APPEND, all these invalidated rows are added to your main table, unless you vacuum the staging table beforehand. So you may not be getting the value you want out of ALTER TABLE APPEND.
You may be better off INSERTing the data into your main table with an ORDER BY clause. This is slower than the ALTER TABLE APPEND statement, but you won't have to do the UPDATE, so the overall process could be faster. You could come out further ahead because of the reduced need to VACUUM. Your situation will determine whether this is better or not. Just another option for your list.
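Defining the staging table explicitly could be scripted along these lines. This is a minimal sketch under assumptions: the column names other than partition_0 and their types are invented, and it relies on COPY being given an explicit column list so that the omitted columns are filled from their DEFAULT. Whether ALTER TABLE APPEND accepts a staging table whose defaults differ from the main table is worth verifying on your cluster before relying on this.

```python
def staging_ddl(table, columns, defaults):
    """Build a CREATE TABLE statement for <table>_staging.

    columns:  list of (name, sql_type) in the same order as the main table.
    defaults: dict of column name -> SQL literal used as DEFAULT.
    """
    lines = []
    for name, sql_type in columns:
        line = f"    {name} {sql_type}"
        if name in defaults:
            line += f" DEFAULT {defaults[name]}"
        lines.append(line)
    return f"CREATE TABLE {table}_staging (\n" + ",\n".join(lines) + "\n);"


def copy_column_list(columns, defaults):
    """Columns actually present in the CSV, for the COPY column list."""
    return ", ".join(name for name, _ in columns if name not in defaults)


if __name__ == "__main__":
    # Hypothetical column layout, for illustration only.
    cols = [("id", "BIGINT"), ("partition_0", "VARCHAR(32)"), ("payload", "VARCHAR(256)")]
    print(staging_ddl("abc", cols, {"partition_0": "'BUZINGA'"}))
    print(copy_column_list(cols, {"partition_0": "'BUZINGA'"}))
```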
I am curious about your UPDATE speed. This just needs to read and then write every row in the staging table. Unless the staging table is very large it doesn't seem like this should take 20 min. Other activity could be creating this slowdown. Just curious.
Another option would be to change your main table to have these 3 columns last (yes this would be some work). This way you could add the columns to the staging table and things would line up for ALTER TABLE APPEND. Just another possibility.
The easiest solution turned out to be adding the necessary partition_* fields to the source CSV files.
After employing that change and removing the UPDATE from the importer pipeline, the performance has greatly improved. Imports now take ≈10 minutes each in total (that encompasses COPY, DELETE duplicates and ALTER TABLE APPEND).
Disk space is no longer climbing up to 100%.
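For anyone taking the same route, adding the fields to a CSV file could look roughly like this. It is a sketch, not the exact script used here: the function name is made up, and the insert positions are assumptions that must match the column order of the target table.

```python
import csv
import io


def insert_constant_columns(src, dst, inserts):
    """Copy CSV rows from src to dst, inserting constant values.

    inserts: list of (index, value); each value is inserted so that it
    ends up at that index in the output row. Processed in ascending
    index order.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    ordered = sorted(inserts)
    for row in reader:
        for index, value in ordered:
            row.insert(index, value)
        writer.writerow(row)


if __name__ == "__main__":
    source = io.StringIO("1,a\n2,b\n")
    target = io.StringIO()
    insert_constant_columns(
        source, target, [(1, "BUZINGA"), (2, "2018"), (3, "07")]
    )
    print(target.getvalue())
```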
Thanks everyone for help!
The context
In a wxWidgets (version 3.0.2) C++ application, I am trying to hide the first column of a wxListCtrl.
I did not find a member function to do this so I tried to set the width of the column to 0:
myListCtrl->SetColumnWidth(0, 0);
first argument being the column ID and second one the width in pixels (wxListCtrl documentation).
After running the program, the header of the first column is hidden as I wanted but the data of each row of the first column overlaps the data of each row of the second column (which is not hidden). It is obviously not what I want. The header and the data of the first column should be hidden.
The question
In wxWidgets 3.0.2, is there a way to hide the first column (header and data of each row) of a wxListCtrl?
I don't believe you can. You have a few options.
Delete the column using DeleteColumn(int columnIndex). You aren't losing any data, just the display of it, so you can always re-insert the column and repopulate it if you need to re-add it. Obviously this could be time consuming if your data is excessively large.
Depending on your application, just don't create the column in the first place. You don't say why you want to hide it, so if you just don't want it, don't add it.
Implement your control as a virtual control which gives your application control over what to display where. The burden of data display management falls to you to do manually but you have a great deal more flexibility. Inherit the class with wxLC_VIRTUAL style and implement OnGetItemText http://docs.wxwidgets.org/3.0/classwx_list_ctrl.html#a92370967f97215e6068326645ee76624
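The bookkeeping behind the virtual-control option can be sketched without any GUI toolkit: keep the full rows in your own storage and translate a visible column index into a data column index. The class and method names below are hypothetical, and the sketch is in Python for brevity; in wxWidgets the lookup would be called from your OnGetItemText override.

```python
class HiddenColumnModel:
    """Holds full rows and exposes only the columns that are not hidden."""

    def __init__(self, rows, hidden=()):
        self.rows = rows
        self.hidden = set(hidden)
        column_count = len(rows[0]) if rows else 0
        # Data column index for each visible column, in display order.
        self.visible = [c for c in range(column_count) if c not in self.hidden]

    def item_text(self, item, column):
        """Text for display row `item`, visible column `column`."""
        return str(self.rows[item][self.visible[column]])


if __name__ == "__main__":
    model = HiddenColumnModel([[1, "a", "x"], [2, "b", "y"]], hidden=[0])
    print(model.item_text(0, 0))  # first visible column of the first row
```

The hidden column stays available in rows, so it can still be used as an internal key while never being shown.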
Edit:
To expand on the comment question, how to get the selected item index:
The wxListCtrl is a little weird when it comes to selected items. I'm sure it has to do with needing to support the different views (report, icon, etc.). When dealing with a multi-column report mode, you might find that you can only select items in the first column. If you are on Windows, it should automatically be set to "Full Row Select", but I don't know about other OSs.
Anyway, here is a utility method that returns the first selected item (note that you can support multi-selection if you want to).
// Get the item currently selected
int ListView::GetItemSelected() const
{
    for (int i = 0; i < GetItemCount(); ++i)
        if (GetItemState(i, wxLIST_STATE_SELECTED) == wxLIST_STATE_SELECTED)
            return i;
    return -1;
}
If you want (and it makes sense), you can connect the list item selected event.
this->Connect(wxEVT_COMMAND_LIST_ITEM_SELECTED, wxCommandEventHandler(ListView::selected_Changed), NULL, this);
and within that event handler, get the selected item and do what needs doing (depending entirely on your application).
You will note that I'm using a derived class here which just makes things a lot easier but you don't have to. You can connect to something like MyMainForm::sqlResults_selectedChanged or whatever.
There is more than one way to accomplish all this and you can also find some good suggestions and help here: https://wiki.wxwidgets.org/WxListCtrl
Basically my whole career is based on reading questions here, but now I'm stuck, since I do not even know how to ask this correctly.
I'm designing a SQLite database for building data sheets out of existing data sheets. People like reusing things, and I want to manage that with a database and an interface. A data sheet has reusable elements such as pictures, texts, formulas, sections, lists, front pages and variables. Sections can contain elements; this can be handled with recursive CTEs (thanks to "mu is too short" for that hint). Texts, formulas, lists etc. can contain variables. In the end I want to manage variables, which must be unique per data sheet, and elements, which form the ordered list making up a data sheet. So when I select a data sheet, I must know which elements it contains and which variables are used within those elements. I must also be able to create a new data sheet by reusing elements and/or creating new ones if desired.
I came so far to have (see also link to screen shot at the bottom)
a list of variables
which (several of them) can be contained in elements
a list of elements
elements make up the
a list of data sheets
Reading examples like
Store array in SQLite that is referenced in another table
How to store a list in a column of a database table
already gives me helpful hints, for example that for each data sheet I need to create a new atomic list containing the elements and their positions. The same goes for the variables referenced by each element. But the trouble starts when I want to keep it consistent, and with how to actually query it.
How do I connect the variables contained within elements to the elements contained within the data sheets? When an element or variable is modified, how do I check which data sheets need to be recompiled because they use the same variables and/or elements?
The more I think about this, the more it sounds like I need to write my own search tree based on an object-oriented inheritance class structure, and should not use a database at all. Can somebody convince me that a database is the right tool for my problem?
I learned about databases once, but that was quite some time ago, and to be honest the university lectures were not good: we never created a database of our own but only worked on existing ones.
To be more specific, this is the solution my knowledge has led me to so far. I do not know how to correctly query for a list of data sheets when the content of one value changes, since the reference is a text containing the name of a table:
screen shot since I'm a greenhorn
Update:
I think I have to search for unique connections, so it would end up in many-to-many tables. Not perfectly happy with it but I think I can go on with it.
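A many-to-many layout of that kind could be sketched as below, using Python's built-in sqlite3 module. All table and column names are assumptions made up for this illustration, nested sections are left out, and the sample data is invented. The point is only that "which data sheets use this variable?" becomes a plain join over two junction tables instead of a lookup through table names stored as text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE variable   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE element    (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, body TEXT);
CREATE TABLE data_sheet (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- which variables an element uses (many-to-many)
CREATE TABLE element_variable (
    element_id  INTEGER NOT NULL REFERENCES element(id),
    variable_id INTEGER NOT NULL REFERENCES variable(id),
    PRIMARY KEY (element_id, variable_id)
);
-- ordered list of elements making up a data sheet (many-to-many)
CREATE TABLE sheet_element (
    sheet_id   INTEGER NOT NULL REFERENCES data_sheet(id),
    position   INTEGER NOT NULL,
    element_id INTEGER NOT NULL REFERENCES element(id),
    PRIMARY KEY (sheet_id, position)
);
"""


def build_demo():
    """Create an in-memory database with a little invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO variable VALUES (?, ?)",
                     [(1, "voltage"), (2, "mass")])
    conn.executemany("INSERT INTO element VALUES (?, ?, ?)",
                     [(1, "text", "Rated at {voltage}"),
                      (2, "formula", "F = {mass} * a")])
    conn.executemany("INSERT INTO data_sheet VALUES (?, ?)",
                     [(1, "Motor A"), (2, "Motor B"), (3, "Frame")])
    conn.executemany("INSERT INTO element_variable VALUES (?, ?)",
                     [(1, 1), (2, 2)])
    conn.executemany("INSERT INTO sheet_element VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 1), (2, 2, 2), (3, 1, 2)])
    return conn


def sheets_using_variable(conn, variable_id):
    """Titles of all data sheets containing an element that uses the variable."""
    rows = conn.execute(
        """SELECT DISTINCT s.title
           FROM data_sheet s
           JOIN sheet_element se ON se.sheet_id = s.id
           JOIN element_variable ev ON ev.element_id = se.element_id
           WHERE ev.variable_id = ?
           ORDER BY s.title""",
        (variable_id,),
    ).fetchall()
    return [title for (title,) in rows]
```

Calling sheets_using_variable(build_demo(), 1) lists the sheets that would need recompiling after the first variable changes; the same join without the element_variable table answers the question for a changed element.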
Still a greenhorn: how are you guys getting correct syntax highlighting for SQL?
I was wondering if you can help me out with my current problem, which is to insert data into multiple tables in my relational database using a single form. I am fairly new to APEX but do have a little bit of background in MySQL and PHP programming. In the past, I normally achieved such a task by creating a view of all the columns from the different tables that I want to populate and using a simple INSERT command, but doing the same thing in APEX gives me an error stating "ORA-01779: cannot modify a column which maps to a non key-preserved table".
In Oracle you cannot just update a view which has, for example, a JOIN clause. Oracle will not map all columns back to the source tables: one table might be updatable while the others are not. This isn't an APEX problem: if you were to run an update against your view in the database you would get this error just as well.
If you want your APEX screen to remain as transparent as possible, then you may want to consider using an INSTEAD OF trigger on the view. You will have to write the correct DML statements in this trigger, though, in order to ensure your data is pushed through correctly to all tables.
Another option is to use the view only to fetch, and use different processes to push the data to the correct tables. Using data-layer packages might alleviate the use of code stored in apex (eg having a lot of plsql code in apex itself is usually not favored and is rather stored in packages).
Create items and get all the items values and use PL/SQL on submit button.
Eg: p1_party_Name, p2_Service_Name
BEGIN
  INSERT INTO par VALUES (par_party_uid_seq.nextval, :p1_Party_name);
  -- table name ser is assumed from the sequence prefix
  INSERT INTO ser VALUES (ser_service_uid_seq.nextval, :p2_Service_name);
END;
I want to process all of the data in a column family in a MapReduce job. Ordering is not important.
One approach is to iterate over all the row keys of the column family and use them as the input. This could potentially be a bottleneck and could be replaced with a parallel method.
I'm open to other suggestions, or for someone to tell me I'm wasting my time with this idea. I'm currently investigating the following:
A potentially more efficient way is to assign ranges to the input instead of iterating over all row keys (before the mapper starts). Since I am using RandomPartitioner, is there a way to specify a range to query based on the MD5?
For example, I want to split the task into 16 jobs. Since the RandomPartitioner is MD5 based (from what I have read), I'd like to query everything starting with a for the first range. In other words, how would I do a get_range on the MD5 that starts at a and ends before b, e.g. a0000000000000000000000000000000 - afffffffffffffffffffffffffffffff?
I'm using the pycassa API (Python) but I'm happy to see Java examples.
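The range-splitting idea can be sketched in plain Python, independent of pycassa. It rests on an assumption: that RandomPartitioner tokens are the absolute value of the key's MD5 digest read as a signed big-endian integer, giving a token space of 0 to 2**127. Under that assumption the ranges are more naturally expressed as integers than as hex prefixes. How to pass such a token range to a range query in your client library is not shown and needs to be checked against its documentation.

```python
import hashlib

# Assumed size of the RandomPartitioner token space.
TOKEN_SPACE = 2 ** 127


def token_ranges(n):
    """Split the token space into n contiguous (start, end) ranges."""
    step = TOKEN_SPACE // n
    bounds = [i * step for i in range(n)] + [TOKEN_SPACE]
    return list(zip(bounds[:-1], bounds[1:]))


def token_for_key(key):
    """Token for a row key (bytes), under the assumption stated above."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


if __name__ == "__main__":
    for start, end in token_ranges(16):
        print(start, end)
```

Each of the 16 jobs would then take one (start, end) pair as its input range.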
I'd cheat a little:
Create new rows job_(n) with each column representing each row key in the range you want
Pull all columns from that specific row to indicate which rows you should pull from the CF
I do this with users. Users from a particular country get a column in the country specific row. Users with a particular age are also added to a specific row.
This allows me to quickly pull the rows I need based on the criteria I want, and is a little more efficient than pulling everything.
This is how the Mahout CassandraDataModel example functions:
https://github.com/apache/mahout/blob/trunk/integration/src/main/java/org/apache/mahout/cf/taste/impl/model/cassandra/CassandraDataModel.java
Once you have the data and can pull the rows you are interested in, you can hand it off to your MR job(s).
Alternately, if speed isn't an issue, look into using PIG: How to use Cassandra's Map Reduce with or w/o Pig?