Is there a way to create a reusable Data Fusion pipeline that can handle multiple table transformations?
Example: I have 2 tables in raw format in a BigQuery dataset, and I would like to create a Data Fusion pipeline that loads the transformed data into another BigQuery dataset. Say:
Table 1:
<table>
<thead>
<tr>
<th>ID</th>
<th>first_name</th>
<th>last_name</th>
<th>age</th>
<th>address</th>
<th>salary</th>
</tr>
</thead>
<tbody>
<tr>
<td>e111</td>
<td>Amy</td>
<td>Fowler</td>
<td>34</td>
<td>123 Stevens Ave, Dallas, TX, 75252</td>
<td>105,000</td>
</tr>
<tr>
<td>e222</td>
<td>Leonard</td>
<td>Hoffstadar</td>
<td>32</td>
<td>Apt 213 Stamford Village, Stamford, CT, USA2</td>
<td>70,000</td>
</tr>
</tbody>
</table>
Table 2:
<table>
<thead>
<tr>
<th>Dept_id</th>
<th>Dept_name</th>
<th>Supervisor</th>
</tr>
</thead>
<tbody>
<tr>
<td>d123</td>
<td>abc</td>
<td>Amy</td>
</tr>
<tr>
<td>d234</td>
<td>xyz</td>
<td>John</td>
</tr>
</tbody>
</table>
Task:
Read table 1 and table 2 from the BigQuery Raw dataset
Apply transformations:
a) Concatenate first_name and last_name in table 1 to create another column with the full name.
b) Uppercase Dept_name in table 2.
Load the transformed data into the BigQuery Curated dataset
Now, I would like to create a single reusable pipeline that could do the above job, perhaps by passing table and field names as arguments. The whole process needs to be automated, with no human input required.
The BigQuery Argument Setter could be used, but I'd like to know whether it is possible, and how, to apply separate transformations to individual tables in a single pipeline. Does Wrangler support this?
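To make the intended behaviour concrete, here is a minimal sketch in plain Python of an argument-driven transformation step. It is not Data Fusion or Wrangler code and makes no BigQuery calls; the names `concat_fields`, `uppercase_field`, `PIPELINE_ARGS`, `run` and the output column `full_name` are hypothetical and exist only for illustration.

```python
def concat_fields(row, sources, target, sep=" "):
    """Return a copy of row with the source fields joined into target."""
    out = dict(row)
    out[target] = sep.join(str(row[name]) for name in sources)
    return out


def uppercase_field(row, field):
    """Return a copy of row with one field uppercased."""
    out = dict(row)
    out[field] = str(row[field]).upper()
    return out


# Hypothetical runtime arguments: one list of (step, parameters) per table.
PIPELINE_ARGS = {
    "table1": [(concat_fields, {"sources": ["first_name", "last_name"],
                                "target": "full_name"})],
    "table2": [(uppercase_field, {"field": "Dept_name"})],
}


def run(table_name, rows, args=PIPELINE_ARGS):
    """Apply the steps configured for table_name to every row, in order."""
    for step, params in args[table_name]:
        rows = [step(row, **params) for row in rows]
    return rows


table1 = [{"ID": "e111", "first_name": "Amy", "last_name": "Fowler"}]
table2 = [{"Dept_id": "d123", "Dept_name": "abc", "Supervisor": "Amy"}]
curated1 = run("table1", table1)
curated2 = run("table2", table2)
```

The same `run` function serves both tables; only the arguments differ, which is the kind of reuse the question is after.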
I am currently teaching myself Power BI. However, I got stuck when I tried to create a slicer that controls which lines are shown or hidden in a line graph located below the slicer. To make this clearer, here is the table that I created:
<table>
<tbody>
<tr>
<td>
<p>Period</p>
</td>
<td>Module A </td>
<td>Module B </td>
<td>Module C </td>
</tr>
<tr>
<td>CW01 </td>
<td>80%</td>
<td>75% </td>
<td>90% </td>
</tr>
<tr>
<td>CW02 </td>
<td>82% </td>
<td>65% </td>
<td>92% </td>
</tr>
<tr>
<td>CW03 </td>
<td>83% </td>
<td>73% </td>
<td>88% </td>
</tr>
</tbody>
</table>
My end goal is for the slicer to list the column names used in the line graph, and to show or hide the columns selected in the slicer in the line graph below it. However, whenever I put a column in my slicer, the slicer shows the values of that column instead of just the column name. I managed to solve this by creating a new table, this time with the column names as values, but I was wondering whether there is a more elegant way to solve this problem. For example, would it be possible to create a measure that takes the column names of a certain table, or a measure that acts as a column with manually defined values in it? I would be really grateful for any help in digging into this further. Thank you in advance!
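The workaround described above amounts to reshaping the table so that the column names become values. The sketch below shows that reshape in plain Python, purely to illustrate the data layout; it is not Power BI or DAX code, and the names `unpivot`, `Module` and `Value` are hypothetical.

```python
# The table from the question, one dict per row.
wide = [
    {"Period": "CW01", "Module A": "80%", "Module B": "75%", "Module C": "90%"},
    {"Period": "CW02", "Module A": "82%", "Module B": "65%", "Module C": "92%"},
    {"Period": "CW03", "Module A": "83%", "Module B": "73%", "Module C": "88%"},
]


def unpivot(rows, id_field, attribute="Module", value="Value"):
    """Turn every non-id column into its own (id, column name, cell) row."""
    long_rows = []
    for row in rows:
        for name, cell in row.items():
            if name != id_field:
                long_rows.append({id_field: row[id_field],
                                  attribute: name,
                                  value: cell})
    return long_rows


long_rows = unpivot(wide, "Period")
# The distinct column names are now ordinary values a slicer could list.
slicer_values = sorted({row["Module"] for row in long_rows})
```

In the long layout, filtering on `Module` selects which series appear, which is what the separate table of column names achieves.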
I'm trying to parse certain contents from a table like the one below:
<table class="dataTbl col-4">
<tr>
<th scope="row">Rent</th>
<td>5.5</td>
<th scope="row">Management</th>
<td>3.3</td>
</tr>
<tr>
<th scope="row">Deposit</th>
<td>No</td>
<th scope="row">Other</th>
<td>No</td>
</tr>
<tr>
<th scope="row">Other2</th>
<td>No</td>
<th scope="row">Insurance</th>
<td>Yes</td>
</tr>
</table>
My goal is to find a specific row header (for example, Rent) and, if there is a match, extract the content of the next <td> tag (for example, 5.5).
How can I do this in Python?
I'm using Python3/Scrapy 1.3.0.
Thanks
In [9]: Selector(text=html).xpath('//th[text()="Rent"]/following-sibling::td[1]').extract()
Out[9]: ['<td>5.5</td>']
Use text()="Rent" to identify the th tag.
Use following-sibling:: to get its sibling td elements, and [1] to take the first one.
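For trying the same idea without Scrapy installed, here is a standard-library sketch. It swaps the XPath for a plain loop over `xml.etree.ElementTree`, and it assumes the table markup is well-formed XML, as the snippet in the question is. The function name `cell_after_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

html = """<table class="dataTbl col-4">
<tr>
<th scope="row">Rent</th>
<td>5.5</td>
<th scope="row">Management</th>
<td>3.3</td>
</tr>
<tr>
<th scope="row">Deposit</th>
<td>No</td>
<th scope="row">Other</th>
<td>No</td>
</tr>
<tr>
<th scope="row">Other2</th>
<td>No</td>
<th scope="row">Insurance</th>
<td>Yes</td>
</tr>
</table>"""


def cell_after_header(markup, label):
    """Return the text of the td that directly follows the th named label."""
    root = ET.fromstring(markup)
    for row in root.iter("tr"):
        cells = list(row)
        # Walk adjacent pairs so each th is checked against its next sibling.
        for header, value in zip(cells, cells[1:]):
            if header.tag == "th" and header.text == label and value.tag == "td":
                return value.text
    return None


rent = cell_after_header(html, "Rent")
```

Unlike the XPath version, this returns the cell text rather than the whole `<td>` element, and `None` when no header matches.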
Using a Python regular expression:
r'\>text\<.+\n +\<td\>(\d+\.\d+)'
In your case, replace text with Rent. Also, this is a useful web page for debugging regular expressions.
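A runnable variant of the regular-expression approach might look like the sketch below. It uses a slightly different pattern from the one above: `\s*` instead of a newline plus spaces, so that it does not depend on indentation, and `[^<]*` so that non-numeric values such as `No` are captured too. The helper name `value_after` is hypothetical.

```python
import re

html = """<table class="dataTbl col-4">
<tr>
<th scope="row">Rent</th>
<td>5.5</td>
<th scope="row">Management</th>
<td>3.3</td>
</tr>
<tr>
<th scope="row">Deposit</th>
<td>No</td>
<th scope="row">Other</th>
<td>No</td>
</tr>
</table>"""


def value_after(label, text):
    """Return the content of the td that follows the th containing label."""
    # re.escape guards against labels containing regex metacharacters.
    pattern = r">" + re.escape(label) + r"</th>\s*<td>([^<]*)</td>"
    match = re.search(pattern, text)
    return match.group(1) if match else None


rent = value_after("Rent", html)
```

Regular expressions are brittle against markup changes such as extra attributes on the `<td>`, so this is best treated as a quick check rather than a parser.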
Let me begin by saying that I'm a novice at ColdFusion and trying to learn so please bear with me.
I work in an apartment complex that caters to students from the local college. We have one, two and four bedroom apartments. Each room in an apartment is leased to an individual student. What I want to do is populate an HTML table with all the people in a room. My query is working and pulling all the relevant data but what is happening is that each person is being split out to their own HTML table instead of all the people in a room being put into the same table. Here is an example:
What I want
What is happening:
Here is my code:
<!---Begin data table--->
<cfoutput query = "qryGetAssignments">
<div class="datagrid">
<table>
<tr><td align="right"><strong>#RoomType#</strong></td></tr>
<thead>
<tr>
<th>#RoomNumber#</th>
</thead>
<tbody>
<tr><td><strong>#Bed#</strong>
| #FirstName# #LastName# :: #StudentNumber#
</td>
</tr>
</tbody>
</table>
</div>
</cfoutput>
I know why the output is coming out like it is, I just don't know how to fix it. I want there to be four residents in one table for a four bedroom apartment, two residents in a table for a two bedroom, and so on. Thanks in advance for your help.
Edit:
Sorry about the confusion. Here is a full pic of what I'm going for:
This should do what you need, assuming your query is ordered by roomType, which <cfoutput group=""> needs in order to work.
<!---Begin data table--->
<cfoutput query="qryGetAssignments" group="roomType">
<div class="datagrid"><!--- If this isn't needed to style the tables, it can be moved outside the loop --->
<table>
<tr><td align="right"><strong>#qryGetAssignments.roomType#</strong></td></tr>
<thead>
<tr>
<th>#qryGetAssignments.roomNumber#</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<strong>#qryGetAssignments.bed#</strong>
<cfoutput><!--- this output here will loop over rows for that groupby --->
| #FirstName# #LastName# :: #StudentNumber#
</cfoutput>
</td>
</tr>
</tbody>
</table>
</div>
</cfoutput>
I've also scoped your query variables, at least I believe they are variables from a query.
That should work except it might need to be grouped by "roomNumber" such as N108.
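The grouping logic can also be pictured outside ColdFusion. The Python sketch below groups by room number, as the last remark suggests, and emits one table per room with one row per resident. It illustrates the idea only and is not CFML; the sample rows and the function name `render_tables` are hypothetical, while the field names follow the query columns in the question.

```python
from html import escape
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; real ones would come from qryGetAssignments.
rows = [
    {"RoomType": "Two Bedroom", "RoomNumber": "N108", "Bed": "B",
     "FirstName": "Bob", "LastName": "Ray", "StudentNumber": "1002"},
    {"RoomType": "Two Bedroom", "RoomNumber": "N108", "Bed": "A",
     "FirstName": "Ann", "LastName": "Lee", "StudentNumber": "1001"},
    {"RoomType": "One Bedroom", "RoomNumber": "N110", "Bed": "A",
     "FirstName": "Cy", "LastName": "Wu", "StudentNumber": "1003"},
]


def render_tables(rows):
    """Build one HTML table per room, with one row per resident."""
    # groupby only merges adjacent items, so sort first (like ORDER BY).
    ordered = sorted(rows, key=itemgetter("RoomNumber", "Bed"))
    tables = []
    for room, group in groupby(ordered, key=itemgetter("RoomNumber")):
        residents = list(group)
        room_type = escape(residents[0]["RoomType"])
        lines = ["<table>",
                 f"<thead><tr><th>{escape(room)} ({room_type})</th></tr></thead>",
                 "<tbody>"]
        for person in residents:
            name = escape(person["FirstName"] + " " + person["LastName"])
            lines.append(f"<tr><td><strong>{escape(person['Bed'])}</strong>"
                         f" | {name} :: {escape(person['StudentNumber'])}</td></tr>")
        lines += ["</tbody>", "</table>"]
        tables.append("\n".join(lines))
    return tables


tables = render_tables(rows)
```

As with `<cfoutput group="">`, the result depends on the rows being sorted by the grouping column first.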
I have an irregular table structure I would like to nest (either using actual tables or using div tables). I have a list of 'Parent' classes, each of which has many 'Children' instances, and I want to show them in the same table, with a single set of headers. I would like the output to be something like this (simplified - I have many more columns to display).
<table>
<tr>
<th>Parent</th>
<th>Child</th>
<th>Remove</th>
</tr>
<tr>
<td colspan="2">parent one</td>
<td>child one</td>
</tr>
<tr>
<td>child two</td>
</tr>
</table>
But I can't figure out how to accomplish this in handlebars because of the fact that I need to have a new <tr> element around every child except for the first one. This is easy enough to figure out in code, but I can't see how to do it cleanly in handlebars. Is it possible or should I write a custom view?
I need to be able to check xml with html-style table data to ensure that it's "rectangular". For example this is rectangular (2x2)
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
This is not
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
</tr>
<tr>
<td>Baz</td>
</tr>
</table>
This is complicated by row and column spans and the fact that I need to accept two styles of markup, either where spanned cells are included as empty td or where span cells are omitted.
<!-- good (3x2), spanned cells included -->
<table>
<tr>
<td colspan="2">Foo</td>
<td/>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- also good (3x2), spanned cells omitted -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
Here are a bunch of examples of bad tables where it's ambiguous how to deal with them
<!-- bad, looks like spanned cells are included but more cells in row 1 than 2 -->
<table>
<tr>
<td colspan="2">Foo</td>
<td/>
<td rowspan="2">Bar</td>
<td>BAD</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- bad, looks like spanned cells are omitted but more cells in row 1 than 2 -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
<td>BAD</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
<!-- bad, can't tell if spanned cells are included or omitted -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- bad, looks like spanned cells are omitted but a non-empty cell is overspanned -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td>BAD</td>
</tr>
</table>
I already have a working XSLT 2.0 solution for this problem that involves normalizing the data to the "spanned cells included" style then validating, however, my solution is cumbersome and starts to perform poorly for tables with an area of greater than 1000 cells. My normalization and validation routines involve iterating sequentially over the cells and passing along a param of cells that should be created by spans and inserting them when I pass their coordinates in the table. I'm not happy with either of them.
I'm looking for suggestions about cleverer ways to achieve this validation that would hopefully have better performance profiles on large tables. I need to account for th and td, but I omitted th from the examples for the sake of simplicity; it can be included or ignored in any answers. I'm not checking whether thead, tbody, and/or tfoot have the same width; this can also be included or omitted. I'm currently using XSLT 2.0, but I'd be interested in 3.0 solutions if they were significantly better than a solution implemented in 2.0.
I don't think this kind of problem is well suited to XSLT, especially if you have to process very large tables.
I'd suggest developing a solution in a procedural language, perhaps using XSLT to pre- or post-process the XML.
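As one possible shape for such a procedural solution, here is a sketch in Python. It tests the "spanned cells included" reading first and then the "spanned cells omitted" reading, visiting each grid position once. The function names are hypothetical, and preferring the "included" reading when both fit is an assumption of this sketch, not something stated in the question.

```python
import xml.etree.ElementTree as ET


def _span(cell, name):
    return int(cell.get(name, "1"))


def _is_empty(cell):
    return len(cell) == 0 and not (cell.text or "").strip()


def fits_included(rows):
    """Equal cell counts per row, and every spanned-over cell is empty."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        return False
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            rs, cs = _span(cell, "rowspan"), _span(cell, "colspan")
            if r + rs > len(rows) or c + cs > width:
                return False
            for rr in range(r, r + rs):
                for cc in range(c, c + cs):
                    if (rr, cc) != (r, c) and not _is_empty(rows[rr][cc]):
                        return False
    return True


def fits_omitted(rows):
    """Place cells on a grid, honouring spans; the grid must fill exactly."""
    occupied = set()
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            while (r, c) in occupied:  # skip slots taken by earlier rowspans
                c += 1
            rs, cs = _span(cell, "rowspan"), _span(cell, "colspan")
            for rr in range(r, r + rs):
                for cc in range(c, c + cs):
                    if (rr, cc) in occupied:
                        return False  # two cells claim the same slot
                    occupied.add((rr, cc))
            c += cs
    height = len(rows)
    width = max(cc for _, cc in occupied) + 1
    inside = all(rr < height for rr, _ in occupied)
    return inside and len(occupied) == height * width


def classify(xml_text):
    """Return 'included', 'omitted', or 'bad' for one table element."""
    table = ET.fromstring(xml_text)
    rows = [[c for c in tr if c.tag in ("td", "th")] for tr in table.iter("tr")]
    if not rows or not rows[0]:
        return "bad"
    if fits_included(rows):
        return "included"
    return "omitted" if fits_omitted(rows) else "bad"
```

A table without any spans is reported as `included`, since both readings coincide there. Nested tables and separate `thead`/`tbody` widths are not handled.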