Manipulating .xls columns and rows with OpenRefine - geocoding

I need to manipulate a data set such that it can be mapped with Google Fusion Tables. Current xls data is formatted as follows:
Image of xls file with personal data anonymized
Note that a blank row indicates a new entry. I need the information in the column to be sorted into rows under the appropriate headings, specifically the address for geocoding. Any ideas?

First, do some cleanup to merge your second and third columns into a single one, and then use the Columnize by key/value column feature to transpose the data in the third and fourth columns into separate fields.
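For the merge step, one option is Edit column > Add column based on this column with a short GREL expression along these lines (the column names here are just placeholders for your actual headers):

cells["Column 2"].value + " " + cells["Column 3"].value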
Once this is done, Fusion Tables should be able to geocode the dataset based on the address. If not, there are plenty of tutorials on geocoding a dataset with OpenRefine. See:
OpenRefine wiki,
Google Maps,
OpenStreetMap,
Yahoo Maps.

Related

Google Cloud Bigtable: Storing multiple rows for a row key

In Bigtable, I am trying to create a column family corresponding to a row key in the format shown below.
Under the preferences column, there are multiple cells. Note that these are not multiple versions of the same cell, but multiple cells in a column corresponding to the same row key.
Access patterns include:
reading all the preferences of a user (RK)
reading the beta preference of a user
and so on.
How do I create a column family in this schema?
The most straightforward option is to create a column family called "preferences" with columns named "alpha", "beta", "gamma", etc. This structure supports both reading all preferences (just read every column in the family) and reading a single preference (use a column qualifier filter).
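As a rough sketch with the Python client (google-cloud-bigtable); the project, instance, table and row key names are made up, and the table is assumed to already exist:

from google.cloud import bigtable
from google.cloud.bigtable import row_filters

# Hypothetical project/instance/table identifiers.
client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("users")

# One column family holding every preference.
table.column_family("preferences").create()

# Several cells (one column per preference) under the same row key.
row = table.direct_row("user#42")
row.set_cell("preferences", "alpha", b"on")
row.set_cell("preferences", "beta", b"off")
row.set_cell("preferences", "gamma", b"dark")
row.commit()

# Read all preferences of the user (RK).
all_prefs = table.read_row("user#42")

# Read only the "beta" preference via a column qualifier filter.
beta_only = table.read_row(
    "user#42",
    filter_=row_filters.ColumnQualifierRegexFilter(b"beta"),
)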

Is there a way to make Power BI not aggregate all numeric data?

I have three xlsx files of data that has already been processed, so I pretty much just need to display it with charts. The problem seems to be that Power BI aggregates all numeric data (using count, sum, etc.). In the community forums they suggest creating new measures, but in that case I have to create a lot of measures... I also tried converting the data to text, and even then Power BI counts it!
Any help, please?
There are several ways to tackle this:
When you pull a field into the field well for a visualisation, you can click the drop down in the field well and select "Don't summarize"
In the data model, select the column and, on the ribbon, choose "Don't summarize" as the summarization option in the Properties group.
The screenshot shows the field well option on the left and the data model options on the right, one for a numeric and one for a text field.
And, yes, you never want to use the implicit measures, i.e. the automatic calculations that Power BI creates. If you want to keep on top of what is being calculated, create your own measures, and yes, there will be many.
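For example (assuming a hypothetical Sales table with a Quantity column), an explicit measure is just a short DAX expression that you then drop into the visual instead of the raw column:

Total Quantity = SUM ( Sales[Quantity] )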
Edit: If by "aggregating" you are referring to the fact that text values will be grouped in a table (you don't see any duplicates), then you need to add a column with unique values to the table so all the duplicates of the text values show up. This can be done in the data source by adding an Index column, then using that Index column in the table and setting it to a very narrow width to make it invisible.
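In Power Query that is the Add Column > Index Column button; the generated step looks roughly like this, where PreviousStep stands for whatever your preceding step is called:

= Table.AddIndexColumn(PreviousStep, "Index", 1, 1)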

How to create a Fact table from multiple different tables in Pentaho

I have been following a tutorial on creating a data warehouse using Pentaho Data Integration/Kettle.
The tutorial is based on a CSV file, but I am practicing with the Northwind database and PostgreSQL, and I am trying to figure out how to select values from more than one table and then output them into a single table.
My ETL process goes like this: I have a stage for each table, where values are selected from the source table and stored in a stage table. From there I have my dimension tables set up, but I am trying to figure out the step between the stages and the dimensions, which is where I need to select the values that will update the dimension tables.
I have several stages set up, one for each of my tables, and at this point I am not sure whether I should create a separate values table for each table or a single values table. Any help would be greatly appreciated. Thanks.
When I try to select values from multiple tables I get an error that says "we detected rows with varying number of fields". It seems I would need to create separate tables with…
In Kettle, the metadata structure of the data stream cannot change. For example, if row 1 has 3 columns, one integer and two strings, all rows must have the same structure.
If you're combining rows coming from different sources, you must ensure the structure is the same; that error is telling you that some of the incoming streams of data have a different number of fields. A common fix is to put a Select values step on each incoming stream so they all emit the same fields, in the same order, before you combine them.

Sort visualization data using a column not displayed

This may look like a rookie question, but in Power BI, is it possible to:
1. Not sort the data in a visualization (keep the natural order from the query)
2. If the answer to 1 is no, sort the data using a column that is not displayed in the visualization (for instance, sort table rows by an index that is not displayed in the table)
I found a workaround for charts (by using the sort column as a tooltip, it becomes available among the sorting columns), but I didn't find one for tables.
Edit: You can insert any column under Tooltips and it will be available for sorting in the same toolbar as in the screenshot below. Take a look at the answer here for further details.
Regarding:
Not sort the data in a visualization (keep the natural order from the query)
I recently had problems with categorical data being sorted in a visualization in a useless way, and not in the same order as in the query (or the table in the Power Query Editor). If I'm not missing the point completely, you should be able to sort your data as you like by clicking the three dots in the top right of your visualization.

ColdFusion 10 cfspreadsheet: get metadata from columns

We are currently using cfspreadsheet to process Excel spreadsheets that are imported into our app.
At present we don't have an easy way of validating the data types that are imported, as we are trying to work with a query-of-queries (QoQ) object after we have the spreadsheet in memory.
Is there any sort of easy way to loop over a query object to detect the data types for each column in the query dataset?
<cfspreadsheet action="read" src="#form.uploadedFile#" query="mycontent" headerrow="1" excludeheaderrow="yes">
<cfquery name="mycontent" dbtype="query">
SELECT *
FROM mycontent
</cfquery>
I've tried looking for metadata functions for queries, but can't seem to find any.
No. There are no built-in methods that return the data types (or, more accurately, the "cell types") of the values read from a spreadsheet. You must use the underlying POI library to access that information.
In addition, as Dan alluded to above, there is not an exact correlation between "cell types" and query "data types". Unlike database tables, a spreadsheet may contain multiple types of cells within the same column. Just because the first cell in a column contains a date is no guarantee that all of the cells in that column do as well. That is one of the reasons why all of the resulting query columns are assigned type varchar. Technically there are no "column" data types when it comes to spreadsheets.
That said, here is an example of how to extract the types of individual cells using POI. It is primarily geared towards examining cell format, but the basic concepts are the same.
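As a rough, untested sketch of the idea from CFML (the file path is just the OP's form variable, and it assumes the POI 3.x classes bundled with CF10, including WorkbookFactory, are on the class path):

<cfscript>
// Open the workbook directly with POI and inspect each cell's type.
fis = createObject("java", "java.io.FileInputStream").init(form.uploadedFile);
workbook = createObject("java", "org.apache.poi.ss.usermodel.WorkbookFactory").create(fis);
sheet = workbook.getSheetAt(0);
rows = sheet.rowIterator();
while (rows.hasNext()) {
    currentRow = rows.next();
    cells = currentRow.cellIterator();  // iterates only the cells that actually exist
    while (cells.hasNext()) {
        cell = cells.next();
        // In POI 3.x getCellType() returns an int:
        // 0=NUMERIC, 1=STRING, 2=FORMULA, 3=BLANK, 4=BOOLEAN, 5=ERROR
        writeOutput("row " & currentRow.getRowNum() & ", col " & cell.getColumnIndex()
                    & ": type " & cell.getCellType() & "<br>");
    }
}
fis.close();
</cfscript>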
Can you elaborate on the ultimate goal? I.e., how do you intend to use this information, and how does it relate to your QoQ?