We are currently using cfspreadsheet to process Excel spreadsheets that are being imported into our app.
At present we don't have an easy way of validating the data types of the imported data, as we are working with a query-of-queries (QoQ) object once the spreadsheet is in memory.
Is there any easy way to loop over a query object and detect the data type of each column in the query dataset?
<cfspreadsheet action="read" src="#form.uploadedFile#" query="mycontent" headerrow="1" excludeheaderrow="yes">
<cfquery name="mycontent" dbtype="query">
SELECT *
FROM mycontent
</cfquery>
I've tried looking for metadata functions for queries, but can't seem to find any.
No. There are no built-in methods that return the data types (or more accurately, "cell types") of the values read from a spreadsheet. You must use the underlying POI library to access that information.
In addition, as Dan alluded to above, there is no exact correlation between "cell types" and query "data types". Unlike database tables, a spreadsheet may contain multiple types of cells within the same column. Just because the first cell in a column contains a date is no guarantee that all of the cells in that column do as well. That is one of the reasons why all of the resulting query columns are assigned the type varchar. Technically, there are no "column" data types when it comes to spreadsheets.
That said, here is an example of how to extract the types of individual cells using POI. It is primarily geared towards examining cell format, but the basic concepts are the same.
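For illustration, here is a minimal Java sketch of that idea, assuming POI 4.x; the file name is a placeholder for your uploaded file. It walks the first sheet and prints each cell's type, treating date-formatted numeric cells separately:

import org.apache.poi.ss.usermodel.*;
import java.io.FileInputStream;

public class CellTypeInspector {
    public static void main(String[] args) throws Exception {
        // "upload.xlsx" is a placeholder for the uploaded file's path
        try (Workbook wb = WorkbookFactory.create(new FileInputStream("upload.xlsx"))) {
            Sheet sheet = wb.getSheetAt(0);
            for (Row row : sheet) {
                for (Cell cell : row) {
                    CellType type = cell.getCellType();
                    // POI stores dates as NUMERIC cells; only the cell format tells them apart
                    if (type == CellType.NUMERIC && DateUtil.isCellDateFormatted(cell)) {
                        System.out.println(cell.getAddress() + " -> DATE");
                    } else {
                        System.out.println(cell.getAddress() + " -> " + type);
                    }
                }
            }
        }
    }
}

Since cfspreadsheet is built on POI, the same calls should be reachable from CFML through its Java integration.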
Can you elaborate on the ultimate goal? i.e. How do you intend to use this information, and how does it relate to your QoQ?
In my document, I need to generate a table that has merged cells.
simple example
As M2Doc does not support merging cells, I have tried two workarounds.
Workaround 1
It consists of creating nested tables inside the second column. I have played with the borders to hide the fact that it is a nested table.
M2Doc template of workaround 1
Unfortunately, Word does not handle nested tables correctly, as there is no way to guarantee a constant cell width, which results in columns that don't have a constant width.
illustration of inconsistent column width
Workaround 2
My second workaround was to generate Excel tables outside M2Doc, with Python4Capella. Then, in my M2Doc template, I create references to the generated tables.
This second workaround would work well if I did not have to display XHTML descriptions in my table. So far, I can only get the raw markup code in Excel, and I have no means to interpret it.
Any idea of how I could implement my table, including merged cells, with the current capabilities of M2Doc? For example, with a dedicated Java service that I would develop? If so, any hint about how this service could be implemented is highly welcome. And so is any alternative strategy!
Thank you
It is possible to create a Java service to merge cells, for instance this service.
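As a starting point, here is a minimal sketch of the POI (XWPF) calls such a service could wrap. The class and method names are hypothetical, and the wiring that registers the service with M2Doc is left out:

import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTcPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STMerge;

public class TableMergeService {
    // Hypothetical service method: vertically merge the cells of one column
    // between fromRow and toRow (inclusive) in a generated .docx table.
    public void mergeCellsVertically(XWPFTable table, int col, int fromRow, int toRow) {
        for (int row = fromRow; row <= toRow; row++) {
            XWPFTableCell cell = table.getRow(row).getCell(col);
            CTTcPr tcPr = cell.getCTTc().getTcPr();
            if (tcPr == null) {
                tcPr = cell.getCTTc().addNewTcPr();
            }
            // The first cell restarts the merge region; the others continue it
            tcPr.addNewVMerge().setVal(row == fromRow ? STMerge.RESTART : STMerge.CONTINUE);
        }
    }
}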
Also, you have Excel services to insert a table from an .xlsx file.
With Python4Capella you will have to parse the XHTML from your description in order to use OpenPyXL formatting.
And maybe another idea could be to use MS Excel itself to do the conversion, via a macro or some cell format option.
So, I've got 3 .xlsx files full of data that has already been processed, so I pretty much just have to display the data using the graphs. The problem seems to be that Power BI aggregates all numeric data (using count, sum, etc.). In their community they suggest creating new measures; the thing is, in that case I have to create a lot of measures... Also, I tried converting the data to text, and even so, Power BI counts it!
Any help, please?
There are several ways to tackle this:
When you pull a field into the field well for a visualisation, you can click the drop-down in the field well and select "Don't summarize".
In the data model, select the column and, on the ribbon, select "Don't summarize" as the summarization option in the Properties group.
The screenshot shows the field well option on the left and the data model options on the right, one for a numeric and one for a text field.
And, yes, you never want to use the implicit measures, i.e. the automatic calculations that Power BI creates. If you want to keep on top of what is being calculated, create your own measures, and yes, there will be many.
Edit: If by "aggregating" you are referring to the fact that text values will be grouped in a table (you don't see any duplicates), then you need to add a column with unique values to the table so all the duplicates of the text values show up. This can be done in the data source by adding an Index column, then using that Index column in the table and setting it to a very narrow width to make it invisible.
I have a data table connected to multiple lookup tables, and I'm trying to find a way to use the RELATED function to fetch values from a dynamically selected lookup table based on the values of one of the columns.
e.g.
If the Month column's value is "2018_01", the Type column's value is "Adjustment", and the Variant column's value is "B", look in '2018_01_Adjustment'[Var_B] (essentially '<Month>_<Type>'[Var_<variant>]).
I was hoping DAX had some parallel to Excel's INDIRECT, but from looking through the internet, it appears it doesn't, so I need an alternative.
The most important thing when using DAX is your data model.
For this type of question, you need to model your data as follows:
First, do you have a date dimension? Once both the fact table (I think that's what you mean when you write "data table") and the lookup tables have a link to the dimension table, you can link and group by the dimension you want, for example the MONTH or the YEAR.
I need to manipulate a data set such that it can be mapped with Google Fusion Tables. Current xls data is formatted as follows:
Image of xls file with personal data anonymized
Note that a blank row indicates a new entry. I need the information in the column to be sorted into rows under the appropriate headings, specifically the address for geocoding. Any ideas?
First, do some clean-up to merge your second and third columns into a single one, then use the Columnize by key/value column feature to transpose the data in the third and fourth columns into separate fields.
Once this is done, Fusion Tables should be able to geocode the dataset based on the address. If that is not the case, there are plenty of tutorials on geocoding a dataset with OpenRefine. See:
OpenRefine wiki,
Google Maps,
OpenStreet Map,
Yahoo Maps.
In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data - I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns being added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how to do a 'Detect Data Type' in the query formula language, to attach to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}), however this seems to come out as 'Text' for a column which has integers in it. Pressing 'Detect Data Type' does, however, correctly identify the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile; there's a better way. But let's first deal with column type detection. Power Query uses the 'any' data type as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, and then explicitly sets the data type of the column (the detection idea is sketched below). This is messy and tricky, since you need to do it once per column.
This might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of problems very quickly. First, you'll need to write some crazy PQ code to list all the columns and run your function on each. That will work the first time, but it might break on subsequent refreshes, because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up; if you publish your report to the Power BI service, you'll just see refresh errors.
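To make the "best match" sampling idea concrete, here is a rough sketch of the detection logic. It is written in Java purely for illustration (inside Power Query you would express the equivalent in M), and every name in it is hypothetical:

import java.util.List;

public class TypeGuesser {
    // Hypothetical best-match detection: try the narrowest type first and
    // fall back to text when any sampled value fails to parse.
    static String guessColumnType(List<String> sample) {
        boolean allWhole = true, allDecimal = true;
        for (String v : sample) {
            if (v == null || v.isEmpty()) continue; // ignore blanks
            try { Long.parseLong(v); } catch (NumberFormatException e) { allWhole = false; }
            try { Double.parseDouble(v); } catch (NumberFormatException e) { allDecimal = false; }
        }
        if (allWhole) return "Whole Number";
        if (allDecimal) return "Decimal Number";
        return "Text";
    }

    public static void main(String[] args) {
        System.out.println(guessColumnType(List.of("1", "2", "3"))); // Whole Number
        System.out.println(guessColumnType(List.of("1.5", "2")));    // Decimal Number
        System.out.println(guessColumnType(List.of("1", "abc")));    // Text
    }
}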
Dynamic Schemas will suffer the same data model change issue I mentioned above.
The alternative solution that you won't have problems with is using a DirectQuery data source instead of Power Query. If you load your data into Azure SQL or a Tabular Model, the reporting layer will pick up the updated fields automatically, so you don't have to work around it using PQ.