How can I make Power Query read ".dss" files? - powerbi

I'm trying to build a Power BI dashboard from the .dss files produced by HEC-HMS simulations to show time series results, but the data is inside ".dss" files and Power Query says: "we don't recognize the format of the first file"
How can I open those ".dss" files in Power Query?
Thanks! Waiting for help.

This looks like what you might be looking for:
HEC-DSS File and HEC-DSSVue – Gridded Data:
Quote:
HEC-DSS, USACE Hydrologic Engineering Center Data Storage System, is a type of database system to store data primarily for hydrologic and hydraulic modeling (*.dss file). HEC-DSSVue is a tool to view, edit, and visualize a HEC-DSS file. Unlike other commercial or open source databases, HEC-DSS is not a relational database: HEC-DSS uses blocks (records) to store data within a HEC-DSS file, and each HEC-DSS file can have numerous blocks (records). In addition to time series data and paired data, gridded data can also be stored in a HEC-DSS file.
HEC-DSSVue can be downloaded from here:
https://www.hec.usace.army.mil/software/hec-dssvue/


GCP > Video Intelligence: Prepare CSV error: Has critical error in root level csv, Expected 2 columns, but found 1 columns only

I'm trying to follow the GCP documentation linked below to prepare my video training data. It says that if you want to use GCP to label videos, you can use the UNASSIGNED value.
I have my videos uploaded to a bucket.
I have a traffic_video_labels.csv with the following rows:
gs://video_intel/1.mp4
gs://video_intel/2.mp4
Now, in the Video Intelligence Import section, I want to use a CSV called check.csv containing the row below, which references the file that lists the video locations. Using the UNASSIGNED value should let me use the labeling feature within GCP.
UNASSIGNED,gs://video_intel/traffic_video_labels.csv
However, when I try to import check.csv, I get the error:
Has critical error in root level csv gs://video_intel/check.csv line 1: Expected 2 columns, but found 1 columns only.
Can anyone please help with this? Thanks!
https://cloud.google.com/video-intelligence/automl/object-tracking/docs/prepare
For the error message "Expected 2 columns, but found 1 columns only.", try fixing the format of your CSV file: open it in a text editor of your choice (such as Cloud Shell, Sublime, Atom, etc.) and inspect the file format.
When you open a CSV file in Google Sheets or a similar product, you may not be able to format the file properly (e.g. empty values from trailing commas) due to limitations of the user interface, but in a text editor you should not run into those issues.
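To check the file without the console, a short Python script can report every row whose field count is not 2, using ordinary comma-delimited CSV parsing. This is only a sketch: the function names are hypothetical, and it assumes a plain comma-delimited UTF-8 file. Note that a row written with a different separator (for example a semicolon) parses as a single column, which is one possible way to end up with "found 1 columns only".

```python
import csv
import io


def find_bad_rows(text, expected_columns=2):
    """Return (line_number, column_count) for every row that does not have
    exactly `expected_columns` comma-separated fields."""
    bad = []
    # csv.reader handles quoted fields, so a comma inside quotes is not a separator.
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != expected_columns:
            bad.append((line_number, len(row)))
    return bad


def check_file(path, expected_columns=2):
    """Read a local copy of the CSV (e.g. downloaded from the bucket) and check it."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        return find_bad_rows(f.read(), expected_columns)
```

For example, `find_bad_rows("UNASSIGNED,gs://video_intel/traffic_video_labels.csv\n")` returns an empty list, while the same line with `;` instead of `,` is reported as line 1 with 1 column.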
If this does not work, please share your CSV file so I can test it myself.

The source file structure changes daily in Informatica Cloud

The requirement is that the source file structure changes daily (dynamically). How can we achieve this in Informatica Cloud?
For example,
Let's say the source is a flat file that arrives in different formats: with a header, without a header, or with different metadata (today's file has 4 columns, tomorrow's has 7 different columns, the day after has no header, and another day's file includes a count of the records in the file).
I need to consume all of these dynamically changing files in one Informatica Cloud mapping. Could you please help me with this?
This is a tricky situation. I know it's not a perfect solution, but here is my idea:
Create a source file structure with the maximum number of columns, all of type text, say 50. Read the file and apply a filter to clean up header data, etc. Then use a router to handle each file according to its structure; the file name may give you a hint about what it contains. Once you have identified the type of file, convert the columns to their data types and load them into the correct target.
Mapping would look like Source -> SQ -> EXP -> FIL -> RTR -> TGT1, TGT2
There has to be a pattern to identify the dynamic file structure.
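The idea can be sketched outside Informatica in Python, with each function standing in for one transformation of the mapping: read everything as text into a fixed set of columns, filter out header rows, route by file name, and convert types per target. The 50-column width comes from the suggestion above; the header rule, the `sales_`/`stock_` file-name prefixes and the target layouts are hypothetical placeholders for whatever pattern your files actually follow.

```python
import csv
import io

MAX_COLUMNS = 50  # upper bound on columns, as in the suggested source structure


def read_wide(text, delimiter=","):
    """Source/EXP step: read every line as text, padded to MAX_COLUMNS fields."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        rows.append(row[:MAX_COLUMNS] + [""] * (MAX_COLUMNS - len(row)))
    return rows


def looks_like_header(row):
    """FIL step (hypothetical rule): a row with no numeric field is a header."""
    values = [v for v in row if v != ""]
    return bool(values) and not any(v.replace(".", "", 1).isdigit() for v in values)


def route(filename, rows):
    """RTR step: pick a target from a pattern in the file name (hypothetical prefixes)."""
    data = [r for r in rows if not looks_like_header(r)]
    if filename.startswith("sales_"):
        # TGT1: (customer, amount), amount converted to a number
        return "TGT1", [(r[0], float(r[1])) for r in data]
    if filename.startswith("stock_"):
        # TGT2: (sku, warehouse, quantity), quantity converted to an integer
        return "TGT2", [(r[0], r[1], int(r[2])) for r in data]
    raise ValueError(f"no known structure for {filename!r}")
```

A file that matches no known pattern raises an error instead of loading silently, which reflects the point that there has to be some pattern to identify the structure.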
HTH...
To summarise my understanding of the problem:
You have a random number of file formats
You don't know the file formats in advance
The files don't contain the necessary information to determine their format.
If this is correct then I don't believe this is a solvable problem in Informatica or in any other tool, coding language, etc. You don't have enough information available to enable you to define the solution.
The only solution is to change your source files. Possibilities include:
a standard format (or one of a small number of standard formats, with information in the file that allows you to programmatically determine which format is being used)
a self-documenting file type such as JSON
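To illustrate the self-documenting option, here is a minimal Python sketch that reads JSON Lines records (one JSON object per line) which carry their own field names, so files with different column sets can go through one reader without knowing the layout in advance. The JSON Lines layout and the field names are assumptions for illustration.

```python
import json


def read_self_describing(text):
    """Parse one JSON object per line; each record names its own fields."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    # The union of keys is this file's schema, discovered at read time.
    columns = sorted({key for rec in records for key in rec})
    # Fill fields a record lacks with None so every row has the same shape.
    return columns, [tuple(rec.get(c) for c in columns) for rec in records]
```

Because the field names travel with the data, adding or dropping a column changes the output schema rather than breaking the reader.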

Google DataPrep - Apparently Limited Table Size

I'm trying to prepare SEO data from Screaming Frog, Majestic and Ahrefs, join it before importing said data into BigQuery for analysis.
The Majestic and Ahrefs CSV files import after some pruning down to the 100MB limit.
The Screaming Frog CSV file, however, doesn't fully load: only approx. 37,000 of 193,000 rows are displayed. Pruning less important columns in Excel and reducing the file size (from 44MB to 39MB) slightly increases the number of rows loaded, which suggests to me that it's not an errant character or cell.
I've made sure (by resaving via a text editor) that the CSV file is saved in UTF-8, and checked Dataprep's limitations to see if there is a limit on the number of cells per Flow/Wrangle, but found nothing.
The Majestic and AHREFS files are larger and load completely with no issue. There is no data corruption in the Screaming Frog file. Is there something common I'm missing?
Is the total limit for all files 100MB?
Any advice or insight would be appreciated.
To apply the transformation to your full files, you need to run the recipe as a job.
What you see in the Dataprep Transformer page is only a sample taken from the head of the file.
You can read about how sampling works in the Dataprep documentation.

Tool for querying large numbers of csv files

We have large numbers of csv files, files/directories are partitioned by date and several other factors. For instance, files might be named /data/AAA/date/BBB.csv
There are thousands of files, some are in the GB range in size. Total data sizes are in the terabytes.
They are only ever appended to, usually in bulk, so write performance is not that important. We don't want to load the data into another system, because several important processes we run, written in C++, rely on being able to stream the files quickly.
I'm looking for a tool or library that would allow SQL-like queries directly against the data files. I've started looking at Hive, Spark, and other big data tools, but it's not clear whether they can access partitioned data directly from the source, which in our case is NFS.
Ideally, we would be able to define a table by giving a description of the columns, as well as partition information. Also, the files are compressed, so handling compression would be ideal.
Are there open source tools that do this? I've seen a product called Pivotal, which claims to do this, but we would rather write our own drivers for our data for an open source distributed query system.
Any leads would be appreciated.
Spark can be a solution. It is an in-memory distributed processing engine: data can be loaded into memory on multiple nodes in the cluster and processed there, so you do not need to copy the data to another system.
Here are the steps for your case:
Build a multi-node Spark cluster.
Mount the NFS share at the same path on every node (when Spark reads a local file path, the file must be accessible at that path on all worker nodes).
Load the data into memory as RDDs (or DataFrames) and start processing it.
It provides:
Support for programming languages like Scala, Python, Java, etc.
Support for SQLContext and DataFrames: you can define a structure for the data and access it using SQL queries
Support for several compression algorithms
Limitations:
Performance is best when the data being processed fits in memory (Spark can spill to disk, but that is slower)
You need to use DataFrames to define a structure on the data, after which you can query it using SQL embedded in programming languages like Scala, Python, Java, etc.
There are subtle differences between traditional SQL in an RDBMS and SQL in distributed systems like Spark. You need to be aware of those.
With Hive, you need to have the data copied to HDFS. As you do not want to copy the data to another system, Hive might not be the solution.
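Trying Spark requires a cluster, but the workflow the question asks for (describe the columns, derive partition values from the path, decompress on the fly, then query with SQL) can be sketched with Python's standard library (`gzip`, `csv`, `sqlite3`). This is not Spark and will not scale to terabytes; it only illustrates the table-definition idea. The layout `<root>/<AAA>/<date>/<BBB>.csv.gz`, the `.csv.gz` compression and the column names are assumptions based on the example path in the question. Since that layout does not use `key=value` directory names, Spark's automatic partition discovery would not pick it up either; there too the partition values would have to be derived from the file path.

```python
import csv
import gzip
import sqlite3
from pathlib import Path

# Hypothetical column description for the BBB files: (name, SQLite type).
COLUMNS = [("ts", "TEXT"), ("symbol", "TEXT"), ("value", "REAL")]
# Partition columns derived from the layout <root>/<source>/<day>/<name>.csv.gz
PARTITIONS = ["source", "day", "name"]


def partition_values(root, path):
    """Turn <root>/AAA/2024-01-02/BBB.csv.gz into ("AAA", "2024-01-02", "BBB")."""
    source, day, filename = Path(path).relative_to(root).parts
    return source, day, filename[: -len(".csv.gz")]


def load_table(root, conn, table="data"):
    """Create one table from all files, adding the partition values as columns."""
    cols = ", ".join(f"{n} {t}" for n, t in COLUMNS + [(p, "TEXT") for p in PARTITIONS])
    conn.execute(f"CREATE TABLE {table} ({cols})")
    placeholders = ", ".join("?" * (len(COLUMNS) + len(PARTITIONS)))
    for path in sorted(Path(root).glob("*/*/*.csv.gz")):
        parts = partition_values(root, path)
        # gzip.open in text mode decompresses while streaming; no temp copy on disk.
        with gzip.open(path, "rt", newline="") as f:
            rows = (tuple(row) + parts for row in csv.reader(f))
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn
```

After loading, a query such as `SELECT source, day, AVG(value) FROM data GROUP BY source, day` filters and aggregates on the path-derived columns just like ordinary ones.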

Export Stata graph (data) to Excel?

Is there a simple way to export the "underlying" data of a Stata graph in order to reproduce that graph in MS Excel? Imagine you create a ROC curve using roctab y yhat, graph and you want to reproduce that graph in Excel.
I assume that you do not have access to the actual raw data that was used to compile the .gph in the first place, and somehow want to reverse-engineer the .gph file... then, eek, good luck!
If you do, however, have access to the data originally used, then you can use the putexcel command, which is new in Stata 13.
A more detailed description of the putexcel command can be found in the Stata press release on exporting tables to Excel.
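If you have the raw data, another route is to compute the ROC coordinates yourself and write them to a CSV file that Excel opens directly. This Python sketch is not how roctab works internally; it computes the usual empirical ROC points (1 - specificity, sensitivity) by classifying scores at or above each distinct cutpoint as positive, which should trace the same curve for a binary outcome `y` and score `yhat`. The function names and the output file name are assumptions.

```python
import csv


def roc_points(y, yhat):
    """Empirical ROC curve: one (1 - specificity, sensitivity) point per cutpoint.

    Assumes y is coded 0/1 and both classes are present."""
    positives = sum(1 for v in y if v == 1)
    negatives = len(y) - positives
    points = [(0.0, 0.0)]  # cutpoint above every score: nothing classified positive
    for cut in sorted(set(yhat), reverse=True):
        tp = sum(1 for v, s in zip(y, yhat) if s >= cut and v == 1)
        fp = sum(1 for v, s in zip(y, yhat) if s >= cut and v == 0)
        points.append((fp / negatives, tp / positives))
    return points


def write_for_excel(points, path="roc_points.csv"):
    """Write the points with a header row; Excel opens CSV files directly."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["one_minus_specificity", "sensitivity"])
        writer.writerows(points)
```

In Excel, the two columns can then be plotted as an XY (scatter) chart with straight lines to reproduce the curve.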
The data in the .gph file are stored in the serset format, between a pair of tags. There's no utility I know of that will parse the serset information, but it is very similar to Stata's dta file format (v115 and below). I wrote up the basic file format information here. The Python library pandas has code for reading/writing dta files, so with that you could probably create your own serset reader/writer.