VERTEX AI, BATCH PREDICTION - google-cloud-platform

Does anyone know how I can choose the option "Files on Cloud Storage (file list)" in the program? How can I choose the list format?

You can't enter multiple files in the UI. You point it at a single text file in Cloud Storage, and that file should contain the paths of all the images you want to run predictions on. The file format is shown below.
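As far as I can tell, the file list is just a plain text file with one Cloud Storage URI per line, for example (the bucket and object names here are placeholders):
gs://my-bucket/images/img_001.jpg
gs://my-bucket/images/img_002.jpg
gs://my-bucket/images/img_003.png
If you want to build that list programmatically, here is a minimal sketch using the google-cloud-storage client; the bucket name, prefix and output object are assumptions:

from google.cloud import storage

client = storage.Client()
bucket_name = "my-bucket"   # placeholder bucket
prefix = "images/"          # placeholder folder containing the images

# Collect the gs:// URI of every image under the prefix
uris = [
    f"gs://{bucket_name}/{blob.name}"
    for blob in client.list_blobs(bucket_name, prefix=prefix)
    if blob.name.lower().endswith((".jpg", ".jpeg", ".png"))
]

# Write the file list and upload it next to the images
client.bucket(bucket_name).blob("batch/file_list.txt").upload_from_string("\n".join(uris))

Once it is uploaded, point the batch prediction job at gs://my-bucket/batch/file_list.txt.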

Related

How can I make Power Query read ".dss" files?

I'm trying to build a dashboard in Power BI with the .dss files produced by HEC-HMS simulations, to show time series results, but the data is inside a ".dss" file and Power Query says: "We don't recognize the format of the first file."
How can I open those ".dss" files inside Power Query?
Thanks! Waiting for help.
This looks like what you might be looking for:
HEC-DSS File and HEC-DSSVue – Gridded Data:
Quote:
HEC-DSS, USACE Hydrologic Engineering Center Data Storage System, is a type of database system to store data primarily for hydrologic and hydraulic modeling (*.dss file). HEC-DSSVue is a tool to view, edit, and visualize a HEC-DSS file. Unlike other commercial or open source databases, HEC-DSS is not a relational database: HEC-DSS uses blocks (records) to store data within a HEC-DSS file and each HEC-DSS file can have numerous blocks (records). In addition to time series data and paired data in HEC-DSS, gridded data can also be stored in a HEC-DSS file.
HEC-DSSVue can be downloaded from here:
https://www.hec.usace.army.mil/software/hec-dssvue/
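If you prefer to read the time series directly rather than going through the HEC-DSSVue GUI, one possible route is a small Python script, which Power BI can also run via Get Data > Python script. This is only a rough sketch: it assumes the third-party pydsstools package (verify its API against its documentation), and the file name and DSS pathname are made up:

from pydsstools.heclib.dss import HecDss
import pandas as pd

dss_file = "hms_results.dss"                       # placeholder HEC-HMS output file
pathname = "/BASIN/OUTLET/FLOW//1HOUR/RUN:TEST/"   # placeholder DSS pathname

with HecDss.Open(dss_file) as fid:
    ts = fid.read_ts(pathname)                     # read a regular time series record
    df = pd.DataFrame({"time": ts.pytimes, "value": ts.values})

print(df.head())   # returning df from a Power BI Python script exposes it to Power Query

Alternatively, you can open the file in HEC-DSSVue, tabulate the series and export it to a text/Excel file, and load that in Power Query instead.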

Google Cloud Vision not automatically splitting images for training/test

It's weird: for some reason GCP Vision won't allow me to train my model. I have met the minimum of 10 images per label, have no unlabeled images, and tried uploading a CSV pointing to 3 of this label's images as VALIDATION images. Yet I get this error:
Some of your labels (e.g. ‘Label1’) do not have enough images assigned to your Validation sets. Import another CSV file and assign those images to those sets.
Any ideas would be appreciated.
This error generally occurs when you have not labelled all the images: AutoML divides your images, including the unlabelled ones, into the sets, and the error is triggered when unlabelled images end up in the VALIDATION set.
According to the documentation, 1000 images per label are recommended. However, the minimum is 10 images for each label, or 50 for complex cases. In addition:
The model works best when there are at most 100x more images for the most common label than for the least common label. We recommend removing very low frequency labels.
Furthermore, AutoML Vision uses 80% of your images for training, 10% for validating, and 10% for testing. Since your images were not divided into these three categories, you should manually assign them to TRAIN, VALIDATION and TEST. You can do that by uploading your images to a GCS bucket and referencing each labelled image in a .csv file, as follows:
TRAIN,gs://my_bucket/image1.jpeg,cat
As you can see above, it follows the format [SET],[GCS image path],[Label]. Note that you will be dividing your dataset manually, and the split should respect the percentages already mentioned so that you have enough data in each category. You can follow the steps for preparing your training data here and here.
Note: please be aware that your .csv file is case sensitive.
Lastly, in order to validate your dataset and inspect labelled/unlabelled images, you can export the created dataset and check the exported .csv file, as described in the documentation. After exporting, download it and verify each SET (TRAIN, VALIDATION and TEST).
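To make the manual split concrete, below is a minimal sketch (the bucket, image names and labels are placeholders) that assigns the images of each label to the three sets in roughly the 80/10/10 proportions and writes the CSV in the [SET],[GCS image path],[Label] format described above:

import csv
import random
from collections import defaultdict

# Placeholder list of (GCS image path, label) pairs
images = [("gs://my_bucket/image%d.jpeg" % i, "cat" if i % 2 else "dog")
          for i in range(1, 101)]

by_label = defaultdict(list)
for path, label in images:
    by_label[label].append(path)

random.seed(42)
rows = []
for label, paths in by_label.items():
    random.shuffle(paths)
    n = len(paths)
    for i, path in enumerate(paths):
        # Roughly 80/10/10 per label, so every label appears in every set
        if i < int(0.8 * n):
            split = "TRAIN"
        elif i < int(0.9 * n):
            split = "VALIDATION"
        else:
            split = "TEST"
        rows.append([split, path, label])

with open("labels.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

With at least 10 images per label, splitting per label like this should ensure every label has images in the VALIDATION and TEST sets, which is exactly what the error message complains about.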

GCP > Video Intelligence: Prepare CSV error: Has critical error in root level csv, Expected 2 columns, but found 1 columns only

I'm trying to follow the documentation from the GCP link below to prepare my video training data. The doc says that if you want to use GCP to label videos, you can use the UNASSIGNED feature.
I have my videos uploaded to a bucket.
I have a traffic_video_labels.csv with the rows below:
gs://video_intel/1.mp4
gs://video_intel/2.mp4
Now, in the Video Intelligence Import section, I want to use a CSV called check.csv that has the row below, which references back to the file with the video locations. Using the UNASSIGNED value should let me use the labelling feature within GCP.
UNASSIGNED,gs://video_intel/traffic_video_labels.csv
However, when I try to use check.csv as the import file, I get the error:
Has critical error in root level csv gs://video_intel/check.csv line 1: Expected 2 columns, but found 1 columns only.
Can anyone pls help with this? thanks!
https://cloud.google.com/video-intelligence/automl/object-tracking/docs/prepare
For the error message "Expected 2 columns, but found 1 columns only.", try to fix the format of your CSV file. Open the file in a text editor of your choice (such as Cloud Shell, Sublime, Atom, etc.) to inspect the file format.
When you open a CSV file in Google Sheets or a similar product, you may not be able to spot formatting problems (e.g. empty values from trailing commas) due to limitations of the user interface, but in a text editor you should not run into those issues.
If this does not work, please share your CSV file so I can run a test with it myself.
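If you want to see exactly what the importer is parsing, here is a quick sketch (the file name matches the question; everything else is an assumption) that counts the parsed columns per line:

import csv

# Inspect the root-level CSV the import job points at
with open("check.csv", newline="") as f:
    for lineno, row in enumerate(csv.reader(f), start=1):
        print(lineno, len(row), row)

# Each row of the root-level file should parse into exactly two columns, e.g.:
#   UNASSIGNED,gs://video_intel/traffic_video_labels.csv
# Quotes wrapping the whole line, a non-comma delimiter, or an unexpected
# encoding can all make a row come back as a single column.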

The source file structure will change on a daily basis in Informatica Cloud

The requirement is: the source file structure will change on a daily basis / dynamically. How can we achieve this in Informatica Cloud?
For example,
For example, let's say the source is a flat file that arrives in different formats: with a header, without a header, or with different metadata (today the file has 4 columns, tomorrow it has 7 different columns, the day after tomorrow it has no header, and on another day the file contains a count of the records in the file).
I need to consume all of these dynamically changing files in one Informatica Cloud mapping. Could you please help me with this?
This is a tricky situation. I know it's not a perfect solution, but here is my idea:
Create a source file structure with the maximum number of columns you expect, all of type text, say 50. Read the file and apply a filter to clean up header rows etc. Then use a router to treat files according to their structure - maybe the filename can give you a hint about what each file contains. Once you identify the type of file, convert the columns to their proper data types and load them into the correct target.
Mapping would look like Source -> SQ -> EXP -> FIL -> RTR -> TGT1, TGT2
There has to be a pattern to identify the dynamic file structure.
HTH...
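Informatica mappings are not written as code, but just to illustrate the routing idea in the answer above (the filename pattern and column counts below are placeholders, not a real spec), the EXP/RTR decision could be sketched like this:

def classify(filename, first_line):
    """Decide which target a file should be routed to, based on its name and first row."""
    columns = first_line.split(",")
    if "count" in filename.lower():      # e.g. a file that only carries a record count
        return "record_count_target"
    if len(columns) == 4:
        return "four_column_target"
    if len(columns) == 7:
        return "seven_column_target"
    return "reject_target"               # unknown structure: send to an error target

print(classify("sales_20240101.csv", "id,name,amount,date"))   # -> four_column_target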
To summarise my understanding of the problem:
You have a random number of file formats
You don't know the file formats in advance
The files don't contain the necessary information to determine their format.
If this is correct then I don't believe this is a solvable problem in Informatica or in any other tool, coding language, etc. You don't have enough information available to enable you to define the solution.
The only solution is to change your source files. Possibilities include:
a standard format (or one of a small number of standard formats with information in the file that allows you to programmatically determine the format being used)
a self-documenting file type such as JSON

How to create an on-demand cost file in Weka?

I am using the Weka data mining tool. In Weka I am trying to use the MetaCost classifier to classify my data, but while executing it there is an error popup saying "On-demand cost file doesn't exist".
Does anyone know how to create an on-demand cost file?
If you are using the GUI:
You need to create a cost matrix first. Choose the "use explicit cost matrix" option and create a cost matrix via the "cost matrix" option. You can simply use this option to train your base classifier.
If you want the load-cost-matrix-on-demand option, save this cost matrix as a file named "<relation name of the training data (ARFF file)>.cost".
When you select "Load matrix on demand" in the Weka GUI, set the "onDemand Directory" to the path of the directory where you saved the cost file.
Remember that the name of the cost file must be the relation name of the training data plus ".cost". Now you can train your base classifier with on-demand costs.
Edit: For more details see http://weka.sourceforge.net/doc/weka/classifiers/meta/MetaCost.html
It's similar in the Weka API too.
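For reference, a two-class cost file could look like the example below, saved as "<relation name>.cost" (for instance mydata.cost if the ARFF relation is called mydata) inside the on-demand directory. This is the matrix layout I have seen in Weka's cost-sensitive examples, so double-check it against your Weka version; the 5.0 is just a placeholder misclassification cost, and the diagonal (correct classifications) stays at zero:

% Rows  Columns
2  2
% Matrix elements
0.0  1.0
5.0  0.0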