I'm using ML.NET to do multiclass classification. I have 3 use cases with different input models (different numbers of columns and data types), and more will come, so it doesn't make sense to create a physical file for each input model for every new use case. I'd like to have, preferably, just ONE physical file that can adapt to any model, or, if that's not possible, to dynamically create the input model at runtime based on column definitions read from a JSON string retrieved from a table in a SQL Server DB. Is this even possible? If so, can you share sample code?
Here are some snippets of the prediction code that I'd like to make generic:
public class DynamicInputModel
{
[ColumnName("ColumnA"), LoadColumn(0)]
public string ColumnA { get; set; }
[ColumnName("ColumnB"), LoadColumn(1)]
public string ColumnB { get; set; }
}
PredictionEngine<DynamicInputModel, MulticlassClassificationPrediction> predEngine = _predEnginePool.GetPredictionEngine(modelName: modelName);
IDataView dataView = _mlContext.Data.LoadFromTextFile<DynamicInputModel>(
path: testDataPath,
hasHeader: true,
separatorChar: ',',
allowQuoting: true,
allowSparse: false);
var testDataList = _mlContext.Data.CreateEnumerable<DynamicInputModel>(dataView, false).ToList();
I don't think you can do dynamic input; however, you can create pipelines from one input schema and create multiple different models based on the labels/features. I have an example below that does that: it uses two label columns, and you can pass in an array of which feature columns to use for the model. The one downside to this approach is that the input schema (CSV/database) has to be static (not change on load):
https://github.com/bartczernicki/MLDotNet-BaseballClassification
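To make that concrete, here is a rough sketch of a training pipeline parameterized by a feature-column array, in the spirit of the linked repo but not taken from it. It assumes the input class also declares a "Label" column; trainDataPath and the trainer choice are placeholders:

using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext();

// Assumptions for illustration: a "Label" column exists in the data/class,
// and this is the feature subset chosen for this particular model.
string trainDataPath = "train.csv"; // placeholder path
string labelColumn = "Label";
string[] featureColumns = { "ColumnA", "ColumnB" };

IDataView trainData = mlContext.Data.LoadFromTextFile<DynamicInputModel>(
    path: trainDataPath, hasHeader: true, separatorChar: ',');

// Start the chain by mapping the label to a key type.
IEstimator<ITransformer> pipeline =
    mlContext.Transforms.Conversion.MapValueToKey("LabelKey", labelColumn);

// Featurize whichever text columns were requested for this model.
foreach (string col in featureColumns)
    pipeline = pipeline.Append(mlContext.Transforms.Text.FeaturizeText(col + "Feat", col));

// Concatenate the chosen features and train one multiclass model per configuration.
pipeline = pipeline
    .Append(mlContext.Transforms.Concatenate("Features",
        featureColumns.Select(c => c + "Feat").ToArray()))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(
        labelColumnName: "LabelKey", featureColumnName: "Features"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

ITransformer model = pipeline.Fit(trainData);

This keeps one physical input class per schema, but lets every feature/label combination share the same pipeline code.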
From the official code snippet example of the Spanner Java Client:
https://github.com/GoogleCloudPlatform/java-docs-samples/blob/HEAD/spanner/spring-data/src/main/java/com/example/spanner/SpannerTemplateSample.java
I can see the usage of
new SpannerQueryOptions().setAllowPartialRead(true):
@Component
public class SpannerTemplateSample {
@Autowired
SpannerTemplate spannerTemplate;
public void runTemplateExample(Singer singer) {
// Delete all of the rows in the Singer table.
this.spannerTemplate.delete(Singer.class, KeySet.all());
// Insert a singer into the Singers table.
this.spannerTemplate.insert(singer);
// Read all of the singers in the Singers table.
List<Singer> allSingers = this.spannerTemplate
.query(Singer.class, Statement.of("SELECT * FROM Singers"),
new SpannerQueryOptions().setAllowPartialRead(true));
}
}
I didn't find any explanation of it. Can anyone help?
Quoting from the documentation:
Partial read is only possible when using Queries. In case the rows returned by the query have fewer columns than the entity that they will be mapped to, Spring Data will map the returned columns and leave the rest of the columns as they are.
I'm trying to convert a TableRow containing multiple values to a KV. I can achieve this in a DoFn, but that adds complexity to the code I want to write further on and makes my job harder.
(Basically, I need to perform a CoGroupBy operation on two PCollections of TableRow.)
Is there any way I can convert a PCollection<TableRow> to a PCollection<KV<String, String>>, where the keys and values are stored in the same format as present in the TableRow?
I wrote a snippet that looks something like this, but it doesn't give me the result I want. Is there any way I can load all the entries in the TableRow and generate a KV with those values?
ImmutableList<TableRow> input = ImmutableList.of(new TableRow().set("val1", "testVal1").set("val2", "testVal2").set("val3", "testVal3"));
PCollection<TableRow> inputPC = p.apply(Create.of(input));
inputPC.apply(MapElements.into(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.strings()))
.via(tableRow -> KV.of((String) tableRow.get("val1"), (String) tableRow.get("val2"))));
It looks like what you want is a way to perform a Join on data obtained from BigQuery. There is no way to perform Joins on TableRows directly, because TableRows are not meant to be manipulated as general elements in your pipeline; their purpose is specifically reading and writing with BigQuery IO.
In order to be able to use existing Beam transforms, you'll want to convert those TableRows into a more useful representation, such as either a Java object you write yourself, or the Beam schema Row type. Since TableRow is essentially a dictionary of JSON strings, all you need to do is write a Map function that reads the appropriate types and parses them if necessary. For example:
PCollection<TableRow> tableRows = ... // Reading from BigQuery IO.
PCollection<Foo> foos = tableRows.apply(MapElements.via(
    new SimpleFunction<TableRow, Foo>() {
      @Override
      public Foo apply(TableRow row) {
        String bar = (String) row.get("bar");
        Integer baz = Integer.parseInt((String) row.get("baz"));
        return new Foo(bar, baz);
      }
    }));
Once you have the data in a type of your choice, you can find a way to perform a Join with built-in Beam transforms. There are many potential ways to do this, so I won't list all of them, but a clear first choice to look at is the Join class.
To convert from a PCollection of Strings into a PCollection of TableRows, you can use the following code:
static class StringConverter extends DoFn<String, TableRow> {
  @Override
  public void processElement(ProcessContext c) {
    c.output(new TableRow().set("string_field", c.element()));
  }
}
Here you can read more on how to transform from a TableRow to a String.
I'm creating a Windows Forms application for university. My data is stored in multiple C arrays of abstract data types. Each array is displayed in its own DataGridView. This all works, no problem.
Since the data is connected like in a database, I added a new column to my parent table and set the ColumnType to DataGridViewComboBoxColumn. Now I need to fill this combo box with the data of a different array / DataGridView.
I've already tried to add the options manually from my array as described in this question: Filling DataGridView ComboBox programatically in unbound mode?. Since I need to use C/C++ and this example uses VB, the functions used there are not available. I can select my column dataGridView1->Columns[0], but ->Items does not exist.
Is it possible to create a datasource from my arrays to link them directly? Or have I overlooked a way to add them manually from my array?
You don't provide any code, nor do you describe clearly where you ran into a problem. Assuming you're after an unbound list to be filled into the DataGridViewComboBoxColumn, here is a simple example.
C#:
DataTable dt = new DataTable();
dt.Columns.Add("ID", typeof(Int32));
dt.Columns.Add("Category", typeof(string));
dt.Rows.Add(1, "Fruits");
dt.Rows.Add(2, "Vegetables");
dt.Rows.Add(3, "Trees");
DataGridViewComboBoxColumn dgCol = (DataGridViewComboBoxColumn)this.dataGridView1.Columns["Category"];
dgCol.ValueMember = "ID";
dgCol.DisplayMember = "Category";
dgCol.DataSource = dt;
In this example, we create a DataTable, fill it with the desired values, and use it as the DataSource of the DataGridViewComboBoxColumn. Obviously, the values provided to this column in the DataGridView's DataSource must match the IDs in the list in the new DataTable dt.
You can convert an array or other data into a DataTable, or even use it directly as a DataSource; it depends on what data you have, but the principle will be very similar (see the sketch after the VB.NET version below).
VB.NET:
Dim dt As New DataTable()
dt.Columns.Add("ID", GetType(Int32))
dt.Columns.Add("Category", GetType(String))
dt.Rows.Add({1, "Fruits"})
dt.Rows.Add({2, "Vegetables"})
dt.Rows.Add({3, "Trees"})
Dim dgCol As DataGridViewComboBoxColumn = CType(Me.DataGridView1.Columns("Category"), DataGridViewComboBoxColumn)
dgCol.ValueMember = "ID"
dgCol.DisplayMember = "Category"
dgCol.DataSource = dt
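As a rough illustration of building that DataTable from a plain array (since your data lives in arrays), here is a minimal C# sketch; the categories array and the surrogate IDs are made up for the example:

// Minimal sketch: build a DataTable from a plain array so it can serve as
// the DataSource of the DataGridViewComboBoxColumn. The array contents
// and column names are hypothetical.
string[] categories = { "Fruits", "Vegetables", "Trees" };

DataTable dt = new DataTable();
dt.Columns.Add("ID", typeof(int));
dt.Columns.Add("Category", typeof(string));

for (int i = 0; i < categories.Length; i++)
{
    // Use the position in the array as a surrogate ID for each entry.
    dt.Rows.Add(i + 1, categories[i]);
}

// Same column wiring as in the example above.
DataGridViewComboBoxColumn dgCol = (DataGridViewComboBoxColumn)this.dataGridView1.Columns["Category"];
dgCol.ValueMember = "ID";
dgCol.DisplayMember = "Category";
dgCol.DataSource = dt;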
I have JSON documents in the following format:
{
  "Name" : ...,
  "Class" : ...,
  "City" : ...,
  "Type" : ...,
  "Age" : ...,
  "Level" : ...,
  "Mother" : ...,
  "Father" : ...
}
I have a map function like this
function(doc,meta)
{
emit([doc.Name,doc.Age,doc.Type,doc.Level],null);
}
With this I can give "Name" and filter out all results, but I also want to be able to give only "Age" and filter on that. Couchbase does not provide functionality to skip the "Name" key, so I would have to create a new map function that has "Age" as its first key; but I also need to query on only the "Level" key, and so on. I would have to create a map function for each field, which obviously is not feasible. Is there anything I can do, apart from making new map functions, to achieve this type of functionality?
I can't use N1QL because I have 150 million documents, so it would take a lot of time.
First of all, that is not a very good map function:
it does not have any filtering
the function header should be function(doc, meta)
if you have a mixture of JSON and binary objects, add a meta.type == "json" check
Now for the things you can do:
If you are using v4 and above (v4.1 is much more recommended), you can use N1QL, which is very similar to the SQL language. (I didn't understand why you can't use N1QL.)
You can also emit multiple items, in multiple orders;
i.e., if I have a doc in the format of
{
"name": "Roi",
"age": 31
}
I can emit two values to the index:
function (doc, meta) {
if (meta.type=="json") {
emit(doc.name, null);
emit(doc.age, null);
}
}
Now I can query by two values.
This is much better than creating two views.
Anyway, if you have something to filter by, it is always recommended.
Somewhere between Sitecore 7.2 and 8.0, the logic for how empty date fields (i.e. date fields for which the content editor has not selected a value) are stored changed. They used to be stored as DateTime.MinValue (i.e. 00010101); however, now they are stored as an empty string. Under Sitecore 7.2 I used to be able to run the following line of code to find all items that have no value selected for a given date field:
var myQuery = _searchContext.GetQueryable<MyClass>().Where(item => item.MyDateField == DateTime.MinValue);
which generated the following Lucene query: +mydatefield:00010101
This of course no longer works, since the field value in the index is an empty string. I'm not quite sure how to use the ContentSearch API to set up the query, since a DateTime can't be compared to a null or empty string value. I'm wondering if there's a way to query this new format, or if I need to look into modifying how Sitecore stores empty date values to match the old format.
One approach you can take is to define a new boolean computed field that keeps track of the presence of the date field. This will make your queries easier to read, and it does not require special knowledge of how Sitecore matches empty fields. It is also likely to be future proof if a change is made to how the values are stored.
using System;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
namespace YourProject.ComputedFields
{
public class HasDateComputedIndexField : IComputedIndexField
{
public object ComputeFieldValue(IIndexable indexable)
{
Item item = indexable as SitecoreIndexableItem;
const string dateFieldName = "MyDateField";
return
item != null &&
item.Fields[dateFieldName] != null &&
!((DateField)item.Fields[dateFieldName]).DateTime.Equals(DateTime.MinValue) &&
!((DateField)item.Fields[dateFieldName]).DateTime.Equals(DateTime.MaxValue);
}
public string FieldName { get; set; }
public string ReturnType { get; set; }
}
}
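Before it can be used, the computed field has to be registered in your search configuration. The following is only a rough sketch of that registration; the exact element names and attributes vary between the Lucene and Solr providers and across Sitecore versions, and the assembly name is a placeholder:

<!-- Sketch of registering the computed index field (Lucene-style config patch). -->
<fields hint="raw:AddComputedIndexField">
  <field fieldName="has_date">
    YourProject.ComputedFields.HasDateComputedIndexField, YourProject
  </field>
</fields>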
Once the configuration is in place and your indexes are rebuilt, you can reference the computed field in your search result item class and query as follows:
public class MyClass : PageSearchResultItem
{
[IndexField("has_date")]
public bool HasDate { get; set; }
}
var myQuery = _searchContext.GetQueryable<MyClass>().Where(item => item.HasDate);
I believe you can use a nullable DateTime (DateTime?) for your field, which should then have no value when the data is blank.
Your check can then be as simple as checking the HasValue property.
var myQuery = _searchContext.GetQueryable<MyClass>().Where(item => item.MyDateField.HasValue);
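For completeness, here is a minimal sketch of what the result item class could look like with the nullable property; the index field name "mydatefield" is an assumption:

using System;
using Sitecore.ContentSearch;

public class MyClass : PageSearchResultItem
{
    // Nullable so that an empty date field in the index maps to "no value".
    // The index field name is hypothetical.
    [IndexField("mydatefield")]
    public DateTime? MyDateField { get; set; }
}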