Dataflow - how to create a PCollectionView to use with DoFnTester? - unit-testing

This question is about Google Dataflow. I would like to test a DoFn that uses side inputs. The Google documentation says you need code like this:
static class MyDoFn extends DoFn<String, Integer> { ... }
MyDoFn myDoFn = ...;
DoFnTester<String, Integer> fnTester = DoFnTester.of(myDoFn);
PCollectionView<List<Integer>> sideInput = ...;
Iterable<Integer> value = ...;
fnTester.setSideInputInGlobalWindow(sideInput, value);
I wonder what the code to create the PCollectionView instance looks like. When using DoFnTester you do not have a pipeline, and I do not see how to create PCollectionView instances without one. Can you tell me how to create a PCollectionView instance for use with DoFnTester?
Thanks for your time.
With kind regards,
Martijn Dirkse

I found the answer myself. You can just create a TestPipeline instance and use it to build the PCollectionView you need. It is no problem that the TestPipeline does not have any other purpose in your code.

A Dataflow SDK 2.1 sample is here. There is no setSideInputInGlobalWindow in 2.x; use setSideInput instead.

Related

Trouble using M2Doc core generation API and SiriusServices

I am trying to generate documentation using the core generation API (as described here https://www.m2doc.org/ref-doc/3.1.0/index.html#core-generation-api). But I have the following error:
Couldn't find the 'isRepresentationDescriptionName()' service.
(It works fine when I use the genconf instead of generating programmatically.)
I tried to add the SiriusServices using SiriusServiceConfigurator, but didn't manage to solve this issue.
Or maybe it is because I didn't add the SiriusSession option that refers to the .aird file?
I have looked at how new services are added in newEnvironmentWithDefaultServices, but it does not seem applicable to SiriusServices.
final IQueryEnvironment queryEnvironment = org.eclipse.acceleo.query.runtime.Query
        .newEnvironmentWithDefaultServices(null);
final Monitor monitor = new BasicMonitor.Printing(System.out);
final ResourceSet resourceSetForModels = session.getTransactionalEditingDomain().getResourceSet();
resourceSetForModels.createResource(modelUri);
try (DocumentTemplate template = M2DocUtils.parse(resourceSetForModels.getURIConverter(), templateURI,
        queryEnvironment, classProvider, monitor)) {
    final Map<String, Object> variable = new HashMap<>();
    M2DocUtils.generate(template, queryEnvironment, variable, resourceSetForModels, outputURI, monitor);
    ...
Thanks
The Sirius-related services need the Sirius Session. The Session is initialized using the SiriusSession option in the .genconf file. It should be set to a URI referencing the .aird file. The class M2DocUtils has several methods that create an IQueryEnvironment and take a Map<String, String> of options, where you can add the SiriusSession option, for instance:
M2DocUtils.getQueryEnvironment(ResourceSet, URI, Map<String, String>)
Note that your code needs to be run inside Eclipse, not as a standalone Java program.
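A minimal sketch of building that options map, using only the Java standard library. The key "SiriusSession" is the option name given in the answer; the class name, the method name and the .aird URI are hypothetical placeholders, and the M2DocUtils call is left as a comment because its other arguments depend on your setup.

```java
import java.util.HashMap;
import java.util.Map;

final class M2DocOptionsSketch {

    /** Option name as given in the answer above. */
    static final String SIRIUS_SESSION_OPTION = "SiriusSession";

    /** Builds the String-to-String options map, pointing the Sirius option at the .aird URI. */
    static Map<String, String> buildOptions(String airdUri) {
        Map<String, String> options = new HashMap<>();
        options.put(SIRIUS_SESSION_OPTION, airdUri);
        return options;
    }

    public static void main(String[] args) {
        // Hypothetical URI; replace it with the URI of your own .aird file.
        Map<String, String> options =
                buildOptions("platform:/resource/my.project/representations.aird");
        System.out.println(options);
        // The map is then passed as the Map<String, String> argument of
        // M2DocUtils.getQueryEnvironment(ResourceSet, URI, Map<String, String>).
    }
}
```

The resulting environment would replace the one obtained from newEnvironmentWithDefaultServices(null) in the question's code.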

How to use LinearSvm?

Currently I'm using FastTree for binary classification, but I would like to give SVM a try and compare metrics.
All the docs mention LinearSvm, but I can't find a code example anywhere.
mlContext.BinaryClassification.Trainers does not have public SVM trainers. There is LinearSvm class and LinearSvm.TrainLinearSvm static method, but they seem to be intended for different things.
What am I missing?
Version: 0.7
For some reason there is no trainer in the runtime API but there is a linear SVM trainer in the Legacy API (for v0.7) found here. They might be generating a new one for the upcoming API, so my advice is to either use the legacy one, or wait for a newer API.
At this stage, ML.NET is very much in development.
Copy-pasting the response I got on GitHub:
I have two answers for you: What the status of the API is, and how to use the LinearSVM in the meantime.
First, we have LinearSVM in the ML.NET codebase, but we do not yet have samples or the API extensions to place it in mlContext.BinaryClassification.Trainers. This is being worked through in issue #1318. I'll link this to that issue, and mark it as a bug.
In the meantime, you can use direct instantiation to get access to LinearSVM:
var arguments = new LinearSvm.Arguments()
{
    NumIterations = 20
};
var linearSvm = new LinearSvm(mlContext, arguments);
var svmTransformer = linearSvm.Fit(trainSet);
var scoredTest = svmTransformer.Transform(testSet);
This will give you an ITransformer, here called svmTransformer, that you can use to operate on IDataView objects.

How can I configure Jenkins using .groovy config file to set up 'Build strategies -> Tags' in my multi-branch pipeline?

I want something similar for the 'Basic Branch Build Strategies' plugin https://plugins.jenkins.io/basic-branch-build-strategies
I figured I could do something like this, but it's not working:
def traits = it / sources / data / 'jenkins.branch.BranchSource' / source / traits
traits << 'com.cloudbees.jenkins.plugins.bitbucket.TagDiscoveryTrait' {
    strategyId(3)
}
traits << 'jenkins.branch.buildstrategies.basic.TagBuildStrategyImpl' {
    strategyId(1)
}
Here you can find full config file: https://gist.github.com/sobi3ch/170bfb0abc4b7d91a1f757a9db07decf
The first trait, 'TagDiscoveryTrait', works fine, but the second one (my change), 'TagBuildStrategyImpl', is not applied on Jenkins restart.
How can I configure 'Build strategies -> Tags' in .groovy config for my multibranch pipeline using 'Basic Branch Build Strategies' plugin?
UPDATE: Maybe I don't need to use traits at all. Maybe there is a simpler solution. I'm not an expert in Jenkins Groovy configuration.
UPDATE 2: This is the scan log for my code: https://gist.github.com/sobi3ch/74051b3e33967d2dd9dc7853bfb0799d
I am using the following Groovy init script to set up a Jenkins job with a "tag" build strategy.
def job = instance.createProject(WorkflowMultiBranchProject.class, "<job-name>")
PersistedList sources = job.getSourcesList()
// I am using Bitbucket, you need to replace this with your source
def pullRequestSource = new BitbucketSCMSource("<repo-owner>", "<repo-name>")
def source = new BranchSource(pullRequestSource)
source.setBuildStrategies([new TagBuildStrategyImpl(null, null)])
sources.add(source)
If I am recognizing the syntax correctly, the question is about the Job DSL plugin.
The problem with the attempted solution is that TagBuildStrategyImpl is not a Trait (known as a Behavior in the UI) but a Build Strategy. The error confirms this:
java.lang.ClassCastException: jenkins.branch.buildstrategies.basic.TagBuildStrategyImpl cannot be cast to jenkins.scm.api.trait.SCMSourceTrait
The class cannot be cast because TagBuildStrategyImpl does not extend SCMSourceTrait; it extends BranchBuildStrategy.
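The same failure can be reproduced in plain Java. This is only an illustrative sketch: the classes below are empty, hypothetical stand-ins for the Jenkins types named in the error, not the real plugin classes.

```java
final class CastSketch {

    /** Hypothetical stand-in for jenkins.scm.api.trait.SCMSourceTrait. */
    static class SourceTrait {}

    /** Hypothetical stand-in for BranchBuildStrategy. */
    static class BuildStrategy {}

    /** Stand-in for TagDiscoveryTrait: it really is a trait. */
    static class TagDiscovery extends SourceTrait {}

    /** Stand-in for TagBuildStrategyImpl: a build strategy, not a trait. */
    static class TagBuildStrategy extends BuildStrategy {}

    /** Returns true if the object can be cast to the trait type. */
    static boolean canBeUsedAsTrait(Object candidate) {
        try {
            SourceTrait trait = (SourceTrait) candidate; // throws for non-traits
            return trait != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canBeUsedAsTrait(new TagDiscovery()));     // prints "true"
        System.out.println(canBeUsedAsTrait(new TagBuildStrategy())); // prints "false"
    }
}
```

Appending the strategy to the traits node asks Jenkins to perform the second cast, which is why that part of the configuration fails.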
The best way to discover the JobDSL syntax applicable for a specific installation of Jenkins is to use the built-in Job DSL API Viewer. It is available under <jenkins-location>/plugin/job-dsl/api-viewer/index.html, e.g. https://ci.jenkins.io/plugin/job-dsl/api-viewer/index.html
On the version I am running, what you are trying to achieve would look approximately like this:
multibranchPipelineJob('foo') {
    branchSources {
        branchSource {
            source {
                bitbucket {
                    ...
                    traits {
                        bitbucketTagDiscovery()
                    }
                }
            }
            buildStrategies {
                buildTags { ... }
            }
        }
    }
}

Sitecore: Glass Mapper Code First

Is it possible to automatically generate Sitecore templates just by coding models? I'm using Sitecore 8.0 and I saw the Glass Mapper Code First approach, but I can't find more information about it.
Not sure why there isn't much info about it, but you can definitely go model/code first! I do it a lot using the attribute configuration approach, like so:
[SitecoreType(true, "{generated guid}")]
public class ExampleModel
{
    [SitecoreField("{generated guid}", SitecoreFieldType.SingleLineText)]
    public virtual string Title { get; set; }
}
Now, how this works. Passing 'true' as the first SitecoreType parameter indicates that the type may be used for code first. There is a GlassCodeFirstDataprovider which has an Initialize method, executed in Sitecore's Initialize pipeline. This method collects all configurations marked for code first and creates them in the SQL data provider. The sections and fields are stored in memory. It also takes inheritance into account (base templates).
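The mark-and-collect idea can be sketched in a few lines. Note that this is a Java analogue written only for illustration, not C# and not Glass Mapper's API: the TemplateType annotation, the model classes and the ids are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CodeFirstSketch {

    /** Hypothetical marker, loosely analogous to [SitecoreType(true, "...")]. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface TemplateType {
        boolean codeFirst() default false;
        String templateId();
    }

    /** Marked for code first: its template should be created from code. */
    @TemplateType(codeFirst = true, templateId = "{hypothetical-id-1}")
    static class ExampleModel {}

    /** Not marked for code first: maps to a template that already exists. */
    @TemplateType(templateId = "{hypothetical-id-2}")
    static class ExistingTemplateModel {}

    /** Collects the template ids of all classes marked for code first. */
    static List<String> collectCodeFirstIds(List<Class<?>> candidates) {
        List<String> ids = new ArrayList<>();
        for (Class<?> type : candidates) {
            TemplateType marker = type.getAnnotation(TemplateType.class);
            if (marker != null && marker.codeFirst()) {
                ids.add(marker.templateId());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Class<?>> models =
                Arrays.<Class<?>>asList(ExampleModel.class, ExistingTemplateModel.class);
        System.out.println(collectCodeFirstIds(models)); // prints "[{hypothetical-id-1}]"
    }
}
```

Only the class flagged with codeFirst = true is picked up, which mirrors how models without the 'true' flag are left alone.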
I think you first need to uncomment some code in the GlassMapperScCustom class you get when you install the project via NuGet. The PostLoad method contains the few lines that execute the Initialize method of each CodeFirstDataprovider.
var dbs = global::Sitecore.Configuration.Factory.GetDatabases();
foreach (var db in dbs)
{
    var provider = db.GetDataProviders().FirstOrDefault(x => x is GlassDataProvider) as GlassDataProvider;
    if (provider != null)
    {
        using (new SecurityDisabler())
        {
            provider.Initialise(db);
        }
    }
}
Furthermore, I would advise using code first in development only. You can create packages or serialize the templates as usual and deploy them to other environments, so you don't need the data provider (and its potential risks) there.
You can. But it's not going to be Glass related.
Code first is exactly what Sitecore.PathFinder is looking to achieve. There's not a lot of info publicly available on this yet however.
Get started here: https://github.com/JakobChristensen/Sitecore.Pathfinder

How to programmatically update references in sitecore?

I have 2 templates: ArticleItem and ArticlePageItem, the ArticlePageItem has a ReferenceField 'Content.Reference' that links to an ArticleItem. Below is the code to create an article:
Item articlePageItem = articlePageParentItem.Add(articleItem.Name, new TemplateItem(master.GetItem(ConstantString.ArticlePageTemplateID)));
using (new UserSwitcher(Sitecore.Context.User))
{
    articlePageItem.Editing.BeginEdit();
    articlePageItem.Fields["Content.Reference"].Value = articleItem.ID.ToString();
    articlePageItem.Editing.EndEdit();
}
But after I execute the code above, I cannot get the ArticleItem reference through Globals.LinkDatabase.GetReferences(articlePageItem), even though I use Globals.LinkDatabase.UpdateReference(articlePageItem).
Does anyone know how to implement this?
[Update]
Below is our environment:
We have a website based on Sitecore, and we're developing another system that aims to simplify article management. We use .NET 4 and ASP.NET MVC 3 to implement this system, and reference Sitecore.Kernel.dll and Sitecore.Client.dll in our project. But our Sitecore version is 6.2, which is incompatible with .NET 4, so I just copied part of the configuration. I think the problem may be due to the incomplete web.config.
If you are executing the above code you should also consider publishing the item changes.
This can be done by using the following code snippet:
// publish all changed content from the master database to the web database
Database masterDatabase = Sitecore.Configuration.Factory.GetDatabase("master");
Database webDatabase = Sitecore.Configuration.Factory.GetDatabase("web");
PublishOptions publishOptions = new PublishOptions(masterDatabase, webDatabase, PublishMode.Smart, Sitecore.Context.Language, DateTime.Now);
publishOptions.RootItem = vacatureRoot;
publishOptions.Deep = true; // include all descendants of the root item
Publisher publisher = new Publisher(publishOptions);
publisher.Publish();
Here 'vacatureRoot' is the root item; in your case, that is articlePageParentItem.
After publishing, the references should be set automatically and should be retrievable in the normal way of getting fields.
It looks like you are using a ReferenceField and therefore your code should look something like this:
ReferenceField rfRef = Sitecore.Context.Item.Fields["Content.Reference"];
if (rfRef != null && rfRef.TargetItem != null)
{
    // Your logic here
}
Answer for comment:
I think your best option is the following code fragment:
Sitecore.Globals.LinkDatabase.UpdateReferences(articlePageItem);
I think this will do what the name says, update the references for this item.
Hope this will work for you!