External offset store with the Debezium embedded connector - amazon-web-services

My team is building a CDC service with the Debezium embedded connector. For the offset storage we're thinking about using S3/DynamoDB. Just wondering if anyone here has written something similar to externalize the offset store and, if so, what you chose and why.

We have a Postgres DB as source. Change Data Capture (CDC) is implemented by Postgres itself (via the pglogical extension). This CDC subsystem of Postgres is responsible for offset management: it maintains a list of CDC clients (aka slots). If your client creates a CDC connection, the DB resumes from the point where that client last disconnected (on the same slot). A new client creates a new slot and receives only the CDC records created from that point in time on. So there is no need for us to remember the offsets.
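For illustration, pinning the slot in the embedded engine's configuration looks roughly like this (a minimal sketch; the property values are made up, but slot.name is the Debezium Postgres connector setting that selects the replication slot to resume on):
import java.util.Properties

// Sketch: reuse the same slot name so a restarted client resumes where the slot left off.
val props = Properties().apply {
    setProperty("name", "my-cdc-engine")
    setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector")
    setProperty("database.hostname", "localhost") // assumed connection details
    setProperty("database.port", "5432")
    setProperty("database.user", "cdc")
    setProperty("database.dbname", "app")
    setProperty("slot.name", "my_cdc_slot") // assumed name; pinning it keeps offset tracking in Postgres
}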

Had this challenge recently. You can write a custom class that extends org.apache.kafka.connect.storage.FileOffsetBackingStore or org.apache.kafka.connect.storage.MemoryOffsetBackingStore.
Then ensure the "offset.storage" config is set to the fully-qualified class name.
Please see a sample below using Redis (maybe not what you'd run in production) as a backing store to give you an idea how this can work.
package com.sample.cdc.offsetbackingstore

import com.sample.cdc.service.RedisManager
import org.apache.kafka.connect.errors.ConnectException
import org.apache.kafka.connect.runtime.WorkerConfig
import org.apache.kafka.connect.storage.MemoryOffsetBackingStore
import java.io.IOException
import java.nio.ByteBuffer
import java.util.concurrent.Callable
import java.util.concurrent.Future

class RedisOffsetBackingStore : MemoryOffsetBackingStore() {
    lateinit var redisManager: RedisManager
    lateinit var redisHost: String
    lateinit var redisPort: String

    override fun configure(config: WorkerConfig?) {
        super.configure(config)
        // getString returns a nullable value; fail fast if the custom keys are missing
        redisHost = config?.getString("custom.config.redis.host")
            ?: throw ConnectException("Missing config: custom.config.redis.host")
        redisPort = config?.getString("custom.config.redis.port")
            ?: throw ConnectException("Missing config: custom.config.redis.port")
    }

    // Called by the Debezium engine at startup
    override fun start() {
        super.start()
        println("Initializing redis manager...")
        redisManager = RedisManager(redisHost, redisPort)
    }

    // Called by the Debezium engine during graceful shutdown
    override fun stop() {
        super.stop()
        println("Disposing redis client resources...")
        if (this::redisManager.isInitialized)
            redisManager.dispose()
    }

    // Called by the engine's OffsetReader to read offsets
    override fun get(keys: MutableCollection<ByteBuffer>?): Future<MutableMap<ByteBuffer, ByteBuffer?>> {
        // Serve from the in-memory map once it has been hydrated from Redis
        if (data.isNotEmpty())
            return super.get(keys)
        return executor.submit(Callable<MutableMap<ByteBuffer, ByteBuffer?>> {
            val result: MutableMap<ByteBuffer, ByteBuffer?> = HashMap()
            keys?.forEach {
                val offsetKey = String(it.array())
                val offsetValue = redisManager.get(offsetKey)
                if (offsetValue.isNotEmpty()) {
                    val buffer = ByteBuffer.wrap(offsetValue.toByteArray())
                    result[it] = buffer
                    data[it] = buffer
                }
            }
            result
        })
    }

    // Invoked by set() in MemoryOffsetBackingStore to persist offsets
    // during commit or graceful shutdown
    override fun save() {
        try {
            for ((key, value) in data) {
                val offsetKey = String(key!!.array())
                val offsetValue = String(value!!.array())
                redisManager.save(offsetKey, offsetValue)
            }
        } catch (e: IOException) {
            throw ConnectException(e)
        }
    }
}
// Ensure the below config settings are set in the Debezium config:
// "offset.storage": "com.sample.cdc.offsetbackingstore.RedisOffsetBackingStore",
// "custom.config.redis.host": "localhost",
// "custom.config.redis.port": "6379"
Note: In case of multiple standalone embedded Debezium services (for reliability and fault tolerance) with a custom offset backing store, you'll have to provide a way to handle offset race conditions and event deduplication.
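Since the original question mentions DynamoDB: one way to handle that race is a conditional write, so only one service instance can advance a given offset. A rough sketch with the AWS SDK for Java v2 (the cdc_offsets table and its attribute names are assumptions for illustration, not anything from the Debezium API):
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeValue
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest

// Sketch: persist an offset only if nobody else has advanced it since we last read it.
fun saveOffsetGuarded(db: DynamoDbClient, key: String, value: String, lastSeen: String?) {
    val builder = PutItemRequest.builder()
        .tableName("cdc_offsets") // assumed table with partition key "offset_key"
        .item(mapOf(
            "offset_key" to AttributeValue.fromS(key),
            "offset_value" to AttributeValue.fromS(value)))
    if (lastSeen == null) {
        builder.conditionExpression("attribute_not_exists(offset_key)")
    } else {
        builder.conditionExpression("offset_value = :prev")
            .expressionAttributeValues(mapOf(":prev" to AttributeValue.fromS(lastSeen)))
    }
    try {
        db.putItem(builder.build())
    } catch (e: ConditionalCheckFailedException) {
        // Another instance won the race; re-read the offset and reconcile/deduplicate.
        throw IllegalStateException("Offset race detected for $key", e)
    }
}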

Related

Can someone please explain the proper usage of Timers and Triggers in Apache Beam?

I'm looking for some examples of usage of Triggers and Timers in Apache Beam. I want to use processing-time timers to listen to my data from Pub/Sub every 5 minutes, and processing-time triggers to process the data collected over an hour altogether, in Python.
Please take a look at the following resources: Stateful processing with Apache Beam and Timely (and Stateful) Processing with Apache Beam.
The first blog post is more general on how to handle state for context, and the second has some examples of buffering and triggering after a certain period of time, which seems similar to what you are trying to do.
A full example was requested. Here is what I was able to come up with:
PCollection<String> records =
    pipeline.apply(
        "ReadPubsub",
        PubsubIO.readStrings()
            .fromSubscription("projects/{project}/subscriptions/{subscription}"));

TupleTag<Iterable<String>> every5MinTag = new TupleTag<>();
TupleTag<Iterable<String>> everyHourTag = new TupleTag<>();

PCollectionTuple timersTuple =
    records
        .apply("WithKeys", WithKeys.of(1)) // A KV<> is required to use state. Keying by data is more appropriate than a hardcoded key.
        .apply(
            "Batch",
            ParDo.of(
                    new DoFn<KV<Integer, String>, Iterable<String>>() {
                      @StateId("buffer5Min")
                      private final StateSpec<BagState<String>> bufferedEvents5Min =
                          StateSpecs.bag();

                      @StateId("count5Min")
                      private final StateSpec<ValueState<Integer>> countState5Min =
                          StateSpecs.value();

                      @TimerId("every5Min")
                      private final TimerSpec every5MinSpec =
                          TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

                      @StateId("bufferHour")
                      private final StateSpec<BagState<String>> bufferedEventsHour =
                          StateSpecs.bag();

                      @StateId("countHour")
                      private final StateSpec<ValueState<Integer>> countStateHour =
                          StateSpecs.value();

                      @TimerId("everyHour")
                      private final TimerSpec everyHourSpec =
                          TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

                      @ProcessElement
                      public void process(
                          @Element KV<Integer, String> record,
                          @StateId("count5Min") ValueState<Integer> count5MinState,
                          @StateId("countHour") ValueState<Integer> countHourState,
                          @StateId("buffer5Min") BagState<String> buffer5Min,
                          @StateId("bufferHour") BagState<String> bufferHour,
                          @TimerId("every5Min") Timer every5MinTimer,
                          @TimerId("everyHour") Timer everyHourTimer) {
                        // Set each timer only when the first element of a batch arrives,
                        // then count elements so later ones don't keep pushing it forward.
                        int count5Min = MoreObjects.firstNonNull(count5MinState.read(), 0);
                        if (count5Min == 0) {
                          every5MinTimer
                              .offset(Duration.standardMinutes(1))
                              .align(Duration.standardMinutes(1))
                              .setRelative();
                        }
                        count5MinState.write(count5Min + 1);
                        buffer5Min.add(record.getValue());

                        int countHour = MoreObjects.firstNonNull(countHourState.read(), 0);
                        if (countHour == 0) {
                          everyHourTimer
                              .offset(Duration.standardMinutes(60))
                              .align(Duration.standardMinutes(60))
                              .setRelative();
                        }
                        countHourState.write(countHour + 1);
                        bufferHour.add(record.getValue());
                      }

                      @OnTimer("every5Min")
                      public void onTimerEvery5Min(
                          OnTimerContext context,
                          @StateId("buffer5Min") BagState<String> bufferState,
                          @StateId("count5Min") ValueState<Integer> countState) {
                        if (!bufferState.isEmpty().read()) {
                          context.output(every5MinTag, bufferState.read());
                          bufferState.clear();
                          countState.clear();
                        }
                      }

                      @OnTimer("everyHour")
                      public void onTimerEveryHour(
                          OnTimerContext context,
                          @StateId("bufferHour") BagState<String> bufferState,
                          @StateId("countHour") ValueState<Integer> countState) {
                        if (!bufferState.isEmpty().read()) {
                          context.output(everyHourTag, bufferState.read());
                          bufferState.clear();
                          countState.clear();
                        }
                      }
                    })
                .withOutputTags(every5MinTag, TupleTagList.of(everyHourTag)));

timersTuple
    .get(every5MinTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<<do something every 5 min>>);

timersTuple
    .get(everyHourTag)
    .setCoder(IterableCoder.of(StringUtf8Coder.of()))
    .apply(<<do something every hour>>);

pipeline.run().waitUntilFinish();

Reducing code duplication when testing a KtorClient

I am creating a service on top of a Ktor client. My payload is XML, so a simplified version of my client looks like this:
class MavenClient(private val client: HttpClient) {

    private suspend fun getRemotePom(url: String) =
        try {
            MavenClientSuccess(client.get<POMProject>(url))
        } catch (e: Exception) {
            MavenClientFailure(e)
        }

    companion object {
        fun getDefaultClient(): HttpClient {
            return HttpClient(Apache) {
                install(JsonFeature) {
                    serializer = JacksonSerializer(jackson = kotlinXmlMapper)
                    accept(ContentType.Text.Xml)
                    accept(ContentType.Application.Xml)
                    accept(ContentType.Text.Plain)
                }
            }
        }
    }
}
Note the use of a custom XMLMapper, attached to a custom data class.
I want to test this class, and follow the documentation.
I end up with the following code for my test client :
private val mockClient = HttpClient(MockEngine) {
    engine {
        addHandler { request ->
            when (request.url.fullUrl) {
                "https://lengrand.me/minimal/1.2/minimal-1.2.pom" -> {
                    respond(
                        minimalResourceStreamPom.readBytes(),
                        headers = headersOf("Content-Type" to listOf(ContentType.Application.Xml.toString()))
                    )
                }
                "https://lengrand.me/unknown/1.2/unknown-1.2.pom" -> {
                    respond("", HttpStatusCode.NotFound)
                }
                else -> error("Unhandled ${request.url.fullUrl}")
            }
        }
    }
    // TODO: How do I avoid repeating this again? That's my implementation?!
    install(JsonFeature) {
        serializer = JacksonSerializer(jackson = PomParser.kotlinXmlMapper)
        accept(ContentType.Text.Xml)
        accept(ContentType.Application.Xml)
        accept(ContentType.Text.Plain)
    }
}

private val Url.hostWithPortIfRequired: String get() = if (port == protocol.defaultPort) host else hostWithPort
private val Url.fullUrl: String get() = "${protocol.name}://$hostWithPortIfRequired$fullPath"

private val mavenClient = MavenClient(mockClient)
Now, I am not worried about the Mapper itself, because I test it directly.
However, what bothers me is that I essentially have to duplicate the complete logic of my client to test its behaviour.
This seems very brittle: for example, my tests will fail and have to be updated if I move to JSON tomorrow. The same goes if I start using response validation.
This is even more true for another client where I am using a defaultRequest, which I also have to copy over completely:
private val mockClient = HttpClient(MockEngine) {
    install(JsonFeature) {
        serializer = JacksonSerializer(mapper)
        accept(ContentType.Application.Json)
    }
    defaultRequest {
        method = HttpMethod.Get
        host = "api.github.com"
        header("Accept", "application/vnd.github.v3+json")
        if (GithubLogin().hasToken()) header("Authorization", GithubLogin().authToken)
    }
}
Am I doing things wrong? Am I testing too much? I am curious as to how I can improve this.
Thanks a lot for your input!
P.S.: Unrelated, but the page about testing on Ktor mentions adding the dependency to implementation. Sounds like I should use testImplementation instead to avoid shipping the lib with my application?
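For what it's worth, that would look something like this in build.gradle.kts (coordinates assumed for Ktor 1.x):
dependencies {
    // Test-only scope keeps the mock engine out of the shipped application
    testImplementation("io.ktor:ktor-client-mock:$ktorVersion")
}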
The MockEngine is designed for stubbing a real HTTP client implementation to test objects that use it. The duplication problem you encounter lies in the fact that the responsibility for transforming the response body belongs to the client. So I suggest you either use Jackson directly to transform the response body (in this case you don't need JsonFeature), or extract the common configuration into an extension function and call it for both engines.
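For the second option, a minimal sketch (reusing the JsonFeature/JacksonSerializer names from the question's Ktor 1.x setup):
// Shared configuration extracted once; both the real and the mock client call it.
fun HttpClientConfig<*>.installXmlSupport() {
    install(JsonFeature) {
        serializer = JacksonSerializer(jackson = kotlinXmlMapper)
        accept(ContentType.Text.Xml)
        accept(ContentType.Application.Xml)
        accept(ContentType.Text.Plain)
    }
}

// Production client
val client = HttpClient(Apache) { installXmlSupport() }

// Test client: only the engine stubbing differs
val mockClient = HttpClient(MockEngine) {
    engine { addHandler { request -> error("Unhandled ${request.url}") } }
    installXmlSupport()
}
If the real client later switches serializers or adds response validation, only installXmlSupport() changes and the tests keep passing.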

How to simulate a CRM plugin sandbox isolation mode in unit tests?

Context
I would like to write some unit tests against classes that will be used by CRM 2016 CodeActivity and Plugin classes. The final assembly will be registered in sandbox isolation mode.
I want to be sure that if a test case is green when running unit tests, it will not run into stricter sandbox isolation security restrictions when registered and run in CRM.
Question
Is there any way to simulate the sandbox isolation when running unit tests?
That's a really good question. You can maybe simulate running the plugin assemblies and code activities in a sandbox based on this Sandbox example.
With that example you could run the code activity with a limited set of permissions.
Now, what are the exact limitations of CRM Online? I found this article; it has a Sandbox Limitations section with some of them. If you find another one, please let me know, because I'd be keen on adding this feature to FakeXrmEasy.
Cheers,
I found this today: https://github.com/carltoncolter/DynamicsPlugin/blob/master/DynamicsPlugin.Tests/PluginContainer.cs
Which I used to turn into this:
using System;
using System.Diagnostics;
using System.Globalization;
using System.Net;
using System.Net.NetworkInformation;
using System.Reflection;
using System.Security;
using System.Security.Permissions;
using System.Text.RegularExpressions;

namespace Core.DLaB.Xrm.Tests.Sandbox
{
    public static class SandboxWrapper
    {
        public static T Instantiate<T>(object[] constructorArguments = null)
        {
            return new SandboxWrapper<T>().Instantiate(constructorArguments);
        }

        public static T InstantiatePlugin<T>(string unsecureConfig = null, string secureConfig = null)
        {
            object[] args = null;
            if (secureConfig == null)
            {
                if (unsecureConfig != null)
                {
                    args = new object[] { unsecureConfig };
                }
            }
            else
            {
                args = new object[] { unsecureConfig, secureConfig };
            }
            return new SandboxWrapper<T>().Instantiate(args);
        }
    }

    public class SandboxWrapper<T> : MarshalByRefObject, IDisposable
    {
        private const string DomainSuffix = "Sandbox";

        /// <summary>
        /// The Sandbox AppDomain to execute the plugin
        /// </summary>
        public AppDomain SandboxedAppDomain { get; private set; }

        public T Instantiate(object[] constructorArguments = null)
        {
            /*
             * Sandboxed plug-ins and custom workflow activities can access the network through the HTTP and HTTPS
             * protocols. This capability provides support for accessing popular web resources like social sites,
             * news feeds, web services, and more. The following web access restrictions apply to this sandbox capability:
             *   - Only the HTTP and HTTPS protocols are allowed.
             *   - Access to localhost (loopback) is not permitted.
             *   - IP addresses cannot be used. You must use a named web address that requires DNS name resolution.
             *   - Anonymous authentication is supported and recommended. There is no provision for prompting the
             *     logged-on user for credentials or saving those credentials.
             */
            constructorArguments = constructorArguments ?? new object[] { };
            var type = typeof(T);
            var source = type.Assembly.Location;
            var sourceAssembly = Assembly.UnsafeLoadFrom(source);
            var setup = new AppDomainSetup
            {
                ApplicationBase = AppDomain.CurrentDomain.BaseDirectory,
                ApplicationName = $"{sourceAssembly.GetName().Name}{DomainSuffix}",
                DisallowBindingRedirects = true,
                DisallowCodeDownload = true,
                DisallowPublisherPolicy = true
            };

            var ps = new PermissionSet(PermissionState.None);
            ps.AddPermission(new SecurityPermission(SecurityPermissionFlag.SerializationFormatter));
            ps.AddPermission(new SecurityPermission(SecurityPermissionFlag.Execution));
            ps.AddPermission(new FileIOPermission(PermissionState.None));
            ps.AddPermission(new ReflectionPermission(ReflectionPermissionFlag.RestrictedMemberAccess));
            // RegEx pattern taken from: https://msdn.microsoft.com/en-us/library/gg334752.aspx
            ps.AddPermission(new WebPermission(NetworkAccess.Connect,
                new Regex(
                    @"^http[s]?://(?!((localhost[:/])|(\[.*\])|([0-9]+[:/])|(0x[0-9a-f]+[:/])|(((([0-9]+)|(0x[0-9A-F]+))\.){3}(([0-9]+)|(0x[0-9A-F]+))[:/]))).+")));
            // We don't need to add these, but it is important to note that there is no access to the following
            ps.AddPermission(new NetworkInformationPermission(NetworkInformationAccess.None));
            ps.AddPermission(new EnvironmentPermission(PermissionState.None));
            ps.AddPermission(new RegistryPermission(PermissionState.None));
            ps.AddPermission(new EventLogPermission(PermissionState.None));

            SandboxedAppDomain = AppDomain.CreateDomain(DomainSuffix, null, setup, ps, null);

            return Create(constructorArguments);
        }

        private T Create(object[] constructorArguments)
        {
            var type = typeof(T);
            return (T)Activator.CreateInstanceFrom(
                SandboxedAppDomain,
                type.Assembly.ManifestModule.FullyQualifiedName,
                // ReSharper disable once AssignNullToNotNullAttribute
                type.FullName, false, BindingFlags.CreateInstance,
                null, constructorArguments,
                CultureInfo.CurrentCulture, null
            ).Unwrap();
        }

        #region IDisposable Support
        // Implementing IDisposable Pattern: https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/dispose-pattern
        private bool _disposed; // To detect redundant calls

        protected virtual void Dispose(bool disposing)
        {
            if (_disposed) return;
            if (disposing)
            {
                if (SandboxedAppDomain != null)
                {
                    AppDomain.Unload(SandboxedAppDomain);
                    SandboxedAppDomain = null;
                }
            }
            _disposed = true;
        }

        // This code added to correctly implement the disposable pattern.
        void IDisposable.Dispose()
        {
            // Do not change this code. Put cleanup code in Dispose(bool disposing) above.
            Dispose(true);
        }
        #endregion
    }
}
Which can be used as such:
SandboxWrapper.InstantiatePlugin<YourPluginType>(unsecureString, secureString)
Not sure how much of it is valid or not, but it worked for correctly handling my testing of XML and JSON serialization.

How to create new record from web service in ADF?

I have created a class and published it as web service. I have created a web method like this:
public void addNewRow(MyObject cob) {
    MyAppModule myAppModule = new MyAppModule();
    try {
        ViewObjectImpl vo = myAppModule.getMyVewObject1();
        // ================> vo is null at this point
        Row r = vo.createRow();
        r.setAttribute("Param1", cob.getParam1());
        r.setAttribute("Param2", cob.getParam2());
        vo.executeQuery();
        getTransaction().commit();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
As marked in the code, myAppModule.getMyVewObject1() returns a null object, and I do not understand why. As far as I know, the AppModule is supposed to initialize the object itself when I call getMyVewObject1(), but maybe I am wrong, or maybe this is not how it works for web methods. Has anyone ever faced this issue? Any help would be much appreciated.
You can check this nice tutorial: Building and Using Web Services with JDeveloper.
It gives you a general idea of how you should build your web services with ADF.
Another approach: when you need to call an existing Application Module from some bean that doesn't have the needed environment (a servlet, etc.), you can initialize it like this:
String appModuleName = "org.my.package.name.model.AppModule";
String appModuleConfig = "AppModuleLocal";
ApplicationModule am = Configuration.createRootApplicationModule(appModuleName, appModuleConfig);
Don't forget to release it:
Configuration.releaseRootApplicationModule(am, true);
Also read up on why you shouldn't really do it like this, and more.
A better approach is to get access to the binding layer and make the call from there.
Here is a nice article.
Per our PM: if you don't use it in the context of an ADF application, then the following code should be used (sample code is from a project I am involved in). Note the release of the AM at the end of the request.
@WebService(serviceName = "LightViewerSoapService")
public class LightViewerSoapService {
    private final String amDef = "oracle.demo.lightbox.model.viewer.soap.services.LightBoxViewerService";
    private final String config = "LightBoxViewerServiceLocal";
    LightBoxViewerServiceImpl service;

    public LightViewerSoapService() {
        super();
    }

    @WebMethod
    public List<Presentations> getAllUserPresentations(@WebParam(name = "userId") Long userId) {
        ArrayList<Presentations> al = new ArrayList<Presentations>();
        service = (LightBoxViewerServiceImpl) getApplicationModule(amDef, config);
        ViewObject vo = service.findViewObject("UserOwnedPresentations");
        VariableValueManager vm = vo.ensureVariableManager();
        vm.setVariableValue("userIdVariable", userId.toString());
        vo.applyViewCriteria(vo.getViewCriteriaManager().getViewCriteria("byUserIdViewCriteria"));
        Row rw = vo.first();
        if (rw != null) {
            Presentations p = createPresentationFromRow(rw);
            al.add(p);
            while (vo.hasNext()) {
                rw = vo.next();
                p = createPresentationFromRow(rw);
                al.add(p);
            }
        }
        releaseAm((ApplicationModule) service);
        return al;
    }
}
Have a look here too:
http://www.youtube.com/watch?v=jDBd3JuroMQ

mysterious console output to stderr from jetty?

When running my embedded Jetty web app launcher, I see the following output on stderr. I just started seeing this after moving my build to Maven 2. Has anyone seen this before?
IDLE SCEP#988057 [d=false,io=1,w=true,rb=false,wb=false],NOT_HANDSHAKING, in/out=0/0 Status = OK HandshakeStatus = NOT_HANDSHAKING
bytesConsumed = 5469 bytesProduced = 5509
It repeats occasionally, seemingly at random times.
This seems to be coming from Jetty's NIO support -- it appears that Jetty feels it is appropriate to log to stderr when it closes idle connections:
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.checkIdleTimestamp(SelectChannelEndPoint.java:231)
at org.eclipse.jetty.io.nio.SelectorManager$SelectSet$2.run(SelectorManager.java:768)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
For those with similar problems, I overrode System.err with a mock output stream:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.regex.Pattern;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DebugOutputStream extends OutputStream {
    private Logger s_logger = LoggerFactory.getLogger(DebugOutputStream.class);

    private final OutputStream m_realStream;
    private ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private Pattern m_searchFor;

    public DebugOutputStream(OutputStream realStream, String regex) {
        m_realStream = realStream;
        m_searchFor = Pattern.compile(regex);
    }

    @Override
    public void write(int b) throws IOException {
        // Buffer the current line and log a stack trace when it matches the unwanted pattern
        baos.write(b);
        if (m_searchFor.matcher(baos.toString()).matches()) {
            s_logger.info("unwanted output detected", new RuntimeException());
        }
        if (b == '\n') baos.reset();
        m_realStream.write(b);
    }
}
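Installing it is a one-liner at startup; a minimal sketch (the regex here is an assumption matching the IDLE SCEP lines above):
// Wrap the real stderr; autoflush so filtered output still appears promptly
System.setErr(PrintStream(DebugOutputStream(System.err, ".*NOT_HANDSHAKING.*"), true))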
}