Document versioning in monorepo - github-pages

I have one repo with multiple subdirectories. e.g
Each subdirectory has its own version tag in GitHub, e.g.
Foo v1.0.0
Bar v2.0.0
I'm currently exploring individually versioned documentation for each subdirectory.
I've started looking into using MkDocs and hosting the documentation on GitHub Pages. While I could use mike for versioning and the monorepo plugin to merge all the documents in one place, this option can only create one global documentation version for the repo rather than an individual documentation version for each subdirectory.
So I would have something like this on GitHub Pages:
MyRepo Document V1
MyRepo Document V2
Instead of
MyRepo Document V1, V2
Foo V1, V2
Bar V1, V2
I'm just wondering if anyone has run into this issue, and what the best option would be for individual document versioning in a monorepo scenario, or whether this is not possible at all.
Thank you

Ideally, the version number does not appear in file or folder names: that would defeat the whole "versioning" aspect.
You would be using:
a branch V1 or V2, in which you keep the working versions of your documents leading up to V1 or V2
a tag V1 or V2 to mark the final state of V1 or V2.
That way, since the documents keep the same path and filename, you can merge between branches or make diffs easily.


Google Artifact Registry: what's the alternative to

With Google Container Registry, one could use gsutil to list Docker layers stored within the registry. For example, this command would calculate total size (I'm assuming that's the billable size) of stored layers:
gsutil du -hs gs://
What's the alternative for this for Artifact Registry and
I tried the Python API, and while I can retrieve image sizes with ListDockerImagesRequest, there's no information about how layers are used or shared. Is there a way to find the total billable repository size for Artifact Registry?
For AR, you can use ListFiles to get all files in a repository. The repository size is just the sum of these file sizes, and this works for all repository types. (For Docker, "files" includes both manifests and layers.)
API docs for ListFiles are here: ("Parent" here is the repository. I will look into making this clearer in the docs.)
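As a rough illustration of the ListFiles approach, here is a minimal Python sketch. It assumes the google-cloud-artifact-registry package is installed and credentials are configured; the function names total_size_bytes and repository_size_bytes are made up for this example, and the project, location and repository values are placeholders you would supply.

```python
from typing import Iterable


def total_size_bytes(files: Iterable) -> int:
    """Sum the size_bytes attribute of every file in the listing."""
    return sum(int(f.size_bytes) for f in files)


def repository_size_bytes(project: str, location: str, repository: str) -> int:
    """List every file in one repository and return the summed size in bytes."""
    # Imported lazily so total_size_bytes can be used without the client library.
    from google.cloud import artifactregistry_v1

    client = artifactregistry_v1.ArtifactRegistryClient()
    # "Parent" is the repository resource name.
    parent = f"projects/{project}/locations/{location}/repositories/{repository}"
    # list_files returns a pager that fetches further pages as it is iterated.
    return total_size_bytes(client.list_files(parent=parent))
```

Calling repository_size_bytes("my-project", "us-central1", "my-repo") with your own values should give the summed file size in bytes.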

GCP Data Catalog - ONE for all projects (of one and more orgs)

What is the best practice
for bringing company-wide metadata (one or more organisations, each with multiple folders and projects)
into one central data catalog that contains all of it?
(If "multiple orgs" is too complex, then let's start with one.)
I've put together a sample showing how to work with one organization.
The main idea is to use a Tag Central Project, where you store the common resources that can be reused, like Tag Templates, Policy Tags and Custom Entries.
So you have:
Tag Central Project
List of Analytics Projects (Where you have the data assets)
The next thing is the user personas you would use. I'd suggest starting with:
Data Governors
Data Curators
Data Analysts
This google-datacatalog-governance-best-pratices GitHub repo contains code that uses Terraform to automatically set up the governance best practices I mentioned.
You can adapt those samples to work at the folder or organization level by changing the Terraform resources.

AWS CDK - Multiple Stacks - Parameters for the location of Lambda Code is not found

I'm using CDK to set up a CI/CD pipeline. I currently have code pulled from a Git repository into the pipeline. There are then two builds: one that pulls out the code for a Lambda and builds an artifact for it, and a second that runs cdk synth to construct the Lambda framework (including a nested bucket and DynamoDB).
Then it heads to a deploy stage, but fails because it can't find the parameters for the location of the Lambda code.
I've been using this example:
The only differences from this example are that I'm using Python for all of it and, due to known future needs, the Lambdas are in a parallel directory to the stack code.
Everything runs up until deploy, where it fails with the error "The following CloudFormation Parameters are missing a value:" and then lists the BucketName and ObjectKey.
I assigned those as overrides as per the above link:
as part of the pipeline action CloudFormationCreateUpdateStackAction, and passed the code from the Lambda stack to the pipeline stack just like in the example. But every time a deploy of the Lambda stack is attempted, the parameters for the location of the code 'do not exist'.
I've tried overriding the parameters, but since they are created dynamically in the pipeline I am hesitant to go further down that route (and my attempts didn't work anyway). I've tried a bunch of different stack/nested stack/single stack configurations but haven't had a success yet.
This basically boils down to the fact that the S3 bucket is added to the CodeUri in the CloudFormation template automatically if your CodeUri starts with ./
So you have 2 options.
1. In your pipeline, output your artifact as normal: just pass the whole repo from the CodeBuild into the CodeDeploy. Your CodeDeploy can pick up the artifact naturally and will automatically add the S3 URL to it.
If you're using Python, however, you MUST be aware that starting from a Lambda directory deeper in the tree means the Python imports expect that directory to be the root directory. If you are in Lambdas/Lambda1 and want to import a file that exists in the Lambda1 directory, then for it to work on AWS Lambda the import has to be just the file name, ignoring the rest of the path.
This can make coding difficult, and running unit tests as well. You'll want to add all the individual Lambda folders (with their paths from the root) to the PYTHONPATH env variable of your CodeBuild instance so the unit tests know where to look (and add a .env file to your IDE as well to handle this locally).
2. You use CDK and you cdk synth the stack you want to deploy. This creates a cdk.out folder containing a bunch of asset zips plus the stack template (a JSON file). You adjust the artifact output of the CodeBuild to output the cdk.out folder, and the asset zips are automatically (thanks to CDK) substituted into the CodeUri locations in the template, which is also synthesized automatically. Once you know the template's name, it's easy to set the CodeDeploy to look for that template name, and it will find the asset zips individually for each Lambda.
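For the first option, the PYTHONPATH value can be generated instead of maintained by hand. The following is only a sketch: the helper name lambda_pythonpath and the default folder name "Lambdas" are assumptions taken from the Lambdas/Lambda1 example above, so adjust them to your layout.

```python
import os
from pathlib import Path


def lambda_pythonpath(repo_root: str, lambdas_dir: str = "Lambdas") -> str:
    """Join every Lambda folder under repo_root/lambdas_dir into a PYTHONPATH value."""
    base = Path(repo_root) / lambdas_dir
    if not base.is_dir():
        # Nothing to add when the folder does not exist.
        return ""
    # Each immediate subdirectory is treated as the import root of one Lambda.
    folders = sorted(p for p in base.iterdir() if p.is_dir())
    return os.pathsep.join(str(p) for p in folders)
```

A build step could print lambda_pythonpath(".") and export the result as PYTHONPATH before running the unit tests; the same value can go into the .env file used by your IDE.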

Bulk Tag Bigquery columns with python & Google Cloud Datacatalog

Is there a way to bulk tag bigquery tables with python
If you want to take a look at sample code that uses the Python client library, I've put together an open source utility script that creates Tags in bulk using a CSV file as the source. If you want to use a different source, you can use this script as a reference. Hope it helps.
create bulk tags from csv
For this purpose you can use the DataCatalogClient class, which is part of the google-cloud-datacatalog package on PyPI and gives access to the Google Cloud Data Catalog API.
First, enable the Data Catalog and BigQuery APIs in your project.
Install the Python Cloud Client Library for the Data Catalog API:
pip install --upgrade google-cloud-datacatalog
Set up authentication by exporting the GOOGLE_APPLICATION_CREDENTIALS environment variable, pointing to the JSON file that contains your service account key:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
Refer to this example from the official documentation, which clearly shows how to create a Data Catalog tag template with the create_tag_template() function and attach the appropriate tag fields to the target BigQuery table.
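To make the bulk part concrete, here is a minimal sketch of tagging many columns from a CSV. The CSV layout (linked_resource, column, field_id, value) and the function names are assumptions invented for this example, not the format of any particular script. It also assumes the tag template already exists and that all of its fields are string fields.

```python
import csv
import io
from collections import defaultdict


def group_tag_rows(csv_text: str) -> dict:
    """Group CSV rows into {(linked_resource, column): {field_id: value}}."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["linked_resource"], row["column"])
        grouped[key][row["field_id"]] = row["value"]
    return dict(grouped)


def create_tags(csv_text: str, template_name: str) -> None:
    """Attach one tag per (table, column) pair, using an existing tag template."""
    # Imported lazily so group_tag_rows works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    for (linked_resource, column), fields in group_tag_rows(csv_text).items():
        # linked_resource looks like
        # //bigquery.googleapis.com/projects/P/datasets/D/tables/T
        entry = client.lookup_entry(request={"linked_resource": linked_resource})
        tag = datacatalog_v1.Tag()
        # template_name looks like projects/P/locations/L/tagTemplates/ID
        tag.template = template_name
        tag.column = column
        for field_id, value in fields.items():
            tag.fields[field_id] = datacatalog_v1.TagField(string_value=value)
        client.create_tag(parent=entry.name, tag=tag)
```

Grouping first means each column gets a single tag carrying all of its fields, rather than one create_tag call per CSV row.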
If you have any doubts, feel free to extend your initial question or add a comment below this answer, so we can address your particular use case.

WSO2 GR: add application artifact and lifecycle when defining new application in the GR

I have a WSO2 Governance Registry set up according to this blog post
When defining a new application in the WSO2 GR using the menu Metadata > Add > Application, I would like to be able to directly add the actual application artifact (war/car file).
The selected file should then be placed in the SVN location that corresponds to the initial state of the lifecycle to which I will bind the application. This of course implies that I would also need to be able to directly add the lifecycle when defining a new application.
The new application form would then be something like this:
Name: ExampleApplication-1.0.0
Type: .war (is now redundant)
Description: My Example Application
Artifact: Selected file ExampleApplication-1.0.0.war
Lifecycle: MyDTAP-Lifecycle_v1
Does anybody know a good starting point for adding this functionality in terms of code hooks or extension points?
If I have understood you correctly, you basically need to provide a file upload option in your "Application" RXT (Governance Artifact Configuration) that accepts whatever file type you use, fill in the derivable information in the artifact's metadata based on the uploaded file, and attach a selected or predefined lifecycle to the artifact at creation time. What you are looking for is Registry Handlers [1]. You can probably achieve all of the aforementioned tasks with a single handler.
[1] -