How to store and manage test data in a CI project without polluting the code repository? - unit-testing

I am setting up a test scheme for a product that is data-hungry. The goal is to have a basic set of unit tests for C++/CUDA code running within a GitHub repo, and also to make them self-hostable.
The sparse set of cowboy-coded tests that I have inherited has big (~20 MB) binary data files stuffed into the GitHub repo, which hurts my heart. We have a snazzy Amazon S3 data repository.
I could hand-cut a bespoke solution, but I don't have the rest of my life to do it.
How have others intelligently structured their tests to handle this?
The tests as they exist store data in the code repo.
Possible cuts:
each test checks for data in a data/ folder and wgets it if it isn't there (will GitHub handle this? see the sketch after this list)
have a data/ folder that is soft-mapped (how?) to the S3 repo
etc., etc.
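For illustration, here is a minimal Python sketch of the first option, assuming boto3 and a hypothetical bucket, prefix, and file list (none of these names come from the real project):

# fetch_test_data.py -- hedged sketch: pull missing test fixtures from S3.
# Bucket, prefix, and file names below are placeholders for illustration.
import os
import boto3

BUCKET = "my-test-data-bucket"      # hypothetical S3 bucket
PREFIX = "unit-tests/"              # hypothetical key prefix
DATA_DIR = "data"
FILES = ["volume_small.bin", "volume_large.bin"]  # illustrative fixture names

def ensure_test_data():
    """Download any fixture that is not already present in data/."""
    os.makedirs(DATA_DIR, exist_ok=True)
    s3 = boto3.client("s3")
    for name in FILES:
        local_path = os.path.join(DATA_DIR, name)
        if not os.path.exists(local_path):
            s3.download_file(BUCKET, PREFIX + name, local_path)

if __name__ == "__main__":
    ensure_test_data()

Run as a pre-test step in CI (or from a test fixture), this keeps the binaries out of the Git history entirely.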

Related

How to properly manage django production and development files

How can I properly manage my development files and production files with source control like GitHub (private repo)? I am new to Django, and when publishing my code to production there are many settings that need to be changed for production, but those same settings won't work on my development server. I am stuck here; please guide me on how to manage this. I want my files under source control so that at any time I can recreate my development environment on another PC, and at any time I can set up my production server on any server.
How do I manage secret keys and passwords in source control (like a GitHub private repo)?
Also, please specify what an ideal file structure should look like.
Hope you are well.
I'll answer your questions in the format you asked them.
You ask about managing production and development files with source control (namely GitHub). It is best to keep these in different branches: for example, a "main" branch used for production and a "development" branch used for development. This lets you work in both branches and merge the development branch into the production branch when changes are ready.
The best way to manage sensitive information such as passwords and keys in source control is to use .env files (see "What is the use of python-dotenv?"), which store variables in the environment. You put the variables in this file and tell Git to ignore it in the .gitignore file, so it never reaches GitHub.
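As a rough illustration of the .env approach, assuming the python-dotenv package (the variable names below are placeholders):

# settings.py (sketch) -- read secrets from a .env file that is kept out of Git.
# Assumes python-dotenv is installed; the variable names are placeholders.
import os
from dotenv import load_dotenv

load_dotenv()  # loads the .env file in the project root into os.environ

SECRET_KEY = os.environ["DJANGO_SECRET_KEY"]
DATABASE_PASSWORD = os.environ.get("DB_PASSWORD", "")
DEBUG = os.environ.get("DJANGO_DEBUG", "False") == "True"

# .env (never committed; add ".env" to .gitignore):
# DJANGO_SECRET_KEY=replace-me
# DB_PASSWORD=replace-me
# DJANGO_DEBUG=True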
You mention an ideal file structure. There are many ways files can be structured, and normally I would say this comes down to developer preference; it doesn't really matter much as long as current and future developers can make sense of the structure. That said, here is my own personal recommendation:
Project setup:
    staticfiles/
    projectName/
        settings.py, etc.
    app/
        app files
    manage.py
Hope this is of some benefit to you and brings some clarity to your questions.
Have a good day.

AWS Lambda - separate/protected binary store, so devs don't have to share binary files?

I'm trying to wrap my head around how using custom binaries with Lambda works. Since you have to upload the code bundle in a ZIP file (or pull it from S3), that means this action overwrites whatever you currently have in place. So let's say I have a folder structure like this in said ZIP file:
myFunc/
    index.js
    bin/
    node_modules/
And in the bin folder are a couple of binary executables. This means all the developers on the team would have to have access to these binaries, and every time even the smallest code change is made to index.js, they would have to zip it up with the binaries and upload the whole bundle again.
Is there not some way in Lambda to specify some sort of separate cache/store where binaries can be kept, independently of the source code?
This is what you use a build server for - anybody pushes a code change, and a bundle is created automatically within seconds. Even better, the bundle can be pushed through a test pipeline until it reaches production (via the AWS API using for example boto) a few minutes later.
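For illustration, a rough sketch of that last step with boto3, using a hypothetical function name and artifact location (the build server would run something like this once the ZIP has been uploaded to S3):

# deploy_lambda.py -- sketch: point a Lambda function at a freshly built bundle.
# Function name, bucket, and key are placeholders for illustration.
import boto3

FUNCTION_NAME = "myFunc"                   # hypothetical Lambda function
BUNDLE_BUCKET = "my-build-artifacts"       # hypothetical S3 bucket for builds
BUNDLE_KEY = "builds/myFunc-20240101.zip"  # hypothetical build artifact key

def deploy():
    client = boto3.client("lambda")
    # Update the function code from the bundle the build server just produced.
    client.update_function_code(
        FunctionName=FUNCTION_NAME,
        S3Bucket=BUNDLE_BUCKET,
        S3Key=BUNDLE_KEY,
    )

if __name__ == "__main__":
    deploy()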
You could possibly store binaries somewhere like S3 for access by the Lambda, but then you have a massive problem of version controlling them. Much easier (and safer) to create a complete bundle with absolutely everything the program needs. Additional benefits:
You can be absolutely certain of which exact code was involved in handling a particular request, making debugging much easier.
Developers can download the whole bundle and run it without having to establish a connection to the binary repository.
The bundle can be migrated to another service with minimal effort.

Deployment of files other than source code

I am starting to prepare a roadmap for our release process. We are at present using tortoise svn and ant for building source. I am considering implementing continuous integration and would like to know right direction for the choices below:
Firstly, the present process is that a developer works on a file and commits it directly to the repo. Others run the TortoiseSVN update command to pull in the required changes. The same process is followed on the build server, which updates the source code, builds, and then deploys to the QA and production servers. However, this process lacks control over the repo: during an update, unwanted code is also pulled in when two developers have worked on the same file fixing two different issues, one approved by QA and the other rejected. How can I overcome this scenario?
Secondly, apart from source we have a bunch of other files such as XML, CSS, and JS files. How do I automate deployment of these files? I have configured CruiseControl on my local machine and it works fine when it comes to executing a build, but I am not sure how to handle the other files, since updating them in production seems risky and error prone. Any suggestion on this would be really helpful.
You could try integrating PowerShell with CruiseControl; our team has CC fire off the build process and then uses PowerShell to copy the resulting project files (code and others) to production, a test site, or wherever.
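The copy step itself is nothing exotic; here is a rough equivalent written in Python rather than PowerShell, purely for illustration, with placeholder paths:

# post_build_copy.py -- sketch of the post-build copy step, shown in Python
# for illustration (the answer above uses PowerShell). Paths are placeholders.
import shutil

BUILD_OUTPUT = r"C:\builds\myproject\latest"       # hypothetical build output
DEPLOY_TARGET = r"\\testserver\wwwroot\myproject"  # hypothetical target share

def copy_build():
    # Mirror the build output onto the target, overwriting what is there.
    shutil.copytree(BUILD_OUTPUT, DEPLOY_TARGET, dirs_exist_ok=True)

if __name__ == "__main__":
    copy_build()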
To deal with the lack of repository control, I'd suggest creating a candidate branch off your Trunk and designating that as your Integration code. Once it's settled and the necessary changes have been committed or pulled in, promote it to Regression for further testing. Then, once that testing is successful, promote it to Production.
In this process your developers wouldn't be committing to Production directly, but instead through an iterative process a new production repository will result, whose changes can then be reintegrated into Trunk so the process can start anew for the next release.

Is there an ideal way to move from Staging to Production for Coldfusion code?

I am trying to work out a good way to run a staging server and a production server for hosting multiple Coldfusion sites. Each site is essentially a fork of a repo, with site specific changes made to each. I am looking for a good way to have this staging server move code (upon QA approval) to the production server.
One fanciful idea involved compiling the sites each into EAR files to be run on the production server, but I cannot seem to wrap my head around Coldfusion archives, plus I cannot see any good way of automating this, especially the deployment part.
What I have done successfully before is use Subversion as a go-between for a site: once a site is QA'd, the code is committed and the production server's working directory has an SVN update run, which then triggers a code copy from the working directory to the actual live code. This worked fine, but has many moving parts, and still required some form of server access to each server to run the commits and updates. Plus, this worked for an individual site; I think it would be a nightmare to set up and maintain this architecture for multiple sites.
Ideally I would want a group of developers to have FTP access with the ability to log into some control panel to mark a site for QA, and then have a QA person check the site and mark it as stable/production worthy, and then have someone see that a site is pending and click a button to deploy the updated site. (Any of those roles could be filled by the same person mind you)
Sorry if that last part wasn't so much the question, just a framework to understand my current thought process.
Agree with @Nathan Strutz that Ant is a good tool for this purpose. Some more thoughts:
You want a repeatable build process that minimizes opportunities for deltas. With that in mind:
SVN export a build.
Tag the build in SVN.
Turn that export into a .zip, something with an installer, etc... idea being one unit to validate with a set of repeatable deployment steps.
Send the build to QA.
If QA approves, deploy that build into production.
Move whole code bases over as a build, rather than just changed files. This way you know what's put into place in production is the same thing that was validated. Refactor code so that configuration data is not overwritten by a new build.
As for actual production deployment, I have not come across a tool to solve the multiple servers, different code bases challenge. So I think you're best served rolling your own.
As an aside, in your situation I would think through an approach that allows for a standardized codebase, with a mechanism (i.e. an API) that allows for the customization you're describing. Otherwise managing each site as a "custom" project is very painful.
Update
Learning Ant: Ant in Action [book].
On Source Control: for the situation you describe, I would maintain a core code base and overlays per site. Export the core, then the site-specific files over it. This ensures that any core updates not overridden by site-specific changes make it in.
Call this combination a "build". Do builds with Ant. Maintain an Ant script - or, perhaps more flexibly, an Ant configuration file - per core and site combination. Track the version numbers of core and site as part of a given build.
If your software is stuffed inside an installer (Nullsoft Install Shield, for instance), that should be part of the build. Otherwise generate a .zip file (.ear is a possibility as well, but I haven't seen anyone actually do this with CF). The point being one file that encompasses the whole build.
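For illustration, a rough Python sketch of that core-plus-overlay packaging (directory names and version numbers are placeholders; in practice Ant would handle these steps declaratively):

# make_build.py -- sketch of the core + site overlay packaging described above.
# Directory names, the site name, and version numbers are placeholders.
import shutil

CORE_EXPORT = "exports/core"    # hypothetical SVN export of the core code base
SITE_EXPORT = "exports/siteA"   # hypothetical export of the site-specific files
STAGING_DIR = "staging/siteA"
BUILD_NAME = "build-siteA-core1.2-site0.7"  # track core and site versions

def make_build():
    shutil.rmtree(STAGING_DIR, ignore_errors=True)  # start from a clean slate
    # Lay down the core first, then overlay the site-specific files on top,
    # so site files win where both define the same path.
    shutil.copytree(CORE_EXPORT, STAGING_DIR, dirs_exist_ok=True)
    shutil.copytree(SITE_EXPORT, STAGING_DIR, dirs_exist_ok=True)
    # One artifact that encompasses the whole build.
    shutil.make_archive(BUILD_NAME, "zip", STAGING_DIR)

if __name__ == "__main__":
    make_build()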
This build file is what QA should validate, so validation includes deployment, configuration, and functionality testing. See the Deployment section below for how this can flow.
Deployment:
If you want to automate deployment, QA should be involved as well to validate it, meaning QA would deploy/install builds using the same process on their servers before doing a staging-to-production deployment.
To do this, I would create something that tracks which server receives which build file, plus whatever credentials and connection information are necessary to make that happen, most likely via FTP. Once transferred, the tool would then extract the build file or run the installer. This last piece is an area I would have to research: how to let one server run commands such as extraction or installation remotely.
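For illustration, a rough sketch of the transfer step using Python's ftplib, with placeholder host, credentials, and file names (the remote extract/install step is the part that still needs research, as noted above):

# push_build.py -- sketch: send a build file to a target server over FTP.
# Host, credentials, directories, and the build file name are placeholders.
from ftplib import FTP

HOST = "qa-server.example.com"     # hypothetical target server
USER = "deploy"
PASSWORD = "change-me"             # in practice, read from a credentials store
BUILD_FILE = "build-siteA-core1.2-site0.7.zip"  # hypothetical build artifact
REMOTE_DIR = "/incoming/builds"

def push_build():
    with FTP(HOST) as ftp:
        ftp.login(USER, PASSWORD)
        ftp.cwd(REMOTE_DIR)
        with open(BUILD_FILE, "rb") as fh:
            ftp.storbinary("STOR " + BUILD_FILE, fh)

if __name__ == "__main__":
    push_build()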
You should look into Ant as a migration tool. It allows you to package your build process with a simple XML file that you can run from the command line or from within Eclipse. Creating an automated build process is great because it documents the process as well as executes it the same way, every time.
Ant can handle zipping and unzipping, copying around, making backups if needed, working with your subversion repository, transferring via FTP, compressing javascript and even calling a web address if you need to do something like flush the application memory or server cache once it's installed. You may be surprised with the things you can do with Ant.
To get started, I would recommend the Ant manual as your main resource, but look into existing Ant builds as a good starting point to get you going. I have one on RIAForge for example that does some interesting stuff and calls a groovy script to do some more processing on my files during the build. If you search riaforge for build.xml files, you will come up with a great variety of them, many of which are directly for ColdFusion projects.

same project, multiple customers git workflow

After my first question, I'd like confirmation about the best git workflow in my case.
I have a single Django project, hosted at GitHub, and different clones, each with its own branch: customerA, customerB, demo... (think websites)
The branches share the same core but have different data and settings (these are in .gitignore).
When I work on the customerA branch, how should I replicate bug fixes to the other deployments?
When I create a new general feature, I create a dedicated branch, then merge it into my master. Then, to deploy to the 'clients', I merge the master branch into each customer branch. Is that the right way, or should I rebase?
# from the customerA branch
git fetch origin
git merge origin/master
Also, I have created a remote branch for each customer so I can back up the customer branches to GitHub.
It looks like a very classic problem, but I guess I'm not using git the right way.
Thanks.
Ju.
I would have a single project repo at a well-known place containing a master branch with the common code, and branches for specific deployments (e.g. customer/A, customer/B, demo).
Then I would have checkouts from each of these branches for each customer, for the demo server, and so on. You can let these pull automatically from their respective branch with a commit hook on the single project repo.
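For illustration, a minimal sketch of such a post-receive hook written in Python; the branch names match the ones above, but the checkout paths are hypothetical:

#!/usr/bin/env python3
# post-receive hook (sketch) -- on the central repo, update each deployment
# checkout when its branch receives new commits. Checkout paths are placeholders.
import os
import subprocess
import sys

# Map branch name -> working copy that should track it (hypothetical paths).
DEPLOYMENTS = {
    "customer/A": "/srv/sites/customerA",
    "customer/B": "/srv/sites/customerB",
    "demo": "/srv/sites/demo",
}

def main():
    # git feeds "<old-sha> <new-sha> <refname>" lines on stdin, one per updated ref.
    # Drop GIT_DIR so git operates on the target checkout, not the hook's repo.
    env = {k: v for k, v in os.environ.items() if k != "GIT_DIR"}
    for line in sys.stdin:
        _old, _new, ref = line.split()
        branch = ref.replace("refs/heads/", "", 1)
        target = DEPLOYMENTS.get(branch)
        if target:
            subprocess.run(["git", "-C", target, "pull", "--ff-only"],
                           check=True, env=env)

if __name__ == "__main__":
    main()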
Every developer would have their local copy of the project repo, do local work, and then push stuff back to the single project repo.
The challenge will be maintaining the branches that diverge from master and doing the regular merges so the divergence does not grow over time.
I have seen this solution described in much more detail somewhere on the web, but I could not find it again quickly. Some blog post on using git for a staging and production web server, IIRC.
If the three sites share some 'core' code (such as a Django app) you should factor that core out into its own repo and use git submodules to include it in the other projects, rather than duplicating it.
I would have a repo called project-master or something like that and a repo for each client. Then, when you have code you need to be available to those client repos, you pull from the project-master to that repo.
Don't separate the projects in branches, separate them into different repositories.
Make the "common" code generic enough so that costumerA's copy of the common code is exactly the same as costumerB's copy of the common code.
Then, you don't have to pull or merge anything. When you update the common code, both costumerA and costumerB will get the update automagically (because they use the same common code).
By "common" code: I'm referring to the package/series-of-apps that power the websites you're developing.
I'm assuming costumerA and costumerB repositories would only include things like site-specific settings and templates.
The key here is making the "common" code generic: don't let costumerA use a "slightly modified version" of the "common" code.
Also, I'd suggest using a deployment mechanism that doesn't rely on git. git is a great source code management tool; but it's not designed (AFAIK) to be a deployment tool.