I'm using Codemagic for my Flutter CI. My repository is a monorepo, so I split it into more than 20 local packages. For every package, I'm running its own workflows (analyze, test, format, etc.).
Because of the high number of packages, codemagic.yaml becomes a massive file and readability decreases.
With GitHub Actions, I'm able to split my workflows into multiple files, which improves readability a lot (a single file for every package).
I asked the Codemagic team and got this answer:
Unfortunately we do not plan on introducing the feature to split configurations into multiple different files at this point.
[But] there’s a couple of things you could look at doing. Firstly, take a look into using YAML anchors and aliases so you don’t repeat scripts. Here’s a guide and a sample codemagic.yaml.
The other option is to use the Codemagic REST API to trigger your builds and pass app-specific environment variables for the build. This way you can have a single workflow to which you pass the variables. See more about the API here.
There are a few script samples here that show how to call it, and this yaml shows where the environment variables are overwritten.
Source: https://codemagicio.slack.com/archives/CEKE2KZ37/p1645538193515549?thread_ts=1645537597.216489&cid=CEKE2KZ37
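For reference, here is a minimal sketch of that second suggestion - triggering a workflow through the Codemagic REST API with curl and overriding an environment variable per package. The app id, workflow id, token variable name and the PACKAGE_PATH variable are placeholders, not values from the original answer:

curl -X POST https://api.codemagic.io/builds \
  -H "Content-Type: application/json" \
  -H "x-auth-token: $CODEMAGIC_API_TOKEN" \
  -d '{
    "appId": "<your-app-id>",
    "workflowId": "package-checks",
    "branch": "main",
    "environment": {
      "variables": { "PACKAGE_PATH": "packages/feature_login" }
    }
  }'

A small shell loop over your package directories could then fire one such build per package from a single generic workflow.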
I am developing machine learning repositories that require fairly large trained model files to run. These files are not part of the git remote but are tracked by DVC and saved in separate remote storage. I am running into issues when trying to run unit tests in the CI pipeline for functions that need these model files to make their predictions. Since I don't have access to them in the git remote, I can't test them.
What is the best practice that people usually follow in this situation? I can think of a couple of options:
Pull the models from the DVC remote inside the CI pipeline. I don't want to do this because downloading models every time I push some code will quickly eat up my CI usage minutes and is an expensive option.
Use unittest.mock to simulate the output of the model prediction and test the other parts of my code. This is what I am doing now, but it's sort of a pain with unittest's mock functionality. That module wasn't really developed with ML in mind, from what I can tell, and it's missing (or makes it hard to find) some functionality I would have really liked. Are there any good tools for doing this geared specifically towards ML?
Do some awkward reformatting of the function definitions that lets me essentially do option 2 but without a mock module. That is, just test the surrounding logic and don't worry about the model output.
Just put the model files in the git remote and be done with it. Only use DVC to track data.
What do people usually do in this situation?
If we talk about unit tests, I think it's indeed better to mock. It's best to keep unit tests small, testing the actual logic of the unit, etc. It's good to have other tests, though, that pull the model and run some logic on top of it - I would call those integration tests.
It's not black and white, though. If for some reason you find it easier to use the actual model (e.g. it changes a lot and using it is easier than maintaining and updating stubs/fixtures), you could potentially cache it.
I think, to help you with the mock, you would need to share some technical details: what the function looks like, what you have tried, what breaks, etc.
I don't want to do this because downloading models every time I push some code will quickly eat up my CI usage minutes and is an expensive option.
I think you can potentially use the CI system's cache to avoid downloading it over and over again; GitHub Actions and CircleCI both have caching mechanisms for this, and the idea is the same across all common CI providers. Which one are you considering, btw?
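A rough sketch of what that looks like in a CI job (the file names are made up, and the cache save/restore step itself is provider-specific):

# restore .dvc/cache from the provider's cache (actions/cache, CircleCI restore_cache, etc.)
dvc pull models/model.pkl.dvc   # hypothetical tracked model; only objects missing from the local cache are downloaded
pytest tests/                   # run the tests that need the real model file
# save .dvc/cache back to the provider's cache for the next run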
Just put the model files in the git remote and be done with it. Only use DVC to track data.
This can be the way, but if the models are large enough you will pollute the Git history significantly. On some CI systems it can become even slower, since they will fetch this with a regular git clone - effectively downloading the models anyway.
Btw, whether you use DVC or not, take a look at another open-source project that is made specifically for CI/CD in ML - CML.
We are moving our build and testing system to Jenkins, and are looking for an easy way (where we don't have to code all the logic ourselves) to manage the build artifacts.
Basically we need an organised way to store them, ordered by the type of the build, the user who built it and so on. For example:
johnd/nightly/r543241/win32/program.zip
johnd/nightly/trunk/lin64/program.tar.gz
master/release/2.1/win32/program.zip
This way we can upload when a build is complete, and retrieve the required artifact in the testing stage with ease.
Until now we simply stored the files in directories on NFS, but recently started considering an artifact manager. I've looked at Artifactory, Archiva and Nexus, but they all seem very Java-centered, or at least require Maven to work with. Since I don't want to introduce more complexity (we mainly work with Python, and SCons is our build tool) and I don't want to bring Maven into the mix, I'm looking for something with an easy command line (or better, a REST/Python interface) to upload, download and manage artifacts.
If you don't use an artifact manager but use some other clever method to manage your C++ artifacts for release/test needs, I'd be happy to hear about that too.
I have exactly the same issue. I didn't succeed in integrating Artifactory easily, so I dropped it.
For the moment I use the Copy Artifact plugin to copy the file by scp to an NFS share. But this is not a good solution, since shares may be mounted differently on two machines, and Windows cannot access them directly.
But a real artifact manager would be such a good thing for our project.
In my previous job, I was using a documentation CMS (LiveLink) which provided good services:
- self-organising folders
- persistent URLs (you can move and rename folders and files, and a public URL to a given file never changes; that was SOOO great: "http://server/go/123456789123456789" always pointed to the same file, so you just had to give this URL to wget to get the file)
- file versioning
- metadata and advanced search
I'm still searching for such a solution in open-source software and haven't found it.
If you just need to share artifacts between two jobs, simply use the "Copy Artifact" plugin. I use it to split my job into a build job and a test job (actually there are 3 test jobs: a very fast one run after each commit, a slower one run every day, and a very slow one run each weekend).
I would love to hear ideas on how to best move code from a development server to a production server.
A list of gotchas and don't-do-this items would be helpful.
Any tools to help automate steps such as:
Make backups of existing code, given a list of files
Record the deployment of these files from dev to production
Allow easier rollback if the deployment or app fails in any way...
I have never worked at a company that had a deployment process other than a very manual one: FTP files from dev to production.
What have you done in your companies, departments, etc?
Thank you...
Yes, I am a ColdFusion programmer, but files are files, and this should be a language-agnostic question.
OK, I'll bite. There's the technology aspect of this problem, which other answers have already covered. But the real issue is a process problem, and the real focus should be on ensuring a meaningful software development life cycle (SDLC): planning, development, validation, and deployment. I'll cover each in turn. What you want is a repeatable activity at each phase.
Planning
Articulate and record what's to be delivered. Often tickets or user stories are enough. Sometimes you do more, like a written requirements document that a customer signs off on and that's translated into various artifacts such as written use cases. Ultimately, what you want is something recorded in an electronic system where you can associate changes to code with it. Which leads me to...
Development
Remember that electronic system? Good. Now when you make changes to code (you're committing to source control, right?) you associate those changes with something in that electronic system - typically tickets. I like Trac, but have also heard good things about Atlassian's suite. This gives you traceability, so you can assert what's been done and how. Then you can use this system and source control to create a build - all the bits needed for whatever's changed - and tag that build in source control; that's your list of what's changed. Even better, have a build contain everything, so that it's a standalone entity that can easily be deployed on its own. The build is then delivered for...
Validation
Perhaps the most important step, and one that many shops ignore - at their own peril. Defects found in production are exponentially more expensive to fix than those discovered earlier in the process. And validation is often the only step where that discovery happens in many shops - so make sure yours does it.
This should not be done by the programmer! That's like the fox watching the hen house. And whoever is doing it should be following some sort of plan. We use TestLink. This means each build is validated the same way, so you can identify regression bugs. And this build should be deployed in the same way as it will be into production.
If all goes well (we usually need a minimum of 3 builds) the build is validated. And this goes to...
Deployment
This should be a non-event, because you're taking a validated build and following the same steps as you did in testing. It might first hit a staging server, where there's an automated copying process, but the point is that it shouldn't be an issue at this stage, because you validated with the same process.
Conclusion
In terms of knowing what's where, what you really want is a logical way to group changes together. This is where the idea of a build comes in. It's really the unit that should segue between steps in the SDLC. If you already have that, then the ability to understand the state of a given system becomes trivial.
Check out Ant or Maven - these are build and deployment tools used in the Java world which can help you copy / ftp files, backup and even check out code from SVN.
You can automate your deployment steps using these tools. For example, Ant will allow you to declare a set of tasks as part of your deployment, so you could:
Check out a revision using SVNAnt or similar to a directory
Copy (and perhaps zip first) these files to a backup directory
FTP all the files to your web server(s)
Create a report to email to the team illustrating the deployment
Really, you can do almost anything you are willing to put time into using Ant. Maven is a little more structured (and newer), and you can see a discussion of the differences here.
Hope that helps!
In a nutshell...
You should start with some source control solution - probably Subversion or Git. Once that's in place you can create a script that generates a clean build of your source code and deploys it to your production server(s).
You could do this with a simple batch script or use something like Ant for more control. Here is a simple example of a batch file using Subversion:
svn copy svn://path/to/your/project/trunk -r HEAD svn://path/to/your/project/tags/%version% -m "Tagging %version%"
svn checkout svn://path/to/your/project/trunk -r HEAD //path/to/target/directory
Ant makes it easy to do things like automatically run unit tests and sync directories. For example:
<sync todir="//path/to/target/directory" includeEmptyDirs="true" overwrite="true">
<fileset dir="${basedir}">
<exclude name="**/*.svn"/>
<exclude name="**/test/"/>
</fileset>
</sync>
This is really just a starting point. A next step might be a continuous integration solution like Hudson. I would also recommend reading "Pragmatic Project Automation: How to Build, Deploy, and Monitor Java Applications".
One ColdFusion specific gotcha is to make sure you clear the Application scope when required (to update any singleton components). A common approach here is to use a URL parameter that causes onRequestStart() to call onApplicationStart(). You may also have to clear the trusted cache.
We use a system called AnthillPro: http://www.anthillpro.com
It's commercial software, but it allows us to completely automate our deployment process across multiple servers and operating systems (we currently use it for both ColdFusion and Java, but it can be used for most languages). It has a ton of 3rd-party integrations:
http://www.anthillpro.com/html/products/anthillpro/tool-integrations.html
I'm new to Git and don't know yet how much it will fit my needs, but it looks impressive.
I have a single webapp that I use for different customers (Django + JavaScript).
I plan to use Git to handle these different customer versions as branches. Each customer can have custom files, folders and settings, improved versions... but they should share the same 'core'. We are a small team and subscribed to a GitHub account.
Is a branch the right way to handle this case?
About the settings file, how would you proceed? Would you .gitignore the customer-specific settings file and add, for example, a settings.xml.sample file to the repo?
Also, is there any way to prevent some files from being merged into master (but still committed to the customer branch)? For example, I'd like to save some customer data to the customer branch but prevent it from being committed to master.
Is the .gitignore file branch-specific ? YES
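A small illustration of that, with a hypothetical branch and settings file name: because .gitignore is just a tracked file, each branch can carry its own version of it.

git checkout customer_a
echo "settings_customer_a.py" >> .gitignore
git add .gitignore
git commit -m "Ignore customer A settings on this branch only"

master keeps its own .gitignore, so the rule only applies while the customer branch is checked out.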
EDIT
After reading all your answers (thanks!) I decided to first refactor my Django project structure to isolate the core and my different apps in an apps subfolder. Doing this makes for a cleaner project, and tweaking the .gitignore file makes it easy to use git branches to manage the different customers and settings!
Ju.
In addition to cpharmston's answer, it sounds like you need to do some refactoring to separate out what is truly custom for each client and what isn't. Then you may consider adding additional repositories to track the customizations for each client (entirely new repos, not branches). Then your deployment can pull your "core" from your main repo, and the client-specific stuff from that repo.
I would not use branches to accomplish what you are trying to do.
In source control, branches are intended to be used for things that are meant to be merged back into trunk. For example, Alex Gaynor spent his summer of code working on a branch of Django that allows for support for multiple databases, with the goal of eventually merging it back into the Django trunk.
Checkouts (or clones, in Git's case) might better suit what you are trying to do. You would create a repo containing all of the project's base files (and .sample files, if you will), and clone the repo to all the various locations where you wish to deploy the code. Then manually create the configuration and customization files at each deployment (take care not to add them to the repo). Whenever you update the code in the repo, run a pull on each deployment to update the code. Voila!
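A rough sketch of that workflow, with made-up paths and file names:

# initial deployment for one customer: clone the shared code
git clone git@github.com:yourteam/webapp.git /srv/customer_a
cd /srv/customer_a
cp settings.py.sample settings.py   # written by hand per deployment, never committed

# later: pull the new core code; the untracked local settings stay as they are
cd /srv/customer_a
git pull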
Other answers are correct that you'll be in the best shape for maintenance to the extent that you separate out your core code from the custom per-client code. However, I'll break from the crowd and say that if you're unable to do that (say because you need to add extra functionality to core code for a certain client), DVCS branches would work just fine for what you want to do. Though I'd probably recommend per-directory branches rather than in-repo branches for this purpose (git can do per-directory branches as well, it's nothing but a cloned repo that diverges).
I use hg, not git, but all of my Django projects are cloned from the same base "project template" repo that has utility scripts, a basic common set of INSTALLED_APPS, etc. This means when I make changes to that project template, I can easily merge those common updates into existing projects. This isn't exactly the same as what you're planning, but is similar. You will occasionally have to deal with merge conflicts, if you modify the same area of code in the core that you've already customized for a specific client.
Matthew Talbert is correct, you really need to separate the custom stuff from non-custom stuff. If you can refactor all the core code to be contained in one directory, your clients can use it as a read-only git submodule. The added benefit is that you lock them into an explicit version of the core code. This means that they would have to consciously update to a newer revision, which is what you want for production code.
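A minimal sketch of that layout, with placeholder URLs and tag names: the client repo records an exact commit of the core, and moving to a newer one is an explicit commit.

# in a client project: add the shared core as a submodule
git submodule add git@github.com:yourteam/webapp-core.git core
git commit -m "Pin core as a submodule"

# fresh clones of the client project need the submodule initialised
git submodule update --init

# deliberately move this client to a newer core release
cd core && git fetch && git checkout v1.4.0 && cd ..
git add core
git commit -m "Bump core to v1.4.0"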
After reading all your answers (thanks!) I decided to first refactor my Django project structure to isolate the core and my different apps in an apps subfolder. Doing this makes for a cleaner project, and tweaking the .gitignore in the different branches makes it easy to use git branches to manage the different customers and settings!
OK, so I have a Django project. I was wondering if I'm supposed to put each app in its own git repository, or is it better to just put the whole project into a git repository, or whether I should have a git repo for each app and also a git repo for the project?
Thanks.
It really depends on whether those reusable apps are actually going to be reused outside of the project or not.
The only reason you might need an app in a different repo is if other projects with separate repos might need to use it. But you can always split it out later. Making a git repo is cheap, so it's one of those things you can do later if it becomes necessary. Making things complicated up front is just going to frustrate you later, so feel free to wait till you know it's necessary.
YAGNI: it's for more than just code.
I like Tom's and Alex's answers, except that they lack the real rationale behind them ("easier to have one repository per development team"? "substantial numbers of people might be interested in pulling out (or watching changes to) separate apps"? Why?).
First of all, one or several repositories is a server-side administration decision (in terms of server resources).
SVN easily sets up several repositories behind one server, whereas ClearCase will have its own "vob_server" process per VOB, meaning you do not want more than a hundred VOBs per (Unix) server (or more than 20-30 on a Windows server).
Git is particular: setting up a repository is cheap, and accessing it can be easy (through a simple shared path) or can involve a process (a git daemon). The latter means: not too many repositories accessed directly from the outside (they can be accessed indirectly through submodules referenced by a super-project).
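To illustrate the two access modes (hostnames and paths are made up):

# "a simple shared path": any reachable filesystem path works as a remote
git clone /mnt/shared/repos/myproject.git

# "a process": export repositories read-only through the git daemon
git daemon --base-path=/srv/git --export-all --reuseaddr &
git clone git://buildserver/myproject.git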
Then there is the client-side administration: how complex will the configuration of a workspace be when one or several repositories are involved? How can the client reference the right configuration (the list of labels needed to reference the correct files)?
SVN would use externals, Git submodules, and in both cases that adds complexity.
That is why Tom's answer can be correct.
Finally, there is the configuration management aspect (referenced in Alex's answer). Can you tag part of a repo or not?
If you can (as in SVN, where you actually make an svn copy of part of a repo), that means you can have a component approach, where several groups of files have their own life cycle and their own tags (set at their own individual pace).
But in Git that is not possible: a tag references a commit, which always concerns the whole repository.
That means a "system-based" approach, where you only want the whole project anyway (as opposed to the "watching separate apps - I've never observed it in real life" bit from Alex's answer). If that is the case (if you want the whole system anyway), this limitation is not important.
But for those of us who think in terms of "independent groups of files", it means a Git repository actually represents an individual group of files (with its own rhythm in terms of evolution and tagging), with potentially a super-project referencing those repositories as submodules.
That is not your everyday setup, so for independent projects I would recommend only a few Git repos, or just one.
But for a more complex, interdependent set of projects... you need to realize that, by the very way Git has been conceived, a "repository" represents a coherent set of files supposed to evolve at the same pace as a whole. And not everything can always fit in one set of files (if that "everything" is complex enough). Hence, several repositories would be required in this case.
And a complex set of inter-dependent projects does happen in real life too ;)
It is almost always easier to have one repository per development team, no matter how many products you have. Eventually you will want to share code between projects (even several separate Django websites!) and that is much easier to do with only one repository.
By all means set up a nice folder structure WITHIN that repository. However, a single checkout from git (probably of a subfolder) should give you all the files you need for your test website (Python, templates, CSS, JPEGs...), and then you just copy httpd.conf and so on to complete the installation.
I'm no git expert, but I don't believe the appropriate strategy would be any different from one for hg, or even, in this case, svn, so I'm going to give my 2 cents' worth anyway;-).
A repo per app only makes sense if substantial numbers of people might be interested in pulling out (or watching changes to) separate apps; this could theoretically be the case, but I've never observed it in real life - mostly, people who are interested in the project are interested in it as a whole. Therefore, I would avoid complications and make the whole project one repo.
Do those individual apps utilize shared code and libraries? If so, they really belong in the same repository, which enables you to see the impact of new changes when running a single test suite.
If they are completely separate and agnostic, it really doesn't matter. Just for sanity, ease of building and ease of packaging, I prefer to keep a whole project in a combined repository. However, we typically work in very small teams. So, if no shared code is involved, it's really just a question of which method is most convenient and efficient for all concerned.
If you have reusable apps, put them in a separate repo.
If you are worried about the number of private repositories, have a look at Bitbucket (if you decide to make them open source, I advise GitHub).
There is a nice way of including your own apps in the project, even making use of version tags:
I recently found a way to do this with buildout and git tags
(I used svn:externals to include an app, but switched from svn to git recently).
I tried mr.developer first, but since I couldn't get it to work I found an alternative to mr.developer:
I found gp.vcsdevelop very easy to use for this purpose.
see https://pypi.python.org/pypi/gp.vcsdevelop
I ended up putting this in my buildout file and got it working at once (I had to add a pip requirements.txt to my app to get it working, but that's a good thing after all):
vcs-update = True
extensions =
gp.vcsdevelop
buildout-versions
develop-dir=./local_checkouts
vcs-extend-develop=git+git@bitbucket.org:<my bitbucket username>/<the app I want to include>.git#0.1.38#egg=<the appname in django>
develop = .
in this case it checks out my app, cloned at version tag 0.1.38, into the project in the subdir ./local_checkouts/
and develops it when running bin/buildout
Edit: remark 26 august 2013
While using this solution and editing in this local checkout of the app used in a project, I found out this:
you will get this warning when trying the normal 'git push origin master' command:
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again. See the 'Note about
fast-forwards' section of 'git push --help' for details.
EDIT 28 august 2013
About working in the local_checkouts for shared apps included by gp.vcsdevelop:
(and how to handle the warning discussed in the remark above)
git push origin +master seems to screw up commit history for the shared code
So the way to work in the local_checkouts dir is like this:
go into the local checkout (after a bin/buildout, so the checkout is at master):
cd local_checkouts/<shared appname>
Create a new branch and switch to it like this:
use git checkout -b issue_nr1 (e.g. the branch is named here after the issue you are working on)
and after you are done working in this branch, use (after the usual git add and git commit):
git push origin issue_nr1
when tested and completed, merge the branch back into master:
first check out master:
git checkout master
update (probably only needed when there have been other commits in the meantime):
git pull
and merge it into master (which you are on right now):
git merge issue_nr1
and finally push this merge:
git push origin master
(with special thanks to this simplified guide to git: http://rogerdudler.github.io/git-guide/)
and after a while, to clean up the branches, you might want to delete this branch:
git branch -d issue_nr1