How to include data files with ado packages in Stata

How to include data files with ado packages in Stata - stata

If I publish an .ado package for Stata, I can include .dta files among the installation files in the .pkg description file, using a line like:
f amazingdata.dta
However, it is not clear how, other than by navigating to an out-of-the-way directory for user-selected add-on packages how users can load these data. For example, is there a way to make data files for .ado files available with commands like:
. sysuse amazingdata

You have found the documentation for package files, so this seems to be a question in the first instance about whether sysuse marches with installations of user-written packages, and the short answer is an emphatic No.
sysuse is intended as a quick-and-easy way for Stata users to access datasets made available by StataCorp to support official commands. I have not tried it, but my guess is that sysuse would work with any dataset so long as that dataset was placed in the directories searched by Stata.
However, I would argue that would be very poor style. When Stata programmers publish a package centred on programs and their help files, they often make test datasets available, but the best standard is to mark such files as ancillary and let users download them to a location of their own choice using net get. As said, this is a choice.
The argument can be strengthened. It's best practice to keep StataCorp's own files and other files strictly segregated. That way, updates and upgrades, copying files to other machines, etc. all are much less likely to get confused or tangled given some clash of names. Most likely, you might install or reinstall Stata and "forget" that you had user-written stuff mixed in with StataCorp's own files and waste time trying to find it.
In any case, for sysuse to work like this, users would have to install files manually in the place(s) searched by Stata, as Stata's download commmands would not do that automatically.
As for "out-of-the-way", this is not for you to decide. Many users have very strict personal or workplace rules that each project requires quite different directories or folders, so placing files only where they can be found quite deliberately is, in their own terms, a very good idea. Otherwise put, the net get mechanism implies that users decide, carefully or not, where files are to go. Users also have the scope to manipulate their adopath should they wish to supplement Stata's rules on where it searches.

Turns out that I misunderstood the installation process, and that the functionality I desire is in Stata. I had assumed that the data files were automatically installed, but see now that they are considered ancillary, and require additional explicit installation. When this is done, the user gains access by simply typing, for example, use amazingdata. Done. Simple!

Related

Is there a way to apply SAS EG processes to new files?

I'm taking over a project from a coworker that involves several extensive SAS process flows. I have all the files with all the same names and a copy of the process flows they used. Since the file paths in their processes are direct references to their computer, normally I would just re-import the files with the same output names and run the process from there. In a few cases I would have to recreate a query builder as I'm using a few .sas7bdat files from another project.
However, there are quite a few files involved and I may end up having to pass this to another coworker in a few months, and since I can't get a good look at exactly what the import task is doing I'm concerned I may have some of the variables imported incorrectly. Is there an easy way to just change the file path the import or other task refers to?

Given the updates in comments, there's two possibilities I see.
If the paths you're changing are, or can be, relative to the location of the EGP, then you can right click on the Project->Properties->File References and check "Use paths relative to the project...", which means instead of storing a file in c:\my EGP folder\my code folder\code.sas it would store it as my code folder\code.sas. So then if the whole project moves to another computer (or just any other folder) then it automatically has the right path. This is mostly useful for code or similar things.
Otherwise, you're going to have to convert things to SAS code modules. There you can use macro variables to define the locations of things.

What's wrong with this user ignore file for Mercurial?

A little retrospective now that I've settled into Mercurial. Forget forget files combined with hg remove. It's crazy and bass-ackwards. You can use hg remove once you've established that something is in a forget file that isn't forgetting because the item in question was tracked before the original repo was created. Note that hg remove effectively clears tracked status but it also schedules the file for deletion in anything that gets changes from your repo. If ignored, however the tracking deactivation still happens but that delete-me change set won't ever reach another repo and for some reason will never delete in yours which IMO is counter-intuitive. It is a very sure sign that somebody and I don't know these guys, is UNWILLING TO COMPROMISE ON DUH DESIGN PROBLEMS. The important thing to understand is that you don't determine what's important, Mercurial does. Except when you're merging on a pull of course. It's entirely reasonable then. But I digress...
Ignore-file/remove is a good combo for already-tracked but very specific files you want forgotten but if you're dealing with a larger quantity of built files determined with broader patterns it's not worth the risk. Just go with a double-repo and pull -u from the remote repo to your syncing repo and then pull -u commits from your working repo and merge in a repo whose sole purpose is to merge changes and pass them on in a place where your not-quite tracked or untracked files (behavior is different when pulling rather than pushing of course because hey, why be consistent?) won't cause frustration. Trust me. The idea that you should have to have two repos just to get 'er done offends for good reason AND THAT SO MANY OF US ARE DOING IT should suggest a serioush !##$ing design problem, but it's much less painful than all the other awful things that will make you regret seeking a sensible alternative.
And use hg help. It's actually Mercurial's best feature and often better than the internet (which I don't fault for confusion on the matter of all things hg) for getting answers to everything that is confusing and counter-intuitive in this VCS.
/retrospective
# switch to regexp syntax.
syntax: regexp
#Config Files
#.Net
^somecompany\.Net[\\/]MasterSolution[\\/]SomeSolution[\\/]SomeApp[\\/]app\.config
^somecompany\.Net[\\/]MasterSolution[\\/]SomeSolution[\\/]SomeApp_test[\\/]App\.config
#and more of the same following
And in my mercurial.ini at the root of my user directory
[ui]
username = ereppen
merge = bcomp
ignore = C:\<path to user ignore file>\.hgignore-config
Context:
I wrote an auto-config utility in node. I just want changes to the files it changes to get ignored. We have two teams and both aren't on the same page with making this a universal thing so it needs to be user-specific for now.
The config file is in place and pointed at by my ini file. I clone. I run the config utility and change the files and stat reveals a list of every single file with an M next to it. I thought it was the utf-8 thing and explicitly set the file to utf-16 little endian. I don't think I'm doing with the regEx that any modern flavor of regEx worth actually calling regEx wouldn't support.

The .hgignore file has no effect for files that are tracked. Its function is to stop you from seeing files you want ignored listed as "untracked". If you're seeing "M" then they're already added (you got them with the clone) so .hgignore does nothing.
The usual way config files that differ from machine to machine are handled is to put a app.config.sample in source control, have app.config in .hgignore and have people do a copy when they're making their config edits.
Alternately if your config files allow for includes and overrides you end them with include app-local.config and override any settings in a app-local.config which you don't add and do include in .hgignore.

How to organize sources of complex program?

We're creating very complex embedded system and «sources» contains few projects of Visual C++, IAR, Code Composer Studio and Altium Designer schemes and pcbs. All of that possibly could be in few versions.
So, what practice could you advice me to arrange all that stuff?
Thank you

I have the same setup as you.
I use Altium Designer for the hardware schematics and PCB design. But I also have Firmware source files and related utilities. And I have mechanical design files.
Here's how I do it:
Project Name
Firmware
MainCpu
trunk
tags
branches
IoCpu
trunk
tags
branches
Hardware
MainPcb
trunk
tags
branches
IoPcb
trunk
tags
branches
PowerPcb
trunk
tags
branches
Mechanical
Chassis
trunk
tags
branches
Other
trunk
tags
branches
This way all the project files are stored together in the SVN repository. The only down side I've found is that you can't just check out the Project and get the latest FW/HW/MEK files. You have to check out each Head of FW/HW/MEK.
The reason for the separate sub-modules for FW/HW/MEK is that they will get separate version tags.

Everything that you consider as sources should be under a Source Control System, like SVN. This is the best way to handle versions, revisions, branches and tags. SVN can handle binary files, so you won't have problems with non-text files.

If your C++ source files are numerous and span multiple directories then the effort put into grokking Large Scale C++ Software Design by John Lakos may be very worth it. The main theme of the book is how your physical layout of the software, that is, the arrangement of source code files in directories, limit or extend your ability to modify the software.

I like to have a directory structure that at the top level reflects each of the programmable parts.(i.e. microcontroller, DSP1, FPGA1, FPGA2,...)
I also like to have a subdirectory(ies) that has all the generated files, so it is easy to make a clean source tree. Also make it easy to do a clean build straight from the source code configuration tool. (i.e. get and build from source to binary image(s) in as few steps as possible)
Also have each programmable part have it's own version number, and one version number that reflects each of the combination of the sub component version numbers.

Definitely use source control, if the program itself doesn't support it, just keep the parent folder you use under source control. SVN is my current fav.
As far as how to arrange your files, I noticed you had Altium Designer on your list, that program will a) play nice with source control, and b) arrange your files in an orderly manner, assuming you use their whole 'project' file structure. Look into using their 'PCB' (if that's what your doing) or 'embedded' projects, when you create one, it creates buckets for you to store all your different types of files into.
Even if you don't want to actually use Altium for your files, create a project and look at their directory structure to get an idea about all the files you'll need to keep track of.

(Aside from trivial helper classes) put one class in each cpp/h file, and name the cpp/h files the same as the class.
Group related classes files into folders (you can optionally use a hierarchy of namespaces that match the folder structure. The .net approach here is to use a CompanyName.ProductName namespace, with your files stored in a ProductName project/subfolder of your solution). So for example, you might group your Math, I/O, and Drawing classes into separate "subsystem" folders.
Ideally, make these separate sections into re-usable libraries (MyCompany.Math). You'll be glad of this later when you want to develop a new product that will share some of the code. In that case, the top level "folders" become separate projects in their own right, and you can start to work on minimising dependences between them to realise and then enforce a much better overall framework design in your code base.
The ideal within folders is to find a good balance between clutter and sparseness - try to balance the folders so that they have between 5-15 files in each. If fewer, consider merging the folders; if greater, consider adding sub-category folders to break down the complexity.
As long as your classes/files and namespaces/folders have good descriptive names, and your folders are logically structured, you can make an extremely large project very easy to navigate.
At the risk of starting a religious war, I prefer to put the headers and their source files in the same folder so that when you are editing a .cpp the .h is easily accessible rather than having to move up and fown by a folder all the time.

Reduce the complexity!
My first engineering professor had a famous first lecture. It consisted of a single equation written on the blackboard:
Perfection = Simplicity
The problem with Source Control Systems is that they manage complexity but also promote it.

C++ Header files - put them in one directory or merged in a tree structure?

I have a substantial body of source code (OOFILE) which I'm finally putting up on Sourceforge. I need to decide if I should go with a monolithic include directory or keep the header files with the source tree.
I want to make this decision before pushing to the svn repo on SourceForge. I expect a lot of people who use it after that move will keep a working copy checked out directly from SF so won't want to change their structure.
The full source tree has about 262 files in 25 folders. There are a lot more classes than that suggests as due to conforming to 8.3 character names (yes it dates back to Win3.1) many classes are in one file. As I used to develop with ObjectMaster, that never bothered me but I will be splitting it up to conform to more recent trends to minimise the number of classes per file. From a quick skim of the class list, there are about 600 classes.
OOFILE is a cross-platform product expected to be built on Mac, Windows and assorted Unix platforms. As it started life on Mac, with compilers that point to include trees rather than flat include dirs, headers were kept with the source.
Later, mainly to keep some Visual Studio users happy, a build was reorganised with a single include directory. I'm trying to choose between those models.
The entire OOFILE product covers quite a few domains:
database front-end
range of database backends
simple 2D graphing engine for Mac and Windows
simple character-mode report-writer for trivial html and text listing
very rich banding report-writer with Mac and Windows Preview and Printing and cross-platform generation of text, RTF, HTML and XML reports
forms integration engine for easy CRUD forms binding to the database, with implementations on PowerPlant and MFC
cross-platform utility classes
file and directory manipulation
strings
arrays
XML and tag generation
Many people only want to use it on a single platform and some of those code areas are pure legacy (eg: PowerPlant UI framework on classic Mac). It therefore seems people would appreciate not having headers from those unwanted areas dumped in their monolithic include directory.
I started thinking about having an include directory split up into a few of the domains above and then realised that was sounding more like the original structure.
In summary, the choices seem to be:
Keep original model, all headers adjacent to source - max flexibility at cost of some complex includes in projects.
one include directory with everything inside
split includes by domain, so there may be about 6 directories for someone using the lot but a pure database user would probably have a single directory.
From a Unix build aspect, the recommended structure has been 2. My situation is complicated by needing to keep Visual Studio and XCode users happy (sniff, CodeWarrior, how I doth miss thee!).
Edit - the chosen solution:
I went with four subdirectories in include. I started trying to divide them up further by platform but it just got very noisy very quickly.

Personally I would go with 2, or 3 if really pushed.
But whichever you choose, please make it crystal clear in the build instructions how to set up the include paths. Nothing dooms an open source project more than it being really difficult to build - developers want a quick out-of-the-box experience and if it involves faffing around with many undocumented environment variables (or whatever) most will simply go away.

C++ Directory Restructuring

I have a source code of about 500 files in about 10 directories. I need to refactor the directory structure - this includes changing the directory hierarchy or renaming some directories.
I am using svn version control. There are two ways to refactor: one preserving svn history (using svn move command) and the other without preserving. I think refactoring preserving svn history is a lot easier using eclipse CDT and SVN plugin (visual studio does not fit at all for directory restructuring).
But right now since the code is not released, we have the option to not preserve history.
Still there remains the task of changing the include directives of header files wherever they are included. I am thinking of writing a small script using python - receives a map from current filename to new filename, and makes the rename wherever needed (using something like sed). Has anyone done this kind of directory refactoring? Do you know of good related tools?

If you're having to rewrite the #includes to do this, you did it wrong. Change all your #includes to use a very simple directory structure, at mot two levels deep and only using a second level to organize around architecture or OS dependencies (like sys/types.h).
Then change your make files to use -I include paths.
Voila. You'll never have to hack the code again for this, and compiles will blow up instantly if something goes wrong.
As far as the history part, I personally find it easier to make a clean start when doing this sort of thing; archive the old one, make a new repository v2, go from there. The counterargument is when there is a whole lot of history of changes, or lots of open issues against the existing code.
Oh, and you do have good tests, and you're not doing this with a release coming right up, right?

I would preserve the history, even if it takes a small amount of extra time. There's a lot of value in being able to read through commit logs and understand why function X is written in a weird way, or that this really is an off-by-one error because it was written by Oliver, who always gets that wrong.
The argument against preserving the history can be made for the following users:
your code might have embarrassing things, like profanity and fighting among developers
you don't care about the commit history of your code, because it's not going to change or be maintained in the future
I did some directory refactoring like this last year on our code base. If your code is reasonable structured at the beginning, you can do about 75-90% of the work using scripts written in your language of choice (I used Perl). In my case, we were moving from set of files all in one big directory, to a series of nested directories depending on namespaces. So, a file that declared the class protocols::serialization::SerializerBase was located in src/protocols/serialization/SerializerBase. The mapping from the old name to the new name was trivial, so that doing a find and replace on #includes in every source file in the tree was trivial, although it was a big change. There were a couple of weird edge cases that we had to fix by hand, but that seemed a lot better than either having to do everything by hand or having to write our own C++ parser.

Hacking up a shell script to do the svn moves is trivial. In tcsh it's foreach F ( $FILES ) ... end to adjust a set of files. Perl & Python offer better utility.
It really is worth saving the history. Especially when trying to track down some exotic bug. Those who do not learn from history are doomed to repeat it, or some such junk...
As for altering all the files... There was a similar question just the other day over at:
https://stackoverflow.com/questions/573430/
c-include-header-path-change-windows-to-linux/573531#573531

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js