Getting more advanced at Stata

One of my goals this summer is to get better at Stata programming. I’m mostly self-taught, as are most economists. Over the years I’ve worked with people who have different styles, and I’ve definitely noticed that some coding habits are helpful and some are not so helpful. I’ve also been picking up tricks that I should have learned years ago.

My goals are twofold: I want to be more organized, and I want to be more efficient with my programming.

Two books I recommend:

1. A Gentle Introduction to Stata. This is actually a great intro-to-programming book that mostly covers the basics. (#2 likes it too, and has a copy from the library.) Even so, I’ve managed to pick up some good tricks from it (numlabel _all, add FTW).
2. The Workflow of Data Analysis Using Stata. This is a really great book for thinking about how to organize, comment, label, etc. etc. etc. your, um, Stata workflow. A lot of it I already knew but hadn’t been acting on, and it puts everything together in a way that I hope to be able to act on.
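For anyone who hasn’t used it, numlabel prefixes each value label with its numeric code, so a tabulation shows both the code and the text. Here is a minimal sketch using Stata’s bundled auto dataset (my choice of example data, not the book’s), where foreign is labeled 0 “Domestic” and 1 “Foreign”:

```stata
* Load Stata's built-in example data (example choice, not from the book)
sysuse auto, clear

* Before: rows show only the label text, "Domestic" and "Foreign"
tabulate foreign

* Prefix every value label in memory with its numeric code (default mask "#. ")
numlabel _all, add

* After: rows now read "0. Domestic" and "1. Foreign"
tabulate foreign

* Strip the prefixes again if you want the plain labels back
numlabel _all, remove
```

Seeing the codes next to the labels makes it much easier to write conditions like if foreign == 1 without looking up the coding.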

So what does this mean specifically? First, I’m being much better about commenting my code, particularly the part at the top that says what the purpose of the .do file is. I’m also getting better at consistent names. My previous system would end up with, for example, multiple “Table 2”s every time the tables in the paper changed (#2 is shocked). Now I use names like Table_2_SOLE, which is the version of Table 2 from when the paper was presented at SOLE, and I give more informative names to things that aren’t official paper tables.
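A hypothetical sketch of both habits together: a comment block at the top stating what the .do file is for, and a local holding the version tag so every output file carries it. The file name, dataset, regression, and output name are placeholders for illustration, not our actual paper code.

```stata
* =============================================================
* table2_sole.do  (hypothetical file name)
* Purpose: estimate the Table 2 regression as presented at SOLE
* Input:   Stata's example auto data (placeholder for real data)
* Output:  Table_2_SOLE.ster (saved estimates)
* =============================================================

* Pin the Stata version so the file behaves the same in later releases
* (version 13 is an assumption; use whatever version you wrote it in)
version 13

* Change the tag here, once, instead of renaming outputs by hand
local tag "SOLE"

sysuse auto, clear
regress price weight length

* The saved estimates file name carries the version tag
estimates save "Table_2_`tag'", replace
```

Switching to an ASSA version then means editing one line and rerunning, and the SOLE output stays on disk under its own name.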

I’m also trying to do a better job of keeping my current files in one folder and moving the older versions out, so I don’t accidentally use an older version after I’ve fixed a mistake. Recently that kind of mix-up caused me some embarrassment when a referee noticed it. I’m getting a bit better about dating files as well. #2 tends to rename files to something like “data analysis project X OLD DO NOT USE” and “revised data set USE THIS ONE”. #2 also uses dates, but I don’t find them as useful as they should be.

In terms of programming itself, I have two big goals. The first is to use loops automatically instead of automatically cutting, pasting, and replacing. I need more practice with them so I don’t have to look up the code each time. (I’m proud of myself for finally figuring out which of ` and ' to use where in the loops!) The second is to use locals more. Again, I tend to cut/paste/replace when I really should write shorter, cleaner code that just changes the local.
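A minimal sketch of the loop-plus-local pattern, using Stata’s bundled auto data; the outcomes and controls are arbitrary choices for illustration. In Stata a local named y is referenced as `y', opening with a backtick and closing with a straight apostrophe, which is the quote-mark puzzle mentioned above.

```stata
* Illustrative example on Stata's built-in data (variable choices are arbitrary)
sysuse auto, clear

* Define the control set once; edit this line instead of every regression
local controls "weight length"

* foreach walks through the outcomes; `y' expands to each name in turn
foreach y of varlist price mpg headroom {
    display as text "Outcome: `y'"
    regress `y' `controls'
}

* forvalues loops over a numeric range the same way
forvalues k = 1/3 {
    display as text "Iteration `k'"
}
```

Adding an outcome means adding one word to the varlist, and changing the controls means editing one local, rather than pasting and hand-editing a new regression line each time.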

It’s a bit embarrassing that I’m only making these changes now, but as always, I remind myself of a lesson I learned in graduate school: later will be even more embarrassing, so, given that before is sunk, now is the best time to figure out something I should have figured out a long time ago. #2 adds: it’s never too late to improve your workflow and versioning. I’m trying to make mine better all the time.

Do you have any self-improvement things going on in your life?

17 Responses to “Getting more advanced at Stata”

  1. bogart Says:

    Well, I’m trying to get rid of stuff and generally neaten up my house. Does that count? Modest progress in a couple of small areas + the front entryway (a larger accomplishment, initiated by hubby but jointly undertaken), so far.

  2. Foscavista Says:

    I’m taking Harvard’s free online CS50 course to learn basic computer programming, to try something relatively unfamiliar to me.

  3. chacha1 Says:

    Just the paralegal certificate course. I’m really looking forward to wrapping that up so I can embark on a fresh self-improvement project entitled Start Dancing Again You Pudgy Couch Potato.

  4. AJ Says:

    I can’t believe that both of you actually version your files by copying and pasting the source code file and changing the date in its name… or moving it to another folder. Why don’t you both check out a version control system (VCS) called Git?

    It’s a little complicated, but it versions all of your files and keeps track of all the changes for you… Your current way of doing this is grossly inefficient and error-prone. I’m a computer programmer, and I’m happy to point you toward some resources you can use to learn Git.

    P.S. Long-time lurker, first-time poster. Isn’t one of your significant others a programmer? He probably already knows how to use Git and can teach you!

    • nicoleandmaggie Says:

      We don’t copy and paste the source code file… that sounds like an unnecessary extra step.

      I will talk to DH about Git, but I imagine it’s either unnecessary for what we’re doing, similar to what we’re doing, or doesn’t play well with our statistical programs; otherwise we would already be doing it. But I will check it out, thanks!

    • nicoleandmaggie Says:

      DH: git will add some effort/complexity, and provide some versioning benefit. If versioning is an issue, then you should check it out and I can help you with it because used correctly it’ll solve those issues. If versioning isn’t really an issue, then it’ll just add effort.

      • AJ Says:

        From your post, it sounds like version-ing is a problem, so I hope it helps you!! =)

      • nicoleandmaggie Says:

        It’s only a problem because, instead of saving a new version under a different name (even something like turning Table 4 into Table 5 and creating a new Table 4) and, I dunno, actually commenting the code, past practice has been to be lazy and not do anything. (Yes, I know, I was trained better, but there’s always this belief that I’ll remember, or that this is only a temporary program and then it turns out not to be, or that I’ll go back later and fill stuff in. Past experience says no: I need to work on having better documentation now, not in the future.)

        Most of our statistical programs are short, one-use items, so doing something like saving a table as Table_4_SOLE, to be distinguished from Table_4_ASSA, would be helpful (these are different conferences). There are also changes like: did we decide to use state-specific time trends or not? We need to keep both programs in case a reviewer wants a different version, so it isn’t going to be the same program, just an alternate version of the same program. I don’t think versioning software helps with that problem, but giving things better names does.

  5. nicoleandmaggie Says:

    #2 is late to the comment party, but I have used Git. I don’t particularly like it, but I have used Git. It’s ok. I don’t need it for what I do; if I end up doing a lot of programming in the future, I’ll get back on GitHub again.

  6. Kellen Says:

    So, I started looking into learning some programming for two reasons:
    1) I keep thinking about going to grad school, and most programs I would want to switch to require me to take some math classes, and suggest that you know “some programming”
    2) I hang out with a bunch of software developers at work now, and one of them encouraged me to try it

    In terms of doing a grad degree in something like economics, where I assume the programming is used for data crunching (as opposed to a degree where you literally have to create user-friendly computer programs), how familiar with a programming language should one really be? I also noticed that some schools list specific languages while others just say programming in general. I’m guessing these programs really just want you to be able to pick up something like Matlab quickly, so should you just learn Matlab instead of, say, getting really familiar with Python?

    • nicoleandmaggie Says:

      What they really want is Stata and SAS. I got points off on my homework for doing the math in Matlab, which was irritating. Personally, I have found that being able to do real programming has given me a leg up in my area of Econ, as it has a couple of my coauthors, because a lot of people don’t realize what is possible, so there is new ground to tread. A lot depends on what part of Econ you go into. Some areas are more simulation-intensive or require programming to run an experiment. Some just need data crunching.

      • Kellen Says:

        Hmmm… I guess the second question is how to get familiar with these programs before starting a degree, since they’re not free. It’s easier to learn, say, Python for free. But maybe there are open-source equivalents… Will check.

      • nicoleandmaggie Says:

        R is what you want of the free programs.

