Ryan Kulla: August 2011

Saturday, August 20, 2011

Knowing What to Unit Test

You may have heard things in the TDD world such as "always write a failing test first" and "write the tests you wish you had". Good advice but not very specific.

The main thing that seems obvious to test is the public API of what your class does. If your class extracts email addresses from strings and files then it might seem obvious to have tests like Should_extract_email_addresses_from_string() and Should_extract_email_addresses_from_file().

That is all well and good, but if you follow the BDD principle "test behavior, not methods" then you won't be testing at a mere 1:1 ratio between your test methods and your MUTs (Methods Under Test). You often need more than one test per MUT because you need to test different things about it. For example, just because you have a method called blockFollower() and the corresponding test method Should_block_follower(), it doesn't mean you shouldn't have additional test methods like Should_not_block_all_followers(), Should_not_block_who_you_are_following(), and so on.

Another way to figure out what to test is to think not only of the positive tests but also the negative tests. If your method should throw an exception when you give it certain data (such as no data) then have a test like: Should_throw_exception_if_input_is_null().

Take this a step further and think of all the behavior your code should exhibit. Suppose your process should continue to run even if there's an error. You can make a test for that too: Should_continue_running_even_if_there_is_an_error().

Whenever you're refactoring, you should also be unit testing. If the legacy code you're refactoring doesn't have unit tests, be sure to avoid regression by adding test coverage around it before you try to change it.

There's no limit to what you can test. Even if your class has a method that sends mail you can still put tests on it. One way to do this is to have a test method that checks if the method was simply called. You don't have to test that it returned something "in real life" if it's not possible. If your method is called sendMail() then you could have a test called Should_send_mail() and at least test that the sendMail() method is successfully called with whatever input you give it; you're testing that the arguments it takes work properly, etc.

Look again at your MUTs whenever you feel you have all the tests you need. Are they really designed to only do one thing each? When the answer is no, break them down into smaller pieces and put tests around those pieces, or at least around the groups of pieces if they all constitute a single behavior.

Perhaps you don't even have methods because no one bothered to put the code in a class or even procedural functions to begin with. I see this all the time, especially with PHP code. If that's the case it's time to start extracting as many classes and methods as you can and put those under test. It helps to think about the structure of the code. Look for code smells like globals and lack of Dependency Injection. Aim for functional decomposition.

Languages that aren't strictly typed, such as PHP, also present you with some new unit test ideas. Type Hinting in PHP 5 will only go as far as arrays and objects, but with a little creativity you can create tests that enforce that a variable has to be a string or an integer, all without the need to write extra code in your application to enforce it:

  /** @test */
  public function should_only_allow_strings_for_area_codes() {
     // Newer PHPUnit versions replace assertInternalType() with assertIsString()
     $this->assertInternalType('string', $this->obj->getAreaCode());
  }

Think destruction. The more you can get into the evil mindset of purposely trying to break code, the more test ideas you'll come up with. For instance, think of boundary conditions where things could go awry. Try throwing the date "February 29" at your calendar function (account for leap year, though) or a non-ASCII string at your form validator. Create test helper functions that generate random dates, strings, and so on and throw them at your MUTs. Aim to write tests that fail even though you think they should work.

The book Working Effectively with Legacy Code suggests that it's okay to test static methods, as long as they don't have state or nested static methods. A lot of ideas of things to test are lost whenever we hear that something should never be done. As in life, there are almost always exceptions.

Another thing you can test is your ideas and prototypes. TDD is great because, since you're writing your tests first, you can actually design an entire class (or application for that matter) without so much as an internet connection. Hell, you could even design the whole thing on paper, just by thinking of all the test method names. If on Friday at 5:50 p.m. your project manager tells you that first thing Monday morning they want you to create a feature for the admin interface to allow the deletion of users, then by 6:00 p.m. you could have already written the skeleton tests: Should_delete_user(), Should_not_delete_admin(), and so on.

Remember to always watch your tests fail before writing the code to make them pass, because you need to make sure that if it fails the right messages are displayed and so on.

Whenever you get a bug report, start by writing a unit test that exposes the bug before you fix it.

The key isn't to just write more unit tests for the sake of writing more unit tests. The key is quality over quantity. This will also help you avoid the all-or-nothing thinking that prevents some people from ever writing any unit tests. The more meaningful unit tests you have, the more confident you'll feel working with, and using, the entire system.

What about knowing which things to mock? Try not to mock third-party libraries and other things you don't have control over because it will create fragile expectations.

As far as database unit testing goes, avoid using DBUnit style integration tests. It's fine to mock database access in order to satisfy Interface Type Hints and the like. I often stub PDO when working in PHP, for example. I also like to use SQLite because in-memory databases allow the unit tests to be unit tests (and quick unit tests) and it's just plain nice to be able to delete the database between each test without hurting anything.

That's all for now. As I learn more I'll post more entries. Happy Testing.

Wednesday, August 17, 2011

Some Points on Git vs Subversion

Since giving a talk on Git a few months ago at work, and hearing from some people about how they especially liked my comparisons of Git and Subversion, I decided to make a blog post about just that.

Aside from the obvious points about how "Subversion is centralized and Git is distributed", I thought I'd offer some other places where the two differ - particularly in practicalities. In passing conversations, it's very easy to say that Git is better than Subversion without going into much detail as to why, so I'm going to cover quite a lot here.

In Git, you create your repositories. You don't need to ask the system administrator to do it for you on another server (unless it's to set up a central repository). You create the repo with a simple command:

 $ cd proj/
 $ git init

Compare this to Subversion's way:

 $ ssh [svn server]
 $ sudo svnadmin create /path/to/subversion/repos/proj --fs-type fsfs

The Git repository lives in the top level of your project (proj/.git). So, unlike Subversion, you just have one hidden directory for version control. No .svn directories or anything in every single directory of your project.

Git doesn't use the term working copy like Subversion does. Git uses the term working tree. Since there's no separation of the working tree from the repository there's no copy. Make sense? In Subversion, your repository exists over there on another server. In Git, your repository is right here, in the .git directory inside your project's directory on your work station. This means deleting your git repo is as simple as rm -rf proj/.git.

Git also doesn't use separate file systems that you need to worry about creating or updating. With Subversion, if you create a repository with FSFS it will be a specific version of FSFS. If you upgrade svn, you'll also have to upgrade your Subversion file system to get the new features offered by the new version of the svn software. This requires asking your friendly system administrator to run svnadmin upgrade on your repositories. Git doesn't break backward compatibility like that.

"That sounds great and all, but doesn't it mean I will have to manage my own backups for each repository I create - rather than relying on my System Administrator to do it for me?" Yes and no. This is where Central Repositories for Git are useful. Even though Git is distributed, it doesn't mean Git can't make use of central repositories. In fact if you're working on a team, you'll probably want to have a central repository to push your changes to and allow others to then pull them. This also serves as a backup.

Git has three main states that your files can reside in: committed, modified and staged. Committed means that the data is stored in your local database; Modified means that you've changed the file but haven't committed to your database yet; and Staged means you have marked a modified file in its current version to go into your next commit snapshot.

This is conceptually similar to how Subversion works, but Git's staging area is far more powerful than how svn does things. Say you create a couple of files. Just like with svn, before you can commit any new files you need to add them:

 $ git add .

That is what it means in Git terms when someone says to stage your files: you add them to the staging area so Git can be aware of them. Once the files are added, you can commit them:

 $ git commit -m "initial import"

Once a file has been committed, and you make further changes to it, you will have to stage the file again before you commit it. Fortunately, you can just use the -a option to stage every modified tracked file and commit at the same time. Edit the file foo and add a line of text blah to it, then:

 $ git commit -a -m "Committing blah"
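Here's a rough sketch of a file moving through those states, using git status --short to peek at each step. The throwaway repository, the file name and the identity settings are all made up for the example.

```shell
# Throwaway repository; path, file name and identity are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > foo
untracked=$(git status --short)      # "?? foo": Git is not tracking foo yet

git add foo
staged_new=$(git status --short)     # "A  foo": staged for the next commit

git commit -q -m "initial import"
committed=$(git status --short)      # empty: nothing modified or staged

echo "blah" >> foo
modified=$(git status --short)       # " M foo": modified but not staged

# -a stages every modified tracked file, then commits
git commit -q -a -m "Committing blah"
```

Note that -a only picks up files Git already tracks; a brand new file still needs a git add first.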

Another thing that's great about Git is that it gives you a lot more verbose output when you run commands. Subversion doesn't usually tell you very much about what's happening. For example, Git often tells you which commands you can run in order to undo an operation.

Create two new files called spam and eggs. To see what state the unadded files are in, type:

 $ git status

This is akin to svn status, which shows ? next to unadded files. Git is more verbose and will say the files are untracked. So track them:

 $ git add spam eggs

Now typing git status will say they are new files with changes ready to be committed. Instead of reverting files, you can unstage them if you change your mind and don't want to commit them.
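Unstaging could look something like the sketch below. It assumes a scratch repository that already has one commit (git reset HEAD needs a commit to point at), and the identity settings are invented for the example.

```shell
# Scratch repository with a single empty commit; all names are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial commit"

touch spam eggs
git add spam eggs
before=$(git status --short)      # both files are staged as new (A)

git reset -q HEAD -- eggs         # unstage eggs; the file itself is untouched
after=$(git status --short)       # spam is still staged, eggs is untracked again
```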

Git is really good about letting you see diffs and times very easily and flexibly. For example, to see what was last committed, both the messages and the diffs, type:

 $ git show

One of my favorite commands is:

 $ git whatchanged

It will show you a git log with every file that was changed and how it was changed (M, A, D, etc.). You can even do:

 $ git whatchanged --since="2 weeks ago"

Like svn log, you can do git log to see all that was committed thus far. You can take this many steps further with Git. For instance, you can see all the diffs for everything with:

 $ git log --stat -p

To narrow this output down to just all the commits that rkulla made, do:

 $ git log --author=rkulla

This is much nicer than in svn where you end up having to grep the log output. In fact, grep'ing the log output with git can be done with:

 $ git log --grep=foo

There's no need to git log | grep foo, which is nice because piping to grep causes you to lose information because it only shows you the lines that contain the exact match.
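To make the difference concrete, here is a small sketch in a scratch repository. The commit messages and identity are invented; the point is only to compare what the two approaches keep.

```shell
# Scratch repository with two empty commits; messages are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Add foo parser" -m "Longer explanation in the body."
git commit -q --allow-empty -m "Tidy up the README"

# Whole matching commits, including the body line that never mentions foo
full=$(git log --grep=foo)

# Only the one line that contains the match; hash, author and date are gone
piped=$(git log | grep foo)
```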

You can make aliases directly in Git. There's little need to outsource aliases to your command-shell like you need with Subversion. Since there are so many command-line options with Git, I often make custom commands. Take this one:

 $ git config --global alias.lol 'log --pretty=format:"%C(yellow)%h%d%Creset %s - %an [%ar]"'

This lets me type:

 $ git lol

to show the git log like you get with git log --oneline, but it's more verbose and shows things like the author and how long ago things were committed.

You can also apply filters to git log. For example to see all files that were ever deleted from the repository:

 $ git log --name-status --diff-filter=D

The pickaxe can help you find code that's been deleted or moved (or introduced) based on a string. To use the pickaxe pass -S[string] to git log:

 $ git log -Sfoo

That will show the commits that added or removed the string foo (strictly speaking, the commits that changed how many times it occurs). Because the ncurses-based program tig supports all the git log options, you can view the list of commits and then see the diffs by hitting enter, by first running:

 $ tig -Sfoo
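A quick sketch of the pickaxe in a scratch repository, assuming a made-up config file and three made-up commits. Only the first and last commits touch the string, so only those two should come back.

```shell
# Scratch repository; file name, contents and messages are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "foo = 1" > app.cfg
git add app.cfg
git commit -q -m "introduce setting"

echo "# unrelated comment" >> app.cfg
git commit -q -a -m "add a comment"

grep -v "foo" app.cfg > app.tmp && mv app.tmp app.cfg
git commit -q -a -m "drop setting"

# Subjects of the commits that added or removed "foo", newest first
hits=$(git log -Sfoo --format=%s)
```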

With git, commands that produce lots of output, such as git log and git diff, are automatically piped to your pager program (e.g., less(1) or more(1)). With Subversion, you always have to pipe things to less manually.

You can also use git grep to grep for things that exist:

 $ git grep 'foo'

That will show you all files that have the string foo in them. All without the need to have you specify file names or exclusions; it even recurses into sub-directories automatically. Contrast this with what Subversion would make you go through with:

 $ find . -not \( -name .svn -prune \) -exec grep foo {} +

(If you are using Subversion, do yourself a favor and install ack, so you don't have to write find commands like the one above.)

Git's grep command is also powerful enough so that you don't have to write regular expressions as much. Adding -p will show you what functions the matches are in:

 $ git grep -np VIDEORESIZE
 imgv.py=33=def main():
 imgv.py:105:    if event.type == VIDEORESIZE:
 ...

Moving on to branching differences really quick. Creating branches in Git is much easier than in Subversion. You don't even have to check out your branch in git after you create it like you do in svn. So instead of doing:

 $ svn cp ^/trunk ^/branch/branchname -m "creating branch"
 $ svn switch ^/branch/branchname

You can just do:

 $ git branch branchname
 $ git checkout branchname

That's it. Or even easier:

 $ git checkout -b branchname

To create the branch and switch to it at the same time.

Deleting a branch is as easy as:

 $ git branch -d branchname

(You do delete your branches when you're done with them, don't you?)

Git supports merging between branches much better than Subversion does. Git keeps track of much more history to make it a smooth operation, and the command is easier to type. Once you're on one of the branches to be merged, you can merge the other one with:

 $ git merge [branch]

If there are no conflicts it even commits automatically for you. Though you can tell it not to commit for you with --no-commit.
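Here is a hedged sketch of a conflict-free merge in a scratch repository. The branch name feature, the file names and the identity are invented, and the starting branch is looked up rather than assumed to be master.

```shell
# Scratch repository; branch and file names are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
start=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q "$start"
echo "mainline work" > mainline.txt
git add mainline.txt
git commit -q -m "mainline work"

# No conflicts, so Git creates the merge commit itself;
# --no-edit accepts the default merge message without opening an editor
git merge -q --no-edit feature
```

Swapping --no-edit for --no-commit would leave the merge result staged so you can look it over before committing.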

Aside from merging, sometimes you just want to grab a commit from a different branch and apply it to your current branch. This is called cherry picking and in git the command is appropriately named:

 $ git cherry-pick [revision]

Compare this with Subversion, which has a much less intuitive way of doing this:

 $ svn merge -c [revision] [url]
 $ svn ci -m "cherry picked [rev]"

As you may have guessed by now, creating a tag with Git is as easy as creating a branch:

 $ git tag -a tagname

And listing the tags you have is as simple as:

 $ git tag -l

Another thing Git can do that Subversion can't is stashing. For those times when your changes are in an incomplete state and you're not ready to commit but you need to temporarily return to the last fresh commit, you can push all your uncommitted changes onto a stack. See the documentation for how to do this.
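A minimal sketch of that round trip, assuming a scratch repository with one tracked file (the file name and identity are made up):

```shell
# Scratch repository; file name and contents are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable" > notes.txt
git add notes.txt
git commit -q -m "initial import"

echo "half-finished idea" >> notes.txt   # work in progress, not ready to commit

git stash -q                             # push the changes onto the stash stack
clean=$(git status --short)              # empty: back at the last commit

git stash pop -q                         # bring the work in progress back
restored=$(git status --short)           # " M notes.txt"
```

By default git stash only takes changes to tracked files, so brand new files stay where they are unless you add them first.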

Moving on, moving on... Okay, how about reverting? To do the equivalent of "svn revert -R ." (throw away all local, uncommitted changes to tracked files, staged or not):

 $ git reset --hard HEAD

Rolling back a commit is as easy in git as:

 $ git revert HEAD

It will even fill in the commit message for you with Revert "[whatever your last commit message was]" along with the SHA hash of the commit.

You can even pick specific commits to undo with:

 $ git revert [hash]
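Something like the following shows a revert end to end. It is only a sketch in a scratch repository with invented file contents; --no-edit is used so Git takes its default message instead of opening an editor.

```shell
# Scratch repository; file name, contents and messages are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "add file"

echo "two" >> file.txt
git commit -q -a -m "append two"

git revert --no-edit HEAD > /dev/null    # new commit that undoes "append two"
subject=$(git log -1 --format=%s)        # Revert "append two"
```

The history now has three commits: nothing is erased, the revert is simply recorded on top.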

Oh yes, git lets you easily change commit messages, too. Say you have a post-commit hook script that looks for the string "bug #nnnn" in commit messages in order to create a list of files in the corresponding ticket number in your bug tracker. Well, what if you forget to input that special syntax into your commit messages? With Git, you could just:

 $ git commit --amend

Which will open your editor and let you change the commit message. Once you close your editor, it's done. Under the hood Git replaces the last commit with a new one (with a new SHA hash), so this is best done before you've pushed the commit anywhere. Good luck doing that with Subversion. If you need to modify multiple commit messages, or a commit message several commits back, look into using Git's interactive rebase.

Moving onto deployments now... Git has archiving features built in. For example, you can create a tar of the latest revision using:

 $ cd my-proj
 $ git archive -o /tmp/my-proj.tar HEAD

You can create a gzipped tarball with:

 $ git archive HEAD | gzip > /tmp/my-proj.tar.gz

Say you want to zip up just the documentation of your project:

 $ git archive --format=zip --prefix=my-docs/ HEAD:docs/ > /tmp/my-docs.zip

Now when you unzip my-docs.zip it will unpack a directory called my-docs with your documentation.
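If you want to check what an archive will contain before shipping it, a sketch like this lists the entries. It assumes a scratch repository with a docs/ folder and uses the tar format so that plain tar can do the listing; the file names are made up.

```shell
# Scratch repository; directory layout and file names are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir docs
echo "Manual" > docs/manual.txt
echo "code" > main.py
git add .
git commit -q -m "initial import"

# The trailing slash on --prefix is what makes my-docs a directory;
# without it the prefix is glued straight onto each file name
listing=$(git archive --format=tar --prefix=my-docs/ HEAD:docs/ | tar -tf -)
```

Only the contents of docs/ end up in the archive, so main.py should not appear in the listing.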

This list of comparisons is getting really long, so I'm going to stop now. Feel free to add your own additions in the comments section below.

Sunday, August 14, 2011

Getting Rid of Cable TV

UPDATE 2016: I've been using Rokus almost exclusively after getting rid of the Boxee Box and Xbox 360 (I do still have a PlayStation 3 we use for these apps).

I recently realized I was paying $160.00 a month for cable TV and Internet, after a promotion I had expired. My bill was $110 before that, which was really still too much. I just had up to 18 Mbps downloads and 1.5 Mbps uploads, and my cable was rather basic and didn't include HBO, Cinemax or Showtime. If you were wondering if it's worth ditching cable or satellite TV, read on!

I just decided to ditch cable TV altogether and just have internet (up to 18 Mbps downloads for $53.00 a month). My provider, AT&T U-verse, didn't have any other promotions right now that sounded any good. The best they could do to lower my bill was jump me down to 70 channels (Family Plan) for $82 a month + taxes and surcharges (and those taxes and surcharges end up being $15-20 in California), so I'd be right back at ~$100 a month.

In the living room I already have an Xbox 360 (which is capable of Netflix streaming. EDIT: the Xbox 360 is also capable of Hulu Plus!), a stand-alone Sony Blu-ray player and a Boxee Box. The Boxee Box is great because it streams any downloaded file such as .avi, .mkv, .mp3, etc., can stream Netflix, and comes with free streaming channels (like South Park, YouTube, NASA TV, tech podcasts, HGTV, news channels, and much more). What's also great about the Boxee Box is that it never has a problem finding my network. I just right-click a folder on any of my computers (be they Linux, Windows or Mac), share the folder, and bam, it shows up in Boxee Box instantly.

For the bedroom TV, since I would only have had a DVD player after getting rid of cable TV, I decided to get a "Roku 2 XS" player (capable of Netflix, Hulu, Amazon Prime, games like Angry Birds, etc.). It was only ~$90 and I signed up for Hulu Plus. Note that they also make $59 versions of Roku boxes that are almost as good.

Even if all that ends up not being enough content, and I decide to reactivate my Netflix account, that will still only be $53 (internet) + $8 (Hulu Plus) + $8 (Netflix) = $69 a month. But since I don't have any plans to ever go back to Netflix, I expect my monthly internet/"pseudo TV" bill to only be about $61.00.

So I'm saving $90-100 a month by getting rid of Cable TV. The Roku will have paid for itself the first month alone. Plus with all that savings I can upgrade to Spotify Premium and maybe get a Squeeze Box and still be saving money; and I'm way better off than I ever was with cable TV that had a ton of channels I never watched. With a modern internet connection I have yet to even have the Roku or Boxee have any bothersome "buffering" problems even while streaming from three devices at once.

One other thing to consider is that when you give up cable, you have to give them back your cable boxes (set-top boxes). Since they usually come with a universal remote control that you can no longer use, I recommend purchasing a good universal remote, which you can get from almost any department or electronics store. I recommend at least a Logitech Harmony 650, which can control at least 5 devices at once and lets you set up macros to automatically turn on your TV and device(s), and switch to the appropriate HDMI port. The Harmony line should even partially control your Xbox 360, Boxee Box, etc., believe it or not. So look to spend at least $50-100 on a good universal remote.

For any other content I might be missing there's always iTunes (I'm always getting gift cards), RedBox, video streaming apps for iPhone/iPad, and so on. For HBO and Showtime shows that I love to watch, I mostly just wait until the season of the series completes so I can watch them all at once and without commercials.

Ryan Kulla