Tuesday, July 1, 2008

Repository Commentary

Some examples of source code repository comments:

Bad
Modified structure to include a linked list (12/20/2007 MT).
Why include the date and username? Both are already known from other information in the repository.

Worse
Modified structure to include a linked list.
What about this information could not be derived from a cursory glance at the source code changes?

Fair
The linked list was added to the caching structure to support an LRU ejection policy.
Too general. A naive understanding of caches would have been sufficient to derive this information.

Better
Without the linked list component, the caching structure would often exhibit suboptimal performance. Adding the linked list component is part of an effort to implement a more effective LRU caching policy.
Much improved, but it could be more specific.

Best
Modified caching structure to include an LRU ordered linked list. Hereafter, the cache will more effectively decrease latencies for read-only database access. This change was necessary because the prior suboptimal performance of the cache often resulted in unacceptably long latencies. This problem was especially prevalent on the user account screen.
Perfect.

Template:
  1. One sentence describing the implemented change: Modified caching...
  2. One sentence describing the desired result: Hereafter...
  3. Two sentences describing why the change was necessary: This change was necessary...
If you are using a bug tracking system, you might use the bug identifier as part of #3's content. In some cases, a bug identifier is sufficient to replace most of a repository comment.
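
The template can be sketched as a small helper. This is only an illustration: the name `build_comment`, its parameters, and the "See bug ..." wording are made up for this example and do not belong to any repository tool.

```python
def build_comment(change, result, reasons, bug_id=None):
    """Assemble a repository comment following the three-part template.

    change  -- one sentence describing the implemented change
    result  -- one sentence describing the desired result
    reasons -- sentences explaining why the change was necessary
    bug_id  -- optional bug tracker identifier (hypothetical format)
    """
    sentences = [change, result, *reasons]
    for sentence in sentences:
        # Fragments such as "Bug fix" are rejected; the template asks for sentences.
        if not sentence.strip().endswith("."):
            raise ValueError(f"expected a complete sentence: {sentence!r}")
    if bug_id is not None:
        sentences.append(f"See bug {bug_id}.")
    return " ".join(sentence.strip() for sentence in sentences)
```

Calling it with the "Best" example above (one change sentence, one "Hereafter..." sentence, two reason sentences) reproduces that comment as a single paragraph.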

Admittedly, any sort of support for these suggestions is likely to be anecdotal. After all, there haven't been any multi-year studies comparing poor and superior repository comment practices. However, I'd prefer to include a scenario that, though anecdotal, nevertheless illustrates the value of concise and informative repository comments.

Suppose that several years later, the user account screen is completely reworked to use asynchronous database accesses. It is unclear whether the referenced database caching scheme needs to be reworked to support asynchronous accesses (no small task), or completely discarded. A good set of repository comments would prove invaluable in this case. Knowing the reasoning behind the original change empowers the developer to identify potential risks in future development.

The repository commentary permits a developer to decrease the complexity of source code comments. A good repository comment should cover historical information about the software. In most cases, this historical information can therefore be omitted from the source code commentary.

Including more information in source control comments also has the potential to convict the implementer. A well-written but incorrect change description can reveal how little one knows about the problem being solved. Because the repository commentary is more visible, it is unlikely to be overlooked. That sort of information is invaluable to the project team, but harmful to the individual developer's ego. If you are writing a repository comment, and you fear this scenario, remember the G.I. Joe mantra: knowing is half the battle. Egos are expendable.

Monday, June 23, 2008

Repository Usage

What is the purpose of a source code repository? In my limited experience, the source code repository has been central to every software engineer's workflow. It's such a fact of life that nobody asks that question anymore.

My guess is that a source code repository solves the following problem:
It's very hard to keep track of your files.
Perhaps this problem is just part of the more general problem of handling complexity in a software project. The use of a source code repository relieves us of the responsibility of working with that kind of complexity. However, you may create additional complexity by making some of the following mistakes:

Using the source code repository as a workspace. Too many checkins often result in a cluttered source history. Unless there is some mechanism to manage these checkins, it will be difficult in the future to discern source code history. A general rule I use: only check in code once it has met some criteria for suitability, e.g. passed all unit tests (you might include that in your commit comment). Consider using a personal repository for more fine-grained revision management.

Not using the source code repository as a workspace. Too few checkins also tend to obscure source history. However, usually 'too few' means none at all. It's better to check in buggy code than not to check in at all. Too few checkins force developers to rely upon the source code changes (not the comment history) to understand changes. A large quantity of source code changes in a single checkin is difficult to understand.

Not knowing how to use the repository. This is probably the biggest problem of all. Misuse usually follows. If you don't know how to use the branching, tagging and merging functionality of your source code repository, you will be forced to use tactics that are far less efficient.

Leaving the repository in an inconsistent state. It's difficult to discern that any number of checkins are related unless they occur in the same commit. Ideally, each commit will thereafter leave the project in a workable state. This solves a number of problems when it becomes desirable to back out, or reapply source code changes.

Clutter in the repository. Clutter is defined thusly: unused code, unused files/tests, commented-out snippets, anything that isn't used but "may be used later". By following a protective policy for unused source code, you are guaranteed to increase the complexity of your project. And that is exactly the enemy source control is meant to fight. Unused code is a bane for a few reasons:
  • Because it exists in the source code repository, it is presumed to be 'tested' and 'solid' code. It's not.
  • It will need to be re-understood and re-tested in order to be reused. Whatever time it took to type the code will almost invariably be re-spent testing and understanding it anew.
  • It makes everything surrounding it more difficult to read and understand.

Finally some pet peeves:
  • Duplicating revision history in the header of a code file. There is no guarantee this history is accurate. Unless the comments are long (which would make the history ungainly) they are invariably too short. And if they are too short, they don't explain enough to be useful.
Which leads me to my second pet peeve:
  • Inappropriately short revision history comments. It would be lovely to understand the history of a project simply by reading the revision history. But if the comments are limited in verbosity to "Bug fix" or "Wrong date calc" or "Algorithm rework", there isn't much use in them at all. Consider using complete sentences. Consider using statements that will be understood by someone who hasn't worked on the problem. Consider that you will promptly forget everything you did to the source code within 6 months.

Tuesday, May 6, 2008

Visual Sourcesafe 3.1 is the devil's repository. I spent the last few weeks at work wrangling the out-dated repository system into a subversion repository. The entire process involved a lot of experimentation because the sourcesafe 3.1 mechanisms for detailing file history are somewhat difficult to figure out.

I originally started with the intent of using Polarion's SVNimporter tool. It turned out that solution was not sufficient. It was meant for a newer version of sourcesafe. Our version here at work dates back to the 90's.

I ended up writing about 1500 lines of code in perl. The process of converting the repository can be distilled into 3 steps:
  1. Dump all of the sourcesafe history into a file. (This can take almost an hour, depending on the size of your repository).
  2. Parse the history file and sort it by date. (This part takes the longest, because the history file will not be sorted by date, and it is often several gigabytes in size).
  3. Convert the parsed history into an svn dump file. (This involves translating the sourcesafe behavior into subversion behavior. A direct mapping is not always possible).
This was a difficult thing to test as well. Some errors would only be revealed after several hours had passed during a test run. The 1st and 2nd step usually took the longest. It was difficult to engineer the system because I was unfamiliar with the sourcesafe system at the start. For example, sourcesafe does not maintain the history of files that have been "Removed" in the history output. Therefore, it was necessary to "Restore" all of the removed files before retrieving the history. This is the sort of thing that I needed to learn from painful experience.
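
Step 2 is the part that does not fit in memory, so it is worth sketching. My converter was written in Perl; the Python below is not that code, just a minimal illustration of one way to sort a multi-gigabyte history by date (an external merge sort). It assumes a made-up intermediate format: one record per line, starting with an ISO-8601 timestamp followed by a tab. Real SourceSafe history output does not look like this and would have to be parsed into such records first. The name `sort_history` and the `chunk_size` parameter are also inventions for this example.

```python
import heapq
import itertools
import os
import tempfile


def timestamp(line):
    # Hypothetical record layout: "<ISO-8601 timestamp>\t<rest of record>".
    return line.split("\t", 1)[0]


def sort_history(lines, chunk_size=100_000):
    """Yield history records ordered by timestamp without loading them all."""
    chunk_paths = []
    records = (l if l.endswith("\n") else l + "\n" for l in lines)
    try:
        # Pass 1: sort chunks that fit in memory and spill each one to disk.
        while True:
            chunk = sorted(itertools.islice(records, chunk_size), key=timestamp)
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            chunk_paths.append(path)
        # Pass 2: lazily merge the sorted chunks into one sorted stream.
        files = [open(path) for path in chunk_paths]
        try:
            yield from heapq.merge(*files, key=timestamp)
        finally:
            for f in files:
                f.close()
    finally:
        for path in chunk_paths:
            os.remove(path)
```

Because ISO-8601 timestamps sort correctly as plain strings, no date parsing is needed in the comparison. Records with equal timestamps keep their original relative order, which matters when several files were checked in during the same second.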

Wednesday, March 5, 2008

Floating point inaccuracies

My current project at work involves resolving some differences between output under different platforms (32bit vs. 64bit vs. VC6 compiler vs. VS2005 compiler, etc.). Sometimes the floating point errors are difficult to trace. Many times, they are not particularly intuitive. I think that the average programmer probably doesn't know why calculating (x+y)*(x-y) is preferred over (x^2 - y^2).
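
Here is a small demonstration of that last point, using double precision and Python's `Fraction` type as an exact reference. The input values are chosen by me to make the effect visible. Squaring first rounds each product at a magnitude near 1e16, and the subtraction then cancels the leading digits and exposes that rounding error. The factored form subtracts while the operands are still exact.

```python
from fractions import Fraction


def expanded(x, y):
    return x * x - y * y


def factored(x, y):
    return (x + y) * (x - y)


def exact(x, y):
    # Fractions represent the float inputs exactly, so no rounding occurs here.
    fx, fy = Fraction(x), Fraction(y)
    return fx * fx - fy * fy


x, y = 100000001.0, 100000000.0
true_value = exact(x, y)  # 200000001
err_expanded = abs(Fraction(expanded(x, y)) - true_value)
err_factored = abs(Fraction(factored(x, y)) - true_value)
print(err_expanded, err_factored)  # prints "1 0"
```

The expanded form is off by one because x*x = 10000000200000001 is not representable as a double (neighbouring doubles are 2 apart at that magnitude), while every intermediate value in the factored form is.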

Here's a spec for a useful tool that deals with these issues:

Input:
  1. A function that returns a floating point value.
  2. A description of the inputs (ranges, precision) to the function.

Output:
  1. The worst case, best case and average error introduced by the floating point operations in the program (error is reported both as a relative error and in ulps, units in the last place).

Operational description:
  1. Compile the input program using a floating point and fixed point data type for the operands involved.
  2. Execute the program using the variety of input parameters specified.
  3. Record the results from both versions of the software and compare.
  4. Provide the summary statistics as output.

Required tools:
  1. Some sort of fixed point math library.
  2. Perhaps some sort of parser for the programming language desired (I dunno).
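
A rough sketch of the core of such a tool follows. Note the substitution: instead of recompiling with a fixed point library, it evaluates the same expression with Python's exact `Fraction` type as the reference, which only works for functions built from +, -, * and /. The name `error_stats` and the choice to pass inputs as an explicit list of tuples (rather than as ranges and precisions) are assumptions of this example. It needs Python 3.9 or later for `math.ulp`.

```python
import math
from fractions import Fraction


def error_stats(float_fn, exact_fn, inputs):
    """Compare float_fn against exact_fn over inputs (tuples of floats).

    Returns worst, best and average error, each as a pair of
    (relative error, error in ulps of the exact result).
    """
    rel_errors, ulp_errors = [], []
    for args in inputs:
        approx = Fraction(float_fn(*args))  # assumes a finite result
        truth = exact_fn(*(Fraction(a) for a in args))
        if truth == 0:
            continue  # relative error is undefined; skipped in this sketch
        abs_err = abs(approx - truth)
        rel_errors.append(float(abs_err / abs(truth)))
        ulp_errors.append(float(abs_err / Fraction(math.ulp(float(truth)))))
    # Worst relative error and worst ulp error may come from different inputs.
    return {
        "worst": (max(rel_errors), max(ulp_errors)),
        "best": (min(rel_errors), min(ulp_errors)),
        "average": (sum(rel_errors) / len(rel_errors),
                    sum(ulp_errors) / len(ulp_errors)),
    }


def diff_of_squares(x, y):
    return x * x - y * y


# The same function serves as float version and exact version,
# depending on whether it receives floats or Fractions.
stats = error_stats(diff_of_squares, diff_of_squares,
                    [(100000001.0, 100000000.0), (3.0, 2.0)])
```

A real tool would generate the inputs from the described ranges and would need the parser mentioned above to swap the data type automatically; here the duck typing of Python does that job for free.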

Monday, December 3, 2007

MSDN suggestion

Before
After

Here are some screenshots of a sample MSDN page. I made a bunch of changes:

A lot of the content is gone. There are two reasons for this:
  1. Some of the content is duplicated. For example, the left-side table of contents and the mouse-over responsive header line duplicate the same information. The bottom half of the screen also has the same contents as both of these.
  2. Some of the content is of the quality: "Want more...", or "Maybe you shouldn't be here...". These interjections have the quality of the well-known "clippy" abomination. Although they are not as intrusive as the aforementioned devilish monster, they have the quality of presuming the user's thoughts and actions. If the software is smart enough to accurately predict the user's intentions, it probably shouldn't have sent the user here. Alternatively, if the user doesn't want to be at the page, it is faster to read the title and choose different search criteria than it is to read each individual pop-out. To accommodate the user who uses a more time-intensive approach, mouse-overs with pop-outs provide the same capabilities without making the page overly busy.
I changed the font of the paragraph-style text. If the text is intended to be read like a book, it should use a book-like typeface. Serif typefaces are intended for this purpose.

The tabs at the top of the page (Home, Library, Learn, etc.) are gone. They only serve as links to different parts of the main website. See this page for an explanation of good tab usage.

The search box is more prominent.

The table of contents on the left side has no scrollbars. Scrollbars are not prohibited, but avoided if at all possible. The amount of information displayed in the table of contents therefore had to be reduced. This change will be mitigated by using mouse-overs and limiting the display of contents above the current chapter. This approach assumes that a user drills-down his search more often than he or she backtracks. The common case is optimized.

As mentioned in the previous blog entry, the URL is more intuitive.

Where at all possible, the screen is not refreshed. The only exception might occur when creating a bookmark (because a URL is required).

Much of the content at the bottom of the page has been removed. Too many features overwhelm me.

Thursday, November 29, 2007

Room for improvement

Today at work I spent most of the day reading documentation from Microsoft's MSDN web site. I am displeased with the experience for a few reasons:
  • AJAX has been around for a few years now, and I still need a full page refresh every time I click a link? It makes it impossible to keep track of the table of contents on the left side.
  • Broken links. Broken links are the bane of any website. It's just plain unprofessional. It would also be nice to know if a link was broken intentionally (as in the case of bad documentation) or unintentionally.
  • Bizarre URL naming conventions. The document entitled "Mixed (Native and Managed) Assemblies" is available at msdn2.microsoft.com/en-us/library/x0w2664k(VS.80).aspx. I'd prefer something like: .../library/vs2005/vc++/guide/interoperability/mixedassemblies.html (Don't click on this, cuz it don't work).
  • Too much information and a poor choice of fonts. As you can see below, the size of the font is too small for comfortable reading. The chapter and contents display is almost always going to be truncated.
  • Poor search. None of the problems listed thus far would be an issue if the search functionality was better.
Lest you think I'm only an engineer on the complain train, over the next few days I'm going to try and put a few sample screenshots of what I think a heavyweight software documentation site should look like. If I don't get too busy.

Tuesday, November 20, 2007

Upgrading

In all the software projects I've been in, there has been an occasion where a "platform upgrade" has occurred. Unfortunately, the platform upgrade always brings with it a bunch of unforeseen problems.

Here are some pitfalls:
  • If you can't migrate/upgrade except in one gigantic leap, you'll fail.
  • If you can't roll back the upgrade, you will wish you could.
For these reasons, most people avoid being early-adopters. There is too much risk. But it should be noted that the risk isn't so much about whether the new platform/upgrade is beneficial. The risk is in the process of leaping. It almost always isn't as smooth as expected.

Pivotal to an upgrade of a tool or platform is the certification date. The certification date is the date when the product of the new tool is certified as equivalent or better than the previous version. Another important date is the point of no return. That is the date when it becomes impossible or prohibitively difficult to return to the previous version of the tool. Ideally, the certification date should be a long time before the point of no return. If the certification date coincides with the point of no return, you are asking for trouble.

A unique set of problems exists when the version control system undergoes an upgrade. With such an upgrade it is often tempting to use the certification date as the point of no return. This is a common approach because the process of rolling back is very difficult once the migration occurs.

The necessity of rolling back would only occur if the value of the version control system is subordinate to the software's release date. It is not likely that the new version control system has a bug that prevents the release from occurring. It is more likely that the new version control system is misused by developers and thus adds unscheduled delays to development. Also, having the rollback option appeases the risk-averse managerial types.

Here are some suggestions:
  • Use a version control system's post-commit scripting capability to push the committed changes to the old version control system. This will permit a rollback after the migration.
  • Use the new version control system to perform a software release concurrently with the previous version control system. After a successful release, migrate to the new system. You'll have until the next release to iron out any issues.
  • Develop a migration tool, or use an existing one (like Polarion's SVN importer).
  • Train developers in the new system prior to migration (duh).
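
The first suggestion can be sketched for the case where the new system is Subversion. Subversion runs a post-commit hook with the repository path and the new revision number as arguments, and `svnlook` can report the log message, author and changed paths of that revision. Everything about the old system is left as a placeholder: `push_to_old_vcs` is a hypothetical function that you would have to write against whatever command line or API your old version control system offers.

```python
import subprocess


def svnlook(subcommand, repos, rev):
    """Run `svnlook SUBCOMMAND -r REV REPOS` and return its output."""
    result = subprocess.run(
        ["svnlook", subcommand, "-r", str(rev), repos],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_changed(output):
    """Turn `svnlook changed` output into (status, path) pairs.

    Each line carries status flags in the first columns (e.g. "A", "U", "D")
    and the path starting at the fifth column.
    """
    return [(line[:2].strip(), line[4:])
            for line in output.splitlines() if line]


def mirror_commit(repos, rev, push, look=svnlook):
    """Collect one committed revision and hand it to the old system."""
    message = look("log", repos, rev).strip()
    author = look("author", repos, rev).strip()
    changes = parse_changed(look("changed", repos, rev))
    push(author, message, changes)


def push_to_old_vcs(author, message, changes):
    # Hypothetical placeholder: check the changed files into the old
    # version control system here.
    raise NotImplementedError("depends on the old version control system")


def main(argv):
    # Subversion passes the repository path and revision to the hook.
    mirror_commit(argv[1], argv[2], push_to_old_vcs)
```

The hook script itself would call `main(sys.argv)`. A failing post-commit hook does not undo the commit in Subversion, so a production version would need to log failures somewhere a human will see them; otherwise the old repository silently falls behind and the rollback option evaporates.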