Archive for the 'Rants' Category

A Scientific Method for Troubleshooting

I think there’s a new heavyweight champion in the world of Mike’s IT-related pet peeves. It’s called “just trying a bunch of random things until my problem goes away.” This is in no way related to troubleshooting, which I will define as “uncovering the root cause of an issue and then resolving it deliberately.” While there are many different techniques for troubleshooting specific problems, I’m going to attempt to show how the scientific method can provide a common framework for more effective troubleshooting.

Step 1: Describe the problem

This is the easy part. Find out what the problem is and reproduce it. That second part is important, because the typical user can’t always be trusted to know what they’re doing. By reproducing the problem, you can rule out user error and verify that there actually is a problem.

Step 2: Gather and analyze data

This is the part everyone likes to skip, and it is the real subject of this rant. Step two requires direct observation in order to find out exactly what is happening. How you do this depends very much on the problem at hand, but it involves things like log and packet analysis (which may very well require a third-party tool not included in the base OS). In any case, please don’t just take a wild guess about what’s causing the problem and then proceed down the path of random experimentation. I don’t know for sure how people acquire this bad habit, but I have a strong suspicion that it comes from working in Microsoft Land, where the computers have personalities, rebooting is the inexplicable fix for everything, and a sea of GUIs makes it easy for the novice sysadmin to miss what’s really going on.

Part of the problem in Microsoft Land is that proper troubleshooting tends to be a lot more difficult than it needs to be. For example, there is a severe lack of useful diagnostic tools included within the Windows OS itself. Why are the Windows support tools, resource kit tools, and IIS diagnostic tools still separate downloads? The same question can be asked about the Sysinternals Suite (which Microsoft has owned for several years now). Why are they still shipping obsolete utilities instead of their newer replacements (e.g. nslookup, which was obsoleted by dig many, many years ago)? And lastly, why does Microsoft constantly try to hide any information that could be useful for troubleshooting? Anyone who has ever had to view the message headers on an email in Outlook knows exactly what I’m talking about here, but I digress.
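To make "gather and analyze data" a little more concrete, here is a minimal sketch of the kind of log analysis step two has in mind: tally what the logs actually say before guessing at a cause. The log format, the `LINE_RE` pattern, and the `summarize` helper are all assumptions made up for illustration; real logs will need a different pattern.

```python
import re
from collections import Counter

# Hypothetical log format (an assumption for this sketch):
#   "<timestamp> <host> <service>: <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<service>[\w.-]+): (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def summarize(lines):
    """Count non-routine events per (service, level) and keep the first message of each."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip lines that do not parse and routine INFO/DEBUG chatter.
        if not m or m["level"] in ("INFO", "DEBUG"):
            continue
        key = (m["service"], m["level"])
        counts[key] += 1
        first_seen.setdefault(key, m["msg"])
    return counts, first_seen


# Made-up sample data, not taken from any real system.
SAMPLE = [
    "2008-01-10T09:00:01 web01 httpd: INFO request served",
    "2008-01-10T09:00:02 web01 httpd: ERROR upstream timed out",
    "2008-01-10T09:00:03 web01 named: WARN zone transfer refused",
    "2008-01-10T09:00:04 web01 httpd: ERROR upstream timed out",
    "garbage line",
]

counts, first_seen = summarize(SAMPLE)
for (service, level), n in counts.most_common():
    print(f"{n:3d}  {service:8s} {level:6s} {first_seen[(service, level)]}")
```

The point is not the script itself but the order of operations: the summary tells you what is happening, and only then do you start asking why.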

Step 3: Form a hypothesis

It isn’t until you figure out what’s happening that you can address the question of why it’s happening. Step three is where you use the information you gathered in step two to determine a logical course of action. Remember that a hypothesis beginning with “maybe” or “I think” with little or no direct evidence to back it up is often a dead giveaway for someone who doesn’t know what the hell they’re talking about.

Step 4: Test your hypothesis

Perform your planned course of action.

Step 5: Analyze results and draw conclusions

Check to see if the issue is resolved. If not, revert your changes and go back to step three. When drawing conclusions, ask yourself how this problem occurred in the first place. Was your most recent fix permanent or just a temporary band-aid? If the fix was temporary, make sure you schedule a time to implement a permanent fix.
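Steps three through five form a loop, and it can be sketched in a few lines. This is only an illustration under stated assumptions: the `Hypothesis` class, the `troubleshoot` function, and the toy `state` dictionary are hypothetical names invented here, and a real fix is rarely a one-line lambda. What the sketch does show is the discipline: reproduce first, test one hypothesis at a time, and revert anything that didn't help.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Hypothesis:
    description: str
    apply_fix: Callable[[], None]   # step 4: the planned change
    revert_fix: Callable[[], None]  # undo, used when the change did not help


def troubleshoot(problem_exists: Callable[[], bool],
                 hypotheses: Iterable[Hypothesis]) -> Optional[Hypothesis]:
    """Return the hypothesis whose fix resolved the problem, or None."""
    # Step 1: if the problem cannot be reproduced, there is nothing to test.
    if not problem_exists():
        raise RuntimeError("could not reproduce the problem")
    for h in hypotheses:
        h.apply_fix()             # step 4: test the hypothesis
        if not problem_exists():  # step 5: check whether the issue is resolved
            return h
        h.revert_fix()            # not resolved: revert before trying the next one
    return None                   # nothing worked: go gather more data


# Toy environment standing in for a real system (all values are made up).
state = {"resolver": "192.0.2.1", "mtu": 1500}


def broken() -> bool:
    return state["resolver"] != "10.0.0.53"


candidates = [
    Hypothesis("MTU too large",
               lambda: state.update(mtu=1400),
               lambda: state.update(mtu=1500)),
    Hypothesis("wrong resolver",
               lambda: state.update(resolver="10.0.0.53"),
               lambda: state.update(resolver="192.0.2.1")),
]

confirmed = troubleshoot(broken, candidates)
print(confirmed.description if confirmed else "no hypothesis confirmed")
```

Note that the failed MTU change is rolled back before the next hypothesis is tried, so the system ends up with exactly one change applied: the one that fixed it.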

Are You Googlable?

I’ve decided that if you work in IT and I can’t find you on Google, then you might as well retire.

Stop Using Wizards!

My #1 problem with wizards is that they make people think they are capable of configuring things properly, regardless of whether or not they actually know what the hell they’re doing.  This is also one of the major gripes that I have with companies like Microsoft, who have managed to convince people all over the world that a pretty interface with a bunch of wizards is a good substitute for competence.  Sorry, but that’s bullshit, and every IT professional worth their salt knows this.

For the record, I am not just an elitist who advocates doing everything manually through a command line.  I understand that a wizard can help get you up and running quickly, and I think any wizard that tells you all the things it did would be a great learning tool.  However, I have yet to encounter a wizard that tells you much (if anything) about what it’s doing, and nobody is going to convince me that speed of implementation is more important than knowing how to configure something so that you can fix it when it breaks.

The bottom line is that if you feel the need to use the wizard (especially for critical security infrastructure like firewalls), then you have no business using it, because you obviously don’t know what you’re doing.

Windows 2008 Telnet (not SSH) Server

Have you heard that Windows 2008 will be able to run in a command-line-only mode, but will continue to ship with a telnet server instead of SSH? This is awesome, seeing as how telnet is an insecure, antiquated method of remote access that should not be used by anyone under any circumstances. Congratulations Microsoft! Welcome to the 1970s! Should we expect the SSH server in Windows Server 2033?

Seriously, what the fuck are those people doing over there?

Update: According to Microsoft, there will be “a technology like this included in Windows Server 2008 called WinRS; or Windows Remote Shell. This command line tool allows administrators to remotely execute most cmd.exe commands using the WS_Management protocol.” Too bad it sucks!

See Also: “Not Invented Here Syndrome”

How to PROPERLY Choose your Internal DNS Domain

One of my biggest IT-related pet peeves is a broken DNS infrastructure. Since nobody seems to know how to implement this properly, I have decided to write a little howto to help put an end to the insanity.

  • Don’t just use whatever the hell domain name you want and justify it by saying “we’ll only be using this domain internally, so it doesn’t matter if we actually own it or not.” That’s just as dumb as using someone else’s public IP addresses on your LAN, and if you don’t understand what’s wrong with that, you’re fired. Make sure the domain you want to use is unique on the internet, and register it.
  • Do use a standard TLD; not that .local bullshit. Using a non-standard TLD like .local is a great way of showing the world that you have absolutely no taste (see below).
  • Don’t go out and register two entirely different domains (e.g. example.com and example.net) for your internal and external namespaces. This is unnecessary, will confuse your users, and will tell the world that you don’t understand how DNS works. Just use sub-domains (e.g. hq.example.com, office.example.com, etc.) for your internal networks, and reserve the root domain (i.e. example.com) for your external resources.
  • Do use different internal and external namespaces. If your external namespace is example.com, don’t use example.com for your internal (i.e. Active Directory) namespace too. Otherwise, you’ll run into problems when your internal users can’t resolve external resources (like your website which may be hosted off-site). If you were stupid enough to make this mistake, one solution is to mirror all your external resource records on your internal DNS servers, but then you’ll have to add/change every record in two places.
  • Do run your internal and external namespaces on separate servers (or at least in separate views). It’s not a good idea to make your internal resource records available to the whole world in the first place, although if you’re using proper private IP addresses on your LAN, those records won’t help anyone reach your servers over the internet anyway.
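The naming rules above are mechanical enough to check in code. Here is a minimal sketch, assuming a hypothetical `check_internal_domain` helper invented for this post. It only looks at the names themselves (no .local, not identical to the external namespace, a sub-domain of the external domain); it cannot tell you whether you actually registered the domain or whether your servers are split correctly.

```python
from typing import List


def check_internal_domain(internal: str, external: str) -> List[str]:
    """Return a list of naming-rule violations; an empty list means the name passes."""
    # Normalize case and drop any trailing root dot before comparing.
    internal = internal.lower().rstrip(".")
    external = external.lower().rstrip(".")
    problems = []

    # Rule: use a standard TLD, not .local.
    if internal.split(".")[-1] == "local":
        problems.append("uses the non-standard .local TLD")

    # Rule: internal and external namespaces must differ...
    if internal == external:
        problems.append("internal namespace is identical to the external one")
    # ...but the internal one should be a sub-domain of the registered domain.
    elif not internal.endswith("." + external):
        problems.append("not a sub-domain of the registered external domain")

    return problems


for name in ("hq.example.com", "example.com", "example.net", "corp.local"):
    print(name, "->", check_internal_domain(name, "example.com") or "ok")
```

With example.com as the external namespace, only hq.example.com comes back clean; the other three names each break at least one rule.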

Before you ask, if you think you have a good reason not to follow any of the above rules, you are wrong. Don’t do it. I’m begging you.