Thursday, July 26, 2007

Choose a network troubleshooting methodology

Takeaway: Many network administrators don't use an official methodology when it comes to troubleshooting network problems, but there's something to be said for taking a more formal approach. David Davis examines three network troubleshooting methodologies and discusses the advantages of each approach.

A decent portion of every network administrator's job involves troubleshooting. Network problems are as certain as death and taxes—and while you can take steps to prevent issues, sometimes they're just unavoidable.

Network problems range in complexity. You could be dealing with one workstation unable to access the network or the entire network going down.

When you do encounter a network problem, how do you begin troubleshooting? Many admins have never even bothered to think about it: They don't have a formal methodology; they just jump right in.

But there's something to be said for a formal troubleshooting methodology. For one, it gives you a place to start. And it never hurts to add one more trick to your administrator's toolkit.

Let's look at three common network troubleshooting methodologies. Cisco documents these in its Cisco Internetwork Troubleshooting guidebooks, and you can expect to see questions about them on the CIT 642-831 exam, which is required to achieve CCNP certification.

OSI model

The basis of each of these troubleshooting approaches is the seven-layer OSI Reference Model. If you're unfamiliar with the OSI model or just rusty on the details, here's a look at the seven layers:

  • Layer 1: Physical
  • Layer 2: Data Link
  • Layer 3: Network
  • Layer 4: Transport
  • Layer 5: Session
  • Layer 6: Presentation
  • Layer 7: Application

Here's how the OSI model works: On the sending host, traffic flows down from the application layer to the physical layer. It then crosses the network over the physical medium (for example, an Ethernet cable) to the receiver's physical layer and moves up through the layers to the receiver's application.

When the receiver responds, the roles reverse: the receiver becomes the sender, and the original sender becomes the receiver. The response traverses the same path in reverse, back to the original sender.

So if one of the layers of the OSI model doesn't work, no traffic will flow. For example, if the data link layer isn't working, the traffic will never make it from the application layer to the physical layer.
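A toy model can make this round trip concrete. The Python sketch below lists the hops a message takes down the sender's stack and up the receiver's, and treats delivery as successful only if every layer on that path is working. The host names and the `working` mapping are hypothetical; the cable between the two physical layers is folded into Layer 1, and real stacks don't expose a simple per-layer health flag like this.

```python
def path(sender, receiver):
    """Return the (host, layer number) hops a message takes between two hosts."""
    down = [(sender, n) for n in range(7, 0, -1)]  # sender: Layer 7 down to Layer 1
    up = [(receiver, n) for n in range(1, 8)]      # receiver: Layer 1 up to Layer 7
    return down + up

def delivered(sender, receiver, working):
    """True only if every hop on the path is marked as working.

    `working` maps (host, layer) -> bool; a missing entry counts as broken.
    """
    return all(working.get(hop, False) for hop in path(sender, receiver))

all_up = {hop: True for hop in path("A", "B")}
print(delivered("A", "B", all_up))  # True

broken = dict(all_up)
broken[("A", 2)] = False            # sender's data link layer fails
print(delivered("A", "B", broken))  # False

# The reply travels path("B", "A"): the same layers, with the roles swapped.
```

One broken hop anywhere on the path is enough to make `delivered` return False, which is exactly why a single failed layer stops all traffic.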

Bottom-up

The bottom-up approach is my personal favorite. As the name implies, start at the bottom—Layer 1, the physical layer—and work your way up to the top layer (application).

The physical layer includes the network cable and the network interface card. So if you encounter a broken or disconnected network cable, you've probably found the problem, and there's likely no need to do any more troubleshooting.
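On a Linux host, one quick physical-layer check is whether the network interface reports carrier (a live link) through sysfs, where `/sys/class/net/<interface>/carrier` reads `1` when the link is up. The Python sketch below assumes a Linux system; the interface name `eth0` is only an example, and the `sysfs_root` parameter is an illustrative addition so the function can be pointed at a test directory.

```python
from pathlib import Path

def link_has_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier (link up) on `iface`.

    Assumes Linux sysfs. Reading `carrier` raises an OSError when the
    interface doesn't exist or is administratively down, so both cases
    are treated as "no link".
    """
    try:
        return (Path(sysfs_root) / iface / "carrier").read_text().strip() == "1"
    except OSError:
        return False
```

If `link_has_carrier("eth0")` returns False, the usual suspects are the cable, the switch port, the NIC itself, or an interface that has been shut down.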

You must resolve any physical layer problems before moving on. After fixing the problem, check to see if the trouble still exists. If so, move on to troubleshooting the data link layer.
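The bottom-up procedure boils down to a loop over the layers in ascending order that stops at the first failing check. In the Python sketch below, `checks` is a hypothetical mapping from layer number to a test function you supply (a cable or link test for Layer 1, a ping for Layer 3, and so on); it isn't part of any real tool.

```python
from typing import Callable, Dict, Optional

Check = Callable[[], bool]  # a test for one layer: True means "working"

def bottom_up(checks: Dict[int, Check]) -> Optional[int]:
    """Test layers 1 through 7 in ascending order.

    Returns the first layer whose check fails (fix it, then re-run),
    or None if every supplied check passes. Layers without a check are skipped.
    """
    for layer in range(1, 8):
        check = checks.get(layer)
        if check is not None and not check():
            return layer
    return None

# Hypothetical checks: the cable is fine, but Layer 2 and Layer 3 both fail.
print(bottom_up({1: lambda: True, 2: lambda: False, 3: lambda: False}))  # 2
```

Layer 2 is reported even though Layer 3 also fails, because a data link problem has to be fixed before a network layer test means anything.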

For example, an Ethernet LAN has an Ethernet switch, which keeps a table of MAC addresses. If there's something wrong with that table—such as a duplicate MAC entry—then resolve that problem before looking at anything on the network layer (e.g., an IP address or routing).
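To make the MAC table example concrete, the Python sketch below flags any MAC address seen on more than one port in the same VLAN across a set of collected MAC table entries (for example, snapshots taken a few seconds apart), which can point to a duplicate MAC address or a Layer 2 loop. The `(vlan, mac, port)` tuples are a hypothetical, simplified export; on Cisco IOS switches the table itself is viewed with `show mac address-table` (`show mac-address-table` on older releases), and its output would need parsing first.

```python
from collections import defaultdict

def macs_on_multiple_ports(entries):
    """Given (vlan, mac, port) tuples, return {(vlan, mac): sorted ports}
    for every MAC seen on more than one port within the same VLAN."""
    ports = defaultdict(set)
    for vlan, mac, port in entries:
        ports[(vlan, mac.lower())].add(port)  # normalize case before comparing
    return {key: sorted(p) for key, p in ports.items() if len(p) > 1}

entries = [
    (10, "0011.2233.4455", "Gi0/1"),
    (10, "0011.2233.4455", "Gi0/7"),
    (10, "0011.2233.6677", "Gi0/2"),
]
print(macs_on_multiple_ports(entries))
# {(10, '0011.2233.4455'): ['Gi0/1', 'Gi0/7']}
```

A host that legitimately moves between ports (a laptop changing desks, say) will also show up here, so treat a hit as a lead to investigate rather than proof of a fault.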

Top-down

Once again, the name of this methodology implies the approach. With the top-down method, start at the top of the OSI model (i.e., the application layer) and work your way down to the bottom layer (i.e., physical).
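Top-down is the same kind of loop as bottom-up, run in the opposite order. Because an application failure will almost always be the first thing you see from the top, the Python sketch below keeps descending while checks fail and stops at the first passing one. That relies on an assumption worth stating: a passing test at one layer is taken to mean the layers beneath it work (a successful ping, for example, shows Layers 1 through 3 are up). `checks` is a hypothetical mapping from layer number to a test function you supply.

```python
from typing import Callable, Dict, Optional

Check = Callable[[], bool]  # a test for one layer: True means "working"

def top_down(checks: Dict[int, Check]) -> Optional[int]:
    """Test layers 7 down to 1, descending while checks keep failing.

    A passing check is taken to mean the layers beneath it work too, so the
    lowest failing layer seen before that pass is returned as the suspect.
    Returns None if the top-most supplied check passes.
    """
    suspect = None
    for layer in range(7, 0, -1):
        check = checks.get(layer)
        if check is None:
            continue  # no test supplied for this layer
        if check():
            break  # this layer works; stop descending
        suspect = layer  # still failing: the fault is here or lower
    return suspect

# Hypothetical checks: the application and network tests fail, the link test passes.
print(top_down({7: lambda: False, 3: lambda: False, 1: lambda: True}))  # 3
```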

Divide and conquer

This approach involves a little more intuition. With the divide and conquer method, start at whichever layer you feel is the most likely root cause of the problem. From there, you can go either up or down through the layers. (Yes, folks, even the "no-method method" has a name.)
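One way to sketch divide and conquer in Python, assuming the usual convention that a passing check at the starting layer sends you up the stack and a failing one sends you down, is shown below. `checks` is a hypothetical mapping from layer number to a test function you supply, and `start` is the layer you suspect first (Layer 3 is a common choice).

```python
from typing import Callable, Dict, Optional

Check = Callable[[], bool]  # a test for one layer: True means "working"

def divide_and_conquer(checks: Dict[int, Check], start: int) -> Optional[int]:
    """Begin at `start`. If it passes, search upward for the first failure;
    if it fails, search downward for the lowest consecutive failing layer.
    Returns the suspect layer, or None if nothing from `start` up fails."""
    first = checks.get(start)
    if first is None:
        raise ValueError(f"no check supplied for layer {start}")
    if first():
        for layer in range(start + 1, 8):  # start works: move up
            check = checks.get(layer)
            if check is not None and not check():
                return layer
        return None
    suspect = start
    for layer in range(start - 1, 0, -1):  # start fails: move down
        check = checks.get(layer)
        if check is None:
            continue  # no test supplied for this layer
        if check():
            break  # this layer works; assume everything below it does too
        suspect = layer
    return suspect

# Hypothetical checks: Layer 3 and Layer 2 fail, Layer 1 passes.
print(divide_and_conquer({1: lambda: True, 2: lambda: False, 3: lambda: False}, start=3))  # 2
```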

Choosing an approach

Which approach you decide to use may depend on where you believe the problem lies. For example, if a user is unable to browse the Web and you know that many of your users have problems with spyware and Internet Explorer settings, you may want to start with the top-down approach. On the other hand, if the user mentions having just connected a laptop to the network and can't browse the Web, you might want to use the bottom-up method, since there's a good chance of a disconnected cable or a similar physical problem.

Do you use a troubleshooting methodology when dealing with networking problems? If so, post your approach in this article's discussion. How important do you think it is to have a troubleshooting methodology?


David Davis has worked in the IT industry for 12 years and holds several certifications, including CCIE, MCSE+I, CISSP, CCNA, CCDA, and CCNP. He currently manages a group of systems/network administrators for a privately owned retail company and performs networking/systems consulting on a part-time basis.


cheers Aurobindo
