14.7. Problem Symptoms
Some problems, unfortunately,
aren't as easy to identify as the ones we listed. You'll
experience some misbehavior but won't be able to attribute it
directly to its cause, often because any of a number of problems can
cause the symptoms you see. For cases like this, we'll suggest
some of the common causes of these symptoms and ways to isolate them.
14.7.1. Local Name Can't Be Looked Up
The first thing to do when a program like telnet or ftp can't look up a
local domain name is to use nslookup or dig to try to look up the same
name. When we say "the same name," we mean literally the same name --
don't add labels and a trailing dot if the user didn't type either one.
Don't query a different name server than the user did.
As often as not, the user mistyped the name or doesn't
understand how the search list works and just needs direction.
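To make "literally the same name" concrete, here's a rough Python sketch of the order in which a resolver might try names. It's a simplified model, not any resolver's actual code: we assume the behavior of newer BIND resolvers, where a name containing at least one dot is tried as typed before the search list is applied, a name with no dots goes through the search list first, and a trailing dot suppresses the search list entirely. The function name candidate_names and the ndots parameter are our own, for illustration.

```python
def candidate_names(typed, search_list, ndots=1):
    """Return the names a resolver would try, in order (simplified model)."""
    if typed.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [typed]
    # Each search list entry is appended to the name exactly as typed.
    searched = [f"{typed}.{domain}." for domain in search_list]
    as_is = [typed + "."]
    if typed.count(".") >= ndots:
        # Enough dots: try the name as typed first, then the search list.
        return as_is + searched
    # Too few dots: the search list comes first, the bare name last.
    return searched + as_is


if __name__ == "__main__":
    for name in ("wormhole", "wormhole.movie.edu", "wormhole.movie.edu."):
        print(name, "->", candidate_names(name, ["movie.edu"]))
```

Under these assumptions, a user who types wormhole.movie.edu (no trailing dot) and a troubleshooter who types wormhole.movie.edu. are not necessarily sending the same sequence of queries.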
Occasionally, you'll turn up real host configuration errors:
You can check for either of these using nslookup's set all command.
If nslookup points to a problem with the name
server rather than with the host configuration, check for the
problems associated with the type of name server. If the name server
is the primary master for the zone, but it isn't responding
with data you think it should:
- Check that the zone data file contains the data in question and that
the name server has loaded it (problem 2). A database dump can tell
you for sure whether the data was loaded.
- Check the configuration file and the pertinent zone data file for
syntax errors (problem 5). Check the name server's
syslog output for indications of those errors.
- Ensure that the records have trailing dots, if they require them
(problem 6).
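The trailing-dot check is easier to reason about if you remember how a name server expands names in a zone data file. The following Python sketch models that expansion under the usual master file rules: a name ending in a dot is left alone, "@" stands for the origin, and anything else has the origin appended. The function name expand_name is ours, not part of any BIND tool.

```python
def expand_name(name, origin):
    """Expand a zone-file name the way a name server reads it.

    `origin` is expected to be fully qualified, e.g. "movie.edu.".
    """
    if name == "@":
        # "@" is shorthand for the current origin.
        return origin
    if name.endswith("."):
        # Already fully qualified: nothing is appended.
        return name
    if origin == ".":
        # Appending the root just adds the final dot.
        return name + "."
    # No trailing dot: the origin is appended, wanted or not.
    return f"{name}.{origin}"


if __name__ == "__main__":
    for name in ("terminator", "terminator.movie.edu", "terminator.movie.edu."):
        print(f"{name:24} -> {expand_name(name, 'movie.edu.')}")
```

The second case is the classic mistake: terminator.movie.edu without the dot becomes terminator.movie.edu.movie.edu.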
If the name server is a slave server for the zone, you should first
check whether or not its master has the correct data. If it does and
the slave doesn't:
- Make sure you've incremented the serial number on the primary
master (problem 1).
- Look for a problem on the slave in updating the zone (problem 3).
If the primary master doesn't have the correct data, of course,
diagnose the problem on the primary.
If the problem server is a caching-only name server:
- Make sure it has its root hints (problem 7).
- Check that your parent zone's delegation to your zone exists
and is correct (problems 9 and 10). Remember that to a caching-only
server, your zone looks just like any other remote zone. Even though
the host it runs on may be inside your zone, the caching-only name
server must be able to locate an authoritative server for your zone
from your parent zone's servers.
14.7.2. Remote Names Can't Be Looked Up
If your local lookups succeed but you
can't look up domain names outside your local zones, there is a
different set of problems to check:
- First, did you just set up your name servers? You might have omitted
the root hints data (problem 7).
- Can you ping the remote zone's name
servers? Maybe you can't reach the remote zone's servers
because of connectivity loss (problem 8).
- Is the remote zone new? Maybe its delegation hasn't yet
appeared (problem 9). Or the delegation information for the remote
zone may be wrong or out of date due to neglect (problem 10).
- Does the domain name actually exist on the remote zone's
servers (problem 2)? On all of them (problems 1 and 3)?
14.7.3. Wrong or Inconsistent Answer
If you get the wrong answer when looking up a local domain name, or an
inconsistent answer depending on which name server you ask or when
you ask, first check the synchronization between your name servers:
- Are they all holding the same serial number for the zone? Did you
forget to increment the serial number on the primary master after you
made a change (problem 1)? If you did, the name servers may all have
the same serial number, but they will answer differently out of their
authoritative data.
- Did you roll the serial number back to one (problem 1 again)? Then
the primary master's serial number will appear much lower than
the slaves' serial numbers.
- Did you forget to reload the primary master (problem 2)? Then the
primary will return (via nslookup or
dig, for example) a different serial number from
the one in the zone data file.
- Are the slaves having trouble updating from their master(s) (problem
3)? If so, they should have syslogged
appropriate error messages.
- Is the name server's round robin feature rotating the addresses
of the domain name you're looking up?
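To see why a rolled-back serial number leaves the slaves stuck, it helps to compare serials the way a slave does. Here's a short Python sketch, assuming the name servers use the sequence-space arithmetic described in RFC 1982 (serials are 32-bit and "newer" means ahead by less than half the number space). The function names are ours, and the serial numbers are made up.

```python
SERIAL_SPACE = 2 ** 32
HALF_SPACE = 2 ** 31


def serial_newer(candidate, current):
    """True if `candidate` is newer than `current` under RFC 1982 rules."""
    diff = (candidate - current) % SERIAL_SPACE
    # diff == 0 means equal; diff == HALF_SPACE is undefined, so treat as not newer.
    return 0 < diff < HALF_SPACE


def slave_will_transfer(master_serial, slave_serial):
    """A slave only asks for the zone if the master's serial is newer."""
    return serial_newer(master_serial, slave_serial)


if __name__ == "__main__":
    print(slave_will_transfer(2001010101, 2001010100))  # normal increment
    print(slave_will_transfer(1, 2001010100))           # serial rolled back to one
    print(slave_will_transfer(2001010100, 2001010100))  # forgot to increment
```

In the second and third cases the slave sees nothing newer on its master, so it keeps serving its old copy of the zone.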
If you get these results when looking up a domain name in a remote
zone, you should check whether the remote zone's name servers
have lost synchronization. You can use tools like nslookup and dig to
determine whether the remote zone's administrator forgot to
increment the serial number, for example. If the name servers answer
differently from their authoritative data but show the same serial
number, the serial number probably wasn't incremented. If the
primary master's serial number is much lower than the
slaves', the primary's serial number was probably
accidentally reset. We usually assume a zone's primary master
name server is running on the host listed in the MNAME (first) field
of the SOA record.
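If you collect each name server's SOA record as a single line of seven fields (newer versions of dig print it this way with +short), a few lines of Python can pull out the MNAME and group the servers by serial number. This is only a sketch: parse_soa and group_by_serial are our own names, and the record contents and server names below are invented for the example.

```python
from collections import namedtuple

SOA = namedtuple("SOA", "mname rname serial refresh retry expire minimum")


def parse_soa(rdata):
    """Split one line of SOA data into named fields."""
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    mname, rname = fields[:2]
    # The remaining five fields are all integers.
    return SOA(mname, rname, *map(int, fields[2:]))


def group_by_serial(soa_by_server):
    """Map each serial number to the servers that reported it."""
    groups = {}
    for server, rdata in soa_by_server.items():
        groups.setdefault(parse_soa(rdata).serial, []).append(server)
    return groups


if __name__ == "__main__":
    answers = {
        "terminator": "terminator.movie.edu. al.robocop.movie.edu. 2000091400 10800 3600 604800 86400",
        "wormhole": "terminator.movie.edu. al.robocop.movie.edu. 2000091300 10800 3600 604800 86400",
    }
    print(parse_soa(answers["terminator"]).mname)
    print(group_by_serial(answers))
```

More than one group in the result means the servers disagree about the zone's serial number; the MNAME field tells you which server we'd assume is the primary master.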
You probably can't determine conclusively that the primary
master hasn't been reloaded, though. It's also difficult
to pin down updating problems between remote name servers. In cases
like this, if you've determined that the remote name servers
are giving out incorrect data, contact the zone administrator and
(gently) relay what you've found. This will help the
administrator track down the problem on the remote end.
If you can determine that a parent name server -- a remote
zone's parent, your zone's parent, or even one in your
zone -- is giving out a bad answer, check whether this is coming
from old delegation information. Sometimes this requires contacting
both the administrator of the remote zone and the administrator of
its parent to compare the delegation and the current, correct list of
authoritative name servers.
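Comparing the delegation with the authoritative list is a set comparison at heart. The Python sketch below assumes you've already gathered the two lists of name server names by hand (from the parent's NS records and from the zone's own NS records); compare_delegation is a hypothetical helper, and the server names are examples only.

```python
def compare_delegation(parent_ns, authoritative_ns):
    """Compare the parent's NS list with the zone's own NS list.

    Names are compared case-insensitively and without the trailing dot.
    """
    parent = {name.lower().rstrip(".") for name in parent_ns}
    auth = {name.lower().rstrip(".") for name in authoritative_ns}
    return {
        # Delegated by the parent but no longer listed by the zone itself.
        "stale": sorted(parent - auth),
        # Listed by the zone but missing from the parent's delegation.
        "missing": sorted(auth - parent),
    }


if __name__ == "__main__":
    print(compare_delegation(
        ["terminator.movie.edu.", "wormhole.movie.edu."],
        ["terminator.movie.edu.", "zardoz.movie.edu."],
    ))
```

Anything under "stale" or "missing" is worth raising with the parent zone's administrator.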
If you can't induce the administrator to fix the data or if you
can't track down the administrator, you can always use the bogus
substatement or bogusns directive to instruct your name server not to
query that particular server.
14.7.4. Lookups Take a Long Time
Slow name resolution is usually due to one of two problems:
- Connectivity loss (problem 8), which you can diagnose with name
server debugging output and tools like ping
- Incorrect delegation information (problem 10) pointing to the wrong
name servers or the wrong IP addresses
Usually, going over the debugging output and sending a few
pings will point to one or the other: either you
can't reach the name servers at all, or you can reach the hosts
but the name servers aren't responding.
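When the debugging output and pings leave you unsure, it can help to time a query against one specific name server. The sketch below is a minimal illustration in Python, assuming plain UDP on port 53, a single attempt with no retries, and no checking of the reply's contents. build_query and time_query are hypothetical helper names, not part of any standard tool.

```python
import socket
import struct
import time


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query message (qtype 1 = A, class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.split(".") if label]
    # Each label is a length byte followed by its text; a zero byte ends the name.
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def time_query(server, name, timeout=3.0):
    """Return the seconds one query took, or None if no reply arrived."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    start = time.monotonic()
    try:
        sock.sendto(build_query(name), (server, 53))
        sock.recvfrom(512)
        return time.monotonic() - start
    except OSError:
        # Covers timeouts as well as errors such as an unreachable network.
        return None
    finally:
        sock.close()
```

Calling time_query once for each name server in the delegation shows which ones answer promptly and which never answer at all; a server that answers pings but returns None here is reachable but isn't responding to queries.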
Sometimes, though, the results are inconclusive. For example, the
parent name servers delegate to a set of name servers that
don't respond to pings or queries, but
connectivity to the remote network seems all right (a
traceroute, for example, will get you to the
remote network's "doorstep" -- the last router
between you and the host). Is the delegation information so badly out
of date that the name servers have long since moved to other
addresses? Are the hosts simply down? Or is there really a remote
network problem? Usually, finding out requires a call or a message to
the administrator of the remote zone. (Remember,
whois gives you phone numbers!)
14.7.5. rlogin and rsh to Host Fails Access Check
This is a problem you expect to see right after you set up your name
servers. Users unaware of the change from the host table to domain name
service won't know to update their .rhosts files. (We covered what needs
to be updated in Chapter 6, "Configuring Hosts".) Consequently,
rlogin's or rsh's access check will fail and deny the user access.
Other causes of this problem are missing or incorrect
in-addr.arpa delegation (problems 9 and
10) or forgetting to add a PTR record for the client host (problem
4). If you've recently upgraded to BIND Version 4.9 or newer
and have PTR data for more than one in-addr.arpa zone in a single zone data
file, your name server may be ignoring the out-of-zone data. Any of
these situations will result in the same behavior:
% rlogin wormhole
Password:
In other words, the user is prompted for a password despite having
set up password-less access with .rhosts or hosts.equiv. If you were to
look at the syslog file on the destination host (wormhole.movie.edu, in
this case), you'd probably see something like this:
May 4 18:06:22 wormhole inetd[22514]: login/tcp: Connection
from unknown (192.249.249.213)
You can tell which problem it is by stepping through the resolution
process with your favorite query tool. First query one of your
in-addr.arpa zone's parent name servers for NS records for your
in-addr.arpa zone. If these are correct, query the name servers listed
for the PTR record corresponding to the IP address of the rlogin or
rsh client. Make sure they all have the PTR
record and that the record maps to the right domain name. If not all
the name servers have the record, check for a loss of synchronization
between the primary master and the slaves (problems 1 and 3).
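If you're not sure which names to ask for while stepping through the process, this Python sketch derives them from the client's IP address. It uses the standard ipaddress module to build the PTR owner name; parent_zones is our own helper that simply lists the enclosing domain names, one of which will be your in-addr.arpa zone (which one depends on how your network was delegated).

```python
import ipaddress


def ptr_name(address):
    """Return the fully qualified name that holds the PTR record."""
    return ipaddress.ip_address(address).reverse_pointer + "."


def parent_zones(name):
    """List the enclosing domain names, nearest first."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(1, len(labels))]


if __name__ == "__main__":
    name = ptr_name("192.249.249.213")
    print(name)
    for zone in parent_zones(name):
        print("  candidate zone:", zone)
```

For the client address in the syslog message above, the PTR record lives at 213.249.249.192.in-addr.arpa.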
14.7.6. Access to Services Denied
Sometimes rlogin and rsh aren't the only services to go. Occasionally
you'll install BIND on your server and your diskless hosts won't boot,
and hosts won't be able to mount disks from the server, either.
If this happens, make sure that the case of the domain names your
name servers return agrees with the case your previous name service
returned. For example, if you are running NIS and your NIS host maps
contain only lowercase names, you should make sure your name servers
also return lowercase domain names. Some programs are case-sensitive
and won't recognize names in a different case in a data file,
such as /etc/bootparams or
/etc/exports.
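A quick way to spot this kind of trouble is to look for names that match only when case is ignored. The Python sketch below assumes you've already pulled the host names out of a file such as /etc/exports yourself; case_mismatches is a hypothetical helper and the names are examples.

```python
def case_mismatches(dns_name, file_names):
    """Return names that equal `dns_name` only when case is ignored."""
    wanted = dns_name.lower()
    return [
        name for name in file_names
        # Same name apart from case, but not an exact match.
        if name != dns_name and name.lower() == wanted
    ]


if __name__ == "__main__":
    names_in_file = ["Wormhole.movie.edu", "terminator.movie.edu"]
    print(case_mismatches("wormhole.movie.edu", names_in_file))
```

Any name this returns would be rejected by a case-sensitive program even though it refers to the same host.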
14.7.7. Can't Get Rid of Old Data
Sometimes, after decommissioning a name server or changing a server's
IP address, you'll find the old address record lingering around. An old
record may show up in a name server's cache or in a zone data file weeks
or even months later. The record clearly should have timed out of any
caches by now. So why's it still there? Well, there are a few reasons
this happens. We'll describe the simpler cases first.
14.7.7.1. Old delegation information
The first (and simplest) case occurs if a parent zone doesn't keep up
with its children or if the children don't inform the parent of changes
to the authoritative name servers for the zone. If the edu
administrators have this old delegation information for movie.edu:
$ORIGIN movie.edu.
@           86400  IN  NS  terminator
            86400  IN  NS  wormhole
terminator  86400  IN  A   192.249.249.3
wormhole    86400  IN  A   192.249.249.254  ; wormhole's former
                                            ; IP address
then the edu name servers will give out the bogus old address for
wormhole.movie.edu.
This is easily corrected once it's isolated to the parent
zone's name servers: just contact the parent zone's
administrator and ask to have the delegation information updated. If
your parent zone is one of the gTLDs, you may be able to fix the
problem by filling out a form on your registrar's web site to
modify the information about the name server. If any of the child
zone's name servers have cached the bad data, kill them (to
clear out their caches), delete any backup zone data files that
contain the bad data, then restart them.
14.7.7.2. Registration of a non-name server
This is a problem unique to the gTLD zones: com, net, and org.
Sometimes, you'll find the gTLD name servers giving out stale address
information about a host in one of your zones -- and not even a name
server! But why would the gTLD name servers have information about an
arbitrary host in one of your zones?
Here's the answer: you can register hosts in the gTLD zones
that aren't name servers at all, such as your web server. For
example, you could register an address for www.foo.com through a com
registrar, and the com name servers will give out that
address. You shouldn't, though, because you'll lose a
fair amount of control over the address. If you need to change the
address, it could take a day or more to push the change through your
registrar. If you run the foo.com primary master name server, you
can make the change almost instantly.
14.7.7.3. What have I got?
How do you determine which of these problems is plaguing you? Pay
attention to which name servers are distributing the old data and
which zones the data relates to:
- Is the name server a gTLD name server? Check for a stale, registered
address.
- Is the name server your parent name server but not a gTLD name
server? Check the parent for old delegation information.
That's about all we can think to cover. It's certainly
not a comprehensive list, but we hope it'll help you solve the
more common problems you encounter with DNS and give you ideas about
how to approach the rest. Boy, if we'd only had a troubleshooting
guide when we started!