Please help me translate this. (100 points)

  • Thread starter: huangbaili

Disk Related Issues

12. If a NetWare volume has more than 2.5 to 3 million directory entries
allocated and/or used, high utilization can be experienced in specific
circumstances. This problem is specific to NetWare 4.11 due to the fact that
it allows a maximum of 16 million directory entries per volume. NetWare 4.10
will not experience this specific problem because a maximum of 2 million
directory entries per volume is allowed.
To check the number of directory entries per volume, load servman.nlm at the
server’s system console. Go to Volume information. This menu will display each
volume mounted on the server at the present time. Highlight a volume and press
<ENTER>. A table is now displayed specific to the volume selected. At the
bottom of the table (2nd and 3rd from the bottom) are the statistics,
“Directory entries” and “Used directory entries.” The Directory entries
statistic displays the number of directory entries allocated currently on the
volume. The Used directory entries statistic displays the number of directory
entries actually used out of the number allocated. Directory entries are stored
in what is known as a directory entry table or DET. The DET contains basic
information about files, directories, directory trustees, or other entities on
the volume. If either of these two numbers is greater than 2.5 million, high
utilization can occur. As this number grows larger (i.e., more files and
directories on the volume), the problem will progressively get worse.
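
The servman check above can be sketched as a small helper. This is only an illustration: the function name and constant are hypothetical, and the only number taken from the text is the 2.5 million mark.

```python
# Hypothetical helper; the threshold is the 2.5 million figure quoted above.
HIGH_UTILIZATION_THRESHOLD = 2_500_000


def det_at_risk(allocated_entries: int, used_entries: int) -> bool:
    """Return True if either servman statistic exceeds 2.5 million.

    allocated_entries: the "Directory entries" statistic.
    used_entries: the "Used directory entries" statistic.
    """
    return max(allocated_entries, used_entries) > HIGH_UTILIZATION_THRESHOLD
```

For example, a volume with 3,000,000 allocated entries would be flagged even if only 1,000,000 of them are in use.
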
All of the typical high utilization symptoms are experienced with this.
Namely: users lose connections to the server, saving/reading files will be
extremely slow (if not impossible), logins will be slow to non-existent,
server hangs, and abends in some rare situations.
Here is how the utilization problem is presented: An application running on
either the server or a client workstation does a trustee rights search for a
specific user or group on a NetWare volume using the API named Scan Bindery
Object Trustee Paths. This API does a flat scan of the entire directory entry
table from start to finish on the volume selected. There is no selective
searching of the directory entry table unless a point lower in the directory
tree (i.e., a subdirectory) is specified initially. In addition, there are
currently no other APIs that can return the information desired from this
search on a traditional NetWare file system. When this search is running, the
process locks the volume by holding the volume semaphore for the duration of
the entire search. It does not yield or release the semaphore until the search
is completed. This means that all access to this specific volume is queued up
and will wait until the semaphore is released to read or write data to/from
the volume. Because of the way the file system was architected, the rights
to the files are stored at the file system level, and the rights to the
directories are stored at the directory level. Thus, the use of this API is
very expensive because the search must go all the way down to the files to
return trustee assignments.

If a volume on your server has more than 2.5 to 3 million directory entries,
a test can be done (after hours!) to verify that this is actually a cause of
high utilization. Bring up NWAdmin (3.x, 95 or NT version) on a workstation.
Select a user and go to the user’s details screen. Select the “Rights to
Files and Directories” button. A screen is now displayed that lists the
volumes and trustee rights to those specific volumes. By default there are no
volumes displayed, and thus no trustees displayed. By selecting either the
“Find” button, or the “Show” button, a search of a volume (or specific
directory structure) is prepared and then run. After this procedure is
completed, a list of all specific trustee assignments for the user in question
is returned. The high utilization condition is seen during this search.
These two buttons use the Scan Bindery Object Trustee Paths API and will
increase the utilization of a server. Tests here at Novell have shown that
the workstation running the search can be tied up for 30 minutes with around
8 million directory entries on a volume (Pentium/133). Obviously faster server
hardware will result in faster response to the query. FYI:
NWAdmin will not use the problem API in any other function within the utility.
Again, it is not the program’s fault, it is the API’s problem.
We have duplicated this problem in NWAdmin (as mentioned above) and NetAdmin
with a similar search for trustees specific to a user or group. Novell’s
ftp server (ftpserv.nlm) will also use the API in question when a user
establishes a session to the host and does not have a home directory defined
within the DS database. If this is the case, a scan of the volume is initiated
to find what rights the user actually has to files/directories on the volume.
Obviously if the user has a home directory defined with DS, then the scan for
trustees will not occur. The ftp server will never run this scan for trustee
rights again during the currently established connection, only when the session
is first established.
We have also seen this problem in 3rd party products like BindView. Again, any
program (Novell or 3rd party) that makes use of this specific API can cause
high utilization on a server. If a trace is taken and analyzed, look for NCP
function 23, sub-function 71.
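
As a rough sketch of the trace analysis mentioned above, the filter below picks out requests with NCP function 23, sub-function 71. The record layout (NcpRequest and its fields) is a hypothetical stand-in for whatever your analyzer exports; only the two numbers come from the text.

```python
from typing import Iterable, List, NamedTuple


class NcpRequest(NamedTuple):
    """Hypothetical decoded trace record; real analyzers use their own formats."""
    connection: int
    function: int
    subfunction: int


def trustee_path_scans(requests: Iterable[NcpRequest]) -> List[NcpRequest]:
    """Keep only requests with NCP function 23, sub-function 71."""
    return [r for r in requests if r.function == 23 and r.subfunction == 71]
```

The connection numbers of the matching records would then point to the workstation or NLM that issued the scan.
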

The fixes for this problem that Novell currently recommends include the
following:
1. Do not use functions within programs (NWAdmin/NetAdmin) that make use of
the problem API (searching for trustee rights on a user or group).
2. If possible, wait until after hours to scan for trustees on a user or
group within NWAdmin. High utilization late at night is much easier to live
with than during production hours when users are waiting around to get their
work done.
3. If use of NWAdmin is needed, do not check trustees for the whole server
(every volume), or even at a volume level. Use the browser dialogue box and
select specific directories to perform the search on. This will greatly
reduce the number of directory entries to search through, and thus there is
less of a chance to have high utilization.
4. Reduce the number of directory entries allocated on a volume by removing
information and data from that volume. Purge the volume afterwards and then
dismount and remount it.
5. The RIGHTS command can be used to gather information and modify trustee
assignments; however, this command line driven utility is not as user friendly
as NWAdmin. This utility DOES NOT use the API in question.
6. NSS. NetWare 5 is Novell’s official solution to the problem currently.
Novell has tested NSS and found a much better response from the server with
high numbers of directory entries allocated on an NSS volume. NSS will work
on a visibility basis. It scans immediate subdirectories and on down.
Problems can occur if the user has thousands of directories all at the same
level; however, most users do not set up a file system like this.
Novell is working on a different solution to the problem other than the 6
items listed above. A different API is being investigated that will yield
control during the searching of the file system; i.e., release the volume
semaphore for other processes to use. This new API would still lock the
workstation for the duration of the search, but the server would not
experience nearly the severity of utilization as before. Thus, users would not
be disconnected, and the utilization of the server would not increase
dramatically to a point where service degradation is noticeable.

13. Update drivers to the current available from the card vendor. Novell does
not carry updated 3rd party drivers on its web site, or any other place.
Novell will ship the latest drivers available for NetWare on the NetWare CD-ROM,
but these drivers will not change; i.e., the NetWare 4.10 CD-ROM has drivers
dated in 1994 no matter if you bought it in 1994 or 1997 (the same is true of
intraNetWare, 1996 drivers). The drivers that ship with the NetWare CD are
given to Novell to be released with the product, that’s it. The updated
driver must be obtained from the manufacturer. Normally, a manufacturer will
update the disk driver a couple of times a year (one at least). Look for
drivers no older than 6-12 months from the current date. If a driver is old
(1994 or before), it may not have been certified to work with NetWare 4.10 or
NetWare 4.11. Check with the manufacturer to see if it is still certified,
and if they are still maintaining the NLM. If they are not going to update the
driver, or if they have stopped support for the driver, this should tell you
something (consider newer/different hardware).

14. Adequate “Free Blocks” are essential. A “Free Block” is a disk block
that has no salvageable files stored in it. A file that is deleted AND purged
is Free space. Maintain a minimum of 1000 free blocks on each NetWare volume
that has suballocation enabled. Suballocation uses free blocks to perform its
function. Suballocation is normally a low priority thread, meaning that it
only runs when the processor has nothing else to do and is idle. When disk
space is low (less than 10%) or when free blocks are low, suballocation can go
into an aggressive mode which causes it to become a normal priority thread. It
will take control of the volume semaphore until it has cleaned up and freed as
much space as possible. This locking of the volume semaphore causes other
processes, who are trying to use the volume, to wait until the semaphore is
released. In large installations, this results in an increase of Packet
Receive Buffers and File Service Processes. When the Packet Receive Buffers
max out, the server will start dropping connections and users are unable to
login. When suballocation completes its cleanup, the semaphore will be
released and the processes on the run-queue will be serviced. Maintaining over
1000 free blocks will usually avoid this problem. To be safe, give yourself a
buffer against this happening by maintaining your volume’s free space at 20%
or greater (you never know when someone will decide to copy their hard drive
up to the network, plus a couple of CD’s).
To check how many Free Blocks you have, go into servman | volume information |
<ENTER> on a volume and then look at the statistic free blocks. If there are
not at least 1000 free blocks on the volume, run a PURGE /ALL from the root
of the volume from a client workstation. This will free the freeable limbo
blocks, increasing your free space. If the server’s utilization is so high
that no one can log in, dismount the volume and run VREPAIR with the option
purge all deleted files selected. Remember that VREPAIR will report errors
when it purges files for you; don’t panic, this is normal. However, remember
that VREPAIR can be a ruthless utility and will destroy your data if it cannot
interpret it correctly. So, before running VREPAIR, consider the state of
your backup.
If you have applications that create large amounts of temporary files, you
may want to set the PURGE flag on the directory where these files are created.
Every temporary file that is created will be put on the deleted file list.
These files are kept on the disk until a PURGE is run. You could also “set
immediate purge of deleted files = on” (set at the file server level, and
applicable for every server volume).
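
The free block guidance in item 14 can be summarized in a short sketch. The function name and the idea of feeding it numbers read off the servman screen are assumptions; the 1000 block, 10% and 20% figures are the ones given above.

```python
from typing import List


def volume_space_warnings(free_blocks: int, free_space_percent: float) -> List[str]:
    """Return warnings for a suballocated volume, using the thresholds in item 14.

    Both inputs are assumed to be read manually from servman's volume
    information screen; this function does not talk to a server.
    """
    warnings = []
    if free_blocks < 1000:
        warnings.append("fewer than 1000 free blocks: purge deleted files")
    if free_space_percent < 10:
        warnings.append("free space below 10%: suballocation may turn aggressive")
    elif free_space_percent < 20:
        warnings.append("free space below the 20% safety margin")
    return warnings
```

An empty list means the volume meets both the minimum and the recommended safety margin.
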

15. If you are using suballocation, use a volume block size of 64K. This
block size is the fastest and most efficient block size for volumes with
suballocation enabled. Also, keep a MINIMUM of 10%-20% of the volume space
free to avoid the suballocation “aggressive” mode.

16. Decompression of files occurs on the fly. A Pentium processor (60 MHz) can
decompress on average one meg per second. The NetWare compression sub-system
will compress files up to and including 256 meg. This means that decompression
of a very large file, for example 100 meg, could take up to a minute and a
half (100 seconds) on such a machine. If NetWare is decompressing a large
file, utilization will be high for the duration of the decompression. This is
normal.
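
The arithmetic in item 16 is simply file size divided by decompression rate. A minimal sketch, assuming the 1 meg per second average quoted above for a 60 MHz Pentium (the function name is made up for illustration):

```python
def estimated_decompress_seconds(file_size_mb: float, mb_per_second: float = 1.0) -> float:
    """Estimate decompression time; the default rate is the 1 MB/s figure above."""
    if mb_per_second <= 0:
        raise ValueError("mb_per_second must be positive")
    return file_size_mb / mb_per_second


# The 100 meg example from the text works out to 100 seconds.
seconds_for_100_meg = estimated_decompress_seconds(100)
```

A faster processor would be modeled by passing a larger mb_per_second; the actual rate on other hardware is not stated in the text.
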

17. Directories that are flagged IC (immediate compress) can cause excessive
decompression and then recompression during production hours. Sometimes users
with disk space restrictions may flag their home directory with IC to save
disk space. Normally, compression is a low priority thread, which means it
only compresses files when the server is idle. When the IC flag is set,
compression is bumped up to a normal priority thread, and will not wait for
idle time. Care should be taken by administrators to ensure that directories
or files marked IC are rarely accessed. Excessive decompression and
recompression, or “Thrashing”, can be duplicated by setting the
“Days untouched before compression = 0” or “Convert Compressed to
uncompressed option = 0”, which means that each time a file is closed,
it is compressed immediately. Normally, the compression set parameters
should remain at their default.
Another cause of high utilization with compression is where a volume is
nearly full with compression enabled. In this situation, files will be
compressed and never committed decompressed due to the failure of
allocating enough space on the disk to hold the decompressed version.
This can also be caused by “Minimum file delete wait time” being set to a
large value thus not allowing any deleted files to be reclaimed for space to
commit a compressed file. This full volume situation is usually indicated
by the “Compressed files are not being committed” alert which will occur
on the system console. This can be fixed by setting “Decompress percent
disk space free to allow commit = <some number less than current>”. Remember,
there must be enough space on the volume to allow for the decompressed
version of the file to be committed in order for that file to be committed
decompressed.

18. To check if there is excessive compression/decompression activity,
activate the compression tracking screen using “Set compress screen = on”,
then toggle the system console to the compression screen. The screen displays
all compression/decompression activity. The lines per second can give an
indication of how busy compression is. Lines that are preceded with an *
indicate files that are being decompressed. By referring to monitor’s
processor histogram (in processor utilization), you can see the CPU time and
load used by compression.

19. “Set Deleted Files Compression Option = 2” will cause the immediate
compression of files that have been deleted. This can cause high utilization
because the processor is immediately compressing files upon their deletion.
Set this parameter to 1, the default; this will compress the file the next day.
20. File compression should be set to occur off-hours or during times of low
server usage. This is done through the use of the set parameters under the file
system category in servman. The default settings work fine. Make sure that
changes have not been made that are causing utilization problems. File
compression can consume a CPU’s total bandwidth if mishandled or expected to
perform extensively in a real-time environment. File compression requires CPU
cycles to compress and decompress data. The CPU cycles to decompress a file
will be taken when the file is accessed but the cycles to compress the file
should be the unused cycles (late at night or early in the morning) that would
otherwise be wasted in the idle loop.
21. Some disk statistics to watch:
A. Dirty cache buffers: Dirty cache buffers are cache buffers that have been
updated by the user and have not been written to disk within 3.3 seconds.
A flag is set on the cache buffer designating it as “dirty”, which indicates
that it should be written to disk before a regular cache buffer is. If dirty
cache buffers are consistently at or above 70% (dirty cache buffers / total
cache buffers), evaluate your disk system and upgrade to faster disk components
if necessary.
B. Current disk requests: Current disk requests are disk requests (read or
write) that have been issued but not completed. If this number is consistently
high, evaluate your disk channel. Be aware that if you increase maximum
concurrent disk cache writes, the current disk requests will also increase.
These set parameters can change the server from being faster at reading data,
or faster at writing data; consult the NetWare manuals for more details, and
set them according to the work environment (read or write intensive).
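
The dirty cache buffer rule in 21.A can be sketched as follows. The sampling approach and the function name are assumptions made for illustration; the 70% figure is from the text, and "consistently" is interpreted here as every sample being at or above it.

```python
from typing import Iterable, Tuple


def consistently_dirty(samples: Iterable[Tuple[int, int]], threshold: float = 0.70) -> bool:
    """Return True if every (dirty, total) sample is at or above the threshold.

    Each sample is a pair of numbers read from Monitor at one point in time:
    dirty cache buffers and total cache buffers.
    """
    ratios = [dirty / total for dirty, total in samples]
    return bool(ratios) and all(ratio >= threshold for ratio in ratios)
```

A single high reading is not treated as a problem; only a run of readings at or above 70% suggests evaluating the disk system.
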

Memory Issues

22. Adequate memory resources are critical in a server environment. The best
indication of adequate memory is the LRU Sitting Time (LRU stands for Least
Recently Used). This parameter is an indication of how often the server’s
ram is completely flushed with new data. Obviously, a higher number is better
because it indicates good usage of existing ram. Go to monitor | cache
utilization. The LRU sitting time should be a minimum of 20 minutes. Baseline
this number; know its average for your server. A number lower than this may
indicate you are low on physical RAM or the server is being overused for the
amount of hardware it has. The LRU Sitting time will fall into one of these
categories:

Above 40 min.   Excellent.
20-40 min.      Satisfactory to good (depending which number you are closer to).
Below 20 min.   Below average. Keep an eye on it.
Under 5 min.    Critical. There is definitely a problem with the server
                (unless it just came up 5 to 10 minutes ago, or if a heavy
                I/O operation just completed).
There was an incident reported to Novell Technical Services where the LRU
sitting time was reported at 13 seconds. The disk drive light never went off
and the customer could hear the drive thrashing constantly. This problem
turned out to be an error with Directory Services that was fixed with DSRepair.
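
The LRU sitting time categories above map onto a simple classifier. This is a sketch only: the function name is hypothetical, and the handling of the exact boundary values (20 and 40 minutes) is an assumption, since the text does not say which side they fall on.

```python
def classify_lru_sitting_time(minutes: float) -> str:
    """Map an LRU sitting time (in minutes) to the categories listed above."""
    if minutes > 40:
        return "excellent"
    if minutes >= 20:
        return "satisfactory to good"
    if minutes >= 5:
        return "below average"
    return "critical"
```

The 13 second incident described above corresponds to classify_lru_sitting_time(13 / 60), which falls in the critical range.
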

23. Another indication of a memory problem is the cache buffer statistics.
Divide the Total Cache Buffers by the Original Cache Buffers (statistics found
in Monitor). Here are some recommendations for the results:
Above 70%   Excellent. You are in good shape with the server memory.
50% - 70%   Satisfactory.
40% - 50%   Below average. Keep an eye on it. You may need to add memory to
            the server.
Below 40%   Critical, add more memory. If this is your situation, this would
            be the NUMBER ONE priority.

Servers will not only give you high utilization, but can abend and do a
number of other things detrimental to your data. There
are some programs that have been known to “Leak” memory, i.e., they take
memory until it is gone. When a server is below 40% cache buffers and this
happens (memory leak), ram disappears fast and data may be corrupted if the
server abends. See TID 2924988 for help in resolving server memory leaks.
See the definitions at the end of this document for an explanation of what
cache buffers are used for.
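
The cache buffer calculation in item 23 can be written out as a short sketch. The function name is made up, and placing the exact boundary values (40%, 50%, 70%) in the higher band is an assumption; the bands themselves are from the text.

```python
def cache_buffer_health(total_cache_buffers: int, original_cache_buffers: int) -> str:
    """Divide Total by Original Cache Buffers and map the result to item 23's bands."""
    if original_cache_buffers <= 0:
        raise ValueError("original_cache_buffers must be positive")
    percent = 100.0 * total_cache_buffers / original_cache_buffers
    if percent > 70:
        return "excellent"
    if percent >= 50:
        return "satisfactory"
    if percent >= 40:
        return "below average"
    return "critical"
```

For instance, 3000 total out of 10000 original cache buffers is 30%, which lands in the critical band where adding memory is the first priority.
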


24. The “Cache Utilization” screen in Monitor is another good check for the
proper memory needed by the server to function correctly. Besides the LRU
sitting time, the four actual cache hit statistics will tell you how well your
server retrieves information for its users. Both “short term ... hits” items
should read 100%. If these drop, and stay at lower numbers, ram is needed for
the server. This would and should have been recognized in the previous steps.
The “long term cache hits” will normally be in the 90's. This is the
percentage of disk blocks requested that were already in cache. This value
accumulates from the time the server is started. Because it reflects a long
term history of memory utilization, it is the most accurate way to assess
overall cache utilization. If one of these numbers is constantly in the 80's
or below, add ram. While in the cache utilization screen in monitor, hit <F1>
for an explanation of what each statistic means.
One incident was reported to Novell where the user had a server with 512 meg
of ram, 40 gig of hard drive space and his cache statistics were
unrealistically low. Plus, he had only 3 users on the network. The customer
just knew there was a problem with Novell’s operating system and he wanted us
to fix it. It turns out that he was using a recursively called command in a
batch file to copy files from a workstation to the server. This batch file
would run for extended periods of time with straight copying of files. With
this situation, it is only natural for the cache statistics to be low because
the ram is being flushed constantly and most of the files weren’t accessed
while they were in ram. Consequently the cache statistics were low, which in
this situation was a correct report by NetWare. The customer was trying to
“test” his server for what he thought was normal usage. If this is a situation
with your server, do not do what this customer did. Novell suggests that the
internet be searched for benchmarking utilities for NetWare. Also, at
http://developer.novell.com many companies have performed this kind of test on
their servers and Novell posts what is called Certification Bulletins to let
customers know what has and what has not been tested and certified.


25. If you have SCSI devices attached to the server, which most customers now
do, make sure that the line “Set reserved buffers below 16 meg = 200” is in
your startup.ncf file (or iostart.ncf for SFTIII). There are situations where
high utilization can result if this parameter is not set. It is required if
an aspi layer is loaded on the server. This set parameter has a maximum of 300,
so don’t get carried away with it.

Directory Services Issues

26. Update DS to version 5.12 or later for NetWare 4.10, and 5.99a or later
for NetWare 4.11. Update ALL of your 4.1x servers (this is one patch that
needs to be applied to every server at pretty much the same time). If you ever
have an issue that needs to be resolved through the help of a Directory
Services engineer, they will ask you to update to the latest ds patch. Update
now, save yourself some headaches in the future.


27. The Directory Services tree should have no more than three replicas of
each partition. DS needs to keep synchronization among all the servers in the
replica ring. The more replicas there are of any partition, the more traffic
will be on the wire, and the higher the utilization. The exception to the
rule here occurs when bindery emulation is required, thus necessitating the
placement of more than 3 real copies of the partition on servers.

28. Check for Directory Services synchronization problems at the server
console by typing:
set dstrace = !D60 (This will turn OFF all inbound and outbound syncing
for 60 minutes)
set dstrace = !E (This will cancel the !D)
This test will tell you if the ds synchronization is causing your high
utilization. After turning synchronization off, wait about 15 minutes to give
the server a chance to catch up with queued work to do. The number following
the !D is in minutes and can be anything you want; i.e., set dstrace = !D120
turns syncing off for 2 hours. This specific test is only useful in a
multiple server environment. If you have only one 4.x server, skip this step.

The D switch will not inhibit the authentication process of users logging in;
rather, it does not allow the server to synchronize its DS changes with other
servers. That’s it.
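
To make the relationship between the minutes value and the console command explicit, here is a small sketch that builds the two command strings from item 28. It only formats text to type at the server console; the function and constant names are hypothetical.

```python
# Command that cancels a previous !D, as given in item 28.
ENABLE_SYNC_COMMAND = "set dstrace = !E"


def disable_sync_command(minutes: int) -> str:
    """Build the console command that turns DS syncing off for the given minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be a positive whole number")
    return f"set dstrace = !D{minutes}"
```

disable_sync_command(120) yields the two hour example from the text, and ENABLE_SYNC_COMMAND is what you would type to cancel the test early.
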


29. Check for other DS errors by typing the following at the server console:

set dstrace = on (This will turn on the directory services trace console
screen. A set dstrace = off will turn this screen off.)
set dstrace = +s (The s turns on the synchronization filter, allowing the
sync traffic to be displayed on the trace screen. A set dstrace = -s would
disable the displaying of sync information.)
set dstrace = *h (The h stands for heartbeat. This command schedules an
immediate synchronization of all partitions and replicas stored on this
particular server. In other words, sync right now!)
For information on other dstrace commands and ds error code meanings,
there are multiple TIDs written, and articles have been published in AppNotes
concerning these items (e.g., Feb 1997 AppNotes has a good article on dstrace
and all of the possible switches and filters).
Toggle over to the Directory Services console, and watch the screen. The main
thing to look for is the statement “All processed = YES”. You will see other
statements indicating which partitions
are syncing. If there is an error with Directory Services, it will probably be
shown here, and an “all processed = NO” (shown in red) is a good indication
that there is a Directory Services issue that needs to be addressed. This
means that one of the replicas / partitions did not finish the syncing process
because of an error condition. You should be able to tell which one it is from
this screen, and that will give you a good indication of which server has the
DS errors you need to work on. Watch for synchronization errors that never go
away. If there is more than one 4.x server, go to those servers and run this
same command to make sure that they are syncing correctly also; just looking at
one server will not indicate the syncing problems of other servers.
If there are many replicas on this server, the dstrace screen will be difficult
to follow and to find an error message on. Go to dsrepair.nlm and run Check
synchronization status from the main menu. This will give you a good overview
of how each partition is syncing, and error messages will be listed on the
screen. Again, check for TIDs, or call Novell Technical services for help in
troubleshooting specific directory service issues.


30. A lot of bindery emulation can increase utilization. The processor only
creates one process thread for ALL of the bindery connections; this includes
users, printers, and anything else that requests a bindery connection to the
server. So, if there are 200 users and 40 printers (all using bindery emulation),
they will all queue up on the one processor thread available and dedicated to
servicing bindery requests. This backlog of requests will dramatically
increase utilization. Customers have seen slight decreases with the changing
of communication set parameters, however, the utilization problem is not
resolved, rather it is slowed down or delayed for a longer period of time.
The largest percentage of customers who see this problem are those who upgrade
from NetWare 3.x to NetWare 4.x and leave all of their users with bindery
connections to the server (using the netx client).

Printing Issues

31. Printing:

- No more than about 40 printers serviced from one server.
- Netport must be set to be NDS aware.
- HP Jetdirect: firmware upgrade to be NDS aware, AND change the setup
(Jetadmin) for NDS.

It is recommended, if you are concerned about utilization problems, that you
set your print devices to do the processing instead of the server. This will
slow down printer output, but will relieve the utilization on the file server.


32. Check all of the print queues for corruption, or for anything unusual.
There have been high utilization incidents reported because of a corrupt print
queue. As soon as the queue was accessed (a job was submitted), the server
utilization would spike (and stay there). One specific incident involved a bad
queue with over 1500 jobs still remaining in the queue. After the queue was
deleted through pconsole, everything worked fine. Corrupt queues can also
cause abends on a server anytime they are accessed.
The process to delete the queue is simple. The first step is to identify the
print queue where the problem is. From Pconsole, choose Print Queues and
highlight each print queue and press <ENTER> respectively. If the server
abends or exhibits high utilization immediately after accessing the queue,
this is a damaged print queue. Go to the Information option and press <ENTER>
on this; this option will give the queue name (for example 0703003E). This is
the name of the directory in SYS:QUEUES (unless they have been moved to another
location). The directory should be deleted (through Filer or DOS, etc.), and
then the queue removed through Pconsole, in that order. After deleting the
corrupt print queue, some users have dismounted the volume and run VREPAIR
with the purge option on. This will ensure that the queue directory is cleaned
from the server.


33. Sometimes HP Jet Direct cards will storm the server with requests to login.
This happens if there is a corrupt print queue that it is trying to attach to.
The print server must attach to the print queue to correctly print. When the
queue becomes corrupt, it can no longer attach, and therefore storms the server
in an effort to establish communication to the queue. If the Jet Direct cards
are using the bindery to login (not NDS), then setting the bindery context to
NOTHING can prevent the cards from attaching to the server. The obvious
resolution to this is to delete the print queue and recreate it. To
troubleshoot this problem, do the following: Type “set bindery context=”
and then hit <ENTER> at the system console prompt. If the utilization drops
after typing this, look for corrupt queues, or bindery objects accessing the
server that could cause high utilization. Remember, by not having a bindery
context, every user or object that is/was bindery dependent can no longer
attach or login to the server. So, if there are bindery users connected to
the server, they will stay connected, but the server will not allow any new
bindery connections. To reset the bindery context, type it back in at the
system console prompt. Look in the autoexec.ncf file for the setting, or check
in servman | Directory Service Parameters if the original context setting is
forgotten.

Client / Workstation Issues

34. Remote and rspx will not cause high utilization, but they will aggravate
it. No matter what is going on, rconsole, remote, and rspx will not cause high
utilization (unless they are corrupt, and then there is a possibility).
Usually, if high utilization comes about when these are used, this is an
indication of something else being wrong. If Processor Utilization reports
remote as being a big source of utilization (40-60% of the load), then it is
possible that rconsole on a Windows 3.x/95/98/NT machine is a background
process. When rconsole goes to a background process from a foreground process,
the server spends more time trying to contact that session (to update screens,
etc.) than it would were that process a foreground process on the machine.
Rconsole warns users of this when it first loads by saying that “AS Windows
may cause Rconsole to behave erratically”. If this is the case, either unload
remote and rspx, or change the rconsole session to a foreground session; an
immediate change should be noticeable.

35. Check for hung connections, and clear them; this is done in monitor.

36. Update client pieces ... regardless of whose client it is (Novell’s,
Microsoft’s, etc.). There have been many fixes for Novell’s client. Most of
the fixes are rolled forward to the latest client, not back. It is
advantageous to use these fixes because they can not only prevent problems
(like high utilization resulting from bad client requests to the server), but
they fix other problems (caching, file locking, etc.). Client problems can
sometimes be identified by changing the following set parameters. Go into
servman | server parameters | NCP. Now set the following parameters:
Display NCP bad component warnings = on
Reject NCP packets with bad components = on
Display NCP bad length warnings = on
Reject NCP packets with bad lengths = on
By setting these parameters, the server will display a warning message each
time a bad packet is received at the server. The warning message will also
include the MAC address of the device (NIC, etc.) or the connection
number (the MAC can be found by checking the connection number through
monitor) from which the packets were sent. It is really helpful here if
you have a “map” of your area and the MAC address of each NIC in each
machine; that way you can walk right to the machine that is sending bad
packets, instead of searching for it.
This will help track down a possible bad NIC in a client machine. This will
also prevent the server from handling bad packets and thus increasing the load
on the processor. If a client application stops working after these set
parameters are enabled, ask yourself if this application could be the problem.
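
The “map” idea above can be kept as a small lookup table. The following is only a sketch: the MAC addresses, the locations, and the function names are all made up for illustration, and the MAC from the warning message is assumed to be typed in by hand.

```python
# Hypothetical inventory: MAC address -> where the machine physically sits.
# Every entry here is invented for illustration.
NIC_MAP = {
    "00:A0:24:11:22:33": "Room 101, desk 4",
    "00:60:08:AA:BB:CC": "Room 204, print station",
}


def normalize_mac(mac):
    """Reduce common spellings (00a024112233, 00-A0-24-..., 00:a0:...) to one form."""
    digits = "".join(ch for ch in mac if ch.isalnum()).upper()
    if len(digits) != 12 or any(ch not in "0123456789ABCDEF" for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def locate(mac, nic_map=NIC_MAP):
    """Return the recorded location for a MAC, or a marker if it is not in the map."""
    return nic_map.get(normalize_mac(mac), "unknown (not in the map)")
```

Normalizing first means the lookup works no matter how the address was written down when the map was built.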


37. Some utilities will use a “Flat bindery scan” for the information you’ve
requested. Syscon, for example, does a flat bindery scan for trustees. This
will temporarily peg utilization. If your high utilization is related to
running a particular command at the workstation, this may be the reason and
explanation.
One customer had a pretty serious problem with high utilization that turned
out to be related to the bindery, with a twist to the resolution. After turning
off the bindery context, and seeing the subsequent drop in utilization, it
was apparent that a bindery request was the problem. However, all of the
printers/queues were working fine (they were bindery), and none of the users
were making bindery connections to the server. Plus, the bindery context was
absolutely required so that the printers could operate during production hours.
So, with no corrupt printing objects, and no users making bindery requests,
it was some other item on the network making a bindery request of this
specific NetWare server. After analyzing several LAN traces, it was discovered
that there was a program running on an NT Server that was making bindery
requests to the NetWare server. After disabling this program, the utilization
returned to normal.

LAN Issues

38. Update lan drivers to the current version available from the card vendor. Also,
update the tsm (Topology Specific Module), msm (Media Support Module), and
nbi (NetWare Bus Interface) support modules from Novell. You’ll find these
on our web site (support.novell.com) in the file LandrX.exe (where X is the
current revision of the file) for NetWare 4.10, or in IWSPX for NetWare 4.11.

The latest LandrX patches ship with both ODI Spec 3.2 and ODI Spec 3.3 drivers.
If you are not sure which set of drivers your lan card can work with or
support, choose the 3.2 spec. There have been instances where the 3.3 spec
was applied to servers, and the users could no longer log in (not much of a
server now). The ODI Spec 3.3 is brand new with NetWare 4.11. It will only
work if your lan card is specified to work with it. Check with your
manufacturer for compatibility; do not guess about this. In the latest patch
kit for NetWare 4.11, while installing the patch, you have the option to
install the ODI 3.31 drivers. Again, because this is even newer than the
3.30 ODI spec, double check and make sure you have access to a 3.31 driver
for your specific lan card. There will be problems (usually not utilization
problems, but rather, communication issues) if a 3.30 lan driver is mixed with
3.31 spec modules! Need a driver that is guaranteed to be ODI 3.30 or 3.31?
Novell writes one, it’s called NE2000.LAN.
Note for Compaq users: Compaq users SHOULD NOT update their lan drivers with
the LandrX patch kit. They need to speak with Compaq about getting a set of
updated drivers. As of this writing,
http://www.compaq.com/support/files/server/softpaqs/Netware/NSSD.html
is the web site for obtaining the latest set of patches from Compaq. These are
the drivers that Compaq has certified to work with their machines running
NetWare. Stick with these.

39. Use pbrstoff.nlm to disable packet burst at the server AS A TEST. Then
have all the users log off and then log back on again (once pbrstoff.nlm is
loaded, it only affects new logins). You can also use “Set enable packet
burst statistics screen = on” to verify that there is no packet burst
happening (FYI, Novell Application Notes Feb 1996 has a great explanation of
how to read and interpret this screen). If the high utilization clears up,
there is some type of lan channel problem. Be sure to unload pbrstoff.nlm
after the test. Pbrstoff.nlm can be found in the file Tabnd2a.exe. The 4.10
packet burst disabling nlm will not work with 4.11. Contact Novell if the
4.11 version is needed.
If you do have a lan channel problem, start with the basics (drivers, lan
cards) and then work up to the advanced (cabling, concentrators, switches,
auto-sensing hubs, MSAU’s, routers, CSU/DSU, etc).

 
40. Most lan card manufacturers recommend Category V cabling be used with
100 Mb lan cards.
 
41. Unplug the server from the lan. If the high utilization is related to a
lan issue, the utilization on the server will IMMEDIATELY drop. If this is the
case, EVERYTHING on the lan is suspect; this includes routers, hubs,
concentrators, MSAU’s, lan cards, cabling, etc. Try to eliminate as many
variables as possible so that finding a bad piece of hardware (if that is what
is causing the problem) can be expedited. Remember that disconnecting a
server from the LAN also disables its DS synchronization process.
To help customers understand this problem, Novell has had the customer make a
separate lan (one MSAU, or concentrator) with all of the servers on it. This
will then indicate if it is a problem with a server or with the lan itself.

 
42. If an RJ-45 plug has been jarred loose from a lan card/concentrator, it
can cause a broadcast storm on the lan. This is a pain to check, but it is a
possibility because the server will continue to send requests out to the card.
This kind of situation is usually seen when an RJ-45 plug has been damaged
where a couple of the wires have been pulled out, but the others are still
connected.

43. Some customers report large amounts of packets on the lan channel
(viewed through either monitor, or their own favorite 3rd party program).
This can lead to high utilization. Some customers even see this kind of
traffic while there are no users logged in to the network. The best solution
to this is to type “track on” at the server console, and see if there are
specific MAC addresses that are sending this kind of traffic. It is possible
that a misconfigured router is causing the problem. FYI: To check the MAC
address of a server, type “config” and look under the network card
information area; there will be a label “node address”, which is the MAC for
the server.


44. Often during a high utilization situation, low memory is either the
cause, or the situation causes a low memory condition. If that occurs, you
may see the Get ECB requests failed count incrementing. This statistic is
viewed in servman.nlm | Network information. This is often a sign that
the server does not have enough memory to support the users at peak loads.
The No ECB Available Count statistic in monitor.nlm | LAN/WAN information |
Select a Lan driver and hit <ENTER>, refers to the number of times the server
received a packet from the LAN but did not have an ECB (Event Control Block)
available to store it. When a NetWare server receives a packet, it
must first be stored in an ECB before the NetWare OS can process it. When all
ECBs are being used and the server receives a packet, the packet is
dropped and the no ECB count is incremented. The cause for this count to
increment is that there are too few packet receive buffers for the number
of network packets coming in. This can be because the maximum number of
packet receive buffers is too low, there are too many users on the server for
the hardware to keep up, or the network is saturated with packets due to a
network problem.
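
The buffer behavior described above can be pictured with a toy model. This is a sketch under simplifying assumptions, not how the NetWare OS is implemented: time is cut into ticks, the pool size stands in for the number of packet receive buffers, and the function and parameter names are invented for illustration.

```python
def simulate_receive(arrivals, buffers, processed_per_tick):
    """Toy model of a fixed receive-buffer pool.

    arrivals: number of packets arriving in each tick
    buffers: size of the buffer pool (stand-in for packet receive buffers)
    processed_per_tick: packets the OS can take out of buffers per tick
    Returns (packets_processed, no_ecb_count).
    """
    in_use = 0      # buffers currently holding an unprocessed packet
    processed = 0
    no_ecb = 0      # packets dropped because no buffer was free
    for incoming in arrivals:
        for _ in range(incoming):
            if in_use < buffers:
                in_use += 1          # packet stored in a free buffer
            else:
                no_ecb += 1          # pool exhausted: packet dropped, counter bumped
        handled = min(processed_per_tick, in_use)
        in_use -= handled            # OS processes packets and frees their buffers
        processed += handled
    return processed, no_ecb
```

With the same traffic, a larger pool lowers the drop counter, and a faster consumer lowers it too, which matches the three causes listed above: too few buffers, hardware that cannot keep up, or too many packets arriving.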


45. Directory services can consume a great deal of IPX sockets on a server.
Sometimes, an increase in the number of available IPX sockets can decrease the
amount of time DS waits for a socket to become available. To change the number
of available IPX sockets, do one of the following:
1. At the server console prompt type “Load spxconfg” then look over the
parameters presented. Change parameter number 6 from its default of 1200 to
2400. This change is dynamic and is immediately realized on the server.
2. Type “Load spxconfg i=2400 q=1” at the system console prompt. This will
invoke spxconfg.nlm and change the open IPX sockets on the fly. The q=1
parameter tells spxconfg to close its screen after the parameter is
changed. This command line option can also be placed in the autoexec.ncf file
to make the change permanent if the server is rebooted.

Miscellaneous Issues

46. Appletalk - There was a problem with the ATXPR.NLM that caused high
utilization. The file 41mac1.exe contains the new version of ATXPR.NLM. This
nlm can be found in the file 41macX.exe (where X is the current revision of
the file).
There have been a few utilization/performance issues with MAC workstations
where they will freeze. When the server/network is busy, the MAC users notice
that the little AppleTalk arrows at the top left of their screen come on for
5-30 seconds. When this happens nothing they type appears on the screen.
When it stops (the arrows), the typed text all appears in a rush, and sometimes
the MACs need to be restarted. At these times the server is usually at 100%
utilization, with AFP processes using most of the server cycles. A sniffer
trace taken can show that during this time the MACs are sending AFP requests
to the server, but getting no replies. When the problem stops, the server
replies to all of the requests in one spurt. This problem may be seen with
the addition of more MAC workstations on the network.
Customers with this problem have tried many things (i.e., purging volumes,
patching the server, interrupt troubleshooting, server resource
troubleshooting, etc.). None of these things seem to make a difference with
the response of the server. However, rebuilding all of the desktops does seem
to help. Some customers have even rebuilt their desktops twice and finally
found problems the second time around. Other known issues to check: make sure
that “Calculate folder sizes” is OFF on all of the MACs. Also, if Admin sets
the view to LIST on the server and then opens a lot of folders, the
MACs will have to track the subfolder information when the server is mounted.
Changing the view in all of the subfolders back to ICON and then rebooting all
of the MACs will correct this problem. You can only change the view of a folder
when you are logged in as an Admin user.
One of the biggest helps to this problem is to update AppleTalk.nlm and its
associated message file with ATK51A.EXE. This is version 5.11a of AppleTalk.nlm
and is not released yet. However, some customers have found that this new
version helped their problem(s) greatly.


47. If rconsole is run from a Windows machine and left running while you
toggle to another Windows program, the server utilization will usually max
out. Hint: don’t do this. If you need to run rconsole in Windows 3.x, Windows
95/98, or Windows NT Workstation, fine; close it when you are done.
 
48. As a last resort before downing a server for high utilization, try this
(refer to Supervising the Network II, p.486):
1. Load Monitor | Scheduling Information
2. See which process is taking up the greatest processor load, and remember
the “Process Name”
3. At the server console, type in “Load schdelay process_name = number”
The number can be between 2 and 10000. The higher the value, the lower the
priority of the process.
This NLM can be used to give a process a lower priority than other processes
when competing for the CPU. This would be done only in situations where
an NLM was monopolizing the CPU. This command can be put in the Autoexec.ncf
file.
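
The three steps above can be sketched as a small helper that picks the busiest process from figures copied by hand off the Scheduling Information screen and builds the console command in the form quoted in step 3. The process names and load figures in the usage note are invented, and the helper only formats text; it does not talk to a server.

```python
def pick_busiest(loads):
    """loads: dict of process name -> load percentage copied from the screen."""
    if not loads:
        raise ValueError("no processes given")
    return max(loads, key=loads.get)


def schdelay_command(process_name, delay):
    """Build the console line from step 3; the text gives 2..10000 as the valid range."""
    if not 2 <= delay <= 10000:
        raise ValueError("delay must be between 2 and 10000")
    return f"load schdelay {process_name} = {delay}"
```

For example, with made-up readings `{"PROC_A": 61.5, "PROC_B": 12.0}`, `schdelay_command(pick_busiest(...), 5)` yields `load schdelay PROC_A = 5`. Rejecting out-of-range values up front avoids typing a command the console would refuse.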


49. Virus check the DOS partition on the server. If there is a virus there,
REPLACE the server.exe file with the clean one from the NetWare cdrom. Then
reinstall the 410PTX.exe patch for NetWare 4.10 or the IWSPX.exe patch for
NetWare 4.11; this will modify the server.exe to the latest loader patch
update. If there happens to be a virus on the server partition itself, clean
it, and then recopy the public and system files. Then reinstall all of the
patches.
 
50. Some customers have found that mounting certain CDROMs as NetWare volumes
causes utilization to stay at 100%. The first troubleshooting tip here is to
get CDUP5a.exe and then do a “cd purge” at the console prompt. This will
remove all of the old indexes and force the server to recreate an index upon
mounting a new cd. If one of the index files is corrupt, this can cause high
utilization. Also, if there are index files on the server that are accessed
rarely, they will be compressed and then decompressed upon use. Flag all of
the index files DC for don’t compress. These files are located in the
sys:cdrom$$.rom directory. This is a hidden directory.

 
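Whoa, are you treating us as cheap labor? This is a lot. If it were shorter I
might consider helping, haha, but it feels like you are pushing your luck.
 
That is too much. We would have to charge a translation fee, haha.
 
Without five or six banknotes I doubt anyone will take this on; everyone is
pretty busy.
 
Is there really no kind soul who will help me with this? You could each
translate just one passage; every little bit helps, and the points are
negotiable if they are not enough.
 
Here is an attempt at one passage: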
Disk Related Issues

12. If a NetWare volume has more than 2.5 to 3 million directory entries
allocated and/or used, high utilization can be experienced in specific
circumstances. This problem is specific to NetWare 4.11 due to the fact that
it allows a maximum of 16 million directory entries per volume. NetWare 4.10
will not experience this specific problem because a maximum of 2 million
directory entries per volume is allowed.
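If a NetWare network volume can have 250,000,000 to 300,000,000 directories
allocated or in use, high utilization must be adopted in special
circumstances. This applies only to NetWare 4.11 and later, because it
supports a maximum of 16,000,000,000 directories. Version 4.10 supports only
2,000,000 directories, so the problem does not arise there.
 
It looks like there are some kind people after all, but I am still hoping more
friends will help. Everyone, please join in. Thank you.
 
I think we should all pitch in and translate it together.
TO HUANGBAILI:
Next time, do not post so much at once; post it gradually, in batches!
Otherwise the sheer volume makes people dizzy just looking at it, let alone
translating it for you!
 
Sorry about that. I do not have many options; my English is limited, so I can
only ask everyone for help.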
 
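Let me try the first and second paragraphs:
12. If a NetWare volume has more than 2,500,000 to 3,000,000 directory entries
allocated or used, high utilization can occur under specific circumstances.
This problem exists only in NetWare 4.11, because NetWare 4.11 allows a
maximum of 16,000,000 directory entries per volume, whereas NetWare 4.10
allows a maximum of only 2,000,000 directory entries per volume.
To check the number of directory entries on each volume, load Servman.nlm at
the console and select Volume information. This menu displays every volume
currently mounted on the server. Highlight a volume and press Enter. A table
then displays information about that volume. At the bottom of the table
(second and third from the bottom) are the statistics “Directory entries” and
“Used directory entries”. The “Directory entries” statistic shows the number
of directory entries allocated on the volume. The “Used directory entries”
statistic shows the number of directory entries actually in use. Directory
entries are stored in a table called the directory entry table (DET). The DET
contains basic information about files, directories, and directory trustees,
as well as other structural information for the volume. If either “Directory
entries” or “Used directory entries” is greater than 2,500,000, high
utilization can occur. As this number grows (that is, as the volume holds more
files and directories), the problem becomes progressively worse.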
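
The check described above is easy to write down once the two statistics have been read off the servman screen by hand. This is only a sketch: the volume names and counts in the usage note are invented, and the 2,500,000 cutoff is simply the lower bound quoted in the text.

```python
THRESHOLD = 2_500_000  # lower bound of the 2.5 to 3 million range quoted in the text


def volumes_at_risk(volumes, threshold=THRESHOLD):
    """volumes: dict of volume name -> (allocated_entries, used_entries).

    Returns the names where either statistic exceeds the threshold, sorted.
    """
    return sorted(
        name
        for name, (allocated, used) in volumes.items()
        if allocated > threshold or used > threshold
    )
```

For instance, with made-up readings of `(400_000, 250_000)` for SYS and `(8_000_000, 2_100_000)` for DATA, only DATA is flagged, because its allocated count alone is over the cutoff.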
 
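Many thanks for everyone's help.
 
Friend, this really is too much. Posting the parts you do not understand would
be fine, but this is like translating a whole novel.
Should we continue, or call it done?
 
I'm fainting!!! Would some kind soul catch me? [:D]
 
Oh, my God!
It is so long. How much time would it take just to read it all?
 
Haha, free original-language material, a rare find! Thanks, friend! :)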
 

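All of the typical high utilization symptoms appear here. Namely: users lose connections to the server, saving/reading files becomes extremely slow (if possible at all), logins are so slow they seem not to happen, the server hangs, and, in extreme cases, abends. Here is a description of the problem: an application running on either the server or a workstation searches a NetWare volume for the trustee rights of a specific user or group, using an API called ‘Scan Bindery Object Trustee Paths’. This API performs a flat scan of the selected volume's entire directory entry table from beginning to end. Unless the search starts at a lower level of the directory tree (for example, a subdirectory), there is no alternative to scanning the whole directory entry table. In addition, no other API currently returns the desired information on the traditional NetWare file system. While the search runs, the program locks the volume for the entire duration by holding the volume semaphore, and releases the semaphore only when the search completes. This means that all access to the volume must wait in a queue until the semaphore is released before the volume can be read or written. Because of the way the file system is organized, file rights are stored at the file level and directory rights at the directory level. Using this API is therefore very expensive, because it must search every path down to the files before it can return the trustee assignments.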
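If your server has more than 2,500,000 to 3,000,000 directory entries, you can run a test (after hours) to verify that this is what causes the high utilization. On a workstation, run NWAdmin (the 3.x, 95, or NT version), select a user, and go to the user information screen. Select the “Rights to Files and Directories” button; a window appears showing volumes and the rights information for a volume. By default no volumes are shown, and therefore no rights are shown. Selecting the ‘Find’ button or the ‘Show’ button sets up a search of a volume (or of a specified directory structure) and runs it. When it finishes, a list of the user's rights is returned. During this search we see the high utilization condition. Both buttons use the ‘Scan Bindery Object Trustee Paths’ API, which increases the load on the server. Testing at Novell has shown that, on a server (Pentium/133) with a volume of about 8,000,000 directory entries, such a search from a workstation can take 30 minutes. Obviously, results come back faster on faster server hardware. FYI: the other functions in NWAdmin do not use the problematic API. Again, this is not a fault of the program; it is a problem with the API.
We have reproduced this problem in NWAdmin (as mentioned above), and NetAdmin searches for user or group trustee assignments in the same way. Novell's FTP server (ftpserv.nlm) also uses the problematic API when a user who has no home directory defined in the DS database establishes a session with the server. In that case, a scan of the volume begins in order to find the rights the user has to directories and files on the volume. Obviously, if the user has a home directory in the DS database, this scan is not performed. For connections that are already established, the FTP server never runs this scan; it runs only when a session is first established.
We have also found this problem in third-party products such as BindView. In short, any program (Novell or third-party) that uses this particular API can cause high utilization on the server. When analyzing a trace, look for NCP function 23, subfunction 71.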
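Novell's currently recommended solutions to this problem are as follows:
1. Do not use programs that call the problematic API (NWAdmin/NetAdmin) to search for a user's or group's trustee rights.
2. If possible, wait until after hours to scan user or group rights in NWAdmin; high utilization late at night is much better than a problem during working hours for users everywhere.
3. If you must use NWAdmin, do not run the check across the whole server (that is, every volume) or at the volume level.
Use the browse dialog to select a specific directory for the search. This greatly reduces the number of directory entries searched, and so reduces the chance of high utilization.
4. Reduce the number of allocated directory entries on the volume by deleting information and data from the volume, then purging the volume, then dismounting and remounting it.
5. The RIGHTS command can be used to view and change trustee assignments. This command-line tool is not as friendly as NWAdmin, but it does not use the problematic API.
6. NSS. NetWare 5 is Novell's current official solution to this problem. Novell has tested NSS and found better responsiveness on NSS volumes with a high number of allocated directory entries.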
 
Split the question into a few parts and maybe we can still help you!!
 
