Resolve Too Many Open Files
If you see exceptions complaining about too many open files, you may have an operating system resource problem. Here is how to diagnose and fix it.
by Laurent Goldsztejn

December 3, 2004

A file descriptor is a per-process handle that identifies an open file, which may be a regular file, a socket, or a pipe. Exceptions indicating too many open files are thrown when the operating system runs short of file descriptors. Running out of file descriptors, however, is often the symptom of a deeper issue in which the resources associated with these files are not managed correctly. Let's see how we can troubleshoot issues that lead to exceptions like this:

java.net.SocketException: Too many open files
   at java.net.PlainSocketImpl.accept(Compiled Code)
   at java.net.ServerSocket.implAccept(Compiled Code)
   at java.net.ServerSocket.accept(Compiled Code)
   at weblogic.t3.srvr.ListenThread.run(Compiled Code)
…

and like this:

java.io.IOException: Too many open files
   at java.lang.UNIXProcess.forkAndExec(Native Method)
   at java.lang.UNIXProcess.<init>(UNIXProcess.java:54)
   at java.lang.UNIXProcess.forkAndExec(Native Method)
   at java.lang.UNIXProcess.<init>(UNIXProcess.java:54)
   at java.lang.Runtime.execInternal(Native Method)
   at java.lang.Runtime.exec(Runtime.java:551)
   at java.lang.Runtime.exec(Runtime.java:477)
   at java.lang.Runtime.exec(Runtime.java:443)
…

The first exception is thrown when the error affects the underlying TCP connection handling, while the second is thrown when it affects an I/O operation. Both are symptoms of the same kind of problem, one that eventually blocks the server, and we'll address both here with the same investigative techniques.

The second exception describes a scenario in which the JVM process has run out of file descriptors at the very moment it needs new ones to duplicate the parent process's descriptors during a forkAndExec() call. For each process, the operating system kernel maintains a file descriptor table in which all file descriptors are indexed in the u_block structure.
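The typical trigger for this second trace is a call to Runtime.exec(), which forks a child process and gives the JVM pipes for the child's stdin, stdout, and stderr; each pipe consumes descriptors until it is closed. Here is a minimal sketch, for illustration only (the ExecExample class, the runCommand method, and the ls command are our own names, not part of any product API), of how the child's streams can be drained and then closed explicitly so their descriptors are returned:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExecExample {
    // Runs a command and closes the three pipes the JVM holds to the child
    // (stdout, stderr, stdin) so their file descriptors are released.
    public static void runCommand(String command) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(command);
        BufferedReader stdout = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = stdout.readLine()) != null) {
                System.out.println(line);      // drain output so the child cannot block
            }
            p.waitFor();
        } finally {
            stdout.close();                    // each stream holds a descriptor
            p.getErrorStream().close();
            p.getOutputStream().close();
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        runCommand("ls");                      // hypothetical command, for illustration only
    }
}

A process that calls Runtime.exec() frequently without closing these streams will leak descriptors even if the child processes themselves terminate.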

Let's start with a refresher on file descriptors. A file descriptor is a handle, represented by an unsigned integer, that a process uses to identify an open file. It is associated with a file object that includes information such as the mode in which the file was opened, its current position, and so on. This information is called the context of the file.

The most common ways for a process to obtain file descriptors are through the open or creat native subroutines or through inheritance from a parent process. Inheritance gives the child process the same access to the files the parent process was using. File descriptors are generally unique to each process: when a child process is created with the fork subroutine, the child gets a copy of all the file descriptors the parent had open at the time of the fork. A descriptor is copied in the same way when it is duplicated within a process by the fcntl, dup, and dup2 subroutines.
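Seen from Java, each FileInputStream or socket wraps one of these descriptors. A small illustrative sketch (the class name and the /etc/hosts path are ours, chosen only for the example) shows that every open allocates its own descriptor and that closing one stream releases only that descriptor:

import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;

public class DescriptorDemo {
    public static void main(String[] args) throws IOException {
        // Opening the same file twice allocates two distinct descriptors.
        FileInputStream first = new FileInputStream("/etc/hosts");    // hypothetical path
        FileInputStream second = new FileInputStream("/etc/hosts");

        FileDescriptor fd1 = first.getFD();
        FileDescriptor fd2 = second.getFD();
        System.out.println("first valid: " + fd1.valid() + ", second valid: " + fd2.valid());

        // Closing one stream releases only its own descriptor.
        first.close();
        System.out.println("after close - first valid: " + fd1.valid() + ", second valid: " + fd2.valid());
        second.close();
    }
}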

Lacking Descriptors
The exceptions shown indicate an operating system resource problem: the OS, and therefore the JVM process, has run out of file descriptors. The problem usually appears after a number of concurrent users have connected to the server. File descriptors are used when the JVM reads in the classes required to run your application, and a high-volume application can consume so many descriptors that none are left for new requests. Each new socket also requires a descriptor. Clients and servers communicate through TCP sockets, and each browser HTTP request consumes a TCP socket when a connection to the server is established. If sockets are not closed correctly, the process will eventually run out of file descriptors, because the descriptor associated with a socket is not released until the socket is closed.

File descriptors are retired when the file is closed or the process terminates. If the close() system call doesn't return a failure code, then the associated file descriptor becomes available for a future open() call that allocates a file descriptor. When all file descriptors associated with an open file description have been closed, the open file description is freed.

We should not rely on garbage collection or object finalization to free a non-Java resource such as a file descriptor. Instead, close() should be called explicitly and its outcome checked in case an error occurs.
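As a minimal sketch of that pattern (the CopyFile class and its method names are ours, purely for illustration), resources can be closed in a finally block so their descriptors are released even when an operation fails, and close() errors are reported instead of being swallowed:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyFile {
    // Copies one file to another, releasing both descriptors in a finally block
    // and surfacing close() failures instead of ignoring them.
    public static void copy(String from, String to) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream(from);
            out = new FileOutputStream(to);
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException e) {
                    System.err.println("close failed: " + e);   // report rather than swallow
                }
            }
            if (out != null) {
                out.close();   // let this one propagate: the copy may be incomplete
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copy(args[0], args[1]);   // e.g. java CopyFile source.txt target.txt
    }
}

The same pattern applies to sockets, JDBC connections, and any other object backed by a descriptor.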

A closed socket transitions to the TIME_WAIT state to make sure that all the data sent during the connection has been transmitted; a final acknowledgment (ACK) completes the transfer. This state delays the release of the file descriptor allocated to the socket. The duration of the TIME_WAIT period is defined by the kernel parameter tcp_time_wait_interval on Unix systems. On Microsoft Windows NT/2K/XP it is defined by the registry value TcpTimedWaitDelay under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. On Solaris we recommend setting tcp_time_wait_interval to 60000 (60 seconds).

It is important to monitor file descriptors first and understand what these diagnostics tell us about the open files and other potential issues. Only after stepping through the troubleshooting steps for our operating system may it become necessary to increase the number of file descriptors.

Let's turn to the resolution steps. We need to determine whether the total number of file descriptors is simply too low or whether some descriptors are not being released correctly. We can tell the two apart by checking the number of file descriptors in use at regular intervals and observing whether it settles back down or keeps increasing.
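One simple way to take such readings, sketched here for illustration only (the FdMonitor class name and the 30-second interval are arbitrary, and the approach assumes a Unix system that exposes the /proc file system), is to count the entries the JVM holds in /proc/self/fd at regular intervals:

import java.io.File;

public class FdMonitor {
    public static void main(String[] args) throws InterruptedException {
        // /proc/self/fd lists the descriptors currently held by this JVM process.
        File fdDir = new File("/proc/self/fd");
        while (true) {
            String[] entries = fdDir.list();
            int count = (entries == null) ? -1 : entries.length;   // -1 if /proc is unavailable
            System.out.println(System.currentTimeMillis() + " open descriptors: " + count);
            Thread.sleep(30000);                                    // sample every 30 seconds
        }
    }
}

The lsof commands described below give the same numbers from outside the process.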

If the number settles back down, descriptors are being released normally and we should simply increase the maximum number of file descriptors to prevent the problem from recurring. This change can be combined with a reduction of the length of time a closed connection stays in the TIME_WAIT state: on busy servers the default value of 240 seconds can delay other connection attempts and therefore limits the maximum number of connections. If the number keeps going up, we should identify whether some descriptors are being held too long (files not being closed correctly) and whether too many files are being opened (for example, a driver library that loads a file for each new JDBC connection).

Packaging classes in JAR files can also reduce the number of file descriptors used: a single descriptor serves the whole JAR, whereas one descriptor would be needed for each class file loaded separately.

Monitoring Descriptors
We can use different techniques to monitor and diagnose how all the descriptors are used by one process, depending on the operating system.

Unix platforms: Among other things, the lsof (LiSt Open Files) Unix administrative tool reports information about open files and network file descriptors, including their type, size, and i-node. For a specific process the syntax is:

lsof -p <pid of process> 

Example 1: This command was executed right after starting WebLogic Server 8.1SP1 on Solaris 2.7, and it shows that 84 file descriptors were allocated by the Java process (PID 390) under which the server is running:

$ lsof -p 390 | wc -l
84

This number is far below the default hard limit on file descriptors. The same command can be run after the exception occurs to verify that this java process has indeed reached its maximum number of open files and therefore lacks descriptors. Then we can run:

$ lsof -p <pid>

and redirect the output to a file so we can check each of the open files. If a file that should have been closed is still present in the list, we investigate why it was not closed as expected (see Listing 1). The lsof -h command displays all the possible syntax and options. (See Resources for the latest version of this program.)

A file descriptor is used for each socket connection, and lsof also shows the socket type (TCP or UDP) and the addresses and ports involved (in the NAME column).

Example 2: On HP we can also use the performance-monitoring tool Glance (see Resources) to analyze the total number of files open when running the WebLogic Server:

COMMAND   PID   USER FD TYPE DEVICE        SIZE/OFF NODE NAME
in.telnet 29705 root 2u inet 0x30002808fd8 0t76     TCP  aaaaabbbb:telnet->abcdef.bea.com:3886 (ESTABLISHED)

If lsof is not available, we can also view all the file descriptors for a process under /proc/<pid>/fd; each file descriptor appears as an entry in this directory. Example 3: This output shows a socket in the CLOSE_WAIT state:

COMMAND PID USER     FD  TYPE DEVICE        SIZE/OFF NODE NAME
java    545 weblogic 24u IPv4 0x30002a4cea8 0t0      TCP  abcd:7001->xyz.com:12345 (CLOSE_WAIT)

As long as the socket remains in this state, its associated file descriptor stays allocated. If many sockets are in this state, the process can run out of file descriptors. A TCP socket enters CLOSE_WAIT when its peer (the other side of the connection) sends a FIN. No timeout can be set to force the closure of these sockets; they simply wait for the local application to call close().
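Here is a minimal sketch of the server-side pattern that avoids this buildup (the EchoHandler class and the port number are ours, for illustration only): when read() returns -1 the peer has sent its FIN, and the local side should close the socket promptly, typically in a finally block:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoHandler {
    // Echoes bytes back to each client; read() returning -1 means the peer sent
    // its FIN, so we fall through and close the socket instead of leaving it in CLOSE_WAIT.
    public static void serve(int port) throws IOException {
        ServerSocket listener = new ServerSocket(port);
        try {
            while (true) {
                Socket client = listener.accept();
                try {
                    InputStream in = client.getInputStream();
                    OutputStream out = client.getOutputStream();
                    byte[] buffer = new byte[1024];
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                } finally {
                    client.close();   // releases the file descriptor
                }
            }
        } finally {
            listener.close();
        }
    }

    public static void main(String[] args) throws IOException {
        serve(7007);   // hypothetical port, for illustration only
    }
}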

Windows platforms: On Microsoft Windows NT or Windows 2000 the command-line tool handle reports information about handles that refer to open files. First we identify the java process under which the server runs:

C:\tmp>ps -ef | grep java
usera 1656 1428 0 10:11:41 CONIN$ 0:46 c:\Releases\WLS8.2\JDK141~1\bin\java
   -client -Xms32m -Xmx200m -XX:MaxPermSize=128m -Xverify:none
   -Dweblogic.Name=myserver -Dweblogic.ProductionModeEnabled=
   -Djava.security.policy="c:\Releases\WLS8.2\WEBLOG~1\server\lib\weblogic.policy"
   weblogic.Server

handle can then be used for that specific process (see Resources for this tool). The command:

handle -p java

provides output, for example, that shows that 65 file handles were used on Windows when WebLogic Server 8.1SP2 was running (see Listing 2).

Another Windows tool, Process Explorer, is a more sophisticated utility for monitoring file handles. It has a GUI and displays more information about each running process, and you can use it to search for a particular handle (see Resources for this tool). Figure 1 shows an example in which 884 handles were used by the java process under which WebLogic Server was running, of which only a few (65) refer to open files.

By using any of these tools you can determine whether a file that is supposed to be closed is still open; the next step is to check how the file was closed and how its file descriptor was released. Let's now detail how file descriptor limits are defined on different platforms. Both the number of file descriptors and the maximum size a process may allocate are governed by resource limits, and these values should be set in accordance with the OS-specific file descriptor values suggested in the WebLogic Server documentation.

All Unix and Linux variants use file descriptors; what differs among them is the hard limit value, the default values, and the procedure for configuring the limits. The maximum number of file descriptors is also called the hard limit, while the soft limit defines how many files a process may currently open. The soft limit can be increased, but it cannot exceed the hard limit.

On Windows 2000 Server, the open file handle limit is set to 16,384. This number can be monitored in the Task Manager performance summary.

Solaris platform: The /usr/bin/ulimit utility defines the number of file descriptors allowed for a single process. Its maximum value is defined by the kernel parameter rlim_fd_max, which is set to 1024 by default. Only the root user can modify these kernel values. The default soft limit is 64, or 256 as of Solaris 8.

Linux platform: The administrator can set file descriptor limits in the /etc/security/limits.conf configuration file, for example:

* soft nofile 1024
* hard nofile 4096

A system-wide file descriptor limit can also be set by adding these three lines to the /etc/rc.d/rc.local startup script:

# Increase system-wide file descriptor limit.
echo 4096 > /proc/sys/fs/file-max
echo 16384 > /proc/sys/fs/inode-max

HP-UX platform: nfile defines the system-wide maximum number of open files; 2800 is usually a good value for this total number of concurrent file descriptors. The maxfiles value is the per-process soft file limit, and maxfiles_lim is the per-process hard file limit.

AIX platform: The file descriptor limit is set in the /etc/security/limits file, and its default value is 2,000. The limit can be changed with the ulimit command or the setrlimit subroutine; its maximum value is defined by the OPEN_MAX constant.

About the Author
Laurent Goldsztejn is a back-line developer relations engineer with BEA Systems who specializes in troubleshooting and solving complex customer issues with their mission-critical applications. Contact Laurent at .