Introduction
P6R has built a brand new implementation of an SSH server. As with all our software, this SSH server runs on multiple platforms, including Windows, Linux, and Solaris. The main motivation for this project was to construct a server that could easily scale to performance demands that far exceed those existing SSH implementations can meet. A secondary motivation was to exercise P6Platform’s high-performance threading and asynchronous IO functions.
Our server design has several main principles:
- Threads should not block waiting for IO to complete.
- A thread should not be tied to a single SSH session.
- Protocol and application processing (e.g., shell, SCP, exec invocations) should not interfere with each other.
- The server should allow a customer to use their own shell, exec, or subsystem implementations.
Thread Design
Our server design uses two pools of threads. One pool handles all SSH protocol processing, and the other runs application processing such as running a shell, executing exec commands, or running SSH subsystems. Each pool has its own p6IEventQ component, which it uses as a work queue. Each thread in a pool blocks on its associated work queue, waiting for incoming work. Work arrives on a queue as a message containing a predefined structure, including a type field that identifies the kind of message (e.g., socket IO, file IO, or an application message between threads). For SSH protocol threads, for example, incoming work could be a new network connection or an incoming packet on an existing SSH session, both arriving via socket IO. Notice that the two independent thread pools satisfy the third of our design principles: protocol and application processing do not interfere with each other.
The SSH protocol threads communicate with the application threads by creating a message and placing it on the work queue associated with the application threads. Such messages can contain data from an SSH client destined for a shell or perhaps an SCP server. When an application thread has data to return to the SSH client, it creates another message and places it on the work queue associated with the SSH protocol threads.
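The two-pool arrangement described above can be sketched with standard Python queues and threads. This is only an illustration of the message flow, not P6R's code: p6IEventQ is replaced by `queue.Queue`, and the `Message` fields, the message kinds ("socket_io", "app_reply", "stop"), and the upper-casing "application" are all assumptions made for the example.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Message:
    kind: str            # hypothetical type field: "socket_io", "app_data", "app_reply", "stop"
    session_id: int      # local SSH session identifier
    channel: int         # SSH channel number within the session
    payload: bytes = b""


def run_demo(packets, pool_size=2):
    protocol_q = queue.Queue()      # work queue for the SSH protocol pool
    application_q = queue.Queue()   # work queue for the application pool
    outbound = queue.Queue()        # stands in for data written back to clients

    def protocol_worker():
        while True:
            msg = protocol_q.get()              # block on the work queue only
            if msg.kind == "stop":
                break
            if msg.kind == "socket_io":         # client data: hand to application pool
                application_q.put(
                    Message("app_data", msg.session_id, msg.channel, msg.payload))
            elif msg.kind == "app_reply":       # application output: "send" to client
                outbound.put((msg.session_id, msg.channel, msg.payload))

    def application_worker():
        while True:
            msg = application_q.get()
            if msg.kind == "stop":
                break
            reply = msg.payload.upper()         # placeholder for a shell/exec/subsystem
            protocol_q.put(
                Message("app_reply", msg.session_id, msg.channel, reply))

    threads = [threading.Thread(target=protocol_worker) for _ in range(pool_size)]
    threads += [threading.Thread(target=application_worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for session_id, channel, data in packets:
        protocol_q.put(Message("socket_io", session_id, channel, data))
    results = [outbound.get(timeout=5) for _ in packets]
    for _ in range(pool_size):                  # one stop message per worker
        protocol_q.put(Message("stop", 0, 0))
        application_q.put(Message("stop", 0, 0))
    for t in threads:
        t.join()
    return sorted(results)
```

Calling `run_demo([(1, 0, b"ls"), (2, 0, b"pwd")])` pushes two client packets through both pools. Neither pool ever calls into the other directly; the only coupling is the pair of queues.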
Another important point is that these messages also contain a local SSH session identifier and an SSH channel number (an SSH session can carry data over multiple channels). When a client connection initiates a new SSH session, the server allocates a session object and places it into a global map data structure indexed by session identifier. The session object contains all the state of the session, including any data buffers, generated encryption keys, counters, etc.
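A global session map of this kind might look like the following sketch. The class and field names (`SessionTable`, `Session`, `channels`, `inbound`) are hypothetical, and a lock-protected dictionary is simply one reasonable way to share the map between worker threads; the article does not say how P6R implements it.

```python
import itertools
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: int
    channels: dict = field(default_factory=dict)        # channel number -> channel state
    inbound: bytearray = field(default_factory=bytearray)
    packets_seen: int = 0                                # example of a per-session counter


class SessionTable:
    """Thread-safe map from local session identifier to session object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_id = {}
        self._ids = itertools.count(1)

    def open_session(self):
        with self._lock:
            session = Session(next(self._ids))
            self._by_id[session.session_id] = session
            return session

    def lookup(self, session_id):
        with self._lock:
            return self._by_id.get(session_id)

    def close_session(self, session_id):
        with self._lock:
            return self._by_id.pop(session_id, None)
```

Because every piece of session state hangs off the `Session` object, a worker thread needs nothing but the identifier carried in a message to resume work on that session.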
This session object must be found when an incoming packet arrives off the network for an existing SSH session. This is easily done: the SSH server posts an outstanding socket read for every active session and associates the session object with that read. (Note that the thread does not block on this socket read request; instead, it checks its associated work queue for other work to perform.) When a packet finally arrives for one of the outstanding socket reads, a message is placed on the SSH protocol work queue together with the associated session object. Notice that a session’s state is separate from the thread, so any thread can service any SSH session request. This state mechanism satisfies the second of our design principles: a thread is not tied to a single SSH session.
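The idea of attaching the session object to an outstanding read can be illustrated with Python's `selectors` module, whose `register` call accepts an arbitrary `data` value that is handed back when the socket becomes readable. This is a readiness-based stand-in for the completion-based p6IEventQ, chosen only because it is in the standard library; `post_read`, `pump_completions`, and the `Session` class are hypothetical names.

```python
import queue
import selectors
import socket
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: int
    inbound: bytearray = field(default_factory=bytearray)


def post_read(sel, sock, session):
    """Register interest in a read and attach the session object to it."""
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ, data=session)


def pump_completions(sel, work_q, timeout):
    """Turn ready reads into work-queue messages that carry the session."""
    for key, _events in sel.select(timeout):
        data = key.fileobj.recv(4096)
        work_q.put((key.data, data))       # key.data is the attached session


def demo():
    sel = selectors.DefaultSelector()
    work_q = queue.Queue()
    server_end, client_end = socket.socketpair()   # stands in for a client connection
    session = Session(session_id=7)
    post_read(sel, server_end, session)

    client_end.sendall(b"SSH-2.0-demo\r\n")
    pump_completions(sel, work_q, timeout=5)

    # Any worker thread could pick this up; the session arrives with the data.
    got_session, data = work_q.get(timeout=5)
    got_session.inbound += data

    sel.unregister(server_end)
    sel.close()
    server_end.close()
    client_end.close()
    return got_session is session, bytes(session.inbound)
```

The worker that dequeues the message never had to search for the session or remember anything about it, which is what lets any thread serve any session.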
Lastly, application threads need only the data that is passed to them in order to run a shell, an exec command, or an SSH subsystem. This is accomplished by creating a separate object that implements a shell, an exec, or a subsystem for each SSH channel in a session. These “application” objects maintain the application state over the lifetime of the SSH channel, and they are included in the messages sent from the SSH protocol threads to the application threads. Thus, when an application thread wakes up from waiting on its work queue, it receives both the data from a client and the object to process it. Again, all application state is separate from the thread, so any application thread can work on any application request. The benefit of this design is performance: threads are not dedicated to a specific session, nor are they blocked waiting for IO to complete. This mechanism also supports the second of our design principles.
The application objects mentioned in the previous paragraph, which implement a shell, an exec, or a subsystem, are all P6R component objects with a standard definition. A customer can therefore write their own versions and replace those that ship with the SSH server.
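As a rough picture of what a "standard definition" for per-channel application objects could look like, the sketch below defines a small abstract interface and two interchangeable implementations. The interface name `ChannelApp`, its single `on_data` method, and both shells are invented for illustration; the actual P6R component interfaces are described in the server's documentation, not here.

```python
from abc import ABC, abstractmethod


class ChannelApp(ABC):
    """Hypothetical interface for a per-channel shell, exec, or subsystem."""

    @abstractmethod
    def on_data(self, data: bytes) -> bytes:
        """Consume client data and return bytes to send back."""


class EchoShell(ChannelApp):
    """Stand-in for a shell that ships with the server."""

    def __init__(self):
        self.lines_seen = 0            # state lives in the object, not the thread

    def on_data(self, data):
        self.lines_seen += 1
        return b"echo: " + data


class ReversingShell(ChannelApp):
    """Stand-in for a customer-written replacement."""

    def on_data(self, data):
        return data[::-1]


def application_step(message):
    """What an application thread does with one dequeued message."""
    app, data = message                # the message carries the object and the data
    return app.on_data(data)
```

Since `application_step` depends only on the interface, swapping `EchoShell` for `ReversingShell` requires no change to the thread code.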
Asynchronous Design
Our p6IEventQ component supports an asynchronous IO model. The event queue supports several types of completion notification (socket, file, timer, etc.) and can be extended to support application-specific messages as well. To perform socket IO, for example, an application associates a socket component with the queue and then posts requests to accept, connect, read, write, and so on to the queue. The application’s thread (or threads) can then wait on the queue for a new event carrying the completion notification of the requested operation. Each item returned on the queue is typed so that the waking thread can determine how to handle it. (For socket IO the socket handle is also returned in the message, and for file IO the file handle is returned.)
This queue mechanism frees threads to do other tasks instead of blocking while the requested IO completes. Of course, this requires a different (and perhaps harder) style of writing server code, but the benefit can be significant performance gains. This queue design satisfies the first of our design principles: threads do not block waiting for IO to complete.
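The typed-completion pattern can be sketched as follows. This toy `EventQueue` supports only a timer and an application-defined message, and every name in it (`Completion`, `post_timer`, `post_app_message`, `wait`) is an assumption for the example rather than the p6IEventQ API. The point is the shape of the code: requests are posted without blocking, and the waiting thread dispatches on the type of whatever completes.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Completion:
    kind: str         # "timer", "app", ... (a real queue would add "socket", "file")
    handle: object    # timer name, socket handle, file handle, etc.
    result: object = None


class EventQueue:
    def __init__(self):
        self._q = queue.Queue()

    def post_timer(self, name, delay):
        """Request a timer; returns immediately, completion arrives on the queue."""
        timer = threading.Timer(delay, self._q.put, args=(Completion("timer", name),))
        timer.daemon = True
        timer.start()

    def post_app_message(self, handle, payload):
        """Application-specific extension: enqueue an arbitrary typed message."""
        self._q.put(Completion("app", handle, payload))

    def wait(self, timeout=None):
        return self._q.get(timeout=timeout)


def drain(event_queue, expected):
    """Worker loop: wake on each completion and dispatch on its type."""
    handlers = {
        "timer": lambda c: f"timer {c.handle} fired",
        "app": lambda c: f"app {c.handle}: {c.result}",
    }
    handled = []
    for _ in range(expected):
        completion = event_queue.wait(timeout=5)
        handled.append(handlers[completion.kind](completion))
    return handled
```

The harder programming style mentioned above shows up even here: the code that posts a request and the code that handles its completion are separate, so any context they share has to travel in the completion itself.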
Plugin Architecture
Out of the box our SSH server provides the following plugins:
- A password authentication method
- An asymmetric key authentication method
- A limited CLI for an application shell (which allows the user to check system status such as the number of threads running, the amount of memory used, etc.)
- An exec command processor that implements SCP but nothing else. SCP allows for secure file and directory transfer between client and server.
A plugin is nothing more than a P6R component object that implements one or more defined interfaces. The server finds a plugin either by its GUID, listed in the SSH server’s configuration file, or by a search performed in the server’s initialization code. Details on how plugins are implemented, defined, and loaded by the SSH server can be found in the server’s documentation.
The key point to take away from this section is that any of the plugins listed above can be replaced by customer-implemented versions. In addition, other authentication method plugins can easily be added (e.g., to support Kerberos). Any number of SSH subsystems can also be defined and added to our server, as long as they define unique names (“[email protected]” is reserved for P6R applications). Thus our plugin architecture satisfies the fourth of our design principles: a customer can use their own shell, exec, or subsystem implementations.
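A GUID-keyed plugin registry with a uniqueness check on subsystem names could be sketched like this. It is a guess at the general shape only: the GUID values, the `PluginRegistry` class, the "auth" and "subsystem" kinds, and the example plugin classes are all hypothetical, and the real loading mechanism is the one described in the server's documentation.

```python
import uuid

# Hypothetical GUIDs; real plugin GUIDs would come from the product documentation.
PASSWORD_AUTH = uuid.UUID("00000000-0000-0000-0000-000000000001")
KERBEROS_AUTH = uuid.UUID("00000000-0000-0000-0000-000000000002")


class PluginRegistry:
    def __init__(self):
        self._by_guid = {}
        self._subsystem_names = set()

    def register(self, guid, kind, factory, name=None):
        if guid in self._by_guid:
            raise ValueError(f"duplicate plugin GUID: {guid}")
        if kind == "subsystem":
            # Subsystems must carry a name that no other subsystem uses.
            if name is None or name in self._subsystem_names:
                raise ValueError(f"subsystem name must be unique: {name!r}")
            self._subsystem_names.add(name)
        self._by_guid[guid] = (kind, factory)

    def create(self, guid):
        kind, factory = self._by_guid[guid]
        return kind, factory()


class PasswordAuth:
    method = "password"


class KerberosAuth:            # stands in for a customer-supplied plugin
    method = "kerberos"


def load_auth_methods(registry, configured_guids):
    """Instantiate the auth plugins named (by GUID) in a configuration list."""
    return [registry.create(guid)[1].method for guid in configured_guids]
```

With this shape, replacing a built-in plugin amounts to registering a different factory under the GUID the configuration refers to.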
Miscellaneous
Our SSH server performs session crypto re-keying based on several different schemes which can be controlled via configuration parameters.
We support re-keying based on time intervals, the number of bytes transferred over the connection, and the number of SSH packets transferred. One or more of these schemes can be turned on at the same time.
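Combining the three re-keying triggers is a matter of checking each enabled limit and re-keying when any one is reached. The sketch below assumes hypothetical parameter names (`max_seconds`, `max_bytes`, `max_packets`), with `None` meaning a scheme is turned off; the server's actual configuration parameters are not given in this article. The one-hour and one-gigabyte figures in the usage note are the thresholds recommended by RFC 4253, not necessarily P6R's defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RekeyPolicy:
    max_seconds: Optional[float] = None   # None disables the time-based scheme
    max_bytes: Optional[int] = None       # None disables the byte-count scheme
    max_packets: Optional[int] = None     # None disables the packet-count scheme


@dataclass
class RekeyCounters:
    started_at: float                     # time of the last key exchange
    bytes_sent: int = 0
    packets_sent: int = 0


def needs_rekey(policy, counters, now):
    """Return True as soon as any enabled limit has been reached."""
    if policy.max_seconds is not None and now - counters.started_at >= policy.max_seconds:
        return True
    if policy.max_bytes is not None and counters.bytes_sent >= policy.max_bytes:
        return True
    if policy.max_packets is not None and counters.packets_sent >= policy.max_packets:
        return True
    return False
```

For example, `RekeyPolicy(max_seconds=3600, max_bytes=2**30)` enables the time and byte schemes together and leaves packet counting off. After a key exchange completes, the counters would be reset to start a new interval.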
We have added extensive logging at all levels of the server implementation. We also pass a flag into plugins indicating what level of logging has been requested. For example, we use this to turn on SCP protocol logging even though SCP is implemented in an exec component that runs in the application threads. Our SCP implementation handles Windows drive letters (e.g., “C:”, “e:”) in file paths, so that transferring files and directories between Windows and Unix works easily.
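One way to recognize a drive letter in an SCP target path is shown below, using the standard library's `ntpath.splitdrive`, which applies Windows path rules on any platform. The function name `split_scp_path` and the normalization choices (upper-casing the drive, converting backslashes to forward slashes) are assumptions for this sketch and are not taken from P6R's implementation.

```python
import ntpath


def split_scp_path(path):
    """Split a path into (drive, rest), treating 'X:' prefixes as drive letters.

    Paths without a single-letter drive prefix are returned unchanged
    with an empty drive.
    """
    drive, rest = ntpath.splitdrive(path)
    if len(drive) == 2 and drive[0].isalpha() and drive[1] == ":":
        # Normalize so "e:\data" and "E:/data" compare equal downstream.
        return drive.upper(), rest.replace("\\", "/")
    return "", path
```

For instance, `split_scp_path("e:\\data\\x.txt")` yields a drive of `"E:"` and a remainder of `"/data/x.txt"`, while a Unix path such as `/home/u/f` comes back with an empty drive.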
Lastly, we have tested our SSH server with all major SSH clients and command line SCP clients.