Chapter 23
Introduction to Network Programming
by Mike Fletcher
One of the best features of Java is its networking support. Java
has classes that range from low-level TCP/IP connections to ones
that provide instant access to resources on the World Wide Web.
Even if you have never done any network programming before, Java
makes it easy.
The following chapters introduce you to the networking classes
and how to use them. A guide to what is covered by each chapter
follows:
- Chapter 23, "Introduction to Network Programming"
The chapter you are reading contains an introduction to TCP/IP
networking, a list of the concepts you should be familiar with
before reading the rest of the networking section, and an overview
of the networking facilities provided by Java.
- Chapter 24, "Developing Content and Protocol Handlers"
This chapter discusses what protocol and content handlers
are and how they can be applied, and provides an introduction
to writing your own handlers.
- Chapter 25, "Client/Server Fundamentals"
This chapter covers the basics of client/server programming
and how Java supports the client/server model.
- Chapter 26, "Java Socket Programming"
This chapter shows you how to use Java's low-level TCP/IP
socket facilities.
- Chapter 27, "Multiuser Network Programming"
This chapter presents techniques for creating applets that
allow multiple users to interact with each other.
Although networking with Java is fairly simple, there are a few
concepts and classes from other packages you should be familiar
with before reading this part of the book. If you are interested
only in writing an applet that interacts with an HTTP daemon,
you probably can concentrate just on the URL
class for now. For the other network classes, you need at least
a passing familiarity with the World Wide Web, java.io
classes, threads, and TCP/IP networking.
World Wide Web Concepts
If you are using Java, you are probably already familiar
with the Web. You need some knowledge of how Uniform Resource
Locators (URLs) work to use the URL
and URLConnection classes.
java.io Classes
Once you have a network connection established using one of the
low-level classes, you will use java.io.InputStream
and java.io.OutputStream
objects or appropriate subclasses of these objects to communicate
with the other endpoint. Also, many of the java.net
classes throw a java.io.IOException
when they encounter a problem.
Threads
Although not strictly needed for networking, threads make using
the network classes easier. Why tie up your user interface waiting
for a response from a server when a separate communication thread
can wait? Server applications also can service several clients
simultaneously by spawning off a new thread to handle each incoming
connection.
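To make the thread-per-connection idea concrete, here is a minimal sketch of an echo server written for a current JDK (11 or later), not the JDK 1.0 era this chapter describes. The class and method names (ThreadedEchoServer, start, roundTrip) are hypothetical. The server binds to the loopback address on a port chosen by the operating system (port 0), so it can be tried on a single machine. Each accepted connection is handed to its own thread, so a slow client never stalls the accept loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal thread-per-connection echo server: the accepting thread never
// blocks on a client's I/O, because each connection gets its own thread.
class ThreadedEchoServer {

    // Starts the accept loop on a daemon thread and returns the bound server socket.
    static ServerSocket start(int port) throws IOException {
        ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();          // blocks until a client connects
                    Thread worker = new Thread(() -> echo(client));
                    worker.setDaemon(true);
                    worker.start();                           // hand the client off, go back to accept()
                } catch (IOException e) {
                    // accept() throws once the server socket is closed; the loop then ends
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    // Echoes each line back to the client until the client closes its end.
    private static void echo(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            // connection dropped; this worker thread simply ends
        }
    }

    // Connects to the server, sends one line, and returns the echoed reply.
    static String roundTrip(int port, String message) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true, StandardCharsets.UTF_8)) {
            out.println(message);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = start(0)) {                 // port 0: let the OS pick a free port
            int port = server.getLocalPort();
            System.out.println(roundTrip(port, "hello"));
            System.out.println(roundTrip(port, "world"));
        }
    }
}
```

Creating one thread per client is the simplest design. A server that expects many clients might use a thread pool (for example, java.util.concurrent.ExecutorService) instead.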
TCP/IP Networking
Before using the networking facilities of Java, you should be
familiar with the terminology and concepts of the TCP/IP networking
model. The next part of this chapter gets you up to speed.
TCP/IP (Transmission Control Protocol/Internet Protocol) is the
set of networking protocols used by Internet hosts to communicate
with other Internet hosts. If you have ever had any experience
with networks or network programming in general, you should be
able to skim this section and check back when you find a term
you are not familiar with. A list of references is given at the
end of this section if you want more detailed information.
TCP/IP and Networking Terms
Like any other technical field, computer networking has its own
set of jargon. These definitions should clear up what the terms
mean:
- host. An individual machine on a network. Each host
on a TCP/IP network has at least one unique address (see IP
number).
- hostname. A symbolic name that can be mapped into an
IP number. Several methods exist for performing this mapping,
such as DNS (the Domain Name System) and Sun's NIS (Network Information
Service).
- IETF. The Internet Engineering Task Force, a group
responsible for maintaining Internet standards and defining new
ones.
- internet. A network of networks. When capitalized as
the Internet, the term refers to the globally interconnected
network of networks.
- intranet. A term used to describe a network that uses TCP/IP
protocols and either is not connected to the Internet or is
connected to it through a firewall.
- IP number. A unique address for each host on the Internet
(unique in the sense that a given number can be used only by one
particular machine, but a particular machine may be known by multiple
IP numbers). Currently, this is a 32-bit number that consists
of a network part and a host part. The network part identifies
the network on which the host resides; the host part is the specific
host on that network. Sometimes, the IP number is referred to
as the IP address of a host.
- packet. A single message sent over a network. Sometimes,
a packet is referred to as a datagram, but the former term
usually refers to data at the network layer and the latter term
refers to a higher-layer message.
- protocol. A set of data formats and messages used to
transmit information. Different network entities must speak the
same protocol if they are to understand each other.
- protocol stack. Networking services can be thought
of as a series of layers, each of which uses the services of the
layer below it to provide services to the layer above it. The set
of layers that provides network functionality is known as a protocol stack.
- RFC. Request For Comments: the documents in which proposed
Internet standards are released. Each RFC is issued a sequential
number, which is how RFCs are usually referenced. Examples are
RFC 791, which specifies the Internet Protocol (the IP of TCP/IP),
and RFC 821, which specifies the protocol used for transferring
e-mail between Internet hosts (SMTP).
- router. A host that knows how to forward packets between
different networks. A router can be a specialized piece of network
hardware or can be something as simple as a machine with two network
interfaces (each on a different physical network).
- socket. A communications endpoint (that is,
one end of a conversation). In the TCP/IP context, a socket usually
is identified by a unique pair consisting of the source IP address
and port number and the destination IP address and port number.
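The following sketch ties several of these terms together. It first maps a hostname to its IP numbers with java.net.InetAddress, then splits a 32-bit IPv4 number into its network and host parts. The class name IpBasics and its helper methods are hypothetical. How many bits form the network part depends on the network. The sketch assumes 24, which is what the old classful scheme used for addresses beginning with 192 (class C).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hostname -> IP number(s) -> network part and host part.
class IpBasics {

    // Resolves a hostname with the platform's configured name service
    // (DNS, NIS, a hosts file, ...). A literal IP number is parsed, not looked up.
    static String[] resolve(String hostname) throws UnknownHostException {
        InetAddress[] addrs = InetAddress.getAllByName(hostname);
        String[] numbers = new String[addrs.length];
        for (int i = 0; i < addrs.length; i++) {
            numbers[i] = addrs[i].getHostAddress();   // textual IP number
        }
        return numbers;
    }

    // Packs a dotted-decimal IPv4 number into its 32-bit value.
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("expected four octets: " + dotted);
        int value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet: " + p);
            value = (value << 8) | octet;             // shift earlier octets left, append this one
        }
        return value;
    }

    static String toDotted(int value) {
        return ((value >>> 24) & 0xFF) + "." + ((value >>> 16) & 0xFF) + "."
                + ((value >>> 8) & 0xFF) + "." + (value & 0xFF);
    }

    // Mask with the top prefixLength bits set (prefixLength in 0..32).
    static int mask(int prefixLength) {
        return prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("localhost -> " + String.join(", ", resolve("localhost")));
        int ip = toInt("192.242.139.42");
        int prefix = 24;   // assumption: 24-bit network part (classful class C)
        System.out.println("32-bit value: " + Integer.toUnsignedString(ip));   // 3237120810
        System.out.println("network part: " + toDotted(ip & mask(prefix)));    // 192.242.139.0
        System.out.println("host part:    " + (ip & ~mask(prefix)));           // 42
    }
}
```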
The Internet Protocols
TCP/IP is a set of communications protocols for communicating
between different types of machines and networks (hence the name
internet). The name TCP/IP comes from two of the protocols:
the Transmission Control Protocol and the Internet Protocol. Other
protocols in the TCP/IP suite are the User Datagram Protocol (UDP),
the Internet Control Message Protocol (ICMP), and the Internet
Group Management Protocol (IGMP).
These protocols define a standard format for exchanging information
between machines (known as hosts) regardless of the physical
connections between them. TCP/IP implementations exist for almost
every type of hardware and operating system imaginable. Software
exists to transmit IP datagrams over network hardware ranging
from modems to fiber-optic cable.
TCP/IP Network Architecture
There are four layers in the TCP/IP network model. Each of the
protocols in the TCP/IP suite provides for communication between
entities in one of these layers (see Figure 23.1). Each layer uses
the layers below it to get data from host to
host. The layers are as follows, with examples of which protocols
live at each layer:
Figure 23.1: The TCP/IP protocol stack.
- Physical (Ethernet, Token Ring, PPP)
- Network (IP)
- Transport (TCP, UDP)
- Application (telnet, HTTP, FTP, Gopher)
Each layer in the stack takes data from the one above it and adds
the information needed to get the data to its destination, using
the services of the layer below. One way to think of this layering
is like the layers of an onion. Each protocol layer adds a layer
to the packet going down the protocol stack (see Figure 23.2).
When the packet is received, each layer peels off its addressing
to determine where next to send the packet.
Figure 23.2: Addressing information is added and removed at each layer.
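As a rough illustration of the onion analogy, this toy Java sketch wraps application data in one header per layer on the way down and peels the headers off in reverse order on the way up. The headers are made-up text labels (TCP|, IP|, ETH|), not real packet formats. The class name Encapsulation is hypothetical.

```java
import java.util.List;

// Toy model of encapsulation: each layer going down prepends its own header;
// going up, each layer strips the header it expects. Headers are made-up labels.
class Encapsulation {

    // Layers below the application, from top to bottom (hypothetical labels).
    static final List<String> LAYERS = List.of("TCP", "IP", "ETH");

    static String send(String applicationData) {
        String packet = applicationData;
        for (String layer : LAYERS) {
            packet = layer + "|" + packet;        // wrap another layer around the data
        }
        return packet;
    }

    static String receive(String packet) {
        for (int i = LAYERS.size() - 1; i >= 0; i--) {   // the bottom layer peels first
            String header = LAYERS.get(i) + "|";
            if (!packet.startsWith(header)) {
                throw new IllegalArgumentException("expected " + header + " in " + packet);
            }
            packet = packet.substring(header.length());
        }
        return packet;
    }

    public static void main(String[] args) {
        String onTheWire = send("GET /index.html");
        System.out.println(onTheWire);            // ETH|IP|TCP|GET /index.html
        System.out.println(receive(onTheWire));   // GET /index.html
    }
}
```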
Suppose that your Web browser wants to retrieve something from
a Web server running on a host on the same physical network. The
browser sends an HTTP request using the TCP layer. The TCP layer
asks the IP layer to send the data to the proper host. The IP
layer then uses the physical layer to send the data to the appropriate
host.
At the receiving end, each layer strips off the addressing information
that the sender added and determines what to do with the data.
Continuing the example, the physical layer passes the received
IP packet to the IP layer. The IP layer determines that the packet
is a TCP packet and passes it to the TCP layer. The TCP layer
passes the packet to the HTTP daemon process. The HTTP daemon
then processes the request and sends the data requested back through
the same process to the other host.
When the hosts are not on the same physical network, the IP layer
handles routing the packet through the correct series of hosts
(known as routers) until the packet reaches its destination.
One of the nice features of the IP protocol is that individual
hosts do not have to know how to reach every host on the Internet.
The host simply passes to a default router any packets for networks
it does not know how to reach.
For example, a university may have only one machine with a physical
connection to the Internet. All the campus routers know to forward
all packets destined for the Internet to this host. Similarly,
any host on the Internet only has to get packets to this one router
to reach any host at the university. The router forwards the packets
to the appropriate local routers (see Figure 23.3).
Figure 23.3: An example of IP routing.
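The forwarding decision described above can be sketched as a small routing table lookup. This toy version uses a hypothetical RoutingTable class with Java 16+ records. A host picks the most specific route, the one with the longest prefix, whose network contains the destination. An entry for 0.0.0.0 with a prefix length of 0 matches every address, so it plays the role of the default router. The addresses and next-hop names are invented for illustration.

```java
import java.util.List;

// Toy routing decision: choose the most specific matching route; a
// 0.0.0.0/0 entry matches everything and so acts as the default router.
class RoutingTable {

    record Route(int network, int prefixLength, String nextHop) {
        boolean matches(int destination) {
            int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
            return (destination & mask) == (network & mask);
        }
    }

    static int ip(int a, int b, int c, int d) {
        return (a << 24) | (b << 16) | (c << 8) | d;
    }

    static String nextHop(List<Route> table, int destination) {
        Route best = null;
        for (Route r : table) {
            if (r.matches(destination) && (best == null || r.prefixLength() > best.prefixLength())) {
                best = r;   // longer prefix = more specific route
            }
        }
        if (best == null) throw new IllegalStateException("no route and no default");
        return best.nextHop();
    }

    public static void main(String[] args) {
        // Hypothetical campus table: local subnet, rest of campus, default route.
        List<Route> table = List.of(
                new Route(ip(10, 1, 1, 0), 24, "deliver directly"),
                new Route(ip(10, 1, 0, 0), 16, "campus router 10.1.1.1"),
                new Route(0, 0, "default router 10.1.1.254"));
        System.out.println(nextHop(table, ip(10, 1, 1, 7)));    // deliver directly
        System.out.println(nextHop(table, ip(10, 1, 9, 9)));    // campus router 10.1.1.1
        System.out.println(nextHop(table, ip(192, 0, 2, 1)));   // default router 10.1.1.254
    }
}
```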
Note
A publicly available program for UNIX platforms called traceroute is useful if you want to find out what routers are actually responsible for getting a packet from one host to another and how long each hop takes. The source for traceroute can be found by consulting an Archie server for an FTP site near you, or from ftp://ee.lbl.gov.
The Future: IP Version 6
Back when the TCP/IP protocols were being developed in the early
1970s, 32-bit IP numbers seemed more than capable of addressing
all the hosts on an internet. Although there currently is no lack
of IP numbers, the explosive growth of the Internet in recent
years is rapidly consuming the remaining unassigned addresses.
To head off this coming shortage of IP numbers, a new version of the IP protocols
is being developed by the IETF.
This new version, known as either IPv6 or IPng (IP Next Generation),
will provide a much larger address space of 128 bits. This address
space will allow for 2^128, or approximately 3.4 x 10^38, different IP addresses.
Where IP addresses used to be expressed as four decimal numbers
(with values 0 to 255) separated by a period (.), as in 192.242.139.42,
IPv6 addresses are expressed as eight groups of four hexadecimal
digits separated by colons, like this:
5A02:1364:DD03:0432:0031:12CA:0001:BEEF
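In current versions of Java, the InetAddress class accepts IPv6 literals, so the example address above can be parsed without any name lookup. This sketch uses a hypothetical class Ipv6Demo and needs JDK 9 or later for BigInteger.TWO. It confirms that the address occupies 16 bytes and computes 2^128 exactly with BigInteger.

```java
import java.math.BigInteger;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Parses the IPv6 example address (a literal, so no name lookup happens)
// and computes the size of the 128-bit address space.
class Ipv6Demo {

    static byte[] parse(String literal) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(literal);
        if (!(addr instanceof Inet6Address)) {
            throw new IllegalArgumentException("not an IPv6 address: " + literal);
        }
        return addr.getAddress();            // 16 bytes = 128 bits
    }

    static BigInteger addressSpace() {
        return BigInteger.TWO.pow(128);      // 2^128 possible addresses
    }

    public static void main(String[] args) throws UnknownHostException {
        byte[] bytes = parse("5A02:1364:DD03:0432:0031:12CA:0001:BEEF");
        System.out.println(bytes.length * 8 + " bits");   // 128 bits
        System.out.println(addressSpace());               // 340282366920938463463374607431768211456
    }
}
```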
IPv6 will be backward compatible with current IP implementations
to allow older clients to interoperate with newer ones. Provisions
are contained in the protocol for tunneling IPv6 traffic over
an IPv4 network (and vice versa). Other benefits of the new version
are as follows:
- Improved support for multicasting (sending packets to several destinations at one time).
- Simplified packet header formats.
- Support for authentication and encryption of packet contents at the network layer.
- Support for designating a connection as a special flow that should be given special treatment (such as real-time audio data that needs quick delivery).
Several new protocols are being added to the TCP/IP suite. The
RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol)
protocols provide support for applications such as video and audio
conferencing. Some protocols are being done away with and the
functionality they provide is being merged into other existing
protocols. IGMP (Internet Group Management Protocol), which provides
support for membership in multicast groups under IPv4, is not used
in IPv6; multicast membership is instead handled with ICMPv6 messages.
These enhancements to TCP/IP should allow the Internet to continue
the phenomenal growth it has experienced over the past few years.
Where to Find More Information
This chapter was not meant to completely cover the subject of
TCP/IP. If your curiosity has been piqued, the following online
documents and books may be of interest to you.
RFCs
The first and definitive sources of information on the IP protocol
family are the Request For Comments documents defining the standards
themselves. An index of all RFC documents is available through
the Web at http://ds.internic.net/ds/rfc-index.html.
This page has pointers to all currently available RFCs (organized
in groups of 100) as well as a searchable index.
Table 23.1 gives the numbers of some relevant RFCs and what they
cover. Keep in mind that a given RFC may have been made obsolete
by a subsequent RFC. The InterNIC site's index will note in the
description any documents that were made obsolete by a subsequent
RFC.
Table 23.1. RFC documents.
RFC Number | Topic
791 | The Internet Protocol (IPv4)
793 | The Transmission Control Protocol (TCP)
768 | The User Datagram Protocol (UDP)
894 | Transmission of IP Datagrams over Ethernet Networks
1171 | The PPP Protocol
1883 | IP Version 6
1602 | The Internet Standards Process: How an RFC Becomes a Standard
1880 | Current Internet Standards
Books on TCP/IP
A good introduction to TCP/IP is the book TCP/IP Network Administration
by Craig Hunt (O'Reilly and Associates, ISBN 0-937175-82-X). Although
it was written as a guide for system administrators of UNIX machines,
the book contains an excellent introduction to all aspects of
TCP/IP, such as routing and the Domain Name System (DNS).
Another book worth checking out is The Design and Implementation
of the 4.3BSD UNIX Operating System by Samuel J. Leffler,
et al. (Addison-Wesley, ISBN 0-201-06196-1). In addition to covering
how a UNIX operating system works, it contains a chapter on the
TCP/IP implementation.
If you are a beginner, another way to get started
with TCP/IP is by reading Teach Yourself TCP/IP in 14 Days
by Timothy Parker (Sams Publishing, ISBN 0-672-30549-6).
IPng and the TCP/IP Protocols by Stephan A. Thomas (John
Wiley & Sons, ISBN 0-471-13088-5) offers an overview of version
6 of the Internet protocols.
This section gives a short overview of the capabilities and limitations
of the different network classes provided in the java.net
package. If you have never done any network programming, this
section should help you decide what type of connection class
to base your application on and pick the Java classes that best
fit your networking application. An
overview of Java security, as it relates to network programming,
is also provided.
Which Class Is Right for Me?
The answer to this question depends on what you are trying to
do and what type of application you are writing. Each network
protocol has its own advantages and disadvantages. If you are
writing a client for someone else's protocol, the decision probably
has been made for you. If you are writing your own protocol from
scratch, the following should help you decide which transport
method (and hence, which Java classes) best fits your application.
The URL Class
The URL class is an example
of what can be accomplished using the other, lower-level network
objects. The URL class is
best suited for applications or applets that need to access content
on the World Wide Web. If all you need to use Java for is writing
Web browser applets, the URL
and URLConnection classes
in all likelihood will handle your network communications needs.
The URL class enables you
to retrieve a resource from the Web by specifying the Uniform
Resource Locator for it. The content of the URL is fetched and
turned into a corresponding Java object (such as a String
containing the text of an HTML document). If you are fetching
arbitrary information, the URLConnection
object provides methods that will try to deduce the type of the
content either from the filename in the URL or from the content
stream itself.
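Here is a minimal sketch of the fetch-and-guess workflow, assuming a current JDK (11 or later). To keep it runnable without a Web server, it writes a temporary HTML file and reads it through a file: URL. An http: URL would be read the same way with openStream(). The class name UrlFetch is hypothetical. The sketch reads raw bytes instead of relying on content handlers to build a Java object. URLConnection.guessContentTypeFromName() is the filename-based guess the paragraph mentions, and guessContentTypeFromStream() is its stream-based counterpart.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Fetches a resource through a URL and asks URLConnection to guess its type.
// A file: URL keeps the sketch self-contained; http: URLs are read the same way.
class UrlFetch {

    static String fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) {        // open a connection and read the content
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path page = Files.createTempFile("demo", ".html");
        Files.writeString(page, "<html><body>Hello</body></html>");
        try {
            URL url = page.toUri().toURL();
            System.out.println(url);
            System.out.println(fetch(url));
            // Guess the MIME type from the filename alone (no I/O involved).
            System.out.println(URLConnection.guessContentTypeFromName(url.getFile()));
        } finally {
            Files.delete(page);
        }
    }
}
```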
The Socket Class
The Socket class provides
a reliable, ordered stream connection (that is, a TCP/IP socket
connection). The host and port number of the destination are specified
when the Socket is created.
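To see the destination host and port in action, this sketch opens a TCP connection over the loopback interface and prints the address and port at each end. Each side's local pair is the other side's remote pair, and together the two pairs identify the connection. SocketEndpoints is a hypothetical name. The listening ServerSocket (covered below) is there only to give the Socket something to connect to, and port 0 lets the operating system pick a free port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Opens a loopback TCP connection and prints the (address, port) pair at each
// end. The two pairs together identify this particular connection.
class SocketEndpoints {

    static String describe(Socket s) {
        return s.getLocalAddress().getHostAddress() + ":" + s.getLocalPort()
                + " -> " + s.getInetAddress().getHostAddress() + ":" + s.getPort();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);   // port 0: any free port
             Socket client = new Socket(loopback, server.getLocalPort());  // destination fixed here
             Socket accepted = server.accept()) {
            System.out.println("client side: " + describe(client));
            System.out.println("server side: " + describe(accepted));
        }
    }
}
```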
The connection is reliable because the transport layer (the TCP
protocol layer) acknowledges the receipt of sent data. If the
sending end of the connection does not get an acknowledgment back within
a reasonable period of time, it re-sends the unacknowledged
data (a technique known as Positive Acknowledgment with Retransmission,
often abbreviated as PAR). Once you have written data into a Socket,
you can assume that the data will get to the other side (unless
you receive an IOException,
of course).
The term ordered stream means that the data arrives at
the opposite end in the exact same order it is written. However,
because the data is a stream, write boundaries are not preserved.
What this means is that if you write 200 characters, the other
side may read all 200 at once. It might get the first 10 characters
one time and the next 190 the next time data is received from
the socket. In any case, the receiver cannot tell where each group
of data was written.
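Because write boundaries are not preserved, a receiver that expects a fixed number of bytes has to keep reading until it has them all. This sketch uses a hypothetical class StreamBoundaries on a current JDK. It writes 10 bytes and then 190 bytes over a loopback connection and reads them back with a loop. How many bytes each read() call returns is up to the network stack, so the printed chunk sizes may differ from run to run.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// TCP does not preserve write boundaries, so a receiver expecting n bytes must
// loop: one read() may return anywhere from 1 byte up to the space requested.
class StreamBoundaries {

    static byte[] readFully(InputStream in, int n) throws IOException {
        byte[] data = new byte[n];
        int total = 0;
        while (total < n) {
            int got = in.read(data, total, n - total);
            if (got == -1) throw new IOException("stream ended after " + total + " bytes");
            System.out.println("read() returned " + got + " bytes");   // shows how data was chunked
            total += got;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket sender = new Socket(loopback, server.getLocalPort());
             Socket receiver = server.accept()) {
            OutputStream out = sender.getOutputStream();
            out.write(new byte[10]);      // two separate writes...
            out.write(new byte[190]);
            out.flush();
            byte[] all = readFully(receiver.getInputStream(), 200);   // ...may arrive in one read or several
            System.out.println("received " + all.length + " bytes");
        }
    }
}
```

Since Java 11, InputStream.readNBytes(int) runs a similar loop for you. It returns fewer bytes instead of throwing if the stream ends early.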
The reliable stream connection provided by Socket
objects is well suited for interactive applications. Examples
of protocols that use TCP as their transport mechanism are telnet
and FTP. The HTTP protocol used to transfer data for the Web also
uses TCP to communicate between hosts.
The ServerSocket Class
A ServerSocket represents the endpoint
that Socket-type connections
connect to. Server sockets listen on a given port for connection
requests when their accept()
method is called. The ServerSocket
offers the same connection-oriented, ordered stream protocol (TCP)
that the Socket object does.
In fact, once a connection has been established, the accept()
method returns a Socket object
to talk with the remote end.
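Here is a small sketch of accept() in use, assuming a current JDK. The hypothetical helper acceptOrNull() sets a timeout with setSoTimeout(), so the call returns null instead of waiting forever when no client connects. Otherwise it returns the new Socket that accept() creates for the connection.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// accept() blocks until a client connects; setSoTimeout() bounds that wait,
// which helps a server that must periodically do other work.
class AcceptWithTimeout {

    // Returns the connected Socket, or null if nobody connected within timeoutMillis.
    static Socket acceptOrNull(ServerSocket server, int timeoutMillis) throws IOException {
        server.setSoTimeout(timeoutMillis);
        try {
            return server.accept();            // a new Socket for talking to this client
        } catch (SocketTimeoutException e) {
            return null;                       // timed out; the ServerSocket is still usable
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            System.out.println("nobody yet: " + acceptOrNull(server, 200));
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 Socket conn = acceptOrNull(server, 2000)) {
                System.out.println("accepted connection from port " + conn.getPort());
            }
        }
    }
}
```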
The DatagramSocket Class
The DatagramSocket class
provides an unreliable, connectionless, datagram connection (that
is, a UDP/IP socket connection).
Unlike the reliable connection provided by a Socket,
there is no guarantee that what you send over a UDP connection
actually gets to the receiver. The TCP connection provided by
the Socket class takes care
of retransmitting any packets that get lost. Packets sent through
UDP simply are sent out and forgotten, which means that if you
need to know that the receiver got the data, you will have to
send back some sort of acknowledgment. This arrangement does not
mean that your data usually fails to reach the other end of a UDP
connection. It means that if a network error happens (your cat jiggles the Ethernet plug
out of the wall, for example), the UDP layer does not try to send
the packet again or even know that it did not get to the recipient.
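Here is one way to add the acknowledgment the paragraph mentions. In this stop-and-wait sketch, the sender transmits a datagram, waits a bounded time for an "ACK" reply, and retransmits up to a fixed number of times. AckOverUdp, the "ACK" message, and the timeout values are hypothetical choices. A real protocol would also number its packets so that a late acknowledgment cannot be mistaken for the answer to a retransmission.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Stop-and-wait acknowledgment on top of UDP: send, wait a bounded time for
// an "ACK" reply, and resend if none arrives.
class AckOverUdp {

    static boolean sendReliably(DatagramSocket socket, DatagramPacket packet,
                                int timeoutMillis, int maxTries) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        byte[] buf = new byte[16];
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            socket.send(packet);                                 // UDP itself never retries...
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            try {
                socket.receive(reply);                           // ...so wait for the receiver's ACK
                String text = new String(reply.getData(), 0, reply.getLength(), StandardCharsets.US_ASCII);
                if (text.equals("ACK")) return true;
            } catch (SocketTimeoutException e) {
                // no ACK in time: fall through and retransmit
            }
        }
        return false;
    }

    // Hypothetical receiver: acknowledges every datagram back to its sender.
    static Thread startAcker(DatagramSocket socket) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[1500];
            byte[] ack = "ACK".getBytes(StandardCharsets.US_ASCII);
            while (!socket.isClosed()) {
                try {
                    DatagramPacket p = new DatagramPacket(buf, buf.length);
                    socket.receive(p);
                    socket.send(new DatagramPacket(ack, ack.length, p.getAddress(), p.getPort()));
                } catch (IOException e) {
                    // socket closed or transient error; the loop condition decides
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {
            startAcker(receiver);
            byte[] msg = "position 10,20".getBytes(StandardCharsets.US_ASCII);
            DatagramPacket p = new DatagramPacket(msg, msg.length, loopback, receiver.getLocalPort());
            System.out.println("acknowledged: " + sendReliably(sender, p, 500, 3));
        }
    }
}
```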
Connectionless means that the socket does not have a fixed
receiver. You can use the same DatagramSocket
to send packets to different hosts and ports; however, you can
use a Socket connection only
to connect to a given host and port. Once a Socket
is connected to a destination, that destination cannot be changed.
The fact that UDP sockets are not bound to a specific destination
also means that the same socket can listen for packets as well
as originating them. There is no UDP DatagramServerSocket
equivalent to the TCP ServerSocket.
Datagram refers to the fact that the information is sent
as discrete packets rather than as a continuous ordered stream.
The individual packet boundaries are preserved. It may help to
think of this process as dropping postcards in a mailbox.
If you send four packets, the order in which they arrive at the
destination is not guaranteed to be the same as the order in which they were
sent. The receiver may get them in the same order they were sent,
or the packets may arrive in reverse order. In any case, each
packet is received whole.
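The sketch below shows both properties on the loopback interface. It uses a hypothetical class DatagramDemo on a current JDK. One unconnected DatagramSocket sends to two different receivers by putting the destination in each DatagramPacket, and every receive() call returns exactly one complete datagram. Receive timeouts are set because UDP never promises that a packet will arrive.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// One unconnected DatagramSocket sends to two receivers; each receive()
// returns exactly one whole datagram, never a fragment or a merge of two.
class DatagramDemo {

    static void sendTo(DatagramSocket socket, String text, InetAddress host, int port) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(data, data.length, host, port));   // destination set per packet
    }

    static String receiveOne(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[1500];
        DatagramPacket p = new DatagramPacket(buf, buf.length);
        socket.receive(p);                                     // blocks until one datagram arrives
        return new String(p.getData(), 0, p.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket sender = new DatagramSocket(0, loopback);
             DatagramSocket a = new DatagramSocket(0, loopback);
             DatagramSocket b = new DatagramSocket(0, loopback)) {
            a.setSoTimeout(2000);                              // UDP gives no delivery guarantee,
            b.setSoTimeout(2000);                              // so never wait forever
            sendTo(sender, "short", loopback, a.getLocalPort());
            sendTo(sender, "a somewhat longer message", loopback, a.getLocalPort());
            sendTo(sender, "hello, b", loopback, b.getLocalPort());
            System.out.println("a got: " + receiveOne(a));
            System.out.println("a got: " + receiveOne(a));
            System.out.println("b got: " + receiveOne(b));
        }
    }
}
```

On the loopback interface the packets typically arrive in the order they were sent. That is not something to rely on across a real network.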
Given the above constraints, why would anyone want to use a DatagramSocket?
There are several advantages to using UDP:
- You have to communicate with several different hosts.
Because a DatagramSocket
is not bound to a particular host, you can use the same object
to communicate with different hosts by specifying the InetAddress
when you create each DatagramPacket.
- You are not worried about reliable delivery. If the
application you are writing does not have to know that the data
it sends was received at the other end, using a UDP socket eliminates
the overhead of acknowledging each packet as TCP does. Another
case is if the protocol you are implementing has its own method
of handling reliable delivery and retransmission.
- The amount of data being sent does not merit the overhead
of setting up a connection and the reliable delivery mechanism.
An application that is sending only 100 bytes for each transaction
every 10 minutes is an example of this situation.
The NFS (Network File System) protocol version 2, originally developed
by Sun with implementations available for most operating systems,
is an example application that uses UDP for its transport mechanism.
Another example of an application in which a DatagramSocket
may be appropriate is a multiplayer game. The central server must
communicate with all the players involved and does not necessarily
have to know that a position update got to the player.
Note
An actual game that uses UDP for communication is Netrek, a space combat simulation loosely based on the Star Trek series. Information on Netrek can be found using the Yahoo subject catalog at this URL:
http://www.yahoo.com/Recreation/Games/Internet_Games/Netrek/
There is also a Usenet newsgroup:
news:rec.games.netrek
Decisions, Decisions
Now that you know what the classes are capable of, you can choose
the one that best fits your application. Table 23.2 sums up the
type of connection each of the base networking classes creates.
The Direction column indicates where a connection originates:
Outgoing indicates that your application is opening a connection
out to another host; Incoming indicates that some other
application is initiating a connection to yours.
Table 23.2. Summary of low-level connection objects.
Class | Connection Type | Direction
Socket | Connected, ordered byte stream (TCP) | Outgoing
ServerSocket | Connected, ordered byte stream (TCP) | Incoming
DatagramSocket | Connectionless datagram (UDP) | Incoming or Outgoing
You should look at the problem you are trying to solve, any constraints
you have, and the transport mechanism that best fits your situation.
If you are having problems choosing a transport protocol, take
a look at some of the RFCs that define Internet standards for
applications (such as HTTP or SMTP). One of them might be similar
to what you are trying to accomplish. As an alternative, you can
be indecisive and provide both TCP and UDP versions of your service,
duplicating the processing logic and customizing the network logic.
Trying both transport protocols with a pared-down version of your
application can give you an indication of which protocol better
serves your purposes. Once you've looked at these factors, you
should be able to decide which class to use.
One of the purposes of Java is to enable executable content from
an arbitrary network source to be retrieved and run securely.
To accomplish this goal, the Java runtime enforces certain limitations
on what classes obtained through the network may do. You should
be aware of these constraints because they affect the design of
applets and how the applets must be loaded. When you design your
application or applet, you must take into account the security
constraints imposed by both your target environment and your
development environment.
For example, Netscape Navigator 2.0 allows code loaded from local
disk more privileges than code loaded over a network connection.
A class loaded from an HTTP daemon may create only outgoing connections
back to the host from which it was loaded. If the class is loaded
from the local host (that is, if it is located somewhere in the
class search path on the machine running Navigator), the class
can connect to an arbitrary host. Contrast this with the applet
viewer provided with Sun's Java Developers Kit. The applet viewer
can be configured to act similarly to Navigator or to enforce
no restrictions on network connectivity.
If you need full access to all Java's capabilities, there is always
the option of writing a standalone application. A standalone application
(that is, one not running in the context of a Web browser) has
no restrictions on what it is allowed to do. Sun's HotJava Web
browser is an example of a standalone application.
Note
For a more detailed discussion of Java security and how it is designed into the language and runtime, take a look at Chapter 35, "Java Security."
In addition, Sun has several white paper documents and a collection of frequently asked questions available at http://www.javasoft.com/sfaq/.
These checks are implemented by a subclass of java.lang.SecurityManager.
Depending on the security model, the object will allow or deny
certain actions. You can check beforehand whether a capability
your applet needs is present by calling the SecurityManager
yourself. The java.lang.System
class provides a static getSecurityManager()
method that returns a reference to the SecurityManager
active for the current context (or null if none is installed). If your applet needs to open a
ServerSocket, for example,
you can call the checkListen()
method yourself, which throws a SecurityException if listening is not allowed, and print an error message (or pop up a dialog
box) alerting the users and referring them to installation instructions.
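Here is a minimal sketch of the up-front check, using the java.lang.SecurityManager API the paragraph describes. ListenCheck and the port number are hypothetical. The SecurityManager is deprecated for removal in current Java releases and, from JDK 24 on, can no longer be installed. On a modern JVM, getSecurityManager() therefore returns null and the check always succeeds, so the code mainly illustrates the applet-era pattern.

```java
// Checks up front whether the current security policy allows listening on a
// port, instead of failing later inside new ServerSocket(...).
// SecurityManager is deprecated for removal in current Java releases.
@SuppressWarnings("removal")
class ListenCheck {

    static boolean canListen(int port) {
        SecurityManager sm = System.getSecurityManager();
        if (sm == null) {
            return true;                 // no security manager: Java applies no policy check
        }
        try {
            sm.checkListen(port);        // throws SecurityException if listening is forbidden
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int port = 8080;                 // hypothetical port the program wants to listen on
        if (!canListen(port)) {
            System.err.println("This program needs permission to accept connections on port "
                    + port + "; see the installation instructions.");
        } else {
            System.out.println("Listening on port " + port + " is permitted.");
        }
    }
}
```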
This chapter is a roadmap to the next four chapters. It has shown
what concepts you need to be familiar with before you dive into
network programming in Java. You should be comfortable with how
TCP/IP networking operates in general (or at least know where
to look for more information). You also should now have an idea
of which Java class provides what functionality.
Copyright 1998
EarthWeb Inc., All rights reserved.
Copyright 1998 Macmillan Computer Publishing. All rights reserved.