<?xml version="1.0"?>
<article><articleinfo><title>Usenet News HOWTO </title><authorgroup><author><firstname>Shuvam Misra</firstname><othername>                (usenet at starcomsoftware dot com)
                </othername></author></authorgroup><address format="linespecific">Starcom Software Private Limited.
		starcomsoftware.com
            <country>Mumbai, India</country>
        </address><revhistory><revision><revnumber>2.1</revnumber><date>2002-08-20</date><authorinitials>sm</authorinitials><revremark>New sections on Security and Software History,
		    lots of other small additions and cleanup</revremark></revision><revision><revnumber>2.0</revnumber><date>2002-07-30</date><authorinitials>sm</authorinitials><revremark>Rewritten by new authors at Starcom Software</revremark></revision><revision><revnumber>1.4</revnumber><date>1995-11-29</date><authorinitials>vs</authorinitials><revremark>Original document; authored by Vince Skahan.</revremark></revision></revhistory></articleinfo><section><title>What is the Usenet?</title><section><title>Discussion groups </title><para>The Usenet is a huge worldwide collection of discussion
groups. Each discussion group has a name, <emphasis>e.g.</emphasis>
<literal moreinfo="none">comp.os.linux.announce</literal>, and a collection of messages.
These messages, usually called <emphasis>articles</emphasis>, are posted
by readers like you and me who have access to Usenet servers, and are
then stored on the Usenet servers.</para><para>This ability to both read and write into a Usenet newsgroup makes
the Usenet very different from the bulk of what people today call ``the
Internet.'' The Internet has become a colloquial term to refer to the
World Wide Web, and the Web is (largely) read-only. There are online
discussion groups with Web interfaces, and there are mailing lists, but
Usenet is probably more convenient than either of these for most large
discussion communities. This is because the articles get replicated to
your local Usenet server, thus allowing you to read and post articles
without accessing the global Internet, something which is of great value
for those with slow Internet links. Usenet articles also conserve
bandwidth because, unlike messages on email-based mailing lists, they are
not delivered to each member's mailbox. Twenty members of a mailing list in
one office will have twenty copies of each message copied to their
mailboxes. However, with a Usenet discussion group and a local Usenet
server, there's just one copy of each article, and it does not fill up
anyone's mailbox.</para><para>Another nice feature of having your own local Usenet server is
that articles stay on the server even after you've read them. You can't
accidentally delete a Usenet article the way you can delete a message
from your mailbox. This way, a Usenet server is an
<emphasis>excellent</emphasis> way to archive articles of a group
discussion on a local server without placing the onus of archiving on
any group member. This makes local Usenet servers very valuable as
archives of internal discussion messages within corporate Intranets,
provided the article expiry configuration of the Usenet server software
has been set up for sufficiently long expiry periods.</para></section><section><title>How it works, loosely speaking</title><para> Usenet news works by the reader first firing up a Usenet news
program, which in today's GUI world will most likely be something like
Netscape Messenger or Microsoft's Outlook Express. There are a lot of
proven, well-designed character-based Usenet news readers, but a proper
review of the user agent software is outside the scope of this HOWTO, so
we will just assume that you are using whatever software you like. The
reader then selects a Usenet newsgroup from the hundreds or thousands of
newsgroups which are hosted by her local server, and accesses all unread
articles. These articles are displayed to her. She can then decide to
respond to some of them.</para><para>When the reader writes an article, either in response to an
existing one or as a start of a brand-new thread of discussion, her
software <emphasis>posts</emphasis> this article to the Usenet server.
The article contains a list of newsgroups into which it is to be posted.
Once it is accepted by the server, it becomes available for other users
to read and respond to. The article is automatically
<emphasis>expired</emphasis> or deleted by the server from its internal
archives based on expiry policies set in its software; the author of the
article usually can do little or nothing to control the expiry of her
articles.</para><para>A Usenet server rarely works on its own. It forms a part of
a collection of servers, which automatically exchange articles with
each other. The flow of articles from one server to another is called a
<emphasis>newsfeed</emphasis>. In a simplistic case, one can imagine a
worldwide network of servers, all configured to replicate articles with
each other, busily passing along copies across the network as soon as one
of them receives a new article posted by a human reader. This replication
is done by powerful and fault-tolerant processes, and gives the Usenet
network its power. Your local Usenet server literally has a copy of all
current articles in all relevant newsgroups.</para></section><section><title>About sizes, volumes, and so on </title><para>Any would-be Usenet server administrator or creator
<emphasis>must</emphasis> read the <quote>Periodic Posting about the basic steps
involved in configuring a machine to store Usenet news,</quote> also known as
the Site Setup FAQ, available from
<literal moreinfo="none">ftp://rtfm.mit.edu/pub/usenet/news.answers/usenet/site-setup</literal>
or
<literal moreinfo="none">ftp://ftp.uu.net/usenet/news.answers/news/site-setup.Z</literal>.
It was last updated in 1997, but trends haven't changed much since
then, though absolute volume figures have.</para><para>If you want your Usenet server to be a repository for all articles
in all newsgroups, you will probably not be reading this HOWTO, or even
if you do, you will rapidly realise that anyone who needs to read this
HOWTO may not be ready to set up such a server. This is because the
volumes of articles on the Usenet have reached a point where very
specialised networks, very high end servers, and large disk arrays
are required for handling such Usenet volumes. Those setups are called
``carrier-class'' Usenet servers, and will be discussed a bit later on in
this HOWTO. Administering such an array of hardware may not be the job
of the new Usenet administrator, for whom this HOWTO (and most Linux
HOWTOs) is written.</para><para>Nevertheless, it may be interesting to understand what volumes we
are talking about. Usenet news article volumes have been doubling every
fourteen months or so, going by what we hear in comments from
carrier-class Usenet administrators. At the beginning of 1997, this
volume was 1.2 GBytes of articles a day. Thus, the volumes should have
roughly done five doublings, or grown 32 times, by the time we reach
mid-2002, at the time of this writing. This gives us a volume of 38.4
GBytes per day. Assume that this transfer happens using uncompressed
NNTP (the norm), and add 50% extra for the overheads of NNTP, TCP,
and IP. This gives you a raw data transfer volume of 57.6 GBytes/day or
about 460 Gbits/day. If you have to transfer such volumes of data in 24
hours (86400 seconds), you'll need raw bandwidth of about 5.3 Mbits per
second just to <emphasis>receive all these articles</emphasis>. You'll
need more bandwidth to send out feeds to other neighbouring Usenet
servers, and then you'll need bandwidth to allow your readers to access
your servers and read and post articles in retail quantities. Clearly,
these volume figures are outside the network bandwidths of most
corporate organisations or educational institutions, and therefore only
those who are in the business of offering Usenet news can afford
it.</para><para>At the other end of the scale, it is perfectly feasible for a
small office to subscribe to a well-trimmed subset of Usenet newsgroups,
and exclude most of the high-volume newsgroups.  Starcom Software, where
the authors of this HOWTO work, has worked with a fairly large subset of
600 newsgroups, which is still a tiny fraction of the 15,000+ newsgroups
that the carrier-class services offer. Your office or college may not
even need 600 groups. And our company had excluded specific high-volume
but low-usefulness newsgroups like the <literal moreinfo="none">talk</literal>,
<literal moreinfo="none">comp.binaries</literal>, and <literal moreinfo="none">alt</literal>
hierarchies. With the pruned subset, the total volume of articles per
day may amount to barely a hundred MBytes a day or so, and can be easily
handled by most small offices and educational institutions. And in such
situations, a single Intel Linux server can deliver excellent performance
as a Usenet server.</para><para>Then there's the <emphasis>internal</emphasis> Usenet service. By
internal here, we mean a private set of Usenet newsgroups, not a private
computer network. Every company or university which runs a Usenet news
service creates its own hierarchy of internal newsgroups, whose articles
never leave the campus or office, and which therefore do not consume
Internet bandwidth. These newsgroups are often the ones most hotly
accessed, and will carry more <emphasis>internally generated</emphasis>
traffic than all the ``public'' newsgroups you may subscribe to, within your
organisation.  After all, how often does a guy have something to say
which is relevant to the world at large, unless he's discussing a globally
relevant topic like ``Unix rules!''? If such internal newsgroups are the
focus of your Usenet servers, then you may find that fairly modest
hardware and Internet bandwidth will suffice, depending on the size of
your organisation.</para><para>The new Usenet server administrator has to undertake a sizing
exercise to ensure that he does not bite off more than he, or his
network resources, can chew. We hope we have provided sufficient
information for him to get started with the right questions.</para></section></section><section><title>Principles of Operation</title><para>Here we discuss the basic concepts behind the operation of a Usenet news
system.</para><section><title>Newsgroups and articles </title><para>A Usenet news article sits in a file or in some other on-disk
data structure on the disks of a Usenet server, and its contents look
like this:</para><programlisting format="linespecific">Xref: news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452
Newsgroups: starcom.tech.misc,starcom.tech.security
Path: news.starcomsoftware.com!purva!shuvam
From: Shuvam &lt;shuvam@starcomsoftware.com&gt;
Subject: "You just throw up your hands and reboot" (fwd)
Content-Type: TEXT/PLAIN; charset=US-ASCII 
Distribution: starcom
Organization: Starcom Software Pvt Ltd, India
Message-ID: &lt;Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com&gt;
Mime-Version: 1.0
Date: Mon, 2 Jul 2001 16:27:57 GMT

Interesting quote, and interesting article.

Incidentally, comp.risks may be an interesting newsgroup to follow. We
must be receiving the feed for this group on our server, since we
receive all groups under comp.*, unless specifically cancelled. Check it
out sometime.

comp.risks tracks risks in the use of computer technology, including
issues in protecting ourselves from failures of such stuff.

Shuvam

&gt; Date: Thu, 14 Jun 2001 08:11:00 -0400
&gt; From: "Chris Norloff" &lt;cnorloff@norloff.com&gt;
&gt; Subject: NYSE: "Throw up your hands and reboot"
&gt;
&gt; When the New York Stock Exchange computer systems crashed for 85
&gt; minutes (8 Jun 2001), Andrew Brooks, chief of equity trading at
&gt; Baltimore mutual fund giant T. Rowe Price, was quoted as saying "Hey,
&gt; we're all subject to the vagaries of technology. It happens on your
&gt; own PC at home. You just throw up your hands and reboot."
&gt;
&gt; http://www.washingtonpost.com/ac3/ContentServer?articleid=A42885-2001Jun8&amp;pagename=article
&gt;
&gt; Chris Norloff
&gt;
&gt;
&gt; This is from --
&gt;
&gt; From: risko@csl.sri.com (RISKS List Owner)
&gt; Newsgroups: comp.risks
&gt; Subject: Risks Digest 21.48
&gt; Date: Mon, 18 Jun 2001 19:14:57 +0000 (UTC)
&gt; Organization: University of California, Berkeley
&gt;
&gt; RISKS-LIST: Risks-Forum Digest  Monday 19 June 2001
&gt; Volume 21 : Issue 48
&gt;
&gt;    FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
&gt;    ACM Committee on Computers and Public Policy,
&gt;    Peter G. Neumann, moderator
&gt;
&gt; This issue is archived at &lt;URL:http://catless.ncl.ac.uk/Risks/21.48.html&gt;
&gt; and by anonymous ftp at ftp.sri.com, cd risks .
&gt;</programlisting><para>A Usenet article's header is very interesting if you want to learn
about the functioning of the Usenet. The <literal moreinfo="none">From</literal>,
<literal moreinfo="none">Subject</literal>, and <literal moreinfo="none">Date</literal> headers are
familiar to anyone who has used email. The <literal moreinfo="none">Message-ID</literal>
header contains a unique ID for each message, and is present in each
email message, though not many non-technical email users know about it.
The <literal moreinfo="none">Content-Type</literal> and <literal moreinfo="none">Mime-Version</literal>
headers are used for MIME encoding of articles, for attaching files,
and so on, just as in email messages.</para><para>The <literal moreinfo="none">Organization</literal> header is an informational header
which is supposed to carry some information identifying the organisation
to which the author of the article belongs. What remains now are the
<literal moreinfo="none">Newsgroups</literal>, <literal moreinfo="none">Xref</literal>,
<literal moreinfo="none">Path</literal> and <literal moreinfo="none">Distribution</literal> headers.
These are special to Usenet articles and are very important.</para><para>The <literal moreinfo="none">Newsgroups</literal> header specifies which newsgroups
this article should belong to. The <literal moreinfo="none">Distribution</literal>
header, sadly under-utilised in today's globalised Internet world,
allows the author of an article to specify how far the article will be
re-transmitted. The author of an article, working in conjunction with
well-configured networks of Usenet servers, can control the ``radius'' of
replication of his article, thus posting an article of local significance
into a newsgroup but setting the <literal moreinfo="none">Distribution</literal> header to
some suitable setting, <emphasis>e.g.</emphasis> <literal moreinfo="none">local</literal>
or <literal moreinfo="none">starcom</literal>, to prevent the article from being relayed
to servers outside the specified domain.</para><para>The <literal moreinfo="none">Xref</literal> header specifies the precise
<emphasis role="strong">article number</emphasis> of this article in each of the
newsgroups in which it is inserted, for the current server. When an
article is copied from one server to another as part of a newsfeed,
the receiving server throws away the old <literal moreinfo="none">Xref</literal> header
and inserts its own, with its own article numbers. This indicates an
interesting feature of the Usenet system: each article in a Usenet server
has a unique number (an integer) for each newsgroup it is a part of.
Our sample above has been added to two newsgroups on our server, and has
the article numbers 211 and 452 in those groups. Therefore, any Usenet
client software can query our server and ask for article number 211 in
the newsgroup <literal moreinfo="none">starcom.tech.misc</literal> and get this article.
Asking for article number 452 in <literal moreinfo="none">starcom.tech.security</literal>
will fetch the article too. On another server, the numbers may be very
different.</para><para>The <literal moreinfo="none">Path</literal> specifies the list of machines through
which this article has travelled before it has reached the current
server. UUCP-style syntax is used for this string. The current
example indicates that a user called <literal moreinfo="none">shuvam</literal> first
wrote this article and posted it onto a computer which calls itself
<literal moreinfo="none">purva</literal>, and this computer then transferred this article
by a newsfeed to <literal moreinfo="none">news.starcomsoftware.com</literal>. The
<literal moreinfo="none">Path</literal> header is critical for breaking loops in
newsfeeds, and will be discussed in detail later.</para><para>Our sample article will sit in the two newsgroups listed above
forever, unless expired. The Usenet software on a server is usually
configured to expire articles based on certain conditions,
<emphasis>e.g.</emphasis> after it's older than a certain number of
days. The C-News software we use allows expiry control based on the
newsgroup hierarchy and the type of newsgroup, <emphasis>i.e.</emphasis>
moderated or unmoderated. Against each class of newsgroups, it allows
the administrator to specify a number of days after which the article
will be expired. It is possible for an article to control its own
expiry, by carrying an <literal moreinfo="none">Expires</literal> header specifying a
date and time. Unless overridden in the Usenet server software, the
article will be expired only after its explicit expiry time is
reached.</para></section><section><title>Of readers and servers</title><para>Computers which access Usenet articles are broadly of two classes:
the readers and the servers. A Usenet server carries a repository of
articles, manages them, handles newsfeeds, and offers its repository to
authorised readers to read. A Usenet reader is merely a computer with
the appropriate software to allow a user to access a server, fetch
articles, post new articles, and keep track of which articles it has
read in each newsgroup. In terms of functionality, Usenet reading
software is less interesting to a Usenet administrator than Usenet
server software. However, in terms of lines of code, the Usenet reader
software can often be much larger than Usenet server software, primarily
because of the complexities of modern GUI code.</para><para>Most modern computers access Usenet servers almost exclusively using
NNTP (the Network News Transfer Protocol) for reading and posting. This
protocol can also be used for inter-server communication, but those
aspects will be discussed later. The NNTP protocol, like any other
well-designed TCP-based Internet protocol, carries ASCII commands and
responses terminated with <literal moreinfo="none">CR-LF</literal>, and comprises a
sequence of commands, somewhat reminiscent of the POP3 protocol for
email. Using NNTP, a Usenet reader program connects to a Usenet server,
asks for a list of active newsgroups, and receives this (often huge)
list. It then sets the ``current newsgroup'' to one of these, depending
on what the user wants to browse through. Having done this, it gets the
meta-data of all current articles in the group, including the author,
subject line, date, and size of each article, and displays an index of
articles to the user.</para><para>The user then scans through this list, selects an article, and
asks the reader to fetch it.  The reader gives the article number of
this article to the server, and fetches the full article for the user
to read through. Once the user finishes his NNTP session, he exits,
and the reader program closes the NNTP socket. It then (usually)
updates a local file in the user's home area, keeping track of which
news articles the user has read. These articles are typically not shown
to the user next time, thus allowing the user to progress rapidly to new
articles in each session. The reader software is helped along in this
endeavour by the <literal moreinfo="none">Xref</literal> header, using which it knows
all the different identities by which a single article is identified
in the server. Thus, if you read the sample article given above by
accessing <literal moreinfo="none">starcom.tech.misc</literal>, you'll never be shown
this article again when you access <literal moreinfo="none">starcom.tech.misc</literal>
or <literal moreinfo="none">starcom.tech.security</literal>; your reader software will
do this by tracking the <literal moreinfo="none">Xref</literal> header and mapping
article numbers.</para><para>When a user posts an article, he first composes his message using
the user interface of his reader software. When he finally gives the
command to send the article, the reader software contacts the Usenet
server using the pre-existing NNTP connection and sends the article to
it. The article carries a <literal moreinfo="none">Newsgroups</literal> header with the
list of newsgroups to post to, often a <literal moreinfo="none">Distribution</literal>
header with a distribution specification, and other headers
like <literal moreinfo="none">From</literal>, <literal moreinfo="none">Subject</literal>
<emphasis>etc.</emphasis> These headers are used by the server
software to do the right thing. Special and rare headers like
<literal moreinfo="none">Expires</literal> and <literal moreinfo="none">Approved</literal> are acted upon
when present. The server assigns a new article number to the article for
each newsgroup it is posted to, and creates a new <literal moreinfo="none">Xref</literal>
header for the article.</para><para>Transfer of articles between servers is done in various ways, and
is discussed in some detail in the section titled
``Newsfeeds'' below.</para></section><section><title>Newsfeeds </title><section><title> Fundamental concepts</title><para>When we try to analyse newsfeeds in real life, we begin to see
	that, for most sites, traffic flow is not symmetrical in both
	directions. We usually find that one server will feed the bulk
	of the world's articles to one or more secondary servers every
	day, and receive a few articles written by the users of those
	secondary servers in exchange. Thus, we usually find that
	articles flow down from the stem to the branches to the leaves
	of the worldwide Usenet server network, and not exactly in a totally
	balanced mesh flow pattern. Therefore, we use the term
	``upstream server'' to refer to the server from which we receive
	the bulk of our daily dose of articles, and ``downstream
	server'' to refer to those servers which receive the bulk dose
	of articles from us.</para><para>Newsfeeds relay articles from one server to their ``next door
	neighbour'' servers, metaphorically speaking. Therefore, articles
	move around the globe, not by a massive number of single-hop
	transfers from the originating server to every other server in
	the world, but in a sequence of hops, like passing the baton in
	a relay race.  This increases the latency time for an article
	to reach a remote tertiary server after, say, ten hops, but
	it allows tighter control of what gets relayed at every hop,
	and helps in redundancy, decentralisation of server loads,
	and conservation of network bandwidth. In this respect, Usenet
	newsfeeds are more complex than HTTP data flows, which
	typically use single-hop techniques.</para><para>Each Usenet news server therefore has to worry about
	newsfeeds each time it receives an article, either by a fresh post
	or from an incoming newsfeed. When the Usenet server digests this
	article and files it away in its repository, it simultaneously
	looks through its database to see which other server it should
	feed the article to. In order to do this, it carries out a
	sequence of checks, described below.</para><para>Each server knows which other servers are its ``next door
	neighbours;'' this information is kept in its newsfeed
	configuration information. Against each of its ``next door
	neighbours,'' there will be a list of newsgroups which it
	wants, and a list of distributions. The new article's list of
	newsgroups will be matched against the newsgroup list of the
	``next door neighbour'' to see whether there's even a single
	common newsgroup which makes it necessary to feed the article to
	it. If there's a matching newsgroup, and the server's distribution
	list matches the article's distribution, then the article is
	marked for feeding to this neighbour.</para><para>When the neighbour receives the article as part of the
	feed, it performs some sanity checks of its own. The first check
	it performs is on the <literal moreinfo="none">Newsgroups</literal> header of
	the new article. If none of the newsgroups listed there are part
	of the active newsgroups list of this server, then the article
	can be rejected. An article rejected thus may even be queued for
	outgoing feeds to other servers, but will not be digested for
	incorporation into the local article repository.</para><para>The next check performed is against the
	<literal moreinfo="none">Path</literal> header of the incoming article. If this
	header lists the name of the current Usenet server anywhere,
	it indicates that it has already passed through this server at
	least once before, and is now re-appearing here erroneously because
	of a newsfeed loop. Such loops are quite often configured into
	newsfeed topologies for redundancy: ``I'll get the articles from
	Server X if not Server Y, and may the first one in win.'' The
	Usenet server software automatically detects a duplicate feed
	of an article and rejects it.</para><para>The next check is against what is called the server's
	<emphasis>history database</emphasis>. Every Usenet server has
	a history database, which is a list of the message IDs of all
	current articles in the local repository. Oftentimes the history
	database also carries the message IDs of all messages recently
	expired. If the incoming article's message ID matches any of the
	entries in the database, then again it is rejected without being
	filed in the local repository. This is a second loop detection
	method. Sometimes, the mere checking of the article's
	<literal moreinfo="none">Path</literal> header does not detect all
	potential problems, because the problem may be a re-insertion
	instead of a loop. A re-insertion happens when the same incoming
	batch of news articles is re-fed into the local server, perhaps
	after recovering the system's data from tapes after a system
	crash. In such cases, there's no newsfeed loop, but there's
	still the risk that one article may be digested into the local
	server twice. The history database prevents this.</para><para>All these simple checks are very effective, and work
	across server and software types, as per the Internet standards.
	Together, they allow robust and fail-safe Usenet article flow
	across the world.</para></section><section><title>Types of newsfeeds</title><para>This section explains the basics of newsfeeds, without getting 
	into details of software and configuration files.</para><section><title>Queued feeds</title><para>	    This is the commonest method of sending articles from one server
	    to another, and is followed whenever large volumes of articles 
	    are to be transferred per day. This approach needs a one-time 
	    modification to the upstream server's configuration for each 
	    outgoing feed, to define a new <emphasis>queue.</emphasis>
	</para><para>	    In essence all queued feeds work in the following way. When the 
	    sending server receives an article, it processes it for 
	    inclusion into its local repository, and also checks through all
	    its outgoing feed definitions to see whether the article needs 
	    to be queued for any of the feeds. If yes, it is added to a 
	    <emphasis>queue file</emphasis> for each outgoing feed. The
	    precise details
	    of the queue file can change depending on the software 
	    implementation, but the basic processes remain the same. A queue
	    file is a list of queued articles, but does not contain the
	    article contents. Typical queue files are ASCII text files with
	    one line per article giving the path to a copy of the article in
	    the local spool area.
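	    The queueing step described above can be sketched in Python
	    roughly as follows. This is only an illustration under stated
	    assumptions: the <literal moreinfo="none">Feed</literal> record, the
	    <literal moreinfo="none">wants</literal> and <literal moreinfo="none">enqueue</literal>
	    helpers, the simple prefix matching of newsgroup names, and the
	    in-memory list standing in for the queue file are all hypothetical.
	    Real server software uses its own configuration formats and
	    matching rules.
<programlisting format="linespecific">
```python
from dataclasses import dataclass, field


@dataclass
class Feed:
    """One outgoing feed definition (hypothetical structure)."""
    site: str                # name of the neighbouring server
    groups: list             # newsgroup prefixes the neighbour wants
    distributions: list      # distributions the neighbour accepts
    queue: list = field(default_factory=list)  # stands in for the queue file


def wants(feed, headers):
    """Return True if the article should be queued for this feed."""
    newsgroups = [g.strip() for g in headers["Newsgroups"].split(",")]
    # A single matching newsgroup is enough to make the article eligible.
    group_ok = any(
        g == p or g.startswith(p + ".")
        for g in newsgroups
        for p in feed.groups
    )
    # Assumption: an article without a Distribution header is unrestricted.
    dist = headers.get("Distribution")
    if dist is None:
        dist_ok = True
    else:
        dist_ok = any(d.strip() in feed.distributions for d in dist.split(","))
    return group_ok and dist_ok


def enqueue(feeds, headers, spool_path):
    """Add the article's spool path to the queue of every matching feed."""
    for feed in feeds:
        if wants(feed, headers):
            feed.queue.append(spool_path)
```
</programlisting>
	    Only the path to the article is queued, not its contents, which
	    mirrors the one-line-per-article queue files described above.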
	</para><para>	    Later, a separate process picks up each queue file and creates 
	    one or more <emphasis>batches</emphasis> for each outgoing feed.
	    A <emphasis>batch</emphasis> is a large file containing multiple
	    Usenet news 
	    articles. Once the batches are created, various transport 
	    mechanisms can be used to move the files from sending server to
	    receiving server. You can even use scripted FTP.  You only need
	    to ensure that the batch is picked up from the upstream server 
	    and somehow copied into a designated incoming batch directory in
	    the downstream server.
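	    Building a batch from a queue file might look like the sketch
	    below. It assumes the traditional batch layout in which each
	    article is preceded by a <literal moreinfo="none">#! rnews</literal> line
	    carrying the article's size in bytes. The function name
	    <literal moreinfo="none">build_batch</literal> and the
	    <literal moreinfo="none">read_article</literal> callable, which stands in
	    for reading from the spool area, are assumptions made for
	    illustration.
<programlisting format="linespecific">
```python
def build_batch(queue_lines, read_article):
    """Concatenate queued articles into one batch, returned as bytes.

    queue_lines  -- iterable of spool paths, one per queued article
    read_article -- callable returning the article's bytes for a path
                    (a stand-in for reading the local spool area)
    """
    parts = []
    for path in queue_lines:
        path = path.strip()
        if not path:
            continue  # skip blank lines in the queue file
        body = read_article(path)
        # Each article is prefixed by a line giving its byte count.
        parts.append(b"#! rnews %d\n" % len(body))
        parts.append(body)
    return b"".join(parts)
```
</programlisting>
	    The resulting file can then be handed to whichever transport
	    is in use.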
	</para><para>	    UUCP has traditionally been the mechanism of choice for batch 
	    movement, because it predates the Internet and wide availability
	    of fast packet-switched data networks. Today, with TCP/IP 
	    everywhere, UUCP once again emerges as the most logical choice 
	    of batch movement, because it too has moved with the times: it 
	    can work over TCP.
	</para><para>	    NNTP is the <emphasis>de facto</emphasis> mechanism of choice
	    for moving 
	    queued newsfeeds for carrier-class Usenet servers on the 
	    Internet, and unfortunately, for a lot of other Usenet servers 
	    as well. The reason why we find this choice unfortunate is 
	    discussed in <xref linkend="feedefficiency"></xref> below. But in NNTP
	    feeds, an intermediate step of building batches out of queue 
	    files can be eliminated --- this is both its strength and its 
	    weakness.
	</para><para>	    In the case of queued NNTP feeds, articles get added to queue 
	    files as described above. An NNTP transmit process periodically
	    wakes up, picks up a queue file, and makes an NNTP connection to
	    the downstream server. It then begins a processing loop where, 
	    for each queued article, it uses the NNTP
	    <literal moreinfo="none">IHAVE</literal> 
	    command to inform the downstream server of the article's 
	    message ID. The downstream server checks its local repository to
	    see whether it already has the message. If not, it responds with
	    a <literal moreinfo="none">335</literal> (``send article'') response. The transmitting server
	    then pumps
	    out the article contents in plaintext form.  When all articles 
	    in the queue have been thus processed, the sending server closes
	    the connection. If the NNTP connection breaks in between due to
	    any reason, the sending server truncates the queue file and 
	    retains only those articles which are yet to be transmitted, 
	    thus minimising repeat transmissions.
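	    The transmit loop can be sketched as below. To keep the example
	    self-contained it talks to a stand-in object rather than a real
	    socket: the <literal moreinfo="none">FakeDownstream</literal> class, the
	    <literal moreinfo="none">transmit_queue</literal> function, and the way a
	    broken connection is simulated are all assumptions made for
	    illustration. The reply that asks for the article is represented
	    by the NNTP response code 335, and a refusal by 435.
<programlisting format="linespecific">
```python
class FakeDownstream:
    """Stand-in for the receiving server (hypothetical, no network)."""

    def __init__(self, have, fail_after=None):
        self.have = set(have)         # message IDs already in its history
        self.fail_after = fail_after  # simulate a broken link after N offers
        self.offers = 0

    def ihave(self, message_id):
        """Answer an IHAVE offer: 335 = send it, 435 = not wanted."""
        if self.fail_after is not None and self.offers == self.fail_after:
            raise ConnectionError("link dropped")
        self.offers += 1
        return 435 if message_id in self.have else 335

    def take(self, message_id, text):
        """Accept the article text; 235 = transferred OK."""
        self.have.add(message_id)
        return 235


def transmit_queue(queue, articles, server):
    """Offer each queued message ID; return the entries still to be sent."""
    for i, message_id in enumerate(queue):
        try:
            if server.ihave(message_id) == 335:
                server.take(message_id, articles[message_id])
        except ConnectionError:
            return queue[i:]  # keep only the untransmitted tail
    return []
```
</programlisting>
	    The list returned after a failure plays the role of the
	    truncated queue file.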
	</para><para><anchor id="dialupnonntp"></anchor>
	    A queued NNTP feed works with the sending server making an NNTP
	    connection to the receiving server. This implies that the 
	    receiving server must have an IP address which is known to the 
	    sending server or can be looked up in the DNS. If the receiving
	    server connects to the Internet periodically using a dialup 
	    connection and works with a dynamically assigned IP address, 
	    this can get tricky. UUCP feeds suffer no such problems because
	    the sending server for the newsfeed can be the UUCP server,
	    <emphasis>i.e.</emphasis>
	    passive. The receiving server for the feed can be the UUCP 
	    master, <emphasis>i.e.</emphasis> the active party. So the
	    receiving server can then
	    initiate the UUCP connection and connect to the sending server.
	    Thus, if even one of the two parties has a static IP address, 
	    UUCP queued feeds can work fine.
	</para><para>	    Thus, NNTP feeds can be sent out a little faster than the 
	    batched transmission processes used for UUCP and other older 
	    methods, because no batches need to be constructed. However, 
	    NNTP is often used in newsfeeds where it is not necessary and it
	    results in colossal waste of bandwidth.  Before we study 
	    efficiency issues of NNTP versus batched feeds, we will cover 
	    another way feeds can be organised using NNTP: the pull feeds.
	</para></section><section><title>Pull feeds</title><para>	    This method of transferring a set of articles works only over 
	    NNTP, and requires absolutely no configuration on the 
	    transmitting, or upstream, server. In fact, the upstream server
	    cannot even easily detect that the downstream server is pulling
	    out a feed --- it appears to be just a heavy and thorough
	    newsreader, that's all.
	</para><para>	    This pull feed works by the downstream server pulling out
	    articles one by one, just like any NNTP newsreader, using the
	    NNTP <literal moreinfo="none">ARTICLE</literal> command with the message ID as
	    parameter.
	    The interesting detail is how it gets the message IDs to begin
	    with. For this, it uses an NNTP command, specially designed for
	    pull feeds, called <literal moreinfo="none">NEWNEWS</literal>. This command
	    takes a newsgroup pattern, a date (YYMMDD) and a time (HHMMSS):
	    <screen format="linespecific"> NEWNEWS comp.* 970815 000000 GMT </screen>
	</para><para>	    This command is sent by the downstream server over NNTP to the 
	    upstream server, and in effect asks the upstream server to list
	    out all news articles which are newer than 15 August 1997 in the
	    <literal moreinfo="none">comp</literal> hierarchy. The upstream server responds
	    with a 
	    (often huge) list of message IDs, one per line, ending with a
	    period on a line by itself.
	</para><para>	    The pulling server then compares each newly received message ID
	    with its own article database and makes a (possibly shorter)
	    list of all articles which it does not have, thus eliminating
	    duplicate fetches.  That done, it begins fetching articles one
	    by one, using the NNTP <literal moreinfo="none">ARTICLE</literal> command as
	    mentioned above.
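	</para><para>	    The planning step of a pull feed, as described above, can
	    be sketched as follows. This is a minimal illustration, not the
	    code of any real pull-feed program: the function names and the
	    in-memory set standing in for the local article database are
	    assumptions, and the network traffic itself is left out.
<programlisting format="linespecific"><![CDATA[def parse_id_list(response_lines):
    """Collect message IDs from the body of a NEWNEWS reply.

    The body holds one message ID per line and ends with a line
    containing only a period.
    """
    ids = []
    for line in response_lines:
        line = line.strip()
        if line == ".":
            break
        if line:
            ids.append(line)
    return ids


def plan_fetches(offered_ids, local_history):
    """Return the ARTICLE commands for IDs missing from the local history."""
    seen = set(local_history)
    commands = []
    for message_id in offered_ids:
        if message_id not in seen:
            seen.add(message_id)            # also drops repeats in the list
            commands.append("ARTICLE " + message_id)
    return commands
]]></programlisting>
	    The downstream server would then send each returned command
	    over its NNTP connection in turn and file the article it
	    receives in reply.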
	</para><para>	    In addition, there is another NNTP command,
	    <literal moreinfo="none">NEWGROUPS</literal>,
	    which allows the NNTP client --- <emphasis>i.e.</emphasis> the
	    downstream server in
	    this case --- to ask its upstream server what were the new
	    newsgroups created since a given date. This allows the
	    downstream server to add the new groups to its
	    <literal moreinfo="none">active</literal> file.
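	</para><para>	    As a rough sketch of that last step, the following Python
	    function merges a <literal moreinfo="none">NEWGROUPS</literal> reply into a
	    list of <literal moreinfo="none">active</literal> file lines. It rests on
	    several assumptions: that each reply line reads ``group last
	    first flag'' as in the reply to the NNTP
	    <literal moreinfo="none">LIST</literal> command, that each
	    <literal moreinfo="none">active</literal> line reads ``group highest lowest
	    flag,'' and that a new, empty group is written with the
	    zero-padded numbers shown. The function name is hypothetical; a
	    real server would use its own maintenance scripts for this.
<programlisting format="linespecific"><![CDATA[def merge_new_groups(active_lines, newgroups_reply):
    """Append groups from a NEWGROUPS reply that the active file lacks.

    New groups start out empty locally, so the article numbers reported
    by the upstream server are discarded.
    """
    known = {line.split()[0] for line in active_lines if line.strip()}
    merged = list(active_lines)
    for line in newgroups_reply:
        fields = line.split()
        if len(fields) != 4:
            continue                        # terminator or malformed line
        group, _last, _first, flag = fields
        if group not in known:
            known.add(group)
            # Assumed field widths for an empty group entry.
            merged.append("%s 0000000000 00001 %s" % (group, flag))
    return merged
]]></programlisting>
	    Groups that are already present are left untouched, so
	    running the merge twice with the same reply adds nothing the
	    second time.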
	</para><para>	    The <literal moreinfo="none">NEWNEWS</literal> based approach is usually one of
	    the most inefficient methods of pulling out a large Usenet feed.
	    By inefficiency, here we refer to the CPU loads and RAM
	    utilisation on the upstream server, not on bandwidth usage. This
	    inefficiency is because most Usenet news servers do not keep
	    their article databases indexed by hierarchy and date; CNews
	    certainly does not. This means that a <literal moreinfo="none">NEWNEWS</literal>
	    command issued to an upstream server will put that server into a
	    sequential search of its article database, to see which articles
	    fit into the hierarchy given and are newer than the given date.
	</para><para>	    If pull feeds were to become the most common way of sending out 
	    articles, then all upstream servers would badly need an
	    efficient way of sorting their article databases to allow each 
	    <literal moreinfo="none">NEWNEWS</literal> command to rapidly generate its list
	    of matching articles. A slow upstream server today might take
	    minutes to begin responding to a <literal moreinfo="none">NEWNEWS</literal>
	    command, and
	    the downstream server may time out and close its NNTP connection
	    in the meantime. We have often seen this happen, until we
	    tuned the timeouts.
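	</para><para>	    The difference between a sequential search and an indexed
	    lookup can be illustrated with a small Python sketch. The record
	    layout (arrival time, newsgroup, message ID), the
	    <literal moreinfo="none">DateIndex</literal> class and the function names are
	    all hypothetical; no existing news server is claimed to store its
	    history this way.
<programlisting format="linespecific"><![CDATA[import bisect


def newnews_scan(history, prefix, since):
    """Sequential search: examine every (time, group, message_id) record."""
    return [message_id for (arrived, group, message_id) in history
            if arrived >= since and group.startswith(prefix)]


class DateIndex:
    """Hypothetical index that keeps the records sorted by arrival time."""

    def __init__(self, history):
        self.records = sorted(history)
        self.times = [arrived for (arrived, _group, _id) in self.records]

    def newnews(self, prefix, since):
        # A binary search skips every record older than `since`, so only
        # the recent tail of the history has to be examined.
        start = bisect.bisect_left(self.times, since)
        return [message_id
                for (_arrived, group, message_id) in self.records[start:]
                if group.startswith(prefix)]
]]></programlisting>
	    Both approaches return the same set of message IDs; the
	    indexed version simply avoids touching old records, which is
	    where most of the time goes on a server with a long history.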
	</para><para>	    There are basic efficiency issues of bandwidth utilisation
	    involved in NNTP for news feeds, which are applicable for both
	    queued and pull feeds. But the problem with
	    <literal moreinfo="none">NEWNEWS</literal> is unique to pull feeds, and relates
	    to server loads, not bandwidth wastage. 
	</para></section></section></section><section id="controlmsg"><title>Control messages</title><para>The Usenet is a massive dispersed collection of servers which
operate almost without any supervision, provided they have adequate disk
space and do not suffer disk corruption due to power failures,
<emphasis>etc.</emphasis> (It is indeed surprising how self-managing a
good Usenet server is, provided these two pre-requisites are met.) These
servers are each under the control of human administrators, but it is
preferable that certain routine actions be performed across all these
servers remotely from one location, without the manual intervention of
these humans.</para><para>One common need for centralised operations is the creation of new
groups in the standard eight hierarchies. The Usenet follows a fairly
formal process which asks for votes from readers worldwide before
deciding on the restructuring of its newsgroups list, including merging of
low-volume groups, splitting of high-volume groups into many specialised
groups, creating new groups, and even deleting groups. Once the voting
process for a change concludes and the change action is to be carried
out, it would be extremely tedious to send email to the hundreds of
thousands of Usenet administrators and hope that they make the changes
right, and answer their doubts if they get confused. It would be much
better to have an <emphasis>automatic</emphasis> way to make the
changes across all servers, of course with proper authorisation.</para><para>The solution to this does not lie in giving some central authority
the ability to run an OS-level command of his choice on all the world's
Usenet servers, because OS commands differ from OS to OS, and because
few Usenet administrators would trust a stranger from another part of
the world with OS level access. Therefore, the solution lay in defining
a small set of common Usenet maintenance actions, and permitting only
these actions to be triggered on all servers through the passing of
special command messages, called <emphasis role="bold">control
messages</emphasis>.</para><para>Control messages look like ordinary Usenet articles, more or less.
They have an extra header line, with its value in a specific format,
but they usually carry body text which looks like a normal human-written
article. Here is a control message (a spurious one at that, but it'll
do for now):</para><programlisting format="linespecific">Xref: news.starcomsoftware.com control:814217
Path: news.starcomsoftware.com!linux594.dn.net!news.dn.hoopoo.com!
	feed-out.newsfeeds.com!newsfeeds.com!feed.newsfeeds.com!
	newsfeeds.com!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!
	newsfeed.icl.net!newsfeed.skycache.com!Cidera!newsfeed.gamma.ru!
	Gamma.RU!carrier.kiev.ua!goblin.nadrabank.kiev.ua!not-for-mail
From: tale@uunet.uu.net (David C Lawrence)
Newsgroups: news.groups,humanities.hipcrime
Subject: cmsg newgroup humanities.hipcrime
Control: newgroup humanities.hipcrime
Date: Sun, 18 Feb 2001 11:50:28 GMT
Organization: The Cabal
Lines: 20
Approved: tale@uunet.uu.net
Message-ID: &lt;3afWYZTIR.G5YOC2@uunet.uu.net&gt;
NNTP-Posting-Host: 203.145.147.67
X-Trace: goblin.nadrabank.kiev.ua 982528840 21455 203.145.147.67
         (18 Feb 2001 20:40:40 GMT)
X-Complaints-To: usenet@nadrabank.kiev.ua
NNTP-Posting-Date: 18 Feb 2001 20:40:40 GMT
X-No-Archive: Yes

humanities.hipcrime is an unmoderated newsgroup which passed its
vote for creation by 326:10 as reported in news.announce.newgroups
on 18 Feb 2001.

For your newsgroups file:
humanities.hipcrime	HipCrime for Humanity - you committed one now!

Anyone can create a newsgroup in the alt, biz, comp, earth,
humanities, misc, news, meow, rec, sci, soc, talk, us, or
any other Usenet hierarchy.  New newsgroup proposals may be
optionally discussed in news.groups. Please be sure that your
/usr/lib/news/control.ctl is configured correctly:

##	NEWGROUP MESSAGES
## honor them all and log in \${LOG}/newgroup.log
newgroup:*:alt.*|biz.*|comp.*|earth.*|humanities.*|misc.*|news.*|\
	meaw.*|rec.*|sci.*|soc.*|talk.*|us.*:doit=newgroup

##	RMGROUP MESSAGES
## drop them all and don't log
rmgroup:*:*:drop

Meow!
David C Lawrence</programlisting><para>A control message must have a <literal moreinfo="none">Control</literal>
header. Besides, all control messages <emphasis>will</emphasis>
have an <literal moreinfo="none">Approved</literal> header, like messages posted
to moderated newsgroups. The <literal moreinfo="none">Control</literal> header
actually specifies a command to run on the local server, and the
parameter(s) to supply to it. The local Usenet server software is
supposed to figure out its own way to get the task done. In this
example, the command in the <literal moreinfo="none">Control</literal> header is
<literal moreinfo="none">newgroup</literal>, which creates a new newsgroup. And its
parameter is <literal moreinfo="none">humanities.hipcrime</literal>, which gives the
name of the newsgroup to create.</para><para>In C-News, the control message implementation works through
separate shellscripts kept in a fixed directory,
<literal moreinfo="none">$NEWSBIN/ctl/</literal>, as a security measure; if the
executable script isn't present there, the control message command will
be ignored. The control message types supported are:</para><itemizedlist><listitem><para><literal moreinfo="none">checkgroups</literal>:  control message to
check whether the newsgroups listed in your active file agree
with a master list of newsgroups sent in the control message</para></listitem><listitem><para><literal moreinfo="none">newgroup</literal>: control message to create a
new newsgroup</para></listitem><listitem><para><literal moreinfo="none">rmgroup</literal>: control message to delete a
newsgroup and all articles in it</para></listitem><listitem><para><literal moreinfo="none">sendsys</literal>: control message to cause an
email response to be sent to the author with the <literal moreinfo="none">sys</literal>
file of your server in it. This results in a response storm of
emails from all the Usenet servers in the world to the author. These
responses allow the sender of the control message to analyse all the
<literal moreinfo="none">sys</literal> files of the world's Usenet servers and create the
directed graph of Usenet newsfeeds. Why someone would want to do this is
hard to guess, but the result is surely an awesome picture of one facet
of networked human civilisation, like looking at a giant world map.</para><para>Incidentally, there is no invasion of privacy here, because
your server's <literal moreinfo="none">sys</literal> file is supposed to be public
information, if you take feeds from the public Usenet.</para></listitem><listitem><para><literal moreinfo="none">version</literal>: control message which results
in your Usenet software sending an email to the author of the message,
containing the type and version of the Usenet news software you are
using. This too is not an invasion of privacy, because this information
is supposed to be public knowledge.</para></listitem><listitem><para>The cancel message: the most frequently occurring type of
control message. It specifies the message ID of an article, and results
in the cancellation (deletion) of that article. If you post an article
and regret it a moment later, your Usenet newsreader software usually
allows you to ``cancel'' it by generating a cancel message.</para></listitem></itemizedlist><para>The Usenet news software maintains a pseudo-newsgroup called
<literal moreinfo="none">control</literal>, where it files all control messages it
receives. If you have an incoming newsfeed from the public Usenet, your
server's <literal moreinfo="none">control</literal> group will usually be full of
thousands of cancel messages from trigger-happy fingers all over the
world. Usenet news server software like C-News allows you to filter the
incoming feed based on newsgroups, and will discard articles for groups
your server does not subscribe to. But since all servers have to receive and
process control messages, they will all accept these cancel messages,
though many of them may apply to articles which are not part of your
highly-pruned subset of groups. <literal moreinfo="none">C'est la vie</literal>.</para><para>Remember to set expiry for the <literal moreinfo="none">control</literal> group to
one day or even shorter, so that the junk can be cleaned out as rapidly as
possible, just like the <literal moreinfo="none">junk</literal> newsgroup.</para><para>The beauty of the control message architecture is that it
integrates seamlessly into the newsfeed mechanism for automatic control
of the network of servers. No separate channel of connection is needed
for the control actions. And article replication automatically
propagates control messages with human-readable articles, thus
guaranteeing reach across heterogeneous network technologies.</para><para>What your Usenet server does on receiving a
control message is governed by an authorisation file:
<literal moreinfo="none">$NEWSCTL/controlperms</literal> in the case of C-News
and <literal moreinfo="none">control.ctl</literal> in the case of INN, for
instance. The security measures implemented by this module are
further enhanced by the <literal moreinfo="none">pgpcontrol</literal> package with its
<literal moreinfo="none">pgpverify</literal> script. Using <literal moreinfo="none">pgpverify</literal>,
your server can check that all control messages (except for article
cancellation messages) are digitally signed by a trusted party
using strong public key cryptography. Our integrated
Usenet news software distribution includes integration with
<literal moreinfo="none">pgpverify</literal>.</para></section></section><section><title>Usenet news software</title><section><title>A brief history of Usenet systems</title><para>Towards the end of this HOWTO, we have added some information
about the history of Usenet server software by quoting sections from an
earlier Usenet Periodic Posting. We consider this historical
perspective, and the Usenix papers and other documents referred to in
it, essential reading for any Usenet server administrator. Please see
the section titled <quote><xref linkend="softwarehistory"></xref></quote>.</para></section><section><title>C-News and NNTPd</title><para>C-News was written by Henry Spencer and Geoff Collyer of the
Department of Zoology, University of Toronto, almost entirely in shell
and <literal moreinfo="none">awk</literal>, as a replacement for an earlier system called
B-News. The focus was on adding some extra features and a
lot of performance. The first release was called Shellscript Release,
which was deployed by a very large number of servers worldwide, as
a natural upgrade to B-News.  This version of C-News had upward
compatibility with B-News meta-data, <emphasis>e.g.</emphasis> history
files. This was the version of C-News which was initially rolled out
in 1991 or so at the National Centre for Software Technology (NCST,
<literal moreinfo="none">http://www.ncst.ernet.in</literal>) and the Indian Institutes
of Technology in India as part of the Indian educational and research
network (ERNET). We received guidance from the NCST about Usenet news
installation and management.</para><para>The Shellscript Release was soon followed by a re-write with a lot
more C code, called Performance Release, and then a set of cleanup and
component integration steps leading to the last release called the Cleanup
Release. This Cleanup Release was patched many times by the authors,
and the last one was CR.G (Cleanup Release revision G). The version of
C-News discussed in this HOWTO is a set of small bug fixes on CR.G.</para><para>Since C-News came from shellscript-based antecedents, its
architecture followed the set-of-programs style so typical of Unix,
rather than the large monolithic software systems traditional to some other
OSs. All pieces had well-defined roles, and therefore could be easily
replaced with other pieces as needed. This allowed easy adaptations and
upgrades. This never affected performance, because key components
which did a lot of work at high speed, <emphasis>e.g.</emphasis>
<literal moreinfo="none">newsrun</literal>, had been rewritten in C by that time. Even
within the shellscripts, crucial components which handled binary data,
<emphasis>e.g.</emphasis> a component called <literal moreinfo="none">dbz</literal>
to manipulate efficient on-disk hash arrays, were C programs with
command-line interfaces, called from scripts.</para><para>C-News was born in a world with widely varying network line speeds,
where bandwidth utilisation was a big issue and dialup links with UUCP
file transfers were common. Therefore, it has strong support for
batched feeds, especially with a variety of compression techniques and
over a variety of fast and slow transport channels. And C-News virtually
does not know the existence of TCP/IP, other than one or two tiny batch
transport programs like <literal moreinfo="none">viarsh</literal>. However, its design
was so modular that there was absolutely no problem in plugging in NNTP
functionality using a separate set of C programs without modifying
a single line of C-News. This was done by a program suite called
NNTP Reference Implementation, which we call NNTPd.</para><para>This software suite could work with B-News and C-News article
repositories, and provided the full NNTP functionality.  Since B-News
died a gradual death, the combination of C-News and NNTPd became a freely
redistributable, portable, modern, extensible, and high-performance
software suite for Unix Usenet servers.  Further refinements were
added later, <emphasis>e.g.</emphasis> <literal moreinfo="none">nov</literal>, the News
Overview package and <literal moreinfo="none">pgpverify</literal>, a public-key-based
digital signature module to protect Usenet news servers against
fraudulent control messages.</para></section><section><title>INN</title><para>INN is one of the two most widely used Usenet news server solutions. It
was written by Rich Salz for Unix systems which have a socket API ---
probably all Unix systems do, today.</para><para>INN has an architecture diametrically opposite to that of C-News. It is a
monolithic program, which is started at bootup time, and keeps running
till your server OS is shut down. This is like the way high performance
HTTP servers are run in most cases, and allows INN to cache a lot of
things in its memory, including message-IDs of recently posted messages,
<emphasis>etc.</emphasis> This architecture has been discussed
in an interesting paper by the author, where he explains the problems
of the older B-News and C-News systems that he tried to address. Anyone
interested in Usenet software in general and INN in particular should
study this paper.</para><para>INN addresses a Usenet news world which revolves around NNTP, though it
has support for UUCP batches --- a fact that not many INN administrators 
seem to talk about. INN works faster than the combination of C-News and NNTPd when
processing multiple parallel incoming NNTP feeds. For multiple readers
reading and posting news over NNTP, there is no difference between the
efficiency of INN and NNTPd. <xref linkend="innefficiency"></xref> discusses
the efficiency issues of INN over the earlier C-News architecture, based
on Rich Salz' paper and our analyses of usage patterns.  </para><para>INN's architecture has inspired a lot of high-performance Usenet news
software, including a lot of commercial systems which address the
``carrier class'' market. That is the market for which the INN
architecture has clear advantages over C-News.</para></section><section><title>Leafnode</title><para>This is an interesting software system, to set up a ``small'' Usenet
news server on one computer which only receives newsfeeds but does not
have the headache of sending out bulk feeds to other sites,
<emphasis>i.e.</emphasis> it is a ``leaf node'' in the newsfeed flow
diagram. According to its homepage (<literal moreinfo="none">www.leafnode.org</literal>),
``Leafnode is a USENET software package designed for small sites running
any flavour of Unix, with a few tens of readers and only a slow link
to the net. [...] The current version is 1.9.24.''</para><para>This software is a sort of combination of article repository and
NNTP news server, and receives articles, digests and stores them on the
local hard disks, expires them periodically, and serves them to an NNTP
reader. It is claimed that it is simple to manage and is ideal for
installation on a desktop-class Unix or Linux box, since it does not
take up many resources.</para><para>Leafnode is based on an appealing idea, but we find no problem
using C-News and NNTPd on a desktop-class box. The resource consumption of C-News is
roughly proportional to the volume of articles you want it to process,
and the number of groups you'll want to retain for a small team of users
will be easily handled by C-News on a desktop-class computer. An office
of a hundred users can easily use C-News and NNTPd on a desktop computer
running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk
space. Of course, ease of configuration and management is dependent on
familiarity, and we are more familiar with C-News than with Leafnode. We
hope this HOWTO will help you in that direction.</para><para>There <emphasis>is</emphasis>, however, one area in which Leafnode
is far easier to administer than INN or C-News. Leafnode constantly
monitors the actual usage of the newsgroups it carries, based on
readership statistics of its NNTP readers. If a particular newsgroup
is not read at all by any user for a week, then Leafnode will delete
all articles in that newsgroup, free up disk space, and stop fetching
new articles for it. If it finds that a previously abandoned newsgroup
is now again receiving attention, even from one user, then it'll fetch
all articles for that group from its upstream server the next time it
connects. This self-tuning feature of Leafnode is really an excellent
advantage which makes a Leafnode site easier to manage, specially for
small setups with bandwidth and disk space constraints.</para><para>The Leafnode Website gives a lot of details in an easily
understood format.</para><para>TO BE EXTENDED AND CORRECTED.</para></section><section><title>Suck</title><para>Suck is a program which lets you pull out an NNTP feed from an NNTP
server and file it locally. It does not contain any article repository
management software, expecting you to do it using some other
software system, <emphasis>e.g.</emphasis> C-News or INN.  It can
create batchfiles which can be fed to C-News, for instance. (Well,
to be fair, Suck <emphasis>does</emphasis> have an option to store the
fetched articles in a spool directory tree very much like what is used
by C-News or INN in their article area, with one file per article. You
can later read this raw message spool area using a mail client which
supports the <literal moreinfo="none">msgdir</literal> file layout for mail folders,
like MH, perhaps. We don't find this option useful if you're running
Suck on a Usenet server.)  Suck finally boils down to a single
command-line program which is invoked periodically, typically from
<literal moreinfo="none">cron</literal>. It has a zillion command-line options which
are confusing at first, but later show how mature and finely tunable
the software is.</para><para>If you need an NNTP pull feed, then we know of no better program
than Suck for the job. The <literal moreinfo="none">nntpxfer</literal> program which
forms part of the NNTPd package also implements an NNTP pull feed, for
instance, but does not have one-tenth of the flexibility and fine-tuning
of Suck. One of the banes of the NNTP pull feed is connection timeouts;
Suck allows a lot of special tuning to handle this problem.  If we had
to set up a Usenet server with an NNTP pull feed, we'd use Suck right
away.</para><para>TO BE EXTENDED AND CORRECTED.</para></section><section><title>Carrier class software</title><para>Carrier-class servers are expected to handle a complete feed of all
articles in all newsgroups, including a lot of groups which have what we
call a ``high noise-to-signal ratio.'' They do not have the luxury of
choosing a ``useful'' subset like administrators of internal corporate
Usenet servers do. Secondly, carrier-class servers are expected to turn
articles around very fast, <emphasis>i.e.</emphasis> they are expected to
have very low latency from the moment they receive an article to the
time they retransmit it by NNTP to downstream servers. Third, they
are supposed to provide very high availability, like other ``carrier
class'' services. This usually means that they have parallel arrays of
computers in load sharing configurations. And fourth, they usually do
not cater to retail connections for reading and posting articles by human
users. Usenet news carriers usually reserve separate computers to handle
retail connections.</para><para>Thus, carrier-class servers do not need to maintain a repository
of articles; they only need to focus on super-efficient real-time
re-transmission. These highly specialised servers have software which
receives an article over NNTP, parses it, and immediately re-queues it for
outward transmission to dozens or hundreds of other servers. And since
they work at these high throughputs, their downstream servers are also
expected to be live on the Internet round the clock to receive incoming
NNTP connections, or be prepared to lose articles. Therefore, there's
no batching or long queueing needed, and C-News-style batching in fact
is totally inapplicable.</para><para>Therefore, these carrier-class Usenet servers are more like packet
routers than servers with repositories. They are referred to nowadays as
NNTP routers or news routers.</para><para>It can be seen why batch-oriented repository management
software like C-News is a total anachronism here, and why they need an
NNTP-oriented, online, real-time design. The INN antecedents of some
of these systems are therefore natural. We would love to hear from any
Linux HOWTO reader whose Usenet server requirements include carrier-class
behaviour.</para><para>We are aware of only one freely redistributable NNTP router:
NNTPRelay (see <literal moreinfo="none">http://nntprelay.maxwell.syr.edu/</literal>); this
software runs on NT. There is no reason why such services cannot run off
Linux servers, even Intel Linux, provided you have fast network links and
arrays of servers. Linux as an OS platform is not an issue here.</para><para>TO BE EXTENDED AND CORRECTED.</para></section></section><section id="settingup" xreflabel="Setting up C News + NNTPd"><title>Setting up CNews + NNTPd</title><section><title>Getting the sources and stuff</title><section><title>The sources</title><para>C-News software can be obtained from
<literal moreinfo="none">ftp://ftp.uu.net/networking/news/transport/cnews/cnews.tar.Z</literal>
and will need to be uncompressed using the BSD
<literal moreinfo="none">uncompress</literal> utility or a compatible program. The
tarball is about 650 KBytes in size. It has its own highly intelligent
configuration and installation processes, which are very well
documented. The version that is available is Cleanup Release revision G,
on which our own version is based.</para><para>NNTPd (the NNTP Reference Implementation) is available from
<literal moreinfo="none">ftp://ftp.uu.net/networking/news/nntp/nntp.1.5.12.1.tar.Z</literal>.
It has no automatic scripts and processes to configure itself. After
fetching the sources, you will have to follow a set of directions given
in the documentation and configure some C header files. These
configuration settings must be done keeping in mind what you have
specified when you build the C-News sources, because NNTPd and C-News
must work together. Therefore, some key file formats, directory paths,
<emphasis>etc.</emphasis>, will have to be specified identically in both
software systems.</para><para>The third software system we use is Nestor. This too is to be
found in the same place where the NNTPd software is kept, at
<literal moreinfo="none">ftp://ftp.uu.net/networking/news/nntp/nestor.tar.Z</literal>.
This software compiles to one binary program, which must be run
periodically to process the logs of <literal moreinfo="none">nntpd</literal>, the NNTP
server which is part of NNTPd, and report usage statistics to the
administrator. We have integrated Nestor into our source base.</para><para>The fourth piece of the system, without which no Usenet server
administrator dares venture out into the wild world of public Internet
newsfeeds, is <literal moreinfo="none">pgpverify</literal>.</para><para>We have been working with C-News and NNTPd for many years now, and
have fixed a few bugs in both packages. We have also integrated the four
software systems listed above, and added a few features here and there to
make things work more smoothly. We offer our entire source base to
anyone for free download from
<literal moreinfo="none">http://www.starcomsoftware.com/proj/usenet/src/news.tar.gz</literal>.
There are no licensing restrictions on our sources; they are as freely
redistributable as the original components we started with.</para><para>When you download our software distribution, you will extract it
to find a directory tree with the following subdirectories and files:</para><itemizedlist><listitem><para><literal moreinfo="none">c-news</literal>: the source tree of the CR.G
    software release, with our additions like
    <literal moreinfo="none">pgpverify</literal> integration, our scripts like
    <literal moreinfo="none">mail2news</literal>, and pre-created configuration
    files.
    </para></listitem><listitem><para><literal moreinfo="none">nntp-1.5.12.1</literal>: the source tree of the
    original NNTPd release, with header files pre-configured to fit in
    with our configuration of C-News, and our addition of bits and
    pieces like Nestor, the log analysis program.
    </para></listitem><listitem><para><literal moreinfo="none">howto</literal>: this document, and its SGML
    sources and Makefile.
    </para></listitem><listitem><para><literal moreinfo="none">build.sh</literal>: a shellscript you can run
    to compile the entire combined source tree and install binaries in the
    right places, if you are lucky and all goes well.
    </para></listitem></itemizedlist><para>Needless to say, we believe that our source tree is a better
place to start with than the original components, specially if you
are installing a Usenet server on a Linux box and for the first time.
We will be available on email to provide technical assistance should
you run into trouble.</para></section><section><title>The key configuration files</title><para>Once you get the sources, you will need some key configuration
files to seed your C-News system. These configuration files are
actually database tables, and are changing frequently, whenever
newsgroups are created, modified or deleted. These files specify
the list of active newsgroups in the ``public'' Usenet. You can,
and should, add your organisation's internal newsgroups to this
list when you set up your own server, but you will need to know
the list of public standard newsgroups to begin with. This list
can be obtained from the same FTP server by downloading the files
<literal moreinfo="none">active.gz</literal> and <literal moreinfo="none">newsgroups.gz</literal> from
<literal moreinfo="none">ftp://ftp.uu.net/networking/news/config/</literal>. You
can create your own <literal moreinfo="none">active</literal> and
<literal moreinfo="none">newsgroups</literal> files by retaining a subset of the entries
in these two files. Both these are ASCII text files.</para><para>Getting the sources from our server will not obviate the need to
get the latest versions of these files from
<literal moreinfo="none">ftp.uu.net</literal>. We do not (yet) maintain an up-to-date
copy of these files on our server, and we would add no value to the
original by just mirroring them.</para></section></section><section><title>Compiling and installing</title><para>    Before installing, make sure you have an entry for a user called 
    <literal moreinfo="none">news</literal> in your <literal moreinfo="none">/etc/passwd</literal> file. This
    user is the news-database owner. Now download
    the source from us and untar it in the home directory of <literal moreinfo="none">news</literal>. This creates
    two main directories, <emphasis>viz.</emphasis> <literal moreinfo="none">c-news</literal>
    and <literal moreinfo="none">nntp</literal>. 
    To compile and install, run the script <literal moreinfo="none">build.sh</literal> as root
    in the directory that contains the script. It is important that the script 
    run as <literal moreinfo="none">root</literal>, because it sets ownerships and then compiles and 
    installs the source as user <literal moreinfo="none">news</literal>. This
    is a one-step process that puts in place both the C-News and the
    NNTP software, setting correct permissions and paths.
    The following
    is a brief description of what <literal moreinfo="none">build.sh</literal> does:</para><itemizedlist><listitem><para> 
    Checks the <literal moreinfo="none">OS</literal> platform and exits if
    it is not <literal moreinfo="none">Linux</literal>.</para></listitem><listitem><para> 
    Exits if you are not running as
    <literal moreinfo="none">root</literal>.</para></listitem><listitem><para>    Looks for the above two directories and exits if it cannot find them.</para></listitem><listitem><para> 
    Compiles <literal moreinfo="none">C-News</literal> and performs regression tests if the 
    compilation was successful. Compilation errors are written to a file called
    <literal moreinfo="none">make.out</literal>. The script also warns you to read the error file 
    <literal moreinfo="none">make.out.r</literal> and to fix any problems reported there.
</para></listitem><listitem><para> 
    Performs the above operation in the <literal moreinfo="none">nntp</literal> directory, too. </para></listitem><listitem><para> 
    Checks for the presence of the three key directories:
    <literal moreinfo="none">$NEWSARTS</literal> (<literal moreinfo="none">/var/spool/news</literal>), which houses the articles; 
    <literal moreinfo="none">$NEWSCTL</literal> (<literal moreinfo="none">/var/lib/news</literal>), which contains
    configuration, log and status files; and <literal moreinfo="none">$NEWSBIN</literal>
    (<literal moreinfo="none">/usr/lib/newsbin</literal>), which contains the binaries and
    executables needed for
    the working of the Usenet News system. Tries to create them if they do not exist
    and exits if that fails.</para></listitem><listitem><para> 
    Changes the ownership of these directories to <literal moreinfo="none">news.news</literal>.
    This is important since the entire Usenet News System runs as user <literal moreinfo="none">news</literal>. It
    will not function properly as any other user. </para></listitem><listitem><para> 
    Then starts the installation process of C News. It runs
    <literal moreinfo="none">make install </literal>to install binaries at the right locations; 
    <literal moreinfo="none">make setup </literal>to set the correct paths and umask, create 
    directories for newsgroups, determine who will receive reports; 
    <literal moreinfo="none">make ui</literal> to set up inews and injnews and 
    <literal moreinfo="none">make readpostcheck </literal>to use readnews, postnews and 
    checknews scripts provided by C News. Errors, if any, are written to
    the respective <literal moreinfo="none">make.out</literal> files; for example, <literal moreinfo="none">make setup</literal> writes
    its errors to <literal moreinfo="none">make.out.setup</literal>.</para></listitem><listitem><para> 
    Makes <literal moreinfo="none">newsspool</literal>, which queues incoming
    batches in the <literal moreinfo="none">$NEWSARTS/in.coming</literal> directory, run as
    set-userid and set-groupid, as it must.</para></listitem><listitem><para> 
    Creates a softlink to <literal moreinfo="none">/var/lib/news</literal> from 
    <literal moreinfo="none">/usr/lib/news</literal>.	</para></listitem><listitem><para> 
    Installs the NNTP software. </para></listitem><listitem><para> 
    Sets up the manpages for C News and makes them world
    readable. The NNTP manpages get installed when the software is installed.
    Compiles the C News documentation <literal moreinfo="none">guide.ps</literal> and makes it 
    readable and available in <literal moreinfo="none">/usr/doc/packages/news</literal> or
    <literal moreinfo="none">/usr/doc/news</literal>.</para></listitem><listitem><para> 
    Checks for the PGP binary and asks the administrator to get
    it, if not found.</para></listitem></itemizedlist></section><section id="configuresystem"><title>Configuring the system: What and how to configure files?</title><para>Once the software is installed, you have to configure the system to accept feeds and 
batch them for your neighbours. You will have to do the following:</para><itemizedlist><listitem><para><literal moreinfo="none">nntpd</literal>:  
    Copy the compiled <literal moreinfo="none">nntpd</literal> into a directory where
    executables are kept and activate it. It runs on port 119 as a daemon
    through <literal moreinfo="none">inetd</literal> unless you have compiled it as stand-alone.
    An entry in the <literal moreinfo="none">/etc/services</literal> file for nntp would look 
    like this:
    <screen format="linespecific">nntp	119/tcp    # Network News Transfer Protocol</screen>
    An entry in the <literal moreinfo="none">inetd.conf</literal> file will be:
    <screen format="linespecific"> nntp    stream    tcp    nowait   news    path-to-tcpd  path-to-nntpd </screen>

    The last two fields in the <literal moreinfo="none">inetd.conf</literal> entry are the paths to 
    the binaries of the <literal moreinfo="none">tcpd</literal> and <literal moreinfo="none">nntpd</literal> 
    daemons respectively.</para></listitem><listitem><para><emphasis role="bold">Configuring control files:</emphasis>  
    There are plenty of control files in <literal moreinfo="none">$NEWSCTL</literal> that will
    need to be configured before you can
    start using the news system.  The files mentioned here are also discussed
    in the first section of the chapter titled 
    <quote><xref linkend="component"></xref></quote>. These control files are 
    dealt with in detail below.</para><itemizedlist><listitem><para><literal moreinfo="none">sys</literal>:
	One line per system/NDN, listing all the
	newsgroup hierarchies each system subscribes to. Each line is prefixed
	with the system name, and the one beginning with
	<literal moreinfo="none">ME:</literal> indicates what your
	server is willing to receive. The following are typical entries that go into
	this file: <screen format="linespecific">ME:comp,news,misc,netscape</screen>
	This line indicates which newsgroups your server has subscribed to.
	<screen format="linespecific">server/server.starcomsoftware.com:all,!general/all:f</screen>
        This line lists the newsgroups your server will pass on to your NDN.
	The newsgroups should be specified as a comma-separated list, and the entire
	line should contain no spaces. The <emphasis>f</emphasis> flag indicates
	that the newsgroup name and the article number, along with the article's size, will 
	make up one entry in the <literal moreinfo="none">togo</literal> file in the 
	<literal moreinfo="none">$NEWSARTS/out.going</literal> directory.
    </para></listitem><listitem><para><literal moreinfo="none">explist</literal>: 
	This file has entries indicating which articles expire, when, and  
	whether they have to be archived. The order in which the newsgroups are
	listed is important. An example follows:
	<screen format="linespecific">comp.lang.java.3d    x    60    /var/spool/news/Archive</screen>
	This means that the articles of <literal moreinfo="none">comp.lang.java.3d</literal> expire after 60 days and
	are archived in the directory mentioned in the fourth field. 
	Archiving is optional. The second field indicates that this line 
	applies to both moderated and unmoderated newsgroups;
	<emphasis>m</emphasis> would 
	specify moderated and <emphasis>u</emphasis> would specify unmoderated
	groups. If you want an extremely long expiry
	period, you can use the keyword <quote>never</quote> instead of a number of days. 
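	As a slightly fuller sketch (the group names, periods and ordering below are
	illustrative assumptions, not taken from a working server), an
	<literal moreinfo="none">explist</literal> that archives one group, keeps a local hierarchy
	indefinitely and falls back to a default for everything else might look like
	the following. It assumes that the first matching line is the one used, which
	is why the catch-all <literal moreinfo="none">all</literal> line comes last, and that a
	<literal moreinfo="none">-</literal> in the fourth field means no archiving; check both
	points against the <literal moreinfo="none">explist</literal> documentation shipped with your sources.
	<screen format="linespecific">comp.lang.java.3d    x    60       /var/spool/news/Archive
mysite               x    never    -
all                  x    14       -</screen>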
    </para></listitem><listitem><para><literal moreinfo="none">batchparms</literal>:
	<literal moreinfo="none">sendbatches</literal> is a program that
	administers batched transmission of news articles to other sites. To do
	this it consults the <literal moreinfo="none">batchparms</literal> file. Each line in 
	the file specifies the behaviour for one of the NDNs mentioned in the
	<literal moreinfo="none">sys</literal> file. Five fields have to be
	specified for each site.</para><screen format="linespecific"> server   u     100000    100    batcher | gzip -9 | viauux -d gunzip </screen><para>	The first field is the site name, which matches the entry in the 
	<literal moreinfo="none">sys</literal> file and has a corresponding directory in 
	<literal moreinfo="none">$NEWSARTS/out.going</literal> by that name. 
    </para><para>	
	The second field is the class of the site: <emphasis>u</emphasis> for 
	UUCP and <emphasis>n</emphasis> for NNTP feeds. A <quote>!</quote> in 
	this field means that batching for this site has been disabled. 
    </para><para>	The third field is the size, in bytes, of the batches to be prepared.
    </para><para>	
	The fourth field is the maximum length of the output queue for 
	transmission to that site.
    </para><para>	The fifth field is the command line to be used to build, compress and
	transmit batches to that site. The contents of the 
	<literal moreinfo="none">togo </literal> file are made available on standard input.
    </para></listitem><listitem><para><literal moreinfo="none">controlperm</literal>:
	This file controls how the news
	system responds to control messages. Each line consists of four or five fields
	separated by white space. Control messages are discussed in 
	<quote><xref linkend="controlmsg"></xref></quote>.
	</para><screen format="linespecific">comp,sci    tale@uunet.uu.net   nrc    pv   news.announce.newsgroups</screen><para>	The first field is a newsgroup pattern to which the line applies.
    </para><para>	The second field is either the keyword <quote>any</quote> or an e-mail 
	address. The latter specifies that the line applies to control messages
	from only that author.
    </para><para>	The third field is a set of opcode letters indicating which control
	operations the line applies to, for messages coming from the e-mail
	address mentioned in the second field: <emphasis>n</emphasis> stands for
	<literal moreinfo="none">newgroup</literal> (creating a newsgroup), <emphasis>r</emphasis> for <literal moreinfo="none">rmgroup</literal> (deleting a 
	newsgroup) and <emphasis>c</emphasis> for <literal moreinfo="none">checkgroups</literal>. 
    </para><para>	The fourth field is a set of flag letters indicating how to respond to
	a control message that meets all the applicability tests:
	<screen format="linespecific">	     y 	Do it.
	     n	Don't do it.
	     v 	Report it and include the entire control
	        message in the report.
	     q 	Don't report it.
	     p	Do it only if the control message carries a valid PGP signature. 
	</screen>
	Exactly one of y, n or p must be present.
    </para><para>	The fifth field, which is optional, will be used if the fourth field
	contains a <emphasis>p</emphasis>. It must contain the PGP key ID of the
	public key to be used for signature verification. 
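	As a hedged sketch only, a cautious configuration could honour signed
	messages from one known author and refuse, but report, everything else. The
	second line below is our own illustration, and it assumes that lines are
	checked in order so that the first applicable line decides; verify this
	against the <literal moreinfo="none">controlperm</literal> manpage before relying on it.
	<screen format="linespecific">comp,sci    tale@uunet.uu.net   nrc    pv   news.announce.newsgroups
all         any                 nrc    nv</screen>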
    </para></listitem><listitem><para><literal moreinfo="none">mailpaths</literal>: 
	This file describes how to reach
	the moderators of various hierarchies of newsgroups by mail. Each line
	consists of two fields: a news group pattern and an e-mail address. The
	first line whose group pattern matches the newsgroup is used. As an
	example:

       <screen format="linespecific">	   comp.lang.java.3d		somebody@mydomain.com
	   all				%s@moderators.uu.net
      </screen>

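	In the second line, the <literal moreinfo="none">%s</literal> gets replaced with the
	group name, with all dots in the newsgroup name replaced 
	by dashes. A posting to <literal moreinfo="none">comp.lang.c.moderated</literal>, for instance, would be mailed to <literal moreinfo="none">comp-lang-c-moderated@moderators.uu.net</literal>.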
    </para></listitem><listitem><para><emphasis role="bold">Miscellaneous files:</emphasis>
	The other files to be modified are:
	<itemizedlist><listitem><para><literal moreinfo="none">mailname:</literal>
	    Contains the Internet domain name of the
	    news system.  Consider getting one if you don't have it.
	</para></listitem><listitem><para><literal moreinfo="none">organization:</literal> 
	    Contains the default value for the <literal moreinfo="none">Organization:</literal>
	    header for postings originating locally.
	</para></listitem><listitem><para><literal moreinfo="none">whoami:</literal>
	    Contains the name of the news system. This is the site name used in
	    the <literal moreinfo="none">Path:</literal> headers and hence should concur with
	    the names your neighbours use in their <literal moreinfo="none">sys</literal> files.
	</para></listitem></itemizedlist>
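	Each of these is a short text file in <literal moreinfo="none">$NEWSCTL</literal>. A minimal
	sketch of creating them follows, run as user <literal moreinfo="none">news</literal> and
	assuming <literal moreinfo="none">$NEWSCTL</literal> is <literal moreinfo="none">/var/lib/news</literal>.
	The domain, organisation and site names are placeholders, not real values.
	<screen format="linespecific">echo "news.example.com"     &gt; /var/lib/news/mailname
echo "Example Organisation" &gt; /var/lib/news/organization
echo "examplesite"          &gt; /var/lib/news/whoami</screen>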
    </para></listitem><listitem><para><literal moreinfo="none">active </literal>file:
	This file has one line for each
	newsgroup (not just the hierarchy) to be found on your news system. You
	will have to get the most recent copy of the active file from 
	<literal moreinfo="none">ftp://ftp.isc.org/usenet/CONFIG/active</literal> and prune it
	to delete newsgroups that you have not subscribed to. Run the script 
	<literal moreinfo="none">addgroup</literal> for each newsgroup in this file; it will 
	create the relevant directories in the <literal moreinfo="none">$NEWSARTS</literal> area. 
	The <literal moreinfo="none">addgroup</literal> script takes
	two parameters: the name of the newsgroup being created and a flag. The flag can
	be any one of the following:
	<screen format="linespecific">	    y		local postings are allowed
	    n 		no local postings, only remote ones
	    m		postings to this group must be approved
	                by the moderator
	    j		articles in this group are only passed and not kept
	    x		posting to this newsgroup is disallowed
	    =foo.bar	articles are locally filed in
	                "foo.bar" group
	</screen>

	An entry in this file looks like this:

	 <screen format="linespecific">comp.lang.java.3d	0000003716	01346	m </screen>

	The first field is the name of the newsgroup. The second field is the
	highest article number that has been used in that newsgroup. The
	third field is the lowest article number in the group. The fourth
	field is a flag as explained above.
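	One way to run <literal moreinfo="none">addgroup</literal> over a whole list is a small shell
	loop. This is a sketch under assumptions: the pruned list is kept in a
	scratch file (the name <literal moreinfo="none">active.pruned</literal> is hypothetical) in
	the four-field format shown above, and the loop is run as user
	<literal moreinfo="none">news</literal>.
	<screen format="linespecific"># read "group high low flag" from each line and create the group
while read group high low flag
do
    cnewsdo addgroup "$group" "$flag"
done &lt; active.pruned</screen>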
    </para></listitem><listitem><para><literal moreinfo="none">newsgroups </literal>file:
	This contains a one-line description
	of each newsgroup to be found in the active file. You will have to
	get the most recent file from
	<literal moreinfo="none">ftp://ftp.isc.org/usenet/CONFIG/newsgroups</literal> 
	and prune it to remove unwanted information. As an example:

	<screen format="linespecific">comp.lang.java.3d 	3D Graphics APIs for the Java language</screen>
    </para></listitem><listitem><para><emphasis role="bold">Aliases: </emphasis>
	These aliases are required for trouble reporting. 
	Once the system is in place and the scripts are running, anomalies and problems
	are reported to addresses in the <literal moreinfo="none">/etc/aliases</literal> file. 
	These entries include email addresses for <literal moreinfo="none">newsmaster, 
	newscrisis, news, usenet,  newsmap</literal>.
	They should ideally point to an email address that is 
	read regularly.  Arrange for mail sent to 
	<literal moreinfo="none">newsmap</literal> to be discarded, to minimize the effect of 
	<literal moreinfo="none">sendsys bombing</literal> by practical jokers.
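	A minimal sketch of the corresponding <literal moreinfo="none">/etc/aliases</literal> entries
	follows. Here <literal moreinfo="none">admin</literal> is a placeholder for whichever local
	mailbox is actually read, and mail for <literal moreinfo="none">newsmap</literal> is thrown
	away. On Sendmail systems, remember to rebuild the alias database with
	<literal moreinfo="none">newaliases</literal> afterwards.
	<screen format="linespecific">newsmaster:   admin
newscrisis:   admin
news:         admin
usenet:       admin
newsmap:      /dev/null</screen>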
    </para></listitem><listitem><para><emphasis role="bold">Cron jobs:</emphasis> 
	Certain scripts, like <literal moreinfo="none">newsrun</literal>, which picks up incoming 
	batches, and the maintenance scripts, should run from the crontab of the news-database 
	owner, which is <literal moreinfo="none">news</literal>. Ideally, there
	will be cron entries for the following scripts; a more detailed description can be found in 
	<quote><xref linkend="cronjobs"></xref></quote>. 
	<orderedlist inheritnum="ignore" continuation="restarts"><listitem><para><literal moreinfo="none">newsrun: </literal>
	    This script processes incoming batches of
	    articles.  Run this as frequently as you want them to get digested.
	</para></listitem><listitem><para><literal moreinfo="none">sendbatches:</literal>
	    This script transmits batches to the
	    NDNs. Set the frequency according to your requirements.
	</para></listitem><listitem><para><literal moreinfo="none">newsdaily:</literal>
	    This should ideally be run once a day,
	    since it reports errors and anomalies in the news system.
	</para></listitem><listitem><para><literal moreinfo="none">newswatch:</literal>
	    This looks for errors/anomalies at a more detailed level and hence
	    should be run at least once every hour.
	</para></listitem><listitem><para><literal moreinfo="none">doexpire:</literal>
	    This script expires old articles as
	    determined by the explist file. Run this once a day.
	</para></listitem></orderedlist>
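	As an illustration only, a crontab for user <literal moreinfo="none">news</literal> along
	the following lines would cover the five jobs. The times are arbitrary, and
	the sub-directories under <literal moreinfo="none">$NEWSBIN</literal> are assumed to follow
	the usual C News layout; check where the scripts actually landed on your
	system before copying this.
	<screen format="linespecific"># min     hour  dom  mon  dow  command
10,30,50  *     *    *    *    /usr/lib/newsbin/input/newsrun
40        *     *    *    *    /usr/lib/newsbin/batch/sendbatches
15        *     *    *    *    /usr/lib/newsbin/maint/newswatch
10        8     *    *    *    /usr/lib/newsbin/maint/newsdaily
59        0     *    *    *    /usr/lib/newsbin/expire/doexpire</screen>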
    </para></listitem><listitem><para><literal moreinfo="none">newslog: </literal>
	Make an entry in the system's <literal moreinfo="none">syslog.conf</literal>
	file so that messages logged by <literal moreinfo="none">nntpd</literal> go into 
	<literal moreinfo="none">newslog</literal>, which should be located in 
	<literal moreinfo="none">$NEWSCTL</literal>. The entry will look like this:

	<screen format="linespecific">news.debug		-/var/lib/news/newslog</screen>
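	After editing the file, the log file has to exist and
	<literal moreinfo="none">syslogd</literal> has to re-read its configuration. A sketch, run as
	<literal moreinfo="none">root</literal>, assuming <literal moreinfo="none">$NEWSCTL</literal> is
	<literal moreinfo="none">/var/lib/news</literal> and that your <literal moreinfo="none">syslogd</literal>
	records its process ID in <literal moreinfo="none">/var/run/syslogd.pid</literal> (this path
	varies between distributions):
	<screen format="linespecific">touch /var/lib/news/newslog
chown news:news /var/lib/news/newslog
kill -HUP `cat /var/run/syslogd.pid`</screen>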
    </para></listitem><listitem><para><literal moreinfo="none">newsboot: </literal>
	Have this run (as <literal moreinfo="none">news</literal>, the
	news-database owner) when the system boots, to clear out debris left
	around by crashes.
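	On many Linux systems this can be done from a local boot script. A sketch,
	assuming your distribution runs <literal moreinfo="none">/etc/rc.d/rc.local</literal> at boot
	and that <literal moreinfo="none">newsboot</literal> was installed under
	<literal moreinfo="none">$NEWSBIN/maint</literal> (both are assumptions; adjust the paths to
	your system), would add the following to that script:
	<screen format="linespecific"># clear out news-system debris left by a crash; run as news, not root
su news -c /usr/lib/newsbin/maint/newsboot</screen>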
    </para></listitem><listitem><para>Add a Usenet mailer in sendmail: 
	The <literal moreinfo="none">mail2news</literal> program provided as 
	part of the source code is a handy tool to send an e-mail to a newsgroup
	which gets digested as an article. You will have to add the following 
	ruleset and mailer definition in your <literal moreinfo="none">sendmail.cf </literal>
	file:</para><itemizedlist><listitem><para>Under SParse1, add the following:
	    <screen format="linespecific">	    R$+ . USENET &lt; @ $=w . &gt;      $#usenet     $: $1
	    </screen>
	</para></listitem><listitem><para>Under mailer definitions, define the mailer Usenet as:
	<screen format="linespecific">	    MUsenet 	 P=/usr/lib/newsbin/mail2news/m2nmailer, F=lsDFMmn, 
		S=10, R=0, M=2000000, T=X-Usenet/X-Usenet/X-Unix, A=m2nmailer $u
	</screen>
	</para></listitem></itemizedlist><para>To send a mail to a newsgroup, you will now have to suffix
	the newsgroup name with <literal moreinfo="none">.usenet</literal>, <emphasis>i.e.</emphasis> your 
	<literal moreinfo="none">To:</literal>  header will look like this:
	<screen format="linespecific">To: misc.test.usenet@yourdomain.</screen>
	The <literal moreinfo="none">usenet</literal> mailer definition will intercept this mail and post it to
	the respective newsgroup, in this case <literal moreinfo="none">misc.test</literal>.
	</para></listitem></itemizedlist><para>This, more or less, completes the configuration part.</para></listitem></itemizedlist></section><section><title>Testing the system</title><para>To locally test the system, follow the steps given below:</para><itemizedlist><listitem><para>post an article: 
    Create a local newsgroup
    <screen format="linespecific">    cnewsdo addgroup mysite.test y
    </screen>
    and using <literal moreinfo="none">postnews </literal>post an article to it.</para></listitem><listitem><para>Has it arrived in <literal moreinfo="none">$NEWSARTS/in.coming</literal>?:
    The article should show up in the directory mentioned. Note how
    the article file is named.</para></listitem><listitem><para>When newsrun runs: 
    When <literal moreinfo="none">newsrun</literal> runs from <literal moreinfo="none">cron</literal>, the article disappears from the
    <literal moreinfo="none">in.coming</literal> directory and appears in 
    <literal moreinfo="none">$NEWSARTS/mysite/test</literal>. Look at how
    the <literal moreinfo="none">newsgroup, active, log and history </literal>(not the errorlog)
    files, and the <literal moreinfo="none">.overview </literal>file in
    <literal moreinfo="none">$NEWSARTS/mysite/test</literal>, reflect the digestion of the article
    into the news system.</para></listitem><listitem><para>reading the article: 
    Try to read the article through <literal moreinfo="none">readnews</literal> or any 
    news client. If you are able to, then you have set almost everything up correctly.</para></listitem></itemizedlist></section><section><title><literal moreinfo="none">pgpverify</literal> and <literal moreinfo="none">controlperm</literal></title><para>    As mentioned in <quote><xref linkend="controlmsg"></xref></quote>, it becomes 
    necessary to authenticate control messages to protect yourself from 
    attacks by pranksters. For this, you will have to configure the
    <literal moreinfo="none">$NEWSCTL/controlperm </literal>file to declare whose control
    messages you are willing to honour and for what newsgroups, along with their
    public key ID. The <literal moreinfo="none">controlperm</literal> manpage gives 
    details on the format.</para><para>    This works only in association with <literal moreinfo="none"> pgpverify </literal>, which
    verifies the Usenet control messages that have been signed using the 
    <literal moreinfo="none">signcontrol</literal> process. The script can be found at
    <literal moreinfo="none">ftp://ftp.isc.org/pub/pgpcontrol/pgpverify</literal>. 
    <literal moreinfo="none"> pgpverify </literal>internally uses the PGP binary which
    will have to be made available in the default executables directory. If you
    wish to send control messages for your local news system, you will have to
    digitally sign them using the above mentioned <literal moreinfo="none">signcontrol</literal>
    program which is available at
    <literal moreinfo="none">ftp://ftp.isc.org/pub/pgpcontrol/signcontrol</literal>. You will
    also have to configure the <literal moreinfo="none">signcontrol</literal> program accordingly.</para></section><section><title>Feeding off an upstream neighbour</title><para>    For external feeds, commercial customers will have to buy them
    from a regular News Provider like <literal moreinfo="none">dejanews.com</literal>
    or <literal moreinfo="none">newsfeeds.com</literal>. You will have to specify
    to them what hierarchies you want and decide on the mode of
    transmission, <emphasis>i.e.</emphasis> UUCP or NNTP, based on
    your requirements. Once that is done, you will have to ask them to
    initiate feeds, and check <literal moreinfo="none">$NEWSARTS/in.coming</literal>
    directory to see if feeds are coming in.</para><para>    If your organisation belongs to the academic community or is
    otherwise lucky enough to have an NDN server somewhere which is
    willing to provide you a free newsfeed, then the payment issue goes
    out of the picture, but the rest of the technical requirements
    remain the same.</para><para>    One problem with incoming NNTP feeds is that it is far easier to use
    (relatively) efficient NNTP inflows if you have a server with a
    permanent Internet connection and a fixed IP address. If you are a
    small office with a dialup Internet connection, this may not be
    possible. In that case, the only way to get incoming newsfeeds by
    NNTP may be by using a highly inefficient pull feed.</para></section><section><title>Configuring outgoing feeds</title><para>    If you are a leaf node, you will only have to send feeds back to your
    news provider for your postings in public newsgroups to propagate
    to the outside world. To enable this, you need one line in the
    <literal moreinfo="none">sys</literal> and <literal moreinfo="none">batchparms</literal> files
    and one directory in <literal moreinfo="none">$NEWSARTS/out.going</literal>. If
    you are willing to transmit articles to your neighbouring
    sites, you will have to configure <literal moreinfo="none">sys</literal> and
    <literal moreinfo="none">batchparms</literal> with more entries. The number of directories
    in <literal moreinfo="none">$NEWSARTS/out.going</literal> will increase, too. Refer
    to the first two sections of the chapter titled 
    <quote><xref linkend="component"></xref></quote> for a better understanding of
    outgoing feeds. Again, you will have to determine how you wish to
    transmit the feed: UUCP or NNTP.</para><section><title>By UUCP</title><para>For outgoing feeds by UUCP, we recommend that you start with
    Taylor UUCP. In fact, this is the UUCP version which forms part
    of the GNU Project and is the default UUCP on Linux
    systems.</para><para>A full treatment of UUCP configuration is beyond the scope of
    this document. However, the basic steps will be as follows. First,
    you will have to define a <quote>system</quote> in your Usenet server for the
    NDN (next door neighbour) host. This definition will include various
    parameters, including the manner in which your server will call the
    remote server, the protocol it will use, <emphasis>etc.</emphasis>
    Then an identical process will have to be followed on the NDN
    server's UUCP configuration, for your server, so that
    <emphasis>that</emphasis> server can recognize
    <emphasis>your</emphasis> Usenet server.</para><para>Finally, you will need to set up appropriate
    <literal moreinfo="none">cron</literal> jobs for the user <literal moreinfo="none">uucp</literal>
    to run <literal moreinfo="none">uucico</literal> periodically. Taylor UUCP comes with
    a script called <literal moreinfo="none">uusched</literal> which may be modified to
    your requirements; this script calls <literal moreinfo="none">uucico</literal>. One
    <literal moreinfo="none">uucico</literal> connection will both upload and download
    news batches. Smaller sites can run <literal moreinfo="none">uusched</literal> as
    rarely as once or twice a day.</para><para>Later versions of this document will include the
    <literal moreinfo="none">uusched</literal> scripts that we use in Starcom. We use
    UUCP over TCP/IP, and we run the <literal moreinfo="none">uucico</literal>
    connection through an SSH tunnel, to prevent transmission of
    UUCP passwords in plain text over the Internet, and our SSH tunnel
    is established using public-key cryptography, without passwords
    being used anywhere.</para></section><section><title>By NNTP</title><para>For NNTP feeds, you will have to decide whether your server
    will be the connection initiator or connection recipient. If you are
    the connection initiator, you can send outgoing NNTP feeds more
    easily. If you are the connection recipient, then outgoing feeds
    will have to be pulled out of your server using the NNTP
    <literal moreinfo="none">NEWNEWS</literal> command, which will place heavy loads on
    your server. This is not recommended.</para><para>Connecting to your NDN server for pushing out outgoing feeds
    will require the use of the <literal moreinfo="none">nntpsend.sh</literal> script,
    which is part of the NNTPd source tree. This script will perform
    some housekeeping, and internally call the
    <literal moreinfo="none">nntpxmit</literal> binary to actually send the queued set
    of articles out. You may have to provide authentication information
    like usernames and passwords to <literal moreinfo="none">nntpxmit</literal> to allow
    it to connect to your NDN server, in case that server insists on
    checking the identity of incoming connections. (You can't be too
    careful in today's world.) <literal moreinfo="none">nntpsend.sh</literal> will clean
    up after an <literal moreinfo="none">nntpxmit</literal> connection finishes, and
    will requeue any unsent articles for the next session. Thus, even if
    there is a network problem, typically nothing is lost and all
    pending articles are transmitted next time.</para><para>Thus, pushing feeds out <emphasis>via</emphasis> NNTP means
    setting up <literal moreinfo="none">nntpsend.sh</literal> properly, and then
    invoking it periodically from <literal moreinfo="none">cron</literal>. If your
    Usenet server connects to the Internet only intermittently, then the
    process which sets up the Internet connection should be extended or
    modified to fire <literal moreinfo="none">nntpsend.sh</literal> whenever the Internet
    link is established. For instance, if you are using the Linux
    <literal moreinfo="none">pppd</literal>, you can add statements to the
    <literal moreinfo="none">/etc/ppp/ip-up</literal> script to change user to
    <literal moreinfo="none">news</literal> and run <literal moreinfo="none">nntpsend.sh</literal>.</para></section></section></section><section><title>Setting up INN</title><section><title>Getting the source</title><para>INN has been maintained and archived by the ISC (Internet Software
Consortium, <literal moreinfo="none">www.isc.org</literal>) since 1996, and the INN
homepage is at <literal moreinfo="none">http://www.isc.org/products/INN/</literal>. The
latest release of INN as of the time of this writing is INN v2.3.3,
released 7 May 2002. The full sources can be downloaded from
<literal moreinfo="none">ftp://ftp.isc.org/isc/inn/inn-2.3.3.tar.gz</literal></para></section><section><title>Compiling and installing</title><para>TO BE EXTENDED LATER.</para></section><section><title>Configuring the system</title><para>TO BE ADDED LATER.</para></section><section><title>Setting up <literal moreinfo="none">pgpverify</literal></title><para>TO BE ADDED LATER.</para></section><section><title>Feeding off an upstream neighbour</title><para>TO BE ADDED LATER.</para></section><section><title>Setting up outgoing feeds</title><para>TO BE ADDED LATER.</para></section><section id="innefficiency"><title>Efficiency issues and advantages</title><para>TO BE ADDED LATER.</para></section></section><section><title>Connecting email with Usenet news</title><para>Usenet news and mailing lists constantly remind us of each other.  And the
parallels are so strong that many mailing lists are gatewayed two-way
with corresponding Usenet newsgroups, in the <literal moreinfo="none">bit</literal> hierarchy
which maps onto the old BITNET, and elsewhere.</para><para>There are probably ten different situations where a mailing list is
better, and ten others where the newsgroup approach works better. The
point to recognise is that the system administrator needs a choice of
gatewaying one with the other, whenever tradeoffs justify it. Instead
of getting into the tradeoffs themselves, this chapter will focus
on the mechanisms of gatewaying the two worlds.</para><para>One clear and recurring use we find for this gatewaying is for mailing
lists which are of general use to many employees in a corporate network.
For instance, in a stockbroking company, many employees may like to
subscribe to a business news mailing list. If each employee had to
subscribe to the mailing list independently, it would waste mail spool
area and perhaps bandwidth. In such situations, we receive the mailing
list into an internal newsgroup, so that individual mailboxes are not
overloaded. Everyone can then read the newsgroup, and messages are also
archived till expired.</para><section><title>Feeding Usenet news to email</title><para>In C-News, this is trivially done by adding one line to the
<literal moreinfo="none">sys</literal> file, defining a new outgoing feed listing all the
relevant groups and distributions, and specifying the command line to be executed
to send each outgoing message to that ``feed.'' This
command, in our case, should be a mail-sending program,
<emphasis>e.g.</emphasis>
<literal moreinfo="none">/bin/mail user@somewhere.com</literal>. This is often adequate to get
the job done. We are sure almost every Usenet news software system will have
an equally easy way of piping the feed of a newsgroup to an email address.</para></section><section><title>Feeding email to news: the <literal moreinfo="none">mail2news</literal> gateway</title><para>Our Usenet software sources include a set of
scripts which we have been using internally for at least five years.
This set of scripts is called <literal moreinfo="none">mail2news</literal>. It contains
one shellscript called <literal moreinfo="none">mail2news</literal>, which takes an
email message from <literal moreinfo="none">stdin</literal>, processes it, and feeds the
processed version to <literal moreinfo="none">inews</literal>, the
<literal moreinfo="none">stdin</literal>-based news article injection utility of C-News.
The <literal moreinfo="none">inews</literal> utility accepts a new article post in its
<literal moreinfo="none">stdin</literal> and queues it for digestion by
<literal moreinfo="none">newsrun</literal> whenever it runs next.</para><para>To use <literal moreinfo="none">mail2news</literal>, we assume you are using
Sendmail to process incoming email. Our instructions can easily be
modified to adapt to any Mail Transport Agent (MTA) of your choice. You
will have to configure Sendmail or any other MTA to redirect incoming
mails for the gateway to a program called <literal moreinfo="none">m2nmailer</literal>,
a Perlscript which accepts the incoming message in its standard input
and a list of newsgroup names, space separated, on its command line.
Sendmail can be easily configured to trigger <literal moreinfo="none">m2nmailer</literal>
this way by defining a new mailer in <literal moreinfo="none">sendmail.cf</literal>,
and directing all incoming emails meant for the Usenet news system to
this mailer. Once you set up the appropriate rulesets for Sendmail,
it automatically triggers <literal moreinfo="none">m2nmailer</literal> each time an
incoming email comes for the <literal moreinfo="none">mail2news</literal>
gateway.</para><para>The precise configuration changes to Sendmail have already been
specified in the chapter titled ``Setting up C-News + NNTPd.''</para></section><section><title>Using GNU Mailman as an email-NNTP gateway</title><para>TO BE ADDED LATER</para><section><title>GNU's all-singing all-dancing MLM</title><para>TO BE ADDED LATER</para></section><section><title>Features of GNU Mailman</title><para>TO BE ADDED LATER</para></section><section><title>Gateway features connecting NNTP and email</title><para>TO BE ADDED LATER</para></section></section></section><section><title>Security issues</title><para>It almost seems strange that we are discussing security issues in
the context of Usenet news servers. Usenet news has been one of the most
open and free-for-all network services traditionally. However, with the
exponential growth of the Internet, all services are becoming aware of
potential threats. The community of Internet intruders too has acquired
new profiles: a lot of Internet intrusion attempts are program-driven,
and exploit a set of ``well known'' vulnerabilities,
<emphasis>i.e.</emphasis> vulnerabilities which have been identified by
the computer security and intrusion community and published in their
reports and advisories. Thus, the question of ``Why will someone attack
my harmless Usenet server?'' is no longer valid. It will be attacked if
it can be attacked, merely because its IP address falls in a range of
addresses being targeted, perhaps.</para><para>Security issues for Usenet news servers fall into two categories.
First come vulnerabilities which will allow an attacker to bring down
your server or run code of his choice on it. Second come vulnerabilities
which can distort or corrupt your Usenet article hierarchy, either by
junk postings, unsolicited commercial messages, or forged control
messages. The second category of threats is specific to Usenet news and
needs Usenet-specific protection mechanisms, some of which require
tapping into defence mechanisms designed by the Usenet administrator
community.</para><section><title>Intrusion threats</title><para>Here we discuss the vulnerabilities which will allow an intruder
to ``gain control'' of your Usenet server, or ``bring it down,'' either
of which may be irritating, embarrassing, or downright disastrous for your
business or occupation.</para><section><title>Generic server vulnerabilities</title><para>Foremost among these vulnerabilities are those which render
<emphasis>any</emphasis> server vulnerable to intrusion attempts.
Most of these vulnerabilities are unrelated to Usenet news itself. For
instance, if you have the Telnet service active on a server exposed to
the Internet, then it is likely that systematic attempts by intruders
to acquire usernames and passwords will bear fruit, using methods we
will best leave to specialised texts on the subject. Once this is done,
the intruder will merely ``walk into'' your server by Telnetting into
it.</para><para>We will not discuss this class of vulnerabilities here any further;
they belong in documents dedicated to general security issues. For
further reading, check the ``Security HOWTO'', the ``Security Quickstart
HOWTO'', the ``User Authentication HOWTO'', the ``VPN HOWTO'', and
the ``VPN Masquerade HOWTO'' ... and that's just from the Linux HOWTO
collection. As one can see, there is, if anything, a surfeit of material
on this and related subjects.</para><para>There are vulnerabilities which allow an intruder to mount the
so-called denial-of-service (DoS) attacks, which make your service inaccessible to
legitimate users, even though they do not let the intruder in. The most
publicised of these attacks were the SYNFlood and the Ping of Death
attacks, both quite old and well-understood by now. A Linux server
running a recent version of the kernel, and properly configured, should
be immune to both these attack methods. But network protocols being
what they are, there are always new DoS methods being thought up, which
can temporarily overload or slow down a server. Once again, the texts
discussing generic security issues are the best place to study these
vulnerabilities.</para></section><section><title>Vulnerabilities in Usenet software</title><para>Then come server vulnerabilities, if any, which are caused
specifically by Usenet news software. For instance, if it was possible
for an intruder to issue some string of bytes to your server's NNTP
server and cause it to execute a command of the intruder's choice, then
this vulnerability would be in this category.</para><para>Any server which accepts a text string as input from a
client is open to the buffer overrun class of attacks, if the
<literal moreinfo="none">gets()</literal> C library function has been used in its code
instead of <literal moreinfo="none">fgets()</literal> with a buffer size limit. This
was a vulnerability made famous by the 1988 Morris Internet Worm,
discussions on which can be found elsewhere. (Go Google for it if you're
keen.) As far as we know, the INN NNTP server and the
<literal moreinfo="none">nntpd</literal> which forms part of the NNTP Reference
Implementation both have no known buffer overrun vulnerabilities. This
class of vulnerabilities is less significant in the case of NNTPd or
INN because these daemons do not run as <literal moreinfo="none">root</literal>. In
fact, they would begin to cause malfunctioning of the underlying Usenet
software if they ran as <literal moreinfo="none">root</literal>. Therefore, even if an
intrepid intruder could find some way of gaining control of these
daemons, she would only be able to get into the server as user
<literal moreinfo="none">news</literal>, which means that she can play havoc with the
Usenet installation, but no further. A daemon which runs as
<literal moreinfo="none">root</literal>, if compromised, can allow an intruder to take
control of the operating system itself.</para><para>UUCP is generally believed to be insecure. We believe a careful
configuration of Taylor UUCP plugs a lot of these vulnerabilities. One
vulnerability with UUCP over TCP is that the username and password travel
in plaintext form in TCP data streams, much like with Telnet or FTP. We
therefore do not advise using UUCP over TCP in this manner if security
is a concern at all. We recommend the use of UUCP through an SSH tunnel,
with the SSH setup working only with a pre-installed public key. This way,
there is no need for usernames and passwords for the SSH tunnel setup,
and passwords cannot be leaked even intentionally. The UUCP username
and password then pass through this encrypted tunnel and are therefore
totally superfluous for security; the preceding SSH tunnel provides a much
stronger connection authentication than the UUCP username and password.
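</para><para>A minimal sketch of such a key-only tunnel, assuming OpenSSH on both
ends. The account name, host name, key file and local port shown here are
hypothetical placeholders, not details of our own installation:</para><programlisting format="linespecific">
# On the upstream host, in sshd_config: accept keys, refuse passwords.
PubkeyAuthentication yes
PasswordAuthentication no

# On the downstream host, a matching client command (shown as a comment):
# forward local port 5400 to the upstream host's UUCP-over-TCP port (540).
#   ssh -N -i ~/.ssh/uucp_tunnel_key -L 5400:localhost:540 uucptun@upstream.example.com
</programlisting><para>With such a tunnel up, the local UUCP configuration would call
port 5400 on <literal moreinfo="none">localhost</literal> instead of contacting the upstream
host directly.</para><para>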
Since we set up our SSH tunnels to demand key-based authentication
only, they reject any attempt to connect using usernames and passwords
when the tunnel is being set up.</para><para>A third possible vulnerability is related to the back-end software
which processes incoming Usenet articles. It is conceivable that an
NNTP server will receive an incoming <literal moreinfo="none">POST</literal> command,
receive an article, and queue it for processing on the local spool;
the NNTP server often does not perform any real-time processing on the
incoming post. The post-processing software which periodically processes
the incoming spool (the <literal moreinfo="none">in.coming</literal> directory in C-News)
will read this article and somehow be forced to run a command of the
intruder's choice, either by buffer overrun vulnerabilities or any
other means.</para><para>While this possibility exists, it appears that neither the C-News
<literal moreinfo="none">newsrun</literal> and family nor INN is vulnerable to this class
of attempts. We base our comment on the solid evidence that both these
systems have been around in an intrusion-prone world of public Usenet
servers for more than a decade. INN, the newer of the two, completed
one decade of life on 20 August 2002. And both these software systems
had their source freely available to all, including intruders. We can be
fairly certain that if vulnerabilities of this class have not been seen,
it is not for want of intrusion attempts.</para></section></section><section><title>Vulnerabilities unique to the Usenet service</title><para>There are certain security precautions that a Usenet server
administrator has to take to ensure that her servers are not swamped by
irritating junk or configured out of shape by spurious control messages.
These vulnerabilities do not allow an intruder to run her software on
your servers, but they do allow her to mess up your server, causing you to lose
a precious weekend (or week) straightening out the mess.</para><section><title>Unsolicited commercial messages</title><para>Unsolicited commercial messages are called SPAM. There is a
war against SPAM being fought in the Internet community. The biggest
battlefront is in the world of email. Second to that is Usenet
newsgroups.</para><para>There are many tools that Usenet administrators use in their
battle against SPAM. The most important of these is the NoCeM suite. See
<literal moreinfo="none">http://www.cm.org/</literal> for details of NoCeM, and the
newsgroup <literal moreinfo="none">alt.nocem.misc</literal> for the SPAM cancel messages
which NoCeM reads to identify which articles to discard. Your server
will need a feed of <literal moreinfo="none">alt.nocem.misc</literal> to use the NoCeM
facility. These special messages are signed by NoCeM volunteers whose
job is to identify SPAM articles, list their message-IDs, and then
issue these deletion instructions, digitally signed with special private
keys, which tell all Usenet servers to delete the SPAM messages. Your
server's NoCeM software will need public key software (typically PGP)
and a keyring with the public key of each NoCeM volunteer you want to
accept instructions from.</para><para>Other anti-spam tools for Usenet services are listed in the
Anti-SPAM Software Web page
(<literal moreinfo="none">http://www.exit109.com/~jeremy/news/antispam.html</literal>).
The <literal moreinfo="none">Cleanfeed</literal> software will clean out articles
identified as SPAM. There are many others.</para><para>SPAM is such a nuisance and a drain on organisational expense
pockets (by wasting bandwidth you pay for) that it is almost imperative
today that every Usenet server protects itself against it. We will
integrate some selected anti-SPAM measures into our integrated source
distribution soon.</para></section><section><title>Spurious control messages</title><para>Control messages, discussed in detail earlier in <xref linkend="controlmsg"></xref>, instruct a Usenet server to take certain
actions, such as deleting a message or creating a newsgroup. If this facility
is ``open to the public'', anyone with half a brain can forge control
messages to create twenty new newsgroups, and then post thousands of
articles into those groups. In the mid-nineties, we were hit by a storm
of over 2,000 (two thousand) <literal moreinfo="none">newgroup</literal> control
messages, which rapidly taught us the danger of unprotected control
messages and the protection against them.</para><para>The standard protection mechanism against this
vulnerability is <literal moreinfo="none">pgpverify</literal>, which can be
downloaded from multiple Websites and FTP mirror sites by
searching for <literal moreinfo="none">pgpverify</literal> (the program) or
<literal moreinfo="none">pgpcontrol</literal> (the total software package). We have
integrated this into our source distribution, so that our C-News works
in a tightly coupled manner with <literal moreinfo="none">pgpverify</literal>.</para><para><literal moreinfo="none">pgpverify</literal> works using public key cryptography,
much like NoCeM, and all the official maintainers of respective Usenet
group hierarchies sign control messages using their private keys. Your
server will carry their public keys, and <literal moreinfo="none">pgpverify</literal>
will check the signature on each control message to ensure that it's from the
official maintainer of the hierarchy. It will then act upon legit
control messages and discard the spurious ones.</para><para>In today's nuisance-ridden Usenet environment, no sane Usenet
server administrator receiving a feed of ``public'' hierarchies and
control messages will even dream of running her server without
<literal moreinfo="none">pgpverify</literal> protection.</para></section></section></section><section><title>Access control in NNTPd</title><para>The original NNTPd had host-based authentication which allowed clients
connecting from a particular IP address to read only certain newsgroups.
This was very clearly inadequate for enterprise deployment on an
Intranet, where each desktop computer has a different IP address, often
DHCP-assigned, and the mapping between person and desktop is not static.</para><para>What was needed was a user-based authentication, where a username and
password could be used to authenticate the user. Even this was provided
as an extension to NNTPd, but more was needed. The corporate IS manager
needs to ensure that certain Usenet discussion groups remain visible only
to certain people. This authorisation layer was not available in NNTPd.
Once authenticated, all users could read all newsgroups.</para><para>We have extended the user-based authentication facility in NNTPd in some
(we hope!) useful ways, and we have added an entire authorisation layer
which lets the administrator specify which newsgroups each user can
read. With this infrastructure, we feel NNTPd is fit for enterprise
deployment and can be used to handle corporate document repositories,
messages, and discussion archives. Details are given below.</para><section><title>Host-based access control</title><para>TO BE ADDED LATER</para></section><section><title>User authentication and authorisation</title><section><title>The NNTPd password file</title><para>TO BE ADDED LATER</para></section><section><title>Mapping users to newsgroups</title><para>TO BE ADDED LATER</para></section><section><title>The <literal moreinfo="none">X-Authenticated-Author</literal> article header</title><para>TO BE ADDED LATER</para></section><section><title>Other article header additions</title><para>TO BE ADDED LATER</para></section></section></section><section id="component" xreflabel="Components of a running system"><title>Components of a running system</title><para>This chapter reviews the components of a running CNews+NNTPd server.
Analogous components will be found in an INN-based system too. We invite
readers familiar with INN to add their pieces to this
chapter.</para><section><title><literal moreinfo="none">/var/lib/news</literal>: the CNews control area</title><para>This directory is more popularly known as <literal moreinfo="none">$NEWSCTL</literal>. It
contains configuration, log and status files. There are no
articles or binaries kept here. Let's see what some of the
files are meant for. Control files are dealt with in slightly greater detail in 
<quote><xref linkend="configuresystem"></xref></quote>.</para><itemizedlist><listitem><para><literal moreinfo="none">sys</literal>: 
    One line per system/NDN listing all the newsgroup 
    hierarchies each system subscribes to. Each line is prefixed with the system
    name; the line beginning with <literal moreinfo="none">ME:</literal> indicates what we are going to receive.
    Look up the manpage of <literal moreinfo="none">newssys</literal>.</para></listitem><listitem><para><literal moreinfo="none">explist</literal>:
    This file has entries indicating when the articles of each
    newsgroup expire and whether they have to be archived.  The order in
    which the newsgroups are listed is important. See manpage of
    <literal moreinfo="none">expire</literal> for file format.</para></listitem><listitem><para><literal moreinfo="none">batchparms</literal>:
    Details of how to feed other sites/NDNs, such as the size of
    batches and the mode of transmission (UUCP/NNTP), are specified here.
    Manpage to refer to: <literal moreinfo="none">newsbatch</literal>.</para></listitem><listitem><para><literal moreinfo="none">controlperm</literal>: 
    If you wish to authenticate a control message before any
    action is taken on it, you must enter authentication-related information 
    here.  The <literal moreinfo="none">controlperm</literal> manpage will list all the fields
    in detail.</para></listitem><listitem><para><literal moreinfo="none">mailpaths</literal>: 
    It features the e-mail address of the moderator for each 
    newsgroup who is responsible for approving/disapproving
    articles posted to moderated newsgroups. The sample
    <literal moreinfo="none">mailpaths</literal> file in the <literal moreinfo="none">tar</literal> will 
    give you an idea of how entries are made.</para></listitem><listitem><para><literal moreinfo="none">nntp_access/user_access</literal>:  
    These files contain entries of servernames 
    and usernames on whom restrictions will apply when accessing newsgroups. 
    Again, the sample file in the tarball shall explain the format of the file.</para></listitem><listitem><para><literal moreinfo="none">log, errlog</literal>: 
    These are log files that keep growing large with each batch 
    that is received. The <literal moreinfo="none">log</literal> file has one entry per
    article telling you if it 
    has been accepted by your news server or rejected. To understand the
    format of this file, refer to Chapter 2.2 of the <literal moreinfo="none">CNews</literal>
    guide.  Errors, if any, while digesting the articles are
    logged in <literal moreinfo="none">errlog</literal>. These 
    log files have to be rolled as the files hog a lot of disk space. </para></listitem><listitem><para><literal moreinfo="none">nntplog</literal>:  
    This file logs information of the <literal moreinfo="none">nntpd</literal> giving
    details of when a connection was established/broken and what commands were 
    issued. Logging to this file needs to be configured in <literal moreinfo="none">syslog</literal>, and
    <literal moreinfo="none">syslogd</literal> should be running.</para></listitem><listitem><para><literal moreinfo="none">active</literal>: 
    This file has one line per newsgroup to be found in your news
    server. Besides other things, it tells you how many articles are
    currently present in each newsgroup. It is updated when each batch is
    digested or when articles are expired. The <literal moreinfo="none">active</literal>
    manpage will furnish more details about other parameters.</para></listitem><listitem><para><literal moreinfo="none">history</literal>: 
    This file, again, contains one line per article, mapping 
    <literal moreinfo="none">message-id</literal> to newsgroup name and also giving its
    associated article number in that newsgroup. It is updated
    each time a feed is digested 
    and when <literal moreinfo="none">doexpire</literal> is run. It plays a key role in
    loop-detection and serves as an article database. Read the manpages of
    <literal moreinfo="none">newsdb</literal> and <literal moreinfo="none">doexpire</literal> for the file format.</para></listitem><listitem><para><literal moreinfo="none">newsgroups</literal>:
    It has a one-line description for each newsgroup explaining 
    what kind of posts go into each of them. Ideally speaking, it should cover 
    all the newsgroups found in the <literal moreinfo="none">active</literal> file.</para></listitem><listitem><para><emphasis>Miscellaneous files</emphasis>:
    Files like <literal moreinfo="none">mailname</literal>, <literal moreinfo="none">organisation</literal>,
    <literal moreinfo="none">whoami</literal> contain information required for forming some of
    the headers of an article. The contents of
    <literal moreinfo="none">mailname</literal> form the <literal moreinfo="none">From:</literal> header and
    that of <literal moreinfo="none">organisation</literal> form the
    <literal moreinfo="none">Organization:</literal> header. <literal moreinfo="none">whoami</literal> contains
    the name of the news system. Refer to chapter 2.1 of
    <literal moreinfo="none">guide.ps</literal> for a detailed list of files in the
    <literal moreinfo="none">$NEWSCTL</literal> area.  Read <literal moreinfo="none">RFC 1036</literal> for
    a description of article headers.</para></listitem></itemizedlist></section><section><title><literal moreinfo="none">/var/spool/news</literal>: the article repository</title><para>This is also known as the <literal moreinfo="none">$NEWSARTS</literal> or
<literal moreinfo="none">$NEWSSPOOL</literal> directory. This is where the
articles reside on your disk. No binaries or control files
should belong here.  Enough space should be allocated to this 
directory as the number of articles keeps increasing with each
batch that is digested. An explanation of the following sub-directories will
give you an overview of this directory:
<itemizedlist><listitem><para><literal moreinfo="none">in.coming</literal>:
    Feeds/batches/articles from NDNs on their arrival and
    before being processed reside in this directory. After processing, they
    appear in 
    <literal moreinfo="none">$NEWSARTS</literal> or in its <literal moreinfo="none">bad</literal> sub-directory
    if there were errors. </para></listitem><listitem><para><literal moreinfo="none">out.going</literal>: 
    This directory contains batches/feeds to be sent to your
    NDNs, <emphasis>i.e.</emphasis> feeds to be pushed to your neighbouring sites
    reside here before they are transmitted. It contains one sub-directory per 
    NDN mentioned in the <literal moreinfo="none">sys</literal> file. These sub-directories 
    contain files called <literal moreinfo="none">togo</literal> which contain information about
    the articles queued for transmission, like the <literal moreinfo="none">message-id</literal>
    or the article number. </para></listitem><listitem><para><anchor id="newsgroupdir"></anchor><emphasis>newsgroup directories</emphasis>:
    For each newsgroup hierarchy that the news server
    has subscribed to, a directory is created under
    <literal moreinfo="none">$NEWSARTS</literal>. 
    Further sub-directories are created under the parent to hold
    articles of specific newsgroups. For instance, for a
    newsgroup like <literal moreinfo="none">comp.music.compose</literal>, the parent directory
    <literal moreinfo="none">comp</literal> will appear in <literal moreinfo="none">$NEWSARTS</literal> and a
    sub-directory called <literal moreinfo="none">music</literal> will be created under
    <literal moreinfo="none">comp</literal>. The <literal moreinfo="none">music</literal> sub-directory 
    shall contain a further sub-directory called <literal moreinfo="none">compose</literal> and
    all articles of <literal moreinfo="none">comp.music.compose</literal>
    shall reside here. In effect, article 242 of newsgroup
    <literal moreinfo="none">comp.music.compose</literal> shall map to file
    <literal moreinfo="none">$NEWSARTS/comp/music/compose/242</literal>.</para></listitem><listitem><para><literal moreinfo="none">control</literal>: 
    The control directory houses only the control messages that
    have been received by this site. The control messages could be any of the
    following: <literal moreinfo="none">newgroup, rmgroup, checkgroups</literal> and
    <literal moreinfo="none">cancel</literal> appearing in the subject line of the article.
    More information is to be found in <quote><xref linkend="controlmsg"></xref></quote>.</para></listitem><listitem><para><literal moreinfo="none">junk</literal>: 
    The <literal moreinfo="none">junk</literal> directory contains all
    articles that the news 
    server has received and has decided, after processing, that they do not 
    belong to any of the hierarchies it has subscribed to. The news server 
    transfers/passes all articles in this directory to NDNs
    that have subscribed to the <literal moreinfo="none">junk</literal> hierarchy.</para></listitem></itemizedlist></para></section><section><title><literal moreinfo="none">/usr/lib/newsbin</literal>: the executables</title><para>TO BE ADDED LATER</para></section><section id="cronjobs"><title><literal moreinfo="none">crontab and cron jobs </literal></title><para>The heart of the Usenet news server is the various scripts that run at regular
intervals processing articles, digesting/rejecting them and
transmitting them to NDNs. I shall try to enumerate the ones that are important
enough to be cronned. :)</para><itemizedlist><listitem><para><literal moreinfo="none">newsrun</literal>: 
    The key script. This script picks the batches in the 
    <literal moreinfo="none">in.coming</literal> directory, uncompresses them if necessary and
    feeds them to <literal moreinfo="none">relaynews</literal> which then processes each
    article digesting and batching them and logging any errors. This script
    needs to run through <literal moreinfo="none">cron</literal>
    as frequently as you want the feeds to be digested. Every half hour should 
    suffice for a non-critical requirement.</para></listitem><listitem><para><literal moreinfo="none">sendbatches</literal>: 
    This script is run to transmit the <literal moreinfo="none">togo</literal> files formed in
    the <literal moreinfo="none">out.going</literal> directory to your NDNs. It reads the
    <literal moreinfo="none">batchparms</literal> file to know 
    exactly how and to whom the batches need to be transmitted. The frequency,
    again, can be set according to your requirements. Once an hour should be 
    sufficient.</para></listitem><listitem><para><literal moreinfo="none">newsdaily</literal>: 
    This script does maintenance chores like rolling logs and 
    saving them, reporting errors/anomalies and doing cleanup jobs.
    It should typically run once a day.</para></listitem><listitem><para><literal moreinfo="none">newswatch</literal>: 
    This looks for news problems at a more detailed level than
    newsdaily, like looking for persistent lock files or unattended batches, 
    determining space shortage issues, and the like. This should typically run
    once every hour.  For more on this and the above, read the 
    <literal moreinfo="none">newsmaint</literal> manpage.</para></listitem><listitem><para><literal moreinfo="none">doexpire</literal>: 
    This script expires old articles as determined by the
    control file <literal moreinfo="none">explist</literal> and updates the
    <literal moreinfo="none">active</literal> file. This is necessary if you do not 
    want unnecessary/unwanted articles hogging up your disk space. Run it once 
    a day.  Manpage: <literal moreinfo="none">expire</literal></para></listitem><listitem><para><literal moreinfo="none">newsrunning off/on</literal>: 
    This script shuts down or starts up the news server for you.
    You could choose to add this in your cron job if you think the news server 
    takes up lots of CPU time during peak hours and you wish to keep a check on
    it. </para></listitem></itemizedlist></section><section><title><literal moreinfo="none">newsrun</literal> and <literal moreinfo="none">relaynews</literal>: digesting received articles </title><para>The heart and soul of the Usenet News system, <literal moreinfo="none">newsrun</literal> just picks up the batches/
articles in the <literal moreinfo="none">in.coming</literal> directory of
<literal moreinfo="none">$NEWSARTS</literal> and uncompresses them (if required) and calls
<literal moreinfo="none">relaynews</literal>. It should run from cron.</para><para><literal moreinfo="none">relaynews</literal> picks up each article one by one through 
<literal moreinfo="none">stdin</literal>, determines if it belongs to a subscribed group
by looking up
<literal moreinfo="none">sys</literal> file, looks in the <literal moreinfo="none">history</literal> file
to determine that it does not already exist locally, digests it updating the 
<literal moreinfo="none">active</literal>  and <literal moreinfo="none">history</literal> file and batches it
for neighbouring sites. It logs errors on encountering problems while processing
the article and takes appropriate action if it happens to be
a control message. More info in manpage of <literal moreinfo="none">relaynews</literal>.</para></section><section><title><literal moreinfo="none">doexpire</literal> and <literal moreinfo="none">expire</literal>: removing old articles </title><para>A good way to get rid of unwanted/old articles from the
<literal moreinfo="none">$NEWSARTS</literal> area is to run <literal moreinfo="none">doexpire</literal> once a 
day. It reads the
<literal moreinfo="none">explist</literal> file from the <literal moreinfo="none">$NEWSCTL</literal> directory
to determine what articles expire today. It can archive the
said article if so configured. It then updates the
<literal moreinfo="none">active</literal> and the <literal moreinfo="none">history</literal> file accordingly.
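</para><para>The daily <literal moreinfo="none">doexpire</literal> run, together with the other
schedules suggested in the section on cron jobs, could be sketched as the
following crontab for user <literal moreinfo="none">news</literal>. The sub-directories
under <literal moreinfo="none">/usr/lib/newsbin</literal> and the exact minutes and hours are
assumptions; adjust them to wherever your installation keeps these
scripts.</para><programlisting format="linespecific">
# min   hour  dom  mon  dow  command
0,30    *     *    *    *    /usr/lib/newsbin/input/newsrun
40      *     *    *    *    /usr/lib/newsbin/batch/sendbatches
15      *     *    *    *    /usr/lib/newsbin/maint/newswatch
10      8     *    *    *    /usr/lib/newsbin/maint/newsdaily
59      0     *    *    *    /usr/lib/newsbin/expire/doexpire
</programlisting><para>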
If you wish to retain the article entry in the
<literal moreinfo="none">history</literal> file to avoid re-digesting it as a new
article after having expired it, add a special <emphasis>/expired/;</emphasis>
line in the control file. More on the options and functioning can be found in the 
<literal moreinfo="none">expire</literal> manpage.</para></section><section><title><literal moreinfo="none">nntpd</literal> and <literal moreinfo="none">msgidd</literal>: managing the NNTP interface </title><para>As has already been discussed in the chapter on setting up the software,
<literal moreinfo="none">nntpd</literal> is a TCP-based server daemon which runs under
<literal moreinfo="none">inetd</literal>. It is fired by <literal moreinfo="none">inetd</literal>
whenever there's an incoming connection on the NNTP port, and it takes
over the dialogue from there. It reads the C-News configuration and data
files in <literal moreinfo="none">$NEWSCTL</literal>, article files from
<literal moreinfo="none">$NEWSARTS</literal>, and receives incoming posts and
transfers. These it dutifully queues in
<literal moreinfo="none">$NEWSARTS/in.coming</literal>, either as batch files or single
article files.</para><para>It is important that <literal moreinfo="none">inetd</literal> be configured to
fire <literal moreinfo="none">nntpd</literal> as user <literal moreinfo="none">news</literal>, not as
<literal moreinfo="none">root</literal> like it does for other daemons like
<literal moreinfo="none">telnetd</literal> or <literal moreinfo="none">ftpd</literal>. If this is not
done correctly, it can cause a lot of problems in the functioning of
the C-News system later.</para><para><literal moreinfo="none">nntpd</literal> is fired each time a new NNTP connection
is received, and dies once the NNTP client closes its connection. Thus,
if one <literal moreinfo="none">nntpd</literal> receives a few articles by an incoming
batch feed (not a <literal moreinfo="none">POST</literal> but an <literal moreinfo="none">XFER</literal>),
then another <literal moreinfo="none">nntpd</literal> will not know about the receipt of
these articles till the batches are digested. This will hamper
duplicate newsfeed detection if there are multiple upstream NDNs feeding
our server with the same set of articles over NNTP. To fix this,
<literal moreinfo="none">nntpd</literal> uses an ally: <literal moreinfo="none">msgidd</literal>, the
message ID daemon. This
daemon is fired once at server bootup time through
<literal moreinfo="none">newsboot</literal>, and keeps running quietly in the
background, listening on a named Unix socket in the
<literal moreinfo="none">$NEWSCTL</literal> area. It keeps in its memory a list of all
message IDs which various incarnations of <literal moreinfo="none">nntpd</literal> have
asked it to remember.</para><para>Thus, when one copy of <literal moreinfo="none">nntpd</literal> receives an
incoming feed of news articles, it updates <literal moreinfo="none">msgidd</literal>
with the message IDs of these messages through the Unix socket. When
another copy of <literal moreinfo="none">nntpd</literal> is fired later and the NNTP
client tries to feed it some more articles, the <literal moreinfo="none">nntpd</literal>
checks each message ID against <literal moreinfo="none">msgidd</literal>. Since
<literal moreinfo="none">msgidd</literal> stores all these IDs in memory, the lookup is
very fast, and duplicate articles are blocked at the NNTP interface
itself.</para><para>On a running system, expect to see one instance of
<literal moreinfo="none">nntpd</literal> for each active NNTP connection, and just one
instance of <literal moreinfo="none">msgidd</literal> running quietly in the background,
hardly consuming any CPU resources. Our <literal moreinfo="none">nntpd</literal> is
configured to die if the NNTP connection is more than a few minutes
idle, thus conserving server resources. This does not inconvenience the
user because modern NNTP clients simply re-connect. If an
<literal moreinfo="none">nntpd</literal> instance is found to be running for days, it is
either hung due to a network error, or is receiving a very long incoming
NNTP feed from your upstream server. We used to receive our primary
incoming feed from our service provider through NNTP sessions lasting 18
to 20 hours without a break, every day.</para></section><section><title><literal moreinfo="none">nov</literal>, the News Overview system</title><para>NOV, the News Overview System, is a recent augmentation to the
C-News and NNTP systems and to the NNTP protocol. This subsystem
maintains a file for each active newsgroup, in which it keeps one
line per current article. This line of text contains some key meta-data
about the article, <emphasis>e.g.</emphasis> the contents of the
<literal moreinfo="none">From</literal>, <literal moreinfo="none">Subject</literal>,
<literal moreinfo="none">Date</literal> and the article size and message ID. This speeds
up NNTP response enormously. The <literal moreinfo="none">nov</literal> library has been
integrated into the <literal moreinfo="none">nntpd</literal> code, and into key binaries
of C-News, thus providing seamless maintenance of the News Overview
database when articles are added or deleted from the repository.</para><para>When <literal moreinfo="none">newsrun</literal> adds an article into
<literal moreinfo="none">starcom.test</literal>, it also updates
<literal moreinfo="none">$NEWSARTS/starcom/test/.overview</literal> and adds a line with
the relevant data, tab-separated, into it. When <literal moreinfo="none">nntpd</literal>
comes to life with an NNTP client, and it sees the
<literal moreinfo="none">XOVER</literal> NNTP command, it reads this
<literal moreinfo="none">.overview</literal> file, and returns the relevant lines to the
NNTP client. When <literal moreinfo="none">expire</literal> deletes an article, it also
removes the corresponding line from the <literal moreinfo="none">.overview</literal>
file. Thus, the maintenance of the NOV database is seamless.</para></section><section><title>Batching feeds with UUCP and NNTP</title><para>Some information about batching feeds has been provided in earlier
sections. More will be added later here in this document.</para></section></section><section><title>Monitoring and administration</title><para>Once the Usenet News system is in place and running, the news administrator
is aided in monitoring it by the various reports the system generates.
The administrator also needs to make regular checks in specific directories and
files to ascertain the smooth working of the system.</para><section><title>The <literal moreinfo="none">newsdaily</literal> report</title><para>This report is generated by the script <literal moreinfo="none">newsdaily</literal> which is 
typically run through <literal moreinfo="none">cron</literal>. I shall enumerate some of the 
problems that it reports, based on my observations.</para><itemizedlist><listitem><para><emphasis>bad input batches</emphasis>:  
    This gives a list of articles that have been declared bad after processing
    and hence have not been digested. The reason for this is not given. You
    are expected to check each article and determine the cause. </para></listitem><listitem><para><emphasis>leading unknown newsgroups by articles</emphasis>:  
    Newsgroups that do not appear in the <literal moreinfo="none">active</literal> file,
    even though their hierarchy has been subscribed to, are listed
    under this heading. Add the name to the <literal moreinfo="none">active</literal> file if you think 
    it is important. For example, you would see this happen
    if you have subscribed to the hierarchy <literal moreinfo="none">comp</literal> but the
    <literal moreinfo="none">active</literal> file does not contain the newsgroup name 
    <literal moreinfo="none">comp.lang.java.3d</literal>. You could also deny subscription to this 
    particular newsgroup by specifying so in the <literal moreinfo="none">sys</literal> file.
    </para></listitem><listitem><para><emphasis>leading unsubscribed newsgroups</emphasis>:
    Hierarchies to which you have not subscribed, but in which the news
    server is receiving the most articles, appear under this
    heading. You cannot do much about this except 
    subscribe to them if they are required.</para></listitem><listitem><para><emphasis>leading sites sending bad headers</emphasis>: 
    This will list your NDNs who
    are sending articles with malformed/insufficient headers. </para></listitem><listitem><para><emphasis>leading sites sending stale/future/misdated news</emphasis>: 
    This will list your NDNs who are sending you articles that are older than
    the date you have specified for accepting feeds. </para></listitem><listitem><para>Some of the reports generated by us: 
    We have modified the newsdaily script to include some more statistics. 
    <itemizedlist><listitem><para><emphasis>disk usage</emphasis>: 
	This reports the size in bytes of the <literal moreinfo="none">$NEWSARTS</literal>
	area. If you are receiving feeds regularly, you should see this figure
	increasing.
    </para></listitem><listitem><para><emphasis>incoming feed statistics</emphasis>: 
	This reports the number of articles and total bytes received from each 
	of your NDNs.
    </para></listitem><listitem><para><emphasis>NNTP traffic report</emphasis>: 
	The output of <literal moreinfo="none">nestor</literal> is also included in this report. It gives
	details of each <literal moreinfo="none">nntp</literal> connection and of the overall 
	performance of the network connection, as read from the
	<literal moreinfo="none">newslog</literal> file.  To understand the format, read the manpage of 
	<literal moreinfo="none">nestor</literal>.
    </para></listitem></itemizedlist></para></listitem><listitem><para><emphasis>Error reporting from the <literal moreinfo="none">errorlog</literal>
file</emphasis>: 
    Reports errors logged in the <literal moreinfo="none">errorlog</literal> file. Usually
    these are file ownership or missing file problems, which can be easily
    handled. </para></listitem></itemizedlist></section><section><title>Crisis reports from <literal moreinfo="none">newswatch</literal></title><para>Most of the problems reported to me are those with either space shortage or
persistent locks. There are instances when scripts have created lock files
and then aborted or terminated without removing them. Sometimes these locks are 
innocuous enough to be deleted, but this should be determined after careful
analysis, since they could indicate that some part of the system is not working
correctly. For example, I would receive this error message when
<literal moreinfo="none">sendbatches</literal> terminated abnormally while trying to transmit huge togo files. I had
to determine why <literal moreinfo="none">sendbatches</literal> was failing so often.</para><para>The space shortage issue has to be addressed immediately. You could
delete unwanted articles by running <literal moreinfo="none">doexpire</literal> or add more disk
space at the OS level. </para></section><section><title>Disk space</title><para>The <literal moreinfo="none">$NEWSBIN</literal> area occupies space that is fixed. Since the
binaries do not grow once installed, you do not have to worry about disk 
shortage here. The areas that take up more space as feeds come in are
<literal moreinfo="none">$NEWSCTL</literal> and <literal moreinfo="none">$NEWSARTS</literal>. The 
<literal moreinfo="none">$NEWSCTL</literal> area has log files that keep growing with each feed.
As the articles are digested in huge numbers, the <literal moreinfo="none">$NEWSARTS</literal>
area continues to grow. Also, you will need space if you have chosen to archive
articles on expiry.  Allocate a few GB of disk space for
<literal moreinfo="none">$NEWSARTS</literal>, depending on the number of hierarchies you
subscribe to and the volume of the feeds that come in every day. <literal moreinfo="none">$NEWSCTL</literal>
grows to a lesser proportion as compared to <literal moreinfo="none">$NEWSARTS</literal>.
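As a rough aid to sizing, the steady-state size of the article spool can be
estimated as articles per day times average article size times retention
period. The sketch below is only illustrative: the function name and the
sample figures (50,000 articles a day, a 14-day expiry) are assumptions, and
the 3 Kbyte average is the figure quoted in the discussion of NNTP efficiency
in this HOWTO. It ignores per-file filesystem overhead and any archive area.
<programlisting format="linespecific">
```python
def spool_bytes(articles_per_day, avg_article_bytes, retention_days):
    """Estimate the steady-state spool size in bytes.

    Once expiry removes as many articles per day as arrive, the spool holds
    roughly retention_days worth of incoming traffic.
    """
    return articles_per_day * avg_article_bytes * retention_days


if __name__ == "__main__":
    # Hypothetical site: 50,000 articles a day, 3 Kbytes each, kept 14 days.
    estimate = spool_bytes(50_000, 3 * 1024, 14)
    print(f"{estimate / 1024 ** 3:.1f} GB needed for the article spool")
```
</programlisting>
The same arithmetic, on a smaller scale, applies to the log files in
<literal moreinfo="none">$NEWSCTL</literal>.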
Allocate space for this accordingly.</para></section><section><title>CPU load and RAM usage</title><para>With modern C-News and NNTPd, there is very little usage of these
system resources for processing news article flow. Key components like
<literal moreinfo="none">newsrun</literal> or <literal moreinfo="none">sendbatches</literal> do not load
the system much, except for cases where you have a very heavy flow of
compressed outgoing batches and the compression utility is run by
<literal moreinfo="none">sendbatches</literal> frequently. <literal moreinfo="none">newsrun</literal> is
amazingly efficient in the current C-News release. Even when it takes
half an hour to digest a large consignment of batches, it hardly loads
a slow 200 MHz Pentium CPU or consumes much RAM on a 64 MB
system.</para><para>One thing which does slow down a system is a large bunch of
users connecting using NNTP to browse newsgroups. We do not have
empirical figures at hand to offer as guidance on
resource consumption for this, but we have found that the load on the
CPU and RAM for a certain number of active users invoking
<literal moreinfo="none">nntpd</literal> is more than with an equal number of
users connecting to the POP3 port of the same system for pulling
out mailboxes. A few hundred active NNTP users can really slow down
a dual-P-III Intel Linux server, for instance. This loading has no
bearing on whether you are using INN or <literal moreinfo="none">nntpd</literal>;
both have practically identical implementations for NNTP
<emphasis>reading</emphasis> and differ only in their handling of
feeds.</para><para>Another situation which will slow down your Usenet news server is
when downstream servers connect to you for pulling out NNTP feeds using
the pull method. This has been mentioned before. This can really load
your server's I/O system and CPU.</para></section><section><title>The <literal moreinfo="none">in.coming/bad</literal> directory</title><para>The <literal moreinfo="none">in.coming</literal> directory is where the batches/articles reside when you have 
received feeds from your NDN and before processing happens. Checking this
directory regularly to see if there are batches is a good way of determining
that feeds are coming in. Batches and individual articles are named differently.
Batches typically have names like <emphasis>nntp.GxhsDj</emphasis>, while
individual articles have names beginning with digits, like <emphasis>0.10022643380.t</emphasis>.</para><para>The <literal moreinfo="none">bad</literal> sub-directory under <literal moreinfo="none">in.coming</literal> 
holds batches/articles that have encountered errors when they were being
processed by <literal moreinfo="none">relaynews</literal>. You will have to look at the 
individual files in this directory to determine the cause. Ideally, 
this directory should be empty.</para></section><section><title>Long pending queues in <literal moreinfo="none">out.going</literal></title><para>TO BE ADDED.</para></section><section><title>Problems with <literal moreinfo="none">nntpxmit</literal> and <literal moreinfo="none">nntpsend</literal></title><para>TO BE ADDED.</para></section><section><title>The <literal moreinfo="none">junk</literal> and <literal moreinfo="none">control</literal> groups</title><para>Control messages are those that have a <literal moreinfo="none">newgroup/rmgroup/cancel/checkgroup</literal> in
their subject line. Such messages result in <literal moreinfo="none">relaynews</literal> calling
the appropriate script, and once it has run, a message is mailed to the admin about
the action taken. These control messages are stored in the 
<literal moreinfo="none">control</literal> directory of <literal moreinfo="none">$NEWSARTS</literal>.  For the
propagation of such messages, one must subscribe to the
<literal moreinfo="none">control</literal> hierarchy.</para><para>When your news system determines that a certain article belongs to a group you have
not subscribed to, it is <quote>junked</quote>, <emphasis>i.e.</emphasis> the article appears in the <literal moreinfo="none">junk</literal>
directory. This
directory plays a key role in transferring articles to your NDNs, as they would
subscribe to the <literal moreinfo="none">junk</literal> hierarchy to receive feeds. If you are a leaf node, there
is no reason why articles should pile up here. Keep deleting them on a daily 
basis.</para></section></section><section><title>Usenet news clients</title><para>This HOWTO was written to allow a Linux system administrator to provide the
Usenet news service to readers of Usenet articles. The rest of this HOWTO
focuses on the server-end software and systems, but one chapter
dedicated to the clients does not seem disproportionate, considering
that the <emphasis>raison d'etre</emphasis> of Usenet news servers is to serve
these clients.</para><para>The overwhelming majority of clients are software programs which access
the article database, either by reading <literal moreinfo="none">/var/spool/news</literal> on a
Unix system or over NNTP, and allow their human users to read and post
articles. We can therefore probably term this class of programs UUA, for
Usenet User Agents, along the lines of MUA for Mail User Agents.</para><para>There are other special-purpose clients, which either pull out
articles to copy or transfer somewhere else, or for analysis,
<emphasis>e.g.</emphasis> a search engine which allows you to search a
Usenet article archive, like Google (<literal moreinfo="none">www.google.com</literal>)
does.</para><para>This chapter will discuss issues in UUA software design, and bring out
essential features and efficiency and management issues. What this
chapter will certainly <emphasis>never</emphasis> attempt to do is catalogue all
the different UUA programs available in the world --- that is best left to
specialised catalogues on the Internet.</para><para>This chapter will also briefly cover special-purpose clients which
transfer articles or do other special-purpose things with them.</para><section><title>Usenet User Agents</title><section><title>Accessing articles: NNTP or spool area?</title><para>TO BE ADDED LATER</para></section><section><title>Threading</title><para>TO BE ADDED LATER</para></section><section><title>Quick reading features</title><para>TO BE ADDED LATER</para></section></section><section><title>Clients that transfer articles</title><para>We will discuss Suck and <literal moreinfo="none">nntpxfer</literal> from the NNTP server
distribution here. Suck has already been discussed earlier. We will be happy
to take contributed additions that discuss other client software.</para></section><section><title>Special clients</title><section><title>NNTPCache</title><para>NNTPCache is an interesting transparent caching proxy for
    news articles. News articles are read-only by definition,
    <emphasis>i.e.</emphasis> they do not change once they are posted;
    they can only be deleted. NNTPCache uses this feature to build a
    local cache of news articles.</para><para>You set up NNTPCache to listen on the NNTP port of your local
    Unix server, and act like an NNTP daemon. You configure it to
    connect back-to-back to another NNTP daemon, further away, which has
    all the interesting stuff the users want to read. When a user
    connects to the local NNTPCache, it connects to the remote NNTP
    server and acts as a relay for the NNTP connection, ferrying
    commands and responses back and forth. What the user sees therefore
    comes from the remote server, the first time. However, all news
    articles fetched by NNTPCache are also stored in a local cache, thus
    allowing the next user to browse the same set of articles faster.
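    The relay-and-store behaviour just described can be sketched in a few
    lines. This is not NNTPCache's actual code: the class and parameter
    names are hypothetical, and the remote server is represented by a plain
    function. The sketch also drops entries that have gone unread for longer
    than a configured idle time.
<programlisting format="linespecific">
```python
import time


class ArticleCache:
    """Read-through cache keyed by message ID (illustrative only)."""

    def __init__(self, fetch_remote, max_idle_seconds, clock=time.time):
        self.fetch_remote = fetch_remote   # stands in for the remote NNTP server
        self.max_idle = max_idle_seconds
        self.clock = clock
        self.store = {}                    # message ID -> (text, time last read)

    def get(self, msgid):
        entry = self.store.get(msgid)
        if entry is None:
            text = self.fetch_remote(msgid)    # first reader pays for the fetch
        else:
            text = entry[0]                    # later readers are served locally
        self.store[msgid] = (text, self.clock())
        return text

    def purge(self):
        """Delete articles nobody has read recently; return how many went."""
        now = self.clock()
        stale = [m for m, (_, last) in self.store.items()
                 if now - last > self.max_idle]
        for msgid in stale:
            del self.store[msgid]
        return len(stale)
```
</programlisting>
    Calling <literal moreinfo="none">get</literal> twice with the same message ID
    reaches the remote function only once; a periodic call to
    <literal moreinfo="none">purge</literal> keeps the cache from growing without bound.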
    Like all demand-driven caches, the advantage here is that the local
    NNTPCache does not need (much) administering, and will automatically
    delete all articles from its cache once they've been lying unread
    long enough.</para><para>We list it here as an NNTP client because every proxy server
    is a server on one side and a client on the other.</para></section></section></section><section><title>Our perspective</title><para>This chapter has been added to allow us to share our perspective on
certain technical choices. Certain issues which are more a matter of
opinion than detail, are discussed here.</para><section id="feedefficiency"><title>Efficiency issues of NNTP</title><para>    To understand why NNTP is often an inappropriate choice for
    newsfeeds, we need to understand TCP's sliding window protocol
    and the nature of NNTP. NNTP is an appalling waste of bandwidth
    for most bulk article transfer situations, because of the
    following simple reasons:</para><itemizedlist><listitem><para>    <emphasis>No compression</emphasis>: articles are transferred in plain text. </para></listitem><listitem><para> 
    <emphasis>No article transmission restart</emphasis>: if a
    connection breaks halfway through an article, the next round
    will have to start with the beginning of the article.</para></listitem><listitem><para> 
    <emphasis>Ping-pong protocol</emphasis>: NNTP is unsuitable for
    bulk streaming data transfer because the TCP sliding window feature
    is unusable with NNTP.</para></listitem></itemizedlist><para>    What is a ping-pong protocol? TCP uses a sliding window mechanism to
    pump out data in one direction very rapidly, and can achieve near
    wire speeds under most circumstances. However, this only works if
    the application layer protocol can aggregate a large amount of data
    and pump it out without having to stop every so often, waiting for
    an ack or a response from the other end's application layer. This is
    precisely why sending one file of 100 Mbytes by FTP takes so much less
    clock time than 10,000 files of 10 Kbytes each, all other parameters
    remaining unchanged. The trick is to keep the sliding window sliding
    smoothly over the outgoing data, blasting packets out as fast as the
    wire will carry it, without ever allowing the window to empty out
    while you wait for an ack.  Protocols which require short bursts of
    data from either end constantly, <emphasis>e.g.</emphasis> in the
    case of remote procedure calls, are called ``ping pong protocols''
    because they remind you of a table-tennis ball.</para><para>    With NNTP, this is precisely the problem. The average size
    of Usenet news messages, including header and body, is
    3 Kbytes. When thousands of such articles are sent out by
    NNTP, the sending server has to send the message ID of the
    first article, then wait for the receiving server to respond
    with a ``yes'' or ``no.'' Once the sending server gets the
    ``yes'', it sends out that article, and waits for an ``ok''
    from the receiving server. Then it sends out the message ID
    of the second article, and waits for another ``yes'' or
    ``no.'' And so on. The TCP sliding window never gets to do
    its job.  </para><para>    This sub-optimal use of TCP's data pumping ability, coupled with
    the absence of compression, makes for a protocol which is great
    for synchronous connectivity, <emphasis>e.g.</emphasis> for news
    reading or real-time
    updates, but very poor for batched transfer of data which can be
    delayed and pumped out. All these are precisely reversed in the
    case of UUCP over TCP.</para><para>    To decide which protocol, UUCP over TCP or NNTP, is appropriate
    for your server, you must address two questions:</para><orderedlist inheritnum="ignore" continuation="restarts"><listitem><para> 
    How much time can your server afford to wait from the time
    your upstream server receives an article to the time it
    passes it on to you?</para></listitem><listitem><para> 
    Are you receiving the same set of hierarchies from multiple
    next-door neighbour servers, <emphasis>i.e.</emphasis> is your
    newsfeed flow pattern a mesh instead of a tree?</para></listitem></orderedlist><para>    If your answers to the two questions above are ``messages cannot
    wait'' and ``we operate in a mesh'', then NNTP is the correct
    protocol for your server to receive its primary feed(s). </para><para>    In most cases, carrier-class servers operated by major service
    providers do not want to accept even a minute's delay from the
    time they receive an article to the time they retransmit it out.
    They also operate in a mesh with other servers operated by their
    own organisations (<emphasis>e.g.</emphasis> for redundancy) or
    others. They usually
    sit very close to the Internet backbone,
    <emphasis>i.e.</emphasis> with Tier 1 ISPs,
    and have extremely fast Internet links, usually more than
    10 Mbits/sec. The amount of data that flows out of such servers
    in outgoing feeds is more than the amount that comes in, because
    each incoming article is retained, not for local consumption,
    but for retransmission to others lower down in the flow. And
    these servers boast of a retransmission latency of less than 30
    seconds, <emphasis>i.e.</emphasis> I will retransmit an article
    to you within 30 seconds of my having received it.  </para><para>    However, if your server is used by a company for making Usenet
    news available for its employees, or by an institute to make the
    service available for its students and teachers, then you are
    not operating your server in a mesh pattern, nor do you mind it
    if messages take a few hours to reach you from your upstream
    neighbour. </para><para>    In that case, you have enormous bandwidth to conserve by moving
    to UUCP.  Even if, in this Internet-dominated era, you have no
    one to supply you with a newsfeed using dialup point-to-point
    links, you can pick up a compressed batched newsfeed using UUCP
    over TCP, over the Internet.  </para><para>    In this context, we want to mention Taylor UUCP, an excellent
    UUCP implementation available under GNU GPL. We use this UUCP
    implementation in preference to the bundled UUCP systems offered
    by commercial Unix vendors even for dialup connections, because
    it is far more stable, performs better, and always supports
    file transfer restart. Over TCP/IP, Taylor is the only one we
    have tried, and we have no wish to try any others.  </para><para>    Apart from its robustness, Taylor UUCP has one invaluable
    feature critical to large Usenet batch transfers: file transfer
    restart. If it is transferring a 10 MB batch, and the connection
    breaks after 8 MB, it will restart precisely where it left off
    last time. Therefore, no bytes of bandwidth are wasted, and
    queues never get stuck forever.  </para><para>    Over NNTP, since there is no batching, transfers happen one
    article at a time. Considering the (relatively) small size of an
    article compared to multi-megabyte UUCP batches, one would
    expect that an article would never pose a major problem while
    being transported; if it can't be pushed across in one attempt,
    it'll surely be copied the next time.  However, we have
    experienced entire NNTP feeds getting stuck for days on end
    because of one article, with logs showing the same article
    breaking the connection over and over again while being
    transferred <footnote><para>    This lack of a restart facility is something NNTP shares with
    its older cousin, SMTP, and we have often seen email messages
    getting stuck in a similar fashion over flaky data links. In
    many such networks which we manage for our clients, we have
    moved the inter-server mail transfer to Taylor UUCP, using UUCP
    over TCP.</para></footnote>. Some rare articles can be
    more than a megabyte in size, particularly in
    <literal moreinfo="none">comp.binaries</literal>. In each such incident, we have
    had to manually edit the queue file on the transmitting server
    and remove the offending article from the head of the queue.
    Taylor UUCP, on the other hand, has never given us a single
    hiccup with blocked queues.  </para><para>    We feel that the overwhelming majority of servers offering the
    Usenet news service are at the leaf nodes  of the Usenet news
    flow, not at the heart. These servers are usually connected in a
    tree, with each server having one upstream ``parent node'', and
    multiple downstream ``child nodes.'' These servers receive their
    bulk incoming feed from their upstream server, and their users
    can tolerate a delay of a few hours for articles to move in and
    out. If your server is in this class, we feel you should
    consider using UUCP over TCP and transfer compressed batches.
    This will minimise bandwidth usage, and if you operate using
    dialup Internet connections, it will directly reduce your
    expenses.  </para><para>    A word about the link between mesh-patterned newsfeed flow and
    the need to use NNTP. If your server is receiving primary ---
    as against trickle --- feeds from multiple next-door neighbours,
    then you have to use NNTP to receive these feeds. The reason
    lies in the way UUCP batches are accepted. UUCP batches are
    received in their entirety into your server, and then they are
    uncompressed and processed. When the sending server is giving
    you the batch, it is not getting a chance to go through the
    batch article by article and ask your server whether you have or
    don't have each article. This way, if multiple servers give you
    large feeds for the same hierarchies, then you will be bound to
    receive multiple copies of each article if you go the UUCP way.
    All the gains of compressed batches will then be neutralised.
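    The difference can be seen in a small simulation. The sketch below is an
    illustration only: the function names and the sample message IDs are made
    up, and a Python set stands in for the
    <literal moreinfo="none">history</literal> file.
<programlisting format="linespecific">
```python
def receive_whole_batches(feeds):
    """Accept every batch in full, as with UUCP, and weed out duplicates later."""
    history = set()
    transferred = 0
    for batch in feeds:
        for msgid in batch:
            transferred += 1       # the article has already crossed the link
            history.add(msgid)     # a duplicate is discarded only at this point
    return transferred, len(history)


def receive_per_article_offers(feeds):
    """Let the sender offer each message ID first, and refuse known ones."""
    history = set()
    transferred = 0
    for batch in feeds:
        for msgid in batch:
            if msgid in history:
                continue           # refused: the article body is never sent
            transferred += 1
            history.add(msgid)
    return transferred, len(history)


if __name__ == "__main__":
    # Two hypothetical neighbours carrying overlapping articles.
    feed_a = ["1@a.example", "2@a.example", "3@b.example"]
    feed_b = ["2@a.example", "3@b.example", "4@b.example"]
    print(receive_whole_batches([feed_a, feed_b]))       # (6, 4)
    print(receive_per_article_offers([feed_a, feed_b]))  # (4, 4)
```
</programlisting>
    Both approaches end up with the same four articles, but the whole-batch
    case moves six across the link. With two full feeds of the same
    hierarchies the overlap is close to total, so about half of what is
    transferred is wasted.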
    NNTP's <literal moreinfo="none">IHAVE</literal> dialogue, in which the receiving
    server accepts or refuses each offered message ID, in effect
    permits precisely this double-check for each article, and thus
    you don't receive even a single  article twice. </para><para>    For Usenet servers which connect to the Internet periodically
    using dialup connections to fetch news, the UUCP option is
    especially important. Their primary incoming newsfeed cannot be
    pushed into them using queued NNTP feeds for reasons described
    in the above <link linkend="dialupnonntp">paragraph</link>.
    These
    hapless servers are usually forced to pull out their articles
    using a pull NNTP feed, which is often very slow. This may lead
    to long connect times, repeat attempts after every line break,
    and high Internet connection charges.  </para><para>    On the other hand, we have been using UUCP over TCP and
    <literal moreinfo="none">gzip</literal>'d batches for more than five years now
    in a variety of sites. Even today, a full feed of all eight
    standard hierarchies, plus the full
    <literal moreinfo="none">microsoft</literal>, <literal moreinfo="none">gnu</literal>
    and <literal moreinfo="none">netscape</literal> hierarchies, minus 
    <literal moreinfo="none">alt</literal> and <literal moreinfo="none">comp.binaries</literal>, can
    comfortably be handled in just a few hours of connect time every
    night, dialing up to the 
    Internet at 33.6 or 56 Kbits/sec. We believe that the proverbial
    `full feed' with all hierarchies including
    <literal moreinfo="none">alt</literal> can be handled comfortably with a 24-hour
    link at 56 Kbits/sec, provided you forget about NNTP feeds. We
    usually get compression ratios of 4:1 using
    <literal moreinfo="none">gzip -9</literal> on our news batches, incidentally. </para></section><section><title>C-News+NNTPd or INN?</title><para>INN and CNews are the two most popular free software implementations
of Usenet news. Of these two, we prefer CNews, primarily because
we have been using it across a very large range of Unixen for more
than one decade, starting from its earliest release --- the so-called
``Shellscript release'' --- and we have yet to see a need to
change.<footnote><para>One of us did his first installation with BNews,
actually, at the IIT Mumbai. Then we rapidly moved from there to CNews
Shellscript Release, then CNews Performance Release, CNews Cleanup
Release, and our current release has fixed some bugs in the latest
Cleanup Release.</para></footnote></para><para>We have seen INN, and we are not comfortable with a software
implementation which puts so much functionality inside one
executable. This reminds us of Windows NT, Netscape Communicator,
and other complex and monolithic systems, which make us uncomfortable
with their opaqueness. We feel that CNews' architecture, which comprises
many small programs, intuitively fits into the Unix approach of building
large and complex systems, where each piece can be understood, debugged,
and if needed, replaced, individually.</para><para>Secondly, the move towards INN seems to be accompanied by a move
towards NNTP as a primary newsfeed mechanism. This is no fault of INN;
we suspect it is a sort of cultural difference between INN users and
CNews users.  We find the issue of UUCP versus NNTP for batched newsfeeds
a far more serious issue than the choice of CNews versus INN. We simply
cannot agree with the idea that NNTP is an appropriate protocol for bulk
Usenet feeds for most sites. Unfortunately, most sites which are
comfortable using INN also seem to prefer NNTP over
UUCP, for reasons not clear to us.</para><para>Our comments should not be taken as expressing any reservation about
INN's quality or robustness. Its popularity is testimony to its
quality; it most certainly ``gets the job done'' as well as anything
else. In addition, there are a large number of commercial Usenet news
server implementations which have started with the INN code; we do not
know of any which have started with the CNews code. The Netwinsite DNews
system and the Cyclone Typhoon, we suspect, are both INN-spired.</para><para>We recommend CNews and NNTPd over INN, because we are more
comfortable with the CNews architecture for reasons given above, and we
do not run carrier-class sites. We will continue to support, maintain and
extend this software base, at least for Linux.  And we see no reason for
the overwhelming majority of Usenet sites to be forced to use anything
else. Your viewpoints are welcome.</para><para>Had we been setting up and managing carrier-class sites with their
near-real-time throughput requirements, we would probably not have
chosen CNews. And for those situations, our opinion of NNTP versus
compressed UUCP has been discussed in <xref linkend="feedefficiency"></xref>.</para><para>Suck and Leafnode have their place in the range of options, where they
appear to be attractive for novices who are intimidated by the ``full
blown'' appearance of CNews+NNTPd or INN. However, we run CNews + NNTPd
even on Linux laptops. We suspect INN can be used this way too. We do
not find these ``full blown'' implementations any more resource
hungry than their simpler cousins. Therefore, other than familiarity
with administration and configuration, we see no reason why even a
solitary end-user would choose Leafnode or Suck over CNews+NNTPd. As
always, contrary opinions invited.</para></section></section><section id="softwarehistory" xreflabel="Usenet software: a historical perspective"><title>Usenet software: a historical perspective</title><para>This section comprises excerpts from a well-known Usenet Periodic
Posting document which was last changed in Feb 1998. Our copy of that old
document was picked up from</para><para><literal moreinfo="none">ftp://rtfm.mit.edu/pub/usenet-by-hierarchy/news/software/b/Usenet_Software:_History_and_Sources</literal></para><para>We suspect other copies will also be found elsewhere. The physical
file on the FTP server appears to have been touched last on 29 Dec
1999. The first few lines of the archived file provide information about
the origin of this document and its authors:</para><programlisting format="linespecific">Date: Tue, 28 Dec 1999 09:00:19 GMT
Supersedes: &lt;FMMECL.58s@tac.nyc.ny.us&gt;
Expires: Fri, 28 Jan 2000 09:00:19 GMT
Message-ID: &lt;FnG10J.HAo@tac.nyc.ny.us&gt;
From: netannounce@deshaw.com (Mark Moraes)
Subject: Usenet Software: History and Sources
Newsgroups: news.admin.misc,news.announce.newusers,news.software.readers,news.software.b,news.answers
Followup-To: news.newusers.questions
Approved: netannounce@deshaw.com (Mark Moraes)

Archive-name: usenet/software/part1
Original-from: spaf@cs.purdue.edu (Gene Spafford)
Comment: edited until 5/93 by spaf@cs.purdue.edu (Gene Spafford)
Last-change: 9 Feb 1998 by netannounce@deshaw.com (Mark Moraes)
Changes-posted-to: news.admin.misc,news.misc,news.software.readers,news.software.b,news.answers</programlisting><para>We have been seeing this document as a periodic posting in
<literal moreinfo="none">news.announce.newusers</literal> since the early nineties, and
it has always been our definitive reference on the history of Usenet
server software. We reproduce excerpts below, retaining the portions
which discuss server software, and removing discussions of client
software, newsreaders, software for non-Unix operating systems,
<emphasis>etc.</emphasis> All quoted portions are reproduced unedited
other than changing FTP file paths to the modern URL format.  We have
added our own comments, emphasised, in separate paragraphs.  We feel the
information captured here is essential reading for anyone interested in
Usenet server software.</para><para>If anyone can point us to a fresher version of this document, in
case it is still maintained, we will be happy to refer to that version
instead of this one, though we suspect the reader will not suffer due to
the four-year gap; most of the information reproduced below is
historical anyway.</para><section><title>The quoted excerpts</title><para>Currently, Usenet readers interact with the news using a number of
software packages and programs.  This article mentions the important
ones and a little of their history, gives pointers where you can look
for more information and ends with some special notes about ``foreign''
and ``obsolete'' software.  At the very end is a list of sites from which
current versions of the Usenet software may be obtained.</para><para> ... </para><section><title>History</title><para>Usenet came into being in late 1979, shortly after the release of V7
Unix with UUCP.  Two Duke University grad students in North Carolina,
Tom Truscott and Jim Ellis, thought of hooking computers together to
exchange information with the Unix community.  Steve Bellovin, a grad
student at the University of North Carolina, put together the first
version of the news software using shell scripts and installed it on the
first two sites: <literal moreinfo="none">unc</literal> and <literal moreinfo="none">duke</literal>. At
the beginning of 1980 the network consisted of those two sites and
<literal moreinfo="none">phs</literal> (another machine at Duke), and was described
at the January Usenix conference.  Steve Bellovin later rewrote
the scripts into C programs, but they were never released beyond
<literal moreinfo="none">unc</literal> and <literal moreinfo="none">duke</literal>. Shortly thereafter,
Steve Daniel did another implementation in C for public distribution.
Tom Truscott made further modifications, and this became the ``A'' news
release.</para><para>In 1981 at U. C. Berkeley, grad student Mark Horton and high school
student Matt Glickman rewrote the news software to add functionality
and to cope with the ever increasing volume of news -- ``A'' News was
intended for only a few articles per group per day.  This rewrite was
the ``B'' News version.  The first public release was version 2.1 in
1982; the 1.* versions were all beta test.  As the net grew, the news
software was expanded and modified.  The last version maintained and
released primarily by Mark was 2.10.1.</para><para>Rick Adams, at the Center for Seismic Studies, took over
coordination of the maintenance and enhancement of the B News software
with the 2.10.2 release in 1984.  By this time, the increasing volume
of news was becoming a concern, and the mechanism for moderated groups
was added to the software at 2.10.2.  Moderated groups were inspired by
ARPA mailing lists and experience with other bulletin board systems.
In late 1986, version 2.11 of B News was released, including a number
of changes to support a new naming structure for newsgroups, enhanced
batching and compression, enhanced <literal moreinfo="none">ihave/sendme</literal>
control messages, and other features.</para><para>The final release of B News was 2.11, patchlevel 19.  B News has
been declared ``dead'' by a number of people, including Rick Adams, and
is unlikely to be upgraded further; most Usenet sites are using C News
or INN (see next paragraphs).</para><para>In March 1986 a package was released implementing news transmission,
posting, and reading using the Network News Transfer Protocol (NNTP)
(as specified in RFC 977).  This protocol allows hosts to exchange
articles via TCP/IP connections rather than using the traditional UUCP.
It also permits users to read and post news (using a modified news user
agent) from machines which cannot or choose not to install the Usenet
news software.  Reading and posting are done using TCP/IP messages to a
server host which does run the Usenet software.  Sites which have many
workstations like the Sun and SGI, and HP products find this a convenient
way to allow workstation users to read news without having to store
articles on each system.  Many of the Usenet hosts that are also on the
Internet exchange news articles using NNTP because the load impact of NNTP
is much lower than UUCP (and NNTP ensures much faster propagation).</para><para><emphasis>Our comments: This remark about relative loadings
    of UUCP and NNTP is no longer applicable with faster machines and
    networks, and with hugely increased traffic volumes. Today's desktop
    computers, let alone servers, can all handle both NNTP and UUCP loads
    effortlessly, if traffic volumes can be restricted. This is partly
    due to performance enhancements to UUCP as embodied in Taylor UUCP,
    and partly due to vastly faster processors.</emphasis></para><para>NNTP grew out of independent work in 1984-1985 by Brian Kantor
at U.  C.  San Diego and Phil Lapsley at U. C. Berkeley.  Primary
development was done at U. C. Berkeley by Phil Lapsley with help from
Erik Fair, Steven Grady, and Mike Meyer, among others.  The NNTP package
(now called the reference implementation) was distributed on the 4.3BSD
release tape (although that was version 1.2a and out-of-date) and is
also available on many major hosts by anonymous FTP.  The current
version is 1.5.12.2.  It includes NOV (News Overview -- see below)
support and runs on a wide variety of systems.  It is available from
<literal moreinfo="none">ftp.academ.com:/pub/nntp1.5/nntp.1.5.12.2.tar.gz</literal>.
For those with access to the World-Wide Web on the Internet, the
WWW page <literal moreinfo="none">http://www.academ.com/academ/nntp.html</literal>
contains a description and news about NNTP.  A different
variant, called nntp-t5, implements many of the extensions
provided by INN (including NOV support).  It is available from
<literal moreinfo="none">ftp.uu.net:/networking/news/nntp/nntp-t5.tar.gz</literal>.</para><para>One widely-used version of news, known as C News, was developed
at the University of Toronto by Geoff Collyer and Henry Spencer.  This
version is a rewrite of the lowest levels of news to increase article
processing speed, decrease article expiration processing and improve the
reliability of the news system through better locking, etc.  The package
was released to the net in the autumn of 1987.  For more information,
see the paper ``News Need Not Be Slow,'' published in The Winter 1987
Usenix Technical Conference proceedings.  This paper is also available
from <literal moreinfo="none">ftp://ftp.cs.toronto.edu/doc/programming/c-news.*</literal>,
and is recommended reading for all news software programmers.  The most
recent version of C News is the Sept 1994 ``Cleanup Release.''  C News
can be obtained by anonymous ftp from its official archive site,
<literal moreinfo="none">ftp.cs.toronto.edu:pub/c-news/c-news.tar.Z</literal>.  </para><para><emphasis>Our comments: C News is no longer maintained by
    anyone that we know of, other than ourselves. However, after fixing
    the remaining bugs in the source, we have not found the need for
    further maintenance.  NNTPd from Brian Kantor and Phil Lapsley is
    in the same state, but we are working on enhancements to the source
    for access control and other functionality.</emphasis></para><para>Another Usenet system, known as InterNetNews, or INN, was written by
Rich Salz <literal moreinfo="none">(rsalz@uunet.uu.net)</literal>.  INN is designed to run
on Unix hosts that have a socket interface.  It is optimized for larger
hosts where most traffic uses NNTP, but it does provide full UUCP support.
INN is very fast, and since it integrates NNTP many people find it easier
to administer only one package.  The package was publicly released on
August 20, 1992.  For more information, see the paper ``InterNetNews:
Usenet Transport for Internet Sites'' published in the June 1992
Usenix Technical Conference Proceedings.  INN can be obtained
from many places, including the 4.4BSD tape; its official
archive site is <literal moreinfo="none">ftp.uu.net</literal> in the directory
<literal moreinfo="none">/networking/news/nntp/inn</literal>.  Rich's last official
release was 1.4sec in Dec 1993.</para><para><emphasis>Our comments: The original paper by Rich Salz about
    INN, where he proposed the design of an alternative Usenet server,
    is a must-read for readers interested in Usenet server
    software. So is the paper by the C News authors, cited before it. Most of
    the issues that Rich Salz had with C News, as stated in his paper,
    were very relevant at that time. Today, with the current version of
    NNTPd and the incorporation of the message ID daemon and NOV, these
    issues are no longer relevant, and the choice of C News+NNTPd versus
    INN is now based more on the level of maintenance of source code,
    familiarity and personal preferences than on core design factors.
    </emphasis></para><para>In June 1995, David Barr began a series of unoffical releases
of INN based on 1.4sec, integrating various bug-fixes, enhancements
and security patches.  His last release was 1.4unoff4, found in
<literal moreinfo="none">ftp://ftp.math.psu.edu:/pub/INN</literal>.  This site is also the
home of contributed software for INN and other news administration
tools.</para><para>INN is now maintained by the Internet Software Consortium
<literal moreinfo="none">(inn@isc.org)</literal>.  The official INN home is now
<literal moreinfo="none">http://www.isc.org/isc/</literal> and the latest version (1.7.2)
can be obtained from <literal moreinfo="none">ftp://ftp.isc.org/isc/inn/</literal>.</para><para><emphasis>Our comments: The URL for the INN home page above
    is probably incorrect. Try http://www.isc.org/products/INN/.
    </emphasis></para><para>Towards the end of 1992, Geoff Collyer implemented NOV (News
Overview): a database that stores the important headers of all news
articles as they arrive.  This is intended for use by the implementors
of news readers to provide fast article presentation by sorting and
``threading'' the article headers.  (Before NOV, newsreaders like
<literal moreinfo="none">trn</literal>, <literal moreinfo="none">tin</literal> and <literal moreinfo="none">nn</literal>
came with their own daemons and databases that used a nontrivial amount
of system resources).  NOV is fully supported by C News, INN and NNTP-t5.
Most modern news readers use NOV to get information for their threading
and article menu presentation; use of NOV by a newsreader is fairly easy,
since NOV comes with sample client-side threading code.</para><para> ... </para><para>Details on many other mail and news readers for MSDOS, Windows and
OS/2 systems can be found in the FAQ posted to
<literal moreinfo="none">comp.os.msdos.mail-news</literal>.</para><programlisting format="linespecific">  &lt;ftp://rtfm.mit.edu/pub/usenet/comp.os.msdos.mail-news/intro&gt;
  &lt;ftp://rtfm.mit.edu/pub/usenet/comp.os.msdos.mail-news/software&gt;</programlisting></section><section><title>Newsfeed management software</title><para><emphasis role="bold">Gup</emphasis>, the Group Update Program is a Unix
mail-server program that lets a remote site change their newsgroups
subscription on their news feed without requiring the intervention of
the news administrator at the feed site.  Gup operates with the INN
(and likely the C News) batching mechanisms.  The news administrators
at the remote sites simply mail commands to gup to make changes to
their own site's subscription list.  The mail/interface is password
protected.  Gup checks the requests for valid newsgroup names,
patterns that have no effect and so on. Gup's authors are Mark
Delany (<literal moreinfo="none">markd@mira.net.au</literal>) and Andrew Herbert
(<literal moreinfo="none">andrew@mira.net.au</literal>).  Its official FTP location
is <literal moreinfo="none">ftp.mira.net.au:/unix/news/gup-0.4.tar.gz</literal>,
but since that's not as well connected as UUNET, people are strongly
advised to obtain it from a mirror site, <emphasis>e.g.</emphasis>
<literal moreinfo="none">ftp.uu.net:/networking/news/misc/gup-0.4.tar.gz</literal>.</para><para><emphasis role="bold"><literal moreinfo="none">dynafeed</literal></emphasis> is
a package from Looking Glass Software Limited that maintains a
<literal moreinfo="none">.newsrc</literal> for every remote site and generates the batches
for them.  Remote sites can use UUCP or run a program to change their
<literal moreinfo="none">.newsrc</literal> dynamically.  It comes with a program that the
remote site can run to monitor readership in newsgroups and dynamically
update the feed list to match reader interest.  The goal of this is
to get a feed that sends only exactly the groups currently being read.
<literal moreinfo="none">dynafeed</literal> can be obtained from
<literal moreinfo="none">ftp://ftp.clarinet.com/sources/dynafeed.tar.Z</literal>.</para></section><section><title>News processing software</title><para>Software also exists to automatically archive Usenet newsgroups.
The package <literal moreinfo="none">rkive</literal>, written by Kent Landfield
(<literal moreinfo="none">kent@sterling.com</literal>) can be configured to archive
news automatically based on different headers -- Archive-Name,
Volume-Issue, Chronological, Subject and External-Command to
name a few.  It can be run in batch mode from the command line or
from cron.  It can also be installed in the <literal moreinfo="none">sys</literal>
or <literal moreinfo="none">newsfeeds</literal> file to process articles as they are
received.  <literal moreinfo="none">rkive</literal> supports local spool directories as
well as NNTP based access.  <literal moreinfo="none">rkive</literal> is available via
FTP from <literal moreinfo="none">ftp://ftp.sterling.com/rkive</literal>.  </para><para>Newsclip is a programming language for writing news
filtering programs, from Looking Glass Software Limited, marketed
by ClariNet Communications Corp.  It is C-like, and translates to
C, so a C compiler is required.  It has data-types to represent
the kinds of things found in article headers and bodies.  It can
maintain databases of users, message-ids, patterns, subjects, etc.
These can be used to decide whether to ignore or select an article.
Newsclip can either operate as a standalone program or as part
of rn. It is free for non-commercial use and is available from
<literal moreinfo="none">ftp://ftp.clarinet.com/sources/nc.tar.Z</literal>.  Contact
<literal moreinfo="none">clari-info@clarinet.com</literal> with a subject line of
``newsclip'' for more info.</para></section><section><title>Commercial software</title><para>DNEWS is a commercial product from NetWin.  DNEWS licenses
are provided free to educational institutions for non profit
use. With DNEWS, the news is stored in a database so as not to
overload the raw file system.  DNEWS supports 'sucking' where only
groups which users read are pulled over from the feeder site. DNEWS
is currently known to run on VMS, Windows NT, Solaris, SunOS,
Unixware, HP/UX.  DNEWS binaries are available by anonymous FTP from
<literal moreinfo="none">ftp://ftp.std.com/ftp/vendors/netwin/dnews</literal> or from
<literal moreinfo="none">http://world.std.com/~netwin/</literal> DNEWS sources can be
obtained on request, see the file <literal moreinfo="none">source.txt</literal> in the
FTP area for more information.</para><para><emphasis>Our comments: The information on DNEWS may be dated. We
have been seeing DNEWS on NetWin's own Website for quite a few years now.
Check <literal moreinfo="none">www.netwinsite.com</literal>. Moreover, there are other
commercial Usenet server software systems available, including the one
bundled with the Internet Information Server of Microsoft Windows NT and
the ones from iPlanet. And for carrier class systems, there are many
commercial Usenet routers available.</emphasis></para></section><section><title>Special note on ``notes'' and old versions of news</title><para> ... </para><para>``B'' news software is currently considered obsolete.  Unix sites
joining the Usenet should install C news or INN to ensure proper
behavior and good performance.  Most old B news software had
compiled-in limits on the number of newsgroups and the number of
articles per newsgroup; the increasing volume of news means that B
news software cannot reliably cope with a moderately-full newsfeed.</para></section></section></section><section><title>Documentation, information and further reading</title><para>This section fills in gaps which were hard to classify under any
of the previous chapters.</para><section><title>The manpages</title><para>The following manpages are installed automatically when our
integrated software distribution is compiled and installed, listed here
in no particular order:</para><itemizedlist><listitem><para><literal moreinfo="none">badexpiry:</literal>
utility to look for articles with bad explicit Expiry headers</para></listitem><listitem><para><literal moreinfo="none">checkactive:</literal>
utility to perform some sanity checks on the <literal moreinfo="none">active</literal>
file</para></listitem><listitem><para><literal moreinfo="none">cnewsdo:</literal>
utility to perform some checks and then run C-News maintenance commands</para></listitem><listitem><para><literal moreinfo="none">controlperm:</literal>
configuration file for controlling responses to Usenet control messages</para></listitem><listitem><para><literal moreinfo="none">expire:</literal>
utility to expire old articles</para></listitem><listitem><para><literal moreinfo="none">explode:</literal>
internal utility to convert a master batch file to ordinary batch files</para></listitem><listitem><para><literal moreinfo="none">inews:</literal>
the program which forms the entry point for fresh postings to be
injected into the Usenet system</para></listitem><listitem><para><literal moreinfo="none">mergeactive:</literal>
utility to merge one site's newsgroups to another site's
<literal moreinfo="none">active</literal> file</para></listitem><listitem><para><literal moreinfo="none">mkhistory:</literal>
utility to rebuild news <literal moreinfo="none">history</literal> file</para></listitem><listitem><para><literal moreinfo="none">news(5):</literal>
description of Usenet news article file and batch file formats</para></listitem><listitem><para><literal moreinfo="none">newsaux:</literal>
a collection of C-News utilities used by its own scripts and by the
Usenet news administrator for various maintenance purposes</para></listitem><listitem><para><literal moreinfo="none">newsbatch:</literal>
covers all the utilities and programs which are part of the news
batching system of C-News</para></listitem><listitem><para><literal moreinfo="none">newsctl:</literal>
describes the file formats and uses of all the files in
<literal moreinfo="none">$NEWSCTL</literal> other than the two key files,
<literal moreinfo="none">sys</literal> and <literal moreinfo="none">active</literal></para></listitem><listitem><para><literal moreinfo="none">newsdb:</literal>
describes the key files and directories for news articles, including the
structure of <literal moreinfo="none">$NEWSARTS</literal>, the <literal moreinfo="none">active</literal>
file, the <literal moreinfo="none">active.times</literal> file, and the
<literal moreinfo="none">history</literal> file.</para></listitem><listitem><para><literal moreinfo="none">newsflag:</literal>
utility to change the flag or type column of a newsgroup in the
<literal moreinfo="none">active</literal> file</para></listitem><listitem><para><literal moreinfo="none">newsmail:</literal>
utility scripts used to send and receive newsfeeds by email. This is
different from a mail-to-news gateway, since this is for communication
between two Usenet news servers.</para></listitem><listitem><para><literal moreinfo="none">newsmaint:</literal>
utility scripts used by Usenet administrator to manage and maintain
C-News system</para></listitem><listitem><para><literal moreinfo="none">newsoverview(5):</literal>
file formats for the NOV database</para></listitem><listitem><para><literal moreinfo="none">newsoverview(8):</literal>
library functions of the NOV library and the utilities which use them</para></listitem><listitem><para><literal moreinfo="none">newssys:</literal>
the important <literal moreinfo="none">sys</literal> file of C-News</para></listitem><listitem><para><literal moreinfo="none">relaynews:</literal>
the <literal moreinfo="none">relaynews</literal> program of C-News</para></listitem><listitem><para><literal moreinfo="none">report:</literal>
utility to generate and send email reports of errors and events from
C-News scripts</para></listitem><listitem><para><literal moreinfo="none">rnews:</literal>
receive news batches and queue them for processing</para></listitem><listitem><para><literal moreinfo="none">nntpd:</literal>
The NNTP daemon</para></listitem><listitem><para><literal moreinfo="none">nntpxmit:</literal>
The NNTP batch transmit program for outgoing push feeds</para></listitem></itemizedlist></section><section><title>Papers, documents, articles</title><para>There are certain documents and published conference papers which
are a must-read for Usenet server administrators, both for their
historical value and for the insight they give into Usenet server
architecture in general. We list our chart-toppers here.</para><section><title>The Usenix paper on C News</title><para>This very interesting paper has been mentioned in the section titled
<quote><xref linkend="softwarehistory"></xref></quote>. It is titled ``News
Need Not Be Slow'', and is available from
<literal moreinfo="none">ftp://ftp.cs.toronto.edu/doc/programming/c-news.*</literal> or
from our Website
(<literal moreinfo="none">http://www.starcomsoftware.com/proj/usenet/doc/c-news.{ps,pdf}</literal>).</para><para>It focuses on B News, analyses it for performance, and
demonstrates how specific changes in design and implementation can speed
things up. It is well written, and is instructive in many areas
independent of Usenet news.</para></section><section><title>The Usenix paper on INN</title><para>This paper talks about the things that C News didn't address,
and takes Usenet news processing into the world of pure Internet
connectivity. Its author is Rich Salz, the author of INN, and the paper
is titled ``InterNetNews: Usenet Transport for Internet Sites.'' This
can be picked up from
<literal moreinfo="none">ftp://ftp.uu.net/networking/news/nntp/inn/inn.usenix.ps.Z</literal>
or from our Website
(<literal moreinfo="none">http://www.starcomsoftware.com/proj/usenet/doc/inn.usenix.{ps,pdf}</literal>),
uncompressed.
Be warned: this PostScript file is probably missing some mandatory
first-line tag like <literal moreinfo="none">%!PS-Adobe-1.0</literal> and some
PostScript processors can have problems with it. For instance, on our
Linux boxes, <literal moreinfo="none">ghostview</literal> can display it, but
<literal moreinfo="none">kghostview</literal> can't, which is very strange.</para><para>This paper analyses the world of Usenet servers with C News and
NNTPd, in the presence of multiple parallel feeds, and proceeds to build
a case for a powerful NNTP-optimised software architecture which will
handle multiple parallel incoming NNTP feeds efficiently. What later INN
users appear to miss sometimes when comparing C-News+NNTPd with INN, is
that INN's strengths are <emphasis>only</emphasis> in situations which
its author had specifically targeted, <emphasis>i.e.</emphasis> multiple
parallel incoming NNTP feeds. There is no clear superiority of one
system over the other in any other situation.</para></section><section><title>The C News guide</title><para>This document is part of the C-News source, and is available in
the <literal moreinfo="none">c-news/doc</literal> directory of the source tree. The
<literal moreinfo="none">makefile</literal> here uses <literal moreinfo="none">troff</literal> and the
source files to generate <literal moreinfo="none">guide.ps</literal>. This C News Guide
is a very well-written document and provides an introduction to the
functioning of C News.</para></section></section><section><title>O'Reilly's books on Usenet news</title><para>O'Reilly and Associates had an excellent book that can form the
foundations for understanding C-News and Usenet news in general, titled
``Managing UUCP and Usenet,'' published in 1992. This was considered a bit
dated because it did not cover INN or the Internet protocols.</para><para>They have subsequently published a more recent book, titled
``Managing Usenet,'' written by Henry Spencer, the co-author of C-News,
and David Lawrence, one of the most respected Usenet veterans and
administrators today. The book was published in 1998 and covers both
C-News and INN.</para><para>We have a distinct preference for books published by O'Reilly; we
usually find them the best books on their subjects. We make no attempt
to hide this bias. We recommend both books. In fact, we believe that
there is very little of value in this HOWTO for someone who studies one
of these books and then peruses information on the Internet.</para></section><section><title>Usenet-related RFCs</title><para>TO BE ADDED</para></section><section><title>The source code</title><para>TO BE ADDED</para></section><section><title>Usenet newsgroups</title><para>There are many discussion groups on the Usenet dedicated to the
technical and non-technical issues in managing a Usenet server and
service. These include:</para><itemizedlist><listitem><para><literal moreinfo="none">news.admin.technical</literal>
Discusses technical issues about administering Usenet news</para></listitem><listitem><para><literal moreinfo="none">news.admin.policy</literal>
Discusses policy issues about Usenet news</para></listitem><listitem><para><literal moreinfo="none">news.software.b</literal>
Discusses C-News (no separate newsgroup was created after B-News gave
way to C-News) source, configuration and bugs (if any)</para></listitem></itemizedlist><para>MORE WILL BE ADDED LATER</para></section><section><title>We</title><para>We, at Starcom Software, offer the services of our Usenet news
team to provide assistance to you by email, as a service to the Linux
and Usenet administrator community, on a best effort basis.</para><para>We also offer you an integrated source distribution
of C News and NNTPd, as discussed earlier in the section titled
<quote><xref linkend="settingup"></xref></quote>. This integrated
source distribution fixes some bugs in the component packages it
includes, and it comes pre-configured with ready made configuration
files which allow all components to be compiled and installed
on a Linux server so that they work together
(<emphasis>e.g.</emphasis> key directory paths are specified consistently
across all components). This is available at
<literal moreinfo="none">http://www.starcomsoftware.com/proj/usenet/src/</literal>.</para><para>The URL
<literal moreinfo="none">http://www.starcomsoftware.com/proj/usenet/src/archives/</literal>
holds the original sources of some of the software components we base our
distribution on. These include C News (<literal moreinfo="none">c-news.tar.Z</literal>),
NNTPd (<literal moreinfo="none">nntp.1.5.12.1.tar.Z</literal>), and Nestor
(<literal moreinfo="none">nestor.tar.Z</literal>). Other components, such as
<literal moreinfo="none">pgpverify</literal>, are maintained by their current maintainers
and can be obtained from their respective sites. Therefore, they are not
included in our archives.</para><para>The URL
<literal moreinfo="none">http://www.starcomsoftware.com/proj/usenet/doc/</literal>
carries copies of some of the important technical articles and Usenix
papers on the subject of the Usenet.</para><para>We will endeavour to answer all queries sent to
<literal moreinfo="none">usenet@starcomsoftware.com</literal>, pertaining to the source
distribution we have put together and its configuration and maintenance,
and also pertaining to general technical issues related to running a
Usenet news service off a Unix or Linux server.</para><para>We may not be in a position to assist with software components we
are not familiar with, <emphasis>e.g.</emphasis> Leafnode, or platforms
we do not have access to, <emphasis>e.g.</emphasis> SGI IRIX. Intel
Linux will be supported as long as our group is alive; our entire office
runs on Linux servers and diskless Linux desktops.</para><para>You are not forced to depend on us, because we have neither
proprietary knowledge nor closed-source software. All
the extensions we are currently working on for C-News and NNTPd will
immediately be made available to the Internet in freely redistributable
source form.</para></section></section><section><title>Wrapping up</title><section><title>Acknowledgements</title><para>This HOWTO is a by-product of many years of experience setting up and
managing Usenet news servers. We have learned a lot from those who have
trod the path ahead of us. They include the team of the ERNET
Project (Educational and Research Network), which brought Internet
technology to India's academic institutions in the early
nineties. We especially remember what we have learned from the
<emphasis>SIGSys</emphasis> Group of the Department of Computer Science
of the Indian Institute of Technology, Mumbai. We have also benefited
enormously from the guidance we received from the Networking Group at
the NCST (National Centre for Software Technology) in Mumbai, especially
from Geetanjali Sampemane.</para><para>On a wider scale, our learning along the path of systems and
networks started with Unix, without which our appreciation of computer
systems would have remained very fragmented and superficial. Our insight
into Unix came from our ``Village Elders'' in the Department of Computer
Science of the IIT (Indian Institute of Technology) at Mumbai, especially
from ``Hattu,'' ``Sathe,'' and ``Sapre,'' none of whom are with the IIT
today, and from Professor D. B. Phatak and others, many of whom, luckily,
are still with the Institute.</para><para>At Starcom, all the members of Starcom Software who
have worked on various problems with networking, Linux, and Usenet news
installations have helped the authors in understanding what works and
what doesn't. Without their work, this HOWTO would have been a dry
textbook.</para><para>Hema Kariyappa co-authored the first couple of versions of this
HOWTO, starting with v2.0.</para></section><section><title>Comments invited</title><para>Your comments and contributions are invited. We cannot possibly
write all sections of this HOWTO based on our knowledge alone. Please
contribute all you can, starting with minor corrections and bug fixes
and going on to entire sections and chapters. Your contributions will be
acknowledged in the HOWTO.</para></section><section><title>Copyright</title><para><emphasis role="bold">Copyright (c) 2002 Starcom Software Private Limited,
India</emphasis></para><para>Please freely copy and distribute (sell or give away) this
document in any format. It is requested that corrections and/or
comments be forwarded to the document maintainer, reachable at
<literal moreinfo="none">usenet@starcomsoftware.com</literal>. When these comments
and contributions are incorporated into this document and released
for distribution in future versions of this HOWTO, the content of the
incorporated text will become the copyright of Starcom Software Private
Limited. By submitting your contributions to us, you implicitly agree
to these terms.</para><para>You may create a derivative work and distribute it provided that
you:</para><orderedlist inheritnum="ignore" continuation="restarts"><listitem><para>    Send your derivative work (in the most suitable format such as SGML) to the 
    LDP (Linux Documentation Project) or the like for posting on the Internet. 
    If not the LDP, then let the LDP know where it is available.</para></listitem><listitem><para>    License the derivative work with this same license or use the GPL.
    Include a copyright notice and at least a pointer to the licence
    used. </para></listitem><listitem><para>    Give due credit to previous authors and major contributors.
    If you are considering making a derived work other than a
    translation, it is requested that you discuss your plans with the 
    current maintainer.</para></listitem></orderedlist></section><section><title>About Starcom Software Private Limited</title><para><emphasis role="bold">starcom</emphasis> (Starcom Software Private
Limited, <literal moreinfo="none">www.starcomsoftware.com</literal>) has been building
products and solutions using Linux and Web technology since 1996. Our
entire office runs on Linux, and we have built mission-critical
solutions for some of the top corporate entities in India and abroad.
Our client list includes arguably the world's largest securities
depository (The National Securities Depository Limited, India,
<literal moreinfo="none">www.nsdl.com</literal>), one of the world's top five stock
exchanges in terms of trading volumes (The National Stock Exchange of
India Limited, <literal moreinfo="none">www.nseindia.com</literal>), and one of India's
premier financial institutions listed on the NYSE. In all these cases, we
have introduced them to Linux, and in many cases, we have built them their
first mission-critical business applications on Linux. Contact the authors
or check the Starcom Website for more information about the work we have done.</para></section></section></article>

