Creating Documents For Research And Thesis Publication in Statistics
Documents published in the field of Statistics often contain advanced mathematical formulas and extensive lists of references that must be formatted in a style appropriate for the publication requirements. There is some argument as to which tools are best for producing these formulas and reference lists. Word processors, such as Microsoft Word, have equation editors that are relatively easy to use and generally produce an on-screen WYSIWYG image that is close to, but not always identical to, the printed result. Microsoft Word also has rudimentary native reference list functionality. By comparison, the LaTeX (TeX) document typesetting language is considered by many to be more flexible and to produce higher-quality, more accurate equation images. LaTeX also has a companion bibliography processing tool that simplifies generating and reformatting references. LaTeX is not as easy for a novice to use, and the editing tools available in Statistics do not typically provide an interactive WYSIWYG view of the formula or reference list as it is being edited.
Unfortunately, Microsoft Word and LaTeX are not compatible and it is not possible to easily embed formulas or references defined with LaTeX into documents created with Word. So it is usually necessary to choose one of the available approaches and stick with it.
If you are a graduate student, you may want to check with your advisor as to which editing tools are most appropriate for your publications. If you are a faculty member, it is likely that your peer community and the publishing houses that you work with will indicate a preference for the most appropriate tools.
Publication Preparation with Word Processors
Word processors are designed to be simple to use for preparing formatted documents. Most faculty and students already have a working knowledge of how to use Microsoft Word or equivalent packages to create documents. The major issue, if there is one, is the support that word processors provide for producing high-quality research publications suitable for peer-reviewed journals and proceedings. One area of concern involves embedding formulas, graphs and graphics. The second involves the flexibility in generating embedded references, citations and bibliographies. The
Viewing And Editing MS Documents In Statistics section addresses the issue of embedded formulas. The native tools for generating reference lists in MS Word documents are very limited and difficult to use. There are powerful add-on packages that improve on the native tools, but they can be costly (for example, Endnote, a popular standard, costs about $200 for one academic license) and are not typically installed in the Statistics Windows environments. There is a helpful discussion on using MS Word for technical publications written by John Krumm of Microsoft Research,
Layout Tips for Technical Papers in Microsoft Word 2000. He reiterates that Word currently lacks useful tools for building lists of references.
Generating Suitable Output from Word Processors
Some publication houses will accept MS Word files in the native
.doc format, so all that is needed is to save the document in native Word,
.doc format from MS Word or OpenOffice and send it off. WARNING: Equations produced in OpenOffice may not have a format acceptable for publication by those requesting a
.doc file.
Many publications require either a Postscript or a PDF/Acrobat file format. This can add some complexity to the process.
Postscript Output from MS Word
To generate a Postscript version of your document from MS Word, whether you use the Linux version (
winword) or a Statistics Windows computer, do the following:
- Select
File->Print
- Choose the "generic" printer that starts with
zzz from the pulldown menu (zzz-def on Windows and zzzz-cx-default on Linux)
- Check the "Print to File" box
- Select "OK"
- When asked, save the print file to an appropriate location with an appropriate <filename>
- Close the word processor and go to the file location
- Rename the file from <filename>.prn to <filename>.ps
PDF Output from MS Word
On Linux, apply the
ps2pdf command to the <filename>.ps file created above. For example, if the <filename> is
mypaper in the directory
~/papers (assuming use of
public06):
public06% cd ~/papers
public06% ps2pdf mypaper.ps
will create the file
mypaper.pdf in the same directory.
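The rename and conversion steps above can be combined in a small shell function. This is a minimal sketch, not a supported local tool: the function name prn_to_pdf and the PS2PDF override variable are assumptions introduced here for illustration, and the script assumes that ps2pdf is on your path.

```shell
#!/bin/sh
# prn_to_pdf: rename a print-to-file <name>.prn to <name>.ps, then make <name>.pdf.
# PS2PDF is a hypothetical override for the converter command (default: ps2pdf).
prn_to_pdf() {
    base="${1%.prn}"                       # accept "mypaper" or "mypaper.prn"
    if [ -f "$base.prn" ]; then
        mv "$base.prn" "$base.ps" || return 1
    fi
    if [ ! -f "$base.ps" ]; then
        echo "prn_to_pdf: $base.ps not found" >&2
        return 1
    fi
    # ps2pdf accepts an input file followed by an output file
    "${PS2PDF:-ps2pdf}" "$base.ps" "$base.pdf"
}

# Convert only when a file name is given on the command line
if [ "$#" -gt 0 ]; then
    prn_to_pdf "$1"
fi
```

If the sketch is saved under a name of your choice, for example prn2pdf.sh, then running sh prn2pdf.sh mypaper.prn in ~/papers should leave mypaper.ps and mypaper.pdf in that directory.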
On Windows, you can also apply the
ps2pdf command. Open a DOS Command window from
Start->Programs->Accessories->Command Prompt. Change to the directory where the file is located, for example
U:\papers:
U:\> cd U:\papers
U:\papers> ps2pdf mypaper.ps
will create the file
mypaper.pdf in the same directory.
On some Windows computers (
winstat01 in 1274 MSC for example), you have the option of using
Adobe Acrobat,
Acrobat Writer or
Acrobat Distiller to convert Postscript files to PDF. These Adobe packages can also be used to convert MS Word documents to PDF directly from within MS Word. Since these packages are not widely available, the details won't be covered here. Additional usage instructions are available on the web, for example:
How to Create a PDF file using Adobe Acrobat Writer/Distiller.
Postscript Output from OpenOffice
To generate a Postscript version of your document from OpenOffice on Linux (
soffice), do the following:
- Select
File->Print
- Choose the
Generic Printer from the pulldown menu
- Check the "Print to file" box
- Select "OK"
- When asked, save the print file to an appropriate location with an appropriate <filename>
- Close the word processor and go to the file location
- Rename the file from <filename>.prn to <filename>.ps
PDF Output from OpenOffice
Apply the
ps2pdf command to the <filename>.ps file created above. For example, if the <filename> is
mypaper (assuming use of
public06 and already in the file location directory):
public06% ps2pdf mypaper.ps
will create the file
mypaper.pdf in the same directory.
If you don't need the intermediate Postscript file, you can create a PDF version of the document directly from within OpenOffice:
- Select
File->Export as PDF
- When asked, save the PDF file to an appropriate location with an appropriate <filename>
WARNINGS
Putting word processing files through Postscript and PDF conversion does not guarantee that the final result will look the same as it did in the word processor. There are often problems with margin settings, centering, equation placement, and font differences. Always look at the resulting PDF file very carefully before submitting a paper for publication. If it does not come out correctly the first time, it can be very difficult to reposition images and change fonts to fix things.
Remember that you cannot edit the final PDF file, and it is very difficult to edit the intermediate Postscript file. So the cycle for making corrections to a Postscript or PDF paper prepared with a word processor is:
- Make modifications and corrections in the word processor document
- Convert the document to a Postscript file
- Convert the Postscript file to a PDF file for viewing
- Carefully look over the result
- Repeat
This can be a long and difficult process. Consider other options, like LaTeX, if you want to avoid this cycle.
Publication Preparation with LaTeX
LaTeX is a document markup language with an associated processor that generates typeset output (a DVI file, which is then converted to Postscript or PDF) from a text file. The text file includes document content plus embedded typesetting commands. You can prepare a LaTeX file using any editor that can save plain text. This includes full-blown word processors like MS Word and OpenOffice, but it is more typical to use one of the text editors mentioned in the
Text And Program Editing In Statistics section. The preferred editor in Statistics is
emacs.
emacs has configurable features that help in formatting and understanding LaTeX files, including indents, colors and highlighting. It is also possible to use
nedit for preparing LaTeX files, but it is not as popular and not all of the special formatting tools and macros for LaTeX are installed in Statistics at this time.
There are many tutorials on the general preparation of LaTeX files (for example:
Beginning LaTeX). Here is an arbitrary selection of one that uses
emacs (actually
Xemacs) as the editor:
LaTeX Tutorial. There is a reference in this tutorial to getting a special
.emacs file that is not available here. You can modify your local
~/.emacs file by adding the following two lines at the end:
;; Access to AUCTeX -- the way to write LaTeX in Emacs
(require 'tex-site)
This will work with
emacs. To provide LaTeX features in
xemacs, add the two lines above to the end of the file
~/.xemacs/init.el
Here is a tutorial that uses
nedit as the editor:
Quick and Dirty LaTeX Tutorial. For advanced editing of LaTeX files, you can find more tutorials and instructions by searching the web (for an example of a list of LaTeX sites of interest:
University of Cambridge Text Processing using LaTeX). Or you can work with your colleagues on advanced topics.
Using LaTeX to Produce Publications with Formulas
There are numerous tutorials specific to preparing mathematical equations and formulas with LaTeX. Again, as an arbitrary starting point, you can look at:
Getting to Grips with Latex - Mathematics. This site also includes general LaTeX tutorials and a more advanced equation tutorial.
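As a small illustration of what formula markup looks like, here is a minimal sketch of a complete LaTeX file that typesets two standard formulas, the sample mean and the unbiased sample variance. The choice of formulas and the use of the amsmath package are assumptions made for this example and are not taken from the tutorials above.

```latex
\documentclass{article}
\usepackage{amsmath}  % provides the align environment

\begin{document}

The sample mean and the unbiased sample variance of $x_1, \dots, x_n$ are
\begin{align}
  \bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i, \\
  s^2     &= \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{align}

\end{document}
```

Inline formulas go between dollar signs, and the align environment numbers each displayed equation and lines them up at the & markers.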
Using LaTeX to Produce Publications with References and Bibliographies
Documents prepared for LaTeX can include markup appropriate for a widely-used bibliography database and formatting tool called Bibtex that is available in Statistics. Bibtex allows easy compilation and reformatting of references and provides another incentive to use LaTeX for producing Statistics publications. As usual, there are numerous tutorials and "How Tos" for using Bibtex, and an arbitrary example introduction is:
How to use BibTeX.
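To give a sense of the markup involved, here is a minimal sketch of a LaTeX file that carries its own small bibliography database. The database entry is a made-up placeholder, not a real reference, and the key placeholder2005, the file name mypaper.bib, and the plain bibliography style are all assumptions chosen for this example.

```latex
% Write a one-entry database file next to the document (placeholder entry only)
\begin{filecontents}{mypaper.bib}
@article{placeholder2005,
  author  = {A. N. Author},
  title   = {A Placeholder Title},
  journal = {Journal Name},
  year    = {2005},
  volume  = {1},
  pages   = {1--10}
}
\end{filecontents}

\documentclass{article}
\begin{document}

A citation in the text looks like this~\cite{placeholder2005}.

\bibliographystyle{plain}  % formatting style for the reference list
\bibliography{mypaper}     % database file name, without the .bib extension

\end{document}
```

In normal use the database is usually kept in a separate .bib file that several papers can share. Changing the argument of \bibliographystyle reformats the whole reference list without touching the entries.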
Generating Suitable Output with LaTeX
Using LaTeX on Linux
The
emacs or
Xemacs editor is recommended for preparing and editing LaTeX files on Linux. The first thing you may want to do is prepare or copy a customized
.emacs file to configure
emacs and
Xemacs to simplify LaTeX editing. There are many locations on the web that show or discuss
.emacs files in general and specifically for LaTeX. One site that collects information about
emacs dot files is
The very unofficial dotemacs home. A default
.emacs file will soon be available in Statistics to use as a starting point for local customization. A first step is to modify your local
~/.emacs and
~/.xemacs/init.el files by adding the following two lines at the end:
;; Access to AUCTeX -- the way to write LaTeX in Emacs
(require 'tex-site)
This will add a set of common LaTeX features to
emacs and
xemacs.
You can start
emacs or
Xemacs from the Linux command prompt (assuming use of
public06):
public06% emacs &
or
public06% xemacs &
(the
& is optional; it puts
emacs/Xemacs into the background so that other commands can be run in the terminal window)
Edit your document per the instructions for using
emacs/Xemacs and instructions for creating LaTeX documents. To format and view the result outside of
emacs/Xemacs, save your file/buffer with a
.tex extension. Then execute the following instructions (assume file name is
mypaper.tex in directory
~/papers):
public06% cd ~/papers
public06% latex mypaper.tex
public06% latex mypaper.tex
public06% dvips mypaper.dvi -o mypaper.ps
public06% ps2pdf mypaper.ps
public06% pdfview mypaper.pdf
If you also include Bibtex markup in the document, a few extra commands are needed to format the Bibliography information:
public06% cd ~/papers
public06% latex mypaper.tex
public06% bibtex mypaper
public06% latex mypaper.tex
public06% latex mypaper.tex
public06% dvips mypaper.dvi -o mypaper.ps
public06% ps2pdf mypaper.ps
public06% pdfview mypaper.pdf
Some or all of these commands can be combined into a script or Makefile so that only one command is typed each time a conversion is needed.
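One way to do this is sketched below as a shell script. It is only an illustration under stated assumptions: the names build_paper and run, and the DRY_RUN and USE_BIBTEX settings, are hypothetical and introduced here. Note that bibtex is given the base name without an extension, because it reads the .aux file that latex writes.

```shell
#!/bin/sh
# build_paper: run the LaTeX -> DVI -> Postscript -> PDF chain for one document.
# DRY_RUN=1 prints the commands instead of running them (hypothetical setting).
# USE_BIBTEX=1 adds the bibtex pass and one extra latex pass (hypothetical setting).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_paper() {
    base="${1%.tex}"                        # accept "mypaper" or "mypaper.tex"
    run latex "$base.tex" || return 1
    if [ "${USE_BIBTEX:-0}" = "1" ]; then
        run bibtex "$base" || return 1      # base name only; bibtex reads the .aux file
        run latex "$base.tex" || return 1
    fi
    run latex "$base.tex" || return 1       # final pass resolves cross-references
    run dvips "$base.dvi" -o "$base.ps" || return 1
    run ps2pdf "$base.ps"
}

# Build only when a file name is given on the command line
if [ "$#" -gt 0 ]; then
    build_paper "$1"
fi
```

If the sketch is saved under a name of your choice, for example buildpaper.sh, then sh buildpaper.sh mypaper.tex runs the whole chain and stops at the first command that fails. Setting DRY_RUN=1 first is a cheap way to check what would be run.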
If you have a suitable
.emacs file, these commands can be combined and executed as a single line in
emacs/Xemacs.
Using LaTeX on Windows
You can use the
emacs or
Xemacs editor for preparing and editing LaTeX files on Windows. The first thing you may want to do is prepare or copy a customized
.emacs file to configure
emacs and
Xemacs to simplify LaTeX editing. There are many locations on the web that show or discuss
.emacs files in general and specifically for LaTeX (see the previous section, Using LaTeX on Linux).
You can start
emacs or
Xemacs from a Windows command prompt by opening a DOS Command window from
Start->Programs->Accessories->Command Prompt. Change to the directory where the file is located, for example
U:\papers:
U:\> cd U:\papers
Then open
emacs or
Xemacs:
U:\papers> emacs
or
U:\papers> xemacs
Edit your document per the instructions for using
emacs/Xemacs and instructions for creating LaTeX documents. To format and view the result outside of
emacs/Xemacs, save your file/buffer with a
.tex extension. Then execute the following instructions (assume file name is
mypaper.tex in directory
U:\papers):
U:\> cd U:\papers
U:\papers> latex mypaper.tex
U:\papers> latex mypaper.tex
U:\papers> dvips mypaper.dvi -o mypaper.ps
U:\papers> ps2pdf mypaper.ps
U:\papers> gsview32 mypaper.pdf
If you also include Bibtex markup in the document, a few extra commands are needed to format the Bibliography information:
U:\> cd U:\papers
U:\papers> latex mypaper.tex
U:\papers> bibtex mypaper
U:\papers> latex mypaper.tex
U:\papers> latex mypaper.tex
U:\papers> dvips mypaper.dvi -o mypaper.ps
U:\papers> ps2pdf mypaper.ps
U:\papers> gsview32 mypaper.pdf
Some or all of these commands can be combined into a Windows batch file so that only one command is typed each time a conversion is needed.
If you have a suitable
.emacs file, these commands can be combined and executed as a single line in
emacs/Xemacs.
You can also open the resulting
mypaper.pdf file with the Acrobat Reader by starting the Reader from
Start->Programs->Adobe->Adobe Reader and using
File->Open to open the resulting file, or by opening the
U:\papers folder and double clicking the
mypaper.pdf file.
--
MikeRedmond - 09 Sep 2005