Wednesday, March 27, 2013

Antiword: Read MS Word Documents in Your Terminal [Linux]

Microsoft Word documents, almost ubiquitous in business settings, might be considered a necessary evil for Linux users to deal with. Sure, you can open Word files in LibreOffice, but it’s a pain to wait for a heavy graphical application to load your document. Antiword is a solution that runs in your terminal, which makes it perfect for people on slow computers or systems without a graphical environment.

Antiword has been ported to FreeBSD, BeOS, OS/2, Mac OS X, Amiga, VMS, NetWare, Plan9, EPOC, Zaurus PDA, MorphOS, Tru64/OSF, Minix, Solaris and DOS. For this article, I’ll focus on using it in Linux.

Main Features

Antiword lets you view and convert MS Word documents from the command line. You can convert to the following formats:

  • Plain text
  • Formatted text
  • PDF
  • PostScript
  • XML (only DocBook is currently supported)

Limitations

Before you get too excited, I have to mention that Antiword was last updated in 2005 and is not compatible with newer DOCX documents. You also cannot use it to edit your documents.
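
If you are not sure which kind of file you have, the first few bytes usually give it away. The sketch below is not part of Antiword: the function name word_container is made up for illustration, and it only tells the old OLE2 container (used by .doc, but also by old Excel and PowerPoint files) apart from the ZIP container that DOCX uses.

```shell
# Hypothetical helper: report a file's container format from its magic bytes.
# "ole2" suggests a legacy Office file (such as .doc) that Antiword may be able to read;
# "zip" suggests a newer format such as DOCX, which it cannot read.
word_container() {
    # Read the first 8 bytes as one continuous lowercase hex string.
    magic=$(od -An -tx1 -N8 "$1" | tr -d ' \t\n')
    case $magic in
        d0cf11e0a1b11ae1) echo ole2 ;;
        504b0304*)        echo zip ;;
        *)                echo unknown ;;
    esac
}
```

Running word_container resume.doc should print ole2 for an old-style document. A result of zip means the file is probably DOCX and needs a different tool.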

Getting Antiword

If your Linux distribution has a package manager, you can most likely find Antiword in one of your repositories.

Otherwise, grab the .tar.gz archive from the Antiword page on Freecode. Extract the archive and enter the antiword-0.37 directory. Then run:

Usage

For the following usage tips, I’m going to use my résumé as an example document. Here’s what it looks like in LibreOffice:

[Screenshot: the example résumé open in LibreOffice]

The most basic way to use antiword is to simply display the document:

[Screenshot: the document displayed by antiword in a terminal]

As you can see, the default command doesn’t preserve certain aspects of formatting like font size, italics, and underlining, but it does a nice job of presenting the text in a readable form.
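
Longer documents will scroll off the screen, so it can help to pipe the output into a pager. This is a small sketch, assuming a pager such as less is installed. The name view_doc is made up for illustration.

```shell
# Hypothetical wrapper: show a Word document as plain text, one screen at a time.
# Uses $PAGER if it is set and falls back to less otherwise.
view_doc() {
    antiword "$1" | ${PAGER:-less}
}
```

With that defined, view_doc resume.doc opens the text in your pager instead of dumping it to the terminal.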

To display formatting information, use the “-f” flag in your command:

[Screenshot: antiword output with the “-f” flag]

No, this doesn’t actually show you the formatting in a WYSIWYG style; rather, it tells you about it with a markdown-like syntax. For example, it shows _underlined text_ with underscores and *bold text* with asterisks.
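
Because the markers are plain characters, the formatted output can be searched with ordinary text tools. As a rough sketch, the helper below lists the phrases marked as bold. The name list_bold is made up, and grep -o is a common extension that not every grep supports.

```shell
# Hypothetical helper: print each bold phrase from a Word document on its own line.
list_bold() {
    # -f makes antiword wrap bold text in asterisks; keep only those spans.
    antiword -f "$1" | grep -o '\*[^*][^*]*\*'
}
```

This is only a heuristic: literal asterisks in the document will also match, and a bold phrase that wraps onto a second line will be missed.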

To convert your Word document to a PDF file, you must specify a paper size using the “-a” flag. Antiword supports the following paper sizes:

  • 10x14
  • a3
  • a4
  • a5
  • b4
  • b5
  • executive
  • folio
  • legal
  • letter
  • note
  • quarto
  • statement
  • tabloid

You can use the same paper sizes when converting a document to PostScript, but in that case you must use the “-p” flag instead of “-a”.

This example converts the document to a tabloid-sized PDF file:

antiword -a tabloid resume.doc > resume-tabloid.pdf

This is the resulting PDF file displayed in Okular:

[Screenshot: the tabloid-sized PDF open in Okular]

Not bad! The dotted underlining and e-mail address hyperlink disappeared, but overall, the conversion was successful.

If you’re converting to PostScript, you can also add the “-L” flag to print in landscape mode.
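
Putting the PostScript options together, a small wrapper might look like this. It is only a sketch: doc_to_ps and its argument order are invented for illustration, while “-p” and “-L” are the flags described above.

```shell
# Hypothetical wrapper. Usage: doc_to_ps FILE.doc [PAPER] [landscape]
# Writes FILE.ps next to the input; PAPER defaults to a4.
doc_to_ps() {
    doc=$1
    paper=${2:-a4}
    out="${doc%.doc}.ps"
    if [ "${3:-}" = landscape ]; then
        # -L rotates the PostScript output to landscape.
        antiword -p "$paper" -L "$doc" > "$out"
    else
        antiword -p "$paper" "$doc" > "$out"
    fi
}
```

For example, doc_to_ps resume.doc letter landscape would write a landscape, letter-sized resume.ps.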

This example will convert the document to DocBook format:

antiword -x db resume.doc > resume-docbook.docbook

The conversion will also preserve metadata, including the author name and creation date of the document. Here’s what the raw XML looks like:

[Screenshot: the raw DocBook XML]

And here’s what the DocBook file looks like in LibreOffice:

[Screenshot: the DocBook file open in LibreOffice]

You can see that it looks different from the original Word document, but the structure has mostly been preserved. Converting to DocBook with Antiword would probably work better with Word documents that were created with conversion to XML in mind.

To see what else you can do with Antiword, including restoring text that has been changed in MS Word, check out the man page (it’s also online).
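
One of those extras, going by the man page, is the “-r” flag, which includes text removed by Word’s revision tracking (“-s” similarly includes text marked as hidden). Here is a minimal sketch for spotting what was removed, assuming diff and mktemp are available. The name show_removed_text is made up for illustration.

```shell
# Hypothetical helper: show lines that only appear when removed text is included.
# Returns 0 if both outputs are identical and 1 if they differ (diff's convention).
show_removed_text() {
    normal=$(mktemp) || return 2
    with_removed=$(mktemp) || return 2
    antiword "$1" > "$normal"            # default output
    antiword -r "$1" > "$with_removed"   # output including removed text
    status=0
    diff "$normal" "$with_removed" || status=$?
    rm -f "$normal" "$with_removed"
    return "$status"
}
```

In the output of show_removed_text resume.doc, lines starting with “>” are the ones that only show up with “-r”.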

