Microsoft Word documents, almost ubiquitous in business settings, might be considered a necessary evil for Linux users to deal with. Sure, you can open Word files in LibreOffice, but it's a pain to wait for a heavy graphical application to load your document. Antiword is a solution that runs in your terminal, which makes it perfect for people on slow computers or systems without a graphical environment.
Antiword has been ported to FreeBSD, BeOS, OS/2, Mac OS X, Amiga, VMS, NetWare, Plan9, EPOC, Zaurus PDA, MorphOS, Tru64/OSF, Minix, Solaris and DOS. For this article, I'll focus on using it in Linux.
Main Features
Antiword lets you view and convert MS Word documents from the command line. You can convert to the following formats:
- Plain text
- Formatted text
- PostScript
- PDF
- XML (only DocBook is currently supported)
Limitations
Before you get too excited, I have to mention that Antiword was last updated in 2005 and is not compatible with newer DOCX documents. You also cannot use it to edit your documents.
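If you're not sure which kind of file you have, the first few bytes usually give it away. Here's a minimal sketch, assuming the standard signatures: legacy .doc files live in an OLE2 container that starts with D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives that start with "PK". The word_kind name is just something I made up for illustration, and the check only identifies the container, not whether Word actually wrote it.

```sh
# word_kind (hypothetical helper): report the container type of a file.
# Prints "ole2" for the legacy container Antiword expects, "zip" for
# ZIP-based files such as .docx, and "unknown" for anything else.
word_kind() {
    # Read the first 8 bytes as lowercase hex with no separators.
    magic=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    case "$magic" in
        d0cf11e0a1b11ae1) echo ole2 ;;     # OLE2 compound file signature
        504b0304*)        echo zip ;;      # ZIP local file header ("PK")
        *)                echo unknown ;;
    esac
}
```

Running word_kind resume.doc before handing a file to Antiword can save you a confusing error message.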
Getting Antiword
If your Linux distribution has a package manager, you can most likely find Antiword in one of your repositories.
Otherwise, grab the .tar.gz archive from the Antiword page on Freecode. Extract the archive and enter the antiword-0.37 directory. Then run:
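As a rough sketch of the build step, assuming the source tree ships a Makefile with "all" and "install" targets (that is my assumption, so check the documentation bundled in the archive first), it could be wrapped like this. The build_antiword name is hypothetical.

```sh
# build_antiword (hypothetical helper): build from an extracted source tree.
# Assumes the bundled Makefile provides "all" and "install" targets.
build_antiword() {
    src_dir=${1:-antiword-0.37}
    (
        # Work in a subshell so the caller's directory is left alone.
        cd "$src_dir" || exit 1
        make all && make install
    )
}
```

Call it as build_antiword antiword-0.37 from the directory where you extracted the archive.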
Usage
For the following usage tips, I'm going to use my résumé as an example document. Here's what it looks like in LibreOffice:
The most basic way to use antiword is to simply display the document:
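A minimal sketch of that, assuming the file is called resume.doc as in the later examples. Since a whole document scrolls past quickly, I've wrapped the call in a small helper that pipes it through a pager; view_doc is a made-up name.

```sh
# Direct form: antiword resume.doc
# view_doc (hypothetical helper): show a Word document one screen at a time.
view_doc() {
    # Use the PAGER environment variable if set, otherwise fall back to less.
    antiword "$1" | ${PAGER:-less}
}
```

Then view_doc resume.doc opens the text in your pager.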
As you can see, the default command doesn't preserve certain aspects of formatting like font size, italics, and underlining, but it does a nice job of presenting the text in a readable form.
To display formatting information, use the "-f" flag in your command:
No, this doesn't actually show you the formatting in a WYSIWYG style; rather, it tells you about it with a markdown-like syntax. For example, it shows _underlined text_ with underscores and *bold text* with asterisks.
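Because the markers are plain characters, you can search for them with ordinary text tools. Here's a small sketch that pulls every *bold* run out of the "-f" output. The list_bold name is hypothetical, and the pattern will also match literal asterisks that happen to appear in pairs in your text, so treat the result as a hint rather than a guarantee.

```sh
# list_bold (hypothetical helper): print each *bold* run on its own line.
list_bold() {
    # Match an asterisk, one or more non-asterisk characters, and an asterisk.
    antiword -f "$1" | grep -o '\*[^*][^*]*\*'
}
```

For example, list_bold resume.doc is a quick way to skim the headings of a résumé that uses bold for section titles.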
To convert your Word document to a PDF file, you must specify a paper size using the "-a" flag. Antiword supports the following paper sizes:
- 10x14
- a3
- a4
- a5
- b4
- b5
- executive
- folio
- legal
- letter
- note
- quarto
- statement
- tabloid
You can use the same paper sizes when converting a document to PostScript, but in that case you must use the "-p" flag instead.
This example converts the document to a tabloid-sized PDF file:
antiword -a tabloid resume.doc > resume-tabloid.pdf
This is the resulting PDF file displayed in Okular:
Not bad! The dotted underlining and e-mail address hyperlink disappeared, but overall, the conversion was successful.
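If you convert documents often, it may be worth checking the paper size before calling Antiword. The sketch below does that against the list of sizes given earlier and names the output the same way as the command above (resume.doc becomes resume-tabloid.pdf). The doc_to_pdf name and the naming scheme are my own choices, not anything built into Antiword.

```sh
# doc_to_pdf (hypothetical helper): convert one .doc file to PDF.
# Usage: doc_to_pdf PAPER FILE.doc
doc_to_pdf() {
    paper=$1
    input=$2
    # Reject anything that is not in the list of supported paper sizes.
    case "$paper" in
        10x14|a3|a4|a5|b4|b5|executive|folio|legal|letter|note|quarto|statement|tabloid) ;;
        *) echo "unsupported paper size: $paper" >&2; return 1 ;;
    esac
    # Strip a trailing .doc and append the paper size to the file name.
    output="${input%.doc}-$paper.pdf"
    antiword -a "$paper" "$input" > "$output"
}
```

To convert a whole directory, loop over it: for f in *.doc; do doc_to_pdf a4 "$f"; done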
If you're converting to PostScript, you can also use the "-L" flag to print in landscape mode.
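Here's a short sketch that combines the two PostScript flags. The doc_to_ps_landscape name, the a4 default, and the output file name are all assumptions of mine for the sake of the example.

```sh
# doc_to_ps_landscape (hypothetical helper): landscape PostScript output.
# Usage: doc_to_ps_landscape FILE.doc [PAPER]   (PAPER defaults to a4)
doc_to_ps_landscape() {
    paper=${2:-a4}
    # -p selects PostScript with the given paper size, -L switches to landscape.
    antiword -p "$paper" -L "$1" > "${1%.doc}-landscape.ps"
}
```

Running doc_to_ps_landscape resume.doc letter writes resume-landscape.ps, which you can then open in any PostScript viewer or send to a printer.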
This example will convert the document to DocBook format:
antiword -x db resume.doc > resume-docbook.docbook
The conversion will also preserve metadata, including the author name and creation date of the document. Here's what the raw XML looks like:
And here's what the DocBook file looks like in LibreOffice:
You can see that it looks different from the original Word document, but the structure has mostly been preserved. Converting to DocBook with Antiword would probably work better with Word documents that were created with conversion to XML in mind.
To see what else you can do with Antiword, including restoring text that has been changed in MS Word, check out the man page (it's also online).