Convert Word Docs to HTML
Install Pandoc
Visit pandoc.org to download and install the tool. I am on Windows and used the installer package. It was very simple, and the defaults were all I needed.
This command-line tool will help get that Word document into HTML. It's not perfect, and most of the headings will need to be corrected, but how much cleanup is needed depends a lot on how the author formatted things when writing the Word doc. Still, it gets close, and it strips out all that nasty MS-style code every web developer hates.
Convert the Files
Open a command prompt and navigate to the folder that contains your Word document. In this example the document is called content.docx. Run the following command to convert this document into an HTML file.
pandoc -f docx -t html -o content.html content.docx --extract-media=images
In the same directory there should now be a content.html file. You will typically still have to correct the headings and clean up some markup, but you're halfway there and have saved a bunch of time!
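If there are several documents to convert, the same command can be scripted. The following is a minimal Python sketch and not part of the workflow described here: the names build_pandoc_command and convert_all are made up for illustration, and it assumes pandoc is installed and on the PATH. It builds the same command shown above for each .docx file in a folder. As a further assumption, the batch function gives each document its own subfolder under images, since images in different Word files often share names such as image1.png and could otherwise overwrite each other.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, media_dir="images"):
    """Return the pandoc argument list for one .docx file (hypothetical helper)."""
    docx_path = Path(docx_path)
    html_path = docx_path.with_suffix(".html")  # content.docx -> content.html
    return [
        "pandoc",
        "-f", "docx",
        "-t", "html",
        "-o", str(html_path),
        str(docx_path),
        f"--extract-media={media_dir}",
    ]


def convert_all(folder="."):
    """Convert every .docx file in `folder` (assumes pandoc is on the PATH)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    for docx_path in sorted(Path(folder).glob("*.docx")):
        # Assumption: one media subfolder per document to avoid name clashes.
        media_dir = f"images/{docx_path.stem}"
        command = build_pandoc_command(docx_path.name, media_dir)
        # Run inside the folder so the output lands next to the document.
        subprocess.run(command, cwd=folder, check=True)
```

Calling convert_all(".") from the folder that holds the documents should leave one .html file beside each .docx file.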
The --extract-media Flag
The --extract-media= flag extracts any images from the document into a folder. I wanted them placed in an images folder, so the flag becomes --extract-media=images.
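To check what was actually extracted, a short Python sketch can list the image files. This is an illustration only: list_extracted_media is a hypothetical helper, and the set of file extensions is an assumption. Pandoc typically writes images from a .docx file into a media subfolder of the target (for example images/media/image1.png), so the sketch searches the folder recursively instead of relying on an exact layout.

```python
from pathlib import Path

# Assumed set of image types that may appear in a Word document.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".emf", ".wmf"}


def list_extracted_media(media_dir="images"):
    """Return sorted relative paths of image files found under media_dir."""
    root = Path(media_dir)
    if not root.is_dir():
        return []  # nothing extracted, or the folder name is different
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

An empty list means either the document had no images or the folder name does not match the one given to the flag.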