You're in for a world of pain when you start parsing docx and pptx. On the bright side, if you can figure out a good solution, you'll likely have a solid business model. I imagine there is significant demand for converting docx and pptx files into HTML or Markdown as a service. If you come up with a nice, well-documented API for all of this, I'd certainly recommend your service. And if you come up with an outstanding docx parser, I'd use it myself (I'm currently using my own somewhat primitive solution for a project that involves converting docx files).
Here are a few projects to look at, if you haven't already: