Recent News

Mar. 28, 2013
In response to numerous requests, the MS Outlook parser has been released.

Oct. 16, 2009
New parsers for the OpenDocument family of documents are now available.

Aug. 31, 2007
New parsers for MS Office 2007 documents have been added to the collection.

Nov. 21, 2005
The Docs2text 2.0 component has been released. Supported document formats are MS Word, MS Excel, MS PowerPoint, RTF, and Adobe Acrobat PDF.

Our partner

Full Text Indexing and Retrieval library with Approximate Search.

Android platform

In response to our customers' requests, we decided to port our libraries to Android, modestly speaking one of the most popular mobile platforms. Historically, we started out developing our libraries in assembly language, and have continued to do so, in an attempt to achieve the highest possible processing speed and flexibility, which we in fact did.


MS Office® family

The MS Word document format is a proprietary binary format used by MS Word®. As the de facto standard in office document management, it became very popular; however, its undocumented structure makes it almost impossible for third-party applications to read it correctly.
The docs2text component/library can read MS Word 97 - 2003 documents without MS Office/Word installed, delivering high accuracy and incredible processing speed.
learn more

The MS PowerPoint® format is a popular presentation format used for creating stunning slide shows and presentations.
docs2text can extract text objects from MS PowerPoint presentations without MS PowerPoint installed.
learn more

The MS Excel document format is a popular spreadsheet storage format. It can contain text, formulas, charts, images, and complex calculations.
Like all MS Office binary formats, the MS Excel format is undocumented as well. Nevertheless, docs2text can easily read MS Excel spreadsheets without any additional applications or components installed, providing high accuracy, unbeatable performance and extreme flexibility.
learn more

MS Office 2007 documents (docx, xlsx, pptx) are also supported now.
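
For readers unfamiliar with these formats: unlike the older binary formats, a .docx file is a ZIP package whose main body is stored as XML in `word/document.xml`. The following is a minimal Python sketch of reading paragraph text from such a package. It is an illustration only and is not how docs2text works internally; the function name `docx_to_text` is hypothetical, and the sketch ignores tables, headers, footnotes and other parts of the package.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the paragraph text of a .docx package (path or file-like object).

    Hypothetical helper for illustration; only the main document body is read.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # Each paragraph holds runs whose visible text sits in <w:t> elements.
        runs = [node.text or "" for node in paragraph.iter(W + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling `docx_to_text("report.docx")` on a file of your own would return one line of text per paragraph; the file name here is just a placeholder.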

Personal Storage Table (PST) is an open, proprietary file format designed to store messages, notes, calendar events and other items within Microsoft software such as Microsoft Outlook, Microsoft Exchange Client and Windows Messaging. As with our other libraries, pst2text doesn't require any additional applications to work with .pst files, delivering the highest processing speed and flexibility.
learn more


Adobe Acrobat® PDF

PDF (Portable Document Format) was developed by Adobe Systems Inc. for displaying and printing documents on different systems and devices while keeping their layout unchanged. It can contain text, images, movies, sounds, forms, etc.
While the PDF format is documented, developing a reliable parser for PDF documents is not a trivial task. The vast majority of current solutions on the market are based on the open source project xPDF, with all its pros and cons. According to our customer survey, pdf2text is, as unbelievable as it sounds, up to 100 times faster than any other PDF text extraction solution available on the market.
learn more

OpenDocument Format family

The OpenDocument Format (ODF) is an open cross-platform file format for office documents (text documents, spreadsheets, drawings, presentations and more), developed at OASIS, an independent, international standards group. Open means that any developer can learn its details and create an application that can read and write this format. ODF is a native file format for OpenOffice.org 2.0+, StarOffice 8+, IBM Workplace, AbiWord, KOffice 1.5+ and many other applications (MS Office can also read and write it).

In addition to being an OASIS standard, it is published as an ISO/IEC international standard, ISO/IEC 26300:2006 Open Document Format for Office Applications (OpenDocument) v1.0.

ODF is being adopted by many governments worldwide as a required file format for publishing and accepting documents.

Our OpenDocument parser is designed to convert OpenDocument documents to text or to extract any other necessary data, and it can handle the following document extensions:

  • .odt for word processing (text) documents;
  • .ods for spreadsheets;
  • .odp for presentations;
learn more
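
Because ODF is an open format, its basic layout is easy to illustrate: an .odt, .ods or .odp file is a ZIP package, and the document body lives in `content.xml`, with text held in `text:p` (paragraph) and `text:h` (heading) elements. The Python sketch below shows that idea under simplifying assumptions. It is not our parser's implementation; the name `odf_to_text` is hypothetical, and the sketch does not expand `text:s`, `text:tab` or `text:line-break` elements or deal with nested paragraphs.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF "text" namespace; paragraphs and headings carry the readable content.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {"{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS}


def odf_to_text(source):
    """Return paragraph and heading text from an ODF package.

    Hypothetical helper for illustration; `source` is a path or file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("content.xml"))

    lines = []
    for element in root.iter():
        if element.tag in BLOCK_TAGS:
            # itertext() also collects text inside nested spans and links.
            lines.append("".join(element.itertext()))
    return "\n".join(lines)
```

The same function can be pointed at a spreadsheet or presentation, since cell and slide text in those documents is also stored in `text:p` elements.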