Doctor


https://free.law/projects/doctor
Oakland, California

Doctor is a microservice for converting and extracting documents and audio files.

As a part of building CourtListener, we have spent years optimizing our document extraction and audio conversion pipelines. Doctor is the culmination of this work and has functionality like:

- Extracting text from documents, including WPD, PDF, DOC, DOCX, RTF, and more.
- Completing optimized OCR extraction on image-based PDFs.
- Getting page counts from different document types.
- Converting audio files from WMA, OGG, WAV, and others to MP3.
- Making a PDF from images.
- Creating thumbnails from PDFs.

Doctor is designed to scale while providing performant high-quality results. It can be scaled horizontally via a multi-worker or orchestrated single-worker model.

The code in Doctor has processed tens of millions of documents and over 2.5 million minutes of audio.
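
As a rough illustration of how a client might use the text extraction feature, the sketch below uploads a document to a running Doctor instance over HTTP. It is not taken from this listing: the base URL (`http://localhost:5050`), the route (`/extract/doc/text/`), and the form field name (`file`) are assumptions that should be checked against the project's own documentation, and the helper names `build_upload_request` and `extract_text` are hypothetical.

```python
import os
import uuid
import urllib.request

# Assumption: a Doctor instance is reachable here; host and port depend on your deployment.
DOCTOR_URL = "http://localhost:5050"

# Assumption: route for text extraction; confirm against the project's documentation.
EXTRACT_TEXT_PATH = "/extract/doc/text/"


def build_upload_request(base_url, path, filename, data, field="file"):
    """Build (but do not send) a multipart/form-data POST carrying one file."""
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        disposition.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def extract_text(file_path, base_url=DOCTOR_URL):
    """Upload a document and return the raw response body from the service."""
    with open(file_path, "rb") as fh:
        request = build_upload_request(
            base_url, EXTRACT_TEXT_PATH, os.path.basename(file_path), fh.read()
        )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

Building the request separately from sending it keeps the upload logic testable without a live service. Because each call is a stateless HTTP request, the same client code would work unchanged whether one worker or many sit behind the URL.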

Organization Type: Non-profit / charity / foundation
Status: Active
Parent Organization: Free Law Project
Open Source: Yes
Last Modified: 11/19/2024
Added on: 11/19/2024
