Package profile
unstructured
- Summary: A library that prepares raw documents for downstream ML tasks.
- Author: Unstructured Technologies
- Homepage: https://github.com/Unstructured-IO/unstructured
- Source: https://github.com/Unstructured-IO/unstructured
- Number of releases: 189
- First release: 0.0.1.dev0 on 2022-09-06
- Latest release: 0.18.9 on 2025-07-16
Dependencies
Unstructured has 47 dependencies, 26 of which are optional.

Dependency | Optional |
---|---|
backoff | false |
beautifulsoup4 | false |
chardet | false |
dataclasses-json | false |
emoji | false |
filetype | false |
html5lib | false |
langdetect | false |
lxml | false |
nltk | false |
numpy | false |
psutil | false |
python-iso639 | false |
python-magic | false |
python-oxmsg | false |
rapidfuzz | false |
requests | false |
tqdm | false |
typing-extensions | false |
unstructured-client | false |
wrapt | false |
effdet | true |
google-cloud-vision | true |
markdown | true |
msoffcrypto-tool | true |
networkx | true |
onnx | true |
onnxruntime | true |
openpyxl | true |
paddlepaddle | true |
pandas | true |
pdf2image | true |
pdfminer.six | true |
pi-heif | true |
pikepdf | true |
pypandoc | true |
pypdf | true |
python-docx | true |
python-pptx | true |
sacremoses | true |
sentencepiece | true |
torch | true |
transformers | true |
unstructured-inference | true |
unstructured.paddleocr | true |
unstructured.pytesseract | true |
xlrd | true |
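
The required/optional split in the table above can be approximated from a locally installed copy of the package. The following is a minimal sketch, not the method this profile page uses: it assumes that an optional dependency is one whose requirement string carries an `extra` environment marker, and the helper names `split_requirements` and `dependency_summary` are hypothetical. Counts may differ from the table depending on the installed release.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(requirement_strings):
    """Split requirement strings into (required, optional) sets of names.

    Assumption: a requirement is optional when its environment marker
    mentions `extra`, which is how extras-only dependencies are recorded
    in package metadata.
    """
    required, optional = set(), set()
    for req in requirement_strings:
        spec, _, marker = req.partition(";")
        # The project name ends at the first space, extras bracket,
        # parenthesis, or version operator.
        name = spec.strip()
        for delimiter in " [(<>=!~":
            name = name.split(delimiter, 1)[0]
        name = name.lower()
        if "extra" in marker:
            optional.add(name)
        else:
            required.add(name)
    # A package that appears both ways is counted as required.
    optional -= required
    return required, optional


def dependency_summary(package_name):
    """Return (required, optional) for an installed package.

    Returns two empty sets if the package is not installed.
    """
    try:
        requirement_strings = requires(package_name) or []
    except PackageNotFoundError:
        requirement_strings = []
    return split_requirements(requirement_strings)
```

Calling `dependency_summary("unstructured")` in an environment where the package is installed returns the two sets of names; their sizes can then be compared with the counts stated above.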