Package profile
trafilatura
- Summary: Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
- Author: Adrien Barbaresi <barbaresi@bbaw.de>
- Homepage: https://trafilatura.readthedocs.io
- Source: https://github.com/adbar/trafilatura
- Number of releases: 50
- First release: 0.0.1 on 2019-07-17
- Latest release: 2.0.0 on 2024-12-03
Dependencies
Trafilatura has 19 dependencies, 12 of which are optional.

Dependency | Optional |
---|---|
certifi | false |
charset_normalizer | false |
courlan | false |
htmldate | false |
justext | false |
lxml | false |
urllib3 | false |
brotli | true |
cchardet | true |
faust-cchardet | true |
flake8 | true |
mypy | true |
py3langid | true |
pycurl | true |
pytest | true |
pytest-cov | true |
types-lxml | true |
types-urllib3 | true |
zstandard | true |
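
To see which of the dependencies in the table above are present in a given environment, a small check along the following lines may help. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper names `installed_version` and `dependency_report` are hypothetical and not part of Trafilatura, and the two lists are simply copied from the table, so they may drift from what a particular release actually declares.

```python
from importlib import metadata

# Distribution names copied from the dependency table above (assumption:
# they match the names the distributions are published under).
REQUIRED = [
    "certifi", "charset_normalizer", "courlan", "htmldate",
    "justext", "lxml", "urllib3",
]
OPTIONAL = [
    "brotli", "cchardet", "faust-cchardet", "flake8", "mypy",
    "py3langid", "pycurl", "pytest", "pytest-cov",
    "types-lxml", "types-urllib3", "zstandard",
]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def dependency_report(names):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        print(f"{label}:")
        for name, version in dependency_report(names).items():
            print(f"  {name}: {version or 'not installed'}")
```

A missing optional entry is not an error in itself; the report only shows what the current environment provides.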