PySpect

Package profile

trafilatura

  • Summary: Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
  • Author: Adrien Barbaresi <barbaresi@bbaw.de>
  • Homepage: https://trafilatura.readthedocs.io
  • Source: https://github.com/adbar/trafilatura (Repo profile)
  • Number of releases: 50
  • First release: 0.0.1 on 2019-07-17
  • Latest release: 2.0.0 on 2024-12-03

Releases

[Chart: Dates and sizes of releases. X-axis: release date (2020 to 2025); y-axis: size in MB.]

PyPI Downloads

[Chart: Weekly downloads over the last 3 months. X-axis: date (February to July); y-axis: thousand downloads per week.]

Dependencies

Trafilatura has 19 dependencies, 12 of which are optional.
Dependencies of trafilatura (19).

Dependency            Optional
certifi               false
charset_normalizer    false
courlan               false
htmldate              false
justext               false
lxml                  false
urllib3               false
brotli                true
cchardet              true
faust-cchardet        true
flake8                true
mypy                  true
py3langid             true
pycurl                true
pytest                true
pytest-cov            true
types-lxml            true
types-urllib3         true
zstandard             true
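
The required/optional split in the table can be approximated locally from installed package metadata. The following is a minimal sketch, not the method this page uses (which is not stated): it assumes that a dependency is "optional" when its requirement string carries an `extra` environment marker. The helper names `split_requirements` and `dependency_counts` are hypothetical, and the counts obtained depend on the installed version and on how the metadata lists extras, so they may not match the figures above.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(req_strings):
    """Split requirement strings into (required, optional) lists.

    Assumption: a requirement is optional when its environment marker
    (the part after ';') mentions 'extra', e.g. 'brotli; extra == "all"'.
    """
    required, optional = [], []
    for req in req_strings:
        marker = req.partition(";")[2]  # empty string when there is no marker
        if "extra" in marker:
            optional.append(req)
        else:
            required.append(req)
    return required, optional


def dependency_counts(package):
    """Return (n_required, n_optional) for an installed package, or None."""
    try:
        reqs = requires(package) or []  # requires() returns None if nothing is declared
    except PackageNotFoundError:
        return None  # package is not installed in this environment
    required, optional = split_requirements(reqs)
    return len(required), len(optional)


if __name__ == "__main__":
    # Prints a (required, optional) tuple, or None if trafilatura is not installed.
    print(dependency_counts("trafilatura"))
```

Because the same package can appear under several extras in the metadata, de-duplicating by package name may be needed before comparing the result with the table.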

Details