
What I find most effective is to wrap `get` with a local cache, and this is the first thing I write when I start a web crawling project. From the very beginning, even while I'm just exploring and experimenting, every page gets downloaded to my machine only once. This way I don't accidentally bother the server too much, and I don't have to re-crawl if I make a mistake in my code.
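A minimal sketch of what such a wrapper can look like, using only the standard library (`urllib` instead of requests, but the idea is identical); the cache directory name and hashing scheme here are just illustrative choices, not anything prescribed by the parent:

```python
import hashlib
import pathlib
import urllib.request

CACHE_DIR = pathlib.Path("crawl_cache")


def cached_get(url: str) -> bytes:
    """Fetch a URL, but hit the network at most once per URL."""
    CACHE_DIR.mkdir(exist_ok=True)
    # Hash the URL so the cache filename is safe and fixed-length.
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / key
    if path.exists():
        # Cache hit: serve the previously downloaded body, no request made.
        return path.read_bytes()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    # Cache miss: store the body on disk for every later call.
    path.write_bytes(body)
    return body
```

During exploration you then call `cached_get(url)` everywhere you would call `get`, and re-running a buggy parsing script costs zero extra requests.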


requests-cache [0] is an easy way to do this if you're using the requests package in Python. You can patch requests with

  import requests_cache
  requests_cache.install_cache('dog_breed_scraping')
and responses will be stored in a local SQLite file.

[0] https://requests-cache.readthedocs.io/en/stable/

