Channel: How to crawl using wget to download ONLY HTML files (ignore images, css, js) - Super User
Browsing all 3 articles

Answer by Spir for How to crawl using wget to download ONLY HTML files...

What about adding the options: --reject '*.js,*.css,*.ico,*.txt,*.gif,*.jpg,*.jpeg,*.png,*.mp3,*.pdf,*.tgz,*.flv,*.avi,*.mpeg,*.iso' --ignore-tags=img,link,script --header="Accept: text/html"


Answer by Nathan J.B. for How to crawl using wget to download ONLY HTML files...

@ernie's comment about --ignore-tags led me down the right path! When I looked up --ignore-tags in man, I noticed --follow-tags. Setting --follow-tags=a allowed me to skip img, link, script, etc. It's...
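
For readers who want to try this, here is a minimal sketch of how the flags mentioned in these answers might be combined. The answer above is truncated, so this is not necessarily the exact command its author used. The function name `crawl_html_only`, the `WGET` override used for a dry run, and the `https://example.com/` URL are all hypothetical placeholders; `--recursive` is added on the assumption that a whole-site crawl is wanted, and the reject list is shortened for readability.

```shell
# Hypothetical helper: the name and the WGET override are illustrative only.
# "$1" is the start URL. WGET defaults to wget; set WGET=echo for a dry run.
# --follow-tags=a makes wget follow only links found in <a> tags, and
# --reject skips common asset extensions as an extra safeguard.
crawl_html_only() {
  "${WGET:-wget}" \
    --recursive \
    --follow-tags=a \
    --reject '*.js,*.css,*.ico,*.gif,*.jpg,*.jpeg,*.png,*.pdf' \
    --header='Accept: text/html' \
    "$1"
}

# Dry run: print the arguments instead of crawling (placeholder URL).
WGET=echo crawl_html_only "https://example.com/"
```

Dropping the `WGET=echo` prefix runs the real crawl. Note that `--follow-tags=a` only limits which tags are followed, so an `<a>` link pointing straight at an image or PDF would still be fetched unless `--reject` filters it out.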


How to crawl using wget to download ONLY HTML files (ignore images, css, js)

Essentially, I want to crawl an entire site with Wget, but I need it to NEVER download other assets (e.g. imagery, CSS, JS, etc.). I only want the HTML files. Google searches are completely...
