How to crawl using wget to download ONLY HTML files (ignore images, css, js)


Answer by Spir for How to crawl using wget to download ONLY HTML files...

April 11, 2017, 9:07 am

What about adding the options: --reject '*.js,*.css,*.ico,*.txt,*.gif,*.jpg,*.jpeg,*.png,*.mp3,*.pdf,*.tgz,*.flv,*.avi,*.mpeg,*.iso' --ignore-tags=img,link,script --header="Accept: text/html"
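For readers who want to try this, here is a minimal sketch of how those options might be combined into one command. The start URL (https://example.com/), the function name crawl_html_only, the RUN switch, and the added --recursive flag are assumptions for illustration; only the --reject, --ignore-tags and --header options come from the answer itself.

```shell
# Hypothetical start URL; replace it with the site you want to crawl.
START_URL="${START_URL:-https://example.com/}"

# File patterns to skip, exactly as listed in the answer.
REJECT_LIST='*.js,*.css,*.ico,*.txt,*.gif,*.jpg,*.jpeg,*.png,*.mp3,*.pdf,*.tgz,*.flv,*.avi,*.mpeg,*.iso'

# crawl_html_only is a hypothetical helper name, not part of wget.
crawl_html_only() {
  # --recursive is an assumption (the question asks for a whole-site crawl);
  # the other three options are the ones suggested in the answer.
  set -- --recursive \
         --reject "$REJECT_LIST" \
         --ignore-tags=img,link,script \
         --header='Accept: text/html' \
         "$1"
  if [ "${RUN:-0}" = "1" ]; then
    wget "$@"                    # actually crawl
  else
    printf '%s\n' "wget $*"      # dry run: only show the options (quoting not preserved)
  fi
}

crawl_html_only "$START_URL"
```

By default the sketch only prints the command; setting RUN=1 in the environment runs wget for real. The reject list is quoted so the shell passes the patterns to wget instead of expanding them.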


Answer by Nathan J.B. for How to crawl using wget to download ONLY HTML files...

January 31, 2014, 10:00 am

@ernie's comment about --ignore-tags led me down the right path! When I looked up --ignore-tags in man, I noticed --follow-tags. Setting --follow-tags=a allowed me to skip img, link, script, etc. It's...
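The rest of this answer is cut off, so the exact command it used is not shown. As a rough sketch of the idea only: the start URL, the function name crawl_links_only, the RUN switch, and the --recursive flag below are assumptions; --follow-tags=a is the one option taken from the answer.

```shell
# Hypothetical start URL; replace it with the site you want to crawl.
START_URL="${START_URL:-https://example.com/}"

# crawl_links_only is a hypothetical helper name, not part of wget.
crawl_links_only() {
  # --follow-tags=a limits recursion to links found in <a> tags, so
  # references in img, link and script tags are not followed.
  # --recursive is an assumption based on the whole-site crawl in the question.
  set -- --recursive --follow-tags=a "$1"
  if [ "${RUN:-0}" = "1" ]; then
    wget "$@"                    # actually crawl
  else
    printf '%s\n' "wget $*"      # dry run: only show the command
  fi
}

crawl_links_only "$START_URL"
```

Files that pages link to through ordinary <a href> links (a PDF, for instance) would presumably still be fetched with this option alone, so a --reject list may still be worth adding.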


How to crawl using wget to download ONLY HTML files (ignore images, css, js)

April 11, 2017, 9:07 am

Essentially, I want to crawl an entire site with Wget, but I need it to NEVER download other assets (e.g. imagery, CSS, JS, etc.). I only want the HTML files. Google searches are completely...
