Answer by Spir for How to crawl using wget to download ONLY HTML files...
What about adding the following options: --reject '*.js,*.css,*.ico,*.txt,*.gif,*.jpg,*.jpeg,*.png,*.mp3,*.pdf,*.tgz,*.flv,*.avi,*.mpeg,*.iso' --ignore-tags=img,link,script --header="Accept: text/html"
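These options only shape a crawl once recursion is switched on. The following is a minimal sketch, assuming bash. It wraps the suggested options in a hypothetical helper named crawl_reject_assets. The --recursive and --no-parent flags are assumptions added here and are not part of the quoted answer.

```shell
# Hypothetical helper: recursive crawl using the options suggested above.
# --recursive and --no-parent are assumptions added to make this a crawl.
crawl_reject_assets() {
  local url="$1"
  # --reject: skip URLs whose file names match these patterns
  # --ignore-tags: do not scan these HTML tags for links to follow
  # --header: ask the server to prefer HTML responses
  wget --recursive --no-parent \
    --reject '*.js,*.css,*.ico,*.txt,*.gif,*.jpg,*.jpeg,*.png,*.mp3,*.pdf,*.tgz,*.flv,*.avi,*.mpeg,*.iso' \
    --ignore-tags=img,link,script \
    --header="Accept: text/html" \
    "$url"
}
```

Usage, with a placeholder URL: crawl_reject_assets https://example.com/

--reject matches file names, so an asset served from a URL without one of the listed extensions could still be fetched. The Accept header is only a preference, and a server is free to ignore it.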
Answer by Nathan J.B. for How to crawl using wget to download ONLY HTML files...
@ernie's comment about --ignore-tags led me down the right path! When I looked up --ignore-tags in the man page, I noticed --follow-tags. Setting --follow-tags=a allowed me to skip img, link, script, etc. It's...
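The --follow-tags approach can be sketched in a similar way, again assuming bash. The helper name crawl_html_only is hypothetical. The --recursive, --level=inf and --no-parent flags are assumptions added for a full-site crawl and do not come from the quoted answer.

```shell
# Hypothetical helper: follow only links found in <a> tags, so URLs that
# appear only in <img>, <link> or <script> tags are never queued.
# --recursive, --level=inf and --no-parent are assumptions, not from the answer.
crawl_html_only() {
  local url="$1"
  wget --recursive --level=inf --no-parent \
    --follow-tags=a \
    "$url"
}
```

Usage, with a placeholder URL: crawl_html_only https://example.com/

--follow-tags=a only limits which tags wget scans for links. An <a href> that points directly at an image or PDF would still be fetched. Combining it with a --reject list like the one in Spir's answer covers that case.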
How to crawl using wget to download ONLY HTML files (ignore images, css, js)
Essentially, I want to crawl an entire site with Wget, but I need it to NEVER download other assets (e.g. imagery, CSS, JS, etc.). I only want the HTML files. Google searches are completely...