Add fer_mirror_web_scrap.wget.md

Max HeadRoom 2025-09-06 05:54:57 +02:00
parent be51bcba03
commit bd76860655

@@ -0,0 +1,42 @@
wget -mpHkKEb -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/84.0' http://www.example.com
-m (--mirror) : turn on options suitable for mirroring (recursion with infinite depth, plus timestamping).
-p (--page-requisites) : download all files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
-H (--span-hosts): enable spanning across hosts when doing recursive retrieving.
-k (--convert-links) : after the download, convert the links in each document for local viewing.
-K (--backup-converted) : when converting a file, back up the original version with a .orig suffix. Affects the behavior of -N.
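-E (--adjust-extension) : append the proper extension (.html or .css) to the local filename when the content type calls for it and the URL does not already end in it.
-b (--background) : go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
-e (--execute) : execute a command as if it were part of .wgetrc. Here, robots=off tells wget to ignore robots.txt.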
-t number (--tries=number) : set number of tries to number.
-U (--user-agent) : identify as agent-string to the HTTP server. Some servers may ban you permanently for recursive downloading if you send the default User-Agent.
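
The bundled short flags above can also be written with their long names, which tends to be easier to read in a script. What follows is a minimal sketch under stated assumptions, not a definitive wrapper: the function name `mirror_site` and the variables `WGET` and `UA` are made up for illustration, and the user-agent string is simply the one from the command above.

```shell
# Sketch only: mirror_site, WGET and UA are illustrative names, not part of wget.
WGET=${WGET:-wget}   # assumption: set WGET=echo to print the command instead of running it
UA='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/84.0'

mirror_site() {
  # $1 is the start URL.
  # The first seven options correspond, in order, to the letters of -mpHkKEb;
  # the last three correspond to -t 1, -e robots=off and -U.
  "$WGET" \
    --mirror \
    --page-requisites \
    --span-hosts \
    --convert-links \
    --backup-converted \
    --adjust-extension \
    --background \
    --tries=1 \
    --execute=robots=off \
    --user-agent="$UA" \
    "$1"
}
```

Calling `mirror_site http://www.example.com` would start the mirror. Because of `--background`, wget detaches at once, so progress can be followed with `tail -f wget-log`.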
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
Explanation of the various flags:
--mirror Makes (among other things) the download recursive.
--convert-links Converts all links (including those to assets such as CSS stylesheets) to relative ones, so the copy is suitable for offline viewing.
--adjust-extension Adds suitable extensions to filenames (html or css) depending on their content-type.
--page-requisites Downloads things like CSS stylesheets and images required to properly display the page offline.
--no-parent When recursing, do not ascend to the parent directory. This is useful for restricting the download to only a portion of the site.
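
To make the "among other things" in --mirror concrete: the wget manual describes --mirror as currently equivalent to -r -N -l inf --no-remove-listing. The sketch below spells that out and applies --no-parent to a subdirectory. It is an illustration under assumptions, not the canonical invocation: the function name `mirror_subtree`, the `WGET` variable, and the `/docs/` path are hypothetical.

```shell
# Sketch only: mirror_subtree and WGET are illustrative names, not part of wget.
WGET=${WGET:-wget}   # assumption: set WGET=echo to print the command instead of running it

mirror_subtree() {
  # $1 is the directory URL to mirror.
  # The first line of options is --mirror written out in full.
  "$WGET" \
    --recursive --timestamping --level=inf --no-remove-listing \
    --convert-links \
    --adjust-extension \
    --page-requisites \
    --no-parent \
    "$1"
}

# Hypothetical usage (the /docs/ path is an assumption). The trailing slash
# marks docs/ as a directory, so --no-parent keeps the crawl beneath it:
#   mirror_subtree http://example.org/docs/
```

With `WGET=echo` set beforehand, the same call only prints the assembled command line, which is a cheap way to check the options before starting a long crawl.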