diff --git a/fer_mirror_web_scrap.wget.md b/fer_mirror_web_scrap.wget.md
new file mode 100644
index 0000000..b109faf
--- /dev/null
+++ b/fer_mirror_web_scrap.wget.md
@@ -0,0 +1,42 @@
+wget -mpHkKEb -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/84.0' http://www.example.com
+
+
+
+ -m (--mirror) : turn on options suitable for mirroring (recursion with infinite depth, plus timestamping).
+
+ -p (--page-requisites) : download all files needed to properly display a given HTML page, such as inlined images, sounds, and referenced stylesheets.
+
+ -H (--span-hosts) : allow recursive retrieval to follow links to other hosts.
+
+ -k (--convert-links) : after the download, convert the links in the documents for local viewing.
+
+ -K (--backup-converted) : when converting a file, back up the original version with a .orig suffix. Affects the behavior of -N.
+
+ -E (--adjust-extension) : append the proper extension (.html or .css) to a local filename when the content type calls for it.
+
+ -b (--background) : go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
+
+ -e (--execute) : execute a command as if it were part of .wgetrc. Here, robots=off makes wget ignore robots.txt.
+
+ -t number (--tries=number) : set the number of tries to number. Here, -t 1 means each file is tried once, with no retries.
+
+ -U (--user-agent) : identify as agent-string to the HTTP server. Some servers may ban you permanently for recursive downloading if you send the default User-Agent.
+
+
+
+
+
+
+
+wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
+
+
+ Explanation of the various flags:
+
+ --mirror – Makes (among other things) the download recursive.
+ --convert-links – Converts all links (including links to assets such as CSS stylesheets) to relative ones, so the copy is suitable for offline viewing.
+ --adjust-extension – Adds suitable extensions to filenames (html or css) depending on their content type.
+ --page-requisites – Downloads things like CSS stylesheets and images required to properly display the page offline.
+ --no-parent – When recursing, does not ascend to the parent directory. It is useful for restricting the download to only a portion of the site.
+
+
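
The bundled short flags in the first command (`-mpHkKEb`) are hard to audit at a glance. As a minimal sketch, and not part of the patch itself, the helper below prints the same command with each flag spelled out in its long form, following the flag descriptions in the patch. The function name `mirror_cmd` and the shortened user-agent string are assumptions made for illustration. The function only prints the command line and downloads nothing.

```shell
# Hypothetical helper (not part of the patch): print the first wget command
# with every bundled short flag expanded to its long form.
# It only prints the command line; nothing is downloaded.
mirror_cmd() {
    # $1 is the start URL. The user-agent value is a shortened placeholder.
    printf 'wget %s %s\n' \
        "--mirror --page-requisites --span-hosts --convert-links --backup-converted --adjust-extension --background --tries=1 --execute robots=off --user-agent='Mozilla/5.0'" \
        "$1"
}

# Print the expanded command for the URL used in the patch.
mirror_cmd http://www.example.com
```

Printing the command instead of running it makes it possible to review the flags, or paste the line into a terminal, before starting a recursive download.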