Add fer_mirror_web_scrap.wget.md
parent
be51bcba03
commit
bd76860655
42
fer_mirror_web_scrap.wget.md
Normal file
@@ -0,0 +1,42 @@
wget -mpHkKEb -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/84.0' http://www.example.com
-m (--mirror) : turn on options suitable for mirroring (recursion with infinite depth, plus time-stamping).
-p (--page-requisites) : download all files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
-H (--span-hosts) : allow recursive retrieval to follow links onto other hosts.
-k (--convert-links) : after the download, convert the links in the downloaded documents so they work for local viewing.
-K (--backup-converted) : when converting a file, back up the original version with a .orig suffix. Affects the behavior of -N.
-E (--adjust-extension) : append a suitable extension (.html or .css) to the local filename when the content type calls for it.
-b (--background) : go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
-e (--execute) : execute a command as if it were part of .wgetrc. Here robots=off tells wget to ignore robots.txt.
-t number (--tries=number) : set the number of tries to number. With -t 1, each download is attempted only once.
-U (--user-agent) : identify as agent-string to the HTTP server. Some servers may permanently ban clients that download recursively while sending the default User-Agent.
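The short flags above can also be written with their long names, which is easier to read back later. The following is a minimal sketch under stated assumptions: `mirror_site`, `DEFAULT_AGENT` and `DRY_RUN` are hypothetical names introduced here, not part of wget, and by default the function only prints the command instead of running it.

```sh
#!/bin/sh
# Long-option spelling of: wget -mpHkKEb -t 1 -e robots=off -U '<agent>' <url>
# mirror_site, DEFAULT_AGENT and DRY_RUN are illustrative names, not wget features.

DEFAULT_AGENT='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/84.0'

mirror_site() {
  url=${1:?usage: mirror_site URL [USER_AGENT]}
  agent=${2:-$DEFAULT_AGENT}

  # Rebuild the positional parameters as the full wget command line.
  set -- wget \
    --mirror \
    --page-requisites \
    --span-hosts \
    --convert-links \
    --backup-converted \
    --adjust-extension \
    --background \
    --tries=1 \
    --execute robots=off \
    --user-agent="$agent" \
    "$url"

  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Display only: arguments are joined with spaces and not re-quoted.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Calling `mirror_site http://www.example.com` prints the command. With `DRY_RUN=0` it actually runs, and because of `--background` the progress goes to wget-log unless -o is given.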
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
Explanation of the various flags:
--mirror – Makes (among other things) the download recursive.
--convert-links – Converts all links (including those to resources such as CSS stylesheets) to relative ones, so the copy is suitable for offline viewing.
--adjust-extension – Adds suitable extensions to filenames (html or css) depending on their content-type.
--page-requisites – Downloads things like CSS stylesheets and images required to properly display the page offline.
--no-parent – When recursing, does not ascend to the parent directory. This is useful for restricting the download to only a portion of the site.
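To show how --no-parent limits a mirror to one part of a site, here is a small sketch using the short spellings of the same five flags (-m -k -E -p -np). The names `mirror_subtree`, `local_root` and `DRY_RUN`, and the `/docs/` path, are assumptions made for illustration. `local_root` relies on wget's default behaviour of saving a recursive download under a directory named after the host, and it assumes a URL without a port or user info.

```sh
#!/bin/sh
# Short-flag spelling of:
#   wget --mirror --convert-links --adjust-extension --page-requisites --no-parent <url>
# mirror_subtree, local_root and DRY_RUN are illustrative names, not wget features.

mirror_subtree() {
  url=${1:?usage: mirror_subtree URL}
  set -- wget -m -k -E -p -np "$url"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print the top-level directory wget creates by default for a recursive
# download, which is the host part of the URL.
local_root() {
  rest=${1#*://}              # drop the scheme, e.g. "http://"
  printf '%s\n' "${rest%%/*}" # keep everything before the first "/"
}
```

For a URL such as `http://example.org/docs/`, wget would stay at or below `/docs/` while recursing, and the copy would be found under the directory printed by `local_root`.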