elvisimprstr,
Thanks for the idea. I like it. I prepared a script as you suggested, but somehow I can’t get it to work. Cut-and-paste is a bad idea here, as the straight ' and " characters have been changed into curly left and right single and double quotes.
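One way around the curly-quote problem is to straighten the pasted text before running it. The Python sketch below is an illustration, not part of the suggested script: it maps typographic quotes back to ASCII. It also maps the en dash to `--` and the ellipsis to `...`, on the assumption that forum formatting produced them from those sequences; a genuine en dash in the text would be converted too.

```python
from pathlib import Path

# Typographic characters that forum formatting commonly substitutes.
# Assumption: every en dash in a pasted shell script was originally "--".
STRAIGHTEN = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "--",   # en dash, typically produced from "--"
    "\u2026": "...",  # horizontal ellipsis, typically produced from "..."
})


def straighten(text: str) -> str:
    """Return text with curly quotes, en dashes and ellipses turned back into ASCII."""
    return text.translate(STRAIGHTEN)


def straighten_file(src: str, dst: str) -> None:
    """Write a straightened copy of the script at src to dst (file names are up to you)."""
    Path(dst).write_text(straighten(Path(src).read_text(encoding="utf-8")), encoding="utf-8")


if __name__ == "__main__":
    pasted = "wget \u2013no-parent -A \u2018pfSense-CE*\u2019"
    print(straighten(pasted))  # wget --no-parent -A 'pfSense-CE*'
```

For example, `straighten_file("pasted.sh", "fixed.sh")` with your own (hypothetical here) file names writes a copy that the shell will parse as intended.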
I can see how it is supposed to work:
wget will retrieve files with amazing flexibility, and the command options you suggested make sense:
-r recurse
-l0 maximum recursion depth, where 0 means unlimited
-np no-parent (never ascend to the parent directory)
-N timestamping (only fetch files newer than the local copy)
-P local path prefix
-A accept only files that match, so all the pfSense-CE files
-X exclude the old directory, which we are not allowed into (I wish)
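Building the command as an argument list sidesteps shell quoting entirely. The sketch below assumes Python 3.8+ (for `shlex.join`). The URL and the `-P` prefix are taken from the log in this post, while the `-A` pattern and the `-X` directory are hypothetical placeholders, since the exact values are not given here. It only prints the command; nothing is downloaded.

```python
import shlex

URL = "https://nyifiles.pfsense.org/mirror/downloads/"  # from the log, with trailing slash
PREFIX = "./pfsense-images"                              # from the log
ACCEPT = "pfSense-CE*"             # hypothetical: adjust to the real file names
EXCLUDE = "/mirror/downloads/old"  # hypothetical path of the old directory


def build_wget_args(url=URL, prefix=PREFIX, accept=ACCEPT, exclude=EXCLUDE):
    """Return the wget command as a list, one option per element."""
    return [
        "wget",
        "-r",           # recursive retrieval
        "-l0",          # maximum depth; GNU wget treats 0 as unlimited
        "-np",          # never ascend to the parent directory
        "-N",           # timestamping: skip files not newer than the local copy
        "-P", prefix,   # directory prefix for saved files
        "-A", accept,   # accept list: only keep files whose names match
        "-X", exclude,  # directories to exclude
        url,
    ]


if __name__ == "__main__":
    # Print a correctly quoted shell command using plain ASCII quotes.
    print(shlex.join(build_wget_args()))
```

Passing the same list to `subprocess.run(args, check=True)` would run wget without a shell, so no quoting is involved at all. The trailing slash on the URL is deliberate: the wget manual notes that for HTTP the trailing slash matters to `--no-parent`, because wget cannot otherwise tell that `downloads` is a directory.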
However, when I do run it, I don’t get any .gz files at all. I get some folders and a robots-follow.txt file. It’s weird. I have read the man page as closely as I can manage (well, it’s very long indeed).
The suggested “chmod a-w” fails with “Operation not permitted”. ??? The robots.txt.tmp file permissions are “-rwxr-x---+ 1 root”, so I take that to mean the owner (root) has read, write and execute permission. As I run the command as root, I do not expect to be told I can’t remove w from all. Weird.
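For reference, an ls-style mode string breaks down into a type character followed by owner, group and other triplets, and a trailing `+` indicates that the file carries an extended ACL (as displayed by `ls` on FreeBSD and Linux). The sketch below decodes `-rwxr-x---+` and computes the mode that `chmod a-w` asks for. It only illustrates the bit arithmetic; it does not explain why chmod was refused.

```python
import stat


def describe(mode_string: str) -> dict:
    """Split an ls-style mode string such as '-rwxr-x---+' into its parts."""
    return {
        "type": mode_string[0],            # '-' is a regular file
        "owner": mode_string[1:4],         # read/write/execute for the owner
        "group": mode_string[4:7],
        "other": mode_string[7:10],
        "acl": mode_string.endswith("+"),  # '+' flags an extended ACL
    }


def without_write(mode: int) -> int:
    """Clear the write bit for owner, group and other, as `chmod a-w` requests."""
    return mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)


if __name__ == "__main__":
    print(describe("-rwxr-x---+"))
    before = stat.S_IFREG | 0o750                # regular file, -rwxr-x---
    print(stat.filemode(without_write(before)))  # -r-xr-x---
```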
wget returns this log:
--2020-06-30 17:48:52-- https://nyifiles.pfsense.org/mirror/downloads
Resolving nyifiles.pfsense.org (nyifiles.pfsense.org)... 162.208.119.41, 162.208.119.40, 2607:ee80:10::119:40, ...
Connecting to nyifiles.pfsense.org (nyifiles.pfsense.org)|162.208.119.41|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://nyifiles.pfsense.org/mirror/downloads/ [following]
--2020-06-30 17:48:53-- https://nyifiles.pfsense.org/mirror/downloads/
Reusing existing connection to nyifiles.pfsense.org:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./pfsense-images/nyifiles.pfsense.org/mirror/downloads.tmp’
0K .. 479M=0s
2020-06-30 17:48:53 (479 MB/s) - ‘./pfsense-images/nyifiles.pfsense.org/mirror/downloads.tmp’ saved [2809]
Loading robots.txt; please ignore errors.
--2020-06-30 17:48:53-- https://nyifiles.pfsense.org/robots.txt
Reusing existing connection to nyifiles.pfsense.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 27 [text/plain]
Saving to: ‘./pfsense-images/nyifiles.pfsense.org/robots.txt.tmp’
0K 100% 8.69M=0s
2020-06-30 17:48:53 (8.69 MB/s) - ‘./pfsense-images/nyifiles.pfsense.org/robots.txt.tmp’ saved [27/27]
Removing ./pfsense-images/nyifiles.pfsense.org/mirror/downloads.tmp since it should be rejected.
FINISHED --2020-06-30 17:48:53--
Total wall clock time: 0.5s
Downloaded: 2 files, 2.8K in 0s (316 MB/s)
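The log line ending “since it should be rejected” is consistent with how wget applies `-A`: HTML index pages are still downloaded so their links can be followed, and are deleted afterwards if their names do not match the accept list. The sketch below models only the name check, using shell-style wildcards via `fnmatchcase`. The pattern and the example `.gz` name are hypothetical, and wget's own matching has extra rules (for instance, an entry without wildcard characters is treated as a suffix rather than a pattern).

```python
from fnmatch import fnmatchcase

# Hypothetical accept list, written the way it would be passed to -A.
ACCEPT = "pfSense-CE*"


def accepted(filename: str, acclist: str = ACCEPT) -> bool:
    """Return True if filename matches any comma-separated wildcard pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in acclist.split(","))


if __name__ == "__main__":
    # "downloads" is the index page that was saved as downloads.tmp and then removed;
    # the second name is a made-up example of an image file.
    for name in ["downloads", "pfSense-CE-example-amd64.img.gz"]:
        print(name, "kept" if accepted(name) else "rejected")
```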
I note that the URL is being redirected to itself, apart from an added trailing slash. Weird.
So from all of this I think wget is more sophisticated than I understand. Is it honouring the robots.txt, and therefore not following the links to the mirror/downloads files?
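GNU wget does follow robots.txt during recursive retrieval by default, so one way to test the theory is to run the file through Python's standard `urllib.robotparser`. The sketch below assumes a robots.txt that disallows everything. The content of the real 27-byte file is not shown in the post, so `SAMPLE` is hypothetical, and the `Wget` user-agent string is likewise an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the real file's contents were not shown.
SAMPLE = [
    "User-agent: *",
    "Disallow: /",
]


def allowed(robots_lines, url, agent="Wget"):
    """Return True if the given robots.txt lines permit agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://nyifiles.pfsense.org/mirror/downloads/"
    print(allowed(SAMPLE, url))  # False: "Disallow: /" blocks every path
```

To check the live file instead, `RobotFileParser("https://nyifiles.pfsense.org/robots.txt")` followed by `.read()` fetches and parses it, after which `can_fetch()` can be called directly.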
Interesting way to keep up with old images, if I can get it to work.