One-Line Script to Refresh a Web Site Cache by Spidering all Sitemap Entries

Well, the title says it all. I wanted to refresh my web site’s page cache, because some pages load very slowly otherwise. With the Drupal Boost module, the site can serve pages straight from the cache, without having to reconstruct them, as long as they are already cached.

After some digging around and consolidating information from a number of sources, I came up with the following, which reads the XML sitemap for the site and then checks each page listed in it without downloading it. Remember to replace www.example.com/sitemap.xml with the correct URL for your sitemap.

/usr/bin/wget -q http://www.example.com/sitemap.xml -O - | /usr/bin/awk -F'</?loc>' 'NF>1{print $2}' | /usr/bin/wget --spider -q -i -

Note that, while the above may be broken into several lines in your browser, it should be entered as a single line on the shell command line, unless you use line-continuation characters (a backslash at the end of each line).
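For readability, the same pipeline can also be sketched as a small shell script with one stage per line. This is only a sketch under a few assumptions: the function names extract_locs and refresh_cache are made up for illustration, and wget and awk are called by bare name, so they need to be on the PATH.

```shell
#!/bin/sh

# Read sitemap XML on stdin and print the contents of each <loc> element,
# one URL per line. Assumes each <loc>...</loc> pair sits on its own line.
extract_locs() {
    awk -F'</?loc>' 'NF>1{print $2}'
}

# Fetch the sitemap at the URL given as $1, extract its URLs, and have
# wget check each one in spider mode (nothing is saved to disk).
refresh_cache() {
    wget -q "$1" -O - \
        | extract_locs \
        | wget --spider -q -i -
}
```

With these definitions loaded, calling refresh_cache http://www.example.com/sitemap.xml should behave like the one-liner above.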

I added it to my crontab so that it runs immediately after the Drupal cron job: I put it on the same line as the Drupal cron command, with a semicolon (;) between the cron command and the spider command.
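As a rough illustration, the resulting crontab entry might look something like the line below. Both the hourly schedule and /path/to/drupal-cron-command are placeholders of mine, not the actual values from my crontab; substitute whatever schedule and command your existing Drupal cron entry already uses.

```
# m h dom mon dow  command
# /path/to/drupal-cron-command is a placeholder for your existing Drupal cron command.
0 * * * * /path/to/drupal-cron-command ; /usr/bin/wget -q http://www.example.com/sitemap.xml -O - | /usr/bin/awk -F'</?loc>' 'NF>1{print $2}' | /usr/bin/wget --spider -q -i -
```

Because the two commands are joined with a semicolon rather than &&, the spider runs even if the cron command exits with an error.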

1 Comment

Thanks!!! It solved my problem, with a small extension to handle the fact that my sitemap.xml is all on one line. I used sed because I do not know awk.

/usr/bin/wget -q http://www.example.com/sitemap.xml -O - | sed 's+</url>+</url>\n+g' | /usr/bin/awk -F'</?loc>' 'NF>1{print $2}' | /usr/bin/wget --spider -i -
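The \n in the sed replacement is understood by GNU sed but may not be by other sed implementations. One possible alternative, sketched here under the assumption that your grep supports the -o option (GNU and BSD grep do), pulls out the loc elements directly, so it does not matter whether the sitemap is on one line or many. The function name extract_locs_any_layout is invented for this example.

```shell
#!/bin/sh

# Read sitemap XML on stdin and print every <loc> value, one per line,
# regardless of how the XML is broken into lines.
extract_locs_any_layout() {
    # grep -o prints each match on its own line; sed then strips the tags.
    grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}
```

It would replace the sed and awk stages, sitting between the two wget commands in the pipeline.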