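Every search keyword should map to one unique URL, for example:
https://www.google.com.hk/search?sourceid=chrome&ie=UTF-8&q=netkiller&sei=9v-QT_q1L6SZiAel2bGnBA&gbv=2
https://www.google.com.hk/search?aq=f&sourceid=chrome&ie=UTF-8&q=neo
https://www.google.com.hk/search?sourceid=chrome&ie=UTF-8&q=bg7nyt
Each search for a new keyword produces its own unique URL. That makes the result page cacheable by a reverse proxy and, with suitable HTTP headers, cacheable on the browser side as well.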
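The idea of one URL per keyword can be sketched in shell. This is a minimal illustration, not the author's implementation: `BASE_URL`, `urlencode`, `search_url` and `cache_headers` are hypothetical names, the endpoint is a placeholder, and the encoder is only meant for ASCII keywords.

```shell
#!/bin/sh
# Sketch: derive one stable URL per search keyword so that a reverse proxy
# or a browser can cache each result page under a predictable key.

BASE_URL="https://www.example.com/search"   # assumption: placeholder endpoint

# Percent-encode a string for use as a query value (ASCII-only sketch).
# The function body runs in a subshell, so its variables stay private.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
)

# The same keyword always yields the same URL: one parameter, fixed order.
search_url() {
    printf '%s?q=%s\n' "$BASE_URL" "$(urlencode "$1")"
}

# Response header that lets proxies and browsers keep the page for a while.
cache_headers() {
    printf 'Cache-Control: public, max-age=%s\n' "${1:-3600}"
}
```

Calling `search_url "neo bg7nyt"` prints `https://www.example.com/search?q=neo%20bg7nyt`. Keeping the query string free of per-request values (session ids, timestamps) is what allows a cache to treat repeated searches as the same resource.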
<meta name="robots" content="noarchive">
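The `noarchive` value asks search engines not to offer a cached copy of the page. To confirm that a deployed page really carries the tag, a small check along the following lines may help. It is only a sketch: `has_noarchive` is a hypothetical helper, and the pattern assumes the `name` attribute appears before `content` on a single line.

```shell
#!/bin/sh
# Sketch: succeed when the HTML on stdin contains a robots meta tag
# whose content includes "noarchive" (case-insensitive).
has_noarchive() {
    grep -Eiq '<meta[^>]*name="robots"[^>]*content="[^"]*noarchive'
}
```

A possible use is `has_noarchive < index.html && echo "noarchive is set"`.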
Example 2.1. example robots.txt
http://www.google.com/robots.txt
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Allow: /news/directory
Disallow: /nwshp
Disallow: /setnewsprefs?
Disallow: /index.html?
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /imgres
Disallow: /imglanding
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /googlesite
Disallow: /preferencessection
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/?
Disallow: /m/blogs?
Disallow: /m/ig
Disallow: /m/images?
Disallow: /m/local?
Disallow: /m/movies?
Disallow: /m/news?
Disallow: /m/news/i?
Disallow: /m/place?
Disallow: /m/setnewsprefs?
Disallow: /m/search?
Disallow: /m/swmloptin?
Disallow: /m/trends
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /products?
Disallow: /products/
Disallow: /froogle_
Disallow: /product_
Disallow: /products_
Disallow: /print
Disallow: /books
Disallow: /bkshp?q=
Allow: /booksrightsholders
Disallow: /patents?
Disallow: /patents/
Allow: /patents/about
Disallow: /scholar
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /mapstt?
Disallow: /mapslt?
Disallow: /maps/stk/
Disallow: /maps/br?
Disallow: /mapabcpoi?
Disallow: /maphp?
Disallow: /places/
Disallow: /maps/place
Disallow: /help/maps/streetview/partners/welcome/
Disallow: /lochp?
Disallow: /center
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Allow: /reader/play
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /trends/hottrends?
Disallow: /trends/viz?
Disallow: /notebook/search?
Disallow: /musica
Disallow: /musicad
Disallow: /musicas
Disallow: /musicl
Disallow: /musics
Disallow: /musicsearch
Disallow: /musicsp
Disallow: /musiclp
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/search?
Disallow: /base/reportbadoffer
Disallow: /base/s2
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Allow: /safebrowsing/diagnostic
Allow: /safebrowsing/report_error/
Allow: /safebrowsing/report_phish/
Disallow: /reviews/search?
Disallow: /orkut/albums
Disallow: /jsapi
Disallow: /views?
Disallow: /c/
Disallow: /cbk
Disallow: /recharge/dashboard/car
Disallow: /recharge/dashboard/static/
Disallow: /translate_a/
Disallow: /translate_c
Disallow: /translate_f
Disallow: /translate_static/
Disallow: /translate_suggestion
Disallow: /profiles/me
Allow: /profiles
Disallow: /s2/profiles/me
Allow: /s2/profiles
Allow: /s2/photos
Allow: /s2/static
Disallow: /s2
Disallow: /transconsole/portal/
Disallow: /gcc/
Disallow: /aclk
Disallow: /cse?
Disallow: /cse/panel
Disallow: /cse/manage
Disallow: /tbproxy/
Disallow: /comparisonads/
Disallow: /imesync/
Disallow: /shenghuo/search?
Disallow: /support/forum/search?
Disallow: /reviews/polls/
Disallow: /hosted/images/
Disallow: /hosted/life/
Disallow: /ppob/?
Disallow: /ppob?
Disallow: /ig/add?
Disallow: /adwordsresellers
Disallow: /accounts/o8
Allow: /accounts/o8/id
Disallow: /topicsearch?q=
Disallow: /xfx7/
Disallow: /squared/api
Disallow: /squared/search
Disallow: /squared/table
Disallow: /toolkit/
Allow: /toolkit/*.html
Disallow: /qnasearch?
Disallow: /errors/
Disallow: /app/updates
Disallow: /sidewiki/entry/
Disallow: /quality_form?
Disallow: /labs/popgadget/search
Disallow: /buzz/post
Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
Sitemap: http://www.google.com/hostednews/sitemap_index.xml
Sitemap: http://www.google.com/ventures/sitemap_ventures.xml
Sitemap: http://www.google.com/sitemaps_webmasters.xml
Sitemap: http://www.gstatic.com/trends/websites/sitemaps/sitemapindex.xml
Sitemap: http://www.gstatic.com/dictionary/static/sitemaps/sitemap_index.xml
User-agent: *
Allow: *
Disallow: /management/
Sitemap: http://netkiller.sourceforge.net/sitemaps.xml.gz
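To see how a crawler might read such rules, here is a rough sketch of the matching step. It is deliberately simplified and is not a full parser: `robots_allows` is a hypothetical helper, it treats every rule as a plain path prefix, ignores `User-agent` grouping and wildcards (`*`, `$`), and picks the longest matching prefix, with `Allow` winning a tie.

```shell
#!/bin/sh
# Sketch: read robots.txt rules on stdin and exit 0 if the given path is
# allowed, 1 if it is disallowed. Prefix matching only; wildcards and
# per-agent groups are not handled.
robots_allows() {
    awk -v path="$1" '
        BEGIN { best = -1; verdict = "allow" }   # no matching rule means allowed
        {
            sub(/\r$/, "")                       # tolerate CRLF line endings
            rule = tolower($1)
            prefix = $2
        }
        (rule == "allow:" || rule == "disallow:") && prefix != "" && index(path, prefix) == 1 {
            len = length(prefix)
            # The longest matching prefix wins; on a tie, Allow wins.
            if (len > best || (len == best && rule == "allow:")) {
                best = len
                verdict = (rule == "allow:") ? "allow" : "disallow"
            }
        }
        END { exit (verdict == "allow" ? 0 : 1) }
    '
}
```

Fed the Google rules above, `robots_allows /news/directory/x` succeeds because `Allow: /news/directory` is longer than `Disallow: /news`, while `robots_allows "/search?q=neo"` fails.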
#!/bin/bash
# Write a sitemap to stdout for the publishable files under PUBLIC_HTML.

DOMAIN="http://www.netkiller.cn"
PUBLIC_HTML=~/public_html

# An optional first argument overrides the default domain.
if [ -n "$1" ]; then
    DOMAIN=$1
fi

lastmod=$(date "+%Y-%m-%d")

echo '<?xml version="1.0" encoding="UTF-8"?>'
echo '<?xml-stylesheet type="text/xsl" href="gss.xsl"?>'
echo '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">'
#echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'

# One pass per file type, in the same order as before. Each pattern is
# quoted so the shell hands it to find unexpanded (the original *.pdf
# pattern was unquoted).
for pattern in "*.html" "*.epub" "*.mobi" "*.chm" "*.pdf"
do
    find "$PUBLIC_HTML/" -type f -name "$pattern" | while IFS= read -r file
    do
        # Strip the document root to get the path relative to the site.
        url=${file#"$PUBLIC_HTML"/}
        echo '  <url>'
        echo '    <loc>'"${DOMAIN}/${url}"'</loc>'
        echo '    <lastmod>'"${lastmod}"'</lastmod>'
        echo '    <changefreq>daily</changefreq>'
        echo '    <priority>0.5</priority>'
        echo '  </url>'
    done
done

echo "</urlset>"
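The script writes file paths into `<loc>` as they are. The sitemap protocol expects characters such as `&` and `<` to be entity-escaped, so a file name containing them would produce invalid XML. A small helper like the one below could be applied to each URL before it is echoed. This is a sketch only, and `xml_escape` is a hypothetical name that is not part of the original script.

```shell
#!/bin/sh
# Sketch: escape the five XML special characters in a string.
# The ampersand is replaced first so that the entities produced by the
# later substitutions are not escaped a second time.
xml_escape() {
    printf '%s\n' "$1" | sed \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e "s/'/\&apos;/g" \
        -e 's/"/\&quot;/g'
}
```

Inside the loop, the `<loc>` line would then read something like `echo "    <loc>$(xml_escape "${DOMAIN}/${url}")</loc>"`.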