
2.9. Search Engine Optimization

2.9.1. Static Search Result URLs

Each search keyword should map to its own unique URL, for example:

		
https://www.google.com.hk/search?sourceid=chrome&ie=UTF-8&q=netkiller&sei=9v-QT_q1L6SZiAel2bGnBA&gbv=2
https://www.google.com.hk/search?aq=f&sourceid=chrome&ie=UTF-8&q=neo
https://www.google.com.hk/search?sourceid=chrome&ie=UTF-8&q=bg7nyt
		
		

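Every new keyword searched produces its own unique URL, so the result page can be cached by a reverse proxy, and even by the browser through HTTP caching headers.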
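As a minimal sketch of the idea, assuming a hypothetical site at www.example.com whose search page takes the keyword in a `q` parameter, the functions below map each keyword to exactly one URL and print a `Cache-Control` header that a reverse proxy and the browser can honour. Trimming and lowercasing the keyword is an extra assumption (case-insensitive search) so that variants of the same query share one cache entry; the 600-second lifetime is an arbitrary example value.

```shell
#!/bin/bash
# Sketch: one canonical URL per search keyword, so the result page can be
# cached by a reverse proxy and by the browser. www.example.com and the
# q parameter are hypothetical.

# Percent-encode a string byte by byte, keeping RFC 3986 unreserved characters.
urlencode() {
    local LC_ALL=C
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [A-Za-z0-9.~_-]) out+="$c" ;;
            *) out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

# Normalize the keyword (assumption: search is case-insensitive), then build
# the URL, so differently cased or padded spellings share one cache key.
canonical_search_url() {
    local kw
    kw=$(printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | tr '[:upper:]' '[:lower:]')
    printf 'https://www.example.com/search?q=%s\n' "$(urlencode "$kw")"
}

# CGI-style response headers; 600 seconds is an arbitrary example lifetime.
search_cache_headers() {
    printf 'Content-Type: text/html; charset=UTF-8\r\n'
    printf 'Cache-Control: public, max-age=600\r\n\r\n'
}

canonical_search_url "NetKiller "   # https://www.example.com/search?q=netkiller
canonical_search_url "bg7nyt"       # https://www.example.com/search?q=bg7nyt
```

Because the URL is derived only from the normalized keyword, a cache in front of the search backend sees repeated queries as the same resource.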

2.9.2. robots.txt

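Besides robots.txt, crawler behaviour can be controlled per page with a robots meta tag in the HTML head. For example, noarchive asks search engines not to show a cached copy of the page:

<meta name="robots" content="noarchive">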

Example 2.1. A sample robots.txt

A historical snapshot of http://www.google.com/robots.txt (the live file has changed since):

				
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Allow: /news/directory
Disallow: /nwshp
Disallow: /setnewsprefs?
Disallow: /index.html?
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /imgres
Disallow: /imglanding
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /googlesite
Disallow: /preferencessection
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/?
Disallow: /m/blogs?
Disallow: /m/ig
Disallow: /m/images?
Disallow: /m/local?
Disallow: /m/movies?
Disallow: /m/news?
Disallow: /m/news/i?
Disallow: /m/place?
Disallow: /m/setnewsprefs?
Disallow: /m/search?
Disallow: /m/swmloptin?
Disallow: /m/trends
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /products?
Disallow: /products/
Disallow: /froogle_
Disallow: /product_
Disallow: /products_
Disallow: /print
Disallow: /books
Disallow: /bkshp?q=
Allow: /booksrightsholders
Disallow: /patents?
Disallow: /patents/
Allow: /patents/about
Disallow: /scholar
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /mapstt?
Disallow: /mapslt?
Disallow: /maps/stk/
Disallow: /maps/br?
Disallow: /mapabcpoi?
Disallow: /maphp?
Disallow: /places/
Disallow: /maps/place
Disallow: /help/maps/streetview/partners/welcome/
Disallow: /lochp?
Disallow: /center
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Allow: /reader/play
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /trends/hottrends?
Disallow: /trends/viz?
Disallow: /notebook/search?
Disallow: /musica
Disallow: /musicad
Disallow: /musicas
Disallow: /musicl
Disallow: /musics
Disallow: /musicsearch
Disallow: /musicsp
Disallow: /musiclp
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/search?
Disallow: /base/reportbadoffer
Disallow: /base/s2
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Allow: /safebrowsing/diagnostic
Allow: /safebrowsing/report_error/
Allow: /safebrowsing/report_phish/
Disallow: /reviews/search?
Disallow: /orkut/albums
Disallow: /jsapi
Disallow: /views?
Disallow: /c/
Disallow: /cbk
Disallow: /recharge/dashboard/car
Disallow: /recharge/dashboard/static/
Disallow: /translate_a/
Disallow: /translate_c
Disallow: /translate_f
Disallow: /translate_static/
Disallow: /translate_suggestion
Disallow: /profiles/me
Allow: /profiles
Disallow: /s2/profiles/me
Allow: /s2/profiles
Allow: /s2/photos
Allow: /s2/static
Disallow: /s2
Disallow: /transconsole/portal/
Disallow: /gcc/
Disallow: /aclk
Disallow: /cse?
Disallow: /cse/panel
Disallow: /cse/manage
Disallow: /tbproxy/
Disallow: /comparisonads/
Disallow: /imesync/
Disallow: /shenghuo/search?
Disallow: /support/forum/search?
Disallow: /reviews/polls/
Disallow: /hosted/images/
Disallow: /hosted/life/
Disallow: /ppob/?
Disallow: /ppob?
Disallow: /ig/add?
Disallow: /adwordsresellers
Disallow: /accounts/o8
Allow: /accounts/o8/id
Disallow: /topicsearch?q=
Disallow: /xfx7/
Disallow: /squared/api
Disallow: /squared/search
Disallow: /squared/table
Disallow: /toolkit/
Allow: /toolkit/*.html
Disallow: /qnasearch?
Disallow: /errors/
Disallow: /app/updates
Disallow: /sidewiki/entry/
Disallow: /quality_form?
Disallow: /labs/popgadget/search
Disallow: /buzz/post
Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
Sitemap: http://www.google.com/hostednews/sitemap_index.xml
Sitemap: http://www.google.com/ventures/sitemap_ventures.xml
Sitemap: http://www.google.com/sitemaps_webmasters.xml
Sitemap: http://www.gstatic.com/trends/websites/sitemaps/sitemapindex.xml
Sitemap: http://www.gstatic.com/dictionary/static/sitemaps/sitemap_index.xml
				
			
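The Allow lines above carve exceptions out of broader Disallow rules (for example /news/directory inside /news). As a simplified sketch of how a crawler resolves such conflicts under RFC 9309 (the longest matching rule wins, and Allow wins a tie), the function below checks one path against a local robots.txt file. It assumes a single `User-agent: *` group and ignores the `*` and `$` wildcards, so a line such as `Allow: /toolkit/*.html` is treated as a literal prefix.

```shell
#!/bin/bash
# Simplified robots.txt check: among the Allow/Disallow rules whose value is a
# prefix of the path, the longest wins; on a tie Allow wins (RFC 9309).
# Assumptions: one "User-agent: *" group, no '*' or '$' wildcard support.
# Returns 0 if the path may be crawled, 1 otherwise.
is_allowed() {
    local path="$1" robots="$2"
    local best_len=-1 verdict=allow field value
    while IFS=':' read -r field value || [ -n "$field" ]; do
        field=$(printf '%s' "$field" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
        case "$field" in
            allow|disallow) ;;
            *) continue ;;   # skip User-agent, Sitemap, comments, blank lines
        esac
        # Trim whitespace (including a trailing CR) around the rule value
        value="${value#"${value%%[![:space:]]*}"}"
        value="${value%"${value##*[![:space:]]}"}"
        # An empty Disallow matches nothing
        [ -z "$value" ] && continue
        # Quoted "$value" is matched literally, so '?' in a rule is not a glob
        if [[ "$path" == "$value"* ]]; then
            if (( ${#value} > best_len )) ||
               { (( ${#value} == best_len )) && [ "$field" = allow ]; }; then
                best_len=${#value}
                verdict=$field
            fi
        fi
    done < "$robots"
    [ "$verdict" = allow ]
}
```

With the file above saved as robots.txt, `is_allowed /news/directory robots.txt && echo allowed` prints "allowed", because the 15-character Allow rule outranks the 5-character Disallow: /news.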

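2.9.3. Sitemaps

The Sitemaps protocol is specified at http://www.sitemaps.org/.

A sitemap is an XML file, conventionally named sitemap.xml and placed at the site root, that lists the URLs a site wants search engines to crawl.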
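For reference, a minimal sketch of the format: the function below prints a sitemap containing a single URL in the sitemaps.org 0.9 schema. Only `<loc>` is required; `<lastmod>` is optional, as are `<changefreq>` and `<priority>`, which are omitted here. The URL is entity-escaped because the protocol requires characters such as & to be written as &amp;. The example URL and date are hypothetical.

```shell
#!/bin/bash
# Minimal sketch: print a one-URL sitemap in the sitemaps.org 0.9 format.
# The URL argument is entity-escaped before it is placed inside <loc>.
minimal_sitemap() {
    local loc lastmod="$2"
    loc=$(printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' -e "s/'/\&apos;/g")
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>${loc}</loc>
    <lastmod>${lastmod}</lastmod>
  </url>
</urlset>
EOF
}

# Hypothetical URL: the & in the query string becomes &amp; in the output.
minimal_sitemap "http://www.example.com/search?q=netkiller&page=2" "2024-01-01"
```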

2.9.4. Sitemap in robots.txt

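The Sitemap directive takes the absolute URL of a sitemap (a gzip-compressed file is allowed) and is independent of any User-agent group, so it can appear anywhere in the file. Note that Allow values should start with "/":

User-agent: *
Allow: /
Disallow: /management/
Sitemap: http://netkiller.sourceforge.net/sitemaps.xml.gz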
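A crawler picks up Sitemap lines from anywhere in robots.txt. As a quick way to check what it will find, here is a small sketch that lists the Sitemap URLs in a local copy of robots.txt; the file path passed in is an assumption, and the field name is matched case-insensitively.

```shell
#!/bin/bash
# Sketch: list the sitemap URLs declared in a robots.txt file.
# "Sitemap:" lines are not tied to any User-agent group.
list_sitemaps() {
    local field value
    while IFS=':' read -r field value || [ -n "$field" ]; do
        field=$(printf '%s' "$field" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
        [ "$field" = sitemap ] || continue
        # read keeps everything after the first ':', so "http://..." survives;
        # strip surrounding whitespace (stray tabs or CR) from the URL
        value="${value#"${value%%[![:space:]]*}"}"
        value="${value%"${value##*[![:space:]]}"}"
        printf '%s\n' "$value"
    done < "$1"
}

# Example: list_sitemaps robots.txt
```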
		

2.9.5. Sitemap Generator for Static Content

		
#!/bin/bash
# Generate a sitemap for the static files under PUBLIC_HTML.
# Usage: ./sitemap.sh [domain] > sitemap.xml
DOMAIN="http://www.netkiller.cn"
PUBLIC_HTML=~/public_html
if [ -n "$1" ]; then
	DOMAIN=$1
fi
# Every URL gets today's date as <lastmod>
lastmod=$(date "+%Y-%m-%d")

# Escape XML special characters; the sitemap protocol requires
# entity-escaped URLs (e.g. & must appear as &amp; inside <loc>).
xml_escape() {
	sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
	    -e 's/"/\&quot;/g' -e "s/'/\&apos;/g"
}

echo '<?xml version="1.0" encoding="UTF-8"?>'
# Optional: only useful if an XSL stylesheet named gss.xsl is published next to the sitemap.
echo '<?xml-stylesheet type="text/xsl" href="gss.xsl"?>'
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'

# One find for all published file types. -print0 with read -d '' keeps file
# names containing spaces intact, and quoting the patterns stops the shell
# from expanding *.pdf in the current directory.
find "$PUBLIC_HTML" -type f \( -name '*.html' -o -name '*.epub' -o -name '*.mobi' \
	-o -name '*.chm' -o -name '*.pdf' \) -print0 |
while IFS= read -r -d '' file; do
	url=${file#"$PUBLIC_HTML"/}
	echo '   <url>'
	# Spaces are percent-encoded; other non-ASCII characters are left as-is
	echo "      <loc>$(printf '%s/%s' "$DOMAIN" "$url" | sed 's/ /%20/g' | xml_escape)</loc>"
	echo "      <lastmod>${lastmod}</lastmod>"
	echo '      <changefreq>daily</changefreq>'
	echo '      <priority>0.5</priority>'
	echo '   </url>'
done

echo "</urlset>"
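The robots.txt in 2.9.4 points at a gzip-compressed sitemaps.xml.gz. As a sketch of that last publishing step, the function below compresses the generator's output and warns when the file exceeds the protocol's per-sitemap limits (50,000 URLs and 50 MB, i.e. 52,428,800 bytes, uncompressed). The file names are hypothetical, and the URL count assumes one `<loc>` per line, as the generator above writes it.

```shell
#!/bin/bash
# Sketch: compress a generated sitemap for publishing and warn if it is over
# the single-sitemap limits (50,000 URLs, 52,428,800 bytes uncompressed).
publish_sitemap() {
    local src="$1" dest="$2" count size
    count=$(grep -c '<loc>' "$src")
    size=$(wc -c < "$src")
    if [ "$count" -gt 50000 ] || [ "$size" -gt 52428800 ]; then
        echo "warning: $count URLs, $size bytes; split into several sitemaps" >&2
    fi
    # Compress, then let gzip -t verify the archive is intact
    gzip -9 -c "$src" > "$dest" && gzip -t "$dest"
}

# Example: ./sitemap.sh http://www.example.com > sitemaps.xml
#          publish_sitemap sitemaps.xml sitemaps.xml.gz
```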