XPath location failure

An XPath copied directly from Google Chrome fails to extract the corresponding content in Python.

The XPath that Chrome copies contains extra tbody elements!!

  • Problem:
    • When scraping a page, an XPath obtained directly from Chrome fails to return the corresponding content in the crawler (the same issue shows up with Scrapy or plain lxml).
  • Cause:

    • Browsers apply error correction to non-standard HTML documents, while lxml only sees the page source — note: the raw source, not the DOM shown in the developer tools.
    • The last table in the source does not contain a tbody. The browser inserts one automatically, but lxml does not, so the XPath matches nothing.
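    This difference is easy to verify with a minimal snippet (the table content here is a made-up example, not from the target site):

    ```python
    # lxml parses the raw source as-is and, unlike Chrome, does NOT insert
    # a tbody element into tables that lack one.
    from lxml import etree

    snippet = '<html><body><table><tr><td>Beijing</td></tr></table></body></html>'
    root = etree.HTML(snippet)

    # Chrome's DOM would show table > tbody > tr, so a copied XPath contains /tbody/,
    # but in the lxml tree there is no tbody element at all:
    print(root.xpath('//table/tbody/tr/td/text()'))  # []
    print(root.xpath('//table/tr/td/text()'))        # ['Beijing']
    ```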
  • Example of the problem

    # -*- coding: utf-8 -*-
    from lxml import etree
    import requests

    url = 'http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2014/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}
    html = requests.get(url, headers=headers)
    html.encoding = 'GBK'
    selector = etree.HTML(html.text)
    # Chrome-copied XPath: the tbody steps do not exist in the raw source, so this returns []
    content = selector.xpath('//html/body/table[2]/tbody/tr[1]/td/table/tbody/tr[2]/td/table/tbody/tr/td/table/tbody/tr/td/a/text()')
    for each in content:
        print(each)
  • Other problems in the code!

    The XPath is far too long, which makes it error-prone.
    It can be rewritten as:

    from lxml import etree
    import requests

    url = 'http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2014/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}
    html = requests.get(url, headers=headers)
    html.encoding = 'GBK'
    selector = etree.HTML(html.text)
    # Anchor on a stable class attribute instead of a long absolute path
    nodes = selector.xpath('//tr[@class="provincetr"]/node()')

    for each in nodes:
        print(each.xpath('string()'))
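    A lighter-weight alternative (my own suggestion, not from the original post): since the only mismatch is the auto-inserted tbody, stripping the /tbody steps from the Chrome-copied XPath also makes it match the raw source:

    ```python
    # Strip the tbody steps that Chrome inserted into the copied XPath,
    # leaving a path that matches the raw page source as lxml sees it.
    chrome_xpath = ('//html/body/table[2]/tbody/tr[1]/td/table/tbody/'
                    'tr[2]/td/table/tbody/tr/td/table/tbody/tr/td/a/text()')
    lxml_xpath = chrome_xpath.replace('/tbody', '')
    print(lxml_xpath)
    # //html/body/table[2]/tr[1]/td/table/tr[2]/td/table/tr/td/table/tr/td/a/text()
    ```

    This keeps the absolute path, though, so the relative `//tr[@class="provincetr"]` form above is still the more robust fix.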