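I saw someone online use Python to scrape the top 100 popular restaurants on 大众点评 (Dianping) for each major city, so I dusted off my long-shelved "Italian cannon" (VBA) to try my hand at it and see which restaurants people in Tianjin like best. (The answer is revealed at the end.)

▍URL

http://www.dianping.com/shoplist/shopRank/pcChannelRankingV2?rankId=fce2e3a36450422b7fad3f2b90370efd71862f838d1255ea693b953b1d49c7c0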
After inspecting the traffic with Fiddler, I found that the real URL that returns the data is slightly different:

http://www.dianping.com/mylist/ajax/shoprank?rankId=2e5d0080237ff3c8f5b5d3f315c7c4a508e25c702ab1b810071e8e2c39502be1
[Figure: the captured page]

▍Implementation code

Simulating the "User-Agent" header is all it takes to get the complete responseText data.

Sub 大众点评热门餐厅()
    Dim arr(), brr()
    Dim strText As String
    Dim reg As Object, mat As Object, mm As Object
    Dim i As Long, k As Long, m As Long

    With CreateObject("WinHttp.WinHttpRequest.5.1")
        ' Request the ranking data; a browser-like User-Agent is the only header needed
        .Open "GET", "http://www.dianping.com/mylist/ajax/shoprank?rankId=2e5d0080237ff3c8f5b5d3f315c7c4a508e25c702ab1b810071e8e2c39502be1", False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
        .Send
        strText = Replace(.responsetext, vbCrLf, "")

        Set reg = CreateObject("vbscript.regexp")
        reg.Global = True
        ' Submatches: 0 address, 1 average price, 2 category, 3-5 the three scores, 6 shop id, 7 shop name
        reg.Pattern = "address"":""(\S{0,35})"",""alt.*?avgPrice"":(\d+).*?mainCategoryName"":""(\D{1,10})"",""m.*?refinedScore1"":""(\d+\.\d+)"",""refinedScore2"":""(\d+\.\d+)"",""refinedScore3"":""(\d+\.\d+).*?shopId"":""(\d+)"",""shopName"":""(\D{0,40})"",""sh"
        For Each mat In reg.Execute(strText)
            k = k + 1
            ReDim Preserve arr(1 To 8, 1 To k)
            arr(1, k) = mat.submatches(7)
            arr(2, k) = mat.submatches(0)
            arr(3, k) = mat.submatches(1)
            arr(4, k) = mat.submatches(2)
            arr(5, k) = mat.submatches(3)
            arr(6, k) = mat.submatches(4)
            arr(7, k) = mat.submatches(5)
            arr(8, k) = "http://www.dianping.com/shop/" & mat.submatches(6)
        Next
        If k = 0 Then Exit Sub   ' nothing matched, so arr was never allocated

        ' Visit each shop page and collect the text of its itemprop="url" links into brr.
        ' brr is gathered here but not written to the sheet below.
        For i = 1 To UBound(arr, 2)
            .Open "GET", arr(8, i), False
            .setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Win64; x64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; HCTE; ms-office)"
            .Send
            strText = Replace(.responsetext, vbCrLf, "")
            reg.Pattern = "itemprop=""url""> (\D+) </a>"
            For Each mm In reg.Execute(strText)
                m = m + 1
                ReDim Preserve brr(1 To m)
                brr(m) = mm.submatches(0)   ' keep the captured link text, not the whole match
            Next
        Next
    End With

    ' Headers: shop name, address, average spend per person, category, taste, environment, service, shop page URL
    Range("a1").Resize(1, 8) = Array("店铺名称", "地址", "人均消费", "分类", "口味", "环境", "服务", "店铺地址")
    Range("a2").Resize(UBound(arr, 2), 8) = Application.Transpose(arr)
End Sub
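Since the macro sends two requests that differ only in URL and User-Agent, the fetch step could be pulled out into a small helper. The following is only a sketch: the name `HttpGetText` and the choice to return an empty string on a non-200 status are my own assumptions, not part of the original macro.

```vba
' Hypothetical helper (not in the original macro):
' GET a URL with a given User-Agent and return the body with line breaks removed.
Function HttpGetText(ByVal url As String, ByVal userAgent As String) As String
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", url, False                      ' False = synchronous request
        .SetRequestHeader "User-Agent", userAgent    ' the only header the post says is needed
        .Send
        If .Status = 200 Then
            HttpGetText = Replace(.ResponseText, vbCrLf, "")
        Else
            HttpGetText = ""                         ' assumption: caller treats "" as failure
        End If
    End With
End Function
```

With such a helper, each `.Open` / `.setRequestHeader` / `.Send` / `Replace` group in the macro would collapse to a single call such as `strText = HttpGetText(arr(8, i), ua)`, where `ua` holds the User-Agent string.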
To scrape the top 100 popular restaurants for another city, just change the value after rankId= in the URL:

"Shanghai","fce2e3a36450422b7fad3f2b90370efd71862f838d1255ea693b953b1d49c7c0"
"Beijing","d5036cf54fcb57e9dceb9fefe3917fff71862f838d1255ea693b953b1d49c7c0"
"Guangzhou","e749e3e04032ee6b165fbea6fe2dafab71862f838d1255ea693b953b1d49c7c0"
"Shenzhen","e049aa251858f43d095fc4c61d62a9ec71862f838d1255ea693b953b1d49c7c0"
"Hangzhou","91621282e559e9fc9c5b3e816cb1619c71862f838d1255ea693b953b1d49c7c0"
"Nanjing","d6339a01dbd98141f8e684e1ad8af5c871862f838d1255ea693b953b1d49c7c0"
"Suzhou","536e0e568df850d1e6ba74b0cf72e19771862f838d1255ea693b953b1d49c7c0"
"Chengdu","c950bc35ad04316c76e89bf2dc86bfe071862f838d1255ea693b953b1d49c7c0"
"Wuhan","d96a24c312ed7b96fcc0cedd6c08f68c08e25c702ab1b810071e8e2c39502be1"
"Chongqing","6229984ceb373efb8fd1beec7eb4dcfd71862f838d1255ea693b953b1d49c7c0"
"Xi'an","ad66274c7f5f8d27ffd7f6b39ec447b608e25c702ab1b810071e8e2c39502be1"
Running the code gives the following result:
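[Figure: the scraped data]

With the data in hand, let's do some simple analysis. (Data-analysis experts, please go easy on me.)

▍Analysis results

1. Score and price rankings

From the Dianping top 100 popular restaurants I took the first 15 and compared the sum of their taste, environment, and service scores. It turns out that 喜茶 (HEYTEA), ranked first among the popular restaurants, does not have the highest combined score. So the top-100 food ranking is not simply a sort by total score; it may also depend on search frequency and the number of reviews.

[Figure: ranking by total of taste, environment, and service scores]

[Figure: top 15 by average spend per person]

At those prices, I have not been to a single one of them yet.

2. Dish categories

We count the distinct values in column D of the scraped data and draw a pie chart. VBA does all of this at the click of a button.

Sub 自动生成饼状图()
    Dim d As Object, rng As Range, shp As Shape
    Dim keys, items

    ' Count how many restaurants fall into each category (column D of Sheet1).
    ' The range runs to the last used row instead of the fixed d2:d100,
    ' which would miss row 101 when all 100 restaurants are present.
    Set d = CreateObject("scripting.dictionary")
    For Each rng In Sheet1.Range("d2", Sheet1.Cells(Sheet1.Rows.Count, 4).End(xlUp))
        d(rng.Value) = d(rng.Value) + 1
    Next
    keys = d.keys
    items = d.items

    With Sheet2
        .Range("a1").Resize(1, 2) = Array("分类", "数量")   ' "category", "count"
        ' keys/items are zero-based, so size by d.Count; UBound would drop the last entry
        .Range("a2").Resize(d.Count, 1) = Application.Transpose(keys)
        .Range("b2").Resize(d.Count, 1) = Application.Transpose(items)

        ' Work on the returned Shape instead of Select / ActiveChart / a shape named "图表 1",
        ' so the macro does not depend on the active sheet or on the Excel language
        Set shp = .Shapes.AddChart2(251, xlPie)
        With shp.Chart
            .SetSourceData Source:=Sheet2.Range("a1:b" & Sheet2.Cells(Sheet2.Rows.Count, 2).End(xlUp).Row)
            .ClearToMatchStyle
            .ChartStyle = 259
            .SetElement msoElementDataLabelCallout
            .SetElement msoElementDataLabelBestFit
        End With
        shp.ScaleWidth 1.7533333333, msoFalse, msoScaleFromBottomRight
        shp.ScaleHeight 1.8291666667, msoFalse, msoScaleFromBottomRight
        shp.IncrementLeft 171
        shp.IncrementTop 63
    End With
End Sub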
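The result looks like this:

Classifying the top 100 popular Tianjin restaurants by cuisine shows which kinds of restaurants people prefer, and it gives visitors to the city a reference too. The pie chart makes it easy to see that first place goes to, of all things, buffets. What?!

[Figure: pie chart of restaurant categories]

[Figure: word cloud of popular restaurants]

3. Where the restaurants cluster

Insert a 3D Map, import the scraped shop addresses as a field, and draw a heat map; this gives the map below. To pin down restaurant locations precisely, you can batch-fetch the latitude and longitude of each address; there are websites where these can be looked up.

Sub 查取经纬度()
    Dim ar, br()
    Dim strText As String, s As String, URL As String
    Dim i As Long, n As Long

    ar = Range([b2], [b65536].End(xlUp))     ' addresses in column B (needs at least two rows)
    ReDim br(1 To UBound(ar), 1 To 1)        ' sized to the data instead of a fixed 1000 rows
    For i = 1 To UBound(ar)
        strText = ar(i, 1)
        ' encodeURIByHtml is a URL-encoding helper that this post does not show
        s = encodeURIByHtml(strText)
        ' city=350500 is kept exactly as in the original URL; it appears to be the adcode
        ' of Quanzhou rather than Tianjin, so it probably needs changing for the target city
        URL = "http://restapi.amap.com/v3/place/text?s=rsv3&children=&key=8325164e247e15eea68b59e89200988b&page=1&offset=10&city=350500&language=zh_cn&callback=jsonp_529559_&platform=JS&logversion=2.0&sdkversion=1.3&appname=http%3A%2F%2Flbs.amap.com%2Fconsole%2Fshow%2Fpicker&csid=4167BCCD-C937-47BB-91DB-2C5F4257ABE7&keywords=" & s
        With CreateObject("msxml2.xmlhttp")
            .Open "GET", URL, False
            .Send
            ' Append a sentinel ("未查到" = "not found") so the Split below
            ' still succeeds when the response has no location field
            strText = .responsetext & "location"":"" 未查到!!! "","
            s = Split(Split(strText, "location"":""")(1), """,")(0)
            If Len(s) = 0 Then s = "wei"
        End With
        n = n + 1
        br(n, 1) = s
    Next
    [c2].Resize(n) = ""
    [c2].Resize(n) = br
End Sub

[Figure: heat map of popular restaurants]

The five darkest areas on the restaurant heat map are:

Rank | Area | Shopping malls |
1 | 营口道地铁站商圈 (Yingkoudao metro station area) | 和平大悦城, 乐宾商厦, 伊势丹, 吉利大厦, 世纪都会 |
2 | 南开大悦城 | 南开大悦城 (Tianjin's 大悦城 really is that popular; it accounts for half the map on its own) |
3 | 和平路商圈 (Heping Lu area) | 恒隆广场, 天河城 |
4 | 西北角商圈 (Xibeijiao area) | 水游城, 陆家嘴中心 |
5 | 上杭路街 (Shanghang Lu) | 万达广场, 爱琴海购物中心 |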
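Returning to the latitude/longitude lookup above: the macro stores the raw `location` string, but a map usually wants two numeric columns. Below is a hedged sketch of a parser for that string. The function name `ParseLocation` is hypothetical, and it assumes the field has the form `"location":"lng,lat"` with longitude first, which should be checked against a real response before relying on it.

```vba
' Hypothetical helper: extract "lng,lat" from a response and split it into two numbers.
' Returns False when no usable location string is present.
Function ParseLocation(ByVal responseText As String, _
                       ByRef lng As Double, ByRef lat As Double) As Boolean
    Const MARK As String = """location"":"""     ' the literal text "location":"
    Dim pos As Long, tail As String, parts() As String

    pos = InStr(responseText, MARK)
    If pos = 0 Then Exit Function                ' field missing or not a string
    tail = Mid$(responseText, pos + Len(MARK))
    If InStr(tail, """") = 0 Then Exit Function  ' no closing quote

    parts = Split(Left$(tail, InStr(tail, """") - 1), ",")
    If UBound(parts) <> 1 Then Exit Function     ' expected exactly two values

    lng = Val(parts(0))                          ' Val always reads "." as the decimal point
    lat = Val(parts(1))
    ParseLocation = True
End Function
```

Inside the loop one could then write `If ParseLocation(.responsetext, x, y) Then ...` and store `x` and `y` in two separate columns, which removes the need for the appended sentinel string.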
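In Tianjin, the place to go shopping is still 滨江道 (Binjiang Dao, covering the Yingkoudao and Heping Lu shopping areas). It is the liveliest part of town and has the most good restaurants.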
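Summary: this is only a rough analysis of where the popular restaurants cluster, and it is purely my own take. Also, everyone is welcome to visit Tianjin!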