阿里试用,女朋友逼着我给她排序

发布日期:2019-04-28

阿里试用排序

前景提要

说来简直丢尽了钢铁直男的脸,没错,昨晚我在愉快的做着外包的活(中国移动的小程序,自由职业,喂),11点多了,女友突然脑子一抽:“你能不能帮我把这个玩意排序一下给我用啊,我好薅点羊毛,技术能实现嘛?”我比较无奈的看了看,阿里试用咩?什么鬼,哦哦哦,就这玩意啊,爬虫爬一下就是了。我是前端……回道:“没问题啊,爬虫呗。”她:“哇,多久能做出来啊?”我:“我现在在忙诶,1-2小时吧。”她:“行了,你别忙了,赶紧帮我弄一下出来!”我看了看她的脸,羞耻的最小化《微信开发者工具》。。。

页面展示

你要是觉得这也是广告,那真是太抬举我了。

爬虫搞起来

NodeJS 爬虫,百度一下,到处都是现成的代码,我也就不一一分析了,拿出简书的一段代码,来自 埃米莉Emily:

const express = require("express");// 调用 express 实例,它是一个函数,不带参数调用时,会返回一个 express 实例,将这个变量赋予 app 变量。const superagent = require("superagent");const cheerio = require("cheerio");const app = express();app.get("/", (req, res, next) => { console.log(req) superagent.get("https://www.v2ex.com/") .end((err, sres) => { // 常规的错误处理 if (err) { return next(err); } // sres.text 里面存储着网页的 html 内容,将它传给 cheerio.load 之后 // 就可以得到一个实现了 jquery 接口的变量,我们习惯性地将它命名为 `$` // 剩下就都是 jquery 的内容了 let $ = cheerio.load(sres.text); let items = []; $(".item_title a").each((idx, element) => { let $element = $(element); items.push({ title: $element.text(), href: $element.attr("href") }); }); res.send(items); });});app.listen(3000, function () { console.log("app is listening at port 3000");});

嘛,express 用 NodeJS 的不可能不知道,superagent 理解成可以在 Node 里面做对外请求即可,cheerio 嗯,Node 专用 JQ。

首爬

把上面的请求地址换成:https://try.taobao.com/,查看页面标签结构,找到想要的选择器结构:

.tb-try-wd-item-info > .detail,把这个替换上面选择器 .item_title a,走起:

……我不想展示结果,因为只有六个,页面实际展示是 10 个,找了半天,发现两个问题:

如上,第一个是爬到的 6 个是推荐,喵的,不是下面列表;第二个,下面列表是后面通过 POST 单独请求来的数据,怎么看都是某框架的 SSR 干的好事。

于是爬虫不成,得换战略。

模拟 POST

OK,既然是 POST,就好弄了,直接把连接跟参数刨出来,然后 superagent 模拟:

superagent .post( `https://try.taobao.com/api3/call?what=show&page=${paylaod.page}&pageSize&api=x%2Fsearch` ) .set("content-type", "application/x-www-form-urlencoded; charset=UTF-8") .end((err, sres) => { // 常规的错误处理 if (err) { return next(err) } const result = JSON.parse(sres.text).result // 返回结构树 resolve(result) })

content-type 源自:

哼哼哼,你没猜错,失败了,如下:

想想是必然的,怎么可能给你随便请求呢,然后该怎么做?研究?nonono,老夫上来就是一梭子,不就是 Content-Type 么!

superagent .post( `https://try.taobao.com/api3/call?what=show&page=${paylaod.page}&pageSize&api=x%2Fsearch` ) .set( "user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" ) .set("accept", "pplication/json, text/javascript, */*; q=0.01") .set("accept-encoding", "gzip, deflate, br") .set( "accept-language", "zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7,zh-TW;q=0.6,da;q=0.5" ) // .set("content-length", "8") .set("content-type", "application/x-www-form-urlencoded; charset=UTF-8") .set( "cookie", "your cookie" ) .set("origin", "https://try.taobao.com") .set("referer", "https://try.taobao.com") .set("x-csrf-token", "f0b8e7443eb7e") .set("x-requested-with", "XMLHttpRequest") .end((err, sres) => { // 常规的错误处理 if (err) { return next(err) } const result = JSON.parse(sres.text).result resolve(result) })

依据就是下面这个:

不就是头么,不就是源么,不就是用户代理么,用个 HTTPS 还没有你办法了?

注意上面 .set("content-length", "8"),不知道那边怎么玩,加上这个就超时……

于是,交代了吧:

{ "pages": { "paging": { "n": 2182, "page": 1, "pages": 219 }, "items": [ { "shopUserId": "2450112357", "title": "凯度高端款嵌入式蒸烤箱", "status": 1, "totalNum": 1, "requestNum": 15530, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "casdon凯度旗舰店", "showId": "2561626", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34530215", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1ycS2eMDqK1RjSZSyXXaxEVXa.jpg", "shopItemId": "559771706359", "price": 13850 }, { "shopUserId": "3189770892", "title": "皇家美素佳儿老包装2段400g", "status": 1, "totalNum": 50, "requestNum": 2079, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "皇家美素佳儿旗舰店", "showId": "2551240", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34396042", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1YrSZaVYqK1RjSZLeXXbXppXa.jpg", "shopItemId": "547114874458", "price": 189 }, { "shopUserId": "1077716829", "title": "关注店铺优先审水密码幻彩隔离", "status": 1, "totalNum": 10, "requestNum": 6907, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "水密码旗舰店", "showId": "2568391", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34784086", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB16_4ChmzqK1RjSZPxXXc4tVXa.jpg", "shopItemId": "559005882880", "price": 599 }, { "shopUserId": "725786863", "title": "精品皮草派克大衣", "status": 1, "totalNum": 1, "requestNum": 11793, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "美瑞蓓特", "showId": "2557886", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34574078", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1zVLMdCrqK1RjSZK9XXXyypXa.jpg", "shopItemId": "577418950477", "price": 5980 }, { "shopUserId": "3000840351", "title": "保友智能新品Pofit电脑椅", "status": 1, "totalNum": 1, "requestNum": 12895, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "保友办公家具旗舰店", "showId": "2557100", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34528042", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1bYZEg6TpK1RjSZKPXXa3UpXa.png", "shopItemId": "577598687971", "price": 5408 }, { "shopUserId": "791732485", "title": "TEK手持吸尘器A8", "status": 1, "totalNum": 1, "requestNum": 17195, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "泰怡凯旗舰店", "showId": "2552265", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34444014", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1D6bWbhTpK1RjSZFGXXcHqFXa.jpg", "shopItemId": "547653053965", "price": 5199 }, { "shopUserId": "3229583972", "title": "椰富海南冷炸椰子油食用油1L", "status": 1, "totalNum": 20, "requestNum": 4451, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "椰富食品专营店", "showId": "2561698", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34532250", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1VjLSePDpK1RjSZFrXXa78VXa.jpg", "shopItemId": "578653506446", "price": 256 }, { "shopUserId": "855223948", "title": "卡西欧立式家用电钢琴PX770", "status": 1, "totalNum": 1, "requestNum": 16762, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "世纪音缘乐器专营店", "showId": "2551326", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34420041", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1CC6aa9zqK1RjSZFpXXakSXXa.jpg", "shopItemId": "562405126383", "price": 4838 }, { "shopUserId": "4065939832", "title": "关注宝贝送轻奢沙发床", "status": 1, "totalNum": 1, "requestNum": 17436, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "贝兮旗舰店", "showId": "2559904", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34532170", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1AzxYegHqK1RjSZFPXXcwapXa.jpg", "shopItemId": "577798067313", "price": 4399 }, { "shopUserId": "807974445", "title": "森海塞尔CX6蓝牙耳机", "status": 1, "totalNum": 4, "requestNum": 22557, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "sennheiser旗舰店", "showId": "2559701", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34532161", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1HET6d7voK1RjSZFwXXciCFXa.jpg", "shopItemId": "564408956766", "price": 999 } ] }}

细心的小伙伴应该看到,我没有发送 form 给他,一样可以请求到需要的数据,page 挂在了 query 上……这里,允许我严重怀疑一下他们的技术能力。

展示部分

数据拿到,就简单了,其实就是这一个接口实现剩下的功能了,没错,记住我是前端。

<!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta http-equiv="X-UA-Compatible" content="ie=edge"> <title>tb try</title> <style> .warning { color: red; } button { width: 100px; height: 44px; margin-right: 44px; } table { border: 1px solid #d8d8d8; border-collapse: collapse; } tr { border-bottom: 1px solid #d8d8d8; cursor: pointer; } tr:last-child { border: 0; } </style></head><body> <button onclick="postPage()">下一页</button> <span id="currentPage"></span> <table> <tbody> <tr> <th>序号(倒序)</th> <th>概率</th> <th>名字</th> </tr> </tbody> <tbody id="results"></tbody> </table> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script> <script> let currentPage = 0 // 当前页面 let allItems = [] // 全部数据 let currentTime = 0 // 锁频率使用,标记上次时间 const loopInterval = 2 // 锁频率步长,单位秒 const results = document.querySelector("#results") const currentPageText = document.querySelector("#currentPage") const reFullTBody = arr => { let innerHtml = "" arr.forEach((item, i) => { item.rate = item.totalNum / item.requestNum * 100 let tr = ` <tr onclick="window.open("https://try.taobao.com/item.htm?id=${item.id}")"> <td>${i + 1}</td> <td>${item.rate.toFixed(3) + "%"}</td> <td>${item.title}</td> </tr> ` if (item.rate > 5) tr = tr.replace("<tr", "<tr ") innerHtml += tr }) currentPageText.innerText = `当前页:${currentPage}` results.innerHTML = innerHtml // 原谅我 mvvm 的臭毛病 } const postPage = () => { // 锁频率步长内取消请求 const newTime = new Date().getTime() const shoudBack = newTime - currentTime < loopInterval * 1000 if(shoudBack) { alert(loopInterval + "秒内不要多次点击哦。") return } currentTime = newTime $.post("/table", { page: currentPage }, res => { if(res.length < 1) { alert("今天结束的已经筛选完了") return } allItems = [...allItems, ...res] allItems.sort((a, b) => b.rate - a.rate) reFullTBody(allItems) currentPage-- }) } $.get("/total", res => { currentPage = res.pages postPage() }) </script></body></html>

长这个样子:

我多人性化,可以点击跳转、概率超过 5% 红色展示、还告诉你当前所在页码、点太快还给你提示………………………………

就是这么好用,喜欢的赶紧体验吧!


线上:点我体验

Github: Spider


觉得有用,不要吝惜 star 哦。