What pyppeteer can do
Search for it yourself.
Why the download fails
I'll skip the reasons; all that is left is swearing.
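Solution 1
Since the automatic download fails, download the file manually and put it where pyppeteer expects it. That raises two questions:
- Where to download it from
- Where to put it
- Where to download it from
- Find this file: find / -name chromium_downloader.py
- Under Python 3.8 the file is at: /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py
- Back up the library file so it can be restored later: /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py
- Edit it: vim /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py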
Back up
cp /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py.bak
Restore later
mv /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py.bak /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py
def download_zip(url: str) -> BytesIO:
    """Download data from url."""
    logger.warning('start chromium download.\n'
                   'Download may take a few minutes.')
    print("Download URL: " + url)  # added: print the download URL
    # the rest of the function is unchanged
Run the Python program and pyppeteer prints the download link:
https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/588429/chrome-linux.zip
With the URL in hand, downloading the file is straightforward. If it still is not, that is a topic for a different search.
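The manual download can also be scripted. The sketch below assumes the URL pattern seen in the printed link (host, chromium-browser-snapshots, platform folder, revision, chrome-linux.zip) also holds for other revisions; build_snapshot_url and download_snapshot are hypothetical helper names, not part of pyppeteer.

```python
import shutil
import urllib.request
from pathlib import Path

# Values taken from the URL printed above; other platforms or revisions
# are assumed to follow the same pattern.
HOST = "https://storage.googleapis.com"
PLATFORM_DIR = "Linux_x64"
REVISION = "588429"


def build_snapshot_url(host: str = HOST, platform_dir: str = PLATFORM_DIR,
                       revision: str = REVISION) -> str:
    """Rebuild the snapshot URL from its three variable parts."""
    return f"{host}/chromium-browser-snapshots/{platform_dir}/{revision}/chrome-linux.zip"


def download_snapshot(url: str, dest: Path) -> Path:
    """Stream the archive to dest without loading it all into memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

On a machine that can reach the host, download_snapshot(build_snapshot_url(), Path("chrome-linux.zip")) writes the archive to the current directory; it can then be copied to the server.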
- Where to put it
- Open the same .py file as above
- Add a print to chromium_executable() to show the path
def chromium_executable() -> Path:
    """Get path of the chromium executable."""
    print(chromiumExecutable[current_platform()])
    return chromiumExecutable[current_platform()]
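This gives the location of the executable:
/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome
The corresponding directory is: /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/
- Create the subdirectories, put chromium in place, and run a test
ls /root/.local/share/pyppeteer/
The directory exists, so just create the remaining subdirectories level by level.
Enter the directory
cd /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux
Unzip the chrome-linux.zip downloaded above
Upload everything inside the chrome-linux folder to /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux
Grant execute permission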
cd /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/
chmod +x chrome
chmod +x chrome_sandbox
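If the zip is already on the server, the create, unzip, and chmod steps can be done in one go. This is a minimal sketch, assuming the archive contains a top-level chrome-linux folder as described above; install_chromium is a hypothetical helper name. Python's zipfile does not restore Unix permission bits on extraction, which is why the execute bit is set explicitly.

```python
import stat
import zipfile
from pathlib import Path


def install_chromium(zip_path: Path, revision_dir: Path) -> Path:
    """Extract chrome-linux.zip under revision_dir and mark binaries executable."""
    revision_dir.mkdir(parents=True, exist_ok=True)  # creates every missing level
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(revision_dir)  # yields revision_dir/chrome-linux/...

    chrome_dir = revision_dir / "chrome-linux"
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    for name in ("chrome", "chrome_sandbox"):
        binary = chrome_dir / name
        if binary.exists():
            # same effect as `chmod +x`
            binary.chmod(binary.stat().st_mode | exec_bits)
    return chrome_dir / "chrome"
```

For the paths in this post the call would be install_chromium(Path("chrome-linux.zip"), Path("/root/.local/share/pyppeteer/local-chromium/588429")).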
Run chrome
/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome
It fails with: error while loading shared libraries: libXss.so.1: cannot open shared object file: No such file or directory
- Fix the error
yum install pango.x86_64 libXcomposite.x86_64 libXcursor.x86_64 libXdamage.x86_64 libXext.x86_64 libXi.x86_64 libXtst.x86_64 cups-libs.x86_64 libXScrnSaver.x86_64 libXrandr.x86_64 GConf2.x86_64 alsa-lib.x86_64 atk.x86_64 gtk3.x86_64 ipa-gothic-fonts xorg-x11-fonts-100dpi xorg-x11-fonts-75dpi xorg-x11-utils xorg-x11-fonts-cyrillic xorg-x11-fonts-Type1 xorg-x11-fonts-misc -y
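Running chrome reports only the first library it cannot load, so it can help to list everything that is still missing after the install. The sketch below wraps ldd, which marks unresolved libraries with "not found"; parse_missing_libs and missing_libs are hypothetical helper names.

```python
import subprocess
from typing import List


def parse_missing_libs(ldd_output: str) -> List[str]:
    """Return library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs(binary: str) -> List[str]:
    """Run ldd on the binary and list its unresolved shared libraries."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True)
    return parse_missing_libs(result.stdout)
```

An empty list from missing_libs("/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome") suggests the dependencies are all in place.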
Solution 2
Point the download host at the Taobao mirror
vim /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py
# changed to the Taobao mirror
DEFAULT_DOWNLOAD_HOST = 'https://npm.taobao.org/mirrors'  # was 'https://storage.googleapis.com'
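An edit inside site-packages is lost whenever the package is reinstalled. pyppeteer releases that read the PYPPETEER_DOWNLOAD_HOST environment variable allow the same change without touching the library; treat that as an assumption and confirm it by looking for os.environ.get near DEFAULT_DOWNLOAD_HOST in your installed chromium_downloader.py. The helper names below are hypothetical.

```python
import os


def use_mirror(host: str = "https://npm.taobao.org/mirrors") -> None:
    """Set the download host; call this before pyppeteer is first imported."""
    os.environ["PYPPETEER_DOWNLOAD_HOST"] = host


def show_download_url() -> str:
    """Import pyppeteer lazily so the environment variable is already in place."""
    from pyppeteer import chromium_downloader  # requires pyppeteer to be installed
    return chromium_downloader.get_url()
```

Calling use_mirror() and then show_download_url() should print a link that starts with the mirror host if the installed version honors the variable.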
Runtime error on Linux
It runs fine on Windows, but on Linux it fails with: pyppeteer.errors.BrowserError: Browser closed unexpectedly
[root@iZbp1i88wdqojke2dk2ugbZ ymsq.com]# python test1.py
Traceback (most recent call last):
  File "test1.py", line 24, in <module>
    asyncio.get_event_loop().run_until_complete(getContent(link))
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "test1.py", line 13, in getContent
    browser = await launch()
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 305, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 166, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 225, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
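Solution:
# Pass options={'args': ['--no-sandbox']} to launch()
import asyncio
from http.client import BadStatusLine  # assumed origin of BadStatusLine

from pyppeteer import launch
from pyquery import PyQuery as pq  # assumed origin of pq

async def getContent(link):
    try:
        browser = await launch(options={'args': ['--no-sandbox']})  # added: disable the sandbox
        page = await browser.newPage()
        await page.goto(link)
        content = await page.content()
        doc = pq(content)
        print(content)
        # print('Quotes:', doc('.quote').length)
        await browser.close()  # close the browser
        return content
    except BadStatusLine as e:
        print("Error: " + str(e))

# link is the target URL, defined elsewhere in the script
asyncio.get_event_loop().run_until_complete(getContent(link))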
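Problem
- In production I expose this feature through a Tornado-based API service. While it was running I found a large number of chromium zombie processes, which eventually froze the system. In other words, browser.close() did not actually terminate the chromium processes.
How many zombie processes were there?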
[root@sdsdsdZ ~]# ps aux|grep pyppeteer/local-chromium | wc -l
192
Alarming: 192 of them. First, kill every process whose command line contains pyppeteer:
ps aux | grep pyppeteer | awk '{print $2}' | xargs kill -9
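A count taken with ps piped into grep usually includes the grep process itself, so the real number may be one lower. The sketch below gets the PIDs from Python without that off-by-one, which is also handy for monitoring inside the service; matching_pids and chromium_pids are hypothetical helper names.

```python
import subprocess
from typing import List


def matching_pids(ps_output: str, needle: str) -> List[int]:
    """Parse `ps -eo pid,args` output and return PIDs whose command contains needle."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def chromium_pids(needle: str = "pyppeteer/local-chromium") -> List[int]:
    """List PIDs of running processes started from pyppeteer's chromium folder."""
    out = subprocess.run(["ps", "-eo", "pid,args"],
                         capture_output=True, text=True).stdout
    return matching_pids(out, needle)
```

len(chromium_pids()) gives the current count, and the returned PIDs can be passed to os.kill if a cleanup is needed.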
Solution
- Close the page
await page.close()  # also close the page, then watch whether the zombies go away
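In getContent above, browser.close() is skipped whenever an earlier step raises, which is one plausible way for chromium processes to pile up. The sketch below closes the page and the browser in finally blocks so both always run; whether this removes the zombies entirely is not confirmed here. The launcher parameter is a hypothetical addition that lets the function be exercised without a real browser.

```python
import asyncio


async def get_content(link: str, launcher=None) -> str:
    """Fetch page HTML, closing the page and browser even if a step fails."""
    if launcher is None:
        from pyppeteer import launch as launcher  # real browser; needs pyppeteer
    browser = await launcher(options={"args": ["--no-sandbox"]})
    try:
        page = await browser.newPage()
        try:
            await page.goto(link)
            return await page.content()
        finally:
            await page.close()  # runs on success and on error
    finally:
        await browser.close()  # runs even when newPage or goto raises
```

In a long-running Tornado service it is still worth watching the process count after this change.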
Related GitHub
Lesson learned. This is the part that matters most.