pyppeteer (Python) cannot download its Chromium dependency on Alibaba Cloud CentOS

楚天乐

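What pyppeteer can do

pyppeteer is an unofficial Python port of Puppeteer for driving a headless Chromium browser. Search online for the details.

Why the download fails

I will skip the reasons; it is simply infuriating. The default download host is storage.googleapis.com, which this server most likely could not reach.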

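Solution 1

Since the automatic download fails, download Chromium manually and put it where pyppeteer expects it. That raises two questions:

  • Where to download it from
  • Where to put it
  1. Where to download it from
  • Locate this file: find / -name chromium_downloader.py
  • Under Python 3.8 it is at /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py
  • Back up this library file so it can be restored later: /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py
  • Edit it: vim /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py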

Back up

cp /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py.bak

Restore later

mv /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py.bak /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py

In download_zip(), add a print so that the download URL is shown:

def download_zip(url: str) -> BytesIO:
    """Download data from url."""
    logger.warning('start chromium download.\n'
                   'Download may take a few minutes.')
    print("Download URL: " + url)    # print the download URL
    # ... rest of the function unchanged

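Run the Python program, and pyppeteer prints the download link:

https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/588429/chrome-linux.zip

With the URL in hand, downloading the file is easy. If you are still stuck at this point, that is a topic for a different search.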
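The download step can also be scripted. This is a minimal sketch, assuming the URL layout seen in the printed link (host, then chromium-browser-snapshots/Linux_x64/<revision>/chrome-linux.zip). The names build_download_url and fetch_zip are made up for illustration and are not part of pyppeteer. Run it on a machine that can actually reach the host.

```python
import shutil
import urllib.request


def build_download_url(host: str, revision: str) -> str:
    """Rebuild the Linux x64 snapshot URL in the layout of the printed link."""
    return f"{host}/chromium-browser-snapshots/Linux_x64/{revision}/chrome-linux.zip"


def fetch_zip(url: str, dest: str) -> str:
    """Stream the archive at url into the local file dest and return dest."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example (needs network access to the host):
# fetch_zip(build_download_url("https://storage.googleapis.com", "588429"),
#           "chrome-linux.zip")
```

The resulting chrome-linux.zip can then be copied to the server with any file transfer tool.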

  2. Where to put it
  • Open the same .py file as above
  • Add a print to chromium_executable() to show the path
def chromium_executable() -> Path:
    """Get path of the chromium executable."""
    print(chromiumExecutable[current_platform()])
    return chromiumExecutable[current_platform()]

This prints the location of the executable:

/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome

The corresponding directory is /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/
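The printed path follows a simple pattern, which the sketch below reconstructs. It assumes the layout observed above (pyppeteer's data directory, then local-chromium/<revision>/chrome-linux/chrome) and holds only for Linux. The helper name expected_chrome_path is hypothetical.

```python
from pathlib import PurePosixPath


def expected_chrome_path(pyppeteer_home: str, revision: str) -> PurePosixPath:
    """Build the Linux executable path in the layout printed by chromium_executable()."""
    return (PurePosixPath(pyppeteer_home) / "local-chromium" / revision
            / "chrome-linux" / "chrome")


path = expected_chrome_path("/root/.local/share/pyppeteer", "588429")
print(path)         # /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome
print(path.parent)  # directory that must hold the unpacked files
```

It should also be possible to get the real value without editing the library, by importing chromium_executable from pyppeteer.chromium_downloader in a Python shell and printing its result.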

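  3. Create the subdirectories, put Chromium in place, and test it
ls /root/.local/share/pyppeteer/

The directory already exists, so just create the subdirectories level by level (mkdir -p /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux does it in one step).

Enter the directory

cd  /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux

Unzip the chrome-linux.zip downloaded above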

Upload everything inside the chrome-linux folder to /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux

Grant execute permission

cd /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/
chmod +x chrome
chmod +x chrome_sandbox
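
If the zip file is already on the server, unpacking and the chmod step can be done in one go. This is a rough sketch, assuming the archive has a top-level chrome-linux/ folder as described above. install_chromium is a made-up helper name. The explicit chmod is needed because Python's zipfile.extractall does not restore Unix execute bits.

```python
import os
import stat
import zipfile


def install_chromium(zip_path: str, revision_dir: str) -> str:
    """Extract chrome-linux.zip into revision_dir and mark the binaries executable.

    Returns the path of the chrome executable.
    """
    os.makedirs(revision_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(revision_dir)

    chrome_dir = os.path.join(revision_dir, "chrome-linux")
    for name in ("chrome", "chrome_sandbox"):
        target = os.path.join(chrome_dir, name)
        if os.path.exists(target):
            mode = os.stat(target).st_mode
            # Equivalent of chmod +x for user, group and others.
            os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.path.join(chrome_dir, "chrome")


# Example, using the directory found earlier:
# install_chromium("chrome-linux.zip",
#                  "/root/.local/share/pyppeteer/local-chromium/588429")
```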

Run chrome

/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome
Error: error while loading shared libraries: libXss.so.1: cannot open shared object file: No such file or directory
  4. Fix the error by installing the missing libraries
yum install pango.x86_64 libXcomposite.x86_64 libXcursor.x86_64 libXdamage.x86_64 libXext.x86_64 libXi.x86_64 libXtst.x86_64 cups-libs.x86_64 libXScrnSaver.x86_64 libXrandr.x86_64 GConf2.x86_64 alsa-lib.x86_64 atk.x86_64 gtk3.x86_64 ipa-gothic-fonts xorg-x11-fonts-100dpi xorg-x11-fonts-75dpi xorg-x11-utils xorg-x11-fonts-cyrillic xorg-x11-fonts-Type1 xorg-x11-fonts-misc -y
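
To check whether any shared libraries are still missing after the install, the output of ldd can be scanned for "not found" entries. This is a small sketch, assuming the usual ldd line format (name => not found). Both function names are invented for this example.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> list:
    """Run ldd on path and list its unresolved shared libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Example:
# check_binary("/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome")
```

An empty list means every library that ldd knows about was resolved.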

Solution 2

Change the download host to the Taobao mirror

vim /usr/local/lib/python3.8/site-packages/pyppeteer/chromium_downloader.py

# switch to the Taobao mirror
DEFAULT_DOWNLOAD_HOST = 'https://npm.taobao.org/mirrors'  # was 'https://storage.googleapis.com'
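
The same switch may be possible without editing the installed library. This sketch assumes your pyppeteer version reads the PYPPETEER_DOWNLOAD_HOST environment variable. To confirm, look for os.environ.get('PYPPETEER_DOWNLOAD_HOST', ...) in chromium_downloader.py. The helper name use_download_mirror is hypothetical.

```python
import os


def use_download_mirror(host: str) -> None:
    """Point pyppeteer's downloader at a mirror through its environment variable."""
    os.environ["PYPPETEER_DOWNLOAD_HOST"] = host


use_download_mirror("https://npm.taobao.org/mirrors")

# Import pyppeteer only after the variable is set, because the host is read
# when the module is imported:
# from pyppeteer import launch
```

Setting the variable in the shell before starting the program has the same effect and leaves the site-packages file untouched.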

Runtime error on Linux

The script runs fine on Windows but fails on Linux with pyppeteer.errors.BrowserError: Browser closed unexpectedly

[root@iZbp1i88wdqojke2dk2ugbZ ymsq.com]# python test1.py
Traceback (most recent call last):
  File "test1.py", line 24, in <module>
    asyncio.get_event_loop().run_until_complete(getContent(link))
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "test1.py", line 13, in getContent
    browser = await launch()
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 305, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 166, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 225, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

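Solution: launch Chromium with --no-sandbox. Chromium refuses to run as root with the sandbox enabled, and this script runs as root.

# Pass options={'args': ['--no-sandbox']} to launch()
import asyncio
from http.client import BadStatusLine  # assumed origin of BadStatusLine

from pyppeteer import launch
from pyquery import PyQuery as pq

async def getContent(link):
    try:
        browser = await launch(options={'args': ['--no-sandbox']})   # run without the sandbox
        page = await browser.newPage()
        await page.goto(link)
        content = await page.content()
        doc = pq(content)
        print(content)
        #print('Quotes:', doc('.quote').length)

        await browser.close()   # close the browser
        return content
    except BadStatusLine as e:
        print("Error: " + str(e))
# link is the URL to fetch, defined elsewhere in test1.py
asyncio.get_event_loop().run_until_complete(getContent(link))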

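Problem

  1. In production I expose this feature through a Tornado-based API service. While it was running, a large number of zombie Chromium processes piled up and froze the system. In other words, browser.close() did not actually terminate the Chromium processes.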

How many zombie processes are there?

[root@sdsdsdZ ~]# ps aux|grep pyppeteer/local-chromium | wc -l
192

Alarming: 192 (a count that includes the grep process itself). First, kill every process whose command line contains pyppeteer:

ps aux | grep pyppeteer | awk '{print $2}' | xargs kill -9
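
For monitoring, the same count can be taken from Python without the grep process being counted. This is a sketch that parses the output of ps -eo pid,args. The marker string is the path fragment used above, and both function names are made up for illustration.

```python
import subprocess

MARKER = "pyppeteer/local-chromium"


def chromium_pids(ps_output: str, marker: str = MARKER) -> list:
    """Return the PIDs in `ps -eo pid,args` output whose command line contains marker."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # The header line is skipped because its first field is not a number.
        if len(parts) == 2 and parts[0].isdigit() and marker in parts[1]:
            pids.append(int(parts[0]))
    return pids


def list_chromium_pids() -> list:
    """Query the running processes and return the matching Chromium PIDs."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True)
    return chromium_pids(result.stdout)


# Example:
# print(len(list_chromium_pids()))
```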

Solution

  • Close the page as well
await page.close()      # also close the page, then watch whether processes still pile up
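
One likely contributor is that the close calls are skipped whenever an exception is raised before them. The sketch below wraps both in try/finally so they always run. It is not confirmed to eliminate the zombie processes, only to make sure close() is attempted. The launch function is passed in as a parameter purely so the example can be exercised without a browser. In real use, pass pyppeteer.launch.

```python
import asyncio


async def fetch_content(launch, link: str) -> str:
    """Fetch the page HTML, closing the page and the browser even on errors."""
    browser = await launch(options={"args": ["--no-sandbox"]})
    try:
        page = await browser.newPage()
        try:
            await page.goto(link)
            return await page.content()
        finally:
            await page.close()      # runs even if goto() or content() raises
    finally:
        await browser.close()       # runs even if newPage() raises


# Usage with the real library (assumes pyppeteer is installed):
# from pyppeteer import launch
# html = asyncio.get_event_loop().run_until_complete(
#     fetch_content(launch, "https://example.com"))
```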

Related GitHub issue

https://github.com/pyppeteer/pyppeteer/issues/135


