BeautifulSoup: A Collection of Common Usage

Official documentation

 htmlDoc = """
 <html><head><title>The Dormouse's story</title></head>
 <body>
 <p class="title"><b>The Dormouse's story</b></p>
 <p class="story">Once upon a time there were three little sisters; and their names were
 <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
 <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
 <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;and they lived at the bottom of a well.</p>
 <p class="story">...</p>"""

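Importing the library

from bs4 import BeautifulSoup  # Python 3

# Name the parser explicitly; otherwise bs4 picks one itself and emits a warning
soup = BeautifulSoup(htmlDoc, 'html.parser')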

Finding elements

The two lines below are equivalent and return the same result.

 ps1 = soup('p')  # returns all <p></p> elements
 ps2 = soup.find_all('p')
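
As a quick check of the equivalence above, the sketch below parses a shortened fragment and compares the two call styles, then shows a couple of closely related lookups. The names `sample_html` and `sample_soup` are illustrative and not from the original, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shortened from the document above
sample_html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a></p>
<p class="story">...</p>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Calling the soup object directly is shorthand for find_all()
by_call = sample_soup("p")
by_find_all = sample_soup.find_all("p")
print(by_call == by_find_all)  # compares the two result lists

# find() returns only the first match (or None if nothing matches)
first_p = sample_soup.find("p")
print(first_p.get_text())

# class_ filters by CSS class; id filters by the id attribute
stories = sample_soup.find_all("p", class_="story")
link1 = sample_soup.find(id="link1")
print(len(stories), link1["href"])
```

The short call form is handy in an interactive session, while `find_all` is probably easier to read in a longer script.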

Structured output

1. Get the text only
 print(soup.get_text())
 # The Dormouse's story
 #
 # The Dormouse's story
 #
 # Once upon a time there were three little sisters; and their names were
 # Elsie,
 # Lacie and
 # Tillie;and they lived at the bottom of a well.
 #
 # ...
2. Get the href of every link
for link in soup.find_all('a'):
    print(link.get('href'))
    # http://example.com/elsie
    # http://example.com/lacie
    # http://example.com/tillie
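
The two outputs above can also be combined. The following is a minimal sketch, assuming `html.parser` and a shortened fragment with one extra `<a>` tag that has no `href` (added here purely for illustration). The names `sample_html`, `sample_soup`, `links`, `hrefs` and `compact` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the last <a> deliberately has no href
sample_html = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a>no href here</a></p>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Pair each link's visible text with its href;
# get() returns None when the attribute is missing
links = [(a.get_text(), a.get("href")) for a in sample_soup.find_all("a")]

# href=True keeps only the <a> tags that actually carry an href attribute
hrefs = [a["href"] for a in sample_soup.find_all("a", href=True)]

# strip=True trims each text fragment and drops the empty ones;
# the first argument is the separator used to join the fragments
compact = sample_soup.get_text(" ", strip=True)

print(links)
print(hrefs)
print(compact)
```

Using `link.get('href')` rather than `link['href']` avoids a `KeyError` on tags without that attribute, which may matter on real pages.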
