
Python code for batch-downloading Douban images; how to look up a command's usage in Linux


Using the built-in functions sorted() and list.sort()

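  On a Linux system, if you are not sure what a command does, you can look it up with the man command:
  [root@free root]# man shutdown ←
This uses man to look up the usage of the shutdown command, which is introduced later.
  The syntax of most commands can also be looked up with the -h or --help option. For shutdown, run shutdown --help or the man shutdown command shown above; note that for shutdown itself, -h requests a halt or power-off rather than help.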
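As a rough illustration of the --help convention described above, the following Python 3 sketch runs a program with --help and captures the text it prints. The helper name show_help is made up for this example, and the Python interpreter itself is used as the target program only so that the sketch runs on any system; man pages are still best read directly in a terminal.

```python
import subprocess
import sys


def show_help(command):
    """Run `command --help` and return whatever help text it prints."""
    result = subprocess.run(
        [command, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some programs print their help to stderr
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The interpreter is a portable stand-in for an arbitrary command.
    print(show_help(sys.executable))
```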

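While browsing Douban I came across some images I liked and was too lazy to save them one by one. I had already written image downloaders in C# and Python, so I adapted the old Python code into a Douban version for anyone who wants to use it.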

Basic usage

 

# -*- coding:utf8 -*-
# Written for Python 2 (urllib2 is not available in Python 3).
import urllib2, urllib, socket
import re
import requests
from lxml import etree
import os, time

Python has a built-in function, sorted(), dedicated to sorting lists. For example:

 

DEFAULT_DOWNLOAD_TIMEOUT = 30

>>> a=[5,3,6,1,9,2]
>>> sorted(a)       # sorted() returns a new, sorted list
[1, 2, 3, 5, 6, 9]  # however, the original a is not affected
>>> a
[5, 3, 6, 1, 9, 2]
list.sort() can also be used to do the same thing.

 

class AppURLopener(urllib.FancyURLopener):
    version = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)"

>>> a.sort()
>>> a               # note: after list.sort(), the order of
[1, 2, 3, 5, 6, 9]  # a itself has changed, unlike with sorted()
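The contrast between the two calls can be condensed into one short script. This is a minimal Python 3 sketch; the variable names are chosen for illustration only.

```python
numbers = [5, 3, 6, 1, 9, 2]

# sorted() builds and returns a new list; the input is left alone.
ordered_copy = sorted(numbers)
print(ordered_copy)   # [1, 2, 3, 5, 6, 9]
print(numbers)        # [5, 3, 6, 1, 9, 2]

# list.sort() reorders the list in place and returns None,
# so its result should not be assigned and used as a list.
result = numbers.sort()
print(result)         # None
print(numbers)        # [1, 2, 3, 5, 6, 9]
```

A common mistake is writing numbers = numbers.sort(), which replaces the list with None.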
The difference between sorted and list.sort():
list.sort() can only sort objects of type list. For example:


def check_save_path(save_path):
    if not os.path.exists(save_path):
        os.makedirs(save_path)

>>> b_dict={1:'e',3:'m',9:'a',5:'e'}
>>> b_dict.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'sort'
sorted, by contrast, accepts a dictionary. See this example:

def get_image_name(image_link):
    file_name = os.path.basename(image_link)
    return file_name

>>> b_dict
{1: 'e', 3: 'm', 5: 'e', 9: 'a'}
>>> sorted(b_dict)
[1, 3, 5, 9]
Applied to a dictionary, sorted extracts the keys, sorts them, and returns the result as a list.
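Since sorted accepts any iterable, the same dictionary can also be sorted by its values or by whole key/value pairs. The following Python 3 sketch reuses b_dict from the example above; the name by_value is illustrative.

```python
b_dict = {1: "e", 3: "m", 9: "a", 5: "e"}

# Iterating over a dict yields its keys, so sorted() returns the sorted keys.
print(sorted(b_dict))             # [1, 3, 5, 9]

# The values can be sorted the same way.
print(sorted(b_dict.values()))    # ['a', 'e', 'e', 'm']

# To order (key, value) pairs by value, sort the items with a key function.
by_value = sorted(b_dict.items(), key=lambda item: item[1])
print(by_value)                   # [(9, 'a'), (1, 'e'), (5, 'e'), (3, 'm')]
```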

def save_image1(image_link, save_path):
    # Alternative downloader: urllib.urlretrieve with a browser-like User-Agent.
    file_name = get_image_name(image_link)
    file_path = os.path.join(save_path, file_name)
    print("Preparing to download {0} to {1}".format(image_link, file_path))
    try:
        urllib._urlopener = AppURLopener()
        socket.setdefaulttimeout(DEFAULT_DOWNLOAD_TIMEOUT)
        # Write to the full file path, not to the directory.
        urllib.urlretrieve(url=image_link, filename=file_path)
        return True
    except Exception as ex:
        print(ex.args)
        print("Download failed: {0}".format(ex))
        return False

Sorting by a specified key

def save_image(image_link, save_path):
    file_name = get_image_name(image_link)
    file_path = os.path.join(save_path, file_name)
    print("Preparing to download {0} to {1}".format(image_link, file_path))
    try:
        # Fetch first, so a failed request does not leave an empty file behind.
        image_data = urllib2.urlopen(url=image_link, timeout=DEFAULT_DOWNLOAD_TIMEOUT).read()
        with open(file_path, "wb") as file_handler:
            file_handler.write(image_data)
        return True
    except Exception as ex:
        print("Download failed: {0}".format(ex))
        return False
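The script above targets Python 2. For readers on Python 3, a comparable download step might look like the sketch below. It is an assumption-laden equivalent rather than the author's code: the names save_image_py3 and DOWNLOAD_TIMEOUT are invented here, and urllib.request stands in for urllib2.

```python
import os
import shutil
import urllib.parse
import urllib.request

DOWNLOAD_TIMEOUT = 30  # seconds; same value as DEFAULT_DOWNLOAD_TIMEOUT


def save_image_py3(image_link, save_path):
    """Download image_link into the save_path folder; return True on success."""
    # Take the file name from the URL path, ignoring any query string.
    file_name = os.path.basename(urllib.parse.urlparse(image_link).path)
    file_path = os.path.join(save_path, file_name)
    try:
        with urllib.request.urlopen(image_link, timeout=DOWNLOAD_TIMEOUT) as response:
            with open(file_path, "wb") as file_handler:
                shutil.copyfileobj(response, file_handler)
        return True
    except OSError as ex:  # URLError and socket timeouts are OSError subclasses
        print("Download failed: {0}".format(ex))
        return False
```

Returning a boolean instead of raising keeps the same contract as save_image, so a caller can fall back to another image size when the first attempt fails.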

Both list.sort() and sorted can sort by a specified key. For example:

def get_thumb_picture_link(thumb_page_link):
    try:
        html_content = urllib2.urlopen(url=thumb_page_link, timeout=DEFAULT_DOWNLOAD_TIMEOUT).read()
        html_tree = etree.HTML(html_content)
        # Collect the src attribute of every thumbnail image on the album page.
        link_tmp_list = html_tree.xpath('//div[@class="photo_wrap"]/a[@class="photolst_photo"]/img/@src')
        return list(link_tmp_list)
    except Exception as ex:
        print(ex)
        return []

An example with sorted:

def download_pictures(album_link, max_page_id, picture_count_per_page, save_path):
    check_save_path(save_path)
    # Pages are numbered from 0; page n starts at image n * picture_count_per_page.
    page_id = 0
    while page_id < max_page_id:
        thumb_page_link = album_link + "?start={0}".format(page_id * picture_count_per_page)
        thumb_picture_links = get_thumb_picture_link(thumb_page_link)
        for thumb_picture_link in thumb_picture_links:
            # Try the large version first, then fall back to the regular size.
            full_picture_link = thumb_picture_link.replace("photo/thumb", "photo/large")
            save_flag = save_image(image_link=full_picture_link, save_path=save_path)
            if not save_flag:
                full_picture_link = thumb_picture_link.replace("photo/thumb", "photo/photo")
                save_image(image_link=full_picture_link, save_path=save_path)
            time.sleep(1)
        page_id += 1
    print("Download complete")
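The paging and URL rewriting inside download_pictures can be checked without touching the network by isolating them as pure functions. This Python 3 sketch assumes the same ?start= offset scheme and photo/thumb path segment used in the code above; the function names and the example image URL are hypothetical.

```python
def build_page_links(album_link, max_page_id, picture_count_per_page):
    """Return the URL of every thumbnail page in the album."""
    return [
        album_link + "?start={0}".format(page_id * picture_count_per_page)
        for page_id in range(max_page_id)
    ]


def candidate_full_links(thumb_picture_link):
    """Return the large and regular-size URLs to try, in that order."""
    return [
        thumb_picture_link.replace("photo/thumb", "photo/large"),
        thumb_picture_link.replace("photo/thumb", "photo/photo"),
    ]


if __name__ == "__main__":
    for link in build_page_links("https://www.douban.com/photos/album/43697061/", 3, 18):
        print(link)
    # The image URL below is a made-up placeholder, not a real Douban address.
    print(candidate_full_links("https://img.example.com/view/photo/thumb/public/p1.jpg"))
```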

>>> qw="I am Qiwsir you can read my articles im my blog".split()
>>> qw
['I', 'am', 'Qiwsir', 'you', 'can', 'read', 'my', 'articles', 'im', 'my', 'blog']
>>> sorted(qw,key=str.lower)        # ascending alphabetical order, ignoring case
['am', 'articles', 'blog', 'can', 'I', 'im', 'my', 'my', 'Qiwsir', 'read', 'you']
An example with list.sort():

# Local folder where the images are saved
save_path = "J:\\douban\\meiren2"
# Album URL; note that it must end with a slash
album_link = "https://www.douban.com/photos/album/43697061/"
# Total number of pages in the album
max_page_id = 9
# Number of images per page (18 by default)
picture_count_per_page = 18
download_pictures(album_link, max_page_id, picture_count_per_page, save_path)

>>> qw
['I', 'am', 'Qiwsir', 'you', 'can', 'read', 'my', 'articles', 'im', 'my', 'blog']
>>> qw.sort(key=str.lower)
>>> qw
['am', 'articles', 'blog', 'can', 'I', 'im', 'my', 'my', 'Qiwsir', 'read', 'you']
In addition, key can be given any function; the items are then sorted by that function's return value.
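A minimal Python 3 sketch of sorting by a function's return value, using the same word list. Both the built-in len and a small user-defined function serve as the key; last_letter is a name invented for this illustration.

```python
words = "I am Qiwsir you can read my articles im my blog".split()

# len is called on every item and the items are ordered by the returned length.
# The sort is stable, so words of equal length keep their original order.
by_length = sorted(words, key=len)
print(by_length)


# A user-defined function works the same way; here the last letter decides.
def last_letter(word):
    return word[-1].lower()


words.sort(key=last_letter)
print(words)
```

The key function is called once per item, and the items themselves (not the key values) end up in the result.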

=============================================================
