在网上看到了一个从国家地理杂志网站下载每日图片的Perl程序。因为网站改版,脚本已经不能正常运行,经过修改后,经测试可以将每日图片下载到当前目录,代码如下:

#!/usr/bin/perl
# author:
# Seraphxby
# Purpose:
# 查看国家地理每日图片的Perl脚本
# history:
# v0.1 first draft
# 2010-12-8 modified by gaospecial
use warnings;
use strict;
use LWP::Simple;
my $url='http://photography.nationalgeographic.com/photography/photo-of-the-day';
my $content = get( "$url" ) || die "get the page failed!\n";
my $img;
if ($content =~ m/(.*?<\/a>)/s) {
$content = $1;
if($content =~ m/<img.*?src="([^\t]*?)"/){ # 非贪婪的数量词 *?,匹配中不含有跳格\t
$img = $1;
if($img =~ m/^(.*?)([^\/]+$)/){
$url = $1;
$img = $2;
getstore($url.$img,$img) || die "get img failed!\n";

}
}
}

 

然后,为了能够下载Cell的封面,将脚本中的相应参数修改后,类似的,可以从cell网站下载封面图片到当前目录。代码如下:

 

#!/usr/bin/perl
# author:
# gaospecial@gmail.com 2010-12
# Purpose:
# 获取最近一期Cell杂志封面的Perl脚本

use warnings;
use strict;
use LWP::Simple;
my $url='http://www.cell.com/current';
my $content = get( "$url" ) || die "get the page failed!\n";
my $img;
if ($content =~ m/(.*?<\/div>)/s) {
$content = $1;
if($content =~ m/<img.*?src="([^\t]*?)".*<p.*>(.*)<\/p>/){ # get url and cation
my $img_src = $1;
my $img_cation = $2;
if($img_src =~ m/^.*?([^\/]+$)/){ # get image name
my $img_name = $1;
getstore($img_src,$img_name) || die "get img failed!\n"; # save image to current dir
}
}
}

 

此处的两部分代码,可以作为Perl LWP模块的入门范例。

Posted from GScribble.

作者简介

Chun-Hui Gao is a Research Associate at Huazhong Agricultural University.

重复使用

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT. The full source is available at https://github.com/yihui/hugo-prose.

欢迎修订

如果您发现本文里含有任何错误(包括错别字和标点符号),欢迎在本站的 GitHub 项目里提交修订意见。

引用本文

如果您使用了本文的内容,请按照以下方式引用:

gaoch (2010). Perl LWP:从“国家地理每日图片”到“Cell最近一期的封面”. BIO-SPRING. /post/2010/12/08/2010-12-08-perl-lwp-image-fetch/

BibTeX citation

@misc{
  title = "Perl LWP:从“国家地理每日图片”到“Cell最近一期的封面”",
  author = "gaoch",
  year = "2010",
  journal = "BIO-SPRING",
  note = "/post/2010/12/08/2010-12-08-perl-lwp-image-fetch/"
}