August 29, 2008 by Alex Polski
If you want to develop a good scraper, Perl can be a very good solution. It has everything you need for the job: the WWW::Mechanize library, the TreeBuilder class and thread support.
1. WWW::Mechanize is a full-featured library for automating interaction with websites. It simulates what a user does in a browser, such as clicking links and submitting forms, and has many other useful features. Let’s look at the code:
use strict;
use warnings;
use WWW::Mechanize;

# create the mechanize object
my $mech = WWW::Mechanize->new();
# set the user agent string (note the closing parenthesis)
$mech->agent('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)');
# go to http://www.example.com/
$mech->get('http://www.example.com/');
# click on the link whose text is 'Some text'
$mech->follow_link(text => 'Some text');
# fill in and submit the form named 'search'
$mech->submit_form(
    form_name => 'search',
    fields    => { query => 'Some text' },
    button    => 'Search Now',
);
Of course, the example above uses only the basic features; you can find the full WWW::Mechanize documentation on CPAN.
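Beyond navigation, WWW::Mechanize also lets you inspect the page it has loaded. The sketch below reads the response status, the page title and every link on the page with the status, title and find_all_links methods. To keep it runnable without network access, it writes a small HTML file to a temporary directory and loads it through a file:// URL; that local page, its two links and the autocheck setting are assumptions for illustration only, and a real scraper would call get on an http:// address instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use URI::file;
use WWW::Mechanize;

# Hypothetical local page so the sketch runs offline (assumption for illustration).
my $dir  = tempdir(CLEANUP => 1);
my $page = File::Spec->catfile($dir, 'index.html');
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} <<'HTML';
<html><head><title>Demo page</title></head>
<body>
<a href="a.html">First</a>
<a href="b.html">Second</a>
</body></html>
HTML
close $fh;

# autocheck => 1 makes failed requests die instead of failing silently
my $mech = WWW::Mechanize->new(autocheck => 1);
$mech->get(URI::file->new_abs($page)->as_string);

# Inspect what was loaded: HTTP status, <title> text and all links
print "Status: ", $mech->status, "\n";
print "Title: ", $mech->title, "\n";
for my $link ($mech->find_all_links()) {
    # text() is the anchor text, url_abs() the link resolved against the page URL
    print $link->text, " => ", $link->url_abs, "\n";
}
```

In a real scraper the same calls work after any get, follow_link or submit_form, so you can check that you landed on the page you expected before extracting data from it.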
This entry was posted on Friday, August 29, 2008 at 6:48 am and is filed under Scraping.
Posted on September 24th, 2008 at 5:08 pm
Take a good look at FEAR::API
You can get it here: http://backpan.perl.org/authors/id/X/XE/XERN/