Screen scraping based on title using Python BS4
I have a problem with screen scraping using bs4. Here is my code:
```python
# Python 2 (urllib2 and the print statement)
from bs4 import BeautifulSoup
import urllib2

url = "http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

# Collect <a> tags whose title attribute equals 'bedroom'
properties = soup.findAll('a', {'title': 'bedroom'})
for eachproperty in properties:
    print eachproperty['href'] + ",", eachproperty.string
```
When I analyzed the website, I found that the actual title attribute of the anchor links looks like this:

1 bedroom, residential apartment in Velachery

I get no output, and no error either. How do I tell the program to scrape only the links whose title contains the word "bedroom"?

I hope I have made this clear.
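The symptom (no output and no error) can be reproduced offline. The sketch below is written in Python 3 and uses a hypothetical one-link HTML snippet shaped like the title quoted above; the href and link text are invented. It shows that a plain string in the attribute filter has to equal the entire title, so the search returns an empty list and the loop body never runs.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one listing link shaped like the title quoted above.
HTML = '<a href="/listing-1" title="1 bedroom, residential apartment in Velachery">1 BHK</a>'

soup = BeautifulSoup(HTML, "html.parser")

# A plain string filter must equal the WHOLE title value, so this finds nothing.
by_word = soup.find_all('a', {'title': 'bedroom'})

# The same filter with the complete title string does match.
by_full_title = soup.find_all('a', {'title': '1 bedroom, residential apartment in Velachery'})

print(len(by_word))        # 0
print(len(by_full_title))  # 1

# Looping over an empty result simply does nothing: no output, no exception.
for eachproperty in by_word:
    print(eachproperty['href'] + ",", eachproperty.string)
```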
You'll need to use a regular expression here, since you want to match anchor links that have "bedroom" somewhere in the title, not as the whole title:
```python
import re

# Match <a> tags whose title contains "bedroom" anywhere
properties = soup.find_all('a', title=re.compile('bedroom'))
```
This gives 47 matches for the URL you've given.
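Beautiful Soup applies a compiled pattern with its search() method, which is why the word can appear anywhere in the title. The pattern is case-sensitive, though. The sketch below is a Python 3 illustration under assumptions: the HTML is hypothetical (invented hrefs and link text, with one title written "Bedroom" and one "bedroom"), and it contrasts the answer's pattern with an re.IGNORECASE variant and with a plain function filter that avoids regular expressions.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the listing titles in the question;
# hrefs and link text are invented for illustration.
HTML = """
<a href="/listing-1" title="1 Bedroom, Residential Apartment in Velachery">1 BHK</a>
<a href="/listing-2" title="2 bedroom, Residential Apartment in Velachery">2 BHK</a>
<a href="/about" title="About us">About</a>
<a href="/contact">Contact</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Case-sensitive pattern, as in the answer: only the lowercase "bedroom" matches.
case_sensitive = soup.find_all('a', title=re.compile('bedroom'))

# re.IGNORECASE also catches "Bedroom" if some listings capitalize the word.
properties = soup.find_all('a', title=re.compile('bedroom', re.IGNORECASE))

# Alternative without regex: the function is called with the title value;
# the None check guards against tags that have no title attribute.
by_function = soup.find_all(
    'a', title=lambda t: t is not None and 'bedroom' in t.lower()
)

for eachproperty in properties:
    print(eachproperty['href'] + ",", eachproperty.string)
```

The function filter is easier to extend (for example, checking several words) without writing a more complex pattern; the regex form stays closer to the one-liner in the answer.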