Screen scraping based on title using Python bs4
I have a problem with screen scraping using bs4. The following is my code:

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    properties = soup.findAll('a', {'title': 'bedroom'})
    for eachproperty in properties:
        print eachproperty['href'] + ",", eachproperty.string

When I analyzed the website, the actual title structure looks like this for the anchor links:

    1 bedroom, residential apartment in velachery

I do not get any output, and no error either. How do I tell the program to scrape the data that has a title containing the word "bedroom"?

I hope I have made this clear.
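You'll need to use a regular expression here, because you want to match anchor links that have bedroom somewhere in the title, not as the whole title. A plain string passed as the title filter only matches when it equals the entire attribute value, which is why your search returns nothing:

    import re
    properties = soup.find_all('a', title=re.compile('bedroom'))

This gives 47 matches for the URL you've given.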
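To see the difference without hitting the live site, here is a minimal sketch in Python 3 that runs both filters against a small inline HTML snippet. The markup, the hrefs, and the names SAMPLE_HTML, exact_hrefs and partial_hrefs are made up for illustration; the real page will look different.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the listing page; the real HTML may differ.
SAMPLE_HTML = """
<a href="/p1" title="1 bedroom, residential apartment in velachery">Flat A</a>
<a href="/p2" title="2 bedroom, residential apartment in velachery">Flat B</a>
<a href="/p3" title="bedroom">Exact title</a>
<a href="/about" title="about us">About</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string has to equal the whole title attribute.
exact = soup.find_all("a", title="bedroom")

# A compiled pattern is searched for anywhere inside the title attribute.
partial = soup.find_all("a", title=re.compile("bedroom"))

exact_hrefs = [a["href"] for a in exact]
partial_hrefs = [a["href"] for a in partial]

print(exact_hrefs)    # only the link whose title is exactly "bedroom"
print(partial_hrefs)  # every link whose title contains "bedroom"
```

If the titles on the real page capitalize the word differently, passing a flag such as re.compile("bedroom", re.IGNORECASE) should make the match case-insensitive.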