Screen scraping based on title using Python bs4

I have a problem with screen scraping using bs4. The following is my code:

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # A plain string here only matches anchors whose whole title is 'bedroom'
    properties = soup.findAll('a', {'title': 'bedroom'})
    for eachproperty in properties:
        print eachproperty['href'] + ",", eachproperty.string

When I analyzed the website, the actual title structure looks like this:

    1 bedroom, residential apartment in velachery

for the anchor links. I do not get any output, and no error either. How do I tell the program to scrape the data that has a title containing the word "bedroom"?

I hope I made this clear.

You'll need to use a regular expression here, since you want to match anchor links that have "bedroom" somewhere in the title, not as the whole title:

    import re

    properties = soup.find_all('a', title=re.compile('bedroom'))

This gives 47 matches for the URL you've given.
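
To see the difference between the two filters without hitting the live site, here is a minimal sketch in Python 3. The `SAMPLE_HTML` markup, its `/p/...` links and the capitalised "2 Bedroom" title are made up for illustration and are not taken from the real page; only `find_all` and `re.compile` come from the answer above. The last query assumes you might also want to ignore case, which the original answer does not do.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real listing page.
SAMPLE_HTML = """
<a href="/p/1" title="1 bedroom, residential apartment in velachery">Flat A</a>
<a href="/p/2" title="2 Bedroom, residential apartment in velachery">Flat B</a>
<a href="/p/3" title="bedroom">Exact title</a>
<a href="/p/4" title="commercial shop in velachery">Shop</a>
<a href="/p/5">No title attribute</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string has to equal the whole attribute value.
exact = soup.find_all("a", title="bedroom")

# A compiled pattern is searched for anywhere inside the attribute value.
partial = soup.find_all("a", title=re.compile("bedroom"))

# Optional variation: also accept "Bedroom", "BEDROOM", and so on.
any_case = soup.find_all("a", title=re.compile("bedroom", re.IGNORECASE))

exact_hrefs = [a["href"] for a in exact]
partial_hrefs = [a["href"] for a in partial]
any_case_hrefs = [a["href"] for a in any_case]

for a in any_case:
    print(a["href"] + ",", a.string)
```

On this sample the plain-string filter only finds the anchor whose title is exactly "bedroom", which is why the code in the question printed nothing for titles like "1 bedroom, residential apartment in velachery".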

Brazell
