Screen scraping based on title using Python bs4


I have a problem with screen scraping using bs4. The following is my code.

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # Look for anchor tags whose title attribute is 'bedroom'
    properties = soup.findAll('a', {'title': 'bedroom'})
    for eachproperty in properties:
        print eachproperty['href'] + ",", eachproperty.string

When I analyzed the website, the actual title structure looks like this:

1 bedroom, residential apartment in velachery

for the anchor links. I do not get any output, and no error either. How do I tell the program to scrape all the data that has a title containing the word "bedroom"?

I hope I made that clear.

You'll need to use a regular expression here, because you want to match anchor links that have "bedroom" somewhere in the title, not as the whole title:

    import re

    properties = soup.find_all('a', title=re.compile('bedroom'))

This gives me 47 matches for the URL you've given.
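To see why the original filter returned nothing, here is a minimal sketch that runs against a small inline snippet instead of the live page. The markup, the hrefs and the variable names are made up for illustration and are not taken from the real site; it also assumes Python 3 and the built-in "html.parser" backend. A plain string only matches when the whole title equals it, while a compiled pattern is searched anywhere inside the attribute value.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only (not copied from the real page).
html = """
<a href="/p/1" title="1 bedroom, residential apartment in velachery">Flat A</a>
<a href="/p/2" title="2 Bedroom, independent house in velachery">House B</a>
<a href="/p/3" title="bedroom">Exact title</a>
<a href="/p/4" title="office space in velachery">Office C</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string matches only when the entire title equals "bedroom".
exact = soup.find_all("a", {"title": "bedroom"})

# A compiled pattern is searched anywhere inside the title.
partial = soup.find_all("a", title=re.compile("bedroom"))

# re.IGNORECASE additionally catches titles that use "Bedroom".
any_case = soup.find_all("a", title=re.compile("bedroom", re.IGNORECASE))

exact_hrefs = [a["href"] for a in exact]
partial_hrefs = [a["href"] for a in partial]
any_case_hrefs = [a["href"] for a in any_case]

print(exact_hrefs)
print(partial_hrefs)
print(any_case_hrefs)
```

If the titles on the real page mix upper and lower case, the re.IGNORECASE variant is probably the safer choice.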

