regex - Get values from onclick attribute using Python bs4
I am unable to parse selected values out of an onclick attribute. Here is the onclick attribute:

    onclick="try{appendpropertyposition(this,'b10331465','9941951739','','dealer','murugan.n');jsb9onunloadtracking();jsevt.stopbubble(event);}catch(e){};"

How can I get only selected values from the onclick attribute, such as (phone number, '', 'dealer', 'name')? Here is my code.
    from bs4 import BeautifulSoup
    import urllib2
    import re

    url = "http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # All property links whose title mentions "bedroom"
    properties = soup.findAll('a', title=re.compile('bedroom'))
    for eachproperty in properties:
        print "http:/" + eachproperty['href'] + ",", eachproperty.string, eachproperty['onclick']

Update
I want only one phone number per property, even though the onclick attribute mentioned above may contain several.
For example, right now I am getting:
    y10765227, 9884877926, 9283183326,, dealer, rgmuthu
    l10038779, 9551154555, ,, ,
    r10831945, 9150000747, 9282109134, 9043728565, ,, ,
    b10750123, 9952946340, , dealer, bala
    r10763559, 9841280752, 9884797013, , dealer, senthil

This is what I get using the following code:
    re.findall("'([a-zA-Z0-9,\s]*)'", (a['onclick'] if a else ''))

I am trying to modify it so that only one phone number is retrieved and the rest are dropped. The output should look like this:
    y10765227, 9884877926, dealer, rgmuthu
    l10038779, 9551154555
    r10831945, 9150000747
    b10750123, 9952946340, dealer, bala
    r10763559, 9841280752, dealer, senthil

I am trying to use:
    re.findall("'([a-zA-Z0-9,\s]*)'", (re.sub(r'([^,]+,[^,]+,)(.*?)([a-zA-Z].*)', r'\1\0', a['onclick']) if a else ''))

but it does not seem to work.
You can use a regex to get the data out of the onclick attribute:
    properties = soup.findAll('a', title=re.compile('bedroom'))
    for eachproperty in properties:
        # Capture everything between each pair of single quotes
        print re.findall("'([a-zA-Z0-9,\s]*)'", eachproperty['onclick'])

This prints:
    ['y10765227', '9884877926, 9283183326', '', 'dealer', 'rgmuthu']
    ['l10038779', '9551154555', ',', ',']
    ['r10831945', '9150000747, 9282109134, 9043728565', ',', ',']
    ['b10750123', '9952946340', '', 'dealer', 'bala']
    ['r10763559', '9841280752, 9884797013', '', 'dealer', 'senthil']
    ...

Hope that helps.
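The lists above still hold every phone number. One possible way to keep only the first is to post-process each captured field: split it on commas, keep the first piece, and drop fields that end up empty. This is only a rough sketch. The helper name first_phone_fields is made up for illustration, and the sample string is put together from the values shown in the question rather than fetched from the live page.

```python
import re

def first_phone_fields(onclick):
    """Return the quoted onclick arguments, keeping one phone number.

    Hypothetical helper; the name is not part of bs4 or re.
    """
    # Same pattern as in the answer: capture text between single quotes.
    fields = re.findall(r"'([a-zA-Z0-9,\s]*)'", onclick or '')
    cleaned = []
    for field in fields:
        # A field like '9884877926, 9283183326' holds several numbers;
        # keep only the part before the first comma.
        first = field.split(',')[0].strip()
        if first:  # skip empty fields such as '' or ','
            cleaned.append(first)
    return cleaned

# Assumed sample, built from the values shown in the question.
sample = ("try{appendpropertyposition(this,'y10765227',"
          "'9884877926, 9283183326','','dealer','rgmuthu');}catch(e){};")
print(first_phone_fields(sample))
```

In the loop above this would be called as first_phone_fields(eachproperty['onclick']). Note that it trims any comma-separated field to its first item, which works for the rows shown here because only the phone field ever contains commas.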