regex - Get a valid Python list from a string (JavaScript array)
I'm trying to get a valid Python list from a server response, which you can see below:
window.__search.list=[{"order":"1","base":"law","n":"148904","access":{"css":"avail_yes","title":"\u0422\u0435\u043a\u0441\u0442\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430\u0434\u043e\u0441\u0442\u0443\u043f\u0435\u043d"},"title":"\"\u0410\u0440\u0431\u0438\u0442\u0440\u0430\u0436\u043d\u044b\u0439\u043f\u0440\u043e\u0446\u0435\u0441\u0441\u0443\u0430\u043b\u044c\u043d\u044b\u0439\u043a\u043e\u0434\u0435\u043a\u0441\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u043e\u0439\u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u0438\" \u043e\u0442 24.07.2002 n 95-\u0424\u0417 (\u0440\u0435\u0434. \u043e\u0442 02.07.2013) (\u0441 \u0438\u0437\u043c. \u0438 \u0434\u043e\u043f.,\u0432\u0441\u0442\u0443\u043f\u0430 \u044e\u0449\u0438\u043c\u0438\u0432 \u0441\u0438\u043b\u0443 \u0441 01.08.2013)"}, ... }];
I did it by cutting the "window.__search.list=" prefix and the trailing ";" off the string using data = json.loads(re.search(r"(?=\[)(.*?)\s*(?=\;)", url).group(1)), and the result looked like standard JSON:
[{u'access': {u'css': u'avail_yes', u'title': u'\u0422\u0435\u043a\u0441\u0442\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430 \u0434\u043e\u0441\u0442\u0443\u043f\u0435\u043d'},u'title': u'"\u0410\u0440\u0431\u0438\u0442\u0440\u0430\u0436\u043d\u044b\u0439\u043f\u0440\u043e\u0446\u0435\u0441\u0441\u0443\u0430\u043b\u044c\u043d\u044b\u0439\u043a\u043e\u0434\u0435\u043a\u0441\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u043e\u0439\u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u0438" \u043e\u0442 24.07.2002 n 95-\u0424\u0417 (\u0440\u0435\u0434. \u043e\u0442 02.07.2013) (\u0441 \u0438\u0437\u043c. \u0438 \u0434\u043e \u043f.,\u0432\u0441\u0442\u0443\u043f\u0430\u044e\u0449\u0438\u043c\u0438 \u0432 \u0441 \u0438\u043b\u0443 \u0441 01.08.2013)', u'base': u'law', u'order': u'1', u'n': u'148904'}, ... }]
But sometimes, while iterating over other URLs, I get an error like this:
File "/developer/python/test.py", line 123, in order_search
    data = json.loads(re.search(r"(?=\[)(.*?)\s*(?=\;)", url).group(1))
File "/system/library/frameworks/python.framework/versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
File "/system/library/frameworks/python.framework/versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/system/library/frameworks/python.framework/versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid \uXXXX escape: line 1 column 20235 (char 20235)
How can I fix it, or is there another way to get valid JSON (preferably using native libraries)?
Probably the regular expression found a ';' character somewhere in the middle of the response, and that is what causes the error: the regex returns an incomplete, cropped response, which is why it cannot be converted to JSON.
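To illustrate the suspected failure mode, here is a minimal sketch. The sample response is made up for the demonstration (it is not the real server output); it just places a ';' inside a string value, which is enough to make the lazy pattern from the question stop early.

```python
import json
import re

# Hypothetical response: the first title contains a ';' inside a string value.
response = 'window.__search.list=[{"n":"1","title":"a; b"},{"n":"2","title":"c"}];'

# Pattern from the question: lazy match from the first '[' up to the FIRST ';'.
cropped = re.search(r"(?=\[)(.*?)\s*(?=\;)", response).group(1)
print(cropped)  # the payload is cut off in the middle of the first title

error = None
try:
    json.loads(cropped)
except ValueError as exc:  # on Python 3, json.JSONDecodeError subclasses ValueError
    error = exc
    print("cropped payload is not valid JSON:", exc)
```

The exact error message depends on where the cut lands, so it will not necessarily match the one in the question.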
Yes, I agree with user rickya that using native tools makes the code easier to read than trying to craft a regex. But here, I'd rather use a regular expression, like this:
data = re.search(r'(?=\[)(.*?)[\;]*$', response).group(1)

/(?=\[)(.*?)[\;]*$/
  (?=\[)   positive lookahead: \[ matches the literal character [
  (.*?)    1st capturing group: . matches any character (except newline), between 0 and infinite times [lazy]
  [\;]*    character class, between 0 and infinite times [greedy]: \; matches the character ;
  $        end of string
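As a quick check of the end-anchored pattern, the sketch below runs it on a made-up response (an assumption for illustration, not the real server output) that has a ';' inside a string value. Because the match must run to the end of the string, the inner ';' no longer truncates the payload.

```python
import json
import re

# Hypothetical response with a ';' inside a string value.
response = 'window.__search.list=[{"n":"1","title":"a; b"},{"n":"2","title":"c"}];'

# End-anchored pattern: capture from the first '[' up to the trailing ';' run.
payload = re.search(r'(?=\[)(.*?)[\;]*$', response).group(1)
data = json.loads(payload)
print(data[0]["title"])
```

Note that '.' does not match newlines by default, so a response spread over several lines would likely need re.DOTALL, and anything other than semicolons after the array would end up inside the captured group.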
I believe the variable 'url' actually holds the server's response, so it may be better to name it 'response' instead of 'url'.
And if you have trouble with regular expressions, I'd advise using a regular expression editor such as regex 101. This online editor explains each block of the expression you enter.
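For the "native libraries" route the question asks about, one possible sketch is to let the json module find the end of the array itself with json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and ignores whatever follows. The function name extract_list and the sample response are assumptions made for this example.

```python
import json

def extract_list(response, marker="window.__search.list="):
    """Parse the JSON array that follows `marker`, ignoring trailing text."""
    # Find the first '[' after the marker; raw_decode does not skip whitespace.
    start = response.index("[", response.index(marker) + len(marker))
    # raw_decode returns (parsed_object, index_where_parsing_stopped).
    obj, _end = json.JSONDecoder().raw_decode(response, start)
    return obj

# Hypothetical response with a ';' inside a string value.
sample = 'window.__search.list=[{"n":"1","title":"a; b"}];'
result = extract_list(sample)
print(result)
```

Since the decoder tracks string boundaries itself, a ';' inside a value does not matter here. If the server really does send a malformed \u escape, this will still raise ValueError, which at least points at the actual bad character.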