regex - Get valid Python list from string (JavaScript array)


I'm trying to get a valid Python list from a server response, which you can see below:

window.__search.list=[{"order":"1","base":"law","n":"148904","access":{"css":"avail_yes","title":"\u042 2\u0435\u043a\u0441\u0442\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430\u0434\u043e\u0441\u0442\u0443\u043f\u0435\u043d"},"title":"\"\u0410\u0440\u0431\u0438\u0442\u0440\u0430\u0436\u043d\u044b\u0439\u043f\u0440\u043e\u0446\u0435\u0441\u0441\u0443\u0430\u043b\u044c\u043d\u044b\u0439\u043a\u043e\u0434\u0435\u043a\u0441\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u043e\u0439\u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u0438\" \u043e\u0442 24.07.2002 n 95-\u0424\u0417 (\u0440\u0435\u0434. \u043e\u0442 02.07.2013) (\u0441 \u0438\u0437\u043c. \u0438 \u0434\u043e\u043f.,\u0432\u0441\u0442\u0443\u043f\u0430 \u044e\u0449\u0438\u043c\u0438\u0432 \u0441\u0438\u043b\u0443 \u0441 01.08.2013)"}, ... }];

I did it by cutting the "window.__search.list=" prefix and the trailing ";" off the string using data = json.loads(re.search(r"(?=\[)(.*?)\s*(?=\;)", url).group(1)), and the result looked like standard JSON:

[{u'access': {u'css': u'avail_yes', u'title': u'\u0422\u0435\u043a\u0441\u0442\u0434\u043e\u043a\u04 43\u043c\u0435\u043d\u0442\u0430 \u0434\u043e\u0441\u0442\u0443\u043f\u0435\u043d'},u'title': u'"\u0410\u0440\u0431\u0438\u0442\u0440\u0430\u0436\u043d\u044b\u0439\u043f\u0440\u043e\u0446\u0435\u0441\u0441\u0443\u0430\u043b\u044c\u043d\u044b\u0439\u043a\u043e\u0434\u0435\u043a\u0441\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u043e\u0439\u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u0438" \u043e\u0442 24.07.2002 n 95-\u0424\u0417 (\u04 40\u0435\u0434. \u043e\u0442 02.07.2013) (\u0441 \u0438\u0437\u043c. \u0438 \u0434\u043e \u043f.,\u0432\u0441\u0442\u0443\u043f\u0430\u044e\u0449\u0438\u043c\u0438 \u0432 \u0441 \u0438\u043b\u0443 \u0441 01.08.2013)', u'base': u'law', u'order': u'1', u'n': u'148904'}, ... }]

But sometimes, while iterating over other URLs, I get an error like this:

  File "/developer/python/test.py", line 123, in order_search
    data = json.loads(re.search(r"(?=\[)(.*?)\s*(?=\;)", url).group(1))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid \uXXXX escape: line 1 column 20235 (char 20235)

How can I fix it, or is there maybe a better way to get valid JSON (preferably using native libraries)?

Probably the regular expression found a ';' character somewhere in the middle of the response, and that is what causes the error: the match stops at that ';', so you may have received an incomplete, cropped response, which is why it cannot be converted to JSON.
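A minimal sketch of that failure mode, assuming a made-up response whose title contains a ';' (the sample string is hypothetical, not taken from the real server). The lazy pattern from the question stops at the first ';' it meets, so json.loads receives a truncated string:

```python
import json
import re

# Hypothetical response: the "title" value contains a ';' before the final one.
response = 'window.__search.list=[{"order":"1","title":"a; b"}];'

# Pattern from the question: lazy match from the first '[' up to the FIRST ';'.
cropped = re.search(r"(?=\[)(.*?)\s*(?=\;)", response).group(1)
print(cropped)  # [{"order":"1","title":"a

try:
    json.loads(cropped)
    parsed_ok = True
except ValueError as exc:  # JSON decode errors are ValueError (or a subclass)
    parsed_ok = False
    print("json.loads failed: %s" % exc)
```

The exact error message depends on where the text gets cut, so it will not necessarily be the same one as in the traceback.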

Yes, I agree with user rickya that with native tools the code is easier to read than trying to build a regex. Here, though, I'd rather use a regular expression like this:

data = re.search(r'(?=\[)(.*?)[\;]*$', response).group(1) 
/(?=\[)(.*?)[\;]*$/
  (?=\[)   positive lookahead: asserts that a literal [ follows
  (.*?)    1st capturing group: any character (except newline), between zero and unlimited times [lazy]
  [\;]*    character class matching the literal character ;, between zero and unlimited times [greedy]
  $        end of string
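As a quick check of the end-anchored pattern above, here is a small sketch run against a made-up response (the sample string is an assumption, chosen so that a ';' appears inside a value):

```python
import json
import re

# Hypothetical response with a ';' inside the "title" value.
response = 'window.__search.list=[{"order":"1","title":"a; b"}];'

# Lazy group, but anchored to the end of the string by [\;]*$,
# so an inner ';' no longer ends the match early.
match = re.search(r'(?=\[)(.*?)[\;]*$', response)
data = json.loads(match.group(1))
print(data[0]["title"])  # a; b
```

Two caveats: '.' does not match newlines unless re.DOTALL is passed, and trailing spaces after the final ';' would end up inside the captured group, so calling response.strip() first may be a sensible precaution.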

I believe the variable 'url' actually holds the server's response, so it would be better to name it 'response' instead of 'url'.

And if you have trouble with regular expressions, I advise using a regular expression editor such as regex 101. This online editor explains each block of the expression you enter.
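For the native-tools route mentioned earlier, one possible sketch without any regex uses json.JSONDecoder.raw_decode from the standard library, which parses a single JSON value starting at a given index and ignores whatever follows it. The helper name extract_list and the sample string are hypothetical, and the sketch assumes the first '[' in the response is the start of the array:

```python
import json


def extract_list(response):
    """Parse the first JSON array found in `response` (hypothetical helper)."""
    start = response.index("[")  # raises ValueError if there is no '['
    # raw_decode returns (parsed_value, index_where_parsing_stopped);
    # the trailing ';' after the array is simply never looked at.
    data, _end = json.JSONDecoder().raw_decode(response, start)
    return data


response = 'window.__search.list=[{"order":"1","title":"a; b"}];'
items = extract_list(response)
print(items[0]["order"])  # 1
```

Because the JSON parser itself decides where the array ends, a ';' inside a string value does not matter here.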

