python - How to parse xml-file with directory structure
I've got an XML file containing the directory structure of files that I want to put into a tar.gz file (flattened).
How should I parse the XML to extract the path of each file?
Right now I'm using lxml and finding the paths like this:
    paths = []
    for case in root.iter('case'):
        for language in case.iter('language'):
            for result in language.iter('result'):
                for file in result.iter('file'):
                    # Join the id of every level into one slash-separated path
                    paths.append('/'.join(node.get('id') for node in [case, language, result, file]))
But this feels a bit hardcoded and will not work if the structure changes.
I can find each file node with root.iter('file'), but how can I get the parents/directories of each node/file? Or should I do it in a (completely?) different way?
The XML looks like this:
    <?xml version="1.0" encoding="utf-8"?>
    <files batch="regular">
      <case id="case_10_some_description">
        <language id="english">
          <result id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
            <file id="screenshot_4.png"/>
            <file id="screenshot_5.png"/>
            <file id="screenshot_6.png"/>
          </result>
        </language>
      </case>
      <case id="case_12_some_description">
        <language id="english">
          <result id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
          </result>
        </language>
      </case>
    </files>
And the files:
    regular/case_10_some_description/english/images/screenshot_1.png
    regular/case_10_some_description/english/images/screenshot_2.png
    regular/case_10_some_description/english/images/screenshot_3.png
    regular/case_10_some_description/english/images/screenshot_4.png
    regular/case_10_some_description/english/images/screenshot_5.png
    regular/case_10_some_description/english/images/screenshot_6.png
    regular/case_12_some_description/english/images/screenshot_1.png
    regular/case_12_some_description/english/images/screenshot_2.png
    regular/case_12_some_description/english/images/screenshot_3.png
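One structure-agnostic way to answer the "how do I get the parents of each file node" part is to start at every file element and walk upwards, collecting names on the way. The sketch below is only an illustration, not necessarily what the asker ended up with. It uses the standard library's xml.etree.ElementTree with a child-to-parent map so it runs without extra packages; lxml elements additionally provide getparent() and iterancestors(), which would make the map unnecessary. The function name file_paths and the shortened sample document are assumptions made for the example, as is the choice to take the root's name from its batch attribute.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (bytes, because of the encoding declaration).
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<files batch="regular">
  <case id="case_10_some_description">
    <language id="english">
      <result id="images">
        <file id="screenshot_1.png"/>
        <file id="screenshot_2.png"/>
      </result>
    </language>
  </case>
</files>"""


def file_paths(root):
    """Return one slash-separated path per <file> element, whatever the nesting depth."""
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    paths = []
    for node in root.iter('file'):
        parts = []
        while node is not None:
            # Assumption: the root stores its name in 'batch' rather than 'id'.
            name = node.get('id') or node.get('batch')
            if name:
                parts.append(name)
            node = parent_of.get(node)  # None once the root has been processed
        paths.append('/'.join(reversed(parts)))
    return paths


paths = file_paths(ET.fromstring(XML))
for path in paths:
    print(path)
```

Because the loop never names the intermediate tags, adding or removing a level in the XML does not require changing the code.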
Do you create the file schema on your own? If you can change it, definitely do so. Try to make it look like this:
    <?xml version="1.0" encoding="utf-8"?>
    <directory id="regular">
      <directory id="case_10_some_description">
        <directory id="english">
          <directory id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
            <file id="screenshot_4.png"/>
            <file id="screenshot_5.png"/>
            <file id="screenshot_6.png"/>
          </directory>
        </directory>
      </directory>
      <directory id="case_12_some_description">
        <directory id="english">
          <directory id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
          </directory>
        </directory>
      </directory>
    </directory>
Always give tags the same name if they have the same meaning. Maybe use more distinct attributes on the tags to make parsing easier.
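To illustrate why a uniform schema is easier to parse, here is a minimal sketch of a recursive walk over directory/file tags. It assumes every element carries its name in the id attribute, as in the suggested schema; the function name walk and the shortened sample document are made up for the example, and the standard library's xml.etree.ElementTree stands in for lxml so the snippet is self-contained.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the suggested schema (bytes, because of the encoding declaration).
DIR_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<directory id="regular">
  <directory id="case_12_some_description">
    <directory id="english">
      <directory id="images">
        <file id="screenshot_1.png"/>
        <file id="screenshot_2.png"/>
      </directory>
    </directory>
  </directory>
</directory>"""


def walk(node, prefix=()):
    """Yield the path of every <file> at or below node, recursing through <directory> tags."""
    here = prefix + (node.get('id'),)
    if node.tag == 'file':
        yield '/'.join(here)
    else:
        # Any other tag is treated as a directory and descended into.
        for child in node:
            yield from walk(child, here)


dir_paths = list(walk(ET.fromstring(DIR_XML)))
for path in dir_paths:
    print(path)
```

Since only two tag names exist, the same function handles any nesting depth, and files may sit at any level of the tree.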