Read contents of a .tar.gz file from a website into a Python 3.x object


I am new to Python, and I can't figure out what I am doing wrong when trying to read the contents of a .tar.gz file into Python. The tarfile I would like to read is hosted at the following web address:

ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar.gz

More info on the file is at this site (just so you can trust the contents): http://www.pubmedcentral.nih.gov/utils/oa/oa.fcgi?id=pmc13901

The tarfile contains .pdf and .nxml copies of the journal article, plus a couple of image files.

If I open the file in a browser by copying and pasting the address, I can save it to a location on my PC and open it with tarfile just fine using the following commands (note: WinZip changes the file from .tar.gz to .tar when I save it to that location):

    import tarfile
    thetarfile = "c:/users/dfcm/documents/breast_cancer_res_2001_nov_9_3(1)_61-65.tar"
    tfile = tarfile.open(thetarfile)
    tfile

However, if I try to access the file directly using similar commands:

    thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar.gz"
    bbb = tarfile.open(thetarfile)

That results in the following error:

    Traceback (most recent call last):
      File "<pyshell#137>", line 1, in <module>
        bbb = tarfile.open(thetarfile)
      File "c:\python30\lib\tarfile.py", line 1625, in open
        return func(name, "r", fileobj, **kwargs)
      File "c:\python30\lib\tarfile.py", line 1687, in gzopen
        fileobj = bltn_open(name, mode + "b")
      File "c:\python30\lib\io.py", line 278, in __new__
        return open(*args, **kwargs)
      File "c:\python30\lib\io.py", line 222, in open
        closefd)
      File "c:\python30\lib\io.py", line 615, in __init__
        _fileio._FileIO.__init__(self, name, mode, closefd)
    IOError: [Errno 22] Invalid argument: 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar'

Can anyone explain what I am doing wrong when trying to read this .tar.gz file directly from the web address? Thanks in advance. Chris

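Unfortunately, you cannot just open files over the network: tarfile.open treats its first argument as a local file name, which is why the URL is rejected as an invalid argument. Things are a bit more complex here. You have to instruct the interpreter to create a network request and create an object representing the request state. This can be done using the urllib module.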

    import urllib.request
    import tarfile
    thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar.gz"
    ftpstream = urllib.request.urlopen(thetarfile)
    thetarfile = tarfile.open(fileobj=ftpstream, mode="r|gz")

The ftpstream object is file-like and represents the connection to the FTP server. The tarfile module can then read from this stream. Since we do not pass a filename, we have to specify the compression in the mode parameter.
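Once the archive is open in stream mode, its members can only be read in order, front to back. Below is a minimal sketch of how that might look. The helper names list_members and make_sample_archive are hypothetical and not part of the original answer, and the sample archive is built in memory with made-up file names so the example runs without a network connection.

```python
import io
import tarfile


def list_members(stream):
    """Return (name, size) for each regular file in a gzip-compressed tar stream."""
    found = []
    # "r|gz" reads the archive strictly front to back, so it also works on
    # non-seekable objects such as the one returned by urllib.request.urlopen.
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        for member in archive:
            if member.isfile():
                # In stream mode, read each member before moving on to the next.
                data = archive.extractfile(member).read()
                found.append((member.name, len(data)))
    return found


def make_sample_archive():
    """Build a small .tar.gz in memory (a stand-in for the FTP download)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # File names and contents here are illustrative assumptions only.
        for name, payload in [("article.nxml", b"<article/>"), ("article.pdf", b"%PDF-1.4")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


sample_listing = list_members(make_sample_archive())
print(sample_listing)
```

With the real file, the same helper should accept the ftpstream object from the snippet above in place of the in-memory buffer. If you need random access to members rather than one sequential pass, one option is to read the whole download into an io.BytesIO first and open it with mode="r:gz".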

