Read contents of a .tar.gz file from a website into a Python 3.x object
I am new to Python. I can't figure out what I am doing wrong when trying to read the contents of a .tar.gz file into Python. The tarfile I would like to read is hosted at the following web address:
ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar.gz
More info on the file is at this site (just so you can trust the contents): http://www.pubmedcentral.nih.gov/utils/oa/oa.fcgi?id=pmc13901
The tarfile contains .pdf and .nxml copies of the journal article, and also a couple of image files.
If I open the file in my browser by copying and pasting the address, I can save it to a location on my PC and import the tarfile fine using the following commands (note: WinZip changes the file from .tar.gz to .tar when I save it to that location):
    import tarfile
    thetarfile = "c:/users/dfcm/documents/breast_cancer_res_2001_nov_9_3(1)_61-65.tar"
    tfile = tarfile.open(thetarfile)
    tfile
However, if I try to access the file directly using similar commands:
    thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar.gz"
    bbb = tarfile.open(thetarfile)
That results in the following error:
    Traceback (most recent call last):
      File "<pyshell#137>", line 1, in <module>
        bbb = tarfile.open(thetarfile)
      File "c:\python30\lib\tarfile.py", line 1625, in open
        return func(name, "r", fileobj, **kwargs)
      File "c:\python30\lib\tarfile.py", line 1687, in gzopen
        fileobj = bltn_open(name, mode + "b")
      File "c:\python30\lib\io.py", line 278, in __new__
        return open(*args, **kwargs)
      File "c:\python30\lib\io.py", line 222, in open
        closefd)
      File "c:\python30\lib\io.py", line 615, in __init__
        _fileio._fileio.__init__(self, name, mode, closefd)
    IOError: [Errno 22] Invalid argument: 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar'
Can anyone explain what I am doing wrong when trying to read this .tar.gz file directly from the web address? Thanks in advance. Chris
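Unfortunately, you cannot just open files from the network: tarfile.open() treats the string you pass as a local file name, which is why it fails with "Invalid argument". Things are a bit more complex here. You have to instruct the interpreter to create a network request and create an object representing the state of that request. This can be done using the urllib module.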
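    import urllib.request
    import tarfile
    thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/breast_cancer_res_2001_nov_9_3(1)_61-65.tar.gz"
    # Open the connection; the result is a file-like object.
    ftpstream = urllib.request.urlopen(thetarfile)
    # Read the archive from the stream ("r|gz" = gzip-compressed, forward-only).
    thetarfile = tarfile.open(fileobj=ftpstream, mode="r|gz")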
The ftpstream object is a file-like object that represents the connection to the FTP server. The tarfile module can then read from this stream. Since we do not pass a file name, we have to specify the compression in the mode parameter.
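To show what the "r|gz" stream mode needs from the file-like object, here is a minimal sketch that runs without a network connection. It is only an illustration under stated assumptions: ForwardOnlyStream, build_sample_archive, read_members and the member names are hypothetical and stand in for the urlopen() response and the real archive contents. The stand-in object offers nothing but read(), which is roughly the situation with a network connection that cannot seek backwards.

```python
import io
import tarfile


class ForwardOnlyStream:
    """Hypothetical stand-in for a network response: it only offers read()."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)

    def read(self, size=-1):
        return self._buffer.read(size)


def build_sample_archive():
    """Build a small gzip-compressed tar archive in memory (made-up contents)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in (("article.nxml", b"<article/>"),
                           ("article.pdf", b"%PDF-1.4")):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def read_members(stream):
    """Read every regular file from a forward-only .tar.gz stream."""
    contents = {}
    # No file name is given, so the compression must be named in the mode.
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        # In stream mode, members have to be read in the order they arrive.
        for member in archive:
            if member.isfile():
                contents[member.name] = archive.extractfile(member).read()
    return contents


result = read_members(ForwardOnlyStream(build_sample_archive()))
print(sorted(result))
```

With the real file, the object returned by urllib.request.urlopen() would take the place of ForwardOnlyStream. Because the stream cannot go backwards, each member should be read as soon as the loop reaches it.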