Reading the contents of a tar file in a Python script without extracting it

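I have a tar file that contains a number of files. I need to write a Python script that reads the contents of those files and reports the total character count (letters, spaces, newlines, everything) without extracting the tar file.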

You can use getmembers():

    >>> import tarfile
    >>> tar = tarfile.open("test.tar")
    >>> tar.getmembers()
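To see what getmembers() actually returns, here is a minimal sketch that builds a small archive in memory and lists each member's name, size in bytes, and whether it is a regular file. The helpers make_demo_tar and describe_members, and the archive contents, are hypothetical demo data rather than part of the original answer. Each member is a tarfile.TarInfo object, and nothing is written to disk.

```python
import io
import tarfile


def make_demo_tar():
    """Build a small in-memory tar archive (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # A directory entry
        d = tarfile.TarInfo("docs")
        d.type = tarfile.DIRTYPE
        tar.addfile(d)
        # A regular file inside it
        data = b"hello world\n"
        info = tarfile.TarInfo("docs/a.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def describe_members(fileobj):
    """Return (name, size, is_regular_file) for every member, without extracting."""
    with tarfile.open(fileobj=fileobj) as tar:
        return [(m.name, m.size, m.isfile()) for m in tar.getmembers()]


print(describe_members(make_demo_tar()))
```

The isfile() flag is worth checking before reading, because only regular files (and links) have content to read.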

After that, you can use extractfile() to get a member as a file object. Just an example:

    import os
    import tarfile

    os.chdir("/tmp/foo")  # directory that contains test.tar
    with tarfile.open("test.tar") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is None:  # directories and special files have no readable content
                continue
            content = f.read().decode("utf-8")  # assumes the members are UTF-8 text
            print("%s has %d newlines" % (member.name, content.count("\n")))
            print("%s has %d spaces" % (member.name, content.count(" ")))
            print("%s has %d characters" % (member.name, len(content)))

In the example above, the file object `f` supports read(), readlines(), and similar methods.
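The question asks for a total over all files, so the per-member loop can be extended to accumulate counts across the whole archive. This is a minimal sketch assuming the members are UTF-8 text. count_totals and make_demo_tar are hypothetical names used for illustration. It skips every member whose isfile() is false, so directories, symlinks, and hard links are not counted (extractfile() would otherwise follow links and count the target twice).

```python
import io
import tarfile


def make_demo_tar():
    """Build a small in-memory tar archive (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in [("a.txt", "one two\n"), ("b.txt", "three\nfour\n")]:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def count_totals(fileobj, encoding="utf-8"):
    """Sum characters, spaces and newlines over every regular file in the archive."""
    totals = {"characters": 0, "spaces": 0, "newlines": 0}
    with tarfile.open(fileobj=fileobj) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links and special files
            text = tar.extractfile(member).read().decode(encoding)
            totals["characters"] += len(text)  # decoded characters, not bytes
            totals["spaces"] += text.count(" ")
            totals["newlines"] += text.count("\n")
    return totals


print(count_totals(make_demo_tar()))
```

To run it on a real archive, pass an open binary file, e.g. `with open("test.tar", "rb") as fh: print(count_totals(fh))`. Counting after decoding means `len()` reports characters; for non-ASCII text this differs from the byte count in `member.size`.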

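You need to use the tarfile module. Specifically, you use an instance of the TarFile class to access the archive, and then call TarFile.getnames() to get the member names: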

     |  getnames(self)
     |      Return the members of the archive as a list of their names. It has
     |      the same order as the list returned by getmembers().
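Here is a small sketch of getnames() against an in-memory archive. make_demo_tar and list_names are hypothetical helpers. The function also returns the names taken from getmembers(), so the documented guarantee that both lists come back in the same order can be checked directly.

```python
import io
import tarfile


def make_demo_tar():
    """Build a small in-memory tar archive (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("a.txt", b"alpha\n"), ("sub/b.txt", b"beta\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def list_names(fileobj):
    """Return getnames() alongside the names taken from getmembers()."""
    with tarfile.open(fileobj=fileobj) as tar:
        return tar.getnames(), [m.name for m in tar.getmembers()]


names, member_names = list_names(make_demo_tar())
print(names)
```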

If you want to read the contents, use this method:

     |  extractfile(self, member)
     |      Extract a member from the archive as a file object. `member' may be
     |      a filename or a TarInfo object. If `member' is a regular file, a
     |      file-like object is returned. If `member' is a link, a file-like
     |      object is constructed from the link's target. If `member' is none of
     |      the above, None is returned.
     |      The file-like object is read-only and provides the following
     |      methods: read(), readline(), readlines(), seek() and tell()
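The following sketch shows the behaviour described in that docstring, assuming Python 3, where the returned object is an io.BufferedReader that yields bytes. A directory member yields None, while a regular file can be looked up by its name string and read with readline() and seek(). make_demo_tar and peek are hypothetical helpers.

```python
import io
import tarfile


def make_demo_tar():
    """Build a small in-memory tar archive (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        d = tarfile.TarInfo("notes")
        d.type = tarfile.DIRTYPE
        tar.addfile(d)
        data = b"first line\nsecond line\n"
        info = tarfile.TarInfo("notes/todo.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def peek(fileobj):
    with tarfile.open(fileobj=fileobj) as tar:
        # A directory has no data, so extractfile() returns None for it
        dir_member = next(m for m in tar.getmembers() if m.isdir())
        dir_obj = tar.extractfile(dir_member)
        # A regular file can be opened by name instead of by TarInfo
        f = tar.extractfile("notes/todo.txt")
        first = f.readline()
        f.seek(0)  # rewind and read everything
        everything = f.read()
    return dir_obj, first, everything


print(peek(make_demo_tar()))
```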

The approach @stefano-borini mentioned can be implemented like this to access a tar archive member by file name:

    # Python 3; myArchive is an already-opened tarfile.TarFile
    myFile = myArchive.extractfile(
        dict(zip(myArchive.getnames(), myArchive.getmembers()))['path/to/file']
    ).read()
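Because extractfile() already accepts a member name (per the docstring quoted earlier), the dict(zip(...)) lookup can be replaced by a direct call. If you also need the member's metadata, TarFile.getmember(name) returns its TarInfo and raises KeyError if the name is absent. Here is a minimal sketch, with make_demo_tar and read_member as hypothetical helpers.

```python
import io
import tarfile


def make_demo_tar():
    """Build a small in-memory tar archive (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"payload"
        info = tarfile.TarInfo("path/to/file")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def read_member(fileobj, name):
    with tarfile.open(fileobj=fileobj) as tar:
        # Option 1: look up the TarInfo explicitly (KeyError if missing)
        info = tar.getmember(name)
        via_getmember = tar.extractfile(info).read()
        # Option 2: pass the name straight to extractfile()
        via_name = tar.extractfile(name).read()
    return via_getmember, via_name


print(read_member(make_demo_tar(), "path/to/file"))
```

If the same name occurs more than once in an archive, getmember() returns the last occurrence, which matches what the dict(zip(...)) version ends up with.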

Credits: