Reading and parsing a TSV file, then manipulating it for saving as CSV (*efficiently*)

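My source data is in a TSV file with 6 columns and more than 2 million rows.

Here is what I am trying to accomplish:

1. I need to read the data in 3 of the columns (3, 4, 5) of this source file.
2. The fifth column is an integer. I need to use this integer value to duplicate a row entry built from the data in the third and fourth columns (as many times as the integer says).
3. I want to write the output of #2 to an output file in CSV format.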

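Below is what I came up with.

My question: is this an efficient way to do it? It seems like it could be intensive when run on 2 million rows.

First, I made a sample tab-separated file to work with and called it 'sample.txt'. It is basic and has only four rows: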

    Row1_Column1 Row1-Column2 Row1-Column3 Row1-Column4 2 Row1-Column6
    Row2_Column1 Row2-Column2 Row2-Column3 Row2-Column4 3 Row2-Column6
    Row3_Column1 Row3-Column2 Row3-Column3 Row3-Column4 1 Row3-Column6
    Row4_Column1 Row4-Column2 Row4-Column3 Row4-Column4 2 Row4-Column6

Then I have this code:

    import csv

    with open('sample.txt','r') as tsv:
        AoA = [line.strip().split('\t') for line in tsv]

    for a in AoA:
        count = int(a[4])
        while count > 0:
            with open('sample_new.csv','ab') as csvfile:
                csvwriter = csv.writer(csvfile, delimiter=',')
                csvwriter.writerow([a[2], a[3]])
            count = count - 1

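You should use the csv module to read the tab-separated value file. Do not read it into memory in one go. Every row you read has all the information you need to write rows to the output CSV file, after all. Keep the output file open throughout, rather than reopening it for every row you write.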

    import csv

    with open('sample.txt','rb') as tsvin, open('new.csv', 'wb') as csvout:
        tsvin = csv.reader(tsvin, delimiter='\t')
        csvout = csv.writer(csvout)

        for row in tsvin:
            count = int(row[4])
            if count > 0:
                csvout.writerows([row[2:4] for _ in xrange(count)])
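The snippet above is written for Python 2 (binary file modes and `xrange`). For readers on Python 3, a minimal sketch of the same streaming approach might look like the following. The function name `expand_tsv_to_csv` and its parameters are made up for illustration and are not part of the original answer; the main differences are text-mode files opened with `newline=''` and `range` in place of `xrange`.

```python
import csv


def expand_tsv_to_csv(src_path, dst_path):
    """Stream a TSV file, writing columns 3 and 4 of each row
    as many times as the integer in column 5 says.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # newline='' lets the csv module handle line endings itself,
    # which is the documented convention for csv files in Python 3.
    with open(src_path, newline='') as tsvin, \
            open(dst_path, 'w', newline='') as csvout:
        reader = csv.reader(tsvin, delimiter='\t')
        writer = csv.writer(csvout)

        # Only one input row is held in memory at a time.
        for row in reader:
            count = int(row[4])
            if count > 0:
                # range replaces Python 2's xrange
                writer.writerows(row[2:4] for _ in range(count))
```

With the sample file from the question, this would be called as `expand_tsv_to_csv('sample.txt', 'new.csv')`.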

Or, using the itertools module, do the repeating with itertools.repeat():

    from itertools import repeat
    import csv

    with open('sample.txt','rb') as tsvin, open('new.csv', 'wb') as csvout:
        tsvin = csv.reader(tsvin, delimiter='\t')
        csvout = csv.writer(csvout)

        for row in tsvin:
            count = int(row[4])
            if count > 0:
                csvout.writerows(repeat(row[2:4], count))
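As a possible variation on the itertools approach, the whole conversion can be expressed as one lazy pipeline that is handed to a single `writerows()` call. This is only a sketch for Python 3, not something the answer itself proposes: the helper names `repeated_rows` and `convert` are hypothetical, and the functions take already-open file objects so they can be tried on in-memory data.

```python
import csv
from itertools import chain, repeat


def repeated_rows(rows):
    """Lazily yield [col3, col4] for each row, repeated int(col5) times.

    Hypothetical helper for illustration only.
    """
    # repeat(x, 0) yields nothing, so a count of zero needs no special case.
    return chain.from_iterable(
        repeat(row[2:4], int(row[4])) for row in rows
    )


def convert(tsv_file, csv_file):
    """Read tab-separated rows from tsv_file and write the expanded CSV."""
    reader = csv.reader(tsv_file, delimiter='\t')
    # writerows consumes the generator one row at a time, so the input
    # is never loaded into memory as a whole.
    csv.writer(csv_file).writerows(repeated_rows(reader))
```

With real files, both would be opened with `newline=''` (and the output in `'w'` mode) before being passed to `convert`.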
