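Split one file into multiple files based on a delimiter

I have a file with -| as the delimiter after each section. I need to create a separate file for each section using Unix tools.

Example of the input file: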

    wertretr
    ewretrtret
    1212132323
    000232
    -|
    ereteertetet
    232434234
    erewesdfsfsfs
    0234342343
    -|
    jdhg3875jdfsgfd
    sjdhfdbfjds
    347674657435
    -|

Expected result in file 1:

    wertretr
    ewretrtret
    1212132323
    000232
    -|

Expected result in file 2:

    ereteertetet
    232434234
    erewesdfsfsfs
    0234342343
    -|

Expected result in file 3:

    jdhg3875jdfsgfd
    sjdhfdbfjds
    347674657435
    -|

A one-liner, no programming (except regular expressions and the like):

    csplit --digits=2 --quiet --prefix=outfile infile "/-|/+1" "{*}"

Or with awk:

    awk '{print $0 " -|" > ("file" NR)}' RS='-\\|' input-file

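Debian has csplit, but I don't know whether it is common to all, most, or only some other distributions. If it isn't, tracking down the source and compiling it shouldn't be too hard.

I solved a slightly different problem, where the file contains a line naming the file that the following text should go into. This Perl code did the trick for me: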

    #!/path/to/perl -w

    # comment the line below for UNIX systems
    use Win32::Clipboard;

    use Getopt::Long;

    # Print usage when called without any arguments
    if (@ARGV == 0) {
        print STDERR "usage: ncsplit.pl --mff=MARKER filename.txt [...] \n\nNote that no space is allowed between the '--' and the related parameter.\n\nThe mff is found on a line followed by a filename. All of the contents of filename.txt are written to that file until another mff is found.\n";
        exit;
    }

    # Get command line flags; "=s" makes --mff take a string value
    my $mff = "";
    GetOptions('mff=s' => \$mff);

    # set a default $mff variable
    if ($mff eq "") { $mff = "-#-" }
    print("using file switch=", $mff, "\n\n");

    while ($_ = shift @ARGV) {
        if (-f "$_") {
            push @filelist, $_;
        }
    }

    # Could be more than one file name on the command line,
    # but this version throws away the subsequent ones.
    $readfile = $filelist[0];

    open SOURCEFILE, "<$readfile" or die "File not found...\n\n";

    while (<SOURCEFILE>) {
        if (/^\Q$mff\E (.*)$/) {
            # marker line: the rest of the line names the next output file
            $outname = $1;
            open OUTFILE, ">$outname" or die "Cannot open $outname\n";
            print "opened $outname\n";
        }
        else {
            print OUTFILE $_;
        }
    }

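You could also use awk. I'm not very familiar with awk, but the following seemed to work for me. It generated part1.txt, part2.txt, part3.txt, and part4.txt. Note that the last partn.txt file it generates is empty. I'm not sure how to fix that, but I'm sure it could be done with a little tweaking. Any suggestions, anyone?

awk_pattern file:

    BEGIN{ fn = "part1.txt"; n = 1 }
    {
        print > fn
        if (substr($0,1,2) == "-|") {
            close (fn)
            n++
            fn = "part" n ".txt"
        }
    }

bash command: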

awk -f awk_pattern input.file
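
On the empty trailing file: one way around it is to open an output file only when a line actually needs to be written, instead of preparing the next file right after each delimiter. Below is a minimal Python sketch of that idea, not a fix to the awk script itself. The function name `split_sections` and the `part<N>.txt` naming are purely illustrative, and it assumes the delimiter sits at the start of its own line, as the awk version does.

```python
import os


def split_sections(lines, out_dir, delimiter="-|"):
    """Write each delimiter-terminated section to part<N>.txt in out_dir.

    The output file is opened lazily, when the first line of a section
    arrives, so nothing is created after the final delimiter.
    Returns the list of paths written.
    """
    written = []
    out = None
    for line in lines:
        if out is None:
            path = os.path.join(out_dir, "part{}.txt".format(len(written) + 1))
            out = open(path, "w")
            written.append(path)
        out.write(line)
        if line.startswith(delimiter):
            # The delimiter line closes the current section.
            out.close()
            out = None
    if out is not None:
        out.close()
    return written
```

It could be called as `split_sections(open("input.file"), ".")`; any iterable of lines works.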

Here is a Python 3 script that splits a file into multiple files, taking each output filename from the delimiter line. Example input file:

    # Ignored

    ######## FILTER BEGIN foo.conf
    This goes in foo.conf.
    ######## FILTER END

    # Ignored

    ######## FILTER BEGIN bar.conf
    This goes in bar.conf.
    ######## FILTER END

Here is the script:

    #!/usr/bin/env python3

    import os
    import argparse

    # global settings
    start_delimiter = '######## FILTER BEGIN'
    end_delimiter = '######## FILTER END'

    # parse command line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input-file", required=True, help="input filename")
    parser.add_argument("-o", "--output-dir", required=True, help="output directory")

    args = parser.parse_args()

    # read the input file
    with open(args.input_file, 'r') as input_file:
        input_data = input_file.read()

    # iterate through the input data by line
    input_lines = input_data.splitlines()
    while input_lines:
        # discard lines until the next start delimiter
        while input_lines and not input_lines[0].startswith(start_delimiter):
            input_lines.pop(0)

        # corner case: no delimiter found and no more lines left
        if not input_lines:
            break

        # extract the output filename from the start delimiter
        output_filename = input_lines.pop(0).replace(start_delimiter, "").strip()
        output_path = os.path.join(args.output_dir, output_filename)

        # open the output file
        print("extracting file: {0}".format(output_path))
        with open(output_path, 'w') as output_file:
            # while we have lines left and they don't match the end delimiter
            while input_lines and not input_lines[0].startswith(end_delimiter):
                output_file.write("{0}\n".format(input_lines.pop(0)))

            # remove end delimiter if present
            if input_lines:
                input_lines.pop(0)

Finally, here is how you run it:

 $ python3 script.py -i input-file.txt -o ./output-folder/ 
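
The script above reads the whole file into a list and repeatedly calls `pop(0)`, which is fine for small inputs. For larger ones, the same BEGIN/END logic can probably be written as a single pass over the lines. The following is a rough sketch under the same two delimiter strings. The helper name `iter_filtered_blocks` is made up for illustration, and it yields (filename, lines) pairs rather than writing files.

```python
START = "######## FILTER BEGIN"
END = "######## FILTER END"


def iter_filtered_blocks(lines):
    """Yield (filename, body_lines) for each BEGIN ... END block.

    Lines outside a block are ignored. A block that is never closed
    is still yielded when the input runs out.
    """
    name = None
    body = []
    for line in lines:
        line = line.rstrip("\n")
        if name is None:
            if line.startswith(START):
                # The rest of the start line is the output filename.
                name = line[len(START):].strip()
                body = []
        elif line.startswith(END):
            yield name, body
            name = None
        else:
            body.append(line)
    if name is not None:
        yield name, body
```

Each pair could then be written out under the output directory in the same way the script does.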
    cat file | ( I=0; echo -n "" > file0; while IFS= read -r line; do echo "$line" >> file$I; if [ "$line" == '-|' ]; then I=$((I+1)); echo -n "" > file$I; fi; done )

And the formatted version:

    #!/bin/bash
    cat FILE | (
      I=0;
      echo -n "" > file0;
      while IFS= read -r line; do
        echo "$line" >> file$I;
        if [ "$line" == '-|' ]; then
          I=$((I+1));
          echo -n "" > file$I;
        fi;
      done;
    )

The following command works for me. Hope it helps.

    awk 'BEGIN{file = 0; filename = "output_" file ".txt"}
        /-\|/ {getline; file++; filename = "output_" file ".txt"}
        {print $0 > filename}' input
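
One detail worth checking with any of the regex-based commands in this thread: `|` is the alternation operator in regular expressions, so the pipe in the `-|` delimiter has to be escaped to match literally. The short check below illustrates this with Python's `re` module. Python is used only for illustration; awk's behaviour for an empty alternative may differ, since POSIX leaves that case undefined. The helper `is_delimiter_line` is a hypothetical name.

```python
import re

# Unescaped, "-|" means "a hyphen OR the empty string", so it matches
# at the start of any line, delimiter or not.
unescaped = re.compile("-|")
# re.escape produces a pattern that matches the two characters literally.
literal = re.compile(re.escape("-|"))


def is_delimiter_line(line, pattern):
    """Return True if pattern matches at the start of line."""
    return pattern.match(line) is not None
```

With these definitions, `is_delimiter_line("wertretr", unescaped)` is true even though the line is ordinary data, while the escaped pattern accepts only lines that really begin with `-|`.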

Use csplit if you have it.

If you don't, but you have Python... don't use Perl.

Assuming your sample file is called "samplein":

    $ python -c "import sys
    for i, c in enumerate(sys.stdin.read().split('-|')):
        open(f'out{i}', 'w').write(c)" < samplein

If you have Python 3.5 or lower, you can't use f-strings:

    $ python -c "import sys
    for i, c in enumerate(sys.stdin.read().split('-|')):
        open('out' + str(i), 'w').write(c)" < samplein

Now:

    $ ls out*
    out0 out1 out2 out3
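
Note that `str.split` drops the `-|` itself and leaves a final piece after the last delimiter (here `out3`), which for the sample should hold little more than a trailing newline. If the output is meant to match the question exactly, with each file ending in `-|` and no leftover file, something along these lines might do. `split_keep_delimiter` is an illustrative name, not part of any library, and the sketch assumes the whole input fits in memory.

```python
def split_keep_delimiter(text, delimiter="-|"):
    """Return the sections of text, each ending with the delimiter line.

    Whitespace-only leftovers (such as the newline after the final
    delimiter) are dropped.
    """
    sections = []
    pieces = text.split(delimiter)
    for index, piece in enumerate(pieces):
        is_last = index == len(pieces) - 1
        # Every piece after the first starts with the newline that
        # followed the previous delimiter; strip it.
        body = piece.lstrip("\n")
        if is_last:
            # Text after the final delimiter has no terminator of its own.
            if body.strip():
                sections.append(body)
        else:
            sections.append(body + delimiter + "\n")
    return sections
```

Each returned section can then be written to its own file, for example by enumerating the list and opening `'out' + str(i)` as in the one-liner.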

Here is a context-split program I wrote for this kind of problem: http://stromberg.dnsalias.org/~strombrg/context-split.html

    $ ./context-split -h
    usage: ./context-split [-s separator] [-n name] [-z length]
            -s specifies what regex should separate output files
            -n specifies how output files are named (default: numeric
            -z specifies how long numbered filenames (if any) should be
            -i include line containing separator in output files
            operations are always performed on stdin

Here is some Perl code that will do the job:

    #!/usr/bin/perl
    open(FI,"file.txt") or die "Input file not found";
    $cur=0;
    open(FO,">res.$cur.txt") or die "Cannot open output file $cur";
    while(<FI>)
    {
        print FO $_;
        if(/^-\|/)
        {
            close(FO);
            $cur++;
            open(FO,">res.$cur.txt") or die "Cannot open output file $cur"
        }
    }
    close(FO);