SQL Server bulk insert of a CSV file with inconsistent quotes

Is it possible to BULK INSERT (SQL Server) a CSV file in which the fields are only OCCASSIONALLY surrounded by quotes? Specifically, quotes only surround those fields that contain a ",".

In other words, I have data that looks like this (the first row contains headers):

 id, company, rep, employees
 729216,INGRAM MICRO INC.,"Stuart, Becky",523
 729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114
 721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253

Because the quotes aren't consistent, I can't use '","' as a delimiter, and I don't know how to create a format file that accounts for this.

I tried using ',' as the delimiter and loading into a temp table where every column is a varchar, then using some kludgy processing to strip out the quotes, but that doesn't work either, because the fields that contain ',' get split across multiple columns.

Unfortunately, I don't have the ability to pre-process the CSV file.

Is this hopeless?

Many thanks in advance for any advice.

By the way, I did see this post, SQL bulk import from csv, but in that case, EVERY field was consistently wrapped in quotes. So, in that case, he could use ',' as a delimiter, then strip out the quotes afterwards.

You are going to need to preprocess the file, period.

If you really really need to do this, here is the code. I wrote this because I absolutely had no choice. It is utility code and I'm not proud of it, but it works. The approach is not to get SQL to understand quoted fields, but instead to manipulate the file to use an entirely different delimiter.

EDIT: Here is the code in a github repo. It's been improved and now comes with unit tests! https://github.com/chrisclark/Redelim-it

This function takes an input file and replaces all field-delimiting commas (NOT commas inside quoted fields, just the actual delimiting ones) with a new delimiter. You can then tell SQL Server to use the new field delimiter instead of a comma. In the version of the function here, the placeholder is <*TMP*> (I feel confident this will not appear in the original csv - if it does, brace for explosions).

So after running this function you can import into SQL by doing something like:

 BULK INSERT MyTable
 FROM 'C:\FileCreatedFromThisFunction.csv'
 WITH
 (
 FIELDTERMINATOR = '<*TMP*>',
 ROWTERMINATOR = '\n'
 )

And now, without further ado, the terrible, awful function that I apologize in advance for inflicting on you (edit - I've posted a working program, not just the function, on my blog):

 Private Function CsvToOtherDelimiter(ByVal InputFile As String, ByVal OutputFile As String) As Integer

     Dim ph1 As String = "<*TMP*>"
     Dim objReader As StreamReader = Nothing
     Dim count As Integer = 0 'This will also serve as a primary key
     Dim sb As New System.Text.StringBuilder

     Try
         objReader = New StreamReader(File.OpenRead(InputFile), System.Text.Encoding.Default)
     Catch ex As Exception
         UpdateStatus(ex.Message) 'UpdateStatus and hasHeaders come from the surrounding app
     End Try

     If objReader Is Nothing Then
         UpdateStatus("Invalid file: " & InputFile)
         Return -1
     End If

     'grab the first line
     Dim line = objReader.ReadLine()
     'and advance to the next line b/c the first line is column headings
     If hasHeaders Then
         line = Trim(objReader.ReadLine)
     End If

     While Not String.IsNullOrEmpty(line) 'loop through each line

         count += 1

         'Replace commas with our custom-made delimiter
         line = line.Replace(",", ph1)

         'Find a quoted part of the line, which could legitimately contain commas.
         'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
         Dim starti = line.IndexOf(ph1 & """", 0)
         If line.IndexOf("""", 0) = 0 Then starti = 0

         While starti > -1 'loop through quoted fields

             Dim FieldTerminatorFound As Boolean = False

             'Find end quote token (originally a ",)
             Dim endi As Integer = line.IndexOf("""" & ph1, starti)
             If endi < 0 Then
                 FieldTerminatorFound = True
                 endi = line.Length - 1
             End If

             While Not FieldTerminatorFound

                 'Find any more quotes that are part of that sequence, if any
                 Dim backChar As String = """" 'thats one quote
                 Dim quoteCount = 0
                 While backChar = """"
                     quoteCount += 1
                     backChar = line.Chars(endi - quoteCount)
                 End While

                 If quoteCount Mod 2 = 1 Then 'odd number of quotes. real field terminator
                     FieldTerminatorFound = True
                 Else 'keep looking
                     endi = line.IndexOf("""" & ph1, endi + 1)
                 End If

             End While

             'Grab the quoted field from the line, now that we have the start and ending indices
             Dim source = line.Substring(starti + ph1.Length, endi - starti - ph1.Length + 1)

             'And swap the commas back in
             line = line.Replace(source, source.Replace(ph1, ","))

             'Find the next quoted field
             'If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
             starti = line.IndexOf(ph1 & """", starti + ph1.Length)

         End While

         sb.AppendLine(line) 'collect the converted line for output
         line = objReader.ReadLine

     End While

     objReader.Close()

     SaveTextToFile(sb.ToString, OutputFile)

     Return count

 End Function

From MSDN, it isn't possible to perform a bulk insert operation for this file:

To be usable as a data file for bulk import, a CSV file must comply with the following restrictions:

  • Data fields never contain the field terminator.
  • Either none or all of the values in a data field are enclosed in quotation marks ("").

http://msdn.microsoft.com/en-us/library/ms188609.aspx

Some simple text processing should be all that's needed to get the file ready for import. Alternatively, your users could be required to either format the file according to these guidelines or use something other than a comma as a delimiter (e.g. |).
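To make that "simple text processing" concrete, here is a minimal sketch of the pre-processing step. It is illustrative only, and in Python purely because its standard csv module already understands optionally quoted fields; the answers below do the same job in .NET:

```python
import csv

def redelimit(in_path, out_path, new_delim="|"):
    """Rewrite a CSV whose fields are only occasionally quoted so that it
    uses new_delim as the field terminator, ready for BULK INSERT.
    Assumes new_delim never occurs inside the data itself."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)  # correctly handles "Stuart, Becky"-style fields
        writer = csv.writer(dst, delimiter=new_delim, quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row)
```

After this runs, a plain BULK INSERT ... WITH (FIELDTERMINATOR = '|') works, because commas inside quoted fields are no longer mistaken for delimiters.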

I found the answer by Chris very helpful, but I wanted to run it from within SQL Server using T-SQL (and not via the CLR), so I converted his code to T-SQL code. But then I took it one step further by wrapping everything up in a stored procedure that does the following:

  1. uses bulk insert to initially import the CSV file
  2. cleans up the lines using Chris's code
  3. returns the results in a table format

For my needs, I further cleaned up the lines by removing quotes around values and converting two double quotes into one double quote (I believe that's the correct method).
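That cleanup rule (strip one pair of surrounding quotes, then collapse doubled quotes) can be stated as a tiny standalone function - my illustration of the rule, not code taken from the procedure below:

```python
def clean_field(value):
    """Strip one pair of surrounding quotes and un-double embedded quotes."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return value.replace('""', '"')
```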

 CREATE PROCEDURE SSP_CSVToTable

 -- Add the parameters for the stored procedure here
 @InputFile nvarchar(4000)
 , @FirstLine int

 AS
 BEGIN

 -- SET NOCOUNT ON added to prevent extra result sets from
 -- interfering with SELECT statements.
 SET NOCOUNT ON;

 --convert the CSV file to a table
 --clean up the lines so that commas are handled correctly

 DECLARE @sql nvarchar(4000)
 DECLARE @PH1 nvarchar(50)
 DECLARE @LINECOUNT int -- This will also serve as a primary key
 DECLARE @CURLINE int
 DECLARE @Line nvarchar(4000)
 DECLARE @starti int
 DECLARE @endi int
 DECLARE @FieldTerminatorFound bit
 DECLARE @backChar nvarchar(4000)
 DECLARE @quoteCount int
 DECLARE @source nvarchar(4000)
 DECLARE @COLCOUNT int
 DECLARE @CURCOL int
 DECLARE @ColVal nvarchar(4000)

 -- new delimiter
 SET @PH1 = '†'

 -- create single column table to hold each line of file
 CREATE TABLE [#CSVLine]([line] nvarchar(4000))

 -- bulk insert into temp table
 -- cannot use variable path with bulk insert
 -- so we must run using dynamic sql
 SET @Sql = 'BULK INSERT #CSVLine
 FROM ''' + @InputFile + '''
 WITH
 (
 FIRSTROW=' + CAST(@FirstLine as varchar) + ',
 FIELDTERMINATOR = ''\n'',
 ROWTERMINATOR = ''\n''
 )'

 -- run dynamic statement to populate temp table
 EXEC(@sql)

 -- get number of lines in table
 SET @LINECOUNT = @@ROWCOUNT

 -- add identity column to table so that we can loop through it
 ALTER TABLE [#CSVLine] ADD [RowId] [int] IDENTITY(1,1) NOT NULL

 IF @LINECOUNT > 0
 BEGIN

     -- cycle through each line, cleaning each line
     SET @CURLINE = 1
     WHILE @CURLINE <= @LINECOUNT
     BEGIN

         -- get current line
         SELECT @line = line
         FROM #CSVLine
         WHERE [RowId] = @CURLINE

         -- Replace commas with our custom-made delimiter
         SET @Line = REPLACE(@Line, ',', @PH1)

         -- Find a quoted part of the line, which could legitimately contain commas.
         -- In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
         SET @starti = CHARINDEX(@PH1 + '"' ,@Line, 0)
         If CHARINDEX('"', @Line, 0) = 0 SET @starti = 0

         -- loop through quoted fields
         WHILE @starti > 0
         BEGIN
             SET @FieldTerminatorFound = 0

             -- Find end quote token (originally a ",)
             SET @endi = CHARINDEX('"' + @PH1, @Line, @starti)  -- sLine.IndexOf("""" & PH1, starti)
             IF @endi < 1
             BEGIN
                 SET @FieldTerminatorFound = 1
                 SET @endi = LEN(@Line) - 1
             END

             WHILE @FieldTerminatorFound = 0
             BEGIN
                 -- Find any more quotes that are part of that sequence, if any
                 SET @backChar = '"' -- thats one quote
                 SET @quoteCount = 0

                 WHILE @backChar = '"'
                 BEGIN
                     SET @quoteCount = @quoteCount + 1
                     SET @backChar = SUBSTRING(@Line, @endi-@quoteCount, 1) -- sLine.Chars(endi - quoteCount)
                 END

                 IF (@quoteCount % 2) = 1
                 BEGIN
                     -- odd number of quotes. real field terminator
                     SET @FieldTerminatorFound = 1
                 END
                 ELSE
                 BEGIN
                     -- keep looking
                     SET @endi = CHARINDEX('"' + @PH1, @Line, @endi + 1) -- sLine.IndexOf("""" & PH1, endi + 1)
                 END

             END

             -- Grab the quoted field from the line, now that we have the start and ending indices
             SET @source = SUBSTRING(@Line, @starti + LEN(@PH1), @endi - @starti - LEN(@PH1) + 1)
             -- sLine.Substring(starti + PH1.Length, endi - starti - PH1.Length + 1)

             -- And swap the commas back in
             SET @Line = REPLACE(@Line, @source, REPLACE(@source, @PH1, ','))
             --sLine.Replace(source, source.Replace(PH1, ","))

             -- Find the next quoted field
             -- If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
             SET @starti = CHARINDEX(@PH1 + '"', @Line, @starti + LEN(@PH1))
             --sLine.IndexOf(PH1 & """", starti + PH1.Length)

         END

         -- get table based on current line
         IF OBJECT_ID('tempdb..#Line') IS NOT NULL
             DROP TABLE #Line

         -- converts a delimited list into a table
         SELECT *
         INTO #Line
         FROM dbo.iter_charlist_to_table(@Line,@PH1)

         -- get number of columns in line
         SET @COLCOUNT = @@ROWCOUNT

         -- dynamically create CSV temp table to hold CSV columns and lines
         -- only need to create once
         IF OBJECT_ID('tempdb..#CSV') IS NULL
         BEGIN
             -- create initial structure of CSV table
             CREATE TABLE [#CSV]([Col1] nvarchar(100))

             -- dynamically add a column for each column found in the first line
             SET @CURCOL = 1
             WHILE @CURCOL <= @COLCOUNT
             BEGIN
                 -- first column already exists, don't need to add
                 IF @CURCOL > 1
                 BEGIN
                     -- add field
                     SET @sql = 'ALTER TABLE [#CSV] ADD [Col' + Cast(@CURCOL as varchar) + '] nvarchar(100)'

                     --print @sql

                     -- this adds the fields to the temp table
                     EXEC(@sql)
                 END

                 -- go to next column
                 SET @CURCOL = @CURCOL + 1
             END
         END

         -- build dynamic sql to insert current line into CSV table
         SET @sql = 'INSERT INTO [#CSV] VALUES('

         -- loop through line table, dynamically adding each column value
         SET @CURCOL = 1
         WHILE @CURCOL <= @COLCOUNT
         BEGIN
             -- get current column
             Select @ColVal = str
             From #Line
             Where listpos = @CURCOL

             IF LEN(@ColVal) > 0
             BEGIN
                 -- remove quotes from beginning if exist
                 IF LEFT(@ColVal,1) = '"'
                     SET @ColVal = RIGHT(@ColVal, LEN(@ColVal) - 1)

                 -- remove quotes from end if exist
                 IF RIGHT(@ColVal,1) = '"'
                     SET @ColVal = LEFT(@ColVal, LEN(@ColVal) - 1)
             END

             -- write column value
             -- make value sql safe by replacing single quotes with two single quotes
             -- also, replace two double quotes with a single double quote
             SET @sql = @sql + '''' + REPLACE(REPLACE(@ColVal, '''',''''''), '""', '"') + ''''

             -- add comma separator except for the last record
             IF @CURCOL <> @COLCOUNT
                 SET @sql = @sql + ','

             -- go to next column
             SET @CURCOL = @CURCOL + 1
         END

         -- close sql statement
         SET @sql = @sql + ')'

         --print @sql

         -- run sql to add line to table
         EXEC(@sql)

         -- move to next line
         SET @CURLINE = @CURLINE + 1

     END

 END

 -- return CSV table
 SELECT * FROM [#CSV]

 END
 GO

The stored procedure makes use of this helper function that parses a string into a table (thanks Erland Sommarskog!):

 CREATE FUNCTION [dbo].[iter_charlist_to_table]
     (@list ntext,
      @delimiter nchar(1) = N',')
 RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL,
                     str varchar(4000),
                     nstr nvarchar(2000)) AS
 BEGIN
     DECLARE @pos int,
             @textpos int,
             @chunklen smallint,
             @tmpstr nvarchar(4000),
             @leftover nvarchar(4000),
             @tmpval nvarchar(4000)

     SET @textpos = 1
     SET @leftover = ''
     WHILE @textpos <= datalength(@list) / 2
     BEGIN
         SET @chunklen = 4000 - datalength(@leftover) / 2
         SET @tmpstr = @leftover + substring(@list, @textpos, @chunklen)
         SET @textpos = @textpos + @chunklen

         SET @pos = charindex(@delimiter, @tmpstr)

         WHILE @pos > 0
         BEGIN
             SET @tmpval = ltrim(rtrim(left(@tmpstr, @pos - 1)))
             INSERT @tbl (str, nstr) VALUES(@tmpval, @tmpval)
             SET @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr))
             SET @pos = charindex(@delimiter, @tmpstr)
         END

         SET @leftover = @tmpstr
     END

     INSERT @tbl(str, nstr) VALUES (ltrim(rtrim(@leftover)), ltrim(rtrim(@leftover)))

     RETURN
 END

Here's how I call it from T-SQL. In this case, I'm inserting the results into a temp table, so I create the temp table first:

  -- create temp table for file import
  CREATE TABLE #temp
  (
      CustomerCode nvarchar(100) NULL,
      Name nvarchar(100) NULL,
      [Address] nvarchar(100) NULL,
      City nvarchar(100) NULL,
      [State] nvarchar(100) NULL,
      Zip nvarchar(100) NULL,
      OrderNumber nvarchar(100) NULL,
      TimeWindow nvarchar(100) NULL,
      OrderType nvarchar(100) NULL,
      Duration nvarchar(100) NULL,
      [Weight] nvarchar(100) NULL,
      Volume nvarchar(100) NULL
  )

  -- convert the CSV file into a table
  INSERT #temp
  EXEC [dbo].[SSP_CSVToTable]
       @InputFile = @FileLocation
      ,@FirstLine = @FirstImportRow

I haven't tested the performance much, but it works well for what I need - importing CSV files with fewer than 1000 rows. However, it might choke on really large files.

Hopefully someone else also finds it useful.

Cheers!

I also created a function to convert a CSV to a usable format for Bulk Insert. I used the answer posted by Chris Clark as a starting point to create the following C# function.

I ended up using a regular expression to find the fields. I then recreated the file line by line, writing it to a new file as I went, thus avoiding having the entire file loaded into memory.

 private void CsvToOtherDelimiter(string CSVFile, System.Data.Linq.Mapping.MetaTable tbl)
 {
     char PH1 = '|';
     StringBuilder ln;

     //Confirm file exists. Else, throw exception
     if (File.Exists(CSVFile))
     {
         using (TextReader tr = new StreamReader(CSVFile))
         {
             //Use a temp file to store our conversion
             using (TextWriter tw = new StreamWriter(CSVFile + ".tmp"))
             {
                 string line = tr.ReadLine();
                 //If we have already converted, no need to reconvert.
                 //NOTE: We make the assumption here that the input header file
                 //      doesn't have a PH1 value unless it's already been converted.
                 if (line.IndexOf(PH1) >= 0)
                 {
                     tw.Close();
                     tr.Close();
                     File.Delete(CSVFile + ".tmp");
                     return;
                 }
                 //Loop through input file
                 while (!string.IsNullOrEmpty(line))
                 {
                     ln = new StringBuilder();

                     //1. Use Regex expression to find comma separated values
                     //using quotes as optional text qualifiers
                     //(what MS EXCEL does when you import a csv file)
                     //2. Remove text qualifier quotes from data
                     //3. Replace any values of PH1 found in column data
                     //with an equivalent character
                     //Regex:  \A[^,]*(?=,)|(?:[^",]*"[^"]*"[^",]*)+|[^",]*"[^"]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z
                     List<string> fieldList = Regex.Matches(line, @"\A[^,]*(?=,)|(?:[^"",]*""[^""]*""[^"",]*)+|[^"",]*""[^""]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z")
                             .Cast<Match>()
                             .Select(m => RemoveCSVQuotes(m.Value).Replace(PH1, '¦'))
                             .ToList<string>();

                     //Add the list of fields to ln, separated by PH1
                     fieldList.ToList().ForEach(m => ln.Append(m + PH1));

                     //Write to file. Don't include trailing PH1 value.
                     tw.WriteLine(ln.ToString().Substring(0, ln.ToString().LastIndexOf(PH1)));

                     line = tr.ReadLine();
                 }

                 tw.Close();
             }
             tr.Close();

             //Optional: replace input file with output file
             File.Delete(CSVFile);
             File.Move(CSVFile + ".tmp", CSVFile);
         }
     }
     else
     {
         throw new ArgumentException(string.Format("Source file {0} not found", CSVFile));
     }
 }

 //The output file no longer needs quotes as a text qualifier, so remove them
 private string RemoveCSVQuotes(string value)
 {
     //if is empty string, then remove double quotes
     if (value == @"""""") value = "";
     //remove any double quotes, then any quotes on ends
     value = value.Replace(@"""""", @"""");
     if (value.Length >= 2)
         if (value.Substring(0, 1) == @"""")
             value = value.Substring(1, value.Length - 2);
     return value;
 }
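As a sanity check on that pattern (my own test, not part of the original answer), the same regex works outside of .NET too; here it is in Python, with the C# doubled quote marks ("") written as plain quotes:

```python
import re

# The answer's field-matching pattern, one alternative per kind of field.
CSV_FIELD_PATTERN = (
    r'\A[^,]*(?=,)'               # first unquoted field
    r'|(?:[^",]*"[^"]*"[^",]*)+'  # fully quoted field, possibly containing commas
    r'|[^",]*"[^"]*\Z'            # field with an unmatched quote running to end of line
    r'|(?<=,)[^,]*(?=,)'          # interior unquoted field
    r'|(?<=,)[^,]*\Z'             # final unquoted field
    r'|\A[^,]*\Z'                 # line consisting of a single field
)

def split_fields(line):
    """Split one CSV line into fields, quotes still attached."""
    return re.findall(CSV_FIELD_PATTERN, line)
```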

This might be more complicated or involved than what you're willing to use, but ...

If you can implement the logic for parsing the lines into fields in VB or C#, you can do this using a CLR table-valued function (TVF).

A CLR TVF can be a good performing way to read data in from an external source when you want to have some C# or VB code separate the data into columns and/or adjust the values.

You have to be willing to add a CLR assembly to your database (and one that allows external or unsafe operations so it can open files). This can get a bit complicated or involved, but might be worth it for the flexibility you gain.

I had some large files that needed to be regularly loaded to tables as fast as possible, but certain code translations needed to be performed on some columns, and special handling was needed to load values that would otherwise have caused datatype errors with a plain bulk insert.

In short, a CLR TVF lets you run C# or VB code against each line of the file with bulk-insert-like performance (although you may need to worry about logging). The example in the SQL Server documentation lets you create a TVF to read from the event log that you could use as a starting point.

Note that the code in a CLR TVF can only access the database in an init phase before the first row is processed (e.g. no lookups for each row - you'd use a normal TVF on top of this to do such things). You don't appear to need this based on your question.

Also note, each CLR TVF must have its output columns explicitly specified, so you can't write a generic one that is reusable for each different csv file you might have.

You could write one CLR TVF to read whole lines from the file, returning a one-column result set, then use normal TVFs to read from that for each type of file. This requires the code to parse each line to be written in T-SQL, but avoids having to write many CLR TVFs.

More often than not, this problem is caused by users exporting an Excel file to CSV.

There are two ways around this problem:

  1. Export from Excel using a macro, as per Microsoft's suggestion
  2. Or the REALLY easy way:
    • Open the CSV in Excel.
    • Save it as an Excel file (.xls or .xlsx).
    • Import that file into SQL Server as an Excel file.
    • Chuckle to yourself because you didn't have to code anything like the solutions above.... muhahahaha

Import as an Excel file

Here's some SQL if you really want to script it (after saving the CSV as Excel):

 select *
 into SQLServerTable
 FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
     'Excel 8.0;Database=D:\testing.xls;HDR=YES',
     'SELECT * FROM [Sheet1$]')

You should be able to specify not only the field separator, which should be [,], but also the text qualifier, which in this case would be ["]. Using [ ] to enclose these so there is no confusion with ".

I came across a few issues when I had fields like mike,"456 2nd St, Apt 5".

The solution to this problem is @ http://crazzycoding.blogspot.com/2010/11/import-csv-file-into-sql-server-using.html

Thanks, Ashish

Chris, thanks a bunch for this!! You saved my biscuits!! I could not believe that the bulk loader wouldn't handle this case when XL does such a nice job.. don't these guys see each other in the hallways??? Anyway... I needed a ConsoleApplication version, so here is what I hacked together. It's down and dirty but it works like a champ! I hardcoded the delimiter and commented out the header as they were not needed for my app.

I wish I could also paste a nice big beer in here for you too.

Geeze, I don't know why the End Module and Public Class ended up outside of the code block... srry!

  Module Module1

      Sub Main()

          Dim arrArgs() As String = Command.Split(",")
          Dim i As Integer
          Dim obj As New ReDelimIt()

          Console.Write(vbNewLine & vbNewLine)

          If arrArgs(0) <> Nothing Then
              For i = LBound(arrArgs) To UBound(arrArgs)
                  Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine)
              Next
              obj.ProcessFile(arrArgs(0), arrArgs(1))
          Else
              Console.Write("Usage Test1 <inputfile>,<outputfile>")
          End If

          Console.Write(vbNewLine & vbNewLine)

      End Sub

  End Module

  Public Class ReDelimIt

      Public Function ProcessFile(ByVal InputFile As String, ByVal OutputFile As String) As Integer

          Dim ph1 As String = "|"

          Dim objReader As System.IO.StreamReader = Nothing
          Dim count As Integer = 0 'This will also serve as a primary key
          Dim sb As New System.Text.StringBuilder

          Try
              objReader = New System.IO.StreamReader(System.IO.File.OpenRead(InputFile), System.Text.Encoding.Default)
          Catch ex As Exception
              MsgBox(ex.Message)
          End Try

          If objReader Is Nothing Then
              MsgBox("Invalid file: " & InputFile)
              count = -1
              Exit Function
          End If

          'grab the first line
          Dim line = objReader.ReadLine()

          'and advance to the next line b/c the first line is column headings
          'Removed Check Headers can put in if needed.
          'If chkHeaders.Checked Then
          '    line = objReader.ReadLine
          'End If

          While Not String.IsNullOrEmpty(line) 'loop through each line

              count += 1

              'Replace commas with our custom-made delimiter
              line = line.Replace(",", ph1)

              'Find a quoted part of the line, which could legitimately contain commas.
              'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
              Dim starti = line.IndexOf(ph1 & """", 0)

              While starti > -1 'loop through quoted fields

                  'Find end quote token (originally a ",)
                  Dim endi = line.IndexOf("""" & ph1, starti)

                  'The end quote token could be a false positive because there could occur a ", sequence.
                  'It would be double-quoted ("",) so check for that here
                  Dim check1 = line.IndexOf("""""" & ph1, starti)

                  'A """, sequence can occur if a quoted field ends in a quote.
                  'In this case, the above check matches, but we actually SHOULD process this as an end quote token
                  Dim check2 = line.IndexOf("""""""" & ph1, starti)

                  'If we are in the check1 ("",) situation, keep searching for an end quote token
                  'The +1 and +2 accounts for the extra length of the checked sequences
                  While (endi = check1 + 1 AndAlso endi <> check2 + 2) 'loop through "false" tokens in the quoted fields
                      endi = line.IndexOf("""" & ph1, endi + 1)
                      check1 = line.IndexOf("""""" & ph1, check1 + 1)
                      check2 = line.IndexOf("""""""" & ph1, check2 + 1)
                  End While

                  'We have searched for an end token (",) but can't find one, so that means the line ends in a "
                  If endi < 0 Then endi = line.Length - 1

                  'Grab the quoted field from the line, now that we have the start and ending indices
                  Dim source = line.Substring(starti + ph1.Length, endi - starti - ph1.Length + 1)

                  'And swap the commas back in
                  line = line.Replace(source, source.Replace(ph1, ","))

                  'Find the next quoted field
                  If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
                  starti = line.IndexOf(ph1 & """", starti + ph1.Length)

              End While

              'Add our primary key to the line
              'Removed for now
              'If chkAddKey.Checked Then
              '    line = String.Concat(count.ToString, ph1, line)
              'End If

              sb.AppendLine(line)

              line = objReader.ReadLine

          End While

          objReader.Close()

          SaveTextToFile(sb.ToString, OutputFile)

          Return count

      End Function

      Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean

          Dim bAns As Boolean = False
          Dim objReader As System.IO.StreamWriter
          Try
              objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default)
              objReader.Write(strData)
              objReader.Close()
              bAns = True
          Catch Ex As Exception
              Throw Ex
          End Try
          Return bAns

      End Function

  End Class

An alternative method - assuming you don't have loads of fields or don't expect a quote to appear in the data itself - would be to use the REPLACE function.

 UPDATE dbo.tablename
 SET dbo.tablename.target_field = REPLACE(t.importedValue, '"', '')
 FROM #tempTable t
 WHERE dbo.tablename.target_id = t.importedID;

I have used it. I can't make any claims regarding performance. It is just a quick and dirty way to get around the problem.

This code works for me:

  public bool CSVFileRead(string fullPathWithFileName, string fileNameModified, string tableName)
  {
      SqlConnection con = new SqlConnection(ConfigurationSettings.AppSettings["dbConnectionString"]);
      string filepath = fullPathWithFileName;
      StreamReader sr = new StreamReader(filepath);
      string line = sr.ReadLine();
      string[] value = line.Split(',');
      DataTable dt = new DataTable();
      DataRow row;
      foreach (string dc in value)
      {
          dt.Columns.Add(new DataColumn(dc));
      }
      while (!sr.EndOfStream)
      {
          //string[] stud = sr.ReadLine().Split(',');
          //for (int i = 0; i < stud.Length; i++)
          //{
          //    stud[i] = stud[i].Replace("\"", "");
          //}
          //value = stud;

          // Note: a plain Split(',') breaks fields that contain quoted commas;
          // such rows fail the column-count check below and are silently skipped.
          value = sr.ReadLine().Split(',');
          if (value.Length == dt.Columns.Count)
          {
              row = dt.NewRow();
              row.ItemArray = value;
              dt.Rows.Add(row);
          }
      }
      SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock);
      bc.DestinationTableName = tableName;
      bc.BatchSize = dt.Rows.Count;
      con.Open();
      bc.WriteToServer(dt);
      bc.Close();
      con.Close();
      return true;
  }

I put the following together to solve my case. I needed to pre-process very large files and sort out the inconsistent quoting. Just paste it into a blank C# app, set the consts to your requirements and away you go. This worked on very large CSVs of over 10 GB.

 namespace CsvFixer
 {
     using System.IO;
     using System.Text;

     public class Program
     {
         private const string delimiter = ",";
         private const string quote = "\"";
         private const string inputFile = "C:\\temp\\input.csv";
         private const string fixedFile = "C:\\temp\\fixed.csv";

         /// <summary>
         /// This application fixes inconsistently quoted csv (or delimited) files with support for very large file sizes.
         /// For example :  1223,5235234,8674,"Houston","London, UK",3425,Other text,stuff
         /// Must become :  "1223","5235234","8674","Houston","London, UK","3425","Other text","stuff"
         /// </summary>
         /// <param name="args"></param>
         static void Main(string[] args)
         {
             // Use streaming to allow for large files.
             using (StreamWriter outfile = new StreamWriter(fixedFile))
             {
                 using (FileStream fs = File.Open(inputFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                 using (BufferedStream bs = new BufferedStream(fs))
                 using (StreamReader sr = new StreamReader(bs))
                 {
                     string currentLine;

                     // Read each input line in and write each fixed line out
                     while ((currentLine = sr.ReadLine()) != null)
                     {
                         outfile.WriteLine(FixLine(currentLine, delimiter, quote));
                     }
                 }
             }
         }

         /// <summary>
         /// Fully quote a partially quoted line
         /// </summary>
         /// <param name="line">The partially quoted line</param>
         /// <returns>The fully quoted line</returns>
         private static string FixLine(string line, string delimiter, string quote)
         {
             StringBuilder fixedLine = new StringBuilder();

             // Split all on the delimiter, accepting that some quoted fields
             // that contain the delimiter will be split into many pieces.
             string[] fieldParts = line.Split(delimiter.ToCharArray());

             // Loop through the fields (or parts of fields)
             for (int i = 0; i < fieldParts.Length; i++)
             {
                 string currentFieldPart = fieldParts[i];

                 // If the current field part starts and ends with a quote it is a field, so write it to the result
                 if (currentFieldPart.StartsWith(quote) && currentFieldPart.EndsWith(quote))
                 {
                     fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter));
                 }
                 // else if it starts with a quote but doesn't end with one, it is part of a longer field.
                 else if (currentFieldPart.StartsWith(quote))
                 {
                     // Add the start of the field
                     fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter));

                     // Append any additional field parts (we will only hit the end of the field when
                     // the last field part finishes with a quote.
                     while (!fieldParts[++i].EndsWith(quote))
                     {
                         fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter));
                     }

                     // Append the last field part - ie the part containing the closing quote
                     fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter));
                 }
                 else
                 {
                     // The field has no quotes, add the fieldpart with quotes as bookends
                     fixedLine.Append(string.Format("{0}{1}{0}{2}", quote, currentFieldPart, delimiter));
                 }
             }

             // Return the fixed string
             return fixedLine.ToString();
         }
     }
 }
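For comparison - and only as a hedged sketch, not the author's code - the same "quote every field" normalization falls out of any CSV library that can re-serialize with full quoting; in Python:

```python
import csv

def fully_quote(in_path, out_path):
    """Rewrite a partially quoted CSV so that every field is quoted,
    e.g. the line  1223,"London, UK",stuff  becomes  "1223","London, UK","stuff"  on output."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for row in csv.reader(src):
            writer.writerow(row)
```

A library-based pass like this also handles escaped quotes inside fields, which the split-and-rejoin approach above deliberately leaves out of scope.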

Using the 4.5 framework, create a VB.NET program that converts to a new delimiter using TextFieldParser, which automatically handles text-qualified fields.

Modified the above code to use the built-in TextFieldParser.

  Module Module1

      Sub Main()

          Dim arrArgs() As String = Command.Split(",")
          Dim i As Integer
          Dim obj As New ReDelimIt()
          Dim InputFile As String = ""
          Dim OutPutFile As String = ""
          Dim NewDelimiter As String = ""

          Console.Write(vbNewLine & vbNewLine)

          If Not IsNothing(arrArgs(0)) Then
              For i = LBound(arrArgs) To UBound(arrArgs)
                  Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine)
              Next
              InputFile = arrArgs(0)
              If Not IsNothing(arrArgs(1)) Then
                  If Not String.IsNullOrEmpty(arrArgs(1)) Then
                      OutPutFile = arrArgs(1)
                  Else
                      OutPutFile = InputFile.Replace("csv", "pipe")
                  End If
              Else
                  OutPutFile = InputFile.Replace("csv", "pipe")
              End If
              If Not IsNothing(arrArgs(2)) Then
                  If Not String.IsNullOrEmpty(arrArgs(2)) Then
                      NewDelimiter = arrArgs(2)
                  Else
                      NewDelimiter = "|"
                  End If
              Else
                  NewDelimiter = "|"
              End If
              obj.ConvertCSVFile(InputFile, OutPutFile, NewDelimiter)
          Else
              Console.Write("Usage ChangeFileDelimiter <inputfile>,<outputfile>,<NewDelimiter>")
          End If
          obj = Nothing

          Console.Write(vbNewLine & vbNewLine)
          'Console.ReadLine()

      End Sub

  End Module

  Public Class ReDelimIt

      Public Function ConvertCSVFile(ByVal InputFile As String, ByVal OutputFile As String, Optional ByVal NewDelimiter As String = "|") As Integer

          Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(InputFile)
              MyReader.TextFieldType = FileIO.FieldType.Delimited
              MyReader.SetDelimiters(",")
              Dim sb As New System.Text.StringBuilder
              Dim strLine As String = ""
              Dim currentRow As String()
              While Not MyReader.EndOfData
                  Try
                      currentRow = MyReader.ReadFields()
                      Dim currentField As String
                      strLine = ""
                      For Each currentField In currentRow
                          'MsgBox(currentField)
                          If strLine = "" Then
                              strLine = strLine & currentField
                          Else
                              strLine = strLine & NewDelimiter & currentField
                          End If
                      Next
                      sb.AppendLine(strLine)
                  Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                      'MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
                      Console.WriteLine("Line " & ex.Message & " is not valid and will be skipped.")
                  End Try
              End While
              SaveTextToFile(sb.ToString, OutputFile)
          End Using

          Return Err.Number

      End Function

      Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean

          Dim bAns As Boolean = False
          Dim objReader As System.IO.StreamWriter
          Try
              If FileIO.FileSystem.FileExists(FullPath) Then
                  Kill(FullPath)
              End If
              objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default)
              objReader.Write(strData)
              objReader.Close()
              bAns = True
          Catch Ex As Exception
              Throw Ex
          End Try
          Return bAns

      End Function

  End Class