Splitting a comma-separated string that contains both quoted and unquoted strings

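I have the following comma-separated string that I need to split. The problem is that some of the content is inside quotes and contains commas that must not be used as split points.

String: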

111,222,"33,44,55",666,"77,88","99"

The output I want:

    111
    222
    33,44,55
    666
    77,88
    99

I have tried this:

 (?:,?)((?<=")[^"]+(?=")|[^",]+)

But it treats the comma between "77,88" and "99" as a hit, so I get the following output:

    111
    222
    33,44,55
    666
    77,88
    ,
    99

Can anyone help me? I'm running out of time... 🙂 /Peter

Depending on your needs, you may not be able to use a CSV parser, and may in fact need to reinvent the wheel!

You can do it with a fairly simple regular expression:

 (?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)

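It does the following:

(?:^|,) = a non-capturing group that matches either the start of the line/string or a comma

(\"(?:[^\"]+|\"\")*\"|[^,]*) = a numbered capture group that chooses between two alternatives:

1. stuff in quotes
2. stuff between commas

This should give you the output you are looking for.

Example code in C#: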

    // Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
    public static string[] SplitCSV(string input)
    {
        Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
        List<string> list = new List<string>();
        foreach (Match match in csvSplit.Matches(input))
        {
            // Each match includes the leading comma (if any), so strip it.
            // Quoted fields keep their surrounding quotes.
            list.Add(match.Value.TrimStart(','));
        }
        return list.ToArray();
    }

    private void button1_Click(object sender, RoutedEventArgs e)
    {
        // Print one field per line (writing the array itself would only print its type name).
        Console.WriteLine(string.Join(Environment.NewLine, SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\"")));
    }
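The pattern above captures quoted fields together with their quotes, whereas the question asks for 33,44,55 without them. A minimal sketch of one way to post-process the capture group follows; the class name `CsvRegexSketch` and the method `SplitAndUnquote` are made up for illustration, and the pattern is the same one written as a verbatim string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRegexSketch
{
    // Same pattern as in the answer, written as a verbatim string ("" is one quote).
    private static readonly Regex Field =
        new Regex(@"(?:^|,)(""(?:[^""]+|"""")*""|[^,]*)", RegexOptions.Compiled);

    // Hypothetical helper: splits one line and removes the quoting.
    public static List<string> SplitAndUnquote(string input)
    {
        var fields = new List<string>();
        foreach (Match match in Field.Matches(input))
        {
            // Group 1 is the field without the leading comma.
            string value = match.Groups[1].Value;
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                // Drop the surrounding quotes and collapse doubled quotes.
                value = value.Substring(1, value.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(value);
        }
        return fields;
    }
}
```

Reading `match.Groups[1]` instead of `match.Value` also makes the `TrimStart(',')` call unnecessary, because the leading comma sits outside the capture group.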

I really like jimplode's answer, but I think a version with yield return is a bit more useful, so here it is:

    public IEnumerable<string> SplitCSV(string input)
    {
        Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
        foreach (Match match in csvSplit.Matches(input))
        {
            yield return match.Value.TrimStart(',');
        }
    }

It may be even more useful as an extension method:

    public static class StringHelper
    {
        public static IEnumerable<string> SplitCSV(this string input)
        {
            Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
            foreach (Match match in csvSplit.Matches(input))
            {
                yield return match.Value.TrimStart(',');
            }
        }
    }

Don't reinvent a CSV parser; try FileHelpers.

This regular expression avoids having to loop over the values and call TrimStart(',') as in the accepted answer:

 ((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))

Here is the implementation in C#:

    string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
    MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values);
    foreach (var match in matches)
    {
        Console.WriteLine(match);
    }

Output:

    111
    222
    33,44,55
    666
    77,88
    99

Try this:

    string s = @"111,222,""33,44,55"",666,""77,88"",""99""";
    List<string> result = new List<string>();

    // Splitting on the quote character separates quoted from unquoted runs.
    var splitted = s.Split('"').ToList<string>();
    splitted.RemoveAll(x => x == ",");
    foreach (var it in splitted)
    {
        if (it.StartsWith(",") || it.EndsWith(","))
        {
            // An unquoted run: strip the outer commas and split the rest.
            var tmp = it.TrimEnd(',').TrimStart(',');
            result.AddRange(tmp.Split(','));
        }
        else
        {
            if (!string.IsNullOrEmpty(it)) result.Add(it);
        }
    }

    // Results:
    foreach (var it in result)
    {
        Console.WriteLine(it);
    }

Regarding Jay's answer: if you use a second boolean, you can nest double quotes inside single quotes and vice versa.

    private string[] splitString(string stringToSplit)
    {
        char[] characters = stringToSplit.ToCharArray();
        List<string> returnValueList = new List<string>();
        string tempString = "";
        bool blockUntilEndQuote = false;   // inside "..."
        bool blockUntilEndQuote2 = false;  // inside '...'
        int characterCount = 0;
        foreach (char character in characters)
        {
            characterCount = characterCount + 1;

            // A double quote toggles its state unless we are inside single quotes.
            if (character == '"' && !blockUntilEndQuote2)
            {
                blockUntilEndQuote = !blockUntilEndQuote;
            }
            // A single quote toggles its state unless we are inside double quotes.
            if (character == '\'' && !blockUntilEndQuote)
            {
                blockUntilEndQuote2 = !blockUntilEndQuote2;
            }

            if (character != ',')
            {
                tempString = tempString + character;
            }
            else if (blockUntilEndQuote || blockUntilEndQuote2)
            {
                // A comma inside quotes belongs to the current value.
                tempString = tempString + character;
            }
            else
            {
                returnValueList.Add(tempString);
                tempString = "";
            }

            // Flush the last value at the end of the input.
            if (characterCount == characters.Length)
            {
                returnValueList.Add(tempString);
                tempString = "";
            }
        }
        string[] returnValue = returnValueList.ToArray();
        return returnValue;
    }

I know I'm a bit late to this, but for anyone searching, here is what I did in C#:

    private string[] splitString(string stringToSplit)
    {
        char[] characters = stringToSplit.ToCharArray();
        List<string> returnValueList = new List<string>();
        string tempString = "";
        bool blockUntilEndQuote = false;
        int characterCount = 0;
        foreach (char character in characters)
        {
            characterCount = characterCount + 1;

            // Every double quote toggles the "inside quotes" state.
            if (character == '"')
            {
                blockUntilEndQuote = !blockUntilEndQuote;
            }

            if (character != ',')
            {
                tempString = tempString + character;
            }
            else if (blockUntilEndQuote)
            {
                // A comma inside quotes belongs to the current value.
                tempString = tempString + character;
            }
            else
            {
                returnValueList.Add(tempString);
                tempString = "";
            }

            // Flush the last value at the end of the input.
            if (characterCount == characters.Length)
            {
                returnValueList.Add(tempString);
                tempString = "";
            }
        }
        string[] returnValue = returnValueList.ToArray();
        return returnValue;
    }

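When a string has a comma inside quotes, such as "value, 1", or an escaped double quote, such as "value ""1""", these are valid CSV and should be parsed as value, 1 and value "1" respectively.

This also works with tab-delimited formats if you pass a tab instead of a comma as the delimiter.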

    public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
    {
        var currentString = new StringBuilder();
        var inQuotes = false;
        var quoteIsEscaped = false; //Store when a quote has been escaped.
        row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
        foreach (var character in row.Select((val, index) => new {val, index}))
        {
            if (character.val == delimiter) //We hit a delimiter character...
            {
                if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
                {
                    Console.WriteLine(currentString);
                    yield return currentString.ToString();
                    currentString.Clear();
                }
                else
                {
                    currentString.Append(character.val);
                }
            }
            else
            {
                if (character.val != ' ')
                {
                    if (character.val == '"') //If we've hit a quote character...
                    {
                        if (character.val == '\"' && inQuotes) //Does it appear to be a closing quote?
                        {
                            if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
                            {
                                quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
                            }
                            else if (quoteIsEscaped)
                            {
                                quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
                                currentString.Append(character.val);
                            }
                            else
                            {
                                inQuotes = false;
                            }
                        }
                        else
                        {
                            if (!inQuotes)
                            {
                                inQuotes = true;
                            }
                            else
                            {
                                currentString.Append(character.val); //...It's a quote inside a quote.
                            }
                        }
                    }
                    else
                    {
                        currentString.Append(character.val);
                    }
                }
                else
                {
                    if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
                    {
                        currentString.Append(character.val);
                    }
                }
            }
        }
    }

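A slight update to the function provided by "Chad Hedgcock".

Updates:

1. In `if (character.val == '\"' && inQuotes)`, the `character.val == '\"'` part is redundant: the enclosing `if` has already checked that `character.val == '"'`. It is now written as `character.val == '"'`.

2. In `if (row[character.index + 1] == character.val)`, `!quoteIsEscaped` was added to the condition so that three consecutive quotes are escaped correctly.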

    public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
    {
        var currentString = new StringBuilder();
        var inQuotes = false;
        var quoteIsEscaped = false; //Store when a quote has been escaped.
        row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
        foreach (var character in row.Select((val, index) => new {val, index}))
        {
            if (character.val == delimiter) //We hit a delimiter character...
            {
                if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
                {
                    //Console.WriteLine(currentString);
                    yield return currentString.ToString();
                    currentString.Clear();
                }
                else
                {
                    currentString.Append(character.val);
                }
            }
            else
            {
                if (character.val != ' ')
                {
                    if (character.val == '"') //If we've hit a quote character...
                    {
                        if (character.val == '"' && inQuotes) //Does it appear to be a closing quote?
                        {
                            if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
                            {
                                quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
                            }
                            else if (quoteIsEscaped)
                            {
                                quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
                                currentString.Append(character.val);
                            }
                            else
                            {
                                inQuotes = false;
                            }
                        }
                        else
                        {
                            if (!inQuotes)
                            {
                                inQuotes = true;
                            }
                            else
                            {
                                currentString.Append(character.val); //...It's a quote inside a quote.
                            }
                        }
                    }
                    else
                    {
                        currentString.Append(character.val);
                    }
                }
                else
                {
                    if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
                    {
                        currentString.Append(character.val);
                    }
                }
            }
        }
    }

Currently I use the following regular expression:

  public static Regex regexCSVSplit = new Regex(@"(?x:( (?<FULL> (^|[,;\t\r\n])\s* ( (?<CODAT> (?<CO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<CO>\s*)[,;\t\r\n])*)\k<CO>) | (?<CODAT> (?<DAT> [^""',;\s\r\n]* )) ) (?=\s*([,;\t\r\n]|$)) ) | (?<FULL> (^|[\s\t\r\n]) ( (?<CODAT> (?<CO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<CO>) | (?<CODAT> (?<DAT> [^""',;\s\t\r\n]* )) ) (?=[,;\s\t\r\n]|$)) ))", RegexOptions.Compiled);

This solution can also handle quite messy cases.

This is how to feed the result into an array:

  var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().Select(x => x.Groups["DAT"].Value).ToArray();

See this example here.

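I once had to do something similar, and in the end I got stuck with regular expressions. The fact that a regex cannot carry state makes this quite tricky, so I just ended up writing a simple little parser.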
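The answer does not include its parser, so the following is only a sketch of what such a small stateful parser could look like, not the author's code. The names `SimpleCsvParser` and `ParseLine` are made up for illustration, and it assumes the usual CSV convention that a doubled quote inside a quoted field stands for one literal quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SimpleCsvParser
{
    // Hypothetical helper: splits a single line, tracking one bit of state (inQuotes).
    public static List<string> ParseLine(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // delimiters inside quotes are data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field
        return fields;
    }
}
```

The single `inQuotes` flag is exactly the state that a plain regex has trouble expressing. This sketch does not handle quoted fields that span several lines.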

If you are doing CSV parsing, you should stick to using a CSV parser; don't reinvent the wheel.
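One parser that ships with .NET is `TextFieldParser` in the `Microsoft.VisualBasic.FileIO` namespace. This is a different option from the FileHelpers library mentioned in another answer. The sketch below shows it on a single line; the wrapper names `TextFieldParserSketch` and `SplitLine` are made up for illustration, and the project is assumed to reference the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserSketch
{
    // Hypothetical wrapper: parses one CSV line with the built-in parser.
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes stay in the field

            // ReadFields returns null when there is no data left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For the string in the question, `SplitLine` should return the six fields with the quotes already removed.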

This is my fastest implementation, based on raw string pointer manipulation:

    string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null)
    {
        string[] oTokens;

        // DEFAULT_PARSEFIELDS_SEPARATOR and DEFAULT_PARSEFIELDS_QUOTE are constants defined elsewhere.
        if (null == cSeparator) { cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR; }
        if (null == cQuotes) { cQuotes = DEFAULT_PARSEFIELDS_QUOTE; }

        unsafe
        {
            fixed (char* lpText = sText)
            {
                #region Fast array estimation
                char* lpCurrent = lpText;
                int nEstimatedSize = 0;
                while (0 != *lpCurrent)
                {
                    if (cSeparator == *lpCurrent) { nEstimatedSize++; }
                    lpCurrent++;
                }
                nEstimatedSize++; // Add EOL char(s)
                string[] oEstimatedTokens = new string[nEstimatedSize];
                #endregion

                #region Parsing
                char[] oBuffer = new char[sText.Length];
                int nIndex = 0;
                int nTokens = 0;
                lpCurrent = lpText;
                while (0 != *lpCurrent)
                {
                    if (cQuotes == *lpCurrent)
                    {
                        // Quotes parsing
                        lpCurrent++; // Skip quote
                        nIndex = 0;  // Reset buffer
                        while ((0 != *lpCurrent) && (cQuotes != *lpCurrent))
                        {
                            oBuffer[nIndex] = *lpCurrent; // Store char
                            lpCurrent++; // Move source cursor
                            nIndex++;    // Move target cursor
                        }
                    }
                    else if (cSeparator == *lpCurrent)
                    {
                        // Separator char parsing
                        oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token
                        nIndex = 0; // Skip separator and Reset buffer
                    }
                    else
                    {
                        // Content parsing
                        oBuffer[nIndex] = *lpCurrent; // Store char
                        nIndex++; // Move target cursor
                    }
                    lpCurrent++; // Move source cursor
                }

                // Recover pending buffer
                if (nIndex > 0)
                {
                    // Store token
                    oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex);
                }

                // Build final tokens list
                if (nTokens == nEstimatedSize)
                {
                    oTokens = oEstimatedTokens;
                }
                else
                {
                    oTokens = new string[nTokens];
                    Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens);
                }
                #endregion
            }
        }

        // Epilogue
        return oTokens;
    }

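I needed something more robust, so I took what was here and created this. The solution is a little less elegant and a little more verbose, but in my testing (with a sample of 1,000,000 rows) I found it to be 2 to 3 times faster. It also handles unescaped embedded quotes. Because of the requirements of my solution, I used strings rather than chars for the delimiter and qualifier. Finding a good, generic CSV parser turned out to be harder than I expected, so I hope this parsing algorithm helps someone.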

    public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData)
    {
        // In-Line for example, but I implemented as string extender in production code
        Func<string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex)
        {
            if (startIndex >= 0)
            {
                if (source != null)
                {
                    for (int i = startIndex; i < source.Length; i++)
                    {
                        if (!char.IsWhiteSpace(source[i]))
                        {
                            return i;
                        }
                    }
                }
            }
            return -1;
        };

        var results = new List<string>();
        var result = new StringBuilder();
        var inQualifier = false;
        var inField = false;

        // We add new columns at the delimiter, so append one for the parser.
        var row = $"{record}{delimiter}";

        for (var idx = 0; idx < row.Length; idx++)
        {
            // A delimiter character...
            if (row[idx] == delimiter[0])
            {
                // Are we inside qualifier? If not, we've hit the end of a column value.
                if (!inQualifier)
                {
                    results.Add(trimData ? result.ToString().Trim() : result.ToString());
                    result.Clear();
                    inField = false;
                }
                else
                {
                    result.Append(row[idx]);
                }
            }
            // NOT a delimiter character...
            else
            {
                // ...Not a space character
                if (row[idx] != ' ')
                {
                    // A qualifier character...
                    if (row[idx] == qualifier[0])
                    {
                        // Qualifier is closing qualifier...
                        if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0])
                        {
                            inQualifier = false;
                            continue;
                        }
                        else
                        {
                            // ...Qualifier is opening qualifier
                            if (!inQualifier)
                            {
                                inQualifier = true;
                            }
                            // ...It's a qualifier inside a qualifier.
                            else
                            {
                                inField = true;
                                result.Append(row[idx]);
                            }
                        }
                    }
                    // Not a qualifier character...
                    else
                    {
                        result.Append(row[idx]);
                        inField = true;
                    }
                }
                // ...A space character
                else
                {
                    if (inQualifier || inField)
                    {
                        result.Append(row[idx]);
                    }
                }
            }
        }

        return results.ToArray<string>();
    }

Some test code:

    //var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
    var input = "111, 222, \"99\",\"33,44,55\" , \"666 \"mark of a man\"\", \" spaces \"77,88\" \"";

    Console.WriteLine("Split with trim");
    Console.WriteLine("---------------");
    var result = SplitRow(input, ",", "\"", true);
    foreach (var r in result)
    {
        Console.WriteLine(r);
    }
    Console.WriteLine("");

    // Split 2
    Console.WriteLine("Split with no trim");
    Console.WriteLine("------------------");
    var result2 = SplitRow(input, ",", "\"", false);
    foreach (var r in result2)
    {
        Console.WriteLine(r);
    }
    Console.WriteLine("");

    // Time Trial 1
    Console.WriteLine("Experimental Process (1,000,000) iterations");
    Console.WriteLine("-------------------------------------------");
    var watch = Stopwatch.StartNew(); // requires using System.Diagnostics;
    for (var i = 0; i < 1000000; i++)
    {
        var x1 = SplitRow(input, ",", "\"", false);
    }
    watch.Stop();
    var elapsedMs = watch.ElapsedMilliseconds;
    Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds");
    Console.WriteLine("");

Results:

    Split with trim
    ---------------
    111
    222
    99
    33,44,55
    666 "mark of a man"
    spaces "77,88"

    Split with no trim
    ------------------
    111
    222
    99
    33,44,55
    666 "mark of a man"
    spaces "77,88"

    Original Process (1,000,000) iterations
    -------------------------------
    Total Process Time: 7.538 Seconds

    Experimental Process (1,000,000) iterations
    --------------------------------------------
    Total Process Time: 3.363 Seconds
