如何用一个空格replace多个空格

比方说,我有一个string,如:

"Hello how are you doing?" 

我想要一个把多个空格变成一个空格的函数。

所以我会得到:

 "Hello how are you doing?" 

我知道我可以使用正则expression式或调用

 string s = "Hello how are you doing?".replace(" "," "); 

但是我不得不多次调用它以确保所有顺序的空格都被replace为一个。

有没有内置的方法呢?

 string cleanedString = System.Text.RegularExpressions.Regex.Replace(dirtyString,@"\s+"," "); 

这个问题并不像其他海报所说的那样简单(而且正如我原先所认为的那样) – 因为这个问题不是很确切。

“空格”和“空白”之间有区别。 如果你只是指空格,那么你应该使用" {2,}"的正则expression式。 如果你的意思是任何空白,这是另一回事。 所有的空格都应该转换为空格吗? 开始和结束时会发生什么?

对于下面的基准,我假设你只关心空间,而且你不想对单个空间做任何事情,即使在开始和结束时也是如此。

请注意,正确性总是比性能更重要。 Split / Join解决scheme删除任何前导/尾随空白(即使只是单个空格)的事实是不正确的,只要您指定的要求(当然可能不完整)。

基准使用MiniBench 。

 using System; using System.Text.RegularExpressions; using MiniBench; internal class Program { public static void Main(string[] args) { int size = int.Parse(args[0]); int gapBetweenExtraSpaces = int.Parse(args[1]); char[] chars = new char[size]; for (int i=0; i < size/2; i += 2) { // Make sure there actually *is* something to do chars[i*2] = (i % gapBetweenExtraSpaces == 1) ? ' ' : 'x'; chars[i*2 + 1] = ' '; } // Just to make sure we don't have a \0 at the end // for odd sizes chars[chars.Length-1] = 'y'; string bigString = new string(chars); // Assume that one form works :) string normalized = NormalizeWithSplitAndJoin(bigString); var suite = new TestSuite<string, string>("Normalize") .Plus(NormalizeWithSplitAndJoin) .Plus(NormalizeWithRegex) .RunTests(bigString, normalized); suite.Display(ResultColumns.All, suite.FindBest()); } private static readonly Regex MultipleSpaces = new Regex(@" {2,}", RegexOptions.Compiled); static string NormalizeWithRegex(string input) { return MultipleSpaces.Replace(input, " "); } // Guessing as the post doesn't specify what to use private static readonly char[] Whitespace = new char[] { ' ' }; static string NormalizeWithSplitAndJoin(string input) { string[] split = input.Split (Whitespace, StringSplitOptions.RemoveEmptyEntries); return string.Join(" ", split); } } 

一些testing运行:

 c:\Users\Jon\Test>test 1000 50 ============ Normalize ============ NormalizeWithSplitAndJoin 1159091 0:30.258 22.93 NormalizeWithRegex 26378882 0:30.025 1.00 c:\Users\Jon\Test>test 1000 5 ============ Normalize ============ NormalizeWithSplitAndJoin 947540 0:30.013 1.07 NormalizeWithRegex 1003862 0:29.610 1.00 c:\Users\Jon\Test>test 1000 1001 ============ Normalize ============ NormalizeWithSplitAndJoin 1156299 0:29.898 21.99 NormalizeWithRegex 23243802 0:27.335 1.00 

在这里,第一个数字是迭代次数,第二个是需要的时间,第三个是缩放分数,1.0是最好的。

这表明,至less在某些情况下(包括这个),正则expression式可以超越Split / Join解决scheme,有时甚至会有非常显着的差距。

但是,如果您更改为“全部空白”要求,则“拆分/连接” 似乎会胜出。 正如往常一样,魔鬼是在细节…

虽然现有的答案是好的,但我想指出一个不起作用的方法:

 public static string DontUseThisToCollapseSpaces(string text) { while (text.IndexOf(" ") != -1) { text = text.Replace(" ", " "); } return text; } 

这可以永远循环。 有人在乎为什么? (几年前,当我被问及新闻组的问题时,我才发现这个问题……有人真的遇到了这个问题。)

定期expressoin将是最简单的方法。 如果以正确的方式编写正则expression式,则不需要多次调用。

将其更改为:

 string s = System.Text.RegularExpressions.Regex.Replace(s, @"\s{2,}", " "); 

正如已经指出的那样,这很容易通过正则expression式来完成。 我只是补充说,你可能想添加一个.trim()来摆脱前/后空白。

我分享我使用的东西,因为看起来我已经想出了一些不同的东西。 我已经使用了一段时间,这对我来说足够快。 我不确定它是如何叠加起来的。 我在分隔的文件编写器中使用它,并通过它每次运行一个字段的大型数据表。

  public static string NormalizeWhiteSpace(string S) { string s = S.Trim(); bool iswhite = false; int iwhite; int sLength = s.Length; StringBuilder sb = new StringBuilder(sLength); foreach(char c in s.ToCharArray()) { if(Char.IsWhiteSpace(c)) { if (iswhite) { //Continuing whitespace ignore it. continue; } else { //New WhiteSpace //Replace whitespace with a single space. sb.Append(" "); //Set iswhite to True and any following whitespace will be ignored iswhite = true; } } else { sb.Append(c.ToString()); //reset iswhitespace to false iswhite = false; } } return sb.ToString(); } 

使用Jon Skeet发布的testing程序,我试着看看能否得到一个手写的循环来加快运行速度。
每次我都可以击败NormalizeWithSplitAndJoin,但是只能input1000,5的NormalizeWithRegex。

 static string NormalizeWithLoop(string input) { StringBuilder output = new StringBuilder(input.Length); char lastChar = '*'; // anything other then space for (int i = 0; i < input.Length; i++) { char thisChar = input[i]; if (!(lastChar == ' ' && thisChar == ' ')) output.Append(thisChar); lastChar = thisChar; } return output.ToString(); } 

我没有看到抖动产生的机器代码,但是我期望的问题是调用StringBuilder.Append()所花费的时间,要做得更好,需要使用不安全的代码。

所以Regex.Replace()是非常快,很难打!

这是我工作的解决scheme。 没有RegEx和String.Split。

 public static string TrimWhiteSpace(this string Value) { StringBuilder sbOut = new StringBuilder(); if (!string.IsNullOrEmpty(Value)) { bool IsWhiteSpace = false; for (int i = 0; i < Value.Length; i++) { if (char.IsWhiteSpace(Value[i])) //Comparion with WhiteSpace { if (!IsWhiteSpace) //Comparison with previous Char { sbOut.Append(Value[i]); IsWhiteSpace = true; } } else { IsWhiteSpace = false; sbOut.Append(Value[i]); } } } return sbOut.ToString(); } 

所以你可以:

 string cleanedString = dirtyString.TrimWhiteSpace(); 
 Regex regex = new Regex(@"\W+"); string outputString = regex.Replace(inputString, " "); 

VB.NET Linha.Split(“”).ToList()。Where(Function(x)x <>“”).ToArray

C#Linha.Split(“”).ToList()。其中​​(x => x <>“”).ToArray()

享受Linq = D的力量

一个快速的额外的空白卸妆…这是最快的一个,是基于菲利普马查多的就地副本。

 static string InPlaceCharArray(string str) { var len = str.Length; var src = str.ToCharArray(); int dstIdx = 0; bool lastWasWS = false; for (int i = 0; i < len; i++) { var ch = src[i]; if (src[i] == '\u0020') { if (lastWasWS == false) { src[dstIdx++] = ch; lastWasWS = true; } } else { lastWasWS = false; src[dstIdx++] = ch; } } return new string(src, 0, dstIdx); } 

基准…

InPlaceCharArraySpaceOnly由Felipe Machado在CodeProject 2015上进行修改,并由Sunsetquest进行修改,以便进行多空间移除。 时间:3.75蜱

InPlaceCharArray by Felipe Machado 2015,稍微修改了Sunsetquest的多空间删除。 时间6.50蜱 (也支持标签)

由Jon Skeet提供的 SplitAndJoinOnSpace。 时间:13.25蜱

StringBuilder by fubo 时间:13.5 滴答 (也支持标签)

正则expression式由Jon Skeet编译。 时间:17蜱

由String S的 StringBuilder 2013年时间:30.5蜱

正则expression式与布兰登非编译时间:63.25蜱

StringBuilder由user214147 时间:77.125蜱

用非编译Tim Hoolihan的正则expression式时间:147.25蜱

基准代码…

 using System; using System.Text.RegularExpressions; using System.Diagnostics; using System.Threading; using System.Text; static class Program { public static void Main(string[] args) { long seed = ConfigProgramForBenchmarking(); Stopwatch sw = new Stopwatch(); string warmup = "This is a Warm up function for best benchmark results." + seed; string input1 = "Hello World, how are you doing?" + seed; string input2 = "It\twas\t \tso nice to\t\t see you \tin 1950. \t" + seed; string correctOutput1 = "Hello World, how are you doing?" + seed; string correctOutput2 = "It\twas\tso nice to\tsee you in 1950. " + seed; string output1,output2; //warm-up timer function sw.Restart(); sw.Stop(); sw.Restart(); sw.Stop(); long baseVal = sw.ElapsedTicks; // InPlace Replace by Felipe Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin) output1 = InPlaceCharArraySpaceOnly (warmup); sw.Restart(); output1 = InPlaceCharArraySpaceOnly (input1); output2 = InPlaceCharArraySpaceOnly (input2); sw.Stop(); Console.WriteLine("InPlaceCharArraySpaceOnly : " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); // InPlace Replace by Felipe R. Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin) output1 = InPlaceCharArray(warmup); sw.Restart(); output1 = InPlaceCharArray(input1); output2 = InPlaceCharArray(input2); sw.Stop(); Console.WriteLine("InPlaceCharArray: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); //Regex with non-compile Tim Hoolihan (https://stackoverflow.com/a/1279874/2352507) string cleanedString = output1 = Regex.Replace(warmup, @"\s+", " "); sw.Restart(); output1 = Regex.Replace(input1, @"\s+", " "); output2 = Regex.Replace(input2, @"\s+", " "); sw.Stop(); Console.WriteLine("Regex by Tim Hoolihan: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); //Regex with compile by Jon Skeet (https://stackoverflow.com/a/1280227/2352507) output1 = MultipleSpaces.Replace(warmup, " "); sw.Restart(); output1 = MultipleSpaces.Replace(input1, " "); output2 = MultipleSpaces.Replace(input2, " "); sw.Stop(); Console.WriteLine("Regex with compile by Jon Skeet: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); //Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507) output1 = SplitAndJoinOnSpace(warmup); sw.Restart(); output1 = SplitAndJoinOnSpace(input1); output2 = SplitAndJoinOnSpace(input2); sw.Stop(); Console.WriteLine("Split And Join by Jon Skeet: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); //Regex by Brandon (https://stackoverflow.com/a/1279878/2352507 output1 = Regex.Replace(warmup, @"\s{2,}", " "); sw.Restart(); output1 = Regex.Replace(input1, @"\s{2,}", " "); output2 = Regex.Replace(input2, @"\s{2,}", " "); sw.Stop(); Console.WriteLine("Regex by Brandon: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); //StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507 output1 = user214147(warmup); sw.Restart(); output1 = user214147(input1); output2 = user214147(input2); sw.Stop(); Console.WriteLine("StringBuilder by user214147: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); //StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507 output1 = fubo(warmup); sw.Restart(); output1 = fubo(input1); output2 = fubo(input2); sw.Stop(); Console.WriteLine("StringBuilder by fubo: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); //StringBuilder by David S 2013 (https://stackoverflow.com/a/16035044/2352507) output1 = SingleSpacedTrim(warmup); sw.Restart(); output1 = SingleSpacedTrim(input1); output2 = SingleSpacedTrim(input2); sw.Stop(); Console.WriteLine("StringBuilder(SingleSpacedTrim) by David S: " + (sw.ElapsedTicks - baseVal)); Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL ")); Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL ")); } // InPlace Replace by Felipe Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin) static string InPlaceCharArray(string str) { var len = str.Length; var src = str.ToCharArray(); int dstIdx = 0; bool lastWasWS = false; for (int i = 0; i < len; i++) { var ch = src[i]; if (src[i] == '\u0020') { if (lastWasWS == false) { src[dstIdx++] = ch; lastWasWS = true; } } else { lastWasWS = false; src[dstIdx++] = ch; } } return new string(src, 0, dstIdx); } // InPlace Replace by Felipe R. Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin) static string InPlaceCharArraySpaceOnly (string str) { var len = str.Length; var src = str.ToCharArray(); int dstIdx = 0; bool lastWasWS = false; //Added line for (int i = 0; i < len; i++) { var ch = src[i]; switch (ch) { case '\u0020': //SPACE case '\u00A0': //NO-BREAK SPACE case '\u1680': //OGHAM SPACE MARK case '\u2000': // EN QUAD case '\u2001': //EM QUAD case '\u2002': //EN SPACE case '\u2003': //EM SPACE case '\u2004': //THREE-PER-EM SPACE case '\u2005': //FOUR-PER-EM SPACE case '\u2006': //SIX-PER-EM SPACE case '\u2007': //FIGURE SPACE case '\u2008': //PUNCTUATION SPACE case '\u2009': //THIN SPACE case '\u200A': //HAIR SPACE case '\u202F': //NARROW NO-BREAK SPACE case '\u205F': //MEDIUM MATHEMATICAL SPACE case '\u3000': //IDEOGRAPHIC SPACE case '\u2028': //LINE SEPARATOR case '\u2029': //PARAGRAPH SEPARATOR case '\u0009': //[ASCII Tab] case '\u000A': //[ASCII Line Feed] case '\u000B': //[ASCII Vertical Tab] case '\u000C': //[ASCII Form Feed] case '\u000D': //[ASCII Carriage Return] case '\u0085': //NEXT LINE if (lastWasWS == false) //Added line { src[dstIdx++] = ch; //Added line lastWasWS = true; //Added line } continue; default: lastWasWS = false; //Added line src[dstIdx++] = ch; break; } } return new string(src, 0, dstIdx); } static readonly Regex MultipleSpaces = new Regex(@" {2,}", RegexOptions.Compiled); //Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507) static string SplitAndJoinOnSpace(string input) { string[] split = input.Split(new char[] { ' '}, StringSplitOptions.RemoveEmptyEntries); return string.Join(" ", split); } //StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507 public static string user214147(string S) { string s = S.Trim(); bool iswhite = false; int iwhite; int sLength = s.Length; StringBuilder sb = new StringBuilder(sLength); foreach (char c in s.ToCharArray()) { if (Char.IsWhiteSpace(c)) { if (iswhite) { //Continuing whitespace ignore it. continue; } else { //New WhiteSpace //Replace whitespace with a single space. sb.Append(" "); //Set iswhite to True and any following whitespace will be ignored iswhite = true; } } else { sb.Append(c.ToString()); //reset iswhitespace to false iswhite = false; } } return sb.ToString(); } //StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507 public static string fubo(this string Value) { StringBuilder sbOut = new StringBuilder(); if (!string.IsNullOrEmpty(Value)) { bool IsWhiteSpace = false; for (int i = 0; i < Value.Length; i++) { if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace { if (!IsWhiteSpace) //Comparison with previous Char { sbOut.Append(Value[i]); IsWhiteSpace = true; } } else { IsWhiteSpace = false; sbOut.Append(Value[i]); } } } return sbOut.ToString(); } //David S. 2013 (https://stackoverflow.com/a/16035044/2352507) public static String SingleSpacedTrim(String inString) { StringBuilder sb = new StringBuilder(); Boolean inBlanks = false; foreach (Char c in inString) { switch (c) { case '\r': case '\n': case '\t': case ' ': if (!inBlanks) { inBlanks = true; sb.Append(' '); } continue; default: inBlanks = false; sb.Append(c); break; } } return sb.ToString().Trim(); } /// <summary> /// We want to run this item with max priory to lower the odds of /// the OS from doing program context switches in the middle of our code. /// source:https://stackoverflow.com/a/16157458 /// </summary> /// <returns>random seed</returns> private static long ConfigProgramForBenchmarking() { //prevent the JIT Compiler from optimizing Fkt calls away long seed = Environment.TickCount; //use the second Core/Processor for the test Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); //prevent "Normal" Processes from interrupting Threads Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High; //prevent "Normal" Threads from interrupting this thread Thread.CurrentThread.Priority = ThreadPriority.Highest; return seed; } 

}

基准说明:发布模式,无debugging器连接,i7处理器,平均4次运行,只testing短string

最小的解决scheme:

var regExp = / \ s + / g,newString = oldString.replace(regExp,'');

没有办法做到这一点。 你可以试试这个:

 private static readonly char[] whitespace = new char[] { ' ', '\n', '\t', '\r', '\f', '\v' }; public static string Normalize(string source) { return String.Join(" ", source.Split(whitespace, StringSplitOptions.RemoveEmptyEntries)); } 

这将删除前导和尾随whitespce以及将任何内部空白折叠为单个空格字符。 如果你真的只想崩溃空间,那么使用正则expression式的解决scheme更好; 否则这个解决scheme更好。 (请参阅Jon Skeet所做的分析 。)