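How do I replace multiple spaces with a single space?
Let's say I have a string such as:
"Hello     how are you           doing?"
I would like a function that turns multiple spaces into one space.
So I would get:
"Hello how are you doing?"
I know I could use a regex or call
string s = "Hello     how are you           doing?".Replace("  ", " ");
but I would have to call it multiple times to make sure every run of consecutive spaces is replaced with a single one.
Is there a built-in method for this?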
string cleanedString = System.Text.RegularExpressions.Regex.Replace(dirtyString,@"\s+"," ");
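This question isn't as simple as other posters have made it out to be (or as I originally believed it to be), because the question isn't very precise.
There's a difference between "space" and "whitespace". If you only mean spaces, then you should use a regex of " {2,}". If you mean any whitespace, that's a different matter. Should all whitespace be converted to spaces? What should happen to whitespace at the start and end?
For the benchmark below, I've assumed that you only care about spaces, and that you don't want to do anything to single spaces, even at the start and end.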
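To make the space/whitespace distinction concrete, here is a minimal sketch. The class and method names are hypothetical and only used for illustration; the two patterns are the ones discussed in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceVersusWhitespaceDemo
{
    // Collapses runs of two or more U+0020 spaces; tabs and newlines are left alone.
    public static string CollapseSpaces(string input) =>
        Regex.Replace(input, " {2,}", " ");

    // Replaces every run of whitespace (spaces, tabs, newlines, ...) with one space.
    public static string CollapseWhitespace(string input) =>
        Regex.Replace(input, @"\s+", " ");
}
```

For the input "a   b\t\tc", CollapseSpaces returns "a b\t\tc" (the tabs survive), while CollapseWhitespace returns "a b c".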
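Note that correctness is always more important than performance. The fact that the Split/Join solution removes any leading/trailing whitespace (even just single spaces) is incorrect as far as your specified requirements go (which may be incomplete, of course).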
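The difference at the edges of the string can be seen with a small sketch. The class and method names are hypothetical; the two bodies follow the regex and Split/Join approaches described in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EdgeSpaceDemo
{
    // Only touches runs of two or more spaces, so a single leading or
    // trailing space is preserved.
    public static string WithRegex(string input) =>
        Regex.Replace(input, " {2,}", " ");

    // Empty entries at the start and end are dropped, so leading and
    // trailing spaces disappear entirely.
    public static string WithSplitAndJoin(string input) =>
        string.Join(" ", input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
```

Given " a  b ", WithRegex returns " a b " and WithSplitAndJoin returns "a b". Which one is right depends on the requirement.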
The benchmark uses MiniBench.

    using System;
    using System.Text.RegularExpressions;
    using MiniBench;

    internal class Program
    {
        public static void Main(string[] args)
        {
            int size = int.Parse(args[0]);
            int gapBetweenExtraSpaces = int.Parse(args[1]);

            char[] chars = new char[size];
            for (int i=0; i < size/2; i += 2)
            {
                // Make sure there actually *is* something to do
                chars[i*2] = (i % gapBetweenExtraSpaces == 1) ? ' ' : 'x';
                chars[i*2 + 1] = ' ';
            }
            // Just to make sure we don't have a \0 at the end
            // for odd sizes
            chars[chars.Length-1] = 'y';

            string bigString = new string(chars);
            // Assume that one form works :)
            string normalized = NormalizeWithSplitAndJoin(bigString);

            var suite = new TestSuite<string, string>("Normalize")
                .Plus(NormalizeWithSplitAndJoin)
                .Plus(NormalizeWithRegex)
                .RunTests(bigString, normalized);

            suite.Display(ResultColumns.All, suite.FindBest());
        }

        private static readonly Regex MultipleSpaces =
            new Regex(@" {2,}", RegexOptions.Compiled);

        static string NormalizeWithRegex(string input)
        {
            return MultipleSpaces.Replace(input, " ");
        }

        // Guessing as the post doesn't specify what to use
        private static readonly char[] Whitespace = new char[] { ' ' };

        static string NormalizeWithSplitAndJoin(string input)
        {
            string[] split = input.Split
                (Whitespace, StringSplitOptions.RemoveEmptyEntries);
            return string.Join(" ", split);
        }
    }

A few test runs:

    c:\Users\Jon\Test>test 1000 50
    ============ Normalize ============
    NormalizeWithSplitAndJoin  1159091 0:30.258 22.93
    NormalizeWithRegex        26378882 0:30.025  1.00

    c:\Users\Jon\Test>test 1000 5
    ============ Normalize ============
    NormalizeWithSplitAndJoin   947540 0:30.013  1.07
    NormalizeWithRegex         1003862 0:29.610  1.00

    c:\Users\Jon\Test>test 1000 1001
    ============ Normalize ============
    NormalizeWithSplitAndJoin  1156299 0:29.898 21.99
    NormalizeWithRegex        23243802 0:27.335  1.00
Here, the first number is the number of iterations, the second is the time taken, and the third is a scaled score, with 1.0 being the best.
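The scaled score appears to be the time per iteration relative to the best candidate. That reading is an assumption (the post does not spell out the formula), but it reproduces the figures from the first run. The helper below is hypothetical and not part of MiniBench.

```csharp
using System;

public static class ScoreDemo
{
    // Hypothetical helper: time per iteration of one candidate divided by
    // the time per iteration of the best candidate.
    public static double RelativeScore(
        long iterations, double seconds, long bestIterations, double bestSeconds)
    {
        double perIteration = seconds / iterations;
        double bestPerIteration = bestSeconds / bestIterations;
        return perIteration / bestPerIteration;
    }
}
```

With the first run's numbers, RelativeScore(1159091, 30.258, 26378882, 30.025) comes out at roughly 22.93, matching the reported score for NormalizeWithSplitAndJoin.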
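This shows that in at least some cases (including this one), a regular expression can outperform the Split/Join solution, sometimes by a very significant margin.
However, if you change to an "all whitespace" requirement, then Split/Join does appear to win. As is so often the case, the devil is in the detail...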
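While the existing answers are fine, I'd like to point out one approach that doesn't work:

    public static string DontUseThisToCollapseSpaces(string text)
    {
        while (text.IndexOf("  ") != -1)
        {
            text = text.Replace("  ", " ");
        }
        return text;
    }

This can loop forever. Anyone care to guess why? (I only came across this when it was asked as a newsgroup question a few years ago... someone actually ran into it as a problem.)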
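One likely explanation, offered here as an assumption rather than something the post states: `string.IndexOf(string)` performs a culture-sensitive search, which can ignore zero-width characters such as U+200C between two spaces, while `string.Replace(string, string)` is ordinal and never sees a match there. The search keeps succeeding and the replacement keeps doing nothing. A sketch of a loop that uses the same ordinal comparison on both sides (the method name is hypothetical):

```csharp
using System;

public static class OrdinalCollapseDemo
{
    public static string CollapseSpacesOrdinal(string text)
    {
        // An ordinal search compares raw UTF-16 code units, which is also
        // what Replace(string, string) does, so the two always agree.
        while (text.IndexOf("  ", StringComparison.Ordinal) != -1)
        {
            text = text.Replace("  ", " ");
        }
        return text;
    }
}
```

This still makes several passes over long runs of spaces, so a single regex replacement is usually the simpler choice.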
A regular expression would be the easiest way. If you write the regex correctly, you won't need multiple calls.
Change it to this:
s = System.Text.RegularExpressions.Regex.Replace(s, @"\s{2,}", " ");
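As already pointed out, this is easily done with a regular expression. I'll just add that you might want to append a .Trim() to get rid of leading/trailing whitespace.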
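Combining the two suggestions might look like the following sketch. The class and method names are hypothetical; the pattern is the `\s{2,}` one from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CollapseAndTrimDemo
{
    // Collapse runs of two or more whitespace characters into one space,
    // then strip whatever whitespace remains at either end.
    public static string CollapseAndTrim(string input) =>
        Regex.Replace(input, @"\s{2,}", " ").Trim();
}
```

Note that `\s{2,}` leaves a lone tab or newline in the middle of the string untouched; use `\s+` if every whitespace character should become a space.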
I'm sharing what I use, because it appears I've come up with something different. I've been using this for a while and it is fast enough for me. I'm not sure how it stacks up against the others. I use it in a delimited file writer and run large DataTables through it one field at a time.

    public static string NormalizeWhiteSpace(string S)
    {
        string s = S.Trim();
        bool iswhite = false;
        int iwhite;
        int sLength = s.Length;
        StringBuilder sb = new StringBuilder(sLength);
        foreach(char c in s.ToCharArray())
        {
            if(Char.IsWhiteSpace(c))
            {
                if (iswhite)
                {
                    //Continuing whitespace ignore it.
                    continue;
                }
                else
                {
                    //New WhiteSpace

                    //Replace whitespace with a single space.
                    sb.Append(" ");
                    //Set iswhite to True and any following whitespace will be ignored
                    iswhite = true;
                }
            }
            else
            {
                sb.Append(c.ToString());
                //reset iswhitespace to false
                iswhite = false;
            }
        }
        return sb.ToString();
    }
Using the test program that Jon Skeet posted, I tried to see whether I could get a hand-written loop to run faster.
I can beat NormalizeWithSplitAndJoin every time, but I only beat NormalizeWithRegex with inputs of 1000, 5.

    static string NormalizeWithLoop(string input)
    {
        StringBuilder output = new StringBuilder(input.Length);

        char lastChar = '*';  // anything other than space
        for (int i = 0; i < input.Length; i++)
        {
            char thisChar = input[i];
            if (!(lastChar == ' ' && thisChar == ' '))
                output.Append(thisChar);

            lastChar = thisChar;
        }

        return output.ToString();
    }

I have not looked at the machine code the jitter produces, but I expect the problem is the time taken by the calls to StringBuilder.Append(); doing much better would require unsafe code.
So Regex.Replace() is very fast and hard to beat!
Here is the solution I work with. No RegEx and no String.Split.

    // Extension method: must be declared inside a static class.
    public static string TrimWhiteSpace(this string Value)
    {
        StringBuilder sbOut = new StringBuilder();
        if (!string.IsNullOrEmpty(Value))
        {
            bool IsWhiteSpace = false;
            for (int i = 0; i < Value.Length; i++)
            {
                if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
                {
                    if (!IsWhiteSpace) //Comparison with previous Char
                    {
                        sbOut.Append(Value[i]);
                        IsWhiteSpace = true;
                    }
                }
                else
                {
                    IsWhiteSpace = false;
                    sbOut.Append(Value[i]);
                }
            }
        }
        return sbOut.ToString();
    }

So you can write:
string cleanedString = dirtyString.TrimWhiteSpace();
Regex regex = new Regex(@"\W+"); string outputString = regex.Replace(inputString, " ");
VB.NET
Linha.Split(" ").ToList().Where(Function(x) x <> "").ToArray()
C#
Linha.Split(' ').ToList().Where(x => x != "").ToArray()
Enjoy the power of LINQ =D
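The LINQ expressions above yield an array of non-empty pieces, so one more step is needed to get a single string back. A minimal sketch of that final join, with hypothetical class, method and parameter names:

```csharp
using System;
using System.Linq;

public static class LinqCollapseDemo
{
    // Split on single spaces, drop the empty pieces produced by consecutive
    // spaces, then join what is left with exactly one space.
    public static string CollapseSpaces(string line) =>
        string.Join(" ", line.Split(' ').Where(part => part != ""));
}
```

Like the Split/Join answers elsewhere in this thread, this also removes leading and trailing spaces.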
A fast remover of extra whitespace... This is the fastest one, and it is based on Felipe Machado's in-place copy.

    static string InPlaceCharArray(string str)
    {
        var len = str.Length;
        var src = str.ToCharArray();
        int dstIdx = 0;
        bool lastWasWS = false;
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            if (src[i] == '\u0020')
            {
                if (lastWasWS == false)
                {
                    src[dstIdx++] = ch;
                    lastWasWS = true;
                }
            }
            else
            {
                lastWasWS = false;
                src[dstIdx++] = ch;
            }
        }
        return new string(src, 0, dstIdx);
    }
The benchmarks...
InPlaceCharArraySpaceOnly by Felipe Machado on CodeProject 2015, modified by Sunsetquest for multi-space removal. Time: 3.75 ticks
InPlaceCharArray by Felipe Machado 2015, slightly modified by Sunsetquest for multi-space removal. Time: 6.50 ticks (also supports tabs)
SplitAndJoinOnSpace by Jon Skeet. Time: 13.25 ticks
StringBuilder by fubo. Time: 13.5 ticks (also supports tabs)
Regex with compile by Jon Skeet. Time: 17 ticks
StringBuilder by David S 2013. Time: 30.5 ticks
Regex with non-compile by Brandon. Time: 63.25 ticks
StringBuilder by user214147. Time: 77.125 ticks
Regex with non-compile by Tim Hoolihan. Time: 147.25 ticks
The benchmark code...
    using System;
    using System.Text.RegularExpressions;
    using System.Diagnostics;
    using System.Threading;
    using System.Text;

    static class Program
    {
        public static void Main(string[] args)
        {
            long seed = ConfigProgramForBenchmarking();

            Stopwatch sw = new Stopwatch();

            string warmup = "This is a Warm up function for best benchmark results." + seed;
            string input1 = "Hello World, how are you doing?" + seed;
            string input2 = "It\twas\t \tso nice to\t\t see you \tin 1950. \t" + seed;
            string correctOutput1 = "Hello World, how are you doing?" + seed;
            string correctOutput2 = "It\twas\tso nice to\tsee you in 1950. " + seed;
            string output1,output2;

            //warm-up timer function
            sw.Restart();
            sw.Stop();

            sw.Restart();
            sw.Stop();
            long baseVal = sw.ElapsedTicks;

            // InPlace Replace by Felipe Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
            output1 = InPlaceCharArraySpaceOnly (warmup);
            sw.Restart();
            output1 = InPlaceCharArraySpaceOnly (input1);
            output2 = InPlaceCharArraySpaceOnly (input2);
            sw.Stop();
            Console.WriteLine("InPlaceCharArraySpaceOnly : " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            // InPlace Replace by Felipe R. Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
            output1 = InPlaceCharArray(warmup);
            sw.Restart();
            output1 = InPlaceCharArray(input1);
            output2 = InPlaceCharArray(input2);
            sw.Stop();
            Console.WriteLine("InPlaceCharArray: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            //Regex with non-compile Tim Hoolihan (https://stackoverflow.com/a/1279874/2352507)
            string cleanedString =
            output1 = Regex.Replace(warmup, @"\s+", " ");
            sw.Restart();
            output1 = Regex.Replace(input1, @"\s+", " ");
            output2 = Regex.Replace(input2, @"\s+", " ");
            sw.Stop();
            Console.WriteLine("Regex by Tim Hoolihan: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            //Regex with compile by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
            output1 = MultipleSpaces.Replace(warmup, " ");
            sw.Restart();
            output1 = MultipleSpaces.Replace(input1, " ");
            output2 = MultipleSpaces.Replace(input2, " ");
            sw.Stop();
            Console.WriteLine("Regex with compile by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            //Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
            output1 = SplitAndJoinOnSpace(warmup);
            sw.Restart();
            output1 = SplitAndJoinOnSpace(input1);
            output2 = SplitAndJoinOnSpace(input2);
            sw.Stop();
            Console.WriteLine("Split And Join by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            //Regex by Brandon (https://stackoverflow.com/a/1279878/2352507)
            output1 = Regex.Replace(warmup, @"\s{2,}", " ");
            sw.Restart();
            output1 = Regex.Replace(input1, @"\s{2,}", " ");
            output2 = Regex.Replace(input2, @"\s{2,}", " ");
            sw.Stop();
            Console.WriteLine("Regex by Brandon: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            //StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507)
            output1 = user214147(warmup);
            sw.Restart();
            output1 = user214147(input1);
            output2 = user214147(input2);
            sw.Stop();
            Console.WriteLine("StringBuilder by user214147: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            //StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507)
            output1 = fubo(warmup);
            sw.Restart();
            output1 = fubo(input1);
            output2 = fubo(input2);
            sw.Stop();
            Console.WriteLine("StringBuilder by fubo: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

            //StringBuilder by David S 2013 (https://stackoverflow.com/a/16035044/2352507)
            output1 = SingleSpacedTrim(warmup);
            sw.Restart();
            output1 = SingleSpacedTrim(input1);
            output2 = SingleSpacedTrim(input2);
            sw.Stop();
            Console.WriteLine("StringBuilder(SingleSpacedTrim) by David S: " + (sw.ElapsedTicks - baseVal));
            Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
            Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
        }

        // InPlace Replace by Felipe Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
        static string InPlaceCharArray(string str)
        {
            var len = str.Length;
            var src = str.ToCharArray();
            int dstIdx = 0;
            bool lastWasWS = false;
            for (int i = 0; i < len; i++)
            {
                var ch = src[i];
                if (src[i] == '\u0020')
                {
                    if (lastWasWS == false)
                    {
                        src[dstIdx++] = ch;
                        lastWasWS = true;
                    }
                }
                else
                {
                    lastWasWS = false;
                    src[dstIdx++] = ch;
                }
            }
            return new string(src, 0, dstIdx);
        }

        // InPlace Replace by Felipe R. Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
        static string InPlaceCharArraySpaceOnly (string str)
        {
            var len = str.Length;
            var src = str.ToCharArray();
            int dstIdx = 0;
            bool lastWasWS = false; //Added line
            for (int i = 0; i < len; i++)
            {
                var ch = src[i];
                switch (ch)
                {
                    case '\u0020': //SPACE
                    case '\u00A0': //NO-BREAK SPACE
                    case '\u1680': //OGHAM SPACE MARK
                    case '\u2000': // EN QUAD
                    case '\u2001': //EM QUAD
                    case '\u2002': //EN SPACE
                    case '\u2003': //EM SPACE
                    case '\u2004': //THREE-PER-EM SPACE
                    case '\u2005': //FOUR-PER-EM SPACE
                    case '\u2006': //SIX-PER-EM SPACE
                    case '\u2007': //FIGURE SPACE
                    case '\u2008': //PUNCTUATION SPACE
                    case '\u2009': //THIN SPACE
                    case '\u200A': //HAIR SPACE
                    case '\u202F': //NARROW NO-BREAK SPACE
                    case '\u205F': //MEDIUM MATHEMATICAL SPACE
                    case '\u3000': //IDEOGRAPHIC SPACE
                    case '\u2028': //LINE SEPARATOR
                    case '\u2029': //PARAGRAPH SEPARATOR
                    case '\u0009': //[ASCII Tab]
                    case '\u000A': //[ASCII Line Feed]
                    case '\u000B': //[ASCII Vertical Tab]
                    case '\u000C': //[ASCII Form Feed]
                    case '\u000D': //[ASCII Carriage Return]
                    case '\u0085': //NEXT LINE
                        if (lastWasWS == false) //Added line
                        {
                            src[dstIdx++] = ch; //Added line
                            lastWasWS = true; //Added line
                        }
                        continue;
                    default:
                        lastWasWS = false; //Added line
                        src[dstIdx++] = ch;
                        break;
                }
            }
            return new string(src, 0, dstIdx);
        }

        static readonly Regex MultipleSpaces =
            new Regex(@" {2,}", RegexOptions.Compiled);

        //Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
        static string SplitAndJoinOnSpace(string input)
        {
            string[] split = input.Split(new char[] { ' '}, StringSplitOptions.RemoveEmptyEntries);
            return string.Join(" ", split);
        }

        //StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507)
        public static string user214147(string S)
        {
            string s = S.Trim();
            bool iswhite = false;
            int iwhite;
            int sLength = s.Length;
            StringBuilder sb = new StringBuilder(sLength);
            foreach (char c in s.ToCharArray())
            {
                if (Char.IsWhiteSpace(c))
                {
                    if (iswhite)
                    {
                        //Continuing whitespace ignore it.
                        continue;
                    }
                    else
                    {
                        //New WhiteSpace

                        //Replace whitespace with a single space.
                        sb.Append(" ");
                        //Set iswhite to True and any following whitespace will be ignored
                        iswhite = true;
                    }
                }
                else
                {
                    sb.Append(c.ToString());
                    //reset iswhitespace to false
                    iswhite = false;
                }
            }
            return sb.ToString();
        }

        //StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507)
        public static string fubo(this string Value)
        {
            StringBuilder sbOut = new StringBuilder();
            if (!string.IsNullOrEmpty(Value))
            {
                bool IsWhiteSpace = false;
                for (int i = 0; i < Value.Length; i++)
                {
                    if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
                    {
                        if (!IsWhiteSpace) //Comparison with previous Char
                        {
                            sbOut.Append(Value[i]);
                            IsWhiteSpace = true;
                        }
                    }
                    else
                    {
                        IsWhiteSpace = false;
                        sbOut.Append(Value[i]);
                    }
                }
            }
            return sbOut.ToString();
        }

        //David S. 2013 (https://stackoverflow.com/a/16035044/2352507)
        public static String SingleSpacedTrim(String inString)
        {
            StringBuilder sb = new StringBuilder();
            Boolean inBlanks = false;
            foreach (Char c in inString)
            {
                switch (c)
                {
                    case '\r':
                    case '\n':
                    case '\t':
                    case ' ':
                        if (!inBlanks)
                        {
                            inBlanks = true;
                            sb.Append(' ');
                        }
                        continue;
                    default:
                        inBlanks = false;
                        sb.Append(c);
                        break;
                }
            }
            return sb.ToString().Trim();
        }

        /// <summary>
        /// We want to run this item with max priory to lower the odds of
        /// the OS from doing program context switches in the middle of our code.
        /// source:https://stackoverflow.com/a/16157458
        /// </summary>
        /// <returns>random seed</returns>
        private static long ConfigProgramForBenchmarking()
        {
            //prevent the JIT Compiler from optimizing Fkt calls away
            long seed = Environment.TickCount;
            //use the second Core/Processor for the test
            Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
            //prevent "Normal" Processes from interrupting Threads
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
            //prevent "Normal" Threads from interrupting this thread
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
            return seed;
        }
    }
Benchmark notes: Release mode, no debugger attached, i7 processor, average of 4 runs, only short strings tested.
The smallest solution (JavaScript):
var regExp = /\s+/g, newString = oldString.replace(regExp, ' ');
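There is no built-in way to do this. You can try this:

    private static readonly char[] whitespace = new char[] { ' ', '\n', '\t', '\r', '\f', '\v' };

    public static string Normalize(string source)
    {
        return String.Join(" ", source.Split(whitespace, StringSplitOptions.RemoveEmptyEntries));
    }

This will remove leading and trailing whitespace as well as collapse any internal whitespace to a single space character. If you really only want to collapse spaces, then the solutions using a regular expression are better; otherwise this solution is better. (See the analysis done by Jon Skeet.)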
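As a variation on the code above, the explicit character list can be replaced by a null separator array: `String.Split` is documented to treat white-space characters as the delimiters when the separator is null or empty. The class name below is hypothetical, and this is a sketch rather than the answer's own method.

```csharp
using System;

public static class NullSeparatorDemo
{
    // A null separator array makes Split break on every character for which
    // Char.IsWhiteSpace returns true, including Unicode spaces such as U+00A0.
    public static string Normalize(string source) =>
        string.Join(" ", source.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}
```

This covers Unicode whitespace that the fixed six-character list would miss, at the cost of being less explicit about which characters count.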