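Java: replacing multiple different substrings in a string at once (or in the most efficient way)

I need to replace many different substrings in a string in the most efficient way. Is there another way, other than the brute-force approach of replacing each field with string.replace?

If the string you are operating on is very long, or you are operating on many strings, then it could be worthwhile to use a java.util.regex.Matcher (the pattern takes time to compile up front, so it won't be efficient if your input is very small or your search pattern changes frequently).

Below is a full example, based on a list of tokens taken from a map. (It uses StringUtils from Apache Commons Lang.)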

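    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    // Commons Lang 3 package; Lang 2.x uses org.apache.commons.lang instead.
    import org.apache.commons.lang3.StringUtils;

    Map<String, String> tokens = new HashMap<String, String>();
    tokens.put("cat", "Garfield");
    tokens.put("beverage", "coffee");

    String template = "%cat% really needs some %beverage%.";

    // Create pattern of the format "%(cat|beverage)%"
    String patternString = "%(" + StringUtils.join(tokens.keySet(), "|") + ")%";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(template);

    // Copy the text between matches and substitute each token as it is found.
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        matcher.appendReplacement(sb, tokens.get(matcher.group(1)));
    }
    matcher.appendTail(sb);

    System.out.println(sb.toString());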

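Once the regular expression is compiled, scanning the input string is generally very quick (although if your regular expression is complex or involves backtracking, you would still need to benchmark to confirm this!).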
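To get the benefit of the one-time compilation, the pattern can be built once and reused across many inputs. The following is a minimal sketch of that idea, not part of the original answer: the class name TokenReplacer is hypothetical, a non-empty token map is assumed, and Matcher.replaceAll with a function argument requires Java 9 or later. Keys are quoted with Pattern.quote and values with Matcher.quoteReplacement so that regex metacharacters on either side are treated literally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: compiles the token pattern once and reuses it. */
final class TokenReplacer {
    private final Pattern pattern;
    private final Map<String, String> tokens;

    /** Assumes a non-empty map whose keys are the token names without the % signs. */
    TokenReplacer(Map<String, String> tokens) {
        this.tokens = tokens;
        // Quote each key so regex metacharacters in keys are matched literally.
        String alternation = tokens.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.pattern = Pattern.compile("%(" + alternation + ")%");
    }

    String apply(String template) {
        // Matcher.replaceAll(Function) requires Java 9 or later.
        // quoteReplacement keeps '$' and '\' in the values literal.
        return pattern.matcher(template).replaceAll(
                m -> Matcher.quoteReplacement(tokens.get(m.group(1))));
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put("cat", "Garfield");
        tokens.put("beverage", "coffee");

        // The pattern is compiled once here and reused for both templates.
        TokenReplacer replacer = new TokenReplacer(tokens);
        System.out.println(replacer.apply("%cat% really needs some %beverage%."));
        System.out.println(replacer.apply("One %beverage% for %cat%, please."));
    }
}
```

Whether this pays off still depends on how many inputs share the same token set, so it is worth measuring.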

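Algorithm

One of the most efficient ways to replace matching strings (without regular expressions) is to use the Aho-Corasick algorithm with a performant Trie (pronounced "try"), a fast hashing algorithm, and an efficient collections implementation.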
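For readers unfamiliar with Aho-Corasick, here is a minimal, unoptimized sketch of the matching half of the algorithm. It is an illustration only and not the library benchmarked in this answer: the class name AhoCorasickSketch is hypothetical, it uses plain HashMap transitions, and it only reports match positions (overlap handling and the actual replacement are left out).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Minimal Aho-Corasick sketch: reports {start, endExclusive} for every keyword occurrence. */
final class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                        // state for the longest proper suffix in the trie
        final List<Integer> lengths = new ArrayList<>();  // lengths of keywords ending at this state
    }

    private final Node root = new Node();

    AhoCorasickSketch(Iterable<String> keywords) {
        // Phase 1: insert every keyword into the trie.
        for (String word : keywords) {
            if (word.isEmpty()) continue;
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.lengths.add(word.length());
        }
        // Phase 2: breadth-first pass that computes the failure links.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != null && !f.next.containsKey(e.getKey())) f = f.fail;
                child.fail = (f == null) ? root : f.next.get(e.getKey());
                // Inherit the keywords that end at the failure state.
                child.lengths.addAll(child.fail.lengths);
                queue.add(child);
            }
        }
    }

    /** Scans the text once, following failure links instead of restarting on a mismatch. */
    List<int[]> findAll(String text) {
        List<int[]> matches = new ArrayList<>();
        Node state = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (state != root && !state.next.containsKey(c)) state = state.fail;
            state = state.next.getOrDefault(c, root);
            for (int len : state.lengths) {
                matches.add(new int[] {i + 1 - len, i + 1});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        AhoCorasickSketch ac = new AhoCorasickSketch(Arrays.asList("he", "she", "his", "hers"));
        String text = "ushers";
        for (int[] m : ac.findAll(text)) {
            System.out.println(text.substring(m[0], m[1]) + " at " + m[0]);
        }
    }
}
```

The key property is that the text is scanned once regardless of how many keywords there are, which is why the approach scales better than calling a replace per keyword.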

Simple code

Perhaps the simplest code leverages Apache's StringUtils.replaceEach, as follows:

    private String testStringUtils(
        final String text, final Map<String, String> definitions ) {
      final String[] keys = keys( definitions );
      final String[] values = values( definitions );

      return StringUtils.replaceEach( text, keys, values );
    }

This slows down on large texts.
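The keys and values helpers called above are not shown in the answer. One possible shape, assuming they only need to turn the map into two parallel arrays, is sketched below (the class name DefinitionArrays is hypothetical). For the standard java.util map implementations, keySet() and values() iterate in the same order as long as the map is not modified between the two calls, which is what keeps the arrays aligned.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers that convert a definitions map into parallel arrays. */
final class DefinitionArrays {
    /** Search strings, in the map's iteration order. */
    static String[] keys(final Map<String, String> definitions) {
        return definitions.keySet().toArray(new String[0]);
    }

    /** Replacement strings, in the same iteration order as keys(). */
    static String[] values(final Map<String, String> definitions) {
        return definitions.values().toArray(new String[0]);
    }

    public static void main(String[] args) {
        Map<String, String> definitions = new LinkedHashMap<>();
        definitions.put("cat", "Garfield");
        definitions.put("beverage", "coffee");

        System.out.println(Arrays.toString(keys(definitions)));
        System.out.println(Arrays.toString(values(definitions)));
    }
}
```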

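Fast code

Bor's implementation of the Aho-Corasick algorithm introduces a bit more complexity, which becomes an implementation detail when hidden behind a façade with the same method signature: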

    private String testBorAhoCorasick(
        final String text, final Map<String, String> definitions ) {
      // Create a buffer sufficiently large that re-allocations are minimized.
      final StringBuilder sb = new StringBuilder( text.length() << 1 );

      final TrieBuilder builder = Trie.builder();
      builder.onlyWholeWords();
      builder.removeOverlaps();

      final String[] keys = keys( definitions );

      for( final String key : keys ) {
        builder.addKeyword( key );
      }

      final Trie trie = builder.build();
      final Collection<Emit> emits = trie.parseText( text );

      int prevIndex = 0;

      for( final Emit emit : emits ) {
        final int matchIndex = emit.getStart();

        // Copy the unmatched text, then the replacement for the keyword.
        sb.append( text.substring( prevIndex, matchIndex ) );
        sb.append( definitions.get( emit.getKeyword() ) );
        prevIndex = emit.getEnd() + 1;
      }

      // Add the remainder of the string (contains no more matches).
      sb.append( text.substring( prevIndex ) );

      return sb.toString();
    }

Benchmarks

For the benchmarks, the buffer was created using randomNumeric as follows:

    private final static int TEXT_SIZE = 1000;
    private final static int MATCHES_DIVISOR = 10;

    private final static StringBuilder SOURCE
        = new StringBuilder( randomNumeric( TEXT_SIZE ) );

Where MATCHES_DIVISOR dictates the number of variables to inject:

    private void injectVariables( final Map<String, String> definitions ) {
      for( int i = (SOURCE.length() / MATCHES_DIVISOR) + 1; i > 0; i-- ) {
        final int r = current().nextInt( 1, SOURCE.length() );
        SOURCE.insert( r, randomKey( definitions ) );
      }
    }

The benchmark code itself (JMH seemed like overkill):

    long duration = System.nanoTime();
    final String result = testBorAhoCorasick( text, definitions );
    duration = System.nanoTime() - duration;
    System.out.println( elapsed( duration ) );
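The elapsed helper is not shown in the answer. Judging only from the reported results, which give whole seconds followed by total milliseconds, it might look something like the sketch below. The class name and the exact wording of the format string are assumptions.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical formatter for a System.nanoTime() difference. */
final class Elapsed {
    static String elapsed(final long nanos) {
        // Whole seconds and total milliseconds, both truncated.
        final long seconds = TimeUnit.NANOSECONDS.toSeconds(nanos);
        final long millis = TimeUnit.NANOSECONDS.toMillis(nanos);
        return String.format("%d seconds, %d millis", seconds, millis);
    }

    public static void main(String[] args) {
        long duration = System.nanoTime();
        // ... code under test would run here ...
        duration = System.nanoTime() - duration;
        System.out.println(elapsed(duration));
    }
}
```

A single timed run like this is sensitive to JIT warm-up, so the absolute numbers should be read as rough indications.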

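1,000,000 : 1,000

A simple micro-benchmark using 1,000,000 characters and 1,000 randomly placed strings to replace.

  • testStringUtils: 25 seconds, 25533 millis
  • testBorAhoCorasick: 0 seconds, 68 millis

No contest.

10,000 : 1,000

Using 10,000 characters and 1,000 matching strings to replace:

  • testStringUtils: 1 second, 1402 millis
  • testBorAhoCorasick: 0 seconds, 37 millis

The divide closes.

1,000 : 10

Using 1,000 characters and 10 matching strings to replace:

  • testStringUtils: 0 seconds, 7 millis
  • testBorAhoCorasick: 0 seconds, 19 millis

For short strings, the overhead of setting up Aho-Corasick eclipses the brute-force approach of StringUtils.replaceEach.

A hybrid approach based on text length is possible, to get the best of both implementations.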
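A hybrid could be as simple as a façade that dispatches on text length. The sketch below is one way to express it. The class name HybridReplacer and the 5,000-character threshold are assumptions (the measurements above only show that the cut-over lies somewhere between 1,000 and 10,000 characters for those key counts), and the two strategies are passed in as functions so that the example stays self-contained.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical façade that picks a replacement strategy from the text length. */
final class HybridReplacer {
    private final int threshold;
    private final BiFunction<String, Map<String, String>, String> shortTextStrategy;
    private final BiFunction<String, Map<String, String>, String> longTextStrategy;

    HybridReplacer(
            final int threshold,
            final BiFunction<String, Map<String, String>, String> shortTextStrategy,
            final BiFunction<String, Map<String, String>, String> longTextStrategy) {
        this.threshold = threshold;
        this.shortTextStrategy = shortTextStrategy;
        this.longTextStrategy = longTextStrategy;
    }

    String replace(final String text, final Map<String, String> definitions) {
        // Below the threshold the set-up cost of the trie is assumed to dominate.
        return (text.length() < threshold ? shortTextStrategy : longTextStrategy)
                .apply(text, definitions);
    }

    public static void main(String[] args) {
        // Stand-in strategies that only label their input, to make the dispatch visible.
        // In practice these would be testStringUtils and testBorAhoCorasick.
        final HybridReplacer replacer = new HybridReplacer(
                5_000,
                (text, defs) -> "brute force, length " + text.length(),
                (text, defs) -> "Aho-Corasick, length " + text.length());

        final Map<String, String> none = Collections.emptyMap();
        System.out.println(replacer.replace("short text", none));
        System.out.println(replacer.replace(new String(new char[10_000]), none));
    }
}
```

The right threshold depends on the number of keys as well as the text length, so it should be tuned by measurement.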

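Implementations

Consider comparing other implementations for text longer than 1 MB, including:

Papers

Papers and information relating to the algorithm: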

If you are going to change a String many times, it is usually more efficient to use a StringBuilder (but measure your performance to find out):

    String str = "The rain in Spain falls mainly on the plain";
    StringBuilder sb = new StringBuilder(str);
    // do your replacing in sb - although you'll find this trickier than simply using String
    String newStr = sb.toString();

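Every time you do a replace on a String, it creates a new String object, because Strings are immutable. StringBuilder is mutable, that is, it can be changed as much as you want.

StringBuilder can perform the replacements more efficiently, since its character array buffer can be sized to the required length up front. StringBuilder is designed for more than appending!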
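As a rough sketch of what replacing inside a StringBuilder can look like, the hypothetical helper below combines StringBuilder.indexOf and StringBuilder.replace. The names BuilderReplace and replaceAll are made up for illustration.

```java
/** Hypothetical helper: replaces every occurrence of a target inside a StringBuilder. */
final class BuilderReplace {
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // nothing sensible to search for
        }
        int index = sb.indexOf(target);
        while (index >= 0) {
            sb.replace(index, index + target.length(), replacement);
            // Resume after the inserted text so the replacement itself is never rescanned.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("The rain in Spain falls mainly on the plain");
        replaceAll(sb, "ain", "AIN");
        System.out.println(sb); // The rAIN in SpAIN falls mAINly on the plAIN
    }
}
```

Note that each sb.replace call may shift the characters after the match, so with many matches in a long buffer this is not automatically faster than building a fresh result. Measuring is the only way to know.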

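Of course, the real question is whether this is an optimisation too far. The JVM is very good at handling the creation of many objects and the subsequent garbage collection, and as with all optimisation questions, my first question is whether you have measured this and determined that it is actually a problem.

How about using the replaceAll() method?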
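Two things are worth keeping in mind with chained calls, illustrated by the small sketch below (the class and method names are made up). First, replaceAll treats its first argument as a regular expression, whereas replace takes it literally. Second, chained calls run one after another, so a later call can rewrite the output of an earlier one, which is different from replacing all keys simultaneously.

```java
/** Illustrates two pitfalls of chaining String.replace / String.replaceAll. */
final class ChainedReplace {
    /** Applies a->b, b->c, c->a one after another. */
    static String chained(String s) {
        return s.replace("a", "b").replace("b", "c").replace("c", "a");
    }

    public static void main(String[] args) {
        // Later calls see the output of earlier ones: "abc" -> "bbc" -> "ccc" -> "aaa".
        System.out.println(chained("abc"));

        // replaceAll interprets "." as "any character"; replace matches it literally.
        System.out.println("a.b".replaceAll(".", "-"));
        System.out.println("a.b".replace(".", "-"));
    }
}
```

A simultaneous replacement of the same three mappings would turn "abc" into "bca" instead.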

Check this:

String.format(String format, Object... args)

For example:

String.format("Put your %s where your %s is", "money", "mouth");

Rythm is a Java template engine, now released with a new feature called String interpolation mode, which allows you to do something like:

 String result = Rythm.render("@name is inviting you", "Diana"); 

The case above shows that you can pass arguments to a template by position. Rythm also allows you to pass arguments by name:

    Map<String, Object> args = new HashMap<String, Object>();
    args.put("title", "Mr.");
    args.put("name", "John");
    String result = Rythm.render("Hello @title @name", args);

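Note that Rythm is very fast, about 2 to 3 times faster than String.format and Velocity, because it compiles the template into Java bytecode; the runtime performance is very close to concatenation with StringBuilder.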

    import java.util.Collections;
    import java.util.Map;
    import java.util.TreeMap;

    public String replace(String input, Map<String, String> pairs) {
        // Reverse lexic-order of keys is good enough for most cases,
        // as it puts longer words before their prefixes ("tool" before "too").
        // However, there are corner cases, which this algorithm doesn't handle
        // no matter what order of keys you choose, eg. it fails to match "edit"
        // before "bed" in "..bedit.." because "bed" appears first in the input,
        // but "edit" may be the desired longer match. Depends which you prefer.
        final Map<String, String> sorted =
                new TreeMap<String, String>(Collections.reverseOrder());
        sorted.putAll(pairs);
        final String[] keys = sorted.keySet().toArray(new String[sorted.size()]);
        final String[] vals = sorted.values().toArray(new String[sorted.size()]);
        final int lo = 0, hi = input.length();
        final StringBuilder result = new StringBuilder();
        int s = lo;
        for (int i = s; i < hi; i++) {
            for (int p = 0; p < keys.length; p++) {
                if (input.regionMatches(i, keys[p], 0, keys[p].length())) {
                    /* TODO: check for "edit", if this is "bed" in "..bedit.." case,
                     * ie look ahead for all prioritized/longer keys starting within
                     * the current match region; iff found, then ignore match ("bed")
                     * and continue search (find "edit" later), else handle match. */
                    // if (better-match-overlaps-right-ahead)
                    //     continue;
                    result.append(input, s, i).append(vals[p]);
                    i += keys[p].length();
                    s = i--;
                    // Stop testing further keys at this position; otherwise a key
                    // matching at the decremented index would append an invalid range.
                    break;
                }
            }
        }
        if (s == lo) // no matches? no changes!
            return input;
        return result.append(input, s, hi).toString();
    }
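To see why reverse lexical order puts longer words before their prefixes, the small sketch below sorts the keys mentioned in the comments with Collections.reverseOrder(). The class name ReverseOrderKeys is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Shows the key order produced by reverse lexical sorting. */
final class ReverseOrderKeys {
    static List<String> ordered(Collection<String> keys) {
        // A prefix sorts before the longer word in natural order,
        // so reversing the order puts the longer word first.
        TreeSet<String> sorted = new TreeSet<>(Collections.reverseOrder());
        sorted.addAll(keys);
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        System.out.println(ordered(Arrays.asList("too", "tool", "bed", "edit")));
        // [tool, too, edit, bed]
    }
}
```

The order only decides which key wins when several match at the same position; it does not help with the "bedit" case, where the competing matches start at different positions.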

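The following is based on Todd Owen's answer. That solution has the problem that if the replacements contain characters that have special meaning in regular expressions, you can get unexpected results. I also wanted to be able to optionally do a case-insensitive search. Here is what I came up with: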

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Performs simultaneous search/replace of multiple strings. Case sensitive!
     */
    public static String replaceMultiple(String target, Map<String, String> replacements) {
        return replaceMultiple(target, replacements, true);
    }

    /**
     * Performs simultaneous search/replace of multiple strings.
     *
     * @param target        string to perform replacements on.
     * @param replacements  map where key represents value to search for, and value represents replacement.
     * @param caseSensitive whether or not the search is case-sensitive.
     * @return replaced string
     */
    public static String replaceMultiple(String target, Map<String, String> replacements, boolean caseSensitive) {
        if (target == null || "".equals(target) || replacements == null || replacements.size() == 0)
            return target;

        // If we are doing case-insensitive replacements, we need to make the map
        // case-insensitive: make a new map with all-lower-case keys.
        if (!caseSensitive) {
            Map<String, String> altReplacements = new HashMap<String, String>(replacements.size());
            for (String key : replacements.keySet())
                altReplacements.put(key.toLowerCase(), replacements.get(key));

            replacements = altReplacements;
        }

        StringBuilder patternString = new StringBuilder();
        if (!caseSensitive)
            patternString.append("(?i)");

        patternString.append('(');
        boolean first = true;
        for (String key : replacements.keySet()) {
            if (first)
                first = false;
            else
                patternString.append('|');

            // Quote each key so that regex metacharacters are matched literally.
            patternString.append(Pattern.quote(key));
        }
        patternString.append(')');

        Pattern pattern = Pattern.compile(patternString.toString());
        Matcher matcher = pattern.matcher(target);

        StringBuffer res = new StringBuffer();
        while (matcher.find()) {
            String match = matcher.group(1);
            if (!caseSensitive)
                match = match.toLowerCase();
            // quoteReplacement keeps '$' and '\' in the replacement text literal.
            matcher.appendReplacement(res, Matcher.quoteReplacement(replacements.get(match)));
        }
        matcher.appendTail(res);

        return res.toString();
    }

Here are my unit test cases:

    @Test
    public void replaceMultipleTest() {
        assertNull(ExtStringUtils.replaceMultiple(null, null));
        assertNull(ExtStringUtils.replaceMultiple(null, Collections.<String, String>emptyMap()));
        assertEquals("", ExtStringUtils.replaceMultiple("", null));
        assertEquals("", ExtStringUtils.replaceMultiple("", Collections.<String, String>emptyMap()));

        assertEquals("folks, we are not sane anymore. with me, i promise you, we will burn in flames",
                ExtStringUtils.replaceMultiple(
                        "folks, we are not winning anymore. with me, i promise you, we will win big league",
                        makeMap("win big league", "burn in flames", "winning", "sane")));

        assertEquals("bcaacbbcaacb",
                ExtStringUtils.replaceMultiple("abccbaabccba", makeMap("a", "b", "b", "c", "c", "a")));

        assertEquals("bcaCBAbcCCBb",
                ExtStringUtils.replaceMultiple("abcCBAabCCBa", makeMap("a", "b", "b", "c", "c", "a")));

        assertEquals("bcaacbbcaacb",
                ExtStringUtils.replaceMultiple("abcCBAabCCBa", makeMap("a", "b", "b", "c", "c", "a"), false));

        // Adjacent replacements each contribute their own surrounding spaces,
        // hence the double spaces in the expected string.
        assertEquals("c colon  backslash temp backslash  star  dot  star ",
                ExtStringUtils.replaceMultiple("c:\\temp\\*.*",
                        makeMap(".", " dot ", ":", " colon ", "\\", " backslash ", "*", " star "), false));
    }

    // Builds a map from alternating key/value arguments.
    private Map<String, String> makeMap(String... vals) {
        Map<String, String> map = new HashMap<String, String>(vals.length / 2);
        for (int i = 1; i < vals.length; i += 2)
            map.put(vals[i - 1], vals[i]);
        return map;
    }