How do I tokenize a string in C++?

Java has a convenient split method:

    String str = "The quick brown fox";
    String[] results = str.split(" ");

Is there an easy way to do this in C++?

Your simple case can easily be built using the std::string::find method. However, take a look at Boost.Tokenizer. It's great. Boost generally has some very cool string tools.
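As a rough illustration of the std::string::find approach mentioned above, here is a minimal sketch. The function name split_with_find and its single-character delimiter are assumptions made for this example; they are not part of the original answer. Empty tokens between adjacent delimiters are kept.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): split `text` on a
// single-character delimiter using std::string::find.
std::vector<std::string> split_with_find(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos = text.find(delim, start);
    while (pos != std::string::npos) {
        // Copy the characters between the previous delimiter and this one.
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1;
        pos = text.find(delim, start);
    }
    // Whatever remains after the last delimiter is the final token.
    tokens.push_back(text.substr(start));
    return tokens;
}
```

Calling split_with_find("The quick brown fox", ' ') should give the four words in order. Skipping empty tokens, if that is what you want, would need an extra check before each push_back.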

The Boost tokenizer class can make this sort of thing quite simple:

    #include <iostream>
    #include <string>
    #include <boost/foreach.hpp>
    #include <boost/tokenizer.hpp>

    using namespace std;
    using namespace boost;

    int main(int, char**)
    {
        string text = "token, test   string";

        char_separator<char> sep(", ");
        tokenizer< char_separator<char> > tokens(text, sep);
        BOOST_FOREACH (const string& t, tokens) {
            cout << t << "." << endl;
        }
    }

Updated for C++11:

    #include <iostream>
    #include <string>
    #include <boost/tokenizer.hpp>

    using namespace std;
    using namespace boost;

    int main(int, char**)
    {
        string text = "token, test   string";

        char_separator<char> sep(", ");
        tokenizer<char_separator<char>> tokens(text, sep);
        for (const auto& t : tokens) {
            cout << t << "." << endl;
        }
    }

Here's a really simple one:

    #include <vector>
    #include <string>
    using namespace std;

    vector<string> split(const char *str, char c = ' ')
    {
        vector<string> result;

        do
        {
            const char *begin = str;

            while(*str != c && *str)
                str++;

            result.push_back(string(begin, str));
        } while (0 != *str++);

        return result;
    }

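Use strtok. In my opinion, there is no need to build a class around tokenizing unless strtok doesn't provide what you need. It might not, but in 15+ years of writing various parsing code in C and C++, I have always used strtok. Here is an example:

    char myString[] = "The quick brown fox";
    char *p = strtok(myString, " ");
    while (p) {
        printf ("Token: %s\n", p);
        p = strtok(NULL, " ");
    }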

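A few caveats (which might not suit your needs). The string is "destroyed" in the process, meaning that EOS ('\0') characters are written in place of the delimiters. Correct usage might require you to make a non-const copy of the string. You can also change the list of delimiters mid-parse.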

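In my opinion, the above code is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions that the language provides, and it does the job well and cleanly. It's simply a "C based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)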
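To illustrate the caveat about changing the delimiter list mid-parse, here is a small sketch. The function name split_key_value and the "key=value;..." input shape are assumptions invented for this example. The input is taken by value so that strtok modifies a private copy, not the caller's string.

```cpp
#include <cstring>
#include <string>
#include <utility>

// Hypothetical helper: extract the first "key=value" pair from an entry
// such as "angle=45;radius=9.9" by switching delimiters between calls.
std::pair<std::string, std::string> split_key_value(std::string entry) {
    // First call: tokenize on '='. strtok overwrites the '=' with '\0'.
    char* key = std::strtok(&entry[0], "=");
    // Second call continues in the same buffer, now splitting on ';'.
    char* value = std::strtok(nullptr, ";");
    return std::make_pair(std::string(key ? key : ""),
                          std::string(value ? value : ""));
}
```

Because strtok keeps hidden state between calls, a helper like this should not be interleaved with another strtok loop or called from several threads at once.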

Another quick way is to use getline. Something like:

    stringstream ss("bla bla");
    string s;

    while (getline(ss, s, ' ')) {
        cout << s << endl;
    }

If you want, you can make a simple split() method returning a vector<string>, which is really useful.
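One possible shape for such a split() helper, sketched with std::getline. The name split_with_getline is an assumption for this example, not something from the original answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: wrap the getline loop so that it returns a vector.
std::vector<std::string> split_with_getline(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string token;
    // getline reads up to (and discards) the next delimiter on each call.
    while (std::getline(stream, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Note that this keeps empty tokens between adjacent delimiters but does not produce a trailing empty token when the input ends with a delimiter.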

You can use streams, iterators, and the copy algorithm to do this fairly directly.

    #include <string>
    #include <vector>
    #include <iostream>
    #include <istream>
    #include <ostream>
    #include <iterator>
    #include <sstream>
    #include <algorithm>

    int main()
    {
        std::string str = "The quick brown fox";

        // construct a stream from the string
        std::stringstream strstr(str);

        // use stream iterators to copy the stream to the vector as whitespace separated strings
        std::istream_iterator<std::string> it(strstr);
        std::istream_iterator<std::string> end;
        std::vector<std::string> results(it, end);

        // send the vector to stdout.
        std::ostream_iterator<std::string> oit(std::cout);
        std::copy(results.begin(), results.end(), oit);
    }

No offense, folks, but for such a simple problem you are making things way too complicated. There are a lot of reasons to use Boost. But for something this simple, it's like hitting a fly with a 20 lb sledgehammer.

    void
    split( vector<string> & theStringVector,  /* Altered/returned value */
           const  string  & theString,
           const  string  & theDelimiter)
    {
        UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.

        size_t  start = 0, end = 0;

        while ( end != string::npos)
        {
            end = theString.find( theDelimiter, start);

            // If at end, use length=maxLength.  Else use length=end-start.
            theStringVector.push_back( theString.substr( start,
                           (end == string::npos) ? string::npos : end - start));

            // If at end, use start=maxSize.  Else use start=end+delimiter.
            start = (   ( end > (string::npos - theDelimiter.size()) )
                      ?  string::npos  :  end + theDelimiter.size());
        }
    }

For example (for Doug's case),

    #define SHOW(I,X)   cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl

    int
    main()
    {
        vector<string> v;

        split( v, "A:PEP:909:Inventory Item", ":" );

        for (unsigned int i = 0;  i < v.size();   i++)
            SHOW( i, v[i] );
    }

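And yes, we could have split() return a new vector rather than filling one that is passed in. It's trivial to wrap and overload. But depending on what I'm doing, I often find it better to re-use pre-existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)

Reference: http://www.cplusplus.com/reference/string/string/ .

(I was originally writing a response to Doug's question: C++ Strings Modifying and Extracting based on Separators (closed). But since Martin York closed that question with a pointer over here, I'll just generalize my code.)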

Boost has a strong split function: boost::algorithm::split.

Sample program:

    #include <iostream>
    #include <string>
    #include <vector>
    #include <boost/algorithm/string.hpp>

    int main() {
        auto s = "a,b, c ,,e,f,";
        std::vector<std::string> fields;
        boost::split(fields, s, boost::is_any_of(","));
        for (const auto& field : fields)
            std::cout << "\"" << field << "\"\n";
        return 0;
    }

Output:

    "a"
    "b"
    " c "
    ""
    "e"
    "f"
    ""

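I know you asked for a C++ solution, but you might find this helpful:

Qt

    #include <QString>

    ...

    QString str = "The quick brown fox";
    QStringList results = str.split(" ");

The advantage over Boost in this example is that it maps one-to-one onto the code in your post.

See more in the Qt documentation.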

Here is a sample tokenizer class that might do what you want:

    //Header file
    #include <string>

    class Tokenizer
    {
    public:
        static const std::string DELIMITERS;
        Tokenizer(const std::string& str);
        Tokenizer(const std::string& str, const std::string& delimiters);
        bool NextToken();
        bool NextToken(const std::string& delimiters);
        const std::string GetToken() const;
        void Reset();
    protected:
        size_t m_offset;
        const std::string m_string;
        std::string m_token;
        std::string m_delimiters;
    };

    //CPP file
    const std::string Tokenizer::DELIMITERS(" \t\n\r");

    // Initializers are listed in declaration order (m_offset first).
    Tokenizer::Tokenizer(const std::string& s) :
        m_offset(0),
        m_string(s),
        m_delimiters(DELIMITERS) {}

    Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
        m_offset(0),
        m_string(s),
        m_delimiters(delimiters) {}

    bool Tokenizer::NextToken()
    {
        return NextToken(m_delimiters);
    }

    bool Tokenizer::NextToken(const std::string& delimiters)
    {
        // Skip leading delimiters; stop if nothing but delimiters remains.
        size_t i = m_string.find_first_not_of(delimiters, m_offset);
        if (std::string::npos == i)
        {
            m_offset = m_string.length();
            return false;
        }

        // The token runs up to the next delimiter or the end of the string.
        size_t j = m_string.find_first_of(delimiters, i);
        if (std::string::npos == j)
        {
            m_token = m_string.substr(i);
            m_offset = m_string.length();
            return true;
        }

        m_token = m_string.substr(i, j - i);
        m_offset = j;
        return true;
    }

    // GetToken() and Reset() are declared above but not defined in the
    // original listing; these are the assumed, obvious definitions.
    const std::string Tokenizer::GetToken() const { return m_token; }
    void Tokenizer::Reset() { m_offset = 0; }

Example:

    std::vector <std::string> v;
    Tokenizer s("split this string", " ");
    while (s.NextToken())
    {
        v.push_back(s.GetToken());
    }

pystring is a small library which implements a bunch of Python's string functions, including the split method:

    #include <string>
    #include <vector>
    #include "pystring.h"

    std::vector<std::string> chunks;
    pystring::split("this string", chunks);

    // also can specify a separator
    pystring::split("this-string", chunks, "-");

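Here's a simple STL-only solution (~5 lines!) using std::string::find and std::string::find_first_not_of that handles repetitions of the delimiter (like spaces or periods, for instance), as well as leading and trailing delimiters:

    #include <string>
    #include <vector>

    // The original listing leaves DELIMITER undefined; a single space is
    // assumed here.
    static const char DELIMITER = ' ';

    void tokenize(std::string str, std::vector<std::string> &token_v){
        size_t start = str.find_first_not_of(DELIMITER), end=start;

        while (start != std::string::npos){
            // Find next occurrence of delimiter
            end = str.find(DELIMITER, start);
            // Push back the token found into vector
            token_v.push_back(str.substr(start, end-start));
            // Skip all occurrences of the delimiter to find new start
            start = str.find_first_not_of(DELIMITER, end);
        }
    }

Try it out!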

I posted this answer for a similar question.

Don't reinvent the wheel. I've used a number of libraries, and the fastest and most flexible I have come across is the C++ String Toolkit Library.

Here is an example of how to use it that I've posted elsewhere on Stack Overflow.

    #include <iostream>
    #include <vector>
    #include <string>
    #include <strtk.hpp>

    const char *whitespace  = " \t\r\n\f";
    const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

    int main()
    {
        {   // normal parsing of a string into a vector of strings
            std::string s("Somewhere down the road");
            std::vector<std::string> result;
            if( strtk::parse( s, whitespace, result ) )
            {
                for(size_t i = 0; i < result.size(); ++i )
                    std::cout << result[i] << std::endl;
            }
        }

        {   // parsing a string into a vector of floats with other separators
            // besides spaces
            std::string t("3.0, 3.14; 4.0");
            std::vector<float> values;
            if( strtk::parse( t, whitespace_and_punctuation, values ) )
            {
                for(size_t i = 0; i < values.size(); ++i )
                    std::cout << values[i] << std::endl;
            }
        }

        {   // parsing a string into specific variables
            std::string u("angle = 45; radius = 9.9");
            std::string w1, w2;
            float v1, v2;
            if( strtk::parse( u, whitespace_and_punctuation, w1, v1, w2, v2) )
            {
                std::cout << "word " << w1 << ", value " << v1 << std::endl;
                std::cout << "word " << w2 << ", value " << v2 << std::endl;
            }
        }

        return 0;
    }

Check this example. It might help you.

    #include <iostream>
    #include <sstream>
    #include <string>

    using namespace std;

    int main ()
    {
        string tmps;
        istringstream is ("the delimiter is the space");
        // Test the extraction itself so a failed read is never printed.
        while (is >> tmps) {
            cout << tmps << "\n";
        }
        return 0;
    }

A solution using regex_token_iterator:

    #include <iostream>
    #include <regex>
    #include <string>
    #include <vector>

    using namespace std;

    int main()
    {
        string str("The quick brown fox");

        regex reg("\\s+");

        // Submatch index -1 selects the text between matches of the separator.
        sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
        sregex_token_iterator end;

        vector<string> vec(iter, end);

        for (auto a : vec)
        {
            cout << a << endl;
        }
    }

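You can simply use a regular expression library and solve this with regular expressions.

Use the expression (\w+) and the variable in \1 (or $1, depending on the library's implementation of regular expressions).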
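A minimal sketch of that idea with the standard <regex> library, where the first capture group plays the role of \1. The function name words is an assumption for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every (\w+) match, reading capture group 1.
std::vector<std::string> words(const std::string& text) {
    static const std::regex word_re("(\\w+)");
    std::vector<std::string> out;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), end;
         it != end; ++it) {
        out.push_back((*it)[1].str());
    }
    return out;
}
```

With this pattern, punctuation and whitespace both act as separators, since neither matches \w.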

If you're willing to use C, you can use the strtok function. Be careful about multi-threading issues when using it.

For simple stuff I just use the following:

    unsigned TokenizeString(const std::string& i_source,
                            const std::string& i_seperators,
                            bool i_discard_empty_tokens,
                            std::vector<std::string>& o_tokens)
    {
        // size_type (not unsigned) so that comparison with npos is reliable.
        std::string::size_type prev_pos = 0;
        std::string::size_type pos = 0;
        unsigned number_of_tokens = 0;
        o_tokens.clear();
        pos = i_source.find_first_of(i_seperators, pos);
        while (pos != std::string::npos)
        {
            std::string token = i_source.substr(prev_pos, pos - prev_pos);
            if (!i_discard_empty_tokens || token != "")
            {
                o_tokens.push_back(token);
                number_of_tokens++;
            }

            pos++;
            prev_pos = pos;
            pos = i_source.find_first_of(i_seperators, pos);
        }

        if (prev_pos < i_source.length())
        {
            o_tokens.push_back(i_source.substr(prev_pos));
            number_of_tokens++;
        }

        return number_of_tokens;
    }

Cowardly disclaimer: I write real-time data processing software where the data comes in through binary files, sockets, or some API call (I/O cards, cameras). I never use this function for anything more complicated or time-critical than reading external configuration files on startup.

MFC/ATL has a very nice tokenizer. From MSDN:

    CAtlString str( "%First Second#Third" );
    CAtlString resToken;
    int curPos= 0;

    resToken= str.Tokenize("% #",curPos);
    while (resToken != "")
    {
        printf("Resulting token: %s\n", resToken);
        resToken= str.Tokenize("% #",curPos);
    };

Output

    Resulting token: First
    Resulting token: Second
    Resulting token: Third

There are many overly complicated suggestions here. Try this simple std::string solution:

    using namespace std;

    string someText = ...

    string::size_type tokenOff = 0, sepOff = tokenOff;
    while (sepOff != string::npos)
    {
        sepOff = someText.find(' ', sepOff);
        string::size_type tokenLen = (sepOff == string::npos) ? sepOff : sepOff++ - tokenOff;
        string token = someText.substr(tokenOff, tokenLen);
        if (!token.empty())
            /* do something with token */;
        tokenOff = sepOff;
    }

I think that's what the >> operator on string streams is for:

    string word;
    sin >> word;

Adam Pierce's answer provides a hand-spun tokenizer taking a const char*. It's a bit more problematic to do with iterators, because incrementing a string's end iterator is undefined. That said, given string str{ "The quick brown fox" } we can certainly accomplish this:

    auto start = find(cbegin(str), cend(str), ' ');
    vector<string> tokens{ string(cbegin(str), start) };

    while (start != cend(str)) {
        const auto finish = find(++start, cend(str), ' ');

        tokens.push_back(string(start, finish));
        start = finish;
    }

Live Example


If you're looking to abstract complexity by using standard functionality, as On Freund suggests, strtok is a simple option:

    vector<string> tokens;

    for (auto i = strtok(data(str), " "); i != nullptr; i = strtok(nullptr, " "))
        tokens.push_back(i);

If you don't have access to C++17, you'll need to substitute data(str) as in this example: http://ideone.com/8kAGoa

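Though not demonstrated in the example, strtok need not use the same delimiter for each token. Along with this advantage, though, there are several drawbacks:

  1. strtok cannot be used on multiple strings at the same time: either a nullptr must be passed to continue tokenizing the current string, or a new char* to tokenize must be passed (there are some non-standard implementations which do support this, however, such as strtok_s)
  2. For the same reason, strtok cannot be used on multiple threads simultaneously (this may however be implementation defined, for example: Visual Studio's implementation is thread safe)
  3. Calling strtok modifies the string it is operating on, so it cannot be used on const strings, const char*s, or literal strings; to tokenize any of these with strtok, or to operate on a string whose contents need to be preserved, str would have to be copied, and then the copy could be operated on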
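The third drawback can be worked around by tokenizing a copy. The sketch below assumes a hypothetical helper named tokenize_copy; it leaves the caller's string untouched but still inherits the first two drawbacks, because it relies on strtok's hidden state.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: run strtok on a private, writable copy of `text`.
std::vector<std::string> tokenize_copy(const std::string& text, const char* delims) {
    // Copy the characters and add the terminator that strtok expects.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* p = std::strtok(buffer.data(), delims); p != nullptr;
         p = std::strtok(nullptr, delims)) {
        tokens.emplace_back(p);  // each token is copied out of the buffer
    }
    return tokens;
}
```

The extra copy costs one allocation per call, which is usually small next to the per-token string allocations.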

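Neither of the previous methods can generate a tokenized vector in place, meaning that without abstracting them into a helper function they cannot initialize const vector<string> tokens. That functionality, and the ability to accept any white-space delimiter, can be harnessed using an istream_iterator. For example, given const string str{ "The quick \tbrown \nfox" } we can do this:

    istringstream is{ str };
    const vector<string> tokens{ istream_iterator<string>(is), istream_iterator<string>() };

Live Example

The construction of an istringstream required for this option has a far greater cost than the previous two options; however, this cost is typically hidden in the expense of string allocation.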


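If none of the above options is flexible enough for your tokenization needs, the most flexible option is a regex_token_iterator. Of course, this flexibility comes at greater expense, but again that is likely hidden in the string allocation cost. Say, for example, we want to tokenize on non-escaped commas, also eating white-space. Given the input const string str{ "The ,qu\\,ick ,\tbrown, fox" } we can do this:

    const regex re{ "\\s*((?:[^\\\\,]|\\\\.)*?)\\s*(?:,|$)" };
    const vector<string> tokens{ sregex_token_iterator(cbegin(str), cend(str), re, 1), sregex_token_iterator() };

Live Example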

Here's an approach that lets you control whether empty tokens are included (like strsep) or excluded (like strtok).

    #include <string.h> // for strchr and strlen
    #include <string>
    #include <vector>

    /*
     * want_empty_tokens==true  : include empty tokens, like strsep()
     * want_empty_tokens==false : exclude empty tokens, like strtok()
     */
    std::vector<std::string> tokenize(const char* src,
                                      char delim,
                                      bool want_empty_tokens)
    {
        std::vector<std::string> tokens;

        if (src and *src != '\0') // defensive
            while( true )  {
                const char* d = strchr(src, delim);
                size_t len = (d)? d-src : strlen(src);

                if (len or want_empty_tokens)
                    tokens.push_back( std::string(src, len) ); // capture token

                if (d) src += len+1; else break;
            }

        return tokens;
    }

It seems odd to me that, with all us speed-conscious nerds here on SO, no one has presented a version that uses a compile-time generated lookup table for the delimiters (example implementation further down). Using a lookup table and iterators should beat std::regex in efficiency; however, if you don't need to beat regex, just use it. It's standard as of C++11 and super flexible.

Some have suggested regex already, but for the noobs here is a packaged example that should do exactly what the OP expects:

    #include <iostream>
    #include <iterator>
    #include <regex>
    #include <string>
    #include <vector>

    std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
        std::smatch m{};
        std::vector<std::string> ret{};
        while (std::regex_search (it,end,m,e)) {
            ret.emplace_back(m.str());
            std::advance(it, m.position() + m.length()); //next start position = match position + match length
        }
        return ret;
    }
    std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){  //comfort version calls flexible version
        return split(s.cbegin(), s.cend(), std::move(e));
    }
    int main ()
    {
        std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
        auto v = split(str);
        for(const auto&s:v){
            std::cout << s << std::endl;
        }
        std::cout << "crazy version:" << std::endl;
        v = split(str, std::regex{"[^e]+"});  //using e as delim shows flexibility
        for(const auto&s:v){
            std::cout << s << std::endl;
        }
        return 0;
    }

If we need to be faster and accept the constraint that all chars must be 8 bits, we can build a lookup table at compile time using metaprogramming:

    #include <type_traits>

    template<bool...> struct BoolSequence{};        //just here to hold bools
    template<char...> struct CharSequence{};        //just here to hold chars
    template<typename T, char C> struct Contains;   //generic
    template<char First, char... Cs, char Match>    //not first specialization
    struct Contains<CharSequence<First, Cs...>,Match> :
        Contains<CharSequence<Cs...>, Match>{};     //strip first and increase index
    template<char First, char... Cs>                //is first specialization
    struct Contains<CharSequence<First, Cs...>,First>: std::true_type {};
    template<char Match>                            //not found specialization
    struct Contains<CharSequence<>,Match>: std::false_type{};

    template<int I, typename T, typename U>
    struct MakeSequence;                            //generic
    template<int I, bool... Bs, typename U>
    struct MakeSequence<I,BoolSequence<Bs...>, U>:  //not last
        // explicit cast: values above 127 do not fit a signed char implicitly
        MakeSequence<I-1, BoolSequence<Contains<U,static_cast<char>(I-1)>::value,Bs...>, U>{};
    template<bool... Bs, typename U>
    struct MakeSequence<0,BoolSequence<Bs...>,U>{   //last
        using Type = BoolSequence<Bs...>;
    };
    template<typename T> struct BoolASCIITable;
    template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
        /* could be made constexpr but not yet supported by MSVC */
        static bool isDelim(const char c){
            static const bool table[256] = {Bs...};
            // index as unsigned char so negative char values stay in range
            return table[static_cast<unsigned char>(c)];
        }
    };
    using Delims = CharSequence<'.',',',' ',':','\n'>; //list your custom delimiters here
    using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;

With that in place, making a getNextToken function is easy (this needs <algorithm> and <utility>):

    template<typename T_It>
    std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
        // Table only exposes a static isDelim(), so wrap it in a callable.
        auto isDelim = [](const char c){ return Table::isDelim(c); };
        begin = std::find_if_not(begin,end,isDelim);    //find first non delim or end
        auto second = std::find_if(begin,end,isDelim);  //find first delim or end
        return std::make_pair(begin,second);
    }

Using it is also easy:

    int main() {
        std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
        auto it = std::begin(s);
        auto end = std::end(s);
        while(it != std::end(s)){
            auto token = getNextToken(it,end);
            std::cout << std::string(token.first,token.second) << std::endl;
            it = token.second;
        }
        return 0;
    }

Here is a live example: http://ideone.com/GKtkLQ

There is no direct way to do this. Refer to this Code Project source code to find out how to build a class for it.

You can take advantage of boost::make_find_iterator. Something similar to this:

    template<typename CH>
    inline vector< basic_string<CH> > tokenize(
        const basic_string<CH> &Input,
        const basic_string<CH> &Delimiter,
        bool remove_empty_token
        ) {

        typedef typename basic_string<CH>::const_iterator string_iterator_t;
        typedef boost::find_iterator< string_iterator_t > string_find_iterator_t;

        vector< basic_string<CH> > Result;
        string_iterator_t it = Input.begin();
        string_iterator_t it_end = Input.end();
        for(string_find_iterator_t i = boost::make_find_iterator(Input, boost::first_finder(Delimiter, boost::is_equal()));
            i != string_find_iterator_t();
            ++i) {
            if(remove_empty_token){
                if(it != i->begin())
                    Result.push_back(basic_string<CH>(it,i->begin()));
            }
            else
                Result.push_back(basic_string<CH>(it,i->begin()));
            it = i->end();
        }
        if(it != it_end)
            Result.push_back(basic_string<CH>(it,it_end));

        return Result;
    }

If the maximum length of the input string to be tokenized is known, one can exploit this and implement a very fast version. I am sketching the basic idea below, which was inspired by both strtok() and the "suffix array" data structure described in Jon Bentley's "Programming Pearls", 2nd edition, chapter 15. The C++ class in this case only provides some organization and convenience of use. The implementation shown can easily be extended to remove leading and trailing whitespace characters from the tokens.

Basically, one can replace the separator characters with string-terminating '\0' characters and set pointers to the tokens within the modified string. In the extreme case where the string consists only of separators, one gets string-length plus 1 empty tokens. It is practical to duplicate the string to be modified.

Header file:

    #include <cassert>
    #include <cstring>

    class TextLineSplitter
    {
    public:

        TextLineSplitter( const size_t max_line_len );

        ~TextLineSplitter();

        void            SplitLine( const char *line,
                                   const char sep_char = ',' );

        inline size_t   NumTokens( void ) const
        {
            return mNumTokens;
        }

        const char *    GetToken( const size_t token_idx ) const
        {
            assert( token_idx < mNumTokens );
            return mTokens[ token_idx ];
        }

    private:
        const size_t    mStorageSize;

        char           *mBuff;
        char          **mTokens;
        size_t          mNumTokens;

        inline void     ResetContent( void )
        {
            memset( mBuff, 0, mStorageSize );
            // mark all items as empty:
            memset( mTokens, 0, mStorageSize * sizeof( char* ) );
            // reset counter for found items:
            mNumTokens = 0L;
        }
    };

Implementation file:

    TextLineSplitter::TextLineSplitter( const size_t max_line_len ):
        mStorageSize ( max_line_len + 1L )
    {
        // allocate memory
        mBuff   = new char  [ mStorageSize ];
        mTokens = new char* [ mStorageSize ];

        ResetContent();
    }

    TextLineSplitter::~TextLineSplitter()
    {
        delete [] mBuff;
        delete [] mTokens;
    }

    void TextLineSplitter::SplitLine( const char *line,
                                      const char sep_char   /* = ',' */ )
    {
        assert( sep_char != '\0' );

        ResetContent();
        // copy at most max_line_len characters; the buffer is already
        // zero-filled, so the copy stays terminated
        strncpy( mBuff, line, mStorageSize - 1 );

        size_t idx       = 0L; // running index for characters

        do
        {
            assert( idx < mStorageSize );

            const char chr = line[ idx ]; // retrieve current character

            if( mTokens[ mNumTokens ] == NULL )
            {
                mTokens[ mNumTokens ] = &mBuff[ idx ];
            } // if

            if( chr == sep_char || chr == '\0' )
            { // item or line finished
                // overwrite separator with a 0-terminating character:
                mBuff[ idx ] = '\0';
                // count-up items:
                mNumTokens ++;
            } // if

        } while( line[ idx++ ] );
    }

A scenario of usage would be:

    // create an instance capable of splitting strings up to 1000 chars long:
    TextLineSplitter spl( 1000 );
    spl.SplitLine( "Item1,,Item2,Item3" );
    for( size_t i = 0; i < spl.NumTokens(); i++ )
    {
        printf( "%s\n", spl.GetToken( i ) );
    }

Output (the empty second line is the empty token between the two adjacent commas):

    Item1

    Item2
    Item3

boost::tokenizer is your friend, but consider making your code portable with reference to internationalization (i18n) issues by using wstring / wchar_t instead of the legacy string / char types.

    #include <iostream>
    #include <boost/tokenizer.hpp>
    #include <string>

    using namespace std;
    using namespace boost;

    typedef tokenizer<char_separator<wchar_t>,
                      wstring::const_iterator, wstring> Tok;

    int main()
    {
      wstring s;
      while (getline(wcin, s)) {
        char_separator<wchar_t> sep(L" "); // list of separator characters
        Tok tok(s, sep);
        for (Tok::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
          wcout << *beg << L"\t"; // output (or store in vector)
        }
        wcout << L"\n";
      }
      return 0;
    }

Simple C++ code (standard C++98), accepts multiple delimiters (specified in a std::string), uses only vectors, strings and iterators.

    #include <iostream>
    #include <vector>
    #include <string>
    #include <stdexcept>

    std::vector<std::string>
    split(const std::string& str, const std::string& delim){
        std::vector<std::string> result;
        if (str.empty())
            throw std::runtime_error("Can not tokenize an empty string!");
        std::string::const_iterator begin, str_it;
        begin = str_it = str.begin();
        do {
            // check for the end before dereferencing the iterator
            while (str_it != str.end() && delim.find(*str_it) == std::string::npos)
                str_it++; // find the position of the first delimiter in str
            std::string token = std::string(begin, str_it); // grab the token
            if (!token.empty()) // empty token only when str starts with a delimiter
                result.push_back(token); // push the token into a vector<string>
            while (str_it != str.end() && delim.find(*str_it) != std::string::npos)
                str_it++; // ignore the additional consecutive delimiters
            begin = str_it; // process the remaining tokens
        } while (str_it != str.end());
        return result;
    }

    int main() {
        std::string test_string = ".this is.a.../.simple;;test;;;END";
        std::string delim = "; ./"; // string containing the delimiters
        std::vector<std::string> tokens = split(test_string, delim);
        for (std::vector<std::string>::const_iterator it = tokens.begin();
             it != tokens.end(); it++)
            std::cout << *it << std::endl;
    }

    #include <algorithm>
    #include <string>
    #include <vector>
    using namespace std;

    /// split a string into multiple sub strings, based on a separator string
    /// for example, if separator="::",
    ///
    /// s = "abc" -> "abc"
    ///
    /// s = "abc::def xy::st:" -> "abc", "def xy" and "st:",
    ///
    /// s = "::abc::" -> "abc"
    ///
    /// s = "::" -> NO sub strings found
    ///
    /// s = "" -> NO sub strings found
    ///
    /// then append the sub-strings to the end of the vector v.
    ///
    /// the idea comes from the findUrls() function of "Accelerated C++", chapt7,
    /// findurls.cpp
    ///
    void split(const string& s, const string& sep, vector<string>& v)
    {
        typedef string::const_iterator iter;
        iter b = s.begin(), e = s.end(), i;
        iter sep_b = sep.begin(), sep_e = sep.end();

        // search through s
        while (b != e){
            i = search(b, e, sep_b, sep_e);

            // no more separator found
            if (i == e){
                // it's not an empty string
                if (b != e)
                    v.push_back(string(b, e));
                break;
            }
            else if (i == b){
                // the separator is found and right at the beginning
                // in this case, we need to move on and search for the
                // next separator
                b = i + sep.length();
            }
            else{
                // found the separator
                v.push_back(string(b, i));
                b = i;
            }
        }
    }

The Boost libraries are good, but they are not always available. Doing this sort of thing by hand is also a good brain exercise. Here we just use the std::search() algorithm from the STL; see the above code.