How to safely read a line from a std::istream?

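I want to safely read a line from a std::istream. The stream could be anything, e.g., a connection on a web server or something processing files submitted by unknown sources. There are many answers starting out doing the moral equivalent of this code: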

    void read(std::istream& in) {
        std::string line;
        if (std::getline(in, line)) {
            // process the line
        }
    }

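Given the possibly dubious source, using the above code would lead to a vulnerability: a malicious agent may mount a denial-of-service attack against this code using a huge line. Thus, I would like to limit the line length to some rather high value, say 4 million chars. While a few large lines may be encountered, it isn't viable to allocate a buffer for each file and use std::istream::getline().

How can the maximum size of the line be limited, ideally without distorting the code too badly and without allocating large chunks of memory up front?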
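To make the concern concrete: std::getline() stops only at the delimiter, at end of input, or at str.max_size(), so the amount of memory it uses is chosen by whoever produces the input. The sketch below (unbounded_line_length() is an illustrative name, and a std::istringstream stands in for an untrusted connection) reads a single newline-free line and keeps all of it in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Reads one line from `input` with plain std::getline() and returns how many
// characters ended up in memory. Without a delimiter, std::getline() keeps
// appending until end of input; only str.max_size() and available memory stop it.
std::size_t unbounded_line_length(const std::string& input) {
    assert(!input.empty());  // std::getline() fails on empty input; this demo expects data
    std::istringstream in(input);
    std::string line;
    std::getline(in, line);
    return line.size();
}
```

For example, unbounded_line_length(std::string(1000000, 'x')) returns 1000000; nothing in the call bounds that number.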

You could write your own version of std::getline with a parameter for the maximum number of characters to read, called getline_n or something.

    #include <iostream>
    #include <string>

    template<typename CharT, typename Traits, typename Alloc>
    auto getline_n(std::basic_istream<CharT, Traits>& in,
                   std::basic_string<CharT, Traits, Alloc>& str,
                   std::streamsize n) -> decltype(in) {
        std::ios_base::iostate state = std::ios_base::goodbit;
        bool extracted = false;
        // noskipws = true: don't skip leading whitespace, just like std::getline()
        const typename std::basic_istream<CharT, Traits>::sentry s(in, true);
        if(s) {
            try {
                str.erase();
                const typename Traits::int_type newline = Traits::to_int_type(in.widen('\n'));
                typename Traits::int_type ch = in.rdbuf()->sgetc();
                for(; ; ch = in.rdbuf()->snextc()) {
                    if(Traits::eq_int_type(ch, Traits::eof())) {
                        // eof spotted, quit
                        state |= std::ios_base::eofbit;
                        break;
                    }
                    else if(Traits::eq_int_type(ch, newline)) {
                        // end of line: consume the newline without storing it, quit
                        extracted = true;
                        in.rdbuf()->sbumpc();
                        break;
                    }
                    else if(static_cast<std::streamsize>(str.size()) == n) {
                        // maximum number of characters met, quit;
                        // the rest of the line stays in the stream
                        break;
                    }
                    else if(str.max_size() <= str.size()) {
                        // string too big
                        state |= std::ios_base::failbit;
                        break;
                    }
                    else {
                        // character valid
                        str += Traits::to_char_type(ch);
                        extracted = true;
                    }
                }
            }
            catch(...) {
                in.setstate(std::ios_base::badbit);
            }
        }
        if(!extracted) {
            state |= std::ios_base::failbit;
        }
        in.setstate(state);
        return in;
    }

    int main() {
        std::string s;
        getline_n(std::cin, s, 10); // maximum of 10 characters
        std::cout << s << '\n';
    }

Might be overkill, though.

There is already a getline function as a member function of istream; you just need to wrap it for buffer management.

    #include <assert.h>
    #include <istream>
    #include <stddef.h>     // ptrdiff_t
    #include <string>       // std::string, std::char_traits

    typedef ptrdiff_t Size;

    namespace my {
        using std::istream;
        using std::string;
        using std::char_traits;

        istream& getline(
            istream&        stream,
            string&         s,
            Size const      buf_size,
            char const      delimiter = '\n'
            )
        {
            s.resize( buf_size );
            assert( s.size() > 1 );
            stream.getline( &s[0], buf_size, delimiter );
            // getline() always null-terminates what it stored, so this also
            // trims the buffer after a failed (e.g. overlong) read.
            Size const n = char_traits<char>::length( &s[0] );
            s.resize( n );      // Downsizing.
            return stream;
        }
    }  // namespace my
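Both this wrapper and the ones that follow rely on the contract of the member std::istream::getline(char*, std::streamsize, char), which is worth seeing in isolation: it stores at most n - 1 characters followed by a null character, sets failbit when the line doesn't fit, and gcount() counts an extracted delimiter even though the delimiter isn't stored. The probe below is a minimal sketch; member_getline_result and probe_member_getline() are illustrative names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Outcome of one call to the member std::istream::getline().
struct member_getline_result {
    std::string stored;       // characters written to the buffer (up to the null)
    std::streamsize gcount;   // characters extracted, delimiter included
    bool failed;              // was failbit set?
};

// Calls is.getline() with a buffer of `n` characters on the given input.
member_getline_result probe_member_getline(const std::string& input, std::streamsize n) {
    assert(n > 0);
    std::istringstream is(input);
    std::string buffer(static_cast<std::string::size_type>(n), '\0');
    is.getline(&buffer[0], n, '\n');
    member_getline_result r;
    r.gcount = is.gcount();
    r.failed = is.fail();
    r.stored = buffer.c_str();  // stops at the null character written by getline()
    return r;
}
```

Probing "abc\nrest" with n = 10 yields "abc" with gcount() == 4 and no failbit; probing "abcdefgh\n" with n = 4 yields "abc" with gcount() == 3 and failbit set. The off-by-one in the first case is why a wrapper cannot simply resize the string to gcount().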

Replace std::getline by creating a wrapper around std::istream::getline:

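    #include <istream>
    #include <string>

    namespace my {
    std::istream& getline( std::istream& is, std::streamsize n, std::string& str, char delim )
    {
        try {
            str.resize(n);
            is.getline(&str[0],n,delim);
            std::streamsize count = is.gcount();
            // gcount() includes the delimiter when it was extracted (neither
            // failbit nor eofbit set), but the delimiter is not stored
            if (count > 0 && !is.fail() && !is.eof())
                --count;
            str.resize(count);
            return is;
        }
        catch(...) {
            str.resize(0);
            throw;
        }
    }
    }  // namespace my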

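If you want to avoid excessive temporary memory allocation, you could use a loop that grows the allocation as needed (probably doubling it on each pass). Don't forget that the istream object may or may not have exceptions enabled.

Here is a version with a more efficient allocation strategy: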

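    #include <algorithm>
    #include <ios>
    #include <istream>
    #include <string>

    namespace my {
    std::istream& getline( std::istream& is, std::streamsize n, std::string& str, char delim )
    {
        std::streamsize base=0;      // characters stored so far
        do {
            try {
                is.clear();
                // grow roughly geometrically, but never past the limit n
                std::streamsize chunk=std::min(n-base,std::max(static_cast<std::streamsize>(2),base));
                if ( chunk == 0 )
                    break;
                str.resize(base+chunk);
                is.getline(&str[base],chunk,delim);
            }
            catch( const std::ios_base::failure& ) {
                if ( !is.gcount() ) {
                    str.resize(0);
                    throw;
                }
            }
            std::streamsize stored = is.gcount();
            // gcount() counts an extracted delimiter, which is not stored
            if ( stored > 0 && !is.fail() && !is.eof() )
                --stored;
            base += stored;
        } while ( is.fail() && is.gcount() );
        str.resize(base);
        return is;
    }
    }  // namespace my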
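The growth rule chunk = min(n - base, max(2, base)) is easiest to follow by replaying it. The helper below (chunk_schedule() is a hypothetical name) assumes every pass fills its chunk, as happens with a line longer than the limit, and records the chunk size passed to each std::istream::getline() call. The chunk roughly doubles until the limit caps it, so the number of passes grows only logarithmically with n.

```cpp
#include <algorithm>
#include <cassert>
#include <ios>
#include <vector>

// Replays the allocation schedule of the chunked my::getline() for a line that
// is longer than the limit n: each pass resizes the string to base + chunk, and
// std::istream::getline() stores at most chunk - 1 characters into that space.
std::vector<std::streamsize> chunk_schedule(std::streamsize n) {
    assert(n > 0);
    std::vector<std::streamsize> chunks;
    std::streamsize base = 0;   // characters stored so far
    for (;;) {
        std::streamsize chunk =
            std::min(n - base, std::max(static_cast<std::streamsize>(2), base));
        if (chunk == 0)
            break;
        chunks.push_back(chunk);
        if (chunk == 1)
            break;              // room for the terminator only: getline() stores nothing and fails
        base += chunk - 1;      // a filled chunk leaves failbit set, so the loop goes on
    }
    return chunks;
}
```

With n = 20 the schedule is 2, 2, 2, 3, 5, 9, 3, 1: 19 characters are stored in total, and the final call with a chunk of 1 stores nothing, so my::getline() stops with failbit set and the partial line in the string.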

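Based on the comments and answers, there seem to be three approaches:

Write a custom version of getline(), possibly using the std::istream::getline() member internally to obtain the actual characters.
Use a filtering stream buffer to limit the amount of data potentially received.
Instead of reading into a std::string, use a string instantiation with a custom allocator that limits the amount of memory stored in the string.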

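Not all suggestions came with code. This answer provides code for all three approaches and a bit of discussion of each. Before going into implementation details, it is worth pointing out that there are multiple choices for what should happen if an excessively long input is received:

Reading an overlong line could result in a successful read of a partial line, i.e., the resulting string contains the content read and the stream doesn't have any error flags set. Doing so, however, means that it isn't possible to distinguish between a line hitting exactly the limit and one that is too long. Since the limit is somewhat arbitrary anyway, that probably doesn't really matter, though.
Reading an overlong line could be considered a failure (i.e., setting std::ios_base::failbit and/or std::ios_base::badbit) and, since the read failed, yield an empty string. Yielding an empty string obviously prevents looking at the string read so far to see what's going on.
Reading an overlong line could provide the partial line read and also set error flags on the stream. This seems to be reasonable behavior, both detecting that something is up and providing the input for potential inspection.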
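The third option is also what the member std::istream::getline() does with a fixed buffer, which makes it convenient for sketching the caller's side: detect the truncation, keep the partial line, then clear the error and skip the rest of the overlong line so reading can resynchronize. read_lines_truncating() is a hypothetical helper, and the fixed buffer is only for illustration (the question rules out preallocating buffers of the full limit).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>
#include <vector>

// Reads all lines through a fixed buffer of `cap` chars. Overlong lines follow
// the third policy: the partial line is kept and failbit signals the overflow;
// the caller then clears the flag and discards the rest of that line.
std::vector<std::string> read_lines_truncating(std::istream& in,
                                               std::streamsize cap,
                                               std::size_t& truncated) {
    assert(cap > 1);
    std::vector<std::string> lines;
    std::string buffer(static_cast<std::string::size_type>(cap), '\0');
    truncated = 0;
    for (;;) {
        in.getline(&buffer[0], cap);
        if (in.fail() && in.gcount() == 0)
            break;                          // nothing extracted: input exhausted
        lines.push_back(buffer.c_str());    // complete or partial line
        if (in.fail()) {                    // overlong: failbit with data
            ++truncated;
            in.clear();
            in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
    }
    return lines;
}
```

Fed "short\nthis line is too long\nok\n" through a std::istringstream with cap = 8, it returns "short", "this li" and "ok" and counts one truncated line. ignore() discards the remainder without storing it, so it costs time but no memory.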

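Although there are already multiple code examples implementing a limited version of getline(), here is another one! I think it is simpler (albeit possibly slower; performance can be dealt with when necessary) and it also retains std::getline()'s interface: it uses the stream's width() to communicate a limit (taking width() into account may be a reasonable extension to std::getline() anyway):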

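    #include <ios>
    #include <istream>
    #include <iterator>
    #include <limits>
    #include <string>

    template <typename cT, typename Traits, typename Alloc>
    std::basic_istream<cT, Traits>&
    safe_getline(std::basic_istream<cT, Traits>& in,
                 std::basic_string<cT, Traits, Alloc>& value,
                 cT delim)
    {
        typedef std::basic_string<cT, Traits, Alloc> string_type;
        typedef typename string_type::size_type      size_type;

        // noskipws = true: like std::getline(), keep leading whitespace and empty lines
        typename std::basic_istream<cT, Traits>::sentry cerberos(in, true);
        if (cerberos) {
            value.clear();
            size_type width(in.width(0));    // take the limit and reset width()
            if (width == 0) {
                width = std::numeric_limits<size_type>::max();
            }
            std::istreambuf_iterator<cT, Traits> it(in), end;
            bool delim_seen = false;
            for (; value.size() != width && it != end; ++it) {
                if (!Traits::eq(delim, *it)) {
                    value.push_back(*it);
                }
                else {
                    ++it;                    // consume the delimiter
                    delim_seen = true;
                    break;
                }
            }
            std::ios_base::iostate state = std::ios_base::goodbit;
            if (value.size() == width) {
                state |= std::ios_base::failbit;       // line hit the limit
            }
            else if (!delim_seen) {
                state |= std::ios_base::eofbit;        // loop ended at end of input
                if (value.empty()) {
                    state |= std::ios_base::failbit;   // nothing was read
                }
            }
            in.setstate(state);
        }
        return in;
    }

    // Two-argument form matching std::getline(in, str): the delimiter is a newline.
    template <typename cT, typename Traits, typename Alloc>
    std::basic_istream<cT, Traits>&
    safe_getline(std::basic_istream<cT, Traits>& in,
                 std::basic_string<cT, Traits, Alloc>& value)
    {
        return safe_getline(in, value, in.widen('\n'));
    }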

This version of getline() is used just like std::getline(), but when it seems reasonable to limit the amount of data read, width() is set, e.g.:

    std::string line;
    if (safe_getline(in >> std::setw(max_characters), line)) {
        // do something with the input
    }
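Using width() as the limit mirrors how formatted extraction into a std::string already treats it: operator>> reads at most width() characters when width() is positive, imposes no limit when it is zero, and resets width() to 0 after extracting. The sketch below shows that standard behavior; read_word_with_width() is only an illustrative name.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>

// Reads one whitespace-delimited word, limited to `w` characters (0 = no limit).
std::string read_word_with_width(std::istream& in, int w) {
    std::string word;
    in >> std::setw(w) >> word;
    assert(!in || in.width() == 0);   // after a successful extraction width() is back to 0
    return word;
}
```

From "abcdefgh rest", successive calls with widths 3, 0 and 2 yield "abc", "defgh" and "re".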

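Another approach is to just use a filtering stream buffer to limit the amount of input: the filter would count the number of characters processed and limit the amount to a suitable number of characters. This approach is actually easier to apply to an entire stream than to an individual line: when processing just one line, the filter can't simply obtain buffers full of characters from the underlying stream because there is no reliable way to put the characters back. Implementing an unbuffered version is still simple but probably not particularly efficient: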

    #include <ios>
    #include <istream>
    #include <streambuf>
    #include <string>

    template <typename cT, typename Traits = std::char_traits<cT> >
    class basic_limitbuf
        : public std::basic_streambuf<cT, Traits> {
    public:
        typedef Traits                    traits_type;
        typedef typename Traits::int_type int_type;

    private:
        std::streamsize                   size;
        std::streamsize                   max;
        std::basic_istream<cT, Traits>*   stream;
        std::basic_streambuf<cT, Traits>* sbuf;

        // Unbuffered: peek at the next character while the limit isn't reached.
        int_type underflow() {
            if (this->size < this->max) {
                return this->sbuf->sgetc();
            }
            else {
                this->stream->setstate(std::ios_base::failbit);
                return traits_type::eof();
            }
        }
        // Unbuffered: consume and count the next character.
        int_type uflow() {
            if (this->size < this->max) {
                ++this->size;
                return this->sbuf->sbumpc();
            }
            else {
                this->stream->setstate(std::ios_base::failbit);
                return traits_type::eof();
            }
        }

    public:
        basic_limitbuf(std::streamsize max, std::basic_istream<cT, Traits>& stream)
            : size()
            , max(max)
            , stream(&stream)
            , sbuf(this->stream->rdbuf(this)) {   // install this filter
        }
        ~basic_limitbuf() {
            // rdbuf() clears the state, so preserve it across the swap back
            std::ios_base::iostate state = this->stream->rdstate();
            this->stream->rdbuf(this->sbuf);
            this->stream->setstate(state);
        }
    };

This stream buffer is already set up to insert itself upon construction and to remove itself upon destruction. That is, it can be used simply like this:

    std::string line;
    basic_limitbuf<char> sbuf(max_characters, in);
    if (std::getline(in, line)) {
        // do something with the input
    }

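Adding a manipulator that sets up the limit would be easy, too. One advantage of this approach is that none of the reading code needs to be touched if the total size of the stream can be limited: the filter can be installed right after the stream is created. When there is no need to back out the filter, it could also use a buffer, which would greatly improve performance.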
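A buffered variant for the whole-stream case could look roughly like the following sketch. buffered_limitbuf and lines_within_limit() are hypothetical names, the 1024-character buffer is an arbitrary choice, and unlike basic_limitbuf it simply reports end of input at the limit instead of setting failbit on the stream. Because sgetn() waits until it has the requested number of characters (or end of input), a socket-backed source might want smaller requests.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Caps the total number of characters taken from `source`. Meant to stay
// installed for the stream's lifetime: buffered characters can't be handed back.
class buffered_limitbuf : public std::streambuf {
public:
    buffered_limitbuf(std::streambuf* source, std::streamsize max)
        : source_(source), remaining_(max) {
        assert(source != nullptr && max >= 0);
    }

private:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        if (remaining_ <= 0)
            return traits_type::eof();       // limit exhausted: report end of input
        std::streamsize want = std::min<std::streamsize>(remaining_, sizeof buffer_);
        std::streamsize got = source_->sgetn(buffer_, want);
        if (got <= 0)
            return traits_type::eof();       // underlying source is exhausted
        remaining_ -= got;
        setg(buffer_, buffer_, buffer_ + got);
        return traits_type::to_int_type(*gptr());
    }

    std::streambuf* source_;
    std::streamsize remaining_;
    char buffer_[1024];
};

// Usage helper: reads every line of `text` through the filter, joined with '|'.
std::string lines_within_limit(const std::string& text, std::streamsize max) {
    std::istringstream src(text);
    buffered_limitbuf limit(src.rdbuf(), max);
    std::istream in(&limit);
    std::string line, joined;
    while (std::getline(in, line))
        joined += line + '|';
    return joined;
}
```

With a limit of 15, the input "first line\nsecond line\n" yields "first line" and then only "seco" before the filter reports end of input.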

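The third approach suggested is to use a std::basic_string with a custom allocator. There are two aspects that are a bit awkward about the allocator approach:

The string being read actually has a type that isn't immediately convertible to std::string (although the conversion isn't hard either).
The maximum array size can easily be limited, but the string will end up with some more or less random size smaller than that: once an allocation fails, an exception is thrown and no attempt is made to grow the string by a smaller amount.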

Here is the necessary code for an allocator limiting the allocated size:

    #include <cstddef>
    #include <new>

    template <typename T>
    struct limit_alloc
    {
    private:
        std::size_t max_;
    public:
        typedef T value_type;
        limit_alloc(std::size_t max): max_(max) {}
        template <typename S>
        limit_alloc(limit_alloc<S> const& other): max_(other.max()) {}
        std::size_t max() const { return this->max_; }
        T* allocate(std::size_t size) {
            // size counts objects, not bytes; refuse anything above the limit
            return size <= max_
                ? static_cast<T*>(::operator new(size * sizeof(T)))
                : throw std::bad_alloc();
        }
        void deallocate(T* ptr, std::size_t) {
            ::operator delete(ptr);
        }
    };

    template <typename T0, typename T1>
    bool operator== (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
        return a0.max() == a1.max();
    }
    template <typename T0, typename T1>
    bool operator!= (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
        return !(a0 == a1);
    }

The allocator would be used something like this (the code compiles OK with a recent version of clang but not with gcc):

    std::basic_string<char, std::char_traits<char>, limit_alloc<char> >
        tmp((limit_alloc<char>(max_chars)));  // extra parentheses: not a function declaration
    if (std::getline(in, tmp)) {
        std::string line(tmp.begin(), tmp.end());
        // do something with the input
    }
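One detail worth checking with the allocator approach is what the caller actually sees. std::getline() for strings behaves as an unformatted input function, so an exception thrown while it runs (here the std::bad_alloc from the allocator) is caught, badbit is set, and the exception is rethrown only if badbit is enabled in exceptions(). The sketch below uses a trimmed stand-in for limit_alloc (capped_alloc and getline_hits_cap() are hypothetical names); it assumes the standard library follows that rule (older implementations may let the exception escape instead), and the exact length at which the cap is hit depends on the library's growth strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <sstream>
#include <string>

// Trimmed stand-in for limit_alloc: refuses any allocation above max_ elements.
template <typename T>
struct capped_alloc {
    typedef T value_type;
    std::size_t max_;
    explicit capped_alloc(std::size_t max) : max_(max) {}
    template <typename S>
    capped_alloc(capped_alloc<S> const& other) : max_(other.max_) {}
    T* allocate(std::size_t n) {
        if (n > max_) throw std::bad_alloc();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(capped_alloc<T> const& a, capped_alloc<U> const& b) { return a.max_ == b.max_; }
template <typename T, typename U>
bool operator!=(capped_alloc<T> const& a, capped_alloc<U> const& b) { return !(a == b); }

typedef std::basic_string<char, std::char_traits<char>, capped_alloc<char> > capped_string;

// Returns true if reading one line of `input` ran into the allocation cap.
bool getline_hits_cap(const std::string& input, std::size_t max_chars) {
    assert(max_chars > 0);
    std::istringstream in(input);
    capped_alloc<char> alloc(max_chars);
    capped_string tmp(alloc);
    std::getline(in, tmp);
    return in.bad();   // the bad_alloc was caught inside getline() and turned into badbit
}
```

With a cap of 64 characters, a 10,000-character line leaves the stream in the bad() state, while a short line never reaches the cap.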

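In summary, there are multiple approaches, each with its own minor drawbacks, but each reasonably viable for the stated goal of limiting denial-of-service attacks based on overlong lines:

Using a custom version of getline() means the reading code needs to be changed.
Using a custom stream buffer is slow unless the size of the entire stream can be limited.
Using a custom allocator gives less control and requires some changes to the reading code.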
