使用正则expression式来validation电子邮件地址

多年来,我慢慢地开发了一个正则expression式来validationMOST电子邮件地址,假设他们不使用IP地址作为服务器部分。

我在几个PHP程序中使用它,并且大部分时间都在运行。 然而,有时候我会接触到一个使用它的站点有问题的人,最后我不得不做一些调整(最近我意识到我不允许使用4个字符的顶级域名)。

你有什么最好的正则expression式来validation电子邮件?

我已经看到了几个解决scheme,使用几个较短的expression式的函数,但我宁愿有一个简单的函数,而不是在一个更复杂的函数几个简短的expression式的一个长复杂的expression式。

完全符合RFC 822标准的正则expression式由于其长度而效率低下且模糊。 幸运的是,RFC 822被取代了两次,目前电子邮件地址的规范是RFC 5322 。 RFC 5322导致了一个正则expression式,可以理解,如果研究了几分钟,而且足够有效的实际使用。

一个符合RFC 5322标准的正则expression式可以在http://emailregex.com/的页面顶部find,但是使用在因特网上浮动的IP地址模式,其错误允许;00中的任何无符号字节十进制值以点分隔的地址,这是非法的。 其余部分似乎与RFC 5322语法一致,并且使用grep -Po传递了几个testing,包括域名,IP地址,错误和带和不带引号的帐户名。

纠正IP模式中的00错误,我们获得了一个工作和相当快的正则expression式。 (为实际的代码刮渲染的版本,而不是降价。)

(:[A-Z0-9#$%& '* + / = ^ _`{|}〜 – +(?!\ [A-Z0-9#$%&!]'?* + / ?= ^ _`{|}〜 – ] +)* |“(?:[\ x01- \ X08 \ X0B \ X0C \ x0e- \ X1F \ X21 \ x23- \ x5b \ x5d- \ 0x7F部分] | \\ [\ x01- \ X09 \ X0B \ X0C \ x0e- \ 0x7F部分])*“)@(:(:[α-Z0-9](:???[A-Z0-9 – ] * [A-Z0 ?-9])\)+ [A-Z0-9](:?[A-Z0-9 – ] * [A-Z0-9])| \ [(:( 🙁 2(5'? [0-5] | [0-4] [0-9])| 1 [0-9] [0-9] |。[1-9] [0-9]))\){3}( ?:( 2(5 [0-5] | [0-4] [0-9])| 1 [0-9] [0-9] | [1-9] [0-9])|〔 A-Z0-9 – ] * [A-Z0-9]:(?:[\ x01- \ X08 \ X0B \ X0C \ x0e- \ X1F \ x21- \ X5A \ x53- \ 0x7F部分] | \\ [\ x01- \ X09 \ X0B \ X0C \ x0e- \ 0x7F部分])+)\])

以上是正则expression式的有限状态机的 图 ,比正则expression式本身更清晰 在这里输入图像描述

Perl和PCRE中更复杂的模式(例如PHP中使用的正则expression式库)可以正确parsingRFC 5322 。 Python和C#也可以这样做,但是它们使用与前两种不同的语法。 但是,如果您不得不使用其中一种function较弱的模式匹配语言,那么最好使用真正的parsing器。

理解根据RFC进行validation也是很重要的,这一点告诉你这个地址是否真的存在于提供的域中,或者input地址的人是否是它的真正所有者。 人们总是用这种方式签署其他邮件列表。 解决这个问题需要更多的validation,包括向该地址发送一个包含确认令牌的消息,该令牌应该与地址一样input到同一个网页上。

确认令牌是唯一的方法来知道你有进入的人的地址。 这就是为什么大多数邮件列performance在都使用该机制来确认注册。 毕竟,任何人都可以放下president@whitehouse.gov ,甚至可以parsing为合法的,但不可能是另一端的人。

对于PHP,您不应该使用PHP中validation电子邮件地址给出的模式,我引用的是正确的方式 :

常见的使用方法和广泛的草率编码将会对电子邮件地址build立一个事实上的标准,这比规定的标准更为严格。

这并不比所有其他的非RFC模式更好。 它甚至不足以处理RFC 822 ,更不用说RFC 5322 了 。但是, 这一点是。

如果你想变得花俏和迂腐, 实现一个完整的状态引擎 。 正则expression式只能作为基本的filter。 正则expression式的问题是,告诉某人他们的完全有效的电子邮件地址是无效的(误报),因为你的正则expression式无法处理它是从用户的angular度来看粗鲁和不礼貌。 为此目的的状态引擎既可以validation甚至纠正电子邮件地址,否则这些电子邮件地址将被视为无效,因为它会根据每个RFC分解电子邮件地址。 这允许一个潜在的更令人愉快的体验,比如

指定的电子邮件地址“myemail @ address,com”无效。 你是不是指“myemail@address.com”?

另请参阅validation电子邮件地址 ,包括注释。 或比较电子邮件地址validation正则expression式 。

正则表达式可视化

Debuggex演示

您不应该使用正则expression式来validation电子邮件地址。

而是使用MailAddress类,如下所示:

 try { address = new MailAddress(address).Address; } catch(FormatException) { //address is invalid } 

MailAddress类使用BNF分析器完全按照RFC822来validation地址。

如果你真的想使用正则expression式, 这里是 :

  (?:(?: \ r \ n)?[\ t])*(?:(?:(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031 ] +(?:(?:(?:\ r \ n)?[\ t]
 )+ | \ž|(= [\ [? “()<> @,;:\\” \ [\]]))| “?(:[^ \” \ r \\] | \\。 |(?:(?:\ r \ n)?[\ t]))*“(?:( ?:
 (?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;: \“。\ [\] \ 000- \ 031] +(?:(?:(
 (?:[\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' ^ \“\ r \\] | \\ |(:?(:\ r \ n)的[? 
 (?:(?:(r:n)?[\ t])*))* @(?:(?:\ r \ n)?[\ t])*(?: [^()<> @,;:\\“。\ [\] \ 000- \ 0
 31] +(?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\ ]]))| \ [([^ \ [\] \ r \\] |。\\)* \
 ](?:(?:\ r \ n)?[\ t])*)(?:\。(?:(?:\ r \ n)?[\ t])*(?:[^ <> @,;:\\“。\ [\] \ 000- \ 031] +
 (?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\]]) )| \ [([^ \ [\] \ r \\] |。\\)* \](?:
 (?:\ r \ n)?[\ t])*))* |(?: [^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?: (?:(?:\ r \ n)?[\ t])+ | \ Z
 |(= [\ [ “()<> @,;:\\?”。\ [\]])。?)| “?(:[^ \” \ r \\] | \\ |(:( ?:\ r \ n)?[\ t]))*“(?:(?:\ r \ n)
 ?(?:[^()<> @,;:\\“)。[\ t] \ [\] \ 000- \ 031] +(?:(?:(?:\
 r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\]])) ] \ r \\] | \\)* \](:(:\ r \ n?)。?
  (?:\。)(?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;:\\“。\ [\ \ 000- \ 031] +(:?(?:(?:\ r \ n)的
 ?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\]]))| \ [([^ \ [\ \] | \\。)* \](?:(?:\ r \ n)?[\ t]
 )*(?:(^:,@(?:(?:\ r \ n)?[\ t])*(?:[^ 000- \ 031] +(:?(?:(?:\ r \ n)的[?
  \吨])+ | \ž|(= [\ [ “(<> @,;:\\。\ [\]?)”]))| \ [([^ \ [\] \ r \\] | \\。)* \](?:(?:\ r \ n)?[\ t])*
 (?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031 ] +(?:(?:(?:\ r \ n)?[\ t]
 )+ | \ž|(= [\ [? “()<> @,;:\\” \ [\]]))| \ [([^ \ [\] \ r \\] | \\ 。)* \](?:(?:\ r \ n)?[\ t])*))*)
 *:(?:(?:\ r \ n)?[\ t])*)?(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] + (?:(?:(?:\ r \ n)?[\ t])+
 | \ž|(= [\ [ “()<> @,;:\\?”。\ [\]])。)| “?(:[^ \” \ r \\] | \\ |( ?:(?:\ r \ n)?[\ t])*“(?:(?:\ r
 (?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;:\\“ 。\ [\] \ 000- \ 031] +(?:(?:( ?:
 \ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\]])) “\ r \\] | \\。|(?:(?:\ r \ n)?[\ t
 ]))*“(?:(?:\ r \ n)?[\ t])*))* @(?:(?:\ r \ n)?[\ t])*(?:[ ()<> @,;:\\“。\ [\] \ 000- \ 031
 ] +(?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\ ]))| \ [([^ \ [\] \ r \\] |。\\)* \](
 ?(?:\ r \ n)?[\ t])*)(?:\。(?:(?:\ r \ n)?[\ t])*(?:[^ @,;:\\“。\ [\] \ 000- \ 031] +(?
 :(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\]])) \ [([^ \ [\] \ r \\] | \\。)* \(:(?
 :\ r \ n)?[\ t))*))* \>(?:(?:\ r \ n)?[\ t]) :\\“。\ [\] \ 000- \ 031] +(?:(?
 (?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”。\ [\]])) :[^ \“\ r \\] | \\ |(:(:\ r \ n)的????
 [?:(?:(?:\ r \ n)?[\ t])*(?: (:(:[^()<> @,;:?\\” \ [\] 
 (?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\” \ [\]]))| “(?:[^ \” \ r \\] |
 (?:(?:(r:n)?[\ t] :(?:\ r \ n)?[\ t])*(?:[^()<>

 @:; \\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t])+ | \ Z | “()<> @,;:\\” \ [\]]))|”
 (?:[^ \“\ r \\] | \\。|(?:(?:\ r \ n)?[\ t])) \ t])*))* @(?:(?:\ r \ n)?[\ t]
 )*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t]) + | \ž|(= [\ [“()<> @,;:\\?
 “。\ \ [\]])| \ [([^ \ [\] \ r \\] | \\。)* \](?:(?:\ r \ n)?[\ t])* )(?:\。(?:(?:\ r \ n)?[\ t])*(?
 :[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t])+ | \ Z ?|(= [\ [ “()<> @,;:\\” \ [
 \]]))| \ [([^ \ [\] \ r \\ | \\。)* \](?:(?:\ r \ n)?[\ t])*))* | (?:[^()<> @,;:\\“。\ [\] \ 000-
 \(?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [ \]]))| “(?:[^ \”。\ r \\] | \\ |(
 ?:(?:\ r \ n)?[\ t]))*“(?:(r:n) n)?[\ t])*(?:@(?:[^()<> @,
 (?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“() <> @,;:\\。” \ [\]]))| \ [(
 (?:(?:\ r \ n)?[\ t])*)(?:\。(?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;:\\“
 (?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“() ;:\\” \ [\]]))| \ [([^ \ [\
 ] \\。)* \](?:(?:\ r \ n)?[\ t])*))*(?:, )?[\ t])*(?:[^()<> @,;:\\“。\
 (?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“ \\” \ [\]]))|。\ [([^ \ [\] \
 (?:(?:\ r \ n)?[\ t])*)(?:\。吨])*(:[^(<> @,;:\\” \ [\?)] 
 (?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\” \ [\]]))| \ [([^ \ [\] \ r \\]
 | \\。)* \](?:(?:\ r \ n)?[\ t])*))*)*: )?(?:[^()<> @,;:\\“。\ [\] \ 0
 (?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“()<> @,;:\\”)。 \ [\]]))| “(?:[^ \” \ r \\] | \\
 (?:(?:\ r \ n)?[\ t]))*“(?:(?:\ r \ n)?[\ t])*)(?:\。 ?:\ r \ n)?[\ t])*(?:[^()<> @,
 (?:(?:(?:\ r \ n)?[\ t])+ | \ Z |(?= [\ [“( )<> @,;:\\ |( “\ [\]]))。”?
 :[?:(?:\ r \ n)?[\ t ])*))* @(?:(?:\ r \ n)?[\ t])*
 (?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t])+ \ž|(= [\ [ “()<> @,;:\\?”。
 \([:])():[\ t])(){(\\ [\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' (?:(?:\ r \ n)?[\ t])*(?:[
 ^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t])+ | \ Z | ?= [\ [ “()<> @,;:\\” \ [\]
 ]))| \ [([^ \ [\] \ r \\ | \\。)* \](?:(?:\ r \ n)?[\ t])*))* \> ?:(?:\ r \ n)?[\ t])*)(?:,\ s *(
 ?:(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t]) + | \ž|(= [\ [“()<> @,;:\\?
 “。\”[?]))|“(?:[^ \”\ r \\] | \\。|(?:(?:\ r \ n)?[\ t])) :(?:\ r \ n)?[\ t])*)(?:\。(?:(
 ?:[\ t])*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(? :\ r \ n)?[\ t])+ | \ Z |(?= [
 \ [ “()<> @,;:\\” \ [\]]。?))| “?(:[^ \” \ r \\] | \\ |(:(:\ r \ n)?[\ t]))*“(?:(?:\ r \ n)?[\ t
 ])*))* @(?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t
 ])+ | \ž|(= [\ [ “(<> @,;:\\。\ [\]?)”]))| \ [([^ \ [\] \ r \\] | \ (?:(?:\ r \ n)?[\ t])*)(?
 (?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +( ?:((?:(?:\ r \ n)?[\ t])+ |
 \ž|(= [\ [ “()<> @,;:\\?”。\ [\]])。)| \ [([^ \ [\] \ r \\] | \\)* (?:(?:\ r \ n)?[\ t])*))* |(?:
 [^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t])+ | \ Z | ?(= [\ [ “()<> @,;:\\” \ [\
 ]]))|“(?:[^ \”\ r \\] | \\。|(?:(?:\ r \ n)?[\ t])) r \ n)?[\ t])*)* \ <(?:(?:\ r \ n)
 ?(?:(?:(?:\ r)?[\ t])? \ n)?[\ t])+ | \ Z |(?= [\ [“
 ()<> @,;:。?\\” \ [\]]))| \ [([^ \ [\] \ r \\] | \\)* \](:(:\ r \ n)?[\ t])*)(?:\。(?:(?:\ r \ n)
 (?:(r:n))?[\ t])*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] [\ t])+ | \ Z |(?= [\ [“()<>

 @;:\\” \ [\]]))| \ [([^ \ [\] \ r \\] | \\)* \](:(:\ r \ n)的。??? [* t))*))*(?:,@(?:(?:\ r \ n)?[
  \(?:(?:(?:(?:\ r \ n)?[\吨])+ | \ž|(?= [\ [“()<> @,
 ;:\\“。\ [\]]))| \ [([^ \ [\] \ \ \] | \\。)* \](?:(?:\ r \ t))*)(?:\。(?:(?:\ r \ n)?[\ t]
 )*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t]) + | \ž|(= [\ [“()<> @,;:\\?
 “。\ \ [\]])| \ [([^ \ [\] \ r \\] | \\。)* \](?:(?:\ r \ n)?[\ t])* ))*)*:(?:(?:\ r \ n)?[\ t])*)?
 (?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t])+ \ž|(= [\ [ “()<> @,;:\\?”。
 \(?:(?:\ r \ n)?[\ t]))*“(?:( ?:\ r \ n)?[\ t])*)(?:\。(?:( ?:
 \(?:(?:(?:\ \)] [\ t] r \ n)?[\ t])+ | \ Z |(?= [\ [
 “()<> @,;:\\” \ [\]]))| “(?:[^ \”。?吗?\ r \\] | \\ |(:(:\ r \ n)的?[\ t]))*“(?:(?:\ r \ n)?[\ t])
 *))* @(?:((::\ r \ n)?[\ t])*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?:(?:(?:\ r \ n)?[\ t])
 + | \ž|(= [\ [ “()<> @,;:\\?” \ [\]]。))| \ [([^ \ [\] \ r \\] | \\。 )(?:(?:\ r \ n)?[\ t])*)(?:\
 (?:(?:\ r \ n)?[\ t])*(?:[^()<> @,;:\\“。\ [\] \ 000- \ 031] +(?: (?:(?:\ r \ n)?[\ t])+ | \ Z
 |(= [\ [ “()<> @,;:\\?”。\ [\]])。)| \ [([^ \ [\] \ r \\] | \\)* \] (?:(?:\ r \ n)?[\ t])*))* \>(?:(
 ?:\ r \ n)?[\ t])*))*)?; \ s *) 

这个问题被问了很多,但我想你应该退后一步,问自己为什么要在语法上validation电子邮件地址? 真的有什么好处?

  • 它不会遇到常见的错别字。
  • 它不会阻止人们input无效的电子邮件地址或input其他人的地址。

如果你想validation一个电子邮件是正确的,你别无select,只能发送确认电子邮件,并让用户回复。 在很多情况下,出于安全原因或出于道德方面的原因,您将不得不发送确认邮件(所以您不能按照自己的意愿签署某项服务)。

这取决于你最好的意思:如果你正在谈论捕获每个有效的电子邮件地址,使用以下内容:

 (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*) 

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html )如果你正在寻找更简单的东西,但可以捕获大多数有效的电子邮件地址,请尝试如下所示:

 "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$" 

编辑:从链接:

这个正则expression式只会validation已经删除了任何注释的地址,并用空白replace(这是由模块完成的)。

这一切都取决于你想成为多么准确。 为了我的目的,我只是试图把bob @ aol.com (电子邮件中的空格)或steve (根本没有域名)或mary@aolcom (.com之前没有时间段)

 /^\S+@\S+\.\S+$/ 

当然,它会匹配那些不是有效的电子邮件地址的东西,但这是玩90/10规则的问题。

[已更新]我整理了我所知道的有关电子邮件地址validation的所有信息: http : //isemail.info ,现在它不仅validation了电子邮件地址,还诊断了问题。 我同意这里的许多意见,validation只是答案的一部分; 在http://isemail.info/about上看到我的论文。;

就我所知,is_email()仍然是唯一一个能够确切地告诉你一个给定的string是否是有效的电子邮件地址的validation器。 我已经在http://isemail.info/上传了一个新的版本;

我整理了来自Cal Henderson,Dave Child,Phil Haack,Doug Lovell,RFC5322和RFC 3696的testing用例。共有275个testing地址。 我对所有可以find的免费validation器进行了所有这些testing。

我会尽量保持这个页面是最新的,因为人们提高validation。 感谢Cal,Michael,Dave,Paul和Phil在编写这些testing方面的帮助和合作,并对我自己的validation器进行了build设性的批评。

人们应该特别注意对RFC 3696的勘误 。 其中三个规范的例子实际上是无效的地址。 而地址的最大长度是254或256个字符, 而不是 320。

根据W3C HTML5规范 :

 ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$ 

语境:

有效的电子邮件地址是与ABNF生产相匹配的string。

注意:这个要求是对RFC 5322的故意违反 , RFC 5322定义了一个同时过于严格(在“@”字符之前),过于模糊(在“@”字符之后)和过于宽松的电子邮件地址的语法允许大多数用户不熟悉的评论,空格字符和引用string)在这里是实际使用的。

以下JavaScript和Perl兼容的正则expression式是上述定义的一个实现。

 /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

在Perl 5.10或更新版本中很容易:

 /(?(DEFINE) (?<address> (?&mailbox) | (?&group)) (?<mailbox> (?&name_addr) | (?&addr_spec)) (?<name_addr> (?&display_name)? (?&angle_addr)) (?<angle_addr> (?&CFWS)? < (?&addr_spec) > (?&CFWS)?) (?<group> (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ; (?&CFWS)?) (?<display_name> (?&phrase)) (?<mailbox_list> (?&mailbox) (?: , (?&mailbox))*) (?<addr_spec> (?&local_part) \@ (?&domain)) (?<local_part> (?&dot_atom) | (?&quoted_string)) (?<domain> (?&dot_atom) | (?&domain_literal)) (?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)? \] (?&CFWS)?) (?<dcontent> (?&dtext) | (?&quoted_pair)) (?<dtext> (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e]) (?<atext> (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+-/=?^_`{|}~]) (?<atom> (?&CFWS)? (?&atext)+ (?&CFWS)?) (?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)?) (?<dot_atom_text> (?&atext)+ (?: \. (?&atext)+)*) (?<text> [\x01-\x09\x0b\x0c\x0e-\x7f]) (?<quoted_pair> \\ (?&text)) (?<qtext> (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e]) (?<qcontent> (?&qtext) | (?&quoted_pair)) (?<quoted_string> (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))* (?&FWS)? (?&DQUOTE) (?&CFWS)?) (?<word> (?&atom) | (?&quoted_string)) (?<phrase> (?&word)+) # Folding white space (?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+) (?<ctext> (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e]) (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment)) (?<comment> \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) ) (?<CFWS> (?: (?&FWS)? (?&comment))* (?: (?:(?&FWS)? (?&comment)) | (?&FWS))) # No whitespace control (?<NO_WS_CTL> [\x01-\x08\x0b\x0c\x0e-\x1f\x7f]) (?<ALPHA> [A-Za-z]) (?<DIGIT> [0-9]) (?<CRLF> \x0d \x0a) (?<DQUOTE> ") (?<WSP> [\x20\x09]) ) (?&address)/x 

不知道最好的,但这个至less是正确的,只要地址有他们的评论剥离和replace为空白。

认真。 您应该使用已经写好的库来validation电子邮件。 最好的方法可能是发送一个validation电子邮件到该地址。

我用

 ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$ 

这是由RegularExpressionValidator在ASP.NET中使用的。

The email addresses I want to validate are going to be used by an ASP.NET web application using the System.Net.Mail namespace to send emails to a list of people. So, rather than using some very complex regular expression, I just try to create a MailAddress instance from the address. The MailAddress construtor will throw an exception if the address is not formed properly. This way, I know I can at least get the email out of the door. Of course this is server-side validation but at a minimum you need that anyway.

 protected void emailValidator_ServerValidate(object source, ServerValidateEventArgs args) { try { var a = new MailAddress(txtEmail.Text); } catch (Exception ex) { args.IsValid = false; emailValidator.ErrorMessage = "email: " + ex.Message; } } 

Quick answer

Use the following regex for input validation:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

Addresses matched by this regex:

  • have a local part (ie the part before the @-sign) that is strictly compliant with RFC 5321/5322,
  • have a domain part (ie the part after the @-sign) that is a host name with at least two labels, each of which is at most 63 characters long.

The second constraint is a restriction on RFC 5321/5322.

Elaborate answer

Using a regular expression that recognizes email addresses could be useful in various situations: for example to scan for email addresses in a document, to validate user input, or as an integrity constraint on a data repository.

It should however be noted that if you want to find out if the address actually refers to an existing mailbox, there's no substitute for sending a message to the address. If you only want to check if an address is grammatically correct then you could use a regular expression, but note that ""@[] is a grammatically correct email address that certainly doesn't refer to an existing mailbox.

The syntax of email addresses has been defined in various RFCs , most notably RFC 822 and RFC 5322 . RFC 822 should be seen as the "original" standard and RFC 5322 as the latest standard. The syntax defined in RFC 822 is the most lenient and subsequent standards have restricted the syntax further and further, where newer systems or services should recognize obsolete syntax, but never produce it.

In this answer I'll take “email address” to mean addr-spec as defined in the RFCs (ie jdoe@example.org , but not "John Doe"<jdoe@example.org> , nor some-group:jdoe@example.org,mrx@exampel.org; ).

There's one problem with translating the RFC syntaxes into regexes: the syntaxes are not regular! This is because they allow for optional comments in email addresses that can be infinitely nested, while infinite nesting can't be described by a regular expression. To scan for or validate addresses containing comments you need a parser or more powerful expressions. (Note that languages like Perl have constructs to describe context free grammars in a regex-like way.) In this answer I'll disregard comments and only consider proper regular expressions.

The RFCs define syntaxes for email messages, not for email addresses as such. Addresses may appear in various header fields and this is where they are primarily defined. When they appear in header fields addresses may contain (between lexical tokens) whitespace, comments and even linebreaks. Semantically this has no significance however. By removing this whitespace, etc. from an address you get a semantically equivalent canonical representation . Thus, the canonical representation of first. last (comment) @ [3.5.7.9] is first.last@[3.5.7.9] .

Different syntaxes should be used for different purposes. If you want to scan for email addresses in a (possibly very old) document it may be a good idea to use the syntax as defined in RFC 822. On the other hand, if you want to validate user input you may want to use the syntax as defined in RFC 5322, probably only accepting canonical representations. You should decide which syntax applies to your specific case.

I use POSIX "extended" regular expressions in this answer, assuming an ASCII compatible character set.

RFC 822

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

I believe it's fully complient with RFC 822 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. Where an erratum has been published I give a separate expression for the corrected grammar rule (marked "erratum") and use the updated version as a subexpression in subsequent regular expressions.

As stated in paragraph 3.1.4. of RFC 822 optional linear white space may be inserted between lexical tokens. Where applicable I've expanded the expressions to accommodate this rule and marked the result with "opt-lwsp".

 CHAR = <any ASCII character> =~ . CTL = <any ASCII control character and DEL> =~ [\x00-\x1F\x7F] CR = <ASCII CR, carriage return> =~ \r LF = <ASCII LF, linefeed> =~ \n SPACE = <ASCII SP, space> =~ HTAB = <ASCII HT, horizontal-tab> =~ \t <"> = <ASCII quote mark> =~ " CRLF = CR LF =~ \r\n LWSP-char = SPACE / HTAB =~ [ \t] linear-white-space = 1*([CRLF] LWSP-char) =~ ((\r\n)?[ \t])+ specials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" / <"> / "." / "[" / "]" =~ [][()<>@,;:\\".] quoted-pair = "\" CHAR =~ \\. qtext = <any CHAR excepting <">, "\" & CR, and including linear-white-space> =~ [^"\\\r]|((\r\n)?[ \t])+ dtext = <any CHAR excluding "[", "]", "\" & CR, & including linear-white-space> =~ [^][\\\r]|((\r\n)?[ \t])+ quoted-string = <"> *(qtext|quoted-pair) <"> =~ "([^"\\\r]|((\r\n)?[ \t])|\\.)*" (erratum) =~ "(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-literal = "[" *(dtext|quoted-pair) "]" =~ \[([^][\\\r]|((\r\n)?[ \t])|\\.)*] (erratum) =~ \[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] atom = 1*<any CHAR except specials, SPACE and CTLs> =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+ word = atom / quoted-string =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-ref = atom sub-domain = domain-ref / domain-literal =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] local-part = word *("." word) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* domain = sub-domain *("." sub-domain) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* addr-spec = local-part "@" domain =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*(\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*)*@((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (canonical) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))* 

RFC 5322

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*])

I believe it's fully complient with RFC 5322 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. For rules that include semantically irrelevant (folding) whitespace, I give a separate regex marked "(normalized)" that doesn't accept this whitespace.

I ignored all the "obs-" rules from the RFC. This means that the regexes only match email addresses that are strictly RFC 5322 compliant. If you have to match "old" addresses (as the looser grammar including the "obs-" rules does), you can use one of the RFC 822 regexes from the previous paragraph.

 VCHAR = %x21-7E =~ [!-~] ALPHA = %x41-5A / %x61-7A =~ [A-Za-z] DIGIT = %x30-39 =~ [0-9] HTAB = %x09 =~ \t CR = %x0D =~ \r LF = %x0A =~ \n SP = %x20 =~ DQUOTE = %x22 =~ " CRLF = CR LF =~ \r\n WSP = SP / HTAB =~ [\t ] quoted-pair = "\" (VCHAR / WSP) =~ \\[\t -~] FWS = ([*WSP CRLF] 1*WSP) =~ ([\t ]*\r\n)?[\t ]+ ctext = %d33-39 / %d42-91 / %d93-126 =~ []!-'*-[^-~] ("comment" is left out in the regex) ccontent = ctext / quoted-pair / comment =~ []!-'*-[^-~]|(\\[\t -~]) (not regular) comment = "(" *([FWS] ccontent) [FWS] ")" (is equivalent to FWS when leaving out comments) CFWS = (1*([FWS] comment) [FWS]) / FWS =~ ([\t ]*\r\n)?[\t ]+ atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~" =~ [-!#-'*+/-9=?AZ^-~] dot-atom-text = 1*atext *("." 1*atext) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* dot-atom = [CFWS] dot-atom-text [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* qtext = %d33 / %d35-91 / %d93-126 =~ []!#-[^-~] qcontent = qtext / quoted-pair =~ []!#-[^-~]|(\\[\t -~]) (erratum) quoted-string = [CFWS] DQUOTE ((1*([FWS] qcontent) [FWS]) / FWS) DQUOTE [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ "([]!#-[^-~ \t]|(\\[\t -~]))+" dtext = %d33-90 / %d94-126 =~ [!-Z^-~] domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ \[[\t -Z^-~]*] local-part = dot-atom / quoted-string =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+" domain = dot-atom / domain-literal =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*] addr-spec = local-part "@" domain =~ ((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?)@((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?) (normalized) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*]) 

Note that some sources (notably w3c ) claim that RFC 5322 is too strict on the local part (ie the part before the @-sign). This is because "..", "a..b" and "a." are not valid dot-atoms, while they may be used as mailbox names. The RFC, however, does allow for local parts like these, except that they have to be quoted. So instead of a..b@example.net you should write "a..b"@example.net , which is semantically equivalent.

Further restrictions

SMTP (as defined in RFC 5321 ) further restricts the set of valid email addresses (or actually: mailbox names). It seems reasonable to impose this stricter grammar, so that the matched email address can actually be used to send an email.

RFC 5321 basically leaves alone the "local" part (ie the part before the @-sign), but is stricter on the domain part (ie the part after the @-sign). It allows only host names in place of dot-atoms and address literals in place of domain literals.

The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of "correcting" the rules in question, using this draft and RFC 1034 as guidelines. Here's the resulting regex.

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

Note that depending on the use case you may not want to allow for a "General-address-literal" in your regex. Also note that I used a negative lookahead (?!IPv6:) in the final regex to prevent the "General-address-literal" part to match malformed IPv6 addresses. Some regex processors don't support negative lookahead. Remove the substring |(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ from the regex if you want to take the whole "General-address-literal" part out.

Here's the derivation:

 Let-dig = ALPHA / DIGIT =~ [0-9A-Za-z] Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig =~ [0-9A-Za-z-]*[0-9A-Za-z] (regex is updated to make sure sub-domains are max. 63 charactes long - RFC 1034 section 3.5) sub-domain = Let-dig [Ldh-str] =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])? Domain = sub-domain *("." sub-domain) =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)* Snum = 1*3DIGIT =~ [0-9]{1,3} (suggested replacement for "Snum") ip4-octet = DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35 =~ 25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9] IPv4-address-literal = Snum 3("." Snum) =~ [0-9]{1,3}(\.[0-9]{1,3}){3} (suggested replacement for "IPv4-address-literal") ip4-address = ip4-octet 3("." ip4-octet) =~ (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement for "IPv6-hex") ip6-h16 = "0" / ( (%x49-57 / %x65-70 /%x97-102) 0*3(%x48-57 / %x65-70 /%x97-102) ) =~ 0|[1-9A-Fa-f][0-9A-Fa-f]{0,3} (not from RFC) ls32 = ip6-h16 ":" ip6-h16 / ip4-address =~ (0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement of "IPv6-addr") ip6-address = 6(ip6-h16 ":") ls32 / "::" 5(ip6-h16 ":") ls32 / [ ip6-h16 ] "::" 4(ip6-h16 ":") ls32 / [ *1(ip6-h16 ":") ip6-h16 ] "::" 3(ip6-h16 ":") ls32 / [ *2(ip6-h16 ":") ip6-h16 ] "::" 2(ip6-h16 ":") ls32 / [ *3(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 ":" ls32 / [ *4(ip6-h16 ":") ip6-h16 ] "::" ls32 / [ *5(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 / [ *6(ip6-h16 ":") ip6-h16 ] "::" =~ (((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?:: IPv6-address-literal = "IPv6:" ip6-address =~ IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::) Standardized-tag = Ldh-str =~ [0-9A-Za-z-]*[0-9A-Za-z] dcontent = %d33-90 / %d94-126 =~ [!-Z^-~] General-address-literal = Standardized-tag ":" 1*dcontent =~ [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ address-literal = "[" ( IPv4-address-literal / IPv6-address-literal / General-address-literal ) "]" =~ \[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)] Mailbox = Local-part "@" ( Domain / address-literal ) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)]) 

User input validation

A common use case is user input validation, for example on an html form. In that case it's usually reasonable to preclude address-literals and to require at least two labels in the hostname. Taking the improved RFC 5321 regex from the previous section as a basis, the resulting expression would be:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

I do not recommend restricting the local part further, eg by precluding quoted strings, since we don't know what kind of mailbox names some hosts allow (like "a..b"@example.net or even "ab"@example.net ).

I also do not recommend explicitly validating against a list of literal top-level domains or even imposing length-constraints (remember how ".museum" invalidated [az]{2,4} ), but if you must:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?\.)*(net|org|com|info| etc… )

Make sure to keep your regex up-to-date if you decide to go down the path of explicit top-level domain validation.

进一步考虑

When only accepting host names in the domain part (after the @-sign), the regexes above accept only labels with at most 63 characters, as they should. However, they don't enforce the fact that the entire host name must be at most 253 characters long (including the dots). Although this constraint is strictly speaking still regular, it's not feasible to make a regex that incorporates this rule.

Another consideration, especially when using the regexes for input validation, is feedback to the user. If a user enters an incorrect address, it would be nice to give a little more feedback than a simple "syntactically incorrect address". With "vanilla" regexes this is not possible.

These two considerations could be addressed by parsing the address. The extra length constraint on host names could in some cases also be addressed by using an extra regex that checks it, and matching the address against both expressions.

None of the regexes in this answer are optimized for performance. If performance is an issue, you should see if (and how) the regex of your choice can be optimized.

There are plenty examples of this out on the net (and I think even one that fully validates the RFC – but it's tens/hundreds of lines long if memory serves). People tend to get carried away validating this sort of thing. Why not just check it has an @ and at least one . and meets some simple minimum length. It's trivial to enter a fake email and still match any valid regex anyway. I would guess that false positives are better than false negatives.

While deciding which characters are allowed, please remember your apostrophed and hyphenated friends. I have no control over the fact that my company generates my email address using my name from the HR system. That includes the apostrophe in my last name. I can't tell you how many times I have been blocked from interacting with a website by the fact that my email address is "invalid".

This regex is from Perl's Email::Valid library. I believe it to be the most accurate, it matches all 822. And, it is based on the regular expression in the O'Reilly book:

Regular expression built using Jeffrey Friedl's example in Mastering Regular Expressions ( http://www.ora.com/catalog/regexp/ ).

 $RFC822PAT = <<'EOF'; [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\ xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf f\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\x ff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015 "]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\ xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80 -\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]* )*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\ \\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\ x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n \015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([ ^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\ \\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\ x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80- \xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015() ]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\ x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\04 0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\ n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\ 015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?! [^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\ ]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\ x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01 5()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*|(?:[^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff] )|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^ ()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x80-\xff\n\0 15()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][ ^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\ n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\ x80-\xff\000-\010\012-\037]*)*<[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(? :(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80- \xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]* (?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015 ()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015() ]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\0 40)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\ [^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\ xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]* )*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80 -\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x 80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t ]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\ \[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff]) *\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x 80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80 -\xff\n\015()]*)*\)[\040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015( )]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\ \x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*@[\040\t ]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\0 15()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015 ()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^( \040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]| \\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80 -\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015() ]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff ])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\ \x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x 80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015 ()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\ \\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?(?:[^ (\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000- \037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\ n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]| \([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\)) [^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff \n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\x ff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*( ?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\ 000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\ xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*) *\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\x ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80- \xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*) *(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\ ]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\] )[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80- \xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\x ff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*( ?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80 -\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)< >@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?: \([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()] *(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*) *\)[\040\t]*)*)*>) EOF 

As you're writing in PHP I'd advice you to use the PHP build-in validation for emails.

 filter_var($value, FILTER_VALIDATE_EMAIL) 

If you're running a php-version lower than 5.3.6 please be aware of this issue: https://bugs.php.net/bug.php?id=53091

If you want more information how this buid-in validation works, see here: Does PHP's filter_var FILTER_VALIDATE_EMAIL actually work?

Cal Henderson (Flickr) wrote an article called Parsing Email Adresses in PHP and shows how to do proper RFC (2)822-compliant Email Address parsing. You can also get the source code in php , python and ruby which is cc licensed .

I never bother creating with my own regular expression, because chances are that someone else has already come up with a better version. I always use regexlib to find one to my liking.

There is not one which is really usable.
I discuss some issues in my answer to Is there a php library for email address validation? , it is discussed also in Regexp recognition of email address hard?

In short, don't expect a single, usable regex to do a proper job. And the best regex will validate the syntax, not the validity of an e-mail (jhohn@example.com is correct but it will probably bounce…).

One simple regular expression which would at least not reject any valid email address would be checking for something, followed by an @ sign and then something followed by a period and at least 2 somethings. It won't reject anything, but after reviewing the spec I can't find any email that would be valid and rejected.

email =~ /.+@[^@]+\.[^@]{2,}$/

You could use the one employed by the jQuery Validation plugin:

 /^((([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i 

For the most comprehensive evaluation of the best regular expression for validating an email address please see this link; " Comparing E-mail Address Validating Regular Expressions "

Here is the current top expression for reference purposes:

 /^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[az])\.)+[az]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i 

Not to mention that non-Latin (Chinese, Arabic, Greek, Hebrew, Cyrillic and so on) domain names are to be allowed in the near future . Everyone has to change the email regex used, because those characters are surely not to be covered by [az]/i nor \w . They will all fail.

After all, the best way to validate the email address is still to actually send an email to the address in question to validate the address. If the email address is part of user authentication (register/login/etc), then you can perfectly combine it with the user activation system. Ie send an email with a link with an unique activation key to the specified email address and only allow login when the user has activated the newly created account using the link in the email.

If the purpose of the regex is just to quickly inform the user in the UI that the specified email address doesn't look like in the right format, best is still to check if it matches basically the following regex:

 ^([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)$ 

就那么简单。 Why on earth would you care about the characters used in the name and domain? It's the client's responsibility to enter a valid email address, not the server's. Even when the client enters a syntactically valid email address like aa@bb.cc , this does not guarantee that it's a legit email address. No one regex can cover that.

For a vivid demonstration, the following monster is pretty good but still does not correctly recognize all syntactically valid email addresses: it recognizes nested comments up to four levels deep.

This is a job for a parser, but even if an address is syntactically valid, it still may not be deliverable. Sometimes you have to resort to the hillbilly method of "Hey, y'all, watch ee-us!"

 // derivative of work with the following copyright and license: // Copyright (c) 2004 Casey West. All rights reserved. // This module is free software; you can redistribute it and/or // modify it under the same terms as Perl itself. // see http://search.cpan.org/~cwest/Email-Address-1.80/ private static string gibberish = @" (?-xism:(?:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*<DQ>(?-xism:(?-xism:[ ^\\<DQ>])|(?-xism:\\(?-xism:[^\x0A\x0D])))+<DQ>(?-xism:(?-xi sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xis m:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\ s*)+|\s+)*))+)?(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(? -xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*<(?-xism:(?-xi sm:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^( )\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*( ?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D])) |)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()< >\[\]:;@\,.<DQ>\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s] +)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism: (?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s *\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xi sm:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)* <DQ>(?-xism:(?-xism:[^\\<DQ>])|(?-xism:\\(?-xism:[^\x0A\x0D] )))+<DQ>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-x ism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+ )*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism:( ?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(? -xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s *\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+( ?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+)*)(?-xism:(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*) +|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?-x ism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-xi sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism: \\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+ )*\s*\)\s*)+|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?- xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))|(?-xism:(?-x ism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s* (?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]) )|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F() <>\[\]:;@\,.<DQ>\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s ]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism :(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\ s*\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-x ism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+) *<DQ>(?-xism:(?-xism:[^\\<DQ>])|(?-xism:\\(?-xism:[^\x0A\x0D ])))+<DQ>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\ \]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?- xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|) +)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism: (?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\( ?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[ ^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\ s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+ (?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+)*)(?-xism:(?-xism :\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism: [^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+ ))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s* )+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism :(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\( (?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A \x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?- xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-x ism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism :\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*)) +)*\s*\)\s*)+|\s+)*))))(?-xism:\s*\((?:\s*(?-xism:(?-xism:(? >[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?: \s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0 D]))|)+)*\s*\)\s*))+)*\s*\)\s*)*)" .Replace("<DQ>", "\"") .Replace("\t", "") .Replace(" ", "") .Replace("\r", "") .Replace("\n", ""); private static Regex mailbox = new Regex(gibberish, RegexOptions.ExplicitCapture); 

Here's the PHP I use. I've choosen this solution in the spirit of "false positives are better than false negatives" as declared by another commenter here AND with regards to keeping your response time up and server load down … there's really no need to waste server resources with a regular expression when this will weed out most simple user error. You can always follow this up by sending a test email if you want.

 function validateEmail($email) { return (bool) stripos($email,'@'); } 

According to official standard RFC 2822 valid email regex is

 (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) 

if you want to use it in Java its really very easy

 import java.util.regex.*; class regexSample { public static void main(String args[]) { //Input the string for validation String email = "xyz@hotmail.com"; //Set the email pattern string Pattern p = Pattern.compile(" (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|" +"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")" + "@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\]"); //Match the given string with the pattern Matcher m = p.matcher(email); //check whether match is found boolean matchFound = m.matches(); if (matchFound) System.out.println("Valid Email Id."); else System.out.println("Invalid Email Id."); } } 

RFC 5322 standard:

Allows dot-atom local-part, quoted-string local-part, obsolete (mixed dot-atom and quoted-string) local-part, domain name domain, (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain, and (nested) CFWS.

 '/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD' 

RFC 5321 standard:

Allows dot-atom local-part, quoted-string local-part, domain name domain, and (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain.

 '/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD' 

Basic:

Allows dot-atom local-part and domain name domain (requiring at least two domain name labels with the TLD limited to 2-6 alphabetic characters).

 "/^(?!.{255,})(?!.{65,}@)([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?!.*[^.]{64,})(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,126}[az]{2,6}$/iD" 

The HTML5 spec suggests a simple regex for validating email addresses:

 /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

This intentionally doesn't comply with RFC 5322 .

Note: This requirement is a willful violation of RFC 5322 , which defines a syntax for e-mail addresses that is simultaneously too strict (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

 public bool ValidateEmail(string sEmail) { if (sEmail == null) { return false; } int nFirstAT = sEmail.IndexOf('@'); int nLastAT = sEmail.LastIndexOf('@'); if ((nFirstAT > 0) && (nLastAT == nFirstAT) && (nFirstAT < (sEmail.Length - 1))) { return (Regex.IsMatch(sEmail, @"^[az|0-9|AZ]*([_][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*(([_][az|0-9|AZ]+)*)?@[az][az|0-9|AZ]*\.([az][az|0-9|AZ]*(\.[az][az|0-9|AZ]*)?)$")); } else { return false; } } 

Strange that you "cannot" allow 4 characters TLDs. You are banning people from .info and .name , and the length limitation stop .travel and .museum , but yes, they are less common than 2 characters TLDs and 3 characters TLDs.

You should allow uppercase alphabets too. Email systems will normalize the local part and domain part.

For your regex of domain part, domain name cannot starts with '-' and cannot ends with '-'. Dash can only stays in between.

If you used the PEAR library, check out their mail function (forgot the exact name/library). You can validate email address by calling one function, and it validates the email address according to definition in RFC822.