什么是最好的正则expression式来检查一个string是否是一个有效的URL?

如何检查给定的string是否是有效的URL地址?

我对正则expression式的了解是基本的,不允许我从我已经在网上看到的数百个正则expression式中进行select。

我写了我的URL(实际上IRI,国际化)模式,以符合RFC 3987( http://www.faqs.org/rfcs/rfc3987.html )。 这些在PCRE语法中。

绝对IRI(国际化):

/^[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i 

还要允许相关的IRI:

 /^(?:[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?|(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?)$/i 

他们如何被编译(在PHP中):

 <?php /* Regex convenience functions (character class, non-capturing group) */ function cc($str, $suffix = '', $negate = false) { return '[' . ($negate ? '^' : '') . $str . ']' . $suffix; } function ncg($str, $suffix = '') { return '(?:' . $str . ')' . $suffix; } /* Preserved from RFC3986 */ $ALPHA = 'a-z'; $DIGIT = '0-9'; $HEXDIG = $DIGIT . 'a-f'; $sub_delims = '!\\$&\'\\(\\)\\*\\+,;='; $gen_delims = ':\\/\\?\\#\\[\\]@'; $reserved = $gen_delims . $sub_delims; $unreserved = '-' . $ALPHA . $DIGIT . '\\._~'; $pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG); $dec_octet = ncg(implode('|', array( cc($DIGIT), cc('1-9') . cc($DIGIT), '1' . cc($DIGIT) . cc($DIGIT), '2' . cc('0-4') . cc($DIGIT), '25' . cc('0-5') ))); $IPv4address = $dec_octet . ncg('\\.' . $dec_octet, '{3}'); $h16 = cc($HEXDIG, '{1,4}'); $ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address); $IPv6address = ncg(implode('|', array( ncg($h16 . ':', '{6}') . $ls32, '::' . ncg($h16 . ':', '{5}') . $ls32, ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32, ncg($h16 . ':' . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32, ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32, ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32, ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32, ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16, ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::', ))); $IPvFuture = 'v' . cc($HEXDIG, '+') . cc($unreserved . $sub_delims . ':', '+'); $IP_literal = '\\[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . '\\]'; $port = cc($DIGIT, '*'); $scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '\\+\\.'), '*'); /* New or changed in RFC3987 */ $iprivate = '\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}'; $ucschar = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}' . '\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}' . '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}' . '\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}' . '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}' . '\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}'; $iunreserved = '-' . $ALPHA . $DIGIT . '\\._~' . $ucschar; $ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@')); $ifragment = ncg($ipchar . '|' . cc('\\/\\?'), '*'); $iquery = ncg($ipchar . '|' . cc($iprivate . '\\/\\?'), '*'); $isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+'); $isegment_nz = ncg($ipchar, '+'); $isegment = ncg($ipchar, '*'); $ipath_empty = '(?!' . $ipchar . ')'; $ipath_rootless = ncg($isegment_nz) . ncg('\\/' . $isegment, '*'); $ipath_noscheme = ncg($isegment_nz_nc) . ncg('\\/' . $isegment, '*'); $ipath_absolute = '\\/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment ) $ipath_abempty = ncg('\\/' . $isegment, '*'); $ipath = ncg(implode('|', array( $ipath_abempty, $ipath_absolute, $ipath_noscheme, $ipath_rootless, $ipath_empty ))) . ')'; $ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '*'); $ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name))); $iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*'); $iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?'); $irelative_part = ncg(implode('|', array( '\\/\\/' . $iauthority . $ipath_abempty . '', '' . $ipath_absolute . '', '' . $ipath_noscheme . '', '' . $ipath_empty . '' ))); $irelative_ref = $irelative_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?'); $ihier_part = ncg(implode('|', array( '\\/\\/' . $iauthority . $ipath_abempty . '', '' . $ipath_absolute . '', '' . $ipath_rootless . '', '' . $ipath_empty . '' ))); $absolute_IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?'); $IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?'); $IRI_reference = ncg($IRI . '|' . $irelative_ref); 

编辑2011年3月7日:由于PHP处理带引号的string中的反斜杠的方式,这些默认情况下是不可用的。 除了反斜杠在正则expression式中有特殊含义的地方,你需要双重转义反斜杠。 你可以这样做:

 $escape_backslash = '/(?<!\\)\\(?![\[\]\\\^\$\.\|\*\+\(\)QEnrtaefvdwsDWSbAZzB1-9GX]|x\{[0-9a-f]{1,4}\}|\c[AZ]|)/'; $absolute_IRI = preg_replace($escape_backslash, '\\\\', $absolute_IRI); $IRI = preg_replace($escape_backslash, '\\\\', $IRI); $IRI_reference = preg_replace($escape_backslash, '\\\\', $IRI_reference); 

我刚刚写了一篇博客文章,提供了一个很好的解决scheme来识别大多数使用的格式的URL,例如:

  • www.google.com
  • http://www.google.com
  • mailto:somebody@google.com
  • somebody@google.com
  • www.url-with-querystring.com/?url=has-querystring

使用的正则expression式是:

 /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/ 

不过,我build议你去http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the查看工作示例。;

什么平台? 如果使用.NET,请使用System.Uri.TryCreate ,而不是正则expression式。

例如:

 static bool IsValidUrl(string urlString) { Uri uri; return Uri.TryCreate(urlString, UriKind.Absolute, out uri) && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps || uri.Scheme == Uri.UriSchemeFtp || uri.Scheme == Uri.UriSchemeMailto /*...*/); } // In test fixture... [Test] void IsValidUrl_Test() { Assert.True(IsValidUrl("http://www.example.com")); Assert.False(IsValidUrl("javascript:alert('xss')")); Assert.False(IsValidUrl("")); Assert.False(IsValidUrl(null)); } 

(感谢@Yoshi关于javascript:的提示)

这是RegexBuddy使用的。

 (\b(https?|ftp|file)://)?[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|] 

它匹配下面这些( ** **标志内):

 **http://www.regexbuddy.com** **http://www.regexbuddy.com/** **http://www.regexbuddy.com/index.html** **http://www.regexbuddy.com/index.html?source=library** 

你可以在http://www.regexbuddy.com/download.html下载RegexBuddy。

关于眼睑的回答post,上面写着:“这是基于我对URI规范的阅读”。谢谢眼睑,您是我寻求的完美解决scheme,因为它基于URI规范! 精湛的工作。 🙂

我不得不作出两个修正。 第一个获得正则expression式在PHP(v5.2.10)中使用preg_match()函数正确匹配IP地址URL。

我不得不在pipe道周围的“IP地址”上方添加一组括号:

 )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# 

不知道为什么。

我也把顶级域的最小长度从3个减less到2个字母,以支持.co.uk和类似的。

最终代码:

 /^(https?|ftp):\/\/(?# protocol )(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?# username )(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?# password )@)?(?# auth requires @ )((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?# domain segments AND )[az][a-z0-9-]*[a-z0-9](?# top level domain OR )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# )(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?# IP address ))(:\d+)?(?# port ))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path )(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?# query string )?)?)?(?# path and query string optional )(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?# fragment )$/i 

这个修改后的版本没有经过URI规范检查,所以我不能保证它的合规性,它被改变来处理本地networking环境和两个数字顶级域名以及其他types的url的URL,并在PHP中更好地工作我使用的设置。

作为PHP代码:

 define('URL_FORMAT', '/^(https?):\/\/'. // protocol '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'. // username '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'. // password '@)?(?#'. // auth requires @ ')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'. // domain segments AND '[az][a-z0-9-]*[a-z0-9]'. // top level domain OR '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'. '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'. // IP address ')(:\d+)?'. // port ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'. // query string '?)?)?'. // path and query string optional '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'. // fragment '$/i'); 

这是一个使用正则expression式validation各种URL的PHP​​testing程序:

 <?php define('URL_FORMAT', '/^(https?):\/\/'. // protocol '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'. // username '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'. // password '@)?(?#'. // auth requires @ ')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'. // domain segments AND '[az][a-z0-9-]*[a-z0-9]'. // top level domain OR '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'. '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'. // IP address ')(:\d+)?'. // port ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'. // query string '?)?)?'. // path and query string optional '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'. // fragment '$/i'); /** * Verify the syntax of the given URL. * * @access public * @param $url The URL to verify. * @return boolean */ function is_valid_url($url) { if (str_starts_with(strtolower($url), 'http://localhost')) { return true; } return preg_match(URL_FORMAT, $url); } /** * String starts with something * * This function will return true only if input string starts with * niddle * * @param string $string Input string * @param string $niddle Needle string * @return boolean */ function str_starts_with($string, $niddle) { return substr($string, 0, strlen($niddle)) == $niddle; } /** * Test a URL for validity and count results. * @param url url * @param expected expected result (true or false) */ $numtests = 0; $passed = 0; function test_url($url, $expected) { global $numtests, $passed; $numtests++; $valid = is_valid_url($url); echo "URL Valid?: " . ($valid?"yes":"no") . " for URL: $url. Expected: ".($expected?"yes":"no").". "; if($valid == $expected) { echo "PASS\n"; $passed++; } else { echo "FAIL\n"; } } echo "URL Tests:\n\n"; test_url("http://localserver/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true); test_url("http://www.google.com", true); test_url("http://www.google.co.uk/projects/my%20folder/test.php", true); test_url("https://myserver.localdomain", true); test_url("http://192.168.1.120/projects/index.php", true); test_url("http://192.168.1.1/projects/index.php", true); test_url("http://projectpier-server.localdomain/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true); test_url("https://2.4.168.19/project-pier?c=test&a=b", true); test_url("https://localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true); test_url("http://user:password@localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true); echo "\n$passed out of $numtests tests passed.\n\n"; ?> 

再次感谢正则expression式的眼睑

获取URL(Regex)部分的post讨论了parsingURL以识别其各个组件。 如果你想检查一个URL是否是格式良好的,它应该足以满足你的需求。

如果你需要检查它是否真的有效,你最终必须尝试访问另一端的东西。

但是,一般来说,使用由框架或其他库提供给您的函数可能会更好。 许多平台都包含parsingURL的function。 例如,有Python的urlparse模块,在.NET中可以使用System.Uri类的构造函数作为validationURL的方法。

Mathias Bynens在很多正则expression式的最佳对比方面有一篇很好的文章: 寻找完美的URLvalidation正则expression式

发布的最好的一个是有点长,但它匹配的任何东西,你可以扔在它。

JavaScript版本

 /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\u00a1-\uffff0-9]-*)*[az\u00a1-\uffff0-9]+)(?:\.(?:[az\u00a1-\uffff0-9]-*)*[az\u00a1-\uffff0-9]+)*(?:\.(?:[az\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i 

PHP版本

 _^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\x{00a1}-\x{ffff}0-9]-*)*[az\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[az\x{00a1}-\x{ffff}0-9]-*)*[az\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[az\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$_iuS 

这可能不是一个正则expression式的工作,而是用您所选语言的现有工具。 您可能想要使用已经编写,testing和debugging的现有代码。

在PHP中,使用parse_url函数。

Perl: URI模块 。

Ruby: URI模块 。

.NET: 'Uri'类

正则expression式不是一个魔术棒,你碰巧涉及到string的每一个问题。

为了便于参考,以下是IETF规范:( TXT | HTML )。 特别是, Appendix B. Parsing a URI Reference with a Regular Expression就是要点。 这是他们提供的正则expression式:

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? 

正如其他人所说的,最好把它留给你已经使用的lib / framework。

这将匹配所有url

  • 有或没有http / https
  • 有或没有www

…包括子域名和那些新的顶级域名扩展名如。 博物馆学院基础等等,最多可以有63个字符(不只是.com.net.info等)

 (([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)? 

因为今天可用的顶级域名扩展的最大长度是13个字符,例如。 国际 ,您可以将expression式中的数字63更改为13,以防止有人滥用它。

作为JavaScript

 var urlreg=/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/; $('textarea').on('input',function(){ var url = $(this).val(); $(this).toggleClass('invalid', urlreg.test(url) == false) }); $('textarea').trigger('input'); 
 textarea{color:green;} .invalid{color:red;} 
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <textarea>http://www.google.com</textarea> <textarea>http//www.google.com</textarea> <textarea>googlecom</textarea> <textarea>https://www.google.com</textarea> 
  function validateURL(textval) { var urlregex = new RegExp( "^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&amp;%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&amp;%\$#\=~_\-]+))*$"); return urlregex.test(textval); } 

匹配http://site.com/dir/file.php?var=moo | FTP://用户:pass@site.com:21 /文件/目录

不匹配site.com | http://site.com/dir//

URL的最佳正则expression式是:

 "(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?" 

如果你真的寻找最终的匹配,你可能会发现它在“ 好url正则expression式? ”。

但是一个真正匹配所有可能的域的正则expression式,并允许根据RFC来允许的任何东西是非常漫长和不可读的,相信我;-)

 function validateURL(textval) { var urlregex = new RegExp( "^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&amp;%\$#\=~])*$"); return urlregex.test(textval); } 

匹配http://www.asdah.com/~joe | ftp://ftp.asdah.co.uk:2828/asdah%20asdah.gif | https://asdah.gov/asdh-ah.as

我写了一个可以运行的小Groovy版本

它匹配以下url(这对我来说已经足够了)

 public static void main(args){ String url = "go to http://www.m.abut.ly/abc its awesome" url = url.replaceAll(/https?:\/\/w{0,3}\w*?\.(\w*?\.)?\w{2,3}\S*|www\.(\w*?\.)?\w*?\.\w{2,3}\S*|(\w*?\.)?\w*?\.\w{2,3}[\/\?]\S*/ , { it -> "woof${it}woof" }) println url } 

http://google.com

http://google.com/help.php

http://google.com/help.php?a=5

http://www.google.com

http://www.google.com/help.php

http://www.google.com?a=5

google.com?a=5

google.com/help.php

google.com/help.php?a=5

http://www.m.google.com/help.php?a=5 (及其所有排列)

http://www.m.google.com/help.php?a=5(及其所有排列);

m.google.com/help.php?a=5(及其所有排列)

对于任何不以http或www开头的URL,重要的是它们必须包含一个/或?

我敢打赌,这可以调整多一点,但它做这个工作相当不错,因为这么短,紧凑…因为你几乎可以拆分它在3:

find任何以http:https:// w {0,3} \ w *?。\ w {2,3} \ S *

find以www:www。\ w *?。\ w {2,3} \ S *

或find任何必须有一个文本然后一个点,然后至less2个字母,然后一个? 或/:\ w *?。\ w {2,3} [/ \?] \ S *

我无法find正在寻找的正则expression式,所以我修改了一个正则expression式来满足我的要求,显然现在看起来工作正常。 我的要求是:

在这里,我想出了任何build议表示赞赏:

 @Test public void testWebsiteUrl(){ String regularExpression = "((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&amp;:/~\\+#]*[\\w\\-\\@?^=%&amp;/~\\+#])?"; assertTrue("www.google.com".matches(regularExpression)); assertTrue("www.google.co.uk".matches(regularExpression)); assertTrue("http://www.google.com".matches(regularExpression)); assertTrue("http://www.google.co.uk".matches(regularExpression)); assertTrue("https://www.google.com".matches(regularExpression)); assertTrue("https://www.google.co.uk".matches(regularExpression)); assertTrue("google.com".matches(regularExpression)); assertTrue("google.co.uk".matches(regularExpression)); assertTrue("google.mu".matches(regularExpression)); assertTrue("mes.intnet.mu".matches(regularExpression)); assertTrue("cse.uom.ac.mu".matches(regularExpression)); assertTrue("http://www.google.com/path".matches(regularExpression)); assertTrue("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e".matches(regularExpression)); assertTrue("http://www.google.com/?queryparam=123".matches(regularExpression)); assertTrue("http://www.google.com/path?queryparam=123".matches(regularExpression)); assertFalse("www..dr.google".matches(regularExpression)); assertFalse("www:google.com".matches(regularExpression)); assertFalse("https://www@.google.com".matches(regularExpression)); assertFalse("https://www.google.com\"".matches(regularExpression)); assertFalse("https://www.google.com'".matches(regularExpression)); assertFalse("http://www.google.com/path'".matches(regularExpression)); assertFalse("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e'".matches(regularExpression)); assertFalse("http://www.google.com/?queryparam=123'".matches(regularExpression)); assertFalse("http://www.google.com/path?queryparam=12'3".matches(regularExpression)); } 

我一直在深入研究使用正则expression式讨论URIvalidation的文章。 它基于RFC3986。

正则expression式URIvalidation

虽然这篇文章还没有完成,但我已经想出了一个PHP函数,它在validationHTTP和FTP URL方面做得相当不错。 这是当前版本:

 // function url_valid($url) { Rev:20110423_2000 // // Return associative array of valid URI components, or FALSE if $url is not // RFC-3986 compliant. If the passed URL begins with: "www." or "ftp.", then // "http://" or "ftp://" is prepended and the corrected full-url is stored in // the return array with a key name "url". This value should be used by the caller. // // Return value: FALSE if $url is not valid, otherwise array of URI components: // eg // Given: "http://www.jmrware.com:80/articles?height=10&width=75#fragone" // Array( // [scheme] => http // [authority] => www.jmrware.com:80 // [userinfo] => // [host] => www.jmrware.com // [IP_literal] => // [IPV6address] => // [ls32] => // [IPvFuture] => // [IPv4address] => // [regname] => www.jmrware.com // [port] => 80 // [path_abempty] => /articles // [query] => height=10&width=75 // [fragment] => fragone // [url] => http://www.jmrware.com:80/articles?height=10&width=75#fragone // ) function url_valid($url) { if (strpos($url, 'www.') === 0) $url = 'http://'. $url; if (strpos($url, 'ftp.') === 0) $url = 'ftp://'. $url; if (!preg_match('/# Valid absolute URI having a non-empty, valid DNS host. ^ (?P<scheme>[A-Za-z][A-Za-z0-9+\-.]*):\/\/ (?P<authority> (?:(?P<userinfo>(?:[A-Za-z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)? (?P<host> (?P<IP_literal> \[ (?: (?P<IPV6address> (?: (?:[0-9A-Fa-f]{1,4}:){6} | ::(?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4} | (?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3} | (?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2} | (?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4}: | (?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?:: ) (?P<ls32>[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4} | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4} | (?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?:: ) | (?P<IPvFuture>[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+) ) \] ) | (?P<IPv4address>(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)) | (?P<regname>(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})+) ) (?::(?P<port>[0-9]*))? ) (?P<path_abempty>(?:\/(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*) (?:\?(?P<query> (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))? (?:\#(?P<fragment> (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))? $ /mx', $url, $m)) return FALSE; switch ($m['scheme']) { case 'https': case 'http': if ($m['userinfo']) return FALSE; // HTTP scheme does not allow userinfo. break; case 'ftps': case 'ftp': break; default: return FALSE; // Unrecognized URI scheme. Default to FALSE. } // Validate host name conforms to DNS "dot-separated-parts". if ($m['regname']) { // If host regname specified, check for DNS conformance. if (!preg_match('/# HTTP DNS host name. ^ # Anchor to beginning of string. (?!.{256}) # Overall host length is less than 256 chars. (?: # Group dot separated host part alternatives. [A-Za-z0-9]\. # Either a single alphanum followed by dot | # or... part has more than one char (63 chars max). [A-Za-z0-9] # Part first char is alphanum (no dash). [A-Za-z0-9\-]{0,61} # Internal chars are alphanum plus dash. [A-Za-z0-9] # Part last char is alphanum (no dash). \. # Each part followed by literal dot. )* # Zero or more parts before top level domain. (?: # Explicitly specify top level domains. com|edu|gov|int|mil|net|org|biz| info|name|pro|aero|coop|museum| asia|cat|jobs|mobi|tel|travel| [A-Za-z]{2}) # Country codes are exactly two alpha chars. \.? # Top level domain can end in a dot. $ # Anchor to end of string. /ix', $m['host'])) return FALSE; } $m['url'] = $url; for ($i = 0; isset($m[$i]); ++$i) unset($m[$i]); return $m; // return TRUE == array of useful named $matches plus the valid $url. } 

This function utilizes two regexes; one to match a subset of valid generic URIs (absolute ones having a non-empty host), and a second to validate the DNS "dot-separated-parts" host name. Although this function currently validates only HTTP and FTP schemes, it is structured such that it can be easily extended to handle other schemes.

I use this regex:

 ((https?:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)? 

To support both:

 http://stackoverflow.com https://stackoverflow.com 

和:

 //stackoverflow.com 

This one works for me very well. (https?|ftp)://(www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(\:|\.)([a-zA-Z0-9.]+|(\d+)?)([/?:].*)?

Here's a ready-to-go Java version from the Android source code. This is the best one I've found.

 public static final Matcher WEB = Pattern.compile(new StringBuilder() .append("((?:(http|https|Http|Https|rtsp|Rtsp):") .append("\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)") .append("\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_") .append("\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?") .append("((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+") // named host .append("(?:") // plus top level domain .append("(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])") .append("|(?:biz|b[abdefghijmnorstvwyz])") .append("|(?:cat|com|coop|c[acdfghiklmnoruvxyz])") .append("|d[ejkmoz]") .append("|(?:edu|e[cegrstu])") .append("|f[ijkmor]") .append("|(?:gov|g[abdefghilmnpqrstuwy])") .append("|h[kmnrtu]") .append("|(?:info|int|i[delmnoqrst])") .append("|(?:jobs|j[emop])") .append("|k[eghimnrwyz]") .append("|l[abcikrstuvy]") .append("|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])") .append("|(?:name|net|n[acefgilopruz])") .append("|(?:org|om)") .append("|(?:pro|p[aefghklmnrstwy])") .append("|qa") .append("|r[eouw]") .append("|s[abcdeghijklmnortuvyz]") .append("|(?:tel|travel|t[cdfghjklmnoprtvwz])") .append("|u[agkmsyz]") .append("|v[aceginu]") .append("|w[fs]") .append("|y[etu]") .append("|z[amw]))") .append("|(?:(?:25[0-5]|2[0-4]") // or ip address .append("[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]") .append("|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]") .append("[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}") .append("|[1-9][0-9]|[0-9])))") .append("(?:\\:\\d{1,5})?)") // plus option port number .append("(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~") // plus option query params .append("\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?") .append("(?:\\b|$)").toString() ).matcher(""); 

I found the following Regex for URLs, tested successfully with 500+ URLs :

/\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\x{00a1}-\x{ffff}0-9]+-?)*[az\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[az\x{00a1}-\x{ffff}0-9]+-?)*[az\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[az\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b/gi

I know it looks ugly, but the good thing is that it works. 🙂

Explanation and demo with 581 random URLs on regex101.

Source: In search of the perfect URL validation regex

I tried to formulate my version of url. My requirement was to capture instances in a String where possible url can be cse.uom.ac.mu – noting that it is not preceded by http nor www

 String regularExpression = "((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})"; assertTrue("www.google.com".matches(regularExpression)); assertTrue("www.google.co.uk".matches(regularExpression)); assertTrue("http://www.google.com".matches(regularExpression)); assertTrue("http://www.google.co.uk".matches(regularExpression)); assertTrue("https://www.google.com".matches(regularExpression)); assertTrue("https://www.google.co.uk".matches(regularExpression)); assertTrue("google.com".matches(regularExpression)); assertTrue("google.co.uk".matches(regularExpression)); assertTrue("google.mu".matches(regularExpression)); assertTrue("mes.intnet.mu".matches(regularExpression)); assertTrue("cse.uom.ac.mu".matches(regularExpression)); //cannot contain 2 '.' after www assertFalse("www..dr.google".matches(regularExpression)); //cannot contain 2 '.' just before com assertFalse("www.dr.google..com".matches(regularExpression)); // to test case where url www must be followed with a '.' assertFalse("www:google.com".matches(regularExpression)); // to test case where url www must be followed with a '.' //assertFalse("http://wwwe.google.com".matches(regularExpression)); // to test case where www must be preceded with a '.' assertFalse("https://www@.google.com".matches(regularExpression)); 

For Python, this is the actual URL validating regex used in Django 1.5.1:

 import re regex = re.compile( r'^(?:http|ftp)s?://' # http:// or https:// r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[AZ]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' # domain... r'localhost|' # localhost... r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|' # ...or ipv4 r'\[?[A-F0-9]*:[A-F0-9:]+\]?)' # ...or ipv6 r'(?::\d+)?' # optional port r'(?:/?|[/?]\S+)$', re.IGNORECASE) 

This does both ipv4 and ipv6 addresses as well as ports and GET parameters.

Found in the code here , Line 44.

whats wrong with plain and simple FILTER_VALIDATE_URL ?

  $url = "http://www.example.com"; if(!filter_var($url, FILTER_VALIDATE_URL)) { echo "URL is not valid"; } else { echo "URL is valid"; } 

I know its not the question exactly but it did the job for me when I needed to validate urls so thought it might be useful to others who come across this post looking for the same thing

The following RegEx will work:

 "@((((ht)|(f))tp[s]?://)|(www\.))([az][-a-z0-9]+\.)?([az][-a-z0-9]+\.)?[az][-a-z0-9]+\.[az]+[/]?[a-z0-9._\/~#&=;%+?-]*@si" 

For convenience here's a one-liner regexp for URL's that will also match localhost where you're more likely to have ports than .com or similar.

 (http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}(\.[az]{2,6}|:[0-9]{3,4})\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*) 

You don't specify which language you're using. If PHP is, there is a native function for that:

 $url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/'; if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) { // Wrong } else { // Valid } 

Returns the filtered data, or FALSE if the filter fails.

Check it here >>

希望它有帮助。

这个怎么样:

 ^(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})$ 

These are the test cases:

Test cases

You can try it out in here : https://regex101.com/r/mS9gD7/41

This is a rather old thread now and the question asks for a regex based URL validator. I ran into the thread whilst looking for precisely the same thing. While it may well be possible to write a really comprehensive regex to validate URLs I eventually settled on another way to do things – by using PHP's parse_url function.

It returns boolean false if the url cannot be parsed. Otherwise it returns the scheme, the host and other information. This may well not be enough for a comprehensive URL check on its own but can be drilled down into for further analysis. If the intent is to simply catch typos, invalid schemes etc it is perfectly adequate.

To Check URL regex would be:

 ^http(s{0,1})://[a-zA-Z0-9_/\\-\\.]+\\.([A-Za-z/]{2,5})[a-zA-Z0-9_/\\&\\?\\=\\-\\.\\~\\%]*