How can I check whether a URL exists via PHP?

How do I check in PHP whether a URL exists (that is, does not return 404)?

Here:

    $file = 'http://www.domain.com/somefile.jpg';
    $file_headers = @get_headers($file);
    if (!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
        $exists = false;
    } else {
        $exists = true;
    }

From here, and right below the post above, there is a curl solution:

    function url_exists($url) {
        // Note: curl_init() only creates a handle; it does not send a request.
        if (!$fp = curl_init($url)) return false;
        return true;
    }

    $headers = @get_headers($this->_value);
    if (strpos($headers[0], '200') === false) return false;

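So any time you contact a website and get anything other than 200 OK, this check returns false.

When figuring out from PHP whether a URL exists, there are a few things to pay attention to: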

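  • Is the URL itself valid (a string, not empty, good syntax)? This is quick to check server side.
  • Waiting for a response might take time and block code execution.
  • Not all headers returned by get_headers() are well formed.
  • Use curl (if you can).
  • Avoid fetching the entire body/content; request only the headers.
  • Consider redirecting URLs:
    • Do you want the first code returned?
    • Or follow all redirects and return the last code?
    • You might end up with a 200, but the page could still redirect using meta tags or JavaScript. Figuring out what happens after that is hard.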
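
The first-code versus last-code question above can be made concrete. When get_headers() follows redirects it returns one status line per hop, so both codes can be read from the same array. A minimal sketch; the helper name statusCodesFromHeaders() and the sample header array are made up for illustration and do not come from the answers here:

```php
<?php
/**
 * Collect every HTTP status code found in a header array such as the one
 * returned by get_headers(). Hypothetical helper, not part of PHP.
 *
 * @param string[] $headers
 * @return int[] status codes in the order they were received
 */
function statusCodesFromHeaders(array $headers): array
{
    $codes = [];
    foreach ($headers as $line) {
        // Matches "HTTP/1.1 200 OK" as well as "HTTP/2 200" (no reason phrase).
        if (is_string($line) && preg_match('~^HTTP/\S+\s+(\d{3})\b~', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Sample data standing in for a real get_headers() result.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];

$codes = statusCodesFromHeaders($sample);
$first = $codes ? $codes[0] : null;                 // code returned for the original URL
$last  = $codes ? $codes[count($codes) - 1] : null; // code after all redirects
```

Which of the two you compare against 200 depends on whether a redirecting URL should count as existing in your application.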

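Keep in mind that whatever method you use, it takes time to wait for a response.
All code might (and probably will) halt until you either know the result or the request has timed out.

For example: the code below could take a long time to display the page if the URLs are invalid or unreachable: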

    <?php
    $urls = getUrls(); // some function getting say 10 or more external links

    foreach ($urls as $k => $url) {
        // this could potentially take 0-30 seconds each
        // (more or less depending on connection, target site, timeout settings...)
        if (!isValidUrl($url)) {
            unset($urls[$k]);
        }
    }

    echo "yay all done! now show my site";
    foreach ($urls as $url) {
        echo "<a href=\"{$url}\">{$url}</a><br/>";
    }

The functions below could be helpful; you will probably want to modify them to suit your needs:

    function isValidUrl($url) {
        // first do some quick sanity checks:
        if (!$url || !is_string($url)) {
            return false;
        }
        // quick check url is roughly a valid http request: ( http://blah/... )
        if (!preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url)) {
            return false;
        }
        // the next bit could be slow:
        if (getHttpResponseCode_using_curl($url) != 200) {
        // if (getHttpResponseCode_using_getheaders($url) != 200) { // use this one if you cant use curl
            return false;
        }
        // all good!
        return true;
    }

    function getHttpResponseCode_using_curl($url, $followredirects = true) {
        // returns int responsecode, or false (if url does not exist or connection timeout occurs)
        // NOTE: could potentially take up to 0-30 seconds, blocking further code execution
        // (more or less depending on connection, target site, and local timeout settings)
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST known httpcode (when redirected)
        if (!$url || !is_string($url)) {
            return false;
        }
        $ch = @curl_init($url);
        if ($ch === false) {
            return false;
        }
        @curl_setopt($ch, CURLOPT_HEADER, true);         // we want headers
        @curl_setopt($ch, CURLOPT_NOBODY, true);         // dont need body
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // catch output (do NOT print!)
        if ($followredirects) {
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            @curl_setopt($ch, CURLOPT_MAXREDIRS, 10); // fairly random number, but could prevent unwanted endless redirects with followlocation=true
        } else {
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
        }
        // @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // fairly random number (seconds)... but could prevent waiting forever to get a result
        // @curl_setopt($ch, CURLOPT_TIMEOUT, 6);        // fairly random number (seconds)... but could prevent waiting forever to get a result
        // @curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"); // pretend we're a regular browser
        @curl_exec($ch);
        if (@curl_errno($ch)) { // should be 0
            @curl_close($ch);
            return false;
        }
        $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
        @curl_close($ch);
        return $code;
    }

    function getHttpResponseCode_using_getheaders($url, $followredirects = true) {
        // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
        // NOTE: could potentially take up to 0-30 seconds, blocking further code execution
        // (more or less depending on connection, target site, and local timeout settings)
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST known httpcode (when redirected)
        if (!$url || !is_string($url)) {
            return false;
        }
        $headers = @get_headers($url);
        if ($headers && is_array($headers)) {
            if ($followredirects) {
                // we want the last code, reverse array so we start at the end:
                $headers = array_reverse($headers);
            }
            foreach ($headers as $hline) {
                // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 404 Not Found" , etc.
                // note that the exact syntax/version/output differs, so there is some string magic involved here
                if (preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches)) { // "HTTP/*** ### ***"
                    $code = $matches[1];
                    return $code;
                }
            }
            // no HTTP/xxx found in headers:
            return false;
        }
        // no headers:
        return false;
    }
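
The blocking loop shown earlier checks one URL after another, so the waits add up. One way to reduce that, sketched here under the assumption that the curl extension is available, is to run the header requests concurrently with curl_multi. The function name httpCodesInParallel() and the 5 second timeout are illustrative choices, not part of the answer above:

```php
<?php
/**
 * Check several URLs concurrently with curl_multi, so the total wait is
 * roughly that of the slowest URL rather than the sum of all of them.
 * Hypothetical helper; the default timeout is an arbitrary assumption.
 *
 * @param string[] $urls
 * @return array<string,int> map of URL => HTTP status code (0 if no response)
 */
function httpCodesInParallel(array $urls, int $timeoutSeconds = 5): array
{
    $urls = array_values(array_unique($urls));
    if ($urls === []) {
        return [];
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true, // headers only
            CURLOPT_RETURNTRANSFER => true, // never print the response
            CURLOPT_FOLLOWLOCATION => true, // report the code after redirects
            CURLOPT_MAXREDIRS      => 10,
            CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
            CURLOPT_TIMEOUT        => $timeoutSeconds,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity, at most 1 second
        }
    } while ($running > 0 && $status === CURLM_OK);

    $codes = [];
    foreach ($handles as $url => $ch) {
        $codes[$url] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $codes;
}
```

A call such as httpCodesInParallel(['https://www.example.com/', 'https://www.example.org/']) would return one status code per URL; URLs that time out or cannot be reached report 0.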

If you cannot use curl on some servers, you can use this code:

    <?php
    $url = 'http://www.example.com';
    $array = @get_headers($url); // false when the request fails
    $string = $array ? $array[0] : '';
    if (strpos($string, "200")) {
        echo 'url exists';
    } else {
        echo 'url does not exist';
    }
    ?>

    $url = 'http://google.com';
    $not_url = 'stp://google.com';

    if (@file_get_contents($url)):
        echo "Found '$url'!";
    else:
        echo "Can't find '$url'.";
    endif;
    if (@file_get_contents($not_url)):
        echo "Found '$not_url'!";
    else:
        echo "Can't find '$not_url'.";
    endif;

    // Found 'http://google.com'!Can't find 'stp://google.com'.

    function URLIsValid($URL) {
        $exists = true;
        $file_headers = @get_headers($URL);
        if (!$file_headers) {
            return false; // no response at all
        }
        $InvalidHeaders = array('404', '403', '500');
        foreach ($InvalidHeaders as $HeaderVal) {
            if (strstr($file_headers[0], $HeaderVal)) {
                $exists = false;
                break;
            }
        }
        return $exists;
    }
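
The get_headers() snippets above send a normal GET request and rely on the default timeout. A possible refinement, shown as a sketch only, is to pass a stream context that asks for a HEAD request with an explicit timeout. The helper name headStatusLine() and the 5 second default are assumptions; the context argument of get_headers() needs PHP 7.1 or later:

```php
<?php
/**
 * Fetch only the status line of $url using get_headers() with a HEAD
 * request and a timeout. Hypothetical helper, not from the answers above.
 *
 * @return string|null the status line, or null if the URL is invalid or unreachable
 */
function headStatusLine(string $url, float $timeoutSeconds = 5.0): ?string
{
    // Cheap local checks first, so obviously bad input never hits the network.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    $context = stream_context_create([
        'http' => [
            'method'          => 'HEAD',          // do not download the body
            'timeout'         => $timeoutSeconds, // read timeout in seconds
            'follow_location' => 0,               // report the first response as-is
        ],
    ]);

    $headers = @get_headers($url, false, $context);
    if ($headers === false || !isset($headers[0])) {
        return null;
    }
    return $headers[0];
}
```

Some servers answer HEAD differently from GET (for example with 405), so a null or non-200 result here may still deserve a second look with a regular request.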

I use this function:

    /**
     * @param $url
     * @param array $options
     * @return string
     * @throws Exception
     */
    function checkURL($url, array $options = array()) {
        if (empty($url)) {
            throw new Exception('URL is empty');
        }

        // list of HTTP status codes
        $httpStatusCodes = array(
            100 => 'Continue',
            101 => 'Switching Protocols',
            102 => 'Processing',
            200 => 'OK',
            201 => 'Created',
            202 => 'Accepted',
            203 => 'Non-Authoritative Information',
            204 => 'No Content',
            205 => 'Reset Content',
            206 => 'Partial Content',
            207 => 'Multi-Status',
            208 => 'Already Reported',
            226 => 'IM Used',
            300 => 'Multiple Choices',
            301 => 'Moved Permanently',
            302 => 'Found',
            303 => 'See Other',
            304 => 'Not Modified',
            305 => 'Use Proxy',
            306 => 'Switch Proxy',
            307 => 'Temporary Redirect',
            308 => 'Permanent Redirect',
            400 => 'Bad Request',
            401 => 'Unauthorized',
            402 => 'Payment Required',
            403 => 'Forbidden',
            404 => 'Not Found',
            405 => 'Method Not Allowed',
            406 => 'Not Acceptable',
            407 => 'Proxy Authentication Required',
            408 => 'Request Timeout',
            409 => 'Conflict',
            410 => 'Gone',
            411 => 'Length Required',
            412 => 'Precondition Failed',
            413 => 'Payload Too Large',
            414 => 'Request-URI Too Long',
            415 => 'Unsupported Media Type',
            416 => 'Requested Range Not Satisfiable',
            417 => 'Expectation Failed',
            418 => 'I\'m a teapot',
            422 => 'Unprocessable Entity',
            423 => 'Locked',
            424 => 'Failed Dependency',
            425 => 'Unordered Collection',
            426 => 'Upgrade Required',
            428 => 'Precondition Required',
            429 => 'Too Many Requests',
            431 => 'Request Header Fields Too Large',
            449 => 'Retry With',
            450 => 'Blocked by Windows Parental Controls',
            500 => 'Internal Server Error',
            501 => 'Not Implemented',
            502 => 'Bad Gateway',
            503 => 'Service Unavailable',
            504 => 'Gateway Timeout',
            505 => 'HTTP Version Not Supported',
            506 => 'Variant Also Negotiates',
            507 => 'Insufficient Storage',
            508 => 'Loop Detected',
            509 => 'Bandwidth Limit Exceeded',
            510 => 'Not Extended',
            511 => 'Network Authentication Required',
            599 => 'Network Connect Timeout Error'
        );

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        if (isset($options['timeout'])) {
            $timeout = (int) $options['timeout'];
            curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        }
        curl_exec($ch);
        $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
            return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
        } else {
            return "'{$url}' does not exist";
        }
    }

karim79's get_headers() solution did not work for me, as I got crazy results with Pinterest.

    get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
    Array ( [url] => https://www.pinterest.com/jonathan_parl/ [exists] => )
    get_headers(): Failed to enable crypto
    Array ( [url] => https://www.pinterest.com/jonathan_parl/ [exists] => )
    get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed
    Array ( [url] => https://www.pinterest.com/jonathan_parl/ [exists] => )

Anyway, this developer demonstrates that cURL is much faster than get_headers():

http://php.net/manual/fr/function.get-headers.php#104723
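
If you would rather measure the difference on your own server than rely on that comment, a small timing wrapper is enough. This is a sketch only: timeCall() is a hypothetical helper, and hrtime() requires PHP 7.3 or later:

```php
<?php
/**
 * Run $fn once and report how long it took. Hypothetical helper for
 * comparing a cURL-based check with a get_headers()-based one.
 *
 * @return array two elements: the result of $fn and the elapsed seconds (float)
 */
function timeCall(callable $fn): array
{
    $start = hrtime(true);                       // monotonic clock, nanoseconds
    $result = $fn();
    $seconds = (hrtime(true) - $start) / 1e9;    // convert to seconds
    return [$result, $seconds];
}

// Example use (needs network access, so it is left commented out):
// [$headers, $seconds] = timeCall(function () {
//     return @get_headers('https://www.example.com/');
// });
// printf("get_headers took %.3f s\n", $seconds);
```

A single run says little, since DNS caching and connection reuse skew the numbers; repeating each measurement several times against the same URL gives a fairer comparison.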

Since many people asked karim79 to fix his cURL solution, here is the solution I built today.

    /**
     * Send an HTTP request to the $url and check the header posted back.
     *
     * @param $url String url to which we must send the request.
     * @param $failCodeList Int array list of code for which the page is considered invalid.
     *
     * @return Boolean
     */
    public static function isUrlExists($url, array $failCodeList = array(404)) {
        $exists = false;

        if (!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")) {
            $url = "https://" . $url;
        }

        if (preg_match(RegularExpression::URL, $url)) {
            $handle = curl_init($url);

            curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
            curl_setopt($handle, CURLOPT_HEADER, true);
            curl_setopt($handle, CURLOPT_NOBODY, true);
            curl_setopt($handle, CURLOPT_USERAGENT, true);

            $headers = curl_exec($handle);
            curl_close($handle);

            if (empty($failCodeList) or !is_array($failCodeList)) {
                $failCodeList = array(404);
            }

            if (!empty($headers)) {
                $exists = true;
                $headers = explode(PHP_EOL, $headers);

                foreach ($failCodeList as $code) {
                    if (is_numeric($code) and strpos($headers[0], strval($code)) !== false) {
                        $exists = false;
                        break;
                    }
                }
            }
        }

        return $exists;
    }

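Let me explain the curl options:

CURLOPT_RETURNTRANSFER: return a string instead of displaying the called page on the screen.

CURLOPT_SSL_VERIFYPEER: when set to false, cURL will not check the certificate.

CURLOPT_HEADER: include the headers in the string.

CURLOPT_NOBODY: do not include the body in the string.

CURLOPT_USERAGENT: some sites need it to work properly (for example: https://plus.google.com).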


Additional note: in this function, I use Diego Perini's regular expression to validate the URL before sending the request:

    const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini

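Additional note 2: I explode the header string and use headers[0] to make sure only the return code and message are validated (for example: 200, 404, 405, etc.).

Additional note 3: sometimes validating only code 404 is not enough (see the unit test), so there is an optional $failCodeList parameter to supply the full list of codes to reject.

And, of course, here is the unit test (including all the popular social networks) to validate my code: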
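
As a stricter alternative to the strpos() comparison on headers[0], the status code can be extracted once and compared as an integer against the fail list. This is a sketch under the assumption that redirects are not followed, so the first line of the raw header text is the status line; isRejectedStatus() is a hypothetical name, not part of the class above:

```php
<?php
/**
 * Decide whether a raw header block should be rejected, comparing the
 * status code as an integer. Hypothetical helper.
 *
 * @param string $rawHeaders header text as returned by curl_exec() with CURLOPT_HEADER
 * @param int[]  $failCodes  status codes treated as "does not exist"
 */
function isRejectedStatus(string $rawHeaders, array $failCodes = [404]): bool
{
    // The status line is the first line, e.g. "HTTP/1.1 404 Not Found" or "HTTP/2 404".
    if (!preg_match('~^HTTP/\S+\s+(\d{3})\b~', $rawHeaders, $m)) {
        return true; // no recognisable status line: treat as a failure
    }
    return in_array((int) $m[1], array_map('intval', $failCodes), true);
}
```

Because only the three-digit code is compared, a value such as 404 appearing elsewhere in the header text cannot cause a false rejection.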

    public function testIsUrlExists() {
        //invalid
        $this->assertFalse(ToolManager::isUrlExists("woot"));
        $this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
        $this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
        $this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
        $this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
        $this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
        $this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
        $this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
        $this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));

        //valid
        $this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
        $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
        $this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
        $this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
        $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
        $this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
        $this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
        $this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
        $this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
        $this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
    }

Good luck to all,

Jonathan Parent-Lévesque from Montreal

Pretty fast:

    function http_response($url) {
        $resURL = curl_init();
        curl_setopt($resURL, CURLOPT_URL, $url);
        curl_setopt($resURL, CURLOPT_RETURNTRANSFER, 1); // do not print the response
        // The original set CURLOPT_HEADERFUNCTION to 'curlHeaderCallback',
        // which is not defined in this snippet, so it is left out here.
        curl_setopt($resURL, CURLOPT_FAILONERROR, 1);
        curl_exec($resURL);
        $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE);
        curl_close($resURL);
        if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) {
            return 0;
        }
        return 1;
    }

    echo 'google:';
    echo http_response('http://www.google.com');
    echo '/ ogogle:';
    echo http_response('http://www.ogogle.com');

    function urlIsOk($url) {
        $headers = @get_headers($url);
        if (!$headers) {
            return false; // no response at all
        }
        // assumes a status line of the form "HTTP/1.x NNN ..."
        $httpStatus = intval(substr($headers[0], 9, 3));
        if ($httpStatus < 400) {
            return true;
        }
        return false;
    }

All of the above solutions plus extra sugar. (Ultimate AIO solution)

    /**
     * Check that given URL is valid and exists.
     * @param string $url URL to check
     * @return bool TRUE when valid | FALSE otherwise
     */
    function urlExists($url) {
        // Remove all illegal characters from a url
        $url = filter_var($url, FILTER_SANITIZE_URL);

        // Validate URI
        if (filter_var($url, FILTER_VALIDATE_URL) === FALSE
            // check only for http/https schemes.
            || !in_array(strtolower(parse_url($url, PHP_URL_SCHEME)), ['http', 'https'], true)
        ) {
            return false;
        }

        // Check that URL exists
        $file_headers = @get_headers($url);
        return !(!$file_headers || $file_headers[0] === 'HTTP/1.1 404 Not Found');
    }

Example:

    var_dump(urlExists('http://stackoverflow.com/')); // Output: bool(true)

Check whether a URL is online or offline:

    function get_http_response_code($theURL) {
        $headers = @get_headers($theURL);
        if (!$headers) {
            return false; // offline or unreachable
        }
        return substr($headers[0], 9, 3);
    }

The simple way is curl (and it is faster too):

    <?php
    $mylinks = "http://site.com/page.html";
    $handlerr = curl_init($mylinks);
    curl_setopt($handlerr, CURLOPT_RETURNTRANSFER, TRUE);
    $resp = curl_exec($handlerr);
    $ht = curl_getinfo($handlerr, CURLINFO_HTTP_CODE);
    curl_close($handlerr);

    // A 404 means the page does not exist.
    if ($ht == '404') {
        echo 'NO';
    } else {
        echo 'OK';
    }
    ?>

Another way to check whether a URL is valid:

    <?php
    if (isValidURL("http://www.gimepix.com")) {
        echo "URL is valid...";
    } else {
        echo "URL is not valid...";
    }

    function isValidURL($url) {
        $file_headers = @get_headers($url);
        // get_headers() returns false when the request fails
        if ($file_headers && strpos($file_headers[0], "200 OK") > 0) {
            return true;
        } else {
            return false;
        }
    }
    ?>

Here is a solution that reads only the first byte of the source. It returns false if file_get_contents fails. This also works for remote files such as images.

    function urlExists($url) {
        // Compare against false explicitly: a first byte of "0" is falsy in PHP.
        if (@file_get_contents($url, false, NULL, 0, 1) !== false) {
            return true;
        }
        return false;
    }