Can a website detect when you are using Selenium with chromedriver?

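I've been testing Selenium with chromedriver, and I noticed that some pages can detect that you're using Selenium even when there is no automation at all. Even when I'm just browsing manually through Selenium and Xephyr, I often get a page saying that suspicious activity was detected. I've checked my user agent and my browser fingerprint, and they are exactly identical to those of the normal Chrome browser.

When I browse these sites in normal Chrome everything works fine, but the moment I use Selenium I'm detected.

In theory, chromedriver and Chrome should look exactly the same to any web server, but somehow they can detect it.

If you want some test code, try this: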

    from pyvirtualdisplay import Display
    from selenium import webdriver

    # Start a visible virtual display (Xephyr) for the browser to draw on.
    display = Display(visible=1, size=(1600, 902))
    display.start()

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--profile-directory=Default')
    chrome_options.add_argument("--incognito")
    chrome_options.add_argument("--disable-plugins-discovery")
    chrome_options.add_argument("--start-maximized")

    # Newer Selenium releases take `options=` instead of `chrome_options=`.
    driver = webdriver.Chrome(chrome_options=chrome_options)
    driver.delete_all_cookies()
    driver.set_window_size(800, 800)
    driver.set_window_position(0, 0)
    print('arguments done')
    driver.get('http://stubhub.com')

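If you browse around stubhub, you'll get redirected and "blocked" within one or two requests. I've been investigating this, and I can't figure out how they can tell that a user is using Selenium.

How do they do it?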

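EDIT UPDATE:

I installed the Selenium IDE plugin in Firefox, and I got banned when I went to stubhub.com in the normal Firefox browser with only the additional plugin.

EDIT:

When I use Fiddler to view the HTTP requests being sent back and forth, I notice that the responses to the "fake browser's" requests often have 'no-cache' in the response header.

EDIT:

Results like "Is there a way to detect that I'm in a Selenium Webdriver page from JavaScript" suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.

EDIT:

The site uploads a fingerprint to their servers, but I checked, and the fingerprint of Selenium is identical to the fingerprint when using Chrome.

EDIT:

This is one of the fingerprint payloads that they send to their servers: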

 {"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

It is identical in Selenium and in Chrome.
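For anyone who wants to repeat that comparison, here is a minimal sketch of how two captured payloads could be diffed key by key. The two payloads below are trimmed, hypothetical stand-ins, and `fingerprint_diff` is a helper name made up for this example; real payloads would be captured from each browser (for instance with Fiddler).

```python
import json


def fingerprint_diff(a, b, prefix=""):
    """Return the dotted paths at which two fingerprint payloads differ."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        path = prefix + key
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested objects such as "screen" or "plugins".
            diffs.extend(fingerprint_diff(left, right, path + "."))
        elif left != right:
            diffs.append(path)
    return diffs


# Hypothetical, trimmed payloads (assumption: captured separately per browser).
chrome_payload = json.loads(
    '{"appName": "Netscape", "screen": {"width": 1600, "height": 900}}'
)
selenium_payload = json.loads(
    '{"appName": "Netscape", "screen": {"width": 1600, "height": 900}}'
)

# An empty list means the two payloads are identical.
print(fingerprint_diff(chrome_payload, selenium_payload))
```

An empty result only shows that the uploaded fingerprint is not what gives Selenium away; it says nothing about other checks the page may run.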

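EDIT:

VPNs work for a single use, but they get detected after I load the first page. Clearly some JavaScript is being run to detect Selenium.

Basically, the way Selenium detection works is that they test for predefined JavaScript variables that appear when running under Selenium. Bot detection scripts usually look for anything containing the word "selenium" or "webdriver" in any of the variables on the window object, and also for document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are on. Different browsers expose different things.

For me, I used Chrome, so all I had to do was make sure that $cdc_ no longer existed as a document variable (download the chromedriver source code, modify it, and recompile it with $cdc_ under a different name).

This is the function I modified in chromedriver:

call_function.js: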

    function getPageCache(opt_doc) {
      var doc = opt_doc || document;
      //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
      var key = 'randomblabla_';
      if (!(key in doc))
        doc[key] = new Cache();
      return doc[key];
    }

(Note the comment: all I did was change '$cdc_' to 'randomblabla_'.)
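The rename itself is a plain text substitution in the JavaScript source. As a rough sketch (not part of the chromedriver build; `rename_cdc_key` and `random_key` are names invented here), the edit could be scripted before recompiling like this, assuming the key appears as a single-quoted '$cdc_..._' literal as in the function above:

```python
import re
import secrets
import string

# Matches a single-quoted '$cdc_..._' literal such as the one in getPageCache.
CDC_LITERAL = re.compile(r"'\$cdc_[A-Za-z0-9]+_'")


def random_key(length=22):
    """Build a random, letters-only replacement key ending in an underscore."""
    letters = "".join(secrets.choice(string.ascii_letters) for _ in range(length))
    return letters + "_"


def rename_cdc_key(js_source, new_key=None):
    """Return js_source with every '$cdc_..._' literal replaced by new_key."""
    new_key = new_key or random_key()
    # A function replacement avoids any backslash interpretation of new_key.
    return CDC_LITERAL.sub(lambda _match: "'" + new_key + "'", js_source)


original = "var key = '$cdc_asdjflasutopfhvcZLmcfl_';"
print(rename_cdc_key(original, "randomblabla_"))  # var key = 'randomblabla_';
```

Calling `rename_cdc_key(source)` without a second argument picks a fresh random key, which avoids reusing a name that a detector might already know.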

Here is some pseudocode that demonstrates a few of the techniques bot networks might use:

    runBotDetection = function () {
        // Property names that drivers are known to leave on `document`.
        var documentDetectionKeys = [
            "__webdriver_evaluate",
            "__selenium_evaluate",
            "__webdriver_script_function",
            "__webdriver_script_func",
            "__webdriver_script_fn",
            "__fxdriver_evaluate",
            "__driver_unwrapped",
            "__webdriver_unwrapped",
            "__driver_evaluate",
            "__selenium_unwrapped",
            "__fxdriver_unwrapped",
        ];

        // Property names that automation tools leave on `window`.
        var windowDetectionKeys = [
            "_phantom",
            "__nightmare",
            "_selenium",
            "callPhantom",
            "callSelenium",
            "_Selenium_IDE_Recorder",
        ];

        for (const windowDetectionKey in windowDetectionKeys) {
            const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
            if (window[windowDetectionKeyValue]) {
                return true;
            }
        };
        for (const documentDetectionKey in documentDetectionKeys) {
            const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
            if (window['document'][documentDetectionKeyValue]) {
                return true;
            }
        };

        // chromedriver's page cache lives under a key like $cdc_..._ on `document`.
        for (const documentKey in window['document']) {
            if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
                return true;
            }
        }

        if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

        if (window['document']['documentElement']['getAttribute']('selenium')) return true;
        if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
        if (window['document']['documentElement']['getAttribute']('driver')) return true;

        return false;
    };
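To see which names the `$[a-z]dc_` pattern from the pseudocode would catch, here is a small offline sketch. The sample key list is made up, and the sketch only tests the name; the pseudocode additionally requires a `cache_` property on the matching object.

```python
import re

# Same pattern as the pseudocode: "$", one lowercase letter, then "dc_".
DRIVER_KEY = re.compile(r"\$[a-z]dc_")


def suspicious_document_keys(keys):
    """Return the property names that the pattern would flag."""
    # search() mirrors the unanchored behaviour of JavaScript's String.match.
    return [key for key in keys if DRIVER_KEY.search(key)]


# Hypothetical property names, as they might appear on `document`.
sample = ["title", "$cdc_asdjflasutopfhvcZLmcfl_", "$wdc_", "randomblabla_", "cookie"]
print(suspicious_document_keys(sample))
```

The renamed key 'randomblabla_' is not flagged, which is why the rename in chromedriver gets past this particular check.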

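As we've already figured out in the question and the posted answers, there is an anti-web-scraping and bot detection service called "Distil Networks" in play here. And, according to an interview with the company's CEO:

"Even though they can create new bots, we figured out a way to identify Selenium, the tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious."

It will take time and additional challenges to understand exactly how they detect Selenium, but here is what we can say for sure at the moment: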

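- It is not related to the actions you take with Selenium: once you navigate to the site, you are immediately detected and banned. I tried adding artificial random delays between actions and pausing after the page loads; nothing helped.
- It is not about the browser fingerprint either: I tried it in multiple browsers, with clean profiles and without, and in incognito mode; nothing helped.
- Since, according to the hint in the interview, this was "reverse engineering", I suspect it is done by some JS code executed in the browser that reveals the browser is being automated via Selenium WebDriver.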

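I decided to post this as an answer, since clearly:

Can a website detect when you are using Selenium with chromedriver?

Yes.

Also, what I haven't experimented with is older Selenium and older browser versions. In theory, something could have been implemented or added to Selenium at a certain point that the Distil Networks bot detector currently relies on. If that is the case, we might detect (yes, let's detect the detector) at which point or version the relevant change was made, look into the changelog and changesets, and perhaps learn more about where to look and what they use to detect a webdriver-powered browser. It's just a theory that needs to be tested.

An example of how it is implemented on wellsfargo.com: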

    try {
        if (window.document.documentElement.getAttribute("webdriver")) return !+[]
    } catch (IDLMrxxel) {}
    try {
        if ("_Selenium_IDE_Recorder" in window) return !+""
    } catch (KknKsUayS) {}
    try {
        if ("__webdriver_script_fn" in document) return !+""

    partial interface Navigator {
        readonly attribute boolean webdriver;
    };

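"The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.

This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks."

Taken directly from the 2017 W3C Editor's Draft of WebDriver. This strongly implies that, at the very least, future iterations of Selenium's drivers will be identifiable in order to prevent misuse. Ultimately, it is hard to tell without the source code what exactly makes chromedriver in particular detectable.

It sounds like they are behind a web application firewall. Take a look at modsecurity and OWASP to see how those work. In reality, what you are asking is how to do bot detection evasion. That is not what Selenium WebDriver is for. It is for testing your own web application, not for hitting other web applications. It is possible, but basically you'd have to look at what a WAF looks for in its rule set and specifically avoid it with Selenium, if you can. Even then, it might still not work, because you don't know which WAF they are using. You took the right first step, which is faking the user agent. If that didn't work, though, then a WAF is in place and you probably need to get trickier.

Edit: Point taken from another answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server, or sniff the outgoing traffic.

Try using Selenium with a specific Chrome user profile. That way you can use it as a specific user and define anything you want. When you do so, it will run as a "real" user; look at the Chrome process with a process explorer and you'll see the difference in the switches.

For example: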

    import os

    from selenium import webdriver

    username = os.getenv("USERNAME")
    userProfile = "C:\\Users\\" + username + "\\AppData\\Local\\Google\\Chrome\\User Data\\Default"

    options = webdriver.ChromeOptions()
    options.add_argument("user-data-dir={}".format(userProfile))
    # add here any tag you want.
    options.add_experimental_option("excludeSwitches", [
        "ignore-certificate-errors",
        "safebrowsing-disable-download-protection",
        "safebrowsing-disable-auto-update",
        "disable-client-side-phishing-detection",
    ])

    # Raw string so the backslashes in the Windows path are taken literally.
    chromedriver = r"C:\Python27\chromedriver\chromedriver.exe"
    os.environ["webdriver.chrome.driver"] = chromedriver
    browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)

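A list of Chrome switches is here.

Even if you are sending all the right data (e.g. Selenium doesn't show up as an extension, you have a reasonable resolution and bit depth, etc.), there are a number of services and tools that profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site and then immediately performing some action by moving the mouse directly to the relevant button, in less than a second, is something no real user would do.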
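As a purely illustrative sketch of that kind of heuristic (the function name, the timestamps, and the one-second threshold are all assumptions; real services use far richer signals), a timing check might look like this:

```python
def looks_automated(page_load_ts, action_timestamps, min_delay=1.0):
    """Flag a visit whose first action comes less than min_delay seconds after load."""
    if not action_timestamps:
        # No actions recorded, so there is nothing to judge.
        return False
    return min(action_timestamps) - page_load_ts < min_delay


# Timestamps are in seconds; the values are hypothetical.
print(looks_automated(100.0, [100.2, 100.9]))  # first click 0.2 s after load -> True
print(looks_automated(100.0, [103.5, 107.0]))  # first click 3.5 s after load -> False
```

A check like this is why adding pauses can help against behavioural profiling, even though it does nothing against the JavaScript property checks.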

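It might also be useful, as a debugging tool, to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it will also help you verify whether there are any specific parameters that indicate you are running in Selenium.

It is said that Firefox sets window.navigator.webdriver === true when working with a webdriver. That was according to one of the older specs (e.g. archive.org), but I couldn't find it in the new one, except for some very vague wording in the appendices.

A test for it is in the Selenium code, in the file fingerprint_test.js, where the comment at the end says "Currently only implemented in firefox", but with some simple grepping I wasn't able to identify any code in that direction, neither in the current (41.0.2) Firefox release tree nor in the Chromium tree.

I also found a comment on an older commit regarding fingerprinting in the Firefox driver, b82512999938 from January 2015. That code is still in the Selenium Git master downloaded yesterday, at javascript/firefox-driver/extension/content/server.js, with a comment linking to the slightly differently worded appendix in the current W3C WebDriver spec.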

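You can work around the website's security check by adding a VPN extension to Chrome. I have tried ZenMate Security VPN, and I have seen that it works if you set the country to the United States. To activate the extension, add it to your Chrome browser and then find where it is installed; the path should look like /Users/mesutgunes/Library/Application Support/Google/Chrome/Default/Extensions/fdcgdnkidjaadafnichfpabhfomcebme/5.3.1_0 .

Set the options variable with the extension path and create the Chrome driver with that option, then navigate to the URL, activate the VPN, and navigate to stubhub again. I hope this helps.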

    >>> from selenium import webdriver
    >>> from selenium.webdriver.chrome.options import Options
    >>> opt = Options()
    >>> opt.add_argument("load-extension=/Users/mesutgunes/Library/Application Support/Google/Chrome/Default/Extensions/fdcgdnkidjaadafnichfpabhfomcebme/5.3.1_0")
    >>> dr = webdriver.Chrome(chrome_options=opt)
    >>> dr.get("http://www.stubhub.com/")
    >>> # activate the vpn with US then get the url again
    >>> dr.get("http://www.stubhub.com/")
    >>> dr.title
    u'Buy sports, concert and theater tickets on StubHub!'


Write an HTML page with the following code. You will see that, in the DOM, Selenium applies a webdriver attribute that shows up in the outerHTML.

    <html>
    <head>
      <script type="text/javascript">
      <!--
        function showWindow(){
          javascript:(alert(document.documentElement.outerHTML));
        }
      //-->
      </script>
    </head>
    <body>
      <form>
        <input type="button" value="Show outerHTML" onclick="showWindow()">
      </form>
    </body>
    </html>

Some sites are detecting this:

    function d() {
        try {
            if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
                return !0
        } catch (e) {}

        try {
            //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
            if (window.document.documentElement.getAttribute("webdriver"))
                return !0
        } catch (e) {}

        try {
            //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
            if ("_Selenium_IDE_Recorder" in window)
                return !0
        } catch (e) {}

        try {
            //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
            if ("__webdriver_script_fn" in document)
                return !0
        } catch (e) {}
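The attribute check can also be reproduced offline against a saved outerHTML dump. Below is a minimal sketch using only the Python standard library; the two HTML strings are hypothetical dumps, and `root_has_attribute` is a helper name invented for this example.

```python
from html.parser import HTMLParser


class RootAttributeParser(HTMLParser):
    """Collect the attribute names of the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and self.root_attrs is None:
            self.root_attrs = {name for name, _value in attrs}


def root_has_attribute(outer_html, attribute="webdriver"):
    """Return True if the <html> element in outer_html carries the attribute."""
    parser = RootAttributeParser()
    parser.feed(outer_html)
    return attribute in (parser.root_attrs or set())


# Hypothetical dumps of document.documentElement.outerHTML.
print(root_has_attribute('<html webdriver="true"><head></head><body></body></html>'))  # True
print(root_has_attribute('<html><head></head><body></body></html>'))  # False
```

Running it on dumps taken from a normal browser and from a driven one shows whether this particular signal is present in your setup.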
