So it commences ...
If infrastructure is made of fragile materials, then
infrastructure inevitably crumbles. Everything runs on infrastructure,
so everything inevitably crumbles.
It is claimed that 85%+ of internet sites use a CMS package
(Wordpress, Joomla, Drupal, etc.). Security holes are constantly being found
in the thousands of lines of code that make up these packages. I claim
[unscientifically, based on personal experience] that 93% of website
owners/managers are inattentive and incompetent. Yes, I include myself
in that category.
The cPanel programmers, and Hostmonster, Bluehost, and the other large
hosting companies that use cPanel for $4.95/mo websites, claim it is the
customer's obligation to keep their CMS installs "secure", and that if
insecurities and cPanel features lead to the hacking of your site, that is
your fault. And actually, not a bug but a feature of cPanel. Notably,
these same companies encourage and provide easy tools for creating, for
instance, quick Wordpress sites.
Since (I am guessing) somewhere around 2018, one of my
domains has apparently been used for "black hat SEO", as far as I can
make out. There have been [possibly legit] hits every second or less of
every day. You would have thought I would get some sort of warning
regarding excess bandwidth use, or *something*, from my hosting company
while this was going on. But no.
At some point, I don't know when, my site was also "anonymousfox" hacked.
Let me share what I know about these events:
AnonymousFox
A couple of weeks ago (as of 2021-06-30), I became aware that a mostly
idle site I had on Hostmonster.com had been hit with the anonymousfox
hack. It may have been long-standing, or it may have begun right around
the moment I became aware of it. My reasoning is that I still had admin
control of my site, so *possibly* the anonymousfox hack was still going
forward as I became aware of it.
I did not know anything about anonymousfox, other than that it
brands the hacked site by creating an "anonymousfox-[random characters]"
email and user, and an "smtpfox-[random characters]" email and user.
So, after changing the passwords for these accounts, I did a web search
regarding "anonymousfox". I learned:
- This has been an ongoing hack on Wordpress and other CMS websites for a number of years.
- There is a website that provides downloadable anonymousfox hacking software and tutorials.
- Based on my logs, it is possible to launch the attack
directly from the anonymousfox website, as I see hack attempts that
are not "direct", but are referred from the anonymousfox website.
- The hacker(s) can take complete control of the user's website via cPanel
- The hack embeds hidden php files in various directories that give the hacker(s) access after any effort to clean up the site
- The hosting company can, upon request, run a malware scan that finds some (or all?) of the php malware.
I always assumed that CMS installs such as Wordpress were
security risks. However, I assumed that, at worst, the hacker would
compromise the CMS website. I did not understand or anticipate that the
hacker could obtain console-level access and the ability to create and
hijack cPanel accounts, including the admin cPanel login.
How the hack went forward
- My guess is that the anonymousfox tool accomplishes
certain things in an automated fashion: scanning for domains with
vulnerable subdirectories, launching the hack, creating accounts and
deploying software.
- Anonymousfox changes the file .contactemail, which
contains the address to which the user's password reset link is sent.
In my case, this file contained only my admin email address. By changing
the file, anonymousfox redirected any password reset code to the new
anonymousfox email address it created, and no reset warning went to the
original contact. (Presumably, anonymousfox phones home with the login
information for that newly created email account so that the hacker can
retrieve the reset email.) This is possible because cPanel stores the
reset email in plain text. I consider this a serious cPanel flaw, but
Hostmonster does not. (A small watchdog sketch for this file follows
this list.)
- Anonymousfox is able to retrieve the admin user name
from cPanel. It's not clear to me how this is done or where it is
stored. Hostmonster tech support assured me that I must have shared my
admin user id with *someone on my team* -- but I have no team, nor have I
ever provided anyone with access to my admin account. Nor does anyone
have access to any of my computers (unless, of course, they are also
"hacked", which I suppose is a possibility). But, I think anonymousfox
simply extracted it from the cPanel or directory structure. As far as I
know, the anonymousfox hacker never logged into my cPanel, nor did it
appear from the logs shared by Hostmonster that anyone other than myself
(or Hostmonster tech) ever logged into my cPanel. I think the accessibility of the admin user name is a cPanel flaw.
- Apparently, anonymousfox can trigger the sending of the reset email
via a console command. So, the hacker's IP address does not
show up in website logs, AND the reset email with the reset code states
that the URL from which the reset code was requested is 127.0.0.1
(localhost). I know this because I intercepted the reset email before anonymousfox retrieved it. I consider this a serious cPanel flaw, but Hostmonster does not.
- Anonymousfox created for itself several user accounts, including email, webdisk and root level ftp access.
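Given that the reset address lives in a plain text file, one cheap defense is a watchdog that notices when the file changes. A minimal sketch, suitable for a cron job -- assuming the usual cPanel home-directory layout, a known-good copy you saved yourself, and a working "mail" command (all assumptions to verify on your own host):

#!/bin/bash
# Compare the live cPanel contact file against a known-good copy.
# ~/.contactemail is where cPanel keeps the reset address in plain text;
# the .known-good path and the alert address are placeholders.
if ! cmp -s "$HOME/.contactemail" "$HOME/.contactemail.known-good"; then
    mail -s "ALERT: .contactemail changed" you@example.com < "$HOME/.contactemail"
fi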
The "Black SEO Hack"
I don't know whether this was an
anonymousfox-related hack or not. Around 2014, I changed a main domain
entry page from a Joomla index.php to a static index.html page,
intending my site to be "sleeping". Based on file dates, around
September of 2018, my static page was copied into an index.php file that
contained encrypted php code. The result was that anyone specifically
looking at my website (e.g., ME) would see what was expected -- a note
regarding a sleeping website. On the other hand, a request for any
subpage would result in -- I assume -- execution of the php code. Here
is a typical log entry; the page following the "GET" represents
what the bot was looking for:
"114.119.138.177 - - [28/Jun/2021:05:07:07 -0700] "GET
/0ft4452jq1052Gf16 HTTP/1.1" 301 248 "-" "Mozilla/5.0 (Linux; Android
7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36
(compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
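If you suspect a similar swap on a "sleeping" site of your own, a crude scan for common PHP obfuscation markers can help confirm it. A sketch, assuming a cPanel-style public_html layout (the markers shown are typical, not exhaustive):

#!/bin/bash
# List PHP files containing common obfuscation markers.
grep -rlE --include='*.php' 'eval\(base64_decode|gzinflate\(|str_rot13\(' ~/public_html
# Also flag any index.php newer than the static index.html it may have displaced.
find ~/public_html -maxdepth 2 -name 'index.php' -newer ~/public_html/index.html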
It's unclear to me whether the "PetalBot" in the log entry above is a
legitimate search engine bot or part of the hacking system. I was
receiving inquiries every second or less, 24/7, and to some extent
continue to do so. These are consistently gibberish page requests. It's
unclear to me whether these are encrypted requests designed to trigger
some response by the PHP, or some sort of page request. (Response code
301 tells the bot the page has moved.) PetalBot is allegedly a new
search system created by Huawei. There were/are numerous similar hits
from URLs that purport to be "bingbot" and in fact trace back to
Microsoft. AbuseIPdb reports that the bingbot IP addresses are legit,
but there are many reports of abuse as well. It may be that the
webmasters reporting the abuse are misunderstanding their logs (and in
fact are unaware they are hosting hacked CMS systems); or, alternatively,
the bing bots are themselves compromised. Possibly both, because if
bing bots are being tricked into such high-intensity scanning of
pointless websites, then Microsoft engineers are not paying attention
to what their bots are up to. This is a huge waste of internet resources.
I also pointed out to Hostmonster that it was a huge waste of
their resources, and asked why they were not monitoring and automatically
blocking this kind of traffic. They seemed uninterested in the
problem, thereby becoming part of the problem. So, we can see that
professional internet software engineers at hosting companies and
professional internet engineers at search engine companies are so
uninterested in monitoring obvious internet misuse that it goes on
unimpeded.
After a day of 403s (permission denied), the petalbot seems
to have got the idea and gone away. The bing bot(s) have not got the
idea and continue to knock on the door (in fact, switching to IPs I have
not blocked yet). And now, duckduckgo has joined the party:
"duckduckbot.duckduckgo.com - - [03/Jul/2021:09:09:23 -0700]
"GET /ajZibTEwMDFENmJ2YXIxNzE4NUdidmFs HTTP/1.1" 301 263 "-"
"'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1;
https://duckduckgo.com/duckduckbot)'"
In addition, for some reason my hosting logs have changed
from reporting IPs to reporting domain names, such as the
duckduckbot.duckduckgo.com noted in the above log entry. My hosting
company says this occurs when I include a "host block" in my .htaccess
file, as explained
here. (I have pointed out to them that "Allow/Deny" is deprecated in Apache 2.4:
"The
Allow, Deny, and Order directives, provided by mod_access_compat, are
deprecated and will go away in a future version. You should avoid using
them, and avoid outdated tutorials recommending their use.")
However, I contend this started happening *before* I added a host block,
which I added because I was having trouble obtaining the IP to block.
But, then I discovered I had typed an IP as "xxx-xxx-xxx-xxx" instead of
"xxx.xxx.xxx.xxx", so I thought this was the culprit. I ran a test
for a while and *thought* I was getting IPs only, but that turned out not
to be true, and as of today I have given up on trying to revert my logs
to IP-only. I tracked down the apache server manual; this is
configured in the apache logging setup (but, per my hosting company, it
can be affected by my .htaccess file, as stated above -- host-based
access control forces apache to perform hostname lookups, which can then
show up in the logs). Something I read somewhere suggests that certain
internet users (?) are able to specify the alias that is shown ... you
can see how this can go off the tracks if the hacker can specify a
randomly generated alias that cannot be backtracked via DNS (which is
what I am now getting in certain instances). There was a clear "change"
on my hosting service somewhere around 2021-07-01 (which, coincidentally,
was shortly after I instituted the aggressive blocking effort described
on this page). I have been doing a lot of research on this -- among
other things, I learned (I think) that the "Allow,Deny" system for
.htaccess has been replaced with "Require/Require all". See
this.
But, I am still studying the document and testing the Require/Require
all statements. Meantime, I have not figured out how to block computers
that log only as gibberish hostnames that cannot be identified by DNS
search.
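For reference, here is what a minimal block looks like in the Apache 2.4 syntax (the IP and hostname are placeholders; my actual working block list appears further down this page):

<RequireAll>
Require all granted
Require not ip 192.0.2.10
Require not host example-badbot.com
</RequireAll>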
With respect to the constant barrage of gibberish page
requests, I tried searching on a gibberish phrase in the search engine
of the bot requesting it, since (if we assume all these search systems
are legit bots) the gibberish page should be indexed. I did not get a
result if I limited the search to the precise string, e.g.
+"ajZibTEwMDFENmJ2YXIxNzE4NUdidmFs". However, if I did not limit it and
just searched on the string, on duckduckgo the search brought up links
to pornography, such as hidden cam forums. On bing, I got a link to
images, which brought up a lot of product photos. I wonder if this is an
SEO system that embeds gibberish strings in websites and then generates
hits to those websites by seeding the gibberish phrases into the search
engines?
However, I did not consistently get search results for
various strings, so the results I did get may have been entirely random.
2021-07-04 update: Now I am getting attacked from the
googlebot, which is probing my blog for links such as "lost password".
Did I mention the internet is going to hell? This is not the "GET
/[gibberish]" attack referred to above.
2021-07-04 later update: Now, oddly enough, I am getting the
gibberish search from Yandex. So, to recap: initially, gibberish
searches from petalbot and bingbot; seemingly after I blocked these,
they started from duckduckbot; and when I blocked that, from yandex.
Unfortunately, this means I have blocked many of the major search engines and my site will become invisible.
Why I think this is Black Hat SEO
I have been unable to figure out the correct search terms to
find much discussion of this particular issue, but what I did come up
with took me to a single site that described the hijacking of Joomla
websites for the purpose of enhancing SEO for ... whomever.
Interestingly, I did not find any unexpected data in the
mysql databases or in other locations on my website. So, my guess is
that the hacker's php code, itself, generates results that appeal to
search engines. Or -- I am wrong. But, alternatively, it's unclear to
me what value these constant gibberish inquiries would have to whomever
created the hack and is generating the inquiries. Perhaps that is just
lack of imagination on my part.
This all does help explain why searches are now often not
worth doing. Looking for a restaurant's website? Good luck! You will
get Yelp, yellow pages, white pages, white pages, yellow pages, yelp,
restaurants of the world, collected menus, etc., etc. What you won't
get is the restaurant's website. Whereas, if you enter the name of a
relatively unique restaurant and its city location, any reasonable
algorithm would display that company's website as item #1. Or #3, if
you concede it is reasonable for the search company to insert a couple
of ads in there to pay for its operation. Which I do. But, that would
not create insta-billionaires, it would just provide a reasonable income
in return for a useful service.
Further point, FYI: 99.999% of the traffic on my website
consists of hackers or useless bots whose customers have no real
interest in my tiny web page(s). I can pretty much take my raw access
logs every day and block all the IPs (a one-line sketch follows below).
You would think hosting companies might want to reduce their traffic and
block it across the board. Their reason for not doing so (as told to me)?
"We might accidentally block something legitimate and have unhappy
customers." Oh, sure. On the off chance this happens, customer service
could *unblock* that mistake on request. My hosting company's automation
has in fact blocked *my* IP on occasion, so that I had to request it be
unblocked. There are certain indicators of malevolence that are totally
obvious, e.g. repeated direct requests for known wordpress directories
and php files. The solution, of course, is to sell you more services,
such as firewalls and malware scanners. I guess hackers are not a "bug",
they're an internet "feature."
Um ... if this is your approach to infrastructure, yeah,
it's all going to go to hell. The parasites will consume the host and
nothing will work at all.
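For what it's worth, mining a day's block candidates out of the raw log is nearly a one-liner. A sketch, assuming a combined-format access log in the current directory (the probe patterns are examples, not a complete list):

#!/bin/bash
# List the unique IPs that probed for known CMS paths,
# ready to paste into a block list.
grep -hE 'wp-login|wp-admin|xmlrpc\.php|administrator' access_log* \
  | awk '{print $1}' | sort -u > suspect_ips.txt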
P.S. - from time to time, I work on my own node.js based
search system that is designed to reject all SEO and return only
information that is actually useful. But, it's a lot of work. OTOH,
even in its total "alpha" state, it's kinda useful. Unless you are
looking for social fluff, which I'm not.
Foolishness of placing code in web root
For the convenience of low-priced web
providers, the major CMS and other php-based code is placed within the
web root. This is a terrible coding practice. Historically, even the
cgi-bin has been placed within the web root (and I continue to see
attacks that are searching for cgi code). This is completely
unnecessary, since the server can access and utilize code that is not
directly accessible from the web root.
Moving the code out of the web root will not solve the
problem of insecure code. It will, however, make the code much more
difficult to access. As things stand, the cPanel file structure and the
file structure of all major CMS software are readily known to hackers,
so finding and accessing the exploitable code is trivial. So, for
instance, a typical attack will consist of a search for a series of
known files and directories. E.g.,
- "wp-login"
- "administrator"
- "wp/wp-login"
- "blog/wp-login"
- "cms/wp-assets"
- "oldsite/wp-assets"
and so forth.
There should only be a single file exposed to the internet,
called "index.php". This file is not "readable" by the browser; it only
delivers what appears to the browser to be a static page. Granted, in
widely available open-source software, the files actually linked from
the index.php will be "known". But, even here, good coders know how to
reduce the security risk of a known structure. For instance, the major
browsers and email programs place their code and data in folders that
have randomly generated gibberish names. The same could and should be
done with CMS/php code. Why put the code in a known folder, say
"/php-includes", when during the install process it could be named
"xy$wRfaa##" or whatever, with the locations then built into the
database or re-written into the code during install (a sketch of the
idea follows below)? So, then we have two layers of security from
hackers.r.us.co. First, the folder names are not known to the hacker;
and second, the folders are not themselves subdirectories of the domain
entry page, nor even accessible from the web.
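A sketch of what an installer could do along these lines (the directory and file names here are hypothetical, not taken from any actual CMS installer):

#!/bin/bash
# Give the include directory a random gibberish name at install time,
# then record the name where the front controller can read it.
RANDNAME=$(tr -dc 'a-z0-9' < /dev/urandom | head -c 12)
mv php-includes "$RANDNAME"
printf '<?php $include_dir = "%s";\n' "$RANDNAME" > include_path.php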
What this means is that the hacker would have to actually
follow the web pages, as fed by the CMS to the browser, in order to
arrive at whatever exploit they have identified. This may also be a
relatively trivial task, but it does two things: the hacker cannot access
a piece of code "out of context" of the CMS and inject hacking code
directly into it; and it gives the CMS programmers more opportunity to
have routines that identify and block probable hacking attempts.
A slightly less secure approach is to make all the
subdirectories non-accessible, as though they were not in the root in
the first place. This is accomplished with an .htaccess "deny all"
statement in each such folder, as is done in the b2 Evolution CMS I was
formerly using. (But even with respect to b2 Evolution, I noticed that
the standard install provides a "sample.htaccess" and not necessarily
the ".htaccess" locking the subdirectories.) Again, this prevents -- or
maybe I should say impedes -- the hacker from accessing
/subdirectory/insecure_code.php directly.
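The per-folder lockdown is easy to script. A sketch using "Require all denied" (the Apache 2.4 replacement for the old "deny from all"); the directory names are placeholders:

#!/bin/bash
# Drop a deny-all .htaccess into each code subdirectory.
for d in ~/public_html/inc ~/public_html/plugins; do
    printf 'Require all denied\n' > "$d/.htaccess"
done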
Given that these concepts are pretty straightforward, it's
difficult to decide whether the failure to take this minimal security
action is a "bug" or a "feature" in the eyes of the people maintaining
the majority of internet sites. It's been interesting to me that, in
asking two separate large hosting companies about these issues, the
response from tech support has been, "oh, we subcontract all that
technical stuff out to 3rd party providers."
I was a bit shocked by this response. I
would have thought that if you are hosting 1 million+ websites, you
might want to have your own engineers maintain the basic structure and
security of your servers.
In addition, I can't comprehend why the default .htaccess
does not have code to block access that is obviously an exploit attempt.
It's trivial to redirect the typical wordpress or anonymousfox scan,
because the scans are automated and are looking for a set of known
directories and files. So, you can redirect to something like
this
[not the actual page], which has the benefit of accumulating malevolent
hackers in one convenient-to-parse log, separate from hopefully
"legitimate" traffic. [I explain how to do this in the "Block and create
a log" section below.]
When you have banned them, you might let them know why, like
this. Since,
if you actually WANT to get hacked, it's trivial to REMOVE the blocking,
wouldn't it be a better practice to auto-block in the first place?
This would be the equivalent of email hosts providing decent spam
filtering for known spam. Moreover, if you are running a server farm
and a bot proceeds to knock on 10k websites with the same GET request,
it's pretty obvious this is not a friendly IP. So, we have a problem on
the scale of phone service providers refusing to provide accurate
caller IDs and refusing to block obvious robo-callers. Again,
apparently to the giant corporations that run the internet and the phone
service, these are not bugs, they are features.
And, even if the web hosting company does not wish to spend
its time securing your site via .htaccess, why doesn't the wordpress or
other CMS team do it in a default .htaccess file that installs (or that
you are given the option to install, with an explanation of why)
during the installation process? Similar to spam filtering, it should be
regularly updated as new exploit scans are identified.
Failure to do this is not trivial. The internet is either run by stupid people or malevolent ones.
Bots that have no reason to be on my website. Do they have a reason to be on yours?
So, some of what I have put in
this list are "legitimate" bots of one sort or another, but they have no
reason to be on my website and are of no benefit to me in any way, so
they should go away. Also, I have lumped into this category cloud services
that make it easy for their customers to run malevolent bots. This seems
to include VPN providers ... and there seem to be a lot of them. As I
block groups of them, the attackers just return from more locations,
which I assume are additional VPN servers and networks.
- Anything from an amazonaws.com address. The
hostname usually includes the IP address. These tend to be virtual
machines, which make it extremely convenient for hackers and pointless
bots to set up multiple "computers" wasting internet bandwidth for
useless or malevolent purposes.
- Anything from DigitalOcean. They apparently have hosts all over the world.
- Expanse. [The real] Expanse was purchased by
paloaltonetworks.com, which presents itself as an internet security
company working on protection/recovery from things like the Solarwinds
hack. It has no legitimate reason to be scanning my website. The expanse
that hits my website originates from googleusercontent.com in China (per
reverse IP), and solicits submission of your website domain name and IP
if you wish to be excluded. I have no idea whether this is "really"
Expanse or a bot that masquerades as Expanse. It somehow evades some of
my IP blocking and does not present a blockable user-agent.
- Anything from Google cloud, which tends to log as googleusercontent.com. Typically, the hostname provides the ip address, but it is reversed.
- Anything from Hetzner Germany
- hrankbot. Who cares?
- Microsoft cloud, including Azure. Generally,
these do not provide a hostname and log the IP address. Complaints to
cert.microsoft.com seem to work; I have submitted a number of logs that
show repeated wordpress attempts, and the abuse emanating from the
microsoft cloud has been reduced considerably. They will claim that the
responsible party is their Azure customer, but will not disclose any
contact information for the customer. In fairness, though, no other
cloud service has been responsive to my complaints.
- Opera VPN. Same comments as for TOR, below.
- Anything from OVH France, England, Canada or wherever.
- Petalbot. Huawei's new search engine. How
many computers do these people have? I'm not sure I want to be excluded,
but for now, they are massively seeking gibberish websites along with
bingbot, so they remain banned. At least they aren't trying to evade my
IP block, as others on this list are doing.
- [Global Internet Observatory]. From a
university in Germany; abuseipdb claims it is "white listed" (but with
lots of hacking attempt reports). I also get hackers whose hostnames
resolve to IPs on the same server and in the same block.
- NetSystemsResearch "NetSystemsResearch studies the availability of various services across the internet."
- Sitelock. Sitelock also claims to be a
website security company, and once again has no business scanning my
website. It presents itself as "placeholder.sitelock.com" to avoid
providing its IP, and has successfully evaded some of my blocking
attempts, such as blocks on "sitelock.com" and on sitelock's public IP
address. A 403 won't dissuade it, because it then commences trolling my
403.html file.
- Sogou. Sogou is a highly persistent search
bot for a Chinese search company. Sogou search, being in Chinese, is of
no value to me. I'm not sure that I care whether it indexes my site, but
it is excessively aggressive, more so than googlebot or bingbot, where I
would like to have some presence.
- Seznam. Seznam is a highly persistent bot
that purports to be from a Czech Republic search engine. I don't care
one way or the other except like Sogou, it's highly persistent. If
these bots are wasting time on *my* site, how much computer power do
they have, anyway?
- TOR. Naturally if your mission is to make your users anonymous, what do you suppose they will do with your helpful cloaking?
- Twingly. Refers to itself as "Internet
Recon". As far as I can tell, it is simply a hacking bot, as it has
looked for hackable subdirectories I don't have.
- There are a bunch of SEO bots which, as far as I can tell,
are designed to disappear my website into a morass of spammy search
results on all the major search engines; so, they are definitely not my
friends and have no legitimate basis for trolling my site.
Just think, if I could get all these bots and hackers to go
away, I'd have two visitors a day or something -- but at least I would
know it was two visitors interested in something on my website. My
blocking attempts have reduced some of the vast pointless traffic.
Better for me, better for the internet. But, it's amazing how long a
bot will keep testing and getting repeated 403 responses before the bot
owner will give up and remove my site(s) from its harassment database.
And, some hackers double down, maliciously hitting my 403.html or other
page seconds apart. Sort of like "road rage": they are apparently angry
I blocked their hacking probe. Which is childish, as I've done them a
favor -- there is no CMS code on my website, so endlessly running
wordpress probes on my site isn't going to produce results.
Note:
blocking entire blocks of IP addresses from major webhosting sites is
not going to block much legitimate viewership, because these typically
are not also ISPs providing user internet service. Good bots put a
little information about the bot into the log, so we can always open up
their IP if we want to "join into" their searching. Until then, the only
traffic from these sites is either hacking or pointless bot activity of
no benefit to the website owner. I occasionally submit complaints to
the hosting company, but since a possibly significant part of their
cloud income stream seems to come from users running malevolent bots,
they don't do much to shut it down. Just my opinion. Amazon purported to
put a stop to one (1) such wordpress scanner -- though I get at least
two a day from amazonaws. Microsoft "abuse" does not respond to emails.
Other host companies have sent me an AI response that they are "looking
into" my complaint -- but no resolution or follow-up. So, it becomes
"they don't need me and I don't need them": block them all.
Block and create a log of "pattern" based automated hacking
We know that someone who is
trying to access certain CMS (or other) subdirectories directly is a
hacker. So, let's create a log of just these domains and/or IP
addresses. To do this on cPanel, we will want to make a subdomain, like
this: goaway.[yourdomain].com. We don't want to link to this or put it in
search engines. It may end up there anyway, but only if a hacker links
to the page somewhere that will get crawled by a search bot. So, we
want a simple index.html in this subdirectory, which you might actually
set up as [yourdomain].com/goaway. The index.html could be empty, or
whatever message you would like the hacker to see. Now, let's send them
there.
In our main .htaccess file, we need some redirect statements.
WARNING!!! If you start messing with your .htaccess file, a simple typo can lock *YOU* (and everyone else) out of your website. (I think you will still have cPanel access and can revert to a prior .htaccess file *if you made a copy before editing it*. Otherwise, you will have the unhappy experience of asking tech support to revert it for you.)
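So, before editing, make a timestamped backup copy of the working file (a one-line habit that makes the revert trivial):

cp .htaccess .htaccess.bak.$(date +%Y%m%d-%H%M%S)

With that insurance in place, here are the redirect statements: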
Redirect 301 /2020 https://goaway.[yourdomain].com
Redirect 301 /2019 https://goaway.[yourdomain].com
Redirect 301 /autodiscover https://goaway.[yourdomain].com
Redirect 301 /admin https://goaway.[yourdomain].com
Redirect 301 /administrator https://goaway.[yourdomain].com
Redirect 301 /backup https://goaway.[yourdomain].com
Redirect 301 /blog https://goaway.[yourdomain].com
Redirect 301 /b2evo1/htsrv https://goaway.[yourdomain].com
Redirect 301 /cms https://goaway.[yourdomain].com
Redirect 301 /.env https://goaway.[yourdomain].com
Redirect 301 /file https://goaway.[yourdomain].com
Redirect 301 /forum https://goaway.[yourdomain].com
Redirect 301 /.git https://goaway.[yourdomain].com
Redirect 301 /new https://goaway.[yourdomain].com
Redirect 301 /news https://goaway.[yourdomain].com
Redirect 301 /old https://goaway.[yourdomain].com
Redirect 301 /oldsite https://goaway.[yourdomain].com
Redirect 301 /shop https://goaway.[yourdomain].com
Redirect 301 /site https://goaway.[yourdomain].com
Redirect 301 /sito https://goaway.[yourdomain].com
Redirect 301 /temp https://goaway.[yourdomain].com
Redirect 301 /test https://goaway.[yourdomain].com
Redirect 301 /web https://goaway.[yourdomain].com
Redirect 301 /website https://goaway.[yourdomain].com
Redirect 301 /wordpress https://goaway.[yourdomain].com
Redirect 301 /wp-admin https://goaway.[yourdomain].com
Redirect 301 /wp https://goaway.[yourdomain].com
Redirect 301 /wp1 https://goaway.[yourdomain].com
Redirect 301 /wp2 https://goaway.[yourdomain].com
Redirect 301 /wp-content https://goaway.[yourdomain].com
Redirect 301 /wp-content.php https://goaway.[yourdomain].com
Redirect 301 /wp-login https://goaway.[yourdomain].com
Redirect 301 /wp-login.php https://goaway.[yourdomain].com
Redirect 301 /wp-includes https://goaway.[yourdomain].com
Redirect 301 /xmlrpc.php https://goaway.[yourdomain].com
If we have set this up correctly, hackers asking for these
typical CMS/git/forum/files subdirectories will be redirected to
goaway.[yourdomain].com; and if you have set up the subdomain correctly
in cPanel, you will get a separate raw access log just for "goaway". Any
IP or host in the new raw access log either tried to hack you, or
tried to access your "goaway" page, where they have no business being.
Warning:
if you look at this page for testing purposes, your own IP or hostname
will be in the raw access log, so if you cut and paste the whole list
into your .htaccess block list, you will find that you have locked
yourself out of your own website.
Been there, done that! But, once you know they are hackers, it would be a good idea to block them when they come back, like this:
<RequireAll>
Require all granted
Require not ip 5.189.239.157
#comment: will you annoying bots, please just go away?
Require not host sitelock.com mail.semanticsystems.com seostar.com netcraft.com digitalocean fbsv.net
Require not host semrush okitup.net aachen twingly boardwalkhat ohris network-crm
</RequireAll>
Note: the RequireAll and /RequireAll lines must be bracketed by < and >,
exactly as shown above. I put
comments in my .htaccess file so I can remember why I blocked something,
in case I change my mind. The comments don't display anywhere or do
anything. You could also use the language "Deny from [IP]", which you
will see in my sample block list. However, as of apache 2.4 that
language is deprecated and slated to be removed (so I am trying to
update my approach). See the apache access control documentation
here.
Further Note:
in my opinion/experience, not all apache directives actually work.
In theory, it should be possible to "Require not host
colocrossing" or "Require not host M247", but somehow these haven't
worked well. I did read that, for hostname blocks to work, apache
performs a double reverse DNS lookup -- a reverse lookup on the client
IP, then a forward lookup on the resulting hostname -- and the block
won't work if the cross-check does not match up the hostname and the IP.
So, maybe these are masquerading hostnames when the block doesn't work.
I've tried multiple different block techniques on the worst offenders,
but have not been 100% successful.
It does appear that when I successfully block a specific
host/route/user-agent/etc., hackers regroup and attack in a different
manner. There are certain "tells" in the logs that suggest it's the
same hacker(s) trying again. It's not uncommon for a hacker that has
been redirected to one of my honeypot pages to come back from a
different IP and make a direct request for the honeypot page, to look
at it again. One hacker displayed a user agent of "virustotal", which I
blocked; so it came back from the same IP with user agent "VT".
Apparently, vanity requires hackers to sign their work.
The pointless robots.txt file
I have made two observations: (a) Virtually no bot honors the
statements in robots.txt. It's sort of like the original protocol
requiring only plain text in email to keep bandwidth down. Hahahaha.
Who honors that? Most email users probably do not even know what a
plain text email looks like. (b) Apache apparently does not bother to
block requests for robots.txt. I repeatedly observed in my logs
"blocked" IP addresses returning a "200" code when looking for
robots.txt. So, robots.txt has effectively become a bot verification
tool to determine whether a subdirectory exists or not. At first I did
not know what to do about this; then I removed my robots.txt file. I
have not observed any particular downside to this, and a considerable
upside.
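Having removed the file, one could go a step further, in the spirit of the "goaway" trick above, and treat any request for robots.txt as a probe. A sketch for the main .htaccess (assuming the goaway subdomain described earlier):

Redirect 301 /robots.txt https://goaway.[yourdomain].com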
Another puzzle that I can't solve is that some "blocked" IPs
seem to receive a 200 response, even around a password-protected
directory that gives everyone else a 401. But, when I look at the file
size, it does not match up with the file(s) that would be delivered to a
non-blocked IP. My best, totally uninformed, guess is that there is
some sort of query that receives a response from the server root and not
my personal web directory structure. I don't know if this is true,
but the file sizes fed to the blocked IPs that receive a 200 response
are consistently the same, irrespective of which domain or subdirectory
they have queried.
Some blocks send a 500 response, and I did notice a log entry one
time that seemed to be a bingbot query seeking to know whether the domain
had been permanently removed. If only the answer had been "yes"!
I tried to use the bing webmaster tools to remove my website
from bingbot search, and received an automated response that my domain is
"important" to the internet and so would not be removed. Pretty funny,
considering there is absolutely nothing on the domain at all except for a
honeypot trap. (But it *is* the place bingbot is constantly looking for
fake pages with long gibberish names.) I also filed a cert.microsoft.com
complaint about this, but they did not respond.
Let's Look at a Distributed Attack
So, I think what we see here is a
distributed attack. These log entries are clustered together in time,
and together they go through a standard set of CMS hacking inquiries,
but each query comes from a different IP in a different geolocation.
In some instances, the hacker is possibly using multiple public VPN
services. As I add more IP ranges to my block list, it seems apparent
that a number of the web providers with a global presence are running
subscription VPN services. Attacks also come from the TOR network. On
some level, I support the concept of anonymity to get around censorship.
But, it's pretty difficult to support the concept of VPNs when,
apparently, the primary use is to attack/hack people's websites. That
would sort of be the opposite of an "open exchange of ideas". Conundrum.
In any event, in what I call "distributed attacks", each IP
reports the same iPhone user agent. What would be the odds of these
disparate IPs hitting my site at approximately the same time,
sequentially querying CMS files/subdirectories, and all reporting
precisely the same user agent? The hackers are messing with us, because
two are located in Virginia and one, 168.151.192.220, resolves as
"intelligence network inc." in Ashburn, VA. But, on the other coast,
185.246.173.104 resolves as "illuminati network" in San Francisco. Fun
joke, but someone had to spend money setting up hosting machines with
cute names for the paranoid.
It's reasonably likely that some or all
of the sites used in the coordinated attack are compromised sites whose
owners are unaware of the activity. (See "AnonymousFox" and "The Black
SEO Hack" above.) So, we really don't know "who to blame" in these attacks.
168.151.192.220 - - [04/Jul/2021:17:44:40 -0700] "GET
/login.php HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS
13_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko)
Version/13.1.2 Mobile/15E148 Safari/604.1"
102-129-128-97.quadranthosting.com - -
[04/Jul/2021:17:44:40 -0700] "GET / HTTP/2.0" 200 234 "-" "Mozilla/5.0
(iPhone; CPU iPhone OS 13_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML,
like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1"
181.214.179.158 - - [04/Jul/2021:17:44:40 -0700] "GET
/us/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
64.137.108.75 - - [04/Jul/2021:17:44:40 -0700] "GET
/home/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7
like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
181.214.151.135 - - [04/Jul/2021:17:44:40 -0700] "GET
/ca/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
network-crm.com - - [04/Jul/2021:17:46:43 -0700] "GET
/en/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
168.151.133.64 - - [04/Jul/2021:17:49:36 -0700] "GET
/us/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
ohris.boardwalkhat.com - - [04/Jul/2021:17:49:36 -0700]
"GET /ca/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7
like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
168.151.150.226 - - [04/Jul/2021:17:49:36 -0700] "GET
/login.php HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS
13_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko)
Version/13.1.2 Mobile/15E148 Safari/604.1"
185.246.173.104 - - [04/Jul/2021:17:49:37 -0700] "GET
/en/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
191.96.81.67 - - [04/Jul/2021:17:49:38 -0700] "GET
/home/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7
like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
64.137.41.141 - - [04/Jul/2021:17:49:40 -0700] "GET /
HTTP/2.0" 200 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like Mac
OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
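One way to spot this pattern in a raw log is to count how many distinct IPs report each exact user-agent string; a large count on one precise UA, clustered in time, is the signature described above. A sketch, assuming a combined-format access log:

#!/bin/bash
# For each user agent, count the distinct client IPs using it.
# Field 6 (splitting on double quotes) is the user agent; the first
# space-separated token of field 1 is the client IP or hostname.
awk -F'"' '{split($1, a, " "); print a[1] "\t" $6}' access_log \
  | sort -u | cut -f2 | sort | uniq -c | sort -rn | head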
Dual Attacks
I don't know what to make of this: I
host an exe file that is useful for restoring the PROM firmware of some
older Cypress-based external hard drives. I consistently get dual GETs
for this file, from two geographic areas and IPs, simultaneously.
Cloud Enterprise is a major PART OF THE PROBLEM (aka, that's not a bug, it's a feature)
I have blocked literally tens of thousands of "cloud service"
IPs. While it is possible that some google-wanna-be startup, with a motto
like "don't be evil" or some such, might have a valid reason for visiting
my website, the reality is that, with the exception of
responsible web search indexes (is that an oxymoron?), there is no
reason any cloud service/virtual machine should be visiting my website.
So, no loss from blocking them at all.
Google, for instance, assured me that the complaint I filed was
not about an IP hosted on Google Cloud -- perhaps true, as this hostname
response may be spoofed: "47.160.245.35.bc.googleusercontent.com". But it
resolves to 35.245.160.47/Google LLC data center in
abuseipdb.com's database; and that IP is definitely Google Cloud, per
WHOIS. If this is some other entity spoofing google's cloud addresses,
then the WHOIS system is way too lax. And, Google has substantial input
into the internet naming system, whereas I ... don't. (Fortunately, it is
possible to block "googleusercontent", whoever that is, and the relevant
Google cloud IPs. But, that does not do away with the constant barrage,
even if my server repeatedly informs the spider "403".)
Amazon, whose amazonaws cloud has attacked my website
repeatedly and from all over the world, responded to my (one) complaint
as follows: "This is a follow up regarding the abusive content or
activity report that you submitted to AWS. We have investigated
this report, and have taken steps to mitigate the reported abusive
content or activity." Which is nice, but that was one out of multiple
daily attacks.
Cloud platforms such as Azure actively hide the "bad actors"
from disclosure, since website logs will only show the microsoft cloud
IP and no information about the actual wrongdoer. But, when a
complaint is filed, Microsoft refuses to disclose their "customer" for
"privacy reasons" -- even while stating that Microsoft is not
responsible because its customer is the bad actor. How convenient for
both.
Microsoft (and all cloud platforms, including VPNs) could
block clearly malevolent actions originating from its service AT THE
SOURCE. The overall reduction in wasted internet resources would be
enormous, along with a reduction of the huge economic losses inflicted
on websites that get hacked. The only recourse is to block all cloud
services unless a specific IP demonstrates that it has a valid reason
for querying our websites -- and one that is in fact useful to the web
owner. Even then, since the cloud service is cloaking the company that
purportedly offers a valid reason to address our website, we can never
be sure exactly who is in fact behind the queries. (So, the requirement
needs to be that the end-client get itself a valid domain name and IP
that is disclosed in our logs, and NOT a generic Microsoft one.) Of
course, part of the issue is that many of these hacking attempts may be
originating from government entities, since it is public knowledge that
Microsoft, Amazon and others have massive cloud services contracts with
the government.
In fact, there is an explicit "clue" in the Microsoft response to
complaints: sometimes the case-closed response says they have taken
action; sometimes it is an automated reply that simply says they have no
control over Azure clients. Since the terms of Azure service no doubt
prohibit illegal activity, the only conclusion must be that the exact
same wordpress attack, coming from a government Azure server, is neither
"illegal" nor a violation of the Azure terms of service. It's really a
significant question how much of the hacking traffic is the
instrumentality of one government or another. Why these bots end up on
my site is anyone's guess ... since there is nothing on my website that
is controversial or of any particular interest. The only possible
purpose of running CMS attacks on my website, then, would be to hijack
my website for nefarious purposes, "because we can."
AKA, "that's not a bug, it's a feature."
But, it's reducing the internet to an inhospitable morass. My
latest approach is moving toward "blocking all" and finding some method
of authenticating actual people who might be interested in reading
things I write (aka, "no one" -- unless perhaps you are reading this
...). I've implemented this approach on one of my subdomains to see what
happens. I've included a self-logging 403 response page, because
manually entering the daily hacker info has become a chore that is no
longer educational or entertaining.
Let's look at an actual Microsoft response to an abuse complaint:
Cyber Defense Operations Center
Computer Emergency Response Team
Case Closure Notification
Case information: SIR4629650
This message is to notify you that the Computer Emergency Response Team has reviewed your reported issue and has actioned it appropriately.
The activity reported is associated with a customer account within the Microsoft Azure service. Microsoft Azure provides a cloud computing platform in which customers can deploy their own software applications. Customers, not Microsoft, control what applications are deployed on their account.
The specific details of this case will not be provided in accordance with our customer privacy policy. Microsoft is continuously refining its systems to detect and prevent abuse of its online services. For more information about Microsoft Azure, visit https://azure.microsoft.com/en-us.
Thank you,
Computer Emergency Response Team
[XXXX]@microsoft.com
About this notice
This notification is part of the Microsoft Computer Emergency Response Team standard operating procedure for handling reports of suspected abuse. Reports of suspected abuse can be filed directly at http://cert.microsoft.com. Microsoft respects your privacy. Please read our Privacy Statement: http://www.microsoft.com/privacystatement/en-us/OnlineServices/Default.aspx.
One Microsoft Way, Redmond, WA 98052 USA
Ref:MSG1681337_EgGsxrmsFfxLcRHjvlyU
A couple of additional points about this response: (1) I received the
response less than 5 minutes after [laboriously] entering my
complaint into the Microsoft abuse online form. The only reasonable
conclusions are that Microsoft receives enough of these complaints
to have set up an automated response for complaints regarding the
subject IP; and that they did not actually contact the offending
"customer" for an explanation. That is despite their own website's
Azure guidance
that attacks (such as the wordpress attack I sent them evidence of) are a
violation of their Azure terms of service. Unless, of course, they deem
it "not illegal" and "not a violation of the terms of service"
because their customer is in fact the US gov't... (2) The response sets
out an email address for replies -- the same address their team
previously directed me to -- which will itself respond that the email
address is no longer monitored as of March 1, 2021. The online form,
BTW, requires way more information than is reasonably necessary, since
the raw logs speak for themselves. There is an "API" for bulk
submitters, but little guys like me don't have the capability to use
APIs. So, the intent is apparently to dissuade us from filing complaints
in the first place.
Moreover, they enable hackers to hack anonymously, but require webmasters to fully
identify themselves in order to complain about the anonymous hacking emanating from their servers.
Seems a bit unfairly asymmetrical to me. Protect the crooks, out the
good guys (to the crooks). Frankly, their logs speak for themselves and
don't require any "identifying" of the webmaster. The hack either
occurred or it didn't. If it did, Microsoft's outgoing logs should
reveal it. If it didn't happen, there will be no corresponding outgoing
log.
Unless, of course, Microsoft does not bother to log outgoing traffic ....
Script to [more] easily examine access logs
top
#!/bin/bash
# Unpack the downloaded access_log_*.gz (gzip -d deletes the .gz afterwards).
gzip -d *.gz
# Drop our own visits, then split what remains by response code.
grep -hv "[your hostname]" access_log_* | grep -v "[your ip]" > grep1.txt
grep '" 200' grep1.txt > grep200.txt
grep '" 403' grep1.txt > grep403.txt
grep '" 500' grep1.txt > grep500.txt
grep -v '" 200' grep1.txt | grep -v '" 403' | grep -v '" 500' > grepOther.txt
# Open all four result files in a single kate call, so the script
# does not stall waiting for each kate window to close.
kate grep200.txt grep403.txt grep500.txt grepOther.txt
exit 0
Place this script in ~/bin so it will execute from wherever.
Name it "greppy" [or whatever you want]. Make an empty folder and move
an "access_log_xxxxxx.gz" file into it from cPanel. Open a console *in
the directory* and run "greppy".
The access log will unzip, the gz file will be deleted, and then kate
will open with tabs showing a log of your "200" responses, your "403"
responses, your "500" responses, and "everything else."
Sequentially, the script extracts the log file and deletes the
archive; extracts all the data *except* your own accesses to the site;
extracts the 200s; extracts the 403s; extracts the 500s; extracts
everything else; and then opens all of the .txt files in kate. (Opening
them in a single kate call matters: if kate is not already open and you
call it once per file, it displays each txt file one at a time, pausing
until you close it.)
When done, empty the directory (rm *) and download another log archive into the directory for processing.
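Once the split files exist, a quick tally of the noisiest offenders is one more line. For example, the twenty IPs (or hostnames) most often told 403:

awk '{print $1}' grep403.txt | sort | uniq -c | sort -rn | head -20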