So it commences ...
If infrastructure is made of fragile materials, then
infrastructure inevitably crumbles. Everything runs on infrastructure,
so everything inevitably crumbles.
It is claimed that 85%+ of internet sites use a CMS package
(Wordpress, Joomla, Drupal, etc.). Security holes are constantly being found
in the thousands of lines of code that make up these packages. I claim
[unscientifically, based on personal experience] that 93% of website
owners/managers are inattentive and incompetent. Yes, I include myself
in that category.
The cPanel programmers, and Hostmonster, Bluehost, and the other large
hosting companies that use cPanel for $4.95/mo websites, claim it is the
customer's obligation to keep their CMS installs "secure", and that if
insecurities and cPanel features lead to the hacking of your site, that is
your fault. And actually, not a bug but a feature of cPanel. Notably,
these same companies encourage and provide easy tools for creating, for
instance, quick Wordpress sites.
Since (I am guessing) somewhere around 2018, one of my
domains has apparently been used for "black hat SEO", as far as I can
make out. There have been [possibly legit] hits every second or less of
every day. You would have thought I would get some sort of warning
regarding excess bandwidth use, or *something*, from my hosting company
while this was going on. But no.
At some point, I don't know when, my site was also "anonymousfox" hacked.
Let me share what I know about these events:
AnonymousFox
A couple of weeks ago (as of 2021-06-30), I became aware that a mostly
idle site I had on Hostmonster.com had been hit with the anonymousfox
hack. It may have been long-standing, or it may have begun right around
the moment I became aware of it. My reasoning is that I still had admin
control of my site, so *possibly* the anonymousfox hack was still going
forward as I became aware of it.
I did not know anything about anonymousfox, other than that it
brands the hacked site by creating an "anonymousfox-[random characters]"
email and user, and an "smtpfox-[random characters]" email and user.
So, after changing the passwords for these accounts, I did a web search
regarding "anonymousfox". I learned:
- This has been an ongoing hack on Wordpress and other CMS websites for a number of years.
- There is a website that provides downloadable anonymousfox hacking software and tutorials.
- Based on my logs, it is possible to launch the attack
directly from the anonymousfox website, as I see hack attempts that
are not "direct", but are referred from the anonymousfox website.
- The hacker(s) can take complete control of the user's website via cPanel
- The hack embeds hidden php files in various directories that give the hacker(s) access after any effort to clean up the site
- The hosting company can, upon request, run a malware scan that finds some (or all?) of the php malware.
I always assumed that CMS installs such as Wordpress were
security risks. However, I assumed that, at worst, the hacker would
compromise the CMS website. I did not understand or anticipate that the
hacker could obtain console-level access and the ability to create and
hijack cPanel accounts, including the admin cPanel login.
How the hack went forward
- My guess is that the anonymousfox tool accomplishes
certain things in an automated fashion: scanning for domains with
vulnerable subdirectories, launching the hack, creating accounts and
deploying software.
- Anonymousfox changes the file .contactemail, which
contains the address to which the user's password reset link is sent.
In my case, this file contained only my admin email address. By changing
the file, anonymousfox redirected any password reset code to the new
anonymousfox email address it created, and no reset warning went to the
original contact. (Presumably, anonymousfox phones home with the login
information for that newly created email account so that the hacker can
retrieve the reset email.) This is possible because cPanel stores the
reset email in plain text. I consider this a serious cPanel flaw, but
Hostmonster does not. (A small watchdog sketch for this file follows
this list.)
- Anonymousfox is able to retrieve the admin user name
from cPanel. It's not clear to me how this is done or where it is
stored. Hostmonster tech support assured me that I must have shared my
admin user id with *someone on my team* -- but I have no team, nor have I
ever provided anyone with access to my admin account. Nor does anyone
have access to any of my computers (unless, of course, they are also
"hacked", which I suppose is a possibility). But, I think anonymousfox
simply extracted it from the cPanel or directory structure. As far as I
know, the anonymousfox hacker never logged into my cPanel, nor did it
appear from the logs shared by Hostmonster that anyone other than myself
(or Hostmonster tech) ever logged into my cPanel. I think the accessibility of the admin user name is a cPanel flaw.
- Apparently, anonymousfox can trigger the sending of the reset email
via a console command. So, the hacker's IP address does not
show up in website logs, AND the reset email with the reset code states
that the URL from which the reset code was requested is 127.0.0.1
(localhost). I know this because I intercepted the reset email before anonymousfox retrieved it. I consider this a serious cPanel flaw, but Hostmonster does not.
- Anonymousfox created for itself several user accounts, including email, webdisk and root level ftp access.
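Given that the reset address lives in a plain text file, one cheap defense is a watchdog that notices when the file changes. A minimal sketch, suitable for a cron job -- assuming the usual cPanel home-directory layout, a known-good copy you saved yourself, and a working "mail" command (all assumptions to verify on your own host):

#!/bin/bash
# Compare the live cPanel contact file against a known-good copy.
# ~/.contactemail is where cPanel keeps the reset address in plain text;
# the .known-good path and the alert address are placeholders.
if ! cmp -s "$HOME/.contactemail" "$HOME/.contactemail.known-good"; then
    mail -s "ALERT: .contactemail changed" you@example.com < "$HOME/.contactemail"
fi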
The "Black SEO Hack"
I don't know whether this was an
anonymousfox-related hack or not. Around 2014, I changed a main domain
entry page from a Joomla index.php to a static index.html page,
intending my site to be "sleeping". Based on file dates, around
September of 2018, my static page was copied into an index.php file that
contained encrypted php code. The result was that anyone specifically
looking at my website (e.g., ME) would see what was expected -- a note
regarding a sleeping website. On the other hand, a request for any
subpage would result in -- I assume -- execution of the php code. Here
is a typical log entry; the page following the "GET" represents
what the bot was looking for:
"114.119.138.177 - - [28/Jun/2021:05:07:07 -0700] "GET
/0ft4452jq1052Gf16 HTTP/1.1" 301 248 "-" "Mozilla/5.0 (Linux; Android
7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36
(compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
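If you suspect a similar swap on a "sleeping" site of your own, a crude scan for common PHP obfuscation markers can help confirm it. A sketch, assuming a cPanel-style public_html layout (the markers shown are typical, not exhaustive):

#!/bin/bash
# List PHP files containing common obfuscation markers.
grep -rlE --include='*.php' 'eval\(base64_decode|gzinflate\(|str_rot13\(' ~/public_html
# Also flag any index.php newer than the static index.html it may have displaced.
find ~/public_html -maxdepth 2 -name 'index.php' -newer ~/public_html/index.html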
It's unclear to me whether the "PetalBot" in the log entry above is a
legitimate search engine bot or part of the hacking system. I was
receiving inquiries every second or less, 24/7, and to some extent
continue to do so. These are consistently gibberish page requests. It's
unclear to me whether these are encrypted requests designed to trigger
some response by the PHP, or some sort of page request. (Response code
301 tells the bot the page has moved.) PetalBot is allegedly a new
search system created by Huawei. There were/are numerous similar hits
from URLs that purport to be "bingbot" and in fact trace back to
Microsoft. AbuseIPdb reports that the bingbot IP addresses are legit,
but there are many reports of abuse as well. It may be that the
webmasters reporting the abuse are misunderstanding their logs (and in
fact are unaware they are hosting hacked CMS systems); or, alternatively,
the bing bots are themselves compromised. Possibly both, because if
bing bots are being tricked into such high-intensity scanning of
pointless websites, then Microsoft engineers are not paying attention
to what their bots are up to. This is a huge waste of internet resources.
I also pointed out to Hostmonster that it was a huge waste of
their resources, and asked why they were not monitoring and automatically
blocking this kind of traffic. They seemed uninterested in the
problem, thereby becoming part of the problem. So, we can see that
professional internet software engineers at hosting companies and
professional internet engineers at search engine companies are so
uninterested in monitoring obvious internet misuse that it goes on
unimpeded.
After a day of 403s (permission denied), the petalbot seems
to have got the idea and gone away. The bing bot(s) have not got the
idea and continue to knock on the door (in fact, switching to IPs I have
not blocked yet). And now, duckduckgo has joined the party:
"duckduckbot.duckduckgo.com - - [03/Jul/2021:09:09:23 -0700]
"GET /ajZibTEwMDFENmJ2YXIxNzE4NUdidmFs HTTP/1.1" 301 263 "-"
"'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1;
https://duckduckgo.com/duckduckbot)'"
In addition, for some reason my hosting logs have changed
from reporting IPs to reporting domain names, such as the
duckduckbot.duckduckgo.com noted in the above log entry. My hosting
company says this occurs when I include a "host block" in my .htaccess
file, as explained
here. (I have pointed out to them that "Allow/Deny" is deprecated in Apache 2.4:
"The
Allow, Deny, and Order directives, provided by mod_access_compat, are
deprecated and will go away in a future version. You should avoid using
them, and avoid outdated tutorials recommending their use.")
However, I contend this started happening *before* I added a host block,
which I added because I was having trouble obtaining the IP to block.
But, then I discovered I had typed an IP as "xxx-xxx-xxx-xxx" instead of
"xxx.xxx.xxx.xxx", so I thought this was the culprit. I ran a test
for a while and *thought* I was getting IPs only, but that turned out not
to be true, and as of today I have given up on trying to revert my logs
to IP-only. I tracked down the apache server manual; this is
configured in the apache logging setup (but, per my hosting company, it
can be affected by my .htaccess file, as stated above -- host-based
access control forces apache to perform hostname lookups, which can then
show up in the logs). Something I read somewhere suggests that certain
internet users (?) are able to specify the alias that is shown ... you
can see how this can go off the tracks if the hacker can specify a
randomly generated alias that cannot be backtracked via DNS (which is
what I am now getting in certain instances). There was a clear "change"
on my hosting service somewhere around 2021-07-01 (which, coincidentally,
was shortly after I instituted the aggressive blocking effort described
on this page). I have been doing a lot of research on this -- among
other things, I learned (I think) that the "Allow,Deny" system for
.htaccess has been replaced with "Require/Require all". See
this.
But, I am still studying the document and testing the Require/Require
all statements. Meantime, I have not figured out how to block computers
that log only as gibberish hostnames that cannot be identified by DNS
search.
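For reference, here is what a minimal block looks like in the Apache 2.4 syntax (the IP and hostname are placeholders; my actual working block list appears further down this page):

<RequireAll>
Require all granted
Require not ip 192.0.2.10
Require not host example-badbot.com
</RequireAll>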
With respect to the constant barrage of gibberish page
requests, I tried searching on a gibberish phrase in the search engine
of the bot requesting it, since (if we assume all these search systems
are legit bots) the gibberish page should be indexed. I did not get a
result if I limited the search to the precise string, e.g.
+"ajZibTEwMDFENmJ2YXIxNzE4NUdidmFs". However, if I did not limit it and
just searched on the string, on duckduckgo the search brought up links
to pornography, such as hidden cam forums. On bing, I got a link to
images, which brought up a lot of product photos. I wonder if this is an
SEO system that embeds gibberish strings in websites and then generates
hits to those websites by seeding the gibberish phrases into the search
engines?
However, I did not consistently get search results for
various strings, so the results I did get may have been entirely random.
2021-07-04 update: Now I am getting attacked from the
googlebot, which is probing my blog for links such as "lost password".
Did I mention the internet is going to hell? This is not the "GET
/[gibberish]" attack referred to above.
2021-07-04 later update: Now, oddly enough, I am getting the
gibberish search from Yandex. So, to recap: initially, gibberish
searches from petalbot and bingbot; seemingly after I blocked these,
they started from duckduckbot; and when I blocked that, from yandex.
Unfortunately, this means I have blocked many of the major search engines and my site will become invisible.
Why I think this is Black Hat SEO
I have been unable to figure out the correct search terms to
find much discussion of this particular issue, but what I did come up
with took me to a single site that described the hijacking of Joomla
websites for the purpose of enhancing SEO for ... whomever.
Interestingly, I did not find any unexpected data in the
mysql databases or in other locations on my website. So, my guess is
that the hacker's php code, itself, generates results that appeal to
search engines. Or -- I am wrong. But, alternatively, it's unclear to
me what value these constant gibberish inquiries would have to whomever
created the hack and is generating the inquiries. Perhaps that is just
lack of imagination on my part.
This all does help explain why searches are now often not
worth doing. Looking for a restaurant's website? Good luck! You will
get Yelp, yellow pages, white pages, white pages, yellow pages, yelp,
restaurants of the world, collected menus, etc., etc. What you won't
get is the restaurant's website. Whereas, if you enter the name of a
relatively unique restaurant and its city location, any reasonable
algorithm would display that company's website as item #1. Or #3, if
you concede it is reasonable for the search company to insert a couple
of ads in there to pay for its operation. Which I do. But, that would
not create insta-billionaires, it would just provide a reasonable income
in return for a useful service.
Further point, FYI: 99.999% of the traffic on my website
consists of hackers or useless bots whose customers have no real
interest in my tiny web page(s). I can pretty much take my raw access
logs every day and block all the IPs (a one-line sketch follows below).
You would think hosting companies might want to reduce their traffic and
block it across the board. Their reason for not doing so (as told to me)?
"We might accidentally block something legitimate and have unhappy
customers." Oh, sure. On the off chance this happens, customer service
could *unblock* that mistake on request. My hosting company's automation
has in fact blocked *my* IP on occasion, so that I had to request it be
unblocked. There are certain indicators of malevolence that are totally
obvious, e.g. repeated direct requests for known wordpress directories
and php files. The solution, of course, is to sell you more services,
such as firewalls and malware scanners. I guess hackers are not a "bug",
they're an internet "feature."
Um ... if this is your approach to infrastructure, yeah,
it's all going to go to hell. The parasites will consume the host and
nothing will work at all.
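For what it's worth, mining a day's block candidates out of the raw log is nearly a one-liner. A sketch, assuming a combined-format access log in the current directory (the probe patterns are examples, not a complete list):

#!/bin/bash
# List the unique IPs that probed for known CMS paths,
# ready to paste into a block list.
grep -hE 'wp-login|wp-admin|xmlrpc\.php|administrator' access_log* \
  | awk '{print $1}' | sort -u > suspect_ips.txt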
P.S. - from time to time, I work on my own node.js based
search system that is designed to reject all SEO and return only
information that is actually useful. But, it's a lot of work. OTOH,
even in its total "alpha" state, it's kinda useful. Unless you are
looking for social fluff, which I'm not.
Foolishness of placing code in web root
For the convenience of low-priced web
providers, the major CMS and other php-based code is placed within the
web root. This is a terrible coding practice. Historically, even the
cgi-bin has been placed within the web root (and I continue to see
attacks that are searching for cgi code). This is completely
unnecessary, since the server can access and utilize code that is not
directly accessible from the web root.
Moving the code out of the web root will not solve the
problem of insecure code. It will, however, make the code much more
difficult to access. As things stand, the cPanel file structure and the
file structure of all major CMS software are readily known to hackers,
so finding and accessing the exploitable code is trivial. So, for
instance, a typical attack will consist of a search for a series of
known files and directories. E.g.,
- "wp-login"
- "administrator"
- "wp/wp-login"
- "blog/wp-login"
- "cms/wp-assets"
- "oldsite/wp-assets"
and so forth.
There should only be a single file exposed to the internet,
called "index.php". This file is not "readable" by the browser; it only
delivers what appears to the browser to be a static page. Granted, in
widely available open-source software, the files actually linked from
the index.php will be "known". But, even here, good coders know how to
reduce the security risk of a known structure. For instance, the major
browsers and email programs place their code and data in folders that
have randomly generated gibberish names. The same could and should be
done with CMS/php code. Why put the code in a known folder, say
"/php-includes", when during the install process it could be named
"xy$wRfaa##" or whatever, with the locations then built into the
database or re-written into the code during install (a sketch of the
idea follows below)? So, then we have two layers of security from
hackers.r.us.co. First, the folder names are not known to the hacker;
and second, the folders are not themselves subdirectories of the domain
entry page, nor even accessible from the web.
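A sketch of what an installer could do along these lines (the directory and file names here are hypothetical, not taken from any actual CMS installer):

#!/bin/bash
# Give the include directory a random gibberish name at install time,
# then record the name where the front controller can read it.
RANDNAME=$(tr -dc 'a-z0-9' < /dev/urandom | head -c 12)
mv php-includes "$RANDNAME"
printf '<?php $include_dir = "%s";\n' "$RANDNAME" > include_path.php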
What this means is that the hacker would have to actually
follow the web pages, as fed by the CMS to the browser, in order to
arrive at whatever exploit they have identified. This may also be a
relatively trivial task, but it does two things: the hacker cannot access
a piece of code "out of context" of the CMS and inject hacking code
directly into it; and it gives the CMS programmers more opportunity to
have routines that identify and block probable hacking attempts.
A slightly less secure approach is to make all the
subdirectories non-accessible, as though they were not in the root in
the first place. This is accomplished with an .htaccess "deny all"
statement in each such folder, as is done in the b2 Evolution CMS I was
formerly using. (But even with respect to b2 Evolution, I noticed that
the standard install provides a "sample.htaccess" and not necessarily
the ".htaccess" locking the subdirectories.) Again, this prevents -- or
maybe I should say impedes -- the hacker from accessing
/subdirectory/insecure_code.php directly.
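The per-folder lockdown is easy to script. A sketch using "Require all denied" (the Apache 2.4 replacement for the old "deny from all"); the directory names are placeholders:

#!/bin/bash
# Drop a deny-all .htaccess into each code subdirectory.
for d in ~/public_html/inc ~/public_html/plugins; do
    printf 'Require all denied\n' > "$d/.htaccess"
done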
Given that these concepts are pretty straightforward, it's
difficult to decide whether the failure to take this minimal security
action is a "bug" or a "feature" in the eyes of the people maintaining
the majority of internet sites. It's been interesting to me that, in
asking two separate large hosting companies about these issues, the
response from tech support has been, "oh, we subcontract all that
technical stuff out to 3rd party providers."
I was a bit shocked by this response. I
would have thought that if you are hosting 1 million+ websites, you
might want to have your own engineers maintain the basic structure and
security of your servers.
In addition, I can't comprehend why the default .htaccess
does not have code to block access that is obviously an exploit attempt.
It's trivial to redirect the typical wordpress or anonymousfox scan,
because the scans are automated and are looking for a set of known
directories and files. So, you can redirect to something like
this
[not the actual page], which has the benefit of accumulating malevolent
hackers in one convenient-to-parse log, separate from hopefully
"legitimate" traffic. [I explain how to do this in the "Block and create
a log" section below.]
When you have banned them, you might let them know why, like
this. Since,
if you actually WANT to get hacked, it's trivial to REMOVE the blocking,
wouldn't it be a better practice to auto-block in the first place?
This would be the equivalent of email hosts providing decent spam
filtering for known spam. Moreover, if you are running a server farm
and a bot proceeds to knock on 10k websites with the same GET request,
it's pretty obvious this is not a friendly IP. So, we have a problem on
the scale of phone service providers refusing to provide accurate
caller IDs and refusing to block obvious robo-callers. Again,
apparently to the giant corporations that run the internet and the phone
service, these are not bugs, they are features.
And, even if the web hosting company does not wish to spend
its time securing your site via .htaccess, why doesn't the wordpress or
other CMS team do it in a default .htaccess file that installs (or that
you are given the option to install, with an explanation of why)
during the installation process? Similar to spam filtering, it should be
regularly updated as new exploit scans are identified.
Failure to do this is not trivial. The internet is either run by stupid people or malevolent ones.
Bots that have no reason to be on my website. Do they have a reason to be on yours?
So, some of what I have put in
this list are "legitimate" bots of one sort or another, but they have no
reason to be on my website and are of no benefit to me in any way, so
they should go away. Also, I have lumped into this category cloud services
that make it easy for their customers to run malevolent bots. This seems
to include VPN providers ... and there seem to be a lot of them. As I
block groups of them, the attackers just return from more locations,
which I assume are additional VPN servers and networks.
- Anything from an amazonaws.com address. The
hostname usually includes the IP address. These tend to be virtual
machines, which make it extremely convenient for hackers and pointless
bots to set up multiple "computers" wasting internet bandwidth for
useless or malevolent purposes.
- Anything from DigitalOcean. They apparently have hosts all over the world.
- Expanse. [The real] Expanse was purchased by
paloaltonetworks.com, which presents itself as an internet security
company working on protection/recovery from things like the Solarwinds
hack. It has no legitimate reason to be scanning my website. The expanse
that hits my website originates from googleusercontent.com in China (per
reverse IP), and solicits submission of your website domain name and IP
if you wish to be excluded. I have no idea whether this is "really"
Expanse or a bot that masquerades as Expanse. It somehow evades some of
my IP blocking and does not present a blockable user-agent.
- Anything from Google cloud, which tends to log as googleusercontent.com. Typically, the hostname provides the ip address, but it is reversed.
- Anything from Hetzner Germany
- hrankbot. Who cares?
- Microsoft cloud, including Azure. Generally,
these do not provide a hostname and log the IP address. Complaints to
cert.microsoft.com seem to work; I have submitted a number of logs that
show repeated wordpress attempts, and the abuse emanating from the
microsoft cloud has been reduced considerably. They will claim that the
responsible party is their Azure customer, but will not disclose any
contact information for the customer. In fairness, though, no other
cloud service has been responsive to my complaints.
- Opera VPN. Same comments as for TOR, below.
- Anything from OVH France, England, Canada or wherever.
- Petalbot. Huawei's new search engine. How
many computers do these people have? I'm not sure I want to be excluded,
but for now, they are massively seeking gibberish websites along with
bingbot, so they remain banned. At least they aren't trying to evade my
IP block, as others on this list are doing.
- [Global Internet Observatory]. From a
university in Germany; abuseipdb claims it is "white listed" (but with
lots of hacking attempt reports). I also get hackers whose hostnames
resolve to IPs on the same server and in the same block.
- NetSystemsResearch "NetSystemsResearch studies the availability of various services across the internet."
- Sitelock. Sitelock also claims to be a
website security company, and once again has no business scanning my
website. It presents itself as "placeholder.sitelock.com" to avoid
providing its IP, and has successfully evaded some of my blocking
attempts, such as blocks on "sitelock.com" and on sitelock's public IP
address. A 403 won't dissuade it, because it then commences trolling my
403.html file.
- Sogou. Sogou is a highly persistent search
bot for a Chinese search company. Sogou search, being in Chinese, is of
no value to me. I'm not sure that I care whether it indexes my site, but
it is excessively aggressive, more so than googlebot or bingbot, where I
would like to have some presence.
- Seznam. Seznam is a highly persistent bot
that purports to be from a Czech Republic search engine. I don't care
one way or the other except like Sogou, it's highly persistent. If
these bots are wasting time on *my* site, how much computer power do
they have, anyway?
- TOR. Naturally if your mission is to make your users anonymous, what do you suppose they will do with your helpful cloaking?
- Twingly. Refers to itself as "Internet
Recon". As far as I can tell, it is simply a hacking bot, as it has
looked for hackable subdirectories I don't have.
- There are a bunch of SEO bots which, as far as I can tell,
are designed to disappear my website into a morass of spammy search
results on all the major search engines; so, they are definitely not my
friends and have no legitimate basis for trolling my site.
Just think, if I could get all these bots and hackers to go
away, I'd have two visitors a day or something -- but at least I would
know it was two visitors interested in something on my website. My
blocking attempts have reduced some of the vast pointless traffic.
Better for me, better for the internet. But, it's amazing how long a
bot will keep testing and getting repeated 403 responses before the bot
owner will give up and remove my site(s) from its harassment database.
And, some hackers double down, maliciously hitting my 403.html or other
page seconds apart. Sort of like "road rage": they are apparently angry
I blocked their hacking probe. Which is childish, as I've done them a
favor -- there is no CMS code on my website, so endlessly running
wordpress probes on my site isn't going to produce results.
Note:
blocking entire blocks of IP addresses from major webhosting sites is
not going to block much legitimate viewership, because these typically
are not also ISPs providing user internet service. Good bots put a
little information about the bot into the log, so we can always open up
their IP if we want to "join into" their searching. Until then, the only
traffic from these sites is either hacking or pointless bot activity of
no benefit to the website owner. I occasionally submit complaints to
the hosting company, but since a possibly significant part of their
cloud income stream seems to come from users running malevolent bots,
they don't do much to shut it down. Just my opinion. Amazon purported to
put a stop to one (1) such wordpress scanner -- though I get at least
two a day from amazonaws. Microsoft "abuse" does not respond to emails.
Other host companies have sent me an AI response that they are "looking
into" my complaint -- but no resolution or follow-up. So, it becomes
"they don't need me and I don't need them": block them all.
Block and create a log of "pattern" based automated hacking
We know that someone who is
trying to access certain CMS (or other) subdirectories directly is a
hacker. So, let's create a log of just these domains and/or IP
addresses. To do this on cPanel, we will want to make a subdomain, like
this: goaway.[yourdomain].com. We don't want to link to this or put it in
search engines. It may end up there anyway, but only if a hacker links
to the page somewhere that will get crawled by a search bot. So, we
want a simple index.html in this subdirectory, which you might actually
set up as [yourdomain].com/goaway. The index.html could be empty, or
whatever message you would like the hacker to see. Now, let's send them
there.
In our main .htaccess file, we need some redirect statements.
WARNING!!! If you start messing with your .htaccess file, a simple typo can lock *YOU* (and everyone else) out of your website. (I think you will still have cPanel access and can revert to a prior .htaccess file *if you made a copy before editing it*. Otherwise, you will have the unhappy experience of asking tech support to revert it for you.)
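So, before editing, make a timestamped backup copy of the working file (a one-line habit that makes the revert trivial):

cp .htaccess .htaccess.bak.$(date +%Y%m%d-%H%M%S)

With that insurance in place, here are the redirect statements: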
Redirect 301 /2020 https://goaway.[yourdomain].com
Redirect 301 /2019 https://goaway.[yourdomain].com
Redirect 301 /autodiscover https://goaway.[yourdomain].com
Redirect 301 /admin https://goaway.[yourdomain].com
Redirect 301 /administrator https://goaway.[yourdomain].com
Redirect 301 /backup https://goaway.[yourdomain].com
Redirect 301 /blog https://goaway.[yourdomain].com
Redirect 301 /b2evo1/htsrv https://goaway.[yourdomain].com
Redirect 301 /cms https://goaway.[yourdomain].com
Redirect 301 /.env https://goaway.[yourdomain].com
Redirect 301 /file https://goaway.[yourdomain].com
Redirect 301 /forum https://goaway.[yourdomain].com
Redirect 301 /.git https://goaway.[yourdomain].com
Redirect 301 /new https://goaway.[yourdomain].com
Redirect 301 /news https://goaway.[yourdomain].com
Redirect 301 /old https://goaway.[yourdomain].com
Redirect 301 /oldsite https://goaway.[yourdomain].com
Redirect 301 /shop https://goaway.[yourdomain].com
Redirect 301 /site https://goaway.[yourdomain].com
Redirect 301 /sito https://goaway.[yourdomain].com
Redirect 301 /temp https://goaway.[yourdomain].com
Redirect 301 /test https://goaway.[yourdomain].com
Redirect 301 /web https://goaway.[yourdomain].com
Redirect 301 /website https://goaway.[yourdomain].com
Redirect 301 /wordpress https://goaway.[yourdomain].com
Redirect 301 /wp-admin https://goaway.[yourdomain].com
Redirect 301 /wp https://goaway.[yourdomain].com
Redirect 301 /wp1 https://goaway.[yourdomain].com
Redirect 301 /wp2 https://goaway.[yourdomain].com
Redirect 301 /wp-content https://goaway.[yourdomain].com
Redirect 301 /wp-content.php https://goaway.[yourdomain].com
Redirect 301 /wp-login https://goaway.[yourdomain].com
Redirect 301 /wp-login.php https://goaway.[yourdomain].com
Redirect 301 /wp-includes https://goaway.[yourdomain].com
Redirect 301 /xmlrpc.php https://goaway.[yourdomain].com
If we have set this up correctly, hackers asking for these
typical CMS/git/forum/files subdirectories will be redirected to
goaway.[yourdomain].com; and if you have set up the subdomain correctly
in cPanel, you will get a separate raw access log just for "goaway". Any
IP or host in the new raw access log either tried to hack you, or
tried to access your "goaway" page, where they have no business being.
Warning:
if you look at this page for testing purposes, your own IP or hostname
will be in the raw access log, so if you cut and paste the whole list
into your .htaccess block list, you will find that you have locked
yourself out of your own website.
Been there, done that! But, once you know they are hackers, it would be a good idea to block them when they come back, like this:
<RequireAll>
Require all granted
Require not ip 5.189.239.157
#comment: will you annoying bots, please just go away?
Require not host sitelock.com mail.semanticsystems.com seostar.com netcraft.com digitalocean fbsv.net
Require not host semrush okitup.net aachen twingly boardwalkhat ohris network-crm
</RequireAll>
Note: the RequireAll and /RequireAll lines must be bracketed by < and >,
exactly as shown above. I put
comments in my .htaccess file so I can remember why I blocked something,
in case I change my mind. The comments don't display anywhere or do
anything. You could also use the language "Deny from [IP]", which you
will see in my sample block list. However, as of apache 2.4 that
language is deprecated and slated to be removed (so I am trying to
update my approach). See the apache access control documentation
here.
Further Note:
in my opinion/experience, not all apache directives actually work.
In theory, it should be possible to "Require not host
colocrossing" or "Require not host M247", but somehow these haven't
worked well. I did read that, for hostname blocks to work, apache
performs a double reverse DNS lookup -- a reverse lookup on the client
IP, then a forward lookup on the resulting hostname -- and the block
won't work if the cross-check does not match up the hostname and the IP.
So, maybe these are masquerading hostnames when the block doesn't work.
I've tried multiple different block techniques on the worst offenders,
but have not been 100% successful.
It does appear that when I successfully block a specific
host/route/user-agent/etc., hackers regroup and attack in a different
manner. There are certain "tells" in the logs that suggest it's the
same hacker(s) trying again. It's not uncommon for a hacker that has
been redirected to one of my honeypot pages to come back from a
different IP and make a direct request for the honeypot page, to look
at it again. One hacker displayed a user agent of "virustotal", which I
blocked; so it came back from the same IP with user agent "VT".
Apparently, vanity requires hackers to sign their work.
The pointless robots.txt file
I have made two observations: (a) Virtually no bot honors the
statements in robots.txt. It's sort of like the original protocol
requiring only plain text in email to keep bandwidth down. Hahahaha.
Who honors that? Most email users probably do not even know what a
plain text email looks like. (b) Apache apparently does not bother to
block requests for robots.txt. I repeatedly observed in my logs
"blocked" IP addresses returning a "200" code when looking for
robots.txt. So, robots.txt has effectively become a bot verification
tool to determine whether a subdirectory exists or not. At first I did
not know what to do about this; then I removed my robots.txt file. I
have not observed any particular downside to this, and a considerable
upside.
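Having removed the file, one could go a step further, in the spirit of the "goaway" trick above, and treat any request for robots.txt as a probe. A sketch for the main .htaccess (assuming the goaway subdomain described earlier):

Redirect 301 /robots.txt https://goaway.[yourdomain].com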
Another puzzle that I can't solve is that some "blocked" IPs
seem to receive a 200 response, even around a password-protected
directory that gives everyone else a 401. But, when I look at the file
size, it does not match up with the file(s) that would be delivered to a
non-blocked IP. My best, totally uninformed, guess is that there is
some sort of query that receives a response from the server root and not
my personal web directory structure. I don't know if this is true,
but the file sizes fed to the blocked IPs that receive a 200 response
are consistently the same, irrespective of which domain or subdirectory
they have queried.
Some blocks send a 500 response, and I did notice a log entry one
time that seemed to be a bingbot query seeking to know whether the domain
had been permanently removed. If only the answer had been "yes"!
I tried to use the bing webmaster tools to remove my website
from bingbot search, and received an automated response that my domain is
"important" to the internet and so would not be removed. Pretty funny,
considering there is absolutely nothing on the domain at all except for a
honeypot trap. (But it *is* the place bingbot is constantly looking for
fake pages with long gibberish names.) I also filed a cert.microsoft.com
complaint about this, but they did not respond.
Let's Look at a Distributed Attack
So, I think what we see here is a
distributed attack. These log entries are clustered together in time,
and together they go through a standard set of CMS hacking inquiries,
but each query comes from a different IP in a different geolocation.
In some instances, the hacker is possibly using multiple public VPN
services. As I add more IP ranges to my block list, it seems apparent
that a number of the web providers with a global presence are running
subscription VPN services. Attacks also come from the TOR network. On
some level, I support the concept of anonymity to get around censorship.
But, it's pretty difficult to support the concept of VPNs when,
apparently, the primary use is to attack/hack people's websites. That
would sort of be the opposite of an "open exchange of ideas". Conundrum.
In any event, in what I call "distributed attacks", each IP
reports the same iPhone user agent. What would be the odds of these
disparate IPs hitting my site at approximately the same time,
sequentially querying CMS files/subdirectories, and all reporting
precisely the same user agent? The hackers are messing with us, because
two are located in Virginia and one, 168.151.192.220, resolves as
"intelligence network inc." in Ashburn, VA. But, on the other coast,
185.246.173.104 resolves as "illuminati network" in San Francisco. Fun
joke, but someone had to spend money setting up hosting machines with
cute names for the paranoid.
It's reasonably likely that some or all
of the sites used in the coordinated attack are compromised sites whose
owners are unaware of the activity. (See "AnonymousFox" and "The Black
SEO Hack" above.) So, we really don't know "who to blame" in these attacks.
168.151.192.220 - - [04/Jul/2021:17:44:40 -0700] "GET
/login.php HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS
13_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko)
Version/13.1.2 Mobile/15E148 Safari/604.1"
102-129-128-97.quadranthosting.com - -
[04/Jul/2021:17:44:40 -0700] "GET / HTTP/2.0" 200 234 "-" "Mozilla/5.0
(iPhone; CPU iPhone OS 13_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML,
like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1"
181.214.179.158 - - [04/Jul/2021:17:44:40 -0700] "GET
/us/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
64.137.108.75 - - [04/Jul/2021:17:44:40 -0700] "GET
/home/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7
like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
181.214.151.135 - - [04/Jul/2021:17:44:40 -0700] "GET
/ca/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
network-crm.com - - [04/Jul/2021:17:46:43 -0700] "GET
/en/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
168.151.133.64 - - [04/Jul/2021:17:49:36 -0700] "GET
/us/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
ohris.boardwalkhat.com - - [04/Jul/2021:17:49:36 -0700]
"GET /ca/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7
like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
168.151.150.226 - - [04/Jul/2021:17:49:36 -0700] "GET
/login.php HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS
13_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko)
Version/13.1.2 Mobile/15E148 Safari/604.1"
185.246.173.104 - - [04/Jul/2021:17:49:37 -0700] "GET
/en/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like
Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
191.96.81.67 - - [04/Jul/2021:17:49:38 -0700] "GET
/home/ HTTP/2.0" 404 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7
like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
64.137.41.141 - - [04/Jul/2021:17:49:40 -0700] "GET /
HTTP/2.0" 200 234 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like Mac
OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2
Mobile/15E148 Safari/604.1"
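One way to spot this pattern in a raw log is to count how many distinct IPs report each exact user-agent string; a large count on one precise UA, clustered in time, is the signature described above. A sketch, assuming a combined-format access log:

#!/bin/bash
# For each user agent, count the distinct client IPs using it.
# Field 6 (splitting on double quotes) is the user agent; the first
# space-separated token of field 1 is the client IP or hostname.
awk -F'"' '{split($1, a, " "); print a[1] "\t" $6}' access_log \
  | sort -u | cut -f2 | sort | uniq -c | sort -rn | head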
Dual Attacks
I don't know what to make of this: I
host an exe file that is useful for restoring the PROM firmware of some
older Cypress-based external hard drives. I consistently get dual GETs
for this file, from two geographic areas and IPs, simultaneously.
Cloud Enterprise is a major PART OF THE PROBLEM (aka, that's not a bug, it's a feature)
I have blocked literally tens of thousands of "cloud service"
IPs. While it is possible that some google-wanna-be startup, with a motto
like "don't be evil" or some such, might have a valid reason for visiting
my website, the reality is that, with the exception of
responsible web search indexes (is that an oxymoron?), there is no
reason any cloud service/virtual machine should be visiting my website.
So, no loss from blocking them at all.
Google, for instance, assured me that the complaint I filed was
not about an IP hosted on Google Cloud -- perhaps true, as this hostname
response may be spoofed: "47.160.245.35.bc.googleusercontent.com". But it
resolves to 35.245.160.47/Google LLC data center in
abuseipdb.com's database; and that IP is definitely Google Cloud, per
WHOIS. If this is some other entity spoofing google's cloud addresses,
then the WHOIS system is way too lax. And, Google has substantial input
into the internet naming system, whereas I ... don't. (Fortunately, it is
possible to block "googleusercontent", whoever that is, and the relevant
Google cloud IPs. But, that does not do away with the constant barrage,
even if my server repeatedly informs the spider "403".)
Amazon, whose amazonaws cloud has attacked my website
repeatedly and from all over the world, responded to my (one) complaint
as follows: "This is a follow up regarding the abusive content or
activity report that you submitted to AWS. We have investigated
this report, and have taken steps to mitigate the reported abusive
content or activity." Which is nice, but that was one out of multiple
daily attacks.
Cloud platforms such as Azure actively hide the "bad actors"
from disclosure, since website logs will only show the microsoft cloud
IP and no information about the actual wrongdoer. But, when a
complaint is filed, Microsoft refuses to disclose their "customer" for
"privacy reasons" -- even while stating that Microsoft is not
responsible because its customer is the bad actor. How convenient for
both.
Microsoft (and all cloud platforms, including VPNs) could
block clearly malevolent actions originating from its service AT THE
SOURCE. The overall reduction in wasted internet resources would be
enormous, along with a reduction of the huge economic losses inflicted
on websites that get hacked. The only recourse is to block all cloud
services unless a specific IP demonstrates that it has a valid reason
for querying our websites -- and one that is in fact useful to the web
owner. Even then, since the cloud service is cloaking the company that
purportedly offers a valid reason to address our website, we can never
be sure exactly who is in fact behind the queries. (So, the requirement
needs to be that the end-client get itself a valid domain name and IP
that is disclosed in our logs, and NOT a generic Microsoft one.) Of
course, part of the issue is that many of these hacking attempts may be
originating from government entities, since it is public knowledge that
Microsoft, Amazon and others have massive cloud services contracts with
the government.
In fact, there is an explicit "clue" in the Microsoft response to
complaints: sometimes the case-closed response says they have taken
action; sometimes it is an automated reply that simply says they have no
control over Azure clients. Since the terms of Azure service no doubt
prohibit illegal activity, the only conclusion must be that the exact
same wordpress attack, coming from a government Azure server, is neither
"illegal" nor a violation of the Azure terms of service. It's really a
significant question how much of the hacking traffic is the
instrumentality of one government or another. Why these bots end up on
my site is anyone's guess ... since there is nothing on my website that
is controversial or of any particular interest. The only possible
purpose of running CMS attacks on my website, then, would be to hijack
my website for nefarious purposes, "because we can."
AKA, "that's not a bug, it's a feature."
But, it's reducing the internet to an inhospitable morass. My
latest approach is moving toward "blocking all" and finding some method
of authenticating actual people who might be interested in reading
things I write (aka, "no one" -- unless perhaps you are reading this
...). I've implemented this approach on one of my subdomains to see what
happens. I've included a self-logging 403 response page, because
manually entering the daily hacker info has become a chore that is no
longer educational or entertaining.
Let's look at an actual Microsoft response to an abuse complaint:
Cyber Defense Operations Center
Computer Emergency Response Team
Case Closure Notification
Case information: SIR4629650
This message is to notify you that the Computer Emergency Response Team has reviewed your reported issue and has actioned it appropriately.
The activity reported is associated with a customer account within the Microsoft Azure service. Microsoft Azure provides a cloud computing platform in which customers can deploy their own software applications. Customers, not Microsoft, control what applications are deployed on their account.
The specific details of this case will not be provided in accordance with our customer privacy policy. Microsoft is continuously refining its systems to detect and prevent abuse of its online services. For more information about Microsoft Azure, visit https://azure.microsoft.com/en-us.
Thank you,
Computer Emergency Response Team
[XXXX]@microsoft.com
About this notice
This notification is part of the Microsoft Computer Emergency Response Team standard operating procedure for handling reports of suspected abuse. Reports of suspected abuse can be filed directly at http://cert.microsoft.com. Microsoft respects your privacy. Please read our Privacy Statement: http://www.microsoft.com/privacystatement/en-us/OnlineServices/Default.aspx.
One Microsoft Way, Redmond, WA 98052 USA
Ref:MSG1681337_EgGsxrmsFfxLcRHjvlyU
A couple of additional points about this response: (1) I received the
response less than 5 minutes after [laboriously] entering my
complaint into the Microsoft abuse online form. The only reasonable
conclusions are that Microsoft receives enough of these complaints
to have set up an automated response for complaints regarding the
subject IP; and that they did not actually contact the offending
"customer" for an explanation. That is despite their own website's
Azure guidance
that attacks (such as the wordpress attack I sent them evidence of) are a
violation of their Azure terms of service. Unless, of course, they deem
it "not illegal" and "not a violation of the terms of service"
because their customer is in fact the US gov't... (2) The response sets
out an email address for replies -- the same address their team
previously directed me to -- which will itself respond that the email
address is no longer monitored as of March 1, 2021. The online form,
BTW, requires way more information than is reasonably necessary, since
the raw logs speak for themselves. There is an "API" for bulk
submitters, but little guys like me don't have the capability to use
APIs. So, the intent is apparently to dissuade us from filing complaints
in the first place.
Moreover, they enable hackers to hack anonymously, but require webmasters to fully
identify themselves in order to complain about the anonymous hacking emanating from their servers.
Seems a bit unfairly asymmetrical to me. Protect the crooks, out the
good guys (to the crooks). Frankly, their logs speak for themselves and
don't require any "identifying" of the webmaster. The hack either
occurred or it didn't. If it did, Microsoft's outgoing logs should
reveal it. If it didn't happen, there will be no corresponding outgoing
log.
Unless, of course, Microsoft does not bother to log outgoing traffic ....
Script to [more] easily examine access logs
top
#!/bin/bash
# Unpack the downloaded access_log_*.gz (gzip -d deletes the .gz afterwards).
gzip -d *.gz
# Drop our own visits, then split what remains by response code.
grep -hv "[your hostname]" access_log_* | grep -v "[your ip]" > grep1.txt
grep '" 200' grep1.txt > grep200.txt
grep '" 403' grep1.txt > grep403.txt
grep '" 500' grep1.txt > grep500.txt
grep -v '" 200' grep1.txt | grep -v '" 403' | grep -v '" 500' > grepOther.txt
# Open all four result files in a single kate call, so the script
# does not stall waiting for each kate window to close.
kate grep200.txt grep403.txt grep500.txt grepOther.txt
exit 0
Place this script in ~/bin so it will execute from wherever.
Name it "greppy" [or whatever you want]. Make an empty folder and move
an "access_log_xxxxxx.gz" file into it from cPanel. Open a console *in
the directory* and run "greppy".
The access log will unzip, the gz file will be deleted, and then kate
will open with tabs showing a log of your "200" responses, your "403"
responses, your "500" responses, and "everything else."
Sequentially, the script extracts the log file and deletes the
archive; extracts all the data *except* your own accesses to the site;
extracts the 200s; extracts the 403s; extracts the 500s; extracts
everything else; and then opens all of the .txt files in kate. (Opening
them in a single kate call matters: if kate is not already open and you
call it once per file, it displays each txt file one at a time, pausing
until you close it.)
When done, empty the directory (rm *) and download another log archive into the directory for processing.
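Once the split files exist, a quick tally of the noisiest offenders is one more line. For example, the twenty IPs (or hostnames) most often told 403:

awk '{print $1}' grep403.txt | sort | uniq -c | sort -rn | head -20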