They say any publicity is good publicity, and a fate far worse than people talking about you is nobody talking about you. That probably applies to starlets and aspiring celebrities, or perhaps struggling TV shows. But when it comes to software, good talking points is better than bad talking points. But I guess in a way you haven’t made it until someone writes a crack for your licensing scheme or tries to attack your users with malicious software.
I’m referring to Registration SPAM bots, those pesky automated programs who roam the internet, looking for sites to post fake users to, for the purposes of updating user profiles with junky links and text, or filling up forums with spam posts. This is a long-standing problem in the web world, something which has long been an issue with other Content delivery software like Wordpress, Drupal and others. Given the longevity and reach of DNN, it was only a matter of time before the bot-writers turned their attention to the large amount of live DNN sites on the internet.
This post is going to cover identifying and removing those SPAM user registrations in DNN. I’m not going to cover preventative measures here because that’s another topic entirely. The most important thing I can see is keep up with DNN releases as they come out, because the community and DNN teams are working on these issues continually.
What SPAM bots do, and why they do it
Usually, it’s all about Search Engine Optimization (SEO). Sites are considered by search engines to be more authoritative and popular if they contain many other links pointing at them. The link text of those links matters – a site linked by many others for ‘Red Top Hats’ is likely to rank well for that term. So a site owner might go around the internet and try and generate as many links as possible back to their site with the anchor text of ‘Red Top Hats’. All going well, they’ll rise to the top of the list for that phrase, sell oodles of Red Top Hats, and retire to the Caribbean.
That’s a great plan to corner the vividly coloured millinery market – but it is rather tedious trying to generate all those links. The less-scrupulous (or, those paying the less scrupulous, either knowingly or unknowingly) do what anyone does when faced with a large, repetitive task – they automate it. The automation is done by writing a program which will crawl all over the internet, looking for forms that accept input, and attempt to post html to those forms, in the hope that the html stays on the site somewhere, thus creating the links.
For the innocent site owner, what this looks like is a torrent of broken form submissions and errors, and a pile of weird looking links and posts if they happen to be successful. These usually are run on infected computers so they come from all over the place – it’s not easy to just block a specific IP address.
SPAM bots generally come in two flavours – the generic and dumb, and the tailored. Generic bots look for any form with input fields and a submit button, and just post content. This is the type that tends to fill up blog comments and end up in enquiry forms. (Hey Joe, we just got an enquiry for <a href=http://www.redtophats.com>buy red top hats</> – do you know them?)
The tailored are bots written specifically to target a specific type of software. And that’s the type that has been making the rounds of DNN sites since about May 2014. And boy, are they annoying or what? Specifically, this one registers a fake user account, then goes to update the user profile biography field, filling it with ‘Hey, here’s a great Red Top Hat shop where you can buy Red Top Hats’.
data:image/s3,"s3://crabby-images/b829f/b829f39ace7d1d34a2ed9d66ef065a75dc6c9b2d" alt=""
Pictured above is the type of payload delivered in a User Profile. It’s typical in that it has keyword anchor text.
With enough of these around, then the net affect of all those links from all those sites can help a site rank for a specific term. There is a problem for the spammers though – the first is that they usually are not very careful with crafting their payloads, and usually a lot of links are broken (an affected site will see plenty of 404 messages in the log with weird looking URLs as a result). In the above example – not one single link went to the intended location because the html wasn’t crafted properly. The second is that the Google anti-spam team have rolled out several updates to try and combat the practice by penalizing sites which have dubious links. But that still leaves the unfortunate site owner with a mess that needs cleaning up.
Identifying SPAM User Registrations in DNN
The difficulty with identifying fake users is sorting them from real users. If you don’t have any real users registering on your site, your job is that much easier because all users from a certain point onwards will be spam users (and you should switch off public registration is that is the case).
The key to identifying fake users is that they don’t behave like ‘real’ users. Real users log in more than once, they fill out their profile fields in random ways, and they use real names in different ways. The SPAM created users generally use lists of emails of real people to register on the site, so that they look like real people to the incurious administrator.
I hesitate to proscribe a one size fits all solution to identifying fake users because every site has a different set of users. Keeping that in mind, here are some queries I have used to identify fake users in SPAM affected DNN sites. I advise running these through your Host->SQL page (or direct in a SQL query tool, if you use one). See if they are identifying only fake users, and not catching real users in the queries. Then I’ll show how you can use these to delete the fake ones.
First Query – Identify Users which have never logged in, and where the DisplayName isn’t two words.
select top 100 *
from {databaseOwner}{objectQualifier}Users
where (LastIPAddress = '' or LastIPAddress is null)
and IsDeleted = 0
and DisplayName not like ('% %')
and CreatedOnDate > '2014-04-01'
order by UserId desc
I have found that SPAM registered users use a single-word Display Name like ‘JoeBloggs’ instead of Joe Bloggs. Because this is a pattern, it makes it easier to identify. The other part of this query is identifying users created after a specific date, which is when the site started getting affected by registration bots.
Note that I have a restriction on the number of rows returned – this is because I use these queries to see if I am capturing only fake users, and I don’t want to return more rows than I could possibly look through.
Second Query – Identify Users with a big chunk of Html in their profile
select top 100 *
from {databaseOwner}{objectQualifier}Users u inner join {databaseOwner}{objectQualifier}UserProfile up on u.UserId = up.UserId and PropertyDefinitionId = 36
where u.UserId in (
select UserId from {databaseOwner}{objectQualifier}UserProfile
where PropertyDefinitionId = 36
and PropertyValue like '<%')
and datediff(d,LastModifiedOnDate, CreatedOnDate) > -7
This looks for encoded Html in the User Profile field of Property Definition ID ‘36’ – which in this site is the profile field for the biography. You may want to check your profile definitions to see which one is your biography profile feed – not all DNN installs will have the same ID of 36.
The second part of it looks for a difference between the LastModifiedOnDate – which is a proxy for the last login – and the CreatedOnDate, which is the date it was created. This separates people in your user list who have a ‘real’ biography – who are likely to have logged on more than a few times – with those who logged on once, updated their profile and never came back.
Take a look through the results. You’ll see that the user profile feed is encoded Html, so you can find an online html decoder to see the result.
data:image/s3,"s3://crabby-images/9e7c8/9e7c8ff7df97a0f28edc0e2adf03fd108d1e783a" alt="Html in User Profile Table"
Copying / pasting one of these values into an online Html decoder shows you what the spammers are up to:
data:image/s3,"s3://crabby-images/115fa/115fa070b15ee1189ecc61755db8f751f5a0cafe" alt=""
However, typically, after it gets posted to a DNN User Profile, the Html is messed up and the links, formatting and images don’t work:
data:image/s3,"s3://crabby-images/e6522/e65223858e9e87764312c919de304516e79373ab" alt=""
If they actually took care with this stuff, they’d be twice as effective. Being malicious and sloppy just makes it all the more frustrating.
With these two queries, I found I could identify about 90% of the fake users without catching any of the ‘real’ users in the net. In truth, it’s likely that one or two real users got caught in the trap. It all depends on how important your user records are (is it a blogging site where people register to comment, or a paid membership site?)
There are other methods identified as well – see Sebastian Leupolds blog ‘Getting rid of SPAM users’. I suggest trying a few different queries out, and being creative with the fields in the User table and UserProfile tables until you can reliably identify fake users by SQL query. Keep it simple by writing multiple queries rather than trying to write one query to rule them all.
Deleting Fake Users
Deleting users can either be done through the DNN Administrative interface – if you have 1 or 2 – but if you have 10, 20 or 100,000 fake users, then it’s going to take some batch database updates.
What you need to do is set the IsDeleted column in the UserPortals table to ‘1’. This ‘soft’ deletes the user, so that the account is disabled and the profile is no longer available.
Here’s the accompanying Update statements to match the above queries.
First Query
update {databaseOwner}{objectQualifier}UserPortals
set IsDeleted = 1
where UserId in (
select u.UserId from {databaseOwner}{objectQualifier}UserProfile up
inner join {databaseOwner}{objectQualifier}Users u
on u.UserId = up.UserId
where PropertyDefinitionId = 36
and PropertyValue like '<%'
and datediff(d,LastModifiedOnDate, CreatedOnDate) > -7)
Second Query
update {databaseOwner}{objectQualifier}UserPortals
set IsDeleted = 1
where UserId in (select UserId from {databaseOwner}{objectQualifier}Users
where (LastIPAddress = '' or LastIPAddress is null)
and IsDeleted = 0
and DisplayName not like ('% %')
and CreatedOnDate > '2014-04-01'
A soft delete is used because it’s easy to reverse if you have problems or inadvertently delete a real user. The accounts cannot be used or seen once they are soft deleted.
Completely Removing Deleted Users
If you’ve soft deleted your fake users, and haven’t had any problems with real users not appearing, then you might want to completely remove the users. The DNN Admin interface provides a way to do this – but, again, if you have 100,000 of these fake accounts to delete, then you’ll need a batch process. The problem with just deleting the users from the database is that it leaves behind the user folders in the Portal directory. If you had a bad infection of SPAM users, you’ll have 10s of thousands of empty user folders. This can cause delays in creating backups, make it slow to navigate the site, and cause other issues. Unfortunately, you can’t easily identify the folders by user id, because the folder naming uses an algorithm to ensure that the folder hierarchy grows organically.
The only way to do it properly is to use the DNN API itself to delete the user, which cleans up the database record correctly, deletes the folders and completely removes the user safely from the site.
I wrote a small utility to achieve this, called Dnn Bulk User Delete. This is a two-part install, where one part is installed to your DNN site, and the other is run from your local machine. The DNN extension is a library which calls the correct API and exposes a WebAPI endpoint. The console is a simple Windows App that calls the WebAPI endpoint on your site to delete users. It’s designed to repetitively delete small batches of users so that the deletion doesn’t overwhelm the database and file system of your site – which it will if you try and delete 100,000 users at once.
You can obtain the Bulk User Delete from here : https://github.com/brucerchapman/DnnBulkUserDelete – just download the Console App and Bulk User Delete install package from the 01.00.00_Release folder. The Wiki for the utility is here: https://github.com/brucerchapman/DnnBulkUserDelete/wiki
Steps to use the Dnn Bulk User Delete Utility
- Obtain the console and Dnn install packages from the Dnn Bulk User Delete Github repository.
- Install the Dnn.BulkUserDelete_01.00.00_Install.zip into the affected Dnn site by using the Host->Extensions page
- Extract the Console program from the Dnn.BulkUserDelet_01.00.00_Console.zip archive and put it on your local machine somewhere, then run it
- You’ll see the console as below:
data:image/s3,"s3://crabby-images/29d67/29d67da913960f77146dbf8918415cd116bbe8f6" alt=""
- Enter your site alias (domain name, or domain name and path if that is what the site alias is). This works with one single portal at a time. Enter your Host user and password. You can click on ‘Ping Host’ to check the authentication is correct. If not, try the ‘Ping Anon’ to see if you have basic connection to the site working.
- Use the ‘Check Soft Deleted Users’ button to ensure that everything is correct and that there are users to delete. This will identify the size of the task at hand. This is simply checking the Users in the portal for how many are soft deleted.
- Check the ‘Test Mode’ and try running a ‘Hard Delete 1 Batch’. This will do a mock run and see if everything is working, that the users can be identified and that everything is set up.
- When ready, uncheck the ‘Test Mode’ and do an actual ‘Hard Delete 1 Batch’. This will delete the users and the folders for those users.
- If the Hard delete 1 batch worked, set the batch size and the wait time, and click on ‘Start Deleting Batches’. The program will chew away deleting batches of users, and give you a predicted completion time, based on the speed of the last delete. This number jumps around a bit because different batches take different times.
- Leave it running until the very last fake SPAM user is gone from your site. You’ll see something like this:
data:image/s3,"s3://crabby-images/b2278/b227803b263c03ba62f7a281e905772e82918c61" alt=""
The option for ‘Fast Delete’ does a direct delete of the user folder, instead of calling the DNN Folder delete. This is because the DNN Folder delete also updates the cache and resynchs, which can take much longer. The Fast Delete directly deletes the folders from the server. When completed, you’ll need to re-synchronize the DNN File System in the File Manager. You’ll also notice that ‘Parent’ folders are cleaned up as users are deleted – this ensures that the Users folder hierarchy shrinks back down as you remove users – so that you don’t end up with a messy set of parent folders with no subfolders. In theory, if you had 100,000 spam users and only two legitimate users, you’d go from 200,00+ folders back to about 4.
It’s not very pretty, and the interface is JSON results into a plain text field, but it does work and I have used it to clean up some very bad infestations of SPAM users. Please try it out and see if it works for you – it’s a cleaner and more robust way of deleting users than attacking it piecemeal with direct database and folder deletions.
If you have other tips for identifying SPAM users, please share via the comments.