I’m working on a WordPress plugin that replaces bad words in comments with random replacement words from a list.
I now have 2 arrays: one containing the bad words and another containing the good words.
$bad = array("bad", "words", "here");
$good = array("good", "words", "here");
Since I’m a beginner, I got stuck at some point.
In order to replace the bad words, I’ve been using:
$newstring = str_replace($bad, $good, $string);
My first problem is that I want to turn off case sensitivity, so that I don’t have to list every variant like "bad", "Bad", "BAD", "bAd", "BAd", etc.
However, I need the new word to keep the format of the original word. For example, if I write “Bad”, it should be replaced with “Words”, but if I type “bad”, it should be replaced with “words”.
My first thought was to use str_ireplace, but it loses track of whether the original word had a capital letter.
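One way to keep the original word's format after a case-insensitive match is to copy its case pattern onto the replacement. The sketch below assumes "format" means one of three patterns: all lowercase, ALL UPPERCASE, or Capitalised. Mixed case such as "bAd" falls back to lowercase. The helper name match_case is hypothetical.

```php
<?php
// Hypothetical helper: apply the case pattern of $original to $replacement.
// Assumption: only three patterns matter; anything else falls back to lowercase.
function match_case(string $original, string $replacement): string
{
    if ($original === strtoupper($original)) {
        // "BAD" -> "WORDS"
        return strtoupper($replacement);
    }
    if ($original === ucfirst(strtolower($original))) {
        // "Bad" -> "Words"
        return ucfirst(strtolower($replacement));
    }
    // "bad" (and mixed case such as "bAd") -> "words"
    return strtolower($replacement);
}

echo match_case('Bad', 'words'), "\n"; // Words
echo match_case('bad', 'Words'), "\n"; // words
echo match_case('BAD', 'words'), "\n"; // WORDS
```

You could call this from any replacement callback that receives the matched bad word.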
The second problem is that I don’t know how to deal with users who type like this: “b a d”, “w o r d s”, etc. I need an idea.
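For the “b a d” case, one option is to allow optional whitespace between the letters of each bad word when building a regex. This is a minimal sketch that assumes users only split words with spaces, not with dots or dashes. The helper name spaced_pattern is hypothetical. Be aware that this widens false positives: a pattern like b\s*a\s*d also matches across innocent words, as in "verb adverb".

```php
<?php
// Hypothetical helper: turn "bad" into the fragment "b\s*a\s*d",
// so "bad", "b a d" and "b  a  d" all match.
function spaced_pattern(string $word): string
{
    $letters = str_split($word); // byte-wise split; assumes an ASCII word list
    $escaped = array_map(function ($c) {
        return preg_quote($c, '/');
    }, $letters);
    return implode('\s*', $escaped);
}

$bad = array("bad", "words");
// "i" makes the whole alternation case-insensitive.
$pattern = '/' . implode('|', array_map('spaced_pattern', $bad)) . '/i';

echo preg_replace($pattern, '***', "This is b a d and W o R d S"), "\n";
// This is *** and ***
```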
In order to make it select a random word, I think I can use:
$new = $good[rand(0, count($good)-1)];
$newstring = str_replace($bad, $new, $string);
If you have a better idea, I’m here to listen.
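Note that $new is picked only once, so every bad word in the string gets the same replacement. To pick a fresh word for each occurrence, you can use a callback that runs once per match. This is a minimal sketch using preg_replace_callback and array_rand. The word lists are made up, and the sketch does not check word boundaries or preserve case.

```php
<?php
$bad  = array("bad", "words");
$good = array("good", "nice", "kind"); // example replacement list

// Build one case-insensitive alternation, escaping regex metacharacters.
$quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $bad);
$pattern = '/' . implode('|', $quoted) . '/i';

// The callback runs once per match, so each occurrence gets its own random word.
$newstring = preg_replace_callback($pattern, function ($m) use ($good) {
    return $good[array_rand($good)];
}, "I see bad words coming!");

echo $newstring, "\n";
```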
The general look of my script:
function noswear($string)
{
    if ($string)
    {
        $bad = array("bad", "words");
        $good = array("good", "words");
        $newstring = str_replace($bad, $good, $string);
        return $newstring;
    }
}
echo noswear("I see bad words coming!");
Thank you in advance for your help!
Precursor
There are (as has been pointed out in the comments numerous times) gaping holes for you, and/or your code, to fall into when implementing such a feature.
You’d do better to implement a moderation/flagging system where people can flag offensive comments which can then be edited/removed by mods, users, etc.
On that understanding, let us proceed…
Solution
Given that you have:
$bad_words, an array of bad words
$good_words, an array of good words
you can very easily use PHP’s preg_replace_callback function.
Okay, so what the code does is build a single regex pattern out of all of the bad words. preg_replace_callback then runs that pattern against the string and passes each match to a callback. Each match contains the bad word together with the boundaries on either side of it.
The i modifier makes the pattern case-insensitive, so both bad and Bad would match.
The function replace_words then takes the matched word and its boundaries (either empty or a whitespace character) and replaces them with the same boundaries and a random good word.
Anonymous function
You could rewrite the above as a one-liner by passing an anonymous function directly to preg_replace_callback.
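This is a sketch of that anonymous-function form, written over a few lines for readability. It assumes the boundaries are captured as groups 1 and 3 around the word in group 2. The closure imports $good_words with "use" and puts the captured boundaries back around a random good word. The word lists and input string are made up.

```php
<?php
$bad_words  = array("bad", "words");
$good_words = array("good", "nice");
$string     = "Bad day, no words";

// Pattern and callback inlined: (^|\s) word (\s|$), case-insensitive.
$result = preg_replace_callback(
    '/(^|\s)(' . implode('|', array_map(function ($w) { return preg_quote($w, '/'); }, $bad_words)) . ')(\s|$)/i',
    function ($m) use ($good_words) { return $m[1] . $good_words[array_rand($good_words)] . $m[3]; },
    $string
);

echo $result, "\n"; // e.g. "nice day, no good" (random)
```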
Function wrapper
If you’re going to use it multiple times, you may also write it as a self-contained function. In that case you’ll most likely want to pass the good/bad words into the function when calling it (or hard-code them in there permanently), but that depends on how you derive them…
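This is a minimal sketch of such a wrapper, with the word lists passed in as parameters. The function name replace_bad_words is hypothetical. Because the pattern requires start/end of string or whitespace around the word, words that merely contain a bad word (like "Badminton") are left alone.

```php
<?php
// Hypothetical wrapper: the word lists are parameters rather than hard-coded.
function replace_bad_words(string $string, array $bad_words, array $good_words): string
{
    if ($bad_words === array() || $good_words === array()) {
        return $string; // nothing to filter, or nothing to replace with
    }
    $alternation = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, $bad_words));

    return preg_replace_callback(
        '/(^|\s)(' . $alternation . ')(\s|$)/i',
        function ($m) use ($good_words) {
            // Keep the captured boundaries, swap the word for a random good one.
            return $m[1] . $good_words[array_rand($good_words)] . $m[3];
        },
        $string
    );
}

echo replace_bad_words("I see bad things coming!", array("bad"), array("good")), "\n";
// I see good things coming!
```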
Output
Running the above functions consecutively with the input and word lists shown in the first example:
Of course, the replacement words are chosen randomly, so if I refreshed the page I’d get something else, but this shows what does and doesn’t get replaced.
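Because the replacements are random, counting matches is a deterministic way to see what does and doesn't get picked up. This sketch compares two patterns on a made-up string:

- a pattern that captures, and therefore consumes, the boundary characters;
- the same boundaries checked with lookarounds instead, which is a swapped-in technique and not necessarily what the code above does.

If the boundaries are consumed, two bad words separated by a single space cannot both match, because the first match uses up the shared space.

```php
<?php
$bad_words = array("bad", "words");
$alternation = implode('|', array_map(function ($w) {
    return preg_quote($w, '/');
}, $bad_words));

// Boundaries captured (and consumed) as part of each match.
$capturing = '/(^|\s)(' . $alternation . ')(\s|$)/i';

// Same boundaries checked with lookarounds, so nothing is consumed:
// (?<!\S) = not preceded by a non-space char, (?!\S) = not followed by one.
$lookaround = '/(?<!\S)(' . $alternation . ')(?!\S)/i';

$text = "Bad day, BAD words";
$n1 = preg_match_all($capturing, $text, $m1);
$n2 = preg_match_all($lookaround, $text, $m2);

echo $n1, ' ', $n2, "\n";         // 2 3
echo implode(', ', $m2[1]), "\n"; // Bad, BAD, words
```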
N.B.
Escaping $bad_words
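If a bad word contains characters with a special meaning in a regex (such as *, ., $ or the / delimiter), you should escape it with preg_quote before building the pattern. The second argument to preg_quote is the pattern delimiter. In this sketch the words are made-up examples. It shows that the unescaped "d*mn" silently matches the innocent word "column", because d* can match zero d's.

```php
<?php
$bad_words = array("d*mn", "a.s.s");

// Escape each word; passing '/' also escapes the delimiter used below.
$escaped = array_map(function ($w) { return preg_quote($w, '/'); }, $bad_words);

echo implode('|', $escaped), "\n"; // d\*mn|a\.s\.s

// Unescaped, "d*mn" means "any number of d's, then mn", which is found in "column".
echo preg_match('/' . implode('|', $bad_words) . '/', 'column'), "\n"; // 1
echo preg_match('/' . implode('|', $escaped) . '/', 'column'), "\n";   // 0
```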
Word boundaries \b
In this code I’ve used \b, \s, and ^ or $ as word boundaries, and there is a good reason for this. While white space, the start of the string, and the end of the string are all considered word boundaries, \b on its own will not match in all cases, for example before a bad word that starts with $.
This is because \b only matches at a position between a word character ([a-zA-Z0-9_]) and a non-word character (or the start/end of the string), and characters like $ don’t count as word characters.
Misc
Depending on the size of your word list, there are a couple of potential hiccups. From a system design perspective, it’s generally bad form to have huge regexes for a couple of reasons:
Given that the regex pattern is compiled by PHP, the first reason is negated. The second should be negated as well: if your word list is large, with a dozen permutations of each bad word, then I suggest you stop and rethink your approach (read: use a flagging/moderation system).
To clarify, I don’t see a problem with having a small word list to filter out specific expletives, as it serves a purpose: to stop users from having an outburst at one another. The problem comes when you try to filter out too much, including permutations. Stick to filtering common swear words, and if that doesn’t work then, for the last time, implement a flagging/moderation system.
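To make the word-boundary note from the N.B. section concrete, here is a small check on a made-up string. A \b-only pattern misses a bad word that starts with $. A whitespace/start-of-string boundary (written here as lookarounds, which is an illustrative choice) still finds it.

```php
<?php
$text = 'Oh $hit again'; // single quotes: $hit is literal text, not a variable

// \b needs a word character on one side; between " " and "$" there is none,
// so the leading \b cannot match and the word is missed.
$with_b = preg_match('/\b\$hit\b/i', $text);

// Whitespace or start/end of string as the boundary does match.
$with_ws = preg_match('/(?<!\S)\$hit(?!\S)/i', $text);

echo $with_b, ' ', $with_ws, "\n"; // 0 1
```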
I came up with this method and it’s working fine. It returns true if the entry contains any of the bad words.
Example:
Usage:
Since the word ‘bad’ is blacklisted, it will echo.
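Here is a minimal sketch of such a check, assuming a case-insensitive substring search with stripos. The function name has_bad_word is hypothetical and not necessarily how the method above is written. Note that a plain substring check also flags innocent words that contain a bad word, e.g. "badge" contains "bad".

```php
<?php
// Hypothetical: returns true if any blacklisted word occurs in $text (case-insensitive).
function has_bad_word(string $text, array $blacklist): bool
{
    foreach ($blacklist as $word) {
        if (stripos($text, $word) !== false) {
            return true;
        }
    }
    return false;
}

$blacklist = array("bad", "words");
if (has_bad_word("This is a BAD comment", $blacklist)) {
    echo "Blacklisted word found\n";
}
```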
EDIT 1:
As suggested by rid, it’s also possible to do a simple in_array check:
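Here is a sketch of the in_array variant. It splits the comment into words, lowercases them, and checks each one against the list, so "badge" no longer counts as "bad". The splitting rule (any run of characters other than letters and digits) and the name has_bad_word_exact are assumptions. The blacklist is assumed to be lowercase.

```php
<?php
// Hypothetical: whole-word check using in_array instead of a substring search.
function has_bad_word_exact(string $text, array $blacklist): bool
{
    // Split on anything that is not a letter or digit, dropping empty pieces.
    $words = preg_split('/[^a-z0-9]+/i', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (in_array($word, $blacklist, true)) {
            return true;
        }
    }
    return false;
}

var_dump(has_bad_word_exact("This is BAD!", array("bad")));    // bool(true)
var_dump(has_bad_word_exact("Show your badge", array("bad"))); // bool(false)
```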
EDIT 2:
As I promised, I came up with a slightly different idea: replacing bad words with good words, as you mentioned in your question. I hope it helps a bit; this is the best I can offer at the moment, as I’m not entirely sure what you’re trying to do.
Example:
1. Let’s combine an array with bad and good words into one
2. Your imaginary user input
3. Replacing bad words with good words
4. Getting the desired output
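The four steps above can be sketched with array_combine and strtr. The word lists and input are made up. Note that strtr is case-sensitive and ignores word boundaries, which is why "Bad" is left untouched here.

```php
<?php
// 1. Combine the bad and good words into one map: bad => good.
$bad  = array("bad", "words");
$good = array("good", "things");
$map  = array_combine($bad, $good);

// 2. Imaginary user input.
$input = "I see bad words coming! Bad words!";

// 3. Replace bad words with good ones (strtr is case-sensitive).
$output = strtr($input, $map);

// 4. Output: "Bad" was not replaced because of the case difference.
echo $output, "\n"; // I see good things coming! Bad things!
```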
EDIT 3:
Following the correct comment from Wrikken: I had completely forgotten that strtr is case-sensitive and that it’s better to respect word boundaries. I have borrowed the following example from the PHP manual page for strtr and modified it slightly.
Same idea as in my second edit, but case-insensitive: it checks for word boundaries and puts a backslash in front of every character that is part of the regular expression syntax:
1. Method:
2. An array with bad and good words
3. Replacement
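Here is a sketch of this approach. It assumes \b word boundaries, the i modifier for case-insensitivity, and preg_quote to escape regex syntax. The function name replace_words_ci is hypothetical, and this is not the manual's code. Each match is lowercased before the lookup, so the map keys are lowercased too; this assumes ASCII words. The replacement does not preserve the original case.

```php
<?php
// Hypothetical: case-insensitive, whole-word replacement driven by a bad => good map.
function replace_words_ci(string $text, array $map): string
{
    if ($map === array()) {
        return $text; // an empty alternation would match empty strings
    }
    // Lowercase the keys so a case-insensitive match can be looked up directly.
    $map = array_change_key_case($map, CASE_LOWER);

    // Escape regex syntax in every bad word, then join into one alternation.
    $alternation = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($map)));

    return preg_replace_callback('/\b(' . $alternation . ')\b/i', function ($m) use ($map) {
        return $map[strtolower($m[1])];
    }, $text);
}

echo replace_words_ci("Bad badge, BAD words", array("bad" => "good", "words" => "things")), "\n";
// good badge, good things
```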