I am looking for a regular expression (PHP) to find and replace some words in a web page. However, it must not replace words inside every HTML tag: replacement is allowed only in italic <i>, bold <b>, and plain text.
Example:
word: “hello” (case insensitive)
<a href="#">Hello</a> im a writer that i like to say hello everyday. <b>Hello</b> Spiderman.
Replacements: the word inside the anchor must not be replaced; only the plain-text hello and <b>Hello</b> can be replaced.
I tested some regular expressions, but none of them works properly:
1) From SMART SEO LINKS (WP plugin):
$reg = '/(?!(?:[^<\[]+[>\]]|[^>\]]+<\/a>))\b($word)\b/Imsu';
It doesn’t work well: sometimes it deletes the content and puts the symbol “>” there instead.
I made some modifications to this regexp, removing “?!” or “?:” (I don’t know what they mean), but then it stopped working.
2) Others I have tried:
$reg = "/<([\w]+)[^>]*>\b('.$word.')\b<\/\1>/Imsu";
$reg = '/<+\s*\/\s\b('.$word.')\b[^>]\/\s>+/I';
These do not replace anything.
$reg = '/<(\w+)[^>]*>\b('.$name.')\b<\/\1>/Imsu';
This one sometimes works.
The truth is that I’m not a regexp expert. I spent a few days testing and trying to create a new regexp, but I haven’t gotten the results I need.
The replacement will be used in a WP plugin, where it sometimes affects the template or other plugins, or the DOM isn’t well formed.
Does anyone have any idea why these don’t work correctly? Thanks.
Try a combination of these patterns
Example
Result
Notes to your question
Explanation
See the link about assertions above: the negative lookbehind assertion ?<! cannot be used to match <a href="#">, because the tag is not of fixed length and causes a compile error. Therefore I used the negative lookahead assertion ?! to match </a> after hello. The brackets at the beginning and end include any surrounding HTML tag, so every occurrence is replaced except those followed by </a>.
The trick to avoid replacing hello inside tags is to replace those occurrences with some unique string (say !X!), then do the original replacement, then replace !X! back with hello. It may not be the best solution, but it works.
Why your regexps didn’t work
You used the /I modifier (at the end of your pattern). Modifiers are case-sensitive; /i means case-insensitive evaluation, see the list of modifiers. I believe the \b (word boundary) in your patterns is redundant.
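
The placeholder trick from the explanation can be sketched roughly as below. This is only a minimal illustration, not necessarily the pattern combination used in the answer: instead of a lookahead it stashes whole <a>...</a> elements and all remaining tags before replacing, and it uses the lowercase i modifier. The function name replace_outside_anchors, the $stash array and the !X0!-style placeholder format are made up for this example, and the sketch assumes the page is valid UTF-8 and does not already contain text that looks like a placeholder. Like the answer, it replaces the word everywhere outside anchors, not only in <i> and <b>.

```php
<?php
// Hypothetical helper (name and placeholder format are assumptions):
// replace $word with $replacement everywhere except inside <a>...</a>
// elements and inside the tags themselves (e.g. attribute values).
function replace_outside_anchors(string $html, string $word, string $replacement): string
{
    $stash = [];

    // Step 1: swap every anchor element and every remaining tag for a
    // unique placeholder, so the word inside them cannot be matched.
    $masked = preg_replace_callback(
        '~<a\b[^>]*>.*?</a>|<[^>]+>~is',
        function (array $m) use (&$stash): string {
            $key = '!X' . count($stash) . '!';
            $stash[$key] = $m[0];
            return $key;
        },
        $html
    );

    // Step 2: do the real replacement. Note the lowercase "i" modifier
    // for case-insensitive matching; "u" treats the input as UTF-8.
    $pattern = '~\b' . preg_quote($word, '~') . '\b~iu';
    $masked = preg_replace($pattern, $replacement, $masked);

    // Step 3: put the stashed fragments back unchanged.
    return $stash ? strtr($masked, $stash) : $masked;
}

$html = '<a href="#">Hello</a> im a writer that i like to say hello everyday. <b>Hello</b> Spiderman.';
echo replace_outside_anchors($html, 'hello', 'Hi'), "\n";
// prints: <a href="#">Hello</a> im a writer that i like to say Hi everyday. <b>Hi</b> Spiderman.
```

Stashing complete fragments rather than only the word keeps step 2 simple, at the cost of assuming reasonably well-formed anchors. If the markup is badly broken, a DOM parser may be a safer choice than regular expressions.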