I’m trying to write a little WordPress plugin to support some migrated content.
The syntax highlighter expects (for proper highlighting):
<pre lang='something'>
<code>
The code...
</code>
</pre>
However, my markdown code has the following:
<pre>
<code>
:::something
The code...
</code>
</pre>
I think you can see where this is going. What I want to achieve is this:
- :::something should be removed, and the <pre> tag should be updated to <pre lang="something">.
- If :::something does not exist, the <pre> tag should be <pre lang="plain">.
- There may be multiple occurrences per page that need to be updated.
What would a PHP function achieving the above look like?
function set_syntax_lang($content) {
// Do stuff here
return $new_content;
}
What I gathered so far is this regex:
/<pre.*>\s*<code>\s*:::(\w)/
Using preg_match, this even yields the actual syntax indicator (something), but I don’t know how to update the <pre> tag correctly.
It’s been a very long time since I coded PHP and regexes are not really my strong suit. So all help is appreciated.
Finding :::something
This is an edge case. Normally, though, I would advise you NOT to use regex for HTML (bobince, anyone?).
Also, next time try to be less verbose in your question. It took me longer to read it than to write this answer.
Finding code without :::something
Fixing
<code>
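The two steps named above (finding blocks that start with :::something, and finding blocks that do not) can be sketched as two regex passes. This is only a minimal sketch: the function name set_syntax_lang_two_pass is made up for illustration, and both patterns assume a bare <pre> tag with no attributes directly followed by <code>, as in the question.

```php
<?php
// Hypothetical helper name; assumes bare <pre> tags (no attributes),
// each directly followed by <code>, as in the question's markup.
function set_syntax_lang_two_pass($content) {
    // Pass 1: blocks WITH a ":::lang" marker.
    // Group 1 keeps the whitespace and the <code> tag, group 2 is the language.
    // The marker and the rest of its line are dropped.
    $with = '/<pre>(\s*<code>\s*):::(\w+)[ \t]*\R?/';
    $content = preg_replace($with, '<pre lang="$2">$1', $content);

    // Pass 2: blocks WITHOUT a marker. The negative lookahead skips any
    // <code> whose first non-whitespace text is ":::lang".
    $without = '/<pre>(\s*<code>)(?!\s*:::\w)/';
    $content = preg_replace($without, '<pre lang="plain">$1', $content);

    return $content;
}
```

Because preg_replace replaces every match, multiple blocks per page are handled in the same call. Blocks that already carry a lang attribute are left alone, since neither pattern matches a <pre> tag with attributes.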
You answered most of your question in the steps you gave. Break it down into those chunks: FIRST see if you have :::something, THEN update your <pre> tag, and REPEAT.
You’ll have a much easier time of it if you use the DOM instead of regex. It will make the job of navigating through the <pre> and <code> tags very simple. As has been said many, many times here, HTML is not a regular language, so a regular expression cannot parse it correctly. Even for a limited subset of HTML, it’s really not the right tool. The regex for :::something is trivial once you use the DOM to get the text between <code> and </code>:
/:::(\w+)/
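A rough sketch of that DOM approach, using PHP's bundled DOMDocument class, might look like the following. The function name set_syntax_lang_dom and the wrapper <div> are assumptions made for illustration. The sketch also assumes the content is a UTF-8 HTML fragment, and that it is acceptable for libxml to normalise the markup slightly when it is written back out.

```php
<?php
// Hypothetical name; assumes the DOM extension is enabled and that
// $content is a UTF-8 HTML fragment.
function set_syntax_lang_dom($content) {
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one known <div>; the XML declaration is a common
    // trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $content . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('pre') as $pre) {
        $code = $pre->getElementsByTagName('code')->item(0);
        if ($code === null) {
            continue; // <pre> without <code>: leave untouched
        }

        $lang = 'plain';
        $first = $code->firstChild;
        // Look for ":::lang" at the very start of the code text.
        if ($first instanceof DOMText
            && preg_match('/^(\s*):::(\w+)[ \t]*\R?/', $first->data, $m)) {
            $lang = $m[2];
            // Drop the marker line, keep any leading whitespace.
            $first->data = $m[1] . substr($first->data, strlen($m[0]));
        }
        $pre->setAttribute('lang', $lang);
    }

    // Serialise only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Every <pre> element in the fragment is visited, so multiple occurrences per page are covered. Unlike the regex route, this also copes with <pre> tags that already carry other attributes.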
First of all, some points I came across:
- According to your question, there is never a space in there if you make use of :::something. But you add it into your regex. I wonder why.
- If the language specifier is longer than one character (which I assume), you must write that into the regex, like \w+ for one or more word characters.
The rest looks like you already have everything. Probably not the replacement:
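A minimal sketch of such a replacement, filling in the set_syntax_lang stub from the question with preg_replace_callback. It assumes a bare <pre> tag (no attributes) directly followed by <code>, and it treats the :::something marker as an optional group so that one pass covers both cases.

```php
<?php
function set_syntax_lang($content) {
    // Group 1: whitespace plus the <code> tag, kept as-is.
    // Optional non-capturing group: ":::lang" and the rest of its line;
    // group 2 inside it is the language name (\w+, not just \w).
    // Assumes a bare <pre> tag, as in the question.
    $pattern = '/<pre>(\s*<code>\s*)(?::::(\w+)[ \t]*\R?)?/';

    $new_content = preg_replace_callback($pattern, function ($m) {
        // $m[2] is absent when the block has no ":::lang" marker.
        $lang = $m[2] ?? 'plain';
        return '<pre lang="' . $lang . '">' . $m[1];
    }, $content);

    return $new_content;
}
```

In a WordPress plugin this would typically be hooked onto the content with something like add_filter('the_content', 'set_syntax_lang'), though where exactly to hook it depends on when the syntax highlighter runs.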
Hopefully this helps.