What’s the most efficient way to go about writing a complex content filter?

A short while ago, I asked a fairly lengthy question regarding a content filter for my website. The post can be found here, please feel free to have a read at your own leisure.

I’ve accepted @DampeS8N ‘s answer because he answered well and sent me in the right direction.

Read More

Unfortunately, the outcome of said question was that the content filter I’m using is simply too inefficient to run in the context of my application.

Here’s the information:

  • I have ~2000+ glossary terms* and ~1200+ species profiles**
  • The titles (glossary entry) or scientific names (species profiles) of these posts make up the search terms for the filter
  • I’d like to filter the contents of my species profiles (preferably upon save, but it could be done with a cron job) to search for the above search terms and replace them with links to the relevant glossary entries or species profiles

* A glossary entry can be found here.

An example list of glossary entries may be, caudal fin, dorsal,
filter, etc.

** A species profile can be found here.

An example list of species may be, Apistogramma panduro, A.
panduro
, Dario dario, D. dario, Betta sp. 'Maha Chai' etc.

Here’s the problem:

  • My CMS is driven by WordPress. This isn’t specifically relevant to this post, other than to understand the structure of my species profiles.
  • My species profiles are made of up some basic information stored in the wp_posts table, then additional information stored in the wp_postmeta table.
  • Most of the information in my species profiles are stored in a number of meta fields, which are entries into the wp_postmeta table. This can be seen below.

http://www.seriouslyfish.com/species/puntius-sahyadriensis/

wp_postmeta

INSERT INTO `wp_postmeta` (`meta_id`, `post_id`, `meta_key`, `meta_value`) VALUES
(104395, 2288, 'genus', '<em>Puntius</em>'),
(104396, 2288, 'species', '<em>sahyadriensis</em>'),
(104397, 2288, 'family', 'Cyprinidae'),
(104398, 2288, 'common_names', ''),
(104399, 2288, 'distribution', '<a class="link_glossary" href="/glossary/e/endemic" rel="/glossary/e/endemic?hover=true">Endemic</a> to streams of the Yenna <a class="link_glossary" href="/glossary/r/river%20basin" rel="/glossary/r/river%20basin?hover=true">river basin</a> close to the city of Mahabaleshwar in the Western Ghats mountain range, Satara district, Maharashtra state, India.'),
(104400, 2288, 'habitat', 'The <a class="link_glossary" href="/glossary/r/river" rel="/glossary/r/river?hover=true">river</a> Yenna flows through lush evergreen forest meaning the hill streams in which the fish can be found are likely to be shaded by the forest canopy and dense <a class="link_glossary" href="/glossary/m/marginal" rel="/glossary/m/marginal?hover=true">marginal</a> vegetation. Substrates should be composed of boulders, smaller stones, <a class="link_glossary" href="/glossary/s/sand" rel="/glossary/s/sand?hover=true">sand</a> or <a class="link_glossary" href="/glossary/g/gravel" rel="/glossary/g/gravel?hover=true">gravel</a> with submerged tree roots around the margins and quieter areas in which fallen branches and leaf litter collect. As with similar members of the <a class="link_glossary" href="/glossary/g/genus" rel="/glossary/g/genus?hover=true">genus</a> it is likely to congregate in <a class="link_glossary" href="/glossary/b/backwater" rel="/glossary/b/backwater?hover=true">backwater</a> pools or deeper areas with lower flow.'),
(104402, 2288, 'max_size', 'Around 2.75"/7cm.'),
(104403, 2288, 'aquarium_size', 'It is an active <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> and a <a class="link_glossary" href="/glossary/t/tank" rel="/glossary/t/tank?hover=true">tank</a> measuring at least 36" x 12" x 12"/90cm x 30cm x 30cm/85 litres is needed to house a group.'),
(104404, 2288, 'maintenance', 'Choice of decor is not as critical as water quality and the amount of swimming-space provided. We suggest keeping it in a roomy, well-planted <a class="link_glossary" href="/glossary/a/aquarium" rel="/glossary/a/aquarium?hover=true">aquarium</a> or alternatively it would look superb in a set-up designed to resemble a flowing <a class="link_glossary" href="/glossary/r/river" rel="/glossary/r/river?hover=true">river</a> with a <a class="link_glossary" href="/glossary/s/substrate" rel="/glossary/s/substrate?hover=true">substrate</a> of variably-sized rocks and <a class="link_glossary" href="/glossary/g/gravel" rel="/glossary/g/gravel?hover=true">gravel</a> and some large water-worn boulders. A rivertank manifold could also be constructed to provide naturalistic unidirectional flow. The <a class="link_glossary" href="/glossary/t/tank" rel="/glossary/t/tank?hover=true">tank</a> can be further furnished with driftwood branches and <a class="link_glossary" href="/glossary/a/aquatic" rel="/glossary/a/aquatic?hover=true">aquatic</a> plants for aesthetic value. While the vast majority of plant <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> will fail to thrive in such conditions possibilities include hardy <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> such as Java fern, <em>Bolbitis</em> or <em>Anubias</em> <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> which can be grown attached to the decor. Like many other <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> that hail from pristine natural environments it is intolerant to the accumulation of <a class="link_glossary" href="/glossary/o/organic" rel="/glossary/o/organic?hover=true">organic</a> wastes and requires spotless water at all times in order to thrive.'),
(104405, 2288, 'water_chemistry', '<strong>Temperature</strong>: Prefers slightly cool conditions within the range 20 - 24°C/68 - 75°C. Higher temperatures are known to stimulate spawning with an associated increase of aggression in males.rnrn<strong>pH</strong>: 6.8 - 7.8rnrn<strong>Hardness</strong>: 5 - 15°H'),
(104406, 2288, 'diet', 'Likely to feed on small invertebrates, <a class="link_glossary" href="/glossary/a/algae" rel="/glossary/a/algae?hover=true">algae</a> and other <a class="link_glossary" href="/glossary/z/zooplankton" rel="/glossary/z/zooplankton?hover=true">zooplankton</a> in nature. In the <a class="link_glossary" href="/glossary/a/aquarium" rel="/glossary/a/aquarium?hover=true">aquarium</a> it will accept dried foods of a suitable size but should not be fed these exclusively. Daily meals of small live and frozen fare such as <em><a class="link_glossary" href="/glossary/D/Daphnia" rel="/glossary/D/Daphnia?hover=true">Daphnia</a></em>, <em><a class="link_glossary" href="/glossary/A/Artemia" rel="/glossary/A/Artemia?hover=true">Artemia</a></em> and suchlike will result in the best colouration and encourage the fish to come into breeding condition.'),
(104407, 2288, 'behaviour', 'Not an aggressive fish but best kept with other hillstream-dwelling <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> such as <em>Danio</em>, <em>Devario</em>, other small <em>Puntius</em>, <em>Garra</em> and balitorid loaches. That said provided its oxygen and temperature requirements can be met it can be mixed with most peaceful fish too large to be considered food. A <a class="link_glossary" href="/glossary/b/biotope" rel="/glossary/b/biotope?hover=true">biotope</a>-style <a class="link_glossary" href="/glossary/c/community" rel="/glossary/c/community?hover=true">community</a> based around <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> from hill streams of the Western Ghats would also make an interesting project with <em><a class="link_species" href="/species/puntius-filamentosus" rel="/species/puntius-filamentosus/?hover=true">Puntius filamentosus</a></em>, <em><a class="link_species" href="/species/puntius-fasciatus" rel="/species/puntius-fasciatus/?hover=true">P. fasciatus</a></em>, <em><a class="link_species" href="/species/puntius-narayani" rel="/species/puntius-narayani/?hover=true">P. narayani</a></em>, <em><a class="link_species" href="/species/puntius-ticto" rel="/species/puntius-ticto/?hover=true">P. ticto</a></em>, <em>Barilius bakeri</em>, <em>B. canarensis</em>, <em><a class="link_species" href="/speciesario-aequipinnatus" rel="/speciesario-aequipinnatus/?hover=true">Devario aequipinnatus</a></em>, <em><a class="link_species" href="/speciesario-malabaricus" rel="/speciesario-malabaricus/?hover=true">D. malabaricus</a></em>, <em>Rasbora daniconius</em>, <em>Laubuca laubuca</em>, <em>Nemacheilus rupelli</em>, <em>Mesonemacheilus triangularis</em> and <em><a class="link_species" href="/species/mesonoemacheilus-guentheri" rel="/species/mesonoemacheilus-guentheri/?hover=true">M. guentheri</a></em> among the numerous suitable <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> available in the trade at times.rnrnIt''s a shoaling <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> by nature and really should be kept in a group of at least 8-10 specimens. Maintaining it in decent numbers will not only make the fish less nervous but will result in a more effective, natural-looking display while allowing individuals some respite from the vigorous alpha male(s). Males will also display their best colours and some interesting behaviour as they compete with one other for female attention. In particular the dominant individual in a given group will develop some stunning colouration.'),
(104408, 2288, 'dimorphism', 'The male is noticeably slimmer and more brightly coloured than the female especially when the fish are in <a class="link_glossary" href="/glossary/s/spawning" rel="/glossary/s/spawning?hover=true">spawning</a> condition. Most notably the body colouration is more intense, <a class="link_glossary" href="/glossary/v/ventral" rel="/glossary/v/ventral?hover=true">ventral</a> fins tipped with white, other finnage redder and prominent tubercules develop around the <a class="link_glossary" href="/glossary/s/snout" rel="/glossary/s/snout?hover=true">snout</a> and head in sexually <a class="link_glossary" href="/glossary/m/mature" rel="/glossary/m/mature?hover=true">mature</a> specimens.'),
(104409, 2288, 'reproduction', 'We''re not sure if it has been bred in the hobby although it should certainly be possible. Like most cyprinids this <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> is an <a class="link_glossary" href="/glossary/e/egg" rel="/glossary/e/egg?hover=true">egg</a>-scattering, continuous spawner that exhibits no <a class="link_glossary" href="/glossary/p/parental%20care" rel="/glossary/p/parental%20care?hover=true">parental care</a>. That is to say when the fish are in good condition they will <a class="link_glossary" href="/glossary/s/spawn" rel="/glossary/s/spawn?hover=true">spawn</a> often and in a well-furnished, <a class="link_glossary" href="/glossary/m/mature" rel="/glossary/m/mature?hover=true">mature</a> <a class="link_glossary" href="/glossary/a/aquarium" rel="/glossary/a/aquarium?hover=true">aquarium</a> it is feasible that small numbers of <a class="link_glossary" href="/glossary/f/fry" rel="/glossary/f/fry?hover=true">fry</a> may start to appear without human intervention.rnrnHowever if you want to increase the yield of <a class="link_glossary" href="/glossary/f/fry" rel="/glossary/f/fry?hover=true">fry</a> a slightly more controlled approach is required and we suggest using an approach that has proven successful for similar members of the <a class="link_glossary" href="/glossary/g/genus" rel="/glossary/g/genus?hover=true">genus</a> as a starting point. The adult group can still be conditioned together but one or more small, say 18" x 10" x 10"/45cm x 25cm x 25cm/29.5 <a class="link_glossary" href="/glossary/l/litre" rel="/glossary/l/litre?hover=true">litre</a> containers should also be set up and half-filled with water. These should be very dimly lit with the <a class="link_glossary" href="/glossary/b/base" rel="/glossary/b/base?hover=true">base</a> either left bare or covered with some kind of mesh of a large enough grade so that any eggs that fail to adhere to the plant can pass through but small enough so that the adults cannot reach them. The widely available plastic ''grass''-<a class="link_glossary" href="/glossary/t/type" rel="/glossary/t/type?hover=true">type</a> matting can also be used and works very well. A decent-sized clump of Java moss or other fine-leaved plant should also be added filling perhaps half the available space. The water should be around <a class="link_glossary" href="/glossary/n/neutral" rel="/glossary/n/neutral?hover=true">neutral</a> <a class="link_glossary" href="/glossary/p/pH" rel="/glossary/p/pH?hover=true">pH</a>, <a class="link_glossary" href="/glossary/G/GH" rel="/glossary/G/GH?hover=true">gH</a> &lt;8, with a slightly raised temperature of 75 - 80°F. A small air-powered <a class="link_glossary" href="/glossary/s/sponge%20filter" rel="/glossary/s/sponge%20filter?hover=true">sponge filter</a> bubbling away very gently is all that is needed in terms of filtration.rnrnWhen the adult fish are well-conditioned and the females appear full of eggs a single <a class="link_glossary" href="/glossary/p/pair" rel="/glossary/p/pair?hover=true">pair</a> should then be introduced to each container. If conditions are to their liking they should <a class="link_glossary" href="/glossary/s/spawn" rel="/glossary/s/spawn?hover=true">spawn</a> the following morning. Be sure to provide plenty of cover for the female as the male may be quite aggressive in his pursuit of her. In some cases she might even require a period of post-<a class="link_glossary" href="/glossary/s/spawning" rel="/glossary/s/spawning?hover=true">spawning</a> rehabilitation in a <a class="link_glossary" href="/glossary/t/tank" rel="/glossary/t/tank?hover=true">tank</a> that does not contain any males.rnrnThe adults will eat the eggs given the chance and should be removed as soon as any are noticed. Incubation in <em>Puntius</em> eggs is temperature-dependant to an extent but usually takes between 20 and 48 hours with the young free-swimming 24 to 48 hours later. Initial food should be <em>Paramecium</em> or similar introducing <em><a class="link_glossary" href="/glossary/A/Artemia" rel="/glossary/A/Artemia?hover=true">Artemia</a></em> <a class="link_glossary" href="/glossary/n/nauplii" rel="/glossary/n/nauplii?hover=true">nauplii</a> and/or <a class="link_glossary" href="/glossary/m/microworm" rel="/glossary/m/microworm?hover=true">microworm</a> once the <a class="link_glossary" href="/glossary/f/fry" rel="/glossary/f/fry?hover=true">fry</a> are large enough to accept them.'),
(104410, 2288, 'misc_notes', 'This beautiful <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> is not often seen in the hobby, presumably as a result of its limited distribution, and tends to command a relatively high price when available. It may be seen on sale under the trade names ''Maharaja'' or ''Khavli'' <a class="link_glossary" href="/glossary/b/barb" rel="/glossary/b/barb?hover=true">barb</a>.rnrnThe <a class="link_glossary" href="/glossary/g/genus" rel="/glossary/g/genus?hover=true">genus</a> <em>Puntius</em> is currently viewed as something of a catch-all for well over 100 <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> of small cyprinid. Most experts agree that a full <a class="link_glossary" href="/glossary/r/revision" rel="/glossary/r/revision?hover=true">revision</a> is required, with the likely outcome that many <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> will be placed into new or different <a class="link_glossary" href="/glossary/g/genera" rel="/glossary/g/genera?hover=true">genera</a>. When describing the <a class="link_glossary" href="/glossary/g/genus" rel="/glossary/g/genus?hover=true">genus</a> in 1822, Hamilton identified the defining characteristics as: "absence or presence of <a class="link_glossary" href="/glossary/m/maxillary" rel="/glossary/m/maxillary?hover=true">maxillary</a> only or <a class="link_glossary" href="/glossary/r/rostral" rel="/glossary/r/rostral?hover=true">rostral</a> and <a class="link_glossary" href="/glossary/m/maxillary" rel="/glossary/m/maxillary?hover=true">maxillary</a> barbels; <a class="link_glossary" href="/glossary/d/dorsal" rel="/glossary/d/dorsal?hover=true">dorsal</a> <a class="link_glossary" href="/glossary/f/fin" rel="/glossary/f/fin?hover=true">fin</a> with last <a class="link_glossary" href="/glossary/s/simple%20ray" rel="/glossary/s/simple%20ray?hover=true">simple ray</a> <a class="link_glossary" href="/glossary/s/serrate" rel="/glossary/s/serrate?hover=true">serrate</a> or entire, <a class="link_glossary" href="/glossary/b/branched%20rays" rel="/glossary/b/branched%20rays?hover=true">branched rays</a> usually 8; anal <a class="link_glossary" href="/glossary/f/fin" rel="/glossary/f/fin?hover=true">fin</a> with last <a class="link_glossary" href="/glossary/s/simple%20ray" rel="/glossary/s/simple%20ray?hover=true">simple ray</a> entire, <a class="link_glossary" href="/glossary/b/branched%20rays" rel="/glossary/b/branched%20rays?hover=true">branched rays</a> usually 5; <a class="link_glossary" href="/glossary/l/lateral%20line" rel="/glossary/l/lateral%20line?hover=true">lateral line</a> complete or incomplete, <a class="link_glossary" href="/glossary/l/lateral" rel="/glossary/l/lateral?hover=true">lateral</a>-line scales 17-36 in row; <a class="link_glossary" href="/glossary/c/cephalic" rel="/glossary/c/cephalic?hover=true">cephalic</a> <a class="link_glossary" href="/glossary/c/cutaneous" rel="/glossary/c/cutaneous?hover=true">cutaneous</a> <a class="link_glossary" href="/glossary/p/papillae" rel="/glossary/p/papillae?hover=true">papillae</a> minute or absent; <a class="link_glossary" href="/glossary/p/pharyngeal%20teeth" rel="/glossary/p/pharyngeal%20teeth?hover=true">pharyngeal teeth</a> in 3 rows, usually 2,3,5/5,3,2; colour pattern extremely variable." All the <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> currently in the <a class="link_glossary" href="/glossary/g/genus" rel="/glossary/g/genus?hover=true">genus</a> are <a class="link_glossary" href="/glossary/n/native" rel="/glossary/n/native?hover=true">native</a> to Southeast Asia, India and Sri Lanka.rnrnThe other main source of confusion with <em>Puntius</em> is that some authors do not recognise all the member <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> as such, rather following Walter Rainboth (1996) and preferring to place some into the alternative <a class="link_glossary" href="/glossary/g/genus" rel="/glossary/g/genus?hover=true">genus</a> <em>Systomus</em>. Rainboth proposed that <em>Systomus</em> should be reinstated (it was first erected in the 19th century) as a valid <a class="link_glossary" href="/glossary/g/genus" rel="/glossary/g/genus?hover=true">genus</a> on account of the fact that in its current state <em>Puntius</em> would seem to constitute a <a class="link_glossary" href="/glossary/p/polyphyletic" rel="/glossary/p/polyphyletic?hover=true">polyphyletic</a> grouping i.e. not all of its members appear to have descended from the same common ancestor. The defining characteristics of a <em>Systomus</em> <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> are (according to Rainboth) a <a class="link_glossary" href="/glossary/s/serrated" rel="/glossary/s/serrated?hover=true">serrated</a> (vs. smooth in <em>Puntius</em>) <a class="link_glossary" href="/glossary/d/dorsal" rel="/glossary/d/dorsal?hover=true">dorsal</a> <a class="link_glossary" href="/glossary/s/spine" rel="/glossary/s/spine?hover=true">spine</a>, the presence of 2 or 4 barbels (vs. always 2) and less than 12 <a class="link_glossary" href="/glossary/g/gill" rel="/glossary/g/gill?hover=true">gill</a> rakers (vs. 12-20). SF tentatively lists all <a class="link_glossary" href="/glossary/s/species" rel="/glossary/s/species?hover=true">species</a> as <em>Puntius</em> at present.'),
(167927, 2288, 'etymology', ''),
(104421, 2288, 'attached_media', 'a:4:{i:0;s:5:"22881";i:1;s:5:"22882";i:2;s:5:"22883";i:3;s:5:"22884";}'),
(150578, 2288, 'references', 'a:2:{i:0;a:5:{s:13:"ref_doc_title";s:67:"Assemblage structure of stream fishes in the Western Ghats (India).";s:12:"ref_pub_year";s:0:"";s:13:"ref_page_nums";s:5:"1-31.";s:14:"ref_publishers";s:17:"Hydrobiologia 430";s:11:"ref_authors";s:19:"Arunachalam M. 2000";}i:1;a:5:{s:13:"ref_doc_title";s:0:"";s:12:"ref_pub_year";s:0:"";s:13:"ref_page_nums";s:0:"";s:14:"ref_publishers";s:16:"www.fishbase.org";s:11:"ref_authors";s:0:"";}}'),
(167925, 2288, '_edit_lock', '1341244017:4'),
(167928, 2288, 'species_author', 'Silas'),
(167929, 2288, 'year_described', '1953'),
(167930, 2288, 'beginner_suitability', '3'),
(167931, 2288, 'type_of_fish', '2')

wp_posts

INSERT INTO `wp_posts` (`ID`, `post_author`, `post_date`, `post_date_gmt`, `post_content`, `post_title`, `post_excerpt`, `post_status`, `comment_status`, `ping_status`, `post_password`, `post_name`, `to_ping`, `pinged`, `post_modified`, `post_modified_gmt`, `post_content_filtered`, `post_parent`, `guid`, `menu_order`, `post_type`, `post_mime_type`, `comment_count`) VALUES
(2288, 4, '2012-03-13 13:24:32', '2012-03-13 13:24:32', '', 'Puntius sahyadriensis', 'This beautiful species is not often seen in the hobby, presumably as a result of its limited distribution, and tends to command a relatively high price when available. It may be seen on sale under the trade names 'Maharaja' or 'Khavli' barb.rnrnThe genus Puntius is currently viewed as something of a catch-all for well over 100 species of small cyprinid. Most experts agree that a full revision is required, with the likely outcome that many species will be placed into new or di...', 'publish', 'open', 'open', '', 'puntius-sahyadriensis', '', '', '2012-07-02 16:48:43', '2012-07-02 15:48:43', '', 0, 'http://www.seriouslyfish.com/?post_type=species&p=2288', 0, 'species', '', 0);

As you can see, the content of the species profiles is incredibly lengthy. Hopefully this will also show you the kind of link I’d like to use to get my pop-ups:

<a class="link_glossary" href="/glossary/n/native" rel="/glossary/n/native?hover=true">native</a>

My problem is this: the filter I’m running at the moment, the details of which can be found in the first linked post, or directly here for my filter.php file, simply isn’t efficient enough to work with this many database fields and this much information.

The filter did work when I first started using our new WordPress-driven CMS, as I ran it on a localhost WAMP install when I imported the data. It took nearly 40 minutes to run through all of the data.


My tl;dr question is this: how can I go about creating an efficient content filter, considering the quantity of data and number of database fields?

Related posts

Leave a Reply

3 comments

  1. Database Structure

    wp_postmeta (meta_id, post_id, meta_key, meta_value)
    wp_glossary (glossary_id, glossary_key, glossary_value)
    wp_relation (meta_id, glossary_id)
    

    wp_relation stores relation between wp_glossary and wp_postmeta. The following is example of a wp_postmeta’s entry:

    Maintenance
    Tank should be well planted with floating plants also used. An
    abundance of hiding places should be provided as this species likes to
    hide away during the day. Bogwood, rock caves and PVC piping are all
    suitable for this purpose. Sand should be used as substrate as the
    spiny eels often like to bury themselves. Dimmer lighting will
    encourage the fish to venture from its hiding places more often. A
    close-fitting hood is required as the eel can find its way through the
    smallest of gaps. Water flow should be fairly gentle as the fish
    mainly inhabits areas of still water in the wild.

    The following is example of wp_glossary’s entries

    glossary_id | glossary_key | glossary_value
             101   tank           ......
             102   species        ......
             103   bogwood        ......
             104   sand           ......
             105   substrate      ......
    

    The following is example of how wp_relation hold the relation of glossary and postmeta. Assuming the above postmeta’s id is 405

    glossary_id | meta_id 
             101   405       
             102   405  
             103   405    
             104   405       
             105   405  
    

    Output Profile

    Once you can maintain above database structure, you can easily construct glossary link for each wp_postmeta by searching id of wp_postmeta’s entry in wp_relation. Then, using str_replace to fill link to your content.

    To increase performance, you can cache the result content in HTML. Set expire of cache to 1 or 2 days (depend on how often you update your glossary and your site traffic). You can use memcached, file or database to store your cache.


    Maintain Database Structure

    The most difficult task is maintain our database relationship. There are two major actions that alter the database relationship.

    Alter Glossary

    • Add New Glossary: Search through wp_postmeta and construct relation.
    • Remove Glossary: Delete all relation between this glossary and other wp_postmeta
    • Edit Glossary: It is combination of remove glossary and add new glossary.

    Alter Postmeta

    • Add/Edit Postmeta: Reconstruct relation between glossary and its content. Below is algorithm that can be used to construct relation. I believe there are other better ways to do it. The code is not complete (I haven’t tested it yet), but it is enough for you to understand the algorithm.

      // Initialize text to add glossary link
      $desc = "Tank should be well planted with floating plants also used. 
              An abundance of hiding places should be provided as this species 
              likes to hide away during the day. Bogwood, rock caves and PVC 
              piping are all suitable for this purpose. Sand should be used as 
              substrate as the spiny eels often like to bury themselves. 
              Dimmer lighting will encourage the fish to venture from 
              its hiding places more often. A close-fitting hood is required 
              as the eel can find its way through the smallest of gaps. Water 
              flow should be fairly gentle as the fish mainly inhabits 
              areas of still water in the wild.";
      
      // Split text into list of words
      $split = preg_split("/[.,s]+/", $desc);
      
      // Frequency English words
      $freq['will'] = true;
      $freq['also'] = true;
      $freq['with'] = true;
      $freq['about'] = true;
      $freq['back'] = true;
      $freq['been'] = true;
      $freq['were'] = true;
      $freq['want'] = true;
      
      // Get list of unqiue word and elimate unnessary words
      foreach($split as $value) {
          $value = strtolower($value);
      
          if (strlen($value) < 4) continue;
          if (is_numeric($value)) continue;
          if (isset($freq[$value])) continue;
      
          if (!isset($hash[$value])) $hash[$value] = true;
      }
      
      // Join the list for search
      $keys = "";
      foreach ($hash as $key => $value)
          $keys .= "^{$key}|";
      $keys = rtrim($keys, '|');
      
      // Search for list of glossary
      $glossary;
      $result = mysql_query("SELECT glossary_id,glossary_key FROM wp_glossary  WHERE gossary_key REGEXP '{$keys}'")
      if ($result) {
          while($row = mysql_fetch_row($result)) {
              if (strpos($desc, $row[1]) !== false)
                  $glossary[$row[0]] = $row[1];
          }
      }
      
      // You can start to construct the relation from this $glossary
      // by loop throught it one by one and insert it into wp_relation
      print_r($glossary);
      
    • Remove Postmeta : Remove relation between this postmeta and glossary.


    Extra Benefits

    Lets say that you to find which species profile that use glossary “bogwood”. You can easily track it via wp_relation.


    Questions

    1. Firstly, how will a str_replace affect my species profiles which have existing code in them, i.e. a href or img code?

    It does not affect your species profiles if there is no a link to glossary and species name.

    2. Secondly, you’ve mentioned the glossary entries but not the slightly more complex species genus/species names. Would I use a separate table for those, or incorporate it all into one?

    You can have separate table to store relation between species name and postmeta or you can simply treat species names as glossary term with special flag (which need to change how glossary is structured)

    3. How would I then specify the difference (different class in the a tag) between glossary entries and species profiles?

    Answered is depended on which method you use for your second questions.

  2. Here’s another thought, do the markup on the client.

    Extract the taxonomies as text files and cache the “transform terms”.

    guppie
    salmon
    

    Have JS go through them as the reader is reading the page and rewrite the urls;

    <a href="/section/guppie">guppie</a> and a <a href="/section/salmon">salmon</a>
    

    There must be more on the net about this method since an old discussion I once had here.

    I also stumbled across this tut on WP Custom Taxonomies which might be relevant to your case.

  3. Have you considered a Ternary Search Tree?

    This is the data structure I favour when facing the problem of “find x number of words in text mass y”.

    Example tree built (fin, filter, caudal, gal, gaudy):

            f
           /|
    caudal* i ga
            |  | 
            n* l*
           /    
        lter*    udy*
    
    * = end node
    

    And the algorithm will be along the lines of:

    • Begin word to match at root node.
    • As long as current position matches move down.
    • If current position < node. Move left.
    • If current position > node. Move right.
    • If word is finished and node is end node. Word found.
    • If move left/right not possible or word ends in non-end node. Word not found.

    Here’s a good example.

    The advantage is that the tree can be pre-calculated and stored away in some serialized form. It’s often even possible to either build or transfer a pre-built tree to the client and do the matching in the browser.