Extract the first oEmbed URL inserted in the content of a post

I want to extract the first oEmbed URL inserted in the content of a post so that I can put it in a meta tag in the header, or use it elsewhere as a way to style it differently from the rest of the content.

3 comments

  1. I assume that you’re only interested in the first URL for which oEmbed discovery actually succeeds. The oEmbed system processes every link it finds, but not every link has oEmbed data behind it, obviously.

    The filter you’ll want to use is embed_oembed_html. It receives the HTML cached by oEmbed, the URL, any attributes on the embed, and the post ID, which is the important one for your code.

    add_filter('embed_oembed_html', 'my_function',10,4);
    function my_function( $cache, $url, $attr, $post_ID ) {
      global $my_previous_post_id;
      if ($my_previous_post_id != $post_ID) {
        // post ID changed, so this is the first oembed for the post
        // do something with $url
        $my_previous_post_id = $post_ID;
      }
      return $cache; // it's important that you return the $cache value as-is
    }
    

    Now, the whole oEmbed system runs at the same time as shortcodes do: during the the_content filter call. So if you want to grab stuff for the header, you’ll have to start the main Loop in the header, run the the_content filter over the get_the_content() value, then call rewind_posts() to rewind the query back to the start for the actual main Loop later on in the page.

    This sort of behavior causes problems with plugins (like Nextgen gallery) that do stupid things when you run a loop in the header. There’s no working around it, but the fact is that those plugins are fundamentally broken and you can’t correct their problems. I get this sort of report with SFC-Share and SFC-Like all the time (because they pull content out to put in the header too). Nothing you can do about it, frankly.
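
    A minimal sketch of the header approach described above, assuming it runs in the theme before the main Loop starts. The names my_capture_first_oembed() and my_get_header_oembed_url() are hypothetical; the WordPress calls are the ones named in this answer, plus remove_filter() to unhook the callback afterwards. Unlike the snippet above, it simply remembers the first URL it sees instead of tracking the post ID.

    ```php
    <?php
    // Hypothetical callback: remembers the first URL that reaches the filter.
    function my_capture_first_oembed( $cache, $url, $attr, $post_ID ) {
        global $my_first_oembed_url;
        if ( null === $my_first_oembed_url ) {
            $my_first_oembed_url = $url;
        }
        return $cache; // pass the cached HTML through unchanged
    }

    // Hypothetical helper: runs the_content once, early, then rewinds the query.
    function my_get_header_oembed_url() {
        global $my_first_oembed_url;
        $my_first_oembed_url = null;

        if ( ! have_posts() ) {
            return null;
        }

        add_filter( 'embed_oembed_html', 'my_capture_first_oembed', 10, 4 );
        the_post();                                        // start the main Loop
        apply_filters( 'the_content', get_the_content() ); // oEmbed runs in here
        rewind_posts();                                    // reset for the real Loop
        remove_filter( 'embed_oembed_html', 'my_capture_first_oembed', 10 );

        return $my_first_oembed_url; // null when no embed was processed
    }
    ```

    Calling my_get_header_oembed_url() in header.php would then give you the URL (or null) to print in a meta tag. The plugin caveats above still apply.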

  2. Try this:

    function get_first_oembed($id) {
    
        $meta = get_post_custom($id);
    
        foreach ($meta as $key => $value)
            if (false !== strpos($key, 'oembed'))
                return $value[0];
    }
    

    Embeds seem to be stored as rendered blocks of HTML, so if you want the link alone you will additionally need to extract it.
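
    One way that extraction might look, as a rough sketch: it assumes the cached HTML is an iframe-style embed and pulls out the first src attribute. The helper name my_extract_embed_src() is hypothetical, and the value it returns is the provider's embed address, which may differ from the URL originally pasted into the post.

    ```php
    <?php
    // Hypothetical helper: returns the first src attribute found in embed HTML,
    // or false when the string contains no matching tag.
    function my_extract_embed_src( $html ) {
        if ( ! is_string( $html ) ) {
            return false;
        }

        // Look for src="..." or src='...' on an iframe, embed, video or audio tag.
        $pattern = '/<(?:iframe|embed|video|audio)\b[^>]*\ssrc=(["\'])(.*?)\1/i';
        if ( preg_match( $pattern, $html, $matches ) ) {
            // Attribute values are HTML-encoded, so decode entities such as &#038;.
            return html_entity_decode( $matches[2] );
        }

        return false;
    }
    ```

    It could then be combined with the function above, for example my_extract_embed_src( get_first_oembed( $id ) ).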

  3. I have tried the answer from @Rarst, but it is not a very stable solution to the problem. If you var_dump the $meta variable above, you will notice that there is at least one other meta field, _oembed_time_xxxxxxxxxxxx, which is probably for cache busting.

    Also, when I removed the oEmbed link from the content of the post, the cached oEmbed meta strings were not removed, which is not what I wanted.

    So I dug a bit deeper into the native WP code in wp-includes/class-oembed.php and wp-includes/class-wp-embed.php and came up with a much more solid solution:

    function get_first_oembed_from_content( $content ) {
        if ( preg_match( '|^\s*(https?://[^\s"]+)\s*$|im', $content, $matches ) ) {
            return wp_oembed_get( $matches[1] );
        }
    
        return false;
    }
    

    The regex pattern is copied from the native WP_Embed::autoembed() function, so it is the most reliable one.

    As written, the function returns the embed HTML from wp_oembed_get(). From here it is easy to modify it to return only the first URL instead.
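
    A sketch of that modification, with the whitespace escapes in the pattern written out as \s. The function name get_first_oembed_url_from_content() is hypothetical. Note that it returns the first URL sitting on a line of its own, without checking whether an oEmbed provider actually exists for it.

    ```php
    <?php
    // Hypothetical variant: return the URL itself rather than the embed HTML.
    function get_first_oembed_url_from_content( $content ) {
        // A URL alone on its own line, optionally surrounded by whitespace.
        if ( preg_match( '|^\s*(https?://[^\s"]+)\s*$|im', $content, $matches ) ) {
            return $matches[1];
        }

        return false; // no standalone URL found
    }
    ```

    Inside the Loop it could be called as get_first_oembed_url_from_content( get_the_content() ).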