Correct regex to detect font family names from a Google Fonts link src

I’ve been trying to get an array of the fonts that I’m enqueueing in my WordPress theme. This is just for testing.

On input:

http://fonts.googleapis.com/css?family=Arimo:400,700|Quicksand:400,700|Cantarell:400,700,400italic,700italic|Muli:300,400,300italic,400italic|Roboto+Slab:400,700|Share:400,700,400italic,700italic|Inconsolata:400,700|Karla:400,700,400italic,700italic|Maven+Pro:400,500,700,900|Roboto+Slab:400,700|Open+Sans:400italic,600italic,700italic,400,600,700

What I need on output is like this:

array(
[0] => 'Arimo',
[1] => 'Quicksand',
[2] => 'Cantarell',
... so on
)

So far I have almost everything working, except for one small problem.

My code:

$input = 'http://fonts.googleapis.com/css?family=Arimo:400,700|Quicksand:400,700|Cantarell:400,700,400italic,700italic|Muli:300,400,300italic,400italic|Roboto+Slab:400,700|Share:400,700,400italic,700italic|Inconsolata:400,700|Karla:400,700,400italic,700italic|Maven+Pro:400,500,700,900|Roboto+Slab:400,700|Open+Sans:400italic,600italic,700italic,400,600,700';

$against = "/[A-Z][a-z]+[+][A-Z][a-z]+|[A-Z][a-z]+/";

$matches = array();

preg_match_all( $against, $input, $matches );

print_r($matches);

From this, the output is like this:

array(
0   =>  Arimo
1   =>  Quicksand
2   =>  Cantarell
3   =>  Muli
4   =>  Roboto+Slab
5   =>  Share
6   =>  Inconsolata
7   =>  Karla
8   =>  Maven+Pro
9   =>  Roboto+Slab
10  =>  Open+Sans
)

A + sign appears wherever a font name contains a space. I want to get rid of it.

I’m not a regex expert, so I couldn’t manage to do that.

Note: I know I could do it with str_replace(), but I don’t want to go through that extra step. I want to know whether the regex itself can skip the + sign and leave a space in its place while the matches are being collected.

4 comments

  1. Without regex:

    $query = strtr(substr(parse_url($url, PHP_URL_QUERY),7), '+', ' ');
    
    $result = array_map(function ($i) { return explode(':', $i)[0]; }, explode('|', $query));
    

    With regex:

    if (preg_match_all('~(?:\G(?!\A)|[^?&]+[?&]family=)([^:|&]+):[^:|&]*(?:[|&#]|\z)~', strtr($url, '+', ' '), $m))
       $result2 = $m[1];
    
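    To make the non-regex variant easier to follow, here is the same idea split into steps. This is only a sketch: the shortened sample URL is an assumption for illustration, and substr(..., 7) assumes that family= is the first parameter in the query string.

    ```php
    <?php
    // Shortened sample URL (an assumption for illustration).
    $url = 'http://fonts.googleapis.com/css?family=Arimo:400,700|Roboto+Slab:400,700|Open+Sans:400italic,700';

    // Step 1: take the query string, i.e. everything after the "?".
    $query = parse_url($url, PHP_URL_QUERY);

    // Step 2: drop the leading "family=" (7 characters) and turn "+" into spaces.
    $families = strtr(substr($query, 7), '+', ' ');

    // Step 3: split on "|" and keep only the part before the first ":".
    $result = array_map(
        function ($item) { return explode(':', $item)[0]; },
        explode('|', $families)
    );

    print_r($result);
    ```

    For this sample URL, $result holds Arimo, Roboto Slab and Open Sans.
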
  2. Running your code, the output I actually get looks like this:

    array(
        [0] => array(
            [0]   =>  Arimo
            [1]   =>  Quicksand
            [2]   =>  Cantarell
            [3]   =>  Muli
            [4]   =>  Roboto+Slab
            [5]   =>  Share
            [6]   =>  Inconsolata
            [7]   =>  Karla
            [8]   =>  Maven+Pro
            [9]   =>  Roboto+Slab
            [10]  =>  Open+Sans
        )
    )
    

    If that is correct, then I have solved the ‘+’ issue. Here is the solution:

    $input = 'http://fonts.googleapis.com/css?family=Arimo:400,700|Quicksand:400,700|Cantarell:400,700,400italic,700italic|Muli:300,400,300italic,400italic|Roboto+Slab:400,700|Share:400,700,400italic,700italic|Inconsolata:400,700|Karla:400,700,400italic,700italic|Maven+Pro:400,500,700,900|Roboto+Slab:400,700|Open+Sans:400italic,600italic,700italic,400,600,700';
    
    $against = "/[A-Z][a-z]+[+][A-Z][a-z]+|[A-Z][a-z]+/";
    
    $matches = array();
    $newArr=array();
    preg_match_all( $against, $input, $matches );
    
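    // $matches[0] holds the full matches. Replace every character that is
    // not a letter, digit or hyphen (such as '+') with a space.
    for($i=0;$i< count($matches);$i++){
        for($j=0;$j< count($matches[$i]);$j++){
            $string=preg_replace('/[^A-Za-z0-9-]/', ' ', $matches[$i][$j]);
            if($string!=""){
                $newArr[]=$string;
            }
        }
    }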
    print_r($newArr);
    
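    If the nested loop feels like too much, a shorter sketch is possible. str_replace() accepts an array as its subject and returns an array, so the + signs can be replaced in all matches with a single call on $matches[0]. The shortened sample URL and the $fonts variable name are assumptions for illustration.

    ```php
    <?php
    // Shortened sample URL (an assumption for illustration).
    $input = 'http://fonts.googleapis.com/css?family=Arimo:400,700|Roboto+Slab:400,700|Maven+Pro:400,500';

    // Same pattern as in the question.
    $against = "/[A-Z][a-z]+[+][A-Z][a-z]+|[A-Z][a-z]+/";

    preg_match_all($against, $input, $matches);

    // $matches[0] is the list of full-pattern matches.
    // str_replace() works on every element of an array subject.
    $fonts = str_replace('+', ' ', $matches[0]);

    print_r($fonts);
    ```

    A regex match is always a substring of the input, so the space has to be substituted either before matching or afterwards, as done here.
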
  3. In general, you have more than + characters to worry about.

    Special characters, such as the ampersand (&), and non-ASCII characters in URL query parameters have to be escaped using percent-encoding (%xx). In addition, when an HTML form is submitted, spaces are encoded using the + character.

    For example:

    • The font family “Jacques & Gilles” would be escaped as:

      Jacques+%26+Gilles

    • The Unicode character U+1E99 (LATIN SMALL LETTER Y WITH RING ABOVE), serialized into octets as UTF-8 (E1 BA 99), would be escaped as:

      %e1%ba%99

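    These encodings can be reproduced from PHP itself. The sketch below uses the built-in urlencode(), rawurlencode() and urldecode() functions; the variable names are just for illustration. PHP emits uppercase hex digits, which is equivalent because percent-encoded hex digits are case-insensitive.

    ```php
    <?php
    // urlencode() applies form-style encoding: spaces become "+",
    // and "&" becomes "%26".
    $encoded = urlencode('Jacques & Gilles');
    echo $encoded, "\n";

    // rawurlencode() encodes spaces as "%20" instead of "+".
    $raw = rawurlencode('Jacques & Gilles');
    echo $raw, "\n";

    // U+1E99 written as its three UTF-8 bytes.
    $y = "\xE1\xBA\x99";
    $yEncoded = urlencode($y);
    echo $yEncoded, "\n";

    // urldecode() reverses both the "+" and the %XX escapes.
    $decoded = urldecode('Jacques+%26+Gilles');
    echo $decoded, "\n";
    ```
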

    To do what you want properly, you have to extract the query string from the URL, and use parse_str() to extract the name=value pairs. The parse_str() function will automatically urldecode() the names and values including the + characters.

    First, split the URL on the ? character to extract the query string:

    $url = 'http://fonts.googleapis.com/css?family=Arimo:400,700|...|Maven+Pro:400,500,700,900|Roboto+Slab:400,700|...';
    
    $a = explode ('?', $url, 2);
    if (isset ($a[1])) {
      $query = $a[1];
    }
    

    You can also use parse_url ($url, PHP_URL_QUERY), but it doesn’t buy you much in this case.

    Then extract all the parameters:

    if (isset ($query)) {
      parse_str ($query, $params);
    
      if (isset ($params['family'])) {
        /* OK: Extract family names. */
      } else {
        /* Error: No family parameter found. */
      }
    } else {
      /* Error: No query string found. */
    }
    

    Note: You should always specify the second parameter of parse_str() to avoid clobbering existing variables.
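
    Putting the steps together, the "extract family names" part could look like the sketch below. The function name font_families_from_url() is hypothetical, and the code assumes the family value has the form Name:weights|Name:weights as in the question.

    ```php
    <?php
    // Hypothetical helper: returns the font family names found in the
    // "family" parameter of a Google Fonts URL, or an empty array.
    function font_families_from_url($url)
    {
        // Split off the query string, as described above.
        $parts = explode('?', $url, 2);
        if (!isset($parts[1])) {
            return array(); // No query string found.
        }

        // parse_str() url-decodes names and values, so "+" becomes a space.
        parse_str($parts[1], $params);
        if (!isset($params['family']) || !is_string($params['family'])) {
            return array(); // No family parameter found.
        }

        $names = array();
        foreach (explode('|', $params['family']) as $entry) {
            // "Roboto Slab:400,700" -> "Roboto Slab"
            $name = trim(explode(':', $entry, 2)[0]);
            if ($name !== '') {
                $names[] = $name;
            }
        }
        return $names;
    }

    // Shortened sample URL (an assumption for illustration).
    $url = 'http://fonts.googleapis.com/css?family=Arimo:400,700|Roboto+Slab:400,700|Open+Sans:400italic,700';
    $families = font_families_from_url($url);
    print_r($families);
    ```

    Because the decoding is left to parse_str(), names containing percent-encoded characters (such as %26 for an ampersand) come out decoded as well.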