Special Content

Get rid of this Strict Standards warning WooCommerce: get_current_screen not working with multi language Can’t login on WordPress Force HTTPS using .htaccess – stuck in redirect loop WordPress: Ajax not working to insert, query and result data

Home Science, Programming and Technology WordPress Tags with : in name in lxml

Tags with : in name in lxml

February 16, 20230 Views

placeholder

I’m trying to use lxml.etree to parse a WordPress export document (it’s XML, somewhat RSS like). I’m only interested in published posts, so I’m using the following to loop through published posts:

for item in data.findall("item"):
    if item.find("wp:post_type").text != "post":
        continue
    if item.find("wp:status").text != "publish":
        continue
    write_post(item)

where data is the tag that all item tags are found in. item tags contain posts, pages, and drafts. My problem is that lxml can’t find tags that have a : in their name (e.g. wp:post_type). When I try item.find("wp:post_type") I get this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "lxml.etree.pyx", line 1279, in lxml.etree._Element.find (src/lxml/lxml.e
tree.c:38124)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 210, in f
ind
    it = iterfind(elem, path)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 200, in i
terfind
    selector = _build_path_iterator(path)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 184, in _
build_path_iterator
    selector.append(ops[token[0]](_next, token))
KeyError: ':'

I assume the KeyError : ':' refers to the colon in the name of the tag being invalid. Is there some way I can escape the colon so that lxml finds the right tag? Does : have some special meaning in this context? Or am I doing something wrong? Any help would be appreciated.

Post Views: 0

Leave a Reply Cancel reply

You must be logged in to post a comment.

1 comment

Anonymous says:

February 16, 2023 at 1:23 am

The : is an XML namespace separator. To escape the colon in lxml, you need to replace it with the namespace URL within curly braces, as in item.find("{http://example.org/}status").text.

Log in to Reply