First note: This site is hosted on WPEngine (varnish caching), but I can’t seem to replicate the issue on another server.
We need to be able to access the $_GET php variable on some pages. For testing, I modified our WordPress header.php to do a var_dump on the very first line.
Normally, everything works OK. However, if the URL query string contains a parameter starting with “utm_”, that parameter and every subsequent variable are stripped from $_GET. The extra weird part is that if I am logged in to WordPress, everything works fine.
Our Paypal return URL looks like this:
http://oururl.com/buy/thankyou/?utm_nooverride=1&tx=xxxxyyyy…
The utm_nooverride causes the $_GET to be an empty array. If I change it to “test=1&tx=xxxxyyyy”, it works OK. If I use “utm_test=1&tx=xxxxyyyy” I get an empty array again.
There is nothing odd in the .htaccess, only a few standard WordPress lines.
Could there be something in the hosting causing this?
In case anyone else runs into the same problem, as I just did: I spoke with the WPEngine support team via live chat, and they rectified it within a few minutes.
Here’s a shortened transcript of our chat:
Link for reference: https://wpengine.com/support/utm-gclid-variables-caching/
WP Engine may have (mis)configured Varnish to ignore query string parameters that look like Google Analytics campaign variables. The likely motivation is that the page can then be served from the cache entry for the URL without the query string, since the campaign variables are read client-side (not server-side) by the analytics provider. Ignoring these variables server-side would therefore ostensibly have no effect, and it would improve performance for sites making heavy use of inbound Google Analytics tracking.
I say it’s possible since there’s a Stack Overflow question asking how to do just that: “Stripping out select querystring attribute/value pairs so varnish will not vary cache by them”. The only way to know for certain is to contact WP Engine.
I’m currently chatting with WPEngine in the hopes of getting this issue resolved.
WPEngine’s Varnish cache does in fact strip out utm_ and gclid_ parameters in an effort to improve caching. Sadly, WPEngine’s implementation of this “feature” also strips out all subsequent query parameters after the first utm_ or gclid_ parameter is identified.
For example, given the URL:
http://www.example.com/test/?foo=bar&utm_source=email&page=1
What you would expect your server to receive:
http://www.example.com/test/?foo=bar&page=1
What your server actually receives:
http://www.example.com/test/?foo=bar
Notice how the page=1 parameter is removed even though it is not a utm_ or gclid_ parameter.
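To make the reported behaviour concrete, here is a small JavaScript model of it. This is only a sketch of what the thread describes, not WPEngine's actual Varnish configuration; the function name `simulateObservedStripping` and the prefix pattern are hypothetical and chosen for illustration.

```javascript
// Hypothetical model of the behaviour described above: everything from the
// first utm_/gclid parameter onward is dropped from the query string.
// Assumption: parameters are matched by name prefix ("utm_" or "gclid").
function simulateObservedStripping(query) {
  const pairs = query.split("&");
  const firstTracking = pairs.findIndex((pair) => /^(utm_|gclid)/.test(pair));
  // No tracking parameter: the query string passes through untouched.
  if (firstTracking === -1) return query;
  // Otherwise keep only the parameters that came before it.
  return pairs.slice(0, firstTracking).join("&");
}

console.log(simulateObservedStripping("foo=bar&utm_source=email&page=1"));
console.log(simulateObservedStripping("utm_nooverride=1&tx=xxxxyyyy"));
```

Under this model the first call keeps only `foo=bar`, and the second yields an empty string, which matches the empty $_GET seen with the PayPal return URL.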
WPEngine’s proposed workaround is to add utm_ and gclid_ to the cache exclusion list, but this means a cached page will never be served when the URL contains a utm_ or gclid_ parameter. This seems less than ideal: a URL with a utm_ or gclid_ parameter most likely comes from an email, and a large email send-out means a spike in traffic, which is exactly when you would want to serve a cached page.
Below is some JavaScript that detects whether a utm_ or gclid_ parameter is in the URL and, if so, rearranges the URL so that the utm_ and gclid_ parameters are at the end of the query string, then triggers a page redirect. The code specifically looks for a parameter called tfa_next, which is appended to the end of a URL by a FormAssembly redirect. It could be made more generic, but I hope it serves as a starting point for anyone who needs it.
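As a rough, generic sketch of the reordering idea described above (this is not the FormAssembly-specific snippet and does not look for tfa_next), something along these lines could work. The names `TRACKING_PREFIXES`, `isTrackingParam` and `moveTrackingParamsLast` are hypothetical, and matching on the prefix "gclid" is an assumption intended to cover both gclid and gclid_ style names.

```javascript
// Assumed prefixes of the parameters the cache layer treats specially.
const TRACKING_PREFIXES = ["utm_", "gclid"];

// True if a "name=value" pair has a name starting with a tracking prefix.
function isTrackingParam(pair) {
  const name = pair.split("=")[0];
  return TRACKING_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// Returns the URL with tracking parameters moved to the end of the query
// string. Order within each group is preserved; values are not re-encoded.
function moveTrackingParamsLast(url) {
  const hashIndex = url.indexOf("#");
  const hash = hashIndex === -1 ? "" : url.slice(hashIndex);
  const withoutHash = hashIndex === -1 ? url : url.slice(0, hashIndex);

  const queryIndex = withoutHash.indexOf("?");
  if (queryIndex === -1) return url; // no query string, nothing to reorder

  const base = withoutHash.slice(0, queryIndex);
  const pairs = withoutHash.slice(queryIndex + 1).split("&").filter(Boolean);

  const normal = pairs.filter((pair) => !isTrackingParam(pair));
  const tracking = pairs.filter((pair) => isTrackingParam(pair));
  return base + "?" + normal.concat(tracking).join("&") + hash;
}

// Browser only: redirect once if the parameter order actually changes.
// After the redirect the URL is already in the desired order, so the
// comparison below fails and no redirect loop occurs.
if (typeof window !== "undefined") {
  const reordered = moveTrackingParamsLast(window.location.href);
  if (reordered !== window.location.href) {
    window.location.replace(reordered);
  }
}
```

For the example URL above, `moveTrackingParamsLast` would produce `http://www.example.com/test/?foo=bar&page=1&utm_source=email`, so the server still receives `foo=bar` and `page=1` even if everything from `utm_source` onward is stripped. Note that the first request still reaches the server with the truncated query string; only the redirected request carries the full set of non-tracking parameters.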