I dumped my WordPress posts into Elasticsearch, but when I search for suggestion terms I still get stopwords and HTML elements, for example "the", "a", or even the "p" tag. I already specified in the index settings that these filters should be used.
Here’s my code.
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node on localhost:9200

es.indices.create(
    index='wp-posts',
    body={
        'settings': {
            # just one shard, no replicas for testing
            'number_of_shards': 1,
            'number_of_replicas': 0,
            # custom analyzers for the WordPress post content
            'analysis': {
                'analyzer': {
                    "my_analyzer": {
                        "type": "standard",
                        "stopwords": "_english_"
                    },
                    'wordpress_content': {
                        'type': 'custom',
                        'tokenizer': 'standard',
                        'filter': ['html_strip']
                    }
                }
            }
        }
    },
    # Ignores 400 errors; remove this to have them raised
    ignore=400
)
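For comparison, here is a minimal sketch of settings that might behave as intended, assuming Elasticsearch 5 or later (where the field type is "text"; 2.x used "string"). Two things differ from the code above: html_strip is declared under char_filter, because it is a character filter rather than a token filter, and the analyzer is attached to the content field through a mapping, because defining an analyzer in the settings does not by itself apply it to any field (unless it is named "default"). The built-in stop token filter uses English stopwords by default. The function name build_index_body and its optional doc_type parameter (for pre-7.x mappings, which nest properties under a type name) are hypothetical, for illustration only.

```python
def build_index_body(doc_type=None):
    """Return index settings plus a mapping that applies the analyzer to `content`."""
    settings = {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "analysis": {
            "analyzer": {
                "wordpress_content": {
                    "type": "custom",
                    # html_strip is a character filter: it runs before tokenizing
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    # lowercase first so that "The" is also caught by the stop filter
                    "filter": ["lowercase", "stop"],
                }
            }
        },
    }
    # Attach the analyzer to the field; without this, `content` uses the default analyzer
    mappings = {
        "properties": {
            "content": {"type": "text", "analyzer": "wordpress_content"}
        }
    }
    if doc_type is not None:
        # Hypothetical pre-7.x layout: properties nested under a document type name
        mappings = {doc_type: mappings}
    return {"settings": settings, "mappings": mappings}
```

It could be passed as es.indices.create(index='wp-posts', body=build_index_body()) on a freshly deleted index, without ignore=400, so that an invalid setting or an already existing index raises an error instead of being silently skipped. The posts then have to be indexed again, since the analyzer of an existing field cannot be changed in place.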
And this is how I search for suggestions (unless I'm doing something wrong here):
result = es.suggest(
    index="wp-posts",
    body={"my_suggestion": {"text": post['content'], "term": {"field": "content"}}}
)
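A related detail, sketched under the assumption of Elasticsearch 5 or later: the standalone _suggest endpoint behind es.suggest was deprecated in 5.0 and removed in 6.0, and suggesters are sent in the "suggest" section of a regular search request instead. The term suggester proposes terms taken from the index for the given field, so it can only stop returning "the" or "p" once content is indexed with an analyzer that removes them. build_suggest_body is a hypothetical helper name.

```python
def build_suggest_body(text, field="content", name="my_suggestion"):
    """Build a _search request body that only runs a term suggester."""
    return {
        # size 0: return no search hits, only the suggestions
        "size": 0,
        "suggest": {
            name: {
                "text": text,
                "term": {"field": field},
            }
        },
    }
```

With that, the call would look like result = es.search(index="wp-posts", body=build_suggest_body(post['content'])), and the suggestions would be under result["suggest"]["my_suggestion"].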