Basic search with Elasticsearch
Inspired by this tutorial, I continued investigating Elasticsearch, since I would like a fast indexing tool for the data I am gathering and the applications I am developing.
Install the Python library for Elasticsearch
https://elasticsearch-py.readthedocs.io/en/master/
$ pip install elasticsearch
Note: on my Mac I installed Elasticsearch through Homebrew:
$ brew install elasticsearch
$ brew services start elasticsearch
In [1]:
import pandas as pd
character_df = pd.read_csv('data/nintendo_characters.csv')
character_df
Out[1]:
Replace the NaN values in the occupation column with empty strings:
In [2]:
character_df.occupation = character_df.occupation.fillna('')
Read the world data
In [3]:
world_df = pd.read_csv('data/super_mario_3_worlds.csv', sep=';')
world_df
Out[3]:
In [4]:
ES_HOST = {"host" : "localhost", "port" : 9200}
INDEX_NAME = 'nintendo'
TYPE_NAME = 'character'
ID_FIELD = 'id'
Set up the Elasticsearch connector
In [5]:
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts = [ES_HOST])
Create the index
Create the nintendo index if it does not exist; if it already exists, delete it first.
In [6]:
if es.indices.exists(INDEX_NAME):
    print("Deleting the '%s' index" % (INDEX_NAME))
    res = es.indices.delete(index = INDEX_NAME)
    print("Acknowledged: '%s'" % (res['acknowledged']))

request_body = {
    "settings" : {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}

print("Creating the '%s' index!" % (INDEX_NAME))
res = es.indices.create(index = INDEX_NAME, body = request_body)
print("Acknowledged: '%s'" % (res['acknowledged']))
Create the bulk data
Loop through the dataframe and create the data to insert into the index.
In [7]:
bulk_data = []
In [8]:
for index, row in character_df.iterrows():
    data_dict = {}
    for i in range(len(row)):
        data_dict[character_df.columns[i]] = row[i]
    op_dict = {
        "index": {
            "_index": 'nintendo',
            "_type": 'character',
            "_id": data_dict['id']
        }
    }
    bulk_data.append(op_dict)
    bulk_data.append(data_dict)
In [9]:
for index, row in world_df.iterrows():
    data_dict = {}
    for i in range(len(row)):
        data_dict[world_df.columns[i]] = row[i]
    op_dict = {
        "index": {
            "_index": 'nintendo',
            "_type": 'world',
            "_id": data_dict['id']
        }
    }
    bulk_data.append(op_dict)
    bulk_data.append(data_dict)
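As an aside, the per-row dictionaries built in the two loops above can also be produced with pandas' `to_dict(orient='records')`, which avoids the inner loop over columns. A minimal sketch with a hypothetical helper (`make_bulk_actions` is my own name, and the tiny DataFrame is illustrative, not the real CSV data):

```python
import pandas as pd

def make_bulk_actions(df, index_name, type_name, id_field='id'):
    """Interleave action lines and document lines, as the bulk API expects."""
    actions = []
    for doc in df.to_dict(orient='records'):
        actions.append({"index": {"_index": index_name,
                                  "_type": type_name,
                                  "_id": doc[id_field]}})
        actions.append(doc)
    return actions

# Tiny illustrative frame (not the real CSV data):
df = pd.DataFrame({"id": [1, 2], "name": ["Mario", "Luigi"]})
bulk = make_bulk_actions(df, "nintendo", "character")
print(len(bulk))  # 4: one action line plus one document line per row
```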
Insert the data into the index
In [10]:
import json
print("Bulk indexing...")
res = es.bulk(index = INDEX_NAME, body = bulk_data, refresh = True)
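The bulk response carries an `errors` flag plus a per-item status, so it is worth checking rather than assuming everything was indexed. A small helper could look like this (the sample responses below are illustrative shapes, not output from my cluster):

```python
def bulk_had_errors(res):
    """Return True and print details if any item in a bulk response failed."""
    if not res.get("errors"):
        return False
    for item in res["items"]:
        info = item.get("index", {})
        if info.get("status", 200) >= 300:
            print("Failed id %s: %s" % (info.get("_id"), info.get("error")))
    return True

# Illustrative response shapes (not real cluster output):
ok = {"errors": False, "items": [{"index": {"_id": "1", "status": 201}}]}
print(bulk_had_errors(ok))  # False
```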
Query using curl
In [11]:
!curl -XGET 'http://localhost:9200/_search?pretty'
Search all worlds:
curl -XGET 'http://localhost:9200/nintendo/world/_search?pretty'
Pagination:
curl -XGET 'http://localhost:9200/nintendo/world/_search?size=2&from=2&pretty'
Specify the fields you want to be returned:
curl -XGET 'http://localhost:9200/nintendo/character/_search?pretty&q=name:Luigi&fields=name,occupation'
Search for the word 'pipe':
curl -XGET 'http://localhost:9200/nintendo/world/_search?pretty&q=pipe'
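The same searches can be issued from Python with `es.search`. The request bodies below mirror the curl examples above (index, type, and field names are taken from this post); running them of course requires the live cluster, so here only the bodies themselves are built:

```python
# Query bodies mirroring the curl examples above.
search_all_worlds = {"query": {"match_all": {}}}
paginated = {"query": {"match_all": {}}, "size": 2, "from": 2}
luigi = {"query": {"match": {"name": "Luigi"}},
         "_source": ["name", "occupation"]}
pipe = {"query": {"query_string": {"query": "pipe"}}}

# Against a running cluster these would be sent as, e.g.:
# res = es.search(index='nintendo', doc_type='world', body=search_all_worlds)
# print(res['hits']['total'])
```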