{"id":131,"date":"2019-07-11T16:27:59","date_gmt":"2019-07-11T13:27:59","guid":{"rendered":"http:\/\/b00blik.ru\/tech\/?p=131"},"modified":"2022-08-01T23:36:57","modified_gmt":"2022-08-01T20:36:57","slug":"how-to-analyse-csv-data-with-elastic-stack","status":"publish","type":"post","link":"https:\/\/b00blik.ru\/tech\/?p=131","title":{"rendered":"How to analyse CSV data with Elastic stack"},"content":{"rendered":"<p>Hi!<\/p>\n<p>Imagine that we have to analyse a large amount of user data. In this article the data comes in CSV format.<\/p>\n<p>For example, the data can have the following format:<\/p>\n<pre class=\"lang:default decode:true\" title=\"Data example\">(0118) 008 0694 | 180504 | Hattersheim am Main | 6<\/pre>\n<p>The first column is the user&#8217;s phone number, the second is the date in YYMMDD format, the third is the city, and the last is the amount billed to this phone number for that day.<\/p>\n<p>It would be great to get some aggregated info by date, city, etc. To perform these aggregations on a huge amount of data we can use the ELK stack.<\/p>\n<p><!--more--><\/p>\n<p>What is the ELK stack? According to the official website:<\/p>\n<blockquote><p>&#171;ELK&#187; is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. <strong>Elasticsearch<\/strong> is a search and analytics engine. <strong>Logstash<\/strong> is a server\u2011side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a &#171;stash&#187; like Elasticsearch. 
<strong>Kibana<\/strong> lets users visualize data with charts and graphs in Elasticsearch.<\/p><\/blockquote>\n<p>So we need to perform the following steps:<\/p>\n<ol>\n<li>Download and install Elasticsearch, Logstash and Kibana;<\/li>\n<li>Configure Logstash to load data from a CSV file into an Elasticsearch index;<\/li>\n<li>Run some magic queries against Elasticsearch to aggregate the data.<\/li>\n<\/ol>\n<p>Great. Let&#8217;s get our three components from the official page: <a href=\"https:\/\/www.elastic.co\/downloads\/\" target=\"_blank\" rel=\"noopener\">elastic.co<\/a><\/p>\n<p>Extract the archives (zip\/tar) into some directories (I extracted them to ~\/elk\/kibana and so on).<\/p>\n<p>There is no need to change the default configuration of Kibana and Elasticsearch. First we start Elasticsearch, and then Kibana (each in its own terminal, since both run in the foreground):<\/p>\n<pre class=\"lang:sh decode:true\" title=\"ES and Kibana start\">~\/elk\/elasticsearch\/bin\/elasticsearch\n~\/elk\/kibana\/bin\/kibana<\/pre>\n<p>After a few seconds the terminal will look like this:<a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/elkscreen.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-133\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/elkscreen.png?resize=640%2C475&#038;ssl=1\" alt=\"elkscreen\" width=\"640\" height=\"475\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/elkscreen.png?resize=1024%2C760&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/elkscreen.png?resize=300%2C223&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/elkscreen.png?w=2006&amp;ssl=1 2006w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/elkscreen.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/elkscreen.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 
640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>Once Elasticsearch has started, we can find some info about this instance at <a href=\"http:\/\/localhost:9200\" target=\"_blank\" rel=\"noopener\">localhost<\/a> on port 9200:<\/p>\n<pre class=\"theme:vs2012 lang:default decode:true\" title=\"elasticsearch instance info\">{\n  \"name\" : \"MacBook-Air-Yuri.local\",\n  \"cluster_name\" : \"elasticsearch\",\n  \"cluster_uuid\" : \"EKex54PqTSOsU9GoE6nuZg\",\n  \"version\" : {\n    \"number\" : \"7.2.0\",\n    \"build_flavor\" : \"default\",\n    \"build_type\" : \"tar\",\n    \"build_hash\" : \"508c38a\",\n    \"build_date\" : \"2019-06-20T15:54:18.811730Z\",\n    \"build_snapshot\" : false,\n    \"lucene_version\" : \"8.0.0\",\n    \"minimum_wire_compatibility_version\" : \"6.8.0\",\n    \"minimum_index_compatibility_version\" : \"6.0.0-beta1\"\n  },\n  \"tagline\" : \"You Know, for Search\"\n}<\/pre>\n<p>Kibana is available on <a href=\"http:\/\/localhost:5601\" target=\"_blank\" rel=\"noopener\">port 5601<\/a>. It connects to our running Elasticsearch instance. 
Let&#8217;s click the Monitoring tab on the left to get some info about the instance:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-134\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana.png?resize=640%2C376&#038;ssl=1\" alt=\"kibana\" width=\"640\" height=\"376\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana.png?resize=1024%2C602&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana.png?resize=300%2C176&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>Everything is OK with our ES and Kibana instances, so let&#8217;s configure Logstash and put some data into an Elasticsearch index.<\/p>\n<blockquote><p>Q: What is an index?<\/p>\n<p>A: An index is like a <span class=\"emphasis\">table<\/span> in a relational database. It has a mapping which contains a type, which contains the fields in the index. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.<\/p><\/blockquote>\n<p>Let&#8217;s configure the pipeline. The Logstash event processing pipeline has three stages: inputs \u2192 filters \u2192 outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere.<\/p>\n<p>There is a <em>pipeline<\/em> directory in the Logstash installation path; this is where pipeline configurations are stored. 
We can create an <em>example_pipeline.conf<\/em> file there.<\/p>\n<pre class=\"lang:default decode:true \">input {\n    file {\n        path =&gt; \"\/home\/usershome\/somelocation\/datadir\/*.csv\"\n        # read each file from its first line instead of tailing it\n        start_position =&gt; \"beginning\"\n        # do not persist the read position, so files are re-read on every start\n        sincedb_path =&gt; \"\/dev\/null\"\n    }\n}\n\nfilter {\n    csv {\n        separator =&gt; \" | \"\n        columns =&gt; [\"phone\", \"date\", \"city\", \"amount\"]\n    }\n}\n\noutput {\n    elasticsearch {\n        hosts =&gt; [\"http:\/\/localhost:9200\"]\n        index =&gt; \"cdr_index_v1\"\n    }\n    stdout { }\n}<\/pre>\n<p>Our target index is <em>cdr_index_v1<\/em>.<\/p>\n<p>Now let&#8217;s tell Logstash that we have prepared a configuration for the new pipeline. There is also a <em>config<\/em> directory in the Logstash installation. We have to remove the example configuration files and make some changes in <em>pipelines.yml<\/em>:<\/p>\n<pre class=\"lang:default decode:true\" title=\"pipelines.yml\"> - pipeline.id: example_pipeline\n   pipeline.workers: 2\n   path.config: \"\/Users\/b00blik\/elk\/logstash\/pipeline\/example_pipeline.conf\"<\/pre>\n<p>OK, our configuration is ready. Now let&#8217;s go to the <em>logstash\/bin<\/em> directory and run the <em>logstash<\/em> executable. 
Something like the following will be printed while the data is being processed:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/logstash.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-140 size-large\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/logstash.png?resize=640%2C392&#038;ssl=1\" alt=\"logstash\" width=\"640\" height=\"392\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/logstash.png?resize=1024%2C627&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/logstash.png?resize=300%2C184&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/logstash.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/logstash.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a>After that we can check Index Management in Kibana:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana_index.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-141\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana_index.png?resize=640%2C376&#038;ssl=1\" alt=\"kibana_index\" width=\"640\" height=\"376\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana_index.png?resize=1024%2C602&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana_index.png?resize=300%2C176&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana_index.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/kibana_index.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>Let&#8217;s go to the &#171;Dev 
Tools&#187; tab and run a query to select data:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/query_by_city.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-145\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/query_by_city.png?resize=640%2C400&#038;ssl=1\" alt=\"query_by_city\" width=\"640\" height=\"400\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/query_by_city.png?resize=1024%2C640&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/query_by_city.png?resize=300%2C188&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/query_by_city.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/query_by_city.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>We ran a query to select data by city. <em>hits<\/em>-&gt;<em>total<\/em>-&gt;<em>value<\/em> is 2000, so 2000 documents matched the query. The <em>hits<\/em> array contains the returned documents (by default Elasticsearch returns only the first 10 matches unless <em>size<\/em> is set).<\/p>\n<p>Pay attention to the <em>amount<\/em> field in the documents: it contains a string value instead of a number, and <em>date<\/em> is also a string instead of a datetime.<\/p>\n<p>To perform aggregations we have to convert this data to the proper types. The conversion can be done by <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/docs-reindex.html\" target=\"_blank\" rel=\"noopener\">reindexing<\/a>.<\/p>\n<p>Using the reindex API we can make a copy of the source index with changed field types.<\/p>\n<p>First, we have to prepare an <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/master\/ingest.html\" target=\"_blank\" rel=\"noopener\">ingest node<\/a>. 
In the Elasticsearch installation directory, set the following line in the <em>config\/elasticsearch.yml<\/em> file (true is the default):<\/p>\n<pre class=\"lang:default decode:true \">node.ingest: true<\/pre>\n<p>If it was previously set to false, restart Elasticsearch.<\/p>\n<p>Go to the Dev Tools console. Let&#8217;s create a pipeline for data conversion:<\/p>\n<pre class=\"lang:default decode:true\">PUT _ingest\/pipeline\/ingest-pipeline-Date\n{\n  \"description\": \"convert Date\",\n  \"processors\": [\n    {\n      \"date\": {\n        \"field\": \"date\",\n        \"target_field\": \"eventDate\",\n        \"formats\": [\"yyMMdd\"],\n        \"timezone\": \"UTC\"\n      }\n    },\n    {\n      \"convert\": {\n        \"field\": \"amount\",\n        \"type\": \"long\"\n      }\n    }\n  ]\n}<\/pre>\n<p>Click the &#171;play&#187; button to send the request. The response will look like this:<\/p>\n<pre class=\"theme:vs2012 lang:default decode:true \">{\n  \"acknowledged\" : true\n}<\/pre>\n<p>Next, let&#8217;s send a reindex request to create a new index in ES with the updated field types:<\/p>\n<pre class=\"lang:default decode:true \">POST _reindex?refresh=true\n{\n  \"source\": {\n    \"index\": \"cdr_index_v1\"\n  },\n  \"dest\": {\n    \"index\": \"cdr_index_refiltered_v1\",\n    \"pipeline\": \"ingest-pipeline-Date\"\n  }\n}<\/pre>\n<p>The new index can be found in Kibana:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/indices.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-146\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/indices.png?resize=640%2C400&#038;ssl=1\" alt=\"indices\" width=\"640\" height=\"400\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/indices.png?resize=1024%2C640&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/indices.png?resize=300%2C188&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/indices.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/indices.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>Great! When we query ES now, every document looks like this:<\/p>\n<pre class=\"theme:vs2012 lang:default decode:true \">{\n  \"_index\" : \"cdr_index_refiltered_v1\",\n  \"_type\" : \"_doc\",\n  \"_id\" : \"OBGy4GsB4cQZQeUNx0Pq\",\n  \"_score\" : 5.013716,\n  \"_source\" : {\n    \"date\" : \"180503\",\n    \"amount\" : 9,\n    \"city\" : \"Norman\",\n    \"message\" : \"(0171)313 8219 | 180503 | Norman | 9\",\n    \"path\" : \"\/Users\/b00blik\/elk\/datadir\/example.csv\",\n    \"@timestamp\" : \"2019-07-11T11:02:21.152Z\",\n    \"phone\" : \"(0171)313 8219\",\n    \"@version\" : \"1\",\n    \"host\" : \"Air-Yuri.Dlink\",\n    \"eventDate\" : \"2018-05-03T00:00:00.000Z\"\n  }\n}<\/pre>\n<p>Looks good. But this index exists only in Elasticsearch, and we have to tell Kibana that we have something new! We can do it by creating an index pattern. 
Select &#171;Index Patterns&#187; in the Kibana section of the Management console and create an index pattern:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p1.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-147\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p1.png?resize=640%2C400&#038;ssl=1\" alt=\"ind_p1\" width=\"640\" height=\"400\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p1.png?resize=1024%2C640&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p1.png?resize=300%2C188&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p1.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p1.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>At the second step Kibana asks us to select a timestamp field. 
I selected none.<a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p2.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-148\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p2.png?resize=640%2C400&#038;ssl=1\" alt=\"ind_p2\" width=\"640\" height=\"400\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p2.png?resize=1024%2C640&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p2.png?resize=300%2C188&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p2.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/ind_p2.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>Now we can make some visualizations. Open the &#171;Visualize&#187; tab in Kibana and click &#171;Create new visualization&#187;:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis1.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-149\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis1.png?resize=640%2C365&#038;ssl=1\" alt=\"vis1\" width=\"640\" height=\"365\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis1.png?resize=1024%2C583&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis1.png?resize=300%2C171&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis1.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis1.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>After that, for example, let&#8217;s select &#171;Horizontal bar&#187; and our 
recently created index pattern.<\/p>\n<p>We can build a visualization by city. The axes can be changed in the left panel:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis2.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-150\" src=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis2.png?resize=640%2C400&#038;ssl=1\" alt=\"vis2\" width=\"640\" height=\"400\" srcset=\"https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis2.png?resize=1024%2C640&amp;ssl=1 1024w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis2.png?resize=300%2C188&amp;ssl=1 300w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis2.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/b00blik.ru\/tech\/wp-content\/uploads\/2019\/07\/vis2.png?w=1920&amp;ssl=1 1920w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>Another type of data analysis is an <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/search-aggregations.html\" target=\"_blank\" rel=\"noopener\">aggregation<\/a>. For example, we can sum the amount values for every city. 
The query looks like this:<\/p>\n<pre class=\"lang:default decode:true\">POST cdr_index_refiltered_v1\/_search?size=0\n{\n  \"aggs\": {\n    \"by_city\": {\n      \"terms\": {\n        \"field\": \"city.keyword\"\n      },\n      \"aggs\": {\n        \"sum_au\": {\n          \"sum\": {\n            \"field\": \"amount\"\n          }\n        }\n      }\n    }\n  }\n}<\/pre>\n<p>And the result of the query looks like this:<\/p>\n<pre class=\"theme:vs2012 lang:default decode:true\">{\n  \"took\" : 70,\n  \"timed_out\" : false,\n  \"_shards\" : {\n    \"total\" : 1,\n    \"successful\" : 1,\n    \"skipped\" : 0,\n    \"failed\" : 0\n  },\n  \"hits\" : {\n    \"total\" : {\n      \"value\" : 10000,\n      \"relation\" : \"gte\"\n    },\n    \"max_score\" : null,\n    \"hits\" : [ ]\n  },\n  \"aggregations\" : {\n    \"by_city\" : {\n      \"doc_count_error_upper_bound\" : 0,\n      \"sum_other_doc_count\" : 274000,\n      \"buckets\" : [\n        {\n          \"key\" : \"Castelnovo del Friuli\",\n          \"doc_count\" : 4000,\n          \"sum_au\" : {\n            \"value\" : 20000.0\n          }\n        },\n        {\n          \"key\" : \"Roveredo in Piano\",\n          \"doc_count\" : 4000,\n          \"sum_au\" : {\n            \"value\" : 24000.0\n          }\n        },\n        {\n          \"key\" : \"Allentown\",\n          \"doc_count\" : 3000,\n          \"sum_au\" : {\n            \"value\" : 21000.0\n          }\n        },\n        {\n          \"key\" : \"Devon\",\n          \"doc_count\" : 3000,\n          \"sum_au\" : {\n            \"value\" : 26000.0\n          }\n        },\n        {\n          \"key\" : \"Flin Flon\",\n          \"doc_count\" : 3000,\n          \"sum_au\" : {\n            \"value\" : 16000.0\n          }\n        },\n        {\n          \"key\" : \"Acciano\",\n          \"doc_count\" : 2000,\n          \"sum_au\" : {\n            \"value\" : 2000.0\n          }\n  
      },\n        {\n          \"key\" : \"Alphen aan den Rijn\",\n          \"doc_count\" : 2000,\n          \"sum_au\" : {\n            \"value\" : 4000.0\n          }\n        },\n        {\n          \"key\" : \"Al\u00e8s\",\n          \"doc_count\" : 2000,\n          \"sum_au\" : {\n            \"value\" : 14000.0\n          }\n        },\n        {\n          \"key\" : \"Aquila d'Arroscia\",\n          \"doc_count\" : 2000,\n          \"sum_au\" : {\n            \"value\" : 20000.0\n          }\n        },\n        {\n          \"key\" : \"Bendigo\",\n          \"doc_count\" : 2000,\n          \"sum_au\" : {\n            \"value\" : 10000.0\n          }\n        }\n      ]\n    }\n  }\n}\n<\/pre>\n<p>Note that the <em>terms<\/em> aggregation returns only the top 10 buckets by default; <em>sum_other_doc_count<\/em> is the number of documents that fall into the remaining cities.<\/p>\n<p>Useful links:<\/p>\n<ol>\n<li><a href=\"https:\/\/www.elastic.co\/guide\/en\/elastic-stack\/current\/index.html\" target=\"_blank\" rel=\"noopener\">https:\/\/www.elastic.co\/guide\/en\/elastic-stack\/current\/index.html<\/a><\/li>\n<li><a href=\"https:\/\/www.elastic.co\/guide\/en\/kibana\/current\/setup.html\" target=\"_blank\" rel=\"noopener\">https:\/\/www.elastic.co\/guide\/en\/kibana\/current\/setup.html<\/a><\/li>\n<li><a href=\"https:\/\/www.elastic.co\/guide\/en\/kibana\/current\/visualize.html\" target=\"_blank\" rel=\"noopener\">https:\/\/www.elastic.co\/guide\/en\/kibana\/current\/visualize.html<\/a><\/li>\n<\/ol>","protected":false},"excerpt":{"rendered":"<p>Hi! Imagine that we have to analyse a large amount of user data. In this article the data comes in CSV format. 
For example, the data can have the following format: (0118) 008 0694 | 180504 | Hattersheim am Main | 6 The first column is the user&#8217;s phone number, the second is the date &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/b00blik.ru\/tech\/?p=131\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> \u00abHow to analyse CSV data with Elastic stack\u00bb<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[1],"tags":[29,27,28],"class_list":["post-131","post","type-post","status-publish","format-standard","hentry","category-1","tag-data","tag-elastic","tag-elk","entry"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","views":{"total":109,"cached_at":""},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p6oGDv-27","_links":{"self":[{"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=\/wp\/v2\/posts\/131","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=131"}],"
version-history":[{"count":11,"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=\/wp\/v2\/posts\/131\/revisions"}],"predecessor-version":[{"id":563,"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=\/wp\/v2\/posts\/131\/revisions\/563"}],"wp:attachment":[{"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=131"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=131"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/b00blik.ru\/tech\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=131"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}