I decided to load some AFL data into the Elastic Stack and build some basic visualisations. I loaded every home-and-away and finals game since 2017, so four seasons in total. Follow the steps below if you want to do the same.
Steps
Note: We already have an Elasticsearch cluster running for this demo.
$ curl -u "elastic:welcome1" localhost:9200
{
  "name" : "node1",
  "cluster_name" : "apples-cluster",
  "cluster_uuid" : "hJrp2eJaRGCfBt7Zg_-EJQ",
  "version" : {
    "number" : "7.10.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
    "build_date" : "2020-11-09T21:30:33.964949Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
First I need the data loaded into the Elastic Stack. I did that using the Squiggle API, as follows.
1. Download the games data. I use HTTPie rather than curl.
http "https://api.squiggle.com.au/?q=games;complete=100" > games-2017-2020.json
2. The data needs to be altered slightly so I can bulk load it into the Elasticsearch cluster: the Bulk API expects each document on its own line, preceded by an action line. I use jq to do this.
$ cat games-2017-2020.json | jq -c '.games[] | {"index": {"_id": .id}}, .' > converted-games-2017-2020.json
A snippet of what the JSON file now looks like:
{"index":{"_id":1}}
{"round":1,"hgoals":14,"roundname":"Round 1","hteamid":3,"hscore":89,"winner":"Richmond","ateam":"Richmond","hbehinds":5,"venue":"M.C.G.","year":2017,"complete":100,"id":1,"localtime":"2017-03-23 19:20:00","agoals":20,"date":"2017-03-23 19:20:00","hteam":"Carlton","updated":"2017-04-15 15:59:16","tz":"+11:00","ascore":132,"ateamid":14,"winnerteamid":14,"is_grand_final":0,"abehinds":12,"is_final":0}
{"index":{"_id":2}}
{"date":"2017-03-24 19:50:00","agoals":15,"ateamid":18,"winnerteamid":18,"hteam":"Collingwood","updated":"2017-04-15 15:59:16","tz":"+11:00","ascore":100,"is_grand_final":0,"abehinds":10,"is_final":0,"round":1,"hgoals":12,"hscore":86,"winner":"Western Bulldogs","ateam":"Western Bulldogs","roundname":"Round 1","hteamid":4,"hbehinds":14,"venue":"M.C.G.","year":2017,"complete":100,"id":2,"localtime":"2017-03-24 19:50:00"}
{"index":{"_id":3}}
{"hscore":82,"ateam":"Port Adelaide","winner":"Port Adelaide","roundname":"Round 1","hteamid":16,"round":1,"hgoals":12,"complete":100,"id":3,"localtime":"2017-03-25 16:35:00","venue":"S.C.G.","hbehinds":10,"year":2017,"ateamid":13,"winnerteamid":13,"updated":"2017-04-15 15:59:16","hteam":"Sydney","tz":"+11:00","ascore":110,"date":"2017-03-25 16:35:00","agoals":17,"is_final":0,"is_grand_final":0,"abehinds":8}
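For anyone without jq installed, the same reshaping can be sketched in Python. This is only an illustration that mirrors the jq filter above, not a replacement for it: the function names `to_bulk_lines` and `convert` are my own, and it assumes the downloaded file has the top-level `games` array that the jq filter reads.

```python
import json


def to_bulk_lines(payload):
    """Yield NDJSON lines for _bulk: an action line, then the game itself."""
    for game in payload["games"]:
        # Action line: index this document under the game's own id.
        yield json.dumps({"index": {"_id": game["id"]}}, separators=(",", ":"))
        # Source line: the game document, unchanged, on a single line.
        yield json.dumps(game, separators=(",", ":"))


def convert(src_path, dest_path):
    """Read a Squiggle games response and write a bulk-ready file."""
    with open(src_path, encoding="utf-8") as src:
        payload = json.load(src)
    with open(dest_path, "w", encoding="utf-8") as dest:
        for line in to_bulk_lines(payload):
            dest.write(line + "\n")  # _bulk needs a newline after every line


# Tiny in-memory check with two trimmed-down games (not the full documents).
sample = {"games": [{"id": 1, "winner": "Richmond"},
                    {"id": 2, "winner": "Western Bulldogs"}]}
for line in to_bulk_lines(sample):
    print(line)
```

To convert the real download, you would call `convert("games-2017-2020.json", "converted-games-2017-2020.json")`.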
Load the data into the Elasticsearch cluster as follows:
$ curl -u "elastic:welcome1" -H "Content-Type: application/json" -XPOST "localhost:9200/afl_games/_bulk?pretty&refresh" --data-binary "@converted-games-2017-2020.json"
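The bulk request prints a long JSON response, and a single failed document is easy to miss in it. As a rough sketch, assuming you redirected that response to a file (the name `bulk-response.json` is hypothetical), something like the following could summarise it. It relies only on the `items` array and the per-item `error` object that the Bulk API returns; the sample response in the code is made up for illustration.

```python
import json


def summarize_bulk_response(response):
    """Return (ok_count, failures) for a parsed _bulk response body."""
    ok_count = 0
    failures = []
    for item in response.get("items", []):
        # Each item has a single key naming the action ("index" here).
        _action, result = next(iter(item.items()))
        if "error" in result:
            err = result["error"]
            reason = err.get("reason") if isinstance(err, dict) else str(err)
            failures.append((result.get("_id"), reason))
        else:
            ok_count += 1
    return ok_count, failures


# Illustrative, hand-written response: one success and one failure.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_index": "afl_games", "_id": "1", "status": 201}},
        {"index": {"_index": "afl_games", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
    ],
}
ok, failed = summarize_bulk_response(sample_response)
print(f"indexed: {ok}, failed: {len(failed)}")
for doc_id, reason in failed:
    print(f"  _id={doc_id}: {reason}")
```

With a saved response you would load it first, for example `summarize_bulk_response(json.load(open("bulk-response.json")))`.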
3. Using Dev Tools in Kibana, we can run a query as follows.
Question: How many games did each team win in the 2020 season before finals (the final ladder)?
Query:
GET afl_games/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "match": { "year": 2020 } },
        { "match": { "is_final": 0 } }
      ]
    }
  },
  "aggs": {
    "group_by_winner": {
      "terms": { "field": "winner.keyword", "size": 20 }
    }
  }
}
Results:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 153, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_winner" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "Brisbane Lions", "doc_count" : 14 },
        { "key" : "Port Adelaide", "doc_count" : 14 },
        { "key" : "Geelong", "doc_count" : 12 },
        { "key" : "Richmond", "doc_count" : 12 },
        { "key" : "West Coast", "doc_count" : 12 },
        { "key" : "St Kilda", "doc_count" : 10 },
        { "key" : "Western Bulldogs", "doc_count" : 10 },
        { "key" : "Collingwood", "doc_count" : 9 },
        { "key" : "Melbourne", "doc_count" : 9 },
        { "key" : "Greater Western Sydney", "doc_count" : 8 },
        { "key" : "Carlton", "doc_count" : 7 },
        { "key" : "Fremantle", "doc_count" : 7 },
        { "key" : "Essendon", "doc_count" : 6 },
        { "key" : "Gold Coast", "doc_count" : 5 },
        { "key" : "Hawthorn", "doc_count" : 5 },
        { "key" : "Sydney", "doc_count" : 5 },
        { "key" : "Adelaide", "doc_count" : 3 },
        { "key" : "North Melbourne", "doc_count" : 3 }
      ]
    }
  }
}
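If you want to sanity-check those counts without the cluster, the same tally can be sketched locally over the downloaded games. This is a rough equivalent of the query, not what Elasticsearch does internally: `wins_by_team` and `ladder` are hypothetical helper names, and I am assuming that games without a winner (draws) carry an empty `winner` value and so fall out of the count. That would fit the results above, where the buckets add up to 151 wins across 153 matching games.

```python
from collections import Counter


def wins_by_team(games, year, include_finals=False):
    """Count wins per team for one season, mirroring the query's two filters."""
    wins = Counter()
    for game in games:
        if game.get("year") != year:
            continue
        if not include_finals and game.get("is_final") != 0:
            continue
        winner = game.get("winner")
        if winner:  # skip games with no recorded winner (assumed to be draws)
            wins[winner] += 1
    return wins


def ladder(wins):
    """Sort by wins descending, then team name, like the buckets shown above."""
    return sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))


# Trimmed-down sample games; only the fields the tally needs are included.
sample_games = [
    {"year": 2017, "is_final": 0, "winner": "Richmond"},
    {"year": 2017, "is_final": 0, "winner": "Western Bulldogs"},
    {"year": 2017, "is_final": 0, "winner": "Richmond"},
    {"year": 2017, "is_final": 0, "winner": None},
    {"year": 2017, "is_final": 1, "winner": "Richmond"},
    {"year": 2018, "is_final": 0, "winner": "Sydney"},
]
for team, count in ladder(wins_by_team(sample_games, 2017)):
    print(team, count)
```

Against the real download you would pass the `games` array from `games-2017-2020.json` and `year=2020`.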
4. Finally, use Kibana Lens to easily visualize this data on a Kibana dashboard.
Of course, you could do much more, including loading more data from Squiggle. With the power of Kibana, feel free to create your own visualizations.
More Information
Squiggle API
Getting Started with the Elastic Stack
https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html