Tuesday 29 December 2020

Loading Australian Football League (AFL) Data into the Elastic Stack with some cool visualizations

I decided to load some AFL data into the Elastic Stack and build some basic visualisations. I loaded data for all home-and-away and finals games since 2017, four seasons in total. Follow the steps below if you want to do the same.

Steps

Note: We already have an Elasticsearch cluster running for this demo.

$ curl -u "elastic:welcome1" localhost:9200
{
  "name" : "node1",
  "cluster_name" : "apples-cluster",
  "cluster_uuid" : "hJrp2eJaRGCfBt7Zg_-EJQ",
  "version" : {
    "number" : "7.10.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
    "build_date" : "2020-11-09T21:30:33.964949Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}  

First I need the data loaded into the Elastic Stack. I got it from the Squiggle API, which you can do as follows.

1. I use HTTPie rather than curl.

http "https://api.squiggle.com.au/?q=games;complete=100" > games-2017-2020.json

2. The data needs to be altered slightly so that I can bulk load it into the Elasticsearch cluster. I use jq to do this, as follows.

$ cat games-2017-2020.json | jq -c '.games[] | {"index": {"_id": .id}}, .' > converted-games-2017-2020.json
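For readers without jq, the same conversion can be sketched in Python. This is a minimal sketch, assuming the downloaded file has a top-level "games" array as the jq filter implies; the function name to_bulk_ndjson and the trimmed sample record are illustrative only and are not part of the Squiggle API or Elasticsearch.

```python
import json


def to_bulk_ndjson(games):
    """Return bulk-format text: an action line followed by the document, per game."""
    lines = []
    for game in games:
        # Action line: index this document, using the game's id as _id.
        lines.append(json.dumps({"index": {"_id": game["id"]}}, separators=(",", ":")))
        # Source line: the game document itself, compacted onto one line.
        lines.append(json.dumps(game, separators=(",", ":")))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative sample shaped like the Squiggle response (heavily trimmed).
sample = {"games": [{"id": 1, "year": 2017, "winner": "Richmond"}]}
bulk_text = to_bulk_ndjson(sample["games"])
print(bulk_text, end="")
```

To convert the real download, you would load games-2017-2020.json with json.load, pass its "games" list to the function, and write the returned text to converted-games-2017-2020.json.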

Snippet of what the JSON file now looks like:

{"index":{"_id":1}}
{"round":1,"hgoals":14,"roundname":"Round 1","hteamid":3,"hscore":89,"winner":"Richmond","ateam":"Richmond","hbehinds":5,"venue":"M.C.G.","year":2017,"complete":100,"id":1,"localtime":"2017-03-23 19:20:00","agoals":20,"date":"2017-03-23 19:20:00","hteam":"Carlton","updated":"2017-04-15 15:59:16","tz":"+11:00","ascore":132,"ateamid":14,"winnerteamid":14,"is_grand_final":0,"abehinds":12,"is_final":0}
{"index":{"_id":2}}
{"date":"2017-03-24 19:50:00","agoals":15,"ateamid":18,"winnerteamid":18,"hteam":"Collingwood","updated":"2017-04-15 15:59:16","tz":"+11:00","ascore":100,"is_grand_final":0,"abehinds":10,"is_final":0,"round":1,"hgoals":12,"hscore":86,"winner":"Western Bulldogs","ateam":"Western Bulldogs","roundname":"Round 1","hteamid":4,"hbehinds":14,"venue":"M.C.G.","year":2017,"complete":100,"id":2,"localtime":"2017-03-24 19:50:00"}
{"index":{"_id":3}}
{"hscore":82,"ateam":"Port Adelaide","winner":"Port Adelaide","roundname":"Round 1","hteamid":16,"round":1,"hgoals":12,"complete":100,"id":3,"localtime":"2017-03-25 16:35:00","venue":"S.C.G.","hbehinds":10,"year":2017,"ateamid":13,"winnerteamid":13,"updated":"2017-04-15 15:59:16","hteam":"Sydney","tz":"+11:00","ascore":110,"date":"2017-03-25 16:35:00","agoals":17,"is_final":0,"is_grand_final":0,"abehinds":8}

Load the data into the Elasticsearch cluster as follows:

$ curl -u "elastic:welcome1" -H "Content-Type: application/json" -XPOST "localhost:9200/afl_games/_bulk?pretty&refresh"  --data-binary "@converted-games-2017-2020.json"
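A bulk request can come back with an HTTP success status even when some documents were rejected, so it is worth checking the "errors" flag and the per-item results in the response body. The following is a minimal sketch, assuming you have parsed the bulk response JSON into a dictionary; the helper name failed_bulk_items and the sample response are made up for illustration.

```python
def failed_bulk_items(response):
    """Return (doc_id, status, error) for each bulk item that reported an error."""
    # The top-level "errors" flag is true if at least one item failed.
    if not response.get("errors"):
        return []
    failures = []
    for item in response.get("items", []):
        # Each item has a single key naming the action (here "index").
        for result in item.values():
            if "error" in result:
                failures.append((result.get("_id"), result.get("status"), result["error"]))
    return failures


# Made-up response in the general shape the bulk API returns, for illustration only.
sample_response = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
    ],
}

for doc_id, status, error in failed_bulk_items(sample_response):
    print(doc_id, status, error["type"])
```

An empty list means every document in the request was accepted.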

3. Using Dev Tools in Kibana, we can run a query as follows.

Question: How many games did each team win in the 2020 season before finals (the final ladder)?

Query:

GET afl_games/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "year": 2020
          }
        },
        {
          "match": {
            "is_final": 0
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_winner": {
      "terms": {
        "field": "winner.keyword",
        "size": 20
      }
    }
  }
}
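To make the query's logic concrete, here is roughly what it computes, sketched in plain Python over a list of game dictionaries: keep only 2020 games that are not finals, then count games per winner. The function name wins_by_team and the sample games are made up, and the tie-break on team name is an assumption about how the terms aggregation orders equal counts.

```python
from collections import Counter


def wins_by_team(games, year):
    """Count wins per team for the non-final games of one season."""
    wins = Counter(
        game["winner"]
        for game in games
        # Same filters as the query: the given year and not a final.
        # Games with no winner value cannot fall into any bucket.
        if game["year"] == year and game["is_final"] == 0 and game.get("winner")
    )
    # Most wins first; ties ordered by team name (assumed tie-break).
    return sorted(wins.items(), key=lambda pair: (-pair[1], pair[0]))


# Tiny made-up sample; the real documents carry many more fields.
sample_games = [
    {"year": 2020, "is_final": 0, "winner": "Richmond"},
    {"year": 2020, "is_final": 0, "winner": "Geelong"},
    {"year": 2020, "is_final": 0, "winner": "Richmond"},
    {"year": 2020, "is_final": 0, "winner": None},        # no winner recorded
    {"year": 2020, "is_final": 1, "winner": "Richmond"},  # a final, excluded
    {"year": 2019, "is_final": 0, "winner": "Geelong"},   # another season, excluded
]

print(wins_by_team(sample_games, 2020))
```

Setting "size": 0 in the query plays the same role as returning only the counts here: the matching documents themselves are not returned, just the aggregation.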

Results:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 153,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_winner" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Brisbane Lions",
          "doc_count" : 14
        },
        {
          "key" : "Port Adelaide",
          "doc_count" : 14
        },
        {
          "key" : "Geelong",
          "doc_count" : 12
        },
        {
          "key" : "Richmond",
          "doc_count" : 12
        },
        {
          "key" : "West Coast",
          "doc_count" : 12
        },
        {
          "key" : "St Kilda",
          "doc_count" : 10
        },
        {
          "key" : "Western Bulldogs",
          "doc_count" : 10
        },
        {
          "key" : "Collingwood",
          "doc_count" : 9
        },
        {
          "key" : "Melbourne",
          "doc_count" : 9
        },
        {
          "key" : "Greater Western Sydney",
          "doc_count" : 8
        },
        {
          "key" : "Carlton",
          "doc_count" : 7
        },
        {
          "key" : "Fremantle",
          "doc_count" : 7
        },
        {
          "key" : "Essendon",
          "doc_count" : 6
        },
        {
          "key" : "Gold Coast",
          "doc_count" : 5
        },
        {
          "key" : "Hawthorn",
          "doc_count" : 5
        },
        {
          "key" : "Sydney",
          "doc_count" : 5
        },
        {
          "key" : "Adelaide",
          "doc_count" : 3
        },
        {
          "key" : "North Melbourne",
          "doc_count" : 3
        }
      ]
    }
  }
}
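One quick check on these numbers: the query matched 153 games, and "sum_other_doc_count" is 0, so no buckets were cut off. Adding up the bucket counts shows how many matched games ended up in no bucket at all. The sketch below copies the counts from the response above; reading the leftover games as draws (games with no winner value) is my inference from the arithmetic, not something the response states.

```python
# Win counts copied from the aggregation buckets in the response.
wins = {
    "Brisbane Lions": 14, "Port Adelaide": 14, "Geelong": 12, "Richmond": 12,
    "West Coast": 12, "St Kilda": 10, "Western Bulldogs": 10, "Collingwood": 9,
    "Melbourne": 9, "Greater Western Sydney": 8, "Carlton": 7, "Fremantle": 7,
    "Essendon": 6, "Gold Coast": 5, "Hawthorn": 5, "Sydney": 5,
    "Adelaide": 3, "North Melbourne": 3,
}
total_games = 153  # hits.total.value from the response

games_with_winner = sum(wins.values())
games_without_winner = total_games - games_with_winner

print(games_with_winner, games_without_winner)  # prints: 151 2
```

So win counts alone do not give the full ladder: any game without a winner would need to be handled separately.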

4. Finally, use Kibana Lens to visualize this data on a Kibana dashboard.


Of course, you could do much more, including loading more data from Squiggle. With the power of Kibana, feel free to create your own visualizations.

More Information

Squiggle API

https://api.squiggle.com.au/

Getting Started with the Elastic Stack

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

Tuesday 22 December 2020

VMware Solutions Hub - Elastic Cloud on Kubernetes - the official Elasticsearch Operator from the creators

Proud to have worked with the VMware Tanzu team and the Elastic team to add this to the VMware Solutions Hub, on a page that clearly highlights what the Elastic Stack on Kubernetes really means.

Do you need to run your Elastic Stack on a certified Kubernetes distribution, bolstered by the global Kubernetes community, so that you can focus on delivering innovative applications powered by Elastic?

If so, click below to get started:

https://tanzu.vmware.com/solutions-hub/data-management/elastic

More Information

https://tanzu.vmware.com/solutions-hub/data-management/elastic