Dealing with nested data
Dealing with nested data can be really frustrating…
Especially if you want to keep your workspace nice and tidy with all your data in tables!
With no actual experience trying to get at these nested tibbles can seem almost impossible:
Downloading data from an api created by Blizzard
To illustrate how you would deal with nested data I found an api that let’s you download all kinds of data on the e-sport/game called Overwatch. By downloading this game data you could make all kinds of predictions for your fantasy league team etc.
According to a response from the api…
List of api requests possible:
- /rankings - returns current rankings
- /schedule - returns current schedule including past matches
- /matches - Returns all matches
- /matches/:matchID
- /match - Returns all matches
- /match/:matchID - Returns a specific match
- /teams - Returns all teams
- /teams/:TeamID - Returns a specific team
- /news - Returns all news items
- /news/:blogID - Returns a specified news item
- /data/countries - Returns list of countries
- /v2/email - Unsure what this does, but it was in the js
- /live-match - Presumably returns live data for a match?
- /v2/streams - Returns owl stream links
- /maps - Returns list of maps
- /vods - Returns list of vods for prior matches
Get some data…
First we can get some data on the maps played…
data_request <-
httr::GET(url = "https://api.overwatchleague.com/maps")
data_content <-
data_request %>%
httr::content()
What is in here?
data_content[[1]] %>%
names
## [1] "id" "name" "background" "icon"
## [5] "type" "description" "thumbnail" "map"
## [9] "esportsApiId"
Which of these are nested deeper?
data_types <-
as.data.frame(data_content[[1]] %>% map_chr(class)) %>%
setNames("data_type")
data_types
## data_type
## id character
## name list
## background character
## icon character
## type character
## description list
## thumbnail character
## map list
## esportsApiId character
owl_map_df <-
fromJSON("https://api.overwatchleague.com/maps", flatten = TRUE) %>%
select(id,type, esportsApiId,everything())
owl_map_df %>%
names
## [1] "id" "type" "esportsApiId"
## [4] "background" "icon" "thumbnail"
## [7] "name.en_US" "name.es_MX" "name.pt_BR"
## [10] "name.de_DE" "name.en_GB" "name.es_ES"
## [13] "name.fr_FR" "name.it_IT" "name.pl_PL"
## [16] "name.ru_RU" "name.ko_KR" "name.zh_TW"
## [19] "name.zh_CN" "name.ja_JP" "description.en_US"
## [22] "description.es_MX" "description.pt_BR" "description.de_DE"
## [25] "description.en_GB" "description.es_ES" "description.fr_FR"
## [28] "description.it_IT" "description.pl_PL" "description.ru_RU"
## [31] "description.ko_KR" "description.zh_TW" "description.zh_CN"
## [34] "description.ja_JP" "map.href"
This wasn’t too bad… Let’s get some teams!!
Teams?
Can we also build a lookup table for teams?
What does the staging response look like?
owl_teams_df <-
fromJSON("https://api.overwatchleague.com/teams", flatten = TRUE)
owl_teams_df %>%
names
## [1] "id" "availableLanguages" "name"
## [4] "description" "competitors" "game"
## [7] "logo" "owl_divisions" "strings"
Let’s clean this
First we do the teams:
teams_cleaned_df <-
owl_teams_df$competitors %>%
select(competitor.name,competitor.id,competitor.handle,competitor.abbreviatedName) %>%
tbl_df()
teams_cleaned_df
## # A tibble: 12 x 4
## competitor.name competitor.id competitor.handle competitor.abbr…
## * <chr> <int> <chr> <chr>
## 1 Dallas Fuel 4523 fuel.6990 DAL
## 2 Philadelphia Fusion 4524 fusion.2056 PHI
## 3 Houston Outlaws 4525 houston.4749 HOU
## 4 Boston Uprising 4402 boston.8698 BOS
## 5 New York Excelsior 4403 new-york-city.16… NYE
## 6 San Francisco Shock 4404 san-francisco.38… SFS
## 7 Los Angeles Valiant 4405 los-angeles-a.27… VAL
## 8 Los Angeles Gladiators 4406 los-angeles-b.38… GLA
## 9 Florida Mayhem 4407 miami-orlando.68… FLA
## 10 Shanghai Dragons 4408 shanghai.1319 SHD
## 11 Seoul Dynasty 4409 seoul.1029 SEO
## 12 London Spitfire 4410 london.6950 LDN
And now the players!
players_cleaned_df <-
owl_teams_df$competitors$competitor.players %>%
map(~.x %>% select(player.id,team.id,player.availableLanguages,player.handle,player.name,player.homeLocation,player.familyName,player.nationality,player.attributes.heroes,player.attributes.player_number,player.attributes.role)) %>%
reduce(bind_rows) %>%
tbl_df()
players_cleaned_df %>% glimpse
## Observations: 126
## Variables: 11
## $ player.id <int> 3660, 3985, 4658, 3983, 5717, ...
## $ team.id <int> 4523, 4523, 4523, 4523, 4523, ...
## $ player.availableLanguages <list> ["en", "en", <"en", "ko">, "e...
## $ player.handle <chr> "harryhook.5132", "akm.8009", ...
## $ player.name <chr> "HarryHook", "aKm", "Rascal", ...
## $ player.homeLocation <chr> "Asturias", "Angoulême", "Seou...
## $ player.familyName <chr> "Tejedor Rua", "Bignet", "Kim"...
## $ player.nationality <chr> "ES", "FR", "KR", "FR", "KR", ...
## $ player.attributes.heroes <list> [<"lucio", "soldier-76", "zen...
## $ player.attributes.player_number <int> 9, 5, 27, 24, 2, 13, 4, 7, 23,...
## $ player.attributes.role <chr> "support", "offense", "offense...
So with the players we do have some nested fields but these are just vectors…
For now we will do 2 things;
1. We will keep the nested fields because we could unnest and invert the data to uncover relationships between character preferences and win rates for example…
2. We will create 2 new fields to describe these nested fields.
players_cleaned_df %<>%
mutate(player.nr_of_languages = player.availableLanguages %>% map_int(length)) %>%
mutate(player.nr_of_heros = player.attributes.heroes %>% map_int(length))
players_cleaned_df %>%
select(player.id,player.nr_of_languages,player.nr_of_heros)
## # A tibble: 126 x 3
## player.id player.nr_of_languages player.nr_of_heros
## <int> <int> <int>
## 1 3660 1 3
## 2 3985 1 2
## 3 4658 2 3
## 4 3983 1 3
## 5 5717 1 2
## 6 3380 1 3
## 7 4125 1 3
## 8 3969 1 3
## 9 4608 1 3
## 10 4618 1 3
## # ... with 116 more rows
OK great! Now we have some player and team data, let’s start by joining these to get ready for the join to matches played and their results…
Does the ids match?
players_cleaned_df$team.id %in% teams_cleaned_df$competitor.id %>% any()
## [1] TRUE
Yes, so we can join on this:
players_cleaned_df <-
players_cleaned_df %>%
left_join(teams_cleaned_df, by = c("team.id" = "competitor.id"))
players_cleaned_df %>%
glimpse
## Observations: 126
## Variables: 16
## $ player.id <int> 3660, 3985, 4658, 3983, 5717, ...
## $ team.id <int> 4523, 4523, 4523, 4523, 4523, ...
## $ player.availableLanguages <list> ["en", "en", <"en", "ko">, "e...
## $ player.handle <chr> "harryhook.5132", "akm.8009", ...
## $ player.name <chr> "HarryHook", "aKm", "Rascal", ...
## $ player.homeLocation <chr> "Asturias", "Angoulême", "Seou...
## $ player.familyName <chr> "Tejedor Rua", "Bignet", "Kim"...
## $ player.nationality <chr> "ES", "FR", "KR", "FR", "KR", ...
## $ player.attributes.heroes <list> [<"lucio", "soldier-76", "zen...
## $ player.attributes.player_number <int> 9, 5, 27, 24, 2, 13, 4, 7, 23,...
## $ player.attributes.role <chr> "support", "offense", "offense...
## $ player.nr_of_languages <int> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, ...
## $ player.nr_of_heros <int> 3, 2, 3, 3, 2, 3, 3, 3, 3, 3, ...
## $ competitor.name <chr> "Dallas Fuel", "Dallas Fuel", ...
## $ competitor.handle <chr> "fuel.6990", "fuel.6990", "fue...
## $ competitor.abbreviatedName <chr> "DAL", "DAL", "DAL", "DAL", "D...
Seems like we have a complete dataset on teams and their players with some usefull features describing them.
Now we need to get some actual match data!
Get some matches
owl_matches_df <-
fromJSON("https://api.overwatchleague.com/matches", flatten = TRUE)
owl_matches_df %>%
names
## [1] "content" "totalPages" "totalElements"
## [4] "last" "numberOfElements" "size"
## [7] "number" "sort" "first"
Huh… Well, contents is a deep structure; let’s go scavenging for easter eggs
owl_matches_df$content %>%
names
## [1] "id"
## [2] "competitors"
## [3] "scores"
## [4] "conclusionValue"
## [5] "conclusionStrategy"
## [6] "state"
## [7] "games"
## [8] "clientHints"
## [9] "dateCreated"
## [10] "flags"
## [11] "handle"
## [12] "startDate"
## [13] "endDate"
## [14] "showStartTime"
## [15] "showEndTime"
## [16] "winner.id"
## [17] "winner.availableLanguages"
## [18] "winner.handle"
## [19] "winner.name"
## [20] "winner.homeLocation"
## [21] "winner.primaryColor"
## [22] "winner.secondaryColor"
## [23] "winner.game"
## [24] "winner.attributesVersion"
## [25] "winner.abbreviatedName"
## [26] "winner.addressCountry"
## [27] "winner.logo"
## [28] "winner.icon"
## [29] "winner.players"
## [30] "winner.secondaryPhoto"
## [31] "winner.type"
## [32] "winner.attributes.city"
## [33] "winner.attributes.hero_image"
## [34] "winner.attributes.manager"
## [35] "winner.attributes.team_guid"
## [36] "bracket.id"
## [37] "bracket.matchConclusionValue"
## [38] "bracket.matchConclusionStrategy"
## [39] "bracket.winners"
## [40] "bracket.teamSize"
## [41] "bracket.repeatableMatchUps"
## [42] "bracket.type"
## [43] "bracket.clientHints"
## [44] "bracket.advantageComparing"
## [45] "bracket.thirdPlaceMatch"
## [46] "bracket.allowDraw"
## [47] "bracket.stage.id"
## [48] "bracket.stage.availableLanguages"
## [49] "bracket.stage.title"
## [50] "bracket.stage.tournament.id"
## [51] "bracket.stage.tournament.availableLanguages"
## [52] "bracket.stage.tournament.game"
## [53] "bracket.stage.tournament.location"
## [54] "bracket.stage.tournament.featured"
## [55] "bracket.stage.tournament.draft"
## [56] "bracket.stage.tournament.handle"
## [57] "bracket.stage.tournament.title"
## [58] "bracket.stage.tournament.attributesVersion"
## [59] "bracket.stage.tournament.subEvents"
## [60] "bracket.stage.tournament.attributes.program.environment"
## [61] "bracket.stage.tournament.attributes.program.phase"
## [62] "bracket.stage.tournament.attributes.program.season_id"
## [63] "bracket.stage.tournament.attributes.program.type"
## [64] "bracket.stage.tournament.attributes.program.stage.format"
## [65] "bracket.stage.tournament.attributes.program.stage.number"
## [66] "bracket.stage.tournament.series.id"
OK, so this was also a json but the nifty json package has flattened some of these nested results into columns.
What types of columns do we have left?
owl_matches_df$content %>%
map_chr(class) %>%
data.frame() %>%
setNames("var_type")
## var_type
## id integer
## competitors list
## scores list
## conclusionValue integer
## conclusionStrategy character
## state character
## games list
## clientHints list
## dateCreated numeric
## flags list
## handle character
## startDate numeric
## endDate numeric
## showStartTime logical
## showEndTime logical
## winner.id integer
## winner.availableLanguages list
## winner.handle character
## winner.name character
## winner.homeLocation character
## winner.primaryColor character
## winner.secondaryColor character
## winner.game character
## winner.attributesVersion character
## winner.abbreviatedName character
## winner.addressCountry character
## winner.logo character
## winner.icon character
## winner.players list
## winner.secondaryPhoto character
## winner.type character
## winner.attributes.city character
## winner.attributes.hero_image logical
## winner.attributes.manager logical
## winner.attributes.team_guid character
## bracket.id integer
## bracket.matchConclusionValue integer
## bracket.matchConclusionStrategy character
## bracket.winners integer
## bracket.teamSize integer
## bracket.repeatableMatchUps integer
## bracket.type character
## bracket.clientHints list
## bracket.advantageComparing character
## bracket.thirdPlaceMatch logical
## bracket.allowDraw logical
## bracket.stage.id integer
## bracket.stage.availableLanguages list
## bracket.stage.title character
## bracket.stage.tournament.id integer
## bracket.stage.tournament.availableLanguages list
## bracket.stage.tournament.game character
## bracket.stage.tournament.location character
## bracket.stage.tournament.featured logical
## bracket.stage.tournament.draft logical
## bracket.stage.tournament.handle character
## bracket.stage.tournament.title character
## bracket.stage.tournament.attributesVersion character
## bracket.stage.tournament.subEvents list
## bracket.stage.tournament.attributes.program.environment character
## bracket.stage.tournament.attributes.program.phase character
## bracket.stage.tournament.attributes.program.season_id character
## bracket.stage.tournament.attributes.program.type character
## bracket.stage.tournament.attributes.program.stage.format character
## bracket.stage.tournament.attributes.program.stage.number integer
## bracket.stage.tournament.series.id integer
Seems as though the rabit hole goes all the way down…
Many of these features have sub lists and in them sub dataframes…
owl_matches_df$content$scores[[1]]
## value
## 1 0
## 2 4
Scores can be unnested from here and joined to an id
owl_matches_df$content$competitors[[1]]$id
## [1] 4404 4405
It would seem as though the players field contains all the players, not just the players that played since the game is a 6 player first person shooter:
owl_matches_df$content$competitors[[1]]$players
## [[1]]
## flags team.id team.type player.id player.availableLanguages
## 1 NULL 4404 TEAM 5716 en
## 2 NULL 4404 TEAM 4025 en
## 3 NULL 4404 TEAM 4490 en
## 4 NULL 4404 TEAM 4491 en
## 5 NULL 4404 TEAM 4139 en
## 6 NULL 4404 TEAM 4488 en
## 7 NULL 4404 TEAM 4489 en
## 8 NULL 4404 TEAM 4492 en
## 9 NULL 4404 TEAM 3661 en
## 10 NULL 4404 TEAM 4626 en
## 11 NULL 4404 TEAM 5715 en, ko
## player.handle player.name player.homeLocation player.game
## 1 moth.3967 Moth Worcester, MA OVERWATCH
## 2 dhak.8567 dhaK Gijón OVERWATCH
## 3 super.6832 super Philadelphia OVERWATCH
## 4 babybay.7271 BABYBAY Chicago, IL OVERWATCH
## 5 sinatraa.3257 sinatraa Shoreline, WA OVERWATCH
## 6 leepy.5762 sleepy Lakeland, FL OVERWATCH
## 7 danteh.8452 Danteh Chicago, IL OVERWATCH
## 8 nomy.1216 Nomy Tijuana OVERWATCH
## 9 iddqd.9839 iddqd Stockholm OVERWATCH
## 10 nevix.5457 Nevix Helgenäs OVERWATCH
## 11 architect.5497 Architect Seoul OVERWATCH
## player.attributesVersion player.familyName player.givenName
## 1 1.0.0 Espe Grant
## 2 1.0.0 Martinez Paz Daniel
## 3 1.0.0 DeLisi Matthew
## 4 1.0.0 Francisty Andrej
## 5 1.0.0 Won Jay
## 6 1.0.0 Andrews Nikola
## 7 1.0.0 Cruz Dante
## 8 1.0.0 Lizarraga Ramirez Osmar David
## 9 1.0.0 Dahlström André
## 10 1.0.0 Karlsson Andreas
## 11 1.0.0 Park Minho
## player.nationality
## 1 US
## 2 ES
## 3 US
## 4 US
## 5 US
## 6 US
## 7 US
## 8 MX
## 9 SE
## 10 SE
## 11 KR
## player.headshot
## 1 https://bnetcmsus-a.akamaihd.net/cms/page_media/5z/5Z2WUOHSFSBX1521590864746.png
## 2 https://bnetcmsus-a.akamaihd.net/cms/page_media/4Q3H9T4BPAR21512777246918.png
## 3 https://bnetcmsus-a.akamaihd.net/cms/page_media/YUQL4VLXE7PX1512777247831.png
## 4 https://bnetcmsus-a.akamaihd.net/cms/page_media/U9QK0OYK87F01512777246945.png
## 5 https://bnetcmsus-a.akamaihd.net/cms/page_media/GAH625QTQPGC1512777247784.png
## 6 https://bnetcmsus-a.akamaihd.net/cms/page_media/YWYMSL7EOA951512777247847.png
## 7 https://bnetcmsus-a.akamaihd.net/cms/page_media/FD9N9OR9YO7L1512777246926.png
## 8 https://bnetcmsus-a.akamaihd.net/cms/page_media/NFY3Y5MRRO951512777247574.png
## 9 https://bnetcmsus-a.akamaihd.net/cms/page_media/WSFQZ1R6LCJ31512777247534.png
## 10 https://bnetcmsus-a.akamaihd.net/cms/page_media/QCO0N7H8SQFT1512777247525.png
## 11 https://bnetcmsus-a.akamaihd.net/cms/gallery/7YOLZ6HNVS2Y1522979583351.png
## player.type player.attributes.heroes player.attributes.player_number
## 1 PLAYER ana, lucio, mercy 64
## 2 PLAYER lucio, mercy, ana 10
## 3 PLAYER pharah, roadhog 1
## 4 PLAYER soldier-76, mccree, genji 13
## 5 PLAYER tracer, zarya, genji 2
## 6 PLAYER ana, zenyatta, mercy 27
## 7 PLAYER tracer, genji, sombra 6
## 8 PLAYER winston, reinhardt 7
## 9 PLAYER tracer, soldier-76, mccree 66
## 10 PLAYER dva, pharah, junkrat 9
## 11 PLAYER genji, pharah 3
## player.attributes.preferred_slot player.attributes.role
## 1 4 support
## 2 4 support
## 3 1 flex
## 4 0 offense
## 5 0 offense
## 6 5 support
## 7 3 offense
## 8 1 tank
## 9 0 offense
## 10 2 flex
## 11 <NA> offense
## player.attributes.hero_image player.attributes.hometown
## 1 NA <NA>
## 2 NA Gijón, Spain
## 3 NA Philadelphia, Pennsylvania
## 4 NA Chicago, Illionis
## 5 NA Shoreline, Washington
## 6 NA Lakeland, Flordia
## 7 NA Wonder Lake, illinois
## 8 NA Tijuana, Mexico
## 9 NA Stockholm, Sweden
## 10 NA Helgenäs , Sweden
## 11 NA <NA>
##
## [[2]]
## flags team.id team.type player.id player.availableLanguages
## 1 NULL 4405 TEAM 4635 en, ko
## 2 NULL 4405 TEAM 4667 en, ko
## 3 NULL 4405 TEAM 4636 en, ko
## 4 NULL 4405 TEAM 4633 en
## 5 NULL 4405 TEAM 3577 en
## 6 NULL 4405 TEAM 3663 en
## 7 NULL 4405 TEAM 4637 en
## 8 NULL 4405 TEAM 4142 en
## 9 NULL 4405 TEAM 3987 en
## player.handle player.name player.homeLocation player.game
## 1 kariv.3355 Kariv Seoul OVERWATCH
## 2 bunny.8066 Bunny Suwon OVERWATCH
## 3 fate.1459 Fate Seoul OVERWATCH
## 4 verbo.6137 Verbo Toronto OVERWATCH
## 5 custa.8424 Custa Adelaide OVERWATCH
## 6 numlocked.8944 Numlocked Leamington Spa OVERWATCH
## 7 space.7513 Space New York City, NY OVERWATCH
## 8 agilities.5443 Agilities Blairmore OVERWATCH
## 9 soon.8087 Soon Hangest-En-Santerre OVERWATCH
## player.attributesVersion player.familyName player.givenName
## 1 1.0.0 Bak Young-Seo
## 2 1.0.0 Chae Jun-Hyeok
## 3 1.0.0 Koo Pan-Seung
## 4 1.0.0 Disalvo Stefano
## 5 1.0.0 Kennedy Scott
## 6 1.0.0 Barton Sebastian
## 7 1.0.0 Halpern Indy
## 8 1.0.0 Girardi Brady
## 9 1.0.0 Tarlier Terence
## player.nationality
## 1 KR
## 2 KR
## 3 KR
## 4 CA
## 5 AU
## 6 GB
## 7 US
## 8 CA
## 9 FR
## player.headshot
## 1 https://bnetcmsus-a.akamaihd.net/cms/page_media/6QOV26CVECXE1512776701055.png
## 2 https://bnetcmsus-a.akamaihd.net/cms/page_media/S02042JXNHF81515718127993.png
## 3 https://bnetcmsus-a.akamaihd.net/cms/page_media/3SWJQYWMYTV11512776700780.png
## 4 https://bnetcmsus-a.akamaihd.net/cms/page_media/GB1BRHQZ4CLL1512776701508.png
## 5 https://bnetcmsus-a.akamaihd.net/cms/gallery/5BKHVJ9PGEFR1522979605000.png
## 6 https://bnetcmsus-a.akamaihd.net/cms/page_media/H4XYM23GR1061512776701073.png
## 7 https://bnetcmsus-a.akamaihd.net/cms/page_media/BP0XDXDMWZ421512776701354.png
## 8 https://bnetcmsus-a.akamaihd.net/cms/page_media/47PKEMP7FWAQ1512776700755.png
## 9 https://bnetcmsus-a.akamaihd.net/cms/page_media/E29MO86JEHJD1512776701352.png
## player.type player.attributes.hero_image player.attributes.heroes
## 1 PLAYER NA zenyatta, ana, widowmaker
## 2 PLAYER NA tracer, mccree, soldier-76
## 3 PLAYER NA winston, reinhardt, tracer
## 4 PLAYER NA lucio, mercy, ana
## 5 PLAYER NA zenyatta, lucio, mercy
## 6 PLAYER NA winston, reinhardt, dva
## 7 PLAYER NA zarya, roadhog, dva
## 8 PLAYER NA genji, tracer, pharah
## 9 PLAYER NA tracer, widowmaker, mccree
## player.attributes.hometown player.attributes.player_number
## 1 Seoul 7
## 2 <NA> 8
## 3 Seoul 1
## 4 Toronto 13
## 5 <NA> 10
## 6 Leamington Spa 6
## 7 New York City, New York 16
## 8 Blairmore, AB 3
## 9 Hangest-En-Santerre, France 99
## player.attributes.preferred_slot player.attributes.role
## 1 3 support
## 2 1 offense
## 3 0 tank
## 4 4 support
## 5 4 support
## 6 0 tank
## 7 2 flex
## 8 5 offense
## 9 1 offense
We can see if the match has been played:
owl_matches_df$content$state
## [1] "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED"
## [6] "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED"
## [11] "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED"
## [16] "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED" "CONCLUDED"
At this point we can create an initial frame for each of the overall matchups:
matches_cleaned_df <-
owl_matches_df$content %>%
select(id,competitors,scores,state,endDate,winner.name,bracket.id) %>%
dplyr::rename(match_id = id) %>%
mutate(competitors = competitors %>% map(~.x %>% select(id)))
matches_cleaned_df
## match_id competitors scores state endDate
## 1 10223 4404, 4405 0, 4 CONCLUDED 1.515636e+12
## 2 10224 4408, 4406 0, 4 CONCLUDED 1.515641e+12
## 3 10225 4523, 4409 1, 2 CONCLUDED 1.515650e+12
## 4 10226 4410, 4407 3, 1 CONCLUDED 1.515714e+12
## 5 10227 4524, 4525 3, 2 CONCLUDED 1.515721e+12
## 6 10228 4402, 4403 1, 3 CONCLUDED 1.515728e+12
## 7 10229 4405, 4523 3, 0 CONCLUDED 1.515810e+12
## 8 10230 4407, 4402 0, 4 CONCLUDED 1.515816e+12
## 9 10231 4404, 4408 3, 1 CONCLUDED 1.515823e+12
## 10 10232 4410, 4524 4, 0 CONCLUDED 1.515876e+12
## 11 10233 4403, 4525 3, 1 CONCLUDED 1.515883e+12
## 12 10234 4409, 4406 4, 0 CONCLUDED 1.515891e+12
## 13 10235 4404, 4524 1, 2 CONCLUDED 1.516240e+12
## 14 10236 4407, 4409 0, 4 CONCLUDED 1.516247e+12
## 15 10237 4525, 4408 4, 0 CONCLUDED 1.516253e+12
## 16 10238 4523, 4525 0, 4 CONCLUDED 1.516318e+12
## 17 10239 4403, 4405 3, 0 CONCLUDED 1.516327e+12
## 18 10240 4524, 4406 2, 3 CONCLUDED 1.516335e+12
## 19 10241 4409, 4402 4, 0 CONCLUDED 1.516413e+12
## 20 10242 4408, 4407 0, 4 CONCLUDED 1.516419e+12
## winner.name bracket.id
## 1 Los Angeles Valiant 2725
## 2 Los Angeles Gladiators 2725
## 3 Seoul Dynasty 2725
## 4 London Spitfire 2725
## 5 Philadelphia Fusion 2725
## 6 New York Excelsior 2725
## 7 Los Angeles Valiant 2725
## 8 Boston Uprising 2725
## 9 San Francisco Shock 2725
## 10 London Spitfire 2725
## 11 New York Excelsior 2725
## 12 Seoul Dynasty 2725
## 13 Philadelphia Fusion 2725
## 14 Seoul Dynasty 2725
## 15 Houston Outlaws 2725
## 16 Houston Outlaws 2725
## 17 New York Excelsior 2725
## 18 Los Angeles Gladiators 2725
## 19 Seoul Dynasty 2725
## 20 Florida Mayhem 2725
The only problem here is that we have to columns with data frames in them so let’s fix that
One way of tidying this is to throw them together by mapping a bind_cols
function over them and that allows us to use unnest
to go into long format. Think of it as a reduce
into another bind_cols
matches_cleaned_df %<>%
mutate(match_outcome = pmap(matches_cleaned_df,~bind_cols(..2,..3))) %>%
select(-competitors,-scores) %>%
unnest() %>%
dplyr::rename(team.id = id, match_score = value)
# mutate(match_outcome = list(competitors,scores) %>% reduce(bind_cols) %>% list())
Inside each of these matches there are games. This is because the match is decided by the team that wins the most of the rounds of games.
owl_matches_df$content$games[[1]] %>%
names
## [1] "id" "number"
## [3] "points" "attributesVersion"
## [5] "players" "state"
## [7] "stats" "handle"
## [9] "attributes.instanceID" "attributes.map"
## [11] "attributes.mapScore.team1" "attributes.mapScore.team2"
Seems like some useful info here.
owl_matches_df$content$games[[1]] %>%
map_chr(class) %>%
data.frame() %>%
setNames("var_type")
## var_type
## id integer
## number integer
## points list
## attributesVersion character
## players list
## state character
## stats logical
## handle character
## attributes.instanceID character
## attributes.map character
## attributes.mapScore.team1 integer
## attributes.mapScore.team2 integer
It seems as though we can drop a lot of these columns already. And we can wrangle the data so that it is also on a team level not a game level.
Furthermore; the list of players another level down shows only the players that played. This is crucial! This means we can join player info onto this list and record statistics only when a player plays a game…
Another key insight here is that the order these lists were arranged in is team1 first and then team2 second. I guess to normalize the data structure these teams were not named in the games sub data frames.
games_cleaned_df <-
owl_matches_df$content %>%
select(id,games) %>%
dplyr::rename(match_id = id) %>%
mutate(games = games %>% map(~.x %>%
select(
id,
number,
attributes.map,
attributes.mapScore.team1,
attributes.mapScore.team2,
players
) %>%
tidyr::gather(
key = "team_tag",
value = "team_score",
attributes.mapScore.team1,
attributes.mapScore.team2
) %>%
dplyr::rename(game_id = id, game_number = number)
)) %>%
unnest() %>%
mutate(players = players %>% map(~.x %>% select(team.id,player.id,player.name))) %>%
unnest() %>%
tbl_df()
games_cleaned_df %>% glimpse
## Observations: 1,968
## Variables: 9
## $ match_id <int> 10223, 10223, 10223, 10223, 10223, 10223, 10223...
## $ game_id <int> 5599, 5599, 5599, 5599, 5599, 5599, 5599, 5599,...
## $ game_number <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,...
## $ attributes.map <chr> "dorado", "dorado", "dorado", "dorado", "dorado...
## $ team_tag <chr> "attributes.mapScore.team1", "attributes.mapSco...
## $ team_score <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,...
## $ team.id <int> 4405, 4405, 4405, 4405, 4405, 4405, 4404, 4404,...
## $ player.id <int> 4634, 4635, 4142, 3987, 3983, 4636, 4491, 4025,...
## $ player.name <chr> "Envy", "Kariv", "Agilities", "Soon", "Unkoe", ...
Unfortunately the api did not include a clear mapping from games to teams for score. One would have to figure out how to join them or realize that the “team1” was always ontop in the matches hierarchy.
Even more disappointing is the fact that the players have no juicy data in here like kills or assists etc.
If this was available one could immediately start joining these tables and train some machine learning models to predict wins, hero picks, find factors associated with winning etc.
Visualize some data!
Nationality of the players
players_cleaned_df %>%
ggplot(aes(x = player.nationality))+
geom_bar(aes(fill = player.nationality))
nationality by team counting number of players
players_cleaned_df %>%
ggplot(aes(x = competitor.name,group = player.nationality))+
geom_bar(aes(fill = player.nationality))+
coord_flip()
Any correlation between average number of game wins and number of nationalities in a team?
# matches_cleaned_df %>%
games_cleaned_df %>%
group_by(match_id,team.id) %>%
summarise(team_score = sum(team_score,na.rm = TRUE)) %>%
right_join(matches_cleaned_df) %>%
# group_by(match_id, winner.name) %>%
# summarise(nr_wins = max(match_score)) %>%
right_join(players_cleaned_df, by = c("winner.name" = "competitor.name")) %>%
group_by(winner.name) %>%
mutate(nr_uniq_nationalities = unique(player.nationality) %>% length) %>%
summarise(mean_wins = mean(team_score,na.rm = TRUE),
nr_uniq_nationalities = mean(nr_uniq_nationalities,na.rm = TRUE)) %>%
# players_cleaned_df %>%
gather(key = "metric", value = "count",nr_uniq_nationalities,mean_wins) %>%
ggplot(aes(x = winner.name, y = count, group = metric, fill = metric))+
geom_bar(stat = "identity", position = "dodge")+
coord_flip()
## Joining, by = c("match_id", "team.id")
## Warning: Removed 2 rows containing missing values (geom_bar).
Any correlation between average number of game wins and the map being played?
# matches_cleaned_df %>%
games_cleaned_df %>%
group_by(match_id,team.id,attributes.map) %>%
summarise(team_score = sum(team_score,na.rm = TRUE)) %>%
left_join(matches_cleaned_df) %>%
# group_by(match_id, winner.name) %>%
# summarise(nr_wins = max(match_score)) %>%
right_join(players_cleaned_df, by = c("winner.name" = "competitor.name")) %>%
group_by(winner.name,attributes.map) %>%
mutate(nr_uniq_nationalities = unique(player.nationality) %>% length) %>%
summarise(mean_wins = mean(team_score,na.rm = TRUE),
nr_uniq_nationalities = mean(nr_uniq_nationalities,na.rm = TRUE)) %>%
# players_cleaned_df %>%
ggplot(aes(x = attributes.map, y = mean_wins, group = winner.name, fill = winner.name))+
geom_bar(stat = "identity")+
coord_flip()
## Joining, by = c("match_id", "team.id")
## Warning: Removed 2 rows containing missing values (position_stack).