This notebook is the result of a short investigation of the 2015 Pronto Data Challenge. It is intended to demonstrate how easily IPython/Jupyter notebooks can be combined with the Pandas, Spark and D3 libraries, building on previously existing demonstrations.
This notebook contains queries to generate two charts. One explores trip flows between different regions of the Pronto system. The second recreates a chart for a performance metric that has been measured in other bike share systems in the US.
Hopefully this notebook can be used to gain some quick insights into Pronto's first year, but it is intended more as a starting point for further queries.
This section can be skipped if you are more interested in the results than the underlying technology.
Load the data. Pandas is currently more convenient than Spark at reading CSV files of the format provided by Pronto, so we'll read them as Pandas dataframes.
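As a minimal sketch of this loading step, the snippet below reads two small in-memory CSV samples with `pd.read_csv`. The column names (`station_id`, `name`, `trip_id`, `starttime`, `tripduration`, `from_station_id`, `to_station_id`) and the sample trip rows are assumptions made for illustration; the real Pronto files would be read from disk by path instead.

```python
import io

import pandas as pd

# Invented sample rows; the column names are assumptions about the Pronto files.
STATION_CSV = """station_id,name
CH-05,15th Ave E & E Thomas St
SLU-01,REI / Yale Ave N & John St
"""

TRIP_CSV = """trip_id,starttime,tripduration,from_station_id,to_station_id
1,2014-10-13 10:31:00,985.9,CH-05,SLU-01
2,2014-10-13 10:32:00,926.4,SLU-01,CH-05
3,2014-10-13 10:35:00,410.0,CH-05,CH-05
"""

# With real files, pass the file path instead of a StringIO buffer.
stations = pd.read_csv(io.StringIO(STATION_CSV))
trips = pd.read_csv(io.StringIO(TRIP_CSV), parse_dates=["starttime"])

print(stations)
print(trips.dtypes)
```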
This notebook does not do automated clustering on the data. But as a quick alternative, it turns out the station ids have a region label in them. The following lines make that an explicit column in the station table.
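One way to make the region label explicit, assuming the ids follow a `REGION-NN` pattern and the id column is called `station_id` (an assumed name), is to split on the hyphen and keep the first part:

```python
import pandas as pd

# A few station ids taken from the tables in this notebook.
stations = pd.DataFrame(
    {"station_id": ["CH-05", "SLU-01", "CBD-03", "UW-04", "Pronto shop"]}
)

# Everything before the first hyphen becomes the region label.
# Ids without a hyphen (such as "Pronto shop") keep their full text.
stations["region"] = stations["station_id"].str.split("-").str[0]

print(stations)
```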
The Pandas library can be used alone, but in some situations the SQL language is more useful. The Spark library converts easily between Pandas and Spark SQL Dataframes. The following are SQL versions of the tables.
The SQL tables can now be used to group and count the trips by region labels. Pandas could be used to express this query as well, though the specific query would be different.
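The sketch below illustrates the kind of SQL grouping described here. It is not the Spark SQL code: to stay self-contained it swaps in Python's built-in `sqlite3` as the SQL engine, and the table names, column names and sample trips are assumptions. The query joins each trip to the station table twice, once for the origin and once for the destination, then counts trips per region pair.

```python
import sqlite3

import pandas as pd

# Invented sample data; column names are assumptions.
stations = pd.DataFrame(
    {"station_id": ["CH-05", "CH-02", "SLU-01"], "region": ["CH", "CH", "SLU"]}
)
trips = pd.DataFrame(
    {
        "from_station_id": ["CH-05", "CH-02", "CH-05", "SLU-01"],
        "to_station_id": ["CH-02", "CH-05", "SLU-01", "CH-05"],
    }
)

# sqlite3 stands in for Spark SQL here; the SQL itself is generic.
conn = sqlite3.connect(":memory:")
stations.to_sql("stations", conn, index=False)
trips.to_sql("trips", conn, index=False)

query = """
SELECT f.region AS from_region, t.region AS to_region, COUNT(*) AS n_trips
FROM trips
JOIN stations AS f ON trips.from_station_id = f.station_id
JOIN stations AS t ON trips.to_station_id = t.station_id
GROUP BY f.region, t.region
ORDER BY n_trips DESC, from_region, to_region
"""
region_trips = pd.read_sql_query(query, conn)
conn.close()

print(region_trips)
```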
One benefit of Pandas dataframes is they display automatically as a table in Jupyter notebooks. This is just the first five lines of the region result.
This table is easier to interpret as a cross-tabulation. The cross-tabulation will also make it easier to display as a chart later.
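A cross-tabulation of this kind can be sketched with `DataFrame.pivot_table`, assuming a long-format result with the hypothetical columns `from_region`, `to_region` and `n_trips`. Region pairs with no trips are filled with zero so the result is a complete matrix, which is the shape a chord chart needs.

```python
import pandas as pd

# Long-format counts per region pair (invented sample values).
region_trips = pd.DataFrame(
    {
        "from_region": ["CH", "CH", "SLU"],
        "to_region": ["CH", "SLU", "CH"],
        "n_trips": [2, 1, 1],
    }
)

# Rows are origin regions, columns are destination regions.
matrix = region_trips.pivot_table(
    index="from_region",
    columns="to_region",
    values="n_trips",
    aggfunc="sum",
    fill_value=0,
)

print(matrix)
```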
The arcs around the perimeter represent the region labels for the stations, and their size represents the number of trips. The paths between regions represent all the inter-region trips. The width of the path where it touches a region arc is the number of trips originating in that region. If you hover over a region, trips in other regions will be dimmed.
At first glance this chart has a few noticeable features. The large number of intra-region trips stands out, though several smaller regions are exceptions (First Hill, Belltown and Central Business District). Some regions, such as UW and UD, are tightly connected only to each other. Capitol Hill does originate more trips to South Lake Union, the Central Business District and Belltown than return, but the effect doesn't dominate the chart. Pioneer Square has very even connections to other regions, which may be a function of the Second Avenue protected bike lane.
The thing to remember is the size of the regions in the chart above is based on the number of trips. Another query can show how many stations are in each region. This query also assigns random colors (different from the chart above) for use later.
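A possible Pandas version of this station count, with a random hex color attached to each region, is sketched below. The column names and the fixed random seed are assumptions added for illustration.

```python
import random

import pandas as pd

# Sample station table with an explicit region column (assumed names).
stations = pd.DataFrame(
    {
        "station_id": ["CH-05", "CH-02", "SLU-01", "UW-04"],
        "region": ["CH", "CH", "SLU", "UW"],
    }
)

# Number of stations per region.
counts = stations.groupby("region").size().reset_index(name="n_stations")

# One random "#RRGGBB" color per region; seeded so reruns give the same colors.
rng = random.Random(0)
counts["color"] = [
    "#{:06X}".format(rng.randrange(0x1000000)) for _ in range(len(counts))
]

print(counts)
```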
This shows that South Lake Union is a group of only ten stations, as is Capitol Hill. UW and the University District combined also have ten stations. Pioneer Square has only two stations, but they seem to serve a distinct purpose, as do the four Belltown (BT) stations and the two waterfront stations.
My immediate concern is whether we currently have three network fragments (UW/UD, Capitol Hill and SLU/CBD/BT) rather than one dense network. The data does have enough information to check how our network is performing compared to some existing networks in the US. This will be the topic of the next chart.
Analysis of other bike share networks shows that usage statistics rise non-linearly with station density. http://nacto.org/wp-content/uploads/2015/09/NACTO_Walkable-Station-Spacing-Is-Key-For-Bike-Share_Sc.pdf
The following query simply counts station pairs that have had a trip of less than 15 minutes. They are then grouped and sorted from most destinations to least.
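A Pandas sketch of this count is shown below. It assumes a `tripduration` column measured in seconds and the hypothetical column names `from_station_id` and `to_station_id`. Whether the original query also counted trips that start and end at the same station is not stated, so this version simply counts every distinct destination.

```python
import pandas as pd

# Invented sample trips; durations are assumed to be in seconds.
trips = pd.DataFrame(
    {
        "from_station_id": ["CH-05", "CH-05", "CH-05", "CH-02", "CH-02"],
        "to_station_id": ["CH-02", "SLU-01", "CH-02", "CH-05", "SLU-01"],
        "tripduration": [300, 600, 400, 1200, 500],
    }
)

# Keep trips shorter than 15 minutes.
short = trips[trips["tripduration"] < 15 * 60]

# Each (origin, destination) pair counts once, however many trips it had.
pairs = short[["from_station_id", "to_station_id"]].drop_duplicates()

# Number of distinct destinations per origin, most connected first.
destinations = (
    pairs.groupby("from_station_id")
    .size()
    .sort_values(ascending=False)
    .reset_index(name="destinations")
)

print(destinations)
```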
| | Station ID | Station name | Destinations |
|---|---|---|---|
| 0 | CH-05 | 15th Ave E & E Thomas St | 44 |
| 1 | CH-02 | E Harrison St & Broadway Ave E | 43 |
| 2 | CH-03 | Summit Ave E & E Republican St | 43 |
| 3 | CH-01 | Summit Ave & E Denny Way | 42 |
| 4 | SLU-01 | REI / Yale Ave N & John St | 42 |
| 5 | CH-08 | Cal Anderson Park / 11th Ave & Pine St | 42 |
| 6 | CH-15 | 12th Ave & E Mercer St | 42 |
| 7 | CH-12 | Bellevue Ave & E Pine St | 41 |
| 8 | CH-07 | E Pine St & 16th Ave | 41 |
| 9 | SLU-16 | Pine St & 9th Ave | 40 |
| 10 | CBD-03 | 7th Ave & Union St | 40 |
| 11 | SLU-15 | Westlake Ave & 6th Ave | 39 |
| 12 | CH-06 | 12th Ave & E Denny Way | 39 |
| 13 | CH-09 | Harvard Ave & E Pine St | 39 |
| 14 | CBD-13 | 2nd Ave & Pine St | 39 |
| 15 | SLU-07 | PATH / 9th Ave & Westlake Ave | 38 |
| 16 | SLU-20 | Terry Ave & Stewart St | 37 |
| 17 | CBD-06 | 2nd Ave & Spring St | 37 |
| 18 | BT-05 | 2nd Ave & Blanchard St | 36 |
| 19 | FH-01 | Frye Art Museum / Terry Ave & Columbia St | 36 |
| 20 | CBD-04 | Union St & 4th Ave | 35 |
| 21 | BT-04 | 6th Ave & Blanchard St | 35 |
| 22 | SLU-18 | Dexter Ave & Denny Way | 34 |
| 23 | SLU-04 | Republican St & Westlake Ave N | 34 |
| 24 | CBD-07 | City Hall / 4th Ave & James St | 33 |
| 25 | DPD-01 | 9th Ave N & Mercer St | 33 |
| 26 | FH-04 | Seattle University / E Columbia St & 12th Ave | 32 |
| 27 | BT-03 | 2nd Ave & Vine St | 32 |
| 28 | BT-01 | 3rd Ave & Broad St | 31 |
| 29 | CBD-05 | 1st Ave & Marion St | 29 |
| 30 | SLU-02 | Dexter Ave N & Aloha St | 29 |
| 31 | SLU-19 | Key Arena / 1st Ave N & Harrison St | 29 |
| 32 | PS-05 | King Street Station Plaza / 2nd Ave Extension ... | 28 |
| 33 | PS-04 | Occidental Park / Occidental Ave S & S Washing... | 28 |
| 34 | EL-03 | E Blaine St & Fairview Ave E | 27 |
| 35 | ID-04 | 6th Ave S & S King St | 27 |
| 36 | SLU-17 | Lake Union Park / Valley St & Boren Ave N | 26 |
| 37 | EL-01 | Fred Hutchinson Cancer Research Center / Fairv... | 25 |
| 38 | CD-01 | 12th Ave & E Yesler Way | 23 |
| 39 | WF-01 | Pier 69 / Alaskan Way & Clay St | 23 |
| 40 | WF-04 | Seattle Aquarium / Alaskan Way S & Elliott Bay... | 19 |
| 41 | EL-05 | Eastlake Ave E & E Allison St | 18 |
| 42 | UD-02 | NE 42nd St & University Way NE | 16 |
| 43 | UW-04 | 15th Ave NE & NE 40th St | 14 |
| 44 | UD-04 | 12th Ave & NE Campus Pkwy | 14 |
| 45 | UW-02 | Burke Museum / E Stevens Way NE & Memorial Way NE | 13 |
| 46 | SLU-21 | Mercer St & 9th Ave N | 13 |
| 47 | UD-07 | NE 47th St & 12th Ave NE | 13 |
| 48 | UW-10 | UW Magnuson Health Sciences Center Rotunda / C... | 12 |
| 49 | UW-07 | UW Intramural Activities Building | 11 |
| 50 | UW-06 | UW Engineering Library / E Stevens Way NE & Je... | 11 |
| 51 | UD-01 | Burke-Gilman Trail / NE Blakeley St & 24th Ave NE | 11 |
| 52 | UW-01 | UW McCarty Hall / Whitman Ct | 10 |
| 53 | DPD-03 | Children's Hospital / Sandpoint Way NE & 40th ... | 6 |
| 54 | Pronto shop | Pronto shop | 1 |
This shows that there are 11 stations that are reasonably well connected (40 or more destinations). The two busy waterfront stations actually have only a mediocre number of destinations (19 and 23). The University stations have very low connectivity, and Children's Hospital is an outlier at the edge of the network.
The graphs in the NACTO paper suggest that stations with 40 destinations could expect to have around 10-30 departures per day. This can also be checked.
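A rough way to run this check is sketched below: count departures and arrivals per station, then divide by the length of the observation window. The 365-day window, the column names and the sample trips are assumptions, and the sketch makes no correction for stations that were open for only part of the period.

```python
import pandas as pd

DAYS = 365  # assumed length of the observation window

# Invented sample trips; column names are assumptions.
trips = pd.DataFrame(
    {
        "from_station_id": ["CH-05", "CH-05", "SLU-01"],
        "to_station_id": ["SLU-01", "CH-02", "CH-05"],
    }
)

# Total departures and arrivals per station.
departures = trips["from_station_id"].value_counts()
arrivals = trips["to_station_id"].value_counts()

# Align on station id; a station missing from one side gets zero.
per_day = (
    pd.DataFrame({"departures": departures, "arrivals": arrivals}).fillna(0) / DAYS
)
per_day["rides_per_day"] = per_day["departures"] + per_day["arrivals"]

print(per_day.sort_values("rides_per_day", ascending=False))
```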
|0||WF-01||19.758904||18.471233||Pier 69 / Alaskan Way & Clay St||23||38.230137||WF||#7F7B14|
|1||WF-04||12.106849||8.660274||Seattle Aquarium / Alaskan Way S & Elliott Bay...||19||20.767123||WF||#7F7B14|
|2||CBD-13||19.695890||14.753425||2nd Ave & Pine St||39||34.449315||CBD||#E861ED|
|3||CBD-06||12.512329||8.246575||2nd Ave & Spring St||37||20.758904||CBD||#E861ED|
|4||CBD-05||11.241096||7.936986||1st Ave & Marion St||29||19.178082||CBD||#E861ED|
|5||CBD-03||10.846575||8.147945||7th Ave & Union St||40||18.994521||CBD||#E861ED|
|6||CBD-07||5.295890||5.035616||City Hall / 4th Ave & James St||33||10.331507||CBD||#E861ED|
|7||CBD-04||1.791781||1.890411||Union St & 4th Ave||35||3.682192||CBD||#E861ED|
|8||BT-01||15.890411||16.123288||3rd Ave & Broad St||31||32.013699||BT||#A8C00D|
|9||BT-05||9.476712||9.227397||2nd Ave & Blanchard St||36||18.704110||BT||#A8C00D|
|10||BT-03||9.276712||11.504110||2nd Ave & Vine St||32||20.780822||BT||#A8C00D|
|11||BT-04||5.084932||6.084932||6th Ave & Blanchard St||35||11.169863||BT||#A8C00D|
|12||SLU-07||14.767123||10.213699||PATH / 9th Ave & Westlake Ave||38||24.980822||SLU||#1D487F|
|13||SLU-15||14.597260||13.715068||Westlake Ave & 6th Ave||39||28.312329||SLU||#1D487F|
|14||SLU-16||13.265753||9.624658||Pine St & 9th Ave||40||22.890411||SLU||#1D487F|
|15||SLU-04||12.821918||8.279452||Republican St & Westlake Ave N||34||21.101370||SLU||#1D487F|
|16||SLU-01||12.079452||12.194521||REI / Yale Ave N & John St||42||24.273973||SLU||#1D487F|
|17||SLU-19||11.235616||10.315068||Key Arena / 1st Ave N & Harrison St||29||21.550685||SLU||#1D487F|
|18||SLU-02||9.830137||8.972603||Dexter Ave N & Aloha St||29||18.802740||SLU||#1D487F|
|19||SLU-17||8.849315||8.243836||Lake Union Park / Valley St & Boren Ave N||26||17.093151||SLU||#1D487F|
|20||SLU-18||5.610959||7.038356||Dexter Ave & Denny Way||34||12.649315||SLU||#1D487F|
|21||SLU-20||2.945205||2.704110||Terry Ave & Stewart St||37||5.649315||SLU||#1D487F|
|22||SLU-21||0.150685||0.213699||Mercer St & 9th Ave N||13||0.364384||SLU||#1D487F|
|23||PS-04||12.928767||7.701370||Occidental Park / Occidental Ave S & S Washing...||28||20.630137||PS||#C1F771|
|24||PS-05||9.183562||5.235616||King Street Station Plaza / 2nd Ave Extension ...||28||14.419178||PS||#C1F771|
|25||DPD-01||9.446575||7.002740||9th Ave N & Mercer St||33||16.449315||DPD||#D84B0A|
|26||DPD-03||2.054795||2.287671||Children's Hospital / Sandpoint Way NE & 40th ...||6||4.342466||DPD||#D84B0A|
|27||EL-03||9.389041||7.293151||E Blaine St & Fairview Ave E||27||16.682192||EL||#ED8E33|
|28||EL-05||6.621918||5.389041||Eastlake Ave E & E Allison St||18||12.010959||EL||#ED8E33|
|29||EL-01||5.709589||5.208219||Fred Hutchinson Cancer Research Center / Fairv...||25||10.917808||EL||#ED8E33|
|30||CH-08||8.284932||13.367123||Cal Anderson Park / 11th Ave & Pine St||42||21.652055||CH||#D64025|
|31||CH-02||8.172603||13.123288||E Harrison St & Broadway Ave E||43||21.295890||CH||#D64025|
|32||CH-09||5.254795||8.016438||Harvard Ave & E Pine St||39||13.271233||CH||#D64025|
|33||CH-03||5.208219||8.660274||Summit Ave E & E Republican St||43||13.868493||CH||#D64025|
|34||CH-12||5.153425||7.021918||Bellevue Ave & E Pine St||41||12.175342||CH||#D64025|
|35||CH-01||5.087671||8.863014||Summit Ave & E Denny Way||42||13.950685||CH||#D64025|
|36||CH-07||4.539726||14.219178||E Pine St & 16th Ave||41||18.758904||CH||#D64025|
|37||CH-05||3.356164||10.347945||15th Ave E & E Thomas St||44||13.704110||CH||#D64025|
|38||CH-15||3.183562||7.778082||12th Ave & E Mercer St||42||10.961644||CH||#D64025|
|39||CH-06||1.775342||4.575342||12th Ave & E Denny Way||39||6.350685||CH||#D64025|
|40||UD-04||6.476712||5.216438||12th Ave & NE Campus Pkwy||14||11.693151||UD||#C9BE53|
|41||UD-01||5.745205||4.841096||Burke-Gilman Trail / NE Blakeley St & 24th Ave NE||11||10.586301||UD||#C9BE53|
|42||UD-07||2.693151||3.523288||NE 47th St & 12th Ave NE||13||6.216438||UD||#C9BE53|
|43||UD-02||1.739726||1.726027||NE 42nd St & University Way NE||16||3.465753||UD||#C9BE53|
|44||ID-04||6.369863||4.010959||6th Ave S & S King St||27||10.380822||ID||#3A949E|
|45||UW-04||4.205479||3.761644||15th Ave NE & NE 40th St||14||7.967123||UW||#7557AA|
|46||UW-06||3.468493||3.758904||UW Engineering Library / E Stevens Way NE & Je...||11||7.227397||UW||#7557AA|
|47||UW-10||3.153425||2.304110||UW Magnuson Health Sciences Center Rotunda / C...||12||5.457534||UW||#7557AA|
|48||UW-07||2.769863||2.394521||UW Intramural Activities Building||11||5.164384||UW||#7557AA|
|49||UW-02||2.030137||2.972603||Burke Museum / E Stevens Way NE & Memorial Way NE||13||5.002740||UW||#7557AA|
|50||UW-01||1.216438||1.301370||UW McCarty Hall / Whitman Ct||10||2.517808||UW||#7557AA|
|51||FH-04||3.758904||5.104110||Seattle University / E Columbia St & 12th Ave||32||8.863014||FH||#B318E2|
|52||FH-01||2.101370||5.547945||Frye Art Museum / Terry Ave & Columbia St||36||7.649315||FH||#B318E2|
|53||CD-01||1.112329||1.205479||12th Ave & E Yesler Way||23||2.317808||CD||#CB3C65|
This plot looks similar to the plots in the NACTO paper. The stations with the most destinations have between 10 and 30 rides per day. The more remote stations appear to have fewer rides. Because this notebook did not correct for stations that were added during the year, a few stations, such as the new station at 9th and Mercer, do not have an accurate rides-per-day number.
Note also that the region colors in this plot do not correspond to the colors in the chord chart above, and there is no color/region key. Still, it is possible to see that the different regions have slightly different usage characteristics beyond station density and region size.
This notebook is intended to provide some quick insight into Pronto performance after a year of operation and to demonstrate some useful technologies for continuing investigation.
The inter-region trip chart seems to show that there are some fragmentation and hill effects in the network, but they are not out of scale when compared to intra-region trips in the larger regions and the well-connected Central Business District, Belltown and even Pioneer Square regions.
The 15-minute rides chart shows that our network has similar performance to other bike share systems, but with the caveat that less dense and peripheral stations do not have enough destinations to see high usage.
The Pandas, Spark and D3 libraries make it easy to manipulate and visualize this data. Similar charts can be generated by demographics, time of day or year with slight modifications to the query. It would be interesting to see the charts with automated region clusters rather than the region labels from the station table.