Pronto

 
## Introduction​This notebook is the result of a short investigation of the 2015 Pronto Data Challenge.It is intended to be a demonstration of how IPython/Jupyter notebookscan be easily combined with the pandas, spark and d3 libraries based on previously existing demonstrations.​This notebook contains queries to generate two charts.One explores trip flows between different regions of the Pronto system.The second recreates a chart for a performance metric that has been measured in other bike share systems in the US.​Hopefully this notebook can be used to gain some quick insights into Pronto's first year, but it it is intended more to be a starting point for more queries.

Introduction¶

This notebook is the result of a short investigation of the 2015 Pronto Data Challenge. It is intended to be a demonstration of how IPython/Jupyter notebooks can be easily combined with the pandas, spark and d3 libraries based on previously existing demonstrations.

This notebook contains queries to generate two charts. One explores trip flows between different regions of the Pronto system. The second recreates a chart for a performance metric that has been measured in other bike share systems in the US.

Hopefully this notebook can be used to gain some quick insights into Pronto's first year, but it it is intended more to be a starting point for more queries.

 
## Library Initialization​To begin with, the system needs to be initialized and the libraries loaded.  This notebook relies on a combination of Pandas dataframes and Spark SQL queries to generate data for the D3 javascript and Matplotlib libraries.  It is based on examples that are referenced in the code comments.​This section can be skipped if you are more interested in the results than the underlying technology.

Library Initialization¶

To begin with, the system needs to be initialized and the libraries loaded. This notebook relies on a combination of Pandas dataframes and Spark SQL queries to generate data for the D3 javascript and Matplotlib libraries. It is based on examples that are referenced in the code comments.

This section can be skipped if you are more interested in the results than the underlying technology.

In [1]:

 
# see http://stackoverflow.com/questions/29783520/create-pyspark-profile-for-ipythonimport findsparkfindspark.init()import pysparksc = pyspark.SparkContext(appName="ProntoChallenge")

In [2]:

 
# see http://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontextfrom pyspark.sql import SQLContextsqlContext = SQLContext(sc)

In [3]:

 
import pandas as pdimport numpy as np

 
Load the data.  Pandas currently is currently more convenient than Spark at reading csv files of the format provided by Pronto, so we'll read them as Pandas dataframes.

Load the data. Pandas currently is currently more convenient than Spark at reading csv files of the format provided by Pronto, so we'll read them as Pandas dataframes.

In [4]:

 
station_df = pd.read_csv('2015_station_data.csv')trip_df = pd.read_csv('2015_trip_data.csv')trip_df.dropna()weather_df = pd.read_csv('2015_weather_data.csv')# it's bigger dataset, and not used in this example.#status_df = pd.read_csv('2015_status_data.csv')

 
## Visualization of Inter- and Intra-region Trips​This notebook does not do automated clustering on the data.  But as a quick alternative, it turns out the station ids have a region label in them.  The following lines make that an explicit column in the station table.

Visualization of Inter- and Intra-region Trips¶

This notebook does not do automated clustering on the data. But as a quick alternative, it turns out the station ids have a region label in them. The following lines make that an explicit column in the station table.

In [5]:

 
trip_df.loc[:,'from_region'] = trip_df.loc[:,'from_station_id'].map(lambda r: str(r).split('-')[0])trip_df.loc[:,'to_region'] = trip_df.loc[:,'to_station_id'].map(lambda r: str(r).split('-')[0])station_df.loc[:,'region'] = station_df.loc[:,'terminal'].map(lambda r: str(r).split('-')[0])

 
The Pandas library can be used alone, but in some situations the SQL language is more useful.  The Spark library converts easily between Pandas and Spark SQL Dataframes.  The following are SQL versions of the tables.

The Pandas library can be used alone, but in some situations the SQL language is more useful. The Spark library converts easily between Pandas and Spark SQL Dataframes. The following are SQL versions of the tables.

In [6]:

 
stations_sdf = sqlContext.createDataFrame(station_df)stations_sdf.registerTempTable('stations')

In [7]:

 
trip_sdf = sqlContext.createDataFrame(trip_df)trip_sdf.registerTempTable('trips')

 
The SQL tables can now be used to group and count the trips by region labels.  Pandas could be used to express this query as well, though the specific query would be different.

The SQL tables can now be used to group and count the trips by region labels. Pandas could be used to express this query as well, though the specific query would be different.

In [8]:

 
regions_sql = """    SELECT from_region, to_region, count(*) as trips    FROM trips    GROUP BY from_region, to_region"""region_result = sqlContext.sql(regions_sql)

 
One benefit of Pandas dataframes is they display automatically as a table in Jupyter notebooks.This is just the first five lines of the region result.

One benefit of Pandas dataframes is they display automatically as a table in Jupyter notebooks. This is just the first five lines of the region result.

In [9]:

 
region_result.toPandas()[:5]

Out[9]:

	from_region	to_region	trips
0	DPD	WF	35
1	CH	UW	388
2	UW	UW	2828
3	CH	DPD	411
4	UW	DPD	212

 
This table is easier to interpret as a cross-tabulation.The cross-tabulation will also make it easier to display as a chart later.

This table is easier to interpret as a cross-tabulation. The cross-tabulation will also make it easier to display as a chart later.

In [10]:

 
region_to_from_df = region_result.toPandas()region_crosstab_df = pd.crosstab(trip_df.from_region, trip_df.to_region)region_crosstab_df

Out[10]:

to_region	BT	CBD	CD	CH	DPD	EL	FH	ID	PS	Pronto shop	SLU	UD	UW	WF
from_region
BT	2200	3587	5	728	842	627	49	180	982	0	4809	111	50	1503
CBD	2974	3195	27	1341	184	305	159	608	2174	2	4175	75	31	1544
CD	8	73	13	116	2	1	36	57	82	0	17	3	0	32
CH	2150	6493	236	11698	411	1133	1284	677	1502	4	8105	594	388	355
DPD	524	165	3	112	316	282	12	4	79	0	1212	412	235	35
EL	559	244	1	199	284	1515	22	7	82	0	2473	601	472	71
FH	177	708	48	1345	30	101	231	225	389	0	568	18	10	38
ID	100	482	18	214	6	5	80	137	140	0	132	4	3	143
PS	475	1372	21	288	64	53	99	156	465	0	764	5	0	960
Pronto shop	1	0	0	0	0	0	0	0	0	0	0	0	0	0
SLU	4365	4601	13	1781	1454	2703	141	179	1038	7	14944	513	324	1340
UD	106	69	5	192	361	575	7	1	7	0	475	1984	1784	21
UW	29	77	0	195	212	601	2	1	6	0	309	1731	2828	29
WF	833	1339	16	47	32	27	17	93	1125	0	763	28	23	5560

 
For use with the javascript D3 visualization library, one last format conversion to JSON is helpful.  

For use with the javascript D3 visualization library, one last format conversion to JSON is helpful.

In [11]:

 
import json​# see http://stackoverflow.com/questions/3488934/simplejson-and-numpy-arrayclass NumpyAwareJSONEncoder(json.JSONEncoder):    def default(self, obj):        if isinstance(obj, np.ndarray) and obj.ndim == 1:            return obj.tolist()​region_json = json.dumps(region_crosstab_df.as_matrix().tolist(), cls=NumpyAwareJSONEncoder)region_labels = list(region_crosstab_df.columns.values)

 
The following section is based on a demonstration by Ariel Keselman on integrating the D3 javascript library with IPython/Jupyter notebooks, specifically with chord charts.​Note that in the current version of this notebook, the json matrix and region labels have been copied to the javascript template by hand.

The following section is based on a demonstration by Ariel Keselman on integrating the D3 javascript library with IPython/Jupyter notebooks, specifically with chord charts.

Note that in the current version of this notebook, the json matrix and region labels have been copied to the javascript template by hand.

In [12]:

 
# This section is modified from https://github.com/skariel/IPython_d3_js_demo/blob/master/d3_js_demo.ipynbfrom IPython.display import IFrameimport re​def replace_all(txt,d):    rep = dict((re.escape('{'+k+'}'), str(v)) for k, v in d.items())    pattern = re.compile("|".join(rep.keys()))    return pattern.sub(lambda m: rep[re.escape(m.group(0))], txt)    ​count=0def serve_html(s,w,h):    import os    global count    count+=1    fn= '__tmp'+str(os.getpid())+'_'+str(count)+'.html'    with open(fn,'w') as f:        f.write(s)    fn = '__tmp22236_1.html'    return IFrame('files/'+fn,w,h)​def chord_chart(w=800,                h=800,                ball_count=150,                rad_min=2,                rad_fac=11,                color_count=14):    d={       'width'      :w,       'height'     :h,       'ball_count' :ball_count,       'rad_min'    :rad_min,       'rad_fac'    :rad_fac,       'color_count':color_count       }    with open('chord.template','r') as f:        s=f.read()    s= replace_all(s,d)            return serve_html(s,w+30,h+30)

 
## Note: unfortunately the chart will not render in a static notebook, such as on github.

Note: unfortunately the chart will not render in a static notebook, such as on github.¶

 
The arcs around the perimeter represent the region labels for the stations, and their size represents the number of trips.  The paths between regions represent all the intra-region trips.  The width of the path where it touches a region arc is the number of trips originating in that region.  If you hover over a region trips in other regions will be dimmed.​At first glance this chart has a few noticable features.  The large number of intra-region trips stands out, though several smaller regions are exceptions (First Hill, Belltown and Central Business District.  Some regions, such as UW and UD are tightly connected only to each other.  Capitol Hill does originate more trips to South Lake Union, the Central Business District and Belltown than return, but the effect doesn't dominate the chart.  Pioneer Square has very even connections to other regions, which may be a function of the Second Avenue protected bike lane.

The arcs around the perimeter represent the region labels for the stations, and their size represents the number of trips. The paths between regions represent all the intra-region trips. The width of the path where it touches a region arc is the number of trips originating in that region. If you hover over a region trips in other regions will be dimmed.

At first glance this chart has a few noticable features. The large number of intra-region trips stands out, though several smaller regions are exceptions (First Hill, Belltown and Central Business District. Some regions, such as UW and UD are tightly connected only to each other. Capitol Hill does originate more trips to South Lake Union, the Central Business District and Belltown than return, but the effect doesn't dominate the chart. Pioneer Square has very even connections to other regions, which may be a function of the Second Avenue protected bike lane.

In [13]:

 
chord_chart(ball_count=30, color_count=14, rad_fac=10, rad_min=3, w=800, h=800)

Out[13]:

 
The thing to remember is the size of the regions in the chart above is based on the number of trips.  Another query can show how many stations are in each region.  This query also assigns random colors (different from the chart above) for use later.

The thing to remember is the size of the regions in the chart above is based on the number of trips. Another query can show how many stations are in each region. This query also assigns random colors (different from the chart above) for use later.

In [14]:

 
stations_per_region_sql = """    SELECT region, count(*) as n_stations    FROM stations    GROUP BY region"""stations_per_region = sqlContext.sql(stations_per_region_sql).toPandas()​import randomrandom.seed(20151115)stations_per_region.loc[:,'color'] = stations_per_region.iloc[:,0].map(    lambda n: '#%02X%02X%02X' % (random.randint(0,255),                                 random.randint(0,255),                                 random.randint(0,255)))stations_per_region

Out[14]:

	region	n_stations	color
0	UD	4	#C9BE53
1	UW	6	#7557AA
2	DPD	2	#D84B0A
3	BT	4	#A8C00D
4	ID	1	#3A949E
5	CBD	6	#E861ED
6	CD	1	#CB3C65
7	CH	10	#D64025
8	PS	2	#C1F771
9	WF	2	#7F7B14
10	EL	3	#ED8E33
11	SLU	11	#1D487F
12	FH	2	#B318E2

 
This shows that South Lake Union is a group of only ten stations.  It's the same for Capitol Hill.  UW and the University District combined are also 10 stations.  Pioneer square has only two stations, but they seem to be serving a distinct purpose, as are the four BT stations and the two waterfront stations.​My immediate concern is whether we currently have three network fragments (UW/UD, Capitol Hill and SLU/CBD/BT) rather than one dense network.  The data does have enough information to check how our network is performing compared to some existing networks in the US. This will be the topic of the next chart.

This shows that South Lake Union is a group of only ten stations. It's the same for Capitol Hill. UW and the University District combined are also 10 stations. Pioneer square has only two stations, but they seem to be serving a distinct purpose, as are the four BT stations and the two waterfront stations.

My immediate concern is whether we currently have three network fragments (UW/UD, Capitol Hill and SLU/CBD/BT) rather than one dense network. The data does have enough information to check how our network is performing compared to some existing networks in the US. This will be the topic of the next chart.

 
## 15 Minute Destinations​Analysis of other bike share networks shows that usage statistics rise non-linearly with station density.  http://nacto.org/wp-content/uploads/2015/09/NACTO_Walkable-Station-Spacing-Is-Key-For-Bike-Share_Sc.pdf​The following query simply counts station pairsthat have had a trip of less than 15 minutes.They are then grouped and sorted from most destinations to least. 

15 Minute Destinations¶

Analysis of other bike share networks shows that usage statistics rise non-linearly with station density. http://nacto.org/wp-content/uploads/2015/09/NACTO_Walkable-Station-Spacing-Is-Key-For-Bike-Share_Sc.pdf

The following query simply counts station pairs that have had a trip of less than 15 minutes. They are then grouped and sorted from most destinations to least.

In [15]:

 
stations_within_15_min_sql = """    SELECT DISTINCT from_station_id as station_id,                    from_station_name as station_name,                    count(DISTINCT to_station_id) as destinations    FROM trips    WHERE tripduration <= (15*60)    AND from_station_id != to_station_id    GROUP BY from_station_id, from_station_name    ORDER BY destinations DESC"""n_destinations_df = sqlContext.sql(stations_within_15_min_sql).toPandas()n_destinations_df

Out[15]:

	station_id	station_name	destinations
0	CH-05	15th Ave E & E Thomas St	44
1	CH-02	E Harrison St & Broadway Ave E	43
2	CH-03	Summit Ave E & E Republican St	43
3	CH-01	Summit Ave & E Denny Way	42
4	SLU-01	REI / Yale Ave N & John St	42
5	CH-08	Cal Anderson Park / 11th Ave & Pine St	42
6	CH-15	12th Ave & E Mercer St	42
7	CH-12	Bellevue Ave & E Pine St	41
8	CH-07	E Pine St & 16th Ave	41
9	SLU-16	Pine St & 9th Ave	40
10	CBD-03	7th Ave & Union St	40
11	SLU-15	Westlake Ave & 6th Ave	39
12	CH-06	12th Ave & E Denny Way	39
13	CH-09	Harvard Ave & E Pine St	39
14	CBD-13	2nd Ave & Pine St	39
15	SLU-07	PATH / 9th Ave & Westlake Ave	38
16	SLU-20	Terry Ave & Stewart St	37
17	CBD-06	2nd Ave & Spring St	37
18	BT-05	2nd Ave & Blanchard St	36
19	FH-01	Frye Art Museum / Terry Ave & Columbia St	36
20	CBD-04	Union St & 4th Ave	35
21	BT-04	6th Ave & Blanchard St	35
22	SLU-18	Dexter Ave & Denny Way	34
23	SLU-04	Republican St & Westlake Ave N	34
24	CBD-07	City Hall / 4th Ave & James St	33
25	DPD-01	9th Ave N & Mercer St	33
26	FH-04	Seattle University / E Columbia St & 12th Ave	32
27	BT-03	2nd Ave & Vine St	32
28	BT-01	3rd Ave & Broad St	31
29	CBD-05	1st Ave & Marion St	29
30	SLU-02	Dexter Ave N & Aloha St	29
31	SLU-19	Key Arena / 1st Ave N & Harrison St	29
32	PS-05	King Street Station Plaza / 2nd Ave Extension ...	28
33	PS-04	Occidental Park / Occidental Ave S & S Washing...	28
34	EL-03	E Blaine St & Fairview Ave E	27
35	ID-04	6th Ave S & S King St	27
36	SLU-17	Lake Union Park / Valley St & Boren Ave N	26
37	EL-01	Fred Hutchinson Cancer Research Center / Fairv...	25
38	CD-01	12th Ave & E Yesler Way	23
39	WF-01	Pier 69 / Alaskan Way & Clay St	23
40	WF-04	Seattle Aquarium / Alaskan Way S & Elliott Bay...	19
41	EL-05	Eastlake Ave E & E Allison St	18
42	UD-02	NE 42nd St & University Way NE	16
43	UW-04	15th Ave NE & NE 40th St	14
44	UD-04	12th Ave & NE Campus Pkwy	14
45	UW-02	Burke Museum / E Stevens Way NE & Memorial Way NE	13
46	SLU-21	Mercer St & 9th Ave N	13
47	UD-07	NE 47th St & 12th Ave NE	13
48	UW-10	UW Magnuson Health Sciences Center Rotunda / C...	12
49	UW-07	UW Intramural Activities Building	11
50	UW-06	UW Engineering Library / E Stevens Way NE & Je...	11
51	UD-01	Burke-Gilman Trail / NE Blakeley St & 24th Ave NE	11
52	UW-01	UW McCarty Hall / Whitman Ct	10
53	DPD-03	Children's Hospital / Sandpoint Way NE & 40th ...	6
54	Pronto shop	Pronto shop	1

 
This shows that there are 10 stations that are reasonably well connected (40+ destinations).  The two busy waterfront stations actually only have a mediocre number of destinations (19 and 23).  The University stations have very low connectivity and Children's Hospital is an outlier at the edge of the network.​The graphs in the NACTO paper suggest that stations with 40 destinations could expect to have around 10-30 departures per day.  This can also be checked.

This shows that there are 10 stations that are reasonably well connected (40+ destinations). The two busy waterfront stations actually only have a mediocre number of destinations (19 and 23). The University stations have very low connectivity and Children's Hospital is an outlier at the edge of the network.

The graphs in the NACTO paper suggest that stations with 40 destinations could expect to have around 10-30 departures per day. This can also be checked.

In [17]:

 
departures_per_day_sql = """    SELECT from_station_id as station_id, count(*)/365 as departures    FROM trips    GROUP BY from_station_id    ORDER BY departures DESC"""arrivals_per_day_sql = """    SELECT to_station_id as station_id, count(*)/365 as arrivals    FROM trips    GROUP BY to_station_id    ORDER BY arrivals DESC"""departures_df = sqlContext.sql(departures_per_day_sql).toPandas()arrivals_df = sqlContext.sql(arrivals_per_day_sql).toPandas()n_trips_df = arrivals_df.merge(departures_df).merge(n_destinations_df)n_trips_df.loc[:,'trips_to_from'] = n_trips_df.loc[:,'arrivals'] + n_trips_df.loc[:,'departures']​# merge in region, stations_per_region and region colorn_trips_df.loc[:,'region'] = n_trips_df.loc[:,'station_id'].map(lambda r: str(r).split('-')[0])n_trips_df = n_trips_df.merge(stations_per_region.loc[:,['region','color']], on='region')​n_trips_df

Out[17]:

	station_id	arrivals	departures	station_name	destinations	trips_to_from	region	color
0	WF-01	19.758904	18.471233	Pier 69 / Alaskan Way & Clay St	23	38.230137	WF	#7F7B14
1	WF-04	12.106849	8.660274	Seattle Aquarium / Alaskan Way S & Elliott Bay...	19	20.767123	WF	#7F7B14
2	CBD-13	19.695890	14.753425	2nd Ave & Pine St	39	34.449315	CBD	#E861ED
3	CBD-06	12.512329	8.246575	2nd Ave & Spring St	37	20.758904	CBD	#E861ED
4	CBD-05	11.241096	7.936986	1st Ave & Marion St	29	19.178082	CBD	#E861ED
5	CBD-03	10.846575	8.147945	7th Ave & Union St	40	18.994521	CBD	#E861ED
6	CBD-07	5.295890	5.035616	City Hall / 4th Ave & James St	33	10.331507	CBD	#E861ED
7	CBD-04	1.791781	1.890411	Union St & 4th Ave	35	3.682192	CBD	#E861ED
8	BT-01	15.890411	16.123288	3rd Ave & Broad St	31	32.013699	BT	#A8C00D
9	BT-05	9.476712	9.227397	2nd Ave & Blanchard St	36	18.704110	BT	#A8C00D
10	BT-03	9.276712	11.504110	2nd Ave & Vine St	32	20.780822	BT	#A8C00D
11	BT-04	5.084932	6.084932	6th Ave & Blanchard St	35	11.169863	BT	#A8C00D
12	SLU-07	14.767123	10.213699	PATH / 9th Ave & Westlake Ave	38	24.980822	SLU	#1D487F
13	SLU-15	14.597260	13.715068	Westlake Ave & 6th Ave	39	28.312329	SLU	#1D487F
14	SLU-16	13.265753	9.624658	Pine St & 9th Ave	40	22.890411	SLU	#1D487F
15	SLU-04	12.821918	8.279452	Republican St & Westlake Ave N	34	21.101370	SLU	#1D487F
16	SLU-01	12.079452	12.194521	REI / Yale Ave N & John St	42	24.273973	SLU	#1D487F
17	SLU-19	11.235616	10.315068	Key Arena / 1st Ave N & Harrison St	29	21.550685	SLU	#1D487F
18	SLU-02	9.830137	8.972603	Dexter Ave N & Aloha St	29	18.802740	SLU	#1D487F
19	SLU-17	8.849315	8.243836	Lake Union Park / Valley St & Boren Ave N	26	17.093151	SLU	#1D487F
20	SLU-18	5.610959	7.038356	Dexter Ave & Denny Way	34	12.649315	SLU	#1D487F
21	SLU-20	2.945205	2.704110	Terry Ave & Stewart St	37	5.649315	SLU	#1D487F
22	SLU-21	0.150685	0.213699	Mercer St & 9th Ave N	13	0.364384	SLU	#1D487F
23	PS-04	12.928767	7.701370	Occidental Park / Occidental Ave S & S Washing...	28	20.630137	PS	#C1F771
24	PS-05	9.183562	5.235616	King Street Station Plaza / 2nd Ave Extension ...	28	14.419178	PS	#C1F771
25	DPD-01	9.446575	7.002740	9th Ave N & Mercer St	33	16.449315	DPD	#D84B0A
26	DPD-03	2.054795	2.287671	Children's Hospital / Sandpoint Way NE & 40th ...	6	4.342466	DPD	#D84B0A
27	EL-03	9.389041	7.293151	E Blaine St & Fairview Ave E	27	16.682192	EL	#ED8E33
28	EL-05	6.621918	5.389041	Eastlake Ave E & E Allison St	18	12.010959	EL	#ED8E33
29	EL-01	5.709589	5.208219	Fred Hutchinson Cancer Research Center / Fairv...	25	10.917808	EL	#ED8E33
30	CH-08	8.284932	13.367123	Cal Anderson Park / 11th Ave & Pine St	42	21.652055	CH	#D64025
31	CH-02	8.172603	13.123288	E Harrison St & Broadway Ave E	43	21.295890	CH	#D64025
32	CH-09	5.254795	8.016438	Harvard Ave & E Pine St	39	13.271233	CH	#D64025
33	CH-03	5.208219	8.660274	Summit Ave E & E Republican St	43	13.868493	CH	#D64025
34	CH-12	5.153425	7.021918	Bellevue Ave & E Pine St	41	12.175342	CH	#D64025
35	CH-01	5.087671	8.863014	Summit Ave & E Denny Way	42	13.950685	CH	#D64025
36	CH-07	4.539726	14.219178	E Pine St & 16th Ave	41	18.758904	CH	#D64025
37	CH-05	3.356164	10.347945	15th Ave E & E Thomas St	44	13.704110	CH	#D64025
38	CH-15	3.183562	7.778082	12th Ave & E Mercer St	42	10.961644	CH	#D64025
39	CH-06	1.775342	4.575342	12th Ave & E Denny Way	39	6.350685	CH	#D64025
40	UD-04	6.476712	5.216438	12th Ave & NE Campus Pkwy	14	11.693151	UD	#C9BE53
41	UD-01	5.745205	4.841096	Burke-Gilman Trail / NE Blakeley St & 24th Ave NE	11	10.586301	UD	#C9BE53
42	UD-07	2.693151	3.523288	NE 47th St & 12th Ave NE	13	6.216438	UD	#C9BE53
43	UD-02	1.739726	1.726027	NE 42nd St & University Way NE	16	3.465753	UD	#C9BE53
44	ID-04	6.369863	4.010959	6th Ave S & S King St	27	10.380822	ID	#3A949E
45	UW-04	4.205479	3.761644	15th Ave NE & NE 40th St	14	7.967123	UW	#7557AA
46	UW-06	3.468493	3.758904	UW Engineering Library / E Stevens Way NE & Je...	11	7.227397	UW	#7557AA
47	UW-10	3.153425	2.304110	UW Magnuson Health Sciences Center Rotunda / C...	12	5.457534	UW	#7557AA
48	UW-07	2.769863	2.394521	UW Intramural Activities Building	11	5.164384	UW	#7557AA
49	UW-02	2.030137	2.972603	Burke Museum / E Stevens Way NE & Memorial Way NE	13	5.002740	UW	#7557AA
50	UW-01	1.216438	1.301370	UW McCarty Hall / Whitman Ct	10	2.517808	UW	#7557AA
51	FH-04	3.758904	5.104110	Seattle University / E Columbia St & 12th Ave	32	8.863014	FH	#B318E2
52	FH-01	2.101370	5.547945	Frye Art Museum / Terry Ave & Columbia St	36	7.649315	FH	#B318E2
53	CD-01	1.112329	1.205479	12th Ave & E Yesler Way	23	2.317808	CD	#CB3C65

In [18]:

 
%matplotlib inlineimport matplotlib.pyplot as plt​_, rides = plt.subplots()rides.grid(color='grey',           linestyle='dotted')​scatter = rides.scatter(n_trips_df.loc[:,'destinations'],                        n_trips_df.loc[:,'trips_to_from'],                        c=n_trips_df.loc[:,'color'],                        s = n_trips_df.loc[:,'destinations'])​rides.set_xlabel('other docks within 15 minute ride')rides.set_ylabel('rides per day to/from dock')rides.set_title("15 Minute Rides", size=14);

 
This plot looks similar to the plots in the NACTO paper.  The stations with the most destinations have between 10 and 30 rides per day.  The more remote stations appear to have fewer rides.  Because this notebook did not correct for stations that were added during the year, a few stations such as the new station at 9th and Mercer do not have an accurate rides per day number. ​Note also that the region colors in this plot do not correspond to the colors in the chord chart above, and there is no color/region key.  Still it is possible to see that the different regions have slightly different usage characteristics beyond station density and region size.

This plot looks similar to the plots in the NACTO paper. The stations with the most destinations have between 10 and 30 rides per day. The more remote stations appear to have fewer rides. Because this notebook did not correct for stations that were added during the year, a few stations such as the new station at 9th and Mercer do not have an accurate rides per day number.

Note also that the region colors in this plot do not correspond to the colors in the chord chart above, and there is no color/region key. Still it is possible to see that the different regions have slightly different usage characteristics beyond station density and region size.

 
## Conclusions and Next Steps​This notebook is intended to provides some quick insight into Pronto performance after a year of operation and toin demonstrate some useful technologies for continuing investigation.​The inter-region trip chart seems to show that there is some fragmentation and hill effects in the network, but they are not out of scale when compared to intra-region trips in the larger regions and the well-connected central business district, Belltown and even pioneer square regions.​The 15 minute rides chart shows that our network has similar performance to other bike share systems, but with the caveat that less dense and peripheral stations do not have enough destinations see high usage.​The Pandas, Spark and D3 libraries make it easy to manipulate and visualize this data.  Similar charts can be generated by demographics, time of day or year with slight modifications to the query.  It would be interesting to see the charts with automated region clusters rather than the region labels from the station table.​

Conclusions and Next Steps¶

This notebook is intended to provides some quick insight into Pronto performance after a year of operation and toin demonstrate some useful technologies for continuing investigation.

The inter-region trip chart seems to show that there is some fragmentation and hill effects in the network, but they are not out of scale when compared to intra-region trips in the larger regions and the well-connected central business district, Belltown and even pioneer square regions.

The 15 minute rides chart shows that our network has similar performance to other bike share systems, but with the caveat that less dense and peripheral stations do not have enough destinations see high usage.

The Pandas, Spark and D3 libraries make it easy to manipulate and visualize this data. Similar charts can be generated by demographics, time of day or year with slight modifications to the query. It would be interesting to see the charts with automated region clusters rather than the region labels from the station table.