Using Vue.js in a Jupyter notebook
Objectives
- Generate data by crawling the Canada's Top 100 website
- Use Scrapy for the crawling
- Use Pandas to prepare the data
- Integrate VueJS in a notebook
- Create a simple table with filter functionality
Scraping data
Approach
To scrape the data, we will use the Scrapy library. Instead of writing our own scraper, it is faster for this tutorial to use a library that was built for scraping. The crawl proceeds as follows:
- Load the main page
- Find all company links
- For each company link, open the corresponding page
- For each company page, find all ratings
Markup for companies links
<div id="winners" class="page-section">
...
<li><span><a target="_blank" href="http://content.eluta.ca/top-employer-3m-canada">3M Canada Company</a></span></li>
...
</div>
This corresponds to the selector used in the parse method of the CompanySpider class:
for href in response.css('div#winners a::attr(href)').extract():
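To make the selector concrete, here is a minimal sketch of what `div#winners a::attr(href)` picks out of the markup above. It deliberately swaps Scrapy for the standard library's `html.parser` so that it runs without extra packages. The `WinnerLinkParser` class and the second `div` with the `example.com` link are hypothetical additions for illustration, not part of the original page or of Scrapy.

```python
from html.parser import HTMLParser

# Sample markup based on the snippet above (the "..." lines are omitted).
# The second <div> is a made-up addition to show that links outside
# div#winners are ignored.
SAMPLE = """
<div id="winners" class="page-section">
<li><span><a target="_blank" href="http://content.eluta.ca/top-employer-3m-canada">3M Canada Company</a></span></li>
</div>
<div id="other"><a href="http://example.com/ignored">Ignored</a></div>
"""


class WinnerLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div id="winners">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <div> nesting depth inside div#winners (0 = outside)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1          # nested div inside the winners block
            elif attrs.get("id") == "winners":
                self.depth = 1           # entering the winners block
        elif tag == "a" and self.depth > 0 and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


parser = WinnerLinkParser()
parser.feed(SAMPLE)
print(parser.links)
```

In the spider itself Scrapy does this work in a single CSS expression, which is one reason for using the library rather than a hand-written parser.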
Markup for ratings
<h3 class="rating-row">
<span class="nocolor">Physical Workplace</span>
<span class="rating">
<span class="score" title="Great-West Life Assurance Company, The's physical workplace is rated as exceptional. ">A+</span>
</span>
</h3>
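The rating markup can be read the same way: each `h3.rating-row` yields one title (the `span.nocolor` text) and one value (the `span.score` text). The following is a sketch under the assumption that every row follows the structure shown above. It uses the standard library's `html.parser` as a stand-in for Scrapy's selectors, and `RatingParser` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# The rating row from the snippet above.
SAMPLE = """
<h3 class="rating-row">
<span class="nocolor">Physical Workplace</span>
<span class="rating">
<span class="score" title="Great-West Life Assurance Company, The's physical workplace is rated as exceptional. ">A+</span>
</span>
</h3>
"""


class RatingParser(HTMLParser):
    """Turn each <h3 class="rating-row"> into a {'title': ..., 'value': ...} dict."""

    def __init__(self):
        super().__init__()
        self.in_row = False
        self.field = None    # which key the next text chunk belongs to
        self.current = {}
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "rating-row" in classes:
            self.in_row = True
            self.current = {}
        elif tag == "span" and self.in_row:
            if "nocolor" in classes:
                self.field = "title"
            elif "score" in classes:
                self.field = "value"

    def handle_data(self, data):
        # Ignore whitespace-only text between tags.
        if self.in_row and self.field and data.strip():
            self.current[self.field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None
        elif tag == "h3" and self.in_row:
            self.ratings.append(self.current)
            self.in_row = False


parser = RatingParser()
parser.feed(SAMPLE)
print(parser.ratings)
```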
Python crawler
The crawler in Scrapy is defined in the following code snippet.
import logging
import scrapy
from scrapy.crawler import CrawlerProcess
class CompanySpider(scrapy.Spider):
    name = "companies"
    start_urls = [
        "http://www.canadastop100.com/national/"
    ]
    # Silence the log output and write the scraped items to a JSON feed.
    custom_settings = {
        'LOG_LEVEL': logging.CRITICAL,
        'FEED_FORMAT': 'json',
        'FEED_URI': 'canadastop100.json'
    }

    def parse(self, response):
        # Follow every company link found in the winners section.
        for href in response.css('div#winners a::attr(href)').extract():
            yield scrapy.Request(response.urljoin(href),
                                 callback=self.parse_company)

    def parse_company(self, response):
        name = response.css('div.side-panel-wrap div.widget h4::text').extract_first()
        # Skip the first rating row and emit one item per remaining row.
        for rating in response.css('h3.rating-row')[1:]:
            yield {
                'name': name,
                'title': rating.css('span.nocolor::text').extract_first(),
                'value': rating.css('span.rating span.score::text').extract_first(),
            }
Make sure the output file does not already exist in the directory where the script will be executed, since Scrapy appends to an existing feed file and the result would no longer be valid JSON.
rm canadastop100.json
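The `rm` command reports an error when the file is not there yet, for example on the very first run. A small Python alternative that can live in a notebook cell is sketched below. The `remove_stale_feed` helper is a hypothetical name, and `missing_ok` assumes Python 3.8 or newer.

```python
from pathlib import Path


def remove_stale_feed(path="canadastop100.json"):
    """Delete a previous feed file if it exists; do nothing otherwise."""
    # missing_ok=True (Python 3.8+) suppresses the error for a missing file.
    Path(path).unlink(missing_ok=True)


remove_stale_feed()
```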
Next we define the crawler process, register the spider and start the crawl:
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(CompanySpider)
process.start()
Executing this will give the following result:
2017-10-06 12:09:45 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-10-06 12:09:45 [scrapy.utils.log] INFO: Overridden settings: {'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
Preparing data
import pandas as pd
Read the output file from the scraper.
df = pd.read_json('canadastop100.json')
df.head()
| | name | title | value |
|---|---|---|---|
| 0 | Bell Canada | Physical Workplace | A+ |
| 1 | Bell Canada | Work Atmosphere & Communications | A |
| 2 | Bell Canada | Financial Benefits & Compensation | A |
| 3 | Bell Canada | Health & Family-Friendly Benefits | B |
| 4 | Bell Canada | Vacation & Personal Time-Off | B |
Get the number of unique names in the dataset.
len(df['name'].unique())
101
Filter out the companies without a title.
df = df[df['title'].notnull()]
The unique elements in the value column are given by
df['value'].unique()
and result in
array(['A+', 'A', 'B', 'B+', 'B-', 'A-', 'C+'], dtype=object)
Let's map these values to numbers to make them easier to work with in the dataset. Define the mapping:
mapping = {'A+': 10,
           'A': 9,
           'A-': 8,
           'B+': 7,
           'B': 6,
           'B-': 5,
           'C+': 4}
and apply the mapping to the value column:
df['value'] = df['value'].map(mapping)
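One thing to keep in mind is that a grade missing from the mapping does not raise an error: with a dict, `Series.map` turns it into NaN. The plain-Python sketch below shows the analogous behaviour with `dict.get`, which returns `None` instead. The grade `'C'` is a hypothetical value that does not occur in the scraped data.

```python
mapping = {'A+': 10, 'A': 9, 'A-': 8, 'B+': 7, 'B': 6, 'B-': 5, 'C+': 4}

# 'C' is a made-up grade that is absent from the mapping.
grades = ['A+', 'B', 'C+', 'C']

# dict.get returns None for unknown keys, much like Series.map yields NaN.
scores = [mapping.get(grade) for grade in grades]
print(scores)  # [10, 6, 4, None]

# Listing the unmapped grades is a cheap sanity check before applying the mapping.
unmapped = sorted(set(grades) - set(mapping))
print(unmapped)  # ['C']
```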
Now we need to pivot the dataframe from long to wide format, since we want a matrix with one row per company and one column per score category.
df = df.pivot(index='name', columns='title', values='value')
We add a column to get the total score:
df['Total Score'] = df.sum(axis=1)
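The pivot and the row sum can be pictured in plain Python. The sketch below is only an illustration of the reshaping, not how pandas implements it, and it uses three records taken from the tables in this post.

```python
from collections import defaultdict

# Long format: one record per (company, category) pair, as produced by the scraper
# after the grades have been mapped to numbers.
records = [
    {"name": "Bell Canada", "title": "Physical Workplace", "value": 10},
    {"name": "Bell Canada", "title": "Work Atmosphere & Communications", "value": 9},
    {"name": "3M Canada Company", "title": "Physical Workplace", "value": 10},
]

# Wide format: one entry per company, holding a dict of category -> score.
wide = defaultdict(dict)
for rec in records:
    wide[rec["name"]][rec["title"]] = rec["value"]

# Equivalent of df.sum(axis=1): add up the scores per company.
totals = {name: sum(scores.values()) for name, scores in wide.items()}
print(totals)  # {'Bell Canada': 19, '3M Canada Company': 10}
```

Unlike this sketch, which silently overwrites a repeated (company, category) pair, `DataFrame.pivot` raises an error when the index and column combination is not unique.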
The dataframe has the following layout after adding the extra column:
df.head()
| name | Community Involvement | Employee Engagement & Performance | Financial Benefits & Compensation | Health & Family-Friendly Benefits | Physical Workplace | Training & Skills Development | Vacation & Personal Time-Off | Work Atmosphere & Communications | Total Score |
|---|---|---|---|---|---|---|---|---|---|
| 3M Canada Company | 10 | 7 | 9 | 9 | 10 | 9 | 6 | 10 | 70 |
| Aboriginal Peoples Television Network Inc. / APTN | 9 | 6 | 7 | 9 | 7 | 9 | 9 | 9 | 65 |
| Accenture Inc. | 10 | 9 | 7 | 9 | 7 | 7 | 6 | 9 | 64 |
| Agrium Inc. | 10 | 7 | 7 | 6 | 10 | 10 | 8 | 9 | 67 |
| Air Canada | 10 | 6 | 9 | 7 | 9 | 10 | 4 | 6 | 61 |
As a last step, we attach the dataframe to the notebook page using some JavaScript. We import the required display helpers
from IPython.display import HTML, Javascript, display
and attach the dataframe, after converting it to JSON records, to the window object.
Javascript("""
window.companyData={};
""".format(df.reset_index().to_json(orient='records')))
Optionally, write the pivoted data to a JSON file on disk. The file can then be moved to the server where the VueJS application will be deployed. Note that this overwrites the scraper output of the same name.
df.reset_index().to_json('canadastop100.json', orient='records')
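The window attachment above relies on `str.format` replacing the single `{}` in the script text with the JSON payload. A minimal sketch of that mechanism follows, using `json.dumps` on a one-record sample in place of `DataFrame.to_json`. The `guarded` template is a hypothetical variation that shows why any other literal braces in the script would need to be doubled.

```python
import json

# One sample record with values taken from the table above.
records = [{"name": "3M Canada Company", "Total Score": 70}]
payload = json.dumps(records)

# The lone {} is the placeholder; braces inside the payload are left untouched.
script = "window.companyData={};".format(payload)
print(script)
# window.companyData=[{"name": "3M Canada Company", "Total Score": 70}];

# Literal JavaScript braces in the template must be written as {{ and }}.
guarded = "if (true) {{ window.companyData={}; }}".format(payload)
print(guarded)
```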
Visualizing data
The next step is to visualize the data using VueJS. VueJS can be included from https://cdnjs.cloudflare.com/ajax/libs/vue/2.4.0/vue. This notebook uses the grid-component example from the official documentation to create a table representing the crawled data.
Register Vue as a RequireJS module in the notebook.
%%javascript
require.config({
  paths: {
    vue: "https://cdnjs.cloudflare.com/ajax/libs/vue/2.4.0/vue"
  }
});
Define the template for displaying the data in a table using the x-template script type and the VueJS syntax.
%%html
<script type="text/x-template" id="data-template">
  <table class="canada">
    <thead>
      <tr>
        <th v-for="key in columns"
          @click="sortBy(key)"
          :class="{ active: sortKey == key }">
          {{ key | capitalize }}
          <span class="arrow" :class="sortOrders[key] > 0 ? 'asc' : 'dsc'">
          </span>
        </th>
      </tr>
    </thead>
    <tbody>
      <tr v-for="entry in filteredData">
        <td v-for="key in columns">
          {{entry[key]}}
        </td>
      </tr>
    </tbody>
  </table>
</script>
Define the main HTML that contains the template we defined earlier.
%%html
<div id="vue-app">
  <form id="search">
    Search <input name="query" v-model="searchQuery">
  </form>
  <data-grid
    :data="gridData"
    :columns="gridColumns"
    :filter-key="searchQuery">
  </data-grid>
</div>
Initialize the VueJS application using JavaScript: read the data from the window object, register the component that renders the table, and create a new Vue instance.
%%javascript
require(['vue'], function(Vue) {
  console.log(Vue.version);
  var companyData = window.companyData;
  console.log(JSON.stringify(companyData));
  Vue.component('data-grid', {
    template: '#data-template',
    props: {
      data: Array,
      columns: Array,
      filterKey: String
    },
    data: function () {
      var sortOrders = {}
      this.columns.forEach(function (key) {
        sortOrders[key] = 1
      })
      return {
        sortKey: '',
        sortOrders: sortOrders
      }
    },
    computed: {
      filteredData: function () {
        var sortKey = this.sortKey
        var filterKey = this.filterKey && this.filterKey.toLowerCase()
        var order = this.sortOrders[sortKey] || 1
        var data = this.data
        if (filterKey) {
          data = data.filter(function (row) {
            return Object.keys(row).some(function (key) {
              return String(row[key]).toLowerCase().indexOf(filterKey) > -1
            })
          })
        }
        if (sortKey) {
          data = data.slice().sort(function (a, b) {
            a = a[sortKey]
            b = b[sortKey]
            return (a === b ? 0 : a > b ? 1 : -1) * order
          })
        }
        return data
      }
    },
    filters: {
      capitalize: function (str) {
        return str.charAt(0).toUpperCase() + str.slice(1)
      }
    },
    methods: {
      sortBy: function (key) {
        this.sortKey = key
        this.sortOrders[key] = this.sortOrders[key] * -1
      }
    }
  })
  var vueApp = new Vue({
    el: '#vue-app',
    data: {
      searchQuery: '',
      gridColumns: Object.keys(companyData[0]),
      gridData: companyData
    }
  })
});
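The filteredData computed property above does two things: it keeps the rows in which any cell contains the search text (case-insensitive), and it sorts the remaining rows by the active column in the current direction. For readers more at home in Python, the same logic can be sketched as below. The `filtered_data` function is a hypothetical analogue written for this post, not part of Vue, and the sample rows reuse values from the pivoted table.

```python
def filtered_data(rows, filter_key="", sort_key="", order=1):
    """Python analogue of the component's filteredData computed property."""
    query = filter_key.lower()
    if query:
        # Keep a row when any of its values contains the query as a substring.
        rows = [row for row in rows
                if any(query in str(value).lower() for value in row.values())]
    if sort_key:
        # order is 1 for ascending and -1 for descending, as in sortOrders.
        rows = sorted(rows, key=lambda row: row[sort_key], reverse=(order < 0))
    return rows


rows = [
    {"name": "3M Canada Company", "Total Score": 70},
    {"name": "Accenture Inc.", "Total Score": 64},
    {"name": "Air Canada", "Total Score": 61},
]

# Search for "canada" and sort by total score, highest first.
result = filtered_data(rows, filter_key="canada", sort_key="Total Score", order=-1)
print([row["name"] for row in result])  # ['3M Canada Company', 'Air Canada']
```

Because numbers are converted to strings before matching, a query such as "6" also matches on score columns, which mirrors the `String(row[key])` call in the component.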
Attach a style to make the table more attractive.
%%html
<style>
table.canada {
  border: 2px solid rgb(102, 153, 255);
  border-radius: 3px;
  background-color: #fff;
}
table.canada th {
  background-color: rgb(102, 153, 255);
  color: rgba(255,255,255,0.66);
  cursor: pointer;
  -webkit-user-select: none;
  -moz-user-select: none;
  -ms-user-select: none;
  user-select: none;
}
table.canada td {
  background-color: #f9f9f9;
}
table.canada th, table.canada td {
  min-width: 120px;
  padding: 10px 20px;
}
table.canada th.active {
  color: #fff;
}
table.canada th.active .arrow {
  opacity: 1;
}
.arrow {
  display: inline-block;
  vertical-align: middle;
  width: 0;
  height: 0;
  margin-left: 5px;
  opacity: 0.66;
}
.arrow.asc {
  border-left: 4px solid transparent;
  border-right: 4px solid transparent;
  border-bottom: 4px solid #fff;
}
.arrow.dsc {
  border-left: 4px solid transparent;
  border-right: 4px solid transparent;
  border-top: 4px solid #fff;
}
</style>
The result can also be tested on the jsfiddle that I have created. The source for the page can be found in my Vue repository and is visible on my bl.ocks.org. The notebook can be found on my GitHub and the final result is shown on this page.