Date
August, 2018
Info
I worked as a data scientist intern at Parse.ly for the summer of 2018, and this was my primary project. Parse.ly empowers companies to understand and improve digital audience engagement through data. It is a SaaS company offering a suite of services that help top news sites measure audience attention and view internet attention trends from over a billion people.
Top news companies like Conde Nast, Ars Technica, and the Wall Street Journal use Parse.ly to learn of real-time attention shifts for the content on their sites.
With current Parse.ly software, a site such as the New Yorker can see that it has 'X' number of views on its articles, but it can't easily compare articles from its Business section to those in its Politics section. It also can't compare its Politics section to the politics sections of other publishers on the web.
The prototype I created gives a benchmark of where a single publisher stands in a sea of publishers. A site can see how it is doing in any category, and compare that to its own historical record and to the performance of a large portion of publishers on the web. I used Python on the back end to automatically pull the data and build a unique dataset for each customer, from which a report was generated, and D3 to visualize the data on the front end.
This section uses reports generated for a variety of customers, with names redacted to retain customer privacy.
See what categories are performing the best for the time period selected, and what stories people from the network are writing about for that category. See what size you are in the overall network, determined by total page views.
View each individual post for the category and see how it compares to your own distribution, the distribution of publishers of a similar size, and the distribution of the entire category.
Reading the above graph, you can see that examplesite's median performance is nearly 5 times that of the overall network. The site may then want to view the content of the set of posts that did 100x better than the median, for example.
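As a rough illustration only (this is not the production code, which isn't public), the median comparison described above could be sketched in Python along these lines. The function names and the numbers are made up for the example, and reading "100x better than the median" as 100 times the network median is an assumption.

```python
from statistics import median


def median_ratio(site_views, network_views):
    """How many times larger the site's median post views are than the network's."""
    return median(site_views) / median(network_views)


def posts_above_multiple(posts, baseline_median, multiple):
    """Titles of posts whose views are at least `multiple` times the baseline median."""
    return [title for title, views in posts if views >= multiple * baseline_median]


# Hypothetical data: (post title, page views) for one site in one category.
site_posts = [("post-a", 500), ("post-b", 900), ("post-c", 12000)]
# Hypothetical page views per post across the whole network for the same category.
network_views = [50, 100, 100, 200, 400]

ratio = median_ratio([views for _, views in site_posts], network_views)
standouts = posts_above_multiple(site_posts, median(network_views), 100)

print(f"Site median is {ratio:.1f}x the network median")
print("Posts at 100x the network median or more:", standouts)
```

Using the median rather than the mean keeps a single viral post from dominating the comparison, which matters when page views are heavily skewed.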
See how much of your traffic comes from search engines or social media, and how those metrics compare to the network averages for that category.
In addition, look at traffic sources in even finer detail by looking at top referral sites for a category.
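The search and social comparison above boils down to turning raw view counts into shares and lining them up against a network figure. A minimal sketch, assuming page views are already grouped by traffic source; the source names, numbers, and function names are hypothetical and not taken from the real project:

```python
def source_shares(views_by_source):
    """Convert raw page views per traffic source into fractions of the total."""
    total = sum(views_by_source.values())
    if total == 0:
        return {source: 0.0 for source in views_by_source}
    return {source: views / total for source, views in views_by_source.items()}


def share_gap(site_shares, network_shares):
    """Site share minus network share for each source (positive = above network)."""
    return {
        source: site_shares.get(source, 0.0) - network_share
        for source, network_share in network_shares.items()
    }


# Hypothetical page views for one site in one category, grouped by source.
site_views = {"search": 300, "social": 500, "other": 200}
# Hypothetical network-wide average shares for the same category.
network_shares = {"search": 0.40, "social": 0.35, "other": 0.25}

site_shares = source_shares(site_views)
gaps = share_gap(site_shares, network_shares)

for source, gap in gaps.items():
    print(f"{source}: site {site_shares[source]:.0%}, "
          f"network {network_shares[source]:.0%}, gap {gap:+.0%}")
```

The same shape would work for top referral sites by swapping the source labels for referrer domains.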
At the time of prototyping, I only had a few months of data to pull from because the data was very new. Months later, it's now possible to get more aggregate data for this section.
Rather than making the prototype a feature in the current software, the team decided after I left to turn it into its own product for new customers. Because it has not yet been released, I can't share any relevant code due to privacy concerns.