Analyzing Crime Data in Boston using Tellius

Christopher Reuter
Feb 19, 2020

I joined Tellius about 5 weeks ago - we’re the augmented analytics company making it easy for anyone to access data and discover insights. Over the past 5 weeks (one of which I spent out of the office), I’ve worked hard to come up to speed on our strategy, install base, competitors and most importantly our product.

Tellius enables everyday users to find Insights from their data quickly and easily. It features powerful data ingestion and transformation, world-class search, AI-driven Insights and enterprise machine learning capabilities. All of this on a scalable platform, built on Spark and deployable anywhere.

One thing that I love about the Tellius platform is its ease of use. I think it is important to be able to use the product that you’re selling. This is a basic tenet of my personal life that I think extends to technology sales. Too often people are repeating what they’ve heard instead of what they’ve experienced. If it’s not a commodity, I want to talk to somebody who is experienced in using what I’m buying.

To demonstrate how easy it is to use Tellius, I found a cool data set on Kaggle that is of personal interest to me: crime in Boston. I lived in Boston for 2 years and it’s in my official top 5 US cities (guess you just have to be from New England). The raw CSV file is 58 MB, with 319,074 rows and 17 columns.

Where to start — with the data!

Loading data was easy - I downloaded the CSV file from Kaggle and uploaded it directly into Tellius. You can connect a variety of external data sources via native connectors, or use JDBC to connect to almost anything.

Data sources supported by Tellius
Data set after loading it into Tellius

This crime data set had a nice timestamp field with the date and time of each event. It was automatically flagged as a timestamp (see the orange icon in the 8th column). I was able to easily add day, month and year columns to the data set using built-in transformation features. I also added a fixed-value column with a value of 1 for each incident, which makes counting incidents a simple sum.

Adding a column via a date transformation option
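
If you’re curious what that transformation amounts to outside of Tellius, here is a minimal sketch in plain Python. It is not how Tellius does it internally. The column name OCCURRED_ON_DATE, the timestamp format and the name incident_count are my assumptions for illustration, and the two rows are made up.

```python
from datetime import datetime

# Made-up sample rows; column names and timestamp format are assumptions.
rows = [
    {"OFFENSE_CODE_GROUP": "Larceny", "OCCURRED_ON_DATE": "2016-03-14 18:30:00"},
    {"OFFENSE_CODE_GROUP": "Vandalism", "OCCURRED_ON_DATE": "2016-02-02 09:15:00"},
]

def add_date_parts(row):
    """Return a copy of the row with year, month, day and a constant count of 1."""
    ts = datetime.strptime(row["OCCURRED_ON_DATE"], "%Y-%m-%d %H:%M:%S")
    return {
        **row,
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "incident_count": 1,  # hypothetical fixed-value column, one per incident
    }

enriched = [add_date_parts(r) for r in rows]
```

With the constant column in place, "how many incidents" becomes a sum over incident_count for whatever slice of the data you are looking at.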

After that I created a business view, which is like a database’s logical view. As I am only analyzing one table, I didn’t need to join anything. Below is an example of a more complicated data set, with joins across multiple tables. This is easy to do via a point-and-click interface, but you can also write SQL or use something like Pandas if that is what you’re comfortable with.

Tellius supports joining multiple tables

All in, loading data, adding columns and publishing a business view took a total of 4 minutes and 45 seconds of compute time (see below) and roughly 10–15 minutes of my time exploring and understanding the data.

Total compute time for loading and manipulating data

Search

Now that my data is loaded I want to start exploring it. I head to the Search area of Tellius to start.

This is what the Search screen on Tellius looks like

There’s a data dictionary available for this dataset but I’m as lazy as the next millennial, so I first glance through all of the fields available on the Tellius sidebar.

Available fields

Great, I see some interesting fields in here: District, Offense_Code_Group, Street, Location. What does Tellius recommend?

Some of Tellius’ recommended questions

Some interesting suggested questions. Let’s take the “bottom 8” suggestion and flip it, instead looking at the top 8 crimes for 2016 by type of crime.

We can see that some of these are not severe crimes, or even crimes at all: motor vehicle accidents, medical assistance, other, etc. Let’s explore our data and see if we can find a field that might help us pare down our analysis. I return to our data set and look through the interesting fields. I check the statistics of a variable called UCR_PART by clicking the Statistics button and see the following:

Statistics for UCR_PART

This looks like it might be a classification field! After Googling UCR, we can see that this is a classification of crimes called the Uniform Crime Reporting standard. Part One crimes are considered more severe, Part Two less severe. See the UCR Handbook from the FBI for a summary of how crimes are reported. If I wanted to make this a repeatable analysis, I would set up a hierarchy in Tellius’ data preparation section. This is part of the one-time setup of a Tellius business view.

Let’s take a look at those more severe crimes, as those are first priority. I click the Filter icon, select UCR_PART and choose a criterion for filtering.

Filtering for UCR_PART

After filtering for more severe crimes, we can now see that Larceny is by far the largest group of crimes in the Part One classification.

Top 8 crimes where UCR_PART = Part One
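
For readers who think in code, the filter-then-rank step above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not what Tellius runs under the hood. The records are made up, and the function name top_offenses is hypothetical.

```python
from collections import Counter

# Made-up (offense group, UCR part, year) records; not the Boston data.
incidents = [
    ("Larceny", "Part One", 2016),
    ("Larceny", "Part One", 2016),
    ("Larceny", "Part One", 2016),
    ("Robbery", "Part One", 2016),
    ("Robbery", "Part One", 2016),
    ("Auto Theft", "Part One", 2016),
    ("Vandalism", "Part Two", 2016),
    ("Larceny", "Part One", 2017),
]

def top_offenses(records, ucr_part, year, n=8):
    """Count incidents per offense group for one UCR part and year, largest first."""
    counts = Counter(
        group
        for group, part, yr in records
        if part == ucr_part and yr == year
    )
    return counts.most_common(n)

top = top_offenses(incidents, "Part One", 2016)
```

Here top holds at most eight (offense group, count) pairs, which is the same shape as the bar chart above.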

Let’s investigate what is potentially driving larceny. I click on the larceny bar, and up pops a menu:

Insights

Clicking on Insights kicks off a machine learning process that runs in the background. With this relatively simple, small data set, the process finishes in 3 seconds. Tellius finds 2 anomalies instantly:

This anomaly identified two districts where larceny is highest (D4 and A1)
This anomaly identified when larceny is least likely to happen (Sunday)

Tellius quickly gave us some real, actionable Discoveries. Additional law enforcement focused on larceny could be assigned to Districts D4 and A1. Enforcement could be scaled back on Sundays and increased on Tuesdays. If we want to dive deeper into these anomalies, we can kick off Insights.

Let’s take a look at a type of Insight that we call Trend Based Insight. Law enforcement measurements frequently compare time periods — it makes sense to track and make decisions based on increases or decreases in crime. First, I start by typing in natural language “show me incidntcount by month for 2016”. You’ll notice I misspelled Incident. Here is what Tellius returned:

Crime over 2016 by month

Tellius realized what I was asking for and automatically returned the appropriate visual. It also highlighted an anomaly, where crime increased significantly from February to March. Clicking on this particular point will automatically kick off a machine learning model that determines the key drivers behind this change.
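
To make the idea concrete, here is a rough sketch of the monthly roll-up and of finding the biggest month-to-month jump. This is a much simpler stand-in than whatever anomaly detection Tellius actually uses. The month values are made up, and the function names are hypothetical.

```python
from collections import Counter

# Made-up month numbers, one entry per incident; not the Boston data.
incident_months = [1] * 5 + [2] * 4 + [3] * 9 + [4] * 8

def monthly_counts(months):
    """Return incident counts for January through December as a 12-item list."""
    counts = Counter(months)
    return [counts.get(m, 0) for m in range(1, 13)]

def largest_increase(counts):
    """Return (increase, from_month, to_month) for the biggest month-to-month rise."""
    deltas = [
        (counts[i] - counts[i - 1], i, i + 1)  # list index i holds month i + 1
        for i in range(1, len(counts))
    ]
    return max(deltas)

monthly = monthly_counts(incident_months)
jump = largest_increase(monthly)
```

A plain "largest difference" like this only flags where the change happened. It says nothing about why, which is the part the Trend Insight is meant to answer.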

Trend Insight: Top level explanation of the change
Trend Insight: Change reason contributor breakdown

We can see that the month-to-month growth of 891 incidents was due to two factors — an increase in Part Two crimes, and an increase in Districts B2 & C6. Part Two crimes include fraud, embezzlement, vandalism, possession of weapons and drug crimes. While any increase in crime is negative, over 40% of our month-to-month change was due to an increase in these milder offenses. The Part Two increase is broken down further on the right, with various factors (Wednesday, District=C6, etc.) being significant contributors within the Part Two increase.
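
The arithmetic behind a statement like "X% of the change came from this segment" is simple to sketch: take each segment’s change between the two months and divide it by the total change. The numbers below are made up and do not reproduce the figures in the screenshots. The function name contribution_shares is hypothetical, and Tellius’ own model almost certainly does more than this.

```python
def contribution_shares(before, after):
    """Map each segment to (change, share of the total change) between two periods."""
    segments = sorted(set(before) | set(after))
    deltas = {s: after.get(s, 0) - before.get(s, 0) for s in segments}
    total = sum(deltas.values())
    if total == 0:
        raise ValueError("total change is zero, so shares are undefined")
    return {s: (d, d / total) for s, d in deltas.items()}

# Made-up incident counts per UCR part for two consecutive months.
february = {"Part One": 100, "Part Two": 200, "Part Three": 300}
march = {"Part One": 110, "Part Two": 240, "Part Three": 350}

shares = contribution_shares(february, march)
```

In this toy example the total change is 100 incidents, and Part Two accounts for 40 of them, a 40% share. The same calculation could be repeated within a segment, for example by district or by day of week, to get a second-level breakdown.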

You’ll notice that these crimes also increased significantly in Districts B2 and C6. These insights can help to inform everything from hiring, to resource deployment to community outreach programs.

This process ran in 98 seconds. Imagine what would be required with your current tools and skills in order to discover this kind of insight. Tellius took a process that would require an analyst and a data scientist collaborating for hours or days and compressed it into 98 seconds. Add in the one-time data load process and we are still under 30 minutes.

There are other Insights that Tellius can provide, including Comparison and Segmentation. These AI-driven Insights give visibility into data that was previously unavailable, or that once took days of collaboration across multiple roles to produce.

Next stop on the Tellius blog train will be the Machine Learning end of the spectrum. We’ll train some models and deploy them using Tellius. Until then, check out our free trial, available today on our website.

Written by Christopher Reuter

Marketing @ Resourcely. Previously at Prefect, Tellius, IBM. Proud Mainer.