Analyzing Crime Data in Boston using Tellius

I joined Tellius about 5 weeks ago. We’re the augmented analytics company making it easy for anyone to access data and discover insights. Over the past 5 weeks (one of which I spent out of the office), I’ve worked hard to come up to speed on our strategy, install base, competitors and, most importantly, our product.

Tellius enables everyday users to find Insights from their data quickly and easily. It features powerful data ingestion and transformation, world-class search, AI-driven Insights and enterprise machine learning capabilities. All of this on a scalable platform, built on Spark and deployable anywhere.

One thing that I love about the Tellius platform is its ease of use. I think it is important to be able to use the product that you’re selling. This is a basic tenet of my personal life that I think extends to technology sales. Too often people are repeating what they’ve heard instead of what they’ve experienced. If it’s not a commodity, I want to talk to somebody who is experienced in using what I’m buying.

To demonstrate how easy it is to use Tellius, I found a cool data set on Kaggle that is of personal interest to me: crime in Boston. I lived in Boston for 2 years and it’s in my official top 5 US cities (guess you just have to be from New England). The raw csv file is 58 MB, with 319,074 rows and 17 columns.

Where to start — with the data!

Loading data was easy: I downloaded the CSV file from Kaggle and uploaded it directly into Tellius. You can connect a variety of external data sources via native connectors, or use JDBC to connect to almost anything.

This crime data set had a nice timestamp field with the date and time of the event. This was automatically flagged as a timestamp (see the orange icon in the 8th column). I was able to easily add day, month and year columns to this data set using built-in transformation features. I also added a fixed-value column with a value of 1 for each incident.
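For readers who prefer to see the transformation spelled out, here is a minimal sketch of the same idea in plain Python. It is not what Tellius runs internally. The column name OCCURRED_ON_DATE, its "YYYY-MM-DD HH:MM:SS" format, the new column names and the sample rows are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical sample rows. The column name and timestamp format are
# assumptions about the CSV, not something confirmed in this post.
rows = [
    {"OCCURRED_ON_DATE": "2016-03-14 21:30:00"},
    {"OCCURRED_ON_DATE": "2016-02-07 08:05:00"},
]

def add_date_parts(row):
    """Return a copy of row with day, month, year and a constant count column."""
    ts = datetime.strptime(row["OCCURRED_ON_DATE"], "%Y-%m-%d %H:%M:%S")
    return {
        **row,
        "day": ts.day,
        "month": ts.month,
        "year": ts.year,
        "incident_count": 1,  # fixed value of 1 so incidents can be summed later
    }

enriched = [add_date_parts(r) for r in rows]
```

The fixed value of 1 looks redundant, but it turns "how many incidents" into a simple sum over any grouping.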

After that I created a business view, which is like a database’s logical view. As I am only analyzing one table, I didn’t need to join anything. Below is an example of a more complicated data set, with joins across multiple tables. This is easy to do via a point-and-click interface, but you can also write SQL or use something like pandas if that is what you’re comfortable with.
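If the "logical view" idea is new to you, here is a rough sketch using SQLite from Python. This is generic SQL, not Tellius syntax, and the tables, columns and placeholder district names are all hypothetical. The crime data set in this post is a single table, so the second table exists only to show what a join would look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables for illustration only.
    CREATE TABLE incidents (incident_number TEXT, district TEXT);
    CREATE TABLE districts (district TEXT, district_name TEXT);

    INSERT INTO incidents VALUES ('I001', 'D4'), ('I002', 'A1'), ('I003', 'D4');
    INSERT INTO districts VALUES ('D4', 'placeholder D4'), ('A1', 'placeholder A1');

    -- The view stores the join definition, not a copy of the data.
    CREATE VIEW incident_view AS
        SELECT i.incident_number, i.district, d.district_name
        FROM incidents AS i
        LEFT JOIN districts AS d ON d.district = i.district;
""")

view_rows = conn.execute(
    "SELECT incident_number, district_name FROM incident_view ORDER BY incident_number"
).fetchall()
```

Querying incident_view reads like querying one table, which is the convenience a business view is after.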

All in, loading data, adding columns and publishing a business view took a total of 4 minutes and 45 seconds of compute time (see below) and roughly 10–15 minutes of my time exploring and understanding the data.

Search

Now that my data is loaded I want to start exploring it. I head to the Search area of Tellius to start.

There’s a data dictionary available for this dataset but I’m as lazy as the next millennial, so I first glance through all of the fields available on the Tellius sidebar.

Great, I see some interesting fields in here: District, Offense_Code_Group, Street, Location. What does Tellius recommend?

Some interesting suggested questions. Let’s take the “bottom 8” suggestion and flip it, instead looking at the top 8 crimes for 2016 by type of crime.
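For comparison, a "top 8 by type of crime for 2016" question boils down to a filter, a count and a sort. Here is a minimal sketch in plain Python. The sample rows are made up, and this is only an illustration of the question, not of how Tellius search answers it.

```python
from collections import Counter

# Hypothetical (year, offense_code_group) pairs standing in for the real rows.
incidents = [
    (2016, "Larceny"),
    (2016, "Medical Assistance"),
    (2016, "Larceny"),
    (2015, "Larceny"),
    (2016, "Motor Vehicle Accident Response"),
    (2016, "Other"),
    (2016, "Larceny"),
    (2016, "Medical Assistance"),
]

def top_offense_groups(rows, year, n=8):
    """Count incidents per offense group for one year and return the n largest."""
    counts = Counter(group for row_year, group in rows if row_year == year)
    return counts.most_common(n)

top_2016 = top_offense_groups(incidents, 2016)
```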

We can see that some of these are not severe crimes or even crimes at all: motor vehicle accidents, medical assistance, other, etc. Let’s explore our data and see if we can find a field that might help us to pare down our analysis. I return to our data set and look through the interesting fields. I check the Statistics of a variable called UCR_Part by clicking the Statistics button and see the following:

This looks like it might be a classification field! After Googling UCR, we can see that this is a classification of crimes called the Uniform Crime Reporting standard. Part One crimes are considered to be more severe, Part Two are less severe. See the UCR Handbook from the FBI for a summary of how crimes are reported. If I wanted to make this a repeatable analysis, I would set up a hierarchy in Tellius’ data preparation section. This is part of the one-time setup of a Tellius business view.

Let’s take a look at those more severe crimes as those are first priority. I click the Filter icon, select UCR_PART and choose a criterion for filtering.

After filtering for more severe crimes, we can now see that Larceny is by far the largest group of crimes in the Part One classification.
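The filter step is equally simple to sketch outside the product. The example below assumes the UCR_PART column holds the literal value "Part One" for the more severe crimes. Both that value and the sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows; the "Part One" / "Part Two" values are assumed labels.
incidents = [
    {"UCR_PART": "Part One", "OFFENSE_CODE_GROUP": "Larceny"},
    {"UCR_PART": "Part Two", "OFFENSE_CODE_GROUP": "Vandalism"},
    {"UCR_PART": "Part One", "OFFENSE_CODE_GROUP": "Robbery"},
    {"UCR_PART": "Part One", "OFFENSE_CODE_GROUP": "Larceny"},
]

# Keep only the more severe crimes, then count them by offense group.
part_one = [row for row in incidents if row["UCR_PART"] == "Part One"]
part_one_by_group = Counter(row["OFFENSE_CODE_GROUP"] for row in part_one)
largest_group = part_one_by_group.most_common(1)
```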

Let’s investigate what is potentially driving larceny. I click on the larceny bar, and up pops a menu:

Insights

Clicking on Insights kicks off a machine learning process that runs in the background. With this relatively simple and small data set, the process finishes in 3 seconds. Tellius finds 2 anomalies instantly:

Tellius quickly gave us some real, actionable Discoveries. Additional law enforcement focused on larceny could be assigned to Districts D4 and A1. Enforcement could be scaled back on Sundays and increased on Tuesdays. If we want to dive deeper into these anomalies, we can kick off Insights.

Let’s take a look at a type of Insight that we call Trend Based Insight. Law enforcement measurements frequently compare time periods — it makes sense to track and make decisions based on increases or decreases in crime. First, I start by typing in natural language “show me incidntcount by month for 2016”. You’ll notice I misspelled Incident. Here is what Tellius returned:

Tellius realized what I was asking for and automatically returned the appropriate visual. It also highlighted an anomaly, where crime increased significantly from February to March. Clicking on this particular point will automatically kick off a machine learning model that will determine what the key drivers were behind this change.
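I don’t know exactly how Tellius search resolves a misspelled term, but the general idea of matching a typo to the nearest known field name can be sketched with Python’s standard difflib module. The field names and the 0.8 similarity cutoff below are assumptions picked for the example.

```python
from difflib import get_close_matches

# Hypothetical field names; the real business view may use different ones.
FIELD_NAMES = ["incidentcount", "district", "offense_code_group", "street", "month"]

def resolve_field(token, fields=FIELD_NAMES):
    """Return the closest known field name for a possibly misspelled token, or None."""
    matches = get_close_matches(token.lower(), fields, n=1, cutoff=0.8)
    return matches[0] if matches else None

resolved = resolve_field("incidntcount")
```

A real natural language search has to do far more than this (parse "by month", understand "for 2016"), so treat it as a toy illustration of the typo-tolerance piece only.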

We can see that the month-to-month growth of 891 incidents was due to two factors — an increase in Part Two crimes, and an increase in Districts B2 & C6. Part Two crimes include fraud, embezzlement, vandalism, possession of weapons and drug crimes. While any increase in crime is negative, over 40% of our month-to-month change was due to an increase in these milder offenses. The Part Two increase is broken down further on the right, with various factors (Wednesday, District=C6, etc.) being significant contributors within the Part Two increase.
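To make the "share of the month-to-month change" idea concrete, here is the arithmetic as a small sketch. It is a simple before/after comparison per segment, not the machine learning model Tellius runs, and the counts are tiny made-up numbers rather than the real February and March figures.

```python
from collections import Counter

# Hypothetical incident counts per UCR part for two consecutive months.
february = Counter({"Part One": 3, "Part Two": 2})
march = Counter({"Part One": 4, "Part Two": 5})

def change_contributions(before, after):
    """Return the total change and each segment's (delta, share of total change)."""
    total_change = sum(after.values()) - sum(before.values())
    contributions = {}
    for segment in sorted(set(before) | set(after)):
        delta = after[segment] - before[segment]
        share = delta / total_change if total_change else 0.0
        contributions[segment] = (delta, share)
    return total_change, contributions

total_change, contributions = change_contributions(february, march)
```

In this toy data the total rises by 4 incidents and Part Two accounts for 3 of them, so its share of the change is 75%. The "over 40%" figure above is the same kind of ratio, computed on the real data.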

You’ll notice that these crimes also increased significantly in Districts B2 and C6. These insights can help to inform everything from hiring, to resource deployment to community outreach programs.

This process ran in 98 seconds. Imagine what would be required with your current tools and skills in order to discover this kind of insight. Tellius took a process that would require an analyst and a data scientist collaborating for hours to days and compressed it into 98 seconds. Add in the one-time data load process and we are still under 30 minutes.

There are other Insights that Tellius can provide, including Comparison and Segmentation. These AI-driven Insights give visibility into data that was previously unavailable, or that once took days of collaboration across multiple roles to produce.

Next stop on the Tellius blog train will be the Machine Learning end of the spectrum. We’ll train some models and deploy them using Tellius. Until then, check out our free trial, available today on our website.

Sales @TelliusData. Previous IBMer and Mainer.
