Reporting is the process of organizing data to monitor performance; analysis is the process of exploring data and reports to…
Marketing Analytics Tools For Success
I wouldn’t say GA is the be-all and end-all piece of equipment, and there are many different services and tools…
Math for Machine Learning: Top Math Resources for Data Scientists
What Math Skills do Data Scientists Need
Forms of the question “what math do I need for data science” and “what…
Sustainability Data - Finding Alpha
Using skills I learned at the Data Science Dojo bootcamp, I was able to tweak parameters of our classifier and…
Prevent Your Organization from Becoming a Ransomware Statistic
The first step in recovering from a ransomware attack is to disconnect the affected machine from the local network, otherwise…
Data Ethics: Keeping Your Ethics in Check as a Data Scientist
This also means that any conclusions you make about certain groups of people or how the world works depends on…
5 Design Problems You Should Be Solving with Data
When you’re juggling all of this, you might think to yourself, “at least I don’t have to be a numbers…
Boosting Interest in Data by Branding the Importance of Data Literacy
If we can do one thing in 2019 to increase interest in data, let it be that: to increase interest…
Unfolding Naïve Bayes from Scratch: Part 1
This problem happened because the product (p of a test word “j” in class c) was zero for both the…
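The zero-product problem this excerpt mentions can be sketched with a toy example. The data, function names, and the `alpha` parameter below are hypothetical and not taken from the post. Add-one (Laplace) smoothing is shown as one common remedy, on the assumption that something similar is what the post goes on to discuss.

```python
from collections import Counter

def word_likelihoods(class_docs, vocab, alpha=0.0):
    """Estimate p(word | class) from raw counts; alpha > 0 adds smoothing."""
    counts = Counter(w for doc in class_docs for w in doc.split())
    total = sum(counts.values())
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

def product_of_likelihoods(words, likelihoods):
    """Multiply the per-word likelihoods of a test example for one class."""
    p = 1.0
    for w in words:
        p *= likelihoods[w]
    return p

# Hypothetical toy training set: two classes, two short documents each.
train = {
    "pos": ["good great film", "great acting"],
    "neg": ["bad boring film", "boring plot"],
}
# "dull" appears only in the test example, never in training.
test_words = ["dull", "film"]
vocab = sorted({w for docs in train.values() for d in docs for w in d.split()}
               | set(test_words))

# Without smoothing, the unseen word zeroes the product for every class.
unsmoothed = {c: product_of_likelihoods(test_words, word_likelihoods(docs, vocab))
              for c, docs in train.items()}

# With add-one smoothing, every product stays strictly positive.
smoothed = {c: product_of_likelihoods(test_words, word_likelihoods(docs, vocab, alpha=1.0))
            for c, docs in train.items()}

print(unsmoothed)
print(smoothed)
```

Because a single zero factor wipes out the whole product, the unsmoothed scores cannot separate the classes at all, whereas the smoothed scores remain comparable.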
Data Science and Law | What One Lawyer Learned From a 50-Hour Data Science Bootcamp
Based on my criteria, Data Science Dojo’s data science bootcamp fit the bill: it’s a reasonably priced 5-day, 50-hour onsite program that didn’t…
Introduction to Blockchain and What it Means to Big Data
Using Blockchain development technology for storing Big Data can be cost-saving for companies. Blockchain has the capacity for storing…
Big Data Ethics and 10 Controversial Data Science Experiments
Data and Big Data Ethics
Data science is changing the game when it comes to manipulating data sets and visualizing…
Detecting Algorithmic Bias and Skewed Decision Making
When considering all predictor variables, including the race attribute, the model learned to correlate race with the criminality outcome. Then, the…
Data Privacy and Anonymization Techniques
Simple Techniques to Anonymize Data
A simple approach to maintaining personal data privacy when using data for predictive modeling or…
Does Data Democratization Result in Data Anarchy and Bad Business Decisions?
An Augmented Data Discovery Solution is not an Argument for Data Anarchy
Rather, with the tools provided, an organization can…
The Future of AI, Voice Assistants, and Augmented Intelligence in 2019
This year we’re going to see Voice Assistants smart enough to really ‘get’ us. They’re already providing assistance with simple tasks…
1.1 Billion Taxi Rides: Spark 2.4.0 versus Presto 0.214
$ hdfs dfsadmin -report | grep 'Configured Capacity'
Configured Capacity: 1480673034240 (1.35 TB)
Configured Capacity: 74033651712 (68.95 GB)
Configured Capacity:…
1.1 Billion Taxi Rides with MapD & AWS EC2
$ vi create_trips_table.sql
CREATE TABLE trips (
    trip_id INTEGER,
    vendor_id VARCHAR(3) ENCODING DICT,
    pickup_datetime TIMESTAMP,
    dropoff_datetime TIMESTAMP,
    store_and_fwd_flag VARCHAR(1) ENCODING…
All 1.1 Billion Taxi Rides in Elasticsearch
$ vi ~/trips.conf
input {
    file {
        path => "/one_tb_drive/taxi-trips/*.csv"
        type => "trip"
        start_position => "beginning"
    }
}
filter {…
1.1 Billion Taxi Rides with BrytlytDB & 2 GPU-Powered p2.16xlarge EC2 Instances
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:0F.0     Off |                    0 |
| N/A   61C    P0    60W / 149W…
A Billion Taxi Rides in Elasticsearch
$ sudo /etc/init.d/elasticsearch restart
Importing a Billion Trips into Elasticsearch
The machine used in this blog post has two physical…
Tightening Django Admin Logins
To do so I first created a new setting ADMIN_IP_WHITELIST in service/settings.py:
ADMIN_IP_WHITELIST = ('555.5.5.1',)
I then created white_list/__init__.py in…
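The excerpt is cut off before it shows how the setting is used, so the following is only a framework-free sketch of the kind of check such a whitelist might drive. The function name `admin_access_allowed`, its parameters, and the `/admin/` prefix are assumptions for illustration and are not taken from the post or from Django's API.

```python
# Placeholder value copied from the excerpt; it is not a routable address.
ADMIN_IP_WHITELIST = ('555.5.5.1',)

def admin_access_allowed(path, remote_addr,
                         whitelist=ADMIN_IP_WHITELIST,
                         admin_prefix='/admin/'):
    """Return True if the request may proceed.

    Hypothetical helper: non-admin paths are always allowed, while admin
    paths are allowed only when the client address is in the whitelist.
    """
    if not path.startswith(admin_prefix):
        return True
    return remote_addr in whitelist

print(admin_access_allowed('/admin/login/', '555.5.5.1'))
print(admin_access_allowed('/admin/login/', '10.0.0.7'))
print(admin_access_allowed('/blog/', '10.0.0.7'))
```

In a real Django project a check like this would typically sit in middleware and read the client address from the request, taking care over proxies that rewrite it.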
1.1 Billion Taxi Rides with BrytlytDB 2.0 & 2 GPU-Powered p2.16xlarge EC2 Instances
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:0F.0 Off |                    0 |
| N/A   65C    P0    57W / 149W…
Performance Impact of File Sizes on Presto Query Times
For reference these are the table names:
trips_64mb
trips_256mb
trips_1024mb
CREATE EXTERNAL TABLE trips_64mb (
    trip_id INT,
    vendor_id STRING,
    pickup_datetime…
1.1 Billion Taxi Rides with BrytlytDB 2.1 & a 5-node IBM Minsky Cluster
$ sudo docker exec -ti cluster /usr/local/brytlyt/bin/psql brytlyt brytlyt
CREATE NODE d04 WITH (HOST='x.x.x.213', TYPE='datanode', PORT=15432);
CREATE NODE d05 WITH…