Faster IPv4 WHOIS Crawling

$ ansible coordinator -m shell -a 'bash -c "cd /home/ubuntu/ips && source /home/ubuntu/.ips/bin/activate && nohup python runserver &"'
$ ansible coordinator -m shell -a 'bash -c "cd /home/ubuntu/ips && source /home/ubuntu/.ips/bin/activate && nohup python collect_whois &"'

With those in place I'll tell each Redis instance across the cluster of worker nodes what the private IP address of the Redis master is. The worker nodes are already up and running but won't begin to work until they can collect their configuration from their local Redis instance. The master Redis instance already has these configuration keys in place, and once the slaves have this information replicated to them they will get started.

$ ansible worker -m shell -a "echo 'slaveof 6379' | redis-cli"

Cluster Telemetry

I have two primary commands that report back on the progress the cluster is making. The first shows the per-minute, per-node telemetry, which can be seen simply by following the metrics Kafka topic.

$ ssh -i ~/.ssh/ip_whois.pem ubuntu@ "/tmp/kafka_2.11- --zookeeper localhost:2181 --topic metrics --from-beginning"

Here is an example output line (formatted and key-sorted for clarity).

{
    "Awaiting Registry": 1,
    "Failed to lookup WHOIS": 10,
    "Found Registry": 135,
    "Got WHOIS": 191,
    "Host": "",
    "Looking up WHOIS": 10,
    "Timestamp": "2016-04-29T19:09:59.575451",
    "Within Known CIDR Block": 93
}

The second command collects the latest telemetry from each individual host seen in the metrics topic and sums the values of each metric reported. This lets me see a running total of the cluster's overall performance.

$ ssh -i ~/.ssh/ip_whois.pem ubuntu@ "cd /home/ubuntu/ips && source /home/ubuntu/.ips/bin/activate && python telemetry"

Here is an example output line (formatted and key-sorted for clarity).

{
    "Awaiting Registry": 47,
    "Failed to lookup WHOIS": 128,
    "Found Registry": 2303,
    "Got WHOIS": 4080,
    "Looking up WHOIS": 378,
    "Within Known CIDR Block": 1953
}

In the previous deployment of this cluster the coordinator was under heavy load from performing CPU-intensive CIDR hit calculations on behalf of all the worker nodes. I've since moved that task onto each of the worker nodes themselves. 45 minutes after the cluster was launched I ran top on the coordinator and one of the workers to see how much pressure they were under.
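The telemetry aggregation described above (keep only the newest message from each host, then sum the metrics) can be sketched roughly as follows. This assumes each message is a JSON object shaped like the example output above; the hosts, timestamps, and counts here are made up for illustration.

```python
import json
from collections import defaultdict

# Illustrative stand-ins for messages read off the "metrics" Kafka topic.
messages = [
    {"Host": "worker-1", "Timestamp": "2016-04-29T19:09:59",
     "Got WHOIS": 191, "Failed to lookup WHOIS": 10},
    {"Host": "worker-1", "Timestamp": "2016-04-29T19:10:59",
     "Got WHOIS": 250, "Failed to lookup WHOIS": 12},
    {"Host": "worker-2", "Timestamp": "2016-04-29T19:10:30",
     "Got WHOIS": 180, "Failed to lookup WHOIS": 5},
]

def cluster_totals(messages):
    """Keep only the most recent message per host, then sum each metric."""
    latest = {}
    for msg in messages:
        host = msg["Host"]
        # ISO-8601 timestamps compare correctly as strings.
        if host not in latest or msg["Timestamp"] > latest[host]["Timestamp"]:
            latest[host] = msg
    totals = defaultdict(int)
    for msg in latest.values():
        for key, value in msg.items():
            if key not in ("Host", "Timestamp"):
                totals[key] += value
    return dict(totals)

print(json.dumps(cluster_totals(messages), sort_keys=True))
# {"Failed to lookup WHOIS": 17, "Got WHOIS": 430}
```

Only worker-1's later message counts toward the total, so stale readings from a node never inflate the running figure.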

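The CIDR hit calculation that moved off the coordinator can be sketched with the standard library's ipaddress module: each worker checks whether a candidate IP already falls inside a CIDR block whose WHOIS record has been collected, and skips the lookup if so. The networks and addresses below are illustrative, not from the actual crawl.

```python
from ipaddress import ip_address, ip_network

# Hypothetical set of CIDR blocks already seen in collected WHOIS records.
known_cidrs = [ip_network("8.8.8.0/24"), ip_network("192.0.2.0/24")]

def within_known_cidr_block(ip):
    """Return True if the IP is covered by an already-collected CIDR block,
    meaning a fresh WHOIS lookup for it would be redundant."""
    addr = ip_address(ip)
    return any(addr in net for net in known_cidrs)

print(within_known_cidr_block("8.8.8.8"))       # True: inside 8.8.8.0/24
print(within_known_cidr_block("198.51.100.1"))  # False: no known block covers it
```

Because this check is pure CPU work over data each worker can hold locally, distributing it removes the coordinator bottleneck without any extra network round trips.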