Hypothesis Testing – A Lesson in Hacking

Last week we recovered from a catastrophic data loss with one of our hosting providers. This week some of our hypothesis testing attracted the interest of a rather clever script kiddie looking to mine Amazon gift cards. We’re having lots of fun online. 🙂

We’re currently trying to get feedback from translation customers and providers over at www.canadatranslates.ca. To do that, we’ve got some links and ads directing potential clients to surveys to collect some of the data we need to validate our business model. As an incentive to participate, we offered professional translators $5 gift cards to complete a 5-minute survey. A few people participated the first day the site went live but it wasn’t popular.

Overnight on the second day, 40 surveys trickled in. This was pretty shocking – how did we go from 1 every 8 hours to 6 per hour? I looked closer and the submissions didn’t make sense. Values weren’t aligned with what we expected, email addresses didn’t match names, submission times were suspiciously close together… the data was just too suspicious. I took everything down while I investigated. Whoever tried to mine the gift cards did a pretty good job of making the submissions look legit. Entries came from Canadian IP addresses in different parts of the country, and a unique Canadian address accompanied each submission. Adding a captcha and some mandatory fields that required valid text input broke our visitor’s script and made false entries easier to identify. It was a pretty annoying lesson to learn though, and I had to manually clear all the bad data out of our records.
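One of the red flags above, submissions arriving too close together, is easy to check mechanically. Here is a rough Python sketch of that idea. It assumes each submission is a dict with a hypothetical `submitted_at` timestamp, and the 5-minute threshold is an arbitrary guess based on the survey length, not something we measured. It only flags entries for a manual look; it doesn’t prove anything by itself.

```python
from datetime import datetime, timedelta


def flag_bursts(submissions, min_gap=timedelta(minutes=5)):
    """Return the indexes of submissions that arrived less than `min_gap`
    after the previous one (in time order).

    `submissions` is a list of dicts with a hypothetical `submitted_at`
    datetime field; `min_gap` is an assumed threshold.
    """
    # Sort positions by timestamp so the input order doesn't matter.
    order = sorted(range(len(submissions)),
                   key=lambda i: submissions[i]["submitted_at"])
    flagged = []
    for prev, cur in zip(order, order[1:]):
        gap = submissions[cur]["submitted_at"] - submissions[prev]["submitted_at"]
        if gap < min_gap:
            flagged.append(cur)
    return sorted(flagged)
```

Anything this flags still needs a human to compare names, email addresses and answers before it gets thrown out.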

Lessons Learned

For our next surveys, here’s what we’ll watch more carefully:

Make several text fields mandatory, each requiring a unique answer of at least a sentence or two
Include a captcha
Block multiple entries from a single IP address
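
As a rough illustration of the first and third items, here is a minimal Python sketch of a server-side check. Everything in it is an assumption for the sake of the example: the `SubmissionFilter` class, the 8-word minimum standing in for “a sentence or two”, and the in-memory sets (a real survey would keep these in a database). The captcha isn’t shown because it depends on whichever captcha service you use.

```python
import re

MIN_WORDS = 8  # assumed stand-in for "a sentence or two"


def normalize(text):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return re.sub(r"\s+", " ", text.strip().lower())


class SubmissionFilter:
    """Hypothetical helper that remembers accepted entries and rejects repeats."""

    def __init__(self):
        self.seen_ips = set()
        self.seen_answers = set()

    def check(self, ip, answers):
        """Return a list of rejection reasons; an empty list means accept.

        `answers` holds the free-text responses to the mandatory fields.
        """
        reasons = []
        if ip in self.seen_ips:
            reasons.append("duplicate ip")
        for answer in answers:
            cleaned = normalize(answer)
            if len(cleaned.split()) < MIN_WORDS:
                reasons.append("answer too short")
            elif cleaned in self.seen_answers:
                reasons.append("answer not unique")
        # Only remember the entry if it was accepted.
        if not reasons:
            self.seen_ips.add(ip)
            self.seen_answers.update(normalize(a) for a in answers)
        return reasons
```

Keep in mind that the bad entries we received came from different IP addresses, so the IP check alone would not have caught them. It’s the combination with the text requirements and the captcha that matters.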

Luckily our survey was small and we had few submissions. If our work had been more popular, our guest’s interference would have cost us $200 and made our research data invalid.
