Monday, July 28, 2014

Big data in Science

Big data is rapidly making inroads into the scientific research domain, and its potential to advance research is undeniable.

Not that big data is entirely new to science: scientists have always worked with large data sets, high-energy particle and nuclear physics being classic examples. When the Human Genome Project finished its mapping in 2003, it was sitting on a data set of roughly 3 billion letters of genetic code.

And that is before counting the many sequences that code for specific proteins, which would make the number even more fantastic!

What has changed now is the magnitude and scope of the data. Newer fields such as the -omics (genomics, proteomics, etc.) and astrophysics, astronomers in particular, are looking to harness big data and analytics to answer questions that have long stayed in the dark.

Thursday, June 26, 2014

Growth Hacking Techniques - Part-2

Growth hacking techniques rely on analytics and data for better insight. The advantage over the competition is greater if organic search, SEO, and SEM are done right.

It is essential for a growth hacker - a specialist in his own right - to be able to answer the following questions: 

  • What the customer wants
  • Where the customer is
  • What language the customer speaks

Organic search is one way to answer these questions. However, it is more pertinent for organic search to return results using rich snippets that draw attention and make the conversation more interesting. Sharable content also helps boost the customer base: the more lists, DIYs and, more importantly, pictures you have, the wider the net is cast as the audience shares your content. Google Trends or social media searches make life easy here, since your audience directly tells you what it is interested in hearing about.

If you're an entrepreneur with social media standing and are treated as an influencer in your own right, then an ideal approach is to be the guest. Your voice matters and it will echo everywhere, adding to your growth. This is word-of-mouth backed by solid coding and algorithms to drive traffic.

Contributing on relevant topics across the Web and then linking back to your own site also creates SEO value. The idea is to talk more: to be useful in sharing content and information, not to spam! Everyone loves sharing content, more so if it is relevant.

Another side to content sharing is to be vocal about it. Your content is not a pretty mountain lake waiting to be discovered; make it the ubiquitous bottle everyone wants. Built-in sharing buttons divert traffic your way. They may be annoying, but hey, it's the numbers that matter!

Ultimately it's about the audience, and if your CRM isn't great, your audience pool will dry up faster than it filled up. Loyalty speaks the loudest. To keep the numbers growing, use e-mail triggers to activate dormant or infrequent users. Maximize return visits with offers and/or special alerts. 
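
A rough sketch of such a re-engagement trigger, under the assumption of a simple users table with a last_visit column (the schema and the 30-day dormancy window are made up for illustration):

```python
import sqlite3
from datetime import datetime, timedelta

DORMANCY = timedelta(days=30)  # assumption: 30 days without a visit counts as dormant

def find_dormant_users(db_path="users.db"):
    """Return email addresses of users whose last visit is older than the dormancy window."""
    cutoff = (datetime.utcnow() - DORMANCY).isoformat()
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT email FROM users WHERE last_visit < ?", (cutoff,)).fetchall()
    finally:
        conn.close()
    return [email for (email,) in rows]

def queue_reactivation_emails(emails):
    """Hand the dormant users to whatever mailer you use, along with an offer or special alert."""
    for email in emails:
        print(f"queueing re-engagement offer for {email}")  # replace with a call to your email service

if __name__ == "__main__":
    queue_reactivation_emails(find_dormant_users())
```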

Growth Hacking Goals

Engagement is the key in growth hacking. However, focusing too much on growing the customer base can divert attention from other vital tasks such as quality control, product enhancement, or even managing internal affairs.
Typically, lean startups stand to gain the most from growth hacking, provided there is a bigger picture in mind; the wise know to grow not just vertically but horizontally as well.

Image Credits: clicklaboratory, axiomcreative

Tuesday, June 24, 2014

Growth Hacking | Part - 1

Dropbox did it and so did Airbnb; and quite successfully at that. Both have millions of users today because of what they did. What did they do? 

They hacked growth - a hybrid of traditional marketing and coding that enables rapid customer acquisition.

Growth Hacker


With Growth Hacking, a new generation of marketers is changing the rules of the game. Traditionalists may call it Lean marketing, yet it is more than just that. It is a funnel that, with the help of strong technical skills, innovative marketing hacks, and the ability to track results, grows the subscriber (read: audience) base in a short time.

Tuesday, June 17, 2014

The PromptCloud Advantage- Web Scraping with an Edge

The global market is now more aware of its data scraping needs. And so with the demand, the list of suppliers has grown too. This post is dedicated to bringing out the PromptCloud Advantage among such providers.

Ahead in the race


1. The know-how - Crawling the web, as mundane as it may sound, is a fairly complex task. No one is to be blamed for overlooking the complexity, as these things surface only after you've tried it yourself and delved into the nitty-gritty. The design decisions you take sit at the core of what you build and eventually monetize. And the long-term effects of such architectural choices are as pleasing when you've done it right as they are disturbing when you haven't been far-sighted.

Tuesday, June 10, 2014

Big Data for Mass Media

House of Cards by Netflix exploded on TV when it aired. Big data helped it gather the audience – and the ratings! – they wanted. Hollywood’s doing it now, as are major networks and media corporations across the world. The question naturally is: how is big data relevant to media & entertainment business?

No longer are content consumers an anonymous mass. It is possible to know who they are, what they do, what they like, how they spend, and what they spend on. More importantly, how much they spend.

Media Influence


How do we know all this?

Big data tells us almost everything. It changes the way you listen to your audience.

Thursday, April 24, 2014

Structured Data Formats- What to use When

The days of unstructured raw data are fading as most data-driven companies turn their eyes to structured data, which can be easily imported into their internal systems, read by their machines, and analyzed by their teams to keep their edge. But once you have stepped up to the task of data acquisition via crawling or similar methods, how do you define the structure that's most appropriate for your requirements?

Most of the time we find our clients unable to decide which output format to select after crawl and extraction. Here's our bit to help you weigh your options.

1. XML - By far the most robust structure you can opt for. It does not break with unwanted special characters in the data and can be as nested as you like. However, if your internal systems are not designed for large-scale data, it might end up being a bloat, with huge file sizes to handle.

Summary - trade off the bloat for XML's robustness and its flexibility of schema.

Here's an illustration of what an XML record might look like:
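(The record below is a made-up example of a crawled product listing; the tag names and values are purely illustrative, not a fixed schema. Note how special characters are escaped and how the structure nests.)

```xml
<record>
  <url>http://www.example.com/product/123</url>
  <crawled_at>2014-04-24T10:32:00Z</crawled_at>
  <product>
    <name>Sample Widget</name>
    <price currency="USD">49.99</price>
    <availability>In Stock</availability>
    <description>Handles special characters like &amp; and &lt; without breaking.</description>
  </product>
</record>
```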

Monday, March 24, 2014

Outsourcing Web Crawls- SLAs to visit

If you deal with data on the web, more often than not you'll have chosen to employ an external entity for your data acquisition needs.

Here are the SLAs you'd need to consider when having a DaaS provider do the crawls for you-

1. Crawlability - If your business runs on crawled data, this is the primary attribute to be assured of. Irrespective of the technical variety of the websites, crawls should run smoothly, and the crawlers need to be adept at handling roadblocks and applying the corresponding workarounds. Here's a post discussing these roadblocks and this one digging into AJAX pages.
Credit: auditagency.com.ua
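
As a minimal illustration of what a roadblock workaround can look like (a hedged sketch, not PromptCloud's actual crawler), here is a polite fetch in Python that identifies itself and retries with back-off on timeouts and throttling responses; only the requests library is assumed:

```python
import time
import requests

HEADERS = {"User-Agent": "example-crawler/0.1 (contact@example.com)"}  # identify the crawler politely

def fetch(url, retries=3, backoff=2.0, timeout=30):
    """Fetch a page, retrying with a growing pause on transient failures (timeouts, 5xx, throttling)."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, headers=HEADERS, timeout=timeout)
            if response.status_code in (429, 500, 502, 503, 504):
                raise requests.HTTPError(f"retryable status {response.status_code}")
            response.raise_for_status()
            return response.text
        except requests.RequestException:
            if attempt == retries:
                raise
            time.sleep(backoff * attempt)  # back off a little longer before each retry

if __name__ == "__main__":
    print(len(fetch("http://example.com/")))
```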


2. Scalability - Crawling might seem straightforward when you're doing it for a few web pages or even a couple of sites at most, but the problem changes by an order of magnitude when it needs to be done at scale. Managing multiple clusters, distributing crawls across them, monitoring them, collating results from these crawls and then grouping them is where the devils of crawling lie. Make sure your provider can handle the scale you anticipate (look for cues like thousands of sites or millions of pages). Even if your current need is low-scale, it's better to go with a solution that's scalable so that you have a reasonably thought-out solution at your disposal with all the nuts and bolts in place.
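
A toy sketch of the distribution idea, assuming nothing more than a flat list of URLs and a fixed number of workers (a real system would use distributed queues and cluster monitoring rather than a simple hash split):

```python
import hashlib
from collections import defaultdict

def partition_urls(urls, num_workers):
    """Assign each URL to a worker by hashing it, so the crawl load spreads evenly across the cluster."""
    buckets = defaultdict(list)
    for url in urls:
        worker_id = int(hashlib.md5(url.encode("utf-8")).hexdigest(), 16) % num_workers
        buckets[worker_id].append(url)
    return buckets

def collate(results_per_worker):
    """Merge per-worker result lists back into a single feed for downstream grouping."""
    merged = []
    for worker_results in results_per_worker:
        merged.extend(worker_results)
    return merged

if __name__ == "__main__":
    urls = [f"http://site{i % 5}.example.com/page/{i}" for i in range(20)]
    for worker, assigned in sorted(partition_urls(urls, num_workers=4).items()):
        print(f"worker {worker}: {len(assigned)} URLs")
```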

3. Data structuring capabilities - Crawling is only half the problem if your requirement is ready-to-use structured data. Every web page is different, and so are the requirements of every project. How thoroughly your provider can extract information from any nook of the page is something for you to validate. This becomes especially critical when your vendor uses a generic crawler, in which case the number of fields is limited, as opposed to writing custom rules per site, where you define the data schema as per your needs. It's also a good idea to add quality checks at your end to avoid compromises, because with web-scale and automation there can be surprises.
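
To make the custom-rules-per-site idea concrete, here is a hedged sketch using BeautifulSoup; the site names, CSS selectors, and field names are all made up for illustration:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Hypothetical per-site extraction rules: field name -> CSS selector.
SITE_RULES = {
    "shop-a.example.com": {"title": "h1.product-title", "price": "span.price", "sku": "div#sku"},
    "shop-b.example.com": {"title": "h2.name", "price": "p.cost"},
}

def extract(site, html):
    """Apply a site's rules to a page and return a dict matching the agreed data schema."""
    rules = SITE_RULES.get(site, {})
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for field, selector in rules.items():
        node = soup.select_one(selector)
        record[field] = node.get_text(strip=True) if node else None  # None flags a missed field for QC
    return record

if __name__ == "__main__":
    sample = "<html><h1 class='product-title'>Widget</h1><span class='price'>$9.99</span></html>"
    print(extract("shop-a.example.com", sample))
```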

4. Data accuracy - This ties in with the point above on structuring capabilities. You'd like access to untouched and uncontaminated information from the web pages. Most providers will extract data as-is from the site for the same reason: any minor modification might, in most cases, defeat the purpose of extracting the data at all. However, sometimes you might not want too many new lines, spaces, tabs, etc. (from the web page itself), and hence some level of cleaning can be asked for.
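
If some cleanup is asked for, it can be kept deliberately minimal so the underlying text is not altered; a sketch of such whitespace-only normalization:

```python
import re

def normalize_whitespace(value):
    """Collapse runs of spaces, tabs and new lines into single spaces and trim the ends.
    Nothing else is touched, so the extracted text stays as close to the page as possible."""
    return re.sub(r"\s+", " ", value).strip()

if __name__ == "__main__":
    raw = "  Widget \n\t  Pro   Edition \n"
    print(repr(normalize_whitespace(raw)))  # 'Widget Pro Edition'
```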

5. Data coverage - Crawls can end up with a few pages being missed or skipped for various reasons: the page no longer exists, the page times out or takes too long to load, or the crawler simply never got to it. Although such issues are unavoidable, especially at scale, they can certainly be mitigated by keeping logs and, at the very least, being aware of which misses crept in. Discuss the tolerance levels that you're comfortable with so that the provider can configure their system accordingly.
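
One way to stay aware of what crept in is to log every missed URL and compare the coverage against the agreed tolerance; a hedged sketch (the 2% tolerance below is just a placeholder):

```python
def coverage_report(attempted_urls, fetched_urls, tolerance=0.02):
    """Log which URLs were missed and check whether the miss rate stays within the agreed tolerance."""
    missed = sorted(set(attempted_urls) - set(fetched_urls))
    miss_rate = len(missed) / len(attempted_urls) if attempted_urls else 0.0
    for url in missed:
        print(f"MISSED: {url}")  # in practice this goes to a log file or monitoring system
    ok = miss_rate <= tolerance
    print(f"coverage: {1 - miss_rate:.2%} (need >= {1 - tolerance:.2%}) -> {'OK' if ok else 'ALERT'}")
    return ok

if __name__ == "__main__":
    attempted = [f"http://example.com/page/{i}" for i in range(100)]
    fetched = attempted[:97]  # pretend three pages were missed
    coverage_report(attempted, fetched)
```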

6. Availability - Data acquisition, at its core, demands the availability of the right data at the right time. Let your provider know beforehand the uptimes that you expect. Most providers who run data acquisition as a primary business should be able to guarantee ~99% availability of their data delivery channels.

7. Adaptability - Let's come to terms with the fact that whichever process you have adopted, from waterfall to agile, requirements do change because of market dynamism. When acquiring data, you might realize that adding more information to the data feeds will give you a competitive edge, or you might simply become aware of other data sources. How easily (if at all) your provider can adapt to such dynamics is something to check upfront.

8. Maintainability - As big a deal as crawling and structuring the data is, so is monitoring the pipeline for regular automated feeds. Although this purely depends on your provider's business model, it's better to be aware of what's included with the project. Given how often websites change, it's better to employ someone who gets notified of changes and does the fixes, so that your team can avoid the hassle of maintaining it all.
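
One simple way a pipeline can notice that a site has changed is to watch for required fields suddenly coming back empty in the feed; the field names and the 20% threshold below are illustrative assumptions:

```python
def detect_site_change(records, required_fields=("title", "price"), empty_threshold=0.2):
    """Flag a feed when too many records are missing required fields, which usually means
    the site's layout changed and the extraction rules need fixing."""
    alerts = []
    for field in required_fields:
        empty = sum(1 for record in records if not record.get(field))
        ratio = empty / len(records) if records else 0.0
        if ratio > empty_threshold:
            alerts.append(f"{field}: {ratio:.0%} of records empty")
    return alerts

if __name__ == "__main__":
    feed = [{"title": "Widget", "price": None}, {"title": "Gadget", "price": None}]
    print(detect_site_change(feed) or "feed looks healthy")
```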

Do you think there's more to this? We welcome your comments.