Data Analytics

We have the expertise to extract data from various formats, such as PDFs, text documents, Excel sheets and images, and to discover patterns and insights in unstructured data.

Optical Character Recognition

OCR is the technology that enables us to interpret and extract text from images. We have extensive experience running OCR on large datasets, using Tesseract and the Google Cloud Vision API.

Log File Analysis

Various programs generate log files every minute, hour or day. We can help you find patterns in these log files.
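
As an illustration of the kind of pattern-finding this involves, here is a minimal Python sketch that counts log entries by severity and by hour. The log line format, the function name summarise and the sample lines are assumptions made for this example only; real log files vary widely and would need their own parsing rules.

```python
import re
from collections import Counter

# Assumed (hypothetical) line format: "2024-01-15 10:32:07 ERROR Disk quota exceeded"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} (\w+) (.*)$")


def summarise(lines):
    """Count log entries per severity level and per (date, hour) bucket."""
    by_level = Counter()
    by_hour = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        date, hour, level, _message = match.groups()
        by_level[level] += 1
        by_hour[(date, hour)] += 1
    return by_level, by_hour


# Made-up sample lines, for illustration only
sample = [
    "2024-01-15 10:32:07 ERROR Disk quota exceeded",
    "2024-01-15 10:45:10 INFO Backup finished",
    "2024-01-15 11:02:55 ERROR Disk quota exceeded",
    "not a log line",
]

levels, hours = summarise(sample)
print(levels.most_common())   # severity levels, most frequent first
print(hours.most_common(1))   # the busiest (date, hour) bucket
```

Grouping by hour is only one possible choice; the same counting approach could be applied per minute or per day, depending on how often the program writes to its log.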

Raspberry Pi

A Raspberry Pi makes an excellent portable Linux server. We provide Raspberry Pi-related services.

Case Studies

All India voter data analysis

We have acquired voter data for 4 states of India (and counting) and run several analyses on it for our client CRDDP (Centre for Research and Debates in Development Policy), including but not limited to:

  • Extracting and parsing data from PDF files and Excel sheets and uploading it to a MySQL database
  • Extracting text from images
  • Assigning a religion to each voter based on their name
  • Translating names of voters and their guardians into English and local Indian languages
  • Providing demographic data

Volume of data analysed and managed:

  • 500+ Excel sheets
  • 100,000+ images subjected to OCR using Google Cloud Vision API
  • 1+ billion names translated (transliterated) into Indian languages and English using Google API and PHP transliteration
  • Religion identification algorithm developed to assign a religion to each voter in India (identifies 92% of Indian names)

Raspberry Pi

Configured, set up and deployed 3 SMS servers on Raspberry Pi, serving 3 GSM modems.