Data Scientist

Technology

Blue Sky is an Indian climate-tech startup that combines AI and satellite data to create environmental datasets.

Our first two datasets, on air quality and crop fires, have won awards from MIT and the European Space Agency, and our funders include the Patrick McGovern Foundation, Schmidt Futures and Tata Trusts.

Over the next 12 months, we aim to expand to 10 environmental datasets spanning water, land, heat and more. We are looking to grow our data science team to power this expansion.

Responsibilities:

  • Develop dataframes and automated pipelines for merging and cleaning data from multiple sources.
  • Develop training and cross-validation data sets for machine learning algorithms.
  • Deploy machine learning models based on climate and geospatial science.
  • Work cross-functionally with software engineers to integrate data science models into production pipelines.
  • Help develop, improve, and evangelize our data science knowledge base and infrastructure.
  • Iterate rapidly: all of the above happens in a relatively fast-paced, business-driven environment, and you must be comfortable with that.

What we expect in a Data Scientist:

  • Ability to curate and analyse vast amounts of geospatial data, such as satellite imagery, elevation data, meteorological datasets, OpenStreetMap, demographic data, socio-economic data and topography, to extract useful insights about events happening on our planet.
  • Must know how to handle large image and location datasets, how to visualise and query them, and how to use them for making predictions.
  • Must be familiar with geospatial libraries such as GDAL and rasterio for reading and writing data, GIS software such as QGIS for visualisation and querying, and basic machine learning algorithms for making predictions.
  • Apply best practices in GIS to manage geospatial data.
  • Must research source documents to resolve missing or conflicting data.
  • Experience working with and creating data architectures.
  • Must have worked with raster and vector datasets and performed raster analytics.
  • Produce maps showing the spatial distribution of various kinds of data, including emission statistics and pollution hotspots.
  • Develop innovative algorithms and models with geospatial data to solve global environmental problems, such as tracking depleting water levels in lakes in India, illegal tree logging and oil spills in oceans.
  • Develop trend analysis and forecasting models; you may also design, develop, or maintain more complex data layers and datasets.
  • Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.) and experience with applications.
  • Develop processes and tools to monitor and analyze model performance and data accuracy.
  • You will work closely with data engineers, data scientists, software developers and colleagues from other business functions.
  • Must be able to prepare reports and give a professional opinion to explain geographic trends and findings to clients and other team members.
  • Excellent written and verbal communication skills for coordinating across teams.

Skills & Qualifications:

  • Min 1 year of experience
  • Demonstrable experience implementing efficient neural network models and deploying them in a production environment
  • Excellent coding skills in Python (including deep familiarity with NumPy, SciPy, pandas)
  • Significant experience with git, GitHub, SQL, AWS (S3 and EC2)
  • An ability to communicate complex data science concepts and results in a readily understood manner
  • Minimum two years of demonstrable industry experience working with large and noisy datasets.
  • Degree in a STEM field (e.g., statistics, machine learning, computer science, engineering)
  • QGIS & GIS experience is a bonus
  • Technical Skills - Python/R, Hadoop Platform/Apache Spark, SQL Database/Coding, and Machine Learning skills such as supervised machine learning, unsupervised machine learning, time series, natural language processing, outlier detection, computer vision, recommendation engines, survival analysis, reinforcement learning, and adversarial learning. Data visualisation using ggplot, d3.js, Matplotlib, and Tableau.