Earth science is making rapid advances through effective utilization of large-scale data repositories such as aerial LiDAR and access to NSF-funded cyberinfrastructures (e.g. the OpenTopography.org data portal, iPlant Collaborative, and XSEDE). Scaling analysis tasks that are traditionally developed using desktops, laptops or computing clusters to effectively leverage national and regional scale cyberinfrastructure pose unique challenges and barriers to adoption. To address some of these challenges in Fall 2014 an ‘Applied Cyberinfrastructure Concepts’ a project-based learning course (ISTA 420/520) at the University of Arizona focused on developing scalable models of ‘Effective Energy and Mass Transfer’ (EEMT, MJ m-2 yr-1) for use by the NSF Critical Zone Observatories (CZO) project. EEMT is a quantitative measure of the flux of available energy to the critical zone, and its computation involves inputs that have broad applicability (e.g. solar insolation). The course comprised of 25 students with varying level of computational skills and with no prior domain background in the geosciences, collaborated with domain experts to develop the scalable workflow. The original workflow relying on open-source QGIS platform on a laptop was scaled to effectively utilize cloud environments (Openstack), UA Campus HPC systems, iRODS, and other XSEDE and OSG resources. The project utilizes public data, e.g. DEMs produced by OpenTopography.org and climate data from Daymet, which are processed using GDAL, GRASS and SAGA and the Makeflow and Work-queue task management software packages. Students were placed into collaborative groups to develop the separate aspects of the project. They were allowed to change teams, alter workflows, and design and develop novel code. The students were able to identify all necessary dependencies, recompile source onto the target execution platforms, and demonstrate a functional workflow, which was further improved upon by one of the group leaders over Spring 2015. All of the code, documentation and workflow description are currently available on GitHub and a public data portal is in development. We present a case study of how students reacted to the challenge of a real science problem, their interactions with end-users, what went right, and what could be done better in the future.
Pelletier J., Swetnam T., Merchant N., Callahan N., Lyons E. (2015): Scaling Critical Zone analysis tasks from desktop to the cloud utilizing contemporary distributed computing and data management approaches: A case study for project based learning of Cyberinfrastructure concepts. Abstract IN43B-1730 presented at 2015 Fall Meeting, AGU, San Francisco, CA, 14-18 Dec..
This Paper/Book acknowledges NSF CZO grant support.