Conducting Analysis in SDC

As a data analyst, you can use the USDOT Secure Data Commons (SDC) to:

  • Share code and data with other analysts
  • Upload your own datasets
  • Export approved derived analyses

We'll provide you with a cloud-based workstation, preloaded with programming environments and software, that gives you access to the data lake and data warehouse. The workstation also includes commercially available tools, so no local software or tool installation is needed.

Analytical Tools and Query Languages Supported

The SDC platform gives experienced analysts on-demand access to popular programming and statistical tool packages for cloud-based processing. Other, nonstandard software can be installed upon request, either for individual users or across user groups. For software that requires a special license, analysts may provide their own existing licenses.

Analytical Tools

Custom options available upon request.

Types of Datasets

The SDC platform provides a data lake of transportation-related structured, semi-structured, and unstructured datasets that are stored in raw, curated, and published formats. Data agreements differ from dataset to dataset, depending on the complexity and sensitivity of the data. Data providers approve access to specific data. The dataset formats are described below:

Raw Datasets

Raw datasets are unaltered data stored in their native, original "as-is" format. Data can arrive continuously from streaming sources (e.g., APIs or sensors) or through one-time uploads from external sources. This data can be structured (databases, logs, financial data), semi-structured (HTML, XML, RDF, CSV), or unstructured (images, PDFs, Word documents). Raw data cannot be copied or exported out of the SDC.
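As a rough illustration of the three categories above, the sketch below sorts incoming file names by extension. The extension-to-category mapping and the `classify` helper are assumptions made for this example only; they are not part of the SDC platform, and a real intake process would likely inspect file contents rather than names.

```python
from pathlib import Path

# Hypothetical mapping, loosely following the examples given for raw datasets.
# These extensions are illustrative assumptions, not an SDC specification.
EXTENSION_CATEGORIES = {
    ".sqlite": "structured",
    ".log": "structured",
    ".html": "semi-structured",
    ".xml": "semi-structured",
    ".rdf": "semi-structured",
    ".csv": "semi-structured",
    ".png": "unstructured",
    ".jpg": "unstructured",
    ".pdf": "unstructured",
    ".docx": "unstructured",
}


def classify(filename):
    """Return the assumed category for a file name, or "unknown"."""
    suffix = Path(filename).suffix.lower()  # case-insensitive extension match
    return EXTENSION_CATEGORIES.get(suffix, "unknown")


for name in ["trips.CSV", "intersection.jpg", "events.log", "notes"]:
    print(name, "->", classify(name))
```

Files with an unrecognized or missing extension fall through to "unknown" rather than being guessed at.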

Curated Datasets

Data curation is the organization and integration of raw data collected from various sources. Curated data is annotated so that its value is maintained and it remains available for reuse and preservation. During curation, data is transformed from unstructured and semi-structured formats into structured formats, and deduplication, obfuscation, and cleansing are performed. The result is high-quality data from which researchers can draw meaningful insights.
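The cleansing, obfuscation, and deduplication steps named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SDC's actual curation pipeline: the `curate` function, the `vehicle_id` and `speed_mph` fields, and the salted SHA-256 obfuscation are all hypothetical choices made for the example.

```python
import hashlib


def curate(records, id_field="vehicle_id", salt="example-salt"):
    """Cleanse, obfuscate, and deduplicate a list of flat dict records.

    Field names and the salt value are illustrative assumptions.
    """
    seen = set()
    curated = []
    for record in records:
        # Cleansing: trim whitespace from strings; drop records with no identifier.
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        if not cleaned.get(id_field):
            continue

        # Obfuscation: replace the identifier with a truncated salted hash.
        digest = hashlib.sha256((salt + str(cleaned[id_field])).encode("utf-8")).hexdigest()
        cleaned[id_field] = digest[:16]

        # Deduplication: skip records identical to one already kept.
        key = tuple(sorted(cleaned.items()))
        if key in seen:
            continue
        seen.add(key)
        curated.append(cleaned)
    return curated


raw = [
    {"vehicle_id": " A123 ", "speed_mph": 54},
    {"vehicle_id": "A123", "speed_mph": 54},   # duplicate once whitespace is trimmed
    {"vehicle_id": "", "speed_mph": 61},       # missing identifier, dropped
    {"vehicle_id": "B456", "speed_mph": 47},
]
curated = curate(raw)
print(len(curated), "records kept")
```

Cleansing runs before deduplication so that records differing only in stray whitespace collapse into one. Hashing the identifier keeps records for the same vehicle linkable without exposing the original value.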

Published Datasets

Researchers create published datasets to disclose their research and to allow other users to verify and reuse the data beyond its original purpose. A published dataset results from combining analyses of curated datasets in the SDC platform with other datasets or algorithms owned or created by a researcher or data scientist.

What's Next?

As a data analyst planning to do analysis in the SDC, use the steps below to get started.

Download the access request form, fill out the required details, and email it to sdc-support@dot.gov. Once your request is approved, we will send you an email with instructions for accessing the platform.

Follow the instructions in the Welcome Email from the SDC. Review the Research Analyst User Guide.

Our Enablement Services team offers custom upgrades to help your project team along the way.

Last updated October 2021