Data Driven Development
11/12/2019
Agenda
6:00 -- 6:30 pm Check-In
6:30 -- 6:35 pm Sponsors (JJLake + Harnham) intro
6:35 -- 7:10 pm Talk 1 (BlackSesame)
7:10 -- 7:45 pm Talk 2 (Databricks)
7:45 -- 8:20 pm Talk 3 (Uber)
8:30 pm -- Closing
Talk 1: Case studies of data driven development in autonomous driving.
Autonomous driving software systems are highly complex, mission critical, and rapidly iterating. The development cycle of such a system involves software, hardware, and continuous testing. To guarantee a closed and efficient feedback loop from the software engineering process to road tests, there has to be a data driven pipeline that facilitates it. In this talk, we will use an in-garage, vision-based autonomous driving system as an example to discuss not only some of the unique challenges in designing such a system, but also how data driven development approaches enable the fast delivery of close-to-production-level software in this vertical.
Speaker: Guan Wang (BlackSesame)
Dr. Guan Wang is the Head of AI at Blacksesame Technologies (an AI chip startup in Santa Clara). His team has built a pure vision-based autonomous driving platform in the in-garage driving vertical, and a pure vision-based crowdsourcing HD mapping system. Prior to Blacksesame, he was a founding team member of NIO US (an electric car startup, IPO'19). He worked on productionizing deep learning systems in embedded environments. Before NIO, he worked at LinkedIn on a cloud-based machine learning platform.
Talk 2: Uncovering performance regressions in the TCP SACKs vulnerability fixes
In early July 2019, Databricks noticed some Apache Spark workloads regressing by as much as 6x. In this talk, we'll discuss how we traced these regressions back to the Linux kernel and the fixes for the TCP SACKs vulnerabilities. We will explain the symptoms we were seeing, walk through how we debugged the TCP connections, and dive into the Linux source to uncover the root cause.
Speaker: Chris Stevens (Databricks)
Chris Stevens is a software engineer at Databricks, where he works on the reliability, scalability, and security of Apache Spark clusters. His work focuses on auto-scaling compute, auto-scaling storage, node initialization performance, and node health monitoring. Prior to Databricks, Chris founded the Minoca OS project, where he built a POSIX-compliant, general-purpose OS - from scratch - to run on resource-constrained devices. He got his start at Microsoft working on the Windows kernel team, porting the Windows boot environment from BIOS to UEFI.
Talk 3: How to performance-tune Spark applications in large clusters
Uber developed a new Spark ingestion system, Marmaray, for data ingestion from various sources. It’s designed to ingest billions of Kafka messages every 30 minutes. The amount of data handled by the pipeline is on the order of hundreds of TBs. Omkar details how to tackle such scale and shares insights into the optimization techniques. Some key highlights are how to understand bottlenecks in Spark applications, whether or not to cache your Spark DAG to avoid rereading your input data, how to effectively use accumulators to avoid unnecessary Spark actions, how to inspect your heap and non-heap memory usage across hundreds of executors, how to change the layout of data to save long-term storage cost, how to effectively use serializers and compression to save network and disk traffic, how to amortize the cost of your application by multiplexing your jobs, and different techniques for reducing memory footprint, runtime, and on-disk usage. CGI was able to significantly (~10%–40%) reduce memory footprint, runtime, and disk usage.
Speaker: Omkar Joshi (Uber)
Omkar Joshi is a senior software engineer on Uber’s Hadoop platform team, where he’s architecting Marmaray. Previously, he led object store and NFS solutions at Hedvig and was an initial contributor to Hadoop’s YARN scheduler.