Data Wrangler at Automattic: 2011 – present
M.S., Computer Science, University of Colorado: 2009 – 2011
Engineer at Broadcom and Teradyne (Hardware, Software, Verification): 2000 – 2009
B.S., Electrical Engineering, Cornell University: 2000

email: browngp @
CV: GregBrownResume.pdf
Profile: LinkedIn




I am interested in using statistical techniques to extract useful information from large data sets. I’ve applied machine learning to natural language processing tasks and to problems in other domains. My interests are in implementing and scaling up these systems to solve real problems.

Professional Work

  • Automattic: Data Team: July 2011 – Present
    Helped grow a multidisciplinary team from 5 to 18 engineers, data scientists, and designers.
    Took on diverse roles in a flexible, mostly flat team structure including individual contributor, project lead, and co-lead of the team.
    Scaled a 40+ node Elasticsearch cluster to support 50+ million search requests per day across 2 billion documents.
    Led Elasticsearch development and maintenance efforts across the company, enabling dozens of developers to build fast, reliable search systems for many different applications.
    Built and shipped multiple front-end features used by millions of users every day: Related Posts, Notifications, Stats. Participated in the design of these features from conception to launch.
    Helped to drive the direction of the Data Team as it scaled out data systems on many different technologies: Elasticsearch, Hadoop, HDFS, Kafka, Hive, Impala, Spark. Projects regularly required scaling to handle many terabytes of data and millions of events per day.
    Contributed to data science efforts on engagement, content recommendation, search, and ad hoc data analysis.
  • J.D. Power and Associates: Web Intelligence: June 2010 – Dec 2010
    Built integrated NLP research infrastructure (text tokenization, part-of-speech tagging with HMM, and
    RRM dependency parser) for science team researching automated sentiment analysis of online media.
    Researched sequence labeling techniques for determining sentiment, entity mentions, and meronymy in text documents. This infrastructure was then also used for my Master’s Thesis on relation extraction.
  • Broadcom: High Def Video Chip Development: 2005-2007
    Responsible for planning, implementing, and maintaining C++ and Perl verification and development environments
    for the DDR2 Memory System and MIPS processors in an HD-DVD/Blu-Ray System on a Chip. Worked with a large
    team to design, verify, and bring to production multiple versions of the chip.
  • Teradyne: System Platform Development: 2000-2005
    Designed and verified various large and small FPGA projects, taking designs from early concepts
    through to completion. Heavily involved in designing and implementing a distributed, embedded DSP processing system,
    which was patented in 2002. Lead technical designer on multiple projects. Developed hardware systems and
    the extensive software used to develop them.

Research Projects

  • Master’s Thesis: Relation Extraction on the J.D. Power and Associates Sentiment Corpus: Aug 2010 – April 2011
    Used existing and novel techniques to extract relations between entities in the JDPA Sentiment Corpus (consisting of blogs and other social media). I used support vector machines to classify relations, with both binary vectors and tree kernels to encode the features. Additionally, I built an ensemble classification scheme to combine predicted relations across multiple sentences and improve the entity relation extraction.
    Details: Thesis
  • Data Mining Project: Examining News Bias with Topic Modeling: Feb 2011 – April 2011
    Performed topic modeling across multiple news sources to explore how much insight can be gained into news bias by examining the variation in the topics across different news sources. We focused on the protests and revolts in Egypt and Libya from Jan 2011 through April 2011 and how the events were discussed by a number of news organizations from around the world.
    Details: Final Paper
  • Accelerating System Performance Analysis with Machine Learning: Aug 2009 – Aug 2010
    Learning from an expert human’s analysis of a computer system’s performance which metrics
    are most likely to lead to the root cause of performance anomalies.
    An Examination of the Local Dynamics of Computer Performance (starting on page 16)
    Waveform Shape Recognition for Vertical Profiling with an Ensemble of Support Vector Machines
  • Modeling of Hybrid Multi-Robot Systems: Fall 2009
    Comparing discrete-time simulations of a robotic system with probabilistic state-space models of the same system
    in order to find better methods of understanding the dynamics.
    Cooperative Lifting of an Object Using a Hybrid Robotic System
  • Evolutionary Computation Research: Jan 2008 – July 2008
    Research into self-adaptive genetic algorithms where the parameters of the evolution are also optimized by
    running a separate genetic algorithm.