NLP & AI FOR CIVIC TECH

Block Party: A Platform to Explore NYC Community Board Meetings

Explore how civic technology can make Community Board meetings in New York City easier to explore.

Block Party seeks to bridge the information gap between Community Boards and the people they represent. These meetings are key to hyperlocal democracy: what is discussed, prioritized, and passed can affect your day-to-day life. Topics include education, transportation, sanitation, public safety, economic development, housing, parks, and how your neighborhood is coping with Covid-19.

But, have you ever attended a Community Board meeting?

Maybe you know what Community District you live in, but don’t have the time (or appetite) to attend several 1–3 hour-long meetings. Across the city, there should be a way to explore the topics and issues at the forefront of these neighborhood meetings. This is why we built Block Party: our goal is to make local policy information accessible and byte-sized.

The purpose of Block Party is to make local policy information accessible and byte-sized.

With vast responsibilities, Community Boards advocate for new initiatives, advise on permits, approve land-use and zoning policy, and allocate budgets. Since the Covid-19 lockdown, their meetings have also become accessible in new ways.

Due to the city-wide shutdown, civic engagement has become more virtual. Community Boards have adapted, turning to Zoom, Webex, Facebook, and YouTube to host their meetings live and publish the recordings online. Aligned with the principles of the New York Open Meetings Law, these government bodies provide public access to what was said at their meetings.

Out of the city’s 59 Community Boards, we found 31 districts host meetings on YouTube. In addition to driving an increase in attendance levels, we saw a new opportunity to process the video’s closed-caption text into a full transcript, in order to share the meeting conversation with a wider audience.

53% of NYC Community Board meetings are available on YouTube

Over the past few months, we developed a pipeline to transform the raw text from the YouTube recordings into a full transcript, meeting highlights, and topic classification. With open-source tools in Python, we leverage Natural Language Processing (NLP) and Artificial Intelligence (AI) to create a structured dataset of meeting information that can help explore the conversations throughout New York City at a local level.

To date, we have more than 1,300 meeting transcripts available in our public archive, collected since the start of the pandemic from every district with a YouTube channel and representing all five boroughs. We process and add about 20–30 meetings each week.

We automatically tag each meeting with a topic category. Our taxonomy framework was inspired by the priorities listed in the NYC Department of City Planning’s Community District Profiles.

A Community Board meeting can be tagged with the following topics:

Human Services, Employment, Youth, Education, Health, Safety, Zoning, Landmarks, Housing, Commercial Development, Land Use, Quality of Life, Transportation, Infrastructure, Parks, Waterfront, Budget, Equity, Arts and Culture, Technology, Police, Utilities, Elections, Libraries

If your Community Board shares its meetings on YouTube with closed captions enabled, we can share the meeting highlights and full transcript. Filter by date, location, or topic to view the meeting conversation. You can also subscribe to your Community Board so it’s easier to stay in the loop: each week, we send an auto-generated email with quotes from the most recent meeting.

Because people might not know which Community Board district they live in, we visualize the GIS Community Districts from NYC Open Data to provide a map of each Community Board. Our web application prompts a user to click on a specific Community District or search by address to select their district.

We continue to improve our process and update our database with each week of meetings. Our collection of transcript data can also be analyzed to find signals across NYC Community Board meetings. At Open Data Week, a partnership led by BetaNYC and the Mayor’s Office of Data Analytics, we will present our findings with a case study on the transportation topic trends we found.

We are looking to connect with data enthusiasts, policymakers, researchers, and the civic-tech community. We are open to sharing insights found in Community Board meetings and brainstorming how this data can be further used for civic engagement, local democracy, and community building.

In the rest of this post, we will provide more information about how we built the tool.

Transcript Generation Process

Get Transcript

First, we gather the raw text from YouTube’s speech-to-text transcription for each available NYC Community Board channel and meeting recording. We collect additional metadata for context, including the video’s publication date, title, and duration.

Format Text

The raw text from YouTube is just a list of phrases, each with a start time and duration. It lacks functional grammar, such as sentence structure, punctuation, and capitalization of proper nouns. To read like a transcript, the text must be transformed into sentences. We restore grammar based on the textual features of the raw text, predicting where periods, commas, and other punctuation should be placed.

Here is how the raw text looks from the closed-captions in YouTube:

[{'text': "it's six o'clock we're gonna wait for a", 'start': 9.519, 'duration': 2.961},
 {'text': 'few minutes uh', 'start': 11.28, 'duration': 3.92},
 {'text': 'while we get some more board members to', 'start': 12.48, 'duration': 6.08},
 {'text': 'attend so we have a quorum', 'start': 15.2, 'duration': 3.36},
 ...]
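Before restoring punctuation, these caption segments can be flattened into a single block of raw text. Here is a minimal sketch, using the sample segments above (libraries such as youtube-transcript-api return this same list-of-dicts shape):

```python
# Flatten YouTube caption segments into one block of raw text.
# Each segment is a dict with 'text', 'start', and 'duration' keys.

segments = [
    {'text': "it's six o'clock we're gonna wait for a", 'start': 9.519, 'duration': 2.961},
    {'text': 'few minutes uh', 'start': 11.28, 'duration': 3.92},
    {'text': 'while we get some more board members to', 'start': 12.48, 'duration': 6.08},
    {'text': 'attend so we have a quorum', 'start': 15.2, 'duration': 3.36},
]

def flatten_segments(segments):
    """Join caption phrases into one lowercase, unpunctuated string."""
    return ' '.join(seg['text'].strip() for seg in segments)

raw_text = flatten_segments(segments)
print(raw_text)
```

The start times are kept alongside the text in our pipeline so that quotes can be traced back to a moment in the recording.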

In our process, we tokenize and annotate each word of the transcript. If the word’s part-of-speech is identified as a proper noun, such as a name or location, we capitalize the word.
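In production this step relies on a part-of-speech tagger; as a simplified, self-contained stand-in, the sketch below capitalizes tokens found in a small hand-built set of known proper nouns (the entity list is hypothetical):

```python
# Simplified stand-in for POS-based capitalization: the real pipeline tags
# each token and capitalizes those labeled as proper nouns; here a fixed set
# of known names and places (hypothetical examples) plays the tagger's role.

KNOWN_PROPER_NOUNS = {'brooklyn', 'queens', 'manhattan', 'flatbush', 'zoom'}

def restore_capitalization(text):
    words = []
    for word in text.split():
        if word.lower() in KNOWN_PROPER_NOUNS:
            word = word.capitalize()
        words.append(word)
    return ' '.join(words)

print(restore_capitalization('the brooklyn board met over zoom'))
# -> 'the Brooklyn board met over Zoom'
```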

We do not alter the recorded dialog in any way, beyond fixing spelling errors and removing the filler words ‘um’ and ‘uh’ to improve the overall flow and readability.

We show how many times we identified and removed the words “um” and “uh” from the full transcript.

Interestingly enough, YouTube is not yet familiar with the word “Covid-19” and misspells it almost every time it is transcribed. Some notable spelling errors we have added to our pipeline include “cova da 19”, “kobit”, “cobid”, and “coca-19”. We search for these mishaps and replace them with the string “Covid-19”.
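Both cleanups boil down to pattern substitution. A minimal sketch with Python’s re module (the misspelling list mirrors the examples above; the exact patterns in our pipeline differ):

```python
import re

# Known speech-to-text misspellings of "Covid-19" seen in YouTube captions.
COVID_MISSPELLINGS = ['cova da 19', 'kobit', 'cobid', 'coca-19']

def clean_transcript(text):
    """Remove filler words and normalize Covid-19 misspellings.
    Returns the cleaned text plus a count of removed fillers."""
    # Count and strip standalone 'um'/'uh' tokens.
    filler = re.compile(r'\b(?:um|uh)\b', flags=re.IGNORECASE)
    n_fillers = len(filler.findall(text))
    text = filler.sub('', text)
    # Replace each known misspelling with the canonical string.
    for typo in COVID_MISSPELLINGS:
        text = re.sub(re.escape(typo), 'Covid-19', text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removals.
    return re.sub(r'\s+', ' ', text).strip(), n_fillers

cleaned, count = clean_transcript('um the kobit relief program uh starts monday')
print(cleaned, count)
# -> 'the Covid-19 relief program starts monday' 2
```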

Generate Summary

Next, we generate a high-level summary by extracting key sentences from the full transcript. Rather than an abstractive summary, our taxonomy of terms guides the creation of an extractive, verbatim summary. For example, we would collect a sentence with the word “education” over one with the word “un-mute”, with the goal of highlighting the sentences with the most critical content.

Once we have a subset of sentences, we apply a TextRank algorithm that gathers the “most representative” sentences to format a 500-word summary of meeting highlights.
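To illustrate the ranking step, here is a minimal TextRank sketch that scores sentences on a word-overlap similarity graph via power iteration. It is a self-contained simplification, not our exact implementation, and the sample sentences are invented:

```python
# Minimal TextRank sketch: rank sentences on a word-overlap similarity
# graph, then keep the top-scoring ones for the summary.

def textrank(sentences, damping=0.85, iterations=50):
    words = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Similarity: size of the shared vocabulary between sentence pairs.
    sim = [[len(words[i] & words[j]) if i != j else 0 for j in range(n)]
           for i in range(n)]
    scores = [1.0] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(sim[j])
                if sim[j][i] and out:
                    rank += sim[j][i] / out * scores[j]
            new.append((1 - damping) + damping * rank)
        scores = new
    return scores

sentences = [
    'The board discussed the education budget for local schools.',
    'Members voted to approve the education budget.',
    'A resident asked about unmuting on Zoom.',
]
scores = textrank(sentences)
best = sentences[max(range(len(sentences)), key=scores.__getitem__)]
print(best)
```

Sentences that share vocabulary with many others accumulate the highest scores, which is why the off-topic “un-mute” remark ranks lowest here.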

To assign topics, we classify each meeting with a pre-trained transformer model and tag it with the relevant categories.
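Our classifier is a pre-trained transformer; as a simplified, self-contained stand-in for the tagging step, the sketch below labels a meeting by keyword overlap with each topic (the keyword lists are hypothetical):

```python
# Simplified stand-in for the transformer classifier: tag a meeting with
# any topic whose (hypothetical) keyword list appears in the transcript.

TOPIC_KEYWORDS = {
    'Education': {'school', 'students', 'teachers', 'curriculum'},
    'Transportation': {'bus', 'subway', 'bike', 'traffic'},
    'Parks': {'park', 'playground', 'trees'},
}

def tag_topics(transcript):
    tokens = set(transcript.lower().split())
    return sorted(topic for topic, keywords in TOPIC_KEYWORDS.items()
                  if tokens & keywords)

print(tag_topics('residents raised concerns about bus traffic near the school'))
# -> ['Education', 'Transportation']
```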

Share Meeting

Lastly, we process the summary, full transcript, and meeting metadata into a structured database that feeds our front-end web application and weekly email delivery. We host all of the transcript data on our website.

We hope you found this post helpful — get in touch if you’d like to continue the conversation.

Please subscribe to your Community Board or follow us on Twitter, where we highlight and share timely quotes from meeting transcripts.

Sarah June Sachs
