@jessealama
Contributor

@jessealama jessealama commented Nov 11, 2025

Adds JSON-LD structured data, using schema.org/Event markup, to the events page for better search engine indexing. Fixes #738

Notes and potential areas for feedback

  • data/events.yaml now allows city, country, and venue fields, all optional.
  • The preferred way to indicate countries in Schema.org is the ISO 3166-1 alpha-2 format (two uppercase letters, e.g. "DE"). As it stands, location is freeform text. The new country field has no checks, so it also remains freeform, but one could imagine adding a check. Another approach would be to render e.g. "DE" as "Germany".
  • I ran into GitHub API rate limiting when building the site locally, and I got around it by setting the NODOWNLOAD environment variable. I'm not sure this is quite right; any feedback is welcome.
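A check along the lines suggested above could look like the following minimal Python sketch. The function names and the small COUNTRY_NAMES table are hypothetical, not part of the site. It only verifies that a value has the ISO 3166-1 alpha-2 shape (two uppercase letters) and renders known codes as country names, falling back to the code itself.

```python
import re

# Hypothetical lookup table; a complete version could be generated from an
# ISO 3166-1 list (for example, the pycountry package) instead.
COUNTRY_NAMES = {
    "DE": "Germany",
    "GB": "United Kingdom",
    "IT": "Italy",
    "NL": "Netherlands",
    "US": "United States",
}

# Matches the alpha-2 shape only; unassigned codes such as "ZZ" still pass.
ALPHA2 = re.compile(r"[A-Z]{2}")


def check_country(code: str) -> str:
    """Raise ValueError unless code looks like an ISO 3166-1 alpha-2 code."""
    if not ALPHA2.fullmatch(code):
        raise ValueError(f"country {code!r} is not a two-letter uppercase code")
    return code


def display_country(code: str) -> str:
    """Render e.g. "DE" as "Germany"; unknown codes are shown unchanged."""
    return COUNTRY_NAMES.get(check_country(code), code)
```

For example, display_country("DE") returns "Germany", while check_country("Germany") raises ValueError, so a freeform value in events.yaml would be caught at build time.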

jessealama and others added 17 commits November 11, 2025 13:25
This change adds structured semantic data (JSON-LD) to the events page,
making event information machine-readable for search engines and other
tools.

Changes:
- Add city, country, state, and venue fields to Event dataclass
- Generate schema.org Event markup with PostalAddress for physical events
- Use VirtualLocation for virtual events
- Compute location display strings from structured fields
- Add structured data to sample events in events.yaml
- Integrate validation in GitHub Actions using structured-data-testing-tool
- Stub project downloads when NODOWNLOAD=1 to avoid rate limits

The semantic data includes proper geographic information using PostalAddress
with addressLocality (city), addressCountry (country), and addressRegion
(state) fields, making events more discoverable and accessible.

When a venue is specified, the Place name should be just the venue
(e.g., 'Spielfeld') rather than the full computed location string
(e.g., 'Spielfeld, Berlin, Germany'). The full address details are
already captured in the PostalAddress structure.

For events without a venue, the city name is used as the Place name.
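The venue-or-city rule can be sketched in Python as below. This is a hedged illustration, not the PR's generate_schema_org_json(): the Event class here is a hypothetical stand-in that models only the structured location fields named in these commits, and place_json() is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical subset of the site's Event dataclass: only the
    # structured location fields described in the commit messages.
    title: str
    city: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None
    venue: Optional[str] = None


def place_json(event: Event) -> dict:
    """Build a schema.org Place named after the venue, or the city as fallback."""
    address = {"@type": "PostalAddress"}
    if event.city:
        address["addressLocality"] = event.city
    if event.state:
        address["addressRegion"] = event.state
    if event.country:
        address["addressCountry"] = event.country
    return {
        "@type": "Place",
        # The full address lives in PostalAddress, so the name stays short.
        "name": event.venue or event.city,
        "address": address,
    }
```

With city="Bologna", country="IT" and no venue, this yields the same Place object as the ItaLean 2025 example later in this thread.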
Convert all events from unstructured 'location' strings to structured
fields (city, country, state, venue). This enables proper schema.org
PostalAddress markup with addressLocality, addressCountry, and
addressRegion.

Changes:
- 54 events updated with structured location data
- State abbreviations expanded (RI, CA, MA, CO, PA, etc.)
- Country names standardized (NL → Netherlands, UK → United Kingdom)
- Venue abbreviations preserved (ICERM, ICTS, MSRI, etc.)
- Virtual events unchanged (location: virtual)
- Fixed typo: Tblisi → Tbilisi

The Place name in schema.org now uses venue (if available) or city
(as fallback), rather than the full location string. PostalAddress
provides structured address details.

All 70 events pass validation (100%).

Add hybrid event support to distinguish events that offer both
in-person and virtual attendance options.

Changes:
- Add 'hybrid: bool' field to Event dataclass
- Add validation: hybrid events must have city and country
- Update schema.org generation for MixedEventAttendanceMode
- Location for hybrid events is an array containing both:
  - Place with PostalAddress (physical location)
  - VirtualLocation (online access)
- Mark 'Learning Mathematics with Lean' as hybrid event

The schema.org location field now properly represents three modes:
- OnlineEventAttendanceMode: VirtualLocation only
- OfflineEventAttendanceMode: Place with PostalAddress only
- MixedEventAttendanceMode: Array with both Place and VirtualLocation

All 70 events pass validation (100%).
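A minimal Python sketch of the three cases above follows. It assumes the caller already knows whether the event is fully remote or hybrid, and that the VirtualLocation simply carries the event's URL; both the function name and that choice of URL are assumptions, not taken from the PR.

```python
from typing import Any, Dict, Optional, Tuple

ONLINE = "https://schema.org/OnlineEventAttendanceMode"
OFFLINE = "https://schema.org/OfflineEventAttendanceMode"
MIXED = "https://schema.org/MixedEventAttendanceMode"


def attendance_and_location(
    url: str, place: Optional[Dict[str, Any]], remote: bool, hybrid: bool
) -> Tuple[str, Any]:
    """Return (eventAttendanceMode, location) for online, offline, or hybrid events."""
    # Assumption: the online access point is the event's own URL.
    virtual = {"@type": "VirtualLocation", "url": url}
    if remote:
        return ONLINE, virtual
    if place is None:
        raise ValueError("in-person and hybrid events need a physical Place")
    if hybrid:
        # Hybrid events list the physical Place and the VirtualLocation together.
        return MIXED, [place, virtual]
    return OFFLINE, place
```

Keeping the mode and the location shape in one function makes it hard for them to disagree, e.g. an OfflineEventAttendanceMode event with only a VirtualLocation.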
Update addressCountry values to use standard two-letter country codes
(e.g., US, GB, DE) instead of full country names for better machine
readability and compliance with schema.org recommendations.

- Remove duplicate description field from event JSON-LD (it just repeated the title)
- Fix Formalization class to respect NODOWNLOAD environment variable
- Prevents GitHub API rate limiting during local development with NODOWNLOAD=1
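The NODOWNLOAD behaviour can be pictured with a small Python sketch. The names nodownload_enabled() and project_data() are hypothetical, as is the rule that any non-empty value other than "0" enables stubbing; the real Formalization class may check the variable differently.

```python
import os
from typing import Any, Callable, Dict


def nodownload_enabled() -> bool:
    # Assumption: NODOWNLOAD=1 (or any non-empty value except "0") turns stubbing on.
    return os.environ.get("NODOWNLOAD", "") not in ("", "0")


def project_data(repo: str, fetch: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Return placeholder data when NODOWNLOAD is set; otherwise call fetch(repo)."""
    if nodownload_enabled():
        # No network request is made, so no GitHub API quota is consumed.
        return {"repo": repo, "stub": True}
    return fetch(repo)
```

Passing the real download function in as fetch keeps the stub path free of any network access, which is what avoids the rate limit during local builds.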
Move Node.js and npm setup earlier in the workflow and remove conditional
checks. This makes the workflow simpler and allows validation to run as a
standard part of the build process rather than an optional step.

Changes:
- Move setup-node action to run right after Python setup
- Move npm ci to run before build step
- Remove conditional checks from Node.js setup and validation steps
- Simplify validation step to just run npm command
- Use version tag (v4.1.0) instead of commit hash for setup-node action

Introduce an is_fully_remote() method to centralize the logic for detecting
virtual/online events, replacing duplicate inline checks throughout the code.

Also extend validation to ensure fully remote events don't have physical
location fields (city, state, country), maintaining data consistency.

Changes:
- Add Event.is_fully_remote() method
- Update compute_location() to use new method
- Update generate_schema_org_json() to use new method
- Add validation check for fully remote events
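One way to picture the new method and the consistency checks is the following Python sketch. The Event class is a hypothetical reduction of the site's dataclass, and treating location values "virtual" or "online" as fully remote is an assumption (the data only confirms location: virtual). validate() is an illustrative name; it returns a list of messages so several problems can be reported at once. It also includes the earlier rule that hybrid events must have a city and country.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical fields mirroring the ones named in the commit messages.
    title: str
    location: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None
    hybrid: bool = False

    def is_fully_remote(self) -> bool:
        # Assumption: remote events are marked with location "virtual" or "online".
        return (self.location or "").strip().lower() in {"virtual", "online"}


def validate(event: Event) -> List[str]:
    """Collect consistency errors instead of stopping at the first one."""
    errors = []
    if event.is_fully_remote():
        for name in ("city", "state", "country"):
            if getattr(event, name):
                errors.append(f"{event.title}: remote event must not set {name}")
    if event.hybrid and not (event.city and event.country):
        errors.append(f"{event.title}: hybrid event needs city and country")
    return errors
```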
Change generate_schema_org_json to raise ValueError with helpful
error messages instead of silently returning empty strings when dates cannot
be parsed. This makes debugging easier and ensures all events have valid dates.

Changes:
- Combine date parsing and ISO 8601 formatting into single lines
- Use separate try/except blocks for start and end dates
- Raise ValueError with specific error messages indicating which event and
  which date field is invalid
Co-authored-by: jessealama <56691+jessealama@users.noreply.github.com>
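The separate try/except blocks described above can be sketched in Python as follows. The input format "December 9, 2025" and the function name iso_dates() are assumptions for illustration, not the PR's code; chaining with raise ... from keeps the original parsing error attached for debugging.

```python
from datetime import datetime
from typing import Tuple

# Assumption: dates in the YAML look like "December 9, 2025".
# %B expects English month names, as in the default C locale.
DATE_FORMAT = "%B %d, %Y"


def iso_dates(title: str, start: str, end: str) -> Tuple[str, str]:
    """Parse start/end separately so the error names the field that failed."""
    try:
        start_iso = datetime.strptime(start, DATE_FORMAT).date().isoformat()
    except ValueError as exc:
        raise ValueError(f"Event {title!r}: invalid start date {start!r}") from exc
    try:
        end_iso = datetime.strptime(end, DATE_FORMAT).date().isoformat()
    except ValueError as exc:
        raise ValueError(f"Event {title!r}: invalid end date {end!r}") from exc
    return start_iso, end_iso
```

Under that assumed format, iso_dates("ItaLean 2025", "December 9, 2025", "December 12, 2025") gives the startDate and endDate values shown in the example markup below.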
@jessealama
Contributor Author

Example of the generated markup for an event:

<li> <a href="https://pitmonticone.github.io/ItaLean2025/">ItaLean 2025: Bridging Formal Mathematics and AI</a> (Bologna, IT. December 9–12, 2025)
    <small class="align-middle">
        <span class="badge badge-secondary align-middle event-tag-conference">conference</span>
    </small>
    
    <script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "ItaLean 2025: Bridging Formal Mathematics and AI",
  "url": "https://pitmonticone.github.io/ItaLean2025/",
  "startDate": "2025-12-09",
  "endDate": "2025-12-12",
  "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
  "location": {
    "@type": "Place",
    "name": "Bologna",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Bologna",
      "addressCountry": "IT"
    }
  },
  "eventStatus": "https://schema.org/EventScheduled"
}
    </script>
    
</li>
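For reference, here is a minimal Python sketch that serializes a dictionary like the one above into a script element. The helper name ld_json_script() is hypothetical and the site's template may produce the markup differently. Escaping "<" as \u003c is a common precaution so that a title containing "</script>" cannot end the element early; JSON parsers decode it back to "<".

```python
import json
from typing import Any, Dict

OPEN_TAG = '<script type="application/ld+json">'
CLOSE_TAG = "</script>"


def ld_json_script(data: Dict[str, Any]) -> str:
    """Serialize schema.org data as pretty-printed JSON inside a script element."""
    # "<" only occurs inside JSON strings, where \u003c is a valid escape.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("<", "\\u003c")
    return f"{OPEN_TAG}\n{body}\n{CLOSE_TAG}"


# The ItaLean 2025 event from the example above.
ITALEAN = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "ItaLean 2025: Bridging Formal Mathematics and AI",
    "url": "https://pitmonticone.github.io/ItaLean2025/",
    "startDate": "2025-12-09",
    "endDate": "2025-12-12",
    "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
    "location": {
        "@type": "Place",
        "name": "Bologna",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Bologna",
            "addressCountry": "IT",
        },
    },
    "eventStatus": "https://schema.org/EventScheduled",
}

if __name__ == "__main__":
    print(ld_json_script(ITALEAN))
```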

@bryangingechen
Collaborator

Indeed, the NODOWNLOAD env var is mentioned in our README; the other option is to pass a GitHub token.

I hadn't heard of this before, but it looks like it's fairly widely used on similar community pages (cf. the Rust Foundation events and Python events pages), so I'm not opposed to this. I am a little wary of introducing a Node dependency, even if only in CI. Is there a comparable Python package for validation that we could run as part of make_site.py?

@jessealama
Contributor Author

Thanks for taking a look! I realize this is probably a bit of a curveball PR. I'll look into equivalent Python tools to see what can be done there; surely there's an equivalent.

@jessealama jessealama marked this pull request as draft November 11, 2025 22:12
@jessealama
Contributor Author

Converting to draft while I investigate Python equivalents of the JavaScript structured-data-testing-tool.

@jessealama jessealama force-pushed the add-semantic-data-to-events branch from 889c0bd to 9e1644f November 12, 2025 10:08
@jessealama
Contributor Author

I found pydantic2-schemaorg, a neat package that does just what we want (and could be useful elsewhere, too). I've removed the npm/JS setup.

@jessealama jessealama marked this pull request as ready for review November 12, 2025 10:27
@bryangingechen bryangingechen self-assigned this Nov 17, 2025
Add structured location data to Swiss Math Soc Spring Meeting event.
@jessealama
Contributor Author

ping @bryangingechen


Development

Successfully merging this pull request may close these issues.

Add schema.org structured data to events page

2 participants