How might we reduce 5 unlock steps to 1, using facial recognition, without compromising security or privacy?
I started by reviewing Yale's existing smart doorbell experience end-to-end. The current flow requires five steps to unlock: press the doorbell, wait for a push notification, open the app, check the video feed, then tap unlock.
This was designed for visitors, not residents. It assumes someone inside the house is available to verify and unlock. But when the resident is the one at the door (carrying shopping, standing in the rain), they're stuck waiting for someone else to go through five steps, or they reach for a physical key.
The core friction: the smart lock's unlock flow makes residents dependent on someone else. The "smart" part becomes the backup, not the primary method.
I spoke with 8 smart lock users (friends, family, and online contacts). These were informal conversations, not formal structured interviews. The insights are directional, not statistically significant. A real project would require a larger, more diverse sample.
6 out of 8 people told me they use a physical key or PIN code more than the app. Why this matters: if users avoid the "smart" feature, the product is failing at its primary value proposition. The app unlock is too slow when you're physically at the door.
Nobody was worried about their own entry. The stress came from remote-unlocking for someone else: a delivery driver, a family member who forgot their key. Why this matters: the security model should differentiate between "me at my door" and "someone else at my door."
Every single person rejected auto-unlock. Even if the tech were perfect, it felt "wrong." Why this matters: perceived control is as important as actual security. The confirmation step isn't a compromise. It's the feature.
People want the door to know who they are. They don't want it to decide for them. The design must preserve the feeling of choice.
I mapped the current unlock journey step by step, marking where friction and dependency occur. Then I designed a proposed flow based on the research findings: facial recognition to identify, confirmation tap to unlock.
The proposed flow removes the dependency on a second person. The resident interacts directly with the doorbell. Steps 3-5 of the current flow (open app, check video, tap unlock) are replaced by one action: tap the doorbell.
The system needs to handle two distinct scenarios. I mapped separate flows for recognised household members and unrecognised visitors, because the experience and the security requirements are fundamentally different.
Every participant rejected auto-unlock. The single tap preserves user agency. "I chose to open the door" vs "the door decided for me." This directly addresses the research finding that perceived control matters more than speed.
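A minimal sketch of that decision logic, assuming the doorbell already knows whether the face was recognised and whether the button was tapped. The names `Action` and `next_action` are hypothetical and only illustrate the rule that the door never unlocks without a deliberate tap.

```python
from enum import Enum, auto


class Action(Enum):
    SHOW_UNLOCK_PROMPT = auto()  # recognised resident: wait for a confirmation tap
    UNLOCK = auto()              # recognised resident tapped to confirm
    START_VIDEO_CALL = auto()    # unrecognised visitor: homeowner decides remotely


def next_action(recognised: bool, tapped: bool) -> Action:
    """Pick the doorbell's next step. Recognition alone never unlocks the door."""
    if not recognised:
        # Visitors never enter the face-unlock path, tap or no tap.
        return Action.START_VIDEO_CALL
    if tapped:
        return Action.UNLOCK
    return Action.SHOW_UNLOCK_PROMPT
```

Keeping the tap as a separate input, rather than folding it into recognition, is what makes "I chose to open the door" true in the logic and not just in the interface.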
Facial data used to identify a person is GDPR "special category" biometric data. All matching happens on-device using embeddings (mathematical representations of a face), not stored photos. Nothing leaves the doorbell hardware. No cloud means no central database to breach.
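To illustrate what matching on embeddings could look like, here is a rough sketch that compares a probe embedding against enrolled ones using cosine similarity. This is an assumption for illustration, not a verified description of any doorbell's firmware: the similarity metric, the `MATCH_THRESHOLD` value, and the function names are all hypothetical.

```python
import math
from typing import Mapping, Optional, Sequence

# Hypothetical value; a real threshold would be tuned on test data.
MATCH_THRESHOLD = 0.8


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    probe: Sequence[float], enrolled: Mapping[str, Sequence[float]]
) -> Optional[str]:
    """Return the enrolled name closest to the probe, or None if below threshold."""
    best_name, best_score = None, MATCH_THRESHOLD
    for name, embedding in enrolled.items():
        score = cosine_similarity(probe, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Only the vectors in `enrolled` would need to be stored on the device. No photo is kept, and no comparison requires a network call.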
Without liveness detection, a printed photo held to the camera could trigger the unlock prompt. Depth sensing + micro-movement analysis adds 1-2 seconds. In user feedback, participants read this pause as "the system is being thorough" rather than slow.
Facial recognition creates real exclusion risks. I designed fallback methods for every scenario where the primary flow might fail or be inappropriate. No user should be locked out because they can't or won't use face recognition.
GDPR gives children's data additional protection, so biometric processing for minors needs extra care. Face + mandatory PIN. Parents set time-based entry rules.
Face recognition is opt-in only. PIN pad and physical key remain as parallel entry methods.
Time-limited access codes via app. Auto-expire and are logged. No biometric enrollment needed.
Never trigger facial recognition flow. Standard video call to homeowner for remote decision.
IR LEDs for night. Below confidence threshold → automatic fallback to PIN entry.
Silent alarm PIN: opens door normally while silently alerting emergency contacts.
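The time-limited access codes mentioned above could work roughly like this. It is a sketch under assumptions: the six-digit format, the `GuestCodeStore` class, and the log layout are hypothetical, and the current time is passed in explicitly so the behaviour is easy to check.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class GuestCodeStore:
    codes: dict = field(default_factory=dict)  # code -> expiry (Unix seconds)
    log: list = field(default_factory=list)    # (timestamp, event, code) entries

    def issue(self, now: float, valid_for_seconds: float) -> str:
        """Create a random six-digit code that expires after the given duration."""
        code = f"{secrets.randbelow(10**6):06d}"
        self.codes[code] = now + valid_for_seconds
        self.log.append((now, "issued", code))
        return code

    def try_unlock(self, code: str, now: float) -> bool:
        """Accept the code only before its expiry; log every attempt."""
        expires_at = self.codes.get(code)
        ok = expires_at is not None and now < expires_at
        if expires_at is not None and not ok:
            del self.codes[code]  # auto-expire: drop the stale code
        self.log.append((now, "accepted" if ok else "rejected", code))
        return ok
```

A guest given such a code never has to enroll a face, and every attempt leaves an entry the homeowner can review.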
Would people actually enroll their faces? Even with on-device processing and full transparency, biometric enrollment is a big ask. I don't have real opt-in data.
Does one-tap feel secure enough in context? Prototype testing in a controlled environment isn't the same as standing at your front door at night. Physical context changes how security feels.
Can current doorbell hardware support this? On-device face matching with liveness detection requires significant processing power. I haven't verified chipset capabilities or battery impact.
Security and convenience aren't opposites. The current 5-step flow isn't more secure. It just feels that way because it requires more effort. A face match plus confirmation is faster and arguably safer than a quick video glance from across the room.
The hardest UX problems are emotional. Everyone understood auto-unlock could work technically. They still didn't want it. Designing for the feeling of control mattered more than optimising for speed.
GDPR as a design constraint pushed better decisions. On-device processing, explicit consent, instant deletion. Regulations forced privacy-first architecture that also happens to be a trust-building feature.
I worked alone, and it shows in the research. Eight informal conversations aren't enough. A real project needs structured interviews, diverse demographics, and physical prototype testing at actual front doors. I'm honest about that limitation.