Smart TVs and Mobile Devices Covertly Tapped for AI Web Scraping via Embedded SDK, New Research Reveals Significant Privacy and Security Concerns

Independent cybersecurity research has exposed how a widely embedded software development kit (SDK) from Bright Data, a prominent web-data and proxy provider, turns consumer devices, including always-on smart TVs and mobile phones, into exit nodes for large-scale web-scraping operations, often without their owners fully realizing it. The findings, published on June 5 by Include Security and independent researcher Buchodi, detail how Bright Data’s iOS SDK, and by extension its broader platform, uses people’s internet connections to relay scraping traffic, primarily for customers in the artificial intelligence (AI) industry. They expose a largely invisible part of the data economy, in which personal bandwidth and IP addresses become infrastructure for corporate data acquisition.

The Unveiling of a Hidden Network

The core of the investigation involved the reverse-engineering of Bright Data’s iOS SDK, meticulously documenting its functionality and the methods it employs to integrate consumer devices into its global proxy network. The research highlights that once embedded within free consumer applications, often behind what is described as an "opt-in" screen, this SDK enables devices to act as residential proxies. This means that web-scraping requests initiated by Bright Data’s customers are routed through the IP addresses of ordinary households, making the traffic appear as legitimate user activity rather than automated scraping from data centers.

Crucially, the immediate risk identified is not direct data theft or account compromise but the commandeering of a user’s home internet connection and bandwidth for commercial purposes without fully transparent consent. Smart TVs are described as near-ideal candidates for this role: they are permanently connected, usually on unmetered, high-speed broadband, and they are rarely switched off. That makes each one a consistent relay point, in effect an unattended, low-power server in a consumer’s living room. The technical analysis further found that the peer channel carrying these scraping jobs lacks robust authentication and that, on iOS devices, this traffic can bypass a user’s configured Virtual Private Network (VPN), operating largely undetected by standard monitoring tools.

Bright Data’s Business Model and Its Evolution

Bright Data, formerly known as Luminati, positions itself as operating the world’s largest residential proxy network, boasting access to over 400 million residential IP addresses. A substantial portion of this vast network is reportedly supplied through this very SDK, marketed as a "consent-sourced pool" of more than 150 million IPs. The company’s business model revolves around providing enterprises, particularly those in the burgeoning AI sector, with the ability to collect public web data at scale. This data is essential for training AI models, competitive intelligence, market research, and various other applications.

The evolution of this model traces back to Luminati, which itself emerged from the controversial Hola VPN service. In 2015, Hola VPN faced widespread criticism when it was revealed to be selling its free users’ bandwidth through Luminati, effectively turning their devices into exit nodes for a fee, reportedly at $20 per gigabyte. This historical context illuminates a persistent business strategy: leveraging a distributed network of consumer devices to facilitate commercial data operations. What has changed significantly in the intervening years is not just the scale, but the primary driver of demand. The rise of sophisticated anti-bot defenses implemented by services like Cloudflare and DataDome has made it increasingly difficult for AI scrapers to operate effectively from traditional data center IPs. This technological arms race has thus propelled the value and demand for residential IPs, which blend seamlessly into regular internet traffic, making them far harder to detect and block.

The Allure of Residential IPs for AI Scraping

The shift towards residential proxies is a direct response to the escalating sophistication of anti-bot and anti-scraping technologies. Traditional data centers, with their identifiable IP ranges and predictable traffic patterns, are easily flagged and blocked by website security systems. Residential IPs, on the other hand, originate from ordinary homes and are associated with legitimate internet service providers. When AI companies or other entities use these IPs to scrape data, their requests appear to come from individual users, making it significantly more challenging for websites to differentiate between a legitimate visitor and an automated scraper.

This strategic advantage is critical for the AI industry, which relies on massive datasets for training machine learning models. From sentiment analysis to competitor pricing, and from trend forecasting to content generation, the quality and volume of data directly impact AI model performance. Access to a vast, geographically diverse network of residential IPs allows these companies to bypass restrictions, collect public data more efficiently, and gather insights that might otherwise be unattainable. The economic incentive is clear: the ability to gather comprehensive data provides a significant competitive edge in the rapidly evolving AI landscape.

The Consent Conundrum: A Closer Look at Opt-In

A central point of contention raised by the research is the stark discrepancy between the language used in the opt-in screens presented to users and the actual operational scope of the embedded SDK. For instance, in one Roku app named Petflix, the consent screen reportedly stated that the device and its connection would be used "occasionally." However, the SDK’s internal settings allowed for an astonishing traffic volume of up to 200 GB per month. This massive potential usage far exceeds any reasonable interpretation of "occasionally" and raises serious questions about the transparency and informed nature of user consent.
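
To put the 200 GB monthly ceiling in perspective, the short calculation below converts it into a daily volume and an average sustained rate. It is a back-of-the-envelope sketch only: the 30-day month and decimal gigabytes (1 GB = 10^9 bytes) are assumptions made here, not figures from the research, and the function names are illustrative.

```python
# Convert a monthly traffic cap into daily volume and average sustained rate.
# The 200 GB figure is the cap reported in the research; the 30-day month and
# decimal gigabytes are assumptions made for this illustration.

MONTHLY_CAP_GB = 200
DAYS_PER_MONTH = 30        # assumption
BYTES_PER_GB = 10**9       # assumption: decimal gigabytes


def daily_gb(monthly_gb: float, days: int = DAYS_PER_MONTH) -> float:
    """Average gigabytes relayed per day if the cap is fully used."""
    return monthly_gb / days


def sustained_mbps(monthly_gb: float, days: int = DAYS_PER_MONTH) -> float:
    """Average megabits per second needed to reach the cap, around the clock."""
    seconds = days * 24 * 60 * 60
    bits = monthly_gb * BYTES_PER_GB * 8
    return bits / seconds / 1_000_000


if __name__ == "__main__":
    print(f"{daily_gb(MONTHLY_CAP_GB):.2f} GB per day")
    print(f"{sustained_mbps(MONTHLY_CAP_GB):.3f} Mbit/s sustained")
```

Under those assumptions, a device that reached the cap would relay roughly 6.7 GB every day, equivalent to a continuous stream of about 0.6 Mbit/s. The setting is a ceiling rather than a measured volume, so actual usage on any given device may be lower.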

Furthermore, the research found that in a select few countries, such as Uzbekistan and Oman, the traffic limits were set even higher, with devices cleared to continue relaying data almost until their battery was completely drained. This geographical variation in operational parameters suggests a calculated approach to maximizing network capacity in specific regions. The SDK can also link a user’s phone and computers running the same company’s apps and treat them as a single aggregated user, which further complicates the privacy picture by creating a broader profile of user activity and available bandwidth. This "consent gap" is particularly concerning in an era of heightened data privacy awareness and of regulations such as the GDPR in Europe, which requires consent to be specific, informed, and unambiguous, and the CCPA in California, which centers on notice and consumer control over personal data. Whether the current "opt-in" mechanisms meet these legal and ethical standards remains an open question, and one that could attract significant regulatory scrutiny.

Platform Responses and Industry Dynamics

The issue of consumer devices being co-opted for proxy networks is not entirely new, but its scale and application in the AI economy have brought it renewed attention. The Lowpass newsletter, syndicated by The Verge, first highlighted the smart TV angle of this issue in February, laying the groundwork for this technical teardown. Following increased scrutiny and the emerging understanding of these practices, major platform providers have begun to take action. Google, Amazon, and Roku have reportedly restricted the use of background proxy SDKs on their respective platforms.

In response to these platform-level changes, Bright Data has indicated that it has dropped support for these platforms. However, the company’s public partner list continues to include makers of smart TV apps for other operating systems, such as Samsung’s Tizen and LG’s webOS. The researcher, Buchodi, cautions that while a company’s presence on Bright Data’s partner list indicates a past or present working relationship, it does not confirm that its apps currently contain the SDK. Each app would need to be checked independently to establish whether the SDK is present and active. This ongoing cat-and-mouse game between SDK providers, app developers, and platform gatekeepers highlights the dynamic and often opaque nature of the app ecosystem and the challenges in enforcing user consent and platform policies.

The broader context of illicit proxy networks further complicates the picture. Reports, such as those by KrebsOnSecurity in October 2025, indicated that botnets like Aisuru were shifting from DDoS attacks to fueling large-scale AI data harvesting. Similarly, Google’s disruption of the criminal IPIDEA proxy network in January 2026 underscores the pervasive nature of device hijacking for commercial data operations. Bright Data draws a clear distinction, asserting that its exit nodes are "opt-in" through consent screens, contrasting its model with these overtly criminal operations that hijack devices without any pretense of consent. However, the findings of the Include Security research challenge the meaningfulness of that consent, blurring the line between ostensibly legitimate operations and those that exploit users’ lack of informed awareness.

User Risks and Broader Implications

While the immediate risk is primarily related to bandwidth consumption rather than direct data theft, the implications for users are significant and multifaceted.

- Bandwidth Usage and Performance: Consistent high-volume data relay can consume a user’s allocated bandwidth, potentially leading to slower internet speeds for legitimate household use, higher bills if data caps are exceeded, or even throttling by ISPs.
- IP Reputation and Blacklisting: If a user’s IP address is used by Bright Data’s customers for activities deemed abusive or illegal (e.g., scraping copyrighted content, aggressive crawling, or even activities mistakenly flagged as malicious), that IP could be blacklisted. This could result in legitimate websites blocking access to the user’s home network, impacting their ability to access online services, banking, or entertainment platforms.
- Legal Liability (Theoretical but Possible): Although rare, there’s a theoretical risk that an IP address used for illegal activities by a Bright Data client could be traced back to the residential user, potentially involving them in investigations, even if they are entirely innocent.
- Security Vulnerabilities: While the research did not point to direct hacking risks, the presence of an SDK with weak authentication and VPN bypass capabilities introduces an additional attack surface. Any vulnerabilities in this "peer channel" could potentially be exploited by malicious actors.
- Ethical Concerns and Digital Sovereignty: Beyond technical risks, the practice raises profound ethical questions about digital sovereignty. Users expect their home internet connection to be for their personal use, not repurposed as corporate infrastructure, particularly when consent is ambiguously obtained.

For the AI industry, these revelations highlight the ongoing ethical and legal challenges associated with data sourcing. The demand for vast datasets often outstrips readily available, ethically sourced, and legally compliant data. This pressure can inadvertently incentivize methods that push the boundaries of user privacy and consent. As AI governance frameworks develop globally, the practices of residential proxy networks and the transparency of their consent mechanisms are likely to face increasing scrutiny.

Protecting Your Home Network: Practical Steps

Given these findings, individuals and organizations can take several steps to mitigate the risks associated with unauthorized bandwidth usage and proxy activities on their networks:

Network-Level Blocking: The most effective immediate action for home users is to block the specific web addresses (domains) that the SDK uses to connect. Tools like Pi-hole or NextDNS filter DNS lookups and, once set as the resolver for the home network (typically through the router’s DNS settings), can block those names for every device on it. The primary domains identified by the research are:
- proxyjs.brdtnet.com
- proxyjs.luminatinet.com
- proxyjs.bright-sdk.com
- clientsdk.bright-sdk.com
- clientsdk.brdtnet.com
  Blocking these domains should prevent devices from acting as relays without affecting Bright Data’s paid services, which reportedly operate on separate addresses. Users should be aware that Bright Data could alter these connection points in the future, necessitating updates to any blocklists.
App Review and Device Monitoring: Users should regularly review the permissions and terms of service for free apps, especially those that offer seemingly disproportionate value for free. Monitoring network activity for unusual spikes in data usage on specific devices (like smart TVs or mobile phones) can also indicate covert proxy activity.
Router Security: Ensuring that home routers are secured with strong passwords and up-to-date firmware can provide a baseline level of protection against various network threats.
Organizational Scans: Companies managing staff phones or devices should implement mobile device management (MDM) solutions capable of scanning for apps containing such SDKs. It’s important to note that if devices are using mobile data connections, they may bypass office Wi-Fi network blocks, requiring a more comprehensive monitoring strategy.
VPN Usage (with caveats): While the research indicates that the SDK traffic can bypass configured VPNs on iOS, using a reputable, system-wide VPN can still offer some protection against other forms of tracking and surveillance. However, it’s not a panacea for this specific issue.
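
As an illustration of the network-level blocking step, the sketch below turns the five domains listed above into hosts-format blocklist lines and checks whether a hostname seen in a DNS log falls under one of them. It is a minimal example rather than an official tool: the function names, the `0.0.0.0` sink address, and the choice of hosts format are assumptions made here, so check what format your own DNS filter expects before importing anything.

```python
# Build a hosts-format blocklist from the domains named in the research and
# check hostnames against it. Function names and the sink address are
# illustrative assumptions, not part of any vendor tool.

BLOCKED_DOMAINS = [
    "proxyjs.brdtnet.com",
    "proxyjs.luminatinet.com",
    "proxyjs.bright-sdk.com",
    "clientsdk.bright-sdk.com",
    "clientsdk.brdtnet.com",
]


def to_hosts_lines(domains, sink_ip="0.0.0.0"):
    """Return one 'IP domain' line per domain, as used in hosts-style lists."""
    return [f"{sink_ip} {domain.strip().lower()}" for domain in domains]


def is_blocked(hostname, domains=BLOCKED_DOMAINS):
    """True if hostname equals a listed domain or is a subdomain of one."""
    name = hostname.strip().lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in domains)


if __name__ == "__main__":
    # Print the blocklist so it can be redirected to a file.
    print("\n".join(to_hosts_lines(BLOCKED_DOMAINS)))
```

Hosts-style entries match exact names only, so the subdomain check in `is_blocked` is meant for reviewing logs, not as a description of how the blocklist itself behaves. Because the connection points may change, a list like this would need to be revisited whenever the research is updated.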

The Future of Data Sourcing in the AI Era

The revelations surrounding Bright Data’s SDK and its integration into consumer devices underscore a fundamental tension in the digital economy: the insatiable demand for data to fuel AI and the imperative to protect user privacy and autonomy. As AI continues to evolve, the methods for data acquisition will undoubtedly become more sophisticated, necessitating a continuous cycle of research, disclosure, and regulatory adaptation. The challenge lies in establishing clear ethical guidelines and robust legal frameworks that ensure transparency and genuine informed consent, thereby allowing the benefits of AI to flourish without compromising the digital rights of individuals. The ongoing debate around residential proxy networks is a crucial battleground in this broader struggle for a more responsible and transparent data ecosystem.