In-not-so-cognito!

a_cold_brew
Aug 23, 2019

The curiosity, the cat and the killer! (Ep 2)

An inspired hack from Ying Yu leads to the launch of a browsing-history-driven model for matching people. As events progress, Yu discovers a potentially consequential improvement he could add. Although it is a mighty risky endeavor, Yu’s curiosity provokes him to go the extra mile anyway. Just as Jesse Pinkman is aghast at the discoveries he stumbles upon about people, a shock awaits Yu on his journey!

With social media under complete government surveillance, it had become an avenue for spreading negative propaganda, and people were losing their sense of security over their peers’ “stories”. A trend shift had taken place over the past decade. Inner reflection, journaling, mental peace, self-love, head-space, and knowing “what I am” and “how I feel” had taken much, much higher priority. While many people drifted away from social media and avoided it voluntarily, the majority had gotten so busy trying to make themselves better in their own eyes that they could not afford social media time. Of course, there were many who managed both or who still lived a life of external pretense. Still, a huge chunk of the populace was now inactive on social media. But a deeper, focused and long-lasting connection with a partner was very much still in demand. And since the existing dating apps relied only on signals from social media to match people, they would just not work for this chunk. Physical appearance would become the only deciding signal, and that was flawed by design on the face of it. Some apps countered this by giving users an option to describe themselves, to share a representation of “the self”. But most consumers were not sure what to write: they did not know a lot about themselves yet, they felt they were not “sell-able” enough, they did not know how to write flamboyantly, they did not know what to include or exclude. All of this manifested as a stream of constructed, pretentious semi-lies and caused the whole fakeness problem. Explicit self-expression did not pair well with a giant user base, and Yu’s team was betting on some implicitness to make it more real. Their notion was that a person’s browser history (what he is reading, learning, shopping for, searching, listening to, watching by himself, even believing), although implicit, could be a stronger reflection of the self than the narrative he tries to convey on social media.

“I am gonna soon become one of those data bros, the ones who go ‘Bro, Data bro!’ for every situation”, Ying Yu conceded to his roommate while typing frantically on his laptop on a hump-day evening. During the project, the team had realized they could find potential “connecting” factors for two people based on the highest common contributing weights in their respective vectors. Yu hypothesized it would make a vital difference if they showed some keywords portraying these weights for each match: for example, the name of an artist someone ardently follows, a skill or technology someone is passionately trying to learn, the theme of web searches over the past few months, music genre predispositions, a problem someone is trying to cope with, uncommon beliefs people are seeking validation for, and so on. To test his assumption, he started displaying on the website popular keywords related to the weights that contributed the most to the mutual score. He supposed people could use them as a piece of guiding information or even a conversation starter. Ying Yu was very particular about the model being able to capture the genuineness of a person. A couple of months, endless white-boarding, gallons of coffee, and many iterations of further feature engineering and hyper-parameter tweaking later, the team was convinced that they had arrived at a good heuristic (attribute set) to represent a person mathematically, specifically for the dating use case. Ying Yu also devised and implemented a fairly accurate but hacky algorithm to compute match-distance, a potential indicator of the likelihood that two people are a good fit for each other based on their vectors. They retrained the graph edge prediction pipeline, this time with these better heuristics. The website was made inclusive for the LGBT community. The company decided to dog-food it with a small, chosen set of users who wanted to be matched with a potential date.
To make testing and getting feedback easier, they released it only to people within the Shenzhen area. The premise to be validated was that the per-person model they had built from a person’s interaction with the web is a “true” representation of his or her personality. As a result, if two people meeting on the platform established a bond stronger than a connection born of superficial, social-media-guided pretense (like on apps), it was a win! They launched. They waited.

Ying Yu was super thrilled when he was able to see traffic on the website. A month passed, the traffic had risen by two orders of magnitude, and there was a pile of requests to get enrolled on the website. Ying Yu was stumped! He waited for the weekend to dig deeper into what was going on. On that rainy Sunday, he drew up some statistics to gauge how many recommendations from the model people had liked, what the scores for those matches were, and how many of them had manifested into a good date in the real world. He presumed it was wise to interpret the data from the side of the female populace, because his main motivation for doing this originated from people like Chunhua. He pooled all the female users’ data and the males they had selected from the site. He got their matching scores and tried to estimate a baseline matching score from the matches. From a cursory analysis, he was pleasantly shocked to see that the algorithm had worked, because 70% of the chosen matches had scores towards the very high end. A real-world manifestation of something going right motivated Ying Yu. And in an attempt to get easy validation, he pinged his friends Chunhua, Tingyu, and others with whom he had shared the website before, to hear about their experience.

“After a long time I did not have to keep track of time during my date. We spoke for 2 hours! It felt normal. We both were very curious about the concept of God and religion, and I could never have imagined that leading into a nice date. Not only had we thought about, read and subscribed to many similar ideas, but there was also a fine mix of conflicting ones, over which we had a very healthy discourse. We are certainly meeting again”, Yaling Wu narrated her experience. “Dude, we both are such ardent Anoushka Shankar fans! The connection with him over music did not feel like it was new at all. And guess what, I even found out we are mega GIF nerds!” Chunhua grinned, sipping her tea. “I do not know what to say. It is surprising we ended up speaking about rank caching in a Redis server-side cache for 45 minutes and it did not seem weird. It was a chilled hangout. We spoke, rather brainstormed. Like a lot!”, Tingyu was amused. “On the contrary, I had the quietest of experiences, quite literally, but a positive one. I met this Zen meditation practitioner and he educated me about channeling energy propagation within, by silent Shahaja practices. I had never thought my interest in meditation would manifest as a date. Ever!” interjected Alexandra. Even Angie was fortunate to find a co-learner. “I had been learning and curating resources on major scales and music theory for a while now. I met Hong, who is a trained traditional vocalist and understands eastern music. We had a lot to share and are planning to jam soon”, she happily explained. “We had a great debate on one of his many absurd takes, on how one should believe astrology because it is a connotation of machine learning,” she continued. “It was much more comforting to meet someone going through acute impostor syndrome, talk about it and share a space of mutual understanding than just read about it”, smiled Kylee, describing his unique date experience.
Ying Yu did receive many unsatisfying responses as well, in fact a ton of them! To quote a few: “I still got bored at mine. She was so quiet, it felt like I was forcing her to talk. We spoke briefly about cheesecakes but that was it, our personalities do not match at all”, ranted Shun. “He was such a different person when we spoke over chat but a radically opposite one when we met. It felt like something tipped him off, not really sure what and how?”, lamented Galina. “We had interests in common, but as people, we are very different. I like to stay quiet but she is more like a social butterfly”, commented Sunha. Ying Yu’s mind was blown by all this. Although the percentage of positive responses was meager, he saw a pattern in them. Even in the tiny experimental sample, around 10% were curious people who could potentially establish a better connection through the intersecting trails of their internet browsing footprint than via social media, either because a lot of that information was not available on social media, because it is seldom spoken about, or because what was available was all lies. Initially, the team had zeroed in on the problem to solve, and now they had narrowed down the persona they wanted to solve it for. They had a market and a potential product: a dating platform for socially inactive but alluring internet users. They celebrated.

A tangible reward is the highest form of motivation to keep us persevering, and a dangerous one to rely on, because we get accustomed to it. But while it lasted, Ying Yu’s enthusiasm was off the charts due to the speed at which he was able to execute and get feedback. People were really, so to say, lonely and troubled by the existing dating paradigms. Ying Yu again pooled a bunch of 50 random people to interview about their experience, and iterated once more. This became a monotonous process and Ying Yu was, again, bored! He knew there was an unopened Pandora’s box which he could explore to possibly uncover the remnant farce mask on people’s faces: the browsing history from their incognito mode! A person’s interaction with the internet is incomplete without incognito mode, and without it, the model cannot be 100% real. He presumed people engage in activities society deems embarrassing, hide private information, hide their insecurities and vulnerabilities, and access sensitive information behind the walls of incognito mode. But if they could see other people doing the exact same things or sharing similar opinions, it could become a norm and the apprehensions would be hugely reduced: self-liberation. Ying Yu felt this could empower the mathematical model to get closer to reality, which would eventually translate into greater control over people’s minds, their ultimate goal. “Given the time spent on the internet by an individual today, if I can capture all of his browsing sessions in my model, can I completely know that person?”, he ambitiously wondered. Third-party cookies were predominantly used for targeted advertising. These small files reside on our computers and track navigation history. Incognito mode isolates these cookies in such a way that the cookie-sender is fooled into believing that every session is new and has no prior history, like a sandbox. But it does not end there.
For people who are using a static connection to the internet or a shoddy VPN, it is not hard to map the location from which the requests are coming, because the IP would mostly be the same (or from a small subset) all the time. And once the user is identified, for any further request made, be it from incognito or not, signed in or not, the provider would know it is originating from that user’s computer. The little “like” buttons on websites send this information to the provider, which stashes it and maps the browsing history to a source machine and eventually to that person. Incognito was still an allowed feature since its major use case was to isolate the privacy of multiple users accessing the same physical machine. GFW had all the infrastructure needed to get its hands on incognito traffic, and yet nobody had ventured into incognito tracking because organizations believed it would add no significant value. Yu’s hands were itching, and he started reading more about these technologies to leverage leaks and loopholes in GFW to generate the above mapping. He approached the research division at WeChat and brainstormed with a network expert. With the reduced regulations, the expanse of GFW monitoring and consent from Internet Service Providers, they came up with a rough design to snoop on incognito traffic and map it to its sender, although only with partial accuracy. This was a risky endeavor, also because incognito traffic was legally supposed to remain anonymous. Tracking it could have led to people and activists using it against the government to stage a revolt, which the government did not want to entertain. But when Yu got backing from his peers, he decided to give it a shot anyway. They enabled tracking of people based on static IP to counter incognito mode, and once he had identified 100 such potential users, he enabled annotation of any browsing (incognito or normal) from them.
After a few days, when he had a sizeable history accumulated for these users, he retrained the model and revised the website. The next day, to be extra sure, Yu and his manager decided to stop displaying keywords collected from incognito mode and to use them only in the matching algorithm, because a suspicious keyword could be an easy giveaway that something fishy was going on.

One evening at his desk, Ying Yu sat down to explore the top topics of interest, or domains, that had connected people so far. “People only talk about diversity. But when it comes to partnering, their choices are more often in the same profession or background. Is this a by-product of the model or is this a preference of the society?” Ying Yu was talking to himself. “As true as it is that music possesses divine powers to bring people together, so does shopping! It is just not spoken about enough”. “Nihilism, alcohol addiction, atheism, goat yoga… I am at least glad that people are getting some form of validation and company here”, remarked Ying Yu as he mined the most uncommon domains where the interests of two people significantly intersected. “Birth control, weed strains, bonsai gardening and wait. What? Murders??”. Ying Yu was appalled at the keyword he had just read. It made sense to him once he established that the girl in the pair, Jia, was a 30-year-old aspiring writer of thriller and mystery series. She had published a few semi-fictional stories which Ying Yu could read. Out of curiosity, he looked the guy up on WeChat to understand what made him so curious about murders. A debonair picture showed up of 34-year-old Li Xiao, a sales assistant at Zhipu, a government-owned detergent manufacturing company, and Ying Yu could not posit any reasonable justification. Perplexed, he sat. “Must just be a handsome thriller buff”, Yu gave in and left for the day.

a_cold_brew

Atypically typical tech-guy | Messy Scribbler | Rookie guitar player | Cooking enthusiast | Runner along the bay | Cricket geek