Metadata and You, Shallow End Edition

Dec 21, 2020

This article is an introduction to metadata, some of the overlooked ways it is collected, and the precursors of profiling. The information here will hopefully give you a better understanding of threat modeling around metadata collection. Privacy takes work to keep, and anonymity takes even more. There is no silver bullet, so let’s start with a light introduction to the world of operational security, threat modeling, and metadata collection.

Assume cameras are everywhere, assume they are actually monitoring, and take precautions.

Operational Security (OpSec)

Operational Security, or OpSec, is the security of your operation(s). This applies to your day-to-day disclosures of information to companies, your social media usage, and any other activities you may engage in. Metadata collection can greatly impact your operational security in several ways. We first need to understand what metadata is, how it is collected, and how it will impact our operational security so we can threat model around it.

Metadata collection happens in every device, every service, and every packet we send. It can be used to build an incredible amount of profiling data about an individual, including identification of devices and ownership of accounts. While many companies and influencers tell everyone to use a VPN for privacy, there are caveats to this, and I will cover them later. Network configurations have many layers, and applications likewise have many layers in how they make their API calls. Recently, Apple drew ire with the Big Sur launch because many of its applications were calling home without honoring VPN configurations, bypassing VPN services that did not handle network traffic at the correct layer. Keep in mind that all operating systems are subject to this unless you are running a Linux or BSD variant that is not proprietary. Telemetry, better known as usage and statistics data, is metadata that can be used to track performance and crashes.

In some cases, metadata collectors do try to keep the data anonymous. There are caveats here, as many fingerprints are unique to your machine and can be seen by anyone simply monitoring your usage of a network. Such a network could be a college network, a public WiFi network, or even your home ISP. Metadata collection happens not only on the device level but also on the application level and the network level. These facets should all be considered when analyzing your operational security; there is no perfect operational security plan except to stay offline. Outside of staying offline, we must do our best either to misinform the metadata collectors or to prevent the collection, which could impact our experience using services.

There are tons of data points to cover in understanding just what is collected about us during our usage, so I will do my best to give a light and digestible introduction with some examples. Keep in mind this topic is extremely vast, and the various forms of metadata collection and how they work could fill many books. Metadata can and will be used against you. If you have read the article on hidden services, then you understand how to get a pretty secure service up and running. You need to understand the risks of your operations as well as of the service, yet we often overlook our own threats. Let’s get into it.

Most large-scale networks have an operations center similar to this, with real-time monitoring.

Threat Modeling

Threat modeling is a process usually associated with securing a system, but we should also apply it to analyzing the threats to our operational security. One example is accessing a personal email address through insecure means, or on the devices and networks where we plan to use our secure devices. This allows not only the ISP to collect data but also the email provider. We should also note that on a network with a network monitor, as in most public places, the device could have been fingerprinted in the logs as well via its MAC address. Every network interface is assigned a MAC address when it is manufactured, and many components in a machine carry identifying information. This information is tied to the device, such as a laptop or desktop, and inventory tracking can trace it to a purchase online or at a physical location. While this seems like a non-issue, it is very important to understand for your operational security, depending on what your operation may be. Each person has different conditions and hazards; your operational hazards will differ from mine, and some may be the same but not all.

In practice, network monitoring happens at most locations operated by larger corporations. On a smaller scale, it is less likely that a nail salon, a locally owned coffee shop, or another small business will be monitoring its public WiFi. This is not a guaranteed rule, so do a little research and some clever social engineering to find this out about any location you may connect from for your operations. You should also understand the risks of the devices you plan to use. Assumptions are often made, and these assumptions can end with you catching a case, or being identified when you wanted to remain anonymous. It is important to note that in day-to-day activities it is much harder to remain anonymous, especially in public places.

Use cash or cryptocurrency for purchases when you can. Use cryptocurrency that is not tied to KYC: obtain it without KYC, or at a minimum mix it with Samourai Whirlpool before spending bitcoin. Transparent systems are just that, transparent, so one must fully understand that every transaction is recorded along with a timestamp. This information can be used against you, and do not think that it will not be. As regulators attempt to push regulations on use and acceptance, using cryptocurrency properly as digital cash will require the same amount of work as using Tor properly. If obtained correctly, mixed, and used properly, it can be digital cash as intended, with no link to the user except for surveillance cameras when paying in person. Online usage is still subject to other forms of monitoring and metadata collection by the company you interact with.

Camera surveillance is easier to circumvent in the current climate thanks to the onset of Covid, which has made it socially acceptable to be completely covered in public. Use this to your advantage. While you may be able to make purchases untracked and unidentified in public, there is another risk to consider when out and about performing our operation: the onlooker. Many people are going out in protest of being forced to stay home, out of boredom, and in some cases to perform their own operations. People can and will see your screen in public if you are not using a privacy screen filter, and this can be a huge problem. As people are extremely bored and are being encouraged to tell on others for anything and everything, keep the visible viewing angle of your screen minimal and unassuming. While sitting in the very back seat may seem like a great way to avoid prying eyes, it also raises suspicion among the other customers and the employees. Many cues like this are given off when we do not properly threat model how we act in public. When in doubt, ask yourself: what would you do if you were the employee or a customer and saw someone sitting in the back of a coffee shop doing exactly what you are doing?

Also keep in mind that your phone screen is visible from many angles and to others walking by. Causing suspicion is a good way not only to lose this spot as a work zone for your operation but also to invite closer monitoring of your behavior. This could lead to law enforcement involvement or, at a minimum, actual monitoring of the network logs for your specific activities. Your phone and your laptop can tell a lot about you on this layer, and you must never use personal devices anywhere you use your secured devices. This will lead to a bad time: you are on camera in most cases, and it is easy to identify which devices you are using from the timestamps. Do not be foolish about this; it will end badly.

Phone applications are also overlooked in threat modeling, and they are worse than even the broadband providers.

Phones

Adversarial conditions are everywhere; understand them or fall victim to them. We now understand that a device can be physically traced to its purchase. This applies to a phone as well, but a phone is much more dangerous than that. Phones pose an alarming number of problems for privacy and operational security, and the amount of metadata collection involved in operating a modern smartphone is staggering. As a precaution, if you choose to remain anonymous or to separate your operation from your normal non-anonymous information, DO NOT EVER HAVE THE TWO PHONES TOGETHER AND ON AT THE SAME TIME. There is no exception to this. By connecting the two phones you have essentially destroyed the anonymity of the device you wanted private or anonymous. At that point, the device would need to be destroyed. While this may seem like an extreme reaction, if you are serious about your operational security you also have to take the actions that failures in it call for. There is a cost to doing things right: in some cases you need to weigh monetary value against freedom, and in others you must weigh time against freedom.

The dangers of phone metadata are hard for most people to comprehend, as most are unaware of how much information a phone contains beyond their personal information. As mentioned earlier, each network interface has a MAC address. A phone is also issued an IMEI (International Mobile Equipment Identity) number that is unique to the device itself. Like the MAC address, it can be traced to the purchase. It is extremely important to understand that on the vast majority of phones these identifiers can be picked up through packet collection on a network with simple monitoring and through application usage. DNS requests to certain endpoints, made by an application for things like login, message checking, or even background polling for email, are all sent over the network, and these requests can and will be monitored. Most are sent unencrypted, though there is a large movement pushing DNS over HTTPS. Used properly, DNS over HTTPS encrypts your DNS requests so that ISPs, third parties, and government agencies on the network path cannot read them, although the IP addresses you then connect to remain visible. Tor encrypts traffic over its overlay network for routing, while DNS over HTTPS only provides encrypted DNS requests. However, while this may stop snoops on the network level from knowing what you are looking up, it does not stop application snooping.

Phone applications are very dangerous for our operational security as many of them are collecting application data across applications as well as capturing a large amount of data with every request we send. This information is used for a variety of reasons. Many companies collect the following when you are doing simple requests on an application on your phone:

IMEI
MAC
Public IP Address
LAN IP Address
GeoLocation data via GPS
Operating System Version
Any Running Application IDs when possible
Account ID
System Account ID
Application ID
Timestamps

This is just a basic sample, but it can tell a company a lot about the device and the user. As Android and iOS currently require an email address to be associated with their devices, this provides a lot of other details to these companies. The IMEI and MAC address give the company information that it can look up in public databases. The public IP address can be used along with the geolocation information provided by the phone’s GPS. The operating system version, in conjunction with the IDs of other running applications when the system permits it, gives them a large number of data points about what type of user you are. Your account ID and system account ID are associated with accounts tied to the operating system. The application ID can give them your exact installation date and time, letting data profiles be matched to your specific details. This information is typically gathered for advertising, but it can and will also be provided to law enforcement for investigations. It could essentially blow every aspect of our operational security if we are choosing to remain anonymous.

As we can see, the application data that is collected is extremely dangerous. Now let’s go back to earlier, when I said not to have your personal and anonymous devices on at the same time. It should be crystal clear why this is so dangerous: if you use an application on both devices with different logins, you are essentially telling the company that you are doing so. You are also telling the mobile broadband provider.
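To make the linking concrete, here is a minimal sketch of the kind of correlation a service could run over its own request logs. The log format (timestamp, public IP, account ID), the function name linked_accounts, and the sample values are all made up for illustration; real collection pipelines would also lean on the other identifiers listed above, not just the IP address.

```shell
# Hypothetical log format, one request per line: "timestamp public_ip account_id".
# Prints every public IP that was seen with more than one account,
# followed by the accounts seen on it.
linked_accounts() {
    awk '
        # Count each (ip, account) pair only once.
        !seen[$2, $3]++ { count[$2]++; accounts[$2] = accounts[$2] " " $3 }
        END {
            for (ip in count)
                if (count[ip] > 1) print ip ":" accounts[ip]
        }
    '
}

# Made-up sample: a personal and an "anonymous" login behind the same IP.
printf '%s\n' \
    '1608500000 203.0.113.7 personal_acct' \
    '1608500030 203.0.113.7 anon_acct' \
    '1608500100 198.51.100.2 other_acct' | linked_accounts
```

With the sample lines, the only address printed is the one shared by both logins, which is all it takes to tie the two accounts together.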

Mobile providers collect a ton of information about your location and data usage. This is done with the IMSI (International Mobile Subscriber Identity). Your subscriber number correlated with the IMEI paints a definitive picture of who the user is, and this information can be verified if you are on a plan from a mobile provider such as AT&T, Verizon, T-Mobile, and others. As described in previous write-ups, to avoid this use a burner phone and pay with cash. You can even buy prepaid debit cards to pay the prepaid bills, so you never identify yourself to the phone provider. Keep in mind that if you ever have the devices on at the same time in the same area, you need to destroy the burner phone if your operational security is important. At that point it is a liability and will be leveraged against you if you are ever under investigation. This is also how some CIA spies had their cover blown.

At this time it is possible to spoof your IMEI, but in doing so you are essentially telling the mobile provider that you are spoofing it, and in many countries it is illegal. This information is shared very frequently between telecoms and phone manufacturers because of fraud investigations and government regulations. In practice, it is better to keep a secure device with a removable battery as a private phone. Again, be very mindful of its usage and keep it separate from your other phone(s). Treat this very seriously if you want to be private or anonymous to third parties. By compromising on this, you are likely to compromise on other aspects of your operational security, and this becomes a very slippery slope. If you would like to see just what can be obtained from your phone application usage, I would suggest using Charles proxy and following this great how-to. Adversaries are everywhere, even in our own pockets.

A simple relational database schema with your information from usage could build a large footprint of you and your devices.

Operating Systems

Operating systems have a variety of attack vectors, and each new program or service you introduce changes the threat modeling involved. iOS and Android have the largest install bases on phones, but each of them is subject to laws and regulations that turn them into large-scale surveillance systems. Android is often looked at as the better option for mobile privacy, but I would advise you to read up on the Google Play Services libraries, amongst others, before drawing such conclusions. On your laptop or desktop, things are a lot different. There are some caveats, but you get a lot more freedom to some extent. This becomes very important for our operational security.

With regard to our computers that are not phones, we have a large selection of operating systems that we can use, and each has its pros and cons. Windows has the largest computer install base in the world. This is due to its larger amount of application support, its focus on the “business” user, and the many IT companies that require its use. Governments also use Windows as part of their default load for the machines they issue; whether a laptop or a workstation, these are often accompanied by a CAC and run Windows as the operating system. Microsoft has implemented a large number of security features, but it has also introduced many privacy-eroding features within its operating system. Windows is proprietary, meaning its codebase is kept secret by Microsoft, and only its employees and contractors are permitted to see it and resolve issues. I will cover why this is dangerous later, but there are other factors for Windows we need to consider as well. Microsoft has introduced a large number of telemetry systems into the operating system, and it has also been showing advertisements in file search and other features of the operating system. Microsoft has made it very clear that it collects this data and plans to for the foreseeable future.

Microsoft has recently made leaps and bounds in its support for open-source software. It has also made some interesting acquisitions, such as that of GitHub. Windows does have the most malware, viruses, and vulnerabilities, as it is the most widely deployed. For hackers, this means the time spent translates into more targets and payouts, which is why far more attacks exist for these systems than for Unix and Linux based operating systems. The number of attacks on Linux and Unix grows each year, but Windows remains the largest install base, so it makes sense for hackers to stay focused on it until the landscape changes. There are Windows hackers who have published lots of information on staying safe and fairly anonymous while using Windows, but as with all proprietary systems, there are dangers.

Apple also has a decent install base, but it represents only about a tenth of the computers in the wild, and its hardware runs an operating system that mixes proprietary and open-source components. Apple’s macOS is a UNIX operating system built on Darwin, whose kernel incorporates BSD code. Apple has often made extreme attempts in the public eye to show itself as a privacy-focused company. On some levels this is true, but on others it is not. Apple has long catered to creative types and professional creative spaces, and if you are a software engineer there is also a high chance that the company you work for provides you with a MacBook Pro, a Mac Pro, or some other type of Mac. The reason is that most code you build on a Mac will work on Linux, often with little to no changes needed, thanks to POSIX. This is not always true, but for most code it is. macOS also has a large number of business applications and support, which helps explain why companies go this route. macOS recently came under heavy scrutiny because Apple applications in the operating system were calling home and bypassing some VPN services. Apple services do call home, and Apple does ask during installation whether you choose to share telemetry data with it. At this time Apple does honor your refusal to share analytics data and disables most of the processes that would otherwise run with these options enabled. As macOS is a Unix, there are a lot of Unix hackers who share methods for disabling many of the privacy-eroding tactics Apple has employed over the years. Just as with Windows, each proprietary system has its risks.

Free and open-source operating systems based on Linux and BSD Unix give us the greatest freedom and protection on the operating system level. As they are developed by companies and individuals, they are also audited quite frequently. There are still caveats that must not be overlooked, as a growing number of non-open-source plugins, codecs, and applications now run on Linux. This poses risks similar to those of macOS: while your operating system may be open source, the application you are running may not be, so are you fully aware of what it is doing? Mixed-source systems can provide a great experience for the user, and we should remind ourselves that not everyone’s operational security needs are the same.

There are some good practices to follow, but there are also compromises users will make for various reasons. Linux has a larger install base on servers than many realize, and on the desktop it is growing. More applications and businesses are now adopting Linux, and you can now buy devices with Linux pre-installed and configured to take full advantage of the hardware. There are still some issues keeping it from going mainstream, but it continues to grow.

Linux is largely supported by the companies and individuals who use it, develop it, and rely on it to run their businesses. Some big examples of this would be Google, Amazon Web Services, and Facebook. These companies rely on Linux to provide their applications and services to users. Linux provides the most flexibility, at low or no cost for operating system installation. In Linux and BSD Unix environments, a team can transition from one to another fairly easily, as the system tools and most programs will work on most flavors of them.

Linux can also run from a live image, which removes the need for installation. This can be a huge boost to operational security if the user decides to use Tails and avoid installing an operating system on a disk. An operating system installed on a fixed disk like an HDD or SSD can be subjected to forensics fairly easily, and this should be considered in your operational security assessment.

ChromeOS is malware as an operating system just as

Each desktop operating system will also use the MAC addresses assigned to its devices. In Linux and macOS you can change this, but you must follow the process in the correct order: make sure the interface is down, change the MAC address, then bring the interface up to connect to a network. DO NOT CONNECT AND THEN CHANGE THE MAC ADDRESS. The original address would already be present in the logs of the network the device had connected to. How to do this in Linux and macOS is as follows:

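ifconfig (identify the device you wish to change)
sudo ifconfig $dev down (this will shut down the interface)
sudo ifconfig $dev hw ether 00:00:00:00:00:01 (Linux; sample MAC, get creative)
sudo ifconfig $dev ether 00:00:00:00:00:01 (macOS; its ifconfig does not take the hw keyword)
sudo ifconfig $dev up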
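On many current Linux distributions ifconfig is no longer installed by default, and the ip tool from iproute2 does the same job. The sketch below keeps the same down, change, up order. The helper names run and change_mac, the DRY_RUN switch, and the interface name wlan0 are assumptions made for this example, not part of any tool.

```shell
# Execute a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Change the MAC address of an interface in the safe order:
# interface down, set the new address, interface up.
# Usage: change_mac <interface> <mac>
change_mac() {
    dev=$1
    mac=$2
    run sudo ip link set dev "$dev" down &&
    run sudo ip link set dev "$dev" address "$mac" &&
    run sudo ip link set dev "$dev" up
}

# Preview the commands without touching any interface (wlan0 is a placeholder).
DRY_RUN=1
change_mac wlan0 02:00:00:00:00:01
```

Chaining the three steps with && stops the sequence if any step fails, so the interface is never brought back up with the old address by accident.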

Here is a simple command, if you have OpenSSL installed on your machine, to generate random MAC addresses. You could define it as a shell alias in your .profile so you do not need to remember the command explicitly:

• openssl rand -hex 6 | sed 's/\(..\)/\1:/g; s/.$//'
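One caveat with fully random bytes: the lowest bit of the first octet marks a multicast address, which Linux refuses to assign to an interface, and the next bit marks the address as locally administered rather than vendor-assigned. A minimal sketch that clears the first bit and sets the second is shown here. The function names format_local_mac and random_mac are made up for this example, and random_mac assumes OpenSSL is installed, as in the command above.

```shell
# Turn 12 hex characters into a colon-separated MAC address whose first octet
# is unicast (bit 0 cleared) and locally administered (bit 1 set).
format_local_mac() {
    hex=$1
    # Mask the first octet: drop the multicast bit, force the local bit.
    first=$(( (0x$(printf '%s' "$hex" | cut -c1-2) & 0xFE) | 0x02 ))
    # Prefix each of the remaining five octets with a colon.
    rest=$(printf '%s' "$hex" | cut -c3-12 | sed 's/\(..\)/:\1/g')
    printf '%02x%s\n' "$first" "$rest"
}

# Feed six random bytes from OpenSSL through the formatter.
random_mac() {
    format_local_mac "$(openssl rand -hex 6)"
}

# Example: random_mac
```

The second hex digit of the result is always 2, 6, a, or e, which is what marks the address as locally administered and unicast.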

The user is in control of which operating system makes the most sense for their operational security. I have highlighted some of the risks and benefits involved, but you must identify what works for you and your needs. It would be sound advice to use operation-specific tactics in handling each need, so that you are properly threat modeling around adversaries on each facet.

Each interface has a MAC address; keep this in mind when juggling multiple devices.

Browsers

Each browser has its pros and cons, and some have become extremely popular despite being nothing more than malware where the user’s privacy is concerned. Browser cookies are very dangerous; they pose another threat to the user and are often overlooked. Browser DNS policies can also compromise your security by performing requests without honoring your operating system’s DNS settings. When using a browser it is important to rely on open-source browsers, as proprietary browsers and mixed proprietary and open-source browsers can become dangerous monitoring tools that compromise our anonymity and privacy. Until all browsers switch to DNS over HTTPS, they are also nothing more than information leaks to network monitors. Also note that all browsers send many requests on startup; read this for more information.

Browsers are used by everyone every day in some capacity for work and play on the internet. They are wonderful tools, but they have been weaponized against us thanks to cookies. I am sure you have gotten popups on sites asking you to accept their cookie policy, often telling you how it improves the site and allows it to operate for free. The reason these sites can continue to operate for free is that the cookies track your browsing behavior and traffic for targeted ads. This is not isolated to browsers: applications do it as well for ad targeting, as do many of the services that collect the profile data mentioned above. There are also browser companies that provide incentives and rewards for viewing certain ads, clicking through to sites, or whatnot, and this should be viewed as a threat and not an incentive. You are giving others your information for pennies when they are making much more off of that same data. Google, Facebook, and others use very aggressive tracking cookies.

Browsers are currently all over the place when it comes to privacy. Tor is a great browser, but using it properly will upset many users’ expectations of how some sites work. Firefox and Tor are my go-to daily drivers for clearnet, as they are robust and pretty secure. I also do not need to worry about the Google malware that is Chrome. Chrome may be “faster”, but Google as a company cannot and should not be trusted, as it collects massive amounts of data about its users. Always be skeptical of any proprietary browser, as you should be of a proprietary operating system. If your operational security is important, then you must make compromises in your interactions with the web.

JavaScript is used on most modern websites today, and it poses some serious risks to your anonymity and privacy. This is where most users take a lazy approach even when using Tor: enabling JavaScript on Tor is like bringing a condom and not using it. The protection was right there, but you chose not to use it, and now you wonder why you caught an STD, or in this case why your anonymity was compromised. Do not make this mistake. Even if the site is open source, you need to verify that the deployed version is the same as the published one, or you could very well end up disclosing more to that site than you bargained for.

While this may be a meme, it is pretty accurate.

VPNs

A VPN will effectively hide your traffic from your ISP if it is configured and behaving properly; however, the provider will present logs under pressure from law enforcement. VPN providers are also businesses and are focused on profits. While they may advertise that they care about your privacy and will refuse to cooperate with law enforcement, it is important to remember that to them the business operation is much more important than the user. This is always the case with any business. There is a large amount of confusion about the scope of what a VPN does and provides to its customers.

A VPN provides an encrypted tunnel from your device to a system that the provider runs. This offers a couple of advantages if it is configured properly. Your ISP cannot directly see what you are doing with your internet traffic inside the tunnel. It can also be a great way of circumventing DNS restrictions behind some firewalls. A large number of businesses allow their employees to use VPNs, typically into the company network, and with Covid and everyone being remote they are now promoting the use of VPNs on other networks to prevent snooping. This is great, up to a point.

As we have seen in recent weeks and months from the number of cybersecurity incidents at large businesses and security firms, it is fair to ask: are VPN providers auditing their configurations and systems frequently? Remember that you are only connecting to a machine that others are connecting to as well, under the assumption that it is secured, and this could be misleading. A compromised VPN host would be of great value to many adversaries and agencies for reconnaissance and investigative purposes. There is also an even more interesting behavioral problem with VPNs: many people only use them when they plan to torrent or do something questionable. While your traffic may be encrypted, your ISP knows that you are up to something if you only use the VPN at times of increased traffic volume. Understand that a VPN can be a great tool, but just like Tor, you have to use it properly or it is going to leak information that can be used for profiling. Also, when using a VPN always test what happens to existing connections when the VPN connects or disconnects. Do this with open TCP connections and verify that they are dropped rather than silently continuing outside the tunnel; otherwise the VPN may not be working correctly. Verify that your DNS is not leaking either; you can use this site. DO NOT USE THE SAME VPN FROM YOUR PRIVATE DEVICES AND YOUR DAILY DRIVERS. I will be posting a VPN-specific piece in the future on various aspects of VPNs and their use.
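As a rough first check on the DNS side, you can look at which resolvers your machine is configured to use while the VPN is up. The sketch below only reads a resolv.conf-style file, so it is not a substitute for an external leak test. The function name unexpected_resolvers and the address 10.8.0.1 are placeholders for this example. On systems that use a local stub resolver the file may only list a loopback address, in which case this check says little.

```shell
# Print nameservers from a resolv.conf-style file that are not the expected
# VPN resolver. No output means every configured resolver matches.
# Usage: unexpected_resolvers <vpn_dns_ip> [file]
unexpected_resolvers() {
    vpn_dns=$1
    file=${2:-/etc/resolv.conf}
    awk -v vpn="$vpn_dns" '$1 == "nameserver" && $2 != vpn { print $2 }' "$file"
}

# Example (10.8.0.1 is a placeholder; use the resolver your provider documents):
#   unexpected_resolvers 10.8.0.1
```

Any address it prints is a resolver your system may still query outside of what the VPN provides, which is worth investigating before trusting the connection.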

I would also like to point out that VPNs on mobile are tricky: many of them time out, and even more of them do not properly route traffic over the VPN. Feel free to check with a proxy for yourself; I will cover analyzing VPN effectiveness that way in the future.

A common UI theme for VPN services: selecting which country’s server to connect to.

Compromises

As there is no perfect method of staying secure, even against metadata collection, we must be willing to make compromises in the modern digital world. We should understand these compromises before agreeing to them. The notion of being against Big Tech companies has become much more pronounced; however, jumping from one company to another does not solve the problem. There are arguments for supporting small businesses and rejecting large ones, and there is a direct counter to them as well. Let’s spend a little time on each of these points to better understand the compromises we make in each of the respective corners of the fight.

On one hand, we have large tech companies that are notorious for collecting data from users by any means necessary. The justification has long been advertising and a tailored application or service experience, but it comes at the cost of the user’s privacy. This poses an interesting question: how much are we willing to disclose for a company to show us relevant results on a particular topic? Or how much are we willing to pay in data to see a variety of items loosely connected to something we are looking to purchase? Some lines need to be drawn, and those lines do not exist in the world we currently live in. Data collection by the companies that provide many of the services people rely on is removing the concept of privacy for the individual. Alexa, Nest, Apple Home, Google Home, Ring, and just about any other IoT device you can think of pose more alarming risks than many understand.

These companies are producing products and services that consumers do want, as they continue to buy them. Capitalism and free-market models decide what is popular, but that does not always reflect what is best for the consumer, which is often overlooked. Many services and devices can be sold as cheaply as they are only because the rest of the cost is paid with your data. There is a positive to consider with respect to data security, though: data held on some of these large systems is more likely to be safe than data held by a small company. However, as Equifax proved, this is not always the case. The recent SolarWinds fiasco also shows that security-focused companies and providers are not always secure either. Research and testing become more important as technology becomes even more pervasive in our lives and daily activities.

There is another angle to consider as well, this time from a business perspective. Amazon, Google, and Microsoft are some of the largest cloud hosting providers at this time. While they give small companies an easy way to launch and get up and running, they can also gain a large amount of information about your application architecture and your business without you granting them access. All of these platforms monitor API requests and key usage. This allows them to better tailor services for their customers, but it also gives them a very low-level understanding of how the day-to-day operations of your services work. While at first glance this does not seem dangerous, large companies such as Amazon, Google, and Microsoft are notorious for taking the ideas of small companies and reverse engineering them into their own. This has been extremely pervasive over the last decade, and there will be even more of it in the future. Where you can only infer what a service does from its marketing material, these companies can see how it is done and create their own competing products. Those products will have larger marketing budgets and even larger reach through an existing customer base. There are pros and cons to every approach: while cloud providers make launching easy, they also make it easy for these companies to create a competing product that eliminates yours.

On the other side, running your own equipment can be quite costly and involves much more risk. On paper, it makes more sense to launch a service or company on minimal infrastructure and with little commitment through a cloud provider, simply writing the application or service and deploying it, than to rent server room space and hire data center operations staff, system administrators, and developers just to get started. Scaling is also much cheaper from a time perspective through a cloud provider. You sacrifice a lot of control by relying on third parties for hosting; however, in many cases the benefits outweigh the potential negative impact. This is where compromises are made even at the business level. Convenience does come at a cost, one way or another.

The data behind charts is collected somehow; remember that it often comes at the cost of the user.

Conclusion

Nothing is truly “safe” online. With that being said, it is better to understand how to stay safe according to your own needs. Each person’s needs vary and no two are alike, but there are some universal needs we should all recognize and encourage others to strive for as well. A lot was covered in this write-up and there is even more to cover; as technology progresses, so does the ever-present erosion of privacy. Convenience is something many are willing to compromise their privacy to obtain, and this is a personal choice.

I for one embrace privacy and support others to understand the risks involved with introducing a new service or piece of hardware into their life. It is important though that you decide what is right for you, because what is important to me may not be important to you.

There will be a few articles in this series on metadata. There is just so much to go over, and only so much of it is digestible in chunks without writing an absolute unit of an article. The next metadata article will introduce packet captures, identifying data points, and even setting up our own profiling experiment. I kept this one light, as a gentle entry into metadata collection models. I want to give a shout out to JACE for help in proofreading.

Signal:(867–675–1041)

Tox: D7D264EA7541C4324625A8360267C3C54F9C1AF564D4266FE45F2BCB68924E21CB2A75746D51

Twitter