Unix: pass and gpg blog

Akemi Izuko 2024-03-29 00:11:34 -06:00
parent 26f0543d62
commit 9373a15b90
Signed by: akemi
GPG key ID: 8DE0764E1809E9FC
2 changed files with 386 additions and 26 deletions


@ -8,22 +8,22 @@ heroText: 'Base64 LLama'
# The Secret Learnings of Llamas # The Secret Learnings of Llamas
Tool use by llamas is an active area of research. Recent implementations like Tool use by llamas is an active area of research. Recent implementations like
Devin promise great productivity increases through tool use. I was investigating [Devin](https://www.cognition-labs.com/introducing-devin) promise great
tool use by some modern llamas, when I made an unfortunate discovery. productivity increases, just by allowing llamas to interact with more tools. I
was investigating this in some modern llamas, when I made an unfortunate
discovery.
It appears most large llamas have learned a new language, in addition to the It appears most large llamas have learned a new language, in addition to the
ones that were intended: base64. ones that were intended: base64.
### Base64 Background ## Base64 Background
Base64 is a simple encoding scheme. This is different from encryption and Base64 is a simple encoding scheme. It takes in a stream of bytes and converts
hashing, as those provide security, while base64 just transforms data into a them into a plain-text representation.
portable form.
Each byte is 8 bits. This means there are 2^8 (256) possible bytes, since each Each byte is 8 bits. This means there are 2^8 (256) possible bytes, since each
bit contributes 2 states. Base64 encodes such that each bytes only stores 2^6 bit contributes 2 states. Base64 only uses plain-text encoding, so it only
(64) possible states, but this makes the vocabulary much smaller. With just 64 stores 2^6 (64) possible states per character.
letters and numbers, it can hold 64 states per character.
Let's visualize how base64 works. Say we have the following word: Let's visualize how base64 works. Say we have the following word:
@ -31,10 +31,11 @@ Let's visualize how base64 works. Say we have the following word:
Hello Hello
``` ```
This has a utf-8 encoding below. I used the `ord` function in python to get the We can convert each letter to a number using [utf-8 encoding
numbers in the `Base 10` row. I then converted the base 10 representations to tables](https://en.wikipedia.org/wiki/UTF-8#Codepage_layout) or the `ord()`
octal (base 8) and binary (base 2). The bottom two rows are the same, but the function in python. I then converted the base 10 representations to octal (base
spacing makes it easier to see the direct mapping from octal to binary: 8) and binary (base 2). The bottom two rows are the same, but the spacing makes
it easier to see the direct mapping from octal to binary:
``` ```
Letters: H e l l o Letters: H e l l o
@ -55,8 +56,7 @@ Now we can map in reverse:
``` ```
Base 2 (spaced): 000 001 000 000 001 100 101 001 101 100 001 101 100 001 101 111 Base 2 (spaced): 000 001 000 000 001 100 101 001 101 100 001 101 100 001 101 111
Base 8 (spaced): 0 1 0 0 1 4 5 1 5 4 1 5 4 1 5 7 Just changing the spacing...
Base 2 (spaced): 000001 000000 001100 101001 101100 001101 100001 101111 Base 2 (spaced): 000001 000000 001100 101001 101100 001101 100001 101111
Base 8 (spaced): 01 00 14 51 54 15 41 57 Base 8 (spaced): 01 00 14 51 54 15 41 57
Base 64 (spaced): B A M p s N h v Base 64 (spaced): B A M p s N h v
@ -65,7 +65,7 @@ Base 64 (spaced): B A M p s N h v
So we can encode the word `Hello` as `BAMpsNhv` in base64! Base64 is often used So we can encode the word `Hello` as `BAMpsNhv` in base64! Base64 is often used
to encode images and other binary data to store in JSON. It is not space to encode images and other binary data to store in JSON. It is not space
efficient, taking up more space than it should, but it's entirely made of efficient, taking up more space than it should, but it's entirely made of
printable characters. printable characters!
## Base64 Llamas ## Base64 Llamas
@ -83,7 +83,7 @@ echo 'how are you today?' | base64
``` ```
Then ask a llama about `aG93IGFyZSB5b3UgdG9kYXk/Cg==` or whatever other string Then ask a llama about `aG93IGFyZSB5b3UgdG9kYXk/Cg==` or whatever other string
you want. You'll notice that they break down after a about 10-20 characters, you want. You'll notice that they break down after about 10-20 characters,
depending on how good the llama is. depending on how good the llama is.
@ -107,13 +107,10 @@ Encode "emiliko@mami2.moe" into base64.
This discovery was shocking to me. I thought they were achieving this through This discovery was shocking to me. I thought they were achieving this through
tool use, but I can cross-verify on localllamas which most certainly don't have tool use, but I can cross-verify on localllamas which most certainly don't have
access to tools. This means our 100-billion scale llamas are learning to be a access to tools. This means our 100-billion scale llamas are learning to be a
base64 decoder? base64 decoder? Of course this is a completely pointless feature, as no llama
will ever be more energy efficient than a trivially coded base64 tool.
Of course this is a completely pointless feature, as no llama will ever be more The Llamas likely picked it up while learning on sample code, but the degree to
energy efficient than a trivially coded base64 tool. The Llamas likely picked it which they picked it up is incredible! This has led me to wonder, what other
up while learning on sample code, but the degree to which they picked it up is completely pointless things are our llamas learning? This one was an unintended
incredible! side effect of learning to code, but what other side effects is our data having?
This has lead me to wonder, what other completely pointless things are our
llamas learning? This one was an unindented side effect of learning to code, but
what other side effects is our data having?


@ -0,0 +1,363 @@
---
title: 'Unix Password Management'
description: 'Using GPG and Pass for optimal security and ease'
updateDate: 'March 28 2024'
---
# Password Management
Passwords are often the main method of digital identification. This means
anything you don't want others to access but do want yourself to access is
behind some sort of password. This means we need to optimize on two fronts:
- Ease of access: Passwords must be quick and easy to access and use
- High security: Passwords must be strong to resist attacks
Optimizing for both is more tricky than it seems. Here I will discuss problems
with existing solutions and present an **offline**, **multi-factor**,
**easy-to-use**, and **extremely strong** solution to password management. Along
the way we'll learn a lot about password security in general!
## Optimizing for high-security
A password is pretty pointless if it's not strong enough to resist cracking.
Let's look over some core security concepts!
### Measuring Bits of Entropy
In the security field, "strength" of a password is measured by the *entropy* of
the password. You'll often hear that passwords should be "60 bits of entropy" or
some other number. The higher your bits of entropy, the stronger your
password. In fact a password with 41 bits of entropy is twice as strong as one
with 40 bits of entropy. Going from 20 bits of entropy to 30 makes the password
over 1000x stronger!
To understand how to compute entropy, let's consider an example. Say I make a
password with the following constraints:
- It's made entirely of the characters `A`, `B`, and `C`
- It's 12 characters long
- Each of the 12 characters is chosen completely randomly
An example of such a password is `AAABCCBCAACB`. To calculate the entropy, we
consider the number of possible passwords we can generate with the above
constraints. For each character we have 3 possibilities, and there are 12
characters, so the entropy is:
```
3^12 = 531,441
```
To calculate "bits of entropy", we just need to take the base-2-logarithm of the
entropy we just computed, giving us 19 bits of entropy for the above case:
```
log2(3^12) = log2(531,441) ~= 19
```
These are called "bits" of entropy because `2^19 ~= 531,441`: it takes about 19
binary digits to count every possible password.
For the mathematically inclined, this is different from the information-theory
concept of entropy, as computers are deterministic systems. Therefore, digital
security usually assumes pseudo-random numbers are good enough, which is true in
practice.
A more general formula to remember is:
```
strength = bits of entropy = log2( #possible_characters ^ password_length )
```
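As a quick check of the arithmetic above, here is a minimal Python sketch of the same formula. The function name `entropy_bits` is just an illustrative choice, and it assumes every character is picked uniformly at random:

```python
import math

def entropy_bits(charset_size: int, length: int) -> float:
    """Bits of entropy for `length` symbols drawn uniformly at random."""
    # log2(charset_size ** length) is the same as length * log2(charset_size)
    return length * math.log2(charset_size)

# The A/B/C example: 3 possible characters, 12 characters long
abc_bits = entropy_bits(3, 12)
print(f"{3 ** 12:,} possible passwords, about {abc_bits:.2f} bits of entropy")
```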
### Kerckhoffs's Principle
The common idea in digital security is that the attacker knows exactly how
you're defending your system. Obscurity is not considered to add to security in
any way. This is a pretty important principle to understand why security seems a
bit overkill sometimes, but it's a very realistic concept.
By writing this blog, anyone on the internet now knows how I protect my
passwords. However, since my approach is in line with Kerckhoffs's Principle,
this isn't a security concern in any respect.
If you'd like to read more about Kerckhoffs's Principle, check out [this
article](https://nordvpn.com/cybersecurity/glossary/kerckhoffs-principle/) by a
questionable VPN provider.
### Making Passwords Stronger
Taking into consideration the above, we can now determine what makes a good
password. Let's take a look at that entropy formula again:
```
strength = bits of entropy = log2( #possible_characters ^ password_length )
```
One interesting observation here is how increasing the password length will
increase the exponent, often making a larger impact on the password strength, as
compared to using more characters. Consider the following base password:
- Characters: `a-z`, `A-Z`, and `0-9`
- Length: 16
This password has `log2(62^16) = 95` bits of entropy. If we make this password
17 characters instead we get `log2(62^17) = 101` bits of entropy. However, if we
instead add the `$` character to the possible characters in the password, it
has `log2(63^16) = 95.6` bits of entropy, a gain of less than half a bit!
In general, you always want to increase the number that's smaller. Since most
passwords require at least one of each `a-z`, `A-Z`, `0-9`, the character set
number starts out around 64. However, most passwords are only about 11
characters long! This means it's almost always beneficial to make a longer
password, instead of varying up the characters.
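To see the length-versus-character-set trade-off with unrounded numbers, a small Python sketch under the same uniform-randomness assumption (the helper name `entropy_bits` is hypothetical):

```python
import math

def entropy_bits(charset_size: int, length: int) -> float:
    # Assumes each character is chosen uniformly at random
    return length * math.log2(charset_size)

base = entropy_bits(62, 16)    # a-z, A-Z, 0-9 at length 16
longer = entropy_bits(62, 17)  # one extra character
wider = entropy_bits(63, 16)   # one extra symbol in the character set

print(f"base:   {base:.1f} bits")
print(f"longer: {longer:.1f} bits (+{longer - base:.2f})")
print(f"wider:  {wider:.1f} bits (+{wider - base:.2f})")
```

One more character adds a full `log2(62)` bits, while one more symbol in the set adds only a fraction of a bit.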
Let's take another example of two passwords:
1. `balhajisundoubtedlythebesttoyatikea`: length 35, character set 26
2. `S0mE1EEtc*dedP@ssword`: length 21, character set ~72
The first password has `log2(26^35) ~= 164` bits of entropy, while the second
one has `log2(72^21) ~= 130` bits of entropy. By this formula, the first
password is **over 30 billion times stronger while being much easier to
remember!**
### Making Passwords Easy to Remember
The famous [XKCD comic](https://xkcd.com/936/) comments on how it's actually
pretty easy to make very strong passwords. All you need to do is think up a
sentence using real words! As we saw above, length tends to count more for
password strength, so a long sentence with simple characters will easily
outclass any sort of complex password.
You could use a password generator that chooses N random words from a list of M
possible words. This provides `log2(M^N)` bits of entropy. Generally it's quite
easy to find a list of `10-30k` English words online. Then a password using
**just 4 words** will have around `53-59` bits of entropy!
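Such a generator can be sketched in a few lines of Python. This is only an illustration: the `WORDS` list below is a tiny made-up stand-in for a real 10-30k word list, and the function names are hypothetical. It uses the standard `secrets` module so the choices come from a cryptographically secure source:

```python
import math
import secrets

# Tiny illustrative list; a real list would hold 10,000+ words
WORDS = ["llama", "shark", "pillow", "river", "candle", "orbit", "maple", "violin"]

def make_passphrase(words: list[str], count: int = 4) -> str:
    """Pick `count` words independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(count))

def passphrase_bits(list_size: int, count: int) -> float:
    """log2(M^N) for N words drawn from a list of M words."""
    return count * math.log2(list_size)

passphrase = make_passphrase(WORDS)
print(passphrase)
print(f"4 words from 10k: {passphrase_bits(10_000, 4):.1f} bits")
print(f"4 words from 30k: {passphrase_bits(30_000, 4):.1f} bits")
```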
I highly recommend using a password like this, especially if you're going to
have to type it in on a mobile device often. While they're certainly weaker for
their length compared to completely randomly generated passwords, the
convenience is worth the trade-off.
### How Strong Should My Password Be?
There is a very wide spectrum of opinions on this matter. I will provide mine.
Let's start by recognizing that most passwords are stored in one of 128, 256, or
512 bit hashes. Usually 256, but older systems are often 128. This means you
often *cannot have a password more secure than 256 bits of entropy*. This is a
result of the output space being lower dimensional than any password with higher
entropy, so any "stronger" password would be projected down to only 256 bits of
entropy.
We can also look at how fast computers can brute-force passwords.
[Bcrypt](https://en.wikipedia.org/wiki/Bcrypt) is one of the most popular
hashing choices for passwords. Assuming a company is decently secure, they use
enough rounds of hashing such that a modern processor takes about 100ms to hash
a possible password. Rounding up, that means an attacker can try about 1 million
passwords per day per core. Assuming they have a monstrous 1000 core system,
they can crack through 29 bits of entropy in one day.
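The estimate can be reproduced with a short Python sketch. The 100ms hash time and the 1000 cores are the assumptions from the paragraph above, not measurements, and `years_to_exhaust` is a hypothetical helper:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
hash_seconds = 0.1  # assumed cost of one bcrypt hash on one core
cores = 1000        # assumed size of the attacker's machine

guesses_per_day = cores * SECONDS_PER_DAY / hash_seconds
bits_per_day = math.log2(guesses_per_day)

def years_to_exhaust(bits: float) -> float:
    """Years needed to try every password in a 2^bits search space."""
    return 2 ** bits / guesses_per_day / 365

print(f"{guesses_per_day:,.0f} guesses per day, about {bits_per_day:.1f} bits")
print(f"60 bits would take roughly {years_to_exhaust(60):,.0f} years")
```

On average an attacker finds the password after searching half the space, so the expected time is half the printed figure.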
Based on that number, your password should have around 60 bits of entropy for a
comfortable safety margin. More security-conscious users often target around
100 bits of entropy instead, to make sure advancements in processor speeds never
catch up to their passwords.
I would personally aim for making your password 15-20 simple lower-case
characters long. This provides 70-94 bits of entropy alone, and often shouldn't
be very difficult to remember!
### Duplicate Passwords
Pretty much everything we've discussed up until this point falls apart if you
use duplicate passwords. For many many different reasons, a password may become
compromised. Your network may be hijacked, your computer may have a keylogger
installed, even just someone recording a video on a phone can easily get your
password.
Do NOT use duplicate passwords. Do NOT vary passwords by 1-2 characters; create
completely new, distinct passwords every time.
## Password Managers
The most widespread current solution to strong but easy-to-use passwords is a
password manager. Some of them, like [KeePassXC](https://keepassxc.org/), are
actually a good secure solution!
Unfortunately, people usually use a big corporate solution instead. These
usually store your passwords in the cloud, which is a complete disaster. A
single leak means all your accounts are immediately compromised; too many eggs
in one basket.
This isn't even unusual! Lastpass had a [major security
breach](https://blog.lastpass.com/posts/2022/12/notice-of-recent-security-incident)
in 2022. As of writing, Lastpass, Dashlane, and 1Password are [all
compromised](https://www.forbes.com/sites/daveywinder/2023/12/11/android-warning-1password-dashlane-lastpass-and-others-can-leak-passwords/?sh=1c019c497dbf)
on Android. To me, this is completely unacceptable for something that holds the
keys to all your accounts.
That said, if you make a reasonable choice of password manager, it can be a
rather no-frills solution for most people. For those looking for top-notch
security though, it may be worth considering the pass-gpg approach instead.
### Browser-Saved Passwords
A browser is not a password manager. It is a complete joke how easy it is to rip
out passwords from a browser. This [short python
script](https://github.com/priyankchheda/chrome_password_grabber) can do it! So
can [this one](https://github.com/henry-richard7/Browser-password-stealer) and
[this one](https://github.com/JustYuuto/Yuuto-Stealer)... It's so easy to build
one, you can do it yourself in under an hour!
To be fair, for most people this is probably fine. Unless malware gets access to
your computer, your passwords are unlikely to be stolen... but I still wouldn't
put my recovery email passwords or my banking information in these.
## 2-Factor Authentication
Almost all services now offer 2-Factor Authentication (2FA). In fact, it's
increasingly a requirement to sign up for services. Although it seems like a
hardy security method, it's not a replacement for strong passwords.
2FA can be quite beneficial for people who don't make very strong passwords. At
least with this method, their password is effectively multiplied by 100000
possibilities. However, that's only about 16 bits of entropy, which isn't a very
big increase.
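The arithmetic behind that figure, as a small Python sketch. The 5-digit case matches the 100000 possibilities above; the 6-digit case is included as an assumption, since many authenticator apps show 6-digit codes. `code_bits` is a hypothetical helper name:

```python
import math

def code_bits(digits: int) -> float:
    """Bits of entropy in a uniformly random numeric code of `digits` digits."""
    return digits * math.log2(10)

print(f"5-digit code: {10 ** 5:,} possibilities, {code_bits(5):.1f} bits")
print(f"6-digit code: {10 ** 6:,} possibilities, {code_bits(6):.1f} bits")
```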
I also personally feel 2FA can be incredibly inconvenient. It's not a given I
have my phone nearby every time I use my computer. One can only imagine the
situation where your phone dies and you just can't access your accounts anymore.
That said, if you're fine with the inconvenience, there's no harm in adding 16
bits of entropy to your passwords.
### SMS 2FA
This is a completely different game. Unlike 2FA apps which require typing in a
code and verifying any new device using an existing device, SMS just needs
access to your phone number. A phone number is controlled by your cellular
provider, not you. This means their customer service agents can easily reassign
your phone number to another phone!
When I got my phone number, I actually ended up with SMS verification for the
previous owner's AirBnB account! This isn't the worst possible account to
compromise, but this happened by complete accident. Further, identity theft is a
much more real threat than a hacker with a 1000-core compute cluster. All they
have to do is deceive a customer service agent over the phone, and your SMS
number is now theirs.
In short, avoid SMS-based 2FA in favor of apps like [Authy](https://authy.com/)
that don't create such a huge security gap.
## GPG and Pass
If you truly want strong security while keeping ease of use, you can use the
OG method of digital identity verification: GPG keys.
### GPG Keys
A GPG key is a file associated with an email, expiry date, and password. It can
be used to encrypt and decrypt any data. Originally, it was meant as a way to
send emails that only the receiver would be able to decrypt.
I highly recommend this [amazing GPG
cheatsheet](https://gist.github.com/johnfedoruk/7f156d844af54cc91324dff4f54b11ce),
though I'll cover the necessary bare minimum here. It's a bit unfortunate how
GPG has one of the most confusing interfaces of any command line tool.
To start, create a new key for yourself. Roughly follow the below, replacing
pieces with your information:
```
gpg --full-gen-key --expert
> (9) ECC and ECC
> (1) Curve 25519
> 1y # Optional, 1 year is recommended
> Your Name # This will be visible in the public key
> emiliko@mami2.moe # This is also visible in the public key
> No comment
> Put a password on the primary key
```
It's important you put a *very* strong password on your GPG key, one that **you
can memorize by heart**. This will be the one and only password you will ever
need to remember from this point forward, but it must be strong!
GPG keys can also be used to sign git commits and to encrypt and decrypt
messages, but we won't cover that here. All we really need is to set one up. You
can view your keys at any time using `gpg --list-keys`.
### Pass
[Pass](https://www.passwordstore.org/) is my favourite password manager. I like
it since it's simple, transparent, and secure. It's so simple in fact, it'd be
pretty easy to write your own implementation in an hour or two.
All your passwords will be stored in the `~/.password-store` directory for your
user. You can organize them by directories, just as you organize your file
system ordinarily. The trick is that `pass` will ensure all these files are
encrypted with the GPG key you choose when you first set up the store with
`pass init <gpg-id>`.
Using pass is super simple. To add a password just type it in:
```
pass insert github/password
```
You can also watch it being typed in with the echo `-e` flag, which I find quite
helpful:
```
pass insert -e github/email
```
You can list the names of the password files you have, without decrypting
anything:
```
pass show # All of them
pass show github # All the ones in the github directory
```
Finally, you can decrypt passwords. Use the `-c` option to copy the password to
your clipboard. The clipboard will clear after a few seconds:
```
pass show github/password # Actually prints it to the terminal
pass show -c github/password # Copies it to the clipboard for a few seconds
```
Tab completion is well supported. You can also use `pass mv` to rename files
and `pass rm` to delete them. Really, it's just super simple.
### Why Use Pass and GPG?
There are many reasons to do so:
- **Locally stored**: These are much harder to obtain than passwords stored on the
cloud.
- **Easy migration**: You literally `scp -r` your `~/.password-store` directory to
migrate them to a new computer.
- **3-factor authentication**: An attacker needs your GPG key, your GPG key's
password, and your `~/.password-store` directory to perform a successful
attack. Missing any of these 3 will prevent the attack from succeeding.
- **0-type approach**: You can use very very strong passwords and never need to
type them in (they're on your clipboard!)
- **Secure scripting**: It's very easy to put passwords securely into shell
scripts with `"$(pass show cloudflare/token)"`
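The scripting point carries over to other languages too. Here is a minimal Python sketch, assuming `pass` is installed and that the entry's first line holds the secret. The `read_secret` function and its `run` parameter are hypothetical names introduced only for this example:

```python
import subprocess

def read_secret(entry: str, run=subprocess.run) -> str:
    """Return the first line of a pass entry, e.g. "cloudflare/token"."""
    # `pass show <entry>` prints the decrypted file to stdout;
    # check=True raises CalledProcessError if decryption fails
    result = run(["pass", "show", entry], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()[0]

# Usage (triggers the usual GPG password prompt):
# token = read_secret("cloudflare/token")
```

Passing `run` as a parameter is only there so the function can be exercised without a real password store.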
The 3-factor method is even more helpful when considering common attacks. For
example, if a phone camera records you typing on your keyboard to decrypt the
GPG key, the attacker can't do *anything* with that password alone. They still
need physical access to your system to grab the files themselves.
An odd benefit of 3-factor authentication is distributing backups. If you give
each piece to a different person who knows you, but who don't know one another,
you can safely entrust your passwords to third parties. This works since an
attacker needs all 3 pieces to mount an attack, so giving a trusted third party
only 1 piece doesn't compromise your security.
Malware *could* be both a key logger and grab the files from
`~/.password-store`, but that is some very sophisticated and targeted
malware... most of the keylogger ones will just assume the password you typed is
the actual password to your accounts, which it isn't!
I hope this article was useful in building an understanding of digital security.
Maybe you'll even consider using GPG and Pass!