A Rust-powered tool for encrypting files securely using AES-GCM and enabling searchable encryption through deterministic tokens.
BlindSearch encrypts files using a password-derived key and stores the encrypted version with an .enc extension. It also optionally generates encrypted search tokens (keywords) that enable secure querying over encrypted data—without revealing the plaintext.
- 📂 Accepts a mandatory file path argument.
- 🔑 Securely prompts for a password (no command-line exposure).
- 🔒 Uses AES-256-GCM for authenticated encryption (confidentiality + integrity).
- 🧂 Derives keys with Argon2id and stores salt in
<filename>.salt. - 🔁 Supports:
- 🔐 Encryption/decryption based on file extension.
- 🔖 Search token generation with
-g/--generate_keywords. - 🔍 Keyword search with
-q <keyword>/--query <keyword>.
If you are using Ubuntu you need curl for downloading the installation script, and you need gcc, pkg-config and libssl-dev to build the source code.
On Ubuntu you can do:
sudo apt install curl gcc pkg-config libssl-dev
Install Rust: https://www.rust-lang.org/tools/install
Then after decompressing the zip file:
cd blindsearch
cargo build --release
cargo test
If you want to install it on you system:
cargo install --path .
cargo run -- input.txt # encrypt
cargo run -- input.txt.enc # decrypt
cargo run -- input.txt -g # generate tokens
cargo run -- input.txt -q secrets # positive test
cargo run -- input.txt -q secret # negative test
Simple encryption:
> blindsearch input.txt
🔐 Enter your password:
✅ Loaded existing salt from input.txt.salt
Encrypting file: input.txt
✅ Encrypted file: "input.txt.enc"
Blind Search finished successful!
Generate keywords:
> blindsearch input.txt -g
🔐 Enter your password:
✅ Loaded existing salt from input.txt.salt
Encrypting file: input.txt
✅ Encrypted file: "input.txt.enc"
Generating keywords tokens in file: input.txt.keywords
Blind Search finished successful!
Query:
> blindsearch input.txt -q secrets
🔐 Enter your password:
✅ Loaded existing salt from input.txt.salt
Querying for keyword: secrets
Keyword 'secrets' found in the encrypted database.
In order to generate a large.txt file, we did:
head -n 100000 /usr/share/dict/words > large.txt
This resulted in a file of size ˜1 Mb.
Then we got the worst case with regards to the size of the keywords extracted from it, since there is no repeatition, which resulted in file size of 1.59 Mb.
This project implements a simplified form of Searchable Symmetric Encryption (SSE), inspired by the construction in Figure 2 (page 21) of the paper:
Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions by Curtmola et al.
We make the following simplifications and assumptions:
- Since this tool operates on a single document at a time, we fix
n = 1(number of documents). Therefore, we do not append indices likew || ito the keywords, as the paper suggests for multi-document settings. - The document identifier is simply the filename. Avoid embedding sensitive information in filenames, as they are not encrypted.
To enable encrypted search:
- Deterministic encryption (e.g., AES in GCM mode) is used to convert each keyword into a fixed-length (16-byte) search token.
- These tokens are:
- Truncated to a uniform length.
- Shuffled before being written to the
.keywordsfile to prevent revealing search patterns or keyword order.
An adversary:
- Cannot generate valid tokens without the user's encryption key.
- Cannot distinguish between valid and dummy tokens.
To query:
- The user inputs a keyword.
- It is deterministically encrypted using the same process and key.
- The token is compared against the stored list.
- If found, the keyword is confirmed to exist in the document.
-
File encryption method: AES-GCM is part of many different standards, including NIST and FIPS, and is specifically adopted for encryption of data at rest by big companies like Amazon and Google, see the References.
-
Deterministic encryption: The original paper recommends using a PRP for generation of tokens. This means that the function used must be invertible and its output must be indistiguishable from random. Therefore, naive encryption would leak the keyword length. One solution for it is to use fixed length keywords, padding when needed. Here we simply adopt truncated AES-GCM with zero nonce. We leave as a future task to implement padding appropriately for different sizes of keywords like suggested in the paper.
This design assumes the following:
- The adversary may observe:
- The encrypted file (
.enc) - The salt (
.salt) - The keyword tokens (
.keywords)
- The encrypted file (
- The adversary does not know the user's password.
- The adversary cannot:
- Derive the encryption key from the password (thanks to Argon2id).
- Generate valid tokens.
- Access plaintext contents or keywords.
This model does not hide:
- Access/query patterns (i.e., which tokens are queried and how often).
More specifically, the paper improves the model called IND2-CKA, where CPA-security is adapted to protect keywords. The high level idea is that an adversary can not distinguish keywords encryption from random data, therefore keywords are secure. The model must assume the adversary have no information about plaintexts, because deterministic encryption would always output the same result, hence the adversary would win the underlying security game. Beyond the basic model, the paper provides a simulation-based definition, which gives an extra guarantee.
- The number of keywords is revealed by the
.keywordsfile. - This can be mitigated by:
- Defining a maximum keyword capacity.
- Padding unused slots with random dummy tokens.
- ✅ Fast and simple implementation suitable for local single-file use.
- ❌ Does not support complex queries, multi-document indexing, or access pattern privacy.
- Other cryptographic solution could be used to solve all the privacy issues, like for example FHE, however it would be orders of magnitude slower in practice.
Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions. Reza Curtmola, Juan Garay, Seny Kamara, Rafail Ostrovsky.
Practical Challenges with AES-GCM and the need for a new cipher. Panos Kampanakis, Matt Campagna, Eric Crocket, Adam Petcher, Shay Gueron.
Searchable Encryption: The “Just Right” Balance of Data Privacy and Usability Blind Insight