#141 Working on integrating phileas-benchmark.
1 parent ae40648, commit 4a107c6
Showing 10 changed files with 866 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,4 +20,6 @@ manifest.json | |
model.onnx | ||
vocab.txt | ||
venv/ | ||
site/ | ||
site/ | ||
.idea | ||
*.jsonl |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# phileas-benchmark | ||
Benchmark tests for Phileas PII engine | ||
|
||
This command-line utility runs a series of single-threaded workloads using [Phileas](https://github.com/philterd/phileas) | ||
to redact PII tokens in strings of varying sizes. Workloads can be run multiple times to warm up the JVM or test long-term use. | ||
Workloads run for a fixed amount of time rather than a fixed number of iterations. | ||
|
||
[![CodeFactor](https://www.codefactor.io/repository/github/philterd/phileas-benchmark/badge)](https://www.codefactor.io/repository/github/resurfaceio/phileas-benchmark) | ||
|
||
## Dependencies | ||
|
||
* Java 21 | ||
* Maven 3.9.x | ||
* [philterd/phileas](https://github.com/philterd/phileas) | ||
|
||
## Running Locally | ||
|
||
``` | ||
mvn clean package | ||
# run workloads across all documents | ||
java -server -Xmx512M -XX:+AlwaysPreTouch -XX:PerBytecodeRecompilationCutoff=10000 -XX:PerMethodRecompilationCutoff=10000 -jar target/phileas-benchmark-cmd.jar all mask_all 1 15000 | ||
# run workloads for specific document | ||
java -server -Xmx512M -XX:+AlwaysPreTouch -XX:PerBytecodeRecompilationCutoff=10000 -XX:PerMethodRecompilationCutoff=10000 -jar target/phileas-benchmark-cmd.jar gettysburg_address mask_credit_cards 1 1000 | ||
``` | ||
|
||
To get the results back as a JSON object, append a `json` argument to the command: | ||
|
||
``` | ||
java -server -Xmx512M -XX:+AlwaysPreTouch -XX:PerBytecodeRecompilationCutoff=10000 -XX:PerMethodRecompilationCutoff=10000 -jar target/phileas-benchmark-cmd.jar all mask_all 1 15000 json | ||
``` | ||
|
||
### Available documents | ||
|
||
* hello_world (11 chars) | ||
* gettysburg_address (1474 chars) | ||
* i_have_a_dream (7727 chars) | ||
|
||
### Available redactors | ||
|
||
For testing single identifiers: | ||
* mask_bank_routing_numbers | ||
* mask_bitcoin_addresses | ||
* mask_credit_cards | ||
* mask_drivers_licenses | ||
* mask_email_addresses | ||
* mask_iban_codes | ||
* mask_ip_addresses | ||
* mask_passport_numbers | ||
* mask_phone_numbers | ||
* mask_ssns | ||
* mask_tracking_numbers | ||
* mask_vehicle_numbers | ||
|
||
For testing multiple identifiers: | ||
* mask_all (the identifiers listed above 👆) | ||
* mask_fastest (bank routing numbers, bitcoin addresses, credit cards, email addresses, IBAN codes, phone numbers, ssns) | ||
* mask_none | ||
|
||
--- | ||
Copyright 2024 Philterd, LLC @ https://www.philterd.ai |
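The README above says workloads run for a fixed amount of time rather than a fixed number of iterations. The following is a minimal sketch of that idea, not the benchmark's actual implementation: `TimedWorkload`, `Result`, `run`, and the digit-masking lambda are hypothetical names introduced for illustration, and the redactor is modeled as a plain `UnaryOperator<String>` rather than a real Phileas call.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: run a redaction function repeatedly until a time budget is spent.
public class TimedWorkload {

    // Outcome of one timed run. The checksum exists only so the JIT cannot
    // discard the redaction result as dead code.
    public record Result(long iterations, long elapsedNanos, long checksum) {
        public double callsPerSecond() {
            return iterations * 1_000_000_000.0 / Math.max(1L, elapsedNanos);
        }
    }

    // Calls the redactor on the document until at least durationMillis have elapsed.
    public static Result run(UnaryOperator<String> redactor, String document, long durationMillis) {
        final long budgetNanos = durationMillis * 1_000_000L;
        final long start = System.nanoTime();
        long iterations = 0;
        long checksum = 0;
        long elapsed;
        do {
            checksum += redactor.apply(document).length();
            iterations++;
            // Compare elapsed differences, as recommended for System.nanoTime().
            elapsed = System.nanoTime() - start;
        } while (elapsed < budgetNanos);
        return new Result(iterations, elapsed, checksum);
    }

    public static void main(String[] args) {
        // Stand-in redactor (an assumption): masks every digit with '*'.
        UnaryOperator<String> maskDigits = s -> s.replaceAll("\\d", "*");
        String document = "Call 555-0100 today";

        run(maskDigits, document, 200);                      // warm-up pass, result ignored
        Result measured = run(maskDigits, document, 200);    // measured pass
        System.out.printf("%d calls, %.0f calls/sec%n",
                measured.iterations(), measured.callsPerSecond());
    }
}
```

Running the same workload more than once, as in `main`, mirrors the README's note about warming up the JVM before taking a measurement.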
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> | ||
<modelVersion>4.0.0</modelVersion> | ||
<parent> | ||
<groupId>ai.philterd</groupId> | ||
<artifactId>phileas</artifactId> | ||
<version>2.9.0-SNAPSHOT</version> | ||
</parent> | ||
<artifactId>phileas-benchmark</artifactId> | ||
<name>phileas-benchmark</name> | ||
<build> | ||
<plugins> | ||
<plugin> | ||
<groupId>org.apache.maven.plugins</groupId> | ||
<artifactId>maven-surefire-plugin</artifactId> | ||
<version>${maven.surefire.version}</version> | ||
<configuration> | ||
<groups>benchmarks</groups> | ||
<argLine>-server -Xmx512M -XX:+AlwaysPreTouch -XX:PerBytecodeRecompilationCutoff=10000 -XX:PerMethodRecompilationCutoff=10000</argLine> | ||
<systemProperties> | ||
<property> | ||
<name>phileasVersion</name> | ||
<value>${project.version}</value> | ||
</property> | ||
</systemProperties> | ||
</configuration> | ||
</plugin> | ||
</plugins> | ||
</build> | ||
<dependencies> | ||
<dependency> | ||
<groupId>ai.philterd</groupId> | ||
<artifactId>phileas-core</artifactId> | ||
<version>${project.version}</version> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>com.mscharhag.oleaster</groupId> | ||
<artifactId>oleaster-matcher</artifactId> | ||
<version>0.2.0</version> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.junit.jupiter</groupId> | ||
<artifactId>junit-jupiter-engine</artifactId> | ||
<version>${junit.version}</version> | ||
<scope>test</scope> | ||
</dependency> | ||
</dependencies> | ||
</project> |
103 changes: 103 additions & 0 deletions
phileas-benchmark/src/test/java/ai/philterd/phileas/benchmarks/Documents.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
/* | ||
* Copyright 2024 Philterd, LLC @ https://www.philterd.ai | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package ai.philterd.phileas.benchmarks; | ||
|
||
import java.util.AbstractMap; | ||
import java.util.List; | ||
import java.util.Map; | ||
|
||
/** | ||
* Predefined documents for redaction benchmarks. | ||
*/ | ||
public class Documents { | ||
|
||
// todo add JSON-encoded documents | ||
// todo add documents with PCI matches | ||
|
||
public static final String GETTYSBURG_ADDRESS = """ | ||
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. | ||
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. | ||
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth. | ||
"""; | ||
|
||
public static final String I_HAVE_A_DREAM = """ | ||
I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation. | ||
Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon of hope to millions of slaves, who had been seared in the flames of whithering injustice. It came as a joyous daybreak to end the long night of their captivity. But one hundred years later, the colored America is still not free. One hundred years later, the life of the colored American is still sadly crippled by the manacle of segregation and the chains of discrimination. | ||
One hundred years later, the colored American lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later, the colored American is still languishing in the corners of American society and finds himself an exile in his own land So we have come here today to dramatize a shameful condition. | ||
In a sense we have come to our Nation’s Capital to cash a check. When the architects of our great republic wrote the magnificent words of the Constitution and the Declaration of Independence, they were signing a promissory note to which every American was to fall heir. | ||
This note was a promise that all men, yes, black men as well as white men, would be guaranteed the inalienable rights of life liberty and the pursuit of happiness. | ||
It is obvious today that America has defaulted on this promissory note insofar as her citizens of color are concerned. Instead of honoring this sacred obligation, America has given its colored people a bad check, a check that has come back marked “insufficient funds.” | ||
But we refuse to believe that the bank of justice is bankrupt. We refuse to believe that there are insufficient funds in the great vaults of opportunity of this nation. So we have come to cash this check, a check that will give us upon demand the riches of freedom and security of justice. | ||
We have also come to his hallowed spot to remind America of the fierce urgency of Now. This is not time to engage in the luxury of cooling off or to take the tranquilizing drug of gradualism. | ||
Now is the time to make real the promise of democracy. | ||
Now it the time to rise from the dark and desolate valley of segregation to the sunlit path of racial justice. | ||
Now it the time to lift our nation from the quicksand of racial injustice to the solid rock of brotherhood. | ||
Now is the time to make justice a reality to all of God’s children. | ||
I would be fatal for the nation to overlook the urgency of the moment and to underestimate the determination of it’s colored citizens. This sweltering summer of the colored people’s legitimate discontent will not pass until there is an invigorating autumn of freedom and equality. Nineteen sixty-three is not an end but a beginning. Those who hope that the colored Americans needed to blow off steam and will now be content will have a rude awakening if the nation returns to business as usual. | ||
There will be neither rest nor tranquility in America until the colored citizen is granted his citizenship rights. The whirlwinds of revolt will continue to shake the foundations of our nation until the bright day of justice emerges. | ||
We can never be satisfied as long as our bodies, heavy with the fatigue of travel, cannot gain lodging in the motels of the highways and the hotels of the cities. | ||
We cannot be satisfied as long as the colored person’s basic mobility is from a smaller ghetto to a larger one. | ||
We can never be satisfied as long as our children are stripped of their selfhood and robbed of their dignity by signs stating “for white only.” | ||
We cannot be satisfied as long as a colored person in Mississippi cannot vote and a colored person in New York believes he has nothing for which to vote. | ||
No, no we are not satisfied and we will not be satisfied until justice rolls down like waters and righteousness like a mighty stream. | ||
I am not unmindful that some of you have come here out of your trials and tribulations. Some of you have come from areas where your quest for freedom left you battered by storms of persecutions and staggered by the winds of police brutality. | ||
You have been the veterans of creative suffering. Continue to work with the faith that unearned suffering is redemptive. | ||
Go back to Mississippi, go back to Alabama, go back to South Carolina go back to Georgia, go back to Louisiana, go back to the slums and ghettos of our modern cities, knowing that somehow this situation can and will be changed. | ||
Let us not wallow in the valley of despair. I say to you, my friends, we have the difficulties of today and tomorrow. | ||
I still have a dream. It is a dream deeply rooted in the American dream. | ||
I have a dream that one day this nation will rise up and live out the true meaning of its creed. We hold these truths to be self-evident that all men are created equal. | ||
I have a dream that one day out in the red hills of Georgia the sons of former slaves and the sons of former slaveowners will be able to sit down together at the table of brotherhood. | ||
I have a dream that one day even the state of Mississippi, a state sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice. | ||
I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by their character. | ||
I have a dream today. | ||
I have a dream that one day down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification; that one day right down in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers. | ||
I have a dream today. | ||
I have a dream that one day every valley shall be engulfed, every hill shall be exalted and every mountain shall be made low, the rough places will be made plains and the crooked places will be made straight and the glory of the Lord shall be revealed and all flesh shall see it together. | ||
This is our hope. This is the faith that I will go back to the South with. With this faith we will be able to hew out of the mountain of despair a stone of hope. | ||
With this faith we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. | ||
With this faith we will be able to work together, to pray together, to struggle together, to go to jail together, to climb up for freedom together, knowing that we will be free one day. | ||
This will be the day when all of God’s children will be able to sing with new meaning “My country ’tis of thee, sweet land of liberty, of thee I sing. Land where my father’s died, land of the Pilgrim’s pride, from every mountainside, let freedom ring!” | ||
And if America is to be a great nation, this must become true. So let freedom ring from the hilltops of New Hampshire. Let freedom ring from the mighty mountains of New York. | ||
Let freedom ring from the heightening Alleghenies of Pennsylvania. | ||
Let freedom ring from the snow-capped Rockies of Colorado. | ||
Let freedom ring from the curvaceous slopes of California. | ||
But not only that, let freedom, ring from Stone Mountain of Georgia. | ||
Let freedom ring from every hill and molehill of Mississippi and every mountainside. | ||
When we let freedom ring, when we let it ring from every tenement and every hamlet, from every state and every city, we will be able to speed up that day when all of God’s children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old spiritual, “Free at last, free at last. Thank God Almighty, we are free at last. | ||
"""; | ||
|
||
public static final List<String> keys = List.of( | ||
"hello_world", | ||
"gettysburg_address", | ||
"i_have_a_dream" | ||
); | ||
|
||
public static final Map<String, String> map = Map.ofEntries( | ||
new AbstractMap.SimpleEntry<>("hello_world", "Hello world"), | ||
new AbstractMap.SimpleEntry<>("gettysburg_address", GETTYSBURG_ADDRESS), | ||
new AbstractMap.SimpleEntry<>("i_have_a_dream", I_HAVE_A_DREAM) | ||
); | ||
|
||
public static String get(String document) { | ||
if (map.containsKey(document)) { | ||
return map.get(document); | ||
} else { | ||
throw new IllegalArgumentException("Invalid document name: " + document); | ||
} | ||
} | ||
|
||
} |
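As a usage illustration of the lookup above, here is a small self-contained sketch. `DocumentLookup` and its one-entry map are hypothetical stand-ins for `Documents` (only the `hello_world` entry is reproduced), so the example runs without the benchmark module on the classpath. It shows that document names must match the registered keys exactly, and that a misspelled name fails fast with an `IllegalArgumentException`.

```java
import java.util.Map;

// Hypothetical stand-in for Documents, reduced to a single entry for illustration.
public class DocumentLookup {

    private static final Map<String, String> DOCS = Map.of("hello_world", "Hello world");

    // Same contract as Documents.get: return the text or reject unknown names.
    public static String get(String document) {
        String text = DOCS.get(document);
        if (text == null) {
            throw new IllegalArgumentException("Invalid document name: " + document);
        }
        return text;
    }

    public static void main(String[] args) {
        // A registered key resolves to its text.
        System.out.println(get("hello_world").length()); // prints 11

        // A misspelled key is rejected instead of silently benchmarking nothing.
        try {
            get("gettysberg_address");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```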