
What are the objectives of this library? #34

Open
k00ni opened this issue May 15, 2024 · 2 comments
Labels
question Further information is requested

Comments

@k00ni
Contributor

k00ni commented May 15, 2024

Your question

What are the objectives of this library?

It's stated in the README.md: Because TransformersPHP is designed to be functionally equivalent to the Python library, it's super easy to learn from existing Python or JavaScript code. Does that mean you aim for 99% coverage of the Hugging Face Python library? In other words: are you re-implementing the whole Hugging Face Python library in PHP?

Context (optional)

I recently started using Hugging Face and was surprised that the PHP support is so poor. Luckily, I found your library (saw your comment in the Hugging Face forum).

I am wondering why there isn't at least a basic Python wrapper in PHP? My local tests show that it's quite possible to generate Python code on the fly, at least for basic function calls. Of course, custom Python code would be hard to realize, but for common functionality it's quite doable.

The following script requires a working PHP 8 and Python 3 environment (with datasets, evaluate, torch, transformers[sentencepiece] installed via pip). It's very basic, but it can read a PDF file using a third-party lib and uses some Python code to generate a summary of the PDF content.

use Smalot\PdfParser\Parser;

require __DIR__.'/vendor/autoload.php';

// get all text from a given PDF file
$parser = new Parser();
$document = $parser->parseFile('test.pdf');
$text = $document->getText();

// represents Python code (each entry is a line);
// json_encode() yields a Python-compatible string literal, so the
// PDF text cannot break out of the generated source
$pythonCode = [
    'import json',
    'from transformers import pipeline',
    'summarizer = pipeline("summarization", "sshleifer/distilbart-cnn-12-6")',
    'text = '.json_encode($text, JSON_UNESCAPED_SLASHES),
    'result = summarizer(text)',
    'print(json.dumps(result))',
];

// write custom Python code to a file for later execution
// (file_put_contents overwrites any previous version)
$pythonFile = __DIR__.'/generated_file.py';
file_put_contents($pythonFile, implode(PHP_EOL, $pythonCode));

// execute the Python file; discard stderr, capture stdout in $output
$output = shell_exec('python3 '.escapeshellarg($pythonFile).' 2> /dev/null');

var_dump($output);
// outputs [{"summary_text": "...

// get result as array (works because the script prints JSON,
// which json_decode can parse, rather than Python's repr)
$resultAsArray = json_decode($output, true);

// process the result ...

Reference (optional)

No response

@k00ni k00ni added the question Further information is requested label May 15, 2024
@CodeWithKyrian
Owner

Hey @k00ni,

Thanks for reaching out and bringing up your thoughts on TransformersPHP. I appreciate the opportunity to shed some light on the objectives and rationale behind the library.

First off, you're absolutely right on your first statement - the primary aim of TransformersPHP is indeed to mirror the functionality of the renowned Hugging Face Python library as closely as possible. However, the approach isn't about merely wrapping the Python implementation in PHP. No, the goal is to execute these tasks natively within PHP itself.

TransformersPHP aims to provide native support for ML tasks within the PHP ecosystem. Now, this doesn't imply that every single operation happens purely in PHP. I leverage FFI (Foreign Function Interface) to interact with C libraries for certain tensor operations and running the actual model (just like Python does). But the user experience, the interface, the feel of using TransformersPHP, is indeed native to PHP.
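To illustrate that native feel, a pipeline call in TransformersPHP looks roughly like this. This is a sketch based on the project README; the exact function namespace and the model identifier (`Xenova/distilbart-cnn-6-6` here) are assumptions and may differ between versions:

```php
<?php
// Sketch: a summarization pipeline running natively in PHP.
// Assumes the codewithkyrian/transformers package is installed;
// the "pipeline" function import follows the README and may vary.
use function Codewithkyrian\Transformers\Pipelines\pipeline;

require __DIR__.'/vendor/autoload.php';

// downloads an ONNX model on first use, then runs inference
// locally via FFI — no Python environment involved
$summarizer = pipeline('summarization', 'Xenova/distilbart-cnn-6-6');

$result = $summarizer('TransformersPHP brings machine learning to PHP...');
// $result is a plain PHP array, e.g. [['summary_text' => '...']]
```

Compare this with the generated-Python approach above: there is no temp file, no shell call, and the result is a PHP array rather than a string to decode.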

Your example script demonstrates a workaround: generating Python code on the fly for basic function calls and then executing it. While wrapping Python functionality in PHP might seem like a shortcut, it comes with its own set of limitations and complexities. Control over crucial aspects like the cache directory, direct model usage, tokenization, post-processing, state management, and even performance optimization becomes a challenge. Sure, for simpler tasks like using pipelines, the approach might suffice. But when it comes to more customized usage, the intricacies multiply, making it less viable.

Moreover, relying on a Python environment introduces additional overhead and complexity, which might not resonate well with many PHP developers. Setting up and maintaining a Python environment is a task not everyone is keen on (I'm number 1 on the list😅), especially when they're deeply entrenched in the PHP ecosystem. TransformersPHP, on the other hand, offers a seamless experience. With just a composer install, you're set to use the library within the PHP environment we're all familiar with.
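For reference, that one-step install is (assuming the package is published on Packagist under the name below):

```shell
composer require codewithkyrian/transformers
```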

I hope this sheds some light on the philosophy behind TransformersPHP. Your feedback and questions are always welcome as they help shape the direction of the project. If you have any further queries or suggestions, feel free to reach out anytime.

Cheers!

@he426100

Actually, there is a Python wrapper for PHP: https://github.com/swoole/phpy/. You can use it like this:

// import Python modules through phpy's bridge
$transformers = PyCore::import('transformers');
$AutoTokenizer = $transformers->AutoTokenizer;
$AutoModelForSequenceClassification = $transformers->AutoModelForSequenceClassification;

// pass the proxy setting from the PHP environment through to Python
$os = PyCore::import('os');
$os->environ->__setitem__('https_proxy', getenv('https_proxy'));

// download (or load from cache) the tokenizer and model
$tokenizer = $AutoTokenizer->from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student");
$model = $AutoModelForSequenceClassification->from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student");
