This repository has been archived by the owner on Dec 11, 2020. It is now read-only.

Faker 2.0 design #1807

Open
pimjansen opened this issue Oct 9, 2019 · 26 comments

@pimjansen
Contributor

Hey all,

Today we discussed that 1.9.0 will be the last minor release of the 1.x branch. After that, no new enhancements will be made and we will start focusing on Faker 2.0. To kick off, let's have an open discussion with all of you and share our ideas.

  • Which PHP versions will be supported
  • Which development tooling will be used
  • What the actual architecture will look like

Best,
Pim

pimjansen added the rfc label on Oct 9, 2019
pimjansen added this to the 2.0.0 milestone on Oct 9, 2019
pimjansen pinned this issue on Oct 9, 2019
@pimjansen
Contributor Author

My two cents:

PHP version
I would really like to start fresh with 7.4. I know it will not be backwards compatible, but the same goes for 1.x vs 2.x. The main reason is typed properties, which make an application much stricter. Faker 1.x has been around for a long time and still works fine. The upgrade from 7.2 to 7.4 is an easy one. Also, active support for 7.2 is already ending in January 2020.

Tooling
There is a lot of great tooling available to ensure packages are working fine.

  • PHPStan
  • PHPCS
  • PHPMD (yes it is maintained again)
  • PHPUnit

Architecture
One of the main problems of Faker today is that there are so many locales. None of us knows every language, which means there are a lot of PRs for locales where we have no idea what they are about. A second problem is the licensing of the content. A lot of it is unreadable to us, so the content cannot really be verified by any of us either.

My suggestion would be to split Faker into a Core library which holds the basics, meaning everything that is not locale specific:

  • Numbering
  • Date calculation
  • etc

For everything else, the Core will provide interfaces where needed, so that a locale can be connected to the core properly. The same idea goes for the providers: they should be pluggable just like a locale.

The downside is that there will be a lot of separate libraries to load. However, most of the time you use one or maybe two and that is it. Right now we always ship all of them, which is totally unnecessary.

I will probably provide an nl_NL locale myself, since it is easy for me to read and maintain, but it will not be coupled to the Faker core itself.
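
A minimal sketch of the core/locale split described above, assuming PHP 7.4 typed properties. The names LocaleProvider, DutchLocale and CoreFaker are hypothetical and the name list is sample data; nothing here is an agreed Faker 2.0 API.

```php
<?php
// Hypothetical contract that a locale package would implement.
interface LocaleProvider
{
    public function locale(): string; // e.g. "nl_NL"

    /** @return string[] */
    public function firstNames(): array;
}

// Would live in its own package, outside the core.
final class DutchLocale implements LocaleProvider
{
    public function locale(): string
    {
        return 'nl_NL';
    }

    public function firstNames(): array
    {
        return ['Daan', 'Emma', 'Sem']; // sample data only
    }
}

// Core: holds only locale-independent logic (numbers, random picking).
final class CoreFaker
{
    private LocaleProvider $locale; // typed property, PHP 7.4+

    public function __construct(LocaleProvider $locale)
    {
        $this->locale = $locale;
    }

    public function numberBetween(int $min, int $max): int
    {
        return random_int($min, $max);
    }

    public function firstName(): string
    {
        $names = $this->locale->firstNames();

        return $names[random_int(0, count($names) - 1)];
    }
}

$faker = new CoreFaker(new DutchLocale());
echo $faker->firstName(), PHP_EOL;
```

With this shape the core never ships locale data; a project only requires the locale packages it actually uses.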

Those are my first two cents, with probably much more to come. So let us know what you think.

@mohamed-aiman

Really support the idea of making a core and pluggable locales. An advantage of this architecture will be the possibility to inject custom locale versions and currently unavailable locales. I am happy to provide dv_MV.

@pimjansen
Contributor Author

@localheinz I was thinking about how we can handle the locales as separate packages while the Faker core still holds its value. For example, take a carrier, which can be "Vodafone", "T-Mobile", "AT&T" and so on. This is typically something a locale could hold on its own, since it will be different for each implementation.

However, how do we handle that from the core? Is the core going to define the interface that locales should implement for those? That would keep real meaning in the core, but there is always a lot that is locale specific, like a random VAT rate or different identification types. I don't think the core should or can hold all of that (it brings us the same troubles).

On the other hand, just leaving all of that up to the locale is also not great. It means the core itself has no real value, except maybe making it easy to load different locales together. But if you are not doing that, why not just load the locale directly? IMO this is not something we should want, since it would split things up way too much.

What do you think?

@djunehor

I think one way to go about this is to create a core class as the single point of entry, with methods providing non-locale-specific data, e.g. dates, random numbers, text, etc. Locale classes extend that class and define their own methods. This means each locale needs its own readme listing the available methods and properties. Something like this:

// base interface
namespace Faker;

interface FakerContract
{
    // core method; an interface cannot mark it final, the base class does that
    public static function random(): int;

    // core method that can be overridden
    public static function phone(): string;
}

// base class
class FakerConcrete implements FakerContract
{
    // core method that shouldn't be overridden
    final public static function random(): int
    {
        return mt_rand();
    }

    // core method that can be overridden
    public static function phone(): string
    {
        return ''; // placeholder
    }
}

namespace Faker\en_NG;

use Faker\FakerConcrete;

class Phone extends FakerConcrete
{
    // override parent method (the return type must stay compatible)
    public static function phone(): string
    {
        return ''; // placeholder
    }

    // locale specific
    public static function state(): string
    {
        return ''; // placeholder
    }
}

@fzaninotto
Owner

I'm OK with these ideas. Another thing to work on is the ability for locale-specific Fakers to use a different charset for the Text providers. Users of non-Latin scripts like Japanese or Arabic currently can't use anything other than RealText, which is slow as hell and probably not fit for generating random words.
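
One possible shape for a charset-aware word generator, as a rough sketch. AlphabetWords is a hypothetical class, the hiragana string is sample input, and the output is random letter sequences rather than real words. It assumes the alphabet is valid UTF-8.

```php
<?php
// Hypothetical helper: builds pseudo-words from any alphabet, multibyte-safe.
final class AlphabetWords
{
    /** @var string[] one entry per UTF-8 character */
    private array $letters;

    public function __construct(string $alphabet)
    {
        // The /u modifier splits on UTF-8 code points instead of bytes.
        $this->letters = preg_split('//u', $alphabet, -1, PREG_SPLIT_NO_EMPTY);
    }

    public function word(int $length): string
    {
        $word = '';
        for ($i = 0; $i < $length; $i++) {
            $word .= $this->letters[random_int(0, count($this->letters) - 1)];
        }

        return $word;
    }
}

$hiragana = new AlphabetWords('あいうえおかきくけこ');
echo $hiragana->word(4), PHP_EOL;
```

A locale package could pass its own alphabet to such a helper instead of relying on a Latin-only Lorem provider.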

@pimjansen
Contributor Author

@fzaninotto agreed! One thing I did not mention yet is the existing ORM integration. Are we going to keep it, or pull it out and maybe publish it as a standalone provider?

@fzaninotto
Owner

I think they should be moved out and managed by the ORM developers. We can't know and master every ORM out there!

@ManojKiranA

i can help with laravel ORM

@pimjansen
Contributor Author

> i can help with laravel ORM

Good to hear, @ManojKiranA. Once we have a first beta of the Faker core, I think it's time to think about how to implement providers that can hook into ORM there.

@ManojKiranA

> how to implement providers that can hook into ORM there.

Waiting for it 😎

@joelharkes

joelharkes commented Oct 29, 2019

The biggest problem I currently have with this library is that the random generator and its seed are global (mt_srand / mt_rand).

It would be amazing if each Faker instance could have its own seed. Or at least Faker should build on a single randomBytes() or randomInt() method that is easily overridable, so an alternative generator (with a per-instance seed) can be used.

Pseudocode below:

$f1 = new Faker();
$f2 = new Faker();
$f2->seed(1235);
$f1->seed(1234);
$f2->number(); // today this is the first number for seed 1234 (the last global seed), not 1235
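
A rough sketch of what an instance-scoped seed could look like. SeededRandom is a hypothetical class built on a small linear congruential generator, chosen only because it fits in a few lines; its statistical quality is well below mt_rand(), and it assumes 64-bit PHP integers.

```php
<?php
// Hypothetical per-instance generator: the state lives in the object,
// so seeding one instance never affects another.
final class SeededRandom
{
    private int $state;

    public function __construct(int $seed)
    {
        $this->state = $seed & 0x7FFFFFFF; // keep the state within 31 bits
    }

    public function numberBetween(int $min, int $max): int
    {
        // Classic LCG step; the product stays below 2^63 on 64-bit PHP.
        $this->state = ($this->state * 1103515245 + 12345) % 2147483648;

        // Use the higher bits (an LCG's low bits are weak); a slight modulo bias remains.
        return $min + intdiv($this->state, 65536) % ($max - $min + 1);
    }
}

$f1 = new SeededRandom(1234);
$f2 = new SeededRandom(1235);

$a = $f1->numberBetween(1, 100);
$f2->numberBetween(1, 100); // advancing $f2 does not disturb $f1
$b = $f1->numberBetween(1, 100);

// Replaying seed 1234 on a fresh instance gives the same two numbers.
$replay = new SeededRandom(1234);
var_dump([$a, $b] === [$replay->numberBetween(1, 100), $replay->numberBetween(1, 100)]); // bool(true)
```

Any generator with the same interface could be swapped in, which is the "easily overridable" part of the request.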

@pimjansen
Contributor Author

@joelharkes not sure if this is 100% the case, but if so I agree. The seed should indeed be isolated per instance.

@fzaninotto
Owner

Random thoughts, feel free to comment

namespace Faker\English;

// general purpose, locale-specific factory
// (class names are fully qualified so they do not resolve relative to Faker\English)
class Factory extends \Faker\Core\Factory {
    protected static $defaultProviders = [
        'address' => \Faker\English\Address::class,
        'barcode' => \Faker\Core\Barcode::class,
        'color' => \Faker\English\Color::class,
        'datetime' => \Faker\English\DateTime::class,
        'image' => \Faker\Core\Image::class,
        'internet' => \Faker\English\Internet::class,
        'lorem' => \Faker\Latin\Lorem::class,
        'misc' => \Faker\English\Miscellaneous::class,
        'payment' => \Faker\English\Payment::class,
        'person' => \Faker\English\Person::class,
        'phone' => \Faker\English\PhoneNumber::class,
        'text' => \Faker\English\Text::class,
        'uuid' => \Faker\English\Uuid::class,
    ];

    // the static create() method comes from the parent
}

// specialized purpose, locale-specific factory
class EcommerceFactory extends \Faker\Core\Factory {
    protected static $defaultProviders = [
        'address' => \Faker\English\Address::class,
        'barcode' => \Faker\Core\Barcode::class,
        'color' => \Faker\English\Color::class,
        'datetime' => \Faker\English\DateTime::class,
        'image' => \Faker\Core\Image::class,
        'lorem' => \Faker\Latin\Lorem::class,
        'payment' => \Faker\English\Payment::class,
        'person' => \Faker\English\Person::class,
        'phone' => \Faker\English\PhoneNumber::class,
        'text' => \Faker\English\Text::class,
    ];
}

// allowing users to override the providers of a particular factory
class MyEcommerceFactory extends \Faker\English\EcommerceFactory {
    // is it possible in PHP?
    protected static $defaultProviders = [
        ...\Faker\English\EcommerceFactory::$defaultProviders,
        'image' => \My\Image::class
    ];

    // if we cannot do it, let's just do
    public static function create() {
        return \Faker\English\EcommerceFactory::create([
            'image' => \My\Image::class,
        ]);
    }
}

// usage: use the localized factory directly
$faker = \Faker\English\Factory::create();

// multi-language support
$englishFaker = \Faker\English\CRMFactory::create();
$frenchFaker = \Faker\French\CRMFactory::create();

$multiFaker = new \Faker\Core\LanguageAggregate($englishFaker, $frenchFaker);
echo $multiFaker->lastName(); // chooses either one of the locales
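
On the "is it possible in PHP?" question: a property default has to be a constant expression, and reading another class's static property is not one, so the spread form is unlikely to be accepted as written. The fallback of passing overrides to create() can be sketched as follows. All Sketch* names are hypothetical stand-ins, and create() returns a plain array of provider instances only to keep the example short.

```php
<?php
// Hypothetical core factory: subclasses override $defaultProviders,
// callers may replace single entries through create().
class SketchCoreFactory
{
    /** @var array<string, string> provider name => class name */
    protected static $defaultProviders = [];

    /** @return array<string, object> provider name => provider instance */
    public static function create(array $overrides = []): array
    {
        // static:: (late static binding) reads the list of the called subclass;
        // with string keys, array_merge lets entries in $overrides win.
        $classes = array_merge(static::$defaultProviders, $overrides);

        // array_map preserves string keys when given a single array.
        return array_map(static function (string $class) {
            return new $class();
        }, $classes);
    }
}

final class SketchImage {}
final class SketchMyImage {}
final class SketchPerson {}

class SketchEcommerceFactory extends SketchCoreFactory
{
    protected static $defaultProviders = [
        'image'  => SketchImage::class,
        'person' => SketchPerson::class,
    ];
}

$providers = SketchEcommerceFactory::create(['image' => SketchMyImage::class]);
echo get_class($providers['image']), PHP_EOL;  // SketchMyImage
echo get_class($providers['person']), PHP_EOL; // SketchPerson
```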

fzaninotto changed the title from "Faker 2.0 version" to "Faker 2.0 design" on Nov 14, 2019
@JoshuaLuckers

There should be a distinction between the "country" and "language". For example: Belgium might format addresses differently than The Netherlands but they (might) speak the same "language".

@svenluijten

svenluijten commented Nov 15, 2019

@JoshuaLuckers The concept of "locales" solves that problem; you'd have nl_BE and nl_NL, where the language for both is Dutch (nl), but the location is different (NL/BE). This way you can localize everything like formatting addresses/phone numbers etc, while still maintaining the same spoken language.

So instead of using the language names in the namespaces (like \Faker\English\Text::class), I propose we use the locale: \Faker\NL\BE\Text::class and \Faker\NL\NL\Text::class. In addition to the classes provided by core, of course: \Faker\Core\Text::class. Typing it out though, having a namespace like \Faker\NL\NL\... might make the API a bit confusing to use, so we'd need the docs to be crystal clear on this.

Thoughts?

@rflavien

@svenluijten it would not need to rely on crystal clear docs if the namespaces were explicit, like:
\Faker\Language\NL\Location\BE\Text::class
\Faker\Language\NL\Location\NL\Text::class, which could be simplified to \Faker\Language\NL\Text::class

🤷‍♂️

@stof
Contributor

stof commented Nov 15, 2019

Another option is to keep the locale itself as a single segment of the namespace: \Faker\nl_NL\Text::class
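
Whichever namespace layout wins, a lookup with fallback from locale to language to core could look roughly like this. The single-segment layout is used here, and the function names and the intermediate language-only namespace (Faker\nl) are assumptions for illustration.

```php
<?php
// Hypothetical resolver: most specific namespace first, core last.
function providerCandidates(string $locale, string $provider): array
{
    $candidates = ['Faker\\' . $locale . '\\' . $provider]; // e.g. Faker\nl_BE\Text

    $language = strstr($locale, '_', true); // "nl" from "nl_BE", false without "_"
    if ($language !== false) {
        $candidates[] = 'Faker\\' . $language . '\\' . $provider;
    }

    $candidates[] = 'Faker\\Core\\' . $provider;

    return $candidates;
}

// Returns the first candidate class that exists (may trigger autoloading), or null.
function resolveProvider(string $locale, string $provider): ?string
{
    foreach (providerCandidates($locale, $provider) as $class) {
        if (class_exists($class)) {
            return $class;
        }
    }

    return null;
}

print_r(providerCandidates('nl_BE', 'Text'));
```

With such a fallback, nl_BE would only need to ship what differs from Dutch in general, such as address formats.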

@Hipska

Hipska commented Nov 15, 2019

What about the idea of linked data? E.g. generate one set of data so that related fields can be read from it. Similar to what @joelharkes mentioned, but a little more extended:

$faker->fixed(true);
echo $faker->name;
    // Adaline Reichel
echo $faker->firstName;
    // Adaline
echo $faker->lastName;
    // Reichel
echo $faker->safeEmail;
    // [email protected]

// ...

echo $faker->name; 
    // Adaline Reichel
$faker->next();
echo $faker->name;
    // Roscoe Johns
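
The fixed()/next() idea above might be sketched like this: one record is drawn and every related field is derived from it until next() is called. LinkedPersonFaker is a hypothetical class, the names are the sample values from the example, and the example.org address format is an assumption.

```php
<?php
// Hypothetical "linked data" faker: all fields come from one current record.
final class LinkedPersonFaker
{
    private const FIRST_NAMES = ['Adaline', 'Roscoe']; // sample data only
    private const LAST_NAMES  = ['Reichel', 'Johns'];

    /** @var array{firstName: string, lastName: string} */
    private array $current;

    public function __construct()
    {
        $this->next();
    }

    // Draws a new record; every getter below switches to it at once.
    public function next(): void
    {
        $this->current = [
            'firstName' => self::FIRST_NAMES[random_int(0, count(self::FIRST_NAMES) - 1)],
            'lastName'  => self::LAST_NAMES[random_int(0, count(self::LAST_NAMES) - 1)],
        ];
    }

    public function firstName(): string
    {
        return $this->current['firstName'];
    }

    public function lastName(): string
    {
        return $this->current['lastName'];
    }

    public function name(): string
    {
        return $this->firstName() . ' ' . $this->lastName();
    }

    public function safeEmail(): string
    {
        return strtolower($this->firstName() . '.' . $this->lastName()) . '@example.org';
    }
}

$faker = new LinkedPersonFaker();
echo $faker->name(), ' <', $faker->safeEmail(), '>', PHP_EOL;
```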

@recchia

recchia commented Nov 15, 2019

I'd like to have factories in Faker, as factory-muffin does. That would be great for tests.

@hubertnnn

hubertnnn commented Nov 21, 2019

I do like the idea of linked data, in fact I came here to suggest the same.
Though it might require a bit more complex sourcing.
My example would involve address that will link city, country, post code and GPS coordinates.

When it comes to instance of faker per locale, I don't think that is a good idea.
It would quickly turn into a big mess of dependency injection.
Also it would be nice to have an easy way to use multiple locales.

Instead I would suggest having a main Faker\Faker class and multiple Faker\Provider classes that describe themselves (using a getLocale method or something similar).
Then you could fetch data using something similar to the current modifiers.
E.g. $faker->locale('en_US|ru_RU|es_ES')->firstName(); to get a name in one of three languages.

This could even be expanded to create some form of FakerContext object that holds details about what we want right now.

$faker = Faker\Factory::create();
$seededFaker = $faker->seed(123);
$englishFaker = $seededFaker->locale('en_US');
$person = $englishFaker->person();
$fullname = $person->name; // Adaline Reichel
$firstName = $person->firstName; // Adaline
$lastName = $person->lastName; // Reichel

In this example $faker could be an instance of faker itself,
$seededFaker would be a context with seed stored,
$englishFaker would be a new Context (clone) with seed and locale set.
$person would be a special context with data already picked.
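
The clone-based context described above could be sketched as an immutable object whose modifiers return a copy. SketchFakerContext and describe() are hypothetical; data generation is left out so that only the context mechanics are shown.

```php
<?php
// Hypothetical context: seed() and locale() never mutate, they return a clone.
final class SketchFakerContext
{
    private ?int $seed = null;
    private ?string $locale = null;

    public function seed(int $seed): self
    {
        $copy = clone $this;
        $copy->seed = $seed;

        return $copy;
    }

    public function locale(string $locale): self
    {
        $copy = clone $this;
        $copy->locale = $locale;

        return $copy;
    }

    // Debug helper showing which settings this context carries.
    public function describe(): string
    {
        return sprintf('seed=%s locale=%s', $this->seed ?? 'none', $this->locale ?? 'any');
    }
}

$faker = new SketchFakerContext();
$seededFaker = $faker->seed(123);
$englishFaker = $seededFaker->locale('en_US');

echo $faker->describe(), PHP_EOL;        // seed=none locale=any
echo $englishFaker->describe(), PHP_EOL; // seed=123 locale=en_US
```

Because every step returns a new object, the base $faker can be shared safely while narrowed contexts are passed around.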

@xfudox

xfudox commented Nov 21, 2019

@hubertnnn 's idea looks promising to me

@pimjansen
Contributor Author

@hubertnnn i was thinking more like this:

$faker = new Faker();
$faker->addLocale(new MyEnglishLocale);
$faker->addLocale(new MyFrenchLocale);
$faker->addLocale(new MyGermanLocale);
$faker->addProvider(new CustomTelcoEcommerceProvider);

$faker->getFirstname(); // Outputs a random firstname from the list of given locales

My only concern is that this works fine for a given interface spec, where the locale implements a certain interface (and of course the core methods like digits, dates and so on). However, in some cases there are also methods specific to a locale, for example different kinds of person registration types.
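
One way to cope with locale-specific methods, sketched under assumptions: the aggregate forwards a call only to locales that actually define the method. All class names, getRegistrationNumber() and its value are made up for illustration.

```php
<?php
final class SketchEnglishLocale
{
    public function getFirstname(): string
    {
        return 'John';
    }
}

final class SketchDutchLocale
{
    public function getFirstname(): string
    {
        return 'Daan';
    }

    // Locale-specific method that other locales do not offer (placeholder value).
    public function getRegistrationNumber(): string
    {
        return 'NL-0001';
    }
}

final class SketchAggregateFaker
{
    /** @var object[] */
    private array $locales = [];

    public function addLocale(object $locale): void
    {
        $this->locales[] = $locale;
    }

    // Forwards unknown calls to a random locale that defines the method.
    public function __call(string $method, array $args)
    {
        $capable = array_values(array_filter(
            $this->locales,
            static function (object $locale) use ($method): bool {
                return method_exists($locale, $method);
            }
        ));

        if ($capable === []) {
            throw new BadMethodCallException("No locale provides {$method}()");
        }

        return $capable[random_int(0, count($capable) - 1)]->$method(...$args);
    }
}

$faker = new SketchAggregateFaker();
$faker->addLocale(new SketchEnglishLocale());
$faker->addLocale(new SketchDutchLocale());

echo $faker->getFirstname(), PHP_EOL;          // "John" or "Daan"
echo $faker->getRegistrationNumber(), PHP_EOL; // NL-0001 (only the Dutch locale has it)
```

The trade-off is that magic forwarding gives up static typing and IDE completion, which runs against the stricter 7.4 direction discussed earlier.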

@hubertnnn

hubertnnn commented Nov 22, 2019

@pimjansen
Your solution is not that far from mine. I did not show the providers part because I am not sure how the providers should be loaded in the standalone version (i.e. in Faker\Factory::create()), but I was thinking that when used in a framework, the service provider would generate something like this:

function provideFaker() 
{
    $faker = new Faker();
    $faker->addProvider(new MyEnglishLocale);
    $faker->addProvider(new MyFrenchLocale);
    $faker->addProvider(new MyGermanLocale);
    $faker->addProvider(new CustomTelcoEcommerceProvider);
    return $faker;
}

then you could use the provided faker directly or use a context to narrow the operations:

$faker->getFirstName();
$faker->seed(100)->getFirstName();
$faker->locale('en_US')->getFirstName();
$faker->person()->getFirstName();

All the above would be a valid way to call getFirstName.

So we would have 2 points to set locale:

  1. During creation of faker we will add providers for supported locales.
  2. During generation of data we would be able to filter providers used.
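
The two points above can be sketched as follows: providers are registered once, and locale() returns a narrowed copy that only keeps the matching ones. Interface and class names are hypothetical, and the pipe-separated locale list follows the earlier 'en_US|ru_RU|es_ES' example.

```php
<?php
// Hypothetical self-describing provider.
interface SketchLocalizedProvider
{
    public function getLocale(): string;
    public function getFirstName(): string;
}

final class SketchStaticProvider implements SketchLocalizedProvider
{
    private string $locale;
    private string $firstName;

    public function __construct(string $locale, string $firstName)
    {
        $this->locale = $locale;
        $this->firstName = $firstName;
    }

    public function getLocale(): string { return $this->locale; }
    public function getFirstName(): string { return $this->firstName; }
}

final class SketchFilteringFaker
{
    /** @var SketchLocalizedProvider[] */
    private array $providers = [];

    // Point 1: register providers for all supported locales.
    public function addProvider(SketchLocalizedProvider $provider): void
    {
        $this->providers[] = $provider;
    }

    // Point 2: filter the providers used for the next calls.
    public function locale(string $spec): self
    {
        $wanted = explode('|', $spec);
        $narrowed = new self();
        foreach ($this->providers as $provider) {
            if (in_array($provider->getLocale(), $wanted, true)) {
                $narrowed->addProvider($provider);
            }
        }

        return $narrowed;
    }

    public function getFirstName(): string
    {
        if ($this->providers === []) {
            throw new RuntimeException('No provider matches the requested locale.');
        }

        return $this->providers[random_int(0, count($this->providers) - 1)]->getFirstName();
    }
}

$faker = new SketchFilteringFaker();
$faker->addProvider(new SketchStaticProvider('en_US', 'John'));
$faker->addProvider(new SketchStaticProvider('fr_FR', 'Jean'));

echo $faker->locale('en_US')->getFirstName(), PHP_EOL; // John
```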

@pimjansen
Contributor Author

@hubertnnn yeah, that could indeed be an idea of course. We should aim for a use case that is as easy as possible. I'm pretty sure the new design won't be backwards compatible with the way it works now, so it is very important to get it right for the users straight away.

@AhmedRaafat14

@pimjansen May I ask about the status of the project so far? Do you need help with the new design and development?

@ckrack

ckrack commented Jul 3, 2020

The Guesser should be usable from outside and should have a method to get the guessed name instead of an anonymous Closure.
I'm working on a code-generator for tests, that should auto-populate with Faker calls.

Also: The Guesser could have an option for localization.
We could have a registry for each available Faker, and each Faker could have its own Guesser method that receives the name and additional params and returns true if it matched.
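
A guesser that returns the formatter name instead of a closure might look roughly like this. SketchNameGuesser and its rule table are hypothetical and are not the existing Faker\Guesser\Name implementation.

```php
<?php
// Hypothetical guesser: maps a column/property name to a formatter name,
// so that a code generator can emit e.g. "$faker->firstName".
final class SketchNameGuesser
{
    // Pattern => formatter name; deliberately tiny rule set for illustration.
    private const RULES = [
        '/^first_?name$/i'          => 'firstName',
        '/^(last_?name|surname)$/i' => 'lastName',
        '/^e?mail$/i'               => 'email',
    ];

    public function guessFormatterName(string $column): ?string
    {
        foreach (self::RULES as $pattern => $formatter) {
            if (preg_match($pattern, $column) === 1) {
                return $formatter;
            }
        }

        return null; // nothing matched; the caller decides on a fallback
    }
}

$guesser = new SketchNameGuesser();
echo $guesser->guessFormatterName('first_name'), PHP_EOL; // firstName
echo $guesser->guessFormatterName('surname'), PHP_EOL;    // lastName
```

Localization could then be a matter of letting each locale contribute extra rules to the table.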
