feat(SemanticSearch): Added a new semantic search agent that uses fuzzy string matching and Levenshtein distance. #103
base: master
…. The project was not building without this update, using the same package values specified.
…zy string matching and Levenshtein distance.
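For readers unfamiliar with the metric the commit refers to, here is a minimal from-scratch sketch of Levenshtein distance (single-character insertions, deletions and substitutions) plus one common way to normalize it to a 0 to 100 score. The function names are hypothetical and this is not the PR's code: the reviewed code calls `fuzz.ratio`, which uses its own normalization, so scores from this sketch will not match it exactly.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized score in [0, 100]; 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))           # 3
print(round(similarity("kitten", "sitting"), 1))  # 57.1
```

The two-row table keeps memory linear in the shorter string, which matters when comparing a file's comments against full license texts.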
Changes look good, needs tests!!
Maybe we can add the agent to the Build and Test stage?
@Hero2323, I tested how it works and found some issues :)
Also, please update the README on how to use the SemanticSearch agent (the processedLicenseList flag, etc.).
Please take a look.
```python
def __init__(self, licenseList):
    super().__init__(licenseList)
```
Suggested change:
```diff
-def __init__(self, licenseList):
+def __init__(self, licenseList, verbose=0):
     super().__init__(licenseList)
```
If verbose output is planned: the `verbose` input flag is defined but never passed to the agent, which is prone to throwing an error.
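A minimal sketch of threading the flag end to end, assuming a `-v/--verbose` counting flag (the flag spelling is an assumption). `AtarashiAgentStub`, `SemanticSearchSketch` and `build_agent` are hypothetical stand-ins so the example runs without atarashi installed; they are not the project's classes.

```python
import argparse


class AtarashiAgentStub:
    """Hypothetical stand-in for the real base agent class."""

    def __init__(self, licenseList):
        self.licenseList = licenseList


class SemanticSearchSketch(AtarashiAgentStub):
    """Hypothetical agent that accepts the suggested `verbose=0` parameter."""

    def __init__(self, licenseList, verbose=0):
        super().__init__(licenseList)
        self.verbose = verbose  # store the flag so scanning code can consult it

    def log(self, message):
        # Only emit diagnostics when verbose output was requested.
        if self.verbose > 0:
            print(message)


def build_agent(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("inputFile")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args(argv)
    # Pass the parsed flag through instead of defining it and dropping it.
    return SemanticSearchSketch(licenseList=[], verbose=args.verbose)


agent = build_agent(["-v", "file.c"])
print(agent.verbose)  # 1
```

Giving `verbose` a default of `0` keeps existing callers that only pass `licenseList` working.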
```python
fuzzy_similarity_matrix_2 = np.zeros(len(self.licenseList))
for i in range(len(self.licenseList)):
    fuzzy_similarity_matrix_2[i] = fuzz.ratio(appended_comment, self.licenseList.loc[i, 'text'])
    if pd.notna(licenseList.loc[i, 'license_header']):
```
Suggested change:
```diff
-    if pd.notna(licenseList.loc[i, 'license_header']):
+    if pd.notna(self.licenseList.loc[i, 'license_header']):
```
The bare `licenseList` variable is not accessible inside this method; it has to be read through `self.licenseList`.
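To show the fix in context, here is a self-contained sketch of the scoring loop that always reads the list through `self`. Assumptions, all hypothetical: a plain list of dicts replaces the pandas DataFrame, `None` plays the role of NaN for a missing header, the `shortname` key is illustrative, taking the max of the text and header scores is a guess at what follows the `if`, and `difflib.SequenceMatcher` is swapped in for `fuzz.ratio` (similar in spirit, not identical).

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # 0..100 score from difflib; a stand-in for fuzz.ratio, not identical to it.
    return round(100 * SequenceMatcher(None, a, b).ratio())


class ScorerSketch:
    def __init__(self, licenseList):
        # Plain list of dicts instead of a DataFrame, to stay self-contained.
        self.licenseList = licenseList

    def scores(self, appended_comment):
        results = []
        for entry in self.licenseList:  # go through self, never a bare module-level name
            score = ratio(appended_comment, entry["text"])
            header = entry.get("license_header")
            if header is not None:  # None here mirrors what pd.notna() filters out
                score = max(score, ratio(appended_comment, header))
            results.append((entry["shortname"], score))
        # Best match first.
        return sorted(results, key=lambda pair: pair[1], reverse=True)


licenses = [
    {"shortname": "MIT", "text": "Permission is hereby granted, free of charge", "license_header": None},
    {"shortname": "GPL-2.0", "text": "GNU General Public License version 2",
     "license_header": "This program is free software"},
]
scorer = ScorerSketch(licenses)
print(scorer.scores("Permission is hereby granted, free of charge")[0])  # ('MIT', 100)
```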
```python
args = parser.parse_args()

inputFile = args.inputFile
licenseList = args.processedLicenseList
```
Can we also make it more robust? If the user doesn't provide processedLicenseList, the agent should fall back to the one we already have :)
Something like:
```python
from pkg_resources import resource_filename

defaultProcessed = resource_filename("atarashi",
                                     "data/licenses/processedLicenses.csv")
if processedLicense is None:
    processedLicense = defaultProcessed
```
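An equivalent approach is to let argparse apply the fallback itself via `default=`, so no separate `is None` check is needed. In this sketch `DEFAULT_PROCESSED` is a hypothetical plain string standing in for the `resource_filename(...)` call, so the example runs without atarashi installed, and `parse` is an illustrative helper name.

```python
import argparse

# Hypothetical stand-in for
# resource_filename("atarashi", "data/licenses/processedLicenses.csv").
DEFAULT_PROCESSED = "data/licenses/processedLicenses.csv"


def parse(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("inputFile")
    # argparse substitutes the default whenever the flag is omitted.
    parser.add_argument("--processedLicenseList", default=DEFAULT_PROCESSED)
    args = parser.parse_args(argv)
    return args.inputFile, args.processedLicenseList


print(parse(["file.c"]))
print(parse(["file.c", "--processedLicenseList", "custom.csv"]))
```

Putting the default in the parser also makes it show up in `--help` output when `%(default)s` is used in the help string.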
Added a new Semantic Search Agent that can be used as follows:
```shell
atarashi -a SemanticSearch /path/to/file.c
```